Tuesday, November 12, 2019

ELCE in Lyon 2019

I went to the ELCE (Embedded Linux Conference Europe) conference that took place in Lyon on 28-30 October 2019.
The day before the conference, the Outreachy interns and mentors were invited to a dinner at a pizza restaurant. There I met my mentor Hans Verkuil for the first time.
I stayed by myself in an Airbnb apartment during the conference, just a few minutes' walk from the main train station, Gare de Lyon-Part-Dieu. This was convenient since it is not too far from the conference venue and there is also a direct shuttle from there to the airport. When I arrived at the conference on the first day, I got a badge with my name and background (Outreachy intern) and some goodies: tickets for three days of public transportation, a voucher for wine or beer, a voucher for the speakers' dinner and a T-shirt. The main hall, where all the booths were, was pretty huge and was crowded all the time. There were small croissants served in the mornings, and coffee and fruit served all day.
I was at the conference with many of my colleagues from Collabora, whom I had met for the first time just a week before during a company meeting in Nice. We had a Collabora booth there where people could play SuperTuxKart on an RK3399 SoC. I came to the conference officially as an Outreachy alumni intern and also gave a 5-minute talk about my internship project.
There were talks at the conference all day long about various subjects. I found it hard to understand technical talks when I didn't have enough background. For example, I went to a talk about U-Boot and a talk about the GPU Linux subsystem, and I felt I was just sitting there warming the chair.
The conference is an opportunity for developers to meet face to face, as in many cases they are spread around the world. The v4l community held several discussions; I went to one about codecs and one about libcamera. The discussions are long and not always easy to follow. I remember, for example, a discussion about what the behavior of a decoder driver should be if at some point during decoding the initially allocated buffers are not big enough to contain the decoded frame. There are several ways to deal with it, and some questions arise: Should the driver inform userspace about it? Should it be userspace's responsibility to stop the streaming and allocate new buffers? Also, is it possible to track at which frame it happened and try to decode it again?
The other discussion was about libcamera. Libcamera is a userspace library that supplies an API for applications to interact with cameras. On the second day of the conference there was a talk on libcamera by Jacopo Mondi, where he showed through live coding how applications should use the API. The project, led by Laurent Pinchart, is one year old and is entering its API stabilization phase. The libcamera discussion was held in the morning after the talk, and people interested in it could ask questions.
At the very end of the conference there was a game where in each round two statements related to Linux/computers were given, and people had to decide which of them was correct. After that there was a rock-paper-scissors game against the host. I failed very badly at both games.

After the conference was over I stayed two more days in Lyon; I found a nice coworking space, 'Le 18 Coworking', to work from.
Those were two rainy days, and the second of them was apparently a holiday in France called All Saints' Day.

Sunday, February 17, 2019

running fwht codecs from userspace

The fwht codec can also be executed from userspace.

Steps:
1. Load the vivid driver (for example with modprobe vivid).
It will create two device files, /dev/videoX and /dev/videoY.
We will use the file with the lower index to generate a picture from the vivid virtual webcam.
2. Listen on a local port with netcat and dump the input to a file:


$ nc -l 8888 > from-vivid-fwht-stream.brr


3. Then in another terminal run:
$ v4l2-ctl -d0 --stream-mmap   --stream-to-host 0.0.0.0:8888


So now a stream with a specific format is dumped into from-vivid-fwht-stream.brr.
The stream is the vivid-generated video compressed in the fwht format. The compression is done in userspace.
The code flow is:
streaming_set_cap -> do_handle_cap -> write_buffer_to_file -> fwht_compress


Now for the opposite direction:
Here the v4l2-ctl command listens on a port and waits, so it should be executed first:


$ v4l2-ctl -d1 --stream-out-mmap   --stream-from-host 0.0.0.0:8888
$ cat from-vivid-fwht-stream.brr | nc 0.0.0.0 8888

And the code flow is:
streaming_set_out -> do_handle_out -> fill_buffer_from_file -> fwht_decompress


So what we see is that:
1. For a capture device, v4l2-ctl can take the captured frames, compress them with the fwht codec
in userspace and then send them to some <host:port> server.
In this case, the visible_width/height should be the dimensions that the capture device outputs to userspace,
which are the compose rectangle on the capture side, if supported.


2. For an output device, v4l2-ctl can serve as a server: it listens on some <host:port>.
When a client establishes a connection, it reads a compressed fwht stream from the client and
decompresses it. The visible_width/height are then the crop values of the output buffers of the output
device. (A sketch of how an application could query these rectangles follows right after this list.)
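A minimal sketch, not taken from v4l2-ctl itself, of how an application could query these two rectangles through the V4L2 selection API. It assumes already-open capture and output file descriptors and omits error handling:

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static void print_visible_rects(int cap_fd, int out_fd)
{
    struct v4l2_selection sel;

    /* case 1: capture device - the compose rectangle, if supported */
    memset(&sel, 0, sizeof(sel));
    sel.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    sel.target = V4L2_SEL_TGT_COMPOSE;
    if (ioctl(cap_fd, VIDIOC_G_SELECTION, &sel) == 0)
        printf("capture visible: %ux%u\n", sel.r.width, sel.r.height);

    /* case 2: output device - the crop rectangle of the output buffers */
    memset(&sel, 0, sizeof(sel));
    sel.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
    sel.target = V4L2_SEL_TGT_CROP;
    if (ioctl(out_fd, VIDIOC_G_SELECTION, &sel) == 0)
        printf("output visible: %ux%u\n", sel.r.width, sel.r.height);
}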


In qvidcap there is also an fwht_decompress call. This is a case where no video
device driver is used at all: qvidcap just listens on <host:port>, reads the compressed frames from
a client, decompresses them and shows them. In that case no crop/compose values are involved,
so the visible_width/height is the same as the coded_width/height.


So, to view the stream with qvidcap:
terminal 1:
dafna@ubuntu:~/clean-v4l-utils$ ./utils/qvidcap/qvidcap --port=8888
terminal 2:
cat from-vivid-fwht-stream.brr2 | nc 0.0.0.0 8888

Friday, February 15, 2019

Week 7 - Task for the week - Modifying Expectations

Here is the original very general plan from Hans:

1) Add support to vicodec to compress formats whose resolution does not
   align with the macroblock size (8x8). An example of that is 1440x900.
   This will require some work in the fwht codec and support for
   selection rectangles needs to be added to the v4l2 API.

2) v4l2-compliance: add code to detect codecs based on the M2M(_MPLANE)
   capability and which side (CAPTURE or OUTPUT) has only compressed
   formats. This is needed so test code can be added specifically
   for codecs.

3) Add missing features of the stateful codec specification, specifically
   stopping and enumerating formats (dependency of the raw formats of the
   chosen compressed format).

   v4l2-compliance has to be extended with the corresponding tests.

4) Add support for mid-stream resolution changes to vicodec, again including
   v4l2-compliance tests.

   Depending on how long 1-3 take, this can be postponed to the end of the
   internship.

5) Add stateless codec support to vicodec + v4l2-compliance tests. This will
   probably be subdivided into smaller pieces as we get to it.

6) Create libva layer for this stateless codec so it can be used by ffmpeg etc.

So items 1-3 should be done first, item 4 can be rescheduled depending on where
we are timewise.

Items 5 and 6 are very high-level at the moment, and I hope I can give a more
detailed schedule by the time you are ready to work on that.

Note that it would not surprise me if the work on v4l2-compliance will take longer than making the vicodec changes. Testing is hard work.


What happened eventually:

Out of the first 4 tasks, I actually did only tasks 1 and 4; I didn't do tasks 2 and 3.
Task 1 seemed very simple to me; I was sure it would take just a few days.
Eventually it took much longer: the first version was sent 16 days after the internship started.
Getting the alignment right took time, and understanding the selection API was not obvious either.
Task 4 was not that easy either and took me a while. It needed quite a lot of new code in both kernel space and userspace.
Task 5 is the main task of the internship. I started working on it on Jan 21 and I'm still working on it at the time of writing this post. I have already sent one version of the patch sets for the kernel and for userspace.
Hans did not assign specific deadlines to the tasks, as it was hard to predict how long they would take me. I was mostly too optimistic regarding task 1: I thought that all that was needed was to round the dimensions and pad the edge of the image with zeros, but it turned out to be more complicated.
I have 20 days until the end of the internship and I hope to finish this task by then, or earlier :)


Sunday, January 13, 2019

week 5 post - Task for the week - explain my project to a newcomer to my community.

Hi,
My project is about video decoders and encoders in kernel space.
A codec is a compressed format for video: just as zip is used to compress text/pdf/doc/whatever files, a codec is used to compress a video/audio file.
There are hardware chips dedicated to encoding and decoding video; the code interacting with such hardware should be a driver in kernel space.
There is an intent to specify a uniform API for how userspace applications should interact with such drivers. In order to test the userspace code, there is a driver called vicodec. This driver is implemented in software only, so it does not need special hardware.
Having such a proof-of-concept driver is a good way both to decide on the correct API to publish and to let userspace test its code without needing specific hardware.
The vicodec driver should behave just like a "real" driver. For example, most (all?) hardware needs the video buffer dimensions to be a multiple of some power of 2, so vicodec should support that as well: whenever the client sets the video dimensions "width x height", vicodec will round them to the closest multiple of (in vicodec's case) 8.
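As a toy illustration (this is not the actual vicodec code, which may round differently), here is how rounding a resolution up to the next multiple of 8 could look in C:

#include <stdio.h>

/* round x up to the next multiple of a (a must be a power of 2) */
#define ALIGN_UP(x, a)  (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
    unsigned int width = 1440, height = 900;

    printf("%ux%u -> %ux%u\n", width, height,
           ALIGN_UP(width, 8u), ALIGN_UP(height, 8u));
    /* prints: 1440x900 -> 1440x904 */
    return 0;
}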
The vicodec driver exposes two device files to userspace: /dev/video0 - the encoder, and /dev/video1 - the decoder.
There are two APIs: one for encoding - compressing a raw video - and one for decoding - decompressing a compressed video.
The APIs are described as state machines. Basically, the application and the driver first agree on the video type and dimensions, then the application asks the driver to allocate buffers.
The userspace application and the driver exchange these buffers. The idea is that they share them: the buffers are allocated by the driver, and the application then maps them into its own address space with mmap. They can then exchange buffers by queueing and dequeueing them into/from queues.
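Here is a minimal sketch, assuming the single-planar API and an already-open device file descriptor, of how an application could ask the driver for capture buffers and map one of them; error handling is omitted:

#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/videodev2.h>

static void *map_capture_buffer(int fd, unsigned int index)
{
    struct v4l2_requestbuffers req;
    struct v4l2_buffer buf;

    /* ask the driver to allocate 4 capture buffers */
    memset(&req, 0, sizeof(req));
    req.count = 4;
    req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = V4L2_MEMORY_MMAP;
    ioctl(fd, VIDIOC_REQBUFS, &req);

    /* query the offset and length of one of the allocated buffers */
    memset(&buf, 0, sizeof(buf));
    buf.index = index;
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    ioctl(fd, VIDIOC_QUERYBUF, &buf);

    /* map the driver-allocated buffer into the application's address space */
    return mmap(NULL, buf.length, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, buf.m.offset);
}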

There are two queues of buffers:
The Output queue - The buffers in this queue are filled with data by the userspace application, which then queues them. The driver can dequeue buffers from this queue and process them. (The term "Output queue" was a bit confusing for me since I expected "output" to mean buffers that the driver sends to userspace - the output of the driver - but actually it's the opposite.)

The Capture queue - This is the queue to which the driver queues the buffers it generated; userspace can dequeue them and read them.

One thing to note is that both userspace and the driver queue and dequeue to/from both queues.
In the Capture queue, for example, after userspace has dequeued a buffer in order to read it, it should then queue the buffer back so the driver can reuse it.

Another notion related to codec drivers is streaming: the exchange of buffers, where userspace feeds the driver with output buffers and receives back capture buffers, can happen only when the driver is "streaming". Userspace can command the driver to start and stop the streaming on each queue separately. The buffer exchange can happen only when both queues are "streaming"; this is like pressing the "Play" button.
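A minimal sketch of this exchange, assuming a memory-to-memory device like vicodec where a single file descriptor serves both queues, that buffers were already requested and mapped as above, and omitting error handling:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static void exchange_one_buffer(int fd)
{
    int out = V4L2_BUF_TYPE_VIDEO_OUTPUT;
    int cap = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    struct v4l2_buffer buf;

    /* queue one empty capture buffer for the driver to fill */
    memset(&buf, 0, sizeof(buf));
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index = 0;
    ioctl(fd, VIDIOC_QBUF, &buf);

    /* queue one filled output buffer for the driver to process
     * (real code would also set buf.bytesused to the amount of data) */
    memset(&buf, 0, sizeof(buf));
    buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index = 0;
    ioctl(fd, VIDIOC_QBUF, &buf);

    /* "press Play" on both queues */
    ioctl(fd, VIDIOC_STREAMON, &out);
    ioctl(fd, VIDIOC_STREAMON, &cap);

    /* take the processed capture buffer back, then requeue it for reuse */
    memset(&buf, 0, sizeof(buf));
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    ioctl(fd, VIDIOC_DQBUF, &buf);
    ioctl(fd, VIDIOC_QBUF, &buf);
}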

When I started the project, the vicodec was already live and kicking but it lacked some features that were needed.

My first task was to add support for video dimensions that are not a multiple of 8, by rounding the dimensions. This included adding support in the API that allows userspace to crop the dimensions back to the original.
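For illustration only (the exact selection target a particular codec driver expects may differ), here is a sketch of how an application might use the V4L2 selection API to tell the encoder the original, unpadded resolution, assuming an open encoder file descriptor:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static void set_visible_size(int fd, unsigned int width, unsigned int height)
{
    struct v4l2_selection sel;

    memset(&sel, 0, sizeof(sel));
    sel.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;  /* the raw frames fed to the encoder */
    sel.target = V4L2_SEL_TGT_CROP;
    sel.r.left = 0;
    sel.r.top = 0;
    sel.r.width = width;    /* e.g. 1440 */
    sel.r.height = height;  /* e.g. 900 */
    ioctl(fd, VIDIOC_S_SELECTION, &sel);
}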

The second task is to add support for the decoder to read the video dimensions from the header of the compressed video.
In the compressed video, each compressed frame starts with a header. The header holds some information such as the dimensions of the frame and the colorspace. So the requirement is that userspace doesn't have to know and negotiate with the driver about the video capture format/dimensions. The format and dimensions are decided later, when the driver starts to receive the compressed data with the header. Then there is a sequence called a "source change event" where the driver informs userspace about the dimensions and userspace should restart the streaming.

This is quite complicated to implement, since in addition to the correct behavior expected from userspace by the API, the driver should be prepared for any incorrect behavior and avoid crashes/memleaks etc. There are all kinds of scenarios that need to be considered.
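A minimal sketch of the source change sequence from the application's side, assuming an open decoder file descriptor; buffer reallocation and error handling are omitted, and real code would wait for the event with select()/poll() (POLLPRI) instead of calling VIDIOC_DQEVENT blindly:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static void handle_resolution_change(int fd)
{
    struct v4l2_event_subscription sub;
    struct v4l2_event ev;
    struct v4l2_format fmt;
    int cap = V4L2_BUF_TYPE_VIDEO_CAPTURE;

    /* subscribe before feeding the compressed stream to the decoder */
    memset(&sub, 0, sizeof(sub));
    sub.type = V4L2_EVENT_SOURCE_CHANGE;
    ioctl(fd, VIDIOC_SUBSCRIBE_EVENT, &sub);

    /* ... queue compressed output buffers and start streaming ... */

    memset(&ev, 0, sizeof(ev));
    ioctl(fd, VIDIOC_DQEVENT, &ev);  /* real code waits with select()/poll() first */
    if (ev.type == V4L2_EVENT_SOURCE_CHANGE &&
        (ev.u.src_change.changes & V4L2_EVENT_SRC_CH_RESOLUTION)) {
        /* stop the capture queue, read the parsed dimensions,
         * reallocate capture buffers and start streaming again */
        ioctl(fd, VIDIOC_STREAMOFF, &cap);
        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        ioctl(fd, VIDIOC_G_FMT, &fmt);
        /* ... VIDIOC_REQBUFS + mmap + VIDIOC_STREAMON on the capture queue ... */
    }
}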

Links to the APIs:
decoder API

encoder API