outreachy intern blog

Tuesday, November 12, 2019

ELCE in Lyon 2019

I went to the ELCE conference, which took place in Lyon on 28-30 October 2019.

A day before the conference, the Outreachy interns and mentors were invited to dinner at a pizza restaurant. There I met my mentor Hans Verkuil for the first time.

I stayed in an Airbnb apartment by myself during the conference, just a few minutes' walk from the main train station, Gare de Lyon-Part-Dieu. This was convenient since it is not too far from the conference venue and there is also a direct shuttle from there to the airport. When I arrived at the conference on the first day, I got a badge with my name and background (Outreachy intern) and some goodies like 3-day tickets for transportation, a voucher for wine or beer, a voucher for the speakers' dinner and a T-shirt. The main hall, where all the booths were, was pretty huge and was crowded all the time. Small croissants were served in the mornings, and coffee and fruit were served all day.

I was at the conference with many of my colleagues from Collabora, whom I had met for the first time just a week before, during a company meeting in Nice. We had a Collabora booth there where people could play SuperTuxKart on an RK3399 SoC. I came to the conference officially as an Outreachy alumni intern and also gave a 5-minute talk about my internship project.

There were talks at the conference all day long about various subjects. I found it hard to understand technical talks when I don't have enough background. So, for example, I went to a talk about u-boot and a talk about the GPU Linux subsystem, and I felt I was just sitting there warming the chair.

The conference is an opportunity for developers to meet face to face, as in many cases they are spread around the world. The v4l community held several discussions. I went to a discussion about codecs and one about libcamera. The discussions are long and not always easy to follow. I remember, for example, a discussion about what a decoder driver should do if, at some point during decoding, the initially allocated buffers are not big enough to contain the decoded frame. There are several ways to deal with it. Some questions that arise: Should the driver inform userspace about it? Should it be userspace's responsibility to stop the streaming and allocate new buffers? Also, is it possible to track at which frame it happened and try to decode it again?

The other discussion was about libcamera. Libcamera is a userspace library that supplies an API for applications to interact with cameras. On the second day of the conference there was a talk on libcamera by Jacopo Mondi, where he showed through live coding how applications should use the API. The project, led by Laurent Pinchart, is one year old and is entering the API stabilization phase. The libcamera discussion was held the morning after the talk, and people interested in it could ask questions.

At the very end of the conference there was a game where in each round two facts related to Linux / computers were given and people had to decide which of them was correct. After that there was a rock paper scissors game against the host. I failed very badly in both games.

After the conference was over I stayed two more days in Lyon. I found a nice coworking space, 'Le 18 Coworking', where I worked from.

Those were two rainy days and the second of them was apparently a holiday in France called All Saints’ Day.

Sunday, February 17, 2019

running fwht codecs from userspace

The fwht codec can also be executed from userspace.

steps:

1. Load the vivid driver:

$ sudo modprobe vivid

It will create two files /dev/videoX, /dev/videoY.

We will use the file with the lower index to generate a picture from the vivid webcam.

Listen to a local port with netcat and dump the input to a file:

$ nc -l 8888 > from-vivid-fwht-stream.brr

Then in another terminal run:

$ v4l2-ctl -d0 --stream-mmap --stream-to-host 0.0.0.0:8888

So now a stream with a specific format is dumped into from-vivid-fwht-stream.brr

The stream is the vivid-generated video compressed in the fwht format. The compression is done in userspace.

The code flow is:

streaming_set_cap -> do_handle_cap -> write_buffer_to_file -> fwht_compress

Now to do the opposite:

In the opposite direction, the v4l2-ctl command listens on a port and waits, so v4l2-ctl should be executed first:

$ v4l2-ctl -d1 --stream-out-mmap --stream-from-host 0.0.0.0:8888

Then, in another terminal, send it the file:

$ cat from-vivid-fwht-stream.brr | nc 0.0.0.0 8888

And the code flow is:

streaming_set_out -> do_handle_out -> fill_buffer_from_file -> fwht_decompress

So what we see is that:

1. For a capture device, v4l2-ctl can take the captured frames, compress them with the fwht format in userspace and then send them to some <host:port> server.

In this case, the visible_width/height should be the dimensions that the capture device outputs to userspace, which are given by the compose rectangle of the capture device, if supported.

2. For an output device, v4l2-ctl can act as a server that listens on some <host:port>.

When a client connects, it reads a compressed fwht stream from the client and decompresses it. The visible_width/height are then the crop values of the output buffer of the output device.

In qvidcap there is also an fwht_decompress call. This is a case where no video device driver is used at all. All it does is listen on <host:port>, read the compressed frames from a client, decompress them and show them. So in that case there are no crop/compose values involved, and the visible_width/height is the same as the coded_width/height.

So,

terminal 1:

dafna@ubuntu:~/clean-v4l-utils$ ./utils/qvidcap/qvidcap --port=8888

terminal 2:

cat from-vivid-fwht-stream.brr2 | nc 0.0.0.0 8888

Friday, February 15, 2019

Week 7 - Task for the week - Modifying Expectations

Here is the original very general plan from Hans:

1) Add support to vicodec to compress formats whose resolution does not
align with the macroblock size (8x8). An example of that is 1440x900.

This will require some work in the fwht codec and support for
selection rectangles needs to be added to the v4l2 API.

2) v4l2-compliance: add code to detect codecs based on the M2M(_MPLANE)
capability and which side (CAPTURE or OUTPUT) has only compressed
formats. This is needed so test code can be added specifically
for codecs.

3) Add missing features of the stateful codec specification, specifically
stopping and enumerating formats (dependency of the raw formats of the
chosen compressed format).

v4l2-compliance has to be extended with the corresponding tests.

4) Add support for mid-stream resolution changes to vicodec, again including
v4l2-compliance tests.

Depending on how long 1-3 take, this can be postponed to the end of the
internship.

5) Add stateless codec support to vicodec + v4l2-compliance tests. This will
probably be subdivided into smaller pieces as we get to it.

6) Create libva layer for this stateless codec so it can be used by ffmpeg etc.

So items 1-3 should be done first, item 4 can be rescheduled depending on where
we are timewise.

Items 5 and 6 are very high-level at the moment, and I hope I can give a more
detailed schedule by the time you are ready to work on that.

Note that it would not surprise me if the work on v4l2-compliance will take longer than making the vicodec changes. Testing is hard work.
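
To make item 2 of the plan a bit more concrete, here is a minimal sketch of the detection rule it describes: a device is a codec if it has an M2M capability and exactly one side offers only compressed formats. The flag values are the ones from linux/videodev2.h; the helper names and the idea of passing the format flags in as plain lists are assumptions made for illustration, and this is not the v4l2-compliance code.

```python
# Capability and format flag values as defined in <linux/videodev2.h>.
V4L2_CAP_VIDEO_M2M_MPLANE = 0x00004000
V4L2_CAP_VIDEO_M2M = 0x00008000
V4L2_FMT_FLAG_COMPRESSED = 0x0001


def only_compressed(format_flags):
    """True if the list is non-empty and every format in it is compressed."""
    return bool(format_flags) and all(
        flags & V4L2_FMT_FLAG_COMPRESSED for flags in format_flags
    )


def classify_m2m_device(capabilities, output_format_flags, capture_format_flags):
    """Return 'encoder', 'decoder' or None (hypothetical helper).

    The two lists hold the 'flags' field of each format enumerated on the
    OUTPUT side and on the CAPTURE side of the device.
    """
    if not capabilities & (V4L2_CAP_VIDEO_M2M | V4L2_CAP_VIDEO_M2M_MPLANE):
        return None
    out_compressed = only_compressed(output_format_flags)
    cap_compressed = only_compressed(capture_format_flags)
    if out_compressed and not cap_compressed:
        return "decoder"   # compressed data goes in, raw frames come out
    if cap_compressed and not out_compressed:
        return "encoder"   # raw frames go in, compressed data comes out
    return None


# A device that accepts one compressed format and produces two raw formats.
print(classify_m2m_device(V4L2_CAP_VIDEO_M2M, [V4L2_FMT_FLAG_COMPRESSED], [0, 0]))
```

A device where both sides are raw, or both sides are compressed, is deliberately left unclassified in this sketch.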

What happened eventually:

Out of the first 4 tasks, I actually did only tasks 1 and 4, and didn't do tasks 2 and 3.
Task 1 seemed very simple to me, and I was sure it would take me just a few days.
Eventually it took much longer: the first version was sent 16 days after the internship started.
Getting the alignment right took time, and understanding the selection API was not obvious either.
Task 4 - This one was not that easy either and took me a while. It needed quite a lot of new code in both kernel and user space.
Task 5 is the main task of the internship. I started to work on it on Jan 21 and I'm still working on it at the time of writing this post. I have already sent one version of the patch sets for the kernel and for userspace.
Hans did not assign specific deadlines to the tasks, as it was hard to predict how long they would take me. I was mostly too optimistic regarding task 1. I thought that all that was needed was to round the dimensions and pad the edge of the image with zeros, but it turned out to be more complicated.
I have 20 days until the end of the internship and I hope to finish this task by then or before:)

Sunday, January 13, 2019

week 5 post - Task for the week - explain my project to a newcomer to my community.

Hi,
My project is about video decoders and encoders in kernel space.
A codec defines a compressed format for videos: just like zip is used to compress text/pdf/doc/whatever files, a codec is used to compress a video/audio file.
There are hardware chips that are dedicated to encoding and decoding video, and the code interacting with the hardware should be a driver in kernel space.
There is an intent to specify a uniform API for how userspace applications should interact with such drivers. In order to test the userspace code, there is a driver called vicodec. This driver is implemented in software only, so it does not need special hardware.
Having such a proof-of-concept driver is a good way both to decide on the correct API to publish and to let userspace test its code without needing specific hardware.
The vicodec driver should behave just like a "real" driver. For example, most (all?) hardware needs the dimensions of the video buffers to be a multiple of some power of 2, so vicodec should support that as well: whenever the client sets the video dimensions "width X height", vicodec rounds them to a multiple of (in vicodec's case) 8.
The vicodec exports two pseudo files to userspace: /dev/video0 - the encoder, and /dev/video1 - the decoder.
There are two APIs one for encoding - compressing a raw video, and one for decoding - decompressing a compressed video.
The APIs are described as a state machine. Basically the application and the driver first agree on the video type and dimensions, then the application asks the driver to allocate buffers.
The userspace application and the driver exchange the buffers. The idea is that they both share the buffers: the buffers are allocated by the driver, and the application then maps them into its address space with mmap. They can then exchange buffers by queueing and dequeueing them into/from a queue.

There are two queues of buffers:
The Output queue - In this queue the buffers are filled with data by the userspace application, which then queues them. The driver can dequeue buffers from this queue and process them. (The term "Output queue" was a bit confusing for me, since I expected "output" to mean buffers that the driver sends to userspace (the output of the driver), but actually it's the opposite.)

The Capture queue - This is the queue to which the driver queues buffers that it generated, and userspace can dequeue them and read them.

One thing to note is that both userspace and the driver queue and dequeue to/from both queues.
In the Capture queue, for example, after userspace has dequeued a buffer in order to read it, it should queue the buffer back so the driver can reuse it.

Another notion related to codec drivers is streaming: the exchange of buffers, where userspace feeds the driver with output buffers and receives back capture buffers, can happen only when the driver is "streaming". Userspace can command the driver to start and stop the streaming on each queue separately. The buffer exchange can happen only when both queues are "streaming"; this is like pressing the "Play" button.
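
The buffer exchange described above can be sketched as a toy model. This is only a simulation in plain Python under my own assumptions: the class and method names are made up, no real V4L2 ioctls are called, and the "codec" is just a function from bytes to bytes. It shows that both userspace and the driver touch both queues, and that nothing is processed until both queues are streaming.

```python
from collections import deque


class ToyM2MDevice:
    """Toy model of a memory-to-memory codec device (hypothetical, not V4L2)."""

    def __init__(self, process):
        self.process = process  # stand-in for the codec: bytes -> bytes
        self.streaming = {"output": False, "capture": False}
        # Buffers currently handed to the driver, per queue.
        self.queued = {"output": deque(), "capture": deque()}
        # Buffers the driver is done with and userspace may take back.
        self.done = {"output": deque(), "capture": deque()}

    def streamon(self, queue):
        """Userspace starts streaming on one queue."""
        self.streaming[queue] = True
        self._run()

    def streamoff(self, queue):
        """Userspace stops streaming on one queue."""
        self.streaming[queue] = False

    def qbuf(self, queue, buf):
        """Userspace hands a buffer to the driver."""
        self.queued[queue].append(buf)
        self._run()

    def dqbuf(self, queue):
        """Userspace takes back a buffer the driver has finished with."""
        return self.done[queue].popleft()

    def _run(self):
        # The driver works only while both queues are streaming and each
        # queue holds at least one buffer.
        while (all(self.streaming.values())
               and self.queued["output"] and self.queued["capture"]):
            src = self.queued["output"].popleft()
            dst = self.queued["capture"].popleft()
            dst[:] = self.process(bytes(src))
            self.done["output"].append(src)
            self.done["capture"].append(dst)


dev = ToyM2MDevice(process=bytes.upper)
dev.qbuf("output", bytearray(b"frame"))   # filled by userspace
dev.qbuf("capture", bytearray())          # empty, to be filled by the driver
dev.streamon("output")                    # nothing happens yet
dev.streamon("capture")                   # now the driver processes the buffer
result = dev.dqbuf("capture")
dev.qbuf("capture", result)               # give the buffer back for reuse
```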

When I started the project, vicodec was already alive and kicking, but it lacked some features that were needed.

My first task was to add support for video dimensions that are not a multiple of 8, by rounding the dimensions. This included adding support in the API that allows userspace to crop the dimensions back to the original.
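
As a rough illustration of the rounding and cropping idea, here is a small sketch that works on a single plane stored as bytes. It assumes the dimensions are rounded up to the next multiple of 8 and that the padding repeats the last pixel and the last row; both are my assumptions for the example, and the function names are made up. The real vicodec code has more to take care of than this.

```python
def round_up(value, multiple=8):
    """Round value up to the next multiple (rounding up is an assumption)."""
    return -(-value // multiple) * multiple


def pad_plane(plane, width, height, multiple=8):
    """Pad one plane to the coded dimensions by repeating edge pixels.

    Returns (padded_bytes, coded_width, coded_height).
    """
    coded_w = round_up(width, multiple)
    coded_h = round_up(height, multiple)
    rows = [plane[r * width:(r + 1) * width] for r in range(height)]
    # Extend each row to the right with copies of its last pixel.
    rows = [row + row[-1:] * (coded_w - width) for row in rows]
    # Extend the plane downwards with copies of the last row.
    rows += [rows[-1]] * (coded_h - height)
    return b"".join(rows), coded_w, coded_h


def crop_plane(padded, coded_w, width, height):
    """Crop a padded plane back to the visible width x height."""
    return b"".join(
        padded[r * coded_w:r * coded_w + width] for r in range(height)
    )


print(round_up(1440), round_up(900))  # 1440 904
```

Cropping the padded plane with the original width and height gives back exactly the original bytes, which is what the crop rectangle is for.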

The second task is to add support for the decoder to read the video dimensions from the header of the compressed video.
In the compressed video, each compressed frame starts with a header. The header has some information such as the dimensions of the frame and the colorspace. So the requirement is that userspace doesn't have to know the video capture format/dimensions or negotiate them with the driver in advance. The format and dimensions are decided later, when the driver starts to receive the compressed data with the header. Then there is a sequence called "source change event" where the driver informs userspace about the dimensions, and userspace should restart the streaming.
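
Here is a minimal sketch of that flow: the decoder learns the dimensions from the frame header and raises a "source change" when they differ from what it knew before. The header layout used here (a 4-byte magic followed by big-endian width and height) is hypothetical and is not the real fwht header; the class and the event tuple are made up for illustration as well.

```python
import struct

# Hypothetical frame header: 4-byte magic, then big-endian width and height.
# This is NOT the real fwht header layout, only a stand-in for illustration.
HEADER = struct.Struct(">4sII")
MAGIC = b"FRM0"


class ToyDecoder:
    def __init__(self):
        self.dimensions = None  # unknown until the first header arrives
        self.events = []        # stand-in for events delivered to userspace

    def feed(self, data):
        """Read the header of one compressed frame.

        Returns True if the frame matches the current dimensions and could be
        decoded right away, False if a source change was signalled instead.
        """
        magic, width, height = HEADER.unpack_from(data)
        if magic != MAGIC:
            raise ValueError("bad frame header")
        if (width, height) != self.dimensions:
            self.dimensions = (width, height)
            self.events.append(("source_change", width, height))
            return False  # userspace has to restart streaming first
        return True


decoder = ToyDecoder()
frame = HEADER.pack(MAGIC, 1440, 900) + b"compressed payload"
decoder.feed(frame)      # first frame: dimensions become known, event raised
print(decoder.events)
```

The same path also covers a resolution change in the middle of the stream, since any header with different dimensions raises the event again.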

This is quite complicated to implement, since in addition to the correct behavior that the API expects from userspace, the driver should be prepared for any incorrect behavior and avoid crashes, memory leaks etc. There are all kinds of scenarios that need to be considered.

Links to the APIs:
decoder API

encoder API

Thursday, December 20, 2018

Outreachy - end of week 2 report

I was working on my first task - adding support for image dimensions that are not a multiple of 8.
At first I thought it shouldn't take more than a few days, but it turned out to be more complicated than I thought. It took me some time to understand the exact requirement and how the API should work.
Then, after I more or less understood what I should do, I had a lot of bugs related to implementing the padding.
The task involved adding code to the kernel for both the encoder and the decoder and then updating the userspace to use the new API. Both userspace and kernel space had changes on the encoder side and the decoder side. This made it a bit hard to debug, since when the output video looked bad, the bug could be hidden in any of the 4 components. So tracking down bugs took a while.
For testing I used a fascinating video showing jellyfish moving in the water.
It can be downloaded here.
And here are some screenshots.

I wrote some utilities to help me debug. One utility gives the value of a pixel in a yuv420 video for a given tuple (frame, plane, row, column). The other utility, which helped a lot, separates a yuv420 video into 3 separate videos - one for each plane. This is very helpful for understanding which of the 3 planes was corrupted.
Here is the code of the utilities: first and second (both compiled with gcc without any special flags).
Here are examples of how to use them:

dafna@ubuntu:~/out2$ ./get_pix_in_yuv420 images/jelly-1920-1080.YU12 1920 1080 b 5 4 1000
the value in plane 'b', frame 5, raw 4 col 1000 is 134 (0x86)

dafna@ubuntu:~/out2$ ./split_yuv420_planes images/jelly-1920-1080.YU12 1920 1080 y.grey u.grey v.grey
reading frame 0
dafna@ubuntu:~/out2$ ffplay -v info -f rawvideo -pixel_format gray -video_size 1920x1080 y.grey

Apparently the word "gray" can be spelled with either "e" or "a", depending on whether you come from the UK or the USA.
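
For readers who want to see the arithmetic behind such utilities, here is a short sketch in Python. It is not the code of the utilities linked above. It assumes the planar YU12 layout (a full-size Y plane followed by quarter-size U and V planes), even dimensions, and my own plane names 'y', 'u' and 'v'.

```python
def frame_size(width, height):
    """Bytes per frame: one Y plane plus two quarter-size chroma planes."""
    return width * height + 2 * (width // 2) * (height // 2)


def plane_geometry(width, height, plane):
    """Return (offset inside the frame, plane width, plane height)."""
    cw, ch = width // 2, height // 2
    if plane == "y":
        return 0, width, height
    if plane == "u":
        return width * height, cw, ch
    if plane == "v":
        return width * height + cw * ch, cw, ch
    raise ValueError("plane must be 'y', 'u' or 'v'")


def get_pixel(video, width, height, plane, frame, row, col):
    """Value of one sample, addressed by (frame, plane, row, column)."""
    offset, pw, ph = plane_geometry(width, height, plane)
    if not (0 <= row < ph and 0 <= col < pw):
        raise IndexError("pixel outside the plane")
    return video[frame * frame_size(width, height) + offset + row * pw + col]


def split_planes(video, width, height):
    """Split a yuv420 video into three streams, one per plane."""
    size = frame_size(width, height)
    out = {"y": bytearray(), "u": bytearray(), "v": bytearray()}
    # Trailing bytes that do not make up a whole frame are ignored.
    for start in range(0, len(video) - size + 1, size):
        for plane in out:
            offset, pw, ph = plane_geometry(width, height, plane)
            out[plane] += video[start + offset:start + offset + pw * ph]
    return {name: bytes(data) for name, data in out.items()}


# Two tiny 4x2 frames whose byte values equal their position in the file.
video = bytes(range(2 * frame_size(4, 2)))
print(get_pixel(video, 4, 2, "y", 0, 1, 2))  # 6
```

Each of the three outputs of split_planes can be viewed as a greyscale video, using the full size for Y and half the width and height for U and V.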

Those utilities, combined with a pile of debug prints in the kernel and v4l-utils, helped me understand where the bugs reside.

Here are some unpleasant surprises I got during my work:

After fixing the code I sent two patches, one for the kernel and one for v4l-utils. Now I am in the phase of resending the patches after review.

Monday, December 3, 2018

Starting the Outreachy internship

I am starting my Outreachy internship officially tomorrow.
During my internship I will demonstrate the stateful and stateless APIs for implementing codecs in the media subsystem of the Linux kernel.
There are 2 new APIs that media drivers such as codecs, webcams and so on should implement.
In order to test and demonstrate the APIs, two testing drivers were implemented: vivid and vicodec.
Also, a userspace library, v4l-utils, implements the API from the userspace side.
I will mainly add API functionality to the vicodec codec.
This blog is a report of my internship. It will be mainly technical.

My mentors are Helen Koike and Hans Verkuil. We have already chatted and had a video meeting, and they were both really great and helpful, thanks:)

I am pretty excited and also kind of worried about not accomplishing enough or not doing things well enough.
Outreachy is a full-time commitment of 3 months, working from home.
Working from home is both fun and challenging - I have to decide on my own schedule, resist tempting time wasters such as Facebook etc. and be my own boss, for better and worse.
Another issue with working from home is the social isolation - no one to have lunch with, no one for a coffee break... Just me and the neighbor's passive-aggressive cat.

During the contribution phase I did a lot of preparations:

- Prepping cleanup patches for staging drivers in the kernel: this is a somewhat tedious procedure that teaches how to work with git and send patches. It is definitely a very good preparation for understanding how the kernel community works and how to work with it.
The patches are sent to the outreachy-kernel group: https://groups.google.com/forum/#!forum/outreachy-kernel
They are reviewed by Julia Lawall and Vaishali Thakkar, who give comments on the patches and are very responsive and willing to help. Thank you:)

- I bought a new computer: after 6 years with an old Dell Inspiron, I decided to buy a new laptop, so I bought a ThinkPad P52s. Preparing the new computer took quite some time.
I installed Xubuntu, which is Ubuntu but nicer :) . Then I figured out that some NVIDIA drivers were missing and that it used a driver called "nouveau" that kept crashing. So I had to reboot my laptop each time it crashed until I figured out how to remove it.

- Then, after all the installations and the configurations that I like, I installed a VM. I started by installing the compiled kernels on my physical machine, but this is not a good idea: the whole computer becomes a guinea pig and has to be rebooted on an Oops (when kernel modules crash). So I installed a regular Ubuntu as a VMware VM, and I compiled and installed my kernel there.

- One more thing that occupied my mind is the IDE - I used to program with emacs, but I never bothered to learn all the emacs extensions, so I never really used it as an IDE. The kernel code is huge and it is hard to work with without an IDE. I was wondering whether I should use one of the many IDEs out there or vim. I decided to move to vim, as it seems to be more widely used by kernel developers. Learning vim is a project by itself. It is still a mystery how it became so popular. I will probably post some vim tips in this blog as well.

By the way, English is a second language for me, so I am a bit limited with vocabulary and might have stupid grammar mistakes, sorry.

Thanks a lot to the Outreachy organizers and organization. Bringing more people from different cultures and genders from around the world to FOSS is really great.