A comprehensive guide to parallel video decoding

As promised, today we’ll talk about video decoding. We will review the most important operations that a decoder has to perform, and for each one see what kind of speed boost we can expect from shader-based video decoding. Because video decoding is a complex process and one blog post can hardly be thorough, I’ll provide related links for each chapter, in case you wish to start your own research on a particular subject ;-)

Let’s begin with a quick overview of the most important operations of the VP8 video decoding process:

My GSoC project status

Hi!
It’s been a long time since I last blogged about my Google Summer of Code project, and I’m sorry about that. But today I have some good news!
I have successfully set up a VP8 decoder inside Mesa, built from Google’s libvpx. Let’s walk through the different parts of this new decoding stack.

We need a VP8 video to begin with. As the current decoder is based on libvpx, the official VP8 Codec SDK, all of the VP8 functionalities are supported, so every webm file can be used. That’s an important point, and it’s one of the reasons why I initially decided to start from an existing decoder rather than write a new one from scratch. That way I can progressively replace key computations with shaders while keeping a solid base known to work in every situation.


To link together a video player and a video decoder through the VDPAU API, the first component used is libvdpau. Formerly part of the NVIDIA video driver, it is now a standalone package that lets third parties implement the API in a video player or a video decoder.
This library has been patched to support the VP8 codec and to handle a VdpPictureInfoVP8 structure. That structure is filled by a VDPAU video player with the information contained in each VP8 frame header, and then passed to a VDPAU decoder along with the bitstream buffer.
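To give an idea of what “information contained in each VP8 frame header” means, here is a minimal sketch that parses the first bytes of a VP8 frame as laid out in RFC 6386 (the 3-byte frame tag, plus the start code and dimensions on key frames). The struct and function names are hypothetical; this is not the real VdpPictureInfoVP8 layout, which carries more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a few header fields; NOT the real
 * VdpPictureInfoVP8 layout. */
typedef struct {
    int      key_frame;            /* 1 for key frames, 0 for interframes */
    int      version;              /* 3-bit bitstream version */
    int      show_frame;           /* 1 if the frame is meant to be displayed */
    uint32_t first_partition_size; /* size in bytes of the first partition */
    uint16_t width;                /* only set on key frames, else 0 */
    uint16_t height;               /* only set on key frames, else 0 */
} vp8_header_sketch;

/* Parses the uncompressed part of a VP8 frame header.
 * Returns 0 on success, -1 if the buffer is truncated or malformed. */
int vp8_parse_header_sketch(const uint8_t *buf, size_t len,
                            vp8_header_sketch *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 3)
        return -1;

    /* The frame tag is a 24-bit little-endian value. */
    uint32_t tag = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                   ((uint32_t)buf[2] << 16);

    out->key_frame            = !(tag & 1);      /* bit is 0 on key frames */
    out->version              = (tag >> 1) & 0x7;
    out->show_frame           = (tag >> 4) & 0x1;
    out->first_partition_size = (tag >> 5) & 0x7FFFF;
    out->width  = 0;
    out->height = 0;

    if (out->key_frame) {
        /* Key frames continue with a 3-byte start code, then two
         * 16-bit fields: 14 bits of size plus 2 bits of scaling. */
        if (len < 10)
            return -1;
        if (buf[3] != 0x9d || buf[4] != 0x01 || buf[5] != 0x2a)
            return -1;
        out->width  = (uint16_t)((buf[6] | (buf[7] << 8)) & 0x3FFF);
        out->height = (uint16_t)((buf[8] | (buf[9] << 8)) & 0x3FFF);
    }
    return 0;
}
```

A player would run something like this on each frame, copy the results into the picture info structure, and hand it to the decoder together with the untouched bitstream buffer.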

mplayer/mplayer2 and ffmpeg/libav

Then, we need a media player. I have patched mplayer (and its recent fork mplayer2) together with ffmpeg (and its recent fork libav) to support VP8 decoding through VDPAU. The patches for mplayer and mplayer2 are almost the same, but the underlying VDPAU implementations differ, with an advantage to mplayer2. The ffmpeg and libav patches are identical.

To launch mplayer, the following arguments must be used:

mplayer -vo vdpau -vc ffvp8vdpau myvideofile.webm

  • -vo vdpau tells mplayer to initialize a VDPAU video output
  • -vc ffvp8vdpau tells mplayer to use the ffvp8vdpau video codec, a slightly modified version of the native VP8 codec of ffmpeg/libav

So basically, mplayer uses libvdpau to find an available decoder, initializes it, and creates a surface that is going to be filled by the decoder; then, after each decoded frame, it draws the surface on screen.

As mplayer/mplayer2 relies on ffmpeg/libav to do the frame decoding itself, a hook has to be added to the frame decoding process to bypass the regular decoder and send the data to the VDPAU decoder.
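The idea of such a hook can be sketched in a few lines of C. This is only an illustration with hypothetical names; it does not reproduce the actual ffmpeg/libav patch or its internal API. The codec context carries an optional callback: when it is set, the frame is handed over untouched instead of going through the software path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: receives the raw bitstream of one frame. */
typedef int (*hw_decode_fn)(const uint8_t *bitstream, size_t size,
                            void *opaque);

/* Hypothetical codec context; hw_decode == NULL means "software only". */
typedef struct {
    hw_decode_fn hw_decode;
    void        *hw_opaque;
} codec_ctx_sketch;

enum { PATH_SOFTWARE = 0, PATH_HARDWARE = 1 };

/* Placeholder for the regular CPU decoder. */
static int software_decode_sketch(const uint8_t *bitstream, size_t size)
{
    (void)bitstream;
    (void)size;
    return 0;
}

/* Example callback: just counts the frames it was handed. */
int counting_hw_decoder_sketch(const uint8_t *bitstream, size_t size,
                               void *opaque)
{
    (void)bitstream;
    (void)size;
    ++*(int *)opaque;
    return 0;
}

/* The hook itself: bypass the regular decoder when a callback is set.
 * path_taken reports which branch ran, for illustration only. */
int decode_frame_sketch(codec_ctx_sketch *ctx, const uint8_t *bitstream,
                        size_t size, int *path_taken)
{
    assert(ctx != NULL && path_taken != NULL);
    if (ctx->hw_decode != NULL) {
        *path_taken = PATH_HARDWARE;
        return ctx->hw_decode(bitstream, size, ctx->hw_opaque);
    }
    *path_taken = PATH_SOFTWARE;
    return software_decode_sketch(bitstream, size);
}
```

The nice property of this shape is that the player does not need to know which path is taken: it always calls the same entry point.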


The “big part” of this GSoC is obviously the VP8 decoder implementation living inside Mesa. This piece of code is going to be identified as a “device” by libvdpau. While most of the time a device is a hardware driver, the Mesa decoder just registers itself as an available video decoder. We know that this device is going to be called by ffmpeg/libav for every frame to decode, with these arguments:

  1. The content of the frame header (with various information like frame size, type, …)
  2. The bitstream buffer, which contains the compressed data representing exactly one frame
  3. Up to 3 “reference frames”, which are already decoded frames used for motion compensation
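The three reference frames in VP8 are the “last”, “golden” and “alternate reference” frames. Here is a minimal sketch of how a player could keep track of them between calls; the type and function names are hypothetical, and the real bitstream also allows copying one buffer into another, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for an already decoded picture. */
typedef struct {
    int id;
} frame_sketch;

/* The (up to) three references handed to the decoder for each frame. */
typedef struct {
    const frame_sketch *last;
    const frame_sketch *golden;
    const frame_sketch *altref;
} vp8_refs_sketch;

/* Per-frame flags, as they would be read from the frame header. */
typedef struct {
    int key_frame;
    int refresh_last;
    int refresh_golden;
    int refresh_altref;
} vp8_refresh_flags_sketch;

/* Number of references currently available (0 before the first key frame). */
int vp8_count_refs_sketch(const vp8_refs_sketch *refs)
{
    assert(refs != NULL);
    return (refs->last != NULL) + (refs->golden != NULL) +
           (refs->altref != NULL);
}

/* After a frame is decoded, decide which reference slots it replaces.
 * A key frame replaces all three. */
void vp8_update_refs_sketch(vp8_refs_sketch *refs,
                            const frame_sketch *decoded,
                            const vp8_refresh_flags_sketch *flags)
{
    assert(refs != NULL && decoded != NULL && flags != NULL);
    if (flags->key_frame || flags->refresh_last)
        refs->last = decoded;
    if (flags->key_frame || flags->refresh_golden)
        refs->golden = decoded;
    if (flags->key_frame || flags->refresh_altref)
        refs->altref = decoded;
}
```

This is why the list says “up to” 3: the first key frame is decoded with no reference at all, and afterwards several slots may point to the same picture.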

With that, the decoder can do its job and decode frames. When a frame is ready to be drawn on screen, the decoder must load that frame into the surface provided by the VDPAU video output created by mplayer.
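Loading a decoded frame into a surface mostly comes down to copying three planes (Y, Cb, Cr) whose rows may be padded differently on each side. Below is a minimal sketch of that copy, assuming 4:2:0 subsampling as used by VP8; the function names are hypothetical and this is not the actual state tracker code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies one plane row by row. "pitch" is the distance in bytes between
 * the starts of two consecutive rows, which can be larger than the
 * visible width when rows are padded. */
void copy_plane_sketch(uint8_t *dst, size_t dst_pitch,
                       const uint8_t *src, size_t src_pitch,
                       size_t width, size_t height)
{
    assert(dst != NULL && src != NULL);
    assert(width <= dst_pitch && width <= src_pitch);
    for (size_t y = 0; y < height; y++)
        memcpy(dst + y * dst_pitch, src + y * src_pitch, width);
}

/* With 4:2:0 subsampling, each chroma plane has half the luma
 * dimensions, rounded up for odd sizes. */
size_t chroma_dim_sketch(size_t luma_dim)
{
    return (luma_dim + 1) / 2;
}
```

For a 640x360 frame, the luma plane is copied with width 640 and height 360, and each chroma plane with width 320 and height 180.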

  • The first step was to create a new decoder stub, based on the existing g3dvl interface (where all the Gallium3D video decoding work takes place), and add it to the VDPAU state tracker, to advertise VP8 decoding capabilities.
  • The second step was to plug a working decoder into these new function stubs.

I used the official libvpx (close to version 0.9.7) and stripped out several functionalities. The goal, of course, is to have a lightweight standalone decoder. Examples of removed code include the VP8 encoder, multi-threading (which was actually counterproductive in the decoder), the libvpx API, CPU run-time detection, frame scaling, a custom memory manager, etc.

And it worked !

This new decoder can be used by the patched mplayer, through the VDPAU Gallium3D state tracker, to decode VP8 video. Seeking, pausing, and everything else work as intended.


Let’s see how our different components interact:


So far only mplayer can use the VP8 decoder inside Mesa, but other implementations can be provided for other popular media players like VLC and, who knows, the Flash media player?! Similarly, the patched version of mplayer can only use the Mesa VP8 decoder, but only because there is no other VDPAU VP8 implementation in the wild right now.

What’s left

The work on libvdpau and mplayer/ffmpeg is done and only needs to be reviewed.

Right now the VP8 decoder is a pure C implementation running on the CPU (only color space conversion is done by the GPU, tested with r600g), and it is about 3 times slower than the regular libvpx (mostly because all CPU SIMD code has been removed). Ultimately, different parts of the decoder are going to be rewritten to fit GPU decoding, and I don’t intend to port code from libvpx anymore.

Next week, a more technical post where we will see which decoding operations eat the most CPU time, and what we are going to do about it :-)

If you have any questions, go for it!

