Monday, October 01, 2007

multithreading in ffmpeg and the mpstat program

My new server, the Dell SC1430, has two quad-core Xeon processors, so I have a full eight cores available for processing tasks. Having recently completed a fresh install of 64-bit FC6 on this system, I've been focused on Cinelerra performance optimization. Along the way, I noticed that when I ran ffmpeg from the command line, only one of my processors was being used. I had thought that FFMPEG was multithreaded by default, so I was perplexed.

Chasing My Tail
Thinking that a compile option needed to be specified, I bounced a few ideas off my friend Graham, and at the time we both suspected the same thing. I googled FFMPEG_CFLAGS, ffmpeg smp and a host of other terms while sniffing down what turned out to be the wrong track. Taking a step back, I figured I'd try to find information from the source, rather than looking for just a command line option solution. I found on the FFMPEG site (http://ffmpeg.mplayerhq.hu/changelog.html) that as of version 0.4.9-pre1, FFMPEG supports multithreading/SMP for the following:
- multithreaded/SMP motion estimation
- multithreaded/SMP encoding for MPEG-1/MPEG-2/MPEG-4/H.263
- multithreaded/SMP decoding for MPEG-2

OK. So it supports multithreading, but not for all codecs. The absence of multithreading for JPEG/MJPEG was a bummer. And when I ran the following command to convert a DVD to a smaller-format MPEG:
ffmpeg -i testdvd.mpg -target svcd output.mpg

I saw that only one of my processors was being utilized. Let's investigate this further.

mpstat to the rescue!
A new find for me is mpstat, a program available in Red Hat/Fedora (it ships in the sysstat package) that lets you view the CPU utilization of each processor in your system. Nice! From its output, I saw that only one processor out of eight was being utilized:
[root@localhost ~]# mpstat -P 0 -P 1 -P 2 -P 3 -P 4 -P 5 -P 6 -P 7 4
09:51:21 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
09:51:23 AM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      6.00
09:51:23 AM    1   89.50    0.00    1.50    1.00    0.00    0.00    0.00    7.50     17.00
09:51:23 AM    2    7.00    0.00    0.50    0.00    0.00    0.00    0.00   93.00      3.00
09:51:23 AM    3    8.00    0.00    0.00    0.00    0.00    0.00    0.00   92.00      0.00
09:51:23 AM    4    5.50    0.00    0.00    0.00    0.00    0.00    0.00   94.50    250.50
09:51:23 AM    5    3.00    0.00    0.00    0.00    0.00    0.00    0.00   97.00      0.00
09:51:23 AM    6    5.50    0.00    0.50    1.00    0.00    0.00    0.00   93.00      0.00
09:51:23 AM    7    9.00    0.00    0.00    0.00    0.00    0.00    0.00   91.00      0.00
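
As an aside, listing every CPU with its own -P flag is not required; the mpstat in the sysstat package also accepts -P ALL. To pick the busy CPUs out of output like the above without eyeballing it, a small awk filter can help. This is only a sketch: busy_cpus is a made-up helper name, the 50% idle threshold is an arbitrary choice, and it assumes the output has a header row with CPU and %idle columns, as in the sample.

```shell
# busy_cpus [IDLE_LIMIT]
# Read mpstat-style output on stdin and print the number of each CPU whose
# %idle is below IDLE_LIMIT (default 50).
# Hypothetical helper, not part of mpstat or sysstat.
busy_cpus() {
    awk -v limit="${1:-50}" '
        # Skip the trailing "Average:" section so CPUs are not reported twice.
        /^Average:/ { next }
        # Header row: remember which fields hold the CPU number and %idle.
        /%idle/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU")   cpu_col  = i
                if ($i == "%idle") idle_col = i
            }
            next
        }
        # Data row: numeric CPU id (skips the "all" summary row) and low idle time.
        cpu_col && idle_col && $cpu_col ~ /^[0-9]+$/ && ($idle_col + 0) < (limit + 0) {
            print $cpu_col
        }
    '
}

# Possible usage, assuming mpstat is installed (one 4-second sample of all CPUs):
#   mpstat -P ALL 4 1 | busy_cpus 50
```

Fed the sample output above, this would report only CPU 1, which is the same conclusion I reached by reading the table.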


So something is wrong. As I was out of ideas, I finally decided to ask the folks who should know, the ffmpeg-users mailing list:
http://lists.mplayerhq.hu/mailman/listinfo/ffmpeg-user

I soon received an answer from Lukas: the "-threads" option!

I tried the "-threads" parameter with various settings (1, 2 and 8 threads). As I have eight processors, the limit was eight threads. If I asked for more threads than available CPUs, I saw this error at the bottom of the FFMPEG output:
[mpeg2video @ 0x3bfd518850]too many threads
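
One way to avoid tripping over this error in a script is to cap the requested thread count before handing it to ffmpeg. The sketch below assumes, as I observed on this box, that the usable limit equals the number of CPUs; the real limit may well depend on the codec and the FFMPEG version. clamp_threads is a made-up helper name, and getconf _NPROCESSORS_ONLN is used to count the online processors.

```shell
# clamp_threads REQUESTED [AVAILABLE]
# Print REQUESTED, capped to AVAILABLE (default: number of online CPUs)
# and never below 1. Hypothetical helper; the "limit = CPU count" rule is
# an assumption taken from the behaviour observed in this post.
clamp_threads() {
    requested=$1
    available=${2:-$(getconf _NPROCESSORS_ONLN)}
    if [ "$requested" -gt "$available" ]; then
        requested=$available
    fi
    if [ "$requested" -lt 1 ]; then
        requested=1
    fi
    echo "$requested"
}

# Possible usage:
#   ffmpeg -i testdvd.mpg -threads "$(clamp_threads 16)" -target svcd output.mpg
```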

So I then ran a couple of interesting tests.

TEST 1
Convert QT mov file to MPEG2 DVD

Syntax:
ffmpeg -i test.mov -threads 8 -target dvd output.mpg

In this test, the QuickTime source file uses MJPEG video compression, which FFMPEG does not support for multithreading. The MPEG-2 output, however, is supported.

From the output of top, I did see that process utilization increased slightly each time I increased the number of threads:
1 thread: 12.5% cpu used
2 threads: 14.7% cpu used
8 threads: 16.3% cpu used
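
Those percentages make more sense once converted into cores' worth of work. Assuming top was reporting the process as a share of all eight cores combined (my reading of the numbers, not something I verified), 12.5% is exactly one core running flat out. A quick awk conversion, with cores_used as a made-up helper name:

```shell
# cores_used PERCENT NCPUS
# Convert a percentage of total machine capacity into equivalent busy cores.
# Assumes PERCENT is relative to all NCPUS cores combined.
cores_used() {
    awk -v pct="$1" -v n="$2" 'BEGIN { printf "%.2f\n", pct * n / 100 }'
}

# The three top readings from this test, on an eight-core machine.
for pct in 12.5 14.7 16.3; do
    echo "$pct% of 8 cores = $(cores_used "$pct" 8) cores"
done
```

Under that assumption the three runs used about 1.00, 1.18 and 1.30 cores, so even with eight threads requested the job was barely getting past a single core.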


However, when I looked at the output of mpstat, it showed the original behavior, whereby one processor was getting fed the entire task:
09:51:29 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
09:51:31 AM    0    7.00    0.00    0.00    0.00    0.00    0.00    0.00   93.00      6.00
09:51:31 AM    1   95.00    0.00    1.00    1.00    0.00    0.00    0.00    3.00     10.00
09:51:31 AM    2    5.00    0.00    0.00    0.00    0.00    0.00    0.00   95.00      3.00
09:51:31 AM    3    7.00    0.00    0.50    0.00    0.00    0.00    0.00   92.50      0.00
09:51:31 AM    4    5.00    0.00    0.00    0.00    0.00    0.00    0.00   95.00      0.00
09:51:31 AM    5    3.50    0.00    0.00    0.00    0.00    0.00    0.00   97.00    250.50
09:51:31 AM    6    5.00    0.00    0.00    0.00    0.00    0.00    0.00   95.00      0.00
09:51:31 AM    7    3.50    0.00    0.00    0.00    0.00    0.00    0.00   96.50      0.00


Hmmm. On to test two:

TEST 2
Convert a DVD of high quality to smaller resolution mpeg2video

Syntax:
ffmpeg -i testdvd.mpg -threads 8 -target svcd output.mpg

In this test, both the source and destination codecs are supported for multithreading in FFMPEG. Now this is where the testing got fun. The output from mpstat was somewhat different this time:
10:00:28 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
10:00:32 AM    0   22.50    0.00    0.50    0.00    0.00    0.00    0.00   77.00      5.00
10:00:32 AM    1   17.50    0.00    0.00    0.00    0.00    0.00    0.00   82.50      3.00
10:00:32 AM    2   23.00    0.00    1.50    0.00    0.00    0.00    0.00   75.75      0.00
10:00:32 AM    3   12.00    0.00    0.25    0.00    0.00    0.00    0.00   88.00    250.25
10:00:32 AM    4   29.00    0.00    1.25    0.00    0.00    0.00    0.00   70.25      0.00
10:00:32 AM    5   12.00    0.00    0.25    2.25    0.00    0.00    0.00   85.75      0.00
10:00:32 AM    6   71.25    0.00    3.25    4.00    0.00    0.00    0.00   22.00     17.00
10:00:32 AM    7   18.75    0.00    0.25    0.00    0.00    0.00    0.00   81.00      0.00


Sweet! Notice that all my processors are being utilized. Best part of all, my resulting render fps went from 48fps to 150fps. Awesome!
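
To put that jump in perspective, the speedup and the per-thread efficiency are easy to work out. The sketch below just does the arithmetic on the two frame rates above; speedup is a made-up helper name, and "efficiency" here means speedup divided by thread count, which is my own framing rather than anything FFMPEG reports.

```shell
# speedup BASE_FPS NEW_FPS THREADS
# Print the speedup factor and the efficiency per thread as a percentage.
speedup() {
    awk -v base="$1" -v new="$2" -v t="$3" 'BEGIN {
        s = new / base
        printf "speedup %.1fx, efficiency %.0f%%\n", s, 100 * s / t
    }'
}

# 48fps single-threaded versus 150fps with -threads 8.
speedup 48 150 8
```

That comes to roughly a 3.1x speedup, or about 39% efficiency across eight threads, which fits the mpstat output above where most of the cores are still well under half busy.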

The Key Thing to Remember
So the key is that multithreading using the "-threads" option in FFMPEG only works when BOTH the source and destination files are of the supported types:
- multithreaded/SMP motion estimation
- multithreaded/SMP encoding for MPEG-1/MPEG-2/MPEG-4/H.263
- multithreaded/SMP decoding for MPEG-2

Remember this, Grasshopper.

And I am so very happy that I don't have to recompile.

Thanks to Graham Evans and the ffmpeg-users mailing list!
The Mule

related posts
http://crazedmuleproductions.blogspot.com/2010/01/batch-render-redux.html
/2010/01/compile-times-performance-improved.html

2 comments:

MeTheSheeple said...

Thanks for this very helpful post! It popped up early in Google, and has let me nearly double my performance.

Would it be rude to mention that you get 150fps on really big iron, and I'm getting 127fps on a $429 eMachines box? =)

Cacasodo said...

Sheeple,
Glad the post was helpful.

Interesting that your inexpensive box gets an excellent framerate. I assume that's with a DVD resolution vid?

I might speculate that the relatively low speed of my CPUs (1.6GHz) may be the constraint.

cm

ps - love the doggie icon. We just lost our beloved Dalmatian who was fifteen.
:(