Showing posts with label performance. Show all posts

Wednesday, January 27, 2010

batch render redux

As a follow-up to this original post, I thought I'd give a few hints on working with the batch render function in Cinelerra. I've been using batch render to prepare for my new Fedora 12, x86-64 system. Specifically, I am using it to profile the capacity and speed of my current Fedora 10 system. After I install Fedora 12, I can use the Fedora 10 performance baseline to tell how much faster (or slower) the new system is than the old one.

Selecting a Cinelerra Project
The first thing I did was use a short (~1 min) clip of a project that I had been working on:


I then used this project to output a short clip as the basis for the batch render:


Batch render is accessed by typing Shift-B within Cinelerra:


You can save a batch render list to XML format. This format is similar to the XML of the edit decision list (EDL) that Cinelerra stores when a project is saved. You can then load that XML to use later.

Batch Render Gotchas
I used a previous batch list to render out that short clip to about twenty different file formats. The batch render blew up a few times, so I had to get over a few obstacles:
1) the batch list I had saved months ago was out of date and the directory paths in the XML were incorrect. I fixed them by opening the file in the vi text editor and doing a wholesale conversion with a sed-style substitute command:
:1,$ s/videos\/oldpath/videos\/newpath/g

1,$ says to apply the command to every line, from the first to the last. s means "substitute". The matching expression looks for the string "videos/oldpath" and replaces it with "videos/newpath". g means "replace every occurrence on a line, not just the first".
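
The same substitution can be scripted outside the editor, which may help if several saved batch lists need the same fix. Here is a minimal Python sketch. The function name and the sample line are made up for illustration and are not Cinelerra's actual batch list layout.

```python
def rewrite_paths(text, old, new):
    """Replace every occurrence of old with new, like :1,$ s/old/new/g in vi."""
    return text.replace(old, new)


# Hypothetical line standing in for a path stored in a saved batch list.
sample = '<JOB PATH="/mnt/videos/oldpath/clip.mov">'
print(rewrite_paths(sample, "videos/oldpath", "videos/newpath"))
```

To use it on a real file, read the file into a string, pass it through the function and write the result to a new file, so the original batch list stays around as a backup.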

2) my project had an improperly formatted Quicktime video track on the timeline. I kept getting Quicktime errors when the batch ran, which would crash Cinelerra. Once I removed that errant track, the batch render worked correctly.

3) once I got the batch running, my disk would fill up quickly as I was rendering to a few uncompressed formats. I deleted some very large, extraneous files and the batch was able to complete.

So you can see that you need to prepare both the base project and your system if you expect your batch render to run properly.
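
Since a full disk was one of the obstacles, a quick free-space check before starting the batch could save a failed run. This is only a sketch: the 20 GB figure is an arbitrary example, and how much room you need depends on which formats are in your batch list.

```python
import shutil


def has_room(path, needed_bytes):
    """Return True if the filesystem holding path has at least needed_bytes free."""
    return shutil.disk_usage(path).free >= needed_bytes


# Example: ask for 20 GB of headroom in the current directory before rendering.
print(has_room(".", 20 * 1000**3))
```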

Command Line Batching
Once you get your batch working, another nice feature you can take advantage of is Cinelerra's ability to run the batch from the command line. I just kicked off a batch job at the command line and can confirm that it works on a box with X installed. Nicely, it also gives you an ETA:

[sodo@tbear ~]$ cinelerra -r /mnt/videos/cinelerra/batch/batchList.xml
/mnt/videos/cinelerra/batch/renderCompatibility.xml
Cinelerra 2.1CV (C) 2006 Heroine Virtual Ltd.
Compiled on Sat Jan 23 01:32:17 EST 2010

Cinelerra is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. There is absolutely no warranty for Cinelerra.
Render::run: /mnt/videos/cinelerra/batch/renderCompatibility.xml
Render::render: starting render farm
[mpeg4 @ 0x7ff313d3ba00]warning: first frame is no keyframe
[mpeg4 @ 0x7ff313d3ba00]concealing 1 DC, 1 AC, 1 MV errors
12% ETA: 0:07:45
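
If you want to log or react to that progress line from a wrapper script, it can be picked apart with a regular expression. The sketch below assumes the line always looks like the "12% ETA: 0:07:45" shown above; other Cinelerra builds may print it differently.

```python
import re


def parse_progress(line):
    """Parse a line like '12% ETA: 0:07:45' into (percent, eta_seconds)."""
    match = re.search(r"(\d+)%\s+ETA:\s+(\d+):(\d+):(\d+)", line)
    if match is None:
        return None
    percent, hours, minutes, seconds = (int(g) for g in match.groups())
    return percent, hours * 3600 + minutes * 60 + seconds


print(parse_progress("12% ETA: 0:07:45"))  # (12, 465)
```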



Batch Results
So my goal was to profile my current system's capacity by capturing the system's cpu and disk utilization while the files rendered. To really see what is going on though, I thought it might be nice to have a graphical representation of the render as it occurs over time. So I spent some time writing a gnuplot script to plot the system utilization as the files rendered:

A bit of explanation may be required. I captured the output of vmstat to a file. vmstat reports CPU load and I/O wait (a rough stand-in for disk utilization). While capturing that output, I kicked off the batch render.
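
For anyone who wants to turn a vmstat capture into numbers for plotting, here is a rough Python sketch. It is not the gnuplot script used for the graph. It assumes the usual vmstat layout with a header row naming the us, sy and wa columns, and the sample values are invented.

```python
def parse_vmstat(text):
    """Return a list of (cpu_busy_percent, io_wait_percent), one per sample line."""
    columns = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "us" in fields and "wa" in fields:
            columns = fields  # header row naming each column
            continue
        if columns is None or len(fields) != len(columns) or not fields[0].isdigit():
            continue  # skips the "procs ---memory---" banner and stray lines
        row = dict(zip(columns, (int(f) for f in fields)))
        samples.append((row["us"] + row["sy"], row["wa"]))
    return samples


# Invented sample in the usual vmstat layout.
sample = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 812344  90112 954320    0    0    12    40  310  520 35  5 58  2  0
 3  1      0 640120  90200 999100    0    0   150  9200  900 1400 20  6 30 44  0
"""
print(parse_vmstat(sample))  # [(40, 2), (26, 44)]
```

Because the columns are looked up by name from the header row, the parser should keep working if a vmstat version adds or drops a column.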

In the graphic, you can see that different types of renders have different utilization profiles. For example, the mpeg4 renders were generally lower in CPU utilization (red line), while h264 renders used a lot of CPU. Similarly, the uncompressed formats like rgb/rgba/yuv420planar stress out the disk quite a bit (green line). Please excuse the fact that the filenames aren't perfectly lined up with each file's render profile..this was my first effort at graphing render times.

It will be interesting to see how the new Fedora 12 install affects the CPU. Also, I am planning on installing a new hardware RAID set, so I expect those green lines to go to zero (hopefully)!

ciao,
the mule

Sunday, January 24, 2010

Fedora 12, x86-64 upgrade

The time has come again..system upgrade. Ugh.

From Fedora 10 x86-64 to Fedora 12 x86-64
I say "ugh", but I truly am excited as Fedora 12 does have some nice performance improvements (ext4, kernel modesetting, faster boot, rpm) that they've packaged since the Fedora 10 system I'm working with now:
http://fedoraproject.org/wiki/Releases/11/FeatureList
http://fedoraproject.org/wiki/Releases/12/FeatureList

To be clear, I don't do upgrades. I will tar up my /home directory to USB, install the new OS from scratch and then blast my /home directory onto the clean new OS and RAID array.

Thinking Hard
I've spent quite a bit of time planning this upgrade. One of the big things I am doing is to profile the performance of my system before and after the OS and hardware upgrades. Of course, I won't be able to determine whether or not the performance gain is coming from the OS or the new RAID array, but at the end of the day, I simply want to be able to say "my system is now X% faster."
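
One way to arrive at the "X% faster" figure is to time the same render on both systems and compare. The sketch below uses one possible definition (gain in throughput); the 600 and 400 second timings are placeholders, not measurements.

```python
def percent_faster(old_seconds, new_seconds):
    """Percent gain in throughput when a job takes new_seconds instead of old_seconds."""
    return (old_seconds / new_seconds - 1.0) * 100.0


# Placeholder timings: a render that took 600 s before and 400 s after.
print(percent_faster(600, 400))  # 50.0
```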

I will be looking at the performance of the system from the OS, Cinelerra and encoder perspectives.

Learning about Fedora 12
http://fedoraproject.org/wiki/Common_F12_bugs
http://www.scribd.com/doc/24513176/Fedora-12-Installation-Guide
Changes_in_Fedora_for_Desktop_Users

Hardware changes going in
New RAID configuration:
3WARE Pci-e 9650SE RAID card with Battery Backup
four Western Digital 1.5 TB Green SATA 32MB Cache Hard Drives

Virtual Machine Testbed
One of the things that has helped me in the process is using VMware Server to test out Fedora 12. I've caught a couple things right off the bat: as it is a proprietary format, FAAC is not installed with FFmpeg by default. I was able to resolve this through Doran's excellent post here:
http://fozzolog.fozzilinymoo.org/tech/2009/11/recompiling-ffmpeg-for-fedora-12-to-add-faac-support.html

Also, H264 encoder magic has changed a bit. Other than that, my output testing to various formats (MPEG-PS, HDV, DVD, iPod/iPhone) has worked very well.

General prep work
work out bugs with Fedora 12 virtual machine
clean up old F10 system
backup F10 system files via script
backup /home directory via tar to external drive

Installation steps
Install new F12, Developer's edition
Install RPMs via script
Build and install FFmpeg RPM with faac support from nonfree RPM Fusion repo via script
Install favorite programs
Install Cinelerra dependencies
Install Cinelerra

For those with strong constitutions, here's the full project plan:
http://spreadsheets.google.com/ccc?key=0AjSzE_zejuQZdFphck9aQUVBbzZVOWhyOC1CaVFVQmc&hl=en

I'm almost there..most of the planning is done. Now, to execute! I'll let you know how it goes.
The Mule

Reference
http://www.graphics-muse.org/wp/?p=501

Sunday, October 14, 2007

rendering on the dual quad core Dell SC1430

I'm starting to get used to the new rig and what works and what doesn't. Here's a video describing how a render works on the new Dell SC1430, dual quad core xeon box running FC6, 64-bit Cinelerra. The render format (not shown in the video) is the following:
File format: Quicktime for Linux
Audio compression scheme: MPEG-4 audio, 128kbps bitrate, quantization quality 50%
Video compression scheme: H.264, 1000000 fixed bitrate (1Mbps)

I was listening to Aaron Newcomb's SourceShow podcast from LinuxWorld in Ohio while describing the render, so that's the voice you'll hear in the background.


PS - I rendered to Quicktime for Linux in this example, but you can use the
-threads [numberOfCpus]
command line switch to use more than one CPU.

enjoy,
the mule

Monday, October 01, 2007

multithreading in ffmpeg and the mpstat program

My new server, the Dell SC1430, is dual Xeon processor, quad core. Therefore, I have a full eight cores available for processing tasks. As I have recently completed a new install of FC6, 64-bit on this system, I've been focused on Cinelerra performance optimization. As an adjunct, I happened to notice that when I ran command line ffmpeg, only one of my processors was being used. I had thought that FFMPEG was multithreaded by default, so I was perplexed.

Chasing My Tail
Thinking it was a compile option that needed to be specified, I bounced a few ideas off my friend Graham. I googled FFMPEG_CFLAGS, ffmpeg smp and a host of other searches while sniffing down what turned out to be the wrong track. Taking a step back, I figured I'd try to find information from the source, rather than looking for just a command line option solution. I found from the FFMPEG site (http://ffmpeg.mplayerhq.hu/changelog.html) that as of version 0.4.9-pre1, FFMPEG supports multithreading/SMP for the following:
- multithreaded/SMP motion estimation
- multithreaded/SMP encoding for MPEG-1/MPEG-2/MPEG-4/H.263
- multithreaded/SMP decoding for MPEG-2

OK. So it supports multithreading, but not for all codecs. The absence of multithreading for jpeg/mjpeg was a bummer. And when I ran the following command to convert a DVD to a smaller format MPEG:
ffmpeg -i testdvd.mpg -target svcd output.mpg

I saw that only one of my processors was being utilized. Let's investigate this further.

mpstat to the rescue!
A new find for me is mpstat. mpstat is a program available in RedHat/Fedora that allows you to view the CPU utilization of each processor in your system. Nice! From its output, I saw that only one processor out of eight was being utilized:
[root@localhost ~]# mpstat -P 0 -P 1 -P 2 -P 3 -P 4 -P 5 -P 6 -P 7 4
09:51:21 AM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
09:51:23 AM 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 6.00
09:51:23 AM 1 89.50 0.00 1.50 1.00 0.00 0.00 0.00 7.50 17.00
09:51:23 AM 2 7.00 0.00 0.50 0.00 0.00 0.00 0.00 93.00 3.00
09:51:23 AM 3 8.00 0.00 0.00 0.00 0.00 0.00 0.00 92.00 0.00
09:51:23 AM 4 5.50 0.00 0.00 0.00 0.00 0.00 0.00 94.50 250.50
09:51:23 AM 5 3.00 0.00 0.00 0.00 0.00 0.00 0.00 97.00 0.00
09:51:23 AM 6 5.50 0.00 0.50 1.00 0.00 0.00 0.00 93.00 0.00
09:51:23 AM 7 9.00 0.00 0.00 0.00 0.00 0.00 0.00 91.00 0.00
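
Reading eight rows by eye gets old, so here is a small Python sketch that picks the busy processors out of mpstat output like the above. The 50% threshold is an arbitrary choice. As an aside, mpstat also accepts -P ALL, which saves typing each processor number.

```python
def busy_cpus(mpstat_text, threshold=50.0):
    """Return CPU numbers whose non-idle time exceeds threshold percent."""
    cpu_col = idle_col = None
    busy = []
    for line in mpstat_text.splitlines():
        fields = line.split()
        if "CPU" in fields and "%idle" in fields:
            # Header row: remember where the CPU and %idle columns sit.
            cpu_col, idle_col = fields.index("CPU"), fields.index("%idle")
            continue
        if cpu_col is None or len(fields) <= max(cpu_col, idle_col):
            continue
        if not fields[cpu_col].isdigit():
            continue  # skips an "all" summary row and anything unexpected
        if 100.0 - float(fields[idle_col]) > threshold:
            busy.append(int(fields[cpu_col]))
    return busy


# First three rows of the output shown above.
sample = """\
09:51:21 AM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
09:51:23 AM 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 6.00
09:51:23 AM 1 89.50 0.00 1.50 1.00 0.00 0.00 0.00 7.50 17.00
09:51:23 AM 2 7.00 0.00 0.50 0.00 0.00 0.00 0.00 93.00 3.00
"""
print(busy_cpus(sample))  # [1]
```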


So something is wrong. As I was out of ideas, I finally decided to ask the folks who should know: the ffmpeg-users mailing list:
http://lists.mplayerhq.hu/mailman/listinfo/ffmpeg-user

I soon received an answer from Lukas: the "-threads" option!

I tried the "-threads" parameter with various settings (1,2,8 threads). As I have eight processors, the limit was eight threads. If I used more threads than available CPUs, I saw this error at the bottom of the FFMPEG output:
[mpeg2video @ 0x3bfd518850]too many threads
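
To avoid tripping that error from a script, the thread count can be capped at the number of CPUs before the command is built. This is a sketch only: the function name is made up, and the "no more threads than CPUs" limit is simply what I observed with this build, so other FFMPEG versions may behave differently.

```python
import os


def build_ffmpeg_command(source, output, target, requested_threads, cpu_count=None):
    """Build an ffmpeg argument list, never asking for more threads than CPUs."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    threads = max(1, min(requested_threads, cpu_count))
    return ["ffmpeg", "-i", source, "-threads", str(threads), "-target", target, output]


# Asking for 16 threads on an eight-core box gets trimmed to 8.
print(build_ffmpeg_command("testdvd.mpg", "output.mpg", "svcd", 16, cpu_count=8))
```

The returned list can be handed to subprocess.run as-is.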

So I then ran a couple of interesting tests.

TEST 1
Convert QT mov file to MPEG2 DVD

Syntax:
ffmpeg -i test.mov -threads 8 -target dvd output.mpg

In this test, the Quicktime file used the MJPEG video compression scheme, which is not supported for multithreading in FFMPEG. However, MPEG2 is supported.

From the output of top, I did see that process utilization increased slightly each time I increased the number of threads:
1 thread: 12.5% cpu used
2 threads: 14.7% cpu used
8 threads: 16.3% cpu used
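
One way to read these numbers, assuming top was reporting them as a share of all eight cores combined (I did not note which mode top was in): 12.5% is exactly one core's worth of work, and the higher thread counts add only a fraction of a second core. A quick sketch of the conversion:

```python
def equivalent_cores(total_percent, cpu_count):
    """Convert a share of total system CPU into an equivalent number of busy cores."""
    return total_percent / 100.0 * cpu_count


# Figures from the top output above, on an eight-core system.
for threads, percent in [(1, 12.5), (2, 14.7), (8, 16.3)]:
    print(threads, "threads:", round(equivalent_cores(percent, 8), 2), "cores")
```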


However, when I looked at the output of mpstat, it showed the original behavior, whereby one processor was getting fed the entire task:
09:51:29 AM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
09:51:31 AM 0 7.00 0.00 0.00 0.00 0.00 0.00 0.00 93.00 6.00
09:51:31 AM 1 95.00 0.00 1.00 1.00 0.00 0.00 0.00 3.00 10.00
09:51:31 AM 2 5.00 0.00 0.00 0.00 0.00 0.00 0.00 95.00 3.00
09:51:31 AM 3 7.00 0.00 0.50 0.00 0.00 0.00 0.00 92.50 0.00
09:51:31 AM 4 5.00 0.00 0.00 0.00 0.00 0.00 0.00 95.00 0.00
09:51:31 AM 5 3.50 0.00 0.00 0.00 0.00 0.00 0.00 97.00 250.50
09:51:31 AM 6 5.00 0.00 0.00 0.00 0.00 0.00 0.00 95.00 0.00
09:51:31 AM 7 3.50 0.00 0.00 0.00 0.00 0.00 0.00 96.50 0.00


Hmmm. On to test two:

TEST 2
Convert a DVD of high quality to smaller resolution mpeg2video

Syntax:
ffmpeg -i testdvd.mpg -threads 8 -target svcd output.mpg

In this test, both the source and destination codecs are supported for multithreading in FFMPEG. Now this is where the testing got fun. The output from mpstat was somewhat different this time:
10:00:28 AM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
10:00:32 AM 0 22.50 0.00 0.50 0.00 0.00 0.00 0.00 77.00 5.00
10:00:32 AM 1 17.50 0.00 0.00 0.00 0.00 0.00 0.00 82.50 3.00
10:00:32 AM 2 23.00 0.00 1.50 0.00 0.00 0.00 0.00 75.75 0.00
10:00:32 AM 3 12.00 0.00 0.25 0.00 0.00 0.00 0.00 88.00 250.25
10:00:32 AM 4 29.00 0.00 1.25 0.00 0.00 0.00 0.00 70.25 0.00
10:00:32 AM 5 12.00 0.00 0.25 2.25 0.00 0.00 0.00 85.75 0.00
10:00:32 AM 6 71.25 0.00 3.25 4.00 0.00 0.00 0.00 22.00 17.00
10:00:32 AM 7 18.75 0.00 0.25 0.00 0.00 0.00 0.00 81.00 0.00


Sweet! Notice that all my processors are being utilized. Best part of all, my resulting render fps went from 48fps to 150fps. Awesome!
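
To put a number on that jump, here is the arithmetic as a short sketch. It assumes the 48fps figure is a fair single-thread baseline for the same file, and "parallel efficiency" here just means speedup divided by core count.

```python
def speedup(old_fps, new_fps):
    """How many times faster the new frame rate is than the old one."""
    return new_fps / old_fps


def parallel_efficiency(old_fps, new_fps, cpu_count):
    """Fraction of the ideal cpu_count-times speedup actually achieved."""
    return speedup(old_fps, new_fps) / cpu_count


print(speedup(48, 150))                 # 3.125
print(parallel_efficiency(48, 150, 8))  # 0.390625
```

So eight threads bought a bit over three times the throughput, which fits the mpstat picture of most cores being only partly loaded.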

The Key Thing to Remember
So the key is that multithreading using the "-threads" option in FFMPEG only works when BOTH the source and destination files are of the supported types:
- multithreaded/SMP motion estimation
- multithreaded/SMP encoding for MPEG-1/MPEG-2/MPEG-4/H.263
- multithreaded/SMP decoding for MPEG-2

Remember this, Grasshopper.

And I am so very happy that I don't have to recompile..

thanks to Graham Evans and the ffmpeg-users mail list!
The Mule

related posts
http://crazedmuleproductions.blogspot.com/2010/01/batch-render-redux.html
/2010/01/compile-times-performance-improved.html

Sunday, August 05, 2007

screen capture using Cinelerra

Unfortunately, capturing video live into Cinelerra CVS is broken, as of 1/21/2009. However! I tested it out and Cinelerra CAN capture screen activity directly to the timeline! This is a really nice feature.

The basic steps are:
1) go into Preferences -> Recording
2) select the destination File Format and whether you want to capture audio, video or both
3) set Audio In prefs (TwosComplement and keep your sample rate low!)
4) set Video In prefs (MPEG4 worked for me)
5) set Record Driver to Screencapture (set size of captured frame here and FPS)
6) apply your changes
7) press "r" for record and you'll see the Cinelerra Video In box popup with the active display
8) click the record button, which is the red, round button next to Transport: and you'll start recording as noted by the Position
9) click the stop button, which is the white square button next to Transport:
10) select your insertion strategy (I left mine at "Paste at insertion point")
11) click the green checkmark or just hit enter to accept and paste your captured video

If you click the "Monitor Video" radio button, you'll see the part of the screen to be captured.  If you have dual monitors, note that you can pan the area of the desktop that you can record by click-dragging the desktop area within the "Monitor Video" window.  I stumbled upon that undocumented feature.

The resolution of captured video is proportionate to the speed of your system overall. Thus, faster CPU, high-speed memory and striped hard drives help get you screen captures that are larger in resolution and smoother in playback. But there are other things than hardware upgrades that you can change in Cinelerra in order to increase the relative smoothness of your video capture. By "relative smoothness", I mean decreasing video frame drops and clipped audio samples.

For better performance, do the following:
- record using a lower audio sample rate (22 kHz or below)
- record to an uncompressed video format. RGB/RGBA works well for me. I do this because compressed video formats like MPEG4 tend to hog CPU power and thus contribute to video frame drops. Your final output will most likely be a compressed format, so the uncompressed format will only be an intermediary that you will discard. Be careful with uncompressed formats, though! Five minutes of video sucked up about a gigabyte of disk! :)
- limit your mouse movements while recording. Try to use keyboard shortcuts to open, close and move windows
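
To get a feel for how quickly uncompressed capture eats disk, multiply frame size by bytes per pixel, frame rate and duration. The sketch below uses made-up capture settings (640x480 RGB at 5 frames per second), not the settings from my own capture; they simply land in the same ballpark as the gigabyte mentioned above.

```python
def uncompressed_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Raw video size in bytes, ignoring audio and container overhead."""
    return width * height * bytes_per_pixel * fps * seconds


# Hypothetical capture: 640x480 RGB (3 bytes per pixel) at 5 fps for 5 minutes.
size = uncompressed_bytes(640, 480, 3, 5, 5 * 60)
print(size / 1000**3, "GB")
```

RGBA uses 4 bytes per pixel instead of 3, so the same capture would take a third more space.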

Here's a video of the process:

Saturday, October 21, 2006

basic OpenGL vs XV performance stats

Thought I'd share some Cinelerra performance stats using OpenGL versus non-OpenGL (XV) display drivers. I used two tracks of identical length (32s), but different content.

OpenGL
one 32s 720P track: 20fps
two 32s 720P tracks, both play enabled, no fade on top track: 2.97fps
two 32s 720P tracks, both play enabled, 50% fade on top track: 3.04fps

Stopped then restarted Cinelerra and performed next test.

XV
one 32s 720P track: 10.6fps
two 32s 720P tracks, both play enabled, no fade on top track: 2.2fps
two 32s 720P tracks, both play enabled, 50% fade on top track: 2.18fps

Not fabulous performance when using more than one track, but OpenGL is still a slight improvement over XV there, and nearly twice as fast on a single track.

Thursday, October 19, 2006

HDV MPEG2 transport stream file sizes/render rates

The data rate of the 720P MPEG2-TS files output from my cam is about 108.95MB/min or 1.82MB/s. Here is a table of video length-to-size conversions.

duration   size
12m        1.32GB
15m        1.65GB
18m35s     2.07GB
19m12s     2.12GB
20m01s     2.18GB
34m        3.70GB
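
For durations not in the table, the quoted data rate gives a quick estimate. This sketch takes 1 GB as 1000 MB, which is an assumption. Some rows in the table come out slightly higher than the estimate (1.32GB versus about 1.31GB for 12 minutes), which may be down to rounding or a different MB/GB convention.

```python
MB_PER_MINUTE = 108.95  # data rate quoted above


def estimate_size_gb(minutes, seconds=0, mb_per_minute=MB_PER_MINUTE):
    """Estimate file size in GB (taking 1 GB as 1000 MB) for a clip of the given length."""
    total_minutes = minutes + seconds / 60.0
    return total_minutes * mb_per_minute / 1000.0


print(round(estimate_size_gb(12), 2))  # 1.31
print(round(estimate_size_gb(34), 2))  # 3.7
```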
Exporting 720P HDV from Cinelerra takes two processes:
1) render the video
2) render the audio

Here are some rendering times using mpeg2enc and mpeg layer 2 audio compression:

duration   mpeg2enc render time   rate
63m        310m                   4.92min per min of video

duration   mp2 render time        rate
63m        6m                     0.09min per min of audio

Mplex takes about 7 minutes to mux about an hour of audio and video.
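
Adding the three steps together gives a rough total for the whole export. The sketch assumes the steps run one after another and that the 7 minute mplex figure applies to the same 63 minute clip.

```python
def minutes_per_video_minute(process_minutes, video_minutes):
    """Processing time needed per minute of finished video."""
    return process_minutes / video_minutes


VIDEO_MINUTES = 63
# Times in minutes, taken from the figures above.
steps = {"mpeg2enc video": 310, "mp2 audio": 6, "mplex": 7}
total = sum(steps.values())
print(total, round(minutes_per_video_minute(total, VIDEO_MINUTES), 2))  # 323 5.13
```

Nearly all of that time is the mpeg2enc video render.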

Friday, March 10, 2006

optimal drive/partition setup for Cinelerra

To keep it simple, I would use five drives and do four logical partitions. This would allow for all filesystems to be on separate drive spindles:
- one system drive for /root, /usr, etc - ext3
- one storage drive for source files - ext3
- one working drive for index files in .bcast - ext2
- two working drives in RAID0 (stripe set) for destination render - ext2

If you only have four drives, just keep the source files on the system drive. I would think the index files would be the most used/most reads. I have to test this using "iostat -x" to monitor read/write stats of each physical device. I assume the render partition would be the heaviest hit, so make that your RAID stripe. You can do software RAID, but as someone who runs a decent sized HP web server farm, I've always depended on hardware RAID because the work is offloaded to the RAID card itself, rather than the operating system spending CPU cycles on software RAID. Since rendering is pretty much all CPU, except for the input/index file reads and the destination render file writes, it makes sense that software RAID would tend to slow your rendering down.

Finally, ext2 is very important..no journaling necessary for the working drives..just storage.

When I used Adobe Premiere, I noticed that my ATI All In Wonder actually sped up the MPEG2 render times by 50%. However, I haven't been able to get the fglrx driver to work w/my dual monitor setup on FC4. Fglrx works with one monitor, but dammit, there's no way I'm going back to just one monitor! But I will need to test out whether or not the fglrx driver speeds rendering for Cinelerra as it does Premiere.