Showing posts with label raid. Show all posts

Saturday, February 06, 2010

lost root partition..oops.

I was running some disk performance benchmarks on my new Fedora 12 64-bit install yesterday, following the very good benchmarking article on 3ware's site.

Benchmarking
I was benchmarking the write performance of my RAID set when it seemed to stall out. The process I was running was writing a bunch of zeros to a 20 gigabyte file. I believe the stall happened because my RAID controller card's battery was disconnected, so write caching was disabled.
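A write test like this boils down to streaming zeros into a big file and timing how long it takes to actually reach the disk. As a rough illustration only (this is not 3ware's procedure, and `write_zeros_benchmark` is a hypothetical helper), here is the same idea in Python:

```python
import os
import time


def write_zeros_benchmark(path, total_mib, chunk_mib=1):
    """Write total_mib MiB of zeros to `path`; return (elapsed_seconds, MiB_per_second).

    Hypothetical helper: a sequential-write test in the spirit of
    "write a big file of zeros and time it".
    """
    chunk = b"\0" * (chunk_mib * 1024 * 1024)
    chunks = max(1, total_mib // chunk_mib)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(chunks):
            f.write(chunk)
        f.flush()
        # Without fsync, much of the data may still sit in the page cache
        # and the measured rate would be far too optimistic.
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    written_mib = chunks * chunk_mib
    return elapsed, (written_mib / elapsed if elapsed > 0 else float("inf"))
```

Calling it as `write_zeros_benchmark("/mnt/test/zeros.bin", 20 * 1024)` would approximate a 20GB run; the path is only an example. The final fsync is what makes slow writes show up as a long elapsed time instead of a suspiciously fast result.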

I let the process run for four hours, figuring it should have finished writing that 20GB file by then. However, the system was still slow to non-responsive, which indicated that activity was still taking place. But, being an impetuous fool, I was anxious to get working on some video and also thought it might be an interesting test of the resilience of the ext4 filesystem if I just shut the system down. So, as a soft reboot did not do the trick, I hard-powered the box off.

Sleeping the Sleep of the Dead
In retrospect, I should have let the box finish whatever it was doing because, as you may have guessed, the box didn't come back up. Here was the first indication from the kernel messages:
Boot has failed..sleeping forever

And in the dmesg output:
can't mount root filesystem
can't access tty job control turned off

Whoops. Dracut did find my volume group:
dracut: 2 logical volumes in "vg_ogre" now active

Something was wrong with the root filesystem mount:
mount: you must specify the filesystem type

Just in case, I rebooted with the following kernel parameters in grub to get more debugging output and to drop into an emergency shell, where I could try to debug the problem:
kernel .. debug rdshell

What Up, ext4?
Oh boy. So ext4 is not as resilient as I believed. I thought the best course of action would be to boot the Fedora Live CD and look at the disk stats. Since fdisk does not work with GPT partitions, I planned to use parted to inspect the partitions and e2fsck to repair any filesystem damage. After booting the Live CD, here's what I found:

The swap volume seemed intact (oh, great):
[liveuser@localhost ~]$ dmesg | grep vg
vgaarb: device added: PCI:0000:07:00.0,decodes=io+mem,owns=io+mem,locks=none
vgaarb: loaded
Adding 12369912k swap on /dev/mapper/vg_ogre-lv_swap. Priority:-1 extents:1 across:12369912k


I thought I'd try to manually mount my / partition. I had to become superuser in order to do this:
[liveuser@localhost ~]$ su
[root@localhost liveuser]# mkdir /mnt/root
[root@localhost liveuser]# mount -t ext4 /dev/mapper/vg_ogre-lv_root /mnt/root
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg_ogre-lv_root,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so


Dmesg tells me what I already know:
[root@localhost liveuser]# dmesg | tail
[drm] nouveau 0000:07:00.0: 0x00409910: 0x3fbf3fdb
[drm] nouveau 0000:07:00.0: 0x00409e08: 0x0002dea8
[drm] nouveau 0000:07:00.0: 0x00409e0c: 0x00000000
[drm] nouveau 0000:07:00.0: 0x00409e24: 0x0a21026f
EXT4-fs (dm-2): VFS: Can't find ext4 filesystem


I wanted to see what fdisk reports about my hardware RAID5 array (3ware 9650SE):
[root@localhost liveuser]# fdisk -l /dev/sda

WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.

WARNING: The size of this disk is 4.5 TB (4499967049728 bytes).
DOS partition table format can not be used on drives for volumes
larger than (2199023255040 bytes) for 512-byte sectors. Use parted(1) and GUID
partition table format (GPT).


Disk /dev/sda: 4500.0 GB, 4499967049728 bytes
255 heads, 63 sectors/track, 547089 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000f0844

Device Boot Start End Blocks Id System
/dev/sda1 1 267350 2147483647+ ee GPT
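The 2199023255040-byte figure in fdisk's warning is the largest size a DOS (MBR) partition table can describe: its sector counts are 32-bit, so with 512-byte sectors the ceiling is (2**32 - 1) * 512 bytes. A small Python sketch of that check (the function names are illustrative, not from any tool):

```python
SECTOR_BYTES = 512
# MBR partition entries store sector counts in 32-bit fields.
MBR_MAX_SECTORS = 2**32 - 1


def mbr_limit_bytes(sector_bytes=SECTOR_BYTES):
    """Largest disk size (in bytes) a DOS/MBR partition table can address."""
    return MBR_MAX_SECTORS * sector_bytes


def needs_gpt(disk_bytes, sector_bytes=SECTOR_BYTES):
    """True if the disk is too large to be fully described by an MBR table."""
    return disk_bytes > mbr_limit_bytes(sector_bytes)
```

For this array, `needs_gpt(4499967049728)` is true, which is why the installer used GPT and why fdisk only shows the protective "ee" partition.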


What does parted see about /dev/sda?
[root@localhost liveuser]# parted /dev/sda print
Model: AMCC 9650SE-4LP DISK (scsi)
Disk /dev/sda: 4500GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number Start End Size File system Name Flags
1 17.9kB 210MB 210MB ext4 boot
2 210MB 4500GB 4500GB lvm


At least the partitions are there. But it looks like parted does not support checking ext4 filesystems yet:
[root@localhost liveuser]# parted /dev/sda
GNU Parted 1.9.0
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) check 1
No Implementation: Support for opening ext4 file systems is not implemented yet.
(parted) check 2
Error: Could not detect file system.
(parted) quit


e2fsck bound!
Let me run e2fsck (which does have support for ext4 filesystems) and see if I can fix the problem:
[root@localhost liveuser]# e2fsck
Usage: e2fsck [-panyrcdfvtDFV] [-b superblock] [-B blocksize]
[-I inode_buffer_blocks] [-P process_inode_size]
[-l|-L bad_blocks_file] [-C fd] [-j external_journal]
[-E extended-options] device

Emergency help:
-p Automatic repair (no questions)
-n Make no changes to the filesystem
-y Assume "yes" to all questions
-c Check for bad blocks and add them to the badblock list
-f Force checking even if filesystem is marked clean
-v Be verbose
-b superblock Use alternative superblock
-B blocksize Force blocksize when looking for superblock
-j external_journal Set location of the external journal
-l bad_blocks_file Add to badblocks list
-L bad_blocks_file Set badblocks list


My e2fsck skills are pretty basic. I used the -n option to make no changes while I reviewed what e2fsck found on the root volume:
[root@localhost liveuser]# e2fsck -n /dev/mapper/vg_ogre-lv_root
e2fsck 1.41.9 (22-Aug-2009)
e2fsck: Superblock invalid, trying backup blocks...
Superblock has an invalid journal (inode 8).
Clear? no

e2fsck: Illegal inode number while checking ext3 journal for /dev/mapper/vg_ogre-lv_root


Invalid journal..oops.
[root@localhost liveuser]# e2fsck -v /dev/mapper/vg_ogre-lv_root
e2fsck 1.41.9 (22-Aug-2009)
e2fsck: Superblock invalid, trying backup blocks...
Superblock has an invalid journal (inode 8).
Clear? yes

I had thought that ext4 gave us the safety of a journalled filesystem (like ext3) with increased performance. You would have thought it could recover from being shut down while writing a bunch of zeros to a 20 gigabyte file.

And then, of course, hundreds to thousands of errors like these:
Group descriptor 32923 checksum is invalid. FIXED.

Entry 'e61abf8156cc476151baa07d67337cae-le64.cache-3' in ??? (57347) has deleted/unused inode 212. Clear? yes

Unconnected directory inode 98305 (...)
Connect to /lost+found? yes

Free blocks count wrong for group #138 (32768, counted=557).
Fix? yes

Free inodes count wrong for group #308 (8192, counted=8186).
Fix? yes

Directories count wrong for group #308 (0, counted=6).
Fix? yes


Finally..at the bottom of the list of errors:
Recreate journal? yes

Creating journal (32768 blocks): yyyyyyy Done.

*** journal has been re-created - filesystem is now ext3 again ***

/dev/mapper/vg_ogre-lv_root: ***** FILE SYSTEM WAS MODIFIED *****

327475 inodes used (0.12%)
585 non-contiguous files (0.2%)
130 non-contiguous directories (0.0%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 310327/414/1
239381919 blocks used (21.85%)
0 bad blocks
42 large files

283167 regular files
27385 directories
0 character device files
0 block device files
0 fifos
3953 links
16849 symbolic links (16659 fast symbolic links)
63 sockets
--------
331417 files
[root@localhost liveuser]#
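e2fsck also summarizes the outcome in its exit status, which the e2fsck(8) man page defines as the sum of a set of condition codes. A minimal Python sketch that decodes it (the table follows the man page; the function name is made up):

```python
# Exit-status bits documented in the e2fsck(8) man page; the status is their sum.
E2FSCK_EXIT_BITS = {
    1: "File system errors corrected",
    2: "File system errors corrected, system should be rebooted",
    4: "File system errors left uncorrected",
    8: "Operational error",
    16: "Usage or syntax error",
    32: "E2fsck canceled by user request",
    128: "Shared library error",
}


def decode_e2fsck_status(status):
    """Return the list of conditions encoded in an e2fsck exit status."""
    if status == 0:
        return ["No errors"]
    return [text for bit, text in sorted(E2FSCK_EXIT_BITS.items()) if status & bit]
```

A run that printed FILE SYSTEM WAS MODIFIED would be expected to return a status with the 1 bit set, possibly combined with others; from a script, the number is available as `subprocess.run([...]).returncode`.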


Result?
So let's see if I have files intact after that 18-hour experience..
[root@localhost liveuser]# mount -t ext4 /dev/mapper/vg_ogre-lv_root /mnt/root/
[root@localhost liveuser]# ls /mnt/root
lost+found
[root@localhost liveuser]# ls /mnt/root
lost+found
[root@localhost liveuser]# ls /mnt/root/lost+found/

*348489 *723483 324843 238390

Ah..that would be a "no." Time to reinstall F12. Ugh. Lesson learned. But I need to know why I couldn't recover a journal. Maybe I did not look in the right place. I need to understand journalling better.

Things I Learned Along the Way
Some boot info from the Live CD: its kernel has EFI variable support and EFI GUID partition (GPT) support built in.
[root@localhost liveuser]# grep EFI_ /boot/config-2.6.31.5-127.fc12.i686
CONFIG_EFI_VARS=y
CONFIG_EFI_PARTITION=y
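CONFIG_EFI_PARTITION is the option that lets the kernel read a GPT-labelled disk like this array. If you want to check options like these from a script rather than with grep, a small parser of the /boot/config-* format is enough. This is a sketch with a hypothetical function name; it records `# CONFIG_FOO is not set` lines, the usual way disabled options appear in these files, as "n".

```python
def parse_kernel_config(text):
    """Parse a kernel .config or /boot/config-* file into {option: value}."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # Disabled options are written as comments; record them as "n".
            name = line[2:-len(" is not set")]
            options[name] = "n"
    return options
```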


You can force a filesystem check upon the next reboot with this command:
shutdown -rF now

You can turn on verbose debug messages and drop to an emergency shell by adding these parameters to the kernel line in grub:
kernel /vmlinuz-2.6.31.12-174.2.3.fc12.x86_64 ro root=/dev/mapper/vg_ogre-lv_root debug rdshell
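If you add these parameters often, the edit is simple enough to script. The sketch below (the helper name is made up) only rewrites the text of one grub `kernel` line, keeping its indentation and skipping parameters that are already there; it does not read or write grub.conf itself.

```python
def add_kernel_params(kernel_line, params):
    """Return a grub 'kernel' line with `params` appended, skipping any already present."""
    # Preserve leading indentation (grub.conf kernel lines are often tab-indented).
    indent = kernel_line[: len(kernel_line) - len(kernel_line.lstrip())]
    words = kernel_line.split()
    for param in params:
        if param not in words:
            words.append(param)
    return indent + " ".join(words)
```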


Sunday, January 24, 2010

Fedora 12, x86-64 upgrade

The time has come again..system upgrade. Ugh.

From Fedora 10 x86-64 to Fedora 12 x86-64
I say "ugh", but I am truly excited, as Fedora 11 and 12 have brought some nice performance improvements (ext4, kernel modesetting, faster boot, rpm) since the Fedora 10 system I'm working with now:
http://fedoraproject.org/wiki/Releases/11/FeatureList
http://fedoraproject.org/wiki/Releases/12/FeatureList

To be clear, I don't do upgrades. I will tar up my /home directory to USB, install the new OS from scratch and then blast my /home directory onto the clean new OS and RAID array.

Thinking Hard
I've spent quite a bit of time planning this upgrade. One of the big things I am doing is to profile the performance of my system before and after the OS and hardware upgrades. Of course, I won't be able to determine whether or not the performance gain is coming from the OS or the new RAID array, but at the end of the day, I simply want to be able to say "my system is now X% faster."

I will be looking at the performance of the system from the OS, Cinelerra and encoder perspectives.

Learning about Fedora 12
http://fedoraproject.org/wiki/Common_F12_bugs
http://www.scribd.com/doc/24513176/Fedora-12-Installation-Guide
Changes_in_Fedora_for_Desktop_Users

Hardware changes going in
New RAID configuration:
3ware PCIe 9650SE RAID card with battery backup
four Western Digital 1.5 TB Green SATA hard drives (32MB cache)

Virtual Machine Testbed
One of the things that has helped me in this process is using VMware Server to test out Fedora 12. I've caught a couple of things right off the bat: because FAAC is non-free, FFmpeg is not built with FAAC support by default. I was able to resolve this through Doran's excellent post here:
http://fozzolog.fozzilinymoo.org/tech/2009/11/recompiling-ffmpeg-for-fedora-12-to-add-faac-support.html

Also, H264 encoder magic has changed a bit. Other than that, my output testing to various formats (MPEG-PS, HDV, DVD, iPod/iPhone) has worked very well.

General prep work
work out bugs with Fedora 12 virtual machine
clean up old F10 system
backup F10 system files via script
backup /home directory via tar to external drive
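The /home backup in the last step is a plain tar job. As an illustration of the same idea with Python's standard tarfile module (the function name and paths are examples, not the author's script), this writes a gzip-compressed archive with paths stored relative to /home's parent, roughly what `tar czf archive -C / home` does:

```python
import os
import tarfile


def backup_home(home_dir, archive_path):
    """Create a gzip-compressed tar of home_dir at archive_path; return the member count."""
    home_dir = os.path.abspath(home_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths relative ("home/..."), so a restore onto the
        # new system doesn't depend on absolute paths from the old one.
        tar.add(home_dir, arcname=os.path.basename(home_dir))
    # Re-open the archive as a quick sanity check that it is readable.
    with tarfile.open(archive_path, "r:gz") as tar:
        return len(tar.getmembers())
```

For a real /home you would run this as root so every file is readable, e.g. `backup_home("/home", "/media/usb/home.tar.gz")`, where the USB mount point is an assumption.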

Installation steps
Install new F12, Developer's edition
Install RPMs via script
Build and install FFmpeg RPM with faac support from nonfree RPM Fusion repo via script
Install favorite programs
Install Cinelerra dependencies
Install Cinelerra

For those with strong constitutions, here's the full project plan:
http://spreadsheets.google.com/ccc?key=0AjSzE_zejuQZdFphck9aQUVBbzZVOWhyOC1CaVFVQmc&hl=en

I'm almost there..most of the planning is done. Now, to execute! I'll let you know how it goes.
The Mule

Reference
http://www.graphics-muse.org/wp/?p=501

Sunday, November 16, 2008

using partimage with RAID

Background
As I am planning to buy a 1080p camera, I will need my system to be running the latest and greatest kernel and software to get the highest performance from Cinelerra. In that light, I'd like to back up my current Fedora 7 boot and root filesystems, just in case something goes wrong with the Fedora 9 install.

Partimage and My System
I will use partimage to back up these filesystems. Partimage needs to see both the source and destination filesystems. My first task is to figure out what I have. I built this system over a year ago and don't remember which physical drive holds which filesystems. I could go back to my notes to find out how I partitioned my system, but that would be cheating. So let's see what the filesystem tells me.

The first thing I do is look at the output of df:
[mule@ogre ~]# df -m
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/md0 457295 6720 426972 2% /
/dev/md2 469453 417004 28602 94% /mnt/videos
/dev/sda1 99 19 76 20% /boot
tmpfs 1007 0 1007 0% /dev/shm


I have two RAID devices, one mounted as my root partition (/dev/md0) and one mounted as my video storage (/dev/md2). Next, I see that /dev/sda1 is my boot partition. Finally, there is a filesystem defined for shared memory, though I am not concerned about saving the contents of that as it is RAM.
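Reading df output by eye works fine for four lines. As a sketch, the same triage can be scripted by parsing `df -m` output; the function name and returned field names are my own, and the parser assumes each filesystem fits on one line (df can wrap long device names, which this does not handle).

```python
def parse_df(output):
    """Parse `df -m` output into {mount_point: {device, size_mb, used_mb, avail_mb, use_pct}}."""
    mounts = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        device, size, used, avail, use_pct = fields[:5]
        mount_point = " ".join(fields[5:])
        mounts[mount_point] = {
            "device": device,
            "size_mb": int(size),
            "used_mb": int(used),
            "avail_mb": int(avail),
            "use_pct": int(use_pct.rstrip("%")),
        }
    return mounts
```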

How It Works
Partimage only backs up filesystems that are not mounted, so it is run from a bootable rescue disk, like Knoppix or SysRescCd. The twist here is that I am using RAID partitions. Thus, when I boot with one of these CDs, I will need to assemble my RAID drives in order to have a source to back up (my root filesystem) and a destination to write to (my /mnt/videos filesystem). Partimage will not use a mounted filesystem as a source, but I will need to mount the destination.

Assembling My RAID Drives
I have forgotten the configuration of my RAID drives, so I look at /etc/mdadm.conf to figure out what partitions and UUIDs make up my two RAID sets:
[mule@ogre ~]# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md0 level=raid0 num-devices=2 uuid=c0d4b597:c33b3014:ab694cee:76920165
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=1705b387:1c71d83e:364b60b4:fb0cc192


This tells me that my root partition (/dev/md0) is a stripe set (RAID0) and that the storage for my important stuff, all my videos in /mnt/videos, is mirrored (RAID1) in case of a failure. I'm glad I built the system this way: I like the performance benefits of a stripe set for my root partition, but I would consider it tragic to lose all my work, so the video filesystem is mirrored across two drives.

Video Display Problem with Linux and NVidia Card
For some reason I have not figured out, I cannot see the virtual consoles once I exit Gnome. This is due to some incompatibility between the NVidia 8800GT card and my Dell SC1430. It also affects the display when I boot with either Knoppix or SysRescCD: the screen goes black and I can't see any terminal sessions or virtual consoles. Therefore, in order to use the boot CD, I removed the NVidia card and booted using Dell's onboard ATI ES1000 video.

Booting with Knoppix
Once Knoppix is fully booted, I need to assemble my two RAID partitions. You can use either the UUID or the super-minor number of each RAID set to do this. I chose the super-minor, as it was simpler.

Assemble the Source RAID set
Here I am assembling the source drive, my root filesystem:
root@Knoppix:/ramdisk/home/knoppix# modprobe md
root@Knoppix:/ramdisk/home/knoppix# mdadm --assemble -m 0 /dev/md0


Mount the Destination Partition
Since I want to store the backup image on the same mirrored drive set that holds my videos, I'll mount that partition as the destination for partimage. Of course, I first have to create the mount point:
mkdir /mnt/videos
mount -t ext3 /dev/md2 /mnt/videos


Run Partimage to Backup Root Partition
I'll need three things to run partimage:
- an assembled RAID set of the source, my root/boot partitions, unmounted
- an assembled RAID set of the destination, mounted
- a compression method

Here's the partimage process:
1) Select the partition to save and give the backup a destination and name. Note that "Save partition into an image file" is selected as the default action:


2) Select a compression method:


3) Give the backup image a description (optional):


4) Partimage takes a few minutes to gather information about large (500GB+) drives, but then displays basic information about the partition to be backed up:


5) Partimage starts the imaging process. I had about 6GB to backup:


6) Partimage took about 20 minutes to create the backup image:


Backup complete. The restore process is similar, but instead of saving the partition into an image file as in Step 1 above, you'll choose the "Restore Partition from an image file" option.

Run Partimage for Boot Partition
Since my boot partition is small (128MB), creating a backup image shouldn't take very long. My boot partition is /dev/sda1.


Now I should be ready for an upgrade to Fedora 9. One hurdle I already see: the Fedora 9 installation doesn't recognize pre-existing RAID sets. Yarg. Looks like I might have to blow away the existing stripe set that is home to my root partition. I'll let you know how that goes.

Update 11/17/2008
The Fedora 9 x86-64 install went well. Here are some of the nitty-gritty details:
http://crazedmuleproductions.blogspot.com/2008/11/fedora-9-x86-64-install.html
end update

Good day,
The Mule

References
mdadm man page

Friday, October 05, 2007

moving my RAID set to a new box: collision!

For performance, I have my videos stored on a stripe set, using Fedora's software RAID technology. I recently set up my Dell Octo Core box, but had not yet migrated the RAID set to it. Around midnight last night, I decided to start the migration. That was my first mistake.

Contention for the Same Device Name
The RAID set is a pair of 120GB IDE drives on a Sil680 PCI card. Not the best performers, but I was minding my pennies when I bought the drives and card. So I popped the card and the drives into the server. Thankfully, the card was immediately recognized by the BIOS on bootup. However, the dmesg output showed a problem:
Oct 4 23:53:53 localhost kernel: md: considering hdd1 ...
Oct 4 23:53:53 localhost kernel: md: adding hdd1 ...
Oct 4 23:53:53 localhost kernel: md: adding hdc1 ...
Oct 4 23:53:53 localhost kernel: md: md0 already running, cannot run hdd1

I saw that the device name of the RAID set holding my videos, /dev/md0, conflicted with the RAID set I had created as the / (root) partition for 64-bit Fedora Core 6. Argh! Once per year, like Christmas, I have to dust off my rusty mdadm skills. Ugh. This was that time.

The Plan
After reading a number of the references listed below, I decided to eliminate the contention by renaming my video RAID set from /dev/md0 to /dev/md1. To accomplish this, I had to update the superblock on the RAID set with a different preferred minor number. More on this in a moment.

Since putting the drives in the new server, I was a little nervous about the condition of the data on them. To give myself a bit more comfort, I decided on the following course of action:
- put the drives and card back in the original computer
- renumber the preferred minor number of the RAID set there
- test to verify that I can still mount the filesystems on the RAID and access the data
- move the devices back into the new server
- assemble, test and mount the RAID

So Let's Get Started!
I put the card and drives back into the original box. Here is the detail of what the RAID set looked like there:
[root@computer ~]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Sat Aug 19 23:57:28 2006
Raid Level : raid0
Array Size : 234436352 (223.58 GiB 240.06 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Fri Oct 5 14:31:37 2007
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Chunk Size : 64K
UUID : 9c4c078f:8935e3e4:bfface8f:6a3c2c18
Events : 0.15

Number Major Minor RaidDevice State
0 22 1 0 active sync /dev/hdc1
1 22 65 1 active sync /dev/hdd1
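The Array Size above (234436352 KiB) is exactly twice the 117218176 KiB Device Size that `mdadm -E` reports for each member further down, which is what a two-disk RAID0 should give. A sketch of that arithmetic for the levels that appear on this blog (the function is illustrative; it ignores metadata overhead and assumes equal-sized members):

```python
def usable_capacity(level, members, member_size):
    """Usable size of an array of `members` equal-sized devices (same units in and out)."""
    if level == "raid0":
        return members * member_size        # striped: capacities add up
    if level == "raid1":
        return member_size                  # mirrored: one copy's worth
    if level == "raid5":
        return (members - 1) * member_size  # one device's worth goes to parity
    raise ValueError(f"unsupported level: {level}")
```

The same formula explains the 3ware array elsewhere on this page: four 1.5 TB drives in RAID5 give 4.5 TB.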


Update the RAID Device Number (Preferred Minor)
I first stopped the RAID set:
[root@computer ~]# mdadm --stop /dev/md0
mdadm: stopped /dev/md0


Next, I issued the following command to update the minor number. Unfortunately, it failed with this error:
[root@computer ~]# mdadm --assemble /dev/md1 --update=super-minor -m0 /dev/hdd1 /dev/hdc1
mdadm: error opening /dev/md1: No such file or directory


Oh boy. From the error, it looked like I needed a block device file called /dev/md1. I wasn't sure, though, as my mdadm and RAID chops were rusty. So, after a LOT of research (references listed below), I learned that I did need to create the block device file.

Creating a Block Device
Referring to these instructions, I created the block device for /dev/md1 with the following commands:
[root@computer ~]# mknod /dev/md1 b 9 1

I wanted to keep the permissions consistent with the old /dev/md0 device file, so I ran the following commands:
[root@computer ~]# chmod 640 /dev/md1; chown root:disk /dev/md1
[root@computer ~]# ll /dev/md*
brw-r----- 1 root disk 9, 0 Oct 5 14:24 /dev/md0
brw-r----- 1 root disk 9, 1 Oct 5 14:43 /dev/md1
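In `mknod /dev/md1 b 9 1`, `b` means block device, 9 is the major number Linux uses for md software RAID, and 1 is the minor number, the N in /dev/mdN (hence the `9, 0` and `9, 1` in the listing). The same step in Python, as a sketch: `create_md_node` is a hypothetical helper, it must run as root, and the root:disk ownership simply mirrors the listing.

```python
import grp
import os
import stat

MD_MAJOR = 9  # major number Linux uses for md (software RAID) block devices


def md_device_number(index):
    """Device number for /dev/md<index>: major 9, minor equal to index."""
    return os.makedev(MD_MAJOR, index)


def create_md_node(index, mode=0o640):
    """Sketch of `mknod /dev/mdN b 9 N` plus root:disk ownership (needs root)."""
    path = f"/dev/md{index}"
    os.mknod(path, stat.S_IFBLK | mode, md_device_number(index))
    os.chmod(path, mode)  # mknod's mode is filtered by the umask; set it explicitly
    os.chown(path, 0, grp.getgrnam("disk").gr_gid)
    return path
```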


Updating and Testing the Preferred Minor Number (device id)
Once the block device file was created, I issued the command to update the preferred minor number of the RAID set to 1:
[root@computer ~]# mdadm --assemble /dev/md1 --update=super-minor -m0 /dev/hdd1 /dev/hdc1
mdadm: /dev/md1 has been started with 2 drives.

Sweet! The RAID device started! Let's see how it looks (note the Preferred Minor number!):
[root@computer ~]# mdadm --detail /dev/md1
/dev/md1:
Version : 00.90.03
Creation Time : Sat Aug 19 23:57:28 2006
Raid Level : raid0
Array Size : 234436352 (223.58 GiB 240.06 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Fri Oct 5 15:43:48 2007
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Chunk Size : 64K
UUID : 9c4c078f:8935e3e4:bfface8f:6a3c2c18
Events : 0.20

Number Major Minor RaidDevice State
0 22 1 0 active sync /dev/hdc1
1 22 65 1 active sync /dev/hdd1
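To confirm a change like this from a script, for example that Preferred Minor went from 0 to 1 while the UUID stayed the same, the "Key : Value" lines of `mdadm --detail` (or `mdadm -E`) output can be pulled into a dictionary. A minimal sketch with a hypothetical function name; the per-device table at the bottom has no " : " separator and is skipped.

```python
def parse_mdadm_detail(text):
    """Turn the 'Key : Value' lines of `mdadm --detail` / `mdadm -E` output into a dict."""
    info = {}
    for line in text.splitlines():
        if " : " in line:
            key, value = line.split(" : ", 1)
            info[key.strip()] = value.strip()
    return info
```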


I like the word "clean"! And how are the individual drives making up the set doing?
[root@computer ~]# mdadm -E /dev/hdc1
/dev/hdc1:
Magic : a92b4efc
Version : 00.90.01
UUID : 9c4c078f:8935e3e4:bfface8f:6a3c2c18
Creation Time : Sat Aug 19 23:57:28 2006
Raid Level : raid0
Device Size : 117218176 (111.79 GiB 120.03 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1

Update Time : Fri Oct 5 16:03:24 2007
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : 8bd047df - correct
Events : 0.21
Chunk Size : 64K

Number Major Minor RaidDevice State
this 0 22 1 0 active sync /dev/hdc1

0 0 22 1 0 active sync /dev/hdc1
1 1 22 65 1 active sync /dev/hdd1

[root@computer ~]# mdadm -E /dev/hdd1
/dev/hdd1:
Magic : a92b4efc
Version : 00.90.01
UUID : 9c4c078f:8935e3e4:bfface8f:6a3c2c18
Creation Time : Sat Aug 19 23:57:28 2006
Raid Level : raid0
Device Size : 117218176 (111.79 GiB 120.03 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1

Update Time : Fri Oct 5 16:03:24 2007
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : 8bd04821 - correct
Events : 0.21
Chunk Size : 64K

Number Major Minor RaidDevice State
this 1 22 65 1 active sync /dev/hdd1

0 0 22 1 0 active sync /dev/hdc1
1 1 22 65 1 active sync /dev/hdd1


Love the word "correct"!

Is My Data Still There?
So how about we try a mount?
[root@computer ~]# mount -t ext2 /dev/md1 /mnt/videos
[root@computer ~]#

No errors on the mount! That's great! Now for the finale..let's look at a test file:
[root@computer ~]# head -2 /mnt/videos/paris/newtrip.xml
<?xml version="1.0"?>
<EDL VERSION="2.0CV" PROJECT_PATH="/root/installFiles/paris/newtrip.xml">

Awesome! I'm very relieved I can read the content off the drive. That is a load off my mind. The last task was to edit /etc/fstab and reboot to make sure the RAID set comes up correctly on boot. Blissfully, those steps were also successful.

Put 'Em In Da New Box!
I then took the whole kit and caboodle to the new server. I am very happy to report that the kernel recognized the newly renumbered RAID set, as shown in the output of dmesg:
md: created md1
md1: setting max_sectors to 128, segment boundary to 32767


and created the /dev/md1 device, as shown in this file listing:
[root@ogre ~]# ll /dev/md*
brw-r----- 1 root disk 9, 0 Oct 5 19:27 /dev/md0
brw-r----- 1 root disk 9, 1 Oct 5 19:27 /dev/md1


I added the following line to /etc/fstab:
/dev/md1 /mnt/videos ext2 defaults 1 1

And ran "mount -a" to mount everything listed in /etc/fstab. Lo and behold, I've got data on my drive!
[root@ogre ~]# ls /mnt/videos
20060319 20060812 20070316 20070811 axe cinelerra movies paris_tape1 stockholm_tape1
20060406 20070111 20070425 20070912 bloody lost+found paris paris_tape2 stockholm_tape2
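The line added to /etc/fstab has the six fields described in fstab(5): device, mount point, filesystem type, mount options, dump frequency and fsck pass number. As a small illustration (the parser is hypothetical, not a system tool):

```python
FSTAB_FIELDS = ("fs_spec", "fs_file", "fs_vfstype", "fs_mntops", "fs_freq", "fs_passno")


def parse_fstab_line(line):
    """Split one /etc/fstab entry into its six named fields (see fstab(5)).

    Returns None for blank lines and comments. The last two fields are
    optional and default to 0.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    parts = line.split()
    if len(parts) < 4:
        raise ValueError(f"malformed fstab line: {line!r}")
    parts += ["0"] * (6 - len(parts))  # fill in missing freq/passno
    entry = dict(zip(FSTAB_FIELDS, parts[:6]))
    entry["fs_freq"] = int(entry["fs_freq"])
    entry["fs_passno"] = int(entry["fs_passno"])
    return entry
```

One detail worth knowing when reading the pass number: fstab(5) suggests 1 for the root filesystem and 2 for other filesystems that should be checked at boot.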


Caveat for RAID under a Knoppix CD
At one point in my debugging, I pulled out my trusty Knoppix bootable CD. If you need to load your RAID set from a rescue disk (more specifically, Knoppix), you'll need to load the md module and then run mdadm --assemble to start your existing RAID set.
root@Knoppix:/ramdisk/home/knoppix# modprobe md
root@Knoppix:/ramdisk/home/knoppix# mdadm --assemble -m 0 /dev/md0


Well, another chapter in the life of the Mule is closed. Hopefully, someone will find these notes instructive.

Update 2009/03/25
Some hdparm drive read measurements. Note the roughly 60% read speed increase of the stripe set over the mirrored set.

/dev/md0 is a software RAID0 (stripe) of two 500GB, 16MB cache SATA drives:
[mule@ogre ~]$ sudo hdparm -tT /dev/md0

/dev/md0:
Timing cached reads: 5748 MB in 2.00 seconds = 2877.62 MB/sec
Timing buffered disk reads: 352 MB in 3.02 seconds = 116.68 MB/sec


/dev/md2 is a software RAID1 (mirror) of two 500GB, 16MB cache SATA drives:
[mule@ogre ~]$ sudo hdparm -tT /dev/md2

/dev/md2:
Timing cached reads: 5218 MB in 2.00 seconds = 2612.72 MB/sec
Timing buffered disk reads: 218 MB in 3.03 seconds = 72.04 MB/sec
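The percentage comes straight from the two "buffered disk reads" lines. A small Python sketch that extracts the MB/sec figure from hdparm output and compares two runs (the function names are illustrative):

```python
import re

# Matches lines such as:
#   Timing buffered disk reads: 352 MB in 3.02 seconds = 116.68 MB/sec
_BUFFERED = re.compile(r"Timing buffered disk reads:.*=\s*([\d.]+)\s*MB/sec")


def buffered_read_rate(hdparm_output):
    """Return the buffered-read MB/sec figure from `hdparm -t` / `-tT` output."""
    match = _BUFFERED.search(hdparm_output)
    if match is None:
        raise ValueError("no 'Timing buffered disk reads' line found")
    return float(match.group(1))


def percent_faster(fast, slow):
    """How much faster `fast` is than `slow`, as a percentage."""
    return (fast / slow - 1) * 100
```

With 116.68 MB/sec for the stripe and 72.04 MB/sec for the mirror, `percent_faster` gives about 62%.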


*** end update ***

The Mule

References
http://www.redhat.com/magazine/019may06/departments/tips_tricks
http://www.linuxdevcenter.com/pub/a/linux/2002/12/05/RAID.html?page=1
http://www.docunext.com/category/raid/

Nice Beginner's Guide
http://www.linuxhomenetworking.com/wiki/index.php/Quick_HOWTO_:_Ch26_:_Linux_Software_RAID

The Man Page
http://www.linuxmanpages.com/man8/mdadm.8.php

HowTo (with good description of chunk sizes)
http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html

MDADM Recipes
http://www.koders.com/noncode/fid76840E0EBBC19222CBCC0913D4AED97C1F5D2A45.aspx

Notes for Debian MDADM users
http://svn.debian.org/wsvn/pkg-mdadm/mdadm/trunk/debian/README.upgrading-2.5.3?op=file

Friday, September 15, 2006

upgraded video workstation has arrived!

My video rig is finally ready to go! After one month of hardware and software upgrades, I'm ready to start editing. Here are the upgrades I did:
- from 1GB non-ECC, Dual DDR to 2GB ECC Dual DDR mem
- from a system and a content drive to a system and RAID0 content drive
- from a 128MB ATI video card to a 512MB NVidia video card
- from non-OpenGL to OpenGL editing software

Update (9/19): converted my old 80GB dual-boot XP/Core 4 system drive to a shiny new 250GB SATA drive with 16MB cache! Took about seven hours to back up the filesystems, partition the new drive and restore, with a few lessons learned along the way. Here's how this went:
http://cacasodo.blogspot.com/2006/09/replacing-old-dual-boot-system-drive.html

So, we're ready to go with da editing! Hooray!!

Install notes:
- FC4 install
- fstab/ntfs kernel module/NVidia drivers/xorg.conf/mdadm.conf/rpmkeys/remove libdv/install libdv4
- libdvDeps.sh
- install cinelerra
- cinSourceDeps.sh
- create /mnt/win, /mnt/nt
- window prefs/google/cin window placement/save on exit

rebuilding the workstation: lost my damn RAID set!

I *almost* had a minor tragedy last night while rebuilding my Fedora Core 4 box. The *almost* tragedy occurred while reinstalling Core 4. During the install, I told the installer to use my existing RAID set, but not to reformat it. I figured, what harm could come to the drive if it doesn't get formatted? Well, a lot, apparently. Since I was rushing through the install, I neglected to take proper care of the 40GB of MPEG2 files on the RAID set. Here's what happened.

After the FC installer finished, it asked me to reboot the box, so I did. On bootup, the system reported errors about an unrecognized filesystem and dropped me to a shell to fix what was wrong. I didn't know what was wrong, so I rebooted into a Linux rescue disk, simply took the RAID filesystem out of /etc/fstab and rebooted.

On the second bootup, I started fdisk to look at the drives. To my dismay, fdisk did not recognize the RAID set partitions. Agh! After some preliminary research online, I concluded that I had lost my data. I did have it backed up, but it would be a pain to retrieve the 40GB or so of videos from my backup system. Damn it. So I bit the bullet and created new partitions of type "fd" (Linux RAID autodetect), as I had done when I set up the RAID set initially. When I wrote the partition table, fdisk gave me some error saying that the drives would resync after the box rebooted. I rebooted and looked at the output of "fdisk -l". Things seemed alright, as the drives were recognized as Linux RAID autodetect. So I re-enabled /dev/md0 in /etc/fstab and made sure that /etc/mdadm.conf was correct. Dejectedly, I rebooted.

When the system started, it dropped me into a prompt complaining of filesystem problems again. Now what? This time, I looked at the man pages for fsck and figured out the commands I needed to repair the disk. They ran something like this:
fsck -t ext2 /dev/md0 -V -r
-t filesystem type
-V verbose
-r prompt for each repair


A couple of inodes were missing or corrupt. OK. Fsck continued on through the stages of its five-pass check procedure. After five or ten minutes of this, I was getting worried. Happily, it finished the checks and dropped me to a prompt so I could exit and reboot. OK..that's progress! The box came up clean with no errors. Sweet. I then went to view the filesystem, and to my shock and surprise, my original files were there! Awesome!! But the real test is reading from and writing to files. I first viewed one of the videos in mplayer. This worked! I then performed an extensive write test. Cinelerra needs a table of contents (TOC) file for each video; these are index files, essentially. So at a prompt, I generated TOC files for the 30GB or so of video I had with this one command (the glob and quotes keep it safe for file names with spaces):
for i in *.m2t ; do echo "$i" ; mpeg3toc "$i" "$i.toc" ; done

The file creation took about twenty minutes but worked! I then loaded the files into Cinelerra, started editing and wouldn't you know they are good to go! Hooray! But I'm still an idiot.

Moral of the story: make sure you do your research on RAID before you decide to implement it.

Sunday, August 20, 2006

system reconfig, final entry

Sil680 Not Recognized by Fedora
OK. I'm tired. Going to make this quick. Sil680 ATA RAID0 stripe set not recognized automatically by Fedora. Tried reinstall of Fedora. Does not recognize the RAID0 set I created. ARGH.

Saved by MDADM
I had to dust off my very, very rusty Linux RAID creation skills and manually create a software RAID set. In short:
- fdisk to mark the drives as part of a raid set
- use mdadm to make the raid set active
- create a mdadm.conf for the array
- put it in /etc/fstab
- format the stripe set

Here is the most important snip of what I did:
[root@computer /]# mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/hdg1 /dev/hdh1
mdadm: array /dev/md0 started.
[root@computer /]# mdadm --detail --scan >> /etc/mdadm.conf
[root@computer /]# cat /etc/mdadm.conf
DEVICE /dev/hdg* /dev/hdh*
ARRAY /dev/md0 level=raid0 num-devices=2 UUID=9c4c078f:8935e3e4:bfface8f:6a3c2c18
   devices=/dev/hdg1,/dev/hdh1

[root@computer RPMS]# cat /etc/fstab
# This file is edited by fstab-sync - see 'man fstab-sync' for details
LABEL=/ / ext3 defaults 1 1
LABEL=/boot /boot ext3 defaults 1 2
/dev/devpts /dev/pts devpts gid=5,mode=620 0 0
/dev/shm /dev/shm tmpfs defaults 0 0
/dev/proc /proc proc defaults 0 0
/dev/sys /sys sysfs defaults 0 0
/dev/hda5 swap swap defaults 0 0
/dev/fd0 /media/floppy auto pamconsole,exec,noauto,managed 0 0
/dev/hdc /media/cdrecorder auto pamconsole,exec,noauto,managed 0 0
/dev/md0 /mnt/videos ext2 defaults 1 1
[root@computer /]# mkfs.ext2 /dev/md0
mke2fs 1.37 (21-Mar-2005)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
29310976 inodes, 58609088 blocks
2930454 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=58720256
1789 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 32 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
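Those numbers aren't arbitrary. Assuming the settings this output reports (4096-byte blocks so the first data block is 0, 32768 blocks and 16384 inodes per group, 5% reserved) plus the sparse_super layout, which keeps backup superblocks only in group 1 and in groups that are powers of 3, 5 and 7, they can all be recomputed from the block count. A small sketch; the two function names are my own:

```bash
#!/usr/bin/env bash
# Recompute the mke2fs summary from the block count, assuming
# 4096-byte blocks, 32768 blocks and 16384 inodes per group,
# and 5% reserved blocks.
ext2_layout() {
    local blocks=$1 per_group=32768 inodes_per_group=16384
    local groups=$(( (blocks + per_group - 1) / per_group ))  # round up
    echo "groups=$groups"
    echo "inodes=$(( groups * inodes_per_group ))"
    echo "reserved=$(( blocks * 5 / 100 ))"
    echo "size_gb=$(( blocks * 4096 / 1000000000 ))"          # decimal GB
}

# Backup superblock locations under sparse_super: group 1 plus groups
# that are powers of 3, 5 and 7. Group g starts at block g * 32768
# because the first data block is 0. Printed comma separated.
ext2_backup_blocks() {
    local groups=$1 per_group=32768 base g
    local found=() out=()
    if (( groups > 1 )); then found+=(1); fi
    for base in 3 5 7; do
        for (( g = base; g < groups; g *= base )); do
            found+=("$g")
        done
    done
    for g in $(printf '%s\n' "${found[@]}" | sort -n); do
        out+=("$(( g * per_group ))")
    done
    local IFS=,
    echo "${out[*]}"
}
```

Running `ext2_layout 58609088` gives 1789 groups, 29310976 inodes, 2930454 reserved blocks and about 240 GB, and `ext2_backup_blocks 1789` lists the same fourteen backup blocks mke2fs printed.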


Congrats to me/Girlfriend Doesn't Care
Pretty good for a guy who didn't know mdadm before tonight. So yeah! RAID0 set works! Hoohah! Copied a movie to it and then tested it in Cinelerra. HOLY SMOKES! Getting 50fps on a 1280x720 HDV movie! Damn that's fast! I can't understand why my girlfriend doesn't care about this at 1am??!

Gotta crash. I think my work is done here.

Update, 9/11/07:
Here is a very nicely organized article on adding new hard drives to Fedora:
http://fedoranews.org/tchung/storage/

Saturday, August 19, 2006

system reconfig, #3

Cinelerra Installed..Working Another Matter
OK. So I'm going to take a two-hour break to watch Apollo 13. What better movie to get you psyched to configure a RAID set! Before I leave: I have installed Fedora Core 4 successfully, got yum working after some RPM key stickiness, and downloaded all the dependencies for Cinelerra:

Dependencies Resolved
=============================================================================
 Package          Arch   Version                     Repository        Size
=============================================================================
Installing:
 cinelerra        i386   2.0-0.4.20051210.2.fc4      /root/Desktop/cinelerra-2.0-0.4.20051210.2.fc4.i386.rpm  23 M
Installing for dependencies:
 OpenEXR          i386   1.2.2-6.fc4                 extras            404 k
 a52dec           i386   0.7.4-8.fc4.rf              dries             50 k
 faac             i386   1.24-3.fc4.rf               dries             75 k
 faad2            i386   2.0-8.fc4.rf                dries             382 k
 ffmpeg           i386   0.4.9-0.lvn.0.21.20051228.4 livna             1.6 M
 fftw             i386   3.1.1-1.fc4                 extras            865 k
 fltk             i386   1.1.6-1.2.fc4.rf            dries             1.0 M
 gsm              i386   1.0.10-5.2.fc4.rf           dries             39 k
 imlib2           i386   1.2.1-1.fc4                 extras            562 k
 lame             i386   3.96.1-4.fc4.rf             dries             394 k
 libiec61883      i386   1.0.0-0.2.fc4               freshrpms         35 k
 libquicktime     i686   0.9.7-0.lvn.8.4             livna             355 k
 libsndfile       i386   1.0.15-1.fc4                extras            218 k
 mjpegtools       i686   1.8.0-1.2.fc4               freshrpms         763 k
Updating for dependencies:
 cpp              i386   4.0.2-8.fc4                 updates-released  2.1 M
 gcc              i386   4.0.2-8.fc4                 updates-released  2.8 M
 gcc-c++          i386   4.0.2-8.fc4                 updates-released  2.8 M
 gcc-gfortran     i386   4.0.2-8.fc4                 updates-released  2.3 M
 gcc-java         i386   4.0.2-8.fc4                 updates-released  2.3 M
 libgcc           i386   4.0.2-8.fc4                 updates-released  60 k
 libgcj           i386   4.0.2-8.fc4                 updates-released  7.6 M
 libgcj-devel     i386   4.0.2-8.fc4                 updates-released  1.1 M
 libgfortran      i386   4.0.2-8.fc4                 updates-released  152 k
 libraw1394       i386   1.2.0-1.fc4                 updates-released  37 k
 libstdc++        i386   4.0.2-8.fc4                 updates-released  307 k
 libstdc++-devel  i386   4.0.2-8.fc4                 updates-released  9.0 M
 libtool          i386   1.5.16.multilib2-3          updates-released  656 k

Now that's a lotta dependencies. And this doesn't even cover the development packages you'd need to compile the CVS version of Cinelerra. Cinelerra now runs! Untested, however..

Note to self: create an image of this pristine FC4 root filesystem BEFORE it gets gobbed up!

Now let's watch the movie!

Friday, March 10, 2006

optimal drive/partition setup for Cinelerra

To keep it simple, I would use five drives and create four filesystems. This puts every filesystem on its own drive spindle(s):
- one system drive for / (root), /usr, etc. - ext3
- one storage drive for source files - ext3
- one working drive for index files in .bcast - ext2
- two working drives in RAID0 (stripe set) for destination render - ext2

If you only have four drives, just keep the source files on the system drive. I would expect the index files to see the most use and the most reads, but I still have to test this; use "iostat -x" to monitor the read/write stats of each physical device. I assume the render partition would be hit the hardest, so make that your RAID stripe. You can use software RAID, but as someone who runs a decent-sized HP web server farm, I can say we've always depended on hardware RAID, because the RAID work is offloaded to the card itself instead of the operating system spending CPU cycles on it. Since rendering is almost entirely CPU-bound, apart from the input/index file reads and the destination render file writes, it makes sense that software RAID would tend to slow your rendering down.
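"iostat -x" from the sysstat package is the right tool for that test. If it isn't installed, the kernel exposes the raw per-device counters in /proc/diskstats on 2.6 kernels, and diffing two snapshots gives a rough read/write rate per drive. A minimal sketch, assuming the standard field layout (device name in field 3, sectors read in field 6, sectors written in field 10) and the 512-byte sector unit those counters always use; `diskstats_delta` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Per-device throughput from two /proc/diskstats snapshots.
# Usage: diskstats_delta BEFORE AFTER SECONDS
# Field 3 = device, 6 = sectors read, 10 = sectors written (512 bytes each).
diskstats_delta() {
    awk -v secs="$3" '
        NR == FNR { rd[$3] = $6; wr[$3] = $10; next }   # first snapshot
        ($3 in rd) {                                     # second snapshot
            printf "%s read_kBps=%d write_kBps=%d\n", $3,
                int(($6 - rd[$3]) * 512 / 1024 / secs),
                int(($10 - wr[$3]) * 512 / 1024 / secs)
        }' "$1" "$2"
}

# Example, sampling during a render:
#   cat /proc/diskstats > /tmp/ds.before
#   sleep 10
#   cat /proc/diskstats > /tmp/ds.after
#   diskstats_delta /tmp/ds.before /tmp/ds.after 10
```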

Finally, ext2 is very important: no journaling is necessary for the working drives, only for storage.

When I used Adobe Premiere, I noticed that my ATI All-in-Wonder actually sped up MPEG2 renders by 50%. However, I haven't been able to get the fglrx driver to work with my dual-monitor setup on FC4. Fglrx works with one monitor, but dammit, there's no way I'm going back to just one monitor! I still need to test whether the card, with the fglrx driver, speeds up rendering in Cinelerra the way it does in Premiere.