Friday, September 15, 2006

rebuilding the workstation: lost my damn RAID set!

I *almost* had a minor tragedy last night while rebuilding my Fedora Core 4 box. It happened during the reinstall of Core 4. While going through the process, I told the installer to use my existing RAID set but not to reformat it. I figured, what harm could come to the drive if it doesn't get formatted? Well, a lot, apparently. Since I was rushing through the install, I neglected to take proper care of the 40GB of MPEG2 files on the RAID set. Here's what happened.

After the FC installer finished, it asked to reboot the box, so I rebooted. On bootup, the system gave me errors about an unrecognized filesystem and dropped me to a shell to fix what was wrong. I didn't know what was wrong, so I booted a Linux rescue disk, took the RAID filesystem out of /etc/fstab, and rebooted.

On the second bootup, I started fdisk to look at the drives. To my dismay, fdisk did not recognize the RAID set partitions. Agh! I did some preliminary research online and concluded that I had probably lost my data. I did have it backed up, but it would be a pain to retrieve the 40GB or so of videos from my backup system. Damn it. So I bit the bullet and created new partitions of type "fd" (Linux RAID autodetect), as I had done when I set up the RAID set initially. When I wrote the partition table, fdisk gave me some error saying that the drives would resync after the box rebooted. I rebooted and looked at the output of "fdisk -l". Things seemed alright, as the drives were recognized as Linux RAID autodetect. So I then reenabled /dev/md0 in /etc/fstab and made sure that /etc/mdadm.conf was correct. Dejectedly, I rebooted.
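
In hindsight, a read-only look at the array would have been a sensible step before rewriting any partition tables. This is only a sketch of what that might look like, not what I actually ran: raid_status is a made-up helper name, and /dev/md0 is assumed to be the array device, as it is on my box.

```shell
#!/bin/sh
# Read-only look at a Linux software RAID array. Nothing here writes to disk.
# MD_DEV is an assumption; point it at your own array device.
MD_DEV="${MD_DEV:-/dev/md0}"

raid_status() {
    # The kernel's summary of every md array it has assembled.
    if [ -r /proc/mdstat ]; then
        cat /proc/mdstat
    else
        echo "no /proc/mdstat: md driver not loaded"
    fi

    # Per-array detail; needs root and the mdadm tool.
    if command -v mdadm >/dev/null 2>&1 && [ -b "$MD_DEV" ]; then
        mdadm --detail "$MD_DEV" || echo "mdadm could not read $MD_DEV"
    else
        echo "skipping mdadm --detail: mdadm or $MD_DEV not available"
    fi
}

raid_status
```

Running "mdadm --examine" against one of the member partitions should also show whether its RAID superblock is still there.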

When the system started, it dropped me into a prompt complaining of filesystem problems again. Now what? This time, I looked at the man pages for fsck and figured out the commands I needed to repair the disk. They ran something like this:
fsck -t ext2 /dev/md0 -V -r
-t filesystem type
-V verbose
-r prompt for each repair

A couple of inodes were missing or corrupt. OK. fsck carried on through the stages of its five-stage check procedure. After about five or ten minutes of this, I was getting worried. Happily, it finished the checks and dropped me to a prompt in order to exit and reboot. OK, that's progress! Also, the box came up clean with no errors. Sweet. I then went to view the filesystem, and to my shock and surprise, my original files were there! Awesome!! But the real test was reading from and writing to the files. I first viewed one of the videos in mplayer. This worked! I then performed an extensive write test. Cinelerra needs a table of contents file for each video. These are index files, essentially. So at a prompt, I generated a bunch of toc files for the 30GB or so of video I had by using this one command:
for i in *.m2t ; do echo "$i" ; mpeg3toc "$i" "$i.toc" ; done

The file creation took about twenty minutes but worked! I then loaded the files into Cinelerra, started editing, and wouldn't you know it, they were good to go! Hooray! But I'm still an idiot.
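
The mplayer spot check only covered one video. A fuller read test would read every file from start to finish and flag any that fail. Here is a rough sketch of that idea; read_check is a made-up helper name, and it assumes the videos all end in .m2t like mine do.

```shell
#!/bin/sh
# Read every *.m2t file in a directory end to end and report any that fail.
# read_check is a hypothetical helper, not part of any package.
read_check() {
    dir="${1:-.}"
    failed=0
    for f in "$dir"/*.m2t; do
        [ -e "$f" ] || continue      # no matches: the glob stays literal, so skip it
        if ! cat "$f" > /dev/null; then
            echo "read error: $f"
            failed=1
        fi
    done
    return "$failed"
}
```

Calling "read_check" with the directory that holds the videos prints nothing and returns 0 if every file could be read. It only proves the files are readable, not that their contents match the backup.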

The moral of the story: do your research on RAID before you decide to implement it.
