Saturday, February 06, 2010

lost root partition..oops.

I was running some disk performance statistics on the new Fedora 12 64-bit yesterday according to the very good benchmarking article on 3ware's site:

I was benchmarking the write performance of my RAID set when it seemed to stall out. The process I was running was writing a bunch of zeros to a 20 gigabyte file. I believe the stall was due to the fact that my RAID controller card's battery was disconnected; hence, write-cacheing was disabled.

I let the process try to finish for four hours. I figured it should have finished writing that 20GB file by that time. However, the fact that the system was still slow to non-responsive indicated that activity was still taking place. But, being an impetuous fool, I was anxious to get working on some video and also thought it might be an interesting test of the resilience of the ext4 filesystem if I just shut the system down. So I as a soft reboot did not do the trick, I hard powered the box off.

Sleeping the Sleep of the Dead
In retrospect, I should have let the box finish whatever it was doing, because as you may have guessed it, the box didn't come back up. Here was the first indication from the kernel messages:
Boot has failed..sleeping forever

And in the dmesg output:
can't mount root filesystem
can't access tty job control turned off

Woops. Dracut did find my volume group:
dracut: 2 logical voumes in "vg_ogre" now active

Something was wrong with the root filesystem mount:
mount: you must specify the filesystem type

Just in case, I rebooted with the following kernel parameters in grub to see more debugging and to drop me to an emergency shell to see if I could debug the problem:
kernel .. debug rdshell

What Up, ext4?
Oh boy. So, ext4 is not as resilient as I believed. I thought the best course of action would be to load up Fedora Live, and look at the disk stats. Since fdisk does not work with GPT partitions, I used parted and thought that I'd use e2fsck to fix any bad blocks. After booting the Live CD, here's what I found:

The swap drive seemed in tact (oh, great):
[liveuser@localhost ~]$ dmesg | grep vg
vgaarb: device added: PCI:0000:07:00.0,decodes=io+mem,owns=io+mem,locks=none
vgaarb: loaded
Adding 12369912k swap on /dev/mapper/vg_ogre-lv_swap. Priority:-1 extents:1 across:12369912k

I thought I'd try to manually mount my / partition. I had to become superuser in order to do this:
[liveuser@localhost ~]$ su
[root@localhost liveuser]# mkdir /mnt/root
[root@localhost liveuser]# mount -t ext4 /dev/mapper/vg_ogre-lv_root /mnt/root
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg_ogre-lv_root,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so

Dmesg tells me what I already know:
[root@localhost liveuser]# dmesg | tail
[drm] nouveau 0000:07:00.0: 0x00409910: 0x3fbf3fdb
[drm] nouveau 0000:07:00.0: 0x00409e08: 0x0002dea8
[drm] nouveau 0000:07:00.0: 0x00409e0c: 0x00000000
[drm] nouveau 0000:07:00.0: 0x00409e24: 0x0a21026f
EXT4-fs (dm-2): VFS: Can't find ext4 filesystem

I just want to see what fdisk reads about my hardware RAID5 array (3ware 9650SE):
[root@localhost liveuser]# fdisk -l /dev/sda

WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.

WARNING: The size of this disk is 4.5 TB (4499967049728 bytes).
DOS partition table format can not be used on drives for volumes
larger than (2199023255040 bytes) for 512-byte sectors. Use parted(1) and GUID
partition table format (GPT).

Disk /dev/sda: 4500.0 GB, 4499967049728 bytes
255 heads, 63 sectors/track, 547089 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000f0844

Device Boot Start End Blocks Id System
/dev/sda1 1 267350 2147483647+ ee GPT

What does parted see about /dev/sda?
[root@localhost liveuser]# parted /dev/sda print
Model: AMCC 9650SE-4LP DISK (scsi)
Disk /dev/sda: 4500GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number Start End Size File system Name Flags
1 17.9kB 210MB 210MB ext4 boot
2 210MB 4500GB 4500GB lvm

At least the partition is there. But it looks like parted does not have support for checking ext4 filesystems yet:
[root@localhost liveuser]# parted /dev/sda
GNU Parted 1.9.0
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) check 1
No Implementation: Support for opening ext4 file systems is not implemented yet.
(parted) check 2
Error: Could not detect file system.
(parted) quit

e2fsck bound!
Let me run e2fsck (which does have support for ext4 filesystems) and see if I can fix the problem:
[root@localhost liveuser]# e2fsck
Usage: e2fsck [-panyrcdfvtDFV] [-b superblock] [-B blocksize]
[-I inode_buffer_blocks] [-P process_inode_size]
[-l|-L bad_blocks_file] [-C fd] [-j external_journal]
[-E extended-options] device

Emergency help:
-p Automatic repair (no questions)
-n Make no changes to the filesystem
-y Assume "yes" to all questions
-c Check for bad blocks and add them to the badblock list
-f Force checking even if filesystem is marked clean
-v Be verbose
-b superblock Use alternative superblock
-B blocksize Force blocksize when looking for superblock
-j external_journal Set location of the external journal
-l bad_blocks_file Add to badblocks list
-L bad_blocks_file Set badblocks list

My skills at e2fsck are pretty basic. I use the -n option to make no changes while I review what e2fsck finds out about the array:
[root@localhost liveuser]# e2fsck -n /dev/mapper/vg_ogre-lv_root
e2fsck 1.41.9 (22-Aug-2009)
e2fsck: Superblock invalid, trying backup blocks...
Superblock has an invalid journal (inode 8).
Clear? no

e2fsck: Illegal inode number while checking ext3 journal for /dev/mapper/vg_ogre-lv_root

Invalid journal..oops.
[root@localhost liveuser]# e2fsck -v /dev/mapper/vg_ogre-lv_root
e2fsck 1.41.9 (22-Aug-2009)
e2fsck: Superblock invalid, trying backup blocks...
Superblock has an invalid journal (inode 8).

I had thought that ext4 gave us the safety of a journalled filesystem (like ext3) with increased performance. You would have thought it could have recovered from being shutdown while writing a bunch of zeros to a 20 gigabyte file.

And then of course, hundreds to thousands of these various errors:
Group descriptor 32923 checksum is invalid. FIXED.

Entry 'e61abf8156cc476151baa07d67337cae-le64.cache-3' in ??? (57347) has deleted/unused inode 212. Clear? yes

Unconnected directory inode 98305 (...)
Connect to /lost+found? yes

Free blocks count wrong for group #138 (32768, counted=557).
Fix? yes

Free inodes count wrong for group #308 (8192, counted=8186).
Fix? yes

Directories count wrong for group #308 (0, counted=6).
Fix? yes the bottom of the list of errors:
Recreate journal? yes

Creating journal (32768 blocks): yyyyyyy Done.

*** journal has been re-created - filesystem is now ext3 again ***

/dev/mapper/vg_ogre-lv_root: ***** FILE SYSTEM WAS MODIFIED *****

327475 inodes used (0.12%)
585 non-contiguous files (0.2%)
130 non-contiguous directories (0.0%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 310327/414/1
239381919 blocks used (21.85%)
0 bad blocks
42 large files

283167 regular files
27385 directories
0 character device files
0 block device files
0 fifos
3953 links
16849 symbolic links (16659 fast symbolic links)
63 sockets
331417 files
[root@localhost liveuser

So let's see if I have files in tact after that 18 hour experience..
[root@localhost liveuser]# mount -t ext4 /dev/mapper/vg_ogre-lv_root /mnt/root/
[root@localhost liveuser]# ls /mnt/root
[root@localhost liveuser]# ls /mnt/root
[root@localhost liveuser]# ls /mnt/root/lost+found/

*348489 *723483 324843 238390

Ah..that would be a "no." Time to reinstall F12. Ugh. Lesson learned. But I need to know why I couldn't recover a journal. Maybe I did not look in the right place. I need to understand journalling better.

Things I Learned Along the Way
Some boot info from the Live CD
[root@localhost liveuser]# grep EFI_ /boot/config-

You can force a filesystem check upon the next reboot with this command:
shutdown -rF now

You can run verbose debug messages and drop to an emergency shell by placing these commands on the kernel line in grub:
kernel /vmlinuz- ro root=/dev/mapper/vg_ogre-lv_root debug rdshell


No comments: