Yesterday I was gathering disk performance statistics on my new Fedora 12 64-bit install, following the very good benchmarking article on 3ware's site:
Benchmarking
I was benchmarking the write performance of my RAID set when the test seemed to stall. The process was writing zeros to a 20 GB file. I believe the stall happened because the battery on my RAID controller card was disconnected, so write caching was disabled.
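The 3ware article's exact command isn't reproduced here. As a minimal sketch, assuming GNU coreutils dd, a sequential-write test of this kind could look like the following. The function name write_zeros and the example path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write SIZE_MIB mebibytes of zeros to TARGET with GNU dd.
write_zeros() {
    local target=$1 size_mib=$2
    # bs=1M writes in 1 MiB blocks, so count is the total size in MiB.
    # conv=fdatasync flushes the file's data to disk before dd exits,
    # so the rate dd reports includes the physical write, not just the page cache.
    dd if=/dev/zero of="$target" bs=1M count="$size_mib" conv=fdatasync
}

# Example: roughly 20 GB, as in the test above (path is hypothetical).
# write_zeros /mnt/scratch/zeros.img 20480
```

dd prints its throughput summary on stderr when it finishes.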
I gave the process four hours to finish, figuring it should have written that 20 GB file by then. However, the system was still slow to unresponsive, which suggested that disk activity was still going on. But, being an impetuous fool, I was anxious to get working on some video, and I also thought it might be an interesting test of the ext4 filesystem's resilience if I just shut the system down. Since a soft reboot did not do the trick, I powered the box off hard.
Sleeping the Sleep of the Dead
In retrospect, I should have let the box finish whatever it was doing, because, as you may have guessed, the box didn't come back up. Here was the first indication from the kernel messages:
Boot has failed..sleeping forever
And in the dmesg output:
can't mount root filesystem
can't access tty job control turned off
Whoops. Dracut did find my volume group:
dracut: 2 logical volumes in "vg_ogre" now active
Something was wrong with the root filesystem mount:
mount: you must specify the filesystem type
Just in case, I rebooted with the following kernel parameters in GRUB, to get more debugging output and to be dropped into an emergency shell where I could try to debug the problem:
kernel .. debug rdshell
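To confirm after booting that extra parameters like these actually reached the kernel, /proc/cmdline can be checked. A minimal sketch follows. The function name has_kernel_param is hypothetical, and the optional second argument only exists so the check can be run against a saved copy of a command line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if PARAM appears on the kernel command line,
# either as a bare word (e.g. "debug") or as the name in NAME=VALUE (e.g. "root").
has_kernel_param() {
    local param=$1 cmdline_file=${2:-/proc/cmdline}
    # /proc/cmdline is one line of whitespace-separated parameters; awk splits
    # it into fields and compares each field's name part with PARAM.
    awk -v p="$param" '
        {
            for (i = 1; i <= NF; i++) {
                name = $i
                sub(/=.*/, "", name)   # drop "=value", if present
                if (name == p) found = 1
            }
        }
        END { exit !found }' "$cmdline_file"
}

# Example:
# has_kernel_param rdshell && echo "rdshell is set"
```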
What Up, ext4?
Oh boy. So ext4 is not as resilient as I believed. I thought the best course of action would be to boot the Fedora Live CD and look at the disks. Since fdisk does not support GPT partition tables, I planned to use parted, and then e2fsck to repair the filesystem. After booting the Live CD, here's what I found:
The swap volume seemed intact (oh, great):
[liveuser@localhost ~]$ dmesg | grep vg
vgaarb: device added: PCI:0000:07:00.0,decodes=io+mem,owns=io+mem,locks=none
vgaarb: loaded
Adding 12369912k swap on /dev/mapper/vg_ogre-lv_swap. Priority:-1 extents:1 across:12369912k
I thought I'd try to manually mount my / partition. I had to become superuser in order to do this:
[liveuser@localhost ~]$ su
[root@localhost liveuser]# mkdir /mnt/root
[root@localhost liveuser]# mount -t ext4 /dev/mapper/vg_ogre-lv_root /mnt/root
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg_ogre-lv_root,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
Dmesg tells me what I already know:
[root@localhost liveuser]# dmesg | tail
[drm] nouveau 0000:07:00.0: 0x00409910: 0x3fbf3fdb
[drm] nouveau 0000:07:00.0: 0x00409e08: 0x0002dea8
[drm] nouveau 0000:07:00.0: 0x00409e0c: 0x00000000
[drm] nouveau 0000:07:00.0: 0x00409e24: 0x0a21026f
EXT4-fs (dm-2): VFS: Can't find ext4 filesystem
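Most of that dmesg | tail output is nouveau noise. A small filter that keeps only lines from the ext2/3/4 drivers and the JBD/JBD2 journaling layer makes errors like the one above easier to spot. The name fs_messages is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: keep only kernel messages from the ext2/3/4 drivers
# and the JBD/JBD2 journaling layer; reads kernel log text on stdin.
fs_messages() {
    grep -E 'EXT[234]-fs|JBD'
}

# Example (as root):
# dmesg | fs_messages
```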
Next, I wanted to see what fdisk reports about my hardware RAID5 array (3ware 9650SE):
[root@localhost liveuser]# fdisk -l /dev/sda
WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.
WARNING: The size of this disk is 4.5 TB (4499967049728 bytes).
DOS partition table format can not be used on drives for volumes
larger than (2199023255040 bytes) for 512-byte sectors. Use parted(1) and GUID
partition table format (GPT).
Disk /dev/sda: 4500.0 GB, 4499967049728 bytes
255 heads, 63 sectors/track, 547089 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000f0844
Device Boot Start End Blocks Id System
/dev/sda1 1 267350 2147483647+ ee GPT
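The byte count in fdisk's warning is not arbitrary. A DOS (MBR) partition entry stores its sector count in a 32-bit field, so the largest size it can describe is (2^32 - 1) sectors of 512 bytes, or 2,199,023,255,040 bytes (about 2 TiB). The 4.5 TB array is roughly twice that, which is why it needs GPT. A quick check in shell arithmetic; the function names are hypothetical.

```shell
#!/bin/sh
# Largest size, in bytes, that an MBR partition entry can describe,
# for a given sector size (default 512 bytes).
mbr_max_bytes() {
    local sector_size=${1:-512}
    # 4294967295 = 2^32 - 1, the largest value a 32-bit sector count can hold.
    echo $(( 4294967295 * sector_size ))
}

# Hypothetical helper: succeed if a disk of DISK_BYTES is too large for MBR.
needs_gpt() {
    local disk_bytes=$1 sector_size=${2:-512}
    [ "$disk_bytes" -gt "$(mbr_max_bytes "$sector_size")" ]
}

# Using the size fdisk reported for /dev/sda:
# mbr_max_bytes                              # prints 2199023255040
# needs_gpt 4499967049728 && echo "GPT required"
```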
What does parted see about /dev/sda?
[root@localhost liveuser]# parted /dev/sda print
Model: AMCC 9650SE-4LP DISK (scsi)
Disk /dev/sda: 4500GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
1 17.9kB 210MB 210MB ext4 boot
2 210MB 4500GB 4500GB lvm
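Since the right tool depends on the partition table type, that type can be read from parted's output. This is a minimal sketch that assumes the "Partition Table:" line format shown above (parted 1.9.0). The function name is hypothetical; -s is parted's script (non-interactive) mode.

```shell
#!/bin/sh
# Hypothetical helper: print the partition table label ("gpt", "msdos", ...)
# from `parted DEVICE print` output supplied on stdin.
partition_table_type() {
    awk -F': *' '/^Partition Table:/ { print $2; exit }'
}

# Example (as root):
# if [ "$(parted -s /dev/sda print | partition_table_type)" = gpt ]; then
#     echo "GPT disk: use parted rather than this version of fdisk"
# fi
```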
At least the partitions are there. But it looks like parted does not support checking ext4 filesystems yet:
[root@localhost liveuser]# parted /dev/sda
GNU Parted 1.9.0
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) check 1
No Implementation: Support for opening ext4 file systems is not implemented yet.
(parted) check 2
Error: Could not detect file system.
(parted) quit
e2fsck bound!
Let me run e2fsck (which does have support for ext4 filesystems) and see if I can fix the problem:
[root@localhost liveuser]# e2fsck
Usage: e2fsck [-panyrcdfvtDFV] [-b superblock] [-B blocksize]
[-I inode_buffer_blocks] [-P process_inode_size]
[-l|-L bad_blocks_file] [-C fd] [-j external_journal]
[-E extended-options] device
Emergency help:
-p Automatic repair (no questions)
-n Make no changes to the filesystem
-y Assume "yes" to all questions
-c Check for bad blocks and add them to the badblock list
-f Force checking even if filesystem is marked clean
-v Be verbose
-b superblock Use alternative superblock
-B blocksize Force blocksize when looking for superblock
-j external_journal Set location of the external journal
-l bad_blocks_file Add to badblocks list
-L bad_blocks_file Set badblocks list
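Besides its messages, e2fsck reports the outcome in its exit status. e2fsck(8) documents this status as a sum of condition codes (1, 2, 4, 8, 16, 32, 128). Here is a sketch that decodes it; the function name describe_e2fsck_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: translate an e2fsck exit status into the conditions
# listed in e2fsck(8). The status is a sum of these bit values.
describe_e2fsck_status() {
    local status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $(( status & 1 )) -ne 0 ]   && echo "filesystem errors corrected"
    [ $(( status & 2 )) -ne 0 ]   && echo "errors corrected, system should be rebooted"
    [ $(( status & 4 )) -ne 0 ]   && echo "filesystem errors left uncorrected"
    [ $(( status & 8 )) -ne 0 ]   && echo "operational error"
    [ $(( status & 16 )) -ne 0 ]  && echo "usage or syntax error"
    [ $(( status & 32 )) -ne 0 ]  && echo "canceled by user request"
    [ $(( status & 128 )) -ne 0 ] && echo "shared library error"
    return 0
}

# Example (as root): a read-only check, then decode its result.
# e2fsck -n /dev/mapper/vg_ogre-lv_root; describe_e2fsck_status $?
```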
My e2fsck skills are pretty basic. I used the -n option, which makes no changes, so I could first review what e2fsck found on the root volume:
[root@localhost liveuser]# e2fsck -n /dev/mapper/vg_ogre-lv_root
e2fsck 1.41.9 (22-Aug-2009)
e2fsck: Superblock invalid, trying backup blocks...
Superblock has an invalid journal (inode 8).
Clear? no
e2fsck: Illegal inode number while checking ext3 journal for /dev/mapper/vg_ogre-lv_root
Invalid journal. Oops.
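The "Superblock invalid, trying backup blocks..." line means e2fsck fell back to a backup superblock. To point it at a specific backup by hand with the -b and -B options from the usage text, you need the backup block numbers. With the sparse_super feature, backups sit in block group 1 and in groups that are powers of 3, 5 and 7. The sketch below assumes a block size larger than 1 KiB, so group g starts at block g times blocks-per-group. The values 32768 blocks per group and 4 KiB blocks used below are common mke2fs defaults, not values read from this array. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list backup superblock block numbers for an ext2/3/4
# filesystem with sparse_super (backups in group 1 and in groups that are
# powers of 3, 5 and 7). Assumes block size > 1 KiB.
backup_superblocks() {
    local blocks_per_group=$1 group_count=$2 base g
    {
        # Group 1 holds a backup whenever the filesystem has more than one group.
        [ "$group_count" -gt 1 ] && echo 1
        for base in 3 5 7; do
            g=$base
            while [ "$g" -lt "$group_count" ]; do
                echo "$g"
                g=$(( g * base ))
            done
        done
    } | sort -n | while read -r g; do
        # Group g begins at block g * blocks_per_group.
        echo $(( g * blocks_per_group ))
    done
}

# Example: first backups for 32768 blocks per group and 30 groups.
# backup_superblocks 32768 30
```

For example, if those parameters match the real filesystem, e2fsck -n -b 32768 -B 4096 /dev/mapper/vg_ogre-lv_root would run a read-only check against the first backup.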
[root@localhost liveuser]# e2fsck -v /dev/mapper/vg_ogre-lv_root
e2fsck 1.41.9 (22-Aug-2009)
e2fsck: Superblock invalid, trying backup blocks...
Superblock has an invalid journal (inode 8).
Clear?y
I had thought that ext4 gave us the safety of a journaled filesystem (like ext3) with better performance. You would think it could have recovered from being shut down while writing zeros to a 20 GB file.
And then, of course, came hundreds to thousands of errors like these:
Group descriptor 32923 checksum is invalid. FIXED.
Entry 'e61abf8156cc476151baa07d67337cae-le64.cache-3' in ??? (57347) has deleted/unused inode 212. Clear? yes
Unconnected directory inode 98305 (...)
Connect to /lost+found? yes
Free blocks count wrong for group #138 (32768, counted=557).
Fix? yes
Free inodes count wrong for group #308 (8192, counted=8186).
Fix? yes
Directories count wrong for group #308 (0, counted=6).
Fix? yes
Finally, at the bottom of the list of errors:
Recreate journal? yes
Creating journal (32768 blocks): yyyyyyy Done.
*** journal has been re-created - filesystem is now ext3 again ***
/dev/mapper/vg_ogre-lv_root: ***** FILE SYSTEM WAS MODIFIED *****
327475 inodes used (0.12%)
585 non-contiguous files (0.2%)
130 non-contiguous directories (0.0%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 310327/414/1
239381919 blocks used (21.85%)
0 bad blocks
42 large files
283167 regular files
27385 directories
0 character device files
0 block device files
0 fifos
3953 links
16849 symbolic links (16659 fast symbolic links)
63 sockets
--------
331417 files
[root@localhost liveuser]#
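The "filesystem is now ext3 again" line appears to be e2fsck's stock wording after it recreates a journal. It does not appear to clear ext4 feature flags such as extent, and the mount -t ext4 that follows does succeed. The "Filesystem features:" line of dumpe2fs -h shows which features a filesystem actually has. Below is a sketch that checks that output for a given feature; the function name has_fs_feature is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FEATURE appears on the
# "Filesystem features:" line of `dumpe2fs -h` output read from stdin.
has_fs_feature() {
    local feature=$1
    awk -v f="$feature" '
        /^Filesystem features:/ {
            # Fields 3..NF are the feature names.
            for (i = 3; i <= NF; i++) if ($i == f) found = 1
        }
        END { exit !found }'
}

# Example (as root): check for a journal and for ext4 extents.
# dumpe2fs -h /dev/mapper/vg_ogre-lv_root 2>/dev/null | has_fs_feature has_journal
# dumpe2fs -h /dev/mapper/vg_ogre-lv_root 2>/dev/null | has_fs_feature extent
```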
Result?
So let's see whether my files are intact after that 18-hour experience:
[root@localhost liveuser]# mount -t ext4 /dev/mapper/vg_ogre-lv_root /mnt/root/
[root@localhost liveuser]# ls /mnt/root
lost+found
[root@localhost liveuser]# ls /mnt/root/lost+found/
*348489 *723483 324843 238390
Ah, that would be a "no." Time to reinstall F12. Ugh. Lesson learned. But I still want to know why the journal couldn't be recovered. Maybe I did not look in the right place. I need to understand journaling better.
Things I Learned Along the Way
Some boot info from the Live CD
[root@localhost liveuser]# grep EFI_ /boot/config-2.6.31.5-127.fc12.i686
CONFIG_EFI_VARS=y
CONFIG_EFI_PARTITION=y
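Of these two options, CONFIG_EFI_PARTITION is the one that enables GPT (EFI GUID) partition table support in the kernel. CONFIG_EFI_VARS exposes EFI variables through sysfs. Here is a sketch for checking any option in a kernel config file; the function name kernel_option_enabled is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if OPTION is built in (=y) or a module (=m)
# in the given kernel config file.
kernel_option_enabled() {
    local option=$1 config_file=$2
    grep -Eq "^${option}=(y|m)$" "$config_file"
}

# Example:
# kernel_option_enabled CONFIG_EFI_PARTITION "/boot/config-$(uname -r)" && echo "GPT support present"
```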
Forcing a filesystem check on the next reboot:
shutdown -rF now
The GRUB kernel line I booted with to get debugging output and an emergency shell:
kernel /vmlinuz-2.6.31.12-174.2.3.fc12.x86_64 ro root=/dev/mapper/vg_ogre-lv_root debug rdshell