The Debian Woody/Sid 2.4 Kernel RAID 1 DevFS ReiserFS HOWTO
Or, #include nifty-feature-set.h
James Bromberger, September 3rd, 2001
This document should help you get your hardware running Debian GNU/Linux on software RAID 1. It was born out of two weeks' worth of frustration and a lack of documentation on this specific combination. I will write this as a step-by-step guide, and point out issues and the reasons behind them as we go.
I recommend that you read and re-read the Software RAID HOWTO and the Boot + Root + RAID + LILO HOWTO before going any further. I found that these two documents explained enough to get me started. However, using them alone I repeatedly got kernel panics as I tried to move to the new RAID 1 root filesystem: the failure came right after the initrd image ran, while the kernel was trying to mount the root filesystem (see ROOT= below).
My aim was to use standard packages, with no recompilation, and to have every package upgradeable by standard (apt) means.
Hardware Requirements
I am going to discuss booting on i386 hardware, because that is what I used. Some of what I discuss may be relevant to other architectures, but I don't know.
I was using a 1 GHz PIII machine with two 80 GB IDE hard disk drives, on a Soltek SL-65KV2 motherboard, with 512 MB of RAM, a floppy drive, a reasonable case, an Intel Etherpro 100 NIC, and a snazzy Adaptec 29160 SCSI card. You don't need the SCSI card, but a network card for which there is already a Linux driver is useful. The important bit is that I had two identical hard drives.
Getting Started
Boot Disks
Create a set of boot disks: the files you will need are available on your closest Debian mirror. At the time of writing, you need the Rescue Disk (boot.bin), the Root File System disk (root.bin), and the four (4) driver disks. All these images are located in $debian/dists/testing/disks-i386/. Refer to the documentation on using "dd" or "rawrite2" to put these onto your media.
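As a sketch, writing an image from a Linux box looks like the following (assuming your floppy drive is /dev/fd0); rawrite2 is the equivalent under DOS/Windows:

  dd if=boot.bin of=/dev/fd0 bs=1024 conv=sync   # the Rescue Disk
  sync
  # repeat with root.bin and each of the driver disk images, one floppy per image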
Hardware Assembly
Everything plugs together as normal, except that your hard drives are attached one to each of your IDE controllers. Most motherboards support up to four (4) hard drives: two on the first controller, and two on the second. Each controller supports two drives: one as a master, and one as a slave. There are generally "jumpers" on hard drives to set them as master or slave: both of your drives should be set to master. Since they are equivalent, it doesn't matter which one is plugged into the first IDE controller and which is on the second.
Initial Install
To get started, you need to get a base system set up on one disk. The plan is to have everything installed on one partition, then create our RAID devices, boot onto them, and then create the other partitions and migrate data across as needed. We aim to end up with one partition for /boot and the root filesystem, and others as needed. You can do the whole thing on one RAID partition; that's your choice. We will also use DevFS and ReiserFS.
Boot from the rescue disk, then the root disk, then follow the installation instructions. When you are prompted to partition, think about how big your disk is and how you would like it carved up in the long run. You can change this later, but starting out with a plan makes things simpler.
For example, I had 80 GB to play with (two 80 GB drives under RAID 1 gives 80 GB of space at the end). I chose to have 4 GB for my root partition, and divide the rest up later on for /home, /usr, /var, and /usr/local (plus a little bit for swap).
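For reference, here is roughly how that plan works out on my disks (it matches the raidtab and df output shown towards the end of this document); your sizes and mount points will no doubt differ:

  hda1 + hdc1  ->  /dev/md0   /            ~4 GB
  hda2 + hdc2  ->  /dev/md1   /usr         ~4 GB
  hda3 + hdc3  ->  /dev/md2   /var         ~4 GB
  hda5 + hdc5  ->  /dev/md3   /home        ~4 GB
  hda6 + hdc6  ->  /dev/md4   swap         ~96 MB
  hda7 + hdc7  ->  /dev/md5   /usr/local   ~60 GB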
Don't worry about running dselect or installing any "task-" packages. We just want a simple system set up on a vanilla ext2 filesystem. We should end up with a self-booting system on ext2, running whatever standard kernel was chosen.
Step up to RAID Capability
Now we need to prepare for running a RAID setup. Our packages need an update. Use apt, because it rocks, and install the following (an example apt-get line is shown after this list):
- devfsd
- kernel-image-2.4.x (whatever suits you)
- reiserfsprogs
- raidtools2
- less
- screen
- vim
- ...Anything else you need and can't live without for the next 10 minutes
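For example, something along these lines; the kernel image name here is only an illustration, so pick whichever 2.4 flavour suits your hardware:

  apt-get update
  apt-get install devfsd kernel-image-2.4.18-686 reiserfsprogs raidtools2 \
                  less screen vim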
Edit /etc/modules and add the following modules:
- reiserfs
- md
- raid1
- ext2
- ide-disk
- ide-probe-mod
- ide-mod
Edit /etc/mkinitrd/modules, and add the same modules to this list. Your initrd image needs to be able to read and write to your RAID array before your root filesystem is mounted: initrd is the trick here. You probably also want to check whether you need to edit /etc/mkinitrd/mkinitrd.conf and change the variable ROOT=probe to ROOT=/dev/md0, or possibly, if using DevFS, ROOT=/dev/md/0.
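As a sketch, the two files end up looking something like this (the module list mirrors /etc/modules above; pick the ROOT= form that matches how you intend to boot):

  # /etc/mkinitrd/modules -- one module name per line
  reiserfs
  md
  raid1
  ext2
  ide-disk
  ide-probe-mod
  ide-mod

  # /etc/mkinitrd/mkinitrd.conf -- change ROOT=probe to:
  ROOT=/dev/md0        # or ROOT=/dev/md/0 if you boot with devfs=mount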
Regenerate the initrd image for your new kernel with mkinitrd -o /tmp/initrd-new /lib/modules/2.4.x-... . If all is good, move this to /boot/initrd-2.4.x-... and edit your /etc/lilo.conf to add an initrd= line pointing at that file against the "Linux" kernel entry. Run lilo, and you should see an asterisk next to the boot image "Linux".
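Putting that together, and using a 2.4.18-686 kernel purely as an example:

  mkinitrd -o /tmp/initrd-new /lib/modules/2.4.18-686
  mv /tmp/initrd-new /boot/initrd-2.4.18-686
  # then, inside the "Linux" image stanza of /etc/lilo.conf:
  #   initrd=/boot/initrd-2.4.18-686
  lilo          # an asterisk should appear next to "Linux"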
Reboot into your new kernel
Create the RAID partitions
You now have a system that can use RAID on root. Fire up cfdisk and partition the disk that will form one half of your RAID 1 array: the second drive (hdc here), not the one you are currently running from. Make sure you set the partition type to fd (Linux raid autodetect) on all of them, and mark the first one as bootable.
Create your raidtab. Copy /usr/share/doc/raidtools2/examples/raid1.conf or similar, and define your RAID devices. /dev/md0 will be our first one; you can leave the rest for the moment. Define it as having two devices: /dev/hdc1 and /dev/hda1, in that order. Be sure to mark /dev/hda1 as a failed-disk! You are still using that disk, and don't want it trampled on.
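A minimal sketch of that first stanza, following the same layout as the full raidtab shown later in this document (hdc1 live, hda1 marked failed for now):

  raiddev /dev/md0
          raid-level      1
          nr-raid-disks   2
          nr-spare-disks  0
          chunk-size      4
          device          /dev/hdc1
          raid-disk       0
          device          /dev/hda1
          failed-disk     1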
Now make the md0 device: mkraid /dev/md0. Format it: mkfs.reiserfs /dev/md0. Mount it somewhere: mount /dev/md0 /mnt. Copy your current system to it: cd /; find . -xdev | cpio -p /mnt. Edit the new system's /etc/fstab, which is currently located at /mnt/etc/fstab: change the root partition to /dev/md0, and the filesystem type to reiserfs.
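The same sequence as one block, with the fstab change noted as a comment (device names as in the text):

  mkraid /dev/md0                      # builds the (degraded) array from hdc1 only
  mkfs.reiserfs /dev/md0
  mount /dev/md0 /mnt
  cd / ; find . -xdev | cpio -p /mnt   # -xdev stops find crossing onto other filesystems
  # then in /mnt/etc/fstab the root line becomes something like:
  #   /dev/md0   /   reiserfs   defaults   0   1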
Edit your lilo.conf on both systems (the one you are running from, and the copy under /mnt) and update the definition for your Linux image with append="md=0,/dev/hdc1,/dev/hda1" and root=/dev/md0. Run lilo again to install the change.
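In lilo.conf terms, the Linux stanza at this stage looks roughly like this (my complete lilo.conf appears later in this document):

  image=/vmlinuz
          label=Linux
          root=/dev/md0
          append="md=0,/dev/hdc1,/dev/hda1"
          read-only
          initrd=/initrd.img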
Reboot now onto your ReiserFS-on-RAID-1 root partition. Check that it worked with df.
Partition the rest of hdc now if you haven't already. You may want swap on RAID 1, but you would be far better off spending the money on a bit of extra memory instead. That's up to you.
Now it is time to bring hda back into the fold. You can duplicate the hdc partition table onto hda using sfdisk -d /dev/hdc | sfdisk /dev/hda.
You can bring hda online now by editing /etc/raidtab and replacing the failed-disk entry against hda1 with raid-disk, then running raidhotadd /dev/md0 /dev/hda1. The drive should start to sync up: watch /proc/mdstat for details on how this is going. You can keep working while it syncs.
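As a sketch of those two steps, once failed-disk has been changed back to raid-disk against hda1 in /etc/raidtab:

  sfdisk -d /dev/hdc | sfdisk /dev/hda   # clone hdc's partition table onto hda
  raidhotadd /dev/md0 /dev/hda1          # start the resync onto hda1
  cat /proc/mdstat                       # shows resync progress; [UU] when both halves are in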
Adding the other RAID partitions
So, now you can move the other partitions to RAID 1. You don't need to muck around with failed-disk directives any more. You already have the partitions you need, so all you need to do is edit /etc/raidtab and define them, then mkraid /dev/mdN, format it as above, mount it somewhere, and copy the current filesystem tree to it. A quick way of activating it, using /usr as an example: mv /usr /usr2; mkdir /usr; mount /dev/mdN /usr, and if all has gone well, remove /usr2. Lather, rinse, repeat.
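For example, moving /usr onto /dev/md1 (the device it ends up on in my raidtab below) might look like this:

  mkraid /dev/md1
  mkfs.reiserfs /dev/md1
  mount /dev/md1 /mnt
  cd /usr ; find . -xdev | cpio -p /mnt
  umount /mnt
  mv /usr /usr2 ; mkdir /usr ; mount /dev/md1 /usr
  # add a matching /dev/md1 line to /etc/fstab, check everything arrived, then:
  rm -rf /usr2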
Features
You now have a modern filesystem. I recommend that you use a filesystem that is in the standard kernel tree and available as a package. Your entire system can then be updated to newer kernels automatically: new initrds will be generated with the required modules and ROOT definition. Just apt-get your new kernel, and all *should* be well. However, I make no promises.
Other Notes
You may wish to set your hard disks up with hdparm to use multiple-sector (IDE block mode) transfers. Install the hdparm package, and read the man page. I edited /etc/init.d/raid2 and, against the "start" option, added hdparm -m 16 /dev/hda; hdparm -m 16 /dev/hdc. Your argument to -m may vary: read the manual page.
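In context, the start) stanza of the init script gains a couple of lines like these (the -m value of 16 is simply what suited my drives; check hdparm -i and the man page before copying it):

  case "$1" in
    start)
          hdparm -m 16 /dev/hda
          hdparm -m 16 /dev/hdc
          # ... the existing raid startup commands follow ...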
I ran into problems using a boot parameter of "devfs=mount". I'm not sure, but DevFS may be moving the traditional /dev/md0 to /dev/md/0 too early, and since the DevFS daemon is not running at that point, there is no symbolic link to follow. Hence, devfs=mount may require that your kernel's initrd image have ROOT=/dev/md/0 in place of ROOT=/dev/md0. This should have been fixed as of version 0.1.12 of the initrd-tools package.
phobe:~> bonnie -s 1000
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.02       ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
phobe        1000M 10898  90 29458  63  9559  13 10947  77 31891  14 317.4   1
                   ------Sequential Create------ --------Random Create--------
                   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
             files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                16  9265  70 +++++ +++ 15273  99 12506  99 +++++ +++ 12694  99
phobe,1000M,10898,90,29458,63,9559,13,10947,77,31891,14,317.4,1,16,9265,70,+++++,+++,15273,99,12506,99,+++++,+++,12694,99
phobe:~> df -k
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/md0               4003520     91092   3912428   3% /
/dev/md1               4003584    304068   3699516   8% /usr
/dev/md2               4003584    232564   3771020   6% /var
/dev/md3               4003520   1071384   2932136  27% /home
/dev/md5              62035860  11357460  50678400  19% /usr/local
phobe:~> cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md5 : active raid1 ide/host0/bus1/target0/lun0/part7[1] ide/host0/bus0/target0/lun0/part7[0]
      62037760 blocks [2/2] [UU]
md4 : active raid1 ide/host0/bus1/target0/lun0/part6[1] ide/host0/bus0/target0/lun0/part6[0]
      97664 blocks [2/2] [UU]
md3 : active raid1 ide/host0/bus1/target0/lun0/part5[1] ide/host0/bus0/target0/lun0/part5[0]
      4003648 blocks [2/2] [UU]
md2 : active raid1 ide/host0/bus1/target0/lun0/part3[1] ide/host0/bus0/target0/lun0/part3[0]
      4003712 blocks [2/2] [UU]
md1 : active raid1 ide/host0/bus1/target0/lun0/part2[1] ide/host0/bus0/target0/lun0/part2[0]
      4003712 blocks [2/2] [UU]
md0 : active raid1 ide/host0/bus0/target0/lun0/part1[1] ide/host0/bus1/target0/lun0/part1[0]
      4003648 blocks [2/2] [UU]
unused devices: <none>
My Raidtab
Remember, you need to have this in /etc/raid/raidtab, and a symlink from /etc/raidtab.
raiddev /dev/md/0
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        device          /dev/hdc1
        raid-disk       0
        device          /dev/hda1
        #failed-disk    1
        raid-disk       1

#/usr
raiddev /dev/md1
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        device          /dev/hda2
        raid-disk       0
        device          /dev/hdc2
        raid-disk       1

#/var
raiddev /dev/md2
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        device          /dev/hda3
        raid-disk       0
        device          /dev/hdc3
        raid-disk       1

#/home
raiddev /dev/md3
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        device          /dev/hda5
        raid-disk       0
        device          /dev/hdc5
        raid-disk       1

# swap
raiddev /dev/md4
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        device          /dev/hda6
        raid-disk       0
        device          /dev/hdc6
        raid-disk       1

# usr/local
raiddev /dev/md5
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        device          /dev/hda7
        raid-disk       0
        device          /dev/hdc7
        raid-disk       1
My lilo.conf
# Support LBA for large hard disks.
lba32

# Specifies the boot device. This is where Lilo installs its boot
# block. It can be either a partition, or the raw device, in which
# case it installs in the MBR, and will overwrite the current MBR.
boot=/dev/md0
raid-extra-boot="/dev/hda,/dev/hdc"

# Specifies the device that should be mounted as root. (`/')
root=/dev/md0

# Installs the specified file as the new boot sector
install=/boot/boot.b

# Specifies the location of the map file
map=/boot/map

# Specifies the number of deciseconds (0.1 seconds) LILO should
# wait before booting the first image.
delay=20

vga=normal
default=LinuxNoDevFS

image=/vmlinuz
        label=Linux
        root=/dev/md0
        append="md=0,/dev/hda1,/dev/hdc1 devfs=mount"
        read-only
        initrd=/initrd.img

image=/vmlinuz
        label=LinuxNoDevFS
        root=/dev/md0
        append="md=0,/dev/hda1,/dev/hdc1"
        read-only
        initrd=/initrd.img

image=/vmlinuz.old
        label=LinuxOLD
        root=/dev/md0
        append="md=0,/dev/hda1,/dev/hdc1 devfs=mount"
        read-only
        initrd=/initrd.img.old

image=/vmlinuz.old
        label=LinuxOLDNoDevFS
        read-only
        append="md=0,/dev/hda1,/dev/hdc1"
        optional
        root=/dev/hda1
        initrd=/initrd.img.old
In an emergency, break glass
So what to do if you can't get your root RAID1 filesystem to boot? Here is a straightforward way to get to your md0:
- Find the 2.4 kernel install media in $DEBIAN/dists/unstable/main/disks-i386, and download the bf2.4 set of disks. Actually, you only need the rescue and root images.
- Find the corresponding kernel-image-2.4.18-bf2.4_2.4.18-4_i386.deb or similar, and unpack it somewhere with dpkg-deb -x kernel-image-2.4.yy-bf2.4.deb temp/
- In the temp directory, find the md.o and raid1.o modules. Copy them to a new floppy in /floppy/boot.
- Copy /sbin/raid* to the root of the floppy disk (/floppy). You'll notice that all the raid programs are symlinks to the same binary; that doesn't matter, since you probably have a vfat disk that doesn't know about symlinks. Just make multiple copies. (Or be smart here and use an ext2 disk.)
- Now is also a good time to copy your raidtab to the root of the floppy.
- Boot with the rescue disk, then with the root disk.
- After choosing a language and keyboard from the installer, choose to preload some modules. Grab that third disk you just put those modules and binaries on, and put it in the floppy drive.
- Load up md.o first, and then raid1.o.
- Press Alt-F2 to get a text console.
- mount /floppy
- cp /floppy/raid* /sbin # (i.e. copy the raid tools to the ramfs /sbin)
- mkdir /etc/raid && cp /floppy/raidtab /etc/raid && ln -s /etc/raid/raidtab /etc/raidtab
- raidstart /dev/md0
- mount -t reiserfs /dev/md0 /target
- relax
Still ticking over: May 2003
Just to make sure everyone is convinced this works: the system I installed this on is still working fine today, 8th May 2003 (touch wood). Over this time I have had only one failed disk, and many peaceful nights of sleep knowing that a single disk failing at any time isn't a huge job.