Saturday, June 21, 2008

Where's GRUB?! Ubuntu 8.04 LTS Server & RAID1

Where to begin? (I guess check out my previous post?)

First, my release to release upgrade to Ubuntu 8.04 LTS server was going along fairly well. The upgrade from Edgy (6.10) to Feisty (7.04) went fairly well, except that it dropped one of my drives from the RAID arrays—easily remedied. Just added the partitions back into their respective RAID devices.

Next up, time to move from Feisty (7.04) to Gutsy (7.10) and if all went well, the final move to Hardy Heron (8.04 LTS).

All did not go well in the upgrade from 7.04 to 7.10—although, I must admit here that most of it was my fault. This time I did the upgrade the "official" way.

Network upgrade for Ubuntu servers (recommended)

If you run an Ubuntu server, you should use the new server upgrade system.
  1. enable the "dapper-updates" repository
  2. install the new "update-manager-core" package - dependencies include python-apt, python-gnupginterface and python2.4-apt.
  3. run "sudo do-release-upgrade" in a terminal window
  4. follow the steps on the terminal window
This approach seemed to work just fine, and since my box is headless I even ran it over SSH without any issue (even with the warning that doing the upgrade over SSH is probably not ideal).

When I rebooted, however, the box had no network connectivity. ifconfig revealed only the lo interface; eth0 was gone. sudo lshw showed that the NIC was disabled for some reason. I finally tracked the problem down to /etc/udev/rules.d/70-persistent-net.rules, which had "updated" eth0 to eth1 for whatever bizarre reason. I simply changed eth0 to eth1 in /etc/network/interfaces and ran sudo /etc/init.d/networking restart, and all was well again.
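For anyone chasing the same renamed-NIC symptom, here is a minimal sketch of how the names udev has pinned could be listed. The function name list_pinned_nics is made up for illustration, and the rule format in the comment is an assumption about what the generated file typically looks like; check your own file before relying on it.

```shell
# Print "<interface> <MAC>" for every NIC that udev has pinned to a name.
# The default path is the rules file mentioned above; pass another file to test.
list_pinned_nics() {
    rules="${1:-/etc/udev/rules.d/70-persistent-net.rules}"
    # Assumed rule format (one line per NIC), roughly:
    #   SUBSYSTEM=="net", ..., ATTRS{address}=="00:11:22:33:44:55", NAME="eth1"
    # The pattern accepts both ATTR{address} and ATTRS{address}.
    sed -n 's/.*{address}=="\([^"]*\)".*NAME="\([^"]*\)".*/\2 \1/p' "$rules"
}
```

Running list_pinned_nics with no argument and comparing the result with the interface named in /etc/network/interfaces should show whether the two disagree.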

Next (again), one of my drives was missing from the RAID array devices. It should have been a simple matter of adding its partitions back in via the Webmin Linux RAID module. However, I wasn't paying enough attention, and it appears that I tried to add a partition already in use to its own array (why Webmin would even list a partition that is already in use is questionable). It is possible that I am totally confused on this point, but when I did a cat /proc/mdstat it showed the array "rebuilding" so slowly that I was sure something was wrong.
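Rather than eyeballing cat /proc/mdstat, the degraded arrays can be picked out mechanically. This is only a sketch: degraded_arrays is a hypothetical helper name, and it assumes the usual mdstat layout where each array's status line ends in a bracketed map such as [UU] (all members up) or [U_] (a member missing).

```shell
# Print the name of every md array whose member map contains a "_",
# i.e. at least one member is missing or failed.
degraded_arrays() {
    mdstat="${1:-/proc/mdstat}"
    awk '
        # Remember which array the following status line belongs to.
        /^md[0-9]+ :/ { name = $1 }
        # Find the [UU_] style member map and report the array if it has a gap.
        match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_")) print name
        }
    ' "$mdstat"
}
```

Once a degraded array is known, the missing partition could be put back from the command line with something like mdadm /dev/md1 --add /dev/hde2 (device names here are purely illustrative), which avoids picking the wrong entry from a GUI list.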

Here my brilliance really shines through. Since the other partitions were delaying sync until the first one finished, I thought I would stave off as much damage as I could by shutting down the box. I can't recall if I tried this via telinit 0 or if I simply powered off the box in my haste. At any rate, I really wreaked havoc on my /home & /data RAID5 partitions. / on md0 (RAID1) was fine. The important partitions did not fare so well. reiserfsck --rebuild-tree did its best to salvage the carnage, but a lot of damage was done. I quickly determined that a restore of /home & /data from my external backup drive would be necessary. [big sad sigh]

Well, if I was going to go through that grief, I figured I might as well just rebuild the whole box with Hardy Heron from a fresh CD install, using the ext3 file system instead of reiserfs since any further development of reiserfs is almost certainly at an end.

And thus it began.

Installing from the 8.04 LTS CD is really quick and fairly painless, aside from manually setting up the RAID partitions, and even that goes pretty fast (once you've gone through it about a dozen times). This is where most of my troubles began. I would set up the RAID arrays & partitions during the install but it would go crazy. The arrays would start rebuilding before the process was completed. RAID devices would show up that weren't even added during the partitioning process. On & on the troubles went.

I can't tell you how many times I tried getting things to work and how many different approaches I took to the problem. I will save you the gory details. The fix is rather arcane and it took forever to figure out. Google was not my friend on this matter. Am I the only one to have these issues? Lucky me...

Here is the problem: Even though I would completely delete the partitions and format them with ext3 instead of reiserfs it didn't fix anything. What was happening is that the install program was seeing the old RAID superblocks from my original setup and using them to rebuild arrays during the install process. This had dreadful effects. I had to get rid of those old RAID superblocks and start fresh. Enter Knoppix.

Initially, I thought I would completely wipe the three drives with the following command from the Knoppix CLI:

dd if=/dev/zero of=/dev/hda (and the same again for hde & hdg)

As each drive is 320GB, this would have taken FOR-EV-ER. Forget it. Next please...

Note: If you ever need to use the following procedure, don't delete your partitions before running these commands (they operate on the partition devices, so you won't be able to run them afterwards).

You can probably do the same thing from the install CD by exiting to a shell, but Knoppix booted to init 2 was fine for my purposes...

If you are doing this from the install CD, do the following first:
Make sure the RAID devices are not mounted (i.e. umount /dev/md0 etc.)
sudo mdadm --stop /dev/md0 (repeat until all RAID arrays are stopped, i.e. md1, md2, etc.)

Using either Knoppix or the install CD, kill the RAID superblocks:
mdadm --misc --zero-superblock /dev/hda1 (or sda1 if the distro installed shows your IDE drives as SCSI.)

Repeat for each RAID partition on each of the drives! For example:

mdadm --misc --zero-superblock /dev/sda1
mdadm --misc --zero-superblock /dev/sda2
mdadm --misc --zero-superblock /dev/sdb1
mdadm --misc --zero-superblock /dev/sdb2

You get the idea...
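With three drives and several partitions each, the stop-then-zero routine above is easy to wrap in a loop. The sketch below is an illustration, not what was actually typed at the time: the helper names run_cmd, stop_arrays and zero_superblocks and the DRY_RUN variable are all made up, and by default it only prints the mdadm commands so the list can be checked before anything destructive happens.

```shell
# DRY_RUN=1 (the default here) only prints each command.
# Set DRY_RUN=0 and run as root to execute them for real.
run_cmd() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop every array given as an argument, e.g. /dev/md0 /dev/md1.
stop_arrays() {
    for md in "$@"; do
        run_cmd mdadm --stop "$md"
    done
}

# Zero the RAID superblock on every partition given as an argument.
zero_superblocks() {
    for part in "$@"; do
        run_cmd mdadm --misc --zero-superblock "$part"
    done
}
```

For example, stop_arrays /dev/md0 /dev/md1 followed by zero_superblocks /dev/sda1 /dev/sda2 /dev/sdb1 /dev/sdb2 prints the six commands in order. The device names are placeholders; substitute whatever your own system shows.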

OK. The partitioning problem is solved. You're back to installing via CD and partitioning is working just as you want it to. The rest of the installation process runs smooth as glass.

Enter problem two...

Upon completion of the installation, the system WILL NOT BOOT?!?!

And when I say it won't boot, I mean not at all. Grub doesn't even try to load. I was stuck at "Booting CD" and it just hung there!

Unbelievable. I tried reinstalling Grub from the install CD in Rescue Mode to no avail. I tried nuking the MBR on each drive via dd if=/dev/zero of=/dev/hda bs=512 count=1 (same command for the other two drives, hde & hdg) and then attempted to install Grub again... Nothing I did mattered. It would not boot!
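One caveat about that dd command: on an MBR disk the first 512-byte sector holds the partition table (bytes 446 to 509) as well as the boot loader code (bytes 0 to 445), so bs=512 count=1 zeroes both. A gentler variant, sketched here as an assumption and not what was run above, clears only the boot code. The function name wipe_boot_code is hypothetical, and it is safest to try it on a scratch image file before pointing it at a real drive.

```shell
# Zero only the first 446 bytes (the boot loader area) of a disk or image,
# leaving the partition table and the 0x55AA signature untouched.
wipe_boot_code() {
    target="$1"
    # conv=notrunc keeps a regular file (disk image) from being truncated;
    # it makes no difference when the target is a block device.
    dd if=/dev/zero of="$target" bs=446 count=1 conv=notrunc 2>/dev/null
}
```

On the real box this would be called as wipe_boot_code /dev/hda (as root), then repeated for the other drives.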

At this point I gave up on installing Ubuntu 8.04 LTS Hardy Heron server edition.

I was beaten.

Since Dapper Drake 6.06 LTS is still supported (until June 2011) and I had the install disc, I decided to give it a go. After all the time I'd wasted already, why not?

It installed perfectly. It booted perfectly. It updated via apt-get perfectly.

After I determined I wasn't dreaming and I had not as yet spent the hours it would take to restore the backup files to the server drives, I figured I would try a network upgrade from 6.06 to 8.04 LTS (since you can skip all the intermediate releases when going from one LTS version to the next).

It worked!

Everything appears to be in order. It booted up just fine. The network started properly. The RAID arrays are running. cat /proc/mdstat indicates no problems with them. All is well so far.

I just had to make a slight change in the (official) upgrade process:

sudo apt-get install update-manager-core

sudo do-release-upgrade --mode=server


Without --mode=server it didn't think there was an upgrade available.

Time to install Webmin (it just makes life easier). Don't use apt-get to install Webmin; get it from the main site. This is what I did to get it rolling:

sudo nano /etc/apt/sources.list

Add the following lines:

deb http://us.archive.ubuntu.com/ubuntu/ hardy universe
deb-src http://us.archive.ubuntu.com/ubuntu/ hardy universe
deb http://security.ubuntu.com/ubuntu hardy-security universe
deb-src http://security.ubuntu.com/ubuntu hardy-security universe
# deb http://us.archive.ubuntu.com/ubuntu/ hardy-backports main restricted universe multiverse
# deb-src http://us.archive.ubuntu.com/ubuntu/ hardy-backports main restricted universe multiverse

Next run the following:

wget -v http://some-mirror/sourceforge/webadmin/webmin_some-version_all.deb

md5sum webmin_some-version_all.deb (and check it against the hash listed at webmin.com)
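Comparing a 32-character hash by eye is error-prone, so the check can be scripted. This is a minimal sketch; verify_md5 is a made-up helper name, and the expected hash has to be copied by hand from webmin.com (in lowercase, since that is what md5sum prints).

```shell
# Succeed (exit status 0) only if the file's MD5 matches the expected hash.
verify_md5() {
    file="$1"
    expected="$2"
    # md5sum prints "<hash>  <filename>"; keep only the hash.
    actual=$(md5sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}
```

It could then gate the install, along the lines of: verify_md5 webmin_some-version_all.deb <hash-from-webmin.com> && echo "hash OK".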

Follow these instructions when ready:

sudo apt-get update

sudo apt-get install perl libnet-ssleay-perl openssl libauthen-pam-perl libpam-runtime libio-pty-perl libmd5-perl

sudo dpkg --install webmin_some-version_all.deb

And that's it. I restored my files to the /data & /home partitions, configured my Samba shares and all is right with the world.

For now...

2 comments:

phillip murphy said...

hi i see this post from a while ago http://explicate.blogspot.com/2005/12/building-ubuntu-510-server.html talking about doing software raid with webmin -- how do you do this?

i too have been having problems with 8.04 ubuntu

i have it running but with my raid 1 setup if i pull a drive out and test to see if it will reboot - it does not...

my last ditch effort is here to get this running with raid 1 http://ubuntuforums.org/showthread.php?t=855656

thoughts on how i can get my situation fixed?

many thanks erv - intoitall@yahoo.com
