Solaris 10 on mirrored disks

Posted by Dick on September 27, 2007

Solaris 10 Update 4 is out, and so is GlassFish v2. First we need to get our
OS on.

My test x86 machine is a 3GHz P4 with 1GB RAM and twin 40GB disks.
The disks are a bit pokey, but having 2 makes playing around with RAID and ZFS more fun.

Since ZFS root isn’t here yet, I’ll use Solaris Volume Manager (SVM) to mirror the root
filesystem. Applications, /export/home, etc. will live on a ZFS mirrored pool.

(NB: the procedure to install Solaris Express is almost identical, except you can skip the PCA step)

sunshine in a bag

I got the Solaris 10 Update 4 DVD ISO and burnt it off.
The install is straightforward, with a couple of caveats:

  • SVM can only mirror slices on Solaris fdisk partitions, so make 1 big Solaris primary partition.
  • only install onto the first disk (c1d0) – we’ll add the second one later.
  • choose ‘custom install’ to choose your disk layout
  slice  file system  size     notes
  0      /            6000MB
  1      swap         1100MB   (must be bigger than RAM to save crashdumps)
  3      /metadb      10MB     (this is just to reserve the space for SVM bookkeeping)
  7      /zfs         32035MB  (the rest of the disk will be a ZFS storage pool)
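
The sizing notes in that table are just arithmetic, so here is a rough sketch of the sanity checks I'd do before committing to a layout. The helper names are made up for illustration, and it assumes a POSIX shell (ksh or bash) rather than the old /bin/sh.

```shell
# Hypothetical helper: is the swap slice big enough to hold a crash dump?
# Both sizes are in MB; the rule of thumb from the table is swap > RAM.
swap_fits_dump() {
    swap_mb=$1
    ram_mb=$2
    [ "$swap_mb" -gt "$ram_mb" ]
}

# Hypothetical helper: add up the planned slice sizes (MB).
total_mb() {
    sum=0
    for size in "$@"; do
        sum=$((sum + size))
    done
    echo "$sum"
}

# 1GB of RAM is 1024MB, and the swap slice above is 1100MB.
if swap_fits_dump 1100 1024; then
    echo "swap ok"
fi

# Total space claimed by slices 0, 1, 3 and 7.
total_mb 6000 1100 10 32035
```

The total should come out just under the 40GB the disk advertises, which is what you'd expect once the fdisk overhead is taken off.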

Note I haven’t set up a slice for Live Upgrade. I’ll detach one submirror before an upgrade, then I can roll back or keep the upgrade by choosing which way to resync them afterwards.

I chose ‘Entire Distribution’, then went off to find a sandwich and play a bit of Hotel Dusk.

After the reboot, you can log in as root,
unmount /metadb (c1d0s3) and /zfs (c1d0s7), remove them from /etc/vfstab,
and delete the mountpoints (you could just set them up later, but the installer is a bit easier to explain than the ‘format’ command).
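
If you'd rather not hand-edit vfstab, something like the following would do the cleanup. It is only a sketch: drop_vfstab_entries is a made-up helper, it filters a copy rather than editing in place, and on Solaris 10 itself you would probably want nawk, since the stock /usr/bin/awk is the old one and may not take -v.

```shell
# Hypothetical helper: drop vfstab entries for the given block devices.
# Reads a vfstab on stdin and writes the filtered copy to stdout.
# Comment lines and all other entries pass through untouched.
drop_vfstab_entries() {
    awk -v devs="$*" '
        BEGIN {
            n = split(devs, d, " ")
            for (i = 1; i <= n; i++) drop[d[i]] = 1
        }
        !($1 in drop)
    '
}

# On the real box (not run here) the whole step would look roughly like:
#   umount /metadb /zfs
#   cp /etc/vfstab /etc/vfstab.orig
#   drop_vfstab_entries /dev/dsk/c1d0s3 /dev/dsk/c1d0s7 \
#       < /etc/vfstab.orig > /etc/vfstab
#   rmdir /metadb /zfs
```

Keeping /etc/vfstab.orig around means a botched edit is one cp away from being undone.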

slice up the second disk

We’ll set the second disk to have 1 Solaris fdisk partition.
Pipe the disklabel from c1d0 onto c2d0 so the slice sizes on both are identical:

  fdisk -B /dev/rdsk/c2d0p0
  prtvtoc /dev/rdsk/c1d0s2 | fmthard -s - /dev/rdsk/c2d0s2
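
It's worth checking the copy took before going any further. A minimal sketch, assuming prtvtoc prints its usual layout ('*' comment lines, then one row per slice with partition, tag, flags, first sector, sector count and last sector); the helper names and the /tmp paths are my own, and it wants a POSIX shell (ksh or bash).

```shell
# Hypothetical helper: print just the slice rows from prtvtoc output on stdin.
# Skips the '*' comment header and ignores the trailing mount directory,
# which will differ because nothing on the second disk is mounted yet.
vtoc_slices() {
    awk '$1 !~ /^\*/ && NF >= 6 { print $1, $2, $3, $4, $5, $6 }'
}

# Hypothetical helper: succeed if two saved prtvtoc dumps describe the same slices.
same_layout() {
    [ "$(vtoc_slices < "$1")" = "$(vtoc_slices < "$2")" ]
}

# On the real box (not run here):
#   prtvtoc /dev/rdsk/c1d0s2 > /tmp/c1d0.vtoc
#   prtvtoc /dev/rdsk/c2d0s2 > /tmp/c2d0.vtoc
#   same_layout /tmp/c1d0.vtoc /tmp/c2d0.vtoc && echo "slice tables match"
```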

We also need to install grub, so it’s bootable if the first disk dies:

/sbin/installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2d0s0
  stage1 written to partition 0 sector 0 (abs 16065)
  stage2 written to partition 0, 260 sectors starting at 50 (abs 16115)

Add an entry for c2d0 in /boot/grub/menu.lst:

  # second half of SVM root mirror
  title alternate root
  root (hd1,0,a)
  kernel /platform/i86pc/multiboot
  module /platform/i86pc/boot_archive

  title alternate root failsafe
  root (hd1,0,a)
  kernel /boot/multiboot kernel/unix -s
  module /boot/x86.miniroot-safe

setting up the state databases

SVM stores its config on-disk, in state database replicas.
You need at least half of them to be online at any given time, which means I
need 2 copies on each disk (each is about 4MB, hence the 10MB /metadb slice I set aside):

metadb -a -f -c 2 c1d0s3 c2d0s3

which says:

  • add some state database replicas (-a)
  • it’s ok that there aren’t any existing replicas (-f)
  • there’ll be 2 database replicas on each device (-c 2)
  • and use the slices we set aside earlier (c1d0s3 c2d0s3)
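
To see why 2 replicas per disk is the number to aim for, here is the quorum arithmetic as a sketch. The three outcomes are my reading of the SVM documentation (a majority lets the box boot unattended, exactly half keeps a running system up, fewer than half is fatal), so check it against the docs for your release; replica_state is a made-up name, and it needs a POSIX shell (ksh or bash).

```shell
# Hypothetical helper: replica_state TOTAL AVAILABLE
# Classifies how many state database replicas are still reachable.
replica_state() {
    total=$1
    avail=$2
    if [ $((avail * 2)) -gt "$total" ]; then
        echo "majority"      # more than half: boots unattended
    elif [ $((avail * 2)) -eq "$total" ]; then
        echo "half"          # exactly half: keeps running
    else
        echo "below-half"    # fewer than half: trouble
    fi
}

replica_state 4 4   # both disks healthy
replica_state 4 2   # one of the two disks has died
```

With an even split across 2 disks, losing either one leaves you at exactly half, which is the best a two-disk box can do.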

Check they got created OK:

metadb
      flags           first blk       block count
   a        u         16              8192            /dev/dsk/c1d0s3
   a        u         8208            8192            /dev/dsk/c1d0s3
   a        u         16              8192            /dev/dsk/c2d0s3
   a        u         8208            8192            /dev/dsk/c2d0s3

The ‘u’ flag means the replica is up to date (‘metadb -i’ gives a legend).
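
The block counts also confirm the sizing. A small sketch that totals them per slice, assuming the column layout shown above (device last, block count just before it) and 512-byte blocks; replica_summary is a name I made up.

```shell
# Hypothetical helper: summarise `metadb` output read from stdin.
# Prints one line per slice: device, number of replicas, total size in MB.
replica_summary() {
    awk '$NF ~ /^\/dev\/dsk\// {
             count[$NF]++
             blocks[$NF] += $(NF - 1)
         }
         END {
             for (dev in count)
                 printf "%s %d replicas %d MB\n", dev, count[dev], blocks[dev] * 512 / 1048576
         }' | sort
}

# On the real box (not run here):
#   metadb | replica_summary
```

Two replicas of 8192 blocks each come to 8MB per slice, so the 10MB slice has a little room to spare.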

setting up the root RAID-1 mirror

I’ll use my existing root fs as one submirror, then hook up the second disk.

First we tell SVM about the (existing) root slice:

metainit -f d1 1 1 c1d0s0
  d1: Concat/Stripe is setup

which says:

  • make a volume called d1 (d1)
  • with one stripe (1)
  • with one component per stripe (1)
  • out of my existing root slice (c1d0s0)
  • oh, and yes, I know it contains a filesystem (-f)

We do the same thing for the second disk’s root slice (this is empty, so we don’t need ‘-f’):

metainit d2 1 1 c2d0s0
   d2: Concat/Stripe is setup

Now we create a mirror volume made up of the populated submirror, d1:

metainit d0 -m d1
   d0: mirror is setup

which says:

  • make a volume called d0 (d0)
  • which is a mirror made up of volume d1 ( -m d1)

I’ll start using this volume as the root fs
before I attach the other submirror (if you’re going to fail, fail early).
The ‘metaroot’ command edits /etc/vfstab and /etc/system for you:

metaroot d0
reboot

And when it comes back up, we’re running on the logical device:

df -h /
  Filesystem             size   used  avail capacity  Mounted on
  /dev/md/dsk/d0         5.8G   3.1G   2.6G    56%    /

Last thing to do is attach the other half of the mirror:

metattach d0 d2

You can watch the mirror syncing up:

metastat -c
  d0               m  5.9GB d1 d2 (resync-15%)
      d1           s  5.9GB c1d0s0
      d2           s  5.9GB c2d0s0

Takes about 5 minutes, and that’s pretty much it.
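
If you'd rather not keep re-running metastat by hand, the percentage is easy to scrape. A sketch, assuming the ‘(resync-NN%)’ format shown above; resync_pct is a made-up helper and the 30 second poll interval is arbitrary.

```shell
# Hypothetical helper: pull resync percentages out of `metastat -c` output on stdin.
# Prints one number per resyncing mirror, and nothing once they are all done.
resync_pct() {
    sed -n 's/.*(resync-\([0-9][0-9]*\)%).*/\1/p'
}

# On the real box (not run here), wait for the sync to finish:
#   while pct=$(metastat -c | resync_pct); [ -n "$pct" ]; do
#       echo "resync at ${pct}%"
#       sleep 30
#   done
#   echo "mirror is in sync"
```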

multi-mirror swap shop

Up to you whether to do this – you can use the second swap device for more VM,
but mirroring should help if a disk dies while you’re running.
The process is very similar to the root slice:

metainit -f d51 1 1 c1d0s1
metainit d52 1 1 c2d0s1
metainit d50 -m d51
metattach d50 d52
swap -d /dev/dsk/c1d0s1
swap -a /dev/md/dsk/d50

Update /etc/vfstab to use /dev/md/dsk/d50 instead of /dev/dsk/c1d0s1.
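
The vfstab change is a one-field swap, sketched below. swap_to_mirror is a made-up helper that works on a copy rather than editing in place; on Solaris 10 itself you would probably want nawk in place of awk.

```shell
# Hypothetical helper: swap_to_mirror OLD_DEVICE NEW_DEVICE
# Rewrites a vfstab read from stdin so that entries whose first field is
# exactly OLD_DEVICE point at NEW_DEVICE. Everything else passes through.
swap_to_mirror() {
    old=$1
    new=$2
    awk -v old="$old" -v new="$new" '
        $1 == old { sub(/^[^ \t]+/, new) }
        { print }
    '
}

# On the real box (not run here):
#   cp /etc/vfstab /etc/vfstab.preswap
#   swap_to_mirror /dev/dsk/c1d0s1 /dev/md/dsk/d50 \
#       < /etc/vfstab.preswap > /etc/vfstab
```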

Setup the ZFS mirror

I want a ZFS mirror for home directories, apps, etc.
It’s not that I don’t trust SVM (although I don’t know it yet),
but it’s just a volume manager – you still have all the hassles of filesystems
on top of it, and if I wanted that I’d still be on Linux LVM.

zpool create tank mirror c1d0s7 c2d0s7
zpool status
    pool: tank
   state: ONLINE
   scrub: none requested
  config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0s7  ONLINE       0     0     0
            c2d0s7  ONLINE       0     0     0
  errors: No known data errors
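
For a quick health check from a script, you can boil that output down to one line per pool. A sketch, assuming the ‘pool:’ and ‘state:’ labels shown above; pool_states is a made-up helper.

```shell
# Hypothetical helper: read `zpool status` output on stdin and
# print "<pool> <state>" for each pool it describes.
pool_states() {
    awk '$1 == "pool:"  { pool = $2 }
         $1 == "state:" { print pool, $2 }'
}

# On the real box (not run here):
#   zpool status | pool_states
```

If all you want is a yes or no, ‘zpool status -x’ only reports pools that have problems.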

And that’s it.

Well, actually, no.

The next thing to do is pull out disks and check you can still boot
the machine. But this is getting a bit long-winded now, so that’ll be another
post.

no, honestly, you can stop reading now

My post-install checklist includes:

  • hooking the machine up for outbound email
      echo DSsmarthost.whatever.com >> /etc/mail/sendmail.cf
      echo 'root: me@whatever.com' >> /etc/mail/aliases
      svcadm restart sendmail
      newaliases
  • hardcode duplex settings
  • hook up pca
  • setup a firewall
  • setup NTP
      echo 'server time.apple.com' > /etc/inet/ntp.conf
      ntpdate -b time.apple.com
      svcadm enable ntp
  • create a user
      zfs create -o mountpoint=/export/home tank/home
      zfs create tank/home/dick
      useradd -c 'Dick Davies' -d /export/home/dick -s /usr/bin/zsh dick
      projadd -c 'Dick Davies' user.dick
      chown -R dick /export/home/dick
      passwd dick
  • switch on the (zone-friendly) Fair Share Scheduler
      dispadmin -d FSS
      reboot
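
The ‘create a user’ step is the one I repeat most, so here it is parameterised as a sketch. new_user_cmds is a made-up helper that only prints the commands; it assumes the tank/home dataset from the checklist already exists and keeps the zsh login shell from my own setup.

```shell
# Hypothetical helper: new_user_cmds LOGIN 'Full Name'
# Prints (does not run) the per-user commands from the checklist above.
new_user_cmds() {
    login=$1
    fullname=$2
    home=/export/home/$login
    cat <<EOF
zfs create tank/home/$login
useradd -c '$fullname' -d $home -s /usr/bin/zsh $login
projadd -c '$fullname' user.$login
chown -R $login $home
passwd $login
EOF
}

# Review the output first, then run the commands by hand as root.
new_user_cmds dick 'Dick Davies'
```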

It would be nice to Jumpstart this, and once we get a decent PXE solution that’ll be exactly what I’ll do. This will help no end.

Comments

  1. Anonymous Wed, 03 Oct 2007 17:34:51 GMT

    Great article on how to use SVM and thank you for actually explaining how the commands work rather than just listing the commands you used (like a lot of articles do).

    I’ve been looking for a way to combine SVM and LU and it appears that you may have the solution I’ve been looking for. In the above article, you stated: “Note I haven’t set up a slice for Live Upgrade. I’ll detach one submirror before an upgrade, then I can rollback or keep the upgrade by choosing which way to resync them afterwards.” This makes total sense to me and it sounds like it should work just fine. However, I’m not comfortable enough w/SVM (yet) to attempt this without seeing an example first.

    Could you please elaborate on this? Perhaps in a follow up article in a similar format to the one above?

    Thanks!

  2. Dick Davies Wed, 03 Oct 2007 21:35:16 GMT

    @anonymous : in fairness, all the options are documented in the respective manpages (but that doesn’t help if you don’t have a sun box already).

    I’ve been told LiveUpgrade does the splitting/resyncing of mirrors automatically, which gobsmacked me.
    Haven’t tried it yet, but when Xen is integrated into OpenSolaris (b75, in about a month) I’ll be Live Upgrading back to Solaris Express, so stay tuned :)

  3. Ceri Davies Fri, 12 Oct 2007 19:35:36 GMT

    Our JumpStart setup already does all of that stuff you mentioned, by the way (just not for x86 because of the PXE thing).
