Solaris 10 on mirrored disks

Posted by Dick on September 27, 2007

Solaris 10 Update 4 is out, and so is GlassFish v2. First we need to get our
OS on.

My test x86 machine is a 3GHz P4 with 1GB RAM and twin 40GB disks.
Disks are a bit pokey, but having 2 makes playing around with RAID and ZFS more fun.

Since ZFS root isn’t here yet, I’ll use Solaris Volume Manager (SVM) to mirror the root
filesystem. Applications, /export/home, etc. will live on a ZFS mirrored pool.

(NB: the procedure to install Solaris Express is almost identical, except you can skip the PCA step)

sunshine in a bag

I got the Solaris 10 Update 4 DVD ISO and burnt it off.
The install is straightforward, with a couple of caveats:

  • SVM can only mirror slices on Solaris fdisk partitions, so make 1 big Solaris primary partition.
  • only install onto the first disk (c1d0) – we’ll add the second one later.
  • choose ‘custom install’ so you can set your disk layout

  slice  file system  size     notes
  0      /            6000MB
  1      swap         1100MB   (must be bigger than RAM to save crash dumps)
  3      /metadb      10MB     (this is just to reserve the space for SVM bookkeeping)
  7      /zfs         32035MB  (the rest of the disk will be a ZFS storage pool)

Note I haven’t set up a slice for Live Upgrade. I’ll detach one submirror before an upgrade, then I can roll back or keep the upgrade by choosing which way to resync them afterwards.

I chose ‘Entire Distribution’, then went off to find a sandwich and play a bit of Hotel Dusk.

After the reboot, log in as root,
unmount /metadb (c1d0s3) and /zfs (c1d0s7), remove them from /etc/vfstab,
and delete the mountpoints (you could create those slices later with the ‘format’ command instead, but the installer is a bit easier to explain).

slice up the second disk

We’ll set the second disk to have 1 Solaris fdisk partition.
Pipe the disklabel from c1d0 onto c2d0 so the slice sizes on both are identical:

  fdisk -B /dev/rdsk/c2d0p0
  prtvtoc /dev/rdsk/c1d0s2 | fmthard -s - /dev/rdsk/c2d0s2
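It’s worth checking the label copy took before going any further. prtvtoc prints comment lines starting with ‘*’ (these include the device name, so they will always differ) followed by one line per slice, with an optional mount point in the last column (which also differs, since only c1d0’s slices are mounted). This sketch compares just the six numeric slice columns. It assumes that standard prtvtoc layout, and `vtoc_slices`/`vtoc_same` are made-up helper names, not Solaris commands.

```shell
# Print only the slice table from prtvtoc: skip '*' comment lines and keep
# the six numeric columns (slice, tag, flags, first sector, count, last sector),
# dropping the optional mount-point column.
vtoc_slices() {
  prtvtoc "$1" | awk '!/^\*/ && NF >= 6 { print $1, $2, $3, $4, $5, $6 }'
}

# Succeed if both disks have the same (non-empty) slice table.
vtoc_same() {
  a=$(vtoc_slices "$1")
  b=$(vtoc_slices "$2")
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage: `vtoc_same /dev/rdsk/c1d0s2 /dev/rdsk/c2d0s2 && echo labels match`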

We also need to install GRUB, so it’s bootable if the first disk dies:

/sbin/installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2d0s0
  stage1 written to partition 0 sector 0 (abs 16065)
  stage2 written to partition 0, 260 sectors starting at 50 (abs 16115)

Add an entry for c2d0 in /boot/grub/menu.lst:

  # second half of SVM root mirror
  title alternate root
  root (hd1,0,a)
  kernel /platform/i86pc/multiboot
  module /platform/i86pc/boot_archive

  title alternate root failsafe
  root (hd1,0,a)
  kernel /boot/multiboot kernel/unix -s
  module /boot/x86.miniroot-safe

setting up the state databases

SVM stores its config on-disk, in state database replicas.
SVM keeps running as long as at least half of them are online (and needs a majority to boot cleanly), so I’ll spread them evenly,
with 2 copies on each disk (each is about 4MB, hence the 10MB /metadb slice I set aside):

metadb -a -f -c 2 c1d0s3 c2d0s3

which says:

  • add some state database replicas (-a)
  • it’s ok that there aren’t any existing replicas (-f)
  • there’ll be 2 database replicas on each device (-c 2)
  • and use the slices we set aside earlier (c1d0s3 c2d0s3)
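To make the quorum rule concrete: as I read the SVM docs, the system keeps running with at least half of the replicas, panics with fewer than half, and needs a majority (half plus one) to reboot into multiuser mode. Here’s that arithmetic as a tiny sh sketch. `replica_quorum` is just an illustrative name, not an SVM command.

```shell
# SVM state database quorum rule:
#   more than half the replicas alive -> runs, and can boot multiuser
#   exactly half                      -> keeps running, won't boot multiuser
#   fewer than half                   -> panics
replica_quorum() {
  total=$1 alive=$2
  if [ $((alive * 2)) -gt "$total" ]; then
    echo "runs and boots"
  elif [ $((alive * 2)) -eq "$total" ]; then
    echo "keeps running, won't boot multiuser"
  else
    echo "panics"
  fi
}
```

With 4 replicas split 2/2, losing a disk leaves exactly half, so `replica_quorum 4 2` lands in the middle case: the box stays up, but a reboot will typically stop until you delete the dead disk’s replicas with `metadb -d`.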

Check they got created OK:

metadb
      flags           first blk       block count
   a        u         16              8192            /dev/dsk/c1d0s3
   a        u         8208            8192            /dev/dsk/c1d0s3
   a        u         16              8192            /dev/dsk/c2d0s3
   a        u         8208            8192            /dev/dsk/c2d0s3

The ‘u’ flag means the replica is up to date (‘metadb -i’ gives a legend).
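Eyeballing four lines is easy enough, but if you want a cron job to keep an eye on it, something like this would do. It’s a sketch that assumes the column layout shown above (flag letters, then first block, block count and the device last). `check_metadb` is a hypothetical helper: it counts replicas per device and exits non-zero if any replica is missing the ‘u’ flag.

```shell
# Read `metadb` output on stdin, print "device count" for each device holding
# replicas, and exit non-zero if any replica lacks the 'u' (up to date) flag.
check_metadb() {
  awk '
    $NF ~ /^\/dev\// {
      flags = ""
      for (i = 1; i <= NF - 3; i++) flags = flags $i   # glue the flag columns together
      count[$NF]++
      if (flags !~ /u/) stale++
    }
    END {
      for (dev in count) print dev, count[dev]
      exit (stale > 0)
    }'
}
```

Usage: `metadb | check_metadb || echo "stale replicas"`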

setting up the root RAID-1 mirror

I’ll use my existing root fs as one submirror, then hook up the second disk.

First we tell SVM about the (existing) root slice:

metainit -f d1 1 1 c1d0s0
  d1: Concat/Stripe is setup

which says:

  • make a volume called d1 (d1)
  • with one stripe (1)
  • with one component per stripe (1)
  • out of my existing root slice (c1d0s0)
  • oh, and yes, I know it contains a filesystem (-f)

We do the same thing for the second disk’s root slice (this is empty, so we don’t need ‘-f’):

metainit d2 1 1 c2d0s0
   d2: Concat/Stripe is setup

Now we create a mirror volume made up of the populated submirror, d1:

metainit d0 -m d1
   d0: mirror is setup

which says:

  • make a volume called d0 (d0)
  • which is a mirror made up of volume d1 ( -m d1)

I’ll start using this volume as the root fs
before I attach the other submirror (if you’re going to fail, fail early).
The ‘metaroot’ command edits /etc/vfstab and /etc/system for you:

metaroot d0
reboot

And when it comes back up, we’re running on the logical device:

df -h /
  Filesystem             size   used  avail capacity  Mounted on
  /dev/md/dsk/d0         5.8G   3.1G   2.6G    56%    /

Last thing to do is attach the other half of the mirror:

metattach d0 d2

You can watch the mirror syncing up:

metastat -c
  d0               m  5.9GB d1 d2 (resync-15%)
      d1           s  5.9GB c1d0s0
      d2           s  5.9GB c2d0s0
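If you’d rather not keep re-running metastat by hand, a small loop can pull the percentage out of the `metastat -c` line. This assumes the `(resync-NN%)` format shown above, and `resync_pct` and `wait_for_resync` are names I’ve made up.

```shell
# Print the resync percentage for mirror $1 from `metastat -c` output on stdin,
# or nothing if that mirror is not resyncing. Assumes the "(resync-NN%)" format.
resync_pct() {
  awk -v vol="$1" '$1 == vol && match($0, /resync-[0-9]+%/) {
    print substr($0, RSTART + 7, RLENGTH - 8)   # strip "resync-" and "%"
  }'
}

# Poll every 30 seconds until no resync percentage shows up for the mirror.
wait_for_resync() {
  while pct=$(metastat -c "$1" | resync_pct "$1"); [ -n "$pct" ]; do
    echo "$1: ${pct}% synced"
    sleep 30
  done
  echo "$1: no resync in progress"
}
```

Usage: `wait_for_resync d0`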

Takes about 5 minutes, and that’s pretty much it.

multi-mirror swap shop

Up to you whether to do this – you can use the second swap device for more VM,
but mirroring should help if a disk dies while you’re running.
The process is very similar to the root slice:

metainit -f d51 1 1 c1d0s1
metainit d52 1 1 c2d0s1
metainit d50 -m d51
metattach d50 d52
swap -d /dev/dsk/c1d0s1
swap -a /dev/md/dsk/d50

Update /etc/vfstab to use /dev/md/dsk/d50 instead of /dev/dsk/c1d0s1.
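If you’d rather script the vfstab change than do it in vi, awk can swap the device on the swap line (c1d0s1 on this box, the slice we just took out of service with `swap -d`). A sketch, assuming the standard seven-column vfstab layout with the FS type in column 4. `swap_to_md` is a hypothetical helper that only prints the result, so check it before copying it over /etc/vfstab.

```shell
# Print a vfstab-format file with the swap entry's device changed.
#   $1 = vfstab file, $2 = old device, $3 = new device
# Only lines whose first field is the old device and whose FS type
# (column 4) is "swap" are touched; awk re-joins that line with tabs.
swap_to_md() {
  awk -v old="$2" -v new="$3" 'BEGIN { OFS = "\t" }
    $1 == old && $4 == "swap" { $1 = new }
    { print }' "$1"
}
```

Usage: `swap_to_md /etc/vfstab /dev/dsk/c1d0s1 /dev/md/dsk/d50 > /tmp/vfstab.new && diff /etc/vfstab /tmp/vfstab.new`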

Set up the ZFS mirror

I want a ZFS mirror for home directories, apps, etc.
It’s not that I don’t trust SVM (although I don’t know it yet),
but it’s just a volume manager – you still have all the hassles of filesystems
on top of it, and if I wanted that I’d still be on Linux LVM.

zpool create tank mirror c1d0s7 c2d0s7
zpool status
    pool: tank
   state: ONLINE
   scrub: none requested
  config:

          NAME        STATE     READ WRITE CKSUM
          tank        ONLINE       0     0     0
            mirror    ONLINE       0     0     0
              c1d0s7  ONLINE       0     0     0
              c2d0s7  ONLINE       0     0     0

  errors: No known data errors
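`zpool status -x` is the quick way to ask whether anything is wrong (it just says all pools are healthy when they are). If a monitoring script wants the names of the broken bits, a rough parse of the config table works too. This sketch assumes the layout shown above (a NAME/STATE header, one pool, vdev or device per line, then the errors: line), and `zpool_unhealthy` is a made-up name.

```shell
# Read `zpool status` output on stdin and print "name state" for every pool,
# vdev or device in the config table that is not ONLINE.
# Exits non-zero if it found any.
zpool_unhealthy() {
  awk '
    $1 == "NAME" && $2 == "STATE" { inconfig = 1; next }
    $1 == "errors:"                { inconfig = 0 }
    inconfig && NF >= 5 && $2 != "ONLINE" { print $1, $2; bad = 1 }
    END { exit (bad ? 1 : 0) }'
}
```

Usage: `zpool status tank | zpool_unhealthy || echo "tank needs attention"`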

And that’s it.

Well, actually, no.

The next thing to do is pull out disks and check you can still boot
the machine. But this is getting a bit long-winded now, so that’ll be another
post.

no, honestly, you can stop reading now

My post-install checklist includes:

  • hooking the machine up for outbound email
      echo DSsmarthost.whatever.com >> /etc/mail/sendmail.cf
      echo 'root: me@whatever.com' >> /etc/mail/aliases
      svcadm restart sendmail
      newaliases
  • hardcode duplex settings
  • hook up PCA
  • set up a firewall
  • set up NTP
      echo 'server time.apple.com' > /etc/inet/ntp.conf
      ntpdate -b time.apple.com
      svcadm enable ntp
  • create a user
      zfs create -o mountpoint=/export/home tank/home
      zfs create tank/home/dick
      useradd -c 'Dick Davies' -d /export/home/dick -s /usr/bin/zsh dick
      projadd -c 'Dick Davies' user.dick
      chown -R dick /export/home/dick
      passwd dick
  • switch on the (zone-friendly) Fair Share Scheduler
      dispadmin -d FSS
      reboot
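Since the user-creation dance will happen more than once, it could be wrapped in a function. A sketch only: `mkuser` and `run` are names I’ve made up, it assumes tank/home already exists with /export/home as its mountpoint, and with DRY_RUN set it just prints the commands (without their original quoting, so treat that as a preview).

```shell
# Run a command, or just print it when DRY_RUN is non-empty.
run() {
  if [ -n "$DRY_RUN" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper for the user-creation steps above.
# Usage: mkuser LOGIN "Full Name" [SHELL]   (stops at the first failing step)
mkuser() {
  login=$1 name=$2 shell=${3:-/usr/bin/zsh}
  run zfs create "tank/home/$login" &&
  run useradd -c "$name" -d "/export/home/$login" -s "$shell" "$login" &&
  run projadd -c "$name" "user.$login" &&
  run chown -R "$login" "/export/home/$login" &&
  run passwd "$login"
}
```

Usage: `DRY_RUN=1 mkuser dick 'Dick Davies'` to preview, then drop DRY_RUN to do it for real.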

It would be nice to Jumpstart this, and once we get a decent PXE solution that’ll be exactly what I’ll do. This will help no end.

bonjour and solaris

Posted by Dick on September 12, 2007

I’ve been using zeroconf / bonjour / rendezvous
for a while now.
I don’t have a DNS server (it isn’t much use without static DHCP entries, which my linksys doesn’t do),
so multicast DNS is a neat way to do the same job (if you said /etc/hosts, you lose).

I could have built mDNSResponder for Solaris, but
Nevada b72 has a multicast DNS SMF service
(README).
My boxes are either Macs or Solaris, so this is very handy indeed.

we don’ need no steenkin servers

mDNS is perfect for ‘link-local’ domain names (local subnet only, e.g. a home network) where a DNS server
would just be a pain to maintain.

hypnotoad% ping hypnotoad.local
ping: unknown host hypnotoad.local
hypnotoad% svcadm enable multicast/dns
hypnotoad% grep mdns /etc/nsswitch.dns
hosts: files dns mdns
ipnodes: files dns mdns
hypnotoad% pfexec cp /etc/nsswitch.dns /etc/nsswitch.conf
hypnotoad% ping -s hypnotoad.local
PING hypnotoad.local: 56 data bytes
64 bytes from 192.168.1.101: icmp_seq=0. time=0.291 ms
^C

The mDNS-using machines on my LAN (== all of them) can resolve that name now, and I can resolve theirs. No DNS, no static DHCP entries (and no /etc/hosts), but the names stick to the machines, which is exactly what I need.

it’s called security. perhaps you’ve heard of it?

If this rings alarm bells (what, you don’t want my Mac posing as suicidegirls.com?)
then relax. mdnsd only responds to queries for ‘well-known’ zeroconf-related domains, specifically:

hypnotoad% svcprop -p nss_mdns_config/domain dns/multicast
local b.e.f.ip6.arpa a.e.f.ip6.arpa 9.e.f.ip6.arpa 8.e.f.ip6.arpa 254.169.in-addr.arpa
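To see why that’s reassuring, here’s a rough model of the rule: a name only gets the mDNS treatment if it is one of those domains or ends in one. This is my illustration in sh, not mdnsd’s actual code; `mdns_handles` is a made-up helper and the domain list is copied from the svcprop output above.

```shell
# Domains copied from nss_mdns_config/domain above.
MDNS_DOMAINS="local b.e.f.ip6.arpa a.e.f.ip6.arpa 9.e.f.ip6.arpa 8.e.f.ip6.arpa 254.169.in-addr.arpa"

# Succeed if NAME equals, or sits under, one of the configured domains.
mdns_handles() {
  # Normalise: drop any trailing dot and lower-case the name.
  name=$(printf '%s' "${1%.}" | tr '[:upper:]' '[:lower:]')
  for d in $MDNS_DOMAINS; do
    case $name in
      "$d" | *."$d") return 0 ;;
    esac
  done
  return 1
}
```

So `mdns_handles hypnotoad.local` succeeds, while a lookup for an ordinary internet name like example.com falls through to files and DNS as usual.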

You’ll need to allow multicast queries in and out of your firewall if you want to advertise anything,
so anyone using IP Filter will need to add these lines to /etc/ipf/ipf.conf:

pass in quick proto tcp/udp from any to 224.0.0.251 port = mdns
pass out quick proto tcp/udp from any to 224.0.0.251 port = mdns

(yup, ‘mdns’ made it into /etc/services)

Some people are twitchy about advertising their services (see below).
The way I see it, at worst it saves an attacker a port scan.
Nothing here makes your service less secure – you can still use SSL, authentication, etc.
(if you rely on hostnames for access control, you might want to pause before starting a debate about security).

this little light of mine, I’m gonna let it shine

Name resolution is nice, but the real point of zeroconf is
service discovery.

Wherever possible you want to avoid users having to remember URLs, hostnames, etc.
I’m particularly interested in advertising network services to clients – Macs, obviously,
but also Linux (ubuntu has particularly strong zeroconf support) and
even Windows boxen.

It’s extremely handy if you’re building network appliances.
I hated having to figure out what IP my NSLU2 had DHCPed for itself –
with mDNS installed, it just advertises its admin webapp as a service.

Things like Samba have their own browsing support, but there are plenty of other services
(FTP shares, ssh daemons, etc.) that can be made much more accessible with a little mDNS fairy dust.
Apple’s list has details.

The dns-sd(1) command line tool is a nice mDNS toolkit, and lets you tell mdnsd what services to advertise
(you also need to run the services you advertise, or your users will get pissed off pretty quickly).

For example,

dns-sd -R "Welcome, new users" _http._tcp . 80 path=/newusers/quickstart.html

will cause a new bookmark to pop up in Safari users’ Bonjour bookmarks menu, pointing to http://servername.local:80/newusers/quickstart.html

similarly, something like:

dns-sd -R "Sopranos" _nfs._tcp . 2049 path=/export/torrents

makes your NFS shares appear under ‘Network’ in the OS X 10.4 Finder.

(word to the wise: NFS over 100Mbit is fast enough for viewing full screen VLC. Samba isn’t)

dns-sd is pretty handy for testing, but for production you probably want something more seamless.

For NFS, a simple script could poll the sharetab and advertise what it found.
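The core of such a script might look like this: read the sharetab and print a dns-sd registration for each NFS share, using the last path component as the service name. It’s a one-shot sketch, assuming whitespace-separated sharetab fields with the path first and the filesystem type third (paths containing spaces would confuse it); `nfs_adverts` is a hypothetical name. A polling version could rerun it every few minutes and restart dns-sd when the output changes. dns-sd -R keeps running for as long as the registration should stay up, hence the trailing &.

```shell
# Print a dns-sd registration command for each NFS share in a sharetab-format
# file (default /etc/dfs/sharetab). Pipe the output to sh to actually run them.
nfs_adverts() {
  awk '$3 == "nfs" {
    n = split($1, parts, "/")
    name = (parts[n] != "") ? parts[n] : $1   # last path component as the name
    printf "dns-sd -R \"%s\" _nfs._tcp . 2049 path=%s &\n", name, $1
  }' "${1:-/etc/dfs/sharetab}"
}
```

Usage: `nfs_adverts | sh`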

Apache has mod_zeroconf or mod_dnssd
to automatically advertise UserDirs and VirtualHosts.
mod_dnssd is more full-featured, but relies on Avahi
(whose author is currently a bit miffed, incidentally),
so it might be simpler to try building the first one.

my protocols. let me show you them.

If any of this sounds interesting, I’d highly recommend
Stuart Cheshire’s book
– the first really good O’Reilly book I’ve read in years.

It nicely explains the low-level design of the protocols and how they strove to keep
it simple, portable and robust (yes, UPnP and Jini, I am looking at you) by reusing existing technologies as much as possible.

He gave a Google tech talk on zeroconf
that’s worth a look if you have an hour to kill, too.

geek christmas comes early

Posted by Dick on September 04, 2007

Solaris 10 Update 4 finally shipped today – the changelog lists all the nice features of Solaris Express I’ve been banging on about for the last year or so, including

plus a load of other stuff I didn’t get a chance to try yet, like

I’ll probably stick with Solaris Express on my laptop, mainly because the wireless/NWAM bits work so well. Plus I’m really looking forward to the
upcoming mdns (aka zeroconf) support in Nevada b72 to play nice with OSX.

Servers are another matter, though.
With iSCSI in a supported Solaris release, Thumpers finally make sense.
And GlassFish v2 is in final RC now, and should be out by the end of the month. A small GF cluster is high on my todo list, and by October I should have a fully-supported stack.

Oh, and tomorrow Brian Apple will be releasing his new liquid iPods which I hear are delicious.