zones, clones and lazybones

Posted by Dick on April 09, 2007

The third time you do something, you automate it. I’ve been building a lot of
zones lately.

Advance Wars ain’t gonna play itself

Creating a zone is straightforward, but can take a while:

  • get an IP and hostname reserved for it
  • configure a new zone (where to put it, IP address, etc.)
  • install it
  • boot it
  • give sysidconfig information (root pass, DNS setup, terminal type, timezone, etc.)
  • go in and customize it (set up RSA keys, disable services, etc.)
  • install whatever I wanted the zone for in the first place

sysidconfig and customizing the zone are the most involved (and error-prone) steps.

All my zones are configured identically (same DNS servers etc.), so
I’ll build a template zone tweaked to my ‘standard build’ and
clone zones from it.
This avoids a (slow) install, instead copying the ‘parent’ zonepath to the clone’s zonepath.
More importantly, any customizations made to the template will be present in the clones.

sysidconfig can be fed a sysidcfg
file (which contains answers to the setup questions) – we’ll do that too.

the mother of all zones

The first thing to do is build the template zone and customize it. Time spent on this step will be time saved later, so anything that makes life easier should go in.

First we do a standard zone config (each clone zone gets its own config later):

vera # zonecfg -z template
create
set zonepath=/zones/template
set autoboot=false
add net
  set physical=iprb0
  set address=1.2.3.4/17
end
commit
exit
vera # zoneadm -z template install

Next, log in and customize the zone.

vera # zoneadm -z template boot
vera # zlogin -C -e ^ template

My checklist is:

  1. change root’s shell to zsh and home dir to /root
  2. make root’s home directory
  3. give root a sane prompt and a decent pager
  4. copy my pubkey into /root/.ssh
  5. enable tcp port forwarding, rsa ssh logins only for root
  6. set up sendmail smarthost and aliases

Since I did all this for my glassfish zone
the other day, I can just copy config files from that:

vera # cp /zones/goldfish/root/etc/passwd /zones/template/root/etc/passwd
vera # mkdir /zones/template/root/root
vera # cp -Rp /zones/goldfish/root/root/.bash_profile /zones/template/root/root/
vera # cp -Rp /zones/goldfish/root/root/.ssh/ /zones/template/root/root/.
vera # cp /zones/goldfish/root/etc/ssh/sshd_config /zones/template/root/etc/ssh/sshd_config
vera # cp /zones/goldfish/root/etc/mail/sendmail.cf /zones/template/root/etc/mail/sendmail.cf
vera # cp /zones/goldfish/root/etc/mail/aliases /zones/template/root/etc/mail/aliases
vera # cp /zones/goldfish/root/etc/mail/aliases.db /zones/template/root/etc/mail/aliases.db

an answer for everything

All my zones:

  • have the same DNS config
  • have the root password disabled (I log in with ssh RSA keys or with zlogin)
  • don’t need the network interface setup (since they’re zones)

The only thing that changes between zones is the hostname (‘ZONENAME’), which I’ll change
when I copy my sysidcfg template
into the zone’s /etc directory.
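
A minimal sketch of what that substitution could look like. The keywords are ordinary sysidcfg ones, but every value (domain, name server, timezone, the password placeholder) is an assumption for illustration rather than the real template, and make_sysidcfg is a made-up name.

```shell
# Hypothetical helper: print a sysidcfg for one zone on stdout.
# Everything except the hostname is a placeholder value.
make_sysidcfg() {
    zonename=$1
    # The template text carries the literal string ZONENAME; sed swaps it out.
    sed "s/ZONENAME/${zonename}/g" <<'EOF'
system_locale=C
terminal=xterm
network_interface=PRIMARY { hostname=ZONENAME }
name_service=DNS { domain_name=example.com name_server=192.0.2.53 }
security_policy=NONE
timezone=GB
nfs4_domain=dynamic
root_password=PLACEHOLDER_HASH
EOF
}
```

Something like make_sysidcfg goldfish > /zones/goldfish/root/etc/sysidcfg would then drop the answers where the zone looks for them on first boot. The root_password line expects an encrypted hash; the placeholder here is not a working value.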

the payoff

Given the zone name and the IP of its interface, you can bang out zones in around 10 seconds with a ten-line shell script:

vera # time /zones/bang_one_out.sh goldfish 1.2.3.4/24
 Cloning snapshot tank/zones/template@SUNWzone1
 Instead of copying, a ZFS clone has been created for this zone.
ID NAME             STATUS     PATH                           BRAND    IP
 0 global           running    /                              native   shared
 6 goldfish       running    /zones/goldfish              native   shared
 - template         installed  /zones/template                native   shared
 real    0m8.593s
 user    0m0.229s
 sys     0m0.397s

That’s 9 seconds to configure, build and boot a zone to a state where you can SSH in as root – all services done, ssh keys generated, etc.

To be fair, /zones being on ZFS speeds things up tremendously (using ZFS clones for the copy). But not having to copy keys, edit ssh/sendmail/passwd configs is very nice.
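
The script isn’t reproduced in this post, so here is a rough sketch of how such a script might be put together, not the original. It assumes the template zone is called template and is halted, the NIC is iprb0, and a sysidcfg template containing the string ZONENAME lives at a hypothetical path; zone_config and bang_one_out are names invented for the sketch.

```shell
#!/bin/sh
# Sketch only. TEMPLATE, NIC and SYSIDCFG are assumptions, adjust to taste.
TEMPLATE=template
NIC=iprb0
SYSIDCFG=/zones/sysidcfg.template   # hypothetical path

# Print the zonecfg commands for a new zone (no side effects).
zone_config() {
    zone=$1 ip=$2
    cat <<EOF
create
set zonepath=/zones/${zone}
set autoboot=true
add net
set physical=${NIC}
set address=${ip}
end
commit
EOF
}

# Configure, clone, seed sysidcfg and boot. Needs root in the global zone.
bang_one_out() {
    zone=$1 ip=$2
    cfg=/tmp/${zone}.zonecfg
    zone_config "$zone" "$ip" > "$cfg" &&
    zonecfg -z "$zone" -f "$cfg" &&
    zoneadm -z "$zone" clone "$TEMPLATE" &&
    sed "s/ZONENAME/${zone}/g" "$SYSIDCFG" > "/zones/${zone}/root/etc/sysidcfg" &&
    zoneadm -z "$zone" boot &&
    zoneadm list -vc
    rm -f "$cfg"
}
```

Calling bang_one_out goldfish 1.2.3.4/24 as root would run the whole sequence; zone_config on its own just prints the config so you can read it first. Setting autoboot=true for the clones is a guess, since the template above uses false.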

feeping creaturism

It’s tempting to add to the script, add resource controls etc. (some semblance of fricking argument checking probably wouldn’t kill me) but there’s a project by much
smarter people doing that already.

Zone Manager
is the swiss army knife of zone administration.
I find the number of options a bit overwhelming myself, but
take a look if you’re in search of a good CLI tool for zone administration.

glassfish in a zone

Posted by Dick on March 29, 2007

It’s been a while since I cared about j2ee, but it looks like Glassfish runs rails via jruby
(and I like a free nano as much as the next guy). Reason enough to kick the tyres.

a plastic castle…

I like to run services in a dedicated zone
(especially if I’m expecting to try, hate and delete them over the course of 15 minutes).

vera # zonecfg -z goldfish
goldfish: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:goldfish> create
zonecfg:goldfish> set zonepath=/zones/goldfish
zonecfg:goldfish> set autoboot=true
zonecfg:goldfish> add net
zonecfg:goldfish:net> set address=10.0.0.5/24
zonecfg:goldfish:net> set physical=iprb0
zonecfg:goldfish:net> end
zonecfg:goldfish> commit
zonecfg:goldfish> exit
vera #
vera # zoneadm -z goldfish install
vera # zoneadm -z goldfish boot
vera # zlogin -C goldfish

The last command connects us to the zone’s console (where sysidconfig sets up timezone, root password, etc.).

…and a treasure chest

I want to create snapshots and filesystems within the zone, so
I’ll let the zone have 4GB of my zpool, using the ‘add dataset’ command.

vera # zfs create tank/delegated ; zfs create tank/delegated/goldfish
vera # zfs set quota=4G tank/delegated/goldfish
vera # zfs set mountpoint=none tank/delegated/goldfish
vera # zonecfg -z goldfish 'add dataset; set name=tank/delegated/goldfish;end'
vera # zoneadm -z goldfish reboot

NB: mountpoint must be ‘none’ or you’ll get an error:

   could not verify zfs dataset tank/delegated/goldfish: mountpoint cannot be inherited
   zoneadm: zone goldfish failed to verify

(this ensures we don’t inadvertently leak information into the zone).

The zone can now see the dataset (its parents are visible too but you obviously can’t access them).
Note the quota we set earlier.

goldfish # zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
tank                       24.7G  11.5G  1.50K  legacy
tank/delegated               49K  11.5G  24.5K  legacy
tank/delegated/goldfish  24.5K  4.00G  24.5K  none
goldfish # zfs create tank/delegated/goldfish/j2ee
goldfish # zfs set mountpoint=/j2ee tank/delegated/goldfish/j2ee
goldfish # zfs list tank/delegated/goldfish/j2ee
NAME                             USED  AVAIL  REFER  MOUNTPOINT
tank/delegated/goldfish/j2ee  24.5K  4.00G  24.5K  /j2ee

The global zone can still see the dataset, snapshot it, etc.
but it has a special property set – ‘zoned’ – which tells the global zone to ignore the mountpoint property (for security again).

vera # zfs list tank/delegated/goldfish/j2ee
NAME                             USED  AVAIL  REFER  MOUNTPOINT
tank/delegated/goldfish/j2ee  24.5K  4.00G  24.5K  /j2ee
vera # zfs get zoned tank/delegated/goldfish/j2ee
NAME                            PROPERTY  VALUE                           SOURCE
tank/delegated/goldfish/j2ee  zoned     on                              inherited from tank/delegated/goldfish
vera # ls /j2ee
/j2ee: No such file or directory
vera # zfs set mountpoint=/wee tank/delegated/goldfish/j2ee
cannot set property for 'tank/delegated/goldfish/j2ee': 'mountpoint' cannot be set on dataset in a non-global zone

You can manually unset the zoned property if you like doing really stupid things.

installing glassfish

Now I ssh into the zone,
download the Java EE 5 SDK Update 3 Preview with JDK
and run it.

Plenty of Java-based installers crap out if you don’t have X installed.
Sun thoughtfully added a ’-console’ flag to the installer to allow text-only installs.

More importantly, when the X installer craps out it tells you to try rerunning it with the ’-console’ option.

If that sounds like an obvious thing to do you clearly haven’t run many commercial software installers.

goldfish # chmod +x java_ee_sdk-5_03-preview-solaris-i586.bin
goldfish # ./java_ee_sdk-5_03-preview-solaris-i586.bin -console
  1. change the install directory to /j2ee
  2. set an admin username and password
  3. accept the defaults for everything else
   Product: Java Platform, Enterprise Edition 5 SDK
   Location: /j2ee
   Space Required: 265.56 MB
   ------------------------------------------------
   Java 2 SDK, Standard Edition 6.0
   Sun Java System Message Queue 4.0
   Sun Java System Application Server Platform Edition 9
   Sample Applications
   Java BluePrints
   Your First Cup: An Introduction to the Java EE Platform

   Ready to Install

When the installer completes, it tells you

  1. how to start the app server
  2. where the admin port is
  3. where the readmes are (in case the above doesn’t work)
  goldfish # /j2ee/bin/asadmin start-domain domain1

Browse to http://yourzonesip.com:4848 , log in with the admin credentials, job done.

is that it?

Course, you don’t have to install it in a zone, but it takes 5 minutes and this way you don’t have to worry about running it as non-root.

Zones are great for hosting several glassfish instances on one physical server.

This will make it much easier to try out things like clustering.

First impressions of Glassfish itself are:

  • the install was really painless
  • a Google found surprisingly good Solaris integration (SMF, RBAC, privileges)
  • the admin UI
    • is nice and responsive, especially considering vera (the global zone) is doing a lot of other work already
    • looks a lot nicer than Tomcat
  • autodeploy seems to actually work, which is always nice.

I still won’t be surprised to find it floating upside down tomorrow.

update

Well, it’s still there :) I’ve since
deployed Roller to it
and added some resource caps
(about 600MB seems to be much more than it needs).

Oh, and I won the Nano :D

ZIL communication

Posted by Dick on February 12, 2007

If you’re dealing with lots of small files, an
NFS and ZFS combination
can run slower than you’d like.

For example, untarring the apache source tree on the NFS client:

planb:$ time tar xvf httpd-2.2.4.tar.gz
real    2m9.594s
user    0m0.356s
sys     0m0.508s

Ouch.

BenR mentioned turning off the ZIL
(ZFS intent log) to speed up ZFS+NFS over aggregated Gbit ethernet – I doubted my 100Mbit link would have the same bottleneck for anything,
but it was worth a go.

Tell the ZFS/NFS server to switch off the ZIL:

vera # echo 'set zfs:zil_disable=1' >> /etc/system

Then either reboot, or run:

vera # echo 'zil_disable/W 1' | mdb -kw
vera # zpool export tank
vera # zpool import tank

In my case, I might as well reboot.

planb:$ mv httpd-2.2.4 outoftheway
planb:$ time tar xzf httpd-2.2.4.tar.gz
real    0m4.862s
user    0m0.368s
sys     0m0.440s

Holy crap.
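
For a rough sense of scale, the two ‘real’ times above can be turned into a ratio. This is only a back-of-the-envelope sketch: speedup is a made-up helper name, and the two runs used slightly different tar flags (xvf versus xzf), so treat the figure as approximate.

```shell
# Ratio of two wall-clock times, both given in seconds.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", before / after }'
}

# 2m9.594s is 129.594 seconds; 0m4.862s is 4.862 seconds.
speedup 129.594 4.862   # prints 26.7
```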

There are some
implications for the ‘correctness’ of this from the NFS client’s point of view, but on the ZFS box itself it’s non-lethal, so I think
I’ll keep it (I’m snapshotting the share three times a day, so I’m
reasonably safe if, or rather when, Linux shits itself).

where do you want these LUNs, love?

Posted by Dick on January 28, 2007

I posted the other day about
Solaris Express ZFS / iSCSI integration.
As I said, there’s no TPGT support in the ZFS offering yet,
so to present targets on specific interfaces you
need to use the iscsitadm commands directly.

As usual, the manpages are good, so treat this as an introduction to them.

start the daemon

The only setup needed is to assign a config directory:

vera # svcadm enable iscsitgt
vera # iscsitadm create admin -d /etc/iscsi

make the TPGT

An iSCSI TPGT (Target Portal Group Tag) is just a list of IP addresses.
When you assign a target to the TPGT, you make it reachable on those IPs.

I want my targets on iprb0, so I create a new TPGT and add iprb0’s IP address to it.

vera # ifconfig iprb0
....
        inet 1.2.3.4 ....
....
vera # iscsitadm create tpgt 1
vera # iscsitadm modify tpgt -i 1.2.3.4 1
vera # iscsitadm list tpgt -v 1
TPGT: 1
    IP Address: 1.2.3.4

sanity check

You are running a firewall, right? iSCSI is on tcp/3260, so a line like this in /etc/ipf/ipf.conf would be a good idea:

pass in quick on iprb0 proto tcp from iscsiclient to 1.2.3.4 port = 3260 flags S keep state

make the targets

I want eight targets, 2GB each. So I make a zvol for each, then make a target on it:

vera # zfs create tank/iscsiluns
vera # for i in first second third fourth fifth sixth seventh eighth
do
 zfs create -V 2G tank/iscsiluns/$i
 iscsitadm create target -b /dev/zvol/rdsk/tank/iscsiluns/$i $i
done

The important bit here is the ’-b’ flag to ‘back’ the targets with zvols.

The zvol targets are ‘onlined’ immediately.
By default, iscsitadm backs targets with files.
These are zero-filled before being started, so they take ages to online (and by default
they get created in your ‘admin’ directory which may fill your root filesystem).

vera # iscsitadm list target -v eighth
Target: eighth
    iSCSI Name: iqn.1986-03.com.sun:02:d456321a-e2f9-ca43-8537-88a0d23b4a33.eighth
    Connections: 0
    ACL list:
    TPGT list:
    LUN information:
        LUN: 0
            GUID: 0
            VID: SUN
            PID: SOLARIS
            Type: disk
            Size: 2.0G
            Backing store: /dev/zvol/rdsk/tank/iscsiluns/eighth
            Status: online

add targets to the TPGT

vera # for i in first second third fourth fifth sixth seventh eighth
do
iscsitadm modify target -p 1 $i
done
vera # iscsitadm list target -v eighth
Target: eighth
    iSCSI Name: iqn.1986-03.com.sun:02:d456321a-e2f9-ca43-8537-88a0d23b4a33.eighth
    Connections: 0
    ACL list:
    TPGT list:
        TPGT: 1
    LUN information:
        LUN: 0
            GUID: 0
            VID: SUN
            PID: SOLARIS
            Type: disk
            Size: 2.0G
            Backing store: /dev/zvol/rdsk/tank/iscsiluns/eighth
            Status: online

The volumes are of course still zvols, so you can snapshot/clone them as usual:

vera # zfs list -t volume
NAME                     USED  AVAIL  REFER  MOUNTPOINT
tank/iscsiluns/eighth   36.5K  18.1G  36.5K  -
tank/iscsiluns/fifth    36.5K  18.1G  36.5K  -
tank/iscsiluns/first    36.5K  18.1G  36.5K  -
tank/iscsiluns/fourth   36.5K  18.1G  36.5K  -
tank/iscsiluns/second   36.5K  18.1G  36.5K  -
tank/iscsiluns/seventh  36.5K  18.1G  36.5K  -
tank/iscsiluns/sixth    36.5K  18.1G  36.5K  -
tank/iscsiluns/third    36.5K  18.1G  36.5K  -
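
As a sketch of the ‘snapshot/clone as usual’ part: the helper below only prints the zfs commands, so you can read them before piping them to sh on the server. The snapshot name golden, the -copy suffix and the function name are all invented for the example.

```shell
# Print (do not run) the commands to snapshot a zvol and clone the snapshot.
# "golden" and "-copy" are made-up names for illustration.
snapshot_and_clone() {
    vol=$1
    echo "zfs snapshot ${vol}@golden"
    echo "zfs clone ${vol}@golden ${vol}-copy"
}

snapshot_and_clone tank/iscsiluns/first
```

Running snapshot_and_clone tank/iscsiluns/first | sh as root would actually do it. The clone is a new zvol, so presumably it would need its own ‘iscsitadm create target -b’ before an initiator could see it.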

ZFS for Linux (and OS X and Windows and BSD)

Posted by Dick on January 12, 2007

Ubuntu is a good Linux, but reiserfs let me down this autumn (nothing personal, the other linux filesystems suck too), and I had
a particularly stupid installer destroy ~/.mozilla last month.

I badly wanted a ZFS home directory (not badly enough to go near ZFS-FUSE; I’m not deranged :D ).
Nothing fancy, just a mirrored zpool for redundancy and regular snapshots to protect against pilot error.

(update: FreeBSD 7.0 now has excellent ZFS support which might be helpful if you don’t mind leaving Linux)

To avoid having to migrate fully, I moved my home directory to an NFS share on my Solaris machine (iSCSI wouldn’t help; I’d have to mkfs.crappyfs the LUN on the Linux end anyway. Besides, all the Linux iSCSI initiators I tried stank).

It works a treat.

planb$ pwd
/home/sisred
planb$ md5sum misc/oncall.txt
68db8407afbe4965918e3b425e2d5abd  misc/oncall.txt
planb$ rm misc/oncall.txt
planb$ echo 'shit'
shit
planb$ date
Sun Jan  7 21:16:12 GMT 2007
planb$ ls .zfs/snapshot/
[snip - lots of directories]
zfs-auto-snap-2007-01-05-06:00:00/
zfs-auto-snap-2007-01-05-12:00:00/
zfs-auto-snap-2007-01-05-18:00:00/
zfs-auto-snap-2007-01-06-00:00:00/
zfs-auto-snap-2007-01-06-06:00:00/
zfs-auto-snap-2007-01-06-12:00:00/
zfs-auto-snap-2007-01-06-18:00:00/
zfs-auto-snap-2007-01-07-00:00:00/
[snip - lots of directories]
planb$ cp -p .zfs/snapshot/zfs-auto-snap-2007-01-07-18\:00\:00/misc/oncall.txt misc/oncall.txt
planb$ md5sum misc/oncall.txt
68db8407afbe4965918e3b425e2d5abd  misc/oncall.txt
planb$ uname -rs
Linux 2.6.17-10-386
planb$

Course, this works just as well with any OS that can mount NFS.
The only differences from running directly on Solaris are speed (due to the 100Mbit link) and that I have to ssh
over to run zfs(1M).

Here are my notes (they’re mostly a straight paste, so ask if anything is unclear).

setup a dedicated link

This is the only remotely tricky bit, and it’s optional.
Both machines are on my desk, so I hooked them up with a crossover cable to keep NFS off the LAN.

I’m using NICs I found in a bin. Once they’re fitted:

both sides

I’ll choose a network of 10.10.10.0/30 for the link.
Each end gets a hostname of ‘red$host’ (the cable is red. Imagination bypass.).
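
A /30 leaves two host bits, which is four addresses: the network, the two ends of the cable, and the broadcast. If you’d rather not do the netmask arithmetic in your head, a small sketch like this works it out from the prefix length (prefix_to_netmask is a made-up name, not a standard tool):

```shell
# Convert a prefix length (0-32) to a dotted-quad netmask.
prefix_to_netmask() {
    prefix=$1
    mask=""
    for octet in 1 2 3 4; do
        # How many of the remaining prefix bits fall in this octet?
        if [ "$prefix" -ge 8 ]; then
            bits=8
        elif [ "$prefix" -gt 0 ]; then
            bits=$prefix
        else
            bits=0
        fi
        prefix=$((prefix - bits))
        # An octet with its top 'bits' bits set is 256 - 2^(8 - bits).
        value=$((256 - (1 << (8 - bits))))
        mask="${mask}${mask:+.}${value}"
    done
    echo "$mask"
}

prefix_to_netmask 30   # prints 255.255.255.252
```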

Add this to /etc/hosts on each end:

10.10.10.1  redsun
10.10.10.2  redlinux

and then disable firewalls on those interfaces (eth2 on Linux and e1000g0 on Solaris in my case).

the linux end

add an entry to /etc/network/interfaces

auto eth2
iface eth2 inet static
  address 10.10.10.2
  netmask 255.255.255.252

then bring up the interface with sudo ifup eth2

By default, Linux tosses a coin to choose the interface number at boot, so now and then eth0 and eth2 will
swap.
I’m sure this is useful to some people, but if you prefer a sane networking setup, you can tie them down in /etc/iftab:

eth0 mac 00:13:20:b2:22:23
eth2 mac 00:90:27:12:4f:11

the solaris end

Before you can use a NIC, you need to plumb it in (this hooks up all the kernel processing streams to it, hence the name).

Bring up the interface with ifconfig:

sun # ifconfig e1000g0 plumb
sun # ifconfig e1000g0 inet 10.10.10.1/30 up

For this to be done automatically at boot:

sun # echo redsun > /etc/hostname.e1000g0
sun # echo "10.10.10.0    255.255.255.252" >> /etc/inet/netmasks

You should be able to ping across the link now.

create a home filesystem

sun # zfs create tank/home
sun # zfs set compression=on tank/home
sun # zfs create tank/home/sisred
sun # zfs set mountpoint=/export/home/sisred tank/home/sisred

create a user

I gave up trying to persuade linux to do NFSv4, so I’m using v3.
That means the uids on each end need to match.

Since I’m moving an existing homedir, I create a Solaris user with matching UID/GID.
(run ‘id’ on the linux host – in my case, ‘1000’ for both).

sun # groupadd -g 1000 sisred
sun # useradd -c "Dick Davies" -g 1000 -u 1000 \
-d /export/home/sisred -m \
-s /usr/bin/bash sisred
sun # chown -R sisred:sisred ~sisred

No need for a password (I log in via key-based SSH).

share it out

sun # zfs set sharenfs=on tank/home/sisred
sun # svcadm enable nfs/server
sun # svcs -xv
sun #

The second command will make sure NFS starts at reboot. ‘svcs -xv’ will give useful output if something didn’t work.

Shares are read/write by default. The ‘sharenfs’ property can also take a string of options to share(1M)
so zfs set sharenfs=ro tank/home/sisred creates a readonly share, for example.
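
As another example of passing share options through, this sketch prints the command that would limit read/write access to a single client. It assumes the client name (redlinux here) resolves on the server; share_to_host is a made-up helper that only prints the command.

```shell
# Print (do not run) the command that shares a dataset rw to one NFS client.
share_to_host() {
    dataset=$1 client=$2
    echo "zfs set sharenfs=rw=${client} ${dataset}"
}

share_to_host tank/home/sisred redlinux
```

Pipe the output to sh on the Solaris box to apply it.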

move into your new $HOME

linux # sudo mkdir /nfshome
linux # sudo chown sisred:sisred /nfshome
linux # sudo apt-get install portmap # for NFS locking
linux # sudo mount redsun:/export/home/sisred /nfshome
linux # rsync -va /home/sisred/ /nfshome/

(You need to run the last command as the user due to NFS ‘root squash’).

If it looks OK, you can make it permanent by editing /etc/fstab and remounting:

redsun:/export/home/sisred /home/sisred  nfs proto=tcp        0       2

now you’re just showing off

If performance is a bit slow on small files, take a look here for tuning.

You need something to automate snapshots up at the Solaris end. Two options are:

  1. Tim Foster’s SMF snapshots
    - some good features (automated ‘zfs send -i’ / ‘zfs recv’ of snapshots to a remote host, cleanup of old snapshots etc.).
  2. roll your own
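
If you go the roll-your-own route, a minimal sketch might look like the following. It copies the zfs-auto-snap-YYYY-MM-DD-HH:MM:SS naming from the snapshot listing earlier, but the function names are invented, and it only prints the zfs command rather than running it.

```shell
# Build a snapshot name in the zfs-auto-snap-YYYY-MM-DD-HH:MM:SS style.
snapshot_name() {
    date "+zfs-auto-snap-%Y-%m-%d-%H:%M:%S"
}

# Print (do not run) the snapshot command for the given dataset.
snapshot_command() {
    echo "zfs snapshot ${1}@$(snapshot_name)"
}

snapshot_command tank/home/sisred
```

Run from root’s crontab on the Solaris end a few times a day, with the output piped to sh, this gives a basic schedule. It does no cleanup of old snapshots, which is one reason to prefer option 1.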

UPDATE: I’ve since switched to NFSv4, which was easy when you know how. See here