fast zone cloning on Solaris 10

Posted by Dick on October 11, 2007

Glassfish seems like a natural successor to Tomcat.
The clustering features look interesting, but I only have the one machine.

Hmm. I’m going to need a shitload of zones.

send in the clones

The ‘zoneadm clone’ command creates a zone by copying an existing zonepath (to avoid going through the install twice).
On Solaris Express, zones on ZFS can be cloned in about a second.
Solaris 10 (update 4) has to actually copy the files, so we’ll use a trick to avoid that.

the master plan

  • build 1 ‘template’ zone on ZFS
  • configure it to a ‘standard build’
  • take a ZFS snapshot of the zonepath
  • ZFS clone the snapshot N times to make N zonepaths
  • run zonecfg and hook up each zonepath
  • boot them
  • ssh in and install whatever you like
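The snapshot-and-clone part of that plan can be sketched as a dry run. This is only an illustration: `plan_clones` is a made-up helper name, the dataset names follow the ones used later in this post, and the commands are printed rather than executed so you can read them first.

```shell
#!/bin/sh
# Dry run: print the ZFS commands that snapshot the template once and
# clone it for every zone name given. Nothing is executed.
# plan_clones is a hypothetical helper, not part of Solaris.
plan_clones() {
    parent=$1       # dataset holding the zonepaths, e.g. vera/zones
    shift
    echo "zfs snapshot ${parent}/template@clean"
    for zone in "$@"; do
        echo "zfs clone ${parent}/template@clean ${parent}/${zone}"
    done
}

plan_clones vera/zones kingfish rippyfish turnipfish
```

One snapshot serves any number of clones, which is why only the clone line repeats.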

build your template zone

We’ll quickly make a bog-standard ‘whole root’ zone.

This takes more disk (and longer to install) than a sparse zone,
but gives you maximum flexibility (you can write to /usr, etc.).

zfs create -o mountpoint=/zones vera/zones
zfs create -o compression=on vera/zones/template
zonecfg -z template "create -b; \
    set zonepath=/zones/template ;\
    commit ; exit"
chmod 700 /zones/template/
time zoneadm -z template install

As I said, that takes a while (a sparse zone installs in about 5 minutes):

real    21m30.749s
user    1m18.566s
sys     3m35.917s

Good job we only have to do it once.

tweak it like you mean it

You could clone the zonepath now (skip ahead to ‘say cheese’), but
since I tend to set up my machines the same way, I’ll customize things first.

First thing to do is boot the zone, and complete the system identification.

zoneadm -z template boot
zlogin -C -e. template

The zlogin command means:

  • get me a console (-C) login to do system setup
    • sysconfig runs on the zone console, so a straight zlogin isn’t enough
  • type ’..’ (-e.) to be dropped back to the global zone
    • the default sequence is ~., which will also kill your ssh session to the global zone

You’ll see a counter as the SMF database is generated on first boot
(which takes a few minutes; again, we only need to do this in the template):

[Connected to zone 'template' console]
 37/138

Then go through the standard Solaris sysconfig
(doesn’t matter what you enter – this is overridden on a per-zone basis).

When that’s done, the zone will reboot itself (hit ’..’ to exit zlogin).

Now do your ‘standard build’. My list :

  • change root’s shell and prompt
  • copy my public SSH keys so I can ssh in as root
  • set up sendmail
  • turn off some daemons

Since that’s what I did for my original Solaris install,
I can just copy files to do most of this.

zlogin template usermod -s /usr/bin/bash root
cp /.bash_profile /zones/template/root/
cp /etc/ssh/sshd_config /zones/template/root/etc/ssh/sshd_config
cp -Rp /.ssh/ /zones/template/root/.ssh/
cp /etc/mail/sendmail.cf /zones/template/root/etc/mail/sendmail.cf
cp /etc/mail/aliases /zones/template/root/etc/mail/aliases
cp /etc/mail/aliases.db /zones/template/root/etc/mail/aliases.db
for i in webconsole sendmail autofs
do
zlogin template svcadm disable $i
done

say cheese

     zlogin template
     # sys-unconfig # this also halts the 'template' zone
     zoneadm -z template detach
     zfs snapshot vera/zones/template@clean
     zoneadm -z template attach

(the last ‘attach’ command makes patching the zone slightly easier).

going around the houses

Now we can use that to create a new zonepath for our DB zone, ganesh:

zfs clone vera/zones/template@clean vera/zones/ganesh

Life is a LOT easier if you separate your OS from your data, so I also give the zone its own ZFS filesystem (what we call ‘delegating a dataset’) to install its apps etc. on.
Note that although the zonepath is on ZFS, the zone is not ‘aware’ of that, so you can’t create ZFS filesystems on it.
Delegating a dataset also lets zone admins run their own snapshots etc. (snapping from the global zone works too, so choose your preference).

zfs create -o mountpoint=none vera/delegated/ganesh
zfs set quota=5G vera/delegated/ganesh

zonecfg supports ‘create -a’ to attach a pre-built zoneroot and generate a
config for it. We also:

  • set it to boot at system startup (‘autoboot’)
  • add a network address (‘add net’)
  • apply some simple resource controls (‘cpu-shares’, ‘max-lwps’, ‘capped-memory’)
  • hand over the delegated dataset (‘add dataset’)

zonecfg -z ganesh "create -a /zones/ganesh;set autoboot=true; \
    add net; set physical=iprb0; set address=10.1.0.1/24; end; \
    set cpu-shares=20; set max-lwps=400; \
    add capped-memory; set physical=400m; set swap=500m; end; \
    add dataset ; set name=vera/delegated/ganesh; end; \
    commit; exit"
zoneadm -z ganesh attach

Feed some prepared answers to sysconfig:

sed s/ZONENAME/ganesh/ \
/zones/scripts/sysidcfg.template > /zones/ganesh/root/etc/sysidcfg

and finally boot it:

zoneadm -z ganesh boot

attack of the clones

That’s the database taken care of.
We now have 3 more to do, and this is pretty easy to script.
I threw something together to do the job for me.
It’s pretty stinky (I don’t really speak shell) but it should be easy for you to roll your own.
You’ll need the script and the template for sysidcfg:

cd /zones/scripts
wget http://files.hellooperator.net/solaris/zones/s10/scripts/bang_one_out.s10u4.sh
wget http://files.hellooperator.net/solaris/zones/s10/scripts/sysidcfg.template
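A script like this presumably strings together the manual steps used for ganesh. The sketch below is a guess along those lines, not the contents of bang_one_out.s10u4.sh: it prints the commands instead of running them, and the function name and the address argument are made up for illustration.

```shell
#!/bin/sh
# Dry run of the per-zone steps shown for 'ganesh' above.
# Prints the commands; nothing is executed.
# bang_one_out_plan and its arguments are hypothetical.
bang_one_out_plan() {
    zone=$1     # new zone name, e.g. kingfish
    addr=$2     # address with prefix, e.g. 10.1.0.2/24
    pool=vera   # pool name used throughout this post

    echo "zfs clone ${pool}/zones/template@clean ${pool}/zones/${zone}"
    echo "zfs create -o mountpoint=none ${pool}/delegated/${zone}"
    echo "zonecfg -z ${zone} \"create -a /zones/${zone}; set autoboot=true;" \
         "add net; set physical=iprb0; set address=${addr}; end;" \
         "add dataset; set name=${pool}/delegated/${zone}; end; commit; exit\""
    echo "zoneadm -z ${zone} attach"
    echo "sed s/ZONENAME/${zone}/ /zones/scripts/sysidcfg.template" \
         "> /zones/${zone}/root/etc/sysidcfg"
    echo "zoneadm -z ${zone} boot"
}

bang_one_out_plan kingfish 10.1.0.2/24
```

Printing first is deliberate: a typo in a zone name is much cheaper to spot in six lines of text than in a half-built zone.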

Now the payoff:

time for i in kingfish rippyfish turnipfish
 do
   /zones/scripts/bang_one_out.s10u4.sh $i
 done
real    0m14.409s
user    0m2.459s
sys     0m1.097s
zoneadm list -iv
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              native   shared
   6 ganesh           running    /zones/ganesh                  native   shared
  25 kingfish         running    /zones/kingfish                native   shared
  27 rippyfish        running    /zones/rippyfish               native   shared
  29 turnipfish       running    /zones/turnipfish              native   shared

did you see that?

That’s 15 SECONDS to do what took 20 minutes the first time. Except these zones are configured and booted ready to ssh into.

Oh, and there are 3 of them.

I use zone cloning like Jumpstart: a way to
get a known, reproducible base OS as a building block for other things.

You can clone zones whatever FS they’re on, but it will take
longer to copy files than to snapshot+clone (especially for whole root zones).

The great thing about ZFS snapshots and clones is that a clone only uses disk space for the changes from its parent snapshot. It’s not obvious at the filesystem level:

du -hs  /zones/template /zones/ganesh
 2.1G   /zones/template
 2.3G   /zones/ganesh

But you can see it in the dataset (the ‘USED’ field below):

zfs list  vera/zones/template vera/zones/ganesh
NAME                  USED  AVAIL  REFER  MOUNTPOINT
vera/zones/ganesh    35.1M  28.6G  2.11G    /zones/ganesh
vera/zones/template  2.13G  28.6G  2.10G  /zones/template
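To put a number on it, you can compare USED with REFER for each dataset. A small sketch, fed with the listing above as sample input; `clone_overhead` is a made-up name, and the size parsing only handles the K/M/G/T suffixes (anything else is treated as megabytes).

```shell
#!/bin/sh
# For each dataset in 'zfs list' style input (header line included),
# print USED as a percentage of REFER.
# clone_overhead is a hypothetical helper. On Solaris 10 you may need
# nawk instead of awk, since the old awk lacks user-defined functions.
clone_overhead() {
    awk '
    # Convert a ZFS human-readable size (K/M/G/T suffix) to megabytes.
    function mb(s,   n, u) {
        n = s + 0
        u = substr(s, length(s))
        if (u == "K") return n / 1024
        if (u == "G") return n * 1024
        if (u == "T") return n * 1024 * 1024
        return n    # no known suffix: assume megabytes
    }
    NR > 1 { printf "%s %.1f%%\n", $1, 100 * mb($2) / mb($4) }
    '
}

clone_overhead <<'EOF'
NAME                  USED  AVAIL  REFER  MOUNTPOINT
vera/zones/ganesh    35.1M  28.6G  2.11G    /zones/ganesh
vera/zones/template  2.13G  28.6G  2.10G  /zones/template
EOF
```

For the listing above, ganesh comes out at about 1.6%: the clone owns very little of what it can see.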

Finally, remember you can clone any zone.
A common problem we have is our test and dev systems getting out of step with our production boxes.
If they’re zones (and they will be if I have a say in it), you can easily clone the live box (and its database zone) to get a testbed for upgrades, config changes, etc. that is as close to reality as you can get.

sharing JVMs across zones

Posted by Dick on May 27, 2007

My (b33) glassfish v2 build is a bit long in the tooth (the latest promoted build is b50).

It bundles a JVM, but the nightly builds don’t, so my first pre-upgrade step is to install a standalone 1.6 JVM.

a communal JVM

I’ll need a few zones for playing with glassfish clustering and they’ll all need JVMs. Ideally, we want a central copy in the global zone that is visible from all the local zones.
That way:

  • all zones use the same on-disk binaries, so the VM system can re-use text segments across your zones (and you save a bit of disk)
  • you centralize upgrades/patches in the global zone
  • a readonly JVM encourages (ok, forces) you to put things like SSL keys and libraries in your glassfish domain directory, where they should be.

neatly packaged

Originally, this ‘howto’ involved getting the tarball from the JEE page, installing it to a zfs filesystem and loopback mounting it into each zone read-only.
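For reference, the loopback approach boils down to one ‘add fs’ resource per zone. A minimal sketch, assuming the JDK was unpacked to /export/jdk in the global zone and should show up at /opt/jdk inside the zone (both paths and the helper name are made up here); it only prints the zonecfg command.

```shell
#!/bin/sh
# Print a zonecfg command that loopback-mounts a global-zone directory
# read-only into a zone. lofs_plan and the paths are hypothetical.
lofs_plan() {
    zone=$1     # zone to receive the mount
    src=$2      # directory in the global zone
    dst=$3      # mount point as seen inside the zone
    echo "zonecfg -z ${zone} \"add fs; set dir=${dst}; set special=${src};" \
         "set type=lofs; add options [ro,nodevices]; end; commit; exit\""
}

lofs_plan goldfish /export/jdk /opt/jdk
```

The mount appears the next time the zone boots, and that reboot is part of the hassle.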

Today, I realized the Solaris packages for JDK 1.6 give us all the benefits of that with none of the hassle. They’ll install in the global zone and be visible (as part of /usr) in all local zones. I won’t even have to restart the zones.

Get JDK 6u1 from the Java SE download page
(the Solaris x86 packages: tar.Z on the ‘Download’ page).

Extract it in the global zone and add the packages it contains:

vera # uncompress jdk-6u1-solaris-i586.tar.Z
vera # tar xf jdk-6u1-solaris-i586.tar
vera # yes | pkgadd -d . SUNW*

It’s now visible in all the zones:

vera # zlogin goldfish
goldfish # java -version
java version "1.6.0_01"
Java(TM) SE Runtime Environment (build 1.6.0_01-b06)
Java HotSpot(TM) Client VM (build 1.6.0_01-b06, mixed mode, sharing)

The only minor niggle is that the /usr/java symlink can’t be edited in the local zones, but it’s easy enough to set a PATH explicitly if you want a different JVM.

postgresql on solaris express

Posted by Dick on April 21, 2007

I need a database to do anything useful with glassfish.
Here are my PostgreSQL install notes.

the install

I chose postgresql 8.2 as part of my SXCE install.
If you don’t have it already, you need to:

globalzone # cd /cdrom/Solaris_11/Product
globalzone # pkgadd -d . SUNWpostgr-82-client SUNWpostgr-82-contrib \
SUNWpostgr-82-docs SUNWpostgr-82-libs SUNWpostgr-82-server \
SUNWpostgr-82-server-data-root SUNWpostgr-82-tcl

I might as well run it in a zone (partly to keep things tidy in case I screw up).
With sparse zones, I only need to install packages in the global zone and all zones can use them (one set of packages to maintain == happy sysadmin).

I’ll use the zone cloning script I mentioned the other day, and slap on some resource caps while I’m at it:

globalzone # /zones/bang_one_out.sh elephantom 1.2.3.4/24
globalzone # zonecfg -z elephantom "set max-lwps=300; add capped-memory; set physical=400M; set swap=512M; end; exit;"

ZFS snapshots make backing up the DB a lot easier, so I’ll give the zone a chunk of my zpool to manage:

globalzone # zfs create tank/delegated/elephantom
globalzone # zfs set mountpoint=none tank/delegated/elephantom
globalzone # zfs set quota=5G tank/delegated/elephantom
globalzone # zonecfg -z elephantom 'add dataset; set name=tank/delegated/elephantom; end'

From here on, we treat the zone as we would any other server:

globalzone # zoneadm -z elephantom reboot
globalzone # zlogin elephantom
[Connected to zone 'elephantom' pts/2]
elephantom #

creating the database

PostgreSQL integrates nicely with Solaris -
there’s RBAC support (a ‘PostgreSQL administration’ profile for DBA tasks),
DTrace providers and SMF integration in recent SXCE builds:

elephantom # svcs postgresql
disabled       12:21:40 svc:/application/database/postgresql:version_81
disabled       12:21:40 svc:/application/database/postgresql:version_82

I’ll make a ZFS filesystem and tell the version_82 instance to use it:

elephantom # zfs create tank/delegated/elephantom/data
elephantom # zfs set mountpoint=/data tank/delegated/elephantom/data
elephantom # chown postgres:postgres /data
elephantom # svccfg -s postgresql:version_82 'setprop postgresql/data = /data'
elephantom # svcadm refresh postgresql:version_82

The rest of the install is the same as on any UNIX. Initialize the database cluster as usual:

elephantom # su - postgres
$ /usr/postgres/8.2/bin/initdb /data
....snip usual initdb messages.....
$ exit
elephantom #

It’s probably a good idea to take a snapshot now, before we tweak stuff.
Note we can do this from within the zone since we delegated the dataset to it:

elephantom # zfs snapshot tank/delegated/elephantom/data@pristine

The default config ( /data/postgresql.conf ) needs a few tweaks. I set:


   wal_sync_method = fsync
   full_page_writes = off
   listen_addresses = '*'
   # logfile is /data/server.log
   log_connections = on
   log_disconnections = on
   log_hostname = on

and edited /data/pg_hba.conf
to allow access from my glassfish zone
(all inter-zone traffic goes over loopback, so there’s no need to change your firewall).
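As an illustration of the kind of pg_hba.conf entry that does this, here is a sketch that prints one. The address 10.1.0.2 for the glassfish zone and the choice of md5 (password) authentication are assumptions, and `hba_line` is a made-up helper.

```shell
#!/bin/sh
# Print a pg_hba.conf line letting one host reach one database as one
# user with password (md5) authentication.
# hba_line is hypothetical; substitute your own glassfish zone address.
hba_line() {
    db=$1; user=$2; addr=$3
    printf 'host\t%s\t%s\t%s/32\tmd5\n' "$db" "$user" "$addr"
}

hba_line zonedb dbuser 10.1.0.2
```

The /32 mask limits the rule to that single address, so other zones on the same subnet still get refused.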

Now start the server via SMF:

elephantom # svcadm enable postgresql:version_82

and create the user and the db:

elephantom # su - postgres
$ PATH=/usr/postgres/8.2/bin:$PATH
$ createuser -PREDS dbuser
Enter password for new role:
Enter it again:
CREATE ROLE
$ createdb -O dbuser zonedb
CREATE DATABASE
$ exit

Finally, I’ll check I can login from the glassfish zone:

glassfishzone # /usr/postgres/8.2/bin/psql -h elephantom.mydomain -U dbuser zonedb
Password for user dbuser:
Welcome to psql 8.2.3, the PostgreSQL interactive terminal.
Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit
zonedb=> \q
glassfishzone #

I stuck a 400MB memory cap on the zone
- the above config seems quite happy in there.

thinking cap

Posted by Dick on April 14, 2007

Now that I finally got off my ass and upgraded to a recent Solaris Express,
I can have a proper look at Duckhorn.

Zones and resource management (RM) were made for each other, but setting them up together could be a bit of an involved process.
Project Duckhorn set out to properly integrate zones and RM.

hold still

I’ll use my glassfish zone, ‘goldfish’, as a guinea pig. Fire up zonecfg in the global zone:

vera # zonecfg -z goldfish
zonecfg:goldfish> set cpu-shares=20
zonecfg:goldfish> set max-lwps=200
zonecfg:goldfish> add capped-memory
zonecfg:goldfish:capped-memory> set physical=500m
zonecfg:goldfish:capped-memory> set swap=750m
zonecfg:goldfish:capped-memory> end
zonecfg:goldfish> exit
vera #

LWPs = ‘lightweight processes’, or threads. CPU shares are used to determine how much CPU the zone gets
(see below). We also set some limits on RAM and swap.

do what I mean, not what I say

So, reboot the zone to apply these limits.

vera # zoneadm -z goldfish reboot
zoneadm: zone 'goldfish': enabling system/rcap service failed: entity not found
vera #

Ah, I forgot to install rcapd (the daemon which enforces resource limits).

vera # mount /cdrom && cd /cdrom/Solaris_11/Product/
vera # yes | pkgadd -d . SUNWrcapr SUNWrcapu
[ output snipped ]

Try again:

vera # zoneadm -z goldfish boot
zoneadm: zone 'goldfish': WARNING: The zone.cpu-shares rctl is set but
zoneadm: zone 'goldfish': FSS is not the default scheduling class for
zoneadm: zone 'goldfish': this zone.  FSS will be used for processes
zoneadm: zone 'goldfish': in the zone but to get the full benefit of FSS,
zoneadm: zone 'goldfish': it should be the default scheduling class.
zoneadm: zone 'goldfish': See dispadmin(1M) for more details.
vera #

So, the zone boots, but you’re warned that you’ve forgotten something.

CPU shares are used by the ‘fair share scheduler’ (FSS) which is another of Solaris’ really nice features.

FSS : burstable cpu quotas

Full details are in the FSS documentation, but briefly:

FSS dishes out resources amongst active zones based on their relative number of CPU shares.

i.e. if ‘goldfish’ is the only zone asking for CPU, it gets all the CPU.

Only when there’s contention (e.g. goldfish goes bananas and I SSH into vera to reboot it) does FSS look at the shares.

The global zone gets 1 share by default, so the calculation is:

  • goldfish (20 shares) gets ( 20 / (20 +1) = ) about 95% CPU
  • vera (1 share) gets ( 1 / (20 + 1) = ) about 5% CPU

which will hopefully be enough to kill the runaway process or halt the goldfish zone.
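The same sum works for any mix of busy zones. A small sketch (`fss_pct` is a made-up helper): the first argument is the zone you care about, the rest are the shares of every other zone competing for CPU at that moment.

```shell
#!/bin/sh
# CPU entitlement under contention:
#   this zone's shares / total shares of all zones wanting CPU.
# fss_pct is a hypothetical helper, not a Solaris command.
fss_pct() {
    mine=$1
    total=0
    for s in "$@"; do           # "$@" includes our own shares
        total=$((total + s))
    done
    awk -v m="$mine" -v t="$total" 'BEGIN { printf "%.1f\n", 100 * m / t }'
}

fss_pct 20 1    # goldfish against the global zone
fss_pct 1 20    # the global zone against goldfish
```

Idle zones don’t count towards the total, which is what makes the quota ‘burstable’.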

If you need hard maximums and minimums (for licensing purposes), look at CPU caps.
Both allocation strategies work on fractions of CPUs (although zoneadm will transparently manage processor sets and resource pools if you prefer to use those).

tick tock

To make FSS the default scheduler:

vera # dispadmin -d
dispadmin: Default scheduling class is not set
vera # dispadmin -d FSS
vera # dispadmin -d
FSS     (Fair Share)
vera # reboot

the brown noise

We now have some reasonable protection against a runaway zone taking the global zone
with it.

To test this I’m using a shell-based forkbomb –
the following 13 characters are poison to any unix box you paste them into.

:(){ :|:& };:

Log in to vera and run ‘prstat -Z’ to watch the zones (global and goldfish).
Then forkbomb ‘goldfish’:

goldfish # :(){ :|:& };:
-bash: fork: Resource temporarily unavailable
-bash: fork: Resource temporarily unavailable
-bash: fork: Resource temporarily unavailable
...
...

As expected, we’re hitting the max-lwps limit (200 LWPs). There’s no noticeable impact on vera.

If we remove that safety net:

vera # zonecfg -z goldfish
zonecfg:goldfish> clear max-lwps
zonecfg:goldfish> commit
zonecfg:goldfish> exit
vera # zoneadm -z goldfish reboot

And forkbomb again:

goldfish # :(){ :|:& };:
-bash: fork: Not enough space
-bash: fork: Not enough space
-bash: fork: Not enough space
...
...

This time, we hit the ‘capped-memory’ limit.

The load is tremendous – as high as 350 (!) – but
thanks to FSS (and that 1 CPU share) we can still run ‘zoneadm -z goldfish halt’.

doctor, it hurts when I do this

As a control, I’ll remove the memory capping (but keep the CPU shares).
This is pretty stupid, as the box will thrash itself to death.

vera # zonecfg -z goldfish
zonecfg:goldfish> remove capped-memory
zonecfg:goldfish> exit
vera # zoneadm -z goldfish reboot
goldfish # :(){ :|:& };:

Sure enough, vera and goldfish lock up completely.

The problem is that without a memory cap, the forkbomb will exhaust system swap.
Even though the global zone still gets a share of the CPU, it can’t start any processes
so it’s impossible to run any administrative commands.

So, Don’t Do That. Always cap zone memory (physical and swap) to reserve some for the global zone.
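One way to keep yourself honest is to check each zone’s exported config for a memory cap. A minimal sketch: `has_memory_cap` is a made-up helper, and the sample input is only shaped like ‘zonecfg export’ output, so check it against what your own system prints.

```shell
#!/bin/sh
# Read 'zonecfg -z <zone> export' output on stdin and report whether it
# contains a capped-memory resource. has_memory_cap is hypothetical.
has_memory_cap() {
    if grep -q '^add capped-memory'; then
        echo "capped"
    else
        echo "NOT capped"
    fi
}

# Illustrative input. On a real system you would run:
#   zonecfg -z goldfish export | has_memory_cap
has_memory_cap <<'EOF'
create -b
set zonepath=/zones/goldfish
add capped-memory
set physical=500M
set swap=750M
end
EOF
```

It only checks that a cap exists, not that the caps across all zones leave enough for the global zone.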

project management

You probably gathered I’m a great fan of zones. Easy to use RM makes them really powerful.

But RM isn’t limited to zones. Resource controls, CPU shares, etc. can also be applied
to ‘projects’: user-defined lists of processes within a single zone.
You could define projects in goldfish (one for the appserver, one for something else in there)
to further divvy up the zone’s allocation of resources.
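Projects live in /etc/project, one colon-separated entry per line (name, id, comment, users, groups, attributes). As a rough sketch of what an appserver project could look like, the snippet below prints such an entry; the project name, id, user and limits are all invented for illustration, and in practice projadd is the safer way to create it.

```shell
#!/bin/sh
# Print an /etc/project style entry:
#   name:id:comment:users:groups:attributes
# project_entry and the example values are hypothetical.
project_entry() {
    name=$1; id=$2; user=$3; shares=$4; lwps=$5
    printf '%s:%s::%s::project.cpu-shares=(privileged,%s,none);project.max-lwps=(privileged,%s,deny)\n' \
        "$name" "$id" "$user" "$shares" "$lwps"
}

project_entry appserver 100 glassfish 10 150
```

The cpu-shares here are weighed against other projects inside the zone, after the zone itself has been given its slice.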

There’s an awful lot of goodness linked to at the OpenSolaris Zones Community page.

My favourite two are The Sun BluePrints Guide to Solaris Containers,
and Menno’s blueprint, which has some good practical stuff on Oracle projects.

zones, clones and lazybones

Posted by Dick on April 09, 2007

The third time you do something, you automate it. I’ve been building a lot of
zones lately.

Advance Wars ain’t gonna play itself

Creating a zone is straightforward, but can take a while:

  • get an IP and hostname reserved for it
  • configure a new zone (where to put it, IP address, etc.)
  • install it
  • boot it
  • give sysidconfig information (root pass, DNS setup, terminal type, timezone, etc.)
  • go in and customize it (setup RSA keys, disable services, etc.)
  • install whatever I wanted the zone for in the first place

sysidconfig and customizing the zone are the most involved (and error prone) steps.

All my zones are configured identically (same DNS servers etc.) so
I’ll build a template zone tweaked to my ‘standard build’ and clone zones from it.
This avoids a (slow) install, instead copying the ‘parent’ zonepath to the clone’s zonepath.
More importantly, any customizations made to the template will be present in the clones.

sysidconfig can be fed a sysidcfg file (which contains answers to the setup questions), so we’ll do that too.

the mother of all zones

The first thing to do is build the template zone and customize it. Time spent on this step will be time saved later, so anything that makes life easier should go in.

First we do a standard zone config (each clone zone gets its own config later):

vera # zonecfg -z template
create
set zonepath=/zones/template
set autoboot=false
add net
  set physical=iprb0
  set address=1.2.3.4/17
end
commit
exit
vera # zoneadm -z template install

Next, login and customize the zone.

vera # zoneadm -z template boot
vera # zlogin -C -e ^ template

My checklist is:

  1. change root’s shell to zsh and home dir to /root
  2. make root’s home directory
  3. give root a sane prompt and a decent pager
  4. copy my pubkey into /root/.ssh
  5. enable tcp port forwarding, rsa ssh logins only for root
  6. set up sendmail smarthost and aliases

Since I did all this for my glassfish zone the other day, I can just copy config files from that:

vera # cp /zones/goldfish/root/etc/passwd /zones/template/root/etc/passwd
vera # mkdir /zones/template/root/root
vera # cp -Rp /zones/goldfish/root/root/.bash_profile /zones/template/root/root/
vera # cp -Rp /zones/goldfish/root/root/.ssh/ /zones/template/root/root/.
vera # cp /zones/goldfish/root/etc/ssh/sshd_config /zones/template/root/etc/ssh/sshd_config
vera # cp /zones/goldfish/root/etc/mail/sendmail.cf /zones/template/root/etc/mail/sendmail.cf
vera # cp /zones/goldfish/root/etc/mail/aliases /zones/template/root/etc/mail/aliases
vera # cp /zones/goldfish/root/etc/mail/aliases.db /zones/template/root/etc/mail/aliases.db

an answer for everything

All my zones:

  • have the same DNS config
  • have the root password disabled (I login with ssh RSA keys or with zlogin)
  • don’t need the network interface setup (since they’re zones)

The only thing that changes between zones is their hostname (‘ZONENAME’), which I’ll change
when I copy my sysidcfg template into the zone’s /etc directory.
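A sysidcfg template along these lines might look like the sample below. The keywords are standard sysidcfg ones, but every value (domain, name server, timezone) is an assumption, and root_password is left out, so add whatever suits your site. `fill_sysidcfg` is a made-up helper that does the ZONENAME substitution.

```shell
#!/bin/sh
# Substitute the zone's hostname into a sysidcfg template.
# Template on stdin, finished sysidcfg on stdout.
# fill_sysidcfg is a hypothetical helper.
fill_sysidcfg() {
    sed "s/ZONENAME/$1/"
}

# Illustrative template; all values here are assumptions.
fill_sysidcfg goldfish <<'EOF'
system_locale=C
terminal=xterm
network_interface=PRIMARY { hostname=ZONENAME }
security_policy=NONE
name_service=DNS { domain_name=mydomain name_server=10.1.0.254 }
nfs4_domain=dynamic
timezone=Europe/London
EOF
```

Any question the file leaves unanswered gets asked on the zone console at first boot, so an incomplete template costs you a zlogin -C rather than a broken zone.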

the payoff

Given the zone name and the IP of its interface, you can bang out zones in around 10 seconds with a ten-line shell script:

vera # time /zones/bang_one_out.sh goldfish 1.2.3.4/24
 Cloning snapshot tank/zones/template@SUNWzone1
 Instead of copying, a ZFS clone has been created for this zone.
ID NAME             STATUS     PATH                           BRAND    IP
 0 global           running    /                              native   shared
 6 goldfish       running    /zones/goldfish              native   shared
 - template         installed  /zones/template                native   shared
 real    0m8.593s
 user    0m0.229s
 sys     0m0.397s

That’s 9 seconds to configure, build and boot a zone to a state where you can SSH in as root – all services done, ssh keys generated, etc.

To be fair, /zones being on ZFS speeds things up tremendously (using ZFS clones for the copy). But not having to copy keys, edit ssh/sendmail/passwd configs is very nice.
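The shape of such a script is roughly: configure, clone from the template, drop in a sysidcfg, boot. The sketch below is a guess at that flow rather than the actual bang_one_out.sh; it prints the commands instead of running them, and the function name and the location of the sysidcfg template are assumptions.

```shell
#!/bin/sh
# Dry run of a clone-based zone build. Prints the commands only.
# clone_zone_plan and /zones/sysidcfg.template are hypothetical.
clone_zone_plan() {
    zone=$1     # new zone name
    addr=$2     # address with prefix, e.g. 1.2.3.4/24
    echo "zonecfg -z ${zone} \"create; set zonepath=/zones/${zone}; set autoboot=true;" \
         "add net; set physical=iprb0; set address=${addr}; end; commit; exit\""
    echo "zoneadm -z ${zone} clone template"
    echo "sed s/ZONENAME/${zone}/ /zones/sysidcfg.template > /zones/${zone}/root/etc/sysidcfg"
    echo "zoneadm -z ${zone} boot"
}

clone_zone_plan goldfish 1.2.3.4/24
```

With the zonepath on ZFS, the ‘zoneadm clone’ step is the one that snapshots and clones instead of copying.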

feeping creaturism

It’s tempting to add to the script, add resource controls etc. (some semblance of fricking argument checking probably wouldn’t kill me) but there’s a project by much smarter people doing that already.

Zone Manager is the swiss army knife of zone administration.
I find the number of options a bit overwhelming myself, but
take a look if you’re in search of a good CLI tool for zone administration.