PXE testbed with Cobbler and VMware Fusion

Posted by Dick on July 14, 2008

Something like Puppet could potentially make my life a lot easier.
Puppet can’t do baremetal provisioning; it needs the base OS to be Jumpstart/Kickstarted on first. Fortunately:

  • Cobbler makes running a kickstart server a piece of piss
  • CentOS is a free binary compatible RHEL clone
  • VMware Fusion takes up a lot less space than a test LAN.

PXE dust

PXE boots require DHCP options that VMware doesn’t enable out of the box.
So I either tweak VMware’s dhcpd or use Cobbler’s DHCP support.

Either choice is fine (they both use ISC’s dhcpd anyway);
if you don’t want the overhead of running your cobbler VM all the time,
it probably makes sense to tweak VMware’s dhcpd.conf.

Cobbler can add static DHCP entries for systems you define,
(and manage DNS too) so it makes life easier for me
(IRL I’ll have to bribe the DHCP guys to add some options).

install a CentOS 5 VM

  • choose redhat -> RHEL5 as your VM type
  • name it (‘shoemaker’ in my case)
  • 10GB disk (I just need headless boxes), advanced -> split into 2GB chunks
  • boot your CentOS 5 ISO. Bog standard install takes 4 minutes on my MBP.

install cobbler

EPEL is a collection of decent RPMs (including cobbler and puppet) for RHEL, Fedora and CentOS.
Tell yum about them and install cobbler:

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm
yum install -y cobbler

hardcode your IP

(You’ll need to do this if you installed CentOS to use DHCP, which is the easiest way on VMware)

First, find out your IP, gateway and netmask.

ifconfig eth0 | grep 'inet addr'; netstat -rn | grep UG
           inet addr:192.168.21.134  Bcast:192.168.21.255  Mask:255.255.255.0
 0.0.0.0         192.168.21.2    0.0.0.0         UG        0 0          0 eth0

So you need to

echo "GATEWAY=192.168.21.2" >> /etc/sysconfig/network

edit /etc/sysconfig/network-scripts/ifcfg-eth0

# take this out
BOOTPROTO=dhcp
# add these
BOOTPROTO=static
IPADDR=192.168.21.134
NETMASK=255.255.255.0

Then /etc/init.d/network restart.
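
The same edit can be scripted if you rebuild the cobbler VM often. This is a minimal sketch, assuming a stock CentOS 5 ifcfg file; make_static is a made-up helper name, and it rewrites whatever file you point it at, so try it on a copy first:

```shell
#!/bin/sh
# Sketch only: switch an ifcfg file from DHCP to a static address.
# make_static is a hypothetical helper, not part of CentOS or cobbler.
# Usage: make_static FILE IPADDR NETMASK

make_static() {
    cfg=$1
    ip=$2
    mask=$3
    [ -f "$cfg" ] || { echo "no such file: $cfg" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Drop any existing BOOTPROTO/IPADDR/NETMASK lines...
    grep -v -E '^[[:space:]]*(BOOTPROTO|IPADDR|NETMASK)=' "$cfg" > "$tmp"
    # ...then append the static settings.
    printf 'BOOTPROTO=static\nIPADDR=%s\nNETMASK=%s\n' "$ip" "$mask" >> "$tmp"
    # Write back in place so the original file keeps its ownership and mode.
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}
```

With the values from the output above, that would be `make_static /etc/sysconfig/network-scripts/ifcfg-eth0 192.168.21.134 255.255.255.0`, followed by the network restart.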

break VMware’s DHCP server

(NB: Other VMs won’t get DHCP responses until cobbler takes over (duh)).

comment out around line 570 of /Library/Application Support/VMware Fusion/boot.sh

569 ######
570 # let shoemaker do this
571 ###   # vmnet-dhcpd puts itself in the background (daemon mode)
572 ###   "$LIBDIR/vmnet-dhcpd" -cf "$LIBDIR/vmnet8/dhcpd.conf" \
573 ###      -lf "$LEASEDIR"/vmnet-dhcpd-vmnet8.leases \
574 ###      -pf /var/run/vmnet-dhcpd-vmnet8.pid vmnet8
575 #####

(Hint: vmnet8 is the NATted subnet, vmnet1 is the host-only one. Edit the right one.)

Then sudo boot.sh --restart.

install cobbler PXE bits

  • xinetd serves up kernels and initrds over TFTP.
  • Apache serves out kickstart files and RPMs.
  • reposync mirrors remote repos

yum install -y dhcp reposync
for i in xinetd cobblerd httpd # we’ll do DHCP later
do
  chkconfig $i on
  /etc/init.d/$i start
done

basic cobbler setup

Note: docs refer to ‘/var/lib/cobbler/settings’, but my RPM put it at /etc/cobbler/settings

‘cobbler check’ tells you what it needs you to edit. Obey.

Set up Cobbler’s DHCP management

Put this in settings:

manage_dhcp: 1
next_server: 'ip-of-cobbler-box'
server: 'ip-of-cobbler-box'

Finally, edit /etc/cobbler/dhcp.template
(man dhcpd.conf and/or crib from /Library/Application Support/VMware Fusion/vmnet8/dhcpd.conf) – my effort is here .

cobbler sync builds your DHCP config, /etc/init.d/dhcpd start makes it live.
If dhcpd won’t start, you cocked up the template. Check /var/log/messages, tweak, ‘cobbler sync’, rinse, repeat.
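
For reference, a PXE-capable template might look roughly like the sketch below. It is a sketch, not the template linked above. The subnet and router come from the ifconfig/netstat output earlier; the DNS server and the dynamic range are guesses for a VMware NAT network. $next_server and $insert_cobbler_system_definitions are the placeholders the stock template appears to use, so check them against the file your RPM shipped. write_dhcp_template is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: write a minimal PXE-capable dhcp.template to the given path.
# The quoted heredoc keeps $next_server literal so cobbler can fill it in.
# Usage: write_dhcp_template /path/to/dhcp.template

write_dhcp_template() {
    out=$1
    [ -n "$out" ] || { echo "usage: write_dhcp_template FILE" >&2; return 1; }
    cat > "$out" <<'EOF'
ddns-update-style interim;
allow booting;
allow bootp;

subnet 192.168.21.0 netmask 255.255.255.0 {
    option routers             192.168.21.2;   # gateway from netstat -rn
    option domain-name-servers 192.168.21.2;   # assumption: VMware NAT DNS
    option subnet-mask         255.255.255.0;
    range dynamic-bootp        192.168.21.128 192.168.21.254;  # assumption
    filename                   "/pxelinux.0";  # the PXE boot loader
    next-server                $next_server;   # TFTP server, from settings
}

$insert_cobbler_system_definitions
EOF
}
```

The two lines that matter for PXE are filename and next-server; everything else is ordinary DHCP. Write it somewhere harmless first, diff it against the shipped template, then run ‘cobbler sync’.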

create a kickstart target and boot it

We can use the install ISO to build a distro and boot profile:


  mount /dev/cdrom /mnt
  # copies the DVD onto disk
  cobbler import --mirror=/mnt --name=centos5
  cobbler sync

Make a new VM.

  • choose ‘Linux -> Red Hat Enterprise Linux 5’ (or VMware cocks up the disk image)
  • pick a name and disk size
  • untick ‘Start virtual machine and install OS now’

Change any VM settings if you like (defaults are ok).
Power up the VM. You should see it:

  • get an IP (DHCPd)
  • download a kernel and initrd over TFTP (xinetd and TFTPd)
  • present a menu (pxelinux.0)

Choose ‘centos5’ and you’re away.

further reading

I’ll do a followup article soon with a few tricks/gotchas I’ve found so far.

solaris laptop live upgrade

Posted by Dick on August 08, 2007

The other day I setup my laptop with an eye to making it
live upgradable, so now it’s time to see if I had a clue what I
was doing.

LUvly

Solaris has a feature called Live Upgrade (LU).

The idea is you clone the system into a free slice
(called a ‘boot environment’ or BE), upgrade that, configure it, etc. -
then boot into it quickly when no-one’s looking.

Should it fail horribly, you just reboot and no-one is any the wiser.

It minimizes downtime for upgrades, and gives you a simple and reliable backout (a snapshot/rollback facility).

BE happy

My laptop is set up with a / slice, swap and a spare slice.
All the user data is on my zpool.

By default, LU will copy over all of / – /var /usr /etc , but also /opt.
If you stick /opt on ZFS, LU will try to copy everything to the new BE, which
a) takes hours and b) probably won’t fit. So I’ve put /opt/csw, /opt/SUNWspro, /opt/whatever on ZFS.

hypnotoad $ df -h  -Fzfs
Filesystem             size   used  avail capacity  Mounted on
tank/home               16G    31K    14G     1%    /export/home
tank/home/dick          16G   461M    14G     4%    /export/home/dick
tank/SUNWappserver      16G   125M    14G     1%    /opt/SUNWappserver
tank/SUNWspro           16G   468M    14G     4%    /opt/SUNWspro
tank/csw                16G   216M    14G     2%    /opt/csw
tank/netbeans-5.5       16G   210M    14G     2%    /opt/netbeans-5.5
tank/netbeans-5.5.1     16G   132M    14G     1%    /opt/netbeans-5.5.1
tank/src                16G   612K    14G     1%    /src

These filesystems will be shared across BEs,
but the ‘OS’ (/var, /usr, /sbin etc.) will be copied over to the new BE, then upgraded.

hypnotoad $ df -h  -Fufs
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c0d0s0        5.5G   2.8G   2.6G    52%    /

get your media

I’m currently running b68, so the first thing to do is download the latest Nevada DVD ISO (b69) to a ZFS filesystem (there’s not enough room in my teeny root slice, and this avoids LU copying it over to the new BE).

hypnotoad $ pfexec zfs create tank/isos
hypnotoad $ pfexec zfs set mountpoint=/isos tank/isos
hypnotoad $ pfexec zfs set copies=1 tank/isos

mount the ISO

hypnotoad $ pfexec lofiadm -a /isos/sol-nv-b69-x86-dvd.iso
  /dev/lofi/1
hypnotoad $ pfexec mount -F hsfs /dev/lofi/1 /mnt
hypnotoad $ ls /mnt/
  Copyright                    autorun.inf
  DeveloperTools               autorun.sh
  JDS-THIRDPARTYLICENSEREADME  boot
  License                      installer
  README.txt                   sddtool
  Solaris_11

plan your escape

The root filesystem doesn’t need to be backed up (that’s the
whole point!) but I’ll snapshot all the other (ZFS) filesystems just in case:

hypnotoad $ pfexec zfs snapshot -r tank@pre-lu

this recursively (-r) snapshots all filesystems in the zpool ‘tank’ with the label ‘pre-lu’

copy your BE

hypnotoad $ pfexec /usr/sbin/lucreate -c b68 -n b69 -m /:/dev/dsk/c0d0s3:ufs

this:

  • calls the existing BE ‘b68’ (-c b68)
  • creates a new BE called b69 (-n b69)
  • clones the root FS (the OS) to the UFS slice ‘c0d0s3’ (-m /:/dev/dsk/c0d0s3:ufs)

the upshot of which is:

Discovering physical storage devices
Discovering logical storage devices
....
....
Making boot environment <b69> bootable.
Updating bootenv.rc on ABE <b69>.
Population of boot environment <b69> successful.
Creation of boot environment <b69> successful.

We now have 2 BEs (the one we’re in now, and the one we’ll be upgrading):

hypnotoad $ pfexec /usr/sbin/lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
b68                        yes      yes    yes       no     -
b69                        yes      no     no        yes    -

upgrade the new BE

hypnotoad $ pfexec /usr/sbin/luupgrade -u -n b69 -s /mnt/

that’s:

  • do an OS upgrade (-u) …
  • … of the ‘b69’ BE (-n b69) …
  • … from the b69 ISO (-s /mnt).

This’ll take bloody ages (about an hour on my rubbish laptop).

suck it and see

hypnotoad $ pfexec /usr/sbin/luactivate b69
hypnotoad $ pfexec init 6

This’ll create a new default GRUB entry for the new BE. To roll back, either ‘luactivate b68’ or simply reboot and choose the old menu entry.

rolling upgrades

It’s worth knowing that GRUB still boots using the original root fs:

hypnotoad $ df -h -Fufs
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c0d0s3        5.5G   2.8G   2.6G    52%    /
hypnotoad $ pfexec bootadm list-menu
The location for the active GRUB menu is: /dev/dsk/c0d0s0 (not mounted)
The filesystem type of the menu device is <ufs>
default 2
timeout 10
0 Solaris Express Community Edition snv_68 X86
1 Solaris failsafe
2 b69
3 b69 failsafe
4 b68
5 b68 failsafe

which means:

  • the best way to edit the menu.lst is using bootadm (which knows where to find it)
  • if you delete that partition, you are screwed.

I thought you’d need a dedicated /boot partition if you wanted to easily do rolling upgrades.

In fact, Live Upgrade knows all about GRUB:

hypnotoad $ pfexec /usr/sbin/ludelete b68
The boot environment <b68> contains the GRUB menu.
Attempting to relocate the GRUB menu.
Relocating GRUB slice to .
Mounted new GRUB slice .
Updating GRUB state.
Moving GRUB menu.
Installing latest GRUB loader.
stage1 written to partition 1 sector 0 (abs 385560)
stage2 written to partition 1, 260 sectors starting at 50 (abs 385610)
Determining the devices to be marked free.
Updating boot environment configuration database.
Updating boot environment description database on all BEs.
Updating all boot environment configuration databases.
Updating GRUB menu default setting
Boot environment <b68> deleted.
hypnotoad $ pfexec /usr/sbin/lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
b69                        yes      yes    yes       no     -

(Obviously, don’t ‘ludelete’ until you’re happy the new build works well for you).

Now your original slice is freed up to use for b70.
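
The whole cycle boils down to four commands, so the next round (b69 to b70, back onto the freed slice) can be sketched as a small helper. This is an illustration, not part of Live Upgrade: lu_plan is a made-up name and it only prints the commands for review. It leaves out ‘-c’, since that option only names the current BE the first time lucreate is run, and it assumes the new DVD image is already mounted.

```shell
#!/bin/sh
# Sketch only: print the Live Upgrade commands for the next build.
# lu_plan is a hypothetical helper; it runs nothing itself.
# Usage: lu_plan NEW_BE TARGET_SLICE MEDIA_DIR

lu_plan() {
    if [ $# -ne 3 ]; then
        echo "usage: lu_plan NEW_BE TARGET_SLICE MEDIA_DIR" >&2
        return 1
    fi
    new=$1
    slice=$2
    media=$3
    cat <<EOF
pfexec lucreate -n $new -m /:$slice:ufs
pfexec luupgrade -u -n $new -s $media
pfexec luactivate $new
pfexec init 6
EOF
}
```

For example, `lu_plan b70 /dev/dsk/c0d0s0 /mnt` prints the four commands for cloning the running b69 onto the original root slice, upgrading the clone from /mnt, activating it and rebooting.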

zones, clones and lazybones

Posted by Dick on April 09, 2007

The third time you do something, you automate it. I’ve been building a lot of
zones lately.

Advance Wars ain’t gonna play itself

Creating a zone is straightforward, but can take a while:

  • get an IP and hostname reserved for it
  • configure a new zone (where to put it, IP address, etc.)
  • install it
  • boot it
  • give sysidconfig information (root pass, DNS setup, terminal type, timezone, etc.)
  • go in and customize it (setup RSA keys, disable services, etc.)
  • install whatever I wanted the zone for in the first place

sysidconfig and customizing the zone are the most involved (and error prone) steps.

All my zones are configured identically (same DNS servers etc) so
I’ll build a template zone tweaked to my ‘standard build’ and
clone zones from it.
This avoids a (slow) install, instead copying the ‘parent’ zonepath to the clone’s zonepath.
More importantly, any customizations made to the template will be present in the clones.

sysidconfig can be fed a sysidcfg file (which contains answers to the setup questions) – we’ll do that too.

the mother of all zones

The first thing to do is build the template zone and customize it. Time spent on this step will be time saved later, so anything that makes life easier should go in.

First we do a standard zone config (each clone zone gets its own config later)

vera # zonecfg -z template
create
set zonepath=/zones/template
set autoboot=false
add net
  set physical=iprb0
  set address=1.2.3.4/17
end
commit
exit
vera # zoneadm -z template install

Next, login and customize the zone.

vera # zoneadm -z template boot
vera # zlogin -C -e ^ template

My checklist is:

  1. change root’s shell to zsh and home dir to /root
  2. make root’s home directory
  3. give root a sane prompt and a decent pager
  4. copy my pubkey into /root/.ssh
  5. enable TCP port forwarding, RSA ssh logins only for root
  6. set up sendmail smarthost and aliases

Since I did all this for my glassfish zone the other day, I can just copy config files from that:

vera # cp /zones/goldfish/root/etc/passwd /zones/template/root/etc/passwd
vera # mkdir /zones/template/root/root
vera # cp -Rp /zones/goldfish/root/root/.bash_profile /zones/template/root/root/
vera # cp -Rp /zones/goldfish/root/root/.ssh/ /zones/template/root/root/.
vera # cp /zones/goldfish/root/etc/ssh/sshd_config /zones/template/root/etc/ssh/sshd_config
vera # cp /zones/goldfish/root/etc/mail/sendmail.cf /zones/template/root/etc/mail/sendmail.cf
vera # cp /zones/goldfish/root/etc/mail/aliases /zones/template/root/etc/mail/aliases
vera # cp /zones/goldfish/root/etc/mail/aliases.db /zones/template/root/etc/mail/aliases.db

an answer for everything

All my zones:

  • have the same DNS config
  • have the root password disabled (I login with ssh RSA keys or with zlogin)
  • don’t need the network interface setup (since they’re zones)

The only thing that changes between zones is their hostname (‘ZONENAME’), which I’ll change
when I copy my sysidcfg template into the zone’s /etc directory.
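
A minimal sketch of that substitution follows. The template body is illustrative rather than the one used here: the domain, name server, timezone and password hash are placeholders to replace, and render_sysidcfg and sysidcfg_template are made-up helper names.

```shell
#!/bin/sh
# Sketch only: fill the ZONENAME placeholder in a sysidcfg template.
# Every value except the ZONENAME mechanism is an assumption.
# Usage: render_sysidcfg ZONENAME > /zones/ZONENAME/root/etc/sysidcfg

sysidcfg_template() {
    cat <<'EOF'
system_locale=C
terminal=xterm
network_interface=NONE {hostname=ZONENAME}
security_policy=NONE
name_service=DNS {domain_name=example.com name_server=192.0.2.53}
nfs4_domain=dynamic
timezone=Europe/London
root_password=REPLACE_WITH_ENCRYPTED_HASH
EOF
}

render_sysidcfg() {
    [ -n "$1" ] || { echo "usage: render_sysidcfg ZONENAME" >&2; return 1; }
    # Swap the placeholder for the real hostname; nothing else changes.
    sysidcfg_template | sed "s/ZONENAME/$1/g"
}
```

Keeping the hostname as the only variable means every other answer is written once and reviewed once.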

the payoff

Given the zone name and the IP of its interface, you can bang out zones in around 10 seconds with a ten-line shell script:

vera # time /zones/bang_one_out.sh goldfish 1.2.3.4/24
 Cloning snapshot tank/zones/template@SUNWzone1
 Instead of copying, a ZFS clone has been created for this zone.
 ID NAME             STATUS     PATH                           BRAND    IP
  0 global           running    /                              native   shared
  6 goldfish         running    /zones/goldfish                native   shared
  - template         installed  /zones/template                native   shared
 real    0m8.593s
 user    0m0.229s
 sys     0m0.397s

That’s 9 seconds to configure, build and boot a zone to a state where you can SSH in as root – all services done, ssh keys generated, etc.

To be fair, /zones being on ZFS speeds things up tremendously (using ZFS clones for the copy). But not having to copy keys, edit ssh/sendmail/passwd configs is very nice.
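
The script itself isn’t reproduced here, but the steps it has to perform are the ones described above: configure, clone, drop in a sysidcfg, boot. A rough sketch under those assumptions follows. clone_zone, the DRY_RUN switch and /zones/sysidcfg.template are all hypothetical names; the interface and zonepath layout are copied from the template zone’s config.

```shell
#!/bin/sh
# Sketch only: NOT the original bang_one_out.sh.
# Usage: clone_zone ZONENAME IP/PREFIX
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

clone_zone() {
    zone=$1
    addr=$2
    if [ -z "$zone" ] || [ -z "$addr" ]; then
        echo "usage: clone_zone ZONENAME IP/PREFIX" >&2
        return 1
    fi
    # 1. configure the new zone (same layout as the template zone)
    run zonecfg -z "$zone" "create; set zonepath=/zones/$zone; add net; set physical=iprb0; set address=$addr; end; commit"
    # 2. clone the (halted) template instead of doing a full install
    run zoneadm -z "$zone" clone template
    # 3. answer the setup questions in advance (template path is an assumption)
    run sh -c "sed 's/ZONENAME/$zone/' /zones/sysidcfg.template > /zones/$zone/root/etc/sysidcfg"
    # 4. boot it and show the result
    run zoneadm -z "$zone" boot
    run zoneadm list -cv
}
```

Running it with DRY_RUN=1 first is a cheap stand-in for the argument checking mentioned below: you see exactly what would be executed before anything touches a zone.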

feeping creaturism

It’s tempting to add to the script, add resource controls etc. (some semblance of fricking argument checking probably wouldn’t kill me) but there’s a project by much smarter people doing that already.

Zone Manager is the Swiss army knife of zone administration.
I find the number of options a bit overwhelming myself, but
take a look if you’re in search of a good CLI tool for zone administration.

DTracing zoned JVMs

Posted by Dick on March 30, 2007

glassfish v2 comes with JDK 1.6, which has DTrace providers built into Hotspot that let you monitor your JVM.

backstory

  • ‘goldfish’ is a local zone (‘virtual’ Solaris instance).
  • ‘vera’ is the global zone (the ‘main’ Solaris instance).

The server runs Solaris Express.
Glassfish runs on a JVM in the ‘goldfish’ zone.

I want to DTrace that JVM.

By default, local zones don’t have enough privilege to run dtrace (as it lets you peek into the kernel).
I was about to change that when I realised something.

zones aren’t VMs

I tend to assume that’s obvious, until I talk to non-Solaris users.

  • Solaris zones (unlike VMware guests or Xen domains) don’t run in their own virtual machine
    • zones all share the same kernel (and a lot of the OS)
    • they suffer practically no performance overhead
    • you can give a zone its own filesystems/NICs/schedulers/resource controls or re-use the global zone’s resources.
    • this gives them very low memory/storage overhead
  • zones can’t run Windows
    • I don’t see that as a problem
  • you can run linux (of sorts) in a zone
    • ask yourself why you want to
  • for storage, you can create one zpool in the global zone to hold all your zones
    • this avoids the massive wastage of 4 VMs on a host, each with 7 filesystems, each 30% full (which is how most VMware installs seem to run).
    • you can snapshot everything in the global zone, or delegate to each zone. you can change your mind easily too.
  • the zone is a natural place to apply resource controls
    • BUT this is possible without zones
    • processes sharing a zone can be further divided into projects if you need fine grained control
  • the global zone can see (and access) all processes in all zones
    • processes running in the zone only see other processes in that zone

This last point is a huge benefit of zones which I think a lot of people overlook, or mistake for a negative. It also makes my job today much easier.

VMware hosts or Xen dom0s monitor at the VM/domain level:
Xen has xentop, VMware has ‘VMware Tools’ (ESX has a bloated GUI I am trying to drink away).

You can start a console in a VM (like ‘zlogin’ on Solaris) but you might as well SSH into them. It’s no easier to keep track of your processes than if they were on remote machines.

In a global Solaris zone, the processes are all there alongside you. Just use your usual monitoring tools -
ps, prstat, etc. are all zone-aware. Accounting is a doddle too.

back to the point

If you don’t treat zones as VMs, there’s a much simpler way to do this.
Instead of granting ‘goldfish’ dtrace privileges I can simply monitor the JVM from ‘vera’.

i like to watch

This 1-liner (from the DTrace wiki)
fires at every JVM method call, printing the classname of the called instance and the called method:


   vera # dtrace -qn 'hotspot*:::method-entry { printf("-> %4s.%s\n", stringof(copyin(arg1, arg2)), stringof(copyin(arg3, arg4))); }'
   dtrace: buffer size lowered to 2m

NB: this is running as root in the global zone, which has more than enough privilege for dtrace.

Trouble is, I got no output.
Turns out the ‘method-entry’ Hotspot probe is disabled by default for performance reasons.
To enable the method-entry probe, you pass the ‘-XX:+ExtendedDTraceProbes’ flag when the JVM starts.

the doctor will see you now

Having to bounce the JVM (or run it in a ‘debug’ mode by default on a production box) would suck.
Luckily, the JDK comes with a tool called jinfo – it lets you read system properties, command line flags etc. from a running JVM.
What the manpage doesn’t say is that since JDK 1.6 jinfo can also set those properties on the running JVM.

On other OSes, we’d have to drop into the VM as root (or do some remote JVM voodoo). On Solaris, we can see all processes in all zones from vera
(actually we could do this bit from the zone, since it doesn’t need special privileges).

vera # jinfo -flag +ExtendedDTraceProbes $(pgrep -z goldfish java)

‘pgrep -z goldfish foo’ returns the PIDs of all processes called foo in the ‘goldfish’ zone
(I’m only running 1 java process in the goldfish zone, so I know pgrep will find the right one). jinfo sets the flag on the PID returned by pgrep.

Immediately, the probe starts to fire
and the dtrace window starts spewing class and method names:


   vera # dtrace -qn 'hotspot*:::method-entry { printf("-> %4s.%s\n", stringof(copyin(arg1, arg2)), stringof(copyin(arg3, arg4))); }'
   dtrace: buffer size lowered to 2m
   -> java/util/HashMap/HashMap$HashIterator.newKeyIterator
   -> java/util/HashMap$KeyIterator.<init>
   -> java/util/HashMap$KeyIteratorhIterator.<init>
   -> java/util/HashMap$HashIteratorctorImpl.<init>
   -> java/lang/Objectenterprise/server/ss/provider/ASSelector.<init>rise/server/ss/provider/ASSelector
   -> java/util/HashMap$HashIteratorver/ss/spi/ASSocketFacadeUtils.hasNexti/ASSocketFacadeUtils
   -> java/nio/channels/spi/AbstractSelectorSSocketServiceProxy.beginServiceProxy
   -> java/nio/channels/spi/AbstractInterruptibleChannelce.blockedOn
   -> sun/misc/SharedSecretsrise/server/ss/ASSocketService.getJavaLangAccesscketService
   -> java/lang/Threadenterprise/server/PEMain.currentThreadrver/PEMain
   ...
   ...

There are other interesting probes there (that don’t fire several hundred times a second) – garbage collection,
method compilation, classloading.
See the full probe list for more.

When we get what we came for, we can let the JVM run at full speed again:

vera # jinfo -flag -ExtendedDTraceProbes $(pgrep -z goldfish java)

and dtrace shuts up.

to recap: VMware can kiss my ass

  1. on a 1.6 JVM under Solaris we can switch a running JVM into profiling/debugging mode without needing to restart anything.
  2. we can monitor processes (JVMs or otherwise) in all our ‘virtual machines’ using standard UNIX tools.
  3. we can HUP/KILL/generally bugger about with said zoned processes without needing to ‘zlogin’ if the mood takes us (and we feel like driving the zone admin insane)
  4. I am a decent packaging system away from being a raving Solaris zealot.

ZIL communication

Posted by Dick on February 12, 2007

If you’re dealing with lots of small files, an NFS and ZFS combination can run slower than you’d like.

For example, untarring the apache source tree on the NFS client:

planb:$ time tar xvf httpd-2.2.4.tar.gz
real    2m9.594s
user    0m0.356s
sys     0m0.508s

Ouch.

BenR mentioned turning off the ZIL (ZFS intent log) to speed up ZFS+NFS over aggregated Gbit ethernet – I doubted my 100Mbit link would have the same bottleneck for anything,
but it was worth a go.

Tell the ZFS/NFS server to switch off the ZIL:

vera # echo 'set zfs:zil_disable=1' >> /etc/system

Then either reboot, or run:

vera # echo 'zil_disable/W 1' | mdb -kw
vera # zpool export tank
vera # zpool import tank

In my case, I might as well reboot.

planb:$ mv httpd-2.2.4 outoftheway
planb:$ time tar xzf httpd-2.2.4.tar.gz
real    0m4.862s
user    0m0.368s
sys     0m0.440s

Holy crap.

There are some implications to the ‘correctness’ of this from the NFS client’s point of view but on the ZFS box itself it’s non-lethal, so I think
I’ll keep it (I’m snapshotting the share three times a day, so I’m
reasonably safe when Linux shits itself).