ZFS for Linux (and OS X and Windows and BSD)

Posted by Dick on January 12, 2007

Ubuntu is a good Linux, but reiserfs let me down this autumn (nothing personal, the other linux filesystems suck too), and I had
a particularly stupid installer destroy ~/.mozilla last month.

I badly wanted a ZFS home directory (not badly enough to go near ZFS-FUSE; I’m not deranged :D ).
Nothing fancy, just a mirrored zpool for redundancy and regular snapshots to protect against pilot error.

(update: FreeBSD 7.0 now has excellent ZFS support which might be helpful if you don’t mind leaving Linux)

To avoid having to migrate fully, I moved my home directory to an NFS share on my Solaris machine (iSCSI wouldn’t help; I’d have to mkfs.crappyfs the LUN on the Linux end anyway. Besides, all the Linux iSCSI initiators I tried stank).

It works a treat.

planb$ pwd
/home/sisred
planb$ md5sum misc/oncall.txt
68db8407afbe4965918e3b425e2d5abd  misc/oncall.txt
planb$ rm misc/oncall.txt
planb$ echo 'shit'
shit
planb$ date
Sun Jan  7 21:16:12 GMT 2007
planb$ ls .zfs/snapshot/
[snip - lots of directories]
zfs-auto-snap-2007-01-05-06:00:00/
zfs-auto-snap-2007-01-05-12:00:00/
zfs-auto-snap-2007-01-05-18:00:00/
zfs-auto-snap-2007-01-06-00:00:00/
zfs-auto-snap-2007-01-06-06:00:00/
zfs-auto-snap-2007-01-06-12:00:00/
zfs-auto-snap-2007-01-06-18:00:00/
zfs-auto-snap-2007-01-07-00:00:00/
[snip - lots of directories]
planb$ cp -p .zfs/snapshot/zfs-auto-snap-2007-01-07-18\:00\:00/misc/oncall.txt misc/oncall.txt
planb$ md5sum misc/oncall.txt
68db8407afbe4965918e3b425e2d5abd  misc/oncall.txt
planb$ uname -rs
Linux 2.6.17-10-386
planb$
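
That restore boils down to "find the newest snapshot that still has the file, then copy it back". Here is a rough sketch of the first half, assuming the snapshot names sort chronologically (the zfs-auto-snap-YYYY-MM-DD-HH:MM:SS ones do). newest_snapshot_with is a name I made up for illustration; it is not part of ZFS.

```shell
# newest_snapshot_with SNAPDIR RELPATH
# Print the name of the newest snapshot under SNAPDIR that contains RELPATH.
# Assumption: snapshot names sort chronologically, as
# zfs-auto-snap-YYYY-MM-DD-HH:MM:SS names do.
newest_snapshot_with() (
    snapdir=$1
    relpath=$2
    found=
    # The glob expands in sorted order, so the last hit is the newest.
    for snap in "$snapdir"/*/; do
        if [ -e "$snap$relpath" ]; then
            found=$(basename "$snap")
        fi
    done
    # Fail (non-zero status) if no snapshot has the file.
    [ -n "$found" ] && printf '%s\n' "$found"
)

# Usage from the top of the home directory:
#   snap=$(newest_snapshot_with .zfs/snapshot misc/oncall.txt) &&
#       cp -p ".zfs/snapshot/$snap/misc/oncall.txt" misc/oncall.txt
```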

Course, this works just as well with any OS that can mount NFS.
The only differences from running directly on Solaris are speed (I’m limited by the 100Mbit link) and having to ssh
over to run zfs(1M).

Here’s how I set it up (again, this is mostly a paste of my notes, so ask if anything is unclear).

setup a dedicated link

This is the only remotely tricky bit, and it’s optional.
Both machines are on my desk, so I hooked them up with a crossover cable to keep NFS off the LAN.

I’m using NICs I found in a bin. Once they’re fitted:

both sides

I’ll choose a network of 10.10.10.0/30 for the link.
Each end gets a hostname of ‘red$host’ (the cable is red. Imagination bypass.).

Add this to /etc/hosts on each end:

10.10.10.1  redsun
10.10.10.2  redlinux

and then disable firewalls on those interfaces (eth2 on Linux and e1000g0 on Solaris in my case).

the linux end

add an entry to /etc/network/interfaces

auto eth2
iface eth2 inet static
  address 10.10.10.2
  netmask 255.255.255.252

then bring up the interface with sudo ifup eth2

By default, Linux tosses a coin to choose the interface number at boot, so now and then eth0 and eth2 will swap.
I’m sure this is useful to some people, but if you prefer a sane networking setup, you can tie them down in /etc/iftab:

eth0 mac 00:13:20:b2:22:23
eth2 mac 00:90:27:12:4f:11
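
If you would rather not copy MAC addresses by hand, something like this prints lines in the same format. It is only a sketch: iftab_lines is a made-up helper, and it assumes the kernel exposes each NIC under /sys/class/net/NAME/address. It records whatever names the kernel picked on this boot, so check that eth0 and eth2 are the right way round before pasting the output anywhere.

```shell
# iftab_lines [SYSFS_NET_DIR]
# Print one "NAME mac xx:xx:xx:xx:xx:xx" line per network interface.
# SYSFS_NET_DIR defaults to /sys/class/net (an assumption about your kernel).
iftab_lines() (
    netdir=${1:-/sys/class/net}
    for dev in "$netdir"/*; do
        name=$(basename "$dev")
        # Skip loopback and anything without a readable address file.
        [ "$name" = lo ] && continue
        [ -r "$dev/address" ] || continue
        printf '%s mac %s\n' "$name" "$(cat "$dev/address")"
    done
)

# Usage: iftab_lines          (review the output, then edit /etc/iftab by hand)
```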

the solaris end

Before you can use a NIC, you need to plumb it in (this hooks up all the kernel processing streams to it, hence the name).

Bring up the interface with ifconfig:

sun # ifconfig e1000g0 plumb
sun # ifconfig e1000g0 inet 10.10.10.1/30 up

For this to be done automatically at boot:

sun # echo redsun > /etc/hostname.e1000g0
sun # echo "10.10.10.0    255.255.255.252" >> /etc/inet/netmasks

You should be able to ping across the link now.
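
If ping doesn’t work, check that both ends agree on the netmask. A /30 leaves exactly two usable addresses (.1 and .2 here), and in dotted form it is 255.255.255.252. The little function below does that arithmetic; prefix_to_netmask is just an illustrative name.

```shell
# prefix_to_netmask PREFIXLEN
# Convert a CIDR prefix length (0-32) into a dotted-quad netmask.
prefix_to_netmask() (
    bits=$1
    mask=
    for i in 1 2 3 4; do
        if [ "$bits" -ge 8 ]; then
            octet=255
            bits=$((bits - 8))
        else
            # Top $bits bits set in this octet: 256 - 2^(8 - bits).
            octet=$((256 - (1 << (8 - bits))))
            bits=0
        fi
        mask=${mask:+$mask.}$octet
    done
    printf '%s\n' "$mask"
)

# Usage: prefix_to_netmask 30
```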

create a home filesystem

sun # zfs create tank/home
sun # zfs set compression=on tank/home
sun # zfs create tank/home/sisred
sun # zfs set mountpoint=/export/home/sisred tank/home/sisred

create a user

I gave up trying to persuade linux to do NFSv4, so I’m using v3.
That means the uids on each end need to match.

Since I’m moving an existing homedir, I create a Solaris user with matching UID/GID.
(run ‘id’ on the linux host – in my case, ‘1000’ for both).

sun # groupadd -g 1000 sisred
sun # useradd -c "Dick Davies" -g 1000 -u 1000 \
-d /export/home/sisred -m \
-s /usr/bin/bash sisred
sun # chown -R sisred:sisred ~sisred

No need for a password (I log in via key-based SSH).
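
Since NFSv3 trusts the numeric ids, it is worth checking that they really do match before going further. A minimal sketch, assuming getent is available on both ends and that you can ssh to redsun; passwd_ids and ids_match are names I made up.

```shell
# passwd_ids
# Read one passwd(4)-style line on stdin and print its "uid:gid" fields.
passwd_ids() {
    cut -d: -f3,4
}

# ids_match LOCAL REMOTE
# Succeed if the two "uid:gid" strings are identical, complain otherwise.
ids_match() {
    if [ "$1" = "$2" ]; then
        return 0
    fi
    echo "uid/gid mismatch: local=$1 remote=$2" >&2
    return 1
}

# Usage on the Linux end (user and host names are the ones from this post):
#   ids_match "$(getent passwd sisred | passwd_ids)" \
#             "$(ssh redsun getent passwd sisred | passwd_ids)"
```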

share it out

sun # zfs set sharenfs=on tank/home/sisred
sun # svcadm enable nfs/server
sun # svcs -xv
sun #

The second command will make sure NFS starts at reboot. ‘svcs -xv’ will give useful output if something didn’t work.

Shares are read/write by default. The ‘sharenfs’ property can also take a string of options to share(1M)
so zfs set sharenfs=ro tank/home/sisred creates a readonly share, for example.

move into your new $HOME

linux # sudo mkdir /nfshome
linux # sudo chown sisred:sisred /nfshome
linux # sudo apt-get install portmap # for NFS locking
linux # sudo mount redsun:/export/home/sisred /nfshome
linux # rsync -va /home/sisred/ /nfshome/

(You need to run the last command as the user due to NFS ‘root squash’).

If it looks OK, you can make it permanent by editing /etc/fstab and remounting:

redsun:/export/home/sisred /home/sisred  nfs proto=tcp        0       0
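
After remounting, it is easy to fool yourself into thinking you are on NFS when you are still looking at the old local directory. This sketch reads the kernel’s mount table to check; mount_fstype is a hypothetical helper, and it assumes a Linux-style /proc/mounts.

```shell
# mount_fstype MOUNTPOINT [MOUNTS_FILE]
# Print the filesystem type of whatever is mounted at MOUNTPOINT and fail
# if nothing is. MOUNTS_FILE defaults to /proc/mounts, whose fields are:
# device, mountpoint, type, options, dump, pass.
mount_fstype() {
    awk -v mp="$1" '
        $2 == mp { type = $3 }      # the last match is the topmost mount
        END { if (type != "") print type; else exit 1 }
    ' "${2:-/proc/mounts}"
}

# Usage: mount_fstype /home/sisred      (should say nfs)
```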

now you’re just showing off

If performance is a bit slow on small files, take a look here for tuning.

You need something to automate snapshots up at the Solaris end. Two options are:

  1. Tim Foster’s SMF snapshots
    - some good features (automated ‘zfs send -i’ / ‘zfs recv’ of snapshots to a remote host, cleanup of old snapshots etc).
  2. roll your own
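
For the roll-your-own route, the two fiddly bits are naming the snapshots and working out which old ones to throw away. This is a sketch rather than the script I actually run: snap_name and old_snaps are made-up helpers, the zfs calls are left as comments, and the pruning assumes the names sort chronologically.

```shell
# snap_name DATASET [TIMESTAMP]
# Build a snapshot name in the zfs-auto-snap-YYYY-MM-DD-HH:MM:SS style.
# TIMESTAMP defaults to the current time.
snap_name() {
    printf '%s@zfs-auto-snap-%s\n' "$1" "${2:-$(date +%Y-%m-%d-%H:%M:%S)}"
}

# old_snaps KEEP
# Read snapshot names on stdin, one per line, and print all but the newest
# KEEP of them. KEEP must be at least 1.
old_snaps() {
    sort -r | sed "1,${1}d"
}

# Intended use from root's crontab on the Solaris end (not run here):
#   zfs snapshot "$(snap_name tank/home/sisred)"
#   zfs list -H -t snapshot -o name |
#       grep '^tank/home/sisred@zfs-auto-snap-' |
#       old_snaps 28 |
#       while read -r snap; do zfs destroy "$snap"; done
```

Keeping 28 is an arbitrary choice: at one snapshot every six hours it works out to a week’s worth.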

UPDATE: I’ve since switched to NFSv4, which was easy when you know how. See here

Stinkstation, more like

Posted by Dick on July 03, 2006

DISCLAIMER: As I said, I only run openlink so I can serve NFS (samba and netatalk are too slow for fullscreen video over 100Mbit). If I was running samba and/or appletalk I would probably not have had a problem.

That said: if you set up NFS on your linkstation, NEVER EVER EVER (ever) back up using the web frontend.

I’ve been backing up my other machines to the LS for a few months.
I got a fast/cheap/quiet/lovely Seagate 250GB disk and thought I’d back up using the UI (openlink is a superset of the official firmware. I stupidly thought this would be ok.).

Plugged in the disk. It took the LS about an hour to build what looked like an ext2 filesystem on it.
I should have started running at that point.

The backup script on the LS is called do-backup.pl (I would upload a copy, but someone might stumble across it and I don’t want that on my conscience).

Whoever wrote it made the decision to allow clients read-only access to shares while they were being archived. Which would be cool, except the way they do that is essentially:

  1. chmod -R 555 $SHAREDIR
  2. cp -R $SHAREDIR /mnt/usbdisk/`date`
  3. chmod -R 777 $SHAREDIR

I’m paraphrasing. But only slightly. Key features are:

  • it makes no attempt to remember/restore the old perms. This does horrible things to an NFS share. I’m (charitably) assuming it doesn’t fuck up samba/appletalk too badly.
  • every file on the share is made executable before it even does anything (‘chmod -R ugo-w …’ would have the same effect and be slightly less stupid)
  • every file in the share is world writable when it completes
  • cp??? (Google returns patches that at least use rsync)
  • this is a CGI. The only user feedback is a blinkenlight on the USB disk
    (I’m using 50Gb, it was 45 minutes in before I sshed to see what was going on)
  • Samba and Appletalk support readonly shares (NFS does too, but I forgive that as it’s not part of openlink)
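
For contrast, here is roughly what I would have expected: copy the share and leave its permissions alone. This is my own sketch, not anything from the firmware or the patches I found. backup_share is a made-up name, and it does nothing to freeze the share while it copies, so it is no substitute for a real snapshot.

```shell
# backup_share SHAREDIR DESTROOT
# Copy SHAREDIR into a new timestamped directory under DESTROOT, keeping
# modes and times, without changing anything in SHAREDIR.
# Prints the directory it created.
backup_share() (
    src=$1
    dest=$2/$(date +%Y-%m-%d-%H%M%S)
    mkdir -p "$dest" || exit 1
    if command -v rsync >/dev/null 2>&1; then
        rsync -a "$src"/ "$dest"/ || exit 1
    else
        # Fallback if rsync is missing: cp -Rp also keeps modes and times.
        (cd "$src" && cp -Rp . "$dest"/) || exit 1
    fi
    printf '%s\n' "$dest"
)

# Usage: backup_share "$SHAREDIR" /mnt/usbdisk
```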

This rant is mainly due to the death of the eMac the next morning¹. I was left with a backup of the LS I didn’t trust and a ‘good copy’ of all our digital photos that had been tampered with. It took a lot of work I could really have done without to make sure that the permissions were sane.

What’s really to blame² is shitty filesystems that force developers to hack around their lack of features (snapshots in this case). I’ll go into more detail when I’ve calmed down :)

The Linkstation is still a great piece of kit as far as it goes.

In my case, it’s gone on amazon marketplace.

¹ yes, I’m aware of the repair program. No, my serial number isn’t in the list.

² no. not the guy who puts important things on firmware written by people who run off with paypal donations. definitely the filesystem. definitely.

NAS flash

Posted by Dick on April 03, 2006

Flashing openlink went alright (takes ages though. It’s not kidding about doing it over a crossover cable and it’s terrifying doing it from XP).
Somehow /dev/console isn’t created, which has, er, interesting consequences but is easily fixed.

Adding NFS was a doddle (keeping uids in sync is another story but what’s new).
Multicast DNS works when it feels like it (mDNSresponder ships with a handy admin tool called ‘kill -9’); might play with the toolchain and have a go at rolling my own.

Disappointed that the eMac doesn’t do NFS service discovery – that looked like a really neat hack. Perversely, the Mac seems to find and mount Samba more easily than Appletalk so I’ve given in; once I shovel a few Gb of MP3s out of the way I’ll Tiger it up and see if that’s any smarter.

(UPDATE: it is. NFS is way faster than Appletalk or CIFS too.)

NASty as I wanna be

Posted by Dick on March 10, 2006

I finally took a proper look at my finances. Saved about 50 pounds a month
so far by practicing the ancient art of ‘retention team brinkmanship’
on NTL and several insurance companies (I heartily recommend this butt-ugly but invaluable site).

Sort-of-related and as planned, my last mini-itx board is on ebay this week.
Nano-itx boards are finally in the shops, and they’re a lot nicer than mini-itx in a lot of ways, but I can’t find it in me to get that excited. They’re very late; EPIA’s size “don’t impressa me much” any more (there’s more competition now), and even the fastest boards are dog slow compared to a 300 quid Dell (yes, they’re ugly. Hide them in the loft.). So they sort of fall between two chairs.

Anyway, by Thursday I’ll be server-free.
Instead of servers that sit around waiting for ribena to be poured into them, I’ll have (on toddler-inaccessibly high shelves):

The Linkstation was mainly because the eMac filled up
with bittorrented bbc telly and I needed a networked disk.
First thought was ‘use the slug’, but that’s for breaking, er, playing with.

Besides, I want lots of storage for not lots of money, so I need 3.5” disks.
That means an external caddy, which is ugly, noisy, unwieldy and needs a mains plug (it’s tricky to get mains power up by the wrt54g and I’m not up for dangling cat5 around the house like it’s LAN Christmas).

Creating users, shares etc is a doddle and it’s pretty fast.
It has a not-quite silent fan, but then it’s above the eMac (a.k.a. ‘the iHairdryer’)
so by comparison it’s a ninja on tiptoes.

Samba works fine, but I’d like a newer netatalk, multicast dns and NFS (then the slug and gumstix have access to comparatively unlimited storage) so I need custom firmware.
This is exactly the kind of thing I wanted to get away from as a recovering makeworld addict
but in this case I think it’s justified.

In other news: hairy lobsters, eh? wow.