Setting up Netboot
STILL WORKING OUT LOL. Process taken so far:
Okay, so I’ve been trying to get bloody netboot/diskless operation working for a while. There are always pesky little bugs that crop up; I almost got it running about a month ago, but I couldn’t get the NFS server running correctly, and though the kernel loaded properly on the target machine, it couldn’t mount a root filesystem because NFS was broken.
So I’m starting with a clean slate, trying out FreeBSD 7.0 for the first time and documenting the steps I take so I can better re-create any problems I cause myself. And, lol, if I can get it working this time then having this will be handy when I need to set it up again from scratch.
For the most part, the handbook entry on diskless operation is one of the best sources for the entire process; it only lacks in that it doesn’t go into much detail about each of the subsystems. It also doesn’t give any hints regarding what to do when shit goes wrong, but blah. There’s a couple of other (mostly dated) guides on the net, but there’s so many ways to do this it’s easy to get confused.
The setup I’m building is basically a Beowulf cluster – it consists of a set of machines connected on a private, internal network. Only one machine (the head or master node) is actually connected to an external network, thus the internal ones are fairly well-shielded from anything malicious from the outside.
The master node is the only machine with a drive, the rest of the nodes will boot diskless from it. The master node has two NICs – one for the external network which is DHCP-configured (or however your network works), and one for the internal network on which we’ll run a DHCP server.
Each node that boots diskless needs special hardware to actually do it: essentially a NIC with a PXE-loaded bootrom (and a BIOS which will let you boot from it). A quick way to check is to see if the BIOS lets you boot from your network card – if it does, then you probably have a bootrom. I’m not going to go into the complicated steps involved in flashing the damned things because that shit is messy. Just hope you have one already loaded with a build which works >_>
I’m using the subnet 192.168.100.0/24 for that internal subnet; the master node will be located at 192.168.100.3, and the test diskless client I’m configuring will be at 192.168.100.11 with hostname suigintou.
Configuring the Master
- Installed FreeBSD 7.0-BETA3, minimal distribution. I chose 7.0 because I’ve been wanting to try it out for some time, and since I was starting from scratch I didn’t have a reason not to. Most of the documentation I’m using is for 6.2, but meh it shouldn’t make a difference. It’s not like there’s that much of a change in the components this uses.
- Post-install configuration: get distributions: ports, man, info, doc, src, games (for fortune). Of utmost importance is that you get the entire source tree, because you’ll need it when it comes time to make distribution. And you need ports to get the dhcp server. Technically, man, info, doc and games can be omitted, but I chose not to in this run.
- Configure sshd’s settings in /etc/ssh/sshd_config and enable it in rc.conf:
sshd_enable="YES"
then start it
/etc/rc.d/sshd start
Up to this point, I was working off the actual machine. As soon as sshd was started I beheaded the machine and did the rest of the stuff remotely. Because working through putty while browsing the internet and doing other things is much more fun.
- Configure secondary network interface for internal LAN. You’ll need to figure out what your secondary network interface is (with ifconfig) and replace fxp0 with it, durr.
ifconfig fxp0 inet 192.168.100.3 netmask 255.255.255.0
And add a line in rc.conf so this gets done at boot-time from now on -
ifconfig_fxp0="inet 192.168.100.3 netmask 255.255.255.0"
- Lay out directories for everything:
mkdir /diskless
mkdir /diskless/tftp
mkdir /diskless/suigintou
- Re-build pxeboot from source (because I’ve had… problems with the binary that comes with the distribution for some reason) -
cd /sys/boot
make
And copy it over into the tftp folder to be served up
cp /boot/pxeboot /diskless/tftp
- Install net/isc-dhcp3-server, configure rc.conf to boot it at startup only on internal interface:
dhcpd_enable="YES"
dhcpd_flags="-q"
dhcpd_ifaces="fxp0"
- Configure /usr/local/etc/dhcpd.conf for each diskless host:
default-lease-time 0;
max-lease-time 7200;
authoritative;
ddns-update-style none;
option domain-name-servers 192.168.100.3;

subnet 192.168.100.0 netmask 255.255.255.0 {
    option subnet-mask 255.255.255.0;
    option broadcast-address 192.168.100.255;

    host suigintou {
        hardware ethernet 00:E0:81:02:B9:92;
        fixed-address 192.168.100.11;
        next-server 192.168.100.3;
        filename "/diskless/tftp/pxeboot";
        # On another machine, this didn't work (TFTP Error: file not found)
        # The easiest way to fix this is to tftp into the localhost and try to
        # fetch the file by hand, then put what works into the filename.
        option root-path "192.168.100.3:/diskless/suigintou";
    }
    # ... etc
}
For each host you’re booting diskless, you’ll want to add another host{ } block. The MAC address of each block is used to associate the diskless client with a hostname. I’ll probably end up tinkering with the root-path option to specify different configurations for each diskless machine, and possibly provide swap space for them (though NFS swap is ick).
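If you end up with more than a couple of clients, typing those host blocks by hand gets old. Here's a rough sketch of a generator, assuming every client follows the same layout as suigintou (root under /diskless/<hostname>, same pxeboot path). The host_block name and its argument order are made up for this example.

```sh
#!/bin/sh
# Sketch: print one dhcpd.conf host block per diskless client.
# SERVER and the /diskless/<name> layout follow the setup in this post.
SERVER=192.168.100.3

# host_block NAME MAC IP  (hypothetical helper, not part of dhcpd)
host_block() {
    name=$1 mac=$2 ip=$3
    cat <<EOF
    host $name {
        hardware ethernet $mac;
        fixed-address $ip;
        next-server $SERVER;
        filename "/diskless/tftp/pxeboot";
        option root-path "$SERVER:/diskless/$name";
    }
EOF
}
```

Calling `host_block suigintou 00:E0:81:02:B9:92 192.168.100.11` prints a block you can paste inside the subnet { } section.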
- Enable inetd in rc.conf:
inetd_enable="YES"
and have inetd start tftp when needed, for both udp (standard) and tcp (for weird PXE hardware?) connections in inetd.conf -
tftp dgram udp wait root /usr/libexec/tftpd tftpd -l -s /diskless/tftp
tftp stream tcp wait root /usr/libexec/tftpd tftpd -l -s /diskless/tftp
and restart inetd -
/etc/rc.d/inetd restart
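A likely cause of the "TFTP Error: file not found" mentioned in the dhcpd.conf comment: as far as I understand tftpd(8), the -s flag makes tftpd chroot into that directory, so clients have to ask for paths relative to it. The sketch below only works out what name to request; tftp_name is a made-up helper, and it assumes the directory given to -s is the one from the inetd.conf lines above.

```sh
#!/bin/sh
# tftp_name ROOT PATH: print PATH the way a client would have to request it
# when tftpd runs with "-s ROOT" (hypothetical helper for illustration).
tftp_name() {
    root=${1%/}                 # drop a trailing slash from ROOT, if any
    case $2 in
        "$root"/*)
            rel=${2#"$root"}    # strip the chroot prefix, keep the leading /
            printf '%s\n' "$rel"
            ;;
        *)
            printf '%s\n' "$2"  # already relative to the chroot, leave alone
            ;;
    esac
}
```

`tftp_name /diskless/tftp /diskless/tftp/pxeboot` prints `/pxeboot`. Try fetching that name by hand from an interactive `tftp 192.168.100.3` session with `get`, and put whatever works into the filename option.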
- Enable the NFS server in rc.conf:
rpcbind_enable="YES"
mountd_enable="YES"
nfs_server_enable="YES"
and export the proper directories for each host (only 1 here) in /etc/exports:
/diskless/suigintou -alldirs -ro 192.168.100.11
Start up NFS with
/etc/rc.d/rpcbind start
/etc/rc.d/nfsd start
/etc/rc.d/mountd start
And verify that everything is properly mounted with showmount -e. The output should look something like this -
# showmount -e
Exports list on localhost:
/diskless/suigintou 192.168.100.0
If there’s nothing listed there, then something isn’t set up properly and you’ll get NFS mount errors when you boot the diskless node.
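If you'd rather have a script check this than squint at the output, something like the following would do. It's a minimal sketch: export_listed is a made-up helper that reads showmount -e style output on stdin and only looks at the first column, so it says nothing about which clients are allowed.

```sh
#!/bin/sh
# export_listed PATH: succeed if PATH shows up as an exported directory
# in "showmount -e"-style output read from stdin (hypothetical helper).
export_listed() {
    awk -v p="$1" '
        $1 == p { found = 1 }          # first column is the exported path
        END     { exit (found ? 0 : 1) }
    '
}
```

Usage would look like `showmount -e localhost | export_listed /diskless/suigintou || echo "not exported"`.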
- Prepare a DISKLESS kernel configuration, based on the GENERIC configuration. If you haven’t compiled a custom kernel before, you’ll benefit from reading the handbook article on building and installing custom kernels.
cp /sys/i386/conf/GENERIC /sys/i386/conf/DISKLESS
and add the following options into the DISKLESS kernel configuration:
options BOOTP
options BOOTP_NFSROOT
The handbook article on diskless doesn’t bother to tell you that you shouldn’t modify the GENERIC configuration directly, but you shouldn’t. Always make a copy of GENERIC and work from that copy; that way, when you break something, you can always easily revert.
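Adding the two options can be scripted so that running it twice doesn't leave duplicate lines in the config. This is just a sketch; add_kernel_option is a made-up helper, and it only recognizes lines of the form "options NAME" with nothing but whitespace after the name.

```sh
#!/bin/sh
# add_kernel_option FILE NAME: append "options NAME" to FILE unless an
# identical option line is already present (hypothetical helper).
add_kernel_option() {
    conf=$1 opt=$2
    # Match NAME as a whole word so BOOTP is not confused with BOOTP_NFSROOT.
    if ! grep -Eq "^options[[:space:]]+$opt([[:space:]]|\$)" "$conf"; then
        printf 'options\t%s\n' "$opt" >> "$conf"
    fi
}
```

For the config in this post that would be `add_kernel_option /sys/i386/conf/DISKLESS BOOTP` followed by the same call with BOOTP_NFSROOT.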
- Next, write a script to build the distribution from source -
#!/bin/sh
export DESTDIR=/diskless/suigintou/
mkdir -p ${DESTDIR}
cd /usr/src; make buildworld && make buildkernel KERNCONF=DISKLESS
cd /usr/src/etc; make distribution
I took this script straight from Diskless Operation in the handbook, but added the KERNCONF=DISKLESS to indicate that we want to use the DISKLESS kernel instead of the GENERIC kernel. And execute that script to build the distribution. This is taking forever to finish blah blah.
FUCK THAT DIDN’T WORK. SOMETHING IS WRONG WITH THE DESTDIR BULLSHIT >:(
Okay, I think I found a fix -
- Build the world and the kernel. Building the world takes fucking ages to do; if you’ve done it before you shouldn’t need to do it again. Ever. You’ll need to compile the kernel in any case.
cd /usr/src
make buildworld
make buildkernel KERNCONF=DISKLESS
- Once that’s done and over with, you need to slam that stuff into the prepared place for it -
cd /usr/src
make installworld DESTDIR=/diskless/suigintou
make installkernel DESTDIR=/diskless/suigintou KERNCONF=DISKLESS
make distribution DESTDIR=/diskless/suigintou
As a random note, if you fuck something up and aren’t able to delete certain files anymore, it’s because the installkernel make script sets a “no change” flag on a bunch of files so you can’t accidentally fuck your system with rm -rf /*. Anyway, to kill the flag, use chflags [-R] noschg on the files or directory in question.
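The install steps can be wrapped up so that a failure stops the chain instead of plowing on into the next make. A rough sketch, assuming the same DESTDIR and KERNCONF as above; the DRY_RUN switch, run and install_diskless_root are all made up for this example, and by default it only prints what it would do.

```sh
#!/bin/sh
# Sketch only: DRY_RUN, run() and install_diskless_root() are hypothetical.
DESTDIR=${DESTDIR:-/diskless/suigintou}
KERNCONF=${KERNCONF:-DISKLESS}
DRY_RUN=${DRY_RUN:-1}      # 1 = just print the commands, 0 = really run them

# run CMD...: echo the command, then execute it unless DRY_RUN is 1.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# Chain the steps with && so the first failing step ends the sequence.
install_diskless_root() {
    run cd /usr/src &&
    run make installworld DESTDIR="$DESTDIR" &&
    run make installkernel DESTDIR="$DESTDIR" KERNCONF="$KERNCONF" &&
    run make distribution DESTDIR="$DESTDIR"
}
```

Call install_diskless_root once to eyeball the commands, then set DRY_RUN=0 and call it again to do it for real.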
- So now we’ve got our root filesystem ready to export. Now just gotta make sure all the processes we need are running (dhcpd, inetd and nfsd), then try booting the remote system… BUM BUM BUMMMMMM
If all goes well, you should be able to boot your remote machine.
Reasons this is Fucked.
The problem is that the entire NFS filesystem will be read-only, which breaks all kinds of shit. One solution I’ve found so far is to slap a union’ed memory-based filesystem over parts of it, like
mdmfs -M -s16m -o union md1 /etc
I had to boot the machine in single-user mode to even do this, because the master.passwd requires a lock to open. Thus, we need to put a memory-backed filesystem over /etc, then touch master.passwd to copy it into the memory-backed part. unionfs is really cool…
Ideally, what I want is to be able to NFS-mount a read-only root directory, then NFS-mount with unionfs a whole filesystem over that, such that we can both modify files AND have those changes be persistent. Memory-backed file systems are great, except that they’re completely lost when you reboot…
Now to figure out how to do that…
Okay, woot figured it out. Basically, you’ll want to lay out the fstab on the client machine something like this:
# Mount the memory-backed filesystems
/dev/md0 /var mfs rw,-M,union,-s4m 2 0
/dev/md1 /tmp mfs rw,-M,union,-s8m 2 0
# Mount the NFS-backed filesystems
192.168.100.3:/usr/diskless/suigintou/etc /etc nfs rw
192.168.100.3:/usr/diskless/suigintou/usr /usr nfs rw
The fstab file format is really archaic: it uses a space-delimited list of things. This implies that the list of options must be comma-delimited and can CONTAIN NO SPACES. Took me a fucking half-hour to work out why mount_md was breaking shit. Anyway.
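A stray space in the options column can be caught before rebooting with a quick sanity check. This is a minimal sketch, not a real fstab parser: check_fstab_fields is a made-up helper that assumes each entry has 4 to 6 whitespace-separated fields and that the fifth and sixth, when present, are plain numbers.

```sh
#!/bin/sh
# check_fstab_fields: read fstab text on stdin and complain about entries
# whose field layout looks wrong (hypothetical helper for illustration).
check_fstab_fields() {
    awk '
        /^[ \t]*(#|$)/ { next }                  # skip comments and blank lines
        {
            ok = (NF >= 4 && NF <= 6)            # device, mountpoint, type, options, [dump, pass]
            if (NF >= 5 && $5 !~ /^[0-9]+$/) ok = 0   # dump field must be a number
            if (NF >= 6 && $6 !~ /^[0-9]+$/) ok = 0   # pass field must be a number
            if (!ok) {
                printf "line %d looks malformed: %s\n", NR, $0
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Run it as `check_fstab_fields < /diskless/suigintou/etc/fstab` (or wherever the client's fstab lives); it prints the offending lines and returns non-zero if it finds any.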
That should just about do it. I’m tired of editing this post, lawds. Now I need to find me a new CMOS battery so I can actually reboot this machine and have it load everything without me going through the BIOS menus to acknowledge that yes, I know, the battery is dead. Fucking fuckity fuck.