The nodes have no removable media. Instead, they boot up from the network using WOL, bootp and TFTP.
All console messages during this Boot ROM process are sent to serial port 1 (/dev/ttyS0) for diagnostic purposes.
The Wake-On-LAN (WOL) feature is supported by a synergy of the TRENDnet NIC, the EPoX motherboard and the ATX power supply. Whilst the power supply is connected to the mains, it supplies a small amount of current to the TRENDnet NIC through the WOL connector on the motherboard and the WOL cable to the NIC. The NIC continuously monitors the network traffic, looking for a "Magic Packet" (see Magic Packet Technology). When the appropriate Magic Packet is received, the NIC sends a signal back to the ATX power supply via the motherboard to switch on full power and the machine powers up normally.
The Magic Packet is generated by a small program on the server called "wol". This code was written by Bob Edwards and is included in the Etherboot package from version 4.4.0 on. This program can be included in a script which reads the Ethernet Medium Access Control (MAC) address for each node machine's TRENDnet NIC from a file and then sends the Magic Packet to each node. An appropriate delay between Magic Packets prevents a power surge from all machines starting up together and avoids overloading the servers with a storm of bootp requests.
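As an illustration of the script described above, here is a minimal Python sketch. It is not the "wol" program itself. It assumes the usual Magic Packet layout (six 0xFF bytes followed by the target MAC address repeated 16 times) and sends it as a UDP broadcast. The function names, the MAC file format (one address per line, "#" for comments), the port number and the two-second delay are all assumptions made for the example.

```python
import socket
import time


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte Magic Packet payload for a MAC such as '00:40:05:12:34:56'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"bad MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    # Six 0xFF synchronisation bytes, then the target MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def wake_nodes(mac_file: str, delay_s: float = 2.0,
               broadcast: str = "255.255.255.255", port: int = 9) -> int:
    """Send one Magic Packet per MAC listed in mac_file; return how many were sent.

    The file format, port and delay are assumptions for this sketch.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting from a UDP socket has to be enabled explicitly.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        with open(mac_file) as f:
            for line in f:
                mac = line.split("#", 1)[0].strip()
                if not mac:
                    continue  # skip blank lines and comments
                sock.sendto(build_magic_packet(mac), (broadcast, port))
                sent += 1
                # Stagger power-up so the nodes do not all start at once.
                time.sleep(delay_s)
    return sent
```

Calling wake_nodes("nodes.mac") (a hypothetical file name) would wake the listed machines one at a time, with the delay spreading out both the power load and the bootp requests.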
Due to the way UDP broadcast packets are handled by Linux, the WOL packets are only sent out on the interface with the "default" route. We want the packets sent out on the gigabit ethernet interface onto the private network, so we arrange for the default route to go onto this network. This means that the server machines that generate the WOL packets do not see much of the rest of the internet (and most of the rest of the internet does not see the Beowulf servers).
Because the node machines do not have video cards or keyboards in their normal operating configuration, the CMOS RAM of the motherboard BIOS needs to be pre-configured to allow the machine to boot up sensibly from the Etherboot code on the boot ROM on the TRENDnet NIC.
In particular, we disable all IDE interfaces and the floppy interface, and set the BIOS to halt on "No Errors". We also ensure that the Wake-On-LAN feature is enabled.
Etherboot is an open-source software project for allowing Intel PCs to boot up from a network server. The compiled Etherboot binary can be loaded from a boot floppy (for testing, etc.) or programmed (burned) into a Read-Only Memory (ROM) device (UV EPROM, EEPROM, Flash ROM etc.).
In our case, the TRENDnet NIC provides for a 28-pin ROM device, which is "found" by the BIOS during system startup and the code in it is executed. The Etherboot code fits into a 16k x 8 device, and so we are using 27C128 UV EPROMs, mainly because they are relatively cheap.
The Etherboot ntulip.c driver, written by Marty Connor and Ken Yap, did not provide support for the Media-Independent Interface (MII) used on the TRENDnet NIC. Instead, it assumed that only the built-in 10BaseT port would be used.
By using Donald Becker's tulip-diag utility (at http://cesdis.gsfc.nasa.gov/linux/diag/tulip-diag.c) we were able to determine that the TRENDnet NIC did, in fact, advertise that it was using an MII in its configuration ROM (SROM). By decoding the contents of the SROM, and with some assistance from Paul Mackerras of Linuxcare in determining appropriate values to put into the Intel/DEC 21143 device to configure it for MII operation, we were able to get the TRENDnet NIC to work with Etherboot. The patch is now included in Etherboot version 4.4.3 and should work with any NIC which is based on the 21142/21143 and supports 100Mbps.
A second problem became evident during testing: the 5.4MByte boot image was taking over 42 seconds to load (about 128kBytes/sec), which is poor for a 100Mbps link. Again, with help from Paul Mackerras, a number of optimisations were made to the ntulip.c code to remove the delays and speed up the loading of the boot image. As a result, the boot image now takes 10.5 seconds to load: over 30 seconds less, but still only about 514kBytes/sec. We should be able to get it a lot faster than this yet (my aim is for less than 1 second). These driver changes also required the erasure and re-programming of all the boot EPROMs.
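The load-time figures can be checked with a little arithmetic. The sketch below assumes 1 MByte = 10^6 bytes and uses the nominal 100Mbps line rate, ignoring all protocol overhead, so the "ideal" time is only a lower bound. The variable and function names are chosen for the example.

```python
IMAGE_BYTES = 5.4e6       # boot image size quoted in the text (1 MByte taken as 10**6 bytes)
LINK_BITS_PER_S = 100e6   # nominal Fast Ethernet line rate, overhead ignored


def rate_kbytes_per_s(seconds: float) -> float:
    """Average transfer rate in kBytes/sec for loading the whole image in `seconds`."""
    return IMAGE_BYTES / seconds / 1000


before = rate_kbytes_per_s(42.0)    # original ntulip driver
after = rate_kbytes_per_s(10.5)     # after the optimisations
ideal_seconds = IMAGE_BYTES * 8 / LINK_BITS_PER_S  # time at full line rate

print(f"before: {before:.1f} kBytes/sec")
print(f"after:  {after:.1f} kBytes/sec")
print(f"line-rate lower bound: {ideal_seconds:.3f} s")
```

At full line rate the image would take a little under half a second, so a target of less than 1 second is within what the link can carry, though it leaves little room for per-packet delays.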
Because our node machines do not have video adaptors, we configured Etherboot to report all startup messages to serial port 1 (/dev/ttyS0) by adding the following lines to src-32/Config:
CFLAGS+= -DSERIAL_CONSOLE -DCOMCONSOLE=0x3F8
CONSPEED= 9600
(actually, uncommenting and editing the existing lines). We also add the flag to use BOOTP in place of DHCP by adding the flag "-DNO_DHCP_SUPPORT" onto the end of one of the CFLAGS+= lines.
The ntulip Etherboot driver was built for testing by inserting a formatted floppy disk into the development machine drive and using:
make ntulip21142.fd0
(the 21142 version of ntulip places the correct vendor and device IDs for the 21142/21143 into the PCI structure part of the boot ROM)
The floppy disk generated is a bootable floppy disk which only runs the Etherboot code. The system can be tested at this stage by booting from the network.
The ROM image was built as a part of building the floppy image and is called ntulip21142.rom. It is a 16,384 (16k) byte binary image to be "burned" (programmed) into a 16k x 8 ROM device. We are using ST 27C256 32k x 8 UV EPROMs as they are cheaper. To make sure that the image is in the correct "half" of the ROM, we replicate it using cat:
cat ntulip21142.rom ntulip21142.rom > ../ntulip.bin
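The same duplication can be done with a size check, which guards against programming a truncated or oversized image. This is a sketch only: the function name is made up for the example, and the 16,384 byte size is taken from the text above.

```python
from pathlib import Path

ROM_SIZE = 16 * 1024  # size of the Etherboot ROM image, as stated in the text


def duplicate_rom(src: str, dst: str, copies: int = 2) -> int:
    """Write `copies` back-to-back copies of the ROM image in src to dst.

    Returns the number of bytes written. Raises ValueError if the source
    image is not exactly ROM_SIZE bytes long.
    """
    image = Path(src).read_bytes()
    if len(image) != ROM_SIZE:
        raise ValueError(f"{src}: expected {ROM_SIZE} bytes, got {len(image)}")
    # Every 16k region of the larger EPROM then holds the same code.
    Path(dst).write_bytes(image * copies)
    return len(image) * copies
```

For example, duplicate_rom("ntulip21142.rom", "../ntulip.bin") would produce the same 32,768 byte file as the cat command.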
The resulting 32,768 (32k) byte file is then ftp'd to the DOS machine with the EPROM programmer on its parallel port, from where the 96 Boot ROMs are programmed. We are using a JED Microelectronics EPROM burner.
The EPROMs were then sent to the node hardware supplier for inserting onto the TRENDnet NICs and inclusion into the node machines.
The two server machines each have bootp-2.4.3 installed. Only one of them normally runs the daemon at any time, started from the inetd super-daemon.
Each server contains a complete /etc/bootptab file with the Ethernet MAC address <-> IP name mapping for each machine, as well as the subnet mask, IP number for the server and name of the tftp boot file to load. This information is sent to each node in response to the BOOTP request from the Etherboot boot ROM.
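To give an idea of what such an entry looks like, the sketch below generates one /etc/bootptab line per node. The tag names (ht, ha, ip, sm, sa, bf) are the standard ones from the bootptab(5) manual page as we understand them. The host name, addresses and function name are hypothetical, and the cluster's real file is not reproduced here.

```python
def bootptab_entry(name: str, mac: str, ip: str, netmask: str,
                   server_ip: str, bootfile: str) -> str:
    """Format one bootptab line for a node (all values supplied by the caller)."""
    hw_addr = mac.replace(":", "").upper()  # bootptab takes the MAC as plain hex digits
    fields = [
        name,
        "ht=ether",          # hardware type: Ethernet
        f"ha={hw_addr}",     # hardware (MAC) address
        f"ip={ip}",          # IP number assigned to the node
        f"sm={netmask}",     # subnet mask
        f"sa={server_ip}",   # address of the TFTP server
        f"bf={bootfile}",    # boot file to request via TFTP
    ]
    return ":".join(fields)


# Hypothetical node, for illustration only.
print(bootptab_entry("node01", "00:40:05:12:34:56", "192.168.1.1",
                     "255.255.255.0", "192.168.1.254", "/tftpboot/beoboot"))
```

Generating the file from the same MAC address list that drives the wake-up script keeps the two in step.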
Once the node knows its own IP number and other network parameters (from BOOTP), Etherboot then sends out the TFTP request for the boot file. In practice, TFTP transfers the file in lock step: each packet of about 516 bytes (512 bytes of data plus a 4-byte header) must be acknowledged by the node before the server sends the next one. Our boot file is typically over 5.4MB, so the node sends over 10,000 such TFTP acknowledgements to load the whole boot image.
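The packet count follows from the TFTP block size. The sketch below assumes the standard 512-byte data block of RFC 1350 and takes 5.4MB as 5,400,000 bytes. It also divides the measured 10.5 second load time by the number of blocks to show how little time each round trip can take. The names are chosen for the example.

```python
TFTP_DATA_BYTES = 512    # data bytes per TFTP DATA packet (RFC 1350)
TFTP_HEADER_BYTES = 4    # 2-byte opcode plus 2-byte block number


def tftp_blocks(image_bytes: int) -> int:
    """Number of DATA packets needed to transfer a file of image_bytes.

    The transfer ends with a packet carrying fewer than 512 data bytes, so a
    file that is an exact multiple of 512 needs one extra, empty packet.
    """
    return image_bytes // TFTP_DATA_BYTES + 1


image_bytes = 5_400_000                  # assumed size of the boot image
blocks = tftp_blocks(image_bytes)
packet_bytes = TFTP_DATA_BYTES + TFTP_HEADER_BYTES
per_block_s = 10.5 / blocks              # average time per block at the measured load time

print(f"{blocks} blocks of up to {packet_bytes} bytes")
print(f"about {per_block_s * 1000:.2f} ms per block")
```

Because each block waits on an acknowledgement, the load time is governed by the per-block round trip rather than by the raw link speed, which is why removing delays in the driver matters so much.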
Etherboot stores the boot image in RAM, and once loaded, starts to execute it.
The boot image file loaded by TFTP is made by the mknbi-linux (make network boot image for linux) utility that is a part of netboot, included in Etherboot. The command line used to make the boot image is:
/usr/local/bin/mknbi-linux -d ram -i rom -a "ramdisk_size=16384" -k ${KERNEL} -r rootfs.gz -o /tftpboot/beoboot
mknbi-linux joins the compressed kernel image, a compressed RAM disk image and a small loader together into the boot image. The small loader is the first code executed in the TFTP loaded boot image. It is responsible for starting the kernel and passing it information about the RAM disk and the network boot parameters.
The kernel then decompresses itself, starts running, decompresses the RAM disk image, makes it into a RAM disk and makes it the root filesystem. More information on how the kernel is configured to do this is in Node Configuration.