News
- 10th April, 2000 - hardware (finally) completed
- 26th April, 2000 - achieved 150.2 GFlops, with more tuning still to do: the world's first sub-US$1000/GFlop performance (single precision)
- 5th May, 2000 - machine officially named "bunyip"
- 6th May, 2000 - this paper was submitted to Supercomputing 2000, claiming our result of US$0.98/MFlop (single precision) for a real application (PDF version, 175k).
- 5th June, 2000 - bunyip officially launched
- 28th July, 2000 - the final submission for the Gordon Bell Prize reaches 163 GFlops sustained (US$0.92/MFlop) and over 193 GFlops peak.
- 8th November, 2000 - Doug Aberdeen presents his Gordon Bell finalist paper at the SC2000 Supercomputing conference (25k JPEG image).
- 9th November, 2000 - Bunyip wins the 2000 Gordon Bell Prize for price/performance on a real supercomputing application, against worthy competition from the University of Kentucky team's KLAT2 system. Photo of Doug and Bob with the certificates for us and Jon Baxter. The official news release from SC2000.
The Project
This project aims to build a 192-processor Beowulf cluster running Linux, an Open Source Software (OSS) operating system. The resulting machine is named bunyip. The project is sponsored by:
Bunyip is based on commodity dual-CPU Intel-based PCs, multiple 100Mbps Ethernet links and a novel arrangement of Ethernet switches.
Applications
The main applications of bunyip are in support of a number of research projects:
- Vector Processing on Commodity Hardware
- Fast Networking on Commodity Hardware
- Job Scheduling Strategies for Cluster Computing Systems
- Scalability of High Performance Object Stores
- Dense Linear Algebra Computations on Clusters
- Scientific Visualization of High Performance Computer Simulations
- Parallel Data Mining Algorithms
- Stable Parallel Algorithms for Banded and Block Bidiagonal Linear Systems of Equations
- Parallel Wavelet Transforms on Networks of Workstations
- Parallel Algorithms for Machine Learning
- Parallel Modal Theorem Proving
- Parallel Chess on a Beowulf
- Feature Learning for Speaker Verification and Speech Recognition
- Parallel Implementation of Architectures for Reasoning and Learning
- Combinatorial Enumeration Problems in Mathematics and Physics
Overall Specification
Cluster
| Component | Specification |
|---|---|
| CPUs | 192 x Intel Pentium III/550 |
| RAM | 36,864 MBytes, or 36 GBytes (9 x 2^32 bytes) |
| Disk | 1,313 x 10^9 bytes (1.3 TBytes) |
| Bi-sectional Bandwidth | 15.2 Gbps |
| Single Precision Flops | 158.4 GFlops |
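The RAM and disk figures in the table can be cross-checked with a little arithmetic. The sketch below assumes that "MBytes" and "GBytes" for RAM are binary units (2^20 and 2^30 bytes), which is what makes 36,864 MBytes, 36 GBytes and 9 x 2^32 bytes agree, and that disk TBytes are decimal (10^12 bytes). The per-node RAM figure is only implied by the table, under the further assumption that memory is spread evenly over the 96 dual-CPU machines.

```python
# Cross-check of the cluster specification table (illustrative sketch).
# Assumptions: RAM units are binary (MByte = 2**20, GByte = 2**30 bytes),
# disk TBytes are decimal (10**12 bytes), and RAM is split evenly per node.
NODES = 96   # dual-CPU machines
CPUS = 192   # Intel Pentium III/550 processors

ram_bytes = 9 * 2**32
ram_mbytes = ram_bytes // 2**20
ram_gbytes = ram_bytes // 2**30
ram_per_node_mbytes = ram_mbytes // NODES  # implied, assuming an even split

disk_bytes = 1_313 * 10**9
disk_tbytes = disk_bytes / 10**12

if __name__ == "__main__":
    print(f"CPUs per node: {CPUS // NODES}")
    print(f"RAM: {ram_mbytes:,} MBytes = {ram_gbytes} GBytes")
    print(f"RAM per node: {ram_per_node_mbytes} MBytes")
    print(f"Disk: {disk_tbytes:.1f} TBytes")
```

Running it prints 2 CPUs per node, 36,864 MBytes (36 GBytes) of RAM, 384 MBytes per node and 1.3 TBytes of disk, consistent with the table.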
Overview of Cluster Topology
The cluster of 96 dual-CPU machines (nodes) is divided into four groups of 24 nodes each. Each node has three 100Mbps full-duplex Ethernet cards. Each 48-port switch connects all 24 nodes in one group with all 24 nodes in another, one switch for every pair of groups (a total of 6 switches). The topology is effectively a tetrahedron with a group of 24 nodes at each vertex and a switch on each edge.
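The tetrahedral wiring described above can be modelled by treating each switch as an unordered pair of groups. The sketch below is a hypothetical model, not bunyip's actual cabling tables: the names `switches`, `nics_of` and `shared_switches` are illustrative. It checks that four groups need six switches, that each node needs exactly three NICs (one per switch touching its group), that each switch uses all 48 ports, and that any two nodes share at least one switch.

```python
from itertools import combinations

# Hypothetical model of the tetrahedral topology: groups are vertices,
# switches are edges (one switch per unordered pair of groups).
GROUPS = 4
NODES_PER_GROUP = 24
PORTS_PER_SWITCH = 48

switches = list(combinations(range(GROUPS), 2))


def nics_of(group: int) -> list[tuple[int, int]]:
    """Switches a node in `group` is cabled to (one NIC per switch)."""
    return [s for s in switches if group in s]


def shared_switches(g1: int, g2: int) -> list[tuple[int, int]]:
    """Switches that nodes in groups g1 and g2 are both attached to."""
    return [s for s in switches if g1 in s and g2 in s]


if __name__ == "__main__":
    print("nodes:", GROUPS * NODES_PER_GROUP)
    print("switches:", len(switches))
    print("NICs per node:", len(nics_of(0)))
    print("ports used per switch:", 2 * NODES_PER_GROUP, "of", PORTS_PER_SWITCH)
    # Same group: all three switches are shared; different groups: exactly one.
    print("shared, same group:", len(shared_switches(0, 0)))
    print("shared, different groups:", len(shared_switches(0, 1)))
```

Because every pair of groups has its own switch, any two nodes are a single switch hop apart, and nodes in the same group can use all three of their links to reach each other.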
The theorem used to determine the number of switches required for a given number of nodes is Theorem #8.21 in The CRC Handbook of Combinatorial Designs, edited by Charles J. Colbourn and Jeffrey H. Dinitz (1996 edition).
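The bi-sectional bandwidth of this configuration is measured through the middle of the network: any split of the four groups into two pairs is joined by four of the six switches, each with a backplane capacity of 3.8Gbps, for a total of 15.2Gbps. The worst-case bandwidth between any two nodes, when all 48 ports of a switch share its backplane, is 3.8/48 Gbps, or about 79Mbps. The best-case bandwidth is between nodes within the same group, which share all three switches, and is 300Mbps each way, or 600Mbps total.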
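The bandwidth figures follow from the switch count and the per-switch backplane capacity. The sketch below reproduces that arithmetic under the page's own assumptions: a 3.8Gbps backplane per switch, shared equally by its 48 ports in the worst case, and three 100Mbps full-duplex links per node. It treats bisection bandwidth as the minimum, over all ways of splitting the four groups into two pairs, of the backplane capacity of the switches crossing the split. The helper names are illustrative only.

```python
from itertools import combinations

# Assumed parameters, taken from the figures quoted on this page.
GROUPS = range(4)
SWITCHES = list(combinations(GROUPS, 2))  # one switch per pair of groups
BACKPLANE_GBPS = 3.8
PORTS_PER_SWITCH = 48
LINK_MBPS = 100
NICS_PER_NODE = 3


def crossing_switches(half: set[int]) -> int:
    """Number of switches joining a group in `half` to a group outside it."""
    return sum(1 for a, b in SWITCHES if (a in half) != (b in half))


def bisection_gbps() -> float:
    """Minimum crossing backplane capacity over all 2+2 splits of the groups."""
    return min(crossing_switches(set(h)) for h in combinations(GROUPS, 2)) * BACKPLANE_GBPS


# Worst case: a switch's backplane divided equally among all of its ports.
worst_case_mbps = BACKPLANE_GBPS * 1000 / PORTS_PER_SWITCH
# Best case: two nodes in the same group use all three shared switches.
best_case_mbps_each_way = NICS_PER_NODE * LINK_MBPS

if __name__ == "__main__":
    print(f"bisection: {bisection_gbps():.1f} Gbps")
    print(f"worst case: {worst_case_mbps:.1f} Mbps")
    print(f"best case: {best_case_mbps_each_way} Mbps each way, "
          f"{2 * best_case_mbps_each_way} Mbps total")
```

Every 2+2 split of a tetrahedron's vertices is crossed by four of its six edges, which is why the bisection figure is 4 x 3.8Gbps regardless of which split is chosen; the worst-case figure comes out at about 79.2Mbps.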
A detailed description of the hardware configuration of bunyip.
Management Group
Some useful Links.