ClusterVision currently offer clusters and servers with the following CPU platforms: the Intel Pentium III, the Intel Pentium 4, the Intel Xeon, the Intel Itanium 2, the AMD Athlon XP and the AMD Athlon MP. The table at the bottom of this section gives an overview of their features.

Intel Pentium 4 and Xeon
The latest Pentium 4 is based on the Northwood core, which uses a 0.13 micron technology and currently runs at a maximum clock speed of 3.2 GHz. Apart from SSE and DPL, the P4 supports SSE2, which enables the two floating point units of the P4 to perform either four single-precision or four double-precision FLOPs per clock cycle. The multi-processor version of the P4 is called the Xeon and currently runs at a maximum clock speed of 3.2 GHz.

Intel Itanium 2
The Intel Itanium 2 is the successor to the Itanium processor and provides a significant performance increase over its predecessor. It has a 64-bit architecture and makes use of EPIC (Explicitly Parallel Instruction Computing), which increases Instruction Level Parallelism (ILP) by maximising hardware-software synergy. The Itanium's EPIC, SIMD and other pipelining technologies enable its four floating point units to perform eight single-precision FLOPs or four double-precision FLOPs per clock cycle. All Itanium 2 processors can be used in multi-processor configurations. Its maximum clock speed is currently 1.5 GHz.

AMD Athlon XP and MP
The Athlon XP (XP stands for eXtreme Performance) uses a 0.13 micron technology and currently runs at a maximum clock speed of 2.2 GHz. AMD no longer uses clock speeds to categorise its processors, but a "clock speed equivalent" rating instead. The fastest Athlon, which runs at 2.2 GHz, is called the XP3200+. Apart from SSE and DPL, the Athlon supports 3DNow!, which enables its three floating point units to perform four single-precision or two double-precision FLOPs per clock cycle. The multi-processor version of the Athlon is called the Athlon MP and is currently at version XP2800+.

AMD Opteron
The AMD Opteron processor is AMD's latest processor. Based on the AMD64 architecture, it is designed to run both 32- and 64-bit applications simultaneously. The AMD Opteron processor has an integrated DDR memory controller, which reduces memory latency and allows 256 Terabytes of memory to be addressed at once. The built-in HyperTransport technology allows up to 8 AMD Opteron processors to be connected at a bandwidth of 19.2 Gigabytes/s. The Opteron currently runs at a maximum clock speed of 1.8 GHz. The Opteron is available in 1-way, 2-way and 4-way models.

Dual or Single Processor Nodes

Most clusters are designed with dual processor slave nodes. The main reasons for this are space and cost-effectiveness, both in the boxes-on-shelves housing solution and in the rackmount housing solution. Dual processor slave nodes are especially cost-effective when costly high-speed interconnects are being used, as the number of interconnect cards needed is half the number required when using single processor slave nodes. Although the dual processor design may incur a small performance penalty, since the bandwidths of both the interconnect and the memory bus are shared by the two processors, this is usually the best solution for price/performance.
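The trade-off described above can be illustrated with a small calculation. The following is only a sketch: it assumes one interconnect card per slave node, and it takes theoretical peak performance to be clock speed times FLOPs per clock cycle, using the double-precision figures quoted earlier in this section. The function names and the 64-processor cluster size are hypothetical.

```python
import math


def nodes_needed(total_cpus: int, cpus_per_node: int) -> int:
    """Slave nodes required for a given processor count.

    Assumes one interconnect card per node, so this is also the
    number of interconnect cards to buy.
    """
    return math.ceil(total_cpus / cpus_per_node)


def peak_gflops(clock_ghz: float, flops_per_cycle: int, cpus: int = 1) -> float:
    """Theoretical peak in GFLOPS: clock (GHz) x FLOPs per cycle x CPUs."""
    return clock_ghz * flops_per_cycle * cpus


if __name__ == "__main__":
    total_cpus = 64  # hypothetical cluster size
    for cpus_per_node in (1, 2):
        cards = nodes_needed(total_cpus, cpus_per_node)
        print(f"{cpus_per_node} CPU(s) per node: {cards} nodes and interconnect cards")

    # Double-precision figures quoted in the text above:
    # Itanium 2: 1.5 GHz, 4 FLOPs/cycle; Athlon: 2.2 GHz, 2 FLOPs/cycle.
    print("Dual Itanium 2 node:", peak_gflops(1.5, 4, cpus=2), "GFLOPS peak")
    print("Dual Athlon MP-class node at 2.2 GHz:", peak_gflops(2.2, 2, cpus=2), "GFLOPS peak")
```

These are theoretical peaks only. As noted above, the two processors in a dual node share the memory bus and the interconnect, so sustained performance per processor can be somewhat lower than in a single processor node.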
Hyperthreading is a new technology that is only available on Intel's P4 and Xeon processors. When a processor is dealing with a specific task, only part of the processor is used, wasting processor time. Hyperthreading overcomes this by simulating two processors, thus allowing two tasks to be executed simultaneously. This technique, called thread-level parallelism (TLP), allows two threads to share one set of execution resources, drastically improving processor utilisation. To benefit from hyperthreading, applications and operating systems must be multithreaded too, that is, written for dual-processor support. In a dual Xeon system, enabling hyperthreading will transparently simulate a total of four processors.

Unfortunately, hyperthreading is not so useful for the typical scientific and engineering applications run on compute clusters, as they tend to perform many similar floating point or integer operations in succession on each processor, rather than a mixture of operations concurrently. Hyperthreading is more beneficial for servers which run several different applications simultaneously.
DDR SDRAM (Double Data Rate SDRAM)

DDR doubles the data rate between the memory and the processor by transferring data on both the rising and the falling edge of the clock signal, so that a 133 MHz bus delivers an effective rate of 266 MHz. This results in a doubling of the memory bandwidth compared to SDRAM at the same clock speed. DDR memory is usually labelled as, for example, "DDR266 PC2100", which means it is certified for an effective bus speed of 266 MHz, giving a peak bandwidth of 2.1 GB/s.
[Figure: DDR SDRAM module]
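The relation between the two numbers in a label such as "DDR266 PC2100" can be checked with a short calculation. This sketch assumes the standard 64-bit (8-byte) data path of a DDR SDRAM module; the function name is hypothetical.

```python
def peak_bandwidth_mb_s(effective_mhz: float, bus_width_bits: int = 64) -> float:
    """Peak memory bandwidth in MB/s.

    effective_mhz is the effective transfer rate (million transfers
    per second); each transfer moves bus_width_bits / 8 bytes.
    """
    return effective_mhz * bus_width_bits / 8


if __name__ == "__main__":
    # SDRAM at 133 MHz: one transfer per clock cycle.
    print("SDRAM 133:", peak_bandwidth_mb_s(133), "MB/s")
    # DDR266: the same 133 MHz clock, two transfers per cycle.
    print("DDR266:   ", peak_bandwidth_mb_s(266), "MB/s")
```

For DDR266 this gives 2128 MB/s, which is rounded to the "2100" in the PC2100 label.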
Our clusters always come with a 100 Mbit/s Ethernet network as standard. The Ethernet network is used for administration and monitoring tasks and for the network file system (NFS). If no additional high-speed network is used, the Ethernet network is also used for message passing in parallel jobs.

In addition to the Ethernet network, we offer three different optional high-speed networks: SCI from Dolphin/Scali, Myrinet from Myricom, and QsNet from Quadrics. All three interconnects have their own dedicated multi-threaded MPI libraries for intra-node and inter-node communication, which interface directly to the low-level hardware and OS functions. All three also have a dedicated I/O processor and onboard memory, which offloads protocol handling from the main CPU and ensures that all available PCI bandwidth is dedicated to data communication. All three interconnects support Direct Memory Access (DMA) and use the 64-bit PCI or PCI-X bus. The three interconnects differ in bandwidth, latency, scalability and price. The table at the bottom of this section summarises some properties of these three interconnects as well as 100 Mbit/s and Gigabit Ethernet.
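Why both latency and bandwidth matter can be seen from a simple first-order model, in which the time to send a message is the latency plus the message size divided by the bandwidth. The sketch below is an illustration under that assumption, not a measurement or a vendor formula. As example values it uses the QsNet figures quoted in this section (5 usec, 360 MB/s) and treats the quoted bandwidth as the rate available to a single transfer, which is a simplification. All function names are hypothetical.

```python
def transfer_time_us(message_bytes: float, latency_us: float, bandwidth_mb_s: float) -> float:
    """Estimated one-way transfer time in microseconds.

    With 1 MB = 10**6 bytes, 1 MB/s equals 1 byte per microsecond,
    so bytes / (MB/s) is already a time in microseconds.
    """
    return latency_us + message_bytes / bandwidth_mb_s


def effective_bandwidth_mb_s(message_bytes: float, latency_us: float, bandwidth_mb_s: float) -> float:
    """Bandwidth actually achieved for one message of the given size."""
    return message_bytes / transfer_time_us(message_bytes, latency_us, bandwidth_mb_s)


if __name__ == "__main__":
    latency_us, bandwidth = 5.0, 360.0  # example values from this section
    for size in (64, 1800, 65536, 1_000_000):
        t = transfer_time_us(size, latency_us, bandwidth)
        eff = effective_bandwidth_mb_s(size, latency_us, bandwidth)
        print(f"{size:>9} bytes: {t:10.2f} usec, {eff:7.2f} MB/s effective")
```

In this model small messages are dominated by latency and large messages by bandwidth. Half of the peak bandwidth is reached at a message size of latency times bandwidth, which is 1800 bytes for the example values.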
Quadrics QsNet

The Quadrics QsNet network is the most costly high-speed interconnect. It has a maximum bi-directional bandwidth of 360 MB/s and a minimum latency of 5 usec. The interface cards connect through Quadrics multistage switches of up to 128 ports, which use a fat tree topology. The switches have a very high bisectional bandwidth, which scales linearly as the network grows in size. Networks of up to 1024 ports can be constructed using so-called federated switches. Quadrics uses its own MPI libraries as well as the Shmem communications library and the ElanFS remote file system protocol, optimised for QsNet.

We offer a range of storage solutions with our clusters. A versatile RAID solution is the Infortrend IDE RAID, which consists of a 19" chassis holding up to 12 large IDE disks of 240 GB each, totalling 2.88 TB of storage capacity. The RAID unit comes with either Ultra Wide LVD SCSI connectors or Fibre Channel connectors. When using a SCSI interface, two RAID units can be daisy-chained and connected to a single SCSI bus on the master node. We have extensive experience with these RAID units and know exactly how to fine-tune the hardware and the Linux operating system to achieve maximum throughput from this system.
[Figure: Infortrend IDE RAID]