ClusterVision’s cluster design philosophy is to aim for maximum performance, stability and quality while keeping the resulting system within budget for both initial purchase and ongoing operation. We achieve this goal by carefully selecting hardware components of the best quality and combining them with the best cluster management software on the market, TrinityX.
The performance of a cluster is affected by almost every individual hardware component, but most notably by the CPU, motherboard chipset, memory and network interconnect. Selecting the right combination of components is crucial to achieving an optimally performing cluster. A poorly matched combination of components can seriously limit the performance of your cluster. This happens, for example, when the chosen motherboard chipset limits the PCI bandwidth available to a high-speed network interconnect.
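The chipset example above can be checked with simple arithmetic. The following is a minimal sketch that compares the theoretical bandwidth of a PCI Express slot with the line rate of an interconnect. The per-lane signalling rates and encodings are the published PCIe figures; the 100 Gbit/s interconnect and the function names are assumptions chosen for illustration and do not describe any particular ClusterVision configuration.

```python
# Signalling rate per lane (GT/s) and encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_gbit(generation: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe link in Gbit/s."""
    rate_gt, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt * efficiency * lanes


def slot_limits_interconnect(generation: int, lanes: int,
                             interconnect_gbit: float) -> bool:
    """Return True if the slot cannot carry the interconnect's line rate."""
    return pcie_bandwidth_gbit(generation, lanes) < interconnect_gbit


if __name__ == "__main__":
    # Hypothetical 100 Gbit/s interconnect in two different PCIe 3.0 slots.
    for lanes in (8, 16):
        limited = slot_limits_interconnect(3, lanes, 100.0)
        print(f"PCIe 3.0 x{lanes}: "
              f"{pcie_bandwidth_gbit(3, lanes):.1f} Gbit/s, "
              f"limits interconnect: {limited}")
```

Under these assumptions an eight-lane PCIe 3.0 slot falls short of the interconnect's line rate while a sixteen-lane slot does not. Real-world throughput is lower than these theoretical figures because of protocol overhead.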
The performance of a cluster is of course also affected by the operating system. Our cluster ecosystem, TrinityX, is optimised for maximum performance when running typical scientific and engineering applications. Optimisations may be applied to the Linux kernel, motherboard BIOS, hardware firmware, drivers, libraries and compilation options.
Designing for Redundancy & High Availability
ClusterVision offers fully redundant cluster designs for 99.99% guaranteed 24/7 uptime. In our redundant cluster designs, every potential single point of failure is eliminated through some form of hardware redundancy. Due to the way TrinityX is designed, cluster operation is not seriously affected by one or more compute nodes going off-line. If a compute node goes off-line, TrinityX will mark the node as off-line and exclude it from the cluster.
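To put the uptime figure in perspective, the following minimal sketch converts an availability percentage into the downtime it permits over one year. It assumes a 365-day year and is plain arithmetic, not part of TrinityX; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes "
              f"of downtime per year")
```

For a 99.99% target this works out to a little under an hour of downtime per year.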
Some clusters are delivered with spare nodes, which are kept with the cluster so that they can be swapped in if a node fails. Both master and slave spare nodes can be provided. We design most of our master nodes with redundant power supplies and dual hot-swap or cold-swap hard disks configured as a RAID level 1 mirror. In this RAID configuration, all data is written to both disks simultaneously, so the master node continues operating if one disk fails.
We offer two types of failover systems for master nodes:
- Active failover
- Passive failover
Both failover systems use a second master node that constantly monitors the activity of the primary master node. In the active failover system, if the primary master node fails, the secondary master node takes over immediately and without intervention. In the passive failover system, if the primary master node fails, the secondary master node takes over only after intervention by the system administrator. Which system is more appropriate depends on the usage of the cluster, the required level of control and the preferences of the system administrator.
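The difference between the two modes can be sketched in a few lines. The following toy model is an illustration only and does not reflect how TrinityX implements failover: the class names, the heartbeat mechanism and the threshold of three missed heartbeats are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class FailoverMode(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


@dataclass
class SecondaryMaster:
    """Toy model of a secondary master node watching the primary."""
    mode: FailoverMode
    missed_heartbeats_allowed: int = 3  # assumed threshold
    missed: int = 0
    is_primary: bool = False
    awaiting_admin: bool = False

    def observe(self, heartbeat_received: bool) -> None:
        """Record the result of one monitoring interval."""
        if self.is_primary:
            return  # already took over, nothing left to monitor
        if heartbeat_received:
            self.missed = 0
            self.awaiting_admin = False
            return
        self.missed += 1
        if self.missed < self.missed_heartbeats_allowed:
            return
        if self.mode is FailoverMode.ACTIVE:
            self.is_primary = True      # take over without intervention
        else:
            self.awaiting_admin = True  # wait for the administrator

    def admin_confirm(self) -> None:
        """Administrator approves the takeover (passive mode)."""
        if self.awaiting_admin:
            self.is_primary = True
            self.awaiting_admin = False


if __name__ == "__main__":
    for mode in FailoverMode:
        node = SecondaryMaster(mode)
        for _ in range(3):
            node.observe(heartbeat_received=False)
        print(mode.value, "took over:", node.is_primary,
              "| awaiting admin:", node.awaiting_admin)
```

In this sketch the active secondary promotes itself as soon as the threshold is reached, whereas the passive secondary only flags the failure and waits for `admin_confirm()` to be called.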