Have you ever seen the Batman movie “The Dark Knight Rises”? Wayne Manor was actually a castle in Nottingham in the U.K. Apart from its film appearances, this respectable old town has a lot of history attached to it, and some of the newest scientific discoveries also come out of Nottingham. The city houses the University of Nottingham, home to, for instance, the Sir Peter Mansfield Magnetic Resonance Centre, named after the Nobel Prize winner who pioneered MRI research. Modern research cannot do without a lot of computing, which is why the university's researchers are supported by a High Performance Computing (HPC) centre. The centre operates a large ClusterVision cluster that is used by 400 of the university’s researchers. The manager of HPC services in Nottingham is Colin Bannister. We asked him about the services his centre delivers.
Q: Can you give a short summary of the HPC services?
Colin Bannister: We have a 4,000-core HPC cluster system which has recently been expanded. The system is used by a wide range of researchers. The expansion we just had was funded through two new research centres. One is in engineering and is particularly focused on simulations of air engines and gas turbines. The other research centre is called synthetic biology, and is modelling biological systems. It is a joint initiative between computer science in the medical faculties and the school of pharmacy. There is a wide range of other uses on the general system. We see a lot of use from chemistry, and from researchers doing simulation. Geneticists typically perform genetic analysis and physicists perform all sorts of different simulations. We really need to cope with a wide range of different customers.
Q: Thank you for this introduction. What kind of university is the University of Nottingham?
Colin Bannister: It is a big research-led university, one of the leading research universities in the UK. It is one of what are called the Russell Group universities. These are the universities in the UK that do most of the research.
The users of our HPC centre are all from the university at this point in time. We have overseas campuses in both China and Malaysia. Users from those campuses also access the facility.
Q: Can you tell a little bit more about the hardware architecture of the machine? What type of cores, interconnect, and so on?
Colin Bannister: The system consists purely of Intel processors. It has a high performance InfiniBand 40 Gbit/s QDR interconnect, provided by Intel. Since Intel bought QLogic, they own this interconnect. We have a high performance parallel storage environment provided by Panasas. In addition to the hardware, we use tools including the PBS Pro scheduler, which controls job submissions. We have installed various compilers and optimised numerical libraries, to make sure the researchers get the most out of the hardware. ClusterVision manages most of the system’s administration remotely.
Q: Do you also have other systems that you are operating?
Colin Bannister: No, we do not. We concentrate on this one system. We ran previous HPC systems for the university, but this is the first one from ClusterVision. This system has now taken over from the older ones. In fact, we did this quite soon after we started the service on the ClusterVision machine.
Q: The system itself was procured a few years ago but has recently been upgraded. Why did you decide to upgrade it?
Colin Bannister: Some research centres did not have sufficient capacity on their existing systems. These centres decided to fund an expansion with us rather than purchasing an additional cluster for themselves. With this approach, they could take advantage of the managed facility. They did not want to deal with the management of their own system and recognized that our facility was well managed and offered a high level of service for their users. Hence, they decided to help expand the current system.
Q: Are there users who are using the whole system? Or is everyone only using part of it?
Colin Bannister: No one really gets to use the whole system. The largest jobs we are running up there are between 500 and 600 cores. Most jobs are smaller than that. It is a multi-user cluster, so we have two or three dozen people running jobs at any one time. We have a user base of 400 users. A research centre’s user base typically uses the capacity that their centre has funded.
Q: Can you tell a little bit about the management of the system? What does it entail and how do you organise it?
Colin Bannister: We use Bright Cluster Manager. We found this beneficial because it makes it easy to see what is going on in the cluster, to provision nodes and so on. ClusterVision uses Bright Cluster Manager as a tool for managing our cluster remotely. ClusterVision completes a great deal of routine administration on the system, as well as supporting it. This was important for us, because we wanted to concentrate on actually helping the users get the most out of the system and empowering them to focus on the science. The scheduler, PBS Pro, has been quite extensively customized for our requirements. ClusterVision and Altair helped us tune the system in order to fulfill our usage requirements.
Q: It is quite a large system. Is it running smoothly?
Colin Bannister: There have been no major problems with the hardware at all. We have seen some minor issues with the scheduler occasionally, but in general, ClusterVision was able to solve issues smoothly and quickly. Major interruptions in the service have been mainly due to changes in our data centre, or for example tests on our electrical system. The system has generally had planned downtime but not any serious amount of unplanned downtime.
Q: What are the future plans for the system?
Colin Bannister: We have just signed an agreement to run it for at least another two years. In the meantime, we are going to be busy reviewing what the computational requirements of the University are. We have just launched a survey to that effect and we should be developing our approach in the next few months.
We are also involved in a project to support, through an expansion of the system, the new Sir Peter Mansfield Magnetic Resonance Centre. It is named after Peter Mansfield, the person who invented magnetic resonance imaging and won a Nobel Prize for it.
Q: Any summary conclusions?
Colin Bannister: I just think ClusterVision has been a really good partner for us. They’ve helped us to develop our service and bring it to a much higher level than it previously was. The tools that we chose when we did the procurement of the machine proved to be the right ones.