The distributed metadata architecture of FraunhoferFS (FhGFS) has been designed to provide the scalability and flexibility required to run a range of demanding HPC applications. It is built around the following key concepts:
- Distributed file contents & metadata
- Optimised specifically for HPC applications
- Native InfiniBand / RDMA (Remote Direct Memory Access) support
- Ability to add clients and servers without system downtime
- Multiple servers on the same machine
- Client and servers can run on the same machine
- Servers run on top of local FS
- On-the-fly storage initialisation suitable for temporary “per-job” PFS
- Flexible striping (per-file/per-directory)
- Multiple networks with dynamic failover
Ease of Use
- Servers: userspace
- Client: patchless kernel module
- Graphical system administration & monitoring
- Simple setup/startup mechanism
- No specific Linux distribution required
- No special hardware requirements
FhGFS combines multiple storage servers to provide a shared network storage resource with striped file contents. In this way, the system overcomes the tight performance limitations of single servers, single network interconnects, a limited number of hard drives, and so on. In such a system, the high throughput demands of large numbers of clients can easily be satisfied, and even a single client can benefit from the aggregated performance of all the storage nodes in the system.
This is made possible by a separation of metadata and file contents. FhGFS clients have direct access to the storage servers and communicate with multiple servers simultaneously, giving applications truly parallel access to the file data. To keep metadata access latency (e.g. for directory lookups) to a minimum, FhGFS also allows users to distribute the metadata across multiple servers.
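The striping idea can be illustrated with a minimal sketch. This is not FhGFS's actual on-disk logic; it simply assumes a fixed chunk size and a round-robin layout across a per-file list of storage targets, which is the general scheme a striped parallel file system uses to map a byte offset to a server:

```python
def locate_chunk(offset, chunk_size, targets):
    """Map a file byte offset to (storage target, offset within chunk).

    Illustrative round-robin striping: chunk i of the file lives on
    targets[i % len(targets)]. Chunk size and the target list would be
    per-file (or per-directory) settings.
    """
    chunk_index = offset // chunk_size
    target = targets[chunk_index % len(targets)]
    offset_in_chunk = offset % chunk_size
    return target, offset_in_chunk

# Hypothetical example: 512 KiB chunks striped across four storage servers.
targets = ["storage01", "storage02", "storage03", "storage04"]
print(locate_chunk(0, 512 * 1024, targets))                      # chunk 0 -> storage01
print(locate_chunk(5 * 512 * 1024 + 100, 512 * 1024, targets))   # chunk 5 -> storage02
```

Because consecutive chunks land on different servers, a single large read or write is spread over all targets at once, which is where the aggregated-bandwidth benefit described above comes from.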
One of the other strengths of FhGFS is that it is implemented on top of an existing local file system (for example, XFS). This significantly increases reliability compared with other parallel file systems such as Lustre.
Distributed File Contents and Metadata
One of the most fundamental concepts of FhGFS is the strict avoidance of architectural bottlenecks. Striping file contents across multiple storage servers is only one part of this concept. Another important aspect is the distribution of file system metadata (e.g. directory information) across multiple metadata servers. Large systems and metadata-intensive applications in general can greatly profit from the latter feature.
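A simplified sketch of metadata distribution (not FhGFS's actual placement algorithm): each directory's metadata is owned by exactly one server, chosen here by hashing the directory path, so that lookups in one directory contact a single server while different directories spread across all metadata servers:

```python
import hashlib

def metadata_server_for(path, servers):
    """Pick the metadata server owning a directory, by hashing its path.

    Illustrative scheme only: a stable hash keeps the mapping
    deterministic, so every client computes the same owner without
    asking a central coordinator.
    """
    digest = hashlib.sha1(path.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]

# Hypothetical example with three metadata servers.
servers = ["meta01", "meta02", "meta03"]
owner = metadata_server_for("/home/alice/results", servers)
```

The key property is that metadata load scales out with the number of metadata servers instead of funnelling through one node.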
Built on scalable multi-threaded core components with native InfiniBand support, file system nodes can serve InfiniBand and Ethernet (or any other TCP-enabled network) connections at the same time and automatically switch to a redundant connection path in case any of them fails.
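The failover behaviour can be sketched in a few lines. This is a conceptual model, not FhGFS code: configured network paths (e.g. RDMA first, then TCP) are tried in preference order, falling back to the next path when one fails:

```python
def connect_with_failover(paths, try_connect):
    """Try each configured network path in preference order.

    `paths` might be e.g. [("ib0", "rdma"), ("eth0", "tcp")];
    `try_connect` is any callable that returns a connection object
    or raises ConnectionError on failure.
    """
    last_error = None
    for path in paths:
        try:
            return try_connect(path)
        except ConnectionError as err:
            last_error = err  # remember failure, fall through to next path
    raise ConnectionError(f"all paths failed: {last_error}")

# Hypothetical demo: the RDMA path is down, the TCP path succeeds.
def fake_connect(path):
    if path == ("ib0", "rdma"):
        raise ConnectionError("link down")
    return f"connected via {path[0]}"

print(connect_with_failover([("ib0", "rdma"), ("eth0", "tcp")], fake_connect))
# connected via eth0
```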
Easy to Use
FhGFS requires no kernel patches (the client is a patchless kernel module, the server components are userspace daemons), comes with graphical cluster installation tools and allows users to add more clients and servers to the running system whenever needed.
Client and Servers on any Machine
No specific enterprise Linux distribution or other special environment is required to run FhGFS. FhGFS client and servers can even run on the same machine to enable performance increases for small clusters or networks. FhGFS requires no dedicated file system partition on the servers. It uses existing partitions, formatted with any of the standard Linux file systems, for example XFS or ext4. For larger networks, it is also possible to create several distinct FhGFS file system partitions with different configurations.
When compared to simple remote file systems such as NFS, FhGFS provides a coherent mode, in which it is guaranteed that changes to a file or directory by one client are always immediately visible to other clients.
FraunhoferFS (FhGFS) is under continual development to add new features and improve functionality. Features currently in development include:
- Non-strict and full Quota support
- FhGFS on-demand: makes configuring and starting a completely new FhGFS instance (and cleaning it up afterwards) as simple as running an MPI job, requiring only a hosts file and storage paths in a single command.
- New metadata format, enabling sustained rates of more than 500,000 file creates per second and more than 1,000,000 stat operations per second with 20 SSD-based metadata servers
- Support for data and metadata mirroring.
- Single-pass file system check that can analyse and repair while the file system is in use.
- Hierarchical Storage Management (HSM): integration with the Grau Data single-server HSM solution, providing a parallel file system with HSM capabilities.
Current ClusterVision Installations
ClusterVision has a long-standing relationship with the Fraunhofer team and has successfully implemented FraunhoferFS (FhGFS) at a number of customers in Europe.
- FIAS, 12x OSS, 1PB, RDMA, 800 clients
- Universität Paderborn (currently in installation), 8x OSS, 650TB, RDMA, 20GB/s, 600 clients
- Universität Frankfurt, 18x OSS, 193TB, RDMA
- Rijksuniversiteit Groningen, 3x OSS, 144TB, RDMA, 200 clients
- Italian Institute of Technology, 4x OSS, 24TB, RDMA, 100 clients
- Universität Stuttgart, 5x OSS, 35TB, RDMA
- TU Ilmenau, 5x OSS, 192TB, RDMA, 200 clients