Storage

The following storage is available on the cluster:

Name           Mount point   Current capacity   Mounts on        Purpose
----           -----------   ----------------   ---------        -------
Home folders   /users        39 TiB             login, compute   Software, code, configuration and other basic files; files not directly used in scheduled jobs; small amounts of data where I/O speed is not critical
Scratch        /scratch      1.4 PiB            login, compute   Files used directly in or created by scheduled jobs; large amounts of data, and/or data where low-latency and/or high-bandwidth access is important

Attention

Please see the terms of use for guidance on the types of data that are and are not appropriate to store on the system.

Important

Please note that, although there is some degree of resilience at the hardware level, neither /users nor /scratch is backed up. Make sure that you always keep appropriate backups of any data stored there.

Home folders

Your home directory should be used to store data such as software, code and configuration in situations where I/O speed is not critical. By default, it is accessible only by the account holder (the owner). It is provisioned automatically when the account is created and can be accessed via the path /users/<user id>, e.g. /users/k1234567.
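
Since home directories are private by default, you can confirm the permissions on yours with ls; the output should show read, write and execute permission for the owner only (the group name, size and date below are purely illustrative):

[k1234567@login3(rosalind) ~]$ ls -ld /users/k1234567
drwx------ 12 k1234567 users 4096 Jan 10 09:30 /users/k1234567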

Scratch

The Lustre parallel file system provides fast, high-performance storage under the /scratch path hierarchy. It should be used to store data that is actively produced or consumed by computations, especially where low-latency and/or high-bandwidth access is required. The different types of scratch space are described below.

Personal scratch

Personal scratch space is accessible only by the account holder (the owner) and is provisioned automatically when the account is created. It can be accessed via the path /scratch/users/<user id>, e.g. /scratch/users/k1234567.
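
For example, to create and switch to a working directory for a job under your personal scratch space ($USER expands to your user id; the directory name myproject is just an illustration):

[k1234567@login3(rosalind) ~]$ mkdir -p /scratch/users/$USER/myproject
[k1234567@login3(rosalind) ~]$ cd /scratch/users/$USER/myproject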

If the data contained within needs to be shared with other users, group scratch should be considered instead.

Group scratch

A group scratch share is accessible by the members of the group that owns it. It has its own quota allocation, which does not count towards the individual members’ own allocations. It is accessed via the path /scratch/groups/<group>, e.g. /scratch/groups/biocore.
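
To check which groups your account belongs to, and to inspect the permissions on a share, you can use the standard groups and ls commands (biocore is the example share from above; as noted under Quotas below, the Unix group name may not exactly match the directory name):

[k1234567@login3(rosalind) ~]$ groups
[k1234567@login3(rosalind) ~]$ ls -ld /scratch/groups/biocore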

Group scratch shares are not provisioned automatically and have to be explicitly requested.

Info

When requesting group shares, please provide the following information:

  • Name of the share: This ideally should be the research group, or project name.
  • Data owner(s): Individuals that will be the primary point of contact for the data, and will be responsible for its management and access control.
  • List of members: A list of individuals, or an existing group, that will have read/write access to the share.
  • Lifetime: The period after which the share can be expired and the stored data expunged.

Datasets

A special type of group share designated to host datasets. By default, a dataset share is readable by all users on the cluster, with only the dataset owner(s) having write access. It can be accessed via the path /scratch/datasets/<dataset id>, e.g. /scratch/datasets/ukbiobank.
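
Since dataset shares are read-only for most users, any files you need to modify should first be copied into your own scratch space; a minimal sketch, using the ukbiobank example above and a hypothetical metadata subdirectory:

[k1234567@login3(rosalind) ~]$ ls -ld /scratch/datasets/ukbiobank
[k1234567@login3(rosalind) ~]$ cp -r /scratch/datasets/ukbiobank/metadata /scratch/users/$USER/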

Dataset shares are not provisioned automatically and have to be explicitly requested.

Info

When requesting dataset shares, please provide the following information:

  • Name of the dataset.
  • Use case and the relevant information about the dataset (estimated size, external source, etc.).
  • Data owner(s): Individuals that will be the primary point of contact for the data, and will be responsible for its management and access control. Data owners will automatically have write access.
  • Lifetime: The amount of time after which the share can be expired.

Quotas

Storage space is a finite and shared resource on Rosalind; disk quotas are needed for various reasons:

  • There is a limited amount of disk space that must be shared between many people.
  • Some people tend to use much more disk space than they need, beyond what is reasonable and fair for a shared resource.
  • Sometimes processes can go out of control and produce huge amounts of data. If a disk fills up, no more data can be saved, and people will lose work; for instance, someone who has been working in an editor may not be able to save their changes.

For these reasons and others, it is necessary to manage storage usage on the system. Disk quotas are an equitable way of doing this.

The following default quota allocations are currently in place on the cluster:

User class                                 Home (/users)           Personal scratch (/scratch/users)
                                           Size      Files[1]      Size       Files[1]
----------                                 ----      --------      ----       --------
Users from the GSTT[2] and SLaM[3] BRCs    40 GiB    400,000       500 GiB    2,000,000
NMS[4] research staff and PhD students     40 GiB    400,000       100 GiB    1,000,000
All other King’s staff and PhD students    40 GiB    400,000       100 GiB    1,000,000
All taught students                        40 GiB    400,000       20 GiB     500,000

Info

Group and dataset share quotas are set and adjusted on-demand. Individual quota allocations may deviate from the above defaults.

Understanding and monitoring your storage usage

Quota reporting tool

A custom tool, rosalind-fs-quota, can report your storage usage and all of the quotas applicable to your account on the HPC cluster.

To use the tool, first load the module (put this in a file such as ~/.modules or ~/.bashrc to do this automatically at login):

[k1234567@login3(rosalind) ~]$ module load utilities/rosalind-fs-quota
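
As suggested above, to load the module automatically at every login you could, for example, append that line to your ~/.bashrc:

[k1234567@login3(rosalind) ~]$ echo 'module load utilities/rosalind-fs-quota' >> ~/.bashrc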

Then use the ros-fs-quota command to show your usage and quota limits:

[k1234567@login3(rosalind) ~]$ ros-fs-quota
Filesystem/group           Used      Quota
----------------           ----      -----
/users/k1234567            5.9GiB    40GiB
/scratch/users/k1234567    2.0GiB    100GiB
wg_sharedgroup             334GiB    1.0TiB

Use the -h/--help flag to show brief help info on using the tool.

Tip

When you go over a quota limit, this is indicated by (!) next to the value in the Used column, e.g. 1.4TiB(!).

Info

For shared groups on the scratch filesystem, the name of the group is shown in the first column; this is usually not exactly the same as the path to the directory on the filesystem.

Understanding detailed information about your storage quotas

The -v/--verbose flag can be supplied to ros-fs-quota to show more detailed information on your quota(s):

[k1234567@login3(rosalind) ~]$ ros-fs-quota --verbose
Home filesystem usage for k1234567 under /users ($HOME = /users/k1234567):
    Bytes                                           Files
    -----                                           -----
    Used        Quota    Limit    Grace             Used        Quota    Limit    Grace
    5.9GiB      40GiB    44GiB    -                 55K         1.0M     1.1M     -

Scratch filesystem usage under /scratch/users/k1234567:
    Bytes                                           Files
    -----                                           -----
    Used        Quota    Limit    Grace             Used        Quota    Limit    Grace
    2.0GiB      100GiB   110GiB   -                 7           2.0M     2.1M     -

Scratch filesystem usage for group wg_sharedgroup under /scratch/groups:
    Bytes                                           Files
    -----                                           -----
    Used        Quota    Limit    Grace             Used        Quota    Limit    Grace
    334GiB      1.0TiB   1.1TiB   -                 8.2K        2.0M     2.1M     -

Total filesystem usage attributed to user k1234567 on /scratch:
    Bytes                                           Files
    -----                                           -----
    Used        Quota    Limit    Grace             Used        Quota    Limit    Grace
    171GiB      -        -        -                 4.7K        -        -        -

There are two types of quota limits that are set on home directories and scratch space via the disk quota system: block quotas and file[1] quotas.

Block quota : A block quota is a limit on the actual amount of disk space that can be used by an account. This space is measured in 1 KiB blocks (1 KiB = 1024 bytes). All files, directories, etc., use up some number of blocks.
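
You can see how much space a directory tree occupies, in the same 1 KiB units used for block quota accounting, with du (the directory name and figure below are illustrative):

[k1234567@login3(rosalind) ~]$ du -sk ~/myproject
2084    /users/k1234567/myproject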

File quota : A file quota is a limit on the number of files, directories, etc., that can exist for the account. It exists because each file system (e.g. /users) has a finite-sized inode table, and each file system object (such as a file or directory) uses up one inode. When this table fills up, no more files can be created, even if there is sufficient block capacity. While it is not uncommon for someone to exhaust their block quota, hitting the file quota limit is more unusual; however, software programs or scripted simulations can in some cases produce large numbers of files. Very large numbers of files (especially in a single directory) can also cause performance issues, so file quota limits are one way to mitigate this.
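
To estimate how many inodes a directory tree consumes, you can count the filesystem objects within it; find prints the directory itself plus every file and subdirectory below it, each of which uses one inode (the path and count are illustrative):

[k1234567@login3(rosalind) ~]$ find ~/myproject | wc -l
55310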

The detailed output from the ros-fs-quota tool (above) reports several values. The meanings of the numbers are as follows:

Used : The actual space (Bytes) or number of inodes (Files) used. Binary or SI prefixes are used to format the units.

Quota : The quota allocation, also called the soft limit; you will start getting warnings when you exceed this amount of space or number of files.

Limit : The hard quota limit; this may be the same as the (soft) quota but is often slightly higher. You cannot exceed this value: new writes to the filesystem will fail.

Grace : If the (hard) limit is set higher than the (soft) quota, a grace period may also be defined. When the (soft) quota is exceeded, the grace period begins; this value shows the amount of time remaining. If you do not reduce your usage (blocks or files) below the relevant (soft) quota before the timer expires, further writes to the filesystem will be blocked (even if your usage is still below the hard limit).

Tip

Avoid going over your quota in your home directory (/users/<username>); it can cause problems logging in.
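
If you do go over your home quota, a quick way to find what is using the space is to sort your top-level directories by size (note that the * pattern skips hidden files and directories such as ~/.cache, which can also be large):

[k1234567@login3(rosalind) ~]$ du -sh ~/* | sort -h

Large, job-related data can then be moved to your personal scratch space.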


  1. Technically, the limit on the number of files is actually a limit on the number of inodes: each directory consumes one inode, in addition to one inode per file within it.

  2. NIHR Biomedical Research Centre at Guy’s & St Thomas’ NHS Foundation Trust and King’s College London

  3. NIHR Maudsley Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and the Institute of Psychiatry, Psychology & Neuroscience at King’s College London 

  4. Faculty of Natural & Mathematical Sciences, King’s College London