MASS data retrieval

Data can be retrieved from MASS using our cdds_retrieve_data tool. This can be used locally or via SPICE.

Its benefits include:

Copying the directory structure associated with the files.
Customisable chunking to reduce load on infrastructure during retrieval.
A dry run option to print actions without retrieving the files.

Note

For JASMIN users, this tool is not currently functional, as it requires recursive listings, which are not permitted on JASMIN.

Using it from the command line

The tool takes six arguments:

moose_base_location (optional): The base moose path for the data. Defaults to moose:/adhoc/projects/cdds/production/
base_dataset_id: The base dataset identifier, e.g. CMIP6.CMIP.MOHC.UKESM1-0-LL.piControl.r1i1p1f2
variables_file: File containing the list of variables you would like to retrieve.
destination_directory: Where you would like the data extracted to.
--chunk-size (optional): The chunk size (in GB) for extraction. Defaults to 100.
--dry-run (optional): Performs a test run that prints the actions without extracting the data.

Example

cdds_retrieve_data CMIP6.CMIP.MOHC.UKESM1-0-LL.piControl.r1i1p1f2 variables_file desired/output/directory
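The optional flags can be combined with the same positional arguments. The sketch below previews a retrieval with --dry-run and then repeats it with a smaller chunk size. It is illustrative only: the retrieve helper is hypothetical (it prints the command line when the tool is not installed), and the placement of the flags after the positional arguments and the "--chunk-size 50" spelling are assumptions rather than documented behaviour.

```shell
#!/bin/bash
# Sketch only: flag placement and the "--chunk-size 50" form are assumptions.

dataset_id="CMIP6.CMIP.MOHC.UKESM1-0-LL.piControl.r1i1p1f2"
variables_file="variables_file"
destination="desired/output/directory"

# Hypothetical helper: run the tool if it is installed, otherwise just
# print the command line that would have been used.
retrieve() {
    if command -v cdds_retrieve_data >/dev/null 2>&1; then
        cdds_retrieve_data "$@"
    else
        echo "cdds_retrieve_data $*"
    fi
}

# 1. Preview the actions without retrieving any files.
retrieve "$dataset_id" "$variables_file" "$destination" --dry-run

# 2. Retrieve the data in 50 GB chunks instead of the default 100 GB.
retrieve "$dataset_id" "$variables_file" "$destination" --chunk-size 50
```

Running the dry run first gives a chance to check the planned actions before any load is placed on MASS.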

Usage via SPICE

Below is a template script that can be submitted with the sbatch command.

Example

#!/bin/bash -l

#SBATCH --mail-type=END
#SBATCH --mem=5G
#SBATCH --qos=normal
#SBATCH --time=30

cdds_retrieve_data CMIP6.CMIP.MOHC.UKESM1-0-LL.piControl.r1i1p1f2 variables_file desired/output/directory
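Submitting the template might look like the sketch below. It assumes the script above has been saved as retrieve_data.sh (a hypothetical filename) and uses the standard Slurm commands sbatch and squeue. The submit_job helper is illustrative and not part of CDDS.

```shell
#!/bin/bash
# Sketch only: retrieve_data.sh is a hypothetical filename for the template above.

# Submit a job script with sbatch, then list the user's queued jobs.
submit_job() {
    job_script="$1"
    if [ ! -f "$job_script" ]; then
        echo "Job script not found: $job_script" >&2
        return 1
    fi
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch is not available on this host" >&2
        return 1
    fi
    sbatch "$job_script" && squeue -u "$USER"
}

submit_job retrieve_data.sh || echo "Submission skipped"
```

With --mail-type=END set in the script, Slurm sends a notification when the job finishes, so the queue does not need to be polled manually.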