# Quickstart: Running the Climate Data Dissemination System
This tutorial is a brief introduction to running CDDS by CMORising a small amount of climate data. It aims to provide a minimal working example while also signposting some of CDDS's other features and capabilities. The process can be roughly split into three parts:

1. Define the `request.cfg` file and the variables to process.
2. Set up the directory structure.
3. Run the Cylc conversion workflow.
## Prerequisites

This guide assumes that you can run `cylc` workflows and that you have access to MASS.
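As a quick sanity check, the snippet below (a sketch, assuming a POSIX shell) verifies that the `cylc` and `moo` command-line tools are on your `PATH`:

```shell
# Check that the commands this tutorial assumes are available.
# Prints one status line per command; "NOT found" means fix your environment.
for cmd in cylc moo; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: NOT found"
  fi
done > cdds_env_check.txt
cat cdds_env_check.txt
```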
## Configuring the Request

Running CDDS requires the user to appropriately configure a `request.cfg` file (from here on simply referred to as the "request"). The request is provided to many `cdds` commands as a positional argument and contains information including experiment metadata, workflow configuration, ancillary paths, and MASS locations. Each option belongs to a particular section and is documented in Request Configuration.
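Because the request is a plain INI file, ordinary text tools can inspect it. A minimal sketch (the `sample_request.cfg` name and contents here are illustrative; `validate_request`, shown later in this tutorial, is the proper checker):

```shell
# Write a tiny illustrative request fragment, then pull one value out of it.
cat > sample_request.cfg <<'EOF'
[common]
mode = relaxed
package = round-1
EOF
# Split each line on " = " and print the value for the "mode" option.
awk -F' = ' '$1 == "mode" {print $2}' sample_request.cfg
```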
### An Explicit Request Example Including Unused Fields

```ini
[metadata]
branch_date_in_child =
branch_date_in_parent =
branch_method = no parent
base_date = 1850-01-01T00:00:00Z
calendar = 360_day
experiment_id = my-experiment-id
institution_id = MOHC
license = GCModelDev model data is licensed under the Open Government License v3 (https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
mip = MOHCCP
mip_era = GCModelDev
parent_base_date = 1850-01-01T00:00:00Z
parent_experiment_id =
parent_mip =
parent_mip_era =
parent_model_id = HadGEM3-GC31-LL
parent_time_units = days since 1850-01-01
parent_variant_label =
sub_experiment_id = none
variant_label = r1i1p1f3
model_id = HadGEM3-GC31-LL
model_type = AOGCM AER

[netcdf_global_attributes]

[common]
external_plugin =
external_plugin_location =
mip_table_dir = $CDDS_ETC/mip_tables/GCModelDev/0.0.23
mode = relaxed
package = round-1
workflow_basename = request_id
root_proc_dir = $SCRATCH/cdds_proc # A reasonable default for Met Office, change for JASMIN
root_data_dir = $SCRATCH/cdds_data # A reasonable default for Met Office, change for JASMIN
root_ancil_dir = $CDDS_ETC/ancil/
root_hybrid_heights_dir = $CDDS_ETC/vertical_coordinates/
root_replacement_coordinates_dir = $CDDS_ETC/horizontal_coordinates/
sites_file = $CDDS_ETC/cfmip2/cfmip2-sites-orog.txt
standard_names_version = latest
standard_names_dir = $CDDS_ETC/standard_names/
simulation = False
log_level = INFO

[data]
data_version =
end_date = 2015-01-01T00:00:00Z
mass_data_class = crum
mass_ensemble_member =
start_date = 1950-01-01T00:00:00Z
model_workflow_id = u-bg466
model_workflow_branch = trunk
model_workflow_revision = not used except with Data Request
streams = ap5 onm
variable_list_file = <must be included>
output_mass_root = moose:/adhoc/users/<moose user id> # Update with user id
output_mass_suffix = testing # This is appended to output_mass_root

[misc]
atmos_timestep = 1200 # This is model dependent
use_proc_dir = True
no_overwrite = False

[inventory]
inventory_check = False
inventory_database_location =

[conversion]
skip_extract = False
skip_extract_validation = False
skip_configure = False
skip_qc = False
skip_archive = False
cylc_args = -v
no_email_notifications = True
scale_memory_limits =
override_cycling_frequency =
model_params_dir =
continue_if_mip_convert_failed = False
```
For simplicity this tutorial relies on many of the default values and eliminates unused options to reduce the request to its most minimal form. Two example requests are provided below: one for use at the Met Office and one for use on JASMIN. The fields that you may need to modify to run this example successfully are called out in the setup steps that follow.
**Met Office**

```ini
[metadata]
branch_method = no parent
base_date = 1850-01-01T00:00:00Z
calendar = 360_day
experiment_id = my-experiment-id
institution_id = MOHC
license = GCModelDev model data is licensed under the Open Government License v3 (https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
mip = MOHCCP
mip_era = GCModelDev
sub_experiment_id = none
variant_label = r1i1p1f3
model_id = HadGEM3-GC31-LL
model_type = AOGCM AER

[common]
mip_table_dir = $CDDS_ETC/mip_tables/GCModelDev/0.0.23
mode = relaxed
package = round-1
root_proc_dir = $DATADIR/cdds_quickstart_tutorial/proc
root_data_dir = $DATADIR/cdds_quickstart_tutorial/data

[data]
end_date = 1955-01-01T00:00:00Z
mass_data_class = crum
start_date = 1950-01-01T00:00:00Z
model_workflow_id = u-bg466
streams = ap5
variable_list_file = $DATADIR/cdds_quickstart_tutorial/variables.txt
output_mass_root = moose:/adhoc/users/<moose user id>
output_mass_suffix = quickstart

[conversion]
skip_extract = False
skip_extract_validation = False
skip_configure = False
skip_qc = False
skip_archive = False
mip_convert_plugin = HadGEM3
```
**JASMIN**

```ini
[metadata]
branch_method = no parent
base_date = 1850-01-01T00:00:00Z
calendar = 360_day
experiment_id = my-experiment-id
institution_id = MOHC
license = GCModelDev model data is licensed under the Open Government License v3 (https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
mip = MOHCCP
mip_era = GCModelDev
sub_experiment_id = none
variant_label = r1i1p1f3
model_id = HadGEM3-GC31-LL
model_type = AOGCM AER

[common]
mip_table_dir = $CDDS_ETC/mip_tables/GCModelDev/0.0.23
mode = relaxed
package = round-1
root_proc_dir = $HOME/cdds_quickstart_tutorial/proc
root_data_dir = $HOME/cdds_quickstart_tutorial/data

[data]
end_date = 1955-01-01T00:00:00Z
mass_data_class = crum
start_date = 1950-01-01T00:00:00Z
model_workflow_id = u-bg466
streams = ap5
variable_list_file = $HOME/cdds_quickstart_tutorial/variables.txt

[conversion]
skip_extract = False
skip_extract_validation = False
skip_configure = False
skip_qc = False
skip_archive = True
jasmin_account =
mip_convert_plugin = HadGEM3
```
### What is GCModelDev?

You may notice that the `[metadata]` section in the above request examples references `GCModelDev` (and other non-CMIP6 terms) rather than CMIP6. The `GCModelDev` project is an informal duplication of the CMIP6 Controlled Vocabulary and MIP tables that helps facilitate ad hoc CMORisation for activities such as model development (the GitHub repository is soon to be made public). It allows extra variables to be added on top of the standard MIP tables, as well as new models and experiments. The basic principles of running CDDS remain the same whether you are using GCModelDev or CMIP6.
1. Create a working directory for the `request.cfg`, and the `proc` and `data` directories. These don't have to match those used in the following example; just be sure to update the `request.cfg` appropriately.

    ```shell
    cd $DATADIR
    mkdir cdds_quickstart_tutorial
    cd cdds_quickstart_tutorial
    mkdir proc data
    touch request.cfg
    ```

    Note that it isn't a requirement to have the `request.cfg` in the same directory as the `proc` and `data` directories.

2. Copy the above example `request.cfg` into your empty `request.cfg` file.

3. Update fields in the `request.cfg`:
    - If you are using a different path to `$DATADIR/cdds_quickstart_tutorial`, update the `root_proc_dir`, `root_data_dir`, and `variable_list_file` fields.
    - If you are on Azure, update the `output_mass_root` with your MASS username.
    - If you are on JASMIN, populate the `jasmin_account` field with an appropriate account name (used for job submission; see the JASMIN docs).

4. Create an empty `variables.txt` file and add the following line.

    ```
    Amon/tas:ap5
    ```

5. Activate the CDDS environment.

    ```shell
    source ~cdds/bin/setup_env_for_cdds 3.1.2
    ```

6. Validate the `request.cfg`.

    ```shell
    validate_request request.cfg
    ```

    If any errors are reported and you are not able to fix them, please contact the CDDS team for help.
## Preparing for Processing

The following commands assume you are in the working directory created previously containing the `request.cfg`, and that the CDDS environment is activated.
1. Create the directory structure for storing the log and data files created during processing.

    ```shell
    create_cdds_directory_structure request.cfg
    ```

    **Info:** This will use the `root_proc_dir` and `root_data_dir` paths within the request to create the following directory structures. The `proc` directory is used for log files created during processing by various `cdds` commands. It also serves as the location for the configured cylc processing workflow when running `cdds_convert`, and the internally used variable list created by `prepare_generate_variable_list`.

    ```
    proc
    └── GCModelDev
        └── MOHCCP
            └── request_id
                └── round-1
                    ├── archive
                    ├── configure
                    ├── convert
                    ├── extract
                    ├── prepare
                    └── qualitycheck
    ```

    The `data` directory will house the input model data, e.g. `.pp` files, as well as the CMORised `.nc` output. A similar hierarchical structure is created:

    ```
    data
    └── GCModelDev
        └── MOHCCP
            └── HadGEM3-GC31-LL
                └── my-experiment-id
                    └── r1i1p1f3
                        └── round-1
                            ├── input
                            │   └── u-bg466
                            └── output
    ```
2. Create the internal variables list used by `cdds`.

    ```shell
    prepare_generate_variable_list request.cfg
    ```

    **Info:** This takes the user variables file provided by the `variable_list_file` option and creates an internally used `.json` version which includes additional metadata. The file can be found in the `prepare` proc sub-directory.

    ```
    proc/GCModelDev/MOHCCP/request_id/round-1/prepare/GCModelDev_MOHCCP_my-experiment-id_HadGEM3-GC31-LL.json
    ```

    ```json
    {
      "checksum": "md5: 89a430535551e97433fd8a22edcfec24",
      "experiment_id": "my-experiment-id",
      "history": [
        {
          "comment": "Requested variables file created.",
          "time": "2025-05-07T07:40:21.049522"
        }
      ],
      "metadata": {},
      "mip": "MOHCCP",
      "mip_era": "GCModelDev",
      "model_id": "HadGEM3-GC31-LL",
      "model_type": "AOGCM AER",
      "production_info": "Produced using CDDS Prepare version \"3.1.2\"",
      "requested_variables": [
        {
          "active": true,
          "cell_methods": "area: time: mean",
          "comments": [],
          "dimensions": [
            "longitude",
            "latitude",
            "time",
            "height2m"
          ],
          "frequency": "mon",
          "in_mappings": true,
          "in_model": true,
          "label": "tas",
          "miptable": "Amon",
          "producible": "yes",
          "stream": "ap5"
        }
      ],
      "status": "ok",
      "suite_branch": "cdds",
      "suite_id": "u-bg466",
      "suite_revision": "HEAD"
    }
    ```
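For a quick look at a generated variables file, ordinary text tools are enough. The sketch below runs on a cut-down sample (illustrative only, not real CDDS output) to show the idea:

```shell
# A cut-down sample mimicking the shape of the prepare JSON (illustrative).
cat > sample_prepare.json <<'EOF'
{"status": "ok", "requested_variables": [{"label": "tas", "miptable": "Amon", "active": true}]}
EOF
# Quick checks: the overall status and the requested variable labels.
grep -o '"status": *"[^"]*"' sample_prepare.json
grep -o '"label": *"[^"]*"' sample_prepare.json
```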
## Running the Conversion Workflow

**Met Office**

1. Launch the `cylc` conversion workflow.

    ```shell
    cdds_convert request.cfg
    ```

**JASMIN**

1. On JASMIN, Cylc workflows must be launched from `cylc2.jasmin.ac.uk`.

    ```shell
    ssh <username>@cylc2.jasmin.ac.uk -XYA
    ```

2. You may need to `cd` back to the directory containing your `request.cfg`.

    ```shell
    cd $HOME/cdds_quickstart_tutorial
    ```

3. Reactivate the CDDS environment.

    ```shell
    source ~cdds/bin/setup_env_for_cdds 3.2.0
    ```

4. Launch the `cylc` conversion workflow.

    ```shell
    cdds_convert request.cfg
    ```
The `cdds_convert` command does several things when run:

1. Generates the `mip_convert` template files.
2. Makes a copy of the conversion workflow.
3. Populates various Jinja2 template variables.
4. Automatically submits the workflow using `cylc vip`.
Once the `cdds_convert` command has completed you should see a workflow running named `cdds_HadGEM3-GC31-LL_my-experiment-id_r1i1p1f3`. You can monitor this workflow in the usual way using `cylc tui` or `cylc gui`.
## Inspecting Your Data

When the workflow has completed successfully you can inspect the CMORised data. On disk, the CMORised output can be found by descending from the `root_data_dir` directory until reaching the `output` directory.

```
$DATADIR/cdds_quickstart_tutorial/data/GCModelDev/MOHCCP/HadGEM3-GC31-LL/my-experiment-id/r1i1p1f3/round-1/output
```

Within this directory you will find the following hierarchy. The `<stream>_concat` and `<stream>_mip_convert` directories are only used during processing and do not contain any of the final output.

```
├── <stream>
│   └── <MIP Table>
│       └── <variable>
│           └── <file(s)>
```
If you are running the example at the Met Office and you did not switch off archiving (i.e. you did not set `skip_archive = True`), then the CMORised output will also have been archived to MASS. It is stored in a location based on the request fields `output_mass_root` and `output_mass_suffix`, e.g.

```shell
moo ls moose:/adhoc/users/<username>/quickstart
```
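A small sketch for listing whatever CMORised files exist so far on disk (the `OUTDIR` path assumes the Met Office example paths used in this tutorial; adjust it if yours differ):

```shell
# List any CMORised netCDF files under the tutorial's output directory.
# OUTDIR below assumes the Met Office paths from this tutorial.
OUTDIR="${DATADIR:-$HOME}/cdds_quickstart_tutorial/data/GCModelDev/MOHCCP/HadGEM3-GC31-LL/my-experiment-id/r1i1p1f3/round-1/output"
if [ -d "$OUTDIR" ]; then
  find "$OUTDIR" -name '*.nc' | sort
else
  echo "no output directory yet: $OUTDIR"
fi
```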
## Useful Configuration Options

What follows are some examples of options that are useful to be aware of; the list is not exhaustive. The full list of configuration options available in the request can be found in the Request Configuration tutorial.
### How do I add more output variables?

To process more variables you will need to add the appropriate entry to your `variables.txt` file. Each output variable must be specified on a separate line and follow the format below.

```
<mip table>/<variable name>:<stream>
```

If you do not know the stream, you can omit the `:<stream>` suffix and use the `stream_mappings` command to add this information for you.

```shell
stream_mappings --varfile variables.txt --outfile variables_with_streams.txt
```
Finally, you must update the `streams` option in the `request.cfg` with the names of any additional streams. For example, if you added a variable from the monthly ocean `onm` stream:

```ini
[data]
streams = ap5 onm
```
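Putting this together, a sketch of a `variables.txt` with two extra entries (the variable/stream combinations here are illustrative; check them against your own MIP tables):

```shell
# Illustrative variables file: the original tutorial variable plus two more.
cat > variables.txt <<'EOF'
Amon/tas:ap5
Amon/pr:ap5
Omon/tos:onm
EOF
# Every line must match <mip table>/<variable name>:<stream>; print the ones that do.
grep -E '^[A-Za-z0-9]+/[A-Za-z0-9]+:[a-z0-9]+$' variables.txt
```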
### How do I process data with arbitrary metadata, e.g. custom experiment names?

The example `request.cfg` configuration given in this tutorial runs in "relaxed" mode. This allows the user to provide arbitrary metadata values that do not exist in the Controlled Vocabulary, and is particularly useful for model development work. If you want to ensure that the metadata is consistent with the Controlled Vocabulary you must change the mode to `strict`.

```ini
[common]
mode = strict
```
### How do I add extra global attributes to my netCDF output?

Arbitrary netCDF global attributes can be added to the CMORised files by adding a `[netcdf_global_attributes]` section to the `request.cfg` file. Any key=value pairs in this section will be written out to every output netCDF file as global attributes.

```ini
[netcdf_global_attributes]
foo = bar
```
### I don't need to retrieve data from MASS, it's already on disk. How do I use it with CDDS?

CDDS provides a utility script called `cdds_arrange_input_data` which will symlink the files you already have on disk into the appropriate directories. The typical directory structure CDDS creates looks like this (individual `.pp` and `.nc` files not shown).

```
└── CMIP6
    └── ScenarioMIP
        └── UKESM1-0-LL
            └── ssp126
                └── r1i1p1f2
                    └── cdds_request_ssp126
                        └── input
                            └── u-dn300
                                ├── ap4
                                ├── ap5
                                └── onm
```
Having made sure you have run `create_cdds_directory_structure` first, you can then run the arrange script, where `/path/to/model/data` is the location on disk containing your model output files.

```shell
cdds_arrange_input_data request.cfg /path/to/model/data
```

The directory structure that your data is in should not matter, as the mapping from file to stream is performed using regular expressions. For example, `/path/to/model/data` could contain a flat hierarchy with all model files together in one directory.

```
dn300a.p52015apr.pp
dn300a.p42015apr.pp
medusa_dn300o_1m_20150101-20150201_diad-T.nc
```
After running `cdds_arrange_input_data` the files will be symlinked like so.

```
└── CMIP6
    └── ScenarioMIP
        └── UKESM1-0-LL
            └── ssp126
                └── r1i1p1f2
                    └── cdds_request_ssp126
                        └── input
                            └── u-dn300
                                ├── ap4
                                │   └── dn300a.p42015apr.pp
                                ├── ap5
                                │   └── dn300a.p52015apr.pp
                                └── onm
                                    └── medusa_dn300o_1m_20150101-20150201_diad-T.nc
```
**Note:** `cdds_arrange_input_data` will symlink all files that match the file patterns. It will not exclude files outside of the `start_date` and `end_date` defined in your request, nor will it exclude streams that are not listed in the request.