Code conventions#

Often, these conventions have been derived from having had issues where a different approach has been used, so existing code may need updating to match these conventions.

These conventions are a supplement to the code formatting imposed by tools like black, isort, flake8 and ruff which are run as part of the routine tests in the relevant repositories.

These are conventions - there will be times where they should be ignored.

The overriding theme for the following is to ensure readability. Readable code is easier to review, easier to maintain, and easier to extend with future functionality.

Standard tools#

“Standard” tools should be used when appropriate (collections, itertools from the python standard library; iris, numpy, ANTS, etc).

Modular code#

Modular code - i.e. code that is broken up into small functions/methods - is easier to read, understand and maintain than large, monolithic functions. Other advantages include that it is easier to document, test, and reuse functions/methods when the code is modular. For these reasons, modular code is preferred.

Deprecated code#

New code contributions should not call any deprecated code. Similarly, code edits in areas where deprecated code is called (e.g. due to dependency updates) should be updated to avoid the deprecated code.

Warning

Python suppresses deprecation warnings by default (see DeprecationWarning).

Running the unit tests does enable deprecation warnings, and is configured to treat warnings as errors, so deprecations should trigger test failures.

As such, FutureWarning is the recommended warning to be used when deprecating code, as it is not suppressed by default.

Whitespace#

In both python code and the rose stem workflows:

  1. Spaces should be used, not tabs.

  2. Trailing whitespace should be removed (e.g. cylc instructions here)

Full names for variables, objects, functions, etc#

Variable names should not be truncated (bec tha har to rea) [1] or disemvowelled (bcs tht’s hrdr t rd) [1]. There are exceptions:

  1. coord, lat, lon (but not long because of ambiguity with the data type) are used enough and are standard enough that we should keep using them.

  2. In tests, the pattern xc = cube.coord(axis='x') (and similarly for yc, tc) is used so often that we should keep it for changes to existing tests. Longer names (e.g. lon_coord, time_coord) are preferred for new code.

  3. Consistency with variable names in papers or documentation. Examples include the Raymond filter or using terminology from F03 where appropriate.

  4. Care should be taken with docstrings. Sphinx does use func instead of function and similar. Sphinx warnings for this should trigger an error in the rose-stem suite.

  5. Class methods should follow the classmethod() python convention of using cls as the name of the implicit first argument.

Existing variable names that don’t follow this convention should be updated as encountered. It is strongly preferred to have either a separate ticket, or the first or last commit on a branch for updating legacy names. This makes it easier to distinguish between readability and functional changes when reviewing.

Functions that return booleans (i.e. True or False) should usually be prefixed with is, e.g. is_periodic or _is_global.

Names should be chosen for clarity and readability. Try to avoid the risk of confusion - for example, don’t use the variable name match in a block of code using regular expressions unless it is an re.Match object.

Assert#

assert must only be used in tests, rather than library or application code (it can be disabled with an environment variable or command line switch, which can make debugging painful).

Identifying coordinates on a cube#

As a general rule, the form cube.coord(axis='x') is preferred (and similarly for y, z, and t). This defers the question of identifying the coordinates to iris. This is particularly useful for ensuring our code works with rotated pole coordinates (i.e. it avoids manually checking for both longitude and grid_longitude).

This does rely on the cube having well-formed coordinates. In particular, this means preprocessing frequently has to ignore this convention.

Care is also needed with hybrid height coordinates - both the model_level_number DimCoord and the altitude AuxFactory may be identified as belonging to the z axis.

Line lengths#

black does not impose line length limits for string literals or comments. Keep these to the same length as the code (88 characters with the current version of black) unless there’s a good reason not to, for example:

  • reproducing output where shorter lines would make it harder to read

  • URLs where line breaks make it harder to copy and paste

If a line length of 88 characters is deliberately exceeded, # noqa: E501 should be used so that flake8 ignores the relevant line. # noqa: E501 should be used in preference to # nopep8.

Filtering objects#

List comprehensions (or generator comprehensions, where appropriate) are preferred over using filter() with a lambda function. For example, to select all coordinates with units of metres (m):

# List comprehension (preferred)
[coord for coord in cube.coords() if coord.units == "m"]

# filter with lambda function (not preferred)
list(filter(cube.coords(), lambda coord: coord.units == "m"))

Lambda functions are avoided as a rule, with the exception of iris constraints. See the Python documentation for guidance on refactoring lambda functions.

Warnings#

Warnings for the end user should be used sparingly. In general, warnings should be provided for deprecations and problems that users may want to fix themselves. For issues that the end user definitely has to fix, exceptions should be used instead of warnings. It’s currently preferred to use warnings.warn() to raise warnings, but the resolution of ANTS#46 (and potentially related tickets) may change this advice.

Existing ANTS warnings frequently use format() to add information from variables. New code should prefer f-strings, and older warnings should be updated.

Note that the unit tests are configured to treat warnings as errors. This means that code should not routinely raise warnings. In some circumstances - e.g. warnings from third party libraries - it may be necessary to suppress a particular warning in testing. This can be configured in the top level pyproject.toml file.

TODO, FIXME, etc#

TODO comments to identify future work are encouraged where appropriate e.g. workarounds for known bugs in dependencies that can be removed when the upstream library has been fixed. A TODO comment should always be used and should always be upper case to enable efficient grepping.

A ticket must be referenced in the TODO comment in the form #N for ticket N if the ticket is in the same repository. If the ticket is in a different repository then a full link should be provided.

It is not appropriate to use a TODO comment when submitting code that is known to be broken with the intention of fixing it at a later date.

Workarounds#

As mentioned above, sometimes workarounds are required for known bugs in dependencies and can be removed when the upstream library has been fixed. When this is required, the workaround should be isolated in its own function called <relevant_descriptor>_workaround and a comment should make it clear what the workaround is for and ideally point to an issue belonging to the dependency.

In order to know when this workaround can be removed, a unit test that is expected to fail is added (as well as a unit test for the workaround itself). Then, when the test passes, it is clear that the workaround is no longer needed. For example:

The unittest style (generally used in ANTS), using unittest.expectedFailure().

def test_ants_override(self):

@unittest.expectedFailure
def test_ants_no_override(self):
   # If this succeeds then we can remove this test class and
   # corresponding workaround.

The pytest style (preferred in UG-ANTS), see the pytest documentation on XFail.

def test_ants_override(self):

@pytest.mark.xfail
def test_ants_no_override(self):
   # If this succeeds then we can remove this test class and
   # corresponding workaround.

Ideally, raise an issue for the workaround and include a link with the unit test directing developers to the issue.

Custom exceptions#

Custom exceptions are located in lib/ants/exceptions.py and lib/ugants/exceptions.py. Where users are required to specify input, and do so incorrectly, specific exceptions that detail the correct input format are provided to allow the user to debug their own mistakes. These follow a standard Python custom exception format defined in the file.

Use pure functions and avoid in place operations#

Due to potential issues with dask, pure functions are preferred. This means, in particular, that inplace operations should be avoided where there is a possibility that a cube may have lazy data. For example, instead of:

def bad_example(cube):
   """Operates in place - not preferred"""
   cube.data = cube.data + 1
   return None

bad_example(cube)

use:

def good_example(cube):
   """Returns a new cube - preferred"""
   result = cube.copy()
   result.data = result.data + 1
   return result

cube = good_example(cube)

Strict Zip#

When using the Python built-in zip() function to combine two iterables which are expected to be of the same length, use the strict=True option. This will raise a ValueError if the iterables have different lengths. The default behaviour (strict=False) will silently accept iterables of different lengths, and stop iteration after the first iterable is exhausted.

Automated coding standards tools#

Regular ANTS currently uses black, isort, and flake8 for code formatting and linting, while UG-ANTS uses ruff and black. To run these in a working copy, activate your developer environment (for example via module load, or conda activate), then run:

export PYTHONPATH=${PWD}/lib:$PYTHONPATH
black .
isort .
flake8 .
export PYTHONPATH=${PWD}/lib:$PYTHONPATH
black .
ruff check .
cd ug-ancillary-file-science
ruff check Apps --config <path/to/UG-ANTS/pyproject.toml>
black --check Apps --config <path/to/UG-ANTS/pyproject.toml>

Running black or isort in a working copy automatically fixes identified issues. Running ruff with the --fix option also fixes some issues automatically.

Tip

Commit changes from these tools as a standalone commit without other changes. This makes reviewing code easier.

If it is necessary to disable one or more checks from a line of code, a # noqa: <code> statement for the particular check can be used. Blanket # noqa statements to ignore all checks should not be used.

The ruff complexity checker should be considered as an alert that code may be too complex, rather than an absolute rule. In particular, if it’s necessary to make code harder to read in order to satisfy the complexity check, then a # noqa: C901 can be used to disable the complexity checker. The code reviewer may suggest alternative solutions where appropriate.

Jupyter notebooks#

Jupyter notebooks should not be committed to main. They are useful for prototyping, debugging etc, and may be committed to branches. It is strongly encouraged to clear all outputs prior to committing, to enable sensible diffing between revisions.

Binary files#

Binary files, such as sources or known good outputs (KGOs), should not be committed to main. Where possible, synthetic data for a particular unit test should be generated at runtime using the provided stock functions located in ants.tests.stock and ugants.tests.stock.

In some specific circumstances, binary files are permitted, for example to provide source data to unit test load functions. These files are stored in lib/ants/tests/resources/ and lib/ugants/tests/resources/ alongside a record of how the files were generated.

Footnotes