.. Some of the content of this file has been produced with the assistance of Met Office GitHub Copilot Enterprise .. _code-conventions: Code conventions ================ Often, these conventions have been derived from having had issues where a different approach has been used, so existing code may need updating to match these conventions. These conventions are a supplement to the code formatting imposed by tools like `black`_, `isort`_, `flake8`_ and `ruff`_ which are run as part of the routine tests in the relevant repositories. .. _black: https://black.readthedocs.io/en/stable/ .. _isort: https://pycqa.github.io/isort/ .. _flake8: https://flake8.pycqa.org/en/latest/ .. _ruff: https://docs.astral.sh/ruff/ These are conventions - there will be times where they should be ignored. The overriding theme for the following is to ensure readability. Readable code is easier to review, easier to maintain, and easier to extend with future functionality. Standard tools -------------- "Standard" tools should be used when appropriate (:mod:`collections`, :mod:`itertools` from the python standard library; `iris`_, `numpy`_, ANTS, etc). .. _iris: https://scitools-iris.readthedocs.io/en/stable/index.html .. _numpy: https://numpy.org/doc/stable/ Modular code ------------ Modular code - i.e. code that is broken up into small functions/methods - is easier to read, understand and maintain than large, monolithic functions. Other advantages include that it is easier to document, test, and reuse functions/methods when the code is modular. For these reasons, modular code is preferred. Deprecated code --------------- New code contributions should not call any deprecated code. Similarly, code edits in areas where deprecated code is called (e.g. due to dependency updates) should be updated to avoid the deprecated code. .. warning:: Python suppresses deprecation warnings by default (see :exc:`DeprecationWarning`). Running the unit tests does enable deprecation warnings, and is configured to treat warnings as errors, so deprecations should trigger test failures. As such, :exc:`FutureWarning` is the recommended warning to be used when deprecating code, as it is not suppressed by default. Whitespace ---------- In both python code and the rose stem workflows: 1. Spaces should be used, not tabs. 2. Trailing whitespace should be removed (e.g. `cylc instructions here `_) .. _variable-name-convention: Full names for variables, objects, functions, etc ------------------------------------------------- Variable names should not be truncated (bec tha har to rea) [#f1]_ or disemvowelled (bcs tht's hrdr t rd) [#f1]_. There are exceptions: 1. ``coord``, ``lat``, ``lon`` (but not ``long`` because of ambiguity with the data type) are used enough and are standard enough that we should keep using them. 2. In tests, the pattern ``xc = cube.coord(axis='x')`` (and similarly for ``yc``, ``tc``) is used so often that we should keep it for changes to existing tests. Longer names (e.g. ``lon_coord``, ``time_coord``) are preferred for new code. 3. Consistency with variable names in papers or documentation. Examples include the Raymond filter or using terminology from F03 where appropriate. 4. Care should be taken with docstrings. Sphinx does use ``func`` instead of ``function`` and similar. Sphinx warnings for this should trigger an error in the rose-stem suite. 5. Class methods should follow the :func:`classmethod` python convention of using ``cls`` as the name of the implicit first argument. Existing variable names that don't follow this convention should be updated as encountered. It is strongly preferred to have either a separate ticket, or the first or last commit on a branch for updating legacy names. This makes it easier to distinguish between readability and functional changes when reviewing. Functions that return :obj:`booleans ` (i.e. ``True`` or ``False``) should usually be prefixed with ``is``, e.g. ``is_periodic`` or ``_is_global``. Names should be chosen for clarity and readability. Try to avoid the risk of confusion - for example, don't use the variable name ``match`` in a block of code using regular expressions unless it is an :class:`re.Match` object. Assert ------ `assert `_ must only be used in tests, rather than library or application code (it can be disabled with an environment variable or command line switch, which can make debugging painful). Identifying coordinates on a cube --------------------------------- As a general rule, the form ``cube.coord(axis='x')`` is preferred (and similarly for ``y``, ``z``, and ``t``). This defers the question of identifying the coordinates to iris. This is particularly useful for ensuring our code works with rotated pole coordinates (i.e. it avoids manually checking for both ``longitude`` and ``grid_longitude``). This does rely on the cube having well-formed coordinates. In particular, this means preprocessing frequently has to ignore this convention. Care is also needed with hybrid height coordinates - both the ``model_level_number`` DimCoord and the ``altitude`` AuxFactory may be identified as belonging to the ``z`` axis. Line lengths ------------ ``black`` does not impose line length limits for string literals or comments. Keep these to the same length as the code (88 characters with the current version of black) unless there's a good reason not to, for example: * `reproducing output `_ where shorter lines would make it harder to read * `URLs `_ where line breaks make it harder to copy and paste If a line length of 88 characters is deliberately exceeded, ``# noqa: E501`` should be used so that `flake8 ignores the relevant line `_. ``# noqa: E501`` should be used in preference to ``# nopep8``. Filtering objects ----------------- List comprehensions (or generator comprehensions, where appropriate) are preferred over using :func:`filter` with a :term:`lambda` function. For example, to select all coordinates with units of metres (m): .. code-block:: python # List comprehension (preferred) [coord for coord in cube.coords() if coord.units == "m"] # filter with lambda function (not preferred) list(filter(cube.coords(), lambda coord: coord.units == "m")) Lambda functions are avoided as a rule, with the exception of iris constraints. See the `Python documentation `_ for guidance on refactoring lambda functions. Warnings -------- Warnings for the end user should be used sparingly. In general, warnings should be provided for deprecations and problems that users may want to fix themselves. For issues that the end user definitely has to fix, exceptions should be used instead of warnings. It's currently preferred to use :func:`warnings.warn` to raise warnings, but the resolution of `ANTS#46 `_ (and potentially related tickets) may change this advice. Existing ANTS warnings frequently use :func:`format` to add information from variables. New code should prefer :term:`f-strings`, and older warnings should be updated. Note that the unit tests are configured to treat warnings as errors. This means that code should not routinely raise warnings. In some circumstances - e.g. warnings from third party libraries - it may be necessary to suppress a particular warning in testing. This can be configured in the top level ``pyproject.toml`` file. .. _todo-convention: TODO, FIXME, etc ---------------- TODO comments to identify future work are encouraged where appropriate e.g. workarounds for known bugs in dependencies that can be removed when the upstream library has been fixed. A ``TODO`` comment should always be used and should always be upper case to enable efficient grepping. A ticket must be referenced in the TODO comment in the form ``#N`` for ticket N if the ticket is in the same repository. If the ticket is in a different repository then a full link should be provided. It is not appropriate to use a TODO comment when submitting code that is known to be broken with the intention of fixing it at a later date. Workarounds ----------- As mentioned above, sometimes workarounds are required for known bugs in dependencies and can be removed when the upstream library has been fixed. When this is required, the workaround should be isolated in its own function called ``_workaround`` and a comment should make it clear what the workaround is for and ideally point to an issue belonging to the dependency. In order to know when this workaround can be removed, a unit test that is expected to fail is added (as well as a unit test for the workaround itself). Then, when the test passes, it is clear that the workaround is no longer needed. For example: .. tab-set:: .. tab-item:: unittest style The :mod:`unittest` style (generally used in ANTS), using :meth:`unittest.expectedFailure`. .. code-block:: python def test_ants_override(self): @unittest.expectedFailure def test_ants_no_override(self): # If this succeeds then we can remove this test class and # corresponding workaround. .. tab-item:: pytest style The pytest style (preferred in UG-ANTS), see the pytest documentation on `XFail`_. .. code-block:: python def test_ants_override(self): @pytest.mark.xfail def test_ants_no_override(self): # If this succeeds then we can remove this test class and # corresponding workaround. .. _Xfail: https://docs.pytest.org/en/stable/how-to/skipping.html#xfail-mark-test-functions-as-expected-to-fail Ideally, raise an issue for the workaround and include a link with the unit test directing developers to the issue. Custom exceptions ----------------- Custom exceptions are located in `lib/ants/exceptions.py `_ and `lib/ugants/exceptions.py `_. Where users are required to specify input, and do so incorrectly, specific exceptions that detail the correct input format are provided to allow the user to debug their own mistakes. These follow a standard Python custom exception format defined in the file. Use pure functions and avoid in place operations ------------------------------------------------ Due to `potential issues with dask `_, pure functions are preferred. This means, in particular, that inplace operations should be avoided where there is a possibility that a cube may have lazy data. For example, instead of: .. code-block:: python def bad_example(cube): """Operates in place - not preferred""" cube.data = cube.data + 1 return None bad_example(cube) use: .. code-block:: python def good_example(cube): """Returns a new cube - preferred""" result = cube.copy() result.data = result.data + 1 return result cube = good_example(cube) Strict Zip ---------- When using the Python built-in :func:`zip` function to combine two iterables which are expected to be of the same length, use the ``strict=True`` option. This will raise a :class:`ValueError` if the iterables have different lengths. The default behaviour (``strict=False``) will silently accept iterables of different lengths, and stop iteration after the first iterable is exhausted. Automated coding standards tools -------------------------------- Regular ANTS currently uses `black`_, `isort`_, and `flake8`_ for code formatting and linting, while UG-ANTS uses `ruff`_ and `black`_. To run these in a working copy, activate your developer environment (for example via ``module load``, or ``conda activate``), then run: .. tab-set:: .. tab-item:: ANTS .. code-block:: bash export PYTHONPATH=${PWD}/lib:$PYTHONPATH black . isort . flake8 . .. tab-item:: UG-ANTS .. code-block:: bash export PYTHONPATH=${PWD}/lib:$PYTHONPATH black . ruff check . .. tab-item:: ug-ancillary-file-science .. code-block:: bash cd ug-ancillary-file-science ruff check Apps --config black --check Apps --config Running ``black`` or ``isort`` in a working copy automatically fixes identified issues. Running ``ruff`` with the ``--fix`` option also fixes some issues automatically. .. tip:: Commit changes from these tools as a standalone commit without other changes. This makes reviewing code easier. If it is necessary to disable one or more checks from a line of code, a ``# noqa: `` statement for the particular check `can be used `_. Blanket ``# noqa`` statements to ignore all checks should not be used. The ruff complexity checker should be considered as an alert that code may be too complex, rather than an absolute rule. In particular, if it's necessary to make code harder to read in order to satisfy the complexity check, then a ``# noqa: C901`` can be used to disable the complexity checker. The code reviewer may suggest alternative solutions where appropriate. Jupyter notebooks ----------------- Jupyter notebooks should not be committed to main. They are useful for prototyping, debugging etc, and may be committed to branches. It is strongly encouraged to clear all outputs prior to committing, to enable sensible diffing between revisions. Binary files ------------ Binary files, such as sources or known good outputs (KGOs), should not be committed to main. Where possible, synthetic data for a particular unit test should be generated at runtime using the provided stock functions located in `ants.tests.stock`_ and `ugants.tests.stock`_. In some specific circumstances, binary files are permitted, for example to provide source data to unit test load functions. These files are stored in `lib/ants/tests/resources/`_ and `lib/ugants/tests/resources/`_ alongside a record of how the files were generated. .. _ants.tests.stock: https://github.com/MetOffice/ANTS/blob/main/lib/ants/tests/stock.py .. _ugants.tests.stock: https://github.com/MetOffice/UG-ANTS/blob/main/lib/ugants/tests/stock.py .. _lib/ants/tests/resources/: https://github.com/MetOffice/ANTS/blob/main/lib/ants/tests/resources/ .. _lib/ugants/tests/resources/: https://github.com/MetOffice/UG-ANTS/blob/main/lib/ugants/tests/resources/ .. rubric:: Footnotes .. [#f1] "Because that's harder to read"