.. Some of the content of this file has been produced with the assistance of Met Office GitHub Copilot Enterprise

.. _code-conventions:

Code conventions
================

Often, these conventions have been derived from having had issues where a
different approach has been used, so existing code may need updating to match
these conventions.

These conventions are a supplement to the code formatting imposed by tools like
`black`_, `isort`_, `flake8`_ and `ruff`_ which are run as part of the routine
tests in the relevant repositories.

.. _black: https://black.readthedocs.io/en/stable/
.. _isort: https://pycqa.github.io/isort/
.. _flake8: https://flake8.pycqa.org/en/latest/
.. _ruff: https://docs.astral.sh/ruff/

These are conventions - there will be times where they should be ignored.

The overriding theme for the following is to ensure readability.
Readable code is easier to review, easier to maintain, and easier to extend with
future functionality.

Standard tools
--------------

"Standard" tools should be used when appropriate (:mod:`collections`,
:mod:`itertools` from the python standard library; `iris`_, `numpy`_, ANTS,
etc).

.. _iris: https://scitools-iris.readthedocs.io/en/stable/index.html
.. _numpy: https://numpy.org/doc/stable/

Modular code
------------

Modular code - i.e. code that is broken up into small functions/methods - is
easier to read, understand and maintain than large, monolithic functions.
Other advantages include that it is easier to document, test, and reuse
functions/methods when the code is modular.
For these reasons, modular code is preferred.

Deprecated code
---------------

New code contributions should not call any deprecated code.
Similarly, code edits in areas where deprecated code is called
(e.g. due to dependency updates) should be updated to avoid the deprecated code.

.. warning::

   Python suppresses deprecation warnings by default
   (see :exc:`DeprecationWarning`).

   Running the unit tests does enable deprecation warnings, and is configured to
   treat warnings as errors, so deprecations should trigger test failures.

   As such, :exc:`FutureWarning` is the recommended warning to be
   used when deprecating code, as it is not suppressed by default.

Whitespace
----------

In both python code and the rose stem workflows:

1. Spaces should be used, not tabs.
2. Trailing whitespace should be removed (e.g. `cylc instructions here
   <https://cylc.github.io/cylc-doc/stable/html/workflow-design-guide/style-guide.html#trailing-whitespace>`_)

.. _variable-name-convention:

Full names for variables, objects, functions, etc
-------------------------------------------------

Variable names should not be truncated (bec tha har to rea) [#f1]_ or disemvowelled
(bcs tht's hrdr t rd) [#f1]_.  There are exceptions:

1. ``coord``, ``lat``, ``lon`` (but not ``long`` because of ambiguity with the
   data type) are used enough and are standard enough that we should keep using
   them.
2. In tests, the pattern ``xc = cube.coord(axis='x')`` (and similarly for
   ``yc``, ``tc``) is used so often that we should keep it for changes to
   existing tests.  Longer names (e.g. ``lon_coord``, ``time_coord``) are
   preferred for new code.
3. Consistency with variable names in papers or documentation.  Examples include
   the Raymond filter or using terminology from F03 where appropriate.
4. Care should be taken with docstrings.  Sphinx does use ``func`` instead of
   ``function`` and similar. Sphinx warnings for this should trigger an error in
   the rose-stem suite.
5. Class methods should follow the :func:`classmethod` python convention of
   using ``cls`` as the name of the implicit first argument.

Existing variable names that don't follow this convention should be updated as
encountered.  It is strongly preferred to have either a separate ticket, or the
first or last commit on a branch for updating legacy names.
This makes it easier to distinguish between readability and functional changes
when reviewing.

Functions that return :obj:`booleans <bool>` (i.e. ``True`` or ``False``) should
usually be prefixed with ``is``, e.g. ``is_periodic`` or ``_is_global``.

Names should be chosen for clarity and readability.
Try to avoid the risk of confusion - for example, don't use the variable name
``match`` in a block of code using regular expressions unless it is an
:class:`re.Match` object.

Assert
------

`assert <https://docs.python.org/3/reference/simple_stmts.html#assert>`_
must only be used in tests, rather than library or application code
(it can be disabled with an environment variable or command line switch, which
can make debugging painful).

Identifying coordinates on a cube
---------------------------------

As a general rule, the form ``cube.coord(axis='x')`` is preferred
(and similarly for ``y``, ``z``, and ``t``).
This defers the question of identifying the coordinates to iris.
This is particularly useful for ensuring our code works with rotated pole
coordinates (i.e. it avoids manually checking for both ``longitude`` and
``grid_longitude``).

This does rely on the cube having well-formed coordinates. In particular,
this means preprocessing frequently has to ignore this convention.

Care is also needed with hybrid height coordinates - both the
``model_level_number`` DimCoord and the ``altitude`` AuxFactory may be
identified as belonging to the ``z`` axis.

Line lengths
------------

``black`` does not impose line length limits for string literals or comments.
Keep these to the same length as the code (88 characters with the current
version of black) unless there's a good reason not to, for example:

* `reproducing output
  <https://github.com/MetOffice/ANTS/blob/v3.0.0/lib/ants/tests/stock.py#L33>`_
  where shorter lines would make it harder to read
* `URLs
  <https://github.com/MetOffice/ANTS/blob/v3.0.0/lib/ants/tests/fileformats/pp/test_cube2pp.py#L19>`_
  where line breaks make it harder to copy and paste

If a line length of 88 characters is deliberately exceeded, ``# noqa: E501``
should be used so that `flake8 ignores the relevant line
<https://flake8.pycqa.org/en/latest/user/violations.html#in-line-ignoring-errors>`_.
``# noqa: E501`` should be used in preference to ``# nopep8``.

Filtering objects
-----------------

List comprehensions (or generator comprehensions, where appropriate) are
preferred over using :func:`filter` with a :term:`lambda` function. For example,
to select all coordinates with units of metres (m):

.. code-block:: python

   # List comprehension (preferred)
   [coord for coord in cube.coords() if coord.units == "m"]

   # filter with lambda function (not preferred)
   list(filter(cube.coords(), lambda coord: coord.units == "m"))

Lambda functions are avoided as a rule, with the exception of iris constraints.
See the `Python documentation
<https://docs.python.org/3/howto/functional.html#small-functions-and-the-lambda-expression>`_
for guidance on refactoring lambda functions.

Warnings
--------

Warnings for the end user should be used sparingly. In general, warnings should
be provided for deprecations and problems that users may want to fix themselves.
For issues that the end user definitely has to fix, exceptions should be used
instead of warnings. It's currently preferred to use :func:`warnings.warn`
to raise warnings, but the resolution of `ANTS#46 <https://github.com/MetOffice/ANTS/issues/46>`_
(and potentially related tickets) may change this advice.

Existing ANTS warnings frequently use :func:`format` to add information from
variables. New code should prefer :term:`f-strings`, and older warnings should
be updated.

Note that the unit tests are configured to treat warnings as errors. This means
that code should not routinely raise warnings. In some circumstances - e.g.
warnings from third party libraries - it may be necessary to suppress a
particular warning in testing. This can be configured in the top level
``pyproject.toml`` file.

.. _todo-convention:

TODO, FIXME, etc
----------------

TODO comments to identify future work are encouraged where appropriate e.g.
workarounds for known bugs in dependencies that can be removed when the upstream
library has been fixed. A ``TODO`` comment should always be used and should
always be upper case to enable efficient grepping.

A ticket must be referenced in the TODO comment in the form ``#N`` for ticket N
if the ticket is in the same repository.
If the ticket is in a different repository then a full link should be provided.

It is not appropriate to use a TODO comment when submitting code that is known
to be broken with the intention of fixing it at a later date.

Workarounds
-----------

As mentioned above, sometimes workarounds are required for known bugs in
dependencies and can be removed when the upstream library has been fixed.
When this is required, the workaround should be isolated in its own function
called ``<relevant_descriptor>_workaround`` and a comment should make it clear
what the workaround is for and ideally point to an issue belonging to the
dependency.

In order to know when this workaround can be removed, a unit test that is
expected to fail is added (as well as a unit test for the workaround itself).
Then, when the test passes, it is clear that the workaround is no longer needed.
For example:

.. tab-set::

   .. tab-item:: unittest style

      The :mod:`unittest` style (generally used in ANTS), using
      :meth:`unittest.expectedFailure`.

      .. code-block:: python

         def test_ants_override(self):

         @unittest.expectedFailure
         def test_ants_no_override(self):
            # If this succeeds then we can remove this test class and
            # corresponding workaround.

   .. tab-item:: pytest style

      The pytest style (preferred in UG-ANTS), see the pytest documentation
      on `XFail`_.

      .. code-block:: python

         def test_ants_override(self):

         @pytest.mark.xfail
         def test_ants_no_override(self):
            # If this succeeds then we can remove this test class and
            # corresponding workaround.


.. _Xfail: https://docs.pytest.org/en/stable/how-to/skipping.html#xfail-mark-test-functions-as-expected-to-fail

Ideally, raise an issue for the workaround and include a link with the unit
test directing developers to the issue.

Custom exceptions
-----------------

Custom exceptions are located in `lib/ants/exceptions.py
<https://github.com/MetOffice/ANTS/blob/main/lib/ants/exceptions.py>`_ and
`lib/ugants/exceptions.py
<https://github.com/MetOffice/UG-ANTS/blob/main/lib/ugants/exceptions.py>`_.
Where users are required to specify input, and do so incorrectly, specific
exceptions that detail the correct input format are provided to allow the user
to debug their own mistakes. These follow a standard Python custom exception
format defined in the file.

Use pure functions and avoid in place operations
------------------------------------------------

Due to `potential issues with dask <https://distributed.dask.org/en/stable/client.html#pure-functions-by-default>`_,
pure functions are preferred. This means, in particular, that inplace operations
should be avoided where there is a possibility that a cube may have lazy data.
For example, instead of:

.. code-block:: python

   def bad_example(cube):
      """Operates in place - not preferred"""
      cube.data = cube.data + 1
      return None

   bad_example(cube)

use:

.. code-block:: python

   def good_example(cube):
      """Returns a new cube - preferred"""
      result = cube.copy()
      result.data = result.data + 1
      return result

   cube = good_example(cube)

Strict Zip
----------

When using the Python built-in :func:`zip` function to combine two iterables
which are expected to be of the same length, use the ``strict=True`` option.
This will raise a :class:`ValueError` if the iterables have different lengths.
The default behaviour (``strict=False``) will silently accept iterables of
different lengths, and stop iteration after the first iterable is exhausted.

Automated coding standards tools
--------------------------------

Regular ANTS currently uses `black`_, `isort`_, and `flake8`_ for code
formatting and linting, while UG-ANTS uses `ruff`_ and `black`_.
To run these in a working copy, activate your developer environment (for example
via ``module load``, or ``conda activate``), then run:

.. tab-set::

   .. tab-item:: ANTS

      .. code-block:: bash

         export PYTHONPATH=${PWD}/lib:$PYTHONPATH
         black .
         isort .
         flake8 .

   .. tab-item:: UG-ANTS

      .. code-block:: bash

         export PYTHONPATH=${PWD}/lib:$PYTHONPATH
         black .
         ruff check .

   .. tab-item:: ug-ancillary-file-science

      .. code-block:: bash

         cd ug-ancillary-file-science
         ruff check Apps --config <path/to/UG-ANTS/pyproject.toml>
         black --check Apps --config <path/to/UG-ANTS/pyproject.toml>

Running ``black`` or ``isort`` in a working copy automatically fixes
identified issues. Running ``ruff`` with the ``--fix`` option also fixes some
issues automatically.

.. tip::

   Commit changes from these tools as a standalone commit
   without other changes. This makes reviewing code easier.

If it is necessary to disable one or more checks from a line of code, a
``# noqa: <code>`` statement for the particular check `can be used
<https://flake8.pycqa.org/en/latest/user/violations.html#in-line-ignoring-errors>`_.
Blanket ``# noqa`` statements to ignore all checks should not be used.

The ruff complexity checker should be considered as an alert that code may be
too complex, rather than an absolute rule. In particular, if it's necessary to
make code harder to read in order to satisfy the complexity check, then a
``# noqa: C901`` can be used to disable the complexity checker.
The code reviewer may suggest alternative solutions where appropriate.

Jupyter notebooks
-----------------

Jupyter notebooks should not be committed to main.  They are useful for
prototyping, debugging etc, and may be committed to branches.  It is strongly
encouraged to clear all outputs prior to committing, to enable sensible diffing
between revisions.

Binary files
------------

Binary files, such as sources or known good outputs (KGOs), should not be
committed to main. Where possible, synthetic data for a particular unit test
should be generated at runtime using the provided stock functions located in
`ants.tests.stock`_ and `ugants.tests.stock`_.

In some specific circumstances, binary files are permitted, for example to
provide source data to unit test load functions. These files are stored in
`lib/ants/tests/resources/`_ and `lib/ugants/tests/resources/`_
alongside a record of how the files were generated.

.. _ants.tests.stock: https://github.com/MetOffice/ANTS/blob/main/lib/ants/tests/stock.py
.. _ugants.tests.stock: https://github.com/MetOffice/UG-ANTS/blob/main/lib/ugants/tests/stock.py
.. _lib/ants/tests/resources/: https://github.com/MetOffice/ANTS/blob/main/lib/ants/tests/resources/
.. _lib/ugants/tests/resources/: https://github.com/MetOffice/UG-ANTS/blob/main/lib/ugants/tests/resources/


.. rubric:: Footnotes

.. [#f1] "Because that's harder to read"