Testing
=======

This page provides guidance on testing in ANTS, UG-ANTS and associated science
repositories.

Why do the ANTS/UG-ANTS team ask for unit tests and rose stem tests for code?
-----------------------------------------------------------------------------

Neither unit tests nor rose stem tests are sufficient on their own to provide
confidence that code is correct.  Both are necessary because they complement
each other.

Rose stem tests are not sufficient.  They cannot capture how failures are
handled, confirm intermediate steps are correct, or provide any granularity to
help identify failures.  Rose stem tests are good for confirming that the entire
system works for a particular successful use case.

Unit tests, on the other hand, can confirm individual components function as
expected, can check for failure modes, and can identify where things have gone
wrong.  Unit tests cannot confirm that an end to end system works as expected.

Rose stem tests are good for saying "something" has changed; unit tests are good
for identifying what that "something" is.  This is particularly useful for
handling issues caused by dependency changes (e.g. an update in the version of
iris used with ANTS).

Unit test conventions
---------------------

**All** new code should be unit tested.
The following conventions may help with decisions on how to write the tests.

Where gaps in existing unit tests are discovered, and these gaps are crucial to
the code being developed, new tests should be added.  If gaps in existing tests
are found but are not crucial to the new development, then a new issue should
be created for filling those gaps (there is no requirement that the developer
who made this discovery should also be the one to fix it).  The new issue
should be linked from the issue description so that the reviewer has visibility
and can triage how the new issue should be handled.

What needs to be unit tested?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is not required that every path through every unit should be tested (e.g. a
unit test for each path in an if statement).

Instead, the **important** functionality needs to be unit tested.  "Important" is a
little bit of a judgement call: for core ANTS and UG-ANTS, we apply a stricter
standard and expect more unit tests.  For ancillary-file-science and
ug-ancillary-file-science, there is more flexibility.  As guidance for what to
unit test, there should be unit tests for:

1. Anything scientifically crucial - usually, the scientist is best placed to
   make this judgement call.
2. The ``main`` function (regular ANTS) or ``run`` method (UG-ANTS) if there is
   any significant code present.
3. Any functions/methods directly called from the ``main`` or ``run``.
4. Failure modes:  if there's code that raises an error or a warning, there
   should be a test for the condition that triggers that error or warning.
5. All public functions/methods: but bear in mind that it may be appropriate to
   make a function/method private rather than unit test it (`The Hitchhiker's
   Guide to Python <https://docs.python-guide.org/writing/style/#we-are-all-responsible-users>`_
   provides a good description of private "things" in Python).
6. Any code changes as a result of a bug fix.  We need a test to confirm that
   the bug has actually been fixed.
7. :ref:`TODO <todo-convention>` items where we want to know if a behaviour has
   changed.

What does not need to be unit tested?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Functionality from other libraries does not usually need to be unit tested.
It is assumed that the library code is already tested.  There are some
circumstances where a test is necessary (typically, in the context of a bug
fix).  This does mean that there's usually no need for (ug-)ancillary-file-science
code to unit test (UG-)ANTS functionality: (UG-)ANTS can be considered to be a
library for (ug-)ancillary-file-science. For example, do not unit test a
regridding step in your science code. If you think the ANTS/UG-ANTS library
needs additional unit testing coverage, feel free to open a bug report in the
relevant repository.

Test names
^^^^^^^^^^

Test files should be named for the function being tested, and the parent
directories should be named for the containing module: ``lib/ants/foo.py``
containing the function ``bar`` should be tested in a ``lib/ants/tests/foo/``
directory, with the ``bar`` function tested in
``lib/ants/tests/foo/test_bar.py``.  Tests covering how the individual functions
combine (also known as integration tests) should be in a file
``lib/ants/tests/foo/test_integration.py``.

Docstrings
^^^^^^^^^^

New tests should have a docstring describing what the test is doing and what it
is testing.  If updating an existing test that does not have a docstring,
please add one.

.. seealso::

    :doc:`documentation`
        Guidance on writing docstrings for code. Note that we are generally more
        lenient on docstrings in tests compared to code.

Readability
^^^^^^^^^^^

Ideally, each test should contain a single ``assert`` to make it transparent
what exactly is being tested.  This is a fairly weak convention - readability is
the overall goal, and if it's more readable to have multiple ``asserts``, then
multiple ``asserts`` should be used.  For example, for cases where there's a lot
of set up and related things are being tested (e.g. a complex expected cube
setup, but two simple asserts for the points and the bounds), it may be more
readable to have multiple ``asserts`` in the test.
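
As a minimal sketch of that trade-off, the test below keeps two closely related
assertions together because they share the same set up.  The ``cell_bounds``
helper is hypothetical and is not part of ANTS.

.. code-block:: python

    import unittest


    def cell_bounds(points, width):
        """Hypothetical helper: return (lower, upper) bounds centred on each point."""
        half_width = width / 2
        return [(point - half_width, point + half_width) for point in points]


    class TestCellBounds(unittest.TestCase):
        def test_bounds_are_centred_on_points(self):
            """Bounds sit half a cell width either side of each point."""
            actual = cell_bounds([1.0, 3.0], width=2.0)
            # Two simple, related assertions sharing one set up are easier to
            # read here than two near-identical tests.
            self.assertEqual([0.0, 2.0], [lower for lower, _ in actual])
            self.assertEqual([2.0, 4.0], [upper for _, upper in actual])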

``MixIn`` classes can be used to reduce test duplication (e.g. a suite of tests
for ``360_day`` and ``Gregorian`` calendars may have a set of tests defined in a
``MixIn`` class, with separate set ups for the two calendars).  This does make
running a single test more difficult, so there is a judgement call between
maintaining duplicate code and convenience in test running.
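
One way the ``MixIn`` pattern could look is sketched below.  The
``days_in_month`` helper and the class names are hypothetical and are used only
to illustrate sharing tests between two calendars.

.. code-block:: python

    import calendar
    import unittest


    def days_in_month(calendar_name, year, month):
        """Hypothetical helper: number of days in a month for a calendar."""
        if calendar_name == "360_day":
            return 30
        return calendar.monthrange(year, month)[1]


    class DaysInMonthMixIn:
        """Tests shared by both calendars.

        Subclasses provide ``calendar_name`` and ``expected_february`` in
        their ``setUp``.
        """

        def test_february(self):
            """February has the length expected for the calendar."""
            actual = days_in_month(self.calendar_name, 2001, 2)
            self.assertEqual(self.expected_february, actual)

        def test_april(self):
            """April has 30 days in both calendars."""
            actual = days_in_month(self.calendar_name, 2001, 4)
            self.assertEqual(30, actual)


    class Test360Day(DaysInMonthMixIn, unittest.TestCase):
        def setUp(self):
            self.calendar_name = "360_day"
            self.expected_february = 30


    class TestGregorian(DaysInMonthMixIn, unittest.TestCase):
        def setUp(self):
            self.calendar_name = "gregorian"
            self.expected_february = 28

The ``MixIn`` class deliberately does not inherit from
:class:`unittest.TestCase` and its name does not start with ``Test``, so the
shared tests only run through the two concrete classes.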

`ASCII art <https://github.com/MetOffice/ANTS/blob/v3.0.0/lib/ants/tests/constraints/test_extract_overlap.py#L148>`_
can make it easier to understand conceptually complex tests.

.. seealso::

    :ref:`variable-name-convention`
        When writing unit tests, follow the same variable name conventions as
        used in code.


Test style
^^^^^^^^^^

New tests can be written in :mod:`unittest` style (i.e. ``self.assertX``, no pytest
fixtures) or `pytest`_ style (i.e. ``assert X``, pytest fixtures) as appropriate.
This does mean most pytest specific features can now be used.

.. _pytest: https://docs.pytest.org/en/stable/index.html

Because the canonical test *runner* is pytest, :meth:`~unittest.TestCase.subTest`
should not be used for running multiple similar tests.
Instead, ``pytest.mark.parametrize`` is the preferred approach.
Alternatively, explicitly write a separate test case for each variant.
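
A minimal sketch of a parametrized test, using the ``pytest.mark.parametrize``
decorator, is shown below.  It assumes pytest is installed; the
``wrap_longitude`` helper is hypothetical and is not part of ANTS.

.. code-block:: python

    import pytest


    def wrap_longitude(longitude):
        """Hypothetical helper: wrap a longitude in degrees into [-180, 180)."""
        return (longitude + 180.0) % 360.0 - 180.0


    @pytest.mark.parametrize(
        "longitude, expected",
        [
            (0.0, 0.0),
            (190.0, -170.0),
            (-190.0, 170.0),
            (180.0, -180.0),
        ],
    )
    def test_wrap_longitude(longitude, expected):
        """Longitudes outside [-180, 180) are wrapped back into that range."""
        assert wrap_longitude(longitude) == expected

Each tuple is collected and reported by pytest as a separate test, so a failure
identifies the variant that went wrong.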

The exception to using pytest specific features is that `pytest fixtures for
temporary files <https://docs.pytest.org/en/stable/how-to/tmp_path.html>`_
should not be used.  These leave temporary files on disk after a test run
completes.  Instead, :class:`tempfile.TemporaryFile` or
:class:`tempfile.NamedTemporaryFile` should be used.
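
For illustration, a test that needs a real file on disk might look something
like the sketch below.  The ``count_lines`` function is hypothetical, and the
example assumes a POSIX platform where a named temporary file can be reopened
by name while it is still open.

.. code-block:: python

    import tempfile
    import unittest


    def count_lines(path):
        """Hypothetical function under test: count the lines in a text file."""
        with open(path) as handle:
            return sum(1 for _ in handle)


    class TestCountLines(unittest.TestCase):
        def test_two_lines(self):
            """A file containing two lines is reported as having two lines."""
            with tempfile.NamedTemporaryFile(mode="w", suffix=".txt") as temporary:
                temporary.write("first\nsecond\n")
                # Make sure the content is on disk before it is read back.
                temporary.flush()
                actual = count_lines(temporary.name)
            # The file has been deleted by the time the context manager exits.
            self.assertEqual(2, actual)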

For unittest style tests, assertions should be in the order
``assert<Method>(expected, actual)`` rather than ``assert<Method>(actual, expected)``
to ensure the error message for failing tests is correct.

Existing tests are mostly in unittest style.

When adding tests, readability should be the priority.  If moving a test from
unittest style to pytest style results in a more readable test, then this should
be considered even if it means mixing test styles within the same file.

When using pytest fixtures, ideally they should be defined close to the tests
that call them.  For example, define the fixture as a method in a class if
the fixture is used by multiple tests within the class.

The `iris documentation <https://scitools-iris.readthedocs.io/en/latest/developers_guide/contributing_pytest_conversions.html>`_
has advice on converting from unittest to pytest style tests.

``assertRaises`` and ``assertRaisesRegex``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:meth:`~unittest.TestCase.assertRaisesRegex` is preferred over
:meth:`~unittest.TestCase.assertRaises` since this allows checking the error
message as well as the error type.  This can be waived if there is a clear need
for :meth:`~unittest.TestCase.assertRaises` (e.g. trapping multiple different
errors from an external library).

The regular expression for :meth:`~unittest.TestCase.assertRaisesRegex` should
be the entire error message (with wildcards and repetition markers for
variables) to enable the test to be read without needing to check the source
code.  Most existing ANTS tests do not conform to this requirement - please
update them as appropriate.
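
As a sketch of what matching the entire message could look like, the example
below anchors the pattern at both ends and uses a wildcard only for the
variable part.  The ``check_split`` function and its error message are
hypothetical.

.. code-block:: python

    import unittest


    def check_split(value):
        """Hypothetical function under test: reject negative split values."""
        if value < 0:
            raise ValueError(f"Expected a non-negative split, got {value}.")
        return value


    class TestCheckSplit(unittest.TestCase):
        def test_negative_split_raises(self):
            """A negative split raises a ValueError quoting the bad value."""
            # assertRaisesRegex uses a regular expression search, so the
            # anchors are what make this a match on the whole message.
            message = r"^Expected a non-negative split, got -?\d+\.$"
            with self.assertRaisesRegex(ValueError, message):
                check_split(-1)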

``assertIsNone`` and ``assertIsNotNone``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using either :meth:`~unittest.TestCase.assertIsNone` or
:meth:`~unittest.TestCase.assertIsNotNone` is the preferred approach for
checking that an exception is not raised, rather than having a test method with
no assert at all.  This does not apply to UG-ANTS.
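
A minimal sketch of this convention, using a hypothetical ``validate_split``
function that returns ``None`` when validation passes:

.. code-block:: python

    import unittest


    def validate_split(value):
        """Hypothetical validator: raise on bad input, otherwise return None."""
        if value < 0:
            raise ValueError(f"Expected a non-negative split, got {value}.")


    class TestValidateSplit(unittest.TestCase):
        def test_valid_split_does_not_raise(self):
            """A non-negative split passes validation without raising."""
            # An explicit assertion makes the intent of the test visible,
            # rather than relying on the absence of an exception alone.
            self.assertIsNone(validate_split(2))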

Mock conventions in tests
^^^^^^^^^^^^^^^^^^^^^^^^^

There are multiple ways in which :mod:`unittest.mock` can be used in Python. The
choice should be made to make it easier to read the test, and in particular, to
understand what is being tested.  The ``mock`` boilerplate code can make it hard
to distinguish code changes from boilerplate.  As a guideline, the order of
preference should be:

1. If a mock is needed for every test in a class, a `class decorator
   <https://github.com/MetOffice/ANTS/blob/v3.0.0/lib/ants/tests/config/test_set_temporary_directory.py#L13>`_
   should be used - this keeps the mocking separate from the individual tests,
   and makes it easier to see what is being tested in an individual test.
2. If the mocked object isn't being tested, and is not needed for every test in
   a class, a decorator for the `individual test method
   <https://github.com/MetOffice/ANTS/blob/v3.0.0/lib/ants/tests/utils/cube/test_defer_cube.py#L61>`_
   should be used.
3. If the mocked object is part of the test (e.g. using
   :meth:`~unittest.mock.Mock.assert_called_once_with`), then a `context manager
   <https://github.com/MetOffice/ANTS/blob/v3.0.0/lib/ants/tests/constraints/test_extract_overlap.py#L163>`_
   is preferred.  This keeps the mocked object and the assertion close together,
   so it's easier to see what is being tested.

Note that most existing usage of mock in ANTS does not follow these guidelines -
this is very much a case of learning from past mistakes.
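
The three styles in the order of preference above can be sketched side by side
as follows.  This is an illustrative example only: ``working_directory_name``
is a hypothetical function, and ``os.getcwd`` is patched purely because it is
available in the standard library.

.. code-block:: python

    import os
    import unittest
    from unittest import mock


    def working_directory_name():
        """Hypothetical function under test: base name of the current directory."""
        return os.path.basename(os.getcwd())


    # 1. Class decorator: the mock is needed by every test in the class.  The
    #    mock is passed as an extra argument to each test method.
    @mock.patch("os.getcwd", return_value="/path/to/checkout")
    class TestWithClassDecorator(unittest.TestCase):
        def test_name(self, mock_getcwd):
            """The base name of the current directory is returned."""
            self.assertEqual("checkout", working_directory_name())


    class TestWithOtherStyles(unittest.TestCase):
        # 2. Method decorator: the mock is needed, but is not itself tested.
        @mock.patch("os.getcwd", return_value="/path/to/checkout")
        def test_name(self, mock_getcwd):
            """The base name of the current directory is returned."""
            self.assertEqual("checkout", working_directory_name())

        # 3. Context manager: the mock is part of what is being tested.
        def test_getcwd_called_once(self):
            """The current directory is looked up exactly once."""
            with mock.patch("os.getcwd", return_value="/tmp/work") as mock_getcwd:
                working_directory_name()
            mock_getcwd.assert_called_once_with()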

Extra care should be taken when mock patching the global configuration object,
to ensure that it is reset at the end of the test. This should be achieved using
the :obj:`unittest.mock.patch.dict` context manager. `An example of this`_ can
be found in the unit tests for the configuration of the horizontal extrapolation
mode.

.. _An example of this: https://github.com/MetOffice/ANTS/blob/v3.0.0/lib/ants/tests/regrid/test_integration.py#L135
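
The general shape of this pattern is sketched below.  The ``CONFIG`` dictionary,
its ``extrapolation_mode`` key and the function under test are hypothetical
stand-ins; the real ANTS configuration object is not used here.

.. code-block:: python

    import unittest
    from unittest import mock

    # Hypothetical stand-in for a global configuration object.
    CONFIG = {"extrapolation_mode": "nearest"}


    def extrapolation_mode():
        """Hypothetical function under test: read the mode from the configuration."""
        return CONFIG["extrapolation_mode"]


    class TestExtrapolationMode(unittest.TestCase):
        def test_patched_mode_is_used(self):
            """The mode set in the configuration is the one returned."""
            # patch.dict restores the original contents when the block exits,
            # even if the code inside raises an exception.
            with mock.patch.dict(CONFIG, {"extrapolation_mode": "linear"}):
                actual = extrapolation_mode()
            self.assertEqual("linear", actual)

Because the patch is undone when the context manager exits, later tests see the
original configuration regardless of whether this test passes or fails.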

Running unit tests in a working copy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All unit tests must pass in the rose stem workflow and the GitHub actions
for a pull request to be approved for merging to
main.  The unit tests are also expected to work from a typical bash shell, but
issues may be missed.  If you come across an inconsistency in unit test
behaviour between different platforms (rose stem, GitHub actions, and a local
run), please raise an issue.  Unit test failures from non-typical environments
(e.g. other shells, text editors, IDEs) will be considered as a lower priority
and may be rejected in some cases. Workarounds for running tests in a specific
IDE should not be included in unit tests, especially if this affects the running
of the test in the standard test runners (GitHub actions, rose stem, bash shell).

To run the tests in a working copy, use:

.. tab-set::

    .. tab-item:: ANTS/UG-ANTS

        .. code-block:: shell

            module load (ug)ants/developer  # This may be different at different sites
            export PYTHONPATH=${PWD}/lib:$PYTHONPATH
            python -m pytest .

    .. tab-item:: ancillary-file-science/ug-ancillary-file-science

        To run tests in an app you're working on in your working copy of
        ancillary-file-science, first cd into the app directory, then:

        If you are running against a released version of ANTS:

        .. code-block:: shell

            module load (ug)ants/<ants_version>  # This may be different at different sites
            python -m pytest .

        Or, if you are running against head of main and have a checked out version of ANTS:

        .. code-block:: shell

            module load ants/developer  # This may be different at different sites
            export PYTHONPATH=${ANTS_WORKING_COPY}/lib:$PYTHONPATH
            python -m pytest .


Advice on unit tests
^^^^^^^^^^^^^^^^^^^^

As ever, feel free to contact miao@metoffice.gov.uk for advice, or to set up a
meeting for a more in depth conversation.

GitHub Copilot can be a useful tool to simplify writing unit tests. Please ensure
that contributions made with Copilot (or other generative AI tools) are suitably
attributed and are made with the licensed version of GitHub Copilot: see
:ref:`AI-attribution`.

There's also some generic advice available in `The Hitchhiker's Guide to Python
<https://docs.python-guide.org/writing/tests/#testing-your-code>`_.

Rose stem test conventions
--------------------------

What needs to be tested via rose stem?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Every application needs at least one rose stem test to confirm a known good use
case functions as expected.  For the core (UG-)ANTS applications, these tests
should be in (UG-)ANTS and are owned by the ANTS developers.  For the scientist
owned (ug-)ancillary-file-science applications, the rose stem tests should be in
(ug-)ancillary-file-science.

Decomposition splits in KGO tests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In both core and ancillary-file-science, KGO tests that rely on decomposition
should be tested with `x_split and y_split values
<https://metoffice.github.io/ANTS/decomposition.html#configuring-decomposition>`_
of 0, 1, and 2 (where 0 is bypassing decomposition entirely, 1 is going through
decomposition infrastructure without actually splitting, and 2 is a 2x2 split).
The ``split`` cylc parameter is already plumbed in to the test suite and can be
used for this.

Source data files
^^^^^^^^^^^^^^^^^

For guidance on source data, please see the
:doc:`sources tutorial <ants:tutorial_sources>` in the ANTS documentation.

Output filenames in KGO tests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By convention, we use the name::

    output=${ROSE_DATA}/${ROSE_TASK_NAME}

for output paths in the rose apps. This ensures consistency in output filenames,
removes the risk of name collisions, and makes it easier to write the
corresponding rose-ana tests.

The known-good output (KGO) files are now stored outside of the repository. The
:doc:`KGO tutorial <ants:tutorial_KGO>` in the ANTS documentation
provides information on managing KGO files.

Absolute paths should not be included in any code or rose stem configuration.
Use variable expansion (e.g. ``${ROSE_DATA}/path/to/file``) instead.

Linkcheck test
^^^^^^^^^^^^^^

The ``linkcheck`` rose stem application checks whether links in the documentation can be
resolved.  This may fail for reasons unrelated to your development (e.g. a
remote server could be down).  For this reason, if the ``linkcheck`` test fails,
the cause should be investigated rather than assumed to be the developer's
changes.  In some circumstances, it may be acceptable for code to be merged to
main while the ``linkcheck`` test is not passing.

Rose configuration
^^^^^^^^^^^^^^^^^^

``rose config-dump`` should be run in the ``rose-stem`` directory prior to
submitting code for review to ensure that configuration options are in a
standard order throughout the suite.  This prevents issues when users use e.g.
the ``rose edit`` GUI tool and introduce unexpected changes.  Be aware of the
`cylc style guide <https://cylc.github.io/cylc-doc/stable/html/workflow-design-guide/style-guide.html>`_
(although note that we tend to indent jinja2 inline with the cylc workflow
rather than independently).

Long form command line arguments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some rose stem tests use command line arguments for calls to external tools
(e.g. ``nccmp``).  Long form arguments are preferred for readability (e.g.
``--force`` rather than ``-f``).


Running rose stem tests
^^^^^^^^^^^^^^^^^^^^^^^

To run the rose stem tests for ANTS, UG-ANTS, ancillary-file-science, or
ug-ancillary-file-science run the following commands from the top level of the
checkout of the code:


.. note::
    Requires cylc at a version >= 8.6

.. code-block:: shell

    cylc vip ./rose-stem -z group=all [-n WORKFLOW_NAME]

Note that the group argument can be used to run only a subset of the full
workflow, for instance just the unit tests, or just the rose stem tests for
a specific application.