Testing
=======

This page provides guidance on testing in ANTS, UG-ANTS and associated science repositories.

Why do the ANTS/UG-ANTS team ask for unit tests and rose stem tests for code?
-----------------------------------------------------------------------------

Neither unit tests nor rose stem tests alone are sufficient to provide confidence that code is correct. Both are necessary, since they complement each other.

Rose stem tests alone are not sufficient. They cannot capture how failures are handled, confirm that intermediate steps are correct, or provide any granularity to help identify failures. Rose stem tests are good for confirming that the entire system works for a particular successful use case.

Unit tests, on the other hand, can confirm that individual components function as expected, can check for failure modes, and can identify where things have gone wrong. However, unit tests cannot confirm that an end-to-end system works as expected.

Rose stem tests are good for saying "something" has changed; unit tests are good for identifying what that "something" is. This is particularly useful for handling issues caused by dependency changes (e.g. an update in the version of iris used with ANTS).

Unit test conventions
---------------------

**All** new code should be unit tested. The following conventions may help with decisions on how to write the tests.

Where gaps in existing unit tests are discovered, and a gap is crucial to the code being developed, new tests should be added. If gaps in existing tests are found but are not crucial to the new development, then a new issue should be created for filling those gaps (there is no requirement that the developer who made the discovery should also be the one to fix it). The new issue should be linked from the description of the issue being worked on, so that the reviewer has visibility and can triage how the new issue should be handled.

What needs to be unit tested?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is not required that every path through every unit be tested (e.g. a separate unit test for each branch of an ``if`` statement). Instead, the **important** functionality needs to be unit tested. "Important" is something of a judgement call: for core ANTS and UG-ANTS, we apply a stricter standard and expect more unit tests. For ancillary-file-science and ug-ancillary-file-science, there is more flexibility.

As guidance for what to unit test, there should be unit tests for:

1. Anything scientifically crucial. Usually, the scientist is best placed to make this judgement call.
2. The ``main`` function (regular ANTS) or ``run`` method (UG-ANTS), if there is any significant code present.
3. Any functions/methods directly called from ``main`` or ``run``.
4. Failure modes: if there is code that raises an error or a warning, there should be a test for the condition that triggers that error or warning.
5. All public functions/methods. Bear in mind, however, that it may be appropriate to make a function/method private rather than unit test it. (`The Hitchhiker's Guide to Python `_ provides a good description of private "things" in Python.)
6. Any code changes made as a result of a bug fix. A test is needed to confirm that the bug has actually been fixed.
7. :ref:`TODO ` items where we want to know if a behaviour has changed.

What does not need to be unit tested?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Functionality from other libraries does not usually need to be unit tested, since it is assumed that the library code is already tested. There are some circumstances where a test is necessary (typically in the context of a bug fix).

This means that there is usually no need for (ug-)ancillary-file-science code to unit test (UG-)ANTS functionality: (UG-)ANTS can be considered to be a library for (ug-)ancillary-file-science. For example, do not unit test a regridding step in your science code.
If you think the ANTS/UG-ANTS library needs additional unit test coverage, feel free to open a bug report in the relevant repository.

Test names
^^^^^^^^^^

Test files should be named for the function being tested, and the parent directories should be named for the containing module. For example, if ``lib/ants/foo.py`` contains the function ``bar``, its tests belong in a ``lib/ants/tests/foo/`` directory, with the ``bar`` function tested in ``lib/ants/tests/foo/test_bar.py``.

Tests covering how the individual functions combine (also known as integration tests) should be in a file ``lib/ants/tests/foo/test_integration.py``.

Docstrings
^^^^^^^^^^

New tests should have a docstring describing what the test is doing and what it is testing. If you update an existing test that does not have a docstring, please add one to the test.

.. seealso::

   :doc:`documentation`
      Guidance on writing docstrings for code.

Note that we are generally more lenient on docstrings in tests than in code.

Readability
^^^^^^^^^^^

Ideally, each test should contain a single ``assert`` to make it transparent what exactly is being tested. This is a fairly weak convention: readability is the overall goal, and if a test is more readable with multiple ``asserts``, then multiple ``asserts`` should be used. For example, where there is a lot of set up and closely related things are being tested (e.g. a complex expected cube setup, but two simple asserts for the points and the bounds), it may be more readable to have multiple ``asserts`` in the test.

``MixIn`` classes can be used to reduce test duplication (e.g. a suite of tests for ``360_day`` and ``Gregorian`` calendars may have a set of tests defined in a ``MixIn`` class, with separate set ups for the two calendars). This does make running a single test more difficult, so there is a judgement call between avoiding duplicate code and convenience in test running.

`ASCII art `_ can make it easier to understand conceptually complex tests.

.. seealso::

   :ref:`variable-name-convention`
      When writing unit tests, follow the same variable name conventions as used in code.

Test style
^^^^^^^^^^

New tests can be written in :mod:`unittest` style (i.e. ``self.assertX``, no pytest fixtures) or `pytest`_ style (i.e. ``assert X``, pytest fixtures) as appropriate. This means that most pytest-specific features can be used.

.. _pytest: https://docs.pytest.org/en/stable/index.html

Because the canonical test *runner* is pytest, :meth:`~unittest.TestCase.subTest` should not be used for running multiple similar tests. Instead, ``pytest.mark.parametrize`` is the preferred approach. Alternatively, explicitly write a separate test case for each variant.

The exception to using pytest-specific features is that `pytest fixtures for temporary files `_ should not be used, because these leave temporary files on disk after a test run completes. Instead, :func:`tempfile.TemporaryFile` or :func:`tempfile.NamedTemporaryFile` should be used.

For unittest style tests, assertions should be in the order ``assert(expected, actual)`` rather than ``assert(actual, expected)`` to ensure that the error message for a failing test is correct.

Existing tests are mostly in unittest style. When adding tests, readability should be the priority. If moving a test from unittest style to pytest style results in a more readable test, then this should be considered, even if it means mixing test styles within the same file.

When using pytest fixtures, ideally they should be defined close to the tests that use them. For example, if a fixture is used by multiple tests within a class, define it as a method of that class.

The `iris documentation `_ has advice on converting from unittest to pytest style tests.
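
The following is a minimal sketch of the test style conventions above, written as a hypothetical ``lib/ants/tests/foo/test_bar.py``. The function ``bar`` is a hypothetical stand-in, assumed here to count the non-empty lines in a text file; it is defined inline only so that the example is self-contained, whereas a real test module would import it from the module under test. The sketch uses ``pytest.mark.parametrize`` rather than ``subTest``, defines the fixture as a method of the class that uses it, creates files with :func:`tempfile.NamedTemporaryFile` rather than pytest's temporary file fixtures, and includes a unittest style test with the expected value first. It assumes a POSIX system, where a named temporary file can be reopened by name while it is still open.

.. code-block:: python

   """Sketch of a hypothetical ``lib/ants/tests/foo/test_bar.py``."""
   import tempfile
   import unittest

   import pytest


   def bar(path):
       """Hypothetical function under test: count the non-empty lines in a file."""
       with open(path) as handle:
           return sum(1 for line in handle if line.strip())


   class TestBar:
       @pytest.fixture
       def temp_file(self):
           """A writable temporary file, deleted when the test finishes."""
           # Leaving the ``with`` block at fixture teardown closes and deletes
           # the file, so nothing is left on disk after the test run.
           with tempfile.NamedTemporaryFile(mode="w", suffix=".txt") as handle:
               yield handle

       @pytest.mark.parametrize(
           "lines, expected",
           [(["a", "b"], 2), (["a", "", "b"], 2), ([], 0)],
           ids=["no_blank_lines", "one_blank_line", "empty_file"],
       )
       def test_counts_non_empty_lines(self, temp_file, lines, expected):
           """Test that ``bar`` counts non-empty lines and ignores blank ones."""
           temp_file.write("\n".join(lines))
           temp_file.flush()  # make the content visible when reopened by name
           actual = bar(temp_file.name)
           assert expected == actual


   class TestBarUnittestStyle(unittest.TestCase):
       def test_empty_file(self):
           """Test that an empty file has a count of zero."""
           expected = 0
           with tempfile.NamedTemporaryFile(mode="w", suffix=".txt") as handle:
               actual = bar(handle.name)
           # unittest style: expected first, then actual.
           self.assertEqual(expected, actual)

Each parametrized case is reported by pytest as a separate test (e.g. ``test_bar.py::TestBar::test_counts_non_empty_lines[one_blank_line]``), so a failure in one variant is identified directly.
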

``assertRaises`` and ``assertRaisesRegex``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:meth:`~unittest.TestCase.assertRaisesRegex` is preferred over :meth:`~unittest.TestCase.assertRaises`, since it allows checking the error message as well as the error type. This can be waived if there is a clear need for :meth:`~unittest.TestCase.assertRaises` (e.g. trapping multiple different errors from an external library).

The regular expression for :meth:`~unittest.TestCase.assertRaisesRegex` should match the entire error message (with wildcards and repetition markers for variables), so that the test can be read without needing to check the source code. Most existing ANTS tests do not conform to this requirement; please update them as appropriate.

``assertIsNone`` and ``assertIsNotNone``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using either :meth:`~unittest.TestCase.assertIsNone` or :meth:`~unittest.TestCase.assertIsNotNone` is the preferred approach for checking that an exception is not raised, rather than having a test method with no assert at all. This does not apply to UG-ANTS.

Mock conventions in tests
^^^^^^^^^^^^^^^^^^^^^^^^^

There are multiple ways in which :mod:`unittest.mock` can be used in Python. The choice should be made to make the test easier to read and, in particular, to make it easier to understand what is being tested. The ``mock`` boilerplate code can make it hard to distinguish the test logic from the boilerplate. As a guideline, the order of preference should be:

1. If a mock is needed for every test in a class, a `class decorator `_ should be used. This keeps the mocking separate from the individual tests, and makes it easier to see what is being tested in an individual test.
2. If the mocked object isn't being tested, and is not needed for every test in a class, a decorator for the `individual test method `_ should be used.
3. If the mocked object is part of the test (e.g. using :meth:`~unittest.mock.Mock.assert_called_once_with`), then a `context manager `_ is preferred.
   This keeps the mocked object and the assertion close together, so it's easier to see what is being tested.

Note that most existing usage of mock in ANTS does not follow these guidelines: this is very much a case of learning from past mistakes.

Extra care should be taken when mock patching the global configuration object, to ensure that it is reset at the end of the test. This should be achieved using the :obj:`unittest.mock.patch.dict` context manager. `An example of this`_ can be found in the unit tests for the configuration of the horizontal extrapolation mode.

.. _An example of this: https://github.com/MetOffice/ANTS/blob/v3.0.0/lib/ants/tests/regrid/test_integration.py#L135

Running unit tests in a working copy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All unit tests must pass in the rose stem workflow and in the GitHub Actions for a pull request to be approved for merging to main. The unit tests are also expected to work from a typical bash shell, but issues may be missed. If you come across an inconsistency in unit test behaviour between different platforms (rose stem, GitHub Actions, and a local run), please raise an issue.

Unit test failures from non-typical environments (e.g. other shells, text editors, IDEs) will be considered a lower priority and may be rejected in some cases. Workarounds for running tests in a specific IDE should not be included in unit tests, especially if they affect the running of the tests in the standard test runners (GitHub Actions, rose stem, bash shell).

To run the tests in a working copy, use:

.. tab-set::

   .. tab-item:: ANTS/UG-ANTS

      From the top level of your working copy:

      .. code-block:: shell

         module load (ug)ants/developer  # This may be different at different sites
         export PYTHONPATH=${PWD}/lib:$PYTHONPATH
         python -m pytest .

   .. tab-item:: ancillary-file-science/ug-ancillary-file-science

      To run tests in an app you're working on in your working copy of ancillary-file-science, first ``cd`` into the app directory.

      If you are running against a released version of ANTS:

      .. code-block:: shell

         module load (ug)ants/  # This may be different at different sites
         python -m pytest .

      Or, if you are running against head of main and have a checked out version of ANTS:

      .. code-block:: shell

         module load ants/developer  # This may be different at different sites
         export PYTHONPATH=${ANTS_WORKING_COPY}/lib:$PYTHONPATH
         python -m pytest .

Advice on unit tests
^^^^^^^^^^^^^^^^^^^^

As ever, feel free to contact miao@metoffice.gov.uk for advice, or to set up a meeting for a more in-depth conversation.

GitHub Copilot can be a useful tool to simplify writing unit tests. Please ensure that contributions made with Copilot (or other generative AI tools) are suitably attributed and are made with the licensed version of GitHub Copilot: see :ref:`AI-attribution`.

There's also some generic advice available in `The Hitchhiker's Guide to Python `_.

Rose stem test conventions
--------------------------

What needs to be tested via rose stem?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Every application needs at least one rose stem test to confirm that a known good use case functions as expected. For the core (UG-)ANTS applications, these tests should be in (UG-)ANTS and are owned by the ANTS developers. For the scientist-owned (ug-)ancillary-file-science applications, the rose stem tests should be in (ug-)ancillary-file-science.

Decomposition splits in KGO tests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In both core and ancillary-file-science, KGO tests that rely on decomposition should be tested with `x_split and y_split values `_ of 0, 1, and 2 (where 0 bypasses decomposition entirely, 1 goes through the decomposition infrastructure without actually splitting, and 2 is a 2x2 split). The ``split`` cylc parameter is already plumbed in to the test suite and can be used for this.

Source data files
^^^^^^^^^^^^^^^^^

For guidance on source data, please see the :doc:`sources tutorial ` in the ANTS documentation.

Output filenames in KGO tests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By convention, we use the following name for output paths in the rose apps::

   output=${ROSE_DATA}/${ROSE_TASK_NAME}

This ensures consistency in output filenames, removes the risk of name collisions, and makes it easier to write the corresponding rose-ana tests.

The known good output (KGO) files are now stored outside of the repository. The `KGO tutorial ` in the ANTS documentation provides information on managing KGO files.

Absolute paths should not be included in any code or rose stem configuration. Use variable expansion (e.g. ``${ROSE_DATA}/path/to/file``) instead.

Linkcheck test
^^^^^^^^^^^^^^

The ``linkcheck`` rose stem application checks whether links in the documentation can be resolved. This may fail for reasons unrelated to your development (e.g. a remote server could be down). For this reason, if the ``linkcheck`` test fails, the failure should be investigated rather than assumed to be caused by the development. In some circumstances, it may be acceptable for code to be merged to main while the ``linkcheck`` test is not passing.

Rose configuration
^^^^^^^^^^^^^^^^^^

``rose config-dump`` should be run in the ``rose-stem`` directory prior to submitting code for review, to ensure that configuration options are in a standard order throughout the suite. This prevents unexpected changes being introduced when users later edit the configuration with tools such as the ``rose edit`` GUI.

Be aware of the `cylc style guide `_ (although note that we tend to indent jinja2 inline with the cylc workflow rather than independently).

Long form command line arguments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some rose stem tests use command line arguments for calls to external tools (e.g. ``nccmp``). Long form arguments are preferred for readability (e.g. ``--force`` rather than ``-f``).

Running rose stem tests
^^^^^^^^^^^^^^^^^^^^^^^

To run the rose stem tests for ANTS, UG-ANTS, ancillary-file-science, or ug-ancillary-file-science, run the following commands from the top level of the checkout of the code:

.. note::

   Requires cylc at a version >= 8.6

.. code-block:: shell

   cylc vip ./rose-stem -z group=all [-n WORKFLOW_NAME]

Note that the ``group`` argument can be used to run only a subset of the full workflow, for instance just the unit tests, or just the rose stem tests for a specific application.