Unit Testing#

The purpose of unit testing is to test the smallest discrete pieces of functionality that have a clearly specified API. This will typically be a procedure. It is not uncommon to group all tests pertaining to a single program unit (for example, a module or type) together.

The unit test code can be found under the <project>/unit-test directory, in a tree which mirrors that in <project>/source. Thus the test for a source file may be easily located by looking in the corresponding location in the unit test tree.

To aid the writing of unit tests, we use the pFUnit unit testing framework. This provides a preprocessor to generate test case source and a driver to run the tests.

Test code can be identified by the .pf extension. These are normal Fortran files garnished with directives prefixed with the @ (at) symbol. The processor substitutes the directives with additional Fortran code and Fortran preprocessor directives.

It is possible to use a .PF extension. This causes the Fortran preprocessor to be run on the file prior to the pFUnit processor. This is necessary for testing templated source but should otherwise be avoided.

When the resulting program is run, the pFUnit framework marks progress and reports any failures.

How to Unit Test a Thing#

A unit test should test only the unit under test. This may seem obvious but it is easy for a panoply of other things to work their way into a test.

Inputs to the unit under test should never be left to the tender mercies of undefined or unclear behaviour. Always set them explicitly and ideally to constants.

The expected result should be a constant or calculated from constants. Avoid calculations where possible, so that there is no chance of the same (flawed) algorithm being used in both the test and the unit under test.

Each test procedure should contain a well defined set of tests. The failure of a test should indicate that a fault lies in a restricted amount of code. A single test procedure exercising an entire module has some value, but one which exercises only a single procedure under test greatly reduces your bug-hunting effort.

The test should rely on as little external code to set up the test environment as possible. Ideally none at all. Any external code becomes, by implication, part of the unit under test. It must work correctly before the test can pass.

Use of LFRic Infrastructure in Unit Tests#

Unless you are testing a particular part of the infrastructure the rule is: Don’t use it in a unit test.

This is particularly relevant to testing kernel code. The inputs to kernel procedures are simple arrays of primitive types. In a model run, these arrays are derived from infrastructure objects such as fields and function spaces. It is very easy to rely on these infrastructure objects in the unit tests. However, this is a poor idea for a number of reasons.

By using the infrastructure you are adding a lot of dependencies into your test. The effect of using them is that you are not only testing the unit you are interested in but also all the infrastructure used in the test. This reduces the locality of any bug discovered.

It also makes the tests much more complicated to implement and to read. This in itself invites faults.

Canned examples of the arrays used by kernels are provided to set up simple cases sufficient for most testing. These also add external dependencies which widen the scope of the test but to a much lesser extent. They are based on canned data which is merely copied into the appropriate array. Thus they represent a much lower risk.

Information on how to use the provided helper routines that return the canned data can be found in the section: Replacing Infrastructure calls with Canned Information.

What to Test#

Obviously you should test the output of the unit against expected good output. Make sure that when a given set of values goes in, the correct values come out. When choosing your test case, make sure you don’t choose values which can hide failures. For instance, testing that you get zero out may not be helpful: there are many ways to get zero out of a calculation.

It may be worth testing a number of good inputs, in case by some fluke you have chosen the one set of inputs which gives the correct result while every other input would give the wrong one.
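For instance, a minimal sketch along these lines, assuming a hypothetical scale_and_sum function which is expected to return twice the sum of its three arguments:

  @test
  subroutine test_scale_and_sum_good_inputs()

    implicit none

    ! Several distinct input sets guard against a fluke match on one case
    @assertEqual( 12, scale_and_sum( 1, 2, 3 ) )
    @assertEqual( 10, scale_and_sum( 4, 0, 1 ) )
    @assertEqual( -6, scale_and_sum( -1, -1, -1 ) )

  end subroutine test_scale_and_sum_good_inputs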

Where possible, it is also worth testing a few expected failure modes. The current unit testing framework doesn’t handle aborts on error (the whole test suite will simply abort), but if the error is handled by the code, it can be tested. For example, if some functionality requires a positive integer, give it a negative one.

Edge cases are another good area to test. Check for out-by-one errors by passing values either side of limits.
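For instance, a sketch along these lines, assuming a hypothetical lookup_level subroutine which accepts levels 1 to 10 and sets a non-zero status for anything outside that range:

  @test
  subroutine test_lookup_level_limits()

    implicit none

    integer :: status

    ! Values either side of the lower limit
    call lookup_level( 1, status )
    @assertEqual( 0, status )
    call lookup_level( 0, status )
    @assertTrue( 0 /= status )

    ! Values either side of the upper limit
    call lookup_level( 10, status )
    @assertEqual( 0, status )
    call lookup_level( 11, status )
    @assertTrue( 0 /= status )

  end subroutine test_lookup_level_limits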

The idea is to build confidence that the unit functions correctly not only in normal use but also under stress conditions.

How to Test#

What follows is an introduction to a number of different approaches and techniques for writing unit tests. In general you should always avoid implementing a class derived from TestCase unless you need it to hold fixture data. Fixtures which do not create data which can be held in this way, such as namelist feigners, should not cause a TestCase class to be created.

Even when using fixtures you should prefer standalone @before and @after subroutines over setUp and tearDown methods.

Simple Test Procedures#

A minimal example might look like this:

module simple_mod_test

  use pFUnit_Mod
  use simple_mod, only: thing, thang

  implicit none

  private
  public test_thing, test_thang

contains

  @test
  subroutine test_thing

    integer :: result

    result = thing( 1, 2, 3 )
    @assertEqual( 12, result )

  end subroutine test_thing

  @test
  subroutine test_thang

    integer :: result

    call thang( result )
    @assertEqual( 13, result )

  end subroutine test_thang

end module simple_mod_test

Note the pFUnit directives prefixed with the @ (at) symbol. These are used to mark out the test cases with @test. They are also used to denote “assertions”. These are the actual business end of a test case. An assertion must be met in order for the test to pass.

Note

A restriction of pFUnit is that any line starting with the @ symbol must be written entirely on a single line. Continuation lines are not permitted.

In this case @assertEqual is used. Surprising no one, this requires that the value from the unit under test (the second argument) be equal to the expected result passed as the first argument.

Given the problems inherent in testing floating point numbers for equality, a fuzzy match may be used: simply pass the tolerance argument, e.g. tolerance=0.001.
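A sketch of this, assuming a hypothetical mean_of function returning a real result:

    real :: result

    result = mean_of( (/1.0, 2.0, 4.0/) )
    @assertEqual( 2.333333, result, tolerance=1.0e-6 )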

A wide variety of other assertions are provided including inequalities such as @assertGreaterThan and numerical tests such as @assertIsNan.

Also provided is the general purpose @assertTrue. This allows any otherwise unsupported test to be implemented by writing an expression which yields a logical result and asserting that it is true.

When an assertion fails it will produce a failure message which includes details about what was expected and what was found. This is usually exactly what is needed to diagnose the problem, but in some cases it can be unhelpful. When this happens the optional “message” argument may be used to provide a more useful description.
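For example, the two can be combined: an arbitrary logical expression checked with @assertTrue, plus a custom message to make any failure report easier to interpret (data_block here is assumed to be a real array set up by the test):

    @assertTrue( all( data_block >= 0.0 ), message='data_block contains negative values' )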

Test Procedures with Fixtures#

The simple test outlined above is fine for situations where there are no resources to be managed. When there are, “fixtures” are needed.

The most common resource requiring management is memory: allocated space must be deallocated. However, configuration may also be considered a resource in this context.

Such managed resources are referred to as “fixtures” in testing parlance. They are things which must exist in order to perform the test but which are not under test themselves. They are created and initialised before each test and destroyed after it. This means that each test has a pristine, known, environment in which to work.

Following is a minimal example of fixtures in use:

module simple_mod_test

  use pFUnit_Mod
  use simple_mod, only: thing, thang

  implicit none

  private
  public test_thing, test_thang

  real, allocatable :: data_block(:)

contains

  @before
  subroutine setUp()

    implicit none

    allocate( data_block(256) )

  end subroutine setUp

  @after
  subroutine tearDown()

    implicit none

    deallocate( data_block )

  end subroutine tearDown

  @test
  subroutine test_thing()

    implicit none

    call thing( 1, 2, 3, data_block )
    @assertEqual( (/12, 13, 14/), data_block )

  end subroutine test_thing

  @test
  subroutine test_thang()

    implicit none

    integer :: result

    data_block = (/-1, -2, -3/)
    result = thang( data_block )
    @assertEqual( 13, result )

  end subroutine test_thang

end module simple_mod_test

Notice how data_block is allocated in setUp and deallocated in tearDown. These procedures are called before and after each test method. This means that the contents do not carry between tests. Therefore they must be initialised for each test. This may be done in setUp if every test needs the same initial condition or locally to the test if they need different starting points.
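For example, if every test needed data_block to start from zero, the setUp routine above might set the common starting point as well as allocating the space (a sketch reusing the names from the example):

  @before
  subroutine setUp()

    implicit none

    allocate( data_block(256) )
    data_block = 0.0   ! Common starting point shared by every test

  end subroutine setUp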

Feigning Configuration#

If the unit under test makes use of namelist values from configuration modules, they must be suitably initialised. You cannot rely on them having been set up by a previous test, because they have not been. You cannot rely on them defaulting to a particular value, because they do not.

Failure to do this will lead to tests failing in unexpected and unpredictable ways as the uninitialised (or initialised-elsewhere) parameters change value.

To perform this initialisation use the “feign” functions provided by “feign_config_mod” to explicitly set the values needed. These functions work by creating a temporary namelist, then telling the configuration system to load it.

Bear in mind that you must feign everything the code being tested will use. If the unit calls down to helper procedures, any configuration they make use of must also be feigned.

When calling feign functions, use named arguments. This improves the self-documenting nature of the code and protects against new arguments upsetting the ordering.

For example:

call feign_planet_config( gravity=10.0_r_def,      &
                          radius=6000000.0_r_def,  &
                          omega=8.0E-5_r_def,      &
                          rd=300.0_r_def,          &
                          cp=1000.0_r_def,         &
                          p_zero=100000.0_r_def,   &
                          scaling_factor=1.0_r_def )

MPI Testing#

Our unit test framework supports running tests in an MPI environment. This is necessary for testing code that will only run in parallel. Such code currently resides only in the infrastructure (e.g. the partitioner or local meshes).

It is easy to use: it just needs the number of processes to be specified for each test.

module simple_mod_test

  use pFUnit_Mod
  use simple_mod, only: thing, thang

  implicit none

  private
  public set_up, tear_down, test_thing, test_thang

contains

  @before
  subroutine set_up( this )

    use feign_config_mod, only : feign_stuff_config

    implicit none

    class(MpiTestMethod), intent(inout) :: this

    !Store the MPI communicator for later use
    call store_comm(this%getMpiCommunicator())

    call feign_stuff_config( name='foo', value=13 )

  end subroutine set_up

  @after
  subroutine tear_down( this )

    use configuration_mod, only : final_configuration

    implicit none

    class(MpiTestMethod), intent(inout) :: this

    call final_configuration()

  end subroutine tear_down

  @test( npes=[1] )
  subroutine test_thing( fixture )

    implicit none

    class(MpiTestMethod), intent(inout) :: fixture

    integer :: data_block(3)

    call thing( 1, 2, 3, data_block )
    @assertEqual( (/12, 13, 14/), data_block )

  end subroutine test_thing

  @test( npes=[1, 2, 4] )
  subroutine test_thang( fixture )

    implicit none

    class(MpiTestMethod), intent(inout) :: fixture

    integer :: data_block(3)
    integer :: result

    data_block = (/-1, -2, -3/)
    result = thang( fixture%context%getMpiCommunicator(), data_block )
    if (fixture%context%isRootProcess) then
      @assertEqual( -13, result )
    else
      @assertEqual( fixture%context%processRank(), result )
    endif

  end subroutine test_thang

end module simple_mod_test

You can see that a list of numbers of processes is provided for each test. The test will be run with each number of processes in turn.

The test can discover information about the parallel environment in which it is running using the context member of the fixture class. This provides a number of query functions and some tools allowing you to gather and all-reduce over the process pool.
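For example, a sketch reusing only the query call shown above to make an assertion specific to a single rank (expected_on_root and result are hypothetical names):

    if ( fixture%context%processRank() == 0 ) then
      @assertEqual( expected_on_root, result )
    end if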

Also notice the use of the “feign” procedure to set up configuration for the unit under test. This is discussed in a previous section.

Test Classes#

When the test fixture must hold data for the test cases a test class is used.

module simple_mod_test

  use pFUnit_Mod
  use simple_mod, only: thing, thang

  implicit none

  private
  public test_simple_type, test_thing, test_thang

  @TestCase
  type, public, extends(TestCase) :: test_simple_type
    real, allocatable :: data_block(:)
  contains
    procedure setUp
    procedure tearDown
  end type test_simple_type

contains

  subroutine setUp( this )

    implicit none

    class(test_simple_type), intent(inout) :: this

    allocate( this%data_block(256) )

  end subroutine setUp

  subroutine tearDown( this )

    implicit none

    class(test_simple_type), intent(inout) :: this

    deallocate( this%data_block )

  end subroutine tearDown

  @test
  subroutine test_thing( context )

    implicit none

    class(test_simple_type), intent(inout) :: context

    call thing( 1, 2, 3, context%data_block )
    @assertEqual( (/12, 13, 14/), context%data_block )

  end subroutine test_thing

  @test
  subroutine test_thang( context )

    implicit none

    class(test_simple_type), intent(inout) :: context

    integer :: result

    context%data_block = (/-1, -2, -3/)
    result = thang( context%data_block )
    @assertEqual( 13, result )

  end subroutine test_thang

end module simple_mod_test

This is unnecessary for simple tests but can be essential for more complex ones. It can also be used as a way to organise several sets of tests with different fixtures in the same source file.

Parameterised Tests#

This is a more advanced topic but fear not, we have already come across an example: MPI tests are parameterised. The test is called once for each element in the array of numbers of processes; the number of processes is the parameter.

In the case of MPI tests, the parameterisation is handled for you. However, it can often be useful to implement your own tests in this way, in particular where you have a series of tests in which the logic is identical but the inputs and outputs vary.

Tests like these need a quick and neat way of running the same code with a number of different conditions. That is what parameterised tests provide.

There is a fair bit of boilerplate but hopefully nothing too off-putting:

module simple_mod_test

  use constants_mod, only: str_long
  use pFUnit_Mod
  use simple_mod, only: thang

  implicit none

  private
  public get_parameters, test_simple_constructor, test_thang

  @testParameter
  type, public, extends(MPITestParameter) :: simple_parameter_type
    integer :: input(3)
    integer :: expected
  contains
    procedure :: toString
  end type simple_parameter_type

  @TestCase(npes=[1], testParameters={get_parameters()}, constructor=test_simple_constructor)
  type, public, extends(MPITestCase) :: test_simple_type
    private
    integer :: input(3)
    integer :: expected
    real, allocatable :: data_block(:)
  contains
    procedure setUp
    procedure tearDown
  end type test_simple_type

contains

  function test_simple_constructor( test_parameter ) result( new_test )

    implicit none

    type(simple_parameter_type), intent( in ) :: test_parameter
    type(test_simple_type) :: new_test

    new_test%input    = test_parameter%input
    new_test%expected = test_parameter%expected

  end function test_simple_constructor

  function toString( this ) result( string )

    implicit none

    class( simple_parameter_type ), intent( in ) :: this
    character(:), allocatable :: string

    character(str_long) :: buffer

    write( buffer, '(3I3)') this%input
    string = trim( buffer )

  end function toString

  function get_parameters() result( parameters )

    implicit none

    type(simple_parameter_type) :: parameters(4)

    parameters = (/simple_parameter_type([0, 0, 0], 0),     &
                   simple_parameter_type([1, 2, 3], 7),     &
                   simple_parameter_type([-1, -1, -1], -1), &
                   simple_parameter_type([3, 2, 1], 14)/)

  end function get_parameters

  subroutine setUp( this )

    implicit none

    class(test_simple_type), intent(inout) :: this

    allocate( this%data_block(256) )

  end subroutine setUp

  subroutine tearDown( this )

    implicit none

    class(test_simple_type), intent(inout) :: this

    deallocate( this%data_block )

  end subroutine tearDown

  @test( npes=[1, 2, 4] )
  subroutine test_thang( fixture )

    implicit none

    class(test_simple_type), intent(inout) :: fixture

    integer :: result

    fixture%data_block = fixture%input
    result = thang( fixture%context%getMpiCommunicator(), fixture%data_block )
    @assertEqual( fixture%expected, result )

  end subroutine test_thang

end module simple_mod_test

As you can see this allows a large number of cases to be tested without a lot of additional code. It can also work in tandem with MPI testing meaning that each test case will be run with each number-of-processes. Of course it works just as well for serial tests if you derive from TestCase instead.

Despite its verbosity it should be fairly obvious what is going on. The only thing which really needs explaining is the toString method on the parameter type. This is used in the case of a test failure to identify which set of parameters was in use. It should return a string which uniquely identifies the case being run.

You may, of course, use the configuration feigning functions in parameterised tests. They were removed from this example for clarity.

Replacing Infrastructure calls with Canned Information#

When testing kernels, it is often necessary to provide quantities that are quite difficult to generate, such as dofmaps, basis functions and quadrature information. When running the full model, these quantities are provided by the infrastructure, but the infrastructure should not be used to generate them in unit tests. Canned versions of these quantities are available from helper routines held in the support directory components/science/unit-test/support.

Using Canned Information#

A description of the available canned-data support routines can be found at Unit test canned-data support routines.

Note

All arrays returned by the helper routines are allocated inside the routines and so will need to be deallocated in the calling routine when they are no longer required.

Adding New Canned Information#

In order to make the use of canned data easier for the unit test writer, it is important to try to follow a similar API to the already existing canned data routines. The following rules should, therefore, be followed:

  • As noted elsewhere on this page, calls to log_event() should be avoided. We should therefore avoid situations where an error has to be trapped and logged. For example, different canned dofmaps might be required for different orders of function space. A single canned-data support routine that takes the required order as an argument could be provided, but then a non-supported order would have to be caught and reported. It is better to provide separate functions for the supported orders: each serves a single purpose and so needs no error checking. If anyone tried to call a routine that hadn’t been written (a currently unsupported order), they would get a linker error.

  • The canned-data support routines should have a single mandatory argument through which the canned data is returned. Any additional information used to control what data is returned should be passed through optional arguments. The single mandatory argument should be an allocatable that is allocated by the support routine. This way the user knows they always have to deallocate the data when they have finished with it; a mixture of some canned data that needs deallocating and some that doesn’t will cause confusion and lead to errors. A sketch of this pattern follows the list.

  • Support routines that return the sizes of things (i.e. get_unit_test_..._sizes_mod.f90) only return scalar integers and so no deallocation of any returned data is required.
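For illustration, here is a sketch of a support routine following these rules. The routine and argument names are hypothetical and do not refer to an existing helper:

subroutine get_unit_test_example_dofmap( dofmap, ncells )

  implicit none

  ! Single mandatory argument: allocated here, deallocated by the caller
  integer, allocatable, intent(out) :: dofmap(:,:)
  ! Optional argument controlling how much canned data is returned
  integer, optional,    intent(in)  :: ncells

  integer :: local_ncells

  local_ncells = 9
  if ( present(ncells) ) local_ncells = ncells

  allocate( dofmap(1, local_ncells) )
  dofmap = 1   ! Canned values are simply copied in; no infrastructure is used

end subroutine get_unit_test_example_dofmap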

Other Considerations#

There are a few remaining technical issues you might care to know about.

Verbose Output From Testing#

If a test suffers an error (rather than a failure), pFUnit does not always provide clear details of where the problem occurred, i.e. a traceback of routines and line numbers. It does provide the “-v” option to mitigate this shortcoming somewhat.

This option enables “verbose output”, which sends a message to the terminal when each individual test starts and ends. This should at least provide a starting point for where to look for problems.

The LFRic build system provides the “VERBOSE” switch. When set using make test VERBOSE=1, it causes the build process to output much more information about what it is doing. Part of that is to specify the “-v” argument to the unit tests.

When the unit tests are run as part of the Rose/Cylc test suites, this option is specified by default.

Output to the terminal from unit tests#

Logging using log_mod#

Using the log_event() functionality from log_mod within a unit test should be avoided. Calling into log_mod adds another dependency to your unit test, so if someone else introduces an error in log_mod your seemingly unrelated unit test will fail, which makes unit tests harder to debug.

If you really have to call log_event() from a unit test (for example in the unit tests on log_mod itself!), you must remember to call initialise_logging() from the unit test code, before you attempt to use log_event().

Standard Out#

Due to the internal workings of the pFUnit framework, writing to standard out from within tests should also be avoided.

When running in “robust” mode pFUnit actually uses standard out to communicate between its supervising process and the tests it is running. Therefore, injecting unexpected data into this stream can trip the framework up.

If you need a quick and easy way to get things to the screen for debugging purposes then there is a solution. If you prepend your debug message with “DEBUG: ” then the framework will know not to interpret it as internal messaging. Instead the text will be sent to your terminal.

The message will be inserted into the middle of the progress tally so it should not be used permanently. However it should be sufficient for temporary debug purposes.
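For example (a sketch; data_block is whatever you want to inspect):

    ! The "DEBUG: " prefix stops robust-mode pFUnit treating this output
    ! as internal messaging between the supervisor and the test process.
    write( *, '(A, 3F8.3)' ) 'DEBUG: data_block = ', data_block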

None of this applies when not in “robust” mode but if you always do it then you don’t have to worry which mode is in use.

Technical Details#

Some technical details are outlined here, should they interest you.

Robust Mode or Non-robust Mode?#

pFUnit offers two modes of operation: “robust” and “non-robust”.

In robust mode each test is run in a subprocess. This is the mode you want to run in as it means an error (as opposed to a failure) in the test will not cause the whole test suite to exit. It should also mean that each test is unable to affect any other test.

Sadly robust mode is rather flawed at the moment. There is leakage between tests which are supposed to be isolated. This leads to nasty side-effects between tests.

Also MPI tests, which we use and need, do not work in robust mode.

There is also an odd behaviour whereby errors are reported, then you are told there were no errors and the suite exits with “Okay”.

In light of all this we have chosen to forgo robust mode until it can be made to work properly and in parallel.

Test Ordering#

The list of tests to run is automatically generated by the build system.

The way this list is compiled means that tests are run in i-node order. In other words it is fairly stable on a machine but random between machines. Where there is an unexpected dependency between tests this can cause a test to fail on one machine but not on another.