.. meta::
   :description lang=en: Tutorial on developing ANTS based applications
   :keywords: contribute, ancillary-file-science, application, development, tutorial
   :property=og:locale: en_GB

.. _writing-applications:

Writing an ANTS based application
=================================

The ANTS library provides a toolkit of functionality common to many ancillary
generating applications.  The user-written applications used for the standard
model configurations within the Momentum partnership are held under the
:ancillary-file-science:`ancillary-file-science<>` project (you must be signed
in to an account with appropriate GitHub permissions to access that link).

An application will usually contain code to carry out the following steps:

1. Read input arguments, such as input and output file names and the target
   grid definition.
2. Load the data from the input files.
3. Process (e.g. regrid) the source data to produce the ancillary field(s).
4. Save the data to the output file.

For each of these steps, the ANTS library contains code to assist with common
operations.  In particular, :mod:`ants.command_parse` provides a common
command line interface, which should be used to ensure a consistent user
interface (UI) across all applications, and :mod:`ants.io.save` contains
routines for saving ancillary data to common file formats.  As ANTS is based
on Iris, you can also use any of the available Iris functionality to process
your data.

In this tutorial, we will write an ancillary generating application that
covers all steps of the :doc:`ancillary_generation_pipeline`.  In real use
cases, though, it is often preferable to write multiple separate applications,
connected via a workflow, to avoid monolithic, complex, and expensive to run
individual applications.  See
:ancillary-file-science:`ancillary-file-science<>` and the associated
rose-stem workflow for examples of such implementations and breakdowns.
As prerequisites for carrying out this tutorial we assume:

* Familiarity with `Python `_
* Familiarity with `Iris `_
* Access to, and the ability to activate, an environment with ANTS and its
  dependencies installed

And for the next steps, familiarity with:

* `Pytest `_
* `Rose `_ and `Cylc `_

Initial setup
-------------

To get started, you will want to check out a copy of
:ancillary-file-science:`ancillary-file-science<>` and create a branch where
you will work on your implementation.  You will also need an active
environment with ANTS installed into it so that you can carry out interactive
development and testing of your application.

Create App Directory
--------------------

Having checked out your working copy of your branch, we need to create our
application in the appropriate place.  Ancillary generation applications live
in the "Apps" directory of ``ancillary-file-science``.  For now, create a
directory under "Apps" called "Tutorial", and within that create an
"ancil_tutorial.py" file where we will be carrying out the coding exercise in
this tutorial.

In your editor of choice, open up the ancil_tutorial.py file and update it so
it contains the following:

.. code-block:: python

    """
    Tutorial application
    ********************

    This application was written to carry out the application development
    tutorial in ANTS.  The application does the following:

    * Loads source and target cubes from the provided filepaths,
    * Regrids the source data to the target grid using an area weighted
      regrid,
    * Adds 1.0 to every data value,
    * Saves the result to the specified output file path as NetCDF and,
      optionally, also as an ancillary file.

    The application implements a few features of ANTS that are common to
    ancillary applications.

    """
    import ants
    import ants.regrid
    import iris
    from ants import load_cube
    from ants.io.save import ancil, netcdf


    def load_data():
        return


    def add_one():
        return


    def regrid_area_weighted():
        return


    def _get_parser():
        return


    def main():
        return


    if __name__ == "__main__":
        main()

This provides us with an initial skeleton from which we will work.  Notice
how we have set up placeholders for the majority of the steps we want our app
to carry out - wherever possible, we aim to break down our ancillary
generation routines into their constituent parts and avoid "monolithic" main
functions containing all the operations.

Add Arg Parser
--------------

ANTS provides a common command line argument parser via
:mod:`ants.command_parse` for use in ancillary applications, providing
arguments such as:

* Source filepath(s)
* Output filepath
* ANTS configuration filepath
* Target grid or target landseamask filepath
* NetCDF-only save option

This helps ensure a consistent interface across ANTS based applications and
provides a standard entry point for options typically required by them.  The
:py:class:`ants.AntsArgParser` class used for this can be extended to include
application-specific arguments, as required, using the
:meth:`argparse.ArgumentParser.add_argument` method.  Here, we will use the
out-of-the-box parser.

In your previously created ``ancil_tutorial.py`` file, update
``_get_parser()`` as follows:

.. code-block:: python

    def _get_parser():
        """Get ANTS argument parser."""
        parser = ants.AntsArgParser(target_grid=True)
        return parser

and also update the ``if __name__ == "__main__":`` section as:

.. code-block:: python

    if __name__ == "__main__":
        args = _get_parser().parse_args()
        main()

Save your changes and you can then inspect the interface by running
``python ancil_tutorial.py --help``.  If all has been done correctly, you
should see help text printed to the command line showing the various options
you can supply via the ``AntsArgParser``.
Notice that when we create our parser we are using the ``target_grid=True``
option.  This is because we intend to supply the application with a target
grid to regrid to later in our implementation.

Add File Loading
----------------

For our next step, we are going to need some data.  For this, we will be
using the ``sample_source.nc`` and ``sample_target.nc`` files stored in the
``rose-stem/sources`` directory of ancillary-file-science.  Take a moment to
inspect and familiarise yourself with these files using your netCDF tool of
choice, e.g. ``ncdump -h `` to inspect the contents.

As we go about implementing our application, we will use ``print()``
statements as debugging to confirm that what we have done is what we expect.
You should see outputs consistent with what you saw when inspecting the file
in your chosen netCDF tool.

To load our source data into our application we will make use of the
``load_cube`` routine from ANTS.  Update the ``load_data()`` routine as
follows:

.. code-block:: python

    def load_data(source_path, target_path):
        """
        Return cubes obtained from loading the `source` and `target` files.

        Parameters
        ----------
        source_path : str
            Filename from which to read source data.
        target_path : str
            Filename from which to read target data.

        Returns
        -------
        tuple[:class:`~iris.cube.Cube`, :class:`~iris.cube.Cube`]
            The loaded `surface_altitude` and `target` cubes.

        """
        surface_altitude_constraint = iris.NameConstraint(
            standard_name="surface_altitude"
        )
        surface_altitude_cube = load_cube(source_path, surface_altitude_constraint)
        target_cube = load_cube(target_path)
        return surface_altitude_cube, target_cube

and update the ``main()`` routine as:

.. code-block:: python

    def main(source_path, target_path):
        surface_altitude, target = load_data(source_path, target_path)
        print(surface_altitude)
        print(target)
        return

and ``if __name__ == "__main__":`` as:

.. code-block:: python

    if __name__ == "__main__":
        args = _get_parser().parse_args()
        main(args.sources, args.target_grid)

Looking at our ``load_data`` routine, we have done a few things worth noting.
Often, source files will contain more than one field, or we want to make sure
we are only loading specific fields (as opposed to whatever happens to be in
the file).  To do this we make use of Iris constraints.  Here we want to
specifically load the surface altitude field.

Run your application as::

    python ancil_tutorial.py /path/to/sample_source.nc --target-grid /path/to/sample_target.nc -o output.nc

and you should see details of the loaded Iris cubes printed to screen.  N.B.
we have had to supply the ``-o output.nc`` option as the interface requires
it, even though we have not yet implemented code to save the file itself.

Add Regridding
--------------

Assuming no further processing of our source is necessary, the next thing we
will want to do is put that data on our target model grid, as supplied to our
application via the ``--target-grid`` option.  To do this we will be using
the ANTS general regrid interface.  Update the ``regrid_area_weighted()``
section so it is as follows:

.. code-block:: python

    def regrid_area_weighted(source, target):
        """Regrid data from `source` cube to `target` grid.

        Uses the ANTS area-weighted regrid scheme.

        Parameters
        ----------
        source : :class:`~iris.cube.Cube`
            Cube containing the data to be regridded.
        target : :class:`~iris.cube.Cube`
            Target cube containing the destination grid.

        Returns
        -------
        :class:`~iris.cube.Cube`
            Data from `source` regridded to `target`.

        """
        scheme = ants.regrid.GeneralRegridScheme(horizontal_scheme="AreaWeighted")
        return source.regrid(target, scheme)

This will take in a source cube and use the ANTS general regrid AreaWeighted
routine to regrid it to the provided target grid.  We will then integrate
this into our main routine by updating it as follows:

.. code-block:: python

    def main(source_path, target_path):
        surface_altitude, target = load_data(source_path, target_path)
        print("Before regrid:")
        print(surface_altitude)
        print(target)
        regridded_surface_altitude = regrid_area_weighted(surface_altitude, target)
        print("After regrid:")
        print(regridded_surface_altitude)
        return

Run your application as before.  Look at the outputs printed to screen and
you should see that the resolution of the regridded surface altitude is
different from that of the original source data.

Add Some Processing
-------------------

With data on our target grid, we will then likely want to do some processing.
Here, we will use our ``add_one()`` routine to do this.  This routine will
take in a cube and return the result of adding 1.0 to the data in it.  Update
it as follows:

.. code-block:: python

    def add_one(cube):
        """
        Return a modified cube.

        Returned cube is the same as the original cube, except 1.0 is added
        to each data value.  Original cube is not modified.

        Note
        ----
        This function operates on a single cube, not a CubeList.

        Parameters
        ----------
        cube : :class:`~iris.cube.Cube`
            Cube to add 1.0 to.

        Returns
        -------
        :class:`~iris.cube.Cube`
            The cube that has had 1.0 added to it.

        """
        result = cube.copy(cube.core_data() + 1.0)
        return result

We will make use of this in our main function by applying it to our regridded
field.  Update ``main()`` so it looks like this:

.. code-block:: python

    def main(source_path, target_path):
        surface_altitude, target = load_data(source_path, target_path)
        print("Before regrid:")
        print(surface_altitude)
        print(target)
        regridded_surface_altitude = regrid_area_weighted(surface_altitude, target)
        print("After regrid:")
        print(regridded_surface_altitude)
        result = add_one(regridded_surface_altitude)
        result.attributes["STASH"] = "m01s00i033"
        print("Results cube:")
        print(result)
        return

Notice that we have also updated the STASH attribute on the cube via the
``result.attributes["STASH"] = "m01s00i033"`` operation.
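As an aside, the ``cube.copy(cube.core_data() + 1.0)`` idiom in ``add_one``
works because arithmetic on the cube's underlying array returns a new array
rather than modifying the original in place.  That guarantee can be
illustrated with plain NumPy, no Iris required (a minimal sketch, not ANTS
code):

```python
import numpy as np

# Cube data is ultimately backed by a (possibly lazy) array.
original = np.array([10.0, 20.0, 30.0])

# Arithmetic on the array produces a brand new array, leaving the
# original untouched - the same behaviour add_one() relies on to
# avoid mutating its input cube.
result = original + 1.0

print(original)
print(result)
```

This is why ``add_one`` can safely document that the original cube is not
modified.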
In order to generate a valid UM ancillary file, an associated STASH code is
needed: it is the only identifier in the file of what the data is, both to
the model and to any user inspecting the file.  As before, run your program
and make sure everything is as expected.

Save a File
-----------

The final thing to add is saving the results of our processing.  By default,
we recommend always saving a copy of the processed data in netCDF in addition
to any other formats.  For saving, we will make use of the routines in
:mod:`ants.io.save` we imported earlier - ``ancil`` and ``netcdf``.

Firstly, let's update our ``if __name__ == "__main__":`` section to pass
through the arguments to main for which file to save to, along with whether
we want to save only in netCDF (a convention we follow across ancillary apps
where applicable):

.. code-block:: python

    if __name__ == "__main__":
        args = _get_parser().parse_args()
        main(
            args.sources,
            args.target_grid,
            args.output,
            args.netcdf_only,
        )

We will now update our ``main()`` section to take these arguments and to run
the savers.  While we are in there, we will also remove the debug print
statements, add some comments, and put in a docstring.  Update your
``main()`` section to look like this:

.. code-block:: python

    def main(source_path, target_path, output_path, netcdf_only):
        """
        Regrid a field, add 1 to the data, and save the result to a file.

        Parameters
        ----------
        source_path : str
            Filename from which to read source data.
        target_path : str
            Filename from which to read target grid.
        output_path : str
            Filename to write results to.
        netcdf_only : bool
            If True, only write output to netCDF file.  If False, write
            output to both netCDF and ancillary files.

        Returns
        -------
        :class:`~iris.cube.Cube`
            The processed cube.

        """
        # Load in the source data and target grid
        surface_altitude, target = load_data(source_path, target_path)
        # Regrid surface_altitude to the target grid
        regridded_surface_altitude = regrid_area_weighted(surface_altitude, target)
        # Process the data
        result = add_one(regridded_surface_altitude)
        result.attributes["STASH"] = "m01s00i033"
        # Always save to netCDF file
        netcdf(result, output_path)
        # Also save to ancillary file unless the '--netcdf-only'
        # command line option has been specified
        if not netcdf_only:
            ancil(result, output_path)
        return result

Save your changes and try out your code.  Experiment with the
``--netcdf-only`` option and different filenames for your output.  You can
use mule to inspect the ancillary file to make sure it is as you expect, and
whatever your go-to netCDF tool is to inspect the netCDF file.

Next Steps
----------

With your application now written, your next step would be to add pytest
style unit tests for the routines in the application.  These would be stored
in a tests subdirectory of the ``Apps/Tutorial`` directory you created
earlier.  We will not go into the detail of doing that here, but because we
broke our code down into constituent parts, rather than leaving it all under
``main()``, we are more easily able to test it.

With unit tests written, your final step would be to integrate your new
application into the rose-stem framework in ``ancillary-file-science``,
providing a representative workflow with some cut-down/representative/
synthetic low resolution data.  For examples of adding unit tests and the
up-to-date way of implementing rose-stem testing, see the example "Sample"
application in :ancillary-file-science:`ancillary-file-science<>`.  For the
latest guidance on what is expected of a finalised application, visit the
ANTS Working Practices page.
Summary
-------

From the above, you should now have created an application that:

* Implements the ANTS argument parser
* Loads in some specified data from a file
* Loads in a target grid
* Regrids data to a target grid using an ANTS regridder
* Carries out some data processing
* Saves data to netCDF and ancillary file formats, with the option to save
  netCDF only

And, if you carried out the Next Steps:

* Is unit tested
* Is tested under rose-stem

Advanced Usage
--------------

The tutorial provided above is a deliberately small scale example with low
resolution synthetic data and a low resolution target grid.  In real world
usage, source datasets are often significant in size and target grids contain
many more points than in the example here.  As a result, you may find that
some of the operations carried out consume too many resources for the
platforms they are run on, or that the operations take too long for your
needs.  In that situation, you will need to look into optimising your code.
ANTS provides some out-of-the-box support for this via the
:mod:`ants.decomposition` framework for splitting up and parallelising
operations, noting that care is needed to ensure its usage is appropriate for
the code being parallelised.

Finally, it is also always worth taking a look at the existing ancillary
generation applications in :ancillary-file-science:`ancillary-file-science<>`
to see what has been done before.  It may be that related implementations
already exist for what you are intending to do, or that they provide ideas
for how to use parts of the :doc:`API docs ` in your processing chain.
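The ANTS-specific details of :mod:`ants.decomposition` are beyond the scope
of this tutorial, but the general split-process-recombine idea behind such
frameworks can be sketched with standard library tools.  This is an
illustration of the concept only, not the ants.decomposition API; all names
here are hypothetical:

```python
from concurrent.futures import ProcessPoolExecutor


def process_chunk(chunk):
    # Stand-in for an expensive per-chunk operation (e.g. a regrid).
    return [value + 1.0 for value in chunk]


def split(data, n_chunks):
    # Divide the data into roughly equal contiguous chunks.
    size = max(1, len(data) // n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]


def main():
    data = [float(i) for i in range(8)]
    # Process the chunks in parallel, then stitch the results together.
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = pool.map(process_chunk, split(data, 2))
    combined = [value for chunk in results for value in chunk]
    print(combined)


if __name__ == "__main__":
    main()
```

Whether this kind of decomposition pays off depends on the operation: the
per-chunk work must be independent and large enough to outweigh the overhead
of splitting and recombining.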