CUMF (Compare UM Files)

This utility is used to compare two UM files and report on any differences found in either the headers or field data. Its intended use is to test results from different UM runs against each other to investigate possible changes. An install of this module will include an executable wrapper script mule-cumf which provides a command-line interface to most of CUMF’s functionality, but it may also be imported and used directly inside another Python script.

Command line utility

Here is the help text for the command line utility (obtainable by running mule-cumf --help):

===========================================================================
* CUMF-II - Comparison tool for UM Files, version II (using the Mule API) *
===========================================================================
usage:
  mule-cumf [-h] [options] file_1 file_2

This script will compare all headers and data from two UM files,
and write a report describing any differences to stdout.  The
assumptions made by the comparison may be customised with a
variety of options (see below).

optional arguments:
  -h, --help            show this help message and exit
  --ignore component_name=index1[,index2][...]
                        ignore specific indices of a component; provide the name of
                        the component and a comma separated list of indices or ranges
                        (i.e. M:N) to ignore.  This may be specified multiple times to
                        ignore indices from more than one component

  --ignore-missing      if present, positional headers will be ignored (required if
                        missing fields from either file should not be considered a failure
                        to compare)

  --diff-file filename  a filename to write a new UM file to which contains the
                        absolute differences for any fields that differ

  --full                if not using summary output, will increase the verbosity by
                        reporting on all comparisons (default behaviour is to only report
                        on failures)

  --summary             print a much shorter report which summarises the differences
                        between the files without going into much detail

  --stashmaster STASHMASTER
                        either the full path to a valid stashmaster file, or a UM
                        version number e.g. '10.2'; if given a number cumf will look in
                        the path defined by:
                          mule.stashmaster.STASHMASTER_PATH_PATTERN
                        which by default is :
                          $UMDIR/vnX.X/ctldata/STASHmaster/STASHmaster_A
  --show-missing [=N]   display missing fields from either file. If given, N is the
                         maximum number of fields to display.

possible component names for the ignore option:
    fixed_length_header, integer_constants, real_constants,
    level_dependent_constants, row_dependent_constants,
    column_dependent_constants, additional_parameters, extra_constants,
    temp_historyfile, compressed_field_index1, compressed_field_index2,
    compressed_field_index3, lookup

for details of the indices see UMDP F03:
  https://code.metoffice.gov.uk/doc/um/latest/papers/umdp_F03.pdf
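
When driving the command-line utility from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch that uses only the options shown in the help text above; the helper name build_cumf_command and the file names are hypothetical, and the sketch only builds the command rather than running it.

```python
import shlex


def build_cumf_command(file_1, file_2, ignore=None, summary=False,
                       ignore_missing=False, diff_file=None):
    """Assemble an argument list for mule-cumf (hypothetical helper).

    `ignore` maps a component name to a list of indices or "M:N" ranges,
    mirroring the --ignore component_name=index1[,index2][...] syntax.
    """
    cmd = ["mule-cumf"]
    for component, indices in (ignore or {}).items():
        index_list = ",".join(str(index) for index in indices)
        # --ignore may be given several times, once per component
        cmd += ["--ignore", "{0}={1}".format(component, index_list)]
    if ignore_missing:
        cmd.append("--ignore-missing")
    if summary:
        cmd.append("--summary")
    if diff_file is not None:
        cmd += ["--diff-file", diff_file]
    # The two files to compare are the positional arguments
    cmd += [file_1, file_2]
    return cmd


# Hypothetical file names, used purely for illustration
command = build_cumf_command("run_a.ff", "run_b.ff",
                             ignore={"lookup": [5, "10:12"]}, summary=True)
print(shlex.join(command))
# prints: mule-cumf --ignore lookup=5,10:12 --summary run_a.ff run_b.ff
```

The resulting list could be handed to subprocess.run if mule-cumf is on the PATH.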

um_utils.cumf API

Here is the API documentation (auto-generated):

CUMF (Compare UM FieldsFiles) is a utility to assist in examining UM files.

Usage:

  • Compare mule.UMFile objects with the UMFileComparison class:

    >>> comp = UMFileComparison(umfile_object1, umfile_object2)
    
  • This object can be manually examined for details, or you can print either a short summary or a full report (note a full report is a super-set of a summary report):

    >>> summary_report(comp)
    >>> full_report(comp)
    

    Note

    The field difference objects behave like the original fields, but their data stores the absolute differences. You could retrieve the data using “get_data” to examine it, or write it out to a file.
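
Putting the usage steps together, a script might look something like the sketch below. It assumes Mule and um_utils are installed and that the files can be read with mule.load_umfile; the helper name compare_um_files is hypothetical. The imports are placed inside the function so that the definition itself can be read and loaded without Mule present.

```python
def compare_um_files(path_1, path_2, **settings):
    """Load two UM files, compare them and print a summary (sketch).

    Any keyword arguments are passed through to UMFileComparison as
    comparison settings.
    """
    # Local imports: these require an installation of Mule and um_utils
    import mule
    from um_utils import cumf

    file_1 = mule.load_umfile(path_1)
    file_2 = mule.load_umfile(path_2)

    comp = cumf.UMFileComparison(file_1, file_2, **settings)
    cumf.summary_report(comp)
    return comp
```

A call such as compare_um_files("run_a.ff", "run_b.ff", ignore_missing=True) would then return the comparison object for further inspection, where the file names are placeholders.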

Global comparison settings:

The module contains a global “COMPARISON_SETTINGS” dictionary, which defines default values for the various options; these may be overridden for an entire script or session if desired, or in a startup file, e.g.

>>> from um_utils import cumf
>>> cumf.COMPARISON_SETTINGS["ignore_missing"] = True

Alternatively each of these settings may be supplied to the main comparison class as keyword arguments. The available settings are:

  • ignore_templates:

    A dictionary indicating which indices should be ignored when making comparisons. The keys give the names of the components and the values are lists of the indices to ignore, e.g. {"fixed_length_header": [1, 2, 3], "lookup": [5, 42]}. (default: ignore creation time in fixed length header only)

  • ignore_missing:

    Flag which sets all positional header indices to be ignored - this is useful if the file objects being compared have fields which are missing from either file. (default: False)

  • only_report_failures:

    Flag which indicates that the printed output should not contain any sections which are simply stating that they agree. (This cuts down on the amount of output for larger files). (default: True)

  • lookup_print_func:

    A callback function which is called for each printed field comparison to provide extra information about the fields. It will be passed 2 arguments - the comparison field and the stdout object to write to.

  • show_missing:

    Flag which causes a list of fields missing from each file to be generated in the report. (default: False)

  • show_missing_max:

    Maximum number of missing fields to display. Set to -1 to indicate no maximum. (default: -1)
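
As an illustration of the settings above, the sketch below collects a few of them in a dictionary and defines a lookup_print_func callback. The callback name print_field_stats and the chosen values are hypothetical, and the exact output layout the reports expect from a callback is an assumption; a stand-in object is used in place of a real comparison field so the callback can be exercised without Mule.

```python
import io
from types import SimpleNamespace


def print_field_stats(field, stdout):
    """Hypothetical callback: receives the comparison field and the
    stdout object, as described for lookup_print_func."""
    stdout.write("  max_diff = {0}\n".format(field.max_diff))
    stdout.write("  rms_diff = {0}\n".format(field.rms_diff))


# Settings that could be passed as keyword arguments to the comparison
# class, or copied into cumf.COMPARISON_SETTINGS (values are examples)
settings = {
    "ignore_templates": {"fixed_length_header": [1, 2, 3],
                         "lookup": [5, 42]},
    "only_report_failures": False,
    "show_missing": True,
    "show_missing_max": 10,
    "lookup_print_func": print_field_stats,
}

# Exercise the callback with a stand-in for a comparison field
fake_field = SimpleNamespace(max_diff=0.5, rms_diff=0.25)
buffer = io.StringIO()
settings["lookup_print_func"](fake_field, buffer)
print(buffer.getvalue(), end="")
```

With Mule installed, the dictionary could be unpacked into the comparison class, e.g. UMFileComparison(umfile_object1, umfile_object2, **settings).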

class um_utils.cumf.DifferenceField(int_headers, real_headers, data_provider)[source]

Bases: Field

Difference object - for two mule.Field objects.

A special subclass of mule.Field which looks and behaves just like the original class, but defines some extra properties that are useful when performing a comparison.

match = None

Global matching flag; True if both the lookup and data match.

data_match = None

Data matching flag; True if the field data matches.

data_shape_match = None

Data shape matching flag: True if fields are the same shape.

compared = None

Tuple containing the number of points which are different and the total number of points in the field.

rms_diff = None

Root-Mean-Squared difference between the two fields.

rms_norm_diff_1 = None

Root-Mean-Squared difference between the two fields, normalised by the values in the first field.

rms_norm_diff_2 = None

Root-Mean-Squared difference between the two fields, normalised by the values in the second field.

max_diff = None

Maximum difference between the two fields.

file_1_index = None

The field-index of the first field in its original file.

file_2_index = None

The field-index of the second field in its original file.

lookup_comparison = None

Holds a ComponentComparison object that describes any differences in the lookup component of the fields.
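
To make the statistical properties above more concrete, here is a rough pure-Python sketch of how quantities like compared, rms_diff and max_diff could be derived from two equally sized sets of values. This is not the library's implementation; in particular, taking max_diff as the largest magnitude is an assumption, and the normalised RMS values are left out because their exact definition is not given here.

```python
import math


def difference_stats(values_1, values_2):
    """Illustrative statistics for two equal-length sequences."""
    diffs = [a - b for a, b in zip(values_1, values_2)]
    n_points = len(diffs)
    n_different = sum(1 for d in diffs if d != 0)
    return {
        # (number of differing points, total number of points)
        "compared": (n_different, n_points),
        # Root-mean-squared difference over all points
        "rms_diff": math.sqrt(sum(d * d for d in diffs) / n_points),
        # Assumption: largest difference by magnitude
        "max_diff": max(abs(d) for d in diffs),
    }


stats = difference_stats([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(stats)
```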

class um_utils.cumf.DifferenceField2(int_headers, real_headers, data_provider)[source]

Bases: Field2, DifferenceField

A DifferenceField object for mule.Field2 objects.

class um_utils.cumf.DifferenceField3(int_headers, real_headers, data_provider)[source]

Bases: Field3, DifferenceField

A DifferenceField object for mule.Field3 objects.

class um_utils.cumf.DifferenceOperator[source]

Bases: DataOperator

This is a simple operator that calculates the difference between the data in two fields.

__init__()[source]

Initialise the object.

new_field(fields)[source]

Create a new field instance from the 2 fields being compared.

This returns a new DifferenceField object with the same lookup headers as the first field in the list. Its data will contain the absolute difference of the fields (field_1 - field_2).

Several statistical quantities will also be calculated and saved to the new object, for later inspection.

Args:
  • fields:

    List containing the 2 mule.Field objects to be compared.

Note

Unlike most other operators, the data is retrieved in this method as well as in the transform method, because we need to know whether the fields compare.

transform(fields, new_field)[source]

Return the absolute differences between the two fields.

class um_utils.cumf.ComponentComparison(component_1, component_2, ignore_indices=[])[source]

Bases: object

This class stores an individual comparison result; valid for any pair of UM header components.

__init__(component_1, component_2, ignore_indices=[])[source]

Return elements of the components which do not agree.

Args:
  • component_1:

    The first component to compare.

  • component_2:

    The second component to compare.

Kwargs:
  • ignore_indices:

    If provided, a list of indices to ignore when performing the check.

diffs = None

If the components differ, this list stores the differences; it will contain one tuple for each difference, consisting of:

  • The index into the components where the difference occurs.

  • The value of the item in component_1.

  • The value of the item in component_2.

component_1 = None

A reference to the first component.

component_2 = None

A reference to the second component.

ignored = None

Stores a list of any indices which were ignored.

in_file_1 = None

Presence flag; True if the first component exists.

in_file_2 = None

Presence flag; True if the second component exists.

same_shape = None

Shape flag; True if the components are the same shape.

compared = None

Tuple pair indicating how many values were compared and the total number of possible comparisons.

match = None

Global matching flag; True if the two components match.
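
The shape of the diffs list can be illustrated with a small stand-alone sketch. It is not the class's own code: it works on plain lists, and counting positions from 1 is an assumption made to resemble the header indices described in UMDP F03. The function name component_diffs is hypothetical.

```python
def component_diffs(values_1, values_2, ignore_indices=()):
    """Return (index, value_1, value_2) tuples for differing items."""
    ignored = set(ignore_indices)
    diffs = []
    # Assumption: indices are counted from 1
    for index, (value_1, value_2) in enumerate(zip(values_1, values_2),
                                               start=1):
        if index in ignored:
            continue
        if value_1 != value_2:
            diffs.append((index, value_1, value_2))
    return diffs


# Index 4 differs too, but is ignored
result = component_diffs([10, 20, 30, 40], [10, 21, 30, 44],
                         ignore_indices=[4])
print(result)
# prints: [(2, 20, 21)]
```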

class um_utils.cumf.UMFileComparison(um_file1, um_file2, **kwargs)[source]

Bases: object

A structure which stores comparison information between two mule.UMFile subclasses.

unmatched_file_1 = []

A list containing the indices of any fields which exist in file 1 but were not successfully matched to a field in file 2.

unmatched_file_2 = []

A list containing the indices of any fields which exist in file 2 but were not successfully matched to a field in file 1.

__init__(um_file1, um_file2, **kwargs)[source]

Create the comparison object.

Args:
  • um_file1:

    The first mule.UMFile subclass.

  • um_file2:

    The second mule.UMFile subclass.

Kwargs:

Any other keywords are assumed to be settings to override the values in the global COMPARISON_SETTINGS dictionary; see the docstring of the cumf module for details.

file_1 = None

A reference to the first file object.

file_2 = None

A reference to the second file object.

files_are_same_type = None

Type flag; True if both files are the same file type.

comparisons = None

A dictionary containing a ComponentComparison object for each of the possible UM file header components (except the lookup). The dictionary keys are the component names (e.g. “fixed_length_header”).

lookup_ignores = None

A list of the lookup indices which were ignored for this comparison.

show_missing = False

Flag which details if a list of missing fields for each file should be generated in reports.

show_missing_max = -1

The maximum number of missing fields to list. Set to -1 to indicate no maximum.

field_comparisons = None

A list of DifferenceField objects; one for each pair of fields compared between the two files.

max_rms_diff_1 = None

A tuple containing the maximum encountered RMS difference relative to the data in the first file, and the index of the field containing it.

max_rms_diff_2 = None

A tuple containing the maximum encountered RMS difference relative to the data in the second file, and the index of the field containing it.

match = None

Global matching flag; True if everything about the files matches.
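
As an alternative to the built-in reports, the attributes above can be walked manually. The following sketch relies only on the attribute names documented here (match, comparisons, diffs, field_comparisons, file_1_index, file_2_index, max_diff, unmatched_file_1, unmatched_file_2); the function name describe_mismatches and the output format are hypothetical, and treating a missing diffs list as empty is an assumption.

```python
import sys


def describe_mismatches(comp, stdout=None):
    """Write a short, custom listing of what differs in a comparison."""
    if stdout is None:
        stdout = sys.stdout
    if comp.match:
        stdout.write("Files match\n")
        return
    # Header components (everything except the lookup)
    for name in sorted(comp.comparisons):
        component = comp.comparisons[name]
        if component.match:
            continue
        # Assumption: diffs may be unset if nothing was compared
        for index, value_1, value_2 in (component.diffs or []):
            stdout.write("{0}[{1}]: {2} != {3}\n".format(
                name, index, value_1, value_2))
    # Field-by-field results
    for field in comp.field_comparisons:
        if not field.match:
            stdout.write(
                "field {0} (file 1) vs {1} (file 2): max_diff={2}\n".format(
                    field.file_1_index, field.file_2_index, field.max_diff))
    stdout.write("unmatched in file 1: {0}\n".format(
        len(comp.unmatched_file_1)))
    stdout.write("unmatched in file 2: {0}\n".format(
        len(comp.unmatched_file_2)))
```

Given a populated UMFileComparison object comp, calling describe_mismatches(comp) would write the listing to standard output.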

um_utils.cumf.summary_report(comparison, stdout=None)[source]

Print a report giving a brief summary of a comparison object.

Args:
  • comparison:

    A UMFileComparison object, populated with the differences between two files.

Kwargs:
  • stdout:

    An open file-like object to write the report to.

um_utils.cumf.full_report(comparison, stdout=None, **kwargs)[source]

Print a report giving a full analysis of a comparison object.

Args:
  • comparison:

    A UMFileComparison object, populated with the differences between two files.

Kwargs:
  • stdout:

    An open file-like object to write the report to.

Other Kwargs:

Any other keywords are assumed to be settings to override the values in the global COMPARISON_SETTINGS dictionary; see the docstring of the cumf module for details.
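
Because both report functions accept a stdout keyword, a report can be captured as a string instead of being printed. The sketch below shows one way this might be done; the helper name report_to_string is hypothetical, and it assumes the report functions write text (not bytes) to the object they are given.

```python
import io


def report_to_string(report_func, comparison, **settings):
    """Run a report function and return its output as a string.

    `report_func` is expected to be summary_report or full_report.
    Extra settings should only be supplied for full_report, since
    summary_report does not accept additional keywords.
    """
    buffer = io.StringIO()
    report_func(comparison, stdout=buffer, **settings)
    return buffer.getvalue()
```

For example, report_to_string(full_report, comp, only_report_failures=False) would return the complete report text for a comparison object comp, which could then be written to a log file or searched.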