CUMF (Compare UM Files)
This utility is used to compare two UM files and report on any differences
found in either the headers or field data. Its intended use is to test
results from different UM runs against each other to investigate possible
changes. An install of this module will include an executable wrapper script
mule-cumf which provides a command-line interface to most of CUMF’s
functionality, but it may also be imported and used directly inside another
Python script.
Command line utility
Here is the help text for the command line utility (obtainable by running
mule-cumf --help):
===========================================================================
* CUMF-II - Comparison tool for UM Files, version II (using the Mule API) *
===========================================================================
usage:
  mule-cumf [-h] [options] file_1 file_2

This script will compare all headers and data from two UM files,
and write a report describing any differences to stdout. The
assumptions made by the comparison may be customised with a
variety of options (see below).

optional arguments:
  -h, --help            show this help message and exit
  --ignore component_name=index1[,index2][...]
                        ignore specific indices of a component; provide the name of
                        the component and a comma separated list of indices or ranges
                        (i.e. M:N) to ignore. This may be specified multiple times to
                        ignore indices from more than one component
  --ignore-missing      if present, positional headers will be ignored (required if
                        missing fields from either file should not be considered a failure
                        to compare)
  --diff-file filename  a filename to write a new UM file to which contains the
                        absolute differences for any fields that differ
  --full                if not using summary output, will increase the verbosity by
                        reporting on all comparisons (default behaviour is to only report
                        on failures)
  --summary             print a much shorter report which summarises the differences
                        between the files without going into much detail
  --stashmaster STASHMASTER
                        either the full path to a valid stashmaster file, or a UM
                        version number e.g. '10.2'; if given a number cumf will look in
                        the path defined by:
                          mule.stashmaster.STASHMASTER_PATH_PATTERN
                        which by default is :
                          $UMDIR/vnX.X/ctldata/STASHmaster/STASHmaster_A
  --show-missing [=N]   display missing fields from either file. If given, N is the
                        maximum number of fields to display.

possible component names for the ignore option:
  fixed_length_header, integer_constants, real_constants,
  level_dependent_constants, row_dependent_constants,
  column_dependent_constants, additional_parameters, extra_constants,
  temp_historyfile, compressed_field_index1, compressed_field_index2,
  compressed_field_index3, lookup

for details of the indices see UMDP F03:
  https://code.metoffice.gov.uk/doc/um/latest/papers/umdp_F03.pdf
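When mule-cumf is driven from another script, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of CUMF: the helper name build_cumf_command and the file names are hypothetical, and only options listed in the help text above are used. The --show-missing option is left out because its optional value makes its placement before the file names ambiguous.

```python
import shlex


def build_cumf_command(file_1, file_2, ignore=None, ignore_missing=False,
                       diff_file=None, full=False, summary=False,
                       stashmaster=None):
    """Assemble an argument list for mule-cumf (hypothetical helper)."""
    cmd = ["mule-cumf"]
    # One --ignore option per component, as the help text allows
    for component, indices in (ignore or {}).items():
        index_text = ",".join(str(index) for index in indices)
        cmd += ["--ignore", "{0}={1}".format(component, index_text)]
    if ignore_missing:
        cmd.append("--ignore-missing")
    if diff_file is not None:
        cmd += ["--diff-file", diff_file]
    if full:
        cmd.append("--full")
    if summary:
        cmd.append("--summary")
    if stashmaster is not None:
        cmd += ["--stashmaster", stashmaster]
    # The two files to compare come last, matching the usage line
    return cmd + [file_1, file_2]


# Hypothetical file names; ranges are passed through as "M:N" strings
cmd = build_cumf_command("file_a", "file_b",
                         ignore={"lookup": [5, "10:12"]}, summary=True)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be handed to subprocess.run(cmd) on a system where mule-cumf is installed.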
um_utils.cumf API
Here is the API documentation (auto-generated):
CUMF (Compare UM FieldsFiles) is a utility to assist in examining UM files.
Usage:
Compare mule.UMFile objects with the UMFileComparison class:

    >>> comp = UMFileComparison(umfile_object1, umfile_object2)

This object can be manually examined for details, or you can print either a short summary or a full report (note that a full report is a superset of a summary report):

    >>> summary_report(comp)
    >>> full_report(comp)

Note
The field difference objects behave like the original fields, but their data stores the absolute differences. You could retrieve the data using “get_data” to examine it, or write it out to a file.
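As an illustration of examining the difference data, the sketch below collects the data of every field pair whose data differs. It relies only on attributes documented on this page (field_comparisons, data_match, file_1_index, file_2_index, get_data). The helper name differing_field_data and the stand-in objects are hypothetical, used so the example runs without any UM files.

```python
from types import SimpleNamespace


def differing_field_data(comparison):
    """Return (file_1_index, file_2_index, data) for each differing field pair."""
    results = []
    for diff_field in comparison.field_comparisons:
        if not diff_field.data_match:
            # get_data() on a difference field yields the difference values
            results.append((diff_field.file_1_index,
                            diff_field.file_2_index,
                            diff_field.get_data()))
    return results


# Stand-ins for a real UMFileComparison and its difference fields
fake_fields = [
    SimpleNamespace(data_match=True, file_1_index=0, file_2_index=0,
                    get_data=lambda: [[0.0, 0.0]]),
    SimpleNamespace(data_match=False, file_1_index=1, file_2_index=1,
                    get_data=lambda: [[0.0, 0.5]]),
]
fake_comparison = SimpleNamespace(field_comparisons=fake_fields)

differing = differing_field_data(fake_comparison)
print(differing)
```

With a real comparison object, the same function would be called as differing_field_data(comp).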
Global comparison settings:
The module contains a global “COMPARISON_SETTINGS” dictionary, which defines default values for the various options; these may be overridden for an entire script/session if desired, or in a startup file, e.g.

    >>> from um_utils import cumf
    >>> cumf.COMPARISON_SETTINGS["ignore_missing"] = True

Alternatively each of these settings may be supplied to the main comparison class as keyword arguments. The available settings are:
- ignore_templates:
A dictionary indicating which indices should be ignored when making comparisons. The keys give the names of the components and the values are lists of the indices to ignore (e.g. {"fixed_length_header": [1, 2, 3], "lookup": [5, 42]}). (default: ignore creation time in fixed length header only)
- ignore_missing:
Flag which sets all positional header indices to be ignored - this is useful if the file objects being compared have fields which are missing from either file. (default: False)
- only_report_failures:
Flag which indicates that the printed output should not contain any sections which are simply stating that they agree. (This cuts down on the amount of output for larger files). (default: True)
- lookup_print_func:
A callback function which is called for each printed field comparison to provide extra information about the fields. It will be passed 2 arguments - the comparison field and the stdout object to write to.
- show_missing:
Flag which causes a list of fields missing from each file to be generated in the report. (default: False)
- show_missing_max:
Maximum number of missing fields to display. Set to -1 to indicate no maximum. (default: -1)
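The ignore_templates dictionary has the same shape as the information given to the command line --ignore option. The sketch below converts strings in that command-line form into such a dictionary. It is an illustration only and not CUMF's own parser: the function name parse_ignore_spec is hypothetical, and treating a range M:N as inclusive of both ends is an assumption.

```python
def parse_ignore_spec(spec):
    """Turn 'component=idx[,idx][,M:N]' into (component_name, [indices])."""
    name, _, index_text = spec.partition("=")
    indices = []
    for item in index_text.split(","):
        if ":" in item:
            start, stop = (int(part) for part in item.split(":"))
            # Assumption: the range M:N includes both M and N
            indices.extend(range(start, stop + 1))
        else:
            indices.append(int(item))
    return name.strip(), indices


specs = ["fixed_length_header=1,2,3", "lookup=5,40:42"]
ignore_templates = dict(parse_ignore_spec(spec) for spec in specs)
print(ignore_templates)
```

A dictionary built this way could then be supplied as the ignore_templates keyword argument of the main comparison class, or stored in COMPARISON_SETTINGS.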
- class um_utils.cumf.DifferenceField(int_headers, real_headers, data_provider)
Bases: Field
Difference object - for two mule.Field objects.
A special subclass of mule.Field which looks and behaves just like the original class, but defines some extra properties that are useful when performing a comparison.
- match = None
Global matching flag; True if both the lookup and data match.
- data_match = None
Data matching flag; True if the field data matches.
- data_shape_match = None
Data shape matching flag; True if fields are the same shape.
- compared = None
Tuple containing the number of points which are different and the total number of points in the field.
- rms_diff = None
Root-Mean-Squared difference between the two fields.
- rms_norm_diff_1 = None
Root-Mean-Squared difference between the two fields, normalised by the values in the first field.
- rms_norm_diff_2 = None
Root-Mean-Squared difference between the two fields, normalised by the values in the second field.
- max_diff = None
Maximum difference between the two fields.
- file_1_index = None
The field-index of the first field in its original file.
- file_2_index = None
The field-index of the second field in its original file.
- lookup_comparison = None
Holds a ComponentComparison object that describes any differences in the lookup component of the fields.
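To make the statistical properties above concrete, here is a small sketch that computes comparable quantities for two flat lists of values. It is not CUMF's implementation. The function name difference_statistics is hypothetical, the normalisation is assumed to divide the RMS difference by the RMS of the chosen field, and max_diff is assumed to be the largest absolute difference.

```python
import math


def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def difference_statistics(field_1, field_2):
    """Compute difference statistics for two equally sized flat fields."""
    diffs = [a - b for a, b in zip(field_1, field_2)]
    rms_diff = rms(diffs)
    rms_1, rms_2 = rms(field_1), rms(field_2)
    return {
        # (number of differing points, total number of points)
        "compared": (sum(1 for d in diffs if d != 0), len(diffs)),
        "rms_diff": rms_diff,
        # Assumption: normalise by the RMS of the respective field
        "rms_norm_diff_1": rms_diff / rms_1 if rms_1 else float("nan"),
        "rms_norm_diff_2": rms_diff / rms_2 if rms_2 else float("nan"),
        # Assumption: largest absolute point-wise difference
        "max_diff": max(abs(d) for d in diffs),
    }


stats = difference_statistics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(stats)
```

Here only one of the four points differs, so the compared entry is (1, 4) and the RMS difference is 1.0.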
- class um_utils.cumf.DifferenceField2(int_headers, real_headers, data_provider)
Bases: Field2, DifferenceField
A DifferenceField object for mule.Field2 objects.
- class um_utils.cumf.DifferenceField3(int_headers, real_headers, data_provider)
Bases: Field3, DifferenceField
A DifferenceField object for mule.Field3 objects.
- class um_utils.cumf.DifferenceOperator
Bases: DataOperator
This is a simple operator that calculates the difference between the data in two fields.
- new_field(fields)
Create a new field instance from the 2 fields being compared.
This returns a new DifferenceField object with the same lookup headers as the first field in the list. Its data will contain the absolute difference of the fields (field_1 - field_2).
Several statistical quantities will also be calculated and saved to the new object, for later inspection.
- Args:
- fields:
List containing the 2 mule.Field objects to be compared.
Note
Unlike most other operators, the data is retrieved in this method as well as in the transform method, because we need to know whether the fields compare.
- class um_utils.cumf.ComponentComparison(component_1, component_2, ignore_indices=[])
Bases: object
This class stores an individual comparison result; valid for any pair of UM header components.
- __init__(component_1, component_2, ignore_indices=[])
Return elements of the components which do not agree.
- Args:
- component_1:
The first component to compare.
- component_2:
The second component to compare.
- Kwargs:
- ignore_indices:
If provided, a list of indices to ignore when performing the check.
- diffs = None
If the components differ, this list stores the differences; it will contain one tuple for each difference, consisting of:
  - The index into the components where the difference occurs.
  - The value of the item in component_1.
  - The value of the item in component_2.
- component_1 = None
A reference to the first component.
- component_2 = None
A reference to the second component.
- ignored = None
Stores a list of any indices which were ignored.
- in_file_1 = None
Presence flag; True if the first component exists.
- in_file_2 = None
Presence flag; True if the second component exists.
- same_shape = None
Shape flag; True if the components are the same shape.
- compared = None
Tuple pair indicating how many values were compared and the total number of possible comparisons.
- match = None
Global matching flag; True if the two components match.
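The shape of the diffs and compared attributes can be illustrated with a small standalone sketch. This is not the class's actual code: the function name compare_components is hypothetical, the components are plain lists of the same length, and 1-based indices (following the UMDP F03 numbering) are an assumption.

```python
def compare_components(component_1, component_2, ignore_indices=()):
    """Return (diffs, compared) for two equally sized header components."""
    ignored = set(ignore_indices)
    diffs = []
    n_compared = 0
    # Assumption: indices are 1-based, following the UMDP F03 numbering
    pairs = zip(component_1, component_2)
    for index, (value_1, value_2) in enumerate(pairs, start=1):
        if index in ignored:
            continue
        n_compared += 1
        if value_1 != value_2:
            # One tuple per difference: (index, value in 1, value in 2)
            diffs.append((index, value_1, value_2))
    return diffs, (n_compared, len(component_1))


diffs, compared = compare_components([10, 20, 30, 40], [10, 21, 30, 44],
                                     ignore_indices=[4])
print(diffs, compared)
```

The difference at index 4 is skipped because that index is ignored, so a single tuple is reported and three of the four values are compared.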
- class um_utils.cumf.UMFileComparison(um_file1, um_file2, **kwargs)
Bases: object
A structure which stores comparison information between two mule.UMFile subclasses.
- unmatched_file_1 = []
A list containing the indices of any fields which exist in file 1 but were not successfully matched to a field in file 2.
- unmatched_file_2 = []
A list containing the indices of any fields which exist in file 2 but were not successfully matched to a field in file 1.
- __init__(um_file1, um_file2, **kwargs)
Create the comparison object.
- Args:
- um_file1:
The first mule.UMFile subclass.
- um_file2:
The second mule.UMFile subclass.
- Kwargs:
Any other keywords are assumed to be settings to override the values in the global COMPARISON_SETTINGS dictionary; see the docstring of the cumf module for details.
- file_1 = None
A reference to the first file object.
- file_2 = None
A reference to the second file object.
- files_are_same_type = None
Type flag; True if both files are the same file type.
- comparisons = None
A dictionary containing a ComponentComparison object for each of the possible UM file header components (except the lookup). The dictionary keys are the component names (e.g. “fixed_length_header”).
- lookup_ignores = None
A list of the lookup indices which were ignored for this comparison.
- show_missing = False
Flag which details if a list of missing fields for each file should be generated in reports.
- show_missing_max = -1
The maximum number of missing fields to list. Set to -1 to indicate no maximum.
- field_comparisons = None
A list of DifferenceField objects; one for each pair of fields compared between the two files.
- max_rms_diff_1 = None
A tuple containing the maximum encountered RMS difference relative to the data in the first file, and the index of the field containing it.
- max_rms_diff_2 = None
A tuple containing the maximum encountered RMS difference relative to the data in the second file, and the index of the field containing it.
- match = None
Global matching flag; True if everything about the files matches.
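For manual examination of a comparison object, the attributes above can be walked directly. The sketch below builds a few descriptive lines from them. The function name describe_comparison is hypothetical, and the stand-in objects (with made-up values) only mimic the documented attributes so that the example runs without UM files.

```python
from types import SimpleNamespace


def describe_comparison(comp):
    """Build a few descriptive lines from a comparison-like object."""
    lines = ["files match" if comp.match else "files differ"]
    if not comp.files_are_same_type:
        lines.append("file types differ")
    for name, component in sorted(comp.comparisons.items()):
        if not component.match:
            # diffs may be None, e.g. when nothing could be compared
            n_diffs = len(component.diffs or [])
            lines.append("{0}: {1} differing value(s)".format(name, n_diffs))
    n_bad = sum(1 for field in comp.field_comparisons if not field.match)
    lines.append("{0} of {1} compared field pairs differ".format(
        n_bad, len(comp.field_comparisons)))
    lines.append("unmatched fields: {0} in file 1, {1} in file 2".format(
        len(comp.unmatched_file_1), len(comp.unmatched_file_2)))
    return lines


# Stand-in for a real UMFileComparison, with made-up values
fake_comp = SimpleNamespace(
    match=False,
    files_are_same_type=True,
    comparisons={
        "fixed_length_header": SimpleNamespace(match=True, diffs=[]),
        "integer_constants": SimpleNamespace(match=False,
                                             diffs=[(6, 192, 96)]),
    },
    field_comparisons=[SimpleNamespace(match=True),
                       SimpleNamespace(match=False)],
    unmatched_file_1=[],
    unmatched_file_2=[3],
)

description = describe_comparison(fake_comp)
print("\n".join(description))
```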
- um_utils.cumf.summary_report(comparison, stdout=None)
Print a report giving a brief summary of a comparison object.
- Args:
- comparison:
A UMFileComparison object, populated with the differences between two files.
- Kwargs:
- stdout:
An open file-like object to write the report to.
- um_utils.cumf.full_report(comparison, stdout=None, **kwargs)
Print a report giving a full analysis of a comparison object.
- Args:
- comparison:
A UMFileComparison object, populated with the differences between two files.
- Kwargs:
- stdout:
An open file-like object to write the report to.
- Other Kwargs:
Any other keywords are assumed to be settings to override the values in the global COMPARISON_SETTINGS dictionary; see the docstring of the cumf module for details.
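Because both report functions accept a stdout keyword, a report can be captured as a string instead of being printed. The sketch below shows one way to do that. The helper name report_to_string and the fake_report stand-in are hypothetical, and it is assumed that the report functions write text (not bytes) to the object they are given.

```python
import io


def report_to_string(report_func, comparison, **kwargs):
    """Run a report function and return what it wrote as a string."""
    buffer = io.StringIO()
    # The documented "stdout" keyword redirects the report output
    report_func(comparison, stdout=buffer, **kwargs)
    return buffer.getvalue()


def fake_report(comparison, stdout=None):
    """Stand-in with the same calling convention as summary_report."""
    stdout.write("match: {0}\n".format(comparison))


text = report_to_string(fake_report, True)
print(text, end="")
```

With CUMF installed, the equivalent call would be report_to_string(cumf.summary_report, comp), where comp is a UMFileComparison object.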