mule
This module provides a series of classes to allow interaction with various file formats produced and used by the UM (Unified Model) system.
The top-level UMFile class provides an object representing a generic
UM file of the fieldsfile-like type, as covered in document UMDP F03.
This enables any file of this general form to be handled.
In practice, most files will be of a specific known subtype and it is then
simpler and safer to use the appropriate subclass, FieldsFile
or LBCFile. These perform type-specific sanity checking,
and provide named attributes to access all of the header elements.
For example:
>>> ff = mule.FieldsFile.from_file(in_path)
>>> print('model = ', ff.fixed_length_header.model_version)
>>> ff.integer_constants.num_soil_levels = 0
>>> ff.fields = [fld for fld in ff.fields
...              if (fld.lbuser7 == 1 and fld.lbuser4 in (204, 207)
...                  and 1990 <= fld.lbyr < 2000)]
>>> ff.to_file(out_path)
The more general UMFile class is provided to handle files of other
types, and can also be used to correct or adjust files of recognised types that
are invalid because of unexpected or inconsistent header information.
- class mule._HeaderMetaclass(classname, bases, class_dict)
Bases: type
Metaclass used to give named attributes to other classes.
This metaclass is used in the construction of several header-like classes in this API; note that it is applied on defining the classes (i.e. when the module is imported), not later when specific instances of the classes are initialised.
The purpose of this class is to attach a set of named attributes to the header object and associate these with specific indices of the underlying array of header values. The target class defines this “mapping” itself, allowing this metaclass to be used for multiple header-like objects.
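The mechanism described above can be sketched in a few lines. This is a minimal illustration, not mule's own implementation: `NamedIndexMeta`, `DemoHeader` and the `_values` attribute are hypothetical names, and the mapping uses the (name, index) ordering seen in the documented HEADER_MAPPING lists.

```python
class NamedIndexMeta(type):
    """Hypothetical stand-in for a header metaclass (not mule's own code)."""

    def __new__(mcs, classname, bases, class_dict):
        # Runs once, when the class statement is executed (at import time).
        mapping = class_dict.get("HEADER_MAPPING") or []
        for name, index in mapping:
            # Default arguments bind the current index to each accessor.
            def getter(self, _index=index):
                return self._values[_index]

            def setter(self, value, _index=index):
                self._values[_index] = value

            class_dict[name] = property(getter, setter)
        return super().__new__(mcs, classname, bases, class_dict)


class DemoHeader(metaclass=NamedIndexMeta):
    # The target class defines its own mapping of names onto indices.
    HEADER_MAPPING = [("dataset_type", 5), ("model_version", 12)]

    def __init__(self, values):
        # One leading pad element makes the stored indices 1-based.
        self._values = [None] + list(values)


header = DemoHeader(range(101, 117))  # words 1..16 hold 101..116
header.model_version = 1100           # writes through to word 12
```

Because the properties are created when the class is defined, every instance shares them and no per-instance setup is needed.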
- class mule.BaseHeaderComponent
Bases: object
Base class for a UM header component.
Note
This class is not intended to be used directly; it acts only to group together the common parts of the BaseHeaderComponent1D and BaseHeaderComponent2D classes.
- MDI = None
The value to use to indicate missing header values.
- DTYPE = None¶
The data-type of the words in the header.
- CREATE_DIMS = None¶
A tuple defining the default dimensions of the header to be produced by the empty() method, when the caller provides incomplete shape information. Where an element of the tuple is “None”, the arguments to the empty method must specify a size for the corresponding dimension.
- HEADER_MAPPING = None¶
A list containing a series of tuple-pairs; a named attribute, and the raw index in the header to associate it with (see the help for the _HeaderMetaclass for further details).
- property shape¶
Return the shape of the header object.
- property raw¶
Return the raw values of the header object.
- class mule.BaseHeaderComponent1D(values)
Bases: BaseHeaderComponent
1-Dimensional UM header component.
- CREATE_DIMS = (None,)
A tuple defining the default dimensions of the header to be produced by the empty() method, when the caller provides incomplete shape information. Where an element of the tuple is “None”, the arguments to the empty method must specify a size for the corresponding dimension.
- __init__(values)[source]¶
Initialise the object from a series of values.
- Args:
- values:
array-like object containing values in this header.
Note
The values are internally stored offset by 1 element (so that when the raw values are accessed their indexing is 1-based, to match up with their definitions in UMDP F03).
- classmethod empty(num_words=None)[source]¶
Create an instance of the class from-scratch.
- Kwargs:
- num_words:
The number of words to use to create the header.
Note
Passing “num_words” may be optional or mandatory depending on the value of the class’s CREATE_DIMS attribute.
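The interaction between CREATE_DIMS and the arguments to empty() can be pictured as follows. This is a sketch only: `resolve_dims` is a hypothetical helper, and it assumes that a size given by the caller takes precedence over the class default, which the text above does not state explicitly.

```python
def resolve_dims(create_dims, supplied):
    """Combine class defaults with caller-supplied sizes (illustrative only)."""
    resolved = []
    for default, given in zip(create_dims, supplied):
        # Assumption: a caller-supplied size wins over the class default.
        size = given if given is not None else default
        if size is None:
            # Neither the class nor the caller provided this dimension.
            raise ValueError("a size must be supplied for this dimension")
        resolved.append(size)
    return tuple(resolved)


# CREATE_DIMS = (None,): the caller has to provide num_words.
one_d = resolve_dims((None,), (46,))

# A class fixing its second dimension lets the caller omit it.
two_d = resolve_dims((None, 8), (70, None))
```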
- classmethod from_file(source, num_words)[source]¶
Create an instance of the class populated by values from a file.
- Args:
- source:
The (open) file object containing the header value, with its file pointer positioned at the start of this header.
- num_words:
The number of words to read in from the file to populate the header.
- class mule.BaseHeaderComponent2D(values)
Bases: BaseHeaderComponent
2-Dimensional UM header component.
- CREATE_DIMS = (None, None)
A tuple defining the default dimensions of the header to be produced by the empty() method, when the caller provides incomplete shape information. Where an element of the tuple is “None”, the arguments to the empty method must specify a size for the corresponding dimension.
- __init__(values)[source]¶
Initialise the object from a series of values.
- Args:
- values:
2-dimensional array-like object containing values in this header.
Note
The values are internally stored offset by 1 element in their second dimension (so that when the raw values are accessed their indexing is 1-based, to match up with the definitions in UMDP F03).
- classmethod empty(dim1=None, dim2=None)[source]¶
Create an instance of the class from-scratch.
- Kwargs:
- dim1:
The number of words to use for the header’s first dimension.
- dim2:
The number of words to use for the header’s second dimension.
Note
Setting “dim1” and/or “dim2” may be optional or mandatory depending on the values of the class’s CREATE_DIMS attribute.
- classmethod from_file(source, dim1, dim2)[source]¶
Create an instance of the class populated by values from a file.
- Args:
- source:
The (open) file object containing the header value, with its file pointer positioned at the start of this header.
- dim1:
The number of words to read in from the file to populate each row of the header.
- dim2:
The number of the above rows to read in from the file to populate the header.
- class mule.FixedLengthHeader(values)
Bases: BaseHeaderComponent1D
The fixed length header component of a UM file.
This component differs from the others in that its length cannot be altered at creation time; the fixed length header is always a specific number of words in length.
- HEADER_MAPPING = [('data_set_format_version', 1), ('sub_model', 2), ('vert_coord_type', 3), ('horiz_grid_type', 4), ('dataset_type', 5), ('run_identifier', 6), ('experiment_number', 7), ('calendar', 8), ('grid_staggering', 9), ('time_type', 10), ('projection_number', 11), ('model_version', 12), ('obs_file_type', 14), ('last_fieldop_type', 15), ('t1_year', 21), ('t1_month', 22), ('t1_day', 23), ('t1_hour', 24), ('t1_minute', 25), ('t1_second', 26), ('t1_year_day_number', 27), ('t2_year', 28), ('t2_month', 29), ('t2_day', 30), ('t2_hour', 31), ('t2_minute', 32), ('t2_second', 33), ('t2_year_day_number', 34), ('t3_year', 35), ('t3_month', 36), ('t3_day', 37), ('t3_hour', 38), ('t3_minute', 39), ('t3_second', 40), ('t3_year_day_number', 41), ('integer_constants_start', 100), ('integer_constants_length', 101), ('real_constants_start', 105), ('real_constants_length', 106), ('level_dependent_constants_start', 110), ('level_dependent_constants_dim1', 111), ('level_dependent_constants_dim2', 112), ('row_dependent_constants_start', 115), ('row_dependent_constants_dim1', 116), ('row_dependent_constants_dim2', 117), ('column_dependent_constants_start', 120), ('column_dependent_constants_dim1', 121), ('column_dependent_constants_dim2', 122), ('additional_parameters_start', 125), ('additional_parameters_dim1', 126), ('additional_parameters_dim2', 127), ('extra_constants_start', 130), ('extra_constants_length', 131), ('temp_historyfile_start', 135), ('temp_historyfile_length', 136), ('compressed_field_index1_start', 140), ('compressed_field_index1_length', 141), ('compressed_field_index2_start', 142), ('compressed_field_index2_length', 143), ('compressed_field_index3_start', 144), ('compressed_field_index3_length', 145), ('lookup_start', 150), ('lookup_dim1', 151), ('lookup_dim2', 152), ('total_prognostic_fields', 153), ('data_start', 160), ('data_dim1', 161), ('data_dim2', 162)]¶
A list containing a series of tuple-pairs; a named attribute, and the raw index in the header to associate it with (see the help for the _HeaderMetaclass for further details).
- MDI = -32768¶
The value to use to indicate missing header values.
- DTYPE = '>i8'¶
The data-type of the words in the header.
- _NUM_WORDS = 256¶
The (fixed) number of words in a UM fixed length header.
- __init__(values)[source]¶
Initialise the object from a series of values.
- Args:
- values:
array-like object containing values contained in this header. Must be the exact length specified by _NUM_WORDS.
Note
The values are internally stored offset by 1 element (so that when the raw values are accessed their indexing is 1-based, to match up with their definitions in UMDP F03).
- classmethod empty()[source]¶
Create an instance of the class from-scratch.
Unlike the other header components the fixed length header always creates an instance of a fixed size (based on its _NUM_WORDS attribute).
- classmethod from_file(source)[source]¶
Create an instance of the class populated by values from a file.
Unlike the other header components the fixed length header always reads a specific number of values (based on its _NUM_WORDS attribute).
- Args:
- source:
The (open) file object containing the header value, with its file pointer positioned at the start of this header.
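As a rough picture of what reading such a header involves, the sketch below reads 256 big-endian 64-bit integers (matching DTYPE '>i8' and _NUM_WORDS) from an in-memory stand-in for a file, using only the standard library. `read_int_words` is a hypothetical helper and the demo value written to word 5 is arbitrary; mule's own reader may work differently.

```python
import io
import struct

NUM_WORDS = 256   # fixed length header size, per _NUM_WORDS
WORD_BYTES = 8    # '>i8' means big-endian 64-bit integers


def read_int_words(source, num_words):
    """Read num_words big-endian 64-bit integers from an open binary file."""
    raw = source.read(num_words * WORD_BYTES)
    if len(raw) != num_words * WORD_BYTES:
        raise EOFError("file ended before the header was complete")
    return list(struct.unpack(">%dq" % num_words, raw))


# In-memory stand-in for a file: every word holds the integer MDI value,
# except word 5 (dataset_type), which holds an arbitrary demo value.
words = [-32768] * NUM_WORDS
words[5 - 1] = 3
fake_file = io.BytesIO(struct.pack(">%dq" % NUM_WORDS, *words))

values = read_int_words(fake_file, NUM_WORDS)
padded = [None] + values  # 1-based view, as in the storage note above
```

After the read the file pointer sits immediately after the header, which is where the next component would be looked for if it follows directly.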
- class mule.IntegerConstants(values)
Bases: BaseHeaderComponent1D
The integer constants component of a UM file.
- MDI = -32768¶
The value to use to indicate missing header values.
- DTYPE = '>i8'¶
The data-type of the words in the header.
- class mule.RealConstants(values)
Bases: BaseHeaderComponent1D
The real constants component of a UM file.
- MDI = -1073741824.0¶
The value to use to indicate missing header values.
- DTYPE = '>f8'¶
The data-type of the words in the header.
- class mule.LevelDependentConstants(values)
Bases: BaseHeaderComponent2D
The level dependent constants component of a UM file.
- MDI = -1073741824.0¶
The value to use to indicate missing header values.
- DTYPE = '>f8'¶
The data-type of the words in the header.
- class mule.RowDependentConstants(values)
Bases: BaseHeaderComponent2D
The row dependent constants component of a UM file.
- MDI = -1073741824.0¶
The value to use to indicate missing header values.
- DTYPE = '>f8'¶
The data-type of the words in the header.
- class mule.ColumnDependentConstants(values)
Bases: BaseHeaderComponent2D
The column dependent constants component of a UM file.
- MDI = -1073741824.0¶
The value to use to indicate missing header values.
- DTYPE = '>f8'¶
The data-type of the words in the header.
- class mule.UnsupportedHeaderItem1D(values)
Bases: BaseHeaderComponent1D
An unsupported 1-dimensional component of a UM file.
- MDI = -32768¶
The value to use to indicate missing header values.
- DTYPE = '>i8'¶
The data-type of the words in the header.
- class mule.UnsupportedHeaderItem2D(values)
Bases: BaseHeaderComponent2D
An unsupported 2-dimensional component of a UM file.
- MDI = -32768¶
The value to use to indicate missing header values.
- DTYPE = '>i8'¶
The data-type of the words in the header.
- class mule.Field(int_headers, real_headers, data_provider)
Bases: object
Represents a single entry in the lookup table, and provides access to the data referenced by it.
Note
This class assumes the (common) UM lookup header comprising 64 words split between 45 integer and 19 real values.
- __init__(int_headers, real_headers, data_provider)[source]¶
Initialise the Field object.
- Args:
- int_headers:
A sequence of integer header values.
- real_headers:
A sequence of floating-point header values.
- data_provider:
An object representing the field data payload. Typically, this is an object with a “._data_array” method, in which case the data can be fetched with get_data().
- classmethod empty()[source]¶
Create an instance of the class from-scratch.
The instance will be filled with empty values (-99 for integers, and 0.0 for reals), and will have no data_provider set.
- property raw¶
Return the raw values in the lookup array.
- to_file(output_file)[source]¶
Write the lookup header to a file object.
- Args:
- output_file:
The (open) file object for the lookup to be written to.
- copy()[source]¶
Create a Field which copies its header information from this one, and takes its data from the same data provider.
- set_data_provider(data_provider)[source]¶
Set the field data payload.
- Args:
- data_provider:
An object representing the field data payload. Typically, this is an object with a “._data_array” method, which means the data can be accessed with get_data().
- _get_raw_payload_bytes()[source]¶
Return a buffer containing the raw bytes of the data payload.
The field data must be unmodified and using the same packing code as the original data (this can be tested by calling _can_copy_deferred_data).
- _can_copy_deferred_data(required_lbpack, required_bacc, required_word)[source]¶
Return whether or not it is possible to simply re-use the bytes making up the field; for this to be possible the data must be unmodified, and the requested output packing and disk word size must be the same as the input.
- class mule.Field2(int_headers, real_headers, data_provider)
Bases: Field
Represents an entry from the LOOKUP component with a header release number of 2.
- class mule.Field3(int_headers, real_headers, data_provider)
Bases: Field
Represents an entry from the LOOKUP component with a header release number of 3.
- class mule.ArrayDataProvider(array)
Bases: object
A Field data provider that contains an actual array of values.
This is used to make a field with an ordinary array as its data payload.
Note
This must be used with caution, as multiple fields with a concrete data payload can easily consume large amounts of space. By contrast, processing field payloads from an existing file will normally only load one at a time.
- class mule._OperatorDataProvider(operator, source, new_field)
Bases: object
A Field data provider that fetches its data from a DataOperator, by calling transform().
Note
This should only really ever be instantiated from within the DataOperator.
- __init__(operator, source, new_field)[source]¶
Create a wrapper, including references to the operator, the original source data and the result field.
- Args:
- operator:
A reference to the DataOperator instance which created this provider (to allow its transform() method to be accessed in _data_array() below).
- source:
The source object for the above DataOperator; this can be anything, and is required here so that it can be passed on to the operator’s transform() method below.
- new_field:
The new field returned by the above DataOperator; this is again needed by the operator’s transform() method.
- class mule.DataOperator(*args, **kwargs)
Bases: object
Base class which should be sub-classed to perform manipulations on the data of a field. The Field classes never store any data directly in memory; only the means to retrieve it from disk and perform any required operations (which will only be executed when explicitly requested - this would normally be at the point the file is being written/closed).
Note
The user must override the “__init__”, “new_field” and “transform” methods of this base class to create a valid operator.
A DataOperator is used to produce new Fields, which are calculated from existing source fields and which can also calculate their data results from the source data at a subsequent time.
The normal usage occurs in 3 separate stages:
- __init__() creates a new operator with any instance-specific parameters.
- __call__() is used to produce new, transformed Field objects from existing ones, via the user new_field() method.
- transform() is called by an output field to calculate its data payload.
For example:
>>> class XSampler(DataOperator):
...     def __init__(self, factor):
...         self.factor = factor
...     def new_field(self, source_field):
...         fld = source_field.copy()
...         fld.lbnpt //= self.factor
...         fld.bdx *= self.factor
...         return fld
...     def transform(self, source_field, result_field):
...         data = source_field.get_data()
...         return data[:, ::self.factor]
...
>>> XStep4 = XSampler(factor=4)
>>> ff.fields = [XStep4(fld) for fld in ff.fields]
>>> ff.to_file(out_path)
- __init__(*args, **kwargs)[source]¶
Initialise the operator object - this should be overridden by the user.
This method should accept any user arguments to be “baked” into the operator or to otherwise initialise it as-per the user’s requirements; for example an operator which scales the values in fields by a constant amount might want to accept an argument giving that amount.
- __call__(source, *args, **kwargs)[source]¶
Wrap the operator around a source object.
This calls the user-supplied new_field() method, and configures the resulting field to return its data from the transform() method of the data operator.
- Args:
- source:
This can be an object of any type; it is typically an existing Field which the result field is based on.
- Returns:
- new_field (Field):
A new Field instance, which returns data generated via the transform() method.
- new_field(source, *args, **kwargs)[source]¶
Produce a new output Field from a source object - this method should be overridden by the user.
This method encodes how to produce a new field, which is typically derived by calculation from an existing source field or fields. It is called by the __call__() method.
- Args:
- source:
This can be an object of any type; it is typically an existing Field which the result field is based on.
- Returns:
- new_field (Field):
A new Field instance, whose lookup attributes reflect the final state of the result; e.g. if the operator affects the number of rows in the field, then ‘new_field’ must have its row settings set accordingly.
Note
It is advisable not to modify the “source” object inside this method; modifications should be confined to the new field object.
- transform(source, result_field)[source]¶
Calculate the data payload for a result field - this method should be overridden by the user.
This method must return a 2D numpy array containing the field data. Typically it will extract the data payload from a source field and manipulate it in some way.
- Args:
- source:
The original ‘source’ argument from the __call__() invocation that created ‘result_field’. Usually, this is a pre-existing Field object from which the result field is calculated.
- result_field:
The ‘new’ field that was created by a call to __call__(), for which the data is now wanted. This should not be modified, but provides access to any necessary context information determined when it was created.
- Returns:
- data (array):
The data array for ‘result_field’.
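The division of labour between __call__(), new_field() and transform() can be mimicked without mule at all. The sketch below uses plain lists and hypothetical stand-in classes (`DemoField`, `ListProvider`, `DeferredProvider`, `DemoOperator`, `EveryOther`); it only illustrates the deferred-evaluation pattern described above and is not mule's implementation.

```python
class DemoField:
    """Hypothetical stand-in for a field: one header value plus a provider."""

    def __init__(self, lbnpt, provider):
        self.lbnpt = lbnpt
        self._provider = provider

    def get_data(self):
        return self._provider._data_array()


class ListProvider:
    """Provider holding concrete rows of values."""

    def __init__(self, rows):
        self._rows = rows

    def _data_array(self):
        return self._rows


class DeferredProvider:
    """Provider that only calls transform() when data is requested."""

    def __init__(self, operator, source, new_field):
        self.operator = operator
        self.source = source
        self.new_field = new_field

    def _data_array(self):
        return self.operator.transform(self.source, self.new_field)


class DemoOperator:
    """Stand-in base class: __call__ wires new_field() to transform()."""

    def __call__(self, source, *args, **kwargs):
        new_field = self.new_field(source, *args, **kwargs)
        new_field._provider = DeferredProvider(self, source, new_field)
        return new_field


class EveryOther(DemoOperator):
    def __init__(self):
        self.calls = 0  # counts how often transform() has actually run

    def new_field(self, source):
        # The header is updated immediately to describe the final result.
        return DemoField(lbnpt=(source.lbnpt + 1) // 2, provider=None)

    def transform(self, source, result_field):
        self.calls += 1
        return [row[::2] for row in source.get_data()]


op = EveryOther()
src = DemoField(4, ListProvider([[1, 2, 3, 4], [5, 6, 7, 8]]))
out = op(src)                      # header work only; no data touched yet
calls_before = op.calls
result = out.get_data()            # transform() runs here
```

The header of the new field is correct as soon as the operator is applied, while the data calculation is postponed until something asks for it.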
- class mule.RawReadProvider(source, sourcefile, offset)
Bases: object
A generic ‘data provider’ object, which deals with the most basic/common data-provision operation of reading in Field data from a file.
This class should not be used directly, since it does not define a “_data_array” method, and so cannot return any data. A series of subclasses of this class are provided which define the ‘_data_array’ method for the different packing types found in various types of UMFile.
- __init__(source, sourcefile, offset)
Initialise the read provider.
- Args:
- source:
Initial field object reference (populated with the lookup values from the file specified in sourcefile).
- sourcefile:
Filename associated with source FieldsFileVariant.
- offset:
Starting position of Field data in sourcefile (in bytes).
- class mule._NullReadProvider(source, sourcefile, offset)
Bases: RawReadProvider
A ‘raw’ data provider object to be used when a packing code is unrecognised - to be able to represent unknown-type data in a Field.
- class mule.UMFile
Bases: object
Represents the structure of a single UM file.
- COMPONENTS = (('integer_constants', <class 'mule.IntegerConstants'>), ('real_constants', <class 'mule.RealConstants'>), ('level_dependent_constants', <class 'mule.LevelDependentConstants'>), ('row_dependent_constants', <class 'mule.RowDependentConstants'>), ('column_dependent_constants', <class 'mule.ColumnDependentConstants'>), ('additional_parameters', <class 'mule.UnsupportedHeaderItem2D'>), ('extra_constants', <class 'mule.UnsupportedHeaderItem1D'>), ('temp_historyfile', <class 'mule.UnsupportedHeaderItem1D'>), ('compressed_field_index1', <class 'mule.UnsupportedHeaderItem1D'>), ('compressed_field_index2', <class 'mule.UnsupportedHeaderItem1D'>), ('compressed_field_index3', <class 'mule.UnsupportedHeaderItem1D'>))¶
A series of tuples containing the name of a header component, and the class which should be used to represent it. The name will become the final attribute name to store the component, but it must also correspond to a name in the HEADER_MAPPING of the fixed length header.
- READ_PROVIDERS = {}¶
A dictionary which maps a string containing the trailing 3 digits (n3 - n1) of a field’s lbpack (packing code) onto a suitable data-provider object to read the field. Any packing code not in this dictionary will default to using a _NullReadProvider object (which can only be used to copy the raw byte-data of the field - not to unpack it or access the data).
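The lookup-with-fallback behaviour described for READ_PROVIDERS might look roughly like this. The exact key format is an assumption (a zero-padded 3-character string such as "000"), and `UnpackedProvider`, `NullProvider` and `provider_for` are hypothetical names used only for this sketch.

```python
class NullProvider:
    """Stand-in for a provider that can only copy raw bytes."""


class UnpackedProvider:
    """Stand-in for a provider that can read unpacked data."""


# Assumed key format: the trailing 3 digits of lbpack, zero-padded.
READ_PROVIDERS = {"000": UnpackedProvider}


def provider_for(lbpack):
    """Pick a provider class from the trailing 3 digits of lbpack."""
    key = "%03d" % (lbpack % 1000)
    # Unrecognised packing codes fall back to the null provider.
    return READ_PROVIDERS.get(key, NullProvider)


known = provider_for(2000)     # trailing digits "000"
unknown = provider_for(3021)   # trailing digits "021", not registered
```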
- WRITE_OPERATORS = {}¶
A dictionary which maps a string containing the trailing 3 digits (n3 - n1) of a field’s lbpack (packing code) onto a suitable DataOperator object to write the field. Any packing code found in a field from this object’s field list but not found here will cause an exception when trying to write to a file.
- WORD_SIZE = 8¶
The word/record size for the file; for all supported UM file types this should be left as the default of 8 (i.e. 64-bit words).
- FIELD_CLASSES = {-99: <class 'mule.Field'>, 2: <class 'mule.Field2'>, 3: <class 'mule.Field3'>}¶
Maps the lbrel (header release number) of each field onto an appropriate Field subclass to represent it.
Note
This mapping must contain an entry for -99, and the Field object it returns must at a minimum contain attribute mappings for the 5 key elements (lbrel, lblrec, lbnrec, lbegin and lbpack - see UMDP F03), as well as suitable shape information.
- __init__()[source]¶
Create a blank UMFile instance.
The initial creation contains only an empty FixedLengthHeader object, plus an empty (None) named attribute for each component in the COMPONENTS attribute.
In most cases this __init__ should not be called directly, but indirectly via the from_file or from_template classmethods.
- classmethod from_file(file_or_filepath, remove_empty_lookups=False, stashmaster=None)[source]¶
Initialise a UMFile, populated using the contents of a file.
- Kwargs:
- file_or_filepath:
An open file-like object, or file path. A path is opened for read; a ‘file-like’ must support seeks.
- remove_empty_lookups:
If set to True, will remove any “empty” lookup headers from the field-list (UM files often have pre-allocated numbers of lookup entries, some of which are left unused).
- stashmaster:
A mule.stashmaster.STASHmaster object containing the details of the STASHmaster to associate with the fields in the file (if not provided, an attempt will be made to load a central STASHmaster based on the version in the fixed length header).
Note
As part of this the “validate” method will be called. For the base UMFile class this does nothing, but sub-classes may override it to provide specific validation checks.
- classmethod from_template(template=None)[source]¶
Create a fieldsfile from a template.
The template is a dictionary of key:value, where ‘key’ is a component name and ‘value’ is a component settings dictionary.
A component given a component settings dictionary in the template is guaranteed to exist in the resulting file object.
Within a component dictionary, key:value pairs indicate the values that named component properties must be set to.
If a component dictionary contains the special key ‘dims’, the associated value is a tuple of dimensions, which is passed to a component.empty() call to produce a new component of that type. Note that in some cases “None” may be used to indicate a dimension which the file-type fixes (e.g. the number of level types).
The resulting file is usually incomplete, but can be used as a convenient starting-point for creating files with a given structure.
Note
When a particular component contains known values in any position of its “CREATE_DIMS” attribute (i.e. not “None”), the template may omit this dimension (as might be done for the 2nd dimension of ‘level_dependent_constants’).
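A template following the rules above might be laid out as in this sketch. The sizes (46, 70) and the property value are arbitrary illustrations, `split_settings` is a hypothetical helper showing how the special ‘dims’ key differs from named properties, and the “None” entry assumes a file type whose CREATE_DIMS fixes that dimension.

```python
# Component name -> component settings dictionary.
template = {
    "integer_constants": {
        "dims": (46,),            # passed on to the component's empty()
        "num_soil_levels": 4,     # a named property to be set
    },
    "level_dependent_constants": {
        "dims": (70, None),       # None: dimension fixed by the file type
    },
}


def split_settings(settings):
    """Separate the special 'dims' key from the named property values."""
    props = dict(settings)        # copy, so the template is left untouched
    dims = props.pop("dims", None)
    return dims, props


dims, props = split_settings(template["integer_constants"])

# With mule installed, the template would then be used along the lines of:
#     ff = mule.FieldsFile.from_template(template)
```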
- attach_stashmaster_info(stashmaster)[source]¶
Attach references to the relevant entries in a provided mule.stashmaster.STASHmaster object to each of the fields in this object.
- Args:
- stashmaster:
A mule.stashmaster.STASHmaster instance which should be parsed and attached to any fields in the file.
- copy(include_fields=False)[source]¶
Make a copy of a UMFile object including all of its headers, and optionally also including copies of all of its fields.
- Kwargs:
- include_fields:
If True, the field list in the copied object will be populated with copies of the fields from the source object; otherwise the field list in the new object will be empty.
- validate(filename=None, warn=False)[source]¶
Apply any consistency checks to check the file is “valid”.
Note
In the base UMFile class this routine does nothing, but a format-specific subclass can override this method to do whatever it considers appropriate to validate the file object.
- remove_empty_lookups()[source]¶
Calling this method will delete any fields from the field list which are empty.
- to_file(output_file_or_path)[source]¶
Write to an output file or path.
- Args:
- output_file_or_path (string or file-like):
An open file or filepath. If a path, it is opened and closed again afterwards.
Note
As part of this the “validate” method will be called. For the base UMFile class this does nothing, but sub-classes may override it to provide specific validation checks.
- mule.load_umfile(unknown_umfile, stashmaster=None)[source]¶
Load a UM file of undetermined type, by checking its dataset type and attempting to load it as the correct class.
- Args:
- unknown_umfile:
A file or file-like object containing an unknown file to be loaded based on its dataset_type.
- Kwargs:
- stashmaster:
A mule.stashmaster.STASHmaster object containing the details of the STASHmaster to associate with the fields in the file (if not provided, an attempt will be made to load a central STASHmaster based on the version in the fixed length header).