mule

This module provides a series of classes to allow interaction with various file formats produced and used by the UM (Unified Model) system.

The top-level UMFile class provides an object representing a generic UM file of the fieldsfile-like type, as covered in document UMDP F03. This enables any file of this general form to be handled.

In practice, most files will be of a specific known subtype, and it is then simpler and safer to use the appropriate subclass, FieldsFile or LBCFile. These perform type-specific sanity checking, and provide named attributes to access all of the header elements.

For example:

>>> ff = mule.FieldsFile.from_file(in_path)
>>> print('model = {0}'.format(ff.fixed_length_header.model_version))
>>> ff.integer_constants.num_soil_levels = 0
>>> ff.fields = [fld for fld in ff.fields
...              if (fld.lbuser7 == 1 and fld.lbuser4 in (204, 207)
...                  and 1990 <= fld.lbyr < 2000)]
>>> ff.to_file(out_path)

The more general UMFile class is provided to handle files of other types, and can also be used to correct or adjust files of recognised types that are invalid because of unexpected or inconsistent header information.

class mule._HeaderMetaclass(classname, bases, class_dict)[source]

Bases: type

Metaclass used to give named attributes to other classes.

This metaclass is used in the construction of several header-like classes in this API; note that it is applied on defining the classes (i.e. when the module is imported), not later when specific instances of the classes are initialised.

The purpose of this class is to attach a set of named attributes to the header object and associate these with specific indices of the underlying array of header values. The target class defines this “mapping” itself, allowing this metaclass to be used for multiple header-like objects.
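The mechanism described above can be illustrated with a small stand-in. This is a minimal sketch, not mule's actual implementation: the names HeaderMeta, DemoHeader and _values are hypothetical, and the only details taken from this page are that the target class supplies a HEADER_MAPPING of (name, index) pairs and that raw indexing is 1-based.

```python
class HeaderMeta(type):
    """Illustrative stand-in for a header metaclass (hypothetical)."""

    def __new__(mcs, classname, bases, class_dict):
        # Runs once, when the class statement is executed (class definition
        # time), not when instances are created.
        mapping = class_dict.get("HEADER_MAPPING") or []
        for name, index in mapping:
            class_dict[name] = mcs._make_property(index)
        return super().__new__(mcs, classname, bases, class_dict)

    @staticmethod
    def _make_property(index):
        # A separate function call per property binds `index` correctly.
        def getter(self):
            return self._values[index]

        def setter(self, value):
            self._values[index] = value

        return property(getter, setter)


class DemoHeader(metaclass=HeaderMeta):
    # (name, 1-based index) pairs, in the style of a HEADER_MAPPING list.
    HEADER_MAPPING = [("dataset_type", 5), ("model_version", 12)]

    def __init__(self, values):
        # Pad slot 0 so that the stored values are indexed from 1.
        self._values = [None] + list(values)


header = DemoHeader(range(100, 120))
header.dataset_type = 3  # writes through to raw index 5
```

Because the properties are created while the class is being built, every instance of DemoHeader shares them; only the underlying value list differs per instance.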

class mule.BaseHeaderComponent[source]

Bases: object

Base class for a UM header component.

Note

This class is not intended to be used directly; it acts only to group together the common parts of the BaseHeaderComponent1D and BaseHeaderComponent2D classes.

MDI = None

The value to use to indicate missing header values.

DTYPE = None

The data-type of the words in the header.

CREATE_DIMS = None

A tuple defining the default dimensions of the header to be produced by the empty() method, when the caller provides incomplete shape information. Where an element of the tuple is “None”, the arguments to the empty method must specify a size for the corresponding dimension.
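The rule above can be sketched as a small helper. This is an assumption-laden illustration rather than mule's code: resolve_dims is a hypothetical name, and the sketch assumes that a size supplied by the caller takes precedence over the class default.

```python
def resolve_dims(create_dims, requested):
    """Combine default dimensions with caller-supplied sizes (hypothetical).

    create_dims: tuple of defaults, where None means "no default".
    requested:   tuple of caller-supplied sizes, where None means "not given".
    """
    resolved = []
    for position, (default, given) in enumerate(zip(create_dims, requested)):
        # Assumption: an explicit size from the caller wins over the default.
        size = given if given is not None else default
        if size is None:
            raise ValueError(
                "dimension {0} has no default and must be specified".format(
                    position + 1))
        resolved.append(size)
    return tuple(resolved)


# A 2D component whose second dimension is fixed by the class.
dims = resolve_dims((None, 8), (20, None))
```

With CREATE_DIMS of (None, 8) the caller must supply the first dimension but may omit the second; with (None, None) both are mandatory.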

HEADER_MAPPING = None

A list of tuple-pairs, each giving a named attribute and the raw index in the header that it is associated with (see the help for the _HeaderMetaclass for further details).

property shape

Return the shape of the header object.

property raw

Return the raw values of the header object.

copy()[source]

Create a copy of the header object.

class mule.BaseHeaderComponent1D(values)[source]

Bases: BaseHeaderComponent

1-Dimensional UM header component.

CREATE_DIMS = (None,)

A tuple defining the default dimensions of the header to be produced by the empty() method, when the caller provides incomplete shape information. Where an element of the tuple is “None”, the arguments to the empty method must specify a size for the corresponding dimension.

__init__(values)[source]

Initialise the object from a series of values.

Args:
  • values:

    array-like object containing values in this header.

Note

The values are internally stored offset by 1 element (so that when the raw values are accessed their indexing is 1-based, to match up with their definitions in UMDP F03).

classmethod empty(num_words=None)[source]

Create an instance of the class from scratch.

Kwargs:
  • num_words:

    The number of words to use to create the header.

Note

Passing “num_words” may be optional or mandatory depending on the value of the class’s CREATE_DIMS attribute.

classmethod from_file(source, num_words)[source]

Create an instance of the class populated by values from a file.

Args:
  • source:

    The (open) file object containing the header value, with its file pointer positioned at the start of this header.

  • num_words:

    The number of words to read in from the file to populate the header.

to_file(output_file)[source]

Write the header to a file object.

Args:
  • output_file:

    The (open) file object for the header to be written to.

class mule.BaseHeaderComponent2D(values)[source]

Bases: BaseHeaderComponent

2-Dimensional UM header component.

CREATE_DIMS = (None, None)

A tuple defining the default dimensions of the header to be produced by the empty() method, when the caller provides incomplete shape information. Where an element of the tuple is “None”, the arguments to the empty method must specify a size for the corresponding dimension.

__init__(values)[source]

Initialise the object from a series of values.

Args:
  • values:

    2-dimensional array-like object containing values in this header.

Note

The values are internally stored offset by 1 element in their second dimension (so that when the raw values are accessed their indexing is 1-based, to match up with the definitions in UMDP F03).

classmethod empty(dim1=None, dim2=None)[source]

Create an instance of the class from scratch.

Kwargs:
  • dim1:

    The number of words to use for the header’s first dimension.

  • dim2:

    The number of words to use for the header’s second dimension.

Note

Setting “dim1” and/or “dim2” may be optional or mandatory depending on the values of the class’s CREATE_DIMS attribute.

classmethod from_file(source, dim1, dim2)[source]

Create an instance of the class populated by values from a file.

Args:
  • source:

    The (open) file object containing the header value, with its file pointer positioned at the start of this header.

  • dim1:

    The number of words to read in from the file to populate each row of the header.

  • dim2:

    The number of the above rows to read in from the file to populate the header.

to_file(output_file)[source]

Write the header to a file object.

Args:
  • output_file:

    The (open) file object for the header to be written to.

class mule.FixedLengthHeader(values)[source]

Bases: BaseHeaderComponent1D

The fixed length header component of a UM file.

This component differs from the others in that its length cannot be altered at creation time; the fixed length header is always a specific number of words in length.

HEADER_MAPPING = [('data_set_format_version', 1), ('sub_model', 2), ('vert_coord_type', 3), ('horiz_grid_type', 4), ('dataset_type', 5), ('run_identifier', 6), ('experiment_number', 7), ('calendar', 8), ('grid_staggering', 9), ('time_type', 10), ('projection_number', 11), ('model_version', 12), ('obs_file_type', 14), ('last_fieldop_type', 15), ('t1_year', 21), ('t1_month', 22), ('t1_day', 23), ('t1_hour', 24), ('t1_minute', 25), ('t1_second', 26), ('t1_year_day_number', 27), ('t2_year', 28), ('t2_month', 29), ('t2_day', 30), ('t2_hour', 31), ('t2_minute', 32), ('t2_second', 33), ('t2_year_day_number', 34), ('t3_year', 35), ('t3_month', 36), ('t3_day', 37), ('t3_hour', 38), ('t3_minute', 39), ('t3_second', 40), ('t3_year_day_number', 41), ('integer_constants_start', 100), ('integer_constants_length', 101), ('real_constants_start', 105), ('real_constants_length', 106), ('level_dependent_constants_start', 110), ('level_dependent_constants_dim1', 111), ('level_dependent_constants_dim2', 112), ('row_dependent_constants_start', 115), ('row_dependent_constants_dim1', 116), ('row_dependent_constants_dim2', 117), ('column_dependent_constants_start', 120), ('column_dependent_constants_dim1', 121), ('column_dependent_constants_dim2', 122), ('additional_parameters_start', 125), ('additional_parameters_dim1', 126), ('additional_parameters_dim2', 127), ('extra_constants_start', 130), ('extra_constants_length', 131), ('temp_historyfile_start', 135), ('temp_historyfile_length', 136), ('compressed_field_index1_start', 140), ('compressed_field_index1_length', 141), ('compressed_field_index2_start', 142), ('compressed_field_index2_length', 143), ('compressed_field_index3_start', 144), ('compressed_field_index3_length', 145), ('lookup_start', 150), ('lookup_dim1', 151), ('lookup_dim2', 152), ('total_prognostic_fields', 153), ('data_start', 160), ('data_dim1', 161), ('data_dim2', 162)]

A list of tuple-pairs, each giving a named attribute and the raw index in the header that it is associated with (see the help for the _HeaderMetaclass for further details).

MDI = -32768

The value to use to indicate missing header values.

DTYPE = '>i8'

The data-type of the words in the header.

_NUM_WORDS = 256

The (fixed) number of words in a UM fixed length header.

__init__(values)[source]

Initialise the object from a series of values.

Args:
  • values:

    array-like object containing values contained in this header. Must be the exact length specified by _NUM_WORDS.

Note

The values are internally stored offset by 1 element (so that when the raw values are accessed their indexing is 1-based, to match up with their definitions in UMDP F03).

classmethod empty()[source]

Create an instance of the class from scratch.

Unlike the other header components the fixed length header always creates a class of a fixed size (based on its _NUM_WORDS attribute).

classmethod from_file(source)[source]

Create an instance of the class populated by values from a file.

Unlike the other header components the fixed length header always reads a specific number of values (based on its _NUM_WORDS attribute).

Args:
  • source:

    The (open) file object containing the header value, with its file pointer positioned at the start of this header.
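As a rough illustration of what reading this header involves, the sketch below decodes 256 big-endian 64-bit integers (the '>i8' DTYPE and _NUM_WORDS given above) using only the standard library. The function read_fixed_length_header and the in-memory test data are hypothetical, and the model_version value is made up; mule itself may well read the data differently.

```python
import io
import struct

NUM_WORDS = 256                      # matches _NUM_WORDS above
HEADER_FORMAT = ">{0}q".format(NUM_WORDS)   # big-endian signed 64-bit words
MDI = -32768                         # missing-data indicator for this header


def read_fixed_length_header(source):
    """Read the header words from an open binary file-like object."""
    raw = source.read(NUM_WORDS * 8)
    if len(raw) != NUM_WORDS * 8:
        raise ValueError("source is too short to hold a fixed length header")
    words = struct.unpack(HEADER_FORMAT, raw)
    # Pad index 0 so that words can be looked up by their 1-based position.
    return (None,) + words


# Build an artificial header in memory rather than relying on a real file.
fake_words = [MDI] * NUM_WORDS
fake_words[5 - 1] = 3        # word 5 is dataset_type
fake_words[12 - 1] = 1006    # word 12 is model_version (illustrative value)
stream = io.BytesIO(struct.pack(HEADER_FORMAT, *fake_words))

header = read_fixed_length_header(stream)
```

After the call, the file pointer sits immediately after the header (byte 2048), which is where the next read would begin.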

class mule.IntegerConstants(values)[source]

Bases: BaseHeaderComponent1D

The integer constants component of a UM file.

MDI = -32768

The value to use to indicate missing header values.

DTYPE = '>i8'

The data-type of the words in the header.

class mule.RealConstants(values)[source]

Bases: BaseHeaderComponent1D

The real constants component of a UM file.

MDI = -1073741824.0

The value to use to indicate missing header values.

DTYPE = '>f8'

The data-type of the words in the header.

class mule.LevelDependentConstants(values)[source]

Bases: BaseHeaderComponent2D

The level dependent constants component of a UM file.

MDI = -1073741824.0

The value to use to indicate missing header values.

DTYPE = '>f8'

The data-type of the words in the header.

class mule.RowDependentConstants(values)[source]

Bases: BaseHeaderComponent2D

The row dependent constants component of a UM file.

MDI = -1073741824.0

The value to use to indicate missing header values.

DTYPE = '>f8'

The data-type of the words in the header.

class mule.ColumnDependentConstants(values)[source]

Bases: BaseHeaderComponent2D

The column dependent constants component of a UM file.

MDI = -1073741824.0

The value to use to indicate missing header values.

DTYPE = '>f8'

The data-type of the words in the header.

class mule.UnsupportedHeaderItem1D(values)[source]

Bases: BaseHeaderComponent1D

An unsupported 1-dimensional component of a UM file.

MDI = -32768

The value to use to indicate missing header values.

DTYPE = '>i8'

The data-type of the words in the header.

class mule.UnsupportedHeaderItem2D(values)[source]

Bases: BaseHeaderComponent2D

An unsupported 2-dimensional component of a UM file.

MDI = -32768

The value to use to indicate missing header values.

DTYPE = '>i8'

The data-type of the words in the header.

class mule.Field(int_headers, real_headers, data_provider)[source]

Bases: object

Represents a single entry in the lookup table, and provides access to the data referenced by it.

Note

This class assumes the (common) UM lookup header comprising 64 words, split between 45 integer and 19 real values.

__init__(int_headers, real_headers, data_provider)[source]

Initialise the Field object.

Args:
  • int_headers:

    A sequence of integer header values.

  • real_headers:

    A sequence of floating-point header values.

  • data_provider:

    An object representing the field data payload. Typically, this is an object with a “._data_array” method, in which case the data can be fetched with get_data().

classmethod empty()[source]

Create an instance of the class from scratch.

The instance will be filled with empty values (-99 for integers, and 0.0 for reals), and will have no data_provider set.
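The layout of such an empty lookup entry can be sketched with the standard library. This is only an illustration under stated assumptions: the helper names are hypothetical, and the on-disk encoding (big-endian 64-bit integers followed by big-endian 64-bit reals) is assumed by analogy with the '>i8' and '>f8' data types of the header components.

```python
import struct

NUM_INT_WORDS = 45    # integer part of the lookup header
NUM_REAL_WORDS = 19   # real part of the lookup header


def empty_lookup():
    """Return (int_headers, real_headers) filled with the empty values."""
    return [-99] * NUM_INT_WORDS, [0.0] * NUM_REAL_WORDS


def pack_lookup(int_headers, real_headers):
    """Encode one lookup entry as bytes (assumed big-endian, 8-byte words)."""
    packed_ints = struct.pack(">{0}q".format(NUM_INT_WORDS), *int_headers)
    packed_reals = struct.pack(">{0}d".format(NUM_REAL_WORDS), *real_headers)
    return packed_ints + packed_reals


int_headers, real_headers = empty_lookup()
entry_bytes = pack_lookup(int_headers, real_headers)
```

The 45 integer and 19 real words together give the 64-word entry mentioned in the note above.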

property raw

Return the raw values in the lookup array.

to_file(output_file)[source]

Write the lookup header to a file object.

Args:
  • output_file:

    The (open) file object for the lookup to be written to.

copy()[source]

Create a Field which copies its header information from this one, and takes its data from the same data provider.

set_data_provider(data_provider)[source]

Set the field data payload.

Args:
  • data_provider:

    An object representing the field data payload. Typically, this is an object with a “._data_array” method, which means the data can be accessed with get_data().

num_values()[source]

Return the number of values defined by this header.

get_data()[source]

Return the data for this field as an array.

_get_raw_payload_bytes()[source]

Return a buffer containing the raw bytes of the data payload.

The field data must be unmodified and using the same packing code as the original data (this can be tested by calling _can_copy_deferred_data).

_can_copy_deferred_data(required_lbpack, required_bacc, required_word)[source]

Return whether or not it is possible to simply re-use the bytes making up the field; for this to be possible the data must be unmodified, and the requested output packing and disk word size must be the same as the input.

class mule.Field2(int_headers, real_headers, data_provider)[source]

Bases: Field

Represents an entry from the LOOKUP component with a header release number of 2.

class mule.Field3(int_headers, real_headers, data_provider)[source]

Bases: Field

Represents an entry from the LOOKUP component with a header release number of 3.

class mule.ArrayDataProvider(array)[source]

Bases: object

A Field data provider that contains an actual array of values.

This is used to make a field with an ordinary array as its data payload.

Note

This must be used with caution, as multiple fields with a concrete data payload can easily consume large amounts of space. By contrast, processing field payloads from an existing file will normally only load one at a time.

__init__(array)[source]

Create a data-provider which contains a concrete data array.

Args:
  • array (array-like):

    The data payload. It is converted to a numpy array. It must be 2D unmasked data.

_data_array()[source]

Return the data payload.

class mule._OperatorDataProvider(operator, source, new_field)[source]

Bases: object

A Field data provider that fetches its data from a DataOperator, by calling transform().

Note

This should only really ever be instantiated from within the DataOperator.

__init__(operator, source, new_field)[source]

Create a wrapper, including references to the operator, the original source data and the result field.

Args:
  • operator:

    A reference to the DataOperator instance which created this provider (to allow its transform() method to be accessed in _data_array() below).

  • source:

    The source object for the above DataOperator - this can be anything, and is required here so that it can be passed on to the operator’s transform() method below.

  • new_field:

    The new field returned by the above DataOperator - this is again needed by the operator’s transform() method.

_data_array()[source]

Return the data using the provided operator.

class mule.DataOperator(*args, **kwargs)[source]

Bases: object

Base class which should be sub-classed to perform manipulations on the data of a field. The Field classes never store any data directly in memory; only the means to retrieve it from disk and perform any required operations (which will only be executed when explicitly requested - this would normally be at the point the file is being written/closed).

Note

The user must override the “__init__”, “new_field” and “transform” methods of this base class to create a valid operator.

A DataOperator is used to produce new Fields, which are derived from existing source fields and which calculate their data from the source data at a later time.

The normal usage occurs in 3 separate stages:

  • __init__() creates a new operator with any instance-specific parameters.

  • __call__() is used to produce new, transformed Field objects from existing ones, via the user’s new_field() method.

  • transform() is called by an output field to calculate its data payload.

For example:

>>> class XSampler(DataOperator):
...     def __init__(self, factor):
...         self.factor = factor
...     def new_field(self, source_field):
...         fld = source_field.copy()
...         fld.lbnpt //= self.factor
...         fld.bdx *= self.factor
...         return fld
...     def transform(self, source_field, result_field):
...         data = source_field.get_data()
...         return data[:, ::self.factor]
...
>>> XStep4 = XSampler(factor=4)
>>> ff.fields = [XStep4(fld) for fld in ff.fields]
>>> ff.to_file(out_path)

__init__(*args, **kwargs)[source]

Initialise the operator object - this should be overridden by the user.

This method should accept any user arguments to be “baked” into the operator or to otherwise initialise it as-per the user’s requirements; for example an operator which scales the values in fields by a constant amount might want to accept an argument giving that amount.

__call__(source, *args, **kwargs)[source]

Wrap the operator around a source object.

This calls the user-supplied new_field() method, and configures the resulting field to return its data from the transform() method of the data operator.

Args:
  • source:

    This can be an object of any type; it is typically an existing Field which the result field is based on.

Returns:
  • new_field (Field):

    A new Field instance, which returns data generated via the transform() method.
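The deferred behaviour described for __call__() can be illustrated with plain-Python stand-ins. Everything in this sketch (DemoField, ListProvider, DeferredProvider, DemoOperator) is hypothetical and greatly simplified; it only mirrors the documented flow, in which new_field() runs immediately while transform() runs when the data is requested.

```python
class DemoField:
    """Stand-in for a Field: one header value plus a data provider."""

    def __init__(self, lbnpt, provider=None):
        self.lbnpt = lbnpt
        self._provider = provider

    def copy(self):
        # Copies the header value and shares the same data provider.
        return DemoField(self.lbnpt, self._provider)

    def set_data_provider(self, provider):
        self._provider = provider

    def get_data(self):
        return self._provider._data_array()


class ListProvider:
    """Provider holding concrete rows (a list of lists)."""

    def __init__(self, rows):
        self._rows = rows

    def _data_array(self):
        return self._rows


class DeferredProvider:
    """Provider that calls operator.transform() only when data is wanted."""

    def __init__(self, operator, source, new_field):
        self.operator = operator
        self.source = source
        self.new_field = new_field

    def _data_array(self):
        return self.operator.transform(self.source, self.new_field)


class DemoOperator:
    def __call__(self, source, *args, **kwargs):
        new_field = self.new_field(source, *args, **kwargs)
        new_field.set_data_provider(DeferredProvider(self, source, new_field))
        return new_field


class XSampler(DemoOperator):
    def __init__(self, factor):
        self.factor = factor
        self.transform_calls = 0   # counts how often data is really computed

    def new_field(self, source_field):
        fld = source_field.copy()
        fld.lbnpt //= self.factor
        return fld

    def transform(self, source_field, result_field):
        self.transform_calls += 1
        return [row[::self.factor] for row in source_field.get_data()]


source = DemoField(4, ListProvider([[1, 2, 3, 4], [5, 6, 7, 8]]))
sampler = XSampler(factor=2)
result = sampler(source)                 # header updated, no data computed
calls_before = sampler.transform_calls
data = result.get_data()                 # transform() runs here
```

The counter shows that wrapping a field costs nothing beyond the header update; the work in transform() is paid only when get_data() is called.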

new_field(source, *args, **kwargs)[source]

Produce a new output Field from a source object - this method should be overridden by the user.

This method encodes how to produce a new field, which is typically derived by calculation from an existing source field or fields. It is called by the __call__() method.

Args:
  • source:

    This can be an object of any type; it is typically an existing Field which the result field is based on.

Returns:
  • new_field (Field):

    A new Field instance, whose lookup attributes reflect the final state of the result; e.g. if the operator affects the number of rows in the field, then ‘new_field’ must have its row settings set accordingly.

Note

It is advisable not to modify the “source” object inside this method; modifications should be confined to the new field object.

transform(source, result_field)[source]

Calculate the data payload for a result field - this method should be overridden by the user.

This method must return a 2D numpy array containing the field data. Typically it will extract the data payload from a source field and manipulate it in some way.

Args:
  • source:

    The original ‘source’ argument from the __call__() invocation that created ‘result_field’. Usually, this is a pre-existing Field object from which the result field is calculated.

  • result_field:

    The ‘new’ field that was created by a call to __call__(), for which the data is now wanted. This should not be modified, but provides access to any necessary context information determined when it was created.

Returns:
  • data (array):

    The data array for ‘result_field’.

class mule.RawReadProvider(source, sourcefile, offset)[source]

Bases: object

A generic ‘data provider’ object, which deals with the most basic/common data-provision operation of reading in Field data from a file.

This class should not be used directly, since it does not define a “_data_array” method, and so cannot return any data. A series of subclasses of this class are provided which define the ‘_data_array’ method for the different packing types found in various types of UMFile.

__init__(source, sourcefile, offset)[source]

Initialise the read provider.

Args:
  • source:

    Initial field object reference (populated with the lookup values from the file specified in sourcefile).

  • sourcefile:

    Filename associated with source FieldsFileVariant.

  • offset:

    Starting position of Field data in sourcefile (in bytes).

class mule._NullReadProvider(source, sourcefile, offset)[source]

Bases: RawReadProvider

A ‘raw’ data provider object to be used when a packing code is unrecognised - to be able to represent unknown-type data in a Field.

class mule.UMFile[source]

Bases: object

Represents the structure of a single UM file.

COMPONENTS = (('integer_constants', <class 'mule.IntegerConstants'>), ('real_constants', <class 'mule.RealConstants'>), ('level_dependent_constants', <class 'mule.LevelDependentConstants'>), ('row_dependent_constants', <class 'mule.RowDependentConstants'>), ('column_dependent_constants', <class 'mule.ColumnDependentConstants'>), ('additional_parameters', <class 'mule.UnsupportedHeaderItem2D'>), ('extra_constants', <class 'mule.UnsupportedHeaderItem1D'>), ('temp_historyfile', <class 'mule.UnsupportedHeaderItem1D'>), ('compressed_field_index1', <class 'mule.UnsupportedHeaderItem1D'>), ('compressed_field_index2', <class 'mule.UnsupportedHeaderItem1D'>), ('compressed_field_index3', <class 'mule.UnsupportedHeaderItem1D'>))

A series of tuples containing the name of a header component, and the class which should be used to represent it. The name will become the final attribute name to store the component, but it must also correspond to a name in the HEADER_MAPPING of the fixed length header.

READ_PROVIDERS = {}

A dictionary which maps a string containing the trailing 3 digits (n3 - n1) of a field’s lbpack (packing code) onto a suitable data-provider object to read the field. Any packing code not in this dictionary will default to using a _NullReadProvider object (which can only be used to copy the raw byte-data of the field - not to unpack it or access the data).
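The lookup-with-fallback behaviour can be sketched as follows. The helper names and the two placeholder provider classes are hypothetical, and deriving the key with a modulo is an assumption about how the trailing three digits are obtained; mule's own key construction may differ.

```python
class NullProvider:
    """Placeholder for the fallback provider (raw bytes only)."""


class UnpackedProvider:
    """Placeholder for a provider that can read unpacked data."""


# Keys are three-character strings holding digits n3, n2 and n1 of lbpack.
READ_PROVIDERS = {"000": UnpackedProvider}


def packing_key(lbpack):
    """Return the trailing three digits of a non-negative packing code."""
    return "{0:03d}".format(lbpack % 1000)


def select_provider(lbpack):
    """Pick the provider class for a packing code, or the null fallback."""
    return READ_PROVIDERS.get(packing_key(lbpack), NullProvider)


chosen = select_provider(2000)    # digits above n3 are ignored by the key
fallback = select_provider(1)     # "001" is not registered in this sketch
```

A subclass for a specific file type would populate the dictionary with one entry per packing code it knows how to read.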

WRITE_OPERATORS = {}

A dictionary which maps a string containing the trailing 3 digits (n3 - n1) of a field’s lbpack (packing code) onto a suitable DataOperator object to write the field. Any packing code found in a field from this object’s field list but not found here will cause an exception when trying to write to a file.

WORD_SIZE = 8

The word/record size for the file, in bytes. For all supported UM file types this should be left as the default of 8 (i.e. 64-bit words).

FIELD_CLASSES = {-99: <class 'mule.Field'>, 2: <class 'mule.Field2'>, 3: <class 'mule.Field3'>}

Maps the lbrel (header release number) of each field onto an appropriate Field subclass to represent it.

Note

This mapping must contain an entry for -99, and the Field object it returns must at a minimum contain attribute mappings for the 5 key elements (lbrel, lblrec, lbnrec, lbegin and lbpack - see UMDP F03), as well as suitable shape information.

__init__()[source]

Create a blank UMFile instance.

The initial creation contains only an empty FixedLengthHeader object, plus an empty (None) named attribute for each component in the COMPONENTS attribute.

In most cases this __init__ should not be called directly, but indirectly via the from_file or from_template classmethods.

classmethod from_file(file_or_filepath, remove_empty_lookups=False, stashmaster=None)[source]

Initialise a UMFile, populated using the contents of a file.

Kwargs:
  • file_or_filepath:

    An open file-like object, or file path. A path is opened for read; a ‘file-like’ must support seeks.

  • remove_empty_lookups:

    If set to True, will remove any “empty” lookup headers from the field-list (UM files often have pre-allocated numbers of lookup entries, some of which are left unused).

  • stashmaster:

    A mule.stashmaster.STASHmaster object containing the details of the STASHmaster to associate with the fields in the file (if not provided, an attempt will be made to load a central STASHmaster based on the version in the fixed length header).

Note

As part of this the “validate” method will be called. For the base UMFile class this does nothing, but sub-classes may override it to provide specific validation checks.

classmethod from_template(template=None)[source]

Create a fieldsfile from a template.

The template is a dictionary of key:value, where ‘key’ is a component name and ‘value’ is a component settings dictionary.

A component given a component settings dictionary in the template is guaranteed to exist in the resulting file object.

Within a component dictionary, key:value pairs indicate the values that named component properties must be set to.

If a component dictionary contains the special key ‘dims’, the associated value is a tuple of dimensions, which is passed to a component.empty() call to produce a new component of that type. Note that in some cases “None” may be used to indicate a dimension which the file-type fixes (e.g. the number of level types).

The resulting file is usually incomplete, but can be used as a convenient starting-point for creating files with a given structure.

Note

When a particular component contains known values in any position of its “CREATE_DIMS” attribute (i.e. not “None”), the template may omit this dimension (for example, the 2nd dimension of ‘level_dependent_constants’ where the file type fixes it).
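A template of the kind described here might look like the sketch below. The sizes 46 and 20 are purely illustrative, and split_settings is a hypothetical helper showing how the special ‘dims’ key can be separated from the named property values; it is not part of mule.

```python
# Component name -> component settings dictionary.
template = {
    # Named properties taken from the fixed length header mapping.
    "fixed_length_header": {"dataset_type": 3, "grid_staggering": 3},
    # 'dims' only: the component is created at this size, with no values set.
    "integer_constants": {"dims": (46,)},
    # None leaves the 2nd dimension to the class's CREATE_DIMS default.
    "level_dependent_constants": {"dims": (20, None)},
}


def split_settings(settings):
    """Separate the special 'dims' entry from the named property values."""
    values = dict(settings)           # copy, so the template is not modified
    dims = values.pop("dims", None)
    return dims, values


summary = {name: split_settings(settings)
           for name, settings in template.items()}
```

With mule installed, a dictionary like this would be passed to from_template() on UMFile or one of its subclasses; whether a “None” dimension is accepted depends on the CREATE_DIMS of the component class used by that file type.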

attach_stashmaster_info(stashmaster)[source]

Attach references to the relevant entries in a provided mule.stashmaster.STASHmaster object to each of the fields in this object.

Args:
  • stashmaster:

    A mule.stashmaster.STASHmaster instance which should be parsed and attached to any fields in the file.

copy(include_fields=False)[source]

Make a copy of a UMFile object including all of its headers, and optionally also including copies of all of its fields.

Kwargs:
  • include_fields:

    If True, the field list in the copied object will be populated with copies of the fields from the source object; otherwise the field list in the new object will be empty.

validate(filename=None, warn=False)[source]

Apply any consistency checks to check the file is “valid”.

Note

In the base UMFile class this routine does nothing but a format-specific subclass can override this method to do whatever it considers appropriate to validate the file object.

remove_empty_lookups()[source]

Calling this method will delete any fields from the field list which are empty.

to_file(output_file_or_path)[source]

Write to an output file or path.

Args:
  • output_file_or_path (string or file-like):

    An open file or filepath. If a path, it is opened and closed again afterwards.

Note

As part of this the “validate” method will be called. For the base UMFile class this does nothing, but sub-classes may override it to provide specific validation checks.

mule.load_umfile(unknown_umfile, stashmaster=None)[source]

Load a UM file of undetermined type, by checking its dataset type and attempting to load it as the correct class.

Args:
  • unknown_umfile:

    A file or file-like object containing an unknown file to be loaded based on its dataset_type.

Kwargs:
  • stashmaster:

    A mule.stashmaster.STASHmaster object containing the details of the STASHmaster to associate with the fields in the file (if not provided, an attempt will be made to load a central STASHmaster based on the version in the fixed length header).
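The idea of choosing a class from the dataset type can be sketched without mule. In this illustration the helper names are hypothetical, the mapping of dataset types 3 and 5 to FieldsFile and LBCFile is an assumption, and unknown types simply yield None here (what load_umfile itself does in that case is not stated on this page). The position of dataset_type as word 5 of the fixed length header comes from the HEADER_MAPPING above.

```python
import io
import struct

WORD_SIZE = 8
DATASET_TYPE_WORD = 5     # 1-based position in the fixed length header

# Assumed association between dataset types and class names.
DATASET_TYPE_TO_CLASS = {3: "FieldsFile", 5: "LBCFile"}


def peek_dataset_type(source):
    """Read the dataset type without disturbing the file position."""
    position = source.tell()
    source.seek((DATASET_TYPE_WORD - 1) * WORD_SIZE)
    raw = source.read(WORD_SIZE)
    source.seek(position)
    if len(raw) != WORD_SIZE:
        raise ValueError("source is too short to contain a dataset type")
    return struct.unpack(">q", raw)[0]


def choose_class_name(source):
    """Return the assumed class name for the file, or None if unknown."""
    return DATASET_TYPE_TO_CLASS.get(peek_dataset_type(source))


# An artificial 256-word header with dataset_type set to 5.
words = [-32768] * 256
words[DATASET_TYPE_WORD - 1] = 5
stream = io.BytesIO(struct.pack(">256q", *words))

class_name = choose_class_name(stream)
```

Restoring the file position after peeking means the chosen class can then read the file from the start as usual.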