Software standards#
Introduction#
This document specifies the software standards and coding styles to be used when writing new code files for the Met Office Unified Model. When making extensive changes to an existing file a rewrite of the whole file should be done to ensure that the file meets the UM coding standard and style. All code modifications within an existing file should follow these standards.
The only exception to following these coding standards is that there is no requirement to rewrite ‘imported code’ to these standards before it is included within the UM. All new code developed within the Met Office should follow these standards.
Imported code is code developed as part of a collaboration project and then proposed as suitable for use within the UM, for example the original UKCA code developed in academia. Code developed collaboratively and specifically for the UM should meet these standards.
Why have standards?#
This document is intended for new as well as experienced programmers, so a few words about why there is a need for software standards and styles may be in order.
Coding standards specify a standard working practice for a project with the aim of improving portability, maintainability and the readability of code. This process makes code development and reviewing easier for all developers involved in the project. Remember that software should be written for people and not just for computers! As long as the syntax rules of the programming language (e.g. Fortran IV - 2003) are followed, the computer does not care how the code is written. You could use archaic language structures, add no comments, leave no spaces etc. However, another programmer trying to use, maintain or alter the code will have trouble working out what the code does and how it does it. A little extra effort whilst writing the code can greatly simplify the task of this other programmer (which might be the original author a year or so after writing the code, when details of it are bound to have been forgotten). In addition, following these standards may well help you to write better, more efficient, programs containing fewer bugs.
While code style is very subjective, by standardising the style, UM routine layout will become familiar to all code developers/reviewers even when they are not familiar with the underlying science.
Units#
All routines and documentation must be written using SI units. Standard SI prefixes may be used. Where relevant, the units used must be clearly stated in both the code and the supporting UM documentation.
Working practices#
The preparation of new files and of changes to existing files should meet the standards in this document and must follow the stages outlined in Working Practices for UM Development.
Examples#
This document provides an example programming unit to aid the code developer. This example meets the standards detailed within this paper, with references to the relevant sections.
Technical standards#
UM code should be written in and conform to the Fortran 2003 standard; this is supported by most major Fortran compilers. Obsolescent language features are not permitted. The UM also requires compiler support for Technical Specification 29113 on the Further Interoperability of Fortran with C. This is a new feature for Fortran 2018, but is a common extension in most compilers and has widespread support.
Please note that in order to maximise portability and to avoid the use of radically different design structures within single areas of code, some Fortran 2003 features are excluded from use within the UM. For further details please see Appendix B.
Pre-processor#
In the past include files and the C pre-processor were used for scientific code section choices and for passing large lists of arrays. This use has been phased out and is highly discouraged. The C pre-processor is still used to make machine specific choices and, together with included files, to reduce code duplication. These uses are all covered by this standards and style document.
How to meet the coding standards#
The following code example exhibits all that is defined as a good coding standard and how code should be written for inclusion within the UM.
The example is highlighted with references (section links) to the remainder of this document which provide further details on the standard and style used.
example_mod.F90   (S1)

    ! ***************************COPYRIGHT*****************************   ! S2
    ! (C) Crown copyright Met Office. All rights reserved.
    ! For further details please refer to the file COPYRIGHT.txt
    ! which you should have received as part of this distribution.
    ! ***************************COPYRIGHT*****************************   ! S2
    !
    ! An example routine depicting how one should construct
    ! new code to meet the UMDP3 coding standards.   ! S2
    !
    MODULE example_mod   ! S3 S4 S6

    IMPLICIT NONE   ! S7

    ! Description:   ! S2
    !   A noddy routine that illustrates the way to apply the UMDP3
    !   coding standards to help code developers
    !   pass code reviews.
    !
    ! Method:   ! S2
    !   In this routine we apply many of the UMDP3 features
    !   to construct a simple routine. The references on the RHS take the reader
    !   to the appropriate section of the UMDP3 guide with further details.
    !
    ! Code Owner: Please refer to the UM file CodeOwners.txt   ! S2
    ! This file belongs in section: Control
    !
    ! Code description:   ! S2
    !   Language: Fortran 2003.
    !   This code is written to UMDP3 standards.
    !
    CHARACTER(LEN=*), PARAMETER, PRIVATE :: ModuleName='EXAMPLE_MOD'   ! S14

    CONTAINS   ! S1

    ! Subroutine Interface:
    SUBROUTINE example (xlen, ylen, l_unscale, input1, input2, &   ! S10
                        output, l_loud_opt)

    ! Description:
    !   Nothing further to add to module description.   ! S2

    USE atmos_constants_mod, ONLY: r   ! S6
    USE ereport_mod, ONLY: ereport
    USE parkind1, ONLY: jpim, jprb   ! S14
    USE umprintMgr, ONLY: umprint, ummessage, PrNorm   ! S12
    USE errormessagelength_mod, ONLY: errormessagelength
    USE yomhook, ONLY: lhook, dr_hook   ! S14

    IMPLICIT NONE   ! S7

    ! Subroutine arguments
    INTEGER, INTENT(IN) :: xlen ! Length of first dimension of the arrays.   ! S7
    INTEGER, INTENT(IN) :: ylen ! Length of second dimension of the arrays.

    LOGICAL, INTENT(IN) :: l_unscale ! switch scaling off.

    REAL, INTENT(IN)     :: input1(xlen, ylen) ! First input array   ! S7
    REAL, INTENT(IN OUT) :: input2(xlen, ylen) ! Second input array   ! S7
    REAL, INTENT(OUT)    :: output(xlen, ylen) ! Contains the result   ! S7

    LOGICAL, INTENT(IN), OPTIONAL :: l_loud_opt ! optional debug flag   ! S7

    ! Local variables
    INTEGER(KIND=jpim), PARAMETER :: zhook_in  = 0 ! DrHook tracing entry   ! S7 S14
    INTEGER(KIND=jpim), PARAMETER :: zhook_out = 1 ! DrHook tracing exit   ! S14

    INTEGER :: i ! Loop counter
    INTEGER :: j ! Loop counter
    INTEGER :: icode ! error code for EReport

    LOGICAL :: l_loud ! debug flag (default false unless l_loud_opt is used)   ! S7

    REAL, ALLOCATABLE :: field(:,:) ! Scaling array to fill.   ! S8
    REAL(KIND=jprb) :: zhook_handle ! DrHook tracing   ! S14

    CHARACTER(LEN=*), PARAMETER :: RoutineName='EXAMPLE'   ! S19
    CHARACTER(LEN=errormessagelength) :: cmessage ! used for EReport
    CHARACTER(LEN=256) :: my_char ! string for output

    ! End of header

    IF (lhook) CALL dr_hook(ModuleName//':'//RoutineName, zhook_in, zhook_handle)   ! S14

    ! Set debug flag if argument is present
    l_loud = .FALSE.
    IF (PRESENT(l_loud_opt)) THEN   ! S7
      l_loud = l_loud_opt
    END IF

    my_char                                               &   ! S10
        = 'This is a very very very very very very very ' &
        // 'loud character assignment' ! A pointless long character example.
    icode=0

    ! verbosity choice, output some numbers to aid with debugging   ! S5
    ! protected by printstatus>=PrNorm and pe=0
    WRITE(ummessage,'(A,I0)') 'xlen=', xlen   ! S12
    CALL umprint(ummessage, level=PrNorm, pe=0, src='example_mod')   ! S13
    WRITE(ummessage,'(A,I0)') 'ylen=', ylen
    CALL umprint(ummessage, level=PrNorm, pe=0, src='example_mod')
    IF (l_loud) CALL umprint(my_char, level=PrNorm, src='example_mod')

    ! Allocate and initialise scaling array   ! S5
    ! Noddy code warns user when scaling is not employed.
    IF (l_unscale) THEN   ! S9
      icode = -100 ! set up WARNING message
      ALLOCATE(field(1,1))   ! S8
      cmessage = 'Scaling is switched off in run!'
      CALL ereport(RoutineName, icode, cmessage)   ! S19
    ELSE
      ALLOCATE(field(xlen, ylen))   ! S8
      DO j = 1, ylen   ! S9
        DO i = 1, xlen   ! S9
          field(i, j) = (1.0*i) + (2.0*j)   ! S4
          input2(i, j) = input2(i, j) * field(i, j)
        END DO
      END DO
    END IF

    ! The main calculation of the routine, using OpenMP.   ! S5
    !$OMP PARALLEL DEFAULT(NONE)                            &   ! S15
    !$OMP SHARED(xlen, ylen, input1, input2, field, output) &
    !$OMP PRIVATE(i, j)   ! S15
    !$OMP DO SCHEDULE(STATIC)
    DO j = 1, ylen
      i_loop: DO i = 1, xlen   ! S9
        ! Calculate the Output value:
        output(i, j) = (input1(i, j) * input2(i, j))
      END DO i_loop
    END DO ! j loop
    !$OMP END DO   ! S15
    !$OMP END PARALLEL   ! S15

    DEALLOCATE (field)   ! S8

    IF (lhook) CALL dr_hook(ModuleName//':'//RoutineName, zhook_out, zhook_handle)   ! S14
    RETURN
    END SUBROUTINE example   ! S4
    END MODULE example_mod   ! S4
UM programming standards; Code Layout, Formatting, Style and Fortran features#
This section outlines the programming standards you should adhere to when developing code for inclusion within the Unified Model. The rules set out in this section aim to improve code readability and ensure that UM code is compatible with the Fortran 2003 standard.
S1. Source files should only contain a single program unit#
Modules may be used to group related variables, subroutines and functions. Each separate file within the source tree should be uniquely named.
The name of the file should reflect the name of the programming unit. Multiple versions of the same file should be named filename-#ver where #ver is the section/version number (e.g. 1a,2a,2b…). For example:
<filename-#ver>.F90 when writing a <subroutine>
<filename_mod-#ver>.F90 when writing a <module_mod>
<existing filename>.F90 with <module_mod> only if upgrading an existing subroutine, since Subversion does not handle renaming of files very well and this allows the history of the file to be easily retrieved.
This makes it easier to navigate the UM code source tree for given routines.
You should avoid naming your program units and variables with names that match an intrinsic FUNCTION, SUBROUTINE or MODULE. We recommend the use of unique names within a program unit.
You should also avoid naming your program units and variables with names that match a keyword in a Fortran statement.
Avoid giving program units names that are likely to be used as variable names elsewhere in the code, e.g. field or string. This makes searching the code difficult and can cause the code browser to make erroneous connections between unrelated routines.
Subroutines should be kept reasonably short, where appropriate, say up to about 100 lines of executable code, but don’t forget there are start up overheads involved in calling an external subroutine so they should do a reasonable amount of work.
S2. Headers#
All programming units require a suitable copyright header. Met Office derived code should use the standard UM copyright header as depicted in the good example code. Collaboratively developed UM code (e.g. UKCA code) may require alternative headers, as agreed in the collaboration agreements. The IPR (intellectual property rights) of UM code is important and needs to be protected appropriately.
Headers are an immensely important part of any code as they document what it does, and how it does it. You should write as much of the header as possible BEFORE writing the code, as this will focus your mind on what you are doing and how you intend to do it!
The description of the MODULE and its contained SUBROUTINE may be the same and thus it need not be repeated in the latter. If a MODULE contains more than one subroutine then further descriptions are required.
History comments should not be included in the header or routine code. Version control provides the history of our codes.
Code author names should NOT be included explicitly within the code as they quickly become out of date and are sometimes misleading. Instead we reference a single maintainable text file which is included within the UM code repository.
    ! Code Owner: Please refer to the UM file CodeOwners.txt
    ! This file belongs in section: <section_name_to_be_entered>
Example UM templates are provided with the source of this document; subroutine, function and module templates.
S3. Free source form#
All code should be written using the free source form.
Please restrict code to 80 columns, so that your code can be easily viewed on any editor and screen and can be printed easily on A4 paper. Note that CreateBC uses a limit of 100 columns, due to the nature of the object-orientated code.
Never put more than one statement on a line.
Write your program in UK English, unless you have a very good reason for not doing so. Write your comments in simple UK English and name your program units and variables based on sensible UK English words. Always bear in mind that your code may be read by people who are not proficient English speakers.
S4. Fortran style#
To improve readability, write your code using the ALL CAPS Fortran keywords approach. The rest of the code may be written in either lower-case with underscores or CamelCase. This approach has the advantage that Fortran keywords stand out.
To improve readability, you should always use the optional space to separate the Fortran keywords. The full list of Fortran keywords with an optional space is:
    ELSE IF         END DO            END FORALL         END FUNCTION
    END IF          END INTERFACE     END MODULE         END PROGRAM
    END SELECT      END SUBROUTINE    END TYPE           END WHERE
    SELECT CASE     ELSE WHERE        DOUBLE PRECISION   END ASSOCIATE
    END BLOCK       END BLOCK DATA    END ENUM           END FILE
    END PROCEDURE   GO TO             IN OUT             SELECT TYPE
Note that not all of these are approved or appropriate for use in UM code. This rule also applies to OpenMP keywords. (See: S15)
The full version of END should be used at all times, e.g. END SUBROUTINE <name> and END FUNCTION <name>.
New code should be written using Fortran 95/2003 features. Avoid non-portable vendor/compiler extensions.
When writing a REAL literal with an integer value, put a 0 after the decimal point (i.e. 1.0 as opposed to 1.) to improve readability.
Avoid using obsolescent features of the Fortran language, instead make use of F95/2003 alternatives. For example, statement functions are among the list of obsolescent features in the F95 standard and these can be replaced by FUNCTIONs within appropriate MODULEs.
Do not use archaic forms of intrinsic functions. For example, LOG() should be used in place of ALOG(), MAX() instead of AMAX1(), REAL() instead of FLOAT() etc.
Never use the PAUSE statement.
Never use the STOP statement, see S19.
The standard delimiter for namelists is /. In particular, note that &END is non-standard and should be avoided. For further information on namelists please refer to Runtime namelist variables, defaults, future development.
Only use the generic names of intrinsic functions, avoid the use of ‘hardware’ specific intrinsic functions. Use the latter if and only if there is an optimisation benefit, and then it must be protected by a platform specific CPP flag (see S17).
S5. Comments and white spacing#
Always comment code!
Start comments with a single !. The indentation of whole line comments should match that of the code.
Use spaces and blank lines where appropriate to format your code to improve readability.
Never use tabs within UM code as the tab character is not in the Fortran character set. If your editor inserts tabs automatically, you should configure it to switch off the functionality when you are editing Fortran source files.
Line up your statements, where appropriate, to improve readability.
S6. The use of modules#
MODULEs are strongly encouraged as the mainstay of future UM code program units, making use of the implicit INTERFACE checking and removing the need for the ! DEPENDS ON directive. Argument lists within SUBROUTINE CALLs may also shorten.
You are expected to USE <module>, ONLY : <variables> and variables should be imported from the module in which they were originally declared, thus enabling a code audit trail of variables around the UM code.
For code portability, be careful not to USE <module> twice in a routine for the same MODULE, especially where using ONLY. This can lead to compiler Warning and Error messages.
Where possible, module variables and procedures should be declared PRIVATE. This avoids unnecessary export of symbols, promotes data hiding and may also help the compiler to optimise the code.
The use of derived types is encouraged, to group related variables and their use within Modules.
Review your use of arguments within subroutine calls, could some be simplified by using Modules?
Before writing your Module, check the UM source that no one has already created a Module to do what you want. For example do not declare a new variable/parameter without checking if it is already available in a suitable UM module.
Global type constants (e.g. \(g\) and \(\pi\)) should be maintained at a high level within the UM code and not duplicated within modules at the code section level; USE <insert global consts module name here> instead. Only section specific constants should be maintained at the section level.
When calling another Subroutine or an External Function, a ! DEPENDS ON directive is required within the Unified Model prior to the CALL, unless the Subroutine or Function is wrapped within a Module, in which case USE the module instead. For example:
    ! DEPENDS ON: gather_field_gcom
    CALL gather_field_gcom(local_field, global_field,   &
                           local_row_len, local_rows,   &
                           global_row_len, global_rows, &
                           grid_type, halo_type,        &
                           gather_pe, proc_group,       &
                           icode, cmessage)
Avoid the introduction of additional COMMON blocks. Developers should now be using MODULEs.
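The module guidance in this section can be illustrated with a minimal sketch. It is not UM code: the module name column_stats_mod, the derived type column_type, the function column_mean and the constant max_levels are all hypothetical names chosen for illustration. The sketch shows a default PRIVATE statement, an explicit PUBLIC list and a derived type used to group related variables.

```fortran
MODULE column_stats_mod

IMPLICIT NONE

! Hide everything by default and export only what callers need.
PRIVATE
PUBLIC :: column_type, column_mean

! Hypothetical section-level constant; stays private to this module.
INTEGER, PARAMETER :: max_levels = 10

! Derived type grouping related variables (illustrative only).
TYPE :: column_type
  INTEGER :: n_levels = 0            ! Number of levels in use
  REAL    :: theta(max_levels) = 0.0 ! Profile values
END TYPE column_type

CONTAINS

! Mean of the first n_levels values, or 0.0 for an empty column.
FUNCTION column_mean(column) RESULT(mean)
TYPE(column_type), INTENT(IN) :: column ! Column to average

REAL :: mean                            ! Function result

IF (column%n_levels > 0) THEN
  mean = SUM(column%theta(1:column%n_levels)) / REAL(column%n_levels)
ELSE
  mean = 0.0
END IF
END FUNCTION column_mean

END MODULE column_stats_mod
```

A caller would import only what it needs, for example USE column_stats_mod, ONLY: column_type, column_mean, which also provides the explicit interface to column_mean without a ! DEPENDS ON directive.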
S7. Argument and variable declaration#
Use IMPLICIT NONE in all program units. This forces you to declare all your variables explicitly. This helps to reduce bugs in your program that will otherwise be difficult to track.
Use meaningful variable names to aid code comprehension.
Variables should not use Fortran keywords or intrinsic functions for their name. For example, a variable should not be named size, because there is already a Fortran intrinsic function called SIZE().
For the purposes of variable naming, “Fortran keywords or intrinsic functions” shall refer to the set of all keywords and functions, from all Fortran Standard versions (including all past and future versions, not just Fortran 2003). For example, the ASSIGN keyword was deleted in Fortran 95, but assign still should not be used as a variable name.
All variables must be declared, and commented with a brief description. This increases understandability and reduces errors caused by misspellings of variables.
Use INTENT in declaring arguments as this allows for checks to be done at compile time.
Arguments should be declared separately from local variables.
Subroutine arguments should be declared in the same order in the header as they appear in the subroutine statement. This order is not random but is determined by intent, variable dimensions and variable type. All input arguments come first, followed by all input/output arguments and then all output arguments. The exception is any OPTIONAL arguments, which should be appended to the end of the argument list. If more than one OPTIONAL argument is used then one should also use keywords so that the OPTIONAL arguments are not tied to a specific ‘position’ near the end of the argument list.
As OPTIONAL arguments are possible when using MODULEs (an interface is required) there is no requirement in future for DUMMY arguments and glue routines.
It is recommended that one uses local variables in routines which are set to the values of optional arguments in the code if present, otherwise a default value is used. This removes the requirement to always use PRESENT when using the optional argument.
Within each section of the header, variables of a given type should be grouped together. These groups must be declared in the order INTEGER, REAL, LOGICAL and then CHARACTER, with each grouping separated by a blank line. In general variables should be declared one per line. Use a separate type statement for each line as this makes it easier to copy code around (you can always use the editor to repeat a line to save typing the type statement again) and prevents you from running out of continuation lines.
If an array is dimensioned by another variable, ensure that the variable is declared first.
The EXTERNAL statement should not be used for subroutines although it is allowed for functions, again for code portability.
Avoid the DIMENSION attribute or statement. Declare the dimension with the declared variables which improves readability.
Common practice
INTEGER, DIMENSION(10,20) :: a, b, c
Better approach
INTEGER :: a(10, 20), b(10, 20), c(10, 20)
Initialisation in the declaration of a variable should only be done after considering whether or not it is to be initialised only on the first encounter of the variable. Fortran automatically adds the SAVE attribute to a variable initialised in this way. This is especially important in OpenMP and when you expect the variable to be reset every time the routine is entered. POINTERs are also affected so please be aware of the effects.
Character strings must be declared with a length when stored in an array.
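The implicit SAVE behaviour of initialisation in a declaration can be seen in a minimal sketch. The names save_demo_mod, count_saved and count_reset are hypothetical and are not part of the UM.

```fortran
MODULE save_demo_mod

IMPLICIT NONE

PRIVATE
PUBLIC :: count_saved, count_reset

CONTAINS

! n_calls is initialised in its declaration, so it implicitly has the
! SAVE attribute: it is set to 0 once only and keeps its value
! between calls.
FUNCTION count_saved() RESULT(n)
INTEGER :: n           ! Function result
INTEGER :: n_calls = 0 ! Implicitly SAVEd counter

n_calls = n_calls + 1
n = n_calls
END FUNCTION count_saved

! n_calls is assigned in the executable code, so it is reset on
! every entry to the routine.
FUNCTION count_reset() RESULT(n)
INTEGER :: n       ! Function result
INTEGER :: n_calls ! Counter reset on each call

n_calls = 0
n_calls = n_calls + 1
n = n_calls
END FUNCTION count_reset

END MODULE save_demo_mod
```

Two successive calls to count_saved return 1 and then 2, whereas count_reset returns 1 each time. In an OpenMP region the saved counter would also be shared between threads, which is why this form of initialisation needs care.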
If an argument list has a dummy argument that makes use of incoming data (whether or not it has an explicit INTENT) and another argument explicitly declared INTENT(OUT), do not use the same variable as the actual argument to both dummy arguments (“aliasing”). Some compilers will reinitialise all INTENT(OUT) variables on entry, destroying the incoming data.
Example subroutine:
    SUBROUTINE foo(m,n)
    REAL, INTENT(IN)  :: m
    REAL, INTENT(OUT) :: n
Bad practice:
CALL foo(a,a)
Safe approach:
    b = a
    CALL foo(b,a)
S8. Allocatables#
When Allocating and deallocating, use a separate ALLOCATE and DEALLOCATE statement for each array.
When using the ALLOCATE statement, ensure that any arrays passed to subroutines have been allocated, even if it’s anticipated that they won’t be used.
    IF (L_mcr_qrain) THEN
      ALLOCATE ( mix_rain_phys2(1-offx:row_length+offx,        &
                                1-offy:rows+offy, wet_levels) )
    ELSE
      ALLOCATE ( mix_rain_phys2(1,1,1) )
    END IF
    ! DEPENDS ON: do_something
    CALL do_something(row_length, rows, wet_levels,            &
                      offx, offy, mix_rain_phys2 )
To prevent memory fragmentation ensure that allocates and deallocates match in reverse order.
    ALLOCATE ( A(row_length,rows,levels) )
    ALLOCATE ( B(row_length,rows,levels) )
    ALLOCATE ( C(row_length,rows,levels) )
    ....
    DEALLOCATE ( C )
    DEALLOCATE ( B )
    DEALLOCATE ( A )
Where possible, an ALLOCATE statement for an ALLOCATABLE array (or a POINTER used as a dynamic array) should be coupled with a DEALLOCATE within the same scope. If an ALLOCATABLE array is a PUBLIC MODULE variable, it is highly desirable for its memory allocation and deallocation to be performed only in procedures within the MODULE in which it is declared. You may consider writing specific SUBROUTINEs within the MODULE to handle this memory management.
Always define a POINTER before using it. You can define a POINTER in its declaration by pointing it to the intrinsic function NULL() (also see advice in S7). Alternatively, you can make sure that your POINTER is defined or nullified early on in the program unit. Similarly, NULLIFY a POINTER when it is no longer in use, either by using the NULLIFY statement or by pointing your POINTER to NULL().
New operators can be defined within an INTERFACE block.
ASSOCIATED should only be used on initialised pointers. Uninitialised pointers are undefined and ASSOCIATED can have different effects on different platforms.
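The pointer advice in this section can be sketched as follows. This is an illustration only: pointer_demo_mod, sum_or_zero and the local array work are hypothetical names. The pointer is nullified in the executable code rather than initialised to NULL() in its declaration, which avoids giving it the implicit SAVE attribute.

```fortran
MODULE pointer_demo_mod

IMPLICIT NONE

PRIVATE
PUBLIC :: sum_or_zero

CONTAINS

! Returns the sum of the target array if the pointer is associated,
! and 0.0 otherwise.
FUNCTION sum_or_zero(use_data) RESULT(total)
LOGICAL, INTENT(IN) :: use_data ! Point at the data when .TRUE.

REAL :: total                   ! Function result

REAL, TARGET  :: work(3)        ! Illustrative data to point at
REAL, POINTER :: p(:)           ! Pointer used as an array alias

! Give the pointer a defined (disassociated) status before any use.
NULLIFY(p)

work = [1.0, 2.0, 3.0]
IF (use_data) p => work

! ASSOCIATED is safe here because p has a defined status.
IF (ASSOCIATED(p)) THEN
  total = SUM(p)
ELSE
  total = 0.0
END IF

! Nullify the pointer once it is no longer in use.
NULLIFY(p)
END FUNCTION sum_or_zero

END MODULE pointer_demo_mod
```

Without the first NULLIFY the association status of p would be undefined whenever use_data is .FALSE., and the ASSOCIATED test could then behave differently between platforms.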
S9. Code IF blocks, DO LOOPs, and other constructs#
The use of comments is required for both large DO loops and large IF blocks; those spanning 15 lines or more, see S5.
Indent blocks of code by 2 characters.
Use the newer forms of the relational operators for LOGICAL comparisons:
    ==  instead of  .EQ.
    /=  instead of  .NE.
    >   instead of  .GT.
    <   instead of  .LT.
    >=  instead of  .GE.  (do not use =>)
    <=  instead of  .LE.  (do not use =<)
Positive logic is usually easier to understand. When using an IF-ELSE-END IF construct you should use positive logic in the IF test, provided that the positive and the negative blocks are about the same length.
Common practice
    IF (my_var /= some_value) THEN
      CALL do_this()
    ELSE
      CALL do_that()
    END IF
Better approach
    IF (my_var == some_value) THEN
      CALL do_that()
    ELSE
      CALL do_this()
    END IF
Where appropriate, simplify your LOGICAL assignments, for example:
Common practice
    IF (my_var == some_value) THEN
      something      = .TRUE.
      something_else = .FALSE.
    ELSE
      something      = .FALSE.
      something_else = .TRUE.
    END IF
    ! ...
    IF (something .EQV. .TRUE.) THEN
      CALL do_something()
      ! ...
    END IF
Better approach
    something      = (my_var == some_value)
    something_else = (my_var /= some_value)
    ! ...
    IF (something) THEN
      CALL do_something()
      ! ...
    END IF
Avoid the use of ‘magic numbers’, that is numeric constants hard wired into the code. These are very hard to maintain and obscure the function of the code. It is much better to assign the ‘magic number’ to a variable or constant with a meaningful name and then to use this throughout the code. In many cases the variable will be assigned in a top level control routine and passed down via an include file or module. This ensures that all subroutines will use the correct value of the numeric constant and that alteration of it in one place will be propagated to all its occurrences. Unless the value needs to be alterable whilst the program is running (e.g. is altered via I/O such as a namelist) the assignment should be made using a PARAMETER statement.
Poor Practice
    IF (ObsType == 3) THEN
Better Approach
    ...specify in the header local constant section....
    INTEGER, PARAMETER :: SurfaceWind = 3 !No. for surface wind
    ...and then use in the logical code...
    IF (ObsType == SurfaceWind) THEN
Similarly avoid the use of ‘magic logicals’ in CALLs to subroutines. Such use makes the code less readable and developers are required to look at the called subroutine to find what has been set to either .TRUE. or .FALSE..
Poor Practice
    CALL Phys(.FALSE.,.TRUE.,icode)
Better Approach
    ...specify in the header local constant section....
    ...meaningful logical names, perhaps base them on what is used in the called subroutine
    LOGICAL, PARAMETER :: bl_is_off = .FALSE.
    LOGICAL, PARAMETER :: conv_is_on = .TRUE.
    ...and then use in the relevant subroutine calls...
    CALL Phys(bl_is_off, conv_is_on, icode)
Be careful when comparing real numbers using ==. To avoid problems related to machine precision, a threshold on the difference between the two numbers is often preferable, e.g.
Common practice
    IF ( real1 == real2 ) THEN
      ...
    END IF
Better approach
    IF ( ABS(real1 - real2) < small_number ) THEN
      ...
    END IF
where small_number is some suitably small number. In most cases, a suitable value for small_number can be obtained using the Fortran intrinsic functions EPSILON or TINY.
The UM perturbation sensitivity project is currently in the process of identifying coding issues that lead to excessive perturbation growth in the model. Currently, all problems are emerging at IF tests that contain comparisons between real numbers. Typical, real case UM examples of what can go wrong are detailed in Appendix C of this document.
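One possible way of choosing small_number for the threshold comparison is sketched below. It is an assumption for illustration, not a UM routine: the names real_compare_mod, nearly_equal and tol_factor are hypothetical, and the choice of a relative threshold of ten times EPSILON is arbitrary.

```fortran
MODULE real_compare_mod

IMPLICIT NONE

PRIVATE
PUBLIC :: nearly_equal

CONTAINS

! .TRUE. when a and b differ by no more than a few units of machine
! precision, scaled by the larger magnitude of the two values.
FUNCTION nearly_equal(a, b) RESULT(l_equal)
REAL, INTENT(IN) :: a ! First value
REAL, INTENT(IN) :: b ! Second value

LOGICAL :: l_equal    ! Function result

REAL, PARAMETER :: tol_factor = 10.0 ! Assumed safety factor

REAL :: small_number  ! Threshold on the difference

small_number = tol_factor * EPSILON(a) * MAX(ABS(a), ABS(b))

! <= rather than < so that two exact zeros compare as equal.
l_equal = (ABS(a - b) <= small_number)
END FUNCTION nearly_equal

END MODULE real_compare_mod
```

Because the threshold here is relative, it shrinks towards zero as both values approach zero. Code that compares values near zero may need an additional absolute floor, for example one based on TINY, and the appropriate tolerance always depends on the science being coded.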
Loops must terminate with an END DO statement. To improve the clarity of program structure you are encouraged to add labels or comments to the DO and END DO statements.
    DO i = 1, 100
      j_loop: DO j = 1, 10
        DO k = 1, 10
          ...code statements...
        END DO ! k
      END DO j_loop
    END DO ! outer loop i
EXIT statements must be labelled. This is both for clarity, and to ensure consistency of behaviour. (The semantics of the EXIT statement change between revisions of the Fortran standard.)
    i_loop: DO i = 1, 10
      IF (i > 3) EXIT i_loop
    END DO i_loop
Avoid the use of the GO TO statement.
The only acceptable use of GO TO is to jump to the end of a routine after the detection of an error, in which case you must use 9999 as the label (then everyone will understand what GO TO 9999 means).
UM Error reporting guidance is detailed in S19.
Avoid assigned GO TO, computed GO TO, arithmetic IF, etc. Use the appropriate modern constructs such as IF, WHERE, SELECT CASE, etc.
Where possible, consider using CYCLE, EXIT or a WHERE construct to simplify complicated DO loops.
Be aware that the logical expressions within an IF condition can be evaluated in any order. So testing that an array index is above the lower bound and using that index in the same condition is not safe.
Common approach
    DO j = 1, rows
      DO i = 1, row_length
        IF (cloud_level(i,j) > 0 .AND. cloud(i,j,cloud_level(i,j)) == 0.0) THEN
          cloud(i,j,cloud_level(i,j)) = 1.0
        END IF
      END DO
    END DO
Better approach
    DO j = 1, rows
      DO i = 1, row_length
        IF (cloud_level(i,j) > 0) THEN
          IF (cloud(i,j,cloud_level(i,j)) == 0.0) THEN
            cloud(i,j,cloud_level(i,j)) = 1.0
          END IF
        END IF
      END DO
    END DO
Array initialisations and literals should use the [] form rather than the (//) form. For example:
    INTEGER :: i_array(3) = [1,2,3]
S10. Line continuation#
The only symbol to be used as a continuation line marker is ‘&’ at the end of a line. It is suggested that you align these continuation markers to aid readability. Do not add a second ‘&’ to the beginning of the next line. This advice also applies to blocks of Fortran code protected by the OpenMP sentinel ‘!$’. The only currently allowed exception is to continuation lines used with OpenMP directives, i.e. ‘!$OMP’, where the ‘&’ marker may optionally be used. Please see section S15 for more advice on OpenMP.
Short and simple Fortran statements are easier to read and understand than long and complex ones. Where possible, avoid using continuation lines in a statement.
Try to avoid string continuations and spread the string across multiple lines using concatenations (//) instead.
When calling functions or subroutines, ensure the left parenthesis is on the same line as the subprogram’s name, and not after a continuation marker. This helps the code browser to parse the source tree correctly.
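The advice on long strings can be sketched as below. The module name long_string_mod, the constant long_message and the message text are hypothetical. The string is built from two complete literals joined with //, so the only continuation markers are at the ends of lines and no literal is itself split.

```fortran
MODULE long_string_mod

IMPLICIT NONE

PRIVATE
PUBLIC :: long_message

! Discouraged form, shown as a comment only: one literal continued
! with a trailing and a leading '&'.
!   'Scaling is switched off in this run; &
!   &output fields will not be rescaled.'

! Preferred form: separate literals joined by concatenation.
CHARACTER(LEN=*), PARAMETER :: long_message =                     &
    'Scaling is switched off in this run; '                       &
    // 'output fields will not be rescaled.'

END MODULE long_string_mod
```

Any blank that is wanted between the two parts has to be written explicitly inside one of the literals, as with the trailing blank after the semicolon here.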
S11. Fortran I/O#
When calling OPEN, ensure that the ACTION argument is specified. In particular, ACTION='READ' shall be used for files that are opened only for reading as this reduces file locking costs.
Don’t check for the existence of a file by using INQUIRE if the only action you’ll take if the file doesn’t exist is to report an error. Rather use OPEN( ... , IOSTAT=icode, IOMSG=iomessage) and include the iomessage in an error message if icode is non-zero. This will capture a wider range of errors with fewer filesystem metadata accesses.
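The OPEN guidance above might look like the following minimal sketch. The names open_demo_mod and open_for_read, the fixed length of iomessage and the wording of the error text are assumptions made for illustration. Within the UM the resulting icode and cmessage would normally be handed to the error reporting described in S19 rather than returned to the caller.

```fortran
MODULE open_demo_mod

IMPLICIT NONE

PRIVATE
PUBLIC :: open_for_read

CONTAINS

! Opens filename read-only on the given unit. On failure icode is
! non-zero and cmessage holds the reason reported by the runtime.
SUBROUTINE open_for_read(unit_no, filename, icode, cmessage)
INTEGER, INTENT(IN)           :: unit_no  ! Unit to open the file on
CHARACTER(LEN=*), INTENT(IN)  :: filename ! File to open
INTEGER, INTENT(OUT)          :: icode    ! Zero on success
CHARACTER(LEN=*), INTENT(OUT) :: cmessage ! Error text, blank on success

CHARACTER(LEN=256) :: iomessage           ! Message from the runtime

iomessage = ''
cmessage = ''

! No INQUIRE beforehand: a missing file is reported through IOSTAT.
OPEN(UNIT=unit_no, FILE=filename, ACTION='READ', STATUS='OLD',    &
     IOSTAT=icode, IOMSG=iomessage)

IF (icode /= 0) THEN
  cmessage = 'Failed to open file ' // TRIM(filename) // ': '     &
             // TRIM(iomessage)
END IF
END SUBROUTINE open_for_read

END MODULE open_demo_mod
```

On success the file is left open on unit_no and the caller remains responsible for closing it.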
S12. Formatting and output of text#
Writing output to the “stdout” stream, commonly unit 6 in Fortran, must use the provided API, which is accessible by including USE umPrintMgr in the calling code.
Single string output should be written as
CALL umprint('Hello',src='routine_name')
where ‘routine_name’ is the name of the current subroutine or function. Routines which implement DrHook (section S14) will already have a PARAMETER 'RoutineName' which can be used for this purpose.
Multi-component output must first be written to an internal file via a WRITE statement. The umPrintMgr module provides a convenient string for this purpose, umMessage, though you may use your own.
    WRITE (ummessage,'(A,I0,A)') 'I am ', age, ' years old'
    CALL umprint(ummessage,src='routine_name')
Avoid the use of WRITE (ummessage,*).
Always add formatting information to your write statements. It is important to ensure that the output message fits within the space given. Some compilers will pad unformatted values with leading blanks, which can greatly increase the width of any output. Writes to internal files may cause the program to abort if the message is longer than the string provided.
Use dynamic-width edit descriptors where possible, to avoid truncating strings or failing to print integer or real values correctly:
Use A for character input and output, rather than e.g. A7.
Use I0 for integer output, rather than e.g. I3.
Use F0.\(n\) for real output, rather than e.g. F14.\(n\). Other real edit descriptors such as E, EN and ES can also be used but do not accept a 0 field width.
This is particularly important in any routine where missing data indicators may be present, which will typically require a much larger width than other data.
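The dynamic-width edit descriptors can be tried outside the UM with a plain internal write, as in this minimal sketch. The names format_demo_mod, describe_point and fmt_point are hypothetical, and the umPrintMgr API is deliberately not used so that the example is self-contained.

```fortran
MODULE format_demo_mod

IMPLICIT NONE

PRIVATE
PUBLIC :: describe_point

CONTAINS

! Writes a short description of a value into message using
! dynamic-width edit descriptors only.
SUBROUTINE describe_point(level, field_value, message)
INTEGER, INTENT(IN) :: level               ! Model level index

REAL, INTENT(IN) :: field_value            ! Value to report

CHARACTER(LEN=*), INTENT(OUT) :: message   ! Formatted text

! The format holds only edit descriptors; all text is passed as data.
CHARACTER(LEN=*), PARAMETER :: fmt_point = '(A,I0,A,F0.2)'

WRITE(message, fmt_point) 'level ', level, ' has value ', field_value
END SUBROUTINE describe_point

END MODULE format_demo_mod
```

With I0 and F0.2 a large negative value, of the kind often used as a missing data indicator, is written in full rather than as a field of asterisks. The caller still has to supply a message string long enough to hold the result.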
The character variable newline (from the umPrintMgr module) is recognised as a newline if embedded in the string passed to umPrint.
The total line length should not exceed 80 characters. Use newline or separate calls to umprint to keep long messages easily readable.
CHARACTER values should not contain vertical space, nor should edit descriptors be used for carriage control. Use newline to control vertical space:
WRITE(ummessage, '(A)') newline // 'This should stand out.' // newline
CALL umprint(ummessage,src='routine_name')
Calls to umPrint should be protected by a suitable setting of the PrintStatus variable (see S13), either with conditional logic or an additional level argument:
CALL umprint(ummessage,src='routine_name',level=PrOper)
If your output is not required from each processor, protect the umPrint call either with logic or with an additional pe argument, for example:
! We'll only output at diagnostic level on pe0
CALL umprint(ummessage,src='routine_name',level=PrDiag,pe=0)
Never use a FORMAT statement: they require the use of labels, and obscure the meaning of the I/O statement. The formatting information can be placed explicitly within the READ, WRITE or PRINT statement, or be assigned to a CHARACTER variable in a PARAMETER statement in the header of the routine for later use in I/O statements.
Never place output text within the format specifier: i.e. only format information may be placed within the FMT= part of an I/O statement; all variables and literals, including any character literals, must be ‘arguments’ of the I/O routine itself. This improves readability by clearly separating what is to be read/written from how to read/write it.
Common practice
WRITE(Cmessage, &
& '("Cannot run with decomposition ",I3," x ",I3, &
& " (",I3,") processors. ", &
& "Maxproc is ",I3," processors.")') &
& nproc_EW,nproc_NS,nproc_EW*nproc_NS,Maxproc
Better approach
WRITE(cmessage,'(4(A,I0),A)') &
  'Cannot run with decomposition ',nproc_ew,'x',nproc_ns, &
  '(',nproc_ew*nproc_ns,') processors. Maxproc is ',maxproc, &
  ' processors.'
In order to flush output buffers, the routine umprintflush should be used for “stdout” written via umprint and UM_FORT_FLUSH for data written to any other Fortran unit. These routines abstract flush operations, providing a portable interface. These are the only methods of flushing that should be used.
S13. PrintStatus#
There are four different settings of PrintStatus used in the UM, each of which
is assigned a numeric value. There is a shorter form available for each one.
These are defined as PARAMETERs and so can be tested using constructs
similar to:
IF (PrintStatus >= PrStatus_Normal) THEN
For “stdout”, they can also be provided as an argument to umprint. The
current value of PrintStatus is stored in the variable PrintStatus in the
aforementioned module, and set using the gui and/or input namelist. Note that
the utility executables operate at a fixed value of PrintStatus and that
output choices in code shared with these utilities will impact their
behaviour.
The different settings are:
PrStatus_Min or PrMin - This setting is intended to produce minimal output and should hence be only used for output which is required in every run. Users running with this setting should expect to have to rerun with a more verbose setting to diagnose any problems. Fatal error messages should fall into this category, but otherwise it should not generally be used by developers.
PrStatus_Normal or PrNorm - The “standard” setting of PrintStatus. Messages with this setting should be important for all users in every run. Information output using this setting should summarise the situation - more detailed information should be protected by PrStatus_Diag instead.
PrStatus_Oper or PrOper - Slightly more detailed than PrStatus_Normal, this is intended for messages which are not required for research users but are needed when running operationally.
PrStatus_Diag or PrDiag - The most verbose option; all messages which do not fall into one of the above categories should use this setting. Non-essential, detailed information about values of variables, status messages, etc. should be included in this category. If a developer adds code to assist debugging problems, it should also be protected by PrStatus_Diag.
S14. DrHook#
DrHook is a library written by ECMWF which can produce run-time information such as:
Per-routine profiling information based on walltime, CPU-time and MFlops.
Tracebacks in the event of code failure. A developer can force a traceback at any point in the code with an appropriate call to the DrHook library.
Memory usage information.
For DrHook to be effective, calls to the library are needed in each individual subroutine. DrHook must be called:
At the start of each routine, before any other executable code.
At each exit point from the routine; not only at the end, but just before any other RETURN statements.
When adding DrHook to a routine, the following rules should be followed:
Routines contained in modules should include the name of the module in the call to DrHook, colon-separated. E.g. 'MODULE_NAME:ROUTINE_NAME'.
All names should be in capitals.
The necessary instrumentation code and the recommended method of implementing it is shown below.
CHARACTER(LEN=*), PARAMETER, PRIVATE :: ModuleName = 'MODULE_NAME'
CONTAINS
...
USE parkind1, ONLY: jpim, jprb
USE yomhook, ONLY: lhook, dr_hook
...
CHARACTER(LEN=*), PARAMETER :: RoutineName = 'ROUTINE_NAME'
INTEGER(KIND=jpim), PARAMETER :: zhook_in = 0
INTEGER(KIND=jpim), PARAMETER :: zhook_out = 1
REAL(KIND=jprb) :: zhook_handle
IF (lhook) CALL dr_hook(ModuleName//':'//RoutineName,zhook_in,zhook_handle)
...
IF (lhook) CALL dr_hook(ModuleName//':'//RoutineName,zhook_out,zhook_handle)
The example subroutine shown in How to meet the coding standards demonstrates DrHook instrumentation.
Calls to DrHook add a very small overhead to the code, and so should normally only be added to routines that do a non-trivial amount of work. Adding DrHook calls to very small routines may represent a large increase in the workload of those routines, and furthermore if those routines are called many thousands of times during a single run of the UM then this will generate large amounts of duplicate data. The developer and reviewer may decide it is unnecessary to include DrHook calls in such routines.
Note that there is no benefit to adding DrHook calls to a module that consists only of Fortran declarations and lacks any executable code.
DrHook calls should not be added to RECURSIVE routines as they are likely
to cause runtime errors.
S15. OpenMP#
OpenMP is a very powerful technology for introducing shared memory parallelism to a code, but it does have some potential for confusion. To help minimise this, the following should be adhered to:
Only use the OpenMP 3.1 standard. Support for OpenMP 4.0 is not yet widespread, and implementations are somewhat immature.
Only use the !$OMP version of the directive and start at the beginning of the line (see previous general guidance on sentinels).
Never rely on the default behaviour for SHARED or PRIVATE variables. The use of DEFAULT(NONE) is preferred, with the type of all variables explicitly specified. A different DEFAULT may be allowed if the number of variables is very large (i.e. dozens).
Parameters by default are shared. To make this obvious it is helpful to list parameters used in the OMP block as a Fortran comment just before the PARALLEL region.
Always use explicit !$OMP END DO - don’t rely on implicit rules.
Unlike SINGLE regions, MASTER regions do not carry an implicit barrier at the end. Please add an !$OMP BARRIER directive immediately after !$OMP END MASTER directives. Barriers may be omitted for performance reasons if it is safe to do so.
Calls to OpenMP functions and module use should be protected by the OpenMP sentinel. That is, the line should start with !$ and a space. No other comment line should start with this combination.
Always specify the scheduler to be used for DO loops, since the default is implementation specific. A common default is STATIC. This is normally fine but can cause problems in certain cases.
As with non-OpenMP code, you should always use the optional space to separate the OpenMP keywords to improve readability. For example, PARALLELDO should become PARALLEL DO. (See also: S4)
Any use of a sentinel (including OpenMP) should start at the beginning of the line, e.g.
The following correctly uses the !$OMP sentinel at the beginning of the line.
IF (do_loop) THEN
!$OMP PARALLEL DO PRIVATE(i)
  DO i = 1, 100
    ...
  END DO
!$OMP END PARALLEL DO
END IF
Whilst the following can lead to compilers not using the lines starting with the !$OMP sentinel.
IF (do_loop) THEN
  !$OMP PARALLEL DO PRIVATE(i)
  DO i = 1, 100
    ...
  END DO
  !$OMP END PARALLEL DO
END IF
Careful use of the OpenMP reduction clauses is required as we want to try and preserve bit-comparison across different threads. This is not guaranteed with some REDUCTION clauses.
OpenMP directives in C code must be protected by both a SHUM_USE_C_OPENMP_VIA_THREAD_UTILS and an _OPENMP if-def. This ensures it is possible to select the use of only the Fortran OpenMP runtime library, which can prevent incompatibilities between different libraries. If possible, provide a Fortran implementation of the OpenMP parallelism as well, using the wrappers in the thread_utils module from SHUMlib. (Further rules apply; see OpenMP in C Code for more information.)
S16. MPI#
The Unified Model depends on the GCOM library for communications. GCOM has only modest functionality, however, so the use of MPI is permitted provided the following principles are adhered to:
Only use MPI via GCOM’s MPL interface layer. MPI libraries can be found that support only 32-bit arguments or only 64-bit arguments. MPL is designed to abstract this issue away.
Only use functionality from versions of MPI up to 3.1. These have widespread support.
S17. Preprocessing#
Preprocessor directives should only be used when their inclusion can be justified, e.g. machine dependent options or reducing duplication of a large code section, see S18.
Do not use preprocessing directives (#if, #include, #endif) for
selecting science code section versions. Do not use the #include directive to
pass a large list of arrays or to pass common items.
In particular:
“Must” use #if defined rather than #if. If the CPP flag does not exist the pre-processor evaluates the test to true.
Use run-time rather than compile time switches.
Do not replicate run-time switches with compile-time ones, so avoid
#if defined(OCEAN)
IF (submodel == ocean) THEN
#endif
...
#if defined(OCEAN)
END IF
#endif
Do not add optional arguments to subroutines protected by directives, instead migrate to FORTRAN 95/2003 code and make use of OPTIONAL argument functionality.
Put #if lines inside included files rather than around the #include itself.
Use directive names that clearly indicate their purpose.
When removing scientific sections, remove variables that were only needed for that section.
Do not wrap a routine within CPP flags. Let the compiler work out when it is required.
Please refrain from using consecutive question marks (??) in the source code as some preprocessors can interpret them as C trigraphs.
S18. Code duplication#
In the case of a large area of code that needs to be duplicated, e.g. the same
computation applied to different types, the use of the #include
preprocessing directive is recommended, provided the following rules are adhered to:
Only one include file per routine. If a routine needs multiple include files, consider dividing the routine into multiple smaller routines. The same include file cannot be used in multiple modules or routines. Consider creating a special routine with the shared code if needed.
Use *.h as a file extension for #include files since the build system will automatically recognise it.
File name should always be modulename_routinename.h. An accepted exception is when the module name and the routine name are the same, e.g. instead of routine_mod_routine.h use routine.h.
The include file should be located in a special include sub-directory where the Fortran module is located.
An include file should only be used for reducing code duplication, not for performance reasons. Let the compiler implement proper in-lining.
The following code shows an example on how to use the #include
preprocessing directive inside a module to reduce code duplication.
The module file my_mod.F90 in the src/path/to/mod directory with the duplicated routines:
INTERFACE calc_1
  MODULE PROCEDURE calc_1_32bit,calc_1_64bit
END INTERFACE
SUBROUTINE calc_1_32bit(r,n,d)
IMPLICIT NONE
INTEGER, PARAMETER :: prec = real32
#include "my_mod_calc_1.h"
END SUBROUTINE
SUBROUTINE calc_1_64bit(r,n,d)
IMPLICIT NONE
INTEGER, PARAMETER :: prec = real64
#include "my_mod_calc_1.h"
END SUBROUTINE
The included file my_mod_calc_1.h in the src/path/to/mod/include directory with the shared code:
! --- Begin shared body of calc_1 ---
REAL(KIND=prec), INTENT(OUT) :: r
REAL(KIND=prec), INTENT(IN) :: n
REAL(KIND=prec), INTENT(IN) :: d
r = n / d
! --- End shared body of calc_1 ---
S19. Error reporting#
The most important rule in error reporting is never to CALL abort or to
use STOP; these can cause problems in a parallel computing environment.
Where it is possible that errors may occur they should be detected and
appropriate action taken. Errors may be of two types: fatal errors requiring
program termination; and non-fatal warnings which allow the program to
continue. Both types are passed to a reporting routine ereport, which
takes different actions depending on the value of the error code passed to it
as an argument:
If the error code is > 0 an error message will be printed and the program will abort (hopefully with a traceback).
If the error code is < 0 a warning message will be printed, the error code variable will be reset to 0, and the program continues.
If the error code is 0 nothing happens and the program continues uninterrupted.
Both warnings and errors are sent to the .pe\(n\) file of the
processor generating the warning, which is stdout for processor 0 only.
Warnings will only appear in stderr if they occur on processor 0. Errors will
always appear in stderr. Note that if a warning occurs on a processor for
which output has been disabled using the print manager settings, then that
warning will not be printed as there will be no .pe\(n\) file to
send it to.
When using READ or OPEN or other Fortran intrinsics which deal with IO,
please use both the error status IOSTAT and the error message IOMSG
arguments, followed by code printing the latter if the former is non-zero. The
check_iostat subroutine provides a convenient way to do this; any non-zero
value of IOSTAT will cause it to print the return value of IOMSG and
abort the program.
The arguments of ereport are:
SUBROUTINE ereport (RoutineName, ErrorStatus, Message)
CHARACTER(LEN=*), INTENT(IN) :: RoutineName ! Name of the calling routine
CHARACTER(LEN=*), INTENT(IN) :: Message ! Error message for output
INTEGER, INTENT(IN OUT) :: ErrorStatus ! Error code
Ensure the error code variable is set to zero before use. This includes at the start of every routine where it is a local variable, and also before calling any routine that returns it (INTENT(IN OUT)).
Error messages should contain enough information to help the user diagnose and solve the problem.
Avoid splitting error information between stdout (umprint) and stderr (ereport). Keep the details in one place where possible. If the nature of the error requires large quantities of additional data in stdout to diagnose it properly, make this clear in the error message.
The variable errormessagelength in module errormessagelength_mod is provided for declaring the length of CHARACTER variables to be used with error reporting. This provides a longer string for holding e.g. the return value of an IOMSG argument.
Avoid using a namelist input value or the return code of another routine as the error code, especially if you do not know what values it may take. It may not be apparent to the user that the problem value is actually the error code, or what sign it originally had. Use a dedicated error code and include the return code or problematic value in the message itself.
Common practice:
IF (foo /= 0) THEN
  icode = ABS(foo)
  cmessage = 'Invalid input value for foo'
  CALL ereport(RoutineName, icode, cmessage)
END IF
Better approach:
IF (foo /= 0) THEN
  icode = 10
  WRITE(cmessage, '(A,I0)') 'Invalid input value for foo. Value received: ',foo
  CALL ereport(RoutineName, icode, cmessage)
END IF
Specific standards#
Runtime namelist variables, defaults, future development#
The UM reads in a number of run time ‘control’ namelists within READLSTA.F90.
Examples are the RUN_<physics> type namelists. When new science options
are required to be added to the UM the developer is expected to add the new
variable/parameter to the relevant RUN_<physics> namelist and declaration
in the corresponding module, updating READLSTA.F90 as required.
The use of cruntimc.h is to be avoided as this approach is being phased out
in favour of suitable modules.
Code development should use MODULES to define namelist LOGICALS, PARAMETERS and VARIABLES (and their defaults) along with the NAMELIST.
It is essential that defaults are set; items within namelists are expected to fall into 3 camps:
a. variable never actually changes; it is a default for all users
  this should be set in the code and removed from any input namelist.
b. variable rarely changes;
  set identified default within UM code, with comment explaining choice.
  We advise that these are not included in the namelist. A code change will be required to alter it.
c. regularly changes or is a new item and thus no default is yet suitable
  LOGICALs usually to FALSE
  variables set to RMDI or IMDI
  CHARACTER strings should be set to a default string. For example, aero_data_dir = 'aero data dir is unset'
For an example of preferred practice see RUN_Stochastic. The namelist variables
are all defined within a MODULE, stochastic_physics_run_mod.F90, including
default values.
Defensive input programming#
When real or integer values are read into the code by a namelist, the Rose metadata should either use a values list or a range so that the Rose GUI can warn the user of invalid values. These values should also be tested in the code to ensure that the values read in are valid. As it is possible to edit Rose namelists, or ignore Rose GUI warnings, the GUI should not be relied on for checking that input values of reals and integers are valid. It may also be appropriate to check logical values if a specific combination of logicals will cause an error for example.
The routine, chk_var, is available for developers to more easily check
their inputs. Checks made by chk_var should match any checks made by Rose;
however, checks by chk_var are made by the code and will, by default, abort
the run. Developers should refer to the um-training for
more information on chk_var.
Optimised namelist reading procedures#
As of UM9.1 the procedure to read UM namelists has been enhanced but this has implications for the code developer, requiring extra code changes when adding/removing a UM input namelist item. Tied with each namelist read is now the requirement for a ‘read_nml_routine’ usually found in the containing module of the namelist.
If a coder wishes to add a new variable to a namelist (xxxxxx) then the new read_nml_xxxxxx subroutine will need changing. The changes required are:
increment the relevant type parameter by the variable size (for a real scalar increase n_real by 1)
add a new line to the list in the my_namelist type declaration in the relevant variable type.
add a new line to the my_nml population section in the relevant variable type
add a new line to the namelist population section in the relevant variable type.
See the UM code for examples.
Unix script standards#
This standard covers UM shell scripts which are used in the operational suite as well as within the UM itself. The requirements that this standard is intended to meet are as follows:
The script should be easily understood and used, and should be easy for a programmer other than the original author to modify.
To simplify portability it should conform to the unix standard as much as possible, and exclude obsolescent and implementation-specific features when possible.
It should be written in an efficient way.
The structure of the script should conform to the design agreed in the project plan.
Scripts are to be regarded as being control code as far as external documentation is concerned.
Python standards#
Python code used in or with the UM should obey the standard Python style guide PEP 8. This means that our Python code will follow the same guidelines commonly adhered to in other Python projects, including Rose.
C standards#
C code used in or with the UM should conform to the C99 standard (ISO/IEC 9899:1999: Programming languages - C (1999) by JTC 1/SC 22/WG 14).
Furthermore, it is assumed that any C implementation used by the UM supports
C99 Annex F (IEC 60559 Floating-point arithmetic) i.e. it is assumed the
implementation defines __STDC_IEC_559__. It is also assumed the
implementation provides the optional 8-, 16-, 32-, and 64-bit exact-width
integer types.
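The assumptions above can be probed from code. The following is a minimal sketch only; the helper names um_claims_iec_559 and um_exact_widths_ok are hypothetical and not part of the UM. It reports whether the implementation defines __STDC_IEC_559__ and confirms that the four exact-width signed types have the advertised widths. If an implementation lacked any of these optional types, the file would simply fail to compile.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not a UM routine): returns 1 if the implementation
   claims C99 Annex F (IEC 60559) conformance, 0 otherwise. */
int um_claims_iec_559(void)
{
#if defined(__STDC_IEC_559__)
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical helper (not a UM routine): returns 1 only if all four
   exact-width signed integer types have exactly the expected bit count. */
int um_exact_widths_ok(void)
{
  return (sizeof(int8_t)  * CHAR_BIT == 8)  &&
         (sizeof(int16_t) * CHAR_BIT == 16) &&
         (sizeof(int32_t) * CHAR_BIT == 32) &&
         (sizeof(int64_t) * CHAR_BIT == 64);
}
```

A build could call both helpers once at start-up and report the result, rather than relying on each platform's documentation.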
Preprocessing of C#
Preprocessing of source files is allowed, as defined by the C99 standard, but
with a few minor exceptions. This use includes - but is not limited to - the
use of #include, macros, #pragma, and _Pragma statements.
The exceptions are as follows:
Code must not be dependent on preprocessing to select optional or platform specific features in order for it to compile or run. Platform specific and optional code are allowed; but this should augment basic functionality rather than implement a key component of it. In other words, code should be able to compile and run correctly on all platforms without any optional or platform dependent macros being defined, even if the code could take advantage of them on that platform.
Platform specific code must be protected by an if-def test on a compiler and/or platform specific macro as appropriate. (Examples may include the use of __GNUC__, __clang__, __linux__, _AIX, __x86_64__, or __aarch64__.) This includes the protection of compiler-specific #pragma / _Pragma statements.
If-def tests must not use the #ifdef / #ifndef style. Instead use #if defined() or #if !defined() as appropriate. This restriction is required to simplify the implementation of automated testing.
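As an illustration of these exceptions, the sketch below pairs an optional compiler-specific fast path with a portable C99 fallback, selected with the #if defined() style. The function name um_popcount32 is hypothetical; __builtin_popcount is a GCC-compatible builtin used here only as an example of platform specific code.

```c
#include <stdint.h>

/* Hypothetical example (not a UM routine): count the set bits in a
   32-bit word. */
int um_popcount32(uint32_t x)
{
#if defined(__GNUC__)
  /* Optional fast path, protected by a compiler-specific macro. */
  return __builtin_popcount((unsigned int)x);
#else
  /* Portable fallback: the code still compiles and runs correctly when
     no platform or compiler macro is defined. */
  int count = 0;
  while (x != 0u) {
    x &= (x - 1u);  /* clear the lowest set bit */
    count++;
  }
  return count;
#endif
}
```

Both branches return the same value for every input, so the platform specific path augments the basic functionality rather than replacing it.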
Code Layout#
Rules regarding whitespace, 80 column line widths, prohibition on tab use, and
the use of UK English apply to C code as they would Fortran code. Comments
should use the traditional /* */ style; C++ style comments (//) should
be avoided.
Copyright and Code Owner Comments#
Copyright and code owner comments follow the same rules as in Fortran, except
with slight modification for the differing comment delimiters in the two
languages - using /* */ instead of !. An example of a compliant
comment header detailing copyright and code owner comments is given below.
/**********************************COPYRIGHT***********************************/
/* (C) Crown copyright Met Office. All rights reserved. */
/* For further details please refer to the file COPYRIGHT.txt */
/* which you should have received as part of this distribution. */
/**********************************COPYRIGHT***********************************/
/* Code Owner: Please refer to the UM file CodeOwners.txt */
/* This file belongs in section: C Code */
Deprecated identifiers#
In addition to the identifiers deprecated by the C99 standard, the following table lists identifiers which should be considered deprecated within UM code - and where appropriate, what to replace them with.
OpenMP in C Code#
It is possible for the runtime libraries used by OpenMP to be incompatible if different vendors or compiler versions are used for the C and Fortran compiler. For this reason, whilst use of OpenMP in C code is permitted, there are some rules governing acceptable use that must be followed.
Protecting OpenMP in C Code#
OpenMP directives (#pragma omp) in C code must be protected by both a
SHUM_USE_C_OPENMP_VIA_THREAD_UTILS and an _OPENMP if-def test. This
ensures it is possible to select the use of only the Fortran OpenMP runtime
library if required. If possible, provide a Fortran implementation of the
OpenMP parallelism as well, using the wrappers in the thread_utils module
from SHUMlib. An example of such use is given below.
#if defined(_OPENMP) && defined(SHUM_USE_C_OPENMP_VIA_THREAD_UTILS)
/* this branch uses the Fortran OpenMP runtime, via the SHUMlib thread_utils module */
thread_utils_func();
#elif defined(_OPENMP) && !defined(SHUM_USE_C_OPENMP_VIA_THREAD_UTILS)
/* this branch uses OpenMP pragmas within C */
#pragma omp parallel
{
omp_func();
}
#else
/* this branch does not use OpenMP */
serial_func();
#endif
Ideally this should lead to code capable of providing all three possible runtime outcomes, the use of which is compile-time configurable:
No OpenMP is used.
OpenMP is used through the C runtime library. (The compiler defines _OPENMP, through the normal compiler switch selection process.)
OpenMP is used through the Fortran runtime library, accessed via SHUMlib. (The compiler defines _OPENMP; the user defines SHUM_USE_C_OPENMP_VIA_THREAD_UTILS)
You must always ensure that the no OpenMP case is possible.
(See also: The SHUMlib documentation on shum_thread_utils)
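For a single loop, the protection can be as small as wrapping the directive itself. The sketch below uses a hypothetical routine um_scale, which is not part of the UM or SHUMlib. The pragma is only seen when the C OpenMP runtime is selected; in every other configuration the same loop runs serially, so the no OpenMP case always remains possible. A version using the Fortran runtime through the SHUMlib thread_utils wrappers is deliberately not shown, since its interface is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical routine (not a UM routine): multiply each element of a[]
   by factor. */
void um_scale(double *a, size_t n, double factor)
{
  long i;

  /* The directive is protected by the required pair of if-def tests,
     with _OPENMP first. Without it, the loop below is plain serial C. */
#if defined(_OPENMP) && !defined(SHUM_USE_C_OPENMP_VIA_THREAD_UTILS)
#pragma omp parallel for default(none) shared(a, n, factor)
#endif
  for (i = 0; i < (long)n; i++) {
    a[i] *= factor;  /* iterations are independent: no reduction needed */
  }
}
```

Because each iteration writes only its own element, the result is identical whether or not the directive is active.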
Other Uses of the _OPENMP Macro#
The use of the _OPENMP preprocessor macro for code other than directives is
permitted. This can be used equivalently to how the !$ sentinel would be
in Fortran. A recommended use is to protect the inclusion of the header for
the thread_utils module, as shown below.
#if defined(_OPENMP) && defined(SHUM_USE_C_OPENMP_VIA_THREAD_UTILS)
#include "c_shum_thread_utils.h"
#endif
Or to protect inclusion of the OpenMP header, as shown below.
#if defined(_OPENMP) && !defined(SHUM_USE_C_OPENMP_VIA_THREAD_UTILS)
#include <omp.h>
#endif
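Putting the header protection and a protected call together, a minimal sketch might look as follows. The function name um_c_runtime_threads is hypothetical; omp_get_max_threads is the standard OpenMP runtime call. The function reports the thread count visible to the C OpenMP runtime and returns 1 whenever that runtime is not in use.

```c
#if defined(_OPENMP) && !defined(SHUM_USE_C_OPENMP_VIA_THREAD_UTILS)
#include <omp.h>
#endif

/* Hypothetical helper (not a UM routine): number of threads available
   through the C OpenMP runtime, or 1 if that runtime is not selected. */
int um_c_runtime_threads(void)
{
#if defined(_OPENMP) && !defined(SHUM_USE_C_OPENMP_VIA_THREAD_UTILS)
  return omp_get_max_threads();
#else
  return 1;
#endif
}
```

The header and the call are guarded by the same pair of tests, so the file compiles unchanged in all three configurations.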
Further Rules for OpenMP in C#
In order to standardise the way the above rules are implemented, and to allow for automated checking of the compliance of code, the following additional rules are imposed.
You cannot hide the use of the _OPENMP & SHUM_USE_C_OPENMP_VIA_THREAD_UTILS macros through the definition of a third macro dependent on them. For example, you must not define and use a new macro in place of the two original macros, as shown here:
#define USE_THREAD_UTILS defined(_OPENMP) && defined(SHUM_USE_C_OPENMP_VIA_THREAD_UTILS)
#if defined(USE_THREAD_UTILS)
thread_utils_func();
#endif
If-def tests on _OPENMP & SHUM_USE_C_OPENMP_VIA_THREAD_UTILS must always occur as a pair. You may not test the use of _OPENMP or SHUM_USE_C_OPENMP_VIA_THREAD_UTILS in isolation.
_OPENMP must come first in any #if defined() pair.
Any OpenMP #if defined() pair must not also include a logical test on a third macro. If this functionality is required, find an appropriate nesting of #if defined() tests. For example instead of:
#if defined(_OPENMP) && defined(SHUM_USE_C_OPENMP_VIA_THREAD_UTILS) && defined(OTHER)
/* do stuff */
#endif
Use:
#if defined(_OPENMP) && defined(SHUM_USE_C_OPENMP_VIA_THREAD_UTILS)
#if defined(OTHER)
/* do stuff */
#endif
#endif
You must not use negative logic in an if-def test on _OPENMP (i.e. #if !defined(_OPENMP)). Instead, use positive logic and an #else branch. Use of negative logic is permitted for if-def tests on the accompanying SHUM_USE_C_OPENMP_VIA_THREAD_UTILS macro, as this will be required to distinguish between cases using the C and Fortran OpenMP runtimes.
Code Reviews#
In order to ensure that these standards are adhered to and are having the desired effect, code reviews must be held. Reviews can also be useful in disseminating computing skills. To this end two types of code review are performed in the order below:
A science/technical review is performed first to ensure that the code performs as it is intended, it complies with the standards and is well documented. Guidance for reviewers is found in the Science/Technical Review Guidance page on the UM homepage.
A Code Review is performed to analyse the change for its impact, ensure that it meets this coding standard and to ensure that all concerned parties are made aware of changes that are required. Guidance for reviewers is outlined in the Code Review Guidance page.
A. UM Software standard summary#
The rules discussed in the main text are reproduced here in summary form with pdf links to the sections.
| Standard | Section |
|---|---|
| Use the naming convention for program units. | |
| Use your header and supply the appropriately complete code header | |
| History comments are NOT required and should be removed from routines. | |
| Fortran code should be written in free source form | |
| Code must occur in columns 1-80 (1-100 for CreateBC). | |
| Never put more than one statement per line. | |
| Use English in your code. | |
| All Fortran keywords should be ALL CAPS while everything else is lowercase or CamelCase. | |
| Avoid archaic Fortran features | |
| Only use the generic names of intrinsic functions | |
| Comments start with a single | |
| Single line comments can be indented within the code, after the statement. | |
| Do not leave a blank line after a comment line. | |
| Do NOT use TABS within UM code. | |
| The use of MODULEs is greatly encouraged. | |
| Use meaningful variable names | |
| Use and declare variables and arguments in the order | |
| Use | |
| Use | |
| Use | |
| Do not use | |
| The use of ALLOCATABLE arrays can optimise memory use. | |
| Indent code within | |
| Terminate loops with | |
| Avoid comparing two reals | |
| Avoid using ‘magic numbers’ and ‘magic logicals’ | |
| Avoid use of | |
| Avoid numeric labels | |
| Exception is for error trapping, jump to the label | |
| Continuation line marker must be | |
| Always use an | |
| Check for file existence with | |
| Always format information explicitly within WRITE, READs etc. | |
| Ensure that output messages do not use | |
| Ensure that output messages are protected by an appropriate setting of | |
| Ensure your subroutines are instrumented for DrHook. | |
| Only use OpenMP sentinels at the beginning of lines | |
| Be very careful when altering calculations within an OpenMP block. | |
| If possible implement runtime logicals rather than compile time logicals. | |
| Do not replicate (duplicate) runtime logic with cpp logic. | |
| Do not protect optional arguments with cpp flags, use OPTIONAL args instead. | |
| Do not use CPP flags for selecting science code, use runtime logicals | |
| Use | |
| Never use | |
| New namelist items should begin life as category c items. | |
B. Fortran 2003#
The following table provides guidance on which Fortran 2003 features are welcome for inclusion in the UM.
This has been compiled upon review of major Fortran compilers feature support.
| Feature | Acceptable | Comment |
|---|---|---|
| ISO TR 15581 Allocatable Enhancements | Yes | |
| Interoperability with C | Yes | |
| Access to the computing environment | Yes | |
| Flush | Yes | |
| IOMSG | Yes | |
| Assignment to an allocatable array | No | Includes auto-reallocation |
| Intrinsic Modules | Yes | e.g. ISO_C_BINDING |
| Allocatable Scalars | Yes | |
| Allocatable Character lengths | Yes | gnu offers partial support. |
| VOLATILE attribute | Yes | |
| Parametrized derived data types | No | Lack of compiler support |
| O-O coding: type extension, polymorphic entities, type bound procedures | No | Not for the current UM, but considered for the UM replacement, LFRIC-GUNGHO and MakeBC replacement CreateBC |
| Derived type input output | No | Lack of compiler support |
| Kind type parameters of integer specifiers | No | Lack of compiler support |
| Recursive input/output | No | |
| Transferring an allocation | No | Prefer to see DEALLOCATEs used for code readability. |
| Support for international character sets | No | |
C. Dealing with rounding issues.#
Background#
The UM perturbation sensitivity project identified coding issues that lead to
excessive perturbation growth in the model. Problems identified included
IF tests that contained comparisons between real numbers; for example
IF (qCL(i) > 0.0 )
In this test, qCL(i) is being used to represent one of two states:
“no liquid cloud”
“some liquid cloud”
This is fine, but it is then important to ensure that rounding issues do not
lead to unintended changes of state prior to the test, such as slightly
non-zero qCL(i) values when there is supposed to be no liquid cloud. If
such problems occur at discontinuous branches in the code, the result is
spurious perturbation growth.
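A minimal sketch of this kind of unintended change of state is given below. The numbers are hypothetical and chosen purely for illustration: liquid water is added in two steps and then the same total is removed in one step, so that algebraically there should be no liquid cloud left.

```fortran
PROGRAM state_flip
  IMPLICIT NONE
  INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND(15, 307)
  REAL(KIND=wp) :: qcl

  ! Hypothetical increments, chosen only to expose the rounding residue
  qcl = 0.0_wp
  qcl = qcl + 0.1_wp     ! first source of liquid water
  qcl = qcl + 0.2_wp     ! second source of liquid water
  qcl = qcl - 0.3_wp     ! sink intended to remove all of the liquid water

  WRITE(*, '(A,ES24.16)') 'qcl = ', qcl

  ! The branch taken depends on a residue at rounding level
  IF (qcl > 0.0_wp) THEN
    WRITE(*, '(A)') 'some liquid cloud'
  ELSE
    WRITE(*, '(A)') 'no liquid cloud'
  END IF
END PROGRAM state_flip
```

With IEEE double precision arithmetic the final value of qcl is typically a tiny positive number rather than zero, so the test selects the "some liquid cloud" branch even though no liquid cloud was intended.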
This appendix collects together some typical examples of what can go wrong, and how to deal with them. First, though, it is worth making a quick note of some of the characteristics of floating-point arithmetic.
Floating-point identities and non-identities#
In floating-point arithmetic many of the identities that hold in normal arithmetic no longer hold, basically because of the limited precision available to represent real numbers. Thus, it is often important that coders know which algebraic identities pass through to floating-point arithmetic and which don’t, and how results can be affected by the way the calculations are implemented by the compiler. For chapter and verse on floating-point arithmetic, a good reference is David Goldberg’s article on the subject.
The following floating-point identity holds for any finite x:
0.0 * x = 0.0
The following also hold (for finite x and y, and for non-zero x in the case of x / x), but only if the numbers that go into the calculations have the same precision:
0.0 + x = x
1.0 * x = x
x / x = 1.0
x - x = 0.0
x - y = x + (-y)
x + y = y + x
x * y = y * x
2.0 * x = x + x
0.5 * x = x / 2.0
The precision caveat matters because, for example, optimisation may lead to some variables being held in registers and others in main memory, and these may store numbers with different levels of precision. Thus, coding based on these identities will probably work as intended in most circumstances, but may be vulnerable to higher levels of optimisation.
The following are non-identities:
x + (y + z) /= (x + y) + z
x * (y * z) /= (x * y) * z
x * (y / z) /= (x * y) / z
These say that, unlike in normal arithmetic, the order of the calculations matters. Failure to recognise this can cause problems, as in example 1 below. (Note that putting brackets around calculations to try to impose the “correct” order of calculation will not necessarily work; although the Fortran standard requires the integrity of parentheses to be respected, aggressive optimisation options can allow the compiler to decide for itself!)
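The first non-identity can be checked with a short sketch such as the following. The values of x, y and z are hypothetical and are chosen so that the smallest term is lost when it is added to a much larger one.

```fortran
PROGRAM non_associative
  IMPLICIT NONE
  INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND(15, 307)
  REAL(KIND=wp) :: x, y, z, sum_left, sum_right

  ! Hypothetical values: z is below the spacing of numbers near 1.0e16
  x =  1.0e16_wp
  y = -1.0e16_wp
  z =  1.0_wp

  sum_left  = (x + y) + z    ! cancellation happens first, then z is added
  sum_right = x + (y + z)    ! z is absorbed into y before the cancellation

  WRITE(*, '(A,ES24.16)') '(x + y) + z = ', sum_left
  WRITE(*, '(A,ES24.16)') 'x + (y + z) = ', sum_right

  IF (sum_left /= sum_right) THEN
    WRITE(*, '(A)') 'The two orderings give different results'
  ELSE
    WRITE(*, '(A)') 'The two orderings agree on this platform'
  END IF
END PROGRAM non_associative
```

With IEEE double precision arithmetic and the parentheses honoured, the first sum typically evaluates to 1.0 and the second to 0.0. Whether the compiler honours the parentheses depends on the optimisation options in use.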
Example 1: Non-associative arithmetic#
At UM vn7.4, the routine LSP_DEPOSITION contains the following
calculation:
    ! Deposition removes some liquid water content
    ! First estimate of the liquid water removed is explicit
    dqil(i) = max (min ( dqi_dep(i)*area_mix(i)            &
   &                   /(area_mix(i)+area_ice_1(i)),       &
   &                   qcl(i)*area_mix(i)/cfliq(i)) ,0.0)
    ...
    If (l_seq) Then
      qcl(i) = qcl(i) - dqil(i)  ! Bergeron Findeisen acts first
Here, dqil is a change to cloud liquid water qcl, which is limited in
the calculation to qcl*area_mix/cfliq, where area_mix is the fraction
of the gridbox with both liquid and ice cloud, and cfliq is the fraction
with liquid cloud. In other words, the change to cloud liquid water is
limited to the liquid water in cloud that overlaps with ice cloud, onto which
it can deposit.
In the special case that all the liquid cloud coincides with ice cloud, we have
area_mix = cfliq, implying area_mix/cfliq = 1.0. In this case, the
limit for dqil should be exactly qcl, but is coded as
qcl*area_mix/cfliq. In tests on the IBM, it seems that the compiler
decides that the multiplication should precede the division, so the outcome of
the calculation is not necessarily qcl. Thus, the update to qcl on the
last line does not necessarily lead to qcl = 0.0 when the limit is hit.
One solution to this problem is to supply area_mix/cfliq directly as a
ratio:
    If (cfliq(i) /= 0.0) Then
      areamix_over_cfliq(i)=area_mix(i)/cfliq(i)
    End if
    ...
    ! Deposition removes some liquid water content
    ! First estimate of the liquid water removed is explicit
    dqil(i) = max (min ( dqi_dep(i)*area_mix(i)            &
   &                   /(area_mix(i)+area_ice_1(i)),       &
   &                   qcl(i)*areamix_over_cfliq(i)) ,0.0)
This is the solution we have adopted in the large-scale precipitation code.
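The difference between the two forms of the limit can be explored with a stand-alone sketch like the one below. It is not the UM code: scalars replace the arrays, the sample values are hypothetical, and the brackets in limit_inline are there to mimic the multiply-then-divide ordering described above. The sketch only considers the special case area_mix = cfliq.

```fortran
PROGRAM ratio_limit
  IMPLICIT NONE
  INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND(15, 307)
  INTEGER, PARAMETER :: n = 1000
  REAL(KIND=wp) :: qcl, cfliq, area_mix, areamix_over_cfliq
  REAL(KIND=wp) :: limit_inline, limit_ratio
  INTEGER :: i, n_inline_residue, n_ratio_residue

  n_inline_residue = 0
  n_ratio_residue  = 0

  DO i = 1, n
    ! Hypothetical sample values
    qcl      = 1.0e-4_wp * REAL(i, KIND=wp) / 3.0_wp
    cfliq    = REAL(i, KIND=wp) / REAL(n + 1, KIND=wp)
    area_mix = cfliq      ! all of the liquid cloud coincides with ice cloud

    ! Original form: the multiplication precedes the division
    limit_inline = (qcl * area_mix) / cfliq

    ! Revised form: the ratio is formed first and is exactly 1.0 here
    areamix_over_cfliq = area_mix / cfliq
    limit_ratio        = qcl * areamix_over_cfliq

    ! Count the cases where removing the limit does not leave exactly zero
    IF (qcl - limit_inline /= 0.0_wp) n_inline_residue = n_inline_residue + 1
    IF (qcl - limit_ratio  /= 0.0_wp) n_ratio_residue  = n_ratio_residue  + 1
  END DO

  WRITE(*, '(A,I6)') 'Non-zero residues, multiply then divide: ', n_inline_residue
  WRITE(*, '(A,I6)') 'Non-zero residues, precomputed ratio:    ', n_ratio_residue
END PROGRAM ratio_limit
```

With IEEE arithmetic the precomputed ratio is exactly 1.0 whenever area_mix equals a non-zero cfliq, so the second count should be zero. The first count depends on the sample values and on the compiler options used.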
Example 2: Changing units when applying limits#
At UM vn7.4, the routine LSP_TIDY contains the following calculation:
    ! Calculate transfer rate
    dpr(i) = temp7(i) / lfrcp  ! Rate based on Tw excess
    ! Limit to the amount of snow available
    dpr(i) = min(dpr(i) , snow_agg(i)                      &
   &             * dhi(i)*iterations*rhor(i) )
    ...
    ! Update values of snow and rain
    If (l_seq) Then
      snow_agg(i) = snow_agg(i) - dpr(i)*rho(i)*dhilsiterr(i)
      qrain(i) = qrain(i) + dpr(i)
where
    dhilsiterr(i) = 1.0/(dhi(i)*iterations)
    rhor(i) = 1.0/rho(i)
Here, dpr is a conversion rate from snow into rain, and the second
statement limits this rate to that required to melt all of the snow within the
timestep. Thus, the intention is that if this limit is hit the final snow
amount will come out to exactly 0.0. However, the outcome in this case is
effectively as follows:
    dpr(i) = snow_agg(i) * dhi(i)*iterations*rhor(i)
    snow_agg(i) = snow_agg(i) - dpr(i)*rho(i)*dhilsiterr(i)
    ( = snow_agg(i)                                                      &
        - snow_agg(i)                                                    &
          * dhi(i)*iterations*rhor(i)*rho(i)*1.0/(dhi(i)*iterations) )
In normal arithmetic, the multiplier on the final line comes out to exactly one, but this is not necessarily the case in floating-point arithmetic. Whether the expression comes out to exactly 1.0 or not will be highly sensitive to the values going into the calculation. If the result is slightly different to 1.0, the outcome is likely to be a tiny but non-zero snow amount.
The basic problem here is that the limit comes from a particular quantity, but is being applied indirectly via its rate of change. Thus when the limiting quantity is updated a change of units is required. The solution here is to apply the limit to the quantity itself, shifting the change of units to calculations involving rates:
    ! Calculate transfer
    dp(i) = rho(i)*dhilsiterr(i)*temp7(i) / lfrcp
    ! Limit to the amount of snow available
    dp(i) = min(dp(i), snow_agg(i))
    ...
    ! Update values of snow and rain
    If (l_seq) Then
      snow_agg(i) = snow_agg(i) - dp(i)
      qrain(i) = qrain(i) + dp(i)*dhi(i)*iterations*rhor(i)
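The two approaches can be compared with a stand-alone sketch along the following lines. It is not the UM code: scalars replace the arrays, demand_rate is a hypothetical stand-in for temp7/lfrcp that is large enough for the limit always to be hit, and the sample values of snow_agg, rho and dhi are invented for illustration.

```fortran
PROGRAM limit_quantity
  IMPLICIT NONE
  INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND(15, 307)
  INTEGER, PARAMETER :: n = 1000
  INTEGER, PARAMETER :: iterations = 2               ! hypothetical value
  REAL(KIND=wp), PARAMETER :: demand_rate = 1.0_wp   ! stand-in for temp7/lfrcp
  REAL(KIND=wp) :: snow_agg, rho, rhor, dhi, dhilsiterr
  REAL(KIND=wp) :: dpr, dp, snow_via_rate, snow_via_amount
  INTEGER :: i, n_rate_residue, n_amount_residue

  n_rate_residue   = 0
  n_amount_residue = 0

  DO i = 1, n
    ! Hypothetical sample values
    snow_agg   = 1.0e-5_wp * REAL(i, KIND=wp) / 7.0_wp
    rho        = 0.5_wp + REAL(i, KIND=wp) / REAL(n, KIND=wp)
    dhi        = 1.0_wp / (600.0_wp + REAL(i, KIND=wp))
    rhor       = 1.0_wp / rho
    dhilsiterr = 1.0_wp / (dhi*iterations)

    ! Original approach: limit the rate, then convert back to an amount
    dpr = MIN(demand_rate, snow_agg*dhi*iterations*rhor)
    snow_via_rate = snow_agg - dpr*rho*dhilsiterr

    ! Revised approach: limit the amount itself
    dp = MIN(rho*dhilsiterr*demand_rate, snow_agg)
    snow_via_amount = snow_agg - dp

    IF (snow_via_rate   /= 0.0_wp) n_rate_residue   = n_rate_residue   + 1
    IF (snow_via_amount /= 0.0_wp) n_amount_residue = n_amount_residue + 1
  END DO

  WRITE(*, '(A,I6)') 'Non-zero snow left, limit applied to rate:   ', n_rate_residue
  WRITE(*, '(A,I6)') 'Non-zero snow left, limit applied to amount: ', n_amount_residue
END PROGRAM limit_quantity
```

When the limit is hit in the revised approach, dp is a copy of snow_agg, so the subtraction gives exactly zero and the second count should be zero. The first count depends on the sample values and on the compiler options used.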
Example 3: Dealing with special cases#
At UM vn7.4, the routine LS_CLD contains the following calculation to
update the total cloud fraction CF given the liquid and frozen cloud
fractions CFL and CFF:
    TEMP0=OVERLAP_RANDOM
    TEMP1=0.5*(OVERLAP_MAX-OVERLAP_MIN)
    TEMP2=0.5*(OVERLAP_MAX+OVERLAP_MIN)-OVERLAP_RANDOM
    CF(I,J,K)=CFL(I,J,K)+CFF(I,J,K)                        &
   &        -(TEMP0+TEMP1*OVERLAP_ICE_LIQUID               &
   &        +TEMP2*OVERLAP_ICE_LIQUID*OVERLAP_ICE_LIQUID)
    ! Check that the overlap wasnt negative
    CF(I,J,K)=MIN(CF(I,J,K),CFL(I,J,K)+CFF(I,J,K))
During testing, it was observed that CF was often coming out to
0.9999999999999....; i.e., almost but not quite 1.0, and that whether this
occurred was highly sensitive to the input data. This sensitivity was then
being passed down to branches testing on, for example, whether CF was
equal to CFF.
If the above calculations are followed through algebraically, it can be shown
that if CFL+CFF >= 1, then CF must be exactly one. In the
floating-point case, however, this no longer follows, so we often get cases
where there is a slight deviation from unity. The simplest solution in this
example is to deal with the special case separately:
    TEMP0=OVERLAP_RANDOM
    TEMP1=0.5*(OVERLAP_MAX-OVERLAP_MIN)
    TEMP2=0.5*(OVERLAP_MAX+OVERLAP_MIN)-OVERLAP_RANDOM
    ! CFF + CFL >= 1 implies CF = 1
    IF (CFL(I,J,K)+CFF(I,J,K) >= 1.0) THEN
      CF(I,J,K) = 1.0
    ELSE
      CF(I,J,K)=CFL(I,J,K)+CFF(I,J,K)                      &
   &          -(TEMP0+TEMP1*OVERLAP_ICE_LIQUID             &
   &          +TEMP2*OVERLAP_ICE_LIQUID*OVERLAP_ICE_LIQUID)
      ! Check that the overlap wasnt negative
      CF(I,J,K)=MIN(CF(I,J,K),CFL(I,J,K)+CFF(I,J,K))
    END IF
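The effect of treating the special case separately can be examined with a stand-alone sketch such as the one below. Several things here are assumptions made for illustration and are not taken from the text above: the definitions of overlap_max, overlap_min and overlap_random, the value overlap_ice_liquid = -1.0 (assumed to select minimum overlap, for which the algebra gives CF = 1 whenever CFL + CFF >= 1), the function names cf_general and cf_guarded, and the sample cloud fractions.

```fortran
MODULE cloud_overlap_mod
  IMPLICIT NONE
  INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND(15, 307)
  ! Assumption: -1.0 selects minimum overlap between liquid and ice cloud
  REAL(KIND=wp), PARAMETER :: overlap_ice_liquid = -1.0_wp

CONTAINS

  ! General formula only, with no special-case handling
  ELEMENTAL FUNCTION cf_general(cfl, cff) RESULT(cf)
    REAL(KIND=wp), INTENT(IN) :: cfl, cff
    REAL(KIND=wp) :: cf
    REAL(KIND=wp) :: overlap_max, overlap_min, overlap_random
    REAL(KIND=wp) :: temp0, temp1, temp2

    ! Assumed definitions of the three overlap estimates
    overlap_max    = MIN(cfl, cff)
    overlap_min    = MAX(cfl + cff - 1.0_wp, 0.0_wp)
    overlap_random = cfl * cff

    temp0 = overlap_random
    temp1 = 0.5_wp*(overlap_max - overlap_min)
    temp2 = 0.5_wp*(overlap_max + overlap_min) - overlap_random
    cf = cfl + cff                                              &
         - (temp0 + temp1*overlap_ice_liquid                    &
            + temp2*overlap_ice_liquid*overlap_ice_liquid)
    ! Check that the overlap was not negative
    cf = MIN(cf, cfl + cff)
  END FUNCTION cf_general

  ! Special case handled separately, as in the revised code above
  ELEMENTAL FUNCTION cf_guarded(cfl, cff) RESULT(cf)
    REAL(KIND=wp), INTENT(IN) :: cfl, cff
    REAL(KIND=wp) :: cf

    IF (cfl + cff >= 1.0_wp) THEN
      cf = 1.0_wp
    ELSE
      cf = cf_general(cfl, cff)
    END IF
  END FUNCTION cf_guarded

END MODULE cloud_overlap_mod

PROGRAM overlap_demo
  USE cloud_overlap_mod, ONLY: wp, cf_general, cf_guarded
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 20
  REAL(KIND=wp) :: cfl, cff
  INTEGER :: i, j, n_cases, n_general_off, n_guarded_off

  n_cases = 0
  n_general_off = 0
  n_guarded_off = 0

  DO j = 1, n
    DO i = 1, n
      ! Hypothetical sample cloud fractions in (0, 1]
      cfl = REAL(i, KIND=wp) / REAL(n, KIND=wp)
      cff = REAL(j, KIND=wp) / REAL(n, KIND=wp)
      IF (cfl + cff >= 1.0_wp) THEN
        n_cases = n_cases + 1
        IF (cf_general(cfl, cff) /= 1.0_wp) n_general_off = n_general_off + 1
        IF (cf_guarded(cfl, cff) /= 1.0_wp) n_guarded_off = n_guarded_off + 1
      END IF
    END DO
  END DO

  WRITE(*, '(A,I6)') 'Cases with CFL + CFF >= 1:            ', n_cases
  WRITE(*, '(A,I6)') 'General formula differs from 1.0:     ', n_general_off
  WRITE(*, '(A,I6)') 'Special-case version differs from 1.0:', n_guarded_off
END PROGRAM overlap_demo
```

The special-case version returns exactly 1.0 by construction, so its count should be zero. The count for the general formula depends on the input values, which is the sensitivity described above.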