The cmipdata API

This section describes the cmipdata Application Programming Interface (API). It lists the classes and functions in the three core cmipdata modules: classes, preprocessing_tools and loading_tools. The less developed plotting_tools module is also described.

classes

The classes module provides one class, DataNode, and three functions.

The core functionality of cmipdata is to organize a large number of model output files into a logical structure so that further processing can be done. Data is organized into a tree-like structure, with instances of the class DataNode as the nodes of the tree. The entire tree structure will be referred to as an ensemble. Each level of the tree is identified by the genre attribute of its nodes.
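
The tree structure described above can be pictured with a small stand-in class. This is a minimal sketch and not the cmipdata implementation: the class name SketchNode and its walk method are hypothetical, and only the genre, name, children and parent attributes mirror the documented DataNode.

```python
class SketchNode:
    """Hypothetical stand-in for a tree node; not the cmipdata DataNode."""

    def __init__(self, genre, name, parent=None):
        self.genre = genre
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def walk(self, genre):
        """Yield every node of the requested genre at or below this node."""
        if self.genre == genre:
            yield self
        for child in self.children:
            yield from child.walk(genre)


# Build ensemble -> model -> experiment -> realization by hand.
ens = SketchNode('ensemble', 'root')
for model_name in ('HadCM3', 'CanESM2'):
    model = SketchNode('model', model_name, parent=ens)
    exp = SketchNode('experiment', 'historical', parent=model)
    SketchNode('realization', 'r1i1p1', parent=exp)

model_names = [node.name for node in ens.walk('model')]
print(model_names)
```

Storing the parent on each node is what makes it possible to walk upwards as well as downwards, which is the kind of navigation that methods such as parentobject() and objects() provide on the real class.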

Various methods exist to interact with the ensemble, and its constituent elements.

The mkensemble() function is used to create ensemble objects, while match_models() finds models common to two ensembles and match_realizations() matches realizations between two ensembles. Once created, an ensemble can be passed to the functions in preprocessing_tools to apply systematic operations to all of its files.

class classes.DataNode(genre, name, parent=None, **kwargs)[source]

Bases: object

Defines a cmipdata DataNode.

Attributes

genre (string) The attribute of DataNode
name (string) The name of the particular genre
children (list) List of DataNodes of the genre beneath the current DataNode
parent (DataNode) for genre ‘ensemble’ the parent is None
start_date (string) for genre ‘file’
end_date (string) for genre ‘file’
realm (string) for genre ‘variable’ contains the realm of the variable

Methods

add(child)[source]

Add DataNode to children

Parameters:child : DataNode
delete(child)[source]

Delete DataNode from children

Parameters:child : DataNode
fulldetails()[source]

Prints information about the number of models, experiments, variables and files in a DataNode tree.

fulldetails_tofile(fi)[source]

Writes information about the number of models, experiments, variables and files in a DataNode tree to the file fi.

getChild(input_name)[source]
Returns the DataNode with the given name, if it is in children
Parameters:input_name : string
Returns:DataNode : Returns None if the DataNode is not in children
getDictionary()[source]

Returns a dictionary which has the genres and their names for all the ancestors of the DataNode

getNameWithoutDates()[source]

Return string name with the dates removed if present

Returns:string
lister(genre, unique=True)[source]

Returns a list of names of a particular genre

Parameters:

genre : string

the genre of returned list

unique: boolean

if True removes duplicates from the list

Returns:

list of strings

mer()[source]
Returns a generator of lists of length 3, each containing the DataNode of genre ‘realization’, the DataNode of genre ‘experiment’ and the string model-experiment-realization
Returns:generator
objects(genre)[source]

Returns a generator for a DataNode of a particular genre

Parameters:

genre : string

the genre of returned generator

parentobject(genre)[source]

Returns the parent DataNode of a particular genre

Parameters:

genre : string

the genre of returned DataNode

sinfo(listOfGenres=['variable', 'model', 'experiment', 'realization', 'ncfile'])[source]

Returns the number of models, experiments, realizations, variables and files in the DataNode

squeeze()[source]

Remove any empty elements from the ensemble

classes.match_models(ens1, ens2, delete=False)[source]

Find common models between two ensembles.

Parameters:

ens1 : cmipdata ensemble

ens2 : cmipdata ensemble

the two cmipdata ensembles to compare.

Returns:

ens1 : cmipdata ensemble

ens2 : cmipdata ensemble

two ensembles with matching models.
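
The idea of matching models can be illustrated with plain lists of names, such as those returned by lister('model'). The following is only a sketch of the set logic; the model lists are made up, and it does not modify any ensemble the way match_models does.

```python
# Hypothetical model names from two ensembles.
models_1 = ['CanESM2', 'HadCM3', 'MIROC5']
models_2 = ['HadCM3', 'MIROC5', 'NorESM1-M']

# Models present in both ensembles, and those that would be dropped from each.
common = sorted(set(models_1) & set(models_2))
only_in_1 = sorted(set(models_1) - set(models_2))
only_in_2 = sorted(set(models_2) - set(models_1))

print(common, only_in_1, only_in_2)
```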

classes.match_realizations(ens1, ens2, delete=False)[source]

Find common realizations between two ensembles.

Parameters:

ens1 : cmipdata ensemble

ens2 : cmipdata ensemble

the two cmipdata ensembles to compare.

Returns:

ens1 : cmipdata ensemble

ens2 : cmipdata ensemble

two ensembles with matching realizations.

classes.mkensemble(filepattern, experiment='*', prefix='', kwargs='')[source]

Creates and returns a cmipdata ensemble from a list of filenames matching filepattern.

Optionally specifying prefix will remove prefix from each filename before the parsing is done. This is useful, for example, to remove pre-pended paths used in filepattern (see example 2).

Once the list of matching filenames is derived, the model, experiment, realization, variable, start_date and end_date fields are extracted by parsing the filenames against a specified file naming convention. By default this is the CMIP5 convention, which is:

variable_realm_model_experiment_realization_startdate-enddate.nc

If your files do not use the default CMIP5 naming convention, an arbitrary naming convention for the parsing may be specified by the dictionary kwargs (see example 3).

Parameters:

filepattern : string

A string that by default is matched against all files in the current directory. But filepattern could include a full path to reference files not in the current directory, and can also include wildcards.

prefix : string

A pattern occurring in filepattern before the start of the official filename, as defined by the file naming convention. For instance, a path preceding the filename.

Examples

1. Create ensemble of all sea-level pressure files from the historical experiment in the current directory:

ens = mkensemble('psl*historical*.nc')

2. Create ensemble of all sea-level pressure files from all experiments in a non-local directory:

ens = mkensemble('/home/ncs/ra40/cmip5/sam/c5_slp/psl*'
              , prefix='/home/ncs/ra40/cmip5/sam/c5_slp/')

3. Create ensemble defining a custom file naming convention:

    kwargs = {'separator':'_', 'variable':0, 'realm':1, 'model':2, 'experiment':3,
              'realization':4, 'dates':5}
    
    ens = mkensemble('psl*.nc', **kwargs)
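
To make the naming convention concrete, here is a minimal sketch of how a filename could be split into fields using a dictionary of the same shape as kwargs in example 3. The helper parse_filename is hypothetical and is not the parser used by mkensemble; it also assumes every filename ends with a startdate-enddate field.

```python
import os

# Field positions for the default convention, mirroring the kwargs in example 3.
CMIP5_CONVENTION = {'separator': '_', 'variable': 0, 'realm': 1, 'model': 2,
                    'experiment': 3, 'realization': 4, 'dates': 5}


def parse_filename(filename, convention=CMIP5_CONVENTION):
    """Split a filename into named fields (hypothetical helper)."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    parts = stem.split(convention['separator'])
    fields = {key: parts[index] for key, index in convention.items()
              if key not in ('separator', 'dates')}
    # The dates field holds "startdate-enddate".
    start_date, end_date = parts[convention['dates']].split('-')
    fields['start_date'] = start_date
    fields['end_date'] = end_date
    return fields


fields = parse_filename('ts_Amon_HadCM3_historical_r1i1p1_185912-188411.nc')
print(fields['model'], fields['start_date'], fields['end_date'])
```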
    

preprocessing_tools

The preprocessing_tools module of cmipdata is a set of functions which use os.system calls to Climate Data Operators (cdo) to systematically apply a given processing step to multiple NetCDF files, which are listed in cmipdata ensemble objects.

preprocessing_tools.areaint(ensemble, delete=True, output_prefix='')[source]

Calculate the area weighted integral for each file in ens.

The output files are prepended with ‘area-integral’. The original input files are removed if delete=True (default). An updated ensemble object is also returned.

Parameters:

ens : cmipdata Ensemble

The ensemble on which to do the processing.

delete : boolean

If delete=True, delete the original input files.

Returns:

ens : cmipdata Ensemble

An updated ensemble object, containing the names of the newly processed files.

The processed files are also written to the present working directory.

Examples

  1. Compute the area integral for all files in ens:

    ens = cd.areaint(ens)
    
preprocessing_tools.areamean(ensemble, delete=True, output_prefix='')[source]

Calculate the area mean for each file in ens.

The output files are prepended with ‘area-mean’. The original input files are removed if delete=True (default). An updated ensemble object is also returned.

Parameters:

ens : cmipdata Ensemble

The ensemble on which to do the processing.

delete : boolean

If delete=True, delete the original input files.

Returns:

ens : cmipdata Ensemble

An updated ensemble object, containing the names of the newly processed files.

The processed files are also written to the present working directory.

Examples

  1. Compute the area mean for all files in ens:

    area_mean_ens = cd.areamean(ens)
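
For intuition about what an area-weighted mean does, the sketch below computes one with NumPy on a synthetic field. It assumes a regular latitude-longitude grid, where cell area is proportional to the cosine of latitude; the grid and field are made up, and cmipdata itself delegates this calculation to cdo rather than using this code.

```python
import numpy as np

lat = np.linspace(-89.5, 89.5, 180)      # cell-centre latitudes in degrees
lon = np.linspace(0.5, 359.5, 360)       # cell-centre longitudes in degrees
field = np.full((lat.size, lon.size), 2.0)
field[lat > 0, :] = 4.0                  # northern hemisphere holds larger values

# On a regular lat-lon grid, cell area is proportional to cos(latitude).
weights = np.cos(np.deg2rad(lat))[:, np.newaxis] * np.ones(lon.size)
area_mean = np.sum(field * weights) / np.sum(weights)

print(area_mean)
```

Because the two hemispheres have equal area, the weighted mean sits halfway between the two values.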
    
preprocessing_tools.cat_exp_slices(ensemble, delete=True, output_prefix='')[source]

Concatenate multiple time-slice files per experiment.

For all models in ens which divide their output into multiple files per experiment (time-slices), cat_exp_slices concatenates the files into one unified file, and deletes the individual slices, unless delete=False. The input ensemble can contain multiple models, experiments, realizations and variables, which cat_exp_slices will process independently. In other words, files are joined per-model, per-experiment, per-realization, per-variable. For example, if the ensemble contains two experiments for many models/realizations for variable psl, two unified files will be produced per realization: one for the historical and one for the rcp45 experiment. To join files over experiments (e.g. to concatenate historical and rcp45) see cat_experiments.

Parameters:

ens : cmipdata Ensemble

The ensemble on which to do the concatenation.

delete : boolean

If delete=True, delete the individual time-slice files.

Returns:

ens : cmipdata Ensemble

An updated ensemble object, containing the names of the newly concatenated files.

The concatenated files are written to the present working directory.

See also

cat_experiments
Concatenate the files for two experiments.

Examples

For a simple ensemble comprising only one model, one experiment and one realization:

# Look at the ensemble structure before the concatenation
ens.fulldetails()
HadCM3:
    historical
            r1i1p1
                    ts
                          ts_Amon_HadCM3_historical_r1i1p1_185912-188411.nc
                          ts_Amon_HadCM3_historical_r1i1p1_188412-190911.nc
                          ts_Amon_HadCM3_historical_r1i1p1_190912-193411.nc
                          ts_Amon_HadCM3_historical_r1i1p1_193412-195911.nc
                          ts_Amon_HadCM3_historical_r1i1p1_195912-198411.nc
                          ts_Amon_HadCM3_historical_r1i1p1_198412-200512.nc

# Do the concatenation
ens = cd.cat_exp_slices(ens)

# Look at the ensemble structure after the concatenation
ens.fulldetails()
HadCM3:
    historical
            r1i1p1
                    ts
                          ts_Amon_HadCM3_historical_r1i1p1_185912-200512.nc
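
The per-model, per-experiment, per-realization, per-variable grouping shown above can be sketched with filenames alone. The helper joined_names is hypothetical: it only works out which slices belong together and what the unified file would be called, assuming CMIP5-style names with YYYYMM dates, and it does not concatenate any data.

```python
from collections import defaultdict


def joined_names(filenames):
    """Group time-slice files and name the unified file for each group."""
    groups = defaultdict(list)
    for name in filenames:
        stem = name[:-len('.nc')]
        # Everything before the final underscore identifies the group.
        base, dates = stem.rsplit('_', 1)
        start, end = dates.split('-')
        groups[base].append((start, end))
    # Equal-length date strings sort chronologically, so min and max suffice.
    return {base: '{}_{}-{}.nc'.format(base,
                                       min(s for s, _ in spans),
                                       max(e for _, e in spans))
            for base, spans in groups.items()}


slices = [
    'ts_Amon_HadCM3_historical_r1i1p1_185912-188411.nc',
    'ts_Amon_HadCM3_historical_r1i1p1_188412-190911.nc',
    'ts_Amon_HadCM3_historical_r1i1p1_198412-200512.nc',
]
unified = joined_names(slices)
print(unified)
```
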
preprocessing_tools.cat_experiments(ensemble, variable_name, exp1_name, exp2_name, delete=True, output_prefix='')[source]

Concatenate the files for two experiments.

Experiments exp1 and exp2 are concatenated into a single file for each realization of each model listed in ens. For each realization, the concatenated file for variable variable_name is written to the current working directory and the input files are deleted by default, unless delete=False.

The concatenation occurs for each realization for which input files exist for both exp1 and exp2. If no match is found for the realization in exp1 (i.e. there is no corresponding realization in exp2), then the files for both experiments are deleted from the path (unless delete=False) and the realization is removed from ens. Similarly if exp2 is missing for a given model, that model is deleted from ens.

Parameters:

ens : cmipdata Ensemble

The ensemble on which to do the concatenation.

variable_name : str

The name of the variable to be concatenated.

exp1_name : str

The name of the first experiment to be concatenated (e.g. ‘historical’).

exp2_name : str

The name of the second experiment to be concatenated (e.g. ‘rcp45’).

delete : boolean

If delete=True, delete the original input files for the two experiments.

Returns:

ens : cmipdata Ensemble

An updated ensemble object, containing the names of the newly concatenated files.

The concatenated files are written to the present working directory.

Examples

  1. Join the historical and rcp45 simulations for variable ts in ens:

    ens = cd.cat_experiments(ens, 'ts', exp1_name='historical', exp2_name='rcp45')
    
preprocessing_tools.climatology(ensemble, delete=True, output_prefix='')[source]

Compute the monthly climatology for each file in ens.

The climatology is calculated over the full length of each file using cdo ymonmean, and the output files are prepended with 'climatology_'. The original input files are removed if delete=True (default). An updated ensemble object is also returned.

If you want to compute the climatology over a specific time slice, use time_slice before computing the climatology.

Parameters:

ens : cmipdata Ensemble

The ensemble on which to do the processing.

delete : boolean

If delete=True, delete the original input files.

Returns:

ens : cmipdata Ensemble

An updated ensemble object, containing the names of the newly processed files.

The processed files are also written to the present working directory.

Examples

  1. Compute the climatology:

    climatology_ens = cd.climatology(ens)
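
As an illustration of a monthly climatology, the sketch below averages each calendar month over all years of a synthetic monthly series. It assumes the series starts in January and covers whole years; the data are made up, and cmipdata performs the real calculation with cdo ymonmean.

```python
import numpy as np

n_years = 3
# Hypothetical monthly series: a seasonal cycle plus a year-to-year offset.
seasonal = np.arange(12, dtype=float)
monthly = np.concatenate([seasonal + year for year in range(n_years)])

# Fold the time axis into (year, month) and average over the years.
climatology = monthly.reshape(n_years, 12).mean(axis=0)

print(climatology)
```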
    
preprocessing_tools.del_ens_files(ensem)[source]

Delete from disk all files listed in ensemble ens.

preprocessing_tools.ens_stats(ens, variable_name, output_prefix='')[source]

Compute the ensemble mean and standard deviation.

The ensemble mean and standard deviation are computed over all models, realizations and experiments for variable variable_name in ens, such that each model has a weight of one. One output file is written containing the ensemble mean and another containing the standard deviation; their names contain '_ENS-MEAN_' and '_ENS-STD_' in place of the model name. If the ensemble contains multiple experiments, files are written for each experiment.

The ensemble in ens must be homogeneous. That is to say, all files must be on the same grid and span the same time-frame within each experiment (see remap and time_slice for more). Additionally, variable_name should have only one file per realization and experiment. That is, cat_exp_slices should have been applied.

The calculation is done by first computing the mean over all realizations for each model, then calculating the ensemble mean over all models. The standard deviation is calculated across models using the realization mean for each model.
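
The two-step calculation can be sketched with NumPy on made-up numbers. The model names and values below are hypothetical, and the use of the population standard deviation (ddof=0) is an assumption; the source does not say which form cmipdata uses.

```python
import numpy as np

# Hypothetical values: each model maps to one value per realization.
realizations_by_model = {
    'model_a': np.array([1.0, 3.0]),        # two realizations
    'model_b': np.array([4.0]),             # one realization
    'model_c': np.array([5.0, 6.0, 7.0]),   # three realizations
}

# Step 1: average over realizations so each model carries a weight of one.
model_means = np.array([vals.mean() for vals in realizations_by_model.values()])

# Step 2: statistics across the per-model means.
ens_mean = model_means.mean()
ens_std = model_means.std()  # ddof=0 is an assumption, see the text above

# For comparison: pooling all realizations would weight model_c most heavily.
pooled_mean = np.concatenate(list(realizations_by_model.values())).mean()

print(ens_mean, ens_std, pooled_mean)
```

Averaging within each model first prevents models with many realizations from dominating the ensemble statistics.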

Parameters:

ens : cmipdata Ensemble

The ensemble on which to compute the statistics.

variable_name : str

The name of the variable for which to compute the statistics.

Returns:

A tuple of lists containing the names of the mean and standard deviation files created

The ENS-MEAN and ENS-STD files are written to the present working directory.

Examples

  1. Compute the statistics for the ts variable:

    cd.ens_stats(ens, 'ts')
    

    # Fragment: collect the files for each experiment, model and realization
    experiment_list = ens.lister('experiment')
    for exname in experiment_list:
        files_to_mean = []
        for model in ens.objects('model'):
            experiment = model.getChild(exname)
            if experiment is not None:
                modfilesall = []
                for realization in experiment.children:
                    modfilesall.append(realization.getChild(variable_name).children)

preprocessing_tools.my_operator(ensemble, my_cdo_str='', output_prefix='processed_', delete=False)[source]

Apply a customized cdo operation to all files in ens.

For each file in ens, the command in my_cdo_str is applied and an output file whose name is prefixed with output_prefix is created.

Optionally delete the original input files if delete=True.

Parameters:

ens : cmipdata Ensemble

The ensemble on which to do the processing.

my_cdo_str : str

The cdo command, or chain of cdo commands, to apply. Defined variables which can be used in my_cdo_str are: model, experiment, realization, variable, infile, outfile

output_prefix : str

The string to prepend to the processed filenames.

delete : boolean

If delete=True, delete the original input files.

Returns:

ens : cmipdata Ensemble

An updated ensemble object, containing the names of the newly processed files.

The processed files are also written to the present working directory.

Examples

  1. Do an annual mean:

    my_cdo_str = 'cdo -yearmean {infile} {outfile}'
    my_ens = cd.my_operator(ens, my_cdo_str, output_prefix='annual_')
    
  2. Do a date selection and time mean:

    my_cdo_str = 'cdo sub {infile} -timmean -seldate,1991-01-01,2000-12-31 {infile} {outfile}'
    my_ens = cd.my_operator(ens, my_cdo_str, output_prefix='test_')
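
The placeholders in my_cdo_str look like Python format fields, so the command for a single file might be assembled roughly as follows. This is a sketch under that assumption: build_command is a hypothetical helper, not part of cmipdata, and it assumes infile is a bare filename without a directory.

```python
def build_command(my_cdo_str, infile, output_prefix='processed_', **fields):
    """Fill the placeholders of a cdo command template (hypothetical helper)."""
    # The output name is the input name with the prefix prepended.
    outfile = output_prefix + infile
    return my_cdo_str.format(infile=infile, outfile=outfile, **fields)


command = build_command('cdo -yearmean {infile} {outfile}',
                        'ts_Amon_HadCM3_historical_r1i1p1_185912-200512.nc',
                        output_prefix='annual_')
print(command)
```

Printing the command before running it is a cheap way to check a template, since a mistake in my_cdo_str would otherwise be repeated for every file in the ensemble.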
    
preprocessing_tools.remap(ensemble, remap='r360x180', method='remapdis', delete=True, output_prefix='')[source]

Remap files to a specified resolution.

For each file in ens, remap to resolution remap='r<nlon>x<nlat>', where nlon and nlat are the number of longitude and latitude points to use. The original input files are removed if delete=True (default). An updated ensemble object is also returned.

By default, distance-weighted remapping is used, but any valid cdo remapping method can be selected with the optional argument method, e.g. method='remapdis'.

Parameters:

ens : cmipdata Ensemble

The ensemble on which to do the remapping.

remap : str

The resolution to remap to, e.g. for a 1-degree grid remap=’r360x180’

delete : boolean

If delete=True, delete the original input files.

Returns:

ens : cmipdata Ensemble

An updated ensemble object, containing the names of the newly processed files.

The processed files are also written to the present working directory.
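
The grid names accepted by remap encode the number of longitude and latitude points of a global regular grid. A small sketch of how such a name translates into grid spacing follows; parse_global_grid is a hypothetical helper for illustration and is not part of cmipdata or cdo.

```python
import re


def parse_global_grid(remap):
    """Return (nlon, nlat, dlon, dlat) for a grid name such as 'r360x180'."""
    match = re.fullmatch(r'r(\d+)x(\d+)', remap)
    if match is None:
        raise ValueError('expected r<nlon>x<nlat>, got {!r}'.format(remap))
    nlon, nlat = int(match.group(1)), int(match.group(2))
    # A global grid spans 360 degrees of longitude and 180 of latitude.
    return nlon, nlat, 360.0 / nlon, 180.0 / nlat


grid = parse_global_grid('r360x180')
print(grid)
```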

preprocessing_tools.time_anomaly(ensemble, start_date, end_date, delete=False, output_prefix='')[source]

Compute the anomaly relative to the period between start_date and end_date, for each file in ens.

The resulting output is written to file with the prefix ‘anomaly_‘, and the original input files are deleted if delete=True.

Parameters:

ens : cmipdata Ensemble

The ensemble on which to do the processing.

start_date : str

Start date for the base period with format: YYYY-MM-DD

end_date : str

End date for the base period with format: YYYY-MM-DD

delete : boolean

If delete=True, delete the original input files.

Returns:

ens : cmipdata Ensemble

An updated ensemble object, containing the names of the newly processed files.

The processed files are also written to the present working directory.

Examples

  1. Compute the anomaly relative to the base period 1980 to 2010:

    ens = cd.time_anomaly(ens, start_date='1980-01-01', end_date='2010-12-31')
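
One common definition of an anomaly subtracts the time mean over the base period from the whole series. The sketch below shows that on a synthetic annual series; whether time_anomaly uses exactly this definition (rather than, say, a monthly climatology of the base period) is an assumption, and the data are made up.

```python
import numpy as np

years = np.arange(1975, 2015)            # hypothetical annual time axis
values = 0.02 * (years - 1975) + 14.0    # synthetic series with a steady rise

# Mean over the base period 1980 to 2010, subtracted from the whole series.
in_base = (years >= 1980) & (years <= 2010)
base_mean = values[in_base].mean()
anomaly = values - base_mean

print(anomaly[0], anomaly[-1])
```

By construction the anomaly averages to zero over the base period, while the shape of the series is unchanged.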
    
preprocessing_tools.time_slice(ensemble, start_date, end_date, delete=True, output_prefix='')[source]

Limit the data to the period between start_date and end_date, for each file in ens.

The resulting output is written to file, named with the correct date range, and the original input files are deleted if delete=True.

Parameters:

ens : cmipdata Ensemble

The ensemble on which to do the processing.

start_date : str

Start date for the output file with format: YYYY-MM-DD

end_date : str

End date for the output file with format: YYYY-MM-DD

delete : boolean

If delete=True, delete the original input files.

Returns:

ens : cmipdata Ensemble

An updated ensemble object, containing the names of the newly processed files.

The processed files are also written to the present working directory.

Examples

  1. Select data between 1 January 1979 and 31 December 2013:

    ens = cd.time_slice(ens, start_date='1979-01-01', end_date='2013-12-31')
    
preprocessing_tools.trends(ensemble, start_date, end_date, delete=False)[source]

Compute linear trends over the period between start_date and end_date, for each file in ens.

The resulting output is written to file, named with the correct date range, and the original input files are deleted if delete=True.

Parameters:

ens : cmipdata Ensemble

The ensemble on which to do the processing.

start_date : str

Start date for the output file with format: YYYY-MM-DD

end_date : str

End date for the output file with format: YYYY-MM-DD

delete : boolean

If delete=True, delete the original input files.

Returns:

ens : cmipdata Ensemble

An updated ensemble object, containing the names of the newly processed files.

The processed files are also written to the present working directory, and their names begin with “slope_” and “intercept_”.

Examples

  1. Compute linear trends between 1 January 1979 and 31 December 2013:

    ens = cd.trends(ens, start_date='1979-01-01', end_date='2013-12-31')
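
The slope and intercept of a linear trend can be illustrated with a least-squares fit in NumPy. This is only a sketch on a synthetic series with a made-up time axis; cmipdata computes the trends with cdo, and the assumption here is that the result corresponds to an ordinary least-squares line.

```python
import numpy as np

# Hypothetical annual series for 1979 to 2013, measured in years since 1979.
t = np.arange(0, 35, dtype=float)
values = 0.5 * t + 10.0

# Degree-1 polynomial fit returns the coefficients as (slope, intercept).
slope, intercept = np.polyfit(t, values, 1)

print(slope, intercept)
```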
    
preprocessing_tools.zonmean(ensemble, delete=True, output_prefix='')[source]

Calculate the zonal mean for each file in ens.

The output files are prepended with ‘zonal-mean’. The original input files are removed if delete=True (default). An updated ensemble object is also returned.

Parameters:

ens : cmipdata Ensemble

The ensemble on which to do the processing.

delete : boolean

If delete=True, delete the original input files.

Returns:

ens : cmipdata Ensemble

An updated ensemble object, containing the names of the newly processed files.

The processed files are also written to the present working directory.

Examples

  1. Compute the zonal mean for all files in ens:

    zonal_mean_ens = cd.zonmean(ens)
    

loading_tools

The loading_tools module of cmipdata is a set of functions which use the cdo python bindings and NetCDF4 to load data from input NetCDF files listed in a cmipdata ensemble object into python numpy arrays. Some processing can optionally be done during the loading, specifically remapping, time-slicing, time-averaging and zonal-averaging.

loading_tools.get_dimensions(ifile, varname, toDatetime=False)[source]

Returns the dimensions of variable varname in file ifile as a dictionary. If the name of a dimension begins with lat (for example Lat, Latitude or Latitudes), it is returned with a key of lat, and similarly for lon. If toDatetime=True, the time dimension is converted to datetime objects.
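
The renaming of latitude and longitude keys can be sketched as a prefix check. The helper normalise_dimension_name is hypothetical, and treating the match as case-insensitive is an assumption based on the examples given above.

```python
def normalise_dimension_name(name):
    """Map latitude- and longitude-like names to 'lat' and 'lon'."""
    lowered = name.lower()
    for key in ('lat', 'lon'):
        if lowered.startswith(key):
            return key
    # Any other dimension keeps its original name.
    return name


keys = [normalise_dimension_name(n)
        for n in ('Latitude', 'lon', 'time', 'Longitudes')]
print(keys)
```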

loading_tools.get_models(files)[source]

loading_tools.get_realizations(files)[source]

loading_tools.loadfiles(ens, varname, toDatetime=False, **kwargs)[source]

Load a variable varname from all files in ens into a single array, in which the zeroth dimension represents an input file and dimensions 1 to n are the dimensions of the input variable. The variable varname must have the same shape in all input files. The keyword argument toDatetime (defaults to False) is passed to get_dimensions(). Optionally specify any kwargs valid for loadvar.

Requires netCDF4, cdo bindings and numpy

Returns:

A dictionary with keys data and dimensions.

data maps to a numpy array containing the data.

dimensions has keys models, realizations, and possibly lat, lon and time.
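
The layout of the returned array, with the zeroth dimension indexing the input file, can be reproduced with np.stack. The arrays below are made up and stand in for the variable loaded from each file; this is not how loadfiles reads the data.

```python
import numpy as np

# Hypothetical per-file arrays, each with dimensions (time, lat, lon).
per_file = [np.full((4, 3, 2), float(i)) for i in range(5)]

# Stack along a new zeroth axis so that index 0 selects the input file.
data = np.stack(per_file, axis=0)

print(data.shape)
```

np.stack raises an error if the arrays differ in shape, which matches the requirement that the variable has the same shape in every file.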

loading_tools.loadvar(ifile, varname, cdostr=None, **kwargs)[source]

Load variables from a NetCDF file with optional pre-processing.

Load a CMIP5 NetCDF variable varname from ifile, optionally applying the cdo string cdostr to preprocess the data first. Requires netCDF4, CDO and the CDO python bindings. Returns a masked array, var.