The cmipdata API
This section describes the cmipdata Application Programming Interface (API). It lists the classes and functions within the three core cmipdata modules: classes, preprocessing_tools and loading_tools. The little-developed plotting_tools module is also described.
classes
The classes module provides one class, DataNode, and three functions.
The core functionality of cmipdata is to organize a large number of model output files into a logical structure so that further processing can be done. Data is organized into a tree-like structure using the class DataNode as the nodes of a tree. The entire tree structure will be referred to as an ensemble. The level of each node in the tree is specified by its genre attribute.
Various methods exist to interact with the ensemble and its constituent elements. The mkensemble() function is used to create ensemble objects, while match_models() finds models common to two ensembles and match_realizations() matches realizations between two ensembles. Once created, an ensemble can be used to harness the power of the preprocessing_tools to apply systematic operations to all files.
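The tree organisation described above can be pictured with a small stand-alone sketch. The class SketchNode below is hypothetical and is not the cmipdata implementation; it only mimics the genre, name, children and parent attributes and the getChild, objects and lister methods documented in this section.

```python
class SketchNode:
    """Hypothetical stand-in for a tree node; not the cmipdata DataNode."""

    def __init__(self, genre, name, parent=None):
        self.genre = genre
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def getChild(self, input_name):
        # Return the child with the given name, or None if it is absent.
        for child in self.children:
            if child.name == input_name:
                return child
        return None

    def objects(self, genre):
        # Yield every node of the requested genre at or below this node.
        if self.genre == genre:
            yield self
        for child in self.children:
            yield from child.objects(genre)

    def lister(self, genre, unique=True):
        # Collect the names of one genre, optionally dropping duplicates
        # while keeping first-seen order.
        names = [node.name for node in self.objects(genre)]
        return list(dict.fromkeys(names)) if unique else names


# Build a tiny ensemble: two models, each with one experiment and realization.
ens = SketchNode('ensemble', 'demo')
for model_name in ('HadCM3', 'CanESM2'):
    model = SketchNode('model', model_name, parent=ens)
    exp = SketchNode('experiment', 'historical', parent=model)
    SketchNode('realization', 'r1i1p1', parent=exp)

models = ens.lister('model')
experiments = ens.lister('experiment')
all_experiments = ens.lister('experiment', unique=False)
```

In this sketch the genre of a node identifies its level, so a single recursive walk is enough to list every model or experiment regardless of how deep it sits.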
-
class classes.DataNode(genre, name, parent=None, **kwargs)
Bases: object
Defines a cmipdata DataNode.
Attributes
genre (string): The attribute of DataNode
name (string): The name of the particular genre
children (list): List of DataNodes of genre beneath the current DataNode
parent (DataNode): for genre ‘ensemble’ the parent is None
start_date (string): for genre ‘file’
end_date (string): for genre ‘file’
realm (string): for genre ‘variable’ contains the realm of the variable
Methods
-
fulldetails()
Prints information about the number of models, experiments, variables and files in a DataNode tree.
-
fulldetails_tofile(fi)
Writes information about the number of models, experiments, variables and files in a DataNode tree to the file fi.
-
getChild(input_name)
Returns the DataNode with the given name, if it is in children.
Parameters: input_name : string
Returns: DataNode, or None if the DataNode is not in children
-
getDictionary()
Returns a dictionary which has the genres and their names for all the ancestors of the DataNode.
-
lister(genre, unique=True)
Returns a list of names of a particular genre.
Parameters: genre : string
the genre of returned list
unique : boolean
if True removes duplicates from the list
Returns: list of strings
-
mer()
Returns a generator containing lists of length 3, holding:
- the DataNode of genre ‘realization’
- the DataNode of genre ‘experiment’
- the string model-experiment-realization
Returns: generator
-
objects(genre)
Returns a generator of the DataNodes of a particular genre.
Parameters: genre : string
the genre of returned generator
-
parentobject(genre)
Returns the parent DataNode of a particular genre.
Parameters: genre : string
the genre of returned DataNode
-
classes.match_models(ens1, ens2, delete=False)
Find common models between two ensembles.
Parameters: ens1 : cmipdata ensemble
ens2 : cmipdata ensemble
the two cmipdata ensembles to compare.
Returns: ens1 : cmipdata ensemble
ens2 : cmipdata ensemble
two ensembles with matching models.
-
classes.match_realizations(ens1, ens2, delete=False)
Find common realizations between two ensembles.
Parameters: ens1 : cmipdata ensemble
ens2 : cmipdata ensemble
the two cmipdata ensembles to compare.
Returns: ens1 : cmipdata ensemble
ens2 : cmipdata ensemble
two ensembles with matching realizations.
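The matching performed by match_models and match_realizations amounts to a set intersection. The sketch below illustrates the idea on plain dictionaries (model name mapped to a list of realization names) rather than on DataNode trees; the function names and the dictionary representation are assumptions made for illustration, and the delete option is not modelled.

```python
def match_models_sketch(ens1, ens2):
    """Keep only the models present in both mappings (model -> realizations)."""
    common = set(ens1) & set(ens2)
    return ({m: ens1[m] for m in ens1 if m in common},
            {m: ens2[m] for m in ens2 if m in common})


def match_realizations_sketch(ens1, ens2):
    """Keep only realizations present for the same model in both mappings."""
    out1, out2 = {}, {}
    for model in ens1:
        if model not in ens2:
            continue
        shared = [r for r in ens1[model] if r in ens2[model]]
        if shared:
            out1[model] = list(shared)
            out2[model] = list(shared)
    return out1, out2


# Hypothetical ensembles for illustration.
a = {'HadCM3': ['r1i1p1', 'r2i1p1'], 'CanESM2': ['r1i1p1']}
b = {'HadCM3': ['r1i1p1'], 'MIROC5': ['r1i1p1']}
models_a, models_b = match_models_sketch(a, b)
real_a, real_b = match_realizations_sketch(a, b)
```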
-
classes.mkensemble(filepattern, experiment='*', prefix='', kwargs='')
Creates and returns a cmipdata ensemble from a list of filenames matching filepattern.
Optionally specifying prefix will remove prefix from each filename before the parsing is done. This is useful, for example, to remove prepended paths used in filepattern (see example 2).
Once the list of matching filenames is derived, the model, experiment, realization, variable, start_date and end_date fields are extracted by parsing the filenames against a specified file naming convention. By default this is the CMIP5 convention, which is:
variable_realm_model_experiment_realization_startdate-enddate.nc
If the default CMIP5 naming convention is not used by your files, an arbitrary naming convention for the parsing may be specified by the dictionary kwargs (see example 3).
Parameters: filepattern : string
A string that by default is matched against all files in the current directory. However, filepattern can include a full path to reference files not in the current directory, and can also include wildcards.
prefix : string
A pattern occurring in filepattern before the start of the official filename, as defined by the file naming convention. For instance, a path preceding the filename.
Examples
1. Create ensemble of all sea-level pressure files from the historical experiment in the current directory:
ens = mkensemble('psl*historical*.nc')
2. Create ensemble of all sea-level pressure files from all experiments in a non-local directory:
ens = mkensemble('/home/ncs/ra40/cmip5/sam/c5_slp/psl*' , prefix='/home/ncs/ra40/cmip5/sam/c5_slp/')
3. Create ensemble defining a custom file naming convention:
kwargs = {'separator': '_', 'variable': 0, 'realm': 1, 'model': 2, 'experiment': 3, 'realization': 4, 'dates': 5}
ens = mkensemble('psl*.nc', kwargs=kwargs)
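To make the parsing step concrete, here is a minimal sketch of how a filename could be split into fields using a naming-convention dictionary of the form shown in example 3. The function parse_filename_sketch and the '/data/' path are hypothetical; the actual parsing inside mkensemble may differ.

```python
import os

# Field positions for the CMIP5 convention:
# variable_realm_model_experiment_realization_startdate-enddate.nc
CMIP5_CONVENTION = {'separator': '_', 'variable': 0, 'realm': 1, 'model': 2,
                    'experiment': 3, 'realization': 4, 'dates': 5}


def parse_filename_sketch(filename, prefix='', convention=CMIP5_CONVENTION):
    """Split a filename into fields using a naming-convention dictionary."""
    if prefix and filename.startswith(prefix):
        filename = filename[len(prefix):]          # strip the leading path
    stem, _ = os.path.splitext(filename)           # drop the '.nc' extension
    parts = stem.split(convention['separator'])
    fields = {key: parts[index] for key, index in convention.items()
              if key not in ('separator', 'dates')}
    start_date, end_date = parts[convention['dates']].split('-')
    fields['start_date'] = start_date
    fields['end_date'] = end_date
    return fields


fields = parse_filename_sketch(
    '/data/psl_Amon_HadCM3_historical_r1i1p1_185912-200512.nc',
    prefix='/data/')
```

Keeping the convention in a dictionary means a different file layout only needs a different set of indices, not different parsing code.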
preprocessing_tools
The preprocessing_tools module of cmipdata is a set of functions which use os.system calls to Climate Data Operators (cdo) to systematically apply a given processing step to multiple NetCDF files, which are listed in cmipdata ensemble objects.
-
preprocessing_tools.areaint(ensemble, delete=True, output_prefix='')
Calculate the area weighted integral for each file in ens.
The output files are prepended with ‘area-integral’. The original input files are removed if delete=True (default). An updated ensemble object is also returned.
Parameters: ens : cmipdata Ensemble
The ensemble on which to do the processing.
delete : boolean
If delete=True, delete the original input files.
Returns: ens : cmipdata Ensemble
An updated ensemble object, containing the names of the newly processed files.
The processed files are also written to present working directory.
Examples
Compute the area integral for all files in ens:
ens = cd.areaint(ens)
-
preprocessing_tools.areamean(ensemble, delete=True, output_prefix='')
Calculate the area mean for each file in ens.
The output files are prepended with ‘area-mean’. The original input files are removed if delete=True (default). An updated ensemble object is also returned.
Parameters: ens : cmipdata Ensemble
The ensemble on which to do the processing.
delete : boolean
If delete=True, delete the original input files.
Returns: ens : cmipdata Ensemble
An updated ensemble object, containing the names of the newly processed files.
The processed files are also written to present working directory.
Examples
Compute the area mean for all files in ens:
area_mean_ens = cd.areamean(ens)
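For intuition about what an area mean involves, the sketch below weights each latitude row by the cosine of its latitude, a common approximation for a regular latitude-longitude grid. This is an illustrative assumption and not the cmipdata code, which delegates the calculation to cdo; the function name and the sample field are hypothetical.

```python
import math


def area_mean_sketch(field, lats):
    """field[i][j]: value at latitude lats[i] (degrees) and longitude index j."""
    weighted_sum = 0.0
    weight_total = 0.0
    for row, lat in zip(field, lats):
        weight = math.cos(math.radians(lat))   # grid cells shrink towards the poles
        weighted_sum += weight * sum(row)
        weight_total += weight * len(row)
    return weighted_sum / weight_total


lats = [-60.0, 0.0, 60.0]
field = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
result = area_mean_sketch(field, lats)
```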
-
preprocessing_tools.cat_exp_slices(ensemble, delete=True, output_prefix='')
Concatenate multiple time-slice files per experiment.
For all models in ens which divide their output into multiple files per experiment (time-slices), cat_exp_slices concatenates the files into one unified file, and deletes the individual slices, unless delete=False. The input ensemble can contain multiple models, experiments, realizations and variables, which cat_exp_slices will process independently. In other words, files are joined per-model, per-experiment, per-realization, per-variable. For example, if the ensemble contains two experiments for many models/realizations for variable psl, two unified files will be produced per realization: one for the historical and one for the rcp45 experiment. To join files over experiments (e.g. to concatenate historical and rcp45) see cat_experiments.
Parameters: ens : cmipdata Ensemble
The ensemble on which to do the concatenation.
delete : boolean
If delete=True, delete the individual time-slice files.
Returns: ens : cmipdata Ensemble
An updated ensemble object, containing the names of the newly concatenated files.
The concatenated files are written to present working directory.
See also
cat_experiments - Concatenate the files for two experiments.
Examples
For a simple ensemble comprised of only one model, one experiment and one realization:
# Look at the ensemble structure before the concatenation
ens.fulldetails()
HadCM3:
    historical
        r1i1p1
            ts
                ts_Amon_HadCM3_historical_r1i1p1_185912-188411.nc
                ts_Amon_HadCM3_historical_r1i1p1_188412-190911.nc
                ts_Amon_HadCM3_historical_r1i1p1_190912-193411.nc
                ts_Amon_HadCM3_historical_r1i1p1_193412-195911.nc
                ts_Amon_HadCM3_historical_r1i1p1_195912-198411.nc
                ts_Amon_HadCM3_historical_r1i1p1_198412-200512.nc
# Do the concatenation
ens = cd.cat_exp_slices(ens)
# Look at the ensemble structure after the concatenation
ens.fulldetails()
HadCM3:
    historical
        r1i1p1
            ts
                ts_Amon_HadCM3_historical_r1i1p1_185912-200512.nc
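The name of the unified file in the example spans the earliest start date and the latest end date of the slices. A minimal sketch of that bookkeeping follows; merged_name_sketch is a hypothetical helper, and the use of cdo's mergetime operator in the command string is an assumption about how the join could be done, not a statement of what cat_exp_slices runs.

```python
def merged_name_sketch(slice_files):
    """Build the name of the joined file from a list of time-slice names."""
    starts, ends = [], []
    for name in slice_files:
        dates = name[:-len('.nc')].split('_')[-1]   # e.g. '185912-188411'
        start, end = dates.split('-')
        starts.append(start)
        ends.append(end)
    stem = '_'.join(slice_files[0].split('_')[:-1])
    # YYYYMM strings of equal length sort chronologically.
    return '{}_{}-{}.nc'.format(stem, min(starts), max(ends))


slices = ['ts_Amon_HadCM3_historical_r1i1p1_185912-188411.nc',
          'ts_Amon_HadCM3_historical_r1i1p1_188412-190911.nc',
          'ts_Amon_HadCM3_historical_r1i1p1_198412-200512.nc']
outfile = merged_name_sketch(slices)
# Assumed command: mergetime joins files along the time axis.
command = 'cdo mergetime {} {}'.format(' '.join(sorted(slices)), outfile)
```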
-
preprocessing_tools.cat_experiments(ensemble, variable_name, exp1_name, exp2_name, delete=True, output_prefix='')
Concatenate the files for two experiments.
Experiments exp1 and exp2 are concatenated into a single file for each realization of each model listed in ens. For each realization, the concatenated file for variable variable_name is written to the current working directory and the input files are deleted by default, unless delete=False.
The concatenation occurs for each realization for which input files exist for both exp1 and exp2. If no match is found for the realization in exp1 (i.e. there is no corresponding realization in exp2), then the files for both experiments are deleted from the path (unless delete=False) and the realization is removed from ens. Similarly if exp2 is missing for a given model, that model is deleted from ens.
Parameters: ens : cmipdata Ensemble
The ensemble on which to do the concatenation.
variable_name : str
The name of the variable to be concatenated.
exp1_name : str
The name of the first experiment to be concatenated (e.g. ‘historical’).
exp2_name : str
The name of the second experiment to be concatenated (e.g. ‘rcp45’).
delete : boolean
If delete=True, delete the individual time-slice files.
Returns: ens : cmipdata Ensemble
An updated ensemble object, containing the names of the newly concatenated files.
The concatenated files are written to present working directory.
Examples
Join the historical and rcp45 simulations for variable ts in ens:
ens = cd.cat_experiments(ens, 'ts', exp1_name='historical', exp2_name='rcp45')
-
preprocessing_tools.climatology(ensemble, delete=True, output_prefix='')
Compute the monthly climatology for each file in ens.
The climatology is calculated over the full file-length using cdo ymonmean, and the output files are prepended with ‘climatology_’. The original input files are removed if delete=True (default). An updated ensemble object is also returned.
If you want to compute the climatology over a specific time slice, use time_slice before computing the climatology.
Parameters: ens : cmipdata Ensemble
The ensemble on which to do the processing.
delete : boolean
If delete=True, delete the original input files.
Returns: ens : cmipdata Ensemble
An updated ensemble object, containing the names of the newly processed files.
The processed files are also written to present working directory.
Examples
Compute the climatology:
climatology_ens = cd.climatology(ens)
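A monthly climatology of the kind produced by cdo ymonmean averages all Januaries together, all Februaries together, and so on. The pure-Python sketch below shows that grouping on a toy series; the function name and data are hypothetical and serve only to illustrate the operation.

```python
from collections import defaultdict


def monthly_climatology_sketch(months, values):
    """Average values that share the same calendar month (1-12)."""
    grouped = defaultdict(list)
    for month, value in zip(months, values):
        grouped[month].append(value)
    return {m: sum(v) / len(v) for m, v in sorted(grouped.items())}


# Two years of January and February values.
months = [1, 2, 1, 2]
values = [0.0, 10.0, 2.0, 14.0]
clim = monthly_climatology_sketch(months, values)
```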
-
preprocessing_tools.ens_stats(ens, variable_name, output_prefix='')
Compute the ensemble mean and standard deviation.
The ensemble mean and standard deviation are computed over all models, realizations and experiments for variable variable_name in ens, such that each model has a weight of one. One output file is written containing the ensemble mean and another containing the standard deviation, with ‘_ENS-MEAN_’ and ‘_ENS-STD_’ in place of the model name. If the ensemble contains multiple experiments, files are written for each experiment.
The ensemble in ens must be homogeneous. That is to say, all files must be on the same grid and span the same time-frame within each experiment (see remap and time_slice for more). Additionally, variable_name should have only one file per realization and experiment. That is, cat_exp_slices should have been applied.
The calculation is done by first computing the mean over all realizations for each model and then, for the ensemble, calculating the mean over all models. The standard deviation is calculated across models using the realization mean for each model.
Parameters: ens : cmipdata Ensemble
The ensemble for which to compute the statistics.
variable_name : str
The name of the variable for which to compute the statistics.
Returns: A tuple of lists containing the names of the mean and standard deviation files created
The ENS-MEAN and ENS-STD files are written to present working directory.
Examples
Compute the statistics for the ts variable:
cd.ens_stats(ens, 'ts')
Fragment showing how the files are gathered per experiment and model:
experiment_list = ens.lister('experiment')
for exname in experiment_list:
    files_to_mean = []
    for model in ens.objects('model'):
        experiment = model.getChild(exname)
        if experiment is not None:
            modfilesall = []
            for realization in experiment.children:
                modfilesall.append(realization.getChild(variable_name).children)
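The two-step averaging described for ens_stats (realizations first, then models) can be illustrated with plain numbers. The values below are hypothetical, and the use of the sample (n-1) standard deviation is an assumption; the normalization used by cmipdata is not stated in this section.

```python
from statistics import mean, stdev

# Hypothetical values, one per realization, grouped by model.
values = {
    'ModelA': [1.0, 3.0],        # two realizations
    'ModelB': [4.0],             # one realization
    'ModelC': [5.0, 6.0, 7.0],   # three realizations
}

# Step 1: average the realizations within each model.
model_means = {model: mean(vals) for model, vals in values.items()}

# Step 2: average across models, so each model has a weight of one.
ens_mean = mean(model_means.values())

# Step 3: spread across models, using the per-model means.
ens_std = stdev(model_means.values())

# For contrast: pooling all realizations lets ModelC count three times.
pooled_mean = mean(v for vals in values.values() for v in vals)
```

The difference between ens_mean and pooled_mean shows why the per-model step matters: without it, models that contribute many realizations would dominate the ensemble statistics.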
-
preprocessing_tools.my_operator(ensemble, my_cdo_str='', output_prefix='processed_', delete=False)
Apply a customized cdo operation to all files in ens.
For each file in ens the command in my_cdo_str is applied and an output file prepended with output_prefix is created.
Optionally delete the original input files if delete=True.
Parameters: ens : cmipdata Ensemble
The ensemble on which to do the processing.
my_cdo_str : str
The (chain of) cdo commands to apply. Defined variables which can be used in my_cdo_str are: model, experiment, realization, variable, infile, outfile
output_prefix : str
The string to prepend to the processed filenames.
delete : boolean
If delete=True, delete the original input files.
Returns: ens : cmipdata Ensemble
An updated ensemble object, containing the names of the newly processed files.
The processed files are also written to present working directory.
Examples
Do an annual mean:
my_cdo_str = 'cdo -yearmean {infile} {outfile}'
my_ens = cd.my_operator(ens, my_cdo_str, output_prefix='annual_')
Do a date selection and time mean:
my_cdo_str = 'cdo sub {infile} -timmean -seldate,1991-01-01,2000-12-31 {infile} {outfile}'
my_ens = cd.my_operator(ens, my_cdo_str, output_prefix='test_')
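The placeholders in my_cdo_str behave like a template that is filled in once per file. The sketch below shows one way such a substitution could work, using Python's str.format; this mechanism and the helper build_commands_sketch are assumptions for illustration, and no command is actually executed.

```python
def build_commands_sketch(my_cdo_str, infiles, output_prefix='processed_', **fields):
    """Fill a cdo command template once per input file."""
    commands = []
    for infile in infiles:
        outfile = output_prefix + infile       # prefix is prepended to the name
        commands.append(my_cdo_str.format(infile=infile, outfile=outfile, **fields))
    return commands


commands = build_commands_sketch(
    'cdo -yearmean {infile} {outfile}',
    ['ts_Amon_HadCM3_historical_r1i1p1_185912-200512.nc'],
    output_prefix='annual_')
```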
-
preprocessing_tools.remap(ensemble, remap='r360x180', method='remapdis', delete=True, output_prefix='')
Remap files to a specified resolution.
For each file in ens, remap to the resolution remap=‘r<nlon>x<nlat>’, where <nlon> and <nlat> are the numbers of longitude and latitude points to use. The original input files are removed if delete=True (default). An updated ensemble object is also returned.
By default distance-weighted remapping is used, but any valid cdo remapping method can be used by specifying the optional argument method, e.g. method=‘remapbil’ for bilinear interpolation.
Parameters: ens : cmipdata Ensemble
The ensemble on which to do the remapping.
remap : str
The resolution to remap to, e.g. for a 1-degree grid remap=’r360x180’
delete : boolean
If delete=True, delete the original input files.
Returns: ens : cmipdata Ensemble
An updated ensemble object, containing the names of the newly processed files.
The processed files are also written to present working directory.
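The grid string passed as remap follows the pattern r<nlon>x<nlat>. As a small illustration, the hypothetical helper below derives that string from a grid spacing in degrees and shows the general shape of a cdo remapping command; infile.nc and outfile.nc are placeholder names, and the command is only built, not run.

```python
def grid_name_sketch(resolution_degrees):
    """Return a global regular-grid name such as 'r360x180' for 1 degree."""
    nlon = round(360 / resolution_degrees)
    nlat = round(180 / resolution_degrees)
    return 'r{}x{}'.format(nlon, nlat)


one_degree = grid_name_sketch(1)
two_and_half = grid_name_sketch(2.5)
# General form of a cdo remap call: operator,grid infile outfile
command = 'cdo remapdis,{} infile.nc outfile.nc'.format(one_degree)
```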
-
preprocessing_tools.time_anomaly(ensemble, start_date, end_date, delete=False, output_prefix='')
Compute the anomaly relative to the period between start_date and end_date, for each file in ens.
The resulting output is written to file with the prefix ‘anomaly_’, and the original input files are deleted if delete=True.
Parameters: ens : cmipdata Ensemble
The ensemble on which to do the processing.
start_date : str
Start date for the base period with format: YYYY-MM-DD
end_date : str
End date for the base period with format: YYYY-MM-DD
delete : boolean
If delete=True, delete the original input files.
Returns: ens : cmipdata Ensemble
An updated ensemble object, containing the names of the newly processed files.
The processed files are also written to present working directory.
Examples
Compute the anomaly relative to the base period 1980 to 2010:
ens = cd.time_anomaly(ens, start_date='1980-01-01', end_date='2010-12-31')
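An anomaly relative to a base period is the original series minus the mean over that period. The sketch below uses a single time mean over the base period; whether time_anomaly subtracts a single mean or a month-by-month climatology is not stated here, so treat this as one plausible reading. The function name and data are hypothetical.

```python
from datetime import date


def anomaly_sketch(times, values, start_date, end_date):
    """Subtract the mean over [start_date, end_date] from every value."""
    base = [v for t, v in zip(times, values) if start_date <= t <= end_date]
    if not base:
        raise ValueError('no data inside the base period')
    base_mean = sum(base) / len(base)
    return [v - base_mean for v in values]


times = [date(1979, 1, 1), date(1980, 1, 1), date(1990, 1, 1), date(2011, 1, 1)]
values = [10.0, 12.0, 14.0, 20.0]
anomalies = anomaly_sketch(times, values, date(1980, 1, 1), date(2010, 12, 31))
```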
-
preprocessing_tools.time_slice(ensemble, start_date, end_date, delete=True, output_prefix='')
Limit the data to the period between start_date and end_date, for each file in ens.
The resulting output is written to file, named with the correct date range, and the original input files are deleted if delete=True.
Parameters: ens : cmipdata Ensemble
The ensemble on which to do the processing.
start_date : str
Start date for the output file with format: YYYY-MM-DD
end_date : str
End date for the output file with format: YYYY-MM-DD
delete : boolean
If delete=True, delete the original input files.
Returns: ens : cmipdata Ensemble
An updated ensemble object, containing the names of the newly processed files.
The processed files are also written to present working directory.
Examples
Select data between 1 January 1979 and 31 December 2013:
ens = cd.time_slice(ens, start_date='1979-01-01', end_date='2013-12-31')
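Since the output file is named with the new date range, the date part of the filename has to be rewritten. A minimal sketch of that renaming, together with the general form of a cdo seldate call, is given below; sliced_name_sketch and the exact naming scheme are assumptions, and the command is only built, not executed.

```python
def sliced_name_sketch(infile, start_date, end_date):
    """Replace the date range in a CMIP5-style filename."""
    stem = '_'.join(infile.split('_')[:-1])
    start = start_date[:7].replace('-', '')    # 'YYYY-MM-DD' -> 'YYYYMM'
    end = end_date[:7].replace('-', '')
    return '{}_{}-{}.nc'.format(stem, start, end)


infile = 'ts_Amon_HadCM3_historical_r1i1p1_185912-200512.nc'
outfile = sliced_name_sketch(infile, '1979-01-01', '2005-12-31')
command = 'cdo seldate,{},{} {} {}'.format('1979-01-01', '2005-12-31', infile, outfile)
```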
-
preprocessing_tools.trends(ensemble, start_date, end_date, delete=False)
Compute linear trends over the period between start_date and end_date, for each file in ens.
The resulting output is written to file, named with the correct date range, and the original input files are deleted if delete=True.
Parameters: ens : cmipdata Ensemble
The ensemble on which to do the processing.
start_date : str
Start date for the output file with format: YYYY-MM-DD
end_date : str
End date for the output file with format: YYYY-MM-DD
delete : boolean
If delete=True, delete the original input files.
Returns: ens : cmipdata Ensemble
An updated ensemble object, containing the names of the newly processed files.
The processed files are also written to present working directory,
and begin with “slope_” and “intercept_”.
Examples
Compute linear trends between 1 January 1979 and 31 December 2013:
ens = cd.trends(ens, start_date='1979-01-01', end_date='2013-12-31')
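The slope and intercept written by trends correspond to an ordinary least-squares line fitted through time. The stand-alone sketch below shows that fit for a single series; the function name and the data are hypothetical and the real calculation is performed on gridded files.

```python
def linear_trend_sketch(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


years = [0, 1, 2, 3, 4]                       # years since the start date
temps = [14.0, 14.5, 15.0, 15.5, 16.0]        # hypothetical annual means
slope, intercept = linear_trend_sketch(years, temps)
```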
-
preprocessing_tools.zonmean(ensemble, delete=True, output_prefix='')
Calculate the zonal mean for each file in ens.
The output files are prepended with ‘zonal-mean’. The original input files are removed if delete=True (default). An updated ensemble object is also returned.
Parameters: ens : cmipdata Ensemble
The ensemble on which to do the processing.
delete : boolean
If delete=True, delete the original input files.
Returns: ens : cmipdata Ensemble
An updated ensemble object, containing the names of the newly processed files.
The processed files are also written to present working directory.
Examples
Compute the zonal mean for all files in ens:
zonal_mean_ens = cd.zonmean(ens)
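A zonal mean averages over longitude and leaves one value per latitude. The toy sketch below shows this on a nested list; the function name and the layout of field are assumptions for illustration only.

```python
def zonal_mean_sketch(field):
    """field[i][j]: value at latitude index i and longitude index j."""
    return [sum(row) / len(row) for row in field]


field = [[1.0, 3.0], [2.0, 6.0]]
zonal = zonal_mean_sketch(field)
```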
loading_tools
The loading_tools module of cmipdata is a set of functions which use the cdo python bindings and NetCDF4 to load data from input NetCDF files listed in a cmipdata ensemble object into python numpy arrays. Some processing can optionally be done during the loading, specifically remapping, time-slicing, time-averaging and zonal-averaging.
-
loading_tools.get_dimensions(ifile, varname, toDatetime=False)
Returns the dimensions of variable varname in file ifile as a dictionary. If one of the dimensions begins with lat (e.g. Lat, Latitude or Latitudes), it will be returned with a key of lat, and similarly for lon. If toDatetime=True, the time dimension is converted to a datetime.
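The key normalisation described for get_dimensions can be sketched as a simple prefix test on the dimension names. The helper below is hypothetical and works on an ordinary dictionary rather than a NetCDF file; the case-insensitive match is an assumption based on the Lat and Latitude examples given above.

```python
def normalise_dimension_keys_sketch(dimensions):
    """Map dimension names starting with lat/lon to the keys 'lat'/'lon'."""
    normalised = {}
    for name, values in dimensions.items():
        lowered = name.lower()
        if lowered.startswith('lat'):
            key = 'lat'
        elif lowered.startswith('lon'):
            key = 'lon'
        else:
            key = name                       # leave other dimensions untouched
        normalised[key] = values
    return normalised


dims = normalise_dimension_keys_sketch(
    {'Latitude': [-45.0, 45.0], 'longitude': [0.0, 180.0], 'time': [0, 1, 2]})
```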
-
loading_tools.loadfiles(ens, varname, toDatetime=False, **kwargs)
Load a variable varname from all files in ens into a matrix where the zeroth dimension represents an input file and dimensions 1 to n are the dimensions of the input variable. Variable varname must have the same shape in all input files. The keyword argument toDatetime (defaults to False) is passed on to get_dimensions(). Optionally specify any kwargs valid for loadvar.
Requires netCDF4, cdo bindings and numpy.
Returns: dictionary with keys data and dimensions
data maps to a numpy array containing the data
dimensions has keys models, realizations, and possibly lat, lon, and time
-
loading_tools.loadvar(ifile, varname, cdostr=None, **kwargs)
Load variables from a NetCDF file with optional pre-processing.
Load a CMIP5 NetCDF variable varname from ifile, optionally applying the cdo string cdostr to preprocess the data first. Requires netCDF4, CDO and the CDO python bindings. Returns a masked array, var.