Core API

Documentation of the core API of pyaerocom.

Logging

pyaerocom initializes logging automatically on import in the following way.

  1. Messages of level INFO or worse are logged to logs/pyaerocom.log.$PID or (dynamic feature) to the file given in the environment variable PYAEROCOM_LOG_FILE; (dynamic feature) these log files are deleted after 7 days.

  2. Messages of level WARNING or worse are also printed on stdout. (dynamic feature) Output to stdout is disabled if the script is run non-interactively.

Putting a file named logging.ini in the script's current working directory will use that configuration instead of the default described above. An example logging.ini that does roughly the same as described above (except for the dynamic features) and enables debug logging for one package (pyaerocom.io.ungridded) is provided here:

[loggers]
keys=root,pyaerocom-ungridded

[handlers]
keys=console,file

[formatters]
keys=plain,detailed

[formatter_plain]
format=%(message)s

[formatter_detailed]
format=%(asctime)s:%(name)s:%(levelname)s:%(message)s
datefmt=%F %T

[handler_console]
class=StreamHandler
formatter=plain
args=(sys.stdout,)
level=WARN

[handler_file]
class=FileHandler
formatter=detailed
level=DEBUG
file_name=logs/pyaerocom.log.%(pid)s
args=('%(file_name)s', "w")


[logger_root]
handlers=file,console
level=INFO

[logger_pyaerocom-ungridded]
handlers=file
qualname=pyaerocom.io.readungriddedbase
level=DEBUG
propagate=0
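A configuration like the one above can also be loaded manually with the standard library. The following is a minimal sketch (not pyaerocom code): it writes a stripped-down ini to a temporary file and loads it via logging.config.fileConfig, using the defaults argument to substitute a dynamic value into the handler args (analogous to the %(pid)s placeholder above); the helper name and log-file path are assumptions for illustration.

```python
import logging
import logging.config
import os
import tempfile

# stripped-down ini mirroring the example above; %(logfile)s is filled in
# at load time via the `defaults` argument of fileConfig
INI = """\
[loggers]
keys=root

[handlers]
keys=file

[formatters]
keys=detailed

[formatter_detailed]
format=%(asctime)s:%(name)s:%(levelname)s:%(message)s

[handler_file]
class=FileHandler
formatter=detailed
level=DEBUG
args=('%(logfile)s', 'w')

[logger_root]
handlers=file
level=INFO
"""


def setup_logging(logfile: str) -> None:
    """Write the ini to a temporary file and load it, filling in the log path."""
    with tempfile.NamedTemporaryFile("w", suffix=".ini", delete=False) as fh:
        fh.write(INI)
        ini_path = fh.name
    logging.config.fileConfig(
        ini_path, defaults={"logfile": logfile}, disable_existing_loggers=False
    )
    os.unlink(ini_path)
```

Note that fileConfig reads the format option raw, so the %(asctime)s placeholders in the formatter do not clash with the configparser interpolation used for args.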

Data classes

Gridded data

class pyaerocom.griddeddata.GriddedData(input=None, var_name=None, check_unit=True, convert_unit_on_init=True, **meta)[source]

pyaerocom object representing gridded data (e.g. model diagnostics)

Gridded data refers to data that can be represented on a regular, multidimensional grid. In pyaerocom this comprises both model output and diagnostics as well as gridded level 3 satellite data, typically with dimensions latitude, longitude, time (for surface or columnar data) and an additional dimension lev (or similar) for vertically resolved data.

Under the hood, this data object is based on (but not inherited from) the iris.cube.Cube object and makes extensive use of the functionality implemented therein (many methods implemented here in GriddedData are simply wrappers for Cube methods).

Note

Note that the implemented functionality in this class is mostly limited to what is needed in the pyaerocom API (e.g. for pyaerocom.colocation routines or data import) and is not aimed at replacing or competing with similar data classes such as iris.cube.Cube or xarray.DataArray. Rather, dependent on the use case, one or another of such gridded data objects is needed for optimal processing, which is why GriddedData provides methods and / or attributes to convert to or from other such data classes (e.g. GriddedData.cube is an instance of iris.cube.Cube and method GriddedData.to_xarray() can be used to convert to xarray.DataArray). Thus, GriddedData can be considered rather high-level as compared to the other mentioned data classes from iris or xarray.

Note

Since the GriddedData object is based on the iris.cube.Cube object, it is optimised for netCDF files that follow the CF conventions and may not work out of the box for files that do not follow this standard.

Parameters:
  • input (str or Cube) – data input. Can be a single .nc file or a preloaded iris Cube.

  • var_name (str, optional) – variable name that is extracted if input is a file path. Irrelevant if input is preloaded Cube

  • check_unit (bool) – if True, the assigned unit is checked, and if it is an alias of another unit, the unit string is updated. A warning is printed if the unit is invalid or not equal to the associated AeroCom unit for the input variable. Set convert_unit_on_init to True if you want an automatic conversion to AeroCom units. Defaults to True.

  • convert_unit_on_init (bool) – if True and the unit check indicates non-conformity with the AeroCom unit, the data is converted automatically, and a warning is printed if that conversion fails. Defaults to True.

COORDS_ORDER_TSERIES = ['time', 'latitude', 'longitude']

Req. order of dimension coordinates for time-series computation

SUPPORTED_VERT_SCHEMES = ['mean', 'max', 'min', 'surface', 'altitude', 'profile']
property TS_TYPES

List with valid filename encodings specifying temporal resolution

aerocom_filename(at_stations=False)[source]

Filename of data following Aerocom 3 conventions

Parameters:

at_stations (bool) – if True, then an AtStations string will be included in the filename

Returns:

generated file name based on what is in this object

Return type:

str

aerocom_savename(data_id=None, var_name=None, vert_code=None, year=None, ts_type=None)[source]

Get filename for saving following AeroCom conventions

Parameters:
  • data_id (str, optional) – data ID used in output filename. Defaults to None, in which case this object's data_id attribute is used.

  • var_name (str, optional) – variable name used in output filename. Defaults to None, in which case this object's var_name attribute is used.

  • vert_code (str, optional) – vertical code used in output filename (e.g. Surface, Column, ModelLevel). Defaults to None, in which case the assigned value in the metadata is used.

  • year (str, optional) – year to be used in filename. If None, then it is attempted to be inferred from values in time dimension.

  • ts_type (str, optional) – frequency string to be used in filename. If None, then this object's ts_type attribute is used.

Raises:

ValueError – if vertical code is not provided and cannot be inferred or if year is not provided and data is not single year. Note that if year is provided, then no sanity checking is done against time dimension.

Returns:

output filename following AeroCom Phase 3 conventions.

Return type:

str

property altitude_access
apply_region_mask(region_id, thresh_coast=0.5, inplace=False)[source]

Apply a masked region filter

area_weighted_mean()[source]

Get area weighted mean

property area_weights

Area weights of lat / lon grid
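On a regular lat / lon grid, area weighting amounts conceptually to weighting each latitude row by the cosine of its latitude, since grid cells shrink towards the poles. A minimal numpy sketch (an illustration of the idea, not the pyaerocom implementation, which delegates to iris):

```python
import numpy as np


def cos_lat_weighted_mean(data: np.ndarray, lats_deg: np.ndarray) -> float:
    """Area-weighted mean of a (lat, lon) field: each latitude row is
    weighted by cos(latitude), since cell area shrinks towards the poles."""
    weights = np.cos(np.deg2rad(lats_deg))  # one weight per latitude row
    row_means = data.mean(axis=1)           # average over longitudes first
    return float(np.average(row_means, weights=weights))
```

For a spatially uniform field the weighted and unweighted means coincide; for fields with strong polar values the weighted mean damps the polar contribution.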

property base_year

Base year of time dimension

Note

Changing this attribute will update the time-dimension.

calc_area_weights()[source]

Calculate area weights for grid

change_base_year(new_year, inplace=True)[source]

Changes base year of time dimension

Relevant, e.g. for climatological analyses.

Note

This method does not account for offsets arising from leap years (affecting daily or higher resolution data). It is thus recommended to use this method with care. E.g. if you use this method on a 2016 daily data object, containing a calendar that supports leap years, you'll end up with 366 time stamps also in the new data object.

Parameters:
  • new_year (int) – new base year (can also be other than integer if it is convertible)

  • inplace (bool) – if True, modify this object, else, use a copy

Returns:

modified data object

Return type:

GriddedData

check_altitude_access()[source]

Checks if altitude levels can be accessed

Returns:

True, if altitude access is provided, else False

Return type:

bool

check_dimcoords_tseries() → None[source]

Check order of dimension coordinates for time series retrieval

For computation of time series at certain lon / lat coordinates, the data dimensions have to be in a certain order specified by COORDS_ORDER_TSERIES.

This method checks the current order (and dimensionality) of data and raises appropriate errors.

Raises:
check_frequency()[source]

Check if all datapoints are sampled at the same time frequency

check_lon_circular()[source]

Check if longitude coordinate is circular

check_unit(try_convert_if_wrong=False)[source]

Check if unit is correct

collapsed(coords, aggregator, **kwargs)[source]

Collapse cube

Reimplementation of the method iris.cube.Cube.collapsed(); for details, see the iris documentation

Parameters:
  • coords (str or list) – string IDs of coordinate(s) that are to be collapsed (e.g. ["longitude", "latitude"])

  • aggregator (str or Aggregator or WeightedAggregator) – the aggregator used. If input is a string, it is converted into the corresponding iris Aggregator object; see str_to_iris() for valid strings

  • **kwargs – additional keyword args (e.g. weights)

Returns:

collapsed data object

Return type:

GriddedData

property computed
property concatenated
convert_unit(new_unit, inplace=True)[source]

Convert unit of data to new unit

Parameters:
  • new_unit (str or cf_units.Unit) – new unit of data

  • inplace (bool) – convert in this instance or create a new one

property coord_names

List containing coordinate names

property coords_order

Array containing the order of coordinates

copy()[source]

Copy this data object

copy_coords(other, inplace=True)[source]

Copy all coordinates from other data object

Requires the underlying data to be the same shape.

Warning

This operation will delete all existing coordinates and auxiliary coordinates and will then copy the ones from the input data object. No checks of any kind will be performed

Parameters:
  • other (GriddedData or Cube) – other data object (needs to be same shape as this object)

  • inplace (bool) – if True, then this object will be modified and returned, else a copy.

Returns:

data object containing coordinates from other object

Return type:

GriddedData

crop(lon_range=None, lat_range=None, time_range=None, region=None)[source]

High level function that applies cropping along multiple axes

Note

1. For cropping of longitudes and latitudes, the method iris.cube.Cube.intersection() is used since it automatically accepts and understands longitude input based on the definition 0 <= lon <= 360 as well as -180 <= lon <= 180.

2. Time extraction may be provided directly as index or in form of pandas.Timestamp objects.

Parameters:
  • lon_range (tuple, optional) – 2-element tuple containing longitude range for cropping. If None, the longitude axis remains unchanged. Example input to crop around meridian: lon_range=(-30, 30)

  • lat_range (tuple, optional) – 2-element tuple containing latitude range for cropping. If None, the latitude axis remains unchanged

  • time_range (tuple, optional) –

    2-element tuple containing time range for cropping. Allowed data types for specifying the times are

    1. a combination of 2 pandas.Timestamp instances or

    2. a combination of two strings that can be directly converted into pandas.Timestamp instances (e.g. time_range=(“2010-1-1”, “2012-1-1”)) or

    3. directly a combination of indices (int).

    If None, the time axis remains unchanged.

  • region (str or Region, optional) – string ID of pyaerocom default region or directly an instance of the Region object. May be used instead of lon_range and lat_range, if these are unspecified.

Returns:

new data object containing cropped grid

Return type:

GriddedData
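The longitude equivalence mentioned in the note above can be illustrated with a small numpy sketch (a hypothetical helper, not pyaerocom API): a range crossing the meridian, e.g. (-30, 30), selects the same grid points whether the grid stores longitudes in the 0–360 or the −180–180 convention, once both are mapped onto one convention.

```python
import numpy as np


def lon_in_range(lons_deg, lon_range):
    """Boolean mask of longitudes inside lon_range, after mapping all
    input longitudes onto the -180..180 convention."""
    lons = ((np.asarray(lons_deg, dtype=float) + 180.0) % 360.0) - 180.0
    lo, hi = lon_range
    return (lons >= lo) & (lons <= hi)
```

Here 330° and 350° (0–360 convention) map to −30° and −10° and are therefore included in the range (−30, 30).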

property cube

Instance of underlying cube object

property data

Data array (n-dimensional numpy array)

Note

This is a pointer to the data object of the underlying iris.Cube instance and will load the data into memory. Thus, in case of large datasets, this may lead to a memory error

property data_id

ID of data object (e.g. model run ID, obsnetwork ID)

Note

This attribute was formerly named name, which is also the corresponding attribute name in metadata

property data_revision

Revision string from file Revision.txt in the main data directory

delete_all_coords(inplace=True)[source]

Deletes all coordinates (dimension + auxiliary) in this object

delete_aux_vars()[source]

Delete auxiliary variables and iris AuxFactories

property delta_t

Array containing timedelta values for each time stamp

property dimcoord_names

List containing coordinate names

estimate_value_range_from_data(extend_percent=5)[source]

Estimate lower and upper end of value range for these data

Parameters:

extend_percent (int) – percentage specifying to which extent min and max values are to be extended to estimate the value range. Defaults to 5.

Returns:

  • float – lower end of estimated value range

  • float – upper end of estimated value range

extract(constraint, inplace=False)[source]

Extract subset

Parameters:

constraint (iris.Constraint) – constraint that is to be applied

Returns:

new data object containing cropped data

Return type:

GriddedData

extract_surface_level()[source]

Extract surface level from 4D field

filter_altitude(alt_range=None)[source]

Currently dummy method that makes life easier in Filter

Returns:

current instance

Return type:

GriddedData

filter_region(region_id, inplace=False, **kwargs)[source]

Filter region based on ID

This works both for rectangular regions and mask regions

Parameters:
  • region_id (str) – name of region

  • inplace (bool) – if True, the current data object is modified, else a new object is returned

  • **kwargs – additional keyword args passed to apply_region_mask() if input region is a mask.

Returns:

filtered data object

Return type:

GriddedData

find_closest_index(**dimcoord_vals)[source]

Find the closest indices for dimension coordinate values
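A nearest-index lookup of this kind can be sketched in plain numpy (a hypothetical helper, not the pyaerocom implementation):

```python
import numpy as np


def closest_index(coord_values, value) -> int:
    """Index of the entry in coord_values that is closest to value."""
    return int(np.abs(np.asarray(coord_values, dtype=float) - value).argmin())
```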

property from_files

List of file paths from which this data object was created

get_altitude(**coords)[source]

Extract (or try to compute) altitude values at input coordinates

get_area_weighted_timeseries(region=None)[source]

Helper method to extract area weighted mean timeseries

Parameters:

region – optional, name of AeroCom default region for which the mean is to be calculated (e.g. EUROPE)

Returns:

station data containing area weighted mean

Return type:

StationData

property grid

Underlying grid data object

property has_data

True if sum of shape of underlying Cube instance is > 0, else False

property has_latlon_dims

Boolean specifying whether data has latitude and longitude dimensions

property has_time_dim

Boolean specifying whether data has a time dimension

infer_ts_type()[source]

Try to infer sampling frequency from time dimension data

Returns:

ts_type that was inferred (is assigned to metadata too)

Return type:

str

Raises:

DataDimensionError – if data object does not contain a time dimension

interpolate(sample_points=None, scheme='nearest', collapse_scalar=True, **coords)[source]

Interpolate cube at certain discrete points

Reimplementation of the method iris.cube.Cube.interpolate(); for details, see the iris documentation

Note

The input coordinates may also be provided using the input arg **coords, which provides a more intuitive option (e.g. input sample_points=[("longitude", [10, 20]), ("latitude", [1, 2])] is the same as input longitude=[10, 20], latitude=[1, 2]).

Parameters:
  • sample_points (list) – sequence of coordinate pairs over which to interpolate

  • scheme (str or iris interpolator object) – interpolation scheme, pyaerocom default is nearest. If input is string, it is converted into the corresponding iris Interpolator object, see str_to_iris() for valid strings

  • collapse_scalar (bool) – Whether to collapse the dimension of scalar sample points in the resulting cube. Default is True.

  • **coords – additional keyword args that may be used to provide the interpolation coordinates in an easier way than using the Cube argument sample_points. May also be a combination of both.

Returns:

new data object containing interpolated data

Return type:

GriddedData

Examples

>>> from pyaerocom import GriddedData
>>> data = GriddedData()
>>> data._init_testdata_default()
>>> itp = data.interpolate([("longitude", (10)),
...                         ("latitude" , (35))])
>>> print(itp.shape)
(365, 1, 1)
intersection(*args, **kwargs)[source]

Extract subset using iris.cube.Cube.intersection()

See the iris documentation for details on the method and its input parameters.

Note

Only works if underlying grid data type is iris.cube.Cube

Parameters:
  • *args – non-keyword args

  • **kwargs – keyword args

Returns:

new data object containing cropped data

Return type:

GriddedData

property is_climatology
property is_masked

Flag specifying whether data is masked or not

Note

This property only works if the data is loaded.

isel(**kwargs)[source]
property lat_res
load_input(input, var_name=None, perform_fmt_checks=None)[source]

Import input as cube

Parameters:
  • input (str: or Cube) – data input. Can be a single .nc file or a preloaded iris Cube.

  • var_name (str, optional) – variable name that is extracted if input is a file path . Irrelevant if input is preloaded Cube

  • perform_fmt_checks (bool, optional) – perform formatting checks based on information in filenames. Only relevant if input is a file

property lon_res
property long_name

Long name of variable

max()[source]

Maximum value

Return type:

float

mean(areaweighted=True)[source]

Mean value of data array

Note

With areaweighted=False, this corresponds to the numerical mean of the underlying N-dimensional numpy array and does not consider area weights or any other advanced averaging.

mean_at_coords(latitude=None, longitude=None, time_resample_kwargs=None, **kwargs)[source]

Compute mean value at all input locations

Parameters:
  • latitude (1D list or similar) – list of latitude coordinates of the locations. If None, please provide coords in iris style as a list of (lat, lon) tuples via coords (handled via arg kwargs)

  • longitude (1D list or similar) – list of longitude coordinates of the locations. If None, please provide coords in iris style as a list of (lat, lon) tuples via coords (handled via arg kwargs)

  • time_resample_kwargs (dict, optional) – time resampling arguments passed to StationData.resample_time()

  • **kwargs – additional keyword args passed to to_time_series()

Returns:

mean value at coordinates over all times available in this object

Return type:

float

property metadata
min()[source]

Minimum value

Return type:

float

property name

ID of model to which data belongs

nanmax()[source]

Maximum value excluding NaNs

Return type:

float

nanmin()[source]

Minimum value excluding NaNs

Return type:

float

property ndim

Number of dimensions

property plot_settings

Variable instance that contains plot settings

The settings can be specified in the variables.ini file based on the unique var_name.

If no default settings can be found for this variable, all parameters will be initiated with None, in which case the AeroCom plot method uses its own defaults.

quickplot_map(time_idx=0, xlim=(-180, 180), ylim=(-90, 90), add_mean=True, **kwargs)[source]

Make a quick plot onto a map

Parameters:
  • time_idx (int) – index in time to be plotted

  • xlim (tuple) – 2-element tuple specifying plotted longitude range

  • ylim (tuple) – 2-element tuple specifying plotted latitude range

  • add_mean (bool) – if True, the mean value over the region and period is inserted

  • **kwargs – additional keyword arguments passed to pyaerocom.quickplot.plot_map()

Returns:

matplotlib figure instance containing plot

Return type:

fig

property reader

Instance of reader class from which this object was created

Note

Currently only supports instances of ReadGridded.

register_var_glob(delete_existing=True)[source]
regrid(other=None, lat_res_deg=None, lon_res_deg=None, scheme='areaweighted', **kwargs)[source]

Regrid this grid to grid resolution of other grid

Parameters:
  • other (GriddedData or Cube, optional) – other data object to regrid to. If None, then input args lat_res and lon_res are used to regrid.

  • lat_res_deg (float or int, optional) – latitude resolution in degrees (is only used if input arg other is None)

  • lon_res_deg (float or int, optional) – longitude resolution in degrees (is only used if input arg other is None)

  • scheme (str) – regridding scheme (e.g. linear, nearest, areaweighted)

Returns:

regridded data object (new instance, this object remains unchanged)

Return type:

GriddedData

remove_outliers(low=None, high=None, inplace=True)[source]

Remove outliers from data

Parameters:
  • low (float) – lower end of valid range for input variable. If None, then the corresponding value from the default settings for this variable are used (cf. minimum attribute of available variables)

  • high (float) – upper end of valid range for input variable. If None, then the corresponding value from the default settings for this variable are used (cf. maximum attribute of available variables)

  • inplace (bool) – if True, this object is modified, else outliers are removed in a copy of this object

Returns:

modified data object

Return type:

GriddedData

reorder_dimensions_tseries() → None[source]

Transpose dimensions of data such that to_time_series() works

Raises:
resample_time(to_ts_type, how=None, min_num_obs=None, use_iris=False)[source]

Resample time to input resolution

Parameters:
  • to_ts_type (str) – either of the supported temporal resolutions (cf. IRIS_AGGREGATORS in helpers, e.g. “monthly”)

  • how (str) – string specifying how the data is to be aggregated, default is mean

  • min_num_obs (dict or int, optional) –

    integer or nested dictionary specifying minimum number of observations required to resample from higher to lower frequency. For instance, if input_data is hourly and to_ts_type is monthly, you may specify something like:

    min_num_obs =
        {'monthly'  :   {'daily'  : 7},
         'daily'    :   {'hourly' : 6}}
    

    to require at least 6 hours per day and 7 days per month.

  • use_iris (bool) – option to use resampling scheme from iris library rather than xarray.

Returns:

new data object containing downscaled data

Return type:

GriddedData

Raises:

TemporalResolutionError – if input resolution is not provided, or if it is higher temporal resolution than this object
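The nested min_num_obs lookup can be sketched as follows (a hypothetical helper, not pyaerocom API): resampling conceptually proceeds step-wise through intermediate frequencies, and for each step the constraint is looked up first by output frequency, then by input frequency.

```python
def resolve_min_num_obs(min_num_obs, to_ts_type: str, from_ts_type: str) -> int:
    """Minimum number of samples required for one resampling step from
    from_ts_type to to_ts_type; 0 if no constraint is configured."""
    if isinstance(min_num_obs, int):
        # a plain integer applies to every resampling step
        return min_num_obs
    if isinstance(min_num_obs, dict):
        # nested dict: outer key is the output frequency, inner key the input
        return min_num_obs.get(to_ts_type, {}).get(from_ts_type, 0)
    return 0
```

With the example from above, resolving ("monthly", "daily") yields 7 and ("daily", "hourly") yields 6, i.e. at least 7 days per month and 6 hours per day.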

search_other(var_name)[source]

Searches data for another variable

The search is constrained to the time period spanned by this object, and it is attempted to load the same frequency. Uses reader (an instance of ReadGridded) to search for the other variable data.

Parameters:

var_name (str) – variable to be searched

Raises:

VariableNotFoundError – if data for input variable cannot be found.

Returns:

input variable data

Return type:

GriddedData

sel(use_neirest=True, **dimcoord_vals)[source]

Select subset by dimension names

Note

This is a BETA version, please use with care

Parameters:

**dimcoord_vals – key / value pairs specifying coordinate values to be extracted

Returns:

subset data object

Return type:

GriddedData

property shape
short_str()[source]

Short string representation

split_years(years=None)[source]

Generator to split data object into individual years

Note

This is a generator method and thus should be looped over

Parameters:

years (list, optional) – List of years that should be considered. If None, it uses output from years_avail().

Yields:

GriddedData – single year data object
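The year-splitting can be sketched with plain numpy (a hypothetical helper, not the pyaerocom implementation), grouping values by the year component of a datetime64 time axis:

```python
import numpy as np


def split_by_year(times: np.ndarray, values: np.ndarray):
    """Yield (year, values-of-that-year) pairs for a datetime64 time axis."""
    # truncate timestamps to year resolution, then convert to calendar years
    years = times.astype("datetime64[Y]").astype(int) + 1970
    for year in np.unique(years):
        yield int(year), values[years == year]
```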

property standard_name

Standard name of variable

property start

Start time of dataset as datetime64 object

std()[source]

Standard deviation of values

property stop

Stop time of dataset as datetime64 object

property suppl_info
time_stamps()[source]

Convert time stamps into list of numpy datetime64 objects

The conversion is done using method cfunit_to_datetime64()

Returns:

list containing all time stamps as datetime64 objects

Return type:

list

to_netcdf(out_dir, savename=None, **kwargs)[source]

Save as NetCDF file

Parameters:
  • out_dir (str) – output directory (must exist)

  • savename (str, optional) – name of file. If None, aerocom_savename() is used which is generated automatically and may be modified via **kwargs

  • **kwargs – keywords for name

Returns:

list of output files created

Return type:

list

to_time_series(sample_points=None, scheme='nearest', vert_scheme=None, add_meta=None, use_iris=False, **coords)[source]

Extract time-series for provided input coordinates (lon, lat)

Extract time series for each lon / lat coordinate in this cube or at predefined sample points (e.g. station data). If sample points are provided, the cube is interpolated first onto the sample points.

Parameters:
  • sample_points (list) – coordinates (e.g. lon / lat) at which time series is supposed to be retrieved

  • scheme (str or iris interpolator object) – interpolation scheme (for details, see interpolate())

  • vert_scheme (str) – string specifying how to treat vertical coordinates. This is only relevant for data that contains vertical levels. It will be ignored otherwise. Note that if the input coordinate specifications contain altitude information, this parameter will be set automatically to ‘altitude’. Allowed inputs are all data collapse schemes that are supported by pyaerocom.helpers.str_to_iris() (e.g. mean, median, sum). Further valid schemes are altitude, surface, profile. If not otherwise specified and if altitude coordinates are provided via sample_points (or **coords parameters), then vert_scheme will be set to altitude. Else, profile is used.

  • add_meta (dict, optional) – dictionary specifying additional metadata for individual input coordinates. Keys are meta attribute names (e.g. station_name) and corresponding values are lists (with length of input coords) or single entries that are supposed to be assigned to each station. E.g. add_meta=dict(station_name=[<list_of_station_names>])).

  • **coords – additional keyword args that may be used to provide the interpolation coordinates (for details, see interpolate())

Returns:

list of result dictionaries for each coordinate. Dictionary keys are: longitude, latitude, var_name

Return type:

list

to_xarray()[source]

Convert this object to an xarray.DataArray

Return type:

DataArray

transpose(new_order)[source]

Re-order data dimensions in object

Wrapper for iris.cube.Cube.transpose()

Note

Changes THIS object (i.e. no new instance of GriddedData will be created)

Parameters:

new_order (list) – new index order

property ts_type

Temporal resolution of data

property unit

Unit of data

property unit_ok

Boolean specifying if variable unit is AeroCom default

property units

Unit of data

update_meta(**kwargs)[source]

Update metadata dictionary

Parameters:

**kwargs – metadata to be added to metadata.

property var_info

Print information about variable

property var_name

Name of variable

property var_name_aerocom

AeroCom variable name

property vert_code

Vertical code of data (e.g. Column, Surface, ModelLevel)

years_avail()[source]

Generate list of years that are available in this dataset

Return type:

list

Ungridded data

class pyaerocom.ungriddeddata.UngriddedData(num_points=None, add_cols=None)[source]

Class representing point-cloud data (ungridded)

The data is organised in a 2-dimensional numpy array where the first axis (rows) corresponds to individual measurements (i.e. one timestamp of one variable). Along the second dimension (containing 11 columns), the actual values are stored (in column 6) along with additional information, such as the metadata index (which can be used as a key in metadata to access additional information related to this measurement), timestamp, latitude, longitude, altitude of instrument, variable index and, in the case of 3D data (e.g. LIDAR profiles), also the altitude corresponding to the data value.

Note

To illustrate, let's look at two examples.

Example 1: Suppose you load 3 variables from 5 files, each of which contains 30 timestamps. This corresponds to a total of 3*5*30=450 data points and hence, the shape of the underlying numpy array will be 450x11.

Example 2: 3 variables, 5 files, 30 timestamps, but each variable is height resolved, containing 100 altitudes => 3*5*30*100=4500 data points, thus, the final shape will be 4500x11.
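The column layout described above can be mimicked with a small numpy sketch. Note the column positions used here are assumptions for illustration (the real class stores them in private attributes such as _VARINDEX); only "values in column 6" is stated above.

```python
import numpy as np

# toy layout: 11 columns, value in column 6, variable index in column 5 (assumed)
NUM_COLS, VALUE_COL, VARINDEX_COL = 11, 6, 5

# Example 1 from above: 3 variables x 5 files x 30 timestamps = 450 rows
n_vars, n_files, n_times = 3, 5, 30
data = np.zeros((n_vars * n_files * n_times, NUM_COLS))

# map variable names to numerical variable indices (as in var_idx)
var_idx = {"od550aer": 0, "ang4487aer": 1, "scatc550aer": 2}
rows_per_var = n_files * n_times
for name, idx in var_idx.items():
    block = slice(idx * rows_per_var, (idx + 1) * rows_per_var)
    data[block, VARINDEX_COL] = idx       # tag each row with its variable index
    data[block, VALUE_COL] = idx + 0.5    # dummy measurement values


def values_of_var(data: np.ndarray, var_idx_num: int) -> np.ndarray:
    """All values of one variable: filter rows by the variable-index column."""
    mask = data[:, VARINDEX_COL] == var_idx_num
    return data[mask, VALUE_COL]
```

This mirrors how a method like all_datapoints_var() can retrieve all values of one variable from the flat 2D array.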

metadata

dictionary containing meta information about the data. Keys are floating point numbers corresponding to each station, values are corresponding dictionaries containing station information.

Type:

dict

meta_idx

dictionary containing index mapping for each station and variable. Keys correspond to metadata keys (float -> station, see metadata) and values are dictionaries with variable names as keys and arrays or lists as values, specifying the indices (rows) of this station / variable information in _data. Note: this information is redundant and exists to accelerate station data extraction, since the data indices for a given metadata block do not need to be searched in the underlying numpy array.

Type:

dict

var_idx

mapping of variable name (keys, e.g. od550aer) to numerical variable index of this variable in data numpy array (in column specified by _VARINDEX)

Type:

dict

Parameters:
  • num_points (int, optional) – initial number of total datapoints (number of rows in 2D data array)

  • add_cols (list, optional) – list of additional index column names of 2D data array.

ALLOWED_VERT_COORD_TYPES = ['altitude']
STANDARD_META_KEYS = ['filename', 'station_id', 'station_name', 'instrument_name', 'PI', 'country', 'country_code', 'ts_type', 'latitude', 'longitude', 'altitude', 'data_id', 'dataset_name', 'data_product', 'data_version', 'data_level', 'framework', 'instr_vert_loc', 'revision_date', 'website', 'ts_type_src', 'stat_merge_pref_attr']
add_chunk(size=None)[source]

Extend the size of the data array

Parameters:

size (int, optional) – number of additional rows. If None (default) or smaller than minimum chunksize specified in attribute _CHUNKSIZE, then the latter is used.

add_station_data(stat, meta_idx=None, data_idx=None, check_index=False)[source]
all_datapoints_var(var_name)[source]

Get array of all data values of input variable

Parameters:

var_name (str) – variable name

Returns:

1-d numpy array containing all values of this variable

Return type:

ndarray

Raises:

AttributeError – if variable name is not available

property altitude

Altitudes of stations

append(other)[source]

Append other instance of UngriddedData to this object

Note

Calls merge(other, new_obj=False)

Parameters:

other (UngriddedData) – other data object

Returns:

merged data object

Return type:

UngriddedData

Raises:

ValueError – if input object is not an instance of UngriddedData

apply_filters(var_outlier_ranges=None, **filter_attributes)[source]

Extended filtering method

Combines filter_by_meta() and adds option to also remove outliers (keyword remove_outliers), set flagged data points to NaN (keyword set_flags_nan) and to extract individual variables (keyword var_name).

Parameters:
  • var_outlier_ranges (dict, optional) – dictionary specifying custom outlier ranges for individual variables.

  • **filter_attributes (dict) – filters that are supposed to be applied to the data. To remove outliers, use keyword remove_outliers, to set flagged values to NaN, use keyword set_flags_nan, to extract single or multiple variables, use keyword var_name. Further filter keys are assumed to be metadata specific and are passed to filter_by_meta().

Returns:

filtered data object

Return type:

UngriddedData

apply_region_mask(region_id=None)[source]

TODO: Write documentation

Parameters:

region_id (str or list (of strings)) – ID of region or IDs of multiple regions to be combined

property available_meta_keys

List of all available metadata keys

Note

This is a list of all metadata keys that exist in this dataset, but it does not mean that all of the keys are registered in all metadata blocks, especially if the data is merged from different sources with different metadata availability

change_var_idx(var_name, new_idx)[source]

Change index that is assigned to variable

Each variable in this object has assigned a unique index that is stored in the dictionary var_idx and which is used internally to access data from a certain variable from the data array _data (the indices are stored in the data column specified by _VARINDEX, cf. class header).

This index thus needs to be unique for each variable and hence, may need to be updated, when two instances of UngriddedData are merged (cf. merge()).

And the latter is exactly what this function does.

Parameters:
  • var_name (str) – name of variable

  • new_idx (int) – new index of variable

Raises:

ValueError – if input new_idx already exist in this object as a variable index

check_convert_var_units(var_name, to_unit=None, inplace=True)[source]
check_set_country()[source]

Checks all metadata entries for availability of country information

Metadata blocks that are missing a country entry will be updated based on the country inferred from the corresponding lat / lon coordinate. Uses pyaerocom.geodesy.get_country_info_coords() (library reverse-geocode) to retrieve countries. This may be erroneous close to country borders as it uses euclidean distance based on a list of known locations.

Note

Metadata blocks that do not contain latitude and longitude entries are skipped.

Returns:

  • list – metadata entries where country was added

  • list – corresponding countries that were inferred from lat / lon
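A minimal sketch of the nearest-neighbour country lookup described above. The real implementation uses the reverse-geocode library via pyaerocom.geodesy.get_country_info_coords(); the function and data below are illustrative assumptions.

```python
def nearest_country(lat, lon, known_locations):
    """Return the country of the closest known location.

    known_locations: list of (lat, lon, country) tuples. Uses plain
    euclidean distance in degrees - fast, but, as noted above,
    potentially wrong close to country borders.
    """
    def sq_dist(item):
        klat, klon, _ = item
        return (klat - lat) ** 2 + (klon - lon) ** 2

    return min(known_locations, key=sq_dist)[2]
```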

check_unit(var_name, unit=None)[source]

Check if variable unit corresponds to AeroCom unit

Parameters:
  • var_name (str) – variable name for which unit is to be checked

  • unit (str, optional) – unit to be checked, if None, AeroCom default unit is used

Raises:

MetaDataError – if unit information is not accessible for input variable name

clear_meta_no_data(inplace=True)[source]

Remove all metadata blocks that have no data associated with them

Parameters:

inplace (bool) – if True, the changes are applied to this instance directly, else to a copy

Returns:

cleaned up data object

Return type:

UngriddedData

Raises:

DataCoverageError – if filtering results in empty data object

code_lat_lon_in_float()[source]

Encode lat and lon into a single number so that np.unique can be used to determine unique locations

colocate_vardata(var1, data_id1=None, var2=None, data_id2=None, other=None, **kwargs)[source]
property contains_datasets

List of all datasets in this object

property contains_instruments

List of all instruments in this object

property contains_vars: list[str]

List of all variables in this dataset

copy()[source]

Make a copy of this object

Returns:

copy of this object

Return type:

UngriddedData

Raises:

MemoryError – if copy is too big to fit into memory together with existing instance

property countries_available

Alphabetically sorted list of country names available

decode_lat_lon_from_float()[source]

Decode lat and lon from a single number calculated by code_lat_lon_in_float()
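The encode/decode pair above packs each coordinate pair into one number so that unique station locations can be found in a single pass (e.g. with np.unique). pyaerocom's actual encoding is an internal detail; the following is an illustrative sketch of the general technique.

```python
SCALE = 100_000  # ~1e-5 degree resolution

def code_lat_lon(lat, lon):
    """Pack a (lat, lon) pair into one integer: shift both coordinates into
    positive ranges, then give latitude the high digits and longitude the
    low ones."""
    ilat = round((lat + 90.0) * SCALE)   # 0 .. 18_000_000
    ilon = round((lon + 180.0) * SCALE)  # 0 .. 36_000_000
    return ilat * 40_000_000 + ilon      # 40e6 > max ilon, so no collisions

def decode_lat_lon(code):
    """Inverse of code_lat_lon (exact up to the chosen resolution)."""
    ilat, ilon = divmod(code, 40_000_000)
    return ilat / SCALE - 90.0, ilon / SCALE - 180.0
```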

empty_trash()[source]

Set all values in trash column to NaN

extract_dataset(data_id)[source]

Extract single dataset into new instance of UngriddedData

Calls filter_by_meta().

Parameters:

data_id (str) – ID of dataset

Returns:

new instance of ungridded data containing only data from specified input network

Return type:

UngriddedData

extract_var(var_name, check_index=True)[source]

Split this object into single-var UngriddedData objects

Parameters:
  • var_name (str) – name of variable that is supposed to be extracted

  • check_index (bool) – call _check_index() in the new data object.

Returns:

new data object containing only input variable data

Return type:

UngriddedData

extract_vars(var_names, check_index=True)[source]

Extract multiple variables from dataset

Loops over input variable names and calls extract_var() to retrieve single variable UngriddedData objects for each variable and then merges all of these into one object

Parameters:
  • var_names (list or str) – list of variables to be extracted

  • check_index (bool) – call _check_index() in the new data object.

Returns:

new data object containing input variables

Return type:

UngriddedData

Raises:

VarNotAvailableError – if one of the input variables is not available in this data object

filter_altitude(alt_range)[source]

Filter altitude range

Parameters:

alt_range (list or tuple) – 2-element list specifying altitude range to be filtered in m

Returns:

filtered data object

Return type:

UngriddedData

filter_by_meta(negate=None, **filter_attributes)[source]

Flexible method to filter these data based on input meta specs

Parameters:
  • negate (list or str, optional) – specified meta key(s) provided via filter_attributes that are supposed to be treated as ‘not valid’. E.g. if station_name=”bad_site” is input in filter_attributes and if station_name is listed in negate, then all metadata blocks containing “bad_site” as station_name will be excluded in output data object.

  • **filter_attributes – valid meta keywords that are supposed to be filtered and the corresponding filter values (or value ranges). Only valid meta keywords are considered (e.g. data_id, longitude, latitude, altitude, ts_type)

Returns:

filtered ungridded data object

Return type:

UngriddedData

Raises:
  • NotImplementedError – if an attempt is made to filter by variables (not yet possible)

  • IOError – if any of the input keys is not a valid meta key

Example

>>> import pyaerocom as pya
>>> r = pya.io.ReadUngridded(['AeronetSunV2Lev2.daily',
...                           'AeronetSunV3Lev2.daily'], 'od550aer')
>>> data = r.read()
>>> data_filtered = data.filter_by_meta(data_id='AeronetSunV2Lev2.daily',
...                                     longitude=[-30, 30],
...                                     latitude=[20, 70],
...                                     altitude=[0, 1000])
filter_region(region_id, check_mask=True, check_country_meta=False, **kwargs)[source]

Filter object by a certain region

Parameters:
  • region_id (str) – name of region (must be valid AeroCom region name or HTAP region)

  • check_mask (bool) – if True and region_id a valid name for a binary mask, then the filtering is done based on that binary mask.

  • check_country_meta (bool) – if True, then the input region_id is first checked against available country names in metadata. If that fails, it is assumed that this region is either a valid name for registered rectangular regions or for available binary masks.

  • **kwargs – currently not used in method (makes usage in higher level classes such as Filter easier as other data objects have the same method with possibly other input possibilities)

Returns:

filtered data object (containing only stations that fall into input region)

Return type:

UngriddedData

find_common_data_points(other, var_name, sampling_freq='daily')[source]
find_common_stations(other: UngriddedData, check_vars_available=None, check_coordinates: bool = True, max_diff_coords_km: float = 0.1) dict[source]

Search common stations between two UngriddedData objects

This method loops over all stations that are stored within this object (using metadata) and checks if the corresponding station exists in a second instance of UngriddedData that is provided. The check is performed on the basis of the station name, and optionally, for each station name match, the lon / lat coordinates can be compared within a certain radius (default 0.1 km).

Note

This is a beta version and should thus be treated with care.

Parameters:
  • other (UngriddedData) – other object of ungridded data

  • check_vars_available (list (or similar), optional) – list of variables that need to be available in stations of both datasets

  • check_coordinates (bool) – if True, check that lon and lat coordinates of station candidates match within a certain range, specified by input parameter max_diff_coords_km

Returns:

dictionary where keys are meta_indices of the common station in this object and corresponding values are meta indices of the station in the other object

Return type:

dict
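The name-plus-coordinate matching can be sketched as follows. This is a simplified stand-in, not the actual implementation: station records are assumed to be dicts mapping meta index to (name, lat, lon), and the coordinate check uses the haversine great-circle distance.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in km."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def find_common_stations(stations_a, stations_b, max_diff_km=0.1):
    """stations_*: dict meta_idx -> (name, lat, lon).

    Returns dict mapping meta indices in A to matching meta indices in B,
    where a match requires identical station name and coordinates within
    max_diff_km.
    """
    common = {}
    for ia, (name_a, lat_a, lon_a) in stations_a.items():
        for ib, (name_b, lat_b, lon_b) in stations_b.items():
            if name_a == name_b and haversine_km(lat_a, lon_a, lat_b, lon_b) <= max_diff_km:
                common[ia] = ib
                break
    return common
```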

find_station_meta_indices(station_name_or_pattern, allow_wildcards=True)[source]

Find indices of all metadata blocks matching input station name

You may also use a wildcard pattern as input (e.g. *Potenza*)

Parameters:
  • station_name_or_pattern (str) – station name or wildcard pattern

  • allow_wildcards (bool) – if True, input station_pattern will be used as wildcard pattern and all matches are returned.

Returns:

list containing all metadata indices that match the input station name or pattern

Return type:

list

Raises:

StationNotFoundError – if no such station exists in this data object

property first_meta_idx
static from_cache(data_dir, file_name)[source]

Load pickled instance of UngriddedData

Parameters:
  • data_dir (str) – directory where pickled object is stored

  • file_name (str) – file name of pickled object (needs to end with pkl)

Raises:

ValueError – if loading failed

Returns:

loaded UngriddedData object. If this method is called from an instance of UngriddedData, this instance remains unchanged. You may merge the returned reloaded instance using merge().

Return type:

UngriddedData

static from_station_data(stats, add_meta_keys=None)[source]

Create UngriddedData from input station data object(s)

Parameters:
  • stats (list or StationData) – input data object(s)

  • add_meta_keys (list, optional) – list of metadata keys that are supposed to be imported from the input StationData objects, in addition to the default metadata retrieved via StationData.get_meta().

Raises:

ValueError – if any of the input data objects is not an instance of StationData.

Returns:

ungridded data object created from input station data objects

Return type:

UngriddedData

get_variable_data(variables, start=None, stop=None, ts_type=None, **kwargs)[source]

Extract all data points of a certain variable

Parameters:

variables (str or list) – all variables that are supposed to be accessed

property has_flag_data

Boolean specifying whether this object contains flag data

property index
property is_empty

Boolean specifying whether this object contains data or not

property is_filtered

Boolean specifying whether this data object has been filtered

Note

Details about applied filtering can be found in filter_hist

property is_vertical_profile

Boolean specifying whether is vertical profile

last_filter_applied()[source]

Returns the last filter that was applied to this dataset

To see all filters, check out filter_hist

property last_meta_idx

Index of last metadata block

property latitude

Latitudes of stations

property longitude

Longitudes of stations

merge(other, new_obj=True)[source]

Merge another data object with this one

Parameters:
  • other (UngriddedData) – other data object

  • new_obj (bool) – if True, this object remains unchanged and the merged data objects are returned in a new instance of UngriddedData. If False, then this object is modified

Returns:

merged data object

Return type:

UngriddedData

Raises:

ValueError – if input object is not an instance of UngriddedData

merge_common_meta(ignore_keys=None)[source]

Merge all meta entries that are the same

Note

If there is an overlap in time between the data, the blocks are not merged

Parameters:

ignore_keys (list) – list containing meta keys that are supposed to be ignored

Returns:

merged data object

Return type:

UngriddedData

property nonunique_station_names

List of station names that occur more than once in metadata
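For illustration, the distinction between unique and non-unique station names boils down to a simple count over the metadata blocks (a sketch, not the actual implementation):

```python
from collections import Counter

def nonunique_station_names(all_names):
    """Return sorted station names that occur more than once, given the
    list of station names from all metadata blocks."""
    return sorted(name for name, count in Counter(all_names).items() if count > 1)
```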

num_obs_var_valid(var_name)[source]

Number of valid observations of variable in this dataset

Parameters:

var_name (str) – name of variable

Returns:

number of valid observations (all values that are not NaN)

Return type:

int

plot_station_coordinates(var_name=None, start=None, stop=None, ts_type=None, color='r', marker='o', markersize=8, fontsize_base=10, legend=True, add_title=True, **kwargs)[source]

Plot station coordinates on a map

All input parameters are optional and may be used to add constraints related to which stations are plotted. Default is all stations of all times.

Parameters:
  • var_name (str, optional) – name of variable to be retrieved

  • start – start time (optional)

  • stop – stop time (optional). If start time is provided and stop time not, then only the corresponding year inferred from start time will be considered

  • ts_type (str, optional) – temporal resolution

  • color (str) – color of stations on map

  • marker (str) – marker type of stations

  • markersize (int) – size of station markers

  • fontsize_base (int) – basic fontsize

  • legend (bool) – if True, legend is added

  • add_title (bool) – if True, title will be added

  • **kwargs – Additional keyword args passed to pyaerocom.plot.plot_coordinates()

Returns:

matplotlib axes instance

Return type:

axes

plot_station_timeseries(station_name, var_name, start=None, stop=None, ts_type=None, insert_nans=True, ax=None, **kwargs)[source]

Plot time series of station and variable

Parameters:
  • station_name (str or int) – station name or index of station in metadata dict

  • var_name (str) – name of variable to be retrieved

  • start – start time (optional)

  • stop – stop time (optional). If start time is provided and stop time not, then only the corresponding year inferred from start time will be considered

  • ts_type (str, optional) – temporal resolution

  • **kwargs – Additional keyword args passed to method pandas.Series.plot()

Returns:

matplotlib axes instance

Return type:

axes

remove_outliers(var_name, inplace=False, low=None, high=None, unit_ref=None, move_to_trash=True)[source]

Method that can be used to remove outliers from data

Parameters:
  • var_name (str) – variable name

  • inplace (bool) – if True, the outliers will be removed in this object, otherwise a new object will be created and returned

  • low (float) – lower end of valid range for input variable. If None, then the corresponding value from the default settings for this variable are used (cf. minimum attribute of available variables)

  • high (float) – upper end of valid range for input variable. If None, then the corresponding value from the default settings for this variable are used (cf. maximum attribute of available variables)

  • unit_ref (str) – reference unit for assessment of input outlier ranges: all data needs to be in that unit, else an Exception will be raised

  • move_to_trash (bool) – if True, then all detected outliers will be moved to the trash column of this data object (i.e. column no. specified at UngriddedData._TRASHINDEX).

Returns:

ungridded data object that has all outliers for this variable removed.

Return type:

UngriddedData

Raises:

ValueError – if input move_to_trash is True and in case for some of the measurements there is already data in the trash.
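The low / high range check and the trash-column behaviour can be sketched like this. This is illustrative only: the real method operates on the internal data array (with the trash column at UngriddedData._TRASHINDEX), whereas the sketch uses parallel lists.

```python
import math

def remove_outliers_sketch(values, low, high, move_to_trash=True):
    """Replace values outside [low, high] with NaN. Detected outliers are
    either discarded or moved to a parallel 'trash' list so they can be
    restored later (mirroring the trash column described above)."""
    cleaned, trash = [], []
    for v in values:
        if low <= v <= high:
            cleaned.append(v)
            trash.append(math.nan)
        else:
            cleaned.append(math.nan)
            trash.append(v if move_to_trash else math.nan)
    return cleaned, trash
```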

save_as(file_name, save_dir)[source]

Save this object to disk

Note

So far, only storage as pickled object via CacheHandlerUngridded is supported, so input file_name must end with .pkl

Parameters:
  • file_name (str) – name of output file

  • save_dir (str) – name of output directory

Returns:

file path

Return type:

str

set_flags_nan(inplace=False)[source]

Set all flagged datapoints to NaN

Parameters:

inplace (bool) – if True, the flagged datapoints will be set to NaN in this object, otherwise a new object will be created and returned

Returns:

data object that has all flagged data values set to NaN

Return type:

UngriddedData

Raises:

AttributeError – if no flags are assigned

property shape

Shape of data array

property station_coordinates

dictionary with station coordinates

Returns:

dictionary containing station coordinates (latitude, longitude, altitude -> values) for all stations (keys) where these parameters are accessible.

Return type:

dict

property station_name

Names of stations

property time

Time dimension of data

to_station_data(meta_idx, vars_to_convert=None, start=None, stop=None, freq=None, ts_type_preferred=None, merge_if_multi=True, merge_pref_attr=None, merge_sort_by_largest=True, insert_nans=False, allow_wildcards_station_name=True, add_meta_keys=None, resample_how=None, min_num_obs=None)[source]

Convert data from one station to StationData

Parameters:
  • meta_idx (int or str) – index of station or name of station.

  • vars_to_convert (list or str, optional) – variables that are supposed to be converted. If None, use all variables that are available for this station

  • start – start time, optional (if not None, input must be convertible into pandas.Timestamp)

  • stop – stop time, optional (if not None, input must be convertible into pandas.Timestamp)

  • freq (str) – pandas frequency string (e.g. ‘D’ for daily, ‘M’ for month end) or valid pyaerocom ts_type

  • merge_if_multi (bool) – if True and if data request results in multiple instances of StationData objects, then these are attempted to be merged into one StationData object using merge_station_data()

  • merge_pref_attr – only relevant for merging of multiple matches: preferred attribute that is used to sort the individual StationData objects by relevance. Needs to be available in each of the individual StationData objects. For details cf. pref_attr in docstring of merge_station_data(). Example could be revision_date. If None, then the stations will be sorted based on the number of available data points (if merge_sort_by_largest is True, which is default).

  • merge_sort_by_largest (bool) – only relevant for merging of multiple matches: cf. prev. attr. and docstring of merge_station_data() method.

  • insert_nans (bool) – if True, then the retrieved StationData objects are filled with NaNs

  • allow_wildcards_station_name (bool) – if True and if input meta_idx is a string (i.e. a station name or pattern), metadata matches will be identified applying wildcard matches between input meta_idx and all station names in this object.

Returns:

StationData object(s) containing results. list is only returned if input for meta_idx is station name and multiple matches are detected for that station (e.g. data from different instruments), else single instance of StationData. All variable time series are inserted as pandas Series

Return type:

StationData or list

to_station_data_all(vars_to_convert=None, start=None, stop=None, freq=None, ts_type_preferred=None, by_station_name=True, ignore_index=None, **kwargs)[source]

Convert all data to StationData objects

Creates one instance of StationData for each metadata block in this object.

Parameters:
  • vars_to_convert (list or str, optional) – variables that are supposed to be converted. If None, use all variables that are available for this station

  • start – start time, optional (if not None, input must be convertible into pandas.Timestamp)

  • stop – stop time, optional (if not None, input must be convertible into pandas.Timestamp)

  • freq (str) – pandas frequency string (e.g. ‘D’ for daily, ‘M’ for month end) or valid pyaerocom ts_type (e.g. ‘hourly’, ‘monthly’).

  • by_station_name (bool) – if True, then iter over unique_station_name (and merge multiple matches if applicable), else, iter over metadata index

  • **kwargs – additional keyword args passed to to_station_data() (e.g. merge_if_multi, merge_pref_attr, merge_sort_by_largest, insert_nans)

Returns:

4-element dictionary containing following key / value pairs:

  • stats: list of StationData objects

  • station_name: list of corresponding station names

  • latitude: list of latitude coordinates

  • longitude: list of longitude coordinates

Return type:

dict

property unique_station_names

List of unique station names

pyaerocom.ungriddeddata.reduce_array_closest(arr_nominal, arr_to_be_reduced)[source]

Co-located data

class pyaerocom.colocateddata.ColocatedData(data=None, **kwargs)[source]

Class representing colocated and unified data from two sources

Sources may be instances of UngriddedData or GriddedData that have been compared to each other.

Note

Currently, it is not foreseen that this object is instantiated from scratch; rather, it is created in and returned by objects / methods that perform colocation.

The purpose of this object is thus not the creation of colocated objects, but solely the analysis of such data as well as I/O features (e.g. save as / read from .nc files, convert to pandas.DataFrame, plot station time series overlays, scatter plots, etc.).

In the current design, such an object comprises 3 or 4 dimensions, where the first dimension (data_source, index 0) is ALWAYS length 2 and specifies the two datasets that were co-located (index 0 is obs, index 1 is model). The second dimension is time and in case of 3D colocated data the 3rd dimension is station_name while for 4D colocated data the 3rd and 4th dimension are latitude and longitude, respectively.

3D colocated data is typically created when a model is colocated with station based ground based observations ( cf pyaerocom.colocation.colocate_gridded_ungridded()) while 4D colocated data is created when a model is colocated with another model or satellite observations, that cover large parts of Earth’s surface (other than discrete lat/lon pairs in the case of ground based station locations).

Parameters:
  • data (xarray.DataArray or numpy.ndarray or str, optional) – Colocated data. If str, then it is attempted to be loaded from file. Else, it is assumed that data is numpy array and that all further supplementary inputs (e.g. coords, dims) for the instantiation of DataArray is provided via **kwargs.

  • **kwargs – Additional keyword args that are passed to init of DataArray in case input data is numpy array.

Raises:

IOError – if init fails
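As a concrete picture of the 3D case described in the class docstring, here is the layout with plain nested lists standing in for the xarray.DataArray (shapes only; the values are made up):

```python
# Axis 0: data_source (ALWAYS length 2 - obs first, model second)
# Axis 1: time (e.g. 12 monthly values)
# Axis 2: station_name (e.g. 4 sites)
n_times, n_stations = 12, 4
obs = [[0.10] * n_stations for _ in range(n_times)]
model = [[0.12] * n_stations for _ in range(n_times)]
coldata = [obs, model]  # shape (2, n_times, n_stations)
```

In the 4D case, the station_name axis is replaced by separate latitude and longitude axes.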

apply_country_filter(region_id, use_country_code=False, inplace=False)[source]

Apply country filter

Parameters:
  • region_id (str) – country name or code.

  • use_country_code (bool, optional) – If True, input value for country is evaluated against country codes rather than country names. Defaults to False.

  • inplace (bool, optional) – Apply filter to this object directly or to a copy. The default is False.

Raises:

NotImplementedError – if data is 4D (i.e. it has latitude and longitude dimensions).

Returns:

filtered data object.

Return type:

ColocatedData

apply_latlon_filter(lat_range=None, lon_range=None, region_id=None, inplace=False)[source]

Apply rectangular latitude/longitude filter

Parameters:
  • lat_range (list, optional) – latitude range that is supposed to be applied. If specified, then also lon_range need to be specified, else, region_id is checked against AeroCom default regions (and used if applicable)

  • lon_range (list, optional) – longitude range that is supposed to be applied. If specified, then also lat_range need to be specified, else, region_id is checked against AeroCom default regions (and used if applicable)

  • region_id (str) – name of region to be applied. If provided (i.e. not None) then input args lat_range and lon_range are ignored

  • inplace (bool, optional) – Apply filter to this object directly or to a copy. The default is False.

Raises:

ValueError – if lower latitude bound exceeds upper latitude bound.

Returns:

filtered data object

Return type:

ColocatedData

apply_region_mask(region_id, inplace=False)[source]

Apply a binary region mask filter to the data object. Available binary region IDs can be found at pyaerocom.const.HTAP_REGIONS.

Parameters:
  • region_id (str) – ID of binary region.

  • inplace (bool, optional) – If True, the current instance is modified, else a new instance of ColocatedData is created and filtered. The default is False.

Raises:

DataCoverageError – if filtering results in empty data object.

Returns:

data – Filtered data object.

Return type:

ColocatedData

property area_weights

Wrapper for calc_area_weights()

calc_area_weights()[source]

Calculate area weights

Note

Only applies to colocated data that has latitude and longitude dimension.

Returns:

array containing weights for each datapoint (same shape as self.data[0])

Return type:

ndarray
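Area weighting accounts for the fact that grid cells shrink towards the poles. For a regular latitude grid the weights are proportional to the cosine of latitude; the sketch below makes that assumption (the real method derives weights from the actual grid-cell sizes):

```python
import math

def area_weights(lats):
    """Relative grid-cell area weights for a regular latitude grid:
    proportional to cos(latitude), normalised to sum to 1."""
    raw = [math.cos(math.radians(lat)) for lat in lats]
    total = sum(raw)
    return [w / total for w in raw]
```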

calc_nmb_array()[source]

Calculate data array with normalised bias (NMB) values

Returns:

NMBs at each coordinate

Return type:

DataArray
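The normalised mean bias follows the usual definition NMB = Σ(model − obs) / Σ(obs) over valid data pairs. A self-contained sketch of the per-coordinate computation (illustrative, not pyaerocom's implementation):

```python
def calc_nmb(obs, mod):
    """Normalised mean bias over paired obs/model values. Pairs where
    either value is NaN are dropped (NaN != NaN is used as the NaN test)."""
    pairs = [(o, m) for o, m in zip(obs, mod) if o == o and m == m]
    sum_obs = sum(o for o, _ in pairs)
    return sum(m - o for o, m in pairs) / sum_obs
```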

calc_spatial_statistics(aggr=None, use_area_weights=False, **kwargs)[source]

Calculate spatial statistics from model and obs data

Spatial statistics are computed by first averaging the time dimension and then, if the data is 4D, flattening the lat / lon dimensions into a new station_name dimension, so that the resulting dimensions are data_source and station_name. These 2D data are then used to calculate standard statistics using pyaerocom.mathutils.calc_statistics().

See also calc_statistics() and calc_temporal_statistics().

Parameters:
  • aggr (str, optional) – aggregator to be used, currently only mean and median are supported. Defaults to mean.

  • use_area_weights (bool) – if True and if data is 4D (i.e. has lat and lon dimension), then area weights are applied when calculating the statistics based on the coordinate cell sizes. Defaults to False.

  • **kwargs – additional keyword args passed to pyaerocom.mathutils.calc_statistics()

Returns:

dictionary containing statistical parameters

Return type:

dict

calc_statistics(use_area_weights=False, **kwargs)[source]

Calculate statistics from model and obs data

Calculate standard statistics for model assessment. This is done by taking all model and obs data points in this object as input for pyaerocom.mathutils.calc_statistics(). For instance, if the object is 3D with dimensions data_source (obs, model), time (e.g. 12 monthly values) and station_name (e.g. 4 sites), then the input arrays for model and obs into pyaerocom.mathutils.calc_statistics() will be each of size 12x4.

See also calc_temporal_statistics() and calc_spatial_statistics().

Parameters:
  • use_area_weights (bool) – if True and if data is 4D (i.e. has lat and lon dimension), then area weights are applied when calculating the statistics based on the coordinate cell sizes. Defaults to False.

  • **kwargs – additional keyword args passed to pyaerocom.mathutils.calc_statistics()

Returns:

dictionary containing statistical parameters

Return type:

dict

calc_temporal_statistics(aggr=None, **kwargs)[source]

Calculate temporal statistics from model and obs data

Temporal statistics are computed by first averaging the spatial dimension(s) (that is, station_name for 3D data, and latitude and longitude for 4D data), so that only data_source and time remain as dimensions. These 2D data are then used to calculate standard statistics using pyaerocom.mathutils.calc_statistics().

See also calc_statistics() and calc_spatial_statistics().

Parameters:
  • aggr (str, optional) – aggregator to be used, currently only mean and median are supported. Defaults to mean.

  • **kwargs – additional keyword args passed to pyaerocom.mathutils.calc_statistics()

Returns:

dictionary containing statistical parameters

Return type:

dict
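The difference between the temporal and spatial statistics above is only the order of averaging. Viewing the colocated data per data source as a time × station matrix, the two reductions look like this (a sketch; the real methods operate on the xarray.DataArray and support area weighting):

```python
def mean(xs):
    return sum(xs) / len(xs)

def temporal_series(cube):
    """Average over stations first: one value per time step
    (the reduction used for temporal statistics)."""
    return [mean(row) for row in cube]

def spatial_series(cube):
    """Average over time first: one value per station
    (the reduction used for spatial statistics)."""
    n_stations = len(cube[0])
    return [mean([row[j] for row in cube]) for j in range(n_stations)]
```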

check_set_countries(inplace=True, assign_to_dim=None)[source]

Checks if country information is available and assigns if not

If no country information is available, countries will be assigned for each lat / lon coordinate using pyaerocom.geodesy.get_country_info_coords().

Parameters:
  • inplace (bool, optional) – If True, modify and return this object, else a copy. The default is True.

  • assign_to_dim (str, optional) – name of dimension to which the country coordinate is assigned. Default is None, in which case station_name is used.

Raises:

DataDimensionError – If data is 4D (i.e. if latitude and longitude are orthogonal dimensions)

Returns:

data object with countries assigned

Return type:

ColocatedData

property coords

Coordinates of data array

copy()[source]

Copy this object

property countries_available

Alphabetically sorted list of country names available

Raises:

MetaDataError – if no country information is available

Returns:

list of countries available in these data

Return type:

list

property country_codes_available

Alphabetically sorted list of country codes available

Raises:

MetaDataError – if no country information is available

Returns:

list of countries available in these data

Return type:

list

property data

xarray.DataArray containing colocated data

Raises:

AttributeError – if data is not available

Returns:

array containing colocated data and metadata (in fact, there are no additional attributes to ColocatedData; everything is contained in data).

Return type:

xarray.DataArray

property data_source

Coordinate array containing data sources (z-axis)

property dims

Names of dimensions

filter_altitude(alt_range, inplace=False)[source]

Apply altitude filter

Parameters:
  • alt_range (list or tuple) – altitude range to be applied to data (2-element list)

  • inplace (bool, optional) – Apply filter to this object directly or to a copy. The default is False.

Raises:

NotImplementedError – If data is 4D, i.e. it contains latitude and longitude dimensions.

Returns:

Filtered data object.

Return type:

ColocatedData

filter_region(region_id, check_mask=True, check_country_meta=False, inplace=False)[source]

Filter object by region

Parameters:
  • region_id (str) – ID of region

  • inplace (bool) – if True, the filtering is done directly in this instance, else a new instance is returned

  • check_mask (bool) – if True and region_id a valid name for a binary mask, then the filtering is done based on that binary mask.

  • check_country_meta (bool) – if True, then the input region_id is first checked against available country names in metadata. If that fails, it is assumed that this region is either a valid name for registered rectangular regions or for available binary masks.

Returns:

filtered data object

Return type:

ColocatedData

flatten_latlondim_station_name()[source]

Stack (flatten) lat / lon dimension into new dimension station_name

Returns:

new colocated data object with dimension station_name and lat lon arrays as additional coordinates

Return type:

ColocatedData

from_csv(file_path)[source]

Read data from CSV file

from_dataframe(df)[source]

Create colocated Data object from dataframe

Note

This is intended to be used as back-conversion from to_dataframe() and methods that use the latter (e.g. to_csv()).

get_coords_valid_obs()[source]

Get latitude / longitude coordinates where obsdata is available

Returns:

  • list – latitute coordinates

  • list – longitude coordinates

get_country_codes()[source]

Get country names and codes for all locations contained in these data

Raises:

MetaDataError – if no country information is available

Returns:

dictionary of unique country names (keys) and corresponding country codes (values)

Return type:

dict

static get_meta_from_filename(file_path)[source]

Get meta information from file name

Note

This does not yet include IDs of model and obs data as these should be included in the data anyways (e.g. column names in CSV file) and may include the delimiter _ in their name.

Returns:

dictionary with meta information

Return type:

dict

get_meta_item(key: str)[source]

Get metadata value

Parameters:

key (str) – meta item key.

Raises:

AttributeError – If key is not available.

Returns:

value of metadata.

Return type:

object

get_regional_timeseries(region_id, **filter_kwargs)[source]

Compute regional timeseries both for model and obs

Parameters:
  • region_id (str) – name of region for which regional timeseries is supposed to be retrieved

  • **filter_kwargs – additional keyword args passed to filter_region().

Returns:

dictionary containing regional timeseries for model (key mod) and obsdata (key obs) and name of region.

Return type:

dict

get_time_resampling_settings()[source]

Returns a dictionary with relevant settings for temporal resampling

Return type:

dict

property has_latlon_dims

Boolean specifying whether data has latitude and longitude dimensions

property has_time_dim

Boolean specifying whether data has a time dimension

property lat_range

Latitude range covered by this data object

property latitude

Array of latitude coordinates

property lon_range

Longitude range covered by this data object

property longitude

Array of longitude coordinates

max()[source]

Wrapper for xarray.DataArray.max() called from data

Returns:

maximum of data

Return type:

xarray.DataArray

property meta

DEPRECATED -> use metadata

property metadata

Metadata dictionary (wrapper to data.attrs)

min()[source]

Wrapper for xarray.DataArray.min() called from data

Returns:

minimum of data

Return type:

xarray.DataArray

property model_name
property ndim

Dimension of data array

property num_coords

Total number of lat/lon coordinate pairs

property num_coords_with_data

Number of lat/lon coordinate pairs that contain at least one datapoint

Note

Occurrence of valid data is only checked for obsdata (first index in data_source dimension).

property num_grid_points

DEPRECATED -> use num_coords

property obs_name

open(file_path)[source]

High level helper for reading from supported file sources

Parameters:

file_path (str) – file path

plot_coordinates(marker='x', markersize=12, fontsize_base=10, **kwargs)[source]

Plot station coordinates

Uses pyaerocom.plot.plotcoordinates.plot_coordinates().

Parameters:
  • marker (str, optional) – matplotlib marker name used to plot site locations. The default is ‘x’.

  • markersize (int, optional) – Size of site markers. The default is 12.

  • fontsize_base (int, optional) – Basic fontsize. The default is 10.

  • **kwargs – additional keyword args passed to pyaerocom.plot.plotcoordinates.plot_coordinates()

Return type:

matplotlib.axes.Axes

plot_scatter(**kwargs)[source]

Create scatter plot of data

Parameters:

**kwargs – keyword args passed to pyaerocom.plot.plotscatter.plot_scatter()

Returns:

matplotlib axes instance

Return type:

Axes

read_netcdf(file_path)[source]

Read data from NetCDF file

Parameters:

file_path (str) – file path

rename_variable(var_name, new_var_name, data_source, inplace=True)[source]

Rename a variable in this object

Parameters:
  • var_name (str) – current variable name

  • new_var_name (str) – new variable name

  • data_source (str) – name of data source (along data_source dimension)

  • inplace (bool) – replace here or create new instance

Returns:

instance with renamed variable

Return type:

ColocatedData

resample_time(to_ts_type, how=None, min_num_obs=None, colocate_time=False, settings_from_meta=False, inplace=False, **kwargs)[source]

Resample time dimension

The temporal resampling is done using TimeResampler

Parameters:
  • to_ts_type (str) – desired output frequency.

  • how (str or dict, optional) – aggregator used for resampling (e.g. max, min, mean, median). Can also be hierarchical scheme via dict, similar to min_num_obs. The default is None.

  • min_num_obs (int or dict, optional) – Minimum number of observations required to resample from current frequency (ts_type) to desired output frequency.

  • colocate_time (bool, optional) – If True, the modeldata is invalidated where obs is NaN, before resampling. The default is False (updated in v0.11.0, before was True).

  • settings_from_meta (bool) – if True, then input args how, min_num_obs and colocate_time are ignored and instead the corresponding values set in metadata are used. Defaults to False.

  • inplace (bool, optional) – If True, modify this object directly, else make a copy and resample that one. The default is False (updated in v0.11.0, before was True).

  • **kwargs – Additional keyword args passed to TimeResampler.resample().

Returns:

Resampled colocated data object.

Return type:

ColocatedData
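
The effect of the min_num_obs constraint can be sketched with plain pandas (an illustration only; pyaerocom performs this internally via TimeResampler, and the data values below are made up):

```python
import numpy as np
import pandas as pd

# Daily series with a sparse month: aggregates based on fewer than
# min_num_obs samples are masked (sketch of the constraint enforced by
# TimeResampler, not pyaerocom's actual implementation).
idx = pd.date_range("2010-01-01", "2010-02-28", freq="D")
vals = np.ones(len(idx))
vals[31:55] = np.nan                      # most of February is missing
s = pd.Series(vals, index=idx)

monthly = s.resample("MS").mean()
counts = s.resample("MS").count()
min_num_obs = 15                          # require >= 15 daily values per month
monthly[counts < min_num_obs] = np.nan    # January kept, February masked

print(monthly)
```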

property savename_aerocom

Default save name for data object following AeroCom convention

set_zeros_nan(inplace=True)[source]

Replace all 0’s with NaN in data

Parameters:

inplace (bool) – Whether to modify this object or a copy. The default is True.

Returns:

cd – modified data object

Return type:

ColocatedData

property shape

Shape of data array

stack(inplace=False, **kwargs)[source]

Stack one or more dimensions

For details see xarray.DataArray.stack().

Parameters:
  • inplace (bool) – modify this object or a copy.

  • **kwargs – input arguments passed to DataArray.stack()

Returns:

stacked data object

Return type:

ColocatedData

property start

Start datetime of data

property start_str

Start date of data as str with format YYYYMMDD

Type:

str

property stop

Stop datetime of data

property stop_str

Stop date of data as str with format YYYYMMDD

Type:

str

property time

Array containing time stamps

to_csv(out_dir, savename=None)[source]

Save data object as .csv file

Converts data to pandas.DataFrame and then saves as csv

Parameters:
  • out_dir (str) – output directory

  • savename (str, optional) – name of file, if None, the default save name is used (cf. savename_aerocom)

to_dataframe()[source]

Convert this object into pandas.DataFrame

Note

This does not include meta information

to_netcdf(out_dir, savename=None, **kwargs)[source]

Save data object as NetCDF file

Wrapper for method xarray.DataArray.to_netcdf()

Parameters:
  • out_dir (str) – output directory

  • savename (str, optional) – name of file, if None, the default save name is used (cf. savename_aerocom)

  • **kwargs – additional, optional keyword arguments passed to xarray.DataArray.to_netcdf()

Returns:

file path of stored object.

Return type:

str

property ts_type

String specifying temporal resolution of data

property unit

DEPRECATED -> use units

property units

Unit of data

property unitstr

String representation of obs and model units in this object

unstack(inplace=False, **kwargs)[source]

Unstack one or more dimensions

For details see xarray.DataArray.unstack().

Parameters:
  • inplace (bool) – modify this object or a copy.

  • **kwargs – input arguments passed to DataArray.unstack()

Returns:

unstacked data object

Return type:

ColocatedData

property var_name

Name of variable

Station data

class pyaerocom.stationdata.StationData(**meta_info)[source]

Dict-like base class for single station data

ToDo: write more detailed introduction

Note

Variable data (e.g. numpy array or pandas Series) can be directly assigned to the object. When assigning variable data it is recommended to add variable metadata (e.g. unit, ts_type) in var_info, where key is variable name and value is dict with metadata entries.
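
The assignment pattern described in the note can be sketched with a plain dict standing in for the dict-like StationData (in real usage you would assign to a StationData instance; the variable name od550aer is just an example):

```python
import pandas as pd

# Plain dict as a stand-in for the dict-like StationData object
stat = {"var_info": {}}

# assign variable data directly ...
stat["od550aer"] = pd.Series(
    [0.1, 0.2], index=pd.to_datetime(["2010-01-01", "2010-01-02"])
)
# ... and register the recommended variable metadata alongside it
stat["var_info"]["od550aer"] = {"units": "1", "ts_type": "daily"}

print(stat["var_info"]["od550aer"])
```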

dtime

list / array containing time index values

Type:

list

var_info

dictionary containing information about each variable

Type:

dict

data_err

dictionary that may be used to store uncertainty timeseries or data arrays associated with the different variable data.

Type:

dict

overlap

dictionary that may be filled to store overlapping timeseries data associated with one variable. This is, for instance, used in merge_vardata() to store overlapping data from another station.

Type:

dict

PROTECTED_KEYS = ['dtime', 'var_info', 'station_coords', 'data_err', 'overlap', 'numobs', 'data_flagged']

Keys that are ignored when accessing metadata

STANDARD_COORD_KEYS = ['latitude', 'longitude', 'altitude']

List of keys that specify standard metadata attribute names. This is used e.g. in get_meta()

STANDARD_META_KEYS = ['filename', 'station_id', 'station_name', 'instrument_name', 'PI', 'country', 'country_code', 'ts_type', 'latitude', 'longitude', 'altitude', 'data_id', 'dataset_name', 'data_product', 'data_version', 'data_level', 'framework', 'instr_vert_loc', 'revision_date', 'website', 'ts_type_src', 'stat_merge_pref_attr']

VALID_TS_TYPES = ['minutely', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'native']

calc_climatology(var_name, start=None, stop=None, min_num_obs=None, clim_mincount=None, clim_freq=None, set_year=None, resample_how=None)[source]

Calculate climatological timeseries for input variable

Parameters:
  • var_name (str) – name of data variable

  • start – start time of data used to compute climatology

  • stop – stop time of data used to compute climatology

  • min_num_obs (dict or int, optional) – minimum number of observations required per period (when downsampling). For details see pyaerocom.time_resampler.TimeResampler.resample()

  • clim_mincount (int, optional) – minimum number of monthly values required per month of climatology

  • set_year (int, optional) – if specified, the output data will be assigned the input year. Else the middle year of the climatological interval is used.

  • resample_how (str) – how should the resampled data be averaged (e.g. mean, median)

  • **kwargs – Additional keyword args passed to pyaerocom.time_resampler.TimeResampler.resample()

Returns:

new instance of StationData containing climatological data

Return type:

StationData
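
The core idea of calc_climatology — averaging each calendar month across years, subject to a minimum-count constraint — can be sketched with pandas (a simplified illustration, not pyaerocom's implementation):

```python
import numpy as np
import pandas as pd

# Monthly series over 5 years in which March is sparse; the climatology
# masks calendar months with fewer than clim_mincount contributing values.
idx = pd.date_range("2005-01-01", "2009-12-01", freq="MS")
s = pd.Series(np.ones(len(idx)), index=idx)
s.loc[(s.index.month == 3) & (s.index.year < 2009)] = np.nan  # March sparse

clim = s.groupby(s.index.month).mean()
counts = s.groupby(s.index.month).count()
clim_mincount = 3
clim[counts < clim_mincount] = np.nan  # March -> NaN, other months kept

print(clim)
```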

check_dtime()[source]

Checks if dtime attribute is array or list

check_if_3d(var_name)[source]

Checks if altitude data is available in this object

check_unit(var_name, unit=None)[source]

Check if variable unit corresponds to a certain unit

Parameters:
  • var_name (str) – variable name for which unit is to be checked

  • unit (str, optional) – unit to be checked, if None, AeroCom default unit is used

Raises:
  • MetaDataError – if unit information is not accessible for input variable name

  • UnitConversionError – if current unit cannot be converted into specified unit (e.g. 1 vs m-1)

  • DataUnitError – if current unit is not equal to input unit but can be converted (e.g. 1/Mm vs 1/m)
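
The distinction between convertible and non-convertible units can be illustrated with a minimal sketch (the hardcoded conversion factors below are purely illustrative; pyaerocom handles units via the cf-units package):

```python
# Toy conversion table for extinction-style units (illustrative only)
FACTORS_TO_PER_M = {"1/m": 1.0, "1/km": 1e-3, "1/Mm": 1e-6}

def convert(values, from_unit, to_unit):
    """Convert between known, convertible units; raise otherwise."""
    if from_unit not in FACTORS_TO_PER_M or to_unit not in FACTORS_TO_PER_M:
        raise ValueError(f"cannot convert {from_unit} to {to_unit}")
    factor = FACTORS_TO_PER_M[from_unit] / FACTORS_TO_PER_M[to_unit]
    return [v * factor for v in values]

print(convert([150.0], "1/Mm", "1/m"))  # ~1.5e-4
```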

check_var_unit_aerocom(var_name)[source]

Check if unit of input variable is AeroCom default, if not, convert

Parameters:

var_name (str) – name of variable

Raises:
  • MetaDataError – if unit information is not accessible for input variable name

  • UnitConversionError – if current unit cannot be converted into specified unit (e.g. 1 vs m-1)

  • DataUnitError – if current unit is not equal to AeroCom default and cannot be converted.

convert_unit(var_name, to_unit)[source]

Try to convert unit of data

Requires that unit of input variable is available in var_info

Parameters:
  • var_name (str) – name of variable

  • to_unit (str) – new unit

copy()[source]

property default_vert_grid

AeroCom default grid for vertical regridding

For details, see DEFAULT_VERT_GRID_DEF in Config

Returns:

numpy array specifying default coordinates

Return type:

ndarray

dist_other(other)[source]

Distance to other station in km

Parameters:

other (StationData) – other data object

Returns:

distance between this and other station in km

Return type:

float
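
A great-circle (haversine) distance, as one would use for such a station-to-station distance, can be sketched as follows (an illustrative stand-in; pyaerocom's own implementation may differ in details, e.g. the Earth radius used):

```python
import math

def dist_km(lat0, lon0, lat1, lon1, earth_radius_km=6371.0):
    """Great-circle distance in km between two points (haversine formula)."""
    phi0, phi1 = math.radians(lat0), math.radians(lat1)
    dphi = math.radians(lat1 - lat0)
    dlam = math.radians(lon1 - lon0)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi0) * math.cos(phi1) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# 0.1 deg of longitude at ~60N is roughly 5.6 km
print(round(dist_km(59.9, 10.7, 59.9, 10.8), 1))
```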

get_meta(force_single_value=True, quality_check=True, add_none_vals=False, add_meta_keys=None)[source]

Return meta-data as dictionary

By default, only default metadata keys are considered, use parameter add_meta_keys to add additional metadata.

Parameters:
  • force_single_value (bool) – if True, then each meta value that is a list or array is converted to a single value.

  • quality_check (bool) – if True, and coordinate values are lists or arrays, then the standard deviation in the values is compared to the upper limits allowed for local variation. The upper limits are specified in attr. COORD_MAX_VAR.

  • add_none_vals (bool) – Add metadata keys which have value set to None.

  • add_meta_keys (str or list, optional) – Add non-standard metadata.

Returns:

dictionary containing the retrieved meta-data

Return type:

dict

Raises:
  • AttributeError – if one of the meta entries is invalid

  • MetaDataError – in case of inconsistencies in metadata between individual time stamps

get_station_coords(force_single_value=True)[source]

Return coordinates as dictionary

This method uses the standard coordinate names defined in STANDARD_COORD_KEYS (latitude, longitude and altitude) to get the station coordinates. For each of these parameters it first looks in station_coords if the parameter is defined (i.e. it is not None), and if not, it checks whether this object has an attribute of that name and uses that one.

Parameters:

force_single_value (bool) – if True and coordinate values are lists or arrays, then they are collapsed to single value using mean

Returns:

dictionary containing the retrieved coordinates

Return type:

dict

Raises:
  • AttributeError – if one of the coordinate values is invalid

  • CoordinateError – if local variation in either of the three spatial coordinates is found too large
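
The force_single_value / quality_check logic amounts to collapsing a list of coordinate readings to their mean while rejecting coordinates with too much spread. A minimal sketch (the 0.05 threshold below is purely illustrative; pyaerocom uses the limits defined in COORD_MAX_VAR):

```python
from statistics import mean, pstdev

def collapse_coord(values, max_std=0.05):
    """Collapse a list of coordinate readings to a single value.

    Raises ValueError if the spread exceeds the allowed maximum
    (stand-in for pyaerocom's CoordinateError).
    """
    if pstdev(values) > max_std:
        raise ValueError("local variation in coordinate too large")
    return mean(values)

print(collapse_coord([59.91, 59.92, 59.91]))  # small spread -> mean value
```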

get_unit(var_name)[source]

Get unit of variable data

Parameters:

var_name (str) – name of variable

Returns:

unit of variable

Return type:

str

Raises:

MetaDataError – if unit cannot be accessed for variable

get_var_ts_type(var_name, try_infer=True)[source]

Get ts_type for a certain variable

Note

Converts to ts_type string if assigned ts_type is in pandas format

Parameters:
  • var_name (str) – data variable name for which the ts_type is supposed to be retrieved

  • try_infer (bool) – if ts_type is not available, try inferring it from data

Returns:

the corresponding data time resolution

Return type:

str

Raises:

MetaDataError – if no metadata is available for this variable (e.g. if var_name cannot be found in var_info)

has_var(var_name)[source]

Checks if input variable is available in data object

Parameters:

var_name (str) – name of variable

Returns:

True, if variable data is available, else False

Return type:

bool

insert_nans_timeseries(var_name)[source]

Fill up missing values with NaNs in an existing time series

Note

This method does a resample of the data onto a regular grid. Thus, if the input ts_type is different from the actual current ts_type of the data, this method will not only insert NaNs but at the same time resample the data onto the corresponding regular frequency.

Parameters:
  • var_name (str) – variable name

  • inplace (bool) – if True, the actual data in this object will be overwritten with the new data that contains NaNs

Returns:

the modified station data object

Return type:

StationData
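
Conceptually, inserting NaNs means reindexing an irregular time series onto a regular grid, which can be sketched with pandas:

```python
import pandas as pd

# Irregular daily series with a gap on Jan 3/4
s = pd.Series([1.0, 2.0, 3.0],
              index=pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-05"]))

# Reindex onto a regular daily grid; missing days become NaN
regular = s.asfreq("D")

print(regular)
```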

merge_meta_same_station(other, coord_tol_km=None, check_coords=True, inplace=True, add_meta_keys=None, raise_on_error=False)[source]

Merge meta information from other object

Note

Coordinate attributes (latitude, longitude and altitude) are not copied as they are required to be the same in both stations. The latter can be checked and ensured using input argument check_coords

Parameters:
  • other (StationData) – other data object

  • coord_tol_km (float) – maximum distance in km between coordinates of input StationData object and self. Only relevant if check_coords is True. If None, then _COORD_MAX_VAR is used which is defined in the class header.

  • check_coords (bool) – if True, the coordinates are compared and checked if they are lying within a certain distance to each other (cf. coord_tol_km).

  • inplace (bool) – if True, the metadata from the other station is added to the metadata of this station, else, a new station is returned with the merged attributes.

  • add_meta_keys (str or list, optional) – additional non-standard metadata keys that are supposed to be considered for merging.

  • raise_on_error (bool) – if True, then an Exception will be raised in case one of the metadata items cannot be merged, which is most often due to unresolvable type differences of metadata values between the two objects

merge_other(other, var_name, add_meta_keys=None, **kwargs)[source]

Merge other station data object

Parameters:
  • other (StationData) – other data object

  • var_name (str) – variable name for which info is to be merged (needs to be both available in this object and the provided other object)

  • add_meta_keys (str or list, optional) – additional non-standard metadata keys that are supposed to be considered for merging.

  • kwargs – keyword args passed on to merge_vardata() (e.g time resampling settings)

Returns:

this object that has merged the other station

Return type:

StationData

merge_vardata(other, var_name, **kwargs)[source]

Merge variable data from other object into this object

Note

This merges also the information about this variable in the dict var_info. It is required, that variable meta-info is specified in both StationData objects.

Note

This method removes NaN’s from the existing time series in the data objects. In order to fill up the time-series with NaNs again after merging, call insert_nans_timeseries()

Parameters:
  • other (StationData) – other data object

  • var_name (str) – variable name for which info is to be merged (needs to be both available in this object and the provided other object)

  • kwargs – keyword args passed on to _merge_vardata_2d()

Returns:

this object merged with other object

Return type:

StationData

merge_varinfo(other, var_name)[source]

Merge variable specific meta information from other object

Parameters:
  • other (StationData) – other data object

  • var_name (str) – variable name for which info is to be merged (needs to be both available in this object and the provided other object)

plot_timeseries(var_name, add_overlaps=False, legend=True, tit=None, **kwargs)[source]

Plot timeseries for variable

Note

If you set input arg add_overlaps = True the overlapping timeseries data - if it exists - will be plotted on top of the actual timeseries using red colour and dashed line. As the overlapping data may be identical with the actual data, you might want to increase the line width of the actual timeseries using an additional input argument lw=4, or similar.

Parameters:
  • var_name (str) – name of variable (e.g. “od550aer”)

  • add_overlaps (bool) – if True and if overlapping data exists for this variable, it will be added to the plot.

  • tit (str, optional) – title of plot, if None, default title is used

  • **kwargs – additional keyword args passed to matplotlib plot method

Returns:

matplotlib.axes instance of plot

Return type:

axes

Raises:
  • KeyError – if variable key does not exist in this dictionary

  • ValueError – if length of data array does not equal the length of the time array

remove_outliers(var_name, low=None, high=None, check_unit=True)[source]

Remove outliers from one of the variable timeseries

Parameters:
  • var_name (str) – variable name

  • low (float) – lower end of valid range for input variable. If None, then the corresponding value from the default settings for this variable are used (cf. minimum attribute of available variables)

  • high (float) – upper end of valid range for input variable. If None, then the corresponding value from the default settings for this variable are used (cf. maximum attribute of available variables)

  • check_unit (bool) – if True, the unit of the data is checked against AeroCom default

remove_variable(var_name)[source]

Remove variable data

Parameters:

var_name (str) – name of variable that is to be removed

Returns:

current instance of this object, with data removed

Return type:

StationData

Raises:

VarNotAvailableError – if the input variable is not available in this object

resample_time(var_name, ts_type, how=None, min_num_obs=None, inplace=False, **kwargs)[source]

Resample one of the time-series in this object

Parameters:
  • var_name (str) – name of data variable

  • ts_type (str) – new frequency string (can be pyaerocom ts_type or valid pandas frequency string)

  • how (str) – how should the resampled data be averaged (e.g. mean, median)

  • min_num_obs (dict or int, optional) – minimum number of observations required per period (when downsampling). For details see pyaerocom.time_resampler.TimeResampler.resample())

  • inplace (bool) – if True, then the current data object stored in self, will be overwritten with the resampled time-series

  • **kwargs – Additional keyword args passed to pyaerocom.time_resampler.TimeResampler.resample()

Returns:

with resampled variable timeseries

Return type:

StationData

resample_timeseries(var_name, **kwargs)[source]

Wrapper for resample_time() (for backwards compatibility)

Note

For backwards compatibility, this method will return a pandas Series instead of the actual StationData object

same_coords(other, tol_km=None)[source]

Compare station coordinates of other station with this station

Parameters:
  • other (StationData) – other data object

  • tol_km (float) – distance tolerance in km

Returns:

if True, then the two objects are located within the specified tolerance range

Return type:

bool

select_altitude(var_name, altitudes)[source]

Extract variable data within certain altitude range

Note

Beta version

Parameters:
  • var_name (str) – name of variable for which metadata is supposed to be extracted

  • altitudes (list) – altitude range in m, e.g. [0, 1000]

Returns:

data object within input altitude range

Return type:

pandas.Series or xarray.DataArray

to_timeseries(var_name, **kwargs)[source]

Get pandas.Series object for one of the data columns

Parameters:

var_name (str) – name of variable (e.g. “od550aer”)

Returns:

time series object

Return type:

Series

Raises:
  • KeyError – if variable key does not exist in this dictionary

  • ValueError – if length of data array does not equal the length of the time array

property units

Dictionary containing units of all variables in this object

property vars_available

Number of variables available in this data object

Other data classes

class pyaerocom.vertical_profile.VerticalProfile(data: ArrayLike, altitude: ArrayLike, dtime, var_name: str, data_err: ArrayLike | None, var_unit: str, altitude_unit: str)[source]

Object representing single variable profile data

property altitude

Array containing altitude values corresponding to data

property data

Array containing data values corresponding to data

property data_err

Array containing error values corresponding to data

plot(plot_errs=True, whole_alt_range=False, rot_xlabels=30, errs_shaded=True, errs_alpha=0.1, add_vertbar_zero=True, figsize=None, ax=None, **kwargs)[source]

Simple plot method for vertical profile

Co-location routines

High-level co-location engine

Classes and methods to perform high-level colocation.

class pyaerocom.colocation_auto.ColocationSetup(model_id=None, obs_config: PyaroConfig | None = None, obs_id=None, obs_vars=None, ts_type=None, start=None, stop=None, basedir_coldata=None, save_coldata=False, **kwargs)[source]

Setup class for high-level model / obs co-location.

An instance of this setup class can be used to run a colocation analysis between a model and an observation network and will create a number of pya.ColocatedData instances, which can be saved automatically as NetCDF files.

Apart from co-location, this class also handles reading of the input data for co-location. Supported co-location options are:

1. gridded vs. ungridded data: For instance, 3D model data (instance of GriddedData) with lat, lon and time dimensions that is co-located with station-based observations, which are represented in pyaerocom through UngriddedData objects. The co-location function used is pyaerocom.colocation.colocate_gridded_ungridded(). For this type of co-location, the output co-located data object will be 3-dimensional, with dimensions data_source (index 0: obs, index 1: model), time and station_name.

2. gridded vs. gridded data: For instance, 3D model data that is co-located with 3D satellite data (both instances of GriddedData), both objects with lat, lon and time dimensions. The co-location function used is pyaerocom.colocation.colocate_gridded_gridded(). For this type of co-location, the output co-located data object will be 4-dimensional, with dimensions data_source (index 0: obs, index 1: model), time, latitude and longitude.
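
A minimal setup might look as follows (a hypothetical configuration: the model and obs IDs shown are placeholders for datasets available in your local data setup):

```python
from pyaerocom.colocation_auto import ColocationSetup

# Hypothetical IDs -- replace with model/obs datasets available locally
stp = ColocationSetup(
    model_id="TM5-AP3-CTRL",
    obs_id="AeronetSunV3Lev2.daily",
    obs_vars=["od550aer"],
    ts_type="monthly",   # output colocation frequency
    start=2010,          # single year; stop defaults to end of 2010
    save_coldata=False,
)
```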

model_id

ID of model to be used.

Type:

str

obs_config

In case Pyaro is used, a config must be provided. In that case obs_id (see below) is ignored and only the config is used.

Type:

PyaroConfig

obs_id

ID of observation network to be used.

Type:

str

obs_vars

Variables to be analysed (need to be available in input obs dataset). Variables that are not available in the model data output will be skipped. Alternatively, model variables to be used for a given obs variable can also be specified via attributes model_use_vars and model_add_vars.

Type:

list

ts_type

String specifying colocation output frequency.

Type:

str

start

Start time of colocation. Input can be an integer denoting the year or anything that can be converted into pandas.Timestamp using pyaerocom.helpers.to_pandas_timestamp(). If None, then the first available date in the model data is used.

stop

Stop time of colocation. int or anything that can be converted into pandas.Timestamp using pyaerocom.helpers.to_pandas_timestamp(), or None. If None and if start is given at yearly resolution (e.g. start=2010), then stop will be automatically set to the end of that year. Else, it will be set to the last available timestamp in the model data.

filter_name

name of filter to be applied. If None, no filter is used (to be precise, if None, then pyaerocom.const.DEFAULT_REG_FILTER is used, which should default to ALL-wMOUNTAINS, that is, no filtering).

Type:

str

basedir_coldata

Base directory for storing of colocated data files.

Type:

str

save_coldata

if True, colocated data objects are saved as NetCDF file.

Type:

bool

obs_name

if provided, this string will be used in colocated data filename to specify obsnetwork, else obs_id will be used.

Type:

str, optional

obs_data_dir

location of obs data. If None, attempt to infer obs location based on obs ID.

Type:

str, optional

obs_use_climatology

BETA: if True, pyaerocom default climatology is computed from observation stations (so far only possible for ungridded / gridded colocation).

Type:

bool

obs_vert_type

AeroCom vertical code encoded in the model filenames (only AeroCom 3 and later). Specifies which model file should be read in case there are multiple options (e.g. surface level data can be read from a Surface.nc file as well as from a ModelLevel.nc file). If input is string (e.g. ‘Surface’), then the corresponding vertical type code is used for reading of all variables that are colocated (i.e. that are specified in obs_vars).

Type:

str

obs_ts_type_read

may be specified to explicitly define the reading frequency of the observation data (so far, this does only apply to gridded obsdata such as satellites), either as str (same for all obs variables) or variable specific as dict. For ungridded reading, the frequency may be specified via obs_id, where applicable (e.g. AeronetSunV3Lev2.daily). Not to be confused with ts_type, which specifies the frequency used for colocation. Can be specified variable specific in form of dictionary.

Type:

str or dict, optional

obs_filters

filters applied to the observational dataset before co-location. In case of gridded / gridded, these are filters that can be passed to pyaerocom.io.ReadGridded.read_var(), for instance, flex_ts_type, or constraints. In case the obsdata is ungridded (gridded / ungridded co-location) these are filters that are handled through keyword filter_post in pyaerocom.io.ReadUngridded.read(). These filters are applied to the UngriddedData objects after reading and caching the data, so changing them will not invalidate the latest cache of the UngriddedData.

Type:

dict

read_opts_ungridded

dictionary that specifies reading constraints for ungridded reading, passed as **kwargs to pyaerocom.io.ReadUngridded.read(). Note that, unlike obs_filters, these filters are applied during the reading of the UngriddedData objects, and specifying them will deactivate caching.

Type:

dict, optional

model_name

if provided, this string will be used in colocated data filename to specify model, else model_id will be used.

Type:

str, optional

model_data_dir

Location of model data. If None, attempt to infer model location based on model ID.

Type:

str, optional

model_read_opts

options for model reading (passed as keyword args to pyaerocom.io.ReadUngridded.read()).

Type:

dict, optional

model_use_vars

dictionary that specifies mapping of model variables. Keys are observation variables, values are the corresponding model variables (e.g. model_use_vars=dict(od550aer=’od550csaer’)). Example: your observation has var od550aer but your model uses a different variable name for that variable, say od550. Then, you can specify this via model_use_vars = {‘od550aer’ : ‘od550’}. NOTE: in this case, a model variable od550aer will be ignored, even if it exists (cf. model_add_vars).

Type:

dict, optional

model_rename_vars

rename certain model variables after co-location, before storing the associated ColocatedData object on disk. Keys are model variables, values are new names (e.g. model_rename_vars={‘od550aer’:’MyAOD’}). Note: this does not impact which variables are read from the model.

Type:

dict, optional

model_add_vars

additional model variables to be processed for one obs variable. E.g. model_add_vars={‘od550aer’: [‘od550so4’, ‘od550gt1aer’]} would co-locate both model SO4 AOD (od550so4) and model coarse mode AOD (od550gt1aer) with total AOD (od550aer) from obs (in addition to od550aer vs od550aer if applicable).

Type:

dict, optional

model_to_stp

ALPHA (please do not use): convert model data values to STP conditions after co-location. Note: this only works for very particular settings at the moment and needs revision, as it relies on access to meteorological data.

Type:

bool

model_ts_type_read

may be specified to explicitly define the reading frequency of the model data, either as str (same for all obs variables) or variable specific as dict. Not to be confused with ts_type, which specifies the output frequency of the co-located data.

Type:

str or dict, optional

model_read_aux

may be used to specify additional computation methods of variables from models. Keys are variables to be computed, values are dictionaries with keys vars_required (list of required variables for computation of var) and fun (method that takes list of read data objects and computes and returns var).

Type:

dict, optional

model_use_climatology

if True, attempt to use climatological model data field. Note: this only works if model data is in AeroCom conventions (climatological fields are indicated with 9999 as year in the filename) and if this is active, only single-year analyses are supported (i.e. provide int to start to specify the year and leave stop empty).

Type:

bool

gridded_reader_id

BETA: dictionary specifying which gridded reader is supposed to be used for model (and gridded obs) reading. Note: this is a workaround solution and will likely be removed in the future when the gridded reading API is more harmonised (see https://github.com/metno/pyaerocom/issues/174).

Type:

dict

flex_ts_type

Boolean specifying whether reading frequency of gridded data is allowed to be flexible. This includes all gridded data, whether it is model or gridded observation (e.g. satellites). Defaults to True.

Type:

bool

min_num_obs

time resampling constraints applied, defaults to None, in which case no constraints are applied. For instance, say your input is in daily resolution and you want output in monthly and you want to make sure to have roughly 50% daily coverage for the monthly averages. Then you may specify min_num_obs=15 which will ensure that at least 15 daily averages are available to compute a monthly average. However, you may also define a hierarchical scheme that first goes from daily to weekly and then from weekly to monthly, via a dict. E.g. min_num_obs=dict(monthly=dict(weekly=4), weekly=dict(daily=3)) would ensure that each week has at least 3 daily values, as well as that each month has at least 4 weekly values.

Type:

dict or int, optional
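
The hierarchical scheme min_num_obs=dict(monthly=dict(weekly=4), weekly=dict(daily=3)) can be sketched step by step with pandas (illustrative only; pyaerocom's TimeResampler performs this internally):

```python
import numpy as np
import pandas as pd

# Daily -> weekly -> monthly, requiring >= 3 daily values per week and
# >= 4 weekly values per month (sketch of the hierarchical constraint).
daily = pd.Series(1.0, index=pd.date_range("2010-01-01", "2010-03-31", freq="D"))
daily.loc["2010-02-08":"2010-02-26"] = np.nan  # knock out ~3 weeks of February

weekly = daily.resample("W").mean()
weekly[daily.resample("W").count() < 3] = np.nan      # weekly=dict(daily=3)

monthly = weekly.resample("MS").mean()
monthly[weekly.resample("MS").count() < 4] = np.nan   # monthly=dict(weekly=4)

print(monthly)  # February masked, January and March kept
```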

resample_how

string specifying how data should be aggregated when resampling in time. Default is “mean”. Can also be a nested dictionary, e.g. resample_how={‘conco3’: {‘daily’: {‘hourly’: ‘max’}}} would use the maximum value to aggregate from hourly to daily for variable conco3, rather than the mean.

Type:

str or dict, optional

obs_remove_outliers

if True, outliers are removed from obs data before colocation, else not. Default is False. Custom outlier ranges for each variable can be specified via obs_outlier_ranges, and for all other variables, the pyaerocom default outlier ranges are used. The latter are specified in variables.ini file via minimum and maximum attributes and can also be accessed through pyaerocom.variable.Variable.minimum and pyaerocom.variable.Variable.maximum, respectively.

Type:

bool

model_remove_outliers

if True, outliers are removed from model data (normally this should be set to False, as the models are supposed to be assessed, including outlier cases). Default is False. Custom outlier ranges for each variable can be specified via model_outlier_ranges, and for all other variables, the pyaerocom default outlier ranges are used. The latter are specified in variables.ini file via minimum and maximum attributes and can also be accessed through pyaerocom.variable.Variable.minimum and pyaerocom.variable.Variable.maximum, respectively.

Type:

bool

obs_outlier_ranges

dictionary specifying outlier ranges for individual obs variables. (e.g. dict(od550aer = [-0.05, 10], ang4487aer=[0,4])). Only relevant if obs_remove_outliers is True.

Type:

dict, optional

model_outlier_ranges

like obs_outlier_ranges but for model variables. Only relevant if model_remove_outliers is True.

Type:

dict, optional

zeros_to_nan

If True, zeros in the output co-located data object will be converted to NaN. Default is False.

Type:

bool

harmonise_units

if True, units are attempted to be harmonised during co-location (note: raises Exception if True and in case units cannot be harmonised).

Type:

bool

regrid_res_deg

resolution in degrees for regridding of model grid (done before co-location). Default is None.

Type:

int, optional

colocate_time

if True and if obs and model sampling frequency (e.g. daily) are higher than output colocation frequency (e.g. monthly), then the datasets are first colocated in time (e.g. on a daily basis), before the monthly averages are calculated. Default is False.

Type:

bool

reanalyse_existing

if True, always redo co-location, even if there is already an existing co-located NetCDF file (under the output location specified by basedir_coldata) for the given variable combination. If False and output already exists, then co-location is skipped for the associated variable. Default is True.

Type:

bool

raise_exceptions

if True, exceptions that occur while processing individual variables are raised; otherwise the analysis is skipped for those variables.

Type:

bool

keep_data

if True, then all colocated data objects computed when running run() will be stored in data. Defaults to True.

Type:

bool

add_meta

additional metadata that is supposed to be added to each output ColocatedData object.

Type:

dict

CRASH_ON_INVALID = False

do not raise Exception if invalid item is attempted to be assigned (Overwritten from base class)

OBS_VERT_TYPES_ALT = {'2D': '2D', 'Surface': 'ModelLevel'}

Dictionary specifying alternative vertical types that may be used to read model data. E.g. consider the variable is ec550aer, obs_vert_type=’Surface’ and obs_vert_type_alt=dict(Surface=’ModelLevel’). Now, if a model that is used for the analysis does not contain a data file for ec550aer at the surface (’ec550aer*Surface.nc’), then, the colocation routine will look for ‘ec550aer*ModelLevel.nc’ and if this exists, it will load it and extract the surface level.
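The fallback logic described above can be sketched as a simple lookup (a hypothetical helper, not the actual pyaerocom implementation):

```python
# Alternative vertical types, as in OBS_VERT_TYPES_ALT above.
OBS_VERT_TYPES_ALT = {"2D": "2D", "Surface": "ModelLevel"}

def vert_types_to_try(obs_vert_type):
    """Return vertical types to search model files for, preferred first."""
    types = [obs_vert_type]
    alt = OBS_VERT_TYPES_ALT.get(obs_vert_type)
    if alt is not None and alt not in types:
        types.append(alt)
    return types
```

For obs_vert_type='Surface' this yields ['Surface', 'ModelLevel'], i.e. the colocation routine would first look for 'ec550aer*Surface.nc' and fall back to 'ec550aer*ModelLevel.nc'.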

add_glob_meta(**kwargs)[source]

Add global metadata to add_meta

Parameters:

kwargs – metadata to be added

Return type:

None

property basedir_logfiles

Base directory for storing logfiles

class pyaerocom.colocation_auto.Colocator(**kwargs)[source]

High level class for running co-location

Note

This object inherits from ColocationSetup and is also instantiated as such. For setup attributes, please see base class.

get_model_name()[source]

Get name of model

Note

Not to be confused with model_id which is always the database ID of the model, while model_name can differ from that and is used for output files, etc.

Raises:

AttributeError – If neither model_id nor model_name is set

Returns:

preferably model_name, else model_id

Return type:

str

get_nc_files_in_coldatadir()[source]

Get list of NetCDF files in colocated data directory

Returns:

list of NetCDF file paths found

Return type:

list

get_obs_name()[source]

Get name of obsdata source

Note

Not to be confused with obs_id which is always the database ID of the observation dataset, while obs_name can differ from that and is used for output files, etc.

Raises:

AttributeError – If neither obs_id nor obs_name is set

Returns:

preferably obs_name, else obs_id

Return type:

str

property model_reader

Model data reader

property model_vars

List of all model variables specified in config

Note

This method does not check if the variables are valid or available.

Returns:

list of all model variables specified in this setup.

Return type:

list

property obs_is_ungridded

True if obs_id refers to an ungridded observation, else False

Type:

bool

property obs_is_vertical_profile

True if obs_id refers to a VerticalProfile, else False

Type:

bool

property obs_reader

Observation data reader

property output_dir

Output directory for colocated data NetCDF files

Type:

str

prepare_run(var_list: list | None = None) → dict[source]

Prepare colocation run for current setup.

Parameters:

var_list (list, optional) – list of variables to be analysed. The default is None, in which case all defined variables are attempted to be colocated.

Raises:

AttributeError – If no observation variables are defined (obs_vars empty).

Returns:

vars_to_process – Mapping of variables to be processed, keys are model vars, values are obs vars.

Return type:

dict

run(var_list: list | None = None, **opts)[source]

Perform colocation for current setup

See also prepare_run().

Parameters:
  • var_list (list, optional) – list of variables supposed to be analysed. The default is None, in which case all defined variables are attempted to be colocated.

  • **opts – keyword args that may be specified to change the current setup before colocation

Returns:

nested dictionary, where keys are model variables, values are dictionaries comprising key / value pairs of obs variables and associated instances of ColocatedData.

Return type:

dict
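The nested dictionary returned by run() can be navigated by model and obs variable name; a sketch of its shape with placeholder strings standing in for ColocatedData instances:

```python
# Shape of the Colocator.run() result: model variable -> obs
# variable -> ColocatedData (placeholder strings used here).
results = {
    "od550aer": {"od550aer": "<ColocatedData od550aer/od550aer>"},
    "concpm10": {"concpm10": "<ColocatedData concpm10/concpm10>"},
}

# Retrieve the colocated data for one model/obs variable pair.
coldata = results["od550aer"]["od550aer"]
```

With keep_data enabled, the same objects also remain accessible on the Colocator instance after the run.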

Low-level co-location functions

Methods and / or classes to perform colocation

pyaerocom.colocation.check_time_ival(data, start, stop)[source]
pyaerocom.colocation.check_ts_type(data, ts_type)[source]
pyaerocom.colocation.colocate_gridded_gridded(data, data_ref, ts_type=None, start=None, stop=None, filter_name=None, regrid_res_deg=None, harmonise_units=True, regrid_scheme='areaweighted', update_baseyear_gridded=None, min_num_obs=None, colocate_time=False, resample_how=None, **kwargs)[source]

Colocate 2 gridded data objects

Parameters:
  • data (GriddedData) – gridded data (e.g. model results)

  • data_ref (GriddedData) – reference data that is co-located with data (e.g. a gridded satellite dataset, observation data or another model).

  • ts_type (str, optional) – desired temporal resolution of output colocated data (e.g. “monthly”). Defaults to None, in which case the highest possible resolution is used.

  • start (str or datetime64 or similar, optional) – start time for colocation, if None, the start time of the input GriddedData object is used

  • stop (str or datetime64 or similar, optional) – stop time for colocation, if None, the stop time of the input GriddedData object is used

  • filter_name (str, optional) – string specifying filter used (cf. pyaerocom.filter.Filter for details). If None, then it is set to ‘ALL-wMOUNTAINS’, which corresponds to no filtering (world with mountains). Use ALL-noMOUNTAINS to exclude mountain sites.

  • regrid_res_deg (int or dict, optional) – regrid resolution in degrees. If specified, the input gridded data objects will be regridded in lon / lat dimension to the input resolution (if input is integer, both lat and lon are regridded to that resolution, if input is dict, use keys lat_res_deg and lon_res_deg to specify regrid resolutions, respectively).

  • harmonise_units (bool) – if True, units are attempted to be harmonised (note: raises Exception if True and units cannot be harmonised). Defaults to True.

  • regrid_scheme (str) – iris scheme used for regridding (defaults to area weighted regridding)

  • update_baseyear_gridded (int, optional) – optional input that can be set in order to redefine the time dimension of the first gridded data object data to be analysed. E.g., if the data object is a climatology (one year of data) whose time dimension has a base year other than the specified input start / stop time, this may be used to update the time in order to make co-location possible.

  • min_num_obs (int or dict, optional) – minimum number of observations for resampling of time

  • colocate_time (bool) – if True and if original time resolution of data is higher than desired time resolution (ts_type), then both datasets are colocated in time before resampling to lower resolution.

  • resample_how (str or dict) – string specifying how data should be aggregated when resampling in time. Default is “mean”. Can also be a nested dictionary, e.g. resample_how={‘daily’: {‘hourly’ : ‘max’}} would use the maximum value to aggregate from hourly to daily, rather than the mean.

  • **kwargs – additional keyword args (not used here, but included such that factory class can handle different methods with different inputs)

Returns:

instance of colocated data

Return type:

ColocatedData

pyaerocom.colocation.colocate_gridded_ungridded(data, data_ref, ts_type=None, start=None, stop=None, filter_name=None, regrid_res_deg=None, harmonise_units=True, regrid_scheme='areaweighted', var_ref=None, update_baseyear_gridded=None, min_num_obs=None, colocate_time=False, use_climatology_ref=False, resample_how=None, **kwargs)[source]

Colocate gridded with ungridded data (low level method)

For high-level colocation see pyaerocom.colocation_auto.Colocator and pyaerocom.colocation_auto.ColocationSetup

Note

Uses the variable that is contained in input GriddedData object (since these objects only contain a single variable). If this variable is not contained in observation data (or contained but using a different variable name) you may specify the obs variable to be used via input arg var_ref

Parameters:
  • data (GriddedData) – gridded data object (e.g. model results).

  • data_ref (UngriddedData) – ungridded data object (e.g. observations).

  • ts_type (str) – desired temporal resolution of colocated data (must be valid AeroCom ts_type str such as daily, monthly, yearly.).

  • start (str or datetime64 or similar, optional) – start time for colocation, if None, the start time of the input GriddedData object is used.

  • stop (str or datetime64 or similar, optional) – stop time for colocation, if None, the stop time of the input GriddedData object is used

  • filter_name (str) – string specifying filter used (cf. pyaerocom.filter.Filter for details). If None, then it is set to ‘ALL-wMOUNTAINS’, which corresponds to no filtering (world with mountains). Use ALL-noMOUNTAINS to exclude mountain sites.

  • regrid_res_deg (int or dict, optional) – regrid resolution in degrees. If specified, the input gridded data object will be regridded in lon / lat dimension to the input resolution (if input is integer, both lat and lon are regridded to that resolution, if input is dict, use keys lat_res_deg and lon_res_deg to specify regrid resolutions, respectively).

  • harmonise_units (bool) – if True, units are attempted to be harmonised (note: raises Exception if True and units cannot be harmonised).

  • var_ref (str, optional) – variable against which data in arg data is supposed to be compared. If None, then the same variable is used (i.e. data.var_name).

  • update_baseyear_gridded (int, optional) – optional input that can be set in order to re-define the time dimension in the gridded data object to be analysed. E.g., if the data object is a climatology (one year of data) that has set the base year of the time dimension to a value other than the specified input start / stop time this may be used to update the time in order to make colocation possible.

  • min_num_obs (int or dict, optional) – minimum number of observations for resampling of time

  • colocate_time (bool) – if True and if original time resolution of data is higher than desired time resolution (ts_type), then both datasets are colocated in time before resampling to lower resolution.

  • use_climatology_ref (bool) – if True, climatological timeseries are used from observations

  • resample_how (str or dict) – string specifying how data should be aggregated when resampling in time. Default is “mean”. Can also be a nested dictionary, e.g. resample_how={‘daily’: {‘hourly’ : ‘max’}} would use the maximum value to aggregate from hourly to daily, rather than the mean.

  • **kwargs – additional keyword args (passed to UngriddedData.to_station_data_all())

Returns:

instance of colocated data

Return type:

ColocatedData

Raises:
  • VarNotAvailableError – if grid data variable is not available in ungridded data object

  • AttributeError – if instance of input UngriddedData object contains more than one dataset

  • TimeMatchError – if gridded data time range does not overlap with input time range

  • ColocationError – if none of the data points in input UngriddedData matches the input colocation constraints

pyaerocom.colocation.correct_model_stp_coldata(coldata, p0=None, t0=273.15, inplace=False)[source]

Correct modeldata in colocated data object to STP conditions

Note

BETA version, rather inelegantly coded (at 8 pm, 3 weeks before the IPCC deadline), but should do the job for 2010 monthly colocated data files (AND NOTHING ELSE)!

pyaerocom.colocation.resolve_var_name(data)[source]

Check variable name of GriddedData against AeroCom default

Checks whether the variable name set in the data corresponds to the AeroCom variable name, or whether it is an alias. Returns both the variable name set and the AeroCom variable name.

Parameters:

data (GriddedData) – Data to be checked.

Returns:

  • str – variable name as set in data (may be alias, but may also be AeroCom variable name, in which case first and second return parameter are the same).

  • str – corresponding AeroCom variable name
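The two return values described above can be sketched as a plain alias lookup (the alias table here is illustrative, not pyaerocom's actual one, which is defined in its variable configuration):

```python
# Hypothetical alias table: alias name -> AeroCom variable name.
ALIASES = {"od550csaer": "od550aer"}

def resolve_var_name(var_name):
    """Return (name as set in data, corresponding AeroCom name)."""
    return var_name, ALIASES.get(var_name, var_name)
```

If the name is already the AeroCom name, both return values are identical, matching the behaviour described above.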

Co-locating ungridded observations

pyaerocom.combine_vardata_ungridded.combine_vardata_ungridded(data_ids_and_vars, match_stats_how='closest', match_stats_tol_km=1, merge_how='combine', merge_eval_fun=None, var_name_out=None, data_id_out=None, var_unit_out=None, resample_how=None, min_num_obs=None, add_meta_keys=None)[source]

Combine and colocate different variables from UngriddedData

This method allows combining different variable timeseries from different ungridded observation records in multiple ways. The source data may all be included in a single instance of UngriddedData or in multiple; for details see the first input parameter data_ids_and_vars. Merging can be done in flexible ways, e.g. by combining measurements of the same variable from 2 different datasets or by computing new variables based on 2 measured variables (e.g. concox=concno2+conco3). Doing this requires colocation of site locations and timestamps of both input observation records, which is done in this method.

It comprises 2 major steps:

  1. Compute a list of StationData objects for both input data combinations (data_id1 & var1; data_id2 & var2) and, based on these, find the coincident locations. Coincident sites can be found either via the site location name or via their lat/lon locations; the method to use can be specified via the input arg match_stats_how.

  2. For all coincident locations, a new instance of StationData is computed that merges the 2 timeseries in the way specified through the input args merge_how and merge_eval_fun. If the 2 original timeseries from both sites come in different temporal resolutions, they are resampled to the lower of the two. Resampling constraints to be applied in that case can be provided via the respective input args for temporal resampling. The default is the pyaerocom default, which corresponds to a ~25% coverage constraint (as of 22.10.2020) for major resolution steps, such as daily->monthly.

Note

Currently, only 2 variables can be combined to a new one (e.g. concox=conco3+concno2).

Note

Be aware of unit conversion issues that may arise if your input data is not in AeroCom default units. For details see below.

Parameters:
  • data_ids_and_vars (list) – list of 3 element tuples, each containing, in the following order 1. instance of UngriddedData; 2. dataset ID (remember that UngriddedData can contain more than one dataset); and 3. variable name. Note that currently only 2 of such tuples can be combined.

  • match_stats_how (str, optional) – String specifying how site locations are supposed to be matched. The default is ‘closest’. Supported are ‘closest’ and ‘station_name’.

  • match_stats_tol_km (float, optional) – radius tolerance in km for matching site locations when using ‘closest’ for site location matching. The default is 1.

  • merge_how (str, optional) – String specifying how to merge variable data at site locations. The default is ‘combine’. If both input variables are the same and combine is used, then the first input variable will be preferred over the other. Supported are ‘combine’, ‘mean’ and ‘eval’, for the latter, merge_eval_fun needs to be specified explicitly.

  • merge_eval_fun (str, optional) – String specifying how var1 and var2 data should be evaluated (only relevant if merge_how='eval' is used). The default is None. E.g. if one wants to retrieve the column aerosol fine mode fraction at 550nm (fmf550aer) through AERONET, this could be done through the SDA product by providing as first input (data_id1, var1) 'AeronetSDA' and 'od550aer' and as second input (data_id2, var2) 'AeronetSDA' and 'od550lt1aer'; merge_eval_fun could then be 'fmf550aer=(AeronetSDA;od550lt1aer/AeronetSDA;od550aer)*100'. Note that the input variables will be converted to their AeroCom default units, so the specification of merge_eval_fun should take that into account in case the originally read obsdata is not in default units.

  • var_name_out (str, optional) – Name of output variable. Default is None, in which case it is attempted to be inferred.

  • data_id_out (str, optional) – data_id set in output StationData objects. Default is None, in which case it is inferred from the input data_ids (e.g. in the above example of merge_eval_fun, the output data_id would be 'AeronetSDA' since both input IDs are the same).

  • var_unit_out (str) – unit of output variable.

  • resample_how (str, optional) – String specifying how temporal resampling should be done. The default is ‘mean’.

  • min_num_obs (int or dict, optional) – Minimum number of observations for temporal resampling. The default is None in which case pyaerocom default is used, which is available via pyaerocom.const.OBS_MIN_NUM_RESAMPLE.

  • add_meta_keys (list, optional) – additional metadata keys to be added to output StationData objects from input data. If None, then only the pyaerocom default keys are added (see StationData.STANDARD_META_KEYS).

Raises:
  • ValueError – If input for merge_how or match_stats_how is invalid.

  • NotImplementedError – If one of the input UngriddedData objects contains more than one dataset.

Returns:

merged_stats – list of StationData objects containing the colocated and combined variable data.

Return type:

list
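The 'combine' merge strategy described above (first input variable preferred, gaps filled from the second) can be sketched with timeseries modelled as plain {timestamp: value} dicts; this is an illustration of the strategy, not pyaerocom's implementation:

```python
def combine_series(primary, secondary):
    """Merge two timeseries: prefer primary, fill gaps from secondary.

    Both inputs are {timestamp: value} dicts; None marks missing data.
    """
    merged = dict(secondary)
    merged.update({t: v for t, v in primary.items() if v is not None})
    return merged
```

A merge_eval_fun string, by contrast, is evaluated per timestamp on the colocated series, e.g. 'fmf550aer=(AeronetSDA;od550lt1aer/AeronetSDA;od550aer)*100'.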

Reading of gridded data

Gridded data refers to any dataset that can be represented and stored on a regular grid within a certain domain (e.g. lat, lon, time), for instance model output or level 3 satellite data, typically stored as NetCDF files. In pyaerocom, the underlying data object is GriddedData, and pyaerocom supports reading such data for different file naming conventions.

Gridded data using AeroCom conventions

class pyaerocom.io.readgridded.ReadGridded(data_id=None, data_dir=None, file_convention='aerocom3')[source]

Class for reading gridded files using AeroCom file conventions

data_id

string ID for model or obsdata network (see e.g. Aerocom interface map plots lower left corner)

Type:

str

data

imported data object

Type:

GriddedData

data_dir

directory containing result files for this model

Type:

str

start

start time for data import

Type:

pandas.Timestamp

stop

stop time for data import

Type:

pandas.Timestamp

file_convention

class specifying details of the file naming convention for the model

Type:

FileConventionRead

files

list containing all filenames that were found. Filled, e.g. in ReadGridded.get_model_files()

Type:

list

from_files

List of all netCDF files that were used to concatenate the current data cube (i.e. selected based on matching settings such as var_name or time interval).

Type:

list

ts_types

list of all sampling frequencies (e.g. hourly, daily, monthly) that were inferred from filenames (based on Aerocom file naming convention) of all files that were found

Type:

list

vars

list containing all variable names (e.g. od550aer) that were inferred from filenames based on Aerocom model file naming convention

Type:

list

years

list of available years as inferred from the filenames in the data directory.

Type:

list

Parameters:
  • data_id (str) – string ID of model (e.g. “AATSR_SU_v4.3”,”CAM5.3-Oslo_CTRL2016”)

  • data_dir (str, optional) – directory containing data files. If provided, only this directory is considered for data files, else the input data_id is used to search for the corresponding directory.

  • file_convention (str) – string ID specifying the file convention of this model (cf. installation file file_conventions.ini)

  • init (bool) – if True, the model directory is searched (search_data_dir()) on instantiation and if it is found, all valid files for this model are searched using search_all_files().

AUX_ADD_ARGS = {'concprcpoxn': {'prlim': 0.0001, 'prlim_set_under': nan, 'prlim_units': 'm d-1', 'ts_type': 'daily'}, 'concprcpoxs': {'prlim': 0.0001, 'prlim_set_under': nan, 'prlim_units': 'm d-1', 'ts_type': 'daily'}, 'concprcprdn': {'prlim': 0.0001, 'prlim_set_under': nan, 'prlim_units': 'm d-1', 'ts_type': 'daily'}}

Additional arguments passed to computation methods for auxiliary data. This is optional and defined per variable, as in AUX_FUNS.

AUX_ALT_VARS = {'ac550dryaer': ['ac550aer'], 'od440aer': ['od443aer'], 'od870aer': ['od865aer']}
AUX_FUNS = {'ang4487aer': <function compute_angstrom_coeff_cubes>, 'angabs4487aer': <function compute_angstrom_coeff_cubes>, 'conc*': <function multiply_cubes>, 'concNhno3': <function calc_concNhno3_from_vmr>, 'concNnh3': <function calc_concNnh3_from_vmr>, 'concNnh4': <function calc_concNnh4>, 'concNno3pm10': <function calc_concNno3pm10>, 'concNno3pm25': <function calc_concNno3pm25>, 'concNtnh': <function calc_concNtnh>, 'concNtno3': <function calc_concNtno3>, 'concno3': <function add_cubes>, 'concno3pm10': <function calc_concno3pm10>, 'concno3pm25': <function calc_concno3pm25>, 'concox': <function add_cubes>, 'concprcpoxn': <function compute_concprcp_from_pr_and_wetdep>, 'concprcpoxs': <function compute_concprcp_from_pr_and_wetdep>, 'concprcprdn': <function compute_concprcp_from_pr_and_wetdep>, 'concsspm10': <function add_cubes>, 'concsspm25': <function calc_sspm25>, 'dryoa': <function add_cubes>, 'fmf550aer': <function divide_cubes>, 'mmr*': <function mmr_from_vmr>, 'od550gt1aer': <function subtract_cubes>, 'sc550dryaer': <function subtract_cubes>, 'vmrox': <function add_cubes>, 'wetoa': <function add_cubes>}
AUX_REQUIRES = {'ang4487aer': ('od440aer', 'od870aer'), 'angabs4487aer': ('abs440aer', 'abs870aer'), 'conc*': ('mmr*', 'rho'), 'concNhno3': ('vmrhno3',), 'concNnh3': ('vmrnh3',), 'concNnh4': ('concnh4',), 'concNno3pm10': ('concno3f', 'concno3c'), 'concNno3pm25': ('concno3f', 'concno3c'), 'concNtnh': ('concnh4', 'vmrnh3'), 'concNtno3': ('concno3f', 'concno3c', 'vmrhno3'), 'concno3': ('concno3c', 'concno3f'), 'concno3pm10': ('concno3f', 'concno3c'), 'concno3pm25': ('concno3f', 'concno3c'), 'concox': ('concno2', 'conco3'), 'concprcpoxn': ('wetoxn', 'pr'), 'concprcpoxs': ('wetoxs', 'pr'), 'concprcprdn': ('wetrdn', 'pr'), 'concsspm10': ('concss25', 'concsscoarse'), 'concsspm25': ('concss25', 'concsscoarse'), 'dryoa': ('drypoa', 'drysoa'), 'fmf550aer': ('od550lt1aer', 'od550aer'), 'mmr*': ('vmr*',), 'od550gt1aer': ('od550aer', 'od550lt1aer'), 'rho': ('ts', 'ps'), 'sc550dryaer': ('ec550dryaer', 'ac550dryaer'), 'vmrox': ('vmrno2', 'vmro3'), 'wetoa': ('wetpoa', 'wetsoa')}
CONSTRAINT_OPERATORS = {'!=': <ufunc 'not_equal'>, '<': <ufunc 'less'>, '<=': <ufunc 'less_equal'>, '==': <ufunc 'equal'>, '>': <ufunc 'greater'>, '>=': <ufunc 'greater_equal'>}
property TS_TYPES

List of valid filename encodings specifying temporal resolution

Update 7.11.2019: no longer in use due to improved handling of all possible frequencies, now using the TsType class.

VERT_ALT = {'Surface': 'ModelLevel'}
add_aux_compute(var_name, vars_required, fun)[source]

Register new variable to be computed

Parameters:
  • var_name (str) – variable name to be computed

  • vars_required (list) – list of variables to read, that are required to compute var_name

  • fun (callable) – function that takes a list of GriddedData objects as input and that are read using variable names specified by vars_required.
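The registration mechanism behind add_aux_compute() mirrors the AUX_REQUIRES / AUX_FUNS tables shown above; a minimal pure-Python sketch (the registry dicts and the example function are illustrative):

```python
# Sketch of registering a computed (auxiliary) variable.
aux_requires = {}  # var_name -> tuple of required variables
aux_funs = {}      # var_name -> computation function

def add_aux_compute(var_name, vars_required, fun):
    """Register var_name to be computed from vars_required via fun."""
    aux_requires[var_name] = tuple(vars_required)
    aux_funs[var_name] = fun

# e.g. total odd oxygen as the sum of two read variables
# (matching the concox entry in AUX_REQUIRES above).
add_aux_compute("concox", ["concno2", "conco3"], lambda no2, o3: no2 + o3)
```

At read time, compute_var() would first read the required variables and then apply the registered function to the resulting GriddedData objects.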

apply_read_constraint(data, constraint, **kwargs)[source]

Filter a GriddedData object by the value of another variable

Note

BETA version, that was hacked down in a rush to be able to apply AOD>0.1 threshold when reading AE.

Parameters:
  • data (GriddedData) – data object to which constraint is applied

  • constraint (dict) – dictionary defining the read constraint (see check_constraint_valid() for minimum requirements). If constraint contains the key var_name (not mandatory), then the corresponding variable is attempted to be read and used to evaluate the constraint, and the resulting boolean mask is applied to the input data. Wherever this mask is True (i.e. the constraint is met), the current value in the input data is replaced with numpy.ma.masked or, if specified, with the entry new_val in the input constraint dict.

  • **kwargs – reading arguments in case additional variable data needs to be loaded to determine the filter mask (i.e. if var_name is specified in the input constraint). Passed to read_var().

Raises:

ValueError – If constraint is invalid (cf. check_constraint_valid() for details).

Returns:

modified data objects (all grid-points that met constraint are replaced with either numpy.ma.masked or with a value that can be specified via key new_val in input constraint).

Return type:

GriddedData
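Constraint evaluation combines an entry from CONSTRAINT_OPERATORS with filter_val; a sketch using Python's operator module in place of the numpy ufuncs listed above (constraint_mask is a hypothetical helper, not the pyaerocom API):

```python
import operator

# Same operator strings as CONSTRAINT_OPERATORS above, but mapped to
# scalar functions instead of numpy ufuncs for illustration.
CONSTRAINT_OPERATORS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def constraint_mask(values, constraint):
    """Return a boolean mask: True where the constraint is met
    (those grid points would be masked or replaced with new_val)."""
    op = CONSTRAINT_OPERATORS[constraint["operator"]]
    return [op(v, constraint["filter_val"]) for v in values]
```

For example, the AOD>0.1 threshold mentioned in the note would be expressed as constraint={"operator": ">", "filter_val": 0.1}.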

browser

This object can be used to browse the file database

check_compute_var(var_name)[source]

Check if variable name belongs to family that can be computed

For instance, if the input var_name is concdust, this method will check AUX_REQUIRES to see whether a variable family pattern (conc*) is defined that specifies how to compute such variables. If a match is found, the required variables and the computation method are registered via add_aux_compute().

Parameters:

var_name (str) – variable name to be checked

Returns:

True if match is found, else False

Return type:

bool

check_constraint_valid(constraint)[source]

Check if reading constraint is valid

Parameters:

constraint (dict) – reading constraint. Requires at least entries for the following keys: operator (str; for valid operators see CONSTRAINT_OPERATORS) and filter_val (float; value against which data is evaluated with respect to operator).

Raises:

ValueError – If constraint is invalid

Return type:

None.

compute_var(var_name, start=None, stop=None, ts_type=None, experiment=None, vert_which=None, flex_ts_type=True, prefer_longer=False, vars_to_read=None, aux_fun=None, try_convert_units=True, aux_add_args=None, rename_var=None, **kwargs)[source]

Compute auxiliary variable

Like read_var() but for auxiliary variables (cf. AUX_REQUIRES)

Parameters:
  • var_name (str) – variable that are supposed to be read

  • start (Timestamp or str, optional) – start time of data import (if valid input, then the current start will be overwritten)

  • stop (Timestamp or str, optional) – stop time of data import

  • ts_type (str) – string specifying temporal resolution (choose from hourly, 3hourly, daily, monthly). If None, the highest-priority available resolution is used

  • experiment (str) – name of experiment (only relevant if this dataset contains more than one experiment)

  • vert_which (str) – valid AeroCom vertical info string encoded in name (e.g. Column, ModelLevel)

  • flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable

  • prefer_longer (bool) – if True and applicable, the ts_type resulting in the longer time coverage will be preferred over other possible frequencies that match the query.

  • try_convert_units (bool) – if True, units of GriddedData objects are attempted to be converted to AeroCom default. This applies both to the GriddedData objects read for the computation and to the variable computed from them. This is, for instance, useful when computing concentration in precipitation from wet deposition and precipitation amount.

  • rename_var (str) – if this is set, the var_name attribute of the output GriddedData object will be updated accordingly.

  • **kwargs – additional keyword args passed to _load_var()

Returns:

loaded data object

Return type:

GriddedData

concatenate_cubes(cubes)[source]

Concatenate list of cubes into one cube

Parameters:

cubes (CubeList) – list of individual cubes

Returns:

Single cube that contains concatenated cubes from input list

Return type:

Cube

Raises:

iris.exceptions.ConcatenateError – if concatenation of all cubes failed

property data_dir: str

Directory where data files are located

property data_id: str

Data ID of dataset

property experiments: list

List of all experiments that are available in this dataset

property file_type

File type of data files

property files: list

List of data files

filter_files(var_name=None, ts_type=None, start=None, stop=None, experiment=None, vert_which=None, is_at_stations=False, df=None)[source]

Filter file database

Parameters:
  • var_name (str) – variable that are supposed to be read

  • ts_type (str) – string specifying temporal resolution (choose from "hourly", "3hourly", "daily", "monthly"). If None, the highest-priority available resolution is used

  • start (Timestamp or str, optional) – start time of data import

  • stop (Timestamp or str, optional) – stop time of data import

  • experiment (str) – name of experiment (only relevant if this dataset contains more than one experiment)

  • vert_which (str or dict, optional) – valid AeroCom vertical info string encoded in name (e.g. Column, ModelLevel) or dictionary containing var_name as key and vertical coded string as value, accordingly

  • flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable

  • prefer_longer (bool) – if True and applicable, the ts_type resulting in the longer time coverage will be preferred over other possible frequencies that match the query.

filter_query(var_name, ts_type=None, start=None, stop=None, experiment=None, vert_which=None, is_at_stations=False, flex_ts_type=True, prefer_longer=False)[source]

Filter files for read query based on input specs

Returns:

dataframe containing filtered dataset

Return type:

DataFrame

find_common_ts_type(vars_to_read, start=None, stop=None, ts_type=None, experiment=None, vert_which=None, flex_ts_type=True)[source]

Find common ts_type for list of variables to be read

Parameters:
  • vars_to_read (list) – list of variables that is supposed to be read

  • start (Timestamp or str, optional) – start time of data import (if valid input, then the current start will be overwritten)

  • stop (Timestamp or str, optional) – stop time of data import (if valid input, then the current start will be overwritten)

  • ts_type (str) – string specifying temporal resolution (choose from hourly, 3hourly, daily, monthly). If None, the highest-priority available resolution is used

  • experiment (str) – name of experiment (only relevant if this dataset contains more than one experiment)

  • vert_which (str) – valid AeroCom vertical info string encoded in name (e.g. Column, ModelLevel)

  • flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable

Returns:

common ts_type for input variable

Return type:

str

Raises:

DataCoverageError – if no match can be found
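The selection logic can be sketched as intersecting the frequencies available per variable and picking the first match from a priority list. The priority order below is an assumption for illustration:

```python
# Sketch of find_common_ts_type: intersect the available frequencies across
# all variables and return the first match from a priority list.
# The priority order here is illustrative.
TS_TYPE_PRIORITY = ["hourly", "3hourly", "daily", "monthly"]

def common_ts_type(avail_per_var):
    """avail_per_var: dict mapping variable name -> set of available ts_types."""
    common = set.intersection(*avail_per_var.values())
    for ts_type in TS_TYPE_PRIORITY:
        if ts_type in common:
            return ts_type
    # pyaerocom raises DataCoverageError here; ValueError keeps the sketch
    # self-contained
    raise ValueError("no common ts_type found")

avail = {"od550aer": {"daily", "monthly"}, "abs550aer": {"3hourly", "daily"}}
result = common_ts_type(avail)
```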

get_files(var_name, ts_type=None, start=None, stop=None, experiment=None, vert_which=None, is_at_stations=False, flex_ts_type=True, prefer_longer=False)[source]

Get data files based on input specs

get_var_info_from_files() dict[source]

Create dictionary containing variable-specific meta information

Returns:

dictionary where keys are available variables and values (for each variable) contain information about available ts_types, years, etc.

Return type:

dict

has_var(var_name)[source]

Check if variable is available

Parameters:

var_name (str) – variable to be checked

Return type:

bool

property name

Deprecated name of attribute data_id

read(vars_to_retrieve=None, start=None, stop=None, ts_type=None, experiment=None, vert_which=None, flex_ts_type=True, prefer_longer=False, require_all_vars_avail=False, **kwargs)[source]

Read all variables that could be found

Reads all variables that are available (i.e. in vars_filename)

Parameters:
  • vars_to_retrieve (list or str, optional) – variables that are supposed to be read. If None, all variables that are available are read.

  • start (Timestamp or str, optional) – start time of data import

  • stop (Timestamp or str, optional) – stop time of data import

  • ts_type (str, optional) – string specifying temporal resolution (choose from “hourly”, “3hourly”, “daily”, “monthly”). If None, the highest-priority available resolution is used

  • experiment (str) – name of experiment (only relevant if this dataset contains more than one experiment)

  • vert_which (str or dict, optional) – valid AeroCom vertical info string encoded in name (e.g. Column, ModelLevel) or dictionary containing var_name as key and vertical coded string as value, accordingly

  • flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable

  • prefer_longer (bool) – if True and applicable, the ts_type resulting in the longer time coverage will be preferred over other possible frequencies that match the query.

  • require_all_vars_avail (bool) – if True, it is strictly required that all input variables are available.

  • **kwargs – optional and support for deprecated input args

Returns:

loaded data objects (type GriddedData)

Return type:

tuple

Raises:
  • IOError – if input variable names are neither a list nor a string

  • VarNotAvailableError

    1. if require_all_vars_avail=True and one or more of the desired variables is not available in this class

    2. if require_all_vars_avail=True and none of the input variables is available in this object
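The effect of require_all_vars_avail can be sketched as a simple availability check (variable names are illustrative):

```python
# Sketch of the require_all_vars_avail behaviour in read(): either fail on
# any missing variable, or silently drop it. Variable names are illustrative.
def select_vars(requested, available, require_all=False):
    missing = [v for v in requested if v not in available]
    if require_all and missing:
        # pyaerocom raises VarNotAvailableError in this case
        raise ValueError(f"variables not available: {missing}")
    return [v for v in requested if v in available]

vars_ok = select_vars(["od550aer", "concpm10"], {"od550aer"}, require_all=False)
```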

read_var(var_name, start=None, stop=None, ts_type=None, experiment=None, vert_which=None, flex_ts_type=True, prefer_longer=False, aux_vars=None, aux_fun=None, constraints=None, try_convert_units=True, rename_var=None, **kwargs)[source]

Read model data for a specific variable

This method searches all valid files for a given variable and for a provided temporal resolution (e.g. daily, monthly), optionally within a certain time window, that may be specified on class instantiation or using the corresponding input parameters provided in this method.

The individual NetCDF files for a given temporal period are loaded as instances of the iris.Cube object and appended to an instance of the iris.cube.CubeList object. The latter is then used to concatenate the individual cubes in time into a single instance of the pyaerocom.GriddedData class. In order to ensure that this works, several things need to be ensured, which are listed in the following and which may be controlled within the global settings for NetCDF import using the attribute GRID_IO (instance of OnLoad) in the default instance of the pyaerocom.config.Config object accessible via pyaerocom.const.

Parameters:
  • var_name (str) – variable that is supposed to be read

  • start (Timestamp or str, optional) – start time of data import

  • stop (Timestamp or str, optional) – stop time of data import

  • ts_type (str) – string specifying temporal resolution (choose from “hourly”, “3hourly”, “daily”, “monthly”). If None, the highest-priority available resolution is used

  • experiment (str) – name of experiment (only relevant if this dataset contains more than one experiment)

  • vert_which (str or dict, optional) – valid AeroCom vertical info string encoded in name (e.g. Column, ModelLevel) or dictionary containing var_name as key and vertical coded string as value, accordingly

  • flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable

  • prefer_longer (bool) – if True and applicable, the ts_type resulting in the longer time coverage will be preferred over other possible frequencies that match the query.

  • aux_vars (list) – only relevant if var_name is not available for reading but needs to be computed: list of variables that are required to compute var_name

  • aux_fun (callable) – only relevant if var_name is not available for reading but needs to be computed: custom method for computation (cf. add_aux_compute() for details)

  • constraints (list, optional) – list of reading constraints (dict type). See check_constraint_valid() and apply_read_constraint() for details related to format of the individual constraints.

  • try_convert_units (bool) – if True, then the unit of the variable data is checked against AeroCom default unit for that variable and if it deviates, it is attempted to be converted to the AeroCom default unit. Default is True.

  • rename_var (str) – if this is set, the var_name attribute of the output GriddedData object will be updated accordingly.

  • **kwargs – additional keyword args parsed to _load_var()

Returns:

loaded data object

Return type:

GriddedData

property registered_var_patterns

List of string patterns for computation of variables

The information is extracted from AUX_REQUIRES

Returns:

list of variable patterns

Return type:

list
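As an illustration of the aux_vars / aux_fun mechanism in read_var(), a custom computation could look like the following standard Angstrom-exponent formula. The wavelengths and variable choices are illustrative, not the library's fixed defaults:

```python
import math

# Angstrom exponent from AODs at two wavelengths -- the kind of function that
# could be passed as aux_fun to read_var(), with the two AOD variables listed
# in aux_vars. Wavelengths are illustrative.
def angstrom_exponent(od440, od870, lam1=440.0, lam2=870.0):
    return -math.log(od440 / od870) / math.log(lam1 / lam2)

ang = angstrom_exponent(0.2, 0.1)
```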

reinit()[source]

Reinit everything that is loaded specific to data_dir

search_all_files(update_file_convention=True)[source]

Search all valid model files for this model

This method browses the data directory and finds all valid files, that is, files that are named according to one of the AeroCom file naming conventions. The file list is stored in files.

Note

It is presumed that naming conventions of files in the data directory are not mixed, but that all files correspond to one of the conventions defined in FileConventionRead.

Parameters:

update_file_convention (bool) – if True, the first file in data_dir is used to identify the file naming convention (cf. FileConventionRead)

Raises:

DataCoverageError – if no valid files could be found

search_data_dir()[source]

Search data directory based on model ID

Wrapper for method search_data_dir_aerocom()

Returns:

data directory

Return type:

str

Raises:

IOError – if directory cannot be found

property start

First available year in the dataset (inferred from filenames)

Note

This is not variable or ts_type specific, so it is not necessarily guaranteed that data from this year is available for all variables in vars or all frequencies listed in ts_types

property stop

Last available year in the dataset (inferred from filenames)

Note

This is not variable or ts_type specific, so it is not necessarily guaranteed that data from this year is available for all variables in vars or all frequencies listed in ts_types

property ts_types

Available frequencies

update(**kwargs)[source]

Update one or more valid parameters

Parameters:

**kwargs – keyword args that will be used to update (overwrite) valid class attributes such as data, data_dir, files

property vars
property vars_filename
property vars_provided

Variables provided by this dataset

property years_avail: list

Years available in dataset

pyaerocom.io.readgridded.is_3d(var_name)

Gridded data using EMEP conventions

Reading of ungridded data

In contrast to gridded data, ungridded data refers to data that is irregularly sampled in space and time, for instance, observations at different locations around the globe. Such data is represented in pyaerocom by UngriddedData, which is essentially a point-cloud dataset. Reading of UngriddedData is typically specific to each observational data record, since the records come in various data formats with various metadata conventions that need to be harmonised during data import.

The following flowchart illustrates the architecture of ungridded reading in pyaerocom. Below is information about the individual reading classes for each dataset (blue in the flowchart), the abstract template base classes the reading classes are based on (dark green), and the factory class ReadUngridded (orange), which has all individual reading classes registered. The data classes returned by the reading classes are indicated in light green.

[Flowchart image: pyaerocom_ungridded_io_flowchart.png]

ReadUngridded factory class

Factory class that has all reading classes for the individual datasets registered.

class pyaerocom.io.readungridded.ReadUngridded(data_ids=None, ignore_cache=False, data_dirs=None, configs: PyaroConfig | list[PyaroConfig] | None = None)[source]

Factory class for reading of ungridded data based on obsnetwork ID

This class also features reading functionality that goes beyond reading of individual observation datasets, including reading of multiple datasets and post-computation of new variables based on datasets that can be read.

Parameters:

COMING SOON

DONOTCACHE_NAME = 'DONOTCACHE'
property INCLUDED_DATASETS
INCLUDED_READERS = [<class 'pyaerocom.io.read_aeronet_invv3.ReadAeronetInvV3'>, <class 'pyaerocom.io.read_aeronet_invv2.ReadAeronetInvV2'>, <class 'pyaerocom.io.read_aeronet_sdav2.ReadAeronetSdaV2'>, <class 'pyaerocom.io.read_aeronet_sdav3.ReadAeronetSdaV3'>, <class 'pyaerocom.io.read_aeronet_sunv2.ReadAeronetSunV2'>, <class 'pyaerocom.io.read_aeronet_sunv3.ReadAeronetSunV3'>, <class 'pyaerocom.io.read_earlinet.ReadEarlinet'>, <class 'pyaerocom.io.read_ebas.ReadEbas'>, <class 'pyaerocom.io.read_aasetal.ReadAasEtal'>, <class 'pyaerocom.io.read_airnow.ReadAirNow'>, <class 'pyaerocom.io.read_eea_aqerep.ReadEEAAQEREP'>, <class 'pyaerocom.io.read_eea_aqerep_v2.ReadEEAAQEREP_V2'>, <class 'pyaerocom.io.gaw.reader.ReadGAW'>, <class 'pyaerocom.io.ghost.reader.ReadGhost'>, <class 'pyaerocom.io.mep.reader.ReadMEP'>, <class 'pyaerocom.io.icos.reader.ReadICOS'>, <class 'pyaerocom.io.icpforests.reader.ReadICPForest'>]
property SUPPORTED_DATASETS

Returns list of strings containing all supported dataset names

SUPPORTED_READERS = [<class 'pyaerocom.io.read_aeronet_invv3.ReadAeronetInvV3'>, <class 'pyaerocom.io.read_aeronet_invv2.ReadAeronetInvV2'>, <class 'pyaerocom.io.read_aeronet_sdav2.ReadAeronetSdaV2'>, <class 'pyaerocom.io.read_aeronet_sdav3.ReadAeronetSdaV3'>, <class 'pyaerocom.io.read_aeronet_sunv2.ReadAeronetSunV2'>, <class 'pyaerocom.io.read_aeronet_sunv3.ReadAeronetSunV3'>, <class 'pyaerocom.io.read_earlinet.ReadEarlinet'>, <class 'pyaerocom.io.read_ebas.ReadEbas'>, <class 'pyaerocom.io.read_aasetal.ReadAasEtal'>, <class 'pyaerocom.io.read_airnow.ReadAirNow'>, <class 'pyaerocom.io.read_eea_aqerep.ReadEEAAQEREP'>, <class 'pyaerocom.io.read_eea_aqerep_v2.ReadEEAAQEREP_V2'>, <class 'pyaerocom.io.gaw.reader.ReadGAW'>, <class 'pyaerocom.io.ghost.reader.ReadGhost'>, <class 'pyaerocom.io.mep.reader.ReadMEP'>, <class 'pyaerocom.io.icos.reader.ReadICOS'>, <class 'pyaerocom.io.icpforests.reader.ReadICPForest'>, <class 'pyaerocom.io.pyaro.read_pyaro.ReadPyaro'>]
add_config(config: PyaroConfig) None[source]

Adds single PyaroConfig to self.configs

Parameters:

config (PyaroConfig) –

Raises:

ValueError – If config is not PyaroConfig

add_pyaro_reader(config: PyaroConfig) ReadUngriddedBase[source]
property configs

List configs

property data_dirs

Data directory(ies) for dataset(s) to read (keys are data IDs)

Type:

dict

property data_id

ID of dataset

Note

Only works if exactly one dataset is assigned to the reader, that is, length of data_ids is 1.

Raises:

AttributeError – if the number of items in data_ids is not exactly one.

Returns:

data ID

Return type:

str

property data_ids

List of datasets supposed to be read

dataset_provides_variables(data_id=None)[source]

List of variables provided by a certain dataset

get_lowlevel_reader(data_id: str | None = None) ReadUngriddedBase[source]

Helper method that returns the initialized reader class for the input ID

Parameters:

data_id (str) – Name of dataset

Returns:

instance of reading class (needs to be implementation of base class ReadUngriddedBase).

Return type:

ReadUngriddedBase

get_reader(data_id)[source]
get_vars_supported(obs_id, vars_desired)[source]

Filter input list of variables by supported ones for a certain data ID

Parameters:
  • obs_id (str) – ID of observation network

  • vars_desired (list) – List of variables that are desired

Returns:

list of variables that can be read through the input network

Return type:

list
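A minimal standalone sketch of this filtering (dataset and variable names are illustrative):

```python
# Sketch of get_vars_supported: keep only the desired variables that the
# observation network provides. Names are illustrative.
def vars_supported(vars_desired, provides_variables):
    return [var for var in vars_desired if var in provides_variables]

supported = vars_supported(["od550aer", "concpm10"], {"od550aer", "ang4487aer"})
```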

property ignore_cache

Boolean specifying whether caching is active or not

property post_compute

Information about datasets that can be computed in post

read(data_ids=None, vars_to_retrieve=None, only_cached=False, filter_post=None, configs: PyaroConfig | list[PyaroConfig] | None = None, **kwargs)[source]

Read observations

Iter over all datasets in data_ids, call read_dataset() and append to data object

Parameters:
  • data_ids (str or list) – data ID or list of all datasets to be imported

  • vars_to_retrieve (str or list) – variable or list of variables to be imported

  • only_cached (bool) – if True, then nothing is reloaded but only data is loaded that is available as cached objects (not recommended to use but may be used if working offline without connection to database)

  • filter_post (dict, optional) – filters applied to UngriddedData object AFTER it is read into memory, via UngriddedData.apply_filters(). This option was introduced in pyaerocom version 0.10.0 and should be used preferably over **kwargs. There is a certain flexibility with respect to how these filters can be defined, for instance, sub dicts for each data_id. The most common way would be to provide directly the input needed for UngriddedData.apply_filters. If you want to read multiple variables from one or more datasets, and if you want to apply variable specific filters, it is recommended to read the data individually for each variable and corresponding set of filters and then merge the individual filtered UngriddedData objects afterwards, e.g. using data_var1 & data_var2.

  • **kwargs – Additional input options for reading of data, which are applied WHILE the data is read. If any such additional options are provided that are applied during the reading, then automatic caching of the output UngriddedData object will be deactivated. Thus, it is recommended to handle data filtering via filter_post argument whenever possible, which will result in better performance as the unconstrained original data is read in and cached, and then the filtering is applied.

Example

>>> import pyaerocom.io.readungridded as pio
>>> from pyaerocom import const
>>> reader = pio.ReadUngridded(data_ids=const.AERONET_SUN_V3L15_AOD_ALL_POINTS_NAME)
>>> data = reader.read()
>>> print(data)
>>> print(data.metadata[0.0]['latitude'])
read_dataset(data_id, vars_to_retrieve=None, only_cached=False, filter_post=None, **kwargs)[source]

Read a dataset into an instance of UngriddedData

Parameters:
  • data_id (str) – name of dataset

  • vars_to_retrieve (list) – variable or list of variables to be imported

  • only_cached (bool) – if True, then nothing is reloaded but only data is loaded that is available as cached objects (not recommended to use but may be used if working offline without connection to database)

  • filter_post (dict, optional) – filters applied to UngriddedData object AFTER it is read into memory, via UngriddedData.apply_filters(). This option was introduced in pyaerocom version 0.10.0 and should be used preferably over **kwargs. There is a certain flexibility with respect to how these filters can be defined, for instance, sub dicts for each data_id. The most common way would be to provide directly the input needed for UngriddedData.apply_filters. If you want to read multiple variables from one or more datasets, and if you want to apply variable specific filters, it is recommended to read the data individually for each variable and corresponding set of filters and then merge the individual filtered UngriddedData objects afterwards, e.g. using data_var1 & data_var2.

  • **kwargs – Additional input options for reading of data, which are applied WHILE the data is read. If any such additional options are provided that are applied during the reading, then automatic caching of the output UngriddedData object will be deactivated. Thus, it is recommended to handle data filtering via filter_post argument whenever possible, which will result in better performance as the unconstrained original data is read in and cached, and then the filtering is applied.

Returns:

data object

Return type:

UngriddedData

read_dataset_post(data_id, vars_to_retrieve, only_cached=False, filter_post=None, **kwargs)[source]

Read a dataset into an instance of UngriddedData

Parameters:
  • data_id (str) – name of dataset

  • vars_to_retrieve (list) – variable or list of variables to be imported

  • only_cached (bool) – if True, then nothing is reloaded but only data is loaded that is available as cached objects (not recommended to use but may be used if working offline without connection to database)

  • filter_post (dict, optional) – filters applied to UngriddedData object AFTER it is read into memory, via UngriddedData.apply_filters(). This option was introduced in pyaerocom version 0.10.0 and should be used preferably over **kwargs. There is a certain flexibility with respect to how these filters can be defined, for instance, sub dicts for each data_id. The most common way would be to provide directly the input needed for UngriddedData.apply_filters. If you want to read multiple variables from one or more datasets, and if you want to apply variable specific filters, it is recommended to read the data individually for each variable and corresponding set of filters and then merge the individual filtered UngriddedData objects afterwards, e.g. using data_var1 & data_var2.

  • **kwargs – Additional input options for reading of data, which are applied WHILE the data is read. If any such additional options are provided that are applied during the reading, then automatic caching of the output UngriddedData object will be deactivated. Thus, it is recommended to handle data filtering via filter_post argument whenever possible, which will result in better performance as the unconstrained original data is read in and cached, and then the filtering is applied.

Returns:

data object

Return type:

UngriddedData

property supported_datasets

Wrapper for SUPPORTED_DATASETS

ReadUngriddedBase template class

All ungridded reading routines are based on this template class.

class pyaerocom.io.readungriddedbase.ReadUngriddedBase(data_id: str | None = None, data_dir: str | None = None)[source]

TEMPLATE: Abstract base class template for reading of ungridded data

Note

The two dictionaries AUX_REQUIRES and AUX_FUNS can be filled with variables that are not contained in the original data files but are computed during the reading. The former specifies what additional variables are required to perform the computation and the latter specifies functions used to perform the computations of the auxiliary variables. See, for instance, the class ReadAeronetSunV3, which includes the computation of the AOD at 550nm and the Angstrom coefficient (in 440-870 nm range) from AODs measured at other wavelengths.
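A minimal standalone sketch of this pattern follows; the computation mirrors the standard Angstrom extrapolation mentioned above, while the driver loop mimics what compute_additional_vars does. Names and numbers are illustrative:

```python
# Sketch of the AUX_REQUIRES / AUX_FUNS pattern: od550aer is not in the files
# but is computed from od440aer and the Angstrom exponent.
def calc_od550aer(od440aer, ang4487aer):
    # standard Angstrom extrapolation from 440 nm to 550 nm
    return od440aer * (550.0 / 440.0) ** (-ang4487aer)

AUX_REQUIRES = {"od550aer": ["od440aer", "ang4487aer"]}
AUX_FUNS = {"od550aer": calc_od550aer}

# illustrative input data (would normally come from the file being read)
data = {"od440aer": 0.25, "ang4487aer": 1.0}
for var, required in AUX_REQUIRES.items():
    data[var] = AUX_FUNS[var](*(data[req] for req in required))
```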

AUX_FUNS = {}

Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)

AUX_REQUIRES = {}

dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)

property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

property DATASET_PATH

Wrapper for data_dir.

abstract property DATA_ID

Name of dataset (OBS_ID)

Note

  • May be implemented as global constant in header of derived class

  • Multiple IDs may be supported; the one to use can be specified on init (see example below)

abstract property DEFAULT_VARS

List containing default variables to read

IGNORE_META_KEYS = []
abstract property PROVIDES_VARIABLES

List of variables that are provided by this dataset

Note

May be implemented as global constant in header

property REVISION_FILE

Name of revision file located in data directory

abstract property SUPPORTED_DATASETS

List of all datasets supported by this interface

Note

  • best practice to specify in header of class definition

  • DATA_ID must be included in this list

abstract property TS_TYPE

Temporal resolution of dataset

This should be defined in the header of an implementation class if it can be defined globally for the corresponding obs-network; otherwise it should be initialized as the string "undefined" and then, if applicable, updated in the reading routine of a file.

The TS_TYPE information should ultimately be written into the meta-data of objects returned by the implementation of read_file() (e.g. instance of StationData or a normal dictionary) and the method read() (which should ALWAYS return an instance of the UngriddedData class).

Note

  • Please use "undefined" if the derived class is not sampled on a regular basis.

  • If applicable please use Aerocom ts_type (i.e. hourly, 3hourly, daily, monthly, yearly)

  • Note also that the ts_type in a derived class may or may not be definable in the general case. For instance, in the EBAS database the resolution code can be found in the file header; it may thus be initialized as "undefined" when the reading class is instantiated and then updated when a file is being read

  • For derived implementation classes that support reading of multiple network versions, you may also assign

check_vars_to_retrieve(vars_to_retrieve)[source]

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.

Parameters:

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns:

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type:

tuple
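The separation logic can be sketched as follows (variable names are illustrative):

```python
# Sketch of check_vars_to_retrieve: split requested variables into those read
# from file and those computed, pulling in the requirements of the latter.
def split_vars(vars_to_retrieve, aux_requires, provided_in_file):
    vars_to_read, vars_to_compute = [], []
    for var in vars_to_retrieve:
        if var in aux_requires:
            vars_to_compute.append(var)
            for req in aux_requires[var]:
                if req not in vars_to_read:
                    vars_to_read.append(req)
        elif var in provided_in_file:
            if var not in vars_to_read:
                vars_to_read.append(var)
    return vars_to_read, vars_to_compute

to_read, to_compute = split_vars(
    ["od550aer", "od440aer"],
    {"od550aer": ["od440aer", "ang4487aer"]},
    {"od440aer", "ang4487aer"},
)
```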

compute_additional_vars(data, vars_to_compute)[source]

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters:
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).

Returns:

updated data object now containing also computed variables

Return type:

dict

property data_dir: str

Location of the dataset

Note

This can be set explicitly when instantiating the class (e.g. if data is available on local machine). If unspecified, the data location is attempted to be inferred via get_obsnetwork_dir()

Raises:

FileNotFoundError – if data directory does not exist or cannot be retrieved automatically

Type:

str

property data_id

ID of dataset

property data_revision

Revision string from file Revision.txt in the main data directory

find_in_file_list(pattern=None)[source]

Find all files that match a certain wildcard pattern

Parameters:

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns:

list containing all files in files that match pattern

Return type:

list

Raises:

IOError – if no matches can be found

get_file_list(pattern=None)[source]

Search all files to be read

Uses _FILEMASK (+ optional input search pattern, e.g. station_name) to find valid files for query.

Parameters:

pattern (str, optional) – file name pattern applied to search

Returns:

list containing retrieved file locations

Return type:

list

Raises:

IOError – if no files can be found

logger

The class's own logger instance

abstract read(vars_to_retrieve=None, files=[], first_file=None, last_file=None)[source]

Method that reads list of files as instance of UngriddedData

Parameters:
  • vars_to_retrieve (list or similar, optional,) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used

Returns:

instance of ungridded data object containing data from all files.

Return type:

UngriddedData

abstract read_file(filename, vars_to_retrieve=None)[source]

Read single file

Parameters:
  • filename (str) – string specifying filename

  • vars_to_retrieve (list or similar, optional,) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

Returns:

imported data in a suitable format that can be handled by read() which is supposed to append the loaded results from this method (which reads one datafile) to an instance of UngriddedData for all files.

Return type:

dict or StationData, or other…

read_first_file(**kwargs)[source]

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters:

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns:

dictionary or similar containing loaded results from first file

Return type:

dict-like

read_station(station_id_filename, **kwargs)[source]

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced filelist as input, in order to read all files from this station into data object.

Parameters:
  • station_id_filename (str) – name of station (MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns:

loaded data

Return type:

UngriddedData

Raises:

IOError – if no files can be found for this station ID

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)[source]

Remove outliers from data

Parameters:
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying variable name and corresponding min/max interval (list or tuple) that specifies the valid range for the variable. For each variable that is not explicitly defined here, the default minimum/maximum value is used (accessed via pyaerocom.const.VARS[var_name])
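For a single variable with an explicit valid range, a sketch of this range check using a NumPy mask could look as follows (the range and values are illustrative; pyaerocom would otherwise look the range up via const.VARS):

```python
import numpy as np

# Sketch of range-based outlier removal as in remove_outliers: values outside
# the valid interval are replaced by NaN. The valid range here is explicit.
def mask_outliers(values, valid_range):
    low, high = valid_range
    arr = np.asarray(values, dtype=float).copy()
    arr[(arr < low) | (arr > high)] = np.nan
    return arr

cleaned = mask_outliers([0.1, 5.0, -2.0, 0.3], valid_range=(0.0, 1.0))
```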

var_supported(var_name)[source]

Check if input variable is supported

Parameters:

var_name (str) – AeroCom variable name or alias

Raises:

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns:

True, if variable is supported by this interface, else False

Return type:

bool

property verbosity_level

Current level of verbosity of logger

AERONET

Aerosol Robotic Network (AERONET)

AERONET base class

All AERONET reading classes are based on the template ReadAeronetBase class, which, in turn, inherits from ReadUngriddedBase.

class pyaerocom.io.readaeronetbase.ReadAeronetBase(data_id=None, data_dir=None)[source]

Bases: ReadUngriddedBase

TEMPLATE: Abstract base class template for reading of Aeronet data

Extended abstract base class, derived from low-level base class ReadUngriddedBase that contains some more functionality.

ALT_VAR_NAMES_FILE = {}

dictionary specifying alternative column names for variables defined in VAR_NAMES_FILE

Type:

OPTIONAL

AUX_FUNS = {}

Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)

AUX_REQUIRES = {}

dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)

property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

COL_DELIM = ','

column delimiter in data block of files

property DATASET_PATH

Wrapper for data_dir.

abstract property DATA_ID

Name of dataset (OBS_ID)

Note

  • May be implemented as global constant in header of derived class

  • Multiple IDs may be supported; the one to use can be specified on init (see example below)

DEFAULT_UNIT = '1'

Default data unit that is assigned to all variables that are not specified in UNITS dictionary (cf. UNITS)

abstract property DEFAULT_VARS

List containing default variables to read

IGNORE_META_KEYS = ['date', 'time', 'day_of_year']
INSTRUMENT_NAME = 'sun_photometer'

name of measurement instrument

META_NAMES_FILE = {}

dictionary specifying the file column names (values) for each metadata key (cf. attributes of StationData, e.g. ‘station_name’, ‘longitude’, ‘latitude’, ‘altitude’)

META_NAMES_FILE_ALT = ({},)
abstract property PROVIDES_VARIABLES

List of variables that are provided by this dataset

Note

May be implemented as global constant in header

property REVISION_FILE

Name of revision file located in data directory

abstract property SUPPORTED_DATASETS

List of all datasets supported by this interface

Note

  • best practice to specify in header of class definition

  • DATA_ID must be included in this list

property TS_TYPE

Default implementation of string for temporal resolution

TS_TYPES = {}

dictionary assigning temporal resolution flags for supported datasets that are provided in a defined temporal resolution. Key is the name of the dataset and value is the corresponding ts_type

UNITS = {}

Variable-specific units; only required for variables that deviate from DEFAULT_UNIT. (Irrelevant for the variables currently supported by the implemented Aeronet products, which are all dimensionless as specified in DEFAULT_UNIT.)

VAR_NAMES_FILE = {}

dictionary specifying the file column names (values) for each Aerocom variable (keys)

VAR_PATTERNS_FILE = {}

Mappings for identifying variables in file (may be specified in addition to explicit variable names specified in VAR_NAMES_FILE)

check_vars_to_retrieve(vars_to_retrieve)

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.

Parameters:

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns:

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type:

tuple
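The separation performed here can be sketched like this (a hypothetical simplification reusing the attribute names introduced above; inputs of computed variables are added to the read list):

```python
# Sketch of the read / compute separation done by check_vars_to_retrieve.
PROVIDES_VARIABLES = ["od440aer", "od500aer", "ang4487aer"]
AUX_REQUIRES = {"od550aer": ["od500aer", "ang4487aer"]}

def check_vars_to_retrieve(vars_to_retrieve):
    vars_to_read, vars_to_compute = [], []
    for var in vars_to_retrieve:
        if var in AUX_REQUIRES:
            vars_to_compute.append(var)
            # inputs of computed variables must be read from file as well
            for req in AUX_REQUIRES[var]:
                if req not in vars_to_read:
                    vars_to_read.append(req)
        else:
            vars_to_read.append(var)
    return vars_to_read, vars_to_compute

to_read, to_compute = check_vars_to_retrieve(["od440aer", "od550aer"])
```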

property col_index

Dictionary that specifies the index for each data column

Note

Implementation depends on the data. For instance, if the variable information is provided in all files (of all stations) and always in the same column, then this can be set as a fixed dictionary in the __init__ function of the implementation (see e.g. class ReadAeronetSunV2). In other cases, it may not be ensured that each variable is available in all files, or the column definition may differ between stations. In the latter case you may automate the column index retrieval by providing the header names for each metadata and data column you want to extract via the attribute dictionaries META_NAMES_FILE and VAR_NAMES_FILE, and by calling _update_col_index() in your implementation of read_file() when you reach the line that contains the header information.
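Such automated column-index retrieval can be sketched as follows (update_col_index is a hypothetical helper mirroring what _update_col_index() does with META_NAMES_FILE and VAR_NAMES_FILE; the header names follow the Aeronet convention):

```python
# Build a column-index dictionary from a file header line, mapping
# pyaerocom keys to the positions of their columns in the file.
META_NAMES_FILE = {"station_name": "AERONET_Site",
                   "latitude": "Site_Latitude(Degrees)"}
VAR_NAMES_FILE = {"od500aer": "AOD_500nm"}

def update_col_index(header_line, col_delim=","):
    cols = header_line.strip().split(col_delim)
    col_index = {}
    for key, colname in {**META_NAMES_FILE, **VAR_NAMES_FILE}.items():
        if colname in cols:  # skip columns missing from this file
            col_index[key] = cols.index(colname)
    return col_index

header = "AERONET_Site,Site_Latitude(Degrees),AOD_500nm"
idx = update_col_index(header)
```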

compute_additional_vars(data, vars_to_compute)

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters:
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables required for the computation need to be specified in AUX_REQUIRES and need to be available as data vectors in the provided data dictionary (the key is the variable name of the required variable).

Returns:

updated data object now containing also computed variables

Return type:

dict

property data_dir: str

Location of the dataset

Note

This can be set explicitly when instantiating the class (e.g. if the data is available on the local machine). If unspecified, pyaerocom attempts to infer the data location via get_obsnetwork_dir()

Raises:

FileNotFoundError – if data directory does not exist or cannot be retrieved automatically

Type:

str

property data_id

ID of dataset

property data_revision

Revision string from file Revision.txt in the main data directory

find_in_file_list(pattern=None)

Find all files that match a certain wildcard pattern

Parameters:

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns:

list of all files in attr. files that match pattern

Return type:

list

Raises:

IOError – if no matches can be found
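Wildcard matching of this kind can be sketched with the standard library (a simplified stand-in; the file names are invented for illustration):

```python
import fnmatch

# Filter a file list with a wildcard pattern, raising if nothing matches.
files = ["Berlin_2020.lev30", "Oslo_2020.lev30", "Berlin_2021.lev30"]

def find_in_file_list(files, pattern):
    matches = fnmatch.filter(files, pattern)
    if not matches:
        raise IOError(f"No files found matching {pattern}")
    return matches

berlin_files = find_in_file_list(files, "*Berlin*")
```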

get_file_list(pattern=None)

Search all files to be read

Uses _FILEMASK (+ optional input search pattern, e.g. station_name) to find valid files for query.

Parameters:

pattern (str, optional) – file name pattern applied to search

Returns:

list containing retrieved file locations

Return type:

list

Raises:

IOError – if no files can be found

infer_wavelength_colname(colname, low=250, high=2000)[source]

Get variable wavelength from column name

Parameters:
  • colname (str) – string of column name

  • low (int) – lower limit of accepted value range

  • high (int) – upper limit of accepted value range

Returns:

wavelength in nm as a string representing a floating point number

Return type:

str

Raises:

ValueError – if no number, or more than one number, is detected in the variable string
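The behaviour can be sketched as follows (a simplified stand-in, assuming the return value is the wavelength formatted as a float string; exactly one number within the accepted range must be present):

```python
import re

def infer_wavelength_colname(colname, low=250, high=2000):
    """Extract the single wavelength number from a column name."""
    nums = [n for n in re.findall(r"[0-9]+", colname)
            if low <= int(n) <= high]
    if len(nums) != 1:  # zero or multiple candidates: ambiguous
        raise ValueError(f"Failed to infer wavelength from {colname}")
    return str(float(nums[0]))

wvl = infer_wavelength_colname("AOD_500nm")
```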

logger

The class's own logger instance

print_all_columns()[source]
read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, file_pattern=None, common_meta=None)[source]

Method that reads list of files as instance of UngriddedData

Parameters:
  • vars_to_retrieve (list or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • file_pattern (str, optional) – string pattern for file search (cf get_file_list())

  • common_meta (dict, optional) – dictionary that contains additional metadata shared for this network (assigned to each metadata block of the UngriddedData object that is returned)

Returns:

data object

Return type:

UngriddedData
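How these parameters narrow down the file list can be sketched as follows (select_files is a hypothetical helper; per the docstring, file_pattern takes precedence over the index arguments, and last_file is treated here as an exclusive index, which simplifies the real behaviour):

```python
import fnmatch

def select_files(files, first_file=None, last_file=None, file_pattern=None):
    # file_pattern takes precedence; first_file / last_file are ignored then
    if file_pattern is not None:
        return fnmatch.filter(files, file_pattern)
    first = 0 if first_file is None else first_file
    last = len(files) if last_file is None else last_file  # exclusive here
    return files[first:last]

files = ["f0.dat", "f1.dat", "f2.dat", "f3.dat"]
subset = select_files(files, first_file=1, last_file=3)
```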

abstract read_file(filename, vars_to_retrieve=None)

Read single file

Parameters:
  • filename (str) – string specifying filename

  • vars_to_retrieve (list or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

Returns:

imported data in a suitable format that can be handled by read(), which appends the results of this method (reading a single data file) to an instance of UngriddedData for all files.

Return type:

dict or StationData, or other…

read_first_file(**kwargs)

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters:

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns:

dictionary or similar containing loaded results from first file

Return type:

dict-like

read_station(station_id_filename, **kwargs)

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced filelist as input, in order to read all files from this station into data object.

Parameters:
  • station_id_filename (str) – name of station (MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns:

loaded data

Return type:

UngriddedData

Raises:

IOError – if no files can be found for this station ID

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)

Remove outliers from data

Parameters:
  • data (dict-like) – data object containing data vectors for the variables from which outliers are to be removed (cf. input param vars_to_retrieve)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying a variable name and the corresponding min / max interval (list or tuple) that defines the valid range for that variable. For each variable that is not explicitly defined here, the default minimum / maximum values are used (accessed via pyaerocom.const.VARS[var_name])
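The range-based filtering can be sketched like this (illustrative only; the default range would come from pyaerocom.const.VARS rather than the hard-coded dictionary used here, and out-of-range values are replaced with NaN):

```python
import math

DEFAULT_RANGES = {"od550aer": (0.0, 10.0)}  # stand-in for pyaerocom.const.VARS

def remove_outliers(data, vars_to_retrieve, **valid_rng_vars):
    for var in vars_to_retrieve:
        # explicit keyword range wins over the default range
        low, high = valid_rng_vars.get(var, DEFAULT_RANGES[var])
        data[var] = [v if low <= v <= high else math.nan
                     for v in data[var]]
    return data

data = remove_outliers({"od550aer": [0.1, -999.0, 0.5, 42.0]},
                       ["od550aer"], od550aer=(0, 2))
```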

var_supported(var_name)

Check if input variable is supported

Parameters:

var_name (str) – AeroCom variable name or alias

Raises:

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns:

True, if variable is supported by this interface, else False

Return type:

bool

property verbosity_level

Current level of verbosity of logger

AERONET Sun (V3)

class pyaerocom.io.read_aeronet_sunv3.ReadAeronetSunV3(data_id=None, data_dir=None)[source]

Bases: ReadAeronetBase

Interface for reading Aeronet direct sun version 3 Level 1.5 and 2.0 data

See also

Base classes ReadAeronetBase and ReadUngriddedBase

ALT_VAR_NAMES_FILE = {}

dictionary specifying alternative column names for variables defined in VAR_NAMES_FILE

Type:

OPTIONAL

AUX_FUNS = {'ang44&87aer': <function calc_ang4487aer>, 'od550aer': <function calc_od550aer>, 'od550lt1ang': <function calc_od550lt1ang>, 'proxyod550aerh2o': <function calc_od550aer>, 'proxyod550bc': <function calc_od550aer>, 'proxyod550dust': <function calc_od550aer>, 'proxyod550nh4': <function calc_od550aer>, 'proxyod550no3': <function calc_od550aer>, 'proxyod550oa': <function calc_od550aer>, 'proxyod550so4': <function calc_od550aer>, 'proxyod550ss': <function calc_od550aer>}

Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)

AUX_REQUIRES = {'ang44&87aer': ['od440aer', 'od870aer'], 'od550aer': ['od440aer', 'od500aer', 'ang4487aer'], 'od550lt1ang': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550aerh2o': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550bc': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550dust': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550nh4': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550no3': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550oa': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550so4': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550ss': ['od440aer', 'od500aer', 'ang4487aer']}

dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)

property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

COL_DELIM = ','

column delimiter in data block of files

property DATASET_PATH

Wrapper for data_dir.

DATA_ID = 'AeronetSunV3Lev2.daily'

Name of dataset (OBS_ID)

DEFAULT_UNIT = '1'

Default data unit that is assigned to all variables that are not specified in UNITS dictionary (cf. UNITS)

DEFAULT_VARS = ['od550aer', 'ang4487aer']

default variables for read method

IGNORE_META_KEYS = ['date', 'time', 'day_of_year']
INSTRUMENT_NAME = 'sun_photometer'

name of measurement instrument

META_NAMES_FILE = {'altitude': 'Site_Elevation(m)', 'data_quality_level': 'Data_Quality_Level', 'date': 'Date(dd:mm:yyyy)', 'day_of_year': 'Day_of_Year', 'instrument_number': 'AERONET_Instrument_Number', 'latitude': 'Site_Latitude(Degrees)', 'longitude': 'Site_Longitude(Degrees)', 'station_name': 'AERONET_Site', 'time': 'Time(hh:mm:ss)'}

dictionary specifying the file column names (values) for each metadata key (cf. attributes of StationData, e.g. ‘station_name’, ‘longitude’, ‘latitude’, ‘altitude’)

META_NAMES_FILE_ALT = {'AERONET_Site': ['AERONET_Site_Name']}
NAN_VAL = -999.0
PROVIDES_VARIABLES = ['od340aer', 'od440aer', 'od500aer', 'od870aer', 'ang4487aer']

List of variables that are provided by this dataset (will be extended by auxiliary variables on class init, for details see __init__ method of base class ReadUngriddedBase)

property REVISION_FILE

Name of revision file located in data directory

SUPPORTED_DATASETS = ['AeronetSunV3Lev1.5.daily', 'AeronetSunV3Lev1.5.AP', 'AeronetSunV3Lev2.daily', 'AeronetSunV3Lev2.AP']

List of all datasets supported by this interface

property TS_TYPE

Default implementation of string for temporal resolution

TS_TYPES = {'AeronetSunV3Lev1.5.daily': 'daily', 'AeronetSunV3Lev2.daily': 'daily'}

dictionary assigning temporal resolution flags for supported datasets that are provided in a defined temporal resolution

UNITS = {}

Variable-specific units; only required for variables that deviate from DEFAULT_UNIT. (Irrelevant for the variables currently supported by the implemented Aeronet products, which are all dimensionless as specified in DEFAULT_UNIT.)

VAR_NAMES_FILE = {'ang4487aer': '440-870_Angstrom_Exponent', 'od340aer': 'AOD_340nm', 'od440aer': 'AOD_440nm', 'od500aer': 'AOD_500nm', 'od870aer': 'AOD_870nm'}

dictionary specifying the file column names (values) for each Aerocom variable (keys)

VAR_PATTERNS_FILE = {'AOD_([0-9]*)nm': 'od*aer'}

Mappings for identifying variables in file
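How such a pattern resolves a file column to an AeroCom variable name can be sketched as follows (var_from_colname is a hypothetical helper; the captured wavelength replaces the * placeholder in the mapped name):

```python
import re

# One VAR_PATTERNS_FILE entry: a regex over column names mapped to a
# templated AeroCom variable name.
VAR_PATTERNS_FILE = {r"AOD_([0-9]*)nm": "od*aer"}

def var_from_colname(colname):
    for pattern, var_template in VAR_PATTERNS_FILE.items():
        match = re.fullmatch(pattern, colname)
        if match:
            # substitute the captured wavelength for the '*' placeholder
            return var_template.replace("*", match.group(1))
    return None  # column does not correspond to a pattern-mapped variable
```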

check_vars_to_retrieve(vars_to_retrieve)

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.

Parameters:

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns:

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type:

tuple

property col_index

Dictionary that specifies the index for each data column

Note

Implementation depends on the data. For instance, if the variable information is provided in all files (of all stations) and always in the same column, then this can be set as a fixed dictionary in the __init__ function of the implementation (see e.g. class ReadAeronetSunV2). In other cases, it may not be ensured that each variable is available in all files, or the column definition may differ between stations. In the latter case you may automate the column index retrieval by providing the header names for each metadata and data column you want to extract via the attribute dictionaries META_NAMES_FILE and VAR_NAMES_FILE, and by calling _update_col_index() in your implementation of read_file() when you reach the line that contains the header information.

compute_additional_vars(data, vars_to_compute)

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters:
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables required for the computation need to be specified in AUX_REQUIRES and need to be available as data vectors in the provided data dictionary (the key is the variable name of the required variable).

Returns:

updated data object now containing also computed variables

Return type:

dict

property data_dir: str

Location of the dataset

Note

This can be set explicitly when instantiating the class (e.g. if the data is available on the local machine). If unspecified, pyaerocom attempts to infer the data location via get_obsnetwork_dir()

Raises:

FileNotFoundError – if data directory does not exist or cannot be retrieved automatically

Type:

str

property data_id

ID of dataset

property data_revision

Revision string from file Revision.txt in the main data directory

find_in_file_list(pattern=None)

Find all files that match a certain wildcard pattern

Parameters:

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns:

list of all files in attr. files that match pattern

Return type:

list

Raises:

IOError – if no matches can be found

get_file_list(pattern=None)

Search all files to be read

Uses _FILEMASK (+ optional input search pattern, e.g. station_name) to find valid files for query.

Parameters:

pattern (str, optional) – file name pattern applied to search

Returns:

list containing retrieved file locations

Return type:

list

Raises:

IOError – if no files can be found

infer_wavelength_colname(colname, low=250, high=2000)

Get variable wavelength from column name

Parameters:
  • colname (str) – string of column name

  • low (int) – lower limit of accepted value range

  • high (int) – upper limit of accepted value range

Returns:

wavelength in nm as a string representing a floating point number

Return type:

str

Raises:

ValueError – if no number, or more than one number, is detected in the variable string

logger

The class's own logger instance

print_all_columns()
read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, file_pattern=None, common_meta=None)

Method that reads list of files as instance of UngriddedData

Parameters:
  • vars_to_retrieve (list or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • file_pattern (str, optional) – string pattern for file search (cf get_file_list())

  • common_meta (dict, optional) – dictionary that contains additional metadata shared for this network (assigned to each metadata block of the UngriddedData object that is returned)

Returns:

data object

Return type:

UngriddedData

read_file(filename, vars_to_retrieve=None, vars_as_series=False)[source]

Read Aeronet Sun V3 level 1.5 or 2 file

Parameters:
  • filename (str) – absolute path to filename to read

  • vars_to_retrieve (list, optional) – list of str with variable names to read. If None, use DEFAULT_VARS

  • vars_as_series (bool) – if True, the data columns of all variables in the result dictionary are converted into pandas Series objects

Returns:

dict-like object containing results

Return type:

StationData

read_first_file(**kwargs)

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters:

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns:

dictionary or similar containing loaded results from first file

Return type:

dict-like

read_station(station_id_filename, **kwargs)

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced filelist as input, in order to read all files from this station into data object.

Parameters:
  • station_id_filename (str) – name of station (MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns:

loaded data

Return type:

UngriddedData

Raises:

IOError – if no files can be found for this station ID

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)

Remove outliers from data

Parameters:
  • data (dict-like) – data object containing data vectors for the variables from which outliers are to be removed (cf. input param vars_to_retrieve)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying a variable name and the corresponding min / max interval (list or tuple) that defines the valid range for that variable. For each variable that is not explicitly defined here, the default minimum / maximum values are used (accessed via pyaerocom.const.VARS[var_name])

var_supported(var_name)

Check if input variable is supported

Parameters:

var_name (str) – AeroCom variable name or alias

Raises:

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns:

True, if variable is supported by this interface, else False

Return type:

bool

property verbosity_level

Current level of verbosity of logger

AERONET SDA (V3)

class pyaerocom.io.read_aeronet_sdav3.ReadAeronetSdaV3(data_id=None, data_dir=None)[source]

Bases: ReadAeronetBase

Interface for reading Aeronet Sun SDA V3 Level 1.5 and 2.0 data

See also

Base classes ReadAeronetBase and ReadUngriddedBase

ALT_VAR_NAMES_FILE = {}

dictionary specifying alternative column names for variables defined in VAR_NAMES_FILE

Type:

OPTIONAL

AUX_FUNS = {'od550aer': <function calc_od550aer>, 'od550dust': <function calc_od550gt1aer>, 'od550gt1aer': <function calc_od550gt1aer>, 'od550lt1aer': <function calc_od550lt1aer>}

Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)

AUX_REQUIRES = {'od550aer': ['od500aer', 'ang4487aer'], 'od550dust': ['od500gt1aer', 'ang4487aer'], 'od550gt1aer': ['od500gt1aer', 'ang4487aer'], 'od550lt1aer': ['od500lt1aer', 'ang4487aer']}

dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)

property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

COL_DELIM = ','

column delimiter in data block of files

property DATASET_PATH

Wrapper for data_dir.

DATA_ID = 'AeronetSDAV3Lev2.daily'

Name of dataset (OBS_ID)

DEFAULT_UNIT = '1'

Default data unit that is assigned to all variables that are not specified in UNITS dictionary (cf. UNITS)

DEFAULT_VARS = ['od550aer', 'od550gt1aer', 'od550lt1aer', 'od550dust']

default variables for read method

IGNORE_META_KEYS = ['date', 'time', 'day_of_year']
INSTRUMENT_NAME = 'sun_photometer'

name of measurement instrument

META_NAMES_FILE = {'altitude': 'Site_Elevation(m)', 'data_quality_level': 'Data_Quality_Level', 'date': 'Date_(dd:mm:yyyy)', 'day_of_year': 'Day_of_Year', 'instrument_number': 'AERONET_Instrument_Number', 'latitude': 'Site_Latitude(Degrees)', 'longitude': 'Site_Longitude(Degrees)', 'station_name': 'AERONET_Site', 'time': 'Time_(hh:mm:ss)'}

dictionary specifying the file column names (values) for each metadata key (cf. attributes of StationData, e.g. ‘station_name’, ‘longitude’, ‘latitude’, ‘altitude’)

META_NAMES_FILE_ALT = ({},)
NAN_VAL = -999.0

value corresponding to invalid measurement

PROVIDES_VARIABLES = ['od500gt1aer', 'od500lt1aer', 'od500aer', 'ang4487aer', 'od500dust']

List of variables that are provided by this dataset (will be extended by auxiliary variables on class init, for details see __init__ method of base class ReadUngriddedBase)

property REVISION_FILE

Name of revision file located in data directory

SUPPORTED_DATASETS = ['AeronetSDAV3Lev1.5.daily', 'AeronetSDAV3Lev2.daily']

List of all datasets supported by this interface

property TS_TYPE

Default implementation of string for temporal resolution

TS_TYPES = {'AeronetSDAV3Lev1.5.daily': 'daily', 'AeronetSDAV3Lev2.daily': 'daily'}

dictionary assigning temporal resolution flags for supported datasets that are provided in a defined temporal resolution

UNITS = {}

Variable-specific units; only required for variables that deviate from DEFAULT_UNIT. (Irrelevant for the variables currently supported by the implemented Aeronet products, which are all dimensionless as specified in DEFAULT_UNIT.)

VAR_NAMES_FILE = {'ang4487aer': 'Angstrom_Exponent(AE)-Total_500nm[alpha]', 'od500aer': 'Total_AOD_500nm[tau_a]', 'od500dust': 'Coarse_Mode_AOD_500nm[tau_c]', 'od500gt1aer': 'Coarse_Mode_AOD_500nm[tau_c]', 'od500lt1aer': 'Fine_Mode_AOD_500nm[tau_f]'}

dictionary specifying the file column names (values) for each Aerocom variable (keys)

VAR_PATTERNS_FILE = {}

Mappings for identifying variables in file (may be specified in addition to explicit variable names specified in VAR_NAMES_FILE)

check_vars_to_retrieve(vars_to_retrieve)

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.

Parameters:

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns:

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type:

tuple

property col_index

Dictionary that specifies the index for each data column

Note

Implementation depends on the data. For instance, if the variable information is provided in all files (of all stations) and always in the same column, then this can be set as a fixed dictionary in the __init__ function of the implementation (see e.g. class ReadAeronetSunV2). In other cases, it may not be ensured that each variable is available in all files, or the column definition may differ between stations. In the latter case you may automate the column index retrieval by providing the header names for each metadata and data column you want to extract via the attribute dictionaries META_NAMES_FILE and VAR_NAMES_FILE, and by calling _update_col_index() in your implementation of read_file() when you reach the line that contains the header information.

compute_additional_vars(data, vars_to_compute)

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters:
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables required for the computation need to be specified in AUX_REQUIRES and need to be available as data vectors in the provided data dictionary (the key is the variable name of the required variable).

Returns:

updated data object now containing also computed variables

Return type:

dict

property data_dir: str

Location of the dataset

Note

This can be set explicitly when instantiating the class (e.g. if the data is available on the local machine). If unspecified, pyaerocom attempts to infer the data location via get_obsnetwork_dir()

Raises:

FileNotFoundError – if data directory does not exist or cannot be retrieved automatically

Type:

str

property data_id

ID of dataset

property data_revision

Revision string from file Revision.txt in the main data directory

find_in_file_list(pattern=None)

Find all files that match a certain wildcard pattern

Parameters:

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns:

list of all files in attr. files that match pattern

Return type:

list

Raises:

IOError – if no matches can be found

get_file_list(pattern=None)

Search all files to be read

Uses _FILEMASK (+ optional input search pattern, e.g. station_name) to find valid files for query.

Parameters:

pattern (str, optional) – file name pattern applied to search

Returns:

list containing retrieved file locations

Return type:

list

Raises:

IOError – if no files can be found

infer_wavelength_colname(colname, low=250, high=2000)

Get variable wavelength from column name

Parameters:
  • colname (str) – string of column name

  • low (int) – lower limit of accepted value range

  • high (int) – upper limit of accepted value range

Returns:

wavelength in nm as a string representing a floating point number

Return type:

str

Raises:

ValueError – if no number, or more than one number, is detected in the variable string

logger

The class's own logger instance

print_all_columns()
read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, file_pattern=None, common_meta=None)

Method that reads list of files as instance of UngriddedData

Parameters:
  • vars_to_retrieve (list or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • file_pattern (str, optional) – string pattern for file search (cf get_file_list())

  • common_meta (dict, optional) – dictionary that contains additional metadata shared for this network (assigned to each metadata block of the UngriddedData object that is returned)

Returns:

data object

Return type:

UngriddedData

read_file(filename, vars_to_retrieve=None, vars_as_series=False)[source]

Read Aeronet SDA V3 file and return it in a dictionary

Parameters:
  • filename (str) – absolute path to filename to read

  • vars_to_retrieve (list, optional) – list of str with variable names to read. If None, use DEFAULT_VARS

  • vars_as_series (bool) – if True, the data columns of all variables in the result dictionary are converted into pandas Series objects

Returns:

dict-like object containing results

Return type:

StationData

read_first_file(**kwargs)

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters:

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns:

dictionary or similar containing loaded results from first file

Return type:

dict-like

read_station(station_id_filename, **kwargs)

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced filelist as input, in order to read all files from this station into data object.

Parameters:
  • station_id_filename (str) – name of station (MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns:

loaded data

Return type:

UngriddedData

Raises:

IOError – if no files can be found for this station ID

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)

Remove outliers from data

Parameters:
  • data (dict-like) – data object containing data vectors for the variables from which outliers are to be removed (cf. input param vars_to_retrieve)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying a variable name and the corresponding min / max interval (list or tuple) that defines the valid range for that variable. For each variable that is not explicitly defined here, the default minimum / maximum values are used (accessed via pyaerocom.const.VARS[var_name])

var_supported(var_name)

Check if input variable is supported

Parameters:

var_name (str) – AeroCom variable name or alias

Raises:

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns:

True, if variable is supported by this interface, else False

Return type:

bool

property verbosity_level

Current level of verbosity of logger

AERONET Inversion (V3)

class pyaerocom.io.read_aeronet_invv3.ReadAeronetInvV3(data_id=None, data_dir=None)[source]

Bases: ReadAeronetBase

Interface for reading Aeronet inversion V3 Level 1.5 and 2.0 data

Parameters:

data_id – string specifying either of the supported datasets that are defined in SUPPORTED_DATASETS

ALT_VAR_NAMES_FILE = {}

dictionary specifying alternative column names for variables defined in VAR_NAMES_FILE

Type:

OPTIONAL

AUX_FUNS = {'abs550aer': <function calc_abs550aer>, 'od550aer': <function calc_od550aer>}

Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)

AUX_REQUIRES = {'abs550aer': ['abs440aer', 'angabs4487aer'], 'od550aer': ['od440aer', 'ang4487aer']}

dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)
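The cooperation between AUX_REQUIRES and AUX_FUNS can be sketched as follows: each auxiliary variable names its input variables and the function that computes it. The conversion formula below is a simplified stand-in (a plain Angstrom extrapolation), not pyaerocom's actual calc_od550aer:

```python
# Illustrative sketch of how AUX_REQUIRES / AUX_FUNS cooperate; the formula
# is a simplified stand-in, not pyaerocom's actual implementation.
def calc_od550aer(data):
    # hypothetical Angstrom extrapolation from 440 nm to 550 nm
    return data["od440aer"] * (550 / 440) ** -data["ang4487aer"]

AUX_REQUIRES = {"od550aer": ["od440aer", "ang4487aer"]}
AUX_FUNS = {"od550aer": calc_od550aer}

def compute_additional_vars(data, vars_to_compute):
    for var in vars_to_compute:
        # only compute if all required input variables are available
        if all(req in data for req in AUX_REQUIRES[var]):
            data[var] = AUX_FUNS[var](data)
    return data

data = {"od440aer": 0.5, "ang4487aer": 1.2}
compute_additional_vars(data, ["od550aer"])
```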

property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

COL_DELIM = ','

column delimiter in data block of files

property DATASET_PATH

Wrapper for data_dir.

DATA_ID = 'AeronetInvV3Lev2.daily'

Name of dataset (OBS_ID)

DEFAULT_UNIT = '1'

Default data unit that is assigned to all variables that are not specified in UNITS dictionary (cf. UNITS)

DEFAULT_VARS = ['abs550aer', 'od550aer']

default variables for read method

IGNORE_META_KEYS = ['date', 'time', 'day_of_year']
INSTRUMENT_NAME = 'sun_photometer'

name of measurement instrument

META_NAMES_FILE = {'altitude': 'Elevation(m)', 'date': 'Date(dd:mm:yyyy)', 'day_of_year': 'Day_of_Year(fraction)', 'latitude': 'Latitude(Degrees)', 'longitude': 'Longitude(Degrees)', 'station_name': 'AERONET_Site', 'time': 'Time(hh:mm:ss)'}

dictionary specifying the file column names (values) for each metadata key (cf. attributes of StationData, e.g. ‘station_name’, ‘longitude’, ‘latitude’, ‘altitude’)

META_NAMES_FILE_ALT = ({},)
NAN_VAL = -999.0

value corresponding to invalid measurement

PROVIDES_VARIABLES = ['abs440aer', 'angabs4487aer', 'od440aer', 'ang4487aer']

List of variables that are provided by this dataset (will be extended by auxiliary variables on class init, for details see __init__ method of base class ReadUngriddedBase)

property REVISION_FILE

Name of revision file located in data directory

SUPPORTED_DATASETS = ['AeronetInvV3Lev2.daily', 'AeronetInvV3Lev1.5.daily']

List of all datasets supported by this interface

property TS_TYPE

Default implementation of string for temporal resolution

TS_TYPES = {'AeronetInvV3Lev1.5.daily': 'daily', 'AeronetInvV3Lev2.daily': 'daily'}

dictionary assigning temporal resolution flags for supported datasets that are provided in a defined temporal resolution

UNITS = {}

Variable specific units, only required for variables that deviate from DEFAULT_UNIT (is irrelevant for all variables that are so far supported by the implemented Aeronet products, i.e. all variables are dimensionless as specified in DEFAULT_UNIT)

VAR_NAMES_FILE = {'abs440aer': 'Absorption_AOD[440nm]', 'ang4487aer': 'Extinction_Angstrom_Exponent_440-870nm-Total', 'angabs4487aer': 'Absorption_Angstrom_Exponent_440-870nm', 'od440aer': 'AOD_Extinction-Total[440nm]'}

dictionary specifying the file column names (values) for each Aerocom variable (keys)

VAR_PATTERNS_FILE = {}

Mappings for identifying variables in file (may be specified in addition to explicit variable names specified in VAR_NAMES_FILE)

check_vars_to_retrieve(vars_to_retrieve)

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, as specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list of required variables and separates it into two lists: one containing all variables that can be read from the files, and a second containing all variables that are computed in this class.

Parameters:

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns:

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type:

tuple
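The read/compute split described above can be sketched with the class attributes PROVIDES_VARIABLES and AUX_REQUIRES shown earlier. This is an illustrative stand-in, not pyaerocom's actual implementation:

```python
# Illustrative sketch of the split performed by check_vars_to_retrieve();
# not the actual pyaerocom implementation.
PROVIDES_VARIABLES = ["abs440aer", "angabs4487aer", "od440aer", "ang4487aer"]
AUX_REQUIRES = {"abs550aer": ["abs440aer", "angabs4487aer"],
                "od550aer": ["od440aer", "ang4487aer"]}

def check_vars_to_retrieve(vars_to_retrieve):
    vars_to_read, vars_to_compute = [], []
    for var in vars_to_retrieve:
        if var in PROVIDES_VARIABLES:
            vars_to_read.append(var)
        elif var in AUX_REQUIRES:
            vars_to_compute.append(var)
            # the inputs of a computed variable must be read from file
            for req in AUX_REQUIRES[var]:
                if req not in vars_to_read:
                    vars_to_read.append(req)
        else:
            raise ValueError(f"Unsupported variable: {var}")
    return vars_to_read, vars_to_compute

read, compute = check_vars_to_retrieve(["od550aer"])
# read -> ['od440aer', 'ang4487aer'], compute -> ['od550aer']
```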

property col_index

Dictionary that specifies the index for each data column

Note

Implementation depends on the data. For instance, if the variable information is provided in all files (of all stations) and always in the same column, then this can be set as a fixed dictionary in the __init__ function of the implementation (see e.g. class ReadAeronetSunV2). In other cases, it may not be ensured that each variable is available in all files, or the column definition may differ between stations. In the latter case you may automate the column index retrieval by providing the header names for each meta and data column you want to extract via the attribute dictionaries META_NAMES_FILE and VAR_NAMES_FILE, and by calling _update_col_index() in your implementation of read_file() when you reach the line that contains the header information.

compute_additional_vars(data, vars_to_compute)

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters:
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).

Returns:

updated data object now containing also computed variables

Return type:

dict

property data_dir: str

Location of the dataset

Note

This can be set explicitly when instantiating the class (e.g. if data is available on the local machine). If unspecified, pyaerocom attempts to infer the data location via get_obsnetwork_dir()

Raises:

FileNotFoundError – if data directory does not exist or cannot be retrieved automatically

Type:

str

property data_id

ID of dataset

property data_revision

Revision string from file Revision.txt in the main data directory

find_in_file_list(pattern=None)

Find all files that match a certain wildcard pattern

Parameters:

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns:

list containing all files in files that match pattern

Return type:

list

Raises:

IOError – if no matches can be found
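The wildcard match described for find_in_file_list() corresponds to shell-style pattern matching, available in the standard library as fnmatch. An illustrative sketch (file names are made up; not pyaerocom's actual implementation):

```python
# Illustrative sketch of wildcard matching on a file list, as done by
# find_in_file_list(); not the actual pyaerocom implementation.
from fnmatch import fnmatch

def find_in_file_list(files, pattern=None):
    if pattern is None:
        return list(files)
    matches = [f for f in files if fnmatch(f, pattern)]
    if not matches:
        raise IOError(f"No files match pattern {pattern!r}")
    return matches

files = ["Berlin_FUB.lev30", "Cabo_Verde.lev30"]
find_in_file_list(files, "*Berlin*")  # ['Berlin_FUB.lev30']
```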

get_file_list(pattern=None)

Search all files to be read

Uses _FILEMASK (+ optional input search pattern, e.g. station_name) to find valid files for query.

Parameters:

pattern (str, optional) – file name pattern applied to search

Returns:

list containing retrieved file locations

Return type:

list

Raises:

IOError – if no files can be found

infer_wavelength_colname(colname, low=250, high=2000)

Get variable wavelength from column name

Parameters:
  • colname (str) – string of column name

  • low (int) – lower limit of accepted value range

  • high (int) – upper limit of accepted value range

Returns:

wavelength in nm, as the string representation of a float

Return type:

str

Raises:

ValueError – if no number, or more than one number, is detected in the column name
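The wavelength extraction can be sketched with a regular expression: collect all integers in the column name, keep those inside the accepted range, and require exactly one match. This is an illustrative sketch, not pyaerocom's actual implementation:

```python
# Illustrative sketch of infer_wavelength_colname();
# not the actual pyaerocom implementation.
import re

def infer_wavelength_colname(colname, low=250, high=2000):
    nums = [n for n in re.findall(r"\d+", colname) if low <= int(n) <= high]
    if len(nums) != 1:
        raise ValueError(f"Expected exactly one wavelength in {colname!r}")
    return f"{float(nums[0]):.0f}"

infer_wavelength_colname("Absorption_AOD[440nm]")  # '440'
```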

logger

Class own instance of logger class

print_all_columns()
read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, file_pattern=None, common_meta=None)

Method that reads list of files as instance of UngriddedData

Parameters:
  • vars_to_retrieve (list or similar, optional,) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • file_pattern (str, optional) – string pattern for file search (cf get_file_list())

  • common_meta (dict, optional) – dictionary that contains additional metadata shared for this network (assigned to each metadata block of the UngriddedData object that is returned)

Returns:

data object

Return type:

UngriddedData

read_file(filename, vars_to_retrieve=None, vars_as_series=False)[source]

Read Aeronet file containing results from the V3 inversion algorithm

Parameters:
  • filename (str) – absolute path to filename to read

  • vars_to_retrieve (list) – list of str with variable names to read

  • vars_as_series (bool) – if True, the data columns of all variables in the result dictionary are converted into pandas Series objects

Returns:

dict-like object containing results

Return type:

StationData

Example

>>> import pyaerocom.io as pio
>>> obj = pio.read_aeronet_invv3.ReadAeronetInvV3()
>>> files = obj.get_file_list()
>>> filedata = obj.read_file(files[0])
read_first_file(**kwargs)

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters:

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns:

dictionary or similar containing loaded results from first file

Return type:

dict-like

read_station(station_id_filename, **kwargs)

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced filelist as input, in order to read all files from this station into data object.

Parameters:
  • station_id_filename (str) – name of station (MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns:

loaded data

Return type:

UngriddedData

Raises:

IOError – if no files can be found for this station ID

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)

Remove outliers from data

Parameters:
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying variable name and corresponding min / max interval (list or tuple) that specifies the valid range for the variable. For each variable that is not explicitly defined here, the default minimum / maximum value is used (accessed via pyaerocom.const.VARS[var_name])

var_supported(var_name)

Check if input variable is supported

Parameters:

var_name (str) – AeroCom variable name or alias

Raises:

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns:

True, if variable is supported by this interface, else False

Return type:

bool

property verbosity_level

Current level of verbosity of logger

EARLINET

European Aerosol Research Lidar Network (EARLINET)

class pyaerocom.io.read_earlinet.ReadEarlinet(data_id=None, data_dir=None)[source]

Bases: ReadUngriddedBase

Interface for reading of EARLINET data

ALTITUDE_ID = 'altitude'

variable name of altitude in files

AUX_FUNS = {}

Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)

AUX_REQUIRES = {}

dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)

property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

property DATASET_PATH

Wrapper for data_dir.

DATA_ID = 'EARLINET'

Name of dataset (OBS_ID)

DEFAULT_VARS = ['bsc532aer', 'ec532aer']

default variables for read method

ERR_VARNAMES = {'ec355aer': 'error_extinction', 'ec532aer': 'error_extinction'}

Variable names of uncertainty data

EXCLUDE_CASES = ['cirrus.txt']
IGNORE_META_KEYS = []
KEEP_ADD_META = ['location', 'wavelength', 'zenith_angle', 'comment', 'shots', 'backscatter_evaluation_method']

Metadata keys from META_NAMES_FILE that are additional to standard keys defined in StationMetaData and that are supposed to be inserted into UngriddedData object created in read()

META_NAMES_FILE = {'PI': 'PI', 'altitude': 'altitude', 'comment': 'comment', 'dataset_name': 'title', 'instrument_name': 'system', 'location': 'location', 'start_utc': 'measurement_start_datetime', 'stop_utc': 'measurement_stop_datetime', 'wavelength_emis': 'wavelength', 'website': 'references'}
META_NEEDED = ['location', 'measurement_start_datetime', 'measurement_start_datetime']

metadata keys that are needed for reading (must be values in META_NAMES_FILE)

PROVIDES_VARIABLES = ['ec532aer', 'ec355aer', 'bsc532aer', 'bsc355aer', 'bsc1064aer']
READ_ERR = True

If true, the uncertainties are also read (where available, cf. ERR_VARNAMES)

property REVISION_FILE

Name of revision file located in data directory

SUPPORTED_DATASETS = ['EARLINET']

List of all datasets supported by this interface

TS_TYPE = 'hourly'
VAR_NAMES_FILE = {'bsc1064aer': 'backscatter', 'bsc355aer': 'backscatter', 'bsc532aer': 'backscatter', 'ec1064aer': 'extinction', 'ec355aer': 'extinction', 'ec532aer': 'extinction', 'zdust': 'DustLayerHeight'}

dictionary specifying the file column names (values) for each Aerocom variable (keys)

VAR_PATTERNS_FILE = {'bsc1064aer': '_Lev02_b1064', 'bsc355aer': '_Lev02_b0355', 'bsc532aer': '_Lev02_b0532', 'ec355aer': '_Lev02_e0355', 'ec532aer': '_Lev02_e0532'}
VAR_UNIT_NAMES = {'altitude': 'units', 'backscatter': ['units'], 'dustlayerheight': ['units'], 'extinction': ['units']}

Attribute access names for unit reading of variable data

check_vars_to_retrieve(vars_to_retrieve)

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, as specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list of required variables and separates it into two lists: one containing all variables that can be read from the files, and a second containing all variables that are computed in this class.

Parameters:

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns:

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type:

tuple

compute_additional_vars(data, vars_to_compute)

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters:
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).

Returns:

updated data object now containing also computed variables

Return type:

dict

property data_dir: str

Location of the dataset

Note

This can be set explicitly when instantiating the class (e.g. if data is available on the local machine). If unspecified, pyaerocom attempts to infer the data location via get_obsnetwork_dir()

Raises:

FileNotFoundError – if data directory does not exist or cannot be retrieved automatically

Type:

str

property data_id

ID of dataset

property data_revision

Revision string from file Revision.txt in the main data directory

exclude_files

files that are supposed to be excluded from reading

excluded_files

files that were actually excluded from reading

find_in_file_list(pattern=None)

Find all files that match a certain wildcard pattern

Parameters:

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns:

list containing all files in files that match pattern

Return type:

list

Raises:

IOError – if no matches can be found

get_file_list(vars_to_retrieve=None, pattern=None)[source]

Perform recursive file search for all input variables

Note

Overrides the base class implementation since, for EARLINET, the file paths are variable dependent

Parameters:
  • vars_to_retrieve (list) – list of variables to retrieve

  • pattern (str, optional) – file name pattern applied to search

Returns:

list containing file paths

Return type:

list
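The variable-dependent search described above can be sketched with the VAR_PATTERNS_FILE mapping listed for this class: each variable contributes a filename token that selects matching files. File names below are made up; this is not pyaerocom's actual implementation:

```python
# Illustrative sketch of EARLINET's variable-dependent file search
# (cf. VAR_PATTERNS_FILE); not the actual pyaerocom implementation.
VAR_PATTERNS_FILE = {"bsc532aer": "_Lev02_b0532", "ec532aer": "_Lev02_e0532"}

def get_file_list(all_files, vars_to_retrieve):
    matches = []
    for var in vars_to_retrieve:
        token = VAR_PATTERNS_FILE[var]
        # keep every file whose name carries the variable's pattern token
        matches.extend(f for f in all_files if token in f)
    return matches

files = ["ev_Lev02_b0532_site.nc", "ev_Lev02_e0532_site.nc"]
get_file_list(files, ["ec532aer"])  # ['ev_Lev02_e0532_site.nc']
```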

logger

Class own instance of logger class

read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, read_err=None, remove_outliers=True, pattern=None)[source]

Method that reads list of files as instance of UngriddedData

Parameters:
  • vars_to_retrieve (list or similar, optional,) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used

  • read_err (bool, optional) – if True, uncertainty data is also read (where available). If unspecified (None), the default is used (cf. READ_ERR)

  • pattern (str, optional) – string pattern for file search (cf. get_file_list())

Returns:

data object

Return type:

UngriddedData

read_file(filename, vars_to_retrieve=None, read_err=None, remove_outliers=True)[source]

Read EARLINET file and return it as instance of StationData

Parameters:
  • filename (str) – absolute path to filename to read

  • vars_to_retrieve (list, optional) – list of str with variable names to read. If None, use DEFAULT_VARS

  • read_err (bool) – if True, uncertainty data is also read (where available).

  • remove_outliers (bool) – if True, outliers are removed for each variable using the minimum and maximum attributes for that variable (accessed via pyaerocom.const.VARS[var_name]).

Returns:

dict-like object containing results

Return type:

StationData

read_first_file(**kwargs)

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters:

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns:

dictionary or similar containing loaded results from first file

Return type:

dict-like

read_station(station_id_filename, **kwargs)

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced filelist as input, in order to read all files from this station into data object.

Parameters:
  • station_id_filename (str) – name of station (MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns:

loaded data

Return type:

UngriddedData

Raises:

IOError – if no files can be found for this station ID

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)

Remove outliers from data

Parameters:
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying variable name and corresponding min / max interval (list or tuple) that specifies the valid range for the variable. For each variable that is not explicitly defined here, the default minimum / maximum value is used (accessed via pyaerocom.const.VARS[var_name])

var_supported(var_name)

Check if input variable is supported

Parameters:

var_name (str) – AeroCom variable name or alias

Raises:

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns:

True, if variable is supported by this interface, else False

Return type:

bool

property verbosity_level

Current level of verbosity of logger

EBAS

EBAS is a database with atmospheric measurement data hosted by the Norwegian Institute for Air Research. The AeroCom variables available from EBAS, together with associated information such as acceptable minimum and maximum values, are declared in pyaerocom/data/variables.ini.

class pyaerocom.io.read_ebas.ReadEbas(data_id=None, data_dir=None)[source]

Bases: ReadUngriddedBase

Interface for reading EBAS data

Parameters:
  • data_id – string specifying either of the supported datasets that are defined in SUPPORTED_DATASETS

  • data_dir (str) – directory where data is located (NOTE: needs to point to the directory that contains the “ebas_file_index.sqlite3” file and not to the underlying directory “data” which contains the actual NASA Ames files.)

ASSUME_AAE_SHIFT_WVL = 1.0
ASSUME_AE_SHIFT_WVL = 1
AUX_FUNS = {'ac550dryaer': <function compute_ac550dryaer>, 'ang4470dryaer': <function compute_ang4470dryaer_from_dry_scat>, 'proxydryhno3': <function compute_wetoxn_from_concprcpoxn>, 'proxydryhono': <function compute_wetoxn_from_concprcpoxn>, 'proxydryn2o5': <function compute_wetoxn_from_concprcpoxn>, 'proxydryna': <function compute_wetna_from_concprcpna>, 'proxydrynh3': <function compute_wetrdn_from_concprcprdn>, 'proxydrynh4': <function compute_wetrdn_from_concprcprdn>, 'proxydryno2': <function compute_wetoxn_from_concprcpoxn>, 'proxydryno2no2': <function compute_wetoxn_from_concprcpoxn>, 'proxydryno3c': <function compute_wetoxn_from_concprcpoxn>, 'proxydryno3f': <function compute_wetoxn_from_concprcpoxn>, 'proxydryo3': <function make_proxy_drydep_from_O3>, 'proxydryoxn': <function compute_wetoxn_from_concprcpoxn>, 'proxydryoxs': <function compute_wetoxs_from_concprcpoxs>, 'proxydrypm10': <function compute_wetoxs_from_concprcpoxs>, 'proxydrypm25': <function compute_wetoxs_from_concprcpoxs>, 'proxydryrdn': <function compute_wetrdn_from_concprcprdn>, 'proxydryso2': <function compute_wetoxs_from_concprcpoxs>, 'proxydryso4': <function compute_wetoxs_from_concprcpoxs>, 'proxydryss': <function compute_wetna_from_concprcpna>, 'proxywethno3': <function compute_wetoxn_from_concprcpoxn>, 'proxywethono': <function compute_wetoxn_from_concprcpoxn>, 'proxywetn2o5': <function compute_wetoxn_from_concprcpoxn>, 'proxywetnh3': <function compute_wetrdn_from_concprcprdn>, 'proxywetnh4': <function compute_wetrdn_from_concprcprdn>, 'proxywetno2': <function compute_wetoxn_from_concprcpoxn>, 'proxywetno2no2': <function compute_wetoxn_from_concprcpoxn>, 'proxywetno3c': <function compute_wetoxn_from_concprcpoxn>, 'proxywetno3f': <function compute_wetoxn_from_concprcpoxn>, 'proxyweto3': <function make_proxy_wetdep_from_O3>, 'proxywetoxn': <function compute_wetoxn_from_concprcpoxn>, 'proxywetoxs': <function compute_wetoxs_from_concprcpoxs>, 'proxywetpm10': <function 
compute_wetoxs_from_concprcpoxs>, 'proxywetpm25': <function compute_wetoxs_from_concprcpoxs>, 'proxywetrdn': <function compute_wetrdn_from_concprcprdn>, 'proxywetso2': <function compute_wetoxs_from_concprcpoxs>, 'proxywetso4': <function compute_wetoxs_from_concprcpoxs>, 'sc440dryaer': <function compute_sc440dryaer>, 'sc550dryaer': <function compute_sc550dryaer>, 'sc700dryaer': <function compute_sc700dryaer>, 'vmro3max': <function calc_vmro3max>, 'wetna': <function compute_wetna_from_concprcpna>, 'wetnh4': <function compute_wetnh4_from_concprcpnh4>, 'wetno3': <function compute_wetno3_from_concprcpno3>, 'wetoxn': <function compute_wetoxn_from_concprcpoxn>, 'wetoxs': <function compute_wetoxs_from_concprcpoxs>, 'wetoxsc': <function compute_wetoxs_from_concprcpoxsc>, 'wetoxst': <function compute_wetoxs_from_concprcpoxst>, 'wetrdn': <function compute_wetrdn_from_concprcprdn>, 'wetso4': <function compute_wetso4_from_concprcpso4>}

Functions supposed to be used for computation of auxiliary variables

AUX_REQUIRES = {'ac550dryaer': ['ac550aer', 'acrh'], 'ang4470dryaer': ['sc440dryaer', 'sc700dryaer'], 'proxydryhno3': ['concprcpoxn', 'pr'], 'proxydryhono': ['concprcpoxn', 'pr'], 'proxydryn2o5': ['concprcpoxn', 'pr'], 'proxydryna': ['concprcpna', 'pr'], 'proxydrynh3': ['concprcprdn', 'pr'], 'proxydrynh4': ['concprcprdn', 'pr'], 'proxydryno2': ['concprcpoxn', 'pr'], 'proxydryno2no2': ['concprcpoxn', 'pr'], 'proxydryno3c': ['concprcpoxn', 'pr'], 'proxydryno3f': ['concprcpoxn', 'pr'], 'proxydryo3': ['vmro3'], 'proxydryoxn': ['concprcpoxn', 'pr'], 'proxydryoxs': ['concprcpoxs', 'pr'], 'proxydrypm10': ['concprcpoxs', 'pr'], 'proxydrypm25': ['concprcpoxs', 'pr'], 'proxydryrdn': ['concprcprdn', 'pr'], 'proxydryso2': ['concprcpoxs', 'pr'], 'proxydryso4': ['concprcpoxs', 'pr'], 'proxydryss': ['concprcpna', 'pr'], 'proxywethno3': ['concprcpoxn', 'pr'], 'proxywethono': ['concprcpoxn', 'pr'], 'proxywetn2o5': ['concprcpoxn', 'pr'], 'proxywetnh3': ['concprcprdn', 'pr'], 'proxywetnh4': ['concprcprdn', 'pr'], 'proxywetno2': ['concprcpoxn', 'pr'], 'proxywetno2no2': ['concprcpoxn', 'pr'], 'proxywetno3c': ['concprcpoxn', 'pr'], 'proxywetno3f': ['concprcpoxn', 'pr'], 'proxyweto3': ['vmro3'], 'proxywetoxn': ['concprcpoxn', 'pr'], 'proxywetoxs': ['concprcpoxs', 'pr'], 'proxywetpm10': ['concprcpoxs', 'pr'], 'proxywetpm25': ['concprcpoxs', 'pr'], 'proxywetrdn': ['concprcprdn', 'pr'], 'proxywetso2': ['concprcpoxs', 'pr'], 'proxywetso4': ['concprcpoxs', 'pr'], 'sc440dryaer': ['sc440aer', 'scrh'], 'sc550dryaer': ['sc550aer', 'scrh'], 'sc700dryaer': ['sc700aer', 'scrh'], 'vmro3max': ['vmro3'], 'wetna': ['concprcpna', 'pr'], 'wetnh4': ['concprcpnh4', 'pr'], 'wetno3': ['concprcpno3', 'pr'], 'wetoxn': ['concprcpoxn', 'pr'], 'wetoxs': ['concprcpoxs', 'pr'], 'wetoxsc': ['concprcpoxsc', 'pr'], 'wetoxst': ['concprcpoxst', 'pr'], 'wetrdn': ['concprcprdn', 'pr'], 'wetso4': ['concprcpso4', 'pr']}

variables required for computation of auxiliary variables

AUX_USE_META = {'ac550dryaer': 'ac550aer', 'sc440dryaer': 'sc440aer', 'sc550dryaer': 'sc550aer', 'sc700dryaer': 'sc700aer'}

Meta information supposed to be migrated to computed variables

property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

CACHE_SQLITE_FILE = ['EBASMC']

For the following data IDs, the sqlite database file will be cached if const.EBAS_DB_LOCAL_CACHE is True

property DATASET_PATH

Wrapper for data_dir.

DATA_ID = 'EBASMC'

Name of dataset (OBS_ID)

property DEFAULT_VARS

list of default variables to be read

Note

Currently a wrapper for PROVIDES_VARIABLES

Type:

list

property FILE_REQUEST_OPTS

List of options for file retrieval

FILE_SUBDIR_NAME = 'data'

Name of subdirectory containing data files (relative to data_dir)

IGNORE_COLS_CONTAIN = ['fraction', 'artifact']

Ignore data columns in NASA Ames files that contain any of the listed attributes

IGNORE_FILES = ['CA0420G.20100101000000.20190125102503.filter_absorption_photometer.aerosol_absorption_coefficient.aerosol.1y.1h.CA01L_Magee_AE31_ALT.CA01L_aethalometer.lev2.nas', 'DK0022R.20180101070000.20191014000000.bulk_sampler..precip.1y.15d.DK01L_bs_22.DK01L_IC.lev2.nas', 'DK0012R.20180101070000.20191014000000.bulk_sampler..precip.1y.15d.DK01L_bs_12.DK01L_IC.lev2.nas', 'DK0008R.20180101070000.20191014000000.bulk_sampler..precip.1y.15d.DK01L_bs_08.DK01L_IC.lev2.nas', 'DK0005R.20180101070000.20191014000000.bulk_sampler..precip.1y.15d.DK01L_bs_05.DK01L_IC.lev2.nas']

list of EBAS data files that are flagged invalid and will not be imported

IGNORE_META_KEYS = []
MERGE_STATIONS = {'Birkenes': 'Birkenes II', 'Rörvik': 'Råö', 'Vavihill': 'Hallahus', 'Virolahti II': 'Virolahti III'}
property NAN_VAL

Irrelevant for implementation of EBAS I/O

property PROVIDES_VARIABLES

List of variables provided by the interface

property REVISION_FILE

Name of revision file located in data directory

SQL_DB_NAME = 'ebas_file_index.sqlite3'

Name of sqlite database file

SUPPORTED_DATASETS = ['EBASMC']

List of all datasets supported by this interface

TS_TYPE = 'undefined'
TS_TYPE_CODES = {'1d': 'daily', '1h': 'hourly', '1mn': 'minutely', '1mo': 'monthly', '1w': 'weekly', 'd': 'daily', 'h': 'hourly', 'mn': 'minutely', 'mo': 'monthly', 'w': 'weekly'}

Temporal resolution codes that (so far) can be understood by pyaerocom
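Resolving a pyaerocom ts_type from an EBAS resolution code is a plain dictionary lookup over TS_TYPE_CODES. An illustrative sketch (the "undefined" fallback mirrors the TS_TYPE attribute above; not pyaerocom's actual implementation):

```python
# Illustrative lookup of a pyaerocom ts_type from an EBAS resolution code
# (cf. TS_TYPE_CODES); not the actual pyaerocom implementation.
TS_TYPE_CODES = {"1d": "daily", "1h": "hourly", "1mn": "minutely",
                 "1mo": "monthly", "1w": "weekly", "d": "daily",
                 "h": "hourly", "mn": "minutely", "mo": "monthly",
                 "w": "weekly"}

def ts_type_from_code(code):
    # unknown codes fall back to "undefined" in this sketch (cf. TS_TYPE)
    return TS_TYPE_CODES.get(code, "undefined")

ts_type_from_code("1h")  # 'hourly'
ts_type_from_code("3mo")  # 'undefined' in this sketch
```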

VAR_READ_OPTS = {'pr': {'convert_units': False, 'freq_min_cov': 0.75}, 'prmm': {'freq_min_cov': 0.75}}

Custom reading options for individual variables. Keys need to be valid attributes of ReadEbasOptions and anything specified here (for a given variable) will be overwritten from the defaults specified in the options class.
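The override behaviour described above can be sketched as a dictionary merge: start from the defaults of the options class and overwrite with the per-variable entries. The default option values below are made up for illustration; this is not pyaerocom's actual ReadEbasOptions:

```python
# Illustrative sketch of per-variable option overrides (cf. VAR_READ_OPTS);
# DEFAULT_OPTS values are hypothetical, not pyaerocom's actual defaults.
DEFAULT_OPTS = {"convert_units": True, "freq_min_cov": 0.0}
VAR_READ_OPTS = {"pr": {"convert_units": False, "freq_min_cov": 0.75}}

def get_read_opts(var_name):
    opts = dict(DEFAULT_OPTS)                     # start from the defaults
    opts.update(VAR_READ_OPTS.get(var_name, {}))  # apply variable overrides
    return opts

get_read_opts("pr")        # {'convert_units': False, 'freq_min_cov': 0.75}
get_read_opts("od550aer")  # plain defaults
```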

property all_station_names

List of all available station names in EBAS database

check_vars_to_retrieve(vars_to_retrieve)

Separate variables that are in file from those that are computed

Some of the provided variables by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.

Parameters:

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns:

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type:

tuple

compute_additional_vars(data, vars_to_compute)[source]

Compute additional variables and put into station data

Note

Extended version of ReadUngriddedBase.compute_additional_vars()

Parameters:
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).

Returns:

updated data object now containing also computed variables

Return type:

dict

property data_dir: str

Location of the dataset

Note

This can be set explicitly when instantiating the class (e.g. if data is available on local machine). If unspecified, the data location is attempted to be inferred via get_obsnetwork_dir()

Raises:

FileNotFoundError – if data directory does not exist or cannot be retrieved automatically

Type:

str

property data_id

ID of dataset

property data_revision

Revision string from file Revision.txt in the main data directory

property file_dir

Directory containing EBAS NASA Ames files

property file_index

SQLite file mapping metadata to filenames

files_contain

This attribute is filled in get_file_list() and specifies the variables to be read from each file

find_in_file_list(pattern=None)

Find all files that match a certain wildcard pattern

Parameters:

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern='*Berlin*' to find only files that contain Berlin in their filename)

Returns:

list containing all files in files that match pattern

Return type:

list

Raises:

IOError – if no matches can be found

find_var_cols(vars_to_read, loaded_nasa_ames)[source]

Find best-match variable columns in loaded NASA Ames file

For each of the input variables, try to find one or more matches in the input NASA Ames file (loaded data object). If more than one match occurs, identify the best one (an example here is: user wants sc550aer and file contains scattering coefficients at 530 nm and 580 nm: in this case the 530 nm column will be used, cf. also accepted wavelength tolerance for reading of wavelength dependent variables wavelength_tol_nm).

Parameters:
  • vars_to_read (list) – list of variables that are supposed to be read

  • loaded_nasa_ames (EbasNasaAmesFile) – loaded data object

Returns:

dictionary specifying the best-match variable column for each of the input variables.

Return type:

dict
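The best-match selection within the wavelength tolerance can be sketched as follows (a simplified stand-in for the actual column matching; the numbers are illustrative):

```python
# Sketch of best-match column selection within a wavelength tolerance
# (cf. wavelength_tol_nm in ReadEbasOptions).
def find_best_wavelength_col(cols_nm, wanted_nm, tol_nm=50):
    """Return index of the column whose wavelength is closest to
    wanted_nm within +/- tol_nm, or None if no column qualifies."""
    best_idx, best_diff = None, tol_nm + 1
    for idx, wvl in enumerate(cols_nm):
        diff = abs(wvl - wanted_nm)
        if diff <= tol_nm and diff < best_diff:
            best_idx, best_diff = idx, diff
    return best_idx
```

With columns at 520, 530 and 580 nm and a query at 550 nm, the 530 nm column (index 1) is chosen, matching the example above.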

get_ebas_var(var_name)[source]

Get instance of EbasVarInfo for input AeroCom variable

get_file_list(vars_to_retrieve, **constraints)[source]

Get list of files for all variables to retrieve

Parameters:
  • vars_to_retrieve (list) – list of variables that are supposed to be retrieved

  • **constraints – further reading constraints deviating from default (cf. EbasSQLRequest)

Returns:

unified list of file paths each containing either of the specified variables

Return type:

list

get_read_opts(var_name)[source]

Get reading options for input variable

Parameters:

var_name (str) – name of variable

Returns:

options

Return type:

EbasReadOptions

logger

Class own instance of logger class

read(vars_to_retrieve=None, first_file=None, last_file=None, files=None, **constraints)[source]

Method that reads list of files as instance of UngriddedData

Parameters:
  • vars_to_retrieve (list or similar, optional,) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used

  • files (list) – list of files

  • **constraints – further reading constraints deviating from default (default info for each AeroCom variable can be found in `ebas_config.ini <https://github.com/metno/pyaerocom/blob/master/pyaerocom/data/ebas_config.ini>`__). For details on possible input parameters see EbasSQLRequest

Returns:

data object

Return type:

UngriddedData

read_file(filename, vars_to_retrieve=None, _vars_to_read=None, _vars_to_compute=None)[source]

Read EBAS NASA Ames file

Parameters:
  • filename (str) – absolute path to filename to read

  • vars_to_retrieve (list, optional) – list of str with variable names to read; if None (and if the alternative parameters _vars_to_read and _vars_to_compute are not both specified explicitly) then the default settings are used

Returns:

dict-like object containing results

Return type:

StationData

read_first_file(**kwargs)

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters:

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns:

dictionary or similar containing loaded results from first file

Return type:

dict-like

read_station(station_id_filename, **kwargs)

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced filelist as input, in order to read all files from this station into data object.

Parameters:
  • station_id_filename (str) – name of station (MUST be contained in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns:

loaded data

Return type:

UngriddedData

Raises:

IOError – if no files can be found for this station ID

property readopts_default

Default reading options

These are applied to all variables unless reading options are defined explicitly for individual variables.

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)

Remove outliers from data

Parameters:
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying variable name and corresponding min / max interval (list or tuple) that specifies valid range for the variable. For each variable that is not explicitely defined here, the default minimum / maximum value is used (accessed via pyaerocom.const.VARS[var_name])
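A minimal sketch of this range-based filtering (in the real method the default ranges come from pyaerocom.const.VARS[var_name]; this stand-in replaces out-of-range values with NaN):

```python
import math

# Sketch of range-based outlier removal: values outside the valid
# [min, max] interval are replaced with NaN.
def remove_outliers(values, valid_range):
    low, high = valid_range
    return [v if low <= v <= high else math.nan for v in values]
```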

property sqlite_database_file

Path to EBAS SQL database

var_info(var_name)[source]

Aerocom variable info for input var_name

var_supported(var_name)

Check if input variable is supported

Parameters:

var_name (str) – AeroCom variable name or alias

Raises:

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns:

True, if variable is supported by this interface, else False

Return type:

bool

property verbosity_level

Current level of verbosity of logger

class pyaerocom.io.read_ebas.ReadEbasOptions(**args)[source]

Bases: BrowseDict

Options for EBAS reading routine

prefer_statistics

preferred order of data statistics. Some files may contain multiple columns for one variable, where each column corresponds to one of the statistics defined here that were applied to the data. This attribute is only considered for EBAS variables that have not explicitly defined which statistics to use (and in which preferred order, if applicable). Reading preferences for all EBAS variables are specified in the file ebas_config.ini in the data directory of pyaerocom.

Type:

list

ignore_statistics

columns that have any of these statistics applied are ignored when reading variable data.

Type:

list

wavelength_tol_nm

Wavelength tolerance in nm for reading of (wavelength dependent) variables. If multiple matches occur (e.g. query -> variable at 550nm but file contains 3 columns of that variable, e.g. at 520, 530 and 540 nm), then the closest wavelength to the queried wavelength is used within the specified tolerance level.

Type:

int

shift_wavelengths

(only for wavelength dependent variables). If True, and a candidate data column is valid within the wavelength tolerance around the desired wavelength, that column will be considered for data import. Defaults to True.

Type:

bool

assume_default_ae_if_unavail

assume an Angstrom Exponent for applying wavelength shifts of data. See ReadEbas.ASSUME_AE_SHIFT_WVL and ReadEbas.ASSUME_AAE_SHIFT_WVL for AE and AAE assumptions related to scattering and absorption coeffs. Defaults to True.

Type:

bool

check_correct_MAAP_wrong_wvl

(BETA, do not use): set correct wavelength for certain absorption coeff measurements. Defaults to False.

Type:

bool

eval_flags

If True, the flag columns in the NASA Ames files are read and decoded (using EbasFlagCol.decode()), and the up to 3 flags for each measurement are evaluated as valid / invalid using the information in the flags CSV file. The evaluated flags are stored in the data files returned by the reading methods ReadEbas.read() and ReadEbas.read_file().

Type:

bool

keep_aux_vars

if True, auxiliary variables required for computed variables will be written to the UngriddedData object created in ReadEbas.read() (e.g. if sc550dryaer is requested, this requires reading of sc550aer and scrh. The latter 2 will be written to the data object if this parameter evaluates to True)

Type:

bool

convert_units

if True, variable units in EBAS files will be checked and attempted to be converted into AeroCom default unit for that variable. Defaults to True.

Type:

bool

try_convert_vmr_conc

attempt to convert vmr data to conc if user requires conc (e.g. user wants conco3 but file only contains vmro3), and vice versa.

Type:

bool

ensure_correct_freq

if True, the frequency set in NASA Ames files (provided via attr resolution_code) is checked using time differences inferred from start and stop time of each measurement. Measurements that are not in that resolution (within 5% tolerance level) will be flagged invalid.

Type:

bool

freq_from_start_stop_meas

infer frequency from start / stop intervals of individual measurements.

Type:

bool

freq_min_cov

defines minimum number of measurements that need to correspond to the detected sampling frequency in the file within the specified tolerance range. Only applies if ensure_correct_freq is True. E.g. if a file contains 100 measurements and the most common frequency (as inferred from stop-start of each measurement) is daily. Then, if freq_min_cov is 0.75, it will be ensured that at least 75 of the measurements are daily (within +/- 5% tolerance), otherwise this file is discarded. Defaults to 0.

Type:

float
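The coverage check can be sketched like this (a simplified stand-in; the most common sampling interval is taken as the file's frequency, and the 5% tolerance is taken from the description above):

```python
from collections import Counter

# Sketch of the freq_min_cov check: infer the most common sampling
# interval and require a minimum fraction of measurements to match it
# within +/- 5%.
def passes_freq_check(dt_seconds, freq_min_cov=0.75, tol=0.05):
    most_common = Counter(dt_seconds).most_common(1)[0][0]
    ok = sum(1 for dt in dt_seconds
             if abs(dt - most_common) <= tol * most_common)
    return ok / len(dt_seconds) >= freq_min_cov
```

E.g. a file with 9 daily and 1 hourly interval passes at freq_min_cov=0.75 (coverage 0.9) but fails at 0.95.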

Parameters:

**args – key / value pairs specifying any of the supported settings.

ADD_GLOB = []
FORBIDDEN_KEYS = []
IGNORE_JSON = []

Keys to be ignored when converting to json

MAXLEN_KEYS = 100.0
SETTER_CONVERT = {}
clear() None.  Remove all items from D.
property filter_dict
get(k[, d]) D[k] if k in D, else d.  d defaults to None.
import_from(other) None

Import key value pairs from other object

Other than update() this method will silently ignore input keys that are not contained in this object.

Parameters:

other (dict or BrowseDict) – other dict-like object containing content to be updated.

Raises:

ValueError – If input is of invalid type.

Return type:

None
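The difference to update() can be sketched with a plain dict subclass (illustrative, not the actual BrowseDict implementation):

```python
# Sketch: import_from() only updates keys that already exist,
# silently skipping everything else (unlike dict.update()).
class OptsDict(dict):
    def import_from(self, other):
        for key, val in other.items():
            if key in self:
                self[key] = val
```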

items() a set-like object providing a view on D's items
json_repr() dict

Convert object to serializable json dict

Returns:

content of class

Return type:

dict

keys() a set-like object providing a view on D's keys
pop(k[, d]) v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() (k, v), remove and return some (key, value) pair

as a 2-tuple; but raise KeyError if D is empty.

pretty_str()
setdefault(k[, d]) D.get(k,d), also set D[k]=d if k not in D
to_dict()
update([E, ]**F) None.  Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

values() an object providing a view on D's values

EBAS (low level)

pyaerocom module for reading and processing of EBAS NASA Ames files

For details on the file format see here

class pyaerocom.io.ebas_nasa_ames.EbasColDef(name, is_var, is_flag, unit='1')[source]

Dict-like object for EBAS NASA Ames column definitions

Note

The meta attribute name ‘unit’ can also be accessed using the CF attr name ‘units’

name

column name

Type:

str

unit

unit of data in column (if applicable)

Type:

str

is_var

True if column corresponds to variable data, False if not

Type:

bool

is_flag

True, if column corresponds to Flag column, False if not

Type:

bool

flag_col

column number of flag column that corresponds to this data column (only relevant if is_var is True)

Type:

int

Parameters:
  • name (str) – column name

  • is_var (bool) – True if column corresponds to variable data, False if not

  • is_flag (bool) – True, if column corresponds to Flag column, False if not

  • unit (str, optional) – unit of data in column (if applicable)

  • flag_col (str, optional) – name of flag column that corresponds to this data column (only relevant if is_var is True)

get_wavelength_nm()[source]

Try to access wavelength information in nm (as float)

to_dict(ignore_keys=['is_var', 'is_flag', 'flag_col', 'wavelength_nm'])[source]
class pyaerocom.io.ebas_nasa_ames.EbasFlagCol(raw_data, interpret_on_init=True)[source]

Simple helper class to decode and interpret EBAS flag columns

raw_data

raw flag column (containing X-digit floating point numbers)

Type:

ndarray

property FLAG_INFO

Detailed information about EBAS flag definitions

decode()[source]

Decode raw flag column

property decoded

Nx3 numpy array containing decoded flag columns

property valid

Boolean array specifying valid and invalid measurements

class pyaerocom.io.ebas_nasa_ames.EbasNasaAmesFile(file=None, only_head=False, replace_invalid_nan=True, convert_timestamps=True, evaluate_flags=False, quality_check=True, **kwargs)[source]

EBAS NASA Ames file interface

Class interface for reading and processing of EBAS NASA Ames file

time_stamps

array containing datetime64 objects with timestamps

Type:

ndarray

flags

dictionary containing EbasFlagCol objects for each column containing flags

Type:

dict

Parameters:
  • file (str, optional) – EBAS NASA Ames file. if valid file path, then the file is read on init (please note following options for import)

  • only_head (bool) – read only file header

  • replace_invalid_nan (bool) – replace all invalid values in the table by NaNs. The invalid values for each dependent data column are identified based on the information in the file header.

  • convert_timestamps (bool) – compute array of numpy datetime64 timestamps from numeric timestamps in data

  • evaluate_flags (bool) – if True, all flags in all flag columns are decoded from floating point representation to 3 integers, e.g. 0.111222333 -> 111 222 333

  • quality_check (bool) – perform quality check after import (for details see _quality_check())

  • **kwargs – optional input args that are passed to init of NasaAmesHeader base class

ERR_HIGH_STATS = 'percentile:84.13'
ERR_LOW_STATS = 'percentile:15.87'
TIMEUNIT2SECFAC = {'Days': 86400, 'days': 86400}
all_cols_contain(colnums, what)[source]

Check if all input columns contain input attr what

Parameters:
  • colnums (list) – list of column numbers

  • what (str) – name of attribute (e.g. matrix, statistics, tower_inlet_height)

Returns:

True if all input columns contain what attr., else False.

Return type:

bool

assign_flagcols()[source]
property base_date

Base date of data as numpy.datetime64[s]

property col_names

Column names of table

property col_names_vars

Names of all columns that are flagged as variables

property col_num

Number of columns in table

property col_nums_vars

Column index number of all variables

compute_time_stamps()[source]

Compute time stamps from first two data columns

property data

2D numpy array containing data table

property data_header
get_dt_meas(np_freq='s')[source]

Get array with time between individual measurements

This is computed based on start timestamps, e.g. dt[0] = start[1] - start[0]

Parameters:

np_freq (str) – string specifying output frequency of gap values

Returns:

array with time-differences as floating point number in specified input resolution

Return type:

ndarray

get_time_gaps_meas(np_freq='s')[source]

Get array with time gaps between individual measurements

This is computed based on start and stop timestamps, e.g. dt[0] = start[1] - stop[0]

Parameters:

np_freq (str) – string specifying output frequency of gap values

Returns:

array with time-differences as floating point number in specified input resolution

Return type:

ndarray

init_flags(evaluate=True)[source]

Decode flag columns and store info in flags

static numarr_to_datetime64(basedate, num_arr, mulfac_to_sec)[source]

Convert array of numerical timestamps into datetime64 array

Parameters:
  • basedate (datetime64) – reference date

  • num_arr (ndarray) – numerical time stamps relative to basedate

  • mulfac_to_sec (float) – multiplicative factor to convert numerical values to unit of seconds

Returns:

array containing timestamps as datetime64 objects

Return type:

ndarray
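A standard-library analogue of this conversion (the actual static method returns a numpy datetime64 array; mulfac_to_sec=86400 corresponds to the 'days' time unit, cf. TIMEUNIT2SECFAC):

```python
from datetime import datetime, timedelta

# Stdlib sketch of numarr_to_datetime64: convert numerical timestamps
# relative to a base date into datetime objects.
def numarr_to_datetimes(basedate, num_arr, mulfac_to_sec):
    return [basedate + timedelta(seconds=v * mulfac_to_sec) for v in num_arr]
```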

print_col_info()[source]

Print information about individual columns

read_file(nasa_ames_file, only_head=False, replace_invalid_nan=True, convert_timestamps=True, evaluate_flags=False, quality_check=False)[source]

Read NASA Ames file

Parameters:
  • nasa_ames_file (str) – EBAS NASA Ames file

  • only_head (bool) – read only file header

  • replace_invalid_nan (bool) – replace all invalid values in the table by NaNs. The invalid values for each dependent data column are identified based on the information in the file header.

  • convert_timestamps (bool) – compute array of numpy datetime64 timestamps from numeric timestamps in data

  • evaluate_flags (bool) – if True, all data columns get assigned their corresponding flag column, the flags in all flag columns are decoded from floating point representation to 3 integers, e.g. 0.111222333 -> 111 222 333 and if input `replace_invalid_nan==True`, then the invalid measurements in each column are replaced with NaN’s.

  • quality_check (bool) – perform quality check after import (for details see _quality_check())

read_header(nasa_ames_file, quality_check=True)[source]
property shape

Shape of data array

property time_unit

Time unit of data

class pyaerocom.io.ebas_nasa_ames.NasaAmesHeader(**kwargs)[source]

Header class for Ebas NASA Ames file

Note

Is used in EbasNasaAmesFile and should not be used directly.

CONV_FLOAT()
CONV_INT()
CONV_MULTIFLOAT()
CONV_MULTIINT()
CONV_PI()
CONV_STR()
property head_fix

Dictionary containing fixed header info (that is always available)

property meta

Meta data dictionary (specific for this file)

update(**kwargs)[source]
property var_defs

List containing column variable definitions

List index is column index in file and value is instance of EbasColDef

class pyaerocom.io.ebas_file_index.EbasFileIndex(database=None)[source]

EBAS SQLite I/O interface

Takes care of connection to database and execution of requests

property ALL_INSTRUMENTS

List of all instruments available

property ALL_MATRICES

List of all matrix values available

property ALL_STATION_CODES

List of all available station codes in database

Note

It has not been tested whether the order is the same as in ALL_STATION_NAMES, i.e. the two lists should not be linked to each other

property ALL_STATION_NAMES

List of all available station names in database

property ALL_STATISTICS_PARAMS

List of all statistical parameters available

For more info see here

property ALL_VARIABLES

List of all variables available

property database

Path to ebas_file_index.sqlite3 file

execute_request(request, file_request=False)[source]

Connect to database and retrieve data for input request

Parameters:

request (EbasSQLRequest or str) – request specifications

Returns:

list of tuples containing the retrieved results. The number of items in each tuple corresponds to the number of requested parameters (usually one, can be specified in make_query_str() using argument what)

Return type:

list
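The mechanics can be sketched with Python's built-in sqlite3 module; the table and column names below are illustrative and do not necessarily reflect the actual schema of ebas_file_index.sqlite3:

```python
import sqlite3

# Sketch: execute a file request against a small in-memory index.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE variable (comp_name TEXT, filename TEXT)")
con.executemany(
    "INSERT INTO variable VALUES (?, ?)",
    [("aerosol_light_scattering_coefficient", "NO0002R.nas"),
     ("aerosol_optical_depth", "SE0011R.nas")],
)
cur = con.execute(
    "SELECT DISTINCT filename FROM variable WHERE comp_name = ?",
    ("aerosol_optical_depth",),
)
rows = cur.fetchall()  # list of tuples, one item per requested column
```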

get_file_names(request)[source]

Get all files that match the request specifications

Parameters:

request (EbasSQLRequest or str) – request specifications

Returns:

list of file paths that match the request

Return type:

list

get_table_columns(table_name)[source]

Get all columns of a table in SQLite database file

get_table_names()[source]

Get all table names in SQLite database file

class pyaerocom.io.ebas_file_index.EbasSQLRequest(variables=None, start_date=None, stop_date=None, station_names=None, matrices=None, altitude_range=None, lon_range=None, lat_range=None, instrument_types=None, statistics=None, datalevel=None)[source]

Low level dictionary like object for EBAS sqlite queries

variables

tuple containing variable names to be extracted (e.g. ('aerosol_light_scattering_coefficient', 'aerosol_optical_depth')). If None, all available is used

Type:

tuple, optional

start_date

start date of data request (format YYYY-MM-DD). If None, all available is used

Type:

str, optional

stop_date

stop date of data request (format YYYY-MM-DD). If None, all available is used

Type:

str, optional

station_names

tuple containing station_names of request (e.g. ('Birkenes II', 'Asa')). If None, all available is used

Type:

tuple, optional

matrices

tuple containing matrices of request (e.g. ('pm1', 'pm10', 'pm25', 'aerosol')). If None, all available is used

Type:

tuple, optional

altitude_range

tuple specifying altitude range of station in m (e.g. (0.0, 500.0)). If None, all available is used

Type:

tuple, optional

lon_range

tuple specifying longitude range of station in degrees (e.g. (-20, 20)). If None, all available is used

Type:

tuple, optional

lat_range

tuple specifying latitude range of station in degrees (e.g. (50, 80)). If None, all available is used

Type:

tuple, optional

instrument_type

string specifying instrument types (e.g. ("nephelometer"))

Type:

str, optional

statistics

string specifying statistics code (e.g. ("arithmetic mean"))

Type:

tuple, optional

Parameters:

see Attributes

make_file_query_str(distinct=True, **kwargs)[source]

Wrapper for base method make_query_str()

Parameters:
  • distinct (bool) – return unique files

  • **kwargs – update request attributes (e.g. lon_range=(30, 60))

Returns:

SQL file request command for current specs

Return type:

str

make_query_str(what=None, distinct=True, **kwargs)[source]

Translate current class state into SQL query command string

Parameters:
  • what (str or tuple, optional) – what columns to retrieve (e.g. comp_name for all variables) from table specified. Defaults to None, in which case “filename” is used

  • distinct (bool) – return unique files

  • **kwargs – update request attributes (e.g. lon_range=(30, 60))

Returns:

SQL file request command for current specs

Return type:

str
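The translation from request attributes to an SQL string can be sketched as follows (table and column names are illustrative of the approach, not the actual schema):

```python
# Sketch of building an SQL query string from request attributes;
# only two constraints are shown for brevity.
def make_query_str(variables=None, station_names=None,
                   what="filename", distinct=True):
    cols = ", ".join((what,) if isinstance(what, str) else what)
    query = f"SELECT {'DISTINCT ' if distinct else ''}{cols} FROM variable"
    conds = []
    if variables is not None:
        vals = ", ".join(f"'{v}'" for v in variables)
        conds.append(f"comp_name IN ({vals})")
    if station_names is not None:
        vals = ", ".join(f"'{s}'" for s in station_names)
        conds.append(f"station_name IN ({vals})")
    if conds:
        query += " WHERE " + " AND ".join(conds)
    return query + ";"
```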

update([E, ]**F) None.  Update D from mapping/iterable E and F.[source]

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

class pyaerocom.io.ebas_varinfo.EbasVarInfo(var_name: str, init: bool = True, **kwargs)[source]

Interface for mapping between EBAS variable information and AeroCom

For more information about EBAS variable and data information see EBAS website.

var_name

AeroCom variable name

Type:

str

component

list of EBAS variable / component names that are mapped to var_name

Type:

list

matrix

list of EBAS matrix values that are accepted, default is None, i.e. all available matrices are used

Type:

list, optional

instrument

list of all instruments that are accepted for this variable

Type:

list, optional

requires

for variables that are computed and not directly available in EBAS. Provided as list of (AeroCom) variables that are required to compute var_name (e.g. for sc550dryaer this would be [sc550aer,scrh]).

Type:

list, optional

scale_factor

multiplicative scale factor that is applied in order to convert EBAS variable into AeroCom variable (e.g. 1.4 for conversion of EBAS OC measurement to AeroCom concoa variable)

Type:

float, optional

Parameters:
  • var_name (str) – AeroCom variable name

  • init (bool) – if True, EBAS configuration for input variable is retrieved from data file ebas_config.ini (if possible)

  • **kwargs – additional keyword arguments (currently not used)

static PROVIDES_VARIABLES() list[str][source]

List specifying provided variables

instrument

list of instrument names (EBAS side, optional)

make_sql_request(**constraints) EbasSQLRequest[source]

Create an SQL request for the specifications in this object

Parameters:

constraints – request constraints deviating from default. For details on parameters see EbasSQLRequest

Returns:

the SQL request object that can be used to retrieve corresponding file names using instance of EbasFileIndex.get_file_names().

Return type:

EbasSQLRequest

make_sql_requests(**constraints) list[EbasSQLRequest][source]

Create a list of SQL requests for the specifications in this object

Parameters:
  • requests (dict, optional) – other SQL requests linked to this one (e.g. if this variable requires other variables for computation)

  • constraints – request constraints deviating from default. For details on parameters see EbasSQLRequest

Returns:

list of EbasSQLRequest instances for this component and potential required components.

Return type:

list

matrix

list of matrix names (EBAS side, optional)

static open_config()[source]

Open ebas_config.ini file with ConfigParser

Return type:

ConfigParser

parse_from_ini(var_name: str, conf_reader: ConfigParser | None = None)[source]

Parse EBAS info for input AeroCom variable (works also for aliases)

Parameters:
  • var_name (str) – AeroCom variable name

  • conf_reader (ConfigParser) – open config parser object

Raises:

VarNotAvailableError – if variable is not supported

Returns:

True, if default could be loaded, False if not

Return type:

bool

requires

list of additional variables required for retrieval of this variable

scale_factor

scale factor for conversion to AeroCom units

statistics

list containing variable statistics info (EBAS side, optional)

to_dict() dict[source]

Convert into dictionary

property var_name_aerocom: str

Variable name in AeroCom convention

EEA data

EEA base reader

Reader for European air pollution data from EEA AqERep files.

Interface for reading EEA AqERep files (formerly known as Airbase data).

class pyaerocom.io.read_eea_aqerep_base.ReadEEAAQEREPBase(data_id=None, data_dir=None)[source]

Class for reading EEA AQErep data

Extended class derived from low-level base class ReadUngriddedBase that contains some more functionality.

Note

Currently only single variable reading into an UngriddedData object is supported.

ALTITUDENAME = 'altitude'

name of altitude variable in metadata file

AUX_FUNS = {'concNno': NotImplementedError(), 'concNno2': NotImplementedError(), 'concSso2': NotImplementedError(), 'vmrno2': NotImplementedError(), 'vmro3': NotImplementedError(), 'vmro3max': NotImplementedError()}

Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)

AUX_REQUIRES = {'concNno': ['concno'], 'concNno2': ['concno2'], 'concSso2': ['concso2'], 'vmrno2': ['concno2'], 'vmro3': ['conco3'], 'vmro3max': ['conco3']}

dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)

CONV_FACTOR = {'concNno': 0.466788868521913, 'concNno2': 0.3044517868011477, 'concSso2': 0.50052292274792, 'vmrno2': 0.514, 'vmro3': 0.493, 'vmro3max': 0.493}
CONV_UNIT = {'concNno': 'µgN/m3', 'concNno2': 'µgN/m3', 'concSso2': 'µgS/m3', 'vmrno2': 'ppb', 'vmro3': 'ppb', 'vmro3max': 'ppb'}
property DATASET_NAME

Name of the dataset

DATA_ID = ''

Name of the dataset (OBS_ID)

DATA_PRODUCT = ''
DEFAULT_METADATA_FILE = 'metadata.csv'
property DEFAULT_VARS

List of default variables

END_TIME_NAME = 'datetimeend'

field name of the end time of the measurement (in lower case)

FILE_COL_DELIM = ','

Column delimiter

FILE_MASKS = {'concNno': '**/??_38_*_timeseries.csv*', 'concNno2': '**/??_8_*_timeseries.csv*', 'concSso2': '**/??_1_*_timeseries.csv*', 'concco': '**/??_10_*_timeseries.csv*', 'concno': '**/??_38_*_timeseries.csv*', 'concno2': '**/??_8_*_timeseries.csv*', 'conco3': '**/??_7_*_timeseries.csv*', 'concpm10': '**/??_5_*_timeseries.csv*', 'concpm25': '**/??_6001_*_timeseries.csv*', 'concso2': '**/??_1_*_timeseries.csv*', 'vmrno2': '**/??_8_*_timeseries.csv*', 'vmro3': '**/??_7_*_timeseries.csv*', 'vmro3max': '**/??_7_*_timeseries.csv*'}

file masks for the data files

INSTRUMENT_NAME = 'unknown'

there’s no general instrument name in the data

LATITUDENAME = 'latitude'

Name of latitude variable in metadata file

LONGITUDENAME = 'longitude'

name of longitude variable in metadata file

MAX_LINES_TO_READ = 8784
NAN_VAL = {}

Dictionary specifying values corresponding to invalid measurements. There is no NaN value in this dataset; an empty string is used instead.

PROVIDES_VARIABLES = ['concso2', 'conco3', 'concno2', 'concco', 'concno', 'concpm10', 'concpm25', 'vmro3', 'vmro3max', 'vmrno2', 'concSso2', 'concNno', 'concNno2']

List of variables that are provided by this dataset (will be extended by auxiliary variables on class init, for details see __init__ method of base class ReadUngriddedBase)

START_TIME_NAME = 'datetimebegin'

field name of the start time of the measurement (in lower case)

SUPPORTED_DATASETS = ['']

List of all datasets supported by this interface

TS_TYPE = 'variable'

There is no global ts_type but it is specified in the data files…

TS_TYPES_FILE = {'day': 'daily', 'hour': 'hourly'}

sampling frequencies found in data files

VAR_CODES = {'1': 'concso2', '10': 'concco', '38': 'concno', '5': 'concpm10', '6001': 'concpm25', '7': 'conco3', '8': 'concno2'}

dictionary that connects the EEA variable codes with aerocom variable names

VAR_CODE_NAME = 'airpollutantcode'

column name that holds the EEA variable code
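Mapping a value of this column to an AeroCom variable name can be sketched as follows, assuming (as an illustration) that airpollutantcode holds a vocabulary URL ending in the numeric code from VAR_CODES:

```python
# Sketch: extract the trailing numeric code and look it up in
# VAR_CODES (assumed URL format; values taken from the mapping above).
VAR_CODES = {"1": "concso2", "5": "concpm10", "6001": "concpm25",
             "7": "conco3", "8": "concno2", "10": "concco", "38": "concno"}

def var_from_code(airpollutantcode):
    code = airpollutantcode.rstrip("/").rsplit("/", 1)[-1]
    return VAR_CODES[code]
```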

VAR_NAMES_FILE = {'concNno': 'concentration', 'concNno2': 'concentration', 'concSso2': 'concentration', 'concco': 'concentration', 'concno': 'concentration', 'concno2': 'concentration', 'conco3': 'concentration', 'concpm10': 'concentration', 'concpm25': 'concentration', 'concso2': 'concentration', 'vmrno2': 'concentration', 'vmro3': 'concentration', 'vmro3max': 'concentration'}
VAR_UNITS_FILE = {'mg/m3': 'mg m-3', 'ppb': 'ppb', 'µg/m3': 'ug m-3', 'µgN/m3': 'ug N m-3', 'µgS/m3': 'ug S m-3'}

units of variables in files (needs to be defined for each variable supported)

WEBSITE = 'https://discomap.eea.europa.eu/map/fme/AirQualityExport.htm'

This class reads the European Environment Agency's Eionet data. For details please read https://www.eea.europa.eu/about-us/countries-and-eionet

get_file_list(pattern=None)[source]

Search all files to be read

Uses _FILEMASK (+ optional input search pattern, e.g. station_name) to find valid files for query.

Parameters:

pattern (str, optional) – file name pattern applied to search

Returns:

list containing retrieved file locations

Return type:

list

Raises:

IOError – if no files can be found

get_station_coords(meta_key)[source]

get a station’s coordinates

Parameters:

meta_key (str) – string with the internal station key

read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, metadatafile=None)[source]

Method that reads list of files as instance of UngriddedData

Parameters:
  • vars_to_retrieve (list or similar, optional) – List containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded.

  • files (list, optional) – List of files to be read. If None, then the file list used is the returned from get_file_list().

  • first_file (int, optional) – Index of the first file in files to be read. If None, the very first file in the list is used.

  • last_file (int, optional) – Index of the last file in files to be read. If None, the very last file in the list is used.

  • metadatafile (str, optional) – fully qualified path to the metadata file. If None, the default metadata file will be used.

Returns:

data object

Return type:

UngriddedData

read_file(filename, var_name, vars_as_series=False)[source]

Read a single EEA file

Note that there’s only a single variable in the file

Parameters:
  • filename (str) – Absolute path to filename to read.

  • var_name (str) – Name of variable in file.

  • vars_as_series (bool) – If True, the data columns of all variables in the result dictionary are converted into pandas Series objects.

Returns:

Dict-like object containing the results.

Return type:

StationData

EEA E2a product (NRT)

Near realtime EEA data.

Interface for reading EEA AqERep files (formerly known as Airbase data).

class pyaerocom.io.read_eea_aqerep.ReadEEAAQEREP(data_id=None, data_dir=None)[source]

Class for reading EEA AQErep data

Extended class derived from low-level base class ReadUngriddedBase that contains the main functionality.

DATA_ID = 'EEAAQeRep.NRT'

Name of the dataset (OBS_ID)

DATA_PRODUCT = 'E2a'
SUPPORTED_DATASETS = ['EEAAQeRep.NRT']

List of all datasets supported by this interface

EEA E1a product (QC)

Quality controlled EEA data.

Interface for reading EEA AqERep files (formerly known as Airbase data).

class pyaerocom.io.read_eea_aqerep_v2.ReadEEAAQEREP_V2(data_id=None, data_dir=None)[source]

Class for reading EEA AQErep data

Extended class derived from low-level base class ReadUngriddedBase that contains the main functionality.

DATA_ID = 'EEAAQeRep.v2'

Name of the dataset (OBS_ID)

DATA_PRODUCT = 'E1a'
SUPPORTED_DATASETS = ['EEAAQeRep.v2']

List of all datasets supported by this interface

AirNow data

Reader for air quality measurements from North America.

class pyaerocom.io.read_airnow.ReadAirNow(data_id=None, data_dir=None)[source]

Reading routine for North-American Air Now observations

BASEYEAR = 2000
DATA_ID = 'AirNow'

Name of dataset (OBS_ID)

DEFAULT_VARS = ['concbc', 'concpm10', 'concpm25', 'vmrco', 'vmrnh3', 'vmrno', 'vmrno2', 'vmrnox', 'vmrnoy', 'vmro3', 'vmrso2']

Default variables

FILE_COL_DELIM = '|'

Column delimiter

FILE_COL_NAMES = ['date', 'time', 'station_id', 'station_name', 'time_zone', 'variable', 'unit', 'value', 'institute']

Columns in data files

FILE_COL_ROW_NUMBER = 9
PROVIDES_VARIABLES = ['concbc', 'concpm10', 'concpm25', 'vmrco', 'vmrnh3', 'vmrno', 'vmrno2', 'vmrnox', 'vmrnoy', 'vmro3', 'vmrso2']

List of variables that are provided

REPLACE_STATNAME = {'&': 'and', "'": '', '.': ' ', '/': ' ', ':': ' '}
ROW_VAR_COL = 5
STATION_META_DTYPES = {'address': <class 'str'>, 'altitude': <class 'float'>, 'area_classification': <class 'str'>, 'city': <class 'str'>, 'comment': <class 'str'>, 'latitude': <class 'float'>, 'longitude': <class 'float'>, 'modificationdate': <class 'str'>, 'station_classification': <class 'str'>, 'station_id': <class 'str'>, 'station_name': <class 'str'>, 'timezone': <class 'str'>}

conversion functions for metadata dtypes

STATION_META_MAP = {'address': 'address', 'aqsid': 'station_id', 'city': 'city', 'comment': 'comment', 'elevation': 'altitude', 'environment': 'area_classification', 'lat': 'latitude', 'lon': 'longitude', 'modificationdate': 'modificationdate', 'name': 'station_name', 'populationclass': 'station_classification', 'timezone': 'timezone'}

Mapping of columns in station metadata file to pyaerocom standard

STAT_METADATA_FILENAME = 'allStations_20191224.csv'

file containing station metadata

SUPPORTED_DATASETS = ['AirNow']

List of all datasets supported by this interface

TS_TYPE = 'hourly'

Frequency of measurements

UNIT_MAP = {'C': 'celcius', 'M/S': 'm s-1', 'MILLIBAR': 'mbar', 'MM': 'mm', 'PERCENT': '%', 'PPB': 'ppb', 'PPM': 'ppm', 'UG/M3': 'ug m-3', 'WATTS/M2': 'W m-2'}

Units found in data files

VAR_MAP = {'concbc': 'BC', 'concpm10': 'PM10', 'concpm25': 'PM2.5', 'vmrco': 'CO', 'vmrnh3': 'NH3', 'vmrno': 'NO', 'vmrno2': 'NO2', 'vmrnox': 'NOX', 'vmrnoy': 'NOY', 'vmro3': 'OZONE', 'vmrso2': 'SO2'}

Variable names in data files

get_all_file_encodings(filename)[source]
get_file_bom_encoding(filename)[source]
get_file_encoding(filename)[source]
get_file_list()[source]

Retrieve list of data files

Return type:

list

read(vars_to_retrieve=None, first_file=None, last_file=None)[source]

Read variable data

Parameters:
  • vars_to_retrieve (str or list, optional) – List of variables to be retrieved. The default is None.

  • first_file (int, optional) – Index of first file to be read. The default is None, in which case index 0 in file list is used.

  • last_file (int, optional) – Index of last file to be read. The default is None, in which case last index in file list is used.

Returns:

data – loaded data object.

Return type:

UngriddedData

read_file(filename, vars_to_retrieve=None)[source]

This method returns just the raw content of a file as a dict

Parameters:
  • filename (str) – absolute path to filename to read

  • vars_to_retrieve (list, optional) – list of str with variable names to read. If None, use DEFAULT_VARS

  • vars_as_series (bool) – if True, the data columns of all variables in the result dictionary are converted into pandas Series objects

Returns:

dict-like object containing results

Return type:

StationData

Raises:

NotImplementedError

property station_metadata

Dictionary containing global metadata for each site

MarcoPolo data

Reader for air quality measurements for China from the EU-FP7 project MarcoPolo.

GHOST

GHOST (Globally Harmonised Observational Surface Treatment) project developed at the Earth Sciences Department of the Barcelona Supercomputing Center (see e.g., Petetin et al., 2020 for more information).

Further I/O features

Note

The pyaerocom.io package also includes all relevant data import and reading routines. These are introduced above, in Section reading.

AeroCom database browser

class pyaerocom.io.aerocom_browser.AerocomBrowser(*args, **kwargs)[source]

Interface for browsing all AeroCom data directories

Note

Use browse() to find directories matching a certain search pattern. The class methods find_matches() and find_data_dir() both use browse(); the only difference is that find_matches() adds the search results (a list of strings) to the instance (cf. dirs_found and ids_found).

property dirs_found

All directories that were found

find_data_dir(name_or_pattern, ignorecase=True)[source]

Find match of input name or pattern in Aerocom database

Parameters:
  • name_or_pattern (str) – name or pattern of data (can be model or obs data)

  • ignorecase (bool) – if True, upper / lower case is ignored

Returns:

data directory of match

Return type:

str

Raises:

DataSearchError – if no matches or no unique match can be found

find_matches(name_or_pattern, ignorecase=True)[source]

Search all Aerocom data directories that match input name or pattern

Parameters:
  • name_or_pattern (str) – name or pattern of data (can be model or obs data)

  • ignorecase (bool) – if True, upper / lower case is ignored

Returns:

list of names that match the pattern (corresponding paths can be accessed from this class instance)

Return type:

list

Raises:

DataSearchError – if no matches can be found

property ids_found

All data IDs that were found

File naming conventions

class pyaerocom.io.fileconventions.FileConventionRead(name='aerocom3', file_sep='_', year_pos=None, var_pos=None, ts_pos=None, vert_pos=None, data_id_pos=None, from_file=None)[source]

Class that represents a file naming convention for reading Aerocom files

name

name of this convention (e.g. “aerocom3”)

Type:

str

file_sep

filename delimiter for accessing different variables

Type:

str

year_pos

position of year information in filename after splitting using delimiter file_sep

Type:

int

var_pos

position of variable information in filename after splitting using delimiter file_sep

Type:

int

ts_pos

position of information of temporal resolution in filename after splitting using delimiter file_sep

Type:

int

vert_pos

position of information about vertical resolution of data

Type:

int

data_id_pos

position of data ID

Type:

int

AEROCOM3_VERT_INFO = {'2d': ['surface', 'column', 'modellevel', '2d'], '3d': ['modellevelatstations']}
check_validity(file)[source]

Check if filename is valid

from_dict(new_vals)[source]

Load info from dictionary

Parameters:

new_vals (dict) – dictionary containing information

Return type:

self

from_file(file)[source]

Identify convention from a file

Currently only two conventions (aerocom2 and aerocom3) exist that are identified by the delimiter used.

Parameters:

file (str) – file path or file name

Returns:

this object (with updated convention)

Return type:

FileConventionRead

Raises:

FileConventionError – if convention cannot be identified

Example

>>> from pyaerocom.io import FileConventionRead
>>> filename = 'aerocom3_CAM5.3-Oslo_AP3-CTRL2016-PD_od550aer_Column_2010_monthly.nc'
>>> print(FileConventionRead().from_file(filename))
pyaeorocom FileConventionRead
name: aerocom3
file_sep: _
year_pos: -2
var_pos: -4
ts_pos: -1
get_info_from_file(file: str) → dict[source]

Identify convention from a file

Currently only two conventions (aerocom2 and aerocom3) exist that are identified by the delimiter used.

Parameters:

file (str) – file path or file name

Returns:

dictionary containing keys year, var_name, ts_type and corresponding variables, extracted from the filename

Return type:

dict

Raises:

FileConventionError – if convention cannot be identified

Example

>>> from pyaerocom.io import FileConventionRead
>>> filename = 'aerocom3_CAM5.3-Oslo_AP3-CTRL2016-PD_od550aer_Column_2010_monthly.nc'
>>> conv = FileConventionRead("aerocom3")
>>> info = conv.get_info_from_file(filename)
>>> for item in info.items(): print(item)
('year', 2010)
('var_name', 'od550aer')
('ts_type', 'monthly')
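The positional logic behind this can be sketched in a few lines of plain Python, using the aerocom3 positions shown in the example above (year_pos=-2, var_pos=-4, ts_pos=-1). This is an illustrative reimplementation, not the actual pyaerocom code:

```python
def info_from_aerocom3_name(filename):
    """Extract year, var_name and ts_type from an aerocom3 file name."""
    stem = filename.removesuffix(".nc")
    parts = stem.split("_")        # file_sep
    return {
        "year": int(parts[-2]),    # year_pos
        "var_name": parts[-4],     # var_pos
        "ts_type": parts[-1],      # ts_pos
    }

info = info_from_aerocom3_name(
    "aerocom3_CAM5.3-Oslo_AP3-CTRL2016-PD_od550aer_Column_2010_monthly.nc"
)
```

This reproduces the key/value pairs printed in the doctest above.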
import_default(name: str)[source]

Check and load default information from database

property info_init

Empty dictionary containing init values of infos to be extracted from filenames

string_mask(data_id, var, year, ts_type, vert_which=None)[source]

Returns mask that can be used to identify files of this convention

Parameters:
  • data_id (str) – experiment ID (e.g. GISS-MATRIX.A2.CTRL)

  • var (str) – variable string ID (e.g. “od550aer”)

  • year (int) – desired year of observation (e.g. 2012)

  • ts_type (str) – string specifying temporal resolution (e.g. “daily”)

Example

>>> conf_aero2 = FileConventionRead(name="aerocom2")
>>> conf_aero3 = FileConventionRead(name="aerocom3")
>>> data_id = "GISS-MATRIX.A2.CTRL"
>>> var = "od550aer"
>>> year = 2012
>>> ts_type = "daily"
>>> match_str_aero2 = conf_aero2.string_mask(data_id, var, year, ts_type)
>>> match_str_aero3 = conf_aero3.string_mask(data_id, var, year, ts_type)

to_dict()[source]

Convert this object to ordered dictionary

Iris helpers

Module containing helper functions related to iris I/O methods. These include reading of Cubes and some methods to perform quality checks of the data, e.g.

  1. checking and correction of time definition

  2. number and length of dimension coordinates must match data array

  3. Longitude definition from -180 to 180 (corrected if defined on 0 -> 360 interval)

pyaerocom.io.iris_io.check_and_regrid_lons_cube(cube)[source]

Checks and corrects for if longitudes of grid are 0 -> 360

Note

This method checks if the maximum of the current longitudes array exceeds 180. Thus, it is not recommended to use this function after subsetting a cube, rather, it should be checked directly when the file is loaded (cf. load_input())

Parameters:

cube (iris.cube.Cube) – gridded data loaded as iris.Cube

Returns:

True, if longitudes were on 0 -> 360 and have been rolled, else False

Return type:

bool
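The rolling of longitudes can be sketched in plain Python (an illustrative reimplementation of the underlying arithmetic, not the actual iris-based code):

```python
def roll_lons_to_pm180(lons):
    """Map longitudes given on [0, 360) onto [-180, 180)."""
    return [((lon + 180.0) % 360.0) - 180.0 for lon in lons]

lons = [0.0, 90.0, 270.0, 359.0]
if max(lons) > 180:  # the check described above
    lons = roll_lons_to_pm180(lons)
```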

pyaerocom.io.iris_io.check_dim_coord_names_cube(cube)[source]
pyaerocom.io.iris_io.check_dim_coords_cube(cube)[source]

Checks and, if necessary and applicable, updates coordinate names in Cube

Parameters:

cube (iris.cube.Cube) – input cube

Returns:

updated or unchanged cube

Return type:

iris.cube.Cube

pyaerocom.io.iris_io.check_time_coord(cube, ts_type, year)[source]

Method that checks the time coordinate of an iris Cube

This method checks if the time dimension of a cube is accessible and according to the standard (i.e. fully usable). It only checks, and does not correct. For the latter, please see correct_time_coord().

Parameters:
  • cube (Cube) – cube containing data

  • ts_type (str) – pyaerocom ts_type

  • year – year of data

Returns:

True, if time dimension is ok, False if not

Return type:

bool
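To illustrate the kind of consistency check involved, the number of timestamps a complete year should contain per frequency can be computed as follows (a simplified sketch, not the actual implementation):

```python
import calendar

def n_expected_timestamps(year: int, ts_type: str) -> int:
    """Number of samples one full year should contain for a given ts_type."""
    ndays = 366 if calendar.isleap(year) else 365
    if ts_type == "monthly":
        return 12
    if ts_type == "daily":
        return ndays
    if ts_type == "hourly":
        return ndays * 24
    raise ValueError(f"unsupported ts_type: {ts_type}")
```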

pyaerocom.io.iris_io.concatenate_iris_cubes(cubes, error_on_mismatch=True)[source]

Concatenate list of iris.Cube instances cubes into single Cube

Helper method for concatenating list of cubes

This method is not supposed to be called directly but rather concatenate_cubes() (which ALWAYS returns instance of Cube or raises Exception) or concatenate_possible_cubes() (which ALWAYS returns instance of CubeList or raises Exception)

Parameters:
  • cubes (CubeList or list(Cubes)) – list of individual cubes

  • error_on_mismatch – boolean specifying whether an Exception is supposed to be raised or not

Returns:

result of concatenation

Return type:

Cube

Raises:

iris.exceptions.ConcatenateError – if error_on_mismatch=True and the input cubes could not all be concatenated into a single instance of the iris.Cube class.

pyaerocom.io.iris_io.correct_time_coord(cube, ts_type, year)[source]

Method that corrects the time coordinate of an iris Cube

Parameters:
  • cube (Cube) – cube containing data

  • ts_type (TsType or str) – temporal resolution of data (e.g. “hourly”, “daily”). This information is e.g. encoded in the filename of a NetCDF file and may be accessed using pyaerocom.io.FileConventionRead

  • year (int) – integer specifying start year, e.g. 2017

Returns:

the same instance of the input cube with corrected time dimension axis

Return type:

Cube

pyaerocom.io.iris_io.get_coord_names_cube(cube)[source]
pyaerocom.io.iris_io.get_dim_names_cube(cube)[source]
pyaerocom.io.iris_io.load_cube_custom(file, var_name=None, file_convention=None, perform_fmt_checks=None)[source]

Load netcdf file as iris.Cube

Parameters:
  • file (str) – netcdf file

  • var_name (str) – name of variable to read

  • quality_check (bool) – if True, then a quality check of data is performed against the information provided in the filename

  • file_convention (FileConventionRead, optional) – Aerocom file convention. If provided, then the data content (e.g. dimension definitions) is tested against definition in file name

  • perform_fmt_checks (bool) – if True, additional quality checks (and corrections) are (attempted to be) performed.

Returns:

loaded data as Cube

Return type:

iris.cube.Cube

pyaerocom.io.iris_io.load_cubes_custom(files, var_name=None, file_convention=None, perform_fmt_checks=True)[source]

Load multiple NetCDF files into CubeList

Note

This function does not apply any concatenation or merging of the variable data in the individual files, it only loads the files into individual instances of iris.cube.Cube, which can be accessed via the returned list.

Parameters:
  • files (list) – list of netcdf file paths

  • var_name (str) – name of variable to be imported from input files.

  • file_convention (FileConventionRead, optional) – Aerocom file convention. If provided, then the data content (e.g. dimension definitions) is tested against definition in file name

  • perform_fmt_checks (bool) – if True, additional quality checks (and corrections) are (attempted to be) performed.

Returns:

  • list – loaded cube instances.

  • list – list containing all files from which the input variable could be successfully loaded.

pyaerocom.io.aux_read_cubes.add_cubes(cube1, cube2)[source]

Method to add cubes from 2 gridded data objects

pyaerocom.io.aux_read_cubes.apply_rh_thresh_cubes(cube, rh_cube, rh_max=None)[source]

Method that applies a low RH filter to input cube

pyaerocom.io.aux_read_cubes.compute_angstrom_coeff_cubes(cube1, cube2, lambda1=None, lambda2=None)[source]

Compute Angstrom coefficient cube based on 2 optical density cubes

Parameters:
  • cube1 (iris.cube.Cube) – AOD at wavelength 1

  • cube2 (iris.cube.Cube) – AOD at wavelength 2

  • lambda1 (float) – wavelength 1

  • lambda2 (float) – wavelength 2

Returns:

Cube containing Angstrom exponent(s)

Return type:

Cube
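The underlying relation is the standard Ångström power law; a minimal scalar version (the actual function operates on whole cubes):

```python
import math

def angstrom_exponent(aod1, aod2, lambda1, lambda2):
    """Angstrom exponent from AOD at two wavelengths (same unit, e.g. nm)."""
    return -math.log(aod1 / aod2) / math.log(lambda1 / lambda2)

# illustrative values: fine-mode dominated aerosol typically yields alpha ~ 1
alpha = angstrom_exponent(0.5, 0.25, 440.0, 870.0)
```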

pyaerocom.io.aux_read_cubes.conc_from_vmr(cube, ts, ps)[source]
pyaerocom.io.aux_read_cubes.conc_from_vmr_STP(cube)[source]
pyaerocom.io.aux_read_cubes.divide_cubes(cube1, cube2)[source]

Method to divide 2 cubes with each other

pyaerocom.io.aux_read_cubes.lifetime_from_load_and_dep(load, wetdep, drydep)[source]

Compute lifetime from load and wet and dry deposition

pyaerocom.io.aux_read_cubes.merge_meta_cubes(cube1, cube2)[source]
pyaerocom.io.aux_read_cubes.mmr_from_vmr(cube)[source]

Convert gas volume/mole mixing ratios into mass mixing ratios.

Parameters:

cube (iris.cube.Cube) – A cube containing gas vmr data to be converted into mmr.

Returns:

cube_out – Cube containing mmr data.

Return type:

iris.cube.Cube

pyaerocom.io.aux_read_cubes.mmr_to_vmr_cube(data)[source]

Convert cube containing MMR data to VMR

Parameters:

data (iris.Cube or GriddedData) – input data object containing MMR data for a certain variable. Needs to have var_name attr. assigned and valid MMR AeroCom variable name (e.g. mmro3, mmrno2)

Raises:

AttributeError – if attr. var_name of input data does not start with mmr

Returns:

cube containing mixing ratios expressed as VMR in units of nmole mole-1

Return type:

iris.Cube
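The two conversions are simple molar-mass rescalings; a scalar sketch (the molar masses are illustrative assumptions, and unlike mmr_to_vmr_cube() this sketch does no unit conversion to nmole mole-1):

```python
M_AIR = 28.9647  # molar mass of dry air in g/mol (assumed value)

def vmr_to_mmr(vmr, molar_mass):
    """Volume mixing ratio -> mass mixing ratio."""
    return vmr * molar_mass / M_AIR

def mmr_to_vmr(mmr, molar_mass):
    """Mass mixing ratio -> volume mixing ratio."""
    return mmr * M_AIR / molar_mass

# e.g. ozone (M ~ 48 g/mol): the two conversions are exact inverses
vmr_o3 = 30e-9
mmr_o3 = vmr_to_mmr(vmr_o3, 48.0)
```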

pyaerocom.io.aux_read_cubes.multiply_cubes(cube1, cube2)[source]

Method to multiply 2 cubes

pyaerocom.io.aux_read_cubes.rho_from_ts_ps(ts, ps)[source]
pyaerocom.io.aux_read_cubes.subtract_cubes(cube1, cube2)[source]

Method to subtract 1 cube from another

Handling of cached ungridded data objects

Caching class for reading and writing of ungridded data Cache objects

class pyaerocom.io.cachehandler_ungridded.CacheHandlerUngridded(reader=None, cache_dir=None, **kwargs)[source]

Interface for reading and writing of cache files

Cache filename mask is

<data_id>_<var>.pkl

e.g. EBASMC_scatc550aer.pkl
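The mask translates directly into a file name; a one-line sketch of how the default name is presumably composed (illustrative helper, not the class method itself):

```python
def cache_file_name(data_id: str, var_name: str) -> str:
    """Compose a cache file name following the mask <data_id>_<var>.pkl."""
    return f"{data_id}_{var_name}.pkl"
```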

reader

reading class for dataset

Type:

ReadUngriddedBase

loaded_data

dictionary containing successfully loaded instances of single variable UngriddedData objects (keys are variable names)

Type:

dict

CACHE_HEAD_KEYS = ['pyaerocom_version', 'newest_file_in_read_dir', 'newest_file_date_in_read_dir', 'data_revision', 'reader_version', 'ungridded_data_version', 'cacher_version']

Cache file header keys that are checked (and required unchanged) when reading a cache file

property cache_dir

Directory where cache data objects are stored

cache_meta_info()[source]

Dictionary containing relevant caching meta-info

check_and_load(var_or_file_name, force_use_outdated=False, cache_dir=None)[source]

Check if cache file exists and load

Note

If a cache file exists for this database, but cannot be loaded or is outdated against pyaerocom updates, then it will be removed (the latter only if pyaerocom.const.RM_CACHE_OUTDATED is True).

Parameters:
  • var_or_file_name (str) – name of output filename or variable that is supposed to be stored. Default usage is to provide variable and then default_file_name() is used. Can be None if input data contains only a single variable.

  • force_use_outdated (bool) – if True, read existing cache file even if it is not up to date or pyaerocom version changed (not recommended to use)

  • cache_dir (str, optional) – output directory (default is pyaerocom cache dir accessed via cache_dir()).

Returns:

True, if cache file exists and could be successfully loaded, else False. Note: if import is successful, the corresponding data object (instance of pyaerocom.UngriddedData) can be accessed via loaded_data.

Return type:

bool

Raises:

TypeError – if cached file is not an instance of pyaerocom.UngriddedData class (which should not happen)

property data_id

Data ID of the associated dataset

default_file_name(var_name)[source]

File name of cache file

Parameters:

var_name (str) – name of variable to be cached.

Returns:

file name of pickle file

Return type:

str

file_path(var_or_file_name, cache_dir=None)[source]

File path of cache file

Parameters:
  • var_or_file_name (str) – name of output filename or variable that is supposed to be stored. Default usage is to provide variable and then default_file_name() is used. Can be None if input data contains only a single variable.

  • cache_dir (str, optional) – output directory (default is pyaerocom cache dir accessed via cache_dir()).

Returns:

output file path

Return type:

str

property reader

Instance of reader class

property src_data_dir

Data source directory of the associated dataset

Needed to check whether an existing cache file is outdated

write(data, var_or_file_name=None, cache_dir=None)[source]

Write single-variable instance of UngriddedData to cache

Parameters:
  • data (UngriddedData) – object containing the data (possibly containing multiple variables)

  • var_or_file_name (str, optional) – name of output filename or variable that is supposed to be stored. Default usage is to provide variable and then default_file_name() is used. Can be None if input data contains only a single variable.

  • cache_dir (str, optional) – output directory (default is pyaerocom cache dir accessed via cache_dir()).

Returns:

output file path

Return type:

str

pyaerocom.io.cachehandler_ungridded.list_cache_files() → Iterator[Path][source]

List all pickled data objects in cache directory

If not set differently, the cache directory is the pyaerocom default, accessible via pyaerocom.const.CACHEDIR.

I/O utils

High level I/O utility methods for pyaerocom

pyaerocom.io.utils.browse_database(model_or_obs, verbose=False)[source]

Browse Aerocom database using model or obs ID (or wildcard)

Searches database for matches and prints information about all matches found (e.g. available variables, years, etc.)

Parameters:
  • model_or_obs (str) – model or obs ID or search pattern

  • verbose (bool) – if True, verbosity level will be set to debug, else to critical

Returns:

list with data_ids of all matches

Return type:

list

Example

>>> import pyaerocom as pya
>>> pya.io.browse_database('AATSR*ORAC*v4*')
Pyaerocom ReadGridded
---------------------
Model ID: AATSR_ORAC_v4.01
Data directory: /lustre/storeA/project/aerocom/aerocom-users-database/CCI-Aerosol/CCI_AEROSOL_Phase2/AATSR_ORAC_v4.01/renamed
Available variables: ['abs550aer', 'ang4487aer', 'clt', 'landseamask', 'od550aer', 'od550dust', 'od550gt1aer', 'od550lt1aer', 'pixelcount']
Available years: [2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012]
Available time resolutions ['daily']
pyaerocom.io.utils.get_ungridded_reader(obs_id)[source]

I/O helpers

I/O helper methods of the pyaerocom package

pyaerocom.io.helpers.COUNTRY_CODE_FILE = 'country_codes.json'

country code file name will be prepended with the path later on

pyaerocom.io.helpers.add_file_to_log(filepath, err_msg)[source]

Add input file path to error logdir

The logdir location can be accessed via pyaerocom.const.LOGFILESDIR

Parameters:
  • filepath (str or Path) – path of file that has an error

  • err_msg (str) – Problem associated with input file

pyaerocom.io.helpers.aerocom_savename(data_id, var_name, vert_code, year, ts_type)[source]

Generate filename in AeroCom conventions

ToDo: complete docstring
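The pattern can be inferred from the aerocom3 example file name shown in the File naming conventions section; a hypothetical sketch of the composition (an assumption based on that example, not the documented implementation):

```python
def aerocom_savename(data_id, var_name, vert_code, year, ts_type):
    # pattern inferred from e.g.
    # aerocom3_CAM5.3-Oslo_AP3-CTRL2016-PD_od550aer_Column_2010_monthly.nc
    return f"aerocom3_{data_id}_{var_name}_{vert_code}_{year}_{ts_type}.nc"
```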

pyaerocom.io.helpers.get_all_supported_ids_ungridded()[source]

Get list of datasets that are supported by ReadUngridded

Returns:

list with supported network names

Return type:

list

pyaerocom.io.helpers.get_country_name_from_iso(iso_code: str | None = None, filename: str | Path | None = None, return_as_dict: bool = False)[source]

get the country name from the 2 digit iso country code

the underlying json file was taken from this GitHub repository: https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes

Parameters:
  • iso_code (str) – string containing the 2 character iso code of the country (e.g. no for Norway)

  • filename (str, optional) – optional string with the json file to read

  • return_as_dict (bool, optional) – flag to get the entire list of countries as a dictionary with the country codes as keys and the country names as values. Useful if you have to look up the names for many country codes.

Returns:

  • string with country name or dictionary with iso codes as keys and the country names as values

  • empty string if the country code was not found

Raises:

ValueError – if the country code is invalid
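The lookup behavior described above can be sketched with a tiny inline mapping standing in for the json file (the dictionary entries here are illustrative, and only the empty-string fallback for unknown codes is mirrored):

```python
_COUNTRY_NAMES = {"no": "Norway", "de": "Germany"}  # tiny stand-in for the json file

def country_name_from_iso(iso_code=None, return_as_dict=False):
    """Return the country name for a 2-letter iso code, or the whole mapping."""
    if return_as_dict:
        return _COUNTRY_NAMES
    return _COUNTRY_NAMES.get(iso_code.lower(), "")
```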

pyaerocom.io.helpers.get_metadata_from_filename(filename)[source]

Try access metadata information from filename

pyaerocom.io.helpers.get_obsnetwork_dir(obs_id)[source]

Returns data path for obsnetwork ID

Parameters:

obs_id (str) – ID of obsnetwork (e.g. AeronetSunV2Lev2.daily)

Returns:

corresponding directory from pyaerocom.const

Return type:

str

pyaerocom.io.helpers.get_standard_name(var_name)[source]

Get standard name of aerocom variable

Parameters:

var_name (str) – HTAP2 variable name

Returns:

corresponding standard name

Return type:

str

pyaerocom.io.helpers.read_ebas_flags_file(ebas_flags_csv)[source]

Reads file ebas_flags.csv

Parameters:

ebas_flags_csv (str) – file containing flag info

Returns:

dict with loaded flag info

Return type:

dict

pyaerocom.io.helpers.search_data_dir_aerocom(name_or_pattern, ignorecase=True)[source]

Search Aerocom data directory based on model / data ID

Metadata and vocabulary standards

class pyaerocom.metastandards.AerocomDataID(data_id=None, **meta_info)[source]

Class representing a model data ID following AeroCom PhaseIII conventions

The ID must contain 4 substrings with meta parameters:

<ModelName>-<MeteoConfigSpecifier>_<ExperimentName>-<PerturbationName>

E.g.

NorESM2-met2010_CTRL-AP3

For more information see AeroCom diagnostics spreadsheet

This interface can be used to make sure a provided data ID follows this convention, to extract the corresponding meta parameters as a dictionary (to_dict()), or to create a data_id from the corresponding meta parameters (from_dict()).
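The split implied by DELIM and SUBDELIM can be sketched as follows (a minimal reimplementation for illustration; it assumes the meteo and perturbation specifiers are the last '-'-separated segment of each part):

```python
def parse_aerocom_data_id(data_id):
    """Split <ModelName>-<Meteo>_<Experiment>-<Perturbation> into a dict."""
    keys = ["model_name", "meteo", "experiment", "perturbation"]
    values = []
    for part in data_id.split("_"):         # DELIM
        values.extend(part.rsplit("-", 1))  # SUBDELIM, split from the right
    if len(values) != len(keys):
        raise ValueError(f"invalid data_id: {data_id}")
    return dict(zip(keys, values))
```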

DELIM = '_'
KEYS = ['model_name', 'meteo', 'experiment', 'perturbation']
SUBDELIM = '-'
property data_id

str AeroCom data ID

static from_dict(meta)[source]

Create instance of AerocomDataID from input meta dictionary

Parameters:

meta (dict) – dictionary containing required keys (cf. KEYS) and corresponding values to create a data_id

Raises:

KeyError – if not all information required is provided

Return type:

AerocomDataID

static from_values(values)[source]

Create data_id from list of values

Note

The values have to be in the right order, cf. KEYS

Parameters:

values (list) – list containing values for each key in KEYS

Raises:

ValueError – if length of input list mismatches length of KEYS

Returns:

generated data_id

Return type:

str

to_dict()[source]

Convert data_id to dictionary

Returns:

dictionary with metadata information

Return type:

dict

property values
class pyaerocom.metastandards.DataSource(**info)[source]

Dict-like object defining a data source

data_id

name (or ID) of dataset (e.g. AeronetSunV3Lev2.daily)

dataset_name

name of dataset (e.g. AERONET)

data_product

data product (e.g. SDA, Inv, Sun for Aeronet)

data_version

version of data (e.g. 3)

data_level

level of data (e.g. 2)

framework

ID of framework to which data is associated (e.g. ACTRIS, GAW)

Type:

str

instr_vert_loc

Vertical location of measuring instrument(s).

Type:

str

revision_date

last revision date of dataset

ts_type_src

sampling frequency as defined in data files (use None if undefined)

stat_merge_pref_attr

optional, a metadata attribute that is available in data and that is used to order the individual stations by relevance in case overlaps occur. The associated values of this attribute need to be sortable (e.g. revision_date). This is only relevant in case overlaps occur.

Type:

str

SUPPORTED_VERT_LOCS = ['ground', 'space', 'airborne']
property data_dir

Directory containing data files

dataset_str()[source]
load_dataset_info()[source]

Wrapper for _parse_source_info_from_ini()

class pyaerocom.metastandards.StationMetaData(**info)[source]

This object defines a standard for station metadata in pyaerocom

Variable names associated with meta data can vary significantly between different conventions (e.g. conventions in modellers community vs. observations community).

Note

  • This object is a dictionary and can be easily expanded

  • In many cases, only some of the attributes are relevant

filename

name of file (may be full path or only filename)

Type:

str

station_id

Code or unique ID of station

Type:

str

station_name

name or ID of a station. Note, that the concept of a station in pyaerocom is not necessarily related to a fixed coordinate. A station can also be a satellite, ship, or a human walking around and measuring something

Type:

str

instrument_name

name (or ID) of instrument

Type:

str

PI

principal investigator

Type:

str

country

string specifying country (or country ID)

Type:

str

ts_type

frequency of data (e.g. monthly). Note the difference between ts_type_src of DataSource, which specifies the freq. of the original files.

Type:

str

latitude

latitude coordinate

Type:

float

longitude

longitude coordinate

Type:

float

altitude

altitude coordinate

Type:

float

Variables

Variable collection

class pyaerocom.variable.Variable(var_name=None, init=True, cfg=None, **kwargs)[source]

Interface that specifies default settings for a variable

See variables.ini file for an overview of currently available default variables.

Parameters:
  • var_name (str) – string ID of variable (see file variables.ini for valid IDs)

  • init (bool) – if True, input variable name is attempted to be read from config file

  • cfg (ConfigParser) – open config parser that holds the information in config file available (i.e. ConfigParser.read() has been called with config file as input)

  • **kwargs – any valid class attribute (e.g. map_vmin, map_vmax, …)

var_name

input variable name

Type:

str

var_name_aerocom

AEROCOM variable name (see e.g. AEROCOM protocol for a list of available variables)

Type:

str

is_3d

flag that indicates if variable is 3D

Type:

bool

is_dry

flag that is set based on filename that indicates if variable data corresponds to dry conditions.

Type:

bool

units

unit of variable (None if no unit)

Type:

str

default_vert_code

default vertical code to be loaded (i.e. Column, ModelLevel, Surface). Only relevant during reading and in case conflicts occur (e.g. abs550aer, 2010, Column and Surface files)

Type:

str, optional

aliases

list of alternative names for this variable

Type:

list

minimum

lower limit of allowed value range

Type:

float

upper_limit

upper limit of allowed value range

Type:

float

obs_wavelength_tol_nm

wavelength tolerance (+/-) for reading of obsdata. Default is 10, i.e. if this variable is defined at 550 nm and obsdata contains measured values of this quantity within the interval 540 - 560 nm, then these data are used

Type:

float

scat_xlim

x-range for scatter plot

Type:

float

scat_ylim

y-range for scatter plot

Type:

float

scat_loglog

scatter plot on loglog scale

Type:

bool

scat_scale_factor

scale factor for scatter plot

Type:

float

map_cmap

name of default colormap (matplotlib) of this variable.

Type:

str

map_vmin

data value corresponding to lower end of colormap in map plots of this quantity

Type:

float

map_vmax

data value corresponding to upper end of colormap in map plots of this quantity

Type:

float

map_c_under

color used for values below map_vmin in map plots of this quantity

Type:

str

map_c_over

color used for values exceeding map_vmax in map plots of this quantity

Type:

str

map_cbar_levels

levels of colorbar

Type:

list, optional

map_cbar_ticks

colorbar ticks

Type:

list, optional

ALT_NAMES = {'unit': 'units'}
VMAX_DEFAULT = inf
VMIN_DEFAULT = -inf
property aliases

Alias variable names that are frequently found or used

Returns:

list containing valid aliases

Return type:

list

get_cmap()[source]

Get cmap str for var

Return type:

str

get_cmap_bins(infer_if_missing=True)[source]

Get cmap discretisation bins

Parameters:

infer_if_missing (bool) – if True and map_cbar_levels is not defined, try to infer using _cmap_bins_from_vmin_vmax().

Raises:

AttributeError – if unavailable

Returns:

levels

Return type:

list

get_default_vert_code()[source]

Get default vertical code for variable name

property has_unit

Boolean specifying whether variable has unit

property is_3d

True if str ‘3d’ is contained in var_name_input

property is_alias
property is_at_dry_conditions

Indicate whether variable denotes dry conditions

property is_deposition

Indicates whether input variables is a deposition rate

Note

This funtion only identifies wet and dry deposition based on the variable names, there might be other variables that are deposition variables but cannot be identified by this function.

Parameters:

var_name (str) – Name of variable to be checked

Returns:

If True, then variable name denotes a deposition variables

Return type:

bool

property is_emission

Indicates whether input variables is an emission rate

Note

This funtion only identifies wet and dry deposition based on the variable names, there might be other variables that are deposition variables but cannot be identified by this function.

Parameters:

var_name (str) – Name of variable to be checked

Returns:

If True, then variable name denotes a deposition variables

Return type:

bool

property is_rate

Indicates whether variable name is a rate

Rates include e.g. deposition or emission rate variables but also precipitation

Returns:

True if variable is rate, else False

Return type:

bool

property is_wavelength_dependent

Indicates whether this variable is wavelength dependent

keys()[source]
literal_eval_list()
property long_name

Wrapper for description

property lower_limit

Old attribute name for minimum (following HTAP2 defs)

parse_from_ini(var_name=None, cfg=None)[source]

Import information about default region

Parameters:
  • var_name (str) – variable name

  • var_name_alt (str) – alternative variable name that is used if variable name is not available

  • cfg (ConfigParser) – open config parser object

Returns:

True, if default could be loaded, False if not

Return type:

bool

property plot_info

Dictionary containing plot information

plot_info_keys = ['scat_xlim', 'scat_ylim', 'scat_loglog', 'scat_scale_factor', 'map_vmin', 'map_vmax', 'map_cmap', 'map_c_under', 'map_c_over', 'map_cbar_levels', 'map_cbar_ticks']
static read_config()[source]
str2bool()
str2list()
property unit

Unit of variable (old name, deprecated)

property unit_str

string representation of unit

update(**kwargs)[source]
property upper_limit

Old attribute name for maximum (following HTAP2 defs)

property var_name_aerocom

AeroCom variable name of the input variable

property var_name_info
property var_name_input

Input variable

Variable class

class pyaerocom.variable.Variable(var_name=None, init=True, cfg=None, **kwargs)[source]

Interface that specifies default settings for a variable

See variables.ini file for an overview of currently available default variables.

Parameters:
  • var_name (str) – string ID of variable (see file variables.ini for valid IDs)

  • init (bool) – if True, input variable name is attempted to be read from config file

  • cfg (ConfigParser) – open config parser that holds the information in config file available (i.e. ConfigParser.read() has been called with config file as input)

  • **kwargs – any valid class attribute (e.g. map_vmin, map_vmax, …)

var_name

input variable name

Type:

str

var_name_aerocom

AEROCOM variable name (see e.g. AEROCOM protocol for a list of available variables)

Type:

str

is_3d

flag that indicates if variable is 3D

Type:

bool

is_dry

flag that is set based on filename that indicates if variable data corresponds to dry conditions.

Type:

bool

units

unit of variable (None if no unit)

Type:

str

default_vert_code

default vertical code to be loaded (i.e. Column, ModelLevel, Surface). Only relevant during reading and in case conflicts occur (e.g. abs550aer, 2010, Column and Surface files)

Type:

str, optional

aliases

list of alternative names for this variable

Type:

list

minimum

lower limit of allowed value range

Type:

float

maximum

upper limit of allowed value range

Type:

float

obs_wavelength_tol_nm

wavelength tolerance (+/-) for reading of obsdata. Default is 10, i.e. if this variable is defined at 550 nm and obsdata contains measured values of this quantity within the interval 540-560 nm, then these data are used

Type:

float

scat_xlim

x-range for scatter plot

Type:

float

scat_ylim

y-range for scatter plot

Type:

float

scat_loglog

scatter plot on loglog scale

Type:

bool

scat_scale_factor

scale factor for scatter plot

Type:

float

map_cmap

name of default colormap (matplotlib) of this variable.

Type:

str

map_vmin

data value corresponding to lower end of colormap in map plots of this quantity

Type:

float

map_vmax

data value corresponding to upper end of colormap in map plots of this quantity

Type:

float

map_c_under

color used for values below map_vmin in map plots of this quantity

Type:

str

map_c_over

color used for values exceeding map_vmax in map plots of this quantity

Type:

str

map_cbar_levels

levels of colorbar

Type:

list, optional

map_cbar_ticks

colorbar ticks

Type:

list, optional

ALT_NAMES = {'unit': 'units'}
VMAX_DEFAULT = inf
VMIN_DEFAULT = -inf
property aliases

Alias variable names that are frequently found or used

Returns:

list containing valid aliases

Return type:

list

get_cmap()[source]

Get cmap str for var

Return type:

str

get_cmap_bins(infer_if_missing=True)[source]

Get cmap discretisation bins

Parameters:

infer_if_missing (bool) – if True and map_cbar_levels is not defined, try to infer using _cmap_bins_from_vmin_vmax().

Raises:

AttributeError – if unavailable

Returns:

levels

Return type:

list

get_default_vert_code()[source]

Get default vertical code for variable name

property has_unit

Boolean specifying whether variable has unit

property is_3d

True if str ‘3d’ is contained in var_name_input

property is_alias
property is_at_dry_conditions

Indicate whether variable denotes dry conditions

property is_deposition

Indicates whether the input variable is a deposition rate

Note

This function only identifies wet and dry deposition based on variable names; there may be other deposition variables that cannot be identified by this function.

Parameters:

var_name (str) – Name of variable to be checked

Returns:

If True, the variable name denotes a deposition variable

Return type:

bool

property is_emission

Indicates whether the input variable is an emission rate

Note

This function only identifies emission variables based on variable names; there may be other emission variables that cannot be identified by this function.

Parameters:

var_name (str) – Name of variable to be checked

Returns:

If True, the variable name denotes an emission variable

Return type:

bool

property is_rate

Indicates whether variable name is a rate

Rates include e.g. deposition or emission rate variables but also precipitation

Returns:

True if variable is rate, else False

Return type:

bool

property is_wavelength_dependent

Indicates whether this variable is wavelength dependent

keys()[source]
literal_eval_list()
property long_name

Wrapper for description

property lower_limit

Old attribute name for minimum (following HTAP2 defs)

parse_from_ini(var_name=None, cfg=None)[source]

Import default variable information from the configuration file

Parameters:
  • var_name (str) – variable name

  • var_name_alt (str) – alternative variable name that is used if variable name is not available

  • cfg (ConfigParser) – open config parser object

Returns:

True, if default could be loaded, False if not

Return type:

bool
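The parsing step can be sketched with the standard-library configparser. The section layout and option names below are illustrative assumptions, not the actual variables.ini schema:

```python
from configparser import ConfigParser

# Hypothetical minimal variables.ini-style content, for illustration only
VARIABLES_INI = """
[od550aer]
aliases=od550csaer
units=1
minimum=0
maximum=10
"""

cfg = ConfigParser()
cfg.read_string(VARIABLES_INI)

def parse_variable(cfg, var_name):
    # Collect default settings for one variable, mimicking the idea behind
    # Variable.parse_from_ini (sketch; real defaults ship with pyaerocom)
    if not cfg.has_section(var_name):
        return None
    sec = cfg[var_name]
    return {
        "aliases": sec.get("aliases", "").split(","),
        "units": sec.get("units"),
        "minimum": sec.getfloat("minimum", fallback=float("-inf")),
        "maximum": sec.getfloat("maximum", fallback=float("inf")),
    }

info = parse_variable(cfg, "od550aer")
```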

property plot_info

Dictionary containing plot information

plot_info_keys = ['scat_xlim', 'scat_ylim', 'scat_loglog', 'scat_scale_factor', 'map_vmin', 'map_vmax', 'map_cmap', 'map_c_under', 'map_c_over', 'map_cbar_levels', 'map_cbar_ticks']
static read_config()[source]
str2bool()
str2list()
property unit

Unit of variable (old name, deprecated)

property unit_str

string representation of unit

update(**kwargs)[source]
property upper_limit

Old attribute name for maximum (following HTAP2 defs)

property var_name_aerocom

AeroCom variable name of the input variable

property var_name_info
property var_name_input

Input variable

Variable helpers

pyaerocom.variable_helpers.get_aliases(var_name: str, parser: ConfigParser | None = None)[source]

Get aliases for a certain variable

pyaerocom.variable_helpers.get_variable(var_name: str)[source]

Get a certain variable

Parameters:

var_name (str) – variable name

Return type:

Variable

pyaerocom.variable_helpers.parse_aliases_ini()[source]

Returns instance of ConfigParser to access information

pyaerocom.variable_helpers.parse_variables_ini(fpath: str | Path | None = None)[source]

Returns instance of ConfigParser to access information

Variable name info

class pyaerocom.varnameinfo.VarNameInfo(var_name)[source]

This class can be used to retrieve information from variable names

DEFAULT_VERT_CODE_PATTERNS = {'abs*': 'Column', 'ang*': 'Column', 'dry*': 'Surface', 'emi*': 'Surface', 'load*': 'Column', 'od*': 'Column', 'wet*': 'Surface'}
PATTERNS = {'od': 'od\\d+aer'}
property contains_numbers

Boolean specifying whether this variable name contains numbers

property contains_wavelength_nm

Boolean specifying whether this variable contains a certain wavelength

get_default_vert_code()[source]

Get default vertical code for variable name

in_wavelength_range(low, high)[source]

Boolean specifying whether variable is within wavelength range

Parameters:
  • low (float) – lower end of wavelength range to be tested

  • high (float) – upper end of wavelength range to be tested

Returns:

True, if this variable is wavelength dependent and if the wavelength that is inferred from the filename is within the specified input range

Return type:

bool

property is_wavelength_dependent

Boolean specifying whether this variable name is wavelength dependent

translate_to_wavelength(to_wavelength)[source]

Create new variable name at a different wavelength

Parameters:

to_wavelength (float) – new wavelength in nm

Returns:

new variable name

Return type:

VarNameInfo

property wavelength_nm

Wavelength in nm (if applicable)
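The wavelength handling above can be sketched with a simple regular expression over the variable name. This is an illustration of the idea, assuming the first number in the name denotes the wavelength, not the exact implementation:

```python
import re

def wavelength_from_var_name(var_name):
    # Infer wavelength in nm from names like "od550aer" (sketch)
    match = re.search(r"\d+", var_name)
    return float(match.group()) if match else None

def in_wavelength_range(var_name, low, high):
    # True if the name carries a wavelength inside [low, high]
    wvl = wavelength_from_var_name(var_name)
    return wvl is not None and low <= wvl <= high

def translate_to_wavelength(var_name, to_wavelength):
    # Replace the wavelength in the name, e.g. "od550aer" -> "od440aer"
    return re.sub(r"\d+", str(int(to_wavelength)), var_name, count=1)
```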

Helpers for auxiliary variables

pyaerocom.aux_var_helpers.calc_abs550aer(data)[source]

Compute absorption AOD at 550 nm using Angstrom coefficient and 500 nm absorption AOD

Parameters:

data (dict-like) – data object containing imported results

Returns:

AOD(s) at shifted wavelength

Return type:

float or ndarray

pyaerocom.aux_var_helpers.calc_ang4487aer(data)[source]

Compute Angstrom coefficient (440-870nm) from 440 and 870 nm AODs

Parameters:

data (dict-like) – data object containing imported results

Note

Requires the following two variables to be available in provided data object:

  1. od440aer

  2. od870aer

Raises:

AttributeError – if either ‘od440aer’ or ‘od870aer’ is not available in the data object

Returns:

array containing computed angstrom coefficients

Return type:

ndarray

pyaerocom.aux_var_helpers.calc_od550aer(data)[source]

Compute AOD at 550 nm using Angstrom coefficient and 500 nm AOD

Parameters:

data (dict-like) – data object containing imported results

Returns:

AOD(s) at shifted wavelength

Return type:

float or ndarray

pyaerocom.aux_var_helpers.calc_od550gt1aer(data)[source]

Compute coarse mode AOD at 550 nm using Angstrom coeff. and 500 nm AOD

Parameters:

data (dict-like) – data object containing imported results

Returns:

AOD(s) at shifted wavelength

Return type:

float or ndarray

pyaerocom.aux_var_helpers.calc_od550lt1aer(data)[source]

Compute fine mode AOD at 550 nm using Angstrom coeff. and 500 nm AOD

Parameters:

data (dict-like) – data object containing imported results

Returns:

AOD(s) at shifted wavelength

Return type:

float or ndarray

pyaerocom.aux_var_helpers.calc_od550lt1ang(data)[source]

Compute AOD at 550 nm using Angstrom coeff. and 500 nm AOD, filtered for Angstrom coeff < 1 to get AOD representative of coarse particles.

Parameters:

data (dict-like) – data object containing imported results

Returns:

AOD(s) at shifted wavelength

Return type:

float or ndarray

pyaerocom.aux_var_helpers.calc_vmro3max(data)[source]
pyaerocom.aux_var_helpers.compute_ac550dryaer(data)[source]

Compute aerosol dry absorption coefficient applying RH threshold

Cf. _compute_dry_helper()

Parameters:

data (dict) – data object containing absorption and RH data

Returns:

modified data object containing new column ac550dryaer

Return type:

dict
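The RH-threshold idea behind these dry-condition helpers can be sketched as follows. The 40 % threshold and the list-based data are assumptions for illustration; the real helper, _compute_dry_helper(), operates on station data objects:

```python
def compute_dry(values, rh_values, rh_max=40.0):
    # Keep only values measured below the RH threshold ("dry" conditions);
    # everything else is masked as missing (None)
    return [v if rh is not None and rh < rh_max else None
            for v, rh in zip(values, rh_values)]

compute_dry([1.2, 3.4], [30.0, 55.0])  # [1.2, None]
```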

pyaerocom.aux_var_helpers.compute_ang4470dryaer_from_dry_scat(data)[source]

Compute angstrom exponent between 440 and 700 nm

Parameters:

data (StationData or dict) – data containing dry scattering coefficients at 440 and 700 nm (i.e. keys sc440dryaer and sc700dryaer)

Returns:

extended data object containing angstrom exponent

Return type:

StationData or dict

pyaerocom.aux_var_helpers.compute_angstrom_coeff(od1, od2, lambda1, lambda2)[source]

Compute Angstrom coefficient based on 2 optical densities

Parameters:
  • od1 (float or ndarray) – AOD at wavelength 1

  • od2 (float or ndarray) – AOD at wavelength 2

  • lambda1 (float or ndarray) – wavelength 1

  • lambda2 (float or ndarray) – wavelength 2

Returns:

Angstrom exponent(s)

Return type:

float or ndarray
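The computation follows the standard Angstrom relation. A minimal scalar sketch, assuming positive AODs and wavelengths:

```python
from math import log

def compute_angstrom_coeff(od1, od2, lambda1, lambda2):
    # Angstrom exponent: alpha = -ln(od1 / od2) / ln(lambda1 / lambda2)
    return -log(od1 / od2) / log(lambda1 / lambda2)

# e.g. AOD halving from 440 nm to 870 nm gives alpha close to 1
alpha = compute_angstrom_coeff(0.2, 0.1, 440, 870)
```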

pyaerocom.aux_var_helpers.compute_od_from_angstromexp(to_lambda, od_ref, lambda_ref, angstrom_coeff)[source]

Compute AOD at specified wavelength

Uses Angstrom coefficient and reference AOD to compute the corresponding wavelength shifted AOD

Parameters:
  • to_lambda (float or ndarray) – wavelength for which AOD is calculated

  • od_ref (float or ndarray) – reference AOD

  • lambda_ref (float or ndarray) – wavelength corresponding to reference AOD

  • angstrom_coeff (float or ndarray) – Angstrom coefficient

Returns:

AOD(s) at shifted wavelength

Return type:

float or ndarray
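The wavelength shift uses the Angstrom power law. A minimal scalar sketch:

```python
def compute_od_from_angstromexp(to_lambda, od_ref, lambda_ref, angstrom_coeff):
    # Power-law shift: od(to_lambda) = od_ref * (lambda_ref / to_lambda) ** alpha
    return od_ref * (lambda_ref / to_lambda) ** angstrom_coeff

# Shift a 500 nm AOD of 0.1 to 550 nm with alpha = 1
od550 = compute_od_from_angstromexp(550, od_ref=0.1, lambda_ref=500, angstrom_coeff=1.0)
```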

pyaerocom.aux_var_helpers.compute_sc440dryaer(data)[source]

Compute dry scattering coefficient applying RH threshold

Cf. _compute_dry_helper()

Parameters:

data (dict) – data object containing scattering and RH data

Returns:

modified data object containing new column sc440dryaer

Return type:

dict

pyaerocom.aux_var_helpers.compute_sc550dryaer(data)[source]

Compute dry scattering coefficient applying RH threshold

Cf. _compute_dry_helper()

Parameters:

data (dict) – data object containing scattering and RH data

Returns:

modified data object containing new column sc550dryaer

Return type:

dict

pyaerocom.aux_var_helpers.compute_sc700dryaer(data)[source]

Compute dry scattering coefficient applying RH threshold

Cf. _compute_dry_helper()

Parameters:

data (dict) – data object containing scattering and RH data

Returns:

modified data object containing new column sc700dryaer

Return type:

dict

pyaerocom.aux_var_helpers.compute_wetna_from_concprcpna(data)[source]
pyaerocom.aux_var_helpers.compute_wetnh4_from_concprcpnh4(data)[source]
pyaerocom.aux_var_helpers.compute_wetno3_from_concprcpno3(data)[source]
pyaerocom.aux_var_helpers.compute_wetoxn_from_concprcpoxn(data)[source]

Compute wdep from conc in precip and precip data

Note

In addition to the returned numpy array, the input instance of StationData is modified by additional metadata and flags for the new variable. See also _compute_wdep_from_concprcp_helper().

Parameters:

data (StationData) – data object containing concprcp and precip data

Returns:

array with wet deposition values

Return type:

numpy.ndarray
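Conceptually, the compute_wet* helpers multiply the concentration in precipitation by the precipitation amount over the period. A minimal sketch; the units in the comments are illustrative assumptions, and the real helper additionally sets metadata and flags on the StationData object:

```python
def wetdep_from_concprcp(conc_in_precip, precip_amount):
    # Wet deposition per period = concentration in precipitation times
    # precipitation amount, e.g. g m-3 * m -> g m-2 (assumed units)
    return conc_in_precip * precip_amount

# 0.5 g m-3 in 2 mm of precipitation -> 0.001 g m-2
wdep = wetdep_from_concprcp(0.5, 0.002)
```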

pyaerocom.aux_var_helpers.compute_wetoxs_from_concprcpoxs(data)[source]

Compute wdep from conc in precip and precip data

Note

In addition to the returned numpy array, the input instance of StationData is modified by additional metadata and flags for the new variable. See also _compute_wdep_from_concprcp_helper().

Parameters:

data (StationData) – data object containing concprcp and precip data

Returns:

array with wet deposition values

Return type:

numpy.ndarray

pyaerocom.aux_var_helpers.compute_wetoxs_from_concprcpoxsc(data)[source]

Compute wdep from conc in precip and precip data

Note

In addition to the returned numpy array, the input instance of StationData is modified by additional metadata and flags for the new variable. See also _compute_wdep_from_concprcp_helper().

Parameters:

data (StationData) – data object containing concprcp and precip data

Returns:

array with wet deposition values

Return type:

numpy.ndarray

pyaerocom.aux_var_helpers.compute_wetoxs_from_concprcpoxst(data)[source]

Compute wdep from conc in precip and precip data

Note

In addition to the returned numpy array, the input instance of StationData is modified by additional metadata and flags for the new variable. See also _compute_wdep_from_concprcp_helper().

Parameters:

data (StationData) – data object containing concprcp and precip data

Returns:

array with wet deposition values

Return type:

numpy.ndarray

pyaerocom.aux_var_helpers.compute_wetrdn_from_concprcprdn(data)[source]

Compute wdep from conc in precip and precip data

Note

In addition to the returned numpy array, the input instance of StationData is modified by additional metadata and flags for the new variable. See also _compute_wdep_from_concprcp_helper().

Parameters:

data (StationData) – data object containing concprcp and precip data

Returns:

array with wet deposition values

Return type:

numpy.ndarray

pyaerocom.aux_var_helpers.compute_wetso4_from_concprcpso4(data)[source]
pyaerocom.aux_var_helpers.concx_to_vmrx(data, p_pascal, T_kelvin, conc_unit, mmol_var, mmol_air=None, to_unit=None)[source]

Convert mass concentration to volume mixing ratio (vmr)

Parameters:
  • data (float or ndarray) – array containing mass concentration values

  • p_pascal (float) – pressure in Pa of input data

  • T_kelvin (float) – temperature in K of input data

  • conc_unit (str) – unit of input data

  • mmol_var (float) – molar mass of variable represented by input data

  • mmol_air (float, optional) – Molar mass of air. Uses average density of dry air if None. The default is None.

  • to_unit (str, optional) – Unit to which output data is converted. If None, output unit is kg m-3. The default is None.

Returns:

input data converted to volume mixing ratio

Return type:

float or ndarray

pyaerocom.aux_var_helpers.identity(data)[source]
pyaerocom.aux_var_helpers.make_proxy_drydep_from_O3(data)[source]
pyaerocom.aux_var_helpers.make_proxy_wetdep_from_O3(data)[source]
pyaerocom.aux_var_helpers.vmrx_to_concx(data, p_pascal, T_kelvin, vmr_unit, mmol_var, mmol_air=None, to_unit=None)[source]

Convert volume mixing ratio (vmr) to mass concentration

Parameters:
  • data (float or ndarray) – array containing vmr values

  • p_pascal (float) – pressure in Pa of input data

  • T_kelvin (float) – temperature in K of input data

  • vmr_unit (str) – unit of input data

  • mmol_var (float) – molar mass of variable represented by input data

  • mmol_air (float, optional) – Molar mass of air. Uses average density of dry air if None. The default is None.

  • to_unit (str, optional) – Unit to which output data is converted. If None, output unit is kg m-3. The default is None.

Returns:

input data converted to mass concentration

Return type:

float or ndarray
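The conversion follows the ideal gas law. A scalar sketch, assuming the universal gas constant and a molar mass in g mol-1, and omitting the mmol_air and to_unit handling of the actual function:

```python
R = 8.314462618  # universal gas constant, J mol-1 K-1

def vmr_to_conc(vmr, p_pascal, T_kelvin, mmol_var):
    # Molar density of air p/(R*T) [mol m-3], times mole fraction,
    # times molar mass [g mol-1], gives mass concentration [g m-3]
    molar_density = p_pascal / (R * T_kelvin)
    return vmr * molar_density * mmol_var

# ~40 ppb ozone at 1013.25 hPa and 20 degC -> roughly 80 ug m-3
conc_o3 = vmr_to_conc(40e-9, 101325.0, 293.15, 48.0)
```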

Variable categorisations

Variable categorisation groups

These are needed in some cases to infer, e.g. units associated with variable names. Used in pyaerocom.variable.Variable to identify certain groups.

Note

The definitions below are far from complete

pyaerocom.var_groups.dep_add_vars = []

additional deposition rate variables (that do not start with wet* or dry*)

pyaerocom.var_groups.drydep_startswith = 'dry'

start string of dry deposition variables

pyaerocom.var_groups.emi_add_vars = []

additional emission rate variables (that do not start with emi*)

pyaerocom.var_groups.emi_startswith = 'emi'

start string of emission variables

pyaerocom.var_groups.totdep_startswith = 'dep'

start string of total deposition variables

pyaerocom.var_groups.wetdep_startswith = 'wet'

start string of wet deposition variables

Regions and data filtering

Region class and helper functions

This module contains functionality related to regions in pyaerocom

class pyaerocom.region.Region(region_id=None, **kwargs)[source]

Class specifying a region

region_id

ID of region (e.g. EUROPE)

Type:

str

name

name of region (e.g. Europe) used e.g. in plotting.

Type:

str

lon_range

longitude range (min, max) covered by region

Type:

list

lat_range

latitude range (min, max) covered by region

Type:

list

lon_range_plot

longitude range (min, max) used for plotting region.

Type:

list

lat_range_plot

latitude range (min, max) used for plotting region.

Type:

list

lon_ticks

list of longitude ticks used for plotting

Type:

list

lat_ticks

list of latitude ticks used for plotting

Type:

list

Parameters:
  • region_id (str) – ID of region (e.g. “EUROPE”). If the input region ID is registered as a default region in pyaerocom.region_defs, then the default information is automatically imported on class instantiation.

  • **kwargs – additional class attributes (see above for available default attributes). Note, any attr. values provided by kwargs are preferred over potentially defined default attrs. that are imported automatically.

property center_coordinate

Center coordinate of this region

contains_coordinate(lat, lon)[source]

Check if input lat/lon coordinate is contained in region

Parameters:
  • lat (float) – latitude of coordinate

  • lon (float) – longitude of coordinate

Returns:

True if coordinate is contained in this region, False if not

Return type:

bool
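The containment test is a simple rectangular bounds check on the region's latitude and longitude ranges. A sketch; the EUROPE-like ranges in the example are assumptions for illustration:

```python
def contains_coordinate(lat, lon, lat_range, lon_range):
    # Rectangular containment test on (min, max) ranges
    return (lat_range[0] <= lat <= lat_range[1]
            and lon_range[0] <= lon <= lon_range[1])

# Hypothetical EUROPE-like bounds: Vienna is inside, (0, 0) is not
inside = contains_coordinate(48.2, 16.4, lat_range=(30, 72), lon_range=(-10, 40))
```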

distance_to_center(lat, lon)[source]

Compute distance of input coordinate to center of this region

Parameters:
  • lat (float) – latitude of coordinate

  • lon (float) – longitude of coordinate

Returns:

distance in km

Return type:

float

get_mask_data()[source]
import_default(region_id)[source]

Import region definition

Parameters:

region_id (str) – ID of region

Raises:

KeyError – if no region is registered for the input ID

is_htap()[source]

Boolean specifying whether region is an HTAP binary region

mask_available()[source]
plot(ax=None)[source]

Plot this region

Draws a rectangle of the outer bounds of the region and if a binary mask is available for this region, it will be plotted as well.

Parameters:

ax (GeoAxes, optional) – axes instance to be used for plotting. Defaults to None in which case a new instance is created.

Returns:

axes instance used for plotting

Return type:

GeoAxes

plot_borders(ax, color, lw=2)[source]
plot_mask(ax, color, alpha=0.2)[source]
pyaerocom.region.all()[source]

Wrapper for get_all_default_region_ids()

pyaerocom.region.find_closest_region_coord(lat: float, lon: float, regions: dict | None = None, **kwargs) list[str][source]

Finds list of regions sorted by their center closest to input coordinate

Parameters:
  • lat (float) – latitude of coordinate

  • lon (float) – longitude of coordinate

  • regions (dict, optional) – dictionary containing instances of Region as values, which are considered. If None, then all default regions are used.

Returns:

sorted list of region IDs of identified regions

Return type:

list[str]

pyaerocom.region.get_all_default_region_ids()[source]

Get list containing IDs of all default regions

Returns:

IDs of all predefined default regions

Return type:

list

pyaerocom.region.get_all_default_regions()[source]

Get dictionary containing all default regions from region.ini file

Returns:

dictionary containing all default regions; keys are region ID’s, values are instances of Region.

Return type:

dict

pyaerocom.region.get_htap_regions()[source]

Load dictionary with HTAP regions

Returns:

keys are region ID’s, values are instances of Region

Return type:

dict

pyaerocom.region.get_old_aerocom_default_regions()[source]

Load dictionary with default AeroCom regions

Returns:

keys are region ID’s, values are instances of Region

Return type:

dict

pyaerocom.region.get_regions_coord(lat, lon, regions=None)[source]

Get the region that contains an input coordinate

Note

This does not yet include HTAP, since this causes troubles in automated AeroCom processing

Parameters:
  • lat (float) – latitude of coordinate

  • lon (float) – longitude of coordinate

  • regions (dict, optional) – dictionary containing instances of Region as values, which are considered. If None, then all default regions are used.

Returns:

list of regions that contain this coordinate

Return type:

list

Region definitions

Definitions of rectangular regions used in pyaerocom

NOTE: replaces former regions.ini in pyaerocom/data dir

pyaerocom.region_defs.ALL_REGION_NAME: Final = 'ALL'

Name of the region containing all valid data points (WORLD in old AeroCom notation)

Region filter

class pyaerocom.filter.Filter(name=None, region=None, altitude_filter=None, land_ocn=None, **kwargs)[source]

Class that can be used to filter gridded and ungridded data objects

Note

  • BETA version (currently being tested)

  • Can only filter spatially

  • Might be renamed to RegionFilter at some point in the future

ALTITUDE_FILTERS = {'noMOUNTAINS': [-1000000.0, 1000.0], 'wMOUNTAINS': None}

dictionary specifying altitude filters

LAND_OCN_FILTERS = ['LAND', 'OCN']
NO_ALTITUDE_FILTER_NAME = 'wMOUNTAINS'
NO_REGION_FILTER_NAME = 'ALL'
property alt_range

Altitude range of filter

apply(data_obj)[source]

Apply filter to data object

Parameters:

data_obj (UngriddedData, GriddedData) – input data object that is supposed to be filtered

Returns:

filtered data object

Return type:

UngriddedData, GriddedData

Raises:

IOError – if input is invalid

from_list(lst)[source]

Set filter name based on input list

property land_ocn
property lat_range

Latitude range of region

property lon_range

Longitude range of region

property name

Name of filter

String containing up to 3 substrings (delimited using dash -) containing: <region_id>-<altitude_filter>-<land_or_sea_only_info>
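Splitting such a name can be sketched in a few lines. This is an illustration of the naming scheme, with defaults filled in as in Filter; it assumes region IDs themselves contain no dashes:

```python
def parse_filter_name(name):
    # Split "<region_id>-<altitude_filter>-<land_or_sea>" into components,
    # falling back to the no-filter defaults (ALL, wMOUNTAINS)
    parts = name.split("-")
    region = parts[0] if parts[0] else "ALL"
    alt_filter = parts[1] if len(parts) > 1 else "wMOUNTAINS"
    land_ocn = parts[2] if len(parts) > 2 else None
    return region, alt_filter, land_ocn
```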

property region

Region associated with this filter (instance of Region)

property region_name

Name of region

property spl
to_dict()[source]

Convert filter to dictionary

property valid_alt_filter_codes

Valid codes for altitude filters

property valid_land_sea_filter_codes

Codes specifying land/sea filters

property valid_regions

Names of valid regions (AeroCom regions and HTAP regions)

Land / Sea masks

Helper methods for access of and working with land/sea masks. pyaerocom provides automatic access to HTAP land sea masks from this URL:

https://pyaerocom.met.no/pyaerocom-suppl

Filtering by these masks is implemented in Filter and all relevant data classes (i.e. GriddedData, UngriddedData, ColocatedData).

pyaerocom.helpers_landsea_masks.available_htap_masks()[source]

List of HTAP mask names

Returns:

Returns a list of available htap region masks.

Return type:

list

pyaerocom.helpers_landsea_masks.check_all_htap_available()[source]

Check for missing HTAP masks on local computer and download

pyaerocom.helpers_landsea_masks.download_htap_masks(regions_to_download=None)[source]

Download HTAP mask

URL: https://pyaerocom.met.no/pyaerocom-suppl.

Parameters:

regions_to_download (list) – List containing the regions to download.

Returns:

List of file paths that point to the mask files that were successfully downloaded

Return type:

list

pyaerocom.helpers_landsea_masks.get_htap_mask_files(*region_ids)[source]

Get file paths to input HTAP regions

Parameters:

*region_ids – ID’s of regions for which mask files are supposed to be retrieved

Returns:

list of file paths for each input region

Return type:

list

Raises:
  • FileNotFoundError – if default local directory for storage of HTAP masks does not exist

  • NameError – if multiple mask files are found for the same region

pyaerocom.helpers_landsea_masks.get_lat_lon_range_mask_region(mask, latdim_name=None, londim_name=None)[source]

Get outer lat/lon rectangle of a binary mask

Parameters:
  • mask (xr.DataArray) – binary mask

  • latdim_name (str, optional) – Name of latitude dimension. The default is None, in which case lat is assumed.

  • londim_name (str, optional) – Name of longitude dimension. The default is None, in which case long is assumed.

Returns:

dictionary containing lat and lon ranges of the mask.

Return type:

dict

pyaerocom.helpers_landsea_masks.get_mask_value(lat, lon, mask)[source]

Get value of mask at input lat / lon position

Parameters:
  • lat (float) – latitude of coordinate

  • lon (float) – longitude of coordinate

  • mask (xr.DataArray) – mask data

Returns:

nearest neighbour mask value at input lat/lon position

Return type:

float

pyaerocom.helpers_landsea_masks.load_region_mask_iris(*regions)[source]

Loads regional mask to iris.

Parameters:

*regions – IDs of regions that are supposed to be loaded and merged

Returns:

cube representing merged mask from input regions

Return type:

iris.cube.Cube

pyaerocom.helpers_landsea_masks.load_region_mask_xr(*regions)[source]

Load boolean mask for input regions (as xarray.DataArray)

Parameters:

*regions – regions that are supposed to be loaded and merged (just use string, no list or similar)

Returns:

boolean mask for input region(s)

Return type:

xarray.DataArray

Time and frequencies

Handling of time frequencies

General helper methods for the pyaerocom library.

class pyaerocom.tstype.TsType(val)[source]
FROM_PANDAS = {'AS': 'yearly', 'D': 'daily', 'H': 'hourly', 'MS': 'monthly', 'Q': 'season', 'T': 'minutely', 'W-MON': 'weekly'}
TOL_SECS_PERCENT = 5
TO_NUMPY = {'daily': 'D', 'hourly': 'h', 'minutely': 'm', 'monthly': 'M', 'weekly': 'W', 'yearly': 'Y'}
TO_PANDAS = {'daily': 'D', 'hourly': 'H', 'minutely': 'T', 'monthly': 'MS', 'season': 'Q', 'weekly': 'W-MON', 'yearly': 'AS'}
TO_SI = {'daily': 'd', 'hourly': 'h', 'minutely': 'min', 'monthly': 'month', 'weekly': 'week', 'yearly': 'yr'}
TSTR_TO_CF = {'daily': 'days', 'hourly': 'hours', 'monthly': 'days'}
TS_MAX_VALS = {'daily': 180, 'hourly': 168, 'minutely': 360, 'monthly': 120, 'weekly': 104}
VALID = ['minutely', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'native']
VALID_ITER = ['minutely', 'hourly', 'daily', 'weekly', 'monthly', 'yearly']
property base

Base string (without multiplication factor, cf mulfac)

property cf_base_unit

Convert ts_type str to CF convention time unit

check_match_total_seconds(total_seconds)[source]

Check if this object matches with input interval length in seconds

Parameters:

total_seconds (int or float) – interval length in units of seconds (e.g. 86400 for daily)

Return type:

bool

property datetime64_str

Convert ts_type str to datetime64 unit string

static from_total_seconds(total_seconds)[source]

Try to infer TsType based on interval length

Parameters:

total_seconds (int or float) – total number of seconds

Raises:

TemporalResolutionError – If no TsType can be inferred for input number of seconds

Return type:

TsType
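The matching idea can be sketched with a lookup of approximate period lengths and a relative tolerance. The monthly and yearly lengths below are rough assumptions, and TOL_PERCENT mirrors TsType.TOL_SECS_PERCENT:

```python
BASE_SECS = {
    "minutely": 60, "hourly": 3600, "daily": 86400, "weekly": 7 * 86400,
    "monthly": 30 * 86400, "yearly": 365 * 86400,  # month/year approximate
}
TOL_PERCENT = 5  # cf. TsType.TOL_SECS_PERCENT

def infer_ts_type(total_seconds):
    # Match an interval length against the base frequencies within tolerance
    for name, secs in BASE_SECS.items():
        if abs(total_seconds - secs) <= secs * TOL_PERCENT / 100:
            return name
    raise ValueError(f"no frequency matches {total_seconds} s")
```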

get_min_num_obs(to_ts_type: TsType, min_num_obs: dict) int[source]
property mulfac

Multiplication factor of frequency

property next_higher

Next higher resolution code

property next_lower

Next lower resolution code

This will go to the next lower base resolution; for instance, if the current frequency is 3daily, the next lower is weekly. If the current frequency exceeds the next lower base, that base is iterated instead: for 8daily, the next lower is 2weekly (and not 9daily).

property num_secs

Number of seconds in one period

Note

Be aware that for monthly frequency the number of seconds is not well defined!

property timedelta64_str

Convert ts_type str to timedelta64 unit string

to_numpy_freq()[source]
to_pandas_freq()[source]

Convert ts_type to pandas frequency string

to_si()[source]

Convert to SI conform string (e.g. used for unit conversion)

to_timedelta64()[source]

Convert frequency to timedelta64 object

Can be used, e.g. as tolerance when reindexing pandas Series

Return type:

timedelta64

property tol_secs

Tolerance in seconds for current TsType

property val

Value of frequency (string type), e.g. 3daily

static valid(val)[source]

Temporal resampling

Module containing time resampling functionality

class pyaerocom.time_resampler.TimeResampler(input_data=None)[source]

Object that can be used to resample timeseries data

It supports hierarchical resampling of xarray.DataArray objects and pandas.Series objects.

Hierarchical means that resampling constraints can be applied at each level. For instance, when resampling hourly data to monthly, a minimum number of hours per day and a minimum number of days per month may be required to create the output data.

AGGRS_UNIT_PRESERVE = ('mean', 'median', 'std', 'max', 'min')
DEFAULT_HOW = 'mean'
property fun

Resampling method (depends on input data type)

property input_data

Input data object that is to be resampled

property last_units_preserved

Boolean indicating if last resampling operation preserves units

resample(to_ts_type, input_data=None, from_ts_type=None, how=None, min_num_obs=None, **kwargs)[source]

Resample input data

Parameters:
  • to_ts_type (str or TsType) – output resolution

  • input_data (pandas.Series or xarray.DataArray) – data to be resampled

  • from_ts_type (str or TsType, optional) – current temporal resolution of data

  • how (str) – string specifying how the data is to be aggregated, default is mean

  • min_num_obs (dict or int, optional) –

    integer or nested dictionary specifying minimum number of observations required to resample from higher to lower frequency. For instance, if input_data is hourly and to_ts_type is monthly, you may specify something like:

    min_num_obs =
        {'monthly'  :   {'daily'  : 7},
         'daily'    :   {'hourly' : 6}}
    

    to require at least 6 hours per day and 7 days per month.

  • **kwargs – additional input arguments passed to resampling method

Returns:

resampled data object

Return type:

pandas.Series or xarray.DataArray
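
The hierarchical constraint logic can be sketched with plain pandas. The function below is a simplified stand-in for TimeResampler (mean aggregation assumed, constraint names hypothetical):

```python
import numpy as np
import pandas as pd

# Simplified sketch of hierarchical resampling hourly -> daily -> monthly,
# applying a minimum-count constraint at each step (not the actual
# TimeResampler implementation).

def resample_hierarchical(s, min_hours_per_day=6, min_days_per_month=7):
    # hourly -> daily: require at least min_hours_per_day valid hours
    daily_mean = s.resample("D").mean()
    daily_count = s.resample("D").count()
    daily = daily_mean.where(daily_count >= min_hours_per_day)
    # daily -> monthly: require at least min_days_per_month valid days
    monthly_mean = daily.resample("MS").mean()
    monthly_count = daily.resample("MS").count()
    return monthly_mean.where(monthly_count >= min_days_per_month)

idx = pd.date_range("2020-01-01", periods=24 * 60, freq="h")  # 60 days hourly
s = pd.Series(np.ones(len(idx)), index=idx)
monthly = resample_hierarchical(s)  # two complete months (Jan, Feb 2020)
```
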

Global constants

Definitions and helpers related to time conversion

Vertical coordinate support

Note

BETA: most functionality of this module is currently not implemented in any of the pyaerocom standard API.

Methods to convert different standards of vertical coordinates

For details see here:

http://cfconventions.org/Data/cf-conventions/cf-conventions-1.0/build/apd.html

Note

UNDER DEVELOPMENT -> NOT READY YET

class pyaerocom.vert_coords.AltitudeAccess(gridded_data)[source]
ADD_FILE_OPT = {'pres': ['temp']}
ADD_FILE_REQ = {'deltaz3d': ['ps']}

Additional variables that are required to compute altitude levels

ADD_FILE_VARS = ['z', 'z3d', 'pres', 'deltaz3d']

Additional variable names (in AEROCOM convention) that are used to search for additional files that can be used to access or compute the altitude levels at each grid point

check_altitude_access(**coord_info)[source]

Checks if altitude levels can be accessed

Parameters:

**coord_info – test coordinate specifications for extraction of 1D data object. Passed to extract_1D_subset_from_data().

Returns:

True, if altitude access is provided, else False

Return type:

bool

property coord_list

List of AeroCom coordinate names for altitude access

extract_1D_subset_from_data(**coord_info)[source]

Extract 1D subset containing only vertical coordinate dimension

Note

So far this works only for 4D or 3D data that contains latitude and longitude dimensions and a vertical coordinate, optionally also a time dimension.

The subset is extracted for a test coordinate (latitude, longitude) that may optionally be specified via coord_info.

Parameters:

**coord_info – optional test coordinate specifications for dimensions other than the vertical one. For all dimensions that are not specified explicitly, the first available coordinate in data_obj is used.

get_altitude(latitude, longitude)[source]
property has_access

Boolean specifying whether altitudes can be accessed

Note

Performs access check using check_altitude_access() if access flag is False

property reader

Instance of ReadGridded

search_aux_coords(coord_list)[source]

Search and assign coordinates provided by input list

All coordinates that are found are assigned to this object and can be accessed via self[coord_name].

Parameters:

coord_list (list) – list containing AeroCom coordinate names

Returns:

True if all coordinates can be accessed, else False

Return type:

bool

Raises:

CoordinateNameError – if one of the input coordinate names is not supported by pyaerocom. See coords.ini file of pyaerocom for available coordinates.

class pyaerocom.vert_coords.VerticalCoordinate(name=None)[source]
CONVERSION_METHODS = {'ahspc': <function atmosphere_hybrid_sigma_pressure_coordinate_to_pressure>, 'asc': <function atmosphere_sigma_coordinate_to_pressure>, 'gph': <function geopotentialheight2altitude>}
CONVERSION_REQUIRES = {'ahspc': ['a', 'b', 'ps', 'p0'], 'asc': ['sigma', 'ps', 'ptop'], 'gph': []}
FUNS_YIELD = {'ahspc': 'air_pressure', 'asc': 'air_pressure', 'gph': 'altitude'}
NAMES_NOT_SUPPORTED = ['model_level_number']
NAMES_SUPPORTED = {'air_pressure': 'pres', 'altitude': 'z', 'atmosphere_hybrid_sigma_pressure_coordinate': 'ahspc', 'atmosphere_sigma_coordinate': 'asc', 'geopotential_height': 'gph'}
REGISTERED = ['altitude', 'air_pressure', 'geopotential_height', 'atmosphere_sigma_coordinate', 'atmosphere_hybrid_sigma_pressure_coordinate', 'model_level_number']

registered names

STANDARD_NAMES = {'ahspc': 'atmosphere_hybrid_sigma_pressure_coordinate', 'asc': 'atmosphere_sigma_coordinate', 'gph': 'geopotential_height', 'pres': 'air_pressure', 'z': 'altitude'}
calc_pressure(lev, **kwargs)[source]

Compute pressure levels for input vertical coordinate

Parameters:
  • vals – level values that are supposed to be converted into pressure

  • **kwargs – additional keyword args required for computation of pressure levels (cf. CONVERSION_METHODS and corresponding inputs for method available)

Returns:

pressure levels in Pa

Return type:

ndarray

property conversion_requires

Valid argument names for fun()

property conversion_supported

Boolean specifying whether a conversion scheme is provided

property fun

Function used to convert levels into pressure

property lev_increases_with_alt

Boolean specifying whether coordinate levels increase with altitude

Return type:

bool

pressure2altitude(p, **kwargs)[source]

Convert pressure to altitude

Wrapper for method

property vars_supported_str
pyaerocom.vert_coords.atmosphere_hybrid_sigma_pressure_coordinate_to_pressure(a, b, ps, p0=None)[source]

Convert atmosphere_hybrid_sigma_pressure_coordinate to pressure in Pa

Formula:

Either

\[p(k) = a(k) \cdot p_0 + b(k) \cdot p_{surface}\]

or

\[p(k) = ap(k) + b(k) \cdot p_{surface}\]
Parameters:
  • a (ndarray) – sigma level values (a(k) in formula 1, and ap(k) in formula 2)

  • b (ndarray) – dimensionless fraction per level (must be same length as a)

  • ps (float) – surface pressure

  • p0 – reference pressure (only relevant for alternative formula 1)

Returns:

computed pressure levels in Pa (standard_name=air_pressure)

Return type:

ndarray
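
Formula 1 transcribes directly into NumPy. The sketch below is a plain transcription (the default p0, standard surface pressure, is an assumption):

```python
import numpy as np

# Direct NumPy transcription of formula 1:
# p(k) = a(k) * p0 + b(k) * ps  (pressure in Pa)

def hybrid_sigma_to_pressure(a, b, ps, p0=101325.0):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return a * p0 + b * ps

a = np.array([0.0, 0.1, 0.2])
b = np.array([1.0, 0.8, 0.5])
p = hybrid_sigma_to_pressure(a, b, ps=100000.0)
```
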

pyaerocom.vert_coords.atmosphere_sigma_coordinate_to_pressure(sigma, ps, ptop)[source]

Convert atmosphere sigma coordinate to pressure in Pa

Note

This formula only works at a single lon/lat coordinate and at a single instant in time.

Formula:

\[p(k) = p_{top} + \sigma(k) \cdot (p_{surface} - p_{top})\]
Parameters:
  • sigma (ndarray or float) – sigma coordinate (1D) array

  • ps (float) – surface pressure

  • ptop (float) – ToA pressure

Returns:

computed pressure levels in Pa (standard_name=air_pressure)

Return type:

ndarray or float
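
The sigma formula likewise transcribes directly into NumPy:

```python
import numpy as np

# Direct NumPy transcription of the sigma-coordinate formula:
# p(k) = ptop + sigma(k) * (ps - ptop)

def sigma_to_pressure(sigma, ps, ptop):
    return ptop + np.asarray(sigma, dtype=float) * (ps - ptop)

p = sigma_to_pressure(np.array([0.0, 0.5, 1.0]), ps=101325.0, ptop=100.0)
```

At sigma = 0 the result is ptop (top of atmosphere), at sigma = 1 it is the surface pressure ps.
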

pyaerocom.vert_coords.geopotentialheight2altitude(geopotential_height)[source]

Convert geopotential height in m to altitude in m

Note

This is a dummy function that returns the input, as the conversion is not yet implemented.

Parameters:

geopotential_height – input geopotential height values in m

Returns:

computed altitude levels

pyaerocom.vert_coords.is_supported(standard_name)[source]

Checks if input coordinate standard name is supported by pyaerocom

Parameters:

standard_name (str) – standard name of vertical coordinate

Returns:

True, if this coordinate is supported, else False

Return type:

bool

pyaerocom.vert_coords.pressure2altitude(p, *args, **kwargs)[source]

General formula to convert atm. pressure to altitude

Wrapper method for geonum.atmosphere.pressure2altitude()

Formula:

\[h = h_{ref} + \frac{T_{ref}}{L} \left(\exp\left[-\frac{\ln\left(\frac{p}{p_{ref}} \right)}{\beta}\right] - 1\right) \quad [m]\]

where:

  • \(h_{ref}\) is a reference altitude

  • \(T_{ref}\) is a reference temperature

  • \(L\) is the atmospheric lapse-rate (cf. L_STD_ATM, L_DRY_AIR)

  • \(p\) is the pressure (cf. pressure())

  • \(p_{ref}\) is a reference pressure

  • \(\beta\) is computed using beta_exp()

Parameters:
  • p – pressure in Pa

  • *args – additional non-keyword args passed to geonum.atmosphere.pressure2altitude()

  • **kwargs – additional keyword args passed to geonum.atmosphere.pressure2altitude()

Returns:

altitudes in m corresponding to input pressure levels in the defined atmosphere
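
A standard-atmosphere sketch of the formula above, using assumed constants (the actual implementation is geonum.atmosphere.pressure2altitude, which also supports non-default reference values):

```python
import numpy as np

# Standard-atmosphere sketch of the pressure -> altitude formula;
# constants are assumptions, not taken from geonum.

G0 = 9.80665       # gravity [m s-2]
M_AIR = 0.0289644  # molar mass of dry air [kg mol-1]
R_GAS = 8.31446    # universal gas constant [J mol-1 K-1]
L_STD = -0.0065    # standard lapse rate [K m-1]

def pressure_to_altitude(p, p_ref=101325.0, t_ref=288.15, h_ref=0.0):
    beta = G0 * M_AIR / (R_GAS * L_STD)
    # exp(-ln(p/p_ref)/beta) == (p/p_ref)**(-1/beta)
    return h_ref + (t_ref / L_STD) * ((p / p_ref) ** (-1.0 / beta) - 1.0)

h = pressure_to_altitude(101325.0 / 2)  # altitude where pressure halves (~5.5 km)
```
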

Utility functions

pyaerocom.utils.create_varinfo_table(model_ids, vars_or_var_patterns, read_data=False, sort_by_cols=['Var', 'Model'])[source]

Create an info table for model list based on variables

The method iterates over all models in model_ids and creates an instance of ReadGridded. Variable matches are searched based on the input list vars_or_var_patterns (wildcards may be used to specify a family of variables) and for each match the information below is collected. The search also includes variables that are not directly available in the model data but can be computed from other available variables, that is, all variables that are defined in ReadGridded.AUX_REQUIRES.

The output table (DataFrame) then consists of the following columns:

  • Var: variable name

  • Model: model name

  • Years: available years

  • Freq: frequency

  • Vertical: information about vertical dimension (inferred from Aerocom file name)

  • At stations: data is at stations (inferred from filename)

  • AUX vars: Auxiliary variable required to compute Var (col 1). Only relevant for variables that are computed by the interface

  • Dim: number of dimensions (only retrieved if read_data is True)

  • Dim names: names of dimension coordinates (only retrieved if read_data is True)

  • Shape: Shape of data (only retrieved if read_data is True)

  • Read ok: reading was successful (only retrieved if read_data is True)

Parameters:
  • model_ids (list) – list of model ids to be analysed (can also be string -> single model)

  • vars_or_var_patterns (list) – list of variables or variable patterns to be analysed (can also be string -> single variable or variable family)

  • read_data (bool) – if True, more information about the imported data will be available in the table (e.g. no. of dimensions, names of dimension coords) but the routine will run longer since the data is imported

  • sort_by_cols (list) – column sort order (use header names in listing above). Defaults to [‘Var’, ‘Model’]

Returns:

dataframe including result table (ready to be saved as csv or other tabular format or to be displayed in a jupyter notebook)

Return type:

pandas.DataFrame

Example

>>> from pyaerocom import create_varinfo_table
>>> models = ['INCA-BCext_CTRL2016-PD',
              'GEOS5-freegcm_CTRL2016-PD']
>>> vars = ['ang4487aer', 'od550aer', 'ec*']
>>> df = create_varinfo_table(models, vars)
>>> print(df)
pyaerocom.utils.print_file(path: Path | str)[source]

Helpers

General helper methods for the pyaerocom library.

pyaerocom.helpers.calc_climatology(s, start, stop, min_count=None, set_year=None, resample_how='mean')[source]

Compute climatological timeseries from pandas.Series

Parameters:
  • s (pandas.Series) – time series data

  • start (numpy.datetime64 or similar) – start time of data used to compute climatology

  • stop (numpy.datetime64 or similar) – stop time of data used to compute climatology

  • min_count (int, optional) – minimum number of observations required per aggregated month in climatological interval. Months not meeting this requirement will be set to NaN.

  • set_year (int, optional) – if specified, the output data will be assigned the input year. Else the middle year of the climatological interval is used.

  • resample_how (str) – string specifying how the climatological timeseries is to be aggregated

Returns:

dataframe containing climatological timeseries as well as columns std and count

Return type:

DataFrame
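
A simplified pandas sketch of such a climatology (grouping by calendar month, with a minimum-count constraint; not the actual calc_climatology implementation):

```python
import numpy as np
import pandas as pd

# Simplified monthly climatology with min-count masking
# (stand-in sketch, column names are assumptions).

def monthly_climatology(s, min_count=None):
    grouped = s.groupby(s.index.month)
    out = pd.DataFrame(
        {"data": grouped.mean(), "std": grouped.std(), "count": grouped.count()}
    )
    if min_count is not None:
        out.loc[out["count"] < min_count, "data"] = np.nan
    return out

idx = pd.date_range("2015-01-01", "2016-12-31", freq="D")
s = pd.Series(np.arange(len(idx), dtype=float), index=idx)
clim = monthly_climatology(s, min_count=10)  # 12 rows, one per calendar month
```
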

pyaerocom.helpers.cftime_to_datetime64(times, cfunit=None, calendar=None)[source]

Convert numerical timestamps with epoch to numpy datetime64

This method was designed to enhance the performance of datetime conversions and is based on the corresponding information provided in the cftime package (see here). In particular, it does what the num2date() therein does, but faster, provided the time stamps are defined on a standard calendar.

Parameters:
  • times (list or ndarray or iris.coords.DimCoord) – array containing numerical time stamps (relative to basedate of cfunit). Can also be a single number.

  • cfunit (str or Unit, optional) – CF unit string (e.g. day since 2018-01-01 00:00:00.00000000 UTC) or unit. Required if times is not an instance of iris.coords.DimCoord

  • calendar (str, optional) – string specifying calendar (only required if cfunit is of type str).

Returns:

numpy array containing timestamps as datetime64 objects

Return type:

ndarray

Raises:

ValueError – if cfunit is str and calendar is not provided or invalid, or if the cfunit string is invalid

Example

>>> cfunit_str = 'day since 2018-01-01 00:00:00.00000000 UTC'
>>> cftime_to_datetime64(10, cfunit_str, "gregorian")
array(['2018-01-11T00:00:00.000000'], dtype='datetime64[us]')
pyaerocom.helpers.check_coord_circular(coord_vals, modulus, rtol=1e-05)[source]

Check circularity of coordinate

Parameters:
  • coord_vals (list or ndarray) – values of coordinate to be tested

  • modulus (float or int) – modulus of coordinate (e.g. 360 for longitude)

  • rtol (float) – relative tolerance

Returns:

True if circularity is given, else False

Return type:

bool

Raises:

ValueError – if circularity is given and results in overlap (right end of input array is mapped to a value larger than the first one at the left end of the array)
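
For an evenly spaced coordinate, the check amounts to testing whether one more step past the last value closes the circle modulo the coordinate's modulus. A hypothetical sketch (not the pyaerocom implementation):

```python
import numpy as np

# Sketch of a circularity check for an evenly spaced coordinate,
# e.g. longitude with modulus 360 (stand-in, not the real function).

def is_circular(coord_vals, modulus, rtol=1e-5):
    vals = np.asarray(coord_vals, dtype=float)
    step = vals[1] - vals[0]
    span = vals[-1] - vals[0] + step  # span covered incl. one closing step
    if span > modulus * (1 + rtol):
        raise ValueError("coordinate wraps past itself (overlap)")
    return bool(np.isclose(span, modulus, rtol=rtol))

lons = np.arange(-180, 180, 30)  # last value 150, step 30 -> closes the circle
```
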

pyaerocom.helpers.copy_coords_cube(to_cube, from_cube, inplace=True)[source]

Copy all coordinates from one cube to another

Requires the underlying data to be the same shape.

Warning

This operation will delete all existing coordinates and auxiliary coordinates and will then copy the ones from the input data object. No checks of any kind will be performed

Parameters:
  • to_cube

  • other (GriddedData or Cube) – other data object (needs to be same shape as this object)

Returns:

data object containing coordinates from other object

Return type:

GriddedData

pyaerocom.helpers.datetime2str(time, ts_type=None)[source]
pyaerocom.helpers.delete_all_coords_cube(cube, inplace=True)[source]

Delete all coordinates of an iris cube

Parameters:
  • cube (iris.cube.Cube) – input cube that is supposed to be cleared of coordinates

  • inplace (bool) – if True, then the coordinates are deleted in the input object, else in a copy of it

Returns:

input cube without coordinates

Return type:

iris.cube.Cube

pyaerocom.helpers.extract_latlon_dataarray(arr, lat, lon, lat_dimname=None, lon_dimname=None, method='nearest', new_index_name=None, check_domain=True)[source]

Extract individual lat / lon coordinates from DataArray

Parameters:
  • arr (DataArray) – data (must contain lat and lon dimensions)

  • lat (array or similar) – 1D array containing latitude coordinates

  • lon (array or similar) – 1D array containing longitude coordinates

  • lat_dimname (str, optional) – name of latitude dimension in input data (if None, it assumes standard name)

  • lon_dimname (str, optional) – name of longitude dimension in input data (if None, it assumes standard name)

  • method (str) – how to interpolate to input coordinates (defaults to nearest neighbour)

  • new_index_name (str, optional) – name of flattened latlon dimension (defaults to latlon)

  • check_domain (bool) – if True, lat/lon domain of datarray is checked and all input coordinates that are outside of the domain are ignored.

Returns:

data at input coordinates

Return type:

DataArray

pyaerocom.helpers.get_constraint(lon_range=None, lat_range=None, time_range=None, meridian_centre=True)[source]

Function that creates an iris.Constraint based on input

Note

Please be aware of the definition of the longitudes in your data when cropping within the longitude dimension. The longitudes in your data may be defined either from -180 <= lon <= 180 (pyaerocom standard) or from 0 <= lon <= 360. In the former case (-180 -> 180) you can leave the additional input parameter meridian_centre=True (default).

Parameters:
  • lon_range (tuple, optional) – 2-element tuple containing longitude range for cropping Example input to crop around meridian: lon_range=(-30, 30)

  • lat_range (tuple, optional) – 2-element tuple containing latitude range for cropping.

  • time_range (tuple, optional) –

    2-element tuple containing time range for cropping. Allowed data types for specifying the times are

    1. a combination of 2 pandas.Timestamp instances or

    2. a combination of two strings that can be directly converted into pandas.Timestamp instances (e.g. time_range=(“2010-1-1”, “2012-1-1”)) or

    3. directly a combination of indices (int).

  • meridian_centre (bool) – specifies the coordinate definition range of longitude array. If True, then -180 -> 180 is assumed, else 0 -> 360

Returns:

the combined constraint from all valid input parameters

Return type:

iris.Constraint

pyaerocom.helpers.get_highest_resolution(ts_type, *ts_types)[source]

Get the highest resolution from several ts_type codes

Parameters:
  • ts_type (str) – first ts_type

  • *ts_types – one or more additional ts_type codes

Returns:

the ts_type that corresponds to the highest resolution

Return type:

str

Raises:

ValueError – if one of the input ts_type codes is not supported

pyaerocom.helpers.get_lat_rng_constraint(low, high)[source]

Create latitude constraint based on input range

Parameters:
  • low (float or int) – lower latitude coordinate

  • high (float or int) – upper latitude coordinate

Returns:

the corresponding iris.Constraint instance

Return type:

iris.Constraint

pyaerocom.helpers.get_lon_rng_constraint(low, high, meridian_centre=True)[source]

Create longitude constraint based on input range

Parameters:
  • low (float or int) – left longitude coordinate

  • high (float or int) – right longitude coordinate

  • meridian_centre (bool) – specifies the coordinate definition range of longitude array of the data to be cropped. If True, then -180 -> 180 is assumed, else 0 -> 360

Returns:

the corresponding iris.Constraint instance

Return type:

iris.Constraint

Raises:
  • ValueError – if first coordinate in lon_range equals or exceeds second

  • LongitudeConstraintError – if the input implies cropping over the border of the longitude array (e.g. 160 -> -160 if -180 <= lon <= 180).

pyaerocom.helpers.get_lowest_resolution(ts_type, *ts_types)[source]

Get the lowest resolution from several ts_type codes

Parameters:
  • ts_type (str) – first ts_type

  • *ts_types – one or more additional ts_type codes

Returns:

the ts_type that corresponds to the lowest resolution

Return type:

str

Raises:

ValueError – if one of the input ts_type codes is not supported

pyaerocom.helpers.get_max_period_range(periods)[source]
pyaerocom.helpers.get_standard_name(var_name)[source]

Converts AeroCom variable name to CF standard name

Also handles alias names for variables, etc. or strings corresponding to older conventions (e.g. names containing 3D).

Parameters:

var_name (str) – AeroCom variable name

Returns:

corresponding standard name

Return type:

str

pyaerocom.helpers.get_standard_unit(var_name)[source]

Gets standard unit of AeroCom variable

Also handles alias names for variables, etc. or strings corresponding to older conventions (e.g. names containing 3D).

Parameters:

var_name (str) – AeroCom variable name

Returns:

corresponding standard unit

Return type:

str

pyaerocom.helpers.get_time_rng_constraint(start, stop)[source]

Create iris.Constraint for data extraction along time axis

Parameters:
  • start (Timestamp or str) – start time of desired subset. If string, it must be convertible into pandas.Timestamp (e.g. “2012-1-1”)

  • stop (Timestamp or str) – stop time of desired subset. If string, it must be convertible into pandas.Timestamp (e.g. “2012-1-1”)

Returns:

iris Constraint instance that can, e.g., be used as input for pyaerocom.griddeddata.GriddedData.extract()

Return type:

iris.Constraint

pyaerocom.helpers.get_tot_number_of_seconds(ts_type, dtime=None)[source]

Get total no. of seconds for a given frequency

Parameters:
  • ts_type (str or TsType) – frequency for which number of seconds is supposed to be retrieved

  • dtime (optional) – time stamps of the data; relevant for frequencies whose period length varies (e.g. monthly). The default is None.

Raises:

AttributeError – if the number of seconds cannot be derived for the input frequency.

Returns:

total number of seconds per period for the given frequency.

Return type:

int, float or ndarray

pyaerocom.helpers.infer_time_resolution(time_stamps, dt_tol_percent=5, minfrac_most_common=0.8)[source]

Infer time resolution based on input time-stamps

Calculates time difference dt between consecutive timestamps provided via input array or list. Then it counts the most common dt (e.g. 86400 s for daily). Before inferring the frequency it then checks all other dts occurring in the input array to see if they are within a certain interval around the most common one (e.g. +/- 5% as default, via arg dt_tol_percent), that is, 86390 would be included if most common dt is 86400 s but not 80000s. Then it checks if the number of dts that are within that tolerance level around the most common dt exceed a certain fraction (arg minfrac_most_common) of the total number of dts that occur in the input array (default is 80%). If that is the case, the most common frequency is attempted to be derived using TsType.from_total_seconds() based on the most common dt (in this example that would be daily).

Parameters:
  • time_stamps (pandas.DatetimeIndex, or similar) – list of time stamps

  • dt_tol_percent (int) – tolerance in percent of accepted range of time diffs with respect to most common time difference.

  • minfrac_most_common (float) – minimum required fraction of time diffs that have to be equal to, or within tolerance range, the most common time difference.

Raises:

TemporalResolutionError – if frequency cannot be derived.

Returns:

inferred frequency

Return type:

str
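
The dt-counting logic described above can be sketched as follows (a stand-in returning the dominant dt in seconds rather than a frequency string; not the actual implementation):

```python
import numpy as np
import pandas as pd

# Sketch: find the most common time difference and check that enough of
# the other diffs lie within the tolerance around it.

def infer_dt_seconds(time_stamps, dt_tol_percent=5, minfrac_most_common=0.8):
    dts = np.diff(pd.DatetimeIndex(time_stamps).values)
    dts = dts.astype("timedelta64[s]").astype(float)
    vals, counts = np.unique(dts, return_counts=True)
    most_common = vals[np.argmax(counts)]
    tol = most_common * dt_tol_percent / 100
    frac = np.mean(np.abs(dts - most_common) <= tol)
    if frac < minfrac_most_common:
        raise ValueError("no dominant time resolution")
    return most_common

idx = pd.date_range("2020-01-01", periods=10, freq="D").delete(4)  # one gap
dt = infer_dt_seconds(idx)  # the single 2-day gap is outvoted by daily diffs
```
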

pyaerocom.helpers.is_year(val)[source]

Check if input is / may be year

Parameters:

val – input that is supposed to be checked

Returns:

True if input is a number between -2000 and 10000, else False

Return type:

bool

pyaerocom.helpers.isnumeric(val)[source]

Check if input value is numeric

Parameters:

val – input value to be checked

Returns:

True, if input value is numeric, else False.

Return type:

bool

pyaerocom.helpers.isrange(val)[source]

Check if input value corresponds to a range

Checks if input is list, or array or tuple with 2 entries, or alternatively a slice that has defined start and stop and has set step to None.

Note

No check is performed, whether first entry is smaller than second entry if all requirements for a range are fulfilled.

Parameters:

val – input value to be checked

Returns:

True, if input value corresponds to a range, else False.

Return type:

bool
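
A minimal sketch of the checks described above (lists/tuples of length 2, or a slice with start and stop set and step None; array support omitted, not the actual implementation):

```python
# Hypothetical re-implementation of the range check described above.

def is_range(val):
    if isinstance(val, slice):
        return (val.start is not None
                and val.stop is not None
                and val.step is None)
    if isinstance(val, (list, tuple)):
        return len(val) == 2
    return False
```
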

pyaerocom.helpers.lists_to_tuple_list(*lists)[source]

Convert input lists (of same length) into list of tuples

e.g. input 2 lists of latitude and longitude coords, output one list with tuple coordinates at each index

pyaerocom.helpers.make_datetime_index(start, stop, freq)[source]

Make pandas.DatetimeIndex for input specs

Note

If input frequency is specified in PANDAS_RESAMPLE_OFFSETS, an offset will be added (e.g. 15 days for monthly data).

Parameters:
  • start – start time. Preferably as pandas.Timestamp, else it will be attempted to be converted.

  • stop – stop time. Preferably as pandas.Timestamp, else it will be attempted to be converted.

  • freq – frequency of datetime index.

Return type:

DatetimeIndex

pyaerocom.helpers.make_datetimeindex_from_year(freq, year)[source]

Create pandas datetime index

Parameters:
  • freq (str) – pandas frequency str

  • year (int) – year

Returns:

index object

Return type:

pandas.DatetimeIndex

pyaerocom.helpers.make_dummy_cube(var_name: str, start_yr: int = 2000, stop_yr: int = 2020, freq: str = 'daily', dtype=<class 'float'>) Cube[source]
pyaerocom.helpers.make_dummy_cube_latlon(lat_res_deg: float = 2, lon_res_deg: float = 3, lat_range: list[float] | tuple[float, float] = (-90, 90), lon_range: list[float] | tuple[float, float] = (-180, 180))[source]

Make an empty Cube with given latitude and longitude resolution

Dimensions will be lat, lon

Parameters:
  • lat_res_deg (float or int) – latitude resolution of grid

  • lon_res_deg (float or int) – longitude resolution of grid

  • lat_range (tuple or list) – 2-element list containing latitude range. If None, then (-90, 90) is used.

  • lon_range (tuple or list) – 2-element list containing longitude range. If None, then (-180, 180) is used.

Returns:

dummy cube in input resolution

Return type:

Cube

pyaerocom.helpers.merge_station_data(stats, var_name, pref_attr=None, sort_by_largest=True, fill_missing_nan=True, add_meta_keys=None, resample_how=None, min_num_obs=None)[source]

Merge multiple StationData objects (from one station) into one instance

Note

all input StationData objects need to have same attributes station_name, latitude, longitude and altitude

Parameters:
  • stats (list) – list containing StationData objects (note: all of these objects must contain variable data for the specified input variable)

  • var_name (str) – data variable name that is to be merged

  • pref_attr – optional argument that may be used to specify a metadata attribute that is available in all input StationData objects and that is used to order the input stations by relevance. The associated values of this attribute need to be sortable (e.g. revision_date). This is only relevant in case overlaps occur. If unspecified the relevance of the stations is sorted based on the length of the associated data arrays.

  • sort_by_largest (bool) – if True, the result from the sorting is inverted. E.g. if pref_attr is unspecified, then the stations will be sorted based on the length of the data vectors, starting with the shortest, ending with the longest. This sorting result will then be inverted if sort_by_largest=True, so that the longest time series gets highest importance. If, e.g., pref_attr='revision_date', then the stations are sorted by the associated revision date value, starting with the earliest, ending with the latest (which will also be inverted if this argument is set to True)

  • fill_missing_nan (bool) – if True, the resulting time series is filled with NaNs. NOTE: this requires that information about the temporal resolution (ts_type) of the data is available in each of the StationData objects.

  • add_meta_keys (str or list, optional) – additional non-standard metadata keys that are supposed to be considered for merging.

  • resample_how (str or dict, optional) – in case input stations come in different frequencies they are merged to the lowest common freq. This parameter can be used to control, which aggregator(s) are to be used (e.g. mean, median).

  • min_num_obs (str or dict, optional) – in case input stations come in different frequencies they are merged to the lowest common freq. This parameter can be used to control minimum number of observation constraints for the downsampling.

Returns:

merged data

Return type:

StationData

pyaerocom.helpers.numpy_to_cube(data, dims=None, var_name=None, units=None, **attrs)[source]

Make a cube from a numpy array

Parameters:
  • data (ndarray) – input data

  • dims (list, optional) – list of iris.coord.DimCoord instances in order of dimensions of input data array (length of list and shapes of each of the coordinates must match dimensions of input data)

  • var_name (str, optional) – name of variable

  • units (str) – unit of variable

  • **attrs – additional attributes to be added to metadata

Return type:

iris.cube.Cube

Raises:

DataDimensionError – if input dims is specified and results in conflict

pyaerocom.helpers.resample_time_dataarray(arr, freq, how=None, min_num_obs=None)[source]

Resample the time dimension of a xarray.DataArray

Note

The dataarray must have a dimension coordinate named “time”

Parameters:
  • arr (DataArray) – data array to be resampled

  • freq (str) – new temporal resolution (can be pandas freq. string, or pyaerocom ts_type)

  • how (str) – how to aggregate (e.g. mean, median)

  • min_num_obs (int, optional) – minimum number of observations required per period (when downsampling). E.g. if input is in daily resolution and freq is monthly and min_num_obs is 10, then all months that have less than 10 days of data are set to nan.

Returns:

resampled data array object

Return type:

DataArray

Raises:
  • IOError – if data input arr is not an instance of DataArray

  • DataDimensionError – if time dimension is not available in dataset

pyaerocom.helpers.resample_timeseries(ts, freq, how=None, min_num_obs=None)[source]

Resample a timeseries (pandas.Series)

Parameters:
  • ts (Series) – time series instance

  • freq (str) – new temporal resolution (can be pandas freq. string, or pyaerocom ts_type)

  • how – aggregator to be used, accepts everything that is accepted by pandas.core.resample.Resampler.agg() and in addition, percentiles may be provided as str using e.g. 75percentile as input for the 75% percentile.

  • min_num_obs (int, optional) – minimum number of observations required per period (when downsampling). E.g. if input is in daily resolution and freq is monthly and min_num_obs is 10, then all months that have less than 10 days of data are set to nan.

Returns:

resampled time series object

Return type:

Series
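
A simplified pandas sketch of downsampling with a minimum-observation constraint (stand-in for resample_timeseries; percentile handling omitted):

```python
import numpy as np
import pandas as pd

# Sketch: aggregate to a lower frequency and mask periods that do not
# reach the required number of observations.

def resample_series(ts, freq, how="mean", min_num_obs=None):
    resampler = ts.resample(freq)
    out = resampler.agg(how)
    if min_num_obs is not None:
        out = out.where(resampler.count() >= min_num_obs)
    return out

idx = pd.date_range("2020-01-01", periods=40, freq="D")
ts = pd.Series(np.ones(40), index=idx)
# Jan has 31 days of data, Feb only 9 -> Feb is masked with min_num_obs=10
monthly = resample_series(ts, "MS", min_num_obs=10)
```
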

pyaerocom.helpers.same_meta_dict(meta1, meta2, ignore_keys=['PI'], num_keys=['longitude', 'latitude', 'altitude'], num_rtol=0.01)[source]

Compare meta dictionaries

Parameters:
  • meta1 (dict) – meta dictionary that is to be compared with meta2

  • meta2 (dict) – meta dictionary that is to be compared with meta1

  • ignore_keys (list) – list containing meta keys that are supposed to be ignored

  • num_keys (list) – list of keys that contain numerical values

  • num_rtol (float) – relative tolerance level for comparison of numerical values

Returns:

True, if dictionaries are the same, else False

Return type:

bool

pyaerocom.helpers.seconds_in_periods(timestamps, ts_type)[source]

Calculates the number of seconds for each period in timestamps.

Parameters:
  • timestamps (DatetimeIndex or similar) – the time stamps defining the periods

  • ts_type (str or TsType) – frequency of the periods

Returns:

Array with same length as timestamps containing number of seconds for each period.

Return type:

np.array

pyaerocom.helpers.sort_ts_types(ts_types)[source]

Sort a list of ts_types

Parameters:

ts_types (list) – list of strings (or instance of TsType) to be sorted

Returns:

list of strings with sorted frequencies

Return type:

list

Raises:

TemporalResolutionError – if one of the input ts_types is not supported
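
Sorting by frequency amounts to ordering by period length, highest resolution first. The order list below is an assumption mirroring TsType.VALID_ITER (not the actual implementation):

```python
# Hypothetical sketch of frequency sorting (highest resolution first).

FREQ_ORDER = ["minutely", "hourly", "daily", "weekly", "monthly", "yearly"]

def sort_freqs(freqs):
    for f in freqs:
        if f not in FREQ_ORDER:
            raise ValueError(f"unsupported frequency: {f}")
    return sorted(freqs, key=FREQ_ORDER.index)
```
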

pyaerocom.helpers.start_stop(start, stop=None, stop_sub_sec=True)[source]

Create pandas timestamps from input start / stop values

Note

If input suggests climatological data in AeroCom format (i.e. year=9999) then the year is converted to 2222 instead since pandas cannot handle year 9999.

Parameters:
  • start – start time (any format that can be converted to pandas.Timestamp)

  • stop – stop time (any format that can be converted to pandas.Timestamp)

  • stop_sub_sec (bool) – if True and if input for stop is a year (e.g. 2015) then one second is subtracted from stop timestamp (e.g. if input stop is 2015 and denotes “until 2015”, then for the returned stop timestamp one second will be subtracted, so it would be 31.12.2014 23:59:59).

Returns:

  • pandas.Timestamp – start timestamp

  • pandas.Timestamp – stop timestamp

Raises:

ValueError – if input cannot be converted to pandas timestamps
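The stop_sub_sec behaviour for year input can be illustrated with a simplified sketch (the real function handles many more input formats and the year-9999 climatology case):

```python
import pandas as pd

def start_stop_sketch(start, stop, stop_sub_sec=True):
    """Convert start / stop input to pandas Timestamps (simplified sketch)."""
    start_ts = pd.Timestamp(str(start))
    stop_ts = pd.Timestamp(str(stop))
    if isinstance(stop, int) and stop_sub_sec:
        # "until 2015" -> one second before 2015-01-01 00:00:00
        stop_ts -= pd.Timedelta(seconds=1)
    return start_ts, stop_ts

start, stop = start_stop_sketch(2014, 2015)
# start: 2014-01-01 00:00:00, stop: 2014-12-31 23:59:59
```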

pyaerocom.helpers.start_stop_from_year(year)[source]

Create start / stop timestamp from year

Parameters:

year (int) – the year for which start / stop is to be instantiated

Returns:

  • numpy.datetime64 – start datetime

  • numpy.datetime64 – stop datetime

pyaerocom.helpers.start_stop_str(start, stop=None, ts_type=None)[source]
pyaerocom.helpers.str_to_iris(key, **kwargs)[source]

Mapping function that converts strings into iris analysis objects

Please see dictionary STR_TO_IRIS in this module for valid definitions

Parameters:

key (str) – key of STR_TO_IRIS dictionary

Returns:

corresponding iris analysis object (e.g. Aggregator, method)

Return type:

obj

pyaerocom.helpers.to_datestring_YYYYMMDD(value)[source]

Convert input time to string with format YYYYMMDD

Parameters:

value – input time, may be string, datetime, numpy.datetime64 or pandas.Timestamp

Returns:

input formatted to string YYYYMMDD

Return type:

str

Raises:

ValueError – if input is not supported

pyaerocom.helpers.to_datetime64(value)[source]

Convert input value to numpy.datetime64

Parameters:

value – input value that is supposed to be converted, needs to be either str, datetime.datetime, pandas.Timestamp or an integer specifying the desired year.

Returns:

input timestamp converted to datetime64

Return type:

datetime64

pyaerocom.helpers.to_pandas_timestamp(value)[source]

Convert input to instance of pandas.Timestamp

Parameters:

value – input value that is supposed to be converted to time stamp

Return type:

pandas.Timestamp

pyaerocom.helpers.tuple_list_to_lists(tuple_list)[source]

Convert list with tuples (e.g. (lat, lon)) into multiple lists

pyaerocom.helpers.varlist_aerocom(varlist)[source]

Mathematical helpers

Mathematical low level utility methods of pyaerocom

pyaerocom.mathutils.calc_statistics(data, ref_data, lowlim=None, highlim=None, min_num_valid=1, weights=None, drop_stats=None)[source]

Calc statistical properties from two data arrays

Calculates the following statistical properties based on the two provided 1-dimensional data arrays and returns them in a dictionary (keys are provided after the arrows):

  • Mean value of both arrays -> refdata_mean, data_mean

  • Standard deviation of both arrays -> refdata_std, data_std

  • RMS (Root mean square) -> rms

  • NMB (Normalised mean bias) -> nmb

  • MNMB (Modified normalised mean bias) -> mnmb

  • MB (Mean Bias) -> mb

  • MAB (Mean Absolute Bias) -> mab

  • FGE (Fractional gross error) -> fge

  • R (Pearson correlation coefficient) -> R

  • R_spearman (Spearman corr. coeff) -> R_spearman

Note

Nans are removed from the input arrays, information about no. of removed points can be inferred from keys totnum and num_valid in return dict.

Parameters:
  • data (ndarray) – array containing data, that is supposed to be compared with reference data

  • ref_data (ndarray) – array containing data, that is used to compare data array with

  • lowlim (float) – lower end of considered value range (e.g. if set to 0, then all datapoints where either data or ref_data is smaller than 0 are removed)

  • highlim (float) – upper end of considered value range

  • min_num_valid (int) – minimum number of valid measurements required to compute statistical parameters.

  • weights (ndarray) – array containing weights if computing weighted statistics

  • drop_stats (tuple) – tuple which drops the provided statistics from computed json files. For example, setting drop_stats = (“mb”, “mab”), results in json files in hm/ts with entries which do not contain the mean bias and mean absolute bias, but the other statistics are preserved.

Returns:

dictionary containing computed statistics

Return type:

dict

Raises:

ValueError – if either of the input arrays has dimension other than 1
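A few of the listed metrics can be reproduced with numpy for reference (a hypothetical re-implementation; the pyaerocom version additionally supports weights, value-range limits and drop_stats):

```python
import numpy as np

def basic_statistics(data: np.ndarray, ref_data: np.ndarray) -> dict:
    """Compute a subset of the metrics listed above from two 1D arrays."""
    mask = ~np.isnan(data) & ~np.isnan(ref_data)
    m, o = data[mask], ref_data[mask]
    diff = m - o
    return {
        "num_valid": int(mask.sum()),
        "nmb": diff.sum() / o.sum(),                 # normalised mean bias
        "mb": diff.mean(),                           # mean bias
        "mab": np.abs(diff).mean(),                  # mean absolute bias
        "rms": np.sqrt((diff ** 2).mean()),          # root mean square
        "mnmb": 2 * np.mean(diff / (m + o)),         # modified normalised mean bias
        "fge": 2 * np.mean(np.abs(diff) / (m + o)),  # fractional gross error
        "R": np.corrcoef(m, o)[0, 1],                # Pearson correlation
    }

stats = basic_statistics(np.array([1.0, 2.0, 4.0, np.nan]),
                         np.array([1.0, 2.0, 2.0, 5.0]))
# NaN pair is dropped; nmb = (sum of differences) / (sum of reference) = 2/5
```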

pyaerocom.mathutils.closest_index(num_array, value)[source]

Returns index in number array that is closest to input value

pyaerocom.mathutils.corr(ref_data, data, weights=None)[source]

Compute correlation coefficient

Parameters:
  • ref_data (ndarray) – x data

  • data (ndarray) – y data

  • weights (ndarray, optional) – array containing weights for each point in data

Returns:

correlation coefficient

Return type:

float

pyaerocom.mathutils.estimate_value_range(vmin, vmax, extend_percent=0)[source]

Round and extend input range to estimate lower and upper bounds of range

Parameters:
  • vmin (float) – lower value of range

  • vmax (float) – upper value of range

  • extend_percent (int) – percentage specifying to which extent the input range is supposed to be extended.

Returns:

  • float – estimated lower end of range

  • float – estimated upper end of range

pyaerocom.mathutils.exponent(num)[source]

Get exponent of input number

Parameters:

num (float or iterable) – input number

Returns:

exponent of input number(s)

Return type:

int or ndarray containing ints

Example

>>> from pyaerocom.mathutils import exponent
>>> exponent(2340)
3
pyaerocom.mathutils.in_range(x, low, high)
pyaerocom.mathutils.is_strictly_monotonic(iter1d) bool[source]

Check if a 1D iterable is strictly monotonic

Parameters:

iter1d – 1D iterable object to be tested

Return type:

bool
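A minimal sketch of such a check (assuming "strictly monotonic" means strictly increasing; the pyaerocom implementation may also accept strictly decreasing sequences):

```python
def strictly_monotonic(iter1d) -> bool:
    """True if every element is strictly greater than its predecessor."""
    vals = list(iter1d)
    return all(a < b for a, b in zip(vals, vals[1:]))

# [1, 2, 3] is strictly increasing; [1, 2, 2] has a repeated value
```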

pyaerocom.mathutils.make_binlist(vmin: float, vmax: float, num: int | None = None) list[source]
pyaerocom.mathutils.numbers_in_str(input_string)[source]

This method finds all numbers in a string

Note

  • Beta version, please use with care

  • Detects only integer numbers, dots are ignored

Parameters:

input_string (str) – string containing numbers

Returns:

list of integers detected in the string

Return type:

list

Example

>>> numbers_in_str('Bla42Blub100')
[42, 100]
pyaerocom.mathutils.range_magnitude(low, high)[source]

Returns magnitude of value range

Parameters:
  • low (float) – lower end of range

  • high (float) – upper end of range

Returns:

number of orders of magnitude spanned by the input range

Return type:

int

Example

>>> range_magnitude(0.1, 100)
3
>>> range_magnitude(100, 0.1)
-3
>>> range_magnitude(1e-3, 1e6)
9
pyaerocom.mathutils.sum(data, weights=None)[source]

Summing operation with option to perform weighted sum

Parameters:
  • data (ndarray) – data array that is supposed to be summed up

  • weights (ndarray, optional) – array containing weights for each point in data

Returns:

sum of values in input array

Return type:

float or int

pyaerocom.mathutils.weighted_corr(ref_data, data, weights)[source]

Compute weighted correlation

Parameters:
  • ref_data (ndarray) – x data

  • data (ndarray) – y data

  • weights (ndarray) – array containing weights for each point in data

Returns:

weighted correlation coefficient

Return type:

float

pyaerocom.mathutils.weighted_cov(ref_data, data, weights)[source]

Compute weighted covariance

Parameters:
  • ref_data (ndarray) – x data

  • data (ndarray) – y data

  • weights (ndarray) – array containing weights for each point in data

Returns:

covariance

Return type:

float

pyaerocom.mathutils.weighted_mean(data, weights)[source]

Compute weighted mean

Parameters:
  • data (ndarray) – data array that is supposed to be averaged

  • weights (ndarray) – array containing weights for each point in data

Returns:

weighted mean of data array

Return type:

float or int

pyaerocom.mathutils.weighted_sum(data, weights)[source]

Compute weighted sum using numpy dot product

Parameters:
  • data (ndarray) – data array that is supposed to be summed up

  • weights (ndarray) – array containing weights for each point in data

Returns:

weighted sum of values in input array

Return type:

float
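The relationship between the weighted helpers above can be sketched as follows (hypothetical function names; the weighted correlation is the weighted covariance normalised by the weighted standard deviations):

```python
import numpy as np

def w_mean(data, weights):
    """Weighted mean via dot product."""
    return np.dot(data, weights) / weights.sum()

def w_cov(x, y, weights):
    """Weighted covariance of x and y."""
    return w_mean((x - w_mean(x, weights)) * (y - w_mean(y, weights)), weights)

def w_corr(x, y, weights):
    """Weighted Pearson correlation from weighted covariances."""
    return w_cov(x, y, weights) / np.sqrt(w_cov(x, x, weights) * w_cov(y, y, weights))

x = np.array([1.0, 2.0, 3.0])
w = np.ones(3)
# with unit weights, a perfect linear relation gives correlation 1
```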

Geodesic calculations and topography

Module for geographical calculations

This module contains low-level methods to perform geographical calculations (e.g. distance between two coordinates)

pyaerocom.geodesy.calc_distance(lat0, lon0, lat1, lon1, alt0=None, alt1=None, auto_altitude_srtm=False)[source]

Calculate distance between two coordinates

Parameters:
  • lat0 (float) – latitude of first point in decimal degrees

  • lon0 (float) – longitude of first point in decimal degrees

  • lat1 (float) – latitude of second point in decimal degrees

  • lon1 (float) – longitude of second point in decimal degrees

  • alt0 (float, optional) – altitude of first point in m

  • alt1 (float, optional) – altitude of second point in m

  • auto_altitude_srtm (bool) – if True, then all altitudes that are unspecified are set to the corresponding topographic altitude of that coordinate, using SRTM (only works for coordinates where SRTM topographic data is accessible).

Returns:

distance between points in km

Return type:

float

pyaerocom.geodesy.calc_latlon_dists(latref, lonref, latlons)[source]

Calculate distances of (lat, lon) coords to input lat, lon coordinate

Parameters:
  • latref (float) – latitude of reference coordinate

  • lonref (float) – longitude of reference coordinate

  • latlons (list) – list of (lat, lon) tuples for which distances to (latref, lonref) are computed

Returns:

list of computed geographic distances to input reference coordinate for all (lat, lon) coords in latlons

Return type:

list

pyaerocom.geodesy.find_coord_indices_within_distance(latref, lonref, latlons, radius=1)[source]

Find indices of coordinates that match input coordinate

Parameters:
  • latref (float) – latitude of reference coordinate

  • lonref (float) – longitude of reference coordinate

  • latlons (list) – list of (lat, lon) tuples for which distances to (latref, lonref) are computed

  • radius (float or int, optional) – Maximum allowed distance to input coordinate. The default is 1.

Returns:

Indices of latlon coordinates in latlons that are within the specified radius around (latref, lonref). The indices are sorted by distance to the input coordinate, starting with the closest

Return type:

ndarray
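The filter-and-sort behaviour described in the Returns section can be sketched once distances are available (a hypothetical helper operating on precomputed distances in km):

```python
import numpy as np

def indices_within_radius(dists_km, radius=1.0):
    """Indices of points within radius, sorted by distance (closest first)."""
    dists_km = np.asarray(dists_km)
    within = np.where(dists_km <= radius)[0]
    return within[np.argsort(dists_km[within])]

idx = indices_within_radius([0.5, 2.0, 0.1, 0.9], radius=1.0)
# index 1 (2.0 km) is excluded; the rest are ordered closest first
```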

pyaerocom.geodesy.get_country_info_coords(coords)[source]

Get country information for input lat/lon coordinates

Parameters:

coords (list or tuple) – list of coord tuples (lat, lon) or single coord tuple

Raises:

ValueError – if input format is incorrect

Returns:

list of dictionaries containing country information for each input coordinate

Return type:

list

pyaerocom.geodesy.get_topo_altitude(lat, lon, topo_dataset='srtm', topodata_loc=None, try_etopo1=True)[source]

Retrieve topographic altitude for a certain location

Supports topography datasets supported by geonum. These are currently (20 Feb. 19) srtm (SRTM dataset, default, automatic access if online) and etopo1 (ETOPO1 dataset, lower resolution, must be available on local machine or server).

Parameters:
  • lat (float) – latitude of coordinate

  • lon (float) – longitude of coordinate

  • topo_dataset (str) – name of topography dataset

  • topodata_loc (str) – filepath or directory containing supported topographic datasets

  • try_etopo1 (bool) – if True and if access fails via input arg topo_dataset, then try to access altitude using ETOPO1 dataset.

Returns:

dictionary containing input latitude, longitude, altitude and topographic dataset name used to retrieve the altitude.

Return type:

dict

Raises:

ValueError – if altitude data cannot be accessed

pyaerocom.geodesy.get_topo_data(lat0, lon0, lat1=None, lon1=None, topo_dataset='srtm', topodata_loc=None, try_etopo1=False)[source]

Retrieve topographic altitude for a certain location

Supports topography datasets supported by geonum. These are currently (20 Feb. 19) srtm (SRTM dataset, default, automatic access if online) and etopo1 (ETOPO1 dataset, lower resolution, must be available on local machine or server).

Parameters:
  • lat0 (float) – start latitude for data extraction

  • lon0 (float) – start longitude for data extraction

  • lat1 (float) – stop latitude for data extraction (default: None). If None, only data around lat0, lon0 will be extracted.

  • lon1 (float) – stop longitude for data extraction (default: None). If None, only data around lat0, lon0 will be extracted.

  • topo_dataset (str) – name of topography dataset

  • topodata_loc (str) – filepath or directory containing supported topographic datasets

  • try_etopo1 (bool) – if True and if access fails via input arg topo_dataset, then try to access altitude using ETOPO1 dataset.

Returns:

data object containing topography data in specified range

Return type:

geonum.TopoData

Raises:

ValueError – if altitude data cannot be accessed

pyaerocom.geodesy.haversine(lat0, lon0, lat1, lon1, earth_radius=6371.0)[source]

Haversine formula

Approximate horizontal distance between 2 points assuming a spherical earth using haversine formula.

Note

This code was copied from geonum library (date 12/11/2018, J. Gliss)

Parameters:
  • lat0 (float) – latitude of first point in decimal degrees

  • lon0 (float) – longitude of first point in decimal degrees

  • lat1 (float) – latitude of second point in decimal degrees

  • lon1 (float) – longitude of second point in decimal degrees

  • earth_radius (float) – average earth radius in km, defaults to 6371.0

Returns:

horizontal distance between input coordinates in km

Return type:

float
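The formula itself is short; a minimal re-implementation for reference, under the same spherical-earth assumption:

```python
import math

def haversine_km(lat0, lon0, lat1, lon1, earth_radius=6371.0):
    """Great-circle distance between two points in km (spherical earth)."""
    phi0, phi1 = math.radians(lat0), math.radians(lat1)
    dphi = math.radians(lat1 - lat0)
    dlam = math.radians(lon1 - lon0)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi0) * math.cos(phi1) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius * math.asin(math.sqrt(a))

# one degree of latitude spans roughly 111 km
d = haversine_km(0.0, 0.0, 1.0, 0.0)
```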

pyaerocom.geodesy.is_within_radius_km(lat0, lon0, lat1, lon1, maxdist_km, alt0=0, alt1=0, **kwargs)[source]

Checks if two lon/lat coordinates are within a certain distance to each other

Parameters:
  • lat0 (float) – latitude of first point in decimal degrees

  • lon0 (float) – longitude of first point in decimal degrees

  • lat1 (float) – latitude of second point in decimal degrees

  • lon1 (float) – longitude of second point in decimal degrees

  • maxdist_km (float) – maximum distance between two points in km

  • alt0 (float) – altitude of first point in m

  • alt1 (float) – altitude of second point in m

Returns:

True, if coordinates are within specified distance to each other, else False

Return type:

bool

Units and unit conversion

Units helpers in base package

pyaerocom.units_helpers.RATES_FREQ_DEFAULT = 'd'

default frequency for rates variables (e.g. deposition, precip)

pyaerocom.units_helpers.UCONV_MUL_FACS =

var_name   from     to      fac
concso2    ug S/m3  ug m-3  1.997910
concbc     ug C/m3  ug m-3  1.000000
concoa     ug C/m3  ug m-3  1.000000
concoc     ug C/m3  ug m-3  1.000000
conctc     ug C/m3  ug m-3  1.000000
concpm25   ug m-3   1       1.000000
concpm10   ug m-3   1       1.000000
concno2    ug N/m3  ug m-3  3.284478
concnh3    ug N/m3  ug m-3  1.215862
wetso4     kg S/ha  kg m-2  0.000300
concso4pr  mg S/L   g m-3   2.995821

Custom unit conversion factors for certain variables. Columns: variable -> from unit -> to unit -> conversion factor.

pyaerocom.units_helpers.convert_unit(data, from_unit, to_unit, var_name=None, ts_type=None)[source]

Convert unit of data

Parameters:
  • data (np.ndarray or similar) – input data

  • from_unit (cf_units.Unit or str) – current unit of input data

  • to_unit (cf_units.Unit or str) – new unit of input data

  • var_name (str, optional) – name of variable. If provided, and standard conversion with cf_units fails, then custom unit conversion is attempted.

  • ts_type (str, optional) – frequency of data. May be needed for conversion of rate variables such as precip, deposition, etc., that may be defined implicitly without proper frequency specification in the unit string.

Returns:

data in new unit

Return type:

data
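As a plausibility check of the custom factors in UCONV_MUL_FACS, the concso2 factor follows from the molar-mass ratio MW(SO2)/MW(S) (molar mass values here are approximate, not taken from the pyaerocom source):

```python
MM_S = 32.065   # g/mol, sulphur
MM_O = 15.999   # g/mol, oxygen
MM_SO2 = MM_S + 2 * MM_O

# converting a concentration given as "ug S/m3" to "ug m-3" of SO2
fac_concso2 = MM_SO2 / MM_S  # close to the tabulated 1.997910

conc_ug_s_m3 = 10.0
conc_ug_m3 = conc_ug_s_m3 * fac_concso2
```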

pyaerocom.units_helpers.get_unit_conversion_fac(from_unit, to_unit, var_name=None, ts_type=None)[source]
pyaerocom.units_helpers.rate_unit_implicit(unit)[source]

Check whether input rate unit is implicit

Implicit rate units do not contain a frequency string, e.g. “mg m-2” instead of “mg m-2 d-1”. Such units are used, e.g., in EMEP output, where the frequency corresponds to the output frequency, e.g. “mg m-2” per day if the output is daily.

Note

For now, this is just a wrapper for _check_unit_endswith_freq(), but there may be more sophisticated options in the future, which may be added to this function.

Parameters:

unit (str) – unit to be tested

Returns:

True if input unit appears to be implicit, else False.

Return type:

bool
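The suffix check described in the note can be sketched as follows (the list of frequency suffixes here is an assumption for illustration, not the library’s actual list):

```python
def unit_has_explicit_freq(unit: str,
                           freqs=("s-1", "h-1", "d-1", "yr-1")) -> bool:
    """True if the unit string ends with an explicit frequency component."""
    return unit.strip().endswith(freqs)

# "mg m-2" has no frequency suffix -> implicit rate unit
implicit = not unit_has_explicit_freq("mg m-2")
explicit = unit_has_explicit_freq("mg m-2 d-1")
```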

Units helpers in io sub-package

pyaerocom.io.helpers_units.mass_to_nr_molecules(mass, mm)[source]

Calculate the number of molecules from mass and molar mass.

Mass and molar mass need to be given in consistent units, either g and g/mol or kg and kg/mol.

Parameters:
  • mass (float) – mass of all compounds.

  • mm (float) – molar mass of compounds.

Returns:

nr_molecules – number of molecules

Return type:

float
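A minimal sketch of the mass-to-molecules relation via the Avogadro constant (hypothetical function name, not the pyaerocom source):

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def mass_to_molecules(mass: float, molar_mass: float) -> float:
    """Number of molecules from mass and molar mass (consistent mass units)."""
    return mass / molar_mass * AVOGADRO

n = mass_to_molecules(18.0, 18.0)  # 18 g of water at 18 g/mol: one mole
```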

pyaerocom.io.helpers_units.nr_molecules_to_mass(nr_molecules, mm)[source]

Calculates the mass from the number of molecules and molar mass.

Parameters:
  • nr_molecules (int) – Number of molecules

  • mm (float) – Molar mass [g/mol]

Returns:

mass – mass in grams

Return type:

float

pyaerocom.io.helpers_units.unitconv_sfc_conc(data, nr_of_O=2)[source]

Unit conversion: ug S/m3 to ug SOx/m3

Parameters:
  • data (array_like) – Contains the data in units of ugS/m3.

  • nr_of_O (int) – The number of O’s in your desired SOx compound.

Returns:

data – data in units of ug SOx/m3

Return type:

ndarray

pyaerocom.io.helpers_units.unitconv_sfc_conc_bck(data, x=2)[source]

Converting: ug SOx/m3 to ug S/m3.

Parameters:
  • data (ndarray) – Contains the data in units of ug SOx/m3.

  • x (int) – The number of oxygen atoms (O) in your desired SOx compound.

Returns:

data – in units of ug S/m3.

Return type:

ndarray

Notes

Conversion from micrograms to grams corresponds to a factor of 10**-6.

pyaerocom.io.helpers_units.unitconv_wet_depo(data, time, ts_type='monthly')[source]

Unit conversion kg S/ha to kg SOx m-2 s-1.

Adds the mass of oxygen.

Parameters:
  • data (ndarray) – data in unit kg S/ha = kg S/(1000 m2)

  • time (pd.Series[numpy.datetime64]) – Array of datetime64 timesteps.

  • ts_type (str) – The timeseries type. Default “monthly”.

Returns:

data – data in units of kg SOx m-2 s-1

Return type:

ndarray

pyaerocom.io.helpers_units.unitconv_wet_depo_bck(data, time, ts_type='monthly')[source]

Unit conversion kg SO4 m-2 s-1 to kg S/ha.

Removes the weight of oxygen.

Parameters:
  • data (ndarray) – Sulphur data you wish to convert.

  • time (pd.Series[numpy.datetime64]) – Array of datetime64 timesteps.

  • ts_type (str) – The timeseries type. Default monthly.

Returns:

data – Sulphur data in units of kg S/ha.

Return type:

ndarray

pyaerocom.io.helpers_units.unitconv_wet_depo_from_emep(data, time, ts_type='monthly')[source]

Unit conversion mg S m-2 to kg SO4 m-2 s-1.

Milligrams to kilograms corresponds to a factor of 10**-6.

Adds the mass of oxygen.

Parameters:
  • data (ndarray) – data in unit mg S m-2.

  • time (pd.Series[numpy.datetime64]) – Array of datetime64 timesteps.

  • ts_type (str) – The timeseries type. Default “monthly”.

Returns:

data – data in units of kg SO4 m-2 s-1

Return type:

ndarray

Plotting / visualisation (sub package plot)

The pyaerocom.plot package contains algorithms related to data visualisation and plotting.

Plotting of maps

pyaerocom.plot.mapping.get_cmap_maps_aerocom(color_theme=None, vmin=None, vmax=None)[source]

Get colormap using pyAeroCom color scheme

Parameters:
  • color_theme (ColorTheme, optional) – instance of pyaerocom color theme. If None, the default scheme is used

  • vmin (float, optional) – lower end of value range (only considered for diverging color maps with non-symmetric mapping)

  • vmax (float, optional) – upper end of value range (only considered for diverging color maps with non-symmetric mapping)

Return type:

colormap

pyaerocom.plot.mapping.init_map(xlim=(-180, 180), ylim=(-90, 90), figh=8, fix_aspect=False, xticks=None, yticks=None, color_theme=light, projection=None, title=None, gridlines=False, fig=None, ax=None, draw_coastlines=True, contains_cbar=False)[source]

Initialise a map plot

Parameters:
  • xlim (tuple) – 2-element tuple specifying plotted longitude range

  • ylim (tuple) – 2-element tuple specifying plotted latitude range

  • figh (int) – height of figure in inches

  • fix_aspect (bool, optional) – if True, the aspect of the GeoAxes instance is kept fix using the default aspect MAP_AXES_ASPECT defined in pyaerocom.plot.config

  • xticks (iterable, optional) – ticks of x-axis (longitudes)

  • yticks (iterable, optional) – ticks of y-axis (latitudes)

  • color_theme (ColorTheme) – pyaerocom color theme.

  • projection – projection instance from cartopy.crs module (e.g. PlateCarree). May also be string.

  • title (str, optional) – title that is supposed to be inserted

  • gridlines (bool) – whether or not to add gridlines to the map

  • fig (matplotlib.figure.Figure, optional) – instance of matplotlib Figure class. If specified, the former two input args (figh and fix_aspect) are ignored. Note that the Figure is wiped clean before plotting, so any plotted content will be lost

  • ax (GeoAxes, optional) – axes in which the map is plotted

  • draw_coastlines (bool) – whether or not to draw coastlines

  • contains_cbar (bool) – whether or not a colorbar is intended to be added to the figure (impacts the aspect ratio of the figure).

Returns:

ax – axes instance

Return type:

cartopy.mpl.geoaxes.GeoAxes

pyaerocom.plot.mapping.plot_griddeddata_on_map(data, lons=None, lats=None, var_name=None, unit=None, xlim=(-180, 180), ylim=(-90, 90), vmin=None, vmax=None, add_zero=False, c_under=None, c_over=None, log_scale=True, discrete_norm=True, cbar_levels=None, cbar_ticks=None, add_cbar=True, cmap=None, cbar_ticks_sci=False, color_theme=None, ax=None, ax_cbar=None, **kwargs)[source]

Make a plot of gridded data onto a map

Parameters:
  • data (ndarray) – 2D data array

  • lons (ndarray) – longitudes of data

  • lats (ndarray) – latitudes of data

  • var_name (str, optional) – name of variable that is plotted

  • xlim (tuple) – 2-element tuple specifying plotted longitude range

  • ylim (tuple) – 2-element tuple specifying plotted latitude range

  • vmin (float, optional) – lower value of colorbar range

  • vmax (float, optional) – upper value of colorbar range

  • add_zero (bool) – if True and vmin is not 0, then, the colorbar is extended down to 0. This may be used, e.g. for logarithmic scales that should include 0.

  • c_under (float, optional) – colour of data values smaller than vmin

  • c_over (float, optional) – colour of data values exceeding vmax

  • log_scale (bool) – if True, the value to color mapping is done in a pseudo log scale (see get_cmap_levels_auto() for implementation)

  • discrete_norm (bool) – if True, color mapping will be subdivided into discrete intervals

  • cbar_levels (iterable, optional) – discrete colorbar levels. Will be computed automatically, if None (and applicable)

  • cbar_ticks (iterable, optional) – ticks of colorbar levels. Will be computed automatically, if None (and applicable)

Returns:

matplotlib figure instance containing plot result. Use fig.axes[0] to access the map axes instance (e.g. to modify the title or lon / lat range, etc.)

Return type:

fig

pyaerocom.plot.mapping.plot_map_aerocom(data, region, **kwargs)[source]

High level map plotting function for Aerocom default plotting

Note

This function does not iterate over a cube in time, but uses the first available time index in the data.

Parameters:
  • data (GriddedData) – input data from one timestamp (if data contains more than one time stamp, the first index is used)

  • region (str or Region) – valid region ID or region

pyaerocom.plot.mapping.plot_nmb_map_colocateddata(coldata, in_percent=True, vmin=-100, vmax=100, cmap=None, s=80, marker=None, step_bounds=None, add_cbar=True, norm=None, cbar_extend=None, add_mean_edgecolor=True, ax=None, ax_cbar=None, cbar_outline_visible=False, cbar_orientation=None, ref_label=None, stats_area_weighted=False, **kwargs)[source]

Plot map of normalised mean bias from instance of ColocatedData

Parameters:
  • coldata (ColocatedData) – data object

  • in_percent (bool) – plot bias in percent

  • vmin (int) – minimum value of colormapping

  • vmax (int) – maximum value of colormapping

  • cmap (str or cmap) – colormap used, defaults to bwr

  • s (int) – size of marker

  • marker (str) – marker used

  • step_bounds (int, optional) – step used for discrete colormapping (if None, continuous is used)

  • cbar_extend (str) – extend colorbar

  • ax (GeoAxes, optional) – axes into which the bias is supposed to be plotted

  • ax_cbar (plt.Axes, optional) – axes for colorbar

  • cbar_outline_visible (bool) – if False, borders of colorbar are removed

  • cbar_orientation (str) – e.g. ‘vertical’, defaults to ‘vertical’

  • **kwargs – keyword args passed to init_map()

Return type:

GeoAxes

pyaerocom.plot.mapping.set_map_ticks(ax, xticks=None, yticks=None)[source]

Set or update ticks in instance of GeoAxes object (cartopy)

Parameters:
  • ax (cartopy.GeoAxes) – map axes instance

  • xticks (iterable, optional) – ticks of x-axis (longitudes)

  • yticks (iterable, optional) – ticks of y-axis (latitudes)

Returns:

modified axes instance

Return type:

cartopy.GeoAxes

Plotting coordinates on maps

pyaerocom.plot.plotcoordinates.plot_coordinates(lons, lats, xlim=None, ylim=None, label=None, legend=True, color=None, marker=None, markersize=8, ax=None, **kwargs)[source]

Plot input coordinates on a map

Parameters:
  • lons (ndarray) – array of longitude coordinates (can also be list or tuple)

  • lats (ndarray) – array of latitude coordinates (can also be list or tuple)

  • xlim (tuple) – longitude range

  • ylim (tuple) – latitude range

  • label (str, optional) – label of data

  • legend (bool) – whether or not to display a legend, defaults to True.

  • color (str, optional) – color of markers, defaults to red

  • marker (str, optional) – marker shape, defaults to ‘o’

  • markersize (int) – size of markers

  • ax (GeoAxes) – axes instance to be plotted into

  • **kwargs – additional keyword args passed on to init_map()

Return type:

GeoAxes

Scatter plots

This module contains scatter plot routines for Aerocom data.

pyaerocom.plot.plotscatter.plot_scatter(x_vals, y_vals, **kwargs)[source]

Scatter plot

Currently a wrapper for high-level method plot_scatter_aerocom (same module, see there for details)

pyaerocom.plot.plotscatter.plot_scatter_aerocom(x_vals, y_vals, var_name=None, var_name_ref=None, x_name=None, y_name=None, start=None, stop=None, ts_type=None, unit=None, stations_ok=None, filter_name=None, lowlim_stats=None, highlim_stats=None, loglog=None, ax=None, figsize=None, fontsize_base=11, fontsize_annot=None, marker=None, color=None, alpha=0.5, **kwargs)[source]

Method that performs a scatter plot of data in AEROCOM format

Parameters:
  • y_vals (ndarray) – 1D array (or list) of model data points (y-axis)

  • x_vals (ndarray) – 1D array (or list) of observation data points (x-axis)

  • var_name (str, optional) – name of variable that is plotted

  • var_name_ref (str, optional) – name of variable of reference data

  • x_name (str, optional) – Name of observation network

  • y_name (str, optional) – Name / ID of model

  • start (str or datetime or similar) – start time of data

  • stop (str or datetime or similar) – stop time of data

  • ts_type (str) – frequency of data

  • unit (str, optional) – unit of data

  • stations_ok (int, optional) – number of stations from which data were generated

  • filter_name (str, optional) – name of filter

  • lowlim_stats (float, optional) – lower value considered for statistical parameters

  • highlim_stats (float, optional) – upper value considered for statistical parameters

  • loglog (bool, optional) – plot log log scale, if None, pyaerocom default is used

  • ax (Axes) – axes into which the data are to be plotted

  • figsize (tuple) – size of figure (if a new figure is created, i.e. ax is None)

  • fontsize_base (int) – basic fontsize, defaults to 11

  • fontsize_annot (int, optional) – fontsize used for annotations

  • marker (str, optional) – marker used for data, if None, ‘+’ is used

  • color (str, optional) – color of markers, default to ‘k’

  • alpha (float, optional) – transparency of markers (does not apply to all marker types), defaults to 0.5.

  • **kwargs – additional keyword args passed to ax.plot()

Returns:

plot axes

Return type:

matplotlib.axes.Axes

Heatmap plots

pyaerocom.plot.heatmaps.df_to_heatmap(df, cmap=None, center=None, low=0.3, high=0.3, vmin=None, vmax=None, color_rowwise=False, normalise_rows=False, normalise_rows_how=None, normalise_rows_col=None, norm_ref=None, sub_norm_before_div=True, annot=True, num_digits=None, ax=None, figsize=(12, 12), cbar=False, cbar_label=None, cbar_labelsize=None, xticklabels=None, xtick_rot=45, yticklabels=None, ytick_rot=45, xlabel=None, ylabel=None, title=None, labelsize=12, annot_fontsize=None, annot_fmt_rowwise=False, annot_fmt_exceed=None, annot_fmt_rows=None, cbar_ax=None, cbar_kws=None, **kwargs)[source]

Plot a pandas dataframe as heatmap

Parameters:
  • df (DataFrame) – table data

  • cmap (str, optional) – string specifying colormap to be used

  • center (float, optional) – value that is mapped to center colour of colormap (e.g. 0)

  • low (float, optional) – Extends the lower range of the table values so that when mapped to the colormap, its entire range isn’t used. E.g. 0.3 roughly corresponds to a colormap crop of 30% at the lower end.

  • high (float, optional) – Extends the upper range of the table values so that when mapped to the colormap, its entire range isn’t used. E.g. 0.3 roughly corresponds to a colormap crop of 30% at the upper end.

  • vmin (float, optional) – lower end of value range to be plotted. If specified, input arg low will be ignored.

  • vmax (float, optional) – upper end of value range to be plotted. If specified, input arg high will be ignored.

  • color_rowwise (bool, optional) – if True, the color mapping is applied row by row, else, for the whole table. Defaults to False.

  • normalise_rows (bool, optional) – if True, the table is normalised in a rowwise manner either using the mean value in each row (if argument normalise_rows_col is unspecified) or using the value in a specified column. Defaults to False.

  • normalise_rows_how (str, optional) – aggregation string for row normalisation. Choose from mean or median. Only relevant if input arg normalise_rows==True.

  • normalise_rows_col (int, optional) – if provided and if arg. normalise_rows==True, then the corresponding table column is used for normalisation rather than the mean value of each row.

  • norm_ref (float or ndarray, optional) – reference value(s) used for rowwise normalisation. Only relevant if normalise_rows is True. If specified, normalise_rows_how and normalise_rows_col will be ignored.

  • sub_norm_before_div (bool, optional) – if True, the rowwise normalisation is applied by subtracting the normalisation value for each row before dividing by it. This can be useful to visualise positive or negative departures from the mean or median.

  • annot (bool or list or ndarray, optional) – if True, the table values are printed into the heatmap. Defaults to True, in which case the values are computed based on the table content. If list or ndarray, the shape needs to be the same as input table shape (no of rows and cols), in which case the values of that 2D frame are used.

  • num_digits (int, optional) – number of digits printed in heatmap annotation.

  • ax (axes, optional) – matplotlib axes instance used for plotting, if None, an axes will be created

  • figsize (tuple, optional) – size of figure for plot

  • cbar (bool, optional) – if True, a colorbar is included

  • cbar_label (str, optional) – label of colorbar (if colorbar is included, see cbar option)

  • cbar_labelsize (int, optional) – size of colorbar label

  • xticklabels (list, optional) – List of x axis labels.

  • xtick_rot (int, optional) – rotation of x axis labels, defaults to 45 degrees.

  • yticklabels (list, optional) – List of string labels.

  • ytick_rot (int, optional) – rotation of y axis labels, defaults to 45 degrees.

  • xlabel (str, optional) – x axis label

  • ylabel (str, optional) – y axis label

  • title (str, optional) – title of heatmap

  • labelsize (int, optional) – fontsize of labels, default to 12

  • annot_fontsize (int, optional) – fontsize of annotated text.

  • annot_fmt_rowwise (bool) – rowwise formatting of annotation values, based on row value ranges. Defaults to False.

  • annot_fmt_exceed (list, optional) – how to format annotated values that exceed a certain threshold. The list contains 2 entries: first the threshold value, second the format applied to values exceeding that threshold. This parameter is only considered if annot_fmt_rowwise is True. See also _format_annot_heatmap().

  • annot_fmt_rows (list) – annotation formatting strings for each row of the input table. This parameter is only considered if annot_fmt_rowwise is True. See also _format_annot_heatmap().

  • cbar_ax (Axes, optional) – axes instance for colorbar, parsed to seaborn.heatmap().

  • cbar_kws (dict, optional) – keywords for colorbar formatting, parsed to seaborn.heatmap().

  • **kwargs – further keyword args parsed to seaborn.heatmap()

Raises:

ValueError – if input annot is list or ndarray and has a different shape than the input df.

Returns:

  • Axes – plot axes instance

  • list or None – annotation information for rows

Colors schemes

class pyaerocom.plot.config.ColorTheme(name='dark', cmap_map=None, color_coastline=None, cmap_map_div=None, cmap_map_div_shifted=True)[source]

Pyaerocom class specifying plotting color theme

name

name of color theme (e.g. “light” or “dark”)

Type:

str

cmap_map

name of colormap or colormap for map plotting

Type:

str

color_coastline

coastline color for map plotting

Type:

str

cmap_map_div

name of diverging colormap (used in map plots when plotted value range crosses 0)

Type:

str

cmap_map_div_shifted

boolean specifying whether center of diverging colormaps for map plots is supposed to be shifted to 0

Type:

bool

Example

Load default color theme

>>> theme = ColorTheme(name="dark")
>>> print(theme)
pyaerocom ColorTheme
name : dark
cmap_map : viridis
color_coastline : #e6e6e6

from_dict(info_dict)[source]

Import theme information from dictionary

Parameters:

info_dict (dict) – dictionary containing theme settings

load_default(theme_name='dark')[source]

Load default color theme

Parameters:

theme_name (str) – name of default theme

Raises:

ValueError – if theme_name is not a valid default theme

to_dict()[source]

Convert this object into dictionary

Returns:

dictionary representation of this object

Return type:

dict

pyaerocom.plot.config.get_color_theme(theme_name='dark')[source]

Plot helper functions

pyaerocom.plot.helpers.calc_figsize(lon_range, lat_range, figh=8)[source]

Calculate figure size based on data

The required figure width is computed based on the input height and the aspect ratio of the longitude and latitude arrays

Parameters:
  • lon_range (tuple) – 2-element tuple specifying longitude range (may also be list or array)

  • lat_range (tuple) – 2-element tuple specifying latitude range (may also be list or array)

  • figh (int) – figure height in inches

  • add_cbar (bool) – if True, the width is adapted accordingly

Returns:

2-element tuple containing figure width and height

Return type:

tuple

pyaerocom.plot.helpers.calc_pseudolog_cmaplevels(vmin, vmax, add_zero=False)[source]

Initiate pseudo-log discrete colormap levels

Parameters:
  • vmin (float) – lower end of colormap (e.g. minimum value of data)

  • vmax (float) – upper value of colormap (e.g. maximum value of data)

  • add_zero (bool) – if True, the lower bound is set to 0 (irrelevant if vmin is 0).

Returns:

list containing boundary array for discrete colormap (e.g. using BoundaryNorm)

Return type:

list

Example

>>> vmin, vmax = 0.02, 0.75
>>> vals = calc_pseudolog_cmaplevels(vmin, vmax, num_per_mag=10, add_zero=True)
>>> for val in vals: print("%.4f" %val)
0.0000
0.0100
0.0126
0.0158
0.0200
0.0251
0.0316
0.0398
0.0501
0.0631
0.0794
0.1000
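
The boundaries above are spaced logarithmically within each order of magnitude. A minimal sketch of that idea, covering only a single magnitude and not replicating the actual pyaerocom function:

```python
import math

def pseudolog_levels(vmin, num_per_mag=10, add_zero=False):
    """Generate pseudo-log colormap levels spanning one order of magnitude.

    Illustrative sketch only, not the pyaerocom implementation (which also
    handles value ranges spanning several magnitudes via vmax).
    """
    # snap the lower bound to the power of 10 below vmin
    low = math.floor(math.log10(vmin))
    levels = [10 ** (low + i / num_per_mag) for i in range(num_per_mag + 1)]
    if add_zero:
        levels.insert(0, 0.0)
    return levels
```

With vmin=0.02 and add_zero=True, this reproduces the 12 boundary values shown in the example above.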
pyaerocom.plot.helpers.custom_mpl(mpl_rcparams=None, default_large=True, **kwargs)[source]

Custom matplotlib settings

pyaerocom.plot.helpers.get_cmap_levels_auto(vmin, vmax, num_per_mag=10)[source]

Initiate pseudo-log discrete colormap levels

Note

This is a beta version.

Parameters:
  • vmin (float) – lower end of colormap (e.g. minimum value of data)

  • vmax (float) – upper value of colormap (e.g. maximum value of data)

pyaerocom.plot.helpers.get_cmap_ticks_auto(lvls, num_per_mag=3)[source]

Compute cmap ticks based on cmap levels

The cmap levels may be computed automatically using get_cmap_levels_auto().

Parameters:
  • lvls (list) – list containing colormap levels

  • num_per_mag (int) – desired number of ticks per magnitude

pyaerocom.plot.helpers.projection_from_str(projection_str='PlateCarree')[source]

Return instance of cartopy projection class based on string ID

Configuration and global constants

Basic configuration class

An instance is created on import of pyaerocom and is accessible via pyaerocom.const.

class pyaerocom.config.Config(config_file=None, try_infer_environment=True)[source]

Class containing relevant paths for read and write routines

A loaded instance of this class is created on import of pyaerocom and can be accessed via pyaerocom.const.

TODO: provide more information

AEOLUS_NAME = 'AeolusL2A'
AERONET_INV_V2L15_ALL_POINTS_NAME = 'AeronetInvV2Lev1.5.AP'
AERONET_INV_V2L15_DAILY_NAME = 'AeronetInvV2Lev1.5.daily'
AERONET_INV_V2L2_ALL_POINTS_NAME = 'AeronetInvV2Lev2.AP'
AERONET_INV_V2L2_DAILY_NAME = 'AeronetInvV2Lev2.daily'
AERONET_INV_V3L15_DAILY_NAME = 'AeronetInvV3Lev1.5.daily'

Aeronet V3 inversions

AERONET_INV_V3L2_DAILY_NAME = 'AeronetInvV3Lev2.daily'
AERONET_SUN_V2L15_AOD_ALL_POINTS_NAME = 'AeronetSun_2.0_NRT'
AERONET_SUN_V2L15_AOD_DAILY_NAME = 'AeronetSunV2Lev1.5.daily'

Aeronet Sun V2 access names

AERONET_SUN_V2L2_AOD_ALL_POINTS_NAME = 'AeronetSunV2Lev2.AP'
AERONET_SUN_V2L2_AOD_DAILY_NAME = 'AeronetSunV2Lev2.daily'
AERONET_SUN_V2L2_SDA_ALL_POINTS_NAME = 'AeronetSDAV2Lev2.AP'
AERONET_SUN_V2L2_SDA_DAILY_NAME = 'AeronetSDAV2Lev2.daily'

Aeronet SDA V2 access names

AERONET_SUN_V3L15_AOD_ALL_POINTS_NAME = 'AeronetSunV3Lev1.5.AP'
AERONET_SUN_V3L15_AOD_DAILY_NAME = 'AeronetSunV3Lev1.5.daily'

Aeronet Sun V3 access names

AERONET_SUN_V3L15_SDA_ALL_POINTS_NAME = 'AeronetSDAV3Lev1.5.AP'
AERONET_SUN_V3L15_SDA_DAILY_NAME = 'AeronetSDAV3Lev1.5.daily'

Aeronet SDA V3 access names

AERONET_SUN_V3L2_AOD_ALL_POINTS_NAME = 'AeronetSunV3Lev2.AP'
AERONET_SUN_V3L2_AOD_DAILY_NAME = 'AeronetSunV3Lev2.daily'
AERONET_SUN_V3L2_SDA_ALL_POINTS_NAME = 'AeronetSDAV3Lev2.AP'
AERONET_SUN_V3L2_SDA_DAILY_NAME = 'AeronetSDAV3Lev2.daily'
property ALL_DATABASE_IDS

ID’s of available database configurations

property CACHEDIR

Cache directory for UngriddedData objects

property CACHING

Activate writing of and reading from cache files

CLIM_FREQ = 'daily'
CLIM_MIN_COUNT = {'daily': 30, 'monthly': 5}
CLIM_RESAMPLE_HOW = 'mean'
CLIM_START = 2005
CLIM_STOP = 2015
property COLOCATEDDATADIR

Directory for accessing and saving colocated data objects

property COORDINFO

Instance of VarCollection containing coordinate info

property DATA_SEARCH_DIRS

Directories which pyaerocom will consider for data access

Note

This corresponds to directories considered for searching gridded data (e.g. models and level 3 satellite products). Please see OBSLOCS_UNGRIDDED for available data directories for reading of ungridded data.

Returns:

list of directories

Return type:

list

DEFAULT_REG_FILTER = 'ALL-wMOUNTAINS'
DEFAULT_VERT_GRID_DEF = {'lower': 0, 'step': 250, 'upper': 15000}

Information specifying default vertical grid for post processing of profile data. The values are in units of m.

DMS_AMS_CVO_NAME = 'DMS_AMS_CVO'

DMS

DONOTCACHEFILE = None
property DOWNLOAD_DATADIR

Directory where data is downloaded into

EARLINET_NAME = 'EARLINET'

Earlinet access name;

EBAS_DB_LOCAL_CACHE = True

boolean specifying whether EBAS DB is copied to local cache for faster access, defaults to True

property EBAS_FLAGS_FILE

Location of CSV file specifying meaning of EBAS flags

EBAS_MULTICOLUMN_NAME = 'EBASMC'

EBAS name

EEA_NAME = 'EEAAQeRep'

EEA name

EEA_NRT_NAME = 'EEAAQeRep.NRT'

EEA.NRT name

EEA_V2_NAME = 'EEAAQeRep.v2'

EEAV2 name

property ERA5_SURFTEMP_FILE
ERA5_SURFTEMP_FILENAME = 'era5.msl.t2m.201001-201012.nc'
property ETOPO1_AVAILABLE

Boolean specifying if access to ETOPO1 dataset is provided

Return type:

bool

property FILTERMASKKDIR
GAWTADSUBSETAASETAL_NAME = 'GAWTADsubsetAasEtAl'

GAW TAD subset (Aas et al. paper)

GRID_IO

Settings for reading and writing of gridded data

property HOMEDIR

Home directory of user

HTAP_REGIONS = ['PAN', 'EAS', 'NAF', 'MDE', 'LAND', 'SAS', 'SPO', 'OCN', 'SEA', 'RBU', 'EEUROPE', 'NAM', 'WEUROPE', 'SAF', 'USA', 'SAM', 'EUR', 'NPO', 'MCA']
ICOS_NAME = 'ICOS'

ICOS name

ICPFORESTS_NAME = 'ICPFORESTS'

ICP Forests

property LOCAL_TMP_DIR

Local TEMP directory

property LOGFILESDIR

Directory where logfiles are stored

MAX_YEAR = 20000

Highest possible year in data

MEP_NAME = 'MEP'

MEP name

MIN_YEAR = 0

Lowest possible year in data

OBS_ALLOW_ALT_WAVELENGTHS = True

This boolean can be used to enable / disable the use of alternative wavelengths (i.e. use available wavelengths of a variable within a certain tolerance range around the nominal variable wavelength).

property OBS_IDS_UNGRIDDED

List of all data IDs of supported ungridded observations

OBS_MIN_NUM_RESAMPLE = {'daily': {'hourly': 6}, 'hourly': {'minutely': 15}, 'monthly': {'daily': 7}, 'yearly': {'monthly': 3}}

Minimum number of observations for time resampling between certain frequency combinations; the first level refers to the target (TO) frequency, the second to the source (FROM) frequency, and the values are the required minimum number of observations
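
Read the nesting of OBS_MIN_NUM_RESAMPLE as target (TO) frequency → source (FROM) frequency → minimum sample count. A small hypothetical lookup helper illustrating this (not part of pyaerocom):

```python
# Values mirror the OBS_MIN_NUM_RESAMPLE mapping shown above
OBS_MIN_NUM_RESAMPLE = {
    "yearly": {"monthly": 3},
    "monthly": {"daily": 7},
    "daily": {"hourly": 6},
    "hourly": {"minutely": 15},
}

def min_num_obs(to_freq, from_freq):
    """Minimum samples needed to resample from_freq -> to_freq (0 if unconstrained)."""
    return OBS_MIN_NUM_RESAMPLE.get(to_freq, {}).get(from_freq, 0)
```

For example, aggregating daily values to monthly means requires at least 7 daily samples per month.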

OBS_WAVELENGTH_TOL_NM = 10.0

Wavelength tolerance for observations imports

OLD_AEROCOM_REGIONS = ['ALL', 'ASIA', 'AUSTRALIA', 'CHINA', 'EUROPE', 'INDIA', 'NAFRICA', 'SAFRICA', 'SAMERICA', 'NAMERICA']
property OUTPUTDIR

Default output directory

REVISION_FILE = 'Revision.txt'

Name of the file containing the revision string of an obs data network

RH_MAX_PERCENT_DRY = 40

maximum allowed RH to be considered dry

RM_CACHE_OUTDATED = True
property ROOTDIR

Local root directory

SENTINEL5P_NAME = 'Sentinel5P'
SERVER_CHECK_TIMEOUT = 1

timeout to check if one of the supported server locations can be accessed

STANDARD_COORD_NAMES = ['latitude', 'longitude', 'altitude']

standard names for coordinates

URL_HTAP_MASKS = 'https://pyaerocom.met.no/pyaerocom-suppl/htap_masks/'
property VARS

Instance of class VarCollection (for default variable information)

property VAR_PARAM

Deprecated name, please use VARS instead

add_data_search_dir(*dirs)[source]

Add data search directories for database browsing

add_ungridded_obs(obs_id, data_dir, reader=None, check_read=False)[source]

Add a network to the data search structure

Parameters:
  • obs_id (str) – name of network. E.g. MY_OBS or EBASMC

  • data_dir (str) – directory where data files are stored

  • reader (pyaerocom.io.ReadUngriddedBase, optional) – reading class used to import these data. If obs_id is known (e.g. EBASMC) this is not needed.

Raises:
  • AttributeError – if the network name is already reserved in OBSLOCS_UNGRIDDED

  • ValueError – if the data directory does not exist

add_ungridded_post_dataset(obs_id, obs_vars, obs_aux_requires, obs_merge_how, obs_aux_funs=None, obs_aux_units=None, **kwargs)[source]

Register new ungridded dataset

Unlike add_ungridded_obs(), this method adds the required logic for a “virtual” ungridded observation dataset, that is, a dataset that can only be computed from other ungridded datasets and not read from disk.

If all input parameters are okay, the new dataset will be registered in OBS_UNGRIDDED_POST and will then be accessible for import in ungridded reading factory class pyaerocom.io.ReadUngridded.

Parameters:
  • obs_id (str) – Name of new dataset.

  • obs_vars (str or list) – variables supported by this dataset.

  • obs_aux_requires (dict) – dictionary specifying required datasets and variables for each variable supported by the auxiliary dataset.

  • obs_merge_how (str or dict) – info on how to derive each of the supported coordinates (e.g. eval, combine). For valid input args see pyaerocom.combine_vardata_ungridded. If value is string, then the same method is used for all variables.

  • obs_aux_funs (dict, optional) – dictionary specifying computation methods for auxiliary variables that are supposed to be retrieved via obs_merge_how='eval'. Keys are variable names, values are the respective computation methods (which need to be strings, as they will be evaluated via pandas.DataFrame.eval() in pyaerocom.combine_vardata_ungridded). This input is optional, but mandatory if any of the obs_vars is supposed to be retrieved via merge_how='eval'.

  • obs_aux_units (dict, optional) – output units of auxiliary variables (only needed for variables that are derived via merge_how='eval')

  • **kwargs – additional keyword arguments (unused, but allow parsing info from dictionaries and classes that contain attributes other than the ones needed here).

Raises:

ValueError – if input obs_id is already reserved

Return type:

None.

property cache_basedir

Base directory for caching

The actual files are cached in user subdirectory, cf CACHEDIR

property ebas_flag_info

Information about EBAS flags

Note

Is loaded upon request -> cf. pyaerocom.io.ebas_nasa_ames.EbasFlagCol.FLAG_INFO

Dictionary containing 3 dictionaries (keys: `valid, values, info`) that contain information about the validity of each flag (`valid`), their actual values (`values`, e.g. V, M, I) and a description of each flag (`info`)

property has_access_lustre

Boolean specifying whether MetNO AeroCom server is accessible

property has_access_users_database

Boolean specifying whether the users database can be accessed

infer_basedir_and_config()[source]

make_default_vert_grid()[source]

Makes default vertical grid for resampling of profile data

path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/pyaerocom/checkouts/stable/pyaerocom/data/coords.ini')
read_config(config_file, basedir=None, init_obslocs_ungridded=False, init_data_search_dirs=False)[source]

Import paths from one of the config ini files

Parameters:
  • config_file (str) – file location of config ini file

  • basedir (str, optional) – Base directory to be used for relative model and obs dirs specified via BASEDIR in config file. If None, then the BASEDIR value in the config file is used. The default is None.

  • init_obslocs_ungridded (bool, optional) – If True, OBSLOCS_UNGRIDDED will be re-instantiated (i.e. all currently set obs locations will be deleted). The default is False.

  • init_data_search_dirs (bool, optional) – If True, DATA_SEARCH_DIRS will be re-instantiated (i.e. all currently set data search directories will be deleted). The default is False.

Raises:

FileNotFoundError – If input config file is not a file or does not exist.

Return type:

None.

reload(keep_basedirs=True)[source]

Reload config file (for details see read_config())

short_str()[source]

Deprecated method

property user

User ID

Config defaults related to gridded data

class pyaerocom.grid_io.GridIO(**kwargs)[source]

Global I/O settings for gridded data

This class includes options related to the import of gridded data, both for file search and for preprocessing.

FILE_TYPE

file type of data files. Defaults to .nc

Type:

str

TS_TYPES

list of strings specifying temporal resolution options encoded in file names.

Type:

list

PERFORM_FMT_CHECKS

perform formatting checks when reading netcdf data, using metadata encoded in filenames (requires that NetCDF file follows a registered naming convention)

Type:

bool

DEL_TIME_BOUNDS

if True, preexisting bounds on time are deleted when grid data is loaded. Else, nothing is done. Aerocom default is True

Type:

bool

SHIFT_LONS

if True, longitudes are shifted to -180 <= lon <= 180 when data is loaded (in case they are defined as 0 <= lon <= 360). Aerocom default is True.

Type:

bool

CHECK_TIME_FILENAME

the times stored in NetCDF files may be wrong or not stored according to the CF conventions. If True, the times are checked and, if CORRECT_TIME_FILENAME is set, corrected on data import based on what is encoded in the file name. In case of Aerocom models, it is ensured that the filename contains both the year and the temporal resolution (for details see pyaerocom.io.FileConventionRead). Aerocom default is True

Type:

bool

CORRECT_TIME_FILENAME

if True and time dimension in data is found to be different from filename, it is attempted to be corrected

Type:

bool

EQUALISE_METADATA

if True (and if metadata varies between different NetCDF files that are supposed to be merged in time), the metadata in all loaded objects is unified based on the metadata of the first grid (otherwise, concatenating them in time might not work using the Iris interface). This might need to be reviewed and should be used with care if specific metadata aspects of individual files need to be accessed. Aerocom default is True

Type:

bool

USE_FILECONVENTION

if True, file names are strictly required to follow one of the file naming conventions that can be specified in the file file_conventions.ini. Aerocom default is True.

Type:

bool

INCLUDE_SUBDIRS

if True, search for files is expanded to all subdirectories included in data directory. Aerocom default is False.

Type:

bool

INFER_SURFACE_LEVEL

if True then surface level for 4D gridded data is inferred automatically when necessary (e.g. when extracting surface time series from 4D gridded data object that does not contain sufficient information about vertical dimension)

Type:

bool

UNITS_ALIASES = {'/m': 'm-1'}
from_dict(dictionary=None, **settings)[source]

Import settings from dictionary

load_aerocom_default()[source]
load_default()[source]
to_dict()[source]

Convert object to dictionary

Returns:

settings dictionary

Return type:

dict

Config details related to observations

Settings and helper methods / classes for I/O of observation data

Note

Some settings like paths etc can be found in pyaerocom.config.py

class pyaerocom.obs_io.AuxInfoUngridded(data_id, vars_supported, aux_requires, aux_merge_how, aux_funs=None, aux_units=None)[source]
MAX_VARS_PER_METHOD = 2
check_status()[source]

Check if specifications are correct and consistent

Raises:
  • ValueError – If one of the class attributes is invalid

  • NotImplementedError – If computation method contains more than 2 variables / datasets

to_dict()[source]

Dictionary representation of this object

Ignores any potential private attributes.

pyaerocom.obs_io.OBS_ALLOW_ALT_WAVELENGTHS = True

This boolean can be used to enable / disable the former (i.e. use available wavelengths of variable in a certain range around variable wavelength).

pyaerocom.obs_io.OBS_WAVELENGTH_TOL_NM = 10.0

Wavelength tolerance for observations if data for required wavelength is not available

class pyaerocom.obs_io.ObsVarCombi(obs_id, var_name)[source]

Molar masses and related helpers

exception pyaerocom.molmasses.UnkownSpeciesError[source]
pyaerocom.molmasses.get_mmr_to_vmr_fac(var_name)[source]

Get conversion factor for MMR -> VMR conversion for input variable

Note

Assumes dry air molar mass

Parameters:

var_name (str) – Name of variable to be converted

Returns:

multiplication factor to convert MMR -> VMR

Return type:

float
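
The conversion uses the ratio of molar masses: VMR = MMR · (M_air / M_species). A hedged sketch of the arithmetic with assumed molar-mass values (the real function looks species masses up via variables.ini):

```python
M_AIR_DRY = 28.9647  # g/mol, molar mass of dry air (assumed value)

# example molar masses in g/mol (illustrative subset, not pyaerocom's table)
MOLMASSES = {"so2": 64.066, "o3": 47.997}

def mmr_to_vmr_fac(species):
    """Factor such that VMR = MMR * factor, assuming dry air (sketch)."""
    return M_AIR_DRY / MOLMASSES[species]
```

Since both example species are heavier than dry air, the factor is below 1 in both cases.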

pyaerocom.molmasses.get_molmass(var_name)[source]

Get molar mass for input variable

Parameters:

var_name (str) – pyaerocom variable name (cf. variables.ini) or name of species

Returns:

molar mass of species in units of g/mol

Return type:

float

pyaerocom.molmasses.get_species(var_name)[source]

Get species name from variable name

Parameters:

var_name (str) – pyaerocom variable name (cf. variables.ini)

Raises:

UnkownSpeciesError – if species cannot be inferred

Returns:

name of species

Return type:

str
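
pyaerocom variable names typically combine a quantity prefix with a species name (e.g. vmrso2). A sketch of such inference with an illustrative, incomplete prefix list (the actual function resolves species via pyaerocom's variable definitions in variables.ini):

```python
# illustrative prefix list; the real resolution logic is more involved
PREFIXES = ("conc", "vmr", "mmr", "wet", "dry")

def infer_species(var_name):
    """Strip a known quantity prefix from a variable name to get the species (sketch)."""
    for prefix in PREFIXES:
        if var_name.startswith(prefix):
            return var_name[len(prefix):]
    return var_name
```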

Access to minimal test dataset

Low-level helper classes and functions

Small helper utility functions for pyaerocom

class pyaerocom._lowlevel_helpers.AsciiFileLoc(default=None, assert_exists=False, auto_create=False, tooltip=None, logger=None)[source]
create(value)[source]
class pyaerocom._lowlevel_helpers.BrowseDict(*args, **kwargs)[source]

Dictionary-like object with getattr and setattr options

Extended dictionary that supports dynamic value generation (i.e. if an assigned value is callable, it will be executed on demand).

ADD_GLOB = []
FORBIDDEN_KEYS = []
IGNORE_JSON = []

Keys to be ignored when converting to json

MAXLEN_KEYS = 100.0
SETTER_CONVERT = {}
import_from(other) None[source]

Import key value pairs from other object

Unlike update(), this method will silently ignore input keys that are not contained in this object.

Parameters:

other (dict or BrowseDict) – other dict-like object containing content to be updated.

Raises:

ValueError – If input is of invalid type.

Return type:

None

items() a set-like object providing a view on D's items[source]
json_repr() dict[source]

Convert object to serializable json dict

Returns:

content of class

Return type:

dict

keys() a set-like object providing a view on D's keys[source]
pretty_str()[source]
to_dict()[source]
values() an object providing a view on D's values[source]
class pyaerocom._lowlevel_helpers.ConstrainedContainer(*args, **kwargs)[source]

Restrictive dict-like class with fixed keys

This class makes it possible to create dict-like objects that have a fixed set of keys and value types (once assigned). Optional values may be instantiated as None, in which case the first assignment defines their type.

Note

The limitations for assignments are only restricted to setitem operations and attr assignment via “.” works like in every other class.

Example

class MyContainer(ConstrainedContainer):
    def __init__(self):
        self.val1 = 1
        self.val2 = 2
        self.option = None

>>> mc = MyContainer()
>>> mc['option'] = 42
CRASH_ON_INVALID = True
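
The fixed-key behaviour can be sketched with a small dict subclass (illustrative only; the real ConstrainedContainer also fixes value types once assigned):

```python
class FixedKeysDict(dict):
    """Sketch of a dict with a fixed key set (not the pyaerocom class)."""

    def __setitem__(self, key, val):
        # only keys present at construction time may be assigned
        if key not in self:
            raise KeyError(f"invalid key {key!r}; allowed keys: {sorted(self)}")
        super().__setitem__(key, val)
```

Assigning to an existing key works; assigning a new key raises KeyError, mirroring the restriction described above.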
class pyaerocom._lowlevel_helpers.DictStrKeysListVals[source]
validate(val: dict)[source]
class pyaerocom._lowlevel_helpers.DictType[source]
validate(val)[source]
class pyaerocom._lowlevel_helpers.DirLoc(default=None, assert_exists=False, auto_create=False, tooltip=None, logger=None)[source]
create(value)[source]
class pyaerocom._lowlevel_helpers.EitherOf(allowed: list)[source]
validate(val)[source]
class pyaerocom._lowlevel_helpers.FlexList[source]

list that can be instantiated from a str, tuple, list or None input

validate(val)[source]
class pyaerocom._lowlevel_helpers.ListOfStrings[source]
validate(val)[source]
class pyaerocom._lowlevel_helpers.Loc(default=None, assert_exists=False, auto_create=False, tooltip=None, logger=None)[source]

Abstract descriptor representing a path location

For an introduction to descriptors, see: https://docs.python.org/3/howto/descriptor.html#complete-practical-example

Note

  • Child classes need to implement create()

  • value is allowed to be None in which case no checks are performed

abstract create(value)[source]
validate(value)[source]
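
A minimal sketch of the descriptor pattern such a location class builds on (a hypothetical PathLoc, simplified to string validation only; the real Loc additionally supports existence checks and auto-creation):

```python
class PathLoc:
    """Sketch of a data descriptor validating path-like attributes."""

    def __set_name__(self, owner, name):
        # store the value under a private attribute on the owning instance
        self._name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self._name, None)

    def __set__(self, obj, value):
        # None disables checks, mirroring the note above
        if value is not None and not isinstance(value, str):
            raise ValueError(f"expected str or None, got {type(value).__name__}")
        setattr(obj, self._name, value)

class MyConfig:
    outdir = PathLoc()
```

Validation runs on every assignment, so every instance of MyConfig gets the same checking behaviour for free.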
class pyaerocom._lowlevel_helpers.NestedContainer(*args, **kwargs)[source]
keys_unnested() list[source]
update([E, ]**F) None.  Update D from mapping/iterable E and F.[source]

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

class pyaerocom._lowlevel_helpers.StrType[source]
validate(val)[source]
class pyaerocom._lowlevel_helpers.StrWithDefault(default: str)[source]
validate(val)[source]
class pyaerocom._lowlevel_helpers.TypeValidator(type)[source]
validate(val)[source]
class pyaerocom._lowlevel_helpers.Validator[source]
abstract validate(val)[source]
pyaerocom._lowlevel_helpers.check_dir_access(path)[source]

Uses multiprocessing approach to check if location can be accessed

Parameters:

path (str) – location that is supposed to be checked

Returns:

True, if location is accessible, else False

Return type:

bool

pyaerocom._lowlevel_helpers.check_dirs_exist(*dirs, **add_dirs)[source]
pyaerocom._lowlevel_helpers.check_write_access(path)[source]

Check if input location provides write access

Parameters:

path (str) – directory to be tested
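
One common way to implement such a check, sketched here as an assumption about the approach (the actual pyaerocom implementation may differ), is to try creating and removing a throwaway file:

```python
import os
import uuid

def check_write_access(path):
    """Check write access by creating and removing a probe file (sketch)."""
    if not os.path.isdir(path):
        return False
    # random suffix avoids clobbering existing files
    probe = os.path.join(path, f".write_probe_{uuid.uuid4().hex}")
    try:
        with open(probe, "w"):
            pass
        os.remove(probe)
        return True
    except OSError:
        return False
```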

pyaerocom._lowlevel_helpers.chk_make_subdir(base, name)[source]

Check if sub-directory exists in parent directory

pyaerocom._lowlevel_helpers.dict_to_str(dictionary, indent=0, ignore_null=False)[source]

Custom function to convert dictionary into string (e.g. for print)

Parameters:
  • dictionary (dict) – the dictionary

  • indent (int) – indent of dictionary content

  • ignore_null (bool) – if True, None entries in dictionary are ignored

Returns:

the modified input string

Return type:

str

pyaerocom._lowlevel_helpers.invalid_input_err_str(argname, argval, argopts)[source]

Just a small helper to format an input error string for functions

Parameters:
  • argname (str) – name of input argument

  • argval – (invalid) value of input argument

  • argopts – possible input args for arg

Returns:

formatted string that can be parsed to an Exception

Return type:

str

pyaerocom._lowlevel_helpers.list_to_shortstr(lst, indent=0)[source]

Custom function to convert a list into a short string representation

pyaerocom._lowlevel_helpers.merge_dicts(dict1, dict2, discard_failing=True)[source]

Merge two dictionaries

Parameters:
  • dict1 (dict) – first dictionary

  • dict2 (dict) – second dictionary

  • discard_failing (bool) – if True, any key / value pair that cannot be merged from the 2nd dict into the first will be skipped, meaning the value of the output dict for that key will be the one of the first input dict. All keys that could not be merged can be accessed via key ‘merge_failed’ in the output dict. If False, any Exceptions that may occur will be raised.

Returns:

merged dictionary

Return type:

dict

pyaerocom._lowlevel_helpers.sort_dict_by_name(d, pref_list: list | None = None) dict[source]

Sort entries of input dictionary by their names and return ordered

Parameters:
  • d (dict) – input dictionary

  • pref_list (list, optional) – preferred order of items (may be subset of keys in input dict)

Returns:

sorted and ordered dictionary

Return type:

dict
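
The described ordering can be sketched as follows (an illustrative reimplementation, not the pyaerocom source):

```python
def sort_dict_by_name(d, pref_list=None):
    """Preferred keys first (in given order), remaining keys alphabetically (sketch)."""
    pref_list = pref_list or []
    # keys from pref_list that actually occur in d, in the preferred order
    ordered = {k: d[k] for k in pref_list if k in d}
    # remaining keys appended in alphabetical order
    for key in sorted(d):
        if key not in ordered:
            ordered[key] = d[key]
    return ordered
```

This relies on dicts preserving insertion order (guaranteed since Python 3.7).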

pyaerocom._lowlevel_helpers.str_underline(title: str, indent: int = 0)[source]

Create underlined string

Custom exceptions

Module containing pyaerocom custom exceptions

exception pyaerocom.exceptions.AeronetReadError[source]
exception pyaerocom.exceptions.CacheReadError[source]
exception pyaerocom.exceptions.CacheWriteError[source]
exception pyaerocom.exceptions.CachingError[source]
exception pyaerocom.exceptions.ColocationError[source]
exception pyaerocom.exceptions.ColocationSetupError[source]
exception pyaerocom.exceptions.CoordinateError[source]
exception pyaerocom.exceptions.CoordinateNameError[source]
exception pyaerocom.exceptions.DataCoverageError[source]
exception pyaerocom.exceptions.DataDimensionError[source]
exception pyaerocom.exceptions.DataExtractionError[source]
exception pyaerocom.exceptions.DataIdError[source]
exception pyaerocom.exceptions.DataQueryError[source]
exception pyaerocom.exceptions.DataRetrievalError[source]
exception pyaerocom.exceptions.DataSearchError[source]
exception pyaerocom.exceptions.DataSourceError[source]
exception pyaerocom.exceptions.DataUnitError[source]
exception pyaerocom.exceptions.DeprecationError[source]
exception pyaerocom.exceptions.DimensionOrderError[source]
exception pyaerocom.exceptions.EEAv2FileError[source]
exception pyaerocom.exceptions.EbasFileError[source]
exception pyaerocom.exceptions.EntryNotAvailable[source]
exception pyaerocom.exceptions.EvalEntryNameError[source]
exception pyaerocom.exceptions.FileConventionError[source]
exception pyaerocom.exceptions.InitialisationError[source]
exception pyaerocom.exceptions.LongitudeConstraintError[source]
exception pyaerocom.exceptions.MetaDataError[source]
exception pyaerocom.exceptions.NasaAmesReadError[source]
exception pyaerocom.exceptions.NetcdfError[source]
exception pyaerocom.exceptions.NetworkNotImplemented[source]
exception pyaerocom.exceptions.NetworkNotSupported[source]
exception pyaerocom.exceptions.NotInFileError[source]
exception pyaerocom.exceptions.ResamplingError[source]
exception pyaerocom.exceptions.StationCoordinateError[source]
exception pyaerocom.exceptions.StationNotFoundError[source]
exception pyaerocom.exceptions.TemporalResolutionError[source]
exception pyaerocom.exceptions.TemporalSamplingError[source]
exception pyaerocom.exceptions.TimeMatchError[source]
exception pyaerocom.exceptions.TimeZoneError[source]
exception pyaerocom.exceptions.UnitConversionError[source]
exception pyaerocom.exceptions.UnknownRegion[source]
exception pyaerocom.exceptions.UnresolvableTimeDefinitionError[source]

Is raised if time definition in NetCDF file is wrong and cannot be corrected

exception pyaerocom.exceptions.VarNotAvailableError[source]
exception pyaerocom.exceptions.VariableDefinitionError[source]
exception pyaerocom.exceptions.VariableNotFoundError[source]