Core API
Documentation of the core API of pyaerocom.
Logging
pyaerocom initializes logging automatically on import in the following way:
- info-messages or worse are logged to logs/pyaerocom.log.$PID or (dynamic feature) the file given in the environment variable PYAEROCOM_LOG_FILE
- (dynamic feature) these log-files will be deleted after 7 days.
- warning-messages or worse are also printed on stdout. (dynamic feature) Output to stdout is disabled if the script is called non-interactively.
Besides the default records as defined in https://docs.python.org/3/library/logging.html#logrecord-attributes pyaerocom also adds a special mem_usage keyword to be able to detect memory-leaks of the python process early.
Putting a file with the name logging.ini in the script's current working directory will make pyaerocom use that configuration instead of the default described above. An example logging.ini, doing roughly the same as described above (except for the dynamic features) and enabling debug logging on one package (pyaerocom.io.ungridded), is provided here:
[loggers]
keys=root,pyaerocom-ungridded
[handlers]
keys=console,file
[formatters]
keys=plain,detailed
[formatter_plain]
format=%(message)s
[formatter_detailed]
format=%(asctime)s:%(name)s:%(mem_usage)s:%(levelname)s:%(message)s
datefmt=%F %T
[handler_console]
class=StreamHandler
formatter=plain
args=(sys.stdout,)
level=WARN
[handler_file]
class=FileHandler
formatter=detailed
level=DEBUG
file_name=logs/pyaerocom.log.%(pid)s
args=('%(file_name)s', "w")
[logger_root]
handlers=file,console
level=INFO
[logger_pyaerocom-ungridded]
handlers=file
qualname=pyaerocom.io.readungriddedbase
level=DEBUG
propagate=0
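Example
A minimal sketch of how log records reach the handlers configured above (logger name taken from the qualname in the example file; message strings are illustrative):
>>> import logging
>>> logger = logging.getLogger("pyaerocom.io.readungriddedbase")
>>> logger.debug("goes to the log file only")
>>> logger.warning("goes to the log file and to stdout")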
Data classes
Gridded data
- class pyaerocom.griddeddata.GriddedData(input=None, var_name=None, check_unit=True, convert_unit_on_init=True, proj_info: ProjectionInformation | None = None, **meta)[source]
pyaerocom object representing gridded data (e.g. model diagnostics)
Gridded data refers to data that can be represented on a regular, multidimensional grid. In pyaerocom this comprises both model output and diagnostics as well as gridded level 3 satellite data, typically with dimensions latitude, longitude, time (for surface or columnar data) and an additional dimension lev (or similar) for vertically resolved data.
Under the hood, this data object is based on (but not inherited from) the iris.cube.Cube object and makes extensive use of the functionality implemented therein (many methods implemented here in GriddedData are simply wrappers for Cube methods).
Note
Note that the implemented functionality in this class is mostly limited to what is needed in the pyaerocom API (e.g. for pyaerocom.colocation routines or data import) and is not aimed at replacing or competing with similar data classes such as iris.cube.Cube or xarray.DataArray. Rather, depending on the use case, one or another of such gridded data objects is needed for optimal processing, which is why GriddedData provides methods and / or attributes to convert to or from other such data classes (e.g. GriddedData.cube is an instance of iris.cube.Cube and the method GriddedData.to_xarray() can be used to convert to xarray.DataArray). Thus, GriddedData can be considered rather high-level compared to the other mentioned data classes from iris or xarray.
Note
Since the GriddedData object is based on the iris.cube.Cube object, it is optimised for netCDF files that follow the CF conventions and may not work out of the box for files that do not follow this standard.
- Parameters:
input (str or Cube) – data input. Can be a single .nc file or a preloaded iris Cube.
var_name (str, optional) – variable name that is extracted if input is a file path. Irrelevant if input is a preloaded Cube.
check_unit (bool) – if True, the assigned unit is checked and if it is an alias to another unit the unit string will be updated. It will print a warning if the unit is invalid or not equal to the associated AeroCom unit for the input variable. Set convert_unit_on_init to True, if you want an automatic conversion to AeroCom units. Defaults to True.
convert_unit_on_init (bool) – if True and if the unit check indicates non-conformity with the AeroCom unit, the unit will be converted automatically, and a warning will be printed if that conversion fails. Defaults to True.
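Example
A minimal instantiation sketch (the file path is hypothetical):
>>> from pyaerocom import GriddedData
>>> data = GriddedData("path/to/model_output.nc", var_name="od550aer")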
- COORDS_ORDER_TSERIES = ['time', 'latitude', 'longitude']
Req. order of dimension coordinates for time-series computation
- SUPPORTED_VERT_SCHEMES = ['mean', 'max', 'min', 'surface', 'altitude', 'profile']
- property TS_TYPES
List with valid filename encodings specifying temporal resolution
- aerocom_savename(data_id=None, var_name=None, vert_code=None, year=None, ts_type=None)[source]
Get filename for saving following AeroCom conventions
- Parameters:
data_id (str, optional) – data ID used in output filename. Defaults to None, in which case data_id is used.
var_name (str, optional) – variable name used in output filename. Defaults to None, in which case var_name is used.
vert_code (str, optional) – vertical code used in output filename (e.g. Surface, Column, ModelLevel). Defaults to None, in which case the assigned value in metadata is used.
year (str, optional) – year to be used in filename. If None, then it is attempted to be inferred from values in the time dimension.
ts_type (str, optional) – frequency string to be used in filename. If None, then ts_type is used.
- Raises:
ValueError – if vertical code is not provided and cannot be inferred or if year is not provided and data is not single year. Note that if year is provided, then no sanity checking is done against time dimension.
- Returns:
output filename following AeroCom Phase 3 conventions.
- Return type:
str
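Example
A usage sketch (all inputs are optional; unspecified fields are taken from the object's metadata):
>>> fname = data.aerocom_savename(var_name="od550aer", vert_code="Column",
...                               year="2010", ts_type="monthly")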
- property altitude_access
- property area_weights
Area weights of lat / lon grid
- property base_year
Base year of time dimension
Note
Changing this attribute will update the time-dimension.
- change_base_year(new_year, inplace=True)[source]
Changes base year of time dimension
Relevant, e.g. for climatological analyses.
Note
This method does not account for offsets arising from leap years (affecting daily or higher resolution data). It is thus recommended to use this method with care. E.g. if you use this method on a 2016 daily data object, containing a calendar that supports leap years, you’ll end up with 366 time stamps also in the new data object.
- Parameters:
new_year (int) – new base year of the time dimension
inplace (bool) – if True, this object is modified, else a copy
- Returns:
modified data object
- Return type:
GriddedData
- check_dimcoords_tseries() None [source]
Check order of dimension coordinates for time series retrieval
For computation of time series at certain lon / lat coordinates, the data dimensions have to be in a certain order specified by COORDS_ORDER_TSERIES.
This method checks the current order (and dimensionality) of data and raises appropriate errors.
- Raises:
DataDimensionError – if dimension of data is not supported (currently, 3D or 4D data is supported)
NotImplementedError – if one of the required coordinates is associated with more than one dimension.
DimensionOrderError – if dimensions are not in the right order (in which case reorder_dimensions_tseries() may be used to catch the Exception)
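Example
A sketch of the intended usage pattern (assuming DimensionOrderError is importable from pyaerocom.exceptions):
>>> from pyaerocom.exceptions import DimensionOrderError
>>> try:
...     data.check_dimcoords_tseries()
... except DimensionOrderError:
...     data.reorder_dimensions_tseries()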
- collapsed(coords, aggregator, **kwargs)[source]
Collapse cube
Reimplementation of the method iris.cube.Cube.collapsed(); for details see the iris documentation.
- Parameters:
coords (str or list) – string IDs of coordinate(s) that are to be collapsed (e.g. ["longitude", "latitude"])
aggregator (str or Aggregator or WeightedAggregator) – the aggregator used. If input is string, it is converted into the corresponding iris Aggregator object, see str_to_iris() for valid strings
**kwargs – additional keyword args (e.g. weights)
- Returns:
collapsed data object
- Return type:
GriddedData
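Example
A sketch collapsing the horizontal dimensions to an area-weighted mean, with weights taken from the area_weights attribute:
>>> ts = data.collapsed(["longitude", "latitude"], "mean",
...                     weights=data.area_weights)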
- property computed
- property concatenated
- property coord_names
List containing coordinate names
- property coords_order
Array containing the order of coordinates
- copy_coords(other, inplace=True)[source]
Copy all coordinates from other data object
Requires the underlying data to be the same shape.
Warning
This operation will delete all existing coordinates and auxiliary coordinates and will then copy the ones from the input data object. No checks of any kind will be performed
- Parameters:
other (GriddedData or Cube) – other data object (needs to be same shape as this object)
inplace (bool) – if True, then this object will be modified and returned, else a copy.
- Returns:
data object containing coordinates from other object
- Return type:
GriddedData
- crop(lon_range=None, lat_range=None, time_range=None, region=None)[source]
High level function that applies cropping along multiple axes
Note
1. For cropping of longitudes and latitudes, the method iris.cube.Cube.intersection() is used since it automatically accepts and understands longitude input based on definition 0 <= lon <= 360 as well as -180 <= lon <= 180.
2. Time extraction may be provided directly as index or in form of pandas.Timestamp objects.
- Parameters:
lon_range (tuple, optional) – 2-element tuple containing longitude range for cropping. If None, the longitude axis remains unchanged. Example input to crop around the meridian: lon_range=(-30, 30)
lat_range (tuple, optional) – 2-element tuple containing latitude range for cropping. If None, the latitude axis remains unchanged
time_range (tuple, optional) – 2-element tuple containing time range for cropping. Allowed data types for specifying the times are
a combination of 2 pandas.Timestamp instances or
a combination of two strings that can be directly converted into pandas.Timestamp instances (e.g. time_range=(“2010-1-1”, “2012-1-1”)) or
directly a combination of indices (int).
If None, the time axis remains unchanged.
region (str or Region, optional) – string ID of pyaerocom default region or directly an instance of the Region object. May be used instead of lon_range and lat_range, if these are unspecified.
- Returns:
new data object containing cropped grid
- Return type:
GriddedData
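Example
A sketch cropping to a rectangular domain and a two-year period (ranges are illustrative):
>>> sub = data.crop(lon_range=(-30, 30), lat_range=(30, 60),
...                 time_range=("2010-1-1", "2012-1-1"))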
- property cube
Instance of underlying cube object
- property data
Data array (n-dimensional numpy array)
Note
This is a pointer to the data object of the underlying iris.Cube instance and will load the data into memory. Thus, in case of large datasets, this may lead to a memory error
- property data_id
ID of data object (e.g. model run ID, obsnetwork ID)
Note
This attribute was formerly named name, which is also the corresponding attribute name in metadata
- property data_revision
Revision string from file Revision.txt in the main data directory
- delete_all_coords(inplace=True)[source]
Deletes all coordinates (dimension + auxiliary) in this object
- property delta_t
Array containing timedelta values for each time stamp
- property dimcoord_names
List containing dimension coordinate names
- estimate_value_range_from_data(extend_percent=5)[source]
Estimate lower and upper end of value range for these data
- Parameters:
extend_percent (int) – percentage specifying by how much the min and max values are to be extended to estimate the value range. Defaults to 5.
- Returns:
float – lower end of estimated value range
float – upper end of estimated value range
- extract(constraint, inplace=False)[source]
Extract subset
- Parameters:
constraint (iris.Constraint) – constraint that is to be applied
- Returns:
new data object containing cropped data
- Return type:
GriddedData
- filter_altitude(alt_range=None)[source]
Currently a dummy method that makes life easier in Filter
- Returns:
current instance
- Return type:
GriddedData
- filter_region(region_id, inplace=False, **kwargs)[source]
Filter region based on ID
This works both for rectangular regions and mask regions
- Parameters:
region_id (str) – name of region
inplace (bool) – if True, the current data object is modified, else a new object is returned
**kwargs – additional keyword args passed to apply_region_mask() if input region is a mask.
- Returns:
filtered data object
- Return type:
GriddedData
- find_closest_index(**dimcoord_vals)[source]
Find the closest indices for dimension coordinate values
- property from_files
List of file paths from which this data object was created
- get_area_weighted_timeseries(region=None)[source]
Helper method to extract area weighted mean timeseries
- Parameters:
region – optional, name of AeroCom default region for which the mean is to be calculated (e.g. EUROPE)
- Returns:
station data containing area weighted mean
- Return type:
StationData
- property grid
Underlying grid data object
- property has_data
True if sum of shape of underlying Cube instance is > 0, else False
- property has_latlon_dims
Boolean specifying whether data has latitude and longitude dimensions
- property has_time_dim
Boolean specifying whether data has a time dimension
- infer_ts_type()[source]
Try to infer sampling frequency from time dimension data
- Returns:
ts_type that was inferred (is assigned to metadata too)
- Return type:
str
- Raises:
DataDimensionError – if data object does not contain a time dimension
- interpolate(sample_points=None, scheme='nearest', collapse_scalar=True, **coords)[source]
Interpolate cube at certain discrete points
Reimplementation of the method iris.cube.Cube.interpolate(); for details see the iris documentation.
Note
The input coordinates may also be provided using the input arg **coords, which provides a more intuitive option (e.g. input (sample_points=[("longitude", [10, 20]), ("latitude", [1, 2])]) is the same as input (longitude=[10, 20], latitude=[1, 2]))
- Parameters:
sample_points (list) – sequence of coordinate pairs over which to interpolate. Sample coords should be sorted in ascending order without duplicates.
scheme (str or iris interpolator object) – interpolation scheme, pyaerocom default is nearest. If input is string, it is converted into the corresponding iris Interpolator object, see str_to_iris() for valid strings
collapse_scalar (bool) – Whether to collapse the dimension of scalar sample points in the resulting cube. Default is True.
**coords – additional keyword args that may be used to provide the interpolation coordinates in an easier way than using the Cube argument sample_points. May also be a combination of both.
- Returns:
new data object containing interpolated data
- Return type:
GriddedData
Examples
>>> from pyaerocom import GriddedData
>>> data = GriddedData()
>>> data._init_testdata_default()
>>> itp = data.interpolate([("longitude", (10)),
...                         ("latitude", (35))])
>>> print(itp.shape)
(365, 1, 1)
- intersection(*args, **kwargs)[source]
Extract subset using iris.cube.Cube.intersection().
See the iris documentation for details on the method and its input parameters.
Note
Only works if the underlying grid data type is iris.cube.Cube
- Parameters:
*args – non-keyword args
**kwargs – keyword args
- Returns:
new data object containing cropped data
- Return type:
GriddedData
- property is_climatology
- property is_masked
Flag specifying whether data is masked or not
Note
This method only works if the data is loaded.
- property lat_res
- load_input(input, var_name=None, perform_fmt_checks=None)[source]
Import input as cube
- Parameters:
input (str or Cube) – data input. Can be a single .nc file or a preloaded iris Cube.
var_name (str, optional) – variable name that is extracted if input is a file path. Irrelevant if input is a preloaded Cube.
perform_fmt_checks (bool, optional) – perform formatting checks based on information in filenames. Only relevant if input is a file
- property lon_res
- property long_name
Long name of variable
- mean(areaweighted=True)[source]
Mean value of data array
Note
Corresponds to numerical mean of underlying N-dimensional numpy array. Does not consider area-weights or any other advanced averaging.
- mean_at_coords(latitude=None, longitude=None, time_resample_kwargs=None, **kwargs)[source]
Compute mean value at all input locations
- Parameters:
latitude (1D list or similar) – list of latitude coordinates of coordinate locations. If None, please provide coords in iris style as list of (lat, lon) tuples via coords (handled via arg kwargs)
longitude (1D list or similar) – list of longitude coordinates of coordinate locations. If None, please provide coords in iris style as list of (lat, lon) tuples via coords (handled via arg kwargs)
time_resample_kwargs (dict, optional) – time resampling arguments passed to StationData.resample_time()
**kwargs – additional keyword args passed to to_time_series()
- Returns:
mean value at coordinates over all times available in this object
- Return type:
float
- property metadata
- property name
ID of model to which data belongs
- property ndim
Number of dimensions
- property plot_settings
Variable instance that contains plot settings
The settings can be specified in the variables.ini file based on the unique var_name.
If no default settings can be found for this variable, all parameters will be initiated with None, in which case the AeroCom plot method uses its own default settings.
- property proj_info: ProjectionInformation
- quickplot_map(time_idx=0, xlim=(-180, 180), ylim=(-90, 90), add_mean=True, **kwargs)[source]
Make a quick plot onto a map
- Parameters:
time_idx (int) – index in time to be plotted
xlim (tuple) – 2-element tuple specifying plotted longitude range
ylim (tuple) – 2-element tuple specifying plotted latitude range
add_mean (bool) – if True, the mean value over the region and period is inserted
**kwargs – additional keyword arguments passed to pyaerocom.quickplot.plot_map()
- Returns:
matplotlib figure instance containing plot
- Return type:
fig
- property reader
Instance of reader class from which this object was created
Note
Currently only supports instances of ReadGridded.
- regrid(other=None, lat_res_deg=None, lon_res_deg=None, scheme='areaweighted', **kwargs)[source]
Regrid this grid to grid resolution of other grid
- Parameters:
other (GriddedData or Cube, optional) – other data object to regrid to. If None, then input args lat_res and lon_res are used to regrid.
lat_res_deg (float or int, optional) – latitude resolution in degrees (is only used if input arg other is None)
lon_res_deg (float or int, optional) – longitude resolution in degrees (is only used if input arg other is None)
scheme (str) – regridding scheme (e.g. linear, nearest, areaweighted)
- Returns:
regridded data object (new instance, this object remains unchanged)
- Return type:
GriddedData
- remove_outliers(low=None, high=None, inplace=True)[source]
Remove outliers from data
- Parameters:
low (float) – lower end of valid range for input variable. If None, then the corresponding value from the default settings for this variable is used (cf. minimum attribute of available variables)
high (float) – upper end of valid range for input variable. If None, then the corresponding value from the default settings for this variable is used (cf. maximum attribute of available variables)
inplace (bool) – if True, this object is modified, else outliers are removed in a copy of this object
- Returns:
modified data object
- Return type:
GriddedData
- reorder_dimensions_tseries() None [source]
Transpose dimensions of data such that to_time_series() works
- Raises:
DataDimensionError – if not all needed coordinates are available
NotImplementedError – if one of the required coordinates is associated with more than one dimension.
- resample_time(to_ts_type, how=None, min_num_obs=None, use_iris=False)[source]
Resample time to input resolution
- Parameters:
to_ts_type (str) – either of the supported temporal resolutions (cf. IRIS_AGGREGATORS in helpers, e.g. “monthly”)
how (str) – string specifying how the data is to be aggregated, default is mean
min_num_obs (dict or int, optional) –
integer or nested dictionary specifying minimum number of observations required to resample from higher to lower frequency. For instance, if input_data is hourly and to_ts_type is monthly, you may specify something like:
min_num_obs = {'monthly' : {'daily' : 7}, 'daily' : {'hourly' : 6}}
to require at least 6 hours per day and 7 days per month.
use_iris (bool) – option to use resampling scheme from iris library rather than xarray.
- Returns:
new data object containing downscaled data
- Return type:
GriddedData
- Raises:
TemporalResolutionError – if input resolution is not provided, or if it is a higher temporal resolution than that of this object
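Example
A sketch resampling daily data to monthly means with a minimum-coverage constraint (the constraint value is illustrative):
>>> monthly = data.resample_time("monthly", how="mean",
...                              min_num_obs={"monthly": {"daily": 21}})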
- search_other(var_name)[source]
Searches data for another variable
The search is constrained to the time period spanned by this object and it is attempted to load the same frequency. Uses reader (an instance of ReadGridded) to search for the other variable data.
- Parameters:
var_name (str) – variable to be searched
- Raises:
VariableNotFoundError – if data for input variable cannot be found.
- Returns:
input variable data
- Return type:
GriddedData
- sel(use_neirest=True, **dimcoord_vals)[source]
Select subset by dimension names
Note
This is a BETA version, please use with care
- Parameters:
**dimcoord_vals – key / value pairs specifying coordinate values to be extracted
- Returns:
subset data object
- Return type:
GriddedData
- property shape
- split_years(years=None)[source]
Generator to split data object into individual years
Note
This is a generator method and thus should be looped over
- Parameters:
years (list, optional) – List of years that should be excluded. If None, it uses output from years_avail().
- Yields:
GriddedData – single year data object
- property standard_name
Standard name of variable
- property start
Start time of dataset as datetime64 object
- property stop
Stop time of dataset as datetime64 object
- property suppl_info
- time_stamps()[source]
Convert time stamps into list of numpy datetime64 objects
The conversion is done using the method cfunit_to_datetime64()
- Returns:
list containing all time stamps as datetime64 objects
- Return type:
list
- to_netcdf(out_dir, savename=None, **kwargs)[source]
Save as NetCDF file
- Parameters:
out_dir (str) – output directory (must exist)
savename (str, optional) – name of file. If None, aerocom_savename() is used, which is generated automatically and may be modified via **kwargs
**kwargs – keywords for name
- Returns:
list of output files created
- Return type:
list
- to_time_series(sample_points=None, scheme='nearest', vert_scheme=None, add_meta=None, use_iris=False, **coords)[source]
Extract time-series for provided input coordinates (lon, lat)
Extract time series for each lon / lat coordinate in this cube or at predefined sample points (e.g. station data). If sample points are provided, the cube is interpolated first onto the sample points.
- Parameters:
sample_points (list) – coordinates (e.g. lon / lat) at which time series is supposed to be retrieved
scheme (str or iris interpolator object) – interpolation scheme (for details, see interpolate())
vert_scheme (str) – string specifying how to treat vertical coordinates. This is only relevant for data that contains vertical levels. It will be ignored otherwise. Note that if the input coordinate specifications contain altitude information, this parameter will be set automatically to ‘altitude’. Allowed inputs are all data collapse schemes that are supported by pyaerocom.helpers.str_to_iris() (e.g. mean, median, sum). Further valid schemes are altitude, surface, profile. If not otherwise specified and if altitude coordinates are provided via sample_points (or **coords parameters), then vert_scheme will be set to altitude. Else, profile is used.
add_meta (dict, optional) – dictionary specifying additional metadata for individual input coordinates. Keys are meta attribute names (e.g. station_name) and corresponding values are lists (with length of input coords) or single entries that are supposed to be assigned to each station. E.g. add_meta=dict(station_name=[<list_of_station_names>]).
**coords – additional keyword args that may be used to provide the interpolation coordinates (for details, see interpolate())
- Returns:
list of result dictionaries for each coordinate. Dictionary keys are: longitude, latitude, var_name
- Return type:
list
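Example
A sketch extracting time series at two illustrative site coordinates:
>>> ts = data.to_time_series(longitude=[10.0, 25.0],
...                          latitude=[45.0, 60.0])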
- transpose(new_order)[source]
Re-order data dimensions in object
Wrapper for iris.cube.Cube.transpose()
Note
Changes THIS object (i.e. no new instance of GriddedData will be created)
- Parameters:
new_order (list) – new index order
- property ts_type
Temporal resolution of data
- property unit
Unit of data
- property unit_ok
Boolean specifying if variable unit is AeroCom default
- property units
Unit of data
- update_meta(**kwargs)[source]
Update metadata dictionary
- Parameters:
**kwargs – metadata to be added to metadata.
- property var_info
Print information about variable
- property var_name
Name of variable
- property var_name_aerocom
AeroCom variable name
- property vert_code
Vertical code of data (e.g. Column, Surface, ModelLevel)
Ungridded data
- class pyaerocom.ungriddeddata.UngriddedData(num_points=None, add_cols=None)[source]
Class representing point-cloud data (ungridded)
The data is organised in a 2-dimensional numpy array where the first index (rows) axis corresponds to individual measurements (i.e. one timestamp of one variable) and along the second dimension (containing 11 columns) the actual values are stored (in column 6) along with additional information, such as metadata index (can be used as key in metadata to access additional information related to this measurement), timestamp, latitude, longitude, altitude of instrument, variable index and, in case of 3D data (e.g. LIDAR profiles), also the altitude corresponding to the data value.
Note
That said, let’s look at two examples.
Example 1: Suppose you load 3 variables from 5 files, each of which contains 30 timestamps. This corresponds to a total of 3*5*30=450 data points and hence, the shape of the underlying numpy array will be 450x11.
Example 2: 3 variables, 5 files, 30 timestamps, but each variable is height resolved, containing 100 altitudes => 3*5*30*100=4500 data points, thus, the final shape will be 4500x11.
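Example
A sketch of how rows belonging to one station / variable can be looked up via the index attributes described below (_data and _VARINDEX are referenced in this docstring; the _DATAINDEX column constant is an assumed implementation detail):
>>> rows = data.meta_idx[0.0]["od550aer"]     # rows of metadata block 0
>>> vals = data._data[rows, data._DATAINDEX]  # the actual data values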
- metadata
dictionary containing meta information about the data. Keys are floating point numbers corresponding to each station, values are corresponding dictionaries containing station information.
- meta_idx
dictionary containing index mapping for each station and variable. Keys correspond to metadata key (float -> station, see metadata) and values are dictionaries containing keys specifying variable name and corresponding values are arrays or lists, specifying indices (rows) of these station / variable information in _data. Note: this information is redundant and is there to accelerate station data extraction, since the data indices matching a given metadata block do not need to be searched in the underlying numpy array.
- var_idx
mapping of variable name (keys, e.g. od550aer) to numerical variable index of this variable in the data numpy array (in the column specified by _VARINDEX)
- Parameters:
num_points (int, optional) – initial number of rows of the data array
add_cols (list, optional) – additional index column names
- ALLOWED_VERT_COORD_TYPES = ['altitude']
- STANDARD_META_KEYS = ['filename', 'station_id', 'station_name', 'instrument_name', 'PI', 'country', 'country_code', 'ts_type', 'latitude', 'longitude', 'altitude', 'data_id', 'dataset_name', 'data_product', 'data_version', 'data_level', 'framework', 'instr_vert_loc', 'revision_date', 'website', 'ts_type_src', 'stat_merge_pref_attr']
- add_chunk(size=None)[source]
Extend the size of the data array
- Parameters:
size (int, optional) – number of additional rows. If None (default) or smaller than the minimum chunksize specified in attribute _CHUNKSIZE, then the latter is used.
- all_datapoints_var(var_name)[source]
Get array of all data values of input variable
- Parameters:
var_name (str) – variable name
- Returns:
1-d numpy array containing all values of this variable
- Return type:
ndarray
- Raises:
AttributeError – if variable name is not available
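Example
A quick usage sketch (the variable name is illustrative):
>>> vals = data.all_datapoints_var("od550aer")  # 1-d numpy array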
- property altitude
Altitudes of stations
- append(other)[source]
Append other instance of UngriddedData to this object
Note
Calls merge(other, new_obj=False)
- Parameters:
other (UngriddedData) – other data object
- Returns:
merged data object
- Return type:
UngriddedData
- Raises:
ValueError – if input object is not an instance of UngriddedData
- apply_filters(var_outlier_ranges=None, **filter_attributes)[source]
Extended filtering method
Combines filter_by_meta() and adds the option to also remove outliers (keyword remove_outliers), set flagged data points to NaN (keyword set_flags_nan) and to extract individual variables (keyword var_name).
- Parameters:
var_outlier_ranges (dict, optional) – dictionary specifying custom outlier ranges for individual variables.
**filter_attributes (dict) – filters that are supposed to be applied to the data. To remove outliers, use keyword remove_outliers, to set flagged values to NaN, use keyword set_flags_nan, to extract single or multiple variables, use keyword var_name. Further filter keys are assumed to be metadata specific and are passed to filter_by_meta().
- Returns:
filtered data object
- Return type:
UngriddedData
- property available_meta_keys
List of all available metadata keys
Note
This is a list of all metadata keys that exist in this dataset, but it does not mean that all of the keys are registered in all metadata blocks, especially if the data is merged from different sources with different metadata availability
- change_var_idx(var_name, new_idx)[source]
Change index that is assigned to variable
Each variable in this object has a unique index assigned, which is stored in the dictionary var_idx and which is used internally to access data of a certain variable from the data array _data (the indices are stored in the data column specified by _VARINDEX, cf. class header).
This index thus needs to be unique for each variable and hence may need to be updated when two instances of UngriddedData are merged (cf. merge()).
And the latter is exactly what this function does.
- Parameters:
var_name (str) – name of variable
new_idx (int) – new index to be assigned to the variable
- Raises:
ValueError – if input new_idx already exists in this object as a variable index
- check_set_country()[source]
Checks all metadata entries for availability of country information
Metadata blocks that are missing the country entry will be updated based on the country inferred from the corresponding lat / lon coordinate. Uses pyaerocom.geodesy.get_country_info_coords() (library reverse-geocode) to retrieve countries. This may be erroneous close to country borders as it uses euclidean distance based on a list of known locations.
Note
Metadata blocks that do not contain latitude and longitude entries are skipped.
- Returns:
list – metadata entries where country was added
list – corresponding countries that were inferred from lat / lon
- check_unit(var_name, unit=None)[source]
Check if variable unit corresponds to AeroCom unit
- Parameters:
var_name (str) – variable name for which the unit is to be checked
unit (str, optional) – unit to be checked; if None, the AeroCom default unit for that variable is used
- Raises:
MetaDataError – if unit information is not accessible for input variable name
- clear_meta_no_data(inplace=True)[source]
Remove all metadata blocks that do not have data associated with it
- Parameters:
inplace (bool) – if True, the changes are applied to this instance directly, else to a copy
- Returns:
cleaned up data object
- Return type:
UngriddedData
- Raises:
DataCoverageError – if filtering results in empty data object
- code_lat_lon_in_float()[source]
Method to encode lat and lon in a single number so that np.unique can be used to determine unique locations
- property contains_datasets
List of all datasets in this object
- property contains_instruments
List of all instruments in this object
- copy()[source]
Make a copy of this object
- Returns:
copy of this object
- Return type:
- Raises:
MemoryError – if copy is too big to fit into memory together with existing instance
- property countries_available
Alphabetically sorted list of country names available
- decode_lat_lon_from_float()[source]
Method to decode lat and lon from a single number calculated by code_lat_lon_in_float()
- extract_dataset(data_id)[source]
Extract single dataset into new instance of UngriddedData
Calls filter_by_meta().
- Parameters:
data_id (str) – ID of dataset
- Returns:
new instance of ungridded data containing only data from specified input network
- Return type:
UngriddedData
- extract_var(var_name, check_index=True)[source]
Split this object into single-var UngriddedData objects
- Parameters:
var_name (str) – name of variable that is supposed to be extracted
check_index (bool) – call _check_index() in the new data object.
- Returns:
new data object containing only input variable data
- Return type:
UngriddedData
- extract_vars(var_names, check_index=True)[source]
Extract multiple variables from dataset
Loops over input variable names and calls extract_var() to retrieve single-variable UngriddedData objects for each variable and then merges all of these into one object
- Parameters:
var_names (list or str) – variables to be extracted
check_index (bool) – call _check_index() in each extracted data object
- Returns:
new data object containing input variables
- Return type:
UngriddedData
- Raises:
VarNotAvailableError – if one of the input variables is not available in this data object
- filter_altitude(alt_range)[source]
Filter altitude range
- filter_by_meta(negate=None, **filter_attributes)[source]
Flexible method to filter these data based on input meta specs
- Parameters:
negate (list or str, optional) – specified meta key(s) provided via filter_attributes that are supposed to be treated as ‘not valid’. E.g. if station_name=”bad_site” is input in filter_attributes and if station_name is listed in negate, then all metadata blocks containing “bad_site” as station_name will be excluded in output data object.
**filter_attributes – valid meta keywords that are supposed to be filtered and the corresponding filter values (or value ranges) Only valid meta keywords are considered (e.g. data_id, longitude, latitude, altitude, ts_type)
- Returns:
filtered ungridded data object
- Return type:
- Raises:
NotImplementedError – if an attempt is made to filter by variables (not yet possible)
IOError – if any of the input keys is not a valid meta key
Example
>>> import pyaerocom as pya
>>> r = pya.io.ReadUngridded(['AeronetSunV2Lev2.daily',
...                           'AeronetSunV3Lev2.daily'], 'od550aer')
>>> data = r.read()
>>> data_filtered = data.filter_by_meta(data_id='AeronetSunV2Lev2.daily',
...                                     longitude=[-30, 30],
...                                     latitude=[20, 70],
...                                     altitude=[0, 1000])
- filter_by_projection(projection, xrange: tuple[float, float], yrange: tuple[float, float])[source]
Filter the ungridded data to a horizontal bounding box given by a projection
- Parameters:
projection – a function mapping (lat, lon) to (x, y) in the projection plane
xrange – x range (min/max included) in the projection plane
yrange – y range (min/max included) in the projection plane
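Example
A sketch using a trivial projection that maps lon to x and lat to y; real applications would typically pass a proper map-projection function:
>>> sub = data.filter_by_projection(lambda lat, lon: (lon, lat),
...                                 xrange=(0.0, 10.0),
...                                 yrange=(50.0, 60.0))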
- filter_region(region_id, check_mask=True, check_country_meta=False, **kwargs)[source]
Filter object by a certain region
- Parameters:
region_id (str) – name of region (must be valid AeroCom region name or HTAP region)
check_mask (bool) – if True and region_id a valid name for a binary mask, then the filtering is done based on that binary mask.
check_country_meta (bool) – if True, then the input region_id is first checked against available country names in metadata. If that fails, it is assumed that this regions is either a valid name for registered rectangular regions or for available binary masks.
**kwargs – currently not used in method (makes usage in higher level classes such as Filter easier, as other data objects have the same method with possibly other input possibilities)
- Returns:
filtered data object (containing only stations that fall into input region)
- Return type:
UngriddedData
- find_common_stations(other: UngriddedData, check_vars_available=None, check_coordinates: bool = True, max_diff_coords_km: float = 0.1) dict [source]
Search common stations between two UngriddedData objects
This method loops over all stations that are stored within this object (using metadata) and checks if the corresponding station exists in a second instance of UngriddedData that is provided. The check is performed on the basis of the station name and, optionally, for each station name match, the lon / lat coordinates can be compared within a certain radius (default 0.1 km).
Note
This is a beta version and thus, to be treated with care.
- Parameters:
other (UngriddedData) – other object of ungridded data
check_vars_available (list (or similar), optional) – list of variables that need to be available in stations of both datasets
check_coordinates (bool) – if True, check that lon and lat coordinates of station candidates match within a certain range, specified by input parameter max_diff_coords_km
- Returns:
dictionary where keys are meta_indices of the common station in this object and corresponding values are meta indices of the station in the other object
- Return type:
dict
- find_station_meta_indices(station_name_or_pattern, allow_wildcards=True)[source]
Find indices of all metadata blocks matching input station name
You may also use a wildcard pattern as input (e.g. *Potenza*)
- Parameters:
station_name_or_pattern (str) – station name or wildcard pattern
allow_wildcards (bool) – if True, the input station name may contain wildcards
- Returns:
list containing all metadata indices that match the input station name or pattern
- Return type:
list
- Raises:
StationNotFoundError – if no such station exists in this data object
- property first_meta_idx
- static from_cache(data_dir, file_name)[source]
Load pickled instance of UngriddedData
- Parameters:
data_dir (str) – directory where the cache file is stored
file_name (str) – name of the cache file
- Raises:
ValueError – if loading failed
- Returns:
loaded UngriddedData object. If this method is called from an instance of UngriddedData, this instance remains unchanged. You may merge the returned reloaded instance using merge().
- Return type:
UngriddedData
- static from_station_data(stats, add_meta_keys=None)[source]
Create UngriddedData from input station data object(s)
- Parameters:
stats (iterator or StationData) – input data object(s)
add_meta_keys (list, optional) – list of metadata keys that are supposed to be imported from the input StationData objects, in addition to the default metadata retrieved via StationData.get_meta().
- Raises:
ValueError – if any of the input data objects is not an instance of StationData.
.- Returns:
ungridded data object created from input station data objects
- Return type:
UngriddedData
- get_variable_data(variables, start=None, stop=None, ts_type=None, **kwargs)[source]
Extract all data points of a certain variable
- property has_flag_data
Boolean specifying whether this object contains flag data
- property index
- property is_empty
Boolean specifying whether this object contains data or not
- property is_filtered
Boolean specifying whether this data object has been filtered
Note
Details about applied filtering can be found in filter_hist
- property is_vertical_profile
Boolean specifying whether data is a vertical profile
- last_filter_applied()[source]
Returns the last filter that was applied to this dataset
To see all filters, check out filter_hist
- property last_meta_idx
Index of last metadata block
- property latitude
Latitudes of stations
- property longitude
Longitudes of stations
- merge(other, new_obj=True)[source]
Merge another data object with this one
- Parameters:
other (UngriddedData) – other data object
new_obj (bool) – if True, this object remains unchanged and the merged data objects are returned in a new instance of UngriddedData. If False, then this object is modified.
- Returns:
merged data object
- Return type:
UngriddedData
- Raises:
ValueError – if input object is not an instance of UngriddedData
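Example
A sketch combining two (assumed) UngriddedData instances:
>>> merged = data1.merge(data2, new_obj=True)  # data1 unchanged
>>> data1.append(data2)                        # modifies data1 in place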
- merge_common_meta(ignore_keys=None)[source]
Merge all meta entries that are the same
Note
If there is an overlap in time between the data, the blocks are not merged
- Parameters:
ignore_keys (list) – list containing meta keys that are supposed to be ignored
- Returns:
merged data object
- Return type:
UngriddedData
- property nonunique_station_names
List of station names that occur more than once in metadata
- plot_station_coordinates(var_name=None, start=None, stop=None, ts_type=None, color='r', marker='o', markersize=8, fontsize_base=10, legend=True, add_title=True, **kwargs)[source]
Plot station coordinates on a map
All input parameters are optional and may be used to add constraints related to which stations are plotted. Default is all stations of all times.
- Parameters:
var_name (str, optional) – name of variable to be retrieved
start – start time (optional)
stop – stop time (optional). If start time is provided and stop time not, then only the corresponding year inferred from start time will be considered
ts_type (str, optional) – temporal resolution
color (str) – color of stations on map
marker (str) – marker type of stations
markersize (int) – size of station markers
fontsize_base (int) – basic fontsize
legend (bool) – if True, legend is added
add_title (bool) – if True, title will be added
**kwargs – Additional keyword args passed to pyaerocom.plot.plot_coordinates()
- Returns:
matplotlib axes instance
- Return type:
axes
- plot_station_timeseries(station_name, var_name, start=None, stop=None, ts_type=None, insert_nans=True, ax=None, **kwargs)[source]
Plot time series of station and variable
- Parameters:
station_name (str or int) – station name or index of station in metadata dict
var_name (str) – name of variable to be retrieved
start – start time (optional)
stop – stop time (optional). If start time is provided and stop time not, then only the corresponding year inferred from start time will be considered
ts_type (str, optional) – temporal resolution
**kwargs – Additional keyword args passed to method pandas.Series.plot()
- Returns:
matplotlib axes instance
- Return type:
axes
- remove_outliers(var_name, inplace=False, low=None, high=None, unit_ref=None, move_to_trash=True)[source]
Method that can be used to remove outliers from data
- Parameters:
var_name (str) – variable name
inplace (bool) – if True, the outliers will be removed in this object, otherwise a new object will be created and returned
low (float) – lower end of valid range for input variable. If None, then the corresponding value from the default settings for this variable is used (cf. minimum attribute of available variables)
high (float) – upper end of valid range for input variable. If None, then the corresponding value from the default settings for this variable is used (cf. maximum attribute of available variables)
unit_ref (str) – reference unit for assessment of input outlier ranges: all data needs to be in that unit, else an Exception will be raised
move_to_trash (bool) – if True, then all detected outliers will be moved to the trash column of this data object (i.e. column no. specified at UngriddedData._TRASHINDEX).
- Returns:
ungridded data object that has all outliers for this variable removed.
- Return type:
UngriddedData
- Raises:
ValueError – if input move_to_trash is True and in case for some of the measurements there is already data in the trash.
- save_as(file_name, save_dir)[source]
Save this object to disk
Note
So far, only storage as pickled object via CacheHandlerUngridded is supported, so input file_name must end with .pkl
- set_flags_nan(inplace=False)[source]
Set all flagged datapoints to NaN
- Parameters:
inplace (bool) – if True, the flagged datapoints will be set to NaN in this object, otherwise a new object will be created and returned
- Returns:
data object that has all flagged data values set to NaN
- Return type:
UngriddedData
- Raises:
AttributeError – if no flags are assigned
- property shape
Shape of data array
- property station_coordinates
dictionary with station coordinates
- Returns:
dictionary containing station coordinates (latitude, longitude, altitude -> values) for all stations (keys) where these parameters are accessible.
- Return type:
dict
- property station_name
Station names of data
- property time
Time dimension of data
- to_station_data(meta_idx, vars_to_convert=None, start=None, stop=None, freq=None, ts_type_preferred=None, merge_if_multi=True, merge_pref_attr=None, merge_sort_by_largest=True, insert_nans=False, allow_wildcards_station_name=True, add_meta_keys=None, resample_how=None, min_num_obs=None)[source]
Convert data from one station to StationData
- Parameters:
meta_idx (float) – index of station or name of station.
vars_to_convert (list or str, optional) – variables that are supposed to be converted. If None, use all variables that are available for this station
start – start time, optional (if not None, input must be convertible into pandas.Timestamp)
stop – stop time, optional (if not None, input must be convertible into pandas.Timestamp)
freq (str) – pandas frequency string (e.g. ‘D’ for daily, ‘M’ for month end) or valid pyaerocom ts_type
merge_if_multi (bool) – if True and if the data request results in multiple instances of StationData objects, then these are attempted to be merged into one StationData object using merge_station_data()
merge_pref_attr – only relevant for merging of multiple matches: preferred attribute that is used to sort the individual StationData objects by relevance. Needs to be available in each of the individual StationData objects. For details cf. pref_attr in the docstring of merge_station_data(). An example could be revision_date. If None, then the stations will be sorted based on the number of available data points (if merge_sort_by_largest is True, which is the default).
merge_sort_by_largest (bool) – only relevant for merging of multiple matches: cf. prev. attr. and docstring of merge_station_data() method.
insert_nans (bool) – if True, then the retrieved StationData objects are filled with NaNs
allow_wildcards_station_name (bool) – if True and if input meta_idx is a string (i.e. a station name or pattern), metadata matches will be identified applying wildcard matches between input meta_idx and all station names in this object.
- Returns:
StationData object(s) containing results. list is only returned if input for meta_idx is station name and multiple matches are detected for that station (e.g. data from different instruments), else single instance of StationData. All variable time series are inserted as pandas Series
- Return type:
StationData or list
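Example
A sketch retrieving a daily time series for one illustrative site:
>>> stat = data.to_station_data("Leipzig", vars_to_convert="od550aer",
...                             freq="daily", insert_nans=True)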
- to_station_data_all(vars_to_convert=None, start=None, stop=None, freq=None, ts_type_preferred=None, by_station_name=True, ignore_index=None, **kwargs)[source]
Convert all data to StationData objects
Creates one instance of StationData for each metadata block in this object.
- Parameters:
vars_to_convert (list or str, optional) – variables that are supposed to be converted. If None, use all variables that are available for this station
start – start time, optional (if not None, input must be convertible into pandas.Timestamp)
stop – stop time, optional (if not None, input must be convertible into pandas.Timestamp)
freq (str) – pandas frequency string (e.g. ‘D’ for daily, ‘M’ for month end) or valid pyaerocom ts_type (e.g. ‘hourly’, ‘monthly’).
by_station_name (bool) – if True, then iter over unique_station_name (and merge multiple matches if applicable), else, iter over metadata index
**kwargs – additional keyword args passed to to_station_data() (e.g. merge_if_multi, merge_pref_attr, merge_sort_by_largest, insert_nans)
- Returns:
4-element dictionary containing the following key / value pairs:
stats: list of StationData objects
station_name: list of corresponding station names
latitude: list of latitude coordinates
longitude: list of longitude coordinates
- Return type:
dict
- property unique_station_names
List of unique station names
Co-located data
- class pyaerocom.colocation.colocated_data.ColocatedData(data: Path | str | xr.DataArray | np.ndarray | None = None, **kwargs)[source]
Class representing colocated and unified data from two sources
Sources may be instances of UngriddedData or GriddedData that have been compared to each other.
Note
It is intended that this object can either be instantiated from scratch OR created in and returned by pyaerocom objects / methods that perform colocation. This is particularly true as pyaerocom is now expected to read in colocated files created outside of pyaerocom (related CAMS2_82 development).
The purpose of this object is not the creation of colocated objects, but solely the analysis of such data as well as I/O features (e.g. save as / read from .nc files, convert to pandas.DataFrame, plot station time series overlays, scatter plots, etc.).
In the current design, such an object comprises 3 or 4 dimensions, where the first dimension (data_source, index 0) is ALWAYS length 2 and specifies the two datasets that were co-located (index 0 is obs, index 1 is model). The second dimension is time and in case of 3D colocated data the 3rd dimension is station_name while for 4D colocated data the 3rd and 4th dimension are latitude and longitude, respectively.
3D colocated data is typically created when a model is colocated with station-based ground observations (cf. pyaerocom.colocation.colocate_gridded_ungridded()) while 4D colocated data is created when a model is colocated with another model or satellite observations that cover large parts of Earth’s surface (other than discrete lat/lon pairs in the case of ground based station locations).
- Parameters:
data (xarray.DataArray or numpy.ndarray or str, optional) – Colocated data. If str, then it is attempted to be loaded from file. Else, it is assumed that data is a numpy array and that all further supplementary inputs (e.g. coords, dims) for the instantiation of DataArray are provided via **kwargs.
**kwargs – Additional keyword args that are passed to init of DataArray in case input data is a numpy array.
- Raises:
ValidationError – if init fails
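Example
A sketch loading a previously saved colocated NetCDF file (the file path is hypothetical):
>>> from pyaerocom import ColocatedData
>>> coldata = ColocatedData("path/to/colocated_data.nc")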
- apply_country_filter(region_id, use_country_code=False, inplace=False)[source]
Apply country filter
- Parameters:
region_id (str) – country name or country code
use_country_code (bool) – if True, region_id is interpreted as a country code
inplace (bool) – if True, the filtering is applied to this object, else to a copy
- Raises:
NotImplementedError – if data is 4D (i.e. it has latitude and longitude dimensions).
- Returns:
filtered data object.
- Return type:
ColocatedData
- apply_latlon_filter(lat_range=None, lon_range=None, region_id=None, inplace=False)[source]
Apply rectangular latitude/longitude filter
- Parameters:
lat_range (list, optional) – latitude range that is supposed to be applied. If specified, then lon_range also needs to be specified, else region_id is checked against AeroCom default regions (and used if applicable)
lon_range (list, optional) – longitude range that is supposed to be applied. If specified, then lat_range also needs to be specified, else region_id is checked against AeroCom default regions (and used if applicable)
region_id (str) – name of region to be applied. If provided (i.e. not None) then input args lat_range and lon_range are ignored
inplace (bool, optional) – Apply filter to this object directly or to a copy. The default is False.
- Raises:
ValueError – if lower latitude bound exceeds upper latitude bound.
- Returns:
filtered data object
- Return type:
ColocatedData
- apply_region_mask(region_id, inplace=False)[source]
Apply a binary regions mask filter to data object. Available binary regions IDs can be found at pyaerocom.const.HTAP_REGIONS.
- Parameters:
region_id (str) – ID of the binary regions mask to be applied
inplace (bool) – if True, the filtering is applied to this object, else to a copy
- Raises:
DataCoverageError – if filtering results in empty data object.
- Returns:
data – Filtered data object.
- Return type:
ColocatedData
- property area_weights
Wrapper for calc_area_weights()
- calc_area_weights()[source]
Calculate area weights
Note
Only applies to colocated data that has latitude and longitude dimension.
- Returns:
array containing weights for each datapoint (same shape as self.data[0])
- Return type:
ndarray
- calc_nmb_array()[source]
Calculate data array with normalised bias (NMB) values
- Returns:
NMBs at each coordinate
- Return type:
DataArray
- calc_spatial_statistics(aggr=None, use_area_weights=False, **kwargs)[source]
Calculate spatial statistics from model and obs data
Spatial statistics are computed by first averaging the time dimension and then, if the data is 4D, flattening the lat / lon dimensions into a new station_name dimension, so that the resulting dimensions are data_source and station_name. These 2D data are then used to calculate standard statistics using pyaerocom.stats.stats.calculate_statistics().
See also calc_statistics() and calc_temporal_statistics().
- Parameters:
aggr (str, optional) – aggregator to be used, currently only mean and median are supported. Defaults to mean.
use_area_weights (bool) – if True and if data is 4D (i.e. has lat and lon dimensions), then area weights are applied when calculating the statistics based on the coordinate cell sizes. Defaults to False.
**kwargs – additional keyword args passed to pyaerocom.stats.stats.calculate_statistics()
- Returns:
dictionary containing statistical parameters
- Return type:
dict
- calc_statistics(use_area_weights=False, **kwargs)[source]
Calculate statistics from model and obs data
Calculate standard statistics for model assessment. This is done by taking all model and obs data points in this object as input for pyaerocom.stats.stats.calculate_statistics(). For instance, if the object is 3D with dimensions data_source (obs, model), time (e.g. 12 monthly values) and station_name (e.g. 4 sites), then the input arrays for model and obs into pyaerocom.stats.stats.calculate_statistics() will each be of size 12x4.
See also calc_temporal_statistics() and calc_spatial_statistics().
- Parameters:
use_area_weights (bool) – if True and if data is 4D (i.e. has lat and lon dimensions), then area weights are applied when calculating the statistics based on the coordinate cell sizes. Defaults to False.
**kwargs – additional keyword args passed to pyaerocom.stats.stats.calculate_statistics()
- Returns:
dictionary containing statistical parameters
- Return type:
dict
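Example
A sketch computing the overall statistics dictionary (the exact set of keys depends on the statistics implementation; nmb is shown as an assumed example key):
>>> stats = coldata.calc_statistics()
>>> stats["nmb"]  # normalised mean bias (assumed key)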
- calc_temporal_statistics(aggr=None, **kwargs)[source]
Calculate temporal statistics from model and obs data
Temporal statistics are computed by first averaging the spatial dimension(s) (that is, station_name for 3D data, and latitude and longitude for 4D data), so that only data_source and time remain as dimensions. These 2D data are then used to calculate standard statistics using pyaerocom.stats.stats.calculate_statistics().
See also calc_statistics() and calc_spatial_statistics().
- check_set_countries(inplace=True, assign_to_dim=None)[source]
Checks if country information is available and assigns if not
If no country information is available, countries will be assigned for each lat / lon coordinate using pyaerocom.geodesy.get_country_info_coords().
- Parameters:
inplace (bool) – if True, the changes are applied to this object, else to a copy
assign_to_dim (str, optional) – name of the dimension to which the country coordinate is assigned; if None, station_name is used
- Raises:
DataDimensionError – If data is 4D (i.e. if latitude and longitude are orthogonal dimensions)
- Returns:
data object with countries assigned
- Return type:
ColocatedData
- property coords
Coordinates of data array
- property countries_available
Alphabetically sorted list of country names available
- Raises:
MetaDataError – if no country information is available
- Returns:
list of countries available in these data
- Return type:
list
- property country_codes_available
Alphabetically sorted list of country codes available
- Raises:
MetaDataError – if no country information is available
- Returns:
list of country codes available in these data
- Return type:
list
- property data_source
Coordinate array containing data sources (z-axis)
- property dims
Names of dimensions
- filter_altitude(alt_range, inplace=False)[source]
Apply altitude filter
- Parameters:
alt_range (list or tuple) – altitude range to be applied (2-element)
inplace (bool) – if True, the filtering is applied to this object, else to a copy
- Raises:
NotImplementedError – If data is 4D, i.e. it contains latitude and longitude dimensions.
- Returns:
Filtered data object.
- Return type:
ColocatedData
- filter_region(region_id, check_mask=True, check_country_meta=False, inplace=False)[source]
Filter object by region
- Parameters:
region_id (str) – ID of region
inplace (bool) – if True, the filtering is done directly in this instance, else a new instance is returned
check_mask (bool) – if True and region_id a valid name for a binary mask, then the filtering is done based on that binary mask.
check_country_meta (bool) – if True, then the input region_id is first checked against available country names in metadata. If that fails, it is assumed that this regions is either a valid name for registered rectangular regions or for available binary masks.
- Returns:
filtered data object
- Return type:
ColocatedData
- flatten_latlondim_station_name()[source]
Stack (flatten) lat / lon dimension into new dimension station_name
- Returns:
new colocated data object with dimension station_name and lat lon arrays as additional coordinates
- Return type:
ColocatedData
- static from_dataframe(df: DataFrame) ColocatedData [source]
Create colocated Data object from dataframe
Note
This is intended to be used as back-conversion from to_dataframe() and methods that use the latter (e.g. to_csv()).
- get_coords_valid_obs()[source]
Get latitude / longitude coordinates where obsdata is available
- Returns:
list – latitude coordinates
list – longitude coordinates
- get_country_codes()[source]
Get country names and codes for all locations contained in these data
- Raises:
MetaDataError – if no country information is available
- Returns:
dictionary of unique country names (keys) and corresponding country codes (values)
- Return type:
dict
- static get_meta_from_filename(file_path)[source]
Get meta information from file name
Note
This does not yet include IDs of model and obs data as these should be included in the data anyway (e.g. column names in CSV file) and may include the delimiter _ in their name.
- Returns:
dictionary with meta information
- Return type:
dict
- get_meta_item(key: str)[source]
Get metadata value
- Parameters:
key (str) – meta item key.
- Raises:
AttributeError – If key is not available.
- Returns:
value of metadata.
- Return type:
- get_regional_timeseries(region_id, **filter_kwargs)[source]
Compute regional timeseries both for model and obs
- Parameters:
region_id (str) – name of region for which regional timeseries is supposed to be retrieved
**filter_kwargs – additional keyword args passed to filter_region().
- Returns:
dictionary containing regional timeseries for model (key mod) and obsdata (key obs) and name of region.
- Return type:
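A hedged usage sketch (coldata is an assumed ColocatedData instance and 'EUROPE' an assumed valid region ID):

ts = coldata.get_regional_timeseries('EUROPE')
mod_series = ts['mod']  # regional model timeseries
obs_series = ts['obs']  # regional obs timeseries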
- get_time_resampling_settings()[source]
Returns a dictionary with relevant settings for temporal resampling
- Return type:
- property has_latlon_dims
Boolean specifying whether data has latitude and longitude dimensions
- property has_time_dim
Boolean specifying whether data has a time dimension
- property lat_range
Latitude range covered by this data object
- property latitude
Array of latitude coordinates
- property lon_range
Longitude range covered by this data object
- property longitude
Array of longitude coordinates
- max()[source]
Wrapper for
xarray.DataArray.max()
called from data
- Returns:
maximum of data
- Return type:
- property metadata
Meta data dictionary (wrapper to data.attrs)
- min()[source]
Wrapper for
xarray.DataArray.min()
called from data
- Returns:
minimum of data
- Return type:
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'protected_namespaces': (), 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- property model_name
- property ndim
Dimension of data array
- property num_coords
Total number of lat/lon coordinate pairs
- property num_coords_with_data
Number of lat/lon coordinate pairs that contain at least one datapoint
Note
Occurrence of valid data is only checked for obsdata (first index in data_source dimension).
- property obs_name
- open(file_path)[source]
High level helper for reading from supported file sources
- Parameters:
file_path (str) – file path
- plot_coordinates(marker='x', markersize=12, fontsize_base=10, **kwargs)[source]
Plot station coordinates
Uses
pyaerocom.plot.plotcoordinates.plot_coordinates()
.- Parameters:
marker (str, optional) – matplotlib marker name used to plot site locations. The default is ‘x’.
markersize (int, optional) – Size of site markers. The default is 12.
fontsize_base (int, optional) – Basic fontsize. The default is 10.
**kwargs – additional keyword args passed to
pyaerocom.plot.plotcoordinates.plot_coordinates()
- Return type:
matplotlib.axes.Axes
- plot_scatter(**kwargs)[source]
Create scatter plot of data
- Parameters:
**kwargs – keyword args passed to
pyaerocom.plot.plotscatter.plot_scatter()
- Returns:
matplotlib axes instance
- Return type:
Axes
- rename_variable(var_name, new_var_name, data_source, inplace=True)[source]
Rename a variable in this object
- Parameters:
- Returns:
instance with renamed variable
- Return type:
- Raises:
VarNotAvailableError – if input variable is not available in this object
DataSourceError – if input data_source is not available in this object
- resample_time(to_ts_type, how=None, min_num_obs=None, colocate_time=False, settings_from_meta=False, inplace=False, **kwargs)[source]
Resample time dimension
The temporal resampling is done using
TimeResampler
- Parameters:
to_ts_type (str) – desired output frequency.
how (str or dict, optional) – aggregator used for resampling (e.g. max, min, mean, median). Can also be hierarchical scheme via dict, similar to min_num_obs. The default is None.
min_num_obs (int or dict, optional) – Minimum number of observations required to resample from current frequency (
ts_type
) to desired output frequency.colocate_time (bool, optional) – If True, the modeldata is invalidated where obs is NaN, before resampling. The default is False (updated in v0.11.0, before was True).
settings_from_meta (bool) – if True, then input args how, min_num_obs and colocate_time are ignored and instead the corresponding values set in
metadata
are used. Defaults to False.inplace (bool, optional) – If True, modify this object directly, else make a copy and resample that one. The default is False (updated in v0.11.0, before was True).
**kwargs – Additional keyword args passed to
TimeResampler.resample()
.
- Returns:
Resampled colocated data object.
- Return type:
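A short sketch (assumes coldata contains daily data; min_num_obs=15 corresponds to roughly 50% daily coverage per month):

monthly = coldata.resample_time(
    'monthly',
    how='mean',
    min_num_obs=15,
    inplace=False,  # keep the daily object and return a resampled copy
)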
- property savename_aerocom
Default save name for data object following AeroCom convention
- set_zeros_nan(inplace=True)[source]
Replace all 0’s with NaN in data
- Parameters:
inplace (bool) – Whether to modify this object or a copy. The default is True.
- Returns:
cd – modified data object
- Return type:
- property shape
Shape of data array
- stack(inplace=False, **kwargs)[source]
Stack one or more dimensions
For details see
xarray.DataArray.stack()
.- Parameters:
inplace (bool) – modify this object or a copy.
**kwargs – input arguments passed to
DataArray.stack()
- Returns:
stacked data object
- Return type:
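A minimal sketch (the dimension names latitude / longitude follow the conventions used elsewhere in this class and are assumptions here):

# combine latitude and longitude into a single station_name dimension,
# analogous to flatten_latlondim_station_name()
stacked = coldata.stack(station_name=('latitude', 'longitude'), inplace=False)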
- property start
Start datetime of data
- property stop
Stop datetime of data
- property time
Array containing time stamps
- to_csv(out_dir, savename=None)[source]
Save data object as .csv file
Converts data to pandas.DataFrame and then saves as csv
- Parameters:
out_dir (str) – output directory
savename (
str
, optional) – name of file, if None, the default save name is used (cf.savename_aerocom
)
- to_dataframe()[source]
Convert this object into pandas.DataFrame
The resulting DataFrame will have the following columns:
- time: Time.
- station_name: Station name.
- data_source_obs: Obs data source (e.g. EBASMC).
- data_source_mod: Model data source (e.g. EMEP).
- latitude.
- longitude.
- altitude.
- {var_name}_obs: Variable value of observation.
- {var_name}_mod: Variable value of model.
{var_name} is the AeroCom name of the variable.
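A brief usage sketch (coldata is an assumed ColocatedData instance; the output directory is a placeholder):

df = coldata.to_dataframe()  # long-format pandas.DataFrame
coldata.to_csv('/tmp')       # file name defaults to savename_aerocom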
- to_netcdf(out_dir, savename=None, **kwargs)[source]
Save data object as NetCDF file
Wrapper for method
xarray.DataArray.to_netcdf()
- Parameters:
out_dir (str) – output directory
savename (str, optional) – name of file, if None, the default save name is used (cf.
savename_aerocom
)**kwargs – additional, optional keyword arguments passed to
xarray.DataArray.to_netcdf()
- Returns:
file path of stored object.
- Return type:
- property ts_type
String specifying temporal resolution of data
- property units
Unit of data
- property unitstr
String representation of obs and model units in this object
- unstack(inplace=False, **kwargs)[source]
Unstack one or more dimensions
For details see
xarray.DataArray.unstack()
.- Parameters:
inplace (bool) – modify this object or a copy.
**kwargs – input arguments passed to
DataArray.unstack()
- Returns:
unstacked data object
- Return type:
- property var_name
Name(s) of the variable(s) contained in this object
Station data
- class pyaerocom.stationdata.StationData(**meta_info)[source]
Dict-like base class for single station data
ToDo: write more detailed introduction
Note
Variable data (e.g. numpy array or pandas Series) can be directly assigned to the object. When assigning variable data it is recommended to add variable metadata (e.g. unit, ts_type) in
var_info
, where key is variable name and value is dict with metadata entries (a short assignment sketch is given below, after the class constants).- data_err
dictionary that may be used to store uncertainty timeseries or data arrays associated with the different variable data.
- Type:
- overlap
dictionary that may be filled to store overlapping timeseries data associated with one variable. This is, for instance, used in
merge_vardata()
to store overlapping data from another station.- Type:
- PROTECTED_KEYS = ['dtime', 'var_info', 'station_coords', 'data_err', 'overlap', 'numobs', 'data_flagged']
Keys that are ignored when accessing metadata
- STANDARD_COORD_KEYS = ['latitude', 'longitude', 'altitude']
List of keys that specify standard metadata attribute names. This is used e.g. in
get_meta()
- STANDARD_META_KEYS = ['filename', 'station_id', 'station_name', 'instrument_name', 'PI', 'country', 'country_code', 'ts_type', 'latitude', 'longitude', 'altitude', 'data_id', 'dataset_name', 'data_product', 'data_version', 'data_level', 'framework', 'instr_vert_loc', 'revision_date', 'website', 'ts_type_src', 'stat_merge_pref_attr']
- VALID_TS_TYPES = ['minutely', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'native', 'coarsest']
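As referenced in the note above, a minimal assignment sketch (station metadata and values are made up for illustration; the var_info keys shown follow the note and are assumptions):

import numpy as np
import pandas as pd
from pyaerocom.stationdata import StationData

stat = StationData(station_name='Example', latitude=48.1,
                   longitude=11.6, altitude=520.0)
times = pd.date_range('2010-01-01', periods=31, freq='D')
# variable data can be assigned directly (dict-like access)
stat['od550aer'] = pd.Series(np.random.random(31), index=times)
# recommended: register per-variable metadata in var_info
stat.var_info['od550aer'] = dict(units='1', ts_type='daily')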
- calc_climatology(var_name, start=None, stop=None, min_num_obs=None, clim_mincount=None, clim_freq=None, set_year=None, resample_how=None)[source]
Calculate climatological timeseries for input variable
- Parameters:
var_name (str) – name of data variable
start – start time of data used to compute climatology
stop – stop time of data used to compute climatology
min_num_obs (dict or int, optional) – minimum number of observations required per period (when downsampling). For details see
pyaerocom.time_resampler.TimeResampler.resample()
clim_mincount (int, optional) – minimum number of monthly values required per month of climatology
set_year (int, optional) – if specified, the output data will be assigned the input year. Else the middle year of the climatological interval is used.
resample_how (str) – how should the resampled data be averaged (e.g. mean, median)
**kwargs – Additional keyword args passed to
pyaerocom.time_resampler.TimeResampler.resample()
- Returns:
new instance of StationData containing climatological data
- Return type:
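An illustrative sketch (builds on the stat object above, assuming it holds a daily od550aer timeseries spanning several years):

clim = stat.calc_climatology('od550aer', start=2005, stop=2015,
                             min_num_obs=15, set_year=2010)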
- check_unit(var_name, unit=None)[source]
Check if variable unit corresponds to a certain unit
- Parameters:
- Raises:
MetaDataError – if unit information is not accessible for input variable name
UnitConversionError – if current unit cannot be converted into specified unit (e.g. 1 vs m-1)
DataUnitError – if current unit is not equal to input unit but can be converted (e.g. 1/Mm vs 1/m)
- check_var_unit_aerocom(var_name)[source]
Check if unit of input variable is AeroCom default, if not, convert
- Parameters:
var_name (str) – name of variable
- Raises:
MetaDataError – if unit information is not accessible for input variable name
UnitConversionError – if current unit cannot be converted into specified unit (e.g. 1 vs m-1)
DataUnitError – if current unit is not equal to AeroCom default and cannot be converted.
- convert_unit(var_name, to_unit)[source]
Try to convert unit of data
Requires that unit of input variable is available in
var_info
- Parameters:
- Raises:
MetaDataError – if variable unit cannot be accessed
UnitConversionError – if conversion failed
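A hedged example (variable name and target unit are assumptions; requires that var_info contains a unit entry for the variable):

stat.convert_unit('concpm10', 'ug m-3')  # converts data, updates var_info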
- property default_vert_grid
AeroCom default grid for vertical regridding
For details, see
DEFAULT_VERT_GRID_DEF
inConfig
- Returns:
numpy array specifying default coordinates
- Return type:
ndarray
- dist_other(other)[source]
Distance to other station in km
- Parameters:
other (StationData) – other data object
- Returns:
distance between this and other station in km
- Return type:
- get_meta(force_single_value=True, quality_check=True, add_none_vals=False, add_meta_keys=None)[source]
Return meta-data as dictionary
By default, only default metadata keys are considered, use parameter add_meta_keys to add additional metadata.
- Parameters:
force_single_value (bool) – if True, then each meta value that is a list or array is converted to a single value.
quality_check (bool) – if True, and coordinate values are lists or arrays, then the standard deviation of the values is compared to the upper limits allowed for local variation. The upper limits are specified in attr.
COORD_MAX_VAR
.add_none_vals (bool) – Add metadata keys which have value set to None.
add_meta_keys (str or list, optional) – Add non-standard metadata.
- Returns:
dictionary containing the retrieved meta-data
- Return type:
- Raises:
AttributeError – if one of the meta entries is invalid
MetaDataError – in case of inconsistencies in metadata between individual time-stamps
- get_station_coords(force_single_value=True)[source]
Return coordinates as dictionary
This method uses the standard coordinate names defined in
STANDARD_COORD_KEYS
(latitude, longitude and altitude) to get the station coordinates. For each of these parameters it first looks in station_coords
if the parameter is defined (i.e. it is not None) and, if not, checks whether this object has an attribute of that name and uses that one.- Parameters:
force_single_value (bool) – if True and coordinate values are lists or arrays, then they are collapsed to single value using mean
- Returns:
dictionary containing the retrieved coordinates
- Return type:
- Raises:
AttributeError – if one of the coordinate values is invalid
CoordinateError – if local variation in either of the three spatial coordinates is found too large
- get_unit(var_name)[source]
Get unit of variable data
- Parameters:
var_name (str) – name of variable
- Returns:
unit of variable
- Return type:
- Raises:
MetaDataError – if unit cannot be accessed for variable
- get_var_ts_type(var_name, try_infer=True)[source]
Get ts_type for a certain variable
Note
Converts to ts_type string if assigned ts_type is in pandas format
- Parameters:
- Returns:
the corresponding data time resolution
- Return type:
- Raises:
MetaDataError – if no metadata is available for this variable (e.g. if
var_name
cannot be found invar_info
)
- insert_nans_timeseries(var_name)[source]
Fill up missing values with NaNs in an existing time series
Note
This method resamples the data onto a regular grid. Thus, if the input
ts_type
is different from the actual current ts_type
of the data, this method will not only insert NaNs but at the same time resample the data.- Parameters:
- Returns:
the modified station data object
- Return type:
- merge_meta_same_station(other, coord_tol_km=None, check_coords=True, inplace=True, add_meta_keys=None, raise_on_error=False)[source]
Merge meta information from other object
Note
Coordinate attributes (latitude, longitude and altitude) are not copied as they are required to be the same in both stations. The latter can be checked and ensured using input argument
check_coords
- Parameters:
other (StationData) – other data object
coord_tol_km (float) – maximum distance in km between coordinates of input StationData object and self. Only relevant if
check_coords
is True. If None, then _COORD_MAX_VAR
is used, which is defined in the class header.check_coords (bool) – if True, the coordinates are compared and it is checked whether they lie within a certain distance of each other (cf.
coord_tol_km
).inplace (bool) – if True, the metadata from the other station is added to the metadata of this station, else, a new station is returned with the merged attributes.
add_meta_keys (str or list, optional) – additional non-standard metadata keys that are supposed to be considered for merging.
raise_on_error (bool) – if True, then an Exception will be raised in case one of the metadata items cannot be merged, which is most often due to unresolvable type differences of metadata values between the two objects
- merge_other(other, var_name, add_meta_keys=None, **kwargs)[source]
Merge other station data object
- Parameters:
other (StationData) – other data object
var_name (str) – variable name for which info is to be merged (needs to be both available in this object and the provided other object)
add_meta_keys (str or list, optional) – additional non-standard metadata keys that are supposed to be considered for merging.
kwargs – keyword args passed on to
merge_vardata()
(e.g time resampling settings)
- Returns:
this object that has merged the other station
- Return type:
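A minimal sketch (stat and other_stat are assumed StationData objects that both contain od550aer with populated var_info):

merged = stat.merge_other(other_stat, 'od550aer')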
- merge_vardata(other, var_name, **kwargs)[source]
Merge variable data from other object into this object
Note
This merges also the information about this variable in the dict
var_info
. It is required that variable meta-info is specified in both StationData objects.Note
This method removes NaNs from the existing time series in the data objects. In order to fill up the time-series with NaNs again after merging, call
insert_nans_timeseries()
- Parameters:
other (StationData) – other data object
var_name (str) – variable name for which info is to be merged (needs to be both available in this object and the provided other object)
kwargs – keyword args passed on to
_merge_vardata_2d()
- Returns:
this object merged with other object
- Return type:
- merge_varinfo(other, var_name)[source]
Merge variable specific meta information from other object
- Parameters:
other (StationData) – other data object
var_name (str) – variable name for which info is to be merged (needs to be both available in this object and the provided other object)
- plot_timeseries(var_name, add_overlaps=False, legend=True, tit=None, **kwargs)[source]
Plot timeseries for variable
Note
If you set input arg
add_overlaps = True
the overlapping timeseries data - if it exists - will be plotted on top of the actual timeseries using red colour and dashed line. As the overlapping data may be identical with the actual data, you might want to increase the line width of the actual timeseries using an additional input argumentlw=4
, or similar.- Parameters:
- Returns:
matplotlib.axes instance of plot
- Return type:
axes
- Raises:
KeyError – if variable key does not exist in this dictionary
ValueError – if length of data array does not equal the length of the time array
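A usage sketch (assumes stat contains od550aer; lw is passed through to matplotlib as suggested in the note above):

ax = stat.plot_timeseries('od550aer', add_overlaps=True, lw=4)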
- remove_outliers(var_name, low=None, high=None, check_unit=True)[source]
Remove outliers from one of the variable timeseries
- Parameters:
var_name (str) – variable name
low (float) – lower end of valid range for input variable. If None, then the corresponding value from the default settings for this variable is used (cf. minimum attribute of available variables)
high (float) – upper end of valid range for input variable. If None, then the corresponding value from the default settings for this variable is used (cf. maximum attribute of available variables)
check_unit (bool) – if True, the unit of the data is checked against AeroCom default
- remove_variable(var_name)[source]
Remove variable data
- Parameters:
var_name (str) – name of variable that is to be removed
- Returns:
current instance of this object, with data removed
- Return type:
- Raises:
VarNotAvailableError – if the input variable is not available in this object
- resample_time(var_name, ts_type, how=None, min_num_obs=None, inplace=False, **kwargs)[source]
Resample one of the time-series in this object
- Parameters:
var_name (str) – name of data variable
ts_type (str) – new frequency string (can be pyaerocom ts_type or valid pandas frequency string)
how (str) – how should the resampled data be averaged (e.g. mean, median)
min_num_obs (dict or int, optional) – minimum number of observations required per period (when downsampling). For details see
pyaerocom.time_resampler.TimeResampler.resample()
inplace (bool) – if True, the current data object stored in self will be overwritten with the resampled time-series
**kwargs – Additional keyword args passed to
pyaerocom.time_resampler.TimeResampler.resample()
- Returns:
with resampled variable timeseries
- Return type:
- resample_timeseries(var_name, **kwargs)[source]
Wrapper for
resample_time()
(for backwards compatibility)Note
For backwards compatibility, this method will return a pandas Series instead of the actual StationData object
- same_coords(other, tol_km=None)[source]
Compare station coordinates of other station with this station
- Parameters:
other (StationData) – other data object
tol_km (float) – distance tolerance in km
- Returns:
if True, the two objects are located within the specified tolerance range
- Return type:
- select_altitude(var_name, altitudes)[source]
Extract variable data within certain altitude range
Note
Beta version
- Parameters:
- Returns:
data object within input altitude range
- Return type:
pandas.Series or xarray.DataArray
- to_timeseries(var_name, **kwargs)[source]
Get pandas.Series object for one of the data columns
- Parameters:
var_name (str) – name of variable (e.g. “od550aer”)
- Returns:
time series object
- Return type:
Series
- Raises:
KeyError – if variable key does not exist in this dictionary
ValueError – if length of data array does not equal the length of the time array
- property units
Dictionary containing units of all variables in this object
- property vars_available
List of variables available in this data object
Other data classes
- class pyaerocom.vertical_profile.VerticalProfile(data: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], altitude: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], dtime, var_name: str, data_err: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None, var_unit: str, altitude_unit: str)[source]
Object representing single variable profile data
- property altitude
Array containing altitude values corresponding to data
- property data
Array containing data values
- property data_err
Array containing uncertainty values corresponding to data
Co-location routines
High-level co-location engine
Classes and methods to perform high-level colocation.
- class pyaerocom.colocation.colocator.Colocator(colocation_setup: ColocationSetup | dict, **kwargs)[source]
High level class for running co-location
Note
This object requires an instance of
ColocationSetup
.- get_model_name()[source]
Get name of model
Note
Not to be confused with
model_id
which is always the database ID of the model, while model_name can differ from that and is used for output files, etc.- Raises:
AttributeError – If neither model_id nor model_name is set
- Returns:
preferably
model_name
, else model_id
- Return type:
- get_nc_files_in_coldatadir()[source]
Get list of NetCDF files in colocated data directory
- Returns:
list of NetCDF file paths found
- Return type:
- get_obs_name()[source]
Get name of obsdata source
Note
Not to be confused with
obs_id
which is always the database ID of the observation dataset, while obs_name can differ from that and is used for output files, etc.- Raises:
AttributeError – If neither obs_id nor obs_name is set
- Returns:
preferably
obs_name
, else obs_id
- Return type:
- property model_reader
Model data reader
- property model_vars
List of all model variables specified in config
Note
This method does not check if the variables are valid or available.
- Returns:
list of all model variables specified in this setup.
- Return type:
- property obs_reader
Observation data reader
- prepare_run(var_list: list | None = None) dict [source]
Prepare colocation run for current setup.
- Parameters:
var_list (list, optional) – list of variables to be analysed. The default is None, in which case all defined variables are attempted to be colocated.
- Raises:
AttributeError – If no observation variables are defined (
obs_vars
empty).- Returns:
vars_to_process – Mapping of variables to be processed, keys are model vars, values are obs vars.
- Return type:
- run(var_list: list | None = None)[source]
Perform colocation for current setup
See also
prepare_run()
.- Parameters:
var_list (list, optional) – list of variables supposed to be analysed. The default is None, in which case all defined variables are attempted to be colocated.
- Returns:
nested dictionary, where keys are model variables, values are dictionaries comprising key / value pairs of obs variables and associated instances of
ColocatedData
.- Return type:
- class pyaerocom.colocation.colocation_setup.ColocationSetup(model_id: str | None = None, pyaro_config: PyaroConfig | None = None, obs_id: str | None = None, obs_vars: tuple[str, ...] | None = (), ts_type: str = 'monthly', start: Timestamp | int | None = None, stop: Timestamp | int | None = None, basedir_coldata: str = '/home/docs/MyPyaerocom/colocated_data', save_coldata: bool = False, *, OBS_VERT_TYPES_ALT: dict[str, str] = {'2D': '2D', 'Surface': 'ModelLevel'}, CRASH_ON_INVALID: bool = False, FORBIDDEN_KEYS: list[str] = ['var_outlier_ranges', 'var_ref_outlier_ranges', 'remove_outliers'], filter_name: str = 'ALL-wMOUNTAINS', obs_name: str | None = None, obs_data_dir: Path | str | None = None, obs_use_climatology: bool = False, obs_cache_only: bool = False, obs_vert_type: str | None = None, obs_ts_type_read: str | dict | None = None, obs_filters: dict = {}, colocation_layer_limits: tuple[LayerLimits, ...] | None = None, profile_layer_limits: tuple[LayerLimits, ...] | None = None, read_opts_ungridded: dict | None = {}, model_name: str | None = None, model_data_dir: Path | str | None = None, model_read_opts: dict | None = {}, model_use_vars: dict[str, str] | None = {}, model_rename_vars: dict[str, str] | None = {}, model_add_vars: dict[str, tuple[str, ...]] | None = {}, model_to_stp: bool = False, model_ts_type_read: str | dict | None = None, model_read_aux: dict[str, dict[Literal['vars_required', 'fun'], list[str] | Callable]] | None = {}, model_use_climatology: bool = False, gridded_reader_id: dict[str, str] = {'model': 'ReadGridded', 'obs': 'ReadGridded'}, flex_ts_type: bool = True, min_num_obs: dict | int | None = None, resample_how: str | dict | None = 'mean', obs_remove_outliers: bool = False, model_remove_outliers: bool = False, obs_outlier_ranges: dict[str, tuple[float, float]] | None = {}, model_outlier_ranges: dict[str, tuple[float, float]] | None = {}, zeros_to_nan: bool = False, harmonise_units: bool = False, regrid_res_deg: float | RegridResDeg | None = None, colocate_time: bool = False, reanalyse_existing: bool = True, raise_exceptions: bool = False, keep_data: bool = True, add_meta: dict | None = {}, model_kwargs: dict = {}, main_freq: str = 'monthly', freqs: list[str] = ['monthly', 'yearly'])[source]
Setup class for high-level model / obs co-location.
An instance of this setup class can be used to run a colocation analysis between a model and an observation network and will create a number of
pya.ColocatedData
instances, which can be saved automatically as NetCDF files.Apart from co-location, this class also handles reading of the input data for co-location. Supported co-location options are:
1. gridded vs. ungridded data: for instance, 3D model data (instance of
GriddedData
) with lat, lon and time dimensions that is co-located with station-based observations, which are represented in pyaerocom through UngriddedData
objects. The co-location function used is pyaerocom.colocation.colocate_gridded_ungridded()
. For this type of co-location, the output co-located data object will be 3-dimensional, with dimensions data_source (index 0: obs, index 1: model), time and station_name.2. gridded vs. gridded data: for instance, 3D model data that is co-located with 3D satellite data (both instances of
GriddedData
), both objects with lat, lon and time dimensions. The co-location function used is pyaerocom.colocation.colocate_gridded_gridded()
. For this type of co-location, the output co-located data object will be 4-dimensional, with dimensions data_source (index 0: obs, index 1: model), time, latitude and longitude.- pyaro_config
In case Pyaro is used, a config must be provided. In that case obs_id (see below) is ignored and only the config is used.
- Type:
PyaroConfig
- obs_vars
Variables to be analysed (need to be available in input obs dataset). Variables that are not available in the model data output will be skipped. Alternatively, model variables to be used for a given obs variable can also be specified via attributes
model_use_vars
andmodel_add_vars
.
- start
Start time of colocation. Input can be integer denoting the year or anything that can be converted into
pandas.Timestamp
using pyaerocom.helpers.to_pandas_timestamp()
. If None, then the first available date in the model data is used.- Type:
pandas._libs.tslibs.timestamps.Timestamp | int | str | None
- stop
stop time of colocation. int or anything that can be converted into
pandas.Timestamp
using pyaerocom.helpers.to_pandas_timestamp()
or None. If None and if start
is on resolution of year (e.g. start=2010
) then stop
will be automatically set to the end of that year. Else, it will be set to the last available timestamp in the model data.- Type:
pandas._libs.tslibs.timestamps.Timestamp | int | str | None
- filter_name
name of filter to be applied. If None, no filter is used (to be precise, if None, then
pyaerocom.const.DEFAULT_REG_FILTER
is used which should default to ALL-wMOUNTAINS, that is, no filtering).- Type:
- obs_name
if provided, this string will be used in colocated data filename to specify obsnetwork, else obs_id will be used.
- Type:
str, optional
- obs_data_dir
location of obs data. If None, attempt to infer obs location based on obs ID.
- Type:
str, optional
- obs_use_climatology
BETA: if True, pyaerocom default climatology is computed from observation stations (so far only possible for ungridded / gridded colocation).
- Type:
- obs_vert_type
AeroCom vertical code encoded in the model filenames (only AeroCom 3 and later). Specifies which model file should be read in case there are multiple options (e.g. surface level data can be read from a Surface.nc file as well as from a ModelLevel.nc file). If input is string (e.g. ‘Surface’), then the corresponding vertical type code is used for reading of all variables that are colocated (i.e. that are specified in
obs_vars
).- Type:
- obs_ts_type_read
may be specified to explicitly define the reading frequency of the observation data (so far, this does only apply to gridded obsdata such as satellites), either as str (same for all obs variables) or variable specific as dict. For ungridded reading, the frequency may be specified via
obs_id
, where applicable (e.g. AeronetSunV3Lev2.daily). Not to be confused withts_type
, which specifies the frequency used for colocation. Can be specified variable specific in form of dictionary.
- obs_filters
filters applied to the observational dataset before co-location. In case of gridded / gridded, these are filters that can be passed to
pyaerocom.io.ReadGridded.read_var()
, for instance, flex_ts_type, or constraints. In case the obsdata is ungridded (gridded / ungridded co-locations) these are filters that are handled through keyword filter_post inpyaerocom.io.ReadUngridded.read()
. These filters are applied to theUngriddedData
objects after reading and caching the data, so changing them will not invalidate the latest cache of the
.- Type:
- read_opts_ungridded
dictionary that specifies reading constraints for ungridded reading, and are passed as **kwargs to
pyaerocom.io.ReadUngridded.read()
. Note that, other than for obs_filters, these filters are applied during the reading of the
objects and specifying them will deactivate caching.- Type:
dict, optional
- model_name
if provided, this string will be used in colocated data filename to specify model, else obs_id will be used.
- Type:
str, optional
- model_data_dir
Location of model data. If None, attempt to infer model location based on model ID.
- Type:
str, optional
- model_read_opts
options for model reading (passed as keyword args to
pyaerocom.io.ReadUngridded.read()
).- Type:
dict, optional
- model_use_vars
dictionary that specifies mapping of model variables. Keys are observation variables, values are the corresponding model variables (e.g. model_use_vars=dict(od550aer='od550csaer')). Example: your observation has var od550aer but your model uses a different variable name for that variable, say od550. Then, you can specify this via model_use_vars = {'od550aer' : 'od550'}. NOTE: in this case, a model variable od550aer will be ignored, even if it exists (cf
model_add_vars
).- Type:
dict, optional
- model_rename_vars
rename certain model variables after co-location, before storing the associated
ColocatedData
object on disk. Keys are model variables, values are new names (e.g. model_rename_vars={‘od550aer’:’MyAOD’}). Note: this does not impact which variables are read from the model.- Type:
dict, optional
- model_add_vars
additional model variables to be processed for one obs variable. E.g. model_add_vars={‘od550aer’: [‘od550so4’, ‘od550gt1aer’]} would co-locate both model SO4 AOD (od550so4) and model coarse mode AOD (od550gt1aer) with total AOD (od550aer) from obs (in addition to od550aer vs od550aer if applicable).
- Type:
dict, optional
- model_to_stp
ALPHA (please do not use): convert model data values to STP conditions after co-location. Note: this only works for very particular settings at the moment and needs revision, as it relies on access to meteorological data.
- Type:
- model_ts_type_read
may be specified to explicitly define the reading frequency of the model data, either as str (same for all obs variables) or variable specific as dict. Not to be confused with
ts_type
, which specifies the output frequency of the co-located data.
- model_read_aux
may be used to specify additional computation methods of variables from models. Keys are variables to be computed, values are dictionaries with keys vars_required (list of required variables for computation of var) and fun (method that takes list of read data objects and computes and returns var).
- Type:
dict, optional
- model_use_climatology
if True, attempt to use climatological model data field. Note: this only works if model data is in AeroCom conventions (climatological fields are indicated with 9999 as year in the filename) and if this is active, only single-year analyses are supported (i.e. provide int to
start
to specify the year and leavestop
empty).- Type:
- model_kwargs
Key word arguments to be given to the model reader class’s read_var and init function
- Type:
- gridded_reader_id
BETA: dictionary specifying which gridded reader is supposed to be used for model (and gridded obs) reading. Note: this is a workaround solution and will likely be removed in the future when the gridded reading API is more harmonised (see https://github.com/metno/pyaerocom/issues/174).
- Type:
- flex_ts_type
Boolean specifying whether reading frequency of gridded data is allowed to be flexible. This includes all gridded data, whether it is model or gridded observation (e.g. satellites). Defaults to True.
- Type:
- min_num_obs
time resampling constraints applied, defaults to None, in which case no constraints are applied. For instance, say your input is in daily resolution and you want output in monthly and you want to make sure to have roughly 50% daily coverage for the monthly averages. Then you may specify min_num_obs=15 which will ensure that at least 15 daily averages are available to compute a monthly average. However, you may also define a hierarchical scheme that first goes from daily to weekly and then from weekly to monthly, via a dict. E.g. min_num_obs=dict(monthly=dict(weekly=4), weekly=dict(daily=3)) would ensure that each week has at least 3 daily values, as well as that each month has at least 4 weekly values.
- resample_how
string specifying how data should be aggregated when resampling in time. Default is "mean". Can also be a nested dictionary, e.g. resample_how={'conco3': {'daily': {'hourly': 'max'}}} would use the maximum value to aggregate from hourly to daily for variable conco3, rather than the mean.
- obs_remove_outliers
if True, outliers are removed from obs data before colocation, else not. Default is False. Custom outlier ranges for each variable can be specified via
obs_outlier_ranges
, and for all other variables, the pyaerocom default outlier ranges are used. The latter are specified in variables.ini file via minimum and maximum attributes and can also be accessed throughpyaerocom.variable.Variable.minimum
andpyaerocom.variable.Variable.maximum
, respectively.- Type:
- model_remove_outliers
if True, outliers are removed from model data (normally this should be set to False, as the models are supposed to be assessed, including outlier cases). Default is False. Custom outlier ranges for each variable can be specified via
model_outlier_ranges
, and for all other variables, the pyaerocom default outlier ranges are used. The latter are specified in variables.ini file via minimum and maximum attributes and can also be accessed throughpyaerocom.variable.Variable.minimum
andpyaerocom.variable.Variable.maximum
, respectively.- Type:
- obs_outlier_ranges
dictionary specifying outlier ranges for individual obs variables. (e.g. dict(od550aer = [-0.05, 10], ang4487aer=[0,4])). Only relevant if
obs_remove_outliers
is True.- Type:
dict, optional
- model_outlier_ranges
like
obs_outlier_ranges
but for model variables. Only relevant ifmodel_remove_outliers
is True.- Type:
dict, optional
- zeros_to_nan
If True, zeros in the output co-located data object will be converted to NaN. Default is False.
- Type:
- harmonise_units
if True, units are attempted to be harmonised during co-location (note: raises Exception if True and in case units cannot be harmonised).
- Type:
- regrid_res_deg
regrid resolution in degrees. If specified, the input gridded data objects will be regridded in lon / lat dimension to the input resolution (if input is float, both lat and lon are regridded to that resolution, if input is dict, use keys lat_res_deg and lon_res_deg to specify regrid resolutions, respectively). Default is None.
- colocate_time
if True and if obs and model sampling frequency (e.g. daily) are higher than output colocation frequency (e.g. monthly), then the datasets are first colocated in time (e.g. on a daily basis), before the monthly averages are calculated. Default is False.
- Type:
- reanalyse_existing
if True, always redo co-location, even if there is already an existing co-located NetCDF file (under the output location specified by
basedir_coldata
) for the given variable combination to be co-located. If False and output already exists, then co-location is skipped for the associated variable. This flag is also used for contour-plots. Default is True.- Type:
- raise_exceptions
if True, Exceptions that may occur for individual variables to be processed, are raised, else the analysis is skipped for such cases.
- Type:
- keep_data
if True, then all colocated data objects computed when running
run()
will be stored indata
. Defaults to True.- Type:
- add_meta
additional metadata that is supposed to be added to each output
ColocatedData
object.- Type:
- main_freq
Main output frequency for AeroVal (some of the AeroVal processing steps are only done for this resolution, since they would create too much output otherwise, such as statistics timeseries or scatter plot in “Overall Evaluation” tab on AeroVal). Note that this frequency needs to be included in next setting “freqs”.
- Type:
- CRASH_ON_INVALID: bool
do not raise Exception if invalid item is attempted to be assigned (Overwritten from base class)
- OBS_VERT_TYPES_ALT: dict[str, str]
Dictionary specifying alternative vertical types that may be used to read model data. E.g. consider the variable ec550aer, obs_vert_type='Surface' and obs_vert_type_alt=dict(Surface='ModelLevel'). Now, if a model used for the analysis does not contain a data file for ec550aer at the surface ('ec550aer*Surface.nc'), then the colocation routine will look for 'ec550aer*ModelLevel.nc' and, if this exists, it will load it and extract the surface level.
- add_glob_meta(**kwargs)[source]
Add global metadata to
add_meta
- Parameters:
kwargs – metadata to be added
- Return type:
None
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'protected_namespaces': ()}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
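To illustrate how ColocationSetup and Colocator interact, a hedged end-to-end sketch (all IDs and settings are placeholders to be adapted to datasets available in your database):

from pyaerocom.colocation.colocation_setup import ColocationSetup
from pyaerocom.colocation.colocator import Colocator

setup = ColocationSetup(
    model_id='EXAMPLE-MODEL',          # hypothetical model database ID
    obs_id='AeronetSunV3Lev2.daily',   # obs dataset ID (frequency via ID)
    obs_vars=('od550aer',),
    obs_vert_type='Column',
    ts_type='monthly',                 # output colocation frequency
    start=2010,
    min_num_obs=dict(monthly=dict(daily=15)),
    save_coldata=False,
)
col = Colocator(setup)
results = col.run()  # {model_var: {obs_var: ColocatedData}}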
Low-level co-location functions
Methods and / or classes to perform colocation
- pyaerocom.colocation.colocation_utils.colocate_gridded_gridded(data, data_ref, ts_type=None, start=None, stop=None, filter_name=None, regrid_res_deg: float | RegridResDeg | None = None, harmonise_units=True, regrid_scheme: str = 'areaweighted', update_baseyear_gridded=None, min_num_obs=None, colocate_time=False, resample_how=None, **kwargs)[source]
Colocate 2 gridded data objects
- Parameters:
data (GriddedData) – gridded data (e.g. model results)
data_ref (GriddedData) – reference data (e.g. gridded satellite object, observation data or other model) that is co-located with data.
ts_type (str, optional) – desired temporal resolution of output colocated data (e.g. “monthly”). Defaults to None, in which case the highest possible resolution is used.
start (str or datetime64 or similar, optional) – start time for colocation, if None, the start time of the input
GriddedData
object is usedstop (str or datetime64 or similar, optional) – stop time for colocation, if None, the stop time of the input
GriddedData
object is usedfilter_name (str, optional) – string specifying filter used (cf.
pyaerocom.filter.Filter
for details). If None, then it is set to ‘ALL-wMOUNTAINS’, which corresponds to no filtering (world with mountains). Use ALL-noMOUNTAINS to exclude mountain sites.regrid_res_deg (int or dict, optional) – regrid resolution in degrees. If specified, the input gridded data objects will be regridded in lon / lat dimension to the input resolution (if input is integer, both lat and lon are regridded to that resolution, if input is dict, use keys lat_res_deg and lon_res_deg to specify regrid resolutions, respectively).
harmonise_units (bool) – if True, units are attempted to be harmonised (note: raises Exception if True and units cannot be harmonised). Defaults to True.
regrid_scheme (str) – iris scheme used for regridding (defaults to area weighted regridding)
update_baseyear_gridded (int, optional) – optional input that can be set in order to redefine the time dimension in the first gridded data object data to be analysed. E.g., if the data object is a climatology (one year of data) that has set the base year of the time dimension to a value other than the specified input start / stop time, this may be used to update the time in order to make co-location possible.
min_num_obs (int or dict, optional) – minimum number of observations for resampling of time
colocate_time (bool) – if True and if original time resolution of data is higher than desired time resolution (ts_type), then both datasets are colocated in time before resampling to lower resolution.
resample_how (str or dict) – string specifying how data should be aggregated when resampling in time. Default is “mean”. Can also be a nested dictionary, e.g. resample_how={‘daily’: {‘hourly’ : ‘max’}} would use the maximum value to aggregate from hourly to daily, rather than the mean.
**kwargs – additional keyword args (not used here, but included such that factory class can handle different methods with different inputs)
- Returns:
instance of colocated data
- Return type:
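A minimal call sketch (model_data and sat_data stand for two already loaded GriddedData objects; the settings are illustrative):

from pyaerocom.colocation.colocation_utils import colocate_gridded_gridded

coldata = colocate_gridded_gridded(
    model_data, sat_data,
    ts_type='monthly',
    regrid_res_deg=5,   # regrid both objects to a 5 degree grid
    min_num_obs=15,
)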
- pyaerocom.colocation.colocation_utils.colocate_gridded_ungridded(data, data_ref, ts_type=None, start=None, stop=None, filter_name=None, regrid_res_deg: float | RegridResDeg | None = None, harmonise_units=True, regrid_scheme: str = 'areaweighted', var_ref=None, update_baseyear_gridded=None, min_num_obs=None, colocate_time=False, use_climatology_ref=False, resample_how=None, **kwargs)[source]
Colocate gridded with ungridded data (low level method)
For high-level colocation see
pyaerocom.colocation.Colocator
andpyaerocom.ColocationSetup
Note
Uses the variable that is contained in input
GriddedData
object (since these objects only contain a single variable). If this variable is not contained in observation data (or contained but using a different variable name) you may specify the obs variable to be used via input arg var_ref- Parameters:
data (GriddedData) – gridded data object (e.g. model results).
data_ref (UngriddedData) – ungridded data object (e.g. observations).
ts_type (str) – desired temporal resolution of colocated data (must be valid AeroCom ts_type str such as daily, monthly, yearly).
start (
str
ordatetime64
or similar, optional) – start time for colocation, if None, the start time of the inputGriddedData
object is used.stop (
str
ordatetime64
or similar, optional) – stop time for colocation, if None, the stop time of the inputGriddedData
object is usedfilter_name (str) – string specifying filter used (cf.
pyaerocom.filter.Filter
for details). If None, then it is set to ‘ALL-wMOUNTAINS’, which corresponds to no filtering (world with mountains). Use ALL-noMOUNTAINS to exclude mountain sites.regrid_res_deg (int or dict, optional) – regrid resolution in degrees. If specified, the input gridded data object will be regridded in lon / lat dimension to the input resolution (if input is integer, both lat and lon are regridded to that resolution, if input is dict, use keys lat_res_deg and lon_res_deg to specify regrid resolutions, respectively).
harmonise_units (bool) – if True, units are attempted to be harmonised (note: raises Exception if True and units cannot be harmonised).
var_ref (
str
, optional) – variable against which data in arg data is supposed to be compared. If None, then the same variable is used (i.e. data.var_name).update_baseyear_gridded (int, optional) – optional input that can be set in order to re-define the time dimension in the gridded data object to be analysed. E.g., if the data object is a climatology (one year of data) that has set the base year of the time dimension to a value other than the specified input start / stop time this may be used to update the time in order to make colocation possible.
min_num_obs (int or dict, optional) – minimum number of observations for resampling of time
colocate_time (bool) – if True and if original time resolution of data is higher than desired time resolution (ts_type), then both datasets are colocated in time before resampling to lower resolution.
use_climatology_ref (bool) – if True, climatological timeseries are used from observations
resample_how (str or dict) – string specifying how data should be aggregated when resampling in time. Default is “mean”. Can also be a nested dictionary, e.g. resample_how={‘daily’: {‘hourly’ : ‘max’}} would use the maximum value to aggregate from hourly to daily, rather than the mean.
**kwargs – additional keyword args (passed to
UngriddedData.to_station_data_all()
)
- Returns:
instance of colocated data
- Return type:
- Raises:
VarNotAvailableError – if grid data variable is not available in ungridded data object
AttributeError – if instance of input
UngriddedData
object contains more than one datasetTimeMatchError – if gridded data time range does not overlap with input time range
ColocationError – if none of the data points in input
UngriddedData
matches the input colocation constraints
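A corresponding sketch for the gridded / ungridded case (model_data is a GriddedData object containing od550aer, obs_data an UngriddedData object with a single dataset; all settings are illustrative):

from pyaerocom.colocation.colocation_utils import colocate_gridded_ungridded

coldata = colocate_gridded_ungridded(
    model_data, obs_data,
    ts_type='monthly',
    var_ref='od550aer',  # obs variable to compare against
    min_num_obs=dict(monthly=dict(daily=15)),
)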
- pyaerocom.colocation.colocation_utils.correct_model_stp_coldata(coldata, p0=None, t0=273.15, inplace=False)[source]
Correct modeldata in colocated data object to STP conditions
Note
BETA version, quite inelegantly coded (at 8pm 3 weeks before IPCC deadline), but should do the job for 2010 monthly colocated data files (AND NOTHING ELSE)!
- pyaerocom.colocation.colocation_utils.resolve_var_name(data)[source]
Check variable name of GriddedData against AeroCom default
Checks whether the variable name set in the data corresponds to the AeroCom variable name, or whether it is an alias. Returns both the variable name set and the AeroCom variable name.
- Parameters:
data (GriddedData) – Data to be checked.
- Returns:
str – variable name as set in data (may be alias, but may also be AeroCom variable name, in which case first and second return parameter are the same).
str – corresponding AeroCom variable name
Methods and / or classes to perform 3D colocation
- class pyaerocom.colocation.colocation_3d.ColocatedDataLists(colocateddata_for_statistics, colocateddata_for_profile_viz)[source]
- colocateddata_for_profile_viz: list[ColocatedData]
Alias for field number 1
- colocateddata_for_statistics: list[ColocatedData]
Alias for field number 0
- pyaerocom.colocation.colocation_3d.colocate_vertical_profile_gridded(data, data_ref, ts_type: str | None = None, start: str | None = None, stop: str | None = None, filter_name: str | None = None, regrid_res_deg: float | RegridResDeg | None = None, harmonise_units: bool = True, regrid_scheme: str = 'areaweighted', var_ref: str | None = None, update_baseyear_gridded: int | None = None, min_num_obs: int | dict | None = None, colocate_time: bool = False, use_climatology_ref: bool = False, resample_how: str | dict | None = None, colocation_layer_limits: tuple[LayerLimits, ...] | None = None, profile_layer_limits: tuple[LayerLimits, ...] | None = None, **kwargs) ColocatedDataLists [source]
Colocate vertical profile data with gridded (model) data
The guts of this function are placed in a helper function so as not to repeat the code. This is done because colocation must occur twice:
1. at the vertical resolution at which the statistics are computed
2. at a finer vertical resolution for profile visualization
Some things you do not want to compute twice, however. So (most of) the things that apply to both colocation instances are computed here and then passed to the helper function.
- Returns:
colocated_data_lists : ColocatedDataLists
Co-locating ungridded observations
- pyaerocom.combine_vardata_ungridded.combine_vardata_ungridded(data_ids_and_vars, match_stats_how='closest', match_stats_tol_km=1, merge_how='combine', merge_eval_fun=None, var_name_out=None, data_id_out=None, var_unit_out=None, resample_how=None, min_num_obs=None, add_meta_keys=None)[source]
Combine and colocate different variables from UngriddedData
This method allows combining different variable timeseries from different ungridded observation records in multiple ways. The source data may all be included in a single instance of UngriddedData or in multiple ones; for details see the first input parameter data_ids_and_vars. Merging can be done in flexible ways, e.g. by combining measurements of the same variable from 2 different datasets or by computing new variables based on 2 measured variables (e.g. concox=concno2+conco3). Doing this requires colocation of site locations and timestamps of both input observation records, which is done in this method.
It comprises 2 major steps:
- Compute list of StationData objects for both input data combinations (data_id1 & var1; data_id2 & var2) and, based on these, find the coincident locations. Finding coincident sites can either be done based on site location name or based on their lat/lon locations. The method to use can be specified via input arg match_stats_how.
- For all coincident locations, a new instance of StationData is computed that has merged the 2 timeseries in the way that can be specified through input args merge_how and merge_eval_fun. If the 2 original timeseries from both sites come in different temporal resolutions, they will be resampled to the lower of both resolutions. Resampling constraints that are supposed to be applied in that case can be provided via the respective input args for temporal resampling. Default is pyaerocom default, which corresponds to ~25% coverage constraint (as of 22.10.2020) for major resolution steps, such as daily->monthly.
Note
Currently, only 2 variables can be combined to a new one (e.g. concox=conco3+concno2).
Note
Be aware of unit conversion issues that may arise if your input data is not in AeroCom default units. For details see below.
- Parameters:
data_ids_and_vars (list) – list of 3 element tuples, each containing, in the following order 1. instance of
UngriddedData
; 2. dataset ID (remember that UngriddedData can contain more than one dataset); and 3. variable name. Note that currently only 2 of such tuples can be combined.match_stats_how (str, optional) – String specifying how site locations are supposed to be matched. The default is ‘closest’. Supported are ‘closest’ and ‘station_name’.
match_stats_tol_km (float, optional) – radius tolerance in km for matching site locations when using ‘closest’ for site location matching. The default is 1.
merge_how (str, optional) – String specifying how to merge variable data at site locations. The default is ‘combine’. If both input variables are the same and combine is used, then the first input variable will be preferred over the other. Supported are ‘combine’, ‘mean’ and ‘eval’, for the latter, merge_eval_fun needs to be specified explicitly.
merge_eval_fun (str, optional) – String specifying how var1 and var2 data should be evaluated (only relevant if merge_how='eval' is used). The default is None. E.g. if one wants to retrieve the column aerosol fine mode fraction at 550nm (fmf550aer) through AERONET, this could be done through the SDA product by providing data_id1 and var1 as 'AeronetSDA' and 'od550aer', and second input data_id2 and var2 as 'AeronetSDA' and 'od550lt1aer'; merge_eval_fun could then be 'fmf550aer=(AeronetSDA;od550lt1aer/AeronetSDA;od550aer)*100'. Note that the input variables will be converted to their AeroCom default units, so the specification of merge_eval_fun should take that into account in case the originally read obsdata is not in default units.
var_name_out (str, optional) – Name of output variable. Default is None, in which case it is attempted to be inferred.
data_id_out (str, optional) – data_id set in output StationData objects. Default is None, in which case it is inferred from input data_ids (e.g. in above example of merge_eval_fun, the output data_id would be 'AeronetSDA' since both input IDs are the same).
var_unit_out (str) – unit of output variable.
resample_how (str, optional) – String specifying how temporal resampling should be done. The default is ‘mean’.
min_num_obs (int or dict, optional) – Minimum number of observations for temporal resampling. The default is None in which case pyaerocom default is used, which is available via pyaerocom.const.OBS_MIN_NUM_RESAMPLE.
add_meta_keys (list, optional) – additional metadata keys to be added to output StationData objects from input data. If None, then only the pyaerocom default keys are added (see StationData.STANDARD_META_KEYS).
- Raises:
ValueError – If input for merge_how or match_stats_how is invalid.
NotImplementedError – If one of the input UngriddedData objects contains more than one dataset.
- Returns:
merged_stats – list of StationData objects containing the colocated and combined variable data.
- Return type:
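A hedged sketch of the eval-based merging described above (data is an assumed UngriddedData instance containing both variables from a single, hypothetical dataset ID 'MyObsNetwork'; the output unit is also an assumption):

from pyaerocom.combine_vardata_ungridded import combine_vardata_ungridded

stats = combine_vardata_ungridded(
    [(data, 'MyObsNetwork', 'concno2'),
     (data, 'MyObsNetwork', 'conco3')],
    merge_how='eval',
    merge_eval_fun='concox=MyObsNetwork;concno2+MyObsNetwork;conco3',
    var_name_out='concox',
    var_unit_out='ug m-3',
)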
Reading of gridded data
Gridded data specifies any dataset that can be represented and stored on a
regular grid within a certain domain (e.g. lat, lon time), for instance, model
output or level 3 satellite data, stored, for instance, as NetCDF files.
In pyaerocom, the underlying data object is GriddedData
and
pyaerocom supports reading of such data for different file naming conventions.
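For orientation, a brief reading sketch using the ReadGridded class documented below (the data ID is a hypothetical placeholder; read_var is the per-variable reading method referenced elsewhere in this document):

from pyaerocom.io import ReadGridded

reader = ReadGridded('EXAMPLE-MODEL_CTRL2016')
data = reader.read_var('od550aer', start=2010, ts_type='monthly')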
Gridded data using AeroCom conventions
- class pyaerocom.io.readgridded.ReadGridded(data_id=None, data_dir=None, file_convention='aerocom3')[source]
Class for reading gridded files using AeroCom file conventions
- data_id
string ID for model or obsdata network (see e.g. Aerocom interface map plots lower left corner)
- Type:
- data
imported data object
- Type:
- start
start time for data import
- Type:
- stop
stop time for data import
- Type:
- file_convention
class specifying details of the file naming convention for the model
- Type:
FileConventionRead
- files
list containing all filenames that were found. Filled, e.g. in
ReadGridded.get_model_files()
- Type:
- from_files
List of all netCDF files that were used to concatenate the current data cube (i.e. that can be based on certain matching settings such as var_name or time interval).
- Type:
- ts_types
list of all sampling frequencies (e.g. hourly, daily, monthly) that were inferred from filenames (based on Aerocom file naming convention) of all files that were found
- Type:
- vars
list containing all variable names (e.g. od550aer) that were inferred from filenames based on Aerocom model file naming convention
- Type:
- Parameters:
data_id (str) – string ID of model (e.g. “AATSR_SU_v4.3”,”CAM5.3-Oslo_CTRL2016”)
data_dir (str, optional) – directory containing data files. If provided, only this directory is considered for data files, else the input data_id is used to search for the corresponding directory.
file_convention (str) – string ID specifying the file convention of this model (cf. installation file file_conventions.ini)
init (bool) – if True, the model directory is searched (
search_data_dir()
) on instantiation and if it is found, all valid files for this model are searched usingsearch_all_files()
.
- AUX_ADD_ARGS = {'concprcpoxn': {'prlim': 0.0001, 'prlim_set_under': nan, 'prlim_units': 'm d-1', 'ts_type': 'daily'}, 'concprcpoxs': {'prlim': 0.0001, 'prlim_set_under': nan, 'prlim_units': 'm d-1', 'ts_type': 'daily'}, 'concprcprdn': {'prlim': 0.0001, 'prlim_set_under': nan, 'prlim_units': 'm d-1', 'ts_type': 'daily'}}
Additional arguments passed to computation methods for auxiliary data This is optional and defined per-variable like in AUX_FUNS
- AUX_ALT_VARS = {'ac550dryaer': ['ac550aer'], 'od440aer': ['od443aer'], 'od870aer': ['od865aer']}
- AUX_FUNS = {'ang4487aer': <function compute_angstrom_coeff_cubes>, 'angabs4487aer': <function compute_angstrom_coeff_cubes>, 'conc*': <function multiply_cubes>, 'concNhno3': <function calc_concNhno3_from_vmr>, 'concNnh3': <function calc_concNnh3_from_vmr>, 'concNnh4': <function calc_concNnh4>, 'concNno3pm10': <function calc_concNno3pm10>, 'concNno3pm25': <function calc_concNno3pm25>, 'concNtnh': <function calc_concNtnh>, 'concNtno3': <function calc_concNtno3>, 'concno3': <function add_cubes>, 'concno3pm10': <function calc_concno3pm10>, 'concno3pm25': <function calc_concno3pm25>, 'concox': <function add_cubes>, 'concprcpoxn': <function compute_concprcp_from_pr_and_wetdep>, 'concprcpoxs': <function compute_concprcp_from_pr_and_wetdep>, 'concprcprdn': <function compute_concprcp_from_pr_and_wetdep>, 'concsspm10': <function add_cubes>, 'concsspm25': <function calc_sspm25>, 'dryoa': <function add_cubes>, 'fmf550aer': <function divide_cubes>, 'mmr*': <function mmr_from_vmr>, 'od550gt1aer': <function subtract_cubes>, 'sc550dryaer': <function subtract_cubes>, 'vmrox': <function add_cubes>, 'wetoa': <function add_cubes>}
- AUX_REQUIRES = {'ang4487aer': ('od440aer', 'od870aer'), 'angabs4487aer': ('abs440aer', 'abs870aer'), 'conc*': ('mmr*', 'rho'), 'concNhno3': ('vmrhno3',), 'concNnh3': ('vmrnh3',), 'concNnh4': ('concnh4',), 'concNno3pm10': ('concno3f', 'concno3c'), 'concNno3pm25': ('concno3f', 'concno3c'), 'concNtnh': ('concnh4', 'vmrnh3'), 'concNtno3': ('concno3f', 'concno3c', 'vmrhno3'), 'concno3': ('concno3c', 'concno3f'), 'concno3pm10': ('concno3f', 'concno3c'), 'concno3pm25': ('concno3f', 'concno3c'), 'concox': ('concno2', 'conco3'), 'concprcpoxn': ('wetoxn', 'pr'), 'concprcpoxs': ('wetoxs', 'pr'), 'concprcprdn': ('wetrdn', 'pr'), 'concsspm10': ('concss25', 'concsscoarse'), 'concsspm25': ('concss25', 'concsscoarse'), 'dryoa': ('drypoa', 'drysoa'), 'fmf550aer': ('od550lt1aer', 'od550aer'), 'mmr*': ('vmr*',), 'od550gt1aer': ('od550aer', 'od550lt1aer'), 'rho': ('ts', 'ps'), 'sc550dryaer': ('ec550dryaer', 'ac550dryaer'), 'vmrox': ('vmrno2', 'vmro3'), 'wetoa': ('wetpoa', 'wetsoa')}
- CONSTRAINT_OPERATORS = {'!=': <ufunc 'not_equal'>, '<': <ufunc 'less'>, '<=': <ufunc 'less_equal'>, '==': <ufunc 'equal'>, '>': <ufunc 'greater'>, '>=': <ufunc 'greater_equal'>}
- property TS_TYPES
List of valid filename encodings specifying temporal resolution
Update 7.11.2019: no longer in use, due to improved handling of all possible frequencies via the TsType class.
- VERT_ALT = {'Surface': 'ModelLevel'}
- apply_read_constraint(data, constraint, **kwargs)[source]
Filter a GriddedData object by value in another variable
Note
BETA version that was implemented in a rush in order to apply an AOD>0.1 threshold when reading AE.
- Parameters:
data (GriddedData) – data object to which constraint is applied
constraint (dict) – dictionary defining read constraint (see
check_constraint_valid()
for minimum requirements). If the constraint contains the key var_name (not mandatory), then the corresponding variable is attempted to be read and used to evaluate the constraint, and the resulting boolean mask is then applied to the input data. Wherever this mask is True (i.e. the constraint is met), the current value in the input data will be replaced with numpy.ma.masked or, if specified, with the entry new_val in the input constraint dict.
**kwargs (TYPE) – reading arguments in case additional variable data needs to be loaded to determine the filter mask (i.e. if var_name is specified in the input constraint). Passed to
read_var()
.
- Raises:
ValueError – If constraint is invalid (cf.
check_constraint_valid()
for details).
- Returns:
modified data object (all grid points that met the constraint are replaced with either numpy.ma.masked or with a value that can be specified via the key new_val in the input constraint).
- Return type:
- browser
This object can be used to
- check_compute_var(var_name)[source]
Check if variable name belongs to family that can be computed
For instance, if input var_name is concdust this method will check
AUX_REQUIRES
to see if there is a variable family pattern (conc*) defined that specifies how to compute these variables. If a match is found, the required variables and computation method is added viaadd_aux_compute()
.
- check_constraint_valid(constraint)[source]
Check if reading constraint is valid
- Parameters:
constraint (dict) – reading constraint. Requires at least entries for the following keys: - operator (str): for valid operators see
CONSTRAINT_OPERATORS
- filter_val (float): value against which data is evaluated with respect to operator
- Raises:
ValueError – If constraint is invalid
- Return type:
None.
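For orientation, a hedged sketch of a constraint dict that should pass this check (values are illustrative; var_name and new_val are the optional keys described under apply_read_constraint()):

constraint = {
    "operator": "<=",   # key into CONSTRAINT_OPERATORS
    "filter_val": 0.1,  # value the data is evaluated against
}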
- compute_var(var_name, start=None, stop=None, ts_type=None, experiment=None, vert_which=None, flex_ts_type=True, prefer_longer=False, vars_to_read=None, aux_fun=None, try_convert_units=True, aux_add_args=None, rename_var=None, **kwargs)[source]
Compute auxiliary variable
Like
read_var()
but for auxiliary variables (cf. AUX_REQUIRES)
- Parameters:
var_name (str) – variable that is supposed to be read
start (Timestamp or str, optional) – start time of data import (if valid input, then the current
start
will be overwritten)stop (Timestamp or str, optional) – stop time of data import
ts_type (str) – string specifying temporal resolution (choose from hourly, 3hourly, daily, monthly). If None, the highest-priority available resolution is used
experiment (str) – name of experiment (only relevant if this dataset contains more than one experiment)
vert_which (str) – valid AeroCom vertical info string encoded in name (e.g. Column, ModelLevel)
flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable
prefer_longer (bool) – if True and applicable, the ts_type resulting in the longer time coverage will be preferred over other possible frequencies that match the query.
try_convert_units (bool) – if True, units of GriddedData objects are attempted to be converted to the AeroCom default. This applies both to the GriddedData objects being read for the computation and to the variable computed from the former objects. This is, for instance, useful when computing concentration in precipitation from wet deposition and precipitation amount.
rename_var (str) – if this is set, the var_name attribute of the output GriddedData object will be updated accordingly.
**kwargs – additional keyword args passed to
_load_var()
- Returns:
loaded data object
- Return type:
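For illustration, a hedged sketch (the data ID is a placeholder): since ang4487aer is registered in AUX_REQUIRES as computable from od440aer and od870aer, it can be requested directly and the required input variables are read behind the scenes:

reader = ReadGridded(data_id="SOME_MODEL_ID")  # placeholder data ID
ang = reader.compute_var("ang4487aer", start="2010", ts_type="monthly")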
- concatenate_cubes(cubes)[source]
Concatenate list of cubes into one cube
- Parameters:
cubes (CubeList) – list of individual cubes
- Returns:
Single cube that contains concatenated cubes from input list
- Return type:
Cube
- Raises:
iris.exceptions.ConcatenateError – if concatenation of all cubes failed
- property file_type
File type of data files
- filter_files(var_name=None, ts_type=None, start=None, stop=None, experiment=None, vert_which=None, is_at_stations=False, df=None)[source]
Filter file database
- Parameters:
var_name (str) – variable that is supposed to be read
ts_type (str) – string specifying temporal resolution (choose from “hourly”, “3hourly”, “daily”, “monthly”). If None, the highest-priority available resolution is used
start (Timestamp or str, optional) – start time of data import
stop (Timestamp or str, optional) – stop time of data import
experiment (str) – name of experiment (only relevant if this dataset contains more than one experiment)
vert_which (str or dict, optional) – valid AeroCom vertical info string encoded in name (e.g. Column, ModelLevel) or dictionary containing var_name as key and vertical coded string as value, accordingly
flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable
prefer_longer (bool) – if True and applicable, the ts_type resulting in the longer time coverage will be preferred over other possible frequencies that match the query.
- filter_query(var_name, ts_type=None, start=None, stop=None, experiment=None, vert_which=None, is_at_stations=False, flex_ts_type=True, prefer_longer=False)[source]
Filter files for read query based on input specs
- Returns:
dataframe containing filtered dataset
- Return type:
DataFrame
- find_common_ts_type(vars_to_read, start=None, stop=None, ts_type=None, experiment=None, vert_which=None, flex_ts_type=True)[source]
Find common ts_type for list of variables to be read
- Parameters:
vars_to_read (list) – list of variables that are supposed to be read
start (Timestamp or str, optional) – start time of data import (if valid input, then the current start will be overwritten)
stop (Timestamp or str, optional) – stop time of data import (if valid input, then the current
stop
will be overwritten)
ts_type (str) – string specifying temporal resolution (choose from hourly, 3hourly, daily, monthly). If None, the highest-priority available resolution is used
experiment (str) – name of experiment (only relevant if this dataset contains more than one experiment)
vert_which (str) – valid AeroCom vertical info string encoded in name (e.g. Column, ModelLevel)
flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable
- Returns:
common ts_type for input variables
- Return type:
- Raises:
DataCoverageError – if no match can be found
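A short usage sketch (continuing with the ReadGridded instance from the sketch above; variables are illustrative): before reading several variables together, a common frequency can be settled on first:

common = reader.find_common_ts_type(["od440aer", "od870aer"], ts_type="daily",
                                    flex_ts_type=True)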
- get_files(var_name, ts_type=None, start=None, stop=None, experiment=None, vert_which=None, is_at_stations=False, flex_ts_type=True, prefer_longer=False)[source]
Get data files based on input specs
- get_var_info_from_files() dict [source]
Creates a dictionary that contains variable-specific meta information
- Returns:
dictionary where keys are available variables and values (for each variable) contain information about available ts_types, years, etc.
- Return type:
- property name
Deprecated name of attribute data_id
- read(vars_to_retrieve=None, start=None, stop=None, ts_type=None, experiment=None, vert_which=None, flex_ts_type=True, prefer_longer=False, require_all_vars_avail=False, **kwargs)[source]
Read all variables that could be found
Reads all variables that are available (i.e. in
vars_filename
)
- Parameters:
vars_to_retrieve (list or str, optional) – variables that are supposed to be read. If None, all variables that are available are read.
start (Timestamp or str, optional) – start time of data import
stop (Timestamp or str, optional) – stop time of data import
ts_type (str, optional) – string specifying temporal resolution (choose from “hourly”, “3hourly”, “daily”, “monthly”). If None, the highest-priority available resolution is used
experiment (str) – name of experiment (only relevant if this dataset contains more than one experiment)
vert_which (str or dict, optional) – valid AeroCom vertical info string encoded in name (e.g. Column, ModelLevel) or dictionary containing var_name as key and vertical coded string as value, accordingly
flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable
prefer_longer (bool) – if True and applicable, the ts_type resulting in the longer time coverage will be preferred over other possible frequencies that match the query.
require_all_vars_avail (bool) – if True, it is strictly required that all input variables are available.
**kwargs – optional and support for deprecated input args
- Returns:
loaded data objects (type
GriddedData
)
- Return type:
- Raises:
IOError – if input variable names is not list or string
1. if
require_all_vars_avail=True
and one or more of the desired variables is not available in this class
2. if require_all_vars_avail=True
and none of the input variables is available in this object
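For illustration, a hedged sketch of reading several variables at once (continuing with the reader from above; variables are placeholders); the result contains one GriddedData object per variable:

data_list = reader.read(vars_to_retrieve=["od550aer", "od440aer"],
                        start="2010", stop="2012")
for data in data_list:
    print(data.var_name)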
- read_var(var_name, start=None, stop=None, ts_type=None, experiment=None, vert_which=None, flex_ts_type=True, prefer_longer=False, aux_vars=None, aux_fun=None, constraints=None, try_convert_units=True, rename_var=None, **kwargs)[source]
Read model data for a specific variable
This method searches all valid files for a given variable and for a provided temporal resolution (e.g. daily, monthly), optionally within a certain time window that may be specified on class instantiation or via the corresponding input parameters of this method.
The individual NetCDF files for a given temporal period are loaded as instances of the
iris.Cube
object and appended to an instance of theiris.cube.CubeList
object. The latter is then used to concatenate the individual cubes in time into a single instance of thepyaerocom.GriddedData
class. For this to work, several requirements must be met, which may be controlled within the global settings for NetCDF import using the attribute GRID_IO
(instance ofOnLoad
) in the default instance of thepyaerocom.config.Config
object accessible viapyaerocom.const
.
- Parameters:
var_name (str) – variable that is supposed to be read
start (Timestamp or str, optional) – start time of data import
stop (Timestamp or str, optional) – stop time of data import
ts_type (str) – string specifying temporal resolution (choose from “hourly”, “3hourly”, “daily”, “monthly”). If None, the highest-priority available resolution is used
experiment (str) – name of experiment (only relevant if this dataset contains more than one experiment)
vert_which (str or dict, optional) – valid AeroCom vertical info string encoded in name (e.g. Column, ModelLevel) or dictionary containing var_name as key and vertical coded string as value, accordingly
flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable
prefer_longer (bool) – if True and applicable, the ts_type resulting in the longer time coverage will be preferred over other possible frequencies that match the query.
aux_vars (list) – only relevant if var_name is not available for reading but needs to be computed: list of variables that are required to compute var_name
aux_fun (callable) – only relevant if var_name is not available for reading but needs to be computed: custom method for computation (cf.
add_aux_compute()
for details)constraints (list, optional) – list of reading constraints (dict type). See
check_constraint_valid()
andapply_read_constraint()
for details related to the format of the individual constraints.
try_convert_units (bool) – if True, then the unit of the variable data is checked against the AeroCom default unit for that variable and, if it deviates, an attempt is made to convert it to the AeroCom default unit. Default is True.
rename_var (str) – if this is set, the var_name attribute of the output GriddedData object will be updated accordingly.
**kwargs – additional keyword args passed to
_load_var()
- Returns:
loaded data object
- Return type:
- Raises:
AttributeError – if none of the ts_types identified from file names is valid
VarNotAvailableError – if specified ts_type is not supported
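Putting the constraints parameter to use, a hedged sketch mirroring the AOD-threshold use case mentioned under apply_read_constraint() (variable names are illustrative): this masks the Angstrom exponent wherever the corresponding AOD is at or below 0.1:

ang = reader.read_var(
    "ang4487aer",
    constraints=[{"var_name": "od550aer", "operator": "<=", "filter_val": 0.1}],
)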
- property registered_var_patterns
List of string patterns for computation of variables
The information is extracted from
AUX_REQUIRES
- Returns:
list of variable patterns
- Return type:
- search_all_files(update_file_convention=True)[source]
Search all valid model files for this model
This method browses the data directory and finds all valid files, that is, files that are named according to one of the AeroCom file naming conventions. The file list is stored in
files
.Note
It is presumed that naming conventions of files in the data directory are not mixed but all correspond to one of the conventions defined in
- Parameters:
update_file_convention (bool) – if True, the first file in data_dir is used to identify the file naming convention (cf.
FileConventionRead
)
- Raises:
DataCoverageError – if no valid files could be found
- search_data_dir()[source]
Search data directory based on model ID
Wrapper for method
search_data_dir_aerocom()
- property start
First available year in the dataset (inferred from filenames)
- property stop
Last available year in the dataset (inferred from filenames)
- property ts_types
Available frequencies
- update(**kwargs)[source]
Update one or more valid parameters
- Parameters:
**kwargs – keyword args that will be used to update (overwrite) valid class attributes such as data, data_dir, files
- property vars
- property vars_filename
- property vars_provided
Variables provided by this dataset
Gridded data using EMEP conventions
Reading of ungridded data
In contrast to gridded data, ungridded data represents data that is irregularly sampled in space and time, for instance, observations at different locations around the globe. Such data is represented in pyaerocom by UngriddedData, which is essentially a point-cloud dataset. Reading of UngriddedData is typically specific to each observational data record, as such records come in various data formats using various metadata conventions that need to be harmonised during data import.
The following flowchart illustrates the architecture of ungridded reading in pyaerocom. Below is information about the individual reading classes for each dataset (blue in the flowchart), the abstract template base classes the reading classes are based on (dark green) and the factory class ReadUngridded (orange), which has all individual reading classes registered. The data classes that are returned by the reading classes are indicated in light green.
[Flowchart: architecture of ungridded reading in pyaerocom]
ReadUngridded factory class
Factory class that has all reading classes for the individual datasets registered.
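A minimal sketch of the factory in action (the data ID is one of the registered network IDs, cf. the SUPPORTED_DATASETS lists of the individual readers):

from pyaerocom.io import ReadUngridded

reader = ReadUngridded()
data = reader.read(data_ids="AeronetSunV3Lev2.daily", vars_to_retrieve="od550aer")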
- class pyaerocom.io.readungridded.ReadUngridded(data_ids=None, ignore_cache=False, data_dirs=None, configs: PyaroConfig | list[PyaroConfig] | None = None)[source]
Factory class for reading of ungridded data based on obsnetwork ID
This class also features reading functionality that goes beyond reading of individual observation datasets, including reading of multiple datasets and post-computation of new variables based on datasets that can be read.
- Parameters:
COMING SOON
- DONOTCACHE_NAME = 'DONOTCACHE'
- property INCLUDED_DATASETS
- INCLUDED_READERS = [<class 'pyaerocom.io.read_aeronet_invv3.ReadAeronetInvV3'>, <class 'pyaerocom.io.read_aeronet_sdav3.ReadAeronetSdaV3'>, <class 'pyaerocom.io.read_aeronet_sunv3.ReadAeronetSunV3'>, <class 'pyaerocom.io.read_earlinet.ReadEarlinet'>, <class 'pyaerocom.io.read_ebas.ReadEbas'>, <class 'pyaerocom.io.read_aasetal.ReadAasEtal'>, <class 'pyaerocom.io.read_airnow.ReadAirNow'>, <class 'pyaerocom.io.read_eea_aqerep.ReadEEAAQEREP'>, <class 'pyaerocom.io.read_eea_aqerep_v2.ReadEEAAQEREP_V2'>, <class 'pyaerocom.io.cams2_83.read_obs.ReadCAMS2_83'>, <class 'pyaerocom.io.gaw.reader.ReadGAW'>, <class 'pyaerocom.io.ghost.reader.ReadGhost'>, <class 'pyaerocom.io.cnemc.reader.ReadCNEMC'>, <class 'pyaerocom.io.icos.reader.ReadICOS'>, <class 'pyaerocom.io.icpforests.reader.ReadICPForest'>]
- property SUPPORTED_DATASETS
Returns list of strings containing all supported dataset names
- SUPPORTED_READERS = [<class 'pyaerocom.io.read_aeronet_invv3.ReadAeronetInvV3'>, <class 'pyaerocom.io.read_aeronet_sdav3.ReadAeronetSdaV3'>, <class 'pyaerocom.io.read_aeronet_sunv3.ReadAeronetSunV3'>, <class 'pyaerocom.io.read_earlinet.ReadEarlinet'>, <class 'pyaerocom.io.read_ebas.ReadEbas'>, <class 'pyaerocom.io.read_aasetal.ReadAasEtal'>, <class 'pyaerocom.io.read_airnow.ReadAirNow'>, <class 'pyaerocom.io.read_eea_aqerep.ReadEEAAQEREP'>, <class 'pyaerocom.io.read_eea_aqerep_v2.ReadEEAAQEREP_V2'>, <class 'pyaerocom.io.cams2_83.read_obs.ReadCAMS2_83'>, <class 'pyaerocom.io.gaw.reader.ReadGAW'>, <class 'pyaerocom.io.ghost.reader.ReadGhost'>, <class 'pyaerocom.io.cnemc.reader.ReadCNEMC'>, <class 'pyaerocom.io.icos.reader.ReadICOS'>, <class 'pyaerocom.io.icpforests.reader.ReadICPForest'>, <class 'pyaerocom.io.pyaro.read_pyaro.ReadPyaro'>]
- add_config(config: PyaroConfig) None [source]
Adds single PyaroConfig to self.configs
- Parameters:
config (PyaroConfig)
- Raises:
ValueError – If config is not PyaroConfig
- add_pyaro_reader(config: PyaroConfig) ReadUngriddedBase [source]
- property configs
List configs
- property data_id
ID of dataset
Note
Only works if exactly one dataset is assigned to the reader, that is, length of
data_ids
is 1.
- Raises:
AttributeError – if number of items in
data_ids
is not equal to one.
- Returns:
data ID
- Return type:
- property data_ids
List of datasets supposed to be read
- get_lowlevel_reader(data_id: str | None = None) ReadUngriddedBase [source]
Helper method that returns an initiated reader class for the input ID
- Parameters:
data_id (str) – Name of dataset
- Returns:
instance of reading class (needs to be implementation of base class
ReadUngriddedBase
).
- Return type:
- get_vars_supported(obs_id: str, vars_desired: list[str])[source]
Filter input list of variables by supported ones for a certain data ID
- property ignore_cache
Boolean specifying whether caching is active or not
- property post_compute
Information about datasets that can be computed in post
- read(data_ids=None, vars_to_retrieve=None, only_cached=False, filter_post=None, configs: PyaroConfig | list[PyaroConfig] | None = None, **kwargs)[source]
Read observations
Iterate over all datasets in
data_ids
, callread_dataset()
and append to the data object
- Parameters:
data_ids (str or list) – data ID or list of all datasets to be imported
vars_to_retrieve (str or list) – variable or list of variables to be imported
only_cached (bool) – if True, then nothing is reloaded but only data is loaded that is available as cached objects (not recommended to use but may be used if working offline without connection to database)
filter_post (dict, optional) – filters applied to UngriddedData object AFTER it is read into memory, via
UngriddedData.apply_filters()
. This option was introduced in pyaerocom version 0.10.0 and should be used preferably over **kwargs. There is a certain flexibility with respect to how these filters can be defined, for instance, via sub-dicts for each data_id. The most common way is to provide directly the input needed for UngriddedData.apply_filters. If you want to read multiple variables from one or more datasets and apply variable-specific filters, it is recommended to read the data individually for each variable and corresponding set of filters and then merge the individual filtered UngriddedData objects afterwards, e.g. using data_var1 & data_var2.
**kwargs – Additional input options for reading of data, which are applied WHILE the data is read. If any such additional options are provided, then automatic caching of the output UngriddedData object is deactivated. Thus, it is recommended to handle data filtering via the filter_post argument whenever possible, which results in better performance, as the unconstrained original data is read in and cached, and the filtering is applied afterwards.
Example
>>> import pyaerocom.io.readungridded as pio
>>> from pyaerocom import const
>>> obj = pio.ReadUngridded(data_ids=const.AERONET_SUN_V3L15_AOD_ALL_POINTS_NAME)
>>> data = obj.read()
>>> print(data)
>>> print(data.metadata[0.]['latitude'])
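As noted above, filtering is best applied after reading so that the unconstrained data can be cached. A hedged sketch (continuing with the reader from the factory sketch above; the filter key is illustrative and must be valid input for UngriddedData.apply_filters):

data = reader.read(
    data_ids="AeronetSunV3Lev2.daily",
    vars_to_retrieve="od550aer",
    filter_post={"altitude": [0, 1000]},  # hypothetical filter: stations below 1000 m
)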
- read_dataset(data_id, vars_to_retrieve=None, only_cached=False, filter_post=None, **kwargs)[source]
Read dataset into an instance of
ReadUngridded
- Parameters:
data_id (str) – name of dataset
vars_to_retrieve (list) – variable or list of variables to be imported
only_cached (bool) – if True, then nothing is reloaded but only data is loaded that is available as cached objects (not recommended to use but may be used if working offline without connection to database)
filter_post (dict, optional) – filters applied to UngriddedData object AFTER it is read into memory, via
UngriddedData.apply_filters()
. This option was introduced in pyaerocom version 0.10.0 and should be used preferably over **kwargs. There is a certain flexibility with respect to how these filters can be defined, for instance, via sub-dicts for each data_id. The most common way is to provide directly the input needed for UngriddedData.apply_filters. If you want to read multiple variables from one or more datasets and apply variable-specific filters, it is recommended to read the data individually for each variable and corresponding set of filters and then merge the individual filtered UngriddedData objects afterwards, e.g. using data_var1 & data_var2.
**kwargs – Additional input options for reading of data, which are applied WHILE the data is read. If any such additional options are provided, then automatic caching of the output UngriddedData object is deactivated. Thus, it is recommended to handle data filtering via the filter_post argument whenever possible, which results in better performance, as the unconstrained original data is read in and cached, and the filtering is applied afterwards.
- Returns:
data object
- Return type:
- read_dataset_post(data_id, vars_to_retrieve, only_cached=False, filter_post=None, **kwargs)[source]
Read dataset into an instance of
ReadUngridded
- Parameters:
data_id (str) – name of dataset
vars_to_retrieve (list) – variable or list of variables to be imported
only_cached (bool) – if True, then nothing is reloaded but only data is loaded that is available as cached objects (not recommended to use but may be used if working offline without connection to database)
filter_post (dict, optional) – filters applied to UngriddedData object AFTER it is read into memory, via
UngriddedData.apply_filters()
. This option was introduced in pyaerocom version 0.10.0 and should be used preferably over **kwargs. There is a certain flexibility with respect to how these filters can be defined, for instance, via sub-dicts for each data_id. The most common way is to provide directly the input needed for UngriddedData.apply_filters. If you want to read multiple variables from one or more datasets and apply variable-specific filters, it is recommended to read the data individually for each variable and corresponding set of filters and then merge the individual filtered UngriddedData objects afterwards, e.g. using data_var1 & data_var2.
**kwargs – Additional input options for reading of data, which are applied WHILE the data is read. If any such additional options are provided, then automatic caching of the output UngriddedData object is deactivated. Thus, it is recommended to handle data filtering via the filter_post argument whenever possible, which results in better performance, as the unconstrained original data is read in and cached, and the filtering is applied afterwards.
- Returns:
data object
- Return type:
- property supported_datasets
Wrapper for
SUPPORTED_DATASETS
ReadUngriddedBase template class
All ungridded reading routines are based on this template class.
- class pyaerocom.io.readungriddedbase.ReadUngriddedBase(data_id: str | None = None, data_dir: str | None = None)[source]
TEMPLATE: Abstract base class template for reading of ungridded data
Note
The two dictionaries
AUX_REQUIRES
andAUX_FUNS
can be filled with variables that are not contained in the original data files but are computed during the reading. The former specifies what additional variables are required to perform the computation and the latter specifies functions used to perform the computations of the auxiliary variables. See, for instance, the classReadAeronetSunV3
, which includes the computation of the AOD at 550nm and the Angstrom coefficient (in the 440-870 nm range) from AODs measured at other wavelengths.
- AUX_FUNS = {}
Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)
- AUX_REQUIRES = {}
dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)
- property AUX_VARS
List of auxiliary variables (keys of attr.
AUX_REQUIRES
)
Auxiliary variables are those that are not included in original files but are computed from other variables during import
- abstract property DATA_ID
Name of dataset (OBS_ID)
Note
May be implemented as global constant in header of derived class
May be multiple that can be specified on init (see example below)
- abstract property DEFAULT_VARS
List containing default variables to read
- IGNORE_META_KEYS = []
- abstract property PROVIDES_VARIABLES
List of variables that are provided by this dataset
Note
May be implemented as global constant in header
- property REVISION_FILE
Name of revision file located in data directory
- abstract property SUPPORTED_DATASETS
List of all datasets supported by this interface
Note
best practice to specify in header of class definition
needless to mention that
DATA_ID
needs to be in this list
- abstract property TS_TYPE
Temporal resolution of dataset
This should be defined in the header of an implementation class if it can be globally defined for the corresponding obs-network; in other cases it should be initiated as string
undefined
and then, if applicable, updated in the reading routine of a file. The TS_TYPE information should ultimately be written into the meta-data of objects returned by the implementation of
read_file()
(e.g. instance ofStationData
or a normal dictionary) and the methodread()
(which should ALWAYS return an instance of theUngriddedData
class).
Note
Please use
"undefined"
if the derived class is not sampled on a regular basis.
If applicable please use Aerocom ts_type (i.e. hourly, 3hourly, daily, monthly, yearly)
Note also that the ts_type in a derived class may or may not be defined in a general case. For instance, in the EBAS database the resolution code can be found in the file header and may thus be initiated as
"undefined"
in the initiation of the reading class and then updated when the files are being read.
For derived implementation classes that support reading of multiple network versions, you may also assign
- check_vars_to_retrieve(vars_to_retrieve)[source]
Separate variables that are in file from those that are computed
Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).
The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute
AUX_REQUIRES
).This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.
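To make the AUX mechanism concrete, a hedged sketch of how a (hypothetical) implementation class could register a computed variable; the class, variable choice and helper function are purely illustrative:

def calc_od550aer(data):
    # hypothetical helper: derive the 550 nm AOD from the inputs
    # listed in AUX_REQUIRES (available as vectors in data)
    ...

class ReadMyNetwork(ReadUngriddedBase):
    AUX_REQUIRES = {"od550aer": ["od500aer", "ang4487aer"]}
    AUX_FUNS = {"od550aer": calc_od550aer}
    # ... abstract members (DATA_ID, read_file, etc.) still to be implemented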
- compute_additional_vars(data, vars_to_compute)[source]
Compute all additional variables
The computations for each additional parameter are done using the specified methods in
AUX_FUNS
.- Parameters:
data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param
vars_to_compute
)
vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in
AUX_VARS
and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).
- Returns:
updated data object now containing also computed variables
- Return type:
- property data_dir: str
Location of the dataset
Note
This can be set explicitly when instantiating the class (e.g. if data is available on local machine). If unspecified, the data location is attempted to be inferred via
get_obsnetwork_dir()
- Raises:
FileNotFoundError – if data directory does not exist or cannot be retrieved automatically
- Type:
- property data_id
ID of dataset
- property data_revision
Revision string from file Revision.txt in the main data directory
- find_in_file_list(pattern=None)[source]
Find all files that match a certain wildcard pattern
- Parameters:
pattern (
str
, optional) – wildcard pattern that may be used to narrow down the search (e.g. usepattern=*Berlin*
to find only files that contain Berlin in their filename)- Returns:
list containing all files in
files
that match pattern- Return type:
- Raises:
IOError – if no matches can be found
- get_file_list(pattern=None)[source]
Search all files to be read
Uses
_FILEMASK
(+ optional input search pattern, e.g. station_name) to find valid files for query.
- logger
Class own instance of logger class
- abstract read(vars_to_retrieve=None, files=[], first_file=None, last_file=None)[source]
Method that reads list of files as instance of
UngriddedData
- Parameters:
vars_to_retrieve (
list
or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded
files (
list
, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().
first_file (
int
, optional) – index of first file in file list to read. If None, the very first file in the list is used
last_file (
int
, optional) – index of last file in list to read. If None, the very last file in the list is used
- Returns:
instance of ungridded data object containing data from all files.
- Return type:
- abstract read_file(filename, vars_to_retrieve=None)[source]
Read single file
- Parameters:
filename (str) – string specifying filename
vars_to_retrieve (
list
or similar, optional,) – list containing variable IDs that are supposed to be read. If None, all variables inPROVIDES_VARIABLES
are loaded
- Returns:
imported data in a suitable format that can be handled by
read()
which is supposed to append the loaded results from this method (which reads one datafile) to an instance ofUngriddedData
for all files.- Return type:
dict
orStationData
, or other…
- read_first_file(**kwargs)[source]
Read first file returned from
get_file_list()
Note
This method may be used for test purposes.
- Parameters:
**kwargs – keyword args passed to
read_file()
(e.g. vars_to_retrieve)- Returns:
dictionary or similar containing loaded results from first file
- Return type:
dict-like
- read_station(station_id_filename, **kwargs)[source]
Read data from a single station into
UngriddedData
Find all files that contain the station ID in their filename and then call
read()
, providing the reduced filelist as input, in order to read all files from this station into data object.- Parameters:
- Returns:
loaded data
- Return type:
- Raises:
IOError – if no files can be found for this station ID
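For instance, with reader being an instance of a concrete implementation class (the station name is hypothetical and must occur in the filenames):

station_data = reader.read_station("Alta_Floresta", vars_to_retrieve="od550aer")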
- remove_outliers(data, vars_to_retrieve, **valid_rng_vars)[source]
Remove outliers from data
- Parameters:
data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param
vars_to_compute
)
vars_to_retrieve (list) – list of variable names for which outliers will be removed from data
**valid_rng_vars – additional keyword args specifying variable name and corresponding min / max interval (list or tuple) that specifies the valid range for the variable. For each variable that is not explicitly defined here, the default minimum / maximum value is used (accessed via
pyaerocom.const.VARS[var_name]
)
- var_supported(var_name)[source]
Check if input variable is supported
- Parameters:
var_name (str) – AeroCom variable name or alias
- Raises:
VariableDefinitionError – if input variable is not supported by pyaerocom
- Returns:
True, if variable is supported by this interface, else False
- Return type:
- property verbosity_level
Current level of verbosity of logger
AERONET
Aerosol Robotic Network (AERONET)
AERONET base class
All AERONET reading classes are based on the template ReadAeronetBase
class which, in turn, inherits from ReadUngriddedBase
.
- class pyaerocom.io.readaeronetbase.ReadAeronetBase(data_id=None, data_dir=None)[source]
Bases:
ReadUngriddedBase
TEMPLATE: Abstract base class template for reading of Aeronet data
Extended abstract base class, derived from low-level base class
ReadUngriddedBase
that contains some more functionality.
- ALT_VAR_NAMES_FILE = {}
dictionary specifying alternative column names for variables defined in
VAR_NAMES_FILE
- Type:
OPTIONAL
- AUX_FUNS = {}
Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)
- AUX_REQUIRES = {}
dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)
- property AUX_VARS
List of auxiliary variables (keys of attr.
AUX_REQUIRES
)
Auxiliary variables are those that are not included in original files but are computed from other variables during import
- COL_DELIM = ','
column delimiter in data block of files
- abstract property DATA_ID
Name of dataset (OBS_ID)
Note
May be implemented as global constant in header of derived class
May be multiple that can be specified on init (see example below)
- DEFAULT_UNIT = '1'
Default data unit that is assigned to all variables that are not specified in UNITS dictionary (cf.
UNITS
)
- abstract property DEFAULT_VARS
List containing default variables to read
- IGNORE_META_KEYS = ['date', 'time', 'day_of_year']
- INSTRUMENT_NAME = 'sun_photometer'
name of measurement instrument
- META_NAMES_FILE = {}
dictionary specifying the file column names (values) for each metadata key (cf. attributes of
StationData
, e.g. ‘station_name’, ‘longitude’, ‘latitude’, ‘altitude’)
- META_NAMES_FILE_ALT = ({},)
- abstract property PROVIDES_VARIABLES
List of variables that are provided by this dataset
Note
May be implemented as global constant in header
- property REVISION_FILE
Name of revision file located in data directory
- abstract property SUPPORTED_DATASETS
List of all datasets supported by this interface
Note
best practice to specify in header of class definition
needless to mention that
DATA_ID
needs to be in this list
- property TS_TYPE
Default implementation of string for temporal resolution
- TS_TYPES = {}
dictionary assigning temporal resolution flags for supported datasets that are provided in a defined temporal resolution. Key is the name of the dataset and value is the corresponding ts_type
- UNITS = {}
Variable specific units, only required for variables that deviate from
DEFAULT_UNIT
(is irrelevant for all variables that are so far supported by the implemented Aeronet products, i.e. all variables are dimensionless as specified inDEFAULT_UNIT
)
- VAR_NAMES_FILE = {}
dictionary specifying the file column names (values) for each Aerocom variable (keys)
- VAR_PATTERNS_FILE = {}
Mappings for identifying variables in file (may be specified in addition to explicit variable names specified in VAR_NAMES_FILE)
- check_vars_to_retrieve(vars_to_retrieve)
Separate variables that are in file from those that are computed
Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).
The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute
AUX_REQUIRES
).This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.
- property col_index
Dictionary that specifies the index for each data column
Note
Implementation depends on the data. For instance, if the variable information is provided in all files (of all stations) and always in the same column, then this can be set as a fixed dictionary in the __init__ function of the implementation (see e.g. class
ReadAeronetSunV2
). In other cases, it may not be ensured that each variable is available in all files, or the column definition may differ between different stations. In the latter case you may automate the column index retrieval by providing the header names for each meta and data column you want to extract using the attribute dictionaries META_NAMES_FILE
andVAR_NAMES_FILE
by calling_update_col_index()
in your implementation ofread_file()
when you reach the line that contains the header information.
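As a concrete illustration of these mapping attributes, a hedged sketch of how an implementation might declare them (the class is hypothetical; the column names follow the AERONET Sun V3 header, cf. ReadAeronetSunV3 below):

class ReadMyAeronetProduct(ReadAeronetBase):
    META_NAMES_FILE = {"station_name": "AERONET_Site",
                       "latitude": "Site_Latitude(Degrees)"}
    VAR_NAMES_FILE = {"od500aer": "AOD_500nm"}
    # _update_col_index() can then resolve the column indices from the header line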
- compute_additional_vars(data, vars_to_compute)
Compute all additional variables
The computations for each additional parameter are done using the specified methods in
AUX_FUNS
.- Parameters:
data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param
vars_to_compute
)
vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in
AUX_VARS
and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).
- Returns:
updated data object now containing also computed variables
- Return type:
- property data_dir: str
Location of the dataset
Note
This can be set explicitly when instantiating the class (e.g. if data is available on local machine). If unspecified, the data location is attempted to be inferred via
get_obsnetwork_dir()
- Raises:
FileNotFoundError – if data directory does not exist or cannot be retrieved automatically
- Type:
- property data_id
ID of dataset
- property data_revision
Revision string from file Revision.txt in the main data directory
- find_in_file_list(pattern=None)
Find all files that match a certain wildcard pattern
- Parameters:
pattern (
str
, optional) – wildcard pattern that may be used to narrow down the search (e.g. usepattern=*Berlin*
to find only files that contain Berlin in their filename)- Returns:
list containing all files in
files
that match pattern- Return type:
- Raises:
IOError – if no matches can be found
- get_file_list(pattern=None)
Search all files to be read
Uses
_FILEMASK
(+ optional input search pattern, e.g. station_name) to find valid files for query.
- infer_wavelength_colname(colname, low=250, high=2000)[source]
Get variable wavelength from column name
- Parameters:
- Returns:
wavelength in nm as a string representing a floating point number
- Return type:
- Raises:
ValueError – if None or more than one number is detected in variable string
- logger
Class own instance of logger class
- read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, file_pattern=None, common_meta=None)[source]
Method that reads list of files as instance of
UngriddedData
- Parameters:
vars_to_retrieve (
list
or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded
files (
list
, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().
first_file (
int
, optional) – index of first file in file list to read. If None, the very first file in the list is used. Note: ignored if input parameter file_pattern is specified.
last_file (
int
, optional) – index of last file in list to read. If None, the very last file in the list is used. Note: ignored if input parameter file_pattern is specified.
file_pattern (str, optional) – string pattern for file search (cf.
get_file_list()
)
common_meta (dict, optional) – dictionary that contains additional metadata shared for this network (assigned to each metadata block of the
UngriddedData
object that is returned)
- Returns:
data object
- Return type:
- abstract read_file(filename, vars_to_retrieve=None)
Read single file
- Parameters:
filename (str) – string specifying filename
vars_to_retrieve (
list
or similar, optional,) – list containing variable IDs that are supposed to be read. If None, all variables inPROVIDES_VARIABLES
are loaded
- Returns:
imported data in a suitable format that can be handled by
read()
which is supposed to append the loaded results from this method (which reads one datafile) to an instance ofUngriddedData
for all files.- Return type:
dict
orStationData
, or other…
- read_first_file(**kwargs)
Read first file returned from
get_file_list()
Note
This method may be used for test purposes.
- Parameters:
**kwargs – keyword args passed to
read_file()
(e.g. vars_to_retrieve)- Returns:
dictionary or similar containing loaded results from first file
- Return type:
dict-like
- read_station(station_id_filename, **kwargs)
Read data from a single station into
UngriddedData
Find all files that contain the station ID in their filename and then call
read()
, providing the reduced filelist as input, in order to read all files from this station into data object.- Parameters:
- Returns:
loaded data
- Return type:
- Raises:
IOError – if no files can be found for this station ID
- remove_outliers(data, vars_to_retrieve, **valid_rng_vars)
Remove outliers from data
- Parameters:
data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param
vars_to_compute
)
vars_to_retrieve (list) – list of variable names for which outliers will be removed from data
**valid_rng_vars – additional keyword args specifying variable name and corresponding min / max interval (list or tuple) that specifies the valid range for the variable. For each variable that is not explicitly defined here, the default minimum / maximum value is used (accessed via
pyaerocom.const.VARS[var_name]
)
- var_supported(var_name)
Check if input variable is supported
- Parameters:
var_name (str) – AeroCom variable name or alias
- Raises:
VariableDefinitionError – if input variable is not supported by pyaerocom
- Returns:
True, if variable is supported by this interface, else False
- Return type:
- property verbosity_level
Current level of verbosity of logger
AERONET Sun (V3)
- class pyaerocom.io.read_aeronet_sunv3.ReadAeronetSunV3(data_id=None, data_dir=None)[source]
Bases:
ReadAeronetBase
Interface for reading Aeronet direct sun version 3 Level 1.5 and 2.0 data
See also
Base classes
ReadAeronetBase
andReadUngriddedBase
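A minimal usage sketch (reading only the first few files to keep it light; od550aer is one of DEFAULT_VARS and is computed on import, cf. AUX_REQUIRES):

from pyaerocom.io import ReadAeronetSunV3

reader = ReadAeronetSunV3()
data = reader.read(vars_to_retrieve="od550aer", first_file=0, last_file=10)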
- ALT_VAR_NAMES_FILE = {}
dictionary specifying alternative column names for variables defined in
VAR_NAMES_FILE
- Type:
OPTIONAL
- AUX_FUNS = {'ang44&87aer': <function calc_ang4487aer>, 'od550aer': <function calc_od550aer>, 'od550lt1ang': <function calc_od550lt1ang>, 'proxyod550aerh2o': <function calc_od550aer>, 'proxyod550bc': <function calc_od550aer>, 'proxyod550dust': <function calc_od550aer>, 'proxyod550nh4': <function calc_od550aer>, 'proxyod550no3': <function calc_od550aer>, 'proxyod550oa': <function calc_od550aer>, 'proxyod550so4': <function calc_od550aer>, 'proxyod550ss': <function calc_od550aer>, 'proxyzaerosol': <function calc_od550aer>, 'proxyzdust': <function calc_od550aer>}
Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)
- AUX_REQUIRES = {'ang44&87aer': ['od440aer', 'od870aer'], 'od550aer': ['od440aer', 'od500aer', 'ang4487aer'], 'od550lt1ang': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550aerh2o': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550bc': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550dust': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550nh4': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550no3': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550oa': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550so4': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyod550ss': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyzaerosol': ['od440aer', 'od500aer', 'ang4487aer'], 'proxyzdust': ['od440aer', 'od500aer', 'ang4487aer']}
dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)
- property AUX_VARS
List of auxiliary variables (keys of attr.
AUX_REQUIRES
)
Auxiliary variables are those that are not included in original files but are computed from other variables during import
- COL_DELIM = ','
column delimiter in data block of files
- DATA_ID = 'AeronetSunV3Lev2.daily'
Name of dataset (OBS_ID)
- DEFAULT_UNIT = '1'
Default data unit that is assigned to all variables that are not specified in UNITS dictionary (cf.
UNITS
)
- DEFAULT_VARS = ['od550aer', 'ang4487aer']
default variables for read method
- IGNORE_META_KEYS = ['date', 'time', 'day_of_year']
- INSTRUMENT_NAME = 'sun_photometer'
name of measurement instrument
- META_NAMES_FILE = {'altitude': 'Site_Elevation(m)', 'data_quality_level': 'Data_Quality_Level', 'date': 'Date(dd:mm:yyyy)', 'day_of_year': 'Day_of_Year', 'instrument_number': 'AERONET_Instrument_Number', 'latitude': 'Site_Latitude(Degrees)', 'longitude': 'Site_Longitude(Degrees)', 'station_name': 'AERONET_Site', 'time': 'Time(hh:mm:ss)'}
dictionary specifying the file column names (values) for each metadata key (cf. attributes of
StationData
, e.g. ‘station_name’, ‘longitude’, ‘latitude’, ‘altitude’)
- META_NAMES_FILE_ALT = {'AERONET_Site': ['AERONET_Site_Name']}
- NAN_VAL = -999.0
- PROVIDES_VARIABLES = ['od340aer', 'od440aer', 'od500aer', 'od870aer', 'ang4487aer']
List of variables that are provided by this dataset (will be extended by auxiliary variables on class init, for details see __init__ method of base class ReadUngriddedBase)
- property REVISION_FILE
Name of revision file located in data directory
- SUPPORTED_DATASETS = ['AeronetSunV3Lev1.5.daily', 'AeronetSunV3Lev1.5.AP', 'AeronetSunV3Lev2.daily', 'AeronetSunV3Lev2.AP']
List of all datasets supported by this interface
- property TS_TYPE
Default implementation of string for temporal resolution
- TS_TYPES = {'AeronetSunV3Lev1.5.daily': 'daily', 'AeronetSunV3Lev2.daily': 'daily'}
dictionary assigning temporal resolution flags for supported datasets that are provided in a defined temporal resolution
- UNITS = {'proxyzaerosol': 'km', 'proxyzdust': 'km'}
Variable specific units, only required for variables that deviate from
DEFAULT_UNIT
(is irrelevant for all variables that are so far supported by the implemented Aeronet products, i.e. all variables are dimensionless as specified inDEFAULT_UNIT
)
- VAR_NAMES_FILE = {'ang4487aer': '440-870_Angstrom_Exponent', 'od340aer': 'AOD_340nm', 'od440aer': 'AOD_440nm', 'od500aer': 'AOD_500nm', 'od870aer': 'AOD_870nm'}
dictionary specifying the file column names (values) for each Aerocom variable (keys)
- VAR_PATTERNS_FILE = {'AOD_([0-9]*)nm': 'od*aer'}
Mappings for identifying variables in file
- check_vars_to_retrieve(vars_to_retrieve)
Separate variables that are in file from those that are computed
Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).
The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute
AUX_REQUIRES
).This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.
- property col_index
Dictionary that specifies the index for each data column
Note
Implementation depends on the data. For instance, if the variable information is provided in all files (of all stations) and always in the same column, then this can be set as a fixed dictionary in the __init__ function of the implementation (see e.g. class
ReadAeronetSunV2
). In other cases, it may not be ensured that each variable is available in all files, or the column definition may differ between different stations. In the latter case you may automate the column index retrieval by providing the header names for each meta and data column you want to extract using the attribute dictionaries META_NAMES_FILE
andVAR_NAMES_FILE
by calling_update_col_index()
in your implementation ofread_file()
when you reach the line that contains the header information.
- compute_additional_vars(data, vars_to_compute)
Compute all additional variables
The computations for each additional parameter are done using the specified methods in
AUX_FUNS
.- Parameters:
data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param
vars_to_compute
)
vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in
AUX_VARS
and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).
- Returns:
updated data object now containing also computed variables
- Return type:
- property data_dir: str
Location of the dataset
Note
This can be set explicitly when instantiating the class (e.g. if data is available on local machine). If unspecified, the data location is attempted to be inferred via
get_obsnetwork_dir()
- Raises:
FileNotFoundError – if data directory does not exist or cannot be retrieved automatically
- Type:
- property data_id
ID of dataset
- property data_revision
Revision string from file Revision.txt in the main data directory
- find_in_file_list(pattern=None)
Find all files that match a certain wildcard pattern
- Parameters:
pattern (
str
, optional) – wildcard pattern that may be used to narrow down the search (e.g. usepattern=*Berlin*
to find only files that contain Berlin in their filename)- Returns:
list containing all files in
files
that match pattern- Return type:
- Raises:
IOError – if no matches can be found
- get_file_list(pattern=None)
Search all files to be read
Uses
_FILEMASK
(+ optional input search pattern, e.g. station_name) to find valid files for query.
- infer_wavelength_colname(colname, low=250, high=2000)
Get variable wavelength from column name
- Parameters:
- Returns:
wavelength in nm as a string representing a floating point number
- Return type:
- Raises:
ValueError – if None or more than one number is detected in variable string
- logger
Class own instance of logger class
- print_all_columns()
- read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, file_pattern=None, common_meta=None)
Method that reads list of files as instance of
UngriddedData
- Parameters:
vars_to_retrieve (
list
or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded
files (
list
, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().
first_file (
int
, optional) – index of first file in file list to read. If None, the very first file in the list is used. Note: ignored if input parameter file_pattern is specified.
last_file (
int
, optional) – index of last file in list to read. If None, the very last file in the list is used. Note: ignored if input parameter file_pattern is specified.
file_pattern (str, optional) – string pattern for file search (cf.
get_file_list()
)
common_meta (dict, optional) – dictionary that contains additional metadata shared for this network (assigned to each metadata block of the
UngriddedData
object that is returned)
- Returns:
data object
- Return type:
- read_file(filename, vars_to_retrieve=None, vars_as_series=False)[source]
Read Aeronet Sun V3 level 1.5 or 2 file
- Parameters:
filename (str) – absolute path to filename to read
vars_to_retrieve (
list
, optional) – list of str with variable names to read. If None, useDEFAULT_VARS
vars_as_series (bool) – if True, the data columns of all variables in the result dictionary are converted into pandas Series objects
- Returns:
dict-like object containing results
- Return type:
- read_first_file(**kwargs)
Read first file returned from
get_file_list()
Note
This method may be used for test purposes.
- Parameters:
**kwargs – keyword args passed to
read_file()
(e.g. vars_to_retrieve)- Returns:
dictionary or similar containing loaded results from first file
- Return type:
dict-like
- read_station(station_id_filename, **kwargs)
Read data from a single station into
UngriddedData
Find all files that contain the station ID in their filename and then call
read()
, providing the reduced filelist as input, in order to read all files from this station into data object.- Parameters:
- Returns:
loaded data
- Return type:
- Raises:
IOError – if no files can be found for this station ID
- remove_outliers(data, vars_to_retrieve, **valid_rng_vars)
Remove outliers from data
- Parameters:
data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param
vars_to_compute
)
vars_to_retrieve (list) – list of variable names for which outliers will be removed from data
**valid_rng_vars – additional keyword args specifying variable name and corresponding min / max interval (list or tuple) that specifies the valid range for the variable. For each variable that is not explicitly defined here, the default minimum / maximum value is used (accessed via
pyaerocom.const.VARS[var_name]
)
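A sketch with one explicit valid range (the interval is arbitrary; variables without an explicit range fall back to the defaults in pyaerocom.const.VARS):

# Discard od550aer values outside [0, 10]; 'station' is the dict-like object
# from the read_file sketch above.
station = reader.remove_outliers(station, ["od550aer"], od550aer=(0, 10))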
- var_supported(var_name)
Check if input variable is supported
- Parameters:
var_name (str) – AeroCom variable name or alias
- Raises:
VariableDefinitionError – if input variable is not supported by pyaerocom
- Returns:
True, if variable is supported by this interface, else False
- Return type:
bool
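For example:

# True for a supported AeroCom variable name or alias; an unknown name
# raises VariableDefinitionError.
reader.var_supported("od550aer")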
- property verbosity_level
Current level of verbosity of logger
AERONET SDA (V3)
- class pyaerocom.io.read_aeronet_sdav3.ReadAeronetSdaV3(data_id=None, data_dir=None)[source]
Bases: ReadAeronetBase
Interface for reading Aeronet Sun SDA V3 Level 1.5 and 2.0 data
See also
Base classes ReadAeronetBase and ReadUngriddedBase
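A minimal quick-start sketch for this reader (the data path is hypothetical; if data_dir is omitted, the location is inferred via get_obsnetwork_dir()):

# Hypothetical data path; read() without arguments loads DEFAULT_VARS.
from pyaerocom.io import ReadAeronetSdaV3

sda = ReadAeronetSdaV3(data_dir="/path/to/aeronet/sda_v3")
data = sda.read()  # od550aer, od550gt1aer, od550lt1aer, od550dust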
- ALT_VAR_NAMES_FILE = {}
dictionary specifying alternative column names for variables defined in VAR_NAMES_FILE
- Type:
OPTIONAL
- AUX_FUNS = {'od550aer': <function calc_od550aer>, 'od550dust': <function calc_od550gt1aer>, 'od550gt1aer': <function calc_od550gt1aer>, 'od550lt1aer': <function calc_od550lt1aer>}
Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)
- AUX_REQUIRES = {'od550aer': ['od500aer', 'ang4487aer'], 'od550dust': ['od500gt1aer', 'ang4487aer'], 'od550gt1aer': ['od500gt1aer', 'ang4487aer'], 'od550lt1aer': ['od500lt1aer', 'ang4487aer']}
dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)
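These two mappings encode, for example, that od550aer is derived from od500aer and ang4487aer. A minimal sketch of the kind of conversion involved, assuming the standard Angstrom power law (this is not pyaerocom's actual implementation):

# Shift AOD from 500 nm to 550 nm with the Angstrom exponent; numbers are
# made up for illustration.
import numpy as np

def od_to_wavelength(od_ref, lambda_ref, lambda_out, angstrom):
    # od(lambda_out) = od(lambda_ref) * (lambda_out / lambda_ref) ** (-angstrom)
    return od_ref * (lambda_out / lambda_ref) ** (-angstrom)

od500aer = np.array([0.30, 0.12])   # AOD at 500 nm
ang4487aer = np.array([1.2, 0.8])   # Angstrom exponent (440-870 nm)
od550aer = od_to_wavelength(od500aer, 500.0, 550.0, ang4487aer)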
- property AUX_VARS
List of auxiliary variables (keys of attr. AUX_REQUIRES)
Auxiliary variables are those that are not included in original files but are computed from other variables during import
- COL_DELIM = ','
column delimiter in data block of files
- DATA_ID = 'AeronetSDAV3Lev2.daily'
Name of dataset (OBS_ID)
- DEFAULT_UNIT = '1'
Default data unit that is assigned to all variables that are not specified in UNITS dictionary (cf. UNITS)
- DEFAULT_VARS = ['od550aer', 'od550gt1aer', 'od550lt1aer', 'od550dust']
default variables for read method
- IGNORE_META_KEYS = ['date', 'time', 'day_of_year']
- INSTRUMENT_NAME = 'sun_photometer'
name of measurement instrument
- META_NAMES_FILE = {'altitude': 'Site_Elevation(m)', 'data_quality_level': 'Data_Quality_Level', 'date': 'Date_(dd:mm:yyyy)', 'day_of_year': 'Day_of_Year', 'instrument_number': 'AERONET_Instrument_Number', 'latitude': 'Site_Latitude(Degrees)', 'longitude': 'Site_Longitude(Degrees)', 'station_name': 'AERONET_Site', 'time': 'Time_(hh:mm:ss)'}
dictionary specifying the file column names (values) for each metadata key (cf. attributes of StationData, e.g. ‘station_name’, ‘longitude’, ‘latitude’, ‘altitude’)
- META_NAMES_FILE_ALT = ({},)
- NAN_VAL = -999.0
value corresponding to invalid measurement
- PROVIDES_VARIABLES = ['od500gt1aer', 'od500lt1aer', 'od500aer', 'ang4487aer', 'od500dust']
List of variables that are provided by this dataset (will be extended by auxiliary variables on class init, for details see __init__ method of base class ReadUngriddedBase)
- property REVISION_FILE
Name of revision file located in data directory
- SUPPORTED_DATASETS = ['AeronetSDAV3Lev1.5.daily', 'AeronetSDAV3Lev2.daily']
List of all datasets supported by this interface
- property TS_TYPE
Default implementation of string for temporal resolution
- TS_TYPES = {'AeronetSDAV3Lev1.5.daily': 'daily', 'AeronetSDAV3Lev2.daily': 'daily'}
dictionary assigning temporal resolution flags for supported datasets that are provided in a defined temporal resolution
- UNITS = {}
Variable specific units, only required for variables that deviate from DEFAULT_UNIT (is irrelevant for all variables that are so far supported by the implemented Aeronet products, i.e. all variables are dimensionless as specified in DEFAULT_UNIT)
- VAR_NAMES_FILE = {'ang4487aer': 'Angstrom_Exponent(AE)-Total_500nm[alpha]', 'od500aer': 'Total_AOD_500nm[tau_a]', 'od500dust': 'Coarse_Mode_AOD_500nm[tau_c]', 'od500gt1aer': 'Coarse_Mode_AOD_500nm[tau_c]', 'od500lt1aer': 'Fine_Mode_AOD_500nm[tau_f]'}
dictionary specifying the file column names (values) for each Aerocom variable (keys)
- VAR_PATTERNS_FILE = {}
Mappings for identifying variables in file (may be specified in addition to explicit variable names specified in VAR_NAMES_FILE)
- check_vars_to_retrieve(vars_to_retrieve)
Separate variables that are in file from those that are computed
Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).
The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).
This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.
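Continuing the quick-start sketch, an illustrative split (list order follows the description above):

# od550aer is computed on import, so it lands in the second list, while its
# inputs from AUX_REQUIRES (od500aer, ang4487aer) are added to the first.
vars_to_read, vars_to_compute = sda.check_vars_to_retrieve(["od550aer"])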
- property col_index
Dictionary that specifies the index for each data column
Note
Implementation depends on the data. For instance, if the variable information is provided in all files (of all stations) and always in the same column, then this can be set as a fixed dictionary in the __init__ function of the implementation (see e.g. class ReadAeronetSunV2). In other cases, it may not be ensured that each variable is available in all files or the column definition may differ between different stations. In the latter case you may automate the column index retrieval by providing the header names for each meta and data column you want to extract using the attribute dictionaries META_NAMES_FILE and VAR_NAMES_FILE and calling _update_col_index() in your implementation of read_file() when you reach the line that contains the header information.
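For illustration, the mapping has this shape (all column positions below are made up):

# Hypothetical col_index content for one file header: column positions
# keyed by metadata / variable name.
col_index = {"station_name": 0, "date": 1, "time": 2, "od500aer": 12}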
- compute_additional_vars(data, vars_to_compute)
Compute all additional variables
The computations for each additional parameter are done using the specified methods in AUX_FUNS.
- Parameters:
data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)
vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).
- Returns:
updated data object now containing also computed variables
- Return type:
dict-like
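A sketch, assuming the first file in the list contains both input variables required for od550aer (cf. AUX_REQUIRES):

# Read the required inputs from one file, then derive od550aer via the
# function registered in AUX_FUNS; 'sda' is from the quick-start sketch.
file_data = sda.read_file(sda.get_file_list()[0],
                          vars_to_retrieve=["od500aer", "ang4487aer"])
file_data = sda.compute_additional_vars(file_data, ["od550aer"])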
- property data_dir: str
Location of the dataset
Note
This can be set explicitly when instantiating the class (e.g. if data is available on local machine). If unspecified, the data location is attempted to be inferred via get_obsnetwork_dir()
- Raises:
FileNotFoundError – if data directory does not exist or cannot be retrieved automatically
- Type:
str
- property data_id
ID of dataset
- property data_revision
Revision string from file Revision.txt in the main data directory
- find_in_file_list(pattern=None)
Find all files that match a certain wildcard pattern
- Parameters:
pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)
- Returns:
list containing all files in files that match pattern
- Return type:
list
- Raises:
IOError – if no matches can be found
- get_file_list(pattern=None)
Search all files to be read
Uses _FILEMASK (+ optional input search pattern, e.g. station_name) to find valid files for query.
- infer_wavelength_colname(colname, low=250, high=2000)
Get variable wavelength from column name
- Parameters:
- Returns:
wavelength in nm as floating str
- Return type:
str
- Raises:
ValueError – if None or more than one number is detected in variable string
- logger
The class's own logger instance
- print_all_columns()
- read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, file_pattern=None, common_meta=None)
Method that reads list of files as instance of UngriddedData
- Parameters:
vars_to_retrieve (list or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded
files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().
first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used. Note: is ignored if input parameter file_pattern is specified.
last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used. Note: is ignored if input parameter file_pattern is specified.
file_pattern (str, optional) – string pattern for file search (cf. get_file_list())
common_meta (dict, optional) – dictionary that contains additional metadata shared for this network (assigned to each metadata block of the UngriddedData object that is returned)
- Returns:
data object
- Return type:
UngriddedData
- read_file(filename, vars_to_retrieve=None, vars_as_series=False)[source]
Read Aeronet SDA V3 file and return it in a dictionary
- Parameters:
filename (str) – absolute path to filename to read
vars_to_retrieve (list, optional) – list of str with variable names to read. If None, use DEFAULT_VARS
vars_as_series (bool) – if True, the data columns of all variables in the result dictionary are converted into pandas Series objects
- Returns:
dict-like object containing results
- Return type:
dict-like
- read_first_file