Using the parquet reader from pyaro-readers
The parquet reader is intended as a simple way to write and ingest data for use in pyaerocom. Such data could be the result of complicated pre-processing, the merging of several data sources, or tests of pyaerocom functionality.
If you require more efficient reading, caching of data, or additional functionality, you should create a dedicated reader in pyaerocom.
Creating a dataset
The dataset must contain at least the columns shown in the dataset below, but it may also include other columns such as country, flag, altitude, or standard_deviation. The code below creates some mock data.
import pandas as pd
import numpy as np

N = 20  # number of hourly measurements
# N + 1 timestamps give N consecutive (start_time, end_time) intervals
times = pd.date_range("2025-02-28 00:00", freq="1h", periods=N + 1)
dataset = pd.DataFrame(
    {
        "variable": np.repeat("concpm25", N),
        "units": np.repeat("ug m^-3", N),
        "start_time": times[:-1],
        "end_time": times[1:],
        "latitude": 63.8167,
        "longitude": -22.7167,
        "value": np.random.random(N),
        "station": np.repeat("Reykjanes", N),
    }
)
dataset.to_parquet("my_data.pq")
dataset.head()
|   | variable | units | start_time | end_time | latitude | longitude | value | station |
|---|---|---|---|---|---|---|---|---|
| 0 | concpm25 | ug m^-3 | 2025-02-28 00:00:00 | 2025-02-28 01:00:00 | 63.8167 | -22.7167 | 0.391320 | Reykjanes |
| 1 | concpm25 | ug m^-3 | 2025-02-28 01:00:00 | 2025-02-28 02:00:00 | 63.8167 | -22.7167 | 0.656607 | Reykjanes |
| 2 | concpm25 | ug m^-3 | 2025-02-28 02:00:00 | 2025-02-28 03:00:00 | 63.8167 | -22.7167 | 0.417969 | Reykjanes |
| 3 | concpm25 | ug m^-3 | 2025-02-28 03:00:00 | 2025-02-28 04:00:00 | 63.8167 | -22.7167 | 0.608367 | Reykjanes |
| 4 | concpm25 | ug m^-3 | 2025-02-28 04:00:00 | 2025-02-28 05:00:00 | 63.8167 | -22.7167 | 0.560482 | Reykjanes |
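Because the reader expects a minimum set of columns, it can help to check a dataset before writing it to disk. The following is a minimal sketch, not part of pyaro-readers: the list of required columns is copied from the mock dataset above, and `missing_columns` and `check_columns` are hypothetical helper names introduced here for illustration.

```python
# Minimum columns, copied from the mock dataset in this section.
REQUIRED_COLUMNS = (
    "variable", "units", "start_time", "end_time",
    "latitude", "longitude", "value", "station",
)


def missing_columns(columns):
    """Return the required column names absent from `columns`, in order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


def check_columns(columns):
    """Raise ValueError if any required column is missing (hypothetical helper)."""
    missing = missing_columns(columns)
    if missing:
        raise ValueError(f"missing required columns: {', '.join(missing)}")


# Extra columns such as "country" are allowed and do not affect the check.
check_columns(list(REQUIRED_COLUMNS) + ["country"])
```

With the mock data above, calling `check_columns(dataset.columns)` just before `dataset.to_parquet("my_data.pq")` would catch a missing or misspelled column name early.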
Reading the data
pyaro
import pyaro

# Open the parquet file with the "parquet" reader; no filters are applied
with pyaro.open_timeseries(
    "parquet",
    "my_data.pq",
    filters=[],
) as ds:
    data = ds.data("concpm25")
    print(data.values)
[0.39132035 0.65660656 0.41796887 0.60836665 0.56048188 0.08236209
0.15304678 0.9050318 0.37591741 0.13632942 0.4753769 0.05746369
0.00773874 0.14376959 0.70172981 0.58267018 0.21053204 0.66455131
0.16707444 0.94835718]
pyaerocom
from pyaerocom.io.pyaro.pyaro_config import PyaroConfig
from pyaerocom.io.readungridded import ReadUngridded

config = PyaroConfig(
    name="some_name",
    reader_id="parquet",
    filename_or_obj_or_url="my_data.pq",
    filters={},
)
reader = ReadUngridded(configs=[config])
print(reader)
Dataset name: some_name
Data directory: my_data.pq
Supported variables: ['concpm25']
Last revision: n/d
aeroval configuration
from pyaerocom.io.pyaro.pyaro_config import PyaroConfig

configuration = """{
    "name": "some_name",
    "reader_id": "parquet",
    "filename_or_obj_or_url": "my_data.pq",
    "filters": {}
}"""
config = PyaroConfig.model_validate_json(configuration)
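Hand-written JSON is easy to break with a stray comma or quote. As a hedged alternative, the same configuration string can be generated from a Python dict with the standard-library `json` module. This sketch only produces the string; the variable name `settings` is an assumption made for illustration.

```python
import json

# Same settings as the JSON string in this section, held as a Python dict.
settings = {
    "name": "some_name",
    "reader_id": "parquet",
    "filename_or_obj_or_url": "my_data.pq",
    "filters": {},
}

# json.dumps produces syntactically valid JSON (quoting, commas, braces).
configuration = json.dumps(settings, indent=4)
print(configuration)
```

The resulting `configuration` string can then be passed to `PyaroConfig.model_validate_json`, exactly as with the hand-written string.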