Using the parquet reader from pyaro-readers

The parquet reader offers an easy way to write data to a file and ingest it in pyaerocom. Such data might be the result of complicated pre-processing, a merge of several data sources, or a test of pyaerocom functionality.

If you need more efficient reading, caching of data, or additional functionality, create a dedicated reader in pyaerocom instead.

Creating a dataset

The dataset must contain at least the columns used in the example below, but you may also include other columns such as country, flag, altitude, or standard_deviation. The following cell creates some mock data.

[1]:
import pandas as pd
import numpy as np

N = 20
# N + 1 hourly timestamps give N consecutive one-hour intervals
times = pd.date_range("2025-02-28 00:00", freq="1h", periods=N + 1)

dataset = pd.DataFrame(
    {
        "variable": np.repeat("concpm25", N),
        "units": np.repeat("ug m^-3", N),
        "start_time": times[:-1],
        "end_time": times[1:],
        "latitude": 63.8167,
        "longitude": -22.7167,
        "value": np.random.random(N),
        "station": np.repeat("Reykjanes", N),
    }
)

# Writing parquet requires a parquet engine such as pyarrow or fastparquet
dataset.to_parquet("my_data.pq")
dataset.head()
[1]:
variable units start_time end_time latitude longitude value station
0 concpm25 ug m^-3 2025-02-28 00:00:00 2025-02-28 01:00:00 63.8167 -22.7167 0.391320 Reykjanes
1 concpm25 ug m^-3 2025-02-28 01:00:00 2025-02-28 02:00:00 63.8167 -22.7167 0.656607 Reykjanes
2 concpm25 ug m^-3 2025-02-28 02:00:00 2025-02-28 03:00:00 63.8167 -22.7167 0.417969 Reykjanes
3 concpm25 ug m^-3 2025-02-28 03:00:00 2025-02-28 04:00:00 63.8167 -22.7167 0.608367 Reykjanes
4 concpm25 ug m^-3 2025-02-28 04:00:00 2025-02-28 05:00:00 63.8167 -22.7167 0.560482 Reykjanes
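
Because the reader expects at least the columns shown above, it can help to check a table before writing it. The following is a minimal sketch and not part of pyaro or pyaerocom: the `REQUIRED_COLUMNS` tuple simply mirrors the column names of the mock dataset, and `missing_columns` is a hypothetical helper.

```python
# Hypothetical helper, not part of pyaro or pyaerocom.
# Assumption: the required columns are exactly those of the mock dataset above.
REQUIRED_COLUMNS = (
    "variable", "units", "start_time", "end_time",
    "latitude", "longitude", "value", "station",
)


def missing_columns(columns):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Works with any iterable of names, e.g. missing_columns(dataset.columns)
complete = missing_columns(list(REQUIRED_COLUMNS) + ["country", "flag"])
incomplete = missing_columns(["variable", "units", "value", "station"])
print(complete)
print(incomplete)
```

An empty list means every expected column is present; extra columns such as country or flag do not affect the result.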

Reading the data

pyaro

[2]:
import pyaro

with pyaro.open_timeseries(
    "parquet",
    "my_data.pq",
    filters=[],
) as ds:
    data = ds.data("concpm25")
    print(data.values)
[0.39132035 0.65660656 0.41796887 0.60836665 0.56048188 0.08236209
 0.15304678 0.9050318  0.37591741 0.13632942 0.4753769  0.05746369
 0.00773874 0.14376959 0.70172981 0.58267018 0.21053204 0.66455131
 0.16707444 0.94835718]

pyaerocom

[3]:
from pyaerocom.io.pyaro.pyaro_config import PyaroConfig
from pyaerocom.io.readungridded import ReadUngridded

config = PyaroConfig(
    name="some_name",
    reader_id="parquet",
    filename_or_obj_or_url="my_data.pq",
    filters={}
)

reader = ReadUngridded(configs=[config])
print(reader)
Dataset name: some_name
Data directory: my_data.pq
Supported variables: ['concpm25']
Last revision: n/d

aeroval configuration

[4]:
from pyaerocom.io.pyaro.pyaro_config import PyaroConfig

configuration = """{
    "name": "some_name",
    "reader_id": "parquet",
    "filename_or_obj_or_url": "my_data.pq",
    "filters": {}
}"""

config = PyaroConfig.model_validate_json(configuration)
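
Hand-written JSON is easy to get subtly wrong, so one option is to build the same configuration as a dictionary and serialize it with the standard-library `json` module. This sketch only uses the keys shown above; the variable name `settings` is illustrative.

```python
import json

# Same keys and values as the hand-written configuration string above
settings = {
    "name": "some_name",
    "reader_id": "parquet",
    "filename_or_obj_or_url": "my_data.pq",
    "filters": {},
}

# json.dumps produces valid JSON (double quotes, no trailing commas)
configuration = json.dumps(settings, indent=4)
print(configuration)
```

The resulting string can be passed to `PyaroConfig.model_validate_json` in the same way as the hand-written one.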