A solution for handling big, multidimensional timeseries data from environmental sensors in HPC applications.
HDF5eis is designed to
- store primitive timeseries data with any number of dimensions;
- store auxiliary and meta data in columnar format or as UTF-8 encoded byte streams alongside timeseries data;
- provide a single point of fast access to diverse data distributed across many files; and
- simultaneously leverage existing technology and minimize external dependencies.
import hdf5eis
with hdf5eis.File("demo.hdf5", mode="w") as demo_file:
# Add some random multidimensional timeseries data to the demo.hdf5 file.
first_sample_time = "2022-01-01T00:00:00Z"
sampling_rate = 100
demo_file.timeseries.add(
np.random.rand(32, 16, 8, 16, 32, 1000),
first_sample_time,
sampling_rate,
tag="random"
)
# Data can be efficiently retrieved using hybrid dictionary (with regular expression parsing)
# and array metaphors.
start_time, end_time = "2022-01-01T00:01:00Z", "2022-01-01T00:02:00Z"
sliced_data = demo_file.timeseries["rand*", 8:12, ..., 0, start_time: end_time]
>$ pip install hdf5eis
Please cite White et al. (2023) if you make use of HDF5eis in published work.
White, M.C.A., Zhang, Z., Bai, T., Qiu, H., Chang, H., and Nakata, N., 2023, HDF5eis: A storage and input/output solution for big multidimensional time series data from environmental sensors: GEOPHYSICS, v. 88, p. F29–F38, doi:10.1190/geo2022-0448.1.