I/O Arrays of Numeric Types¶
All of the HIPP IO APIs can be accessed by including the header <hippio.h>. The detailed conventions
and compiling options are described in the API Reference.
Write/Read Arrays¶
In C++ programming, an array of a numeric type (e.g., double, int, …) is usually stored
in the STL vector. Or it can be manualy stored in raw memory buffer such as those allocated by new
operator or std::malloc(). Examples of such arrays are:
const size_t N = 5;
vector<double> arr1 = {0.,1.,2.,3.,4.}; // a vector of 5 elements
double *arr2 = new double [N]; // raw buffer with 5 doubles
To write them into a file of HDF5 format, you simply create a file by defining a
HIPP::IO::H5File instance with desired filename and a “w” mode flag
(“w” means create and truncate the file). Then, a call of
create_dataset<T>(dataset_name, dims) creates a dataset in the root group of the
file, with T specifying its element data type, dataset_name specifying its
name and dims specifying its length in every dimension.
Finally, a call of write(arr1) writes the vector arr1 into the dataset:
HIPP::IO::H5File file("arrays.h5", "w"); // create a file
auto dset1 = file.create_dataset<double>("arr1", {N} ); // create a dataset
dset1.write(arr1); // write out an array
The write() method also accepts a pointer to raw array. So, a call of
write(arr2) writes the data in arr2 into the dataset.
It is also allowed to chain the create_dataset() and write() operations
in one line of code:
file.create_dataset<double>("arr2", {N}).write(arr2); // write out from raw buffer
Note that the size of the dataset (specified in create_dataset()) must be
compatible with the argument passed in write(), or otherwise the result is
undefined.
To read the the dataset back into the memory, just open the dataset by open_dataset(dataset_name),
and load the data into a vector by read(vector_name). The library automatically resizes
the vector according to the dataspace in the file:
vector<double> arr1_in;
file.open_dataset("arr1").read(arr1_in); // read into the vector
If you want to load the data into a raw buffer, you open the dataset, manually
get its dataspace with dataspace(), and find the size of the dataspace
(total number of scalar elements) by size(). Then you use the size to properly
allocate the memory buffer, and again use read() to load data:
auto dset2 = file.open_dataset("arr2");
size_t n_elems = dset2.dataspace().size();
double *arr2_in = new double [n_elems];
file.open_dataset("arr2").read(arr2_in);
For multi-dimensional array (stored in row-major order in contiguous memory),
the only difference is that when create_dataset(), you specify each of the dimensions
of that array. Then the write/read operations are the same as those of one-dimensional array
const size_t n0=2, n1=3, n2=4;
vector<int> arr3(n0*n1*n2);
file.create_dataset<int>("arr3", {n0, n1, n2}).write(arr3);
file.open_dataset("arr3").read(arr3);
/**
* Again, for a reading into raw buffer, use dataspace() to retrive the
* dataspace, and use size() to get the total number of elements.
*/
auto dset3 = file.open_dataset("arr3");
int *arr3_in_buff = new int[ dset3.dataspace().size() ];
dset3.read(arr3_in_buff);
After the output operations, a file named “arrays.h5” is created in the OS’s file system.
Using h5dump arrays.h5 in the command line prompt, you can view and verify the content of
each dataset
HDF5 "arrays.h5" {
GROUP "/" {
DATASET "arr1" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 5 ) / ( 5 ) }
DATA {
(0): 0, 1, 2, 3, 4
}
}
DATASET "arr2" ...
DATASET "arr3" ...
Warning
For both one-dimensional and multi-dimensional cases, the library only accepts arrays with contiguous memory layout. That means the following cases cannot be manipulated by HIPP:
vector<vector<int> > vector_of_vectors;
vector<double *> vector_of_pointers_to_buffers;
The followings are allowed:
vector<array<double, 3>> vector_of_arrays;
struct ArrayType {
float values[3];
};
vector<ArrayType> vector_of_structs;
int raw_array[2][3];
However, in such cases you need to take the pointer to the underlying data and cast it into a proper numeric type:
dset.write((double *)&vector_of_arrays[0]);
dset.write((float *)&vector_of_structs[0]);
dset.write(&raw_array[0][0]);
Using hyperslab¶
When dealing with very large dataset, sometimes we just want to take part of the data. To do that, we need to use the hyperslab feature in hdf5. The read member function for Dataset type has two parameters, memspace and filespace, as type of dataspace. These two parameters describe what read do: to move data from filespace to memspace (write has similar member functions). The hyperslab works by attach some information on these two parameters using function select_hyperslab.
The supported data selection of hyperslab can be described by four parameters, start, stride, count and block. For example, in a two dimension array with shape = (8, 12), we want to select data marked with *
0, *, *, 0, *, *, 0, *, *, 0, *, *,
0, *, *, 0, *, *, 0, *, *, 0, *, *,
0, *, *, 0, *, *, 0, *, *, 0, *, *,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, *, *, 0, *, *, 0, *, *, 0, *, *,
0, *, *, 0, *, *, 0, *, *, 0, *, *,
0, *, *, 0, *, *, 0, *, *, 0, *, *,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
This hyperslab is specified by
start = (0, 1): starting location of the hyperslab
stride = (4, 3): number of elememts to separate each block selected
count = (2, 4): number of blocks to select
block = (3, 2): size of block selected
In the following example, we show how to use the hyperslab feature to select part of the data from a file and how to put it into another buffer with another hyperslab. Here we want to select a rectangular region, so the stride and block are set to 1-s by default.
/*
* In this code, we will show how to use the hyperslab feature in hdf5
*
* We have an dataset with shape= (4, 5), and data is
* 0, 1, 2, 3, 4,
* 5, 6, 7, 8, 9,
* 10, 11, 12, 13, 14,
* 15, 16, 17, 18, 19
* We only want to take part of the data, the mask is
* 0, 0, 0, 0, 0,
* 0, *, *, *, 0,
* 0, *, *, *, 0,
* 0, 0, 0, 0, 0
* This hyperslab can be characterized as start=(1, 1), count=(2, 3)
* So the data we read is 6, 7, 8, 11, 12, 13
*
* Then, we want to put it in a vector with shape = (4, 5) (or equivalently, length=20)
* And we want to put them in the following way
* 0, 0, 0, 0, 0,
* 0, 0, 0, 0, 0
* 0, 0, *, *, *,
* 0, 0, *, *, *,
* So the result should be
* 0, 0, 0, 0, 0,
* 0, 0, 0, 0, 0,
* 0, 0, 6, 7, 8,
* 0, 0, 11, 12, 13
*
*/
#include <hippio.h>
#include <iostream>
#include <vector>
using namespace std;
using hsize_t=unsigned long long; // default type for index and count in HIPP
int main(void)
{
// create a h5 file with dataset of shape (4, 5)
HIPP::IO::H5File o_file("./test.h5", "w");
vector<double> vec;
for (int i = 0; i < 20; ++i)
vec.push_back(i);
auto ds = o_file.create_dataset<double>("data", {4, 5});
ds.write((double *)vec.data());
o_file = HIPP::IO::H5File(nullptr);
// read the part of the data and put it in a small vector
HIPP::IO::H5File i_file("./test.h5", "r");
auto dset = i_file.open_dataset("data");
// create the file_dataspace from the dataset in the file
auto dspace_file = dset.dataspace();
vector<hsize_t> offset_file{1, 1}, shape_file{2, 3};
// create the hyperslab for the file_dataspace with offset and shape
dspace_file.select_hyperslab(offset_file, shape_file);
// create the memory dataspace to accept the data
auto dspace_mem = HIPP::IO::H5Dataspace({4, 5});
vector<hsize_t> offset_mem{2, 2}, shape_mem{2, 3};
dspace_mem.select_hyperslab(offset_mem, shape_mem);
// you can also
int vec_size = 20;
vector<double> vec_recv(vec_size, 0);
dset.read((double *)vec_recv.data(), dspace_mem, dspace_file);
for (int i = 0; i < vec_size; ++i) {
cout << vec_recv[i] << ", " << endl;
}
return 0;
}