Working with Datasets

HDF5 datasets are the objects that actually hold user data in storage. The metadata of a dataset, namely its Datatype and Dataspace, describe the element type and the layout of those elements.

For both clarity and flexibility, HIPP defines two sets of APIs for dataset operations:

DatasetManager: a manager for all datasets in a group. The manager provides a high-level API that is sufficient for most common I/O patterns. Users do not manipulate the datatype and dataspace directly; they are deduced implicitly from the data object.
Group and Dataset: more detailed control over datasets. Dataset creation, opening, and writing/reading are handled separately, with detailed control over the datatype and dataspace.

In the following sections, we begin with the high-level DatasetManager API and then move to the detailed API for datasets.

Using the Dataset Manager

The method Group::datasets() returns a dataset manager of type DatasetManager for the calling group. For example, we create a new file and get a dataset manager for it (i.e., for its root group):

H5::File f1("f1.h5", "w");
auto dsets = f1.datasets();

The dataset manager can tell whether or not a dataset of given name exists under the group. For example, because f1 is empty now, the following assertion passes:

assert( !dsets.exists("dataset_A") );

The following parts demonstrate how to use the manager to:

write/read data of either numerical types or strings;
subset the data to write/read.

Put/Get Numerical Data

Dataset managers support various types of data, including

single scalar value, such as int, float;
GeneralArrayTraits compliant container of scalar values, such as std::vector with scalar value_type;
RawArrayTraits compliant object of scalar values, such as raw arrays (int [3], float [2][3]), or std::array with arbitrary nesting depth (std::array<int, 3>, std::array<std::array<int, 3>, 2>);
GeneralArrayTraits compliant container of RawArrayTraits, such as std::vector< std::array<int, 3> >.

For example, consider the following objects to be written into a file:

short s;
vector<double> d10(10);
int i5[5];
long l34[3][4];
array<float, 3> f3;
vector<array<int, 3> > i83(8);

// We initialize them, for example, for i83
for(size_t i=0; i<8; ++i)
    for(size_t j=0; j<3; ++j)
        i83[i][j] = i*3+j;

To write those objects into a file, simply call the method DatasetManager::put(name, data) where name is the dataset name in the file and data is an object to be written:

dsets.put("s",      s);                     // Create a new dataset and write to it
dsets.put("d10",    d10);
dsets.put("i5",     i5);
dsets.put("l34",    l34);
dsets.put("f3",     f3);
dsets.put("i83",    i83);

The call automatically creates the dataset and writes the data into it. If the dataset already exists, the call overwrites it. In the latter case, the data in memory must have a datatype and dataspace consistent with those of the existing dataset.

To load data from the file, call DatasetManager::get(name, data), which writes the dataset content into data. If the outermost container is a std::vector, it is resized to fit the dataset exactly. If the resize is not feasible, a HIPP::ErrorLogic exception is thrown:

dsets.get("s",      s);                     // Read back the data.
dsets.get("d10",    d10);
dsets.get("i5",     i5);
dsets.get("l34",    l34);
dsets.get("f3",     f3);
dsets.get("i83",    i83);

The get method has an additional single-argument version:

s = dsets.get<short>("s");
f3 = dsets.get<array<float, 3> >("f3");

In this version, an object whose type is given by the template argument is default-constructed, passed into the two-argument version to get its data, and returned.

The content of the file, as shown by h5ls -r f1.h5, looks like:

/d10                     Dataset {10}
/f3                      Dataset {3}
/i5                      Dataset {5}
/i83                     Dataset {8, 3}
/l34                     Dataset {3, 4}
/s                       Dataset {SCALAR}

Put/Get Strings

Strings are a special case. C-style strings are usually stored in raw arrays of characters if their lengths have a predefined upper bound; otherwise they are stored in heap memory and manipulated through pointers. In C++, strings are usually represented by std::string.

For the sake of simplicity, the dataset manager supports the following types of string data:

Single string, such as std::string, (const) char [N], (const) char * (null-terminated). The single string is represented by an ATOMIC STRING datatype in the file.
Strings: either a std::vector<std::string> instance or (const) char[N_STR][STR_LEN]. The strings are represented by a 1-D array of ATOMIC STRING datatype in the file.

For example, consider the following string objects:

string str = "foo";
vector<string> str4 = {"top", "bottom", "left", "right"};
char raw_str[16] = "foo";
char raw_str4[4][16] = {"top", "bottom", "left", "right"};

To write them into the file, call DatasetManager::put_str(name, data). The library will figure out the correct datatype and dataspace for each of them:

dsets.put_str("str", str);
dsets.put_str("str4", str4);
dsets.put_str("raw_str", raw_str);
dsets.put_str("raw_str4", raw_str4);

To load the data back, call DatasetManager::get_str(name, data):

dsets.get_str("str", str);
dsets.get_str("str4", str4);
dsets.get_str("raw_str", raw_str);
dsets.get_str("raw_str4", raw_str4);

Only in the std::string and std::vector<std::string> cases are the objects auto-resized to fit the dataset. In the other cases, the object must be large enough to hold the dataset.

The std::string and std::vector<std::string> cases also have a single-argument version, which returns the loaded string or strings:

str = dsets.get_str<string>("str");
str4 = dsets.get_str<vector<string> >("str4");

The content of the file, as shown by h5ls -r f1.h5, looks like:

/raw_str                 Dataset {SCALAR}
/raw_str4                Dataset {4}
/str                     Dataset {SCALAR}
/str4                    Dataset {4}

Subsetting the Data

The dataset manager provides three methods for the case where only a subset of the whole dataset is to be read.

The call DatasetManager::at(name, data, coord) loads the single element at the coordinate coord of the dataset named name into data. For example, the following code loads the element at position (1,2):

int i_value;
dsets.at("i83", i_value, {1,2});

pout << i_value, endl;            // => 5

The call DatasetManager::ats(name, data, coords) loads a set of points into a container object data. The coordinates of the points are given by coords, which is of type Points and can be initialized with the number of points and the starting address of the coordinate array:

vector<int> i_values;
vector<hsize_t> coords = {
    0,0,
    1,1,
    2,2};

dsets.ats("i83", i_values, {3, coords.data()});

pout << i_values, endl;           // => 0,4,8

The call DatasetManager::slab(name, data, slab) loads a hyperslab. The geometry of the hyperslab is specified by slab, which is of type Hyperslab and can be initialized with the starting coordinate and the number of elements along each dimension. For example, the following code loads the hyperslab [1:(1+2)][0:(0+3)]:

dsets.slab("i83", i_values, {{1, 0}, {2, 3}});

pout << i_values, endl;           // => 3,4,5,6,7,8

For ats and slab, the argument data must have a size consistent with the selected part of the dataset. If it is a std::vector of a numerical or RawArrayTraits compliant type, it is auto-resized to fit the selection.
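The point and hyperslab selections above can be reproduced on an in-memory copy of i83 with plain index arithmetic. The following is a minimal sketch using only the standard library; make_i83, pick_points and pick_slab are hypothetical helper names for illustration and are not part of HIPP or HDF5. It assumes the array is flattened in row-major order.

```cpp
#include <cstddef>
#include <vector>

// Build the 8x3 example array, flattened in row-major order: value = i*3 + j.
std::vector<int> make_i83() {
    std::vector<int> flat(8 * 3);
    for (std::size_t i = 0; i < 8; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            flat[i * 3 + j] = static_cast<int>(i * 3 + j);
    return flat;
}

// Pick the elements at the (row, col) pairs stored consecutively in coords.
std::vector<int> pick_points(const std::vector<int>& flat, std::size_t ncols,
                             const std::vector<std::size_t>& coords) {
    std::vector<int> out;
    for (std::size_t k = 0; k + 1 < coords.size(); k += 2)
        out.push_back(flat[coords[k] * ncols + coords[k + 1]]);
    return out;
}

// Pick a contiguous block of n_rows x n_cols elements starting at
// (start_row, start_col), visiting it in row-major order.
std::vector<int> pick_slab(const std::vector<int>& flat, std::size_t ncols,
                           std::size_t start_row, std::size_t start_col,
                           std::size_t n_rows, std::size_t n_cols) {
    std::vector<int> out;
    for (std::size_t i = 0; i < n_rows; ++i)
        for (std::size_t j = 0; j < n_cols; ++j)
            out.push_back(flat[(start_row + i) * ncols + (start_col + j)]);
    return out;
}
```

With these helpers, pick_points(make_i83(), 3, {0,0, 1,1, 2,2}) yields 0,4,8 and pick_slab(make_i83(), 3, 1, 0, 2, 3) yields 3,4,5,6,7,8, matching the values printed in the examples above.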

For writing, we have DatasetManager::put_at, DatasetManager::put_ats, and DatasetManager::put_slab. The three calls require that the dataset already exists in the file, and they overwrite the selected part.

Datatype

A Datatype instance describes the details (i.e., size, bit-level interpretation and conversion) of a data element in storage.

HIPP predefines a set of commonly used datatypes that correspond to the native types in C++:

C++ Type	Corresponding `Datatype` Instance
`float`, `double`, `long double`	`NATIVE_FLOAT_T`, `NATIVE_DOUBLE_T`, `NATIVE_LDOUBLE_T`
`char`, `signed char`, `unsigned char`	`NATIVE_CHAR_T`, `NATIVE_SCHAR_T`, `NATIVE_UCHAR_T`
`short`, `unsigned short`	`NATIVE_SHORT_T`, `NATIVE_USHORT_T`
`int`, `unsigned int`	`NATIVE_INT_T`, `NATIVE_UINT_T`
`long`, `unsigned long`	`NATIVE_LONG_T`, `NATIVE_ULONG_T`
`long long`, `unsigned long long`	`NATIVE_LLONG_T`, `NATIVE_ULLONG_T`

Standard datatypes are defined for platform-independent storage. Their names contain:

F for floating-point number, I for signed integer and U for unsigned integer;
BE for big-endian byte order and LE for little-endian byte order.

Little-endian	Big-endian
`IEEE_F32LE_T`, `IEEE_F64LE_T`	`IEEE_F32BE_T`, `IEEE_F64BE_T`
`STD_I8LE_T`, `STD_U8LE_T`	`STD_I8BE_T`, `STD_U8BE_T`
`STD_I16LE_T`, `STD_U16LE_T`	`STD_I16BE_T`, `STD_U16BE_T`
`STD_I32LE_T`, `STD_U32LE_T`	`STD_I32BE_T`, `STD_U32BE_T`
`STD_I64LE_T`, `STD_U64LE_T`	`STD_I64BE_T`, `STD_U64BE_T`

The special datatype C_S1_T is the ATOMIC STRING datatype, which describes a C-style string of fixed length.

The native datatypes are sufficient for most applications. The standard datatypes are needed only in the rare cases where the data elements in the file must be in a specific format.

Users may define their own datatypes based on existing ones to describe more complicated data elements, such as arrays or structured records.

Using Predefined Datatypes

All the predefined datatypes are global static variables. For example, the datatypes that correspond to double and int can be referred to as:

const H5::Datatype &dt_f = H5::NATIVE_DOUBLE_T,
     &dt_i = H5::NATIVE_INT_T;

You cannot modify a predefined datatype, but you can retrieve its details, for example, its size and datatype class:

assert( dt_f.get_class() == dt_f.cFLOAT );
assert( dt_i.get_class() == dt_i.cINTEGER );

pout << "size of float type = ", dt_f.size(),
        ", size of int type = ", dt_i.size(), endl;

The output is:

size of float type = 8, size of int type = 4

The Datatype class is copy-constructible and copy-assignable, but the copy is shallow, i.e., the resulting instance refers to the same underlying HDF5 resource (operator== returns true). To make a deep copy, use the Datatype::copy() method: the resulting instance refers to another HDF5 resource but describes the same data layout (Datatype::equal() returns true):

auto dt_f2 = dt_f;              // Shallow copy, refer to the same datatype.
auto dt_f3 = dt_f.copy();       // Deep copy, get a new datatype.

assert( dt_f == dt_f2 && dt_f.equal(dt_f2) );
assert( dt_f != dt_f3 && dt_f.equal(dt_f3) );
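The difference between identity (operator==) and equivalence (equal()) can be illustrated with a small handle class built on std::shared_ptr. This is only an analogy: Handle is a hypothetical class written for this sketch, the int it stores is a stand-in for a data layout, and nothing here reflects how HIPP actually implements Datatype.

```cpp
#include <memory>

// Hypothetical stand-in for a resource handle; not part of HIPP.
class Handle {
public:
    explicit Handle(int layout) : res_(std::make_shared<int>(layout)) {}

    // Deep copy: a new resource holding the same content.
    Handle copy() const { return Handle(*res_); }

    // Identity: true only if both handles refer to the same resource.
    bool operator==(const Handle& o) const { return res_ == o.res_; }
    bool operator!=(const Handle& o) const { return !(*this == o); }

    // Equivalence: true if the two resources hold the same content.
    bool equal(const Handle& o) const { return *res_ == *o.res_; }

private:
    std::shared_ptr<int> res_;  // copying the handle shares this resource
};
```

Copy-constructing a Handle shares the resource, so the copy compares equal under both operator== and equal(); a Handle obtained from copy() compares unequal under operator== but still satisfies equal().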

Atomic String Datatype

To get a datatype that describes a fixed-length C-style string, call Datatype::resized(n) on the instance C_S1_T with desired string length n. For example, to contain the following string "foo", which has 4 characters including the null-terminator, you may define a larger string datatype of length 10:

const char s[] = "foo";
auto dt_s = H5::C_S1_T.resized(10);

assert( dt_s.get_class() == dt_s.cSTRING );
pout << "size = ", dt_s.size(), endl;

The output is:

size = 10
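On the memory side, a fixed-length string element of size n corresponds to a buffer of n bytes. The sketch below shows one way to prepare such a buffer from a std::string. The helper name to_fixed_length is hypothetical, and the sketch assumes null termination and null padding; how a particular library pads or terminates strings is not taken from the text above.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Copy s into a buffer of exactly n bytes, padded with '\0'.
// Throws if s plus its null terminator does not fit into n bytes.
std::vector<char> to_fixed_length(const std::string& s, std::size_t n) {
    if (s.size() + 1 > n)
        throw std::length_error("string does not fit into the fixed length");
    std::vector<char> buf(n, '\0');
    std::copy(s.begin(), s.end(), buf.begin());
    return buf;
}
```

For "foo" and n = 10, the buffer holds the three characters followed by seven null bytes, so its size equals the datatype size printed above.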

Array Datatype

To get a datatype that describes a C-style raw array, call Datatype::create_array(dims) on an existing element datatype with the desired dimensions dims. For example, the following code creates the datatype for int arr[3][4]:

int arr[3][4];
auto dt_a1 = H5::NATIVE_INT_T.create_array({3,4});

The library also defines another method Datatype::create_array_for<T>(), which takes a template argument and infers the element datatype and dimensions from it. Thus, the above call is equivalent to:

auto dt_a2 = H5::NATIVE_INT_T.create_array_for< decltype(arr) >();

To retrieve the array rank and dimensions, call Datatype::array_ndims() and Datatype::array_dims(), respectively:

pout << "rank = ",   dt_a1.array_ndims(),
        ", dims = ", dt_a1.array_dims(), endl;

The output is:

rank = 2, dims = Dimensions{3,4}
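Inferring dimensions from a raw array type, as create_array_for does, is possible with standard type traits alone. The following C++17 sketch shows the idea; raw_array_dims is a hypothetical name for illustration, and the sketch makes no claim about how HIPP performs the deduction internally.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Collect the extents of a raw array type, outermost dimension first.
// For a non-array type the result is empty (rank 0).
template <typename T>
std::vector<std::size_t> raw_array_dims() {
    std::vector<std::size_t> dims;
    if constexpr (std::is_array_v<T>) {
        dims.push_back(std::extent_v<T>);  // extent of the outermost dimension
        auto inner = raw_array_dims<std::remove_extent_t<T>>();
        dims.insert(dims.end(), inner.begin(), inner.end());
    }
    return dims;
}
```

For example, raw_array_dims<int[3][4]>() returns {3,4}, and std::rank_v<int[3][4]> gives the rank 2 directly.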

Compound Datatype

Structured types in C++ are usually defined to describe objects with multiple attributes. For example, the following type S has four fields a, b, c, and d:

struct S {
    int a;
    float b;
    double c[2][3];
    std::array<int, 2> d;
} comp;

To define the HDF5 datatype for a structured type, first, call Datatype::create_compound(size) with desired byte size to get an empty datatype, and then, call Datatype::insert(name, offset, dtype) to insert a new field whose name, byte offset and datatype are specified by the three arguments, respectively. If the type of the field has a predefined HDF5 native datatype counterpart (i.e., numerical scalar, see Native Datatypes), or the field is a raw array of such a numerical scalar, you may just pass a pointer to that class member, from which the offset and datatype are inferred automatically:

auto dt_comp1 = H5::Datatype::create_compound(sizeof(S));
dt_comp1.insert("a", dt_comp1.offset(&S::a), H5::NATIVE_INT_T)
    .insert("b", &S::b)
    .insert("c", &S::c)
    .insert("d", &S::d);

Note that insert returns a reference to the instance itself, so that multiple calls can be chained.

If all the fields are known when constructing the compound datatype, it is simpler to use another overload of create_compound:

auto dt_comp2 = H5::Datatype::create_compound(
    "a", &S::a,     "b", &S::b,
    "c", &S::c,     "d", &S::d);

Here, for each class field, we pass its name and pointer to the method. The library automatically identifies the datatype size, the offsets and datatypes of all fields.

Note that the order of the fields does not matter. It is not necessary to specify all fields, i.e., fields that are not needed may be omitted.

To retrieve the member information of a compound datatype, use Datatype::nmembers() for the total number of fields, Datatype::member_index(name) for the index of the field with the given name, and Datatype::member_name(idx), Datatype::member_class(idx) and Datatype::member_offset(idx) for the name, class and offset, respectively, of the field with the given index:

pout <<
    "no. members = ",            dt_comp1.nmembers(), '\n',
    "index of a = ",             dt_comp1.member_index("a"), '\n',
    "name of the 2nd member = ", dt_comp1.member_name(1), '\n',
    "3rd member class,offset = ", dt_comp1.member_class(2), ',',
        dt_comp1.member_offset(2), '\n';

The output is:

no. members = 4
index of a = 0
name of the 2nd member = b
3rd member class,offset = 10,8
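The byte offsets stored in a compound datatype are the offsets of the fields within the C++ struct, which the standard offsetof macro reports. The sketch below lists them for S; field_offsets is a hypothetical helper for illustration. The concrete values depend on the platform ABI: an offset of 8 for c, as in the output above, is what one would expect with 4-byte int and float and 8-byte alignment for double, which is an assumption about the platform rather than a guarantee.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct S {
    int a;
    float b;
    double c[2][3];
    std::array<int, 2> d;
};

// Byte offsets of the fields a, b, c, d within S, in declaration order.
std::vector<std::size_t> field_offsets() {
    return { offsetof(S, a), offsetof(S, b), offsetof(S, c), offsetof(S, d) };
}
```

The offsets increase with declaration order, and sizeof(S) covers the last field plus any trailing padding, which is why the struct size is passed to create_compound(size).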

Map C++ Types to HDF5 Datatypes

To aid metaprogramming, HIPP defines mappings from C++ types to HDF5 datatypes.

The method Datatype::from_type<T>() takes a template argument T and returns the corresponding HDF5 Datatype instance. This works for all C++ types that have a predefined HDF5 native datatype counterpart (i.e., numerical scalars, see Native Datatypes), and for RawArrayTraits compliant types whose elements are such scalars.

For example, the following code maps int, float and double [3][4] to the corresponding HDF5 datatypes:

auto dt_int =     H5::Datatype::from_type<int>(),
    dt_float =    H5::Datatype::from_type<float>(),
    dt_double34 = H5::Datatype::from_type<double[3][4]>();

For buffer-like objects such as GeneralArrayTraits compliant ones, the method Datatype::from_buff(buff) returns the datatype of its element.

For example, the following buff1, buff2 and buff3 all have element datatype dt_int = NATIVE_INT_T:

vector<int> buff1;
std::array<int, 3> buff2;
int buff3[3][4];
dt_int = H5::Datatype::from_buff(buff1);
dt_int = H5::Datatype::from_buff(buff2);
dt_int = H5::Datatype::from_buff(buff3);

The Geometry Types

To aid the description of dataspaces, HIPP defines the following geometry types:

Dimensions: describes the dimensions of a dataspace or the coordinates of a point in a dataspace.
Points: describes the coordinates of a set of points in a dataspace.
Hyperslab: describes a hyperslab.

Each of them is described in its own section below.

Dimensions

A Dimensions instance describes the dimensions of a dataspace or the coordinates of a point in a dataspace.

Several constructors are defined for the Dimensions type:

H5::Dimensions dims1,                   // rank = 0.
    dims2 {3,4},                        // rank = 2, dimensions = {3,4}.
    dims3(vector<size_t>{3,4}),         // rank = 2, dimensions = {3,4}.
    dims4(dims3.data(), dims3.ndims()), // copy from a buffer.
    dims5 = dims4;                      // copy from another Dimensions.

To reset the values of a Dimensions instance, either call Dimensions::set() with argument(s) the same as the constructors, or copy-assign from another Dimensions instance:

dims1.set({3,4});
dims1.set(vector<size_t>{3,4});
dims1.set(dims3.data(), dims3.ndims());
dims1 = H5::Dimensions {3,4};           // copy assignment.

The Dimensions type has methods similar to those of std::vector. For example, you may

push back an element with Dimensions::push_back(), which increases the rank by 1.
resize the object with Dimensions::resize().
access its elements with [].

dims1.push_back(5);                     // {3,4,5}.
dims1.resize(4); dims1[3] = 6;          // {3,4,5,6}.

To visit the object as a “range”, you can

get the buffer address using Dimensions::data() and the rank using Dimensions::ndims().
get the range iterators using Dimensions::begin() and Dimensions::end().
retrieve the values as a std::vector using Dimensions::as_vector().
print the object by streaming it into a std::ostream or another stream-like object.

H5::Dimensions::value_t *data = dims1.data();
size_t ndims = dims1.ndims();
vector<H5::Dimensions::value_t> &vec = dims1.as_vector();       // get a vector {3,4,5,6}.
pout << "dims1 = {", pout(data, data+ndims), "}\n";             // print dims1 = {3,4,5,6}.
pout << "dims1 = {", pout(dims1.begin(), dims1.end()),"}\n";    // the same.
pout << dims1, endl;                                            // print the object.

The method Dimensions::empty() tests whether or not the dimensions are empty, i.e., rank = 0. operator== can be used to check the equality of two Dimensions objects, i.e., whether or not they have the same rank and dimensions. Dimensions::n_elems() gets the total number of elements in the space, i.e., the product of all dimensions:

assert( !dims2.empty() );           // empty check.
assert( dims2 == dims3 );           // equality check.
assert( dims2.n_elems() == 3*4 );   // total no. of elements.
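The product of all dimensions can be computed with std::accumulate. The following sketch uses a plain std::vector in place of Dimensions; the free function n_elems is a hypothetical illustration, not the HIPP method. Note that this sketch returns 1 for an empty list (the empty product), which is a property of the sketch and not a statement about what Dimensions::n_elems() returns for rank 0.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Total number of elements of a regular array with the given dimensions.
std::size_t n_elems(const std::vector<std::size_t>& dims) {
    return std::accumulate(dims.begin(), dims.end(), std::size_t{1},
                           std::multiplies<std::size_t>());
}
```

For the dimensions {3,4} this gives 12, in agreement with the assertion above.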

Points

A Points instance describes the coordinates of a set of points in a dataspace. You can construct an instance in different ways:

H5::Points::value_t coords[] = { 0,1,  2,3, 4,4 };
H5::Points pts1,            // 0 points.
    pts2(3, coords),        // 3 points whose coordinates are {0,1}, {2,3}, {4,4}, respectively.
    pts3(3, 2, coords);     // also specify the rank of the space.

Note that for pts2 we ignore the rank of the space, which is to be determined by the context when you use the instance.

Note that the Points instance does not hold the data: it only refers to the buffer passed to the constructor. If the buffer is destroyed or changed, the Points object may become invalid.

To retrieve the pointer to the coordinates buffer, the number of points and the rank of the space, call the corresponding methods:

pts1.coords();              // => a pointer to the coords array.
pts1.n_points();            // => no. of points.
pts1.n_dims();              // => the rank of the space.

Hyperslab

A Hyperslab instance describes a hyperslab, i.e., equal-extent blocks with regular stride along each dimension.

To fully describe a hyperslab, four parameters are needed:

start: the starting coordinates of the first block.
count: the number of blocks along each dimension.
stride: the stride between blocks along each dimension.
block: the extent of a block.

The following figure demonstrates a typical hyperslab in a dataspace. The dataspace is a simple array with extents {8,12}. The hyperslab starts at coordinates {0,1}. Blocks in the hyperslab have stride {4,3}. The whole hyperslab has {2,4} blocks and each block has dimensions {3,2}.

Fig. 8 A hyperslab (blue-shaded areas) in a SIMPLE dataspace.

The four parameters are all handled by Dimensions objects. stride and block are optional: if not specified, each is assumed to be 1 along every dimension.
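To make the roles of the four parameters concrete, the following sketch enumerates the coordinates covered by a 2-D hyperslab. It uses only the standard library; Coord and hyperslab_coords are hypothetical names for illustration and are not part of HIPP or HDF5. The coordinates are produced block by block, which is a choice of this sketch and not necessarily the order in which HDF5 traverses a selection.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Coord = std::array<std::size_t, 2>;

// Enumerate every coordinate covered by a 2-D hyperslab.
// Block (bi, bj) starts at start + (bi*stride[0], bj*stride[1])
// and spans block[0] x block[1] elements.
std::vector<Coord> hyperslab_coords(Coord start, Coord count,
                                    Coord stride, Coord block) {
    std::vector<Coord> out;
    for (std::size_t bi = 0; bi < count[0]; ++bi)
        for (std::size_t bj = 0; bj < count[1]; ++bj)
            for (std::size_t i = 0; i < block[0]; ++i)
                for (std::size_t j = 0; j < block[1]; ++j)
                    out.push_back({start[0] + bi * stride[0] + i,
                                   start[1] + bj * stride[1] + j});
    return out;
}
```

For the hyperslab described above (start {0,1}, count {2,4}, stride {4,3}, block {3,2}), the function returns 2*4*3*2 = 48 coordinates, the largest being {6,11}, which lies within the {8,12} dataspace.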

Several constructors are defined for Hyperslab:

H5::Hyperslab slab1,                        // rank = 0, empty hyperslab.
    slab2(dims2, {2,3}),                    // rank = 2, start = dims2, count = {2,3}.
    slab3(dims2, {2,3}, {5,10}, {2,2});     // also specify stride = {5,10}, block size = {2,2}.

To retrieve the parameters of a hyperslab, use the corresponding methods:

H5::Dimensions &start = slab1.start(),
    &count  = slab1.count(),
    &stride = slab1.stride(),
    &block  = slab1.block();

Dataspace

A Dataspace instance describes the layout of data elements in storage. The current version of the HDF5 library supports only SIMPLE dataspace (i.e., regular array) and several special dataspaces (all-space, null-space, and scalar-space). More flexible types of dataspaces may be supported in the future.

Dataspace Creation and Access

The HDF5 library predefines three dataspaces, which are available as static members of class Dataspace:

Dataspace::vALL : all data elements in the dataset. The exact meaning depends on the context.
Dataspace::vNULL : an empty space, i.e., without any data element.
Dataspace::vSCALAR : a scalar space, i.e., containing only a single data element.

For example, the following code creates references to these three predefined variables:

auto & dsp_all = H5::Dataspace::vALL,
     & dsp_null = H5::Dataspace::vNULL,
     & dsp_scalar = H5::Dataspace::vSCALAR;

To create a new dataspace, define a Dataspace instance by passing the shape (i.e., dimensions) of it:

H5::Dataspace dsp1 = {2,3},                     // dims = {2,3}.
    dsp2 ( vector<hsize_t>{2,3} ),              // using vector.
    dsp3 ( H5::Dimensions{2,3} ),               // using Dimensions.
    dsp4 = dsp1.copy();                         // exact copy.

All the above four dataspaces have ndims = 2 and dims = {2, 3}.

You may print a Dataspace instance by streaming it into a stream-like object:

pout << dsp1, '\n',
        dsp2, '\n',
        dsp3, '\n',
        dsp4, endl;

The output is:

Dataspace{type=SIMPLE, dims={2,3}, maxdims={2,3}}
Dataspace{type=SIMPLE, dims={2,3}, maxdims={2,3}}
Dataspace{type=SIMPLE, dims={2,3}, maxdims={2,3}}
Dataspace{type=SIMPLE, dims={2,3}, maxdims={2,3}}

The method Dataspace::get_type() returns the class of the dataspace. It can be either Dataspace::cNULL, Dataspace::cSCALAR, or Dataspace::cSIMPLE:

assert( dsp_null.get_type() == dsp_null.cNULL );
assert( dsp_scalar.get_type() == dsp_scalar.cSCALAR );
assert( dsp1.get_type() == dsp1.cSIMPLE );

To retrieve the shape of a dataspace instance, call Dataspace::ndims(), Dataspace::dims() and Dataspace::size(). They return the rank, the dimensions, and the total number of elements (i.e., product of dimensions), respectively, of the dataspace:

int ndims = dsp1.ndims();                       // => 2
auto dims = dsp1.dims();                        // => {2,3}
hsize_t size = dsp1.size();                     // => 6

Sub-setting a Dataspace

In special cases where only a subset of a dataset is the I/O target, using the “selection” mechanism on the dataspace is beneficial. By selecting/sub-setting a dataspace, the extents of the dataspace are not changed, but the library maintains extra information about which elements are selected in the dataspace. Then, in the I/O process, only the selected elements in the whole dataset are touched.

HDF5 provides four kinds of selection functions:

All: select all elements in a dataspace. This is the default when a new dataspace is created. (type = Dataspace::selALL_T)
None: none of the elements is selected. (type = Dataspace::selNONE_T)
Hyperslab: select a hyperslab by specifying its start, count, and optionally, stride and block size, at each dimension. (type = Dataspace::selHYPERSLABS_T)
Elements: select a set of elements with given coordinates. (type = Dataspace::selPOINTS_T)

For example, we create a new dataspace shaped {10, 10}:

H5::Dataspace dsp5 {10, 10};

By applying Dataspace::select_none() on it, all elements become un-selected. The method Dataspace::get_select_npoints() allows examination of how many elements/points are selected:

dsp5.select_none();
pout << "no. selected points = \n";
pout << "  none -> ", dsp5.get_select_npoints(), '\n';

The opposite extreme, Dataspace::select_all(), selects all elements of a dataspace:

dsp5.select_all();
pout << "  all -> ", dsp5.get_select_npoints(), '\n';

The method Dataspace::select_hyperslab() accepts a Hyperslab instance, and selects the elements described by that hyperslab. For example, we select \(3\times 4 = 12\) elements starting at coordinate {2,2}:

dsp5.select_hyperslab({ {2,2}, {3,4} });
pout << "  hyperslab -> ", dsp5.get_select_npoints(), '\n';

The method Dataspace::select_elements() accepts a Points instance, and selects the elements whose coordinates are given by the points. For example, we select three elements whose coordinates are {1,1}, {2,3}, {3,5}, respectively:

hsize_t coords[] = {1,1,  2,3,  3,5};
dsp5.select_elements({3, coords});
pout << "  elements -> ", dsp5.get_select_npoints(), '\n';

The selections are illustrated in Fig. 9:

Fig. 9 The selected elements after calls of select_hyperslab() and select_elements().

To retrieve which kind of selection is used, call Dataspace::get_select_type. For example, the “elements” selection is typed Dataspace::selPOINTS_T. The method Dataspace::select_valid() checks whether or not the selection is valid, i.e., all the coordinates are within the extents of the dataspace:

assert( dsp5.get_select_type() == dsp5.selPOINTS_T );
assert( dsp5.select_valid() );

The output is:

no. selected points =
  none -> 0
  all -> 100
  hyperslab -> 12
  elements -> 3

Note that setting a new selection clears the old selection, and then applies the new selection, i.e., the old and new selections are not combined. The tutorial section The Geometry Types describes how to create geometry types such as the Hyperslab and Points.

Dataset Operations

Dataset Creation and Opening

Whichever API is used for dataset creation, the required information is the same: dataset name, datatype and dataspace. HIPP provides three levels of methods for dataset creation, from simple-but-constrained to complex-but-general:

Datatype and dataspace are auto-deduced from the data object. The methods Group::create_dataset_for(name, object) and Group::create_dataset_for_str(name, str_object) are used for this purpose.
Datatype is auto-deduced from the C++ type, dataspace is explicitly specified. The methods Group::create_dataset<T>(name, dataspace) and Group::create_dataset_str(name, shape...) are used in this case.
Datatype and dataspace are both explicitly specified. This is the most general case - call Group::create_dataset(name, datatype, dataspace) with all the details passed.

The level-1 calls are the most convenient. For example, assume the following data objects are the targets to dump into a file f1:

short s;
vector<double> d10(10);
int i5[5];
long l34[3][4];
array<float, 3> f3;
vector<array<int, 3> > i83(8);

The allowed types of data objects are the same as those in Using the Dataset Manager.

To create a dataset for any of them, call Group::create_dataset_for() by passing the dataset name and the data object. The library auto-deduces all required information to create that dataset, but no data is actually written. On success, the newly created dataset is returned. If the dataset of that name already exists, it is opened and returned:

f1.create_dataset_for("s",   s);
f1.create_dataset_for("d10", d10);
f1.create_dataset_for("i5",  i5);
f1.create_dataset_for("l34", l34);
f1.create_dataset_for("f3",  f3);
f1.create_dataset_for("i83", i83);

For string-like objects (see Using the Dataset Manager for available types), such as:

string str = "foo";
vector<string> str4 = {"top", "bottom", "left", "right"};
char raw_str[16] = "foo";
char raw_str4[4][16] = {"top", "bottom", "left", "right"};

Call Group::create_dataset_for_str() to create datasets for them:

f1.create_dataset_for_str("str",      str);
f1.create_dataset_for_str("str4",     str4);
f1.create_dataset_for_str("raw_str",  raw_str);
f1.create_dataset_for_str("raw_str4", raw_str4);

The file contents shown by h5ls -r are:

/                        Group
/d10                     Dataset {10}
/f3                      Dataset {3}
/i5                      Dataset {5}
/i83                     Dataset {8, 3}
/l34                     Dataset {3, 4}
/raw_str                 Dataset {SCALAR}
/raw_str4                Dataset {4}
/s                       Dataset {SCALAR}
/str                     Dataset {SCALAR}
/str4                    Dataset {4}

The level-2 calls auto-deduce the datatype from the C++ type, but the dataspace is passed manually. For example, for the above data objects s and i83, and for the string-like objects, the equivalent level-2 calls are:

f1.create_dataset<short>("s", H5::Dataspace::vSCALAR);
f1.create_dataset<int>("i83", {8,3});

f1.create_dataset_str("str", str.size()+1);
f1.create_dataset_str("raw_str4", 4, 16);

The level-3 call allows the user to pass both the datatype and the dataspace. The level-3 equivalents of the above dataset creation calls are:

f1.create_dataset("s",   H5::NATIVE_SHORT_T, H5::Dataspace::vSCALAR);
f1.create_dataset("i83", H5::NATIVE_INT_T,  {8,3});

f1.create_dataset("str",      H5::C_S1_T.resized(str.size()+1), H5::Dataspace::vSCALAR);
f1.create_dataset("raw_str4", H5::C_S1_T.resized(16),           {4});

To open an existing dataset, use Group::open_dataset(name) with given dataset name. For example, to open the dataset “s” created above, write:

auto s_dset = f1.open_dataset("s");

Read/Write Data

Once a dataset is created or opened, read/write operations can be performed on the Dataset instance.

For example, the following code writes a raw array and a std::vector object into two newly created datasets, respectively:

long l34[3][4];
vector<double> d10(10);

auto dset_l34 = f1.create_dataset_for("l34", l34),
     dset_d10 = f1.create_dataset_for("d10", d10);

dset_l34.write(l34);
dset_d10.write(d10);

The data objects accepted by Dataset::write() are the same as those accepted by Group::create_dataset_for().

The memory data elements to be read/written may have a different datatype from the file data elements, but an implicit conversion must be available. The memory dataspace may also be different, but the total number of elements must match that in the file. For example, writing 10 single-precision floating-point values to the above dataset, d10, is valid:

float f10[10];
dset_d10.write(f10);

The datatype of data elements in a raw memory buffer cannot be auto-deduced. The user must specify it:

void *ptr_f10 = new float[10];
dset_d10.write(ptr_f10, H5::NATIVE_FLOAT_T);

String-like objects are written by the Dataset::write_str() method. Allowed string-like types are the same as those for Group::create_dataset_for_str(). For example, the following code writes a single string and a std::vector of strings into two newly created datasets, respectively:

string str = "foo";
vector<string> str4 = {"top", "bottom", "left", "right"};

auto dset_str = f1.create_dataset_for_str("str", str),
     dset_str4 = f1.create_dataset_for_str("str4", str4);

dset_str.write_str(str);
dset_str4.write_str(str4);

The method Dataset::read() reads the data elements from a dataset into a data object. The object must have a consistent (i.e., convertible) datatype and the same number of elements. For example, the above dataset dset_l34 is read by:

dset_l34.read(l34);

Or:

long l12[12];               // Readable if the size matches.
dset_l34.read(l12);

If the argument is a std::vector, it is auto-resized to match the file dataspace:

vector<long> v1;            // vector is auto-resized.
dset_l34.read(v1);
assert(v1.size() == 12);

The value_type of the std::vector may be a RawArrayTraits compliant type, as long as its size is a divisor of the size of the file dataspace:

vector<array<long, 4> > v2;
dset_l34.read(v2);
assert(v2.size() == 3);

If the resize is not feasible, a HIPP::ErrorLogic exception is thrown:

vector<array<long, 5> > v3;
dset_l34.read(v3);          // Fail and throw an ErrorLogic.
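The resize rule in the last three examples amounts to a divisibility check: the number of elements in the file dataspace must be a multiple of the number of scalars held by one vector item. A minimal sketch of that rule follows; outer_length is a hypothetical helper, and std::logic_error is used here only as a stand-in for the library's own exception type.

```cpp
#include <cstddef>
#include <stdexcept>

// Length a vector must be resized to so that it holds n_file_elems scalars,
// when each vector item holds n_per_item scalars.
// Throws if the file size is not a multiple of the item size.
std::size_t outer_length(std::size_t n_file_elems, std::size_t n_per_item) {
    if (n_per_item == 0 || n_file_elems % n_per_item != 0)
        throw std::logic_error("dataset size is not a multiple of the item size");
    return n_file_elems / n_per_item;
}
```

For the 12-element dataset above, an item size of 1 gives a length of 12, an item size of 4 gives 3, and an item size of 5 throws, mirroring v1, v2 and v3.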

String-like data objects are handled similarly, but are read through Dataset::read_str(). If the argument is a std::string or std::vector<std::string>, it is auto-adjusted to fit the file contents:

string s1;
vector<string> v4;
dset_str.read_str(s1);
dset_str4.read_str(v4);

Sub-setting the Data

The sub-setting I/O operations are designed for cases where only a subset of the whole dataset is the target. We have three sub-setting operations for both reading and writing:

Dataset::read_element() and Dataset::write_element(): read/write a single scalar element at a given coordinate in the dataspace.
Dataset::read_elements() and Dataset::write_elements(): read/write elements at a set of given coordinates.
Dataset::read_hyperslab() and Dataset::write_hyperslab(): read/write elements in a hyperslab of the dataset.

As an example, we create a new dataset shaped {3,4} with integer elements:

H5::File f1("f1.h5", "w");

int i34[3][4] {};

auto dset = f1.create_dataset_for("i34", i34);
dset.write(i34);

We write a single element i at the coordinate {0, 1} by Dataset::write_element():

int i = 100;
dset.write_element(i, {0,1});
/*
The dataset becomes
0   100 0   0
0   0   0   0
0   0   0   0
*/

We write two elements of i2 at the coordinates {0,2} and {0,3}, respectively, by one call of Dataset::write_elements():

int i2[2] = {8, 16};
hsize_t coords[] = {
    0,2,
    0,3};
dset.write_elements(i2, {2, coords});
/*
The dataset becomes
0   100 8   16
0   0   0   0
0   0   0   0
*/

We write 6 elements of i23 into a hyperslab starting at the coordinate {1,1} and extending {2,3} in the two axes, by Dataset::write_hyperslab():

int i23[2][3] = {0,1,2,3,4,5};
dset.write_hyperslab(i23, {{1,1},{2,3}});
/*
The dataset becomes
0   100 8   16
0   0   1   2
0   3   4   5
*/
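The net effect of the three writes can be checked against an in-memory array. The sketch below replays them on a 3x4 std::array using plain indexing; Grid and apply_example_writes are hypothetical names for illustration, and no HIPP or HDF5 call is involved.

```cpp
#include <array>
#include <cstddef>

using Grid = std::array<std::array<int, 4>, 3>;

// Replay the element, elements and hyperslab writes on a zero-filled grid.
Grid apply_example_writes() {
    Grid g{};  // all elements start at 0, like the freshly written dataset

    // Single element at {0,1}.
    g[0][1] = 100;

    // Two elements at {0,2} and {0,3}.
    const int i2[2] = {8, 16};
    const std::size_t coords[2][2] = {{0, 2}, {0, 3}};
    for (std::size_t k = 0; k < 2; ++k)
        g[coords[k][0]][coords[k][1]] = i2[k];

    // Hyperslab starting at {1,1} with extent {2,3}.
    const int i23[2][3] = {{0, 1, 2}, {3, 4, 5}};
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            g[1 + i][1 + j] = i23[i][j];

    return g;
}
```

The returned grid has the rows {0,100,8,16}, {0,0,1,2} and {0,3,4,5}, the same as the final dataset content shown in the comment above.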

The read calls are similar:

int i;
vector<int> v;

dset.read_element(i, {0,1});
dset.read_elements(v, {2, coords});
dset.read_hyperslab(v, {{1,1},{2,3}});

Dataset::read_element() accepts a reference to a scalar as its first argument, while Dataset::read_elements() and Dataset::read_hyperslab() accept a std::vector. The vector is auto-resized to fit the number of loaded elements.

HIPP 0.0.1 documentation
