Tabular-Data IO
The following classes are all defined within namespace HIPP::IO::H5.
XTable
-
template<typename RecordT>
class XTable
XTable manipulates the I/O of tabular data, i.e., an array of a structured type. The structured type (defined as record_t) may be any simple C++ struct/class. For example:
struct S { int a; double b[3]; float c[2][3]; array<array<long, 3>, 4> d; };
Such simple structured types appear almost everywhere in C++ programs. Instances of such a type are usually organized as arrays (e.g., std::vector, raw array, or heap buffer). XTable defines an interface that enables easy I/O of those data structures. The data can be stored either as a separate dataset for each field, or as a single dataset with a compound datatype for the whole record_t. The limitation is that the structured type must be a “simple” type, with no virtual methods or virtual base classes. It may have private attributes, but XTable cannot access those attributes directly from outside the class scope.
Selectors are also defined for filtering the dataset. Users may choose to input/output a subset of all fields, or a subset of all rows.
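The "simple type" requirement can be checked at compile time. The sketch below is not part of HIPP: the trait name is_simple_record_v is hypothetical, and it uses std::is_trivially_copyable as a conservative proxy, since a class with virtual methods or a virtual base class is never trivially copyable.

```cpp
#include <array>
#include <cassert>
#include <type_traits>

// Hypothetical helper, not part of HIPP: a conservative compile-time proxy
// for the "simple type" requirement. Classes with virtual methods or a
// virtual base class are not trivially copyable, so they are rejected.
template<typename T>
inline constexpr bool is_simple_record_v =
    std::is_class_v<T> && std::is_trivially_copyable_v<T>;

struct S {
    int a;
    double b[3];
    float c[2][3];
    std::array<std::array<long, 3>, 4> d;
};

// A record with a virtual method: not usable as a simple record.
struct WithVirtual {
    int a;
    virtual ~WithVirtual() = default;
};

// A record with a virtual base class: not usable either.
struct Base { int x; };
struct WithVirtualBase : virtual Base { int y; };

static_assert(is_simple_record_v<S>);
static_assert(!is_simple_record_v<WithVirtual>);
static_assert(!is_simple_record_v<WithVirtualBase>);
```

The proxy may reject some types that XTable would accept; it is meant only as an early warning in user code.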
-
typedef RecordT record_t
-
typedef vector<record_t> table_t
record_t: the type of the structured data element to be input/output. Aliased from the class template argument RecordT.
table_t: a vector of records, each of type record_t.
-
XTable()
-
template<typename M, typename ...Args>
XTable(const string &name, M record_t::* mem_ptr, Args&&... args)
-
template<typename R, typename ...Args>
XTable(const string &name, const std::pair<size_t, XTable<R>> &field, Args&&... args)
Constructors.
(1): default constructor. No field is added.
(2,3): initialize by a list of field definitions.
args: names and details, which must be paired. Each name is a std::string or string-compatible type giving the field name/dataset name in the file. Each detail is either
- a member pointer of type M R::*, from which the offset, size, and datatype are inferred. sizeof(R) must be equal to sizeof(record_t) and R must be binary compatible with record_t in the specified fields; or
- a std::pair<size_t, XTable> instance denoting the offset and details of this field in memory, i.e., the field itself is another structured type.
Fields can be added later by the method add_field() or removed by remove_field().
The order of the field specifications is not significant.
It is not necessary to specify all the fields of the C++ type. Fields that are not added to XTable are not touched in the file or in memory.
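To make "the offset, size, and datatype are inferred from it" concrete, here is a minimal sketch of how an offset and a size can be recovered from a member pointer. It is an illustration only, not HIPP's implementation; FieldLayout and layout_of are hypothetical names, and the sketch assumes the record type is default-constructible.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one field: where it lives inside a record
// and how many bytes it occupies.
struct FieldLayout {
    std::size_t offset;
    std::size_t size;
};

// Derive the layout from a member pointer by probing a temporary record.
// Assumes R is default-constructible.
template<typename R, typename M>
FieldLayout layout_of(M R::* mem_ptr) {
    R probe{};
    const char *base = reinterpret_cast<const char *>(&probe);
    const char *member = reinterpret_cast<const char *>(&(probe.*mem_ptr));
    return { static_cast<std::size_t>(member - base), sizeof(M) };
}

struct S {
    int a;
    double b[3];
};
```

For example, layout_of(&S::b) yields the same offset as offsetof(S, b) and a size of 3 * sizeof(double). The member type M (here double[3]) is what a datatype mapping would be keyed on.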
-
XTable(const XTable&) = delete
-
XTable &operator=(const XTable&) = delete
-
XTable(XTable&&) = default
-
XTable &operator=(XTable&&) = default
-
~XTable() = default
XTable is not copyable, but it is movable. After a move, the moved-from object is left in an unspecified but valid state.
-
template<typename R, typename M>
XTable &add_field(const string &name, M R::* mem_ptr)
-
template<typename R>
XTable &add_field(const string &name, const std::pair<size_t, XTable<R>> &field)
-
bool remove_field(const string &name)
-
bool has_field(const string &name) const noexcept
-
size_t n_fields() const noexcept
-
bool empty() const noexcept
Field manipulators. The set of fields acts like an unordered_map.
add_field() adds another field to the table.
(1): use a member pointer. The member offset, size, and datatype are inferred from that pointer.
R: the record type, which may be different from record_t but must be binary compatible with it.
M: the member type.
(2): add the field that is specified by another XTable instance, i.e., the field itself is another structured type. The final field name is “.”-joined from name and the field names in field.
remove_field() tries to remove a field. It returns false on failure.
has_field() tests whether a field of the given name exists.
n_fields() returns the number of fields that have been added.
empty() is equivalent to n_fields() == 0.
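The map-like behaviour and the “.”-joined naming of nested fields can be sketched with a small stand-in class. FieldRegistry is hypothetical and stores only offsets; in particular, overwriting an existing name on add_field() is a choice made for this sketch, not documented behaviour of XTable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the field bookkeeping; only offsets are stored.
class FieldRegistry {
public:
    // Add (or, in this sketch, overwrite) a field. Returns *this for chaining.
    FieldRegistry &add_field(const std::string &name, std::size_t offset) {
        fields_[name] = offset;
        return *this;
    }
    // Add every field of a nested registry as "name.<sub-field>", shifting
    // each offset by the position of the nested record.
    FieldRegistry &add_field(const std::string &name, std::size_t offset,
                             const FieldRegistry &nested) {
        for (const auto &kv : nested.fields_)
            fields_[name + "." + kv.first] = offset + kv.second;
        return *this;
    }
    // Returns false if no field of that name exists.
    bool remove_field(const std::string &name) {
        return fields_.erase(name) > 0;
    }
    bool has_field(const std::string &name) const {
        return fields_.count(name) > 0;
    }
    std::size_t n_fields() const noexcept { return fields_.size(); }
    bool empty() const noexcept { return fields_.empty(); }
    std::size_t offset_of(const std::string &name) const {
        return fields_.at(name);
    }
private:
    std::unordered_map<std::string, std::size_t> fields_;
};
```

Adding a nested registry with fields x and y under the name pos at offset 16 produces the entries pos.x and pos.y, with offsets shifted by 16.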
-
XTable &select_rows(hsize_t start, hsize_t count, hsize_t stride = 1, hsize_t block = 1) noexcept
-
XTable &select_all() noexcept
-
XTable &select_fields(const vector<string> &names)
-
XTable &select_all_fields() noexcept
Column (field) and row selectors.
select_rows(): select rows by a 1-D regular hyperslab with start, count, stride and block specified by the arguments.
select_all(): select all rows.
select_fields(): select a list of fields by name.
select_all_fields(): select all fields.
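As an illustration of what a 1-D regular hyperslab selects, the sketch below enumerates the row indices for given start, count, stride and block values, following the usual HDF5 convention. The function hyperslab_rows is hypothetical and uses std::uint64_t in place of hsize_t so that it compiles without HDF5.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Enumerate the row indices covered by a 1-D regular hyperslab: `count`
// blocks of `block` consecutive rows, where the first rows of successive
// blocks are `stride` apart and the first block begins at `start`.
std::vector<std::uint64_t> hyperslab_rows(std::uint64_t start,
                                          std::uint64_t count,
                                          std::uint64_t stride = 1,
                                          std::uint64_t block = 1)
{
    std::vector<std::uint64_t> rows;
    rows.reserve(count * block);
    for (std::uint64_t i = 0; i < count; ++i)
        for (std::uint64_t j = 0; j < block; ++j)
            rows.push_back(start + i * stride + j);
    return rows;
}
```

With start = 2, count = 3, stride = 4 and block = 2 the selected rows are 2, 3, 6, 7, 10, 11. With the default stride and block, the selection is simply count consecutive rows beginning at start.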
-
bool raw_array_as_atomic() const noexcept
-
void raw_array_as_atomic(bool as_atomic = false) noexcept
-
string dataset_create_flag() const noexcept
-
void dataset_create_flag(const string &flag = "ac") noexcept
-
bool dataset_create_pack() const noexcept
-
void dataset_create_pack(bool pack = false) noexcept
Specify or retrieve the details of dataset creation and I/O.
raw_array_as_atomic: whether to treat a raw array field (e.g., int [3], std::array<double, 4>) as an ATOMIC ARRAY datatype.
dataset_create_flag: which flag to use when creating the dataset in write operations (see Group::create_dataset).
dataset_create_pack: whether to pack the datatype when creating a dataset with COMPOUND datatype.
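The effect of packing can be estimated without touching a file. The sketch below assumes that "pack" has the usual HDF5 meaning (padding between members is removed, as H5Tpack does); that reading is an assumption, and packed_size is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

struct S {
    int a;        // typically followed by padding so that b is aligned
    double b[3];
};

// Size of a packed compound holding only the listed member types: the plain
// sum of the member sizes, with no padding in between.
template<typename... Members>
constexpr std::size_t packed_size() {
    return (sizeof(Members) + ... + 0);
}
```

Here packed_size<int, double[3]>() equals sizeof(int) + 3 * sizeof(double), which is never larger than sizeof(S) and is smaller on platforms that pad after a.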
-
table_t read(Group dgrp)
-
table_t read(const string &file_name)
-
template<typename Buff>
void read(Buff &buff, Group dgrp)
-
void read(void *buff, size_t &n, Group dgrp)
Read the table data. Fields are loaded from separate datasets.
(1): load from a Group and return the table.
(2): load from the root group of a file named file_name.
(3): load into a buffer buff. If buff is a std::vector, it is automatically resized. If a resize happens, then either
- buff is enlarged: old data are copied to the new space and new elements are default-constructed at the tail; or
- buff is truncated: the tail of the old data is dropped but the remaining elements are unchanged.
Then the actual read operation fills the specified fields.
Otherwise, if buff is not a vector type, its size must be consistent with the file content.
The value_type of the buffer must be binary compatible with record_t.
(4): read into a raw buffer. On entry, n is the size (number of entries) of the buffer. On exit, n is the actual number of entries read. If n is less than required, an ErrLogic exception is raised. If any exception is thrown, n or buff may have been modified.
In all cases, the space with no specified field is unmodified.
If no field is selected, an empty table is returned in (1), (2); buff is resized to 0 in (3); buff is untouched while n is set to 0 in (4).
A file with 0 rows, or a selection of 0 rows to read, is valid.
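The resize-then-fill behaviour of overload (3) can be mimicked in memory. In this sketch fill_field is a hypothetical helper and a std::vector stands in for the column stored in the file; it only illustrates that members other than the selected field keep their old values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct S {
    int a;
    double b;
};

// Hypothetical helper: copy one column into the matching member of each
// record. The vector is resized to the column length first (new elements
// are value-initialized, surviving elements are kept), and every member
// other than *mem_ptr keeps whatever value it had.
template<typename R, typename M>
void fill_field(std::vector<R> &buff, M R::* mem_ptr,
                const std::vector<M> &column)
{
    buff.resize(column.size());
    for (std::size_t i = 0; i < column.size(); ++i)
        buff[i].*mem_ptr = column[i];
}
```

If buff holds two records and a three-element column is loaded into a, the two existing b values survive and the third record gets a value-initialized b.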
-
template<typename Buff>
void write(const Buff &buff, Group dgrp)
-
template<typename Buff>
void write(const Buff &buff, const string &file_name, const string flag = "w")
-
void write(const void *buff, size_t n, Group dgrp)
Write the table data as separate datasets.
(1): write to a group. buff is a Contiguous buffer (e.g., raw array, std::vector) of a type that is binary compatible with record_t.
(2): write into the root group of a file named file_name. The file access mode is flag (see the file constructor File::File).
(3): write n records from the raw buffer starting at the address buff into the group dgrp.
If any dataset already exists in the file, it is opened and modified. In this case the dataset in the file must have a consistent length.
-
table_t read_records(Group dgrp, const string &dset_name)
-
table_t read_records(const string &file_name, const string &dset_name)
-
template<typename Buff>
void read_records(Buff &buff, Group dgrp, const string &dset_name)
-
void read_records(void *buff, size_t &n, Group dgrp, const string &dset_name)
Read a single dataset with COMPOUND datatype.
(1): read from a dataset named dset_name under group dgrp and return the table.
(2): read from a dataset named dset_name under the root group of a file named file_name and return the table.
(3): the same as (1), but the data are read into buff, whose element type is binary compatible with record_t. If buff is a std::vector, it is automatically resized to exactly fit the size of the data, and then the read is performed.
Otherwise, if buff is not a vector, its size must be compatible with the size of the selected part of the dataset.
(4): the same as (1), but read into a raw buffer. On entry, n is the size (number of entries) of the buffer. On exit, n is the actual number of entries read. If n is less than required, an ErrLogic exception is raised. If any exception is thrown, n or buff may have been modified.
In all cases, the space with no specified field is unmodified.
If no field is selected, an empty table is returned in (1), (2); buff is resized to 0 in (3); buff is untouched while n is set to 0 in (4).
A file with 0 rows, or a selection of 0 rows to read, is valid.
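The in/out contract of n in the raw-buffer overload (4) can be sketched as follows. This is not the library's code: read_raw is hypothetical, a std::vector named source stands in for the file content, and std::logic_error stands in for ErrLogic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical reader following the in/out contract for `n`:
// on entry n is the buffer capacity in records, on exit it is the number
// of records actually read. Too small a buffer is reported by an exception.
template<typename R>
void read_raw(void *buff, std::size_t &n, const std::vector<R> &source) {
    static_assert(std::is_trivially_copyable_v<R>,
                  "records must be copyable as raw bytes");
    if (n < source.size())
        throw std::logic_error("buffer too small for the selected rows");
    if (!source.empty())
        std::memcpy(buff, source.data(), source.size() * sizeof(R));
    n = source.size();
}
```

A caller typically allocates a buffer, sets n to its capacity, calls the reader, and then uses only the first n entries. Entries beyond n are left as they were.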
-
template<typename Buff>
void write_records(const Buff &buff, Group dgrp, const string &dset_name)
-
template<typename Buff>
void write_records(const Buff &buff, const string &file_name, const string &dset_name, const string flag = "w")
-
void write_records(const void *buff, size_t n, Group dgrp, const string &dset_name)
Write the records into a single dataset with COMPOUND datatype.
(1): write to a dataset named dset_name under group dgrp. buff is a Contiguous buffer (e.g., raw array, std::vector) of a type that is binary compatible with record_t.
(2): write to the dataset named dset_name under the root group of a file named file_name. The file access mode is flag (see the file constructor File::File).
(3): write n records from the raw buffer starting at the address buff into the dataset named dset_name under the group dgrp.
If the dataset already exists in the file, it is opened and modified. In this case the dataset in the file must have a consistent length.
-
Macros for XTable
-
HIPPIO_H5_XTABLE(R, ...)
Explanation of the macros:
HIPPIO_H5_XTABLE(R, ...): produces an unnamed prvalue with
record type R and any number of members (may be 0). For example:
auto xtbl = HIPPIO_H5_XTABLE(S, a, b, c);
is expanded as:
auto xtbl = ::HIPP::IO::H5::XTable<S> { "a", &S::a,
    "b", &S::b,
    "c", &S::c };
HIPPIO_H5_XTABLE_DEF(identifier, R, ...): produces a named definition with
the name identifier and record type R. For example:
static inline HIPPIO_H5_XTABLE_DEF(xtbl, S, a, b, c);
is expanded as:
static inline ::HIPP::IO::H5::XTable<S> xtbl { "a", &S::a,
    "b", &S::b,
    "c", &S::c };
Example: I/O of tabular data with XTable.
A structured type may have multiple fields, each of which is either a scalar or a RawArray:
struct S {
int a;
double b[3];
float c[2][3];
array<array<long, 3>, 4> d;
};
To read/write an array of S, define an XTable by passing the name and
member pointer of each desired field into the constructor:
H5::XTable xtbl {
"a", &S::a,
"b", &S::b,
"c", &S::c,
"d", &S::d};
You may reorder the fields, or specify only a subset of all fields. When performing I/O, the unspecified gaps are not touched.
XTable is able to write any ContiguousBuffer of the structured type,
for example, a RawArray or std::vector:
S data[10];
vector<S> vec_data(10);
The following calls of XTable::write write the records into the root group of a file of the given name, where each field is written as a single dataset:
xtbl.write(data, "xtbl.h5");
xtbl.write(vec_data, "xtbl.h5"); // (1)
Or, write to a group:
H5::File fout("xtbl.h5");
xtbl.write(data, fout.create_group("S-datasets"));      // (2)
The call of XTable::write_records writes the records into a single
dataset with COMPOUND datatype. For example, write to a dataset named S-records
under the root group of the file fout:
xtbl.write_records(data, fout, "S-records");            // (3)
To read the data back, call XTable::read for separate per-field datasets, or XTable::read_records for a COMPOUND datatype. For example:
vector<S> data_in = xtbl.read(fout.open_group("S-datasets"));
Using the preprocessor macros, the above definition of xtbl can be simplified further,
e.g.:
auto xtbl2 = HIPPIO_H5_XTABLE(S, a, b, c, d);
HIPPIO_H5_XTABLE_DEF(xtbl3, S, a, b, c, d);
The file content, as shown by h5dump, consists of:
- four datasets under the root group written by statement (1);
- four datasets under “S-datasets” written by statement (2);
- a dataset with COMPOUND datatype written by statement (3).
HDF5 "xtbl.h5" {
GROUP "/" {
    GROUP "S-datasets" {                        # <- written by (2)
        DATASET "a" {
            DATATYPE H5T_STD_I32LE
            DATASPACE SIMPLE { ( 10 ) / ( 10 ) }
            DATA { ... }
        }
        DATASET "b" {
            DATATYPE H5T_IEEE_F64LE
            DATASPACE SIMPLE { ( 10, 3 ) / ( 10, 3 ) }
            DATA { ... }
        }
        DATASET "c" {
            DATATYPE H5T_IEEE_F32LE
            DATASPACE SIMPLE { ( 10, 2, 3 ) / ( 10, 2, 3 ) }
            DATA { ... }
        }
        DATASET "d" {
            DATATYPE H5T_STD_I64LE
            DATASPACE SIMPLE { ( 10, 4, 3 ) / ( 10, 4, 3 ) }
            DATA { ... }
        }
    }
    DATASET "S-records" {                       # <- written by (3)
        DATATYPE H5T_COMPOUND {
            H5T_STD_I32LE "a";
            H5T_ARRAY { [3] H5T_IEEE_F64LE } "b";
            H5T_ARRAY { [2][3] H5T_IEEE_F32LE } "c";
            H5T_ARRAY { [4][3] H5T_STD_I64LE } "d";
        }
        DATASPACE SIMPLE { ( 10 ) / ( 10 ) }
        DATA { ... }
    }
    DATASET "a" {                               # <- written by (1)
        DATATYPE H5T_STD_I32LE
        DATASPACE SIMPLE { ( 10 ) / ( 10 ) }
        DATA { ... }
    }
    DATASET "b" {
        DATATYPE H5T_IEEE_F64LE
        DATASPACE SIMPLE { ( 10, 3 ) / ( 10, 3 ) }
        DATA { ... }
    }
    DATASET "c" {
        DATATYPE H5T_IEEE_F32LE
        DATASPACE SIMPLE { ( 10, 2, 3 ) / ( 10, 2, 3 ) }
        DATA { ... }
    }
    DATASET "d" {
        DATATYPE H5T_STD_I64LE
        DATASPACE SIMPLE { ( 10, 4, 3 ) / ( 10, 4, 3 ) }
        DATA { ... }
    }
}
}
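The dataspace shapes in the dump follow from the member types: the number of records comes first, then the extents of the array type. The sketch below reproduces that mapping with a hypothetical trait (Extents) and function (dataset_shape); it is an illustration, not how XTable computes shapes internally.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical trait: append the extents of a (possibly nested) raw array
// or std::array type to `dims`. Scalars contribute nothing.
template<typename T>
struct Extents {
    static void append(std::vector<std::size_t> &) {}
};
template<typename T, std::size_t N>
struct Extents<T[N]> {
    static void append(std::vector<std::size_t> &dims) {
        dims.push_back(N);
        Extents<T>::append(dims);
    }
};
template<typename T, std::size_t N>
struct Extents<std::array<T, N>> {
    static void append(std::vector<std::size_t> &dims) {
        dims.push_back(N);
        Extents<T>::append(dims);
    }
};

// Shape of the per-field dataset for n_rows records whose field has type M.
template<typename M>
std::vector<std::size_t> dataset_shape(std::size_t n_rows) {
    std::vector<std::size_t> dims{n_rows};
    Extents<M>::append(dims);
    return dims;
}
```

For the struct S above and 10 records, this gives (10) for a, (10, 3) for b, (10, 2, 3) for c and (10, 4, 3) for d, matching the DATASPACE lines of the per-field datasets.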