Remote Memory Access

The following classes are all defined within namespace HIPP::MPI.

Class Win: the RMA Window

class Win
enum [anonymous] : int
enumerator LOCK_SHARED = MPI_LOCK_SHARED
enumerator LOCK_EXCLUSIVE = MPI_LOCK_EXCLUSIVE

Types of lock operation.

enumerator MODE_NOSTORE = MPI_MODE_NOSTORE
enumerator MODE_NOPUT = MPI_MODE_NOPUT
enumerator MODE_NOPRECEDE = MPI_MODE_NOPRECEDE
enumerator MODE_NOSECCEED = MPI_MODE_NOSUCCEED
enumerator MODE_NOCHECK = MPI_MODE_NOCHECK

Synchronization modes.

enumerator BASE = MPI_WIN_BASE
enumerator SIZE = MPI_WIN_SIZE
enumerator DISP_UNIT = MPI_WIN_DISP_UNIT
enumerator CREATE_FLAVOR = MPI_WIN_CREATE_FLAVOR
enumerator MODEL = MPI_WIN_MODEL

Attributes of the window object.

enumerator UNIFIED = MPI_WIN_UNIFIED
enumerator SEPARATE = MPI_WIN_SEPARATE

Memory models.

enumerator FLAVOR_CREATE = MPI_WIN_FLAVOR_CREATE
enumerator FLAVOR_ALLOCATE = MPI_WIN_FLAVOR_ALLOCATE
enumerator FLAVOR_DYNAMIC = MPI_WIN_FLAVOR_DYNAMIC
enumerator FLAVOR_SHARED = MPI_WIN_FLAVOR_SHARED

Window creation flavors.

Memory management methods:

Method                                      Detail
default constructor                         Not available.
copy constructor and operator=(const &)     Defined; noexcept.
move constructor and operator=(&&)          Defined; noexcept.

ostream &info(ostream &os = cout, int fmt_cntl = 1) const
friend ostream &operator<<(ostream &os, const Win &win)

info() prints basic information about the window instance to os.

Parameters

fmt_cntl – Controls the display format: 0 for a single inline line, 1 for a verbose, multi-line description.

Returns

The argument os is returned.

The overloaded << operator is equivalent to info() with default fmt_cntl.

The returned reference of os allows you to chain the outputs, such as win.info(cout) << " continue printing " << endl.
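The chaining behaviour described above follows from a common C++ pattern, sketched below with a hypothetical MockWin class (not part of HIPP; the member names and output format are assumptions made for illustration). info() returns the stream it was given, and operator<< simply forwards to info() with the default fmt_cntl.

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a window-like class; not part of HIPP.
class MockWin {
public:
    MockWin(std::size_t size, int disp_unit)
        : size_(size), disp_unit_(disp_unit) {}

    // fmt_cntl == 0: one inline line; any other value: verbose, multi-line.
    std::ostream &info(std::ostream &os = std::cout, int fmt_cntl = 1) const {
        if (fmt_cntl == 0) {
            os << "MockWin{size=" << size_
               << ", disp_unit=" << disp_unit_ << "}";
        } else {
            os << "MockWin instance\n"
               << "  size: " << size_ << "\n"
               << "  disp_unit: " << disp_unit_ << "\n";
        }
        return os;  // returning os is what makes chaining possible
    }

    // Forwards to info() with the default fmt_cntl.
    friend std::ostream &operator<<(std::ostream &os, const MockWin &w) {
        return w.info(os);
    }

private:
    std::size_t size_;
    int disp_unit_;
};
```

With this sketch, MockWin(64, 4).info(std::cout, 0) << " continue printing " << std::endl; writes the inline form followed by the extra text on the same line.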

void free() noexcept

Free the current window object, and set it to a null value as returned by nullval(). free() can be called at any time and even multiple times.

bool is_null() const
void *shared_query(int rank, aint_t &size, int &disp_unit) const
bool get_attr(int keyval, void *&attr_val) const
Group get_group() const
void *get_base() const
aint_t get_size() const
int get_disp_unit() const
int get_create_flavor() const
int get_model() const
void set_info(const Info &info)
Info get_info()
void attach(void *base, aint_t size)
void detach(const void *base)
static Win nullval() noexcept

Query information about the instance.

is_null() tests whether this is a null window object (internally, MPI_WIN_NULL).

shared_query() - for a window created with shared memory, query the base pointer (the return value), the size, and the displacement unit of the window segment of the process with the given rank.

get_attr() gets a cached attribute, including the predefined ones. The predefined attributes can also be accessed with get_base(), get_size(), get_disp_unit(), get_create_flavor(), and get_model().

get_group() returns the process group associated with this window.

get_info() and set_info() - get the hints currently in use and set new hints.

attach() and detach() - attach dynamic memory to, and detach it from, the local window. Only valid if the window was created with the dynamic flavor.

nullval() - return a null value (internally MPI_WIN_NULL).

void put(int target_rank, const ConstDatapacket &origin_dpacket, const ConstDatapacket &target_dpacket)
void put(int target_rank, const ConstDatapacket &origin_dpacket, aint_t target_disp)
void get(int target_rank, const Datapacket &origin_dpacket, const ConstDatapacket &target_dpacket)
void get(int target_rank, const Datapacket &origin_dpacket, aint_t target_disp)
void accumulate(int target_rank, const Oppacket &op, const ConstDatapacket &origin_dpacket, const ConstDatapacket &target_dpacket)
void accumulate(int target_rank, const Oppacket &op, const ConstDatapacket &origin_dpacket, aint_t target_disp)
void get_accumulate(int target_rank, const Oppacket &op, const Datapacket &result_dpacket, const ConstDatapacket &origin_dpacket, const ConstDatapacket &target_dpacket)
void get_accumulate(int target_rank, const Oppacket &op, const Datapacket &result_dpacket, const void *origin_addr, aint_t target_disp)
void fetch_and_op(int target_rank, const Oppacket &op, const Datatype &dtype, void *result_addr, const void *origin_addr, aint_t target_disp)
template<typename T>
void fetch_and_op(int target_rank, const Oppacket &op, T &result, const T &origin, aint_t target_disp)
void compare_and_swap(int target_rank, const Datatype &dtype, void *result_addr, const void *compare_addr, const void *origin_addr, aint_t target_disp)
template<typename T>
void compare_and_swap(int target_rank, T &result, const T &compare, const T &origin, aint_t target_disp)
Requests rput(int target_rank, const ConstDatapacket &origin_dpacket, const ConstDatapacket &target_dpacket)
Requests rput(int target_rank, const ConstDatapacket &origin_dpacket, aint_t target_disp)
Requests rget(int target_rank, const Datapacket &origin_dpacket, const ConstDatapacket &target_dpacket)
Requests rget(int target_rank, const Datapacket &origin_dpacket, aint_t target_disp)
Requests raccumulate(int target_rank, const Oppacket &op, const ConstDatapacket &origin_dpacket, const ConstDatapacket &target_dpacket)
Requests raccumulate(int target_rank, const Oppacket &op, const ConstDatapacket &origin_dpacket, aint_t target_disp)
Requests rget_accumulate(int target_rank, const Oppacket &op, const Datapacket &result_dpacket, const ConstDatapacket &origin_dpacket, const ConstDatapacket &target_dpacket)
Requests rget_accumulate(int target_rank, const Oppacket &op, const Datapacket &result_dpacket, const void *origin_addr, aint_t target_disp)

RMA communication calls.

Here the datapacket of the origin buffer or result buffer is specified by a triplet, a std::vector, or a std::string (see the class definition of Datapacket). The triplet has the form ((aint_t)displacement, (int)size, (Datatype)datatype). RMA calls are not guaranteed to block until the transfer completes, so do not release or reuse a buffer until the corresponding synchronization call has been made.

put() writes the data in the origin buffer to the target window.

get() reads data from the target window into the origin buffer.
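To make the role of target_disp concrete, here is a minimal single-process model of put and get. It does not use HIPP or MPI: MockWindow, mock_put, and mock_get are hypothetical names introduced only for this sketch. It relies on the MPI addressing rule that a displacement is scaled by the window's displacement unit, that is, target_addr = window_base + disp_unit * target_disp.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical single-process model of a target window; not HIPP code.
struct MockWindow {
    std::vector<unsigned char> mem;  // the exposed memory region
    std::size_t disp_unit;           // bytes per displacement unit

    // Byte offset addressed by a displacement:
    // target_addr = window_base + disp_unit * target_disp.
    std::size_t offset(std::size_t target_disp) const {
        return disp_unit * target_disp;
    }
};

// Model of put(): copy n elements from the origin buffer into the window.
template <typename T>
void mock_put(MockWindow &win, const T *origin, std::size_t n,
              std::size_t target_disp) {
    std::memcpy(win.mem.data() + win.offset(target_disp), origin,
                n * sizeof(T));
}

// Model of get(): copy n elements from the window into the origin buffer.
template <typename T>
void mock_get(const MockWindow &win, T *origin, std::size_t n,
              std::size_t target_disp) {
    std::memcpy(origin, win.mem.data() + win.offset(target_disp),
                n * sizeof(T));
}
```

For a window of int elements with disp_unit = sizeof(int), mock_put(win, out, 3, 2) writes three values starting at element index 2, and a later mock_get with the same displacement reads them back. The model copies immediately and performs no bounds checking; real RMA calls need a surrounding synchronization epoch.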

accumulate() combines the origin buffer into the target window using the operation op.

get_accumulate() is similar to accumulate() but also fetches the target data as it was before accumulating. The second version of each call accepts only a target_disp, which means the size and datatype are the same as those of origin_dpacket.

fetch_and_op() is a simplified version of get_accumulate() which assumes size = 1. The template version can be applied to predefined numeric types.
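The difference between accumulate and get_accumulate can be modelled in a single process as follows. This is a sketch of the semantics only, with hypothetical helper names (mock_accumulate, mock_get_accumulate); it does not call HIPP or MPI, and the target window is represented by a plain std::vector.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Model of accumulate(): target[i] = op(target[i], origin[i]),
// starting at element index target_disp.
template <typename T, typename Op>
void mock_accumulate(std::vector<T> &target, std::size_t target_disp,
                     const std::vector<T> &origin, Op op) {
    for (std::size_t i = 0; i < origin.size(); ++i)
        target[target_disp + i] = op(target[target_disp + i], origin[i]);
}

// Model of get_accumulate(): the same update, but the target contents
// from before the update are returned (the "result" buffer).
template <typename T, typename Op>
std::vector<T> mock_get_accumulate(std::vector<T> &target,
                                   std::size_t target_disp,
                                   const std::vector<T> &origin, Op op) {
    auto first = target.begin() + static_cast<std::ptrdiff_t>(target_disp);
    auto last = first + static_cast<std::ptrdiff_t>(origin.size());
    std::vector<T> result(first, last);  // snapshot before updating
    mock_accumulate(target, target_disp, origin, op);
    return result;
}
```

In this model, fetch_and_op corresponds to mock_get_accumulate with a one-element origin: for example, adding 1 to a shared counter while receiving its previous value.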

compare_and_swap() is an important synchronization call. It compares the data in compare_addr with the data in the target window and, if they are equal, writes the data in origin_addr to the target window. The data in the target window (before writing) is always returned in result_addr. The template version accepts predefined datatypes.
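The rule just described fits in a few lines. The function below is a single-process model of the semantics, not the HIPP call itself; mock_compare_and_swap and try_acquire are hypothetical names, and the convention of storing a lock as 0 (free) or 1 (held) is an assumption chosen for illustration.

```cpp
// Model of compare_and_swap(): the previous target value is always
// returned; origin is written only if the target equals compare.
template <typename T>
T mock_compare_and_swap(T &target, const T &compare, const T &origin) {
    T result = target;
    if (target == compare) target = origin;
    return result;
}

// Hypothetical helper: one attempt to take a lock stored as
// 0 (free) or 1 (held). Succeeds only if the lock was free.
inline bool try_acquire(int &lock_word) {
    return mock_compare_and_swap(lock_word, 0, 1) == 0;
}
```

A caller that receives false from try_acquire knows another party already holds the lock and can retry later. In real RMA code the comparison and the write happen atomically at the target, which is what makes the call usable for synchronization.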

The non-blocking versions (rput(), rget(), …) return a request object which can be waited on or tested later. The non-blocking versions are only valid in passive target access.

void fence(int assert = 0)
void start(const Group &group, int assert = 0)
void complete()
void post(const Group &group, int assert = 0)
void wait()
bool test()
void lock(int lock_type, int rank, int assert = 0)
void unlock(int rank)
void lock_all(int assert = 0)
void unlock_all()
void flush(int rank)
void flush_all()
void flush_local(int rank)
void flush_local_all()
void sync()
sync_guard_t fence_g(int begin_assert = 0, int end_assert = 0)
sync_guard_t start_g(const Group &group, int assert = 0)
sync_guard_t post_g(const Group &group, int assert = 0)
sync_guard_t lock_g(int lock_type, int rank, int assert = 0)
sync_guard_t lock_all_g(int assert = 0)

RMA synchronization calls.

These calls either complete pending RMA operations or synchronize the public and private copies of the window (in the separate memory model). Consult the MPI standard before using them.
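Why a buffer must stay untouched until synchronization can be illustrated with a toy model in which transfers are deferred until the epoch is closed. This is only one behaviour an implementation is allowed to have (a real MPI library may transfer at any point between the call and the synchronization), and DeferredWindow is a hypothetical class, not HIPP code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model, not HIPP code: puts are recorded and only
// carried out when the epoch is closed by fence().
class DeferredWindow {
public:
    explicit DeferredWindow(std::size_t n) : mem_(n, 0) {}

    // Records the request. Only the origin pointer is stored, so the
    // origin buffer must stay alive and unchanged until fence().
    void put(const int *origin, std::size_t count, std::size_t target_disp) {
        pending_.push_back({origin, count, target_disp});
    }

    // Closes the epoch: all pending puts are performed now.
    void fence() {
        for (const Pending &p : pending_)
            for (std::size_t i = 0; i < p.count; ++i)
                mem_[p.target_disp + i] = p.origin[i];
        pending_.clear();
    }

    // Reads one element of the window memory.
    int at(std::size_t i) const { return mem_[i]; }

private:
    struct Pending {
        const int *origin;
        std::size_t count;
        std::size_t target_disp;
    };
    std::vector<int> mem_;
    std::vector<Pending> pending_;
};
```

In this model, a value written to the origin buffer after put() but before fence() is the value that reaches the window, which is exactly the kind of surprise the synchronization rules are meant to prevent.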

The “guard” versions (with the suffix “_g”) follow the RAII convention. Each returns a guard object, and the corresponding epoch-ending operation (fence, complete, wait, …) is called automatically when the guard object is destroyed. The user may end the epoch in advance by calling release() on the guard object.

Class Win::sync_guard_t

class SyncGuard

SyncGuard - the Win synchronization guard object. Typical usage is:

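auto win = comm.win_create(...);
/* Buffers passed as std::vector datapackets; sizes are illustrative. */
std::vector<int> outbuf(8), inbuf(8);
int target_rank = 0;    /* Rank whose window is accessed (illustrative). */
{
    /* Start a local block. Guard is destructed at the end of the block. */
    auto g = win.fence_g();
    /* Write outbuf at displacement 0 of the target window. */
    win.put(target_rank, outbuf, 0);
    /* Read from a non-overlapping region, starting at displacement 8. */
    win.get(target_rank, inbuf, 8);
}
/* Reuse the outbuf and inbuf now. */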
enum [anonymous] : int
enumerator syncOVER = 0
enumerator syncFENCE = 1
enumerator syncSTART = 2
enumerator syncPOST = 3
enumerator syncLOCK = 4
enumerator syncLOCKALL = 5

Memory management methods:

Method                                      Detail
default constructor                         Not available.
copy constructor and operator=(const &)     Not available.
move constructor and operator=(&&)          Defined.

SyncGuard(Win &win, int sync_type, int rank, int assert)

Construct a guard on the window win, with synchronization type sync_type and with assert as the assertion used when releasing.

void release()

Release the lock/fence held by the current instance.

Calling this method on an instance that holds no lock/fence raises an ErrLogic exception with error code ErrLogic::eDOMAIN.
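The guard behaviour described in this section (end the epoch on destruction, allow an early release(), reject a second release) can be sketched as below. This is not the HIPP implementation: LoggingWindow and FenceGuard are hypothetical, only the fence case is modelled, and std::logic_error stands in for ErrLogic.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical window that only logs epoch boundaries; not HIPP code.
struct LoggingWindow {
    std::vector<std::string> log;
    void fence() { log.push_back("fence"); }
};

// Sketch of a move-only guard: opens a fence epoch on construction and
// closes it on destruction or on an explicit release().
class FenceGuard {
public:
    explicit FenceGuard(LoggingWindow &win) : win_(&win) { win_->fence(); }

    FenceGuard(const FenceGuard &) = delete;
    FenceGuard &operator=(const FenceGuard &) = delete;

    // Moving transfers responsibility for ending the epoch.
    FenceGuard(FenceGuard &&other) noexcept : win_(other.win_) {
        other.win_ = nullptr;
    }

    ~FenceGuard() {
        if (win_) win_->fence();  // end the epoch if still held
    }

    // Ends the epoch early. std::logic_error stands in for the
    // library's ErrLogic in this sketch.
    void release() {
        if (!win_) throw std::logic_error("guard holds no epoch");
        win_->fence();
        win_ = nullptr;
    }

private:
    LoggingWindow *win_;  // nullptr once the epoch has been ended
};
```

Tracking the held state with a nullable pointer lets the destructor, the move constructor, and release() share one invariant: whoever sees a non-null pointer is responsible for ending the epoch exactly once.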