AVX (256-bit) Vectors¶
The following classes are all defined within namespace HIPP::SIMD.
Vec<double, 4>¶
-
template<>
class Vec<double, 4>¶ A vector of four double-precision values (256 bits in total).
Vec<double, 4> can be copied, copy-constructed, moved, and move-constructed. The copy and move operations and the destructor are all noexcept. Vec<double, 4> is binary-compatible with the intrinsic type __m256d, i.e., they have the same length and alignment. -
typedef double
scal_t¶ -
typedef float
scal_hp_t¶ -
typedef __m256d
vec_t¶ -
typedef __m128d
vec_hc_t¶ -
typedef __m128
vec_hp_t¶ -
typedef int64_t
iscal_t¶ -
typedef int32_t
iscal_hp_t¶ -
typedef __m256i
ivec_t¶ -
typedef __m128i
ivec_hp_t¶ -
typedef __mmask8
mask8_t¶ Type aliases.
scal_t is the scalar type (i.e., element type) of the SIMD vector. scal_hp_t is the half-precision scalar type. vec_t is the intrinsic SIMD vector type; vec_hc_t and vec_hp_t represent the half-count and half-precision types, respectively. Scalar and vector types for integers are also defined.
-
enum [anonymous] : size_t¶
-
enumerator
NPACK= 4¶ -
enumerator
NBIT= 256¶ -
enumerator
VECSIZE= sizeof(vec_t)¶ -
enumerator
SCALSIZE= sizeof(scal_t)¶
NPACK is the number of scalars in the vector, and NBIT is the number of bits of the vector register. VECSIZE and SCALSIZE are the sizes in bytes of the vector and the scalar, respectively.
-
Vec() noexcept¶ -
Vec(scal_t e3, scal_t e2, scal_t e1, scal_t e0) noexcept¶ -
explicit
Vec(scal_t a) noexcept¶ -
explicit
Vec(caddr_t mem_addr) noexcept¶ -
Vec(const vec_t &a) noexcept¶ -
Vec(const scal_t *base_addr, ivec_t vindex, const int scale = SCALSIZE) noexcept¶ -
Vec(vec_t src, const scal_t *base_addr, ivec_t vindex, vec_t mask, const int scale = SCALSIZE) noexcept¶ Initializers.
Vec() is the default initializer and gives an uninitialized vector.
Vec(e3, e2, e1, e0) constructs a vector from four given elements, ordered from the highest-address element e3 to the lowest-address element e0.
Vec(scal_t a) constructs a vector of four copies of the scalar value a.
Vec(mem_addr) loads the four elements from the memory address mem_addr, which must be aligned on a 32-byte boundary.
Vec(const vec_t &a) copies the intrinsic vector a.
Vec(base_addr, vindex, scale) loads using gather().
Vec(src, base_addr, vindex, mask, scale) loads using gatherm().
The address type caddr_t can be either const double *, const vec_t * or const Vec<double, 4> *.
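The argument order of the element-wise constructor is easy to get backwards, so the following is a minimal scalar sketch of it. It does not use HIPP or any intrinsic: Pack4, make_pack(), make_pack1() and load_aligned() are hypothetical names introduced only for illustration, and the sketch assumes that index 0 corresponds to the lowest address.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar stand-in for a 4 x double vector; index 0 is the lowest address.
using Pack4 = std::array<double, 4>;

// Hypothetical helper mirroring the argument order of Vec(e3, e2, e1, e0):
// the last argument, e0, lands at the lowest address (index 0).
inline Pack4 make_pack(double e3, double e2, double e1, double e0) {
    return Pack4{e0, e1, e2, e3};
}

// Hypothetical helper mirroring Vec(scal_t a): one scalar repeated 4 times.
inline Pack4 make_pack1(double a) {
    return Pack4{a, a, a, a};
}

// Hypothetical helper mirroring Vec(mem_addr): the aligned form requires a
// 32-byte-aligned address, so the precondition is checked explicitly here.
inline Pack4 load_aligned(const double *mem_addr) {
    assert(reinterpret_cast<std::uintptr_t>(mem_addr) % 32 == 0);
    return Pack4{mem_addr[0], mem_addr[1], mem_addr[2], mem_addr[3]};
}
```

Under these assumptions, make_pack(4.0, 3.0, 2.0, 1.0) and a load from an aligned buffer holding {1.0, 2.0, 3.0, 4.0} describe the same vector.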
-
ostream &
info(ostream &os = cout, int fmt_cntl = 1) const¶ -
friend ostream &
operator<<(ostream &os, const Vec &v)¶ info() displays the content of the vector to os.
- Parameters
fmt_cntl – Controls the display format: 0 for inline printing and 1 for a verbose, multi-line version.
- Returns
The argument os is returned.
The overloaded << operator is equivalent to info() with the default fmt_cntl.
The returned reference to os allows outputs to be chained, as in vec.info(cout) << " continue printing " << std::endl.
-
const vec_t &
val() const noexcept¶ -
vec_t &
val() noexcept¶ -
const scal_t &
operator[](size_t n) const noexcept¶ -
scal_t &
operator[](size_t n) noexcept¶ val() returns the intrinsic vector value. operator[](n) returns a reference to the n-th scalar element of the vector.
-
Vec &
load(caddr_t mem_addr) noexcept¶ -
Vec &
loadu(caddr_t mem_addr) noexcept¶ -
Vec &
loadm(caddr_t mem_addr, ivec_t mask) noexcept¶ -
Vec &
load1(const scal_t *mem_addr) noexcept¶ -
Vec &
bcast(const scal_t *mem_addr) noexcept¶ -
Vec &
bcast(const vec_hc_t *mem_addr) noexcept¶ -
Vec &
gather(const scal_t *base_addr, ivec_t vindex, const int scale = SCALSIZE) noexcept¶ -
Vec &
gatherm(vec_t src, const scal_t *base_addr, ivec_t vindex, vec_t mask, const int scale = SCALSIZE) noexcept¶ -
Vec &
gather_idxhp(const scal_t *base_addr, ivec_hp_t vindex, const int scale = SCALSIZE) noexcept¶ -
Vec &
gatherm_idxhp(vec_t src, const scal_t *base_addr, ivec_hp_t vindex, vec_t mask, const int scale = SCALSIZE) noexcept¶ Load operations: load data from memory. The address type caddr_t can be either const double *, const vec_t * or const Vec<double, 4> *.
load() loads a pack of 4 double-precision floating-point scalar values into the calling instance from the aligned address mem_addr.
loadu() allows mem_addr to be unaligned.
loadm() uses mask (an element is zeroed out when the highest bit of the corresponding mask element is not set).
load1() loads a single scalar value and repeats it four times to make a vector.
bcast(const scal_t *) is the same as load1().
bcast(const vec_hc_t *) loads two scalar values and repeats them twice to make a vector.
gather() loads 4 scalar values from addresses starting at base_addr, each offset by the corresponding 64-bit element in vindex (in bytes, and scaled by scale; scale can be 1, 2, 4, or 8).
gatherm() is the same as gather() but uses mask (an element is copied from src when the highest bit of the corresponding mask element is not set).
gather_idxhp() is like gather() but uses 32-bit offsets.
gatherm_idxhp() is like gatherm() but uses 32-bit offsets.
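The addressing rule of gather() and the masking rule of gatherm() can be spelled out with a scalar model. This is only a sketch of the semantics described above, not the library's implementation: gather_model(), gatherm_model(), read_at() and high_bit() are hypothetical names, and the default scale of 8 stands in for SCALSIZE of a double.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Pack4 = std::array<double, 4>;
using Index4 = std::array<std::int64_t, 4>;

// Reads one double located byte_offset bytes past base.
inline double read_at(const double *base, std::int64_t byte_offset) {
    double out;
    std::memcpy(&out, reinterpret_cast<const char *>(base) + byte_offset, sizeof out);
    return out;
}

// Scalar model of gather(): lane i reads from base + vindex[i] * scale bytes.
inline Pack4 gather_model(const double *base, const Index4 &vindex, int scale = 8) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    Pack4 r{};
    for (std::size_t i = 0; i < 4; ++i) r[i] = read_at(base, vindex[i] * scale);
    return r;
}

// True when the most significant (sign) bit of the 64-bit pattern is set.
inline bool high_bit(double m) {
    std::uint64_t bits;
    std::memcpy(&bits, &m, sizeof bits);
    return (bits >> 63) != 0;
}

// Scalar model of gatherm(): lanes whose mask high bit is clear keep src[i].
inline Pack4 gatherm_model(const Pack4 &src, const double *base, const Index4 &vindex,
                           const Pack4 &mask, int scale = 8) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    Pack4 r = src;
    for (std::size_t i = 0; i < 4; ++i) {
        if (high_bit(mask[i])) r[i] = read_at(base, vindex[i] * scale);
    }
    return r;
}
```

With scale equal to the element size, vindex behaves like an array of element indices; with scale equal to 1, it holds raw byte offsets.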
-
const Vec &
store(addr_t mem_addr) const noexcept¶ -
const Vec &
storeu(addr_t mem_addr) const noexcept¶ -
const Vec &
storem(addr_t mem_addr, ivec_t mask) const noexcept¶ -
const Vec &
stream(addr_t mem_addr) const noexcept¶ -
const Vec &
scatter(void *base_addr, ivec_t vindex, int scale = SCALSIZE) const noexcept¶ -
const Vec &
scatterm(void *base_addr, mask8_t k, ivec_t vindex, int scale = SCALSIZE) const noexcept¶ -
const Vec &
scatter_idxhp(void *base_addr, ivec_hp_t vindex, int scale = SCALSIZE) const noexcept¶ -
const Vec &
scatterm_idxhp(void *base_addr, mask8_t k, ivec_hp_t vindex, int scale = SCALSIZE) const noexcept¶ -
Vec &
store(addr_t mem_addr) noexcept¶ -
Vec &
storeu(addr_t mem_addr) noexcept¶ -
Vec &
storem(addr_t mem_addr, ivec_t mask) noexcept¶ -
Vec &
stream(addr_t mem_addr) noexcept¶ -
Vec &
scatter(void *base_addr, ivec_t vindex, int scale = SCALSIZE) noexcept¶ -
Vec &
scatterm(void *base_addr, mask8_t k, ivec_t vindex, int scale = SCALSIZE) noexcept¶ -
Vec &
scatter_idxhp(void *base_addr, ivec_hp_t vindex, int scale = SCALSIZE) noexcept¶ -
Vec &
scatterm_idxhp(void *base_addr, mask8_t k, ivec_hp_t vindex, int scale = SCALSIZE) noexcept¶ Store operations: store elements from the current instance to a memory location. The address type addr_t can be either double *, vec_t * or Vec<double, 4> *.
Each store operation has a non-const version used for a non-constant instance. All the store operations return a reference to the instance itself.
store() stores 4 double-precision floating-point scalar values into the aligned address mem_addr.
storeu() does not need the address to be aligned.
storem() uses mask (an element is not stored when the highest bit of the corresponding mask element is not set).
stream() uses a non-temporal memory hint. mem_addr must be aligned.
scatter() stores elements into addresses starting at base_addr, each offset by the corresponding 64-bit element in vindex (in bytes, and scaled by scale; scale can be 1, 2, 4, or 8).
scatterm() is the same as scatter() but uses a mask (an element is not stored when the corresponding mask bit is not set).
scatter_idxhp() is the same as scatter() but uses 32-bit offsets.
scatterm_idxhp() is the same as scatterm() but uses 32-bit offsets.
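As a sketch of the two less obvious store variants, the scalar model below mimics the masking rule of storem() and the addressing rule of scatter(). The names storem_model() and scatter_model() are hypothetical, the code does not call HIPP, and the default scale of 8 is assumed to match SCALSIZE for double.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Pack4 = std::array<double, 4>;
using Index4 = std::array<std::int64_t, 4>;

// Scalar model of storem(): lane i is written only when the highest (sign)
// bit of the corresponding 64-bit mask element is set, i.e. it is negative.
inline void storem_model(double *mem_addr, const Index4 &mask, const Pack4 &v) {
    for (std::size_t i = 0; i < 4; ++i) {
        if (mask[i] < 0) mem_addr[i] = v[i];
    }
}

// Scalar model of scatter(): lane i is written to
// base_addr + vindex[i] * scale bytes.
inline void scatter_model(void *base_addr, const Index4 &vindex, const Pack4 &v,
                          int scale = 8) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    char *base = static_cast<char *>(base_addr);
    for (std::size_t i = 0; i < 4; ++i) {
        std::memcpy(base + vindex[i] * scale, &v[i], sizeof(double));
    }
}
```

Memory locations that are masked out, or that no index refers to, keep their previous contents.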
-
scal_t
to_scal() const noexcept¶ -
int
movemask() const noexcept¶ -
Vec
movedup() const noexcept¶ to_scal() returns the lowest double-precision floating-point scalar value. movemask() sets each bit of the returned value from the most significant bit of the corresponding double-precision floating-point scalar value. movedup() duplicates the even-indexed scalar values.
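A scalar model may make these three operations concrete. It is a sketch under the assumption that the lowest element is the one at index 0; the names to_scal_model(), movemask_model(), movedup_model() and check_move_models() are hypothetical and nothing here calls the library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Pack4 = std::array<double, 4>;

// Model of to_scal(): the element at the lowest index.
inline double to_scal_model(const Pack4 &v) { return v[0]; }

// Model of movemask(): bit i of the result is the sign bit of element i.
inline int movemask_model(const Pack4 &v) {
    int r = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        std::uint64_t bits;
        std::memcpy(&bits, &v[i], sizeof bits);
        r |= static_cast<int>(bits >> 63) << i;
    }
    return r;
}

// Model of movedup(): each even-indexed element is copied to its odd neighbor.
inline Pack4 movedup_model(const Pack4 &v) {
    return Pack4{v[0], v[0], v[2], v[2]};
}

// Small usage check; the expected values follow from the models above.
inline void check_move_models() {
    const Pack4 v{-1.0, 2.0, -0.0, 3.0};
    assert(to_scal_model(v) == -1.0);
    assert(movemask_model(v) == 5);  // elements 0 and 2 have the sign bit set
    assert((movedup_model(v) == Pack4{-1.0, -1.0, -0.0, -0.0}));
}
```

Note that -0.0 contributes a set bit to the mask even though it compares equal to 0.0, because only the sign bit is inspected.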
-
Vec &
set(scal_t e3, scal_t e2, scal_t e1, scal_t e0) noexcept¶ -
Vec &
set1(scal_t a) noexcept¶ -
Vec &
set1(vec_hc_t a) noexcept¶ -
Vec &
set() noexcept¶ -
Vec &
setzero() noexcept¶ -
Vec &
undefined() noexcept¶ Set the scalar values of the calling instance.
set(e3, e2, e1, e0) sets the elements from the highest-address element e3 to the lowest-address element e0. set1(scal_t a) repeats a scalar value 4 times. set1(vec_hc_t a) repeats the lower scalar value of a 4 times. set() is the same as setzero(). setzero() sets all bits to zero. undefined() sets the scalars to undefined values.
-
Vec
operator+(const Vec &a) const noexcept¶ -
Vec
operator-(const Vec &a) const noexcept¶ -
Vec
operator*(const Vec &a) const noexcept¶ -
Vec
operator/(const Vec &a) const noexcept¶ -
Vec
operator++(int) noexcept¶ -
Vec &
operator++() noexcept¶ -
Vec
operator--(int) noexcept¶ -
Vec &
operator--() noexcept¶ -
Vec &
operator+=(const Vec &a) noexcept¶ -
Vec &
operator-=(const Vec &a) noexcept¶ -
Vec &
operator*=(const Vec &a) noexcept¶ -
Vec &
operator/=(const Vec &a) noexcept¶ -
Vec
hadd(const Vec &a) const noexcept¶ -
Vec
hsub(const Vec &a) const noexcept¶ Arithmetic operations. All of the above operations are element-wise, except hadd() and hsub().
hadd() performs horizontal addition, i.e., the result of a.hadd(b) is { a[0]+a[1], b[0]+b[1], a[2]+a[3], b[2]+b[3] }. hsub() performs horizontal subtraction, i.e., the result of a.hsub(b) is { a[0]-a[1], b[0]-b[1], a[2]-a[3], b[2]-b[3] }.
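The interleaved result order of the horizontal operations can be checked with a scalar model that transcribes the two formulas above. The names hadd_model(), hsub_model() and sum_all() are hypothetical; sum_all() is one possible use of the pattern, not something the library is stated to provide.

```cpp
#include <array>
#include <cassert>

using Pack4 = std::array<double, 4>;

// Model of a.hadd(b): pairwise sums, interleaving results from a and b.
inline Pack4 hadd_model(const Pack4 &a, const Pack4 &b) {
    return Pack4{a[0] + a[1], b[0] + b[1], a[2] + a[3], b[2] + b[3]};
}

// Model of a.hsub(b): pairwise differences in the same interleaved order.
inline Pack4 hsub_model(const Pack4 &a, const Pack4 &b) {
    return Pack4{a[0] - a[1], b[0] - b[1], a[2] - a[3], b[2] - b[3]};
}

// Sum of all four elements built on the horizontal add: hadd(a, a) yields
// {a0+a1, a0+a1, a2+a3, a2+a3}, so elements 0 and 2 hold the two partial sums.
inline double sum_all(const Pack4 &a) {
    const Pack4 h = hadd_model(a, a);
    assert(h[0] == h[1] && h[2] == h[3]);
    return h[0] + h[2];
}
```

For a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the models give {3, 30, 7, 70} for hadd and {-1, -10, -1, -10} for hsub.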
-
Vec
operator&(const Vec &a) const noexcept¶ -
Vec
andnot(const Vec &a) const noexcept¶ -
Vec
operator|(const Vec &a) const noexcept¶ -
Vec
operator~() const noexcept¶ -
Vec
operator^(const Vec &a) const noexcept¶ -
Vec &
operator&=(const Vec &a) noexcept¶ -
Vec &
operator|=(const Vec &a) noexcept¶ -
Vec &
operator^=(const Vec &a) noexcept¶ Bitwise Logic operations.
-
Vec
operator==(const Vec &a) const noexcept¶ -
Vec
operator!=(const Vec &a) const noexcept¶ -
Vec
operator<(const Vec &a) const noexcept¶ -
Vec
operator<=(const Vec &a) const noexcept¶ -
Vec
operator>(const Vec &a) const noexcept¶ -
Vec
operator>=(const Vec &a) const noexcept¶ Relational (comparison) operations. The comparison is element-wise for each scalar. If true, all the bits are set in the corresponding result element; otherwise all of its bits are cleared.
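Because a comparison yields a bit mask rather than a boolean, its result is usually combined with a bitwise operation. The scalar sketch below models that idea with 64-bit integers standing in for the result elements; less_model() and and_model() are hypothetical names, not HIPP functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Pack4 = std::array<double, 4>;
using Mask4 = std::array<std::uint64_t, 4>;

// Model of a < b: all 64 bits set where the comparison holds, else all clear.
inline Mask4 less_model(const Pack4 &a, const Pack4 &b) {
    Mask4 m{};
    for (std::size_t i = 0; i < 4; ++i) {
        m[i] = (a[i] < b[i]) ? ~std::uint64_t{0} : std::uint64_t{0};
    }
    return m;
}

// Model of a bitwise AND between a mask and a vector: an element survives
// where its mask is all ones and becomes +0.0 where the mask is all zeros.
inline Pack4 and_model(const Mask4 &m, const Pack4 &v) {
    Pack4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        std::uint64_t bits;
        std::memcpy(&bits, &v[i], sizeof bits);
        bits &= m[i];
        std::memcpy(&r[i], &bits, sizeof bits);
    }
    return r;
}

// Usage check: keep only the elements of a that are below the threshold.
inline void check_compare_model() {
    const Pack4 a{1.0, 5.0, 3.0, 7.0};
    const Pack4 b{4.0, 4.0, 4.0, 4.0};
    assert((and_model(less_model(a, b), a) == Pack4{1.0, 0.0, 3.0, 0.0}));
}
```

An all-ones bit pattern read as a double is a NaN, which is why the mask is meant to be consumed by bitwise or blend operations and not by arithmetic.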
-
Vec
blend(const Vec &a, const int imm8) const noexcept¶ -
Vec
blend(const Vec &a, const Vec &mask) const noexcept¶ Blend two vectors using the control mask imm8. For each bit in imm8, if set, the corresponding result element is taken from b, otherwise from a.
The second version uses a vector mask, i.e., each mask bit is taken from the highest bit of the corresponding 64-bit element.
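The selection rule of both blend variants can be sketched with free functions that take the two operands explicitly, using the names a and b from the description. How a and b map onto the calling instance and the method argument is not modeled here; blend_imm_model() and blend_mask_model() are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Pack4 = std::array<double, 4>;

// Model of the imm8 variant: bit i set selects b[i], bit i clear selects a[i].
// Only the low 4 bits are meaningful for 4 elements, so this model rejects
// other values.
inline Pack4 blend_imm_model(const Pack4 &a, const Pack4 &b, int imm8) {
    assert(imm8 >= 0 && imm8 < 16);
    Pack4 r{};
    for (std::size_t i = 0; i < 4; ++i) r[i] = ((imm8 >> i) & 1) ? b[i] : a[i];
    return r;
}

// Model of the vector-mask variant: the selecting bit is the highest bit of
// the corresponding 64-bit mask element.
inline Pack4 blend_mask_model(const Pack4 &a, const Pack4 &b, const Pack4 &mask) {
    Pack4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        std::uint64_t bits;
        std::memcpy(&bits, &mask[i], sizeof bits);
        r[i] = (bits >> 63) ? b[i] : a[i];
    }
    return r;
}
```

A comparison result, whose true elements have all bits set, can be used directly as the vector mask because its highest bit is then set.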
-
Vec
sqrt() const noexcept¶ -
Vec
ceil() const noexcept¶ -
Vec
floor() const noexcept¶ -
Vec
round(const int rounding) const noexcept¶ -
Vec
max(const Vec &a) const noexcept¶ -
Vec
min(const Vec &a) const noexcept¶ -
Vec
sin() const noexcept¶ -
Vec
cos() const noexcept¶ -
Vec
log() const noexcept¶ -
Vec
exp() const noexcept¶ -
Vec
pow(const Vec &a) const noexcept¶ Elementary math functions.
sin(), cos(), log(), exp(), pow() may not be serialized, depending on the compiler.
-
Vec<float, 8>¶
-
template<>
class Vec<float, 8>¶ A vector of eight single-precision values (256 bits in total).
Vec<float, 8> can be copied, copy-constructed, moved, and move-constructed. The copy and move operations and the destructor are all noexcept. Vec<float, 8> is binary-compatible with the intrinsic type __m256, i.e., they have the same length and alignment. -
typedef float
scal_t¶ -
typedef __m256
vec_t¶ -
typedef __m128
vec_hc_t¶ -
typedef int32_t
iscal_t¶ -
typedef __m256i
ivec_t¶ -
typedef __mmask8
mask8_t¶ Type aliases.
scal_t is the scalar type (i.e., element type) of the SIMD vector. vec_t is the intrinsic SIMD vector type, and vec_hc_t represents the half-count type.
-
enum [anonymous] : size_t¶
-
enumerator
NPACK= 8¶ -
enumerator
NBIT= 256¶ -
enumerator
VECSIZE= sizeof(vec_t)¶ -
enumerator
SCALSIZE= sizeof(scal_t)¶
NPACK is the number of scalars in the vector, and NBIT is the number of bits of the vector register. VECSIZE and SCALSIZE are the sizes in bytes of the vector and the scalar, respectively.
-
Vec() noexcept¶ -
Vec(scal_t e7, scal_t e6, scal_t e5, scal_t e4, scal_t e3, scal_t e2, scal_t e1, scal_t e0) noexcept¶ -
explicit
Vec(scal_t a) noexcept¶ -
explicit
Vec(caddr_t mem_addr) noexcept¶ -
Vec(const vec_t &a) noexcept¶ -
Vec(const scal_t *base_addr, ivec_t vindex, const int scale = SCALSIZE) noexcept¶ -
Vec(vec_t src, const scal_t *base_addr, ivec_t vindex, vec_t mask, const int scale) noexcept¶ Initializers.
Vec() is the default initializer and gives an uninitialized vector.
Vec(e7, e6, ..., e0) constructs a vector from eight given elements, ordered from the highest-address element e7 to the lowest-address element e0.
Vec(scal_t a) constructs a vector of eight copies of the scalar value a.
Vec(mem_addr) loads the eight elements from the memory address mem_addr, which must be aligned on a 32-byte boundary.
Vec(const vec_t &a) copies the intrinsic vector a.
Vec(base_addr, vindex, scale) loads using gather().
Vec(src, base_addr, vindex, mask, scale) loads using gatherm().
The address type caddr_t can be either const float *, const vec_t * or const Vec<float, 8> *.
-
ostream &
info(ostream &os = cout, int fmt_cntl = 1) const¶ -
friend ostream &
operator<<(ostream &os, const Vec &v)¶ info() displays the content of the vector to os.
- Parameters
fmt_cntl – Controls the display format: 0 for inline printing and 1 for a verbose, multi-line version.
- Returns
The argument os is returned.
The overloaded << operator is equivalent to info() with the default fmt_cntl.
The returned reference to os allows outputs to be chained, as in vec.info(cout) << " continue printing " << std::endl.
-
const vec_t &
val() const noexcept¶ -
vec_t &
val() noexcept¶ -
const scal_t &
operator[](size_t n) const noexcept¶ -
scal_t &
operator[](size_t n) noexcept¶ val() returns the intrinsic vector value. operator[](n) returns a reference to the n-th scalar element of the vector.
-
Vec &
load(caddr_t mem_addr) noexcept¶ -
Vec &
loadu(caddr_t mem_addr) noexcept¶ -
Vec &
loadm(caddr_t mem_addr, ivec_t mask) noexcept¶ -
Vec &
load1(const scal_t *mem_addr) noexcept¶ -
Vec &
bcast(const scal_t *mem_addr) noexcept¶ -
Vec &
bcast(const vec_hc_t *mem_addr) noexcept¶ -
Vec &
gather(const scal_t *base_addr, ivec_t vindex, const int scale = SCALSIZE) noexcept¶ -
Vec &
gatherm(vec_t src, const scal_t *base_addr, ivec_t vindex, vec_t mask, const int scale = SCALSIZE) noexcept¶ Load operations: load data from memory. The address type caddr_t can be either const float *, const vec_t * or const Vec<float, 8> *.
load() loads a pack of 8 single-precision floating-point scalar values into the calling instance from the aligned address mem_addr.
loadu() allows mem_addr to be unaligned.
loadm() uses mask (an element is zeroed out when the highest bit of the corresponding mask element is not set).
load1() loads a single scalar value and repeats it eight times to make a vector.
bcast(const scal_t *) is the same as load1().
bcast(const vec_hc_t *) loads four scalar values and repeats them twice to make a vector.
gather() loads 8 scalar values from addresses starting at base_addr, each offset by the corresponding 32-bit element in vindex (in bytes, and scaled by scale; scale can be 1, 2, 4, or 8).
gatherm() is the same as gather() but uses mask (an element is copied from src when the highest bit of the corresponding mask element is not set).
-
const Vec &
store(addr_t mem_addr) const noexcept¶ -
const Vec &
storeu(addr_t mem_addr) const noexcept¶ -
const Vec &
storem(addr_t mem_addr, ivec_t mask) const noexcept¶ -
const Vec &
stream(addr_t mem_addr) const noexcept¶ -
const Vec &
scatter(void *base_addr, ivec_t vindex, int scale = SCALSIZE) const noexcept¶ -
const Vec &
scatterm(void *base_addr, mask8_t k, ivec_t vindex, int scale = SCALSIZE) const noexcept¶ -
Vec &
store(addr_t mem_addr) noexcept¶ -
Vec &
storeu(addr_t mem_addr) noexcept¶ -
Vec &
storem(addr_t mem_addr, ivec_t mask) noexcept¶ -
Vec &
stream(addr_t mem_addr) noexcept¶ -
Vec &
scatter(void *base_addr, ivec_t vindex, int scale = SCALSIZE) noexcept¶ -
Vec &
scatterm(void *base_addr, mask8_t k, ivec_t vindex, int scale = SCALSIZE) noexcept¶ Store operations: store elements from the current instance to a memory location. The address type addr_t can be either float *, vec_t * or Vec<float, 8> *.
Each store operation has a non-const version used for a non-constant instance. All the store operations return a reference to the instance itself.
store() stores 8 single-precision floating-point scalar values into the aligned address mem_addr.
storeu() does not need the address to be aligned.
storem() uses mask (an element is not stored when the highest bit of the corresponding mask element is not set).
stream() uses a non-temporal memory hint. mem_addr must be aligned.
scatter() stores elements into addresses starting at base_addr, each offset by the corresponding 32-bit element in vindex (in bytes, and scaled by scale; scale can be 1, 2, 4, or 8).
scatterm() is the same as scatter() but uses a mask (an element is not stored when the corresponding mask bit is not set).
-
scal_t
to_scal() const noexcept¶ -
int
movemask() const noexcept¶ -
Vec
movehdup() const noexcept¶ -
Vec
moveldup() const noexcept¶ to_scal() returns the lowest single-precision floating-point scalar value. movemask() sets each bit of the returned value from the most significant bit of the corresponding single-precision floating-point scalar value. movehdup() duplicates the odd-indexed scalar values. moveldup() duplicates the even-indexed scalar values.
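The two duplication operations differ only in which element of each adjacent pair is kept. The scalar sketch below illustrates this for eight floats; movehdup_model(), moveldup_model() and check_dup_models() are hypothetical names and the code does not use the library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Pack8 = std::array<float, 8>;

// Model of movehdup(): both elements of each pair take the odd-indexed value.
inline Pack8 movehdup_model(const Pack8 &v) {
    Pack8 r{};
    for (std::size_t i = 0; i < 8; ++i) r[i] = v[i | 1];
    return r;
}

// Model of moveldup(): both elements of each pair take the even-indexed value.
inline Pack8 moveldup_model(const Pack8 &v) {
    Pack8 r{};
    for (std::size_t i = 0; i < 8; ++i) r[i] = v[i & ~std::size_t{1}];
    return r;
}

// Usage check with the elements numbered by their own index.
inline void check_dup_models() {
    const Pack8 v{0.f, 1.f, 2.f, 3.f, 4.f, 5.f, 6.f, 7.f};
    assert((movehdup_model(v) == Pack8{1.f, 1.f, 3.f, 3.f, 5.f, 5.f, 7.f, 7.f}));
    assert((moveldup_model(v) == Pack8{0.f, 0.f, 2.f, 2.f, 4.f, 4.f, 6.f, 6.f}));
}
```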
-
Vec &
set(scal_t e7, scal_t e6, scal_t e5, scal_t e4, scal_t e3, scal_t e2, scal_t e1, scal_t e0) noexcept¶ -
Vec &
set1(scal_t a) noexcept¶ -
Vec &
set1(vec_hc_t a) noexcept¶ -
Vec &
set() noexcept¶ -
Vec &
setzero() noexcept¶ -
Vec &
undefined() noexcept¶ Set the scalar values of the calling instance.
set(e7, e6, ..., e0) sets the elements from the highest-address element e7 to the lowest-address element e0. set1(scal_t a) repeats a scalar value 8 times. set1(vec_hc_t a) repeats the lower scalar value of a 8 times. set() is the same as setzero(). setzero() sets all bits to zero. undefined() sets the scalars to undefined values.
-
Vec
operator+(const Vec &a) const noexcept¶ -
Vec
operator-(const Vec &a) const noexcept¶ -
Vec
operator*(const Vec &a) const noexcept¶ -
Vec
operator/(const Vec &a) const noexcept¶ -
Vec
operator++(int) noexcept¶ -
Vec &
operator++() noexcept¶ -
Vec
operator--(int) noexcept¶ -
Vec &
operator--() noexcept¶ -
Vec &
operator+=(const Vec &a) noexcept¶ -
Vec &
operator-=(const Vec &a) noexcept¶ -
Vec &
operator*=(const Vec &a) noexcept¶ -
Vec &
operator/=(const Vec &a) noexcept¶ -
Vec
hadd(const Vec &a) const noexcept¶ -
Vec
hsub(const Vec &a) const noexcept¶ Arithmetic operations. All of the above operations are element-wise, except hadd() and hsub().
hadd() performs horizontal addition, i.e., the result of a.hadd(b) is { a[0]+a[1], a[2]+a[3], b[0]+b[1], b[2]+b[3], …, b[4]+b[5], b[6]+b[7] }. hsub() performs horizontal subtraction, i.e., the result of a.hsub(b) is { a[0]-a[1], a[2]-a[3], b[0]-b[1], b[2]-b[3], …, b[4]-b[5], b[6]-b[7] }.
-
Vec
operator&(const Vec &a) const noexcept¶ -
Vec
andnot(const Vec &a) const noexcept¶ -
Vec
operator|(const Vec &a) const noexcept¶ -
Vec
operator~() const noexcept¶ -
Vec
operator^(const Vec &a) const noexcept¶ -
Vec &
operator&=(const Vec &a) noexcept¶ -
Vec &
operator|=(const Vec &a) noexcept¶ -
Vec &
operator^=(const Vec &a) noexcept¶ Bitwise Logic operations.
-
Vec
operator==(const Vec &a) const noexcept¶ -
Vec
operator!=(const Vec &a) const noexcept¶ -
Vec
operator<(const Vec &a) const noexcept¶ -
Vec
operator<=(const Vec &a) const noexcept¶ -
Vec
operator>(const Vec &a) const noexcept¶ -
Vec
operator>=(const Vec &a) const noexcept¶ Relational (comparison) operations. The comparison is element-wise for each scalar. If true, all the bits are set in the corresponding result element; otherwise all of its bits are cleared.
-
Vec
blend(const Vec &a, const int imm8) const noexcept¶ -
Vec
blend(const Vec &a, const Vec &mask) const noexcept¶ Blend two vectors using the control mask imm8. For each bit in imm8, if set, the corresponding result element is taken from b, otherwise from a.
The second version uses a vector mask, i.e., each mask bit is taken from the highest bit of the corresponding 32-bit element.
-
Vec
rcp() const noexcept¶ -
Vec
sqrt() const noexcept¶ -
Vec
rsqrt() const noexcept¶ -
Vec
ceil() const noexcept¶ -
Vec
floor() const noexcept¶ -
Vec
round(const int rounding) const noexcept¶ -
Vec
max(const Vec &a) const noexcept¶ -
Vec
min(const Vec &a) const noexcept¶ Elementary math functions.
-
Vec
log2_fast() const noexcept¶ -
Vec
log_fast() const noexcept¶ -
Vec
log10_fast() const noexcept¶ -
Vec
log2_faster() const noexcept¶ -
Vec
log_faster() const noexcept¶ -
Vec
log10_faster() const noexcept¶ -
Vec
pow2_fast() const noexcept¶ -
Vec
exp_fast() const noexcept¶ -
Vec
pow10_fast() const noexcept¶ -
Vec
pow2_faster() const noexcept¶ -
Vec
exp_faster() const noexcept¶ -
Vec
pow10_faster() const noexcept¶ Vectorized math functions. These functions are not supported by hardware; they are implemented by approximation algorithms.
xxx_faster() is faster than xxx_fast() but has lower precision.
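To give a feel for the speed and precision trade-off, here is a well-known bit-level approximation of the base-2 logarithm for a single float. It is an illustration only and is not claimed to be the algorithm behind log2_fast() or log2_faster(); log2_rough() is a hypothetical name, and the correction constant is a commonly used empirical value.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Rough log2 for positive, normal floats. NOT the library's algorithm.
// The IEEE-754 bit pattern of x, read as an integer and divided by 2^23,
// equals (biased exponent) + (mantissa fraction m), and log2(1 + m) is
// approximated by m plus a small constant.
inline float log2_rough(float x) {
    assert(x > 0.0f && std::isnormal(x));
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    const float scaled = static_cast<float>(bits) * (1.0f / 8388608.0f);  // bits / 2^23
    // Remove the exponent bias of 127, less an empirical correction term.
    return scaled - 126.94269504f;
}
```

The function uses one integer-to-float conversion, one multiplication and one subtraction, and its absolute error stays within a few hundredths over the normal range. A more precise variant would typically add a polynomial or rational correction in m at the cost of extra arithmetic.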
-