c++ - Align double vs align float for AVX operations


I want to multiply two (float/double) vectors with AVX operations. To do that, I need aligned memory. My function for float values is:

    #define size 65536

    float *g, *h, *j;
    g = (float*)aligned_alloc(32, sizeof(float)*size);
    h = (float*)aligned_alloc(32, sizeof(float)*size);
    j = (float*)aligned_alloc(32, sizeof(float)*size);

    // filling g, h with data

    for (int i = 0; i < size/8; i++)
    {
        __m256 a_a, b_a, c_a;
        a_a = _mm256_load_ps(g+8*i);
        b_a = _mm256_load_ps(h+8*i);
        c_a = _mm256_mul_ps(a_a, b_a);
        _mm256_store_ps(j+i*8, c_a);
    }

    free(g);
    free(h);
    free(j);

That works, but when I try the same with double values, I get a memory access error (as if the memory were not aligned correctly):

    double *g_d, *h_d, *i_d;
    g_d = (double*)aligned_alloc(32, sizeof(double)*size);
    h_d = (double*)aligned_alloc(32, sizeof(double)*size);
    i_d = (double*)aligned_alloc(32, sizeof(double)*size);

    for (int i = 0; i < size/4; i++)
    {
        __m256d a_a, b_a, c_a;
        a_a = _mm256_load_pd(g_d+4*i);
        b_a = _mm256_load_pd(h_d+4*i);
        c_a = _mm256_mul_pd(a_a, b_a);
        _mm256_store_pd(i_d+i*4, c_a);
    }

    free(g_d);
    free(h_d);
    free(i_d);

Why is the alignment not working for the double values?

When running it in gdb, I get:

    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000401669 in _mm256_load_pd (__p=0x619f70) at /usr/lib/gcc/x86_64-linux-gnu/5/include/avxintrin.h:836
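The address that gdb reports for `__p` can be checked by hand against the 32-byte requirement of the aligned 256-bit loads. The following is only a minimal sketch; `misalignment` and `ok_for_aligned_ymm_load` are hypothetical helper names that do not appear in the original code.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original post): how many bytes an
// address lies past the previous `alignment`-byte boundary. 0 means aligned.
constexpr std::uintptr_t misalignment(std::uintptr_t address,
                                      std::uintptr_t alignment) {
    return address % alignment;
}

// True if the address sits on a 32-byte boundary, which is what an aligned
// 256-bit load such as _mm256_load_pd expects.
constexpr bool ok_for_aligned_ymm_load(std::uintptr_t address) {
    return misalignment(address, 32) == 0;
}
```

For the address in the gdb output, `misalignment(0x619f70, 32)` is 16, so that pointer is 16-byte aligned but not 32-byte aligned. A buffer obtained from `aligned_alloc(32, ...)` and advanced in steps of 4 doubles would not yield such an address, which suggests the pointer came from somewhere other than the snippet shown and fits the copy/paste error mentioned in the edit.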

Edit: I found the mistake. It was a copy/paste error from the former function that manifested in this function. Since this is not helpful to others (as I assume), I will close the question.

Well, the problem seems to stem from the different data sizes.

  • In the first snippet you run the float loop up to size/8 = 8192. Here I'm unsure why you increase the float array element size from 4 to 8. i < 8192
  • In the second snippet you run the double loop up to size/4 = 16384. Here I'm unsure why you change the double array element size from 8 to 4. i < 16384 --- **the opposite!**

The last element of the double array may surpass the memory boundaries!
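Whether a loop of this shape runs past the end of its buffer can be checked with a little index arithmetic. This is a rough sketch; `last_index_touched` is a hypothetical helper, and it assumes `count` is at least `lanes`.

```cpp
#include <cstddef>

// Hypothetical helper (not from the original post): highest element index
// accessed by a loop that runs i = 0 .. count/lanes - 1 and touches the
// elements [lanes*i, lanes*i + lanes) in each iteration.
// Assumes count >= lanes.
constexpr std::size_t last_index_touched(std::size_t count, std::size_t lanes) {
    return (count / lanes - 1) * lanes + (lanes - 1);
}
```

With the sizes from the question, `last_index_touched(65536, 8)` and `last_index_touched(65536, 4)` both give 65535, the last valid index of an array with 65536 elements. Under that arithmetic both loops as posted stay inside their allocations, so an out-of-bounds access would have to come from elsewhere.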

In both cases the loop is incremented with i++. The cases proceed as follows:

First: (float (4)) j+i*8 (0 <= i < 8192) =>

    0      4      8      12     16     20     24     28
    v1     .      v2     .      v3     .      v4     .

Second: (double (8)) j+i*4 (0 <= i < 16384) => v1/v2/v3/v4

    0      4      8      12     16     20     24     28     32
    v1(h)  v1(l)  v2(l)  v3(l)  v4(l)  v5(l)  v6(l)  v7(l)
    v1(h)  v2(h)  v3(h)  v4(h)  v5(h)  v6(h)  v7(h)  v8(h)  v8(h)
    --------------------------------------------------------------
    thing ...     thing ...     thing ..      thing ...

In the second snippet you mix the high parts (32-bit) and the low parts (32-bit) of the 64-bit doubles by incrementing by 4 (sizeof float) instead of 8 (sizeof double).
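To see which byte offsets the pointer expressions actually reach, keep in mind that C++ pointer arithmetic counts elements, not bytes: `p + n` on a `double*` moves `n * sizeof(double)` bytes. A small sketch, with `byte_offset` as a hypothetical helper that is not part of the original code:

```cpp
#include <cstddef>

// Hypothetical helper (not from the original post): byte offset reached by
// the pointer expression `base + lanes * i` when base points to T.
// Pointer arithmetic is scaled by sizeof(T), so the offset is
// lanes * i * sizeof(T) bytes.
template <typename T>
constexpr std::size_t byte_offset(std::size_t lanes, std::size_t i) {
    return lanes * i * sizeof(T);
}
```

On a platform with 4-byte floats and 8-byte doubles, `byte_offset<float>(8, 1)` and `byte_offset<double>(4, 1)` are both 32. So `g+8*i` and `g_d+4*i` each advance by one full 256-bit register per iteration, and each double is read as a whole 64-bit value.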

Another problem is that _mm256_store_pd requires that...

When the source or destination operand is a memory operand, the operand must be aligned on a 32-byte boundary or a general-protection exception (#GP) will be generated.

for(int i = 0; i < size/4; i++) doesn't fulfill this requirement.

I was wondering why the float version seems to work, because _mm256_store_ps requires that...

When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.

But you have an alignment of 8 bytes...

However, you need to fix the 'scale' of the i variable to make it work.
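One way to keep the scale consistent between the float and double versions is to derive the step from the element type instead of hard-coding 8 or 4. The sketch below uses a plain scalar inner loop as a stand-in for the intrinsics so that it runs without AVX; `lanes_per_ymm` and `multiply_blocks` are hypothetical names, not part of the original code.

```cpp
#include <cstddef>

// Number of elements of type T that fit into one 256-bit (32-byte) register.
template <typename T>
constexpr std::size_t lanes_per_ymm() {
    return 32 / sizeof(T);
}

// Scalar stand-in for the intrinsic loop: same block indexing as the
// snippets in the question, with the step taken from the element type.
// Like the original loops, it ignores a tail of count % lanes elements.
template <typename T>
void multiply_blocks(const T* a, const T* b, T* out, std::size_t count) {
    constexpr std::size_t lanes = lanes_per_ymm<T>();
    for (std::size_t i = 0; i < count / lanes; ++i) {
        for (std::size_t k = 0; k < lanes; ++k) {
            out[lanes * i + k] = a[lanes * i + k] * b[lanes * i + k];
        }
    }
}
```

With 4-byte floats and 8-byte doubles, `lanes_per_ymm<float>()` is 8 and `lanes_per_ymm<double>()` is 4, which matches the steps used in the two snippets of the question.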

