c++ - Align double vs. align float for AVX operations
I want to multiply two (float/double) vectors with AVX operators. To do that, I need aligned memory. My function for float values is:
```cpp
#include <immintrin.h>  // AVX intrinsics (compile with -mavx)
#include <cstdlib>      // aligned_alloc, free

#define size 65536

float *g, *h, *j;
g = (float*)aligned_alloc(32, sizeof(float)*size);
h = (float*)aligned_alloc(32, sizeof(float)*size);
j = (float*)aligned_alloc(32, sizeof(float)*size);
// filling g and h with data
for (int i = 0; i < size/8; i++) {
    __m256 a_a, b_a, c_a;
    a_a = _mm256_load_ps(g+8*i);   // 8 floats = 32 bytes per iteration
    b_a = _mm256_load_ps(h+8*i);
    c_a = _mm256_mul_ps(a_a, b_a);
    _mm256_store_ps(j+i*8, c_a);
}
free(g);
free(h);
free(j);
```
That works, but when I try the same with double values, I get a memory access error (as if the memory were not aligned correctly):
```cpp
double *g_d, *h_d, *i_d;
g_d = (double*)aligned_alloc(32, sizeof(double)*size);
h_d = (double*)aligned_alloc(32, sizeof(double)*size);
i_d = (double*)aligned_alloc(32, sizeof(double)*size);
for (int i = 0; i < size/4; i++) {
    __m256d a_a, b_a, c_a;
    a_a = _mm256_load_pd(g_d+4*i);   // 4 doubles = 32 bytes per iteration
    b_a = _mm256_load_pd(h_d+4*i);
    c_a = _mm256_mul_pd(a_a, b_a);
    _mm256_store_pd(i_d+i*4, c_a);
}
free(g_d);
free(h_d);
free(i_d);
```
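The byte offsets this `double` loop touches can be checked without any AVX code. This is a sketch assuming `size` = 65536 and a base pointer from `aligned_alloc(32, ...)` as in the question; the helper names are hypothetical. Pointer arithmetic on `double*` is scaled by `sizeof(double)`, so `g_d + 4*i` advances 32 bytes per iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>   // std::aligned_alloc, std::free (C++17)

// Byte offset, relative to base, of the address g_d + 4*i loaded in iteration i.
// Pointer arithmetic on double* multiplies the element offset 4*i by sizeof(double).
std::size_t byte_offset_of_iteration(const double* base, std::size_t i) {
    const double* p = base + 4 * i;
    return static_cast<std::size_t>(reinterpret_cast<const char*>(p) -
                                    reinterpret_cast<const char*>(base));
}

// True if, for a 32-byte-aligned base holding n doubles, every 32-byte load of
// the loop starts on a 32-byte boundary and the last one ends at the buffer end.
bool double_loop_stays_aligned_and_in_bounds(const double* base, std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(base) % 32 == 0);  // as from aligned_alloc(32, ...)
    std::size_t end_of_last_load = 0;
    for (std::size_t i = 0; i < n / 4; ++i) {
        const std::size_t off = byte_offset_of_iteration(base, i);
        if (off % 32 != 0) return false;   // aligned base + multiple of 32 stays aligned
        end_of_last_load = off + 4 * sizeof(double);  // one load reads 4 doubles = 32 bytes
    }
    return end_of_last_load == n * sizeof(double);
}

// Repeats the question's allocation for the double case (size = 65536).
bool check_question_double_loop() {
    const std::size_t size = 65536;
    double* g_d = static_cast<double*>(std::aligned_alloc(32, sizeof(double) * size));
    if (g_d == nullptr) return false;
    const bool ok = double_loop_stays_aligned_and_in_bounds(g_d, size);
    std::free(g_d);
    return ok;
}
```

With these numbers the final load (i = 16383) starts at byte 524256 and ends at byte 524288, which is 65536 * 8, the end of the allocation.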
Why is the alignment not working for double values?
When running it in gdb, I get:

```
Program received signal SIGSEGV, Segmentation fault.
0x0000000000401669 in _mm256_load_pd (__p=0x619f70) at /usr/lib/gcc/x86_64-linux-gnu/5/include/avxintrin.h:836
```
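The pointer in that trace can be checked by hand: 0x619f70 ends in 0x70, so it is a multiple of 16 but leaves a remainder of 16 when divided by 32. That is consistent with a fault inside `_mm256_load_pd`, which requires a 32-byte-aligned address. A tiny compile-time check of the arithmetic (the helper name `is_aligned` is hypothetical):

```cpp
#include <cstdint>

// True if addr is a multiple of alignment. alignment must be a power of two,
// so the low bits selected by (alignment - 1) must all be zero.
constexpr bool is_aligned(std::uintptr_t addr, std::uintptr_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// The __p argument gdb printed for the faulting _mm256_load_pd call.
constexpr std::uintptr_t kFaultingAddress = 0x619f70;

// 0x619f70 = 6397808 (decimal), and 6397808 % 32 == 16.
static_assert(is_aligned(kFaultingAddress, 16), "16-byte aligned");
static_assert(!is_aligned(kFaultingAddress, 32), "not 32-byte aligned");
```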
Edit: I found the mistake. It was a copy/paste error from a former function that manifested in this function. Since this won't be helpful to others (as I assume), I'll close the question.
Well, the problem seems to stem from the different data sizes.

- In the first snippet you increment the `float` loop up to `size/8` = 8192. Here I'm unsure why you increase the `float` array (element size 4) by 8: `i < 8192`.
- In the second snippet you increment the `double` loop up to `size/4` = 16384. Here I'm unsure why you increase the `double` array (element size 8) by 4: `i < 16384`. That's **the opposite!**

The last element of the `double` array may surpass the memory boundaries!

In both cases you increment the loop with `i++`. The cases proceed as follows:
```
first:  (float, 4 bytes)  j+i*8  (0 <= i < 8192)
byte:   0    4    8    12   16   20   24   28
        v1   .    v2   .    v3   .    v4   .

second: (double, 8 bytes) j+i*4  (0 <= i < 16384)
byte:   0    4    8    12   16   20   24   28   32
[Diagram: high (h) and low (l) 32-bit halves of v1 to v8 overlapping across these offsets]
```
In the second snippet you mix the high parts (32-bit) and low parts (32-bit) of the 64-bit `double` values by incrementing by 4 (sizeof float) instead of 8 (sizeof double).
Another problem is that `_mm256_store_pd` requires that...

> When the source or destination operand is a memory operand, the operand must be aligned on a 32-byte boundary or a general-protection exception (#GP) will be generated.

...and `for(int i = 0; i < size/4; i++)` doesn't fulfill this requirement.
I'm wondering why the `float` version seems to work, because `_mm256_store_ps` requires that...

> When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.

...but you have an alignment of 8 bytes...
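Where the alignment requirements quoted above cannot be guaranteed, AVX also provides unaligned variants, `_mm256_loadu_pd` and `_mm256_storeu_pd`, which accept any address. A minimal sketch, assuming GCC or Clang on an x86-64 CPU with AVX; the function name `mul_pd_unaligned` is hypothetical, and `__attribute__((target("avx")))` enables AVX for this one function so the file builds without `-mavx`.

```cpp
#include <cassert>
#include <cstddef>
#include <immintrin.h>

// out[k] = a[k] * b[k] for 0 <= k < n, with no alignment requirement on
// a, b or out: _mm256_loadu_pd/_mm256_storeu_pd accept any address.
// Calling this on a CPU without AVX raises an illegal-instruction fault.
__attribute__((target("avx")))
void mul_pd_unaligned(const double* a, const double* b, double* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    std::size_t k = 0;
    for (; k + 4 <= n; k += 4) {            // full vectors of 4 doubles
        const __m256d va = _mm256_loadu_pd(a + k);
        const __m256d vb = _mm256_loadu_pd(b + k);
        _mm256_storeu_pd(out + k, _mm256_mul_pd(va, vb));
    }
    for (; k < n; ++k) {                    // scalar tail when n is not a multiple of 4
        out[k] = a[k] * b[k];
    }
}
```

Called as `mul_pd_unaligned(g_d, h_d, i_d, size)`, it works whether or not the buffers come from `aligned_alloc`.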
However, you need to fix the 'scale' of your `i` variable to make it work.
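One way to keep the scale of `i` tied to the element type is to derive the number of lanes from the register and element sizes instead of hard-coding 8 or 4. A sketch, assuming GCC or Clang on x86-64 with AVX; `kLanes`, `is_32_byte_aligned` and `mul_pd_aligned` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <immintrin.h>

// Lanes per AVX register: 32-byte __m256d / 8-byte double = 4.
constexpr std::size_t kLanes = sizeof(__m256d) / sizeof(double);
static_assert(kLanes == 4, "an AVX register holds four doubles");

// True if p satisfies the 32-byte alignment _mm256_load_pd/_mm256_store_pd need.
inline bool is_32_byte_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}

// out[k] = a[k] * b[k] for 0 <= k < n.
// Preconditions (checked by assert): all three pointers 32-byte aligned,
// n a multiple of kLanes.
__attribute__((target("avx")))
void mul_pd_aligned(const double* a, const double* b, double* out, std::size_t n) {
    assert(is_32_byte_aligned(a) && is_32_byte_aligned(b) && is_32_byte_aligned(out));
    assert(n % kLanes == 0);
    for (std::size_t i = 0; i < n / kLanes; ++i) {
        // i counts vectors; kLanes * i is an element offset, which pointer
        // arithmetic on double* turns into a byte offset of 32 * i.
        const __m256d va = _mm256_load_pd(a + kLanes * i);
        const __m256d vb = _mm256_load_pd(b + kLanes * i);
        _mm256_store_pd(out + kLanes * i, _mm256_mul_pd(va, vb));
    }
}
```

For `double` this gives a step of 4 elements; the same expression with `sizeof(__m256) / sizeof(float)` gives 8 for the `float` version.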