C++ - Align double vs. align float for AVX operations
I want to multiply two (float/double) vectors with AVX operators. To do that, I need aligned memory. My function for float values is:
    #define size 65536

    float *g, *h, *j;
    g = (float*)aligned_alloc(32, sizeof(float)*size);
    h = (float*)aligned_alloc(32, sizeof(float)*size);
    j = (float*)aligned_alloc(32, sizeof(float)*size);

    // filling g, h with data

    for (int i = 0; i < size/8; i++) {
        __m256 a_a, b_a, c_a;
        a_a = _mm256_load_ps(g+8*i);
        b_a = _mm256_load_ps(h+8*i);
        c_a = _mm256_mul_ps(a_a, b_a);
        _mm256_store_ps(j+i*8, c_a);
    }

    free(g); free(h); free(j);

That works, but when I try it with double values, I get a memory access error (as if the memory were not aligned correctly):
    double *g_d, *h_d, *i_d;
    g_d = (double*)aligned_alloc(32, sizeof(double)*size);
    h_d = (double*)aligned_alloc(32, sizeof(double)*size);
    i_d = (double*)aligned_alloc(32, sizeof(double)*size);

    for (int i = 0; i < size/4; i++) {
        __m256d a_a, b_a, c_a;
        a_a = _mm256_load_pd(g_d+4*i);
        b_a = _mm256_load_pd(h_d+4*i);
        c_a = _mm256_mul_pd(a_a, b_a);
        _mm256_store_pd(i_d+i*4, c_a);
    }

    free(g_d); free(h_d); free(i_d);

Why is the alignment not working for double values?
When running it in gdb, I get:

    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000401669 in _mm256_load_pd (__p=0x619f70) at /usr/lib/gcc/x86_64-linux-gnu/5/include/avxintrin.h:836

Edit: I found the mistake. It was a copy/paste error from a former function that manifested in this function. Since this is not helpful to others (as I assume), I will close the question.
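The faulting address reported by gdb can be checked against the 32-byte requirement directly. A minimal sketch, where `is_aligned` and `is_aligned_ptr` are hypothetical helper names and not part of any library:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when addr is a multiple of alignment.
// alignment must be a nonzero power of two.
inline bool is_aligned(std::uintptr_t addr, std::uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

// Same check for a real pointer, e.g. is_aligned_ptr(g_d + 4 * i, 32).
inline bool is_aligned_ptr(const void* p, std::uintptr_t alignment) {
    return is_aligned(reinterpret_cast<std::uintptr_t>(p), alignment);
}

// __p from the gdb output in the question.
const std::uintptr_t kFaultingAddress = 0x619f70;
```

The address 0x619f70 ends in 0x70 = 112, which is a multiple of 16 but not of 32, so `is_aligned(kFaultingAddress, 16)` is true while `is_aligned(kFaultingAddress, 32)` is false. That is consistent with a pointer that did not come from a 32-byte-aligned allocation.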
Well, the problem seems to stem from the different data sizes.
- In the first snippet you run the float loop up to i < size/8 = 8192. Here I'm unsure why you increase the float array element size from 4 to 8.
- In the second snippet you run the double loop up to i < size/4 = 16384. Here I'm unsure why you decrease the double array element size from 8 to 4. **It should be the opposite!**
The last element of the double array may surpass the memory boundaries!
In both cases you increment the loop with i++. The cases proceed as follows:
first:  (float (4))  j+i*8  (0 <= i < 8192) =>

    0    4    8    12   16   20   24   28
    v1   .    v2   .    v3   .    v4   .

second: (double (8))  j+i*4  (0 <= i < 16384) =>

    [diagram: byte offsets 0 to 32, with the high (h) and low (l) 32-bit halves of the values v1 to v8 overlapping]

In the second snippet you mix the high parts (32-bit) and low parts (32-bit) of the 64-bit doubles by incrementing by 4 (sizeof float) instead of 8 (sizeof double).
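The stride reasoning can be checked numerically. Keep in mind that in C++, adding n to a `T*` advances the address by n * sizeof(T) bytes, not by n bytes, so `g + 8*i` on a `float*` and `g_d + 4*i` on a `double*` are both measured in elements. A minimal sketch, assuming the usual sizeof(float) == 4 and sizeof(double) == 8; the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Byte distance between p and p + elements for a pointer of type T*.
template <typename T>
constexpr std::size_t byte_offset(std::size_t elements) {
    return elements * sizeof(T);
}

// Number of array elements touched by `iterations` steps of `lanes` each.
constexpr std::size_t elements_covered(std::size_t iterations, std::size_t lanes) {
    return iterations * lanes;
}

// `size` from the question.
constexpr std::size_t kSize = 65536;

// Checks both loops from the question. Assumes 4-byte float, 8-byte double.
inline void check_question_loops() {
    // Each iteration steps 32 bytes in both variants.
    assert(byte_offset<float>(8) == 32);
    assert(byte_offset<double>(4) == 32);
    // Both loops visit exactly kSize elements, no more.
    assert(elements_covered(kSize / 8, 8) == kSize);
    assert(elements_covered(kSize / 4, 4) == kSize);
}
```

Under these assumptions, every block in either loop starts at a multiple of 32 bytes from the base pointer, and the last block ends exactly at the end of the buffer.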
Another problem is that _mm256_store_pd requires that...
"When the source or destination operand is a memory operand, the operand must be aligned on a 32-byte boundary or a general-protection exception (#GP) will be generated."
for (int i = 0; i < size/4; i++) doesn't fulfill this requirement.
I was wondering why the float version seems to work, because _mm256_store_ps requires that...
"When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated."
But you only have an alignment of 8 bytes...
However, you need to fix the 'scale' of the i variable to make it work.
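For reference, here is a minimal sketch of the double-precision multiply with the alignment precondition made explicit. The function name `multiply_pd` and the helper `is_aligned_32` are hypothetical. The AVX path is only compiled when the compiler defines `__AVX__` (for example with `-mavx` on GCC or Clang); otherwise a plain scalar loop is used, which is an addition for portability and not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: true when p is a multiple of 32 bytes.
inline bool is_aligned_32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}

// Element-wise out[k] = a[k] * b[k] for n doubles.
// Assumptions: n is a multiple of 4 and all three pointers are
// 32-byte aligned, as the aligned load/store intrinsics require.
inline void multiply_pd(const double* a, const double* b,
                        double* out, std::size_t n) {
    assert(n % 4 == 0);
    assert(is_aligned_32(a) && is_aligned_32(b) && is_aligned_32(out));
#if defined(__AVX__)
    // The index counts doubles; each step covers 4 doubles = 32 bytes.
    for (std::size_t i = 0; i < n; i += 4) {
        __m256d va = _mm256_load_pd(a + i);
        __m256d vb = _mm256_load_pd(b + i);
        _mm256_store_pd(out + i, _mm256_mul_pd(va, vb));
    }
#else
    // Scalar fallback when AVX is not enabled at compile time.
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
#endif
}
```

With buffers obtained as in the question, for example `aligned_alloc(32, sizeof(double)*size)`, the call would be `multiply_pd(g_d, h_d, i_d, size)`. The asserts turn a misaligned pointer into a readable failure instead of a segmentation fault inside `_mm256_load_pd`.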