C++ - Bad scaling with OpenMP (cache contention?)
I'm trying to learn more about OpenMP and cache contention, so I wrote a simple program to better understand how it works. I'm getting bad thread scaling for a simple addition of vectors, and I don't understand why. This is the program:

    #include <iostream>
    #include <omp.h>
    #include <vector>
    using namespace std;

    int main(){
        // initialize stuff
        int nuelements=20000000; // number of elements
        int i;
        vector<int> x, y, z;
        x.assign(nuelements,0);
        y.assign(nuelements,0);
        z.assign(nuelements,0);
        double start; // timer
        for (i=0;i<nuelements;++i){
            x[i]=i;
            y[i]=i;
        }
        // increase the number of threads by 1 every time, and add the two vectors
        for (int t=1;t<5;++t){
            // reset the z vector values (assign keeps the size, so z[i] stays in bounds)
            z.assign(nuelements,0);
            // set the number of threads for this iteration
            omp_set_num_threads(t);
            // start the timer
            start=omp_get_wtime();
            // do the parallel for
            #pragma omp parallel for
            for (i=0;i<nuelements;++i) {
                z[i]=x[i]+y[i];
            }
            // print the wall time
            cout<<"time "<<omp_get_max_threads()<<" thread(s) : "<<omp_get_wtime()-start<<endl;
        }
        return 0;
    }
Running this produces the following output:

    time 1 thread(s) : 0.020606
    time 2 thread(s) : 0.022671
    time 3 thread(s) : 0.026737
    time 4 thread(s) : 0.02825
I compiled it with this command:

    clang++ -O3 -std=c++11 -fopenmp=libiomp5 test_omp.cpp

As you can see, the scaling gets worse as the number of threads increases. I am running this on a 4-core Intel i7 processor. Does anyone know what's happening?
You are limited by memory bandwidth, not by CPU speed. It only takes one CPU to keep the memory busy if all you're doing is addition and copying, so adding more cores doesn't help.
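To make the bandwidth argument concrete, here is a rough sketch of how to estimate the memory traffic from the timings in the question. The helper name `effective_bandwidth_gb_s` is made up for illustration, and the estimate assumes the loop touches each of the three vectors exactly once and ignores any extra traffic the hardware generates when writing `z`.

```cpp
#include <cstddef>

// Hypothetical helper (not from the question): effective bandwidth in GB/s
// (1 GB = 1e9 bytes) for a loop that streams `arrays` arrays of `elements`
// items, each `bytes_per_element` bytes wide, in `seconds` of wall time.
double effective_bandwidth_gb_s(std::size_t elements, std::size_t arrays,
                                std::size_t bytes_per_element, double seconds) {
    const double bytes = static_cast<double>(elements)
                       * static_cast<double>(arrays)
                       * static_cast<double>(bytes_per_element);
    return bytes / seconds / 1e9;
}
```

Plugging in the single-thread run from the question (20000000 elements, 3 vectors, 4-byte `int`, 0.020606 s) gives roughly 11.6 GB/s. The memory configuration of the asker's machine is not stated, but a figure of that size suggests one thread is already moving data about as fast as it can be fetched, which would leave little for extra threads to gain.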
If you want to see the benefit of adding more threads, try executing more complex operations on a block of memory small enough to fit in the L1 or L2 cache.
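A minimal sketch of that suggestion follows. The function names `heavy` and `timed_update`, the 200-iteration inner loop, and the choice of arithmetic are all assumptions made for illustration, not something from the original post. The idea is only that each element costs a lot of computation and very little memory traffic. Timing uses `std::chrono` so the code also builds without OpenMP, in which case the pragma is ignored and the loop runs serially.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical per-element workload: many dependent floating-point operations
// on one value, so the loop is limited by arithmetic rather than by memory.
inline double heavy(double v) {
    for (int k = 0; k < 200; ++k) {
        v = std::sqrt(v * v + 1.0);
    }
    return v;
}

// Applies heavy() to every element of `data` using up to `threads` threads
// and returns the elapsed wall time in seconds.
double timed_update(std::vector<double>& data, int threads) {
    (void)threads; // unused when compiled without OpenMP
    const long n = static_cast<long>(data.size());
    const auto start = std::chrono::steady_clock::now();
    #pragma omp parallel for num_threads(threads)
    for (long i = 0; i < n; ++i) {
        data[i] = heavy(data[i]);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

To try it, call `timed_update` on a fresh `std::vector<double>` of about 4096 elements (32 KiB, which should fit in L1 or L2 on most current CPUs) for thread counts 1 through 4 and compare the returned times. For a buffer this small, thread start-up overhead can dominate a single call, so it may help to time many repeated calls. Build with OpenMP enabled, for example with the `-fopenmp` flag.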