C++ - Bad scaling with OpenMP (cache contention?)


I am trying to learn more about OpenMP and cache contention, so I wrote a simple program to better understand how it works. I am getting bad thread scaling for a simple addition of vectors, and I don't understand why. This is my program:

    #include <iostream>
    #include <omp.h>
    #include <vector>

    using namespace std;

    int main(){

        // initialize stuff
        int nuelements=20000000; // number of elements
        int i;
        vector<int> x, y, z;
        x.assign(nuelements,0);
        y.assign(nuelements,0);
        z.assign(nuelements,0);
        double start; // timer

        for (i=0;i<nuelements;++i){
            x[i]=i;
            y[i]=i;
        }

        // increase threads by 1 every time, and add the two vectors
        for (int t=1;t<5;++t){

            // reset the z vector values (assign keeps the size; clear() would leave z empty)
            z.assign(nuelements,0);

            // set number of threads for this iteration
            omp_set_num_threads(t);

            // start timer
            start=omp_get_wtime();

            // parallel for (the loop variable i is made private automatically)
            #pragma omp parallel for
            for (i=0;i<nuelements;++i)
            {
                z[i]=x[i]+y[i];
            }
            // print wall time
            cout<<"time "<<omp_get_max_threads()<<" thread(s) : "<<omp_get_wtime()-start<<endl;
        }
        return 0;
    }

Running this produces the following output:

    time 1 thread(s) : 0.020606
    time 2 thread(s) : 0.022671
    time 3 thread(s) : 0.026737
    time 4 thread(s) : 0.02825

I compiled with the command: clang++ -O3 -std=c++11 -fopenmp=libiomp5 test_omp.cpp

As you can see, the scaling gets worse as the number of threads increases. I am running this on a 4-core Intel i7 processor. Does anyone know what's happening?

You are limited by memory bandwidth, not CPU speed. It takes only one CPU to keep the memory busy if you're only doing addition and copying, so adding more cores doesn't help.

If you want to see the benefit of adding more threads, try executing more complex operations on memory that is small enough to fit in the L1 or L2 cache.

