Sunday 2 January 2011

Parallel Computing Using Intel TBB in Linux

What is Parallel Computing???

Parallel computing is a computation method that run some calculations simultaneously (based on wikipedia). Using this method, the cpu optimally used, one processor do different calculation with the other processors. It will make the calculation faster. But we can't use it directly, we have to separate sequential calculation and parallel calculation then optimized it of course. 


Here I want to share the information about using intel TBB in ubuntu for parallel computing. Intel TBB is C library for parallel computing. It build based on C++ template so it is easy to use. In this blog I will give an example of using parallel_for loop. we can use it either in one dimension loop or two dimensions loop. But, in this kind of loop must be independent each other so there will be no stack or overflow.

For example of one dimension looping, supposed we have a serial routine calculation below:

    void SerialApplyFoo( float a[], size_t n ) {

         for( size_t i=0; i++) Foo(a[i]);

    }

Foo is a function that has independent loop iteration.
The iteration is from size_t = 0 till size_t = n-1
By using tbb::parallel_for we can separate iteration into chunks and run on separate thread. First, we have to build a class that has operator to proceed the chunks. Here is the example of the class:

    #include "tbb/blocked_range.h"
    class ApplyFoo {
    float *const my_a;
    public:
        void operator( )( const blocked_range(left bracket)size_t(right bracket)&r ) const {
            float *a = my_a;
            for( size_t i=r.begin(); i!=r.end( );i++ )
                Foo(a[i]);
        }
    ApplyFoo( float a[] ) :
    my_a(a)
    {}
    };

A blocked_range is a template class provided by the library. It describes a one-dimensional iteration space over type T. Class parallel_for works with other kinds of iteration spaces, too. The library provides blocked_range2d for two-dimensional spaces. After you have written the loop body as a body object, invoke the template function parallel_for, as shown below:

    #include "tbb/parallel_for.h"
    void ParallelApplyFoo( float a[], size_t n ) {
        parallel_for(blocked_range(0,n,YouPickAGrainSize), ApplyFoo(a) );
    }

The blocked_range constructed here represents the entire iteration space from 0 to n–1, which parallel_for divides into subspaces for each processor. The general form of the constructor is:

    blocked_range(begin,end,grainsize)

The T specify the value type. The begin represent the start of the iteration and the end represent the end of iteration while grainsize is the size of our chunks. The begin and end argument specify the iteration space in half-open interval style [begin,end). 

For the example of 2 dimension loop, I will show you the following code, this is from my assignment. I build it on a single file so it easier to read.



If we run this code it will produce the result (after you plot it, I'm using gnuplot) below:



Reference:
[1] Wikipedia, Parallel Computing, http://en.wikipedia.org/wiki/Parallel_computing
[2] James Reinders, 2007, Intel Threading Building Blocks, O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

ps: I'm sorry i just put image code because it disturb the html code for this blog. If you want the code feel free to contact me and of course if there are some questions or critics. 

Enjoy it!!!