Tuesday, March 5, 2013

Intuitive understanding of the parallel SCAN algorithm

Due to lack of time I won't describe what Scan is and I also won't describe the algorithm separately. You can find a very helpful and clear description of both of them here. What we'll try to do is to try and understand the algorithm intuitively. And while doing so we will also see how the algorithm proceeds.


Let's say we have 16 elements in the original input vector.Now, there are different passes of the algorithm. Each k'th pass is a transformation in itself, meaning it will accept an 16-vector from the previous pass and produce a transformed (i.e. with potentially different values as elements) 16-vector as output. Every k'th pass is associated with an offset value, such that to each vec[i], the pass adds vec[i-offset]. The offset for k'th (k starting from 0) pass is 2^k. So first pass adds vec[i-1], second pass adds vec[i-2], third pass adds vec[i-4]. The algorithm claims that after log(n) number of passes, the prefix sum is ready. The question is: How?

In the first pass, every i'th element will add to itself the element immediately before it.  So, the penetration of every element is -1, meaning we have so far summed 1 element to the left of vec[i]. Now, when we do the second pass, every vec[i] will add vec[i-2] to itself. Now what's the resulting penetration of vec[i]? -2? No! It's -3. As vec[i-2] already has penetrated -1 relative to itself, every vec[i]'s effective penetration becomes -3. So, every ith element now stores the sum of 3 previous elements (and itself ofcourse). Now, in the third pass, the offset is 4, so we add vec[i-4] to vec[i] for each i, and as vec[i-4] already had a penetration of -3 so far, vec[i]'s resultant penetration becomes -7. I guess you have an idea of how the penetrations are getting cumulatively stretched roughly exponentially. Now, in the last (fourth) pass, the offset is 8, so for each i, we add vec[i-8] to vec[i]. And as vec[i-8]'s penetration was already -7, vec[i]'s effective penetration becomes -8-7 = -15. For a 16-element vector, -15 is the maximum penetration needed for obtaining the prefix sum. So, the algorithm terminates. 

I hope that helps.

No comments:

Post a Comment