Saturday, April 9, 2011

Delaunay Triangulation with CUDA - progress

I have been really irritated since I have not done any substantial progress till now. Finally today when I resumed working on this project, I think I have found my ray of hope! The ray is known is Dan Maljovec. He is a student from University of Utah and has worked on the exact same project in 2010. Here is his homepage. I am going to understand his algorithm in next one hour. C ya.

My Crappy Algo
Designed a not-very-bad algorithm:

Algorithm DT(points)

1) Find the bounding box of the points (thrust has API function)
2) Find Delaunay triangulation of the BB+points together. The bounding box is now the convex hull of the whole system. Its edges will be in DT.
put the four edges in the AFL ( Active Faces List).
      Call Kernel<<n_blocks, n_threads>>
      a) each block undertakes one edge at a time and will take the responsibility of finding the simplex (a triangle in 2D case) mounting on the edge.
      b) if a block has no edge to work on, it will just wait till the edge is ready.
      c) Otherwise, all the hundreds of threads of each block, will check all points (hopefully just a subset of them, using some lossless optimization) and will find the point with nearest delaunay distance. Call it Ndd point.
     d) the triangle (the edge, Ndd) is added to DT. Here we don't want conflicting writes between different threads or different blocks.
     e) Once the edge has been processed as above, the edge is removed from AFL, and the next edge in the current blocks AFL will be taken by the block.
Here we may do better in step (e). A block when finishes processing an edge, has created two new edges. And its going to remove the edge it processed from the AFL. What if it replaces it with one of the new edges, and then starts processing the other new edge immediately. I see that many blocks attempting to write one of their new edges back into a common work-list plus others trying to read it at the same time is a very bad idea, but I liked the other part: Once you, as a block, are done with one edge, start working with one of the new edges immediately, since it is equivalent to (but more efficient than) writing both new edges back to worklist and then reading one edge from the same list. Well, lets keep this part of our idea in our mental to-do list. Next thing, is that what looks more apt than a list is a queue ofcourse, and not really a list.

1) multi-front or multi-rear queues
2) multiple queues: since as it is every work-item is independent

0) how do we guarantee that we don't create a triangle that was already there. Or , in fact, we don't want to even create an edge which has already been created before.

1) is really every-work-item (every edge in the work-queue) really independent?? Say one block is processing one edge e1, and other block is processing some arbitrary e2, then unfortunately if e1 and e2 are really close, then are the block going to collide. worst case, are they going to create contradictory triangles?

Reasons why Thrust is great for me!!
  1. The "examples" include
  2. The "examples" include
  3. The API has min_element() function to get the minimum in parallel.
  4. creating random 2D points API ;-)

Which One is Better To Parallelize- DeWall or InCoDe? :-(

I am confused which one to choose as the base algorithm.

DeWall implementation here
Parallel CUDA 2D Convex Hull with CUDA Paper

Sequential Incremental Construction Algorithm
Parallel Incremental Construction Algorithm
  • ParInCoDe Paper: First one to parallelize it 
    • scalable: Divide and independently conquer, No merging.

No comments:

Post a Comment