Monday, April 4, 2011

Delaunay Triangulation using CUDA

I plan to implement Delaunay triangulation of a given set of points in CUDA.

The Delaunay triangulation of a given set of points in d-dimensional Euclidean space is defined precisely in [2], as follows:

The Delaunay Triangulation (DT(P)) of the E^d space defined on a pointset P is the set of d-simplices such that:
1. a point p in E^d is a vertex of a simplex in DT(P) iff p ∈ P;
2. the intersection of two simplices in DT(P) is either an empty set or a common face;
3. the hyper-sphere circumscribed around the d + 1 vertices of each simplex contains no other point of the set P.

Here, E^d means d-dimensional Euclidean space. Note that the Delaunay triangulation of d-dimensional points consists of d-simplices: 3D points give tetrahedra, and 2D points give triangles.
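
For the 2D case, condition 3 boils down to an "in-circle" test: does a point lie inside the circle through the three vertices of a triangle? Here is a minimal sketch of that test in Python, using the standard determinant form. The function names `orient2d` and `in_circumcircle` are my own, not from any of the papers, and plain floating-point arithmetic like this is not robust for nearly co-circular points.

```python
def orient2d(a, b, c):
    """Twice the signed area of triangle abc; positive if a, b, c are counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circle through a, b and c."""
    # Translate so that p is the origin; the test is then a 3x3 determinant.
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign of det depends on the orientation of (a, b, c), so normalise by it.
    return det * orient2d(a, b, c) > 0
```

A triangle (a, b, c) satisfies condition 3 exactly when `in_circumcircle(a, b, c, p)` is false for every other point p of the set.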

Initially I intended to do this for 3D point clouds, so that I could triangulate the point cloud I get from a Kinect. (With Kinect point clouds, we would most probably also be troubled by noise in the depth information.) However, that may be too big a task for the time available. So I have reluctantly decided to handle only sets of 2D points. That way I should need to worry less about the algorithm and can spend more time learning CUDA, which is the formal purpose of the project.

The duality between Delaunay triangulations and Voronoi diagrams is well known, and so there are algorithms that construct the former from the latter. However, it is generally more efficient to construct the triangulation directly: although converting a Voronoi diagram into a Delaunay triangulation takes only O(n) time, the diagram has to be built first. For these reasons I am deliberately not looking into algorithms that construct the triangulation from a Voronoi diagram.
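
To make the duality concrete for the 2D case: each Delaunay triangle corresponds to one Voronoi vertex, located at the triangle's circumcenter. The sketch below goes in the Delaunay-to-Voronoi direction (the opposite of the conversion mentioned above) and does it in a single linear pass over the triangles. The names `circumcenter` and `voronoi_vertices`, and the index-triple representation of triangles, are my own assumptions for illustration.

```python
def circumcenter(a, b, c):
    """Centre of the circle through the 2D points a, b and c."""
    # Work relative to a, so that a sits at the origin.
    bx, by = b[0] - a[0], b[1] - a[1]
    cx, cy = c[0] - a[0], c[1] - a[1]
    d = 2.0 * (bx * cy - by * cx)
    if d == 0:
        raise ValueError("points are collinear, no circumcircle exists")
    b2 = bx * bx + by * by
    c2 = cx * cx + cy * cy
    ux = (cy * b2 - by * c2) / d
    uy = (bx * c2 - cx * b2) / d
    return (a[0] + ux, a[1] + uy)


def voronoi_vertices(points, triangles):
    """One Voronoi vertex per triangle; triangles are triples of indices into points."""
    return [circumcenter(points[i], points[j], points[k])
            for (i, j, k) in triangles]
```

For example, with `points = [(0, 0), (2, 0), (2, 2), (0, 2)]` and `triangles = [(0, 1, 2), (0, 2, 3)]`, both triangles share the same circumcenter, the centre of the square. Four co-circular points like these are exactly the kind of input that is not in general position.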

So, after spending some time searching, I have shortlisted the following papers:

1) A Quasi-Parallel GPU-Based Algorithm for Delaunay Edge-Flips (Aug. 2010)
     The algorithm is a flip algorithm[0]. Its input is some triangulation of the points (absolutely arbitrary??); the algorithm performs a sequence of edge flips[1] and ends up turning it into the Delaunay triangulation. (The Delaunay triangulation of a given set of points is unique if the points are in so-called "general position".)
    The paper specifically talks about CUDA, which makes it a nice candidate in my list. The only thing I need to think about is how to get a valid (even if arbitrary) triangulation to start with. Triangulation by definition is 

2) An Improved Parallel Algorithm For Delaunay Triangulation on Distributed Memory Parallel Computers (1997)
- A divide-and-conquer, merge-sort-style algorithm. It divides the problem into smaller subproblems, and the final Delaunay triangulation is obtained by merging the sub-triangulations. Merging is typically the crux of the algorithm.

3) Parallel 3D Delaunay Triangulation (found here)
- A kind of mini-compendium of parallel Delaunay triangulation algorithms. It explains how to parallelize two classes of algorithms: divide-and-conquer (the "Delaunay Wall" algorithm) and incremental construction.

4) Computing Two-dimensional Delaunay Triangulation Using Graphics Hardware (here)
- The paper uses the standard graphics pipeline for the job, with CUDA for some parts of the algorithm and the CPU for others.
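
To get a feel for what a single edge flip in (1) involves, here is a minimal serial sketch of the local test in Python. It is only the textbook flip step, not the quasi-parallel GPU scheme from the paper. The function `flip_if_needed` and its tuple-based triangle representation are my own assumptions, and it assumes c and d lie on opposite sides of the shared edge ab.

```python
def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circle through a, b and c."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # Normalise by the orientation of (a, b, c) so vertex order does not matter.
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return det * orient > 0


def flip_if_needed(a, b, c, d):
    """Triangles (a, b, c) and (a, b, d) share edge ab.

    If d lies inside the circumcircle of (a, b, c), the edge ab is not
    locally Delaunay, so replace it with the other diagonal cd.
    Returns the resulting pair of triangles (vertex order is not normalised).
    """
    if in_circumcircle(a, b, c, d):
        return [(a, c, d), (b, c, d)]
    return [(a, b, c), (a, b, d)]
```

A flip algorithm keeps applying this test to the edges of the triangulation until no edge needs flipping any more.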

"As you adequately put, the problem is choice." - Architect to Neo (from "Matrix Reloaded")

Which algorithm should I finally choose? With CUDA in mind, I ideally want an algorithm that breaks down immediately (and preferably naturally) into hundreds of small subproblems, so that the processing is as SIMD-like as possible, and in which merging the sub-solutions is not heavy, or is at least parallelizable. I am thinking of going for (2).
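
As a rough picture of the "divide" half of such an algorithm, the sketch below sorts the points by x-coordinate and cuts them into nearly equal vertical strips, one per subproblem. This is a generic partitioning step that I am assuming for illustration; I have not checked that it matches the scheme used in paper (2), and the function name `split_into_strips` is my own. The hard part, merging the per-strip triangulations, is not shown.

```python
def split_into_strips(points, num_strips):
    """Sort points by (x, y) and cut them into num_strips nearly equal strips."""
    ordered = sorted(points)
    n = len(ordered)
    # Integer boundaries spread any remainder evenly across the strips.
    bounds = [n * i // num_strips for i in range(num_strips + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(num_strips)]
```

Each strip would need at least three non-collinear points before it can be triangulated on its own, so the number of strips has to be chosen with the input size in mind.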
