Parallel Algorithm for Connected-Component Analysis using CUDA


Windisch, D.; Kaever, C.; Juckeland, G.; Bieberle, A.

Connected-component analysis (CCA) is a central part of many image processing applications. To process image data at ever-increasing resolutions and frame rates, parallel CCA algorithms are essential. GPU implementations of such algorithms typically store the extracted features in arrays large enough to hold the maximum possible number of objects for the given image size. Transferring these large arrays to the host accounts for a large portion of the overall execution time. Therefore, we propose an algorithm that uses a CUDA kernel to merge trees of connected-component feature structs. During tree merging, various connected-component properties, such as total area, centroid, and bounding box, are extracted and accumulated. The tree structure then allows us to transfer only the features of valid objects to the host for further processing or storage. Our benchmarks show that this implementation drastically reduces the memory transfer volume for processing results on the host while maintaining performance similar to that of state-of-the-art CCA algorithms.
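
The idea of merging feature trees and then keeping only the valid roots can be illustrated with a small sequential C++ sketch. This is not the authors' CUDA kernel: it runs on the CPU, assumes 4-connectivity, and all names (`Features`, `find_root`, `merge`, `extract_components`) are hypothetical. Each foreground pixel starts as a single-node tree, merging two trees accumulates area, coordinate sums (for the centroid), and bounding box in the surviving root, and a final pass copies only the root nodes of real objects into a compact output.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-node feature record; field names are illustrative only.
struct Features {
    std::uint32_t parent;                      // parent index; a root points to itself
    std::uint64_t area;                        // number of pixels in the subtree
    std::uint64_t sum_x, sum_y;                // coordinate sums; centroid = sum / area
    std::uint32_t min_x, min_y, max_x, max_y;  // bounding box
};

// Follow parent links until the root of the tree is reached.
std::uint32_t find_root(const std::vector<Features>& n, std::uint32_t i) {
    while (n[i].parent != i) i = n[i].parent;
    return i;
}

// Merge the trees containing a and b; the root with the smaller index survives
// and accumulates the features of the other root.
void merge(std::vector<Features>& n, std::uint32_t a, std::uint32_t b) {
    a = find_root(n, a);
    b = find_root(n, b);
    if (a == b) return;
    if (b < a) std::swap(a, b);
    n[a].area  += n[b].area;
    n[a].sum_x += n[b].sum_x;
    n[a].sum_y += n[b].sum_y;
    n[a].min_x = std::min(n[a].min_x, n[b].min_x);
    n[a].min_y = std::min(n[a].min_y, n[b].min_y);
    n[a].max_x = std::max(n[a].max_x, n[b].max_x);
    n[a].max_y = std::max(n[a].max_y, n[b].max_y);
    n[b].parent = a;
}

// Label a binary image (row-major, nonzero = foreground) and return only the
// feature records of valid objects, i.e. roots with a nonzero area.
std::vector<Features> extract_components(const std::vector<std::uint8_t>& img,
                                         std::uint32_t w, std::uint32_t h) {
    std::vector<Features> nodes(static_cast<std::size_t>(w) * h);
    for (std::uint32_t y = 0; y < h; ++y) {
        for (std::uint32_t x = 0; x < w; ++x) {
            const std::uint32_t i = y * w + x;
            const bool fg = img[i] != 0;
            Features f{};
            f.parent = i;
            f.area = fg ? 1 : 0;
            f.sum_x = fg ? x : 0;
            f.sum_y = fg ? y : 0;
            f.min_x = f.max_x = x;
            f.min_y = f.max_y = y;
            nodes[i] = f;
        }
    }
    // Merge each foreground pixel with its left and upper foreground neighbours.
    for (std::uint32_t y = 0; y < h; ++y) {
        for (std::uint32_t x = 0; x < w; ++x) {
            const std::uint32_t i = y * w + x;
            if (img[i] == 0) continue;
            if (x > 0 && img[i - 1] != 0) merge(nodes, i, i - 1);
            if (y > 0 && img[i - w] != 0) merge(nodes, i, i - w);
        }
    }
    // Compaction: only roots of real objects are copied to the output.
    std::vector<Features> out;
    for (std::uint32_t i = 0; i < nodes.size(); ++i) {
        if (nodes[i].parent == i && nodes[i].area > 0) out.push_back(nodes[i]);
    }
    return out;
}
```

In this sketch the output holds one record per object rather than one per pixel, which is the property the abstract relies on to reduce the transfer volume. A GPU version would additionally need to handle concurrent merges, for example with atomic operations, which this sequential sketch does not attempt.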

Keywords: connected-component analysis; image stream processing; parallel computing; CUDA

Involved research facilities

  • ROFEX

Permalink: https://www.hzdr.de/publications/Publ-35817