This paper presents a highly efficient algorithm for efficient order-independent transparency via compute unified device architecture (CUDA) in a single geometry pass. The study designs a CUDA renderer system to rasterize the scene by the scan-line algorithm, generating multiple fragments for each pixel. Meanwhile, a fixed size array is allocated per pixel in a GPU (graphics processing unit) global memory for storage. Next, this paper describes two schemes to capture and sorts the fragments per pixel via the atomic operations in CUDA. The first scheme stores the depth values of the fragments into an array of the corresponding pixel and sorts them on the fly using the atomicMin operation in CUDA. A following CUDA kernel will blend the fragments per pixel in depth order. The second scheme captures the fragments in rasterization order using the atomicInc operation in CUDA. During post-processing, the fragments per pixel array will be sorted in depth order before blending. Experimental result shows that this algorithm shows a significant improvement in classical depth peeling, producing faithful results.