I'm trying to write a real-time GPU (CUDA) ray tracer. For now I'm tracing single rays, but I've run into a problem with the BVH. This paper [PDF] has been my inspiration for the theoretical part, and as you can see, its BVH is composed of axis-aligned bounding boxes (AABBs). However, the stackless rope-based algorithm for ray-AABB intersection does not take overlapping siblings into account, and these occur quite often with the AABB construction algorithm I've read about in multiple places on the Internet: average the centroids of all triangles in the current triangle list, then decide which child each triangle goes into by comparing its centroid with that average along the axis parallel to the longest edge of the parent box.
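To make the construction step concrete, here is a minimal host-side C++ sketch of the split as I understand it. It is not CUDA and not taken from the paper; every name in it (`splitByCentroidMean`, `overlaps`, `AABB`, and so on) is my own, and I'm assuming a triangle goes to the left child when its centroid is below the average.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

struct Vec3 { float v[3]; };        // x, y, z
struct Triangle { Vec3 p[3]; };     // three vertices

struct AABB {
    float lo[3], hi[3];
    AABB() {                        // starts out empty (inverted bounds)
        for (int a = 0; a < 3; ++a) {
            lo[a] = std::numeric_limits<float>::max();
            hi[a] = std::numeric_limits<float>::lowest();
        }
    }
    void grow(const Triangle& t) {  // extend the box to contain all vertices of t
        for (const Vec3& q : t.p)
            for (int a = 0; a < 3; ++a) {
                lo[a] = std::min(lo[a], q.v[a]);
                hi[a] = std::max(hi[a], q.v[a]);
            }
    }
};

inline AABB boundsOf(const std::vector<Triangle>& tris) {
    AABB b;
    for (const Triangle& t : tris) b.grow(t);
    return b;
}

// Centroid coordinate of a triangle along one axis.
inline float centroid(const Triangle& t, int axis) {
    return (t.p[0].v[axis] + t.p[1].v[axis] + t.p[2].v[axis]) / 3.0f;
}

struct Split {
    std::vector<Triangle> left, right;
    AABB leftBox, rightBox;
    int axis;
};

// One split step: pick the longest axis of the parent box, average the
// centroids along it, and send each triangle left or right of that average.
inline Split splitByCentroidMean(const std::vector<Triangle>& tris) {
    assert(!tris.empty());
    const AABB parent = boundsOf(tris);

    int axis = 0;
    for (int a = 1; a < 3; ++a)
        if (parent.hi[a] - parent.lo[a] > parent.hi[axis] - parent.lo[axis])
            axis = a;

    float mean = 0.0f;
    for (const Triangle& t : tris) mean += centroid(t, axis);
    mean /= static_cast<float>(tris.size());

    Split s;
    s.axis = axis;
    for (const Triangle& t : tris)
        (centroid(t, axis) < mean ? s.left : s.right).push_back(t);

    // The child boxes must contain whole triangles, not just centroids,
    // which is why they can extend past the split position.
    s.leftBox = boundsOf(s.left);
    s.rightBox = boundsOf(s.right);
    return s;
}

// True if the boxes intersect on every axis (touching counts as overlap).
inline bool overlaps(const AABB& a, const AABB& b) {
    for (int i = 0; i < 3; ++i)
        if (a.hi[i] < b.lo[i] || b.hi[i] < a.lo[i]) return false;
    return true;
}
```

With one long triangle spanning x = 0..10 and one small triangle at x = 6..7, the long one lands in the left child and the small one in the right, yet the left box still covers x = 0..10, so the two sibling boxes overlap. That is exactly the situation I'm worried about.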
The use of AABBs in the paper suggests that there is a fast way to build AABB trees without overlapping siblings. Unfortunately, I can't find such a method.
Would someone please describe a fast method for creating an AABB tree without overlapping children? I'd also appreciate pseudocode.
Thank you.