A discussion with Claude Sonnet 4.5 on ray-traced shadows vs cascaded shadow maps - February 6, 2026

USER: I was recently doing research on how to implement shadows for my own game engine, and it seems like the most common method is still cascaded shadow maps, which, like shadow mapping in general, requires the scene to be rendered from the perspective of each shadow-casting light. That is obviously far from optimal, because just a few lights can have a huge impact on performance. Using some form of ray tracing to determine whether there is an object blocking the light between a given surface and the light source has much more potential for optimization. We can create acceleration structures for separate objects in the scene, so instead of rendering the scene for a given light, we just send out shadow rays (from each surface we want to shade) and check for intersections against the objects that are close enough to be lit by the light. I'm pretty confident that if implemented properly, ray-traced shadows would not only look much better but also perform much better in complex scenes. The process would work best with a deferred rendering pipeline, so we only need to do ray-tracing calculations on the final surfaces stored in our buffers. What are your thoughts on this problem? Do you have any insights I could use to make my game engine better?

CLAUDE: You're touching on one of the most interesting transitions happening in real-time graphics right now, and your intuition about the performance characteristics points at something real, but the picture is more nuanced than it might first appear. Let me work through the tradeoffs with you, because both approaches have their place and the "better" choice depends heavily on your specific scene characteristics.

The core performance consideration

The fundamental thing to understand is that cascaded shadow maps (CSM) have predictable, relatively flat performance costs. Rendering shadow maps scales primarily with the number of triangles visible to each light and the number of cascade splits you're using. It's essentially geometry processing cost. This makes CSM wonderfully predictable for performance budgeting.

Ray-traced shadows, on the other hand, scale with both geometric complexity and the number of rays you need to trace. In a deferred pipeline like you're describing, you're looking at one shadow ray per pixel per light for hard shadows, or many more for soft shadows. The key insight here is that this scales with resolution and light count in a different way than shadow maps do.

Where ray tracing really shines

You're absolutely right that ray tracing has huge advantages in certain scenarios. Consider a scene with many small, detailed objects casting shadows: think a forest with intricate foliage, or an interior with lots of furniture and props. With shadow maps, you need enough resolution to capture all that detail, which can mean very large shadow map textures. With ray tracing and good acceleration structures (like a BVH), you can trace against the actual geometry and get pixel-perfect shadow detail essentially "for free" in terms of memory.

The other massive win is with many lights. In a scene with dozens or hundreds of lights (common in modern games for atmosphere), shadow maps require rendering the scene from each light's perspective. That's where the geometry cost explodes. With ray tracing, you're just tracing more rays, and modern GPU ray tracing hardware is designed to handle exactly this case efficiently.

The hidden costs of ray tracing

Here's where things get interesting, though. Building and maintaining acceleration structures isn't free. For dynamic scenes where objects are moving every frame, you need to refit or rebuild your BVH structures. This can be expensive, especially for deforming geometry like animated characters. Shadow maps don't care about any of this: they just render whatever is there.

The other consideration is that achieving soft shadows with ray tracing requires multiple samples per pixel. While you get physically accurate penumbras (which look gorgeous), you might need 4, 8, or even 16 shadow rays per pixel per light to get clean results without noisy artifacts. Denoising can help, but it adds its own cost and complexity.

A hybrid approach worth considering

Many modern engines are actually moving toward hybrid solutions. You might use:

* Ray-traced shadows for your primary directional light (sun/moon), where the quality really matters and you're only dealing with one light
* Shadow maps for secondary lights, especially point lights with limited radius where you can use tighter bounds
* Screen-space techniques for small detail shadows and contact hardening

Or you could choose dynamically based on scene characteristics: use ray tracing when you have many lights and manageable geometry, and fall back to shadow maps when you have simple lighting but complex geometry.

Practical implementation advice

If you're committed to implementing ray-traced shadows in your deferred engine, here are some optimizations that really matter.

Start with a spatial hash or light culling system so you're only tracing rays for lights that actually affect each pixel. Your G-buffer already has position and normal information, so you can quickly reject lights that are out of range or facing away.
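To make that concrete, here's a minimal CPU-side sketch of the rejection test. It's just an illustration of the idea, not any particular API; the names (`needsShadowRay`, `lightRadius`, and so on) are made up for the example:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static inline Vec3  sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Decide whether a G-buffer sample needs a shadow ray for this light at all.
// 'worldPos' and 'normal' come straight from the G-buffer; 'lightPos' and
// 'lightRadius' describe a point light with a finite range.
bool needsShadowRay(Vec3 worldPos, Vec3 normal, Vec3 lightPos, float lightRadius)
{
    Vec3 toLight = sub(lightPos, worldPos);

    // Range test: outside the light's radius it contributes nothing,
    // so there is no shadow to compute either.
    if (dot(toLight, toLight) > lightRadius * lightRadius)
        return false;

    // N.L test: a back-facing surface receives no direct light,
    // so it is "in shadow" for free -- skip the ray.
    if (dot(normal, toLight) <= 0.0f)
        return false;

    return true;
}
```

Everything the test needs is already sitting in your G-buffer, which is why it's close to free compared to tracing even one ray.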
Consider implementing ray-traced shadows as a compute shader pass that writes to a shadow mask texture. This gives you the flexibility to denoise, temporally accumulate, or apply other filters before your lighting pass consumes the results.

For your BVH structures, look into using a two-level approach: a top-level structure for instances and bottom-level structures for individual meshes. This lets you handle dynamic objects much more efficiently, since you only need to update the top-level structure when things move (assuming the meshes themselves aren't deforming).
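If it helps to visualize that split, here's a toy sketch of the per-frame top-level update, assuming rigid instances and made-up types (`Instance`, `Aabb`). Real APIs (DXR, Vulkan ray tracing) manage the equivalent top- and bottom-level acceleration structures for you; this just shows why the per-frame work is so small:

```cpp
#include <vector>

struct Aabb { float min[3], max[3]; };

// One entry in a hypothetical top-level structure: an instance of a mesh
// whose bottom-level BVH (and local-space bounds) were built offline.
struct Instance {
    Aabb  localBounds;   // bounds of the bottom-level BVH, in mesh space
    float transform[12]; // 3x4 row-major object-to-world matrix
    int   blasIndex;     // which bottom-level BVH this instance points at
};

// Transform a local-space AABB to world space (Arvo's method): each world
// bound is the translation plus the min/max of each matrix term.
Aabb worldBounds(const Instance& in)
{
    Aabb out;
    for (int r = 0; r < 3; ++r) {
        out.min[r] = out.max[r] = in.transform[r * 4 + 3]; // translation
        for (int c = 0; c < 3; ++c) {
            float a = in.transform[r * 4 + c] * in.localBounds.min[c];
            float b = in.transform[r * 4 + c] * in.localBounds.max[c];
            out.min[r] += a < b ? a : b;
            out.max[r] += a < b ? b : a;
        }
    }
    return out;
}

// Per-frame top-level update: only the instance bounds are recomputed.
// The bottom-level BVHs are untouched unless a mesh actually deforms.
void refitTopLevel(const std::vector<Instance>& instances, std::vector<Aabb>& topLevel)
{
    topLevel.clear();
    for (const Instance& in : instances)
        topLevel.push_back(worldBounds(in)); // a real engine would rebuild/refit a tree here
}
```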
Also think carefully about your ray generation strategy. You don't necessarily need to trace a shadow ray for every pixel; you could trace at half resolution or use a checkerboard pattern with reconstruction, especially for distant lights where the shadow detail matters less.

The real answer

So to directly answer your question: ray-traced shadows can absolutely perform better than CSM in scenes with many lights and moderate geometric complexity. But CSM still wins in some scenarios, particularly with very high triangle counts, highly dynamic geometry, or when you need guaranteed performance. The "proper implementation" you mentioned is key: a naive ray tracing approach can easily perform worse than well-optimized shadow maps.

What kind of scenes are you targeting with your engine? That would help me give you more specific architectural advice.

USER: I think a hybrid approach is the correct way to go, but I think a more practical way of doing it would be to use ray tracing for static geometry, so that the acceleration structures don't need to be rebuilt, and use shadow maps for the dynamic geometry, since they're more efficient at that sort of thing.

On the topic of soft shadows, a more appealing solution would be something where we can analytically compute shadow penumbras. It turns out that this is possible because the penumbra size is based on a ratio between the distance to the occluder and the distance from the occluder to the light. The formula is: penumbra size = light radius * (surface-to-occluder distance / occluder-to-light distance). Alexandru Voica published an article in 2015 describing how that formula can be used to compute soft shadows using only one shadow ray per pixel. Using knowledge of the penumbra size, we can use a blur kernel to blur shadow edges in screen space. What are your thoughts?

CLAUDE: This is a really clever approach, and you're thinking about the problem in exactly the right way. The hybrid static/dynamic split is much more practical than what I outlined, and the analytical penumbra technique is elegant.

The hybrid geometry split

Your static/dynamic distinction makes a lot of sense because it aligns each technique with the actual performance bottleneck. Static geometry gives you stable BVH structures that you build once (or infrequently, when streaming in new areas), and dynamic objects tend to be a smaller subset of the scene anyway: characters, vehicles, physics objects. You're essentially getting the best of both worlds without the complexity overhead of constantly managing acceleration structure updates.

The one thing to watch for is the transition zone. When a dynamic object casts a shadow onto static geometry (or vice versa), you'll have shadows coming from two different systems. This is solvable - you can combine the shadow masks multiplicatively or handle it in your lighting shader - but it's worth thinking through early so you don't get visual discontinuities.

The analytical penumbra approach

The Voica technique is genuinely smart, and I'm glad you found it. The core insight, that you can extract the penumbra size from geometric relationships and then use a variable-width blur, is much more performance-friendly than multi-sample ray tracing.

A few practical considerations, though:

The quality of this approach depends heavily on your blur kernel implementation. A naive screen-space blur will sample across depth discontinuities and bleed light where it shouldn't. You'll want a depth-aware or bilateral blur that respects edges. The good news is you already have depth information from your G-buffer, so you can weight your blur samples based on depth similarity.

The other thing is that the penumbra size formula assumes a single occluder at a known distance. In reality, you might have multiple occluding objects at different distances between the surface and the light. The shadow ray will hit the nearest occluder, but that might not be the one that determines the correct penumbra size if there are multiple overlapping shadows. In practice, using the first hit usually produces acceptable results, but it's worth being aware that the technique is an approximation.
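To pin the geometry down, here's the similar-triangles computation as a tiny sketch. `pixelsPerUnitAtDepth` is an assumed helper you'd derive from your projection matrix and the pixel's view-space depth, and the clamp is an arbitrary quality guard, not a standard value:

```cpp
#include <algorithm>

// Analytical penumbra estimate from a single shadow ray, in the spirit of
// the technique discussed above. All names are illustrative.
//   distToOccluder : distance from the shaded surface to the first hit
//   distToLight    : distance from the shaded surface to the light centre
//   lightRadius    : world-space radius of the (spherical/disc) light
// Similar triangles give: width = lightRadius * dOcc / (dLight - dOcc).
float penumbraWidth(float distToOccluder, float distToLight, float lightRadius)
{
    float occToLight = std::max(distToLight - distToOccluder, 1e-4f);
    return lightRadius * distToOccluder / occToLight;
}

// Convert the world-space penumbra into a screen-space blur radius in pixels.
float blurRadiusPixels(float penumbra, float pixelsPerUnitAtDepth, float maxRadius = 16.0f)
{
    return std::min(penumbra * pixelsPerUnitAtDepth, maxRadius);
}
```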
Implementation details worth considering

For storing the occluder distance, you could pack it into your shadow mask texture: maybe use one channel for the shadow intensity (0-1) and another for the normalized occluder distance. This gives your blur pass the information it needs without an additional texture fetch.

The blur itself could be implemented as a separable filter (horizontal then vertical passes) for better performance, though you'll need to make the kernel radius vary per pixel based on the computed penumbra size. This is where compute shaders really shine, since you can use shared memory to cache samples and avoid redundant texture fetches.
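Here's a rough CPU version of one axis of that filter, just to make the weighting explicit. The real thing would be one pass of a separable compute shader, and `depthSigma` is a tuning constant I'm assuming rather than a standard value:

```cpp
#include <cmath>
#include <vector>

// One horizontal pass of a depth-aware, variable-radius shadow blur,
// sketched on the CPU for clarity. Per pixel we have the shadow term,
// the view-space depth, and the blur radius derived from the penumbra.
std::vector<float> blurShadowRow(const std::vector<float>& shadow,
                                 const std::vector<float>& depth,
                                 const std::vector<int>&   radius,
                                 float depthSigma = 0.1f)
{
    const int n = static_cast<int>(shadow.size());
    std::vector<float> out(n);
    for (int i = 0; i < n; ++i) {
        float sum = 0.0f, wsum = 0.0f;
        for (int k = -radius[i]; k <= radius[i]; ++k) {
            int j = i + k;
            if (j < 0 || j >= n) continue;
            // Weight by depth similarity so the blur does not bleed
            // across silhouettes (a simple bilateral term).
            float dz = (depth[j] - depth[i]) / depthSigma;
            float w  = std::exp(-dz * dz);
            sum  += w * shadow[j];
            wsum += w;
        }
        out[i] = wsum > 0.0f ? sum / wsum : shadow[i];
    }
    return out;
}
```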
One potential enhancement: you could modulate the blur radius not just by penumbra size but also by distance from the camera. Closer shadows benefit more from the detail, while distant shadows can use cheaper, wider filters without anyone noticing. This is essentially a LOD system for shadow quality.

A potential gotcha

The screen-space blur approach has an inherent limitation: it can only blur what's visible in screen space. If the penumbra should extend beyond an object's silhouette into areas not covered by the shadow ray pass, you won't capture it. For most cases this is fine, because penumbras are generally small, but with large area lights or close occluders you might see artifacts. Just something to keep in mind during testing.

The bigger picture

What I really like about your approach is that it's pragmatic. You're not trying to implement a pure ray-traced solution that looks perfect in every scenario; you're building something that balances quality, performance, and implementation complexity. That's exactly the right mindset for engine development.

Have you thought about how you'll handle the shadow map portion for dynamic geometry? Will you use traditional cascaded maps for directional lights, or something simpler like a single shadow map per light, since you're only dealing with the dynamic subset?

USER: Some great suggestions. I think using single shadow maps per light might be best, since the dynamic geometry is usually close to the player. Something else I would like to have is "colored shadows", like when light travels through stained glass. Ray tracing can achieve that effect if we keep track of the semi-transparent surfaces the ray travels through before intersecting with an opaque surface. However, it does add a bit more complexity. I wonder if a BVH-type structure really is the best approach.

CLAUDE: You're right to question whether a standard BVH is the best fit once you introduce semi-transparent surfaces. The whole design of a BVH assumes you want to find the first intersection and then you're done, but with colored shadows from transparency you potentially need to traverse through multiple surfaces and accumulate their color contributions along the way.

The transparency problem with BVHs

The issue is that BVH traversal is optimized for early-out behavior. Once you hit something, you typically stop. But for colored shadows, you need to:

1. Hit a semi-transparent surface
2. Attenuate your "shadow ray budget" by that surface's transmission color/alpha
3. Continue tracing through if there's still enough light getting through
4. Potentially hit multiple transparent surfaces before reaching an opaque occluder (or the light)

You can do this with a BVH by just continuing the ray traversal after each hit and checking whether the surface is transparent, but you're fighting against the data structure's design. You're essentially doing a full traversal without the benefit of early termination, which defeats much of the BVH's performance advantage.

Alternative approaches worth considering

One option that's gained traction is using a voxel-based representation specifically for transparency and light transmission. You could maintain your BVH for opaque geometry (which still gives you the early-out wins for fully occluded rays), but have a separate, lower-resolution voxel grid that stores transparency and color information. When a shadow ray is cast, you:

* First check the BVH for opaque occlusion (fast early-out if fully blocked)
* If not fully occluded, march through the voxel grid accumulating transparency along the ray
* Combine both results for the final shadow term

The voxel grid is much more cache-friendly for this kind of linear traversal, and you can store precomputed transparency/color data that updates less frequently than per-frame. NVIDIA's VXGI (Voxel Global Illumination) technique used similar ideas for general light transport.

Hybrid layering strategy

Another approach is to maintain separate acceleration structures per material type:

* A primary BVH for opaque static geometry (what you already have)
* A secondary structure (could be simpler, even just a list sorted by distance) for semi-transparent objects like glass
* Shadow map data for dynamic opaque geometry

When tracing a shadow ray, you:

* Check opaque geometry first (BVH for static, shadow map for dynamic)
* If not occluded, check the transparent structure and accumulate color transmission
* Combine the results

This sounds complex, but it can actually be faster, because semi-transparent objects are usually a small subset of the scene. You're not paying the cost of handling transparency for rays that are already blocked by opaque geometry.

Practical simplifications

For a game engine specifically, you might not need physically perfect light transmission. Here are some approximations that can work well (I'll sketch the first two in code after these):

Fixed traversal depth: Only allow rays to pass through N transparent surfaces (say, 2-3). Most real-world cases don't need more than this. After that, just treat everything as opaque. This gives you a natural early-out condition.

Distance-based cutoff: Attenuate the shadow ray's "importance" with distance traveled through transparent media. Once it drops below some threshold (say, 95% blocked), terminate and treat the ray as fully shadowed. This handles things like thick colored glass that shouldn't transmit much light anyway.

Screen-space tricks: For transparent surfaces that are visible in the frame, you could capture their color/alpha in a separate pass and use that to modulate shadows in screen space. This only works for visible transparency, but it's very cheap. Combine it with ray tracing for off-screen transparent occluders.
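As a sketch of how the first two simplifications might combine, assuming the ray's hits have already been gathered in front-to-back order (the `ShadowHit` type and both thresholds are illustrative, not from any API):

```cpp
#include <vector>

struct Rgb { float r, g, b; };

// One hit along a shadow ray, pre-sorted by distance. 'transmission' is the
// per-channel fraction of light that passes through the surface.
struct ShadowHit {
    Rgb  transmission;
    bool opaque;
};

// Accumulate coloured transmission along a shadow ray: at most 'maxSurfaces'
// transparent hits (fixed traversal depth), plus an early-out once almost all
// light is blocked (distance/importance cutoff). Returns the light multiplier
// for this ray: white = fully lit, black = fully shadowed.
Rgb shadowTransmission(const std::vector<ShadowHit>& hitsSortedByDistance,
                       int maxSurfaces = 3, float cutoff = 0.05f)
{
    Rgb t = {1.0f, 1.0f, 1.0f};
    int surfaces = 0;
    for (const ShadowHit& hit : hitsSortedByDistance) {
        if (hit.opaque)
            return {0.0f, 0.0f, 0.0f};          // hard occluder: done
        t = {t.r * hit.transmission.r,
             t.g * hit.transmission.g,
             t.b * hit.transmission.b};
        if (++surfaces >= maxSurfaces)           // fixed traversal depth
            break;
        if (t.r + t.g + t.b < 3.0f * cutoff)     // ~95% blocked: call it shadowed
            return {0.0f, 0.0f, 0.0f};
    }
    return t;
}
```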
A different data structure entirely

If colored shadows from transparency are a core feature you want to emphasize, you might consider a uniform grid or hierarchical grid instead of a BVH for your static geometry. Grid structures are actually quite good at handling ray traversal through multiple surfaces, because you're just stepping through cells in sequence. Modern techniques like DDA (Digital Differential Analyzer) grid traversal, or hierarchical grids with empty-space skipping, can be competitive with BVHs for scenes with relatively uniform density.

The tradeoff is that grids waste memory on empty space and don't adapt well to scenes with highly non-uniform object distribution (imagine a large outdoor area with all the detail clustered in one corner). But if your game has relatively contained environments - indoor areas, urban settings, dungeons - grids can actually outperform BVHs for certain workloads.

My actual recommendation

Given that you're targeting a hybrid approach anyway, I'd suggest:

* Keep your BVH for opaque static geometry; it's the right tool for that job
* Maintain a separate, simpler structure for transparent objects (even just a list that you sort by distance from the shadow ray origin)
* Limit traversal through transparency to 2-3 surfaces maximum
* Use the analytical penumbra blur technique you mentioned earlier; it'll help hide any artifacts from the approximations

This keeps your implementation complexity reasonable while still giving you colored shadows for the most important cases (stained glass windows, colored curtains, etc.). You can always optimize further later if profiling shows the transparency traversal is a bottleneck.

What kind of scenes are you building for? That would help me think about whether the voxel approach or the simpler sorted-list approach makes more sense for your use case.

USER: The idea of maintaining separate acceleration structures for different materials appeals to me the most. However, I would like to know more about voxel grids. Can you briefly explain how they are different, and possibly more efficient for this problem, compared to BVHs?

"if your game has relatively contained environments" - I do want this game engine to be able to support large open worlds and not just small contained environments, so maybe using a grid structure isn't the best way to go.

CLAUDE: Good question. Let me clarify how voxel grids work and why they can be efficient for transparency specifically, even though I agree they're probably not the right choice for your large open world use case.

How voxel grids work

A voxel grid divides 3D space into a regular grid of cubic cells (voxels). Each voxel stores some data; in this case, you'd store color and transparency/opacity information. When you trace a ray through the grid, you're essentially doing a 3D line-drawing algorithm (DDA traversal), stepping from voxel to voxel along the ray direction.

The key advantage for transparency is that this traversal is linear and predictable. You know exactly which voxels you'll visit and in what order. There's no tree traversal, no branching based on bounding box tests; you just march forward, accumulating transparency as you go. Each step is a simple 3D array lookup (or a hierarchical lookup if you're using something like a sparse voxel octree).

Why this helps with transparency

Remember that with a BVH, when you hit a transparent surface, you have to:

1. Resume the ray traversal from that point
2. Re-enter the BVH tree structure
3. Test against bounding boxes again to find the next potential intersection
4. Potentially traverse down multiple branches

Every time you pass through a transparent surface, you're essentially restarting a partial tree traversal. The more transparent surfaces you have, the more times you're doing this expensive operation.

With a voxel grid, the traversal algorithm doesn't care whether you hit transparency or not. You're marching through voxels anyway, so each voxel you encounter just contributes its transparency/color to your running accumulation. It's the same cost whether there's one transparent surface or ten along that ray. The traversal pattern is identical.
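To make the march concrete, here's a minimal single-channel DDA walk in the style of Amanatides & Woo. The `Grid` layout and the 1% cutoff are assumptions for the sketch; a colored version would simply accumulate per channel:

```cpp
#include <cmath>
#include <vector>

// Dense voxel grid storing, per voxel, the fraction of light that survives
// crossing it (1 = empty, 0 = opaque). Layout and names are illustrative.
struct Grid {
    int nx, ny, nz;
    std::vector<float> transmit; // size nx * ny * nz
    float at(int x, int y, int z) const { return transmit[(z * ny + y) * nx + x]; }
};

// 'o' is the ray origin in grid space (units of cells), 'd' is normalized.
float marchTransmittance(const Grid& g, const float o[3], const float d[3], float maxT)
{
    int   v[3], step[3], n[3] = {g.nx, g.ny, g.nz};
    float tMax[3], tDelta[3];
    for (int i = 0; i < 3; ++i) {
        v[i]      = static_cast<int>(std::floor(o[i]));
        step[i]   = d[i] >= 0.0f ? 1 : -1;
        tDelta[i] = d[i] != 0.0f ? std::fabs(1.0f / d[i]) : 1e30f;
        float next = d[i] >= 0.0f ? (v[i] + 1 - o[i]) : (o[i] - v[i]);
        tMax[i]   = d[i] != 0.0f ? next * std::fabs(1.0f / d[i]) : 1e30f;
    }
    float t = 0.0f, result = 1.0f;
    while (t < maxT) {
        if (v[0] < 0 || v[0] >= n[0] || v[1] < 0 || v[1] >= n[1] ||
            v[2] < 0 || v[2] >= n[2])
            break;                                // ray left the grid
        result *= g.at(v[0], v[1], v[2]);         // accumulate this cell
        if (result < 0.01f) return 0.0f;          // effectively opaque
        int axis = (tMax[0] < tMax[1])            // step across the nearest
                 ? (tMax[0] < tMax[2] ? 0 : 2)    // voxel boundary
                 : (tMax[1] < tMax[2] ? 1 : 2);
        t = tMax[axis];
        tMax[axis] += tDelta[axis];
        v[axis]    += step[axis];
    }
    return result;
}
```

Notice there's no branching on what the voxel contains: transparent, empty, and opaque cells all take the same path through the loop.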
The efficiency comparison

Here's a concrete example: imagine a shadow ray passing through a stained glass window.

With a BVH:

* Traverse the tree to find the first intersection → hit glass → record color/alpha
* Resume traversal from the exit point → traverse the tree again to find the next intersection
* Maybe hit the window frame → fully occluded, done
* Total: two full tree traversals, with all their branch mispredictions and cache misses

With a voxel grid:

* Step through voxels along the ray direction (predictable memory access)
* Voxel containing glass → accumulate its color/opacity
* Next voxel (air) → no contribution
* Next voxel (window frame) → fully opaque, done
* Total: a simple linear march with cache-friendly sequential access

The voxel approach has much better memory coherency because you're accessing a regular grid structure, and adjacent voxels are likely to be in cache. With a BVH, you're jumping around in memory following tree pointers.

The massive downside for your use case

The problem with voxel grids for large open worlds is exactly what you'd expect: memory usage. If you have a 1 km × 1 km world and want 0.5 m voxel resolution, that's 2000 × 2000 voxels horizontally. Even if you only voxelize a vertical slice (say, 100 meters tall, or 200 voxels), you're looking at 2000 × 2000 × 200 = 800 million voxels. Even at just 1 byte per voxel (highly compressed), that's 800 MB for a relatively coarse representation.

Sparse voxel structures (octrees, hash maps) can help, but they add back some of the complexity you were trying to avoid. And they still struggle with the fundamental issue that open worlds have lots of empty space that isn't truly "empty" from a storage perspective: you still need to represent it somehow.

Better alternatives for open worlds

For your large open world scenario with the hybrid approach, I'd actually suggest the following.

Chunked spatial subdivision: Divide your world into chunks (say, 100 m × 100 m). Each chunk has:

* A BVH for opaque geometry (what you're already doing)
* A simple list or loose grid for transparent objects in that chunk

When tracing a shadow ray, you only need to check the chunks the ray passes through. Since transparent objects are typically sparse (a few windows, some foliage), the lists are small and can even be brute-force tested without much cost.
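Here's a deliberately naive sketch of that chunk gathering, with made-up types. A production version would step through the chunk grid with a proper 2D DDA (like the voxel march above) rather than sampling at fixed intervals:

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

// A hypothetical chunk record: one BVH handle for opaque statics (built
// elsewhere) plus a plain list of transparent objects, sparse enough to
// test brute-force.
struct Chunk {
    int bvhHandle = -1;              // opaque static BVH for this chunk
    std::vector<int> transparentIds; // a handful of windows, foliage, etc.
};

// Pack 2D chunk coordinates into a hash key.
static uint64_t chunkKey(int cx, int cz)
{
    return (static_cast<uint64_t>(static_cast<uint32_t>(cx)) << 32)
         |  static_cast<uint64_t>(static_cast<uint32_t>(cz));
}

// Collect the chunks a shadow ray passes through by sampling the segment
// at half-chunk intervals (simple, but can skip a corner-clipped chunk).
std::vector<const Chunk*> chunksAlongRay(
    const std::unordered_map<uint64_t, Chunk>& world, float chunkSize,
    float ox, float oz, float dx, float dz, float maxT)
{
    std::vector<const Chunk*> hits;
    uint64_t last = ~0ull;
    for (float t = 0.0f; t <= maxT; t += chunkSize * 0.5f) {
        int cx = static_cast<int>(std::floor((ox + t * dx) / chunkSize));
        int cz = static_cast<int>(std::floor((oz + t * dz) / chunkSize));
        uint64_t key = chunkKey(cx, cz);
        if (key == last) continue;   // cheap consecutive-duplicate filter
        last = key;
        auto it = world.find(key);
        if (it != world.end()) hits.push_back(&it->second);
    }
    return hits;
}
```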
Distance-based LOD for transparency: Only track colored shadows from transparency within a certain radius of the player (say, 50-100 m). Beyond that, just treat transparent objects as either fully transparent or fully opaque for shadow purposes. Players won't notice the difference at a distance, especially with your analytical penumbra blur helping to soften things.

Screen-space assistance: For transparent objects that are on-screen and close to the camera (where the effect really matters), you could do a separate thin geometry pass that renders their color/alpha to a buffer. Then, when computing shadows in screen space, check this buffer and modulate the shadow color accordingly. This only works for visible transparency, but combined with ray tracing for off-screen cases, it covers most of what players will actually see.

The real answer

Voxel grids are elegant for contained spaces with lots of transparency: think of an indoor cathedral with many stained glass windows. They're less elegant for sprawling outdoor environments. Your instinct to stick with the separate acceleration structures approach is correct for an open world engine. You get the BVH's efficiency for opaque geometry and can handle transparency as a special case, without paying for a whole second spatial structure that's mostly empty space.

Does that clarify the voxel approach and why it's probably not the right fit for you?

USER: Yeah, that helps to fill in the gaps for me; I understand why voxel grids are not a good solution for open world games. Your ideas about distance-based LOD for transparency and modulating shadows in screen space are also great. Thanks for the suggestions. Do you have any other thoughts on this subject before I go?

CLAUDE: Just one last thought that might save you some headaches down the line.

Consider your lighting model early

Since you're building this hybrid shadow system with both ray tracing and shadow maps, and you're planning for colored shadows from transparency, think carefully about how this integrates with your overall lighting pipeline.

Specifically: if you're doing physically-based rendering (PBR), colored shadows should ideally multiply with your light color in linear space, not gamma space. This seems obvious, but it's easy to get wrong when you're combining data from different sources (ray-traced transmission colors, shadow maps, screen-space buffers). Make sure your shadow mask texture is in the right color space and that you're not accidentally doing gamma correction twice.

Also, with the analytical penumbra blur technique, you'll want to blur the shadow intensity and the transmission color separately, or at least be thoughtful about how they interact. If you blur them together naively, you can get color bleeding artifacts where the penumbra picks up colors from neighboring surfaces that shouldn't contribute.

One optimization to keep in mind

Since you're doing deferred rendering, you have normals in your G-buffer. Before you even trace shadow rays or sample shadow maps, you can do a quick N·L test to reject pixels that are facing away from the light. This is especially valuable for point lights and spotlights, where a significant portion of surfaces might be back-facing. It's a trivial check, but it can save a surprising amount of shadow work.

You're on the right track

Honestly, your approach is well thought out. The static/dynamic split for ray tracing vs shadow maps, the analytical penumbra technique, the distance-based LOD for transparency: these are all smart, pragmatic decisions that balance quality and performance. You're clearly thinking about this from an engineering perspective rather than just chasing the fanciest technique, which is exactly how good engine architecture happens.

Good luck with the implementation! Feel free to come back if you hit any interesting problems along the way.