Bindless rendering — Validation

Author

Darius Bouma

Date

December 16, 2025

Reading time

7 min

With a bindless rendering pipeline in place, as covered in the previous blog posts, we are now faced with the problem that we cannot yet validate resource access. At the time of writing, the validation layers on both Dx12 and Vulkan are not capable of detecting, let alone preventing, all of the bindless pitfalls. Additionally, the existing GPU validation layers are slow.

The goal is to inform the user of any invalid usage at runtime and to attempt to avoid page faults when invalid descriptors are indexed, all without killing performance. In this final part of the bindless rendering series, we cover the measures that can be taken to reach these validation goals without flooding the rendering backend with intrusive tracking code. The techniques covered here are not “bulletproof”, but they should catch the majority of potential issues when using a bindless rendering pipeline.

RenderResourceHandle recap

In the previous blog posts we covered what our RenderResourceHandles look like, and how we don’t necessarily need all 32 bits of such a handle to index the resource descriptor heap. Instead of using all 32 bits for just the index into the descriptor heap, we use the following structure:

23 bits for the actual resource index
2 bits to identify the resource type
1 bit for resource writability
6 bits for resource handle version

/// A handle that points to a rendering-related resource (TLAS, sampler, buffer, texture, etc.).
/// The handle can be uploaded directly to the GPU to refer to our resources in a bindless
/// fashion, and can be stored plainly in buffers, even without the help of a [`DescriptorSet`].
/// The handle is only guaranteed to be valid for as long as the resource it is associated with,
/// so it is up to the user to ensure that their data lives long enough. The handle is versioned,
/// however, to help catch use-after-free bugs.
#[derive(Copy, Clone, Eq, PartialEq, Hash)]
#[repr(transparent)]
pub struct RenderResourceHandle(u32);

impl RenderResourceHandle {
    pub fn new(version: u8, tag: RenderResourceTag, index: u32, access_type: AccessType) -> Self {
        let version = version as u32;
        let tag = tag as u32;
        let access_type = access_type.is_read_write() as u32;

        // The version wraps around at 64 (6 bits); it only exists so that a stale handle
        // can be told apart from the current occupant of a recycled slot.
        assert!(version < 64);
        assert!((tag & !0x3) == 0);
        assert!(index < (1 << 23));

        Self(version << 26 | access_type << 25 | tag << 23 | index)
    }
}
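
To make the bit layout concrete, here is a minimal sketch of how the four fields could be unpacked again on the CPU. The free functions and constant names below are hypothetical and chosen for illustration; only the bit positions (6 version bits, 1 writable bit, 2 tag bits, 23 index bits) come from the constructor above.

```rust
// Bit layout taken from the constructor above: [version:6][writable:1][tag:2][index:23].
// All names in this sketch are hypothetical; only the bit positions come from the article.
const INDEX_BITS: u32 = 23;
const TAG_BITS: u32 = 2;
const VERSION_BITS: u32 = 6;

const TAG_SHIFT: u32 = INDEX_BITS; // 23
const WRITABLE_SHIFT: u32 = TAG_SHIFT + TAG_BITS; // 25
const VERSION_SHIFT: u32 = WRITABLE_SHIFT + 1; // 26

/// Packs the four fields into a raw 32-bit handle, mirroring `RenderResourceHandle::new`.
pub fn pack_handle(version: u32, writable: bool, tag: u32, index: u32) -> u32 {
    assert!(version < (1 << VERSION_BITS));
    assert!(tag < (1 << TAG_BITS));
    assert!(index < (1 << INDEX_BITS));
    (version << VERSION_SHIFT) | ((writable as u32) << WRITABLE_SHIFT) | (tag << TAG_SHIFT) | index
}

/// Lowest 23 bits: the index into the descriptor heap.
pub fn handle_index(raw: u32) -> u32 {
    raw & ((1 << INDEX_BITS) - 1)
}

/// Bits 23..25: the resource type tag.
pub fn handle_tag(raw: u32) -> u32 {
    (raw >> TAG_SHIFT) & ((1 << TAG_BITS) - 1)
}

/// Bit 25: set when the handle refers to a writable resource.
pub fn handle_is_writable(raw: u32) -> bool {
    ((raw >> WRITABLE_SHIFT) & 1) == 1
}

/// Top 6 bits: the wrapping handle version.
pub fn handle_version(raw: u32) -> u32 {
    raw >> VERSION_SHIFT
}
```

Packing a handle and reading each field back should return the original values, which is the same round trip the shader-side accessors (`resourceTag()`, `isWritable()`, `version()`) rely on.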

Validating the resource

We use the validation bits from the resource handle directly in the resource abstraction, where we explicitly validate what kind of resource we are expecting and whether the resource should be writable.

bool validateResourceInternal(bool writable, uint expectedTag, RenderResourceHandle handle) {
    bool passedTagValidation = expectedTag == handle.resourceTag();
    bool passedWritableValidation = true;
    if (writable) {
        passedWritableValidation = handle.isWritable();
    }
 
    bool passed_validation =
        passedTagValidation && passedWritableValidation;

    // Write out a small validation packet to a readback buffer
    
    // Finally return if the validation succeeded
    return passed_validation;
}

These packets are then read back from the GPU to the CPU. Since we blocked any reads/writes on the GPU, device removal has been avoided and the data is guaranteed to reach the CPU. The readback buffer can also conveniently be used for shader prints or shader asserts, as the logic to achieve this is exactly the same as writing out validation packets. Once the buffer is available on the CPU, we process all of the packets based on the atomic counter located at the start of the buffer. This counter indicates how many readback packets have been written. We then read the header of each packet to determine what kind of message we’re dealing with.

pub fn process_packets(
    packet_buffer: &Arc<dyn Buffer>,
    command_tracker: &CommandTracker,
    logging_info: &GpuLogInfo,
) {
    // No need to use a deferred readback buffer as the command buffer has already completed at this point.
    let packets = packet_buffer.as_slice::<ShaderLogPacket>();

    // 0th element contains our atomic counter, containing message count.
    let num_messages = if packets[0].header < MAX_ALLOWED_SHADER_MESSAGES {
        packets[0].header
    } else {
        warn!("Exceeding maximum supported GPU logging packets, please increase `MAX_ALLOWED_SHADER_MESSAGES` if needed.
        Number of messages: {} Maximum allowed: {}", packets[0].header, MAX_ALLOWED_SHADER_MESSAGES);
        MAX_ALLOWED_SHADER_MESSAGES
    };

    // The shader packets start at index 1 as index 0 is reserved for atomics.
    for message_idx in 1..=num_messages {
        let current_packet = packets[message_idx as usize];

        // Read shader message header.
        let shader_header = ShaderHeaderData::extract_data_from_header(current_packet.header);

        let name = if shader_header.pipeline_index < command_tracker.tracked_commands.len() as u32 {
            &command_tracker.tracked_commands[shader_header.pipeline_index as usize]
        } else {
            "Untracked command"
        };
        
        match shader_header.ty {
            ShaderMessageType::Validation => {
                if logging_info.gpu_resource_validation {
                    let validation_type =
                        ShaderValidationType::try_from(extract_bits(current_packet.data0, 0, 2))
                            .expect("Received invalid shader validation packet");
                    let validation_handle = unsafe {
                        RenderResourceHandle::internal_from_raw(extract_bits(
                            current_packet.data0,
                            32,
                            32,
                        ))
                    };
                    match validation_type {
                        ShaderValidationType::ResourceTag => {
                            let expected_resource_tag = RenderResourceTag::try_from(extract_bits(
                                current_packet.data0,
                                2,
                                2,
                            ))
                            .expect("Invalid resource tag");
                            error!(
                                "{} GPU resource validation failed: Resource access mismatch in `{}` handle is of type: `{:?}`, Expected handle of type: `{:?}`.",
                                shader_header.identifier, name, validation_handle.tag(), expected_resource_tag
                            );
                        }
                        ShaderValidationType::ResourceAccess => {
                            error!(
                                "{} GPU resource validation failed: Tried writing to resource that is read-only in `{}` RenderResourceHandle has AccessType of: `{:?}`.",
                                shader_header.identifier, name, validation_handle.access_type()
                            );
                        }
                        ShaderValidationType::Invalid => continue,
                    }

                    panic!();
                }
            }
            ShaderMessageType::Invalid => {
                continue;
            }
        }
    }
}
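
The payload layout implied by the `extract_bits` calls above can be sketched as follows. This is only an illustration under stated assumptions: `data0` is assumed to be a 64-bit word, the `extract_bits` shown here is a stand-in with an assumed `(value, offset, count)` signature, and `pack_validation_data0` is a hypothetical helper that mimics what the shader side would have to write.

```rust
/// Assumed stand-in for the article's `extract_bits`: returns `count` bits of `value`,
/// starting at bit `offset`. `count` must be between 1 and 32.
pub fn extract_bits(value: u64, offset: u32, count: u32) -> u32 {
    assert!(count >= 1 && count <= 32 && offset + count <= 64);
    let mask = (1u64 << count) - 1;
    ((value >> offset) & mask) as u32
}

/// Hypothetical packing helper matching the offsets read back in `process_packets`:
/// bits 0..2 hold the validation type, bits 2..4 the expected resource tag,
/// and bits 32..64 the raw resource handle that failed validation.
pub fn pack_validation_data0(validation_type: u32, expected_tag: u32, raw_handle: u32) -> u64 {
    assert!(validation_type < 4 && expected_tag < 4);
    (validation_type as u64) | ((expected_tag as u64) << 2) | ((raw_handle as u64) << 32)
}
```

Keeping the packing and unpacking offsets next to each other like this makes it easier to spot a mismatch between the shader and the CPU reader.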

Now that we have put measures in place against invalid resource indexing, we should be done, right? Well, not quite. What if a buffer on the GPU contains a handle that has already been dropped? This scenario would pass both the resource type validation and the writability validation, even though the handle is no longer valid for use. Tracking this scenario from the CPU quickly becomes a nightmare if multiple buffers contain resource handles (for example, materials that directly contain their resource handles).

Finally, there’s also the issue that the user could potentially copy resource handles on the GPU from one buffer to another. These scenarios quickly become frustratingly difficult, if not impossible, to deal with, so we need a different solution for these edge cases.

Creating a mirror resource heap

A way to deal with these situations is to keep track of the version of the resource handle. Every time a resource handle is recycled on the CPU, the version is incremented, wrapping around at 64 (6 bits). Recycling a handle happens as soon as the resource is dropped on the CPU.

Right before command buffer submission, we copy our entire updated pool of RenderResourceHandles to the GPU. As this pool is contained within a buffer, we can use the buffer’s RenderResourceHandle directly to set a dedicated push constant slot, which is global to all of our pipelines. We then use this buffer to read out the actual handle version and compare it against the target resource handle’s version.
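
The CPU side of this scheme can be sketched roughly as below. `HandlePool` and its methods are hypothetical names used for illustration, not the actual backend types; the sketch only shows the version bump on recycle and the per-slot version array that would be uploaded before submission.

```rust
const VERSION_COUNT: u32 = 1 << 6; // 6 version bits give 64 distinct versions

/// Hypothetical CPU-side pool that keeps one version counter per descriptor slot.
pub struct HandlePool {
    versions: Vec<u32>,
    free_slots: Vec<u32>,
}

impl HandlePool {
    pub fn new(capacity: u32) -> Self {
        Self {
            versions: vec![0; capacity as usize],
            // Reversed so that slot 0 is handed out first.
            free_slots: (0..capacity).rev().collect(),
        }
    }

    /// Returns `(index, version)` for a free slot, or `None` when the pool is exhausted.
    pub fn allocate(&mut self) -> Option<(u32, u32)> {
        let index = self.free_slots.pop()?;
        Some((index, self.versions[index as usize]))
    }

    /// Recycles a slot. The version is bumped immediately, so stale copies stop matching.
    /// Double frees are not guarded against in this sketch.
    pub fn release(&mut self, index: u32) {
        let version = &mut self.versions[index as usize];
        *version = (*version + 1) % VERSION_COUNT;
        self.free_slots.push(index);
    }

    /// Snapshot to upload before submission; the shader reads entry `index` as a `uint`.
    pub fn version_mirror(&self) -> &[u32] {
        &self.versions
    }
}
```

Storing one `u32` per slot matches the `handle.readIndex() * sizeof(uint)` addressing used by the shader, at the cost of leaving most of each 32-bit entry unused.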

bool validateResourceInternal(bool writable, uint expectedTag, RenderResourceHandle handle) {
    // Make sure the version of the handle is the same as the version mirrored from the CPU.
    // A mismatch here means that a handle has been copied by the user and has since become
    // stale. We do not allow render resource handle copying, as this could result in a handle
    // pointing to an updated descriptor.
    ByteAddressBuffer versionBuffer =
        DESCRIPTOR_HEAP_UNIFORM(ByteBufferHandleUniform, g_bindingsOffset.versionHeap.readIndex());

    // Read & Write share same version.
    uint expectedVersion = versionBuffer.Load<uint>(handle.readIndex() * sizeof(uint));

    bool passedTagValidation = expectedTag == handle.resourceTag();
    bool passedVersionValidation = expectedVersion == handle.version();
    bool passedWritableValidation = true;
    if (writable) {
        passedWritableValidation = handle.isWritable();
    }

    bool passed_validation =
        passedTagValidation && passedVersionValidation && passedWritableValidation;
    
    // Write out a small validation packet to a readback buffer

    // Finally return if the validation passed
    return passed_validation;
}

It should be noted that this technique is definitely not perfect, as the version of an invalid handle could, by chance, match the version currently stored in the resource handle heap. However, such a scenario is only possible if all of the following conditions are true:

The resource handle points to a valid resource of the same type
The resource handle has exactly the same writability
The resource handle has exactly the same looped version

Reducing the chance of this happening would require increasing the size of a RenderResourceHandle so that it can hold a larger version number. However, we did not consider this worth the cost for what it provides.
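
The wrap-around case can be illustrated with a small sketch. The function names are hypothetical, and the only assumption is the 6-bit version counter described above: a stale handle slips through the version check only when its slot has been recycled a multiple of 64 times since the handle was created.

```rust
const VERSION_BITS: u32 = 6;

/// Version a slot holds after being recycled `recycles` times, starting from `initial`.
pub fn version_after(initial: u32, recycles: u32) -> u32 {
    (initial + recycles) % (1 << VERSION_BITS)
}

/// Mirrors the shader's version comparison: a stale handle is caught only if the
/// slot's current version differs from the version baked into the handle.
pub fn stale_handle_detected(stale_version: u32, recycles: u32) -> bool {
    version_after(stale_version, recycles) != stale_version
}
```

For example, a handle with version 5 is flagged after 1 or 63 recycles of its slot, but not after exactly 64, and even then the tag and writability checks still have to pass for the access to go through.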

Extending the mirror heap

We could also include additional metadata in the mirror heap, for example buffer sizes and texture dimensions. After verifying that the bindless handles are valid, we can reliably perform buffer and texture bounds checking as well. On a potential out-of-bounds access, we discard the load or write and, for loads, return a zero-initialized value of type T, avoiding any potential page faults.

bool validateResourceInternal(bool writable, uint expectedTag, uint rwOffset, RenderResourceHandle handle) {
    // Make sure the version of the handle is the same as the version mirrored from the CPU.
    // A mismatch here means that a handle has been copied by the user and has since become
    // stale. We do not allow render resource handle copying, as this could result in a handle
    // pointing to an updated descriptor.
    ByteAddressBuffer versionBuffer =
        DESCRIPTOR_HEAP_UNIFORM(ByteBufferHandleUniform, g_bindingsOffset.versionHeap.readIndex());

    // Read & Write share same version.
    uint expectedVersion = versionBuffer.Load<uint>(handle.readIndex() * sizeof(uint));

    bool passedTagValidation = expectedTag == handle.resourceTag();
    bool passedVersionValidation = expectedVersion == handle.version();
    bool passedWritableValidation = true;
    if (writable) {
        passedWritableValidation = handle.isWritable();
    }

    bool passed_validation =
        passedTagValidation && passedVersionValidation && passedWritableValidation;
    
    if (!passed_validation) {
      // Write out a small validation packet to a readback buffer if validation failed
      return false;
    }
    
    ByteAddressBuffer boundsBuffer =
        DESCRIPTOR_HEAP(ByteBufferHandle, bufferBoundsHeap().readIndex());
    
    // Read buffer size
    uint maxReadWriteBoundsInBytes = boundsBuffer.Load<uint>(handle.readIndex() * sizeof(uint));
    bool isWithinBounds = rwOffset < maxReadWriteBoundsInBytes;

    if (!isWithinBounds) {
      // Write a small validation packet to a readback buffer if validation failed
      return false;
    }
    
    return true;
}

Implementing bounds checking surfaced quite a few oversights and bugs that had persisted in our rendering pipeline for some time.
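
The shader above compares only the start offset against the buffer size. Whether the size of the access is accounted for elsewhere is not shown, so the following is a hedged CPU-side sketch of a slightly stricter variant that also takes the access size into account. `access_in_bounds` and `load_u32_or_zero` are hypothetical names, and the zero return mirrors the "return a zero value" behaviour described earlier.

```rust
/// True when `size` bytes starting at `offset` lie inside a buffer of `buffer_size` bytes.
pub fn access_in_bounds(offset: u32, size: u32, buffer_size: u32) -> bool {
    match offset.checked_add(size) {
        Some(end) => end <= buffer_size,
        None => false, // offset + size overflowed, so the access cannot be in bounds
    }
}

/// CPU-side analogue of a validated load: returns 0 instead of reading out of bounds.
/// Assumes the buffer is smaller than 4 GiB so its length fits in a `u32`.
pub fn load_u32_or_zero(buffer: &[u8], offset: u32) -> u32 {
    if !access_in_bounds(offset, 4, buffer.len() as u32) {
        return 0;
    }
    let start = offset as usize;
    let b = &buffer[start..start + 4];
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}
```

With an 8-byte buffer, a 4-byte load at offset 6 would pass a start-only check (6 < 8) but is rejected here, because the last two bytes of the access fall outside the buffer.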

Continuing from templated bindless

In part 2 of this series, on the templated bindless abstraction, we showed a way to abstract GPU resource access in order to recursively load any resource type from any piece of memory. Since we have this abstraction layer in place, we can now easily add validation logic inside it for debugging purposes.

We wrap the internal validation logic in macros, so that validation can easily be toggled on or off at compile time without a lot of duplicate code.

#if SHADER_VALIDATION
#define VALIDATE_RESOURCE_WITH_RETURN(writable, expectedTag, handle)                               \
    if (!validateResourceInternal(writable, expectedTag, handle)) {                                \
        return;                                                                                    \
    }
#define VALIDATE_RESOURCE_WITH_RETURN_VALUE(writable, expectedTag, handle, returnType)             \
    if (!validateResourceInternal(writable, expectedTag, handle)) {                                \
        return (returnType)0;                                                                      \
    }
#define VALIDATE_RESOURCE(writable, expectedTag, handle)                                           \
    validateResourceInternal(writable, expectedTag, handle)
#else
#define VALIDATE_RESOURCE_WITH_RETURN(writable, expectedTag, handle)
#define VALIDATE_RESOURCE_WITH_RETURN_VALUE(writable, expectedTag, handle, returnType)
#define VALIDATE_RESOURCE(writable, expectedTag, handle)
#endif

We then call these macros in all of our abstracted function calls, while respecting the return type in the case of a read function.

struct RwRawBuffer {
    RenderResourceHandle handle;

    template < typename RWStructure > RWStructure loadWithOffset(uint offset) {
        // The macro ensures that if validation fails, the target resource will not be read from.
        VALIDATE_RESOURCE_WITH_RETURN_VALUE(kWritable, kBufferResourceTag, this.handle, RWStructure);
        RWByteAddressBuffer buffer = DESCRIPTOR_HEAP(RWByteBufferHandle, this.handle.writeIndex());
        return buffer.Load< RWStructure >(offset);
    }
};

If validation fails, we must make sure that the target resource is not used, in order to avoid a potential page fault, which inevitably leads to device removal. Additionally, if validation fails, we write a small packet containing the validation failure information back to the CPU. Atomics are used here to ensure that only a predetermined number of messages can be written back, to avoid flooding the CPU with millions of messages in case every thread in a shader fails validation.
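
The message cap can be modelled on the CPU with a single atomic counter. This is a minimal sketch, not the actual shader code: `reserve_packet_index` is a hypothetical helper, the limit of 4 is chosen purely for illustration, and the slot numbering assumes that packet 0 holds the counter itself.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Deliberately tiny limit for illustration; a real limit would be much larger.
pub const MAX_ALLOWED_SHADER_MESSAGES: u32 = 4;

/// Reserves a packet slot for one message, or returns `None` once the buffer is full.
/// Packet 0 is assumed to hold the counter, so usable packets start at index 1.
pub fn reserve_packet_index(counter: &AtomicU32) -> Option<u32> {
    // `fetch_add` returns the previous value, so every caller gets a unique slot.
    let slot = counter.fetch_add(1, Ordering::Relaxed);
    if slot < MAX_ALLOWED_SHADER_MESSAGES {
        Some(slot + 1)
    } else {
        // Buffer is full: the message is dropped, but the counter still records the attempt.
        None
    }
}
```

Because the counter keeps increasing past the limit, the CPU can tell that messages were dropped, which fits the clamp-and-warn handling at the top of a reader like `process_packets`.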

Example shader

Of course, we want to ensure that the bindless validation always works as intended. This would require quite an extensive test suite to cover all of the potential resource types and validation cases, but for the sake of this blog post we will look at a small example shader that covers the most basic validation cases.

#include "breda-render-backend-api::templated_bindless.hlsl"
 
#define TEST_CASE_ALL_PASS 0
#define TEST_CASE_VERSION_FAILURE 1
#define TEST_CASE_WRITABLE_FAILURE 2
#define TEST_CASE_TAG_MISMATCH_FAILURE 3
 
struct Bindings {
    SimpleBuffer validBufferContainingInvalidHandle;
    Texture readOnlyTexture;
    RwTexture RwTextureOutput;
};
 
[numthreads(1, 1, 1)]
void main(int threadId: SV_DispatchThreadID)
{
    Bindings bnd = loadBindings<Bindings>();
    uint testCase = g_bindingsOffset.userData0;
 
    uint sentryValue = 0;
 
    // Test validation functionality
    if(testCase == TEST_CASE_ALL_PASS) {
        // All of the validation checks should pass.
        sentryValue += bnd.readOnlyTexture.load2D<uint>(threadId);
        sentryValue += bnd.validBufferContainingInvalidHandle.load<uint>();
        bnd.RwTextureOutput.store2D<uint>(threadId, sentryValue);
    } else if(testCase == TEST_CASE_VERSION_FAILURE) {
        // Test against invalid resource handle
        ArrayBuffer invalidBuffer = bnd.validBufferContainingInvalidHandle.load<ArrayBuffer>();
        sentryValue += invalidBuffer.load<uint>(0);
 
        // Ensure the value is actually consumed to avoid cases being compiled out.
        bnd.RwTextureOutput.store2D<uint>(threadId, sentryValue);
    } else if (testCase == TEST_CASE_WRITABLE_FAILURE) {
        // Test resource writability on a read-only texture.
        RwTexture invalidTexture = (RwTexture)bnd.readOnlyTexture;
 
        invalidTexture.store2D<float>(threadId, sentryValue);
    } else if (testCase == TEST_CASE_TAG_MISMATCH_FAILURE) {
        // Test if resource tag matches.
        RwArrayBuffer invalidRwBuffer = (RwArrayBuffer)bnd.RwTextureOutput;
        sentryValue += invalidRwBuffer.load<uint>(0);
 
        // Ensure the value is actually consumed to avoid cases being compiled out.
        bnd.RwTextureOutput.store2D<uint>(threadId, sentryValue);
    }
}

If we run the unit test with the test case set to TEST_CASE_ALL_PASS, the shader passes all validation checks. If we run any of the other test cases, we get the expected assert on the CPU side. These failures tell the user which pipeline failed validation, along with some metadata on the handle that caused the failure. Ideally we would like to print the resource names here as well, but that quickly becomes too messy and complex to communicate back to the CPU.

Conclusion

All of this validation logic inevitably costs some performance, but the cost tends to remain manageable. We took our shading kernel as a reference point, as it should be one of the kernels most affected by enabling bindless GPU validation. This shading kernel went from ~250μs to approximately 450μs after enabling validation, an increase of roughly 80%.

Additionally, we measured the total frame time with validation enabled versus disabled. Without validation the average frame time was 8.2ms, while running with validation enabled increased it to 13ms, an increase of roughly 59%. Both Vulkan and Dx12 appear to be equally affected by this GPU validation layer.

One of the big benefits of this validation system is that GPU validation can be toggled on or off on a per-shader basis. Say the rendering pipeline has some form of TDR logger, or something like DRED (in Dx12), and a TDR occurs that points to a specific pipeline. Validation can then be enabled exclusively within the individual shaders of this pipeline. This would even allow for validating specific code blocks within any shader.

The validation techniques covered here keep the rendering pipeline interactive while debugging potential issues, and allow for global or local shader validation, resulting in a relatively painless bindless debugging experience.
