We will mainly cover the basics of why and how to set up a bindless pipeline in D3D12 and Vulkan. This includes setting up a bindless descriptor heap, root signatures/pipeline layouts, descriptor management, and communicating descriptor indices to the GPU. Rust will be used for code examples.
In part 2 of this series we will cover a modern approach to bindless on the GPU using the recently added template support. It will also cover how to emulate ResourceDescriptorHeap functionality on Vulkan until it is officially supported.
Finally in part 3 we will cover a lot of necessary bindless GPU validation. This includes resource versioning, resource tagging, and checking for appropriate resource creation flags.
Why bindless?
When traditionally binding resources for each draw call using descriptor staging, setup and management code tends to get relatively complex. Besides code complexity, performance also plays an important part. With the old `bindful` model, you would first need to stage descriptors from a CPU descriptor heap to a smaller GPU descriptor heap, then set your root signature, and finally bind the correct pipeline state. All of these calls have some overhead, some more than others, but more importantly, they require much more information about how a pipeline is structured, which inevitably leads to complex code. With bindless, all pipelines share the same descriptor heap and pipeline resource layout.
Besides the reduced need for pipeline-specific information, it also becomes a lot more intuitive for the user: indexing into a piece of memory that describes a resource makes a lot more sense than the several indirections and unnecessary restrictions of the old `bindful` method.
To summarize, we only need to worry about a single piece of memory that points to our resources. Another benefit is that all pipelines share the same layout, which results in a simplified rendering pipeline.
Bindless rendering does not only consist of benefits, however; the technique introduces several constraints, which I will cover throughout this series.
Render resource handles
With the bindless technique that we're going to cover in this post, a RenderResourceHandle is introduced as an API-agnostic descriptor handle. This handle maps directly to an index within the bindless heap, along with various additional validation information. These handles are exclusively created during resource creation.
#[derive(Copy, Clone, Eq, PartialEq, Hash)]
#[repr(transparent)]
pub struct RenderResourceHandle(u32);
With the recently added ResourceDescriptorHeap in SM 6.6, we can now easily index resources using a simple uint. In reality, it is very unlikely that we need all 32 bits to index the descriptor heap; the spare bits allow us to do more than just index descriptors, such as validating the RenderResourceHandle version, the resource type, and the resource access type. Bindless GPU validation will be covered in part 3 of this series, so for now we will treat the handle as a simple uint index.
impl RenderResourceHandle {
pub fn new(_version: u8, _tag: RenderResourceTag, index: u32, _access_type: AccessType) -> Self {
Self(index)
}
}
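Even though we treat the handle as a plain index for now, it is worth seeing how the spare bits could be partitioned. The following is a hypothetical layout for illustration only; the bit counts and field order are assumptions, not Breda's actual split, which part 3 covers.

```rust
// Hypothetical bit layout for a 32-bit handle (illustrative, not Breda's):
// [ version: 6 bits | tag: 2 bits | access: 1 bit | index: 23 bits ]
#[derive(Copy, Clone, Eq, PartialEq, Hash, Debug)]
#[repr(transparent)]
pub struct PackedHandle(u32);

impl PackedHandle {
    const INDEX_BITS: u32 = 23;
    const ACCESS_SHIFT: u32 = Self::INDEX_BITS;
    const TAG_SHIFT: u32 = Self::ACCESS_SHIFT + 1;
    const VERSION_SHIFT: u32 = Self::TAG_SHIFT + 2;

    pub fn new(version: u32, tag: u32, access: u32, index: u32) -> Self {
        // The index must fit in the bits left over after the metadata fields.
        debug_assert!(index < (1 << Self::INDEX_BITS));
        Self(
            (version << Self::VERSION_SHIFT)
                | (tag << Self::TAG_SHIFT)
                | (access << Self::ACCESS_SHIFT)
                | index,
        )
    }

    /// The raw descriptor heap index, stripped of validation metadata.
    pub fn index(self) -> u32 {
        self.0 & ((1 << Self::INDEX_BITS) - 1)
    }

    /// The handle version, used later for stale-handle validation.
    pub fn version(self) -> u32 {
        self.0 >> Self::VERSION_SHIFT
    }
}
```

Even with 23 index bits this sketch still addresses over 8 million descriptors, far more than typical heap sizes.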
D3D12 setup
Descriptor heap setup
D3D12 separates descriptor heaps into four categories: RTV, DSV, CBV_SRV_UAV, and SAMPLER. Of these, only two heap types can be used in our shaders: CBV_SRV_UAV and SAMPLER. In our case, with an API-agnostic render pipeline design in mind, we decided to keep support as wide as possible across devices and use static samplers instead of creating a bindless sampler setup (Vulkan in particular has some constraints regarding device limits).
This leaves us with setting up a bindless system for the CBV_SRV_UAV descriptor types, which will be used to index resources in shaders. If a bindless sampler approach is needed for a more specialized renderer, the exact same technique can be used as well.
CBV, SRV, and UAV descriptors share the same descriptor heap within the D3D12 API, which makes descriptors especially convenient to manage; all resource descriptors also share the same descriptor size, so descriptors can be directly indexed within a single GPU-visible descriptor heap. With the introduction of ResourceDescriptorHeap in SM6.6, this heap can be indexed directly from the GPU without having to set up descriptor ranges.
Note that actual descriptor sizes may vary per descriptor type, but the D3D12 API rounds them up to a constant size for each descriptor in the CBV_SRV_UAV heap.
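On the CPU side, indexing the heap is then plain pointer arithmetic: the per-descriptor increment is queried once from the device (via ID3D12Device::GetDescriptorHandleIncrementSize), and descriptor n lives at heap start plus n times the increment. A minimal sketch of that arithmetic, with a hand-rolled handle type standing in for the API struct and the increment passed in rather than queried:

```rust
// Sketch of CPU-side descriptor addressing. In real code the increment comes
// from ID3D12Device::GetDescriptorHandleIncrementSize; here it is a parameter.
#[derive(Copy, Clone, Debug, PartialEq)]
struct CpuDescriptorHandle {
    ptr: usize,
}

fn descriptor_at(
    heap_start: CpuDescriptorHandle,
    increment: usize,
    index: usize,
) -> CpuDescriptorHandle {
    CpuDescriptorHandle {
        // Every descriptor in the CBV_SRV_UAV heap occupies the same stride,
        // which is what makes direct indexing possible.
        ptr: heap_start.ptr + index * increment,
    }
}
```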
Now that we have a way to dynamically index our descriptor heap for resource descriptors, we can set up the actual descriptor heap. Most modern GPUs support large descriptor heaps (1,000,000+ descriptors). Check the D3D12 hardware tiers to ensure the descriptor heap is created according to the resource binding tier of the device the application runs on.
For the bindless heap, we want to create a CBV_SRV_UAV heap that is shader-visible; with bindless we no longer need descriptor staging.
// Ensure the descriptor_count is below the device supported number before creating the heap
let heap_desc = d3d12::D3D12_DESCRIPTOR_HEAP_DESC {
Type: d3d12::D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV,
NumDescriptors: descriptor_count,
Flags: d3d12::D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE,
NodeMask: 0,
};
Working exclusively with a shader-visible heap adds constraints for some niche D3D12 calls that require non-shader-visible CPU handles, for example ClearUnorderedAccessViewUint.
Root signature setup
With the concept of using a single GPU visible descriptor heap to index our resources, a root signature needs to be set up in such a way that all the shaders can index the bindless descriptor heap. There are two ways to approach this:
- The pre-SM6.6 method creates a descriptor table consisting of descriptor ranges for each resource type that is being used.
- SM6.6 allows for a practically empty root signature, as only root signature flags are required. (Note that Resource Binding Tier 3 is required.)
Finally, a RenderResourceHandle is set using root constants to tell shaders what resources they need to use. This way of communicating resource indices is used both in the pre-SM6.6 and SM6.6 bindless approaches.
let bindless_params = unsafe {
// Push Constants
let mut push_constants = d3d12::D3D12_ROOT_PARAMETER1 {
ParameterType: d3d12::D3D12_ROOT_PARAMETER_TYPE_32BIT_CONSTANTS,
ShaderVisibility: d3d12::D3D12_SHADER_VISIBILITY_ALL,
..Default::default()
};
let param_u = push_constants.u.Constants_mut();
param_u.RegisterSpace = BindlessTableType::PushConstants.space_index() as u32;
param_u.ShaderRegister = 0;
param_u.Num32BitValues = PushConstantSlots::num_push_constant_slots(is_debug) as u32;
vec![push_constants]
};
// Note that unlike VK, static samplers do not cause a bindless offset for the set they are bound to.
let static_samplers = [
Self::create_static_sampler(
d3d12::D3D12_FILTER_MIN_MAG_MIP_POINT,
d3d12::D3D12_TEXTURE_ADDRESS_MODE_WRAP,
0,
None,
),
// Add more static samplers if needed.
];

An important subject to think about is what tradeoffs to make when sending render resource handles to the GPU. On one hand, you can pre-determine the maximum number of resources indexed in any shader and set the number of root constants accordingly. Instead of limiting the number of resources being indexed, you can also create buffers containing render resource handles, adding more flexibility at the cost of an extra indirection. In short, you have two options:
- A maximum number of handles is determined that you will use in a shader.
- Instead of pre-determining the maximum number of handles, a buffer containing render resource handles can be made instead, only requiring a single push constant at the cost of an added indirection. For our framework Breda we chose this option.
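For the second option, the CPU side reduces to flattening the handles a dispatch needs into a u32 array, uploading it as a buffer, and pushing only that buffer's own handle as the root constant. A sketch of the flattening step (the upload itself is API-specific and omitted; the wrapper type mirrors the RenderResourceHandle defined earlier):

```rust
// Mirrors the transparent u32 wrapper introduced earlier in this post.
#[derive(Copy, Clone)]
#[repr(transparent)]
struct RenderResourceHandle(u32);

// Flatten the handles a dispatch needs into the payload for a handle buffer.
// The resulting Vec would be copied into a GPU-visible buffer; that buffer's
// own RenderResourceHandle then becomes the single root/push constant.
fn build_handle_buffer(bindings: &[RenderResourceHandle]) -> Vec<u32> {
    bindings.iter().map(|h| h.0).collect()
}
```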
Once the root signature is created, it will be used to create every Pipeline State Object and State Object in the render pipeline, as every shader will now index the descriptor heap directly. An important thing to note for SM6.6 is that the D3D12_ROOT_SIGNATURE_FLAG_CBV_SRV_UAV_HEAP_DIRECTLY_INDEXED flag is required for all pipelines that use descriptor heap indexing.
let desc_1_1 = unsafe { desc.u.Desc_1_1_mut() };
desc_1_1.NumParameters = bindless_params.len() as u32;
desc_1_1.pParameters = bindless_params.as_ptr();
desc_1_1.NumStaticSamplers = static_samplers.len() as u32;
desc_1_1.pStaticSamplers = static_samplers.as_ptr().cast();
desc_1_1.Flags = d3d12::D3D12_ROOT_SIGNATURE_FLAG_CBV_SRV_UAV_HEAP_DIRECTLY_INDEXED;
When using DXR 1.0, local root signatures are no longer needed; the global root signature alone already allows us to retrieve any resource we might need.
Shader model 6.5 and lower
This section shows the difference between the modern SM6.6 approach and the pre-SM6.6 one.
Just like the SM6.6 approach, the SM6.5 and lower root signature is also shared between all pipelines, but requires some extra information to be provided to the API in order to make it all work.
One of the big constraints of DXIL and D3D12 is that registers cannot overlap. To work around this, simply assign a register space to each descriptor table, all starting at BaseShaderRegister 0. We also indicate that the descriptors and the data behind them are volatile. Besides a significant difference in shader code, this is the only big CPU-side difference between SM6.6 and pre-SM6.6 bindless rendering.
// RWTexture2D
let param_u = unsafe {
bindless_params[BindlessTableType::RwTexture2d.space_index()]
.u
.DescriptorTable_mut()
};
param_u.NumDescriptorRanges = 1;
param_u.pDescriptorRanges = &d3d12::D3D12_DESCRIPTOR_RANGE1 {
RangeType: d3d12::D3D12_DESCRIPTOR_RANGE_TYPE_UAV,
NumDescriptors: std::u32::MAX,
BaseShaderRegister: 0,
// Assign a unique register space for each resource type.
RegisterSpace: BindlessTableType::RwTexture2d.space_index() as u32,
Flags: d3d12::D3D12_DESCRIPTOR_RANGE_FLAG_DESCRIPTORS_VOLATILE
| d3d12::D3D12_DESCRIPTOR_RANGE_FLAG_DATA_VOLATILE,
OffsetInDescriptorsFromTableStart: 0,
};
The catch of this technique is that we cannot use ResourceDescriptorHeap, but we can declare unbounded arrays of resources at different register spaces to achieve the same thing. Since all of these ranges start at index 0 and point into the same descriptor heap, resource indexing works essentially the same as with ResourceDescriptorHeap. The main difference is that you need to be very explicit about which resource type you access, as indexing the wrong resource type will result in a crash or undefined behavior. Note that for every resource type, a new descriptor range needs to be declared.
Managing bindless descriptors
If we follow the idea behind bindless rendering, where every resource handle is an index into the descriptor heap, we need to solve the problem of SRV and UAV descriptors being separate descriptors. To solve this in D3D12, we introduce a descriptor pair.
pub struct RenderResourceHandlePair {
pub srv: RenderResourceHandle,
pub uav: RenderResourceHandle,
}
The idea behind a RenderResourceHandlePair is quite simple: we allocate the SRV at descriptor index [n] and the associated UAV at descriptor index [n + 1]. This way a single handle can represent both the SRV and UAV descriptors without needing to keep track of two separate handles. The handle is later interpreted in shader code for read (heap index [n]) or write (heap index [n + 1]) operations.
pub fn allocate_buffer_handle_pair(&self) -> RenderResourceHandlePair {
self.pool[d3d12::D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV as usize]
.allocate_descriptor_pair(RenderResourceTag::Buffer)
}
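Under the hood, such a pair allocator can be as simple as a counter that advances by two per allocation, so the SRV always lands at slot n and its UAV at n + 1. A sketch of that scheme (names are illustrative, not Breda's exact internals):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Sketch: hand out (srv, uav) descriptor indices as adjacent slots [n, n + 1].
struct PairAllocator {
    next_index: AtomicU32,
}

impl PairAllocator {
    fn new() -> Self {
        Self {
            next_index: AtomicU32::new(0),
        }
    }

    /// Returns (srv_index, uav_index) for a freshly allocated pair.
    /// Advancing by 2 keeps the SRV/UAV adjacency invariant that the
    /// shader-side readIndex()/writeIndex() split relies on.
    fn allocate_pair(&self) -> (u32, u32) {
        let srv = self.next_index.fetch_add(2, Ordering::Relaxed);
        (srv, srv + 1)
    }
}
```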
When resources are dropped, we want to recycle the associated stale descriptor handles within our bindless heap; we do this by pushing the stale descriptors onto a VecDeque. Note that recycling may only happen once the stale descriptors are no longer in flight on the GPU.
pub(crate) fn retire_handle(&self, handle: RenderResourceHandle) {
self.available_recycled_descriptors
.lock()
.unwrap()
.push_back(handle);
}
If a recycled descriptor is available, we want to ensure that stale descriptors are reused in FIFO order; in the case of more resources being dropped than allocated, we could otherwise end up with unused descriptor indices that are never reclaimed.
pub fn allocate_descriptor_pair(&self, tag: RenderResourceTag) -> RenderResourceHandlePair {
self.available_recycled_descriptors
.lock()
.unwrap()
.pop_front()
.map_or_else(
|| {
// No recycled descriptors available.
let descriptor_idx = self.increment_descriptor_pair();
RenderResourceHandlePair::new(0, tag, descriptor_idx)
},
|recycled_handle| {
// Use old descriptor, update descriptor version and resource tag for validation.
RenderResourceHandlePair::new_from(
recycled_handle.bump_version_and_update_tag(tag),
)
},
)
}
Preparing the command buffer
Unlike bindful, we only need to set our descriptor heap and root signature once per command list, when the command list starts recording. It is important to set the descriptor heap before setting the bindless graphics and compute root signatures; this ordering constraint, introduced with SM6.6, ensures that the correct heap pointers are available when the root signatures are set. Also make sure to only set the graphics root signature when recording on a D3D12_COMMAND_LIST_TYPE_DIRECT command list.
pub fn begin(&mut self) {
if self.command_list_type == d3d12::D3D12_COMMAND_LIST_TYPE_DIRECT
|| self.command_list_type == d3d12::D3D12_COMMAND_LIST_TYPE_COMPUTE
{
unsafe {
let cbv_srv_uav_heap_handle = self.descriptor_pool.pool
[d3d12::D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV as usize]
.handle
.as_ptr();
self.cmd()
.SetDescriptorHeaps(1,
[cbv_srv_uav_heap_handle]
.as_mut_ptr()
.cast());
self.cmd()
.SetComputeRootSignature(self.descriptor_pool.global_rs);
if self.command_list_type == d3d12::D3D12_COMMAND_LIST_TYPE_DIRECT {
self.cmd()
.SetGraphicsRootSignature(self.descriptor_pool.global_rs);
}
}
}
}
Vulkan setup
Descriptorset setup
First off, we need to make sure the correct VkPhysicalDeviceDescriptorIndexingFeatures flags are enabled on our device to support this bindless setup. In our case we enable every flag except shaderInputAttachmentArrayDynamicIndexing, shaderInputAttachmentArrayNonUniformIndexing, and descriptorBindingUniformBufferUpdateAfterBind, which are not supported on older hardware.
Note that VK_EXT_descriptor_indexing has been available as an extension since Vulkan 1.0 and was promoted to core in Vulkan 1.2.
It should be noted that Vulkan currently does NOT support ResourceDescriptorHeap yet, but just like the D3D12 implementation, there are a few workarounds to achieve the same behavior or even emulate the ResourceDescriptorHeap’s behavior. In part 2 of this series I will also cover how to emulate ResourceDescriptorHeap for a more unified shader implementation between D3D12 and Vulkan.
Unlike D3D12, Vulkan hides the descriptor pool within the API, which forces users into a bindful descriptor set mindset. This makes bindless slightly more complicated to set up, but not impossible.
The resources we want to use for bindless rendering are buffers, textures, acceleration structures, and samplers, which means using the descriptor types STORAGE_BUFFER, SAMPLED_IMAGE, STORAGE_IMAGE, ACCELERATION_STRUCTURE_KHR, and SAMPLER. The Vulkan spec only guarantees a minimum of 4 bound descriptor sets (maxBoundDescriptorSets), although current desktop GPUs that support ray tracing guarantee a minimum of 32. Since we want to set up 5 descriptor sets, this forces us to make compromises to guarantee support on the widest variety of devices. In our case we decided to drop bindless samplers and go for immutable samplers, limiting our bindless sets to the guaranteed minimum of 4.
Now that we are aware of the constraints we can proceed to setting up the VkDescriptorPool.
// Returns all descriptor pool sizes, SAMPLED_IMAGE will have extra space for immutable samplers.
let descriptor_sizes =
BindlessTableType::descriptor_pool_sizes(immutable_samplers.len() as u32);
let descriptor_pool_info = vk::DescriptorPoolCreateInfo::builder()
.pool_sizes(&descriptor_sizes)
.flags(vk::DescriptorPoolCreateFlags::UPDATE_AFTER_BIND)
.max_sets(4);
For each descriptor type we set up a VkDescriptorPoolSize with the descriptorCount set to the bindless pool size. The maximum supported count of each descriptor type can be retrieved from the VkPhysicalDeviceLimits. In our case we also associate immutable samplers with our SAMPLED_IMAGE pool size, so we need to also take these descriptors into account.
pub fn descriptor_pool_sizes(immutable_sampler_count: u32) -> Vec<vk::DescriptorPoolSize> {
let mut type_histogram = std::collections::HashMap::new();
// For each descriptor type, retrieve the bindless size.
for table in Self::all_tables().iter() {
type_histogram
.entry(table.to_vk())
.and_modify(|v| *v += table.table_size())
.or_insert_with(|| table.table_size());
}
// Add immutable sampler descriptors to texture descriptor pool size.
type_histogram
.entry(Self::Texture.to_vk())
.and_modify(|v| *v += immutable_sampler_count);
type_histogram
.iter()
.map(|(ty, descriptor_count)| vk::DescriptorPoolSize {
ty: *ty,
descriptor_count: *descriptor_count,
})
.collect::<Vec<vk::DescriptorPoolSize>>()
}

For each resource type we create a descriptor set layout with the following flags: VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT, VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT, and VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT. Since we are using immutable samplers, we need to assign them to one of the descriptor sets in use. The most sensible choice is to associate them with the SAMPLED_IMAGE descriptor set; this requires the texture binding index to be set to the number of immutable samplers in use, while every other descriptor set starts at binding index 0. Afterwards we simply allocate the descriptor sets using these bindless layouts.
fn create_bindless_layout(
device: &Arc<ash::Device>,
immutable_samplers: &[vk::Sampler],
debug: bool,
) -> (Vec<vk::DescriptorSetLayout>, vk::PipelineLayout) {
let descriptor_layouts = BindlessTableType::all_tables()
.iter()
.enumerate()
.map(|(set_idx, &table)| unsafe {
assert_eq!(table.set_index(), set_idx);
let mut descriptor_binding_flags = vec![
vk::DescriptorBindingFlags::PARTIALLY_BOUND
| vk::DescriptorBindingFlags::VARIABLE_DESCRIPTOR_COUNT
| vk::DescriptorBindingFlags::UPDATE_AFTER_BIND,
];
let mut set = vec![vk::DescriptorSetLayoutBinding {
binding: 0,
descriptor_type: table.to_vk(),
descriptor_count: table.table_size(),
stage_flags: vk::ShaderStageFlags::ALL,
p_immutable_samplers: std::ptr::null(),
}];
if table == BindlessTableType::Texture {
descriptor_binding_flags.push(vk::DescriptorBindingFlags::empty());
// Set texture binding start at the end of the immutable samplers.
set[0].binding = immutable_samplers.len() as u32;
set.push(vk::DescriptorSetLayoutBinding {
binding: 0,
descriptor_type: vk::DescriptorType::SAMPLER,
descriptor_count: immutable_samplers.len() as u32,
stage_flags: vk::ShaderStageFlags::ALL,
p_immutable_samplers: immutable_samplers.as_ptr(),
});
}
let mut ext_flags = vk::DescriptorSetLayoutBindingFlagsCreateInfoEXT::builder()
.binding_flags(&descriptor_binding_flags);
device
.create_descriptor_set_layout(
&vk::DescriptorSetLayoutCreateInfo::builder()
.bindings(&set)
.flags(vk::DescriptorSetLayoutCreateFlags::UPDATE_AFTER_BIND_POOL)
.push_next(&mut ext_flags),
None,
)
.unwrap()
})
.collect::<Vec<_>>();
let num_push_constants = PushConstantSlots::num_push_constant_slots(debug) as u32;
let num_push_constants_sized = std::mem::size_of::<u32>() as u32 * num_push_constants;
let push_constant_range = ash::vk::PushConstantRange {
stage_flags: vk::ShaderStageFlags::ALL,
offset: 0,
size: num_push_constants_sized,
};
let push_constant_ranges = [push_constant_range];
let layout_create_info = vk::PipelineLayoutCreateInfo::builder()
.set_layouts(&descriptor_layouts)
.push_constant_ranges(&push_constant_ranges);
let pipeline_layout = unsafe { device.create_pipeline_layout(&layout_create_info, None) }
.expect("Failed creating pipeline layout.");
(descriptor_layouts, pipeline_layout)
}

For the pipeline layout we use the previously created descriptor set layouts and set up the push constants. To tell the shader which resources it is supposed to index, we use push constants to send render resource handles to the GPU. Just like the D3D12 implementation, there are two options to consider here:
- Determine a maximum number of handles you can bind in a shader and increase the number of push constants by that number.
- Instead of pre-determining the maximum number of handles, a buffer containing render resource handles can be made, only requiring a single push constant but also adding another indirection. We chose this option for our Breda framework.
Managing bindless descriptors
Allocating bindless descriptors is done per heap type; in our case we have 4 descriptor types to allocate: STORAGE_BUFFER, SAMPLED_IMAGE, STORAGE_IMAGE, and ACCELERATION_STRUCTURE_KHR. Unlike D3D12, we do not need descriptor pairs; instead, SAMPLED and STORAGE images share the same descriptor index across heaps, reducing complexity. With an API-agnostic design and potential future API updates in mind, we decided to track resource indices globally instead of per heap. This means the bindless heaps will contain gaps of unused descriptors, which is less optimal than the D3D12 implementation's single bindless heap, and will likely change as soon as official SM6.6 ResourceDescriptorHeap support is added to Vulkan. That said, the bindless descriptor manager still prevents unused reclaimable descriptor indices as much as possible.
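Tracking indices globally simply means that all descriptor types draw from one shared counter, so a handle's index is unique across every set even though each set only has a descriptor written at its own slots. A minimal sketch of such a shared counter (illustrative, not the actual Breda implementation):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Sketch: one counter shared by all Vulkan descriptor sets. A handle's index
// is unique across sets, at the cost of gaps in each individual set.
struct GlobalIndexAllocator {
    next_index: AtomicU32,
}

impl GlobalIndexAllocator {
    fn new() -> Self {
        Self {
            next_index: AtomicU32::new(0),
        }
    }

    /// Reserve the next globally unique descriptor index.
    fn increment_descriptor(&self) -> u32 {
        self.next_index.fetch_add(1, Ordering::Relaxed)
    }
}
```

A single allocator instance would be shared by the buffer, texture, storage image, and acceleration structure heaps alike.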
For each resource, a descriptor update is done on its corresponding descriptor set at dst_binding 0, where we simply use our fetched descriptor handle index as dst_array_element index. Afterwards the descriptor set is updated directly.
pub fn allocate_buffer_handle(
&self,
buffer: vk::Buffer,
) -> RenderResourceHandle {
let handle = self.fetch_available_descriptor(RenderResourceTag::Buffer);
let buffer_info = vk::DescriptorBufferInfo {
buffer,
offset: 0,
range: vk::WHOLE_SIZE,
};
let write = [vk::WriteDescriptorSet {
dst_set: self.sets[BindlessTableType::Buffer.set_index()],
dst_binding: 0,
descriptor_count: 1,
dst_array_element: handle.index(),
descriptor_type: vk::DescriptorType::STORAGE_BUFFER,
p_buffer_info: &buffer_info,
..Default::default()
}];
unsafe {
self.device.update_descriptor_sets(&write, &[]);
};
handle
}
Just like the D3D12 implementation, when dropping resources that are no longer in flight, we want to recycle the associated descriptors; we do this by pushing the stale descriptors onto a VecDeque.
pub fn retire_handle(&self, handle: RenderResourceHandle) {
self.available_recycled_descriptors
.lock()
.unwrap()
.push_back(handle);
}
If a recycled descriptor is available, we want to make sure to reuse the first dropped descriptor. This respects FIFO order and prevents potential fragmentation; in the case of more resources being dropped than allocated, we could otherwise end up with unused descriptor indices that are never reclaimed.
fn fetch_available_descriptor(&self, tag: RenderResourceTag) -> RenderResourceHandle {
self.available_recycled_descriptors
.lock()
.unwrap()
.pop_front()
.map_or_else(
|| RenderResourceHandle::new(0, tag, self.increment_descriptor()),
|recycled_handle| recycled_handle.bump_version_and_update_tag(tag),
)
}
Preparing the command buffer
Now that the descriptor sets have been set up along with the pipeline layout, we can bind these descriptor sets and the pipeline layout once for the graphics, compute, and ray tracing pipeline bind points.
if self.queue_flags.contains(vk::QueueFlags::GRAPHICS) {
    unsafe {
        self.device.cmd_bind_descriptor_sets(
            self.cmd,
            vk::PipelineBindPoint::GRAPHICS,
            self.allocation_handles
                .descriptor_pool
                .bindless_pipeline_layout,
            0,
            self.allocation_handles
                .descriptor_pool
                .bindless_descriptor_sets(),
            &[],
        );
    }
}
if self.queue_flags.contains(vk::QueueFlags::COMPUTE) {
unsafe {
self.device.cmd_bind_descriptor_sets(
self.cmd,
vk::PipelineBindPoint::COMPUTE,
self.allocation_handles
.descriptor_pool
.bindless_pipeline_layout,
0,
self.allocation_handles
.descriptor_pool
.bindless_descriptor_sets(),
&[],
);
if self.optional_handles.ray_tracing_pipeline.is_some() {
self.device.cmd_bind_descriptor_sets(
self.cmd,
vk::PipelineBindPoint::RAY_TRACING_KHR,
self.allocation_handles
.descriptor_pool
.bindless_pipeline_layout,
0,
self.allocation_handles
.descriptor_pool
.bindless_descriptor_sets(),
&[],
);
}
}
}

Using bindless resources
CPU side
To communicate resources to the GPU, we fill a buffer with RenderResourceHandles. During this process we can also immediately track resource states and place appropriate memory barriers where needed. We achieve this with a `DescriptorSetBuilder`, which indicates whether resources are read from or written to, allowing us to determine up front whether resource state tracking is required.
let gtao_set = DescriptorSetBuilder::ephemeral(&shader_db.get_pipeline("gtao"))
.write(0, &self.gtao_target)
.read(1, depth_stencil_target)
.read(2, normal_target)
.read(3, &ssao_constants)
.build(device, dma_enc);
For our framework Breda we aim to do as little resource tracking as possible, relying heavily on resource promotion and decay. If explicit resource tracking is required, however, it can easily be added within this system. The following example is written for D3D12, but the same principle applies to Vulkan. We gather all of the RenderResourceHandles from the DescriptorSetBuilder and put them in an array, which is then pushed to the GPU in the form of a buffer.
let shader_bindings =
bindings
.iter()
.map(|binding| {
binding.resource.as_ref().map_or_else(
RenderResourceHandle::invalid,
|resource| match (resource, &binding.access_type) {
(Resource::TopLevel(tlas), _) => unsafe { tlas.resource_handle() },
(Resource::Buffer(buffer), AccessType::ReadOnly) => unsafe {
buffer.resource_handle()
},
(Resource::Buffer(buffer), AccessType::ReadWrite) => {
// We don't do transitions on buffers, except if the buffer is used for indirect args.
// The transition to indirect_args is done inside the command_buffer when the buffer is passed in as a func parameter.
// But the transition back to a non-indirect arg state is handled by the descriptor set instead.
let dx12_buffer = buffer.downcast_ref::<Dx12Buffer>().unwrap();
if let BufferUsage::IndirectArgs = dx12_buffer.desc.usage {
indirect_args_buffer_state_transitions.push((
dx12_buffer.resource,
d3d12::D3D12_RESOURCE_STATE_UNORDERED_ACCESS,
));
}
unsafe { buffer.resource_handle() }
}
(Resource::Texture(texture), AccessType::ReadOnly) => {
let dx12_texture = texture.downcast_ref::<Dx12Texture>().unwrap();
texture_state_transitions.push((
dx12_texture.resource,
super::command_buffer::TEXTURE_PREFERRED_READ_STATE,
));
unsafe { texture.resource_handle() }
}
(Resource::Texture(texture), AccessType::ReadWrite) => {
let dx12_texture = texture.downcast_ref::<Dx12Texture>().unwrap();
texture_state_transitions.push((
dx12_texture.resource,
d3d12::D3D12_RESOURCE_STATE_UNORDERED_ACCESS,
));
binding.mip_index.map_or_else(
|| unsafe { texture.resource_handle() },
|mip_index| texture.mip_resource_handle(mip_index),
)
}
},
)
})
.collect::<Vec<_>>();

The buffer that contains our handles is a resource itself, which means it also has a RenderResourceHandle. We then use this buffer's handle as our push/root constant for the draw or dispatch call.
This constant is then interpreted by our shader code as a RenderResourceHandle, which allows us to retrieve all of the necessary handles for that specific shader.
Shaders
Now that we have set our resource handle via a root/push constant, we have a way to communicate which resources the GPU should access. As discussed previously, we use a buffer containing an array of RenderResourceHandles to index our descriptor heap. For the sake of this introduction to bindless, we kept the RenderResourceHandle as simple as possible. In parts 2 and 3 of this series we will extend the RenderResourceHandle to do a lot more than just contain the index.
In order to interpret the RenderResourceHandles, we simply unpack them in the shader code and retrieve the handle that we can use to index the arrays of resources.
struct RenderResourceHandle {
uint index;
uint readIndex() { return this.index; }
#ifdef VK_BINDLESS
uint writeIndex() { return this.readIndex(); }
#else
uint writeIndex() {
return this.readIndex() + 1;
}
#endif
};
Now that we have the descriptor heap index, we can simply access the ResourceDescriptorHeap to retrieve the resource. Since this functionality is not yet officially supported on Vulkan, we need some up-front declarations to take hassle away from the user. The same declarations are necessary for the pre-SM6.6 D3D12 bindless approach.
A very big constraint of this technique is that we are currently forced to use `(RW)ByteAddressBuffers` for all of our buffers. When wrapped nicely in strongly typed bindless resources (part 2 of this series), this doesn't matter for user-facing shader code; however, SPIR-V has some issues performing vectorized loads on templated byte buffers. This is not an issue at all on DXIL, but it is an important point to keep in mind.
struct BindingsOffset {
RenderResourceHandle bindingsOffset;
uint userData0;
uint userData1;
uint userData2;
};
// Resource bindings; the same structure applies to any other resource types and their declarations.
// For this example we will just declare a simple bindless Texture2D
#ifdef VK_BINDLESS
[[vk::push_constant]] ConstantBuffer<BindingsOffset> g_bindingsOffset;
[[vk::binding(0, 1)]] SamplerState g_samplerState[NUM_STATIC_SAMPLERS];
[[vk::binding(0, 0)]] ByteAddressBuffer g_byteAddressBuffer[];
[[vk::binding(0, 0)]] RWByteAddressBuffer g_rwByteAddressBuffer[];
#define BINDLESS_TEXTURE_2D_DECL(T) \
[[vk::binding(NUM_STATIC_SAMPLERS, 1)]] Texture2D<T> g_texture2d[]
SamplerState samplerMinMagMipPointWrap() { return g_samplerState[0]; }
#define texture2D(handle) g_texture2d[NonUniformResourceIndex(handle.readIndex())]
#define byteBufferUniform(handle) g_byteAddressBuffer[(handle.readIndex())]
#define rwByteBufferUniform(handle) g_rwByteAddressBuffer[handle.writeIndex()]
#else // D3D12 Bindless
ConstantBuffer<BindingsOffset> g_bindingsOffset : register(b0, space10);
SamplerState g_samplerMinMagMipPointWrap : register(s0, space0);
// This declaration is ignored on D3D12.
#define BINDLESS_TEXTURE_2D_DECL(T)
SamplerState samplerMinMagMipPointWrap() { return g_samplerMinMagMipPointWrap; }
#define texture2D(handle) ResourceDescriptorHeap[NonUniformResourceIndex(handle.readIndex())]
#define byteBufferUniform(handle) ResourceDescriptorHeap[handle.readIndex()]
#define rwByteBufferUniform(handle) ResourceDescriptorHeap[handle.writeIndex()]
#endif // VK_BINDLESS
template <typename T> T loadBindings() {
T result = byteBufferUniform(g_bindingsOffset.bindingsOffset).Load<T>(0);
return result;
}
In the shader we would include our bindless helper and declare the resources we want to use.
#include "breda-render-backend-api::bindless.hlsl"
// Simple bindless copy shader example.
struct Bindings {
RenderResourceHandle input;
RenderResourceHandle output;
};
[numthreads(64, 1, 1)]
void main(int threadId: SV_DispatchThreadID)
{
Bindings bnd = loadBindings<Bindings>();
int output = byteBufferUniform(bnd.input).Load(threadId * sizeof(int));
rwByteBufferUniform(bnd.output).Store(threadId * sizeof(int), output);
}
Next blog
In the next part of this series we will cover how to turn this bindless system into a modern templated bindless pipeline, along with showing some tricks to emulate ResourceDescriptorHeap behavior in Vulkan.
