{"id":31,"date":"2025-01-25T22:28:04","date_gmt":"2025-01-25T21:28:04","guid":{"rendered":"https:\/\/dreamtalon.net\/?p=31"},"modified":"2025-01-25T22:28:04","modified_gmt":"2025-01-25T21:28:04","slug":"on-using-vrs-as-an-antialiasing-optimization","status":"publish","type":"post","link":"https:\/\/dreamtalon.net\/?p=31","title":{"rendered":"On using VRS as an antialiasing optimization"},"content":{"rendered":"\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"has-medium-font-size\"><strong>An idea I came up with during a few spare moments at work.<\/strong><\/p>\n\n\n\n<p>How about using VRS as a means to optimize SSAA?<br>SSAA is known to be very slow, because the render targets are 4x larger and there is 4x more to render.<br>More precisely, the cost comes from running 4x more costly fragment shader invocations, which VRS could in principle mitigate.<\/p>\n\n\n\n<p>When the VRS (Variable Rate Shading) extension is enabled, a single fragment shader invocation can cover, for example, a 2&#215;2 tile of 4 pixels.<br>This means you could save some render time by reducing the number of very costly fragment shader invocations.<br>I would like to refer to this optimization method as VRSAA.<\/p>\n\n\n\n<p class=\"has-medium-font-size\"><strong>Now the details with images and statistics of SSAA and VRS:<\/strong><\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-small-font-size\">rt4x<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"550\" height=\"800\" src=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-3.png\" alt=\"\" class=\"wp-image-42\" style=\"object-fit:cover\" srcset=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-3.png 550w, 
https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-3-206x300.png 206w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-small-font-size\">rt4x vrs<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"550\" height=\"800\" src=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-vrs-3.png\" alt=\"\" class=\"wp-image-43\" style=\"object-fit:cover\" srcset=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-vrs-3.png 550w, https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-vrs-3-206x300.png 206w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-small-font-size\">rt4x -> resize to swapchain<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"550\" height=\"800\" src=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-swapchain.png\" alt=\"\" class=\"wp-image-48\" srcset=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-swapchain.png 550w, https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-swapchain-206x300.png 206w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-small-font-size\">rt4x vrs -> resize to swapchain<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"550\" height=\"800\" src=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-vrs-swapchain.png\" alt=\"\" class=\"wp-image-49\" 
srcset=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-vrs-swapchain.png 550w, https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-vrs-swapchain-206x300.png 206w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<p>There is hardly any visible difference after resizing and copying the render target image onto the swapchain image.<br>Here are the same settings, but with an intermediate FXAA pass:<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-small-font-size\">rt4x fxaa<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"550\" height=\"800\" src=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-fxaa.png\" alt=\"\" class=\"wp-image-44\" srcset=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-fxaa.png 550w, https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-fxaa-206x300.png 206w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-small-font-size\">rt4x vrs fxaa<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"550\" height=\"800\" src=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-vrs-fxaa.png\" alt=\"\" class=\"wp-image-45\" srcset=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-vrs-fxaa.png 550w, https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-vrs-fxaa-206x300.png 206w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow 
wp-block-column-is-layout-flow\">\n<p class=\"has-small-font-size\">rt4x fxaa -> resize to swapchain<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"550\" height=\"800\" src=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-fxaa-swapchain.png\" alt=\"\" class=\"wp-image-46\" srcset=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-fxaa-swapchain.png 550w, https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-fxaa-swapchain-206x300.png 206w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-small-font-size\">rt4x vrs fxaa -> resize to swapchain<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"550\" height=\"800\" src=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-vrs-fxaa-swapchain.png\" alt=\"\" class=\"wp-image-47\" srcset=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-vrs-fxaa-swapchain.png 550w, https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/rt2x-vrs-fxaa-swapchain-206x300.png 206w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<p>Full image with VRSAA + FXAA (click to view)<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"3440\" height=\"1440\" src=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/vrsaa_swapchain_full.png\" alt=\"\" class=\"wp-image-50\" srcset=\"https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/vrsaa_swapchain_full.png 3440w, https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/vrsaa_swapchain_full-300x126.png 300w, https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/vrsaa_swapchain_full-1024x429.png 1024w, https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/vrsaa_swapchain_full-768x321.png 768w, 
https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/vrsaa_swapchain_full-1536x643.png 1536w, https:\/\/dreamtalon.net\/wp-content\/uploads\/2025\/01\/vrsaa_swapchain_full-2048x857.png 2048w\" sizes=\"auto, (max-width: 3440px) 100vw, 3440px\" \/><\/figure>\n\n\n\n<p>The following statistics were measured on an RTX 3070 Laptop GPU, Nvidia driver ver. 565.77 on Linux, at a swapchain resolution of 3440&#215;1440.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>Render method<\/td><td>Average frame time<\/td><\/tr><tr><td>No AA<\/td><td>1.002ms<\/td><\/tr><tr><td>FXAA<\/td><td>1.466ms<\/td><\/tr><tr><td>SSAA<\/td><td>2.392ms<\/td><\/tr><tr><td>SSAA + FXAA<\/td><td>3.846ms<\/td><\/tr><tr><td>VRSAA<\/td><td>2.347ms<\/td><\/tr><tr><td>VRSAA + FXAA<\/td><td>3.704ms<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>An overall gain of about 0.05ms per frame (2.392ms with SSAA versus 2.347ms with VRSAA, roughly 1.9%) does not mean much, especially when the scene has only 30 draw calls with very simple fragment shaders.<br>Extrapolating from this data, if my frame time were 33ms (about 30 FPS) with SSAA, the same relative gain would be about 0.62ms per frame, which translates to an improvement of roughly 0.6 FPS (1000 \/ 33 &#8776; 30.3 FPS versus 1000 \/ 32.38 &#8776; 30.9 FPS).<br>However, in a more complex scene with more expensive shaders such as PBR, there is a chance the results could be quite different.<\/p>\n\n\n\n<p><strong>As a side observation worth mentioning:<\/strong><br>If you have FXAA enabled, or possibly another antialiasing method, you could also enable VRS 2&#215;2 tiling, as the difference in the final result of these two features combined could be very close to unnoticeable in scenes with motion (i.e. regular gameplay).<br><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><strong>Extra details on how to enable VRS in Vulkan:<\/strong><\/p>\n\n\n\n<p>To enable VRS, you have to query for and enable the <code>VK_KHR_fragment_shading_rate<\/code> extension (named by the <code>VK_KHR_FRAGMENT_SHADING_RATE_EXTENSION_NAME<\/code> macro) on your device.<\/p>\n\n\n\n<pre class=\"wp-block-code 
has-small-font-size\"><code>\/\/ requires &lt;algorithm&gt;, &lt;cstring&gt;, &lt;vector&gt; and &lt;vulkan\/vulkan.h&gt;\n\/\/ the first call queries the number of device extensions, the second call fetches them\nuint32_t count = 0;\nvkEnumerateDeviceExtensionProperties( thePhysicalDevice, nullptr, &amp;count, nullptr );\nstd::vector&lt;VkExtensionProperties&gt; existingExtensions( count );\nvkEnumerateDeviceExtensionProperties( thePhysicalDevice, nullptr, &amp;count, existingExtensions.data() );\n\n\/\/ matches the VK_KHR_fragment_shading_rate entry in the extension list\nauto vrscmp = &#91;]( const auto&amp; prop )\n{\n    return std::strcmp( prop.extensionName, VK_KHR_FRAGMENT_SHADING_RATE_EXTENSION_NAME ) == 0;\n};\nbool vrsAvailable = std::ranges::find_if( existingExtensions, vrscmp ) != existingExtensions.end();\n\nstd::vector&lt;const char*&gt; enableExtensions{ \/* ... other extensions to use ... *\/ };\nif ( vrsAvailable ) enableExtensions.emplace_back( VK_KHR_FRAGMENT_SHADING_RATE_EXTENSION_NAME );<\/code><\/pre>\n\n\n\n<p>If the VRS extension is present, you have to pass a <code>VkPhysicalDeviceFragmentShadingRateFeaturesKHR<\/code> structure in the <code>VkDeviceCreateInfo::pNext<\/code> field (or in the chain of <code>.pNext<\/code> fields of subsequent structs).<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code>\/\/ enable only the features that vkGetPhysicalDeviceFeatures2 reports as supported;\n\/\/ requesting an unsupported feature makes vkCreateDevice fail\nVkPhysicalDeviceFragmentShadingRateFeaturesKHR vrsInfo{\n    .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FRAGMENT_SHADING_RATE_FEATURES_KHR,\n    .pipelineFragmentShadingRate = VK_TRUE,\n    .primitiveFragmentShadingRate = VK_TRUE,\n    .attachmentFragmentShadingRate = VK_TRUE,\n};\n\nVkDeviceCreateInfo deviceCreateInfo{\n    .sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,\n    .pNext = vrsAvailable ? &amp;vrsInfo : nullptr, \/* ... nullptr or other create infos you wish ... *\/\n    \/* ... rest of your device create info fields, including enableExtensions via ppEnabledExtensionNames ... *\/\n};\nvkCreateDevice( thePhysicalDevice, &amp;deviceCreateInfo, nullptr, &amp;theLogicalDevice );<\/code><\/pre>\n\n\n\n<p>From here there are two ways to proceed. 
You can do either, or both if you desire:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enable VRS as a pipeline property<\/li>\n\n\n\n<li class=\"has-medium-font-size\">Enable VRS as a pipeline dynamic state and call <code>vkCmdSetFragmentShadingRateKHR<\/code> at least once before the draw call<\/li>\n<\/ul>\n\n\n\n<p class=\"has-medium-font-size\"><strong>VRS as pipeline property:<\/strong><br>When creating the pipeline, add a <code>VkPipelineFragmentShadingRateStateCreateInfoKHR<\/code> structure to the <code>VkGraphicsPipelineCreateInfo::pNext<\/code> field (or to the chain of <code>.pNext<\/code> fields of subsequent structs).<br>This makes the pipeline use the specified VRS tile size for every draw call, with no further input required.<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code>VkPipelineFragmentShadingRateStateCreateInfoKHR vrs{\n    .sType = VK_STRUCTURE_TYPE_PIPELINE_FRAGMENT_SHADING_RATE_STATE_CREATE_INFO_KHR,\n    \/\/ one fragment shader invocation per 2x2 pixel tile\n    .fragmentSize{ 2, 2 },\n    \/\/ KEEP, KEEP: use the pipeline rate, ignoring per-primitive and attachment rates\n    .combinerOps{ VK_FRAGMENT_SHADING_RATE_COMBINER_OP_KEEP_KHR, VK_FRAGMENT_SHADING_RATE_COMBINER_OP_KEEP_KHR },\n};\nVkGraphicsPipelineCreateInfo pipelineInfo{\n    .sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,\n    .pNext = vrsAvailable ? &amp;vrs : nullptr, \/* ... nullptr or other create infos you wish ... *\/\n    \/* ... rest of your pipeline create info fields ... 
*\/\n};\nvkCreateGraphicsPipelines( theLogicalDevice, VK_NULL_HANDLE, 1, &amp;pipelineInfo, nullptr, &amp;thePipeline );<\/code><\/pre>\n\n\n\n<p class=\"has-medium-font-size\"><strong>VRS as dynamic state:<\/strong><br>When creating the pipeline, add <code>VK_DYNAMIC_STATE_FRAGMENT_SHADING_RATE_KHR<\/code> to the collection of enabled dynamic states.<br>This makes the pipeline use the VRS tile size set during command buffer recording, which means you have to call <code>vkCmdSetFragmentShadingRateKHR<\/code> at least once during recording, before any draw call.<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code>std::vector&lt;VkDynamicState&gt; enabledDynamicStates{ \/* ... other enabled dynamic states ... *\/ };\n\/\/ refer to the device code above to see the vrsAvailable variable\nif ( vrsAvailable ) enabledDynamicStates.emplace_back( VK_DYNAMIC_STATE_FRAGMENT_SHADING_RATE_KHR );\n\nVkPipelineDynamicStateCreateInfo dynamicState{\n    .sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO,\n    .dynamicStateCount = static_cast&lt;uint32_t&gt;( enabledDynamicStates.size() ),\n    .pDynamicStates = enabledDynamicStates.data(),\n};\nVkGraphicsPipelineCreateInfo pipelineInfo{\n    .sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,\n    .pDynamicState = &amp;dynamicState,\n    \/* ... rest of your pipeline create info fields ... 
*\/\n};\n\nvkCreateGraphicsPipelines( theLogicalDevice, VK_NULL_HANDLE, 1, &amp;pipelineInfo, nullptr, &amp;thePipeline );<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code>\/\/ call while recording the command buffer, before the first draw call that uses the pipeline\nconst VkExtent2D vrs{ 2, 2 };\nconst VkFragmentShadingRateCombinerOpKHR combiner&#91;]{\n    VK_FRAGMENT_SHADING_RATE_COMBINER_OP_KEEP_KHR,\n    VK_FRAGMENT_SHADING_RATE_COMBINER_OP_KEEP_KHR,\n};\nvkCmdSetFragmentShadingRateKHR( cmd, &amp;vrs, combiner );<\/code><\/pre>\n\n\n\n<p>However, please note that <code>vkCmdSetFragmentShadingRateKHR<\/code> is an extension function and may not be exported by your Vulkan library. In that case your program may fail to link, or crash long before it reaches <code>main()<\/code>.<br>To avoid this, resolve the function pointer manually at runtime from your Vulkan instance with <code>vkGetInstanceProcAddr<\/code> (or from your device with <code>vkGetDeviceProcAddr<\/code>) and call the function through that pointer.<\/p>\n\n\n\n<p><strong>A reference implementation of VRS is available at:<\/strong><br><a href=\"https:\/\/github.com\/xmaciek\/starace\/commit\/4b01ab86127f702dcf8fec0da84a0053f1ef1b8f\">https:\/\/github.com\/xmaciek\/starace\/commit\/4b01ab86127f702dcf8fec0da84a0053f1ef1b8f<\/a><br><\/p>\n","protected":false},"excerpt":{"rendered":"<p>An idea I came up with during a few spare moments at work. 
How about using VRS as a means to optimize SSAA? SSAA is known to be very slow due to the need to render 4x more as render targets are 4x larger. Or rather having 4x more of costly fragment shader invocations &#8211; which VRS [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[19,20,16,14,18,17,15],"class_list":["post-31","post","type-post","status-publish","format-standard","hentry","category-bez-kategorii","tag-antialias","tag-antialiasing","tag-c","tag-rendering","tag-variable-rate-shading","tag-vrs","tag-vulkan"],"_links":{"self":[{"href":"https:\/\/dreamtalon.net\/index.php?rest_route=\/wp\/v2\/posts\/31","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dreamtalon.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dreamtalon.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dreamtalon.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dreamtalon.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=31"}],"version-history":[{"count":4,"href":"https:\/\/dreamtalon.net\/index.php?rest_route=\/wp\/v2\/posts\/31\/revisions"}],"predecessor-version":[{"id":55,"href":"https:\/\/dreamtalon.net\/index.php?rest_route=\/wp\/v2\/posts\/31\/revisions\/55"}],"wp:attachment":[{"href":"https:\/\/dreamtalon.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=31"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dreamtalon.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=31"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dreamtalon.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=31"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}