Apparatus And Method For Tree Structure Data Reduction

Abstract: Apparatus and method for tree structure data reduction. For example, one embodiment of an apparatus comprises: a plurality of compute units; bounding volume hierarchy (BVH) processing logic to update a BVH responsive to changes associated with leaf nodes of the BVH, the BVH processing logic comprising: treelet generation logic to arrange nodes of the BVH into a plurality of treelets, the treelets including a plurality of bottom treelets and a tip treelet, each treelet having a number of nodes selected based on workgroup processing resources of the compute units; a dispatcher to dispatch workgroups to compute units to process the treelets, wherein a separate workgroup comprising a separate plurality of threads is dispatched to process each treelet.
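The abstract does not say how nodes are grouped into treelets. The sketch below shows one simple way to form bottom treelets and a tip treelet. It assumes a binary BVH stored as a child-list dictionary, and it assumes that "workgroup processing resources" reduces to a single hypothetical limit, `max_nodes` (for example, one node per thread in a workgroup). Each bottom treelet is the largest whole subtree that fits within `max_nodes`. Every node above those subtrees goes into the tip treelet. The function names `subtree_sizes` and `build_treelets` are illustrative and are not taken from the source. In this simple version the tip treelet can exceed `max_nodes` on large trees; a fuller scheme would split it further.

```python
from typing import Dict, List, Tuple

Children = Dict[int, List[int]]  # node id -> child node ids (empty list for leaves)


def subtree_sizes(children: Children, root: int) -> Dict[int, int]:
    """Number of nodes in each node's subtree, including the node itself."""
    sizes: Dict[int, int] = {}

    def visit(n: int) -> int:
        sizes[n] = 1 + sum(visit(c) for c in children[n])
        return sizes[n]

    visit(root)
    return sizes


def build_treelets(children: Children, root: int,
                   max_nodes: int) -> Tuple[List[List[int]], List[int]]:
    """Split a BVH into bottom treelets (whole subtrees of at most max_nodes
    nodes) and a tip treelet holding every node above them."""
    sizes = subtree_sizes(children, root)

    def collect(n: int, out: List[int]) -> None:
        # Pre-order: a parent is always listed before its children.
        out.append(n)
        for c in children[n]:
            collect(c, out)

    if sizes[root] <= max_nodes:  # the whole tree fits in one workgroup
        whole: List[int] = []
        collect(root, whole)
        return [], whole

    bottom: List[List[int]] = []
    tip: List[int] = []

    def walk(n: int) -> None:
        if sizes[n] <= max_nodes:
            # Largest subtree on this path that still fits: one bottom treelet.
            treelet: List[int] = []
            collect(n, treelet)
            bottom.append(treelet)
        else:
            tip.append(n)  # too large to fit, so this node belongs to the tip
            for c in children[n]:
                walk(c)

    walk(root)
    return bottom, tip


if __name__ == "__main__":
    # Complete binary tree: 0 -> (1, 2), 1 -> (3, 4), 2 -> (5, 6).
    tree = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [], 4: [], 5: [], 6: []}
    print(build_treelets(tree, 0, max_nodes=3))  # ([[1, 3, 4], [2, 5, 6]], [0])
```

Because bottom treelets are disjoint whole subtrees, none of them depends on another, so they can be handed to separate workgroups at the same time.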

Patent Information

Filing Date: 26 August 2022
Publication Number: 13/2023
Publication Type: INA
Invention Field: COMPUTER SCIENCE

Applicants

INTEL CORPORATION

2200 Mission College Boulevard, Santa Clara, California 95054, USA

Inventors

1. Radoslaw Drabinski

1/21, Slonimskiego, Gdansk, Poland - 80-280

2. Rafal Szczygiel

Przytulna 42/44, Gdansk, Poland - 80-176

3. Joshua Barczak

1715, Cannongate Road, Forest Hill, Maryland, USA - 21050

Specification

Description:
RELATED APPLICATION
[0001] The present application claims priority to U.S. Non-Provisional Patent Application No. 17/485,395, filed on 25 September 2021 and titled “APPARATUS AND METHOD FOR TREE STRUCTURE DATA REDUCTION”, the entire disclosure of which is hereby incorporated by reference.

BACKGROUND
Field of the Invention
[0002] This invention relates generally to the field of graphics processors. More particularly, the invention relates to an apparatus and method for tree structure data reduction.

Description of the Related Art
[0003] Ray tracing is a technique in which light transport is simulated through physically based rendering. Although widely used in cinematic rendering, it was considered too resource-intensive for real-time performance until just a few years ago. One of the key operations in ray tracing is processing a visibility query for ray-scene intersections. This operation, known as “ray traversal”, computes ray-scene intersections by traversing and intersecting nodes in a tree structure known as a bounding volume hierarchy (BVH).
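To make "traversing and intersecting nodes" concrete, the following is a minimal software sketch of BVH ray traversal. It is not the hardware traversal described in this application. It assumes axis-aligned bounding boxes, an explicit stack, and the standard slab test for ray/box intersection. The `Node` class, `ray_hits_box`, and `traverse` are hypothetical names. The sketch stops at collecting the leaf primitives whose boxes the ray enters; the exact ray/primitive intersection test is left out.

```python
from typing import List, Optional, Sequence


class Node:
    """BVH node: an axis-aligned box plus either children or a primitive id."""

    def __init__(self, lo: Sequence[float], hi: Sequence[float],
                 children: Sequence["Node"] = (), prim: Optional[int] = None):
        self.lo, self.hi = lo, hi
        self.children = list(children)
        self.prim = prim  # set only on leaves


def ray_hits_box(orig, direction, lo, hi, t_max: float = float("inf")) -> bool:
    """Slab test: intersect the ray's [0, t_max] interval with each axis slab."""
    t0, t1 = 0.0, t_max
    for a in range(3):
        if direction[a] == 0.0:
            # Ray is parallel to this slab: it hits only if it starts inside it.
            if not (lo[a] <= orig[a] <= hi[a]):
                return False
            continue
        inv = 1.0 / direction[a]
        near, far = (lo[a] - orig[a]) * inv, (hi[a] - orig[a]) * inv
        if near > far:
            near, far = far, near
        t0, t1 = max(t0, near), min(t1, far)
        if t0 > t1:
            return False
    return True


def traverse(root: Node, orig, direction) -> List[int]:
    """Return ids of leaf primitives whose bounding boxes the ray enters."""
    candidates: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not ray_hits_box(orig, direction, node.lo, node.hi):
            continue  # a missed box prunes its entire subtree
        if node.prim is not None:
            candidates.append(node.prim)
        else:
            stack.extend(node.children)
    return sorted(candidates)


if __name__ == "__main__":
    left = Node((0, 0, 0), (1, 1, 1), prim=0)
    right = Node((3, 0, 0), (4, 1, 1), prim=1)
    root = Node((0, 0, 0), (4, 1, 1), children=[left, right])
    print(traverse(root, (-1, 0.5, 0.5), (1, 0, 0)))  # [0, 1]
    print(traverse(root, (0.5, -1, 0.5), (0, 1, 0)))  # [0]
```

The pruning step is why traversal cost depends on the BVH. When leaf geometry moves, the enclosing boxes must be kept up to date so that a miss against a parent box remains a safe reason to skip the whole subtree.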
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] A better understanding of the present invention can be obtained from the following detailed description in conjunction with the following drawings, in which:
[0005] FIG. 1 is a block diagram of an embodiment of a computer system with a processor having one or more processor cores and graphics processors;
[0006] FIGS. 2A-D illustrate computing systems and graphics processors provided by embodiments of the invention;
[0007] FIGS. 3A-C illustrate block diagrams of additional graphics processor and compute accelerator architectures;
[0008] FIG. 4 is a block diagram of an embodiment of a graphics-processing engine for a graphics processor;
[0009] FIGS. 5A-B illustrate thread execution logic including an array of processing elements;
[0010] FIG. 6 is a block diagram of thread execution logic including an array of processing elements;
[0011] FIG. 7 illustrates a graphics processor execution unit instruction format according to an embodiment;
[0012] FIG. 8 is a block diagram of another embodiment of a graphics processor which includes a graphics pipeline, a media pipeline, a display engine, thread execution logic, and a render output pipeline;
[0013] FIG. 9A is a block diagram illustrating a graphics processor command format according to an embodiment;
[0014] FIG. 9B is a block diagram illustrating a graphics processor command sequence according to an embodiment;
[0015] FIG. 10 illustrates exemplary graphics software architecture for a data processing system according to an embodiment;
[0016] FIG. 11A illustrates exemplary IP core development systems that may be used to manufacture an integrated circuit to perform operations according to an embodiment;
[0017] FIGS. 11B-D illustrate exemplary packaging arrangements including chiplets and interposer substrates;
[0018] FIG. 12 illustrates an exemplary system on a chip integrated circuit that may be fabricated using one or more IP cores, according to an embodiment;
[0019] FIG. 13 illustrates an exemplary graphics processor of a system on a chip integrated circuit that may be fabricated using one or more IP cores;
[0020] FIG. 14 illustrates an additional exemplary graphics processor of a system on a chip integrated circuit that may be fabricated using one or more IP cores;
[0021] FIG. 15 illustrates an architecture for performing initial training of a machine-learning architecture;
[0022] FIG. 16 illustrates how a machine-learning engine is continually trained and updated during runtime;
[0023] FIG. 17 illustrates how a machine-learning engine is continually trained and updated during runtime;
[0024] FIGS. 18A-B illustrate how machine learning data is shared on a network;
[0025] FIG. 19 illustrates a method for training a machine-learning engine;
[0026] FIG. 20 illustrates how nodes exchange ghost region data to perform distributed denoising operations;
[0027] FIG. 21 illustrates an architecture in which image rendering and denoising operations are distributed across a plurality of nodes;
[0028] FIG. 22 illustrates additional details of an architecture for distributed rendering and denoising;
[0029] FIG. 23 illustrates a method for performing distributed rendering and denoising;
[0030] FIG. 24 illustrates a machine learning method;
[0031] FIG. 25 illustrates a plurality of interconnected general purpose graphics processors;
[0032] FIG. 26 illustrates a set of convolutional layers and fully connected layers for a machine learning implementation;
[0033] FIG. 27 illustrates an example of a convolutional layer;
[0034] FIG. 28 illustrates an example of a set of interconnected nodes in a machine learning implementation;
[0035] FIG. 29 illustrates a training framework within which a neural network learns using a training dataset;
[0036] FIG. 30A illustrates examples of model parallelism and data parallelism;
[0037] FIG. 30B illustrates a system on a chip (SoC);
[0038] FIG. 31 illustrates a processing architecture which includes ray tracing cores and tensor cores;
[0039] FIG. 32 illustrates an example of a beam;
[0040] FIG. 33 illustrates an apparatus for performing beam tracing;
[0041] FIG. 34 illustrates an example of a beam hierarchy;
[0042] FIG. 35 illustrates a method for performing beam tracing;
[0043] FIG. 36 illustrates an example of a distributed ray tracing engine;
[0044] FIGS. 37-38 illustrate compression performed in a ray tracing system;
[0045] FIG. 39 illustrates a method implemented on a ray tracing architecture;
[0046] FIG. 40 illustrates an exemplary hybrid ray tracing apparatus;
[0047] FIG. 41 illustrates stacks used for ray tracing operations;
[0048] FIG. 42 illustrates additional details for a hybrid ray tracing apparatus;
[0049] FIG. 43 illustrates a bounding volume hierarchy;
[0050] FIG. 44 illustrates a call stack and traversal state storage;
[0051] FIG. 45 illustrates a method for traversal and intersection;
[0052] FIGS. 46A-B illustrate how multiple dispatch cycles are required to execute certain shaders;
[0053] FIG. 47 illustrates how a single dispatch cycle executes a plurality of shaders;
[0054] FIG. 48 illustrates how a single dispatch cycle executes a plurality of shaders;
[0055] FIG. 49 illustrates an architecture for executing ray tracing instructions;
[0056] FIG. 50 illustrates a method for executing ray tracing instructions within a thread;
[0057] FIG. 51 illustrates one embodiment of an architecture for asynchronous ray tracing;
[0058] FIG. 52A illustrates one embodiment of a ray traversal circuit;
[0059] FIG. 52B illustrates processes executed in one embodiment to manage ray storage banks;
[0060] FIG. 53 illustrates one embodiment of priority selection circuitry/logic;
[0061] FIGS. 54 and 55A-B illustrate different types of ray tracing data, including flags, exceptions, and culling data, used in one embodiment of the invention;
[0062] FIG. 56 illustrates one embodiment for determining early out of the ray tracing pipeline;
[0063] FIG. 57 illustrates one embodiment of priority selection circuitry/logic;
[0064] FIG. 58 illustrates an example bounding volume hierarchy (BVH) used for ray traversal operations;
[0065] FIGS. 59A-B illustrate additional traversal operations;
[0066] FIG. 60 illustrates one embodiment of stack management circuitry for managing a BVH stack;
[0067] FIGS. 61A-B illustrate example data structures, sub-structures, and operations performed for rays, hits, and stacks;
[0068] FIG. 62 illustrates an embodiment of a level of detail selector with an N-bit comparison operation mask;
[0069] FIG. 63 illustrates an acceleration data structure in accordance with one embodiment of the invention;
[0070] FIG. 64 illustrates one embodiment of a compression block including residual values and metadata;
[0071] FIG. 65 illustrates a method in accordance with one embodiment of the invention;
[0072] FIG. 66 illustrates one embodiment of a block offset index compression block;
[0073] FIG. 67A illustrates a Hierarchical Bit-Vector Indexing (HBI) in accordance with one embodiment of the invention;
[0074] FIG. 67B illustrates an index compression block in accordance with one embodiment of the invention;
[0075] FIG. 68 illustrates an example architecture including BVH compression circuitry/logic and decompression circuitry/logic;
[0076] FIG. 69A illustrates a displacement function applied to a mesh;
[0077] FIG. 69B illustrates one embodiment of compression circuitry for compressing a mesh or meshlet;
[0078] FIG. 70A illustrates displacement mapping on a base subdivision surface;
[0079] FIGS. 70B-C illustrate difference vectors relative to a coarse base mesh;
[0080] FIG. 71 illustrates a method in accordance with one embodiment of the invention;
[0081] FIGS. 72-74 illustrate a mesh comprising a plurality of interconnected vertices;
[0082] FIG. 75 illustrates one embodiment of a tessellator for generating a mesh;
[0083] FIGS. 76-77 illustrate one embodiment in which bounding volumes are formed based on a mesh;
[0084] FIG. 78 illustrates one embodiment of a mesh sharing overlapping vertices;
[0085] FIG. 79 illustrates a mesh with shared edges between triangles;
[0086] FIG. 80 illustrates a ray tracing engine in accordance with one embodiment;
[0087] FIG. 81 illustrates a BVH compressor in accordance with one embodiment;
[0088] FIGS. 82A-C illustrate example data formats for a 64-bit register;
[0089] FIGS. 83A-B illustrate one embodiment of an index for a ring buffer;
[0090] FIGS. 84A-B illustrate example ring buffer atomics for producers and consumers;
[0091] FIG. 85A illustrates one embodiment of a tiled resource;
[0092] FIG. 85B illustrates a method in accordance with one embodiment of the invention;
[0093] FIG. 86A illustrates one embodiment of BVH processing logic including an on-demand builder;
[0094] FIG. 86B illustrates one embodiment of an on-demand builder for an acceleration structure;
[0095] FIG. 86C illustrates one embodiment of a visible bottom level acceleration structure map;
[0096] FIG. 86D illustrates different types of instances and traversal decisions;
[0097] FIG. 87 illustrates one embodiment of a material-based cull mask;
[0098] FIG. 88 illustrates one embodiment in which a quadtree structure is formed over a geometry mesh;
[0099] FIG. 89A illustrates one embodiment of a ray tracing architecture;
[00100] FIG. 89B illustrates one embodiment which includes meshlet compression;
[00101] FIG. 90 illustrates a plurality of threads including synchronous threads, diverging spawn threads, regular spawn threads, and converging spawn threads;
[00102] FIG. 91 illustrates one embodiment of a ray tracing architecture with a bindless thread dispatcher;
[00103] FIG. 92 illustrates a ray tracing cluster in accordance with one embodiment;
[00104] FIGS. 93-100 illustrate embodiments of using proxy data in a multi-node ray tracing implementation;
[00105] FIG. 101 illustrates a method in accordance with one embodiment of the invention;
[00106] FIG. 102 illustrates one example of nodes of a BVH arranged into treelets;
[00107] FIG. 103 illustrates an example of a starting points array associated with a treelet;
[00108] FIG. 104 illustrates a plurality of nodes arranged based on path length;
[00109] FIG. 105 illustrates the nodes from FIG. 104 and indicates nodes processed in the same loop iteration; and
[00110] FIG. 106 illustrates BVH processing logic in accordance with one embodiment of the invention.

DETAILED DESCRIPTION
[00111] In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention described below. It will be apparent, however, to one skilled in the art that the embodiments of the invention may be practiced without some of these specific details. In other instances, well-known structures and devices are shown in block diagram form to avoid obscuring the underlying principles of the embodiments of the invention.
EXEMPLARY GRAPHICS PROCESSOR ARCHITECTURES AND DATA TYPES
System Overview
[00112] Figure 1 is a block diagram of a processing system 100, according to an embodiment. System 100 may be used in a single-processor desktop system, a multiprocessor workstation system, or a server system having a large number of processors 102 or processor cores 107. In one embodiment, the system 100 is a processing platform incorporated within a system-on-a-chip (SoC) integrated circuit for use in mobile, handheld, or embedded devices, such as Internet-of-things (IoT) devices with wired or wireless connectivity to a local or wide area network.
[00113] In one embodiment, system 100 can include, couple with, or be integrated within: a server-based gaming platform; a game console, including a game and media console; a mobile gaming console, a handheld game console, or an online game console. In some embodiments the system 100 is part of a mobile phone, smart phone, tablet computing device or mobile Internet-connected device such as a laptop with low internal storage capacity. Processing system 100 can also include, couple with, or be integrated within: a wearable device, such as a smart watch wearable device; smart eyewear or clothing enhanced with augmented reality (AR) or virtual reality (VR) features to provide visual, audio or tactile outputs to supplement real world visual, audio or tactile experiences or otherwise provide text, audio, graphics, video, holographic images or video, or tactile feedback; other augmented reality (AR) device; or other virtual reality (VR) device. In some embodiments, the processing system 100 includes or is part of a television or set top box device. In one embodiment, system 100 can include, couple with, or be integrated within a self-driving vehicle such as a bus, tractor trailer, car, motor or electric power cycle, plane or glider (or any combination thereof). The self-driving vehicle may use system 100 to process the environment sensed around the vehicle.
[00114] In some embodiments, the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system or user software. In some embodiments, at least one of the one or more processor cores 107 is configured to process a specific instruction set 109. In some embodiments, instruction set 109 may facilitate Complex Instruction Set Computing (CISC), Reduced Instruction Set Computing (RISC), or computing via a Very Long Instruction Word (VLIW). One or more processor cores 107 may process a different instruction set 109, which may include instructions to facilitate the emulation of other instruction sets. Processor core 107 may also include other processing devices, such as a Digital Signal Processor (DSP).
Claims:
1. An apparatus comprising:
a plurality of compute units;
bounding volume hierarchy (BVH) processing logic to update a BVH responsive to changes associated with leaf nodes of the BVH, the BVH processing logic comprising:
treelet generation logic to arrange nodes of the BVH into a plurality of treelets, the treelets including a plurality of bottom treelets and a tip treelet, each treelet having a number of nodes selected based on workgroup processing resources of the compute units;
a dispatcher to dispatch workgroups to compute units to process the treelets, wherein a separate workgroup comprising a separate plurality of threads is dispatched to process each treelet.
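Claim 1 leaves open what each workgroup does with its treelet. The sketch below is a loose software model of the dispatch step. It rests on several assumptions that are not in the source. The "update" is taken to be a refit: each internal node's box is recomputed as the union of its children's boxes after leaf boxes change. Each `ThreadPoolExecutor` task stands in for one workgroup. Within that task, the treelet's nodes are handled sequentially rather than by cooperating GPU threads. Treelets list parents before children. `union`, `refit_treelet`, and `dispatch_refit` are hypothetical names. The model captures only the ordering constraint: bottom treelets are processed independently, and the tip treelet is processed after all of them. It says nothing about GPU throughput.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)


def union(a: Box, b: Box) -> Box:
    """Smallest axis-aligned box enclosing both a and b."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))


def refit_treelet(treelet: List[int], children: Dict[int, List[int]],
                  boxes: Dict[int, Box]) -> None:
    """Recompute internal-node boxes of one treelet from their children.

    The treelet lists parents before children, so walking it in reverse
    guarantees every child box is final before its parent is refit.
    """
    for n in reversed(treelet):
        kids = children[n]
        if kids:  # leaf boxes were already updated by the caller
            box = boxes[kids[0]]
            for c in kids[1:]:
                box = union(box, boxes[c])
            boxes[n] = box


def dispatch_refit(bottom: List[List[int]], tip: List[int],
                   children: Dict[int, List[int]], boxes: Dict[int, Box],
                   max_workgroups: int = 4) -> None:
    # One task per bottom treelet models one workgroup. Bottom treelets are
    # disjoint subtrees, so the tasks write disjoint keys of `boxes`.
    with ThreadPoolExecutor(max_workers=max_workgroups) as pool:
        list(pool.map(lambda t: refit_treelet(t, children, boxes), bottom))
    # The tip treelet needs the refit roots of all bottom treelets, so it runs last.
    refit_treelet(tip, children, boxes)


if __name__ == "__main__":
    tree = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [], 4: [], 5: [], 6: []}
    stale = ((0, 0, 0), (0, 0, 0))
    boxes = {0: stale, 1: stale, 2: stale,
             3: ((0, 0, 0), (1, 1, 1)), 4: ((2, 0, 0), (3, 1, 1)),
             5: ((0, 2, 0), (1, 3, 1)), 6: ((5, 5, 5), (6, 6, 6))}  # leaf 6 moved
    dispatch_refit([[1, 3, 4], [2, 5, 6]], [0], tree, boxes)
    print(boxes[0])  # ((0, 0, 0), (6, 6, 6))
```

Each bottom treelet fits in one workgroup by construction, so its refit needs no synchronization with other workgroups. The only cross-workgroup dependency is the single barrier before the tip treelet.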

Documents

Application Documents

#	Name	Date
1	202244048667-US 17485395-DASCODE-8298 [26-08-2022].pdf	2022-08-26
2	202244048667-FORM 1 [26-08-2022(online)].pdf	2022-08-26
3	202244048667-DRAWINGS [26-08-2022(online)].pdf	2022-08-26
4	202244048667-DECLARATION OF INVENTORSHIP (FORM 5) [26-08-2022(online)].pdf	2022-08-26
5	202244048667-COMPLETE SPECIFICATION [26-08-2022(online)].pdf	2022-08-26
6	202244048667-FORM-26 [25-11-2022(online)].pdf	2022-11-25
7	202244048667-FORM 3 [24-02-2023(online)].pdf	2023-02-24
8	202244048667-FORM 3 [24-08-2023(online)].pdf	2023-08-24
9	202244048667-Proof of Right [10-10-2023(online)].pdf	2023-10-10
10	202244048667-FORM 3 [23-02-2024(online)].pdf	2024-02-23
11	202244048667-FORM 18 [19-09-2025(online)].pdf	2025-09-19