
Sample Distribution Informed Denoising & Rendering

Abstract: A graphics processor is provided that includes circuitry configured to receive, at an input block of a neural network model, a set of data including previous frame data, current frame data, velocity data, and jitter offset data. The neural network model is configured to generate a denoised, supersampled, and anti-aliased output image based on reliability metrics computed from sample distribution data for samples within the current frame data.


Patent Information

Application #:
Filing Date: 12 July 2022
Publication Number: 08/2023
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email:
Parent Application:

Applicants

INTEL CORPORATION
2200 Mission College Boulevard, Santa Clara, California 95054, USA

Inventors

1. TOBIAS ZIRR
Adlerstr. 42 Karlsruhe GERMANY 76133
2. SUNGYE KIM
1844 Orchard Terrace Ct. Folsom CA USA 95630

Specification

Description:

RELATED APPLICATION
[0001] The present application claims priority to U.S. Provisional Patent Application No. 63/235,108, filed on 19 August 2021 and titled “TEMPORALLY AMORTIZED SUPERSAMPLING USING A MIXED PRECISION CONVOLUTIONAL NEURAL NETWORK,” the entire disclosure of which is hereby incorporated by reference.
[0002] The present application claims priority to U.S. Non-Provisional Patent Application No. 17/520,089, filed on 05 November 2021 and titled “SAMPLE DISTRIBUTION-INFORMED DENOISING & RENDERING,” the entire disclosure of which is hereby incorporated by reference.

FIELD
[0003] This disclosure relates generally to graphics anti-aliasing via neural network operations performed via a matrix accelerator of a graphics processing unit.

BACKGROUND OF THE DISCLOSURE
[0004] Temporal Anti-aliasing (TAA) is an anti-aliasing technique in which the renderer jitters the camera every frame to sample different coordinates in screen space. The TAA stage accumulates these samples temporally to produce a supersampled image. The previously accumulated frame is warped using renderer-generated velocity/motion vectors to align it with the current frame before accumulation. Although TAA is a widely used technique to generate temporally stable anti-aliased images, the warped sample history can be mismatched to the current pixel due to frame-to-frame changes in visibility and shading or errors in the motion vectors. This typically results in ghosting artifacts around moving object boundaries.
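The warp-clamp-accumulate loop described above can be sketched as follows. This is an illustrative minimal sketch, not code from the patent: `taa_accumulate`, the nearest-neighbor reprojection, the 3x3 neighborhood clamp, and the blend factor `alpha` are all assumed simplifications of a production TAA resolve.

```python
import numpy as np

def taa_accumulate(history, current, motion, alpha=0.1):
    """One TAA step: warp the accumulated history by per-pixel motion vectors,
    clamp it to the current frame's local color range (to limit ghosting),
    then blend it with the current jittered frame."""
    h, w, _ = current.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Reproject: each pixel fetches history from where its surface was last frame
    sy = np.clip(np.rint(ys - motion[..., 1]).astype(int), 0, h - 1)
    sx = np.clip(np.rint(xs - motion[..., 0]).astype(int), 0, w - 1)
    warped = history[sy, sx]
    # 3x3 neighborhood min/max of the current frame bounds plausible history colors
    pad = np.pad(current, ((1, 1), (1, 1), (0, 0)), mode="edge")
    stack = np.stack([pad[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)])
    warped = np.clip(warped, stack.min(axis=0), stack.max(axis=0))
    # Exponential moving average accumulates jittered samples into a supersampled result
    return (1 - alpha) * warped + alpha * current
```

Mismatched history (e.g., from bad motion vectors) is pulled back into the current frame's local color range by the clamp, which is exactly the heuristic that produces the residual ghosting artifacts the disclosure aims to improve on.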

BRIEF DESCRIPTION OF THE DRAWINGS
[0005] The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements, and in which:
[0006] FIG. 1 is a block diagram illustrating a computer system configured to implement one or more aspects of the embodiments described herein;
[0007] FIG. 2A-2D illustrate parallel processor components;
[0008] FIG. 3A-3C are block diagrams of graphics multiprocessors and multiprocessor-based GPUs;
[0009] FIG. 4A-4F illustrate an exemplary architecture in which a plurality of GPUs is communicatively coupled to a plurality of multi-core processors;
[0010] FIG. 5 illustrates a graphics processing pipeline;
[0011] FIG. 6 illustrates a machine learning software stack;
[0012] FIG. 7 illustrates a general-purpose graphics processing unit;
[0013] FIG. 8 illustrates a multi-GPU computing system;
[0014] FIG. 9A-9B illustrate layers of exemplary deep neural networks;
[0015] FIG. 10 illustrates an exemplary recurrent neural network;
[0016] FIG. 11 illustrates training and deployment of a deep neural network;
[0017] FIG. 12A is a block diagram illustrating distributed learning;
[0018] FIG. 12B is a block diagram illustrating a programmable network interface and data processing unit;
[0019] FIG. 13 illustrates an exemplary inferencing system on a chip (SOC) suitable for performing inferencing using a trained model;
[0020] FIG. 14 is a block diagram of a processing system;
[0021] FIG. 15A-15C illustrate computing systems and graphics processors;
[0022] FIG. 16A-16C illustrate block diagrams of additional graphics processor and compute accelerator architectures;
[0023] FIG. 17 is a block diagram of a graphics processing engine of a graphics processor;
[0024] FIG. 18A-18B illustrate thread execution logic including an array of processing elements employed in a graphics processor core;
[0025] FIG. 19 illustrates an additional execution unit;
[0026] FIG. 20 is a block diagram illustrating graphics processor instruction formats;
[0027] FIG. 21 is a block diagram of an additional graphics processor architecture;
[0028] FIG. 22A-22B illustrate a graphics processor command format and command sequence;
[0029] FIG. 23 illustrates exemplary graphics software architecture for a data processing system;
[0030] FIG. 24A is a block diagram illustrating an IP core development system;
[0031] FIG. 24B illustrates a cross-section side view of an integrated circuit package assembly;
[0032] FIG. 24C illustrates a package assembly that includes multiple units of hardware logic chiplets connected to a substrate (e.g., base die);
[0033] FIG. 24D illustrates a package assembly including interchangeable chiplets;
[0001] FIG. 25 is a block diagram illustrating an exemplary system on a chip integrated circuit;
[0034] FIG. 26A-26B are block diagrams illustrating exemplary graphics processors for use within an SoC;
[0035] FIG. 27 is a block diagram of a data processing system, according to an embodiment;
[0036] FIG. 28A-28B illustrate a matrix operation performed by an instruction pipeline, according to an embodiment;
[0037] FIG. 29 illustrates a systolic array including multiplier and adder circuits organized in a pipelined fashion;
[0038] FIG. 30A-30B illustrate the use of a systolic array that can be configured to execute operations at an arbitrary systolic depth;
[0039] FIG. 31 illustrates a two-path matrix multiply accelerator in which each path has a depth of four stages;
[0040] FIG. 32 illustrates a four-path matrix multiply accelerator in which each path has a depth of two stages;
[0041] FIG. 33 illustrates a scalable sparse matrix multiply accelerator using systolic arrays with feedback inputs;
[0042] FIG. 34 shows a scalable sparse matrix multiply accelerator using systolic arrays with feedback inputs and outputs on each stage;
[0043] FIG. 35 illustrates a dual pipeline parallel systolic array for a matrix accelerator, according to an embodiment;
[0044] FIG. 36 illustrates a stage pair for a channel of a systolic array;
[0045] FIG. 37 illustrates a systolic array including partial sum loopback and circuitry to accelerate sparse matrix multiply;
[0046] FIG. 38A-38B illustrate matrix acceleration circuitry including codecs to enable the reading of sparse data in a compressed format;
[0047] FIG. 39 illustrates a conventional renderer with Temporal Anti-aliasing (TAA);
[0048] FIG. 40 illustrates a renderer that replaces the TAA stage with a temporally amortized supersampling stage;
[0049] FIG. 41 illustrates components of the neural network model, according to an embodiment;
[0050] FIG. 42 illustrates the input block of the neural network model, according to an embodiment;
[0051] FIG. 43A-43B illustrate output block variants for the neural network model, according to embodiments;
[0052] FIG. 44 illustrates a method to perform temporally amortized supersampling;
[0053] FIG. 45 illustrates exemplary rendering performance comparisons for multiple rendering techniques described herein;
[0054] FIG. 46 illustrates deferred lighting textures that can be used as auxiliary denoising information;
[0055] FIG. 47A-47B illustrate components of a neural network model configured to perform sample distribution-informed denoising & rendering with variance reduction;
[0056] FIG. 48 illustrates exemplary denoising and reference images, according to embodiments described herein;
[0057] FIG. 49 is a method of sample distribution-informed denoising & rendering, according to an embodiment; and
[0058] FIG. 50 is a block diagram of a computing device including a graphics processor, according to an embodiment.

DETAILED DESCRIPTION
[0059] A graphics processing unit (GPU) is communicatively coupled to host/processor cores to accelerate, for example, graphics operations, machine-learning operations, pattern analysis operations, and/or various general-purpose GPU (GPGPU) functions. The GPU may be communicatively coupled to the host processor/cores over a bus or another interconnect (e.g., a high-speed interconnect such as PCIe or NVLink). Alternatively, the GPU may be integrated on the same package or chip as the cores and communicatively coupled to the cores over an internal processor bus/interconnect (i.e., internal to the package or chip). Regardless of the manner in which the GPU is connected, the processor cores may allocate work to the GPU in the form of sequences of commands/instructions contained in a work descriptor. The GPU then uses dedicated circuitry/logic for efficiently processing these commands/instructions.
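The host-to-GPU handoff described above — command sequences packaged into a work descriptor that the GPU consumes — can be sketched minimally. The `WorkDescriptor` class and its fields below are hypothetical illustrations, not structures defined in this disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class WorkDescriptor:
    """Hypothetical work descriptor: an ordered command sequence that a host
    processor core builds and hands off to the GPU for execution."""
    queue_id: int
    commands: list = field(default_factory=list)

    def submit(self, command: str) -> None:
        """Append a command; the GPU would later consume these in order."""
        self.commands.append(command)

# A host core batches commands, then submits the descriptor to the GPU queue.
wd = WorkDescriptor(queue_id=0)
wd.submit("BIND_PIPELINE compute")
wd.submit("DISPATCH 64x64")
```

Whether the GPU is attached over PCIe/NVLink or integrated on-package only changes how the descriptor reaches the device, not this basic producer/consumer pattern.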
Claims:

1. A graphics processor comprising:
a set of processing resources configured to perform a supersampling anti-aliasing operation via a mixed precision convolutional neural network, the set of processing resources including circuitry configured to:
receive, at an input block of a neural network model, a set of data including previous frame data, current frame data, velocity data, and jitter offset data;
pre-process the set of data to generate pre-processed data;
provide first pre-processed data to a feature extraction network of the neural network model and second pre-processed data to an output block of the neural network model, the first pre-processed data at a first precision and the second pre-processed data at a second precision that is higher than the first precision;
process the pre-processed data at the feature extraction network via one or more encoder stages and one or more decoder stages;
output tensor data from the feature extraction network to the output block; and
generate an output image via an output block of the neural network model, wherein the output image is a denoised, supersampled, and anti-aliased output image and the output block is configured to filter the output image based on reliability metrics computed based on sample distribution data for samples within the current frame data.
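One plausible reading of the claimed "reliability metrics computed based on sample distribution data" is a per-pixel confidence derived from sample variance, used to weight the filtering of the output image. The sketch below illustrates that reading only; the function names, the Welford-style `sample_m2` accumulator, and the specific blend are assumptions, not the claimed implementation.

```python
import numpy as np

def reliability_from_samples(sample_m2, n):
    """Per-pixel reliability from sample distribution statistics: a wide
    distribution (high variance across the n samples) suggests noisy,
    unreliable data and yields a reliability near 0; a tight distribution
    yields a reliability near 1."""
    variance = sample_m2 / np.maximum(n - 1, 1)
    return 1.0 / (1.0 + variance)

def filter_output(current, history, reliability):
    """Filter the output: trust the current frame where its samples are
    reliable, and lean on accumulated history where they are not."""
    r = np.clip(reliability, 0.0, 1.0)
    return r * current + (1.0 - r) * history
```

In the claimed pipeline, such metrics would inform the output block's filtering of the denoised, supersampled image rather than a fixed heuristic clamp.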

Documents

Application Documents

# Name Date
1 202244040014-FORM 1 [12-07-2022(online)].pdf 2022-07-12
2 202244040014-DRAWINGS [12-07-2022(online)].pdf 2022-07-12
3 202244040014-DECLARATION OF INVENTORSHIP (FORM 5) [12-07-2022(online)].pdf 2022-07-12
4 202244040014-COMPLETE SPECIFICATION [12-07-2022(online)].pdf 2022-07-12
5 202244040014-FORM-26 [23-11-2022(online)].pdf 2022-11-23
6 202244040014-FORM 3 [11-01-2023(online)].pdf 2023-01-11
7 202244040014-FORM 3 [11-07-2023(online)].pdf 2023-07-11
8 202244040014-FORM 3 [10-01-2024(online)].pdf 2024-01-10
9 202244040014-FORM 18 [12-08-2025(online)].pdf 2025-08-12