Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.
Claims:
1. An apparatus comprising:
a processor comprising a systolic array to:
receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data;
identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and
output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.
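As a minimal, non-limiting sketch of the operation recited in claim 1, the following Python fragment models how the metadata can identify which portions of the unpacked source data are multiplied with the packed (structured) source data. The 2:4-style packing, the array values, and the function name `sparse_dot` are illustrative assumptions, not the claimed hardware:

```python
import numpy as np

def sparse_dot(unpacked, packed, metadata):
    # Hypothetical software model of one multiply stage: metadata[i] gives
    # the index of the unpacked element that the i-th packed (non-zero)
    # structured element is multiplied with; zero elements are never computed.
    acc = 0.0
    for value, idx in zip(packed, metadata):
        acc += value * unpacked[idx]
    return acc

# Illustration: a structured row [0.0, 2.5, 0.0, -1.0] packs to [2.5, -1.0],
# with metadata [1, 3] recording the original positions of the non-zeros.
weights_row = np.array([0.0, 2.5, 0.0, -1.0])
packed      = np.array([2.5, -1.0])
metadata    = np.array([1, 3])
activations = np.array([0.5, 1.0, 2.0, 4.0])  # unpacked source data

# The packed result matches the dense dot product, using half the products.
assert sparse_dot(activations, packed, metadata) == weights_row @ activations
```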
Description:
FIELD
[0002] This disclosure relates generally to data processing and more particularly to utilizing structured sparsity in systolic arrays.
BACKGROUND OF THE DISCLOSURE
[0003] Neural networks and other types of machine learning models are useful tools that have demonstrated their value in solving complex problems regarding pattern recognition, natural language processing, automatic speech recognition, etc. Neural networks operate using artificial neurons arranged into one or more layers that process data from an input layer to an output layer, applying weighting values to the data as it is processed. Such weighting values are determined during a training process and applied during an inference process.
[0004] Sparsity is a property of the data received by an execution unit that can be capitalized upon to improve the performance of some arithmetic and logic operations. Sparsity refers to the proportion of zero values among the data used in a series of operations. A multiplication in which one operand is zero yields zero as its result; because the result of such an operation is known in advance, the operation need not be computed, and execution time can be saved.
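A minimal sketch of this idea (the values below are illustrative only): products with a zero operand are skipped because their result is already known to be zero:

```python
def dot_skip_zeros(a, b):
    # Products with a zero operand are known to be zero in advance,
    # so they are skipped rather than computed.
    return sum(x * y for x, y in zip(a, b) if x != 0.0 and y != 0.0)

a = [0.0, 3.0, 0.0, 0.0, 2.0]  # 60% of the values are zero (sparse)
b = [1.0, 2.0, 3.0, 4.0, 5.0]
print(dot_skip_zeros(a, b))    # 16.0, computed from only 2 of 5 products
```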
[0005] An instruction that computes matrix multiplication in a systolic array is often used in machine learning (ML) algorithms to execute neural networks. In these workloads, the weights and the activations of layers of neurons are typically represented as matrices and multiplied. The weights have a high probability of containing many zero values when they are computed from a function (e.g., a ReLU function) whose output is zero for any negative input.
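For illustration only (the matrix values are assumptions), the following shows why a ReLU-style function tends to produce sparse output: every negative input maps to zero:

```python
import numpy as np

def relu(x):
    # ReLU outputs zero for any negative input, so its output matrix
    # tends to contain many zeros, i.e., it tends to be sparse.
    return np.maximum(x, 0.0)

m = np.array([[ 1.2, -0.7,  0.3, -2.1],
              [-0.4,  0.9, -1.5,  0.6]])
sparse_m = relu(m)
print(sparse_m)
print("sparsity:", np.mean(sparse_m == 0.0))  # 0.5 -- half the entries are zero
```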
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] So that the manner in which the above recited features of the present embodiments can be understood in detail, a more particular description of the embodiments, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate typical embodiments and are therefore not to be considered limiting of the scope of the disclosure. The figures are not to scale. In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts.
[0007] FIG. 1 is a block diagram of an example computing system that may be used to utilize structured sparsity in systolic arrays, according to implementations of the disclosure.
[0008] FIGS. 2A and 2B illustrate example depictions of matrix multiplication, in accordance with implementations of the disclosure.
[0009] FIG. 3 illustrates a packing process for a row of a sub-matrix, in accordance with implementations of the disclosure.
[0010] FIG. 4 illustrates an example data packing case using half float elements, in accordance with implementations of the disclosure.
[0011] FIGS. 5A and 5B depict examples of unpacked data converted into corresponding packed data and metadata, in accordance with implementations of the disclosure.
[0012] FIG. 6 illustrates an example computing environment implementing a systolic array that utilizes structured sparsity, in accordance with implementations of the disclosure.
[0013] FIG. 7 illustrates a schematic of operations of the systolic array when structured sparsity is provided in the index, according to implementations of the disclosure.
[0014] FIG. 8 is a flow diagram illustrating a method for utilizing structured sparsity in systolic arrays, in accordance with implementations of the disclosure.
[0015] FIG. 9 is a flow diagram illustrating a method for performing matrix multiplication in systolic arrays utilizing structured sparsity, in accordance with implementations of the disclosure.
[0016] FIG. 10 is a schematic diagram of an illustrative electronic computing device to enable utilization of structured sparsity in systolic arrays, according to some embodiments.
DETAILED DESCRIPTION
[0017] Implementations of the disclosure describe utilizing structured sparsity in systolic arrays. Today’s computing systems are expected to deliver near zero-wait responsiveness and superb performance while taking on large workloads for execution. Therefore, computing architectures have continually changed (e.g., improved) to accommodate large (and often demanding) workloads and increased performance expectations.