
Floating Point Dot Product Hardware With Wide Multiply Adder Tree For Machine Learning Accelerators

Abstract: Systems, apparatuses and methods may provide for technology that conducts a first alignment between a plurality of floating-point numbers based on a first subset of exponent bits. The technology may also conduct, at least partially in parallel with the first alignment, a second alignment between the plurality of floating-point numbers based on a second subset of exponent bits, where the first subset of exponent bits are least significant bits (LSBs) and the second subset of exponent bits are most significant bits (MSBs). In one example, the technology adds the aligned plurality of floating-point numbers to one another. With regard to the second alignment, the technology may also identify the individual exponents of the plurality of floating-point numbers, identify a maximum exponent across the individual exponents, and conduct a subtraction of the individual exponents from the maximum exponent, where the subtraction is conducted from MSB to LSB.


Patent Information

Application #
Filing Date
09 March 2020
Publication Number
50/2020
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application
Patent Number
Legal Status
Grant Date
2024-03-14
Renewal Date

Applicants

INTEL CORPORATION
2200 Mission College Boulevard, Santa Clara, California, 95054, USA

Inventors

1. Himanshu Kaul
9718 SW West Haven Dr. Portland, OR 97225 (US)
2. Mark Anders
2559 NE Katie Dr. Hillsboro, OR 97124 (US)

Specification

Claims:

1. A computing system comprising:
a network controller; and
a processor coupled to the network controller, the processor including logic coupled to one or more substrates to:
conduct a first alignment between a plurality of floating-point numbers based on a first subset of exponent bits;
conduct, at least partially in parallel with the first alignment, a second alignment between the plurality of floating-point numbers based on a second subset of exponent bits, wherein the first subset of exponent bits are least significant bits (LSBs) and the second subset of exponent bits are most significant bits (MSBs); and
add the aligned plurality of floating-point numbers to one another.
Description:

TECHNICAL FIELD
[0001] Embodiments generally relate to machine learning. More particularly, embodiments relate to floating-point dot-product hardware with a wide multiply-adder tree for machine learning accelerators.

BACKGROUND
[0002] Deep neural networks (DNNs) are typically used in machine learning (ML) workloads to perform matrix multiplication and convolution operations, which tend to be the most power and performance limiting operations of the ML workloads. While hardware accelerators with dot-product compute units have been proposed to improve area and energy efficiency of these operations (e.g., using a variety of dataflow architectures and data types), there remains considerable room for improvement. For example, conventional floating-point (FP) dot-product hardware solutions may first find the maximum exponent across floating-point products, with each individual product mantissa (e.g., significand, coefficient) being aligned for accumulation/summation using the maximum exponent and the corresponding individual exponent. Globally searching for the maximum exponent may introduce latency (e.g., decreasing performance). Moreover, the alignment may involve a relatively large amount of hardware (e.g., alignment shifter stages) that adds to latency, cost and/or power consumption. Indeed, as ML applications transition from standard number formats (e.g., floating-point sixteen bit/FP16, with 5-bit exponents) to more optimized number formats (e.g., Brain floating-point sixteen bit/Bfloat16, with 8-bit exponents), the power and performance limitations may increase.
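The conventional flow described above can be captured as a minimal software sketch, assuming each floating-point product is represented as an (integer mantissa, exponent) pair; the function name and fixed-point details are illustrative, not taken from the patent.

```python
def conventional_fp_dot(products):
    """Model of the conventional FP dot-product accumulation flow.

    products: list of (mantissa, exponent) pairs, one per individual
    floating-point product, with mantissas as non-negative integers.
    """
    # Global search for the maximum exponent across all products
    # (the latency bottleneck identified in the background).
    max_exp = max(e for _, e in products)
    # Align each product mantissa for summation using the maximum
    # exponent and the corresponding individual exponent, then sum.
    total = sum(m >> (max_exp - e) for m, e in products)
    return total, max_exp
```

With products (8, 3) and (8, 1), the second mantissa is shifted right by 2 before accumulation, giving the pair (10, 3).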

BRIEF DESCRIPTION OF THE DRAWINGS
[0003] The various advantages of the embodiments will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:
[0004] FIG. 1 is a comparative block diagram of an example of multiplier-adder tree hardware topologies according to an embodiment;
[0005] FIG. 2 is a flowchart of an example of a method of operating a multiplier-adder tree according to an embodiment;
[0006] FIG. 3 is a comparative block diagram of an example of maximum exponent computation hardware according to an embodiment;
[0007] FIG. 4 is a flowchart of an example of a method of determining maximum exponent bits according to an embodiment;
[0008] FIG. 5 is a block diagram of an example of global alignment subtraction hardware according to an embodiment;
[0009] FIG. 6 is a block diagram of an example of a multiplier-adder tree hardware topology according to an embodiment;
[0010] FIG. 7 is a flowchart of an example of a method of conducting a global alignment according to an embodiment;
[0011] FIG. 8 is a block diagram of an example of a performance-enhanced computing system according to an embodiment; and
[0012] FIG. 9 is an illustration of an example of a semiconductor apparatus according to an embodiment.

DESCRIPTION OF EMBODIMENTS
[0013] Turning now to FIG. 1, a plurality of multiplier-adder trees are shown in which the products between a first array of floating-point numbers (e.g., a0, a1, … an) and a second array of floating-point numbers (e.g., b0, b1, … bn) are computed, followed by the summation/accumulation of the computed products. The computations may generally be useful in DNN-based machine learning applications that involve matrix multiplication and convolution operations. In the illustrated example, a first conventional topology 20 is a wide multiply-adder tree optimized for input-stationary matrix-multiply operations (e.g., matrix-matrix or matrix-vector multiplication operations in which one of the inputs is stationary or changes less frequently than the other input). Area, energy efficiency, and throughput may generally be relevant to accelerators, with multiplier-adder tree topologies increasing area/energy efficiency because they enable the summation operation to be optimized across multiple inputs. Multiplier-adder tree topologies may also be easily pipelined for higher throughput. Improving total latency for these designs improves both area and energy, since the latency determines the number and location of the pipeline flip-flops required to sustain the same throughput in a multi-cycle design.
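The split-alignment idea from the claims, where a fine shifter driven by the LSBs of the exponent difference and a coarse shifter driven by its MSBs can operate without waiting for a full-width subtraction, can be illustrated with a software sketch. The stage boundary K and all names here are illustrative assumptions; in software the two shifts are necessarily sequential, while the hardware benefit is that each stage depends only on its own subset of exponent bits.

```python
K = 2  # number of exponent-difference LSBs handled by the first stage (illustrative)

def two_stage_shift(mantissa, diff):
    """Align a mantissa by `diff` using two cascaded shifter stages.

    Equivalent to a single right shift by `diff`, but decomposed so the
    first (fine) stage uses only the K LSBs of the exponent difference
    and the second (coarse) stage uses only its MSBs.
    """
    lo = diff & ((1 << K) - 1)   # first alignment: fine shift from LSB subset
    hi = diff >> K               # second alignment: coarse shift from MSB subset
    return (mantissa >> lo) >> (hi << K)
```

Because right shifts by non-negative amounts compose additively, the two-stage result matches a single shift by the full difference for every shift amount.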

Documents

Application Documents

# Name Date
1 202044010145-FORM 1 [09-03-2020(online)].pdf 2020-03-09
2 202044010145-IntimationOfGrant14-03-2024.pdf 2024-03-14
3 202044010145-DRAWINGS [09-03-2020(online)].pdf 2020-03-09
4 202044010145-PatentCertificate14-03-2024.pdf 2024-03-14
5 202044010145-DECLARATION OF INVENTORSHIP (FORM 5) [09-03-2020(online)].pdf 2020-03-09
6 202044010145-ABSTRACT [21-02-2022(online)].pdf 2022-02-21
7 202044010145-COMPLETE SPECIFICATION [09-03-2020(online)].pdf 2020-03-09
8 202044010145-CLAIMS [21-02-2022(online)].pdf 2022-02-21
9 202044010145-FORM 18 [07-05-2020(online)].pdf 2020-05-07
10 202044010145-Correspondence-Letter [21-02-2022(online)].pdf 2022-02-21
11 202044010145-FORM-26 [05-06-2020(online)].pdf 2020-06-05
12 202044010145-FER_SER_REPLY [21-02-2022(online)].pdf 2022-02-21
13 202044010145-OTHERS [21-02-2022(online)].pdf 2022-02-21
14 202044010145-FORM 3 [07-09-2020(online)].pdf 2020-09-07
15 202044010145-PETITION UNDER RULE 137 [21-02-2022(online)]-1.pdf 2022-02-21
16 202044010145-FORM 3 [09-03-2021(online)].pdf 2021-03-09
17 202044010145-FER.pdf 2021-10-18
18 202044010145-PETITION UNDER RULE 137 [21-02-2022(online)].pdf 2022-02-21
19 202044010145-FORM 3 [18-02-2022(online)].pdf 2022-02-18
20 202044010145-Proof of Right [18-02-2022(online)].pdf 2022-02-18
21 202044010145-Information under section 8(2) [18-02-2022(online)].pdf 2022-02-18

Search Strategy

1 SearchStrategyMatrix202044010145E_13-08-2021.pdf

ERegister / Renewals

3rd: 31 May 2024 (from 09/03/2022 to 09/03/2023)
4th: 31 May 2024 (from 09/03/2023 to 09/03/2024)
5th: 31 May 2024 (from 09/03/2024 to 09/03/2025)