
Floating Point Dot Product Hardware With Wide Multiply Adder Tree For Machine Learning Accelerators

Abstract: Systems, apparatuses and methods may provide for technology that conducts a first alignment between a plurality of floating-point numbers based on a first subset of exponent bits. The technology may also conduct, at least partially in parallel with the first alignment, a second alignment between the plurality of floating-point numbers based on a second subset of exponent bits, where the first subset of exponent bits are least significant bits (LSBs) and the second subset of exponent bits are most significant bits (MSBs). In one example, the technology adds the aligned plurality of floating-point numbers to one another. With regard to the second alignment, the technology may also identify the individual exponents of the plurality of floating-point numbers, identify a maximum exponent across the individual exponents, and conduct a subtraction of the individual exponents from the maximum exponent, where the subtraction is conducted from MSB to LSB.


Patent Information

Application #
Filing Date
09 March 2020
Publication Number
50/2020
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application
Patent Number
Legal Status
Grant Date
2024-03-14
Renewal Date

Applicants

INTEL CORPORATION
2200 Mission College Boulevard, Santa Clara, California, 95054, USA

Inventors

1. Himanshu Kaul
9718 SW West Haven Dr. Portland, OR 97225 (US)
2. Mark Anders
2559 NE Katie Dr. Hillsboro, OR 97124 (US)

Specification

Claims:

1. A computing system comprising:
a network controller; and
a processor coupled to the network controller, the processor including logic coupled to one or more substrates to:
conduct a first alignment between a plurality of floating-point numbers based on a first subset of exponent bits;
conduct, at least partially in parallel with the first alignment, a second alignment between the plurality of floating-point numbers based on a second subset of exponent bits, wherein the first subset of exponent bits are least significant bits (LSBs) and the second subset of exponent bits are most significant bits (MSBs); and
add the aligned plurality of floating-point numbers to one another.
Description:

TECHNICAL FIELD
[0001] Embodiments generally relate to machine learning. More particularly, embodiments relate to floating-point dot-product hardware with a wide multiply-adder tree for machine learning accelerators.

BACKGROUND
[0002] Deep neural networks (DNNs) are typically used in machine learning (ML) workloads to perform matrix multiplication and convolution operations, which tend to be the most power and performance limiting operations of the ML workloads. While hardware accelerators with dot-product compute units have been proposed to improve area and energy efficiency of these operations (e.g., using a variety of dataflow architectures and data types), there remains considerable room for improvement. For example, conventional floating-point (FP) dot-product hardware solutions may first find the maximum exponent across floating-point products, with each individual product mantissa (e.g., significand, coefficient) being aligned for accumulation/summation using the maximum exponent and the corresponding individual exponent. Globally searching for the maximum exponent may introduce latency (e.g., decreasing performance). Moreover, the alignment may involve a relatively large amount of hardware (e.g., alignment shifter stages) that adds to latency, cost and/or power consumption. Indeed, as ML applications transition from standard number formats (e.g., floating-point sixteen bit/FP16, with 5-bit exponents) to more optimized number formats (e.g., Brain floating-point sixteen bit/Bfloat16, with 8-bit exponents), the power and performance limitations may increase.
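The conventional flow described above can be captured as a minimal software sketch, assuming each floating-point product is represented as an (integer mantissa, exponent) pair; the function name and fixed-point details are illustrative, not taken from the patent.

```python
def conventional_fp_dot(products):
    """Model of the conventional FP dot-product accumulation flow.

    products: list of (mantissa, exponent) pairs, one per individual
    floating-point product, with mantissas as non-negative integers.
    """
    # Global search for the maximum exponent across all products
    # (the latency bottleneck identified in the background).
    max_exp = max(e for _, e in products)
    # Align each product mantissa for summation using the maximum
    # exponent and the corresponding individual exponent, then sum.
    total = sum(m >> (max_exp - e) for m, e in products)
    return total, max_exp
```

With products (8, 3) and (8, 1), the second mantissa is shifted right by 2 before accumulation, giving the pair (10, 3).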

BRIEF DESCRIPTION OF THE DRAWINGS
[0003] The various advantages of the embodiments will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:
[0004] FIG. 1 is a comparative block diagram of an example of multiplier-adder tree hardware topologies according to an embodiment;
[0005] FIG. 2 is a flowchart of an example of a method of operating a multiplier-adder tree according to an embodiment;
[0006] FIG. 3 is a comparative block diagram of an example of maximum exponent computation hardware according to an embodiment;
[0007] FIG. 4 is a flowchart of an example of a method of determining maximum exponent bits according to an embodiment;
[0008] FIG. 5 is a block diagram of an example of global alignment subtraction hardware according to an embodiment;
[0009] FIG. 6 is a block diagram of an example of a multiplier-adder tree hardware topology according to an embodiment;
[0010] FIG. 7 is a flowchart of an example of a method of conducting a global alignment according to an embodiment;
[0011] FIG. 8 is a block diagram of an example of a performance-enhanced computing system according to an embodiment; and
[0012] FIG. 9 is an illustration of an example of a semiconductor apparatus according to an embodiment.

DESCRIPTION OF EMBODIMENTS
[0013] Turning now to FIG. 1, a plurality of multiplier-adder trees are shown in which the products between a first array of floating-point numbers (e.g., a0, a1, … an) and a second array of floating-point numbers (e.g., b0, b1, … bn) are computed, followed by the summation/accumulation of the computed products. The computations may generally be useful in DNN-based machine learning applications that involve matrix multiplication and convolution operations. In the illustrated example, a first conventional topology 20 is a wide multiply-adder tree optimized for input-stationary matrix-multiply operations (e.g., matrix-matrix or matrix-vector multiplication operations in which one of the inputs is stationary or changes less frequently than the other input). Area, energy efficiency, and throughput may generally be relevant to accelerators, with multiplier-adder tree topologies increasing area/energy efficiency because they enable the summation operation to be optimized across multiple inputs. Multiplier-adder tree topologies may also be easily pipelined for higher throughput. Improving total latency for these designs improves both area and energy, since the latency determines the number and location of the pipeline flip-flops required to sustain the same throughput in a multi-cycle design.
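The split-alignment idea from the claims, where a fine shifter driven by the LSBs of the exponent difference and a coarse shifter driven by its MSBs can operate without waiting for a full-width subtraction, can be illustrated with a software sketch. The stage boundary K and all names here are illustrative assumptions; in software the two shifts are necessarily sequential, while the hardware benefit is that each stage depends only on its own subset of exponent bits.

```python
K = 2  # number of exponent-difference LSBs handled by the first stage (illustrative)

def two_stage_shift(mantissa, diff):
    """Align a mantissa by `diff` using two cascaded shifter stages.

    Equivalent to a single right shift by `diff`, but decomposed so the
    first (fine) stage uses only the K LSBs of the exponent difference
    and the second (coarse) stage uses only its MSBs.
    """
    lo = diff & ((1 << K) - 1)   # first alignment: fine shift from LSB subset
    hi = diff >> K               # second alignment: coarse shift from MSB subset
    return (mantissa >> lo) >> (hi << K)
```

Because right shifts by non-negative amounts compose additively, the two-stage result matches a single shift by the full difference for every shift amount.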

Documents

Application Documents

# Name Date
1 202044010145-FORM 1 [09-03-2020(online)].pdf 2020-03-09
2 202044010145-IntimationOfGrant14-03-2024.pdf 2024-03-14
3 202044010145-DRAWINGS [09-03-2020(online)].pdf 2020-03-09
4 202044010145-PatentCertificate14-03-2024.pdf 2024-03-14
5 202044010145-DECLARATION OF INVENTORSHIP (FORM 5) [09-03-2020(online)].pdf 2020-03-09
6 202044010145-ABSTRACT [21-02-2022(online)].pdf 2022-02-21
7 202044010145-COMPLETE SPECIFICATION [09-03-2020(online)].pdf 2020-03-09
8 202044010145-CLAIMS [21-02-2022(online)].pdf 2022-02-21
9 202044010145-FORM 18 [07-05-2020(online)].pdf 2020-05-07
10 202044010145-Correspondence-Letter [21-02-2022(online)].pdf 2022-02-21
11 202044010145-FORM-26 [05-06-2020(online)].pdf 2020-06-05
12 202044010145-FER_SER_REPLY [21-02-2022(online)].pdf 2022-02-21
13 202044010145-OTHERS [21-02-2022(online)].pdf 2022-02-21
14 202044010145-FORM 3 [07-09-2020(online)].pdf 2020-09-07
15 202044010145-PETITION UNDER RULE 137 [21-02-2022(online)]-1.pdf 2022-02-21
16 202044010145-FORM 3 [09-03-2021(online)].pdf 2021-03-09
17 202044010145-FER.pdf 2021-10-18
18 202044010145-PETITION UNDER RULE 137 [21-02-2022(online)].pdf 2022-02-21
19 202044010145-FORM 3 [18-02-2022(online)].pdf 2022-02-18
20 202044010145-Proof of Right [18-02-2022(online)].pdf 2022-02-18
21 202044010145-Information under section 8(2) [18-02-2022(online)].pdf 2022-02-18

Search Strategy

1 SearchStrategyMatrix202044010145E_13-08-2021.pdf

ERegister / Renewals

3rd: 31 May 2024 (from 09/03/2022 to 09/03/2023)
4th: 31 May 2024 (from 09/03/2023 to 09/03/2024)
5th: 31 May 2024 (from 09/03/2024 to 09/03/2025)