Specification
I
FAST BRANCH-FREE VECTOR DIVISION COMPUTATION
FIELD
[0001] The present disclosure generally relates to the field of computing.
More particularly, an embodiment of the invention generally relates to techniques for fast branch-free vector division computation.
BACKGROUND
[0002] Compared to other simple arithmetic operations, hardware
implementations for division operations have been quite slow, for example, due to
larger latency. Some speedup could be achieved in vector cases due to various kind
of parallelism existence on modem architectures, such as via SIMD (Single-
Instruction, Multiple-Data) parallelism, superscalar, and out-of-order execution. For
example, reciprocal approximation with further Newton-Raphson refinement
iterations method (such as discussed at
http://en.wikipedia.org/wiki/Newton%E2%80%93Raphson_method) generally
works well for single precision (SP) case providing up to a two-fold speedup over hardware division operation in some implementations. However, this approach loses most of its benefits on double precision (DP) side because of absence of double
2
precision reciprocal operation in current SSE architectures. Consequently, additional DP to SP and SP to DP conversions may need to be performed, along with exponent field manipulations. Further, the SP and DP above-described approximations generally require special processing of denominator with infinite (INF) or zero values, reducing parallelism and reducing potential performance gains.
BRIEF DESCRIPTION OF THE DRAWINGS
(0003) The detailed description is provided with reference to the
accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in diflerent figures indicates similar or identical items.
(0004| Fig. I illustrates a flow diagram of a method according to an
embodiment of the invention.
(0005| Figs. 2A-2C illustrate pseudo code segments, which may be used in
some embodiments.
[0006] Fig. 3 illustrates a block diagram of a fast vector division, according to
an embodiment.
(0007) Figs. 4 and 5 illustrate block diagrams of embodiments of computing
systems, which may be utilized to implement some embodiments discussed herein.
3
DETAILED DESCRIPTION
[0008] In the following description, numerous specific details are set forth in
order to provide a thorough understanding of various embodiments. However, various embodiments of the invention may be practiced without the specific details. In other instances, well-icnown methods, procedures, components, and circuits have not been described in detail so as not to obscure the particular embodiments of the invention. Further, various aspects of embodiments of the invention may be performed using various means, such as integrated semiconductor circuits ("hardware"), computer-readable instructions organized into one or more programs ("software"), or some combination of hardware and software. For the purposes of this disclosure reference to "logic" shall mean either hardware, software (including for example micro-code that controls the operations of a processor), or some combination thereof.
[0009] Reference in the specification to "one embodiment" or "an
embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least an implementation. The appearances of the phrase "in one embodiment" in various places in the specification may or may not be all referring to the same embodiment.
4
[OOIOJ Also, in the description and claims, the terms "coupled" and
"connected," along with their derivatives, may be used. In some embodiments of the invention, "connected" may be used to indicate that two or more elements are in direct physical or electrical contact with each other. "Coupled" may mean that two or more elements are in direct physical or electrical contact. However, "coupled" may also mean that two or more elements may not be in direct contact with each other, but may still cooperate or interact with each other.
[001IJ Some of the embodiments discussed herein may provide improved
performance for double precision division/inversion vector computations, e.g., without requiring branches or special actions which were previously necessary. The vector division computations may be performed on SIMD computing platforms. Generally, SIMD is a technique employed to achieve data level parallelism. In particular, multiple data may be processed in multiple corresponding lanes of an SIMD vector processor (such as processors 402 and 502/504 of Figs. 4 and 5, respectively) in accordance with a single instruction.
[0012] In some implementations, only one division operation is performed for
several inversions. Taking the following as an example (proposed by I. I. Zavarzin, V. F. Kuryakin, V. V. Lunev, D.M. Obuvalin, V. G Ryzhih in "Optimizatsiya Vychislenij Vektomyh Funktsyj." VANT. ser. Matematicheskoe modelirovanie fizicheskih protsessov. 1997. Vol. 4 (Russian language magazine)):
5
• p
= X, -X) -X^ • K,
— = jc, • Xj x^R,
— = X, • jCj • X4 • R,
— = X| • Xj • Xj • R,
where R =
I * 2 ' 3 * 4
[0013] The inversion of every argument may be computed using three
additional multiplications by R and three other arguments which is generally faster than four hardware divisions that involve large latency and large throughput value. More specifically, for the general case of N values, the next maximum possible performance gain estimation for this technique (D, M -throughput values for Division and Multiplier, respectively) may be:
. ^ D
^'""~ D {N-\)M ,,, ,, ,^
_ + ^^ i + (N-l)M
N N
|0014] A weakness of this approach lies in high possibility to encounter
overflow or underflow for the product x, Xj Xj x^ because of argument exponents
variety which may result in incorrect outputs for the whole four-arguments bundle.
6 For example, if x^ = 0.0, but .t, * 0.0, x^ * 0.0, x^ * 0.0 , then R = INF and three of four results are incorrect: — = — = — = INF
(0015| To address this problem, there needs to be a guarantee that the sum of
all input exponents is not going to cause underflow or overflow. This makes the work interval quite narrow and requires argument comparisons with possible branches for special cases that cannot be processed properly in the main path. In an embodiment, the above-mentioned problems may be addressed by argument scaling and reconstruction.
(0016| More specifically. Fig. I illustrates a flow diagram of a method 100 in
accordance with one embodiment. At an operation 102, one or more arguments (e.g., xi, X2, etc.) are received. The arguments may be represented as x = {-lyb" ■ f, where the terms are defined as follows:
5 = {0,1} - sign for "s" in f-//,
b - base (e.g., for binary case b = 2),
n - exponent (E^^
Documents
Application Documents
| # |
Name |
Date |
| 1 |
3665-delnp-2012-Form-18-(23-05-2012).pdf |
2012-05-23 |
| 1 |
3665-DELNP-2012-US(14)-HearingNotice-(HearingDate-03-03-2021).pdf |
2021-10-17 |
| 2 |
3665-DELNP-2012-Correspondence to notify the Controller [03-03-2021(online)].pdf |
2021-03-03 |
| 2 |
3665-delnp-2012-Correspondence-others-(23-05-2012).pdf |
2012-05-23 |
| 3 |
3665-delnp-2012-GPA-(17-07-2012).pdf |
2012-07-17 |
| 3 |
3665-DELNP-2012-ABSTRACT [23-11-2018(online)].pdf |
2018-11-23 |
| 4 |
3665-delnp-2012-Form-3-(17-07-2012).pdf |
2012-07-17 |
| 4 |
3665-DELNP-2012-AMENDED DOCUMENTS [23-11-2018(online)].pdf |
2018-11-23 |
| 5 |
3665-delnp-2012-Correspondence-Others-(17-07-2012).pdf |
2012-07-17 |
| 5 |
3665-DELNP-2012-CLAIMS [23-11-2018(online)].pdf |
2018-11-23 |
| 6 |
3665-DELNP-2012-COMPLETE SPECIFICATION [23-11-2018(online)].pdf |
2018-11-23 |
| 6 |
3665-delnp-2012-Assignmnet-(17-07-2012).pdf |
2012-07-17 |
| 7 |
3665-DELNP-2012-DRAWING [23-11-2018(online)].pdf |
2018-11-23 |
| 7 |
3665-delnp-2012-Correspondence-Others-(12-10-2012).pdf |
2012-10-12 |
| 8 |
3665-delnp-2012-Form-3-(18-10-2012).pdf |
2012-10-18 |
| 8 |
3665-DELNP-2012-FER_SER_REPLY [23-11-2018(online)].pdf |
2018-11-23 |
| 9 |
3665-delnp-2012-Correspondence-Others-(18-10-2012).pdf |
2012-10-18 |
| 9 |
3665-DELNP-2012-FORM 13 [23-11-2018(online)].pdf |
2018-11-23 |
| 10 |
3665-DELNP-2012-FORM 3 [23-11-2018(online)].pdf |
2018-11-23 |
| 10 |
3665-delnp-2012-Form-5.pdf |
2013-04-04 |
| 11 |
3665-delnp-2012-Form-3.pdf |
2013-04-04 |
| 11 |
3665-DELNP-2012-Information under section 8(2) (MANDATORY) [23-11-2018(online)].pdf |
2018-11-23 |
| 12 |
3665-delnp-2012-Form-2.pdf |
2013-04-04 |
| 12 |
3665-DELNP-2012-MARKED COPIES OF AMENDEMENTS [23-11-2018(online)].pdf |
2018-11-23 |
| 13 |
3665-delnp-2012-Form-1.pdf |
2013-04-04 |
| 13 |
3665-DELNP-2012-OTHERS [23-11-2018(online)].pdf |
2018-11-23 |
| 14 |
3665-delnp-2012-Drawings.pdf |
2013-04-04 |
| 14 |
3665-DELNP-2012-PETITION UNDER RULE 137 [23-11-2018(online)].pdf |
2018-11-23 |
| 15 |
3665-delnp-2012-Description (Complete).pdf |
2013-04-04 |
| 15 |
3665-DELNP-2012-RELEVANT DOCUMENTS [23-11-2018(online)]-1.pdf |
2018-11-23 |
| 16 |
3665-delnp-2012-Correspondence-others.pdf |
2013-04-04 |
| 16 |
3665-DELNP-2012-RELEVANT DOCUMENTS [23-11-2018(online)].pdf |
2018-11-23 |
| 17 |
3665-DELNP-2012-FER.pdf |
2018-05-29 |
| 17 |
3665-delnp-2012-Claims.pdf |
2013-04-04 |
| 18 |
3665-delnp-2012-Abstract.pdf |
2013-04-04 |
| 19 |
3665-delnp-2012-Claims.pdf |
2013-04-04 |
| 19 |
3665-DELNP-2012-FER.pdf |
2018-05-29 |
| 20 |
3665-delnp-2012-Correspondence-others.pdf |
2013-04-04 |
| 20 |
3665-DELNP-2012-RELEVANT DOCUMENTS [23-11-2018(online)].pdf |
2018-11-23 |
| 21 |
3665-delnp-2012-Description (Complete).pdf |
2013-04-04 |
| 21 |
3665-DELNP-2012-RELEVANT DOCUMENTS [23-11-2018(online)]-1.pdf |
2018-11-23 |
| 22 |
3665-delnp-2012-Drawings.pdf |
2013-04-04 |
| 22 |
3665-DELNP-2012-PETITION UNDER RULE 137 [23-11-2018(online)].pdf |
2018-11-23 |
| 23 |
3665-delnp-2012-Form-1.pdf |
2013-04-04 |
| 23 |
3665-DELNP-2012-OTHERS [23-11-2018(online)].pdf |
2018-11-23 |
| 24 |
3665-DELNP-2012-MARKED COPIES OF AMENDEMENTS [23-11-2018(online)].pdf |
2018-11-23 |
| 24 |
3665-delnp-2012-Form-2.pdf |
2013-04-04 |
| 25 |
3665-delnp-2012-Form-3.pdf |
2013-04-04 |
| 25 |
3665-DELNP-2012-Information under section 8(2) (MANDATORY) [23-11-2018(online)].pdf |
2018-11-23 |
| 26 |
3665-DELNP-2012-FORM 3 [23-11-2018(online)].pdf |
2018-11-23 |
| 26 |
3665-delnp-2012-Form-5.pdf |
2013-04-04 |
| 27 |
3665-delnp-2012-Correspondence-Others-(18-10-2012).pdf |
2012-10-18 |
| 27 |
3665-DELNP-2012-FORM 13 [23-11-2018(online)].pdf |
2018-11-23 |
| 28 |
3665-DELNP-2012-FER_SER_REPLY [23-11-2018(online)].pdf |
2018-11-23 |
| 28 |
3665-delnp-2012-Form-3-(18-10-2012).pdf |
2012-10-18 |
| 29 |
3665-delnp-2012-Correspondence-Others-(12-10-2012).pdf |
2012-10-12 |
| 29 |
3665-DELNP-2012-DRAWING [23-11-2018(online)].pdf |
2018-11-23 |
| 30 |
3665-delnp-2012-Assignmnet-(17-07-2012).pdf |
2012-07-17 |
| 30 |
3665-DELNP-2012-COMPLETE SPECIFICATION [23-11-2018(online)].pdf |
2018-11-23 |
| 31 |
3665-delnp-2012-Correspondence-Others-(17-07-2012).pdf |
2012-07-17 |
| 31 |
3665-DELNP-2012-CLAIMS [23-11-2018(online)].pdf |
2018-11-23 |
| 32 |
3665-delnp-2012-Form-3-(17-07-2012).pdf |
2012-07-17 |
| 32 |
3665-DELNP-2012-AMENDED DOCUMENTS [23-11-2018(online)].pdf |
2018-11-23 |
| 33 |
3665-delnp-2012-GPA-(17-07-2012).pdf |
2012-07-17 |
| 33 |
3665-DELNP-2012-ABSTRACT [23-11-2018(online)].pdf |
2018-11-23 |
| 34 |
3665-delnp-2012-Correspondence-others-(23-05-2012).pdf |
2012-05-23 |
| 34 |
3665-DELNP-2012-Correspondence to notify the Controller [03-03-2021(online)].pdf |
2021-03-03 |
| 35 |
3665-DELNP-2012-US(14)-HearingNotice-(HearingDate-03-03-2021).pdf |
2021-10-17 |
| 35 |
3665-delnp-2012-Form-18-(23-05-2012).pdf |
2012-05-23 |
Search Strategy