Sign In to Follow Application
View All Documents & Correspondence

"Fast Branch Free Vector Division Computation"

Abstract: Methods and apparatus for double precision division/inversion vector computations on Single Instruction Multiple Data (SIMD) computing platforms are described. In one embodiment, an input argument is represented by an exponent portion and a fraction portion. The fraction portions are scaled, inverted, and multiplied to generate an inverse version of the input argument. In an embodiment, the inversion of the exponent portion may be done by changing the sign of the exponent. Other embodiments are also described.

Get Free WhatsApp Updates!
Notices, Deadlines & Correspondence

Patent Information

Application #
Filing Date
26 April 2012
Publication Number
45/2015
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
 
Parent Application

Applicants

INTEL CORPORATION
2200 MISSION COLLEGE BLVD., M/S: RNB4-150, SANTA CLAIFORNIA 95052, UNITED STATES OF AMERICA

Inventors

1. KOLESOV, ANDREY IVANOVICH
RODIONOVA ST., 188-61, NIZHNIY NOVGOROD, 603126, RUSSIAN FEDERATION
2. KURIAKIN, VALERY FEDOROVICH
B. KORNILOV STR., 2-59, NIZHNY NOVGOROD, 603106, RUSSIAN FEDERATION
3. GUSEVA, MARIA VALERIEVNA
2ND MICR., 20, 0, KSTOVO, NIZHEGORODSKAYA OBL., 607650, RUSSIAN FEDERATION

Specification

I FAST BRANCH-FREE VECTOR DIVISION COMPUTATION FIELD [0001] The present disclosure generally relates to the field of computing. More particularly, an embodiment of the invention generally relates to techniques for fast branch-free vector division computation. BACKGROUND [0002] Compared to other simple arithmetic operations, hardware implementations for division operations have been quite slow, for example, due to larger latency. Some speedup could be achieved in vector cases due to various kind of parallelism existence on modem architectures, such as via SIMD (Single- Instruction, Multiple-Data) parallelism, superscalar, and out-of-order execution. For example, reciprocal approximation with further Newton-Raphson refinement iterations method (such as discussed at http://en.wikipedia.org/wiki/Newton%E2%80%93Raphson_method) generally works well for single precision (SP) case providing up to a two-fold speedup over hardware division operation in some implementations. However, this approach loses most of its benefits on double precision (DP) side because of absence of double 2 precision reciprocal operation in current SSE architectures. Consequently, additional DP to SP and SP to DP conversions may need to be performed, along with exponent field manipulations. Further, the SP and DP above-described approximations generally require special processing of denominator with infinite (INF) or zero values, reducing parallelism and reducing potential performance gains. BRIEF DESCRIPTION OF THE DRAWINGS (0003) The detailed description is provided with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in diflerent figures indicates similar or identical items. (0004| Fig. I illustrates a flow diagram of a method according to an embodiment of the invention. (0005| Figs. 2A-2C illustrate pseudo code segments, which may be used in some embodiments. [0006] Fig. 3 illustrates a block diagram of a fast vector division, according to an embodiment. (0007) Figs. 4 and 5 illustrate block diagrams of embodiments of computing systems, which may be utilized to implement some embodiments discussed herein. 3 DETAILED DESCRIPTION [0008] In the following description, numerous specific details are set forth in order to provide a thorough understanding of various embodiments. However, various embodiments of the invention may be practiced without the specific details. In other instances, well-icnown methods, procedures, components, and circuits have not been described in detail so as not to obscure the particular embodiments of the invention. Further, various aspects of embodiments of the invention may be performed using various means, such as integrated semiconductor circuits ("hardware"), computer-readable instructions organized into one or more programs ("software"), or some combination of hardware and software. For the purposes of this disclosure reference to "logic" shall mean either hardware, software (including for example micro-code that controls the operations of a processor), or some combination thereof. [0009] Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least an implementation. The appearances of the phrase "in one embodiment" in various places in the specification may or may not be all referring to the same embodiment. 4 [OOIOJ Also, in the description and claims, the terms "coupled" and "connected," along with their derivatives, may be used. In some embodiments of the invention, "connected" may be used to indicate that two or more elements are in direct physical or electrical contact with each other. "Coupled" may mean that two or more elements are in direct physical or electrical contact. However, "coupled" may also mean that two or more elements may not be in direct contact with each other, but may still cooperate or interact with each other. [001IJ Some of the embodiments discussed herein may provide improved performance for double precision division/inversion vector computations, e.g., without requiring branches or special actions which were previously necessary. The vector division computations may be performed on SIMD computing platforms. Generally, SIMD is a technique employed to achieve data level parallelism. In particular, multiple data may be processed in multiple corresponding lanes of an SIMD vector processor (such as processors 402 and 502/504 of Figs. 4 and 5, respectively) in accordance with a single instruction. [0012] In some implementations, only one division operation is performed for several inversions. Taking the following as an example (proposed by I. I. Zavarzin, V. F. Kuryakin, V. V. Lunev, D.M. Obuvalin, V. G Ryzhih in "Optimizatsiya Vychislenij Vektomyh Funktsyj." VANT. ser. Matematicheskoe modelirovanie fizicheskih protsessov. 1997. Vol. 4 (Russian language magazine)): 5 • p = X, -X) -X^ • K, — = jc, • Xj x^R, — = X, • jCj • X4 • R, — = X| • Xj • Xj • R, where R = I * 2 ' 3 * 4 [0013] The inversion of every argument may be computed using three additional multiplications by R and three other arguments which is generally faster than four hardware divisions that involve large latency and large throughput value. More specifically, for the general case of N values, the next maximum possible performance gain estimation for this technique (D, M -throughput values for Division and Multiplier, respectively) may be: . ^ D ^'""~ D {N-\)M ,,, ,, ,^ _ + ^^ i + (N-l)M N N |0014] A weakness of this approach lies in high possibility to encounter overflow or underflow for the product x, Xj Xj x^ because of argument exponents variety which may result in incorrect outputs for the whole four-arguments bundle. 6 For example, if x^ = 0.0, but .t, * 0.0, x^ * 0.0, x^ * 0.0 , then R = INF and three of four results are incorrect: — = — = — = INF (0015| To address this problem, there needs to be a guarantee that the sum of all input exponents is not going to cause underflow or overflow. This makes the work interval quite narrow and requires argument comparisons with possible branches for special cases that cannot be processed properly in the main path. In an embodiment, the above-mentioned problems may be addressed by argument scaling and reconstruction. (0016| More specifically. Fig. I illustrates a flow diagram of a method 100 in accordance with one embodiment. At an operation 102, one or more arguments (e.g., xi, X2, etc.) are received. The arguments may be represented as x = {-lyb" ■ f, where the terms are defined as follows: 5 = {0,1} - sign for "s" in f-//, b - base (e.g., for binary case b = 2), n - exponent (E^^

Documents

Application Documents

# Name Date
1 3665-delnp-2012-Form-18-(23-05-2012).pdf 2012-05-23
1 3665-DELNP-2012-US(14)-HearingNotice-(HearingDate-03-03-2021).pdf 2021-10-17
2 3665-DELNP-2012-Correspondence to notify the Controller [03-03-2021(online)].pdf 2021-03-03
2 3665-delnp-2012-Correspondence-others-(23-05-2012).pdf 2012-05-23
3 3665-delnp-2012-GPA-(17-07-2012).pdf 2012-07-17
3 3665-DELNP-2012-ABSTRACT [23-11-2018(online)].pdf 2018-11-23
4 3665-delnp-2012-Form-3-(17-07-2012).pdf 2012-07-17
4 3665-DELNP-2012-AMENDED DOCUMENTS [23-11-2018(online)].pdf 2018-11-23
5 3665-delnp-2012-Correspondence-Others-(17-07-2012).pdf 2012-07-17
5 3665-DELNP-2012-CLAIMS [23-11-2018(online)].pdf 2018-11-23
6 3665-DELNP-2012-COMPLETE SPECIFICATION [23-11-2018(online)].pdf 2018-11-23
6 3665-delnp-2012-Assignmnet-(17-07-2012).pdf 2012-07-17
7 3665-DELNP-2012-DRAWING [23-11-2018(online)].pdf 2018-11-23
7 3665-delnp-2012-Correspondence-Others-(12-10-2012).pdf 2012-10-12
8 3665-delnp-2012-Form-3-(18-10-2012).pdf 2012-10-18
8 3665-DELNP-2012-FER_SER_REPLY [23-11-2018(online)].pdf 2018-11-23
9 3665-delnp-2012-Correspondence-Others-(18-10-2012).pdf 2012-10-18
9 3665-DELNP-2012-FORM 13 [23-11-2018(online)].pdf 2018-11-23
10 3665-DELNP-2012-FORM 3 [23-11-2018(online)].pdf 2018-11-23
10 3665-delnp-2012-Form-5.pdf 2013-04-04
11 3665-delnp-2012-Form-3.pdf 2013-04-04
11 3665-DELNP-2012-Information under section 8(2) (MANDATORY) [23-11-2018(online)].pdf 2018-11-23
12 3665-delnp-2012-Form-2.pdf 2013-04-04
12 3665-DELNP-2012-MARKED COPIES OF AMENDEMENTS [23-11-2018(online)].pdf 2018-11-23
13 3665-delnp-2012-Form-1.pdf 2013-04-04
13 3665-DELNP-2012-OTHERS [23-11-2018(online)].pdf 2018-11-23
14 3665-delnp-2012-Drawings.pdf 2013-04-04
14 3665-DELNP-2012-PETITION UNDER RULE 137 [23-11-2018(online)].pdf 2018-11-23
15 3665-delnp-2012-Description (Complete).pdf 2013-04-04
15 3665-DELNP-2012-RELEVANT DOCUMENTS [23-11-2018(online)]-1.pdf 2018-11-23
16 3665-delnp-2012-Correspondence-others.pdf 2013-04-04
16 3665-DELNP-2012-RELEVANT DOCUMENTS [23-11-2018(online)].pdf 2018-11-23
17 3665-DELNP-2012-FER.pdf 2018-05-29
17 3665-delnp-2012-Claims.pdf 2013-04-04
18 3665-delnp-2012-Abstract.pdf 2013-04-04
19 3665-delnp-2012-Claims.pdf 2013-04-04
19 3665-DELNP-2012-FER.pdf 2018-05-29
20 3665-delnp-2012-Correspondence-others.pdf 2013-04-04
20 3665-DELNP-2012-RELEVANT DOCUMENTS [23-11-2018(online)].pdf 2018-11-23
21 3665-delnp-2012-Description (Complete).pdf 2013-04-04
21 3665-DELNP-2012-RELEVANT DOCUMENTS [23-11-2018(online)]-1.pdf 2018-11-23
22 3665-delnp-2012-Drawings.pdf 2013-04-04
22 3665-DELNP-2012-PETITION UNDER RULE 137 [23-11-2018(online)].pdf 2018-11-23
23 3665-delnp-2012-Form-1.pdf 2013-04-04
23 3665-DELNP-2012-OTHERS [23-11-2018(online)].pdf 2018-11-23
24 3665-DELNP-2012-MARKED COPIES OF AMENDEMENTS [23-11-2018(online)].pdf 2018-11-23
24 3665-delnp-2012-Form-2.pdf 2013-04-04
25 3665-delnp-2012-Form-3.pdf 2013-04-04
25 3665-DELNP-2012-Information under section 8(2) (MANDATORY) [23-11-2018(online)].pdf 2018-11-23
26 3665-DELNP-2012-FORM 3 [23-11-2018(online)].pdf 2018-11-23
26 3665-delnp-2012-Form-5.pdf 2013-04-04
27 3665-delnp-2012-Correspondence-Others-(18-10-2012).pdf 2012-10-18
27 3665-DELNP-2012-FORM 13 [23-11-2018(online)].pdf 2018-11-23
28 3665-DELNP-2012-FER_SER_REPLY [23-11-2018(online)].pdf 2018-11-23
28 3665-delnp-2012-Form-3-(18-10-2012).pdf 2012-10-18
29 3665-delnp-2012-Correspondence-Others-(12-10-2012).pdf 2012-10-12
29 3665-DELNP-2012-DRAWING [23-11-2018(online)].pdf 2018-11-23
30 3665-delnp-2012-Assignmnet-(17-07-2012).pdf 2012-07-17
30 3665-DELNP-2012-COMPLETE SPECIFICATION [23-11-2018(online)].pdf 2018-11-23
31 3665-delnp-2012-Correspondence-Others-(17-07-2012).pdf 2012-07-17
31 3665-DELNP-2012-CLAIMS [23-11-2018(online)].pdf 2018-11-23
32 3665-delnp-2012-Form-3-(17-07-2012).pdf 2012-07-17
32 3665-DELNP-2012-AMENDED DOCUMENTS [23-11-2018(online)].pdf 2018-11-23
33 3665-delnp-2012-GPA-(17-07-2012).pdf 2012-07-17
33 3665-DELNP-2012-ABSTRACT [23-11-2018(online)].pdf 2018-11-23
34 3665-delnp-2012-Correspondence-others-(23-05-2012).pdf 2012-05-23
34 3665-DELNP-2012-Correspondence to notify the Controller [03-03-2021(online)].pdf 2021-03-03
35 3665-DELNP-2012-US(14)-HearingNotice-(HearingDate-03-03-2021).pdf 2021-10-17
35 3665-delnp-2012-Form-18-(23-05-2012).pdf 2012-05-23

Search Strategy

1 search_16-04-2018.pdf