A Hardware/Software Paralalised Implementation Of Elliptic Curve

< Back

A Hardware/Software Paralalised Implementation Of Elliptic Curve Digital Signature Algorithm

Abstract: An authentication system and a method for signing data are disclosed. The system uses a hardware software partitioned approach. In its implementation the system of the invention compares favourably with performance and other parameters with a complete hardware or full software implementation. Particularly, advantageously there is a reduced gate count. Also as disclosed in the invention the system makes it difficult for hackers to attack the system using simple power analysis.

Get Free WhatsApp Updates!
Notices, Deadlines & Correspondence

Patent Information

Application #

Filing Date

03 June 2005

Publication Number

23/2007

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Patent Number

Legal Status

Grant Date

2019-09-20

Renewal Date

Applicants

Tata Consultancy Services Ltd

Air India Building, 11th floor Nariman point Mumbai-400 021,

Inventors

1. Sarangarajan A.

Tata Consultancy Services Ltd, Deccan Park, Plot No.1, Software Units Layout, Madhapur, Hydrabad-500 081,

2. Viswanath Rao T.

Tata Consultancy Services Ltd, Abhilash Building, Plot No.96, EPIP Industrial Area, Whitefield Road, Bangalore-560 066, Karnataka, India

3. Murugesh R.

Tata Consultancy Services Ltd, Abhilash Building, Plot No.96, EPIP Industrial Area, Whitefield Road, Bangalore-560 066, Karnataka, India

4. Srinidhi N.

Tata Consultancy Services Ltd, Abhilash Building, Plot No.96, EPIP Industrial Area, Whitefield Road, Bangalore-560 066, Karnataka, India

5. Sreenaiah G.

Tata Consultancy Services Ltd, Abhilash Building, Plot No.96, EPIP Industrial Area, Whitefield Road, Bangalore-560 066, Karnataka, India

Specification

FORM 2
THE PATENTS ACT, 1970
(39 OF 1970)
&
THE PATENTS RULES, 2003
Provisional
COMPLETE SPECIFICATION
(Section 10 and Rule 13)

<
z
5
E o

1. Title of the invention
A HARDWARE/SOFTWARE PARALLELIZED IMPLEMENTATION OF THE ELLIPTIC CURVE DIGITAL SIGNATURE ALGORITHM

2. Applicants)
(a) Name: Tata Consultancy Services Ltd.
(b) Nationality: An Indian Company.
(c) Address : Air India Building, 11th floor, Nariman
Point, Mumbai 400 021.
3. Preamble to the Description
The following specification particularly describes the invention and the
manner in which it is to be performed.
Complete specification

treated as provisional
specification 4/5 9(3)
of the patent Act, 1970

(Dr. Ujawala Haldankar)
examiner of Patent & Design

2
4. DESCRIPTION
FIELD OF INVENTION;
The present invention is in the field of cryptography. The invention pertains to secure and efficient generation of digital signatures through a hardware-software parallelized system consisting of a hardware accelerator and a microprocessor, based on the Elliptic Curve Digital Signature Algorithm (ECDSA).
BACKGROUND OF INVENTION
The digital signature technology plays a key role in the performance of secure electronic transactions. Electronic transactions such as exchange of email and financial transactions have become widespread. To be able to accept these as valid transactions, it has become necessary to digitally "sign" them. Digital signatures authenticate the sender.
Digital signatures are enabled through "Public Key Cryptosystems," a concept that took roots from the seminal result by Diffie and Hellman who showed that it was possible for two users to share a secret over an insecure channel. This gave rise to the study of one-way functions that are the building blocks of public-key cryptosystems.
In a public-key cryptosystem, each user generates a pair of keys, a public and a private key. The private key is used for signing a document whereas the public key is used for verifying the signature. The public key, along with some domain parameters, is broadcast to the other users through, for instance, a public directory; while the private key is secured with the user. The private key is a large number, about 50 to 200 digits long, while the public key may be an

3
integer (for RSA, ElGamal) or a point on a two-dimensional curve (for Elliptic Curve Cryptosystems).
Public-key cryptosystems are built on the premise of the existence of one-way functions, i.e. functions that are easy to compute one-way but hard to invert. Recovering a user's private key from her public key and domain parameters is equivalent to inverting this function. The different public key cryptosystems currently in use are classified according to the one-way function that they are based on.
In public key cryptography, the digital signature binds a message to a signer and assures the verifier of the origin of the message. The message can be any electronic document (a text or an image file). While anyone can verify a signature, only the signer could have generated the signature, as only she has access to the private key. The signature usually takes the form of a pair of large numbers that are dependent on the message and the signer's private key.
ELLIPTIC CURVE CRYPTOSYSTEMS
Elliptic Curve Cryptography, hereafter also referred to as ECC, was first proposed independently by Neal Koblitz and Victor Miller in 1985. Since then ECC has received a lot of attention due to its relatively short key size that makes it an attractive alternative to RSA particularly on constrained devices such as smart cards.
In ECC, one starts with a finite field. Informally, a field is a set of elements with an addition and multiplication operation defined on it that satisfies some basic properties. An example is Fp = {0,1.. .p-1}, the set of integers mod p where p is a prime number. Addition and multiplication are defined modulo p. Another example is F2m, the set of all polynomials of degree less than m with

4
coefficients 0 or 1. Addition of polynomials is performed coefficient-wise where the coefficients are added modulo 2. Multiplication is performed modulo a reduction polynomial of degree m (i.e. to determine the product of two polynomials, first multiply them and divide by the reduction polynomial and take the remainder).
Next, an elliptic curve over the field is defined. An elliptic curve over F2111 is the set of all points (x,y) with x, y e F2m satisfying
y2 + xy = x3 + ax2 + b where a, b e F2m. A (slightly) different equation characterizes an elliptic curve over a prime field.
The points on the curve together with a special "point at infinity'' (denoted by O) form a group (think of a group as a field except that it has only one operation, namely addition, defined on it) that we denote as E(F2m). For the group addition and other properties of the elliptic curve, refer Johnson, D., Menezes, A., The Elliptic Curve Digital Signature Algorithm (ECDSA), Technical Report CORR 99-34, August 1999.
To generate a key pair, the public and private keys of a public key cryptosystem, we need a base point G ε E(F2m) that generates a subgroup of a large order n (the subgroup generated by G is the largest group within E(F2m) containing G). Next we generate a random number d and compute the point Q = d.G (which is the addition operation applied to G, (d-1) times). Then Q is our public key while d is the private key.
The set (m, a, b, G, n) is referred to as the domain parameters of the elliptic curve. It is required that n be a large prime number in order to ensure a cryptographically strong curve. As a result, the underlying field defined by m should be large (it is recommended that m > 160). There are further restrictions

5
on these parameters to avoid some known attacks that apply to special classes of curves; refer Johnson D. et al cited above for more details on choosing domain parameters.
The Elliptic Curve Digital Signature Algorithm or ECDSA is a standard developed by IEEE for signing and verifying messages using ECC. In ECDSA, the message is first hashed using a standard hashing algorithm such as SHA-1. The hashed message is then transformed into a signature, a pair of bit strings, using the sender's private key. Both the message and the signature are sent to the receiver, who upon receiving the message, computes its hash and uses the sender's public key to verify the signature on the hash.
The standard signature generation algorithm based on ECC is described below,
where the symbols G, n, t, d denote the following:
G is the base point of the curve.
n is the order of the base point G.
t is the message.
d is the signer's private key and d < n.
I. Compute the message digest e = SHA-1 (t).
II. Generate a random number k, 0 < k < n.
III. Compute the value k.G = (x1, y1). Let r = x1 mod n. If r = 0, go to II.
IV. Compute z1 = k-1 mod n.
V. Compute s = zl(e + dr) mod n. If s = 0, go to II.
VI. Output the signature (r, s).
Performance of the ECDSA is measured by the time taken to generate and verify a signature. Where performance does not meet the requirements of the application, hardware accelerators are used. While hardware accelerators

6
improve performance, they would add to the gate count of the silicon. This gate count needs to be minimized.
PRIOR ART:
The traditional approaches to implementation of ECDSA have relied on a purely hardware or a purely software implementation. In the first approach, all the ECC algorithms and the big integer arithmetic required for generating a signature are implemented in hardware. This can typically lead to a gate count in the order of 100,000 or more. The timing results for a software approach depend on the processor used. For a Mitsubishi 16 bit 10MHz processor (M16C), results have been published whereby a signature has been generated in 150 ms (refer Hasegawa, T., Nakajima, J., and Matsui, M. A Small and Fast Software Implementation of Elliptic Curve Cryptosystems over GF(p) on a 16-Bit Microcomputer, IEEE Trans. Fundamentals, E82-A, 1 (Jan. 1999), 98-106.). The best-known implementation of the elliptic curve point multiplication, the computationally expensive step in signature generation, on an 8051 processor takes nearly 2 seconds (see [2]). Also this uses a 134-bit key, which is not considered cryptographically secure.
In the present invention, an efficient implementation of the ECDSA is disclosed, using a hardware-software partitioned approach, which compares well in performance with a complete hardware implementation, but with a reduced gate count. The approach is much better in performance compared to a full software implementation on the same microcontroller.
Patents in the field deal with optimizing the time taken for the generating the signature. The point multiplication operation (step III), in particular, is the focus of many works.

7
United States Patent No. 6279110 by Johnson et al discloses a modified ECDSA algorithm where the inversion operation on step IV is replaced by generating another random number 't' and submitting triple (s’ r, c) as the signature. Here r is as defined earlier; c = k•t mod n and s' = t•(e+d•r) mod n. The original (r, s) pair is recovered by observing that s = c -1s' mod n. This is an optimized way of generating signature on constrained devices such as Smart Cards where inversion is also considered expensive. The 's' value is recovered either by the terminal connected to the smart card reader or the verifier. However, if the value is recovered by the verifier, then a longer signature is transmitted thus consuming more bandwidth.
United States Patent No. 6782100 by Vanstone et al discloses work related to hardware implementation of Step HI, point multiplication, on elliptic curves. This has impact on the performance of signature generation algorithm. However, even better performance could be obtained by enhancing the scope of optimization of the algorithm.
United States Patent No. 6772184 by Chang discloses a method for efficient field inversion operation. The 160-bit curve is defined over a prime field. This work is more on optimizing the field inversion operation in point multiplication.
United States Patent No. 6751318 by Crandall discloses a signature algorithm which is different from ECDSA. The scheme generates two points on an elliptic curve over a prime field that are compared to verify the signature. The scheme provides a method for deducing the value of 'x' coordinate of a sum of two points using only the x-coordinates of the original two points. Resorting to projective coordinates optimizes elliptic addition and doubling operations and eliminates field inversion. The entire algorithm is implemented in software. The work is a scheme to present an efficient way of verifying rather than generating a signature and has no hardware component in its implementation.

8
United States Patent No. 6714648 by Miyazaki et al discloses hardware and software implementation of ECDSA with a 160-bit curve over a prime field using a residual multiplier implemented in hardware. The implementation in hardware has a memory-mapped input/output and the rest of the computing is done in software. It also uses pre-computations which help in speeding up the point multiplication algorithm. Further, the field inversion operation is implemented using residual multiplication.
United States Patent No. 6704870 by Vanstone et al discloses the use of pre-stored values of 'k' and 'kG' to circumvent the expensive point multiplication step. New signing pairs (k, kG) are created by combining pairs of pre-stored elements. The combination is performed partly on the smart card and partly on the transaction device in such a way that the identity of the signing elements is not revealed. However, this procedure assumes significant computing power as well as availability of the necessary software on the transaction device. Another aspect is that the signature generation time includes the cost of communication between the card and device thereby increasing use of resources.
United States Patent No. 6263081 by Miyaji et al discloses a method of implementing point multiplication, step HI, in software using certain pre-computations. Although this achieves optimization on the timing of point multiplication, this results in increasing the memory requirement. The scope of this patent is also limited to point multiplication.
United States Patent No. 6252959 by Paar et al discloses a method of point multiplier implementation that reduces the number of point doubling operations. The method achieves timing optimization. However, it requires more memory. The scope of this patent is also restricted to optimizing the point multiplier operation.

9
United States Patent No. 6671709 by Glaser et al discloses a hardware multiplier architecture for use in public key cryptosystems. The multiplier cells are used for modular polynomial basis multiplication. The scope of this patent is restricted only to optimization of the multiplier module, i.e., step HI.
United States Application No. 2003/01522188 by Coron et al discloses optimization of the point multiplication step by storing some pre-selected set of pairs (ki, Pi) where Pi = kiG. These pre-selected pairs are then used to generate a new random pair (k, kG) through the use of the Frobenious operator on a Koblitz curve. The optimization is a reduction in the number of elliptic operations for performing point multiplication.
United States Patent Application No. 2003/0194086 by Lambert discloses a method for strengthening ECDSA against power analysis. The method introduces noise by using random values during the generation of the signature. Specifically, the private key d is split into two parts that are updated during each iteration in such a way that they sum to d. These parts are used instead of d in the computation of the signature. The point multiplication (step HI) is also randomized to prevent leakage of the per-message secret k.
United States Patent Application No. 2001/004872 by Handschuh discloses an attack based on differential power analysis to recover the private key 'd' during operations of the form dP for varying point P on the elliptic curve. A counter-measure to this attack is also proposed wherein the value dP is computed indirectly through the use of a random number of same magnitude as 'd'.
United States Patent Application No. 2004/0091105 by Ho Won Kim et al discloses an apparatus for hyperelliptic curve crypto processing. The entire crypto algorithm is implemented in hardware which may imply higher

10
area/hardware requirement. Parallelism is used in this implementation with respect to inversion operation with step IV and the point multiplication, i.e., step in, in hardware.
United States Patent Application No. 2004/0001590 by Eisentraeger et al discloses an implementation of the double and add operation in point multiplier computation. It points out several optimizations in field multiplication, squaring and point additions which are part of the point multiplication unit. The scope of this patent is restricted to optimizing the point multiplication operation.
United States Patent No. 6,738,478 by Vanstone et al discloses implementation of the point multiplication using Montgomery method in signature generation. It stresses the fact that functional 'double and add' method for point multiplication is susceptible to power signature attacks. As 'double and add' involves multiply operations which generate unique power signatures, observation of these signatures will lead to sequences of l's and 0's thus deriving the scalar value k.
SUMMARY OF INVENTION
In the present invention, the algorithm is partitioned between the hardware and software. This approach optimizes on both timing and gate count. It provides significant benefits over a purely hardware or software implementation. For the invention, the steps in the ECDSA, described in the previous section, are modified to a certain extent.
We follow an approach that uses a combination of hardware and software components. The point multiplication step is implemented in hardware while all the big-integer arithmetic (multiplication, inversion and reduction) are implemented in software. By performing most of the software operations in parallel with the hardware, we achieve near equal performance of a full

11
hardware implementation. The parallelism allows us to effectively reduce the number of big-integer multiplications and reductions, counted towards the time to compute a signature, from two to one. Thus, the software execution time beyond the execution in hardware is reduced. This enables the total performance to be closer to the performance of a full hardware implementation, but reduces the hardware requirement itself. The implementation of some of the components in software allows us to reduce the gate count. Parallelism also lets us reduce the code size. This is enabled by the fact that the time budget for software components is more than adequate, facilitating code optimization rather than time optimization.
It is also an object of this invention to provide security from Power Analysis attacks during the point multiplication step. The power consumed by the hardware is effectively masked by the power used by the software executing in parallel.
The invention disclosed herein is a crypto sub-system that could be applied in systems such as smartcards, desktops, servers etc. The hardware-software partitioned sub-system of the invention as depicted in Fig 1 (described below) would be common to all systems. The manner in which this sub-system would be integrated with the total system would depend on the system. For instance, in a smartcard the sub-system would be integrated within a System on Chip or , SoC, which would then be embedded in a card; in a server the sub-system would typically be an add-on card which would sit on the system bus.
BRIEF DESCRIPTION OF DRAWINGS
The invention is now be described with reference to the accompanying figures, which are

12
Fig. 1 is a schematic diagram of the proposed ECDSA architecture showing the
hardware-software partitioning between the microcontroller and the hardware
accelerator and the I/O interface between them.
Fig. 2 is a flowchart describing the flow of execution of the algorithm
highlighting the parallel paths and the steps that are executed in hardware and
software.
Fig. 3 is the top-level block diagram of the point multiplication in hardware.
Fig. 4 shows the full unrolling of the F(U,V) loop with the necessary logic levels required for final output.
Fig. 5 shows the F (U,V) loop unrolling meeting the technology limitations and logic reuse requirements.
DESCRIPTION OF THE PREFERRED EMBODIMENT
We describe the invention on a platform, which consists of the 8051 processor, a hardware accelerator and the I/O interface between them. Figure 1 shows the architecture along with the interface between the hardware and software components. 8051 microcontroller has been used for the sake of specific example and it is not to be construed as a limitation of the invention.. The same principles apply for any other microcontroller.
The hardware accelerator consists of a set of control and status registers and the hardware logic for performing the elliptic curve point multiplication algorithm. It is memory-mapped into the 8051 address space. 8051 communicates with the accelerator by selecting the assigned memory address and performing a read or write of the control / status registers. Depending on the commands written into the command registers the hardware logic is invoked for performing a hardware operation. Status of execution of the hardware is ascertained by reading the status registers.

13
Referring to the standard signature generation algorithm described earlier, the main embodiments of implementation of the invention are the following:
• Since steps (III) and (TV) are independent, compute the two in parallel. Step (DT) being computationally more intensive, implement the same in hardware.
• Sub-divide step (V) into the following

a) Compute z2 = zl*d mod n
b) Compute z3 = zl*e mod n
c) Compute s = (z3 + z2*r) mod n
Step (a) and (b) are, once again, independent of step (ID). These could be performed in parallel with step (HI), but in sequence with step (IV). Only step (c) needs to be performed subsequent to completion of step (TO).
While the traditional approach to ECDSA uses two multiplications and two reductions, this approach requires three multiplications and three reductions. However, two of these are executed in parallel with the hardware and do not count towards the total time to generate a signature. This enhances the performance of the algorithm.
Thus, the steps involved in ECDSA as modified for this invention are:
A. Compute the message digest e = SHA-1 (t).
B. Generate a random number k, 0 < k < n.
C. Compute the value kG = (x1, y1). Let r = x1 mod n. If r = 0, go to B.
D. Compute zl = k -1 mod n.

14
E. Compute z2 = zl*d mod n
F. Compute z3 = zl*e mod n
G. Compute s = (z3 + z2*r) mod n. If s = 0, go to B.
H. Output the signature (r, s).
These steps are computed in a parallelized form as shown in Figure 2. The hardware-software partition is also depicted in the figure.
In the description hereafter, wherever the above modified algorithm is referred, the steps are referred as A to H as described above.
The message digest e in step (A) is assumed to be computed externally and is given as input to the microcontroller.
The random number in step (B) is generated using linear feedback shift registers. The feedback taps are chosen to make the resulting sequence a maximum-length sequence. This step is performed in software on the 8051.
In step (C), we implement the double-and-add method for elliptic curve point multiplication algorithm to compute kG using k generated in step (B). This is an expensive step and is thus done in hardware. The top-level block diagram of this operation is shown in Figure 3.
Field multiplication is the most frequently used operation in step (C). It contributes the highest to the timing of this step. This operation is computed as follows
Let A(a0, a1,..., am-1) and B(b0, b1,..., bm-1) ε F2m
Then A * B = C(co, c1,... cm-1) where each ci is computed as follows.
Step 1: U(u0, u1,..., Um-1) = A(a0, a1,..., am-1)

15
Step 2: V(v0, v1 ..., vm-1) = B(b0, b1,... ,bm-1)
Step 3: for k = 0 to m-1 do
Step 3.1: ck = F(U,V)
Step 3.2: U = LCS( U) and V = LCS( V)
where LCS is the 'left cyclic shift' operation. Step 4: return C = (c0, c1,..., Cm-1)
Where
p-2
F(U,V) = Σ UJ (K+1) &V J(P-K)
k=l
UJ(k+1) is J(K+l)th bit of U and
VJ(p-k)is J(P-K)th bit of V
p is computed by the formula p=mT+l, where T is the minimal type of the Gaussian bases for F2m and m is the order of the binary field. For m=163, T is 4. Therefore, p=653. J is an array of size (p-1) that is static and could be pre-computed.
The computation of F(U,V) is the innermost loop of point multiplication operation. Thus, this contributes the highest to the timing of the operation. The performance could be improved by fully unrolling the F(U,V) loop. A series of 653 AND gates could be designed with hardwired U and V bits as inputs, as per the lookup table J. The results could be XORed over multiple stages, assuming only a 2-input XOR gate availability. An observation of the F(U,V) computation reveals that each bit of U is ANDed with 4 specific bits of V. Fig. 4 shows the unrolling with the common signals being shared.
However, the number of gates to implement a full unrolling will be high. Further, the gate delays will limit the number of logic levels that could be completed within a clock cycle. This would lead to timing violations. Hence,

16
instead of a full unrolling, a lesser amount of unrolling meeting the limitations of the technology needs to be implemented. For illustration, for Xilinx Virtex-II FPGAs, a 163-fold unrolling has been tested by the inventors. The logic of Fig. 4 is used on the reduced extent (163-fold) of unrolling. This logic is reused for the 4 iterations, with different U inputs. Thus, the extent of unrolling also needs to be determined by the amount of logic that can be expended (or the reuse required). Fig. 5 depicts the final logic that is used to compute F(U,V). Additional blocks are added, at the ends, to complete the field multiplication operation.
Step (D) uses the Extended Euclidean Algorithm (see Algorithm 14.61 in Menezes, A. J., van Oorschot, P. C, and Vanstone, S. A. Handbook of Applied Cryptography. CRC Press, 1997 for a description) for computing the inverse of k mod n. This is implemented in software. Steps (E) and (F) use multiplication and modular reduction of big integers to compute (zl*d mod n) and (zl*e mod n), respectively. These steps are also implemented in software. We use Barrett's technique for performing modular reduction (see Algorithm 14.42 in Menezes et al cited above for a description). This avoids the expensive division procedure by using a pre-computed value to translate a modular reduction to two multiplications.
The main characteristic of the implementation under this invention is to execute the most expensive step of the algorithm, step (C), in hardware. Simultaneously, we identify many subsequent steps that are independent of this (step C) and execute them in parallel, in software, with this step. In fact, execution time of step (C) provides a time budget for the steps that could be performed in software (steps D, E and F). Since the time budget is on the higher side, the dimension for optimization for these steps is to reduce the code size, rather than the execution speed.

17
Step (G) consists of big-integer addition, multiplication and reduction modulo n. These are performed in software.
IMPLEMENTATION RESULTS
The implementation was tested on a 163-bit elliptic curve over a binary field. This is a curve recommended by National Institute of Standards and Technology (NIST) and meets the criteria that define a cryptographically strong curve. (For the curve parameters, see the description of the curve B-163 in Johnson, D., Menezes, A., The Elliptic Curve Digital Signature Algorithm (ECDSA), Technical Report CORR 99-34, August 1999).
The following table presents the time taken, code size and the gate count (where applicable) for the various components of the algorithm.

Algorithm Time Code Size Gate Count(ms) (KB)
Random number generation (step B) 10 0.1 Not Applicable
Big integer Multiplication (steps E, F and G) 8.5 0.2 Not Applicable
Modular Reduction (steps E, F and G) 14.5 0.3 Not Applicable
Big integer Inversion (step D) 120 0.3 Not Applicable
Elliptic Curve Point Multiplication(step C) 275 N/A 50,000
Table 1: Implementation Statistics for ECDSA.
With the above results, the total time for generation of a signature is 308 ms.
The hardware accelerator was run at a clock frequency of 10 MHz. This could be increased to obtain better performance. According to the design of the

18
invention the clock could be increased till the execution time for elliptic curve point multiplication falls to 166 ms. Then the total time for generation of a signature would be 199 ms. Beyond this limit, big integer operations will limit the performance, unless they are optimized.
POWER ANALYSIS ATTACKS
The security of our implementation against Power Analysis attacks is achieved by the masking of power utilized in the point multiplication step. Power Analysis attacks on cryptographic systems study the power consumed during the cryptographic operation of interest to deduce some secret information such as the user's private key ( for an overview refer Kocher P., Jaffe Joshua and Jun Benjamin, Differential Power Analysis; in Advances in Cryptology - CRYPTO '99, LNCS 1666, Springer-Verlag, Berlin 1999, 38&—399.)
This is especially the case with products such as smartcards. In simple power analysis (SPA), the attacker directly correlates the power consumed to the instruction performed by the processor. In Differential Power Analysis (DPA), the attacker uses statistical analysis to extract the hidden information from a large sample of power traces. While no DPA attacks have been published for ECDSA, the point multiplication step can be subject to an SPA attack (refer Coron J. S., Resistance Against Differential Power Analysis for Elliptic Curve Cryptosystems, Cryptographic Hardware and Embedded Systems, vol. 1717 of Lecture Notes in Computer Science, 292—302, Springer-Verlag, 1999) as described below.

19
The Double-and-add algorithm for point multiplication is
Input: G, k with binary representation (kj-i,..., ki, ko).
Output: The point k.G
Q←G
For i from j-2 down to 0 do
If ki=l then Q←Q + G. Output: Q.
From an analysis of the power consumed during this algorithm, it may be possible to distinguish between point addition and point doubling thus revealing the bits of k. Using k, the attacker can recover the private key d from the equation s = k-1(e + dr) mod n.
The article (referred above) describes a modification of the Double-and-add algorithm that is resistant to SPA attacks and studies possible DPA attacks on this modified algorithm.
In the implementation of this invention, the point multiplication (step C) is performed in parallel with the inverse operation (step D) and modular operations (steps E and F). As a result, the power consumed during step C is masked by the power used by steps D, E and F which cannot be quantified as it depends on the random number k. This renders the implementation of this invention secure against such attacks.
The solution proposed is an inexpensive option when compared to the prevailing implementations of ECDSA without compromising performance. This makes our approach an attractive and competitive alternative to existing implementations.

20
While particular embodiments of the invention are described here, the present invention however is not limited to any particular application or environment. Modifications and variations may occur to those skilled in the art. The description of the exemplary embodiment is for the purpose of illustration and not a limitation.
INDUSTRIAL APPLICATIONS
The technology outlined above finds a number of applications related to smart cards. Some specific areas where our technology can be applied are:
1. Digital Signatures through Smart Cards: A smart card employing our implementation of ECDSA can be used for secure signing of electronic documents such as tax forms, airline reservations etc.
2. Authentication of connection to a remote host: Certain web transactions such as banking and e-commerce need to be authenticated at the server-end. Traditionally, this has been achieved by establishing an SSL connection between the client and server using RSA. Using ECC reduces key sizes without compromising security and increases the volume of traffic that can be handled by the server. With our technology, a user can authenticate herself, more efficiently, over the web using her ECC key pair and set up a secure session.
3. Key Generation: Our technology can also be used for the secure generation of a public/private ECC key pair. The private key is stored inside the card and never leaves the card thus providing the most secure storage of private keys. The public key is output to the terminal that the card is attached to and is used for generating a certificate.

21
4. Symmetric Key Generation: Using public key cryptography to encrypt messages is usually inefficient compared to symmetric key techniques. For this reason, when two parties want to set up a secure communication channel, they use their public/private key pairs to generate a symmetric key through some session key generation protocol such as Elliptic Curve Diffie-Hellman key exchange. Our ECDSA technology can be adapted to facilitate this session key generation.

22
We claim;
1) A method and a system for implementing the Elliptic Curve Digital Signature Algorithm comprising a microcontroller and the hardware accelerator executing the algorithm in parallel, with each unit executing predetermined steps of the algorithm.
2) A method and a system as claimed in claim (1) which generates digital signatures based on Elliptic Curve Digital Signature Algorithm, by using the full algorithm for the scope for optimization.
3) A method and a system for secure processing of digital signatures by implementing Elliptic Curve Digital Signature Algorithm as claimed in claims (1) to (2) comprising
a) The Elliptic Curve Digital Signature Algorithm modified for
parallel implementation as below
A. Compute the message digest e = SHA-1 (t).
B. Generate a random number k, 0 < k < n.
C. Compute the value kG = (x1, y1). Let r = x1 mod n. If r = 0, go to B.
D. Compute zl = k -1 mod n.
E. Compute z2 = zl*d mod n
F. Compute z3 = zl*e mod n
G. Compute s = (z3 + z2*r) mod n. If s = 0, go to B.
H. Output the signature (r, s).
b) A hardware accelerator for point multiplication step C of the above
algorithm.

23
c) A microcontroller processing in parallel the remaining steps of the algorithm such as multiplication, inversion and reduction in software (steps B, D, E, F).
4) A method and a system as claimed in claims (1) to (3) wherein, the algorithm is implemented in parallel, reducing one multiplication and one modular reduction counted towards the generation of a signature with respect to a full sequential execution of steps D, E, F and G, after step C.
5) A method and a system as claimed in claims (1) to (4) wherein, the gate count is reduced with respect to an implementation of the whole algorithm in hardware.
6) A method and a system as claimed in claims (1) to (5) wherein, the code of the software component is optimized for size instead of execution time.
7) A method and a system as claimed in claims (1) to (6) wherein, the steps of the algorithm implemented in software, steps D, E and F, have processing time equivalent to the execution of point multiplication in the hardware accelerator.
8) A method and a system as claimed in claims (1) to (7) which employs limited unrolling of the loops in the computation of the F(U,V) function and reuse of the unrolled logic, to reduce the gate count and the logic levels that need to be completed within a single clock cycle.
9) A method and a system as claimed in claims (1) to (8) which is secured against power analysis attacks wherein, the power consumed during the

24
point multiplication operation in hardware controller is masked by the software instructions executing in parallel.
10) A method and a system as claimed in claims (1 to 9) which can be applied to smartcards by integrating it within a System on Chip and embedding the said system in a card.
11) A method and a system as claimed in claims (1 to 10) which can be applied to servers by incorporating it through an add-on card which would sit on the system bus.
12) A method and a system as claimed in claims (1) to (11) which could be used to generate the public-private key pair for Elliptic Curve Digital Signature Algorithm.
13) A method and a system as claimed in claims (1) to (12) which could be used to generate symmetric keys, per session, for bulk encryption, using Diffie-Hellman algorithm.
Mumbai, Dated this day of March ,2005

( Dr. Avinash Shivade ) P.A. for the Applicant

25
ABSTRACT
The Elliptic Curve Digital Signature Algorithm (ECDSA) is implemented on a platform, which consists of a microcontroller and a hardware accelerator. The algorithm is implemented partly in software and partly in a hardware accelerator by partitioning the steps of the algorithm. The partitioning is designed to extract independent steps in the algorithm, which are executed in parallel, in hardware and software. Hardware-software partitioning is used to reduce the gate count, as compared to a full implementation of the algorithm in hardware. It also achieves better performance as compared to a full software implementation. Parallelism reduces the number of big-integer multiplications and modular reductions from two to one, in the overall time to generate a signature, thus bringing the performance closer to that of a full hardware implementation. The power consumed by the hardware component is masked by the software, thus securing the point multiplication step against power analysis attacks.
F3 JUN2005

Documents

Application Documents

#	Name	Date
1	664-mum-2005-correspondence 2(24-12-2007).pdf	2007-12-24
2	664-MUM-2005-FORM 18(15-04-2008).pdf	2008-04-15
3	664-MUM-2005-FORM 18-(08-05-2008).pdf	2008-05-08
4	664-MUM-2005-FORM 4(ii) [16-08-2017(online)].pdf	2017-08-16
5	664-MUM-2005-OTHERS [16-11-2017(online)].pdf	2017-11-16
6	664-MUM-2005-FER_SER_REPLY [16-11-2017(online)].pdf	2017-11-16
7	664-MUM-2005-DRAWING [16-11-2017(online)].pdf	2017-11-16
8	664-MUM-2005-CORRESPONDENCE [16-11-2017(online)].pdf	2017-11-16
9	664-MUM-2005-COMPLETE SPECIFICATION [16-11-2017(online)].pdf	2017-11-16
10	664-MUM-2005-CLAIMS [16-11-2017(online)].pdf	2017-11-16
11	abstract1.jpg	2018-08-09
12	664-mum-2005-form-5.pdf	2018-08-09
13	664-mum-2005-form-3.pdf	2018-08-09
14	664-mum-2005-form-26.pdf	2018-08-09
15	664-mum-2005-form-2.pdf	2018-08-09
17	664-mum-2005-form-13.pdf	2018-08-09
18	664-mum-2005-form-1.pdf	2018-08-09
19	664-mum-2005-form 3(11-8-2006).pdf	2018-08-09
20	664-MUM-2005-FORM 26(6-8-2009).pdf	2018-08-09
21	664-mum-2005-form 2(title page)-(provisional)-(3-6-2005).pdf	2018-08-09
22	664-mum-2005-form 2(title page)-(complete)-(25-5-2006).pdf	2018-08-09
23	664-mum-2005-form 2(complete)-(25-5-2006).pdf	2018-08-09
24	664-mum-2005-form 18(8-5-2008).pdf	2018-08-09
25	664-mum-2005-form 18(15-4-2008).pdf	2018-08-09
26	664-MUM-2005-FORM 13-(8-05-2008).pdf	2018-08-09
27	664-MUM-2005-FORM 13(8-5-2008).pdf	2018-08-09
28	664-mum-2005-form 13(25-5-2006).pdf	2018-08-09
29	664-MUM-2005-FER.pdf	2018-08-09
30	664-mum-2005-drawings.pdf	2018-08-09
31	664-mum-2005-drawing(25-5-2006).pdf	2018-08-09
32	664-mum-2005-description(complete)-(25-5-2006).pdf	2018-08-09
33	664-mum-2005-description (provisional).pdf	2018-08-09
34	664-mum-2005-correspondence-send.pdf	2018-08-09
35	664-mum-2005-correspondence-received-ver-250505.pdf	2018-08-09
36	664-mum-2005-correspondence-received-ver-030605.pdf	2018-08-09
37	664-MUM-2005-CORRESPONDENCE(6-8-2009).pdf	2018-08-09
38	664-MUM-2005-CORRESPONDENCE(27-4-2009).pdf	2018-08-09
39	664-MUM-2005-CORRESPONDENCE(16-7-2015).pdf	2018-08-09
40	664-mum-2005-correspondence 3(25-3-2008).pdf	2018-08-09
41	664-mum-2005-correspondence 1(15-4-2008).pdf	2018-08-09
42	664-mum-2005-claims.pdf	2018-08-09
43	664-mum-2005-form-1.pdf	2018-08-09
44	664-mum-2005-claims(25-5-2006).pdf	2018-08-09
44	664-mum-2005-form-13.pdf	2018-08-09
45	664-MUM-2005-ANNEXURE TO FORM 3(16-7-2015).pdf	2018-08-09
46	664-mum-2005-abstract.pdf	2018-08-09
46	664-mum-2005-form-2.pdf	2018-08-09
47	664-mum-2005-form-26.pdf	2018-08-09
48	664-mum-2005-abstract(25-5-2006).pdf	2018-08-09
48	664-mum-2005-form-3.pdf	2018-08-09
49	664-MUM-2005-HearingNoticeLetter.pdf	2019-04-26
49	664-mum-2005-form-5.pdf	2018-08-09
50	664-MUM-2005-Correspondence to notify the Controller (Mandatory) [22-05-2019(online)].pdf	2019-05-22
50	abstract1.jpg	2018-08-09
51	664-MUM-2005-CLAIMS [16-11-2017(online)].pdf	2017-11-16
51	664-MUM-2005-Written submissions and relevant documents (MANDATORY) [18-06-2019(online)].pdf	2019-06-18
52	664-MUM-2005-COMPLETE SPECIFICATION [16-11-2017(online)].pdf	2017-11-16
52	664-MUM-2005-MARKED COPIES OF AMENDEMENTS [18-06-2019(online)].pdf	2019-06-18
53	664-MUM-2005-FORM 13 [18-06-2019(online)].pdf	2019-06-18
53	664-MUM-2005-CORRESPONDENCE [16-11-2017(online)].pdf	2017-11-16
54	664-MUM-2005-DRAWING [16-11-2017(online)].pdf	2017-11-16
54	664-MUM-2005-AMENDED DOCUMENTS [18-06-2019(online)].pdf	2019-06-18
55	664-MUM-2005-FER_SER_REPLY [16-11-2017(online)].pdf	2017-11-16
55	664-MUM-2005-PatentCertificate20-09-2019.pdf	2019-09-20
56	664-MUM-2005-IntimationOfGrant20-09-2019.pdf	2019-09-20
56	664-MUM-2005-OTHERS [16-11-2017(online)].pdf	2017-11-16
57	664-MUM-2005-FORM 4(ii) [16-08-2017(online)].pdf	2017-08-16
57	664-MUM-2005-RELEVANT DOCUMENTS [30-03-2020(online)].pdf	2020-03-30
58	664-MUM-2005-FORM 18-(08-05-2008).pdf	2008-05-08
58	664-MUM-2005-RELEVANT DOCUMENTS [25-09-2021(online)].pdf	2021-09-25
59	664-MUM-2005-FORM 18(15-04-2008).pdf	2008-04-15
59	664-MUM-2005-RELEVANT DOCUMENTS [30-09-2022(online)].pdf	2022-09-30
60	664-mum-2005-correspondence 2(24-12-2007).pdf	2007-12-24
60	664-MUM-2005-RELEVANT DOCUMENTS [28-09-2023(online)].pdf	2023-09-28

Search Strategy

1	TACD_10-01-2017.pdf