Specification
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
8
THE PATENT RULES. 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
Method and System for Error Control Coding Using Expander-like Codes
Applicants
TATA Consultancy Services Limited
A company lncorporated in lndia under The Companies Act, 1956
Having address:
Nirmal Building, 9th Floor
Nariman Point. Mumbai 400021,
Maharashtra, lndia
AND
Indian Institute of Technology Bombay
An institute lncorporated in lndia under The institutes of Technology Act, 1961
Having address:
IIT Bombay, Powai, Mumbai 400 076,
Maharashtra, lndia
The following specification particularly describes the invention and the manner in which it is to be
performed.
FIELD OF THE INVENTION
The present invention relates to error control coding. Particularly the invention provides error
correction codes derived from projective geometry based graphs. More particularly the invention
provides method and system for generating error correction codes derived from higher
dimensional projective geometry based graphs.
BACKGROUND OF THE INVENTION
Retention of information in form of data is one of the core functions of the modern computer, and
is provided by storage. Hence storage is a fundamental component of all modern computing
systems. Now-a-days data storage commonly refers to mass storage, more particularly optical
storages such as CD-ROM and DVD-ROM or magnetic storage like hard disk drives etc. In
architecture parlance, such components are generally called secondary storage.
Disc storage is a general category of secondary storage mechanisms, in which data are digitally
recorded by various electronic, magnetic, optical, or mechanical methods on a surface layer
deposited of one or more planar, round and rotating platters. A disk drive is a device
implementing such a storage mechanism with fixed or removable media. Internal hard disc drives
are examples of fixed media, while CD-ROMs are example of removable media. In modern times.
it is the disc storage that is the most popular way of implementing data storage unlike the tape
storages in past.
The two popular methods of recording digital data on discs are magnetic and optical recording.
Magnetic recording refers to the storage of data on a magnetized medium. Magnetic storage uses
different patterns of magnetization in a magnetizable material to store data, and is a form of nonvolatile
memory. Hard. disk drives, commonly found in a computer's CPU, is one example of
magnetic recording. On the other hand. Optical Recording is encoding of binary digital data in the
form of pits (point which lacks reflection when read) and lands (point that reflects when read) on a
special material surface. The encoding material (e.g.. Aluminum) sits atop a flat surface of thicker
substrate (~suallyp olycarbonate), which makes up the bulk of the disc. CD-ROM and DVD-ROM
are the most prominent examples of optical disc recordings, while Laser Disc, Magneto-optical
disc, Universal Media Disc, Blue-ray Disc, HD Disc, Holographic and protein-coated discs are
other lesser known examples of optical recordings. The encoding pattern for most magnetic or
optical recordings follows a continuous, spiral path covering the entire disc surface and extending
from the innermost track to the outermost track.
Although discs are more dureble than earlier storage mechanisms such as tapes, they are
susceptible to environmental and daily-use damage. Unlike the now-obsolete 3.5-inch floppy disk,
most removable media such as optical discs do not have an integrated protective casing and are
therefore susceptible to data transfer problems due to scratches, fingerprints, and other
environmental problems such as dust speckles. These data transfer problems, while the data is
being read, manifests itself in form of bit errors in the digital data stream. Even mechanical issues
such as vibration due to occasional high rotational speeds of disc motors also produce
undesirable noise, and hence bit errors occur in fixed as well as removable media. A long
sequence of bit read errors while a track is being read (e.g. a scratch on a track) can be
characterized as burst error, while bit read error arising out of tiny dust speckle masking limited
number of pits and lands on a track leads to random error. The occurrence of such events
obviously not being rare, recovery of data to maximum extent in presence of such errors is an
essential subsystem within moit computing systems, such as CPU and disc players.
To achieve the data recovery caused by above said errors there is a need for efficient error
correction coding. However, the existing methods and systems are capable of correcting bit
errors to some extent but they are not efficient enough to correct burst error. Some of the error
correcting systems and methodologies which form the prior art are given below:
US6842872 to Yedida, et al. provides a method for evaluating and optimizing an error-correcting
code to be transmitted through a noisy channel and to be decoded by an iterative messagepassing
decoder. Yedida, et al. teaches the representation of error 6.151
Expander codes being a subclass of LDPC codes, for whose iterative decoding using variables
and constraints a bipartite graph is required, we are interested mainly in bipartite expander graph.
More specifically, a general (unbalanced) bipartite expander graph is a (c, d, E, 6) expander if it is
a (c,d)-regular bipartite graph in which every subset of at most an E fraction of the c-regular
vertices expand by a factor of at least 6.
The degree of "goodness" of expansion, especially for regular graphs, can also be measured
using its eigenvalues. The lalgest eigenvalue of a k-regular graph is 'k'. If the second largest
eigenvalue is much smaller than 'k', then the graph is known to be a good expander.
Construction of Ex~andeCr odes
It is well known that a randomly chosen (c,d)-regular graph will be a good expander with high
probability. A deterministic construction of good expander graph, that further leads to construction
of good expander codes is by considering the edge-vertex incidence graph B of a d-regular graph
G. The edge-vertex incidence graph of G = (V,E), a (2,d)-regular bipartite graph, has vertex set E
U V and edge set
((e.v) E E x V : v is an endpoint of e)
Referring to Figure 3, vertices of B corresponding to edges E of G are then associated to
variables (302), while vertices of B corresponding to vertices of G are associated to constraints
(304) on these variables. Each constraint corresponds to a set of linear restrictions on the d
variables that are its neighbors.
In particular, a constraint will r~,?quirteh at the variables it restricts form a codeword in some linear
code of length d. Further, all the constraints are required to impose isomorphic codes on different
variables. The default construction of expander codes requires d to remain constant as the order
of G increases. In invented construction, d will also increase with increase in order of G (hence
the term, 'expander-like', instead of expander). However, all important properties and advantages
of using specific expander codes for various applications still remain intact.
Formally, let B be a (2,d)-regular graph between set of n nodes called variables, and 2/d' n nodes
called constraints. Let b(i,j) be a function such that for each constraint Ci, the variables
neighboring Ci are v~(~,,,., . - , v~(~,L~et, .S be an errorcorrecting code of block length d. The
expander code C(B, S) is the code of block length n whose codewords are the words (xl. . . . ,
xn) such that, for 1 5 i < 2td' n, &,i,r,, . . , xbli., is a codeword of S.
Good Ex~andeCr odes
As pointed out earlier, the decoding algorithm for such codes is iterative. Hence good expander
codes imply at least the following properties:
. Better minimum distance (hence larger error-correction capability) than other codes of
same length.
Faster convergence, and
Better code rate than other codes in the same class
Construction of good codes having the above said properties is described below, Construction
makes use of three theorems. We only state these theorems before giving the construction
details. For woofs of the theorems refer literature.
We assume that an expander code C(B, S) has been constructed having S as a linear code of
rate r, block length d, and mir?imum relative distance E, while B as the edge-vertex incidence
graph of a d-regular graph G with second-largest eigenvalue A.
Theorem 1 The code C(B, S) constructed as above has rate at least 2r - I , and minimum relative
distance at least
Theorem 2 If a parallel decoding round for C(B. S) is given as input, a word of relative distance a
from a codeword, then it will output a word of relative distance at most equal to
from that codeword.
Theorem 3 For all E such that 1- 2H(E) > 0, where H(.) is the binary entropy function, there exists
a polynomial-time constructible family of expander codes of rate 1 - 2H(f) and minimum relative
distance arbitrarily close to E' in which any a < E2 I48 fraction of error can be corrected by a
circuit of size O(n log n) and depth O(log n).
From theorem 1, it is observed that to have high minimum relative distance for the expander
code, S should have high E and B should have low, Nd.. Since B has been constructed out of dregular
graph G, low Nd signifies high distance between first and second eigenvalues, i.e. the
graph G has to be a "good" expander graph. Further, to have high rate for the expander code, S
has to have a high rate r as well, besides having the requirement of having high minimum
relative distance €. :
From theorem 2, it is observed that to shrink the relative distance of input word from the
codeword after one iteration maximally, we need to again have E as high as possible and Md as
low as possible. Such maximal shrinking of distance, per iteration, leads to the fastest
convergence possible, and is also brought out in the proof of theorem 3 which has been given in
previous literature.
From theorem 3, it is observed that to be able to correct as high fraction of errors as possible, it is
required to have E as high as possible..
RS Codes as Good Component Codes
By choosing a "good expander" graph, and fixing a code with high minimum relative distance E,
one can design code, having the first two properties described earlier. Simultaneously, to have
high code rate for C(B, S), the component code S also needs to have high rate r. Reed-Solomon
codes are a class of non-binary, linear codes, which for a given rate, have the best minimum
relative distance (so-called paximum distance separable codes), and vi ce-versa. The code
parameters for the RS codes are given by (n, k, n - k + 1).
It had been observed that earlier definition of expander code requires 'd' to remain constant as 'n'
increases. In the present construction. 'd' increases as 'n' increases. However, it is clear from
statement of theorems 1 and 2 that higher value of 'd' leads to better properties of the code C(B,
S). However, such codes may not be called expander codes in wake of definition of these, but
just graph-based, or expander-like codes.
PG Graphs. Ramanuian Graphs and Good Expander Graphs
The construction of expander codes makes use of an unbalanced bipartite graph B made out of a
d-regular graph G. Zemor pointed out that fi G is a regular bipartite graph, then the % of errors
that can be corrected using a parallel iterative decoding algorithm can be increased twelve-fold.
Further, he reasoned out throbgh theorem 1 that to achieve good minimum relative distance for
the expander code graph G should be a Ramanujan graph with the property: A (second-largest
eigenvalue) s 2- .
However, it should be noted that (a) Approximately half the constructions of the bigger class of
Ramanujan graphs lead to bipartite regular graphs, and (b) Using bipartite regular graphs as G
leads to twelve-fold improvement in error correction capability. Hence it is imperative that one
focuses on using Ramanujan graphs for construction of good expander codes instead.
Construction of PG-araph based Expander-like Codes
In one embodiment of the invention to construct an expander-like code, Zemor's Construction has
been followed. The present invention uses Zemor's construction. which is based on a d-regular
balanced bipartite graph. G=(V,E). The set V is divided into two sets A and B, with IAl = IBI = n
such that every edge has one'endpoint in A and another in B. For any vertex t, the set of edges
incident on t is denoted by E,. As the graph is bipartite, the sets E, vr E ~ f l tt; 't E A induce a
partition on E. A similar partition can be created using the edge sets of the vertices belonging to
B. The expander code, C(G, S) is constructed by treating the edges of G as variables and the
vertices as constraints for a binary component code S. The block length of code C is N = n d. As
before, the second largest eigenvalue of G is denoted by A. The steps in one decoding round of
the algorithm suggested by Zemor are as follows:
Each constraint t in set A completely decodes the sub-vector associated with the set of d
variables, E,, and replaces it with the closest codeword in S. This step can be carried out
in parallel by all constraints in A as no symbol is shared between two constraints.
The constraints in set B replace the sub-vector associated with its edge sets, E,, t E Bt .
with the closest codeword in S. This again can be carried out in parallel by all constraints.
In another embodiment of the invention we use the following two properties of projective space of
dimension d over GF(s), namely P(d.GF(s)) where s = pk, k being a positive integer for improving
the error correction properties of the the said code beyond Zernor bound.
1. The number of subspaces of dimension m is equal to the number of subspaces of
dimension d - m - 1
2. The number of m-dimensional subspaces incident on each d -m- 1-dimensional
subspace is equal to the number of d - m - ldimensional subspaces incident on each
m-dimensional subspace.
In another embodiment of the invention the above said two properties of projective subspaces
have been used to create balanced regular bipartite graphs under the condition q = 2, m=O,and
d>2 (point-hyperplane inciderce graphs). Point-hyperplane incidence graphs also satisfy the
eigenvalue properties that make it a Ramanujan graph.
In another embodiment of the invention the following properties of Projective space over finite
fields are used to count the number of hyperplanes and number of points on each hyperplane in a
general projective space over finite fields. The description is also extended to count the number
of points in an m-dimensional subspace and also to count the number of I-dimensional subspaces
in an m-dimensional subspace where I 0.5 is not possible, while simultaneously assuming subcode distance E> 3 * A
(Zemor's decoding constraint). For n = 5 onwards, while the rates for 2-d PG based
expander-like codes are better than their n-d counterparts by a factor of 2.5 or less, the
(%) error correction capability(and hence minimum relative distance) is consistently better
for the n-d counterparts (touching almost 90 times better). As a side note, this
observation also brings out the classical tradeoff between rate and distance (relative
distance) of any class I codes.
5. Since the degree of graph used are of nature (2" -I), RS codes can be used which are
not shortened. Whereas, in case of projective plane based graph codes, (2" + 1) almost
always leads to usage of shortened RS codes. Using shortened RS codes reduces the
code rate of the subcode, and hence the expander construction as well.
6. Since each decoder is RS decoder, it can also be used in sense of optimum erasure
code decoder, since RS codes are optimum erasure correction codes. Then, the burst
error correction capability gets doubled for each decoder.
Lcngbh of RS Subcodcs
Onlcr of Bipartite Graph
Longth of Ovcrnll Codc
Sccond Eigcnvulucs
Zcrnor's Constraint h m d Subcode min distnncc
Rntc of Ovcrull Codc
hlasi>num Err~rsC ormc~ed
Pcrcontnge Errors Co~rcctcd
Table 9 Comparison of n-dimensional vs 2-dimensional PG-based Expander-like Codes
41
Choice of Reed-Solomon Code (RS Code)
The parameters of the RS code have been chosen to enable efficient schemes that could be
applicable to data storage systems. In case of application to CD-ROM, since the smallest symbol
size is a byte. RS code of size 255 has been used.
The degree of each vertex in the bipartite graph used there being 31, the RS code size has to be
shorten to 31. This shortening is done by dropping the message symbols from 32 onwards. More
specifically, if the minimum distance is E, symbols 32 to 255- E +1 are assumed to be zero.
In turn, the minimum distances are chosen to get the overall rate of code to match the rate used
by applications such as the CD-ROM encoding scheme. Additional error correction capability is
achieved by adding a small amount of interleaving.
WE CLAIM
1. A method for error control coding for a digital storage device or packet data transmission;
the said method is characterized by use of a symbol error correcting code based on at
least one higher dimensional projective geometry graph followed by mapping the symbols
of the error correcting code to the edges of the said one or more graph, wherein the said
higher dimensional projective geometry based graph is generated by the computer
implemented steps of:
a. defining the projective space wherein the said projective space of dimension d
consists of one dimensional subspaces of a (d+l)-dimensional vector space and an
m-dimensional subspace of the projective space consists of all one dimensional
subspaces of a (m+l)-dimensional subspace of the vector space;
b, deriving the higher dimensional projective geometry based graph from the incidence
relations of said projective space, wherein the higher dimensional projective
geometry based graph is having at least two partitions and each partition is having at
least one vertex;
c. associating one vertex of the graph with each m-dimensional subspace and one with
each (d-m-1)-dimensional subspace of the said projective space wherein each vertex
of the graph has a fixed degree and the number of vertices in each partition of the
graph are equal;
d, connecting all vertices from one partition with other vertices from the other partition
by an edge if the corresponding subspaces are incident on each other; and
e. decoding all the symbols by at least one vertex in one partition of the higher
dimensional projective geometry based graph, wherein the symbols correspond to
the edges incident on the vertex; and coding for error control in a communication and
digital storage device by use of a symbol error correcting code based on at least one
derived higher dim?nsional projective geometry graph.
2. A method as claimed in claim 1, wherein the higher dimensional projective geometry
based graph is a bipartite graph.
3. A method as claimed in claim 1, wherein the error correcting code is an expander-like
code having reed soloinon (RS) code as component code.
4. A method as claimed in claim 1, the symbol error control coding is based on higher
dimensional projective geometry based graph, wherein the higher dimensional projective
geometry is at least two in numbers.
5. A method as claimed in claim 1, the symbol error correcting codes are provided for large
data blocks in digital storage device or packet data transmission wherein the length of
data blocks is at least one thousand symbols.
6. A method as claimed in claim 1, wherein the digital storage device is selected from the
set of disc based secondary digital storage devices comprising RAID systems, HDD, CDROM
and DVD-ROMs.
7. A method as claimed in claim 1, wherein the symbol error correcting code is employed to
detect and correct erroneous symbols of at least one said digital storage device.
8. A method as claimed in claim 1, wherein the symbolerror is selected from burst error or
random error.
9. A system for error control coding in a digital storage device or packet data transmission;
the said system comprising at least one decoding device, at least one digital storage
device and at least one memory'element communicatively coupled with each other.
wherein the said system is characterized in using a symbol error correcting code based
on at least one higher dimensional projective geometry graph followed by mapping the
symbols of the error correcting code to the edges of the said one or more graph, wherein
the said higher dimensional projective geometry based graph is generated by:
a. decoding device defining the projective space wherein the said projective space of
dimension d consists of one dimensional subspaces of a (d+l)-dimensional vector
space and an mdimensional subspace of the projective space consists of all one
dimensional subspaces of a (m+l)-dimensional subspace of the vector space;
b. decoding device deriving the higher dimensional projective geometry based graph
from the incidence relations of said projective space, wherein the higher dimensional
projective geometry based graph is having at least two partitions and each partition is
having at least one vertex:
c. decoding device associating one vertex of the graph with each m-dimensional
subspace and one with each (d-m-1)-dimensional subspace of the said projective
space wherein each vertex of the graph has a fixed degree and the number of
vertices in each partition of the graph are equal;
d, decoding device connecting all vertices from one partition with other vertices from
the other partition by an edge if the corresponding subspaces are incident on each
other; and
e. decoding device decoding all the symbols by at least one vertex in one partition of
the higher dimensional projective geometry based graph, wherein the symbols
correspond to the edges incident on the vertex; and coding for error control in a
communication a d digital storage device by use of a symbol error correcting code
based on at least one derived higher dimensional projective geometry graph.
10. A system as claimed in claim 9, wherein the said decoding device is a reed solomon (RS)
decoder.
11. A system as claimed in claim 10, wherein the said reed solomon (RS) decoder is having
ability to skip decoding if more than correctable errors are detected.
12. A system as claimed in claim 9, wherein the error correcting code is an expander-like
code having reed solomon (RS) code as component code.
13. A system as claimed in claim 10, wherein the reed solomon (RS) decoder are capable of
handling shortened reed solomon (RS) codes.
14. A system as claimed in claim 9, wherein the higher dimensional projective geometry
based graph is a bipartite graph;
15. A system as claimed in claim 9, the symbol error control coding is based on higher
dimensional projective geometly based graph, wherein the higher dimensional projective
geometry is at least two in number.
16. A system as claimed in claim 9, the symbol error correcting codes are provided for large
. data blocks in digital storage device or packet data transmission wherein the length of
data blocks is at least one thousand symbols.
17. A system as claimed in claim 9, wherein the digital storage device is selected from the
set of disc based secondary digital storage devices comprising RAID systems. HDD, CDROM
and DVD-ROMs.
18. A system as claimed in claim 9, wherein the symbol error correcting code is employed to
correct at least one error of at least one said digital storage device.
19. A system as claimed in claim 9, wherein the symbol error is selected from burst error or
random error.
20. A system as claimed in claim 9, wherein the system for symbol error control coding for
digital storage device or packet data transmission has a parallel and symmetric design.
21. A system as claimed in claim 9, wherein the memory element is utilized to store the
address space in the required order of computation.
22. A system and method substantially as herein described with reference to and as
illustrated by the accompanying drawings.