
A System For Parallel Decoding Of Low Density Parity Check (LDPC) Encoded Data

Abstract: A system for parallel decoding of low density parity check (LDPC) encoded data having length L in a bit stream, 'L' being given by the function (p^s + 1), where p is a prime number and s is any positive integer, said system comprising: means to select a block of the bit stream of said length L, said block containing a plurality of bits; a parity check matrix having a plurality of elements; a plurality of parity check nodes for implementing each row of said parity check matrix and for updating information; a plurality of bit nodes adapted to receive said updated information from said parity check nodes, having means to calculate the bit probability and further adapted to transmit the calculated bit probability data to said parity check nodes; and a logic control means adapted to iterate the passing of information between said parity check nodes and said bit nodes until decoding is complete.


Patent Information

Application #:
Filing Date: 05 February 2007
Publication Number: 39/2008
Publication Type: INA
Invention Field: ELECTRONICS
Status:
Email:
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2016-11-22
Renewal Date:

Applicants

TATA CONSULTANCY SERVICES LIMITED
BOMBAY HOUSE, 24, HOMI MODY STREET, MUMBAI 400 001,

Inventors

1. SHARMA HRISHIKESH
TATA CONSULTANCY SERVICES LIMITED, ABHILASH BUILDING, PLOT NO 96, EPIP INDUSTRIAL AREA, WHITEFIELD ROAD, BANGALORE 560 066

Specification

FORM - 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
PROVISIONAL
Specification
(See section 10 and rule 13)
HARDWARE SYSTEM OF DECODING OF LDPC-CODED
BITSTREAM IN BOTH WIRELESS AND WIRELINE
COMMUNICATION SYSTEM
TATA CONSULTANCY SERVICES LIMITED
an Indian Company
of Bombay House, 24, Homi Mody Street, Mumbai 400 001,
Maharashtra, India
THE FOLLOWING SPECIFICATION DESCRIBES THE INVENTION.

Field of invention:
This invention relates to computer systems.
In particular, this invention relates to computer systems in both wireless and wireline communication systems.
Still more particularly, this invention relates to a hardware system for decoding an LDPC-coded bitstream in both wireless and wireline communication systems.
In general, the system in accordance with this invention can be used in any decoder for communication systems where regular LDPC codes have been used. The decoder system exploits the projective-geometry regularity in a certain subset of Tanner graphs. However, following the industry practice of joint code-decoder design, the system is intended specifically for LDPC codes arising out of projective planes. The decoder is fully generic in the sense that it can be used to decode codes of any length, provided the length is of the type (p^s + 1), where p is a prime number and s is any positive integer.
Background of invention:
Introduction:
It is known that digital data transmission suffers from channel influences such as:
1. Severe (multipath) fading in terrestrial mobile radio communications.
2. Very low signal-to-noise ratio for satellite communications due to high path loss and limited transmit power in the downlink.
3. Compressed data (e.g. audio and video signals) is very sensitive to transmission errors.
Hence, a message transmitted over a long distance, in particular, may not be received exactly as it was sent. In order to make out the meaning of what was sent, we therefore need to detect, and if possible correct, errors.
Wireless communication systems are especially prone to errors, and hence a robust error-correction system at the receiving end becomes almost a necessity. In recent times, such wireless communication systems have been widely deployed to provide different services. Popular technologies, both ad-hoc (wireless LAN) and infrastructure-based (3G), all require error correction.
An approach to error correction first encodes each unit of information (such as a byte) into a unit (called a word) drawn from a p-ary alphabet (the binary alphabet, with 0 and 1, is the most used). The error detection or correction work at the receiver end then becomes mapping the received unit of information to the "closest" p-ary codeword, thereby recovering the actual unit of information sent. This process is known as the encoding/decoding process in digital communication.
It was the invention of turbo codes and turbo product codes that moved the limit of achievable coding gain much closer to the Shannon limit for low and high code rates, respectively. However, the iterative nature of the decoding algorithms for turbo codes and turbo product codes presents significant implementation challenges. The search for better iterative decoding algorithms led to the (re-)discovery of low-density parity-check (LDPC) codes. These codes are simpler to decode than other comparable codes. For example, unlike turbo and turbo product codes, it is possible to decode LDPC codes using a block-parallel algorithm rather than a block-serial algorithm. Further, using the message-passing algorithm, LDPC decoders require an order of magnitude fewer arithmetic computations than equivalent turbo decoders. Finally, it has also been established that LDPC codes are asymptotically superior to turbo codes with respect to coding gain.
The simplicity of decoder implementation, coupled with better performance, has made these codes candidates for usage in the recent generation of digital transmission systems. For example, in the digital satellite broadcasting standard (DVB-S2), a long LDPC code has been chosen for error correction of downlink data. A similar case is being made in wireless LAN technology, as mentioned earlier. Because of this thrust, optimized systems related to LDPC decoding have a very high potential of being deployed.
Summary of the Invention:
This invention envisages a parallel hardware architecture for an LDPC decoding system at the receiving end of a communication system.
Particularly, this invention envisages a method and apparatus for decoding an LDPC-encoded bitstream, and includes a system for improving the decoding time-performance and memory-access efficiency for such bitstreams.
The invention will now be described with reference to the accompanying drawings, in which
Figure 1 shows Tanner graph representation of the coding process;
Figure 2 shows the apparatus in accordance with this invention;
Figure 3 shows numbering and grouping for perfect pattern generation in the method of this invention;
Figure 4 shows pictorial representation of interconnect for p = 5;
Figure 5 shows group formations;
Figure 6 shows an adder in accordance with this invention; and
Figures 7 and 8 depict the check node architectures.
The apparatus envisaged in accordance with this invention works on a highly generic template. It is suitable for designing many parallel systems. The underlying structure of regular LDPC codes is isomorphic with the structure of the architectural template, and hence the envisaged novel decoder system substantially improves the performance.
In the prior art, most of the work has been done on designing semi-parallel architectures for such decoders. This includes decoders for both regular and irregular codes, though designs for regular codes are more common. Most of the work suggests a strong correlation between the choice of the codes and the decoder design. Hence the importance of specialized designs is quite evident. It was Andrew Blanksby and Chris Howland who pioneered a 1024-bit, rate-1/2 fully parallel generic decoder implementation. Their parallel architecture enables rapid convergence of the decoding algorithm to be translated into low decoder switching activity, resulting in low power dissipation. For synchronous execution, they use registers in the datapath on each edge of the corresponding Tanner graph. Our design is better in the sense that we use a smaller number of dual-port memories, with scheduling such that there are no memory access conflicts. Thus not only does the cost of manufacturing come down, but the resulting design is still fast compared to other semi-parallel designs.
The belief-propagation means for decoding an LDPC-coded bitstream passes messages between so-called bit (variable) nodes and check nodes. These messages are stored before the other nodes access and process them, in order to have synchronous execution. However, it is the lack of any structural regularity in these essentially random codes that creates a major challenge for building a practical LDPC decoder. Hence the envisaged system focuses on decoding regular codes only. In regular codes, each node in the Tanner graph passes the same number of messages in a phase of decoding. Further, by choosing projective-geometry regular codes to lend additional regularity, it becomes possible to design a schedule in such a way that there are no memory or processor conflicts, as our system shows. In fact, various other "nice properties" that one generally seeks during parallel system design, such as load balancing and efficient data routing, are also achieved. This regularity can also be used to reduce the wiring and layout complexity in the final stages of hardware design.
This decoding algorithm can be understood from the Tanner graph representation of the code, as shown in Figure 1 of the accompanying drawings. Each bit of the codeword is connected (hence related) to the constraints (in the form of parity checks) that the bits must satisfy in order to form a valid codeword. During the iterative decoding process, messages are exchanged between the bits and the constraints (parity checks) along the edges of the graph. The steps in this scheme can be briefly put down as follows:
1) All bit nodes and their outgoing bit messages are initialised to the intrinsic information, represented as the log-likelihood ratio of the received symbol:

    λ_j = ln( P(x_j = 0 | y_j) / P(x_j = 1 | y_j) )

The messages are sent to the parity-check nodes along the edges of the Tanner graph.
2) At the parity-check nodes, computation is performed separately on the sign and the magnitude part of each message. The XOR of the sign bits of the incoming bit messages at a parity-check node forms the parity check result for the row. The sign of each outgoing check message for each edge of the graph is formed by further XORing the sign of the incoming bit message of that edge with the row parity check result.
Further, an intermediate row parity reliability is computed in the logarithmic domain as:

    ln(λ_i) = Σ_j f(λ_ij),    where f(x) = ln( (e^x + 1) / (e^x − 1) )

where ln(λ_i) is the log of the parity reliability for row i and λ_ij is the reliability of the message sent to parity-check node i by bit node j.

The outgoing check message reliabilities (magnitudes) are calculated using ln(λ_i) as:

    Λ_ik = f( ln(λ_i) − f(λ_ik) )

where Λ_ik is the reliability of the check message from parity-check node i to variable node k.
3) These messages are now sent to the bit nodes along the edges of the Tanner graph.
4) At the bit nodes, the estimate of each decoded bit is updated by summing the intrinsic information and the extrinsic information received along each edge (the check message log-likelihoods). The decoded bit is taken to be the sign of the summation for this round of iteration. Outgoing bit messages for the next decoding iteration are then formed using the same summation; however, each edge is updated without using the incoming message on the corresponding graph edge. This is easily implemented by subtracting the contribution of each edge from the group sum.
5) Repeat steps 2)-4) until a termination condition is met. A software sketch of one such iteration, for illustration only, follows.
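For concreteness, the iteration above can be put down in software form. The following is a minimal sketch only, assuming a dense 0/1 parity-check matrix H, NumPy for the array operations, and the transformation f(x) = ln((e^x + 1)/(e^x − 1)) reconstructed above; all names are illustrative and not part of the claimed apparatus.

```python
import numpy as np

def f(x):
    """Log-domain transformation f(x) = ln((e^x + 1)/(e^x - 1)); note f(f(x)) = x."""
    x = np.maximum(x, 1e-9)                      # guard against log(0)
    return np.log((np.exp(x) + 1.0) / (np.exp(x) - 1.0))

def decode(H, intrinsic_llr, max_iters=20):
    """Flooding-schedule belief propagation for a binary LDPC code.
    H: (checks x bits) 0/1 matrix; intrinsic_llr: received-symbol LLRs."""
    rows, cols = np.nonzero(H)                   # one entry per Tanner-graph edge
    bit_msg = intrinsic_llr[cols].astype(float)  # step 1: initialise bit messages
    chk_msg = np.zeros_like(bit_msg)
    decoded = np.zeros(H.shape[1], dtype=int)
    for _ in range(max_iters):
        # Step 2: check-node update, sign and magnitude processed separately.
        for i in range(H.shape[0]):
            e = np.flatnonzero(rows == i)
            signs = np.where(bit_msg[e] < 0, -1.0, 1.0)
            row_sign = np.prod(signs)                  # XOR of all sign bits
            ln_lam = np.sum(f(np.abs(bit_msg[e])))     # accumulation over the row
            # Exclude each edge's own contribution from its outgoing message.
            chk_msg[e] = row_sign * signs * f(ln_lam - f(np.abs(bit_msg[e])))
        # Steps 3-4: bit-node update (total sum, then subtract own edge).
        for j in range(H.shape[1]):
            e = np.flatnonzero(cols == j)
            total = intrinsic_llr[j] + np.sum(chk_msg[e])
            bit_msg[e] = total - chk_msg[e]
            decoded[j] = 1 if total < 0 else 0         # sign of the summation
        # Step 5: terminate once all parity checks are satisfied.
        if not np.any((H @ decoded) % 2):
            break
    return decoded
```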
From the above description of the algorithm, it is clear that, over the various iterations, the updating of extrinsic information is where most of the time is spent. Hence this patent work focuses on optimizing that.
The design uses a distributed memory configuration, by efficiently pushing all required inputs of a node into its local memory. That way, communication issues such as waiting and data corruption are avoided. Direct wiring has been used because of the locality of communication. As mentioned before, designing efficient access to memory is the crux of this system design.
In parallel computing, there is seminal work by Karmarkar on the design of systems arising out of projective-geometry considerations. In that work, Karmarkar defined two units of system behaviour: perfect access patterns and perfect access sequences. The main focus was to define a unit amount of processing, requiring stimulation of the whole parallel system, such that it automatically results in coherent, conflict-free operation and uniform, balanced use of all the resources such as processors, memories and the wires connecting them. Such units were called perfect access patterns. If such units are combined, the resulting computation is still balanced with respect to resource usage. Such an aggregation was called a perfect access sequence. It has been proved by the authors that the behaviour of a regular LDPC decoder while updating extrinsic information is made up of such perfect access patterns and sequences. To accommodate the separate nature of bit-node and check-node processing within a single iteration, the authors have extended Karmarkar's design template. This was made possible because the bit nodes and check nodes work in a non-temporally-overlapping fashion. The design is such that the memory blocks that these nodes access for operand input and output are disjoint, thus fully partitioning the computation temporally into two sub-computations.
Since 2-dimensional geometry has been used for the design, the bipartite Tanner graph is symmetric. With respect to memory configuration, the approach is to use the small local memory available on a processing node itself. Thus the problem of scheduling for perfect access pattern/sequence in LDPC decoding gets decomposed into two isomorphic scheduling problems.
The apparatus in accordance with this invention is shown in Figure 2 of the accompanying drawings. This figure clearly shows the symmetric decomposition of the computation.
For the scheduling algorithm for this setup, recall that in the bit-node processing, the major processing (the same on all nodes) involves adding the extrinsic information collected over all the connections of the particular bit node to check nodes. By taking, for example, two inputs at a time for addition, we can schedule one binary operation on each bit-node processing unit in every cycle. Similarly, in the check-node processing, the major processing involves adding the functionally transformed input messages collected over all the connections of the particular check node to bit nodes.
Hence one can again find a perfect access pattern in check-node processing as well. An illustration of such numbering and grouping for perfect pattern generation is depicted in Figure 3. One can observe that, since the multiplicative group is cyclic, a "horizontal right shift" by one unit length of each group of a line and two points yields the next group of a line and two points.
Before designing the decoder system, one also needs to fix the length of the code, since our design is quite generic. A regular LDPC code's Tanner graph can be carved out of any projective plane. Many parameters (numerical properties) of the code can then be derived for each instance of the code. In iterative decoding, the main parameters used for shortlisting one for architectural design are the block size (p^s + 1) and the number of iterations required. It is up to the user of this patent work to fix the codes before using the envisaged system.
Further, before the schedule is generated, one needs to design the interconnect network for a particular value of the prime number p. The question that naturally arises here is: what are the (indices of the) points that are linked to a given line number/index, for example line 2? One needs to go through an elaborate process of vector-space construction, which is an indirect method of answering this question. A pictorial representation of the interconnect for p = 5 is given in Figure 4 of the accompanying drawings.
In the decoding process based on belief propagation, we observe that for bit-node computation, addition and subtraction are the two required operators. Similarly, for check-node computation, addition, subtraction and XOR are the operators. Of course, the check-node computation is preceded/succeeded by transformation of the inputs in the log domain.
In the envisaged architecture, each operator has to take (p^s + 1) inputs before it can produce an output. The inputs to these operators are grouped two at a time. Hence a binary operation of one type is feasible every cycle on all processing units of one type for decoding purposes. We can now clearly marry this to the balanced schedule of an iteration found in the perfect sequence. The most important advantage of using binary operations is that we are able to use off-the-shelf dual-port memories to fetch two operands at a time. By keeping the number of ports limited to two, one can keep the cost, area and power consumption under control. The final advantage is being able to use 3-input hardware units such as adders, as discussed later. A hardware unit with a higher degree of fan-in is significantly more complex, at times requires custom logic, and blows up the cost of producing a decoder.
Except for the prime number p = 2, perfect sequences can be formed for all other prime numbers in the following way. We use here the "horizontal shift" property discussed earlier. Since a line is an unordered set of points, we reorder each such set in such a way that we get multiple units of two points showing the "horizontal shift" property. So from each set of points, we start collecting such groups of two points. The first group forms the first perfect access pattern. The second group forms the next pattern, and so on.
This formation is illustrated in Figure 5. The collection of all such patterns forms a perfect sequence; the execution of a perfect sequence thus takes multiple cycles. For the prime number 2, (p^s + 1) is divisible by 3 for all values of s, and hence we have been able to design a scheduling algorithm for this number as well. A software sketch of this pairing, for cyclic representations of the plane, follows.
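For cyclic projective planes, which admit a perfect difference set D, line i can be written as {(d + i) mod n : d in D}, and the grouping above becomes mechanical. A minimal sketch, for illustration only (Python; assumed names; the (31, 6, 1) difference set shown corresponds to p = 5, s = 1):

```python
def lines_from_difference_set(D, n):
    """Line i of a cyclic plane: the difference set D shifted by i (mod n).
    Keeping D's fixed order on every line is what exposes the 'horizontal
    shift' property used for grouping."""
    D = sorted(D)
    return [[(d + i) % n for d in D] for i in range(n)]

def perfect_access_sequence(lines):
    """Pair up each line's points two at a time. The c-th pair of every line,
    taken over all lines, is one perfect access pattern: within a pattern,
    each point (memory block) is touched exactly twice, once per dual port."""
    half = len(lines[0]) // 2
    return [[(i, ln[2 * c], ln[2 * c + 1]) for i, ln in enumerate(lines)]
            for c in range(half)]

# PG(2, 5): n = 5^2 + 5 + 1 = 31 points/lines; {0,1,3,8,12,18} is a (31, 6, 1)
# planar difference set.
lines = lines_from_difference_set({0, 1, 3, 8, 12, 18}, 31)
patterns = perfect_access_sequence(lines)   # 3 patterns; each touches every block twice
```

Keeping the fixed order of D on every line means the c-th pair of line i is (d_2c + i, d_2c+1 + i); as i runs over all lines, each residue appears exactly once per coordinate, which is why each memory block sees exactly two accesses per cycle.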
Further, in the context of LDPC decoder dynamics, one needs to decide the order in which processing nodes are activated (fired). One also needs to decide in which order inputs are taken in and outputs are computed. This order is referred to as the schedule of the decoder. The proposed decoding system uses the flooding type of schedule. In this scheme, the nodes of one type are all updated (e.g. the bit nodes) and then the nodes of the other type are all updated to finish the iteration. Each of these two periods is called a phase. In the envisaged system, one needs to further make sure that in every cycle the scheduled processing units are performing the same operation.
The implementation of each type of node involves more than one hardware unit (e.g., gates). The behaviour of the units is mainly arithmetic and logical operations, though the check nodes require complex transformation units for implementing f(x) and f⁻¹(x), defined previously. We assume here that all these hardware units perform their operation within one machine cycle, which is a practical assumption. If a particular operation runs over multiple cycles, then that operation must be pipelined.
For the bit-node architecture, we use one 3-input adder with a feedback element on the bit-node side as the first hardware unit. It has a memory (register) in the feedback path to store the intermediate value of the addition. Apart from that, a register is provided to store the accumulated sum of extrinsic information. The adder is depicted as part of Figure 6.
We also use two 2-input subtractors on the bit-node side. One input of each of these subtractors is connected to a register that stores the sum of all its inputs.
Depending on the quantization levels we choose, the adder and subtractors have to be of multi-bit precision. A behavioural sketch of this datapath follows.
For the check-node processing, the sign and magnitude processing happens separately. The corresponding coefficients have to be derived from the same message value, by taking the leading bit(s) as the sign and the remaining bits as the magnitude. Also, a log-domain transformation is required before magnitude processing, and the reverse transformation after processing. Thus, for sign processing, we use one 3-input XOR gate with a feedback element on the check-node side as the first hardware unit. It has a memory (register) in the feedback path to store the intermediate value of the XORing.
XORing a bit twice with another bit nullifies the effect of the duplicated bit. We use this property to remove the sign bit of a particular edge associated with a node from the group of sign bits of all edges related to that node. The 2-input XOR gates are used for this purpose.
Both the sign and magnitude processing parts have a memory (register) each in the feedback path to store the intermediate value of the main operator. Apart from that, a register is provided to store the accumulated sign bit (or the accumulated sum of log-likelihood ratios). The two 2-input subtractors for the magnitude processing each have one of their inputs connected to a register that stores the sum of all inputs. The check-node architecture is presented in Figures 7 and 8 of the accompanying drawings.
Depending on the quantization levels chosen, the adder, the XOR gates, the f(x) and f⁻¹(x) units, and the subtractors have to be of multi-bit precision.
The implementation of the functions f(x) and f⁻¹(x) is one of the trickier aspects of the check-node implementation. Both are non-linear functions. When the number of quantization levels is small, these mathematical functions can be approximated using the first few terms of their infinite expansions; in that case, an ASIC unit can be put in using a few logic gates. Otherwise, an SRAM-based LUT block can also be used for the implementation; a sketch of such a table follows.
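For illustration only, the following sketch tabulates f under a uniform quantization; the bit-width and step size are assumed values, not taken from this specification. Since f is its own inverse, the same table can serve both the forward and the reverse transformation.

```python
import math

BITS, STEP = 6, 0.25     # assumed quantization: 6-bit magnitudes, step 0.25

def f(x):
    # f(x) = ln((e^x + 1)/(e^x - 1)); self-inverse, i.e. f(f(x)) = x.
    return math.log((math.exp(x) + 1.0) / (math.exp(x) - 1.0))

# One entry per quantized magnitude; clamp x away from 0, where f diverges.
LUT = [min((1 << BITS) - 1, round(f(max(i * STEP, STEP / 2)) / STEP))
       for i in range(1 << BITS)]
```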
To mitigate the effect of the extra bits generated during addition, a saturation unit can be added wherever required. Within a node, one can use higher-precision logic gates and internal busses to minimize the round-off error. So we may use higher-precision (larger bit-width) adders and subtractors in the bit-node architecture. In that case, the (two) saturation blocks are placed at the (two) outputs of the two subtractors. For magnitude processing on the check-node side, we place the saturation blocks just before the blocks implementing f⁻¹(x).
The layout of the memories, where the reliability messages sent out by the various processing units get stored, is as follows. In the design, both the row- and column-regularity is (p^s + 1), while the number of processing units is (p^(2s) + p^s + 1). Hence the number of memory blocks is also (p^(2s) + p^s + 1), each having a width of (p^s + 1) data objects. The size of each data object depends on the quantization level of the messages. As discussed before, we use the vector-space construction to identify which memory blocks (points in the projective plane) are linked to which processing units (lines in the projective plane) of both types.
Individually, the memory blocks are all dual-port memories. Each memory block stores (p^s + 1) messages in the following order. Suppose point a_j is represented as the intersection of lines {l_j0, l_j1, ..., l_j(p^s)}, where each l_ji is a number between 0 and (p^(2s) + p^s). Without loss of generality, we assume that these numbers are linearly ordered, i.e. l_j0 < l_j1 < ... < l_j(p^s); if they are not, we can always reorder them. Then the message corresponding to processing unit number l_j0 is stored in the 0th location of the block, that for l_j1 in the 1st location, and so on. Such an ordering eases address generation, if it is not a look-up-table implementation.
It is known that the access to the memory blocks is structured. One needs to further answer the question: if the l_ji-th processing unit, working on some binary operation, accesses the j-th memory block, then in which cycle does it access it, and at which location (local address)? Since we read/write two messages at a time, the m-th message associated with processing unit l_ji is accessed from the j-th memory block's i-th location in a cycle number determined by a simple modulo computation. Because of this ring structure, the complexity of the hardware is significantly reduced. By doing a simple modulo addition and then a search in a LUT, one can find which of the points 0 to (k − 1) actually moved into the j-th memory location; that becomes the value of m.
There are two ways in which processing units can access operands stored in memories in a particular cycle. In the first, the processing units themselves calculate/generate and place the address/location, using an extra bus, for a memory access. Alternatively, one can build and embed an address generator within the memory block, which places two data objects on the two ports (or, alternatively, allows two data objects to be stored at two locations), given the cycle number. In that case, each memory block requires one address-generation block. Yet again, there are two ways of implementing this address-generation block (wherever it resides). Given the Tanner graph, one can pre-calculate the addresses and implement it using a LUT. Alternatively, one can use custom logic to design counter-type logic, based on the 'structure' in the access, which can act as the address-generation block. The choice of implementation method is left open to the user of this patent work. A software sketch of the LUT approach follows.
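The following minimal sketch (Python; assumed names, for illustration only) uses the ordered-storage convention described above: memory block j holds the message for its smallest incident line index at location 0, the next at location 1, and so on.

```python
def build_address_lut(lines, num_points):
    """lines[i] = the points incident on line i. Returns, for each point
    (memory block) j, the sorted indices of the lines through j; location i
    of block j holds the message for line through[j][i]."""
    through = [[] for _ in range(num_points)]
    for line_idx, pts in enumerate(lines):
        for p in pts:
            through[p].append(line_idx)
    return [sorted(t) for t in through]          # linear order eases addressing

def local_address(through, j, line_idx):
    """Local address of line line_idx's message inside memory block j."""
    return through[j].index(line_idx)
```

With the cyclic construction sketched earlier, build_address_lut(lines, 31) gives each block its (p^s + 1) ordered line indices.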
Both processing units and memories have (p^s + 1) ports. In each cycle, they operate using only two ports, and over the scans they cover the whole set of (p^s + 1) ports. Hence the switch is distributed across the architecture; each processing unit and each memory block has its own switch. For a processing unit, the function of the switch is to 'select' and turn 'on' two out of the (p^s + 1) ports. Hence it needs custom 2-out-of-(p^s + 1) line-decoding logic. The remaining ports in each cycle can be turned off by the decoder. On similar lines, the switch element in each memory block 'selects', turns 'on', and maps two out of the (p^s + 1) ports to its two internal ports (we are using dual-port memories). We need similar line-decoding logic for this switch element. A behavioural sketch of this port selection follows.
With this much architectural information, the schedule of the envisaged system is now presented. Within each node's computation, one can find two perfect patterns (sub-computations) which do not overlap temporally. These patterns are named the 'accumulation scan' and the 'output scan' respectively.
In the accumulation scan, one accumulates the sum of all input messages, in a suitably transformed domain. The architecture of the bit nodes and check nodes shows a corresponding adder element at the beginning (left) for this scan. For the sign processing on the check node, the accumulation is achieved by using an XOR gate. Further, each memory block is read twice within the same cycle by various processing units; hence both of its ports remain busy. However, within a 'phase', the accumulation scan needs only one type of memory, and hence the set of memories of the other type is idle during this scan.
In the output scan, messages corresponding to edges are read two at a time, subtracted/removed from the accumulation, inverse-transformed wherever needed, and output/stored into the memories corresponding to the other type of nodes. The architecture of the bit nodes and check nodes shows two corresponding subtractor elements at the end (right) for this scan. For the sign processing on the check node, the effect of subtraction is achieved by using an XOR gate. While the reading of the two coefficients happens from one type of memory, the storage happens into the other type of memory. For example, the bit nodes read from the bit-node memory blocks during the output scan, and write into the check-node memory blocks. Each memory block of each type is accessed twice: the bit-node memory blocks are read twice within the same cycle by the bit-node processing units, and the check-node memory blocks are written into twice within the same cycle by the check-node processing units. That way, both sets of memories, and both ports of each, are busy every cycle within the output scan. This happens irrespective of the 'phase' to which the output scan belongs.
Having provided the hardware structure and the schedule to stimulate various elements, one can now understand the decoding process in detail.
Decoding each frame takes many decoding iterations; here we discuss the details of a single iteration. First, the memory register in the feedback path of the adder (M_intr) is initialized with the intrinsic information of the current frame. All bit nodes then first accumulate the sum of their input messages, adding two messages at a time. The messages are read in from the appropriate bit-memory blocks using suitable addressing. The intermediate value of the addition is accumulated in the memory M_intr. Since there are (p^s + 1) input messages per node, the accumulation scan needs (p^s + 1)/2 cycles to finish. At the end of the accumulation scan, the other memory register M_acc is selected, and the sum is dumped into it.
Next, each bit-node processing unit starts reading messages, again two at a time, and subtracting them from the sum using the two subtractors in parallel. The outputs of these subtractors are stored into the appropriate check-memory blocks. If a saturation unit is inserted, then the saturation is done just before storing the bit-node outputs.
Once this phase is over, the check nodes take over the processing for the next phase of the same data frame. Both the intermediate memory registers S_intr and M_intr are initialized to 0. The messages are read in from the appropriate check-memory blocks using suitable addressing. The sign and magnitude are segregated, and handed over to different units for processing. The sign unit accumulates the XOR of the signs of all input messages in its corresponding accumulation scan, two at a time. The magnitude unit first transforms all coefficients into the ln domain, before it similarly starts accumulating two at a time. Both of them use the corresponding intermediate accumulation memories, S_intr and M_intr, to store the intermediate results. As in bit-node processing, the accumulation scans of the sign and magnitude processing take (p^s + 1)/2 cycles, working in parallel, to finish. At the end of the accumulation scan, the other memory registers, S_acc and M_acc respectively, are selected, and the sums are dumped into them.
Next, both the sign and magnitude units of each node start reading messages, again two at a time. The sign bits and magnitudes are again segregated and routed. The magnitude part is (forward) transformed before being routed. The sign unit XORs the two signs using two XOR gates in parallel, to nullify the effect of those signs in the accumulation. The magnitude unit subtracts the two magnitudes from the accumulation using the two subtractors in parallel. The magnitude parts are then inverse-transformed and combined with the appropriate sign outputs for the particular edges. They are then stored back into the appropriate bit-memory blocks. If a saturation unit is inserted, then the saturation is done just before storing the check-node outputs.
Since the sign processing in the check node is lightweight compared to the magnitude processing, it is possible that the accumulation and output scans of the two units are not synchronized. The real need for synchronizing them arises when the sign and magnitude parts of a coefficient are being combined but one part has not yet arrived. In such a case, one needs to implement a rendezvous and insert appropriate buffers, most probably for storing the sign-unit outputs.
Thus each phase requires (p^s + 1)/2 + (p^s + 1)/2 cycles to finish both of its scans. Each iteration, which consists of two phases, therefore requires 2·(p^s + 1) cycles to finish. This number, however, is a coarse approximation.
The iterations stop when the parity checks are satisfied. There are two ways of checking them as per the literature. Alternatively, one stops after a fixed number of iterations.
The memories encountered in the overall system are certain registers within each node, and the bit-memory and check-memory blocks. Not all of these memory units are active in all machine cycles. Hence, wherever possible, chip-select signals should be used. The design of the chip-select signal turns out to be quite straightforward.
The control-logic design for the envisaged system will be such that whenever a hardware unit is scheduled, the corresponding data operands are ready at its inputs. Hence the fetch-decode-execute paradigm gets reduced to just 'execute'. One can say that the internal control logic of the processing units (nodes) is deterministic as well as simple. Two parts of the control logic, namely the address-generation unit and the switch, have been covered already. Another feature of the control logic is appropriate control-signal generation (enabling) and distribution (sending) at the right cycle to the right unit(s). The system clock may be switched off for some of the elements that are idle during certain periods; implementing this power-saving mode requires its own signals. Given that the whole schedule is determinable at design time itself (static scheduling), one can even design the various types of chip-select signals that are required for memory synchronization. There will also be other signals, such as the frame-start signal, which signals the start of a new iteration. Finally, a group of signals will need to be designed to handle exceptions, overflows, etc. Whether the required control logic is microprogrammed using a control store, or implemented as hardwired control circuitry, is the prerogative of the implementer of this patent work.
Since the system design is based on proven properties of projective geometries, simulations are not strictly required. However, a detailed complexity and cost analysis was done, and its results are discussed now.
The parameters based on which comparison is done are:
1. Processing Unit parameters. Code length n, gate count and their fan-in were chosen for analysis.
2. Interconnect parameters. The number of edges/connections in Tanner graph was chosen for analysis.
3. Memory parameters. The number of memory blocks and their width, and the number of memory registers per processing unit, are the primary parameters. Apart from these, the number of memory accesses per cycle per block (and aggregated over an iteration), the number of read and write accesses per iteration, and the bandwidth utilization are further parameters. The corresponding utilization factor indicates what percentage of this access capacity is being used in a particular cycle.
4. Timing parameters. The number of cycles taken by a few aggregate units of time, namely the fold, the phase, and the iteration, are the primary parameters here. Another important parameter is the system throughput.
The envisaged decoder system in accordance with this invention requires J = (p^(2s) + p^s + 1) processing units of either type. Hence the overall number of processing units is 2·(p^(2s) + p^s + 1).
The bit-processing nodes have three gates: one adder and two subtractors. The check-processing nodes contain six gates: three XOR gates, one adder and two subtractors. We assume here that the functions f(x) and f⁻¹(x) are implemented using look-up tables; hence they do not add to the gate count, but they do add to the area-related considerations.
Each bit node in the architecture is connected to both bit- and check-memory blocks. Hence the number of wires is 2·(p^(2s) + p^s + 1)·(p^s + 1).
Turning to the memory layout, there are (p^(2s) + p^s + 1) bit-memory and (p^(2s) + p^s + 1) check-memory blocks. The width of each block is (p^s + 1) data objects, and they are all dual-port memories. There are also two parallel registers in each bit-processing unit, and four registers in each check-processing unit. These add up to 6·(p^(2s) + p^s + 1) memory registers. The memory bandwidth utilization depends on the type of scan that is going on. In the accumulation scan, one set of memory blocks (the opposite ones) is idle, and so are the connections to those blocks. In the output scan, the utilization is full, for both types of memory blocks are busy. In every iteration there are four scans, two of each type; each memory type remains idle in one out of these four scans.
For each processing unit, either two or four out of its 2·(p^s + 1) connections get excited in a cycle (see the assumption next). Since this figure is the same in each cycle, the communication bandwidth utilization factor is either 1/(p^s + 1) or 2/(p^s + 1).

Assuming that both types of processing units take one cycle to execute each step of a scan, the number of cycles that the execution of a phase (consisting of two scans) of either type takes is then (p^s + 1)/2 + (p^s + 1)/2 = (p^s + 1). The iteration time is thus 2·(p^s + 1) cycles. If N is the number of iterations needed to decode each frame of data, then the throughput is f·(p^(2s) + p^s + 1) / (2N·(p^s + 1)), where f is the system frequency. A worked example follows.
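As a worked example of these figures (assumed values: p = 5, s = 1, N = 20 iterations and a 100 MHz system clock; none of these are fixed by this specification):

```python
# Assumed example values; only the formulas come from the text above.
p, s, N, f = 5, 1, 20, 100e6
n = p**(2*s) + p**s + 1                      # 31 processing units (and bits) per type
phase = (p**s + 1) // 2 + (p**s + 1) // 2    # 3 + 3 = 6 cycles per phase
iteration = 2 * (p**s + 1)                   # 12 cycles per iteration
throughput = n * f / (N * iteration)         # ~12.9 Mbit/s of decoded data
print(n, phase, iteration, throughput)
```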
Some lower bounds on some of the parameters related to area and speed are introduced as follows. One of the parameters is the gate count, which can crudely represent one part of the area; the other part is the area taken up by the memories.
While considerable emphasis has been placed herein on the various components of the preferred embodiment and the interrelationships between the component parts of the preferred embodiment, it will be appreciated that many alterations can be made and that many modifications can be made in the preferred embodiment without departing from the principles of this invention. These and other changes in the preferred embodiment as well as other embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.

Documents

Orders

Section Controller Decision Date

Application Documents

# Name Date
1 205-MUM-2007-RELEVANT DOCUMENTS [28-09-2023(online)].pdf 2023-09-28
2 205-MUM-2007-RELEVANT DOCUMENTS [26-09-2022(online)].pdf 2022-09-26
3 205-MUM-2007-RELEVANT DOCUMENTS [29-09-2021(online)].pdf 2021-09-29
4 205-MUM-2007-RELEVANT DOCUMENTS [29-03-2020(online)].pdf 2020-03-29
5 205-MUM-2007-RELEVANT DOCUMENTS [23-03-2019(online)].pdf 2019-03-23
6 205-MUM-2007-RELEVANT DOCUMENTS [28-03-2018(online)].pdf 2018-03-28
7 Form 27 [20-03-2017(online)].pdf 2017-03-20
8 205-MUM-2007-CORRESPONDENCE(IPO)--(22-11-2016).pdf 2016-11-22
9 205-MUM-2007-CORRESPONDENCE(IPO)-(DECISION)-(22-11-2016).pdf 2016-11-22
10 205-MUM-2007-CLAIMS(GRANTED)--(22-11-2016).pdf 2016-11-22
11 Other Patent Document [13-07-2016(online)].pdf 2016-07-13
12 205-MUM-2007-POWER OF ATTORNEY-(17-06-2016).pdf 2016-06-17
13 205-MUM-2007-CORRESPONDENCE-(17-06-2016).pdf 2016-06-17
14 Other Patent Document [10-06-2016(online)].pdf 2016-06-10
15 205-MUM-2007-CORRESPONDENCE(IPO)-(FER)-(28-01-2014).pdf 2014-01-28
16 205-mum-2007-abstract(5-2-2008).pdf 2018-08-09
17 205-MUM-2007-Abstract-210115.pdf 2018-08-09
18 205-mum-2007-claims(5-2-2008).pdf 2018-08-09
19 205-MUM-2007-Claims-210115.pdf 2018-08-09
20 205-MUM-2007-CORRESPONDENCE(16-4-2009).pdf 2018-08-09
21 205-mum-2007-correspondence(5-2-2008).pdf 2018-08-09
22 205-MUM-2007_EXAMREPORT.pdf 2018-08-09
23 205-MUM-2007-CORRESPONDENCE(IPO)-(HEARING NOTICE)-(4-5-2016).pdf 2018-08-09
24 205-MUM-2007-Power of Attorney-210115.pdf 2018-08-09
25 205-mum-2007-correspondence-received.pdf 2018-08-09
26 205-MUM-2007-OTHERS-210115.pdf 2018-08-09
27 205-mum-2007-description (provisional).pdf 2018-08-09
28 205-mum-2007-description(complete)-(5-2-2008).pdf 2018-08-09
29 205-mum-2007-drawing(5-2-2008).pdf 2018-08-09
30 205-mum-2007-drawings.pdf 2018-08-09
31 205-mum-2007-form-1.pdf 2018-08-09
32 205-mum-2007-form 1(18-5-2007).pdf 2018-08-09
33 205-mum-2007-form-2.pdf 2018-08-09
34 205-mum-2007-form 2(5-2-2008).pdf 2018-08-09
35 205-mum-2007-form 2(title page)-(complete)-(5-2-2008).pdf 2018-08-09
36 205-mum-2007-form 2(title page)-(provisional)-(5-2-2007).pdf 2018-08-09
37 205-MUM-2007-Form 2(Title Page)-210115.pdf 2018-08-09
38 205-mum-2007-form-3.pdf 2018-08-09
39 205-mum-2007-form 3(5-2-2007).pdf 2018-08-09
40 205-MUM-2007-Form 3-210115.pdf 2018-08-09
41 205-mum-2007-form 5(5-2-2008).pdf 2018-08-09
42 205-MUM-2007-Form 5-210115.pdf 2018-08-09
43 205-mum-2007-form 13(16-4-2009).pdf 2018-08-09
44 205-MUM-2007-FORM 18(16-4-2009).pdf 2018-08-09
45 205-mum-2007-form-26.pdf 2018-08-09
46 205-MUM-2007-Examination Report Reply Recieved-210115.pdf 2018-08-09

E-Register / Renewals

3rd: 24 Mar 2017 (From 05/02/2009 To 05/02/2010)
4th: 24 Mar 2017 (From 05/02/2010 To 05/02/2011)
5th: 24 Mar 2017 (From 05/02/2011 To 05/02/2012)
6th: 24 Mar 2017 (From 05/02/2012 To 05/02/2013)
7th: 24 Mar 2017 (From 05/02/2013 To 05/02/2014)
8th: 24 Mar 2017 (From 05/02/2014 To 05/02/2015)
9th: 24 Mar 2017 (From 05/02/2015 To 05/02/2016)
10th: 24 Mar 2017 (From 05/02/2016 To 05/02/2017)
11th: 24 Mar 2017 (From 05/02/2017 To 05/02/2018)
12th: 05 Feb 2018 (From 05/02/2018 To 05/02/2019)
13th: 31 Jan 2019 (From 05/02/2019 To 05/02/2020)
14th: 04 Feb 2020 (From 05/02/2020 To 05/02/2021)
15th: 01 Mar 2021 (From 05/02/2021 To 05/02/2022)
16th: 14 Dec 2021 (From 05/02/2022 To 05/02/2023)
17th: 30 Jan 2023 (From 05/02/2023 To 05/02/2024)
18th: 05 Feb 2024 (From 05/02/2024 To 05/02/2025)
19th: 01 Jan 2025 (From 05/02/2025 To 05/02/2026)