"A Method To Produce Efficient And High Quality Simulated Annealing

< Back

"A Method To Produce Efficient And High Quality Simulated Annealing Placement"

Abstract: The present invention provides an efficient method for mapping a logic design on Field Programmable Gate Arrays comprising the steps of determining the minimum required square grid of FPGA logic blocks for mapping said design, providing a compensation factor on said minimum square grids, and selecting the maximum value amongst the said compensated square grids for reducing the mapping time, and implementing a legalization adjustment to ensure mapping of said compensated design.

Get Free WhatsApp Updates!
Notices, Deadlines & Correspondence

Patent Information

Application #

Filing Date

29 December 2004

Publication Number

06/2007

Publication Type

INA

Invention Field

ELECTRONICS

Status

Parent Application

Applicants

STMICROELECTRONICS PVT. LTD.

PLOT NO. 2,3 & 18 SECTOR 16A, INSTITUTIONAL AREA, NOIDA 201301, UTTAR PRADESH, INDIA

Inventors

1. DHABALENDU SAMANTA

B2/3, GLAXO APARTMENTS, MAYUR VIHAR, PHASE-1, EXTN. DELHI-91, INDIA

2. VIREN AGARWAL

COLOUR HOUSE, 10TH MILE, RISHI ROAD, KALIMPONG-734301, WEST BENGAL, INDIA

Specification

Field of the Invention
The present invention relates to an efficient method for mapping a logic design on Field Programmable Gate Arrays.
Background of the Invention
Field Programmable Gate Arrays (FPGAs) are configurable logic devices, which are used to implement varied logic functions. The basic FPGA consists of Look Up Tables (LUTs) and routing paths but a complex one includes dedicated RAM, logic/I/O blocks and other macros like adders, multipliers etc. These FPGAs can be configured to produce a desired output by interconnecting said blocks through routing paths. This process of implementing logic of a design on a FPGA by interconnecting configurable logic devices is known as FPGA mapping. Since being first introduced by XILINX in 1985, the FPGAs have become increasingly popular devices for use in electronics systems. The use of FPGAs continues to grow exponentially because they offer relatively short design cycles, with huge reduction in costs through logic consolidation,' and flexibility in terms of configurability
Several FPGA placement systems have been developed to optimally implement the desired designs on a FPGA. The systems try to map the desired designs onto a grid of configurable logic blocks on a FPGA device of particular size. The net-list for the design, and the Input/Output pad (1/0 pad) information of the chip are given as the inputs to the placement system and the system outputs the coordinates for each mapped logic element on a FPGA and interconnection information for logic elements. Over the last few decades, the FPGA placement problem has been solved by various techniques like analytic, partition based and Simulated Annealing. Two parameters of design cycle i.e., delay time and required Silicon area are optimized to get the best possible results. These systems also consider the mapping time and routability of the placed design while mapping the design on a FPGA.
Out of all techniques mentioned here, most FPGA placement systems are based on a variant of Simulated Annealing (SA) as SA usually results in most optimal solution in
terms of silicon area and delay time. SA is a Monte Carlo approach for minimizing multivariate fiinctions. The parameter involved in the case of placement is the cost of the placement in terms of wire length or some other factor. The temperature T is a parameter that controls the likelihood of accepting moves that makes the placement worse.
Initially T is very high so almost all moves are accepted. It is then gradually decreased, as the placement is refined so that eventually the probability of accepting moves that makes the placement worse is very low. To apply SA, the system is initialised with a particular random placement. A new placement is constructed by random displacement of the configurable logic devices. If the cost of this new state is lower than that of the previous one, the change is accepted unconditionally and the placement is updated. If the energy is greater, the new placement is accepted probabilistically This is the Metropolis step, the fundamental procedure of SA. This procedure allows the system to move consistently towards lower cost, yet still jump' out of local minima due to the probabilistic acceptance of some upward moves.
While various FPGA configurations are currently used, the FPGA devices mentioned herein represent a square grid of configurable elements on a FPGA. For example, a FPGA with device size 17 represents a FPGA having a square grid of configurable logic blocks with 17 rows and 17 columns and routing paths in between. The square arrangement helps in ensuring minimization of wire length in a design and thus reducing the placement cost.
There are two inherent problems with these SA based placement systems. Firstly, when the design size is almost equivalent to the FPGA device size, the present day systems are unable to produce an optimal FPGA placement for large designs within a reasonable amount of time. This is because there is a limited free space available for movement of logic blocks and as a result, the system performs significant amount of switching of logic blocks to find an optimal placement. The other problem is the availability of excess free space for movement of logic blocks when the design size is much smaller than the device size The larger the device size the more number of moves SA will make and as a result.
the system would take more time to find optimal placement. This is observed in Figure 1
and Figure 2.
The graph of figure 1 gives a very clear idea about the behavior of SA. XorMux 8 requires a minimum of device size 17 [15 X 15 for Logic blocks and I/O's on periphery] The placement cost of SA is low in the region between 25 and 38. For very high device size, when the logic blocks have excess space to move, the SA produces very bad results. Moreover the deviation in the results, or in other words the randomness of SA is also increased for higher device size. The graph has been computed taking an average of mapping cost of five runs for each value of device size. If a design were to run only once foi each device size, the deviation would even be larger.
The extra silicon area given to the logic blocks is to allow the SA to have more "moves" in which only a single cell is involved (i.e. no swapping) during the initial phase. This will facilitate SA to have a better intermediate solution compare to one with inappropriate silicon area
The larger the device size the more number of moves SA will make. For more number of moves SA would take more time. But it is found that SA takes more time for less number of moves and produce a sub-optimal solution when it has very little free silicon area to move i e when the design size is almost equal to the device size. This is illustrated in figure 2.
As seen in the graph of figure 2, the placement time decreases till the device size is around 24 and then increases again. When the device size is near 17 [minimum size for XorMuxS] the logic blocks are very tightly packed and hence the each move of SA actually involves two logic blocks (i.e. swapping). As shown in Figure 3 and Figure 4 below, almost all moves in figure 3 will involve two or more logic blocks where as moves in figure 4 will mostly involve one logic block.
Ideally the system should produce an optimal FPGA mapping, but the mapping time for a design adversely affects the system output As the time passes, the parameter T decreases for SA based systems and as result, the probability of accepting a worse placement also decreases. For big designs the system may accept a sub-optimal placement, as eventually the system may not be able to accept a worse output in the proximity. Hence, eventually the system accepts a sub-optimal placement as an output if better placement is not found in the proximity. The system placement results are illustrated in terms of placement cost and placement time for various device sizes in Figures 1 & 2 respectively. As observed, both parameters initially decrease as the FPGA device size increases. However, the paramenters start to increase, as the device size gets significantly bigger than the design size Moreover, the de\iation in the results, or in other words the randomness of the output is also increased for higher device size.
The miniaturization of semiconductor technology and increase in the number of logic elements in the present day designs has made the present FPGA mapping systems very slow and unreliable for producing feasible FPGA implementation. All existing techniques produce far from the optimal solution. The present day systems are very slow in providing optimal mapping and hence, are very undesirable for FPGA implementation of designs with large number of logic elements. To truly exploit FPGAs for rapid turnaround development and prototyping, placing of processing elements in proper locations of the device plays an important role in determining the delay and the silicon requirement for FPGA implementation of a design.
Hence, there is a need for systems that minimize the chip area and delay time for a design, and at the same time reduce the mapping time. The present invention relates to efficient FPGA placement system for designs with large number of logic elements using SA based approach. The present invention provides a placement system for efficient FPGA implementation of big designs The Required Silicon area is dominated either by the l/O's, Logic Blocks or Macro Height
Object and Summary of the Invention
To obviate the drawbacks of the prior art the object of the instant invention is to provide a system that results in reduced silicon area for designs with large number of logic elements.
Further object of the invention is to reduce the timing delay for the mapped design on FPGA.
Another object of the invention is to reduce the time taken for mapping the above mentioned designs on a FPGA
To achie\e the above objects the instant invention provides an efficient method for mapping a logic design on Field Programmable Gate Arrays comprising the steps of
determining the minimum required square grid of FPGA logic blocks for
mapping said design;
providing a compensation factor on said minimum square grids, and
selecting the maximum value amongst the said compensated square grids
for reducing the placement time; and
implementing a legalization adjustment to ensure mapping of said
compensated design.
Said determining comprises the steps of
determining the minimum required number of FPGA logic blocks in a square grid for mapping corresponding design logic blocks;, determining the minimum required number of FPGA logic blocks in a square grid for mapping corresponding design Input / Output blocks, and determining the minimum required number of FPGA logic blocks in a square grid for mapping corresponding design macros.
Said compensation factor for said square grid for said design logic blocks is calculated by multiplying said square grid for said design logic blocks by a factor 'X'.
Said compensation factor of said square grid for said design Input / Output blocks is calculated by multiplying the number of Input/Output blocks with a factor 'Y'.
Said compensation factor of said square grid for said design macros is calculated by mulliplying the number of macro blocks with a factor 'Z'.
Said factor "X' ranges between 1.5 to 3.
Said factor Y" is I when more than one 1/0 logic block is held in Input/Output coordiualc
Said factor "y'" is ! 2 when Input/Output coordinate holds only one I/O logic block.
Said factor "Z" is 1 2.
Brief Descriptions of the Drawings
The invention will be described with reference to the accompanying drawings.
Figure 1 shows the graph of placement cost versus device size.
Figure 2 shows the graph of variation of placement time with increasing device.
Figures 3 and 4 show the moves involving the logic blocks.
Figure 5 shows the graph of logic block dominated design.
Figure 6 shows the graph of I/O or macro dominated design.
Figures 7 and 8 show the device sizes.
Figures 9 to 13 show different simulation results.
Detail Description of the Invention
To truly exploit FPGAs for rapid turn-around development and prototyping, placing of processing elements in proper locations of the device plays an important role in determining the delay and the silicon requirement for FPGA implementation of a design. The present invention provides a mapping system for efficient FPGA implementation of big designs. Based on the analysis in background, the mapping system determines the appropriate FPGA device size to place a design.
The Required Silicon area for each case is determined the following way.
a) L = Required Silicon area for Logic Blocks = Square Root (Number of Logic Blocks).
b) I = Required Sihcon area for I/O's = I = (Number of I/O Cells required) / 2.
c) M = Required Silicon area for Macros = max (Maximum Height of Macro, Maximum Width of Macro).
To get the optimal required silicon area we characterize the behavior of the above three categories of designs. For Logic Block dominated designs with small diflFerence between L and 1, the behavior is as shown in figure 5. For I/O dominated designs having huge difference between L and I, the behavior is as shown in figure 6. For Macro dominated designs also the behavior is as shown in figure 6 for designs where the difference between L and I is huge and similar to figure 5 where the difference between L and I is small.
In cases where the difference between L and I is small for I/O dominated designs or L and M is small for Macro dominated designs, the behavior is similar to Figure 5 because in such cases the values of 1 or M lie in region A of figure 5. In other words the Logic Blocks are still tightly packed due the small difference between L and I for I/O dominated designs or L and M for Macro dominated designs. Hence the logic blocks do not get enough free space to move.
Therefore for all designs the target should be to achieve Region B. However due to the minimum space required for 1/0 dominated designs or Macro dominated designs, the device size in region B may not be sufficient to place all I/O's or satisfy the macro heights. In such a scenario a value in region C is chosen (the behavior in region C is similar to Figure 6). Another parameter that plays an important role in choosing the optimal device size is the placement time. It is evident from figure 1 and figure 2 that the time increases in the second half of the region B. To get the optimal value of required silicon we compute the value of L, I and M according to the following procedure:
A) Computing Value of L,,,
L.„, = L '^ w, where the factor w can be in the range of 1.5 to 3.
B) Computing Value M„,
Twenty percent extra silicon area is given to Macros. HenceMM, = M * 1.20
C) Computing Value of Im
The computation of 1 depends on the architecture of the device.
i) If a single I/O co-ordinate can hold more than one 1/0, then I is not to be modified
ii) If a single 1/0 co-ordinate holds only one I/O then twenty percent extra silicon area
is given Hence 1m=1 * 12.
The maximum value amongst the computed values of Lm !„, and Mm is taken and set as the required silicon area for the design. Hence, Required Silicon for design = Rm = Max
(Lm, Im Mm)
The extra silicon area given to the logic blocks is to allow the SA to have more "moves" in which only a single cell is involved (i.e. no swapping) during the initial phase. This
will facilitate SA to have a better intermediate solution compare to one with inappropriate silicon area.
Re-adjustment of Rm
The required silicon area Rm calculated may not be sufficient to place all macros. Also the In, value takes only two sides I/O mapping into account. The I/O mapping on the other two sides is not calculated. Hence some adjustments are done to require silicon value (Rm). The following two steps are further carried out to find the
i) A legalization algorithm is executed to find the minimum amount of device required to place all the macros without legalization. Hence ML = Legalization Value for macros.
2) The I,„ value is readjusted as lFinal = Min (Im, Device size)- This is done to allow I/O's to use all-the four sides of the device if the Im value exceeds the device size.
Hence, Final Required Silicon for design = RS = Maximum (Rm, ML, I final )
The required Silicon Area can exceed the device size. The SA apparatus would usually converge itself in such cases to an area supported by the device size. A post placement method may be required to set the logic block co-ordinates to valid co-ordinates with respect to the device size as shown in figure 7 and figure 8 below. As shown in figure 7 the placed design has co-ordinates from 0 to RS. Hence the entire placement needs to be shifted as shown in figure 8.
An example when a design may consume a larger area than the amount required by it is shown below. For example an area of 8 X 8 may be sufficient for a design fed to an SA based placer But the SA initially maps it to the entire available chip area and then searches for the global optima in the entire device. This is demonstrated in the figure 9.
To overcome the problem outlined in figure 9 SA is prevented from using an area more than the calculated required silicon area as shown in figure 10. A large portion of silicon is not explored due to the restrictions set by the required silicon. This helps in reducing the placement time and produces a near optimal solution as only the required silicon area is explored Moreover, the maximum deviation in the output of the SA apparatus is less than the output generated using the require silicon area concept. Hence randomness of SA is also decreased.
As illustrated in the figure 11 the I/O spread them in the design and then the SA gets trapped in local minima yielding a result very far from the optimal. Using RS in the figure 12 the solution space is restricted and a solution closer to the global optima is
obtained
The idea of providing more silicon area virtually is very fruitful for these designs similar to the ones shown in Figure 13. In these designs during the annealing there are a number of moves in which two cells are swapped since there is very less vacant area. To handle such designs the designs are spread over the required silicon area (virtual since it will be more than the device size) and then annealing is started. The free silicon area will increase the number of moves in which only one cell is involved during the initial passes thus yielding a high quality intermediate solution.
Thus the aforesaid invention offers lot of advantages over the prior art. Due to bad placement of very large designs the router may not be able to route the design on a given device. Our reduced silicon area approach can map a big design in a much better way. Hence it may map even those designs that are not mapped by SA without reduced silicon on the same device The placement apparatus searches only the appropriate solution space applicable for the design Hence unnecessary solution space is not explored. The present invention provides following benefits over the existing method:
Reduction in randomness of SA.
> Reduction in mapping time.
> Preventing SA from getting trapped in local minima resulting in wastage
of Silicon area.
Further, since only the required area of silicon is used for a design, it assists in;
> Reduction in the number of routing resources required. '> Reduction in the delay in the design.
> Reduction in the routing execution time.
Since the silicon area is restricted, the maximum deviation of the results is also decreased Hence the consistency of this invention is increased. This is further illustrated bv the tables listed below
Calculation of RS.

(Table Removed)
In prior art, the FPGA device size used to map a design is 34 which uses a 34 X 34 size FPG.A array as an initial square arrangement for the mapping systems, while Table 1 shows the required FPGA device size determined using the instant invention for initial mapping. If the value of 1 exceeds 34, then Im is taken as Minimum (1, device Size (34)). The value of compensation factor w is taken as 2. The placement costs for various device
sizes used in the mapping system are given in Table 2. -As shown in Table 2, the highlighted values illustrate the mapping cost for the instant invention. This clearly shows the advantage of the instant invention over the prior art in terms of placement cost.
MST COST for Various Device Sizes

(Table Removed)
Note I Required Silicon Values are highlighted in blue colour.

We claim:
1 An efficient method for mapping a logic design on Field Programmable Gate
Arrays comprising the steps of:
determining the minimum required square grid of FPGA logic blocks for
mapping said design;
providing a compensation factor on said minimum square grids, and
selecting the maximum value amongst the said compensated square grids
for reducing the placement time; and
implementing a legalization adjustment to ensure mapping of said
compensated design.
2 The efficient method for mapping a logic design on Field Programmable Gate
Arrays as claimed in claim 1 wherein said determining comprises the steps of:
- determining the minimum required number of FPGA logic blocks in a square grid for mapping corresponding design logic blocks; determining the minimum required number of FPGA logic blocks in a square grid for mapping corresponding design Input / Output blocks; and determining the minimum required number of FPGA logic blocks in a square grid for mapping corresponding design macros.
The efficient method for mapping a logic design on Field Programmable Gate
Arrays as claimed in claim 1 wherein said compensation factor for said square grid for said design logic blocks is calculated by multiplying said square grid for said design logic blocks by a factor 'X'.
4 The efficient method for mapping a logic design on Field Programmable Gate
Arrays as claimed in claim I wherein said compensation factor of said square grid for said design Input / Output blocks is calculated by multiplying the number of Input/Output blocks with a factor 'Y'.
5 The efficient method for mapping a logic design on Field Programmable Gate
Arrays as claimed in claim 1 wherein said compensation factor of said square grid for said design macros is calculated by multiplying the number of macro blocks with a factor 'Z"
6. The efficient method for mapping a logic design on Field Programmable Gate Arrays as claimed in claim 3 wherein said factor 'X' ranges between 1.5 to 3.
7 The efficient method for mapping a logic design on Field Programmable Gate Arravs as claimed in claim 4 wherein said factor 'Y' is 1 when more than one I/O logic block is held in Input/Output coordinate.
8 The efficient method for mapping a logic design on Field Programmable Gate Arrays as claimed in claim 4 wherein said factor 'Y' is 1.2 when Input/Output coordinate holds only one I/O logic block.
9. The efficient method for mapping a logic design on Field Programmable Gate Arrays as claimed in claim 5 wherein said factor 'Z' is 1.2.
10 An efficient method for mapping a logic design on Field Programmable Gate Arrays substantially as herein described with reference to and as illustrated in the accompanying drawings

Documents

Application Documents

#	Name	Date
1	2595-del-2004-petition-138.pdf	2011-08-21
2	2595-del-2004-pa.pdf	2011-08-21
3	2595-del-2004-form-5.pdf	2011-08-21
4	2595-del-2004-form-3.pdf	2011-08-21
5	2595-del-2004-form-2.pdf	2011-08-21
6	2595-del-2004-form-1.pdf	2011-08-21
7	2595-del-2004-drawings.pdf	2011-08-21
8	2595-del-2004-description (provisional).pdf	2011-08-21
9	2595-del-2004-description (complete).pdf	2011-08-21
10	2595-del-2004-correspondence-others.pdf	2011-08-21
11	2595-del-2004-claims.pdf	2011-08-21
12	2595-del-2004-abstract.pdf	2011-08-21