Abstract: The present invention provides a system and method for adaptive frequency scaling for predicting the load on the processing unit and dynamically changing its clock frequency while keeping the synchronization with other processing units. The amount of data in the input memory waiting to be processed is a good indicator of the current load. The present invention utilizes the same concept for predicting the load on the processing unit. The frequency of operation is thus changed on the basis of the percentage of memory being occupied by its input data. The algorithm disclosed by the present invention allows the processing unit to use the maximum possible clock frequency only when it is required and run at some lower frequencies in low processing power requirements. Operating the circuit at low frequency helps in reducing power consumption. A method of implementing the proposed algorithm in the form of a simple digital circuit is also provided by the present invention.
The present invention relates to dynamic frequency scaling in electronic devices and more particularly to adaptive frequency scaling based on workload prediction for reducing power consumption in an electronic device.
Background of the Invention
The current trends in the chip industry point to the development of heterogeneous systems that will be able to support several complementary standards on a single chip in order to satisfy the user's demands in diverse application scenarios. These systems will lead to the integration of existing technologies and standards and will be based on reconfigurable architecture consisting of hardware units shared between different technologies and standards.
Supporting different types of task or service requests in a heterogeneous system based on reconfigurable architecture, while at the same time satisfying the need for low power consumption, is a challenge.
An interactive mobile terminal, for example, can spend 90% of system energy and time waiting for a user response. Such idle periods provide opportunities for dynamic power management and voltage scaling techniques to reduce the system power usage. Dynamic voltage and frequency scaling (DVFS) is a technique to reduce active power consumption by scaling processor frequency and voltage to meet the required performance. This technique enables a chip to operate at different voltages and clock frequencies. In a system based on dynamic frequency clocking, the current operating frequency of a processing unit is set on the basis of different factors. These factors may be application, environment, or circuit specific. The conventional systems based on dynamic frequency clocking try to reduce the power consumption by changing the frequency on the basis of the technology, standard or interface being used at that time.
The most effective way to reduce dynamic power consumption at the implementation level is scaling of the supply voltage, because dynamic power depends quadratically on the supply voltage. The limiting parameter is the propagation delay through a digital circuit, which increases as the supply voltage is lowered.
The propagation delay through a CMOS circuit increases drastically as the supply voltage approaches the threshold voltage of the circuit. On the other hand, there is only little impact on performance with high supply voltages. Therefore, any voltage reduction must be balanced against performance reduction. To compensate and maintain the same data throughput extra hardware may be added.
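The quadratic dependence mentioned above can be illustrated with the commonly used first-order model of CMOS dynamic power, P = a * C * V^2 * f. The following is a minimal sketch under that assumed model; the function name and all numeric values are illustrative and are not taken from the present specification.

```python
def dynamic_power(alpha, c_load, v_dd, freq):
    """First-order CMOS dynamic power model: alpha * C * Vdd^2 * f (watts)."""
    return alpha * c_load * v_dd ** 2 * freq

# Assumed example values: activity factor 0.2, 1 nF switched capacitance.
p_full = dynamic_power(0.2, 1e-9, 1.2, 200e6)
p_half_f = dynamic_power(0.2, 1e-9, 1.2, 100e6)  # halve the frequency only
p_half_v = dynamic_power(0.2, 1e-9, 0.6, 200e6)  # halve the voltage only

print(p_half_f / p_full)  # frequency scaling alone is linear
print(p_half_v / p_full)  # voltage scaling is quadratic
```

Under this model, halving the frequency halves the dynamic power, whereas halving the supply voltage reduces it to one quarter.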
The working principle of dynamic frequency clocking is explained with the help of the block diagram shown in FIGURE 1. A Clock Divider Block (also known as Frequency Divider Block) (101) is used to generate multiple clock frequencies based on a master clock frequency, which is the maximum frequency at which the synthesized system can work. The multiple clocks are shown as 102A, 102B, 102C and 102D. A frequency selector (103) selects one of these frequencies based on some control signals generated by a control block (104). The delay needed by the dynamic frequency scaling circuit to change the clock frequency and to stabilize is taken into account at the time of implementation.
A system and method for dynamic clock generation has been described in U.S. Patent Number 6,564,329 B1 by Cheung et al. It describes a clock controller for an Application Specific Integrated Circuit (ASIC) for a portable electronic device that dynamically and automatically varies the frequency of on-chip clocks in response to bandwidth requirements of the driven logic. The ASIC includes one or more oscillators used by phase locked loops (PLLs) to generate one or more master clocks. These master clocks are received by a system clock controller which derives various clocks of different frequencies from the master clocks. These derived clocks are used to drive the various controllers and peripherals connected to the ASIC. For example, the system clock controller preferably generates a memory clock for clocking the memory controller and
the external memory devices, a bus clock for clocking the system bus, a CPU clock for clocking the CPU, and one or more peripheral clocks for clocking the various peripheral controllers and peripherals coupled to the ASIC. The various devices in the ASIC that can be accessed by other devices in the ASIC are known as "resources". The speed at which a resource is clocked affects the rate at which the resource can process data (i.e. the bandwidth of the resource). Every device in the ASIC that can access a resource, also known as a controller, has a request line coupled to the system clock controller to indicate when the controller is accessing a resource. In addition, the system clock controller has a programmable bandwidth register associated with each controller for holding a value representing the bandwidth utilized by the controller. The system clock controller also preferably includes an adder, a frequency table, and a multiplexer (MUX) for each clocked resource. When a controller accesses a resource, the controller signals the system clock controller via the request line. The system clock controller in turn, uses the adder to sum the values held in the bandwidth registers of all of the controllers that are currently accessing the resource. The resulting sum is then used as an index to an entry in the frequency table. The contents of the entry are applied to the selection lines of the MUX and dynamically select the appropriate clock frequency for the resource. Thus, the clock frequency for the resource is automatically determined by the total bandwidth utilization of the controllers requesting access to the resource. Accordingly, the clock frequency is preferably chosen so that the bandwidth of the resource closely matches the needed bandwidth. As a result, little power is wasted due to operating the resource at a higher clock frequency than is necessary.
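The prior-art selection scheme described above (sum the bandwidth registers of the requesting controllers and use the sum as an index into a frequency table) can be sketched as follows. This is only an illustration: the controller names, register values, table contents and the clamping of the index are assumptions, since the cited patent's actual values are not reproduced here.

```python
def select_resource_clock(bandwidth_regs, requesting, freq_table):
    """Sum the bandwidth values of requesting controllers and look up a clock."""
    total = sum(bandwidth_regs[name] for name in requesting)
    index = min(total, len(freq_table) - 1)  # clamping is an assumption
    return freq_table[index]

# Hypothetical controllers and bandwidth register values.
regs = {"cpu": 2, "dma": 1, "lcd": 1}
# Hypothetical frequency table in MHz, indexed by total bandwidth.
table = [0, 25, 50, 75, 100]

print(select_resource_clock(regs, ["cpu", "dma"], table))
```

The resource clock therefore follows the set of controllers that are currently requesting access, not the amount of data actually waiting to be processed.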
The above-mentioned technique and other such techniques based on conventional dynamic frequency clocking reduce power consumption in an ASIC by changing the frequency of operation on the basis of the interface or standard being used at that time. These conventional techniques cannot work in scenarios where only one technology, standard or interface controller is used. In the case of a single standard, the processing requirement may change depending on the real time incoming data scenarios. For example, the real time data scenario may vary from time to time in terms of the number of packets arriving in a burst, the size of each packet and the inter-packet delay. Data can sometimes come as a burst of a large number of packets, whereas at other times the burst can comprise just two or three packets. These variations have not been taken into account by any of the existing techniques of frequency scaling.
Therefore, there arises a need for a system and method for dynamic frequency scaling which dynamically modifies the frequency of operation not on the basis of interface and standard being used but on the basis of load on the processing unit at a particular time, which is the right criterion for the processing requirement.
Summary of the Invention
It is an object of the present invention to provide an improved technique of dynamic frequency scaling which modifies the frequency of operation not on the basis of interface and standard being used but on the basis of load on the processing unit at a particular instant of time.
To achieve the aforementioned objective the present invention provides an improved algorithm for predicting the load on the processing unit and dynamically changing its clock frequency while keeping the synchronization with other processing units. The amount of data in the input memory waiting to be processed is a good indicator of the current load. The present invention utilizes the same concept for predicting the load on the processing unit. The frequency of operation is thus changed on the basis of the percentage of memory being occupied by its input data. The algorithm disclosed by the present invention allows the processing unit to use the maximum possible clock frequency only when it is required and run at some lower frequencies in low processing power requirements. Operating the circuit at low frequency helps in reducing power consumption. The present invention also provides a method of implementing the proposed algorithm in the form of a simple digital circuit.
To overcome the drawbacks of the prior art and to achieve the aforementioned objectives, the present invention provides a system for adaptive frequency scaling in an electronic device comprising:
an input interface block for receiving real time data;
at least one processing unit for processing real time data received by said input interface block;
at least one memory unit for storing the real time data before said data is processed by the processing unit;
a frequency divider block for generating multiple clock frequencies from a received clock frequency; and
a control unit for selecting the appropriate frequency of operation from said multiple clock frequencies, wherein said selection is based on the level of utilization of said memory unit.
Further, the present invention provides a method for adaptive frequency scaling in an electronic device comprising the steps of:
initializing the processing unit of said electronic device at a first frequency;
keeping track of data present in memory for processing;
signaling the change in occupancy level of memory; and
changing the frequency of operation of said processing unit in response to the change in occupancy level of memory.
Brief Description of the Accompanying Drawings
The invention will now be described with reference to the accompanying drawings.
FIGURE 1 shows the basic principle of working of dynamic frequency clocking in the form of a block diagram.
FIGURE 2 shows a block diagram of basic structure of a standard chip.
FIGURE 3 shows a block diagram of an asynchronous FIFO used in the present invention.
FIGURE 4 shows a block diagram of the system disclosed by the present invention.
FIGURE 5 shows the logic used for functioning of a Moore machine.
FIGURE 6 shows the implementation of the frequency divider as used in the present invention.
Detailed Description of the Invention
FIGURE 2 shows a block diagram of a standard chip. An Input Interface Block (201) receives real-time data which needs to be processed inside the chip by some Processing Units. The Input Interface Block (201) may either support different standards or a single standard depending on the implementation. The frequency of the Input Interface Block is fixed depending on the standard it is supporting, e.g. XGMII, GMII etc. On the other hand, the Processing Units (202A and 202B) use a dynamic frequency scaling scheme. The data is passed from one clock domain to another clock domain through FIFOs (203A and 203B). The data values are written to a FIFO buffer from one clock domain and are read from the same FIFO buffer from another clock domain, where the two clock domains can be asynchronous to each other.
FIGURE 3 shows a block diagram of an asynchronous FIFO used in the present invention. Two independent interfaces to the queue, with all the signals needed for the implementation of the algorithm of the present invention, are also shown. Signals wr_clk & rd_clk denote the clocks used to write to and read from the FIFO buffer respectively. Full & Empty are the signals used to check whether the FIFO is full or empty respectively. The FIFO implementation uses separate pointers for write & read, WR_PTR & RD_PTR, whose width depends on the depth of the RAM used to implement the asynchronous FIFO. Before incrementing the FIFO pointers, "if not Full" or "if not Empty" tests are performed to ensure that overflow or underflow never happens. These tests are implemented by comparing the status of WR_PTR and RD_PTR.
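The pointer comparison described above can be modelled in software as follows. This is a behavioural sketch only, not the circuit of FIGURE 3: it ignores the two clock domains, and it assumes the common technique of letting each pointer count modulo twice the depth so that Full and Empty can be distinguished. The class and method names are hypothetical.

```python
class FifoModel:
    """Single-clock behavioural model of a FIFO with WR_PTR and RD_PTR."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth
        self.wr_ptr = 0  # counts modulo 2 * depth (assumed extra wrap bit)
        self.rd_ptr = 0

    def occupancy(self):
        return (self.wr_ptr - self.rd_ptr) % (2 * self.depth)

    def empty(self):
        return self.wr_ptr == self.rd_ptr

    def full(self):
        return self.occupancy() == self.depth

    def write(self, value):
        if self.full():  # "if not Full" test before incrementing WR_PTR
            return False
        self.mem[self.wr_ptr % self.depth] = value
        self.wr_ptr = (self.wr_ptr + 1) % (2 * self.depth)
        return True

    def read(self):
        if self.empty():  # "if not Empty" test before incrementing RD_PTR
            return None
        value = self.mem[self.rd_ptr % self.depth]
        self.rd_ptr = (self.rd_ptr + 1) % (2 * self.depth)
        return value
```

In this model a write to a full FIFO and a read from an empty FIFO are simply refused, which corresponds to the overflow and underflow protection described above.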
Apart from the above-mentioned signals, a multi-bit signal named "almost_full" (hereinafter referred to as the status signal) of width N is used to signal the different percentage occupancy levels of the FIFO buffer. The width of the signal, N, depends on the implementation and on parameters such as the maximum or average length of the packets that can arrive on the input interface and the depth of the buffer used to implement the FIFO. For example, if the average length of the incoming packets is 512 bytes, the maximum packet length is 1024 bytes and the FIFO RAM depth is 8 Kbytes, then the width N can be taken as 3 such that "almost_full[2] = 1" implies that the buffer is at least 75% filled, i.e. the buffer has at least 12 average size packets, "almost_full[1] = 1" implies that the buffer is at least 50% filled but less than 75%, i.e. the buffer has between 8 and 12 average size packets, and "almost_full[0] = 1" implies that the buffer is at least 25% filled but less than 50%, i.e. the buffer has between 4 and 8 average size packets. So in this case we have,
almost_full[2] = 1, if buffer occupancy is between 75 % and 100 %;
almost_full[1] = 1, if buffer occupancy is between 50 % and 75 %; and
almost_full[0] = 1, if buffer occupancy is between 25 % and 50 %.
Here, "between a % and b %" means greater than or equal to a % but less than b %, where a and b can take values such as 25, 50, 75 and 100 as mentioned above.
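The definition of the status signal for the 8 Kbyte example can be written out as a small function. This is a sketch of the thresholds stated above; the function name is hypothetical, and treating an exactly full buffer as belonging to the top band is an assumption, since the text defines the top band as less than 100 %.

```python
def almost_full(occupied_bytes, depth_bytes=8 * 1024):
    """Return almost_full[2:0] as a bit string, most significant bit first."""
    # Integer comparisons avoid rounding issues at the band boundaries.
    if 4 * occupied_bytes >= 3 * depth_bytes:  # at least 75 % (assumed to include 100 %)
        return "100"
    if 4 * occupied_bytes >= 2 * depth_bytes:  # at least 50 %, below 75 %
        return "010"
    if 4 * occupied_bytes >= 1 * depth_bytes:  # at least 25 %, below 50 %
        return "001"
    return "000"                               # below 25 %

AVERAGE_PACKET = 512  # bytes, from the example above
for packets in (0, 4, 8, 12):
    print(packets, almost_full(packets * AVERAGE_PACKET))
```

With 512-byte average packets, the bands begin at 4, 8 and 12 queued packets, matching the figures given in the example.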
Different bits of "almost_full" are asserted by comparing the status of WR_PTR & RD_PTR, as is done to assert the "Full" and "Empty" signals. The above definition implies that only one single bit of "almost_full" can be asserted at a time. So in the above example, whenever the memory occupancy level rises from the 25 % to 50 % range into the 50 % to 75 % range, the value of the signal almost_full[2:0] changes from "001" to "010". This type of implementation is analogous to one-hot code notation, in which only one bit can be asserted at a time. Depending on the design implementation for multiple clock domains, it may be converted to Gray code notation, which is the preferred notation for multiple clock domains. The value of this signal is updated in the "wr_clk" domain, i.e. the clock of the block which is writing to this asynchronous FIFO, whereas it is captured and used in the "rd_clk" domain to change the frequency of the processing unit which has to read and process the data stored in this FIFO buffer. The "almost_full" signal from asynchronous FIFO 203A is used to generate the clock signal for processing unit 202A as shown in FIGURE 4. Similarly, the "almost_full" signal from asynchronous FIFO 203B is used to generate the clock signal for processing unit 202B. Hence every processing unit has its own clock generation circuit, as shown in FIGURE 4, to generate its clock using the "almost_full" signal from its input asynchronous FIFO.
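One reason a Gray code may be preferred across clock domains is that adjacent occupancy levels then differ in exactly one bit, whereas adjacent one-hot values differ in two. The sketch below illustrates this with the standard binary-reflected Gray code; the specification does not state which encoding is used, so the mapping and the names are assumptions.

```python
# One-hot status value mapped to an occupancy level index (0 = below 25 %).
ONE_HOT_TO_LEVEL = {0b000: 0, 0b001: 1, 0b010: 2, 0b100: 3}

def level_to_gray(level):
    """Standard binary-reflected Gray code of a level index."""
    return level ^ (level >> 1)

def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

gray_codes = [level_to_gray(level) for level in range(4)]
print([format(g, "02b") for g in gray_codes])
print(bits_changed(0b001, 0b010))                        # one-hot, level 1 to 2
print(bits_changed(level_to_gray(1), level_to_gray(2)))  # Gray, level 1 to 2
```

A single changing bit means the reading clock domain can only ever capture the old or the new level, never a transient mixture of the two.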
The frequency of the processing unit in the present invention is changed on the basis of the change in workload. The workload is estimated on the basis of the amount of data in the memory waiting to be processed by the processing unit. This is predicted by checking the status of the "almost_full" signal. Whenever the value of the signal "almost_full" changes, the algorithm changes the frequency of the processing unit. This means that the performance of the algorithm is highly dependent on the definition and the structure of the signal "almost_full", since it represents the different memory occupancy levels. For the structure and definition of the "almost_full" signal described above, the frequency of the processing unit changes only when the memory occupancy level crosses 25%, 50% or 75%. The frequency does not change for any other change in the memory occupancy level, because the value of the "almost_full" signal remains the same for such changes. The total number of frequencies at which the processing block can run also depends on the implementation of "almost_full". If the width of the signal "almost_full" is N, then the total number of frequencies available to the processing block is N+1.
Again considering the above example in which the signal width is 3, the total number of available frequencies is 4. Now, referring to FIGURE 4, which shows a block diagram of the system disclosed by the present invention, "fmax" is the master clock frequency used to generate other scaled clock frequencies using a series of "divide-by-2" frequency dividers. The frequency dividers may be implemented using T flip-flops. 102A, 102B, 102C and 102D are the four frequencies synthesized by the Frequency Divider Block (101). The frequency 102A represents the master clock frequency, fmax, and the other three frequencies 102B, 102C and 102D are the frequencies that can be synthesized using the "divide-by-2" strategy based Frequency Divider Block (101) such that 102A > 102B > 102C > 102D. The actual scaling factor for generating these frequencies depends on the implementation. For the above example, let us choose scaling factors of 2, 4 and 8 for the frequencies 102B, 102C and 102D respectively such that 102A = 2 * 102B = 4 * 102C = 8 * 102D. A frequency selector (103) selects one of these frequencies based on some control signals generated by a control block (104). The selected frequency is shown as f in FIGURE 4.
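The "divide-by-2" chain can be illustrated with a short behavioural sketch. It shows a T flip-flop that toggles on every rising edge of its input clock, which halves the frequency, and the resulting set of four frequencies for the example above. The function names and the 200 MHz master clock are assumptions used only for illustration.

```python
def t_flip_flop_divide(clock_samples):
    """Toggle the output on each rising edge of the sampled input clock."""
    q, prev, out = 0, 0, []
    for clk in clock_samples:
        if prev == 0 and clk == 1:  # rising edge detected
            q ^= 1
        out.append(q)
        prev = clk
    return out

def divider_outputs(f_max, stages=3):
    """Frequencies 102A..102D: the master clock and each divide-by-2 stage."""
    return [f_max / (2 ** k) for k in range(stages + 1)]

divided = t_flip_flop_divide([0, 1, 0, 1, 0, 1, 0, 1])
print(divided)                 # output period is twice the input period
print(divider_outputs(200.0))  # assumed fmax of 200 MHz
```

Each additional stage in the chain halves the frequency again, which gives the scaling factors of 2, 4 and 8 chosen in the example.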
In one embodiment, the clock frequency for the next task is changed to any of these four frequencies whenever any change in "almost_full" is detected. The processing unit starts working at the lowest frequency, which is 102D in this example. If the value of almost_full[2:0] increases, say from 000 to 001, it means that 25% of the memory is now filled with data that needs to be processed by the processing unit, so the processing unit should process the data faster and its frequency is increased to the next level, 102C. Depending on how "almost_full" changes the next time, the frequency is changed from 102C to 102B or from 102C to 102D depending on whether "almost_full" has increased or decreased respectively. A clock for triggering the control block (104) is generated by a trigger generator (405) as shown in FIGURE 4. The trigger generator (405) receives the selected clock of frequency (f) and the signal "almost_full" as its inputs. On every cycle of the clock of frequency (f), the trigger generator (405) compares the current value of the "almost_full" signal with the value available in the last clock cycle. If the value has changed, it asserts the signal CLK, which is used as the clock of the control block (104).
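The behaviour of the trigger generator and the resulting frequency steps can be sketched as a cycle-by-cycle simulation. This is an illustrative software model only: the function name is hypothetical, and the numeric comparison of successive "almost_full" values relies on the one-hot encoding described earlier, in which a larger value means a higher occupancy band.

```python
FREQS = ["102D", "102C", "102B", "102A"]  # lowest to highest frequency

def simulate(samples):
    """Return (CLK pulse, selected frequency) for each sampled almost_full value."""
    level, last, trace = 0, 0b000, []
    for current in samples:
        pulse = current != last  # trigger generator: value changed since last cycle
        if pulse:
            if current > last and level < len(FREQS) - 1:
                level += 1       # occupancy rose: step up one frequency
            elif current < last and level > 0:
                level -= 1       # occupancy fell: step down one frequency
            last = current
        trace.append((pulse, FREQS[level]))
    return trace

trace = simulate([0b000, 0b001, 0b001, 0b010, 0b100, 0b010, 0b001, 0b000])
for pulse, freq in trace:
    print(pulse, freq)
```

The frequency changes only on cycles where a CLK pulse is produced, so a steady occupancy band leaves the processing unit at its current frequency.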
In the present invention, a Moore machine acts as the Control Block (104). The Moore machine is implemented by means of a synchronous sequential circuit and its clock, CLK, is generated by the trigger generator (405). FIGURE 5 shows the state machine used for the implementation of the Moore machine. The Moore machine, on each rising edge of CLK, generates control signals, O1 and O2, which are used by the frequency selector (103) and the frequency divider block (101). The state numbers shown in FIGURE 5 represent the output associated with each state. The output associated with each state is fed to the frequency selector, which determines the next frequency of the processing unit. The starting state of the Moore machine is 00. At this starting point, the value of almost_full[2:0] is 000 and the frequency is at its minimum, i.e. 102D. If, from this point, the value of almost_full[2:0] changes to 001, then the next state is 01. Depending on further changes in the value of the "almost_full" signal, the state can change from 01 to 00 or 10 and so on.
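The state machine can be sketched as a four-state up/down sequence in which the state code is also the output [O1, O2]. FIGURE 5 is not reproduced here, so the sketch is an assumption based on the text above: the mapping of states 10 and 11 to 102B and 102A follows the frequency ordering, and holding the state at either end of the range is an added assumption. All names are hypothetical.

```python
# In a Moore machine the output depends only on the current state.
STATE_TO_FREQ = {"00": "102D", "01": "102C", "10": "102B", "11": "102A"}
ORDER = ["00", "01", "10", "11"]

def next_state(state, occupancy_rose):
    """State after one CLK edge; the state saturates at the ends (assumed)."""
    i = ORDER.index(state)
    i = min(i + 1, len(ORDER) - 1) if occupancy_rose else max(i - 1, 0)
    return ORDER[i]

def run_moore(events, state="00"):
    """Apply a sequence of rise (True) / fall (False) events from state 00."""
    visited = [state]
    for rose in events:
        state = next_state(state, rose)
        visited.append(state)
    return visited

states = run_moore([True, True, False, True, True])
print(states)
print([STATE_TO_FREQ[s] for s in states])
```

Because the machine is clocked only when "almost_full" changes, each CLK edge corresponds to exactly one step up or down in this sequence.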
The implementation of the frequency divider block (101) is shown in FIGURE 6. 601A, 601B, and 601C are divide-by-two frequency dividers. They are implemented by means of T flip-flops. 602A, 602B and 602C are pass logic blocks. These blocks allow or block a signal to pass through them depending on whether the blocks are enabled or disabled respectively. An Encoder (603) takes the outputs, O1 and O2, of the Control Block (104) as its inputs and sends signals c[2], c[1] and c[0] respectively to 601A, 601B, and 601C. These signals enable the Pass Logic blocks such that only the needed scaled frequency is generated as the output of the frequency divider. Assuming that the "almost_full[2:0]" signal changes its value from 000 to 001, the output of the Control Block, [O1, O2], would be 01 and the frequency needed would be 102C, i.e. only two Pass Logic blocks need to be activated. So the encoder (603) generates c[2:0] as 110. As a result, pass logic blocks 602A and 602B are enabled and the required frequency (which is 102C in this case) is generated at the output of 601B. Similarly, if the output of the Control Block is 10, the frequency needed would be 102B, so the encoder (603) generates c[2:0] as 100. Also, for output values of 11 and 00, the value of c[2:0] would be 000 and 111 respectively. So depending on the values of the outputs of the Control Block, O1 and O2, the encoder (603) generates c[2:0] so that only the required logic is enabled.
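The encoder mapping stated above can be collected into a lookup table. The sketch below also derives the output frequency by assuming that each enabled pass logic block lets the clock through one further divide-by-two stage; the function name and the 200 MHz master clock are illustrative assumptions.

```python
# Control Block output [O1, O2] mapped to c[2:0], as listed in the text.
ENCODER = {"00": "111", "01": "110", "10": "100", "11": "000"}

def output_frequency(f_max, o1o2):
    """Frequency produced when the stages selected by c[2:0] are enabled."""
    c = ENCODER[o1o2]
    enabled_stages = c.count("1")  # each enabled stage halves the clock (assumed)
    return f_max / (2 ** enabled_stages)

for code in ("00", "01", "10", "11"):
    print(code, ENCODER[code], output_frequency(200.0, code))
```

Stages that are not needed stay disabled, so the dividers behind them do not toggle, which is consistent with the stated aim of enabling only the required logic.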
It is believed that the present invention and many of its attendant advantages will be understood by the foregoing description. It is also believed that it will be apparent that various changes may be made in the form, construction and arrangement of the components thereof without departing from the scope and spirit of the invention or without sacrificing all of its material advantages. The form herein before described being merely an exemplary embodiment thereof, it is the intention of the following claims to encompass and include such changes.
We claim:
1. A system for adaptive frequency scaling in an electronic device comprising:
an input interface block for receiving real time data;
at least one processing unit for processing real time data received by said
input interface block;
at least one memory unit for storing the real time data before said data is
processed by the processing unit;
a frequency divider block for generating multiple clock frequencies from
received clock frequency;
a control unit for selecting the appropriate frequency of operation from
said multiple clock frequencies wherein said selection is based on the level
of utilization of said memory unit.
2. A system for adaptive frequency scaling in an electronic device as claimed in
claim 1, wherein said memory unit is a FIFO buffer.
3. A system for adaptive frequency scaling in an electronic device as claimed in
claim 1, wherein a set of signals associated with each said FIFO comprising:
a clock for writing into said FIFO buffer;
a clock for reading from said FIFO buffer;
a signal for checking if said FIFO is full or empty;
a pointer for reading from said FIFO buffer;
a pointer for writing into said FIFO buffer; and
a status signal for signaling the percentage occupancy level of said FIFO
buffer.
4. A system for adaptive frequency scaling in an electronic device as claimed in
claim 3, wherein definition and structure of said status signal depends on length of
packets received by said input interface and depth of buffer used to implement
said FIFO buffer.
5. A system for adaptive frequency scaling in an electronic device as claimed in
claim 3, wherein only one single bit of said status signal is asserted at a time.
6. A system for adaptive frequency scaling in an electronic device as claimed in
claim 3, wherein width of said pointers depends on width of RAM used to
implement said FIFO.
7. The system as claimed in claim 1, wherein said frequency divider block
comprises:
at least one frequency divider;
at least one pass logic block coupled before each said frequency divider;
and
an encoder for sending signals to enable said pass logic blocks.
8. A system for adaptive frequency scaling in an electronic device as claimed in
claim 7, wherein said frequency divider is implemented by a T Flip-flop.
9. A system for adaptive frequency scaling in an electronic device as claimed in
claim 3, wherein said frequency divider is a divide-by-two frequency divider.
10. A system for adaptive frequency scaling in an electronic device as claimed in
claim 3, wherein said pass logic allows a signal to pass through when enabled.
11. A system for adaptive frequency scaling in an electronic device as claimed in
claim 1, wherein each said FIFO is asynchronous.
12. A system for adaptive frequency scaling in an electronic device as claimed in
claim 1, wherein said control block is a Moore machine implemented by
synchronous sequential circuit.
13. A system for adaptive frequency scaling in an electronic device as claimed in
claim 1, wherein the number of states in the Moore machine of said control block
is equal to the number of frequencies available at the input of frequency selector.
14. A system for adaptive frequency scaling in an electronic device as claimed in
claim 1, wherein the number of frequencies available at the input of frequency
selector is one greater than the width of said status signal.
15. A method for adaptive frequency scaling in an electronic device comprising the
steps of:
initializing the processing unit of said electronic device at a first
frequency;
keeping track of data present in memory for processing;
signaling the change in occupancy level of memory;
changing the frequency of operation of said processing unit in response to
change in occupancy level of memory.
16. A method for adaptive frequency scaling in an electronic device as claimed in
claim 15, wherein value of said signal increases or decreases when memory
occupancy increases or decreases respectively.
17. A method for adaptive frequency scaling in an electronic device as claimed in
claim 15, wherein frequency of operation increases or decreases when value of
said signal increases or decreases respectively.
18. A system for adaptive frequency scaling in an electronic device substantially as
herein described with reference to and as illustrated in the accompanying
drawings.
19. A method for adaptive frequency scaling in an electronic device substantially as herein described with reference to and as illustrated in the accompanying drawings.
| # | Name | Date |
|---|---|---|
| 1 | 931-del-2006-abstract.pdf | 2011-08-20 |
| 2 | 931-del-2006-form-3.pdf | 2011-08-20 |
| 3 | 931-del-2006-claims.pdf | 2011-08-20 |
| 4 | 931-del-2006-form-2.pdf | 2011-08-20 |
| 5 | 931-del-2006-correspondence-others.pdf | 2011-08-20 |
| 6 | 931-del-2006-form-1.pdf | 2011-08-20 |
| 7 | 931-del-2006-description (complete).pdf | 2011-08-20 |
| 8 | 931-del-2006-drawings.pdf | 2011-08-20 |