FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
SYSTEM AND METHOD FOR MONITORING AND CONTROLLING THERMAL CONDITION OF A DATA CENTER IN REAL-TIME
Applicant:
Tata Consultancy Services Limited, a company incorporated in India under The Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
[001] The present subject matter described herein, in general, relates to a method and
a system for monitoring and controlling thermal state of a data center.
BACKGROUND
[002] Data centers are centralized repositories for housing and managing a number of electronic equipments. These electronic equipments are generally placed in a number of housing units, such as racks, arranged in the data center. The electronic equipments may comprise servers, computers, communication devices, etc. There are frequent temperature variations inside the data center on account of a variety of causes, which include variation in heat load patterns with changing utilization of servers, change in ambient temperatures due to seasons, and temperature cycling in cooling units, among others. These result in the creation of hot and cold conditions in different regions of the data center, which may cause serious damage to the electronic equipments placed therein. A data center manager has to confront the dual challenge of ensuring the thermal safety of data center equipment while simultaneously keeping the cooling costs at a minimum. Individual control of cooling units does not cater to the global need for energy efficient operation. Consequently, a data center manager tends to overcool the data center rather than bear the risk of unsafe thermal operation. This leads to unnecessary cooling costs. Therefore, a data center needs a centralized and continuous monitoring and controlling system in order to maintain the data center in a thermally safe yet energy efficient state.
[003] Computer Room Air Conditioner (CRAC), Computer Room Air Handler (CRAH), or other types of air cooling units are used to take away the heat generated by the electronic equipment. The CRACs may be further classified into supply-air controlled CRACs and return-air controlled CRACs. Compressors of a typical return-air controlled CRAC may switch frequently over time, which results in a large variation in supply temperatures. Thus, for controlling the thermal state of the data center, it is a challenge to adapt to such frequent variations in the CRAC supply temperatures. This calls for a robust monitoring and control system, which would provide corrective actions based on the analysis of the current temperature scenario in the data center and yet maintain energy efficiency.
[004] A data center encompasses a complex interplay of fluid flow and heat transfer, where a number of CRACs and racks interact with each other directly or indirectly. Hence, in order to determine a corrective action based on the analysis of temperatures, it is imperative to know the influence of these entities on each other. Some known techniques may be used for identifying the affected regions in the data center. In order to locate the affected regions, a large number of temperature sensors may be used, which may add cost to the overall solution. Minimizing the use of the temperature sensors is yet another concern in the known techniques. The affected regions may be indicative of the data center being in either the hot condition or the cold condition.
[005] The existing monitoring and control systems provide recommendations in
terms of CRAC supply temperatures. This limits the application domain of such control systems to a particular kind of CRACs like supply-air controlled CRAC. Large numbers of data centers are found to employ return air controlled CRACs, where CRAC supply temperature may not be directly controlled. Hence, it becomes imperative to have a control system which would provide corrective actions in terms of a directly controllable operational parameter like set points of CRACs.
[006] Thus, in view of the above discussed concerns/challenges, there is a considerable need for a system and a method that overcome one or more of these concerns/challenges so as to provide a data center that is both energy efficient and thermally safe.
SUMMARY
[007] This summary is provided to introduce aspects related to systems and methods
for real-time monitoring and control to optimize operation of a data center and the concepts are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.
[008] In one implementation, a system for real-time monitoring and control to
optimize operation of a data center by controlling operational parameter of one or more cooling units impacting one or more heat generating devices in the data center is disclosed. The system comprises a processor and a memory coupled to the processor for executing a set of modules stored in the memory. The set of modules comprises an obtaining module, a
temperature analyzing module, a computing module, an identification module, and a generation module. The obtaining module is configured to obtain continuously a first-set of temperatures for the one or more heat generating devices in the real-time during every pre-defined time interval (t). The each temperature of the first-set of temperatures is estimated by a thermal predictor in the real-time and further, a temperature of each heat generating device is obtained at each instance of a pre-determined number of instances of the pre-defined time interval (t). The first-set of temperatures obtained is analyzed by the temperature analyzing module in order to identify the one or more heat generating devices under one of a hot condition and a cold condition, and further to categorize state of the data center into one of a hot detection mode and a cold detection mode. Thereafter, the computing module is configured to compute one of a hot-reference temperature and a cold-reference temperature for the one or more heat generating devices classified under one of the hot condition and the cold condition. Further, the identification module is configured to identify a target cooling unit amongst the one or more cooling units based on a collective influence of the one or more cooling units on the one or more heat generating devices classified under one of the hot condition and the cold condition, and based on a historical control signal log. After identification of the target cooling unit, the generation module is configured to iteratively generate a control signal comprising one or more gradual changes in an operational parameter of the target cooling unit to optimize operation of the data center in a stepwise manner.
[009] In another implementation, method for real-time monitoring and control to
optimize operation of a data center by controlling operational parameter of one or more cooling units impacting one or more heat generating devices in the data center is disclosed. The method comprises a step of obtaining continuously a first-set of temperatures for the one or more heat generating devices in the real-time during a pre-defined time interval (t). The each temperature of the first-set of temperatures is estimated by a thermal predictor in the real-time and further, a temperature of each heat generating device is obtained at each instance of a pre-determined number of instances of the pre-defined time interval (t). Further, the method is provided for analyzing the first-set of temperatures obtained for identifying the one or more heat generating devices under one of a hot condition and a cold condition, and further categorizing state of the data center into one of a hot detection mode and a cold detection mode. Also, the method is provided for computing one of a hot-reference temperature and a
cold-reference temperature for the one or more heat generating devices classified under one of the hot condition and the cold condition. The method is further enabled for identifying a target cooling unit amongst the one or more cooling units based on a collective influence of the one or more cooling units on the one or more heat generating devices classified under one of the hot condition and the cold condition, and further based on a historical control signal log. Upon identification of the target cooling unit the method is provided for iteratively generating a control signal comprising one or more gradual changes in an operational parameter of the target cooling unit to optimize operation of the data center in a stepwise manner. The method for the obtaining, the analyzing, the identifying, and the iteratively generating are performed by the processor.
[0010] In yet another implementation, a computer program product having embodied thereon a computer program for real-time monitoring and control to optimize operation of a data center by controlling an operational parameter of one or more cooling units impacting one or more heat generating devices in the data center is disclosed. The computer program comprises an instruction for obtaining continuously a first-set of temperatures for the one or more heat generating devices in real-time during a pre-defined time interval (t). Each temperature of the first-set of temperatures is estimated by a thermal predictor in real-time and, further, a temperature of each heat generating device is obtained at each instance of a pre-determined number of instances of the pre-defined time interval (t). Further, an instruction is provided for analyzing the first-set of temperatures in order to identify the one or more heat generating devices under one of a hot condition and a cold condition, and further to categorize the state of the data center into one of a hot detection mode and a cold detection mode. Further, an instruction is provided for computing one of a hot-reference temperature and a cold-reference temperature for the one or more heat generating devices classified under one of the hot condition and the cold condition. Thereafter, an instruction is provided for identifying a target cooling unit amongst the one or more cooling units based on a collective influence of the one or more cooling units on the one or more heat generating devices classified under one of the hot condition and the cold condition, and further based on a historical control signal log. After identification of the target cooling unit, an instruction is provided for iteratively generating a control signal comprising one or more gradual changes in an operational parameter of the target cooling unit to optimize operation of the data center in a stepwise manner.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
[0012] Figure 1 illustrates a network implementation of a system for real-time monitoring and control to optimize operation of a data center, in accordance with an embodiment of the present subject matter.
[0013] Figure 2 illustrates the system, in accordance with an embodiment of the present subject matter.
[0014] Figure 3 illustrates a detailed working of the system, in accordance with an embodiment of the present subject matter.
[0015] Figure 4 illustrates a detailed explanation of the obtaining module for obtaining the temperatures of the heat generating devices, in accordance with one embodiment of the present subject matter.
[0016] Figure 5 illustrates a detailed description of the temperature analyzing module for analyzing the first-set of temperatures, in accordance with an embodiment of the present subject matter.
[0017] Figure 6 illustrates a detailed explanation of the identification module and the generation module, in accordance with an embodiment of the present subject matter.
[0018] Figure 7 illustrates a detailed working of the system 102 for achieving a stable thermal state, in accordance with one embodiment of the present subject matter.
[0019] Figure 8 illustrates a method for real-time monitoring and control to optimize
operation of a data center, in accordance with an embodiment of the present subject matter.
DETAILED DESCRIPTION
[0020] Systems and methods for monitoring and control to optimize operation of a data center in real-time are disclosed in the present subject matter. The data center comprises one or more cooling units and one or more heat generating devices. The "one or more cooling units", hereinafter referred to as "cooling units", may be computer room air conditioners (CRACs), computer room air handlers (CRAHs), or other types of air cooling units which may be used for cooling the data center. Also, the CRACs may be of different types, such as supply-air controlled CRACs and return-air controlled CRACs. Further, the "one or more heat generating devices", hereinafter referred to as "heat generating devices", may comprise a number of electronic equipments capable of generating heat, such as various types of servers, computers, communication devices, etc. The heat generating devices may be mounted upon different sized rack-cabinets placed in a prearranged order based on the dynamics of the data center. The data center operation is very dynamic in nature due to server utilization changes, seasonal temperature variations, and CRAC cycling, among others. Therefore, there are twin challenges of maintaining the thermal safety of the data center as well as ensuring an energy efficient operation. The present subject matter provides the means to address these challenges.
[0021] In the process of providing the solution, temperature prediction for the heat generating devices may be required in real-time. It may be noted that the present subject matter is enabled for obtaining the temperatures of the heat generating devices in a real-time fashion at one or more instances during a pre-defined time interval (t). Further, each temperature of the heat generating devices may be predicted by a thermal predictor, explained in detail in subsequent paragraphs of the detailed description. Based on the temperatures obtained/predicted, the thermal condition of the heat generating devices may be identified, i.e., a heat generating device may be identified as being either in a hot condition or in a cold condition for a particular time interval. Similarly, on the basis of the temperatures obtained/predicted, the state of the data center may be categorized into one of a hot detection mode and a cold detection mode.
[0022] Thereafter, for maintaining thermally safe and energy efficient operation of the
data center, recommendations may be generated for optimizing the operations of the data center by controlling operational parameter of the cooling units impacting the heat generating devices in the data center. The operational parameter may be a set-point to be determined in
terms of a temperature for the cooling units. Before determining the set-point, the cooling units may be prioritized based on their impact on the heat generating devices in the data center. Thus, the determination of the set-point may be done for the cooling units prioritized. By following such approach i.e., prioritizing the cooling units on the basis of their impact, the thermal state of the data center may be controlled in a stepwise manner. Based on the priority, at a single instance, only one cooling unit amongst all the cooling units in the data center is designated as a most-influential cooling unit or a target cooling unit for which the set-point is to be determined.
[0023] The set-point determined for the most-influential cooling unit may be applied and its impact on the thermal state of the data center may be monitored. Based on the impact on the thermal state of the data center in response to the set-point applied, the present subject matter is enabled for designating another cooling unit, based on the priority, as a next most-influential cooling unit from the cooling units. Again, for said next most-influential cooling unit designated based on the priority, a set-point is determined and applied for controlling the thermal state of the data center. Thus, according to the approach described in the present subject matter, only one cooling unit, i.e., the most-influential cooling unit, may be taken into consideration for determination of the set-point at a time, rather than determining the set-point for each of the cooling units impacting the heat generating devices in the data center and applying the set-points of all the cooling units at the same time. Also, by providing the recommendations in terms of the set-points, the present subject matter is enabled to control return-air controlled CRACs, where direct control of supply temperature may not be possible. This allows the present subject matter to be used in a more generalized manner and gives it a wider application domain.
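The one-unit-at-a-time approach described above can be illustrated with a minimal Python sketch. It is not the implementation of the present subject matter: the function names, the scoring by summed influence, the exclusion of recently adjusted units, the step size, the set-point bounds, and the rule that a hot condition lowers the set-point are all assumptions made for illustration only.

```python
def select_target_unit(influence, affected_devices, recently_adjusted=()):
    """Pick the cooling unit with the largest collective influence.

    influence: {device id: {cooling unit id: influence weight}} (assumed layout).
    recently_adjusted: units skipped because a hypothetical control signal
    log shows they were changed recently.
    """
    scores = {}
    for device in affected_devices:
        for unit, weight in influence[device].items():
            if unit not in recently_adjusted:
                scores[unit] = scores.get(unit, 0.0) + weight
    if not scores:
        return None
    return max(scores, key=scores.get)


def step_set_point(current, mode, step=0.5, low=18.0, high=27.0):
    """Apply one gradual change; step and bounds are illustrative values."""
    if mode not in ("hot", "cold"):
        raise ValueError("mode must be 'hot' or 'cold'")
    # Assumption: a hot condition calls for a lower set-point, cold for a higher one.
    delta = -step if mode == "hot" else step
    return min(high, max(low, current + delta))


influence = {
    "rack1": {"crac1": 0.7, "crac2": 0.2},
    "rack2": {"crac1": 0.4, "crac2": 0.5},
}
target = select_target_unit(influence, ["rack1", "rack2"])
next_target = select_target_unit(influence, ["rack1", "rack2"], recently_adjusted=("crac1",))
new_set_point = step_set_point(22.0, "hot")
```

In this sketch only the selected unit receives a change; the effect would be monitored before a further unit is selected.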
[0024] While aspects of described system and method for monitoring and control to
optimize operation of the data center in the real-time may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system.
[0025] Referring now to Figure 1, a network implementation 100 of a system 102 for real-time monitoring and control to optimize operation of a data center is illustrated, in accordance with an embodiment of the present subject matter. In one embodiment, the system 102 facilitates optimization of the operation of the data center by controlling an operational parameter of cooling units impacting heat generating devices in the data center. The system 102 may obtain continuously a first-set of temperatures for the one or more heat generating devices during a pre-defined time interval (t). Each temperature of the first-set of temperatures may be estimated by a thermal predictor, where a temperature of each heat generating device is obtained at each instance of a pre-determined number of instances of the pre-defined time interval (t). Further, the system 102 may analyze the first-set of temperatures to identify one or more heat generating devices under one of a hot condition and a cold condition, and to categorize the state of the data center into one of a hot detection mode and a cold detection mode. Thereafter, the system 102 may compute one of a hot-reference temperature and a cold-reference temperature for the one or more heat generating devices classified under one of the hot condition and the cold condition. Upon computation, the system 102 may further identify a target cooling unit amongst the one or more cooling units based on a collective influence of the one or more cooling units on the one or more heat generating devices classified under one of the hot condition and the cold condition, and based on a historical control signal log. Further, the system 102 may iteratively generate a control signal comprising one or more gradual changes in an operational parameter of the target cooling unit to optimize operation of the data center in a stepwise manner.
[0026] Although the present subject matter is explained considering that the system 102 is implemented for optimizing operation of the data center on a server, it may be understood that the system 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and the like. It will be understood that the system 102 may be accessed by multiple users through one or more user devices 104-1, 104-2... 104-N, collectively referred to as user devices 104 hereinafter, or applications residing on the user devices 104. Examples of the user devices 104 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user devices 104 are communicatively coupled to the system 102 through a network 106.
[0027] In one implementation, the network 106 may be a wireless network, a wired
network or a combination thereof. The network 106 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network
(WAN), the internet and the like. The network 106 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[0028] Referring now to Figure 2, the system 102 is illustrated in accordance with an
embodiment of the present subject matter. In one embodiment, the system 102 may include at least one processor 202, an input/output (I/O) interface 204, and a memory 206. The at least one processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor 202 is configured to fetch and execute computer-readable instructions or modules stored in the memory 206.
[0029] The I/O interface 204 may include a variety of software and hardware
interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 204 may allow the system 102 to interact with a user directly or through the user devices 104. Further, the I/O interface 204 may enable the system 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 204 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface 204 may include one or more ports for connecting a number of devices to one another or to another server.
[0030] The memory 206 may include any computer-readable medium or computer
program product known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, compact disks (CDs), digital versatile discs or digital video discs (DVDs), and magnetic tapes. The memory 206 may include modules 208 and data 224.
[0031] The modules 208 include routines, programs, objects, components, data
structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the modules 208 may include an obtaining module 210, a temperature analyzing module 212, a computing module 214, an identification module 216, a generation module 218, a user-interface module 220 and other modules 222. The other modules 222 may include programs or coded instructions that supplement applications and functions of the system 102. According to embodiments of the present subject matter, the other modules 222 may comprise a thermal predictor (302).
[0032] The data 224, amongst other things, serves as a repository for storing data
processed, received, and generated by one or more of the modules 208. The data 224 may also include an influence index metrics 226, a historical control signal log 228, and other data 230. Further, it may be noted that the influence index metrics 226 may be obtained based on a methodology/technique disclosed in an Indian Patent application 652/MUM/2011, hereinafter incorporated as a reference.
[0033] In one implementation, at first, a user may use the user device 104 to access the system 102 via the I/O interface 204. The users may register themselves using the I/O interface 204 in order to use the system 102. The working of the system 102 is explained in detail with reference to Figure 3 below. The system 102 may be used for real-time monitoring and control to optimize operation of a data center. The optimization may be done by controlling an operational parameter of cooling units impacting heat generating devices in the data center.
[0034] Referring to Figure 3, a detailed working of the system 102 is illustrated, in accordance with an embodiment of the present subject matter. The purpose of the present subject matter is to provide a thermally safe as well as an energy efficient data center. Due to its dynamic nature, the temperature pattern of the data center may undergo frequent changes for a variety of reasons, such as changes in load utilization of the heat generating devices, seasonal variations, and cycling of cooling units. Thus, it may be the purpose of the subject matter to enable the system 102 to cope with such dynamic nature of the data center by quickly adapting to such changing conditions in real-time and providing recommendations accordingly.
[0035] To achieve the above discussed one or more purposes, the system 102 is enabled for providing real-time monitoring and control to optimize operation of the data center. Optimizing the operation of the data center may be performed by controlling an operational parameter of cooling units impacting heat generating devices in the data center. There may be a number of cooling units and a number of heat generating devices residing in the data center. Further, the heat generating devices may comprise a number of electronic equipments capable of generating heat, such as various types of servers, computers, communication devices, etc. As discussed above, dynamic parameters such as seasonal variations, variations in utilization of the heat generating devices, and cycling of the cooling units may take the data center into either an over-heated or an over-cooled condition. The over-heated and the over-cooled conditions may be referred to as a hot detection mode and a cold detection mode, respectively, throughout the detailed description of the present subject matter.
[0036] To overcome such conditions in the data center, the system 102 comprises various modules 208 stored in the memory 206 of the system 102. One such module is an obtaining module 210, which may be configured to continuously obtain temperatures for the heat generating devices in real-time at one or more instances during a pre-defined time interval (t). The detailed explanation of the obtaining module 210 for obtaining the temperatures of the heat generating devices may be understood by referring to a flow diagram (400) of Figure 4.
[0037] Referring now to the flow diagram (400) of Figure 4, at block 402, the temperature obtaining process may be initiated by the obtaining module 210, where the temperatures may be obtained continuously for the pre-defined time interval (t) as shown in block 404. The temperatures obtained by the obtaining module 210 (for the heat generating devices in the data center) may be referred to as a first-set of temperatures. Each temperature of the first-set of temperatures may be estimated by a thermal predictor 302. Further, a temperature of each heat generating device may be obtained at each instance of a pre-determined number of instances of the pre-defined time interval (t). According to embodiments of the present subject matter, the thermal predictor 302 may be a module/set of instructions stored in the memory 206 of the system 102. For estimating each temperature of the first-set of temperatures, the thermal predictor 302 may use the influence index metrics 226 and a set of supply temperatures of the cooling units, power dissipation of the heat generating devices, or combinations thereof. Further, the set of supply temperatures of the cooling units may be obtained from one or more sensors (not shown in the figure). The one or more sensors may be temperature sensing devices capable of sensing the supply temperatures of the cooling units.
[0038] In general, the influence index metrics 226 may be indicative of the influence of different sources in the data center (i.e., the cooling units and hot air recirculation from outlets of the heat generating devices) on the data center's targets (i.e., inlets of the heat generating devices) in terms of air flow distribution. For each heat generating device in the data center, a metric is obtained through the influence index metrics 226, each of which indicates the influence of the above discussed sources on the heat generating devices. Further, the influence index metrics 226 may be computed by using one of mathematical and experimental methods (as disclosed in the Indian Patent Application 652/MUM/2011). It may be noted by a person skilled in the art that the present subject matter is capable of using different sets of influence index metrics preconfigured in a database of the system 102 for several varieties of data center, i.e., depending upon the dynamics (design and air-flow configurations) of the data center.
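As a rough illustration of how influence indices and supply temperatures might be combined into an inlet temperature estimate, the following Python sketch uses a simple air-mixing model. This model is an assumption made here for illustration; the actual method is the one disclosed in Indian Patent Application 652/MUM/2011 and is not reproduced in this text. The function name, the requirement that the fractions sum to one, and the externally supplied recirculated-air temperature are all hypothetical.

```python
def predict_inlet_temperature(influence, supply_temps, recirc_fraction=0.0, recirc_temp=0.0):
    """Estimate one device's inlet temperature as a weighted air mix.

    influence: {cooling unit id: fraction of inlet air from that unit} (assumed).
    supply_temps: {cooling unit id: measured supply temperature}.
    recirc_fraction / recirc_temp: assumed share and temperature of
    recirculated hot air reaching the inlet.
    """
    total = sum(influence.values()) + recirc_fraction
    if abs(total - 1.0) > 1e-6:
        raise ValueError("air fractions are assumed to sum to 1")
    mixed = sum(weight * supply_temps[unit] for unit, weight in influence.items())
    return mixed + recirc_fraction * recirc_temp


estimate = predict_inlet_temperature(
    influence={"crac1": 0.6, "crac2": 0.3},
    supply_temps={"crac1": 18.0, "crac2": 20.0},
    recirc_fraction=0.1,
    recirc_temp=35.0,
)
```

With these illustrative inputs the estimate is about 20.3, since 0.6 × 18 + 0.3 × 20 + 0.1 × 35 = 20.3.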
[0039] Further, according to embodiments of the present subject matter, the one or more sensors may be temperature sensing devices. A temperature sensing device may be further capable of recording and storing the sensed supply temperature of the cooling units and of transmitting the temperature to a desired location. Further, at block 406, each temperature of the first-set of temperatures estimated by the thermal predictor may be obtained, with a temperature for each heat generating device obtained at each instance of the pre-determined number of instances in the pre-defined time interval (t). Thus, at the block 406, the first-set of temperatures for the heat generating devices may be obtained for further analysis. According to embodiments of the present subject matter, there may be n instances in the pre-defined time interval (t) at which the temperature of the heat generating devices may be estimated by the thermal predictor 302 and further obtained as the first-set of temperatures by the obtaining module 210. The time interval (t) consisting of n instances of obtaining temperatures may be measured backwards in time from the current real-time instance of obtaining the temperatures. Every new instance of obtaining the temperatures may update the first-set of temperatures by adding the temperatures obtained at the current real-time instance to the set and deleting the temperatures obtained at the nth instance measured backwards in time from the current instance.
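The rolling update of the first-set of temperatures described above behaves like a fixed-length sliding window. A minimal Python sketch follows; the class name, method names, and the dictionary layout are hypothetical and chosen only for illustration.

```python
from collections import deque


class TemperatureWindow:
    """Keeps the most recent n instances of per-device temperatures."""

    def __init__(self, n_instances):
        # A deque with maxlen discards the oldest instance automatically
        # when a new one is appended to a full window.
        self._window = deque(maxlen=n_instances)

    def add_instance(self, temperatures):
        """temperatures: {device id: estimated temperature} for one instance."""
        self._window.append(dict(temperatures))

    def first_set(self):
        """Return {device id: [temperatures, oldest to newest]}."""
        result = {}
        for instance in self._window:
            for device, temp in instance.items():
                result.setdefault(device, []).append(temp)
        return result


window = TemperatureWindow(n_instances=3)
for reading in ({"rack1": 24.0}, {"rack1": 25.0}, {"rack1": 26.0}, {"rack1": 27.0}):
    window.add_instance(reading)
```

After the fourth reading the window holds only the three latest values for `rack1` (25.0, 26.0, 27.0); the oldest value has been dropped.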
[0040] While the first-set of temperatures is continuously obtained and stored in the database of the system 102, it may be further processed in real-time by a temperature analyzing module 212, as shown in Figure 3. The detailed description of the temperature analyzing module 212 for analyzing the first-set of temperatures may be understood by referring to a flow diagram 500 of Figure 5, as below.
[0041] Referring now to the flow diagram 500 of Figure 5, at block 502, the first-set of temperatures (FST) 412 may be received for analysis by the temperature analyzing module 212. One purpose of analyzing the FST may be to identify the heat generating devices (HGD) under one of the hot condition and the cold condition. Another purpose of analyzing the FST may be to categorize the data center into one of the hot detection mode and the cold detection mode.
[0042] To achieve the purposes discussed above, it may be required to
determine the actual condition of the heat generating devices in the data center. Due to cycling of the cooling units and other factors, the data center may experience large fluctuations in the temperatures of the heat generating devices. Also, the cycling period may vary depending upon the heat load condition in the data center. A false or irrelevant detection of the actual condition of the heat generating devices may therefore arise and may further result in a wrong analysis. Thus, to prevent such false detections and wrong analysis, it is essential that the obtained data be analyzed by the temperature analyzing module 212.
[0043] To address these concerns, the temperature analyzing module
212 may be configured to continuously monitor and analyze the first-set of temperatures (received at the block 502) for the pre-defined time interval (t). In the next step of the analysis, performed by the temperature analyzing module 212 at block 504 (a conditional block), the FST is compared against a threshold temperature (TThreshold) to determine which heat generating devices fall under the hot condition or the cold condition. Based on the analysis at the block 504, those heat generating devices satisfying the condition (i.e., whose corresponding temperatures cross the threshold temperature at one or more instances in a sub-time interval (thd) of the pre-defined time interval (t)) may be classified under the hot condition. Thus, at block 506, the heat generating devices falling under the hot condition may be listed along with their temperatures (hot-reference temperatures). Further, the temperatures of the heat generating devices falling under the hot condition may be referred to as a second-set of temperatures; the second-set of temperatures is thus a subset of the first-set of temperatures.
[0044] The sub-time interval (thd) introduced in the analysis at the
block 504 may be used as a check-point by the temperature analyzing module 212 to avoid hot detection of the heat generating devices due to sporadic and insignificant surges in temperature caused by the variety of reasons discussed above. Since the heat generating devices are classified under the hot condition only if their corresponding temperatures cross the threshold temperature (TThreshold) at one or more instances of the sub-time interval (thd), the check-point may facilitate a more accurate categorization of the heat generating devices falling under the hot condition. According to one embodiment of the present subject matter, the heat generating devices may be classified under the hot condition if their temperatures cross the threshold temperature (TThreshold) at two instances of the one or more instances in the sub-time interval (thd). The sub-time interval (thd) may be determined on the basis of a detailed study of the dynamics of the data center. It may be noted by a person skilled in the art that the heat generating devices may be classified under the hot condition if their temperatures cross the threshold temperature (TThreshold) at "x" instances of the one or more instances in the sub-time interval (thd). According to embodiments of the present subject matter, for each heat generating device identified under the hot condition, a hot-reference temperature may be determined. The hot-reference temperature may refer to a temperature obtained for the heat generating device at any instance of the one or more instances in the sub-time interval (thd). Furthermore, the hot-reference temperature may also be determined by performing a mathematical or statistical operation on the first-set of temperatures. Thus, the heat generating devices falling under the hot condition, along with their respective hot-reference temperatures, may be shortlisted at block 506 for applying corrective actions, which are explained later in detail.
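As a non-limiting sketch, the hot-condition check of the block 504 and the shortlisting of the block 506 may be expressed in Python as below. The function name classify_hot and its parameters are hypothetical. Two choices are assumptions made only for this example: the sub-time interval (thd) is expressed as a count of the most recent instances, and the hot-reference temperature is taken as the highest temperature that crossed the threshold, which is merely one of the mathematical or statistical operations the description allows.

```python
def classify_hot(first_set, threshold, thd_instances, min_crossings=2):
    """Return {device: hot_reference_temperature} for devices in the hot condition.

    first_set      dict of device id -> list of temperatures, oldest first
    threshold      the threshold temperature (TThreshold)
    thd_instances  number of most recent instances forming the sub-time interval thd
    min_crossings  the "x" instances that must cross the threshold (2 in one embodiment)
    """
    if thd_instances <= 0:
        raise ValueError("thd_instances must be positive")
    hot = {}
    for device, temps in first_set.items():
        recent = temps[-thd_instances:]
        crossings = [t for t in recent if t > threshold]
        # Requiring several crossings filters out sporadic, insignificant surges.
        if len(crossings) >= min_crossings:
            # Assumption: use the highest crossing value as the hot-reference temperature.
            hot[device] = max(crossings)
    return hot


first_set = {
    "rack-1": [24.0, 29.0, 31.0, 30.5],  # sustained excursion above the threshold
    "rack-2": [24.0, 24.5, 31.0, 25.0],  # single spike only
}
print(classify_hot(first_set, threshold=28.0, thd_instances=3))
```

Here "rack-2" is not shortlisted because only one of its temperatures in the sub-time interval crosses the threshold, which is the kind of sporadic surge the check-point is intended to ignore.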
[0045] Similarly, referring to the same conditional block 504, where the condition is
not satisfied, i.e., if no occurrence of the hot condition is detected for another predefined time interval (tcd) (according to block 508), the heat generating devices may be classified under the cold condition. The heat generating devices identified under the cold condition, along with their cold-reference temperatures, may be shortlisted at block 510. Just as thd may be used as a check-point for identifying the heat generating devices under the hot condition, tcd may be used at the block 508 as a check-point for identifying the heat generating devices under the cold condition. Further, detailed knowledge of the data center dynamics may be required for determining tcd. Thus, analyzing the first-set of temperatures for the predefined time interval (tcd) may facilitate a more accurate identification of the heat generating devices falling under the cold condition.
[0046] According to embodiments of the present subject matter, temperatures of the
heat generating devices falling under the cold condition may be referred to as a third-set of temperatures, where the third-set of temperatures may be a subset of the first-set of temperatures. Further, for each heat generating device identified under the cold condition, a cold-reference temperature may be determined. In one embodiment, the cold-reference temperature may refer to the temperature obtained for the heat generating device that combines a maximum number of instances with a maximum temperature value, i.e., a most common maximum temperature. It may be further noted by a person skilled in the art that the cold-reference temperature may instead refer to a temperature having a different combination of number of instances and temperature value.
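The "most common maximum temperature" is open to more than one reading. Purely as an illustrative sketch of one possible reading, the Python function below groups the temperatures of a device into bins, counts the instances in each bin, and returns the bin with the most instances, preferring the higher temperature when two bins are equally common. The function name cold_reference, the binning step and the tie-breaking rule are all assumptions and are not taken from the specification.

```python
from collections import Counter


def cold_reference(temps, resolution=0.5):
    """Return one possible cold-reference temperature for a single device.

    temps       temperatures of the device over the interval tcd
    resolution  bin width used to group nearly equal temperatures (assumption)
    """
    if not temps:
        raise ValueError("temps must not be empty")
    binned = [round(t / resolution) * resolution for t in temps]
    counts = Counter(binned)
    # Most instances first; among equally common values, the higher temperature wins.
    return max(counts, key=lambda value: (counts[value], value))


print(cold_reference([20.1, 21.0, 21.1, 20.9, 22.0, 20.0]))
```

Binning is used here because fluctuating temperatures rarely repeat exactly; a different bin width or a different statistic may be equally consistent with the description.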
[0047] In particular, for return-air controlled CRACs, the determination of the
most common maximum temperature (cold-reference temperature) is especially necessary, as it has been generally observed that such CRACs have fluctuating supply temperatures due to switching of their compressors. Due to such fluctuation, the temperatures of the heat generating devices may also fluctuate. It is therefore necessary to determine the cold-reference temperature for the heat generating devices before considering them for the corrective actions. Thus, at the block 510, the heat generating devices under the cold condition along with their cold-reference temperatures may be listed for further processing. Further, the first-set of temperatures obtained, the second-set of temperatures corresponding to the heat generating devices identified under the hot condition, the third-set of temperatures corresponding to the heat generating devices identified under the cold condition, and the threshold temperature (TThreshold) may be stored in a database of the memory 206 of the system 102.
[0048] Thus, by categorizing the heat generating devices under one of the hot
condition and the cold condition, the relevant data is filtered out for analysis. If the data center is categorized under the hot detection mode, then only those heat generating devices falling under the hot condition may be considered for further analysis, i.e., for taking corrective actions. According to embodiments of the present subject matter, it may be noted by a person skilled in the art that the data center may be categorized under the hot detection mode even if only one heat generating device is identified in the hot condition. In such a case, only the hot-reference temperature may be determined, and no cold-reference temperature is determined.
[0049] Similarly, when the data center is categorized under the cold detection mode,
only those heat generating devices falling under the cold condition may be considered for the analysis, i.e., for taking corrective actions. According to embodiments of the present subject matter, it may be noted by the person skilled in the art that if none of the heat generating devices is detected under the hot condition during the predefined time interval (tcd), then the data center may be categorized under the cold detection mode. In such a case, only the cold-reference temperature may be determined.
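The categorization of the data center described in the two preceding paragraphs may be sketched, for illustration only, as the following Python function. The function name detection_mode and its parameters are hypothetical. The interval (tcd) is again expressed as a count of instances, and the return value None for the period in which neither mode can yet be declared is an assumption, since the description does not name such a state.

```python
def detection_mode(hot_devices, instances_without_hot, tcd_instances):
    """Return "hot", "cold", or None when neither mode can be declared yet.

    hot_devices            devices currently identified under the hot condition
    instances_without_hot  consecutive instances in which no device was hot
    tcd_instances          length of the interval tcd, in instances
    """
    if hot_devices:
        # A single device in the hot condition suffices for the hot detection mode.
        return "hot"
    if instances_without_hot >= tcd_instances:
        # No hot condition anywhere in the data center for the whole interval tcd.
        return "cold"
    return None  # assumption: undecided until tcd has elapsed


print(detection_mode({"rack-1": 31.0}, 0, 6))
print(detection_mode({}, 6, 6))
```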
[0050] Thus, the list of the heat generating devices identified under the hot condition
or the cold condition along with their hot-reference temperatures and the cold-reference temperatures respectively is passed over to the identification module 216 for further processing. According to embodiments of the present subject matter, a computing module 214 (figure 3) is configured to compute the hot-reference temperatures and the cold-reference temperatures in order to take the corrective actions or provide recommendations for controlling the unbalanced thermal state of the data center.
[0051] For taking the corrective actions, the identification module 216 and the
generation module 218 (figure 3) may be required. The detailed operation of both modules, i.e., 216 and 218, for generating recommendations may be understood by referring to the flow diagram 600 of figure 6. According to embodiments of the present subject matter, for controlling the thermal state of the data center, the operational parameters of the cooling units need to be controlled.
[0052] Referring now to the flow diagram 600 of figure 6, at block 602, the list of the
heat generating devices classified under the hot condition and the cold condition is received by the identification module 216 for analysis. From the list received, the identification module 216 may check, at block 604, whether the data center is in the hot detection mode or the cold detection mode. If the data center is in the hot detection mode, only hot-reference temperatures may be determined for the heat generating devices under the hot condition. Further, if the data center is in the cold detection mode, only cold-reference temperatures may be calculated for the heat generating devices under the cold condition.
[0053] The next step, which may be performed by the identification module 216, is to identify
a most-impacting or most-influential cooling unit amongst the cooling units impacting the heat generating devices. According to embodiments of the present subject matter, the most-impacting or most-influential cooling unit may be referred to as a "target cooling unit".
[0054] The cooling units impacting the heat generating devices (under the hot
condition) may be analyzed by the identification module 216. Each of the cooling units may have an influence on the heat generating devices (under the hot condition) in the data center. The influence may be expressed in terms of a "metric" which may be derived from influence index metrics (as disclosed in the Indian Patent application 652/MUM/2011). The metric indicates the impact of a cooling unit on the heat generating devices in the data center. Thus, at block 606, the identification module 216 may be configured to determine a collective influence of the cooling units based on the metric. According to embodiments of the present subject matter, the collective influence may be determined by a statistical technique. Further, the statistical technique may determine the collective influence based on the temperature predicted for each of the heat generating devices and the total number of the heat generating devices affected by each of the cooling units. Further, it may be noted by a person skilled in the art that other statistical and/or mathematical techniques may be used for determining the collective influence of the cooling units. Upon determining the collective influence, at block 608, the cooling units may be prioritized on the basis of their impact, i.e., collective influence. Hence, the cooling unit having the maximum collective influence on the racks identified under hot detection may be considered as a target cooling unit. Thus, a priority list comprising one or more target cooling units may be obtained on the basis of ascending order of their impact at block 608, where said priority list may be referred to as a "hot priority list".
[0055] Similarly, by following the same approach for determining the target cooling
unit (as discussed above), at block 616, the identification module 216 may be further configured to determine a priority list of one or more target cooling units impacting the heat generating devices in the cold condition. However, this priority list may be determined in an ascending order of their impact on the heat generating devices, and may be referred to as a "cold priority list". It may be noted by a person skilled in the art that, in practice, only one priority list may be generated, because the data center may be detected in either the hot detection mode or the cold detection mode at a given time. Thus, the priority list may be processed depending upon the mode (the hot detection mode or the cold detection mode) of the data center.
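The prioritization of the cooling units by collective influence may be illustrated by the hedged Python sketch below. The actual influence index metrics are disclosed in a separate application and are not reproduced here; the sketch simply assumes that a numeric influence value is available for each pair of cooling unit and heat generating device, and takes the collective influence to be the sum of those values over the flagged devices. The function name, the summation and the most_influential_first flag are assumptions; the direction of the ordering is left to the caller and is not asserted by this sketch.

```python
def prioritize_cooling_units(influence, flagged_devices, most_influential_first=True):
    """Return cooling unit ids ordered by their collective influence.

    influence        dict of unit id -> {device id: influence value} (assumed input)
    flagged_devices  devices identified under the hot (or cold) condition
    """
    scores = {}
    for unit, per_device in influence.items():
        # Assumption: collective influence is the plain sum over the flagged devices.
        scores[unit] = sum(per_device.get(device, 0.0) for device in flagged_devices)
    return sorted(scores, key=scores.get, reverse=most_influential_first)


influence = {
    "CRAC 1": {"rack-1": 0.7, "rack-2": 0.2},
    "CRAC 2": {"rack-1": 0.3, "rack-2": 0.8},
}
print(prioritize_cooling_units(influence, ["rack-1"]))
```

With "rack-1" flagged, CRAC 1 has the larger collective influence in this made-up data and would therefore be treated as the target cooling unit.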
[0056] Therefore, by finding the most influential cooling unit, it is ensured that
maximum impact is achieved through every recommendation in a gradual manner. After obtaining the priority list of the one or more target cooling units, the next step is to finalize the list before implementing it as a recommendation. Since a recommendation may be a combination of a target cooling unit and its set-point, i.e., the operational parameter of the target cooling unit, a further step may be to verify whether implementing such a recommendation may cause the state of the data center to transit into one of the hot detection mode and the cold detection mode. Thus, to finalize the priority list, the identification module 216 may be further configured to refer to a historical control signal log 228 (figure 3). The historical control signal log 228 may comprise historical recommendation data for the cooling units in the data center. The historical recommendation data may indicate outcomes associated with previous recommendations. The outcomes may be a transition of the data center into the hot detection mode or into the cold detection mode.
[0057] To prevent such an outcome, at block 610, the identification module 216 may
check whether the current hot detection mode of the data center is a direct consequence of a previous recommendation, i.e., a previously recommended operational parameter for a target cooling unit. It may be possible that, by implementing a cold recommendation (i.e., a recommendation provided against a cold detection mode of the data center, typically consisting of increasing the set-point of one or more target cooling units), the state of the data center transits from the cold detection mode to the hot detection mode again. More particularly, if the data center is classified to be in the hot detection mode within the predefined sub-time interval thd from a previously implemented cold recommendation (a recommended increase in the set-points of one or more target cooling units), then the hot detection mode may be assumed to be a direct consequence of that previously implemented cold recommendation.
[0058] However, if the data center is classified to be in the hot detection mode after the
predefined sub-time interval thd from the previously implemented cold recommendation, the generated hot recommendation may be finalized for implementation at block 612. Moreover, after every implementation of a hot recommendation, the combination of set-points responsible for the hot condition (i.e., the set-points before the recommendation) may be stored in a database as a hot flag. Thus, the system 102 may ensure that this hot-flagged combination of set-points is not recommended again for another fixed time interval Tp. Therefore, all such combinations of set-points that may be present in a generated priority list may be deleted from the finalized priority list. According to embodiments of the present subject matter, all combinations of set-points may be reset after said fixed time interval Tp.
[0059] Referring now to the cold recommendation cycle, the cold priority list (having one or
more target cooling units) for the heat generating devices under the cold condition may have to be finalized. For finalizing the cold priority list, the identification module 216 may be configured to check the recommendations (set-points) against the hot flags stored in the database. If any combination of set-points is found to match a hot flag stored in the database, then the identification module 216 may further be configured to delete that set-point combination from the cold priority list. After performing this operation, the cold priority list may be finalized and implemented. Once implemented, the recommendation (the set-point combination) is deleted from the cold priority list, so that the second recommendation in the cold priority list takes its place at the top.
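A minimal Python sketch of the hot-flag bookkeeping used to finalize a priority list is given below, under stated assumptions. The class name HotFlagStore and its methods are hypothetical, time is passed in explicitly as a plain number in arbitrary units, and a set-point combination is represented as a tuple with one set-point per cooling unit. The sketch stands in for the database and the historical control signal log 228 only in the loosest sense.

```python
class HotFlagStore:
    """Remembers set-point combinations that led to hot detection, for a period Tp."""

    def __init__(self, tp):
        self.tp = tp            # the fixed time interval Tp, in the caller's time units
        self._flagged_at = {}   # set-point combination -> time at which it was flagged

    def flag(self, set_points, now):
        """Store a combination of set-points as a hot flag."""
        self._flagged_at[tuple(set_points)] = now

    def is_flagged(self, set_points, now):
        """True while the combination is hot flagged; the flag lapses after Tp."""
        flagged_at = self._flagged_at.get(tuple(set_points))
        return flagged_at is not None and now - flagged_at < self.tp

    def finalize(self, candidates, now):
        """Delete hot-flagged combinations from a candidate priority list."""
        return [c for c in candidates if not self.is_flagged(c, now)]


store = HotFlagStore(tp=60)
store.flag((23, 24), now=0)
print(store.finalize([(23, 24), (24, 23), (23, 23)], now=10))
```

Once the interval Tp has elapsed, the same call returns the full candidate list again, which corresponds to the reset of the hot flags described above.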
[0060] The set-point referred to in the above discussion may be considered as an
operational parameter of the cooling units, more specifically, of the target cooling units. Upon finalizing the list of the one or more target cooling units, a generation module 218 is configured to iteratively generate a control signal to optimize operation of the data center in a stepwise manner. The control signal generated may comprise one or more gradual changes in an operational parameter of the target cooling unit. According to embodiments of the present subject matter, the operational parameter may be the set-point, in terms of a temperature, for the target cooling unit identified. According to embodiments of the present subject matter, the set-points may be increased if the data center is in the cold detection mode. This recommendation is referred to as a cold recommendation. Conversely, the set-points may be decreased if the data center is in the hot detection mode. This recommendation is referred to as a hot recommendation. According to embodiments of the present subject matter, after generating the recommendations, the system 102 passes control over to the obtaining module 210. Thus, the obtaining module 210 may be further configured for obtaining the first-set of temperatures (temperatures of the heat generating devices) for a next pre-defined time interval (t).
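The direction of a single gradual change may be illustrated by the short Python sketch below. The function name next_set_point and the default step of one degree are assumptions made for this example only; the description requires merely that the change be gradual.

```python
def next_set_point(current, mode, step=1.0):
    """Return the recommended set-point of a target cooling unit after one gradual change.

    mode  "cold" gives a cold recommendation (increase the set-point),
          "hot" gives a hot recommendation (decrease the set-point)
    """
    if mode == "cold":
        return current + step
    if mode == "hot":
        return current - step
    raise ValueError("mode must be 'hot' or 'cold'")


print(next_set_point(22.0, "cold"))
print(next_set_point(22.0, "hot"))
```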
[0061] According to one embodiment of the present subject matter, the detailed working of
the system 102 for achieving a stable thermal state is explained with reference to figure 7. From figure 7, it may be seen that for two CRACs, i.e., CRAC 1 and CRAC 2, the set-points are controlled for achieving the stable thermal state in a data center. The stability curve shown in figure 7 is an imaginary curve representing the combinations of set-points for the two CRACs (CRAC 1 and CRAC 2) which would maintain the data center in a thermally stable and energy efficient state. The imaginary curve remains constant provided that the heat dissipation pattern of the heat generating devices and the rest of the data center environment remain constant. Any combination of set-points below the stability curve represents the cold detection mode for the data center. Conversely, any combination of set-points above the curve represents the hot detection mode. Therefore, the curve represents an umbrella for the set-points of the two CRACs (CRAC 1 and CRAC 2), under which no hot detection of the data center occurs. A set-point combination located close to the curve would represent one of the near-optimum states for the data center, from the point of view of both thermal safety and energy efficiency. According to aspects of the subject matter, the curve may be replaced by a multidimensional entity if more CRACs are present. The present set-points for CRAC 1 and CRAC 2 are x and y respectively at state 1, where the state 1 is in a cold region, i.e., under the cold detection mode. Thus, a cold recommendation may be provided for the state 1. The cold recommendation may comprise modified set-points "x+1 and y" to reach another state 2. Here, the set-point of CRAC 1 is incremented by 1, i.e., a gradual increase, and the set-point of CRAC 2 remains the same (y). As the state 2 is still in the cold detection mode, a further recommendation may be provided by the system 102. The further recommendation may be provided by modifying the set-point combination to x+1 and y+1, which takes the data center from the state 2 to state 3. In this scenario, the set-point of CRAC 1 remains the same and the set-point of CRAC 2 is modified from y to y+1, i.e., a further gradual increase in the set-point of CRAC 2.
[0062] The state 3 is closer to the stability curve, but it still falls under
the cold detection mode, for which a cold recommendation is further provided by the system 102. The recommendation for the state 3 may be provided by modifying the current set-points of the state 3, i.e., x+1 and y+1, into a new combination of set-points, i.e., x+1 and y+2. This new combination of set-points takes the data center into a hot region, i.e., under the hot detection mode, at state 4. The transition produced by the new combination of set-points recommended for the state 3 may be a direct consequence of the previously generated recommendations for the cold detection mode. Since the transition of the data center into the state 4 (hot detection mode) is a direct consequence of a cold recommendation, another combination of set-points may be directly implemented as the recommendation. This other combination of set-points is x+2 and y+1, wherein, relative to the state 4, the set-point of CRAC 1 is increased by 1 and the set-point of CRAC 2 is decreased by 1. This combination transits the data center into a state 5, which still falls under the hot detection mode. The system 102 further recommends a return to the previous set-points, i.e., x+1 and y+1. Further, since the system has found the state 4 and the state 5 to be under hot detection, the set-point combinations of the state 4 and the state 5 may be hot flagged and hence will not be recommended again for a given interval of time Tp. Therefore, the state 3 will be a stable and optimum state of the data center for the given heat dissipation in the given data center environment. Thus, by gradually increasing or decreasing the set-points of the cooling units (CRAC 1 and CRAC 2), the system 102 is enabled to achieve the stable and optimum thermal state in the data center in a step-wise manner.
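The stepwise walk of figure 7 may be mimicked by the simplified Python sketch below, which is offered only as an illustration under stated assumptions. The function walk_to_stable_state is hypothetical. The stability curve is replaced by a stand-in predicate is_hot supplied by the caller, whereas in the system 102 the hot detection mode would be observed from the temperatures after each recommendation. The order in which the cooling units are tried is a simple rotation chosen for the sketch and does not reproduce the influence-based prioritization, so the order in which hot combinations are discovered may differ from the figure.

```python
def walk_to_stable_state(start, is_hot, step=1, max_steps=50):
    """Raise set-points one unit at a time until every further increase is hot flagged.

    start   initial set-point combination, assumed to lie in the cold region
    is_hot  stand-in for observing the hot detection mode at a combination
    Returns (final combination, path of accepted combinations, hot-flagged set).
    """
    current = tuple(start)
    hot_flagged = set()
    path = [current]
    n = len(current)
    for k in range(max_steps):
        moved = False
        # Assumption: rotate the first unit tried so increases are spread over units.
        for offset in range(n):
            unit = (k + offset) % n
            candidate = tuple(
                sp + step if i == unit else sp for i, sp in enumerate(current)
            )
            if candidate in hot_flagged:
                continue
            if is_hot(candidate):
                # Hot detection: flag the combination and keep the previous set-points.
                hot_flagged.add(candidate)
                continue
            current = candidate
            path.append(current)
            moved = True
            break
        if not moved:
            break  # every single-step increase is hot flagged: stable state reached
    return current, path, hot_flagged


x, y = 22, 22
# Stand-in stability curve: hot once the two set-points together rise by 3 or more.
final, path, flagged = walk_to_stable_state((x, y), lambda sp: sum(sp) - (x + y) >= 3)
print(final, path, sorted(flagged))
```

With this stand-in curve the walk passes through (x+1, y) and (x+1, y+1), finds both (x+1, y+2) and (x+2, y+1) to be hot, flags them, and settles at (x+1, y+1), which corresponds to the state 3 of the example.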
[0063] The data center being in the hot detection mode or in the cold detection mode may be
notified to a user by a notification message. Further, the system 102 also comprises a user-interface module 220 configured to display a layout of the data center, the first-set of temperatures, notification messages indicative of the data center being in one of the hot detection mode and the cold detection mode, and the gradual changes generated to be applied to the operational parameter, i.e., the set-point. The layout of the data center comprises the arrangement of the heat generating devices and the cooling units in the data center. According to embodiments of the present subject matter, the recommendations generated in terms of the set-points of the cooling units may be automatically implemented by the system 102. Specifically, the system 102 may automatically control the set-points of the target cooling units, based on the recommendations provided, via an interface between the cooling units and a control center of the data center.
[0064] Referring now to Figure 8, the method for monitoring and control to optimize
operation of a data center in real time is shown, in accordance with an embodiment of the present subject matter. According to the embodiments, the operation of the data center may be optimized by controlling operational parameters of cooling units impacting heat generating devices in the data center. The method 800 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 800 may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
[0065] The order in which the method 800 is described is not intended to be construed
as a limitation, and any number of the described method blocks can be combined in any order to implement the method 800 or alternate methods. Additionally, individual blocks may be deleted from the method 800 without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combinations thereof. However, for ease of explanation, in the
embodiments described below, the method 800 may be considered to be implemented in the above described system 102.
[0066] At block 802, a first-set of temperatures for the heat generating devices in the
data center may be obtained during a pre-defined time interval (t). Each temperature of the first-set of temperatures may be estimated by a thermal predictor 302. According to embodiments, the thermal predictor 302 may be a module or set of instructions stored in the memory 206 of the system 102. Further, the thermal predictor 302 may estimate each temperature of the first-set of temperatures by using influence index metrics 226, the supply temperature of the cooling units, the power dissipation of the heat generating devices, or a combination thereof. Further, a temperature of the first-set of temperatures associated with each heat generating device may be obtained at each instance of a pre-determined number of instances of the predefined time interval (t).
[0067] At block 804, the first-set of temperatures obtained may be analyzed for
identification of the heat generating devices in one of a hot condition and a cold condition. The first-set of temperatures may also be analyzed for categorizing state of the data center into one of a hot detection mode and a cold detection mode.
[0068] From the first-set of temperatures obtained for the heat generating devices, a
second-set of temperatures and a third-set of temperatures may be selected. The second-set of temperatures refers to a set of temperatures crossing a threshold temperature (TThreshold) at one or more instances in a sub-time interval (thd) of the predefined time interval (t). Thus, from the second-set of temperatures, i.e., the subset of the first-set of temperatures, the heat generating devices under the hot condition may be identified. According to embodiments of the present subject matter, for each heat generating device identified under the hot condition, a hot-reference temperature may be determined. Similarly, the third-set of temperatures refers to a set of temperatures for the heat generating devices not falling under the hot condition for a predefined time interval (tcd). Thus, from the third-set of temperatures, i.e., the subset of the first-set of temperatures, the heat generating devices under the cold condition may be identified. According to embodiments of the present subject matter, for each heat generating device identified under the cold condition, a cold-reference temperature may be determined.
[0069] At block 806, the hot-reference temperature and the cold-reference temperature
for the heat generating devices detected under the hot detection mode and the cold detection mode may be computed for taking corrective actions or providing recommendations for controlling the thermal state of the data center.
[0070] At block 808, one or more target cooling units may be identified amongst the
cooling units impacting the heat generating devices in the data center. The one or more target cooling units may be identified based on the collective influence of the cooling units on the heat generating devices classified under one of the hot condition and the cold condition. Further, the one or more target cooling units may also be identified based on a historical control signal log 228. The target cooling unit identified is the most-impacting or most-influential cooling unit amongst the cooling units in the data center. Upon identification, the one or more target cooling units may be prioritized based on their influence or impact on the heat generating devices.
[0071] At block 810, after the identification of the one or more target cooling units, a
control signal comprising gradual changes in an operational parameter of the one or more target cooling units may be iteratively generated to optimize operation of the data center. According to embodiments of the present subject matter, the operational parameter may be a set-point in terms of a temperature for the one or more target cooling units identified. Thus, to optimize the operation of the data center, the gradual changes may be applied to the set-point of the one or more target cooling units depending upon the condition of the data center, i.e., the hot detection mode or the cold detection mode.
[0072] In a scenario in which the data center is detected in the hot detection mode, the
set-point is decremented gradually by a predefined value in order to achieve a stable thermal state in the data center. Similarly, when the data center is detected in the cold detection mode, the set-point is incremented gradually by a predefined value for achieving the stable thermal state in the data center. Thus, in both scenarios, the system 102 may be enabled to achieve the stable thermal state gradually in a stepwise manner.
ADVANTAGES OF THE SYSTEM
[0073] The system 102 provides an energy efficient method for optimizing operation
of the data center in real time, thus saving energy which is consumed by cooling units for controlling the thermal state of the data center.
[0074] The system 102 is enabled for gradually achieving and maintaining a stable
and close to optimum thermal and energy state in the data center thereby eliminating the possible overcooling costs.
[0075] The system 102 is enabled for providing corrective recommendations in terms
of set-points of the cooling units, which facilitates the control of a data center cooled by return air-controlled CRACs, where the return air-controlled CRACs may not provide the user with explicit control over the supply temperature.
[0076] The system 102, by using influence index metrics for temperature prediction,
eliminates the need for an extensive temperature sensor network across the data center. Thus, the sensors are only used for sensing the supply temperature of the CRACs, and no other sensors are required in the data center.
[0077] Although implementations for methods and systems for providing real-time
monitoring and control to optimize operation of a data center have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples to optimize operation of the data center by controlling operational parameter of the one or more cooling units impacting the one or more heat generating devices in the data center.
CLAIMS:
1. A method for real-time monitoring and control to optimize operation of a data center by controlling operational parameter of one or more cooling units impacting one or more heat generating devices in the data center, the method comprising:
obtaining continuously a first-set of temperatures for the one or more heat generating devices during a pre-defined time interval (t), wherein each temperature of the first-set of temperatures is estimated by a thermal predictor, and wherein a temperature of each heat generating device is obtained at each instance of a predetermined number of instances of the pre-defined time interval (t);
analyzing the first-set of temperatures for:
identifying the one or more heat generating devices under one of a hot
condition and a cold condition; and
categorizing state of the data center into one of a hot detection mode
and a cold detection mode;
computing one of a hot-reference temperature and a cold-reference temperature for the one or more heat generating devices classified under one of the hot condition and the cold condition;
identifying a target cooling unit amongst the one or more cooling units based on a collective influence of the one or more cooling units on the one or more heat generating devices classified under one of the hot condition and the cold condition, and historical control signal log; and
iteratively generating a control signal comprising one or more gradual changes in an operational parameter of the target cooling unit to optimize operation of the data center in a stepwise manner, wherein the obtaining, the analyzing, the computing, the identifying, and the iteratively generating are performed by a processor.
2. The method of claim 1, wherein the thermal predictor estimates the each temperature of the first-set of temperatures using at least one of:
influence index metrics indicative of an influence of the one or more cooling units and the one or more heat generating devices on the one or more heat generating devices, wherein the influence index metrics are computed from one of mathematical and experimental methods; and
at least one of a set of supply temperature of the one or more cooling units and a power dissipation of the one or more heat generating devices, and wherein the set of supply temperature is obtained through one or more sensors deployed in the data
center.
3. The method of claim 1, wherein the hot condition of the one or more heat generating devices is identified from respective second-set of temperatures, and wherein the second-set of temperatures are sub-set of the first-set of temperatures crossing a threshold temperature (TThreshold) at one or more instances in a sub-time interval (thd) of the predefined time interval (t).
4. The method of claim 1, wherein the cold-reference temperature of the one or more heat generating devices is obtained from a third-set of temperatures, and wherein the third-set of temperatures are sub-set of the first-set of temperatures for the one or more heat generating devices not falling under the hot condition for a predefined time
interval (tCd).
5. The method of claim 1, wherein the operational parameter comprises a set-point for the target cooling unit, and wherein the operational parameter is controlled by performing one of incrementing and decrementing the set-point by a predefined value based on the categorization of the data center into one of the cold detection mode and the hot detection mode respectively.
6. The method of claim 1, wherein the cooling unit is selected from a group comprising a supply air-controlled CRAC, a return air-controlled CRAC, or a combination thereof.
7. A system 102 for real-time monitoring and control to optimize operation of a data center by controlling operational parameter of one or more cooling units impacting one or more heat generating devices in the data center, the system 102 comprising: a processor 202;
a memory 206 coupled to the processor 202, the memory 206 comprising a plurality of modules 208 capable of being executed by the processor 202, wherein the plurality of modules 208 comprises:
obtaining module 210 configured to obtain continuously a first-set of temperatures for the one or more heat generating devices during a pre-defined time interval (t), wherein each temperature of the first-set of temperatures is estimated by a thermal predictor 302, and wherein a temperature of each heat generating device is obtained at each instance of a pre-determined number of instances of the pre-defined time interval (t);
temperature analyzing module 212 configured to analyze the first-set of temperatures for:
identifying the one or more heat generating devices under one of a hot condition and a cold condition; and
categorizing state of the data center into one of a hot detection mode and a cold detection mode;
computing module 214 configured to compute one of a hot-reference temperature and a cold-reference temperature for the one or more heat generating devices classified under one of the hot condition and the cold condition;
identification module 216 configured to identify a target cooling unit amongst the one or more cooling units based on a collective influence of the one or more cooling units on the one or more heat generating devices classified under one of the hot condition and the cold condition, and historical control signal log; and
generation module 218 configured to iteratively generate a control signal comprising one or more gradual changes in an operational parameter of
the target cooling unit to optimize operation of the data center in a stepwise manner.
8. The system of claim 7 further comprises a user interface module 220 configured for displaying layout of the data center, the first-set of temperatures, notification messages indicative of the data center being in one of the hot detection mode and the cold detection mode, and the gradual changes in the operational parameter, and wherein the layout of the data center comprises arrangements of the one or more heat generating devices and the one or more cooling units in the data center.
9. The system of claim 7, wherein the operational parameter comprises a set-point for the target cooling unit, and wherein the operational parameter is controlled by performing one of incrementing and decrementing the set-point by a predefined value based on the categorization of the data center into one of the cold detection mode and the hot detection mode respectively.
10. A computer program product having embodied thereon a computer program for real-time monitoring and control to optimize operation of a data center by controlling operational parameter of one or more cooling units impacting one or more heat generating devices in the data center, the computer program product comprising a set of instructions, the instructions comprising instructions for:
obtaining continuously a first-set of temperatures for the one or more heat generating devices during a pre-defined time interval (t), wherein each temperature of the first-set of temperatures is estimated by a thermal predictor, and wherein a temperature of each heat generating device is obtained at each instance of a predetermined number of instances of the pre-defined time interval (t);
analyzing the first-set of temperatures for:
identifying the one or more heat generating devices under one of a hot
condition and a cold condition; and
categorizing state of the data center into one of a hot detection mode
and a cold detection mode;
computing one of a hot-reference temperature and a cold-reference temperature for the one or more heat generating devices classified under one of the hot condition and the cold condition;
identifying a target cooling unit amongst the one or more cooling units based on a collective influence of the one or more cooling units on the one or more heat generating devices classified under one of the hot condition and the cold condition, and historical control signal log; and
iteratively generating a control signal comprising one or more gradual changes in an operational parameter of the target cooling unit to optimize operation of the data center in a stepwise manner.