Abstract:
Industrial data mining is performed on data collected from industrial processes/equipment to monitor the performance of the processes/equipment and, in turn, to make the changes necessary to obtain an intended result. However, existing data mining systems fail to consider the relation between variables and certain Key Performance Indicators (KPIs), and the strength of that relation. Disclosed herein is a method and system for data mining in industrial processes or equipment in which the relation between the variables and the KPIs is determined, along with the strength of that relation. Based on the determined relation, an order of importance of the variables with respect to each KPI is determined. This information can be used to alter appropriate parameters to yield intended results.
Nirmal Building, 9th Floor,
Nariman Point, Mumbai - 400021, Maharashtra, India
Inventors
1. SELVANATHAN, Balaji
Tata Consultancy Services Limited, Tata Research Development & Design Centre, 54-B, Hadapsar Industrial Estate, Hadapsar, Pune - 411013, Maharashtra, India
2. NISTALA, Sri Harsha
Tata Consultancy Services Limited, Tata Research Development & Design Centre, 54-B, Hadapsar Industrial Estate, Hadapsar, Pune - 411013, Maharashtra, India
3. RUNKANA, Venkataramana
Tata Consultancy Services Limited, Tata Research Development & Design Centre, 54-B, Hadapsar Industrial Estate, Hadapsar, Pune - 411013, Maharashtra, India
Specification
Claims:
1. A processor implemented method for mining of data from industrial systems, comprising:
collecting data corresponding to a plurality of variables and a plurality of Key Performance Indicators (KPI) corresponding to at least one process or equipment being monitored, as input, via one or more hardware processors;
determining relation of each of the plurality of variables with each of the plurality of KPIs, via the one or more hardware processors;
generating a subset of variables from the plurality of variables, based on the determined relation, via the one or more hardware processors;
determining strength of the relation between each variable in the subset of variables and each of the plurality of KPIs, via the one or more hardware processors; and
determining an order of importance of the variables in the subset of variables based on the determined relation, via the one or more hardware processors.
2. The method as claimed in claim 1, wherein the collected data corresponding to the plurality of variables and the plurality of Key Performance Indicators (KPI) is pre-processed before determining the relation, further wherein the pre-processing comprises detection and removal of outliers, and imputation of missing data.
3. The method as claimed in claim 1, wherein a binning analysis is performed on the collected data, said binning analysis comprising:
sorting data points of each of the plurality of variables in an ascending order;
arranging data points of each of the plurality of KPIs to match the arrangement of the data points of each of the plurality of variables;
discretizing the sorted data points of each of the plurality of variables and each of the plurality of KPIs into a plurality of bins; and
computing at least one measure of central tendency and one measure of dispersion of the data points of each of the plurality of variables and KPIs in each of the plurality of bins, wherein the measure of central tendency and the measure of dispersion of data points form a bin information.
4. The method as claimed in claim 3, wherein the measure of central tendency comprises one of mean, median and mode, and the measure of dispersion comprises one of variance, standard deviation, interquartile range, mean absolute difference, median absolute deviation, average absolute deviation and distance standard deviation.
5. The method as claimed in claim 3, wherein determining the relation of each of the plurality of variables with each of the plurality of KPIs, comprises:
computing difference between the measures of central tendency of consecutive bins corresponding to the plurality of variables and the plurality of KPIs;
identifying at least one of a consecutive increasing, consecutive decreasing or constant sequence, by applying a moving window over the computed differences of the measure of central tendency corresponding to the plurality of KPIs;
determining a longest sequence of each of the at least one consecutive increasing sequence, the at least one consecutive decreasing sequence, and the at least one constant sequence, and the indices corresponding to the start of the sequences;
applying at least one condition on the determined longest sequences to determine the relation, wherein the at least one condition defines at least one relation; and
shortlisting all variables having a relation with each of the plurality of KPIs, based on the at least one condition, to generate the subset of variables.
6. The method as claimed in claim 3, wherein determining the strength of relation comprises:
processing the measure of central tendency and the measure of dispersion corresponding to the plurality of variables in the subset of variables and the plurality of KPIs, further comprising:
computing area under a maximum dispersion curve and a minimum dispersion curve;
calculating difference of area under the maximum dispersion curve and a minimum dispersion curve;
computing a total possible area spanned by the maximum dispersion curve and a minimum dispersion curve;
computing ratio of difference between the calculated difference of area under the maximum dispersion curve and a minimum dispersion curve and the computed total possible area; and
comparing the computed ratio with a threshold of ratio.
7. The method as claimed in claim 1, wherein determining the order of importance comprises:
extracting data points (XL) corresponding to a lower range of KPIs and data points (XH) corresponding to an upper range of KPIs, of each of the variables in the subset of variables;
obtaining density distribution of XL and XH;
computing percentage overlap of XL and XH; and
arranging the variables in the subset of variables in ascending or descending order of percentage of overlap.
8. A system for industrial data mining, comprising:
one or more communication interfaces 103;
a memory module 101 storing a plurality of instructions; and
one or more hardware processors 102 coupled to the memory module 101 via the one or more communication interfaces 103, wherein the one or more hardware processors 102 are configured by the instructions to:
collect data corresponding to a plurality of variables and a plurality of Key Performance Indicators (KPI) corresponding to at least one process or equipment being monitored, as input, via one or more hardware processors;
determine relation of each of the plurality of variables with each of the plurality of KPIs, via the one or more hardware processors;
generate a subset of variables from the plurality of variables, based on the determined relation, via the one or more hardware processors;
determine strength of the relation between each variable in the subset of variables and each of the plurality of KPIs, via the one or more hardware processors; and
determine an order of importance of the variables in the subset of variables based on the determined relation, via the one or more hardware processors.
9. The system as claimed in claim 8, wherein the system pre-processes the collected data corresponding to the plurality of variables and the plurality of Key Performance Indicators (KPI) before determining the relation, further wherein the pre-processing comprises detection and removal of outliers, and imputation of missing data.
10. The system as claimed in claim 8, wherein the system performs a binning analysis on the collected data, comprising:
sorting data points of each of the plurality of variables in an ascending order;
arranging data points of each of the plurality of KPIs to match the arrangement of the data points of each of the plurality of variables;
discretizing the sorted data points of each of the plurality of variables and each of the plurality of KPIs into a plurality of bins; and
computing at least one measure of central tendency and one measure of dispersion of the data points of each of the plurality of variables and KPIs in each of the plurality of bins, wherein the measure of central tendency and the measure of dispersion of data points form a bin information.
11. The system as claimed in claim 10, wherein the measure of central tendency comprises one of mean, median and mode, and the measure of dispersion comprises one of variance, standard deviation, interquartile range, mean absolute difference, median absolute deviation, average absolute deviation and distance standard deviation.
12. The system as claimed in claim 10, wherein the system determines the relation of each of the plurality of variables with each of the plurality of KPIs, by:
computing difference between the measures of central tendency of consecutive bins corresponding to the plurality of variables and the plurality of KPIs;
identifying at least one of a consecutive increasing, consecutive decreasing, or constant sequence, by applying a moving window over the computed differences of the measure of central tendency corresponding to the plurality of KPIs;
determining a longest sequence of each of the at least one consecutive increasing sequence, the at least one consecutive decreasing sequence, and the at least one constant sequence, and the indices corresponding to the start of the sequences;
applying at least one condition on the determined longest sequences to determine the relation, wherein the at least one condition defines at least one relation; and
shortlisting all variables having a relation with each of the plurality of KPIs, based on the at least one condition, to generate the subset of variables.
13. The system as claimed in claim 10, wherein the system determines the strength of relation by:
processing the measure of central tendency and the measure of dispersion corresponding to the plurality of variables in the subset of variables and the plurality of KPIs, further comprising:
computing area under a maximum dispersion curve and a minimum dispersion curve;
calculating difference of area under the maximum dispersion curve and a minimum dispersion curve;
computing a total possible area spanned by the maximum dispersion curve and a minimum dispersion curve;
computing ratio of difference between the calculated difference of area under the maximum dispersion curve and a minimum dispersion curve and the computed total possible area; and
comparing the computed ratio with a threshold of ratio.
14. The system as claimed in claim 8, wherein the system determines the order of importance by:
extracting data points (XL) corresponding to a lower range of KPIs and data points (XH) corresponding to an upper range of KPIs, of each of the variables in the subset of variables;
obtaining density distribution of XL and XH;
computing percentage overlap of XL and XH; and
arranging the variables in the subset of variables in ascending or descending order of percentage of overlap.
Description:
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
METHOD AND SYSTEM FOR INDUSTRIAL DATA MINING
Applicant:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The disclosure herein generally relates to mining of data from industrial processes and equipment and, more particularly, to identifying and shortlisting variables that are related to or affect key performance indicators (KPIs) being considered in the industrial system.
BACKGROUND
As part of industrial monitoring, various processes and equipment present in an industry are monitored, values of different parameters/variables are collected, and the collected values are processed to assess performance characteristics of the processes and/or the equipment being monitored. For each process/equipment, there are certain key performance indicators (KPIs) that give an indication of performance of the process/equipment. In order to achieve a desired result (in terms of performance of the process/equipment), certain parameters related to the process/equipment may have to be varied/re-calibrated.
Multiple parameters/variables may be associated with each of the industrial processes/equipment. However, each of the parameters/variables has a different impact on the corresponding process/equipment. This means that, when a deviation from desired performance is observed, adjusting certain parameters may prove to be effective whereas adjusting certain other parameters may not yield the desired result.
Identification of the variables that are effective in improving the performance of the process/equipment is not straightforward due to the complex interactions among the variables and the large number of variables (typically, hundreds) associated with industrial processes/equipment. While there are some approaches that identify important variables, this information alone may not help a user identify the parameters to be changed/adjusted in order to get a desired result (for example, in terms of plant performance).
SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional approaches. For example, in one embodiment, a processor implemented method for mining of industrial data is provided. In this method, a plurality of variables and a plurality of key performance indicators (KPIs) corresponding to at least one process/equipment being monitored are collected as input, via one or more hardware processors. Further, relation of each of the plurality of variables with each of the plurality of KPIs is determined, via the one or more hardware processors. Further, a subset of variables from the plurality of variables is generated based on the determined relation, via the one or more hardware processors. Further, strength of the relation between each variable in the subset of variables and each of the plurality of KPIs is determined, via the one or more hardware processors. Based on the determined strength of relation, an order of importance of the shortlisted variables is determined, via the one or more hardware processors.
In another aspect, a system for mining of industrial data is provided. The system includes one or more communication interfaces, a memory module storing a plurality of instructions, and one or more hardware processors coupled to the memory module via the one or more communication interfaces. The one or more hardware processors are configured by the instructions to collect a plurality of variables and a plurality of key performance indicators (KPIs) corresponding to at least one process/equipment being monitored, as input, via one or more hardware processors. The system then determines relation of each of the plurality of variables with each of the plurality of KPIs, via the one or more hardware processors. The system then generates a subset of variables from the plurality of variables, based on the determined relation, via the one or more hardware processors. Further, a strength of the relation between each variable in the subset of variables and each of the plurality of KPIs is determined via the one or more hardware processors. Based on the strength of relation, an order of importance of the shortlisted variables is determined, via the one or more hardware processors.
In yet another aspect, a non-transitory computer readable medium for mining of industrial data is provided. The non-transitory computer readable medium collects a plurality of variables and a plurality of key performance indicators (KPIs) corresponding to at least one process/equipment being monitored as input, via one or more hardware processors. Further, relation of each of the plurality of variables with each of the plurality of KPIs is determined, via the one or more hardware processors. Further, a subset of variables from the plurality of variables is generated based on the determined relation, via the one or more hardware processors. Further, strength of the relation between each variable in the subset of variables and each of the plurality of KPIs is determined, via the one or more hardware processors. Based on the determined strength of relation, an order of importance of the shortlisted variables is determined, via the one or more hardware processors.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG. 1 illustrates an exemplary system for industrial data mining, according to some embodiments of the present disclosure.
FIGS. 2A and 2B (collectively referred to as FIG. 2) are flow diagrams depicting steps involved in the process of industrial data mining, according to some embodiments of the present disclosure.
FIG. 3 is a flow diagram depicting steps involved in the process of performing binning analysis using the system of FIG. 1, in accordance with some embodiments of the present disclosure.
FIG. 4 is a flow diagram depicting steps involved in the process of shortlisting variables having relation with selected KPIs, using the system of FIG. 1, in accordance with some embodiments of the present disclosure.
FIG. 5 is a flow diagram depicting steps involved in the process of determining strength of relation between variables and corresponding KPIs, using the system of FIG. 1, in accordance with some embodiments of the present disclosure.
FIG. 6 is a flow diagram depicting steps involved in the process of determining an order of importance of variables, using the system of FIG. 1, in accordance with some embodiments of the present disclosure.
FIG. 7 depicts different types of relations that may exist between variables and KPIs, in accordance with some embodiments of the present disclosure.
FIG. 8 depicts a graphical representation of estimation of strength of relation, in accordance with some embodiments of the present disclosure.
FIG. 9 illustrates a percent of overlap considered in determining an order of importance of the variables, using the system of FIG. 1, in accordance with some embodiments of the present disclosure.
FIGS. 10A, 10B, and 10C (collectively referred to as FIG. 10) illustrate example binning plots, in accordance with some embodiments of the present disclosure.
FIGS. 11A and 11B (collectively referred to as FIG. 11) illustrate examples of determined relations, in accordance with some embodiments of the present disclosure.
FIG. 12 illustrates an example of determined strength of relation based on percentage overlap, in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope and spirit being indicated by the following claims.
Referring now to the drawings, and more particularly to FIG. 1 through FIG. 12, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG. 1 illustrates an exemplary system for industrial data mining, according to some embodiments of the present disclosure. The system 100 includes at least one memory 101, one or more hardware processors 102, and at least one communication interface 103.
The one or more hardware processors 102 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, graphics controllers, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the hardware processor(s) 102 are configured to fetch and execute computer-readable instructions stored in the memory 101, which causes the hardware processor(s) 102 to perform actions depicted in FIG. 2 for determining the relation between variables and the KPIs and for determining the strength of relation. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
The communication interface(s) 103 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the communication interface(s) 103 can include one or more ports for connecting a number of devices to one another or to another server, and/or to connect the system 100 to one or more sensors in order to collect data pertaining to one or more variables/parameters pertaining to equipment/processes being monitored.
The memory 101 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, one or more modules (not shown) of the system 100 can be stored in the memory 101. The memory 101 stores a plurality of instructions which when executed, cause the one or more hardware processors 102 to perform the actions depicted in FIGS. 2 to 6, to perform the industrial data mining being handled by the system 100.
The system 100 can be used to perform the data mining in one or more unit operations or processes from manufacturing or process industries such as iron and steel making, power generation, pharma manufacturing, refineries, cement making, oil and gas production, fine chemical production, paper making, automotive production and so on, and the equipment being monitored for data mining could be any equipment used in the unit operations or processes in manufacturing and process industries, such as but not limited to valves, compressors, blowers, pumps, steam turbines, gas turbines, heat exchangers, chemical reactors, bio-reactors, condensers, and boilers.
The system 100 collects data pertaining to a plurality of variables/parameters and a plurality of key performance indicators (KPIs) associated with one or more processes/equipment being monitored, as input. The system 100 may use any suitable sensors (hardware sensors and/or soft sensors) to collect the values pertaining to the variables. Soft sensors may include data-driven as well as physics-based soft sensors. The KPIs as well as the variables may be process/equipment specific. Information pertaining to the KPIs may be statically or dynamically fed to the system 100 using an appropriate interface provided by the communication interface(s) 103. The system 100 then pre-processes the collected data. The pre-processing of the data may involve processes such as but not limited to detection and removal of outliers, and imputation of missing data, so as to fine-tune the data for further processing.
The system 100 initially performs binning analysis of the collected plurality of variables and the KPIs so as to generate binning information corresponding to the KPIs and variables. During the binning analysis, the system 100 sorts data points (instances) of each of the variables (Xi) in an ascending order, and simultaneously the corresponding data points of the KPIs are also arranged in the same order as that of Xi. The sorted data of Xi and rearranged data of Y (KPI of interest) are discretized into N discrete bins. Some examples of methods that may be used for discretization are equal width discretization (N equally spaced bins), equal frequency discretization (equal number of data points in each of the N bins), k-means discretization, and custom range discretization. For each bin, the system 100 computes a measure of central tendency and a measure of dispersion. The measure of central tendency could be one of mean, median and mode, and the measure of dispersion could be one of variance, standard deviation, interquartile range, mean absolute difference, median absolute deviation, average absolute deviation and distance standard deviation. The measures of central tendency and dispersion for each of the N bins constitute the ‘bin info’.
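The binning analysis described above can be sketched in Python as follows. This is a minimal illustration rather than the disclosed implementation: it assumes equal frequency discretization, the mean as the measure of central tendency and the standard deviation as the measure of dispersion, and the function name `bin_info`, its parameters and the dictionary keys are hypothetical.

```python
import numpy as np

def bin_info(x, y, n_bins=10):
    """Sketch of binning analysis for one variable x and one KPI y.

    Assumptions (not fixed by the text): equal frequency bins,
    mean as central tendency, standard deviation as dispersion.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Sort the variable in ascending order and rearrange the KPI
    # so that each KPI value stays paired with its variable value.
    order = np.argsort(x, kind="stable")
    x_sorted, y_sorted = x[order], y[order]

    # Equal frequency discretization into n_bins bins.
    x_bins = np.array_split(x_sorted, n_bins)
    y_bins = np.array_split(y_sorted, n_bins)

    # Central tendency and dispersion per bin form the "bin info".
    return {
        "x_center": np.array([b.mean() for b in x_bins]),
        "y_center": np.array([b.mean() for b in y_bins]),
        "x_disp": np.array([b.std() for b in x_bins]),
        "y_disp": np.array([b.std() for b in y_bins]),
    }
```

A call such as `bin_info(x, y, n_bins=5)` returns four arrays of length 5; the `y_center` and `y_disp` arrays are the inputs that the later relation and strength steps would consume.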
The generated bin info is used by the system 100 to identify relations between each of the variables and the KPIs. The system 100 computes the difference of consecutive elements in the measure of central tendency from bin info to obtain the ‘consecutive difference’ array. A window of length W (W < N) is moved over the ‘consecutive difference’ array to obtain the ‘moving sum of differences’ array, from which the longest consecutive increasing sequence (CIS), the longest consecutive decreasing sequence (CDS) and the longest constant sequence (COS) are determined, along with the indices at which these sequences start. The following conditions are then applied to determine the relation:
‘Positive’ : L_inc > percent*(N-(W-1)) --- (1)
‘Negative’ : L_dec > percent*(N-(W-1)) --- (2)
‘Constant’ : L_const > percent*(N-(W-1)) --- (3)
‘V-shape’ : (L_inc > percent_V*(N-(W-1))) & (L_dec > percent_V*(N-(W-1))) & (I_inc > I_dec) --- (4)
‘Inverted-V’ : (L_inc > percent_V*(N-(W-1))) & (L_dec > percent_V*(N-(W-1))) & (I_inc < I_dec) --- (5)
where
percent – a user-input value in a suitable range (for example, from 0.5 to 1.0)
percent_V – a user-input value in a suitable range (for example, from 0.2 to 0.5)
N-(W-1) – length of the array of ‘moving sum of differences’
Here, L_inc and I_inc are the length and starting index of the longest CIS sequence, L_dec and I_dec are the length and starting index of the longest CDS sequence, and L_const is the length of the longest COS sequence.
Variables that satisfy one of the aforementioned conditions are shortlisted by the system 100 to form a subset of ‘p’ variables of the plurality of variables.
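A rough Python sketch of how conditions (1) to (5) might be evaluated is given below. Several details are assumptions made only for illustration: a positive entry in the moving sum is treated as part of an increasing sequence, a negative entry as part of a decreasing sequence, and an entry within a small tolerance `tol` of zero as constant; the sketch also uses the actual length of the moving-sum array it computes in place of the N-(W-1) term. The function names, default values and the `None` return for "no relation" are hypothetical.

```python
import numpy as np

def longest_run(mask):
    """Return (length, start index) of the longest run of True values."""
    best_len, best_start = 0, -1
    cur_len, cur_start = 0, 0
    for i, flag in enumerate(mask):
        if flag:
            if cur_len == 0:
                cur_start = i
            cur_len += 1
            if cur_len > best_len:
                best_len, best_start = cur_len, cur_start
        else:
            cur_len = 0
    return best_len, best_start

def classify_relation(center, W=3, percent=0.7, percent_v=0.3, tol=1e-9):
    """Classify the KPI trend across bins from its central tendency array."""
    diffs = np.diff(np.asarray(center, dtype=float))  # consecutive difference
    moving = np.convolve(diffs, np.ones(W), mode="valid")  # moving sum
    total = len(moving)  # stands in for the N-(W-1) term (assumption)

    # Assumed sign-based definitions of the three sequence types.
    l_inc, i_inc = longest_run(moving > tol)
    l_dec, i_dec = longest_run(moving < -tol)
    l_const, _ = longest_run(np.abs(moving) <= tol)

    if l_inc > percent * total:
        return "Positive"
    if l_dec > percent * total:
        return "Negative"
    if l_const > percent * total:
        return "Constant"
    if l_inc > percent_v * total and l_dec > percent_v * total:
        # Increase starting after the decrease gives a V; the reverse an inverted V.
        return "V-shape" if i_inc > i_dec else "Inverted-V"
    return None  # no condition satisfied: variable is not shortlisted
```

Under these assumptions, a variable would be shortlisted for a KPI whenever `classify_relation` returns a label other than `None` for the KPI's per-bin central tendency array.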
Further, a ‘strength’ of the relation between each of the ‘p’ shortlisted variables and the KPI of interest is determined using the measure of dispersion from bin info. For example (as in FIG. 8), if the relation between a variable and the KPI of interest is ‘positive’ and the dispersion is large in several of the N bins, then the strength of the relation is termed ‘weak’, as there is a large overlap of values between consecutive bins. On the other hand, if the dispersion is smaller, then the relation is termed ‘strong’, as there is less overlap of values between consecutive bins. If C is the measure of central tendency array and D is the measure of dispersion array, then C + (D/2) represents the ‘maximum dispersion curve’ (C_maxD) and C - (D/2) represents the ‘minimum dispersion curve’ (C_minD). The system 100 uses the following conditions to determine the strength of the relation between each of the shortlisted variables and the KPI of interest:
‘Weak’ : (A_maxD - A_minD)/T_A > Threshold --- (6)
‘Strong’ : (A_maxD - A_minD)/T_A < Threshold
Documents
Application Documents
# | Name | Date
1 | 201921036369-STATEMENT OF UNDERTAKING (FORM 3) [10-09-2019(online)].pdf | 2019-09-10
2 | 201921036369-REQUEST FOR EXAMINATION (FORM-18) [10-09-2019(online)].pdf | 2019-09-10
3 | 201921036369-FORM 18 [10-09-2019(online)].pdf | 2019-09-10
4 | 201921036369-FORM 1 [10-09-2019(online)].pdf | 2019-09-10
5 | 201921036369-FIGURE OF ABSTRACT [10-09-2019(online)].jpg |