
Data Collection Method For Process Latitude Monitoring System Of Industrial Plant, And Recording Medium Thereof

Abstract: Disclosed is a data collection method for a process margin monitoring system of industrial equipment that is capable of collecting learning data from a database of a computer in a power plant and converting the data into a form in which the data can be easily learned in realizing a monitoring system for analyzing process margin of industrial equipment based on a statistical learning method. The data collection method includes preparing a learning data set based on data determined to be normal in an operation history of the industrial equipment so that the learning data set is sorted for each operation mode, in a case in which the industrial equipment includes a plurality of equipment units performing the same functions, receiving data for each of the equipment units and processing the received data as data for the equipment units, sorting and grouping associated ones of the data in the learning data set, and sampling the collected data to reduce the number of data. [Fig. 1]

Patent Information

Application #:
Filing Date: 26 September 2012
Publication Number: 03/2016
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Parent Application:

Applicants

BNF TECHNOLOGY INC., 556 Yongsan dong Yuseong gu Daejeon 305 500

Inventors

1. KIM Su Young, 106 903 Taepyeong Apt. Taepyeong dong Jung gu Daejeon 301 771

Specification

DATA COLLECTING METHOD FOR DETECTION AND ON-TIME WARNING SYSTEM OF INDUSTRIAL PROCESS
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates to a data collection method for a process margin monitoring system of industrial equipment and a storage medium for storing the same, and more particularly to a data collection method for a process margin monitoring system of industrial equipment that is capable of collecting learning data from a database of a computer in a power plant and converting the data into a form in which the data can be easily learned in realizing a monitoring system for analyzing process margin of industrial equipment based on a statistical learning method, and a storage medium for storing the same.
Description of the Related Art
Industrial equipment includes a plurality of systems and instruments for achieving a specific purpose. Generally, one or more measuring instruments for confirming an operation and safety state of the industrial equipment are installed such that the operation and safety state of the industrial equipment can be measured offline or online.
Efficiency and safety of the industrial equipment change depending upon external conditions (temperature, pressure, or humidity of the atmosphere; temperature of seawater or rainfall in a case in which a coolant is needed), characteristics of fuel supplied to the industrial equipment, a degradation degree of the industrial equipment, and an operation range. In terms of cost, a change range in which the efficiency and safety of the industrial equipment are maintained is called process margin. Most industrial equipment has a stoppage/protection function for stopping/protecting a specific system or instrument in order to prevent the industrial equipment from operating beyond such process margin. In order to realize such a stoppage/protection function, a control device is provided for forcibly stopping the industrial equipment if a value of a specific operation variable exceeds a set value for stoppage/protection.
The process margin and the set value for stoppage/protection are interdependent variables. If the set value for stoppage/protection is set to too large a value, the process margin is relatively increased, and therefore, cost benefit obtained by operating the industrial equipment is increased; however, serious accidents may occur with the result that the industrial equipment may be stopped for a long period of time. On the other hand, if the set value for stoppage/protection is set to too small a value, probability of accident occurrence is lowered; however, the process margin is decreased with the result that the industrial equipment may frequently be stopped, and therefore, cost benefit obtained by operating the industrial equipment is decreased.
Therefore, both of these facets should be considered when deciding overall process margin. As a high degree of safety is required, a conservative value that accounts for external conditions, supplied fuel, a degradation degree of the industrial equipment, and an operation range is generally used to decide process margin. However, it is very difficult to decide overall process margin across varying situations, such as external conditions, supplied fuel, a degradation degree of the industrial equipment, and an operation range.
On the other hand, a set value for preliminary stoppage/protection is generally provided so that an operator can prepare for the stoppage of the industrial equipment or can take proper measures to normalize the industrial equipment before the value of the specific operation variable reaches the set value for stoppage/protection.
However, such a set value for preliminary stoppage/protection is generally a static value. The value is not changed once it is set. Even if the value is changed, it is set as a function of only one or two conditions indicating characteristics of the industrial equipment. As long as the process is within the above set value for stoppage/protection, therefore, it cannot be determined whether the process is really normal or abnormal. Also, it is difficult to predict the time it takes for a process problem to reach the set value. For this reason, it is impossible to take a proper measure until the situation has become critical.
Technology is well known that is capable of performing dynamic monitoring and issuing a timely alarm with respect to a stoppage/protection signal of the industrial equipment based on a series of statistical learning and prediction models in order to solve the above-mentioned conventional problems.
SUMMARY OF THE INVENTION
Therefore, the present invention has been made in view of the above problems, and it is an object of the present invention to provide a data collection method for a process margin monitoring system of industrial equipment that is capable of collecting learning data from a database of a computer in a power plant and converting the data into a form in which the data can be easily learned in realizing a monitoring system for analyzing process margin of industrial equipment based on a statistical learning method, and a storage medium for storing the same.
In accordance with an aspect of the present invention, the above and other objects can be accomplished by the provision of a data collection method for a process margin monitoring system of industrial equipment, including preparing a learning data set based on data determined to be normal in an operation history of the industrial equipment so that the learning data set is sorted for each operation mode, in a case in which the industrial equipment includes a plurality of equipment units performing the same functions, receiving data for each of the equipment units and processing the received data as data for the equipment units, sorting and grouping associated ones of the data in the learning data set, and sampling the collected data to reduce the number of data.
The learning data set may include a first data set to
an N-th data set (N being a natural number equal to or
greater than 2) depending upon a scale of data to be
collected or time when data are collected.
The first data set may include signals related to a specific equipment unit of the industrial equipment for monitoring process margin of the specific equipment unit, the second data set may include signals related to the entirety of the industrial equipment for monitoring process margin of the entirety of the industrial equipment, and the third data set may include signals regarding the entirety or a portion of the industrial equipment immediately after a specific event is generated in the entirety or the portion of the industrial equipment.
The data collection method may further include, in a case in which the learning data set comprises data displayed as digital signals, collecting an analog signal that can substitute for the digital signal and converting the digital signal into the analog signal.
The grouping step may include regarding variables, a correlation coefficient between which is equal to or greater than a set value, as belonging to the same group, calculating a smoothness parameter with respect to the variables regarded as belonging to the same group using a 4-fold validation method, adding to the group combinations of all variables other than the variables regarded as belonging to the same group to calculate a square sum of residuals while calculating the smoothness parameter using the 4-fold validation method, and, in a case in which a decrease ratio of a square sum of residuals immediately after a square sum of specific residuals to the square sum of specific residuals is equal to or less than a set value, terminating grouping at a time when the square sum of specific residuals is calculated.
The step of calculating the square sum of residuals may include sorting and using only variables related to characteristics of the equipment among the variables besides the variables regarded as belonging to the same group in consideration of characteristics of the equipment.
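The termination rule of the grouping step can be illustrated with a short sketch. It assumes that the square sums of residuals have already been calculated in the order in which candidate variables were added; the function name `stopping_index` and the argument `min_decrease_ratio` are hypothetical and do not appear in the specification.

```python
def stopping_index(ssr_values, min_decrease_ratio):
    """Return the index at which stepwise grouping terminates.

    ssr_values[k] is the square sum of residuals obtained after the k-th
    candidate variable has been added to the group (assumed precomputed).
    Grouping terminates at k when the next square sum of residuals is lower
    than ssr_values[k] by a ratio equal to or less than min_decrease_ratio.
    """
    for k in range(len(ssr_values) - 1):
        current = ssr_values[k]
        following = ssr_values[k + 1]
        if current == 0:
            return k  # nothing left to improve
        decrease_ratio = (current - following) / current
        if decrease_ratio <= min_decrease_ratio:
            return k
    # Every added variable still gave a sufficient decrease.
    return len(ssr_values) - 1


# The third value improves on the second by less than 5 %, so grouping
# terminates at index 1.
print(stopping_index([10.0, 6.0, 5.9, 5.8], 0.05))
```

The smoothness parameter and the 4-fold validation are outside the scope of this sketch; only the comparison of consecutive square sums of residuals is shown.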
The correlation coefficient may be analyzed by the following mathematical expression.
$$\rho_{XY} = \frac{1}{N}\sum_{i=1}^{N}\frac{(X_i-\mu_X)(Y_i-\mu_Y)}{\sigma_X\,\sigma_Y}$$
Where, $\rho_{XY}$ indicates a correlation coefficient between variables X and Y, $X_i$ indicates an i-th value on the basis of a sampling section of learning data, $Y_i$ indicates an i-th value on the basis of a sampling section of learning data (Y is a variable different than X), $\mu_X$ indicates the average of a variable X, $\mu_Y$ indicates the average of a variable Y, $\sigma_X$ indicates standard deviation of a variable X, $\sigma_Y$ indicates standard deviation of a variable Y, and N indicates the number of data collection intervals in a sampling section of learning data.
The data sampling step may include performing dispersion of a value of a specific variable on the basis of a grid size to reduce the number of data related to the variable in a corresponding grid.
The data sampling step may include calculating standard deviation ($\sigma_X$) of a value of a specific variable and reducing the number of data related to the variable in a corresponding grid on the basis of a grid size ($\mathrm{GridSize}_X$) calculated by the following mathematical expression according to set resolution.
$$\mathrm{GridSize}_X = \frac{10\,\sigma_X}{\mathrm{Resolution}}$$
The number of data left in the grid may be decided by the product of the number of data related to the variable in the corresponding grid and a set rate, and at least one of the data is left in each grid.
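The grid-based reduction can be sketched for a single variable as follows. The grid size is taken as an input, since the text derives it from the standard deviation and the set resolution. The choice of which data to keep within a grid (here, the earliest ones) is an assumption made for illustration, as are the names `grid_downsample` and `keep_rate`.

```python
import math
from collections import defaultdict


def grid_downsample(values, grid_size, keep_rate):
    """Reduce a series by keeping a set rate of the data in each grid.

    values    : measured values of one variable, in collection order
    grid_size : width of one grid (assumed to be computed beforehand)
    keep_rate : set rate, between 0 and 1, of data left in each grid
    """
    if grid_size <= 0:
        raise ValueError("grid_size must be positive")

    # Assign every sample to the grid that contains its value.
    grids = defaultdict(list)
    for index, value in enumerate(values):
        grids[math.floor(value / grid_size)].append(index)

    kept = []
    for indices in grids.values():
        # Product of the number of data in the grid and the set rate,
        # with at least one of the data left in each grid.
        n_keep = max(1, int(len(indices) * keep_rate))
        kept.extend(indices[:n_keep])  # assumption: keep the earliest data

    kept.sort()  # restore collection order
    return [values[i] for i in kept]


samples = [0.1, 0.2, 0.3, 0.4, 1.5, 2.1, 2.2]
print(grid_downsample(samples, grid_size=1.0, keep_rate=0.5))
```

Densely populated grids lose the most data, while sparsely populated grids always retain at least one value, so rarely visited operating conditions stay represented in the reduced set.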
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other objects, features and other advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic view showing a general power
generation system as industrial equipment;
FIG. 2 is a view showing a construction example of multiple learning data sets in a data collection method for a process margin monitoring system of industrial equipment according to an embodiment of the present invention;
FIG. 3 is a view showing a user interface for selecting a learning data set in the data collection method for the process margin monitoring system of industrial equipment according to the embodiment of the present invention;
FIG. 4 is a view showing a collection example of analog data or digital data in the data collection method for the process margin monitoring system of industrial equipment according to the embodiment of the present invention;
FIG. 5 is a view showing imaginary tag creation in the data collection method for the process margin monitoring system of industrial equipment according to the embodiment of the present invention;
FIGS. 6 and 7 are views showing stepwise variable selection in the data collection method for the process margin monitoring system of industrial equipment according to the embodiment of the present invention;
FIG. 8 is a view showing stepwise variable selection results and cross variable grouping results in the data collection method for the process margin monitoring system of industrial equipment according to the embodiment of the present invention; and
FIGS. 9 and 10 are views illustrating a data compression principle in the data collection method for the process margin monitoring system of industrial equipment according to the embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
Now, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings so as to explain the present invention in detail to such an extent that a person having ordinary skill in the art to which the present invention pertains can easily make the present invention. The object, operation, and effects, and, in addition, other objects, features, and operational advantages of the present invention will be more clearly understood from the following detailed description.
For reference, embodiments disclosed in this specification are selected from several possible embodiments and presented as the most preferred embodiments to assist those skilled in the art to understand the present invention. Therefore, the technical concept of the present invention is not restricted or limited to the disclosed embodiments, and it should be understood that various modifications, additions and substitutions are possible, and, in addition, equivalents thereof are also possible, without departing from the technical concept of the present invention.
A process margin monitoring system for issuing a timely alarm about process margin based on a statistical learning and prediction model has been developed. The process margin monitoring system is characterized by distinguishing between errors of a measuring instrument and abnormality of equipment using statistical data (hereinafter, referred to as "learning data") obtained from an operation history of the equipment.
Accuracy of the process margin monitoring system depends upon how reliably learning data are collected from the operation history of the equipment and how the collected learning data are grouped so that the learning data can be used to construct a prediction model.
Conditions required to improve accuracy of the process margin monitoring system may be divided into the following detailed items.
(1) How to collect data
This is a method of selecting the time when collection of learning data from a database installed in a computer of a power plant is started and the time when collection of learning data from the database is ended.
(2) How to collect data in a case in which power generation equipment is normally operated and in a case in which the power generation equipment is not normally operated
A normal state means that the equipment is maintained in a stable state without change of operation conditions. Generally, data collected at that time are useful to construct a statistical model. On the other hand, data obtained when the state of the power generation equipment is changed due to start, stoppage, or various control logics are not useful to construct a statistical model. For this reason, it is necessary to provide a method of collecting data from the database installed in the computer of the power plant while distinguishing between a normal state and an abnormal state and inputting the collected data to the process margin monitoring system.
(3) How to collect analog data and digital data
Unlike an analog signal indicating a general process signal, a digital signal mainly indicating operation states of equipment, such as an open or closed state of a valve and an operation/stoppage state of a pump, plays an important role, but a problem occurs when the digital signal is reflected in a statistical learning model developed based on analog data. For this reason, it is necessary to provide a method of receiving digital data from the database installed in the computer of the power plant and inputting the received digital data to the process margin monitoring system.
(4) How to process data having the same characteristics provided by a plurality of equipment units
In many cases, an industrial equipment unit for performing an important function has one or more backup equipment units that are capable of performing the same function. For example, in a case in which several pumps are operated while another pump remains stopped, and one of the pumps under operation is stopped for a certain reason, the pump remaining stopped is operated to substitute for the failed pump. In this case, the total number of equipment units that are operated is not changed, and therefore, the operation condition is not changed. In providing a user with monitoring results, however, the results are partly changed since there is a change in which equipment units are under operation. That is, it is necessary to provide a method of receiving data having the same characteristics provided by a plurality of equipment units from the database installed in the computer of the power plant, processing the received data, and inputting the processed data to the process margin monitoring system.
(5) How to select an optimal combination in grouping data
A signal list for monitoring the power generation equipment is generally enormous. Such a signal list includes not only signals important to confirm process margin of the equipment but also unnecessary signals. The simplest grouping method is confirming a correlation coefficient between signals and grouping signals having high correlation. However, grouping results may not be consistent depending upon a collection policy of learning data. Therefore, it is necessary to provide a method of grouping data based on a statistical method and engineering knowledge of equipment and inputting the grouped data to the process margin monitoring system.
(6) How to reduce collected data to such an extent that learning is really possible
Generally, if a sampling interval is very short, the amount of the collected data is enormous even though data are collected for only a short period of time. Also, for large-sized power generation equipment, a signal list to be monitored is very large. For this reason, it is not easy to process the huge amount of calculation necessary to construct a statistical learning model even when a high-performance computer is used. Therefore, it is necessary to provide a method of reducing collected data with the minimum loss so that the data can be really learned and inputting the reduced data to the process margin monitoring system.
Hereinafter, methods of satisfying conditions required to improve accuracy of the process margin monitoring system will be described in detail according to the respective detailed items thereof.
(1) Collection of data (Construction of multiple learning data sets)
FIG. 1 is a schematic view showing a general power generation system as industrial equipment. As shown in FIG. 1, the general power generation system includes a steam generation equipment unit 1, such as a boiler of a steam power plant or a steam generator of a nuclear power plant, a steam turbine 2 connected to the steam generation equipment unit 1, a condenser 3 connected to the steam turbine 2, and a pump 4 connected between the condenser 3 and the steam generation equipment unit 1. In FIG. 1, reference symbols A to G denote signals that can be obtained by sensors installed at the respective equipment units. Reference symbol A denotes an outlet pressure signal of the steam generation equipment unit 1, reference symbol B denotes a pressure signal of the condenser 3, reference symbol C denotes a temperature signal of the condenser 3, reference symbol D denotes an outlet pressure signal of the pump 4, reference symbol E denotes a supplied water flow rate signal, reference symbol F denotes an internal pressure signal of the steam generation equipment unit 1, and reference symbol G denotes an internal temperature signal of the steam generation equipment unit 1.
Ideal learning data must be obtained from operation conditions of normal equipment having no deterioration with time and no lowering of efficiency. Also, such ideal learning data must include operation data based on the combination of all external conditions (temperature, pressure, or humidity of the atmosphere; temperature of seawater or rainfall in a case in which a coolant is needed) and all internal conditions (characteristics of supplied fuel or an operation range). Since it is impossible to perfectly collect such data in actuality, however, learning data are prepared using the following method.
First, two or more learning data sets are constructed. Since learning data function as a reference target which is compared with a present equipment state, multiple learning data sets may be constructed correspondingly. Consequently, the learning data sets may include a first data set, a second data set, a third data set ..., and an N-th data set (N being a natural number) depending upon the scale of data to be collected or the time when data are collected.
On the assumption that three learning data sets are constructed as shown in FIG. 2, a first data set has a learning database including signals C, D, and E for monitoring process margin of a specific equipment unit (for example, the pump 4 of the power generation system). Three-month data collected immediately after replacement or maintenance of the equipment unit are periodically collected and stored in the database (see FIG. 2(a)). A second data set has a learning database including all signals A, B, C, D, E, F, and G for monitoring process margin of all of the equipment units. One-year operation history data after initial installation of the equipment units are stored in the database. The second data set is used to confirm how far a present state of the power generation equipment deviates from a design value (see FIG. 2(b)). A third data set includes signals A, B, C, D, E, F, and G regarding all of the equipment units. In the third data set, the signals are periodically updated on a per specific event basis. For example, signals are periodically updated three months after every planned preventive stop, in the summer season or the winter season every year, or three months after a specific equipment unit is replaced. The third data set may be used to observe a certain state of the equipment on the basis of an equipment condition immediately after a specific event is generated (see FIG. 2(c)).
A statistical learning method is divided into a learning mode and an execution mode. Each of the multiple learning data sets is used to generate a model in the learning mode, and a proper interface is provided, by which a user can select one of the multiple learning data sets when the execution mode is commenced. FIG. 3 shows a user interface in a case in which one of the learning data sets constructed in FIG. 2 is selected.
(2) Collection of data in a case in which the equipment is normally operated and in a case in which the equipment is not normally operated (Collection of learning data for each operation mode)
Most equipment is started from a stopped state, is maintained in a predetermined operation state, and is then stopped after a predetermined time. Consequently, the mode of the equipment may be divided into a start mode, a normal operation mode, and a stop mode. According to circumstances, the operation mode may be subdivided. When collecting learning data, data sets are sorted on a per operation mode basis. In a case in which data are sorted for each of the operation modes, grouping reliability is increased, and a model is simplified, whereby accuracy of the overall monitoring system is improved. Consequently, learning data are sorted and collected for each operation mode using the multiple learning data selection method described in paragraph (1).
That is, a model suitable for a corresponding operation mode is used in the execution mode. In a case in which monitoring is performed only in a specific operation mode, such monitoring is performed only when data obtained in an operation condition not exceeding a data range prepared in the learning mode are input. In a case in which the state of the system differs from the above condition, an alarm indicating that reliability of the output result is low is issued to a user, or a calculation is automatically bypassed.
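The range check in the execution mode can be sketched as follows. The sketch assumes that the "data range prepared in the learning mode" is a per-variable minimum and maximum over the learning data of one operation mode; this interpretation, the function names, and the use of `None` to signal a bypassed calculation are assumptions made for illustration.

```python
def learned_ranges(training_rows):
    """Per-variable (minimum, maximum) over the learning data of one mode."""
    names = training_rows[0].keys()
    return {
        name: (min(row[name] for row in training_rows),
               max(row[name] for row in training_rows))
        for name in names
    }


def within_learned_range(sample, ranges):
    """True when every variable of the sample lies inside its learned range."""
    return all(low <= sample[name] <= high
               for name, (low, high) in ranges.items())


def run_monitoring(sample, ranges, model):
    """Evaluate the model only for samples inside the learned data range.

    Returns None when the calculation is bypassed, so that the caller can
    issue an alarm indicating low reliability instead of an output.
    """
    if not within_learned_range(sample, ranges):
        return None
    return model(sample)


# Hypothetical learning data for the normal operation mode.
normal_mode_rows = [
    {"flow": 10.0, "pressure": 5.0},
    {"flow": 12.0, "pressure": 5.5},
]
ranges = learned_ranges(normal_mode_rows)
print(run_monitoring({"flow": 13.0, "pressure": 5.2}, ranges, lambda s: s["flow"]))
```

One set of ranges and one model would be kept per operation mode, so that the start mode, the normal operation mode, and the stop mode are each checked against their own learning data.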
(3) Collection of analog data and digital data
If modeling using the statistical learning method is difficult in a case in which learning data include a digital signal, learning data may be collected using an analog signal that can substitute for the digital signal. For example, if modeling of a digital signal indicating an open or closed state of a valve is difficult, flow rate, pressure, or temperature at a pipe located downstream of the valve is included in the learning data so that the open or closed state of the valve can be indirectly known. FIG. 4 is a view showing a collection example of analog data or digital data. In FIG. 4(a), reference symbol A1 denotes an analog signal regarding pressure of an outlet part of the pump 4, reference symbol A2 denotes an analog signal regarding temperature of the outlet part of the pump 4, and reference symbol D1 denotes a digital signal regarding an ON/OFF state of the pump 4. FIG. 4(b) illustrates a data set in a case in which the use of digital data is impossible, and FIG. 4(c) illustrates a data set in a case in which the use of digital data is possible.
If kernel regression analysis is used as a model of the learning data, analog data and digital data may be mixed. Also, important digital data must be designated as the same group as the learning data. In a grouping method based only on a linear correlation coefficient used in the existing statistical learning method, important digital data may be lost during grouping. For this reason, a method of finding an optimal grouping combination, which will be described below, must be utilized.
In the execution mode, however, the result of a digital signal may be not only 0 or 1 but also an intermediate value or a value outside the range from 0 to 1. In this case, it is determined that the open/closed or stop/operation state indicated by the digital signal may be incorrect.
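The check on a digital signal in the execution mode can be sketched in a few lines. The tolerance of 0.1 around the two valid states is an assumed value chosen for illustration; the specification does not state one, and the function name is hypothetical.

```python
def digital_result_is_suspect(result, tolerance=0.1):
    """True when a model result for a 0/1 signal is close to neither state.

    tolerance is an assumed margin around 0 and 1; results farther than
    this from both states are treated as possibly incorrect indications.
    """
    distance_to_nearest_state = min(abs(result - 0.0), abs(result - 1.0))
    return distance_to_nearest_state > tolerance


for value in (0.02, 0.5, 0.97, 1.4):
    print(value, digital_result_is_suspect(value))
```

An intermediate value such as 0.5 and an out-of-range value such as 1.4 are both flagged, while results near 0 or 1 are accepted as a valid open/closed or stop/operation indication.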
(4) Processing of data having the same characteristics provided by a plurality of equipment units (Creation of imaginary analog/digital tags)
Learning data are not collected on an equipment basis but on a function basis. In a case in which data having the same characteristics are provided by a plurality of equipment units, therefore, imaginary tags are given. To illustrate such imaginary tags, it is assumed that three of the four pumps 4a, 4b, 4c, and 4d are operated, and the remaining one is stopped so that it can be operated in case of emergency, as shown in FIG. 5. That is, it is assumed that each of the pumps has a capacity of 33.3 %, and three of the four pumps must be operated. The four pumps 4a, 4b, 4c, and 4d are different equipment units but perform the same function. Consequently, learning data must not use flow meters or thermometers located at the outlets of the four pumps 4a, 4b, 4c, and 4d as denoted by H1 to H4 but use a flow meter or thermometer installed at a position at which the outlets of the four pumps 4a, 4b, 4c, and 4d are joined together as denoted by H. If a desired measuring instrument is not provided at this position, an imaginary tag is created to substitute for a real flow meter or thermometer. The imaginary tag is calculated by summing the flow rates of the three operated pumps or averaging the temperatures of the three operated pumps based on operation states of the respective pumps.
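The imaginary tag for the joined position H can be sketched as follows, using the summing and averaging described above. The function name, the argument layout, and the numerical readings are hypothetical.

```python
def imaginary_tag_at_h(flows, temperatures, running):
    """Combine per-pump readings (H1 to H4) into one reading for position H.

    flows, temperatures : outlet readings of the individual pumps
    running             : operation state of each pump (True when operated)
    Returns (summed flow rate, averaged temperature) of the operated pumps.
    """
    operated = [i for i, is_on in enumerate(running) if is_on]
    if not operated:
        return 0.0, None  # no pump operated: no flow, temperature undefined
    total_flow = sum(flows[i] for i in operated)
    mean_temperature = sum(temperatures[i] for i in operated) / len(operated)
    return total_flow, mean_temperature


# Pumps 4a, 4b, and 4d operated; pump 4c on standby.
print(imaginary_tag_at_h(
    flows=[100.0, 110.0, 0.0, 90.0],
    temperatures=[40.0, 42.0, 25.0, 44.0],
    running=[True, True, False, True],
))
```

Because the tag depends only on which pumps are operated, swapping a failed pump for the standby pump leaves the value at H unchanged as long as the substituted pump delivers the same flow and temperature.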
A concept of such an imaginary tag may be used to indicate a position at which a measuring instrument is not really installed although a signal is required, a position at which such a measuring instrument cannot be installed, or a physical quantity that cannot be measured directly. For example, if it is wished to utilize enthalpy as a signal at the points H1 to H4 of FIG. 5 in addition to the thermometers and manometers at the outlet side positions H1 to H4 of the pumps 4a, 4b, 4c, and 4d, an imaginary tag of enthalpy, a function of temperature and pressure, may be made and used at the positions H1 to H4.
(5) Selection of an optimal combination in grouping data (stepwise variable selection and cross grouping)
In order to improve accuracy of grouping, various kinds of singularity included in learning data must first be removed. Representative examples of singularity may include a case in which data are not input, such as 'Bad input', and a case in which data are input but are large or small to such an extent that the data temporarily deviate excessively from a normal range, such as 'Out of range'. In a case in which data having such singularity are generated, data of all variables acquired at that time are simultaneously removed to improve reliability of learning data. All variables having no change during sampling of the learning data are processed as 'Bad input' so that the variables cannot function as noise in modeling.
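The removal of singularity can be sketched as follows. The sketch assumes that a missing value (`None`) marks 'Bad input', that a normal range is supplied per variable, and that a variable with no change is excluded from the learning data as a whole; the last point is an interpretation of how such a variable is "processed as Bad input". All names and values are hypothetical.

```python
def remove_singularity(rows, limits):
    """Clean learning data before grouping.

    rows   : list of dicts mapping variable name to value, one per time point
             (None marks a value that was not input)
    limits : dict mapping variable name to its (low, high) normal range
    """
    def is_valid(row):
        for name, value in row.items():
            if value is None:            # 'Bad input'
                return False
            low, high = limits[name]
            if not low <= value <= high:  # 'Out of range'
                return False
        return True

    # Data of all variables acquired at a singular time point are removed.
    kept = [row for row in rows if is_valid(row)]
    if not kept:
        return []

    # Assumption: variables showing no change are dropped entirely.
    unchanged = {name for name in kept[0]
                 if len({row[name] for row in kept}) == 1}
    return [{name: value for name, value in row.items()
             if name not in unchanged}
            for row in kept]


history = [
    {"flow": 10.0, "temp": 50.0, "valve": 1.0},
    {"flow": None, "temp": 51.0, "valve": 1.0},
    {"flow": 11.0, "temp": 500.0, "valve": 1.0},
    {"flow": 12.0, "temp": 52.0, "valve": 1.0},
]
normal = {"flow": (0.0, 100.0), "temp": (0.0, 100.0), "valve": (0.0, 1.0)}
print(remove_singularity(history, normal))
```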
Learning data include information useful to inform a user of the state of a specific equipment unit and information useless to inform the user of that state. Also, not every signal indicates the states of all of the equipment units in the system even when the signals include useful information. For this reason, it is necessary to group signals including information useful to inspect a state of each of the equipment units to be inspected. When the grouping is performed as described above, it is possible to remove signals including useless information from the learning data, thereby reducing the number of signals necessary to monitor a specific equipment unit to an appropriate level.
Generally, a correlation coefficient used as a basis
of grouping in the statistical learning method is analyzed
with respect to all variable pairs constituting learning
data, and is calculated as represented by the following
mathematical expression 1. If the calculated value of the
correlation coefficient is equal to or greater than a set
value, the variables are retained in the learning data. On
the other hand, if the calculated value of the correlation
coefficient is less than the set value, the
variables are excluded from the learning data. The set
value is input by a user.
[Mathematical expression 1]
Where ρ_XY indicates a correlation coefficient
between variables X and Y, X_i indicates an i-th value on
the basis of a sampling section of learning data, Y_i
indicates an i-th value on the basis of a sampling section
of learning data (Y is a variable different from X), μ_X
indicates the average of a variable X, μ_Y indicates the
average of a variable Y, σ_X indicates standard deviation of
a variable X, σ_Y indicates standard deviation of a variable
Y, and N indicates the number of data collection intervals
in a sampling section of learning data.
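The image for mathematical expression 1 is not reproduced in this text, so the following sketch assumes the standard Pearson correlation coefficient, which matches the symbols defined above (averages, standard deviations, and N samples). The function names `pearson` and `correlated_variables` are hypothetical, and the comparison follows the text literally (value equal to or greater than the set value, without taking an absolute value).

```python
import math
from itertools import combinations
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/N factors of the covariance and standard deviations cancel.
    # Constant series (zero variance) are assumed to be removed beforehand.
    return cov / math.sqrt(var_x * var_y)


def correlated_variables(data: Dict[str, List[float]], set_value: float) -> Set[str]:
    """Variables that take part in at least one pair at or above set_value."""
    kept: Set[str] = set()
    for a, b in combinations(data, 2):
        if pearson(data[a], data[b]) >= set_value:
            kept.update((a, b))
    return kept
```

Because every variable pair is examined, the cost grows quadratically with the number of variables, which is one reason the amount of learning data matters later in the method.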
However, grouping depending on the correlation
coefficient as described above has the following two
problems.
First, the correlation coefficient between variables
which should have a physical relationship may be very low,
with the result that the variables do not belong to the same
group. A correlation coefficient indicates a linear
relationship between two variables. However, the linearity
of two given variables may be analyzed differently
depending upon the sampling period of the learning data. For
example, variables such as an outside air condition, a
seawater or rainfall condition, and a fuel condition
affect overall performance of the power generation
equipment but are not sufficiently reflected in the
correlation coefficient, since such variables change much
more slowly than the process of the equipment. Such
variables may be regarded as independent variables of the
overall system. That is, change of the system does not
affect such variables, but such variables affect change of
the system.
Second, if such variables belong to a specific group,
the variables cannot belong to other groups. Since
independent variables of the system affect all groups, it
is necessary for a plurality of groups to have the
independent variables jointly.
Consequently, a stepwise variable selection method is
suggested as follows in order to more precisely construct
grouping.
① First, variables whose correlation coefficient is
equal to or greater than a predetermined set value
or an arbitrary value designated by a user, for
example 0.8, are regarded as belonging to the same
group.
② A 4-fold validation method is used with respect to
the variables of the group constructed in ① to
calculate a smoothness parameter. In the 4-fold
validation method, learning data are divided into
four equal parts, data corresponding to three
equal parts are used to form an autocorrelation
regression analysis model, and the remaining data
are used to verify the model; this is repeated
in the other combinations. In this way, verification
is performed four times. The data
corresponding to the three equal parts used to make
the autocorrelation regression analysis model are
referred to as learning data, and the data
corresponding to the remaining one equal part used
to verify the regression analysis model are
referred to as testing data. Each verification
step is referred to as a run. In the 4-fold
validation method, therefore, four runs are
performed. A square sum of residuals (SSR) between
an input signal and an output signal is used as an
index indicating the quality of the regression
analysis model. The square sum of residuals
calculated at this time is defined as SSR_1.
③ Combinations of all other variables, that is,
variables not belonging to the group, are
added to the group constructed in ①, and a
square sum of residuals (SSR) is calculated using
the 4-fold validation method while a smoothness
parameter is calculated. The square sum of residuals
of the i-th combination according to the sequence
of the combinations is defined as SSR_i.
④ As shown in a table of FIG. 6 and in a graph of
FIG. 7, the square sum of residuals (SSR)
decreases as the number of variables belonging to
a group increases. However, including too many
variables in the same group may cause other
problems. For this reason, grouping is terminated
in case 4, after which SSR_i decreases only slightly. This
may be generalized as follows. If the decrease ratio
from a given square sum of residuals to the
square sum of residuals immediately after it,
taken relative to the given square sum of residuals, is
equal to or less than a set value, grouping may be
terminated at the time when the given square sum of
residuals is calculated. Here, the set value may be
decided as the ratio of the decrease ratio of the
square sum of residuals in case 5 relative to that
in case 4 to the decrease ratio of the
square sum of residuals in case 4 relative to that
in case 3 shown in FIG. 7. That is, such a set
value may be understood as a value for identifying
a state in which the decrease of the square sum of
residuals suddenly slows or stops. Consequently,
in the case of FIGS. 6 and 7,
variables A, B, C, and F are decided as belonging
to the same group.
⑤ In actuality, combinations of a great number of
variables must be considered, and therefore
much time may be necessary to
perform step ③. In this case, variables
related to characteristics of the equipment are
decided as independent variables in consideration
of those characteristics, and step ③
is performed only with respect to the
independent variables.
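Two pieces of the procedure above lend themselves to a short sketch: the 4-fold calculation of a square sum of residuals, and the decrease-ratio rule for terminating grouping. The text does not specify the regression model in detail, so the code below assumes a Gaussian-kernel weighted average in which the smoothness parameter is the kernel bandwidth, and it assumes an interleaved split into four parts. All function names and the numeric values in the usage note are hypothetical.

```python
import math
from typing import Sequence, Tuple

Row = Tuple[float, ...]  # one time step of the variables in a candidate group


def predict(train: Sequence[Row], query: Row, bandwidth: float) -> Row:
    """Kernel-weighted average of the training rows (assumed model)."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(row, query)) / (2 * bandwidth ** 2))
        for row in train
    ]
    total = sum(weights)
    if total == 0.0:  # every kernel weight underflowed; fall back to the mean
        weights, total = [1.0] * len(train), float(len(train))
    return tuple(
        sum(w * row[j] for w, row in zip(weights, train)) / total
        for j in range(len(query))
    )


def four_fold_ssr(rows: Sequence[Row], bandwidth: float) -> float:
    """Square sum of residuals accumulated over the four runs."""
    ssr = 0.0
    for run in range(4):
        testing = [r for i, r in enumerate(rows) if i % 4 == run]
        learning = [r for i, r in enumerate(rows) if i % 4 != run]
        for row in testing:
            estimate = predict(learning, row, bandwidth)
            ssr += sum((a - b) ** 2 for a, b in zip(row, estimate))
    return ssr


def best_smoothness(rows: Sequence[Row], candidates: Sequence[float]) -> Tuple[float, float]:
    """Return (bandwidth, SSR) for the candidate with the smallest SSR."""
    return min(((h, four_fold_ssr(rows, h)) for h in candidates), key=lambda p: p[1])


def termination_case(ssr_by_case: Sequence[float], set_value: float) -> int:
    """1-based case at which grouping stops under the decrease-ratio rule."""
    for k in range(len(ssr_by_case) - 1):
        decrease_ratio = (ssr_by_case[k] - ssr_by_case[k + 1]) / ssr_by_case[k]
        if decrease_ratio <= set_value:
            return k + 1
    return len(ssr_by_case)
```

With made-up values such as `termination_case([100.0, 60.0, 40.0, 30.0, 29.5], 0.05)`, the decrease from the fourth to the fifth value is under 5 percent, so grouping would stop at case 4. How SSR values are made comparable across groups of different sizes is not stated in the text and is left out of this sketch.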
The second problem is automatically solved using the
stepwise variable selection method as described above.
Stepwise variable selection results and cross variable
grouping results are shown in FIG. 8. Three variables
A0001, A0002, and A0003 shown in FIG. 8 belong to groups
1, 2, and 3, respectively. In particular, FIG. 8 shows
that the variable A0002 also belongs to group 1.
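Cross grouping means that group membership is many-to-many. A minimal sketch of such a structure follows, using only the memberships stated in the text for FIG. 8; any other members shown in the figure are not reproduced, and the names `groups` and `groups_of` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Memberships as stated in the text: each variable has its own group,
# and A0002 additionally appears in group 1.
groups: Dict[int, Set[str]] = {
    1: {"A0001", "A0002"},
    2: {"A0002"},
    3: {"A0003"},
}


def groups_of(groups: Dict[int, Set[str]]) -> Dict[str, List[int]]:
    """Invert the mapping: variable name -> sorted list of its groups."""
    membership: Dict[str, List[int]] = defaultdict(list)
    for group_id in sorted(groups):
        for name in groups[group_id]:
            membership[name].append(group_id)
    return dict(membership)
```

Storing each group as a set of names, rather than assigning one group label per variable, is what allows an independent variable to be shared by several groups.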
(6) A method of reducing collected data to such an
extent that learning is actually possible
The learning data that can actually be collected are too
voluminous to be analyzed even by the latest computers. In
this case, a huge amount of time is necessary for the
stepwise variable selection and cross grouping of (5).
In order to solve this problem, the values of a signal
are distributed into grids on the basis of a grid size, and
a method of reducing the number of data in each grid is
suggested as follows. First, the dispersion of the values of
a specific variable is calculated, and the calculated
dispersion is set as a reference grid size. A user may set
the reference grid size to be larger or smaller. Next, a grid
is set for each variable, and the actual data are plotted in
the grids.
FIGS. 9 and 10 illustrate a case having two variables.
First, FIG. 9 shows original data. Grids drawn on the
horizontal axis and the vertical axis are decided by
dispersion sizes of a variable corresponding to the
horizontal axis and a variable corresponding to the vertical
axis.
For a system having two variables, when each variable is
divided into grids having a predetermined resolution and
duplicated data falling in the same grid are removed, the
resulting data can be reduced as shown in FIG. 10. Using
this method, the size of the grid can be adjusted to obtain
learning data of an appropriate scale. If the size of the
grid is set to be large, the number of data is greatly
reduced with the result that learning time is decreased;
however, accuracy of regression analysis is relatively
lowered. On the other hand, if the size of the grid is set
to be small, the number of data is increased with the result
that learning time is increased; however, a relatively
accurate regression analysis result can be acquired.
Although the resolution of the grid may be set differently
for each variable, several thousand or tens of thousands of
variables are generally used for learning in a power plant.
For this reason, setting the resolution of the grid for each
variable is troublesome and inefficient. Consequently, a
method of deciding the resolution of the grid through a
setting interface before learning is proposed. The
resolution means the number of equal parts into which the
entire distribution of a variable is divided. That is, as
the resolution is set to be larger, the variable is divided
into smaller grids with the result that the amount of
learning data is increased. The size of grids GridSize_X
according to resolution may be calculated by the following
mathematical expression 2 based on the standard deviation
σ_X of a corresponding variable.
[Mathematical expression 2]
$$\mathrm{GridSize}_X = \frac{10\,\sigma_X}{\mathrm{Resolution}}$$
When the resolution is decided through the learning
setting interface, the range from -5σ to +5σ about the
average of a corresponding variable is divided into as many
equal parts as the resolution. The reason that the range
from -5σ to +5σ, rather than the range from the minimum
value to the maximum value of the variable, is divided by
the resolution is that abnormally large or small values may
occasionally be included in learning data, and therefore, if
the minimum value and the maximum value of the variable are
used, the grids may be abnormally distributed. Process
variables are for the most part normally distributed, and
therefore most data lie between -5σ and +5σ of the variable.
For example, when the resolution is set to 4, the variable
may be divided into four grids, i.e. a grid of -5σ to -2.5σ,
a grid of -2.5σ to the average, a grid of the average to
+2.5σ, and a grid of +2.5σ to +5σ. On the other hand, when
the resolution is set to 2, the variable may be divided into
two grids, i.e. a grid of -5σ to the average and a grid of
the average to +5σ.
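The grid layout described above can be checked with a small sketch. The function names are hypothetical, and the handling of values beyond ±5σ (clamping them into the outermost grids) is an assumption, since the text does not say what happens to them.

```python
from typing import List


def grid_size(sigma: float, resolution: int) -> float:
    """Grid size when the -5σ..+5σ span is split into `resolution` equal parts."""
    return 10.0 * sigma / resolution


def grid_edges(mean: float, sigma: float, resolution: int) -> List[float]:
    """Boundaries of the grids between mean - 5σ and mean + 5σ."""
    size = grid_size(sigma, resolution)
    return [mean - 5.0 * sigma + k * size for k in range(resolution + 1)]


def grid_index(value: float, mean: float, sigma: float, resolution: int) -> int:
    """Grid number of a value; values beyond ±5σ are clamped (assumption)."""
    size = grid_size(sigma, resolution)
    k = int((value - (mean - 5.0 * sigma)) // size)
    return min(max(k, 0), resolution - 1)
```

For a variable with average 0 and standard deviation 1, `grid_edges(0.0, 1.0, 4)` gives the boundaries -5, -2.5, 0, 2.5, and 5, which are the four grids listed in the example for a resolution of 4.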
Next, a predetermined rate or a rate input by
a user is used to reduce the number of data included in
each grid; the number of data in each of the grids is
reduced according to this rate. Even when data are
reduced according to this rate, at least one of the data
must be left in each grid. FIG. 10 shows the remaining data
after removal of data according to the above principle.
When a signal is predicted in kernel regression analysis,
the distance to every data point is converted into a weight
and reflected in the prediction. Most
process variables are normally distributed. Consequently,
learning data are concentrated around the central point of
the entire section. This affects signal prediction, with
the result that prediction values are generally
concentrated on the center. However, the importance of data
occasionally located toward the outside cannot be
completely dismissed. Using this method, the number of data
is reduced in consideration of the data distribution, and
therefore it is possible to effectively reduce the number
of data without losing important data.
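The per-grid reduction can be sketched as below for any number of variables. This is an illustration only: the function name `reduce_by_grid` is hypothetical, the rate is taken as the fraction of data kept in each grid, the product is rounded down, and the surviving data are chosen at even spacing within each grid. None of these choices is specified in the text.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[float, ...]  # one time step across the variables of a group


def reduce_by_grid(
    rows: Sequence[Row],
    means: Sequence[float],
    sigmas: Sequence[float],
    resolution: int,
    rate: float,
) -> List[Row]:
    """Keep about `rate` of the rows in every occupied grid cell, at least one."""
    cells: Dict[Tuple[int, ...], List[Row]] = defaultdict(list)
    for row in rows:
        key = []
        for value, mean, sigma in zip(row, means, sigmas):
            size = 10.0 * sigma / resolution  # grid size of this variable
            k = int((value - (mean - 5.0 * sigma)) // size)
            key.append(min(max(k, 0), resolution - 1))  # clamp outliers
        cells[tuple(key)].append(row)

    kept: List[Row] = []
    for members in cells.values():
        # Product of the cell population and the rate, but never below one.
        n_keep = max(1, math.floor(len(members) * rate))
        step = len(members) / n_keep
        # Evenly spaced picks keep the result deterministic.
        kept.extend(members[int(i * step)] for i in range(n_keep))
    return kept
```

Because every occupied cell keeps at least one row, a sparsely populated cell far from the center survives the reduction while the dense central cells are thinned the most.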
The data compression method may be variously used in
the statistical learning method. In order to achieve the
greatest effect, variables must be grouped first, and then
data compression must be performed in the same group. This
is because if the data compression method is applied to a
signal upon which signal processing is not performed,
compression effects may be reduced.
As is apparent from the above description, the
present invention has the effect of collecting learning
data from a database of a computer in a power plant and
converting the data into a form in which the data can be
easily learned in realizing a monitoring system for
analyzing process margin of industrial equipment based on
a statistical learning method.
Although the preferred embodiments of the present
invention have been disclosed for illustrative purposes,
those skilled in the art will appreciate that various
modifications, additions and substitutions are possible,
without departing from the scope and spirit of the
invention as disclosed in the accompanying claims.

We Claim:
1. A data collection method for a process margin
monitoring system of industrial equipment, comprising:
preparing a learning data set based on data
determined to be normal in an operation history of the
industrial equipment so that the learning data set is
sorted for each operation mode;
in a case in which the industrial equipment comprises
a plurality of equipment units performing the same
functions, receiving data for each of the equipment units
and processing the received data as data for the equipment
units;
sorting and grouping associated ones of the data in
the learning data set; and
sampling the collected data to reduce the number of
data.
2. The data collection method according to claim 1,
wherein the learning data set comprises a first data set
to an N-th data set (N being a natural number equal to or
greater than 2) depending upon a scale of data to be
collected or time when data are collected.
3. The data collection method according to claim 2,
wherein
the first data set comprises signals related to a
specific equipment unit of the industrial equipment for
monitoring process margin of the specific equipment unit,
the second data set comprises signals related to the
entirety of the industrial equipment for monitoring
process margin of the entirety of the industrial
equipment, and
the third data set comprises signals regarding the
entirety or a portion of the industrial equipment
immediately after a specific event is generated in the
entirety or the portion of the industrial equipment.
4. The data collection method according to claim 1,
further comprising, in a case in which the learning data
set comprises data displayed as digital signals,
collecting an analog signal that can substitute for the
digital signal and converting the digital signal into the
analog signal.
5. The data collection method according to claim 1,
wherein the grouping step comprises:
regarding variables, a correlation coefficient
between which is equal to or greater than a set value, as
belonging to the same group;
calculating a smoothness parameter with respect to
the variables regarded as belonging to the same group
using a 4-fold validation method;
putting combinations of all variables in the group
besides the variables regarded as belonging to the same
group to calculate a square sum of residuals while
calculating the smoothness parameter using the 4-fold
validation method; and
in a case in which a decrease ratio of a square sum
of residuals immediately after a square sum of specific
residuals to the square sum of specific residuals is equal
to or less than a set value, terminating grouping at a
time when the square sum of specific residuals is
calculated.
6. The data collection method according to claim 5,
wherein the step of calculating the square sum of
residuals comprises sorting and using only variables
related to characteristics of the equipment among the
variables besides the variables regarded as belonging to
the same group in consideration of characteristics of the
equipment.
7. The data collection method according to claim 5,
wherein the correlation coefficient is analyzed by the
following mathematical expression.
Where ρ_XY indicates a correlation coefficient
between variables X and Y, X_i indicates an i-th value on
the basis of a sampling section of learning data, Y_i
indicates an i-th value on the basis of a sampling section
of learning data (Y is a variable different from X), μ_X
indicates the average of a variable X, μ_Y indicates the
average of a variable Y, σ_X indicates standard deviation of
a variable X, σ_Y indicates standard deviation of a variable
Y, and N indicates the number of data collection intervals
in a sampling section of learning data.
8. The data collection method according to claim 1,
wherein the data sampling step comprises performing
dispersion of a value of a specific variable on the basis of
a grid size to reduce the number of data related to the
variable in a corresponding grid.
9. The data collection method according to claim 1,
wherein the data sampling step comprises calculating
standard deviation (σ_X) of a value of a specific variable
and reducing the number of data related to the variable in a
corresponding grid on the basis of a grid size (GridSize_X)
calculated by the following mathematical expression
according to set resolution.
$$\mathrm{GridSize}_X = \frac{10\,\sigma_X}{\mathrm{Resolution}}$$
10. The data collection method according to claim 8 or 9, wherein the number of data left in the grid is
decided by the product of the number of data related to
the variable in the corresponding grid and a set rate, and
at least one of the data is left in each grid.
11. A storage medium for storing a data collection
method according to any one of claims 1 to 9, wherein the data collection method is computer programmed.

Documents

Application Documents

#	Name	Date
1	8428-DELNP-2012-AbandonedLetter.pdf	2019-09-26
1	8428-delnp-2012-GPA-(09-10-2012).pdf	2012-10-09
2	8428-DELNP-2012-FER.pdf	2018-09-06
2	8428-DELNP-2012-Correspondence-Others-(09-10-2012).pdf	2012-10-09
3	8428-DELNP-2012.pdf	2016-11-15
3	8428-delnp-2012-1-Form-18-(09-10-2012).pdf	2012-10-09
4	8428-delnp-2012-Abstract.pdf	2013-08-20
4	8428-delnp-2012-1-Correspondence-Others-(09-10-2012).pdf	2012-10-09
5	8428-delnp-2012-Form-5.pdf	2013-08-20
5	8428-delnp-2012-Claims.pdf	2013-08-20
6	8428-delnp-2012-Form-3.pdf	2013-08-20
6	8428-delnp-2012-Correspondence-others.pdf	2013-08-20
7	8428-delnp-2012-Form-2.pdf	2013-08-20
7	8428-delnp-2012-Description(Complete).pdf	2013-08-20
8	8428-delnp-2012-Form-1.pdf	2013-08-20
8	8428-delnp-2012-Drawings.pdf	2013-08-20
9	8428-delnp-2012-Form-1.pdf	2013-08-20
9	8428-delnp-2012-Drawings.pdf	2013-08-20
10	8428-delnp-2012-Description(Complete).pdf	2013-08-20
10	8428-delnp-2012-Form-2.pdf	2013-08-20
11	8428-delnp-2012-Form-3.pdf	2013-08-20
11	8428-delnp-2012-Correspondence-others.pdf	2013-08-20
12	8428-delnp-2012-Form-5.pdf	2013-08-20
12	8428-delnp-2012-Claims.pdf	2013-08-20
13	8428-delnp-2012-Abstract.pdf	2013-08-20
13	8428-delnp-2012-1-Correspondence-Others-(09-10-2012).pdf	2012-10-09
14	8428-DELNP-2012.pdf	2016-11-15
14	8428-delnp-2012-1-Form-18-(09-10-2012).pdf	2012-10-09
15	8428-DELNP-2012-FER.pdf	2018-09-06
15	8428-DELNP-2012-Correspondence-Others-(09-10-2012).pdf	2012-10-09
16	8428-delnp-2012-GPA-(09-10-2012).pdf	2012-10-09
16	8428-DELNP-2012-AbandonedLetter.pdf	2019-09-26

Search Strategy

1	8428DELNP2012_31-08-2018.pdf