Augmenting Telemetry Data To Claim Records To Generate Training Dataset For Machine Failure Prediction

Abstract: State of the art techniques have worked on mapping machine failure claim records to sensor acquired data. However, technical challenges exist in practical scenarios, where the recorded claim dates do not necessarily relate to the sensor/telemetry data of the actual date of failure of the machine. Embodiments of the present disclosure provide a method and system for augmenting telemetry data to claim records for generating training datasets for machine failure prediction. The augmentation utilizes a unique fuzzy score (f-score) approach to indicate proximity of collected time stamped telemetry datasets of each machine among a plurality of machines to be annotated as machine failure instance data. Top-2 f-scores of each machine that map to recorded claim dates for the corresponding machine are identified as the most relevant machine failure telemetry datasets. [To be published with 3]

Patent Information

Application #
Filing Date
11 March 2022
Publication Number
37/2023
Publication Type
INA
Invention Field
ELECTRONICS
Status
Parent Application

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point Mumbai Maharashtra India 400021

Inventors

1. DAS, Abhisek
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160
2. CHATTOPADHYAY, Tanushyam
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160
3. GHOSH, Shubhrangshu
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160
4. DUTTA, Suvra
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
AUGMENTING TELEMETRY DATA TO CLAIM RECORDS TO
GENERATE TRAINING DATASET FOR MACHINE FAILURE
PREDICTION
Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the manner in which it is to be performed.

TECHNICAL FIELD
[001] The embodiments herein generally relate to machine failure prediction and, more particularly, to a method and system for augmenting telemetry data to claim records to generate training dataset for machine failure prediction.
BACKGROUND
[002] Machine Learning (ML) is widely used in intelligent systems that predict a condition they are trained for. Most existing ML based solutions for predicting machine failure in industrial IoT (IIoT) use cases, such as the manufacturing, energy and utility industry segments, depend on appropriate selection of training datasets. The foremost requirements for ML based machine failure prediction are (a) availability of a large number of annotated samples for building the model (training), and (b) an equal or almost equal ratio of failure records and non-failure records, referred to as claim records. In real-world or practical scenarios for the above-mentioned IIoT use cases, a properly annotated dataset that satisfies these requirements is hardly ever available. Datasets acquired from industrial setups, sensors and the like for generating a training dataset are real-life datasets with constraints such as: (i) Sparsity in size: very few relevant training samples are present, (ii) Skewed data: in most of the use cases mentioned, non-failure records form the majority of the dataset, (iii) there is no annotation for the actual failure timestamp, (iv) in some use cases, metadata of the failure records is available in the form of a claimed date of failure, and (v) according to domain experts, the claimed date of failure may or may not be the actual date of failure. Hence, there is scope for augmentation of data for prediction of the failure date. Current ML training datasets lead to considerable false alarms, since in the majority of cases the actual failure date and the claim date recorded for the machine failure are not the same, which leads to selection of irrelevant data. Moreover, experimental results show that an ML model built using a state of the art training dataset yields low fuzzy scores, recall, and precision on the actual claim date.

SUMMARY
[003] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
[004] For example, in one embodiment, a method for augmenting telemetry data to claim records to generate a training dataset for machine failure prediction is provided. The method includes obtaining a plurality of claim time stamps from the claim records of a plurality of machines, wherein the plurality of claim time stamps correspond to a recorded machine failure. Further, the method comprises obtaining the telemetry data of the plurality of machines recorded by a plurality of sensors that capture a plurality of parameters of the plurality of machines while the plurality of machines are in operating condition. The telemetry data is obtained for a plurality of time stamps prior to a claimed date associated with each of the plurality of machines in the claim records. Furthermore, the method comprises performing data cleansing on the telemetry data to filter out a final set of parameters from among the plurality of parameters that contribute as a set of significant features in predicting machine failure.
The steps of the data cleansing comprise: A) identifying, for each of the plurality of machines, a first set of features from among the plurality of parameters that provide the most relevant information for predicting a failure state of the machine, wherein the first set of features is identified based on domain knowledge, B) generating a Data Driven Dependency Graph (D3G) of the first set of features to identify a second set of features from among the first set of features by eliminating, from the first set of features, the features that contribute least to predicting the failure state, C) trimming the second set of features to obtain the set of significant features by eliminating a) NULL columns that correspond to empty data for one or more time stamps and b) duplicate features that correspond to repeated or redundant information within the telemetry data, D) converting Boolean flags in the second set of features from TRUE or FALSE to binary values of '0' or '1', wherein the Boolean flags represent sensor readings indicating system generated alerts recorded as alarm information in the telemetry data, wherein the alarms indicate one of: a machine failure indication, and an abnormality not related to machine failure, and E) normalizing values of each of the set of significant features corresponding to each of the plurality of machines in the range of [0,1].
[005] Furthermore, the method comprises computing fuzzy values, from the normalized values, for each of the set of significant features corresponding to each of the plurality of machines obtained for the plurality of time stamps. Furthermore, the method comprises computing an aggregated fuzzy score (f-score) indicative of a failure state of the machine for each of the plurality of machines for each of the plurality of time stamps. The aggregated fuzzy score is a weighted summation of the values of each of the set of significant features for a corresponding time stamp among the plurality of time stamps of a corresponding machine among the plurality of machines. Further, the method comprises selecting top-n aggregated f-scores for each of the plurality of machines and comparing them with the corresponding claim time stamp, after aligning the aggregated f-score for each of the plurality of machines for each of the plurality of time stamps. Further, the method comprises augmenting the claim time stamp and the telemetry data corresponding to one of the selected top-n aggregated f-scores for a corresponding machine among the plurality of machines if the claim time stamp and the time stamp of the one of the selected top-n aggregated f-scores are within an acceptable variation threshold.
[006] Thereafter, the method comprises utilizing the telemetry data corresponding to the mapped claim records for creating the training dataset for building a ML model for the machine failure prediction.
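By way of a non-limiting illustration, the scoring, top-n selection and claim matching described above may be sketched as follows. The function names, the value n = 2, the threshold of 3 days and the rule of preferring the highest-scoring time stamp within the threshold are assumptions made for this sketch only and are not prescribed by the disclosure.

```python
from datetime import date


def aggregated_f_score(fuzzy_values, weights):
    """Weighted summation of the per-feature fuzzy values for one time stamp."""
    if len(fuzzy_values) != len(weights):
        raise ValueError("one weight is required per fuzzy value")
    return sum(f * w for f, w in zip(fuzzy_values, weights))


def top_n_time_stamps(scores_by_day, n=2):
    """Return the n time stamps with the highest aggregated f-score, best first."""
    ranked = sorted(scores_by_day.items(), key=lambda item: item[1], reverse=True)
    return [day for day, _ in ranked[:n]]


def match_claim(scores_by_day, claim_day, n=2, max_gap_days=3):
    """Return the best-ranked top-n time stamp lying within the acceptable
    variation threshold of the claim date, or None if there is none.

    Assumption: when several top-n time stamps qualify, the one with the
    highest f-score is taken.
    """
    for day in top_n_time_stamps(scores_by_day, n):
        if abs((claim_day - day).days) <= max_gap_days:
            return day
    return None


# Hypothetical fuzzy values f1..f5 for one machine over three days.
fuzzy_by_day = {
    date(2022, 1, 10): [0.1, 0.2, 0.1, 0.2, 0.1],
    date(2022, 1, 11): [0.9, 0.8, 0.7, 0.9, 0.8],
    date(2022, 1, 12): [0.5, 0.4, 0.6, 0.5, 0.4],
}
weights = [0.2] * 5  # hypothetical equal weights
scores_by_day = {day: aggregated_f_score(f, weights) for day, f in fuzzy_by_day.items()}
matched_day = match_claim(scores_by_day, claim_day=date(2022, 1, 13))
```

In this sketch the telemetry data of `matched_day` would be annotated as the machine failure instance for the claim; a claim with no top-n time stamp inside the threshold is left unmatched.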
[007] In another aspect, a system for augmenting telemetry data to claim records to generate a training dataset for machine failure prediction is provided. The system comprises a memory storing instructions, one or more Input/Output (I/O) interfaces, and one or more hardware processors coupled to the memory via the one or more I/O interfaces, wherein the one or more hardware processors are configured by the instructions to obtain a plurality of claim time stamps from the claim records of a plurality of machines, wherein the plurality of claim time stamps correspond to a recorded machine failure. Further, the system is configured to obtain the telemetry data of the plurality of machines recorded by a plurality of sensors that capture a plurality of parameters of the plurality of machines while the plurality of machines are in operating condition. The telemetry data is obtained for a plurality of time stamps prior to a claimed date associated with each of the plurality of machines in the claim records. Furthermore, the system is configured to perform data cleansing on the telemetry data to filter out a final set of parameters from among the plurality of parameters that contribute as a set of significant features in predicting machine failure. The steps of the data cleansing comprise: A) identifying, for each of the plurality of machines, a first set of features from among the plurality of parameters that provide the most relevant information for predicting a failure state of the machine, wherein the first set of features is identified based on domain knowledge, B) generating a Data Driven Dependency Graph (D3G) of the first set of features to identify a second set of features from among the first set of features by eliminating, from the first set of features, the features that contribute least to predicting the failure state, C) trimming the second set of features to obtain the set of significant features by eliminating a) NULL columns that correspond to empty data for one or more time stamps and b) duplicate features that correspond to repeated or redundant information within the telemetry data, D) converting Boolean flags in the second set of features from TRUE or FALSE to binary values of '0' or '1', wherein the Boolean flags represent sensor readings indicating system generated alerts recorded as alarm information in the telemetry data, wherein the alarms indicate one of: a machine failure indication, and an abnormality not related to machine failure, and E) normalizing values of each of the set of significant features corresponding to each of the plurality of machines in the range of [0,1].
[008] Furthermore, the system is configured to compute fuzzy values, from the normalized values, for each of the set of significant features corresponding to each of the plurality of machines obtained for the plurality of time stamps. Furthermore, the system is configured to compute an aggregated fuzzy score (f-score) indicative of a failure state of the machine for each of the plurality of machines for each of the plurality of time stamps. The aggregated fuzzy score is a weighted summation of the values of each of the set of significant features for a corresponding time stamp among the plurality of time stamps of a corresponding machine among the plurality of machines. Further, the system is configured to select top-n aggregated f-scores for each of the plurality of machines and compare them with the corresponding claim time stamp, after aligning the aggregated f-score for each of the plurality of machines for each of the plurality of time stamps. Further, the system is configured to augment the claim time stamp and the telemetry data corresponding to one of the selected top-n aggregated f-scores for a corresponding machine among the plurality of machines if the claim time stamp and the time stamp of the one of the selected top-n aggregated f-scores are within an acceptable variation threshold.
[009] Thereafter, the system is configured to utilize the telemetry data corresponding to the mapped claim records for creating the training dataset for building a ML model for the machine failure prediction.
[0010] In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions, which when executed by one or more hardware processors cause a method for augmenting telemetry data to claim records to generate a training dataset for machine failure prediction to be performed. The method includes obtaining a plurality of claim time stamps from the claim records of a plurality of machines, wherein the plurality of claim time stamps correspond to a recorded machine failure. Further, the method comprises obtaining the telemetry data of the plurality of machines recorded by a plurality of sensors that capture a plurality of parameters of the plurality of machines while the plurality of machines are in operating condition. The telemetry data is obtained for a plurality of time stamps prior to a claimed date associated with each of the plurality of machines in the claim records. Furthermore, the method comprises performing data cleansing on the telemetry data to filter out a final set of parameters from among the plurality of parameters that contribute as a set of significant features in predicting machine failure. The steps of the data cleansing comprise: A) identifying, for each of the plurality of machines, a first set of features from among the plurality of parameters that provide the most relevant information for predicting a failure state of the machine, wherein the first set of features is identified based on domain knowledge, B) generating a Data Driven Dependency Graph (D3G) of the first set of features to identify a second set of features from among the first set of features by eliminating, from the first set of features, the features that contribute least to predicting the failure state, C) trimming the second set of features to obtain the set of significant features by eliminating a) NULL columns that correspond to empty data for one or more time stamps and b) duplicate features that correspond to repeated or redundant information within the telemetry data, D) converting Boolean flags in the second set of features from TRUE or FALSE to binary values of '0' or '1', wherein the Boolean flags represent sensor readings indicating system generated alerts recorded as alarm information in the telemetry data, wherein the alarms indicate one of: a machine failure indication, and an abnormality not related to machine failure, and E) normalizing values of each of the set of significant features corresponding to each of the plurality of machines in the range of [0,1].
[0011] Furthermore, the method comprises computing fuzzy values, from the normalized values, for each of the set of significant features corresponding to each of the plurality of machines obtained for the plurality of time stamps. Furthermore, the method comprises computing an aggregated fuzzy score (f-score) indicative of a failure state of the machine for each of the plurality of machines for each of the plurality of time stamps. The aggregated fuzzy score is a weighted summation of the values of each of the set of significant features for a corresponding time stamp among the plurality of time stamps of a corresponding machine among the plurality of machines. Further, the method comprises selecting top-n aggregated f-scores for each of the plurality of machines and comparing them with the corresponding claim time stamp, after aligning the aggregated f-score for each of the plurality of machines for each of the plurality of time stamps. Further, the method comprises augmenting the claim time stamp and the telemetry data corresponding to one of the selected top-n aggregated f-scores for a corresponding machine among the plurality of machines if the claim time stamp and the time stamp of the one of the selected top-n aggregated f-scores are within an acceptable variation threshold.

[0012] Thereafter, the method comprises utilizing the telemetry data corresponding to the mapped claim records for creating the training dataset for building a ML model for the machine failure prediction.
[0013] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[0015] FIG. 1 is a functional block diagram of a system, for augmenting telemetry data to claim records to generate training dataset for machine failure prediction, in accordance with some embodiments of the present disclosure.
[0016] FIG. 2A, FIG. 2B and FIG. 2C (collectively referred to as FIG. 2) are a flow diagram illustrating a method for augmenting telemetry data to claim records to generate a training dataset for machine failure prediction, using the system of FIG. 1, in accordance with some embodiments of the present disclosure.
[0017] FIG. 3 depicts a sample Data Driven Dependency Graph (D3G) used for feature selection, in accordance with some embodiments of the present disclosure.
[0018] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems and devices embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

DETAILED DESCRIPTION OF EMBODIMENTS
[0019] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[0020] State of the art techniques have worked on mapping machine failure claim records to sensor acquired data. However, technical challenges exist in practical scenarios, where the recorded claim dates do not necessarily relate to the sensor/telemetry data of the actual date of failure of the machine. Embodiments of the present disclosure provide a method and system for augmenting telemetry data to claim records for generating training datasets for machine failure prediction. The augmentation utilizes a unique fuzzy score (f-score) approach to indicate proximity of collected time stamped telemetry datasets of each machine among a plurality of machines to be annotated as machine failure instance data. Top-2 f-scores of each machine that map to recorded claim dates for the corresponding machine are identified as the most relevant machine failure telemetry datasets. Thus, the method provides more accurate identification of machine failure telemetry data, which enhances the accuracy of the annotated data and improves ML model building for accurate machine failure prediction with fewer false alarms. This matters in practical scenarios and real environments, where the availability of abundant training data and the task of obtaining true machine failure data that most closely maps to the recorded claim dates remain a challenge.
[0021] Referring now to the drawings, and more particularly to FIGS. 1 through 3, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.

[0022] FIG. 1 is a functional block diagram of a system 100 for augmenting telemetry data to claim records to generate training dataset for machine failure prediction, in accordance with some embodiments of the present disclosure.
[0023] In an embodiment, the system 100 includes a processor(s) 104, communication interface device(s), alternatively referred as input/output (I/O) interface(s) 106, and one or more data storage devices or a memory 102 operatively coupled to the processor(s) 104. The system 100 with one or more hardware processors is configured to execute functions of one or more functional blocks of the system 100.
[0024] Referring to the components of system 100, in an embodiment, the processor(s) 104, can be one or more hardware processors 104. In an embodiment, the one or more hardware processors 104 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 104 are configured to fetch and execute computer-readable instructions stored in the memory 102. In an embodiment, the system 100 can be implemented in a variety of computing systems including laptop computers, notebooks, hand-held devices such as mobile phones, workstations, mainframe computers, servers, and the like.
[0025] The I/O interface(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface to display the generated target images and the like, and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular and the like. In an embodiment, the I/O interface(s) 106 can include one or more ports for connecting to a number of external devices or to another server or devices.
[0026] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
[0027] Further, the memory 102 includes a database 108. Further, the memory 102 includes modules such as a fuzzy score computation module (not shown) and the like. Further, the memory 102 may comprise information pertaining to input(s)/output(s) of each step performed by the processor(s) 104 of the system 100 and methods of the present disclosure. In an embodiment, the database 108 may be external (not shown) to the system 100 and coupled to the system via the I/O interface 106. Functions of the components of the system 100 are explained in conjunction with the flow diagram of FIG. 2.
[0028] FIG. 2A, FIG. 2B and FIG. 2C (collectively referred to as FIG. 2) are a flow diagram illustrating a method 200 for augmenting telemetry data to claim records to generate a training dataset for machine failure prediction, using the system of FIG. 1, in accordance with some embodiments of the present disclosure.
[0029] In an embodiment, the system 100 comprises one or more data storage devices or the memory 102 operatively coupled to the processor(s) 104 and is configured to store instructions for execution of steps of the method 200 by the processor(s) or one or more hardware processors 104. The steps of the method 200 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIG. 1 and the steps of the flow diagram as depicted in FIG. 2, followed by a use case example. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods, and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
[0030] Referring to the steps of the method 200, at step 202 of the method 200, the one or more hardware processors 104 obtain a plurality of claim time stamps from the claim records of a plurality of machines. The plurality of claim time stamps, also interchangeably referred to as claim records, correspond to a recorded machine failure. Thus, the above claim records refer to manually recorded entries or dates of failure of any machine for which corresponding telemetry data is gathered.
[0031] At step 204 of the method 200, the one or more hardware processors 104 obtain the telemetry data of the plurality of machines. The telemetry data is recorded by a plurality of sensors that capture a plurality of parameters of the plurality of machines while the plurality of machines are in operating condition. The telemetry data is obtained for a plurality of time stamps prior to a claimed date associated with each of the plurality of machines in the claim records. In other words, the telemetry data represents values of the parameters when each of the plurality of machines is in the operating condition, i.e., before the failure.
[0032] Once the claim records and the telemetry data are obtained, at step 206 of the method 200, the one or more hardware processors 104 perform data cleansing on the telemetry data to filter out telemetry data associated with a final set of parameters from among the plurality of parameters, wherein the final set of parameters comprises parameters that have been identified as contributing to a set of significant features in predicting machine failure. The data cleansing is a critical process, as the large volume of telemetry data collected may contain redundant data, blank data and so on, which needs to be removed to derive more accurate data representing the machine failure. The steps of the data cleansing are listed below:
Step 1- Identifying, for each of the plurality of machines, a first set of features from among the plurality of parameters that provide the most relevant information for predicting a failure state of the machine. This first set of features is identified based on domain knowledge, such as from a subject matter expert, and the identification can be automated based on a rule engine implemented for first feature set selection. It can be understood that when any machine is monitored in an Industrial IoT (IIoT) environment, a large number of sensors continuously sense data corresponding to various parameters of the machine, which are later used to derive different insights. When the focus is specifically on machine failure prediction, it is critical to select only those sensors that read relevant parameters contributing significantly to insights on machine failure. This eliminates unwanted data processing and improves data relevancy.
Step 2- Generating a Data Driven Dependency Graph (D3G) of the first set of features to identify a second set of features from among the first set by eliminating, from the first set of features, the features that contribute least to predicting the failure state. Thus, the method enables fine tuning of the first feature set by automatically selecting the second feature set, which comprises the most relevant features. The D3G can use Mutual Information Content (MIC) to construct a tree in which the root indicates the failure and the most influential factors/sensor observations, which have the highest impact on machine failure, appear as the immediate child nodes. The D3G selection technique for selection of features is explained below.
1. The domain expert takes a rough look at the plurality of parameters (sensors) that generate the telemetry data and excludes ‘n’ features (sensors) that contribute least to providing information about machine failure. The total number of features now becomes (N-n), referred to herein as the first set of features.
2. The telemetry data (data set) corresponding to the first set of features (N-n) is given as an input to the D3G selection technique. The first set of features, which correspond to sensors and are referred to as the parameters, are identified as variables. From among the variables, a target variable is identified that specifies the starting point of the D3G graph to be constructed. The target variable is based on the requirements of the end user or client.
3. The D3G selection technique constructs a graph as depicted in FIG. 3, starting from the target variable as an initial node or root node. FIG. 3 depicts D3G generation initiated from a sample target variable, such as a throttle position sensor (throt_pos), to generate parent nodes and child nodes connected with each other via edges. The D3G graph shows the relationships between multiple features and the target variable.
4. The D3G selection technique identifies the strength of relationships (with a value ranging between 0 and 1) between the variables, starting from the target variable. Here each node represents a feature (usually a sensor) and the number on each edge represents the strength of the relationship between the connected nodes (sensors).
5. By default, a node is considered to be disconnected if the strength of its relationships becomes zero. Say there are ‘d’ disconnected nodes. These disconnected nodes are removed to obtain a reduced set of (N-n-d) features.
6. Further, child nodes that have connected edges with parent nodes but whose relationship strength lies below a relationship threshold value can be excluded. Say there are ‘t’ such nodes.
7. Thus, the remaining (N-n-d-t) features, where (N-n-d-t) << N, provide the second set of features.
The D3G selection technique is one approach; however, any feature selection technique available in the art can be used to reduce a large number of features to only the most relevant features.
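As a minimal, non-limiting sketch of items 4 through 7 above, the pruning of the graph may be expressed as follows. The sketch assumes that the relationship strengths have already been computed (for example from mutual information) and are supplied as a mapping from (parent, child) pairs to values in [0,1]. The function name, the threshold of 0.2 and all sensor names other than throt_pos are hypothetical.

```python
from collections import deque


def select_second_feature_set(edge_strengths, target, threshold=0.2):
    """Return the features that stay connected to the target variable.

    edge_strengths maps (parent, child) -> relationship strength in [0, 1].
    Edges of zero strength are treated as disconnected, and edges below the
    relationship threshold are excluded, as in items 5 and 6 of the list.
    """
    adjacency = {}
    for (parent, child), strength in edge_strengths.items():
        if strength > 0.0 and strength >= threshold:
            adjacency.setdefault(parent, []).append(child)

    # Breadth-first walk from the root (target variable); anything that is
    # no longer reachable after pruning is dropped from the feature set.
    selected, seen, queue = [], {target}, deque([target])
    while queue:
        node = queue.popleft()
        for child in adjacency.get(node, []):
            if child not in seen:
                seen.add(child)
                selected.append(child)
                queue.append(child)
    return selected


# Hypothetical strengths around the sample target variable of FIG. 3.
edge_strengths = {
    ("throt_pos", "rpm"): 0.8,
    ("throt_pos", "coolant_temp"): 0.1,  # below threshold, excluded
    ("rpm", "fuel_rate"): 0.5,
    ("throt_pos", "cabin_light"): 0.0,   # disconnected, removed
}
second_set = select_second_feature_set(edge_strengths, "throt_pos")
```

One consequence of this design choice is that a feature reachable only through an excluded node is also dropped; whether that is desirable depends on the use case.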
Step 3- Trimming the second set of features to obtain the set of significant features by eliminating a) NULL columns that correspond to empty data for one or more time stamps and b) duplicate features that correspond to repeated or redundant information within the telemetry data.
Step 4- Converting Boolean flags in the second set of features from TRUE or FALSE to binary values of '0' or '1', wherein the Boolean flags represent sensor readings indicating system generated alerts recorded as alarm information in the telemetry data, wherein the alarms indicate one of: a machine failure indication, and an abnormality not related to machine failure.
Step 5-normalizing values of each of the set of significant features corresponding to each of the plurality of machines in the range of [0,1]. [0033] In a use case example of machine failure scenario such as vehicle engine failure, the set of significant features identified includes i) a total mileage

covered , ii) a total time travelled, (iii) a total break time, (iv) an effective time travelled obtained from the total time travelled and the total break time, and (v) average speed obtained from the total mileage covered and the effective time travelled. Each of the set of significant features of each vehicle engine of a plurality of vehicles under observation are computed for each vehicle per day and are normalized in the range of [0,1].
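Steps 3 through 5 of the data cleansing can be illustrated with a short sketch. Several details are assumptions made for illustration: the column names and sample values are hypothetical, NULL is represented by Python's None, and min-max scaling is used for Step 5 because the text only states that values are normalized into [0,1] without naming the scaling method.

```python
def trim_columns(table):
    """Step 3: drop columns with any empty value and columns repeating an earlier one."""
    kept, seen = {}, set()
    for name, values in table.items():
        if any(v is None for v in values):
            continue  # NULL column: empty data for at least one time stamp
        key = tuple(values)
        if key in seen:
            continue  # duplicate feature: same readings as an earlier column
        seen.add(key)
        kept[name] = list(values)
    return kept


def booleans_to_binary(table):
    """Step 4: map TRUE/FALSE alarm flags to 1/0 and leave numeric values alone."""
    return {name: [int(v) if isinstance(v, bool) else v for v in values]
            for name, values in table.items()}


def min_max_normalize(values):
    """Step 5: scale values into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical telemetry for one machine over three time stamps.
raw = {
    "total_mileage": [1688, 1697, 1403],
    "total_time": [25814, 6243, 5670],
    "mileage_copy": [1688, 1697, 1403],   # duplicate of total_mileage
    "def_pressure": [None, 3.1, 3.0],     # empty for one time stamp
    "alarm_flag": [True, False, False],
}
clean = booleans_to_binary(trim_columns(raw))
normalized = {name: min_max_normalize(values) for name, values in clean.items()}
print(normalized)
```

Normalization is applied per machine here, so each machine's features are scaled against that machine's own minimum and maximum.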
[0034] Once the data is cleansed and normalized, at step 208, the one or more hardware processors 104 compute fuzzy values, from the normalized values, for each of the set of significant features corresponding to each of the plurality of machines obtained for the plurality of time stamps. The fuzzy values can be obtained based on fuzzy rules. In an example scenario, the fuzzy values obtained using defined fuzzy rules for the set of significant features for the vehicle engine comprise:
i) f1 : 1 - Total Mileage (normalized)
ii) f2 : 1 - EXP(-1 * Total_Time)
iii) f3 : 1 - EXP(-1 * Total Break Time)
iv) f4 : 1 - EXP(-1 * Effective Time)
v) f5 : 1 - Avg. Speed (normalized)
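The five fuzzy rules above translate directly into code. In this minimal sketch the function name fuzzy_values is hypothetical, and all five inputs are assumed to be the normalized values from step 206. The sample input is the 7/16/2020 row for v1 from Table 2, and the result agrees, to the printed precision, with the first row of Table 3.

```python
import math


def fuzzy_values(mileage, total_time, break_time, effective_time, avg_speed):
    """Apply rules f1..f5 to one row of normalized features (all in [0, 1])."""
    return (
        1.0 - mileage,                    # f1
        1.0 - math.exp(-total_time),      # f2
        1.0 - math.exp(-break_time),      # f3
        1.0 - math.exp(-effective_time),  # f4
        1.0 - avg_speed,                  # f5
    )


# Normalized values for v1 on 7/16/2020, as listed in Table 2.
row = (0.03417, 0.260058, 0.444869, 0.005633, 0.30756479)
f1, f2, f3, f4, f5 = fuzzy_values(*row)
print(round(f1, 6), round(f2, 6), round(f3, 6), round(f4, 6), round(f5, 6))
```

Because the inputs lie in [0,1], the exponential rules f2 through f4 are bounded above by 1 - EXP(-1), about 0.632121, which is the largest value that appears in those columns of Table 3.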
[0035] At step 210, the one or more hardware processors 104 compute an aggregated fuzzy score (f-score) indicative of the failure state of the machine for each of the plurality of machines for each of the plurality of time stamps. The aggregated fuzzy score is a weighted summation of the fuzzy values of each of the set of significant features for a corresponding time stamp among the plurality of time stamps of a corresponding machine among the plurality of machines. The plurality of weights in the weighted summation, one corresponding to each of the fuzzy values, is obtained using a neural network, so that the aggregated f-score provides a fuzzy multi-factor analysis. For the example of the vehicle engine referred to above, the aggregated f-score is calculated per vehicle engine per day as f-score = f1*w1 + f2*w2 + f3*w3 + … + fn*wn, wherein w1 through wn are the weights defined for the weighted summation. These weights can be identified using a neural network or specified explicitly by a domain expert. In a typical realization, the weights of the neural network are initialized by the domain experts, and the neural network is then used to compute the final weights for each factor (each feature f1 through f5). The fuzzy values and the aggregated f-score are computed by the fuzzy score computation module.
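The weighted summation can be written as a one-line function. The sketch below is illustrative: the function name is hypothetical, and the equal weights of 0.2 are an assumption, since the weight values are not stated in the text. Equal weights do, however, reproduce the f-score listed in the first row of Table 3 from the five fuzzy values in that row.

```python
def aggregated_f_score(fuzzy, weights):
    """Weighted summation f1*w1 + f2*w2 + ... + fn*wn."""
    if len(fuzzy) != len(weights):
        raise ValueError("need exactly one weight per fuzzy value")
    return sum(f * w for f, w in zip(fuzzy, weights))


# Fuzzy values from the first row of Table 3.
fuzzy = [0.96583, 0.228993, 0.359092, 0.005618, 0.692435]
# Assumed equal weights; in the described method these would come from a
# neural network or a domain expert.
weights = [0.2] * 5
score = aggregated_f_score(fuzzy, weights)
print(round(score, 6))
```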
[0036] At step 212, the one or more hardware processors 104 select the top-n aggregated f-scores for each of the plurality of machines and compare them with the corresponding claim time stamp, after arranging the aggregated f-scores for each of the plurality of machines across the plurality of time stamps in descending order. In one implementation, the value of n is preconfigured, is derived from domain knowledge, and may be based on implementation-specific requirements. For example, the first two (top-2, where n=2) f-scores are selected as indicative of the machine failure date. Thus, the option to decide ‘n’ makes the system more flexible, adaptive, and agile.
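Selecting the top-n scores amounts to a descending sort followed by a slice. In this sketch the function name is hypothetical, and the sample is a subset of v1's f-scores from Table 3, paired with dates on the assumption that the rows of Table 3 follow the row order of Table 1.

```python
def top_n_scores(scores_by_date, n=2):
    """Return the n (date, f_score) pairs with the highest f-score, best first."""
    ranked = sorted(scores_by_date.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Subset of v1's aggregated f-scores (dates assumed from Table 1's row order).
v1_scores = {
    "7/21/2020": 0.382558082,
    "7/22/2020": 0.692640626,
    "7/23/2020": 0.686659283,
    "7/24/2020": 0.326167868,
    "7/30/2020": 0.594082357,
}
top2 = top_n_scores(v1_scores, n=2)
print(top2)
```

The two dates returned for v1 are the same two dates listed for v1 in Table 4.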
[0037] At step 214, the one or more hardware processors 104 augment the claim time stamp and the telemetry data corresponding to one of the selected top-n aggregated f-scores for a corresponding machine among the plurality of machines if the claim time stamp and the time stamp of that one of the selected top-n aggregated f-scores are within an acceptable variation threshold. The selected top-n fuzzy scores provide multiple options for identifying the failure date. Thus, the failure date can be selected by the system 100 based on a user-defined rule, for example, select (i) the highest fuzzy score having (ii) the failure date nearest to the claim date, but lying within 3 days of variation (the acceptable variation threshold) with respect to the recorded claim date.
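One possible reading of the example rule is sketched here. The function name and the tie-breaking order are assumptions: candidates outside the variation threshold are discarded first, the highest f-score then wins, and proximity to the claim date is used only to break ties between equal scores. Other orderings of the two criteria are equally consistent with the text.

```python
from datetime import date


def pick_failure_date(candidates, claim_date, max_days=3):
    """Pick one (date, f_score) candidate within max_days of the claim date.

    Returns None when no candidate lies within the acceptable variation.
    """
    eligible = [(d, s) for d, s in candidates
                if abs((claim_date - d).days) <= max_days]
    if not eligible:
        return None
    # Highest f-score first; a smaller gap to the claim date breaks ties.
    return max(eligible, key=lambda c: (c[1], -abs((claim_date - c[0]).days)))


claim = date(2020, 7, 25)
# Top-2 candidates per vehicle, as listed in Table 4.
v1_top2 = [(date(2020, 7, 23), 0.686659283), (date(2020, 7, 22), 0.692640626)]
v2_top2 = [(date(2020, 8, 5), 0.568561182), (date(2020, 8, 7), 0.711372364)]
v1_choice = pick_failure_date(v1_top2, claim)
v2_choice = pick_failure_date(v2_top2, claim)
print(v1_choice, v2_choice)
```

With a 3-day threshold, both v1 candidates qualify and the higher-scoring one is chosen, while neither v2 candidate lies within 3 days of the listed claim date, so no mapping is made for v2 under this rule.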
[0038] The method, at step 216, further utilizes the telemetry data corresponding to the mapped claim records for creating the training dataset for building a ML model for the machine failure prediction.
[0039] The method 200 can be better understood through an illustrative use case, wherein data is collected for vehicle engines, say from vehicles having vehicle identification numbers (vin) v1 and v2. Each of the vehicles seamlessly sends data during its operation, captured via the sensors in the IIoT environment. A large number of sensors capture various parameters or features such as:
1) Total_Mileage Total_Time
2) Break Effective_Time
3) Avg Speed (mph)
4) Actual Engine Perc. Torque
5) Actual fuel consumption
6) Battery voltage
7) DEF Pressure DEF Tank Level (T4B)
8) DEF Tank Temp. (T4B)
9) Engine Coolant Temp.
10) Engine Hours Engine Oil Pressure
[0040] Out of these 10 features, only 5 features are significant for machine failure prediction and are identified using the data cleansing steps mentioned above. Thus, the set of significant features identified includes (i) a total mileage covered, (ii) a total time travelled, (iii) a total break time, (iv) an effective time travelled obtained from the total time travelled and the total break time, and (v) average speed obtained from the total mileage covered and the effective time travelled. This is depicted in Table 1 below:
TABLE 1:

vin  Date       Total_Mileage (feature1)  Total_Time (feature2)  Break (feature3)  Effective_Time (feature4)  Avg Speed (mph) (feature5)
v1   7/16/2020  1688                      25814                  21455             4359                       1394.081
v1   7/17/2020  1697                      6243                   601               5642                       1082.808
v1   7/18/2020  1403                      5670                   600               5070                       996.213
v1   7/19/2020  1119                      4644                   601               4043                       996.3888
v1   7/20/2020  1540                      5316                   601               4715                       1175.822
v1   7/21/2020  4952                      31024                  25312             5712                       3121.008
v1   7/22/2020  3554                      78090                  41712             36378                      351.7071
v1   7/23/2020  230                       73131                  12994             60137                      13.76856
v1   7/24/2020  22304                     35534                  8511              27023                      2971.336
v1   7/25/2020  10949                     54127                  45371             8756                       4501.645
v1   7/26/2020  1423                      5246                   601               4645                       1102.863
v1   7/27/2020  10178                     33328                  7476              25852                      1417.329
v1   7/28/2020  27981                     83478                  47479             35999                      2798.178
v1   7/29/2020  42899                     85517                  41704             43813                      3524.899
v1   7/30/2020  18124                     86049                  43488             42561                      1533.009
v2   8/4/2020   4455                      30527                  10150             20377                      787.0638
v2   8/5/2020   2447                      67915                  35446             32469                      271.3111
v2   8/6/2020   2073                      33139                  8744              24395                      305.9151
v2   8/7/2020   974                       82597                  50252             32345                      108.4062
v2   8/8/2020   725                       4886                   1611              3275                       796.9466
v2   8/9/2020   106                       4186                   1324              2862                       133.3333
v2   8/10/2020  4823                      31921                  3366              28555                      608.0476
v2   8/11/2020  0                         11888                  9416              2472                       0
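The two derived columns of Table 1 can be reproduced from the three measured ones. The sketch below rests on an inference, not on anything the text states: the tabulated Avg Speed values are consistent with Total_Mileage divided by Effective_Time and multiplied by 3600, which suggests the time columns are in seconds. The function names are hypothetical.

```python
def effective_time(total_time, break_time):
    """Effective time travelled = total time travelled - total break time."""
    return total_time - break_time


def average_speed(total_mileage, effective_time_s):
    """Average speed per hour, assuming effective_time_s is in seconds."""
    if effective_time_s == 0:
        return 0.0  # avoid division by zero for a vehicle that never moved
    return total_mileage / effective_time_s * 3600


# v1 on 7/16/2020 from Table 1: mileage 1688, total time 25814, break 21455.
eff = effective_time(25814, 21455)
speed = average_speed(1688, eff)
print(eff, round(speed, 3))
```

The same formula gives 0 for the v2 row of 8/11/2020, where the total mileage is 0.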
[0041] Table 2 below depicts the normalized fuzzy values f1 through f5 for each of feature1 through feature5.
TABLE 2:

vin  Date       f1        f2        f3        f4        f5
v1   7/16/2020  0.03417   0.260058  0.444869  0.005633  0.30756479
v1   7/17/2020  0.034381  0.019643  2.13E-05  0.028506  0.23820599
v1   7/18/2020  0.027491  0.012604  0         0.018309  0.218910783
v1   7/19/2020  0.020835  0         2.13E-05  0         0.218949956
v1   7/20/2020  0.030701  0.008255  2.13E-05  0.01198   0.258931681
v1   7/21/2020  0.110666  0.324059  0.527144  0.029754  0.692363119
v1   7/22/2020  0.077902  0.90223   0.876981  0.576443  0.075300323
v1   7/23/2020  0         0.841312  0.264383  1         0
v1   7/24/2020  0.517331  0.379461  0.168754  0.409669  0.659012627
v1   7/25/2020  0.251213  0.607862  0.955033  0.08402   1
v1   7/26/2020  0.027959  0.007395  2.13E-05  0.010732  0.24267487
v1   7/27/2020  0.233143  0.352362  0.146675  0.388794  0.312745014
v1   7/28/2020  0.650378  0.968417  1         0.569687  0.620429163
v1   7/29/2020  1         0.993465  0.876811  0.708988  0.782359054
v1   7/30/2020  0.419368  1         0.914866  0.686669  0.338521056
v2   8/4/2020   0.923699  0.335935  0.180388  0.596893  0.987599271
v2   8/5/2020   0.507361  0.812756  0.697392  1         0.340438254
v2   8/6/2020   0.429815  0.369247  0.151651  0.73084   0.383859044
v2   8/7/2020   0.201949  1         1         0.995866  0.136026993
v2   8/8/2020   0.150321  0.008927  0.005866  0.026769  1
v2   8/9/2020   0.021978  0         0         0.013001  0.167305236
v2   8/10/2020  1         0.353713  0.041735  0.86952   0.76297164
v2   8/11/2020  0         0.098226  0.165386  0         0
[0042] Table 3 below depicts the aggregated f-score calculation based on individual weights w1 through w5 assigned to corresponding features f1 through f5.
TABLE 3:

vin  f1*w1     f2*w2     f3*w3     f4*w4     f5*w5     f-score = f1*w1 + f2*w2 + f3*w3 + f4*w4 + f5*w5
v1   0.96583   0.228993  0.359092  0.005618  0.692435  0.450393456
v1   0.965619  0.019451  2.13E-05  0.028103  0.761794  0.354997709
v1   0.972509  0.012525  0         0.018142  0.781089  0.356853012
v1   0.979165  0         2.13E-05  0         0.78105   0.352047315
v1   0.969299  0.008221  2.13E-05  0.011908  0.741068  0.346103533
v1   0.889334  0.276792  0.409712  0.029315  0.307637  0.382558082
v1   0.922098  0.594336  0.583963  0.438107  0.9247    0.692640626
v1   1         0.568855  0.23232   0.632121  1         0.686659283
v1   0.482669  0.31577   0.155283  0.33613   0.340987  0.326167868
v1   0.748787  0.455486  0.615201  0.080587  0         0.38001216
v1   0.972041  0.007368  2.13E-05  0.010675  0.757325  0.3494859
v1   0.766857  0.296974  0.136426  0.322126  0.687255  0.441927491
v1   0.349622  0.620316  0.632121  0.434297  0.379571  0.483185332
v1   0         0.629709  0.583892  0.507858  0.217641  0.387819947
v1   0.580632  0.632121  0.59943   0.49675   0.661479  0.594082357
v2   0.076301  0.28533   0.165053  0.449481  0.012401  0.19771324
v2   0.492639  0.556366  0.502118  0.632121  0.659562  0.568561182
v2   0.570185  0.308745  0.140712  0.518496  0.616141  0.430855672
v2   0.798051  0.632121  0.632121  0.630597  0.863973  0.711372364
v2   0.849679  0.008888  0.005849  0.026414  0         0.178165805
v2   0.978022  0         0         0.012917  0.832695  0.364726778
v2   0         0.297924  0.040876  0.580847  0.237028  0.231335066
v2   1         0.093556  0.152433  0         1         0.44919788
[0043] Table 4 below depicts the aggregated f-score computed with selection of top-2 f-scores for each vehicle (v1 and v2).

TABLE 4:

vin Date Claim Date F-Score
v1 7/23/2020 7/25/2020 0.686659283
v1 7/22/2020 7/25/2020 0.692640626
v2 8/5/2020 7/25/2020 0.568561182
v2 8/7/2020 7/25/2020 0.711372364
[0044] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[0045] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means, and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.

[0046] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[0047] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[0048] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[0049] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

We Claim:
1. A processor implemented method (200) for augmenting telemetry data to claim records to create training dataset for machine failure prediction, the method comprising:
obtaining (202), by one or more hardware processors, a plurality of claim time stamps from the claim records of a plurality of machines, wherein the plurality of claim time stamps correspond to a recorded machine failure;
obtaining (204), by the one or more hardware processors, the telemetry data of the plurality of machines, wherein the telemetry data is recorded by a plurality of sensors that capture a plurality of parameters of each of the plurality of machines while the plurality of machines are in operating condition, wherein the telemetry data is obtained for a plurality of time stamps prior to a claimed date associated with each of the plurality of machines in the claim records;
performing (206), by the one or more hardware processors, data cleansing on the telemetry data to filter out a final set of parameters from among the plurality of parameters that contribute as a set of significant features in predicting machine failure, the data cleansing comprising:
a) identifying, for each of the plurality of machines, a first set of features from among the plurality of parameters that provide most relevant information for predicting failure state of the machine, wherein the first set of features are identified based on domain knowledge (206a);
b) generating a Data Driven Dependency Graph (D3G) of the first set of features to identify a second set of features from among the first set of features by eliminating the least contributing features in predicting the failure state, from the first set of features (206b);
c) trimming the second set of features to obtain the set of significant features by eliminating a) NULL columns that correspond to empty data for one or more time stamps and b) duplicate features that correspond to repeated or redundant information within the telemetry data;
d) converting Boolean flags in the second set of features, from TRUE or FALSE to binary values of '0' or '1', wherein the Boolean flags represent sensor readings indicating system generated alerts recorded as alarms information in the telemetry data, wherein the alarms indicate one of: a machine failure indication, and an abnormality not related to machine failure; and
e) normalizing values of each of the set of significant features corresponding to each of the plurality of machines in the range of [0,1];
computing (208), by the one or more hardware processors, fuzzy values, from the normalized values, for each of the set of significant features corresponding to each of the plurality of machines obtained for the plurality of time stamps;
computing (210), by the one or more hardware processors, an aggregated fuzzy score (f-score) indicative of failure state of machine for each of the plurality of machines for each of the plurality of time stamps, wherein the aggregated fuzzy score is a weighted summation of the values of each of the set of significant features for a corresponding time stamp among a plurality of time stamps of a corresponding machine among the plurality of machines;
selecting (212), by the one or more hardware processors, top-n aggregated f-scores, for each of the plurality of machines and comparing them with the corresponding claim time stamp, after aligning the aggregated f-score for each of the plurality of machines for each of the plurality of time stamps; and
augmenting (214), by the one or more hardware processors, the claim time stamp and the telemetry data corresponding to one of the selected top-n aggregated f-scores for a corresponding machine among the plurality of machines if the claim time stamp and the time stamp of one of the selected top-n aggregated f-scores are within an acceptable variation threshold.
2. The method as claimed in claim 1 further comprising, utilizing the telemetry data corresponding to the mapped claim records for creating the training dataset for building a ML model for the machine failure prediction (216).
3. The method as claimed in claim 1, wherein a plurality of weights, in the weighted summation, corresponding to each of the fuzzy values to generate the aggregated f-score providing a fuzzy multi factor analysis is obtained using Neural network.
4. The method as claimed in claim 1, wherein computing fuzzy values is based on a set of fuzzy rules.
5. The method as claimed in claim 1, wherein the set of significant features when the machine is a vehicle engine comprises (i) a total mileage covered, (ii) a total time travelled, (iii) a total break time, (iv) an effective time travelled obtained from the total time travelled and the total break time, and (v) average speed obtained from the total mileage covered and the effective time travelled, wherein each of the set of significant features of each vehicle engine of a plurality of vehicles under observation are computed for each vehicle per day and are normalized in the range of [0,1].
6. The method as claimed in claim 5, wherein the fuzzy values for the set of significant features for the vehicle engine comprise:
i) f1 : 1 - Total Mileage (normalized),
ii) f2 : 1 - EXP(-1 * Total_Time),
iii) f3 : 1 - EXP(-1 * Total Break Time),
iv) f4 : 1 - EXP(-1 * Effective Time),
v) f5 : 1 - Avg. Speed (normalized),
wherein the aggregated f-score per feature is calculated per vehicle engine per day as f-score = f1*w1 + f2*w2 + f3*w3 + … + fn*wn, wherein w1 through wn are weights defined for the weighted summation values.
7. A system (100) for augmenting telemetry data to claim records to create training dataset for machine failure prediction, the system (100) comprising: a memory (102) storing instructions; one or more Input/Output (I/O) interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more I/O interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
obtain a plurality of claim time stamps from the claim records of a plurality of machines, wherein the plurality of claim time stamps correspond to a recorded machine failure;
obtain the telemetry data of the plurality of machines, wherein the telemetry data is recorded by a plurality of sensors that capture a plurality of parameters of each of the plurality of machines while the plurality of machines are in operating condition, wherein the telemetry data is obtained for a plurality of time stamps prior to a claimed date associated with each of the plurality of machines in the claim records;
perform data cleansing on the telemetry data to filter out a final set of parameters from among the plurality of parameters that contribute as a set of significant features in predicting machine failure, the data cleansing comprising:
a) identifying, for each of the plurality of machines, a first set of features from among the plurality of parameters that provide most relevant information for predicting failure state of the machine, wherein the first set of features are identified based on domain knowledge;
b) generating a Data Driven Dependency Graph (D3G) of the first set of features to identify a second set of features from among the first set of features by eliminating the least contributing features in predicting the failure state, from the first set of features;
c) trimming the second set of features to obtain the set of significant features by eliminating a) NULL columns that correspond to empty data for one or more time stamps and b) duplicate features that correspond to repeated or redundant information within the telemetry data;
d) converting Boolean flags in the second set of features, from TRUE or FALSE to binary values of '0' or '1', wherein the Boolean flags represent sensor readings indicating system generated alerts recorded as alarms information in the telemetry data, wherein the alarms indicate one of: a machine failure indication, and an abnormality not related to machine failure; and
e) normalizing values of each of the set of significant features corresponding to each of the plurality of machines in the range of [0,1];
compute fuzzy values, from the normalized values, for each of the set of significant features corresponding to each of the plurality of machines obtained for the plurality of time stamps;
compute an aggregated fuzzy score (f-score) indicative of failure state of machine for each of the plurality of machines for each of the plurality of time stamps, wherein the aggregated fuzzy score is a weighted summation of the values of each of the set of significant features for a corresponding time stamp among a plurality of time stamps of a corresponding machine among the plurality of machines;
select top-n aggregated f-scores, for each of the plurality of machines and compare them with the corresponding claim time stamp, after aligning the aggregated f-score for each of the plurality of machines for each of the plurality of time stamps; and

augment the claim time stamp and the telemetry data corresponding to one of the selected top-n aggregated f-scores for a corresponding machine among the plurality of machines if the claim time stamp and the time stamp of one of the selected top-n aggregated f-scores are within an acceptable variation threshold.
8. The system as claimed in claim 7 further comprising, utilizing the telemetry data corresponding to the mapped claim records for creating the training dataset for building a ML model for the machine failure prediction.
9. The system as claimed in claim 7, wherein a plurality of weights, in the weighted summation, corresponding to each of the fuzzy values to generate the aggregated f-score providing a fuzzy multi factor analysis is obtained using Neural network.
10. The system as claimed in claim 7, wherein computing fuzzy values is based on a set of fuzzy rules.
11. The system as claimed in claim 7, wherein the set of significant features when the machine is a vehicle engine comprises (i) a total mileage covered, (ii) a total time travelled, (iii) a total break time, (iv) an effective time travelled obtained from the total time travelled and the total break time, and (v) average speed obtained from the total mileage covered and the effective time travelled, wherein each of the set of significant features of each vehicle engine of a plurality of vehicles under observation are computed for each vehicle per day and are normalized in the range of [0,1].
12. The system as claimed in claim 11, wherein the fuzzy values for the set of significant features for the vehicle engine comprise:
i) f1 : 1 - Total Mileage (normalized),
ii) f2 : 1 - EXP(-1 * Total_Time),
iii) f3 : 1 - EXP(-1 * Total Break Time),
iv) f4 : 1 - EXP(-1 * Effective Time),
v) f5 : 1 - Avg. Speed (normalized),
and wherein the aggregated f-score per feature is calculated per vehicle engine per day as f-score = f1*w1 + f2*w2 + f3*w3 + … + fn*wn, wherein w1 through wn are weights defined for the weighted summation values.

Documents

Application Documents

# Name Date
1 202221013458-STATEMENT OF UNDERTAKING (FORM 3) [11-03-2022(online)].pdf 2022-03-11
2 202221013458-REQUEST FOR EXAMINATION (FORM-18) [11-03-2022(online)].pdf 2022-03-11
3 202221013458-PROOF OF RIGHT [11-03-2022(online)].pdf 2022-03-11
4 202221013458-FORM 18 [11-03-2022(online)].pdf 2022-03-11
5 202221013458-FORM 1 [11-03-2022(online)].pdf 2022-03-11
6 202221013458-FIGURE OF ABSTRACT [11-03-2022(online)].jpg 2022-03-11
7 202221013458-DRAWINGS [11-03-2022(online)].pdf 2022-03-11
8 202221013458-DECLARATION OF INVENTORSHIP (FORM 5) [11-03-2022(online)].pdf 2022-03-11
9 202221013458-COMPLETE SPECIFICATION [11-03-2022(online)].pdf 2022-03-11
10 202221013458-FORM-26 [22-06-2022(online)].pdf 2022-06-22
11 Abstract1.jpg 2022-07-12
12 202221013458-FER.pdf 2024-08-29
13 202221013458-OTHERS [18-12-2024(online)].pdf 2024-12-18
14 202221013458-FER_SER_REPLY [18-12-2024(online)].pdf 2024-12-18
15 202221013458-CLAIMS [18-12-2024(online)].pdf 2024-12-18

Search Strategy

1 Search_Stratey_202221013458E_23-08-2024.pdf