
Detecting Machine Failure From Unannotated Time Series Data By Identifying Anomaly

Abstract: State of the art techniques for unsupervised anomaly detection use ML based approaches, which are computation intensive, and whose prediction largely depends on a large volume of rightly annotated time series data required during ML model training. Embodiments of the present disclosure provide a method and system for detecting machine failure from unannotated time series data by identifying anomaly. The method disclosed analyzes and processes unannotated time series data received from a plethora of sensors to localize suspected time stamps corresponding to machine failure data using unsupervised statistical analysis. Domain knowledge is utilized to automatically segregate and confirm the suspected time stamps as machine failure data and anomalous data. The identified machine failure data is further validated using validation criteria based on geo-temporal information associated with the identified machine failure data. [To be published with 1B]


Patent Information

Application #
Filing Date
11 March 2022
Publication Number
37/2023
Publication Type
INA
Invention Field
ELECTRONICS
Status
Email
Parent Application

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point Mumbai Maharashtra India 400021

Inventors

1. CHATTOPADHYAY, Tanushyam
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160
2. DAS, Abhisek
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160
3. DUTTA, Suvra
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160
4. KALIRAJ, Senthil Kumar
755 W Big Beaver Rd Suite 800 Troy MI USA 48084
5. GHOSH, Shubhrangshu
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160
6. MISRA, Prateep
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
DETECTING MACHINE FAILURE FROM UNANNOTATED TIME
SERIES DATA BY IDENTIFYING ANOMALY
Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the manner in which it is to be performed.

TECHNICAL FIELD
[001] The embodiments herein generally relate to automated machine failure detection and, more particularly, to a method and system for detecting machine failure from unannotated time series data by identifying anomaly.
BACKGROUND
[002] Machines are a crucial part of most industries. Machines, specifically vehicle engines, are central to the automobile and aviation industries. Remotely and seamlessly monitoring machine health by capturing machine status via a plethora of sensors in an Industrial Internet of Things (IIoT) environment is a growing trend and a need. However, a key concern in the manufacturing and aviation industries is determining anomalies from the large volume of unannotated time series data continuously generated by the sensors attached to the machine, with each sensor providing 24x7 data. This large volume of sensor data is a real challenge for a domain expert, who must manually annotate all the time series data. A major challenge in identifying true machine failure data is that machine parts sometimes undergo only minor maintenance and sometimes need to be replaced entirely. Often, due to lack of context associated with a sensor reading, the readings may falsely indicate machine failure and lead to wrong annotation. Existing methods mostly rely on unsupervised anomaly detection algorithms for detecting machine failures, using signal processing-based approaches and unsupervised ML based approaches. However, Machine Learning (ML) based approaches are computation intensive, and prediction largely depends on a large volume of rightly annotated time series data required during ML model training. As discussed above, automated data annotation is itself a technical challenge. Another existing method, an unsupervised multivariate relational fault detection system for a vehicle, requires a dedicated control system for fault detection, which adds to the hardware requirement and increases the cost of implementation.

SUMMARY
[003] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
[004] For example, in one embodiment, a method for detecting machine failure from unannotated time series data by identifying anomaly is provided. The method includes receiving a time series data captured by a plurality of sensors associated with a machine among a plurality of machines, wherein the time series data of each of the plurality of sensors is unannotated and is arranged in a plurality of rows in accordance with a plurality of time stamps, wherein each of the plurality of rows represents the time series data of each of the plurality of sensors for a corresponding time stamp. Further, the method includes localizing a plurality of suspected rows from among the plurality of rows, wherein the plurality of suspected rows correspond to a plurality of suspected time stamps from among the plurality of time stamps corresponding to a machine failure data of the machine, wherein localizing the plurality of suspected time stamps comprises: A) Uniformly sampling the time series data of each of the plurality of sensors over the plurality of time stamps to obtain a sampled time series data. B) Computing a first order derivative of the sampled time series data of the plurality of rows to obtain a first order time series data of each of the plurality of sensors. C) Identifying whether the first order time series data for each of the plurality of sensors follows a Gaussian distribution or a skewed distribution, wherein if the first order time series data is identified to be following the skewed distribution, a Box-Cox technique is applied on the first order time series data to convert the first order time series data having the skewed distribution to the Gaussian distribution. D) Computing a median and a standard deviation of the first order time series data, of each of the plurality of sensors, having the Gaussian distribution. E) Determining an anomaly threshold of the Gaussian distribution based on the median and the standard deviation for the first order time series data of each of the plurality of sensors. F) Localizing a set of time stamps from among the plurality of time stamps of each of the plurality of sensors that lie below the anomaly threshold of the Gaussian distribution as the plurality of suspected time stamps for each of the plurality of sensors of the machine. Furthermore, the method includes segregating the plurality of suspected time stamps as one of the machine failure data and an anomalous time series data based on a domain knowledge associated with known machine behavior, wherein a majority voting-based ensemble approach is applied among the plurality of sensors to identify the machine failure data, and wherein segregation is performed for the plurality of suspected time stamps that experience a zero-reset condition of one or more of the plurality of sensors corresponding to one or more cumulative parameters of the machine. Further, the method includes validating one or more suspected time stamps, from among the plurality of suspected time stamps, segregated as the machine failure data using a validation criterion based on geo-temporal information associated with the plurality of suspected time stamps.
[005] In another aspect, a system for detecting machine failure from unannotated time series data by identifying anomaly is provided. The system comprises a memory storing instructions; one or more Input/Output (I/O) interfaces; and one or more hardware processors coupled to the memory via the one or more I/O interfaces, wherein the one or more hardware processors are configured by the instructions to receive a time series data captured by a plurality of sensors associated with a machine among a plurality of machines, wherein the time series data of each of the plurality of sensors is unannotated and is arranged in a plurality of rows in accordance with a plurality of time stamps, wherein each of the plurality of rows represents the time series data of each of the plurality of sensors for a corresponding time stamp. Further, the system is configured to localize a plurality of suspected rows from among the plurality of rows, wherein the plurality of suspected rows correspond to a plurality of suspected time stamps from among the plurality of time stamps corresponding to a machine failure data of the machine, wherein localizing the plurality of suspected time stamps comprises: A) Uniformly sampling the time series data of each of the plurality of sensors over the plurality of time stamps to obtain a sampled time series data. B) Computing a first order derivative of the sampled time series data of the plurality of rows to obtain a first order time series data of each of the plurality of sensors. C) Identifying whether the first order time series data for each of the plurality of sensors follows a Gaussian distribution or a skewed distribution, wherein if the first order time series data is identified to be following the skewed distribution, a Box-Cox technique is applied on the first order time series data to convert the first order time series data having the skewed distribution to the Gaussian distribution. D) Computing a median and a standard deviation of the first order time series data, of each of the plurality of sensors, having the Gaussian distribution. E) Determining an anomaly threshold of the Gaussian distribution based on the median and the standard deviation for the first order time series data of each of the plurality of sensors. F) Localizing a set of time stamps from among the plurality of time stamps of each of the plurality of sensors that lie below the anomaly threshold of the Gaussian distribution as the plurality of suspected time stamps for each of the plurality of sensors of the machine. Furthermore, the system is configured to segregate the plurality of suspected time stamps as one of the machine failure data and an anomalous time series data based on a domain knowledge associated with known machine behavior, wherein a majority voting-based ensemble approach is applied among the plurality of sensors to identify the machine failure data, and wherein segregation is performed for the plurality of suspected time stamps that experience a zero-reset condition of one or more of the plurality of sensors corresponding to one or more cumulative parameters of the machine. Further, the system is configured to validate one or more suspected time stamps, from among the plurality of suspected time stamps, segregated as the machine failure data using a validation criterion based on geo-temporal information associated with the plurality of suspected time stamps.
[006] In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions, which when executed by one or more hardware processors cause a method for detecting machine failure from unannotated time series data by identifying anomaly to be performed. The method includes receiving a time series data captured by a plurality of sensors associated with a machine among a plurality of machines, wherein the time series data of each of the plurality of sensors is unannotated and is arranged in a plurality of rows in accordance with a plurality of time stamps, wherein each of the plurality of rows represents the time series data of each of the plurality of sensors for a corresponding time stamp. Further, the method includes localizing a plurality of suspected rows from among the plurality of rows, wherein the plurality of suspected rows correspond to a plurality of suspected time stamps from among the plurality of time stamps corresponding to a machine failure data of the machine, wherein localizing the plurality of suspected time stamps comprises: A) Uniformly sampling the time series data of each of the plurality of sensors over the plurality of time stamps to obtain a sampled time series data. B) Computing a first order derivative of the sampled time series data of the plurality of rows to obtain a first order time series data of each of the plurality of sensors. C) Identifying whether the first order time series data for each of the plurality of sensors follows a Gaussian distribution or a skewed distribution, wherein if the first order time series data is identified to be following the skewed distribution, a Box-Cox technique is applied on the first order time series data to convert the first order time series data having the skewed distribution to the Gaussian distribution. D) Computing a median and a standard deviation of the first order time series data, of each of the plurality of sensors, having the Gaussian distribution. E) Determining an anomaly threshold of the Gaussian distribution based on the median and the standard deviation for the first order time series data of each of the plurality of sensors. F) Localizing a set of time stamps from among the plurality of time stamps of each of the plurality of sensors that lie below the anomaly threshold of the Gaussian distribution as the plurality of suspected time stamps for each of the plurality of sensors of the machine. Furthermore, the method includes segregating the plurality of suspected time stamps as one of the machine failure data and an anomalous time series data based on a domain knowledge associated with known machine behavior, wherein a majority voting-based ensemble approach is applied among the plurality of sensors to identify the machine failure data, and wherein segregation is performed for the plurality of suspected time stamps that experience a zero-reset condition of one or more of the plurality of sensors corresponding to one or more cumulative parameters of the machine. Further, the method includes validating one or more suspected time stamps, from among the plurality of suspected time stamps, segregated as the machine failure data using a validation criterion based on geo-temporal information associated with the plurality of suspected time stamps.
[007] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[008] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[009] FIG. 1A is a functional block diagram of a system, for detecting machine failure from unannotated time series data by identifying anomaly, in accordance with some embodiments of the present disclosure.
[0010] FIG. 1B depicts architectural and process overview of the system of FIG. 1A, in accordance with some embodiments of the present disclosure.
[0011] FIGS. 2A and 2B (collectively referred to as FIG. 2) are a flow diagram illustrating a method for detecting machine failure from unannotated time series data by identifying anomaly, using the system of FIGS. 1A and 1B, in accordance with some embodiments of the present disclosure.
[0012] FIG. 3 depicts a histogram of normalized data of a sensor capturing the idle time reading, in accordance with some embodiments of the present disclosure.
[0013] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems and devices embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
DETAILED DESCRIPTION OF EMBODIMENTS
[0014] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[0015] State of the art techniques for unsupervised anomaly detection use Machine Learning (ML) based approaches, which are computation intensive, and whose prediction largely depends on a large volume of rightly annotated time series data, which is required during training of the ML model. Embodiments of the present disclosure provide a method and system for detecting machine failure from unannotated time series data by identifying anomaly. The method disclosed analyzes and processes unannotated time series data received from a plethora of sensors to localize suspected time stamps corresponding to machine failure data using unsupervised statistical analysis. Domain knowledge is utilized to automatically segregate and confirm the suspected time stamps into machine failure data and anomalous data. The identified machine failure data is further validated using validation criteria based on geo-temporal information associated with the identified machine failure data.
[0016] Referring now to the drawings, and more particularly to FIGS. 1A through 3, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.

[0017] FIG. 1A is a functional block diagram of a system 100 for detecting machine failure from unannotated time series data by identifying anomaly, in accordance with some embodiments of the present disclosure.
[0018] In an embodiment, the system 100 includes a processor(s) 104, communication interface device(s), alternatively referred as input/output (I/O) interface(s) 106, and one or more data storage devices or a memory 102 operatively coupled to the processor(s) 104. The system 100 with one or more hardware processors is configured to execute functions of one or more functional blocks of the system 100.
[0019] Referring to the components of system 100, in an embodiment, the processor(s) 104, can be one or more hardware processors 104. In an embodiment, the one or more hardware processors 104 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 104 are configured to fetch and execute computer-readable instructions stored in the memory 102. In an embodiment, the system 100 can be implemented in a variety of computing systems including laptop computers, notebooks, hand-held devices such as mobile phones, workstations, mainframe computers, servers, and the like.
[0020] The I/O interface(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface to display the generated target images and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular and the like. In an embodiment, the I/O interface (s) 106 can include one or more ports for connecting to a number of external devices or to another server or devices.
[0021] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
[0022] Further, the memory 102 includes a database 108 that stores the unannotated time series data received from a plethora of sensors and the like. Further, the memory 102 includes modules such as a localization module, a machine failure data identification module, a validation module and the like as depicted in FIG. 1B. Further, the memory 102 may comprise information pertaining to input(s)/output(s) of each step performed by the processor(s) 104 of the system 100 and methods of the present disclosure. In an embodiment, the database 108 may be external (not shown) to the system 100 and coupled to the system via the I/O interface 106. Functions of the components of the system 100 are explained in conjunction with FIG. 1B depicting architectural and process overview of the system of FIG. 1A and the flow diagram of FIG. 2.
[0023] FIGS. 2A and 2B (collectively referred to as FIG. 2) are a flow diagram illustrating a method 200 for detecting machine failure from unannotated time series data by identifying anomaly, using the system of FIGS. 1A and 1B, in accordance with some embodiments of the present disclosure.
[0024] In an embodiment, the system 100 comprises one or more data storage devices or the memory 102 operatively coupled to the processor(s) 104 and is configured to store instructions for execution of steps of the method 200 by the processor(s) or one or more hardware processors 104. The steps of the method 200 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIGS. 1A and 1B and the steps of the flow diagram as depicted in FIG. 2. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods, and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.

[0025] Referring to the steps of the method 200, at step 202 of the method 200, the one or more hardware processors 104 receive time series data captured by a plurality of sensors associated with a machine among a plurality of machines in the environment under observation. For example, the machine can be a vehicle engine from among the plurality of vehicle engines whose time series data is to be analyzed for machine (engine) failure detection. The time series data of each of the plurality of sensors is unannotated, and is arranged in a plurality of rows as tabular data in accordance with a plurality of time stamps of the time series data. Each of the plurality of rows represents the time series data of each of the plurality of sensors for a corresponding time stamp. Some of the sensor readings are cumulatively increasing (w.r.t. time) in nature while some are not. For a vehicle engine, the cumulatively increasing sensor readings, also referred to as cumulative records, include total distance travelled, total fuel consumed, and so on, which are not reset after a trip is completed. The sensor data further includes positional information such as latitude and longitude at the time the sensor sensed the data for the corresponding time stamp.
[0026] Let the sensors be represented as S1, S2, ..., Sp (for p sensors), and the time stamps be represented as T1, T2, ..., Tq (for q time stamps) per sensor. Let R(i, j) represent the i-th row and j-th column of the above-mentioned tabular format, which in turn represents the reading of sensor Sj for the time stamp Ti.
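The tabular arrangement R(i, j) described above can be illustrated with a minimal sketch. The sensor names, readings, and the helper `to_table` are hypothetical and are used only for illustration; indices in the code are 0-based, whereas the description counts rows and columns from 1.

```python
from datetime import datetime

# Hypothetical raw input: sensor name -> list of (time stamp, reading) pairs.
raw = {
    "total_distance": [(datetime(2020, 3, 31, 18, 30), 830.0),
                       (datetime(2020, 3, 31, 18, 31), 875.0)],
    "total_fuel": [(datetime(2020, 3, 31, 18, 30), 12.5),
                   (datetime(2020, 3, 31, 18, 31), 12.9)],
}

def to_table(raw):
    """Arrange per-sensor series into rows ordered by time stamp.

    Returns (timestamps, sensors, R) where R[i][j] is the reading of
    sensor j at time stamp i, or None if that sensor has no reading there.
    """
    sensors = sorted(raw)
    timestamps = sorted({t for series in raw.values() for t, _ in series})
    lookup = {name: dict(series) for name, series in raw.items()}
    R = [[lookup[s].get(t) for s in sensors] for t in timestamps]
    return timestamps, sensors, R

timestamps, sensors, R = to_table(raw)
```

Here `R[1][0]` holds the 'total_distance' reading at the second time stamp, which corresponds to R(2, 1) in the notation of the description.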
[0027] At step 204 of the method 200, the one or more hardware processors 104 localize, via the localization module executed by the one or more hardware processors, a plurality of suspected rows from among the plurality of rows corresponding to a plurality of suspected time stamps from among the plurality of time stamps. The plurality of suspected rows that are localized correspond to a machine failure data of the machine. The steps performed by the localization module for localizing the plurality of suspected time stamps are as below:
a) First, the time series data of each of the plurality of sensors is uniformly sampled (204a) over the plurality of time stamps to obtain sampled time series data. The received original time series data for each sensor will not necessarily have values at every time stamp, and hence undergoes uniform sampling using a standard uniform sampling technique to generate uniform data for further processing. Post uniform sampling, the modified time series data in the tabular format is such that any two consecutive time stamps, such as (T2, T3) or (T3, T4), have a constant time gap. For example, uniformly sampled time stamps would be 2020-03-31~18:30:00, 2020-03-31~18:31:00, 2020-03-31~18:32:00, and so on.
b) At step (204b), a first order derivative of the sampled time series data of the plurality of rows is computed to obtain a first order time series data of each of the plurality of sensors. Let the first order derivative for the sensor Sj be Dj. Then Dj = R(i+1, j) - R(i, j) for all i. The method suggests the first order derivative approach because, for sensor readings that are cumulative in nature, the difference between successive readings is more insightful for further analysis than the absolute readings. As mentioned earlier, cumulative type sensors include sensors corresponding to fuel consumption, total distance travelled, etc. For example, before step 204b, the 'total distance travelled' sensor would have consecutive values like ..., 830, 875, 960, 993, ... etc. After the first order derivative, the same column representing the same sensor has the corresponding values ..., 45, 85, 33, ... etc. Thus, after step 204b, the sensor readings available are modified sensor readings.
c) The modified sensor readings for each sensor ideally should follow a Gaussian (i.e., Normal) distribution, or in other words, values corresponding to each column/sensor should follow a Gaussian distribution. Thus, at step (204c), it is identified whether the first order time series data for each of the plurality of sensors follows a Gaussian distribution or a skewed distribution. If the first order time series data is identified to have the skewed distribution, a known Box-Cox technique is applied on the first order time series data to convert the first order time series data having the skewed distribution to the Gaussian distribution, also referred to as the normal distribution. The method 200 applies the skew to normal conversion since normality is an important assumption in the statistical technique, and if the data is somewhat skewed then converting it into Normal form makes it possible to carry out important anomaly detection related tests on the data. FIG. 3 depicts an example histogram on the normalized data for the sensor readings corresponding to 'idle-time'.
d) Post converting the first order time series data associated with all sensors to follow the Gaussian distribution, at step 204d a median and a standard deviation of the first order time series data, of each of the plurality of sensors, is computed. Thus, the median (m) of the distribution and the standard deviation (σ) of the distribution are computed for each of the sensors under consideration.
e) Determine (204e) an anomaly threshold of the Gaussian distribution based on the median and the standard deviation for the first order time series data of each of the plurality of sensors. The anomaly threshold is τ = m - 2σ. Normally, for a normal or Gaussian distribution, the standard statistical approach utilizes the mean (µ) and σ for threshold computation. However, it is observed that, specifically for vehicle engines, because of the replacement of some machine parts the Dj or δs (first order derivative value) drastically reduces at those points, and thus the mean gets biased towards those anomalous points. Thus, the method disclosed utilizes the median instead of the mean, which reduces the bias towards the anomalous points. The anomalous points are those sensor readings whose values strongly deviate from the values of the majority of the sensor readings captured by the respective sensor. As mentioned above, the method uses the median as the measure of central tendency to identify the anomalous points.

f) Localizing (204f) a set of time stamps from among the plurality of time stamps of each of the plurality of sensors that lie below the anomaly threshold of the Gaussian distribution. This set of time stamps is marked as the plurality of suspected time stamps for each of the plurality of sensors of the machine.
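Steps 204a through 204c can be sketched as follows, using only the Python standard library. This is a minimal sketch under stated assumptions, not the disclosed implementation: the description only names a "standard uniform sampling technique" and a "known Box-Cox technique", so the forward-fill of gaps, the skewness limit of 0.5, the shift applied to make values positive, and the grid search that picks the Box-Cox parameter by minimizing absolute skewness (instead of the usual maximum likelihood estimate) are all assumptions made for illustration. All function names are hypothetical.

```python
import math
from datetime import datetime, timedelta

def uniform_sample(series, step=timedelta(minutes=1)):
    """Step 204a: resample (time stamp, value) pairs onto a constant-gap grid.

    Assumption: gaps are filled by carrying the last known value forward.
    """
    series = sorted(series)
    out, idx, last = [], 0, series[0][1]
    t, end = series[0][0], series[-1][0]
    while t <= end:
        while idx < len(series) and series[idx][0] <= t:
            last = series[idx][1]
            idx += 1
        out.append((t, last))
        t += step
    return out

def first_difference(values):
    """Step 204b: Dj = R(i+1, j) - R(i, j) for one sensor column."""
    return [b - a for a, b in zip(values, values[1:])]

def skewness(values):
    """Third standardized moment; close to 0 for symmetric data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

def box_cox(values, lam):
    """Box-Cox transform; every input value must be strictly positive."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def normalize(values, skew_limit=0.5):
    """Step 204c: leave near-symmetric data alone, else apply Box-Cox.

    Assumptions: |skewness| <= skew_limit counts as Gaussian-like, data
    is shifted to be positive, and lambda is chosen from a coarse grid.
    """
    if abs(skewness(values)) <= skew_limit:
        return values
    shift = 1.0 - min(values) if min(values) <= 0 else 0.0
    shifted = [v + shift for v in values]
    lambdas = [k / 10 for k in range(-20, 21)]
    best = min(lambdas, key=lambda lam: abs(skewness(box_cox(shifted, lam))))
    return box_cox(shifted, best)
```

Applied to the 'total distance travelled' values 830, 875, 960, 993 from step 204b, `first_difference` yields 45, 85, 33, matching the example in the description.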
Steps 204d, 204e, 204f are explained with an example and the normalized sensor data of FIG. 3. Consider a single cumulative type of sensor like idle time. At this stage, the values under the column 'idle-time' are normally distributed. In a normal/Gaussian distribution, the majority of the sample values lie within the region specified by μ ± 2σ, where µ and σ are the mean and the standard deviation of the distribution respectively. For the current context, the median (m) is used in place of the mean, so any value which lies outside the range m ± 2σ is a candidate anomaly. For the current context, m - 2σ is treated as the anomaly threshold, and any value which is less than this threshold is a suspected one (suspected time stamp). Referring to FIG. 3, let the median of the 'idle-time' distribution be m = 20000 units and the standard deviation of the 'idle-time' distribution be σ = 9000 units. The threshold here is m - 2σ = 20000 - 18000 = 2000 units. So, any row (i.e., data point with the time stamp) with the corresponding 'idle-time' value less than 2000 units would be suspected. Although the example is shown for a single column ('idle-time' corresponding to a sensor), the same process is followed for all the columns corresponding to the plurality of sensors.
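The thresholding of steps 204d to 204f for one sensor column can be sketched as below. This is an illustrative sketch: the function names are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the description does not say whether the population or the sample estimate is intended.

```python
import statistics

def anomaly_threshold(m, sigma, k=2.0):
    """Step 204e: threshold tau = m - k * sigma, with k = 2 in the description."""
    return m - k * sigma

def suspected_indices(column, k=2.0):
    """Steps 204d-204f for one sensor column of first order values.

    Returns (tau, indices) where indices are the row positions whose value
    lies below tau. Assumption: population standard deviation is used.
    """
    m = statistics.median(column)
    sigma = statistics.pstdev(column)
    tau = anomaly_threshold(m, sigma, k)
    return tau, [i for i, v in enumerate(column) if v < tau]

# Worked example from the description: m = 20000, sigma = 9000.
tau_example = anomaly_threshold(20000, 9000)

# Hypothetical first order values with one sharp drop (e.g. a reset).
tau, suspects = suspected_indices([50, 52, 48, 51, 49, 50, -900, 50, 51, 49])
```

For the worked example, `tau_example` evaluates to 2000, in agreement with m - 2σ above. In the hypothetical column, only the row holding the sharp drop falls below the threshold, and the median stays at 50 even though the mean is pulled far below it by that single point.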
[0028] Once the suspected time stamps are identified, then at step 206, the machine failure data identification module executed by the one or more hardware processors 104 segregates the plurality of suspected time stamps as one of the machine failure data and an anomalous time series data based on the domain knowledge associated with known machine behavior. The segregation or confirmation utilizes a majority voting-based ensemble among the plurality of sensors to identify the machine failure data. The segregation is performed for the plurality of suspected time stamps that experience a zero-reset condition of one or more of the plurality of sensors corresponding to one or more cumulative parameters of the machine. As a result of the previous step, there would be a large collection of suspected time stamps and hence data points (i.e., rows from the tabular view). But many of them might be false positives, as they might represent anomalous/faulty sensor readings or minor vehicle repairs that led to a sensor value reset, and not the vehicle engine replacement (machine failure) which is the ultimate target. So, the collection needs to be refined. One refinement strategy is to consider a time window of 5 hours around each suspected time stamp (i.e., 5 hours before and 5 hours after the suspected time stamp). Within that time window, the following are checked:
a) Whether any of the cumulative type sensors is reset. Generally, the readings of cumulative type sensors (like total fuel consumption, total running hours) keep increasing, but in case of a vehicle engine replacement they are reset to zero and then start increasing again.
b) The number of cumulative type sensors which are reset (also known as the cumulative sensor reset count).
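The two checks above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the sensor names, and the sample readings are hypothetical, and a reset is approximated as any decrease in a cumulative reading (the text describes a reset to zero, but a sampled series may only show the drop).

```python
from datetime import datetime, timedelta

def was_reset(series):
    """True if a cumulative series ever decreases (assumed reset signature)."""
    return any(later < earlier for earlier, later in zip(series, series[1:]))

def cumulative_reset_count(rows, suspected, cumulative,
                           half_window=timedelta(hours=5)):
    """Count cumulative sensors that reset within +/- half_window of `suspected`.

    rows: list of (time stamp, {sensor name: value}) sorted by time.
    cumulative: names of the cumulative type sensors.
    """
    window = [values for t, values in rows if abs(t - suspected) <= half_window]
    return sum(
        was_reset([values[name] for values in window if name in values])
        for name in cumulative
    )

# Made-up readings: fuel_total drops at hour 4, idle_time keeps increasing.
start = datetime(2020, 1, 1, 8, 0)
rows = [
    (start + timedelta(hours=0), {"fuel_total": 500.0, "idle_time": 1000}),
    (start + timedelta(hours=2), {"fuel_total": 520.0, "idle_time": 1100}),
    (start + timedelta(hours=4), {"fuel_total": 3.0, "idle_time": 1200}),
    (start + timedelta(hours=6), {"fuel_total": 9.0, "idle_time": 1300}),
]
suspected = start + timedelta(hours=4)
count = cumulative_reset_count(rows, suspected, ["fuel_total", "idle_time"])
```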
[0029] Only those suspected time stamps which have a large cumulative
sensor reset count are selected (based on the majority voting-based ensemble) to
proceed further. For example, if there are 10 cumulative type sensors, only those
suspected time stamps for which the cumulative sensor reset count >= 5 are
considered further. This data is identified as machine failure data.
[0030] Let m (0<=m<=n) of the n cumulative type sensors have witnessed a zero reset within the time interval of five hours from the suspected time stamp. If m>=n/2, the method disclosed confirms at step 206 that a machine replacement has occurred at the suspected time stamp. Otherwise, even though there is a zero reset, the suspected time stamp is rejected and marked as minor servicing work and not a machine failure.
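The voting rule m >= n/2 can be written as a small decision function. This is a sketch only; the function name and the returned labels are hypothetical, and the guard for n = 0 is an added assumption.

```python
def classify_suspected_time_stamp(m, n):
    """Apply the majority vote: m of n cumulative sensors showed a zero reset."""
    if n <= 0 or not 0 <= m <= n:
        raise ValueError("expected 0 <= m <= n with n > 0")
    # 2 * m >= n is the integer-safe form of m >= n / 2.
    return "machine_failure" if 2 * m >= n else "minor_servicing"

# Example from the text: 10 cumulative sensors, at least 5 resets required.
decision_at_five = classify_suspected_time_stamp(5, 10)
decision_at_four = classify_suspected_time_stamp(4, 10)
```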
[0031] After the machine failure data is identified, at step 208 the system 100 validates one or more time stamps from the machine failure data using a validation criterion based on geo-temporal information associated with the plurality of suspected time stamps. The validation criterion validates the identified machine failure data based on the geo-temporal information by:
a) Confirming whether at least one service center of the machine is detected within a predefined radius from a location associated with a time stamp when the machine experienced the zero-reset condition of the one or more of the plurality of sensors associated with the one or more cumulative parameters. The occurrence of a geographical anomaly is noted on detecting the absence of the at least one service center within the predefined radius; and
b) Computing and determining whether the idle time of the machine at the location, after confirming the presence of one or more service centers, is greater than a repair time threshold, wherein occurrence of a temporal anomaly is noted if the idle time is equal to or less than the repair time threshold.
It is to be noted that simultaneous occurrence of the geographical
anomaly and the temporal anomaly is used to confirm or validate the
machine failure.
[0032] The refined collection of the suspected time stamps is further filtered
using the vehicles’ geographical information (latitude, longitude) and the domain
knowledge. Within the 5-hour window around each suspected time stamp, it is
checked whether the vehicle stays at the same place for a considerably long
duration (i.e., the change in its latitude and longitude readings is insignificantly
small for a sufficiently long duration), e.g., the latitude and longitude change is
within 1 degree for 3 hours. Additionally, it is checked whether there are one or
more vehicle service stations near the latitude and longitude where the vehicle
stays for that duration. The suspected time stamps which satisfy all the above
conditions are finally selected as the vehicle engine replacement points, i.e., the
validated machine failure data.
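The stay-in-place check described above can be sketched as follows. This is an illustration under assumptions: the function name and the track layout are hypothetical, the defaults (1 degree, 3 hours) come from the example in the text, and the change is measured as the spread between the minimum and maximum readings.

```python
from datetime import datetime, timedelta

def stays_in_place(track, max_change_deg=1.0, min_duration=timedelta(hours=3)):
    """True if latitude and longitude both stay within max_change_deg
    for at least min_duration somewhere in the track.

    track: list of (time stamp, latitude, longitude) sorted by time.
    """
    for i, (t_start, _, _) in enumerate(track):
        lats, lons = [], []
        for t, lat, lon in track[i:]:
            lats.append(lat)
            lons.append(lon)
            if (max(lats) - min(lats) > max_change_deg
                    or max(lons) - min(lons) > max_change_deg):
                break  # the vehicle moved too far; try the next start point
            if t - t_start >= min_duration:
                return True
    return False

# Made-up tracks sampled hourly: one parked vehicle, one moving vehicle.
t0 = datetime(2020, 1, 1, 10, 0)
parked = [(t0 + timedelta(hours=h), 39.79, -104.90) for h in range(5)]
moving = [(t0 + timedelta(hours=h), 39.79 + 1.5 * h, -104.90) for h in range(5)]
```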
[0033] Geo-temporal information-based validation: The following steps are executed to validate the confirmed replacement points:
a) List the latitude and longitude coordinates at the failure time stamp.
b) Locate those places in a map application.
c) Search for the service center or maintenance shop nearest to that position using web-based retrieval with any general-purpose web search engine.
d) Check whether any such center is found in the proximity of the suspected region, i.e., within a radius of distance r.
e) The radius r is derived from the confidence level of the positional sensor used (e.g., if the GPS sensor used as the positional sensor gives an error of 100 meters, r = 100 meters is used).
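The radius check in steps d) and e) can be sketched as follows. The text obtains service center locations through a map application and web search; here a pre-collected list of coordinates stands in for that lookup, which is an assumption, as are the function names and the sample coordinates. Distances use the standard haversine great-circle formula with a mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def service_center_within_radius(point, centers, r_m):
    """True if any known service center lies within r_m meters of `point`."""
    lat, lon = point
    return any(haversine_m(lat, lon, c_lat, c_lon) <= r_m
               for c_lat, c_lon in centers)

# Made-up coordinates: one center about 56 m away, one about 1.1 km away.
failure_point = (35.0, -89.0)
near_center = [(35.0005, -89.0)]
far_center = [(35.01, -89.0)]
```

With r = 100 meters, as in the GPS example, the first center would pass the check and the second would not.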
[0034] Temporal validation: Next, compute the time (t) spent at the identified location. This assumes that any machine replacement is a time-consuming event. If the machine part under maintenance is localized at a particular geographical location for a time greater than its average waiting time, it is temporally confirmed as well. The temporal validation process is as below:
a) Compute the average idle time of the machine.
b) Estimate the idle times using a Poisson distribution.
c) If the idle time under suspected maintenance is greater than the statistically obtained threshold from the Poisson distribution, the point is marked as a temporally validated maintenance point.
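One possible reading of steps a) to c) is sketched below. The text does not state how the threshold is obtained from the Poisson distribution, so several choices here are assumptions: idle times are treated as whole-hour counts, the Poisson rate is the average idle time, and the threshold is the smallest k whose cumulative probability reaches a confidence level of 0.95. All function names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def idle_time_threshold(idle_times, confidence=0.95):
    """Smallest k with P(X <= k) >= confidence, where lam = mean idle time."""
    lam = sum(idle_times) / len(idle_times)
    if lam <= 0:
        raise ValueError("average idle time must be positive")
    k = 0
    cdf = poisson_pmf(0, lam)
    while cdf < confidence:
        k += 1
        cdf += poisson_pmf(k, lam)
    return k

def temporally_validated(idle_time, idle_times, confidence=0.95):
    """True if the suspected idle time exceeds the Poisson-derived threshold."""
    return idle_time > idle_time_threshold(idle_times, confidence)

# Made-up history of idle times in hours (mean 2), giving a threshold of 5.
history = [1, 2, 3]
threshold = idle_time_threshold(history)
```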
[0035] Results:

vin | Replacement | before-idle time | after-idle time | before-fuel total | after-fuel total | before-lat | before-long | after-lat | after-long | ReplacementPoints
V1 | 5/23/2020 11:22 | 1492740 | 22848 | 48031 | 14.80644 | 35.07576 | -89.9718 | 35.0761 | -89.9711 | MemphisTruck.com LLC
V2 | 4/29/2020 9:39 | 1203840 | 37548 | 66093.7499 | 1.89609 | 37.24811 | -93.2302 | 37.2473 | -93.22948 | Prime Inc
V3 | 5/24/2020 13:00 | 2229300 | 9000 | 7868 | 1 | 39.79263 | -104.906 | 39.7913 | -104.904 | Blue Beacon Truck Wash of Commerce City

Table 1
[0036] Table 1 above is a snippet of the final result. Not all columns are included, for brevity. Each row represents an engine replacement point found for a vehicle identified by its ‘vin’ (V1, V2, V3). The approximate replacement time is given by the ‘Replacement’ value. As a sample, two cumulative type sensors, namely ‘idle-time’ and ‘fuel-total’, are taken, and their values before and after the engine replacement (within the 5-hour window around the engine replacement point) are given. These values show that the sensors were reset within the time window. The latitude and longitude values before and after the replacement (in the same time window) are also given; they indicate that the vehicle remained at the same place. The ‘ReplacementPoints’ value names the specific nearby vehicle service station where the suspected vehicle engine replacement might have taken place.
[0037] Thus, the method and system disclosed herein provide a semi-supervised statistical approach, which is interpretable, unlike Machine Learning (ML) based approaches, whose ML models are black boxes. As is well understood, explainability of the predictions made by ML models is challenging. Furthermore, unlike the ML based approaches, which are computation and cost intensive, requiring powerful processors and training with large, annotated datasets, the method disclosed can be implemented using general purpose processors and handles unannotated data to derive machine failure information. Additionally, applying domain knowledge to the statistical findings enables more accurate detection of machine failure points for a vehicle. These points make the method disclosed easily implementable, enhancing usability and utility across industries.
[0038] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[0039] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means, and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.

[0040] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[0041] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[0042] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[0043] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

We Claim:
1. A processor implemented method (200), the method comprising:
receiving (202), by one or more hardware processors, a time series data captured by a plurality of sensors associated with a machine among a plurality of machines, wherein the time series data of each of the plurality of sensors is unannotated and is arranged in a plurality of rows in accordance with a plurality of time stamps, wherein each of the plurality of rows represent the time series data of each of the plurality of sensors for a corresponding time stamp;
localizing (204), by the one or more hardware processors, a plurality of suspected rows from among the plurality of rows, wherein the plurality of suspected rows correspond to a plurality of suspected time stamps from among the plurality of time stamps corresponding to a machine failure data of the machine, wherein localizing the plurality of suspected time stamps comprises:
uniformly sampling (204a), the time-series data of each of the plurality of sensors over the plurality of timestamps to obtain a sampled time series data;
computing (204b), a first order derivative of the sampled time series data of the plurality of rows to obtain a first order time series data of each of the plurality of sensors;
identifying (204c), whether the first order time series data for each of the plurality of sensors follows a Gaussian distribution or a skewed distribution, wherein if the first order time series data is identified to be following the skewed distribution, a Box-Cox technique is applied on the first order time series data to convert the first order time series data having the skewed distribution to the Gaussian distribution;

computing (204d), a median and a standard deviation of the first order timeseries data, of each of the plurality of sensors, having the Gaussian distribution;
determining (204e), an anomaly threshold of the Gaussian distribution based on the median and the standard deviation for the first order time series data of each of the plurality of sensors; and
localizing (204f), a set of timestamps from among the plurality of timestamps of each of the plurality of sensors that lie below the anomaly threshold of the Gaussian distribution as the plurality of suspected time stamps for each of the plurality of sensors of the machine;
segregating (206), by the one or more hardware processors, the plurality of suspected time stamps as one of the machine failure data and an anomalous time series data based on a domain knowledge associated with known machine behavior, wherein majority voting-based ensemble approach is applied among the plurality of sensors to identify the machine failure data, and wherein segregation is performed for the plurality of suspected time stamps that experience a zero-reset condition of one or more of the plurality of sensors corresponding to one or more cumulative parameters of the machine; and
validating (208) by the one or more hardware processors, one or more suspected time stamps, from among the plurality of suspected time stamps, segregated as the machine failure data using a validation criterion based on geo-temporal information associated with plurality of suspected time stamps.
2. The method as claimed in claim 1, wherein the one or more cumulative parameters of the machine, when machine is a vehicle engine, comprise fuel consumption, total running hour, total distance travelled that undergo the zero-reset during replacement of the machine post failure.

3. The method as claimed in claim 1, wherein the validation criteria validate
the identified machine failure data based on the geo-temporal information
by:
a) confirming whether at least one service center of the machine is
detected within a predefined radius from a location associated with a
time stamp when the machine experienced the zero-reset condition of
the one or more plurality of sensors associated with the one or more
cumulative parameters, wherein occurrence of a geographical anomaly
is noted on detecting absence of the at least one service center within the
predefined radius; and
b) computing and determining whether idle time of the machine, at the
location, after confirming presence of one or more service centers is
greater than a repair time threshold, wherein occurrence of a time
anomaly is noted if the idle time is equal to or less than the repair time
threshold,
wherein, simultaneous occurrence of the geographical anomaly and the temporal anomaly is used to validate the machine failure.
4. A system (100) comprising:
a memory (102) storing instructions;
one or more Input/Output (I/O) interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the
one or more I/O interfaces (106), wherein the one or more hardware
processors (104) are configured by the instructions to:
receive a time series data captured by a plurality of sensors associated with a machine among a plurality of machines, wherein the time series data of each of the plurality of sensors is unannotated and is arranged in a plurality of rows in accordance with a plurality of time stamps, wherein each of the plurality of rows represent the time series data of each of the plurality of sensors for a corresponding time stamp;

localize a plurality of suspected rows from among the plurality of rows, wherein the plurality of suspected rows correspond to a plurality of suspected time stamps from among the plurality of time stamps corresponding to a machine failure data of the machine, wherein localizing the plurality of suspected time stamps comprises:
uniformly sampling the time-series data of each of the plurality of sensors over the plurality of timestamps to obtain a sampled time series data;
computing a first order derivative of the sampled time series data of the plurality of rows to obtain a first order time series data of each of the plurality of sensors;
identifying whether the first order time series data for each of the plurality of sensors follows a Gaussian distribution or a skewed distribution, wherein if the first order time series data is identified to be following the skewed distribution, a Box-Cox technique is applied on the first order time series data to convert the first order time series data having the skewed distribution to the Gaussian distribution;
computing a median and a standard deviation of the first order timeseries data, of each of the plurality of sensors, having the Gaussian distribution;
determining an anomaly threshold of the Gaussian distribution based on the median and the standard deviation for the first order time series data of each of the plurality of sensors; and
localizing a set of timestamps from among the plurality of timestamps of each of the plurality of sensors that lie below the anomaly threshold of the Gaussian distribution as the plurality of suspected time stamps for each of the plurality of sensors of the machine;
segregate the plurality of suspected time stamps as one of the machine failure data and an anomalous time series data based on a domain knowledge associated with known machine behavior, wherein majority voting-based ensemble approach is applied among the plurality of sensors to identify the machine failure data, and wherein segregation is performed for the plurality of suspected time stamps that experience a zero-reset condition of one or more of the plurality of sensors corresponding to one or more cumulative parameters of the machine; and
validate one or more suspected time stamps, from among the plurality of suspected time stamps, segregated as the machine failure data using a validation criterion based on geo-temporal information associated with plurality of suspected time stamps.
5. The system as claimed in claim 4, wherein the one or more cumulative parameters of the machine, when machine is a vehicle engine, comprise fuel consumption, total running hour, total distance travelled that undergo the zero-reset during replacement of the machine post failure.
6. The system as claimed in claim 4, wherein the validation criteria validate the identified machine failure data based on the geo-temporal information by:
a) confirming whether at least one service center of the machine is
detected within a predefined radius from a location associated with a
time stamp when the machine experienced the zero-reset condition of
the one or more plurality of sensors associated with the one or more
cumulative parameters, wherein occurrence of a geographical anomaly
is noted on detecting absence of the at least one service center within the
predefined radius; and
b) computing and determining whether idle time of the machine, at the
location, after confirming presence of one or more service centers is
greater than a repair time threshold, wherein occurrence of a time
anomaly is noted if the idle time is equal to or less than the repair time
threshold,

wherein, simultaneous occurrence of the geographical anomaly and the temporal anomaly is used to validate the machine failure.

Documents

Application Documents

# Name Date
1 202221013457-STATEMENT OF UNDERTAKING (FORM 3) [11-03-2022(online)].pdf 2022-03-11
2 202221013457-REQUEST FOR EXAMINATION (FORM-18) [11-03-2022(online)].pdf 2022-03-11
3 202221013457-FORM 18 [11-03-2022(online)].pdf 2022-03-11
4 202221013457-FORM 1 [11-03-2022(online)].pdf 2022-03-11
5 202221013457-FIGURE OF ABSTRACT [11-03-2022(online)].jpg 2022-03-11
6 202221013457-DRAWINGS [11-03-2022(online)].pdf 2022-03-11
7 202221013457-DECLARATION OF INVENTORSHIP (FORM 5) [11-03-2022(online)].pdf 2022-03-11
8 202221013457-COMPLETE SPECIFICATION [11-03-2022(online)].pdf 2022-03-11
9 202221013457-FORM-26 [22-06-2022(online)].pdf 2022-06-22
10 Abstract1.jpg 2022-07-12
11 202221013457-Proof of Right [08-09-2022(online)].pdf 2022-09-08
12 202221013457-FER.pdf 2024-10-22
13 202221013457-FER_SER_REPLY [17-04-2025(online)].pdf 2025-04-17

Search Strategy

1 SearchHistoryE_18-10-2024.pdf