
System And Method For Determining Missing Value By Bidirectional Rnn And Fully Connected Neural Network

Abstract: There is a challenge in populating missing sensor data for better analytics prior to applying diagnostic techniques. This disclosure relates to a method to determine a missing value by a bidirectional RNN and a fully connected neural network. At least one data set with a time stamp is preprocessed to obtain at least one ordered dataset. At least one target sensor is identified from one or more sensors based on the at least one ordered dataset. A forward data set, a backward data set, a current data set, and a target data set are generated from a windowed data set to obtain a complete final data set. The forward data set and the backward data set are fed through a forward RNN layer and a backward RNN layer to obtain a set of forward activations and a set of backward activations respectively. At least one missing value is determined based on one or more trained parameters. [To be published with FIG. 2]


Patent Information

Application #
Filing Date
14 October 2021
Publication Number
16/2023
Publication Type
INA
Invention Field
ELECTRONICS
Status
Email
kcopatents@khaitanco.com
Parent Application

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point Mumbai Maharashtra India 400021

Inventors

1. DUTTA, Suvra
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160
2. DAS, Abhisek
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160
3. GHOSH, Shubhrangshu
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160
4. MISRA, Prateep
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160
5. CHATTOPADHYAY, Tanushyam
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
SYSTEM AND METHOD FOR DETERMINING MISSING VALUE BY BIDIRECTIONAL RNN AND FULLY CONNECTED NEURAL
NETWORK
Applicant
Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the manner in which it is to be performed.

TECHNICAL FIELD [001] The disclosure herein generally relates to the Internet of things (IoT), and, more particularly, to a method and system for determining a missing value by a bidirectional recurrent neural network (RNN) and a fully connected neural network.
BACKGROUND
[002] In health care and Internet of things (IoT), which involves logistics, utilities (e.g., energy, oil, and gas), transportation, mining and metals, aviation, and other manufacturing industrial sectors, a very common problem is to prepare data for executing a task when values are missing in the input data. In a typical example, a machine part is normally connected with a plethora of sensors capturing location, temperature, speed, odometer, and telemetric data such as readings from different engine part sensors. In many use cases there is a requirement to track and analyze the operating condition of a machine (or machine part) moving over time, such as the engine of a vehicle or an aircraft. All these sensors play an important role in such analytics as they can be used for (a) prediction of critical sensor values such as mileage (odometer reading), (b) monitoring the operating condition of the machine, (c) tracking a route of the vehicle and recommending an optimal path, and (d) providing refueling or maintenance alerts. But in most of the cases, continuous records from the sensors are not available for applying analytics algorithms.
[003] For example, some engines log their odometer reading only during the maintenance stage. Thus, the intermediate values are not available. Similarly, some input parameters are missed because of packet loss due to poor network (e.g., internet) connectivity. The existing linear interpolation technique fails when the captured data is not uniformly sampled. Further, applying a physics-based model works only for specific sensors and does not generalize across most of the sensors. Also, it is often not possible for a domain expert to provide a suitable physics-based model when the system is inherently complex and the number of sensors is in the order of 10^3. Existing statistical machine learning methods require manual feature extraction, are too simplistic, and do not capture complex patterns in time series data. Existing models are unidirectional, and they do

not consider patterns present in future data. As a result, the performance of the existing models is poor. In a typical scenario, the predicted odometer reading becomes greater than the actual odometer reading noted during servicing at a future date.
SUMMARY [004] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a processor implemented method of detecting missing value by ensembled bidirectional recurrent neural network (RNN) and fully connected neural network is provided. The processor implemented method includes at least one of: preprocessing, via one or more hardware processors, at least one data set with a time stamp to obtain an at least one ordered dataset; identifying, via the one or more hardware processors, at least one target sensor from one or more sensors for which at least one missing value is to be imputed based on the at least one ordered dataset; obtaining, via the one or more hardware processors, at least one training data set (X); generating, via the one or more hardware processors, a forward data set (X_forward), a backward data set (X_backward), a current data set (X_current), and a target data set (Y) from a windowed data set to obtain a complete final data set; feeding, via the one or more hardware processors, (i) the X_forward through a forward recurrent neural network (RNN) layer to obtain a set of forward activations (Aforward(t-1)), and (ii) the X_backward through a backward recurrent neural network (RNN) layer to obtain a set of backward activations (Abackward(t+1)); concatenating, via the one or more hardware processors, the set of forward activations (Aforward(t-1)), the set of backward activations (Abackward(t+1)), and a X_current (t) to obtain a concatenated value (Xst); determining, via the one or more hardware processors, a mean squared error loss to back propagate and train one or more parameters associated with one or more fully connected layer, the forward RNN layer, and the backward RNN layer; determining, via the one or more hardware processors, at least one missing value based on the one or more 
trained parameters. In an embodiment, each row in the at least one ordered dataset represents an observation

from the one or more sensors at the time stamp. In an embodiment, the at least one training data set comprise at least one data associated with the at least one target sensor and at least one dependency sensor. In an embodiment, at least one row with one or more null values from the at least one training data set is removed. In an embodiment, an order of the data is reversed for each window. In an embodiment, the one or more fully connected layer corresponds to a fully connected layer A, and a fully connected layer B.
[005] In an embodiment, the time stamp of the observation of the at least one sensor is represented as t0, t1, …, tm. In an embodiment, the m corresponds to the number of telemetry data points. In an embodiment, every column of the at least one ordered dataset with a distinct sensor value is considered if at least one missing value is to be imputed for the one or more sensors. In an embodiment, the at least one target sensor is identified by selecting at least one column from the at least one ordered dataset. In an embodiment, the remaining column from the at least one ordered dataset corresponds to the at least one dependency sensor. In an embodiment, the windowed data set is obtained by applying a windowing technique on the at least one training data set with a fixed window size computed based on a sampling rate of the at least one data. In an embodiment, the current data set (X_current) is obtained by removing a target column from each row, and the target dataset (Y) is obtained with the target column.
[006] In another aspect, there is provided a system for detection of missing value by ensembled bidirectional recurrent neural network (RNN) and fully connected neural network. The system includes a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: preprocess, at least one data set with a time stamp to obtain an at least one ordered dataset; identify, at least one target sensor from one or more sensors for which at least one missing value is to be imputed based on the at least one ordered dataset; obtain, at least one training data set (X); generate, a forward data set (X_forward), a backward data set (X_backward), a current data set (X_current), and a target data set (Y) from a windowed

data set to obtain a complete final data set; feed, (i) the X_forward through a forward recurrent neural network (RNN) layer to obtain a set of forward activations (Aforward(t-1)), and (ii) the X_backward through a backward recurrent neural network (RNN) layer to obtain a set of backward activations (Abackward(t+1)); concatenate, the set of forward activations (Aforward(t-1)), the set of backward activations (Abackward(t+1)), and a X_current (t) to obtain a concatenated value (Xst); determine, a mean squared error loss to back propagate and train one or more parameters associated with one or more fully connected layer, the forward RNN layer, and the backward RNN layer; determine, at least one missing value based on the one or more trained parameters. In an embodiment, each row in the at least one ordered dataset represents an observation from one or more sensors at the time stamp. In an embodiment, the at least one training data set comprises at least one data associated with the at least one target sensor and at least one dependency sensor. In an embodiment, at least one row with one or more null values from the at least one training data set is removed. In an embodiment, an order of the data is reversed for each window. In an embodiment, the one or more fully connected layer corresponds to a fully connected layer A, and a fully connected layer B.
[007] In an embodiment, the time stamp of the observation of the at least one sensor is represented as t0, t1, …, tm. In an embodiment, the m corresponds to the number of telemetry data points. In an embodiment, every column of the at least one ordered dataset with a distinct sensor value is considered if at least one missing value is to be imputed for the one or more sensors. In an embodiment, the at least one target sensor is identified by selecting at least one column from the at least one ordered dataset. In an embodiment, the remaining column from the at least one ordered dataset corresponds to the at least one dependency sensor. In an embodiment, the windowed data set is obtained by applying a windowing technique on the at least one training data set with a fixed window size computed based on a sampling rate of the at least one data. In an embodiment, the current data set (X_current) is obtained by removing a target column from each row, and the target dataset (Y) is obtained with the target column.

[008] In yet another aspect, there are provided one or more non-transitory machine readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause at least one of: preprocessing, at least one data set with a time stamp to obtain an at least one ordered dataset; identifying, at least one target sensor from one or more sensors for which at least one missing value is to be imputed based on the at least one ordered dataset; obtaining, at least one training data set (X); generating, a forward data set (X_forward), a backward data set (X_backward), a current data set (X_current), and a target data set (Y) from a windowed data set to obtain a complete final data set; feeding, (i) the X_forward through a forward recurrent neural network (RNN) layer to obtain a set of forward activations (Aforward(t-1)), and (ii) the X_backward through a backward recurrent neural network (RNN) layer to obtain a set of backward activations (Abackward(t+1)); concatenating, the set of forward activations (Aforward(t-1)), the set of backward activations (Abackward(t+1)), and a X_current (t) to obtain a concatenated value (Xst); determining, a mean squared error loss to back propagate and train one or more parameters associated with one or more fully connected layer, the forward RNN layer, and the backward RNN layer; determining, at least one missing value based on the one or more trained parameters. In an embodiment, each row in the at least one ordered dataset represents an observation from the one or more sensors at the time stamp. In an embodiment, the at least one training data set comprises at least one data associated with the at least one target sensor and at least one dependency sensor. In an embodiment, at least one row with one or more null values from the at least one training data set is removed. In an embodiment, an order of the data is reversed for each window.
In an embodiment, the one or more fully connected layer corresponds to a fully connected layer A, and a fully connected layer B.
[009] In an embodiment, the time stamp of the observation of the at least one sensor is represented as t0, t1, …, tm. In an embodiment, the m corresponds to the number of telemetry data points. In an embodiment, every column of the at least one ordered dataset with a distinct sensor value is considered if at least one missing value is to be imputed for the one or more sensors. In an embodiment, the at least one

target sensor is identified by selecting at least one column from the at least one ordered dataset. In an embodiment, the remaining column from the at least one ordered dataset corresponds to the at least one dependency sensor. In an embodiment, the windowed data set is obtained by applying a windowing technique on the at least one training data set with a fixed window size computed based on a sampling rate of the at least one data. In an embodiment, the current data set (X_current) is obtained by removing a target column from each row, and the target dataset (Y) is obtained with the target column.
[010] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[011] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[012] FIG. 1 illustrates a system for detection of missing value by ensembled bidirectional recurrent neural network (RNN) and fully connected neural network, according to some embodiments of the present disclosure.
[013] FIG. 2 is an exemplary functional block diagram of the system of FIG. 1, according to some embodiments of the present disclosure.
[014] FIGS. 3A and 3B are exemplary flow diagrams illustrating a method of detecting the missing value by the ensembled bidirectional recurrent neural network (RNN) and the fully connected neural network, according to some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[015] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[016] There is a requirement to populate missing sensor data for better analytics prior to applying predictive/diagnostic/prognostic techniques. Embodiments of the present disclosure provide a method and system for detection of a missing value by an ensembled bidirectional recurrent neural network (RNN) and a fully connected neural network. The bidirectional RNN is applied along with two fully connected layers, where a part of the input of the fully connected layers comes from the Bi-RNN and the other part comes from one or more sensors directly.
[017] Referring now to the drawings, and more particularly to FIGS. 1 through 3B, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[018] FIG. 1 illustrates a system for detection of missing value by ensembled bidirectional recurrent neural network (RNN) and the fully connected neural network, according to some embodiments of the present disclosure. In an embodiment, the system 100 includes one or more processor(s) 102, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 104 operatively coupled to the one or more processors 102. The memory 104 includes a database. The one or more processor(s) 102, the memory 104, and the I/O interface(s) 106 may be coupled by a system bus such as a system bus 108 or a similar mechanism. The system 100 is further connected to a radar and antenna unit (not shown in figure) via the I/O interface(s) 106. The one or more processor(s) 102 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more processor(s) 102 are

configured to fetch and execute computer-readable instructions stored in the memory 104. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud, and the like.
[019] The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface device(s) 106 may include a variety of software and hardware interfaces, for example, interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, a camera device, and a printer. Further, the I/O interface device(s) 106 may enable the system 100 to communicate with other devices, such as web servers and external databases. The I/O interface device(s) 106 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite. In an embodiment, the I/O interface device(s) 106 can include one or more ports for connecting number of devices to one another or to another server.
[020] The memory 104 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, the memory 104 includes a plurality of modules 110 and a repository 112 for storing data processed, received, and generated by the plurality of modules 110. The plurality of modules 110 may include routines, programs, objects, components, data structures, and so on, which perform particular tasks or implement particular abstract data types.
[021] Further, the database stores information pertaining to inputs fed to the system 100 and/or outputs generated by the system 100 (e.g., data/output generated at each stage of the data processing), specific to the methodology described

herein. More specifically, the database stores information being processed at each step of the proposed methodology.
[022] Additionally, the plurality of modules 110 may include programs or coded instructions that supplement applications and functions of the system 100. The repository 112, amongst other things, includes a system database 114 and other data 116. The other data 116 may include data generated as a result of the execution of one or more modules in the plurality of modules 110. Further, the database stores information pertaining to inputs fed to the system 100 and/or outputs generated by the system 100 (e.g., at each stage), specific to the methodology described herein. Herein, the memory, for example the memory 104, and the computer program code are configured to, with the hardware processor, for example the processor 102, cause the system 100 to perform various functions described hereunder.
[023] FIG. 2 is an exemplary functional block diagram 200 of the system of FIG. 1, according to some embodiments of the present disclosure. The system 200 may be an example of the system 100 (FIG. 1). In an embodiment, the fully connected neural network is alternatively referred to as the one or more fully connected layers. The one or more fully connected layers correspond to a fully connected layer A, and a fully connected layer B. At least one data set with a time stamp is preprocessed to obtain an at least one ordered dataset. Each row in the at least one dataset represents an observation from one or more sensors at the time stamp. For example, the time stamp of the observation of the at least one sensor is represented as t0, t1, …, tm. The ‘m’ corresponds to the number of telemetry data points. At least one target sensor is identified from the one or more sensors for which at least one missing value is to be determined based on the at least one ordered dataset. The at least one target sensor is identified by selecting at least one column from the at least one ordered dataset, and the remaining column from the at least one ordered dataset corresponds to the at least one dependency sensor. In an embodiment, every column with a distinct sensor value is considered if at least one missing value is to be imputed for the one or more sensors. At least one training data set with at least one dependency sensor and the at least one target sensor is obtained. The at least one training data set includes at least one data associated with the at least one target

sensor and the at least one dependency sensor. In an embodiment, at least one row with one or more null values from the at least one training data set is removed.
[024] A windowed data set is obtained by applying a windowing technique on the at least one training data set with a fixed window size which is computed based on a sampling rate of the at least one data. The window size can be computed as n * sampling rate. For example, if the sampling rate is 1 Hz, i.e., one sample per second, then the window size would be n. The number n should be selected as 2^k, where k is any positive integer greater than 2. A forward data set (X_forward) and a backward data set (X_backward) are generated from the windowed data set to obtain a complete final data set. In an embodiment, an order of the data is reversed for each window. A current data set (X_current) is generated by removing a target column from each row, and a target dataset (Y) is generated with the target column. The complete final data set is a tuple of four.
Complete final data set = (X_forward, X_backward, X_current, Y)
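The derivation of this four-tuple can be sketched in code. The following is a minimal illustration in Python; the helper name make_final_dataset, the list-of-lists row layout, and the example window size are assumptions for illustration, not part of the disclosure:

```python
def make_final_dataset(rows, target_idx, window):
    """Split windowed sensor rows into (X_forward, X_backward, X_current, Y).

    rows: ordered observations, each a list of readings from the
          dependency sensors plus the target column at target_idx.
    window: fixed window size, e.g. n = 2**k with k > 2.
    """
    x_forward, x_backward, x_current, y = [], [], [], []
    for start in range(0, len(rows) - window + 1, window):
        w = rows[start:start + window]
        x_forward.append([list(r) for r in w])             # original order
        x_backward.append([list(r) for r in reversed(w)])  # order reversed per window
        # X_current drops the target column; Y keeps only the target column
        x_current.append([[v for i, v in enumerate(r) if i != target_idx]
                          for r in w])
        y.append([r[target_idx] for r in w])
    return x_forward, x_backward, x_current, y
```

Here X_forward and X_backward retain the target column, while X_current excludes it, matching the four-tuple described above.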
[025] The X_forward is an (n+1)-tuple with the values of the n dependency sensors and a target (T) that is to be predicted/imputed, and is thus represented as a matrix of size m x (n+1).
X_forward = (S1, S2, ..., Sn, T)
[026] Similarly, the X_backward is also an (n+1)-tuple with the values of the n dependency sensors and the target (T), represented as a matrix of size m x (n+1).
X_backward = (S1, S2, ..., Sn, T)
[027] The X_current is a tuple of the n sensors utilized to populate the missing value. The X_current does not include the target column and is hence a matrix of size m x n.
X_current = (S1, S2, ..., Sn)
[028] Similarly, the Y is the target value, represented as a matrix of size m x 1.
[029] The X_forward is fed through a forward recurrent neural network (RNN) layer to obtain a set of forward activations (Aforward(t-1)).
A_forward(t-1) = tanh(X_forward(t-1) . Wxh_forward + bxh_forward + h_forward(t-1) . Whh_forward + bhh_forward)
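A single step of such a tanh recurrent layer can be sketched as follows (a minimal sketch in pure Python; the function name rnn_step and the plain-list matrix layout are illustrative assumptions, and an actual embodiment may instead use a GRU or other RNN cell):

```python
import math

def rnn_step(x_t, h_prev, W_xh, b_xh, W_hh, b_hh):
    """One tanh recurrent step: a = tanh(x_t.W_xh + b_xh + h_prev.W_hh + b_hh).

    x_t: input vector; h_prev: previous hidden state;
    W_xh: input-to-hidden matrix (len(x_t) rows x hidden columns);
    W_hh: hidden-to-hidden matrix (hidden x hidden).
    """
    hidden = len(h_prev)
    a = []
    for j in range(hidden):
        s = b_xh[j] + b_hh[j]
        s += sum(x_t[i] * W_xh[i][j] for i in range(len(x_t)))
        s += sum(h_prev[i] * W_hh[i][j] for i in range(hidden))
        a.append(math.tanh(s))
    return a
```

The forward RNN layer applies this step left-to-right over a window; the backward layer applies the same step to the reversed window.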

[030] The X_backward is fed through a backward recurrent neural network (RNN) layer to obtain a set of backward activations (Abackward(t+1)).
A_backward(t+1) = tanh(X_backward(t+1) . Wxh_backward + bxh_backward + h_backward(t+1) . Whh_backward + bhh_backward)
[031] The set of forward activations (A_forward(t-1)), the set of backward activations (A_backward(t+1)), and the X_current (t) are concatenated to obtain a concatenated value (Xst). The X_current (t) corresponds to the time stamp (t) of the current data set X_current.
Xst = concat(A_forward(t-1), A_backward(t+1), X_current(t))
[032] X(t) does not have direct access to Yt. Accordingly, the Xst is fed into the two fully connected layers to receive Y^t, i.e., a predicted value of the target column at time stamp (t). The X(t) corresponds to the time stamp (t) of the training data set (X). Yt corresponds to the time stamp (t) of the target data set (Y).
Y^t = ReLU(W1 . Xst + b1) . W2 + b2
Where ReLU corresponds to a rectified linear unit.
Where b1 and b2 correspond to the bias terms of the two fully connected layers.
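The concatenation and the pass through the two fully connected layers (layer A with ReLU, layer B linear) can be sketched as follows for a single time stamp; the function name predict_target, the separate biases for the two layers, and the scalar-output weight layout are assumptions for illustration:

```python
def predict_target(a_forward, a_backward, x_current_t, W1, b1, W2, b2):
    """Concatenate activations with current sensor values, then apply two
    fully connected layers to predict the target value at time stamp t.

    Xst = concat(A_forward(t-1), A_backward(t+1), X_current(t))
    prediction = ReLU(Xst.W1 + b1).W2 + b2
    """
    xst = list(a_forward) + list(a_backward) + list(x_current_t)
    # fully connected layer A with ReLU activation
    hidden = [max(0.0, b1[j] + sum(xst[i] * W1[i][j] for i in range(len(xst))))
              for j in range(len(b1))]
    # fully connected layer B (linear) produces the scalar prediction
    return b2 + sum(hidden[j] * W2[j] for j in range(len(hidden)))
```

Note that X_current(t) excludes the target column, so the network never sees the value it is asked to predict.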
[033] A mean squared error loss is determined to back propagate and train one or more parameters associated with one or more fully connected layer, the forward RNN layer, and the backward RNN layer. In an embodiment, the one or more parameters may correspond to deep learning trainable parameters, e.g., weightages (W, W1, W2) for rows and columns.
Lmse = MSE(Y, Y^)
Where MSE corresponds to the Mean Squared Error.
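The loss reduces to the mean of squared differences between the target column and the predicted values; a minimal sketch (function name illustrative):

```python
def mse_loss(y_true, y_pred):
    """Mean squared error between observed targets and predictions."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
```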
[034] The at least one missing value is determined based on the one or more trained parameters.
Y_missing(t) = P(Y^t | Y(t-1), Y(t-2), ..., Y(t+1), Y(t+2), ..., X_current(t))
Where P is a probability.
[035] FIGS. 3A and 3B are exemplary flow diagrams illustrating a method of detecting the missing value by the ensembled bidirectional recurrent neural

network (RNN) and the fully connected neural network, according to some embodiments of the present disclosure. In an embodiment, the system 100 comprises one or more data storage devices or the memory 104 operatively coupled to the one or more hardware processors 102 and is configured to store instructions for execution of steps of the method by the one or more processors 102. The flow diagram depicted is better understood by way of the following explanation/description. The steps of the method of the present disclosure will now be explained with reference to the components of the system as depicted in FIGS. 1 and 2.
[036] At step 302, at least one data set with a time stamp is preprocessed to obtain an at least one ordered dataset. Each row in the at least one ordered dataset represents an observation from one or more sensors at the time stamp. In an embodiment, the time stamp of the observation of the at least one sensor is represented as t0, t1, …, tm. In an embodiment, the ‘m’ corresponds to the number of telemetry data points. At step 304, at least one target sensor is identified from the one or more sensors for which at least one missing value is to be imputed based on the at least one ordered dataset. In an embodiment, every column of the at least one ordered dataset with the distinct sensor value is considered if at least one missing value is to be imputed for the one or more sensors. At step 306, at least one training data set (X) is obtained. The at least one training data set includes at least one data associated with the at least one target sensor and at least one dependency sensor. In an embodiment, at least one row with one or more null values from the at least one training data set is removed. In an embodiment, the at least one target sensor is identified by selecting at least one column from the at least one ordered dataset. In an embodiment, the remaining column from the at least one ordered dataset corresponds to the at least one dependency sensor.
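Steps 302 to 306 can be sketched as follows (a minimal illustration in Python; the dict-based record layout and the helper name preprocess are assumptions, not part of the disclosure):

```python
def preprocess(records, timestamp_key, target_key):
    """Order raw sensor records by time stamp and build the training data.

    records: list of dicts mapping sensor name -> reading (None if missing).
    Returns the ordered dataset, the dependency-sensor rows (X), and the
    target-sensor column (Y); rows with null values are removed from X/Y.
    """
    ordered = sorted(records, key=lambda r: r[timestamp_key])
    # the training set keeps only rows without any null value
    complete = [r for r in ordered if all(v is not None for v in r.values())]
    dep_keys = [k for k in complete[0] if k not in (timestamp_key, target_key)]
    X = [[r[k] for k in dep_keys] for r in complete]
    Y = [r[target_key] for r in complete]
    return ordered, X, Y
```

The selected target column (e.g., an odometer reading) becomes Y, and every remaining sensor column is treated as a dependency sensor.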
[037] At step 308, a forward data set (X_forward), a backward data set (X_backward), a current data set (X_current), and a target data set (Y) are generated from a windowed data set to obtain a complete final data set. An order of the data is reversed for each window. In an embodiment, the windowed data set is obtained by applying a windowing technique on the at least one training data set with a fixed window size computed based on a sampling rate of the at least one data. In an

embodiment, the current data set (X_current) is obtained by removing a target column from each row, and the target dataset (Y) is obtained with the target column. At step 310, the X_forward is fed through a forward recurrent neural network (RNN) layer to obtain a set of forward activations (Aforward(t-1)). Similarly, the X_backward is fed through a backward recurrent neural network (RNN) layer to obtain a set of backward activations (Abackward(t+1)). At step 312, the set of forward activations (Aforward(t-1)), the set of backward activations (Abackward(t+1)), and a X_current (t) are concatenated to obtain a concatenated value (Xst). At step 314, a mean squared error loss is determined to back propagate and train one or more parameters associated with one or more fully connected layer, the forward RNN layer, and the backward RNN layer. The one or more fully connected layer corresponds to a fully connected layer A, and a fully connected layer B. At step 316, at least one missing value is determined, via the one or more hardware processors, based on the one or more trained parameters.
[038] Experimental results:
[039] For example, a study is conducted on a data set. Statistics of the data set are presented in a tabular format below in Table 1:

Total number of time series from different machines: 100,000
Total number of sensors used: 14
Total number of sensors having at least one missing value: 13
Number of layers in the D3G: 4
Total number of time instances: 25 million
Percentage of missing entries that were populated: 17.3%
Table 1
[040] The embodiments of the present disclosure herein address the unresolved problem of preparing data for executing a task with missing values in input data. The embodiment of the present disclosure provides missing value imputation by an ensembled bidirectional recurrent neural network (RNN) and a fully connected neural network. The embodiment of the present disclosure proposes a single method that

takes care of both co-related and un corelated features. For co-related features, use of bidirectional RNN results in a better accuracy. The claimed method does not have any constraint (e.g., a gate variable beta(t) as a result z(t) (feature-based estimation) and x(t) (history-based estimation)) to have the same dimension. There is no need of manual feature extraction since the system 100 utilizes the deep learning-based approach. The embodiment of the present disclosure in which the system 100 monitors a data pattern of past and future data points. The system 100 utilizes a two layer fully connected network and the bidirectional RNN/GRU layer to consider temporal pattern as well as patterns present in other features in the current time stamp. The claimed system 100 is a self-supervised. One or more intermediate values is populated using Bi-directional recursive neural network and use of a synthetic data for subsequent missing value population/prediction. Applying bidirectional RNN along with two fully connected layer where a part of the input of the fully connected layers comes from the BiRNN and the other part comes from sensors directly. For missing value imputation, the claimed method can capture linear dependency and nonlinear dependency.
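The architecture summarized above can be sketched minimally in NumPy. All layer sizes, the plain tanh RNN cell, and the random parameters are illustrative assumptions; the disclosure's GRU option and the mean-squared-error training loop of step 314 are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, hidden = 3, 4  # illustrative sizes, not from the disclosure
# Forward and backward RNN weights (input-to-hidden, hidden-to-hidden).
Wx_f, Wh_f = rng.normal(size=(hidden, n_sensors)), rng.normal(size=(hidden, hidden))
Wx_b, Wh_b = rng.normal(size=(hidden, n_sensors)), rng.normal(size=(hidden, hidden))

def rnn(seq, Wx, Wh):
    """Plain tanh RNN; returns the last hidden activation of the sequence."""
    h = np.zeros(hidden)
    for x in seq:
        h = np.tanh(Wx @ x + Wh @ h)
    return h

def impute(x_forward, x_backward, x_current, Wa, ba, Wb, bb):
    a_fwd = rnn(x_forward, Wx_f, Wh_f)               # Aforward(t-1)
    a_bwd = rnn(x_backward, Wx_b, Wh_b)              # Abackward(t+1)
    xst = np.concatenate([a_fwd, a_bwd, x_current])  # concatenated value Xst:
    hidden_a = np.tanh(Wa @ xst + ba)                # fully connected layer A
    return float(Wb @ hidden_a + bb)                 # fully connected layer B -> estimate

# Illustrative fully connected parameters; in practice these are trained
# by back propagating the mean squared error loss.
in_dim = 2 * hidden + (n_sensors - 1)
Wa, ba = rng.normal(size=(8, in_dim)), np.zeros(8)
Wb, bb = rng.normal(size=(8,)), 0.0
y_hat = impute(rng.normal(size=(2, n_sensors)), rng.normal(size=(2, n_sensors)),
               rng.normal(size=n_sensors - 1), Wa, ba, Wb, bb)
```

Note how part of the fully connected input comes from the BiRNN activations and part comes from the current sensor readings directly, matching the ensembling described above.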
[041] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[042] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means

like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[043] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[044] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be

noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[045] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[046] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

We Claim:
1. A processor implemented method (300), comprising:
preprocessing, via one or more hardware processors, at least one data set with a time stamp to obtain at least one ordered dataset, wherein each row in the at least one ordered dataset represents an observation from a plurality of sensors at the time stamp (302);
identifying, via the one or more hardware processors, at least one target sensor from the plurality of sensors for which at least one missing value is to be imputed based on the at least one ordered dataset (304);
obtaining, via the one or more hardware processors, at least one training data set (X), wherein the at least one training data set comprises at least one data associated with the at least one target sensor and at least one dependency sensor, and wherein at least one row with a plurality of null values from the at least one training data set is removed (306);
generating, via the one or more hardware processors, a forward data set (X_forward), a backward data set (X_backward), a current data set (X_current), and a target data set (Y) from a windowed data set to obtain a complete final data set, wherein an order of the data is reversed for each window (308);
feeding, via the one or more hardware processors, (i) the X_forward through a forward recurrent neural network (RNN) layer to obtain a set of forward activations (Aforward(t-1)), and (ii) the X_backward through a backward recurrent neural network (RNN) layer to obtain a set of backward activations (Abackward(t+1)) (310);
concatenating, via the one or more hardware processors, the set of forward activations (Aforward(t-1)), the set of backward activations (Abackward(t+1)), and the X_current(t) to obtain a concatenated value (Xst) (312);
determining, via the one or more hardware processors, a mean squared error loss to back propagate and train a plurality of parameters associated with a plurality of fully connected layers, the forward RNN layer, and the backward RNN layer, wherein the plurality of fully connected layers corresponds to a fully connected layer A and a fully connected layer B (314); and
determining, via the one or more hardware processors, at least one missing value based on the plurality of trained parameters (316).
2. The processor implemented method (300) as claimed in claim 1, wherein the time stamp of the observation of the at least one sensor is represented as t0, t1, …, tm, and wherein m corresponds to the number of telemetry data points.
3. The processor implemented method (300) as claimed in claim 1, wherein every column of the at least one ordered dataset with a distinct sensor value is considered if at least one missing value is to be imputed for the plurality of sensors.
4. The processor implemented method (300) as claimed in claim 1, wherein the at least one target sensor is identified by selecting at least one column from the at least one ordered dataset, and wherein a remaining column from the at least one ordered dataset corresponds to the at least one dependency sensor.
5. The processor implemented method (300) as claimed in claim 1, wherein the windowed data set is obtained by applying a windowing technique on the at least one training data set with a fixed window size computed based on a sampling rate of the at least one data.
6. The processor implemented method (300) as claimed in claim 1, wherein the current data set (X_current) is obtained by removing a target column for each row, and the target data set (Y) is obtained with the target column.
7. A system (100), comprising:
a memory (104) storing instructions;

one or more communication interfaces (106); and
one or more hardware processors (102) coupled to the memory (104) via the one or more communication interfaces (106), wherein the one or more hardware processors (102) are configured by the instructions to:
preprocess, at least one data set with a time stamp to obtain at least one ordered dataset, wherein each row in the at least one ordered dataset represents an observation from a plurality of sensors at the time stamp;
identify, at least one target sensor from the plurality of sensors for which at least one missing value is to be imputed based on the at least one ordered dataset;
obtain, at least one training data set, wherein the at least one training data set (X) comprises at least one data associated with the at least one target sensor and at least one dependency sensor, and wherein at least one row with a plurality of null values from the at least one training data set is removed;
generate, a forward data set (X_forward), a backward data set (X_backward), a current data set (X_current), and a target data set (Y) from a windowed data set to obtain a complete final data set, wherein an order of the data is reversed for each window;
feed, (i) the X_forward through a forward recurrent neural network (RNN) layer to obtain a set of forward activations (Aforward(t-1)), and (ii) the X_backward through a backward recurrent neural network (RNN) layer to obtain a set of backward activations (Abackward(t+1));
concatenate, the set of forward activations (Aforward(t-1)), the set of backward activations (Abackward(t+1)), and the X_current(t) to obtain a concatenated value (Xst);
determine, a mean squared error loss to back propagate and train a plurality of parameters associated with a plurality of fully connected layers, the forward RNN layer, and the backward RNN layer, and wherein the plurality of fully connected layers corresponds to a fully connected layer A and a fully connected layer B; and

determine, at least one missing value based on the plurality of trained parameters.
8. The system (100) as claimed in claim 7, wherein the time stamp of the observation of the at least one sensor is represented as t0, t1, …, tm, and wherein m corresponds to the number of telemetry data points.
9. The system (100) as claimed in claim 7, wherein every column of the at least one ordered dataset with a distinct sensor value is considered if at least one missing value is to be imputed for the plurality of sensors.
10. The system (100) as claimed in claim 7, wherein the at least one target sensor is identified by selecting at least one column from the at least one ordered dataset, and wherein a remaining column from the at least one ordered dataset corresponds to the at least one dependency sensor.
11. The system (100) as claimed in claim 7, wherein the windowed data set is obtained by applying a windowing technique on the at least one training data set with a fixed window size computed based on a sampling rate of the at least one data.
12. The system (100) as claimed in claim 7, wherein the current data set (X_current) is obtained by removing a target column for each row, and the target data set (Y) is obtained with the target column.
