Sparse Neural Network Based Anomaly Detection In Multi Dimensional Time Series
Abstract:
Anomaly detection from time series is one of the key components in automated monitoring of one or more entities. Domain-driven sensor selection for anomaly detection is restricted by knowledge of important sensors to capture only a certain set of anomalies from the entire set of possible anomalies. Hence, existing anomaly detection approaches are not very effective for multi-dimensional time series. Embodiments of the present disclosure provide a sparse neural network for anomaly detection in multi-dimensional time series (MDTS) corresponding to a plurality of parameters of entities. A reduced-dimensional time series is obtained from the MDTS via at least one feedforward layer by using a dimensionality reduction model. The dimensionality reduction model and a recurrent neural network (RNN) encoder-decoder model are simultaneously learned to obtain a multi-layered sparse neural network. A plurality of error vectors corresponding to at least one time instance of the MDTS is computed to obtain an anomaly score.
Claims:
1. A processor implemented method, comprising:
receiving, at an input layer, a multi-dimensional time series corresponding to a plurality of parameters of an entity (202);
obtaining, using a dimensionality reduction model, a reduced-dimensional time series from the multi-dimensional time series via an at least one feedforward layer, wherein connections between the input layer and the feedforward layer are sparse to access at least a portion of the plurality of parameters (204);
estimating, by using a recurrent neural network (RNN) encoder-decoder model, the multi-dimensional time series using the reduced-dimensional time series obtained by the dimensionality reduction model (206);
simultaneously learning, by using the estimated multi-dimensional time series, the dimensionality reduction model and the RNN encoder-decoder model to obtain a multi-layered sparse neural network (208);
computing, by using the multi-layered sparse neural network, a plurality of error vectors corresponding to at least one time instance of the multi-dimensional time series by performing a comparison of the multi-dimensional time series and the estimated multi-dimensional time series (210); and
generating at least one anomaly score based on the plurality of the error vectors (212).
2. The processor implemented method of claim 1, wherein each of the plurality of parameters in the reduced-dimensional time series is a non-linear function of a subset of the plurality of parameters of the multi-dimensional time series.
3. The processor implemented method of claim 1, wherein the dimensionality reduction model comprises a plurality of feedforward layers with Least Absolute Shrinkage and Selection Operator (LASSO) sparsity constraint on plurality of parameters of the feedforward layers.
4. The processor implemented method of claim 1, further comprising:
(a) classifying at least one time instance in the multi-dimensional time series as anomalous if the anomaly score is greater than a threshold, or
(b) classifying at least one time instance in the multi-dimensional time series as normal if the anomaly score is less than or equal to the threshold.
5. The processor implemented method of claim 4, wherein the threshold is learned based on a hold-out validation set while maximizing F-score, wherein the hold-out validation set comprises at least one normal time instance and at least one anomalous time instance of the multi-dimensional time series.
6. A system comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
receive, at an input layer, a multi-dimensional time series corresponding to a plurality of parameters of an entity;
obtain, using a dimensionality reduction model, a reduced-dimensional time series from the multi-dimensional time series via an at least one feedforward layer, wherein connections between the input layer and the feedforward layer are sparse to access at least a portion of the plurality of parameters;
estimate, by using a recurrent neural network (RNN) encoder-decoder model, the multi-dimensional time series using the reduced-dimensional time series obtained by the dimensionality reduction model;
simultaneously learn, by using the estimated multi-dimensional time series, the dimensionality reduction model and the RNN encoder-decoder model to obtain a multi-layered sparse neural network;
compute, by using the multi-layered sparse neural network, a plurality of error vectors corresponding to at least one time instance of the multi-dimensional time series by performing a comparison of the multi-dimensional time series and the estimated multi-dimensional time series; and
generate at least one anomaly score based on the plurality of the error vectors.
7. The system of claim 6, wherein each of the plurality of parameters in the reduced-dimensional time series is a non-linear function of a subset of the plurality of parameters of the multi-dimensional time series.
8. The system of claim 6, wherein the dimensionality reduction model comprises a plurality of feedforward layers with Least Absolute Shrinkage and Selection Operator (LASSO) sparsity constraint on plurality of parameters of the feedforward layers.
9. The system of claim 6, wherein the one or more hardware processors are further configured to:
(a) classify at least one time instance in the multi-dimensional time series as anomalous if the anomaly score is greater than a threshold, or
(b) classify at least one time instance in the multi-dimensional time series as normal if the anomaly score is less than or equal to the threshold.
10. The system of claim 9, wherein the threshold is learned based on a hold-out validation set while maximizing F-score, wherein the hold-out validation set comprises at least one normal time instance and at least one anomalous time instance of the multi-dimensional time series.
Description:
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
SPARSE NEURAL NETWORK BASED ANOMALY DETECTION IN MULTI-DIMENSIONAL TIME SERIES
Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The disclosure herein generally relates to time series analysis, and, more particularly, to systems and methods for anomaly detection in multi-dimensional time series based on a sparse neural network.
BACKGROUND
In the current digital era, streaming data is ubiquitous and growing at a rapid pace, enabling automated monitoring of systems, e.g. using the Industrial Internet of Things with a large number of sensors capturing the operational behavior of equipment. Complex industrial systems such as engines, turbines, aircraft, etc., are typically instrumented with a large number (tens or even hundreds) of sensors, resulting in multi-dimensional streaming data. There is a growing interest among original equipment manufacturers (OEMs) to leverage this data to provide remote health monitoring services and help field engineers take informed decisions.
Anomaly detection from time series is one of the key components in building any health monitoring system. For example, detecting early symptoms of an impending fault in a machine in the form of anomalies can help take corrective measures to avoid the fault or reduce maintenance cost and machine downtime. Recently, Recurrent Neural Networks (RNNs) have found extensive applications for anomaly detection in multivariate time series by building a model of normal behavior of complex systems from multi-sensor data, and then flagging deviations from the learned normal behavior as anomalies. However, the notion of finding meaningful anomalies becomes substantially more complex in multi-dimensional data.
Domain-driven sensor selection for anomaly detection using RNNs is restricted by the knowledge of important sensors to capture a given set of anomalies, and would therefore miss other types of anomalous signatures in any sensor not included in the set of relevant sensors. Similarly, approaches considering each sensor or a subset of sensors independently to handle such scenarios may not be appropriate given that: a) they lead to loss of useful sensor-dependency information, and b) when the number of sensors is large, building and deploying a separate RNN model for each sensor may be impractical and computationally infeasible. Hence, existing anomaly detection approaches are not very effective for multi-dimensional time series.
SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one aspect, a processor implemented method for detecting anomaly in multi-dimensional time series based on sparse neural network is provided. The method comprises receiving, at an input layer, a multi-dimensional time series corresponding to a plurality of parameters of an entity; obtaining, using a dimensionality reduction model, a reduced-dimensional time series from the multi-dimensional time series via an at least one feedforward layer, wherein connections between the input layer and the feedforward layer are sparse to access at least a portion of the plurality of parameters; estimating, by using a recurrent neural network (RNN) encoder-decoder model, the multi-dimensional time series using the reduced-dimensional time series obtained by the dimensionality reduction model; simultaneously learning, by using the estimated multi-dimensional time series, the dimensionality reduction model and the RNN encoder-decoder model to obtain a multi-layered sparse neural network; computing, by using the multi-layered sparse neural network, a plurality of error vectors corresponding to at least one time instance of the multi-dimensional time series by performing a comparison of the multi-dimensional time series and the estimated multi-dimensional time series; and generating at least one anomaly score based on the plurality of the error vectors.
In an embodiment, each of the plurality of parameters in the reduced-dimensional time series is a non-linear function of a subset of the plurality of parameters of the multi-dimensional time series. The dimensionality reduction model includes a plurality of feedforward layers with Least Absolute Shrinkage and Selection Operator (LASSO) sparsity constraint on plurality of parameters of the feedforward layers. The method may further comprise classifying at least one time instance in the multi-dimensional time series as anomalous if the anomaly score is greater than a threshold (e.g., a dynamic threshold). The method may further comprise classifying at least one time instance in the multi-dimensional time series as normal if the anomaly score is less than or equal to the threshold. The threshold may be learned based on a hold-out validation set while maximizing F-score. The hold-out validation set comprises at least one normal time instance and at least one anomalous time instance of the multi-dimensional time series.
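As an illustration of learning the threshold on a hold-out validation set while maximizing F-score, the following is a minimal sketch and not the disclosed implementation. The function names, the use of each observed validation score as a candidate threshold, and the toy scores and labels are all assumptions made for the example; the strict "greater than" comparison follows the classification rule stated above.

```python
from typing import List, Tuple


def f_score(scores: List[float], labels: List[bool], tau: float,
            beta: float = 1.0) -> float:
    """F-beta score when a point is flagged anomalous iff score > tau."""
    predicted = [s > tau for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def learn_threshold(scores: List[float], labels: List[bool],
                    beta: float = 1.0) -> Tuple[float, float]:
    """Try every observed score as a candidate threshold (an assumption of
    this sketch) and keep the one with the highest F-score."""
    best_tau, best_f = max(scores), 0.0
    for tau in sorted(set(scores)):
        f = f_score(scores, labels, tau, beta)
        if f > best_f:
            best_tau, best_f = tau, f
    return best_tau, best_f


# Hypothetical validation set containing both normal (False) and
# anomalous (True) time instances, as the hold-out set requires.
val_scores = [0.2, 0.4, 0.5, 1.8, 2.5, 0.9]
val_labels = [False, False, False, True, True, False]
tau, best = learn_threshold(val_scores, val_labels)
```

On this toy data the search settles on the largest normal score as the threshold, since every anomalous score lies strictly above it.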
In another aspect, there is provided a processor implemented system for detecting anomaly in multi-dimensional time series based on sparse neural network. The system comprises: a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: receive, at an input layer, a multi-dimensional time series corresponding to a plurality of parameters of an entity; obtain, using a dimensionality reduction model, a reduced-dimensional time series from the multi-dimensional time series via an at least one feedforward layer, wherein connections between the input layer and the feedforward layer are sparse to access at least a portion of the plurality of parameters; estimate, by using a recurrent neural network (RNN) encoder-decoder model, the multi-dimensional time series using the reduced-dimensional time series obtained by the dimensionality reduction model; simultaneously learn, by using the estimated multi-dimensional time series, the dimensionality reduction model and the RNN encoder-decoder model to obtain a multi-layered sparse neural network; compute, by using the multi-layered sparse neural network, a plurality of error vectors corresponding to at least one time instance of the multi-dimensional time series by performing a comparison of the multi-dimensional time series and the estimated multi-dimensional time series; and generate at least one anomaly score based on the plurality of the error vectors.
In an embodiment, each of the plurality of parameters in the reduced-dimensional time series is a non-linear function of a subset of the plurality of parameters of the multi-dimensional time series. In an embodiment, the dimensionality reduction model includes a plurality of feedforward layers with Least Absolute Shrinkage and Selection Operator (LASSO) sparsity constraint on plurality of parameters of the feedforward layers. In an embodiment, the one or more hardware processors are further configured to: classify at least one time instance in the multi-dimensional time series as anomalous if the anomaly score is greater than a threshold (e.g., a dynamic threshold) and classify at least one time instance in the multi-dimensional time series as normal if the anomaly score is less than or equal to the threshold. The threshold may be learned based on a hold-out validation set while maximizing for F-score. The hold-out validation set may comprise at least one normal time instance and at least one anomalous time instance of the multi-dimensional time series.
In yet another aspect, there are provided one or more non-transitory machine readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors causes receiving, at an input layer, a multi-dimensional time series corresponding to a plurality of parameters of an entity; obtaining, using a dimensionality reduction model, a reduced-dimensional time series from the multi-dimensional time series via an at least one feedforward layer, wherein connections between the input layer and the feedforward layer are sparse to access at least a portion of the plurality of parameters; estimating, by using a recurrent neural network (RNN) encoder-decoder model, the multi-dimensional time series using the reduced-dimensional time series obtained by the dimensionality reduction model; simultaneously learning, by using the estimated multi-dimensional time series, the dimensionality reduction model and the RNN encoder-decoder model to obtain a multi-layered sparse neural network; computing, by using the multi-layered sparse neural network, a plurality of error vectors corresponding to at least one time instance of the multi-dimensional time series by performing a comparison of the multi-dimensional time series and the estimated multi-dimensional time series; and generating at least one anomaly score based on the plurality of the error vectors.
In an embodiment, the instructions when executed by the one or more hardware processors may further cause each of the plurality of parameters in the reduced-dimensional time series to be a non-linear function of a subset of the plurality of parameters of the multi-dimensional time series. The dimensionality reduction model includes a plurality of feedforward layers with Least Absolute Shrinkage and Selection Operator (LASSO) sparsity constraint on plurality of parameters of the feedforward layers. The method may further comprise classifying at least one time instance in the multi-dimensional time series as anomalous if the anomaly score is greater than a threshold (e.g., a dynamic threshold). The method may further comprise classifying at least one time instance in the multi-dimensional time series as normal if the anomaly score is less than or equal to the threshold. The threshold (e.g., a dynamic threshold) may be learned based on a hold-out validation set while maximizing for F-score. The hold-out validation set may comprise at least one normal time instance and at least one anomalous time instance of the multi-dimensional time series.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles.
FIG. 1 illustrates an exemplary block diagram of a system for detecting anomaly in multi-dimensional time series based on sparse neural network in accordance with an embodiment of the present disclosure.
FIG. 2 illustrates an exemplary flow diagram illustrating a method for detecting anomaly in multi-dimensional time series based on sparse neural network using the system of FIG. 1 according to an embodiment of the present disclosure.
FIG. 3A depicts a Standard Recurrent Neural Network (RNN) Encoder-Decoder.
FIG. 3B depicts a Sparse Neural Network based anomaly detection as implemented by the system 100 of FIG. 1 in accordance with some embodiments of the present disclosure.
FIG. 3C depicts a comparison between the Standard RNN Encoder-Decoder and the Sparse Neural Network in accordance with some embodiments of the present disclosure.
FIGS. 4A-4C depict graphical representations illustrating a performance comparison of anomaly detection models in terms of AUROC in accordance with an exemplary embodiment of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope and spirit being indicated by the following claims.
Embodiments of the present disclosure, and the systems and methods associated therewith, provide an efficient way to extend such approaches to multi-dimensional time series. The present approach combines advantages of non-temporal dimensionality reduction techniques and recurrent autoencoders for time series modeling through an end-to-end learning framework. The recurrent encoder gets sparse access to the input dimensions via a feedforward layer while the recurrent decoder is forced to reconstruct all the input dimensions, thereby leading to better regularization and a robust temporal model. The autoencoder thus trained on normal time series is likely to give a high reconstruction error, and a corresponding high anomaly score, for any anomalous time series pattern.
The present disclosure proposes Sparse Neural Network based Anomaly Detection (SPREAD): an approach that combines point-wise (i.e. non-temporal) dimensionality reduction via one or more sparsely connected feedforward layers over the input layer with a recurrent neural encoder-decoder in an end-to-end learning setting to model the normal behavior of a system. Once a model for normal behavior is learned, it can be used for detecting behavior deviating from normal by analyzing the reconstruction via a recurrent decoder that attempts to reconstruct the original time series using the output of the recurrent encoder. Having been trained only on normal data, the model is likely to fail in reconstructing an anomalous time series and result in high reconstruction error. This error in reconstruction is used to obtain an anomaly score.
In the present disclosure, experiments on a public dataset and two real-world datasets show that the proposed approach significantly improves anomaly detection performance over several baselines. The proposed approach is able to perform well even without knowledge of relevant dimensions carrying the anomalous signature in a multi-dimensional setting. The present disclosure further proposes an effective way to leverage sparse networks via L1 regularization for anomaly detection in multi-dimensional time series.
Referring now to the drawings, and more particularly to FIGS. 1 through 4C, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG. 1 illustrates an exemplary block diagram of a system 100 for detecting anomaly in multi-dimensional time series based on sparse neural network in accordance with an embodiment of the present disclosure. In an embodiment, the system 100 includes one or more processors 104, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the one or more processors 104. The memory 102 comprises a database 108. The one or more processors 104 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
The database 108 may store information including, but not limited to, a plurality of parameters obtained from one or more sensors, wherein the parameters are specific to an entity (e.g., user, machine, and the like). In an embodiment, the one or more sensors may be a temperature sensor, a motion sensor, a pressure sensor, a vibration sensor and the like. Parameters may comprise sensor data captured through the sensors connected to the user and/or the machine. Further, the database 108 stores information pertaining to inputs fed to the system 100 and/or outputs generated by the system (e.g., at each stage), specific to the methodology described herein. More specifically, the database 108 stores information being processed at each step of the proposed methodology.
FIG. 2, with reference to FIG. 1, illustrates an exemplary flow diagram illustrating a method for detecting anomaly in multi-dimensional time series based on sparse neural network using the system 100 of FIG. 1 according to an embodiment of the present disclosure. In an embodiment, the system 100 comprises one or more data storage devices or the memory 102 operatively coupled to the one or more hardware processors 104 and is configured to store instructions for execution of steps of the method by the one or more processors 104. The flow diagram depicted in FIG. 2 is better understood by way of following explanation/description.
An RNN based Encoder-decoder anomaly detection (EncDec-AD) as shown in FIG. 3A first trains a recurrent neural network encoder-decoder (RNN-ED) as a temporal autoencoder using reconstruction error as a loss function. The autoencoder is trained on normal time series such that the network learns to reconstruct a normal time series well but is likely not to reconstruct an anomalous time series. The reconstruction error is then used to obtain an anomaly score.
More specifically, FIG. 3B, with reference to FIGS. 1 through 2, depicts sparse neural network encoder-decoder based anomaly detection as implemented by the system 100 of FIG. 1 in accordance with some embodiments of the present disclosure. The recurrent neural network encoder-decoder (RNN-ED) is trained in such a manner that the target time series $x_{T \ldots 1}^{(i)}$ is the reverse of the input time series $x^{(i)} = x_{1 \ldots T}^{(i)}$, for the $i$-th time series instance. In an embodiment, $x_{1 \ldots T}$ denotes a multivariate real-valued time series $x_1, x_2, \ldots, x_T$ of length $T$ where each $x_t \in \mathbb{R}^d$ ($d$ being the input dimension, e.g. the number of sensors in this case). The overall process can be thought of as a non-linear mapping of the input multivariate time series to a fixed-dimensional vector $z_T^{(i)}$ via an encoder function $f_E$, followed by another non-linear mapping of the fixed-dimensional vector to a multivariate time series via a decoder function $f_D$. The RNN-ED is trained to minimize the loss function $L$ given by the average of squared reconstruction error:
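\begin{equation}
\begin{aligned}
z_T^{(i)} &= f_E\left(x^{(i)}; W_E\right) \\
\hat{x}^{(i)} &= f_D\left(z_T^{(i)}; W_D\right) \\
e_t^{(i)} &= x_t^{(i)} - \hat{x}_t^{(i)}, \quad t = 1 \ldots T \\
C_1\left(\hat{x}^{(i)}, x^{(i)}\right) &= \frac{1}{T} \sum_{t=1}^{T} \left\| e_t^{(i)} \right\|_2^2 \\
L &= \frac{1}{N} \sum_{i=1}^{N} C_1\left(\hat{x}^{(i)}, x^{(i)}\right)
\end{aligned}
\tag{1}
\end{equation}
where $N$ is the number of multivariate time series instances in the training set, $\|\cdot\|_2$ denotes the L2-norm, and $W_E$ and $W_D$ represent the parameters of the encoder and decoder RNNs, respectively.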
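The loss above can be illustrated with a short sketch that takes the reconstructions as given and only evaluates the error vectors, the per-instance cost C_1, and the loss L. It is a worked example under stated assumptions, not the disclosed training code: the function names and the toy values are hypothetical, and the encoder and decoder themselves are not modeled.

```python
from typing import List

Series = List[List[float]]  # T time steps, each a d-dimensional point


def error_vectors(x: Series, x_hat: Series) -> Series:
    """e_t = x_t - x_hat_t for t = 1..T."""
    return [[a - b for a, b in zip(xt, xht)] for xt, xht in zip(x, x_hat)]


def instance_cost(x: Series, x_hat: Series) -> float:
    """C_1: squared L2 norm of e_t, averaged over the T time steps."""
    errors = error_vectors(x, x_hat)
    return sum(sum(v * v for v in e) for e in errors) / len(errors)


def training_loss(batch: List[Series], reconstructions: List[Series]) -> float:
    """L: average of C_1 over the N training instances."""
    costs = [instance_cost(x, xh) for x, xh in zip(batch, reconstructions)]
    return sum(costs) / len(costs)


# Toy data (hypothetical): N = 2 instances, T = 2 time steps, d = 2 sensors.
x1 = [[1.0, 2.0], [3.0, 4.0]]
x1_hat = [[1.0, 1.0], [2.0, 4.0]]  # one unit of error at each time step
x2 = [[0.0, 0.0], [1.0, 1.0]]
x2_hat = [[0.0, 0.0], [1.0, 1.0]]  # perfect reconstruction
loss = training_loss([x1, x2], [x1_hat, x2_hat])
```

Here the first instance has a cost of 1.0 and the second a cost of 0.0, so the loss over the two instances is their mean.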
Given the error vector $e_t^{(i)}$, the Mahalanobis distance is used to compute the anomaly score $a_t^{(i)}$ as follows:
\begin{equation}
a_t^{(i)} = \sqrt{\left(e_t^{(i)} - \mu\right)^T \Sigma^{-1} \left(e_t^{(i)} - \mu\right)} \tag{2}
\end{equation}
where $\mu$ and $\Sigma$ are the mean and covariance matrix of the error vectors corresponding to the normal training time series instances. This anomaly score can be obtained in an online setting by using a window of length $T$ ending at current time $t$ as the input, making it possible to generate timely alarms related to anomalous behavior. A point $x_t^{(i)}$ is classified as anomalous if $a_t^{(i)} > \tau$; the threshold $\tau$ can be learned using a hold-out validation set while optimizing for F-score.
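The scoring step can be sketched in plain Python as follows. This is an illustrative sketch rather than the disclosed implementation: the helper names, the sample covariance with divisor n - 1, the small Gaussian-elimination solver used in place of an explicit matrix inverse, and the toy error vectors and threshold value are all assumptions. The covariance matrix is assumed to be non-singular.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def mean_and_covariance(errors: List[Vector]) -> Tuple[Vector, Matrix]:
    """Sample mean and covariance (divisor n - 1, an assumption here)."""
    n, d = len(errors), len(errors[0])
    mu = [sum(e[j] for e in errors) / n for j in range(d)]
    cov = [[sum((e[j] - mu[j]) * (e[k] - mu[k]) for e in errors) / (n - 1)
            for k in range(d)] for j in range(d)]
    return mu, cov


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * y[c] for c in range(i + 1, n))
        y[i] = (m[i][n] - tail) / m[i][i]
    return y


def anomaly_score(e: Vector, mu: Vector, cov: Matrix) -> float:
    """a_t = sqrt((e - mu)^T Sigma^{-1} (e - mu))."""
    diff = [ei - mi for ei, mi in zip(e, mu)]
    return math.sqrt(sum(d * y for d, y in zip(diff, solve(cov, diff))))


# Hypothetical error vectors from normal training data (d = 2).
normal_errors = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
mu, cov = mean_and_covariance(normal_errors)
tau = 3.0  # hypothetical threshold; in practice learned on a validation set
is_anomalous = anomaly_score([3.0, 4.0], mu, cov) > tau
```

Solving the linear system directly avoids forming the inverse covariance matrix; a production system would more likely rely on a numerical library for this step.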
The steps of the method of the present disclosure will now be explained with reference to the components of the system 100 as depicted in FIG. 1, and the flow diagram of FIG. 2. In an embodiment of the present disclosure, at step 202, the one or more hardware processors 104 receive, at an input layer, a multi-dimensional time series corresponding to a plurality of parameters of an entity (e.g., in this case the entity can be a user, a machine, and the like). In an embodiment, each dimension of the multi-dimensional time series corresponds to at least one parameter from the plurality of parameters of the entity. In an embodiment of the present disclosure, at step 204, the one or more hardware processors 104 obtain, using a dimensionality reduction model, a reduced-dimensional time series from the multi-dimensional time series via at least one feedforward layer. In one embodiment, connections between the input layer and the feedforward layer are sparse to access at least a portion of the plurality of parameters. In one embodiment, each multi-dimensional point in the input time series is mapped to a reduced-dimensional point via a feedforward dimensionality reduction layer, and the time series in the reduced-dimensional space is then used to reconstruct the original multi-dimensional time series via the RNN-ED, as in EncDec-AD.
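The point-wise mapping of step 204 can be sketched as below. This is a minimal illustration under assumptions, not the disclosed network: the tanh non-linearity, the bias term, the function names, and the hand-picked weights are hypothetical. The L1 penalty reflects the LASSO sparsity constraint mentioned in this disclosure; in training it would be added to the reconstruction loss so that many weights are driven to zero.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def reduce_point(x_t: Vector, w_r: Matrix, bias: Vector) -> Vector:
    """Map one d-dimensional point to r dimensions (tanh is an assumption)."""
    return [math.tanh(sum(w * x for w, x in zip(row, x_t)) + b)
            for row, b in zip(w_r, bias)]


def reduce_series(x: List[Vector], w_r: Matrix, bias: Vector) -> List[Vector]:
    """Apply the same point-wise (non-temporal) mapping at every time step."""
    return [reduce_point(x_t, w_r, bias) for x_t in x]


def l1_penalty(w_r: Matrix, lam: float) -> float:
    """LASSO term lam * sum |w| over the feedforward weights."""
    return lam * sum(abs(w) for row in w_r for w in row)


def accessed_inputs(w_r: Matrix, tol: float = 1e-8) -> List[List[int]]:
    """Input dimensions each reduced unit actually reads (non-zero weights)."""
    return [[j for j, w in enumerate(row) if abs(w) > tol] for row in w_r]


# Hypothetical sparse weights: d = 4 input sensors reduced to r = 2 units.
w_r = [[0.5, 0.0, -0.5, 0.0],
       [0.0, 1.0, 0.0, 0.0]]
bias = [0.0, 0.0]
x = [[1.0, 2.0, 1.0, 9.0], [0.0, 0.0, 0.0, 0.0]]  # T = 2 time steps
y = reduce_series(x, w_r, bias)
subsets = accessed_inputs(w_r)
```

In this toy configuration the fourth sensor is read by neither reduced unit, yet the decoder would still be asked to reconstruct all four input dimensions.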
A sparsity constraint is added on the weights of the feedforward layer such that each unit in the feedforward layer has access to a subset of the input parameters (e.g., input dimensions). A feedforward layer with sparse connections $W_R$ from the input layer is used to map $x_t^{(i)} \in \mathbb{R}^d$ to $y_t^{(i)} \in \mathbb{R}^r$, such that r
Documents
Orders
Section | Controller | Decision Date
15 & 43(1) | Krishna Kumar Gupta | 2024-02-21
15 & 43(1) | Krishna Kumar Gupta | 2024-02-21
Application Documents
# | Name | Date
1 | 201821025602-STATEMENT OF UNDERTAKING (FORM 3) [09-07-2018(online)].pdf | 2018-07-09
2 | 201821025602-FORM 1 [09-07-2018(online)].pdf | 2018-07-09
3 | 201821025602-FIGURE OF ABSTRACT [09-07-2018(online)].jpg |