ABSTRACT
SYSTEM AND METHOD FOR ANOMALY IDENTIFICATION IN A NETWORK
A system (120) and a method (500) for anomaly identification in a network (105) are disclosed. The system (120) includes a receiving unit (220) configured to receive a first set of data from one or more data sources. The system (120) includes a training unit (225) configured to train a model utilizing the first set of data. The system (120) includes an applying unit (230) configured to apply a second set of data to the trained model to identify one or more anomalies. The system (120) includes a predicting unit (235) configured to predict one of an anomalous datapoint and a normal datapoint. The system (120) includes a rendering unit (240) to render an interactive graphical representation of datapoints on a display unit (250). The system (120) includes a plotting unit (245) to plot the prediction to identify the one or more anomalies. Ref. Fig. 2
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10 and rule 13)
1. TITLE OF THE INVENTION
SYSTEM AND METHOD FOR ANOMALY IDENTIFICATION IN A NETWORK
2. APPLICANT(S)
NAME NATIONALITY ADDRESS
JIO PLATFORMS LIMITED INDIAN OFFICE-101, SAFFRON, NR. CENTRE POINT, PANCHWATI 5 RASTA, AMBAWADI, AHMEDABAD 380006, GUJARAT, INDIA
3. PREAMBLE TO THE DESCRIPTION
THE FOLLOWING SPECIFICATION PARTICULARLY DESCRIBES THE NATURE OF THIS INVENTION AND THE MANNER IN WHICH IT IS TO BE PERFORMED.
FIELD OF THE INVENTION
[0001] The present invention relates to the field of wireless communication networks, more particularly to a system and a method for anomaly identification in a network.
BACKGROUND OF THE INVENTION
[0002] A telecommunication network requires constant monitoring in order to detect any anomalies and fix these anomalies in a timely manner before these anomalies transform into critical issues thereby disrupting the network.
[0003] In traditional systems, various machine learning models have been developed to detect and predict anomalies in the network. However, because these machine learning models are trained plainly on historic data, they are complex, and when they detect or predict an anomaly, network engineers find it difficult to understand the basis for the detection or prediction. It is important for the network engineers to understand this basis, since they must address the root cause of an issue in order to resolve it. In the traditional systems, because the basis of the predictions performed by the machine learning models is difficult to understand, figuring out that basis takes a long time; the time taken to fix the issues is correspondingly delayed, thereby disrupting the network.
[0004] Further, even though many traditional systems claim to perform real-time monitoring, there is a lag in monitoring and therefore a delay in detection of the anomalies. Delay in the detection of the anomalies leads to delayed identification of a problem and a delayed response to critical events. Furthermore, many of the traditional machine learning models generate false positives of the anomalies. False positives lead to a high number of unnecessary alerts, which may unnecessarily burden the resources handling these anomalies. Furthermore, the depiction of the anomalies on a display is quite complex in the traditional systems, which requires engineers who are trained to understand the detections and predictions displayed. In other words, the depictions provided by the traditional systems are not user friendly. As such, the engineers fail to focus on specific regions of interest or obtain information related to the anomaly detected. In addition, owing to the complexity of the display of the anomalies, users lacking technical expertise may face difficulties in comprehending and interpreting the information corresponding to the anomaly. Accordingly, the complexity of the traditional display systems further limits the accessibility and usefulness of anomaly detection.
[0005] In view of the above, there is a dire need for a method and a system to efficiently detect the anomalies and provide meaningful, easy-to-understand insights as depictions on displays.
SUMMARY OF THE INVENTION
[0006] One or more embodiments of the present disclosure provide a method and a system for anomaly identification in a network.
[0007] In one aspect of the present invention, the method for anomaly identification in the network is disclosed. The method includes the step of receiving, by one or more processors, a first set of data from one or more data sources. The method includes the step of training, by the one or more processors, a model utilizing the first set of data to identify one or more anomalies in the first set of received data. The method includes the step of applying, by the one or more processors, a second set of data received from the one or more data sources to the trained model to identify the one or more anomalies in the second set of received data. The method includes the step of predicting, by the one or more processors, if each datapoint in the second set of data is one of an anomalous datapoint and a normal datapoint based on a comparison of the first set of data and the second set of data. The method includes the step of rendering, by the one or more processors, an interactive graphical representation of datapoints of each of the first set of data and the second set of data and a set of values for each of the datapoints of the first and the second set of data on a display unit. The method includes the step of plotting, by the one or more processors, the prediction pertaining to the second set of data on the interactive graphical representation to identify the one or more anomalies in the second set of data.
[0008] In one embodiment, the one or more data sources is at least one of a Network Management System (NMS) and a probing unit and wherein the received data is pre-processed and standardized.
[0009] In another embodiment, training the model aids in distinguishing between the anomalous datapoint and the normal datapoint.
[0010] In yet another embodiment, the second set of data is received in real time from the one or more data sources.
[0011] In yet another embodiment, the set of values comprises a predicted value, an actual value, a lower threshold, and an upper threshold.
[0012] In yet another embodiment, each of the datapoints is identified as an anomaly if the datapoint exceeds one of the lower and the upper thresholds.
[0013] In another aspect of the present invention, the system for anomaly identification in the network is disclosed. The system includes a receiving unit configured to receive a first set of data from one or more data sources. The system includes a training unit configured to train a model utilizing the first set of data to identify one or more anomalies in the first set of received data. The system includes an applying unit configured to apply a second set of data received from the one or more data sources to the trained model to identify the one or more anomalies in the second set of received data. The system includes a predicting unit configured to predict if each datapoint in the second set of data is one of an anomalous datapoint and a normal datapoint based on a comparison of the first set of data and the second set of data. The system includes a rendering unit configured to render an interactive graphical representation of datapoints of each of the first set of data and the second set of data and a set of values for each of the datapoints of the first and the second set of data on a display unit. The system includes a plotting unit configured to plot the prediction pertaining to the second set of data on the interactive graphical representation to identify the one or more anomalies in the second set of data.
[0014] In another aspect of the embodiment, a non-transitory computer-readable medium having stored thereon computer-readable instructions executable by a processor is disclosed. The processor is configured to receive a first set of data from one or more data sources. The processor is configured to train a model utilizing the first set of data to identify one or more anomalies in the first set of received data. The processor is configured to apply a second set of data received from the one or more data sources to the trained model to identify the one or more anomalies in the second set of received data. The processor is configured to predict if each datapoint in the second set of data is one of an anomalous datapoint and a normal datapoint based on a comparison of the first set of data and the second set of data. The processor is configured to render an interactive graphical representation of datapoints of each of the first set of data and the second set of data and a set of values for each of the datapoints of the first and the second set of data on a display unit. The processor is configured to plot the prediction pertaining to the second set of data on the interactive graphical representation to identify the one or more anomalies in the second set of data.
[0015] Other features and aspects of this invention will be apparent from the following description and the accompanying drawings. The features and advantages described in this summary and in the following detailed description are not all-inclusive, and particularly, many additional features and advantages will be apparent to one of ordinary skill in the relevant art, in view of the drawings, specification, and claims hereof. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The accompanying drawings, which are incorporated herein, and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes disclosure of electrical components, electronic components or circuitry commonly used to implement such components.
[0017] FIG. 1 is an exemplary block diagram of an environment for anomaly identification in a network, according to one or more embodiments of the present disclosure;
[0018] FIG. 2 is an exemplary block diagram of a system for anomaly identification in the network, according to the one or more embodiments of the present disclosure;
[0019] FIG. 3 is a block diagram of an architecture that can be implemented in the system of FIG.2, according to the one or more embodiments of the present disclosure;
[0020] FIG. 4 illustrates a graphical depiction of anomaly detection on a display unit, according to the one or more embodiments of the present disclosure; and
[0021] FIG. 5 is a flow diagram illustrating a method for anomaly identification in the network, according to the one or more embodiments of the present disclosure.
[0022] The foregoing shall be more apparent from the following detailed description of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0023] Some embodiments of the present disclosure, illustrating all its features, will now be discussed in detail. It must also be noted that as used herein and in the appended claims, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise.
[0024] Various modifications to the embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. However, one of ordinary skill in the art will readily recognize that the present disclosure including the definitions listed here below are not intended to be limited to the embodiments illustrated but is to be accorded the widest scope consistent with the principles and features described herein.
[0025] A person of ordinary skill in the art will readily ascertain that the illustrated steps detailed in the figures and here below are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments.
[0026] Various embodiments of the present invention provide a system and method for anomaly identification in a network. The present disclosure incorporates an Artificial Intelligence/Machine Learning (AI/ML) model with a processor. The processor enables machine learning anomaly detection. Further, the anomaly detection is depicted graphically on a display unit in order to thoroughly evaluate the presence of anomaly in data points. The depiction is user friendly with easy-to-understand data points, which negates the requirement of skilled people to understand the results. The integration empowers the users to visually inspect and interpret results, enhancing decision-making. Further, the processor displays lower and upper thresholds with predicted and actual values graphically, thereby enabling intuitive and relative anomaly identification. The dynamic approach of the anomaly identification bridges advanced machine learning with actionable, user-friendly insights, revolutionizing anomaly detection in various applications.
[0027] Referring to FIG. 1, FIG. 1 illustrates an exemplary block diagram of an environment 100 for anomaly identification in a network 105, according to one or more embodiments of the present invention. The environment 100 includes the network 105, a User Equipment (UE) 110, a server 115, and a system 120. The UE 110 aids a user to interact with the system 120 for identifying the anomalies in the network 105. In an embodiment, the user is at least one of, a network operator, and a service provider. The anomaly identification in the network 105 refers to the process of detecting unusual patterns or behaviors in network traffic or performance that deviate from established norms, which is crucial for maintaining network security, performance, and reliability. The anomaly identification involves analyzing network data to spot irregularities that could indicate issues such as security breaches, system failures, or performance degradation. The anomaly identification utilizes statistical methods, machine learning algorithms, or predefined rules to determine what constitutes normal behavior and anomalous behavior.
[0028] For the purpose of description and explanation, the description will be explained with respect to the UE 110, or to be more specific will be explained with respect to a first UE 110a, a second UE 110b, and a third UE 110c, and should nowhere be construed as limiting the scope of the present disclosure. Each of the UE 110 from the first UE 110a, the second UE 110b, and the third UE 110c is configured to connect to the server 115 via the network 105. In an embodiment, each of the first UE 110a, the second UE 110b, and the third UE 110c is one of, but not limited to, any electrical, electronic, electro-mechanical or an equipment and a combination of one or more of the above devices such as smartphones, virtual reality (VR) devices, augmented reality (AR) devices, laptop, a general-purpose computer, desktop, personal digital assistant, tablet computer, mainframe computer, or any other computing device.
[0029] The network 105 includes, by way of example but not limitation, one or more of a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a Public-Switched Telephone Network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, or some combination thereof. The network 105 may include, but is not limited to, a Third Generation (3G), a Fourth Generation (4G), a Fifth Generation (5G), a Sixth Generation (6G), a New Radio (NR), a Narrow Band Internet of Things (NB-IoT), an Open Radio Access Network (O-RAN), and the like.
[0030] The server 115 may include by way of example but not limitation, one or more of a standalone server, a server blade, a server rack, a bank of servers, a server farm, hardware supporting a part of a cloud service or system, a home server, hardware running a virtualized server, one or more processors executing code to function as a server, one or more machines performing server-side functionality as described herein, at least a portion of any of the above, or some combination thereof. In an embodiment, the server 115 is associated with an entity that may include, but is not limited to, a vendor, a network operator, a company, an organization, a university, a lab facility, a business enterprise, a defense facility, or any other facility that provides content.
[0031] The environment 100 further includes the system 120 communicably coupled to the server 115 and each of the first UE 110a, the second UE 110b, and the third UE 110c via the network 105. The system 120 is configured for anomaly identification in the network 105. The system 120 is adapted to be embedded within the server 115 or is embedded as the individual entity, as per multiple embodiments of the present invention.
[0032] Operational and construction features of the system 120 will be explained in detail with respect to the following figures.
[0033] FIG. 2 is an exemplary block diagram of a system 120 for anomaly identification in the network 105, according to one or more embodiments of the present disclosure.
[0034] The system 120 includes a processor 205, a memory 210, a user interface 215, and a database 255. For the purpose of description and explanation, the description will be explained with respect to one or more processors 205, or to be more specific will be explained with respect to the processor 205 and should nowhere be construed as limiting the scope of the present disclosure. The one or more processors 205, hereinafter referred to as the processor 205, may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, single board computers, and/or any devices that manipulate signals based on operational instructions.
[0035] As per the illustrated embodiment, the processor 205 is configured to fetch and execute computer-readable instructions stored in the memory 210. The memory 210 may be configured to store one or more computer-readable instructions or routines in a non-transitory computer-readable storage medium, which may be fetched and executed to create or share data packets over a network service. The memory 210 may include any non-transitory storage device including, for example, volatile memory such as RAM, or non-volatile memory such as EPROM, flash memory, and the like.
[0036] The User Interface (UI) 215 includes a variety of interfaces, for example, interfaces for a Graphical User Interface (GUI), a web user interface, a Command Line Interface (CLI), and the like. The user interface 215 facilitates communication of the system 120. In one embodiment, the user interface 215 provides a communication pathway for one or more components of the system 120. Examples of the one or more components include, but are not limited to, the UE 110, and the database 255.
[0037] The database 255 is one of, but not limited to, a centralized database, a cloud-based database, a commercial database, an open-source database, a distributed database, an end-user database, a graphical database, a Not only Structured Query Language (NoSQL) database, an object-oriented database, a personal database, an in-memory database, a document-based database, a time series database, a wide column database, a key value database, a search database, a cache database, and so forth. The foregoing examples of database 255 types are non-limiting and may not be mutually exclusive (e.g., a database can be both commercial and cloud-based, or both relational and open-source).
[0038] Further, the processor 205, in an embodiment, may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processor 205. In the examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processor 205 may be processor-executable instructions stored on a non-transitory machine-readable storage medium and the hardware for processor 205 may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the memory 210 may store instructions that, when executed by the processing resource, implement the processor 205. In such examples, the system 120 may comprise the memory 210 storing the instructions and the processing resource to execute the instructions, or the memory 210 may be separate but accessible to the system 120 and the processing resource. In other examples, the processor 205 may be implemented by electronic circuitry.
[0039] In order for the system 120 to identify anomaly in the network 105, the processor 205 includes a receiving unit 220, a training unit 225, an applying unit 230, a predicting unit 235, a rendering unit 240, a plotting unit 245, and a display unit 250 communicably coupled to each other. In an embodiment, operations and functionalities of the receiving unit 220, the training unit 225, the applying unit 230, the predicting unit 235, the rendering unit 240, the plotting unit 245 and the display unit 250 can be used in combination or interchangeably.
[0040] The receiving unit 220 is configured to receive a first set of data from one or more data sources in real time. In an embodiment, the one or more data sources is at least one of a Network Management System (NMS), and a probing unit. The NMS is a system that manages, monitors, and maintains one or more network resources. The NMS collects a wide array of data regarding network performance, device status, traffic patterns, and potential issues. The data is essential for understanding an operational state of the network 105. The probing unit refers to devices or software that actively collect the data from the network 105. The probing unit can monitor traffic, gather metrics, and detect the anomalies in real time, and provide granular insights that can complement the broader data collected by the NMS.
[0041] The first set of data involves an initial collection of information from the one or more data sources. In one embodiment, the first set of data is real-time streaming data and requires processing before the first set of data can be used effectively. Real-time data focuses on immediate monitoring and alerting, enabling quick responses to threats as they arise. In an exemplary embodiment, the system 120 tracks user logins and behavior in real time. If the user logs in from multiple geographic locations within a short period, it triggers an alert for potential account compromise or insider threat. In another embodiment, the first set of data is non-real time data. In an exemplary embodiment, network analysts review traffic logs archived over several months to identify trends and anomalies. For instance, the network analyst might analyze periods of heavy traffic to correlate them with security incidents that occurred in the past.
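Purely by way of illustration, and not as the claimed implementation, the real-time login scenario described above may be sketched as follows. The function name, the event structure, and the 10-minute window are assumptions introduced for this sketch only.

```python
from datetime import datetime, timedelta

# Illustrative sketch only: flag a user who logs in from more than one
# geographic location within a short window (assumed here: 10 minutes).
WINDOW = timedelta(minutes=10)

def find_suspicious_logins(events):
    """events: time-ordered list of (user, timestamp, location) tuples."""
    recent = {}   # user -> list of (timestamp, location) still inside the window
    alerts = []
    for user, ts, loc in events:
        # Keep only this user's logins that fall within the window.
        history = [(t, l) for (t, l) in recent.get(user, []) if ts - t <= WINDOW]
        # A different location inside the window triggers an alert.
        if any(l != loc for (_, l) in history):
            alerts.append((user, ts, loc))
        history.append((ts, loc))
        recent[user] = history
    return alerts

events = [
    ("alice", datetime(2024, 1, 1, 10, 0), "Mumbai"),
    ("alice", datetime(2024, 1, 1, 10, 5), "London"),   # 5 minutes apart -> alert
    ("bob",   datetime(2024, 1, 1, 10, 0), "Delhi"),
    ("bob",   datetime(2024, 1, 1, 11, 0), "Delhi"),    # same city, outside window -> no alert
]
print(find_suspicious_logins(events))
```

In this sketch, only the second login by "alice" is reported, matching the account-compromise scenario in the paragraph above.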
[0042] The received first set of data is pre-processed and standardized. The pre-processing involves cleaning, transforming, and organizing the data into a standardized format for training a model. In an embodiment, the model includes, but is not limited to, an Artificial Intelligence/Machine Learning (AI/ML) model. The cleaning of the data involves removing errors or inconsistencies from the data. In an example, missing values, duplicates, and irrelevant information are addressed during the cleaning process. The transforming of the data includes converting the data into a usable format. In an example, categorical variables might be encoded, numerical data could be normalized or scaled, and timestamps might be standardized. Once the data is cleaned and transformed, the data is structured into a standardized format. The organizing of the data involves arranging the data into tables or specific data structures that facilitate easier access and manipulation. Upon pre-processing the data, the system 120 receives the standardized data for machine learning or data analysis.
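The cleaning, transforming, and organizing steps described above may be sketched, by way of illustration only, as follows. The record layout, field names, and the choice of z-score normalization are assumptions made for this sketch.

```python
import math

# Illustrative pre-processing sketch (not the claimed implementation):
# cleaning removes missing values and duplicates, transformation
# z-score-normalises the numeric metric, and the result is organised
# into a standardised list of records.
def preprocess(raw):
    # Cleaning: drop records with missing values, then drop duplicates
    # while preserving order.
    cleaned, seen = [], set()
    for rec in raw:
        if rec is None or any(v is None for v in rec.values()):
            continue
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(rec)

    # Transforming: normalise the metric to zero mean and unit variance.
    values = [rec["value"] for rec in cleaned]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0

    # Organising: emit a standardised structure ready for model training.
    return [{"timestamp": rec["timestamp"], "value": (rec["value"] - mean) / std}
            for rec in cleaned]

raw = [
    {"timestamp": "10:00", "value": 60.0},
    {"timestamp": "10:00", "value": 60.0},      # duplicate -> removed
    {"timestamp": "10:05", "value": None},      # missing value -> removed
    {"timestamp": "10:10", "value": 220.0},
]
print(preprocess(raw))
```

The two surviving records are standardized to mean zero and unit variance, i.e. values -1.0 and 1.0 in this toy example.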
[0043] Upon receiving the pre-processed and standardized data from the receiving unit 220, the training unit 225 is configured to train a model utilizing the first set of data to identify one or more patterns corresponding to the one or more anomalies in the first set of received data. In an embodiment, the model includes, but is not limited to, an Artificial Intelligence/Machine Learning (AI/ML) model. The model is trained to identify the one or more patterns corresponding to the one or more anomalies by utilizing machine learning algorithms such as autoencoders, isolation forest, or other anomaly detection methods. As the model is trained on the first set of data, the model learns to recognize the one or more patterns associated with normal behavior. The one or more patterns are represented as clusters, decision boundaries, or reconstruction errors, depending on the machine learning algorithm used. The autoencoders are neural network architectures used for unsupervised learning, primarily for tasks like dimensionality reduction and anomaly detection. The autoencoders include an encoder and a decoder. The encoder compresses the input into a lower-dimensional representation and the decoder reconstructs the original input from the compressed representation. During training, normal data patterns are learned, and the model becomes adept at reconstructing the patterns accurately. When a new data point is received, the data is passed through the autoencoder. The reconstruction error is calculated by comparing the original input to the reconstructed output. If the reconstruction error exceeds a predefined threshold, the data point is flagged as an anomaly. The autoencoders are beneficial for capturing complex patterns and handling high-dimensional data, making them suitable for reconstructing normal behavior and identifying one or more deviations.
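As an illustrative sketch of the reconstruction-error principle described above, a linear "autoencoder" can be built from the first principal component of the training data: the encoder projects each point onto that component and the decoder maps it back. A real deployment would use a trained neural autoencoder; the class name and the threshold below are assumptions made for this sketch.

```python
import numpy as np

# Linear stand-in for an autoencoder: encode = project onto the first
# principal component of the training data; decode = map back. Normal
# points reconstruct almost exactly; off-pattern points do not.
class LinearAutoencoder:
    def fit(self, X):
        self.mean = X.mean(axis=0)
        # First right-singular vector = direction of maximum variance.
        _, _, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.direction = vt[0]
        return self

    def reconstruction_error(self, x):
        code = (x - self.mean) @ self.direction          # encode
        recon = self.mean + code * self.direction        # decode
        return float(np.linalg.norm(x - recon))

# Training data lies close to a line -> that line is "normal" behaviour.
train = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
model = LinearAutoencoder().fit(train)

threshold = 0.5                                          # assumed threshold
for point in [np.array([2.5, 2.5]), np.array([3.0, -3.0])]:
    err = model.reconstruction_error(point)
    label = "anomaly" if err > threshold else "normal"
    print(point, round(err, 2), label)
```

The on-pattern point reconstructs with near-zero error, while the off-pattern point produces a large reconstruction error and is flagged, mirroring the thresholding behavior described above.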
[0044] Furthermore, the isolation forest is an ensemble learning technique specifically designed for anomaly detection. The isolation forests do not require labeled training data, making them well-suited for unsupervised anomaly detection. The isolation forests can handle high-dimensional data effectively and scale well with large datasets. After training, new data points are processed through the ensemble of trees. The anomaly scores are calculated, and points above a certain threshold are identified as the anomalies. The isolation forests are particularly useful in detecting network intrusions, fraud detection, and identifying unusual patterns in network traffic. The isolation forests quickly and efficiently isolate the anomalies without requiring labeled data, whereby a normal datapoint and an anomalous datapoint are distinguished. The trained model aids in distinguishing between the anomalous datapoint and the normal datapoint.
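A deliberately simplified, one-dimensional sketch of the isolation-forest principle is given below for illustration only (a production system would use a full multi-dimensional implementation). Each tree repeatedly picks a random split value; points in sparse regions are isolated in fewer splits, so they have shorter average path lengths and higher anomaly scores. All names and parameter values are assumptions of this sketch.

```python
import math
import random

def _path_length(data, x, depth=0, limit=10):
    # Illustrative isolation "tree" for one-dimensional data.
    if len(data) <= 1 or depth >= limit:
        # Adjustment c(n): expected path length of an unbuilt subtree.
        n = len(data)
        c = 2 * (math.log(n - 1) + 0.5772156649) - 2 * (n - 1) / n if n > 1 else 0.0
        return depth + c
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = random.uniform(lo, hi)
    # Follow the side of the split that contains the query point x.
    side = [v for v in data if v < split] if x < split else [v for v in data if v >= split]
    return _path_length(side, x, depth + 1, limit)

def anomaly_score(data, x, n_trees=200):
    # Average path length over the ensemble, mapped to a score in (0, 1):
    # scores near 1.0 indicate a likely anomaly.
    avg = sum(_path_length(data, x) for _ in range(n_trees)) / n_trees
    n = len(data)
    c = 2 * (math.log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
    return 2 ** (-avg / c)

random.seed(42)
traffic = [10.0, 10.5, 11.0, 11.5, 12.0, 50.0]   # 50.0 sits far from the cluster
print(round(anomaly_score(traffic, 50.0), 2))    # isolated quickly -> higher score
print(round(anomaly_score(traffic, 11.0), 2))    # deep inside cluster -> lower score
```

The outlier is separated from the cluster after very few random splits, so its score exceeds that of an in-cluster point, which is the ranking behavior the thresholding step relies on.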
[0045] Upon training the model, the applying unit 230 is configured to apply a second set of data received from the one or more data sources to the trained model to identify one or more patterns corresponding to the one or more anomalies in the second set of received data. In an embodiment, the second set of data is received in real time from the one or more data sources. After the model is trained using the first set of data, the applying unit 230 is responsible for applying the trained model to new incoming data, often referred to as the second set of data, i.e., new and unseen data, which allows the system 120 to operationalize the insights gained from a training phase, enabling real-time anomaly detection. During the training phase, the model learns to identify one or more patterns in the first set of data. The first set of data consists of historical data that includes both normal behavior and anomalous behavior. The model uses machine learning techniques such as supervised or unsupervised learning, depending on whether labeled data is available. Once the model is trained, the system 120 is utilized to analyze the second set of data. The second set of data is typically sourced from the same one or more data sources used during the training phase, and includes network traffic logs, system performance metrics, and security logs. The system 120 is designed to handle the second set of data in real time. The second set of data is processed immediately, allowing for prompt identification of any anomalies.
[0046] Upon applying the second set of data received from the one or more data sources to the trained model, the predicting unit 235 is configured to predict if each datapoint in the second set of data is one of the anomalous datapoint and the normal datapoint. The prediction is based on a comparison of the one or more patterns of the first set of data with the one or more patterns of the second set of data. If a datapoint deviates significantly from the learned one or more patterns, it is flagged as an anomaly. The system 120 uses statistical methods or machine learning techniques (like reconstruction error for the autoencoders or path length for the isolation forests) to determine whether the data point is anomalous. By applying the trained model to the second set of data (new data), the system 120 enables proactive monitoring of network behavior. The anomalies can be detected in real time, allowing for swift action to mitigate potential threats. Although the model is initially trained on a specific dataset, the application phase can inform future training cycles. In an example, the identified anomalies can be labeled and incorporated into future training datasets, enhancing the model's accuracy over time. The insights gained from the second set of data are integrated into the system for continuous improvement. By analyzing false positives and negatives, the model can be fine-tuned for better accuracy.
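The per-datapoint prediction step described above may be sketched as follows, for illustration only. A simple mean/standard-deviation model stands in for the trained AI/ML model, and the deviation factor k is an assumed setting, not part of the specification.

```python
import math

# Illustrative stand-in for the trained model: learn the normal pattern
# (mean and spread) from the first set of data, then label each datapoint
# of the second set as "anomalous" or "normal".
class TrainedModel:
    def __init__(self, first_set):
        self.mean = sum(first_set) / len(first_set)
        self.std = math.sqrt(
            sum((v - self.mean) ** 2 for v in first_set) / len(first_set)) or 1.0

    def predict(self, datapoint, k=3.0):
        # A point deviating more than k standard deviations from the
        # learned pattern is flagged (k = 3.0 is an assumption here).
        return "anomalous" if abs(datapoint - self.mean) > k * self.std else "normal"

first_set = [100, 102, 98, 101, 99, 100, 103, 97]   # historical (training) data
model = TrainedModel(first_set)

second_set = [101, 99, 180, 100]                    # new, real-time data
predictions = [(dp, model.predict(dp)) for dp in second_set]
print(predictions)
```

Each incoming datapoint is scored against the pattern learned from the first set, so the spike to 180 is labeled anomalous while the remaining points are labeled normal.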
[0047] Upon predicting each datapoint in the second set of data, the rendering unit 240 is configured to render an interactive graphical representation of datapoints of each of the first set of data and the second set of data and a set of values for each of the datapoints of the first and the second set of data on a display unit 250. In an embodiment, the set of values includes, but is not limited to, a predicted value, an actual value, a lower threshold, and an upper threshold. In one embodiment, the set of values is defined by the user. The predicted value refers to the outputs generated by the trained model for a given data point. The predicted values help in identifying one or more deviations from expected behavior. In anomaly detection, if the actual value deviates significantly from the predicted value, it may indicate an anomaly or abnormal condition. The actual value refers to the true or observed value for the same data point, serving as a reference point against which predictions made by the trained model are compared. The actual values are crucial for validating the model’s predictions. By comparing the actual values with the predicted values, the system 120 can assess accuracy, refine the model, and determine the presence of anomalies.
[0048] Furthermore, the lower threshold represents a minimum acceptable value for the metric being monitored. The data points falling below the lower threshold may be considered anomalous or indicative of potential issues. The lower threshold helps identify underperformance or potential issues. In an exemplary embodiment, if a network's bandwidth usage drops below the lower threshold, it may signal a failure or a security breach. The lower threshold provides a clear boundary for acceptable performance, guiding intervention when necessary. The upper threshold refers to a maximum acceptable value for the monitored metric. The data points exceeding the upper threshold may be considered anomalous or indicative of potential issues. The upper threshold is essential for detecting overperformance or unexpected spikes in metrics. In an exemplary embodiment, if CPU usage exceeds the upper threshold, it could indicate a denial-of-service attack or system overload. The upper threshold also sets a clear boundary for acceptable operational behavior. In an embodiment, each of the data points is identified as an anomaly if the data point falls below the lower threshold or exceeds the upper threshold.
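The threshold rule above reduces to a single comparison. A minimal sketch, assuming a hypothetical helper `is_anomalous` (the name is not from the specification):

```python
def is_anomalous(value, lower_threshold, upper_threshold):
    """Flag a datapoint that falls below the lower threshold or
    exceeds the upper threshold of the monitored metric."""
    return value < lower_threshold or value > upper_threshold

# Bandwidth monitored against a 50-200 Mbps acceptable band
print(is_anomalous(30, 50, 200))   # below the lower threshold -> True
print(is_anomalous(120, 50, 200))  # within the band -> False
print(is_anomalous(220, 50, 200))  # above the upper threshold -> True
```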
[0049] Upon identifying the anomalies, the display unit 250 displays the interactive graphical representation of datapoints of each of the first set of data and the second set of data. The user can visually analyze the data points. In an embodiment, the anomalies are identified by comparing the predicted value to the lower and upper thresholds; a datapoint falling outside these thresholds is flagged as an anomaly. The interactive nature of the graph allows the user to hover over the data points for additional information (e.g., specific values, time stamps). The user can also zoom in on specific time frames for detailed analysis or filter data to focus on the anomalies.
[0050] Upon rendering the interactive graphical representation of datapoints of each of the first set of data and the second set of data, the plotting unit 245 is configured to plot the prediction pertaining to the second set of data. In one embodiment, the plotting unit 245 is configured to plot the prediction based on the interactive graphical representation to identify the one or more anomalies in the second set of data by the user. In another embodiment, an automated handler is pre-configured to select random, desired, or user-pre-entered datapoints to identify the one or more anomalies. In an exemplary embodiment, consider a network monitoring system that tracks bandwidth usage in megabits per second (Mbps). The system has defined the following thresholds based on historical data: the lower threshold is 50 Mbps and the upper threshold is 200 Mbps. The data point at 10:00 AM is 60 Mbps, the data point at 10:05 AM is 30 Mbps, and the data point at 10:10 AM is 220 Mbps. In this exemplary scenario, the data points at 10:05 AM (30 Mbps) and 10:10 AM (220 Mbps) are identified as anomalies since they fall below the lower threshold and exceed the upper threshold, respectively. The system 120 allows for quick identification of unusual behavior in network usage, enabling proactive measures to address potential issues.
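The exemplary 10:00-10:10 AM scenario can be worked through directly. The threshold values and readings are taken from the paragraph above; the dictionary layout is an illustrative choice.

```python
# Thresholds defined from historical data, per the exemplary embodiment (Mbps)
LOWER_THRESHOLD = 50
UPPER_THRESHOLD = 200

readings = {"10:00 AM": 60, "10:05 AM": 30, "10:10 AM": 220}

# A reading is an anomaly when it falls outside the acceptable band
anomalies = {time: mbps for time, mbps in readings.items()
             if mbps < LOWER_THRESHOLD or mbps > UPPER_THRESHOLD}

print(anomalies)  # -> {'10:05 AM': 30, '10:10 AM': 220}
```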
[0051] By identifying the anomalies, the system 120 provides a clear and intuitive visual representation of anomaly detection results, making it easier for users to interpret and understand the model's predictions. The system 120 performs real-time monitoring and analysis of the data, enabling the user to quickly identify and respond to the anomalies as they occur. The system 120 enhances user interaction by empowering the user to interact with the data, for example by zooming in on specific regions of interest, hovering over data points for detailed information, and toggling the visibility of the predicted and actual values. The system 120 reduces the need for users to have in-depth knowledge of machine learning algorithms, and enables the users to quickly spot anomalies that fall outside the predefined lower and upper thresholds, facilitating timely decision-making and intervention. The system 120 also improves the processing speed of the processor 205 and reduces the memory space required.
[0052] FIG. 3 is a block diagram of an architecture 300 that can be implemented in the system of FIG.2, according to one or more embodiments of the present disclosure. The architecture 300 of the system 120 includes the UI 215, the processor 205, the database 255, and a workflow manager 320. The processor 205 includes an anomaly detection model training unit 305, an anomaly detection and output generation unit 310, and an anomaly visualization tool 315.
[0053] The UI 215 includes a variety of interfaces, for example, interfaces for a Graphical User Interface (GUI), a web user interface, a Command Line Interface (CLI), and the like. The user interface 215 facilitates communication of the system 120. In one embodiment, the user interface 215 provides a communication pathway for one or more components of the system 120. Examples of the one or more components include, but are not limited to, the processor 205, and the database 255.
[0054] The processor 205 includes the anomaly detection model training unit 305, the anomaly detection and output generation unit 310, and the anomaly visualization tool 315. The anomaly detection model training unit 305 is configured to train the model utilizing the first set of data to identify the one or more patterns corresponding to the one or more anomalies in the first set of received data. The model is trained to identify the one or more patterns corresponding to the one or more anomalies by utilizing the machine learning algorithms, such as autoencoders, isolation forest, or other anomaly detection methods. Further, the anomaly detection model training unit 305 is configured to train the AI/ML model.
[0055] Upon training the model, the anomaly detection and output generation unit 310 predicts if each datapoint in the second set of data is one of the anomalous datapoint and the normal datapoint. The anomaly detection and output generation unit 310 is configured to perform the prediction based on the comparison of the one or more patterns of the first set of data and the one or more patterns of the second set of data. If a datapoint deviates significantly from the learned one or more patterns, it is flagged as an anomaly. The trained model aids in distinguishing between the anomalous datapoint and the normal datapoint.
[0056] The anomaly visualization tool 315 is configured to graphically depict and provide insights on anomaly detection. The anomaly visualization tool 315 renders the interactive graphical representation of datapoints of each of the first set of data and the second set of data and the set of values for each of the datapoints of the first and the second set of data. In an embodiment, the set of values includes, but is not limited to, the predicted value, the actual value, the lower threshold, and the upper threshold. Upon rendering the interactive graphical representation of datapoints of each of the first set of data and the second set of data, the display unit 250 displays the prediction pertaining to the second set of data, making it easier for the user to interpret and understand the predictions performed by the ML model.
[0057] Advantageously, the graphical depiction of anomalies ensures that only actual anomalies are detected and reported, while false positive anomalies are discarded and filtered by combining interactive visualization APIs. Further, the display unit 250 configured to display the user interface 215 may send a request to the workflow manager 320 and vice versa to carry out tasks. The workflow manager 320 is responsible for coordinating and automating the sequence of tasks and processes necessary for data analysis, model training, and anomaly detection. The workflow manager 320 ensures that each step of the workflow is executed in the correct order and manages dependencies between the tasks. The database 255 is used to store the data for quick access.
[0058] Further, the architecture 300 of the system 120 further provides for immediate anomaly identification, which enables the user to quickly spot the anomalies that fall outside the lower and upper thresholds, facilitating timely decision-making and intervention. Further, the user will get a better understanding of the basis of anomaly prediction by the ML model, which in turn will help the user to provide right sets of the historical data to train the ML model in the future, thereby increasing the efficiency of anomaly prediction by the ML model in the future.
[0059] FIG. 4 illustrates a graphical depiction of anomaly detection on the display unit 250, according to the one or more embodiments of the present disclosure.
[0060] The processor 205 generates the graphical depiction of the results of anomaly detection on the user interface 215 of the display unit 250. The graphical depiction includes features such as, but not limited to, the predicted value, the actual value, the lower threshold, and the upper threshold. The predicted value refers to the outputs generated by the trained model for a given data point. The predicted values help in identifying one or more deviations from expected behavior. In anomaly detection, if the actual value deviates significantly from the predicted value, it may indicate an anomaly or abnormal condition.
[0061] The actual value refers to the true or observed value for the same data point, serving as a reference point against which predictions made by the trained model are compared. The actual values are crucial for validating the model's predictions. By comparing the actual values with the predicted values, the system 120 can assess accuracy, refine the model, and determine the presence of anomalies. Furthermore, the lower threshold represents a minimum acceptable value for the metric being monitored. The data points falling below the lower threshold may be considered anomalous or indicative of potential issues. The lower threshold helps identify underperformance or potential issues. The lower threshold provides a clear boundary for acceptable performance, guiding intervention when necessary. The upper threshold refers to a maximum acceptable value for the monitored metric. The data points exceeding the upper threshold may be considered anomalous or indicative of potential issues. The upper threshold is essential for detecting overperformance or unexpected spikes in metrics. The upper threshold also sets a clear boundary for acceptable operational behavior. In an embodiment, each of the data points is identified as an anomaly if the data point falls below the lower threshold or exceeds the upper threshold.
[0062] In an embodiment, the graph is depicted by the processor 205 by plotting the predicted and actual values. In particular, the predicted values generated by the ML model are plotted against time or index axis. Additionally, the actual values are plotted for reference. Further, the thresholds are depicted on the graph. In particular, horizontal lines representing the lower and upper thresholds are depicted on the graph. These serve as visual references for acceptable ranges.
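A graph of this kind can be sketched with a general-purpose plotting library. The use of matplotlib, the function name `depict_anomaly_graph`, and the chosen line styles are assumptions for illustration; the specification does not prescribe a particular plotting toolkit.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

def depict_anomaly_graph(times, actual, predicted, lower, upper):
    """Plot predicted values against actual values over time, with
    horizontal lines marking the lower and upper thresholds."""
    fig, ax = plt.subplots()
    ax.plot(times, actual, marker="o", label="Actual value")
    ax.plot(times, predicted, linestyle="--", label="Predicted value")
    ax.axhline(lower, linestyle=":", color="red", label="Lower threshold")
    ax.axhline(upper, linestyle=":", color="red", label="Upper threshold")
    ax.set_xlabel("Time")
    ax.set_ylabel("Monitored metric")
    ax.legend()
    return fig, ax

# Illustrative values reusing the bandwidth example (Mbps)
fig, ax = depict_anomaly_graph([0, 5, 10], [60, 30, 220], [58, 55, 190], 50, 200)
```

An interactive front end would add hover and zoom handlers on top of such a figure; the static plot shown here only covers the threshold-line depiction.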
[0063] In an embodiment, the ML model included in the processor 205 performs the anomaly detection by comparing the predicted values generated by the ML model to the lower and upper thresholds. Data points having values beyond the scope of the lower and upper thresholds are considered anomalies. Advantageously, any false positive anomalies are filtered from the list of anomalies detected using the graphical depiction set-up. Advantageously, the graphical set-up provided by the processor 205 provides a quality check of anomaly detection, thereby providing a user-friendly and evaluated anomaly detection set-up.
[0064] FIG. 5 is a flow diagram illustrating a method for anomaly identification in the network 105, according to one or more embodiments of the present disclosure.
[0065] At step 505, the method 500 includes the step of receiving the first set of data from the one or more data sources by the receiving unit 220. In an embodiment, the one or more data sources is at least one of the NMS and the probing unit. The first set of data involves an initial collection of information from the one or more data sources. The first set of data is often raw and requires processing before the first set of data can be used effectively. The received first set of data is pre-processed and standardized. Upon pre-processing the data, the system 120 receives the standardized data for machine learning or data analysis.
[0066] At step 510, the method 500 includes the step of training the model utilizing the first set of data to identify one or more patterns corresponding to the one or more anomalies in the first set of received data by the training unit 225. In an embodiment, the model includes, but is not limited to, the Artificial Intelligence/Machine Learning (AI/ML) model. The model is trained to identify the one or more patterns corresponding to the one or more anomalies by utilizing machine learning algorithms, such as autoencoders, isolation forests, or other anomaly detection methods. As the model is trained on the first set of data, the model learns to recognize the one or more patterns associated with normal behavior. The one or more patterns are represented as clusters, decision boundaries, or reconstruction errors, depending on the machine learning algorithm used. The trained model aids in distinguishing between the anomalous datapoint and the normal datapoint.
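Training in the sense above amounts to learning what "normal" looks like from the first set of data. The band-learning approach below is a deliberately simple stand-in, with names of our choosing, for the autoencoder or isolation-forest training the paragraph mentions:

```python
import statistics

def train_model(first_set, k=3.0):
    """Learn a pattern of normal behaviour from the first set of data.
    Here the pattern is simply a mean +/- k*stdev band; an autoencoder
    or isolation forest would learn a richer representation."""
    mean = statistics.fmean(first_set)
    spread = statistics.stdev(first_set)
    return {"lower": mean - k * spread, "upper": mean + k * spread}

model = train_model([100, 101, 99, 100, 102, 98])
print(model)  # a band centred near 100
```

Whatever representation is learned (band, cluster, decision boundary, or reconstruction error), the output of this step is the object that the application step consumes.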
[0067] At step 515, the method 500 includes the step of applying the second set of data received from the one or more data sources to the trained model to identify one or more patterns corresponding to the one or more anomalies in the second set of received data by the applying unit 230. In an embodiment, the second set of data is received in real time from the one or more data sources. After the model is trained using the first set of data, the applying unit 230 is responsible for applying the trained model to new incoming data, often referred to as the second set of data (new, unseen data). This allows the system 120 to operationalize the insights gained from the training phase, enabling real-time anomaly detection. Once the model is trained, the system 120 is utilized to analyze the second set of data. The second set of data is typically sourced from the same one or more data sources used during the training phase, and includes network traffic logs, system performance metrics, and security logs. The system is designed to handle the second set of data in real time. The second set of data is processed immediately, allowing for prompt identification of any anomalies.
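Applying the trained model to the second set of data as it arrives can be sketched as a generator over an incoming stream. The model shape (a lower/upper band) and every name here are illustrative assumptions, not the claimed implementation:

```python
def apply_model(model, stream):
    """Apply a trained model to each incoming datapoint of the second set
    of data, yielding (datapoint, flag) pairs as the data arrives."""
    for datapoint in stream:
        outside = datapoint < model["lower"] or datapoint > model["upper"]
        yield datapoint, "anomalous" if outside else "normal"

# Illustrative model: CPU usage considered normal between 10% and 90%
model = {"lower": 10.0, "upper": 90.0}
results = list(apply_model(model, [45, 5, 97]))
print(results)  # -> [(45, 'normal'), (5, 'anomalous'), (97, 'anomalous')]
```

Because the generator yields one result per datapoint, the same code works unchanged whether the stream is a finite batch or a live real-time feed.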
[0068] At step 520, the method 500 includes the step of predicting if each datapoint in the second set of data is one of the anomalous datapoint and the normal datapoint by the predicting unit 235. The predicting unit 235 is configured to perform the prediction based on the comparison of the one or more patterns of the first set of data and the one or more patterns of the second set of data. If a datapoint deviates significantly from the learned one or more patterns, it is flagged as an anomaly. The anomalies can be detected in real time, allowing for swift action to mitigate potential threats. The insights gained from the second set of data are integrated into the system for continuous improvement. By analyzing false positives and false negatives, the model can be fine-tuned for better accuracy.
[0069] At step 525, the method 500 includes the step of rendering the interactive graphical representation of datapoints of each of the first set of data and the second set of data and the set of values for each of the datapoints of the first and the second set of data on a display unit 250 by the rendering unit 240. In an embodiment, the set of values includes, but is not limited to, the predicted value, the actual value, the lower threshold, and the upper threshold. The predicted value refers to the output generated by the trained model for a given data point. The predicted values help in identifying one or more deviations from expected behavior. In anomaly detection, if the actual value deviates significantly from the predicted value, it may indicate an anomaly or abnormal condition. The actual value refers to the true or observed value for the same data point, serving as a reference point against which predictions made by the trained model are compared. The actual values are crucial for validating the model's predictions. By comparing the actual values with the predicted values, the system 120 can assess accuracy, refine the model, and determine the presence of anomalies.
[0070] Furthermore, the lower threshold represents a minimum acceptable value for the metric being monitored. The data points falling below the lower threshold may be considered anomalous or indicative of potential issues. The lower threshold helps identify underperformance or potential issues. In an exemplary embodiment, if a network's bandwidth usage drops below the lower threshold, it may signal a failure or a security breach. The lower threshold provides a clear boundary for acceptable performance, guiding intervention when necessary. The upper threshold refers to a maximum acceptable value for the monitored metric. The data points exceeding the upper threshold may be considered anomalous or indicative of potential issues. The upper threshold is essential for detecting overperformance or unexpected spikes in metrics. In an exemplary embodiment, if CPU usage exceeds the upper threshold, it could indicate a denial-of-service attack or system overload. The upper threshold also sets a clear boundary for acceptable operational behavior. In an embodiment, each of the data points is identified as an anomaly if the data point falls below the lower threshold or exceeds the upper threshold.
[0071] In one embodiment, the lower threshold and the upper threshold are represented utilizing at least one of, but not limited to, horizontal lines. The horizontal lines aid in providing a visual reference of the acceptable range between the lower and the upper threshold.
[0072] Upon identifying the anomalies, the display unit 250 displays the interactive graphical representation of datapoints of each of the first set of data and the second set of data. The user can visually analyze the data points. In an embodiment, the anomalies are identified by comparing the predicted value to the lower and upper thresholds; a datapoint falling outside these thresholds is flagged as an anomaly. The interactive nature of the graph allows the user to hover over the data points for additional information (e.g., specific values, time stamps). The user can also zoom in on specific time frames for detailed analysis or filter data to focus on the anomalies.
[0073] At step 530, the method 500 includes the step of plotting the prediction pertaining to the second set of data by the plotting unit 245 upon rendering the interactive graphical representation of datapoints of each of the first set of data and the second set of data. In one embodiment, the plotting unit 245 is configured to plot the prediction based on the interactive graphical representation to identify the one or more anomalies in the second set of data by the user. In another embodiment, an automated handler is pre-configured to select random, desired, or user-pre-entered datapoints to identify the one or more anomalies. The system 120 allows for quick identification of unusual behavior in network usage, enabling proactive measures to address potential issues.
[0074] In another aspect of the embodiment, a non-transitory computer-readable medium having stored thereon computer-readable instructions that, when executed by a processor 205, cause the processor 205 to perform the following operations is disclosed. The processor 205 is configured to receive a first set of data from one or more data sources. The processor 205 is configured to train a model utilizing the first set of data to identify one or more anomalies in the first set of received data. The processor 205 is configured to apply a second set of data received from the one or more data sources to the trained model to identify the one or more anomalies in the second set of received data. The processor 205 is configured to predict if each datapoint in the second set of data is one of the anomalous datapoint and a normal datapoint based on a comparison of the first set of data and of the second set of data. The processor 205 is configured to render an interactive graphical representation of datapoints of each of the first set of data and the second set of data and a set of values for each of the datapoints of the first and the second set of data on a display unit. The processor 205 is configured to plot the prediction pertaining to the second set of data on the interactive graphical representation to identify the one or more anomalies in the second set of data.
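The operations recited above can be tied together in one end-to-end sketch: learn patterns from the first set of data, then identify the anomalies in the second set. This is an illustrative composition under the same simplifying mean/stdev assumption as the earlier sketches, with a function name of our choosing, not the claimed processor logic:

```python
import statistics

def identify_anomalies(first_set, second_set, k=3.0):
    """Train on the first set of data, then return the datapoints of
    the second set that fall outside the learned normal band."""
    mean = statistics.fmean(first_set)
    spread = statistics.stdev(first_set)
    lower, upper = mean - k * spread, mean + k * spread
    return [x for x in second_set if x < lower or x > upper]

print(identify_anomalies([100, 101, 99, 100, 102, 98], [100, 250]))  # -> [250]
```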
[0075] A person of ordinary skill in the art will readily ascertain that the illustrated embodiments and steps in description and drawings (FIGS.1-5) are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments.
[0076] The present disclosure provides a technical advancement by combining machine learning-based anomaly detection with a user-friendly graphical interface, which renders the interactive graphical representation of datapoints of each of the first set of data and the second set of data and the set of values for each of the datapoints of the first and the second set of data. The interactive graphical representation empowers the users to visually inspect and interpret results, enhancing decision-making. Presenting the lower and upper thresholds graphically together with the predicted and actual values enables intuitive and relative anomaly identification.
[0077] The present disclosure provides a technical advancement through intuitive visualization, providing a clear and intuitive visual representation of anomaly detection results and making it easier for the users to interpret and understand the ML model's predictions. The disclosure allows for real-time monitoring and analysis of the datapoints, enabling users to quickly identify and respond to anomalies as they occur. The disclosure enhances user interaction, for example by zooming in on specific regions of interest, hovering over data points for detailed information, and toggling the visibility of predicted and actual values. The disclosure reduces the need for users to have in-depth knowledge of machine learning algorithms, making it accessible to a broader audience, including those without technical expertise. The disclosure enables the users to quickly spot anomalies that fall outside the predefined lower and upper thresholds, facilitating timely decision-making and intervention, and provides transparency and insights into the model's behavior through the user interface, which can enhance user trust in the system's capabilities.
[0078] The present invention offers multiple advantages over the prior art and the above listed are a few examples to emphasize on some of the advantageous features. The listed advantages are to be read in a non-limiting manner.
REFERENCE NUMERALS
[0079] Environment - 100
[0080] Network - 105
[0081] User equipment - 110
[0082] Server - 115
[0083] System - 120
[0084] Processor - 205
[0085] Memory - 210
[0086] User interface - 215
[0087] Receiving unit - 220
[0088] Training unit - 225
[0089] Applying unit - 230
[0090] Predicting unit - 235
[0091] Rendering unit - 240
[0092] Plotting unit - 245
[0093] Display unit - 250
[0094] Database - 255
[0095] Architecture - 300
[0096] Anomaly detection model training unit - 305
[0097] Anomaly detection and output generation unit - 310
[0098] Anomaly visualization tool - 315
[0099] Workflow manager - 320
CLAIMS
We Claim:
1. A method (500) of anomaly identification in a network (105), the method (500) comprising the steps of:
receiving, by one or more processors (205), a first set of data from one or more data sources;
training, by the one or more processors (205), a model utilizing the first set of data to identify one or more anomalies in the first set of received data;
applying, by the one or more processors (205), a second set of data received from the one or more data sources to the trained model to identify the one or more anomalies in the second set of received data;
predicting, by the one or more processors (205), if each datapoint in the second set of data is one of an anomalous datapoint and a normal datapoint based on a comparison of the first set of data and the second set of data;
rendering, by the one or more processors (205), an interactive graphical representation of datapoints of each of the first set of data and the second set of data and a set of values for each of the datapoints of the first and the second set of data on a display unit (250); and
plotting, by the one or more processors (205), the prediction pertaining to the second set of data on the interactive graphical representation to identify the one or more anomalies in the second set of data.
2. The method (500) as claimed in claim 1, wherein the one or more data sources is at least one of a Network Management System (NMS) and a probing unit and wherein the received data is pre-processed and standardized.
3. The method (500) as claimed in claim 1, wherein training the model aids in distinguishing between the anomalous datapoint and the normal datapoint.
4. The method (500) as claimed in claim 1, wherein the second set of data is received in real time from the one or more data sources.
5. The method (500) as claimed in claim 1, wherein the set of values comprises a predicted value, an actual value, a lower threshold, and an upper threshold.
6. The method (500) as claimed in claim 1, wherein each of the data points is identified as an anomaly if the data point falls outside one of the lower threshold and the upper threshold.
7. A system (120) for anomaly identification in a network (105), the system (120) comprises:
a receiving unit (220) configured to receive, a first set of data from one or more data sources;
a training unit (225) configured to train, a model utilizing the first set of data to identify the one or more anomalies in the first set of received data;
an applying unit (230) configured to apply, a second set of data received from the one or more data sources to the trained model to identify the one or more anomalies in the second set of received data;
a predicting unit (235) configured to predict, if each datapoint in the second set of data is one of the anomalous datapoint and a normal datapoint based on a comparison of the first set of data and the second set of data;
a rendering unit (240) configured to render, an interactive graphical representation of datapoints of each of the first set of data and the second set of data and a set of values for each of the datapoints of the first and the second set of data on a display unit (250); and
a plotting unit (245) configured to plot, the prediction pertaining to the second set of data on the interactive graphical representation to identify the one or more anomalies in the second set of data.
8. The system (120) as claimed in claim 7, wherein the one or more data sources is at least one of a Network Management system (NMS) and a probing unit and wherein the received data is pre-processed and standardized.
9. The system (120) as claimed in claim 7, wherein training the model aids in distinguishing between an anomalous datapoint and a normal datapoint.
10. The system (120) as claimed in claim 7, wherein the second set of data is received in real time from the one or more data sources.
11. The system (120) as claimed in claim 7, wherein the set of values comprises a predicted value, an actual value, a lower threshold, and an upper threshold.
12. The system (120) as claimed in claim 7, wherein each of the data points is identified as an anomaly if the data point falls outside one of the lower threshold and the upper threshold.
| # | Name | Date |
|---|---|---|
| 1 | 202321067261-STATEMENT OF UNDERTAKING (FORM 3) [06-10-2023(online)].pdf | 2023-10-06 |
| 2 | 202321067261-PROVISIONAL SPECIFICATION [06-10-2023(online)].pdf | 2023-10-06 |
| 3 | 202321067261-FORM 1 [06-10-2023(online)].pdf | 2023-10-06 |
| 4 | 202321067261-FIGURE OF ABSTRACT [06-10-2023(online)].pdf | 2023-10-06 |
| 5 | 202321067261-DRAWINGS [06-10-2023(online)].pdf | 2023-10-06 |
| 6 | 202321067261-DECLARATION OF INVENTORSHIP (FORM 5) [06-10-2023(online)].pdf | 2023-10-06 |
| 7 | 202321067261-FORM-26 [27-11-2023(online)].pdf | 2023-11-27 |
| 8 | 202321067261-Proof of Right [12-02-2024(online)].pdf | 2024-02-12 |
| 9 | 202321067261-DRAWING [07-10-2024(online)].pdf | 2024-10-07 |
| 10 | 202321067261-COMPLETE SPECIFICATION [07-10-2024(online)].pdf | 2024-10-07 |
| 11 | Abstract.jpg | 2024-12-28 |
| 12 | 202321067261-Power of Attorney [24-01-2025(online)].pdf | 2025-01-24 |
| 13 | 202321067261-Form 1 (Submitted on date of filing) [24-01-2025(online)].pdf | 2025-01-24 |
| 14 | 202321067261-Covering Letter [24-01-2025(online)].pdf | 2025-01-24 |
| 15 | 202321067261-CERTIFIED COPIES TRANSMISSION TO IB [24-01-2025(online)].pdf | 2025-01-24 |
| 16 | 202321067261-FORM 3 [31-01-2025(online)].pdf | 2025-01-31 |