
Method And System For Managing Issues In Information Technology (IT) Infrastructure

Abstract: A method for managing issues in an IT infrastructure is provided. A plurality of events is periodically received based on monitoring of one or more entities. Each event comprises event data and an entity ID corresponding to one of the one or more entities. Each of the plurality of events corresponds to an issue in one of the entities. Vector event data is determined from the event data of each of the events. Each of the events is classified as a noise event or a unique event using a first ML model, based on a probability distribution of words in the vector event data and a result of comparison of the corresponding entity IDs. A set of unique events is determined from the plurality of events. A causal event is determined from the unique events based on a correlative analysis of the unique events and historical event data. A recommended solution for the causal event is then determined. (To be published with FIG. 1)


Patent Information

Application #:
Filing Date: 15 September 2025
Publication Number: 40/2025
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Parent Application:

Applicants

HCL Technologies Limited

806, Siddharth, 96, Nehru Place, New Delhi, 110019, India

Inventors

1. Navin Sabharwal

N-3A, Jangpura Extension, New Delhi, 110014, India

2. Punith Krishnamurthy

No. 21, 5th Cross, Shakthiganapathinagar, 8th Main, Basaveshwaranagar, Bengaluru, Karnataka, 560079, India

Specification

Description:
FIELD OF INVENTION
[0001] This disclosure relates generally to the field of Information Technology (IT) service management, and more particularly to a method and system for performing root cause analysis in an IT infrastructure.
BACKGROUND
[0002] Information Technology (IT) infrastructure management faces substantial challenges in maintaining system reliability and performance across complex, interconnected environments. Modern IT systems generate vast volumes of events and alerts from monitoring tools, creating overwhelming amounts of data that must be analyzed to identify genuine issues. The proliferation of false alarms, duplicate events, and noise in monitoring systems makes it increasingly difficult for IT operations teams to distinguish between actual problems requiring immediate attention and benign system notifications.
[0003] Existing IT service management solutions often rely on manual processes for event correlation, root cause analysis, and issue resolution, leading to prolonged system downtime and inefficient resource utilization. Traditional approaches struggle to automatically correlate related events, identify causal relationships between system components, and provide actionable solutions based on historical incident data. These limitations result in delayed problem resolution, increased operational costs, and reduced system availability. Therefore, there is a need to develop automated systems that can intelligently process event data, eliminate noise, perform accurate root cause analysis, and provide timely recommendations for issue resolution in complex IT environments.
SUMMARY
[0004] This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
[0005] In an embodiment, a method for managing issues in an information technology (IT) infrastructure is disclosed. The method may include receiving, by a processor and in a predefined time duration, a plurality of events based on a monitoring of one or more entities of the IT infrastructure. It may be noted that each of the plurality of events comprises event data and an entity ID corresponding to one of the one or more entities. Further, each of the plurality of events corresponds to at least one issue in the one of the one or more entities. The method may further include determining, by the processor, vector event data from the event data of each of the plurality of events. Further, the method may include classifying, by the processor and by using a first machine learning (ML) model, each of the plurality of events as a noise event or a unique event based on an analysis of at least one of: a probability distribution of a set of words in the vector event data of each of the plurality of events; and a result of comparison of the corresponding entity ID of each of the plurality of events. The method may include determining, by the processor, a set of unique events from the plurality of events based on the classification of each of the plurality of events. The method may further include determining, by the processor and by using a second ML model, a causal event from the set of unique events based on a correlative analysis of the set of unique events and historical event data. The method may further include determining, by the processor and by using the second ML model, a recommended solution for the causal event based on a result of the correlative analysis and a similarity analysis of the causal event and a set of historical solutions for the historical event data.
[0006] In an embodiment, a system for managing issues in an information technology (IT) infrastructure is disclosed. The system may include a processor and a memory communicatively coupled to the processor. In an embodiment, the memory may store processor-executable instructions, which, when executed by the processor, cause the processor to receive, in a predefined time duration, a plurality of events based on a monitoring of one or more entities of the IT infrastructure. In an embodiment, each of the plurality of events comprises event data and an entity ID corresponding to one of the one or more entities. In an embodiment, each of the plurality of events corresponds to at least one issue in the one of the one or more entities. The processor may further determine vector event data from the event data of each of the plurality of events. The processor may classify, by using a first machine learning (ML) model, each of the plurality of events as a noise event or a unique event based on an analysis of at least one of: a probability distribution of a set of words in the vector event data of each of the plurality of events; and a result of comparison of the corresponding entity ID of each of the plurality of events. The processor may determine a set of unique events from the plurality of events based on the classification of each of the plurality of events. The processor may further determine, by using a second ML model, a causal event from the set of unique events based on a correlative analysis of the set of unique events and historical event data. Further, the processor may determine, by using the second ML model, a recommended solution for the causal event based on a result of the correlative analysis and a similarity analysis of the causal event and a set of historical solutions for the historical event data.
[0007] Various objects, features and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures, in which like numerals represent like components.
BRIEF DESCRIPTION OF FIGURES
[0008] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain principles.
[0009] FIG. 1 illustrates a block diagram of a system for managing issues in an information technology (IT) infrastructure, in accordance with an embodiment of the present disclosure.
[0010] FIG. 2 illustrates a functional block diagram of the software architecture implemented within the memory, in accordance with an embodiment of the present disclosure.
[0011] FIG. 3 illustrates a flowchart of a method for managing issues in an information technology (IT) infrastructure, in accordance with an embodiment of the present disclosure.
[0012] FIG. 4 illustrates a detailed flowchart of a method for classifying the one or more events at step 306 of FIG. 3, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
[0013] Exemplary embodiments are described with reference to the accompanying drawings. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope being indicated by the following claims. Additional illustrative embodiments are listed.
[0014] Further, the phrases “in some embodiments”, “in accordance with some embodiments”, “in the embodiments shown”, “in other embodiments”, and the like, mean that a particular feature, structure, or characteristic following the phrase is included in at least one embodiment of the present disclosure and may be included in more than one embodiment. In addition, such phrases do not necessarily refer to the same embodiments or different embodiments. It is intended that the following detailed description be considered exemplary only, with the true scope being indicated by the following claims.
[0015] Referring to FIG. 1, a block diagram of a system 100 for managing issues in an information technology (IT) infrastructure is illustrated, in accordance with an embodiment of the present disclosure. In an embodiment, the IT infrastructure may be implemented across various industries including information technology, healthcare, retail, finance, manufacturing, and telecommunications. In an embodiment, the IT infrastructure may include one or more entities, including but not limited to servers, databases, network devices, cloud computing resources, storage systems, and services. The system 100 may include an issue management device 102, an external device 112, a data server 114, and monitoring systems 116, communicably coupled to each other through a wired or wireless communication network 110. Further, the issue management device 102 may include a processor 104, a memory 106, and an input/output (I/O) device 108.
[0016] In an embodiment, examples of processor(s) 104 may include, but are not limited to, microcontrollers, microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), system-on-chip (SoC) components, or any other suitable programmable logic devices. Examples of processor(s) 104 may include but are not limited to, an Intel® Itanium® or Itanium 2 processor(s), or AMD® Opteron® or Athlon MP® processor(s), Motorola® lines of processors, Nvidia®, FortiSOC™ system on a chip processors or other future processors.
[0017] In an embodiment, the memory 106 may store instructions that, when executed by the processor 104, cause the processor 104 to manage issues in the IT infrastructure, as will be discussed in greater detail herein below. In an embodiment, the memory 106 may be a non-volatile memory or a volatile memory. Examples of non-volatile memory may include, but are not limited to, a flash memory, a Read Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), and an Electrically Erasable PROM (EEPROM) memory. Further, examples of volatile memory may include, but are not limited to, Dynamic Random Access Memory (DRAM) and Static Random Access Memory (SRAM).
[0018] In an embodiment, the I/O device 108 may comprise a variety of interfaces, for example, interfaces for data input and output devices, and the like. The I/O device 108 may facilitate inputting of instructions by a user communicating with the issue management device 102. In an embodiment, the I/O device 108 may be wirelessly connected to the issue management device 102 through wireless network interfaces such as Bluetooth®, infrared, or any other wireless radio communication known in the art. In an embodiment, the I/O device 108 may be connected to a communication pathway for one or more components of the issue management device 102 to facilitate the transmission of inputted instructions and output results of data generated by various components such as, but not limited to, processor(s) 104 and memory 106. In an embodiment, the I/O device 108 may facilitate communication between the issue management device 102 and external systems for receiving event data and rendering solutions via a graphical user interface (GUI).
[0019] In an embodiment, the data server 114 may be enabled in a remote cloud server or a co-located server and may include a database (not shown) to store a plurality of events, entity ID, semantic information, temporal information, weighted temporal relationship, vector representation, topology graph, and any other data necessary for the system 100 to manage issues in the IT infrastructure. The data server 114 may maintain historical event data, including sets of historical root cause events and corresponding historical solutions, which the processor 104 may access during similarity analysis and correlative analysis operations. In an embodiment, the data server 114 may store data input by the external device 112 or output generated by the issue management device 102. In an embodiment, the issue management device 102 may be communicatively coupled with the data server 114 through the communication network 110.
[0020] In an embodiment, the communication network 110 may be a wired or a wireless network or a combination thereof. The communication network 110 can be implemented as one of the different types of networks, such as but not limited to, an Ethernet IP network, an intranet, a local area network (LAN), a wide area network (WAN), or a metropolitan area network (MAN). Various devices in the system 100 may be configured to connect to the communication network 110 in accordance with various wired and wireless communication protocols. Examples of such wired and wireless communication protocols may include, but are not limited to, Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), ZigBee, EDGE, IEEE 802.11, light fidelity (Li-Fi), IEEE 802.16, IEEE 802.11s, IEEE 802.11g, multi-hop communication, wireless access point (AP), device-to-device communication, cellular communication protocols, and Bluetooth (BT) communication protocols. Further, the communication network 110 can include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[0021] In an embodiment, the issue management device 102 may receive a plurality of inputs from the plurality of external devices 112a-112n through the communication network 110. In an embodiment, the issue management device 102 and each of the plurality of external devices 112 may be a computing system, including but not limited to, a laptop computer, a desktop computer, a notebook, a workstation, a server, a portable computer, a handheld device, or a mobile device. In an embodiment, the issue management device 102 may be built into the external device 112 or may be a standalone computing device. The external devices 112a-112n may represent various IT infrastructure entities such as servers, virtual machines, network devices, storage systems, and application components that generate events during normal operations or when issues occur.
[0022] In an embodiment, the monitoring systems 116 may include various hardware and software components configured to continuously track, analyse, and report events within the IT infrastructure. The monitoring systems 116 may include, but are not limited to, network monitoring tools, application performance monitoring (APM) systems, security information and event management (SIEM) solutions, infrastructure monitoring tools, and log management systems. These systems may be deployed across different layers of the IT infrastructure, including servers, databases, cloud resources, network devices, and application services. The monitoring systems 116 may collect a plurality of events and associated event data corresponding to the one or more entities of the IT infrastructure.
[0023] The processor 104 may be configured to periodically receive a plurality of events based on monitoring of one or more entities of an information technology (IT) infrastructure. It may be noted that each of the plurality of events comprises event data and an entity ID corresponding to one of the one or more entities. In some embodiments, the event data may include descriptive information about system conditions such as "CPU utilization exceeded 90% threshold on server-01", "Database connection timeout occurred on MySQL instance", or "Network latency spike detected between data centers". The event data may also comprise temporal information indicating when the event occurred, severity levels ranging from informational to critical, and contextual details about the affected system components. In some cases, the event data may contain structured log entries with error codes, performance metrics, and diagnostic messages that provide technical details about the underlying issue affecting the IT infrastructure entity.
[0024] In some embodiments, the entity ID may comprise predefined identifiers that correspond to each of the one or more entities of the IT infrastructure, such as "SRV-001" for a database server, "NET-SWITCH-05" for a network switch, "VM-WEB-APP-12" for a virtual machine hosting a web application, or "STORAGE-CLUSTER-03" for a storage cluster. In an embodiment, the entity ID may also include hierarchical identifiers that reflect the organizational structure of the IT environment.
[0025] Further, each of the plurality of events may correspond to at least one issue in the one of the one or more entities. Further, the processor 104 may determine vector event data from the event data of each of the plurality of events. The processor 104 may then classify each of the plurality of events as a noise event or a unique event using a first ML model based on an analysis of at least one of: a probability distribution of a set of words in the vector event data of each of the plurality of events; and a result of comparison of the corresponding entity ID of each of the plurality of events.
[0026] The associated event data may include semantic information and temporal information. The semantic information may include, but is not limited to, event parameters (i.e., event source, event severity, event type, entity identifier, user information, process or transaction ID, and configuration data), event title, and event description. The temporal information may include, but is not limited to, the time of occurrence of the corresponding event (i.e., the event timestamp).
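As an illustration only, the event data described in paragraphs [0023] to [0026] could be held in a record like the one below. The class name `Event`, its field names, and the `semantic_text` helper are hypothetical; the disclosure does not define a concrete schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict


@dataclass
class Event:
    """One monitored event; all names here are illustrative assumptions."""
    entity_id: str        # e.g. "SRV-001" or "NET-SWITCH-05"
    title: str            # semantic information: event title
    description: str      # semantic information: event description
    timestamp: datetime   # temporal information: time of occurrence
    severity: str = "informational"
    # Remaining event parameters: source, type, user, process/transaction ID, config data.
    parameters: Dict[str, str] = field(default_factory=dict)

    def semantic_text(self) -> str:
        """Title and description joined, as input to the text-processing pipeline."""
        return f"{self.title}. {self.description}"
```

Keeping the timestamp separate from the free text makes it easy to use the temporal information later without parsing it out of the description.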
[0027] Further, the processor 104 may determine the vector event data of each of the plurality of events. In an embodiment, the vector event data may comprise vector representations of event descriptions generated using a Large Language Model (LLM). In an embodiment, the LLM may be trained on contextual information associated with one or more domains to determine the vector representations of event descriptions. The issue management device 102 may further store the vector representations and the corresponding event descriptions in the database within the data server 114.
[0028] Thus, the processor 104 may determine a set of unique events from the plurality of events based on the classification of each of the plurality of events. Further, the processor 104 may determine a causal event from the set of unique events using a second machine learning model based on a correlative analysis of the set of unique events and historical event data. The processor 104 may then determine a recommended solution for the causal event based on a result of the correlative analysis and a similarity analysis of the causal event and a set of historical solutions for the historical event data.
[0029] In some cases, the issue management device 102 may be integrated with the monitoring systems 116 as a plugin, providing an aggregated view to users directly at the monitoring-system level without requiring separate interfaces or data transfer mechanisms. This integration may enable seamless data flow from event detection to solution recommendation within a unified operational environment. When event rates are very high and high-throughput processing becomes necessary, the system may utilize NoSQL databases instead of traditional SQL databases to handle the increased data volume and concurrent processing demands.
[0030] Referring to FIG. 2, a functional block diagram of the software architecture implemented within the memory 106 is illustrated, in accordance with an embodiment of the present disclosure. The memory 106 may store processor-executable instructions that, when executed by the processor 104, cause the processor 104 to perform various operations through specialized modules such as, but not limited to, an events receiving module 202, a preprocessing module 204, a vector representation conversion module 206, a classification module 208, a causal event determination module 212, a recommended solution determination module 214, a rendering module 216, a validation module 218 and a training module 220.
[0031] The events receiving module 202 may be configured to periodically receive a plurality of events based on monitoring of one or more entities 112a-112n by the monitoring systems 116 through the communication network 110. The events receiving module 202 may handle incoming event streams in real-time, managing high-volume data ingestion from multiple monitoring sources simultaneously. In some cases, the events receiving module 202 may implement queuing mechanisms to buffer incoming events during peak load periods, ensuring no event data is lost during processing.
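A minimal sketch of the buffering behaviour described for the events receiving module 202, assuming a single in-process queue and a fixed time window per processing cycle. The class `EventBuffer`, the method `drain_window`, and the tuple layout of an event are hypothetical; a production deployment would more likely use a message broker.

```python
import queue
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical raw event shape: (entity_id, description, timestamp).
RawEvent = Tuple[str, str, datetime]


class EventBuffer:
    """FIFO buffer that absorbs bursts of incoming events (illustrative only)."""

    def __init__(self, maxsize: int = 10_000) -> None:
        self._q: "queue.Queue[RawEvent]" = queue.Queue(maxsize=maxsize)

    def put(self, event: RawEvent) -> None:
        # Blocks the producer while the buffer is full instead of dropping the event.
        self._q.put(event)

    def drain_window(self, start: datetime, duration: timedelta) -> List[RawEvent]:
        """Return buffered events with timestamps in [start, start + duration).

        Events outside the window are put back for a later call. Assumes no other
        thread puts events while draining; otherwise re-queuing could block.
        """
        end = start + duration
        in_window: List[RawEvent] = []
        outside: List[RawEvent] = []
        while True:
            try:
                event = self._q.get_nowait()
            except queue.Empty:
                break
            (in_window if start <= event[2] < end else outside).append(event)
        for event in outside:
            self._q.put(event)
        return in_window
```

Draining by time window corresponds to receiving the plurality of events "in a predefined time duration"; the window length is a tunable assumption.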
[0032] The preprocessing module 204 may be communicatively coupled to the events receiving module 202 and may be configured to process the received event data before further analysis. The preprocessing module 204 may implement masking techniques for security purposes when the system consumes external large language models, protecting sensitive information such as IP addresses, hostnames, and other confidential data elements. In some cases, the preprocessing module 204 may perform natural language processing operations including tokenization, lemmatization, stemming, and removal of irrelevant words from event descriptions. The tokenization techniques involve splitting the event description of each of the plurality of events into individual words. For example, the event description “CPU utilization is high on serverA which is more than threshold value” may be split into the words “CPU”, “utilization”, “is”, “high”, “on”, “serverA”, “which”, “is”, “more”, “than”, “threshold”, and “value”, so that each word is assigned a token. The stemming techniques may involve reducing words in the event description to their base or root form by removing suffixes and prefixes. For example, the words “monitoring”, “monitored”, and “monitor” may be reduced to their common root form, “monitor”, to standardize the event description. The lemmatization techniques may further analyse the context of the words of the event description and convert them into their canonical form (lemma). For example, the words “better” and “good” may be converted to a common lemma, “good”, so that semantically related words are treated as equivalent. The filtering techniques may be employed to remove irrelevant or redundant information from the event description, such as common stop words (e.g., “and”, “the”, “is”), special characters, or noise that does not contribute to the event's meaning.
The masking techniques may be applied to replace sensitive or variable information, such as IP addresses, hostnames, or user-specific details, with generalized placeholders. The preprocessing module 204 may also replace specific device identifiers with corresponding placeholders, for example by converting an IP address such as "10.1.1.1" into a generic placeholder token, enabling the system to identify similar issues across different devices without being influenced by device-specific information. The preprocessing module 204 may also filter out watermark text and other non-relevant content that the monitoring systems 116 may have appended to event descriptions.
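The preprocessing steps of paragraph [0032] can be sketched with the Python standard library alone. This is a simplified illustration under stated assumptions: the placeholder tokens `<IP>` and `<HOST>`, the stop-word list, and the lemma table are hypothetical, host names are assumed to come from a known inventory, and the suffix stripper is a toy that is not the Porter stemmer or any library lemmatizer.

```python
import re
from typing import Iterable, List

# Hypothetical placeholder tokens; the disclosure does not specify their spelling.
IP_PLACEHOLDER = "<IP>"
HOST_PLACEHOLDER = "<HOST>"
# Dotted-quad IPv4 pattern (does not validate that each octet is <= 255).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Small illustrative stop-word list and lemma table (assumptions).
STOP_WORDS = {"a", "an", "and", "the", "is", "on", "which", "more", "than", "of", "at"}
LEMMAS = {"better": "good", "best": "good"}
# Toy suffix stripper for illustration; this is NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "s")


def mask(description: str, known_hosts: Iterable[str] = ()) -> str:
    """Replace IPv4 addresses and known host names with generic placeholders."""
    masked = IPV4.sub(IP_PLACEHOLDER, description)
    # Longest names first, so "db-01-replica" is masked before its prefix "db-01".
    for host in sorted(known_hosts, key=len, reverse=True):
        masked = re.sub(rf"\b{re.escape(host)}\b", HOST_PLACEHOLDER, masked)
    return masked


def tokenize(text: str) -> List[str]:
    """Split text into alphanumeric tokens (punctuation and symbols are dropped)."""
    return re.findall(r"[A-Za-z0-9]+", text)


def stem(word: str) -> str:
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(description: str, known_hosts: Iterable[str] = ()) -> List[str]:
    """Mask, tokenize, lemmatize (table lookup), drop stop words, then stem."""
    tokens = [t.lower() for t in tokenize(mask(description, known_hosts))]
    tokens = [LEMMAS.get(t, t) for t in tokens]
    return [stem(t) for t in tokens if t not in STOP_WORDS]
```

For the example description in paragraph [0032], `tokenize` yields the twelve listed words, and `preprocess(desc, ["serverA"])` reduces it to `["cpu", "utilization", "high", "host", "threshold", "value"]`, so the same complaint about a different server produces the same token list.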
[0033] The vector representation conversion module 206 may be operatively connected to the preprocessing module 204 and may be configured to convert the processed event data into vector representations using large language models. In an embodiment, the vector representations may be stored in a vector database provided in the data server 114. The vector representation conversion module 206 may generate high-dimensional vectors, typically ranging from 768 to 1536 dimensions, that capture the semantic meaning of event descriptions in a mathematical format suitable for similarity analysis. In some cases, the vector representation conversion module 206 may utilize either sentence embedding techniques or large language model-based sentence embeddings, depending on whether the deployment is on-premise or in a public cloud environment. The vector representation conversion module 206 may store the generated vector representations in a vector database, creating a searchable repository of event embeddings that enables rapid similarity comparisons. The vector representation conversion module 206 may implement Principal Component Analysis (PCA) to reduce vector dimensions for computational efficiency while retaining approximately 95% of the variance across the corpus, thereby optimizing processing speed without sacrificing analytical accuracy. Examples of the LLM may include, but are not limited to, OpenAI's Generative Pre-trained Transformer (GPT)-3, GPT-3.5, and GPT-4, Language Model for Dialogue Applications (LaMDA), Pathways Language Model (PaLM), Gemini, Claude, BigScience Large Open-science Open-access Multilingual Language Model (BLOOM), Large Language Model Meta AI (Llama), Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, or the like.
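To show what "a searchable repository of event embeddings" amounts to, here is a toy in-memory stand-in for the vector database. It assumes the embeddings have already been produced by some embedding model, and it omits the PCA step. The class `InMemoryVectorStore` and its methods are hypothetical and do not reflect any particular vector-database API.

```python
import math
from typing import Dict, List, Sequence, Tuple


class InMemoryVectorStore:
    """Toy stand-in for a vector database: stores embeddings, ranks by cosine."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}
        self._texts: Dict[str, str] = {}

    @staticmethod
    def _normalize(vec: Sequence[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("zero vector cannot be normalized")
        return [x / norm for x in vec]

    def add(self, event_id: str, text: str, embedding: Sequence[float]) -> None:
        # Store unit-length vectors so a dot product equals cosine similarity.
        self._vectors[event_id] = self._normalize(embedding)
        self._texts[event_id] = text  # keep the description next to its vector

    def text(self, event_id: str) -> str:
        return self._texts[event_id]

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        """Return the k stored events most similar to the query vector."""
        q = self._normalize(query)
        scored = [(eid, sum(a * b for a, b in zip(q, v))) for eid, v in self._vectors.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

The linear scan is fine for a sketch; real vector databases use approximate nearest-neighbour indexes to keep searches fast at scale.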
[0034] The classification module 208 may be coupled to the vector representation conversion module 206 and may be configured to classify each of the plurality of events as a noise event or a unique event using the first ML model. The first ML model may be trained based on domain knowledge related to the IT infrastructure and historical event data. The classification module 208 may analyze the probability distribution of a set of words in the vector event data and perform entity ID comparisons to determine the event classification. In some cases, the classification module 208 may calculate probability scores for events having identical entity IDs and compare these scores against configured threshold values to distinguish between noise events and unique events. The classification module 208 may determine a first group of events from the plurality of events having identical entity IDs based on the entity ID comparisons. Further, the classification module 208 may calculate a probability score for each of the first group of events based on the probability distribution of the set of words. The classification module 208 may compare the probability score for each of the first group of events with a configured threshold value. It may be noted that each of the first group of events having a probability score above the configured threshold value may be classified as the noise event. Further, each of the first group of events having a probability score below or equal to the configured threshold value may be classified as the unique event.
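The grouping-and-threshold rule of paragraph [0034] can be sketched as follows. The scoring function is passed in as a parameter, so any word-probability model can be plugged in. The function name `classify_events`, the `(entity_id, description)` tuple, and the choice to treat an event whose entity ID occurs only once as unique are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str]  # hypothetical (entity_id, description) pair


def classify_events(
    events: List[Event],
    score: Callable[[str], float],
    threshold: float,
) -> Dict[str, List[Event]]:
    """Label events as 'noise' or 'unique' using the rule in paragraph [0034].

    Events are grouped by identical entity ID. Within a group of two or more,
    a score above the threshold marks an event as noise; a score at or below
    it marks the event as unique. A lone event is kept as unique (assumption).
    """
    groups: Dict[str, List[Event]] = defaultdict(list)
    for ev in events:
        groups[ev[0]].append(ev)

    result: Dict[str, List[Event]] = {"noise": [], "unique": []}
    for group in groups.values():
        for ev in group:
            if len(group) > 1 and score(ev[1]) > threshold:
                result["noise"].append(ev)
            else:
                result["unique"].append(ev)
    return result
```

Keeping the scorer separate from the grouping logic lets the threshold be tuned per deployment without touching the entity-ID comparison.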
[0035] The classification module 208 may implement automatic algorithm selection capabilities, evaluating multiple classification algorithms and selecting the optimal algorithm for the training data if evaluation score requirements are satisfied. The classification module 208 may handle imbalanced training data by implementing various sampling methods including under-sampling, over-sampling, and Synthetic Minority Oversampling Technique (SMOTE) to ensure balanced model training. The classification module 208 may utilize both cosine similarity and L2 distance calculations for similarity scoring, where lower L2 distance values may indicate higher similarity between events.
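The two similarity measures named in paragraph [0035] are standard and can be written in a few lines. This is a generic sketch, not the module's actual code. For unit-length vectors the two measures are tied by the identity L2² = 2 - 2·cos, so they rank neighbours identically; they differ only when vectors are not normalized.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance: 0.0 means identical vectors; lower is more similar."""
    return math.dist(a, b)
```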
[0036] Consider an exemplary scenario in which multiple events related to the issue "high CPU utilization on a server" are received as follows:
“Event 1: CPU utilization high on server serverA which is more than threshold value 70% at time t
Event 2: System load high on server serverA at time t+2
Event 3: CPU utilization high on server serverA which is more than threshold value 70% at time t+3 ...
Event n: System load high on server serverA at time t+n”
[0037] Since all of these events originate from the same entity, “serverA”, and describe the same issue, “high CPU utilization”, their vector representations may yield a high similarity score that exceeds the configured threshold value. Because the events are reported at different times (e.g., t, t+2, t+3, ..., t+n), they may initially appear as independent events within the IT infrastructure. However, upon further analysis, the system may identify them as repeated notifications of the same problem. Thus, to avoid wasting effort on resolving each event independently, the classification module 208 may determine a first group of events from the plurality of events having identical entity IDs. Further, the classification module 208 may calculate a probability score for each of the first group of events based on the probability distribution of the set of words.
[0038] Thus, to classify an event as a noise event, the probability distribution of the words in the event title or description may be calculated. The proposed system may generate a probability distribution for each of the words extracted in the preprocessing step. For example, $P(w \mid \{w_1, w_2, \ldots, w_n\})$ represents the probability distribution of a word $w$ given the set of words $\{w_1, w_2, \ldots, w_n\}$ that occurred earlier, i.e., the probability of the word $w$ occurring if the words in the set $\{w_1, w_2, \ldots, w_n\}$ have already occurred.
For example: “Event 1: CPU utilization is high on server serverA which is more than threshold value 70%”.
In this case, there is a very high probability that the words “CPU” and “utilization” will appear together or in the same context. In this example, the classification module 208 may calculate the same probability for every word present in the corpus with respect to the events and store this distribution as a trained model, which is used in equation (1) below. Further, when a new event arrives, it may pass through the same NLP pipeline, in which tokenization, lemmatization, stemming, and similar-word identification take place. For each extracted or converted word or token, the system then looks up the probability distribution value and calculates the sum of these probabilities. If the resulting probability score for an event exceeds the configured threshold value, the event may be considered noise; if the probability score is lower than or equal to the configured threshold value, the event may be classified as the unique event.
$$P(\text{event as noise}) = \sum_{i=1}^{n} P\left(w_i \mid \{w_1, \ldots, w_n\} \setminus \{w_i\}\right) \qquad (1)$$
[0039] It may be noted that P(event as noise) represents the probability score of an event being classified as noise. Thus, an event whose set of words yields a probability score less than or equal to the configured threshold value may be classified as a unique event, and an event whose probability score is greater than the configured threshold value may be classified as a noise event.
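Equation (1) and the threshold rule can be sketched as follows. The conditional-probability estimator is passed in as a callable, so any model (for example, a co-occurrence estimate) can be plugged in; the function names and the callable signature are hypothetical.

```python
from typing import Callable, Sequence

# cond_prob(word, context_words) -> estimated P(word | context_words)
CondProb = Callable[[str, Sequence[str]], float]


def noise_score(words: Sequence[str], cond_prob: CondProb) -> float:
    """Equation (1): sum over i of P(w_i | all other words of the event)."""
    words = list(words)
    return sum(cond_prob(w, words[:i] + words[i + 1:]) for i, w in enumerate(words))


def classify(words: Sequence[str], cond_prob: CondProb, threshold: float) -> str:
    """Noise if the score exceeds the configured threshold, otherwise unique."""
    return "noise" if noise_score(words, cond_prob) > threshold else "unique"
```

Because equation (1) is a sum over all words, the score grows with the number of words in an event, so the configured threshold has to be chosen with typical event lengths in mind.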
[0040] The mathematical approaches implemented during the probability analysis process may involve complex statistical calculations that may evaluate word frequency distributions, contextual relationships, and temporal patterns within the event data. The classification module 208 may utilize probability distribution functions to model the likelihood of specific word sequences appearing in different types of events, creating mathematical representations that may enable accurate distinction between noise events and unique events. In some cases, the probability calculations may incorporate Bayesian inference techniques that may update probability estimates based on new event data and historical patterns observed in the IT infrastructure. The mathematical models may also consider entity-specific factors, analyzing how probability distributions may vary across different types of IT infrastructure entities such as servers, network devices, storage systems, and application components.
[0041] With continued reference to FIG. 2, a causal event determination module 212 may be operatively connected to the classification module 208 and may be configured to determine causal events from the set of unique events using the second machine learning model. The causal event determination module 212 may perform correlative analysis of the set of unique events and historical event data, examining temporal sequences and relationships between events to identify root causes of issues in any of the entities. As discussed earlier, the event data may include semantic information and temporal information. Further, the historical event data may include a set of historical root cause events of a set of historical events and the set of historical solutions for the set of historical root cause events. Further, the correlative analysis may be performed by determining, using the second ML model, a similarity score between each of the set of unique events and the set of historical root cause events based on the associated semantic information and the temporal information. Thus, the causal event determination module 212 may use the second ML model to determine a pair of unique event and historical solution from the set of unique events and the set of historical solutions for which the similarity score is above a predefined threshold value. The second ML model may be trained based on domain knowledge related to the IT infrastructure and the historical event data to perform the correlative analysis.
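The correlative analysis above can be sketched as a weighted combination of a semantic score (cosine similarity of event vectors) and a temporal score. Purely for illustration, the temporal information is assumed to be the hour of day at which an event occurred, compared on a 24-hour circle; the weight `alpha`, the 0.8 threshold and the record layouts are hypothetical, and this heuristic stands in for the second ML model rather than reproducing it.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def temporal_similarity(hour_a, hour_b):
    """1.0 for the same hour of day, 0.0 for hours 12 apart (circular distance)."""
    diff = abs(hour_a - hour_b) % 24
    diff = min(diff, 24 - diff)
    return 1.0 - diff / 12.0


def match_pairs(unique_events, history, threshold=0.8, alpha=0.7):
    """Return (unique_event_id, historical_solution, score) for scores above threshold.

    unique_events: list of (event_id, vector, hour_of_day)
    history: list of (root_cause_vector, hour_of_day, solution)
    """
    pairs = []
    for event_id, vector, hour in unique_events:
        for h_vector, h_hour, solution in history:
            score = (alpha * cosine(vector, h_vector)
                     + (1 - alpha) * temporal_similarity(hour, h_hour))
            if score > threshold:
                pairs.append((event_id, solution, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)  # best match first
```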
[0042] In some cases, the causal event determination module 212 may create topology graphs that may map relationships between different IT infrastructure entities, analyzing how issues in one component may cascade to affect other components. The causal event determination module 212 may implement transfer learning techniques to leverage existing knowledge from pre-trained models when adapting to new IT environments, enabling faster deployment and improved accuracy in unfamiliar infrastructure configurations. The causal event determination module 212 may store feedback data in vector database collections as an alternative to training multi-class classification models, providing flexibility in recommendation approaches.
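One way such a topology graph could be used for cascade analysis is sketched below, assuming a dependency map from each entity to the entities it depends on: an alerting entity is treated as a root-cause candidate when nothing upstream of it is also alerting. The map format and the function names are hypothetical.

```python
def upstream(entity, depends_on):
    """All entities that `entity` transitively depends on (iterative DFS, cycle-safe)."""
    seen, stack = set(), list(depends_on.get(entity, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(depends_on.get(node, []))
    return seen


def root_candidates(alerting, depends_on):
    """Alerting entities with no alerting entity anywhere upstream of them."""
    return {e for e in alerting if not (upstream(e, depends_on) & alerting)}
```

For a chain web → app → db → storage in which web, app and db are alerting, only db is reported, because the alerts on app and web can be explained as a cascade from it.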
[0043] The recommended solution determination module 214 may be communicatively coupled to the causal event determination module 212 and may be configured to determine recommended solutions for identified causal events based on a result of the correlative analysis and a similarity analysis of the causal event and the set of historical solutions for the historical event data. It may be noted that the recommended solutions may include a set of actionable steps for addressing the at least one issue. The set of actionable steps may correspond to, but are not limited to, configuration changes, resource allocations, and/or component replacements. Further, the recommended solution determination module 214 may recommend, using the second ML model, a maintenance time window for executing the recommended solution in the IT infrastructure based on a predefined maintenance schedule.
[0044] The recommended solution determination module 214 may perform similarity analysis between current causal event and historical solutions stored in the data server 114, identifying previously successful resolution approaches for similar issues. In some cases, the recommended solution determination module 214 may search both script collections and ticket collections simultaneously, combining results from multiple sources to provide comprehensive solution recommendations. The recommended solution determination module 214 may generate actionable steps corresponding to configuration changes, resource allocations, and component replacements based on the correlative analysis results. The recommended solution determination module 214 may recommend maintenance time windows for executing solutions based on predefined maintenance schedules and historical execution patterns.
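A minimal sketch of searching the script and ticket collections at the same time and combining their results, assuming each collection exposes a search callable that returns (solution_id, similarity, source) hits. The callables, the record layout and `top_k` are hypothetical; `concurrent.futures.ThreadPoolExecutor` from the Python standard library runs the two searches concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def merge_results(script_hits, ticket_hits, top_k=3):
    """Keep the best score per solution across both collections and rank descending.

    Each hit is a (solution_id, similarity, source) tuple (hypothetical layout).
    """
    best = {}
    for solution_id, score, source in list(script_hits) + list(ticket_hits):
        if solution_id not in best or score > best[solution_id][0]:
            best[solution_id] = (score, source)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(solution_id, score, source) for solution_id, (score, source) in ranked[:top_k]]


def search_both(query, search_scripts, search_tickets, top_k=3):
    """Run both collection searches concurrently, then merge their hits."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        scripts = pool.submit(search_scripts, query)
        tickets = pool.submit(search_tickets, query)
        return merge_results(scripts.result(), tickets.result(), top_k)
```

Deduplicating by solution ID keeps a solution that appears in both collections from crowding out other candidates in the ranked list.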
[0045] The rendering module 216 may render the causal event and the recommended solution via a graphic user interface through the input output device 108. The rendering module 216 may present event information, correlation analysis results, and solution recommendations in an intuitive visual format that enables operators to quickly understand issue relationships and recommended actions. In some cases, the rendering module 216 may implement caching mechanisms such as, but not limited to, Redis for faster retrieval of latest alerts data, improving response times when displaying real-time event information to users. The rendering module 216 may enable collaboration features between users of the same organization, allowing team members to communicate regarding particular issues or actionables through integrated messaging and annotation capabilities within the graphic user interface.
[0046] The validation module 218 may allow users to validate the recommended solution for the causal event based on user feedback received via the graphic user interface. The validation module 218 may capture user inputs regarding the accuracy and effectiveness of recommended solutions, storing this feedback for continuous system improvement. In some cases, the validation module 218 may process user corrections and alternative solution suggestions, incorporating this information into the training data for model refinement and training of the first ML model and the second ML model. The validation module 218 may track solution success rates and user satisfaction metrics to evaluate system performance over time. The training module 220 may be operatively connected to the validation module 218 and may be configured to continuously train the first ML model and the second ML model based on the user feedback and domain knowledge related to the IT infrastructure and historical event data. The training module 220 may implement adaptive learning algorithms that incorporate new feedback data while maintaining previously learned patterns, ensuring the system evolves with changing IT infrastructure requirements and user preferences.
[0047] Referring to FIG. 3, a flowchart 300 of a method for managing issues in an information technology (IT) infrastructure is illustrated, in accordance with an embodiment of the present disclosure. FIG. 3 is explained in conjunction with FIGs. 1 and 2. In an embodiment, the flowchart 300 may include a plurality of steps that may be executed by various modules saved in memory 106. The method may begin at a step 302, where the processor 104 may periodically receive a plurality of events based on monitoring of one or more entities of the IT infrastructure. Each of the plurality of events may comprise event data and an entity ID corresponding to one of the one or more entities, and each of the plurality of events may correspond to at least one issue in the one of the one or more entities. In some cases, the step 302 may involve implementing buffering mechanisms to handle high-volume event streams during peak operational periods, so that no event data is lost during the ingestion process.
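The buffering mentioned for step 302 can be sketched with a bounded FIFO queue from the Python standard library. In this sketch, which is an assumption rather than the disclosed design, a full buffer blocks the producer instead of discarding events, so nothing is lost during bursts; `EventBuffer` and its default capacity are hypothetical.

```python
import queue


class EventBuffer:
    """Bounded FIFO buffer between event ingestion and processing.

    put() blocks while the buffer is full, applying backpressure to the
    producer instead of dropping events.
    """

    def __init__(self, capacity=10_000):
        self._queue = queue.Queue(maxsize=capacity)

    def put(self, event):
        self._queue.put(event)  # blocks until space is available

    def drain(self, max_items):
        """Return up to max_items buffered events, oldest first, without blocking."""
        batch = []
        while len(batch) < max_items:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch
```

Draining in batches lets the downstream vectorization step process events in groups during peak periods.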
[0048] At step 304, the processor 104 may determine vector event data from the event data of each of the plurality of events. The vector representation conversion module 206 may generate vector representations of event descriptions using a language model and may store the vector representations in a vector database within the data server 114. In some cases, the step 304 may involve utilizing either sentence embedding techniques or large language model-based sentence embedding, depending on whether the deployment is on-premises or in a public cloud environment. The vector representation conversion module 206 may generate high-dimensional vectors that capture the semantic meaning of the event descriptions in a numerical form suitable for subsequent similarity analysis operations.
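The sketch below illustrates storing and searching event vectors, assuming a hashed bag-of-words vector as a stand-in for the language-model sentence embedding and a Python list as a stand-in for the vector database. `embed`, `InMemoryVectorStore` and the 256-dimension size are hypothetical; a real deployment would call an embedding model and a vector database instead.

```python
import hashlib
import math
import re

DIM = 256


def embed(text, dim=DIM):
    """Stand-in for a sentence embedding: hashed bag of words, L2-normalised."""
    vector = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        index = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dim
        vector[index] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


class InMemoryVectorStore:
    """Minimal stand-in for a vector database collection."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, item_id, text):
        self.ids.append(item_id)
        self.vectors.append(embed(text))

    def search(self, text, k=1):
        """Return the k most similar (item_id, score) pairs."""
        query = embed(text)
        # non-empty vectors are unit length, so the dot product is the cosine similarity
        scored = [
            (sum(a * b for a, b in zip(query, vector)), item_id)
            for item_id, vector in zip(self.ids, self.vectors)
        ]
        scored.sort(reverse=True)
        return [(item_id, score) for score, item_id in scored[:k]]
```

Unlike a language-model embedding, the hashed bag of words does not place synonyms near each other; it only serves to show the store-and-search flow.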
[0049] With continued reference to FIG. 3, the method may advance to a step 306, where the processor 104 may classify each of the plurality of events as a noise event or a unique event using a first ML model based on an analysis of at least one of: a probability distribution of a set of words in the vector event data of each of the plurality of events and a result of comparison of the corresponding entity ID of each of the plurality of events. The classification module 208 may perform this classification by analyzing patterns in the vector event data and comparing entity identifiers to identify duplicate or related events. In some cases, the step 306 may involve the classification module 208 implementing automatic algorithm selection, where multiple classification algorithms may be evaluated and the best-performing algorithm may be selected for the training data if the evaluation score requirements are satisfied. The first ML model may utilize existing data from previous engagements or environments to generate hyperparameters for the recommendation system using Inverse Reinforcement Learning, enabling the system to leverage historical knowledge patterns for improved classification accuracy.
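The automatic algorithm selection of step 306 can be sketched as: evaluate every candidate, keep the best, and accept it only if it meets the required evaluation score. Each candidate is assumed to be a zero-argument callable that trains on the training data and returns a validation score; the names and scores below are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


def select_algorithm(candidates: Dict[str, Callable[[], float]],
                     min_score: float) -> Optional[Tuple[str, float]]:
    """Evaluate every candidate and return (name, score) of the best one,
    or None if no candidate meets the evaluation-score requirement.
    """
    scores = {name: evaluate() for name, evaluate in candidates.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] < min_score:
        return None
    return best, scores[best]
```

Returning None rather than the best failing candidate leaves it to the caller to keep the previously deployed model when no new algorithm is good enough.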
[0050] Further, at step 308, the processor 104 may determine a set of unique events from the plurality of events based on the classification of each of the plurality of events performed in the step 306. The classification module 208 may filter out events that may have been classified as noise events, retaining only those events that may have been identified as unique events for further processing. In some cases, the step 308 may involve the processor 104 grouping events that may share similar characteristics while maintaining their individual identities within the set of unique events. The step 308 may also involve the classification module 208 storing the classification results and event mappings in database structures that may facilitate rapid retrieval during subsequent analysis operations.
[0051] Further, at step 310, the processor 104 may determine a causal event from the set of unique events using a second ML model based on a correlative analysis of the set of unique events and historical event data. The causal event determination module 212 may perform this correlative analysis by examining temporal sequences and relationships between events to identify root causes of issues affecting the IT infrastructure entities. In some cases, the step 310 may involve the causal event determination module 212 creating topology graphs that may map relationships between different IT infrastructure entities, analyzing how issues in one component may cascade to affect other components. The second ML model may implement transfer learning techniques to leverage existing knowledge from pre-trained models when adapting to new IT environments, enabling faster deployment and improved accuracy in unfamiliar infrastructure configurations.
[0052] Further, at step 312, the processor 104 may determine a recommended solution for the causal event using the second ML model based on a result of the correlative analysis and a similarity analysis of the causal event and a set of historical solutions for the historical event data. The recommended solution determination module 214 may perform similarity analysis between the current causal event and historical solutions stored in the data server 114, identifying previously successful resolution approaches for similar issues. In some cases, the step 312 may involve the recommended solution determination module 214 integrating with automation systems such as, but not limited to, Ansible, for script execution and ticket updates, enabling automated implementation of recommended solutions once the appropriate approval workflows have been completed. The step 312 may also involve the recommended solution determination module 214 searching both script collections and ticket collections simultaneously, combining results from multiple sources to provide comprehensive solution recommendations that may address the identified causal event effectively.
[0053] At step 314, the processor 104 may render a set of actionable steps for addressing the at least one issue through the rendering module 216. The set of actionable steps may correspond to configuration changes, resource allocations, and component replacements that may be derived from the correlative analysis results and historical solution patterns. In some cases, the step 314 may involve the recommended solution determination module 214 generating specific technical instructions for system administrators, including command sequences for configuration modifications, resource allocation parameters for scaling operations, and component replacement procedures for hardware or software updates.
[0054] The rendering module 216 may present the set of actionable steps through the graphic user interface via the input output device 108, enabling operators to review and execute the recommended solutions in a structured manner. The actionable steps may include detailed configuration changes such as adjusting CPU threshold values, modifying memory allocation parameters, updating network routing tables, or reconfiguring database connection pools based on the identified causal event characteristics. In some cases, the resource allocation recommendations may specify scaling requirements for virtual machines, storage capacity adjustments, bandwidth modifications, or load balancer configurations that may address performance-related issues. The component replacement suggestions may identify specific hardware components, software modules, or system dependencies that may require updates or replacements to resolve the underlying causal event effectively.
[0055] With continued reference to FIG. 3, at step 316, the processor 104 may recommend a maintenance time window for executing the recommended solution in the IT infrastructure based on a predefined maintenance schedule using the second ML model. The step 316 may involve the recommended solution determination module 214 analyzing the predefined maintenance schedule, historical maintenance patterns, system usage statistics, and operational requirements to identify optimal time periods for implementing the recommended solutions. The second ML model may evaluate factors such as system load patterns, user activity levels, business operation schedules, and previous maintenance execution times to determine maintenance windows that may minimize impact on IT infrastructure availability and performance. In some cases, the step 316 may involve the processor 104 coordinating with existing maintenance scheduling systems to ensure that recommended maintenance windows align with organizational policies and operational constraints.
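A simple heuristic version of step 316 is sketched below, assuming the predefined maintenance schedule is a list of allowed hour ranges and that historical usage statistics are summarised as an average load per hour of day. It stands in for the second ML model and only illustrates the idea of choosing the lowest-load slot within the schedule; all names are hypothetical.

```python
def recommend_window(allowed_windows, hourly_load, duration_hours):
    """Return (start_hour, average_load) of the quietest slot inside the schedule.

    allowed_windows: (start_hour, end_hour) pairs from the predefined maintenance
        schedule, end exclusive; end may exceed 24 for windows that cross midnight.
    hourly_load: 24 historical average load values, one per hour of day.
    Returns None if no allowed window is long enough.
    """
    best = None
    for start, end in allowed_windows:
        for hour in range(start, end - duration_hours + 1):
            slot = [hourly_load[(hour + i) % 24] for i in range(duration_hours)]
            average = sum(slot) / duration_hours
            if best is None or average < best[1]:
                best = (hour % 24, average)
    return best
```

Restricting the search to the allowed windows keeps the recommendation inside organizational policy even when a lower-load hour exists outside them.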
[0056] The recommended solution determination module 214 may implement sophisticated scheduling mechanisms that may consider multiple variables when determining maintenance time windows, including system dependencies, resource availability, personnel schedules, and business continuity requirements. The scheduling mechanisms may analyze temporal patterns in the historical event data to identify recurring maintenance windows that may have been successful for similar causal events in previous implementations. In some cases, the processor 104 may generate Standard Operating Procedures (SOP) or solution documents from work notes using large language model capabilities after administrative approval, creating comprehensive documentation that may guide future maintenance activities and solution implementations. The generated SOP documents may include step-by-step procedures, rollback instructions, verification checkpoints, and post-implementation validation criteria that may ensure consistent and reliable solution execution across different maintenance scenarios.
[0057] The system may implement automatic resolution synchronization features that may coordinate solution implementation across multiple IT infrastructure components and external systems. When Information Technology Service Management (ITSM) tickets are resolved in external ticketing systems, the processor 104 may automatically resolve the corresponding actionables and related alerts to avoid confusion and maintain data consistency across integrated platforms. The automatic resolution synchronization may involve the recommended solution determination module 214 monitoring ticket status changes in connected ITSM systems and propagating resolution updates to internal event tracking databases and alert management systems. In some cases, the synchronization process may include updating maintenance schedules, closing related work orders, and notifying stakeholders about completed solution implementations through automated communication mechanisms.
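The automatic resolution synchronization can be sketched as follows, assuming each actionable records the ITSM ticket it was raised against and the alerts it covers. The `Actionable` record, the status strings and `sync_resolution` are hypothetical and do not correspond to any particular ITSM product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Actionable:
    action_id: str
    ticket_id: str
    alert_ids: list = field(default_factory=list)
    status: str = "open"


def sync_resolution(ticket_id, new_status, actionables, alert_status):
    """Propagate an ITSM ticket resolution to linked actionables and alerts.

    alert_status: dict mapping alert_id -> status, updated in place.
    Returns the IDs of the actionables that were closed by this update.
    """
    if new_status != "resolved":
        return []  # only resolutions are propagated in this sketch
    closed = []
    for action in actionables:
        if action.ticket_id == ticket_id and action.status != "resolved":
            action.status = "resolved"
            closed.append(action.action_id)
            for alert_id in action.alert_ids:
                alert_status[alert_id] = "resolved"
    return closed
```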
[0058] The rendering module 216 may populate event, alert, and actionable details on a time scale in chronological or anti-chronological order for operator analysis, supporting solution implementation and maintenance scheduling processes through comprehensive historical data visualization. The chronological data display features may enable operators to analyze event progression patterns, solution implementation timelines, and maintenance window effectiveness across different time periods and system configurations. In some cases, the processor 104 may generate temporal visualizations that may show the relationship between causal events, implemented solutions, and subsequent system performance metrics, enabling operators to evaluate the effectiveness of different solution approaches and maintenance scheduling strategies. The anti-chronological ordering may present the most recent events and solutions first, allowing operators to quickly identify current system status and recent solution implementation results while maintaining access to historical context for comprehensive analysis and decision-making processes.
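Chronological and anti-chronological display reduces to sorting records by timestamp. The sketch assumes each record is a dict with an ISO-8601 `ts` field (a hypothetical schema); Python's sort is stable, so records with equal timestamps keep their input order in either direction.

```python
from datetime import datetime


def timeline(records, newest_first=True):
    """Order event, alert and actionable records by timestamp.

    records: dicts with an ISO-8601 'ts' field (hypothetical schema).
    newest_first=True gives anti-chronological order, False chronological.
    """
    return sorted(records, key=lambda r: datetime.fromisoformat(r["ts"]), reverse=newest_first)
```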
[0059] Referring to FIG. 4, a detailed flowchart 400 of a method of classifying the plurality of events in step 306 of FIG. 3 is illustrated, in accordance with an embodiment of the present disclosure. FIG. 4 is explained in conjunction with FIG. 3. FIG. 4 demonstrates the specific methodology by which the classification module 208 processes events to distinguish between noise events and unique events through mathematical probability analysis. The flowchart 400 begins at step 402, where the processor 104 may determine a first group of events from the plurality of events having identical entity IDs, enabling the system to analyze events that may originate from the same IT infrastructure entity for potential correlation or duplication patterns. At step 404, the processor 104 may calculate a probability score for each of the first group of events based on the probability distribution of the set of words extracted from the vector event data. The probability score calculation may involve analysis of word occurrence patterns within event descriptions, examining how frequently specific terms and phrases may appear in similar contexts across the historical event data stored in the data server 114. The classification module 208 may implement natural language processing techniques to generate probability distributions that may represent the likelihood of specific word combinations occurring in noise events versus unique events. In some cases, the step 404 may involve the classification module 208 analyzing semantic information and temporal information contained within the event data to enhance the accuracy of probability score calculations. The semantic information may include contextual relationships between words in event descriptions, while the temporal information may provide timing patterns that may influence the probability analysis results.
[0060] With continued reference to FIG. 4, at step 406, the processor 104 may compare the probability score for each of the first group of events with a configured threshold value to determine final event classification. The configured threshold value may be predefined or determined based on historical analysis of event patterns and may be adjustable by system administrators to accommodate different IT infrastructure environments and operational requirements. It may be noted that events having probability scores above the configured threshold value may be classified as noise events, indicating that these events may represent recurring patterns or false alarms that may not require immediate operational attention. Conversely, events having probability scores below or equal to the configured threshold value may be classified as unique events, signifying that these events may represent genuine issues or anomalies that may warrant further investigation and potential resolution actions. The step 406 may enable the classification module 208 to maintain consistent classification criteria while allowing for customization based on organizational preferences and infrastructure characteristics.
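Steps 402 to 406 of FIG. 4 can be sketched as below. The scoring function is passed in (it could, for example, implement equation (1)); the record layout, the function name and the choice to score single-event groups as well are assumptions made for illustration.

```python
from collections import defaultdict


def classify_by_entity(events, score_fn, threshold):
    """Steps 402-406 of FIG. 4, returning results grouped by entity ID.

    events: iterable of (event_id, entity_id, words) tuples (hypothetical layout).
    score_fn(words) -> probability score, e.g. an implementation of equation (1).
    """
    groups = defaultdict(list)  # step 402: group events with identical entity IDs
    for event_id, entity_id, words in events:
        groups[entity_id].append((event_id, words))

    result = {}
    for entity_id, members in groups.items():
        unique, noise = [], []
        for event_id, words in members:
            score = score_fn(words)  # step 404: probability score per event
            # step 406: above the threshold -> noise; at or below -> unique
            (noise if score > threshold else unique).append(event_id)
        result[entity_id] = {"unique": unique, "noise": noise}
    return result
```

Keeping the results keyed by entity ID makes it easy to show operators, per server or device, how many notifications were suppressed as noise.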
[0061]
[0062] Machine-readable storage may include machine-readable instructions that, when executed, implement a method or realize an apparatus as described in any of the examples of the present application.
[0063] Various techniques, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, a non-transitory computer readable storage medium, or any other machine-readable storage medium wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the various techniques. In the case of program code execution on programmable computers, the computing device may include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The volatile and non-volatile memory and/or storage elements may be a RAM, an EPROM, a flash drive, an optical drive, a magnetic hard drive, or another medium for storing electronic data. The eNB (or other base station) and UE (or other mobile station) may also include a transceiver component, a counter component, a processing component, and/or a clock component or timer component. One or more programs that may implement or utilize the various techniques described herein may use an application programming interface (API), reusable controls, and the like. Such programs may be implemented in a high-level procedural or an object-oriented programming language to communicate with a computer system. However, the program(s) may be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or an interpreted language, and may be combined with hardware implementations.
[0064] It should be understood that many of the functional units described in this specification may be implemented as one or more components, which is a term used to more particularly emphasize their implementation independence. For example, a component may be implemented as a hardware circuit comprising custom very large scale integration (VLSI) circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A component may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices, or the like.
[0065] Components may also be implemented in software for execution by various types of processors. An identified component of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions, which may, for instance, be organized as an object, a procedure, or a function. Nevertheless, the executables of an identified component need not be physically located together, but may comprise disparate instructions stored in different locations that, when joined logically together, comprise the component and achieve the stated purpose for the component.
[0066] Indeed, a component of executable code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Similarly, operational data may be identified and illustrated herein within components, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, merely as electronic signals on a system or network. The components may be passive or active, including agents operable to perform desired functions.
[0067] Reference throughout this specification to “an example” means that a particular feature, structure, or characteristic described in connection with the example is included in at least one embodiment of the present invention. Thus, appearances of the phrase “in an example” in various places throughout this specification are not necessarily all referring to the same embodiment.
[0068] As used herein, a plurality of items, structural elements, compositional elements, and/or materials may be presented in a common list for convenience. However, these lists should be construed as though each member of the list is individually identified as a separate and unique member. Thus, no individual member of such list should be construed as a de facto equivalent of any other member of the same list solely based on its presentation in a common group without indications to the contrary. In addition, various embodiments and examples of the present invention may be referred to herein along with alternatives for the various components thereof. It is understood that such embodiments, examples, and alternatives are not to be construed as de facto equivalents of one another, but are to be considered as separate and autonomous representations of the present invention.
[0069] Although the foregoing has been described in some detail for purposes of clarity, it will be apparent that certain changes and modifications may be made without departing from the principles thereof. It should be noted that there are many alternative ways of implementing both the processes and apparatuses described herein. Accordingly, the present embodiments are to be considered illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
[0070] Those having skill in the art will appreciate that many changes may be made to the details of the above-described embodiments without departing from the underlying principles of the invention. The scope of the present invention should, therefore, be determined only by the following claims.
CLAIMS
I/We Claim:
1. A method for managing issues in an information technology (IT) infrastructure, the method comprising:
periodically receiving, by a processor, a plurality of events based on a monitoring of one or more entities of the IT infrastructure,
wherein each of the plurality of events comprises event data and an entity ID corresponding to one of the one or more entities, and
wherein each of the plurality of events corresponds to at least one issue in the one of the one or more entities;
determining, by the processor, vector event data from the event data of each of the plurality of events;
classifying, by the processor and by using a first machine learning (ML) model, each of the plurality of events as a noise event or a unique event based on an analysis of at least one of:
a probability distribution of a set of words in the vector event data of each of the plurality of events; and
a result of comparison of the corresponding entity ID of each of the plurality of events;
determining, by the processor, a set of unique events from the plurality of events based on the classification of each of the plurality of events;
determining, by the processor and by using a second ML model, a causal event from the set of unique events based on a correlative analysis of the set of unique events and historical event data; and
determining, by the processor and by using the second ML model, a recommended solution for the causal event based on a result of the correlative analysis and a similarity analysis of the causal event and a set of historical solutions for the historical event data.

2. The method as claimed in claim 1, wherein determining the vector event data comprises:
generating, by the processor, vector representations of event descriptions using a language model; and
storing, by the processor, the vector representations in a vector database.

3. The method as claimed in claim 1, wherein classifying each of the plurality of events as the noise event or the unique event comprises:
determining, by the processor, a first group of events from the plurality of events having identical entity IDs;
calculating, by the processor, a probability score for each of the first group of events based on the probability distribution of the set of words; and
comparing, by the processor, the probability score for each of the first group of events with a configured threshold value,
wherein each of the first group of events having the probability score above the configured threshold value is classified as the noise event, and
wherein each of the first group of events having the probability score below or equal to the configured threshold value is classified as the unique event.

4. The method as claimed in claim 3, wherein the event data comprises semantic information and temporal information,
wherein the historical event data comprises a set of historical root cause events of a set of historical events and the set of historical solutions for the set of historical root cause events,
wherein the correlative analysis comprises:
determining, by the processor, a similarity score between each of the set of unique events and the set of historical root cause events based on the associated semantic information and the temporal information; and
determining, by the processor, a pair of unique event and historical solution from the set of unique events and the set of historical solutions based on determination of the similarity score above a predefined threshold value,
wherein the recommended solution for the causal event is determined based on the pair of unique event and historical solution.

5. The method as claimed in claim 1, wherein the recommended solution for the causal event comprises:
rendering, by the processor, a set of actionable steps for addressing the at least one issue,
wherein the set of actionable steps corresponds to configuration changes, resource allocations, and/or component replacements; and
recommending, by the processor, a maintenance time window for executing the recommended solution in the IT infrastructure based on a predefined maintenance schedule using the second ML model.
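Claim 5 attributes the maintenance-window recommendation to the second ML model without describing how it chooses a window. The Python sketch below substitutes a plain rule, labelled as such and not the claimed model: given a predefined maintenance schedule and an assumed execution time for the solution, it returns the earliest window (or the remaining part of a window already in progress) that is long enough to apply the fix. `recommend_window` and the sample schedule are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def recommend_window(schedule, required: timedelta,
                     now: datetime) -> Optional[tuple[datetime, datetime]]:
    """Rule-based stand-in: return the earliest (start, end) slot inside a
    predefined maintenance window that fits a fix needing `required` time.

    schedule: list of (start, end) datetimes for the maintenance windows.
    A window already in progress is only usable from `now` onward.
    """
    for start, end in sorted(schedule):
        usable_start = max(start, now)
        if end - usable_start >= required:
            return usable_start, end
    return None  # no window in the schedule is long enough


schedule = [
    (datetime(2025, 1, 6, 2, 0), datetime(2025, 1, 6, 4, 0)),
    (datetime(2025, 1, 6, 22, 0), datetime(2025, 1, 6, 23, 0)),
    (datetime(2025, 1, 7, 1, 0), datetime(2025, 1, 7, 4, 0)),
]
now = datetime(2025, 1, 6, 10, 0)
print(recommend_window(schedule, timedelta(hours=2), now))
```

With a two-hour fix requested at 10:00, the first window has already passed and the second is only one hour long, so the third window is recommended.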

6. The method as claimed in claim 1, further comprising:
rendering, by the processor, the causal event and the recommended solution for the causal event via a graphical user interface (GUI); and
validating, by the processor, the recommended solution for the causal event based on user feedback received via the GUI.

7. The method as claimed in claim 6, wherein the first ML model and the second ML model are continuously trained based on the user feedback and domain knowledge related to the IT infrastructure and the historical event data.
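Claims 6 and 7 describe validating a recommendation through user feedback and reusing that feedback for continuous training. The hypothetical Python sketch below covers only the bookkeeping side: each accept, reject, or correction received from the GUI is stored as a labelled (event, solution, label) example that a later retraining job could consume. The class names and the 1/0 label encoding are assumptions; the training itself is out of scope.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Recommendation:
    causal_event_id: str
    solution: str


@dataclass
class FeedbackStore:
    # (causal_event_id, solution, label) with label 1 = confirmed, 0 = rejected
    examples: list[tuple[str, str, int]] = field(default_factory=list)

    def validate(self, rec: Recommendation, accepted: bool,
                 correction: Optional[str] = None) -> str:
        """Record feedback from the GUI and return the validation outcome."""
        self.examples.append((rec.causal_event_id, rec.solution, 1 if accepted else 0))
        if accepted:
            return "validated"
        if correction:
            # An operator-supplied fix becomes a positive training example.
            self.examples.append((rec.causal_event_id, correction, 1))
            return "corrected"
        return "rejected"


fb = FeedbackStore()
rec = Recommendation("u1", "restart service")
print(fb.validate(rec, accepted=False, correction="increase heap size"))  # corrected
print(fb.examples)  # [('u1', 'restart service', 0), ('u1', 'increase heap size', 1)]
```

Keeping rejected recommendations as negative examples, rather than discarding them, gives a retraining step both sides of the operator's judgement.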

8. A system for managing issues in an information technology (IT) infrastructure, the system comprising:
a processor; and
a memory communicatively coupled to the processor, wherein the memory stores processor-executable instructions, which when executed by the processor, cause the processor to:
periodically receive a plurality of events based on a monitoring of one or more entities of the IT infrastructure,
wherein each of the plurality of events comprises event data and an entity ID corresponding to one of the one or more entities, and
wherein each of the plurality of events corresponds to at least one issue in the one of the one or more entities;
determine vector event data from the event data of each of the plurality of events;
classify, by using a first machine learning (ML) model, each of the plurality of events as a noise event or a unique event based on an analysis of at least one of:
a probability distribution of a set of words in the vector event data of each of the plurality of events; and
a result of comparison of the corresponding entity ID of each of the plurality of events;
determine a set of unique events from the plurality of events based on the classification of each of the plurality of events;
determine, by using a second ML model, a causal event from the set of unique events based on a correlative analysis of the set of unique events and historical event data; and
determine, by using the second ML model, a recommended solution for the causal event based on a result of the correlative analysis and a similarity analysis of the causal event and a set of historical solutions for the historical event data.
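Claim 8 recites determining a causal event from the set of unique events without fixing a selection rule. One simple rule, sketched below in Python as an assumption rather than as the claimed second ML model, picks the unique event whose best cosine similarity to any historical root cause event is highest. Ties go to the earliest event, on the assumption that upstream faults tend to be reported first, and the solution attached to the matching historical root cause is returned as the recommendation. `find_causal_event` and the tuple layouts are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def find_causal_event(unique_events, historical):
    """unique_events: list of (event_id, timestamp, vector).
    historical: list of (root_cause_vector, solution).
    Returns (causal_event_id, recommended_solution, score), or None."""
    best = None
    for event_id, timestamp, vector in unique_events:
        for root_vector, solution in historical:
            # Rank by similarity first; on a tie, the earlier event
            # (larger -timestamp) wins.
            key = (cosine(vector, root_vector), -timestamp)
            if best is None or key > best[0]:
                best = (key, event_id, solution)
    if best is None:
        return None
    (score, _), event_id, solution = best
    return event_id, solution, score


unique = [("u1", 100.0, [1.0, 0.0]), ("u2", 50.0, [0.6, 0.8])]
historical = [([0.0, 1.0], "expand disk volume"), ([1.0, 0.0], "restart service")]
print(find_causal_event(unique, historical))  # ('u1', 'restart service', 1.0)
```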

9. The system as claimed in claim 8, wherein determining the vector event data comprises:
generate vector representations of event descriptions using a language model; and
store the vector representations in a vector database.

10. The system as claimed in claim 8, wherein classifying each of the plurality of events as the noise event or the unique event comprises:
determine a first group of events from the plurality of events having identical entity IDs;
calculate a probability score for each of the first group of events based on the probability distribution of the set of words; and
compare the probability score for each of the first group of events with a configured threshold value,
wherein each of the first group of events having the probability score above the configured threshold value is classified as the noise event, and
wherein each of the first group of events having the probability score below or equal to the configured threshold value is classified as the unique event.

11. The system as claimed in claim 10, wherein the event data comprises semantic information and temporal information,
wherein the historical event data comprises a set of historical root cause events of a set of historical events and the set of historical solutions for the set of historical root cause events,
wherein the correlative analysis comprises:
determine a similarity score between each of the set of unique events and the set of historical root cause events based on the associated semantic information and the temporal information; and
determine a pair of unique event and historical solution from the set of unique events and the set of historical solutions based on the similarity score being above a predefined threshold value,
wherein the recommended solution for the causal event is determined based on the pair of unique event and historical solution.

12. The system as claimed in claim 8, wherein the recommended solution for the causal event comprises:
render a set of actionable steps for addressing the at least one issue,
wherein the set of actionable steps corresponds to configuration changes, resource allocations, and/or component replacements; and
recommend a maintenance time window for executing the recommended solution in the IT infrastructure based on a predefined maintenance schedule using the second ML model.

13. The system as claimed in claim 8, wherein the processor-executable instructions further cause the processor to:
render the causal event and the recommended solution for the causal event via a graphical user interface (GUI); and
validate the recommended solution for the causal event based on user feedback received via the GUI.

14. The system as claimed in claim 13, wherein the first ML model and the second ML model are continuously trained based on the user feedback and domain knowledge related to the IT infrastructure and the historical event data.

Documents

Application Documents

#	Name	Date
1	202511087629-STATEMENT OF UNDERTAKING (FORM 3) [15-09-2025(online)].pdf	2025-09-15
2	202511087629-REQUEST FOR EXAMINATION (FORM-18) [15-09-2025(online)].pdf	2025-09-15
3	202511087629-REQUEST FOR EARLY PUBLICATION(FORM-9) [15-09-2025(online)].pdf	2025-09-15
4	202511087629-PROOF OF RIGHT [15-09-2025(online)].pdf	2025-09-15
5	202511087629-POWER OF AUTHORITY [15-09-2025(online)].pdf	2025-09-15
6	202511087629-FORM-9 [15-09-2025(online)].pdf	2025-09-15
7	202511087629-FORM 18 [15-09-2025(online)].pdf	2025-09-15
8	202511087629-FORM 1 [15-09-2025(online)].pdf	2025-09-15
9	202511087629-FIGURE OF ABSTRACT [15-09-2025(online)].pdf	2025-09-15
10	202511087629-DRAWINGS [15-09-2025(online)].pdf	2025-09-15
11	202511087629-DECLARATION OF INVENTORSHIP (FORM 5) [15-09-2025(online)].pdf	2025-09-15
12	202511087629-COMPLETE SPECIFICATION [15-09-2025(online)].pdf	2025-09-15