
Extracting Features For A Prediction Model Based On Ontology

Abstract: The present disclosure relates to system(s) and method(s) for extracting features for a prediction model based on ontology. The system (102) receives data comprising one or more features and parameters associated with the one or more features. The parameters comprise a source, a topology, a time window, a behavioral change, and a weightage. The system (102) further extracts primary features from the one or more features based on an analysis of the one or more features using one or more of the parameters. The one or more features may be analysed based on the source using an Identical Feature Identification technique. The one or more features may be analysed based on the topology and the weightage using a Feature Restriction technique. The one or more features may be further analysed based on the time window and the behavioral change using a Feature Filtering technique. [To be published with Figure 1]


Patent Information

Application #
Filing Date
22 March 2019
Publication Number
14/2019
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application
Patent Number
Legal Status
Grant Date
2024-02-12
Renewal Date

Applicants

HCL Technologies Limited
A-9, Sector - 3, Noida 201 301, Uttar Pradesh, India

Inventors

1. SUNDARARAJ, Jayaramakrishnan
HCL Technologies Limited, Karle Tech Park, Nagawara, Bangalore - 560045, Karnataka, India
2. WARRIER, Harikrishna C
HCL Technologies Limited, Karle Tech Park, Nagawara, Bangalore - 560045, Karnataka, India

Specification

CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
[001] The present application does not claim priority from any patent application.

TECHNICAL FIELD
[002] The present disclosure in general relates to the field of feature engineering. More particularly, the present invention relates to a system and method for extracting features for a prediction model based on ontology.
BACKGROUND
[003] Generally, machine learning and data analytics algorithms need a good amount of data for effective processing. However, the data that is available may not be relevant for meaningful results and may not impact a data model implementation. In many cases, the data may be redundant and not correlated with an outcome of interest. The data may also contain a lot of noise and duplications. Hence, the data should be filtered into correlated features that can be used for machine learning processing. The mechanism of identifying correlated features for data analytics is called feature engineering. It is to be noted that existing techniques look for redundancy and correlation in the data and extract meaningful features out of it.
SUMMARY
[004] Before the present systems and methods for extracting features for a prediction model based on ontology are described, it is to be understood that this application is not limited to the particular systems and methodologies described, as there can be multiple possible embodiments which are not expressly illustrated in the present disclosure. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the present application. This summary is provided to introduce concepts related to systems and methods for extracting features for the prediction model based on the ontology. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining or limiting the scope of the claimed subject matter.
[005] In one implementation, a method for extracting features for a prediction model is illustrated. In one embodiment, the method may comprise receiving data comprising one or more features, and parameters associated with the one or more features. The parameters may comprise a source, a topology, a time window, a behavioral change, and a weightage. The method may further comprise extracting primary features from the one or more features based on an analysis of the one or more features using one or more of the parameters. In one embodiment, the analysis may comprise analysing the one or more features based on the source using an Identical Feature Identification technique. In another embodiment, the method may comprise analysing the one or more features based on the topology and the weightage using a Feature Restriction technique. In yet another embodiment, the method may comprise analysing the one or more features based on the time window and the behavioral change using a Feature Filtering technique.
[006] In one implementation, a system for extracting features for a prediction model is illustrated. The system comprises a memory and a processor coupled to the memory, and the processor is configured to execute instructions stored in the memory to receive data comprising one or more features, and parameters associated with the one or more features. The parameters may comprise a source, a topology, a time window, a behavioral change, and a weightage. The processor may further execute instructions stored in the memory to extract primary features from the one or more features based on an analysis of the one or more features using one or more of the parameters. In one embodiment, the analysis may comprise analysing the one or more features based on the source using an Identical Feature Identification technique. In another embodiment, the processor may execute instructions to analyse the one or more features based on the topology and the weightage using a Feature Restriction technique. In yet another embodiment, the processor may execute instructions to analyse the one or more features based on the time window and the behavioral change using a Feature Filtering technique.
BRIEF DESCRIPTION OF DRAWINGS
[007] The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
[008] Figure 1 illustrates a network implementation of a system for extracting features for a prediction model based on ontology, in accordance with an embodiment of the present subject matter.
[009] Figure 2 illustrates the system for extracting features for the prediction model based on the ontology, in accordance with an embodiment of the present subject matter.
[0010] Figure 3A illustrates a method for receiving initial features list based on historical features, in accordance with an embodiment of the present subject matter.
[0011] Figure 3B illustrates a method for extracting features for a prediction model based on ontology, in accordance with an embodiment of the present subject matter.
DETAILED DESCRIPTION
[0012] Some embodiments of the present disclosure, illustrating all its features, will now be discussed in detail. The words “receiving”, “extracting”, “analysing”, and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present disclosure, the exemplary systems and methods for extracting features for a prediction model based on ontology are now described. The disclosed embodiments of the system and method for extracting the features for the prediction model based on ontology are merely exemplary of the disclosure, which may be embodied in various forms.
[0013] Various modifications to the embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. However, one of ordinary skill in the art will readily recognize that the present disclosure for extracting features for a prediction model based on ontology is not intended to be limited to the embodiments illustrated, but is to be accorded the widest scope consistent with the principles and features described herein.
[0014] The present subject matter relates to extracting features for a prediction model based on ontology. In one embodiment, data comprising one or more features and parameters associated with the one or more features may be received. Once the data is received, primary features may be extracted from the one or more features. The primary features may be extracted based on an analysis of the one or more features using one or more of the parameters. The parameters may comprise a source, a topology, a time window, a behavioral change, and a weightage. In one embodiment, the one or more features may be analysed based on the source using an Identical Feature Identification technique. In another embodiment, the one or more features may be analysed based on the topology and the weightage using a Feature Restriction technique. In yet another embodiment, the one or more features may be analysed based on the time window and the behavioral change using a Feature Filtering technique.
[0015] Referring now to Figure 1, a network implementation 100 of a system 102 for extracting features for a prediction model based on ontology is disclosed. Although the present subject matter is explained considering that the system 102 is implemented on a server, it may be understood that the system 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and the like. In one implementation, the system 102 may be implemented over a cloud network. Further, it will be understood that the system 102 may be accessed by multiple users through one or more user devices 104-1, 104-2…104-N, collectively referred to as user device 104 hereinafter, or applications residing on the user device 104. Examples of the user device 104 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user device 104 may be communicatively coupled to the system 102 through a network 106.
[0016] In one implementation, the network 106 may be a wireless network, a wired network or a combination thereof. The network 106 may be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network 106 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further, the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[0017] In one embodiment, the system 102 may receive data. The data may comprise one or more features and parameters associated with the one or more features. The parameters may comprise a source, a topology, a time window, a behavioral change, a weightage and the like. The data may be received based on an analysis of a use case and a historical repository. The historical repository may comprise historical features associated with the use case. The use case may indicate a requirement received from a user.
[0018] Once the data is received, the system 102 may extract primary features from the one or more features. The primary features may be extracted based on an analysis of the one or more features using one or more of the parameters. In one embodiment, the system 102 may analyse the one or more features based on the source. The system 102 may use an Identical Feature Identification (IFI) technique. In another embodiment, the system 102 may analyse the one or more features based on the topology and the weightage. The system 102 may use a Feature Restriction technique. In yet another embodiment, the system 102 may analyse the one or more features based on the time window and the behavioral change. The system 102 may use a Feature Filtering technique.
[0019] The system 102 may further generate a prediction model using the primary features. Once the prediction model is generated, the system 102 may validate the prediction model. The prediction model may be validated based on a comparison of predictions with cross validation data. The predictions may be generated by the prediction model. The cross validation data may be taken from a repository.
[0020] Upon validation, the system 102 may compare the predictions with the cross validation data. Based on the comparison, the system 102 may optimize the primary features. The optimization may correspond to removing features from the primary features. Upon optimization, the system 102 may generate secondary features from the primary features. The secondary features may again be used to generate predictions by the prediction model. The predictions may again be compared with the threshold predictions. Based on the comparison, the secondary features may again be optimized. If the predictions are not similar to or greater than the threshold predictions, then the last secondary features used to generate the predictions may be the final features. The final features may be further used by the prediction model.
[0021] In another embodiment, the system 102 for extracting features may outline a unique set of methods to derive data analytics features based on an ontology of the entities that generate the data. It is to be noted that the ontology does not mean only the data, but also the source from which the data originates, the connections between the entities, the time of the data generation, and the like. The system 102 may use domain knowledge to get relationships among disparate data, and also associate importance to certain data over others based on the impact of that data on a final outcome. In other words, the system 102 may analyse spatial, temporal and root cause characteristics of the features in order to derive final features. The parameters may comprise a source, a topology, a time series, a relationship and a criticality.
[0022] In one embodiment, the source of features may be an important aspect to be considered for determining critical features. There may be cases where features originating from the same source might be providing similar outcomes or may provide a view of how the prediction variables might be manifesting. Thus, redundancy in such features may be removed based on the source.
[0023] Further, relationship between the features may be considered from a topology perspective. In one embodiment, many duplicate features may be weeded off based on the connection between the entities, and an interaction between the entities.
[0024] Further, the time based aggregation of the features may provide meaningful insights into critical features that can be considered for feature selection and feature engineering. In one embodiment, based on a time window, data size, time intervals and other temporal characteristics, the features can be grouped and ordered.
[0025] Further, the relation between the features and the similarity of feature names may provide information associated with the relationship between the features. Further, a weightage or an impact of each feature provides a vital dimension of the importance of the feature. The weightage may be measured based on a perceived effect of the feature on the predicted or forecasted variable(s). In one example, a subject matter expert or a domain expert may provide information associated with the criticality and weightage of the features.
[0026] In one aspect, the system 102 may receive a use case from the user. Further, the system 102 may deep dive into the relevant features of requirement associated with the use case based on historical feature learning models. The historical feature learning models may be generated based on past experiments and implementation. The relationship of features may be extracted from a historical repository. This method of feature capture may be referred as a Feature Exploration and Selection (FES) based on a historical feature set (HFS). Based on the FES method, the system 102 may receive the one or more features. The one or more features may be referred as a raw set of inputs for the feature engineering.
[0027] Further, the system 102 may analyse the one or more features using an Identical Feature Identification (IFI) technique. In one aspect, all identical features based on the source of the features may be extracted. This produces a unique set of features based on the source. Further, the system 102 may analyse the one or more features based on the topology and the criticality of the features. The system 102 may filter the features using a Feature Restriction Algorithm (FRA). The FRA may be referred to as a Feature Restriction technique. In this case, the features with more proximity, similar functionality, and the features which do not have much bearing on the output of the final predicted variables may be filtered. The filtered features may be further tuned. Further, the system 102 may analyse the one or more features using a Feature Filtering Algorithm (FFA). The FFA may be referred to as a Feature Filtering technique. In this case, the behavior of the one or more features changing over time may be analysed. In one aspect, the IFI technique, the FFA and the FRA may together be referred to as a feature engineering block.
[0028] The features extracted using the IFI technique, the FFA and the FRA may be used for creation of a prediction model. The features may be sent to a machine learning training model that may be referred to as a prototype prediction generator (PPG). If the machine learning training model generated based on the features provides an accurate prediction model when tested with cross validation data, then the features are providing a good response. Thus, the features may be further optimized. In one embodiment, a positive output of the prediction model may be sent back to the feature engineering block, and more rigorous filtering criteria may be applied. This results in a lower number of features for the prediction model. Further, the new set of predictions may be compared with the cross-validation data. If the accuracy is within limits, further optimization of the features may be performed again. Otherwise, if the accuracy starts falling, then the optimal set of features has been reached, and the feature engineering optimization process can be stopped. The finalized set of features may be used for the prediction model for further analysis.
[0029] Referring now to figure 2, the system 102 for extracting features for a prediction model based on ontology is illustrated in accordance with an embodiment of the present subject matter. In one embodiment, the system 102 may include at least one processor 202, an input/output (I/O) interface 204, and a memory 206. The at least one processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, at least one processor 202 may be configured to fetch and execute computer-readable instructions stored in the memory 206.
[0030] The I/O interface 204 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 204 may allow the system 102 to interact with the user directly or through the user device 104. Further, the I/O interface 204 may enable the system 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 204 may facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface 204 may include one or more ports for connecting a number of devices to one another or to another server.
[0031] The memory 206 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 206 may include modules 208 and data 210.
[0032] The modules 208 may include routines, programs, objects, components, data structures, and the like, which perform particular tasks, functions or implement particular abstract data types. In one implementation, the module 208 may include a receiving module 212, an extracting module 214, and other modules 216. The other modules 216 may include programs or coded instructions that supplement applications and functions of the system 102.
[0033] The data 210, amongst other things, serve as a repository for storing data processed, received, and generated by one or more of the modules 208. The data 210 may also include a repository 218, and other data 220. In one embodiment, the other data 220 may include data generated as a result of the execution of one or more modules in the other modules 216.
[0034] In one implementation, a user may access the system 102 via the I/O interface 204. The user may be registered using the I/O interface 204 in order to use the system 102. In one aspect, the user may access the I/O interface 204 of the system 102 for obtaining information, providing input information or configuring the system 102.
[0035] In one embodiment, the receiving module 212 may receive data. The data may be received based on an analysis of a use case and a historical repository. The use case may indicate requirement received from the user. The historical repository may comprise historical features associated with the use case. In one embodiment, the receiving module 212 may analyse the use case. Based on the analysis of the use case, the receiving module 212 may identify the requirements. Further, the receiving module 212 may analyse information stored in the historical repository based on the requirements. Based on the analysis of the information, the receiving module 212 may receive the data comprising one or more features and parameters associated with the one or more features. The parameters may comprise a source, a topology, a time window, a behavioral change, a weightage and the like.
[0036] Upon receiving the data, the extraction module 214 may extract primary features from the one or more features. The primary features may be extracted based on an analysis of the one or more features using one or more of the parameters.
[0037] In one embodiment, the one or more features may be analysed based on the source. The one or more features may be analysed using an Identical Feature Identification (IFI) technique. The IFI technique may map all identical features from the one or more features based on the source and make a unique set of features. This may reduce the number of features. The IFI technique may comprise associating the one or more features with the source. In other words, the IFI technique may perform cataloguing of features based on the source. Further, the IFI technique may comprise analysing the one or more features based on a time and a proximity. The IFI technique may further analyse a pattern and a trend associated with the one or more features. In one example, features with a similar pattern and trend may be filtered out from the one or more features.
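The IFI step described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name, the data layout, and the use of Pearson correlation as the "similar pattern and trend" criterion are all assumptions introduced here.

```python
from collections import defaultdict

import numpy as np


def identical_feature_identification(features, corr_threshold=0.95):
    """Keep one representative per group of near-identical features.

    `features` maps a feature name to (source, values). Features from
    the same source whose value series are highly correlated (an
    assumed similarity criterion) are treated as identical, and only
    the first such feature is kept.
    """
    # Catalogue the features by their source, as per the IFI technique.
    by_source = defaultdict(list)
    for name, (source, values) in features.items():
        by_source[source].append((name, np.asarray(values, dtype=float)))

    kept = {}
    for source, items in by_source.items():
        representatives = []
        for name, values in items:
            # Filter out a feature whose pattern matches an already kept one.
            duplicate = any(
                abs(np.corrcoef(values, rep_values)[0, 1]) >= corr_threshold
                for _, rep_values in representatives
            )
            if not duplicate:
                representatives.append((name, values))
        for name, values in representatives:
            kept[name] = (source, values)
    return kept
```

For example, two hypothetical features `cpu_util` and `cpu_util_copy` from the same source with near-identical series would be collapsed into one, while an uncorrelated `traffic` feature from that source would survive.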
[0038] In one aspect, the source of features may be an important aspect to be considered for determining critical features. There may be cases where features originating from the same source might be providing similar outcomes or may provide a view of how the prediction variables might be manifesting. Thus, redundancy in such features may be removed based on the source.
[0039] Further, the one or more features may be analysed based on the topology and the weightage. The one or more features may be analysed using a Feature Restriction technique. The Feature Restriction technique may comprise creating the topology of the entities based on a connectivity map. Further, the weightage may be determined based on an impact of the one or more features. Based on the weightage, critical features may be identified. If entities are connected to one another and generate features, then a set of unique features may be abstracted out of the generated features.
[0040] In one aspect, relationship between the features may be considered from a topology perspective. In one embodiment, many duplicate features may be weeded off based on the connection between the entities, and an interaction between the entities. Further, the relation between the features and the similarity of feature names may provide information associated with the relationship between the features. In one example, link failure, communication outage, heartbeat loss etc. mean the same, though they may be getting generated from multiple sources. Based on the subject matter expertise, such redundancy may be removed systematically, and optimization of features may be done.
[0041] Further, the weightage may be measured based on a perceived effect of the feature on the predicted or forecasted variable(s). In one example, a subject matter expert, a domain expert may provide information associated with the criticality and weightage of the features. In one example, CPU utilization going high might sound alarming, but if the associated traffic is also high, it might not be something too bad. Similarly, a failure of a redundant or spare entity in a network might not be a major cause of alarm, though a failure event might be reported.
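The topology and weightage analysis of the Feature Restriction technique can be sketched as below. The names, the representation of the connectivity map as entity pairs, and the minimum-weight cutoff are illustrative assumptions; the weights stand in for the criticality information a domain expert would supply.

```python
def feature_restriction(features, connectivity, weights, min_weight=0.2):
    """Restrict features using topology and weightage.

    `features` maps feature name -> (entity, label); `connectivity` is
    a set of frozenset entity pairs (the connectivity map); `weights`
    maps feature name -> impact weightage supplied by a domain expert.
    A feature is removed if its weightage falls below `min_weight`, or
    if it duplicates a feature with the same label on a directly
    connected entity (e.g. "outage" reported by both ends of a link).
    """
    kept = {}
    for name, (entity, label) in features.items():
        if weights.get(name, 0.0) < min_weight:
            continue  # little bearing on the final predicted variables
        duplicate = any(
            k_label == label
            and frozenset((entity, k_entity)) in connectivity
            for k_entity, k_label in kept.values()
        )
        if not duplicate:
            kept[name] = (entity, label)
    return kept
```

This mirrors the example in the paragraph above: a link failure and a communication outage reported by two connected routers mean the same event, so one of the two features is weeded off, while a low-weight feature such as a spare fan's speed is dropped outright.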
[0042] Further, the one or more features may be analysed based on the time window and the behavioral change. The one or more features may be analysed using a Feature Filtering technique. The Feature Filtering technique may comprise ordering the one or more features, irrespective of the source, the topology or the weightage, by the time window. In other words, the basic premise of looking from this aspect is that if some changes or perturbations happen in the flow of the data over time, then it would happen to the entire system as a whole. Further, the Feature Filtering technique may comprise identifying outliers and exceptions. In order to identify outliers, various configuration parameters may be applied on the one or more features such as time windows, time stamp based intervals, defining correlation thresholds and the like.
[0043] In one embodiment, the time based aggregation of the features may provide meaningful insights into critical features that can be considered for feature selection and feature engineering. In one embodiment, based on a time window, data size, time intervals and other temporal characteristics, the features can be grouped and ordered.
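A time-window view of the Feature Filtering technique might look like the following sketch. The windowing scheme and the relative-change threshold are assumptions chosen for illustration; the disclosure leaves the exact behavioral-change criterion open.

```python
import numpy as np


def feature_filtering(series, timestamps, window, change_threshold=0.1):
    """Filter features by their behaviour over time windows.

    `series` maps feature name -> list of values; `timestamps` gives
    the time of each observation. Observations are aggregated into
    windows of length `window`, and a feature is kept only if its
    windowed mean changes by at least `change_threshold` (relative to
    its scale) somewhere: a feature that never changes over time
    carries no behavioural signal for the prediction model.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    bins = (timestamps // window).astype(int)
    kept = {}
    for name, values in series.items():
        values = np.asarray(values, dtype=float)
        # Mean value of the feature in each time window, in time order.
        means = np.array([values[bins == b].mean() for b in np.unique(bins)])
        scale = max(np.abs(means).max(), 1e-9)
        rel_change = np.abs(np.diff(means)) / scale
        if rel_change.size and rel_change.max() >= change_threshold:
            kept[name] = values.tolist()
    return kept
```

Under this sketch, a feature whose windowed means climb over time survives the filter, while a flat feature is filtered out; outlier detection against correlation thresholds, as mentioned above, could be layered on the same windowed aggregates.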
[0044] Further, the extracting module 214 may extract the primary features from the one or more features using the IFI technique, the Feature Restriction technique and the Feature Filtering technique.
[0045] Further, the extracting module 214 may use the primary features to generate a prediction model. Once the prediction model is generated, the extracting module 214 may validate the prediction model. The prediction model may be validated using cross validation data. In one embodiment, the prediction model may generate predictions using the primary features. Further, an accuracy of the predictions may be checked based on a comparison of the predictions with the cross validation data. In one embodiment, if the accuracy is greater than a threshold, then the primary features may be optimized. In one embodiment, if the accuracy is less than the threshold, then the primary features may not be optimized. The primary features may be considered as final features by the prediction model.
[0046] Based on the optimization of the primary features, secondary features may be generated. In one aspect, the optimization may indicate reducing a number of features. Upon generation of the secondary features, the prediction model may generate the predictions using the secondary features. Further, the accuracy of the predictions may be checked based on a comparison of the predictions with the cross validation data. Further, the secondary features may be optimized based on the accuracy. In one embodiment, if the accuracy is less than the threshold, then the secondary features may not be optimized. The secondary features may be considered as final features by the prediction model.
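The iterative optimization loop described in the two paragraphs above can be sketched as follows. The callable parameters are placeholders: `evaluate` stands for the accuracy check against cross validation data, and `prune` stands for applying a stricter filtering criterion in the feature engineering block.

```python
def optimize_features(features, evaluate, prune, threshold):
    """Iteratively tighten the feature set while accuracy holds.

    `evaluate(features)` returns the prediction accuracy against the
    cross validation data; `prune(features)` applies more rigorous
    filtering and returns a smaller feature set. Pruning continues
    while accuracy stays at or above `threshold`; the last feature set
    that met the threshold is returned as the final features.
    """
    final = features
    while True:
        candidate = prune(final)
        if not candidate or len(candidate) == len(final):
            return final  # no further reduction is possible
        if evaluate(candidate) < threshold:
            return final  # accuracy started falling; stop optimizing
        final = candidate
```

For instance, with a toy `evaluate` whose accuracy degrades once fewer than three features remain, the loop stops at the last three-feature set and returns it as the final features.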
[0047] In one embodiment, the primary features may be optimized based on a feedback received from a user. If the feedback associated with the predictions generated by the prediction model is good, then the primary features may be optimized. Based on the optimization, the secondary features may be generated. If the feedback associated with the predictions generated by the prediction model is not good, then the primary features may be considered as final features.
[0048] Exemplary embodiments discussed above may provide certain advantages. Though not required to practice aspects of the disclosure, these advantages may include those provided by the following features.
[0049] Some embodiments of the system and the method are configured to extract features based on an ontology.
[0050] Some embodiments of the system and the method are configured to create optimal features based on predictions.
[0051] Some embodiments of the system and the method are configured to segregate feature data into a common feature model and a critical feature model by using a historical data repository.
[0052] Referring now to figure 3A, a method for receiving initial features list based on historical features, is disclosed in accordance with an embodiment of the present subject matter. Referring now to figure 3B, a method for extracting features for a prediction model based on ontology, is disclosed in accordance with an embodiment of the present subject matter. In one embodiment, the method to extract the features is illustrated with figure 3A and figure 3B together.
[0053] The method may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, and the like, that perform particular functions or implement particular abstract data types. The method may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
[0054] The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method or alternate methods. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method may be considered to be implemented in the above described system 102.
[0055] At block 301, a use case may be received. The use case may be received from a user. The use case may indicate requirements received from the user. In one implementation, the receiving module 212 may receive the use case.
[0056] At block 302, raw data stored in a repository may be analysed based on the use case. Based on the analysis of the raw data, at block 303, features may be extracted from the raw data. In one implementation, the receiving module 212 may extract the features.
[0057] At block 305, historical features may be extracted from a historical repository. The historical features may be extracted based on the use case. In one embodiment, features that were missed out may be included based on past feature learning models. Past feature learning models generated from past experiments and implementations may be made available as the historical repository for feature comparison. This may be performed by searching and mapping the use case along with the preferred model and algorithm, so as to match the use case under consideration as closely as possible.
[0058] At block 304, the features from the historical repository and the features extracted from the raw data may be collated. Based on collating the features, an initial feature list may be obtained.
[0059] At block 306, data may be received. The data may comprise one or more features, and parameters associated with the one or more features. The one or more features may be referred to as the initial feature list. The parameters may comprise a source, a topology, a time window, a behavioral change, a weightage, and the like.
[0060] At block 308, one or more features may be analysed based on the source using an Identical Feature Identification technique. The Identical Feature Identification (IFI) technique may be referred to as an Identical Feature Identification algorithm. In one implementation, the extraction module 214 may analyse the one or more features.
[0061] The IFI technique may map all identical features from the one or more features based on the source and make a unique set of features. It may reduce the number of features. The IFI technique may comprise associating the one or more features with the source. In other words, the IFI technique may perform cataloguing of features based on the source. Further, the IFI technique may comprise analysing the one or more features based on a time and a proximity. The IFI technique may further analyse a pattern and a trend associated with the one or more features. In one example, features with a similar pattern and trend may be filtered out from the one or more features.
[0062] In one aspect, the source of features may be an important aspect to be considered for determining critical features. There may be cases where features originating from the same source might be providing similar outcomes or may provide a view of how the prediction variables might be manifesting. Thus, redundancy in such features may be removed based on the source.
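By way of a non-limiting illustration, the IFI technique described above may be sketched as follows. The representation of a feature (a named series tagged with its source) and the correlation threshold used to decide that two same-source features follow the same pattern and trend are assumptions of this sketch, not part of the disclosure.

```python
from itertools import combinations

def pearson(a, b):
    # Plain Pearson correlation between two equal-length series.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb) if va and vb else 0.0

def identical_feature_identification(features, threshold=0.99):
    """Map identical features per source and keep one representative.

    `features` is a dict: name -> {"source": str, "series": [float]}.
    Same-source features whose series correlate above the threshold are
    treated as identical; only one of each such pair is retained.
    """
    # Catalogue features by their source.
    by_source = {}
    for name, meta in features.items():
        by_source.setdefault(meta["source"], []).append(name)
    # Within each source, filter out features with a similar pattern/trend.
    dropped = set()
    for names in by_source.values():
        for a, b in combinations(names, 2):
            if a in dropped or b in dropped:
                continue
            if pearson(features[a]["series"], features[b]["series"]) >= threshold:
                dropped.add(b)  # b duplicates a within the same source
    return {n: m for n, m in features.items() if n not in dropped}
```

In one such example, two features reported by the same monitoring source with identical series collapse to a single unique feature, while a feature from a different source is untouched.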
[0063] At block 310, the one or more features may be analysed based on the topology and the weightage using a Feature Restriction technique. The Feature Restriction technique may be referred to as a Feature Restriction algorithm. In one implementation, the extraction module 214 may analyse the one or more features using the Feature Restriction technique.
[0064] The Feature Restriction technique may comprise creating the topology of the entities based on a connectivity map. Further, the weightage may be determined based on an impact of the one or more features. Based on the weightage, critical features may be identified. If entities are connected to one another and generate features, then a set of unique features may be abstracted out of the generated features.
[0065] In one aspect, the relationship between the features may be considered from a topology perspective. In one embodiment, many duplicate features may be weeded out based on the connection between the entities, and an interaction between the entities. Further, the relation between the features and the similarity of feature names may provide information associated with the relationship between the features. In one example, link failure, communication outage, heartbeat loss, and the like mean the same, though they may be generated from multiple sources. Based on subject matter expertise, such redundancy may be removed systematically, and optimization of the features may be done.
[0066] Further, the weightage may be measured based on a perceived effect of the feature on the predicted or forecasted variable(s). In one example, a subject matter expert or a domain expert may provide information associated with the criticality and weightage of the features. In one example, CPU utilization going high might sound alarming, but if the associated traffic is also high, it might not be something too bad. Similarly, a failure of a redundant or spare entity in a network might not be a major cause of alarm, though a failure event might be reported.
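The Feature Restriction technique of blocks 310 and [0064]–[0066] may be sketched, purely as a non-limiting example, as a union-find over the connectivity map followed by a weightage cut. The per-entity feature sets, the edge list, and the numeric weightages (e.g. supplied by a domain expert) are assumptions of this sketch.

```python
def feature_restriction(entity_features, edges, weights, weight_threshold=0.5):
    """Restrict features using an entity topology and expert weightages.

    entity_features: entity -> set of feature names it generates.
    edges: (entity, entity) pairs forming the connectivity map.
    weights: feature name -> weightage in [0, 1].
    Connected entities that generate the same feature contribute it only
    once, and only features passing the weightage threshold survive.
    """
    # Union-find over the connectivity map to form connected components.
    parent = {e: e for e in entity_features}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    # Abstract a unique feature set out of each connected component.
    components = {}
    for entity, feats in entity_features.items():
        components.setdefault(find(entity), set()).update(feats)

    # Keep only critical features, identified by their weightage.
    critical = set()
    for feats in components.values():
        critical |= {f for f in feats if weights.get(f, 0.0) >= weight_threshold}
    return critical
```

For instance, two connected routers both reporting a link-failure feature contribute it once, and a low-weight spare-entity failure is dropped.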
[0067] At block 312, the one or more features may be analysed based on the time window and the behavioral change using a Feature Filtering technique. The Feature Filtering technique may be referred to as a Feature Filtering algorithm. In one implementation, the extraction module 214 may analyse the one or more features using the Feature Filtering technique.
[0068] The Feature Filtering technique may comprise ordering the one or more features, irrespective of the source, the topology or the weightage, by the time window. In other words, the basic premise of this aspect is that if some changes or perturbations happen in the flow of the data over time, then they would happen to the entire system as a whole. Further, the Feature Filtering technique may comprise identifying outliers and exceptions. In order to identify outliers, various configuration parameters may be applied on the one or more features, such as time windows, time stamp based intervals, correlation thresholds, and the like.
[0069] In one embodiment, the time based aggregation of the features may provide meaningful insights into critical features that can be considered for feature selection and feature engineering. In one embodiment, based on a time window, data size, time intervals and other temporal characteristics, the features can be grouped and ordered.
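As a non-limiting sketch of the Feature Filtering technique of block 312, timestamped feature values may be ordered into time windows and outliers flagged by deviation from each feature's mean. The tuple representation of observations and the z-score criterion are assumptions of this sketch; the disclosure only requires that some configurable threshold be applied.

```python
def feature_filtering(observations, window, z_threshold=2.0):
    """Order timestamped feature values into time windows and flag outliers.

    observations: list of (timestamp, feature_name, value) tuples.
    window: window width in the same units as the timestamps.
    A value is flagged as an outlier when it deviates from its feature's
    mean by more than z_threshold standard deviations.
    """
    # Group observations by (feature, window index), ordered by time.
    windows = {}
    for ts, name, value in sorted(observations):
        windows.setdefault((name, int(ts // window)), []).append(value)

    # Pool each feature's values across windows and compute statistics.
    per_feature = {}
    for (name, _), values in windows.items():
        per_feature.setdefault(name, []).extend(values)

    outliers = []
    for name, values in per_feature.items():
        n = len(values)
        mean = sum(values) / n
        std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
        for ts, fname, value in observations:
            if fname == name and std and abs(value - mean) > z_threshold * std:
                outliers.append((ts, name, value))
    return windows, outliers
```

A lone spike in an otherwise flat series is thus surfaced as an exception while the windowed grouping preserves the temporal ordering of the features.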
[0070] At block 314, primary features may be extracted from the one or more features. The primary features may be extracted based on the analysis of the features using the IFI technique, the Feature Restriction technique and the Feature Filtering technique. In one implementation, the extraction module 214 may extract the primary features.
[0071] At block 316, the primary features may be used to generate a prediction model. In one implementation, the extraction module 214 may generate the prediction model. In one aspect, a Machine Learning Training Agent and a Prototype prediction model generator may be used to generate the prediction model. The Machine Learning Training Agent and the Prototype prediction model generator may be referred as a prototype prediction generator (PPG). The prediction model may be configured to generate predictions using the primary features.
[0072] At block 318, the predictions may be compared with cross validation data. Based on the comparison, the prediction model may be validated. In one aspect, an accuracy of the predictions may be checked based on comparison of the predictions with the cross validation data. In one embodiment, if the accuracy is greater than a threshold, then the primary features may be optimized. In one embodiment, if the accuracy is less than the threshold, then the primary features may not be optimized, and the primary features may be considered as final features by the prediction model.
[0073] At block 316, the primary features may be optimized to generate secondary features. In one aspect, the optimization may indicate reducing a number of features. Upon generation of the secondary features, the prediction model may generate the predictions using the secondary features. Further, the accuracy of the predictions may be checked based on comparison of the predictions with the cross validation data. Further, the secondary features may be optimized based on the accuracy. In one embodiment, if the accuracy is less than the threshold, then the secondary features may not be optimized, and the secondary features may be considered as final features by the prediction model.
[0074] At block 320, final features may be extracted based on the optimization. The final features may be used by the prediction model.
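The validation and optimization loop of blocks 314 through 320 may be illustrated, again as a non-limiting sketch, as follows. The accuracy metric (fraction of predictions matching the cross validation data) and the assumption that candidate feature sets are supplied in order of decreasing size are choices made for this sketch only.

```python
def validate_and_optimize(train_and_predict, feature_sets, cross_val_truth,
                          threshold=0.9):
    """Iteratively optimize features against cross validation data.

    train_and_predict: callable taking a feature list and returning
    predictions aligned with cross_val_truth.
    feature_sets: candidate feature lists from primary (largest) to most
    optimized (smallest), e.g. produced by dropping low-weight features.
    Returns the final features: the smallest set whose accuracy still
    exceeds the threshold, falling back to the primary features.
    """
    final = feature_sets[0]
    for features in feature_sets:
        predictions = train_and_predict(features)
        correct = sum(p == t for p, t in zip(predictions, cross_val_truth))
        accuracy = correct / len(cross_val_truth)
        if accuracy > threshold:
            final = features   # optimization may continue with fewer features
        else:
            break              # accuracy dropped; keep the last good set
    return final
```

The loop mirrors the described behavior: while accuracy stays above the threshold the feature set keeps shrinking, and once accuracy falls below it the last passing set is taken as the final features.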
[0075] At block 322, a feedback associated with the prediction may be received. Based on the feedback, the final features may be optimized.
[0076] Although implementations for systems and methods for extracting features for a prediction model based on ontology have been described, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for extracting features for the prediction model based on ontology.

Claims:
1. A method for extracting features for a prediction model, the method comprising:
receiving, by a processor (202), data comprising one or more features, and parameters associated with the one or more features, wherein the parameters comprise a source, a topology, a time window, a behavioral change, and a weightage; and
extracting, by the processor (202), primary features from the one or more features based on an analysis of the one or more features using one or more of the parameters, wherein the analysis comprises:
analysing the one or more features based on the source using an Identical Feature Identification technique;
analysing the one or more features based on the topology and the weightage using a Feature Restriction technique; and
analysing the one or more features based on the time window and the behavioral change using a Feature Filtering technique.

2. The method as claimed in claim 1, further comprising:
generating a prediction model using the primary features;
validating the prediction model based on comparison of predictions with cross validation data, wherein the predictions are generated by the prediction model using the primary features;
optimizing the primary features based on the validation of the prediction model; and
generating secondary features based on the optimization of the primary features.

3. The method as claimed in claim 2, further comprising modifying the secondary features based on a feedback associated with the predictions.

4. The method as claimed in claim 1, wherein the data is received based on an analysis of a use case and a historical repository.

5. A system (102) for extracting features for a prediction model, the system (102) comprising:
a memory (206);
a processor (202) coupled to the memory (206), wherein the processor (202) is configured to execute instructions stored in the memory (206) to:
receive data comprising one or more features, and parameters associated with the one or more features, wherein the parameters comprise a source, a topology, a time window, a behavioral change, and a weightage; and
extract primary features from the one or more features based on an analysis of the one or more features using one or more of the parameters, wherein the analysis comprises:
analysing the one or more features based on the source using an Identical Feature Identification technique;
analysing the one or more features based on the topology and the weightage using a Feature Restriction technique; and
analysing the one or more features based on the time window and the behavioral change using a Feature Filtering technique.

6. The system (102) as claimed in claim 5, further configured to:
generate a prediction model using the primary features;
validate the prediction model based on comparison of predictions with cross validation data, wherein the predictions are generated by the prediction model using the primary features;
optimize the primary features based on the validation of the prediction model; and
generate secondary features based on the optimization of the primary features.

7. The system (102) as claimed in claim 6, further configured to modify the secondary features based on a feedback associated with the predictions.

8. The system (102) as claimed in claim 5, wherein the data is received based on an analysis of a use case and a historical repository.

Documents

Application Documents

# Name Date
1 201911011264-IntimationOfGrant12-02-2024.pdf 2024-02-12
2 201911011264-PatentCertificate12-02-2024.pdf 2024-02-12
3 201911011264-FER.pdf 2021-10-18
4 201911011264-Proof of Right [13-10-2021(online)].pdf 2021-10-13
5 201911011264-COMPLETE SPECIFICATION [24-09-2021(online)].pdf 2021-09-24
6 201911011264-CORRESPONDENCE [24-09-2021(online)].pdf 2021-09-24
7 201911011264-FER_SER_REPLY [24-09-2021(online)].pdf 2021-09-24
8 201911011264-OTHERS [24-09-2021(online)].pdf 2021-09-24
9 201911011264-FORM 13 [09-07-2021(online)].pdf 2021-07-09
10 201911011264-POA [09-07-2021(online)].pdf 2021-07-09
11 201911011264-Correspondence-090719.pdf 2019-07-15
12 201911011264-OTHERS-090719.pdf 2019-07-15
13 201911011264-Proof of Right (MANDATORY) [04-07-2019(online)].pdf 2019-07-04
14 abstract.jpg 2019-05-01
15 201911011264-STATEMENT OF UNDERTAKING (FORM 3) [22-03-2019(online)].pdf 2019-03-22
16 201911011264-REQUEST FOR EXAMINATION (FORM-18) [22-03-2019(online)].pdf 2019-03-22
17 201911011264-REQUEST FOR EARLY PUBLICATION(FORM-9) [22-03-2019(online)].pdf 2019-03-22
18 201911011264-POWER OF AUTHORITY [22-03-2019(online)].pdf 2019-03-22
19 201911011264-FORM-9 [22-03-2019(online)].pdf 2019-03-22
20 201911011264-FORM 18 [22-03-2019(online)].pdf 2019-03-22
21 201911011264-FORM 1 [22-03-2019(online)].pdf 2019-03-22
22 201911011264-FIGURE OF ABSTRACT [22-03-2019(online)].jpg 2019-03-22
23 201911011264-DRAWINGS [22-03-2019(online)].pdf 2019-03-22
24 201911011264-COMPLETE SPECIFICATION [22-03-2019(online)].pdf 2019-03-22

Search Strategy

1 Search11264E_25-03-2021.pdf

ERegister / Renewals

3rd: 17 Apr 2024 (from 22/03/2021 to 22/03/2022)
4th: 17 Apr 2024 (from 22/03/2022 to 22/03/2023)
5th: 17 Apr 2024 (from 22/03/2023 to 22/03/2024)
6th: 17 Apr 2024 (from 22/03/2024 to 22/03/2025)
7th: 12 Mar 2025 (from 22/03/2025 to 22/03/2026)