Abstract: A system and method for automatic training region selection for a deep learning model are disclosed. An identification module (110) identifies equipment shutdowns and excludes buffer region data. A segmentation module (120) estimates operational regions and divides data into predefined operating regions using K-means clustering. A feature analysis module (130) determines top contributing tags and identifies low and high values within operational regions. A domain comparison module (140) compares contributing tags with domain knowledge, excluding regions associated with mapped failures. A data cleaning module (150) ensures data integrity by recalculating statistical measures and eliminating outliers. A filtering module (160) refines data and consolidates regions to reduce training ranges. This integrated approach enhances equipment monitoring and predictive analytics, enabling proactive maintenance strategies and optimizing operational efficiency. FIG. 1
Description:
FIELD OF INVENTION
[0001] Embodiments of the present disclosure relate to the field of predictive analytics and anomaly detection in equipment monitoring using deep learning models, and more particularly, a system and method for training region selection for deep learning model.
BACKGROUND
[0002] Deep learning-based techniques have gained significant traction in the realm of anomaly detection and equipment monitoring due to their capability to learn intricate patterns from vast amounts of data. These techniques leverage the power of deep neural networks to process billions of data points, identifying subtle deviations that could indicate potential failures or inefficiencies in equipment performance. Despite their effectiveness, a critical challenge persists in the selection of training regions for these deep learning models.
[0003] Selecting appropriate training regions is essential for the accuracy and reliability of the models. Traditionally, this task has been the responsibility of the customers, who are expected to specify which regions represent ideal operating conditions for model training. However, customers often lack the necessary information and expertise to make these determinations. This gap not only makes the training region selection process time-consuming but also highly prone to errors.
[0004] In conventional methods, operation engineers manually review the data to select training regions. This process involves examining each tag (or sensor) within the system to identify alerts or underperforming regions. Given the complexity and volume of data, this manual approach is exceedingly time-consuming, often requiring several man-hours per system. Systems with tag counts ranging from tens to thousands exacerbate this challenge. To simplify the process, some solutions limit the number of sensors or the amount of data considered. However, even when focusing on a limited number of critical tags, the review process can still exceed an hour per system. This significant time investment impedes the swift development and deployment of models.
[0005] Another conventional method relies on the judgment of operation engineers to select training regions. This method introduces considerable variability as engineers' selections are influenced by their individual risk preferences and understanding of the domain. Engineers need to have substantial domain knowledge and expertise in interpreting sensor data to make informed decisions. This human element introduces bias and inconsistency, resulting in challenges in achieving reproducible training regions and developing generalized models. The variability can lead to suboptimal training regions, ultimately affecting the accuracy and reliability of the models. Consequently, models trained on these regions may fail to generate accurate alerts when needed, undermining the effectiveness of the predictive analytics solution.
[0006] Hence, there is a need for an improved system and method for training region selection for deep learning model, which addresses the aforementioned issue(s).
OBJECTIVE OF THE INVENTION
[0007] An objective of the present invention is to develop a sophisticated system and method for improved equipment monitoring through intelligent data analysis.
[0008] Another objective of the present invention is to optimize predictive analytics by accurately selecting training regions for deep learning models.
[0009] Another objective of the present invention is to accurately identify equipment shutdowns by analyzing operational data and excluding buffer region data.
[0010] Another objective of the present invention is to estimate and segment operational regions within the data using advanced machine learning techniques such as K-means clustering.
[0011] Another objective of the present invention is to enhance predictive maintenance strategies, reduce downtime, minimize errors, and boost overall operational productivity by leveraging advanced data analysis and domain knowledge integration.
BRIEF DESCRIPTION
[0012] In accordance with an embodiment of the present disclosure, a system for training region selection for a deep learning model is provided. The system includes a processing subsystem hosted on a server. The processing subsystem is configured to execute on a network to control bidirectional communications among a plurality of modules. The processing subsystem includes an identification module which is configured to identify a shutdown of at least one equipment by analyzing operational data and exclude data from the buffer region for a predefined period before and after the shutdown, upon identifying one or more buffer regions around the shutdown. The processing subsystem also includes a segmentation module operatively coupled to the identification module. The segmentation module is configured to estimate one or more operational regions of the corresponding one or more equipment within the data to analyze variance and identify an optimal number of clusters, and divide the data into predefined operating regions using a K-means clustering technique based on the estimated number of operational regions. Further, the processing subsystem includes a feature analysis module operatively coupled to the segmentation module. The feature analysis module is configured to: determine one or more top contributing tags for each operational region by selecting a predetermined percentage of data points closest to a centroid of each operational region and sorting the absolute values of their medians; and identify low and high values of contributing tags within the operational regions by analyzing at least one absolute value. The processing subsystem also includes a domain comparison module operatively coupled to the feature analysis module.
The domain comparison module is configured to compare the one or more top contributing tags of each region with corresponding domain knowledge and exclude one or more regions whose contributing tags are associated with mapped failures based on the domain knowledge. Furthermore, the processing subsystem includes a data cleaning module operatively coupled to the domain comparison module. The data cleaning module is configured to determine a mean and standard deviation within the data upon eliminating erroneous data by recalculating statistical measures and eliminate outlier data points from the remaining data by applying statistical outlier detection methods. The processing subsystem further includes a filtering module operatively coupled to the data cleaning module. The filtering module is configured to apply a filter over the remaining region to refine the data further and consolidate regions that are within a predetermined distance to reduce the number of training ranges by merging adjacent regions.
[0013] In accordance with another embodiment of the present disclosure, a method for training region selection for a deep learning model is provided. The method includes identifying a shutdown of at least one equipment by analyzing operational data and excluding data from the buffer region for a predefined period before and after the shutdown, upon identifying one or more buffer regions around the shutdown. The method also includes estimating one or more operational regions of the corresponding one or more equipment within the data for analyzing variance and identifying an optimal number of clusters, and dividing the data into predefined operating regions using a K-means clustering technique. Furthermore, the method includes determining one or more top contributing tags for each operational region by selecting a predetermined percentage of data points closest to a centroid of each operational region and sorting the absolute values of their medians. The method also includes identifying low and high values of contributing tags within the operational regions by analyzing at least one absolute value. The method also includes comparing the one or more top contributing tags of each region with corresponding domain knowledge, and excluding one or more regions whose contributing tags are associated with mapped failures based on the domain knowledge. Further, the method includes determining a mean and standard deviation within the data upon eliminating erroneous data by recalculating statistical measures and eliminating outlier data points from the remaining data by applying statistical outlier detection methods. The method also includes applying a filter over the remaining region for refining the data further and consolidating regions that are within a predetermined distance for reducing the number of training ranges by merging adjacent regions.
[0014] To further clarify the advantages and features of the present disclosure, a more particular description of the disclosure will follow by reference to specific embodiments thereof, which are illustrated in the appended figures. It is to be appreciated that these figures depict only typical embodiments of the disclosure and are therefore not to be considered limiting in scope. The disclosure will be described and explained with additional specificity and detail with the appended figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] The disclosure will be described and explained with additional specificity and detail with the accompanying figures in which:
[0016] FIG. 1 is a block diagram representation of a system for training region selection for a deep learning model in accordance with an embodiment of the present disclosure;
[0017] FIG. 2 is a block diagram of a computer or a server in accordance with an embodiment of the present disclosure; and
[0018] FIG. 3 illustrates a flow chart representing the steps involved in a method for training region selection for deep learning model.
[0019] Further, those skilled in the art will appreciate that elements in the figures are illustrated for simplicity and may not have necessarily been drawn to scale. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the figures by conventional symbols, and the figures may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the figures with details that will be readily apparent to those skilled in the art having the benefit of the description herein.
DETAILED DESCRIPTION
[0020] For the purpose of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiment illustrated in the figures and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the disclosure is thereby intended. Such alterations and further modifications in the illustrated system, and such further applications of the principles of the disclosure as would normally occur to those skilled in the art are to be construed as being within the scope of the present disclosure.
[0021] The terms “comprises”, “comprising”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method. Similarly, one or more devices or subsystems or elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other devices, sub-systems, elements, structures, components, additional devices, additional sub-systems, additional elements, additional structures or additional components. Appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but not necessarily do, all refer to the same embodiment.
[0022] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this disclosure belongs. The system, methods, and examples provided herein are only illustrative and not intended to be limiting.
[0023] In the following specification and the claims, reference will be made to a number of terms, which shall be defined to have the following meanings. The singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise.
[0024] Embodiments of the present disclosure relate to the field of predictive analytics and anomaly detection in equipment monitoring using deep learning models, and more particularly, a system and method for automatic training region selection for a deep learning model. The system includes a processing subsystem hosted on a server. The processing subsystem is configured to execute on a network to control bidirectional communications among a plurality of modules. The processing subsystem includes an identification module which is configured to identify a shutdown of at least one equipment by analyzing operational data and exclude data from the buffer region for a predefined period before and after the shutdown, upon identifying one or more buffer regions around the shutdown. The processing subsystem also includes a segmentation module operatively coupled to the identification module. The segmentation module is configured to estimate one or more operational regions of the corresponding one or more equipment within the data to analyze variance and identify an optimal number of clusters, and divide the data into predefined operating regions using a K-means clustering technique based on the estimated number of operational regions. Further, the processing subsystem includes a feature analysis module operatively coupled to the segmentation module. The feature analysis module is configured to: determine one or more top contributing tags for each operational region by selecting a predetermined percentage of data points closest to a centroid of each operational region and sorting the absolute values of their medians; and identify low and high values of contributing tags within the operational regions by analyzing at least one absolute value. The processing subsystem also includes a domain comparison module operatively coupled to the feature analysis module.
The domain comparison module is configured to compare the one or more top contributing tags of each region with corresponding domain knowledge and exclude one or more regions whose contributing tags are associated with mapped failures based on the domain knowledge. Furthermore, the processing subsystem includes a data cleaning module operatively coupled to the domain comparison module. The data cleaning module is configured to determine a mean and standard deviation within the data upon eliminating erroneous data by recalculating statistical measures and eliminate outlier data points from the remaining data by applying statistical outlier detection methods. The processing subsystem further includes a filtering module operatively coupled to the data cleaning module. The filtering module is configured to apply a filter over the remaining region to refine the data further and consolidate regions that are within a predetermined distance to reduce the number of training ranges by merging adjacent regions.
[0025] FIG. 1 is a block diagram of a system (100) for training region selection for deep learning model. The system (100) includes a processing subsystem (105) hosted on a server (108). In one embodiment, the server (108) may include a cloud-based server. In another embodiment, parts of the server (108) may be a local server coupled to a user device (not shown in FIG.1). The processing subsystem (105) is configured to execute on a network (115) to control bidirectional communications among a plurality of modules. In one example, the network (115) may be a private or public local area network (LAN) or Wide Area Network (WAN), such as the Internet. In another embodiment, the network (115) may include both wired and wireless communications according to one or more standards and/or via one or more transport mediums. In one example, the network (115) may include wireless communications according to one of the 802.11 or Bluetooth specification sets, or another standard or proprietary wireless communication protocol. In yet another embodiment, the network (115) may also include communications over a terrestrial cellular network, including, a global system for mobile communications (GSM), code division multiple access (CDMA), and/or enhanced data for global evolution (EDGE) network.
[0026] The system (100) includes an identification module (110) configured to identify a shutdown of at least one equipment by analyzing operational data and exclude data from the buffer region for a predefined period before and after the shutdown, upon identifying one or more buffer regions around the shutdown. More specifically, the identification module (110) may operate by continuously monitoring operational data associated with various equipment. This operational data may include sensor readings, performance metrics, and status indicators that reflect the real-time operational status of the equipment. In another embodiment, the identification module (110) may be designed to detect patterns or anomalies that indicate a shutdown event. A shutdown may be identified when the operational data exhibits specific characteristics such as zero operational activity, significant drops in performance metrics, or other predefined indicators of non-operation.
[0027] Further, upon identifying a shutdown event, the identification module (110) may determine buffer regions around the shutdown. The buffer region may be a predefined period extending both before and after the shutdown event. In one embodiment, the purpose of this buffer region may be to exclude potentially corrupted or irrelevant data that could adversely affect the training of deep learning models. Further, the predefined period may be set to a default value, such as seven days, but can also be adjusted based on user input or specific requirements of the equipment or operational context.
[0028] In another embodiment, once the buffer regions are determined, the identification module (110) may exclude the data within these regions from further analysis. This exclusion process may involve removing or marking the data points that fall within the buffer region, ensuring that they are not used in the subsequent steps of data processing and model training. By excluding this data, the system may prevent the inclusion of anomalous or non-representative data that could degrade the performance of the predictive models.
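By way of a non-limiting illustration, the buffer exclusion described above may be sketched as follows. The sketch assumes that each sample is a (timestamp, value) pair for a single illustrative tag, that a shutdown is any sample at or below a hypothetical `shutdown_threshold`, and that the buffer uses the seven-day default mentioned above. The function and parameter names are hypothetical and are not part of the disclosure.

```python
from datetime import datetime, timedelta


def exclude_shutdown_buffers(samples, shutdown_threshold=0.0, buffer=timedelta(days=7)):
    """Drop samples that fall inside a buffer window around any shutdown sample.

    samples: list of (timestamp, value) pairs for one illustrative tag.
    A sample is treated as a shutdown when value <= shutdown_threshold.
    This rule is an assumption; other shutdown indicators may be used.
    """
    shutdown_times = [ts for ts, value in samples if value <= shutdown_threshold]
    kept = []
    for ts, value in samples:
        # A sample is excluded if it lies within the buffer of any shutdown,
        # either before or after it (the shutdown samples themselves included).
        in_buffer = any(abs(ts - s) <= buffer for s in shutdown_times)
        if not in_buffer:
            kept.append((ts, value))
    return kept


# Thirty daily readings with a single shutdown on day 15.
start = datetime(2024, 1, 1)
samples = [(start + timedelta(days=d), 0.0 if d == 15 else 100.0) for d in range(30)]
kept = exclude_shutdown_buffers(samples)
```

In this example, days 8 through 22 are removed, so only readings taken more than seven days away from the shutdown remain available for the subsequent steps.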
[0029] The system (100) also includes a segmentation module (120) operatively coupled to the identification module (110). The segmentation module (120) is configured to estimate one or more operational regions of the corresponding one or more equipment within the data to analyze variance and identify an optimal number of clusters, and divide the data into predefined operating regions using a K-means clustering technique based on the estimated number of operational regions.
[0030] In one embodiment, the segmentation module (120) may receive cleaned and refined data from the identification module (110), which may have already identified and excluded data within buffer regions around equipment shutdowns. This integration ensures that the segmentation module (120) processes only high-quality, relevant data. Further, the segmentation module (120) may start by estimating the number of operational regions within the dataset. In one embodiment, the estimation may involve analyzing the variance within the data to determine the optimal number of clusters.
[0031] Further, the segmentation module (120) may employ the elbow method. The term ‘elbow method’ refers to a well-known technique in cluster analysis that evaluates the variance explained as a function of the number of clusters. The optimal number of clusters may be identified at the point where adding more clusters yields diminishing returns in terms of explained variance, often visualized as an "elbow" in the graph of variance versus the number of clusters.
[0032] Once the optimal number of clusters is determined, the segmentation module (120) may apply the K-means clustering technique to divide the data into predefined operating regions. K-means clustering is a partitioning method that assigns each data point to one of K clusters based on the nearest mean, which serves as the cluster centroid.
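By way of a non-limiting illustration, the elbow estimate and the K-means partitioning described above may be sketched as follows. The sketch is a plain-Python implementation of Lloyd's algorithm with a deterministic initialisation. It approximates the elbow by stopping at the first cluster count for which one additional cluster reduces the within-cluster sum of squares by less than a hypothetical `min_gain` fraction of the single-cluster value. The function names, the `min_gain` rule and the sample points are assumptions made for illustration only.

```python
import math


def kmeans(points, k, iterations=50):
    """Lloyd's K-means on tuples, seeded with k evenly spaced samples."""
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Within-cluster sum of squared distances (lower means tighter clusters).
    inertia = sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))
    return labels, centroids, inertia


def estimate_k(points, max_k=6, min_gain=0.1):
    """Return the first k after which adding a cluster yields a gain below min_gain.

    Assumes the points are not all identical, so the k=1 inertia is non-zero.
    """
    inertias = [kmeans(points, k)[2] for k in range(1, max_k + 1)]
    for k in range(1, max_k):
        gain = (inertias[k - 1] - inertias[k]) / inertias[0]
        if gain < min_gain:
            return k
    return max_k


# Three well-separated illustrative operating regions in a two-tag space.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]
k = estimate_k(points)
labels, centroids, inertia = kmeans(points, k)
```

For these sample points the estimate settles on three clusters, one per group of four points. A production system would more likely rely on an established clustering library and on randomised, repeated initialisation.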
[0033] Furthermore, the system (100) includes a feature analysis module (130) operatively coupled to the segmentation module (120). The feature analysis module (130) is configured to determine one or more top contributing tags for each operational region by selecting a predetermined percentage of data points closest to a centroid of each operational region and sorting the absolute values of their medians. The feature analysis module (130) is also configured to identify low and high values of contributing tags within the operational regions by analyzing at least one absolute value.
[0034] In one embodiment, the feature analysis module (130) may receive segmented data from the segmentation module (120), which has already divided the data into predefined operational regions using the K-means clustering technique. This integration may ensure that the feature analysis module (130) processes data that has been effectively partitioned into distinct operational regions, each characterized by similar operational patterns.
[0035] Further, the feature analysis module (130) begins by determining the top contributing tags for each operational region. In such an embodiment, the tags refer to specific parameters or sensor readings that may be monitored within the equipment. More specifically, for each operational region, the feature analysis module (130) may select a predetermined percentage of data points that are closest to the centroid of that region. The centroid may represent the mean or central point of the cluster, and the selected data points may be those that most accurately represent the typical behavior within that region. Further, the feature analysis module (130) may calculate the median values of the selected data points for each tag. In some embodiments, medians may be used as they are robust to outliers and provide a reliable measure of central tendency. Consequently, the feature analysis module (130) may sort the tags based on the absolute values of their medians. This sorting process may identify which tags have the most significant influence on the operational region, as tags with higher absolute median values are considered to be the top contributors.
[0036] Furthermore, upon determining the top contributing tags, the feature analysis module (130) may identify the low and high values of these tags within each operational region. In one embodiment, the feature analysis module (130) may analyze the absolute values of the contributing tags to identify their low and high ranges. In some embodiments, absolute values may be used to ensure that both positive and negative deviations are considered, providing a comprehensive view of abnormal behavior. In such an embodiment, the feature analysis module (130) may use predefined thresholds or statistical methods to define what constitutes low and high values for each tag. These thresholds may help in identifying abnormal operational conditions.
[0037] In one exemplary embodiment, the feature analysis module (130) may be configured to identify interactions between contributing tags across different regions to detect potential cross-regional anomalies. Additionally, the feature analysis module (130) may generate a detailed report that may highlight the low and high values of these contributing tags within each operational region. This dual functionality may ensure comprehensive anomaly detection and provide valuable insights into the behavior of key parameters across different operational contexts.
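By way of a non-limiting illustration, the ranking of contributing tags within one operational region may be sketched as follows. The sketch assumes that the tag values have already been standardised so that medians are comparable across tags, which the disclosure does not specify. The function name, the `fraction` and `top_n` parameters, the tag names and the sample rows are hypothetical.

```python
import math
from statistics import median


def top_contributing_tags(rows, tag_names, fraction=0.5, top_n=2):
    """Rank tags by the absolute median of the points nearest the region centroid.

    rows: equal-length tuples of (assumed standardised) tag values for one region.
    fraction: share of points, closest to the centroid, used for the medians.
    """
    centroid = tuple(sum(col) / len(rows) for col in zip(*rows))
    nearest = sorted(rows, key=lambda r: math.dist(r, centroid))
    keep = max(1, int(len(rows) * fraction))
    core = nearest[:keep]
    # Per-tag median over the retained core points.
    medians = {tag: median(col) for tag, col in zip(tag_names, zip(*core))}
    # Tags with the largest absolute median are treated as the top contributors.
    ranked = sorted(medians, key=lambda t: abs(medians[t]), reverse=True)
    return ranked[:top_n], medians


tag_names = ["vibration", "temperature", "pressure"]
rows = [
    (2.0, -0.1, 0.5),
    (2.2, 0.0, 0.4),
    (1.8, 0.1, 0.6),
    (2.1, -0.2, 0.5),
    (5.0, 3.0, -2.0),  # atypical point, far from the centroid
    (1.9, 0.2, 0.5),
]
top_tags, medians = top_contributing_tags(rows, tag_names)
```

In this example the atypical row is not among the points nearest the centroid, so it does not affect the medians, and vibration followed by pressure are reported as the top contributing tags.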
[0038] The system (100) further includes a domain comparison module (140) operatively coupled to the feature analysis module (130). The domain comparison module (140) is configured to compare the one or more top contributing tags of each region with corresponding domain knowledge and exclude one or more regions whose contributing tags are associated with mapped failures based on the domain knowledge. In one embodiment, the domain knowledge may include at least one of assets, failures, and their mapping with contributing tags, failure recommendations, or a combination thereof.
[0039] In one embodiment, the domain comparison module (140) may receive data on the top contributing tags for each operational region from the feature analysis module (130). This integration may ensure that the domain comparison module (140) processes data that has been meticulously analyzed to identify key contributing factors within each operational region. In one embodiment, the domain comparison module (140) may access a repository of domain knowledge, which includes detailed mappings of assets, known failures, contributing tags associated with these failures, and any failure recommendations. This repository can be updated continuously with new insights and data. The domain comparison module (140) may then compare the top contributing tags identified by the feature analysis module (130) with the tags listed in the domain knowledge. This comparison may identify any matches between the contributing tags of the operational regions and those associated with known failures.
[0040] Further, if the comparison reveals that the contributing tags of an operational region are associated with mapped failures, the domain comparison module (140) may exclude these regions from further analysis and training data. This exclusion may ensure that regions likely to contain data indicative of failure conditions are not used for training, thereby improving the reliability and accuracy of the predictive models.
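By way of a non-limiting illustration, the comparison against domain knowledge may be sketched as follows. The sketch assumes a hypothetical schema in which the domain knowledge is a mapping from a failure name to the set of tags mapped to that failure, and it assumes that any overlap between a region's top contributing tags and a mapped failure is sufficient to exclude the region. The function name, the matching rule, the tag names and the failure names are illustrative only.

```python
def exclude_failure_regions(region_tags, failure_map):
    """Separate regions into kept and excluded based on mapped failures.

    region_tags: {region_id: list of top contributing tags}
    failure_map: {failure_name: set of tags mapped to that failure} (assumed schema)
    Returns (kept_region_ids, {excluded_region_id: matching failure names}).
    """
    kept, excluded = [], {}
    for region_id, tags in region_tags.items():
        tag_set = set(tags)
        # A failure matches when it shares at least one tag with the region.
        matches = sorted(name for name, mapped in failure_map.items() if mapped & tag_set)
        if matches:
            excluded[region_id] = matches
        else:
            kept.append(region_id)
    return kept, excluded


region_tags = {
    0: ["flow_rate", "speed"],
    1: ["bearing_vibration", "bearing_temperature"],
    2: ["discharge_pressure", "speed"],
}
failure_map = {
    "bearing wear": {"bearing_vibration", "bearing_temperature"},
    "seal leak": {"seal_pressure"},
}
kept, excluded = exclude_failure_regions(region_tags, failure_map)
```

Recording the matched failure names alongside each excluded region keeps the exclusion traceable to the domain knowledge entry that caused it.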
[0041] Furthermore, the system (100) includes a data cleaning module (150) operatively coupled to the domain comparison module (140). The data cleaning module (150) is configured to determine a mean and standard deviation within the data upon eliminating erroneous data by recalculating statistical measures and eliminate outlier data points from the remaining data by applying statistical outlier detection methods.
[0042] In one embodiment, the data cleaning module (150) may begin functioning by recalculating statistical measures, specifically the mean and standard deviation, for the data received. More specifically, the data cleaning module (150) may first identify and eliminate any remaining erroneous data points within the dataset. In such an embodiment, erroneous data points may include incorrect sensor readings, noise, or any data that does not conform to expected patterns based on predefined criteria. Further, once the erroneous data is removed, the data cleaning module (150) may recalculate the mean and standard deviation. These statistical measures may provide a basis for understanding the central tendency and dispersion of the data, respectively. The recalculated mean may offer a more accurate representation of the data's central value, while the recalculated standard deviation quantifies the variation or spread in the dataset.
[0043] Furthermore, after recalculating the mean and standard deviation, the data cleaning module (150) may proceed to eliminate outlier data points using one or more statistical outlier detection methods. In one embodiment, each data point may be assigned a Z-score based on its distance from the mean, measured in terms of standard deviations, and points with a Z-score beyond a certain threshold (typically ±3) may be considered outliers. In another embodiment, the data cleaning module (150) may calculate the interquartile range (IQR) and identify outliers as data points that fall below the first quartile minus 1.5 times the IQR or above the third quartile plus 1.5 times the IQR. In yet another embodiment, the data cleaning module (150) may use robust estimates based on the median and the median absolute deviation (MAD) to identify outliers, which may be particularly effective for datasets with skewed distributions.
[0044] Consequently, the data points identified as outliers through the selected method are removed from the dataset. This step may ensure that the remaining data points are within a reasonable range of variability and do not unduly influence the results of subsequent analyses.
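By way of a non-limiting illustration, the Z-score, IQR and MAD based outlier checks may be sketched as follows using only the Python standard library. The Z-score threshold of 3 and the IQR multiplier of 1.5 follow the description above. The MAD variant uses the commonly cited modified Z-score with a scaling constant of 0.6745 and a threshold of 3.5, which is an assumed convention and is not stated in the disclosure. The function names and the sample values are hypothetical.

```python
from statistics import mean, median, pstdev, quantiles


def zscore_outliers(values, threshold=3.0):
    """Flag values whose distance from the mean exceeds threshold standard deviations."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if sigma > 0 and abs(v - mu) / sigma > threshold]


def iqr_outliers(values, k=1.5):
    """Flag values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


def mad_outliers(values, threshold=3.5):
    """Flag values using the modified Z-score built on the median and the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # 0.6745 rescales the MAD so the score is comparable to a Z-score
    # for normally distributed data (assumed convention).
    return [v for v in values if mad > 0 and 0.6745 * abs(v - med) / mad > threshold]


# Twenty readings near 10.0 plus one spurious reading of 25.0.
base = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 9.9]
values = base + base + [25.0]

z_flagged = zscore_outliers(values)
iqr_flagged = iqr_outliers(values)
mad_flagged = mad_outliers(values)

# Remove the flagged points and recompute the summary statistics.
cleaned = [v for v in values if v not in set(z_flagged)]
cleaned_mean, cleaned_std = mean(cleaned), pstdev(cleaned)
```

All three checks flag only the reading of 25.0 in this example. The Z-score check depends on a mean and standard deviation that the outlier itself inflates, so it can miss outliers in small samples, whereas the IQR and MAD checks are less sensitive to that effect.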
[0045] The system (100) further includes a filtering module (160) operatively coupled to the data cleaning module (150). The filtering module (160) is configured to apply a filter over the remaining region to refine the data further and consolidate regions that are within a predetermined distance to reduce the number of training ranges by merging adjacent regions.
[0046] In one embodiment, the filtering module (160) may receive a cleaned dataset from the data cleaning module (150). This dataset may have already undergone processes to eliminate erroneous data and outliers, ensuring that the data provided to the filtering module (160) is of high quality and ready for further refinement. The filtering module (160) may apply additional filters to the data to ensure that only the most relevant and accurate data points are retained for training. The filtering module (160) may define criteria for filtering the data further. These criteria may include specific ranges for values, the removal of noise, and any other parameters deemed necessary to enhance data quality.
[0047] Further, the filtering module (160) applies the defined filters to the dataset. This process may include: smoothing filters, such as moving averages or Gaussian filters, to smooth out short-term fluctuations and highlight long-term trends; noise reduction techniques to remove any remaining noise or irrelevant variations in the data; and range-based filters that eliminate data points falling outside predefined acceptable ranges.
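By way of a non-limiting illustration, a range-based filter followed by a moving-average smoothing filter may be sketched as follows. The function names, the acceptable range, the window length and the sample values are hypothetical, and the order in which the two filters are applied is an assumption made for this example.

```python
def range_filter(values, low, high):
    """Keep only values inside the acceptable range [low, high]."""
    return [v for v in values if low <= v <= high]


def moving_average(values, window=3):
    """Smooth a series with a simple moving average over full windows only."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Illustrative readings with one spike outside the acceptable range.
values = [10, 12, 11, 50, 13, 12, 14]
in_range = range_filter(values, low=0, high=20)
smoothed = moving_average(in_range, window=3)
```

Applying the range-based filter first prevents the out-of-range spike from being averaged into its neighbours by the smoothing step.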
[0048] Consequently, after filtering the data, the filtering module (160) consolidates regions that are within a predetermined distance to reduce the number of training ranges. Further, the filtering module (160) may calculate the distances between the centroids of adjacent regions. The predetermined distance threshold may be set based on domain knowledge and specific application requirements. Further, the regions whose centroids are within the predefined distance may be merged. In one embodiment, this step may include combining the data points from the adjacent regions into a single region, updating the centroid of the newly formed region to reflect the mean position of the combined data points and adjusting any statistical measures (mean, standard deviation) to account for the merged data.
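By way of a non-limiting illustration, the consolidation of adjacent regions may be sketched as follows. The sketch assumes that the regions are supplied in order, that each region is a non-empty list of points, and that two neighbouring regions are merged when the Euclidean distance between their centroids does not exceed a hypothetical `max_distance`. The function name and the sample regions are illustrative only.

```python
import math


def merge_adjacent_regions(regions, max_distance):
    """Merge neighbouring regions whose centroids lie within max_distance.

    regions: ordered, non-empty list of regions, each a non-empty list of tuples.
    Returns a list of (centroid, points) pairs for the consolidated regions.
    """
    def centroid(points):
        return tuple(sum(col) / len(points) for col in zip(*points))

    merged = [list(regions[0])]
    for region in regions[1:]:
        # Compare the next region with the most recently consolidated region,
        # whose centroid already reflects any earlier merges.
        if math.dist(centroid(merged[-1]), centroid(region)) <= max_distance:
            merged[-1].extend(region)
        else:
            merged.append(list(region))
    # Centroids are recomputed from the combined points of each merged region.
    return [(centroid(points), points) for points in merged]


regions = [
    [(1.0, 1.0), (1.2, 0.8)],
    [(1.4, 1.2), (1.6, 1.0)],
    [(8.0, 8.0), (8.2, 7.8)],
]
consolidated = merge_adjacent_regions(regions, max_distance=1.0)
```

In this example the first two regions are combined into a single training range with an updated centroid, while the third region remains separate. Other statistical measures, such as the standard deviation, may be recomputed from the combined points in the same manner.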
[0049] In one exemplary embodiment, the system (100) may include a recommendation module which may suggest maintenance actions based on the analysis of training regions and domain knowledge. This module may work with other system components to analyze refined data, identifying key parameters and patterns using machine learning. By accessing a repository of historical failure data and maintenance guidelines, the recommendation module may integrate current operational data with past insights. The module may predict potential failures and may generate specific preventive maintenance actions, prioritizing them based on severity and equipment criticality. This proactive approach may reduce downtime, prevent major failures, and extend equipment lifespan, all while minimizing human error and enhancing maintenance efficiency through automation.
[0050] FIG. 2 is a block diagram of a computer or a server in accordance with an embodiment of the present disclosure. The server (200) includes processor(s) (230) and memory (210) operatively coupled to a bus (220). The processor(s) (230), as used herein, means any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing microprocessor, a reduced instruction set computing microprocessor, a very long instruction word microprocessor, an explicitly parallel instruction computing microprocessor, a digital signal processor, or any other type of processing circuit, or a combination thereof.
[0051] The memory (210) includes several subsystems stored in the form of an executable program that instructs the processor (230) to perform the method steps illustrated in FIG. 1. The memory (210) includes a processing subsystem (105) of FIG. 1. The processing subsystem (105) further has the following modules: an identification module (110), a segmentation module (120), a feature analysis module (130), a domain comparison module (140), a data cleaning module (150), a filtering module (160) and a recommendation module, each with distinct functions that enhance the overall process of training region selection for deep learning models used in equipment monitoring.
[0052] The identification module (110) is configured to identify equipment shutdowns by analyzing operational data and exclude data from buffer regions before and after the shutdown. The segmentation module (120), utilizing machine learning techniques such as the elbow method and K-means clustering, estimates the number of operational regions and divides the data accordingly. The feature analysis module (130) identifies key contributing tags within each region, determines their low and high values, and detects potential cross-regional anomalies, generating detailed reports. The domain comparison module (140) integrates historical data and domain knowledge, comparing top contributing tags with mapped failures to exclude unreliable regions. The data cleaning module (150) recalculates statistical measures to eliminate erroneous data and outliers, ensuring high-quality data for further analysis. The filtering module (160) refines the remaining data by applying filters and consolidating adjacent regions to reduce the number of training ranges. Finally, the recommendation module uses AI to analyze training regions and domain knowledge, predict potential failures, and generate prioritized maintenance recommendations, improving equipment reliability and maintenance efficiency.
[0053] The bus (220), as used herein, refers to internal memory channels or a computer network used to connect computer components and transfer data between them. The bus (220) includes a serial bus or a parallel bus, wherein the serial bus transmits data in bit-serial format and the parallel bus transmits data across multiple wires. The bus (220), as used herein, may include, but is not limited to, a system bus, an internal bus, an external bus, an expansion bus, a frontside bus, a backside bus and the like.
[0054] FIG. 3 illustrates a flow chart representing the steps involved in a method for training region selection for a deep learning model. The method (300) includes identifying a shutdown of at least one equipment by analyzing operational data and excluding data from the buffer region for a predefined period before and after the shutdown, upon identifying one or more buffer regions around the shutdown, in step 310. More specifically, the method (300) includes analyzing operational data to detect instances of equipment shutdowns. Upon identifying a shutdown event, data from the buffer region surrounding the shutdown is excluded to ensure accurate analysis.
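By way of a non-limiting illustration, step 310 may be sketched as below. The sketch assumes that a shutdown is inferred when a load tag is at or below a threshold, that samples are indexed by day, and that the buffer is seven days on each side; the threshold value and the name exclude_shutdown_buffers are hypothetical, and the disclosure does not specify how a shutdown is detected.

```python
def exclude_shutdown_buffers(samples, shutdown_threshold, buffer_days):
    """Drop shutdown samples and every sample within buffer_days of a shutdown.

    samples is a list of (day_index, load) pairs; a load at or below
    shutdown_threshold is treated as a shutdown (an assumed detection rule).
    """
    shutdown_days = [day for day, load in samples if load <= shutdown_threshold]
    kept = []
    for day, load in samples:
        near_shutdown = any(abs(day - s) <= buffer_days for s in shutdown_days)
        if not near_shutdown:
            kept.append((day, load))
    return kept

# Hypothetical 25 days of data with a shutdown on days 10 and 11.
samples = [(d, 0.0 if d in (10, 11) else 80.0) for d in range(25)]
kept = exclude_shutdown_buffers(samples, shutdown_threshold=5.0, buffer_days=7)
kept_days = [day for day, _ in kept]
```

With these inputs, days 3 through 18 are excluded, leaving only the data recorded well before and well after the shutdown.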
[0055] The method (300) also includes estimating one or more operational regions of the corresponding one or more equipment within the data by analyzing variance and identifying an optimal number of clusters, and dividing the data into predefined operating regions using a K-means clustering technique in step 320. More specifically, the method (300) estimates operational regions within the data by analyzing variance and determining an optimal number of clusters using the K-means clustering technique. The data is then divided into predefined operating regions based on these clusters.
[0056] Furthermore, the method (300) includes determining one or more top contributing tags for each operational region by selecting a predetermined percentage of data points closest to a centroid of each operational region and sorting the absolute values of their medians in step 330. More specifically, the method (300) includes identifying key contributing tags within each operational region. A predetermined percentage of data points closest to the centroid of each region is selected, and the absolute values of their medians are sorted to determine the top contributing tags.
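By way of a non-limiting illustration, step 320 may be sketched as below using a plain K-means (Lloyd's) iteration and a simple elbow rule on the within-cluster sum of squares. The deterministic initialisation, the min_gain threshold, and the names kmeans and choose_k_elbow are assumptions made for this example; a production embodiment might instead rely on an established clustering library.

```python
import math

def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; initial centroids are spread over the sorted data (assumed)."""
    ordered = sorted(points)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def assign(current):
        # Label each point with the index of its nearest centroid.
        return [min(range(k), key=lambda c: math.dist(p, current[c])) for p in points]

    for _ in range(iterations):
        labels = assign(centroids)
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(d) / len(members) for d in zip(*members)) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(centroids)
    inertia = sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))
    return labels, centroids, inertia

def choose_k_elbow(points, max_k=5, min_gain=0.1):
    """Return the smallest k after which one more cluster removes less than
    min_gain of the total (k = 1) variance; this elbow rule is an assumption."""
    inertias = [kmeans(points, k)[2] for k in range(1, max_k + 1)]
    total = inertias[0]
    if total == 0:
        return 1
    for k in range(1, max_k):
        if (inertias[k - 1] - inertias[k]) / total < min_gain:
            return k
    return max_k

# Hypothetical two-tag operating data with three visible operating regions.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
          (9.0, 1.0), (9.1, 1.2), (8.9, 0.9)]
best_k = choose_k_elbow(points)
labels, centroids, inertia = kmeans(points, best_k)
```

In this sketch the elbow is taken as the point where adding a further cluster no longer explains a meaningful share of the total variance, after which the data is divided using the selected number of clusters.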
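By way of a non-limiting illustration, step 330 may be sketched as below. The sketch assumes that tag values have already been standardized so that the absolute medians of different tags are comparable; the tag names, keep_fraction, top_n, and the function name top_contributing_tags are hypothetical.

```python
import math
from statistics import mean, median

def top_contributing_tags(rows, tag_names, keep_fraction=0.5, top_n=2):
    """Rank tags by the absolute median of the points closest to the region centroid."""
    centroid = tuple(mean(col) for col in zip(*rows))
    by_distance = sorted(rows, key=lambda r: math.dist(r, centroid))
    n_keep = max(1, int(len(rows) * keep_fraction))  # predetermined percentage of points
    closest = by_distance[:n_keep]
    medians = {tag: median(col) for tag, col in zip(tag_names, zip(*closest))}
    ranked = sorted(medians, key=lambda t: abs(medians[t]), reverse=True)
    return ranked[:top_n], medians

# Hypothetical standardized values for one operational region.
tag_names = ["temperature", "pressure", "vibration"]
rows = [
    (2.0, -0.1, -1.0),
    (2.2, 0.0, -1.1),
    (1.8, 0.1, -0.9),
    (2.1, 0.2, -1.2),
    (5.0, 3.0, 4.0),   # far from the centroid, so it is not among the closest points
    (1.9, -0.2, -0.8),
]
top_tags, medians = top_contributing_tags(rows, tag_names)
```

Restricting the medians to the points nearest the centroid keeps a stray reading, such as the fifth row above, from influencing which tags are reported as top contributors.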
[0057] The method (300) also includes identifying low and high values of contributing tags within the operational regions by analyzing at least one absolute value in step 340. More specifically, the method (300) identifies the low and high values of contributing tags within the operational regions. By analyzing at least one absolute value, anomalies can be detected, indicating potential issues or abnormalities.
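By way of a non-limiting illustration, step 340 may be sketched as below. The disclosure does not define how the absolute value is analyzed, so the sketch simply assumes that the report lists, for each contributing tag, its lowest value, its highest value, and its largest absolute value within the region; the name low_high_report and the sample data are hypothetical.

```python
def low_high_report(region_rows, tag_names, contributing_tags):
    """Summarize low, high, and largest absolute value of each contributing tag."""
    columns = dict(zip(tag_names, zip(*region_rows)))
    report = {}
    for tag in contributing_tags:
        values = columns[tag]
        report[tag] = {
            "low": min(values),
            "high": max(values),
            "max_abs": max(abs(v) for v in values),
        }
    return report

# Hypothetical standardized values for one operational region.
tag_names = ["temperature", "vibration"]
region_rows = [(2.0, -1.0), (2.2, -1.1), (1.8, -0.9)]
report = low_high_report(region_rows, tag_names, ["temperature", "vibration"])
```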
[0058] Furthermore, the method (300) includes comparing the one or more top contributing tags of each region with corresponding domain knowledge, and excluding one or more regions whose contributing tags are associated with mapped failures based on the domain knowledge in step 350. More specifically, the method (300) includes comparing the top contributing tags of each region with domain knowledge. Regions associated with contributing tags mapped to known failures are excluded from further analysis, based on the domain knowledge.
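By way of a non-limiting illustration, step 350 may be sketched as below. The sketch assumes that the domain knowledge is available as a mapping from each known failure to the set of tags associated with it, and that a region is excluded when its top contributing tags contain every tag of at least one mapped failure; the matching rule, the failure names, and the tag names are hypothetical.

```python
def exclude_failure_regions(region_top_tags, failure_map):
    """Split regions into kept and excluded based on mapped failure signatures."""
    kept, excluded = {}, {}
    for region, tags in region_top_tags.items():
        # A failure matches when all of its mapped tags are among the region's top tags.
        matches = [name for name, signature in failure_map.items() if signature <= set(tags)]
        if matches:
            excluded[region] = matches
        else:
            kept[region] = tags
    return kept, excluded

# Hypothetical domain knowledge: failure -> contributing tags.
failure_map = {
    "bearing_wear": {"vibration", "bearing_temperature"},
    "seal_leak": {"suction_pressure"},
}
region_top_tags = {
    "region_0": ["flow", "discharge_pressure"],
    "region_1": ["vibration", "bearing_temperature", "flow"],
    "region_2": ["suction_pressure", "flow"],
}
kept, excluded = exclude_failure_regions(region_top_tags, failure_map)
```

Only region_0 remains a candidate training region in this example, and the excluded regions carry the names of the failures they matched, which could feed a report or a later recommendation step.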
[0059] The method (300) also includes determining a mean and standard deviation within the data upon eliminating erroneous data by recalculating statistical measures and eliminating outlier data points from the remaining data by applying statistical outlier detection methods in step 360. More specifically, in this step, statistical measures such as mean and standard deviation are recalculated within the data to ensure accuracy. Erroneous data points are eliminated, and outliers are detected and removed using statistical outlier detection methods.
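By way of a non-limiting illustration, step 360 may be sketched as below using a z-score rule as the statistical outlier detection method. The z-score rule, the threshold of 2.5, and the name remove_outliers are assumptions; the disclosure leaves the specific outlier detection method open.

```python
from statistics import mean, stdev

def remove_outliers(values, z_threshold=2.5):
    """Drop points whose z-score exceeds z_threshold, then recalculate mean and std."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return list(values), mu, sigma  # constant data has no outliers
    cleaned = [v for v in values if abs(v - mu) / sigma <= z_threshold]
    new_sigma = stdev(cleaned) if len(cleaned) > 1 else 0.0
    return cleaned, mean(cleaned), new_sigma

# Hypothetical tag readings with a single erroneous spike.
values = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0, 50.0]
cleaned, new_mean, new_std = remove_outliers(values)
```

The threshold is kept below 3 here because, with only ten samples, a single extreme point cannot reach a z-score of 3 when the sample standard deviation is used; larger datasets may warrant a different threshold.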
[0060] Further, the method (300) includes applying a filter over the remaining region for refining the data further and consolidating regions that are within a predetermined distance for reducing the number of training ranges by merging adjacent regions in step 370. More specifically, the remaining data is further refined by applying filters. Regions within a predetermined distance are consolidated to reduce the number of training ranges, enhancing the efficiency of the system.
[0061] Various embodiments of the system and method for automatic training region selection for a deep learning model offer several advantages in the realm of equipment monitoring and predictive analytics. By employing a sophisticated combination of data analysis techniques and domain knowledge integration, the system enhances the accuracy and efficiency of training region selection for deep learning models used in equipment monitoring. One notable advantage is the system's ability to accurately identify equipment shutdowns and exclude buffer region data, ensuring that only relevant operational data is utilized for analysis. This targeted approach reduces noise and improves the quality of insights derived from the data.
[0062] Furthermore, the system utilizes advanced machine learning algorithms to estimate operational regions and divide the data into predefined operating regions. This not only streamlines the analysis process but also enables the system to adapt to varying operational conditions and complexities effectively. Additionally, by leveraging domain knowledge, the system can intelligently compare contributing tags with known failure patterns, enabling proactive identification and mitigation of potential issues.
[0063] Moreover, the system ensures data integrity by recalculating statistical measures, eliminating erroneous data, and detecting outliers. This rigorous data cleaning process enhances the reliability of the analysis results and reduces the risk of false positives or erroneous conclusions. Additionally, the system refines the data by applying filters and consolidating regions, ultimately reducing the number of training ranges and enhancing the efficiency of the system.
[0064] The techniques described in this disclosure may be implemented, at least in part, in hardware, software, firmware, or any combination thereof. For example, various aspects of the described techniques may be implemented within one or more processors, including one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or any other equivalent integrated or discrete logic circuitry, as well as any combinations of such components. The term “processor” or “processing subsystem” may generally refer to any of the foregoing logic circuitry, alone or in combination with other logic circuitry, or any other equivalent circuitry. A control unit including hardware may also perform one or more of the techniques of this disclosure.
[0065] Such hardware, software, and firmware may be implemented within the same device or within separate devices to support the various techniques described in this disclosure. In addition, any of the described units, modules, or components may be implemented together or separately as discrete but interoperable logic devices. Depiction of different features as modules or units is intended to highlight different functional aspects and does not necessarily imply that such modules or units must be realized by separate hardware, firmware, or software components. Rather, functionality associated with one or more modules or units may be performed by separate hardware, firmware, or software components, or integrated within common or separate hardware, firmware, or software components.
[0066] It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the disclosure and are not intended to be restrictive thereof.
[0067] While specific language has been used to describe the disclosure, any limitations arising on account of the same are not intended. As would be apparent to a person skilled in the art, various working modifications may be made to the method in order to implement the inventive concept as taught herein.
[0068] The figures and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, the order of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts need to be necessarily performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples.
Claims:
1. A system (100) for training region selection for deep learning model comprising:
a processing subsystem (105) hosted on a server (108), wherein the processing subsystem (105) is configured to execute on a network (115) to control bidirectional communications among a plurality of modules comprising:
characterized in that,
an identification module (110) configured to identify a shutdown of at least one equipment by analyzing operational data and exclude data from the buffer region for a predefined period before and after the shutdown, upon identifying one or more buffer regions around the shutdown;
a segmentation module (120) operatively coupled to the identification module (110), and configured to estimate one or more of operational regions of the corresponding one or more equipment within the data to analyze variance and identify an optimal number of clusters, and divide the data into predefined operating regions using K-means clustering technique based on the estimated number of operational regions;
a feature analysis module (130) operatively coupled to the segmentation module (120), and configured to:
determine one or more top contributing tags for each operational regions by selecting a predetermined percentage of data points closest to a centroid of each operational regions and sort the absolute values of their medians; and
identify low and high values of contributing tags within the operational regions by analyzing at least one absolute value;
a domain comparison module (140) operatively coupled to the feature analysis module (130), and configured to compare the one or more top contributing tags of each region with corresponding domain knowledge, and exclude one or more regions whose contributing tags are associated with mapped failures based on the domain knowledge;
a data cleaning module (150) operatively coupled to the domain comparison module (140), and configured to determine a mean and standard deviation within the data upon eliminating erroneous data by recalculating statistical measures, and eliminate outlier data points from the remaining data by applying statistical outlier detection methods; and
a filtering module (160) operatively coupled to the data cleaning module (150), and configured to apply a filter over the remaining region to refine the data further and consolidate regions that are within a predetermined distance to reduce the number of training ranges by merging adjacent regions.
2. The system as claimed in claim 1, wherein the predefined number of days for excluding data before and after the shutdown event is at least seven days.
3. The system as claimed in claim 1, wherein the domain knowledge comprises at least one of assets, failures, and their mapping with contributing tags, failure recommendations, or a combination thereof.
4. The system as claimed in claim 1, wherein the feature analysis module (130) is configured to generate a report detailing the low and high values of the contributing tags within the operational region.
5. The system as claimed in claim 1, wherein the feature analysis module (130) is configured to identify interactions between contributing tags across different regions to detect potential cross-regional anomalies.
6. The system as claimed in claim 1, wherein the data cleaning module (150) is configured to eliminate outlier data points from the remaining data after excluding the regions associated with mapped failures.
7. The system as claimed in claim 1, wherein the processing subsystem (105) comprises a recommendation module configured to suggest maintenance actions based on the analysis of the training regions and the domain knowledge.
8. A method (300) for training region selection for deep learning model, comprising:
identifying, by an identification module of a processing subsystem, a shutdown of at least one equipment by analyzing operational data and exclude data from the buffer region for a predefined period before and after the shutdown, upon identifying one or more buffer regions around the shutdown; (310)
estimating, by a segmentation module of a processing subsystem, one or more of operational regions of the corresponding one or more equipment within the data for analysing variance and identifying an optimal number of clusters, and dividing the data into predefined operating regions using K-means clustering technique; (320)
determining, by a feature analysis module of a processing subsystem, one or more top contributing tags for each operational regions by selecting a predetermined percentage of data points closest to a centroid of each operational regions and sorting the absolute values of their medians; (330)
identifying, by the feature analysis module of a processing subsystem, low and high values of contributing tags within the operational regions by analyzing at least one absolute value; (340)
comparing, by a domain comparison module of a processing subsystem, the one or more top contributing tags of each region with corresponding domain knowledge, and excluding one or more regions whose contributing tags are associated with mapped failures based on the domain knowledge; (350)
determining, by a data cleaning module of a processing subsystem, a mean and standard deviation within the data upon eliminating erroneous data by recalculating statistical measures, and eliminating outlier data points from the remaining data by applying statistical outlier detection methods; and (360)
applying, by a filtering module of a processing subsystem, a filter over the remaining region for refining the data further and consolidating regions that are within a predetermined distance for reducing the number of training ranges by merging adjacent regions. (370)
9. The method (300) as claimed in claim 8, comprises generating, by the identification module, an alert when a shutdown of equipment is detected.
10. The method (300) as claimed in claim 8, comprises displaying, by the segmentation module, a visual representation of the operational regions identified using the K-means clustering technique.
Dated this 10th day of September 2024
Signature
Jinsu Abraham
Patent Agent (IN/PA-3267)
Agent for the Applicant
| # | Name | Date |
|---|---|---|
| 1 | 202441068361-STATEMENT OF UNDERTAKING (FORM 3) [10-09-2024(online)].pdf | 2024-09-10 |
| 2 | 202441068361-REQUEST FOR EARLY PUBLICATION(FORM-9) [10-09-2024(online)].pdf | 2024-09-10 |
| 3 | 202441068361-PROOF OF RIGHT [10-09-2024(online)].pdf | 2024-09-10 |
| 4 | 202441068361-POWER OF AUTHORITY [10-09-2024(online)].pdf | 2024-09-10 |
| 5 | 202441068361-FORM-9 [10-09-2024(online)].pdf | 2024-09-10 |
| 6 | 202441068361-FORM FOR STARTUP [10-09-2024(online)].pdf | 2024-09-10 |
| 7 | 202441068361-FORM FOR SMALL ENTITY(FORM-28) [10-09-2024(online)].pdf | 2024-09-10 |
| 8 | 202441068361-FORM 1 [10-09-2024(online)].pdf | 2024-09-10 |
| 9 | 202441068361-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [10-09-2024(online)].pdf | 2024-09-10 |
| 10 | 202441068361-EVIDENCE FOR REGISTRATION UNDER SSI [10-09-2024(online)].pdf | 2024-09-10 |
| 11 | 202441068361-DRAWINGS [10-09-2024(online)].pdf | 2024-09-10 |
| 12 | 202441068361-DECLARATION OF INVENTORSHIP (FORM 5) [10-09-2024(online)].pdf | 2024-09-10 |
| 13 | 202441068361-COMPLETE SPECIFICATION [10-09-2024(online)].pdf | 2024-09-10 |
| 14 | 202441068361-STARTUP [11-09-2024(online)].pdf | 2024-09-11 |
| 15 | 202441068361-FORM28 [11-09-2024(online)].pdf | 2024-09-11 |
| 16 | 202441068361-FORM-8 [11-09-2024(online)].pdf | 2024-09-11 |
| 17 | 202441068361-FORM 18A [11-09-2024(online)].pdf | 2024-09-11 |
| 18 | 202441068361-FORM-26 [26-09-2024(online)].pdf | 2024-09-26 |
| 19 | 202441068361-FER.pdf | 2024-10-03 |
| 20 | 202441068361-REQUEST FOR CERTIFIED COPY [08-11-2024(online)].pdf | 2024-11-08 |
| 21 | 202441068361-FORM28 [08-11-2024(online)].pdf | 2024-11-08 |
| 22 | 202441068361-FORM 3 [12-12-2024(online)].pdf | 2024-12-12 |
| 23 | 202441068361-Proof of Right [24-12-2024(online)].pdf | 2024-12-24 |
| 24 | 202441068361-OTHERS [06-02-2025(online)].pdf | 2025-02-06 |
| 25 | 202441068361-FER_SER_REPLY [06-02-2025(online)].pdf | 2025-02-06 |
| 26 | 202441068361-PatentCertificate01-04-2025.pdf | 2025-04-01 |
| 27 | 202441068361-IntimationOfGrant01-04-2025.pdf | 2025-04-01 |

| # | Name |
|---|---|
| 1 | SearchStrategyE_01-10-2024.pdf |
| 2 | 202441068361_SearchStrategyAmended_E_AmendedSearchAE_12-02-2025.pdf |