Abstract: This disclosure relates generally to machine fault diagnosis, and more particularly, to a method and system for CycleGAN based unsupervised domain adaptation for machine fault diagnosis. The performance of conventional methods for machine fault diagnosis degrades when a large difference exists between the data distributions of the source domain and the target domain. The embodiments of the present disclosure provide a CycleGAN based unsupervised domain adaptation for machine fault diagnosis. The disclosed method learns a feature extractor and classifier from labelled source domain data. Further, a CycleGAN is utilized to learn the source and target feature mapping. The target features translated to the source domain are then used to predict labels for target data using the classifier. The disclosed method is used for fault classification in physically different machines using data translation between the source and target domain.
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
METHOD AND SYSTEM FOR CYCLEGAN BASED UNSUPERVISED DOMAIN ADAPTATION FOR MACHINE FAULT DIAGNOSIS
Applicant:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
The present application claims priority from Indian provisional patent application no. 202221063054, filed on November 04, 2022. The entire contents of the aforementioned application are incorporated herein by reference.
TECHNICAL FIELD
The disclosure herein generally relates to the field of machine fault diagnosis and more particularly, to a method and system for Cycle Generative Adversarial Network (CycleGAN) based unsupervised domain adaptation for machine fault diagnosis.
BACKGROUND
Machine condition monitoring and fault diagnosis aim to improve the reliability and life of a machine by reducing maintenance cost, downtime, and productivity loss, and are hence crucial for industries. Among machine components, rolling bearings and gearboxes play a major role and are often operated under extreme loads and in hazardous environments, which makes them particularly vulnerable to damage. Hence, it is essential to monitor the health of the machine before a fault leads to a potential breakdown.
Different types of sensing techniques based on current, acoustics, and vibration have been employed for machine fault diagnosis. Among these, vibration signals are extensively used for machine fault analysis, especially for bearings and gearboxes. There has been a lot of work on machine learning (ML) based techniques that use domain knowledge for fault diagnosis. These techniques rely on domain-specific features extracted from the time, frequency, and time-frequency domains for fault detection. However, extracting domain-specific information becomes very costly and time-consuming when the application environment becomes complex. Deep Learning (DL) based algorithms address this problem by learning features directly from the data for machine fault diagnosis. Sparse autoencoders and deep belief networks have also been used in a prior method for feature fusion based machine fault diagnosis. Another prior method presented an unsupervised feature learning method using artificial intelligence (AI) for intelligent fault diagnosis. While these methods perform well, all of them require a huge amount of labeled data for training the model, which is seldom available in practical application scenarios. Moreover, these methods assume that the training and test data follow the same distribution. However, in practice, this is seldom the case, and these models fail if the distribution discrepancy is not suitably addressed.
SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for CycleGAN based unsupervised domain adaptation for machine fault diagnosis is provided. The method includes receiving (i) a set of source domain data with an associated set of source domain labels and (ii) a set of target domain data, wherein the set of source domain data is associated with a source domain and the set of target domain data is associated with a target domain. Further, the method includes learning (i) a set of feature extractor weights for training a feature extractor model and (ii) a set of classifier weights for training a classifier model, wherein the feature extractor model and the classifier model are trained using the set of source domain data and the associated set of source domain labels. Furthermore, the method includes training a CycleGAN for performing data translation between the source domain and the target domain utilizing the trained feature extractor model. The CycleGAN comprises (i) a first generator (G), (ii) a second generator (F), and (iii) two discriminators (D_b and D_a). The step of training the CycleGAN comprises computing a first set of features for the set of source domain data using the trained feature extractor model. Further, the step of training the CycleGAN includes computing a second set of features for the set of target domain data using the trained feature extractor model. Furthermore, the step of training the CycleGAN includes updating a set of weights associated with (i) the first generator, (ii) the second generator, and (iii) the two discriminators based on the first set of features and the second set of features. The first generator and the second generator comprise at least one dense layer and at least four one dimensional convolutional neural network layers, and each discriminator comprises at least three one dimensional convolutional neural network layers and at least two dense layers.
In another aspect, a system for CycleGAN based unsupervised domain adaptation for machine fault diagnosis is provided. The system comprises a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to receive (i) a set of source domain data with an associated set of source domain labels and (ii) a set of target domain data, wherein the set of source domain data is associated with a source domain and the set of target domain data is associated with a target domain. Further, the one or more hardware processors are configured to learn (i) a set of feature extractor weights for training a feature extractor model and (ii) a set of classifier weights for training a classifier model, wherein the feature extractor model and the classifier model are trained using the set of source domain data and the associated set of source domain labels. Furthermore, the one or more hardware processors are configured to train a CycleGAN for performing data translation between the source domain and the target domain utilizing the trained feature extractor model. The CycleGAN comprises (i) a first generator (G), (ii) a second generator (F), and (iii) two discriminators (D_b and D_a). The step of training the CycleGAN comprises computing a first set of features for the set of source domain data using the trained feature extractor model. Further, the step of training the CycleGAN includes computing a second set of features for the set of target domain data using the trained feature extractor model. Furthermore, the step of training the CycleGAN includes updating a set of weights associated with (i) the first generator, (ii) the second generator, and (iii) the two discriminators based on the first set of features and the second set of features. The first generator and the second generator comprise at least one dense layer and at least four one dimensional convolutional neural network layers, and each discriminator comprises at least three one dimensional convolutional neural network layers and at least two dense layers.
The source domain and the target domain are any one of (i) physically different machines or (ii) different working conditions of a same machine. The feature extractor model and the classifier model are trained using a cross entropy loss. The CycleGAN is trained using a GAN loss, an identity loss and a cycle-consistency loss. A first generator mapping function associated with the first generator and a second generator mapping function associated with the second generator are learnt. The learning of the first generator mapping function is represented as G: A → B and the learning of the second generator mapping function is represented as F: B → A, wherein A is the first set of features and B is the second set of features. The GAN loss associated with learning the first generator mapping function is represented as,
L_gan(G, D_b, A, B) = E_(b~P(b))[log D_b(b)] + E_(a~P(a))[log(1 - D_b(G(a)))]
wherein a and b are data samples in a feature space of the first set of features and the second set of features and wherein the data distributions of a and b are denoted as a ~ P(a) and b ~ P(b). The GAN loss associated with learning the second generator mapping function is represented as,
L_gan(F, D_a, B, A) = E_(a~P(a))[log D_a(a)] + E_(b~P(b))[log(1 - D_a(F(b)))]
The identity loss is represented as,
L_identity(G, F) = E_(a~P(a))[‖F(a) - a‖_1] + E_(b~P(b))[‖G(b) - b‖_1]
The cycle consistency loss is represented as,
L_cyc(G, F) = E_(a~P(a))[‖F(G(a)) - a‖_1] + E_(b~P(b))[‖G(F(b)) - b‖_1]
The system further includes predicting a set of target labels associated with a set of target domain test data using the trained classifier model, wherein predicting the set of target labels comprises receiving the set of target domain test data. Further, the system includes computing a set of target domain features for the set of target domain test data using the trained feature extractor model and mapping the set of target domain features to a set of source domain features using the learnt weights of the second generator of the learnt CycleGAN. Furthermore, the system includes predicting the set of target labels associated with the set of target domain test data using the trained classifier model.
In yet another aspect, there is provided a computer program product comprising a non-transitory computer readable medium having a computer readable program embodied therein, wherein the computer readable program, when executed on a computing device, causes the computing device to perform CycleGAN based unsupervised domain adaptation for machine fault diagnosis by receiving (i) a set of source domain data with an associated set of source domain labels and (ii) a set of target domain data, wherein the set of source domain data is associated with a source domain and the set of target domain data is associated with a target domain. Further, the computer readable program includes learning (i) a set of feature extractor weights for training a feature extractor model and (ii) a set of classifier weights for training a classifier model. The feature extractor model and the classifier model are trained using the set of source domain data and the associated set of source domain labels. Furthermore, the computer readable program includes training a CycleGAN for performing data translation between the source domain and the target domain utilizing the trained feature extractor model. The CycleGAN comprises (i) a first generator (G), (ii) a second generator (F), and (iii) two discriminators (D_b and D_a). The step of training the CycleGAN comprises computing a first set of features for the set of source domain data using the trained feature extractor model. Further, the step of training the CycleGAN includes computing a second set of features for the set of target domain data using the trained feature extractor model. Furthermore, the step of training the CycleGAN includes updating a set of weights associated with (i) the first generator, (ii) the second generator, and (iii) the two discriminators based on the first set of features and the second set of features. The first generator and the second generator comprise at least one dense layer and at least four one dimensional convolutional neural network layers, and each discriminator comprises at least three one dimensional convolutional neural network layers and at least two dense layers.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG. 1 illustrates an exemplary block diagram of a system for Cycle Generative Adversarial Network (CycleGAN) based unsupervised domain adaptation for machine fault diagnosis, in accordance with some embodiments of the present disclosure.
FIG. 2 is a schematic diagram depicting process flow of a method for CycleGAN based unsupervised domain adaptation for machine fault diagnosis according to some embodiments of the present disclosure.
FIG. 3A and FIG. 3B illustrate an exemplary flow diagram depicting training phase steps of the method for CycleGAN based unsupervised domain adaptation for machine fault diagnosis according to some embodiments of the present disclosure.
FIG. 4 is an exemplary flow diagram depicting testing phase steps of the method for CycleGAN based unsupervised domain adaptation for machine fault diagnosis according to some embodiments of the present disclosure.
FIG. 5 illustrates a block diagram of CycleGAN architecture for data translation according to some embodiments of the present disclosure.
FIG. 6A and FIG. 6B depict a detailed network architecture of the CycleGAN according to some embodiments of the present disclosure.
FIG. 7 depicts a graphical representation of loss plot of CycleGAN according to some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
Domain Adaptation (DA) has emerged as one of the reliable methods to handle the distribution discrepancy between training (source domain) and test (target domain) data. DA aims to improve the performance of an algorithm on (limited labeled or unlabeled) target domain data by utilizing the knowledge learned from sufficiently labeled source domain data. It tries to learn domain invariant features using different techniques including mapping and adversarial learning based methods. DA is widely used in various fields, such as computer vision, financial engineering, medical image analysis, structural health monitoring with a recent focus on machine fault diagnosis.
Most of the prior works utilize mapping based DA methods for machine fault diagnosis, such as Joint Maximum Mean Discrepancy (JMMD), Multi Kernels Maximum Mean Discrepancy (MK-MMD), and CORrelation Alignment (CORAL). These methods learn a transformation that extracts domain invariant features by mapping the source and target domain features to a common subspace. However, they fail to perform well when a large difference exists in the data distribution of the two domains. To address this, some methods employ adversarial learning to align the distributions between the source and target domain for intelligent fault diagnosis, such as Domain Adversarial Neural Network (DANN), Conditional Domain Adversarial Network (CDAN), and Adversarial Multiple-target Domain Adaptation (AMDA). A detailed comparative review of the various mapping and adversarial based DA techniques for bearing fault diagnosis has shown that the adversarial learning based methods perform well compared to mapping based methods. However, data translation using conventional Generative Adversarial Networks (GANs) does not guarantee class discriminability. Hence, Cycle Generative Adversarial Networks (CycleGANs), which use a cycle-consistency loss in the adversarial learning, have been employed for improved data translation that maintains class distinguishability. A CycleGAN has the ability to learn the mapping between the two domains for unpaired data.
In many practical application scenarios for machine fault diagnosis, access to labeled data is seldom available for every machine; hence, it is desired to transfer the knowledge acquired using the labeled data from one machine to different but related machines. This is a challenging problem as there will be a significant change in the data distribution of the two domains. All the prior works mentioned above consider adaptation between different working conditions, but only of the same machine and do not consider the difficult case of different machines, which is essential in practice.
The embodiments herein provide a method and system for CycleGAN based unsupervised domain adaptation for machine fault diagnosis. The disclosed method utilizes a CycleGAN for unpaired data translation. The present disclosure uses labeled source domain data for learning a feature extractor and a classifier in a supervised setting. A CycleGAN is employed to learn the feature mapping between the source and target domain in an unsupervised manner. Subsequently, the target features are mapped to the source domain and used to predict the target labels using the learned classifier.
Referring now to the drawings, and more particularly to FIG. 1 through FIG. 7, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG. 1 illustrates an exemplary block diagram of a system 100 for CycleGAN based unsupervised domain adaptation for machine fault diagnosis. In an embodiment, the system 100 includes one or more processors 102, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 104 operatively coupled to the one or more processors 102. The one or more processors 102 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, graphics controllers, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) are configured to fetch and execute computer-readable instructions stored in the memory. In the context of the present disclosure, the expressions ‘processors’ and ‘hardware processors’ may be used interchangeably. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
The I/O interface(s) 106 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like, and can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface(s) can include one or more ports for connecting a number of devices to one another or to another server.
The memory 104 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
Further, the memory 104 may include a database or repository which may store source domain data and target domain data. The memory 104 may comprise information pertaining to input(s)/output(s) of each step performed by the processor(s) 102 of the system 100 and methods of the present disclosure. In an embodiment, the database may be external (not shown) to the system 100 and coupled via the I/O interface 106.
FIG. 2 is a schematic diagram depicting the process flow of a method for CycleGAN based unsupervised domain adaptation for machine fault diagnosis according to some embodiments of the present disclosure. As depicted, the method includes three steps or phases: step I provides the approach used for training a feature extractor (FE) and a classifier (C) from labelled source domain data; step II provides the approach used for training a CycleGAN to learn the source and target feature mapping; and in step III, the target features are translated to the source domain and used to predict the labels for the target data using the classifier (C). FIG. 2 will be explained in detail in conjunction with the flow diagrams depicting training phase steps and testing phase steps of the method for CycleGAN based unsupervised domain adaptation for machine fault diagnosis in FIG. 3A through FIG. 3B and FIG. 4.
FIG. 3A and FIG. 3B illustrate an exemplary flow diagram depicting training phase steps of a method 300 for CycleGAN based unsupervised domain adaptation for machine fault diagnosis according to some embodiments of the present disclosure.
In an embodiment, the system 100 comprises one or more data storage devices or the memory 104 operatively coupled to the processor(s) 102 and is configured to store instructions for execution of steps of the method 300 by the processor(s) or one or more hardware processors 102. The steps of the method 300 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIG. 1 and the steps of the flow diagram as depicted in FIGS. 3A and 3B. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods, and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
At step 302 of the method 300, the one or more hardware processors 102 are configured to receive (i) a set of source domain data with an associated set of source domain labels and (ii) a set of target domain data, wherein the set of source domain data is associated with a source domain and the set of target domain data is associated with a target domain. Let X_s = {x_s^i, y_s^i}_(i=1)^n denote the labeled data from source domain S, where x_s and y_s are the raw data and labels, respectively, for n samples, y_s ∈ {1, 2, …, m}, and m is the number of classes. Let the unlabeled data from target domain T be represented as X_t = {x_t^i}_(i=1)^n. In an embodiment, the source domain and the target domain are physically different but similar machines having different data distributions. For example, the source domain may be reference data from a lab setup or simulator and the target domain may be an experimental setup on real machines. For domain adaptation, the label and feature spaces should be the same for both the source and target domains.
At step 304 of the method 300, the one or more hardware processors 102 are configured to learn (i) a set of feature extractor weights for training a feature extractor model and (ii) a set of classifier weights for training a classifier model. The feature extractor model and the classifier model are trained using the set of source domain data and the associated set of source domain labels, with a cross entropy loss. The step I depicted in FIG. 2 is the supervised learning of the feature extractor (FE) model and the classifier (C) model from the set of source domain data and the associated set of source domain labels. A deep neural network is employed to learn FE and C using the well-known cross entropy loss L_c given in equation (1),
L_c = -(1/n) ∑_(i=1)^n ∑_(j=1)^m 1{y_s^i = j} log ŷ_j^i    (1)
where ŷ^i = C(FE(x_s^i)) is the vector of predicted class probabilities for sample x_s^i and 1{.} is an indicator function, which returns 1 when the argument is true and 0 otherwise.
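By way of a non-limiting illustration, a minimal sketch of this supervised stage is given below, assuming a PyTorch implementation; the layer counts (the disclosed FE uses five 1D-CNN layers, compressed here to two for brevity), feature dimension, learning rate, and training schedule are assumptions for illustration only.

```python
# Illustrative sketch of Step I (supervised learning of FE and C).
# All hyper-parameters here are assumptions, not the disclosed configuration.
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """1D-CNN feature extractor FE for raw vibration windows of shape (N, 1, 4096)."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

class Classifier(nn.Module):
    """Fully connected classifier C; the softmax is folded into the loss."""
    def __init__(self, feat_dim: int = 128, num_classes: int = 10):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, z):
        return self.fc(z)

def train_source(fe, clf, loader, epochs=50, lr=1e-3):
    """Minimise the cross-entropy loss L_c of equation (1) on labelled source data."""
    opt = torch.optim.Adam(list(fe.parameters()) + list(clf.parameters()), lr=lr)
    ce = nn.CrossEntropyLoss()  # implements equation (1) on logits
    for _ in range(epochs):
        for x_s, y_s in loader:
            loss = ce(clf(fe(x_s)), y_s)
            opt.zero_grad(); loss.backward(); opt.step()
    return fe, clf
```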
At step 306 of the method 300, the one or more hardware processors 102 are configured to train the CycleGAN for performing data translation between the source domain and the target domain utilizing the trained feature extractor model. The CycleGAN comprises (i) a first generator (G) (ii) a second generator (F) and (iii) two discriminators (D_b and D_a). The step II depicted in FIG. 2 utilizes a CycleGAN to learn the source and target feature mapping. In an embodiment, the CycleGAN is trained using a GAN loss, an identity loss and a cycle-consistency loss. The step 306 comprises a set of steps from 306a to 306c explaining the training of the CycleGAN. At step 306a of the method 300, the one or more hardware processors 102 are configured to compute a first set of features for the set of source domain data using the trained feature extractor model.
At step 306b of the method 300, the one or more hardware processors 102 are configured to compute a second set of features for the set of target domain data using the trained feature extractor model. The learnt FE is used to extract the source and target domain features as A = FE(X_s) and B = FE(X_t). Subsequently, these features are used to learn a 1D-CycleGAN for data translation in the feature space.
At step 306c of the method 300, the one or more hardware processors 102 are configured to update a set of weights associated with (i) the first generator, (ii) the second generator, and (iii) the two discriminators based on the first set of features and the second set of features. FIG. 5 illustrates a block diagram of the CycleGAN architecture for data translation according to some embodiments of the present disclosure. It employs two GANs, with generator functions G: A → B and F: B → A for learning the mappings, and discriminators D_b and D_a, respectively. Each of the GANs is trained using a two-player min-max game, where the generator learns to fool the discriminator by generating domain invariant features until equilibrium is achieved between them. A first generator mapping function associated with the first generator and a second generator mapping function associated with the second generator are learnt. The learning of the first generator mapping function is represented as G: A → B and the learning of the second generator mapping function is represented as F: B → A, wherein A is the first set of features and B is the second set of features. The adversarial loss L_gan(G, D_b, A, B) considered for learning the first generator mapping function G and its discriminator D_b is expressed as in equation (2),
L_gan(G, D_b, A, B) = E_(b~P(b))[log D_b(b)] + E_(a~P(a))[log(1 - D_b(G(a)))]    (2)
wherein a and b are data samples in a feature space of the first set of features and the second set of features and wherein the data distributions of a and b are denoted as a ~ P(a) and b ~ P(b). The GAN loss associated with learning the second generator mapping function F is represented as,
L_gan(F, D_a, B, A) = E_(a~P(a))[log D_a(a)] + E_(b~P(b))[log(1 - D_a(F(b)))]    (3)
Referring to FIG. 5, B̂ and Â are the fake samples generated in the feature space for the target and source domains using G and F, respectively. â and b̂ are the reconstructed samples in the source and target domains. To ensure that the mapping is consistent, a cycle-consistency loss and an identity loss are introduced, which ensure that the forward and reverse translations bring the data back to the original domain.
Identity loss (L_identity(G, F)) is used to enhance the ability to extract common features between domains during the cross-domain transfer process. When sample a is input into G, the expected output is b. The identity loss ensures that when sample b is input to G, the obtained G(b) is also similar to the input b, showing that G has the ability to generate data belonging to B; the case of a input to F is analogous. The identity loss can be expressed as in equation (4):
L_identity(G, F) = E_(a~P(a))[‖F(a) - a‖_1] + E_(b~P(b))[‖G(b) - b‖_1]    (4)
Identity loss ensures that the probability distributions of the data features extracted by the generators on both sides are consistent, avoiding excessive feature transfer.
The cycle-consistency loss L_cyc(G, F), calculated between the original source domain data a and the reconstructed â, and similarly between b and b̂, is expressed as in equation (5),
L_cyc(G, F) = E_(a~P(a))[‖F(G(a)) - a‖_1] + E_(b~P(b))[‖G(F(b)) - b‖_1]    (5)
The complete loss L_CG of the CycleGAN is given as in equation (6),
L_CG = L_gan(G, D_b, A, B) + L_gan(F, D_a, B, A) + λL_cyc(G, F) + 0.5λL_identity(G, F)    (6)
where the hyper-parameter λ is used to control the weightage of the cycle-consistency loss.
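By way of a non-limiting illustration, one CycleGAN update over feature batches a and b, combining equations (2) through (6), may be sketched as follows; the value of λ (lam), the optimizer set-up, and the assumption that the discriminators output real/fake logits are illustrative choices, not the disclosed configuration.

```python
# Hedged sketch of one CycleGAN training step on feature batches a ∈ A, b ∈ B.
import torch
import torch.nn.functional as nnf  # aliased to avoid clashing with generator F

def cyclegan_step(G, F, D_a, D_b, a, b, opt_g, opt_d, lam: float = 10.0):
    bce = nnf.binary_cross_entropy_with_logits  # numerically stable GAN log-loss

    # Discriminator update: real features labelled 1, translated features 0
    # (equations (2) and (3); generators are detached here).
    b_fake, a_fake = G(a).detach(), F(b).detach()
    d_loss = sum(
        bce(logits, torch.full_like(logits, label))
        for logits, label in [(D_b(b), 1.0), (D_b(b_fake), 0.0),
                              (D_a(a), 1.0), (D_a(a_fake), 0.0)])
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: full objective L_CG of equation (6).
    b_fake, a_fake = G(a), F(b)
    db, da = D_b(b_fake), D_a(a_fake)
    gan = bce(db, torch.ones_like(db)) + bce(da, torch.ones_like(da))
    cyc = nnf.l1_loss(F(b_fake), a) + nnf.l1_loss(G(a_fake), b)   # equation (5)
    idt = nnf.l1_loss(F(a), a) + nnf.l1_loss(G(b), b)             # equation (4)
    g_loss = gan + lam * cyc + 0.5 * lam * idt
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```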
FIG. 6A and FIG. 6B depict a detailed network architecture of the CycleGAN according to some embodiments of the present disclosure. The CycleGAN generators are implemented using one dense layer and four 1D-CNN (one-dimensional convolutional neural network) layers as shown in FIG. 6A. The CycleGAN discriminators are implemented using three 1D-CNN layers and two dense layers as shown in FIG. 6B. The Adam optimizer has been used for training.
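A minimal sketch consistent with the stated layer counts (one dense plus four 1D-CNN layers per generator; three 1D-CNN plus two dense layers per discriminator) is given below; the channel counts, kernel sizes, and the 128-dimensional feature length are assumptions, since FIG. 6A and FIG. 6B are not reproduced here. The networks may then be trained with torch.optim.Adam, as noted above.

```python
# Assumed shapes only; FIG. 6A/6B specify the actual configuration.
import torch.nn as nn

def make_generator(feat_dim: int = 128):
    """One dense layer followed by four 1D-CNN layers (cf. FIG. 6A)."""
    return nn.Sequential(
        nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        nn.Unflatten(1, (1, feat_dim)),            # (N, feat) -> (N, 1, feat)
        nn.Conv1d(1, 32, 3, padding=1), nn.ReLU(),
        nn.Conv1d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv1d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv1d(32, 1, 3, padding=1),
        nn.Flatten(),                              # back to (N, feat)
    )

def make_discriminator(feat_dim: int = 128):
    """Three 1D-CNN layers followed by two dense layers (cf. FIG. 6B)."""
    return nn.Sequential(
        nn.Unflatten(1, (1, feat_dim)),
        nn.Conv1d(1, 16, 3, padding=1), nn.LeakyReLU(0.2),
        nn.Conv1d(16, 32, 3, padding=1), nn.LeakyReLU(0.2),
        nn.Conv1d(32, 32, 3, padding=1), nn.LeakyReLU(0.2),
        nn.Flatten(),
        nn.Linear(32 * feat_dim, 64), nn.LeakyReLU(0.2),
        nn.Linear(64, 1),                          # real/fake logit
    )
```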
FIG. 4 is an exemplary flow diagram depicting testing phase steps of the method for CycleGAN based unsupervised domain adaptation for machine fault diagnosis according to some embodiments of the present disclosure. The learnt CycleGAN, the learnt FE, and the learnt classifier C are used for domain adaptation for machine fault diagnosis. At step 402, the one or more hardware processors 102 are configured to receive the set of target domain test data. At step 404, the one or more hardware processors 102 are configured to compute a set of target domain features for the set of target domain test data using the trained feature extractor model. At step 406, the one or more hardware processors 102 are configured to map the set of target domain features to a set of source domain features using the learnt weights of the second generator of the learnt CycleGAN. At step 408, the one or more hardware processors 102 are configured to predict the set of target labels associated with the set of target domain test data using the trained classifier model C.
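By way of a non-limiting illustration, the testing phase of steps 402 through 408 may be sketched as follows; the function and tensor names are illustrative assumptions.

```python
# Hedged inference sketch: target test windows are encoded by FE (step 404),
# translated to the source feature space by generator F (step 406), and
# classified by C (step 408).
import torch

@torch.no_grad()
def predict_target(fe, F, clf, x_t_test):
    b = fe(x_t_test)               # step 404: target-domain features
    a_like = F(b)                  # step 406: map to source feature space
    logits = clf(a_like)           # step 408: classify with source classifier C
    return logits.argmax(dim=1)    # predicted fault labels
```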
EXPERIMENTAL RESULTS: The disclosed method for fault diagnosis is evaluated using publicly available bearing datasets.
CWRU Dataset: This is a bearing fault dataset from Case Western Reserve University (CWRU) containing vibration signals acquired from the drive and fan end of the machine. The data is collected at 12 kHz for four different loading conditions (0, 1, 2, and 3 Horsepower (Hp)) with 1797, 1772, 1750, and 1730 rpm, respectively, as shown in Table 1. Bearing faults of different sizes (0.007, 0.014, 0.021 inches) are induced using electro-discharge machining (EDM). The data for each working condition is classified into four classes: Normal, Inner-race Fault (IF), Outer-race Fault (OF), and Ball Fault (BF).
Paderborn Dataset: This bearing fault dataset is collected from a test rig consisting of a drive motor, a torque measurement shaft, the test modules, and a load motor. Here, inner ring and outer ring damage is introduced in the cylindrical roller bearings. Data from both real and artificially damaged bearings are available. It contains stator current and vibration data collected at 64 kHz for two rotating speeds (900 and 1500 rpm) with two loading torques (0.7 and 0.1 Nm). The data has three classes: Normal, Inner-race Fault (IF), and Outer-race Fault (OF). The vibration sensor data has been considered for analysis with faults introduced using EDM. The data specifications for the Paderborn Dataset are summarized in Table 2.
For the implementation of the disclosed method, five 1D-CNN layers have been used to implement the FE module. The classifier C is built using a fully connected neural network with a softmax layer. The CycleGAN is trained for 100 epochs. FIG. 7 depicts a graphical representation of the loss plot of the CycleGAN according to some embodiments of the present disclosure. It can be seen that the CycleGAN converges quickly, within a few epochs.
The disclosed method has been evaluated for two scenarios: a same machine scenario and a different machine scenario. The disclosed method has been compared with six state-of-the-art methods for unsupervised domain adaptation for machine fault diagnosis. Out of these, three are mapping based methods, namely Joint Maximum Mean Discrepancy (JMMD), Multi Kernels Maximum Mean Discrepancy (MK-MMD), and CORrelation Alignment (CORAL). The remaining three methods are based on adversarial learning, namely Domain Adversarial Neural Network (DANN), Conditional Domain Adversarial Network (CDAN), and Adversarial Multiple-target Domain Adaptation (AMDA).
In the same machine scenario, data from different working conditions of the same machine form the source and target domain. Here, only the A and D working conditions of the CWRU dataset (Table 1), collected from the drive end, are considered for experimentation due to their large domain discrepancy compared to other working conditions. Here, the pre-processing involved taking a sliding window of size 4096 with a shifting step of 290, resulting in 4000 samples for each working condition. A 10-class classification problem is considered (1 Normal class and 3 fault types, i.e., IF, OF, and BF, each with 3 different fault sizes).
The different machine scenario considers a more challenging problem where data from different but related machines are considered as source and target domains. Here, CWRU (0 Hp motor torque and 0.007 inch fault size) and Paderborn data (900 rpm and 0.7 Nm loading torque) are considered for evaluation. The Paderborn dataset is downsampled to 12 kHz to match the CWRU sampling frequency. The pre-processing involved taking a sliding window of size 4096 with a shifting step of 160, resulting in 2175 samples for each machine (domain). A 3-class classification problem (Normal, IF, OF) is considered for this task.
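By way of a non-limiting illustration, the sliding-window segmentation used in both scenarios may be sketched as follows, assuming each raw vibration record is a one-dimensional NumPy array.

```python
# Sketch of the sliding-window pre-processing: window=4096 with step=290
# (same machine scenario) or step=160 (different machine scenario).
import numpy as np

def sliding_windows(signal: np.ndarray, window: int = 4096, step: int = 290):
    n = (len(signal) - window) // step + 1
    return np.stack([signal[i * step : i * step + window] for i in range(n)])

# e.g., samples = sliding_windows(raw_record, window=4096, step=160)
```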
The performance is evaluated using accuracy, precision, recall, and F1-score. A 50% train-test split has been considered for all the experiments. For each scenario, the experiments are conducted five times with random train-test splits, and the average values are summarized in Table 3 through Table 6. In addition to the benchmark methods, the results obtained with the disclosed method without adaptation (No Adaptation) are presented to emphasize the adaptation capability of the disclosed method. Following the notation S → T for adaptation from source to target, Table 3 and Table 4 provide the results for the same machine scenario for A → D and D → A, respectively. It can be observed that most of the state-of-the-art methods perform well for this case, as the domain discrepancy between the source and target is not significant. While the AMDA method performed the best, the disclosed method also provided comparable results. Here, a 4% improvement due to adaptation is observed for the disclosed method compared to the case with no adaptation, in both directions.
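By way of a non-limiting illustration, the metric computation may be sketched as follows using scikit-learn; the choice of weighted averaging for the multi-class precision, recall, and F1-score is an assumption, as the averaging mode is not stated in the disclosure.

```python
# Hedged evaluation sketch; 'weighted' averaging is an assumption.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)
    return acc, prec, rec, f1
```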
| Metrics | JMMD | MK-MMD | CORAL | DANN | CDAN | AMDA | Disclosed method (No Adaptation) | Disclosed method (Adaptation) |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 96.24 | 91.65 | 87.28 | 96.4 | 95.7 | 96.96 | 91.95 | 96.26 |
| Precision | 97.04 | 94.48 | 88.29 | 96.80 | 96.06 | 97.22 | 92.83 | 96.55 |
| Recall | 96.24 | 91.65 | 87.28 | 96.4 | 95.7 | 96.96 | 91.95 | 96.26 |
| F1 score | 96.08 | 90.19 | 86.44 | 96.37 | 95.67 | 96.63 | 91.93 | 96.29 |
Table 3
| Metrics | JMMD | MK-MMD | CORAL | DANN | CDAN | AMDA | Disclosed method (No Adaptation) | Disclosed method (Adaptation) |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 94.48 | 91 | 87.36 | 93.02 | 93.35 | 95.52 | 91.65 | 95.26 |
| Precision | 94.82 | 92.68 | 86.5 | 93.65 | 93.59 | 95.29 | 93.38 | 94.95 |
| Recall | 94.48 | 91 | 87.36 | 93.02 | 93.35 | 95.52 | 91.65 | 95.26 |
| F1 score | 93.98 | 90.27 | 86.27 | 92.67 | 93.06 | 95.31 | 90.78 | 95.19 |
Table 4
Table 5 and Table 6 present the results for the different machine scenario for CWRU → Paderborn and Paderborn → CWRU, respectively. This is a more challenging scenario as there is a significant data distribution change between the two domains. From both tables, it can be observed that the disclosed method performs significantly better than the competing state-of-the-art methods. This improved performance can be attributed to the CycleGAN, which is able to learn the mapping between the two domains more effectively. Here, 27% and 24% improvements due to adaptation are observed for the disclosed method compared to the case with no adaptation for CWRU → Paderborn and Paderborn → CWRU, respectively.
| Metrics | JMMD | MK-MMD | CORAL | DANN | CDAN | AMDA | Disclosed method (No Adaptation) | Disclosed method (Adaptation) |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 40.91 | 35.79 | 33.17 | 55.79 | 58.18 | 86.77 | 66.26 | 93.74 |
| Precision | 39.07 | 35.61 | 11.23 | 45.94 | 47.12 | 84.57 | 49.49 | 95.04 |
| Recall | 40.91 | 35.79 | 33.17 | 55.79 | 58.18 | 86.77 | 66.26 | 93.74 |
| F1 score | 39.85 | 35.7 | 16.67 | 48.39 | 50.2 | 84.61 | 55.06 | 93.61 |
Table 5
| Metrics | JMMD | MK-MMD | CORAL | DANN | CDAN | AMDA | Disclosed method (No Adaptation) | Disclosed method (Adaptation) |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 38.76 | 35.57 | 34.48 | 44.14 | 53.97 | 87.33 | 66.91 | 90.97 |
| Precision | 35.72 | 32.73 | 23.4 | 41.49 | 47.65 | 87.84 | 50.09 | 92.23 |
| Recall | 38.76 | 35.57 | 34.48 | 44.14 | 53.97 | 87.33 | 66.91 | 90.97 |
| F1 score | 37.51 | 34.29 | 29.08 | 43.12 | 49.6 | 87.25 | 55.76 | 90.67 |
Table 6
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
CLAIMS:
1. A processor implemented method (300) comprising:
receiving (302), via one or more hardware processors, (i) a set of source domain data with an associated set of source domain labels and (ii) a set of target domain data, wherein the set of source domain data is associated with a source domain and the set of target domain data is associated with a target domain;
learning (304), via the one or more hardware processors, (i) a set of feature extractor weights for training a feature extractor model and (ii) a set of classifier weights for training a classifier model, wherein the feature extractor model and the classifier model are trained using the set of source domain data and the associated set of source domain labels; and
training (306), via the one or more hardware processors, a Cycle Generative Adversarial Network (CycleGAN) for performing data translation between the source domain and the target domain utilizing the trained feature extractor model, wherein the CycleGAN comprises (i) a first generator (G) (ii) a second generator (F) and (iii) two discriminators (D_b and D_a), the step comprising,
computing (306a) a first set of features for the set of source domain data using the trained feature extractor model;
computing (306b) a second set of features for the set of target domain data using the trained feature extractor model; and
updating (306c) a set of weights associated with (i) the first generator (ii) the second generator and (iii) the two discriminators based on the first set of features and the second set of features, wherein the first generator and the second generator comprise at least one dense layer and at least four one dimensional convolutional neural network layers and wherein the discriminator comprises at least three one dimensional convolutional neural network layers and at least two dense layers.
2. The method as claimed in claim 1, wherein the source domain and the target domain are any one of (i) physically different machines or (ii) different working conditions of a same machine.
3. The method as claimed in claim 1, wherein the feature extractor model and the classifier model are trained using a cross entropy loss.
4. The method as claimed in claim 1, wherein the CycleGAN is trained using a GAN loss, an identity loss and a cycle-consistency loss.
5. The method as claimed in claim 1, wherein a first generator mapping function associated with the first generator and a second generator mapping function associated with the second generator are learnt, and wherein the learning of the first generator mapping function is represented as G: A → B and the learning of the second generator mapping function is represented as F: B → A, wherein A is the first set of features and B is the second set of features.
6. The method as claimed in claim 1, wherein the GAN loss associated with learning the first generator mapping function is represented as,
L_gan(G, D_b, A, B) = E_(b~P(b))[log D_b(b)] + E_(a~P(a))[log(1 - D_b(G(a)))]
wherein a and b are data samples in a feature space of the first set of features and the second set of features and wherein the data distributions of a and b are denoted as a ~ P(a) and b ~ P(b).
7. The method as claimed in claim 1, wherein the GAN loss associated with learning the second generator mapping function is represented as,
L_gan(F, D_a, B, A) = E_(a~P(a))[log D_a(a)] + E_(b~P(b))[log(1 - D_a(F(b)))]
8. The method as claimed in claim 1, wherein the identity loss is represented as,
L_identity(G, F) = E_(a~P(a))[‖F(a) - a‖_1] + E_(b~P(b))[‖G(b) - b‖_1]
9. The method as claimed in claim 1, wherein the cycle-consistency loss is represented as,
L_cyc(G, F) = E_(a~P(a))[‖F(G(a)) - a‖_1] + E_(b~P(b))[‖G(F(b)) - b‖_1]
10. The method as claimed in claim 1, further comprising predicting (400) a set of target labels associated with a set of target domain test data using the trained classifier model, wherein predicting the set of target labels comprises:
receiving (402), via the one or more hardware processors, the set of target domain test data;
computing (404), via the one or more hardware processors, a set of target domain features for the set of target domain test data using the trained feature extractor model;
mapping (406), via the one or more hardware processors, the set of target domain features to a set of source domain features using the learnt weights of the second generator of the learnt CycleGAN; and
predicting (408), via the one or more hardware processors, the set of target labels associated with the set of target domain test data using the trained classifier model.
11. A system (100), comprising:
a memory (104) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (102) coupled to the memory (104) via the one or more communication interfaces (106), wherein the one or more hardware processors (102) are configured by the instructions to:
receive (i) a set of source domain data with an associated set of source domain labels and (ii) a set of target domain data, wherein the set of source domain data is associated with a source domain and the set of target domain data is associated with a target domain;
learn (i) a set of feature extractor weights for training a feature extractor model and (ii) a set of classifier weights for training a classifier model, wherein the feature extractor model and the classifier model are trained using the set of source domain data and the associated set of source domain labels; and
train a CycleGAN for performing data translation between the source domain and the target domain utilizing the trained feature extractor model, wherein the CycleGAN comprises (i) a first generator (G) (ii) a second generator (F) and (iii) two discriminators (D_b and D_a), the step comprising,
compute a first set of features for the set of source domain data using the trained feature extractor model;
compute a second set of features for the set of target domain data using the trained feature extractor model; and
update a set of weights associated with (i) the first generator (ii) the second generator and (iii) the two discriminators based on the first set of features and the second set of features, wherein the first generator and the second generator comprise at least one dense layer and at least four one dimensional convolutional neural network layers and wherein the discriminator comprises at least three one dimensional convolutional neural network layers and at least two dense layers.
12. The system as claimed in claim 11, wherein the source domain and the target domain are any one of (i) physically different machines or (ii) different working conditions of a same machine.
13. The system as claimed in claim 11, wherein the feature extractor model and the classifier model are trained using a cross entropy loss.
14. The system as claimed in claim 11, wherein the CycleGAN is trained using a GAN loss, an identity loss and a cycle-consistency loss.
15. The system as claimed in claim 11, wherein a first generator mapping function associated with the first generator and a second generator mapping function associated with the second generator are learnt, and wherein the learning of the first generator mapping function is represented as G: A → B and the learning of the second generator mapping function is represented as F: B → A, wherein A is the first set of features and B is the second set of features.
16. The system as claimed in claim 11, wherein the GAN loss associated with learning the first generator mapping function is represented as,
L_gan(G, D_b, A, B) = E_(b~P(b))[log D_b(b)] + E_(a~P(a))[log(1 - D_b(G(a)))]
wherein a and b are data samples in a feature space of the first set of features and the second set of features and wherein the data distributions of a and b are denoted as a ~ P(a) and b ~ P(b).
17. The system as claimed in claim 11, wherein the GAN loss associated with learning the second generator mapping function is represented as,
L_gan(F, D_a, B, A) = E_(a~P(a))[log D_a(a)] + E_(b~P(b))[log(1 - D_a(F(b)))]
18. The system as claimed in claim 11, wherein the identity loss is represented as,
L_identity(G, F) = E_(a~P(a))[‖F(a) - a‖_1] + E_(b~P(b))[‖G(b) - b‖_1]
19. The system as claimed in claim 11, wherein the cycle-consistency loss is represented as,
L_cyc(G, F) = E_(a~P(a))[‖F(G(a)) - a‖_1] + E_(b~P(b))[‖G(F(b)) - b‖_1]
20. The system as claimed in claim 11, further comprising predicting a set of target labels associated with a set of target domain test data using the trained classifier model, wherein predicting the set of target labels comprises:
receiving the set of target domain test data;
computing a set of target domain features for the set of target domain test data using the trained feature extractor model;
mapping the set of target domain features to a set of source domain features using the learnt weights of the second generator of the learnt CycleGAN; and
predicting the set of target labels associated with the set of target domain test data using the trained classifier model.