
"Method And System For Generating Synthetic Time Domain Signals To Build A Classifier"

Abstract: State of the art systems and methods attempting to generate synthetic biosignals such as PPG generate patient specific PPG signatures and do not correlate with pathophysiological changes. Embodiments herein provide a method and system for generating synthetic time domain signals to build a classifier. The synthetic signals are generated using statistical explosion. Initially, a parent dataset of actual sample data of class and non-class subjects is identified, and statistical features are extracted. A kernel density estimate (KDE) is used to vary the feature distribution and create multiple data templates from a single parent signal. The PPG signal is then reconstructed from the distribution pattern using non-parametric techniques. The generated synthetic data set is used to build a two stage cascaded classifier to classify CAD and non-CAD, wherein the classifier design reduces bias towards any class. [To be published with FIG. 3]


Patent Information

Application #
Filing Date
30 April 2020
Publication Number
45/2021
Publication Type
INA
Invention Field
ELECTRONICS
Status
Email
kcopatents@khaitanco.com
Parent Application

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point Mumbai Maharashtra India 400021

Inventors

1. BHATTACHARYA, Sakyaji
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160
2. MUZUMDER, Oishee
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160
3. SINHA, Aniruddha
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160
4. ROY, Dibyendu
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160
5. GHOSE, Avik
Tata Consultancy Services Limited Block -1B, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road, New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata West Bengal India 700160

Specification

FORM 2 THE PATENTS ACT, 1970 (39 of 1970) & THE PATENT RULES, 2003 COMPLETE SPECIFICATION (See Section 10 and Rule 13) Title of invention: METHOD AND SYSTEM FOR GENERATING SYNTHETIC TIME DOMAIN SIGNALS TO BUILD A CLASSIFIER Applicant Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956 Having address: Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India Preamble to the description The following specification particularly describes the invention and the manner in which it is to be performed. TECHNICAL FIELD [001] The embodiments herein generally relate to synthetic data generation for classifiers and, more particularly, to a method and system for generating synthetic time domain signals to build a classifier. BACKGROUND [002] Synthetic data generation has recently emerged as a substitution technique for handling the problem of bulk data needed in training machine learning algorithms. Healthcare, primarily the cardiovascular domain, is a major area where synthetic physiological data can be used to improve the accuracy of machine learning algorithms. Synthetic data is artificially generated data, used to mimic real world data while preserving some selected properties of the original data. Works in the literature argue that this technique is a more efficient way of obtaining labeled data for recognition, as well as a means to test the performance of new software and the scalability of Machine Learning (ML) techniques. A lot of research has been performed in the area of synthetic data generation in the privacy community, speech processing and healthcare. Mostly, the efficacy of the synthetic data is evaluated through the improvement in ML techniques obtained by introducing surrogate data in training sets. SUMMARY [003] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
For example, in one embodiment, a method for generating synthetic time domain signals to build a classifier is provided. The method receives a parent dataset of a plurality of samples of a time domain signal of interest comprising a combination of a class data and a non-class data. [004] Further, the method identifies a plurality of subsets, from the parent dataset, corresponding to a plurality of morphological features identified for the time domain signal of interest, wherein each subset among the plurality of subsets comprises p samples, wherein the plurality of morphological features comprise a Peak Sample (Ps), a Peak Amplitude (Pa), a Trough Sample (Ts), a Trough Amplitude (Ta), a Notch Sample (Ns), a Notch Amplitude (Na), a Dip Sample (Ds), a Dip Amplitude (Da), and distance between left and right samples corresponding to the 25%, 50%, 75% of the (Pa) defining distances d1, d2, d3, and wherein the plurality of morphological features define a template for the time domain signal of interest. [005] Furthermore, the method processes each of the plurality of subsets corresponding to each of the plurality of morphological features to generate a plurality of sets of observational values with each of the plurality of sets of observational values comprising p actual values corresponding to each of the plurality of morphological features; [006] Furthermore, the method fits a Gaussian kernel density estimate (KDE) to each of the plurality of sets of observational values; [007] Thereafter, the method generates N-point simulated data for each of the plurality of morphological features by generating N random samples from the Gaussian KDE fitted to each of the plurality of sets of observational values; and [008] Further, the method constructs a plurality of synthetic time domain signals for the time domain signal of interest from the N-point simulated data for each of the plurality of morphological features in accordance with the template, wherein the constructing of the plurality of synthetic time domain signals includes: a)
determining a plurality of sequences derived from the generated N-point simulated data for each of the plurality of morphological features, wherein a plurality of elements in each of the plurality of sequences comprising Ts, Ps, Ds, Ns, r1 r2 r3, q1, q2 and q3 wherein values of q1, q2, q3, r1, r2 r3 are derived based on a preselected values of the distances d1, d2, d3 defined for the template, and wherein position of each of the plurality of elements within each of the plurality of sequences is based on a set of conditions defined by a predefined set of morphological features among the plurality of morphological features; b) generating from the plurality of sequences, a plurality of time domain signals corresponding to the time domain signal of interest by performing a spline fitting on each of the plurality of sequences, wherein the spline fitting utilizes piecewise linear regression to obtain parameters of lines connecting two successive elements among the plurality of elements of the each of the plurality of sequences; c) smoothening each of the plurality of time domain signals using a smoothening technique to generate a plurality of smoothened time domain signals; and d) applying a peak smoothening technique on each of the smoothened plurality of time domain signals to construct the plurality of synthetic time domain signals for the time domain signal of interest. [009] Furthermore uses a combination of the plurality of samples in the parent dataset and the plurality of synthetic time domain signals as a training data for building a two stage cascaded classifier for classifying input data corresponding to time domain signal of interest into one of a class data and a non-class data, and wherein the two stage cascaded classifier comprises: a first classifier utilizing Matusita distance for likeness measurement and data explosion driven decision rule for classification; and a second classifier utilizing a random forest technique for classification. 
The first classifier classifies the input data as the non-class data and the class data with ambiguity, wherein the non-class data at an output of the first classifier is identified as a final non-class data; and the second classifier classifies the received class data with ambiguity into a final class data. [0010] Thereafter, the method utilizes the two stage cascaded classifier to validate the plurality of synthetic time domain signals. [0011] Thereafter, the method utilizes the two stage cascaded classifier built using the training data to test real time data corresponding to the time domain signal of interest and classify the real time data into the final class data and the final non-class data. [0012] In another aspect, a system for generating synthetic time domain signals to build a classifier is provided. The system comprises a memory storing instructions; one or more Input/Output (I/O) interfaces; and one or more hardware processors coupled to the memory via the one or more I/O interfaces, wherein the one or more hardware processors are configured by the instructions to receive a parent dataset of a plurality of samples of a time domain signal of interest comprising a combination of a class data and a non-class data. [0013] Further, identify a plurality of subsets, from the parent dataset, corresponding to a plurality of morphological features identified for the time domain signal of interest, wherein each subset among the plurality of subsets comprises p samples, wherein the plurality of morphological features comprise a Peak Sample (Ps), a Peak Amplitude (Pa), a Trough Sample (Ts), a Trough Amplitude (Ta), a Notch Sample (Ns), a Notch Amplitude (Na), a Dip Sample (Ds), a Dip Amplitude (Da), and distance between left and right samples corresponding to the 25%, 50%, 75% of the (Pa) defining distances d1, d2, d3, and wherein the plurality of morphological features define a template for the time domain signal of interest.
[0014] Furthermore, process each of the plurality of subsets corresponding to each of the plurality of morphological features to generate a plurality of sets of observational values with each of the plurality of sets of observational values comprising p actual values corresponding to each of the plurality of morphological features; [0015] Furthermore, fit a gaussian kernel density estimate (KDE) to each of the plurality of sets of observational values; [0016] Thereafter, generate N-point simulated data for each of the plurality of morphological features by generating N random samples from the gaussian KDE fitted to each of the plurality of sets of observational values; and [0017] Further construct a plurality of synthetic time domain signals for the time domain signal of interest from the N-point simulated data for each of the plurality of morphological features in accordance to the template, wherein the constructing of the plurality of synthetic time domain signals includes: a) determine a plurality of sequences derived from the generated N-point simulated data for each of the plurality of morphological features, wherein a plurality of elements in each of the plurality of sequences comprising Ts, Ps, Ds, Ns, r1 r2 r3, q1, q2 and q3 wherein values of q1, q2, q3, r1, r2 r3 are derived based on a preselected values of the distances d1, d2, d3 defined for the template, and wherein position of each of the plurality of elements within each of the plurality of sequences is based on a set of conditions defined by a predefined set of morphological features among the plurality of morphological features; b) generate from the plurality of sequences, a plurality of time domain signals corresponding to the time domain signal of interest by performing a spline fitting on each of the plurality of sequences, wherein the spline fitting utilizes piecewise linear regression to obtain parameters of lines connecting two successive elements among the plurality of elements of the each of 
the plurality of sequences; c) smoothen each of the plurality of time domain signals using a smoothening technique to generate a plurality of smoothened time domain signals; and d) apply a peak smoothening technique on each of the smoothened plurality of time domain signals to construct the plurality of synthetic time domain signals for the time domain signal of interest. [0018] Furthermore use a combination of the plurality of samples in the parent dataset and the plurality of synthetic time domain signals as a training data for building a two stage cascaded classifier for classifying input data corresponding to time domain signal of interest into one of a class data and a non-class data, and wherein the two stage cascaded classifier comprises: a first classifier utilizing Matusita distance for likeness measurement and data explosion driven decision rule for classification; and a second classifier utilizing a random forest technique for classification. The first classifier classifies the input data as the non-class data and the class data with ambiguity, wherein the non-class data at an output of the first classifier is identified as a final non-class data; and the second classifier classifies the received class data with ambiguity into a final class data. [0019] Thereafter utilize the two stage cascaded classifier to validate the plurality of synthetic time domain signals. [0020] Thereafter utilize the two stage cascaded classifier build using the training data to test real time data corresponding to the time domain signal of interest and classify the real time data into the final class data and the final non-class data. [0021] In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions, which when executed by one or more hardware processors causes a method for generating synthetic time domain signals to build a classifier. 
[0022] The method receives a parent dataset of a plurality of samples of a time domain signal of interest comprising a combination of a class data and a non-class data. [0023] Further, the method identifies a plurality of subsets, from the parent dataset, corresponding to a plurality of morphological features identified for the time domain signal of interest, wherein each subset among the plurality of subsets comprises p samples, wherein the plurality of morphological features comprise a Peak Sample (Ps), a Peak Amplitude (Pa), a Trough Sample (Ts), a Trough Amplitude (Ta), a Notch Sample (Ns), a Notch Amplitude (Na), a Dip Sample (Ds), a Dip Amplitude (Da), and distance between left and right samples corresponding to the 25%, 50%, 75% of the (Pa) defining distances d1, d2, d3, and wherein the plurality of morphological features define a template for the time domain signal of interest. [0024] Furthermore, the method processes each of the plurality of subsets corresponding to each of the plurality of morphological features to generate a plurality of sets of observational values with each of the plurality of sets of observational values comprising p actual values corresponding to each of the plurality of morphological features; [0025] Furthermore, the method fits a Gaussian kernel density estimate (KDE) to each of the plurality of sets of observational values; [0026] Thereafter, the method generates N-point simulated data for each of the plurality of morphological features by generating N random samples from the Gaussian KDE fitted to each of the plurality of sets of observational values; and [0027] Further, the method constructs a plurality of synthetic time domain signals for the time domain signal of interest from the N-point simulated data for each of the plurality of morphological features in accordance with the template, wherein the constructing of the plurality of synthetic time domain signals includes: a) determining a plurality of sequences derived from the generated N-point simulated data for each of the plurality
of morphological features, wherein a plurality of elements in each of the plurality of sequences comprising Ts, Ps, Ds, Ns, r1 r2 r3, q1, q2 and q3, wherein values of q1, q2, q3, r1, r2 r3 are derived based on a preselected values of the distances d1, d2, d3 defined for the template, and wherein position of each of the plurality of elements within each of the plurality of sequences is based on a set of conditions defined by a predefined set of morphological features among the plurality of morphological features; b) generating from the plurality of sequences, a plurality of time domain signals corresponding to the time domain signal of interest by performing a spline fitting on each of the plurality of sequences, wherein the spline fitting utilizes piecewise linear regression to obtain parameters of lines connecting two successive elements among the plurality of elements of the each of the plurality of sequences; c) smoothening each of the plurality of time domain signals using a smoothening technique to generate a plurality of smoothened time domain signals; and d) applying a peak smoothening technique on each of the smoothened plurality of time domain signals to construct the plurality of synthetic time domain signals for the time domain signal of interest. [0028] Furthermore uses a combination of the plurality of samples in the parent dataset and the plurality of synthetic time domain signals as a training data for building a two stage cascaded classifier for classifying input data corresponding to time domain signal of interest into one of a class data and a non-class data, and wherein the two stage cascaded classifier comprises: a first classifier utilizing Matusita distance for likeness measurement and data explosion driven decision rule for classification; and a second classifier utilizing a random forest technique for classification. 
The first classifier classifies the input data as the non-class data and the class data with ambiguity, wherein the non-class data at an output of the first classifier is identified as a final non-class data; and the second classifier classifies the received class data with ambiguity into a final class data. [0029] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed. BRIEF DESCRIPTION OF THE DRAWINGS [030] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles: [031] FIG. 1 is a functional block diagram of a system for generating synthetic time domain signals to build a classifier, in accordance with some embodiments of the present disclosure. [032] FIG. 3 illustrates architectural overview of the classifier of the system of FIG 1, wherein the classifier is a two stage cascaded classifier, in accordance with some embodiments of the present disclosure. [033] FIG. 4A depicts a template PPG signal (time domain signal) with landmarks for feature distribution, in accordance with some embodiments of the present disclosure. [034] FIG. 4B depicts a reconstructed signal from the feature distribution, in accordance with some embodiments of the present disclosure. [035] FIG. 4C depicts a reconstructed signal after applying a smoothening technique on the signal reconstructed from the feature distribution, in accordance with some embodiments of the present disclosure. [036] FIG. 4D depicts a reconstructed signal (synthetic time domain signal) after applying a peak smoothening technique on the smoothened signal, in accordance with some embodiments of the present disclosure. 
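The progression shown in FIGS. 4B through 4D (reconstruction from landmarks, followed by smoothening) can be illustrated with a small Python sketch. It is a sketch under stated assumptions, not the implementation of the disclosure: a cycle is assumed to be given as landmark (sample, amplitude) pairs, `np.interp` stands in for the line fitting between successive elements, and a moving average stands in for the smoothening technique, which the text does not name. The function name `reconstruct_cycle`, the window length and the landmark values are hypothetical, and the peak smoothening step of FIG. 4D is not covered.

```python
import numpy as np


def reconstruct_cycle(samples, amplitudes, window=5):
    """Rebuild one cycle from landmark points, then smooth it.

    samples, amplitudes: landmark positions (sample index) and heights.
    window: odd length of the moving average (an assumed smoothing choice).
    Returns (t, raw, smooth) with one value per sample index.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the output keeps its length")
    s = np.asarray(samples, dtype=float)
    a = np.asarray(amplitudes, dtype=float)

    # Landmarks must be in time order before lines can join neighbours.
    order = np.argsort(s)
    s, a = s[order], a[order]

    # Straight line between each pair of successive landmarks.
    t = np.arange(int(s[0]), int(s[-1]) + 1)
    raw = np.interp(t, s, a)

    # Moving average; edge padding keeps the signal length unchanged.
    padded = np.pad(raw, window // 2, mode="edge")
    smooth = np.convolve(padded, np.ones(window) / window, mode="valid")
    return t, raw, smooth


# Hypothetical landmarks: trough, peak, dip, notch, next trough.
t, raw, smooth = reconstruct_cycle([0, 15, 30, 36, 55],
                                   [0.0, 1.0, 0.4, 0.5, 0.0])
```

The raw output has sharp corners at every landmark, and the moving average rounds them off at the cost of slightly lowering the peak.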
[037] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems and devices embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown. DETAILED DESCRIPTION OF EMBODIMENTS [038] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope being indicated by the following claims. [039] The physiological data or biosignals acquired from subjects being monitored are analyzed by health care systems to determine the health condition of the subjects. Cardiac health care is emerging as a major area where synthetic physiological data such as Photoplethysmogram (PPG), Electrocardiogram (ECG) and Phonocardiogram (PCG) can be used to improve the accuracy of machine learning algorithms and help in early screening of various cardiovascular diseases such as coronary artery disease (CAD). CAD is caused by plaque formation in the coronary artery, which reduces the vessel diameter and may lead to heart attack or stroke. Early non-invasive detection or screening of CAD remains an open area of research.
PPG measures volumetric blood flow in capillaries over time, and it has recently become very popular due to its low cost implementation in wearables. Morphological attributes of the PPG signal have been widely used in existing works for the measurement of heart rate and blood pressure and for the detection of several cardiac diseases such as CAD, arrhythmia, atrial fibrillation, and the like. Significant research activities on characterization and analysis of the PPG signal have been reported in recent years, but research in synthetic PPG signal generation is limited. In terms of PPG signal generation, the most significant approach has been stochastic modeling, where patient specific atlases of PPG signals were generated along with a set of parameters that allows regeneration of statistically equivalent PPG signals by utilizing shape parameterization and a nonstationary model of PPG signal time evolution. However, these techniques generate only patient specific PPG signatures and do not correlate with pathophysiological changes. [040] Embodiments herein provide a method and system for generating synthetic time domain signals to build a classifier, wherein the classifier is a two stage cascaded classifier. The time domain signal of interest can be any of the biosignals such as PPG, ECG and the like. Such synthetically generated time domain signals are then utilized as a training dataset to build the two stage cascaded classifier. The generated two stage cascaded classifier is further utilized to classify the subjects into a class data (unhealthy subjects) and a non-class data (healthy subjects). The synthetic time domain signal generation approach provided by the method disclosed herein is explained in conjunction with synthetic generation of a biosignal, with the PPG signal as an example, and this may not be construed as a limitation.
Further, analysis of the synthetically generated PPG signals is performed to classify the subject associated with the PPG data, for example into a CAD class (unhealthy subjects) and a non-CAD class (healthy subjects). It can be understood that CAD and non-CAD are example classes and the classifier can be built and trained for classifying any class of interest. The method disclosed generates the synthetic time domain signal of interest, such as the PPG, through statistical explosion. Initially, a parent dataset of actual samples of PPG data of CAD and non-CAD subjects is identified and statistical features (or morphological features) are extracted. A kernel density estimate (KDE) is used to vary the feature distribution and create multiple data templates from a single parent signal. The PPG signal is then reconstructed from the distribution pattern using non-parametric techniques. The generated synthetic data set is used to build the two stage cascaded classifier to classify CAD (class data) and non-CAD (non-class data). [041] Referring now to the drawings, and more particularly to FIGS. 1 through 4D, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method. [042] FIG. 1 is a functional block diagram of a system 100 for generating synthetic time domain signals to build a classifier, in accordance with some embodiments of the present disclosure. [043] In an embodiment, the system 100 includes a processor(s) 104, communication interface device(s), alternatively referred to as input/output (I/O) interface(s) 106, and one or more data storage devices or a memory 102 operatively coupled to the processor(s) 104. The system 100, with the processor(s) 104, is configured to execute functions of one or more functional blocks of the system 100.
[044] Referring to the components of system 100, in an embodiment, the processor(s) 104, can be one or more hardware processors 104. In an embodiment, the one or more hardware processors 104 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 104 are configured to fetch and execute computer-readable instructions stored in the memory 102. In an embodiment, the system 100 can be implemented in a variety of computing systems including laptop computers, notebooks, hand-held devices such as mobile phones, personal digital Assistants (PDAs), cloud servers and the like. [045] The I/O interface(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface (s) 106 can include one or more ports for connecting a number of devices (nodes) of the system 100 to one another or to another server. [046] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 102 may comprise a two stage cascaded classifier 110 and other modules (not shown), to implement the functions for generating the synthetic time domain signals to build the classifier such as the two stage cascaded classifier 110. 
[047] Further, the memory 102 may include a database 108, which may store the parent dataset of a plurality of samples of a time domain signal of interest comprising a combination of a class data and a non-class data, the plurality of subsets, from the parent dataset, corresponding to a plurality of morphological features identified for the time domain signal of interest, the N-point simulated data generated for each of the plurality of morphological features by drawing N random samples from the Gaussian KDE fitted to each of the plurality of sets of observational values, the plurality of synthetic time domain signals constructed for the time domain signal of interest from the N-point simulated data for each of the plurality of morphological features, and so on. [048] Further, the two stage cascaded classifier 110 comprises a first classifier utilizing Matusita distance for likeness measurement and a data explosion driven decision rule for classification, and a second classifier utilizing a random forest technique for classification. The two stage cascaded classifier 110 is explained in conjunction with FIG. 3. [049] Further, the memory 102 may comprise information pertaining to input(s)/output(s) of each step performed by the processor(s) 104 of the system 100 and methods of the present disclosure. [050] In an embodiment, the database 108 may be external (not shown) to the system 100 and coupled to the system via the I/O interface 106. Functions of the components of system 100, for generating synthetic time domain signals to build the classifier, are explained in conjunction with FIGS. 2A through 4D. [051] FIGS. 2A, 2B and 2C depict a flow diagram illustrating a method for generating synthetic time domain signals to build the classifier using the system of FIG.
1, in accordance with some embodiments of the present disclosure. [052] In an embodiment, the system 100 comprises one or more data storage devices or the memory 102 operatively coupled to the processor(s) 104 and is configured to store instructions for execution of steps of the method 200 by the processor(s) or one or more hardware processors 104. The steps of the method 200 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIG. 1 and FIG. 3 and the steps of the flow diagram as depicted in FIGS. 2A through 2C. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously. [053] Referring now to the steps of the method 200, at step 202, the one or more hardware processors 104 are configured to receive the parent dataset of a plurality of samples of a time domain signal of interest comprising a combination of a class data and a non-class data. For example, the parent dataset can include PPG signals of 145 subjects, which are actual signals recorded at a hospital using a non-medical grade commercial pulse oximeter (CMS 50D+) at a sampling rate of 60 Hz. [054] This dataset serves as the parent PPG dataset for subsequent analysis and comprises 90 CAD and 55 non-CAD subjects, annotated using angiogram reports. The dataset ensures a wide variation in patient demography along with different pathological conditions for non-CAD patients and varying percentage levels of heart blockage for CAD patients.
[055] At step 204 of the method 200, the one or more hardware processors 104 are configured to identify a plurality of subsets, from the parent dataset, corresponding to a plurality of morphological features identified for the time domain signal of interest. Each subset among the plurality of subsets comprises p samples and the plurality of morphological features comprise a Peak Sample (Ps), a Peak Amplitude (Pa), a Trough Sample (Ts), a Trough Amplitude (Ta), a Notch Sample (Ns), a Notch Amplitude (Na), a Dip Sample (Ds), a Dip Amplitude (Da), and the distances between left and right samples corresponding to 25%, 50% and 75% of Pa, providing distances d1, d2, d3. The morphological features mentioned above are for a PPG signal; however, for any other time domain signal these features can be as identified by a subject matter expert. These identified features capture the most important and relevant information in the signal, which is critical in classifying the time domain signal of interest into the relevant classes. The plurality of morphological features define a template for the time domain signal of interest, as depicted in FIG. 4A, which is used for generating the synthetic time domain signals. FIG. 4A depicts the template PPG signal (time domain signal) with landmarks for feature distribution, in accordance with some embodiments of the present disclosure. [056] Thus, from the parent dataset of PPG signals in the database 108, certain morphological landmarks (also referred to as the plurality of morphological features) are annotated as shown in FIG. 4A.
Depicted are the morphological features or landmarks such as the Peak Sample (Ps), the Peak Amplitude (Pa), the Trough Sample (Ts), the Trough Amplitude (Ta), the Notch Sample (Ns), the Notch Amplitude (Na), the Dip Sample (Ds), the Dip Amplitude (Da), and the distances between the left and right samples corresponding to 25%, 50% and 75% of Pa, providing the distances d1, d2 and d3, where ‘left’ means the position before Ps and ‘right’ means the position after Ps. So in total there are eleven features.
[057] At step 206 of the method 200, the one or more hardware processors 104 are configured to process each of the plurality of subsets corresponding to each of the plurality of morphological features to generate a plurality of sets of observational values, with each of the plurality of sets of observational values comprising p actual values corresponding to each of the plurality of morphological features.
[058] At step 208 of the method 200, the one or more hardware processors 104 are configured to fit a Gaussian kernel density estimate (KDE) to each of the plurality of sets of observational values.
[059] At step 210 of the method 200, the one or more hardware processors 104 are configured to generate N-point simulated data for each of the plurality of morphological features by generating N random samples from the Gaussian KDE fitted to each of the plurality of sets of observational values.
[060] The mathematical analysis for the steps 206 through 210 is explained below.
[061] Several cycles are given for each morphological feature, interchangeably referred to herein as a component. Let the number of cycles be p. For data explosion, the first step is to simulate each of the eleven components. To do so, consider the observations (observation values) related to a component X, denoted $x_i$, $i = 1, 2, 3, \ldots, p$. A probability density function is fitted to the $x_i$ using a Gaussian kernel density estimate (KDE). The ‘bandwidth’ parameter of the KDE algorithm affects the smoothness of the resulting curve.
Mathematically, a kernel is a positive function $K(x, h)$, which is controlled by the bandwidth parameter $h$. Given this kernel form, the density estimate at a point $y$ within a group of points $x_i$, $i = 1, 2, 3, \ldots, p$, is given by $\rho_K(y) = \sum_{i=1}^{p} K(y - x_i, h)$. A Gaussian kernel is of the form $K(x, h) \propto \exp\left(-\frac{x^2}{2h^2}\right)$. Now, suppose X is to be simulated, say N times. This is equivalent to drawing N random samples from the kernel density. To do so, N random samples are first drawn from $x_1, x_2, \ldots, x_p$ with replacement. Let these samples be denoted by $\mathrm{samp}_N$; they constitute the means of the kernel density. Then N random samples are drawn from the kernel $\mathcal{N}(\mathrm{samp}_N, h)$, that is, one sample from a normal distribution centred on each element of $\mathrm{samp}_N$ with spread $h$. These constitute the simulated data. This process is repeated for each of the eleven components or morphological features.
[062] At step 212 of the method 200, the one or more hardware processors 104 are configured to construct a plurality of synthetic time domain signals for the time domain signal of interest from the N-point simulated data for each of the plurality of morphological features in accordance with the template, wherein the constructing of the plurality of synthetic time domain signals comprises:
a) Determining (212a) a plurality of sequences derived from the generated N-point simulated data for each of the plurality of morphological features, wherein a plurality of elements in each of the plurality of sequences comprise Ts, Ps, Ds, Ns, r1, r2, r3, q1, q2 and q3, wherein the values of q1, q2, q3, r1, r2 and r3 are derived based on the distances d1, d2 and d3 defined for the template. The position of each of the plurality of elements within each of the plurality of sequences is based on a set of conditions defined by a predefined set of morphological features among the plurality of morphological features.
b) Generating (212b), from the plurality of sequences, a plurality of time domain signals corresponding to the time domain signal of interest by performing a spline fitting on each of the plurality of sequences, wherein the spline fitting utilizes piecewise linear regression to obtain parameters of lines connecting two successive elements among the plurality of elements of each of the plurality of sequences.
c) Smoothening (212c) each of the plurality of time domain signals using a smoothening technique to generate a plurality of smoothened time domain signals.
d) Applying (212d) a peak smoothening technique on each of the plurality of smoothened time domain signals to construct the plurality of synthetic time domain signals for the time domain signal of interest.
[001] The mathematical representation of steps 212a through 212d is explained below with case examples. Based on the simulated data for the eleven components, the entire signal is to be constructed. To do that, the first step is to specify the sample points corresponding to 25%, 50% and 75% of Pa (d1, d2 and d3) on both sides of Ps. Thus, the positions of Pa/4, Pa/2 and 3Pa/4 before and after Ps need to be determined. Suppose the sample points corresponding to Pa/4 are q1 and r1 before and after Ps, respectively.
[063] Deriving values of q1, q2, q3, r1, r2, r3 based on d1, d2 and d3: As can be seen from FIG. 4A, d1, d2 and d3 are distances between two points on the PPG template. The value d1 is identified and is equal to r1 - q1. Similarly, the value of d2 is identified and is equal to r2 - q2, and the same follows for d3, which is equal to r3 - q3. Now, from the starting point, i.e. the trough sample Ts, an initial q1 is determined by the unitary method, with comparison to Pa, and q2 and q3, corresponding to l/2 and 3l/4, are determined similarly by the unitary method, l being the signal width.
Once q1, q2 and q3 are determined, r1, r2 and r3 can be derived from d1, d2 and d3.
[064] Determining the position of each of the plurality of elements within each of the plurality of sequences based on a set of conditions defined by the predefined set of morphological features: Care is needed while selecting the values of r2 and r1, because they can fall before or after the dip sample Ds. To check this, Pa/2 is compared with the dip amplitude Da. If Pa/2 > Da, r2 is placed either before Ds, or between Ds and Ns, or after Ns. The first situation will happen if r2 = q2 + d2 < Ds, the second if Ds < r2 < Ns, and the third if r2 > Ns. If the first situation (a condition among a first set) happens for both r1 and r2, then the sequence of sample points or elements in the simulated density would be Ts, q1, q2, q3, Ps, r3, r2, r1, Ds, Ns along with their amplitudes. If the first situation happens for r2 and the second situation happens for r1, then the sequence of sample points in the simulated density would be Ts, q1, q2, q3, Ps, r2, r3, Ds, r1, Ns along with their amplitudes, and the process goes on similarly for the other situations. In total, there are nine possibilities along with corresponding sequences. However, if Pa/2 < Da that will imply Pa/4
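By way of illustration only, the simulation of steps 206 through 210 and the first-situation sequence and line fitting of steps 212a and 212b may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the present disclosure: the function names, the bandwidth, all numeric values, and the convention that amplitudes are measured from a zero trough are hypothetical, and NumPy's linear interpolation stands in for the piecewise linear regression. The smoothening of steps 212c and 212d is not sketched because no specific technique is named above.

```python
import numpy as np


def simulate_component(observations, n_samples, bandwidth, rng):
    """Draw n_samples values from a Gaussian KDE fitted to the observations.

    Two-stage draw as described in [061]: resample the observations with
    replacement (samp_N), then draw one value from N(samp_N, bandwidth).
    """
    x = np.asarray(observations, dtype=float)
    centres = rng.choice(x, size=n_samples, replace=True)  # samp_N
    return rng.normal(loc=centres, scale=bandwidth)


def first_situation_sequence(ts, ps, ds, ns, q, d):
    """Order the landmark samples when both r1 and r2 fall before Ds.

    q = (q1, q2, q3), d = (d1, d2, d3); r_k = q_k + d_k because d_k = r_k - q_k.
    Returns None when the first-situation condition does not hold.
    """
    r = [qk + dk for qk, dk in zip(q, d)]
    if not (r[0] < ds and r[1] < ds):
        return None
    return [ts, q[0], q[1], q[2], ps, r[2], r[1], r[0], ds, ns]


def piecewise_linear_signal(samples, amplitudes):
    """Join successive landmarks with straight lines, one value per sample."""
    samples = np.asarray(samples, dtype=float)
    if np.any(np.diff(samples) <= 0):
        raise ValueError("landmark samples must be strictly increasing")
    t = np.arange(samples[0], samples[-1] + 1)
    return np.interp(t, samples, amplitudes)


# Hypothetical example values (not taken from the parent dataset).
rng = np.random.default_rng(0)
observed_pa = [0.9, 1.0, 1.1, 1.05, 0.95]           # p = 5 observed peak amplitudes
simulated_pa = simulate_component(observed_pa, 1000, 0.05, rng)

pa, da, na = 1.0, 0.2, 0.3                          # Pa/2 > Da, so situations apply
sequence = first_situation_sequence(0, 12, 30, 36, q=(3, 6, 9), d=(22, 14, 6))
amplitudes = [0.0, pa / 4, pa / 2, 3 * pa / 4, pa,
              3 * pa / 4, pa / 2, pa / 4, da, na]
signal = piecewise_linear_signal(sequence, amplitudes)
print(len(simulated_pa), sequence, len(signal))
```

In this sketch each simulated value is a resampled observation perturbed by Gaussian noise of spread h, which is equivalent to sampling from the fitted Gaussian KDE. The other eight orderings of r1 and r2 relative to Ds and Ns would need their own sequences, following the conditions given in [064].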

Documents

Application Documents

# Name Date
1 202021018573-CLAIMS [23-05-2022(online)].pdf 2022-05-23
2 202021018573-CORRESPONDENCE [23-05-2022(online)].pdf 2022-05-23
3 202021018573-FER_SER_REPLY [23-05-2022(online)].pdf 2022-05-23
4 202021018573-FORM 3 [23-05-2022(online)].pdf 2022-05-23
5 202021018573-OTHERS [23-05-2022(online)].pdf 2022-05-23
6 202021018573-FORM 3 [21-01-2022(online)].pdf 2022-01-21
7 202021018573-FER.pdf 2021-12-16
8 202021018573-CORRESPONDENCE(IPO)-(CERTIFIED COPY LETTER)-(22-11-2021)..pdf 2021-11-22
9 202021018573-REQUEST FOR CERTIFIED COPY [10-11-2021(online)].pdf 2021-11-10
10 202021018573-CORRESPONDENCE(IPO)-(CERTIFIED COPY OF WIPO DAS)-(12-05-2021).pdf 2021-05-12
11 202021018573-Covering Letter [28-04-2021(online)].pdf 2021-04-28
12 202021018573-Form 1 (Submitted on date of filing) [28-04-2021(online)].pdf 2021-04-28
13 202021018573-Power of Attorney [28-04-2021(online)].pdf 2021-04-28
14 202021018573-FORM-26 [16-10-2020(online)].pdf 2020-10-16
15 202021018573-Proof of Right [07-10-2020(online)].pdf 2020-10-07
16 Abstract1.jpg 2020-07-16
17 202021018573-COMPLETE SPECIFICATION [30-04-2020(online)].pdf 2020-04-30
18 202021018573-DECLARATION OF INVENTORSHIP (FORM 5) [30-04-2020(online)].pdf 2020-04-30
19 202021018573-DRAWINGS [30-04-2020(online)].pdf 2020-04-30
20 202021018573-FIGURE OF ABSTRACT [30-04-2020(online)].jpg 2020-04-30
21 202021018573-FORM 1 [30-04-2020(online)].pdf 2020-04-30
22 202021018573-FORM 18 [30-04-2020(online)].pdf 2020-04-30
23 202021018573-REQUEST FOR EXAMINATION (FORM-18) [30-04-2020(online)].pdf 2020-04-30
24 202021018573-STATEMENT OF UNDERTAKING (FORM 3) [30-04-2020(online)].pdf 2020-04-30

Search Strategy

1 SEARCHSTRATEGY-E_14-12-2021.pdf