Abstract: Manual detection of abnormal heart rhythms from 24x7 electrocardiogram (ECG) recordings is not practically feasible and is error-prone. Machine learning (ML) and deep learning (DL)-based automatic detectors available in the art are large in size and computationally expensive. Hence, they cannot be deployed on small microcontrollers. Lack of standardization in hardware, limited memory space, and lower processing capacity are some of the key challenges of microcontrollers. The present disclosure provides a light-weight deep neural network based system and method that can run on low-powered microcontrollers for classification of heart rhythms using single-lead ECG. A baseline Convolutional Neural Network (CNN) based Residual Network (ResNet) architecture with an attention mechanism is optimized collaboratively using three approaches such that the maximum allowable performance drop after each optimization approach is within a predefined threshold compared to the performance after a previous optimization approach, to balance the size of a resulting integer-only model versus accuracy of classification. [To be published with FIG. 2]
Description:
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
REAL-TIME CLASSIFICATION OF HEART RHYTHMS ON TINY DEVICES USING OPTIMIZED ATTENTION BASED NEURAL NETWORK
Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description:
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The disclosure herein generally relates to the field of classification of heart rhythms, and, more particularly, to systems and methods for real-time classification of heart rhythms on tiny devices using optimized attention based neural network.
BACKGROUND
Automated decision-support systems are clinically appreciated in cardiology for analysis of the electrocardiogram (ECG) or echocardiogram. An ECG represents the electrical activity of the heart in a graphical format, recorded by placing a set of electrodes on the human body near the chest. ECG is clinically used for diagnosis of abnormal heart rhythms like Atrial Fibrillation (AF) and other types of arrhythmias, which are early signs of a stroke or a cardiac arrest. However, it is not practically feasible to manually analyze the large volume of continuous ECG data for detection of intermittent disease episodes. There is ongoing research on automatic diagnosis from ECG using machine learning techniques that reports clinical-grade accuracy in detecting AF and other types of abnormalities. The traditional approaches extract relevant features from ECG which are used to train supervised machine learning algorithms like Support Vector Machine (SVM) or AdaBoost for classification. The recent deep learning approaches have reportedly outperformed the traditional machine learning approaches. However, deep learning algorithms are resource-hungry and the models are large in size. A deep learning network is typically trained on a powerful desktop server using accelerated computing hardware like a Graphics Processing Unit (GPU) or a Tensor Processing Unit (TPU) and is not suitable for small embedded devices and microcontrollers.
SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
In an aspect, there is provided a processor implemented method comprising: receiving, via one or more hardware processors, an electrocardiogram (ECG) data in real-time from a subject; applying, via the one or more hardware processors, the received ECG data to a head block of a Neural Network, wherein the head block comprises a first batch normalization layer, a first convolution layer and a first Rectified Linear Unit (ReLU) activation layer to obtain an input tensor; processing the input tensor, using a baseline model executed by the one or more hardware processors, to obtain a final output tensor, wherein the baseline model comprises four sets of a Neural Network block 1 and three sets of a Neural Network block 2, positioned alternating each other, each set comprising one or more Neural Network blocks; calculating a plurality of attention weights, via the one or more hardware processors, by applying the final output tensor to an attention layer using tanh and softmax functions, wherein the plurality of attention weights are associated with key features in a feature map; flattening and applying, via the one or more hardware processors, an output of the attention layer associated with the key features in the feature map, to a fully connected layer followed by a softmax layer for classification of heart rhythms; and optimizing the baseline model for a tiny device, via the one or more hardware processors, using a plurality of optimization approaches collaboratively, wherein the plurality of optimization approaches include (i) weight pruning on the baseline model, (ii) weight clustering of a pruned baseline model and (iii) applying quantization aware training to a pruned and clustered baseline model to obtain an integer-only model such that a maximum allowable performance drop after each optimization approach is within a predefined threshold compared to the performance after a previous optimization approach to balance the size of a resulting integer-only model versus accuracy of classification.
In another aspect, there is provided a system comprising a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: receive an electrocardiogram (ECG) data in real-time from a subject; apply the received ECG data to a head block of a Neural Network, wherein the head block comprises a first batch normalization layer, a first convolution layer and a first Rectified Linear Unit (ReLU) activation layer to obtain an input tensor; process the input tensor, using a baseline model executed by the one or more hardware processors, to obtain a final output tensor, wherein the baseline model comprises four sets of a Neural Network block 1 and three sets of a Neural Network block 2, positioned alternating each other, each set comprising one or more Neural Network blocks; calculate a plurality of attention weights by applying the final output tensor to an attention layer using tanh and softmax functions, wherein the plurality of attention weights are associated with key features in a feature map; flatten and apply an output of the attention layer to a fully connected layer followed by a softmax layer for classifying heart rhythms; and optimize the baseline model for a tiny device using a plurality of optimization approaches collaboratively, wherein the plurality of optimization approaches include (i) weight pruning on the baseline model, (ii) weight clustering of a pruned baseline model and (iii) applying quantization aware training to a pruned and clustered baseline model to obtain an integer-only model such that a maximum allowable performance drop after each optimization approach is within a predefined threshold compared to the performance after a previous optimization approach to balance the size of a resulting integer-only model versus accuracy of classification.
In yet another aspect, there is provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause: receiving, via the one or more hardware processors, an electrocardiogram (ECG) data in real-time from a subject; applying, via the one or more hardware processors, the received ECG data to a head block of a Neural Network, wherein the head block comprises a first batch normalization layer, a first convolution layer and a first Rectified Linear Unit (ReLU) activation layer to obtain an input tensor; processing the input tensor, using a baseline model executed by the one or more hardware processors, to obtain a final output tensor, wherein the baseline model comprises four sets of a Neural Network block 1 and three sets of a Neural Network block 2, positioned alternating each other, each set comprising one or more Neural Network blocks; calculating a plurality of attention weights, via the one or more hardware processors, by applying the final output tensor to an attention layer using tanh and softmax functions, wherein the plurality of attention weights are associated with key features in a feature map; flattening and applying, via the one or more hardware processors, an output of the attention layer associated with the key features in the feature map, to a fully connected layer followed by a softmax layer for classification of heart rhythms; and optimizing the baseline model for a tiny device, via the one or more hardware processors, using a plurality of optimization approaches collaboratively, wherein the plurality of optimization approaches include (i) weight pruning on the baseline model, (ii) weight clustering of a pruned baseline model and (iii) applying quantization aware training to a pruned and clustered baseline model to obtain an integer-only model such that a maximum allowable performance drop after each optimization approach is within a predefined threshold compared to the performance after a previous optimization approach to balance the size of a resulting integer-only model versus accuracy of classification.
In accordance with an embodiment of the present disclosure, the Neural Network block 1 is a Convolutional Neural Network (CNN) based Residual Network (ResNet) block 1 comprising a set of weight layers F1, wherein F1 comprises (i) a first pair of convolution layers having f1 number of filters, (ii) a second ReLU activation layer and a second batch normalization layer between the first pair of convolution layers; and the Neural Network block 2 is a CNN based ResNet block 2 comprising (a) a second set of weight layers F2, comprising (i) a second pair of convolution layers having f2 number of filters, and (ii) a first maxpool layer, a fourth ReLU activation layer and a third batch normalization layer between the second pair of convolution layers; and (b) a skip layer comprising a second convolution layer having a single filter followed by a second maxpool layer.
In accordance with an embodiment of the present disclosure, the four sets of ResNet block 1 include (i) two ResNet block 1, (ii) three ResNet block 1, (iii) four ResNet block 1 and (iv) two ResNet block 1 respectively, and each of the three sets of ResNet block 2 includes one ResNet block 2.
In accordance with an embodiment of the present disclosure, the one or more hardware processors are configured to process the input tensor to obtain the final output tensor by: processing the input tensor by the ResNet block 1 in each set of ResNet block 1 by: using the input tensor to the ResNet block 1 to generate a first intermediate tensor; and applying a sum of the input tensor and the first intermediate tensor to a third ReLU activation layer to obtain an intermediate output tensor having a same dimension as the input tensor, wherein the intermediate output tensor is the input tensor to a subsequent ResNet block 1 in the set of ResNet block 1; and processing the input tensor by the ResNet block 2 in each set of ResNet block 2 by: using the intermediate output tensor of the ResNet block 1 from the set of ResNet block 1 preceding the ResNet block 2 as the input tensor to the ResNet block 2 to generate a second intermediate tensor; reducing dimension of the input tensor to the ResNet block 2, by passing through the skip layer; applying a sum of the input tensor to the ResNet block 2 with reduced dimension and the second intermediate tensor to a fifth ReLU activation layer to obtain an intermediate output tensor of the ResNet block 2 that serves as the input tensor to the ResNet block 1 from the set of ResNet block 1 succeeding the ResNet block 2; wherein an output tensor of a last ResNet block 1 in the four sets of ResNet block 1 is the final output tensor.
In accordance with an embodiment of the present disclosure, the one or more hardware processors are configured to reshape a feature map associated with the final output tensor prior to calculating a plurality of attention weights by applying the final output tensor to an attention layer.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG.1 illustrates an exemplary block diagram of a system for real-time classification of heart rhythms on tiny devices using optimized attention based neural network, in accordance with some embodiments of the present disclosure.
FIG.2 illustrates an exemplary architecture of the system for real-time classification of heart rhythms on tiny devices using optimized attention based neural network, in accordance with some embodiments of the present disclosure.
FIG.3A and FIG.3B illustrate configuration of a Convolutional Neural Network (CNN) based Residual Network (ResNet) block 1 and a CNN based ResNet block 2 comprised in the architecture of FIG.2.
FIG.4A through FIG.4B illustrate an exemplary flow diagram of a computer implemented method for real-time classification of heart rhythms on tiny devices using optimized attention based neural network, in accordance with some embodiments of the present disclosure.
FIG.5A through FIG.5C illustrate Receiver Operating Characteristic (ROC) curves of a baseline ResNet model and corresponding Area Under the Curve (AUC) values for target classes normal sinus rhythm, Atrial Fibrillation (AF) and other abnormal rhythms, on a test set of PhysioNet database, in accordance with some embodiments of the present disclosure.
FIG.6A illustrates impact of weight pruning on the test set by gradually increasing the amount of sparsity in the baseline ResNet model, in accordance with some embodiments of the present disclosure.
FIG.6B illustrates impact of weight clustering on the pruned model, in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
Manual detection of intermittent Atrial Fibrillation (AF) episodes or, in general, abnormal heart rhythms from 24x7 electrocardiogram (ECG) recordings is not practically feasible and is error-prone. Machine learning (ML) and deep learning (DL)-based automatic detectors available in the art are large in size and computationally expensive. Hence, they cannot be deployed on small microcontrollers. Lack of standardization in hardware, limited memory space, and lower processing capacity are some of the key challenges of microcontrollers. The present disclosure addresses these technical problems and provides a light-weight deep neural network based system and method that can run entirely on low-powered microcontrollers for classification of heart rhythms using single-lead ECG. The system and method provide 24x7 cardiac rhythm monitoring and real-time detection of intermittent abnormal heart rhythms on stand-alone wearable devices in order to generate timely alerts.
Referring now to the drawings, and more particularly to FIG. 1 through FIG.6B, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG.1 illustrates an exemplary block diagram of a system 100 for real-time classification of heart rhythms on tiny devices using optimized attention based neural network, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 includes one or more hardware processors 104, communication interface (s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the one or more hardware processors 104. The one or more hardware processors 104 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, graphics controllers, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) are configured to fetch and execute computer-readable instructions stored in the memory. In the context of the present disclosure, the expressions ‘processors’ and ‘hardware processors’ may be used interchangeably. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
The communication interface (s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface(s) can include one or more ports for connecting a number of devices to one another or to another server.
The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, one or more modules (not shown) of the system 100 can be stored in the memory 102.
FIG.2 illustrates an exemplary architecture of the system for real-time classification of heart rhythms on tiny devices using optimized attention based neural network, in accordance with some embodiments of the present disclosure. FIG.3A and FIG.3B illustrate configuration of a Convolutional Neural Network (CNN) based Residual Network (ResNet) block 1 and a CNN based ResNet block 2 comprised in the architecture of FIG.2. FIG.4A through FIG.4B illustrate an exemplary flow diagram of a computer implemented method 400 for real-time classification of heart rhythms on tiny devices using optimized attention based neural network, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 includes the memory 102 operatively coupled to the one or more hardware processors 104 and is configured to store instructions configured for execution of steps of the method 400 by the one or more hardware processors 104. The steps of the method 400 will now be explained in detail with reference to the components of the system 100 of FIG.1, the architecture illustrated in FIG.2 and the specific blocks illustrated in FIG.3A and FIG.3B. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
In accordance with the present disclosure, the one or more hardware processors 104 are configured to receive, at step 402, ECG data in real-time from a subject. The ECG data may be recorded using any conventional recording device like a Holter monitor. The ECG data can be single-lead ECG data.
In accordance with the present disclosure, the one or more hardware processors 104 are configured to apply, at step 404, the received ECG data to a head block of a Neural Network (NN), wherein the head block comprises a first batch normalization layer, a first convolution layer and a first Rectified Linear Unit (ReLU) activation layer to obtain an input tensor. Typically, the head block of the NN extracts simpler features like edges, slopes, and the like. A batch normalization layer normalizes the contributions to a layer for every mini-batch and is executed between layers of the NN. Accordingly, it may be noted by those skilled in the art that batch normalization is also applied before every NN block as illustrated in FIG.3A and FIG.3B. A ReLU activation layer filters the information that propagates forward through the NN. In the context of the present disclosure, the expressions ReLU activation layer and ReLU layer (in the figures) are used interchangeably.
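By way of illustration, a minimal TensorFlow/Keras sketch of the head block is given below. The 16 filters, 5x1 kernel and stride of 1 are the exemplary values of the embodiment described later; the function name is illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

def head_block(ecg_input):
    # ecg_input: tensor of shape (batch, samples, 1) holding a single-lead ECG window
    x = layers.BatchNormalization()(ecg_input)             # first batch normalization layer
    x = layers.Conv1D(filters=16, kernel_size=5,
                      strides=1, padding="same")(x)        # first convolution layer
    return layers.ReLU()(x)                                 # first ReLU activation layer
```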
In accordance with the present disclosure, the one or more hardware processors 104 are configured to process the input tensor, at step 406, using a baseline model (refer FIG.2) executed by the one or more hardware processors 104, to obtain a final output tensor. The baseline model comprises four sets of a Neural Network (NN) block 1 and three sets of a Neural Network (NN) block 2, positioned alternating each other, such that each set comprises one or more Neural Network blocks. The greater the number of blocks, the more features are learnt. In the context of the present disclosure, the expressions Neural Network block and NN block are used interchangeably.
Referring to FIG.2, in an embodiment of the present disclosure, the NN block 1 is a CNN based ResNet block 1; and the Neural Network block 2 is a CNN based ResNet block 2. The ResNet block 1 extracts features while the ResNet block 2 performs dimension reduction along with further feature extraction, and hence they are positioned alternating each other as illustrated in FIG.2. Deep CNNs are difficult to optimize and suffer from vanishing gradients during training, which prevents the weights from updating. The ResNet architecture of the present disclosure is designed to resolve this by applying identity mapping through skip connections (layers). The skip connections skip a few layers and connect directly to the output. This acts as a regularization term to skip certain layers in a deep architecture that do not have a positive impact on the performance.
In accordance with an embodiment of the present disclosure, the four sets of ResNet block 1 include (i) two ResNet block 1 (indicated by x2 in FIG.2), (ii) three ResNet block 1 (indicated by x3 in FIG.2), (iii) four ResNet block 1 (indicated by x4 in FIG.2) and (iv) two ResNet block 1 (indicated by x2 in FIG.2) respectively, and each of the three sets of ResNet block 2 includes one ResNet block 2.
Referring to FIG.3A, in an embodiment of the present disclosure, the NN block 1 is a CNN based ResNet block 1 comprising a set of weight layers F1, wherein F1 comprises (i) a first pair of convolution layers having f1 number of filters, (ii) a second ReLU activation layer and a second batch normalization layer between the first pair of convolution layers. Referring to FIG.3B, in an embodiment of the present disclosure, the NN block 2 is a CNN based ResNet block 2 having a similar structure as ResNet block 1 but with a few more layers for feature dimensionality reduction. Accordingly, the ResNet block 2 comprises (a) a second set of weight layers F2, comprising (i) a second pair of convolution layers having f2 number of filters, and (ii) a first maxpool layer, a fourth ReLU activation layer and a third batch normalization layer between the second pair of convolution layers; and (b) a skip layer comprising a second convolution layer having a single filter followed by a second maxpool layer.
In accordance with an embodiment of the present disclosure, the step of processing the input tensor to obtain the final output tensor of step 406 comprises processing the input tensor by the ResNet block 1 in each set of ResNet block 1 and further processing an output of a last ResNet block 1 in each set of ResNet block 1 by a subsequent ResNet block 2 in each set of ResNet block 2. With reference to FIG.2 and FIG.3A, the input tensor to the ResNet block 1 goes through batch normalization and is further processed to generate a first intermediate tensor. For instance, if the input tensor is x, then the first intermediate tensor is y=F1(x). A sum of the input tensor and the first intermediate tensor (x+F1(x)) is applied to a third ReLU activation layer to obtain an intermediate output tensor (also referred to interchangeably as the output tensor of the associated ResNet block) having a same dimension as the input tensor. The intermediate output tensor then becomes the input tensor to a subsequent ResNet block 1 in the set of ResNet block 1.
Again with reference to FIG.2 and FIG.3B, the intermediate output tensor of the ResNet block 1 from the set of ResNet block 1 preceding the ResNet block 2 serves as the input tensor to the ResNet block 2. The input tensor to the ResNet block 2 goes through batch normalization and is further processed to generate a second intermediate tensor. Instead of an identity connection, the input tensor to the ResNet block 2 passes through the second convolution layer having the single filter followed by the second maxpool layer to reduce its dimension before being added with the output of the weight layers F2. A sum of the input tensor to the ResNet block 2 with reduced dimension and the second intermediate tensor is applied to a fifth ReLU activation layer to obtain an intermediate output tensor of the ResNet block 2 that serves as the input tensor to the ResNet block 1 from the set of ResNet block 1 succeeding the ResNet block 2 as shown in FIG.2.
The number of filters and kernel sizes mentioned in FIG.2 are exemplary values provided for an embodiment of the present disclosure. As shown in FIG.2, the received ECG data is first applied to the head block comprising the first batch normalization layer, the first convolution layer having 16 filters and the first ReLU activation layer to obtain the input tensor. The sets of ResNet block 1 have 2, 3, 4 and 2 ResNet block 1 with 16, 32, 64 and 128 filters respectively, arranged as illustrated. In between two sets of ResNet block 1, where the number of filters is increased, a ResNet block 2 is applied to reduce the dimension of the features.
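For illustration, the two block types may be sketched with the Keras functional API as follows. The sketch assumes depthwise separable 1D convolutions (as described later), the exemplary 5x1 kernels and 2x1 pooling windows, and a broadcast addition of the single-filter skip path onto the weight-layer output; the ordering of layers inside F1/F2 and the function names are assumptions, not a definitive implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def resnet_block_1(x, f1):
    # Weight layers F1, preceded by the batch normalization applied before every block
    y = layers.BatchNormalization()(x)
    y = layers.SeparableConv1D(f1, 5, padding="same")(y)
    y = layers.ReLU()(y)                                    # second ReLU activation layer
    y = layers.BatchNormalization()(y)                      # second batch normalization layer
    y = layers.SeparableConv1D(f1, 5, padding="same")(y)
    # Identity skip: output = ReLU(x + F1(x)), same dimension as the input tensor
    return layers.ReLU()(layers.Add()([x, y]))

def resnet_block_2(x, f2):
    # Weight layers F2 with a maxpool between the pair of convolutions for dimension reduction
    y = layers.BatchNormalization()(x)
    y = layers.SeparableConv1D(f2, 5, padding="same")(y)
    y = layers.MaxPooling1D(pool_size=2)(y)                 # first maxpool layer
    y = layers.ReLU()(y)                                    # fourth ReLU activation layer
    y = layers.BatchNormalization()(y)                      # third batch normalization layer
    y = layers.SeparableConv1D(f2, 5, padding="same")(y)
    # Skip layer: single-filter convolution followed by a maxpool to reduce the input dimension
    skip = layers.Conv1D(1, 5, padding="same")(x)
    skip = layers.MaxPooling1D(pool_size=2)(skip)           # second maxpool layer
    return layers.ReLU()(y + skip)                          # broadcast add, then fifth ReLU
```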
The CNN based ResNet architecture forming the baseline model of the present disclosure, along with the attention mechanism, enables classification of heart rhythms in the ECG data as normal sinus rhythms, AF, and other abnormal rhythms. Accordingly, the one or more hardware processors 104 are configured to calculate a plurality of attention weights, at step 408, by applying the final output tensor obtained in step 406 to an attention layer using tanh and softmax functions, wherein the plurality of attention weights are associated with key features in a feature map. The step of calculating the plurality of attention weights by applying the final output tensor to the attention layer is preceded by reshaping the feature map associated with the final output tensor since the attention layer requires a single dimension. The attention layer enables the network to focus on key locations in the ECG data and extract, from the complex feature map, the relevant features required for classification. The one or more hardware processors 104 are configured to flatten and apply, at step 410, an output of the attention layer to a fully connected layer followed by a softmax layer for classifying heart rhythms. In an embodiment, the fully connected layer or dense layer has 64 nodes and a ReLU activation function. In an embodiment, the kernel dimension is selected as 5x1 throughout the architecture with a stride length of 1. In an embodiment, the pooling window for maxpool operations in the ResNet block 2 is taken as 2x1.
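A sketch of the attention layer and classifier head, together with an exemplary assembly of the complete baseline model, is given below. The tanh/softmax weighting shown is one common soft-attention formulation and is an assumption (the precise attention computation of the disclosure may differ); the 64-node dense layer, the three target classes, the 2/3/4/2 sets of ResNet block 1 with 16/32/64/128 filters and the 35-second input at 100 Hz follow the exemplary embodiment, while the filter counts assigned to the ResNet block 2 instances are assumed to match the succeeding set.

```python
import tensorflow as tf
from tensorflow.keras import layers

def attention_classifier(final_output_tensor, num_classes=3):
    # tanh scoring and softmax normalization give the plurality of attention weights
    # (any required reshaping of the feature map is assumed to have been done upstream)
    scores = layers.Dense(1, activation="tanh")(final_output_tensor)
    weights = layers.Softmax(axis=1)(scores)
    context = layers.Dot(axes=1)([weights, final_output_tensor])  # weights applied to key features
    x = layers.Flatten()(context)                                  # flatten the attention output
    x = layers.Dense(64, activation="relu")(x)                     # fully connected layer, 64 nodes
    return layers.Dense(num_classes, activation="softmax")(x)      # softmax layer over rhythm classes

def build_baseline_model(input_length=3500):                       # 35 s of ECG sampled at 100 Hz
    inp = layers.Input(shape=(input_length, 1))
    x = head_block(inp)                                            # sketch given earlier
    for i, (n_blocks, filters) in enumerate([(2, 16), (3, 32), (4, 64), (2, 128)]):
        if i > 0:
            x = resnet_block_2(x, filters)                         # dimension reduction between sets
        for _ in range(n_blocks):
            x = resnet_block_1(x, filters)                         # feature extraction
    return tf.keras.Model(inp, attention_classifier(x))
```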
The convolution operations in the ResNet blocks are performed using the depthwise separable convolution algorithm proposed in Mobilenets: Efficient convolutional neural networks for mobile vision applications by Howard et al. In depthwise separable convolution, a depthwise spatial convolution is first performed on each input channel separately, followed by a pointwise convolution that mixes the resulting output channels. It is an efficient way of performing the convolution task with fewer mathematical operations, which also results in a smaller number of trainable parameters. If 1D input data of dimension l x 1 with c input channels is convolved with a kernel of dimension k x 1 to produce p output channels, the NN takes l*k*c*p mathematical operations and k*c*p trainable parameters to perform the standard convolution operation.
In depthwise separable convolution, the NN takes l*k*c operations and k*c trainable parameters to perform the depthwise convolution over the input channels. An additional l*c*p operations are needed to perform the pointwise convolution for all output channels, which requires c*p trainable parameters. Hence, in depthwise separable convolution, the total number of mathematical operations is l*c*(k+p) and the number of trainable parameters is c*(k+p). In comparison to the standard convolution, the number of operations and trainable parameters are both reduced by a factor of k*p/(k+p). The resulting model requires less memory space to store. It also ensures faster model inference on resource-constrained tiny devices or edge devices due to fewer mathematical operations.
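The counts above can be checked numerically with the small helper below; the concrete values of l, k, c and p passed in the example are illustrative and not taken from the disclosure.

```python
def conv_costs(l, k, c, p):
    # Operation and parameter counts for standard vs. depthwise separable 1D convolution
    std_ops, std_params = l * k * c * p, k * c * p                  # standard convolution
    dw_ops, dw_params = l * k * c, k * c                            # depthwise convolution
    pw_ops, pw_params = l * c * p, c * p                            # pointwise convolution
    sep_ops, sep_params = dw_ops + pw_ops, dw_params + pw_params    # = l*c*(k+p) and c*(k+p)
    return std_ops / sep_ops, std_params / sep_params               # both equal k*p/(k+p)

print(conv_costs(l=3500, k=5, c=16, p=32))                          # ~ (4.32, 4.32)
```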
In accordance with an embodiment of the present disclosure, the baseline model is trained end-to-end to minimize the categorical cross-entropy loss using an Adam optimizer. The training was done for 200 epochs with a learning rate of 0.0005. The mini-batch size was taken as 64. The NN was implemented in Python® 3.8.10 using the TensorFlow 2.6.0 library. Initial weights of the convolution and the dense layers were set using Xavier initialization, which ensures that the variance of activations is the same across every layer. The bias terms were initialized with zeros. The training was done on a computer having an Intel® Xeon® 16-core processor, 64 GB of RAM, and an NVidia GeForce GTX 1080 Ti graphics processing unit.
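For illustration, this training configuration may be expressed as follows; x_train and y_train are placeholders for the prepared 35-second ECG windows and their one-hot rhythm labels, and Keras layers already default to Glorot (Xavier) kernel initialization with zero biases.

```python
import tensorflow as tf

model = build_baseline_model()                                    # sketch given earlier
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
              loss="categorical_crossentropy",                    # categorical cross-entropy loss
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=200, batch_size=64)            # 200 epochs, mini-batch size 64
```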
The baseline model explained above has a model size of 1.8 MB, which needs to be significantly compressed in order to run on low-powered commercial microcontrollers having a RAM size of a few hundred kilobytes. Microcontrollers can handle only a limited number of deep network architectures. In accordance with the present disclosure, the architecture as well as the number of trainable parameters in the baseline model are restricted to facilitate effective optimization without compromising performance of the model. Accordingly, the one or more hardware processors 104 are configured to optimize the baseline model for a tiny device (or an edge device), at step 412, using a plurality of optimization approaches or techniques collaboratively, which provides a much smaller and faster model. The plurality of optimization approaches include (i) weight pruning on the baseline model, (ii) weight clustering of a pruned baseline model and (iii) applying quantization aware training to a pruned and clustered baseline model to obtain an integer-only model. By combining the various optimization approaches explained hereinafter, a best balance of target characteristics such as inference speed, model size and accuracy is achieved.
In accordance with the present disclosure, the baseline model is first optimized by applying weight pruning. A significant fraction of the weights in large neural networks have very small values and generally have minimal impact on overall model performance. Magnitude-based weight pruning introduces sparsity to different layers in the network by eliminating a few elements. This is done by gradually zeroing out some of the low-magnitude weights based on their L2-norm. Sparse models are easy to compress and occupy less memory space in the target device. Sparsity is introduced to the baseline model in an iterative manner, and the corresponding impact on overall performance is noted. Being critical feature extraction layers, the attention mechanism, the dense layer, and the subsequent softmax layer are excluded from pruning. First, 10% sparsity is added to a selected layer of the baseline model and then the amount is gradually increased using a polynomial decay function, eventually stopping at 40% sparsity. In every step, the sparse model is retrained for 20 epochs using an Adam optimizer at a learning rate of 0.00005 to fine-tune the performance.
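A sketch of this pruning stage using the TensorFlow Model Optimization toolkit is given below; the layer-selection rule and the step counts of the polynomial decay schedule are illustrative assumptions, while the 10% to 40% sparsity ramp and the 20-epoch retraining at a learning rate of 0.00005 follow the description above.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.10, final_sparsity=0.40,                   # 10% ramped up to 40% sparsity
    begin_step=0, end_step=2000)                                  # step counts are illustrative

def maybe_prune(layer):
    # Skip the critical layers (attention, dense and softmax) from pruning
    if isinstance(layer, (tf.keras.layers.Conv1D, tf.keras.layers.SeparableConv1D)):
        return tfmot.sparsity.keras.prune_low_magnitude(layer, pruning_schedule=schedule)
    return layer

pruned_model = tf.keras.models.clone_model(model, clone_function=maybe_prune)
pruned_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.00005),
                     loss="categorical_crossentropy", metrics=["accuracy"])
pruned_model.fit(x_train, y_train, epochs=20, batch_size=64,      # fine-tune the sparse model
                 callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
pruned_model = tfmot.sparsity.keras.strip_pruning(pruned_model)   # remove the pruning wrappers
```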
The pruned model is further compressed by weight clustering, which reduces the number of unique weight values in the model. The weights in a particular layer are divided into N different clusters using the K-Means algorithm. All weight values in a cluster are represented by the corresponding cluster centroid. A smaller number of clusters creates a more compressed model, but with a negative impact on model accuracy. In accordance with an embodiment of the present disclosure, the weight values in each layer of the pruned model are divided into 24 clusters to obtain optimum performance. The cluster centroids are initialized by the K-Means++ algorithm. The resulting model is fine-tuned by retraining for 20 epochs at a learning rate of 0.00003 using an Adam optimizer and the performance is noted. Similar to weight pruning, the critical layers are again excluded from clustering.
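A corresponding sketch of the clustering stage is shown below, using the settings stated above (24 clusters, K-Means++ centroid initialization, 20 epochs of fine-tuning at a learning rate of 0.00003); excluding the critical layers, done analogously to pruning through a clone function, is omitted here for brevity.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

clustered_model = tfmot.clustering.keras.cluster_weights(
    pruned_model,
    number_of_clusters=24,                                        # 24 clusters per layer
    cluster_centroids_init=tfmot.clustering.keras.CentroidInitialization.KMEANS_PLUS_PLUS)
clustered_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.00003),
                        loss="categorical_crossentropy", metrics=["accuracy"])
clustered_model.fit(x_train, y_train, epochs=20, batch_size=64)   # fine-tune the clustered model
clustered_model = tfmot.clustering.keras.strip_clustering(clustered_model)
```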
Many microcontrollers do not have direct hardware support for floating point arithmetic. In accordance with the present disclosure, as part of the collaborative optimization, an integer-only model is finally obtained from the pruned and clustered model by applying quantization aware training. In quantization aware training, the weights are reduced from 32-bit floating point to 8-bit integers, which results in an approximately 4x smaller model. Integer-based models also have an improved CPU latency on microcontrollers. Lowering the precision from floating point can have a negative impact on accuracy. Hence, in accordance with the present disclosure, the model is again fine-tuned by retraining so that the quantization error is mitigated via backpropagation of the error. The following scale is defined to map the weight values in the floating point range to the values in the quantized range in each layer.
scale = (f_max - f_min) / (q_max - q_min)
Here, f_max and f_min represent the maximum and minimum values in floating point precision, and q_max and q_min represent the maximum and minimum values in the quantized range.
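A sketch of the quantization aware training stage and of the scale mapping defined above is given below; the retraining length and learning rate are assumed to mirror the earlier fine-tuning stages, q_max and q_min default to the signed 8-bit integer range, and all layers are assumed to be supported by the toolkit's default quantization scheme.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

qat_model = tfmot.quantization.keras.quantize_model(clustered_model)   # insert fake-quant nodes
qat_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.00003),
                  loss="categorical_crossentropy", metrics=["accuracy"])
qat_model.fit(x_train, y_train, epochs=20, batch_size=64)              # backpropagation mitigates the quantization error

def quantization_scale(f_max, f_min, q_max=127, q_min=-128):
    # scale = (f_max - f_min) / (q_max - q_min): maps floating point values to the quantized range
    return (f_max - f_min) / (q_max - q_min)
```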
The TensorFlow Lite library is used for model optimization into a compressed TFLite format, and the deployable microcontroller-equivalent C++ libraries are created using TensorFlow Lite for Microcontrollers provided by David, R. et al. in Embedded machine learning on TinyML systems. The final model has a size of 144 KB, which is around 12x smaller and 8x faster than the baseline model. Training of the baseline model, optimization, and conversion to the equivalent TFLite model were done on a desktop. The optimized light-weight model was tested on two target hardware platforms. The initial proof of concept was done on a Raspberry Pi 3 Model B+ and the final deployment was done on an Arduino Nano 33 BLE Sense. The Raspberry Pi has a Cortex-A53 (ARMv8) 64-bit processor at a clock speed of 1.4 GHz and 1 GB of RAM. The Arduino Nano 33 is a microcontroller-based development board having an operating voltage of 3.3 V. It is provided with an ARM Cortex-M4 processor at a clock speed of 64 MHz. It has 256 KB of RAM and 1 MB of flash memory, which is enough to store and load the optimized model of 144 KB to make inferences.
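For illustration, the conversion of the optimized model into a compressed, integer-only TFLite flatbuffer (which is then turned into the equivalent C++ source for TensorFlow Lite for Microcontrollers) may proceed as follows; the representative dataset generator and the output file name are placeholders.

```python
import tensorflow as tf

def representative_data_gen():
    for window in x_train[:200]:                                  # a few representative ECG windows
        yield [window[None, ...].astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8                          # integer-only inputs and outputs
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
with open("heart_rhythm_int8.tflite", "wb") as f:
    f.write(tflite_model)
# The .tflite file is then converted to a C array (e.g., with xxd -i) and compiled against
# the TensorFlow Lite for Microcontrollers runtime for deployment on the target board.
```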
The optimization approaches are lossy, causing a performance drop at every stage. Hence, the error in the resulting model after every optimization is compensated by retraining to fine-tune the performance. The retraining is done at a learning rate smaller than that of the baseline model. In each stage of optimization, while compressing the model, a maximum allowable performance drop is set based on the application or domain knowledge. In accordance with the present disclosure, after each optimization approach, the maximum allowable performance drop is set to be within a predefined threshold compared to the performance after a previous optimization approach to balance the size of the resulting integer-only model versus accuracy of classification. In an embodiment, the predefined threshold is 1%.
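Interpreting the predefined threshold as an absolute drop of 0.01 in the overall F1-score, the acceptance check applied after each optimization stage can be sketched as:

```python
def within_allowed_drop(prev_score, new_score, threshold=0.01):
    # Accept the compressed model only if the performance drop stays within the threshold
    return (prev_score - new_score) <= threshold

# Example with the stage-wise scores reported later in Table 3 (baseline -> pruned model)
assert within_allowed_drop(prev_score=0.910, new_score=0.902)
```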
EXPERIMENTAL RESULTS
The PhysioNet Challenge 2017 training database provided by Clifford, G.D. et al. was used for training and evaluation of the NN of the present disclosure. The annotated dataset has 8528 single-lead ECG segments with four target labels or classes. In the highly imbalanced database, 5124 recordings are normal sinus rhythms, 771 recordings are AF, 2557 recordings are other types of abnormal rhythms, and the remaining 46 recordings are too noisy to annotate. The original signals were sampled at 300 Hz. The noisy recordings were omitted from the study because of the small amount of available data. Single-lead ECGs are in general noisier than standard 12-lead data and hence the classification is more challenging. 80% of all data from the database was randomly selected to form the training set and the remaining portion was kept as the test set. Tuning of the various network hyper-parameters of the baseline model was done in a random search manner applying 5-fold cross validation on the training data. Training of the final baseline model and retraining during optimization were done on the entire training data before final evaluation on the test set.
The duration of the original recordings varies from 9 seconds to 61 seconds with a mean duration of 32.5 seconds. Longer data contains more disease markers, but the higher computational latency compromises real-time classification performance on the target platform. The input data duration was selected as 35 seconds in the NN of the present disclosure. The shorter recordings in the database were appended on the time-axis to obtain the desired input length, whereas the longer recordings were truncated into multiple independent segments. It was strictly enforced that multiple segments obtained from the same original recording were not mixed up between the training and test sets and also during cross validation analysis. The original signals were down-sampled to 100 Hz to reduce the computational load of the network. In order to improve the diversity of the training set for a generalized performance, various data augmentation techniques like addition of white Gaussian noise, band-pass filtering, baseline shift etc. were incorporated to extend the amount of data in the training set.
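A sketch of this input preparation (down-sampling from 300 Hz to 100 Hz and framing into 35-second windows) is shown below; the use of scipy.signal.resample_poly is an illustrative choice, and the augmentation steps are omitted.

```python
import numpy as np
from scipy.signal import resample_poly

FS_TARGET, WIN_SECONDS = 100, 35
WIN_SAMPLES = FS_TARGET * WIN_SECONDS                        # 3500 samples per input window

def prepare_windows(ecg, fs_in=300):
    x = resample_poly(ecg, FS_TARGET, fs_in)                 # down-sample 300 Hz -> 100 Hz
    if len(x) < WIN_SAMPLES:
        # Shorter recordings are appended on the time-axis to reach the desired length
        reps = int(np.ceil(WIN_SAMPLES / len(x)))
        return [np.tile(x, reps)[:WIN_SAMPLES]]
    # Longer recordings are truncated into multiple independent segments
    n_segments = len(x) // WIN_SAMPLES
    return [x[i * WIN_SAMPLES:(i + 1) * WIN_SAMPLES] for i in range(n_segments)]
```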
Classification performance of the baseline model: The classification performance is reported in terms of the F1-scores for detecting normal rhythms (F1_norm), AF (F1_af), and other abnormal rhythms (F1_oth). The overall F1-score (F1_chal), the metric used to form the leader-board of the PhysioNet Challenge 2017, is also provided. The metric is the mean of the F1-scores for the three target classes.
F1_chal = (F1_norm + F1_af + F1_oth) / 3
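For illustration, the per-class and overall F1-scores can be computed as follows; the integer label encoding (0 = normal, 1 = AF, 2 = other) is an assumption.

```python
from sklearn.metrics import f1_score

def challenge_scores(y_true, y_pred):
    # Per-class F1-scores for normal, AF and other rhythms, and their mean F1_chal
    f1_norm, f1_af, f1_oth = f1_score(y_true, y_pred, labels=[0, 1, 2], average=None)
    return {"F1_norm": f1_norm, "F1_af": f1_af, "F1_oth": f1_oth,
            "F1_chal": (f1_norm + f1_af + f1_oth) / 3}
```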
Table 1 below summarizes the average classification performance in terms of F1-scores obtained by applying 5-fold cross validation on the training set and also reports the performance achieved on the test set when the NN is trained on the entire training set. The baseline model of the present disclosure is compared with a plain CNN architecture having a similar structure in terms of the different layers, including the attention mechanism, but without any skip connections.
Table 1: Classification performance of the baseline ResNet in comparison with a plain CNN model having a similar architecture on the PhysioNet Challenge 2017 database.

Architecture | Average F1-scores in a 5-fold cross validation on the training set | Performance on the test set
Plain CNN structure without skip connections | F1_norm = 0.95, F1_af = 0.80, F1_oth = 0.92, F1_chal = 0.89 | F1_norm = 0.95, F1_af = 0.76, F1_oth = 0.89, F1_chal = 0.87
Baseline model of the present disclosure | F1_norm = 0.97, F1_af = 0.87, F1_oth = 0.94, F1_chal = 0.93 | F1_norm = 0.96, F1_af = 0.84, F1_oth = 0.93, F1_chal = 0.91
It is noted that although both NNs are quite similar in terms of overall architecture and number of trainable parameters, the baseline model using CNN based ResNet blocks provides a much better classification performance due to the skip connections that ensure better feature learning in the deep architecture. The improvement achieved by the ResNet architecture over the plain CNN model is particularly visible in the detection of AF, which is the minority class in the database. FIG.5A through FIG.5C illustrate Receiver Operating Characteristic (ROC) curves of the baseline ResNet model and the corresponding Area Under the Curve (AUC) values for all three target classes, normal sinus rhythm, Atrial Fibrillation (AF) and other abnormal rhythms, on the test set of the PhysioNet database, in accordance with some embodiments of the present disclosure.
Table 2 below shows that the baseline model of the present disclosure outperforms a number of popular conventional approaches that reported their accuracy on the PhysioNet Challenge 2017 database using deep architectures like ResNet, CNN, Bi-LSTM, and Neural Architecture Search (NAS). For comparison, the performance reported by the prior approaches on the publicly available training part of the PhysioNet Challenge 2017 data was considered.
Table 2: Comparison of the proposed baseline ResNet model with prior approaches reported on the PhysioNet Challenge 2017 database.
Authors | Reference | Reported performance in F1_chal
Warrick et al. | 2017 Computing in Cardiology (CinC) | Overall F1-score = 0.83
Plesinger et al. | Parallel use of a convolutional neural network and bagged tree ensemble for the classification of Holter ECG | Overall F1-score = 0.83
Shi et al. | Automated atrial fibrillation detection based on feature fusion using discriminant canonical correlation analysis | Overall F1-score = 0.88
Najmeh Fayyazifar | An accurate CNN architecture for atrial fibrillation detection using neural architecture search | Overall F1-score = 0.82
Jiang et al. | Hybrid attention-based deep learning network for automated arrhythmia classification | Overall F1-score = 0.88 using cross validation
Baseline model of the present disclosure | - | Overall F1-score = 0.93 using cross validation and 0.91 on the test set
A brief description of the methodology in each of the references above is provided below.
Warrick et al.: The approach used a combination of CNNs and a sequence of Long Short-Term Memory (LSTM) units, with pooling, dropout and normalization techniques to design the classifier.
Plesinger et al.: The authors used two machine learning methods in parallel, a Bagged Tree Ensemble (BTE) process and a CNN connected to a shallow neural network. The two classifiers are combined for final prediction.
Shi et al.: The authors proposed discriminant canonical correlation analysis-based feature fusion, which integrates traditional features extracted by expert knowledge and deep learning features extracted by the residual network and gated recurrent unit network for classification.
Najmeh Fayyazifar: A Neural Architecture Search (NAS) algorithm was designed to discover an accurate classifier using CNN and RNN operations.
Jiang et al.: A hybrid attention-based deep learning network was proposed using residual network and bidirectional long short-term memory to obtain fusion features containing local and global information and improve the interpretability of the model through the attention mechanism.
Baseline model of the present disclosure: Residual network with attention mechanism.
Classification performance of the optimized model: A compressed deep learning model is small enough to run on resource-constrained target hardware (e.g., a mobile communication device, microcontroller(s), processing unit(s), and the like), but the performance is often compromised compared to the baseline model. A trade-off between model size and classification performance needs to be maintained during optimization. FIG.6A illustrates the impact of weight pruning on the test set by gradually increasing the amount of sparsity in the baseline ResNet model, in accordance with some embodiments of the present disclosure. In the plot, the model performance is shown in terms of the challenge metric (F1_chal). It was observed that there is a significant drop in classification performance only when the amount of sparsity is more than 40%. Hence, 40% sparsity was added to the model. FIG.6B illustrates the impact of weight clustering on the pruned model, in accordance with some embodiments of the present disclosure. The optimization was started with 64 clusters in each layer to divide the weights. In spite of producing good classification performance, this caused no compression at all. Subsequently, the number of clusters was reduced. The optimum performance with a reduced model size was achieved when 24 clusters were used. Table 3 summarizes the impact of the collaborative optimization described in the present disclosure at various stages in terms of model size and overall F1-score (F1_chal) on the test set.
Table 3: Stage-wise impact of collaborative optimization on the baseline model
Stage | Optimization technique | Overall F1-score (F1_chal) | Resultant model size
0 | Baseline model (no optimization) | 0.910 | 1.758 MB
1 | After weight pruning by adding 40% sparsity in each layer | 0.902 | 632 KB
2 | After weight clustering via 24 clusters in each layer | 0.894 | 416 KB
3 | After quantization aware training (resulting integer-only model) | 0.885 | 144 KB
The baseline model has a size of 1.758 MB, which cannot be deployed on the target microcontroller having a RAM size of 256 KB and 1 MB of flash memory. Apart from loading the model, the RAM needs to have available memory space for storing the input ECG data and various intermediate variables to make an inference. In stage 1, the compressed model size gets reduced to 632 KB after magnitude-based weight pruning. In stage 2, the pruned model is further reduced to 416 KB after weight clustering. The final model size becomes 144 KB after quantization aware training, which is around 12x smaller than the baseline model. The final model reports an overall F1-score (F1_chal) of 0.885 on the test set, which still outperforms the prior approaches discussed in Table 2 above, but with a much smaller model size.
Deployment on target microcontroller: The optimized model of the present disclosure was used to design an end-to-end prototype system for on-device cardiac monitoring using commercially available components. MAX86150, an integrated ECG-PPG breakout board, was used for recording of the ECG data. The board has an operating voltage of 3.3 V and has three leads with disposable electrodes for attachment to the human body for recording of the ECG data as an analogue voltage. It communicates via the I2C interface with an Arduino Nano 33 BLE Sense microcontroller, which hosts the optimized deep learning model trained and evaluated on the PhysioNet Challenge database. The recorded data was sampled at 100 Hz and the continuous data-stream is sent to the microcontroller for making an inference on every 35 seconds of accumulated data. A five-point moving average filter was applied to the input data for noise cleaning. TensorFlow Lite for Microcontrollers was used to convert the TensorFlow model into the equivalent C++ libraries for Arduino. Since TensorFlow Lite has a limited number of supported APIs for deep learning models, a few layers of the model (e.g., the attention mechanism and batch normalization) were rewritten and slightly modified to maintain the desired performance on the target platform. Real-time performance of the prototype system was evaluated on a small population of 10 consenting subjects including normal subjects, subjects having chronic AF, and subjects having other types of arrhythmias like bradycardia, tachycardia etc. The optimized model of the present disclosure achieves a classification accuracy of 90% on the small test population. The average inference latency for a 35 seconds long ECG window is measured as 243 milliseconds on the Arduino Nano board.
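A simplified Python sketch of the corresponding streaming inference loop, as it could be run for the Raspberry Pi proof of concept, is given below (the Arduino deployment uses the equivalent C++ TensorFlow Lite for Microcontrollers code); the ECG acquisition function read_ecg_samples is a placeholder for the MAX86150 data stream, and the class names are assumptions.

```python
import numpy as np
import tflite_runtime.interpreter as tflite          # or tf.lite.Interpreter on a desktop

CLASSES = ["normal sinus rhythm", "AF", "other abnormal rhythm"]
interpreter = tflite.Interpreter(model_path="heart_rhythm_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def classify_window(window):
    # Five-point moving average filter for noise cleaning
    window = np.convolve(window, np.ones(5) / 5, mode="same").reshape(1, -1, 1)
    scale, zero_point = inp["quantization"]           # map floats into the int8 input range
    x = np.round(window / scale + zero_point).astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    return CLASSES[int(np.argmax(interpreter.get_tensor(out["index"])[0]))]

while True:
    window = read_ecg_samples(n_samples=3500)         # 35 s of ECG at 100 Hz (placeholder)
    print(classify_window(window))
```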
Cardiovascular diseases are the key reason behind significant mortality and morbidity, causing 32% of all global deaths every year. They are often asymptomatic in the early stages. Many patients seek medical attention in an advanced stage of a cardiac disease, which not only requires a prolonged hospital stay or a possible surgery but also reduces the chance of recovery. Artificial Intelligence (AI) driven on-device health monitoring systems are thus increasingly gaining attention in recent times as part of preventive healthcare. In spite of deployment challenges, such systems can ensure very low latency, reduced power consumption, and increased security and privacy, which is particularly important in healthcare applications dealing with sensitive patient information. The CNN based ResNet architecture provided in the present disclosure was successfully evaluated on a large public database. Subsequently, the baseline model was optimized as explained above to realize a system for real-time analysis of received ECG data using a commercially available microcontroller. The experimental results above indicate that the baseline model as well as the optimized model outperform a number of conventional approaches.
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
Claims:
We Claim:
1. A processor implemented method (400) comprising:
receiving, via one or more hardware processors, an electrocardiogram (ECG) data in real-time from a subject (402);
applying, via the one or more hardware processors, the received ECG data to a head block of a Neural Network, wherein the head block comprises a first batch normalization layer, a first convolution layer and a first Rectified Linear Unit (ReLU) activation layer to obtain an input tensor (404);
processing the input tensor, using a baseline model executed by the one or more hardware processors, to obtain a final output tensor, wherein the baseline model comprises four sets of a Neural Network block 1 and three sets of a Neural Network block 2, positioned alternating each other, each set comprising one or more Neural Network blocks (406);
calculating a plurality of attention weights, via the one or more hardware processors, by applying the final output tensor to an attention layer using tanh and softmax functions, wherein the plurality of attention weights are associated with key features in a feature map (408);
flattening and applying, via the one or more hardware processors, an output of the attention layer associated with the key features in the feature map, to a fully connected layer followed by a softmax layer for classification of heart rhythms (410); and
optimizing the baseline model for a tiny device, via the one or more hardware processors, using a plurality of optimization approaches collaboratively, wherein the plurality of optimization approaches include (i) weight pruning on the baseline model, (ii) weight clustering of a pruned baseline model and (iii) applying quantization aware training to a pruned and clustered baseline model to obtain an integer-only model such that a maximum allowable performance drop after each optimization approach is within a predefined threshold compared to the performance after a previous optimization approach to balance the size of a resulting integer-only model versus accuracy of the classification of the heart rhythms (412).
2. The processor implemented method as claimed in claim 1, wherein the Neural Network block 1 is a Convolutional Neural Network (CNN) based Residual Network (ResNet) block 1 comprising a set of weight layers F1, wherein F1 comprises (i) a first pair of convolution layers having f1 number of filters, (ii) a second ReLU activation layer and a second batch normalization layer between the first pair of convolution layers; and the Neural Network block 2 is a CNN based ResNet block 2 comprising (a) a second set of weight layers F2, comprising (i) a second pair of convolution layers having f2 number of filters, and (ii) a first maxpool layer, a fourth ReLU activation layer and a third batch normalization layer between the second pair of convolution layers; and (b) a skip layer comprising a second convolution layer having a single filter followed by a second maxpool layer.
3. The processor implemented method as claimed in claim 2, wherein the four sets of the ResNet block 1 include (i) two ResNet block 1, (ii) three ResNet block 1, (iii) four ResNet block 1 and (iv) two ResNet block 1, respectively, and each of the three sets of the ResNet block 2 includes one ResNet block 2.
4. The processor implemented method as claimed in claim 2, wherein the step of processing the input tensor to obtain the final output tensor comprises:
processing the input tensor by the ResNet block 1 in each set of ResNet block 1 by:
using the input tensor to the ResNet block 1 to generate a first intermediate tensor; and
applying a sum of the input tensor and the first intermediate tensor to a third ReLU activation layer to obtain an intermediate output tensor having a same dimension as the input tensor, wherein the intermediate output tensor is the input tensor to a subsequent ResNet block 1 in the set of ResNet block 1; and
processing the input tensor by the ResNet block 2 in each set of ResNet block 2 by:
using the intermediate output tensor of the ResNet block 1 from the set of ResNet block 1 preceding the ResNet block 2 as the input tensor to the ResNet block 2 to generate a second intermediate tensor;
reducing dimension of the input tensor to the ResNet block 2, by passing through the skip layer; and
applying a sum of the input tensor to the ResNet block 2 with reduced dimension and the second intermediate tensor to a fifth ReLU activation layer to obtain an intermediate output tensor of the ResNet block 2 that serves as the input tensor to the ResNet block 1 from the set of ResNet block 1 succeeding the ResNet block 2,
wherein an output tensor of a last ResNet block 1 in the four sets of ResNet block 1 is the final output tensor.
5. The processor implemented method as claimed in claim 1, wherein the step of calculating a plurality of attention weights by applying the final output tensor to an attention layer is preceded by reshaping a feature map associated with the final output tensor.
6. A system (100) comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
receive electrocardiogram (ECG) data in real-time from a subject;
apply the received ECG data to a head block of a Neural Network, wherein the head block comprises a first batch normalization layer, a first convolution layer and a first Rectified Linear Unit (ReLU) activation layer to obtain an input tensor;
process the input tensor, using a baseline model executed by the one or more hardware processors, to obtain a final output tensor, wherein the baseline model comprises four sets of a Neural Network block 1 and three sets of a Neural Network block 2, positioned alternately with each other, each set comprising one or more Neural Network blocks;
calculate a plurality of attention weights by applying the final output tensor to an attention layer using tanh and softmax functions, wherein the plurality of attention weights are associated with key features in a feature map;
flatten and apply an output of the attention layer associated with the key features in the feature map to a fully connected layer followed by a softmax layer for classification of heart rhythms; and
optimize the baseline model for a tiny device using a plurality of optimization approaches collaboratively, wherein the plurality of optimization approaches include (i) weight pruning on the baseline model, (ii) weight clustering of a pruned baseline model and (iii) applying quantization aware training to a pruned and clustered baseline model to obtain an integer-only model such that a maximum allowable performance drop after each optimization approach is within a predefined threshold compared to the performance after a previous optimization approach to balance the size of a resulting integer-only model versus accuracy of the classification of the heart rhythms.
7. The system as claimed in claim 6, wherein the Neural Network block 1 is a Convolutional Neural Network (CNN) based Residual Network (ResNet) block 1 comprising a set of weight layers F1, wherein F1 comprises (i) a first pair of convolution layers having f1 number of filters, (ii) a second ReLU activation layer and a second batch normalization layer between the first pair of convolution layers; and the Neural Network block 2 is a CNN based ResNet block 2 comprising (a) a second set of weight layers F2, comprising (i) a second pair of convolution layers having f2 number of filters, and (ii) a first maxpool layer, a fourth ReLU activation layer and a third batch normalization layer between the second pair of convolution layers; and (b) a skip layer comprising a second convolution layer having a single filter followed by a second maxpool layer.
8. The system as claimed in claim 7, wherein the four sets of the ResNet block 1 include (i) two ResNet block 1, (ii) three ResNet block 1, (iii) four ResNet block 1 and (iv) two ResNet block 1, respectively, and each of the three sets of the ResNet block 2 includes one ResNet block 2.
9. The system as claimed in claim 7, wherein the one or more processors are configured to process the input tensor to obtain the final output tensor by:
processing the input tensor by the ResNet block 1 in each set of ResNet block 1 by:
using the input tensor to the ResNet block 1 to generate a first intermediate tensor; and
applying a sum of the input tensor and the first intermediate tensor to a third ReLU activation layer to obtain an intermediate output tensor having a same dimension as the input tensor, wherein the intermediate output tensor is the input tensor to a subsequent ResNet block 1 in the set of ResNet block 1; and
processing the input tensor by the ResNet block 2 in each set of ResNet block 2 by:
using the intermediate output tensor of the ResNet block 1 from the set of ResNet block 1 preceding the ResNet block 2 as the input tensor to the ResNet block 2 to generate a second intermediate tensor;
reducing dimension of the input tensor to the ResNet block 2, by passing through the skip layer; and
applying a sum of the input tensor to the ResNet block 2 with reduced dimension and the second intermediate tensor to a fifth ReLU activation layer to obtain an intermediate output tensor of the ResNet block 2 that serves as the input tensor to the ResNet block 1 from the set of ResNet block 1 succeeding the ResNet block 2,
wherein an output tensor of a last ResNet block 1 in the four sets of ResNet block 1 is the final output tensor.
10. The system as claimed in claim 6, wherein the one or more processors are configured to reshape a feature map associated with the final output tensor prior to calculating the plurality of attention weights by applying the final output tensor to an attention layer.
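By way of non-limiting illustration only, the following is a minimal sketch of how the baseline model recited in claims 1 to 5 (head block, alternating sets of ResNet block 1 and ResNet block 2, attention layer, fully connected layer and softmax classifier) could be expressed in Keras. The choice of framework, the kernel size of 5, the filter counts f1 = f2 = 32, the 3000-sample single-lead ECG input window, the four output classes and the broadcasting add used for the single-filter skip convolution are assumptions introduced purely for illustration; they are not fixed by the claims, and the sketch is not the applicant's reference implementation.

# Minimal sketch, for illustration only.
# Assumed (not recited in the claims): TensorFlow/Keras, kernel size 5,
# f1 = f2 = 32 filters, a 3000-sample single-lead ECG window, 4 rhythm
# classes, and a broadcasting add for the single-filter skip convolution.
import tensorflow as tf
from tensorflow.keras import layers, Model

def resnet_block_1(x, f1=32, k=5):
    # F1: a pair of convolutions with a ReLU and a batch normalization
    # between them; the block input is added back and passed through a ReLU,
    # so the output keeps the input dimensions (claims 2 and 4).
    y = layers.Conv1D(f1, k, padding="same")(x)
    y = layers.ReLU()(y)
    y = layers.BatchNormalization()(y)
    y = layers.Conv1D(f1, k, padding="same")(y)
    return layers.ReLU()(x + y)

def resnet_block_2(x, f2=32, k=5):
    # F2: a pair of convolutions with a maxpool, a ReLU and a batch
    # normalization between them; the skip path is a single-filter
    # convolution followed by a maxpool (claims 2 and 4).
    y = layers.Conv1D(f2, k, padding="same")(x)
    y = layers.MaxPooling1D(2)(y)
    y = layers.ReLU()(y)
    y = layers.BatchNormalization()(y)
    y = layers.Conv1D(f2, k, padding="same")(y)
    s = layers.Conv1D(1, k, padding="same")(x)   # single filter, per claim 2
    s = layers.MaxPooling1D(2)(s)
    return layers.ReLU()(y + s)                  # broadcast add (assumption)

def build_baseline(input_len=3000, n_classes=4, f=32):
    inp = layers.Input(shape=(input_len, 1))
    # Head block: batch normalization -> convolution -> ReLU (step 404).
    x = layers.BatchNormalization()(inp)
    x = layers.Conv1D(f, 5, padding="same")(x)
    x = layers.ReLU()(x)
    # Four sets of ResNet block 1 (2, 3, 4 and 2 blocks, claim 3) alternating
    # with three single-block sets of ResNet block 2 (claims 1 and 3).
    for i, n_blocks in enumerate([2, 3, 4, 2]):
        for _ in range(n_blocks):
            x = resnet_block_1(x, f)
        if i < 3:
            x = resnet_block_2(x, f)
    # Attention layer using tanh and softmax (step 408); a per-time-step
    # attention score is one plausible reading of the claim, not the only one.
    scores = layers.Dense(1, activation="tanh")(x)
    weights = layers.Softmax(axis=1)(scores)
    x = x * weights               # weight the key features (broadcast multiply)
    # Flatten, fully connected layer and softmax classifier (step 410).
    x = layers.Flatten()(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    return Model(inp, out)

baseline = build_baseline()
baseline.summary()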
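Likewise, a minimal sketch of the collaborative optimization recited in step (412) of claim 1 and in claim 6, assuming the TensorFlow Model Optimization Toolkit is used: magnitude-based weight pruning of the baseline model, weight clustering of the pruned model, quantization aware training of the pruned and clustered model, and conversion to an integer-only TensorFlow Lite model, with the accuracy drop after each step gated against the accuracy after the previous step. The toolkit itself, the sparsity target, the number of clusters, the datasets train_ds and val_ds, the representative-data generator rep_data_gen and the threshold MAX_DROP are assumptions for illustration; in practice some layers may further require explicit quantize annotations or sparsity- and cluster-preserving quantization schemes.

# Minimal sketch, for illustration only; assumes the TensorFlow Model
# Optimization Toolkit, one-hot labels, and hypothetical datasets
# train_ds / val_ds plus a representative-data generator rep_data_gen.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

MAX_DROP = 0.02   # assumed predefined threshold for the allowable accuracy drop

def _accuracy(model, ds):
    model.compile(loss="categorical_crossentropy", metrics=["accuracy"])
    return model.evaluate(ds, verbose=0)[1]

def _fit(model, train_ds, val_ds, callbacks=None):
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(train_ds, validation_data=val_ds, epochs=5,
              callbacks=callbacks or [])
    return model

def optimize_for_tiny_device(baseline, train_ds, val_ds, rep_data_gen):
    ref = _accuracy(baseline, val_ds)

    # (i) Magnitude-based weight pruning of the baseline model.
    pruned = tfmot.sparsity.keras.prune_low_magnitude(
        baseline,
        pruning_schedule=tfmot.sparsity.keras.PolynomialDecay(
            initial_sparsity=0.0, final_sparsity=0.5,
            begin_step=0, end_step=1000))
    _fit(pruned, train_ds, val_ds,
         callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
    pruned = tfmot.sparsity.keras.strip_pruning(pruned)
    acc = _accuracy(pruned, val_ds)
    assert ref - acc <= MAX_DROP    # drop gated against the previous stage
    ref = acc

    # (ii) Weight clustering of the pruned model.
    init = tfmot.clustering.keras.CentroidInitialization.LINEAR
    clustered = tfmot.clustering.keras.cluster_weights(
        pruned, number_of_clusters=16, cluster_centroids_init=init)
    _fit(clustered, train_ds, val_ds)
    clustered = tfmot.clustering.keras.strip_clustering(clustered)
    acc = _accuracy(clustered, val_ds)
    assert ref - acc <= MAX_DROP
    ref = acc

    # (iii) Quantization aware training of the pruned and clustered model.
    qat = tfmot.quantization.keras.quantize_model(clustered)
    _fit(qat, train_ds, val_ds)
    assert ref - _accuracy(qat, val_ds) <= MAX_DROP

    # Conversion to an integer-only TensorFlow Lite model for the tiny device.
    converter = tf.lite.TFLiteConverter.from_keras_model(qat)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = rep_data_gen
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()    # int8 flatbuffer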
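Finally, a minimal sketch of running the resulting integer-only model on a single ECG window. On the tiny device itself the same flatbuffer would typically be executed with TensorFlow Lite for Microcontrollers in C or C++; the Python tf.lite.Interpreter, the function name classify_window and the 3000-sample window are assumptions used here only to illustrate the quantize-invoke-dequantize flow and are not part of the claimed method.

# Minimal sketch, for illustration only. `tflite_bytes` is the output of
# optimize_for_tiny_device() above and `ecg_window` is a hypothetical
# float32 vector of 3000 single-lead ECG samples.
import numpy as np
import tensorflow as tf

def classify_window(tflite_bytes, ecg_window):
    interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # Quantize the float input with the scale / zero-point stored in the model.
    scale, zero_point = inp["quantization"]
    q = np.round(ecg_window / scale + zero_point).astype(np.int8)
    interpreter.set_tensor(inp["index"], q.reshape(inp["shape"]))
    interpreter.invoke()

    # Dequantize the int8 softmax output back to class probabilities.
    raw = interpreter.get_tensor(out["index"])
    out_scale, out_zero_point = out["quantization"]
    return (raw.astype(np.float32) - out_zero_point) * out_scale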
Dated this 16th Day of September 2022
Tata Consultancy Services Limited
By their Agent & Attorney
(Adheesh Nargolkar)
of Khaitan & Co
Reg No IN-PA-1086