
Method And System For Training A Neural Network For Time Series Data Classification

Abstract: Neural networks can be used for time series data classification. However, in a K-shot scenario in which sufficient training data is unavailable to train the neural network, the neural network may not produce the desired results. Disclosed herein are a method and system for training a neural network for time series data classification. In this method, by processing a plurality of task-specific data, a system generates a set of updated parameters, which is further used to train a neural network (network) till a triplet loss is below a threshold. The network is trained on a diverse set of few-shot tasks sampled from various domains (e.g., healthcare, activity recognition, and so on) such that it can solve a target task from another domain using only a small number of training samples from the target task. [To be published with FIG. 2]


Patent Information

Application #: 201921034646
Filing Date: 28 August 2019
Publication Number: 22/2022
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email: kcopatents@khaitanco.com
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2024-03-05
Renewal Date:

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point Mumbai 400021 Maharashtra, India

Inventors

1. MALHOTRA, Pankaj
Tata Consultancy Services Limited Galaxy Business Park, Plot no. A-44 & A45, Ground , 1st to 05th floor & 10th floor Block C&D, Sector 62, Noida 201309 Uttar Pradesh, India
2. NARWARIYA, Jyoti
Tata Consultancy Services Limited Galaxy Business Park, Plot no. A-44 & A45, Ground , 1st to 05th floor & 10th floor Block C&D, Sector 62, Noida 201309 Uttar Pradesh, India
3. VIG, Lovekesh
Tata Consultancy Services Limited Block-C, Kings Canyon, ASF Insignia, Gurgaon- Faridabad, Gawal Pohari, Gurgaon - 122003 Haryana, India
4. SHROFF, Gautam
Tata Consultancy Services Limited Block-C, Kings Canyon, ASF Insignia, Gurgaon- Faridabad, Gawal Pohari, Gurgaon 122003 Haryana, India

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention
METHOD AND SYSTEM FOR TRAINING A NEURAL NETWORK FOR TIME SERIES DATA CLASSIFICATION
Applicant
Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the manner in which it is to be performed.

TECHNICAL FIELD
[001] The disclosure herein generally relates to training a data model, and, more particularly, to a method and system for training a neural network for time series data classification.
BACKGROUND
[002] A time series data set represents data pertaining to specific parameters, collected over a period of time. Such data find application in a variety of fields. For example, weather data collected over a period of time can be used for generating weather predictions. In case of an industrial plant monitoring, data pertaining to various parameters of the plant can be used for assessing/predicting plant performance.
[003] Data can be collected using appropriate sensors. The amount/quantity of such data collected over a period of time can be huge. As analyzing/processing a huge quantity of data can be a cumbersome task, appropriate time series data classification approaches can be used to classify and extract required data over time windows. Considering the volume and complexity of such data collected over a period of time, time series classification, when handled manually, can be a cumbersome task and may even be prone to errors.
[004] As machine learning is a popular and evolving field, it can be used to automate time series data classification. In the machine learning approach, machine learning algorithms build mathematical data models using sample data (also known as training data). The accuracy with which a data model can perform a task depends on the quality and quantity of the training data used to train/generate the model. However, in some scenarios, the amount of training data available is minimal and may not be sufficient for state-of-the-art training approaches to generate appropriate data models.
SUMMARY
[005] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical
problems recognized by the inventors in conventional systems. For example, in one embodiment, a processor implemented method for time series data classification is provided. In this method, at least one time series classification task is collected as input, via one or more hardware processors, wherein the time series classification task comprises a training set of time series data and number of classes across the time series classification tasks varies. Further, a neural network is trained using at least one initial parameter to solve the at least one time series classification task, via the one or more hardware processors. The training of the neural network comprises iteratively performing till an average triplet loss on a plurality of validation tasks is less than a threshold, sampling a pre-defined number of time series classification tasks; consolidating a set of updated parameters from the sampled time series classification tasks; obtaining a final set of updated parameters from the consolidated set of updated parameters; and using the final set of updated parameters along with data pertaining to the at least one time series classification task to train the neural network. Then the time series data classification is performed using the neural network, via the one or more hardware processors.
[006] In another aspect, a system for time series data classification is provided. The system includes one or more hardware processors, one or more communication interfaces, and one or more memory modules storing a plurality of instructions. The plurality of instructions when executed cause the one or more hardware processors to collect at least one time series classification task as input, wherein the time series classification task comprises a training set of time series data. The system then trains a neural network using at least one initial parameter to solve the at least one time series classification task, wherein training the neural network comprises iteratively perform till an average triplet loss on a plurality of validation tasks is less than a threshold, sampling a pre-defined number of time series classification tasks; consolidating a set of updated parameters from the sampled time series classification tasks; obtaining a final set of updated parameters from the consolidated set of updated parameters; and using the final set of updated parameters along with data pertaining to the at least one time series
classification task to train the neural network. The system then performs the time series data classification using the neural network.
[007] In yet another aspect, a non-transitory computer readable medium for time series data classification is provided. The non-transitory computer readable medium stores instructions which, when executed, cause performance of the following method for time series data classification. In this method, at least one time series classification task is collected as input, via one or more hardware processors, wherein the time series classification task comprises a training set of time series data and number of classes across the time series classification tasks varies. Further, a neural network is trained using at least one initial parameter to solve the at least one time series classification task, via the one or more hardware processors. The training of the neural network comprises iteratively performing till an average triplet loss on a plurality of validation tasks is less than a threshold, sampling a pre-defined number of time series classification tasks; consolidating a set of updated parameters from the sampled time series classification tasks; obtaining a final set of updated parameters from the consolidated set of updated parameters; and using the final set of updated parameters along with data pertaining to the at least one time series classification task to train the neural network. Then the time series data classification is performed using the neural network, via the one or more hardware processors.
[008] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[009] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[010] FIG. 1 illustrates an exemplary system for training a neural network for time series data classification, according to some embodiments of the present disclosure.

[011] FIG. 2 is a flow diagram depicting steps involved in the process of training a neural network for time series data classification, using the system of FIG. 1, according to some embodiments of the present disclosure.
[012] FIG. 3 depicts Convolutional Neural Network architecture used by the system of FIG. 1, according to some embodiments of the present disclosure.
[013] FIG. 4 depicts few-shots learning approach being performed by the system of FIG. 1, according to some embodiments of the present disclosure.
[014] FIG. 5A through FIG. 5D are example diagrams depicting comparison of 5-shot univariate time series data classification being performed by the system of FIG. 1 with state of art techniques, according to some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[015] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope being indicated by the following claims.
[016] Referring now to the drawings, and more particularly to FIG. 1 through FIG. 5D, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[017] FIG. 1 illustrates an exemplary system for training a neural network for time series data classification, according to some embodiments of the present disclosure. The system 100 includes one or more hardware processors 102, communication interface(s) or input/output (I/O) interface(s) 103, and one or
more data storage devices or memory module 101 operatively coupled to the one or more hardware processors 102. The one or more hardware processors 102 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, graphics controllers, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) are configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[018] The communication interface(s) 103 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the communication interface(s) 103 can include one or more ports for connecting a number of devices to one another or to another server.
[019] The memory module(s) 101 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, one or more modules (not shown) of the system 100 can be stored in the memory 101. The memory module(s) 101 is configured to store operational instructions which when executed cause one or more of the hardware processor(s) 102 to perform various actions associated with the training of the neural network and time series data classification using the neural network. The various steps involved in the process of training the neural network and the time
series data classification are explained in the description of FIG. 2. All the steps in FIG. 2 are explained with reference to the system of FIG. 1.
[020] FIG. 2 is a flow diagram depicting steps involved in the process of training a neural network for time series data classification, using the system of FIG. 1, according to some embodiments of the present disclosure. Steps 202 through 210 are different stages of data processing involved in the training of the neural network. The system 100 collects (202) a plurality of time series classification tasks (alternately referred to as ‘tasks’), as input. In an embodiment, the tasks may be of different types. For example, one task corresponds to classification of time series data related to weather, whereas another task corresponds to classification of time series data related to an industrial process monitoring.
[021] The system 100 then samples (204) a specific number of tasks from the plurality of tasks, wherein the 'specific number of tasks' is pre-defined or is dynamically configured with the system 100. For example, based on the requirements, an authorized person may use an appropriate interface provided by the communication interface(s) 103 to define the value of the 'specific number of tasks'. Further, any suitable sampling technique can be used to sample the specific number of tasks.
[022] By processing the sampled tasks, the system 100 consolidates (206) a set of updated parameters corresponding to each of the specific number of tasks. After consolidating the sets of updated parameters, the system 100 obtains (208) a final set of updated parameters from the consolidated sets of updated parameters of all of the specific number of tasks, using equation (4).
[023] The system 100 uses the final set of updated parameters to train (210) a neural network. When the training of the neural network is being performed in a scenario in which sufficient training data is not available, the final set of updated parameters generated by the system 100 can be used to compensate for the missing data and, in turn, to train the neural network.

[024] After training the neural network, the system 100 performs time series classification of any given task using the neural network, and uses a triplet loss based approach to assess the quality of the learned time-series embeddings, which in turn ensures higher accuracy in the given classification tasks. If the average triplet loss calculated/determined by the system 100 is found to be less than a threshold, then the system 100 may store the neural network in its latest state as a 'final neural network', which is further used by the system 100 to perform time series classification for input tasks. If the average triplet loss is found to exceed the threshold, then the system 100 repeats steps 204 to 210 to update and fine-tune the neural network. In various embodiments, the steps of method 200 may be performed in the same order as depicted in FIG. 2 or in any alternate order that is technically feasible. In another embodiment, one or more of the steps of method 200 may be omitted as per requirements.
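The overall loop of steps 202-210 can be summarized in code. The sketch below is illustrative only: it assumes a PyTorch-style network with state_dict()/load_state_dict(), and the callables sample_tasks, inner_update, meta_consolidate and validation_triplet_loss are hypothetical stand-ins for the operations described above, not the disclosed implementation.

```python
# Illustrative sketch of the training loop of FIG. 2 (steps 202-210).
import copy

def train_network(network, all_tasks, validation_tasks,
                  sample_tasks, inner_update, meta_consolidate, validation_triplet_loss,
                  num_sampled_tasks=5, loss_threshold=0.1, max_meta_iters=2000):
    params = copy.deepcopy(network.state_dict())                    # initial parameters
    for _ in range(max_meta_iters):
        tasks = sample_tasks(all_tasks, num_sampled_tasks)          # step 204: sample tasks
        updated = [inner_update(network, params, t) for t in tasks]  # step 206: per-task updated parameters
        params = meta_consolidate(params, updated)                  # step 208: final set of updated parameters (equation (4))
        network.load_state_dict(params)                             # step 210: update the network
        if validation_triplet_loss(network, validation_tasks) < loss_threshold:
            break                                                   # stop once average triplet loss is below the threshold
    return network
```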
[025] The method 200 is further elaborated below:
[026] The problem in the time series classification domain addressed by the system 100 is the training of a neural network in the absence of sufficient training data, also known as the K-shot learning problem. Consider a K-shot learning problem for time series classification data sampled from a distribution $p(\mathcal{T})$ that requires learning a multi-way classifier for a test task given only K labeled time series instances per class. The system 100 is used to obtain a neural network with parameters $\phi$ that is trained to solve several K-shot tasks sampled from $p(\mathcal{T})$. The K-shot learning tasks are divided into three sets: a training meta-set $S^{tr}$, a validation meta-set $S^{va}$, and a testing meta-set $S^{te}$. The training meta-set is used to obtain the parameters $\phi$, the validation meta-set is used for model selection, and the testing meta-set is used for evaluating the results of the time series classification being performed by the system 100.
[027] Each task instance $T_j \sim p(\mathcal{T})$ in $S^{tr}$ and $S^{va}$ consists of a labeled training set of univariate time series $D^{tr}_j = \{(\mathbf{x}^j_{n,k}, y^j_{n,k})\ \forall\, k = 1 \ldots K;\ n = 1 \ldots N_j\}$, where $K$ is the number of univariate time series instances for each of the $N_j$ classes. Each univariate time series $\mathbf{x} = x_1, x_2, \ldots, x_T$ with $x_t \in \mathbb{R}$ for $t = 1, \ldots, T$, where $T$ is the length of the time series and $y$ is the class label. Tasks in $S^{tr}$ and $S^{va}$ only contain a training set, whereas each task in $S^{te}$ contains a testing set $D^{te}_j = \{(\mathbf{x}^j_{n,k}, y^j_{n,k})\ \forall\, k = 1 \ldots K';\ n = 1 \ldots N_j\}$ apart from a training set $D^{tr}_j$. The classes in $D^{tr}_j$ and $D^{te}_j$ are the same, whereas classes across tasks are usually different. For any $\mathbf{x}^j_{n,k}$ from $D^{te}_j$, the goal is to estimate the corresponding label $y^j_{n,k}$ using an updated set of parameters $\phi'_j$ obtained by fine-tuning the neural network using the $K \times N_j$ labeled samples from $D^{tr}_j$.
[028] The neural network being considered by the system 100 may be of any suitable type. For the purpose of explanation, a residual network (ResNet) consisting of multiple convolutional blocks with shortcut residual connections between them, followed by a global average pooling layer, is considered, such that the network does not have any feedforward layers at the end. Each convolutional block consists of a convolutional layer followed by a batch normalization (BN) layer which acts as a regularizer. Each BN layer is in turn followed by a ReLU layer. This architecture is depicted in FIG. 3.
[029] In order to process a newly assigned time series classification task, the neural network should be able to extract temporal features at multiple time scales, and it further needs to be ensured that the neural network can generalize to time series of varying lengths across tasks. In order to ensure this, filters of multiple lengths are used in each convolution block to capture temporal features at various scales. The residual network takes a univariate time series $\mathbf{x}$ of any length $T$ as input and converts it to a fixed dimensional feature vector $\mathbf{z} \in \mathbb{R}^m$, where $m$ is the number of filters in the final convolution layer. All trainable parameters of the residual network, consisting of filter weights and biases across the convolution layers and BN layers, are denoted by $\phi$.
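A minimal PyTorch sketch of such an encoder is given below. It is illustrative only: the kernel sizes (8, 5, 3) used as the "filters of multiple lengths" within each block are an assumption not specified in the disclosure, while the residual shortcuts, batch normalization, ReLU and global average pooling follow the description of FIG. 3, yielding a fixed m-dimensional embedding for a series of any length.

```python
# Illustrative residual 1D-convolutional encoder in the spirit of FIG. 3.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_sizes=(8, 5, 3)):
        super().__init__()
        layers, ch = [], in_ch
        for i, k in enumerate(kernel_sizes):                   # filters of multiple lengths
            layers += [nn.Conv1d(ch, out_ch, k, padding="same"),
                       nn.BatchNorm1d(out_ch)]                 # BN acts as a regularizer
            if i < len(kernel_sizes) - 1:
                layers.append(nn.ReLU())
            ch = out_ch
        self.body = nn.Sequential(*layers)
        self.shortcut = nn.Sequential(nn.Conv1d(in_ch, out_ch, 1),  # shortcut residual connection
                                      nn.BatchNorm1d(out_ch))
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

class TimeSeriesEncoder(nn.Module):
    """Maps a univariate series of shape (batch, 1, T) to an embedding z in R^m."""
    def __init__(self, num_blocks=4, m=165):
        super().__init__()
        chans = [1] + [m] * num_blocks
        self.blocks = nn.Sequential(*[ResidualBlock(chans[i], chans[i + 1])
                                      for i in range(num_blocks)])
        self.pool = nn.AdaptiveAvgPool1d(1)    # global average pooling; no feedforward layers at the end

    def forward(self, x):
        return self.pool(self.blocks(x)).squeeze(-1)   # fixed-length embedding regardless of T
```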
[030] Use of the triplet loss based approach as the training objective allows generalization to a varying number of classes without introducing any additional task-specific parameters. The triplet loss relies on pairwise distances between representations of time series samples from within and across classes, irrespective of the number of classes. Using the triplet loss at the time of fine-tuning for the test task, therefore, allows the neural network to adapt to a given few-shot classification task without introducing any additional task-specific parameters. Triplets consist of two matching time series and a non-matching time series, and the loss aims to separate the positive pair from the negative by a distance margin. Consider a set $S_j$ of all valid triplets of time series for a training task $T_j$ of the form $(\mathbf{x}^a_l, \mathbf{x}^p_l, \mathbf{x}^n_l) \in S_j$, consisting of an anchor time series $\mathbf{x}^a_l$, a positive time series $\mathbf{x}^p_l$, and a negative time series $\mathbf{x}^n_l$, where the positive time series is another instance from the same class as the anchor, while the negative is from a different class than the anchor. Representations are obtained such that the distance between the representations of the anchor and any positive time series is lower than the distance between the representations of the anchor and any negative time series. The system 100 can be configured to consider the triplet loss based on the Euclidean norm, expressed as the constraint:

$\|f_{\phi}(\mathbf{x}^a_l) - f_{\phi}(\mathbf{x}^p_l)\|_2^2 + \alpha < \|f_{\phi}(\mathbf{x}^a_l) - f_{\phi}(\mathbf{x}^n_l)\|_2^2$ --- (1)

[031] Where $\alpha > 0$ is the distance margin between positive and negative pairs, and the loss to be minimized is given by:

$\mathcal{L}_j = \sum_{(\mathbf{x}^a_l,\, \mathbf{x}^p_l,\, \mathbf{x}^n_l) \in S_j} \big[\, \|f_{\phi}(\mathbf{x}^a_l) - f_{\phi}(\mathbf{x}^p_l)\|_2^2 - \|f_{\phi}(\mathbf{x}^a_l) - f_{\phi}(\mathbf{x}^n_l)\|_2^2 + \alpha \,\big]_+$ --- (2)

[032] Where $[z]_+ = \max(z, 0)$, such that only the triplets violating the constraint in equation (1) contribute to the loss. As the triplet loss approach is used for training the neural network, the number of instances per class is $K > 1$.
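For concreteness, a minimal sketch of equations (1)-(2) over batches of embeddings is shown below; the function name, tensor shapes and the use of a sum over triplets follow the loss above, while everything else is an illustrative assumption (embeddings would come from an encoder such as the one sketched earlier).

```python
# Illustrative triplet loss of equations (1)-(2) on squared Euclidean distances.
import torch

def triplet_loss(z_a, z_p, z_n, alpha=0.5):
    """z_a, z_p, z_n: (num_triplets, m) embeddings of anchor, positive, negative."""
    d_pos = ((z_a - z_p) ** 2).sum(dim=1)          # ||f(x_a) - f(x_p)||^2
    d_neg = ((z_a - z_n) ** 2).sum(dim=1)          # ||f(x_a) - f(x_n)||^2
    # [z]_+ = max(z, 0): only triplets violating constraint (1) contribute
    return torch.clamp(d_pos - d_neg + alpha, min=0.0).sum()
```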
[033] To elaborate on the training of the neural network, consider that the system 100 uses a first-order gradient descent based meta-learning algorithm (FS-1), and a simpler variant of FS-1 (referred to as FS-2).
[034] FS-1 learns an initialization for the parameters $\phi$ of the ResNet such that these parameters can be quickly optimized using gradient-based learning at test time to solve a new task, i.e., the model generalizes from a small number of examples from the test task. In order to learn the parameters $\phi$, the system 100 trains the neural network on a variety of tasks with varying numbers of classes and time series lengths. The residual network yields a fixed-dimensional representation for varying length time series, and the nature of the loss function is such that it does not require any changes due to the varying number of classes across tasks. For the aforementioned reasons, the same neural network parameters $\phi$ are used across the tasks.
[035] For the training of the neural network, the system 100 finds an initial set of parameters $\phi$ such that, for a randomly sampled task $T_j$ with corresponding loss $\mathcal{L}_{T_j}$ (as in equation (2)), the learner has a low loss after $k$ updates, such that:

$\min_{\phi}\ \mathbb{E}_{T_j \sim p(\mathcal{T})} \big[ \mathcal{L}_{T_j}\big( U^k_{T_j}(\phi) \big) \big]$ --- (3)

Where $U^k_{T_j}$ is an operator that updates $\phi$ using $k$ mini-batches from $D^{tr}_j$.
[036] FS-1 sequentially samples few-shot tasks from the set of tasks $S^{tr}$. As depicted in FIG. 4, the meta-learning procedure adopted by the system 100 includes M meta-iterations. Each meta-iteration involves B K-shot tasks, and each task is solved using k steps of gradient-based optimization, which involves randomly sampling mini-batches from the K × N instances in the task.
[037] Considering that each task has a varying number of instances owing to varying $N$, the number of iterations for each task is set to $k = \lceil (K \times N)/b \rceil \times e$, where $b$ is the mini-batch size and $e$ is the number of epochs. Instead of fixing the number of iterations $k$ for each sampled task, the number of epochs $e$ across datasets is fixed such that the network is trained to adapt quickly in a fixed number of epochs. Also, the number of triplets in each batch is significantly more than the number of unique time series in a mini-batch.
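As a worked example of this formula, using K = 5, b = 10 and e = 4 from the hyper-parameter description later in the text, and an assumed N = 4 classes in the sampled task:

```python
# Worked example of k = ceil(K*N / b) * e (N = 4 is an assumed value).
import math

K, N, b, e = 5, 4, 10, 4
k = math.ceil(K * N / b) * e    # ceil(20 / 10) * 4 = 8 inner-loop iterations
print(k)                        # -> 8
```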
[038] The system 100 may use any suitable approach to initialize the filter weights of the residual network. For example, an orthogonal initialization approach may be used by the system 100. In this approach, in the $i$-th meta-iteration, the residual network for each of the $B$ tasks is initialized with $\phi_{i-1}$. Each task $T_j$ with labeled data $D^{tr}_j$ is solved by updating the parameters $\phi_{i-1}$ of the network $k$ times to obtain $\phi^j_i = U^k_{T_j}(\phi_{i-1})$.

[039] In effect, the system 100 uses a batch version of the optimization problem in equation (3) and a meta-batch of $B$ tasks to update $\phi$ as:

$\phi_i = \phi_{i-1} + \epsilon\, \frac{1}{B} \sum_{j=1}^{B} (\phi^j_i - \phi_{i-1})$ --- (4)

[040] Here, $\phi^j_i$ with $k > 1$ implies that $\phi$ is updated using the updated values $\phi^j_i$ obtained after solving the $B$ tasks for $k$ iterations each. The optimal parameters of the residual network after the meta-training are denoted as $\phi^*$ and are used as initialization parameters for initializing the target task specific residual network. For each new task with labeled instances in $D^{tr}$ and any test time series $\mathbf{x}$ taken from $D^{te}$, $\phi^*$ is first updated to $\phi'$ using $D^{tr}$. To obtain the corresponding class estimate, the embeddings for all the $N \times K$ samples in $D^{tr}$ are compared to the embedding of the test time series using an appropriate classifier.
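A compact sketch of the meta-update of equation (4), in the style of first-order meta-learning, is given below. The inner_update callable (the operator $U^k_{T_j}$, e.g. k steps of gradient descent on the task's triplet loss) and the dictionary-of-tensors representation of $\phi$ are implementation assumptions, not part of the disclosure.

```python
# Illustrative meta-update of equation (4):
# phi_i = phi_{i-1} + eps * (1/B) * sum_j (phi_i^j - phi_{i-1})
import copy
import torch

def meta_update(phi, task_batch, inner_update, eps=1.0):
    """phi: dict name -> tensor; task_batch: B sampled K-shot tasks."""
    deltas = {name: torch.zeros_like(p) for name, p in phi.items()}
    for task in task_batch:
        phi_j = inner_update(copy.deepcopy(phi), task)     # phi_i^j = U^k_{T_j}(phi_{i-1})
        for name in phi:
            deltas[name] += phi_j[name] - phi[name]
    return {name: phi[name] + eps * deltas[name] / len(task_batch) for name in phi}
```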
[041] In FS-2, instead of updating the parameters $\phi$ by collectively using updated values from $B$ tasks, $\phi$ is continuously updated at each mini-batch irrespective of the task. As a result, the network is trained for a few iterations on a task, and then the task is changed.
[042] The final neural network (NN) that is used for initialization of a task is fine-tuned using a small labeled training set of the new test time series classification task, and then the time series classification is performed on the test set using a classifier.
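The test-time procedure of paragraphs [040]-[042] can be sketched as follows: fine-tune the meta-trained encoder on the few labeled series of the target task using the triplet loss, then label each test series by comparing embeddings, here with a simple nearest-neighbour rule as one possible "appropriate classifier". The sample_triplets callable and the function layout are illustrative assumptions.

```python
# Illustrative test-time adaptation and classification for a K-shot target task.
import torch

def adapt_and_classify(encoder, sample_triplets, x_train, y_train, x_test,
                       epochs=16, lr=1e-4, alpha=0.5):
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(epochs):                                   # fine-tune phi* -> phi'
        z_a, z_p, z_n = (encoder(t) for t in sample_triplets(x_train, y_train))
        loss = torch.clamp(((z_a - z_p) ** 2).sum(1)
                           - ((z_a - z_n) ** 2).sum(1) + alpha, min=0.0).sum()
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                                     # compare embeddings
        z_tr, z_te = encoder(x_train), encoder(x_test)
        nearest = torch.cdist(z_te, z_tr).argmin(dim=1)       # 1-NN over training embeddings
        return y_train[nearest]                               # predicted label per test series
```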
Experimental setup:
a. Sampling:
[043] The experiment was conducted by restricting the distribution of tasks to univariate time series classification (UTSC) with a constraint on the maximum length of the time series such that $T \le 512$. Tasks were sampled from publicly available archives of UTSC datasets, where each dataset corresponds to an N-way multi-class classification task; the number of classes N and the length of time series T vary across datasets, and all the time series in any dataset are of the same length. Each time series is z-normalized using the mean and standard deviation of all the points in the time series.
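A minimal sketch of the per-series z-normalization mentioned above (the small epsilon guard is an implementation assumption, not part of the description):

```python
# z-normalize one univariate series by its own mean and standard deviation.
import numpy as np

def z_normalize(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-8)   # epsilon guards against constant series
```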
[044] 18 datasets were selected and used to sample tasks for the training meta-set $S^{tr}$, and 6 datasets were used to sample tasks for the validation meta-set $S^{va}$. Any task in $S^{tr}$ or $S^{va}$ has K randomly sampled time series for each of the N classes in the dataset. The remaining 41 datasets were used to create tasks for the testing meta-set. Each dataset is an N-way classification problem with an original train and test split. 100 K-shot tasks were sampled from each of the 41 datasets. Each of the 100 sampled tasks contains K samples from each of the N classes for $D^{tr}$ and K' samples from each of the N classes for $D^{te}$, sampled from the respective original train and test splits of the dataset. The K (or K') samples of each class in $D^{tr}$ (or $D^{te}$) are sampled uniformly from the entire set of samples of the respective class. $D^{tr}$ is used to fine-tune $\phi^*$, and $D^{te}$ is used to evaluate the updated task specific model $\phi'$.
Hyper-parameters for FS-1 and FS-2:
[045] On the basis of initial experiments on a subset of the training meta-set, the residual architecture was used with L = 4 layers and m = 165 convolution filters per layer. An Adam optimizer with a learning rate of 0.0001 was used for updating $\phi$ on each task, while $\epsilon = 1$ was used in the meta-update step of equation (4). FS-1 and FS-2 were trained for a total of M = 2000 meta-iterations with a meta-batch size of B = 5 and a mini-batch size of b = 10. FS-1 and FS-2 were trained using K = 5 and 10 for tasks in the training meta-set, while K = 5 is used for the validation and test meta-sets. Across all experiments, K' = 5 was maintained. The experiments showed that K = 10 for tasks in the training meta-set gave better results in terms of average triplet loss on the validation meta-set. Epochs e = 4 were used for solving each task while training the FS-1 and FS-2 models. The number of epochs e' to be used while fine-tuning for tasks in the testing meta-set was chosen in the range of 1-100 based on the average triplet loss on tasks in the validation meta-set. Experiments showed that e' = 16 and e' = 8 were effective for the FS-1 and FS-2 models respectively. As a result, $\phi^*$ is fine-tuned for e' epochs for each task in the testing meta-set. For the triplet loss, $\alpha = 0.5$ was chosen.
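For reference, the reported hyper-parameters can be collected into a single configuration; the dictionary layout below is only an illustrative convention, while the values are those stated above.

```python
# Hyper-parameters of FS-1 / FS-2 as reported in the experimental setup.
CONFIG = {
    "num_layers_L": 4,
    "filters_per_layer_m": 165,
    "optimizer": "Adam",
    "inner_learning_rate": 1e-4,
    "meta_step_size_eps": 1.0,
    "meta_iterations_M": 2000,
    "meta_batch_size_B": 5,
    "mini_batch_size_b": 10,
    "K_train": (5, 10),
    "K_val_test": 5,
    "K_prime": 5,
    "epochs_per_task_e": 4,
    "fine_tune_epochs_e_prime": {"FS-1": 16, "FS-2": 8},
    "triplet_margin_alpha": 0.5,
}
```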
For comparison, the following baseline classifiers were considered:
(1) Euclidean Distance (ED): 1-NN classification based on ED, where a time series of length T is represented by a fixed-dimensional vector of the same length (see the sketch after this list).

(2) Dynamic Time Warping (DTW): 1-NN based on the DTW approach was considered. Leave-one-out cross-validation on $D^{tr}$ of each task was performed to find the best warping window in the range of $\omega = 0.02T, 0.04T, \ldots, T$, where $\omega$ is the window length and T is the time-series length.
(3) Bag of SFA Symbols (BOSS): BOSS is a time series feature extraction technique that provides time series representations while being tolerant to noise. BOSS provides a symbolic representation based on Symbolic Fourier Approximation (SFA) on each fixed-length sliding window extracted from a time series, while providing low-pass filtering and quantization for noise reduction. The resulting sequence of symbols (words) for each sliding window is converted to a histogram of words under the bag-of-words assumption, which is considered to be the final representation of the time series. The hyper-parameters wordLength and normalization are chosen based on leave-one-out cross-validation over the ranges {8, 10, 12, 14, 16} and {True, False} respectively, while default values of the remaining hyper-parameters are used. 1-NN is applied on the extracted features for the final classification decision.
(4) Residual Network (ResNet): Instead of using $\phi^*$ obtained via FS-1 or FS-2 as a starting point for fine-tuning, a ResNet based baseline was considered where the model is trained from scratch for each task. The architecture is the same as that used for FS-1 and FS-2 (also similar to state-of-the-art ResNet versions). Given that each task has a very small number of training samples and the parameters are to be trained from scratch, ResNet architectures are likely to be prone to overfitting despite batch normalization. To mitigate this issue, apart from the same network architecture as FS-1 and FS-2, smaller networks with a smaller number of trainable parameters were considered. More specifically, the four combinations resulting from number of layers = {L/2, L} and number of filters per layer = {⌊m/2⌋, m} were considered, where L = 4 and m = 165. Further, the model with the best overall results among the four combinations was used as the baseline, viz. number of layers = 2 and number of filters = 165. Each ResNet model was trained for 16 epochs, as for FS-1.
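As referenced in item (1), a minimal sketch of the 1-NN Euclidean-distance baseline for equal-length series is given below; it is illustrative only.

```python
# 1-NN Euclidean-distance baseline: each series of length T is a T-dimensional vector.
import numpy as np

def one_nn_euclidean(train_x, train_y, test_x):
    """train_x: (n_train, T), train_y: (n_train,), test_x: (n_test, T)."""
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    return train_y[dists.argmin(axis=1)]        # label of the nearest training series
```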
[046] Each task was evaluated using the classification accuracy rate on the test set, i.e., an inference is correct if the estimated label is the same as the corresponding ground truth label. Each task consists of K' × N test samples, and the performance result of each task equals the fraction of correctly classified test samples. While comparing the data processing being done by the system 100 with the various baselines considered, for each dataset, classification error results were averaged over the 100 randomly sampled tasks. Table 1 below depicts a comparison of the classification performed by the system 100 with a few state-of-the-art techniques in terms of ranks over classification accuracy rates on all 4100 tasks from 41 datasets with varying K.

K ED DTW BOSS ResNet FS-2 FS-1
2 4.232 2.976 3.902 3.805 3.207 2.878
5 4.537 3.463 3.890 3.305 3.244 2.561
10 4.573 3.476 3.646 3.683 3.427 2.195
20 4.439 3.354 2.927 3.902 3.793 2.585
Table 1
Table 2 below shows a comparison of ranks across datasets with a varying number of classes N in a task, where N is the number of classes in a 5-shot task and n is the number of datasets.

N n ED DTW BOSS ResNet FS-2 FS-1
2-5 24 4.167 4.083 3.375 3.458 3.042 2.875
6-10 9 4.778 2.333 5.333 2.389 3.778 2.389
>10 8 5.375 2.875 3.812 3.902 3.875 1.812
Overall 41 4.537 3.463 3.890 3.305 3.244 2.561
Table 2
Results:

[047] It was observed that FS-1 improves upon all the baselines considered for 5-shot tasks. The pairwise comparison of FS-1 with other baselines shows significant gains in accuracy across many datasets. FS-1 has Win/Tie/Loss (W/T/L) counts of 26/2/13 when compared to the best non-few-shot-learning model, i.e. ResNet. On 27/41 datasets, FS-1 is amongst the top-2 models. FS-2, with a simpler update rule than FS-1, is the second best model but is very closely followed by the ResNet models trained from scratch.
[048] To study the effect of the number of training samples per class available in the end task, K = {2, 5, 10, 20} was considered for $D^{tr}$, and the experiment was conducted under the same protocol of 4100 tasks. The results can be observed in Table 1 and indicate that:
• FS-1 is the best performing model, especially for 5 and 10-shot scenarios with large gaps in ranks.
• When considering a very small number of training samples per class, i.e. for K = 2, it was observed that FS-1 is still the best model, although it is very closely followed by DTW. This is expected, as given just two samples per class, it is very difficult to effectively learn any data distribution patterns, especially when the domain of the task is unseen while training. The fact that FS-1 and FS-2 still perform significantly better than ResNet models trained from scratch shows the generic nature of the filters learned in $\phi^*$. As expected, data-intensive machine learning and deep learning models like BOSS and ResNet that are trained from scratch only on the target task data tend to overfit.
• For tasks with larger number of training samples per class, i.e K = 20, FS-1 gave best results. As expected, machine learning based state-of-the-art model BOSS performs better than other baselines when sufficient training samples are available and is closer to FS-1.

[049] To study the generalizability of FS-1 to varying N as a result of leveraging the triplet loss, the datasets were grouped based on N. As shown in Table 2, it was observed that FS-1 is consistently amongst the top-2 models across values of N. While FS-1 is significantly better than other algorithms for 2 ≤ N ≤ 5 and N > 10, it is as good as the best algorithm DTW for 6 ≤ N ≤ 9.
[050] To study the importance of fine-tuning different convolutional layers of FS-1 using the training data of the target few-shot task, four variants FS-1-l with l = 1, 2, 3, 4 were considered, where the parameters of the lowermost l convolutional layers of the pre-trained model were frozen, while fine-tuning only the top L − l layers. It was observed that FS-1-1, i.e. where the filter weights of only the first convolutional layer are frozen while those of all higher layers are fine-tuned, performs better than the default FS-1 model where all layers are fine-tuned. On the other hand, freezing higher layers as well (FS-1-2 and FS-1-3) or freezing all the layers (FS-1-4, i.e. no fine-tuning on the target task) leads to a significant drop in classification performance. These results indicate that the first layer has learned generic features while being trained on a diverse set of few-shot tasks, while the higher layers of the FS-1 model are important to quickly adapt to the target few-shot task.
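A minimal sketch of the FS-1-l layer-freezing variants, assuming the TimeSeriesEncoder sketched earlier (with a `.blocks` Sequential of residual blocks); the function name and return convention are illustrative assumptions.

```python
# Freeze the lowermost l blocks of a pre-trained encoder; fine-tune only the top L - l.
def freeze_lower_layers(encoder, l):
    for block in list(encoder.blocks)[:l]:
        for p in block.parameters():
            p.requires_grad = False                    # frozen: excluded from gradient updates
    return [p for p in encoder.parameters() if p.requires_grad]   # parameters to pass to the optimizer
```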
[051] Apart from the above scenario where the UCR datasets used to sample tasks in training, validation and testing meta-sets are different, a scenario where there are a large number of classes within a TSC dataset was considered, and the goal was to quickly adapt to a new set of classes given a model that has been pre-trained on another disjoint set of classes from the same dataset.
[052] Three datasets with a large number of classes from the UCR Archive, namely 50Words, Adiac and ShapesAll, containing 50, 37, and 60 classes respectively, were considered. Half of the classes (randomly chosen) were used to form the training meta-set, 1/4th of the classes for the validation meta-set, and the remaining 1/4th of the classes for the testing meta-set. The FS-1 and FS-2 models were trained on 5-shot 5-way TSC tasks from the training meta-set for M = 50 and B = 5. The best meta-iteration was chosen based on the average triplet loss on the validation meta-set (also containing 5-shot 5-way classification tasks). Note that ED, DTW and BOSS are trained on the respective task from the testing meta-set only. Also, whenever the number of samples for a class is less than 5, all samples for that class in all tasks were taken. It was observed that FS-1 outperforms all approaches on the three datasets, except DTW for the 50Words dataset, and is able to quickly generalize to new classes.
[053] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[054] The embodiments of the present disclosure herein address the unresolved problem of training a neural network for time series classification. The embodiments thus provide a mechanism for training the neural network in a K-shot training scenario.
[055] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may
also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[056] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[057] The illustrated steps are set out to explain the exemplary
embodiments shown, and it should be anticipated that ongoing technological
development will change the manner in which particular functions are performed.
These examples are presented herein for purposes of illustration, and not
limitation. Further, the boundaries of the functional building blocks have been
arbitrarily defined herein for the convenience of the description. Alternative
boundaries can be defined so long as the specified functions and relationships
thereof are appropriately performed. Alternatives (including equivalents,
extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[058] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A
computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[059] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

We Claim:
1. A processor implemented method for time series data classification,
comprising:
collecting at least one time series classification task as input, via one or more hardware processors, wherein the time series classification task comprises a training set of time series data and wherein number of classes across the time series classification tasks varies;
training a neural network using at least one initial parameter to solve the at least one time series classification task, via the one or more hardware processors, wherein training the neural network comprises:
iteratively performing till an average triplet loss on a plurality of validation tasks by the neural network is less than a threshold:
sampling a pre-defined number of time series classification tasks;
consolidating a set of updated parameters from the sampled time series classification tasks;
obtaining a final set of updated parameters from the consolidated set of updated parameters; and
using the final set of updated parameters along with data pertaining to the at least one time series classification task to train the neural network; and performing the time series data classification using the neural network, via the one or more hardware processors.
2. The processor implemented method as claimed in claim 1, wherein the final set of updated parameters comprises updated parameters corresponding to each time series classification task from the pre-defined number of time series classification tasks.

3. The processor implemented method as claimed in claim 1, wherein training the neural network by considering the average triplet loss allows the neural network to be used across various time series classification tasks without introducing any additional parameter.
4. A system for time series data classification, comprising:
one or more hardware processors (102);
one or more communication interfaces (103); and
one or more memory modules (101) storing a plurality of instructions, wherein the plurality of instructions when executed cause the one or more hardware processors (102) to:
collect at least one time series classification task as input, wherein the time series classification task comprises a training set of time series data and wherein number of classes across the time series classification tasks varies;
train a neural network using at least one initial parameter to solve the at least one time series classification task, wherein training the neural network comprises:
iteratively perform till an average triplet loss on a plurality of validation tasks is less than a threshold:
sampling a pre-defined number of time series classification tasks;
consolidating a set of updated parameters from the sampled time series classification tasks;
obtaining a final set of updated parameters from the consolidated set of updated parameters; and
using the final set of updated parameters along with data pertaining to the at least one time series classification task to train the neural network; and
perform the time series data classification using the neural network.
5. The system as claimed in claim 4, wherein the final set of updated parameters comprises updated parameters corresponding to each time series classification task from the pre-defined number of time series classification tasks.

6. The system as claimed in claim 4, wherein training the neural network by considering the average triplet loss allows the neural network to be used across various time series classification tasks without introducing any additional parameter.

Documents

Application Documents

# Name Date
1 201921034646-STATEMENT OF UNDERTAKING (FORM 3) [28-08-2019(online)].pdf 2019-08-28
2 201921034646-REQUEST FOR EXAMINATION (FORM-18) [28-08-2019(online)].pdf 2019-08-28
3 201921034646-FORM 18 [28-08-2019(online)].pdf 2019-08-28
4 201921034646-FORM 1 [28-08-2019(online)].pdf 2019-08-28
5 201921034646-FIGURE OF ABSTRACT [28-08-2019(online)].jpg 2019-08-28
6 201921034646-DRAWINGS [28-08-2019(online)].pdf 2019-08-28
7 201921034646-DECLARATION OF INVENTORSHIP (FORM 5) [28-08-2019(online)].pdf 2019-08-28
8 201921034646-COMPLETE SPECIFICATION [28-08-2019(online)].pdf 2019-08-28
9 Abstract1.jpg 2019-09-17
10 201921034646-Proof of Right (MANDATORY) [12-11-2019(online)].pdf 2019-11-12
11 201921034646-ORIGINAL UR 6(1A) FORM 1-141119.pdf 2019-11-16
12 201921034646-FORM-26 [19-03-2020(online)].pdf 2020-03-19
13 201921034646-Form 1 (Submitted on date of filing) [09-09-2020(online)].pdf 2020-09-09
14 201921034646-Covering Letter [09-09-2020(online)].pdf 2020-09-09
15 201921034646-FORM 3 [15-01-2021(online)].pdf 2021-01-15
16 201921034646-CORRESPONDENCE(IPO)-(CERTIFIED COPY OF WIPO DAS )-(21-9-2020).pdf 2021-10-19
17 201921034646-FER.pdf 2022-06-09
18 201921034646-OTHERS [01-08-2022(online)].pdf 2022-08-01
19 201921034646-FORM 3 [01-08-2022(online)].pdf 2022-08-01
20 201921034646-FER_SER_REPLY [01-08-2022(online)].pdf 2022-08-01
21 201921034646-CORRESPONDENCE [01-08-2022(online)].pdf 2022-08-01
22 201921034646-CLAIMS [01-08-2022(online)].pdf 2022-08-01
23 201921034646-PatentCertificate05-03-2024.pdf 2024-03-05
24 201921034646-IntimationOfGrant05-03-2024.pdf 2024-03-05

Search Strategy

1 search(14)E_09-06-2022.pdf

ERegister / Renewals

3rd: 13 Mar 2024 (From 28/08/2021 - To 28/08/2022)
4th: 13 Mar 2024 (From 28/08/2022 - To 28/08/2023)
5th: 13 Mar 2024 (From 28/08/2023 - To 28/08/2024)
6th: 11 Jul 2024 (From 28/08/2024 - To 28/08/2025)
7th: 09 Jul 2025 (From 28/08/2025 - To 28/08/2026)