Sign In to Follow Application
View All Documents & Correspondence

Method And System For Characterization Of Time Series Data In Sarima Parameter Space

Abstract: ABSTRACT METHOD AND SYSTEM FOR CHARACTERIZATION OF TIME SERIES DATA IN SARIMA PARAMETER SPACE This disclosure relates to a processor implemented method of characterizing of the time series data in the SARIMA parameter space is provided. One or more initial parameters from a time series data are received as an input. One or more randomized search is performed in a neighborhood of the one or more initial parameters to obtain one or more optimal seasonal autoregressive integrated moving hyperparameters. The one or more optimal SARIMA hyperparameters is obtained by one or more Akaike information criterion (AIC) values are computed for one or more points in the neighborhood and one or more points are selected with minimum AIC value from the one or more AIC values as the one or more optimal SARIMA hyperparameters. The one or more optimal SARIMA hyperparameters is characterized for deriving a seasonal autoregressive integrated moving model to obtain one or more predicted data associated with the of the time-series data. [To be published with FIG. 2]

Get Free WhatsApp Updates!
Notices, Deadlines & Correspondence

Patent Information

Application #
Filing Date
25 April 2022
Publication Number
43/2023
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

Inventors

1. LING, Yibei
Tata Consultancy Services Limited, 379 Thornall Street, Edison 08837, NJ, USA
2. KHATUA, Chitta
Tata Consultancy Services Limited, Kalinga Park, SEZ Cargo, Plot No 35, Chandaka Industrial Estate. Near Infocity, Patia, Chandrasekharpur, Bhubaneswar 751024, Odisha, India

Specification

Description: FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:
METHOD AND SYSTEM FOR CHARACTERIZATION OF TIME SERIES DATA IN SARIMA PARAMETER SPACE

Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

Preamble to the description:
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The disclosure herein generally relates to a hyper-parameter optimization, and, more particularly, to a method and system for characterization of time series data in SARIMA parameter space.

BACKGROUND
Autoregressive Integrated Moving Average (ARIMA) and Seasonal Autoregressive Integrated Moving Average (SARIMA) models are the most widely used forecasting methods for univariate time series data. These models can handle both trend and seasonal components, which requires explicit specification of these components. The seasonal components consist of (P, D, Q, and S), which are required to be configured in advance, specification of these seasonal components in general does not work well as these components are hidden and vary with time.
Existing technologies need to configure a hyper-parameter before running the SARIMA model. While there are some heuristics for specifying the hyper-parameter, and these heuristics are too simple, hence often lead to suboptimal results. Typically, a seasonal period is fed into the SARIMA model as a constant based on a dataset. This approach may work well for seasonal datasets but does not work for stock related time series data in which seasonal period is hidden and varies over time. Another typical approach, in which a historical parameter search data is utilized to determine candidate parameters for classification algorithms. This approach has proved to be ineffective since the optimal hyper-parameters are very sensitive to the dataset, in practice a new dataset essentially renders this historical hyperparameters useless.
Similarly, configuring a fixed parameter space in advance, search an appropriate parameter point in the fixed parameter space. A large, and the fixed parameter space requires a large amount of time and computing resource, while a small, fixed parameter space may result in a poor accuracy. Using the fixed parameter space in practice does not work well. It is easy to see that search for best hyper-parameter via manual configuration does not work for a large search space.

SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one aspect, a processor implemented method of characterizing of a time series data in a SARIMA parameter space is provided. The processor implemented method includes at least one of: receiving, via one or more hardware processors, one or more initial parameters from a time series data as an input; performing, via the one or more hardware processors, one or more randomized search in a neighborhood of the one or more initial parameters to obtain one or more optimal seasonal autoregressive integrated moving (SARIMA) hyperparameters, the step of obtaining the one or more optimal SARIMA hyperparameters further comprising: (a) computing, via the one or more hardware processors, one or more Akaike information criterion (AIC) values for one or more points in the neighborhood, and (b) selecting, via the one or more hardware processors, one or more points with a minimum AIC value from the one or more AIC values as the one or more optimal SARIMA hyperparameters; and characterizing, via the one or more hardware processors, the one or more optimal SARIMA hyperparameters for deriving a seasonal autoregressive integrated moving (SARIMA) model to obtain one or more predicted data associated with the of the time-series data. The predicted data corresponds to one or more future data points of the time-series data.
The one or more randomized search corresponds to one or more of: (a) a global descent grid search, (b) a component-wise descent grid search, and (c) a localized grid descent search. The SARIMA model corresponds to a seven-dimensional hyperparameter space includes one or more trend elements (p,d,q) and one or more seasonal elements (P,D,Q,S) of the time-series data. The localized grid descent search considers one or more previous optimal points as an input to generate a grid surrounding the previous optimal point and performing an exhaustive search over the generated grid.
In another aspect, there is provided a system for characterization of a time series data in a SARIMA parameter space. The system includes a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: receive, one or more initial parameters from a time series data as an input; perform, one or more randomized search in a neighborhood of the one or more initial parameters to obtain one or more optimal seasonal autoregressive integrated moving (SARIMA) hyperparameters, and the one or more hardware processors are configured by the instructions to obtain the one or more optimal SARIMA hyperparameters further include: (a) compute, one or more Akaike information criterion (AIC) values for one or more points in the neighborhood, and (b) select, one or more points with a minimum AIC value from the one or more AIC values as the one or more optimal SARIMA hyperparameters; and characterize, the one or more optimal SARIMA hyperparameters for deriving a seasonal autoregressive integrated moving (SARIMA) model to obtain one or more predicted data associated with the of the time-series data. The predicted data corresponds to one or more future data points of the time-series data.
The one or more randomized search corresponds to one or more of: (a) a global descent grid search, (b) a component-wise descent grid search, and (c) a localized grid descent search. The SARIMA model corresponds to a seven-dimensional hyperparameter space includes one or more trend elements (p,d,q) and one or more seasonal elements (P,D,Q,S) of the time-series data. The localized grid descent search considers one or more previous optimal points as an input to generate a grid surrounding the previous optimal point and performing an exhaustive search over the generated grid.
In yet another aspect, there are provided one or more non-transitory machine readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors causes at least one of: receiving, one or more initial parameters from a time series data as an input; performing, one or more randomized search in a neighborhood of the one or more initial parameters to obtain one or more optimal seasonal autoregressive integrated moving (SARIMA) hyperparameters, and the step of obtaining the one or more optimal SARIMA hyperparameters further comprising: (a) computing, one or more Akaike information criterion (AIC) values for one or more points in the neighborhood, and (b) selecting, one or more points with a minimum AIC value from the one or more AIC values as the one or more optimal SARIMA hyperparameters; and characterizing, the one or more optimal SARIMA hyperparameters for deriving a seasonal autoregressive integrated moving (SARIMA) model to obtain one or more predicted data associated with the of the time-series data. The predicted data corresponds to one or more future data points of the time-series data.
The one or more randomized search corresponds to one or more of: (a) a global descent grid search, (b) a component-wise descent grid search, and (c) a localized grid descent search. The SARIMA model corresponds to a seven-dimensional hyperparameter space includes one or more trend elements (p,d,q) and one or more seasonal elements (P,D,Q,S) of the time-series data. The localized grid descent search considers one or more previous optimal points as an input to generate a grid surrounding the previous optimal point and performing an exhaustive search over the generated grid.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG. 1 illustrates a system for characterization of time series data in SARIMA parameter space, according to an embodiment of the present disclosure.
FIG. 2 illustrates a functional block diagram of the system of FIG.1, according to some embodiments of the present disclosure.
FIG. 3 illustrates an exploded view of a SARIMA hyperparameter optimizer, according to some embodiments of the present disclosure.
FIG. 4A - FIG. 4C are exemplary graphical representations illustrating a trajectory of a global descent grid search, a component-wise descent grid search, and a localized grid descent search respectively, according to some embodiments of the present disclosure.
FIG. 5 is an exemplary flow diagram illustrating method of characterizing of the time series data in the SARIMA parameter space, according to an embodiment of the present disclosure.
FIG. 6 is an exemplary graphical representation illustrating an evolution of the optimal SARIMA’s hyperparameter associated with a disease spread range in a plurality of countries, according to an embodiment of the present disclosure.
FIG. 7 is an exemplary graphical representation illustrating a complexity curves of the data associated with the disease spread range in the plurality of countries, according to an embodiment of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
There is a need to improve robustness and prediction accuracy of the time series data. Embodiments of the present disclosure provide a method and system for characterization of time series data in SARIMA parameter space. The embodiment of the present disclosure provides a grid search approach for hyper-parameter optimization of a seasonal autoregressive integrated moving model (SARIMA) for the time-series data analysis and to produce accurate and robust prediction. The embodiment of the present disclosure is capable of characterizing time-series curves in the SARIMA parameter space. An optimal SARIMA’s hyper-parameter of a seasonal autoregressive integrated moving average (SARIMA) model via one or more grid search from one or more initial parameters. The one or more grid search methods are (a) a global descent grid search, (b) a localized descent grid search, and (c) a component-wise descent grid search. The one or more grid search involves a step of computing one or more Akaike information criterion (AIC) values for one or more points in a neighborhood. The SARIMA model chooses one or more points with a minimum AIC value to obtain the optimal SARIMA’s hyper-parameters. Randomized search is built on top of the component-wise descent grid search and the global descent grid search to strike an optimal balance between efficiency and accuracy.
Referring now to the drawings, and more particularly to FIGS. 1 through 7, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG. 1 illustrates a block diagram of a system 100 for characterization of time series data in SARIMA parameter space, according to an embodiment of the present disclosure. In an embodiment, the system 100 includes one or more processor(s) 102, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 104 operatively coupled to the one or more processors 102. The memory 104 includes a database. The one or more processor(s) processor 102, the memory 104, and the I/O interface(s) 106 may be coupled by a system bus such as a system bus 108 or a similar mechanism. The one or more processor(s) 102 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more processor(s) 102 is configured to fetch and execute computer-readable instructions stored in the memory 104. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud, and the like.
The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface device(s) 106 may include a variety of software and hardware interfaces, for example, interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, a camera device, and a printer. Further, the I/O interface device(s) 106 may enable the system 100 to communicate with other devices, such as web servers and external databases. The I/O interface device(s) 106 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite. In an embodiment, the I/O interface device(s) 106 can include one or more ports for connecting number of devices to one another or to another server.
The memory 104 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, the memory 104 includes a module (s) 110 and a repository 112 for storing data processed, received, and generated by the module (s) 110. The module (s) 110 may include routines, programs, objects, components, data structures, and so on, which perform particular tasks or implement particular abstract data types.
Further, the database stores information pertaining to inputs fed to the system 100 and/or outputs generated by the system (e.g., data/output generated at each stage of the data processing) 100, specific to the methodology described herein. More specifically, the database stores information being processed at each step of the proposed methodology.
Additionally, the module (s) 110 may include programs or coded instructions that supplement applications and functions of the system 100. The repository 112, amongst other things, includes a system database 114 and other data 116. The other data 116 may include data generated as a result of the execution of one or more modules in the module (s) 110. Further, the database stores information pertaining to inputs fed to the system 100 and/or outputs generated by the system (e.g., at each stage), specific to the methodology described herein. Herein, the memory for example the memory 104 and the computer program code configured to, with the hardware processor for example the processor 102, causes the system 100 to perform various functions described herein under.
FIG. 2 illustrates a functional block diagram of the system 100 of FIG.1, according to some embodiments of the present disclosure. The system 200 may be an example of the system 100 (FIG. 1). In an example embodiment, the system 200 may be embodied in, or is in direct communication with the system, for example the system 100 (FIG. 1). The system 200 includes a seasonal autoregressive integrated moving average (SARIMA) hyperparameter optimizer 202, a seasonal autoregressive integrated moving average (SARIMA) model configuration unit 204, and a prediction unit 206. The seasonal autoregressive integrated moving average (SARIMA) hyperparameter optimizer 202 is configured to search for one or more optimal SARIMA hyperparameters with one or more minimal Akaike information criterion (AIC) values over the SARIMA hyperparameter space, using a portion of a time series data. For one or more initial parameters e.g.,(p_0,d_0,q_0,P_0,D_0,Q_0,S_0 ), one or more search operation is performed for a neighborhood of the initial parameter i.e., a grid search involves a step of computing one or more Akaike information criterion (AIC) values for one or more points in the neighborhood. The seasonal autoregressive integrated moving average (SARIMA) hyperparameter optimizer 202 chooses one or more points with a minimum AIC value to obtain one or more optimal SARIMA’s hyper-parameters e.g., (p^*,d^*,q^*,P^*,D^*,Q^*,S^* ). In an embodiment, the search process overhead is proportional to a size of the one or more neighborhoods i.e., the larger the neighborhood, the more costly the search becomes.
The seasonal autoregressive integrated moving average (SARIMA) model configuration unit 204 configures a SARIMA model based on the optimal hyper-parameters p^*,d^*,q^*,P^*,D^*,Q^*,S^* obtained via the grid search. In an embodiment, the grid search is an automated search method for the optimal hyper-parameter in the SARIMA parameter space that yields the minimum AIC value (i.e., a smallest prediction error). The SARIMA model predicts and update center p_0,d_0,q_0,P_0,D_0,Q_0,S_0 of the neighborhood at next iteration with the optimal hyper-parameters p^*,d^*,q^*,P^*,D^*,Q^*,S^*.
The SARIMA model is configured to predict one or more future data points of the time-series data. The SARIMA model includes a seven dimensional hyperparameter (p,d,q,P,D,Q,S) i.e., three trend (i.e., (p,d,q) and four seasonal elements (i.e., (P,D,Q,S)) of time-series, where p,P are number of trend and a seasonal autoregressive order, respectively, d,D are the trend and seasonal difference order respectively, and q,Q are the trend and seasonal number of moving average terms (i.e., lags of forecast errors), S is number of time lags for a single seasonal period. The seven-dimensional hyperparameter are non-negative integers, referred as the hyperparameters of the SARIMA model. The SARIMA model requires specifying trend and seasonal hyperparameters for the time series analysis of data. In an embodiment, the SARIMA model is a seven-dimensional discrete space in which each point in the SARIMA hyper parameter space corresponds to a particular configuration of the SARIMA model and this space is alternatively referred as a SARIMA model space. The optimal hyper-parameter of SARIMA model is as follows:
(p^*,d^*,q^*,P^*,D^*,Q^*,S^* )=argmin_((p,d,q,P,D,Q,S)?O) AIC(SARIMA(p,d,q,P,D,Q,S)) (1)

where O is the SARIMA hyperparameter space. The above-mentioned equation can be interpreted as seeking the optimal hyper-parameter (p^*,d^*,q^*,P^*,D^*,Q^*,S^* ) of SARIMA model over the SARIMA hyperparameter space (O) that minimizes the Akaike information criterion (AIC) value of the SARIMA model. Let |p|,|d|,|q|,|P|,|D|,|Q|,|S| be cardinalities of the SARIMA model. An overhead over for searching the optimal hyper-parameter of SARIMA model, the point in the search space that include the minimum AIC value, is proportional to:
|O|=|p|×|d|×|q|×|P|×|D|×|Q|×|S| (2)
A volume of the SARIMA hyperparameter space ? grows exponentially with the sizes of the hyperparameters.
FIG. 3 illustrates an exploded view of the SARIMA hyperparameter optimizer 202, according to some embodiments of the present disclosure. The SARIMA hyperparameter optimizer 202 further includes a randomized search unit 202A which further include a random selector 202B. The randomized search unit 202A highlights three grid search methods: a global descent grid search, a localized descent grid search, and a component-wise descent grid search. The randomized search is used to randomly select the global descent grid search, the localized component-wise descent grid search to balance accuracy and efficiency. The global descent grid search can attain a global optimal hyper-parameter at an expense of search cost, the component-wise descent search is more efficient than the global descent grid search. The localized descent grid search is most efficient which produces a suboptimal result.
The global descent grid search which searches the optimal hyper-parameter of SARIMA model over the entire hyper-parameter space ? by minimizing AIC (Akaike information criterion), and the function argmin returns the optimal hyperparameter ?(p?^*,d^*,q^*,P^*,D^*,Q^*,S^*) that minimizes AIC value. The overhead of the grid search is proportional to a cardinality product of all the involved feature dimensions, i.e., |p|×|d|×|q|×|P|×|D|×|Q|×|S|. The global descent grid search ensures the global optimum hyper-parameter point. Consider, below mentioned is an exemplary pseudo code for the global descent grid search:
Require: Trend parameter lists: p_L,d_L,q_L
Require: Seasonal parameter lists: P_L,D_L,Q_L,S_L
Require: Hyperparameter space: O= p_L×d_L×q_L×P_L×D_L×Q_L×S_L
Require: time series data: D_t
?(p?^*,d^*,q^*,P^*,D^*,Q^*,S^*)=?argmin?_O AIC(SARIMA(D_t ?; p?_L,d_L,q_L,P_L,D_L,Q_L,S_L)
Return ?(p?^*,d^*,q^*,P^*,D^*,Q^*,S^*)
The component-wise descent grid search in which the hyper-parameter optimization is also referred as a coordinate-wise descent algorithm. The component-wise descent grid search provides component-wise search for the optimal hyperparameter of SARIMA model, which successively minimizes AIC along coordinate directions to find point with the minimum AIC value. At each iteration, the algorithm chooses a coordinate (i.e., a parameter) in a cyclic fashion, then search the optimal hyperparameter over the corresponding coordinate while fixing all other coordinates (i.e., parameters). The overhead of coordinate-wise hyper-parameter optimization is proportional to a cardinality sum of all each involved multiple coordinates. For example, |p|+|d|+|q|+|P|+|D|+|Q|+|S|, which is much cheaper than the overhead of the global descent grid search. The coordinate-wise search algorithm performs more efficiently than the global descent grid search but may yield suboptimal solutions. Consider, below mentioned is an exemplary pseudo code for the component-wise descent grid search:
Require: Trend parameter lists: p_L,d_L,q_L
Require: Seasonal parameter lists: P_L,D_L,Q_L,S_L
Require: Hyperparameter space: O= p_L×d_L×q_L×P_L×D_L×Q_L×S_L
Require: time series data: D_t
Require: Coordinate List: L=(p_L,d_L,q_L,P_L,D_L,Q_L,S_L)
Require: Coordinate point: P=(p,d,q,P,D,Q,S)
Initialize:p=p_L [0],d=d_L [0],q=q_L [0],P=P_L [0],D=D_L [0],Q=Q_L [0],s=S_L [0]
while i<|L|:
x=?argmin?_(P[i]?L[i]) AIC(SARIMA(D_(t; ) p,d,q,P,D,Q,S))
P[i]?x
p=p_L [i],d=d_L [i],q=q_L [i],P=P_L [i],D=D_L [i],Q=Q_L [i],s=S_L [i]
i?i+1
end while
Return P
The localized component-wise descent grid search assumes that two consecutive optimal hyperparameter points should very close to each other. Considers one or more previous optimal points (p_p,d_p,q_p,P_p,D_p,Q_p,S_p ) as one or more inputs, generates a small grid neighborhood surrounding the one or more previous optimal points and then performs exhaustive search over the generated small grid neighborhood. The overhead of the localized component-wise descent grid search is proportional to volume of the generated grid |p_n |×|d_n |×|q_n |×|P_n |×|D_n |×|Q_n |×|S_n |, which is much smaller than that of the global descent grid search. The localized component-wise descent grid search performs well for low noisy time series data but may generates larger prediction error when the underlying data contains large number of spiky noises. Consider, below mentioned is an exemplary pseudo code for the localized component-wise descent grid search:
Require: previous optimal trend points: p_p,d_p,q_p
Require: previous seasonal parameter points: P_p,D_p,Q_p,S_p
Require: time series data: D_t
?(p?_n,d_n,q_n,P_n,D_n,Q_n,S_(n))=neighborhood(p_p;d_p,q_p,P_p,D_p,Q_p,S_p)
?p=p?_p,d=d_p,q= q_p,P=P_p,D=D_p,Q=Q_p,S=S_p
O^n=p_n×d_n×q_n×P_n×D_n×Q_n×S_n
?(p?^*,d^*,q^*,P^*,D^*,Q^*,S^*)=?argmin?_((p,d,q,P,D,Q,S)?O^n ) AIC(SARIMA(D_(t; ) p,d,q,P,D,Q,S)
Return ?(p?^*,d^*,q^*,P^*,D^*,Q^*,S^*)
The randomized search unit 202A serves as the random selector 202B to dynamically select the global descent grid search, the component-wise descent grid search, and the localized descent grid search to obtain an optimal balance between robustness, efficiency, and prediction accuracy. The randomized search unit 202A combines the global descent grid search, the component-wise descent grid search, and the localized grid descent search into a randomized framework in which each constituent search method is executed probabilistically in a nondeterministic fashion. For example, considering COVID time series data analysis, the randomized search algorithm is configurated as follows: the localized descent grid search with 75% chance, the coordinate-wise descent grid search with 20% chance, and the global descent grid search with 5% chance to invoke, which strikes an optimal balance between optimality and efficiency. Consider, below mentioned is an exemplary pseudo code for the randomized search:
Require: time series dataset: D_t
Require: previous seasonal parameter points: ?_1=0.05,?_(2=) 0.25
Require: time series data: D_t
n= random generator (0,1)
if n< ?_(1 ) then
Global Search {Algorithm 1}
end if
If ?_1=n < ?_(2 ) then
Coordinate-wise search (Algorithm 2}
end if
if ?n=??_2then
Localized search (algorithm 3)
end if
The optimal hyperparameters p^*,d^*,q^*,P^*,D^*,Q^*,S^* in the SARIMA parameter space to characterize and compare complexity of the time series data. For example, information needed in combating a COVID disease such as whether the time series data contains a seasonal component and what is time interval between a first wave and a second wave of the COVID disease.
FIG. 4A is an exemplary graphical representation illustrating a trajectory of the global descent grid search, according to some embodiments of the present disclosure. The trajectory of the global grid descent search is a line from a starting point to the optimal hyper-parameter point. FIG. 4B is an exemplary graphical representation illustrating a trajectory of the component-wise descent grid search, according to some embodiments of the present disclosure. The trajectory of the component-wise descent grid search follows an alternating-direction. FIG. 4C is an exemplary graphical representation illustrating a trajectory of the localized grid descent search respectively, according to some embodiments of the present disclosure. The trajectory of the localized descent grid search is performed in a limited parameter space, which yields a local optimized result.
FIG. 5 is an exemplary flow diagram illustrating method 500 of characterizing of the time series data in the SARIMA parameter space, according to an embodiment of the present disclosure. In an embodiment, the system 100 comprises one or more data storage devices or the memory 104 operatively coupled to the one or more hardware processors 102 and is configured to store instructions for execution of steps of the method by the one or more processors 102. The flow diagram depicted is better understood by way of following explanation/description. The steps of the method of the present disclosure will now be explained with reference to the components of the system as depicted in FIGS. 1 and 2.
At step 502, one or more initial parameters from a time series data are received as an input. At step 504, one or more randomized search is performed in a neighborhood of the one or more initial parameters to obtain one or more optimal seasonal autoregressive integrated moving (SARIMA) hyperparameters. The one or more randomized search corresponds to one or more of: (a) a global descent grid search, (b) a component-wise descent grid search, and (c) a localized grid descent search. The step of obtaining the one or more optimal SARIMA hyperparameters further include: (a) at step 504A, one or more Akaike information criterion (AIC) values is computed for one or more points in the neighborhood, and (b) at step 504B, one or more point is selected with a minimum AIC value from the one or more AIC values as the one or more optimal SARIMA hyperparameters. At step 506, characterizing, via the one or more hardware processors, the one or more optimal SARIMA hyperparameters is characterized for deriving a seasonal autoregressive integrated moving (SARIMA) model to obtain one or more predicted data associated with the of the time-series data. The predicted data corresponds to one or more future data points of the time-series data. The SARIMA model corresponds to a seven-dimensional hyperparameter space which include one or more trend elements (p,d,q) and one or more seasonal elements (P,D,Q,S) of the time-series data. The localized grid descent search considers one or more previous optimal points as an input to generate a grid surrounding the previous optimal point and performing an exhaustive search over the generated grid.
Experimental results:
For example, a study is conducted to highlight the characterization of predicted data associated with the disease spread range in a plurality of countries i.e., a COUNTRY ‘A’, a COUNTRY ‘B’, and a COUNTRY ‘C’. FIG. 6 is an exemplary graphical representation illustrating the evolution of the optimal SARIMA’s hyperparameter associated with the disease (e.g., a COVID disease) spread range, according to an embodiment of the present disclosure. FIG. 7 is an exemplary graphical representation illustrating a complexity curve of the data associated with the disease spread range in the plurality of countries, according to an embodiment of the present disclosure. A complexity metric of the time series data is defined as a sum of all optimal trend parameters and optimal seasonal parameters as follows:
C_i=p_i^*+d_i^*+q_i^*+P_i^*+D_i^*+Q_i^*+S_i^*, 0=i=n
where i denotes the index of time.
Using the complexity metric, the average complexities of COUNTRY ‘A’, COUNTRY ‘B’, and COUNTRY ‘C’ with daily new cases are computed as 7.38, 7.05, and 9.06, respectively. The result is in line with a visual inspection that the COUNTRY ‘A’ and the COUNTRY ‘B’ daily new cases are more complicated than that of COUNTRY ‘C’ daily new cases. While indistinguishable in the visual inspection, subtle difference between the COUNTRY ‘A’ and the COUNTRY ‘B’ daily new case becomes apparent in the context of complexity analysis in the SARIMA hyperparameter space: the COUNTRY ‘A’ daily new case is more complicated than COUNTRY ‘B’ daily new case. Overall, the complexity ranking of the COUNTRY ‘A’, the Country ‘B’, and the Country ‘C’ is as follows: C_C=9.06>C_A=7.38>C_B=7.05 as ranking reflects to some degree that the COVID pandemic in the COUNTRY ‘C’ shows no signs of easing, the COUNTRY ‘B’ daily trend starts dropping after recent surges, the COUNTRY ‘A’, and the COUNTRY ‘B’ daily trend have shown declines in new cases.
The embodiment of present disclosure herein addresses unresolved problem of improving a robustness and a prediction accuracy of the time series data. The embodiment thus provides a system and method of dynamic configuration and optimization of the hyper-parameters which have a robustness and a prediction accuracy of time series data e.g., prediction in sale forecasting, and a stock price. The optimal hyper-parameter is sought for every step of the prediction and as a result, the optimal hyper-parameter is determined by an actual data, rather than predefined by a developer. The embodiment of present disclosure does not requires specifying a parameter space in advance and provides an effective means of striking an optimal balance between a computing resource required parameter search and the prediction accuracy. The main advantage of the proposed method is to automate the search for optimal hyper-parameter of the SARIMA model without requiring specification of the hyper-parameter and to strike an optimal balance between efficiency of the search and performance. The randomized search over SARIMA model to strike a good balance between robustness, efficiency, and accuracy. In addition, the hyper-parameters of the SARIMA are dynamically changed to minimize AIC metric, rather than using fixed hyper-parameters. The optimal hyper-parameter values (e.g., 7-dimensional parameters) to measure the complexity of time series data.
The embodiment of present disclosure provides an automated search method, which is fundamentally different from the manual one. The claimed method considers two factors: efficiency and the performance, rather than one single performance factor. The embodiment of present disclosure provides a search model that does not assume about a constant seasonal period and are capable to identify the optimal hyper-parameter of the SARIMA model (p*, d*, q*, P*, D*, Q*, S*) over (p, d, q, P, D, Q, S) space. The global descent grid search over the seven-dimensional hyper-parameter space (p, d, q, P, D, Q, S) where p is trend autoregression order, d trend difference order, q trend moving average order, P seasonal autoregression order, D seasonal difference order, Q seasonal moving average order, and S is number of time steps for a single seasonal period. The component-wise descent grid search is performed by alternating one-dimensional search over the seven-dimensional space. An optimal combination of the global and the component-wise descent grid search is to strike an optimal balance between the efficiency, performance, and the accuracy. The SARIMA model needs to be configured with parameters estimated from a historical data. The claimed method dynamically updates the hyperparameter of the SARIMA model at every step during a data processing, rather than using the fixed parameters for the entire dataset. The claimed method derives the optimal parameters based on the current dataset.
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims. , Claims: We Claim:
1. A processor implemented method (500), comprising:
receiving, via one or more hardware processors, a plurality of initial parameters from a time series data as an input (502);
performing, via the one or more hardware processors, a plurality of randomized search in a neighborhood of the plurality of initial parameters to obtain a plurality of optimal seasonal autoregressive integrated moving (SARIMA) hyperparameters (504), wherein the step of obtaining the plurality of optimal SARIMA hyperparameters further comprising:
(a) computing, via the one or more hardware processors, a plurality of Akaike information criterion (AIC) values for one or more points in the neighborhood (504A), and
(b) selecting, via the one or more hardware processors, at least one point with a minimum AIC value from the plurality of AIC values as the plurality of optimal SARIMA hyperparameters (504B); and
characterizing, via the one or more hardware processors, the plurality of optimal SARIMA hyperparameters for deriving a seasonal autoregressive integrated moving (SARIMA) model to obtain at least one predicted data associated with the of the time-series data, wherein the predicted data corresponds to at least one future data point of the time-series data (506).

2. The processor implemented method (500) as claimed in claim 1, wherein the plurality of randomized search corresponds to one or more of: (a) a global descent grid search, (b) a component-wise descent grid search, and (c) a localized grid descent search.

3. The processor implemented method (500) as claimed in claim 1, wherein the SARIMA model corresponds to a seven-dimensional hyperparameter space comprises a plurality of trend elements (p,d,q) and a plurality of seasonal elements (P,D,Q,S) of the time-series data.

4. The processor implemented method (500) as claimed in claim 1, wherein the localized grid descent search considers a plurality of previous optimal points as an input to generate a grid surrounding the previous optimal point and performing an exhaustive search over the generated grid.

5. A system (100), comprising:
a memory (104) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (102) coupled to the memory (104) via the one or more communication interfaces (106), wherein the one or more hardware processors (102) are configured by the instructions to:
receive, a plurality of initial parameters from a time series data as an input;
perform, a plurality of randomized search in a neighborhood of the plurality of initial parameters to obtain a plurality of optimal seasonal autoregressive integrated moving (SARIMA) hyperparameters, wherein the one or more hardware processors (102) are further configured by the instructions to obtain the plurality of optimal SARIMA hyperparameters, comprises:
(a) compute, a plurality of Akaike information criterion (AIC) values for one or more points in the neighborhood, and
(b) select, at least one point with a minimum AIC value from the plurality of AIC values as the plurality of optimal SARIMA hyperparameters; and
characterize, the plurality of optimal SARIMA hyperparameters for deriving a seasonal autoregressive integrated moving (SARIMA) model to obtain at least one predicted data associated with the of the time-series data, wherein the predicted data corresponds to at least one future data point of the time-series data.

6. The system (100) as claimed in claim 5, wherein the plurality of randomized search corresponds to one or more of: (a) a global descent grid search, (b) a component-wise descent grid search, and (c) a localized grid descent search.

7. The system (100) as claimed in claim 5, wherein the SARIMA model corresponds to a seven-dimensional hyperparameter space comprises a plurality of trend elements (p,d,q) and a plurality of seasonal elements (P,D,Q,S) of the time-series data.

8. The system (100) as claimed in claim 5, wherein the localized grid descent search considers a plurality of previous optimal points as an input to generate a grid surrounding the previous optimal point and performing an exhaustive search over the generated grid.

Dated this 25th day of April 2022
Tata Consultancy Services Limited
By their Agent & Attorney

(Adheesh Nargolkar)
of Khaitan & Co
Reg No IN-PA-1086

Documents

Application Documents

# Name Date
1 202221024307-STATEMENT OF UNDERTAKING (FORM 3) [25-04-2022(online)].pdf 2022-04-25
2 202221024307-REQUEST FOR EXAMINATION (FORM-18) [25-04-2022(online)].pdf 2022-04-25
3 202221024307-FORM 18 [25-04-2022(online)].pdf 2022-04-25
4 202221024307-FORM 1 [25-04-2022(online)].pdf 2022-04-25
5 202221024307-FIGURE OF ABSTRACT [25-04-2022(online)].jpg 2022-04-25
6 202221024307-DRAWINGS [25-04-2022(online)].pdf 2022-04-25
7 202221024307-DECLARATION OF INVENTORSHIP (FORM 5) [25-04-2022(online)].pdf 2022-04-25
8 202221024307-COMPLETE SPECIFICATION [25-04-2022(online)].pdf 2022-04-25
9 202221024307-Proof of Right [31-05-2022(online)].pdf 2022-05-31
10 202221024307-FORM-26 [23-06-2022(online)].pdf 2022-06-23
11 Abstract1.jpg 2022-08-04
12 202221024307-FER.pdf 2025-04-02
13 202221024307-FER_SER_REPLY [11-09-2025(online)].pdf 2025-09-11
14 202221024307-CLAIMS [11-09-2025(online)].pdf 2025-09-11

Search Strategy

1 SearchHistoryE_26-02-2024.pdf