Sign In to Follow Application
View All Documents & Correspondence

Method And System For Optimized Model Fitment To Hierarchical Time Series Clusters By Categorizing Time Series

Abstract: Time efficient model fitting while achieving maximum forecasting accuracy with optimal fit model is a challenge. The embodiments herein provide a method and system for optimized model fitment to hierarchical time-series cluster by categorizing time-series in a plurality of segments prior to generating the hierarchical time-series clusters. Further, for each of the segment a set of most appropriate forecasting models are identified and each segment is then structured as hierarchical time-series clusters. Thus, during the process of model fitting, the method identifies optimal fit model for a time-series from only the appropriate forecasting model set identified for the segment to which the time-series is categorized. Further, the method disclosed also details on model fitting approach across time-series within a cluster, wherein the method refers to criteria for prioritizing classes of forecasting models within the set of forecasting models for faster and more accurate model fitting.

Get Free WhatsApp Updates!
Notices, Deadlines & Correspondence

Patent Information

Application #
Filing Date
28 August 2019
Publication Number
10/2021
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
ip@legasis.in
Parent Application

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point, Mumbai - 400021, Maharashtra, India

Inventors

1. SINGH, Brajesh
Tata Consultancy Services Limited, Plot No.54, ODC II, Santacruz Electronic Export Processing Zone, Andheri (East), Mumbai - 400096, Maharashtra, India
2. SHETTY, Sachin
Tata Consultancy Services Limited, 1st to 13th floors, Kensington 'B' Wing - SEZ, Hiranandani Business Park, Powai, Mumbai - 400076, Maharashtra, India

Specification

Claims:
1. A processor implemented method for optimized model fitment to hierarchical time-series cluster by categorizing time-series, the method comprising:
receiving a plurality of time-series associated with product demand forecasting (202), wherein the plurality of time-series have non-identical characteristics;
categorizing a time-series among the plurality of time-series into a segment among a plurality of segments based on one or more preset conditions satisfied by the time-series (204), wherein the plurality of segments comprises a non-forecastable series, a new series, a phasing out series, a sporadic series, a high value volatile series, a high value stable series, a low value volatile series and a low value stable series with each segment comprising a set of time series among the plurality of time series, wherein the one or more preset conditions are verified in a predefined sequence, and wherein the categorizing of the plurality of time-series comprises:
determining whether a Coefficient of Variation (CoV) of demand of the time-series is greater than a first preset value, wherein the time-series is categorized as the non-forecastable series if the CoV of demand is greater than the first preset value;
determining whether number of active periods in the time-series, having the CoV of demand less than or equal to the first preset value, is less than a second preset value corresponding to percentage of length of the time-series, wherein the time-series is categorized as the non-forecastable series if the number of active periods in the time-series is less than the second preset value;
determining whether a first demand of product derived from latest data values present at an end segment of the time-series, having the number of active periods greater than or equal to the second preset value, is equal to a total demand, of product, wherein the time-series is categorized as the new series if the first demand of the product is equal to the total demand of product;
determining whether a second demand of product derived from data values present across larger time segment of the time-series as compared to the end segment, when the time-series is not the new product time-series, is less than a third preset value corresponding to percentage of the total demand, wherein the time-series is categorized as the phasing out series if the second demand of product is less than the third preset value corresponding to percentage of the total demand;
determining whether the number of active periods in the time-series, having the second demand of product greater than or equal to the third preset value, is greater than a fourth preset value corresponding to percentage of length of the time-series, wherein the time-series is categorized as the sporadic series if the number of active periods in the time-series is less than or equal to the fourth preset value;
determining whether a value of demand of the time-series, having the number of active periods greater than the fourth preset value, is greater than a fifth preset value, wherein the time-series is categorized as a high value time-series if the value of demand is greater than the fifth preset value and is categorized a low value time-series if the value of demand is less than or equal to the fifth predefined value;
determining whether the CoV of demand of the high value time-series is greater than a sixth preset value, wherein the high value time-series is categorized as the high value volatile series when the CoV of demand is greater than the sixth predefined value, and wherein the high value time-series is categorized as the high value stable series when the CoV of demand is less than or equal to the sixth preset value; and
determining whether the CoV of demand of the low value time-series is greater than the sixth preset value, wherein the low value time-series is categorized as the low value volatile series when the CoV of demand is greater than the sixth preset value, and wherein the low value time-series is categorized as the low value stable series when the CoV of demand is less than or equal to the sixth preset value;
applying data treatment to each time-series among the set of time-series in each segment among the plurality of segments (206), wherein the data treatment is specific to the segment to which the time-series is categorized; and
repeating a plurality of steps for each segment, the plurality of steps comprising:
identifying a set of forecasting models applicable to the segment that are optimal fit for a plurality of characteristics of the set of time-series in the segment (208), wherein the set of forecasting models belong to a plurality of classes, with each class having a plurality of models;
clustering the set of time-series in the segment into hierarchical time-series clusters, arranged as a plurality of clusters into a plurality of branches, and based on Dynamic Time Warping (DTW) distance measure (210);
tagging a set of identifiers to each time-series among the set of time-series across the plurality of segments (212), wherein the set of identifiers comprises a segment identifier based on the segment the time-series is categorized into, a cluster identifier representing a cluster among the plurality of clusters the time-series belongs to within the hierarchical time-series clusters, the set of forecasting models identified for the corresponding segment and a Cluster level Error Tolerance (ET) for the cluster; and
applying model fitting to the hierarchical time-series clusters to determine an optimal fit model for each time-series in each segment based on an optimal fit model technique in accordance with the tagged set of identifiers (214).
2. The method as claimed in claim 1, wherein the data treatment specific to the segment comprises:
a) trimming leading zeroes or nulls, average filling of missing values and refrain from applying outlier treatment if the segment is the non- forecastable series or the phasing out series;
b) trimming leading zeroes or nulls, applying pattern based missing value fill and applying pattern based outlier treatment if the segment is the new series;
c) trimming leading zeroes or nulls, replacing the missing values with zeroes and refrain from applying outlier treatment if the segment is the sporadic series;
d) trimming leading zeroes or nulls, applying pattern based missing value fill and applying pattern based outlier treatment with a sigma value greater or equal to a preset sigma threshold if the segment is the high value volatile series and the low value volatile series; and
e) trimming leading zeroes or nulls, applying pattern based missing value fill and applying pattern based outlier treatment with a sigma value lower than a the preset sigma threshold if the segment is the high value stable series and the low value stable series.
3. The method as claimed in claim 2, further comprises preprocessing each time-series of the set of time series of each segment prior to clustering, wherein the preprocessing comprises:
applying data treatment to the data values of each time-series to reduce noise and enable identifying underlying pattern of each time-series while clustering time-series in each segment into hierarchical clusters;
normalizing the data values of each time-series to bring the data values to a common scale; and
identifying business requirements for forecasting frequency, wherein based on the business requirements time-series for Daily, Weekly, Monthly frequencies are clustered separately.
4. The method as claimed in claim 1, wherein the optimal fit model technique applied to determine the optimal fit model for each time-series in the set of time-series in the segment, arranged into the hierarchical time-series clusters comprises repeating a plurality of steps for every branch among the plurality of branches of the hierarchical time-series clusters of the segment, wherein the plurality of steps comprise:
identifying a first time-series from a cluster, among the plurality of clusters, placed at a cluster height equal to a lowest cluster height of the branch, wherein the first series is identified based on the segment identifier, the cluster identifier and the cluster level ET tagged to the first time-series (402);
determining a first optimal fit model for the first time-series from a first class identified from the plurality of classes in the set of forecasting models, wherein the determined first optimal fit model of the first class provides an error below the cluster level ET (404);
repeating process of determining optimal model fit for each time-series in the plurality of clusters in each branch among the plurality of branches until the optimal fit model for each time-series in a top cluster, among the plurality of clusters of the branch, placed at a topmost cluster height is completed (406), wherein the process of determining the optimal fit model comprises:
determining a second optimal fit model for a second time-series of the cluster, wherein the second optimal model satisfies a condition among a plurality of fitment conditions when the plurality of conditions are tested in a sequence (408), wherein the plurality of conditions and the sequence of testing comprises:
a) select the first optimal fit model determined for the first time-series as the second optimal fit model:
if an error, when the first optimal fit model is applied to the second time-series, is below the cluster level ET; or
if the error, when the first optimal fit model is applied to the second time-series, is above the cluster level ET and an error difference (ED) between the error for the first time-series and the error for the second time-series is below an Error Difference (ED) threshold;
b) select a next optimal fit model from the first class, which provides the error below the cluster level ET when the next optimal fit model is applied to the second time-series;
c) select an adjacent optimal fit model from a second class among the plurality of classes of the set of forecasting models identified for the segment, which provides the error below the cluster level ET when the optimal fit model from the second class is applied to the second time-series;
d) an least error optimal fit model from a class among the plurality of classes of the set of forecasting models which provides least value of the error for the cluster level ET; and
applying the second optimal fit model to a first series of a next cluster among the plurality of clusters (410), wherein the next cluster is placed at a next successive cluster height to the cluster and the second time-series is the first series of the next cluster.

5. A system (100) for optimized model fitment to hierarchical time-series cluster by categorizing time-series, the system (100) comprising:
a memory (102) storing instructions;
one or more Input/Output (I/O) interfaces (106); and
processor(s) (104) coupled to the memory (102) via the one or more I/O interfaces (106), wherein the processor(s) (104) is configured by the instructions to:
receive a plurality of time-series associated with product demand forecasting, wherein the plurality of time-series have non-identical characteristics;
categorize a time-series among the plurality of time-series into a segment among a plurality of segments based on one or more preset conditions satisfied by the time-series, wherein the plurality of segments comprise a non- forecastable series, a new series, a phasing out series, a sporadic series, a high value volatile series, a high value stable series, a low value volatile series and a low value stable series with each segment comprising a set of time series among the plurality of time series, wherein the one or more preset conditions are verified in a predefined sequence, and wherein the categorizing of the plurality of time-series comprises:
determining whether a Coefficient of Variation (CoV) of demand of the time-series is greater than a first preset value, wherein the time-series is categorized as the non-forecastable series if the CoV of demand is greater than the first preset value;
determining whether number of active periods in the time-series, having the CoV of demand less than or equal to the first preset value, is greater or less than a second preset value corresponding to percentage of length of the time-series, wherein the time-series is categorized as the non-forecastable series if the number of active periods in the time-series is less than the second preset value;
determining whether a first demand of product derived from latest data values present at an end segment of the time-series, having the number of active periods greater than or equal to the second preset value, is equal to a total demand, of product, wherein the time-series is categorized as the new series if the first demand of the product is equal to the total demand of product;
determining whether a second demand of product derived from data values present across larger time segment of the time-series as compared to the end segment, when the time-series is not the new product time-series, is less than a third preset value corresponding to percentage of the total demand, wherein the time-series is categorized as the phasing out series if the second demand of product is less than the third preset value corresponding to percentage of the total demand;
determining whether the number of active periods in the time-series, having the second demand of product greater than or equal to the third preset value, is greater than a fourth preset value corresponding to percentage of length of the time-series, wherein the time-series is categorized as the sporadic series if the number of active periods in the time-series is less or than equal to the fourth preset value;
determining whether a value of demand of the time-series, having the number of active periods greater than the fourth preset value, is greater than a fifth preset value, wherein the time-series is categorized as a high value time-series if the value of demand is greater than the fifth preset value and is categorized a low value time-series if the value of demand is less than or equal to the fifth predefined value;
determining whether the CoV of demand of the high value time-series is greater than a sixth preset value, wherein the high value time-series is categorized as the high value volatile series when the CoV of demand is greater than the sixth predefined value, and wherein the high value time-series is categorized as the high value stable series when the CoV of demand is less than or equal to the sixth preset value; and
determining whether the CoV of demand of the low value time-series is greater than the sixth preset value, wherein the low value time-series is categorized as the low value volatile series when the CoV of demand is greater than the sixth preset value, and wherein the low value time-series is categorized as the low value stable series when the CoV of demand is less than or equal to the sixth preset value;
apply data treatment to each time-series among the set of time-series in each segment among the plurality of segments, wherein the data treatment is specific to the segment to which the time-series is categorized; and
repeat a plurality of steps for each segment, the plurality of steps comprising:
identify a set of forecasting models applicable to the segment that are optimal fit for a plurality of characteristics of the set of time-series in the segment, wherein the set of forecasting models belong to a plurality of classes, with each class having a plurality of models;
cluster the set of time-series in the segment into hierarchical time-series clusters, arranged as a plurality of clusters into a plurality of branches, and based on Dynamic Time Warping (DTW) distance measure;
tag a set of identifiers to each time-series among the set of time-series across the plurality of segments, wherein the set of identifier comprises a segment identifier based on the segment the time-series is categorized into, a cluster identifier representing a cluster among the plurality of clusters the time-series belongs to within the hierarchical time-series clusters, the set of forecasting models identified for the corresponding segment and a Cluster level Error Tolerance (ET) for the cluster; and
apply model fitting to the hierarchical time-series clusters to determine an optimal fit model for each time-series in each segment based on an optimal fit model technique in accordance with the tagged set of identifiers.
6. The system as claimed in claim 5, wherein the data treatment specific to the segment comprises:
a) trimming leading zeroes or nulls, average filling of missing values and refrain from applying outlier treatment if the segment is the non- forecastable series or the phasing out series;
b) trimming leading zeroes or nulls, applying pattern based missing value fill and applying pattern based outlier treatment if the segment is the new series;
c) trimming leading zeroes or nulls, replacing the missing values with zeroes and refrain from applying outlier treatment if the segment is the sporadic series;
d) trimming leading zeroes or nulls, applying pattern based missing value fill and applying pattern based outlier treatment with a sigma value greater or equal to a preset sigma threshold if the segment is the high value volatile series and the low value volatile series; and
e) trimming leading zeroes or nulls, applying pattern based missing value fill and applying pattern based outlier treatment with a sigma value lower than a the preset sigma threshold if the segment is the high value stable series and the low value stable series.
7. The system as claimed in claim 6, wherein the processor(s) 104 is further configured to preprocess each time-series of the set of time-series of each segment prior to clustering by:
applying data treatment to the data values of each time-series to reduce noise and enable identifying underlying pattern of each time-series while clustering time-series in each segment into hierarchical clusters;
normalizing the data values of each time-series to bring the data values to a common scale; and
identifying business requirements for forecasting frequency, wherein based on the business requirements time-series for Daily, Weekly, Monthly frequencies are clustered separately.
8. The system as claimed in claim 5, wherein the optimal fit model technique applied by the processor (s) to determine the optimal fit model for each time-series in the set of time-series in the segment, arranged into the hierarchical time-series clusters comprises repeating a plurality of steps for every branch among the plurality of branches of the hierarchical time-series clusters of the segment, wherein the plurality of steps comprise:
identifying a first time-series from a cluster, among the plurality of clusters, placed at a cluster height equal to a lowest cluster height of the branch, wherein the first series is identified based on the segment identifier, the cluster identifier and the cluster level ET tagged to the first time-series;
determining a first optimal fit model for the first time-series from a first class identified from the plurality of classes in the set of forecasting models, wherein the determined first optimal fit model of the first class provides an error below the cluster level ET;
repeating process of determining optimal model fit for each time-series in the plurality of clusters in each branch among the plurality of branches until the optimal fit model for each time-series in a top cluster, among the plurality of clusters of the branch, placed at a topmost cluster height is completed, wherein the process of determining the optimal fit model comprises:
determining a second optimal fit model for a second time-series of the cluster, wherein the second optimal model satisfies a condition among a plurality of fitment conditions when the plurality of conditions are tested in a sequence, wherein the plurality of conditions and the sequence of testing comprises:
a) select the first optimal fit model determined for the first time-series as the second optimal fit model:
if an error, when the first optimal fit model is applied to the second time-series, is below the cluster level ET; or
if the error, when the first optimal fit model is applied to the second time-series, is above the cluster level ET and an error difference (ED) between the error for the first time-series and the error for the second time-series is below an Error Difference (ED) threshold;
b) select a next optimal fit model from the first class, which provides the error below the cluster level ET when the next optimal fit model is applied to the second time-series;
c) select an adjacent optimal fit model from a second class among the plurality of classes of the set of forecasting models identified for the segment, which provides the error below the cluster level ET when the optimal fit model from the second class is applied to the second time-series;
d) an least error optimal fit model from a class among the plurality of classes of the set of forecasting models which provides least value of the error for the cluster level ET; and
applying the second optimal fit model to a first series of a next cluster among the plurality of clusters, wherein the next cluster is placed at a next successive cluster height to the cluster and the second time-series is the first series of the next cluster.
, Description:FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:
METHOD AND SYSTEM FOR OPTIMIZED MODEL FITMENT TO HIERARCHICAL TIME-SERIES CLUSTERS BY CATEGORIZING TIME-SERIES

Applicant:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India

The following specification particularly describes the invention and the manner in which it is to be performed.
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY

The present application is a patent of addition of Indian Patent Application No. 201721028534, filed on 10 August 2017, the entire content of which is hereby incorporated herein by way of reference.

TECHNICAL FIELD
The embodiments herein generally relate to the field of time-series clusters and, more particularly, to optimized model fitment to hierarchical time-series clusters by categorizing time-series.

BACKGROUND
Time-series forecasting is crucial in deriving future insights and planning for many application areas such as energy, sales in retail and its supply chain and traffic management and the like. Huge volume of time-series data is gathered from multiple source points to derive future insights for specific application domain by generating forecasts. Selecting a best fit model that can help identify patterns in any given data and then predict data behavior, can be a daunting task. Specially, when there is high data volume and the data is not categorized or labeled as per inherent patterns. Time-series data gathered for product demand forecasting, wherein each time-series hardly has any identical characteristics with other is one typical example. Time-series data by its very nature is not a single data point but carries additional pattern information as part of the sequence of data points. Across different industries, varying patterns may be observed in time-series specific to the entity’s production, demand, marketing, competitor activity and such aspects. However, at a purely data level, commonly observed patterns in a time-series are Intermittency, Trend, Seasonality and Noise (Random). The interaction between these various components forms the basis of various forecasting models. Typically, in any given forecasting problem in a specific industry, multiple such combinations in the data are observed. As can be envisaged, a single model may not provide the best accuracy. However, selecting best-fit model or for each time-series separately can put a lot of pressure on available resources. For this reason, most of the time, sub-quality models like Weighted Moving Average, are deployed as they can be built quickly.
Some approaches have been worked upon to find most appropriate fitting model for a group of time-series, rather than identifying a best fit model for each time-series. Some existing methods classify the time-series data into classes or groups based on characteristics of each time-series and tag a set of appropriate forecasting models to each group. However, the groups so formed are limited. Further the approach discussed limits only to categorization or segmentation, and further mostly employs conventional techniques of model fitting to identify best fit model for each time-series in a group. Thus may not contribute in providing time efficient model fitting approach. A considerable research been carried out on classifier and neural network based time-series analysis for forecasting, but these approaches are computationally intensive and practically challenging with aspect of ease of implementation.
The Applicant has addressed concerns and limitations in the art, in applicant’s Indian patent application No. 201721028534, filed on 10 August 2017 by providing bottom to top model fitting approach to a time-series data arranged as hierarchical time-series clusters, wherein optimal models were identified for individual clusters in accordance to a bottom to top model fitting approach based on a preset criteria. However, further additions and refinements to the approach are not discussed that can improve the time efficiency of the bottom to top model fitting approach. Moreover, in certain applications of forecasting time efficient and accurate forecasting both need to be achieved at the same time.

SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for optimized model fitment to hierarchical time-series cluster by categorizing time-series is provided. The method comprises receiving a plurality of time-series associated with product demand forecasting, wherein the plurality of time-series have non-identical characteristics. Further comprises categorizing a time-series among the plurality of time-series into a segment among a plurality of segments based on one or more preset conditions satisfied by the time-series. The plurality of segments comprise a non-forecastable series, a new series, a phasing out series, a sporadic series, a high value volatile series, a high value stable series, a low value volatile series and a low value stable series with each segment comprising a set of time series among the plurality of time series, wherein the one or more preset conditions are verified in a predefined sequence. The categorizing of the plurality of time-series comprises: determining whether a Coefficient of Variation (CoV) of demand of the time-series is greater than- a first preset value, wherein the time-series is categorized as the non-forecastable series if the CoV of demand is greater than the first preset value; determining whether number of active periods in the time-series, having the CoV of demand less than or equal to the first preset value, is less than a second preset value corresponding to percentage of length of the time-series, wherein the time-series is categorized as the non-forecastable series if the number of active periods in the time-series is less than the second preset value; determining whether a first demand of product derived from latest data values present at an end segment of the time-series, having the number of active periods greater than or equal to the second preset value, is equal to a total demand, of product, wherein the time-series is categorized as the new series if the first demand of the product is equal to the total demand of product; determining whether a second demand of product derived from data values present across larger time segment of the time-series as compared to the end segment, when the time-series is not the new product time-series, is less than a third preset value corresponding to percentage of the total demand, wherein the time-series is categorized as the phasing out series if the second demand of product is less than the third preset value corresponding to percentage of the total demand; determining whether the number of active periods in the time-series, having the second demand of product greater than or equal to the third preset value, is greater than a fourth preset value corresponding to percentage of length of the time-series, wherein the time-series is categorized as the sporadic series if the number of active periods in the time-series is less than or equal to the fourth preset value; determining whether a value of demand of the time-series, having the number of active periods greater than the fourth preset value, is greater than a fifth preset value, wherein the time-series is categorized as a high value time-series if the value of demand is greater than the fifth preset value and is categorized a low value time-series if the value of demand is less than or equal to the fifth predefined value; determining whether the CoV of demand of the high value time-series is greater than a sixth preset value, wherein the high value time-series is categorized as the high value volatile series when the CoV of demand is greater than the sixth predefined value, and wherein the high value time-series is categorized as the high value stable series when the CoV of demand is less than or equal to the sixth preset value; and determining whether the CoV of demand of the low value time-series is greater than the sixth preset value, wherein the low value time-series is categorized as the low value volatile series when the CoV of demand is greater than the sixth preset value, and wherein the low value time-series is categorized as the low value stable series when the CoV of demand is less than or equal to the sixth preset value. Further, the method comprises applying data treatment to each time-series among a set of time-series in each segment among the plurality of segment. The data treatment is specific to the segment to which the time-series is categorized. Further, for each segment comprising the set time-series, the method comprises identifying a set of forecasting models applicable to the segment that are optimal fit for a plurality characteristics of the set of time-series in the segment. The set of forecasting models belong to a plurality of classes, with each class having a plurality of models. Furthermore, the method comprises clustering the set of time-series in the segment into hierarchical time-series clusters, arranged as a plurality of clusters into a plurality of branches based on Dynamic Time Warping (DTW) distance measure. Furthermore, the method comprises tagging a set of identifiers to each time-series among the set of time-series across the plurality of segments. The set of identifier comprises a segment identifier based on the segment the time-series is categorized into, a cluster identifier representing a cluster among the plurality of clusters the time-series belongs to within the hierarchical time-series clusters, the set of forecasting models identified for the corresponding segment and a Cluster level Error Tolerance (ET) for the cluster. Thereafter, the method comprises applying model fitting to the hierarchical time-series clusters to determine an optimal fit model for each time-series in each segment based on an optimal fit model technique in accordance with the tagged set of identifiers.
In another aspect, a system for optimized model fitment to hierarchical time-series cluster by categorizing time-series is provided. The system comprises a memory storing instructions; one or more Input/Output (I/O) interfaces; and processor(s) coupled to the memory via the one or more I/O interfaces, wherein the processor(s) is configured by the instructions to receive a plurality of time-series associated with product demand forecasting, wherein the plurality of time-series have non-identical characteristics. Further, the processor is configured to categorize a time-series among the plurality of time-series into a segment among a plurality of segments based on one or more preset conditions satisfied by the time-series. The plurality of segments comprise a non-forecastable series, a new series, a phasing out series, a sporadic series, a high value volatile series, a high value stable series, a low value volatile series and a low value stable series with each segment comprising a set of time series among the plurality of time series,, wherein the one or more preset conditions are verified in a predefined sequence. The categorizing of the plurality of time-series comprises: determining whether a Coefficient of Variation (CoV) of demand of the time-series is greater than a first preset value, wherein the time-series is categorized as the non-forecastable series if the CoV of demand is greater than the first preset value; determining whether number of active periods in the time-series, having the CoV of demand less than or equal to the first preset value, is less than a second preset value corresponding to percentage of length of the time-series, wherein the time-series is categorized as the non-forecastable series if the number of active periods in the time-series is less than the second preset value; determining whether a first demand of product derived from latest data values present at end segment of the time-series, having the number of active periods greater than or equal to the second preset value, is equal to a total demand, of product, wherein the time-series is categorized as the new series if the first demand of the product is equal to the total demand of product; determining whether a second demand of product derived from data values present across larger time segment of the time-series as compared to the end segment, when the time-series is not the new product time-series, is less than a third preset value corresponding to percentage of the total demand, wherein the time-series is categorized as the phasing out series if the second demand of product is less than the third preset value corresponding to percentage of the total demand; determining whether the number of active periods in the time-series, having the second demand of product greater than or equal to the third preset value, is greater than a fourth preset value corresponding to percentage of length of the time-series, wherein the time-series is categorized as the sporadic series if the number of active periods in the time-series is less than or equal to the fourth preset value; determining whether a value of demand of the time-series, having the number of active periods greater than the fourth preset value, is greater than a fifth preset value, wherein the time-series is categorized as a high value time-series if the value of demand is greater than the fifth preset value and is categorized a low value time-series if the value of demand is less than the fifth predefined value; determining whether the CoV of demand of the high value time-series is greater than a sixth preset value, wherein the high value time-series is categorized as the high value volatile series when the CoV of demand is greater than the sixth predefined value, and wherein the high value time-series is categorized as the high value stable series when the CoV of demand is less than or equal to the sixth preset value; and determining whether the CoV of demand of the low value time-series is greater than the sixth preset value, wherein the low value time-series is categorized as the low value volatile series when the CoV of demand is greater than the sixth preset value, and wherein the low value time-series is categorized as the low value stable series when the CoV of demand is less than or equal to the sixth preset value. Further, the processor is configured to apply data treatment to each time-series among a set of time-series in each segment among the plurality of segment. The data treatment is specific to the segment to which the time-series is categorized. Further, for each segment comprising the set time-series, the method comprises identifying a set of forecasting models applicable to the segment that are optimal fit for a plurality characteristics of the set of time-series in the segment. The set of forecasting models belong to a plurality of classes, with each class having a plurality of models. Furthermore, the processor is configured to cluster the set of time-series in the segment into hierarchical time-series clusters, arranged as a plurality of clusters into a plurality of branches based on Dynamic Time Warping (DTW) distance measure. Furthermore, the processor is configured to tag a set of identifiers to each time-series among the set of time-series across the plurality of segments. The set of identifier comprises a segment identifier based on the segment the time-series is categorized into, a cluster identifier representing a cluster among the plurality of clusters the time-series belongs to within the hierarchical time-series clusters, the set of forecasting models identified for the corresponding segment and a Cluster level Error Tolerance (ET) for the cluster. Thereafter, the method comprises applying model fitting to the hierarchical time-series clusters to determine an optimal fit model for each time-series in each segment based on an optimal fit model technique in accordance with the tagged set of identifiers.
In yet another aspect, there are provided one or more non-transitory machine readable information storage mediums comprising one or more instructions, which when executed by one or more hardware processors causes a method for optimized model fitment to hierarchical time-series cluster by categorizing time-series. The method comprises receiving a plurality of time-series associated with product demand forecasting, wherein the plurality of time-series have non-identical characteristics. Further comprises categorizing a time-series among the plurality of time-series into a segment among a plurality of segments based on one or more preset conditions satisfied by the time-series. The plurality of segments comprise a non -forecastable series, a new series, a phasing out series, a sporadic series, a high value volatile series, a high value stable series, a low value volatile series and a low value stable series with each segment comprising a set of time series among the plurality of time series, wherein the one or more preset conditions are verified in a predefined sequence. The categorizing of the plurality of time-series comprises: determining whether a Coefficient of Variation (CoV) of demand of the time-series is greater than- a first preset value, wherein the time-series is categorized as the non-forecastable series if the CoV of demand is greater than the first preset value; determining whether number of active periods in the time-series, having the CoV of demand less than or equal to the first preset value, is less than a second preset value corresponding to percentage of length of the time-series, wherein the time-series is categorized as the non-forecastable series if the number of active periods in the time-series is less than the second preset value; determining whether a first demand of product derived from latest data values present at end segment of the time-series, having the number of active periods greater than or equal to the second preset value, is equal to a total demand, of product, wherein the time-series is categorized as the new series if the first demand of the product is equal to the total demand of product; determining whether a second demand of product derived from data values present across larger time segment of the time-series as compared to the end segment, when the time-series is not the new product time-series, is less than a third preset value corresponding to percentage of the total demand, wherein the time-series is categorized as the phasing out series if the second demand of product is less than the third preset value corresponding to percentage of the total demand; determining whether the number of active periods in the time-series, having the second demand of product greater than or equal to the third preset value, is greater than a fourth preset value corresponding to percentage of length of the time-series, wherein the time-series is categorized as the sporadic series if the number of active periods in the time-series is less than or equal to the fourth preset value; determining whether a value of demand of the time-series, having the number of active periods greater than the fourth preset value, is greater than a fifth preset value, wherein the time-series is categorized as a high value time-series if the value of demand is greater than the fifth preset value and is categorized a low value time-series if the value of demand is less than or equal to the fifth predefined value; determining whether the CoV of demand of the high value time-series is greater than a sixth preset value, wherein the high value time-series is categorized as the high value volatile series when the CoV of demand is greater than the sixth predefined value, and wherein the high value time-series is categorized as the high value stable series when the CoV of demand is less than or equal to the sixth preset value; and determining whether the CoV of demand of the low value time-series is greater than the sixth preset value, wherein the low value time-series is categorized as the low value volatile series when the CoV of demand is greater than the sixth preset value, and wherein the low value time-series is categorized as the low value stable series when the CoV of demand is less than or equal to the sixth preset value. Further, the method comprises applying data treatment to each time-series among a set of time-series in each segment among the plurality of segment. The data treatment is specific to the segment to which the time-series is categorized. Further, for each segment comprising the set time-series, the method comprises identifying a set of forecasting models applicable to the segment that are optimal fit for a plurality characteristics of the set of time-series in the segment. The set of forecasting models belong to a plurality of classes, with each class having a plurality of models. Furthermore, the method comprises clustering the set of time-series in the segment into hierarchical time-series clusters, arranged as a plurality of clusters into a plurality of branches based on Dynamic Time Warping (DTW) distance measure. Furthermore, the method comprises tagging a set of identifiers to each time-series among the set of time-series across the plurality of segments. The set of identifier comprises a segment identifier based on the segment the time-series is categorized into, a cluster identifier representing a cluster among the plurality of clusters the time-series belongs to within the hierarchical time-series clusters, the set of forecasting models identified for the corresponding segment and a Cluster level Error Tolerance (ET) for the cluster. Thereafter, the method comprises applying model fitting to the hierarchical time-series clusters to determine an optimal fit model for each time-series in each segment based on an optimal fit model technique in accordance with the tagged set of identifiers.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG. 1 is a functional block diagram of a system for optimized model fitment to hierarchical time-series clusters by categorizing time-series prior to clustering, in accordance with some embodiments of the present disclosure.
FIG. 2A and FIG. 2B is a flow diagram illustrating a method for optimized model fitment to the hierarchical time-series clusters by categorizing time-series prior to the clustering, using the system of FIG. 1, in accordance with some embodiments of the present disclosure.
FIG. 3 illustrate an flow diagram for a process providing steps for the categorization of time-series into a plurality of segments prior to creating hierarchical time-series clusters, in accordance with some embodiments of the present disclosure.
FIG. 4 illustrates a flow diagram for a process providing steps for optimal model fitment to hierarchical time-series clusters of each segment, in accordance with some embodiments of the present disclosure.
FIG. 5 illustrates a pattern alignment characteristic of any two time-series used as basis for applying the optimized model fitment approach, in accordance with some embodiments of the present disclosure.
FIG. 6 illustrates, a cluster dendogram of a set of time-series of a segment arranged into hierarchical time-series clusters, in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope being indicated by the following claims.
The Applicant has addressed concerns and limitations in the art in Indian patent application No. 201721028534, filed on 10 August 2017 by providing model fitting approach to a time-series data arranged as hierarchical time-series clusters, wherein optimal models were identified for individual clusters in accordance to a bottom to top model fitting approach based on a preset criteria. The Indian patent application No. 201721028534 focusses only on arranging a plurality of time-series into hierarchical time-series clusters and identify optimal models for each time-series. However, further additions and refinements to the approach are appreciated that can improve the time efficiency of the bottom to top model fitting approach and still provide an optimal fit model to time-series enabling maximum possible accurate model fitting for forecasting in minimum possible time. The Indian patent application No. 201721028534 directly begins with a step of clustering the time-series into hierarchical time-series clusters. With the time-series arranged as a plurality of clusters in a hierarchical structure, the Indian patent application No. 201721028534 describes step wherein all possible forecasting models available in the repository are tried to identify a best fit model for a first time-series in bottom most clusters lying at lowest cluster height. However, practical observations indicate that based on the characteristics of time-series, a certain set of models are most appropriate for a certain group or category of the time-series, while for some typical time-series none of the model available is best fit and some general models such as moving average can only be used. Thus, with the approach provided by the applicant’s Indian patent application No. 201721028534, considerable amount of time will be consumed or wasted over trying forecasting models that were never fit for the particular type of time-series.
Hence, to provide a time efficient approach to the already described model fitting approach for hierarchical time-series clusters, the embodiments herein provide a method and system for optimized model fitment to hierarchical time-series cluster by categorizing time-series in a plurality of segments prior to generating the hierarchical time-series clusters. Further, for each of the segment a set of most appropriate forecasting models are identified and each segment is then structured as hierarchical time-series clusters. Thus, during the process of model fitting, unlike the Indian patent application No. 201721028534, the method disclosed identifies optimal fit model for a time-series from only the appropriate forecasting model set identified for the segment, to which the time-series is categorized. Further, the method disclosed also details on model fitting approach across time-series within a cluster, wherein the method refers to criteria for prioritizing classes of forecasting models within the set of forecasting models for faster model fitting, still maintaining best possible model fitment accuracy.
Referring now to the drawings, and more particularly to FIGS. 1 through 6, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG. 1 is a functional block diagram of a system 100 for optimized model fitment to hierarchical time-series clusters by categorizing time-series prior to clustering, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 includes a processor(s) 104, communication interface device(s), alternatively referred as or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the processor(s) 104. The processors(s) 104, can be one or more hardware processors. In an embodiment, the one or more hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in a memory 102. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
The I/O interface(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server. The I/O interface 106 provides interface to receive a plurality of time-series data from one or more external sources. In an embodiment, the plurality of time-series, to be fit with optimal forecasting models, may be stored by the system into a database 108 of the memory 102.
The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 102, may further store all forecasting models, also interchangeably referred as models, are provided below:
1. MA – Moving Average class of models including Weighted Moving average models.
2. Holt Winters – Exponential smoothing model class
3. ETS - State Space Exponential Smoothing
4. STL – Seasonal Decomposition via LOESS (locally estimated scatterplot smoothing)
5. Croston
6. ARIMA - AutoRegressive Integrated Moving Average
7. UCM – Unified Component Modeling
8. SVR – Support Vector Regression
9. StructTS – Structural Time-series state space models w/ Kalman Filter
The memory 102 also stores a set of identifiers tagged to each time-series, wherein the identifiers are used while identifying optimal fit models for each time-series. Thus, the memory 102 may comprise information pertaining to input(s)/output(s) of each step performed by the processor(s) 104 of the system 100 and methods of the present disclosure.
FIG. 2A and FIG. 2B is a flow diagram illustrating a method 200 for optimized model fitment to the hierarchical time-series clusters by categorizing time-series prior to the clustering, using the system of FIG. 1, in accordance with some embodiments of the present disclosure.
In an embodiment, the system 100 comprises one or more data storage devices or the memory 102 operatively coupled to the processor(s) 104 and is configured to store instructions for execution of steps of the method 200 by the processor (s) 104. The steps of the method 200 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIG. 1 and the steps of flow diagram as depicted in FIG. 2A through FIG. 6. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps to be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
Referring to the steps of the method 200, at step 202, the processor(s) 104 is configured to receiving a plurality of time-series associated with product demand forecasting. The plurality of time-series received have non-identical characteristics. As understood, due to such variations fitting a general common forecasting model for all time-series in not a good solution for accurate forecasting.
An example below explains on the variation seen in the characteristic of multiple time-series received in a product demand forecasting scenario and possible reasons for the same. Assume a scenario of a superstore, which sells various retail products for household usage ranging from condiments, packaged food, vegetables, dairy, baked goods, sanitation, small electronics, cold storage items, etc. All in all, the store stocks various quantities of over 10000 “products”. If the demand across each of these products needs to be tracked, it is observed that there exist different patterns in the time-series related to every product.
For example:
Some vegetables, fruits may indicate very seasonal demand patterns.
Some products like chips and dips may show high upticks during game season.
Some products may have introductory offers and may show high growth in demand.
Some others may be slowing down due to a replacement product, etc.
Sanitation products may only show intermittent demand behavior.
Some of these products may require a forecast to be provided daily, while others may need weekly or monthly forecasts.
Thus, for accurate forecasting, conventionally each time-series is fitted with a best-fit forecasting model, which requires multiple iterations through available models and is computationally intensive task. The challenge is explained below with an illustrative example and calculation. Assume there exists following models in a repository. For example:
Model classes capable of modeling seasonal data, like SARIMA, STL, Triple Exponential Smoothing (Holt Winters, ETS)
Model classes capable of modeling trend data, like ARIMA, Double Exponential Smoothing (Holt, ETS)
Model classes capable of modeling intermittent data like CROSTON or its variations
Other models like Support Vector Regression, Structural Time-series with Kalman Filter, Unified Component Model that can model multiple patterns
Each of these model classes could have multiple variations at the time of model fitment given their respective parameters, approximately in the range of up to 350 different models. In this particular example, it would mean ~4mn runs to determine the best fit model for each product. On a high computing system (16-core, 3.2GHz, 16GB RAM), this would still take over 48 hours to do the model fitment. However, for retail stores the forecasting requirements typically need the model building to finish within 5-6 hours and forecasts to be generated so that the right inventory planning can be done for the week. Even though the Indian patent application No. 201721028534 provides a solution to efficiently identify a best model fit by arranging the multiple time-series as hierarchical time-series clusters, the Indian patent application No. 201721028534 at initial model fitment step needs to run through all available models in repository and does not state any criteria on utilizing a selected few models for a given time-series.
To provide enhanced time efficient model fitment solution over the Indian patent application No. 201721028534, the method 200 disclosed utilizes categorization of time-series into multiple segments prior to clustering, as explained in conjunction with a process 300 of FIG. 3. Further, for each segment from the multiple segments, the method disclosed identifies appropriate forecasting models as explained in TABLE 1 below. Further, for more accurate results during model fitting, the method disclosed additionally applies data treatment to each time-series. Further, to this treated time series, the method disclosed then applies the bottom to top model-fitting approach as disclosed in the Indian patent application No. 201721028534. However, method enhances the model fitting approach of the Indian patent application No. 201721028534, by further detailing on how to perform model fitment for time-series within a cluster. This modified approach providing further improvement over the bottom to top model fitting approach of the Indian patent application No. 201721028534, hence forth, is referred as optimal model fitment approach, and is explained in conjunction with a process 400 of FIG. 4. Thus the method 200 reduces the number of model build runs required for cases such as the example superstore, which handles wide variety and varying price range products.
The method 200 disclosed here provides a plurality of segments into which a received time-series may be categorized based on the characteristics on these series. Referring back to the steps of the method 200, at step 204, the processor(s) 104 is configured to categorize the time-series among the plurality of time-series into a segment among the plurality of segments. The categorization is performed based on one or more preset conditions satisfied by the time-series. The logical analysis to decide on the possible segments to be created is provided below.
When handling forecasting needs for varied products, there may be time-series with characteristics, which that could determine how critical is it, to get high accuracy. These may be either statistical in nature or may come from the business that requires the forecast. Below are some sample characteristics for a time-series due to which time-series may not have identical characteristic:
Series have so little data that they can’t be forecasted or at best provide an average value as forecast.
It may be a new product and hence very little historical data to use in forecast.
It may be a product that is now waning in demand
Series values may be very sporadic or intermittent
The pattern may have either high volatility or may be very stable.
Additionally, there may be a business value attached to the series, indicating high-value or low-value products.
Thus, based on above analysis the method 200 creates the plurality of segments such as a non-forecastable series, a new series, a phasing out series, a sporadic series, a high value volatile series, a high value stable series, a low value volatile series and a low value stable series. The segments described herein are indicative and depending upon business scenarios further business segments may be drawn. Statistical segments may be limited to what are provided herein. The categorization of the time-series is explained in conjunction with the steps of process 300 of FIG. 3 used at the step 204.
Categorization into segments: At step 302, the processor(s) 104 is configured to determine whether a Coefficient of Variation (CoV) of demand of the time-series is greater than a first preset value. Further, the time-series is categorized as the non-forecastable series if the CoV of demand is greater than the first preset value. For example in FIG. 3, the first preset value for CoV of demand is 5. The demand herein refers to non-zero data values of the time-series. The condition at step 302, wherein time-series is tested for a very high value of CoV enables to check if the variation is too high due to which forecastability of the time-series is a challenge.
At step 304, the processor(s) 104 is configured to determine whether number of active periods in the time-series, having the CoV of demand less than or equal to the first preset value, is less than a second preset value corresponding to percentage of length of the time-series. The time-series is categorized as the non-forecastable series if the number of active periods in the time-series is less than the second preset value. For example in FIG. 3, the second preset value is 10% of the length of time-series. This condition at step 304 enables determining whether series is forecastable or not. Thus, if actives period is say less than 10% of the total length, then there are hardly any data points to create reliable forecast.
At step 306, the processor(s) 104 is configured to determine whether a first demand of product derived from latest data values present at end segment of the time-series, having the number of active periods greater than or equal to the second preset value, is equal to a total demand, of product. The time-series is categorized as the new series if the first demand of the product is equal to the total demand of product.
At step 308, the processor(s) 104 is configured to determine whether a second demand of product derived from data values present across larger time segment of the time-series, when the time-series is not the new product time-series, is less than a third preset value corresponding to percentage of the total demand. The time-series is categorized as the phasing out series if the second demand of product is less than the third preset value corresponding to percentage of the total demand. For example in FIG. 3, the third preset value is 3% of the total demand.
At step 310, the processor(s) 104 is configured to determine whether the number of active periods in the time-series, having the second demand of product greater than the third preset value, is greater than or equal to a fourth preset value corresponding to percentage of length of the time-series. The time-series is categorized as the sporadic series if the number of active periods in the time-series is less than or equal to the fourth preset value. For example in FIG. 3, the fourth preset value is 40% of the series length.
At step 312, the processor(s) 104 is configured to determine whether a value of demand of the time-series, having the number of active periods greater than the fourth preset value, is greater than a fifth preset value. The time-series is categorized as a high value time-series if the value of demand is greater than the fifth preset value and is categorized a low value time-series if the value of demand is less than or equal to the fifth preset value. For example in FIG. 3, the fifth preset value for value of demand is $4035.
In an embodiment, instead of a two class criteria, the time-series may be categorized into various gradients. Thus there may be any number of gradients going from High to Low. For example, gradients can be 1) high-Medium-Low 2) Very High-High-Medium High-Medium-Medium Low-Low or the like.
At step 314, the processor(s) 104 is configured to determine whether the CoV of demand of the high value time-series is greater than a sixth preset value. The high value time-series is categorized as the high value volatile series when the CoV of demand is greater than the sixth preset value. Further, the high value time-series is categorized as the high value stable series when the CoV of demand is less than or equal to the sixth preset value. For example in FIG. 3, the sixth preset value for CoV of demand is 1.5. At step 316, the processor(s) 104 is configured to determine whether the CoV of demand of the low value time-series is greater than the sixth preset value. Further, the low value time-series is categorized as the low value volatile series when the CoV of demand is greater than the sixth predefined value, and the low value time-series is categorized as the low value stable series when the CoV of demand is less than or equal to than the sixth predefined value. All the preset values can be defined by a Subject matter expert based on his knowledge and hands on experience in the specific domain of a use case where time series model fitting is to be applied.
Referring back to steps of method 200, at step 206, the processor(s) 104 is configured to apply data treatment to each time-series among a set of time-series in each segment that are categorized into each segment. The data treatment is specific to the segment to which the time-series is categorized. Further, for each segment comprising the set time-series, the method 200 repeats the steps from 208 through 214 to determine optimal fit model for each time-series in each segment.
Processing time series in each segment: At step 208, the processor (s) 104 is configured to identify a set of forecasting models applicable to the segment that are optimal fit for a plurality characteristics of the set of time-series in the segment. The set of forecasting models belong to a plurality of classes, with each class having a plurality of models. The table 1 below details the segment specific data treatment and segment specific set of forecasting models that are most appropriate fit for time-series in the corresponding segment for providing best possible forecasting accuracy.
TABLE 1:
Segment Data treatment Forecasting models selected based on characteristics

Non- forecastable series trimming leading zeroes or nulls, average filling of missing values and refrain from applying outlier treatment 1. Moving average models only
New series trimming leading zeroes or nulls, applying pattern based missing value fill and applying pattern based outlier treatment 1. When less than 5 data points use moving average models only
2. Else use Holt winters, STL, ETS. Further, within the above models it can be restricted to use non seasonal models
Phasing out series trimming leading zeroes or nulls, average filling of missing values and refrain from applying outlier treatment 1. Moving average models only
Sporadic series trimming leading zeroes or nulls, replacing the missing values with zeroes and refrain from applying outlier treatment 1. If spikes are seasonal or bunched use Holt winters, STL, ETS and if validation MAPE is high, try UCM

2. Otherwise use CROSTON and Variants
High value volatile series # trimming leading zeroes or nulls, applying pattern based missing value fill and applying pattern based outlier treatment.
# for outlier treatment data above a preset sigma threshold is to be chopped to bring it below the sigma threshold. In case of volatile data, this preset sigma threshold is greater 1. Use Holt winters, STL, ETS, ARIMA with validation period based model selection.
2. If validation period MAPE is high try SVR,UCM or Struc TS

High value stable series # trimming leading zeroes or nulls, applying pattern based missing value fill and applying pattern based outlier treatment
# for outlier treatment data above a predefined sigma threshold is to be chopped to bring it below the sigma threshold. In case of stable data, this preset threshold is lower compared to the volatile data 1. Use Holts winters, STL, ETS, ARIMA with validation period based model selection.
2. If validation period MAPE is high try SVR, UCM or Struc TS
Low value volatile series # trimming leading zeroes or nulls, applying pattern based missing value fill and applying pattern based outlier treatment.
# for outlier treatment, data above a predefined sigma threshold is to be chopped to bring it below the sigma threshold. In case of volatile data, this preset threshold is greater 1. Use Holt winters, STL, ETS with validation period based model selection.
( Prefer using simpler models like moving average, HW, STL)
Low value stable series # trimming leading zeroes or nulls, applying pattern based missing value fill and applying pattern based outlier treatment
# for outlier treatment data above a predefined sigma threshold is to be chopped to bring it below the sigma threshold. In case of stable data, this preset threshold is lower compared to the volatile data 1. Use Holt winters, STL, ETS, with validation period based model selection.
The above table 1 depicts that even though the method utilizes similar data treatment for stable and volatile data, the sigma threshold preset for outlier treatment in volatile series is set to a greater value as compared to preset value of sigma threshold for stable data:
Further, while selecting models between High Value and Low Value series, the method disclosed suggests not to utilize costly models like ARIMA, SVR, UCM, StructTS and the like for low value series, rather costly models should be preferred only for High Value series. The term ‘costly’ herein refers to high resource usage such as CPU and Memory or high elapsed time during model build.
Further, between Volatile and Stable data, the method disclosed suggests that volatile data needs to be handled using simpler models like moving average, HW, STL etc. while Stable can try complex models like ETS, ARIMA etc.
The segmentation process as explained in conjunction with FIG. 3 can be applied for various use cases with some configurable parameters like limits on Coefficient of Variation (CoV), Value of Demand the cutoff percentages and the like. The various characteristics of time series in corresponding segments have an impact on the kind of data treatment on the time-series and types of models to be included in selection. For example,
Non-Forecastable series could be completely excluded from model selection exercise, or at best, only be provided with a forecast that is just an average of available data points.
New series may not have enough historical data to reliably detect “Seasonal” behavior and hence models that determine seasonality could be excluded.
Phasing-Out series may not be of much importance to business and hence the error tolerance may be higher during model selection, thereby reducing the number of models to be tried to achieve the required tolerance level.
Sporadic / Intermittent series may require only CROSTON model.
For highly volatile series, the error tolerance may be higher, thereby reducing the number of models to be tried. However, if business value is high, it may still be decided/ defined to try every model in the repository.
Thus, based on the characteristics or segments evaluated for each series, the various data treatment for the time-series is decided. Further, from a repository of models available, only a narrow set of forecasting models is attached to each segment as per the characteristic. For series that are not part of any of the above special characteristics, a simple approach to use all available forecasting models is used. As mentioned earlier, the table 1 above show a sample of various rules that can be specified based on above characteristics.
Thus post categorization of series each time-series is tagged or annotated with following information:
Segment Identifier
Applicable data treatment rules based on segmentation
Applicable set of forecasting Models based on the Segment to which the time-series is categorized.
Cluster level Error Tolerance threshold as decided for the segment. (ET) or (ETc)

Once the plurality of time-series are categorized into corresponding segment, the method 200 proceeds with generating hierarchical time-series clusters for each segment. Each segment has a set of time-series. In an embodiment, prior to clustering step 214 of the method 200, the processor(s) 104 is configured to preprocess each time-series of each segment by applying data treatment to the data values of each time-series to reduce noise and enable identifying underlying pattern of each time-series for clustering time-series. The preprocessing or data treatment comprises normalizing data values of each time-series to bring the data values to a common scale. Furthermore, the preprocessing for data treatment comprises identifying business requirements for forecasting frequency, wherein based on the business requirements time-series for Daily, Weekly, Monthly frequencies are clustered separately.
Once the segments are identified and the data treatment is applied to each time-series in each segment, the clustering is performed. Referring back to step 210 of the method 200, the processor(s) 104 is configured to cluster the set of time-series in the segment into hierarchical time-series clusters. The hierarchical time-series clusters arrange the multiple time-series as a plurality of clusters distributed into a plurality of branches. The clustering is performed based on Dynamic Time Warping (DTW) distance measure as explained in the Indian patent application No. 201721028534 using know techniques. The clustering approach is not repeated for brevity. The reasoning for identifying hierarchical time-series clustering technique based on Distance time Warping (DTW) for clustering the time series is described below.
Since the model selection primarily depends on pattern detection, a model that is chosen as the best fit model for a specific pattern would also inherently be able to work for similar pattern of a different time-series, even if the scales differ. So, for the sequence X and Y indicated in FIG. 5, the model, also for X can be chosen by iterating over say 250 combinations of models identified for the segment. Due to the similarity of pattern between any two time-series as seen in FIG. 6, the model chosen for X then easily can be applied to Y as well, without having to run the 249 other combinations. The same model applied to both X and Y would have different fitment error values though, but if the error value is within a specified tolerance level provided by a user, a common models works well to expectations. Thus, to use the above pattern similarity logic for time efficient model fitting, main challenge is to identify similar patterns without any prior labeled information about the various time-series, also referred as unlabeled data set. In addition, as explained earlier, there can be multiple variations of the pattern itself. In such scenarios, it is common to use clustering approach as a mechanism to detect some structure in unlabeled dataset. When various types of clustering approaches were tired it was observed that across varying patterns of time-series, Hierarchical Agglomerative Clustering method with the distance measure as Dynamic Time Warping (DTW) and Average linkage provided the best clusters. Thus, choosing DTW was critical as it is capable of detecting “non-linear” similarity between time-series patterns. Hence for varying length of series or series that do not necessarily overlap in pattern, but the patterns are still similar in “speed” or “time” aspect, DTW is most suited. The arrows in FIG. 5 indicate how DTW aligns two time-series sequences X and Y. Even though the DTW is comparatively time consuming to other distances such as Euclidean distance, DTW when used along with the segmentation approach reduces the number of models to be tried, where each regression model in itself can have high run times, effectively providing time efficient model fitting.
In an embodiment, the number of cluster determination in case of Hierarchical Clustering is usually done on the basis of DUNN index or the like. So, essentially, a cluster is fit and attempt is made to create various clusters by cutting dendograms at various levels. Thus, for example, this may end in say a range of 3 to 10 clusters. For each of these sets for the various cuts, the DUNN index is checked. Further, whichever number gives highest DUNN is selected. Generally this is like a log curve, rises fast and then stabilizes. So point of stabilization is the number of clusters that are considered.
Hierarchical time-series clustering can be visualized as shown in cluster dendogram of FIG. 6 for a sample of 35 time-series (TS_1 through TS_35) with monthly frequency. The 35 time-series deliberately are chosen to reflect a mix of patterns here to show case how clustering helps improve selection of models. The following can be observed in this sample:
The grouping based on height determines the cluster similarity.
The cluster on the extreme right for TS_12 to TS_17 is a set of series which show intermittent pattern, the rest of the series have more continuous pattern.
TS_28 comes out as a single element bit more distant from other continuous series in the cluster as it is a phasing out series.
TS_26 and TS_32 exhibit a pattern where a structural break is observed, which means there was sudden shift in the scale of demands.
Once clusters are identified for each time series of each segment, then at step 212 of the method 200, the processor(s) 104 is configured to tag a set of identifiers to each time-series among the set of time-series across the plurality of segments. The set of identifiers comprise the segment identifier based on the segment the time-series is categorized into, a cluster identifier representing a cluster among the plurality of clusters the time-series belongs to within the hierarchical time-series clusters; the set of forecasting models identified for the corresponding segment and the cluster level Error Tolerance (ET) for the cluster.
Once each time-series in the segment has been tagged with set of identifiers, at step 214, the processor(s) is configured to apply model fitting to the hierarchical time-series clusters to determine the optimal fit model for each time-series in each segment. The method 200 utilizes the optimal fit model technique in accordance with the tagged set of identifiers.
Optimal fit model technique: The optimal fit model technique is described in conjunction with the process 400 of FIG. 4 used by the method at step 214. For every branch among the plurality of branches of the hierarchical time-series clusters of the segment, at step 402, the processor(s) 104 is configured to identify a first time-series from a cluster, among the plurality of clusters, placed at a cluster height equal to a lowest cluster height of the branch. The first series is identified based on the segment identifier, the cluster identifier and the cluster level ET tagged to the first time-series. At step 404, the processor(s) 104 is configured to determine a first optimal fit model for the first time-series from a first class identified from the plurality of classes in the set of forecasting models. The determined first optimal fit model of the first class is such that it provides an error below the cluster level ET. At step 402, the processor(s) 104 is configured to repeating process of determining optimal model fit for each time-series in the plurality of clusters in each branch among the plurality of branches until the optimal fit model for each time-series in a top cluster is completed. The topmost cluster refers to cluster placed at a topmost cluster height of the branch. The process of determining the optimal fit model is provided below with help of steps 408 and 410.
At step 402, the processor(s) 104 is configured to determine a second optimal fit model for a second time-series of the cluster, wherein the second optimal model satisfies a condition among a plurality of fitment conditions when the plurality of conditions are tested in a sequence. The plurality of conditions and the sequence of testing comprises:
Select the first optimal fit model determined for the first time-series as the second optimal fit model:
If an error, when the first optimal fit model is applied to the second time-series, is below the cluster level ET; or
If the error, when the first optimal fit model is applied to the second time-series, is above the cluster level ET and an error difference (ED) between the error for the first time-series and the error for the second time-series is below an Error Difference (ED) threshold.
Select a next optimal fit model from the first class, which provides the error below the cluster level ET when the next optimal fit model is applied to the second time-series.
Select an adjacent optimal fit model from a second class among the plurality of classes of the set of forecasting models identified for the segment, which provides the error below the cluster level ET when the optimal fit model from the second class is applied to the second time-series.
A least error optimal fit model from a class among the plurality of classes of the set of forecasting models, which provides least value of the error for the cluster level ET.
Further, moving up the cluster height, at step 402, the processor(s) 104 is configured to apply the second optimal fit model to a first series of a next cluster among the plurality of clusters. The next cluster is placed at a next successive cluster height to the cluster and the second time-series is the first series of the next cluster in the successive iteration.
The steps of the process are explained below with an example.
For any given series, when a forecasting model needs to be selected, various model classes like ARIMA, Holt Winters, ETS, UCM, STL, SVR available in the repository ( database 108) of the system 100 can be tried and tested in accordance with the steps of process 400. Since each of these models have their own additional parameters, for example, the ARIMA orders p, d, q or Holt Winters and ETS follow a,ß,?,e and so on. There is a need to iterate through each combination to get the optimal fit model. Optimal fit selection is based on the model’s performance w.r.t an error measure like MAPE and an error tolerance specified by the model builder.
An illustrative example below explain the steps of process 400 for model fitment, also referred alternatively as model building and selection, utilizing the segment and cluster information:
From the set of time-series of a segment, say time-series S1 to Sn, take first time time-series S1 placed at the lowest cluster height of the branch and based on corresponding applicable set of forecasting models, fit each model from the set and choose the best fit model (say M1) which had error of E1.
Now, take sibling of S1 (say second time-series S2) and fit M1 on it and check the error, E2. With observations studied, the time-series in a single cluster will have considerable similarity, thus would ideally be within the cluster tolerance level ET for its cluster.
If it is not and E2 – E1, is within ICED level (Error Difference (ED) threshold), the same model M1 is retailed for S2
If it is above ICED, then same model class as M1 is considered but freshly train the parameters to obtain a better fit model M2.
Once S1 and S2 have optimal fit models identified, the branch is traversed upward to the cluster at next successive cluster height, while sequentially trying the previously fit models to check errors are within tolerance levels.
At any given point, if cluster level ET and ICED even exceeds, after re-training model parameters, the next model class is selected from the applicable model list MLC. If the model list is exhausted, the model is picked which provides least error.
While moving up the hierarchy of clusters, the cluster information is checked to confirm each time-series is same as previous series. Further, once it is detected that there is a switch between clusters or segment, accordingly change the MLC and ETc is changed.
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

Documents

Application Documents

# Name Date
1 201923034688-IntimationOfGrant14-02-2025.pdf 2025-02-14
1 201923034688-STATEMENT OF UNDERTAKING (FORM 3) [28-08-2019(online)].pdf 2019-08-28
1 201923034688-Written submissions and relevant documents [24-01-2024(online)].pdf 2024-01-24
2 201923034688-REQUEST FOR EXAMINATION (FORM-18) [28-08-2019(online)].pdf 2019-08-28
2 201923034688-PatentCertificate14-02-2025.pdf 2025-02-14
2 201923034688-Correspondence to notify the Controller [10-01-2024(online)].pdf 2024-01-10
3 201923034688-FORM 18 [28-08-2019(online)].pdf 2019-08-28
3 201923034688-FORM-26 [10-01-2024(online)]-1.pdf 2024-01-10
3 201923034688-Written submissions and relevant documents [24-01-2024(online)].pdf 2024-01-24
4 201923034688-Correspondence to notify the Controller [10-01-2024(online)].pdf 2024-01-10
4 201923034688-FORM 1 [28-08-2019(online)].pdf 2019-08-28
4 201923034688-FORM-26 [10-01-2024(online)].pdf 2024-01-10
5 201923034688-US(14)-HearingNotice-(HearingDate-11-01-2024).pdf 2023-12-15
5 201923034688-FORM-26 [10-01-2024(online)]-1.pdf 2024-01-10
5 201923034688-FIGURE OF ABSTRACT [28-08-2019(online)].jpg 2019-08-28
6 201923034688-FORM-26 [10-01-2024(online)].pdf 2024-01-10
6 201923034688-DRAWINGS [28-08-2019(online)].pdf 2019-08-28
6 201923034688-CLAIMS [20-06-2022(online)].pdf 2022-06-20
7 201923034688-US(14)-HearingNotice-(HearingDate-11-01-2024).pdf 2023-12-15
7 201923034688-COMPLETE SPECIFICATION [28-08-2019(online)].pdf 2019-08-28
7 201923034688-COMPLETE SPECIFICATION [20-06-2022(online)].pdf 2022-06-20
8 201923034688-CLAIMS [20-06-2022(online)].pdf 2022-06-20
8 201923034688-DRAWING [20-06-2022(online)].pdf 2022-06-20
8 201923034688-FORM-26 [11-10-2019(online)].pdf 2019-10-11
9 201923034688-COMPLETE SPECIFICATION [20-06-2022(online)].pdf 2022-06-20
9 201923034688-FER_SER_REPLY [20-06-2022(online)].pdf 2022-06-20
9 Abstract1.jpg 2019-10-31
10 201923034688-DRAWING [20-06-2022(online)].pdf 2022-06-20
10 201923034688-OTHERS [20-06-2022(online)].pdf 2022-06-20
10 201923034688-Proof of Right [19-02-2020(online)].pdf 2020-02-19
11 201923034688-FER.pdf 2022-03-04
11 201923034688-FER_SER_REPLY [20-06-2022(online)].pdf 2022-06-20
12 201923034688-OTHERS [20-06-2022(online)].pdf 2022-06-20
12 201923034688-Proof of Right [19-02-2020(online)].pdf 2020-02-19
13 201923034688-FER.pdf 2022-03-04
13 201923034688-FER_SER_REPLY [20-06-2022(online)].pdf 2022-06-20
13 Abstract1.jpg 2019-10-31
14 201923034688-Proof of Right [19-02-2020(online)].pdf 2020-02-19
14 201923034688-FORM-26 [11-10-2019(online)].pdf 2019-10-11
14 201923034688-DRAWING [20-06-2022(online)].pdf 2022-06-20
15 201923034688-COMPLETE SPECIFICATION [20-06-2022(online)].pdf 2022-06-20
15 201923034688-COMPLETE SPECIFICATION [28-08-2019(online)].pdf 2019-08-28
15 Abstract1.jpg 2019-10-31
16 201923034688-CLAIMS [20-06-2022(online)].pdf 2022-06-20
16 201923034688-DRAWINGS [28-08-2019(online)].pdf 2019-08-28
16 201923034688-FORM-26 [11-10-2019(online)].pdf 2019-10-11
17 201923034688-COMPLETE SPECIFICATION [28-08-2019(online)].pdf 2019-08-28
17 201923034688-FIGURE OF ABSTRACT [28-08-2019(online)].jpg 2019-08-28
17 201923034688-US(14)-HearingNotice-(HearingDate-11-01-2024).pdf 2023-12-15
18 201923034688-DRAWINGS [28-08-2019(online)].pdf 2019-08-28
18 201923034688-FORM 1 [28-08-2019(online)].pdf 2019-08-28
18 201923034688-FORM-26 [10-01-2024(online)].pdf 2024-01-10
19 201923034688-FORM-26 [10-01-2024(online)]-1.pdf 2024-01-10
19 201923034688-FORM 18 [28-08-2019(online)].pdf 2019-08-28
19 201923034688-FIGURE OF ABSTRACT [28-08-2019(online)].jpg 2019-08-28
20 201923034688-REQUEST FOR EXAMINATION (FORM-18) [28-08-2019(online)].pdf 2019-08-28
20 201923034688-FORM 1 [28-08-2019(online)].pdf 2019-08-28
20 201923034688-Correspondence to notify the Controller [10-01-2024(online)].pdf 2024-01-10
21 201923034688-Written submissions and relevant documents [24-01-2024(online)].pdf 2024-01-24
21 201923034688-STATEMENT OF UNDERTAKING (FORM 3) [28-08-2019(online)].pdf 2019-08-28
21 201923034688-FORM 18 [28-08-2019(online)].pdf 2019-08-28
22 201923034688-PatentCertificate14-02-2025.pdf 2025-02-14
22 201923034688-REQUEST FOR EXAMINATION (FORM-18) [28-08-2019(online)].pdf 2019-08-28
23 201923034688-IntimationOfGrant14-02-2025.pdf 2025-02-14
23 201923034688-STATEMENT OF UNDERTAKING (FORM 3) [28-08-2019(online)].pdf 2019-08-28

Search Strategy

1 SearchHistoryE_18-02-2022.pdf