
Real Time Categorization Of Log Events

Abstract: Embodiments for categorizing a real-time log event are described. In one example, a Term Frequency-Inverse Document Frequency (TF-IDF) vector for the log event is computed based on a pre-calculated TF-IDF matrix of a log corpus and the number of new words in the log event, where the log corpus comprises one or more pre-existing log events and the log event is indicative of an error message. Further, the distance between the TF-IDF vector and the cluster centroid of each cluster in the log corpus is calculated. Thereafter, the cluster having the closest cluster centroid, that is, the cluster centroid closest to the TF-IDF vector, is identified from amongst the clusters based on these distances. Subsequently, the log event is categorized into one or more log categories based on a comparison of the distance between the TF-IDF vector and the closest cluster centroid with a pre-determined silhouette threshold corresponding to the cluster with the closest cluster centroid.
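The steps in the abstract can be written compactly as below. This is a sketch under assumptions the abstract does not fix: the plain tf times log(N/df) weighting, cosine distance as the "distance", and treating a distance exactly equal to the threshold as a match are all choices made here for illustration. N is the number of pre-existing log events, df(t) the number of them containing word t, c_k the centroid of cluster k, and theta_k its pre-determined silhouette threshold.

```latex
\begin{aligned}
w_{t,d} &= \mathrm{tf}(t,d)\,\log\frac{N}{\mathrm{df}(t)}, \qquad
\delta(\mathbf{x},\mathbf{c}_k) = 1 - \frac{\mathbf{x}\cdot\mathbf{c}_k}{\lVert\mathbf{x}\rVert\,\lVert\mathbf{c}_k\rVert},\\[4pt]
k^{*} &= \arg\min_{k}\,\delta(\mathbf{x},\mathbf{c}_k), \qquad
\operatorname{category}(\mathbf{x}) =
\begin{cases}
\text{new log category}, & \delta(\mathbf{x},\mathbf{c}_{k^{*}}) > \theta_{k^{*}},\\
\text{pre-existing category } k^{*}, & \text{otherwise.}
\end{cases}
\end{aligned}
```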


Patent Information

Filing Date: 02 January 2015
Publication Number: 28/2016
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Email: iprdel@lakshmisri.com
Grant Date: 2021-09-02

Applicants

TATA CONSULTANCY SERVICES LIMITED
Nirmal Building, 9th Floor, Nariman Point, Mumbai, Maharashtra 400021, India

Inventors

1. JACOB, Jayadeep
SP XII/609(7), Hannah, Ambalappattu Lane, Powdikonam PO, Trivandrum 695588, Kerala, India

Specification

CLAIMS:
1. A method for categorizing a real-time log event, the method comprising:
computing a Term Frequency-Inverse Document Frequency (TF-IDF) vector for the real-time log event based on a pre-calculated TF-IDF matrix of a log corpus and a number of new words in the real-time log event, wherein the log corpus comprises one or more pre-existing log events, and wherein the real-time log event is indicative of an error message;
calculating a distance between the TF-IDF vector and a cluster centroid of each cluster in the log corpus;
identifying, from amongst the clusters, a cluster having a closest cluster centroid based on the distance between the TF-IDF vector and the cluster centroid of each of the clusters, wherein the closest cluster centroid is a cluster centroid closest to the TF-IDF vector; and
categorizing the real-time log event into one or more log categories based on a comparison of the distance between the TF-IDF vector and the closest cluster centroid with a pre-determined silhouette threshold corresponding to the cluster with the closest cluster centroid.
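The four steps of claim 1 can be illustrated with the minimal Python sketch below. It is not the patented implementation: cosine distance, the IDF given to unseen words (log(N + 1), as if the word occurred only in this event), zero-padding the centroids for those new words, and treating a tie with the threshold as a pre-existing match are all assumptions. The names `event_tfidf`, `categorize` and the `cluster-<k>` labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def event_tfidf(tokens: List[str], vocab: Dict[str, int], idf: Sequence[float],
                n_docs: int) -> Tuple[List[float], int]:
    """TF-IDF vector over the corpus vocabulary, extended by the event's new words."""
    counts = Counter(tokens)
    new_words = sorted(w for w in counts if w not in vocab)
    vec = [0.0] * (len(vocab) + len(new_words))
    for word, tf in counts.items():
        if word in vocab:
            vec[vocab[word]] = tf * idf[vocab[word]]  # pre-calculated corpus IDF
    for offset, word in enumerate(new_words):
        # ASSUMPTION: an unseen word is weighted as if it occurred in this event only.
        vec[len(vocab) + offset] = counts[word] * math.log(n_docs + 1)
    return vec, len(new_words)


def cosine_distance(x: Sequence[float], c: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(x, c))
    nx = math.sqrt(sum(a * a for a in x))
    nc = math.sqrt(sum(b * b for b in c))
    return 1.0 if nx == 0 or nc == 0 else 1.0 - dot / (nx * nc)


def categorize(tokens, vocab, idf, n_docs, centroids, thresholds):
    """Return (closest cluster index, its distance, category label)."""
    vec, n_new = event_tfidf(tokens, vocab, idf, n_docs)
    # Centroids carry no weight for words never seen in the corpus: pad with zeros.
    padded = [list(c) + [0.0] * n_new for c in centroids]
    distances = [cosine_distance(vec, c) for c in padded]
    k = min(range(len(distances)), key=distances.__getitem__)
    # ASSUMPTION: distance == threshold counts as the pre-existing category.
    label = "new" if distances[k] > thresholds[k] else f"cluster-{k}"
    return k, distances[k], label
```

For example, with two clusters built from "disk full" and "connection timeout" events, the event `["disk", "full"]` falls inside cluster 0, while `["connection", "timeout", "refused"]` is nearest to cluster 1 but, because the unseen word "refused" carries a large weight, lands outside a threshold of 0.3 and is reported as new.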

2. The method as claimed in claim 1 further comprising:
receiving the real-time log event from one or more log sources; and
processing the real-time log event to remove insignificant data from the real-time log event, wherein the insignificant data includes timestamps, digits, and special characters.
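The cleaning step of claim 2 might look like the sketch below. The claim names what is removed (timestamps, digits, special characters) but not how each is detected, so the regular expressions, the lower-casing and the whitespace tokenization are assumptions, and `clean_log_event` is a hypothetical name.

```python
import re
from typing import List

# HYPOTHETICAL patterns: two common timestamp shapes, runs of digits, and any
# character that is neither a letter nor whitespace.
TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?"  # e.g. 2015-01-02 10:15:30,123
    r"|\b\d{2}:\d{2}:\d{2}\b"                              # e.g. a bare 10:15:30
)
DIGITS = re.compile(r"\d+")
SPECIAL = re.compile(r"[^A-Za-z\s]")


def clean_log_event(raw: str) -> List[str]:
    """Strip timestamps, digits and special characters, then lower-case and tokenize."""
    text = TIMESTAMP.sub(" ", raw)  # timestamps first, so they are removed as a unit
    text = DIGITS.sub(" ", text)
    text = SPECIAL.sub(" ", text)
    return text.lower().split()
```

For instance, `clean_log_event("2015-01-02 10:15:30,123 ERROR [db-7] Connection timeout after 30s!")` keeps only the words `error`, `db`, `connection`, `timeout`, `after` and the leftover `s` from `30s`.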

3. The method as claimed in claim 1 further comprising determining a centroid matrix for the real-time log event by adapting a pre-determined centroid matrix of the log corpus based on the number of new words in the real-time log event, wherein the pre-determined centroid matrix is determined based on a number of clusters in the log corpus.
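One plausible reading of claim 3 is that the k x V centroid matrix (k clusters, V corpus words) is widened by one column per new word. The sketch below assumes those columns are zero, since no pre-existing log event, and therefore no centroid, contains the new words; `adapt_centroid_matrix` is a hypothetical name.

```python
from typing import List


def adapt_centroid_matrix(centroids: List[List[float]], n_new_words: int) -> List[List[float]]:
    """Return a copy of the k x V centroid matrix widened to k x (V + n_new_words).

    ASSUMPTION: new-word columns are zero because no centroid contains those words.
    """
    if n_new_words < 0:
        raise ValueError("n_new_words must be non-negative")
    # One row per cluster is kept; only the word dimension grows.
    return [list(row) + [0.0] * n_new_words for row in centroids]
```

Returning a copy leaves the pre-determined corpus matrix untouched for the next incoming event.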

4. The method as claimed in claim 1, wherein the one or more log categories include a pre-existing log category corresponding to the cluster and a new log category.

5. The method as claimed in claim 4, wherein, when the distance between the TF-IDF vector and the closest cluster centroid of the cluster is greater than the pre-determined silhouette threshold corresponding to the cluster, the real-time log event is categorized as the new log category.

6. The method as claimed in claim 4, wherein, when the distance between the TF-IDF vector and the closest cluster centroid of the cluster is less than the pre-determined silhouette threshold corresponding to the cluster, the real-time log event is categorized into the pre-existing log category.

7. The method as claimed in claim 1, wherein the method further comprises:
receiving the log corpus from one or more log sources, wherein the log corpus comprises one or more pre-existing log events;
processing the log corpus to remove insignificant data from each of the one or more pre-existing log events, wherein the insignificant data includes timestamps, digits, and special characters;
computing the TF-IDF matrix of the log corpus based on a number of pre-existing log events in the log corpus and a number of words in the log corpus;
generating a cluster model based on the TF-IDF matrix, wherein the cluster model is indicative of the number of clusters corresponding to the log corpus, and wherein a cluster is indicative of a log category;
determining the centroid matrix of the log corpus based on the number of clusters in the cluster model and the number of words in the log corpus;
calculating a cluster radius and a silhouette width of each cluster, wherein a cluster radius of a cluster is calculated based on a distance between a cluster centroid of the cluster and a farthest point in the cluster; and wherein a silhouette width of the cluster is indicative of compactness of the cluster; and
determining a silhouette threshold for each cluster based on the corresponding cluster radius and the corresponding silhouette width.
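The radius and silhouette-width steps of claim 7 can be sketched as follows. The radius follows the claim (centroid to farthest member); the silhouette width uses the standard per-point silhouette s(i) = (b - a) / max(a, b), averaged over each cluster. The claim does not say how radius and width are combined, so the `(1 + width) / 2` scaling below is purely HYPOTHETICAL, as is the use of cosine distance and the function names.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 if nx == 0 or ny == 0 else 1.0 - dot / (nx * ny)


def cluster_radius(members: List[Vector], centroid: Vector) -> float:
    """Distance from the centroid to the farthest member of the cluster."""
    return max(cosine_distance(p, centroid) for p in members)


def silhouette_widths(points: List[Vector], labels: List[int]) -> Dict[int, float]:
    """Mean silhouette s(i) = (b - a) / max(a, b) over each cluster's members."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    scores: Dict[int, List[float]] = {k: [] for k in clusters}
    for i, p in enumerate(points):
        own = [j for j, l in enumerate(labels) if l == labels[i] and j != i]
        if not own:  # singleton cluster: s(i) = 0 by convention
            scores[labels[i]].append(0.0)
            continue
        a = sum(cosine_distance(p, points[j]) for j in own) / len(own)
        # b = mean distance to the nearest other cluster.
        b = min(
            sum(cosine_distance(p, points[j]) for j, l in enumerate(labels) if l == k)
            / labels.count(k)
            for k in clusters
            if k != labels[i]
        )
        denom = max(a, b)
        scores[labels[i]].append(0.0 if denom == 0 else (b - a) / denom)
    return {k: sum(v) / len(v) for k, v in scores.items()}


def silhouette_thresholds(points, labels, centroids) -> Dict[int, float]:
    widths = silhouette_widths(points, labels)
    thresholds = {}
    for k, centroid in enumerate(centroids):
        members = [p for p, l in zip(points, labels) if l == k]
        radius = cluster_radius(members, centroid)
        # HYPOTHETICAL rule: shrink the radius as compactness drops,
        # mapping a width in [-1, 1] onto a factor in [0, 1].
        thresholds[k] = radius * (1.0 + widths[k]) / 2.0
    return thresholds
```

With this rule a perfectly compact, well-separated cluster keeps its full radius as the threshold, while a poorly separated one accepts fewer new events.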

8. The method as claimed in claim 7, wherein the cluster model is generated based on a clustering algorithm, wherein the clustering algorithm is a spherical k-means clustering algorithm.
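Claims 7 and 8 build a TF-IDF matrix of the corpus and cluster it with spherical k-means. A compact sketch of both is below: spherical k-means normalizes every row to unit length, assigns each row to the centroid with the highest cosine similarity, and re-normalizes each cluster's mean as its new centroid. The plain log(N/df) weighting and the deterministic farthest-first seeding are assumptions; k-means++ or random restarts are common alternatives for seeding.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_matrix(docs: List[List[str]]) -> Tuple[Dict[str, int], List[float], List[List[float]]]:
    """Rows = pre-existing log events, columns = corpus words, weight = tf * log(N / df)."""
    n = len(docs)
    words = sorted({w for d in docs for w in d})
    vocab = {w: i for i, w in enumerate(words)}
    df = Counter(w for d in docs for w in set(d))
    idf = [math.log(n / df[w]) for w in words]
    matrix = []
    for d in docs:
        row = [0.0] * len(words)
        for w, tf in Counter(d).items():
            row[vocab[w]] = tf * idf[vocab[w]]
        matrix.append(row)
    return vocab, idf, matrix


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def spherical_kmeans(rows, k, max_iter=100):
    """Cluster rows on the unit sphere by cosine similarity; returns (labels, centroids)."""
    points = [_normalize(r) for r in rows]
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of log events")
    # ASSUMPTION: farthest-first seeding (least similar point to the seeds so far).
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(
            (i for i in range(len(points)) if i not in seeds),
            key=lambda i: max(_dot(points[i], points[s]) for s in seeds),
        ))
    centroids = [list(points[s]) for s in seeds]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each event to the centroid with the highest cosine similarity.
        new_labels = [max(range(k), key=lambda c: _dot(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # New centroid = mean direction of the members, projected back onto the sphere.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = _normalize([sum(col) for col in zip(*members)])
    return labels, centroids
```

On the four events "disk full", "disk full error", "connection timeout" and "connection timeout error" with k = 2, the sketch separates the disk events from the connection events.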

9. A log categorization system (102) for categorizing a real-time log event, the log categorization system (102) comprising:
a processor (104);
a clustering module (116) coupled to the processor (104) to,
compute a Term Frequency-Inverse Document Frequency (TF-IDF) vector for the real-time log event based on a pre-calculated TF-IDF matrix of a log corpus and a number of new words in the real-time log event, wherein the log corpus comprises one or more pre-existing log events, and wherein the real-time log event is indicative of an error message;
a log categorization module (120) coupled to the processor (104) to,
calculate a distance between the TF-IDF vector and a cluster centroid of each cluster in the log corpus;
identify, from amongst the clusters, a cluster having a closest cluster centroid based on the distance between the TF-IDF vector and the cluster centroid of each of the clusters, wherein the closest cluster centroid is a cluster centroid closest to the TF-IDF vector; and
categorize the real-time log event into a log category based on a comparison of the distance between the TF-IDF vector and the closest cluster centroid with a pre-determined silhouette threshold corresponding to the cluster with the closest cluster centroid.

10. The log categorization system (102) as claimed in claim 9, wherein the log category is one of a pre-existing log category and a new log category.

11. The log categorization system (102) as claimed in claim 9, wherein the log categorization system (102) further includes a log processing module (114) coupled to the processor (104) to:
receive the real-time log event from a log source; and
process the real-time log event to remove insignificant data from the real-time log event, wherein the insignificant data includes timestamps, digits, and special characters.

12. The log categorization system (102) as claimed in claim 10, wherein the log categorization module (120) categorizes the real-time log event into the pre-existing log category when the distance between the TF-IDF vector and the closest cluster centroid is less than the pre-determined silhouette threshold corresponding to the cluster with the closest cluster centroid.

13. The log categorization system (102) as claimed in claim 10, wherein the log categorization module (120) categorizes the real-time log event as the new log category when the distance between the TF-IDF vector and the closest cluster centroid is greater than the pre-determined silhouette threshold corresponding to the cluster with the closest cluster centroid.

14. The log categorization system (102) as claimed in claim 9, wherein the clustering module (116) determines a centroid matrix for the real-time log event by adapting a pre-determined centroid matrix of the log corpus based on the number of new words in the real-time log event, wherein the pre-determined centroid matrix is determined based on a number of clusters in the log corpus.

15. The log categorization system (102) as claimed in claim 9, wherein the log processing module (114) further:
receives the log corpus from one or more log sources, wherein the log corpus comprises one or more pre-existing log events; and
processes the log corpus to remove insignificant data from each of the one or more pre-existing log events, wherein the insignificant data includes timestamps, digits, and special characters.

16. The log categorization system (102) as claimed in claim 9, wherein the clustering module (116) further:
computes the TF-IDF matrix of the log corpus based on a number of pre-existing log events in the log corpus and a number of words in the log corpus;
generates a cluster model based on the TF-IDF matrix, wherein the cluster model is indicative of the number of clusters corresponding to the log corpus, and wherein a cluster is indicative of a log category; and
determines the centroid matrix of the log corpus based on the number of clusters in the cluster model and the number of words in the log corpus.

17. The log categorization system (102) as claimed in claim 9, wherein the log categorization system (102) further includes a threshold determination module (118) to:
calculate a cluster radius and a silhouette width of each cluster, wherein a cluster radius of a cluster is calculated based on a distance between a cluster centroid of the cluster and a farthest point in the cluster; and wherein a silhouette width of the cluster is indicative of compactness of the cluster; and
determine a silhouette threshold for each cluster based on the corresponding cluster radius and the corresponding silhouette width.

18. A non-transitory computer-readable medium having embodied thereon a computer program for executing a method comprising:
computing a Term Frequency-Inverse Document Frequency (TF-IDF) vector for a log event based on a pre-calculated TF-IDF matrix of a log corpus and a number of new words in the log event, wherein the log corpus comprises one or more pre-existing log events, and wherein the log event is indicative of an error message;
calculating a distance between the TF-IDF vector and a cluster centroid of each cluster in the log corpus;
identifying, from amongst the clusters, a cluster having a closest cluster centroid based on the distance between the TF-IDF vector and the cluster centroid of each of the clusters, wherein the closest cluster centroid is a cluster centroid closest to the TF-IDF vector; and
categorizing the log event into one or more log categories based on a comparison of the distance between the TF-IDF vector and the closest cluster centroid with a pre-determined silhouette threshold corresponding to the cluster with the closest cluster centroid.

Documents

Application Documents

# Name Date
1 12-MUM-2015-Request For Certified Copy-Online(18-03-2015).pdf 2015-03-18
2 SPEC FOR FILING.pdf 2018-08-11
3 PD014618IN-SC_request for priority document.pdf 2018-08-11
4 FORM 5.pdf 2018-08-11
5 FORM 3.pdf 2018-08-11
6 FIGURES FOR FILING.pdf 2018-08-11
7 12-MUM-2015-Power of Attorney-130215.pdf 2018-08-11
8 12-MUM-2015-Form 1-150115.pdf 2018-08-11
9 12-MUM-2015-Correspondence-150115.pdf 2018-08-11
10 12-MUM-2015-Correspondence-130215.pdf 2018-08-11
11 12-MUM-2015-FER.pdf 2019-11-06
12 12-MUM-2015-FORM-26 [17-04-2020(online)].pdf 2020-04-17
13 12-MUM-2015-FORM 3 [17-04-2020(online)].pdf 2020-04-17
14 12-MUM-2015-OTHERS [05-05-2020(online)].pdf 2020-05-05
15 12-MUM-2015-FER_SER_REPLY [05-05-2020(online)].pdf 2020-05-05
16 12-MUM-2015-CLAIMS [05-05-2020(online)].pdf 2020-05-05
17 12-MUM-2015-Correspondence to notify the Controller [06-04-2021(online)].pdf 2021-04-06
18 12-MUM-2015-Information under section 8(2) [23-04-2021(online)].pdf 2021-04-23
19 12-MUM-2015-Written submissions and relevant documents [28-04-2021(online)].pdf 2021-04-28
20 12-MUM-2015-PatentCertificate02-09-2021.pdf 2021-09-02
21 12-MUM-2015-IntimationOfGrant02-09-2021.pdf 2021-09-02
22 12-MUM-2015-US(14)-HearingNotice-(HearingDate-15-04-2021).pdf 2021-10-03
23 12-MUM-2015-RELEVANT DOCUMENTS [26-09-2023(online)].pdf 2023-09-26

Search Strategy

1 SearchStrategy_A12MUM2015AE_05-02-2021.pdf
2 SearchStrategyMatrix_12MUM2015_05-11-2019.pdf

ERegister / Renewals

3rd: 06 Sep 2021 (From 02/01/2017 - To 02/01/2018)
4th: 06 Sep 2021 (From 02/01/2018 - To 02/01/2019)
5th: 06 Sep 2021 (From 02/01/2019 - To 02/01/2020)
6th: 06 Sep 2021 (From 02/01/2020 - To 02/01/2021)
7th: 06 Sep 2021 (From 02/01/2021 - To 02/01/2022)
8th: 06 Sep 2021 (From 02/01/2022 - To 02/01/2023)
9th: 30 Dec 2022 (From 02/01/2023 - To 02/01/2024)
10th: 28 Dec 2023 (From 02/01/2024 - To 02/01/2025)
11th: 30 Dec 2024 (From 02/01/2025 - To 02/01/2026)