A System And Method For Data Classification

Abstract: A system and method for data classification are disclosed. The method includes receiving, by a data classifier, a data corpus comprising one or more words. The method further includes comparing the data corpus with at least one pre-classified category of words to determine an overlap ratio between the data corpus and each of the at least one pre-classified category of words. The method further includes computing a confidence score of the data corpus for each of the at least one pre-classified category of words based on the overlap ratio and a predefined confidence score associated with the data corpus for each of the at least one pre-classified category of words. Finally, the method includes classifying the data corpus based on the confidence score into the at least one pre-classified category. Figure 2
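
The steps summarized in the abstract can be illustrated with a minimal Python sketch. The text above does not give formulas, so everything concrete here is an assumption made for illustration: the overlap ratio is taken as the fraction of distinct corpus words that also appear in a category, the confidence score is taken as the product of that ratio and the predefined confidence score, and the corpus is assigned to the highest-scoring category. The function names, category names, and numeric values are hypothetical.

```python
def overlap_ratio(corpus_words, category_words):
    """Assumed definition: share of distinct corpus words found in the category."""
    corpus = {w.lower() for w in corpus_words}
    category = {w.lower() for w in category_words}
    if not corpus:
        return 0.0
    return len(corpus & category) / len(corpus)


def classify(corpus_words, categories, predefined_confidence):
    """Score the corpus against every category and pick the best one.

    Assumption: confidence score = overlap ratio * predefined confidence.
    A category with no predefined confidence defaults to 1.0.
    """
    scores = {}
    for name, words in categories.items():
        ratio = overlap_ratio(corpus_words, words)
        scores[name] = ratio * predefined_confidence.get(name, 1.0)
    if not scores:
        return None, scores
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical data, for illustration only.
categories = {
    "network": ["server", "timeout", "router", "latency"],
    "billing": ["invoice", "payment", "error"],
}
predefined_confidence = {"network": 0.8, "billing": 0.5}
corpus = "The server returned a timeout error".split()

best, scores = classify(corpus, categories, predefined_confidence)
print(best, scores)
```

With this sample data, two of the six distinct corpus words fall in the "network" category and one falls in "billing", so "network" receives the higher score. Other combinations of the overlap ratio and the predefined confidence score (for example, a weighted sum) would fit the wording of the abstract equally well.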

Patent Information

Application #
Filing Date
29 November 2016
Publication Number
22/2018
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
ipo@knspartners.com
Parent Application

Applicants

WIPRO LIMITED
Doddakannelli, Sarjapur Road, Bangalore 560035, Karnataka, India.

Inventors

1. MOHIT SHARMA
006, Brundavan Presidency, 27th Cross, 19th Main, Sector 2, HSR Layout, Bangalore -560102, Karnataka, India.
2. SRINIVAS ADYAPAK
261, 36 B Cross, 7 Block Jayanagar, Bangalore 560070, Karnataka, India.

Specification

Claims: WE CLAIM:
1. A method of data classification, comprising:
receiving, by a data classifier, a data corpus comprising one or more words;
comparing, by the data classifier, the data corpus with at least one pre-classified category of words to determine an overlap ratio between the data corpus and each of the at least one pre-classified category of words;
computing, by the data classifier, a confidence score of the data corpus for each of the at least one pre-classified category of words based on the overlap ratio and a predefined confidence score associated with the data corpus for each of the at least one pre-classified category of words; and
classifying, by the data classifier, the data corpus based on the confidence score into the at least one pre-classified category.

2. The method of claim 1, wherein the overlap ratio is based on one or more words common between the data corpus and the at least one pre-classified category of words.

3. The method of claim 1, wherein the confidence score is the probability of the data corpus belonging to a category of the at least one pre-classified category of words.

4. The method of claim 1, further comprising determining a boost value for the confidence score of the data corpus for each of the at least one pre-classified category of words based on a change in the confidence score for each of the at least one pre-classified category of words from the predefined confidence score associated with the data corpus for each of the at least one pre-classified category of words.

5. A system for data classification, comprising:
a hardware processor; and
a memory storing instructions executable by the hardware processor for:
receiving a data corpus comprising one or more words;
comparing the data corpus with at least one pre-classified category of words to determine an overlap ratio between the data corpus and each of the at least one pre-classified category of words;
computing a confidence score of the data corpus for each of the at least one pre-classified category of words based on the overlap ratio and a predefined confidence score associated with the data corpus for each of the at least one pre-classified category of words; and
classifying the data corpus based on the confidence score into the at least one pre-classified category.

6. The system of claim 5, wherein the overlap ratio is based on one or more words common between the data corpus and the at least one pre-classified category of words.

7. The system of claim 5, wherein the confidence score is the probability of the data corpus belonging to a category of the at least one pre-classified category of words.

8. The system of claim 5, further comprising determining a boost value for the confidence score of the data corpus for each of the at least one pre-classified category of words based on a change in the confidence score for each of the at least one pre-classified category of words from the predefined confidence score associated with the data corpus for each of the at least one pre-classified category of words.
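
The boost value recited in claims 4 and 8 is described only as being based on a change in the confidence score from the predefined confidence score. One possible reading, sketched below in Python, treats the boost as the signed difference between the two scores for each category. This interpretation, the function name, and the sample values are assumptions for illustration and are not taken from the claims.

```python
def boost_values(computed_confidence, predefined_confidence):
    """Assumed definition: boost = computed score - predefined score, per category.

    A category with no predefined score is treated as starting from 0.0.
    """
    return {
        name: score - predefined_confidence.get(name, 0.0)
        for name, score in computed_confidence.items()
    }


# Hypothetical scores, for illustration only.
computed = {"network": 0.9, "billing": 0.4}
predefined = {"network": 0.8, "billing": 0.5}

boosts = boost_values(computed, predefined)
print(boosts)
```

Under this reading, a positive boost means the computed confidence for a category rose above its predefined value, and a negative boost means it fell below it.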

Dated this 29th day of November, 2016

Swetha SN
Of K&S Partners
Agent for the Applicant
Description: TECHNICAL FIELD
This disclosure relates to natural language processing, and more particularly to a system and method for data classification.
