Abstract: The present disclosure relates to a method and a data management system for managing data of an entity. The data management system receives data associated with an entity from a data source. The data comprises current data and reference data. A category of the current data is predicted to be one of duplicate data and non-duplicate data, with respect to the reference data, using a plurality of Supervised Machine Learning (SML) classifiers, where each of the plurality of SML classifiers predicts the category of the data individually. The data management system generates a confidence factor for each of the duplicate data category and the non-duplicate data category based on the prediction of each of the plurality of SML classifiers, and thereafter determines the current data to be one of duplicate data and non-duplicate data based on the confidence factor, to manage the data of the entity. FIG. 1
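By way of illustration only, the flow summarised in the abstract can be sketched in Python. The disclosure does not state how the confidence factor is computed, so this sketch assumes it is the fraction of classifiers voting for each category. The function `field_match` is a hypothetical rule-based stand-in for a trained SML classifier, and the field names (`name`, `email`, `phone`) and the `undetermined` result for a tie are likewise assumptions rather than part of the disclosure.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

DUPLICATE = "duplicate"
NON_DUPLICATE = "non-duplicate"

# Assumption: a classifier is any callable that compares the current record
# with the reference record and returns one of the two category labels.
Classifier = Callable[[dict, dict], str]


def field_match(field: str) -> Classifier:
    """Hypothetical stand-in for a trained SML classifier: votes on one field."""
    def classify(current: dict, reference: dict) -> str:
        same = current.get(field) == reference.get(field)
        return DUPLICATE if same else NON_DUPLICATE
    return classify


def confidence_factors(votes: List[str]) -> Dict[str, float]:
    """Assumed definition: share of classifiers voting for each category."""
    if not votes:
        raise ValueError("at least one classifier prediction is required")
    counts = Counter(votes)
    return {label: counts[label] / len(votes)
            for label in (DUPLICATE, NON_DUPLICATE)}


def determine_category(
    current: dict, reference: dict, classifiers: List[Classifier]
) -> Tuple[str, Dict[str, float]]:
    # Each classifier predicts the category individually.
    votes = [clf(current, reference) for clf in classifiers]
    factors = confidence_factors(votes)
    # The category with the greater confidence factor is selected.
    if factors[DUPLICATE] > factors[NON_DUPLICATE]:
        return DUPLICATE, factors
    if factors[NON_DUPLICATE] > factors[DUPLICATE]:
        return NON_DUPLICATE, factors
    # Assumption: the source does not say what happens on a tie.
    return "undetermined", factors


classifiers = [field_match("name"), field_match("email"), field_match("phone")]
current = {"name": "Acme Ltd", "email": "info@acme.example", "phone": "111"}
reference = {"name": "Acme Ltd", "email": "info@acme.example", "phone": "222"}

category, factors = determine_category(current, reference, classifiers)
print(category, factors)
```

In this example two of the three stand-in classifiers vote for the duplicate category, so its confidence factor exceeds that of the non-duplicate category and the current data is determined to be duplicate data.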
Claims: WE CLAIM:
1. A method of managing data of an entity, the method comprising:
receiving, by a data management system (101), data associated with an entity from a data source, wherein the data comprises a current data and a reference data;
predicting, by the data management system (101), a category of the current data to be one of, duplicate data and non-duplicate data, with respect to the reference data, using a plurality of Supervised Machine Learning (SML) classifiers, wherein each of the plurality of SML classifiers predicts the category of the data individually;
generating, by the data management system (101), a confidence factor of the duplicate data category and the non-duplicate data category based on the prediction of each of the plurality of SML classifiers; and
determining, by the data management system (101), the current data to be one of, the duplicate data and the non-duplicate data based on the confidence factor to manage the data of the entity.
2. The method as claimed in claim 1 further comprising converting a format of the data to a predefined format of the plurality of SML classifiers.
3. The method as claimed in claim 1, wherein the plurality of SML classifiers are trained based on a plurality of master datasets associated with the entity, analysed by one or more data experts as duplicate and non-duplicate.
4. The method as claimed in claim 3 further comprising evaluating the plurality of trained SML classifiers based on one or more metrics and a data exploration technique.
5. The method as claimed in claim 4, wherein the one or more metrics comprise accuracy metrics, precision metrics, recall metrics and an F1-score metric, which is a combination of the precision and recall metrics.
6. The method as claimed in claim 1, wherein the current data is determined to be duplicate data when the confidence factor of the duplicate data category is greater than the confidence factor of the non-duplicate data category.
7. The method as claimed in claim 1, wherein the current data is determined to be non-duplicate data when the confidence factor of the non-duplicate data category is greater than the confidence factor of the duplicate data category.
8. The method as claimed in claim 1 further comprising facilitating learning for one or more SML classifiers of the plurality of SML classifiers, associated with the category of the data having a minimum confidence factor.
9. The method as claimed in claim 1 further comprising providing instructions to a system based on the determination of the current data to be one of the duplicate data and the non-duplicate data to manage redundant data.
10. A data management system (101) for managing data of an entity, comprising:
a processor (113); and
a memory (111) communicatively coupled to the processor (113), wherein the memory (111) stores processor instructions, which, on execution, cause the processor (113) to:
receive data associated with an entity from a data source, wherein the data comprises a current data and a reference data;
predict a category of the current data to be one of, duplicate data and non-duplicate data, with respect to the reference data, using a plurality of SML classifiers, wherein each of the plurality of SML classifiers predicts the category of the data individually;
generate a confidence factor of the duplicate data category and the non-duplicate data category based on the prediction of each of the plurality of SML classifiers; and
determine the current data to be one of the duplicate data and the non-duplicate data based on the confidence factor to manage the data of the entity.
11. The data management system (101) as claimed in claim 10, wherein the processor (113) converts a format of the data to a predefined format of the plurality of SML classifiers.
12. The data management system (101) as claimed in claim 10, wherein the processor (113) trains the plurality of SML classifiers based on a plurality of master datasets associated with the entity, analysed by one or more data experts as duplicate and non-duplicate.
13. The data management system (101) as claimed in claim 10, wherein the processor (113) evaluates the plurality of trained SML classifiers based on at least one of one or more metrics and a data exploration technique.
14. The data management system (101) as claimed in claim 13, wherein the one or more metrics comprise accuracy metrics, precision metrics, recall metrics and an F1-score metric, which is a combination of the precision and recall metrics.
15. The data management system (101) as claimed in claim 10, wherein the processor (113) determines the current data to be duplicate data, when the confidence factor of the duplicate data category is greater than the confidence factor of the non-duplicate data category.
16. The data management system (101) as claimed in claim 10, wherein the processor (113) determines the current data to be non-duplicate data when the confidence factor of the non-duplicate data category is greater than the confidence factor of the duplicate data category.
17. The data management system (101) as claimed in claim 10, wherein the processor (113) facilitates learning for one or more SML classifiers of the plurality of SML classifiers, associated with the category of the data having a minimum confidence factor.
18. The data management system (101) as claimed in claim 10, wherein the processor (113) provides instructions to a system based on the determination of the current data to be one of the duplicate data and the non-duplicate data to manage redundant data.
Dated this June 22, 2018
R Ramya Rao
Of K&S Partners
Agent for the Applicant
IN/PA-1607
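The metrics named in claims 5 and 14 have standard definitions, which the following Python sketch computes from expert-labelled and predicted categories. Treating `duplicate` as the positive class, the function name `evaluate`, and returning 0.0 when a denominator is zero are assumptions made for illustration; the specification does not prescribe them.

```python
from typing import Dict, Sequence


def evaluate(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "duplicate"
) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for one classifier."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Assumption: report 0.0 rather than fail when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Labels assigned by data experts versus labels predicted by one classifier.
expert = ["duplicate", "duplicate", "duplicate", "non-duplicate"]
predicted = ["duplicate", "duplicate", "non-duplicate", "non-duplicate"]
scores = evaluate(expert, predicted)
print(scores)
```

Running the same function over each trained classifier gives a like-for-like comparison before the classifiers are used together.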
Description: TECHNICAL FIELD
The present subject matter relates in general to data management and, more particularly but not exclusively, to a method and a system for managing data of an entity.
| Section | Controller | Decision Date |
|---|---|---|
| 15 and 43 | Rohit Mishra | 2023-07-17 |
| 15 and 43 | Rohit Mishra | 2023-07-21 |

| # | Name | Date |
|---|---|---|
| 1 | 201844023426-STATEMENT OF UNDERTAKING (FORM 3) [22-06-2018(online)].pdf | 2018-06-22 |
| 2 | 201844023426-REQUEST FOR EXAMINATION (FORM-18) [22-06-2018(online)].pdf | 2018-06-22 |
| 3 | 201844023426-POWER OF AUTHORITY [22-06-2018(online)].pdf | 2018-06-22 |
| 4 | 201844023426-FORM 18 [22-06-2018(online)].pdf | 2018-06-22 |
| 5 | 201844023426-FORM 1 [22-06-2018(online)].pdf | 2018-06-22 |
| 6 | 201844023426-DRAWINGS [22-06-2018(online)].pdf | 2018-06-22 |
| 7 | 201844023426-DECLARATION OF INVENTORSHIP (FORM 5) [22-06-2018(online)].pdf | 2018-06-22 |
| 8 | 201844023426-COMPLETE SPECIFICATION [22-06-2018(online)].pdf | 2018-06-22 |
| 9 | 201844023426-Certified Copy of Priority Document (MANDATORY) [11-07-2018(online)].pdf | 2018-07-11 |
| 10 | Correspondence by Agent_Form 1,Form 30_16-07-2018.pdf | 2018-07-16 |
| 11 | abstract 201844023426.jpg | 2018-07-17 |
| 12 | 201844023426-Proof of Right (MANDATORY) [17-08-2018(online)].pdf | 2018-08-17 |
| 13 | Correspondence by Agent_Form1_23-08-2018.pdf | 2018-08-23 |
| 14 | 201844023426-RELEVANT DOCUMENTS [05-04-2021(online)].pdf | 2021-04-05 |
| 15 | 201844023426-PETITION UNDER RULE 137 [05-04-2021(online)].pdf | 2021-04-05 |
| 16 | 201844023426-OTHERS [05-04-2021(online)].pdf | 2021-04-05 |
| 17 | 201844023426-Information under section 8(2) [05-04-2021(online)].pdf | 2021-04-05 |
| 18 | 201844023426-FORM 3 [05-04-2021(online)].pdf | 2021-04-05 |
| 19 | 201844023426-FER_SER_REPLY [05-04-2021(online)].pdf | 2021-04-05 |
| 20 | 201844023426-DRAWING [05-04-2021(online)].pdf | 2021-04-05 |
| 21 | 201844023426-CORRESPONDENCE [05-04-2021(online)].pdf | 2021-04-05 |
| 22 | 201844023426-COMPLETE SPECIFICATION [05-04-2021(online)].pdf | 2021-04-05 |
| 23 | 201844023426-CLAIMS [05-04-2021(online)].pdf | 2021-04-05 |
| 24 | 201844023426-ABSTRACT [05-04-2021(online)].pdf | 2021-04-05 |
| 25 | 201844023426-FER.pdf | 2021-10-17 |
| 26 | 201844023426-US(14)-HearingNotice-(HearingDate-06-06-2023).pdf | 2023-05-09 |
| 27 | 201844023426-POA [17-05-2023(online)].pdf | 2023-05-17 |
| 28 | 201844023426-FORM 13 [17-05-2023(online)].pdf | 2023-05-17 |
| 29 | 201844023426-Correspondence to notify the Controller [17-05-2023(online)].pdf | 2023-05-17 |
| 30 | 201844023426-AMENDED DOCUMENTS [17-05-2023(online)].pdf | 2023-05-17 |
| 31 | 201844023426-Written submissions and relevant documents [23-06-2023(online)].pdf | 2023-06-23 |
| 32 | 201844023426-FORM-26 [23-06-2023(online)].pdf | 2023-06-23 |
| 33 | 201844023426-FORM 3 [23-06-2023(online)].pdf | 2023-06-23 |
| 34 | 201844023426-FORM 13 [17-07-2023(online)].pdf | 2023-07-17 |
| 35 | 201844023426-AMMENDED DOCUMENTS [17-07-2023(online)].pdf | 2023-07-17 |
| 36 | 201844023426-PatentCertificate21-07-2023.pdf | 2023-07-21 |
| 37 | 201844023426-IntimationOfGrant21-07-2023.pdf | 2023-07-21 |
| 38 | 201844023426-FORM 4 [04-07-2024(online)].pdf | 2024-07-04 |

| 1 | 2020-11-0411-44-10E_04-11-2020.pdf |