Abstract: A method and device for automatic data correction using context and semantic aware learning techniques are disclosed. The method includes extracting data within a document as machine readable text in a predefined format. The method further includes encoding each word of each line in the machine readable text to a multi-dimension word vector. The method includes generating a context word vector for each word in each line based on multi-dimension vectors associated with words succeeding and preceding the word in a line comprising the word. The method further includes decoding the context word vector associated with each word in each line to generate a corrected context word vector for each word. The method includes validating the corrected context word vector associated with each word in each line. Fig. 3
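The encoding and context-vector steps summarized in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the toy `EMBEDDINGS` table, the dimension `DIM`, and the rule of averaging one neighbour on each side. The source does not specify how the preceding and succeeding word vectors are combined, so the averaging is an assumption, not the disclosed method.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical toy embedding table; a real system would use learned word vectors.
DIM = 3
EMBEDDINGS: Dict[str, Vector] = {
    "invoice": [1.0, 0.0, 0.0],
    "total": [0.0, 1.0, 0.0],
    "amount": [0.0, 0.0, 1.0],
}


def encode_line(line: str) -> List[Vector]:
    """Split a line into words and map each word to a multi-dimension vector.

    Words missing from the table get a zero vector (compare claim 4).
    """
    return [list(EMBEDDINGS.get(word.lower(), [0.0] * DIM)) for word in line.split()]


def context_vectors(word_vectors: List[Vector]) -> List[Vector]:
    """Return one context vector per word in the line.

    Assumption for this sketch: the context vector is the element-wise mean
    of the immediately preceding and succeeding word vectors.
    """
    contexts: List[Vector] = []
    for i in range(len(word_vectors)):
        # Up to one neighbour before and one after the current word.
        neighbours = word_vectors[max(0, i - 1):i] + word_vectors[i + 1:i + 2]
        if not neighbours:
            # A single-word line has no neighbours to draw context from.
            contexts.append([0.0] * DIM)
            continue
        contexts.append([sum(col) / len(neighbours) for col in zip(*neighbours)])
    return contexts
```

For the line "Invoice totl amount", the misspelled middle word is encoded as a zero vector, and its context vector under these assumptions is `[0.5, 0.0, 0.5]`, the mean of its two neighbours.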
Claims: WE CLAIM
1. A method for automatic data correction, the method comprising:
extracting, by a data extraction device, data within a document as machine readable text in a predefined format;
encoding, by an encoder in the data extraction device, each word of each line in the machine readable text to a multi-dimension word vector;
generating, by the data extraction device, a context word vector for each word in each line based on multi-dimension vectors associated with words succeeding and preceding the word in a line comprising the word;
decoding, by a decoder in the data extraction device, the context word vector associated with each word in each line to generate a corrected context word vector for each word; and
validating, by the data extraction device, the corrected context word vector associated with each word in each line.
2. The method of claim 1 further comprising receiving a document, by the data extraction device, wherein the document comprises at least one content, and wherein the at least one content comprises at least one error.
3. The method of claim 1, wherein the encoding comprises:
receiving the machine readable text as a plurality of lines;
splitting each of the plurality of lines into a plurality of words; and
converting each of the plurality of words in each of the plurality of lines to a multi-dimension word vector.
4. The method of claim 1, wherein each unrecognized word in the machine readable text is assigned a zero-word vector.
5. The method of claim 1, wherein the validating comprises enhancing features of a context word vector to a larger dimension.
6. The method of claim 1, wherein the validating comprises computing a mean value of context word vectors associated with words of a line within the machine readable text.
7. The method of claim 6 further comprising comparing the mean value of the context word vectors associated with words of the line with a label or ground truth to determine whether the line is correct or incorrect.
8. The method of claim 7 further comprising providing an output of the decoder to the encoder, when the line is determined to be incorrect, wherein the output of the decoder comprises corrected context word vectors associated with words in the machine readable text.
9. The method of claim 8 further comprising iteratively providing an output of the decoder to the encoder, till the line is determined to be correct.
10. The method of claim 1, wherein decoding comprises comparing a context word vector associated with a word in the machine readable text with a training data set to generate a corrected context word vector for the word.
11. A data extraction device for automatic data correction, the data extraction device comprising:
a processor; and
a memory communicatively coupled to the processor, wherein the memory stores processor instructions, which, on execution, cause the processor to:
extract data within a document as machine readable text in a predefined format;
encode each word of each line in the machine readable text to a multi-dimension word vector;
generate a context word vector for each word in each line based on multi-dimension vectors associated with words succeeding and preceding the word in a line comprising the word;
decode the context word vector associated with each word in each line to generate a corrected context word vector for each word; and
validate the corrected context word vector associated with each word in each line.
12. The data extraction device of claim 11, wherein the processor instructions further cause the processor to receive a document, by the data extraction device, wherein the document comprises at least one content, and wherein the at least one content comprises at least one error.
13. The data extraction device of claim 11, wherein to encode, the processor instructions further cause the processor to:
receive the machine readable text as a plurality of lines;
split each of the plurality of lines into a plurality of words; and
convert each of the plurality of words in each of the plurality of lines to a multi-dimension word vector.
14. The data extraction device of claim 11, wherein to validate, the processor instructions further cause the processor to enhance features of a context word vector to a larger dimension.
15. The data extraction device of claim 11, wherein to validate, the processor instructions further cause the processor to compute a mean value of context word vectors associated with words of a line within the machine readable text.
16. The data extraction device of claim 15, wherein the processor instructions further cause the processor to compare the mean value of the context word vectors associated with words of the line with a label or ground truth to determine whether the line is correct or incorrect.
17. The data extraction device of claim 16, wherein the processor instructions further cause the processor to provide an output of a decoder to an encoder, when the line is determined to be incorrect, wherein the output of the decoder comprises corrected context word vectors associated with words in the machine readable text.
18. The data extraction device of claim 17, wherein the processor instructions further cause the processor to iteratively provide an output of the decoder to the encoder, till the line is determined to be correct.
19. The data extraction device of claim 11, wherein to decode, the processor instructions further cause the processor to compare a context word vector associated with a word in the machine readable text with a training data set to generate a corrected context word vector for the word.
Dated this 10th day of January 2018
SWETHA S. N
OF K&S PARTNERS
ATTORNEY FOR THE APPLICANT
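The validation and feedback steps recited in claims 6 to 9 (and 15 to 18) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the Euclidean distance with a `tolerance`, the `max_rounds` cap, and the `correct_step` callable (a stand-in for one encoder and decoder pass) are all hypothetical names and choices. The claims state only that the mean value is compared with a label or ground truth and that the decoder output is fed back until the line is determined to be correct.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of the context word vectors of one line."""
    if not vectors:
        raise ValueError("a line must contain at least one word vector")
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def is_line_correct(vectors: List[Vector], ground_truth: Vector,
                    tolerance: float = 1e-6) -> bool:
    """Compare the line's mean vector with the ground truth.

    Assumption: 'correct' means the Euclidean distance is within tolerance.
    """
    return math.dist(mean_vector(vectors), ground_truth) <= tolerance


def correct_until_valid(
    vectors: List[Vector],
    ground_truth: Vector,
    correct_step: Callable[[List[Vector]], List[Vector]],
    max_rounds: int = 10,
) -> Tuple[List[Vector], bool]:
    """Feed the corrected vectors back through correct_step until the line
    validates, or until max_rounds passes have been made.

    max_rounds is a safety cap added for this sketch so the loop terminates.
    """
    for _ in range(max_rounds):
        if is_line_correct(vectors, ground_truth):
            return vectors, True
        # Hypothetical stand-in for passing decoder output back to the encoder.
        vectors = correct_step(vectors)
    return vectors, is_line_correct(vectors, ground_truth)
```

The second element of the returned tuple reports whether the line validated, so a caller can distinguish a corrected line from one that exhausted the round limit.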
Description: TECHNICAL FIELD
This disclosure relates generally to data extraction and correction and, more particularly, to a method and device for automatic data correction using context and semantic aware learning techniques.
| # | Name | Date |
|---|---|---|
| 1 | 201841005108-IntimationOfGrant27-06-2024.pdf | 2024-06-27 |
| 2 | 201841005108-PatentCertificate27-06-2024.pdf | 2024-06-27 |
| 3 | 201841005108-FORM-26 [24-04-2024(online)].pdf | 2024-04-24 |
| 4 | 201841005108-Written submissions and relevant documents [24-04-2024(online)].pdf | 2024-04-24 |
| 5 | 201841005108-AMENDED DOCUMENTS [22-03-2024(online)].pdf | 2024-03-22 |
| 6 | 201841005108-Correspondence to notify the Controller [22-03-2024(online)].pdf | 2024-03-22 |
| 7 | 201841005108-FORM 13 [22-03-2024(online)].pdf | 2024-03-22 |
| 8 | 201841005108-POA [22-03-2024(online)].pdf | 2024-03-22 |
| 9 | 201841005108-US(14)-HearingNotice-(HearingDate-09-04-2024).pdf | 2024-03-19 |
| 10 | 201841005108-FER.pdf | 2021-10-17 |
| 11 | 201841005108-CLAIMS [24-09-2021(online)].pdf | 2021-09-24 |
| 12 | 201841005108-DRAWING [24-09-2021(online)].pdf | 2021-09-24 |
| 13 | 201841005108-FER_SER_REPLY [24-09-2021(online)].pdf | 2021-09-24 |
| 14 | 201841005108-FORM 3 [24-09-2021(online)].pdf | 2021-09-24 |
| 15 | 201841005108-Information under section 8(2) [24-09-2021(online)].pdf | 2021-09-24 |
| 16 | 201841005108-OTHERS [24-09-2021(online)].pdf | 2021-09-24 |
| 17 | 201841005108-PETITION UNDER RULE 137 [24-09-2021(online)].pdf | 2021-09-24 |
| 18 | Correspondence by Agent_Form1_01-05-2018.pdf | 2018-05-01 |
| 19 | 201841005108-Proof of Right (MANDATORY) [25-04-2018(online)].pdf | 2018-04-25 |
| 20 | 201841005108-REQUEST FOR CERTIFIED COPY [01-03-2018(online)].pdf | 2018-03-01 |
| 21 | 201841005108-COMPLETE SPECIFICATION [10-02-2018(online)].pdf | 2018-02-10 |
| 22 | 201841005108-DECLARATION OF INVENTORSHIP (FORM 5) [10-02-2018(online)].pdf | 2018-02-10 |
| 23 | 201841005108-DRAWINGS [10-02-2018(online)].pdf | 2018-02-10 |
| 24 | 201841005108-FORM 1 [10-02-2018(online)].pdf | 2018-02-10 |
| 25 | 201841005108-FORM 18 [10-02-2018(online)].pdf | 2018-02-10 |
| 26 | 201841005108-POWER OF AUTHORITY [10-02-2018(online)].pdf | 2018-02-10 |
| 27 | 201841005108-REQUEST FOR EXAMINATION (FORM-18) [10-02-2018(online)].pdf | 2018-02-10 |
| 28 | 201841005108-STATEMENT OF UNDERTAKING (FORM 3) [10-02-2018(online)].pdf | 2018-02-10 |
| 1 | 2021-03-2314-53-22E_24-03-2021.pdf |