Abstract: This disclosure relates to a method and system for annotating tokens for natural language processing (NLP). In one embodiment, the method may include segmenting a plurality of corpora based on each of a plurality of instances, deriving a plurality of entities for each of the plurality of instances based on at least one of a machine learning technique and a deep learning technique, determining a word vector for each of the plurality of entities associated with each of the plurality of instances, and labelling a plurality of tokens for each of the plurality of instances. The plurality of tokens associated with the plurality of entities may be identified based on a frequency of each of the plurality of entities. FIG. 1
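The segmenting and frequency-based labelling steps named in the abstract can be illustrated with a minimal sketch. It is not the disclosed implementation: the corpus shape (a list of instance/text pairs), the function names `segment_by_instance` and `label_tokens`, the `min_count` cut-off, and the `ENTITY`/`O` label names are all assumptions made for illustration. A plain frequency count stands in for the machine learning or deep learning entity derivation, and the word-vector step is omitted.

```python
from collections import Counter, defaultdict


def segment_by_instance(corpus):
    """Group whitespace-split, lower-cased tokens by instance name.

    `corpus` is assumed to be a list of (instance, text) pairs.
    """
    segments = defaultdict(list)
    for instance, text in corpus:
        segments[instance].extend(text.lower().split())
    return dict(segments)


def label_tokens(tokens, min_count=2):
    """Label a token ENTITY when it occurs at least `min_count` times.

    The cut-off is a hypothetical stand-in for the frequency criterion.
    """
    counts = Counter(tokens)
    return [(tok, "ENTITY" if counts[tok] >= min_count else "O") for tok in tokens]


# Toy corpus with two instances (illustrative data only).
corpus = [
    ("finance", "bank loan bank rate"),
    ("health", "doctor visit"),
    ("finance", "loan approval"),
]
segments = segment_by_instance(corpus)
labels = {name: label_tokens(tokens) for name, tokens in segments.items()}
```

In this toy run, "bank" and "loan" each occur twice in the finance instance and are labelled as entities, while every token in the health instance occurs once and stays unlabelled.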
Claims: WE CLAIM:
1. A method of annotating tokens for natural language processing, the method comprising:
segmenting, by an automatic annotation device, a plurality of corpora based on each of a plurality of instances;
deriving, by the automatic annotation device, a plurality of entities for each of the plurality of instances based on at least one of a machine learning technique and a deep learning technique;
determining, by the automatic annotation device, a word vector for each of the plurality of entities associated with each of the plurality of instances; and
labelling, by the automatic annotation device, a plurality of tokens for each of the plurality of instances, wherein the plurality of tokens associated with the plurality of entities are identified based on a frequency of each of the plurality of entities.
2. The method of claim 1, further comprising:
receiving a plurality of datasets for the plurality of instances from at least one of a website, a portable document format (PDF), a research paper, a journal, or an article; and
determining the plurality of corpora by clubbing each of the plurality of datasets.
3. The method of claim 1, further comprising:
determining a pattern for each of the plurality of tokens based on an alphabetic embedding; and
identifying matching tokens of the plurality of tokens based on the pattern, wherein the matching tokens are used to identify at least one of a rhythm, a poetry, or a prose.
4. The method of claim 1, wherein labelling the plurality of tokens further comprises:
measuring an accuracy of each of the plurality of tokens for each of the plurality of instances using a confusion matrix; and
re-labelling the plurality of tokens for each of the plurality of instances when the accuracy is below an accuracy threshold, wherein the accuracy threshold can be scaled up or scaled down based on a mistake performed by a neural network.
5. The method of claim 4, further comprising training the neural network based on the plurality of tokens when the accuracy of each of the plurality of tokens is above the accuracy threshold.
6. The method of claim 5, further comprising:
determining an accuracy value of each of the plurality of entities based on an n-fold cross-validation technique; and
re-labelling each of the plurality of tokens when the accuracy value is not in a predefined range.
7. A system for annotating tokens for natural language processing, the system comprising:
an automatic annotation device comprising at least one processor and a computer-readable medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
segmenting a plurality of corpora based on each of a plurality of instances;
deriving a plurality of entities for each of the plurality of instances based on at least one of a machine learning technique and a deep learning technique;
determining a word vector for each of the plurality of entities associated with each of the plurality of instances; and
labelling a plurality of tokens for each of the plurality of instances, wherein the plurality of tokens associated with the plurality of entities are identified based on a frequency of each of the plurality of entities.
8. The system of claim 7, wherein the operations further comprise:
receiving a plurality of datasets for the plurality of instances from at least one of a website, a portable document format (PDF), a research paper, a journal, or an article; and
determining the plurality of corpora by clubbing each of the plurality of datasets.
9. The system of claim 7, wherein the operations further comprise:
determining a pattern for each of the plurality of tokens based on an alphabetic embedding; and
identifying matching tokens of the plurality of tokens based on the pattern, wherein the matching tokens are used to identify at least one of a rhythm, a poetry, or a prose.
10. The system of claim 7, wherein labelling the plurality of tokens further comprises:
measuring an accuracy of each of the plurality of tokens for each of the plurality of instances using a confusion matrix; and
re-labelling the plurality of tokens for each of the plurality of instances when the accuracy is below an accuracy threshold, wherein the accuracy threshold can be scaled up or scaled down based on a mistake performed by a neural network.
11. The system of claim 10, wherein the operations further comprise training the neural network based on the plurality of tokens when the accuracy of each of the plurality of tokens is above the accuracy threshold.
12. The system of claim 11, wherein the operations further comprise:
determining an accuracy value of each of the plurality of entities based on an n-fold cross-validation technique; and
re-labelling each of the plurality of tokens when the accuracy value is not in a predefined range.
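The accuracy checks recited in claims 4 to 6 and 10 to 12 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the claims do not say where reference labels come from, so a `gold` label list is assumed; the helper names, the threshold of 0.9, and the predefined range of 0.7 to 1.0 are hypothetical; and a majority-label baseline stands in for the neural network.

```python
from collections import Counter


def confusion_matrix(gold, predicted):
    """Count (gold label, predicted label) pairs."""
    return Counter(zip(gold, predicted))


def accuracy(matrix):
    """Share of pairs on the diagonal of the confusion matrix."""
    total = sum(matrix.values())
    correct = sum(n for (g, p), n in matrix.items() if g == p)
    return correct / total if total else 0.0


def needs_relabelling(acc, threshold):
    """Re-label when accuracy falls below the (adjustable) threshold."""
    return acc < threshold


def n_fold_accuracy(pairs, n, evaluate):
    """Mean of evaluate(train, held_out) over n interleaved folds."""
    folds = [pairs[i::n] for i in range(n)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(evaluate(train, held_out))
    return sum(scores) / n


def majority_baseline(train, held_out):
    """Stand-in model: predict the most common training label everywhere."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    gold = [label for _, label in held_out]
    return accuracy(confusion_matrix(gold, [majority] * len(gold)))


# Single-pass check against a hypothetical accuracy threshold.
gold = ["ENTITY", "O", "ENTITY", "O", "O"]
pred = ["ENTITY", "O", "O", "O", "O"]
acc = accuracy(confusion_matrix(gold, pred))
relabel = needs_relabelling(acc, threshold=0.9)

# 3-fold check against a hypothetical predefined range.
token_labels = ["ENTITY", "O", "O", "O", "O", "O", "ENTITY", "O", "O"]
pairs = [(f"tok{i}", label) for i, label in enumerate(token_labels)]
cv_acc = n_fold_accuracy(pairs, n=3, evaluate=majority_baseline)
in_range = 0.7 <= cv_acc <= 1.0
```

Here four of the five toy predictions match the assumed reference labels, so the single-pass accuracy is below the 0.9 threshold and the tokens would be re-labelled, while the cross-validated value falls inside the assumed range. Scaling the threshold up or down in response to network mistakes is left out because the claims do not specify the rule.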
Dated 12th day of March, 2019
R Ramya Rao
Of K&S Partners
Agent for the Applicant
IN/PA-1607
Description: TECHNICAL FIELD
[001] This disclosure relates generally to natural language processing (NLP), and more particularly to a method and system for annotating tokens for NLP applications.
| # | Name | Date |
|---|---|---|
| 1 | 201941009531-STATEMENT OF UNDERTAKING (FORM 3) [12-03-2019(online)].pdf | 2019-03-12 |
| 2 | 201941009531-REQUEST FOR EXAMINATION (FORM-18) [12-03-2019(online)].pdf | 2019-03-12 |
| 3 | 201941009531-POWER OF AUTHORITY [12-03-2019(online)].pdf | 2019-03-12 |
| 4 | 201941009531-FORM 18 [12-03-2019(online)].pdf | 2019-03-12 |
| 5 | 201941009531-FORM 1 [12-03-2019(online)].pdf | 2019-03-12 |
| 6 | 201941009531-DRAWINGS [12-03-2019(online)].pdf | 2019-03-12 |
| 7 | 201941009531-DECLARATION OF INVENTORSHIP (FORM 5) [12-03-2019(online)].pdf | 2019-03-12 |
| 8 | 201941009531-COMPLETE SPECIFICATION [12-03-2019(online)].pdf | 2019-03-12 |
| 9 | 201941009531-Request Letter-Correspondence [13-03-2019(online)].pdf | 2019-03-13 |
| 10 | 201941009531-Power of Attorney [13-03-2019(online)].pdf | 2019-03-13 |
| 11 | 201941009531-Form 1 (Submitted on date of filing) [13-03-2019(online)].pdf | 2019-03-13 |
| 12 | abstract 201941009531.jpg | 2019-03-14 |
| 13 | 201941009531-Proof of Right (MANDATORY) [10-09-2019(online)].pdf | 2019-09-10 |
| 14 | Correspondence by Agent_Form 1_16-09-2019.pdf | 2019-09-16 |
| 15 | 201941009531-PETITION UNDER RULE 137 [30-09-2021(online)].pdf | 2021-09-30 |
| 16 | 201941009531-OTHERS [30-09-2021(online)].pdf | 2021-09-30 |
| 17 | 201941009531-OTHERS [30-09-2021(online)]-1.pdf | 2021-09-30 |
| 18 | 201941009531-FORM 3 [30-09-2021(online)].pdf | 2021-09-30 |
| 19 | 201941009531-FER_SER_REPLY [30-09-2021(online)].pdf | 2021-09-30 |
| 20 | 201941009531-FER_SER_REPLY [30-09-2021(online)]-1.pdf | 2021-09-30 |
| 21 | 201941009531-DRAWING [30-09-2021(online)].pdf | 2021-09-30 |
| 22 | 201941009531-DRAWING [30-09-2021(online)]-1.pdf | 2021-09-30 |
| 23 | 201941009531-COMPLETE SPECIFICATION [30-09-2021(online)].pdf | 2021-09-30 |
| 24 | 201941009531-COMPLETE SPECIFICATION [30-09-2021(online)]-1.pdf | 2021-09-30 |
| 25 | 201941009531-CLAIMS [30-09-2021(online)].pdf | 2021-09-30 |
| 26 | 201941009531-CLAIMS [30-09-2021(online)]-1.pdf | 2021-09-30 |
| 27 | 201941009531-FER.pdf | 2021-10-17 |
| 28 | 201941009531-US(14)-HearingNotice-(HearingDate-02-11-2023).pdf | 2023-10-05 |
| 29 | 201941009531-POA [10-10-2023(online)].pdf | 2023-10-10 |
| 30 | 201941009531-FORM 13 [10-10-2023(online)].pdf | 2023-10-10 |
| 31 | 201941009531-Correspondence to notify the Controller [10-10-2023(online)].pdf | 2023-10-10 |
| 32 | 201941009531-AMENDED DOCUMENTS [10-10-2023(online)].pdf | 2023-10-10 |
| 33 | 201941009531-FORM-26 [02-11-2023(online)].pdf | 2023-11-02 |
| 34 | 201941009531-Written submissions and relevant documents [17-11-2023(online)].pdf | 2023-11-17 |
| 35 | 201941009531-FORM 3 [17-11-2023(online)].pdf | 2023-11-17 |
| 36 | 201941009531-PatentCertificate27-06-2024.pdf | 2024-06-27 |
| 37 | 201941009531-IntimationOfGrant27-06-2024.pdf | 2024-06-27 |
| 1 | 2021-04-0511-42-57E_05-04-2021.pdf |