System And Method For Annotation Of Tokens For Natural Language Processing

Abstract: This disclosure relates to a method and system for annotating tokens for natural language processing (NLP). In one embodiment, the method may include segmenting a plurality of corpus based on each of a plurality of instances, deriving a plurality of entities for each of the plurality of instances based on at least one of a machine learning technique and a deep learning technique, determining a word vector for each of the plurality of entities associated with each of the plurality of instances, and labelling a plurality of tokens for each of the plurality of instances. The plurality of tokens associated with the plurality of entities may be identified based on a frequency of each of the plurality of entities. FIG. 1
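
The steps named in the abstract can be pictured with a minimal sketch. It is not the applicants' implementation. It assumes an "instance" is a topic keyword, replaces the machine learning or deep learning entity extractor with a plain word tokenizer, leaves out the word-vector step, and treats "frequency" as a simple count compared against a hypothetical `min_count`. The function names, the `STOPWORDS` set, and the `ENTITY`/`OTHER` labels are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the source does not specify one.
STOPWORDS = {"the", "is", "a", "an", "of", "and", "to"}


def segment_corpus(corpus, instances):
    """Group the sentences of the corpus under each instance keyword they mention."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", corpus) if s.strip()]
    return {
        inst: [s for s in sentences if inst.lower() in s.lower()]
        for inst in instances
    }


def derive_entities(sentences):
    """Stand-in for the ML/DL entity extractor: lowercase words minus stop words."""
    words = (w for s in sentences for w in re.findall(r"[a-z]+", s.lower()))
    return [w for w in words if w not in STOPWORDS]


def label_tokens(entities, min_count=2):
    """Label each distinct token by whether its frequency reaches min_count."""
    counts = Counter(entities)
    return {tok: ("ENTITY" if n >= min_count else "OTHER") for tok, n in counts.items()}


def annotate(corpus, instances, min_count=2):
    """Segment by instance, derive candidate entities, then label tokens by frequency."""
    segments = segment_corpus(corpus, instances)
    return {
        inst: label_tokens(derive_entities(sents), min_count)
        for inst, sents in segments.items()
    }


if __name__ == "__main__":
    text = "The printer is jammed. The printer needs toner. Reset the router now."
    print(annotate(text, ["printer", "router"]))
```

In a fuller version, a trained extractor and an embedding model would take the place of `derive_entities`; the frequency-based labelling step would stay the same shape.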

Patent Information

Application #:
Filing Date: 12 March 2019
Publication Number: 38/2020
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email: bangalore@knspartners.com
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2024-06-27
Renewal Date:

Applicants

WIPRO LIMITED
Doddakannelli, Sarjapur Road, Bangalore 560035, Karnataka, India.

Inventors

1. RISHAV DAS
33/1, Nandi Bagan Bye Lane, P.O. Salkia, P.S. Golabari, Howrah 711106, West Bengal, India.
2. SOURAV MUDI
Pahalanpur, Burdwan, Madhabdihi 713427, West Bengal, India.

Specification

Claims:
WE CLAIM:
1. A method of annotating tokens for natural language processing, the method comprising:
segmenting, by an automatic annotation device, a plurality of corpus based on each of a plurality of instances;
deriving, by the automatic annotation device, a plurality of entities for each of the plurality of instances based on at least one of a machine learning technique and a deep learning technique;
determining, by the automatic annotation device, a word vector for each of the plurality of entities associated with each of the plurality of instances; and
labelling, by the automatic annotation device, a plurality of tokens for each of the plurality of instances, wherein the plurality of tokens associated with the plurality of entities are identified based on a frequency of each of the plurality of entities.

2. The method of claim 1, further comprising:
receiving a plurality of dataset for the plurality of instances from at least one of a website, a portable document format (PDF), a research paper, a journal, or an article; and
determining the plurality of corpus by clubbing each of the plurality of dataset.

3. The method of claim 1, further comprising:
determining a pattern for each of the plurality of tokens based on an alphabetic embedding; and
identifying a matching tokens of the plurality of tokens based on the pattern, wherein the matching tokens is used to identify at least one of a rhythm, a poetry, or a prose.

4. The method of claim 1, wherein labelling the plurality of tokens further comprises:
measuring an accuracy of each of the plurality of tokens for each of the plurality of instances using a confusion matrix; and
re-labelling the plurality of tokens for each of the plurality of instances when the accuracy is below an accuracy threshold, wherein the accuracy threshold can be scaled up or scaled down based on a mistake performed by a neural network.

5. The method of claim 4, further comprising training the neural network based on the plurality of tokens when the accuracy of each of the plurality of token is above the accuracy threshold.

6. The method of claim 5, further comprising:
determining an accuracy value of each of the plurality of entities based on n-fold cross validation technique; and
re-labelling each of the plurality of tokens when the accuracy value is not in a predefined range.

7. A system for annotating tokens for natural language processing, the system comprising:
an automatic annotation device comprising at least one processor and a computer-readable medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
segmenting a plurality of corpus based on each of a plurality of instances;
deriving a plurality of entities for each of the plurality of instances based on at least one of a machine learning technique and a deep learning technique;
determining a word vector for each of the plurality of entities associated with each of the plurality of instances; and
labelling a plurality of tokens for each of the plurality of instances, wherein the plurality of tokens associated with the plurality of entities are identified based on a frequency of each of the plurality of entities.

8. The system of claim 7, wherein the operations further comprise:
receiving a plurality of dataset for the plurality of instances from at least one of a website, a portable document format (PDF), a research paper, a journal, or an article; and
determining the plurality of corpus by clubbing each of the plurality of dataset.

9. The system of claim 7, wherein the operations further comprise:
determining a pattern for each of the plurality of tokens based on an alphabetic embedding; and
identifying a matching tokens of the plurality of tokens based on the pattern, wherein the matching tokens is used to identify at least one of a rhythm, a poetry, or a prose.

10. The system of claim 7, wherein labelling the plurality of tokens further comprises:
measuring an accuracy of each of the plurality of tokens for each of the plurality of instances using a confusion matrix; and
re-labelling the plurality of tokens for each of the plurality of instances when the accuracy is below an accuracy threshold, wherein the accuracy threshold can be scaled up or scaled down based on a mistake performed by a neural network.

11. The system of claim 10, wherein the operations further comprise training the neural network based on the plurality of tokens when the accuracy of each of the plurality of token is above the accuracy threshold.

12. The system of claim 11, wherein the operations further comprise:
determining an accuracy value of each of the plurality of entities based on n-fold cross validation technique; and
re-labelling each of the plurality of tokens when the accuracy value is not in a predefined range.

Dated 12th day of March, 2019

R Ramya Rao
Of K&S Partners
Agent for the Applicant
IN/PA-1607
Description:
TECHNICAL FIELD
[001] This disclosure relates generally to natural language processing (NLP), and more particularly to a method and system for annotating tokens for NLP applications.
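
Claims 4 and 10 above describe measuring labelling accuracy with a confusion matrix and re-labelling when it falls below a threshold, and claims 6 and 12 refer to n-fold cross validation. The sketch below shows one way those checks could look, under stated assumptions: labels are compared against a reference labelling, the default threshold of 0.8 is hypothetical, and folds are contiguous index ranges. None of the names come from the specification.

```python
from collections import Counter


def confusion_matrix(reference, predicted):
    """Count (reference_label, predicted_label) pairs over aligned token lists."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    return Counter(zip(reference, predicted))


def accuracy(matrix):
    """Share of tokens on the diagonal of the confusion matrix."""
    total = sum(matrix.values())
    correct = sum(n for (ref, pred), n in matrix.items() if ref == pred)
    return correct / total if total else 0.0


def needs_relabelling(reference, predicted, threshold=0.8):
    """True when accuracy is below the (assumed) accuracy threshold."""
    return accuracy(confusion_matrix(reference, predicted)) < threshold


def n_fold_splits(n_items, n_folds):
    """Yield (train_indices, test_indices) for each of n_folds contiguous folds."""
    if not 1 < n_folds <= n_items:
        raise ValueError("n_folds must be between 2 and n_items")
    sizes = [n_items // n_folds + (1 if i < n_items % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_items) if i < start or i >= stop]
        yield train, test
        start = stop


if __name__ == "__main__":
    ref = ["ENTITY", "OTHER", "OTHER", "ENTITY"]
    pred = ["ENTITY", "OTHER", "ENTITY", "ENTITY"]
    print(accuracy(confusion_matrix(ref, pred)), needs_relabelling(ref, pred))
```

The claims also say the threshold can be scaled up or down based on mistakes made by the neural network; here that would amount to passing a different `threshold` value, with the adjustment rule left unspecified.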

Documents

Orders

Section Controller Decision Date

Application Documents

# Name Date
1 201941009531-STATEMENT OF UNDERTAKING (FORM 3) [12-03-2019(online)].pdf 2019-03-12
2 201941009531-REQUEST FOR EXAMINATION (FORM-18) [12-03-2019(online)].pdf 2019-03-12
3 201941009531-POWER OF AUTHORITY [12-03-2019(online)].pdf 2019-03-12
4 201941009531-FORM 18 [12-03-2019(online)].pdf 2019-03-12
5 201941009531-FORM 1 [12-03-2019(online)].pdf 2019-03-12
6 201941009531-DRAWINGS [12-03-2019(online)].pdf 2019-03-12
7 201941009531-DECLARATION OF INVENTORSHIP (FORM 5) [12-03-2019(online)].pdf 2019-03-12
8 201941009531-COMPLETE SPECIFICATION [12-03-2019(online)].pdf 2019-03-12
9 201941009531-Request Letter-Correspondence [13-03-2019(online)].pdf 2019-03-13
10 201941009531-Power of Attorney [13-03-2019(online)].pdf 2019-03-13
11 201941009531-Form 1 (Submitted on date of filing) [13-03-2019(online)].pdf 2019-03-13
12 abstract 201941009531.jpg 2019-03-14
13 201941009531-Proof of Right (MANDATORY) [10-09-2019(online)].pdf 2019-09-10
14 Correspondence by Agent_Form 1_16-09-2019.pdf 2019-09-16
15 201941009531-PETITION UNDER RULE 137 [30-09-2021(online)].pdf 2021-09-30
16 201941009531-OTHERS [30-09-2021(online)].pdf 2021-09-30
17 201941009531-OTHERS [30-09-2021(online)]-1.pdf 2021-09-30
18 201941009531-FORM 3 [30-09-2021(online)].pdf 2021-09-30
19 201941009531-FER_SER_REPLY [30-09-2021(online)].pdf 2021-09-30
20 201941009531-FER_SER_REPLY [30-09-2021(online)]-1.pdf 2021-09-30
21 201941009531-DRAWING [30-09-2021(online)].pdf 2021-09-30
22 201941009531-DRAWING [30-09-2021(online)]-1.pdf 2021-09-30
23 201941009531-COMPLETE SPECIFICATION [30-09-2021(online)].pdf 2021-09-30
24 201941009531-COMPLETE SPECIFICATION [30-09-2021(online)]-1.pdf 2021-09-30
25 201941009531-CLAIMS [30-09-2021(online)].pdf 2021-09-30
26 201941009531-CLAIMS [30-09-2021(online)]-1.pdf 2021-09-30
27 201941009531-FER.pdf 2021-10-17
28 201941009531-US(14)-HearingNotice-(HearingDate-02-11-2023).pdf 2023-10-05
29 201941009531-POA [10-10-2023(online)].pdf 2023-10-10
30 201941009531-FORM 13 [10-10-2023(online)].pdf 2023-10-10
31 201941009531-Correspondence to notify the Controller [10-10-2023(online)].pdf 2023-10-10
32 201941009531-AMENDED DOCUMENTS [10-10-2023(online)].pdf 2023-10-10
33 201941009531-FORM-26 [02-11-2023(online)].pdf 2023-11-02
34 201941009531-Written submissions and relevant documents [17-11-2023(online)].pdf 2023-11-17
35 201941009531-FORM 3 [17-11-2023(online)].pdf 2023-11-17
36 201941009531-PatentCertificate27-06-2024.pdf 2024-06-27
37 201941009531-IntimationOfGrant27-06-2024.pdf 2024-06-27

Search Strategy

1 2021-04-0511-42-57E_05-04-2021.pdf

ERegister / Renewals

3rd: 26 Sep 2024 (From 12/03/2021 - To 12/03/2022)
4th: 26 Sep 2024 (From 12/03/2022 - To 12/03/2023)
5th: 26 Sep 2024 (From 12/03/2023 - To 12/03/2024)
6th: 26 Sep 2024 (From 12/03/2024 - To 12/03/2025)
7th: 07 Mar 2025 (From 12/03/2025 - To 12/03/2026)