System And Method For Extracting Information From Unstructured Text

Abstract: This disclosure relates generally to natural language processing, and more particularly to a system and method for extracting subject-verb-object (SVO) chunked text from an unstructured text. In one embodiment, a method is provided for extracting SVO chunked text from an unstructured text. The method comprises identifying a plurality of part of speech (PoS) tokens in the unstructured text, and determining a plurality of SVO chunked text directly from the plurality of PoS tokens using a machine learning chunker model. The machine learning chunker model is trained on a subject-verb-object (SVO) annotated training data. Figure 3

Patent Information

Application #:
Filing Date: 15 February 2017
Publication Number: 33/2018
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email: ipo@knspartners.com
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2023-12-08
Renewal Date:

Applicants

WIPRO LIMITED
Doddakannelli, Sarjapur Road, Bangalore 560035, Karnataka, India.

Inventors

1. SHAUN CYPRIAN D'SOUZA
13 Udayan CHS, Sector 9A, Plot 35 Vashi, Navi Mumbai 400703, Maharashtra, India.

Specification

Claims: WE CLAIM
1. A method for extracting subject-verb-object (SVO) chunked text from an unstructured text, the method comprising:
identifying, by a SVO chunked text extraction engine, a plurality of part of speech (PoS) tokens in the unstructured text; and
determining, by the SVO chunked text extraction engine, a plurality of SVO chunked text directly from the plurality of PoS tokens using a machine learning chunker model, wherein the machine learning chunker model is trained on a subject-verb-object (SVO) annotated training data.

2. The method of claim 1, wherein identifying the plurality of PoS tokens comprises:
extracting a plurality of tokens from the input text, wherein each of the plurality of tokens comprises a word or a phrase; and
determining a PoS tag for each of the plurality of tokens.
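
As an illustration of the two steps in claim 2, the following is a minimal Python sketch, not the applicant's implementation. The lexicon `TOY_LEXICON`, the function names `tokenize` and `pos_tag`, and the Penn-Treebank-style tag names are assumptions made for the example; a practical system would presumably use a trained PoS tagger in place of the lookup table.

```python
import re

# Hypothetical toy lexicon standing in for a trained PoS tagger.
TOY_LEXICON = {
    "the": "DT", "a": "DT",
    "dog": "NN", "cat": "NN", "ball": "NN",
    "chased": "VBD", "saw": "VBD",
}

def tokenize(text):
    """Split text into word tokens and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def pos_tag(tokens):
    """Pair each token with a PoS tag; unknown words default to NN."""
    tagged = []
    for tok in tokens:
        if not tok[0].isalnum():
            tag = "."  # treat punctuation as its own class
        else:
            tag = TOY_LEXICON.get(tok.lower(), "NN")
        tagged.append((tok, tag))
    return tagged

pos_tokens = pos_tag(tokenize("The dog chased the ball."))
```

Each element of `pos_tokens` is a (token, PoS tag) pair, which is the kind of input the chunker in claim 1 is described as consuming.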

3. The method of claim 1, wherein each of the plurality of SVO chunked text is a set of semantically related PoS tokens.

4. The method of claim 1, wherein each of the plurality of SVO chunked text comprises a verb phrase and at least two of a subject phrase, an object phrase, and an objectsubject phrase, and wherein the objectsubject phrase corresponds to an overlapping contiguous chunk that is an object phrase in an initial part of a sentence and a subject phrase in the subsequent part of the sentence.

5. The method of claim 1, wherein the SVO annotated training data comprises a plurality of tokens, a plurality of corresponding PoS tags, and a plurality of corresponding SVO tags, and wherein the plurality of corresponding SVO tags comprises a subject tag, a verb tag, an object tag, and an objectsubject tag.

6. The method of claim 5, wherein the plurality of corresponding SVO tags is in beginning-inside-other (BIO) format.
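
To make the tag format of claims 5 and 6 concrete, here is a hedged Python sketch that groups BIO-tagged tokens into SVO chunks. The tag spellings (`B-SUBJ`, `I-OBJSUBJ`, `O`, and so on) and the function name `bio_to_chunks` are assumptions for illustration; the claims do not fix a particular spelling.

```python
def bio_to_chunks(tokens, tags):
    """Group tokens into (label, phrase) chunks from BIO-style tags."""
    chunks = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, name = tag.partition("-")
        # A B- tag, or an I- tag with a different label, opens a new chunk.
        starts_new = prefix == "B" or (prefix == "I" and name != label)
        if prefix == "O" or starts_new:
            if words:
                chunks.append((label, " ".join(words)))
            label, words = (name, [token]) if prefix != "O" else (None, [])
        else:
            # I- tag continuing the current chunk.
            words.append(token)
    if words:
        chunks.append((label, " ".join(words)))
    return chunks

tokens = ["I", "saw", "the", "dog", "chase", "the", "cat"]
tags = ["B-SUBJ", "B-VERB", "B-OBJSUBJ", "I-OBJSUBJ", "B-VERB", "B-OBJ", "I-OBJ"]
chunks = bio_to_chunks(tokens, tags)
```

In this example sentence, "the dog" is the object of "saw" and the subject of "chase", so it carries the objectsubject label described in claim 4.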

7. The method of claim 5, wherein the SVO annotated training data is generated based on a plurality of corresponding span information for the plurality of tokens by:
for each of a plurality of PoS tokens in each of a plurality of sets of syntactically related PoS tokens in a sentence,
detecting a span information for a PoS token; and
tagging the PoS token as a subject, a verb, an object, or an objectsubject based on the span information and a previous tagging of the PoS token.
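
One possible reading of the tagging step in claim 7 is sketched below in Python. It assumes that span information is given as half-open token index ranges per role, and that a token already tagged as an object becomes an objectsubject when a later set marks it as a subject. Both the data layout and the function name `tag_from_spans` are hypothetical; the claim itself does not specify them.

```python
def tag_from_spans(num_tokens, svo_sets):
    """Assign one SVO tag per token from per-role (start, end) spans.

    svo_sets is a list of dicts mapping a role name to a half-open
    token index range, one dict per set of related tokens.
    """
    tags = [None] * num_tokens
    for svo in svo_sets:
        for role, (start, end) in svo.items():
            for i in range(start, end):
                previous = tags[i]
                if previous == "object" and role == "subject":
                    # Object of an earlier set reused as a subject.
                    tags[i] = "objectsubject"
                elif previous is None:
                    tags[i] = role
                # Otherwise keep the earlier tag unchanged.
    return tags

# "I saw the dog chase the cat": two overlapping SVO sets.
svo_sets = [
    {"subject": (0, 1), "verb": (1, 2), "object": (2, 4)},
    {"subject": (2, 4), "verb": (4, 5), "object": (5, 7)},
]
svo_tags = tag_from_spans(7, svo_sets)
```

Tokens 2 and 3 ("the dog") fall in the object span of the first set and the subject span of the second, which is why the previous tagging matters in this sketch.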

8. The method of claim 1, wherein the machine learning chunker model is trained on a non-overlapping SVO annotated training data comprising one set of subject, verb, and object in each of the sentences.

9. The method of claim 1, wherein the machine learning chunker model is trained on an overlapping SVO annotated training data comprising one or more sets of subject, verb, object, and objectsubject in each of the sentences.

10. The method of claim 1, wherein the machine learning chunker model determines the plurality of SVO chunked text directly from the plurality of PoS tokens without a set of heuristics or without a set of rules.
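
The idea of learning the mapping from PoS tokens to SVO tags from annotated data, with no hand-written rules, can be illustrated with a deliberately simple frequency-count tagger in Python. This is a stand-in chosen for brevity and is not the machine learning chunker model of the claims; the functions `train` and `predict`, the tag spellings, and the two training sentences are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def train(sentences):
    """Count SVO tags per (previous SVO tag, PoS tag) context."""
    counts = defaultdict(Counter)
    for sent in sentences:
        prev = "<s>"
        for _token, pos, svo in sent:
            counts[(prev, pos)][svo] += 1
            counts[pos][svo] += 1  # back-off counts keyed on PoS alone
            prev = svo
    return counts

def predict(counts, pos_tags):
    """Greedily pick the most frequent SVO tag for each PoS tag."""
    tags, prev = [], "<s>"
    for pos in pos_tags:
        options = counts.get((prev, pos)) or counts.get(pos)
        tag = options.most_common(1)[0][0] if options else "O"
        tags.append(tag)
        prev = tag
    return tags

# Hypothetical annotated rows: (token, PoS tag, SVO tag in BIO format).
training = [
    [("The", "DT", "B-SUBJ"), ("dog", "NN", "I-SUBJ"), ("chased", "VBD", "B-VERB"),
     ("the", "DT", "B-OBJ"), ("ball", "NN", "I-OBJ")],
    [("A", "DT", "B-SUBJ"), ("cat", "NN", "I-SUBJ"), ("saw", "VBD", "B-VERB"),
     ("a", "DT", "B-OBJ"), ("bird", "NN", "I-OBJ")],
]
model = train(training)
predicted = predict(model, ["DT", "NN", "VBD", "DT", "NN"])
```

All tagging behavior here comes from counts gathered over the annotated sentences; no rule about subjects, verbs, or objects is written by hand.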

11. A system for extracting subject-verb-object (SVO) chunked text from an unstructured text, the system comprising:
at least one processor; and
a computer-readable medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
identifying a plurality of part of speech (PoS) tokens in the unstructured text; and
determining a plurality of SVO chunked text directly from the plurality of PoS tokens using a machine learning chunker model, wherein the machine learning chunker model is trained on a subject-verb-object (SVO) annotated training data.

12. The system of claim 11, wherein each of the plurality of SVO chunked text is a set of semantically related PoS tokens, wherein each of the plurality of SVO chunked text comprises a verb phrase and at least two of a subject phrase, an object phrase, and an objectsubject phrase, and wherein the objectsubject phrase corresponds to an overlapping contiguous chunk that is an object phrase in an initial part of a sentence and a subject phrase in the subsequent part of the sentence.

13. The system of claim 11, wherein the SVO annotated training data comprises a plurality of tokens, a plurality of corresponding PoS tags, and a plurality of corresponding SVO tags, and wherein the plurality of corresponding SVO tags comprises a subject tag, a verb tag, an object tag, and an objectsubject tag.

14. The system of claim 13, wherein the SVO annotated training data is generated based on a plurality of corresponding span information for the plurality of tokens by:
for each of a plurality of PoS tokens in each of a plurality of sets of syntactically related PoS tokens in a sentence,
detecting a span information for a PoS token; and
tagging the PoS token as a subject, a verb, an object, or an objectsubject based on the span information and a previous tagging of the PoS token.

15. The system of claim 11, wherein the machine learning chunker model is trained on at least one of:
a non-overlapping SVO annotated training data comprising one set of subject, verb, and object in each of the sentences, and
an overlapping SVO annotated training data comprising one or more sets of subject, verb, object, and objectsubject in each of the sentences.

Dated this 15th day of February 2017

R Ramya Rao
Of K&S Partners
Agent for the Applicant
Description: TECHNICAL FIELD
This disclosure relates generally to natural language processing, and more particularly to a system and method for extracting subject-verb-object (SVO) chunked text from an unstructured text.

Documents

Application Documents

# Name Date
1 Power of Attorney [15-02-2017(online)].pdf 2017-02-15
2 Form 5 [15-02-2017(online)].pdf 2017-02-15
3 Form 3 [15-02-2017(online)].pdf 2017-02-15
4 Form 18 [15-02-2017(online)].pdf_119.pdf 2017-02-15
5 Form 18 [15-02-2017(online)].pdf 2017-02-15
6 Drawing [15-02-2017(online)].pdf 2017-02-15
7 Description(Complete) [15-02-2017(online)].pdf_118.pdf 2017-02-15
8 Description(Complete) [15-02-2017(online)].pdf 2017-02-15
9 REQUEST FOR CERTIFIED COPY [22-02-2017(online)].pdf 2017-02-22
10 PROOF OF RIGHT [29-05-2017(online)].pdf 2017-05-29
11 Correspondence by Agent_Form 1_31-05-2017.pdf 2017-05-31
12 Abstract_201741005343.jpg 2017-06-02
13 201741005343-PETITION UNDER RULE 137 [11-02-2021(online)].pdf 2021-02-11
14 201741005343-FORM 3 [11-02-2021(online)].pdf 2021-02-11
15 201741005343-FER_SER_REPLY [11-02-2021(online)].pdf 2021-02-11
16 201741005343-FER.pdf 2021-10-17
17 201741005343-US(14)-HearingNotice-(HearingDate-09-11-2023).pdf 2023-10-23
18 201741005343-POA [30-10-2023(online)].pdf 2023-10-30
19 201741005343-FORM 13 [30-10-2023(online)].pdf 2023-10-30
20 201741005343-Correspondence to notify the Controller [30-10-2023(online)].pdf 2023-10-30
21 201741005343-AMENDED DOCUMENTS [30-10-2023(online)].pdf 2023-10-30
22 201741005343-Written submissions and relevant documents [24-11-2023(online)].pdf 2023-11-24
23 201741005343-FORM 3 [24-11-2023(online)].pdf 2023-11-24
24 201741005343-PatentCertificate08-12-2023.pdf 2023-12-08
25 201741005343-IntimationOfGrant08-12-2023.pdf 2023-12-08

Search Strategy

1 201741005343_searchE_24-08-2020.pdf

ERegister / Renewals

3rd: 07 Mar 2024, from 15/02/2019 to 15/02/2020
4th: 07 Mar 2024, from 15/02/2020 to 15/02/2021
5th: 07 Mar 2024, from 15/02/2021 to 15/02/2022
6th: 07 Mar 2024, from 15/02/2022 to 15/02/2023
7th: 07 Mar 2024, from 15/02/2023 to 15/02/2024
8th: 07 Mar 2024, from 15/02/2024 to 15/02/2025
9th: 15 Feb 2025, from 15/02/2025 to 15/02/2026