Abstract: This disclosure relates generally to natural language processing, and more particularly to a system and method for extracting subject-verb-object (SVO) chunked text from an unstructured text. In one embodiment, a method is provided for extracting SVO chunked text from an unstructured text. The method comprises identifying a plurality of part of speech (PoS) tokens in the unstructured text, and determining a plurality of SVO chunked text directly from the plurality of PoS tokens using a machine learning chunker model. The machine learning chunker model is trained on a subject-verb-object (SVO) annotated training data. Figure 3
Claims: WE CLAIM
1. A method for extracting subject-verb-object (SVO) chunked text from an unstructured text, the method comprising:
identifying, by a SVO chunked text extraction engine, a plurality of part of speech (PoS) tokens in the unstructured text; and
determining, by the SVO chunked text extraction engine, a plurality of SVO chunked text directly from the plurality of PoS tokens using a machine learning chunker model, wherein the machine learning chunker model is trained on a subject-verb-object (SVO) annotated training data.
2. The method of claim 1, wherein identifying the plurality of PoS tokens comprises:
extracting a plurality of tokens from the input text, wherein each of the plurality of tokens comprises a word or a phrase; and
determining a PoS tag for each of the plurality of tokens.
3. The method of claim 1, wherein each of the plurality of SVO chunked text is a set of semantically related PoS tokens.
4. The method of claim 1, wherein each of the plurality of SVO chunked text comprises a verb phrase and at least two of a subject phrase, an object phrase, and an objectsubject phrase, and wherein the objectsubject phrase corresponds to an overlapping contiguous chunk that is an object phrase in an initial part of a sentence and a subject phrase in the subsequent part of the sentence.
5. The method of claim 1, wherein the SVO annotated training data comprises a plurality of tokens, a plurality of corresponding PoS tags, and a plurality of corresponding SVO tags, and wherein the plurality of corresponding SVO tags comprises a subject tag, a verb tag, an object tag, and an objectsubject tag.
6. The method of claim 5, wherein the plurality of corresponding SVO tags is in beginning-inside-other (BIO) format.
7. The method of claim 5, wherein the SVO annotated training data is generated based on a plurality of corresponding span information for the plurality of tokens by:
for each of a plurality of PoS tokens in each of a plurality of sets of syntactically related PoS tokens in a sentence,
detecting a span information for a PoS token; and
tagging the PoS token as a subject, a verb, an object, or an objectsubject based on the span information and a previous tagging of the PoS token.
8. The method of claim 1, wherein the machine learning chunker model is trained on a non-overlapping SVO annotated training data comprising one set of subject, verb, and object in each of the sentences.
9. The method of claim 1, wherein the machine learning chunker model is trained on an overlapping SVO annotated training data comprising one or more sets of subject, verb, object, and objectsubject in each of the sentences.
10. The method of claim 1, wherein the machine learning chunker model determines the plurality of SVO chunked text directly from the plurality of PoS tokens without a set of heuristics or without a set of rules.
11. A system for extracting subject-verb-object (SVO) chunked text from an unstructured text, the system comprising:
at least one processor; and
a computer-readable medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
identifying a plurality of part of speech (PoS) tokens in the unstructured text; and
determining a plurality of SVO chunked text directly from the plurality of PoS tokens using a machine learning chunker model, wherein the machine learning chunker model is trained on a subject-verb-object (SVO) annotated training data.
12. The system of claim 11, wherein each of the plurality of SVO chunked text is a set of semantically related PoS tokens, wherein each of the plurality of SVO chunked text comprises a verb phrase and at least two of a subject phrase, an object phrase, and an objectsubject phrase, and wherein the objectsubject phrase corresponds to an overlapping contiguous chunk that is an object phrase in an initial part of a sentence and a subject phrase in the subsequent part of the sentence.
13. The system of claim 11, wherein the SVO annotated training data comprises a plurality of tokens, a plurality of corresponding PoS tags, and a plurality of corresponding SVO tags, and wherein the plurality of corresponding SVO tags comprises a subject tag, a verb tag, an object tag, and an objectsubject tag.
14. The system of claim 13, wherein the SVO annotated training data is generated based on a plurality of corresponding span information for the plurality of tokens by:
for each of a plurality of PoS tokens in each of a plurality of sets of syntactically related PoS tokens in a sentence,
detecting a span information for a PoS token; and
tagging the PoS token as a subject, a verb, an object, or an objectsubject based on the span information and a previous tagging of the PoS token.
15. The system of claim 11, wherein the machine learning chunker model is trained on at least one of:
a non-overlapping SVO annotated training data comprising one set of subject, verb, and object in each of the sentences, and
an overlapping SVO annotated training data comprising one or more sets of subject, verb, object, and objectsubject in each of the sentences.
Dated this 15th day of February 2017
R Ramya Rao
Of K&S Partners
Agent for the Applicant
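By way of a non-limiting illustration, the span-based annotation recited in claims 7 and 14, together with the BIO format of claim 6, may be sketched as follows. The span representation (start index, exclusive end index, role), the tag names SUBJ, VERB, OBJ and OBJSUBJ, and the function name svo_bio_tags are assumptions made for this sketch and are not taken from the disclosure.

```python
from typing import List, Optional, Tuple

# Assumed span format: (start, end_exclusive, role), role in {"SUBJ", "VERB", "OBJ"}.
Span = Tuple[int, int, str]


def svo_bio_tags(n_tokens: int, spans: List[Span]) -> List[str]:
    """Tag token positions from role spans, then emit BIO-style labels."""
    roles: List[Optional[str]] = [None] * n_tokens
    for start, end, role in spans:
        for i in range(start, end):
            previous = roles[i]
            if previous is None:
                roles[i] = role
            elif {previous, role} == {"OBJ", "SUBJ"}:
                # The token was already tagged as the object of one set and is
                # now the subject of another (or vice versa): objectsubject.
                roles[i] = "OBJSUBJ"
            # Any other conflict keeps the previous tag (a choice of this sketch).

    tags: List[str] = []
    for i, role in enumerate(roles):
        if role is None:
            tags.append("O")  # "other": outside every SVO chunk
        elif i == 0 or roles[i - 1] != role:
            tags.append("B-" + role)  # first token of a chunk
        else:
            tags.append("I-" + role)  # continuation of the current chunk
    return tags


tokens = "The dog chased the cat that ate the mouse".split()
spans = [(0, 2, "SUBJ"), (2, 3, "VERB"), (3, 5, "OBJ"),   # first SVO set
         (3, 5, "SUBJ"), (6, 7, "VERB"), (7, 9, "OBJ")]   # second SVO set
for token, tag in zip(tokens, svo_bio_tags(len(tokens), spans)):
    print(token, tag)
```

In this example "the cat" is the object of "chased" and the subject of "ate", so both of its tokens receive the objectsubject tag, while "that" falls outside every chunk and is tagged O.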
Description: TECHNICAL FIELD
This disclosure relates generally to natural language processing, and more particularly to a system and method for extracting subject-verb-object (SVO) chunked text from an unstructured text.
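As a minimal sketch of the kind of extraction this disclosure concerns, the example below learns a chunker from SVO-annotated PoS tokens and then maps PoS tokens directly to SVO chunks without hand-written rules. The disclosure does not specify the model type; the count-based greedy tagger used here is a stand-in chosen for brevity. The Penn Treebank style PoS tags, the tag names SUBJ, VERB and OBJ, and the function names train, predict and chunks are likewise assumptions of this sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# One annotated token: (token, PoS tag, SVO tag in BIO form).
Sentence = List[Tuple[str, str, str]]
Model = Dict[Tuple[Optional[str], str], str]


def train(sentences: List[Sentence]) -> Model:
    """Learn the most frequent SVO tag for each (previous SVO tag, PoS tag)."""
    counts: Dict[Tuple[Optional[str], str], Counter] = defaultdict(Counter)
    for sentence in sentences:
        prev = "<s>"  # sentence-start marker
        for _token, pos, svo in sentence:
            counts[(prev, pos)][svo] += 1
            counts[(None, pos)][svo] += 1  # backoff: PoS tag alone
            prev = svo
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def predict(model: Model, pos_tags: List[str]) -> List[str]:
    """Greedily tag a PoS sequence, left to right."""
    out: List[str] = []
    prev = "<s>"
    for pos in pos_tags:
        tag = model.get((prev, pos)) or model.get((None, pos), "O")
        out.append(tag)
        prev = tag
    return out


def chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (role, text) chunks."""
    result: List[Tuple[str, str]] = []
    role_now: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        role = None if tag == "O" else tag[2:]
        if role is None or tag.startswith("B-") or role != role_now:
            if words:
                result.append((role_now, " ".join(words)))
            role_now, words = role, ([token] if role else [])
        else:
            words.append(token)
    if words:
        result.append((role_now, " ".join(words)))
    return result


training = [
    [("The", "DT", "B-SUBJ"), ("dog", "NN", "I-SUBJ"), ("chased", "VBD", "B-VERB"),
     ("the", "DT", "B-OBJ"), ("cat", "NN", "I-OBJ")],
    [("A", "DT", "B-SUBJ"), ("bird", "NN", "I-SUBJ"), ("ate", "VBD", "B-VERB"),
     ("a", "DT", "B-OBJ"), ("worm", "NN", "I-OBJ")],
]
model = train(training)
tokens = ["The", "boy", "kicked", "a", "ball"]
tags = predict(model, ["DT", "NN", "VBD", "DT", "NN"])
print(chunks(tokens, tags))
# [('SUBJ', 'The boy'), ('VERB', 'kicked'), ('OBJ', 'a ball')]
```

Conditioning on the previously predicted SVO tag, rather than on the PoS tag alone, lets the toy model tell a subject noun phrase from an object noun phrase even though both share the same PoS pattern. A practical chunker would use a richer sequence model and far more annotated data.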
| # | Name | Date |
|---|---|---|
| 1 | Power of Attorney [15-02-2017(online)].pdf | 2017-02-15 |
| 2 | Form 5 [15-02-2017(online)].pdf | 2017-02-15 |
| 3 | Form 3 [15-02-2017(online)].pdf | 2017-02-15 |
| 4 | Form 18 [15-02-2017(online)].pdf_119.pdf | 2017-02-15 |
| 5 | Form 18 [15-02-2017(online)].pdf | 2017-02-15 |
| 6 | Drawing [15-02-2017(online)].pdf | 2017-02-15 |
| 7 | Description(Complete) [15-02-2017(online)].pdf_118.pdf | 2017-02-15 |
| 8 | Description(Complete) [15-02-2017(online)].pdf | 2017-02-15 |
| 9 | REQUEST FOR CERTIFIED COPY [22-02-2017(online)].pdf | 2017-02-22 |
| 10 | PROOF OF RIGHT [29-05-2017(online)].pdf | 2017-05-29 |
| 11 | Correspondence by Agent_Form 1_31-05-2017.pdf | 2017-05-31 |
| 12 | Abstract_201741005343.jpg | 2017-06-02 |
| 13 | 201741005343-PETITION UNDER RULE 137 [11-02-2021(online)].pdf | 2021-02-11 |
| 14 | 201741005343-FORM 3 [11-02-2021(online)].pdf | 2021-02-11 |
| 15 | 201741005343-FER_SER_REPLY [11-02-2021(online)].pdf | 2021-02-11 |
| 16 | 201741005343-FER.pdf | 2021-10-17 |
| 17 | 201741005343-US(14)-HearingNotice-(HearingDate-09-11-2023).pdf | 2023-10-23 |
| 18 | 201741005343-POA [30-10-2023(online)].pdf | 2023-10-30 |
| 19 | 201741005343-FORM 13 [30-10-2023(online)].pdf | 2023-10-30 |
| 20 | 201741005343-Correspondence to notify the Controller [30-10-2023(online)].pdf | 2023-10-30 |
| 21 | 201741005343-AMENDED DOCUMENTS [30-10-2023(online)].pdf | 2023-10-30 |
| 22 | 201741005343-Written submissions and relevant documents [24-11-2023(online)].pdf | 2023-11-24 |
| 23 | 201741005343-FORM 3 [24-11-2023(online)].pdf | 2023-11-24 |
| 24 | 201741005343-PatentCertificate08-12-2023.pdf | 2023-12-08 |
| 25 | 201741005343-IntimationOfGrant08-12-2023.pdf | 2023-12-08 |
| 1 | 201741005343_searchE_24-08-2020.pdf |