Abstract: This disclosure relates generally to natural language processing, and more particularly to a system and method for generating named entities. In one embodiment, a method is provided for generating named entities. The method includes extracting a plurality of named entities in a primary language from a plurality of digital content in the primary language; transliterating each of the plurality of named entities in the primary language to a set of possible named entities in a secondary language; determining a correct named entity in the secondary language from among the set of possible named entities in the secondary language; and generating a named entity in a subsequent secondary language corresponding to the correct named entity in the secondary language. The plurality of named entities in the primary language are named entities in the subsequent secondary language, and the subsequent secondary language is related to the secondary language. Figure 3
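The four stages of the abstract can be sketched as follows. This is a minimal, non-authoritative illustration, and each claimed model is replaced by a toy stand-in:

- **Extraction:** a gazetteer lookup replaces the trained extraction model.
- **Transliteration:** a caller-supplied candidate list replaces the transliteration frameworks.
- **Selection:** a character-bigram overlap score replaces the LSTM confidence score.
- **Subsequent-language generation:** a Unicode block offset replaces the multi-lingual character level tree model. This works because Devanagari and several other Indic scripts share a parallel, ISCII-derived layout.

The function names `extract_entities`, `bigram_scorer`, `shift_script` and `generate_named_entities` are hypothetical. Gujarati is assumed as the subsequent secondary language.

```python
import unicodedata
from typing import Callable, Dict, List, Set

# All names below are hypothetical; each function is a toy stand-in for a claimed model.


def extract_entities(text: str, gazetteer: Set[str]) -> List[str]:
    """Stage 1 stand-in: keep words found in a known-entity list (not a trained NER model)."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w in gazetteer]


def bigram_scorer(lexicon: List[str]) -> Callable[[str], float]:
    """Stage 3 stand-in: confidence = share of a candidate's character bigrams seen in a lexicon."""
    known = {w[i:i + 2] for w in lexicon for i in range(len(w) - 1)}

    def score(candidate: str) -> float:
        grams = [candidate[i:i + 2] for i in range(len(candidate) - 1)]
        return sum(g in known for g in grams) / max(len(grams), 1)

    return score


# Start of each Unicode block; these Indic blocks share a parallel (ISCII-derived) layout.
DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F
BLOCK_START: Dict[str, int] = {"bengali": 0x0980, "gujarati": 0x0A80}


def shift_script(word: str, target: str) -> str:
    """Stage 4 stand-in: move Devanagari code points to the parallel block of a related script."""
    offset = BLOCK_START[target] - DEVANAGARI_START
    out = []
    for ch in word:
        if not DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END:
            out.append(ch)  # leave non-Devanagari characters untouched
            continue
        shifted = chr(ord(ch) + offset)
        if unicodedata.name(shifted, None) is None:  # unassigned slot in the target block
            raise ValueError(f"{ch!r} has no counterpart in {target}")
        out.append(shifted)
    return "".join(out)


def generate_named_entities(
    text: str,
    gazetteer: Set[str],
    candidates_for: Callable[[str], List[str]],
    score: Callable[[str], float],
    target: str,
) -> Dict[str, str]:
    """Run extract -> transliterate -> select by confidence -> generate in a related script."""
    results = {}
    for entity in extract_entities(text, gazetteer):
        hindi = max(candidates_for(entity), key=score)  # highest-confidence candidate
        results[entity] = shift_script(hindi, target)
    return results


if __name__ == "__main__":
    toy_candidates = {"ram": ["रम", "राम"]}  # stage 2 stand-in: fixed candidate list
    scorer = bigram_scorer(["रामलाल", "सीताराम"])
    print(generate_named_entities("Ram went home.", {"ram"}, toy_candidates.__getitem__, scorer, "gujarati"))
```

Run as a script, the example selects राम over रम, because both of its bigrams occur in the toy lexicon. It then prints the Gujarati form રામ for the entity `ram`. The block-offset shortcut only holds between scripts with aligned Unicode blocks, and it fails on letters the target script lacks. A learned model is plausibly needed for the general case.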
Claims: WE CLAIM
1. A method of generating named entities, the method comprising:
extracting, by a named entity generation engine, a plurality of named entities in a primary language from a plurality of digital content in the primary language;
transliterating, by the named entity generation engine, each of the plurality of named entities in the primary language to a set of possible named entities in a secondary language;
determining, by the named entity generation engine, a correct named entity in the secondary language from among the set of possible named entities in the secondary language; and
generating, by the named entity generation engine, a named entity in a subsequent secondary language corresponding to the correct named entity in the secondary language, wherein the plurality of named entities in the primary language are named entities in the subsequent secondary language, and wherein the subsequent secondary language is related to the secondary language.
2. The method of claim 1, wherein the plurality of named entities in the primary language is extracted using a named entity extraction model.
3. The method of claim 2, wherein the named entity extraction model is trained by manually tagging an initial plurality of named entities in the primary language in an initial plurality of digital content in the primary language.
4. The method of claim 3, wherein the initial plurality of digital content in the primary language are curated from across a plurality of genres, and are relevant to a population versant with the subsequent secondary language.
5. The method of claim 1, wherein each of the plurality of named entities comprises at least one of a person, a place, and an organization.
6. The method of claim 1, wherein each of the plurality of named entities in the primary language is transliterated to the set of possible named entities in the secondary language using a plurality of predefined transliteration frameworks, and wherein each of the plurality of predefined transliteration frameworks comprises a plurality of predefined mapping tables in Unicode values.
7. The method of claim 6, wherein each of the plurality of predefined transliteration frameworks comprises at least one of a Harvard-Kyoto transliteration framework, an ISO 15919 transliteration framework, and a simplified customized standard transliteration framework, and wherein each of the plurality of predefined mapping tables comprises at least one of a vowels mapping table, a consonants mapping table, and a matras mapping table.
8. The method of claim 1, wherein the transliterating further comprises:
retrieving a sequence of symbols for the primary language for each of the plurality of named entities in the primary language;
retrieving a sequence of symbols for the secondary language corresponding to the sequence of symbols for the primary language; and
combining the sequence of symbols in the secondary language to generate the set of possible named entities in the secondary language.
9. The method of claim 1, wherein the correct named entity in the secondary language is determined from among the set of possible named entities in the secondary language using a long short-term memory (LSTM) model based on a confidence score.
10. The method of claim 9, wherein the LSTM model is trained with a large dataset comprising a plurality of named entities in the secondary language and a plurality of corresponding transliterated named entities in the primary language.
11. The method of claim 1, wherein the named entity in the subsequent secondary language corresponding to the correct named entity in the secondary language is generated using a multi-lingual character level tree model.
12. The method of claim 1, wherein the primary language is English, the secondary language is Hindi, and the subsequent secondary language is one of a plurality of Indian languages.
13. A system for generating named entities, the system comprising:
at least one processor; and
a computer-readable medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
extracting a plurality of named entities in a primary language from a plurality of digital content in the primary language;
transliterating each of the plurality of named entities in the primary language to a set of possible named entities in a secondary language;
determining a correct named entity in the secondary language from among the set of possible named entities in the secondary language; and
generating a named entity in a subsequent secondary language corresponding to the correct named entity in the secondary language, wherein the plurality of named entities in the primary language are named entities in the subsequent secondary language, and wherein the subsequent secondary language is related to the secondary language.
14. The system of claim 13, wherein the plurality of named entities in the primary language is extracted using a named entity extraction model.
15. The system of claim 14, wherein the named entity extraction model is trained by manually tagging an initial plurality of named entities in the primary language in an initial plurality of digital content in the primary language, and wherein the initial plurality of digital content in the primary language are curated from across a plurality of genres and are relevant to a population versant with the subsequent secondary language.
16. The system of claim 13, wherein each of the plurality of named entities in the primary language is transliterated to the set of possible named entities in the secondary language using a plurality of predefined transliteration frameworks, and wherein each of the plurality of predefined transliteration frameworks comprises a plurality of predefined mapping tables in Unicode values.
17. The system of claim 16, wherein each of the plurality of predefined transliteration frameworks comprises at least one of a Harvard-Kyoto transliteration framework, an ISO 15919 transliteration framework, and a simplified customized standard transliteration framework, and wherein each of the plurality of predefined mapping tables comprises at least one of a vowels mapping table, a consonants mapping table, and a matras mapping table.
18. The system of claim 13, wherein the transliterating further comprises:
retrieving a sequence of symbols for the primary language for each of the plurality of named entities in the primary language;
retrieving a sequence of symbols for the secondary language corresponding to the sequence of symbols for the primary language; and
combining the sequence of symbols in the secondary language to generate the set of possible named entities in the secondary language.
19. The system of claim 13, wherein the correct named entity in the secondary language is determined from among the set of possible named entities in the secondary language using a long short-term memory (LSTM) model based on a confidence score, and wherein the LSTM model is trained with a large dataset comprising a plurality of named entities in the secondary language and a plurality of corresponding transliterated named entities in the primary language.
20. The system of claim 13, wherein the named entity in the subsequent secondary language corresponding to the correct named entity in the secondary language is generated using a multi-lingual character level tree model.
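Claims 6 to 8 and 16 to 18 describe transliteration in three steps: look up each primary-language symbol in Unicode mapping tables (vowels, consonants and matras), retrieve the possible secondary-language symbols, and combine them into a set of candidates. The sketch below illustrates only that idea, under loudly stated assumptions:

- The tables are tiny hypothetical subsets, not the Harvard-Kyoto or ISO 15919 frameworks.
- Romanized input is segmented greedily, longest key first.
- A consonant before another consonant may either keep its inherent vowel or take a virama.

The names `segment`, `symbol_options` and `transliterate` are illustrative only.

```python
from itertools import product
from typing import Dict, List

# Hypothetical, heavily truncated mapping tables (romanized key -> Devanagari options).
# They are NOT the Harvard-Kyoto or ISO 15919 tables. A key may map to several symbols
# because English spelling is ambiguous (e.g. "a" may stand for a short or long vowel).
VOWELS: Dict[str, List[str]] = {  # independent vowel letters
    "a": ["अ", "आ"], "aa": ["आ"], "i": ["इ", "ई"], "u": ["उ", "ऊ"], "e": ["ए"], "o": ["ओ"],
}
MATRAS: Dict[str, List[str]] = {  # dependent vowel signs; "" keeps the inherent vowel
    "a": ["", "\u093E"], "aa": ["\u093E"], "i": ["\u093F", "\u0940"],
    "u": ["\u0941", "\u0942"], "e": ["\u0947"], "o": ["\u094B"],
}
CONSONANTS: Dict[str, List[str]] = {
    "k": ["क"], "g": ["ग"], "ch": ["च"], "j": ["ज"], "t": ["त", "ट"], "d": ["द", "ड"],
    "n": ["न", "ण"], "p": ["प"], "b": ["ब"], "m": ["म"], "y": ["य"], "r": ["र"],
    "l": ["ल"], "v": ["व"], "s": ["स", "श"], "sh": ["श", "ष"], "h": ["ह"],
}
VIRAMA = "\u094D"  # removes the inherent vowel so the next consonant forms a conjunct
KEYS = VOWELS.keys() | CONSONANTS.keys()


def segment(word: str) -> List[str]:
    """Split a romanized word into symbol keys, preferring two-letter keys (greedy)."""
    tokens, i = [], 0
    while i < len(word):
        for size in (2, 1):
            piece = word[i:i + size]
            if piece in KEYS:
                tokens.append(piece)
                i += len(piece)
                break
        else:
            raise ValueError(f"no mapping for {word[i]!r}")
    return tokens


def symbol_options(tokens: List[str]) -> List[List[str]]:
    """Retrieve the possible Devanagari symbols for each romanized symbol."""
    options, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in CONSONANTS and nxt in MATRAS:  # consonant + vowel sign
            options.append([c + m for c in CONSONANTS[tok] for m in MATRAS[nxt]])
            i += 2
            continue
        if tok in CONSONANTS and nxt is None:  # word-final: bare letter (final schwa dropped)
            options.append(list(CONSONANTS[tok]))
        elif tok in CONSONANTS:  # before a consonant: inherent vowel or conjunct
            options.append([c + v for c in CONSONANTS[tok] for v in ("", VIRAMA)])
        else:  # vowel not preceded by a consonant: independent letter
            options.append(list(VOWELS[tok]))
        i += 1
    return options


def transliterate(word: str) -> List[str]:
    """Combine every choice of symbols into the set of possible secondary-language names."""
    options = symbol_options(segment(word.lower()))
    return sorted({"".join(choice) for choice in product(*options)})
```

For example, `transliterate("ram")` returns `['रम', 'राम']`, and choosing between such candidates is the role of the claimed LSTM confidence score. The candidate set grows multiplicatively with each ambiguous symbol: `"sita"` already yields 16 candidates. This illustrates why a separate selection step is needed.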
Dated this 18th day of March 2017
SWETHA SN
OF K&S PARTNERS
AGENT FOR THE APPLICANT
IN/PA-2123
Description: TECHNICAL FIELD
This disclosure relates generally to natural language processing, and more particularly to a method and system for generating named entities.
| # | Name | Date |
|---|---|---|
| 1 | Power of Attorney [18-05-2017(online)].pdf | 2017-05-18 |
| 2 | Form 5 [18-05-2017(online)].pdf | 2017-05-18 |
| 3 | Form 3 [18-05-2017(online)].pdf | 2017-05-18 |
| 4 | Form 18 [18-05-2017(online)].pdf_168.pdf | 2017-05-18 |
| 5 | Form 18 [18-05-2017(online)].pdf | 2017-05-18 |
| 6 | Form 1 [18-05-2017(online)].pdf | 2017-05-18 |
| 7 | Drawing [18-05-2017(online)].pdf | 2017-05-18 |
| 8 | Description(Complete) [18-05-2017(online)].pdf_169.pdf | 2017-05-18 |
| 9 | Description(Complete) [18-05-2017(online)].pdf | 2017-05-18 |
| 10 | REQUEST FOR CERTIFIED COPY [19-05-2017(online)].pdf | 2017-05-19 |
| 11 | PROOF OF RIGHT [13-07-2017(online)].pdf | 2017-07-13 |
| 12 | Correspondence by Agent_Form 1_17-07-2017.pdf | 2017-07-17 |
| 13 | abstract 201741017539.jpg | 2017-07-20 |
| 14 | 201741017539-Proof of Right (MANDATORY) [15-09-2017(online)].pdf | 2017-09-15 |
| 15 | Correspondence by Agent_Form30,Form1_19-09-2017.pdf | 2017-09-19 |
| 16 | 201741017539-PETITION UNDER RULE 137 [03-05-2021(online)].pdf | 2021-05-03 |
| 17 | 201741017539-OTHERS [03-05-2021(online)].pdf | 2021-05-03 |
| 18 | 201741017539-FORM 3 [03-05-2021(online)].pdf | 2021-05-03 |
| 19 | 201741017539-FER_SER_REPLY [03-05-2021(online)].pdf | 2021-05-03 |
| 20 | 201741017539-DRAWING [03-05-2021(online)].pdf | 2021-05-03 |
| 21 | 201741017539-COMPLETE SPECIFICATION [03-05-2021(online)].pdf | 2021-05-03 |
| 22 | 201741017539-CLAIMS [03-05-2021(online)].pdf | 2021-05-03 |
| 23 | 201741017539-FER.pdf | 2021-10-17 |
| 24 | 201741017539-PatentCertificate22-12-2023.pdf | 2023-12-22 |
| 25 | 201741017539-IntimationOfGrant22-12-2023.pdf | 2023-12-22 |
| 26 | 201741017539-PROOF OF ALTERATION [18-03-2024(online)].pdf | 2024-03-18 |

| # | Name |
|---|---|
| 1 | search_strategyE_25-11-2020.pdf |