Method And System For Generating Named Entities

Abstract: This disclosure relates generally to natural language processing, and more particularly to a system and method for generating named entities. In one embodiment, a method is provided for generating named entities. The method includes extracting a plurality of named entities in a primary language from a plurality of digital content in the primary language, transliterating each of the plurality of named entities in the primary language to a set of possible named entities in a secondary language, determining a correct named entity in the secondary language from among the set of possible named entities in the secondary language, and generating a named entity in a subsequent secondary language corresponding to the correct named entity in the secondary language. It should be noted that the plurality of named entities in the primary language are named entities in the subsequent secondary language, and the subsequent secondary language is related to the secondary language. Figure 3

Patent Information

Application #

Filing Date

18 May 2017

Publication Number

47/2018

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Email

bangalore@knspartners.com

Parent Application

Patent Number

Legal Status

Grant Date

2023-12-22

Renewal Date

Applicants

WIPRO LIMITED

Doddakannelli, Sarjapur Road, Bangalore 560035, Karnataka, India.

Inventors

1. BALAJI JAGAN

32 Muthamil Nagar, Chinnalapatti, Dindigul District, Tamil Nadu – 624301, India

2. NAVEEN KUMAR NANJAPPA

59, Lakshmi Nilaya, Sonnappa Layout, Virupakshapura, Kodigehalli, Bengaluru-560097, Karnataka, India.

Specification

Claims: WE CLAIM
1. A method of generating named entities, the method comprising:
extracting, by a named entity generation engine, a plurality of named entities in a primary language from a plurality of digital content in the primary language;
transliterating, by the named entity generation engine, each of the plurality of named entities in the primary language to a set of possible named entities in a secondary language;
determining, by the named entity generation engine, a correct named entity in the secondary language from among the set of possible named entities in the secondary language; and
generating, by the named entity generation engine, a named entity in a subsequent secondary language corresponding to the correct named entity in the secondary language, wherein the plurality of named entities in the primary language are named entities in the subsequent secondary language, and wherein the subsequent secondary language is related to the secondary language.

2. The method of claim 1, wherein the plurality of named entities in the primary language is extracted using a named entity extraction model.

3. The method of claim 2, wherein the named entity extraction model is trained by manually tagging an initial plurality of named entities in the primary language in an initial plurality of digital content in the primary language.

4. The method of claim 3, wherein the initial plurality of digital content in the primary language are curated from across a plurality of genres, and are relevant to a population conversant with the subsequent secondary language.

5. The method of claim 1, wherein each of the plurality of named entities comprises at least one of a person, a place, and an organization.

6. The method of claim 1, wherein each of the plurality of named entities in the primary language is transliterated to the set of possible named entities in the secondary language using a plurality of predefined transliteration frameworks, and wherein each of the plurality of predefined transliteration frameworks comprises a plurality of pre-defined mapping tables in Unicode values.

7. The method of claim 6, wherein each of the plurality of predefined transliteration frameworks comprises at least one of a Harvard-Kyoto transliteration framework, an ISO 15919 transliteration framework, and a simplified customized standard transliteration framework, and wherein each of the plurality of pre-defined mapping tables comprises at least one of a vowels mapping table, a consonants mapping table, and a matras mapping table.

8. The method of claim 1, wherein transliteration further comprises:
retrieving a sequence of symbols for the primary language for each of the plurality of named entities in the primary language;
retrieving a sequence of symbols for the secondary language corresponding to the sequence of symbols for the primary language; and
combining the sequence of symbols in the secondary language to generate the set of possible named entities in the secondary language.

9. The method of claim 1, wherein the correct named entity in the secondary language is determined from among the set of possible named entities in the secondary language using a long short-term memory (LSTM) model based on a confidence score.

10. The method of claim 9, wherein the LSTM model is trained with a large dataset comprising a plurality of named entities of the secondary language and a plurality of corresponding transliterated named entities in the primary language.

11. The method of claim 1, wherein the named entity in the subsequent secondary language corresponding to the correct named entity in the secondary language is generated using a multi-lingual character level tree model.

12. The method of claim 1, wherein the primary language is English, wherein the secondary language is Hindi, and the subsequent secondary language is one of a plurality of Indian languages.

13. A system for generating named entities, the system comprising:
at least one processor; and
a computer-readable medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
extracting a plurality of named entities in a primary language from a plurality of digital content in the primary language;
transliterating each of the plurality of named entities in the primary language to a set of possible named entities in a secondary language;
determining a correct named entity in the secondary language from among the set of possible named entities in the secondary language; and
generating a named entity in a subsequent secondary language corresponding to the correct named entity in the secondary language, wherein the plurality of named entities in the primary language are named entities in the subsequent secondary language, and wherein the subsequent secondary language is related to the secondary language.

14. The system of claim 13, wherein the plurality of named entities in the primary language is extracted using a named entity extraction model.

15. The system of claim 14, wherein the named entity extraction model is trained by manually tagging an initial plurality of named entities in the primary language in an initial plurality of digital content in the primary language, and wherein the initial plurality of digital content in the primary language are curated from across a plurality of genres and are relevant to a population conversant with the subsequent secondary language.

16. The system of claim 13, wherein each of the plurality of named entities in the primary language is transliterated to the set of possible named entities in the secondary language using a plurality of predefined transliteration frameworks, and wherein each of the plurality of predefined transliteration frameworks comprises a plurality of pre-defined mapping tables in Unicode values.

17. The system of claim 16, wherein each of the plurality of predefined transliteration frameworks comprises at least one of a Harvard-Kyoto transliteration framework, an ISO 15919 transliteration framework, and a simplified customized standard transliteration framework, and wherein each of the plurality of pre-defined mapping tables comprises at least one of a vowels mapping table, a consonants mapping table, and a matras mapping table.

18. The system of claim 13, wherein transliteration further comprises:
retrieving a sequence of symbols for the primary language for each of the plurality of named entities in the primary language;
retrieving a sequence of symbols for the secondary language corresponding to the sequence of symbols for the primary language; and
combining the sequence of symbols in the secondary language to generate the set of possible named entities in the secondary language.

19. The system of claim 13, wherein the correct named entity in the secondary language is determined from among the set of possible named entities in the secondary language using a long short-term memory (LSTM) model based on a confidence score, and wherein the LSTM model is trained with a large dataset comprising a plurality of named entities of the secondary language and a plurality of corresponding transliterated named entities in the primary language.

20. The system of claim 13, wherein the named entity in the subsequent secondary language corresponding to the correct named entity in the secondary language is generated using a multi-lingual character level tree model.

Dated this 18th day of March 2017

SWETHA SN
OF K&S PARTNERS
AGENT FOR THE APPLICANT
IN/PA-2123
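
Illustrative sketch (not part of the filed specification): claims 6 to 9 describe looking up symbols in vowel, consonant, and matra mapping tables, combining them into a set of possible named entities, and picking one by confidence score. The Python below is a minimal sketch of that flow under stated assumptions. The three tables are tiny hypothetical examples, not the Harvard-Kyoto or ISO 15919 tables; the function names are invented for illustration; and `toy_confidence` is a lookup stand-in for the LSTM model, which this record does not detail.

```python
from itertools import product

# Hypothetical, heavily truncated mapping tables (Latin symbol -> Devanagari
# options, written as Unicode escapes). Real frameworks are far larger.
VIRAMA = "\u094d"  # joins two consonants with no vowel between them
CONSONANTS = {
    "k": ["\u0915"],            # KA
    "t": ["\u0924", "\u091f"],  # dental TA, retroflex TTA
    "n": ["\u0928"],            # NA
    "m": ["\u092e"],            # MA
    "r": ["\u0930"],            # RA
}
VOWELS = {  # independent vowels, used when no consonant precedes
    "a": ["\u0905", "\u0906"],  # A, AA
    "i": ["\u0907", "\u0908"],  # I, II
}
MATRAS = {  # dependent vowel signs, used right after a consonant
    "a": ["", "\u093e"],        # inherent a (no sign), sign AA
    "i": ["\u093f", "\u0940"],  # sign I, sign II
}


def candidate_slots(name):
    """Return, per input symbol, the list of possible Devanagari strings."""
    slots = []
    prev_consonant = False
    for ch in name.lower():
        if ch in CONSONANTS:
            if prev_consonant:
                slots.append([VIRAMA])
            slots.append(CONSONANTS[ch])
            prev_consonant = True
        elif ch in VOWELS:
            slots.append(MATRAS[ch] if prev_consonant else VOWELS[ch])
            prev_consonant = False
        else:
            raise ValueError(f"no mapping for symbol {ch!r}")
    return slots


def transliterate(name):
    """Combine the per-symbol options into the full candidate set."""
    return ["".join(parts) for parts in product(*candidate_slots(name))]


def pick_best(candidates, confidence):
    """Select the candidate with the highest confidence score."""
    return max(candidates, key=confidence)


# Stand-in scorer (assumption): a fixed lookup instead of a trained model.
KNOWN_NAMES = {"\u0930\u093e\u092e": 0.9}  # RA + sign AA + MA


def toy_confidence(candidate):
    return KNOWN_NAMES.get(candidate, 0.1)


candidates = transliterate("Rama")
best = pick_best(candidates, toy_confidence)
```

Because each ambiguous symbol contributes several options, the candidate set grows multiplicatively with name length, which is why a scoring step is needed to select a single form.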
Description: TECHNICAL FIELD
This disclosure relates generally to natural language processing, and more particularly to a method and system for generating named entities.
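
Illustrative sketch (not part of the filed specification): claims 11 and 12 generate the entity in a related Indian language from the Hindi form using a multi-lingual character level tree model, which this record does not describe. The Python below is not that model. It shows a different, much simpler technique for the same step: the Unicode blocks for several Indic scripts share a parallel layout, so a character can often be moved between scripts by shifting its code point by the difference between block starts. The function name `shift_script` and the table name are hypothetical.

```python
import unicodedata

# Start of each 128-code-point Unicode block (from the Unicode Standard).
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gujarati": 0x0A80,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
}


def shift_script(text, source, target):
    """Move characters of the source script to the same offset in the target block.

    Characters outside the source block pass through unchanged. A ValueError
    is raised when the target block has no assigned character at that offset,
    which happens for sounds the target script does not write.
    """
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + 0x80:
            shifted = chr(cp - src + dst)
            if unicodedata.name(shifted, None) is None:
                raise ValueError(f"U+{cp:04X} has no counterpart in {target}")
            out.append(shifted)
        else:
            out.append(ch)
    return "".join(out)


hindi_name = "\u0930\u093e\u092e"  # RA + vowel sign AA + MA
gujarati_name = shift_script(hindi_name, "devanagari", "gujarati")
tamil_name = shift_script(hindi_name, "devanagari", "tamil")
```

A fixed offset ignores spelling conventions and fails outright where the target script lacks a letter (Tamil, for example, has no counterpart for Devanagari KHA), which suggests why a learned character-level model is claimed instead.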

Documents

Application Documents

#	Name	Date
1	Power of Attorney [18-05-2017(online)].pdf	2017-05-18
2	Form 5 [18-05-2017(online)].pdf	2017-05-18
3	Form 3 [18-05-2017(online)].pdf	2017-05-18
4	Form 18 [18-05-2017(online)].pdf_168.pdf	2017-05-18
5	Form 18 [18-05-2017(online)].pdf	2017-05-18
6	Form 1 [18-05-2017(online)].pdf	2017-05-18
7	Drawing [18-05-2017(online)].pdf	2017-05-18
8	Description(Complete) [18-05-2017(online)].pdf_169.pdf	2017-05-18
9	Description(Complete) [18-05-2017(online)].pdf	2017-05-18
10	REQUEST FOR CERTIFIED COPY [19-05-2017(online)].pdf	2017-05-19
11	PROOF OF RIGHT [13-07-2017(online)].pdf	2017-07-13
12	Correspondence by Agent_Form 1_17-07-2017.pdf	2017-07-17
13	abstract 201741017539.jpg	2017-07-20
14	201741017539-Proof of Right (MANDATORY) [15-09-2017(online)].pdf	2017-09-15
15	Correspondence by Agent_Form30,Form1_19-09-2017.pdf	2017-09-19
16	201741017539-PETITION UNDER RULE 137 [03-05-2021(online)].pdf	2021-05-03
17	201741017539-OTHERS [03-05-2021(online)].pdf	2021-05-03
18	201741017539-FORM 3 [03-05-2021(online)].pdf	2021-05-03
19	201741017539-FER_SER_REPLY [03-05-2021(online)].pdf	2021-05-03
20	201741017539-DRAWING [03-05-2021(online)].pdf	2021-05-03
21	201741017539-COMPLETE SPECIFICATION [03-05-2021(online)].pdf	2021-05-03
22	201741017539-CLAIMS [03-05-2021(online)].pdf	2021-05-03
23	201741017539-FER.pdf	2021-10-17
24	201741017539-PatentCertificate22-12-2023.pdf	2023-12-22
25	201741017539-IntimationOfGrant22-12-2023.pdf	2023-12-22
26	201741017539-PROOF OF ALTERATION [18-03-2024(online)].pdf	2024-03-18

Search Strategy

1	search_strategyE_25-11-2020.pdf