System And Method For Providing Transliteration By Leveraging Arpabet Phonemes

Abstract: The present subject matter discloses a system and a method for providing transliteration between a plurality of languages by leveraging Arpabet phonemes. For transliteration, highly phonetic scripts such as Devanagari and other Indic language scripts may be used as an intermediate representation for the languages. Further, a map may be generated between the intermediate representation and each of the plurality of languages to be transliterated. With the maps generated, Arpabet characters may be replaced with mapped graphemes. Based on the map, a transliteration scheme is created for each of the plurality of languages. After creation of the transliteration schemes, merging criteria may be created, which are further useful for creating meaningful grapheme phonemes. Thus, after creating the map and the merging criteria, the system may transliterate characters between any language pair of the plurality of languages using the transliteration schemes associated with the languages.

Patent Information

Application #

Filing Date

06 January 2014

Publication Number

34/2015

Publication Type

INA

Invention Field

COMMUNICATION

Status

Email

ip@legasis.in

Parent Application

Patent Number

Legal Status

Grant Date

2021-08-23

Renewal Date

Applicants

Tata Consultancy Services Limited

Nirmal Building, 9th Floor, Nariman Point, Mumbai 400021, Maharashtra, India

Inventors

1. KARANDE, Shirish Subhash

Tata Consultancy Services Limited, TRDDC, 54/B Hadapsar Industrial Estate, Hadapsar, Pune 411013, Maharashtra, India

2. SRINIVASAN, Iyengar Venkatachary

Tata Consultancy Services Limited, TRDDC, 54/B Hadapsar Industrial Estate, Hadapsar, Pune 411013, Maharashtra, India

3. LODHA, Sachin P.

Tata Consultancy Services Limited, TRDDC, 54/B Hadapsar Industrial Estate, Hadapsar, Pune 411013, Maharashtra, India

Claims

1. A method for facilitating transliteration between a plurality of languages, the method comprising: receiving, by a processor, content in a plurality of languages; dividing, by the processor, the content into one or more tokens based upon predefined parameters, wherein the one or more tokens belong to each of the plurality of languages, and wherein each token indicates a word in the content; determining, by the processor, an intermediate representation, for the one or more tokens, corresponding to each of the plurality of languages, wherein the intermediate representation is in an intermediate language having a phonetic characteristic; generating, by the processor, a map between the one or more tokens of the content in each of the plurality of languages and the intermediate representation, wherein the map indicates a transliteration scheme corresponding to each language; receiving, by the processor, a request for transliteration of the content in a source language into a target language, wherein the source language and the target language belong to the plurality of languages; and transliterating, by the processor, the content of the source language into the target language using the transliteration schemes associated with the source language and the target language.

2. The method of claim 1, wherein the predefined parameters comprise spaces and punctuation marks.

3. The method of claim 1 further comprising determining, by using a text segmentation heuristic, whether the content received is a sentence or not, wherein if the content is determined to be a sentence, the content is translated into a target language using a crowd sourcing technique.

4. The method of claim 1, wherein the intermediate representation is determined based on at least one phonetic transcription code, wherein the phonetic transcription code comprises Arpabet, WorldBet, International Phonetic Alphabet (IPA), SAMPA, and X-SAMPA.

5. The method of claim 1 further comprising creating rules necessary for creating meaningful grapheme phonemes for each of the plurality of languages.

6. A system 102 for facilitating transliteration between a plurality of languages, wherein the system comprises: a processor 202; a memory 206 coupled with the processor 202, wherein the processor 202 executes a plurality of modules 208 stored in the memory 206, and wherein the plurality of modules comprises: a receiving module 210 to receive content in a plurality of languages; a tokenizing module 212 to divide the content into one or more tokens based upon predefined parameters, wherein the one or more tokens belong to each of the plurality of languages, and wherein each token indicates a word in the content; a determining module 214 to determine an intermediate representation, for the one or more tokens, corresponding to each of the plurality of languages, wherein the intermediate representation is in an intermediate language having a phonetic characteristic; a generating module 216 to generate a map between the one or more tokens of the content in each of the plurality of languages and the intermediate representation, wherein the map indicates a transliteration scheme corresponding to each language; the receiving module 210 to receive a request for transliteration of the content in a source language into a target language, wherein the source language and the target language belong to the plurality of languages; and a transliterating module 218 to transliterate the content of the source language into the target language using the transliteration schemes associated with the source language and the target language.

7. The system 102 of claim 6, wherein the determining module 214 further determines, by using a text segmentation heuristic, whether the content received is a sentence or not, wherein if the content is determined to be a sentence, the content is translated into a target language using a crowd sourcing technique.

8. The system 102 of claim 6, wherein the intermediate representation is determined based on at least one phonetic transcription code, wherein the phonetic transcription code comprises Arpabet, WorldBet, International Phonetic Alphabet (IPA), SAMPA, and X-SAMPA.

9. The system 102 of claim 6 further comprising a merging module 220 for creating rules required for creating meaningful grapheme phonemes for each of the plurality of languages.

10. A non-transitory computer readable medium embodying a program executable in a computing device for facilitating transliteration between a plurality of languages, the program comprising: a program code for receiving content in a plurality of languages; a program code for dividing the content into one or more tokens based upon predefined parameters, wherein the one or more tokens belong to each of the plurality of languages, and wherein each token indicates a word in the content; a program code for determining an intermediate representation, for the one or more tokens, corresponding to each of the plurality of languages, wherein the intermediate representation is in an intermediate language having a phonetic characteristic; a program code for generating a map between the one or more tokens of the content in each of the plurality of languages and the intermediate representation, wherein the map indicates a transliteration scheme corresponding to each language; a program code for receiving a request for transliteration of the content in a source language into a target language, wherein the source language and the target language belong to the plurality of languages; and a program code for transliterating the content of the source language into the target language using the transliteration schemes associated with the source language and the target language.

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:
SYSTEM AND METHOD FOR PROVIDING TRANSLITERATION BY LEVERAGING ARPABET PHONEMES

APPLICANT:
Tata Consultancy Services Limited
A company Incorporated in India under The Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India

The following specification particularly describes the invention and the manner in which it is to be performed.
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
[001] The present application claims priority to Indian Provisional Patent Application No. 45/MUM/2014, filed on January 06th, 2014, the entirety of which is hereby incorporated by reference.
TECHNICAL FIELD
[002] The present subject matter described herein, in general, relates to a method and a system for providing transliteration by leveraging Arpabet phonemes.
BACKGROUND
[003] Different transliteration tools are available that allow a user to phonetically type a particular language or other text in a specified script. These tools have built-in artificial intelligence with a dictionary-based phonetic transliteration approach. Further, the tools generally use machine learning techniques to learn different patterns of a language. However, for transliterating between n languages, the tools have to prepare nC2 combinations of transliteration schemes, one for each language pair.
[004] Thus, for every language, n such schemes have to be created by the tools. This approach becomes an overhead when a newer language has to be included in the list of n languages. The newly added language may not be as popular, and thus the investment demanded may not be worthwhile. Also, creating or designing several unnecessary combinations of transliteration schemes increases the internal computing/processing time of these tools. Along with the computation, memory may run short in these tools because of the growing number of transliteration schemes created for each of the n languages. All these factors lead to wasted memory in the tools and slow down transliteration as the number of transliteration schemes increases.
SUMMARY
[005] This summary is provided to introduce aspects related to systems and methods for facilitating transliteration between a plurality of languages and the concepts are further described below in the detailed description. This summary is not intended to identify essential features of subject matter nor is it intended for use in determining or limiting the scope of the subject matter.
[006] In one implementation, a system for facilitating transliteration between a plurality of languages is disclosed. The system comprises a processor and a memory coupled to the processor. The processor executes a plurality of modules stored in the memory. The plurality of modules comprises a receiving module, a tokenizing module, a determining module, a generating module, and transliterating module. The receiving module may receive content in a plurality of languages for transliteration. Further, the tokenizing module may divide the content into one or more tokens based upon predefined parameters. The one or more tokens may belong to each of the plurality of languages and each token may indicate a word in the content. Further, the determining module may determine an intermediate representation, for the one or more tokens, corresponding to each of the plurality of languages. The intermediate representation determined may be in an intermediate language having a phonetic characteristic. Further, the generating module may generate a map between the one or more tokens of the content in each of the plurality of languages and the intermediate representation. Further, the map may indicate a transliteration scheme corresponding to each language. The receiving module may further receive a request for transliterating the content in a source language into a target language, wherein the source language and the target language belong to the plurality of languages. Further, the transliterating module may transliterate the content of the source language into the target language using the transliteration schemes associated with the source language and the target language.
[007] In another implementation, a method for facilitating transliteration between a plurality of languages is disclosed. The method may comprise receiving, by a processor, content in a plurality of languages. The method may further comprise a step of dividing, by the processor, the content into one or more tokens based upon predefined parameters. Further, the one or more tokens belong to each of the plurality of languages and each token indicates a word in the content. The method may further comprise a step of determining, by the processor, an intermediate representation, for the one or more tokens, corresponding to each of the plurality of languages. Further, the intermediate representation may be in an intermediate language having a phonetic characteristic. The method may further comprise a step of generating, by the processor, a map between the one or more tokens of the content in each of the plurality of languages and the intermediate representation. The map may indicate a transliteration scheme corresponding to each language. The method may further comprise a step of receiving, by the processor, a request for transliteration of the content in a source language into a target language, wherein the source language and the target language belong to the plurality of languages. Further, the method may comprise a step of transliterating, by the processor, the content of the source language into the target language using the transliteration schemes associated with the source language and the target language.
[008] In yet another implementation, a non-transitory computer readable medium embodying a program executable in a computing device for facilitating transliteration between a plurality of languages is disclosed. The program may comprise a program code for receiving content in a plurality of languages. Further, the program may comprise a program code for dividing the content into one or more tokens based upon predefined parameters. Further, the one or more tokens may belong to each of the plurality of languages and each token may indicate a word in the content. Further, the program may comprise a program code for determining an intermediate representation, for the one or more tokens, corresponding to each of the plurality of languages. The intermediate representation may be in an intermediate language having a phonetic characteristic. The program may further comprise a program code for generating a map between the one or more tokens of the content in each of the plurality of languages and the intermediate representation. The map may indicate a transliteration scheme corresponding to each language. Further, the program may comprise a program code for receiving a request for transliterating the content in a source language into a target language, wherein the source language and the target language belong to the plurality of languages. Further, the program may comprise a program code for transliterating the content of the source language into the target language using the transliteration schemes associated with the source language and the target language.
BRIEF DESCRIPTION OF THE DRAWINGS
[009] The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
[0010] Figure 1 illustrates a network implementation of a system for facilitating transliteration between a plurality of languages, in accordance with an embodiment of the present subject matter.
[0011] Figure 2 illustrates the system, in accordance with an embodiment of the present subject matter.
[0012] Figure 3 illustrates a method for facilitating transliteration between a plurality of languages, in accordance with an embodiment of the present subject matter.
DETAILED DESCRIPTION
[0013] Systems and methods for facilitating transliteration between a plurality of languages by leveraging Arpabet phonemes are described. Transliteration is mainly required where the same content, for example the content of a web page, has to be displayed in different countries in their native languages in order to make the content more readable. By leveraging the fact that Devanagari and other Indic language scripts are highly phonetic, the present disclosure facilitates the transliteration of the content between the plurality of languages. According to embodiments of the present disclosure, the content to be transliterated may be received by the system in the plurality of languages. Further, Devanagari and the other Indic language scripts, being highly phonetic, may be used for creating an intermediate representation for each of the plurality of languages.
[0014] In a first step, the system may identify the text fields in the content received and determine, using a text segmentation heuristic, whether each identified text field is a sentence. The text fields identified as sentences may be translated using a crowd sourcing technique. In the next step, the text fields identified as non-sentences may be divided into a plurality of tokens based on predefined parameters such as spaces, punctuation marks, and the like. According to embodiments of the present disclosure, each token may indicate a word in the content. The system may further process the content to extract the root form of each of the plurality of tokens using a standard stemmer API and NLP libraries. Further, named entities, i.e., people, organizations, or locations, may also be identified from the tokens.
[0015] Further, an intermediate representation may be determined for each of the tokens based on phonetic transcription codes such as Arpabet, WorldBet, International Phonetic Alphabet (IPA), SAMPA, and X-SAMPA. The intermediate representation is determined such that it provides a way to convert words of one language into another language, considering that the phonetic pronunciation of a word does not change with the script in which it is written. This way, only one intermediate representation needs to be created for each token corresponding to each language.
[0016] The system may further create a map between the tokens of each language and their intermediate representation. The map created indicates a transliteration scheme corresponding to each language. At the time of transliteration, the system may refer to the maps for a language pair (source and target language) for transliterating the content from the source language to the target language.
[0017] While aspects of the described system and method for facilitating transliteration between a plurality of languages may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system.
[0018] Referring to Figure 1, a network implementation 100 of a system 102 for facilitating transliteration between a plurality of languages is illustrated, in accordance with an embodiment of the present subject matter. Although the present subject matter is explained considering that the system 102 is implemented as a computing system, it may be understood that the system 102 may also be implemented as a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, a tablet, a mobile phone, and the like. In one implementation, the system 102 may be implemented in a cloud-based environment. It will be understood that the system 102 may be accessed by multiple users through one or more user devices 104-1, 104-2…104-N, collectively referred to as user devices 104 hereinafter, or applications residing on the user devices 104. Examples of the user devices 104 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user devices 104 are communicatively coupled to the system 102 through a network 106.
[0019] In one implementation, the network 106 may be a wireless network, a wired network or a combination thereof. The network 106 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network 106 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[0020] Referring now to Figure 2, the system 102 is illustrated in accordance with an embodiment of the present subject matter. In one embodiment, the system 102 may include at least one processor 202, an input/output (I/O) interface 204, and a memory 206. The at least one processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor 202 is configured to fetch and execute computer-readable instructions or modules stored in the memory 206.
[0021] The I/O interface 204 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 204 may allow the system 102 to interact with a user directly or through the user devices 104. Further, the I/O interface 204 may enable the system 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 204 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface 204 may include one or more ports for connecting a number of devices to one another or to another server.
[0022] The memory 206 may include any computer-readable medium or computer program product known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, compact disks (CDs), digital versatile discs or digital video discs (DVDs), and magnetic tapes. The memory 206 may include modules 208 and data 224.
[0023] The modules 208 include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the modules 208 may include a receiving module 210, tokenizing module 212, determining module 214, generating module 216, transliterating module 218, merging module 220, and other modules 222. The other modules 222 may include programs or coded instructions that supplement applications and functions of the system 102.
[0024] The data 224, amongst other things, serves as a repository for storing data processed, received, and generated by one or more of the modules 208. The data 224 may also include Arpabet phonemes database 226, transliteration schemes database 228, and other data 230.
[0025] According to embodiments of the present disclosure, an example is illustrated for transliterating the content from one language to another language. The present disclosure leverages Arpabet phonemes for facilitating the transliteration. The Arpabet phonemes are phonetic transcription codes representing each phoneme of General American English with a distinct sequence of ASCII (American Standard Code for Information Interchange) characters. Further, every phoneme may be represented by one or two capital letters. The Arpabet phonemes may be stored in an Arpabet phonemes database 226 of the system 102. In known transliteration methodologies, for transliterating between n languages, nC2 combinations of transliteration schemes are created. Thus, nC2 transliteration schemes have to be designed across the n languages, which is not worthwhile. This may lead to unnecessary space utilization in the memory 206 of the system 102. It may also lead to an increase in the number of computations performed for creating the nC2 combinations of the transliteration schemes.
[0026] To overcome such situations, the present disclosure discloses a methodology for facilitating transliteration between a plurality of languages by leveraging the Arpabet phonemes. In one embodiment, the well-known script “Devanagari” and other Indic language scripts (considered to be highly phonetic) may be used as an intermediate representation for the transliteration between the plurality of languages. The determination of the intermediate representation using the Arpabet phonemes is explained in detail in subsequent paragraphs of the specification.
[0027] At first, the receiving module 210 of the system 102 may receive content in a plurality of languages. For example, the content may belong to a web page or to any other source. In one example, the content may be received as “I visited Karakoram” and “Karakoram heights”, which is in the English language and is to be transliterated into the Hindi language, wherein English and Hindi (in this example) belong to the plurality of languages. After receiving the content, the determining module 214 of the system 102 may determine whether the content received is a sentence. If the content is determined to be a sentence, then the system 102 may facilitate translation of the sentence into a target language using a crowd sourcing technique. In the present example, the first part of the content received, i.e., “I visited Karakoram”, is a sentence, and hence it gets translated into Hindi using the crowd sourcing technique. The translation can be seen in the table below.

Content Received | Is a sentence | Translation
I visited Karakoram | Yes | ??? ???????? ?? ???? ????
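The specification does not state which text segmentation heuristic is used. As a minimal sketch, assuming a hypothetical rule that treats a text field as a sentence when it ends with terminal punctuation or contains a common English function word, the sentence check might look like the following. The function name `is_sentence` and the word list are illustrative assumptions and are not taken from the disclosure.

```python
import re

# Hypothetical cue words; the specification does not define the heuristic.
FUNCTION_WORDS = {
    "i", "you", "he", "she", "we", "they",
    "is", "are", "was", "were", "the", "a", "an",
}

def is_sentence(text):
    """Return True when the text field looks like a sentence (assumed rule)."""
    stripped = text.strip()
    # Cue 1: terminal punctuation.
    if stripped.endswith((".", "?", "!")):
        return True
    # Cue 2: at least one common function word among the words.
    words = re.split(r"[\s.,;:!?]+", stripped.lower())
    return any(word in FUNCTION_WORDS for word in words)

# Sentences would go to translation, non-sentences to transliteration.
for field in ("I visited Karakoram", "Karakoram heights"):
    route = "translate" if is_sentence(field) else "transliterate"
    print(field, "->", route)
```

A production heuristic would need language-specific cues; this sketch only separates the two example strings used in the specification.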
[0028] Further, if the content is determined to be a non-sentence, the tokenizing module 212 of the system 102 may divide the content into a plurality of tokens indicating words in the content. In this example, the second part of the content, i.e., “Karakoram heights”, is determined to be a non-sentence. Hence, the system 102 considers “Karakoram heights” for the transliteration. The tokenizing module 212 of the system 102 may divide the content into one or more tokens. In this example, the one or more tokens are shown in the table below.
Content Received | Token 1 | Token 2
Karakoram heights | Karakoram | heights
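The tokenizing step can be sketched as a split on the predefined parameters named in the claims, i.e., spaces and punctuation marks. The function name `tokenize` and the exact punctuation set below are assumptions made for illustration.

```python
import re

def tokenize(text):
    """Split content into word tokens on whitespace and common punctuation."""
    # Empty pieces (for example after trailing punctuation) are dropped.
    return [piece for piece in re.split(r"[\s.,;:!?()]+", text) if piece]

print(tokenize("Karakoram heights"))
```

Splitting on explicit delimiters, rather than matching word characters, keeps tokens intact in scripts that use combining vowel signs.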
[0029] Thus, the two tokens “Karakoram” and “heights” can be observed from above table. Further, the determining module 214 of the system 102 may determine the intermediate representation for both of the above tokens “Karakoram” and “heights”. The intermediate representation may be determined for these tokens in each of the plurality of languages in which the content may be received. Since, in this example the content received is in the English language which is to be transliterated into the Hindi language, the determining module 214 determines the intermediate representation for the tokens (“Karakoram” and “heights”) corresponding to the English and the Hindi language.
[0030] According to embodiments of the present disclosure, the intermediate representation may be determined based on at least one phonetic transcription code from among Arpabet, WorldBet, International Phonetic Alphabet (IPA), SAMPA, and X-SAMPA. In the present example, the Arpabet phonemes may be considered as the phonetic transcription code for determining the intermediate representation for each token in each of the plurality of languages. A few examples of the Arpabet phonemes mapped to different words are shown in the table below.

Arpabet Phoneme | Word | Intermediate representation
AA | odd | AA D
AE | at | AE T
AH | hut | HH AH T
AO | ought | AO T
AW | cow | K AW
AY | hide | HH AY D
B | be | B IY
CH | cheese | CH IY Z
D | dee | D IY
DH | thee | DH IY
EH | Ed | EH D
ER | hurt | HH ER T
EY | ate | EY T
F | fee | F IY
G | green | G R IY N
HH | he | HH IY
IH | it | IH T
IY | eat | IY T
JH | gee | JH IY
K | key | K IY
L | lee | L IY
M | me | M IY
N | knee | N IY
NG | ping | P IH NG
OW | oat | OW T
OY | toy | T OY
P | pee | P IY
R | read | R IY D
S | sea | S IY
SH | she | SH IY
T | tea | T IY
TH | theta | TH EY T AH
UH | hood | HH UH D
UW | two | T UW
Y | yield | Y IY L D
ZH | seizure | S IY ZH ER
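A table of this kind can be held as a simple lookup from a word to its Arpabet phoneme sequence. The sketch below copies a few rows from the table above; the names `ARPABET_EXAMPLES` and `to_phonemes` are hypothetical, and a real system would presumably draw on the Arpabet phonemes database 226 rather than a hard-coded dictionary.

```python
# A few rows copied from the table above: word -> Arpabet phoneme string.
ARPABET_EXAMPLES = {
    "odd": "AA D",
    "at": "AE T",
    "hut": "HH AH T",
    "cow": "K AW",
    "cheese": "CH IY Z",
    "green": "G R IY N",
    "ping": "P IH NG",
}

def to_phonemes(word):
    """Return the phoneme list for a known word, or None when it is unknown."""
    entry = ARPABET_EXAMPLES.get(word.lower())
    return entry.split() if entry is not None else None

print(to_phonemes("green"))
```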
[0031] By referring to the above table, the determining module 214 of the system 102 may determine the intermediate representation for each of the tokens using the Arpabet phonemes. The intermediate representation determined for both the tokens may be seen in the table below.
Token (Word) | Arpabet Phoneme | Intermediate Representation (English) | Intermediate Representation (Hindi)
Karakoram | AA OW AH1 | KAA RAA KOW RAH1M |
heights | AY | HH AY TS |
???????? | AA OW AH1 | | KAA RAA KOW RAH1M
?????? | AY | | HH AY TS
[0032] Thus, it may be observed from the above table that the intermediate representation for both the tokens remains the same in both the languages. Further, in the next step, the generating module 216 of the system 102 may generate a map between the tokens of the content in each of the plurality of languages and the intermediate representation. The map indicates a transliteration scheme corresponding to each language of the plurality of languages. This way, only one transliteration scheme needs to be created for each language. Further, the transliteration scheme created may be stored in the transliteration schemes database 228 of the system 102.
[0033] According to embodiments of the present disclosure, the generating module 216 may further replace Arpabet characters with mapped graphemes. According to an embodiment, a deterministic map created for the language “Hindi” is shown in the block given below:
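The saving from using one scheme per language instead of one scheme per language pair can be checked with a short calculation. This is a sketch only; the function names are illustrative.

```python
from math import comb

def pairwise_schemes(n):
    """Schemes needed when every language pair gets its own scheme (nC2)."""
    return comb(n, 2)

def intermediate_schemes(n):
    """Schemes needed when each language maps to one shared intermediate form."""
    return n

for n in (5, 10, 20):
    print(n, pairwise_schemes(n), intermediate_schemes(n))
```

Adding one more language to a set of n languages adds n pairwise schemes, but only a single scheme when an intermediate representation is used.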
'AA': '\u0906', 'AE': '\u0913', 'AH': '\u0905', 'AO': '\u0911', 'AW': '\u0914', 'AY': '\u0908',
'B': '\u092c', 'CH': '\u091a', 'D': '\u0921', 'DH': '\u0927', 'EH': '\u090f', 'ER': '\u0930',
'EY': '\u0910', 'F': '\u092b', 'G': '\u0917', 'HH': '\u0939', 'IH': '\u0907', 'IY': '\u0907',
'JH': '\u091c', 'K': '\u0915', 'L': '\u0932', 'M': '\u092e', 'N': '\u0928', 'NG': '\u0917',
'OW': '\u0913', 'OY': '\u092f', 'P': '\u092a', 'R': '\u0930', 'S': '\u0938', 'SH': '\u0936',
'T': '\u0924', 'TH': '\u0925', 'UH': '', 'UW': '\u090a', 'V': '\u0935', 'W': '\u0935',
'Y': '\u092f', 'Z': '\u091d', 'ZH': '\u091e'
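Applying such a deterministic map is a per-phoneme lookup. The sketch below uses a subset of the entries above, reading each value as a Unicode code point escape (an assumption about the intended notation). The function name `to_graphemes` and the stripping of a trailing stress digit (as in AH1) are likewise assumptions.

```python
import re

# Subset of the Hindi map above; values read as Unicode escapes (assumed).
HINDI_MAP = {
    "AA": "\u0906", "AH": "\u0905", "AY": "\u0908", "HH": "\u0939",
    "IY": "\u0907", "K": "\u0915", "M": "\u092e", "OW": "\u0913",
    "R": "\u0930", "S": "\u0938", "T": "\u0924",
}

def to_graphemes(phonemes, mapping=HINDI_MAP):
    """Replace each Arpabet phoneme with its mapped grapheme."""
    graphemes = []
    for phoneme in phonemes:
        # Assumption: a trailing stress digit (e.g. AH1) does not change the grapheme.
        base = re.sub(r"\d+$", "", phoneme)
        # An unmapped phoneme raises KeyError so gaps in the map are visible.
        graphemes.append(mapping[base])
    return "".join(graphemes)

print(to_graphemes(["K", "IY"]))
```

A one-to-one replacement like this yields a string of independent letters, which is presumably why merging criteria are applied afterwards.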
[0034] After creating the transliteration schemes, the merging module 220 of the system may create merging criteria i.e., rules necessary for creating meaningful grapheme phonemes for each of the plurality of languages. In one embodiment, the merging criteria may be created by taking into consideration that more than three consonants back to back in Hindi language is rare. Thus, a vowel phoneme may be added in between consonant phonemes. Therefore, rather than having a fixed rule like no more than three consonants back to back, more probabilistic rules may be considered by the merging module 220.
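One possible reading of the fixed rule mentioned here is sketched below: when a run of consonant phonemes would exceed three, a vowel phoneme is inserted to break the run. The choice of AH as the inserted vowel, the vowel set, and the function name `break_consonant_runs` are assumptions. The probabilistic rules that the merging module 220 may use are not specified and are not modelled.

```python
# Arpabet vowel phonemes; everything else is treated as a consonant.
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}

def break_consonant_runs(phonemes, max_run=3, filler="AH"):
    """Insert a filler vowel so that no more than max_run consonants are adjacent."""
    result = []
    run = 0  # length of the current consonant run
    for phoneme in phonemes:
        if phoneme in VOWELS:
            run = 0
        else:
            if run == max_run:
                # A further consonant would exceed the limit: break the run first.
                result.append(filler)
                run = 0
            run += 1
        result.append(phoneme)
    return result

print(break_consonant_runs(["S", "T", "R", "K", "AA"]))
```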
[0035] Further, the receiving module 210 of the system 102 may further receive a request for transliterating the content from a source language into a target language. In the present example, the source language is English and the target language is Hindi. Based on the transliteration schemes created, the transliterating module 218 of the system 102 may transliterate the content of the source language into the target language using the transliteration schemes associated with the source language and the target language. In this present example, the transliterating module 218 may match the intermediate representations of each of the tokens (“Karakoram” and “heights”) corresponding to its source and target language for the transliteration. For example, the transliterating module 218 may match the intermediate representations determined for the token “Karakoram” in English and in Hindi language. It may be observed from the above table that the intermediate representation for the token “Karakoram” in both the languages is “KAA RAA KOW RAH1M”. Thus, based on the intermediate representation, the transliterating module 218 transliterates the token “Karakoram” from English to Hindi.
[0036] Similarly, the transliterating module 218 may match the intermediate representations determined for the token “heights” in English and in Hindi. It may be observed from the above table that the intermediate representation for the token “heights” in both languages is “HH AY TS”. Thus, based on the intermediate representation, the transliterating module 218 transliterates the token “heights” from English to Hindi. The transliteration of both tokens is shown in the table below.
Token (Words)	Before Transliteration	After Transliteration
Karakoram	Karakoram	????????
heights	heights	??????
[0037] Thus, from the above table, it may be observed that the token “Karakoram”, which was in English, is now transliterated into Hindi, i.e., “????????”. Similarly, the token “heights”, which was in English, is now transliterated into Hindi, i.e., “??????”.
[0038] Referring now to Figure 3, a method 300 for facilitating transliteration between a plurality of languages is shown, in accordance with an embodiment of the present subject matter. The method 300 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 300 may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
[0039] The order in which the method 300 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 300 or alternate methods. Additionally, individual blocks may be deleted from the method 300 without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method 300 may be considered to be implemented in the above-described system 102.
[0040] At block 302, content may be received in a plurality of languages. Further, the content received may belong to a web page to be displayed in different regions/countries in their native languages.
[0041] At block 304, the content may be divided into one or more tokens based upon predefined parameters like spaces and punctuation marks. Further, the one or more tokens belong to each of the plurality of languages and each token indicates a word in the content.
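The tokenizing step at block 304 can be sketched as follows. The exact delimiter set is an assumption: the description names only spaces and punctuation marks, so this sketch uses whitespace, ASCII punctuation, and the Devanagari danda (U+0964). The function name `tokenize` is hypothetical.

```python
import re
import string

# Assumed delimiter set: whitespace, ASCII punctuation, and the Devanagari
# danda (U+0964). The description only says "spaces and punctuation marks".
_DELIMITERS = re.compile("[\\s\u0964" + re.escape(string.punctuation) + "]+")


def tokenize(content):
    """Split content into word tokens, discarding empty strings."""
    return [token for token in _DELIMITERS.split(content) if token]


print(tokenize("Karakoram heights."))
# ['Karakoram', 'heights']
```

Splitting on delimiters, rather than matching `\w+`, keeps Indic vowel signs attached to their tokens, since Python's `\w` does not match combining marks. A side effect of treating every punctuation mark as a delimiter is that words containing apostrophes or hyphens are split into several tokens.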
[0042] At block 306, an intermediate representation may be determined for the one or more tokens corresponding to each of the plurality of languages. Further, the intermediate representation may be in an intermediate language having a phonetic characteristic.
[0043] At block 308, a map may be generated between the one or more tokens of the content in each of the plurality of languages and the intermediate representation. Further, the map may indicate a transliteration scheme corresponding to each of the plurality of languages.
[0044] At block 310, a request may be received for transliterating the content in a source language into a target language. Further, the source language and the target language belong to the plurality of languages.
[0045] At block 312, the content may be transliterated from the source language into the target language using the transliteration schemes associated with the source language and the target language.
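Blocks 302 to 312 can be sketched end to end as follows. This is a minimal illustration, not the implementation of the system 102: the class and method names are hypothetical, each transliteration scheme is reduced to a dictionary from token to intermediate representation, and tokens are split on whitespace only. The intermediate strings are the ones given in the description; the Hindi spellings are assumed for illustration because the source's own renderings are not recoverable from this text.

```python
class Transliterator:
    """Hypothetical sketch: one token-to-intermediate map per language."""

    def __init__(self):
        # language code -> {token: intermediate representation}
        self.schemes = {}

    def register(self, language, token_to_intermediate):
        """Blocks 302-308: store the map (transliteration scheme) for a language."""
        self.schemes.setdefault(language, {}).update(token_to_intermediate)

    def transliterate(self, content, source, target):
        """Blocks 310-312: match tokens through the shared intermediate form."""
        source_scheme = self.schemes[source]
        # Invert the target scheme: intermediate representation -> target token.
        target_lookup = {rep: tok for tok, rep in self.schemes[target].items()}
        result = []
        for token in content.split():
            rep = source_scheme.get(token)
            # Tokens without a matching intermediate form are left unchanged.
            result.append(target_lookup.get(rep, token))
        return " ".join(result)


demo = Transliterator()
demo.register("en", {
    "Karakoram": "KAA RAA KOW RAH1M",
    "heights": "HH AY TS",
})
# Assumed Hindi spellings, used only to make the example runnable.
demo.register("hi", {
    "काराकोरम": "KAA RAA KOW RAH1M",
    "हाइट्स": "HH AY TS",
})

print(demo.transliterate("Karakoram heights", "en", "hi"))
```

Because both schemes are keyed to the same intermediate representation, the same object handles the reverse direction (`"hi"` to `"en"`) without any additional language-pair table.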
[0046] Although implementations for methods and systems for facilitating transliteration between the plurality of languages have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for providing transliteration between the plurality of languages by leveraging the Arpabet phonemes.
WE CLAIM:

1. A method for facilitating transliteration between a plurality of languages, the method comprising:
receiving, by a processor, content in a plurality of languages;
dividing, by the processor, the content into one or more tokens based upon predefined parameters, wherein the one or more tokens belong to each of the plurality of languages, and wherein each token indicates a word in the content;
determining, by the processor, an intermediate representation, for the one or more tokens, corresponding to each of the plurality of languages, wherein the intermediate representation is in an intermediate language having a phonetic characteristic;
generating, by the processor, a map between the one or more tokens of the content in each of the plurality of languages and the intermediate representation, wherein the map indicates a transliteration scheme corresponding to each language;
receiving, by the processor, a request for transliteration of the content in a source language into a target language, wherein the source language and the target language belong to the plurality of languages; and
transliterating, by the processor, the content of the source language into the target language using the transliteration schemes associated with the source language and the target language.

2. The method of claim 1, wherein the predefined parameters comprise spaces and punctuation marks.

3. The method of claim 1 further comprising determining, by using a text segmentation heuristic, whether the content received is a sentence, wherein if the content is determined to be a sentence, the content is translated into a target language using a crowd sourcing technique.

4. The method of claim 1, wherein the intermediate representation is determined based on at least one phonetic transcription code, wherein the phonetic transcription code comprises Arpabet, WorldBet, International Phonetic Alphabet (IPA), SAMPA, and X-SAMPA.

5. The method of claim 1 further comprising creating rules necessary for creating meaningful grapheme phonemes for each of the plurality of languages.

6. A system 102 for facilitating transliteration between a plurality of languages, wherein the system comprises:
a processor 202;
a memory 206 coupled with the processor 202, wherein the processor 202 executes a plurality of modules 208 stored in the memory 206, and wherein the plurality of modules comprises:
receiving module 210 to receive content in a plurality of languages;
tokenizing module 212 to divide the content into one or more tokens based upon predefined parameters, wherein the one or more tokens belong to each of the plurality of languages, and wherein each token indicates a word in the content;
determining module 214 to determine an intermediate representation, for the one or more tokens, corresponding to each of the plurality of languages, wherein the intermediate representation is in an intermediate language having a phonetic characteristic;
generating module 216 to generate a map between the one or more tokens of the content in each of the plurality of languages and the intermediate representation, wherein the map indicates a transliteration scheme corresponding to each language;
the receiving module 210 to receive a request for transliteration of the content in a source language into a target language, wherein the source language and the target language belong to the plurality of languages; and
transliterating module 218 to transliterate the content of the source language into the target language using the transliteration schemes associated with the source language and the target language.

7. The system 102 of claim 6, wherein the determining module 214 further determines, by using a text segmentation heuristic, whether the content received is a sentence, wherein if the content is determined to be a sentence, the content is translated into a target language using a crowd sourcing technique.

8. The system 102 of claim 6, wherein the intermediate representation is determined based on at least one phonetic transcription code, wherein the phonetic transcription code comprises Arpabet, WorldBet, International Phonetic Alphabet (IPA), SAMPA, and X-SAMPA.

9. The system 102 of claim 6 further comprises a merging module 220 for creating rules required for creating meaningful grapheme phonemes for each of the plurality of languages.

10. A non-transitory computer readable medium embodying a program executable in a computing device for facilitating transliteration between a plurality of languages, the program comprising:
a program code for receiving content in a plurality of languages;
a program code for dividing the content into one or more tokens based upon predefined parameters, wherein the one or more tokens belong to each of the plurality of languages, and wherein each token indicates a word in the content;
a program code for determining an intermediate representation, for the one or more tokens, corresponding to each of the plurality of languages, wherein the intermediate representation is in an intermediate language having a phonetic characteristic;
a program code for generating a map between the one or more tokens of the content in each of the plurality of languages and the intermediate representation, wherein the map indicates a transliteration scheme corresponding to each language;
a program code for receiving a request for transliteration of the content in a source language into a target language, wherein the source language and the target language belong to the plurality of languages; and
a program code for transliterating the content of the source language into the target language using the transliteration schemes associated with the source language and the target language.

Documents

Application Documents

#	Name	Date
1	45-MUM-2014-FORM-26 [08-06-2018(online)].pdf	2018-06-08
2	Form-2(Online).pdf	2018-08-11
3	Form 2.pdf	2018-08-11
4	Figure of abstract.jpg	2018-08-11
5	ABSTRACT1.jpg	2018-08-11
6	45-MUM-2014-FORM 26(19-3-2014).pdf	2018-08-11
7	45-MUM-2014-FORM 1(27-2-2014).pdf	2018-08-11
8	45-MUM-2014-CORRESPONDENCE(27-2-2014).pdf	2018-08-11
9	45-MUM-2014-CORRESPONDENCE(19-3-2014).pdf	2018-08-11
10	45-MUM-2014-OTHERS(ORIGINAL UR 6( 1A) FORM 26)-130618.pdf	2018-09-12
11	45-MUM-2014-FORM-26 [05-07-2019(online)].pdf	2019-07-05
12	45-MUM-2014-FER.pdf	2019-07-12
13	45-MUM-2014-ORIGINAL UR 6(1A) FORM 26-120719.pdf	2019-11-07
14	45-MUM-2014-OTHERS [12-01-2020(online)].pdf	2020-01-12
15	45-MUM-2014-FER_SER_REPLY [12-01-2020(online)].pdf	2020-01-12
16	45-MUM-2014-COMPLETE SPECIFICATION [12-01-2020(online)].pdf	2020-01-12
17	45-MUM-2014-CLAIMS [12-01-2020(online)].pdf	2020-01-12
18	45-MUM-2014-PatentCertificate23-08-2021.pdf	2021-08-23
19	45-MUM-2014-IntimationOfGrant23-08-2021.pdf	2021-08-23
20	45-MUM-2014-RELEVANT DOCUMENTS [30-09-2023(online)].pdf	2023-09-30

Search Strategy

1	2019-07-1014-49-52_10-07-2019.pdf