Abstract: An NLP-based system (101) and method (300) implemented thereby for generating multilingual language is disclosed. Initially, the system (101) receives a user preference of a first language and a second language. The system (101) generates a monolingual text in the first language and calculates a first tf-idf score of each word in the text for ranking the words. Further, the system (101) performs linear transformation of low ranked words to form second language words and aligns the vectors of the first language to vectors of the second language. Further, the system (101) identifies semantically similar words surrounding the linearly transformed second language words. Furthermore, the system (101) calculates a second tf-idf score for the second language words and ranks them. Lastly, the system (101) generates an appropriate multilingual language response for the user by substituting the monolingual text in the first language with the top ranked words of the second language. [To be published with figure 1]
Claims:
We Claim:
1. A system (101) for generating a multilingual language text, the system (101) comprising:
a processor (201); and
a memory (205) coupled to the processor (201), wherein the processor (201) is configured to execute programmed instructions (207) stored in the memory (205) for:
receiving a user preference for a multilingual language, wherein the user preference corresponds to a first language and a second language for multilingual language generation;
generating a monolingual text corresponding to the first language;
calculating a first score for each word in the generated text and ranking each word based on the first score;
performing linear transformation of at least one low ranked word to form second language words, wherein the linear transformation corresponds to aligning monolingual vectors of the first language to monolingual vectors of the second language;
identifying one or more words surrounding the linearly transformed vectors of the second language, wherein the one or more words identified are semantically similar to the corresponding words of the first language;
calculating a second score for the one or more words of the second language and ranking the one or more words based on the second score; and
generating an appropriate multilingual language response for the user by replacing the monolingual text in the first language with the top ranked word of the second language.
2. The system (101) as claimed in claim 1, wherein the monolingual text corresponding to the first language is generated by using a Long Short Term Memory (LSTM) model.
3. The system (101) as claimed in claim 1, wherein the first score and the second score correspond to the term frequency–inverse document frequency (tf-idf) weight.
4. A method (300) for generating a multilingual language text, the method comprising:
receiving, by a processor, a user preference for a multilingual language, wherein the user preference corresponds to a first language and a second language for multilingual language generation;
generating, by the processor, a monolingual text corresponding to the first language;
calculating, by the processor, a first score for each word in the generated text and ranking each word based on the first score;
performing, by the processor, linear transformation of at least one low ranked word to form second language words, wherein the linear transformation corresponds to aligning monolingual vectors of the first language to monolingual vectors of the second language;
identifying, by the processor, one or more words surrounding the linearly transformed vectors of the second language, wherein the one or more words identified are semantically similar to the corresponding words of the first language;
calculating, by the processor, a second score for the one or more words of the second language and ranking the one or more words based on the second score; and
generating, by the processor, an appropriate multilingual language response for the user by replacing the monolingual text in the first language with the top ranked word of the second language.
5. The method (300) as claimed in claim 4, wherein the monolingual text corresponding to the first language is generated by using a Long Short Term Memory (LSTM) model.
6. The method (300) as claimed in claim 4, wherein the first score and the second score correspond to the term frequency–inverse document frequency (tf-idf) weight.
7. A virtual assistance device for generating a multilingual language text, the virtual assistance device comprising:
a processor (201); and
a memory (205) coupled to the processor (201), wherein the processor (201) is configured to execute programmed instructions (207) stored in the memory (205) for:
receiving a user preference for a multilingual language, wherein the user preference corresponds to a first language and a second language for multilingual language generation;
generating a monolingual text corresponding to the first language;
calculating a first score for each word in the generated text and ranking each word based on the first score;
performing linear transformation of at least one low ranked word to form second language words, wherein the linear transformation corresponds to aligning monolingual vectors of the first language to monolingual vectors of the second language;
identifying one or more words surrounding the linearly transformed vectors of the second language, wherein the one or more words identified are semantically similar to the corresponding words of the first language;
calculating a second score for the one or more words of the second language and ranking the one or more words based on the second score; and
generating an appropriate multilingual language response for the user by replacing the monolingual text in the first language with the top ranked word of the second language.
8. The virtual assistance device as claimed in claim 7, wherein the virtual assistance device is a chat application or a chatbot.
9. The virtual assistance device as claimed in claim 8, wherein the monolingual text corresponding to the first language is generated by using a Long Short Term Memory (LSTM) model.
10. The virtual assistance device as claimed in claim 8, wherein the first score and the second score correspond to the term frequency–inverse document frequency (tf-idf) weight.
Dated this 19th Day of March 2020
Priyank Gupta
Agent for the Applicant
IN/PA-1454
Description:
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
TITLE OF INVENTION:
A NATURAL LANGUAGE PROCESSING (NLP) BASED SYSTEM AND METHOD FOR MIXED LANGUAGE GENERATION
APPLICANT:
Zensar Technologies Limited, an Indian Entity,
having address as:
ZENSAR KNOWLEDGE PARK,
PLOT # 4, MIDC, KHARADI, OFF
NAGAR ROAD, PUNE-411014,
MAHARASHTRA, INDIA
The following specification describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The present subject matter described herein, in general, relates to Natural Language Generation (NLG) that focuses on reducing communication gaps between machines and humans. In particular, the present subject matter relates to an NLP-based system and method for generation of multilingual language text comprising two or more distinct languages.
BACKGROUND
Communicating using multiple, complex and nuanced languages is one of the special traits possessed by humans. This unique ability differentiates humans from other species and has led to the development of various civilizations. Different sections of society are woven together by this human quality, which gives us an identity. However, in multilingual parts of the world, communities are held together by more than just one language.
In India, speakers rarely communicate in a single language. Hindi speakers borrow English words while English speakers use some other regional language phrases to fill the gaps and convey their message more clearly. This form of multilingual communication is common and effortless for people who live in multilingual communities or societies. In fact, the tendency to converse in mixed languages spans over digital media or social media platforms.
Multilingual communication may seem natural for people living in multilingual communities, but it presents a challenge for current artificial intelligence technologies. Artificial Intelligence (AI)-enabled virtual assistants or chatbots are programmed to detect, interpret, and generate responses in natural language that is essentially a single language. The inability to work with several languages simultaneously makes working with these AI-enabled tools less convenient for multilingual users. To make machines sound more human, this ability to interpret, output or converse in multilingual languages is essential. Hence, developing a multilingual system that can communicate or converse in multilingual languages is imperative.
Thus, there exists a long-felt need for a method and a system/apparatus that can overcome the above disclosed problems of the prior art by facilitating generation of multilingual language text comprising two or more distinct languages.
SUMMARY
This summary is provided to introduce concepts related to an NLP-based system and method for generation of multilingual language text. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining or limiting the scope of the claimed subject matter.
In one embodiment, a system for generating a multilingual language text is disclosed. The system may comprise a processor and a memory coupled to the processor. The processor may be configured to execute a plurality of programmed instructions stored in the memory for performing the task of generating multilingual language text. The processor may be configured to execute programmed instructions to receive a user preference for a multilingual language. The user preference may include a first language and a second language for multilingual language generation. The processor may further be configured to execute programmed instructions to generate a monolingual text corresponding to the first language. The processor may further be configured to execute programmed instructions to calculate a first score for each word in the generated text and rank each word based on the first score. Further, the processor may be configured to execute programmed instructions to perform linear transformation of at least one low ranked word to form second language words. The linear transformation may essentially imply aligning the monolingual vectors of the first language to monolingual vectors of the second language. After linearly transforming the monolingual vectors of the first language to monolingual vectors of the second language, the processor may be configured to execute programmed instructions to identify one or more words surrounding the linearly transformed vectors of the second language. The one or more words identified are semantically similar to the corresponding words of the first language. Further, the processor may be configured to execute programmed instructions to calculate a second score for the one or more words of the second language and rank the words based on the second score.
Furthermore, the processor may be configured to execute programmed instructions to generate an appropriate multilingual language response for the user by replacing the monolingual text in the first language with the top ranked word of the second language.
In another embodiment, a method for generating a multilingual language text is disclosed. The method may include receiving, by a processor, a user preference for a multilingual language. The user preference may include a first language and a second language for multilingual language generation. The method may further include generating, by the processor, a monolingual text corresponding to the first language. The method may further include calculating, by the processor, a first score for each word in the generated text and ranking each word based on the first score. Further, the method may include performing, by the processor, linear transformation of at least one low ranked word to form second language words. The linear transformation may essentially imply aligning the monolingual vectors of the first language to monolingual vectors of the second language. Further, the method may include identifying, by the processor, one or more words surrounding the linearly transformed vectors of the second language. The identified words are semantically similar to the corresponding words of the first language. Further, the method may include calculating, by the processor, a second score for the one or more words of the second language and ranking the words based on the second score. Furthermore, the method may include generating, by the processor, an appropriate multilingual language response for the user by replacing the monolingual text in the first language with the top ranked word of the second language.
In yet another embodiment, a virtual assistant device for generating a multilingual language text is disclosed. The virtual assistant device may comprise a processor and a memory coupled to the processor. The processor may be configured to execute a plurality of programmed instructions stored in the memory for performing the task of generating multilingual language text. The processor may be configured to execute programmed instructions to receive a user preference for a multilingual language. The user preference may include a first language and a second language for multilingual language generation. The processor may further be configured to execute programmed instructions to generate a monolingual text corresponding to the first language. The processor may further be configured to execute programmed instructions to calculate a first score for each word in the generated text and rank each word based on the first score. Further, the processor may be configured to execute programmed instructions to perform linear transformation of at least one low ranked word to form second language words. The linear transformation may essentially imply aligning the monolingual vectors of the first language to monolingual vectors of the second language. After linearly transforming the monolingual vectors of the first language to monolingual vectors of the second language, the processor may be configured to execute programmed instructions to identify one or more words surrounding the linearly transformed vectors of the second language. The one or more words identified are semantically similar to the corresponding words of the first language. Further, the processor may be configured to execute programmed instructions to calculate a second score for the one or more words of the second language and rank the words based on the second score.
Furthermore, the processor may be configured to execute programmed instructions to generate an appropriate multilingual language response for the user by replacing the monolingual text in the first language with the top ranked word of the second language.
BRIEF DESCRIPTION OF DRAWINGS
The detailed description is described with reference to the accompanying Figures. In the Figures, the left-most digit(s) of a reference number identifies the Figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
Figure 1 illustrates an implementation (100) of a system (101) for generating a multilingual language text by aligning the monolingual vectors of the first language to monolingual vectors of the second language based on a linear transformation technique, in accordance with an embodiment of the present subject matter.
Figure 2 illustrates a pipeline of the programmed instructions executed by the system (101) for enabling multilingual language generation, in accordance with an embodiment of the present disclosure.
Figure 3 illustrates a method (300) for generating a multilingual language text, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
Reference throughout the specification to “various embodiments,” “some embodiments,” “one embodiment,” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases “in various embodiments,” “in some embodiments,” “in one embodiment,” or “in an embodiment” in places throughout the specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
Referring to Figure 1, a network implementation (100) of a Natural Language Processing (NLP) based multilingual language generation system (101) is illustrated, in accordance with an embodiment of the present subject matter.
In an embodiment, the multilingual language generation system (101) (hereinafter interchangeably referred to as the system (101)) may be connected to a user device (103) over a network (102). It may be understood that the multilingual language generation system (101) may be accessed by multiple users through one or more user devices, collectively referred to as a user device (103). The user device (103) may be any electronic device, communication device, image capturing device, machine, software, automated computer program, a robot or a combination thereof.
In an embodiment, though the present subject matter is explained considering that the system (101) is implemented (as an NLP-based multilingual language generation system or a virtual assistant device such as “Chatbot Application” and the like) on a server, it may be understood that the system (101) may also be implemented in a variety of user devices, such as, but not limited to, a portable computer, a personal digital assistant, a handheld device, a mobile, a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a mobile device, and the like. In one embodiment, the system (101) may be implemented in a cloud-computing environment. In an embodiment, the network (102) may be a wireless network such as Bluetooth, Wi-Fi, 3G, 4G/LTE and the like, a wired network or a combination thereof. The network (102) can be accessed by the user device (103) using wired or wireless network connectivity means including updated communications technology.
In one embodiment, the network (102) can be implemented as one of the different types of networks, such as a cellular communication network, a local area network (LAN), a wide area network (WAN), the internet, and the like. The network (102) may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further, the network (102) may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
Further, referring to Figure 1, various components of the NLP-based multilingual language generation system (101) are illustrated, in accordance with an embodiment of the present subject matter. As shown, the system (101) may include at least one processor (201), an input/output interface (203), a memory (205), programmed instructions (207) and data (209). In one embodiment, the at least one processor (201) is configured to fetch and execute computer-readable instructions stored in the memory (205).
In one embodiment, the I/O interface (203) may be implemented as a mobile application or a web-based application and may further include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface (203) may allow the system (101) to interact with the user devices (103). Further, the I/O interface (203) may enable the user device (103) to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface (203) can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface (203) may include one or more ports for connecting to another server. In an exemplary embodiment, the I/O interface (203) is an interaction platform which may provide a connection between users and system (101).
In an implementation, the memory (205) may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and memory cards. The memory (205) may include programmed instructions (207) and data (209).
In one embodiment, the programmed instructions (207) may include routines, programmes, objects, components, data structures, etc., which perform particular tasks or functions, or implement particular abstract data types. The data (209) may comprise a data repository (211) and other data (213). In one embodiment, the data repository (211) may pre-store the vectors of the plurality of words in multiple different languages, wherein such vectors are generated based upon corpus word data available in different resources corresponding to each language. The other data (213), amongst other things, serves as a repository for storing data processed, received, and generated by one or more components and programmed instructions.
The aforementioned computing devices may support communication over one or more types of networks in accordance with the described embodiments. For example, some computing devices and networks may support communications over a Wide Area Network (WAN), the Internet, a telephone network (e.g., analog, digital, POTS, PSTN, ISDN, xDSL), a mobile telephone network (e.g., CDMA, GSM, NDAC, TDMA, E-TDMA, NAMPS, WCDMA, CDMA-2000, UMTS, 3G, 4G), a radio network, a television network, a cable network, an optical network (e.g., PON), a satellite network (e.g., VSAT), a packet-switched network, a circuit-switched network, a public network, a private network, and/or other wired or wireless communications network configured to carry data. Computing devices and networks also may support wireless wide area network (WWAN) communications services including Internet access such as EV-DO, EV-DV, CDMA/1×RTT, GSM/GPRS, EDGE, HSDPA, HSUPA, and others.
The aforementioned computing devices and networks may support wireless local area network (WLAN) and/or wireless metropolitan area network (WMAN) data communications functionality in accordance with Institute of Electrical and Electronics Engineers (IEEE) standards, protocols, and variants such as IEEE 802.11 (“WiFi”), IEEE 802.16 (“WiMAX”), IEEE 802.20x (“Mobile-Fi”), and others. Computing devices and networks also may support short range communication such as a wireless personal area network (WPAN) communication, Bluetooth® data communication, infrared (IR) communication, near-field communication, electromagnetic induction (EMI) communication, passive or active RFID communication, micro-impulse radar (MIR), ultra-wide band (UWB) communication, automatic identification and data capture (AIDC) communication, and others.
The working of the system (101) in facilitating multilingual language generation will now be described in detail referring to Figures 1, 2 and 3 as below:
In one embodiment, a user may select a user preference for generating multilingual language. The user may input a preferred first language and a preferred second language as two distinct languages for generating responses in multilingual languages.
In one embodiment, the processor (201) may be configured for receiving an input, in the form of a user selection for the preferred languages, from the user device (103). The processor (201) may then be configured to generate a monolingual text using a Long Short Term Memory (LSTM) model. It is to be noted by a person skilled in the art that since LSTM model is well known in the art, the details of the LSTM model are not described herein for the sake of brevity.
The monolingual text implies text generated using only one language, for example, Hindi or English. The processor (201) may be further configured to calculate a score for each of the words in the generated text. In one embodiment, a term frequency–inverse document frequency (tf-idf) score may be calculated for each of the words in the generated text. The tf-idf score for each word may be calculated as below:
1. tf= term frequency / total number of terms in document, and
2. idf=1 + log (Total Number of Documents / Number of Documents with term in it).
The tf-idf score is thereby calculated by taking the product of: (tf * idf).
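The two formulas above can be sketched in code as follows. This is a minimal illustration, not the applicant's implementation: the function name `tf_idf_scores`, the representation of documents as token lists, and the use of the natural logarithm (the specification does not state the base of the log) are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf_scores(document, corpus):
    """Return {word: tf * idf} for every distinct word in `document`.

    `document` is a list of tokens and `corpus` is a list of such lists.
    Assumption: natural log is used, since the text does not give the base.
    """
    counts = Counter(document)
    total_terms = len(document)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        # tf = term frequency / total number of terms in document
        tf = count / total_terms
        docs_with_term = sum(1 for doc in corpus if word in doc)
        if docs_with_term == 0:
            # The formula is undefined here; this sketch expects the
            # scored document to be part of the corpus.
            raise ValueError(f"{word!r} does not occur in the corpus")
        # idf = 1 + log(total documents / documents containing the term)
        idf = 1 + math.log(n_docs / docs_with_term)
        scores[word] = tf * idf
    return scores


# Toy corpus, invented purely for illustration.
corpus = [["a", "b", "a"], ["b", "c"]]
scores = tf_idf_scores(corpus[0], corpus)
```

In this toy corpus the word "b" occurs in every document, so its idf is 1 and its score equals its tf, while "a" occurs in only one document and receives a higher score.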
Further, the words are ranked based on the tf-idf score. The processor (201) may be configured to identify the low and high ranked words based on the tf-idf score. The top ranked words are identified as words of high importance by the processor (201) and the low ranked words are identified as words of low importance. Furthermore, the low ranked words may be considered for further processing based on tf-idf values. The low ranked words may be selected for performing linear transformation.
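The ranking step can be sketched as below. The specification does not say how many words count as "low ranked", so the `low_fraction` cut-off, the function name `split_by_rank`, and the scores in the usage lines are hypothetical values chosen only to show the mechanics.

```python
def split_by_rank(scores, low_fraction=0.5):
    """Rank words by score (highest first) and split them into
    (top_ranked, low_ranked).

    `low_fraction` is a hypothetical parameter: the share of words,
    taken from the bottom of the ranking, treated as low ranked.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    if not ranked:
        return [], []
    n_low = max(1, int(len(ranked) * low_fraction))
    cut = len(ranked) - n_low
    return ranked[:cut], ranked[cut:]


# Made-up scores, not values computed from any real corpus.
example_scores = {"maddad": 0.9, "kar": 0.2, "hu": 0.1, "kaise": 0.5}
top_ranked, low_ranked = split_by_rank(example_scores)
```

With these made-up scores, the two lowest-scoring words end up in `low_ranked` and would be the ones passed on to the linear transformation.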
In accordance with an embodiment, the vectors for the low ranked words of the first language may be transformed to vectors of the second language based on tf-idf score. In one embodiment, the vectors for the low ranked words of the first language may be transformed to vectors of the second language using a linear transformation method. It must be noted herein that the linear transformation is represented as T(v) = Av, wherein ‘A’ indicates an n×m matrix, v is written as a column vector with m coordinates, and T indicates the transformation corresponding to the matrix A, so that T(v) is a vector with n coordinates. It must be noted by a person skilled in the art that such a linear transformation method is widely known in the art, and therefore its details are not described herein for the sake of brevity. Furthermore, the vectors of the first language are aligned with the vectors of the second language.
The processor (201) may be further configured to identify one or more words surrounding the linearly transformed vectors of the second language. It must be noted herein that the one or more words identified are semantically similar to the corresponding words of the first language. The processor (201) may then calculate a second score for the surrounding one or more words of the second language by using a tf-idf technique similar to that explained above. The one or more surrounding words may then be ranked based on the second score. The processor (201) may then identify the top ranked words for multilingual language generation.
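As a minimal sketch of the mapping T(v) = Av, the function below multiplies a matrix by a word vector. The name `linear_transform` and the numbers in the usage lines are hypothetical. The specification does not say how the matrix A is obtained, so the sketch takes it as given.

```python
def linear_transform(A, v):
    """Return T(v) = A v, where A is an n x m matrix given as a list of
    n rows and v is a vector with m coordinates."""
    if any(len(row) != len(v) for row in A):
        raise ValueError("each row of A must have as many entries as v")
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# Hypothetical 3 x 2 matrix and 2-coordinate word vector.
A = [[2.0, 0.0],
     [0.0, 3.0],
     [1.0, 1.0]]
v = [1.0, 2.0]
transformed = linear_transform(A, v)
```

Here a first-language vector with m = 2 coordinates is mapped to a vector with n = 3 coordinates in the second-language space.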
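One way to read "words surrounding the linearly transformed vectors" is as a nearest-neighbour search in the second-language vector space. The sketch below assumes cosine similarity as the closeness measure, which the specification does not name. The function names, the parameter `k`, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def surrounding_words(query, vocab_vectors, k=3):
    """Return the k second-language words whose vectors are closest to
    `query`, the linearly transformed first-language vector."""
    ranked = sorted(
        vocab_vectors,
        key=lambda word: cosine(query, vocab_vectors[word]),
        reverse=True,
    )
    return ranked[:k]


# Toy second-language vectors, invented for illustration only.
vocab = {
    "help": [1.0, 0.1],
    "assist": [0.9, 0.2],
    "table": [-1.0, 0.5],
}
neighbours = surrounding_words([1.0, 0.0], vocab, k=2)
```

The candidates returned this way would then be scored and ranked with the second tf-idf score described above.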
The processor (201) may be configured to generate an appropriate multilingual language response for the user by substituting the words in the monolingual text of the first language with the top ranked words of the second language.
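The substitution itself can be sketched as a token-level replacement. The function name `substitute` and the mapping in the usage lines are hypothetical. In particular, the pairing of "maddad" with "help" is chosen only to show the mechanics and is not a result stated in the specification.

```python
def substitute(tokens, replacements):
    """Replace each first-language token that has a selected
    second-language word, and leave all other tokens unchanged."""
    return [replacements.get(token, token) for token in tokens]


tokens = "Mei aapko kaise maddad kar sakti hu".split()
# Hypothetical mapping, for illustration only.
mixed = " ".join(substitute(tokens, {"maddad": "help"}))
```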
Furthermore, in accordance with an embodiment of the invention, the multilingual language generation system may also take user feedback into consideration for mapping any inaccurate results. Mapping of these results may be used for re-training the multilingual language generation system, to output more accurate and appropriate user responses.
Now, referring to Figure 2, a pipeline of the programmed instructions executed by the system (101) for generating multilingual language is illustrated, in accordance with an embodiment of the present disclosure. In an embodiment, the processor (201) may enable the data repository (211) to store the vectors of the plurality of words in different multiple languages, wherein such vectors are generated based upon corpus word data available in different resources corresponding to each language. The processor (201) may be configured for receiving the user preference through an interface (203). In one embodiment, a user may select the user preference using the interface (203) for generating multilingual language. The user may input the preferred first language and the preferred second language as two distinct languages for generating responses in multilingual languages.
In one embodiment, the processor (201) may be configured for receiving an input, in the form of a user selection for the preferred languages, from the user device (103). The processor (201) may then be configured to generate a monolingual text using a Long Short Term Memory (LSTM) model. The monolingual text implies text generated using only one language, for example, Hindi or English. The processor (201) may be further configured to calculate a term frequency–inverse document frequency (tf-idf) score for each of the words in the generated text. The tf-idf score for each word is calculated using the following two quantities:
1. tf= term frequency / total number of terms in document, and
2. idf=1 + log (Total Number of Documents / Number of Documents with term in it).
The tf-idf score is thereby calculated by taking the product of: (tf * idf).
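For reference, the same calculation can be written in symbols. The notation is introduced here only for illustration and does not appear in the specification: f_{t,d} is the number of occurrences of term t in document d, N is the total number of documents, and n_t is the number of documents containing t.

```latex
\[
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = 1 + \log\frac{N}{n_t}, \qquad
\text{tf-idf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
\]
```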
Further, the words are ranked based on the tf-idf score. The processor (201) may be configured to identify the low and high ranked words based on the tf-idf score. The top ranked words are identified as words of high importance by the processor (201) and the low ranked words are identified as words of low importance. Furthermore, the low ranked words are considered for further processing based on tf-idf values. The low ranked words are selected for performing linear transformation.
In accordance with an embodiment, linear transformation is a process, wherein vectors for the first language are generated. Further, the vectors for the low ranked words of the first language are transformed to vectors of the second language based on tf-idf score. Furthermore, the vectors of the first language are aligned with the vectors of the second language.
The processor (201) may be further configured to identify one or more words surrounding the linearly transformed vectors of the second language. The one or more words identified are semantically similar to the corresponding words of the first language. The processor (201) may then calculate a second score for the surrounding one or more words of the second language by using tf-idf techniques. The one or more surrounding words may then be ranked based on the second score. The processor (201) may then identify the top ranked words for multilingual language generation.
The processor (201) may be configured to generate an appropriate multilingual language response for the user by substituting the words in the monolingual text of the first language with the top ranked words of the second language.
Furthermore, in the present invention, the multilingual language generation system may also take user feedback into consideration for mapping any inaccurate results. Mapping of these results may be used for re-training the multilingual language generation system, to output more accurate and appropriate user responses.
Now, referring to Figure 3, a method (300) implemented by the NLP-based system (101) for generating multilingual language is illustrated, in accordance with the embodiments of the present disclosure. The order in which the method (300) is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method (300) or alternate methods. Furthermore, the method (300) can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method (300) may be implemented in the above described system (101).
At step (301), the processor (201) may be configured for receiving an input in form of a user preference from the user device (103). In one embodiment, the input user preference may be in the form of selection of the first language and the second language for multilingual language generation.
At step (303), the processor (201) may be configured for generating a monolingual text corresponding to the first language selected by the user.
At step (305), the processor (201) may be configured for calculating a first score for each word in the generated text. The words in the generated text are ranked based on the first score. In one embodiment, the first score is calculated by using the tf-idf technique.
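As a non-limiting illustration, the first score may be computed as sketched here. The smoothed idf formula, the reference corpus used to obtain document frequencies, the placeholder tokens, and the names tfidf_scores and rank_words are assumptions made for this sketch; the disclosure states only that a tf-idf technique is used.

```python
import math
from collections import Counter


def tfidf_scores(tokens, corpus):
    """Score each distinct token by tf (within tokens) times a smoothed idf (over corpus)."""
    n_docs = len(corpus)
    counts = Counter(tokens)
    scores = {}
    for word, count in counts.items():
        df = sum(1 for doc in corpus if word in doc)       # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0      # smoothed inverse document frequency
        scores[word] = (count / len(tokens)) * idf
    return scores


def rank_words(scores):
    """Return the words ordered from top ranked to low ranked."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical reference corpus of tokenised first-language documents.
corpus = [
    ["alpha", "beta"],
    ["alpha", "gamma"],
    ["alpha", "beta", "delta"],
]

# Hypothetical generated monolingual text, already tokenised.
tokens = ["alpha", "beta", "delta"]

scores = tfidf_scores(tokens, corpus)
ranking = rank_words(scores)
low_ranked = ranking[-1]
```

Here "alpha" occurs in every reference document, so it receives the lowest score and becomes the low ranked word.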
At step (307), the processor (201) may be configured for performing linear transformation of low ranked words to form the second language words. In one embodiment, the processor (201) may be configured for identifying the low ranked and top ranked words.
At step (309), the processor (201) may be configured for identifying one or more words surrounding the linearly transformed vectors of the second language.
At step (311), the processor (201) may be configured for calculating a second score for the one or more words of the second language and ranking the one or more words based on the second score.
At step (313), the processor (201) may be configured for generating an appropriate multilingual language response for the user by replacing the corresponding low ranked word in the monolingual text of the first language with the top ranked word of the second language.
The embodiments of the present disclosure are further elaborated in the form of a working example. In the case of an online NLP-based multilingual language generation system, the following steps are performed to obtain an appropriate response to the user input preference for multilingual languages.
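As a non-limiting illustration, steps (305) to (313) may be composed as sketched here, with each scoring and lookup stage supplied as a function. The function name generate_multilingual, its parameters, the n_replace setting and the stub values are hypothetical and serve only to show how the steps connect.

```python
from typing import Callable, Dict, List


def generate_multilingual(
    tokens: List[str],
    first_score: Callable[[List[str]], Dict[str, float]],
    candidates_for: Callable[[str], List[str]],
    second_score: Callable[[List[str]], Dict[str, float]],
    n_replace: int = 1,
) -> List[str]:
    """Replace the n_replace lowest-scoring tokens with their best second-language candidate."""
    scores = first_score(tokens)                              # step (305): first score
    low_ranked = sorted(scores, key=scores.get)[:n_replace]   # step (307): pick low ranked words
    output = list(tokens)
    for word in low_ranked:
        candidates = candidates_for(word)                     # steps (307) and (309): transform, find neighbours
        if not candidates:
            continue                                          # nothing to substitute for this word
        ranked = second_score(candidates)                     # step (311): second score
        best = max(candidates, key=lambda c: ranked[c])
        output = [best if t == word else t for t in output]   # step (313): substitution
    return output


# Stub stages with hypothetical values, standing in for the real scoring and lookup.
result = generate_multilingual(
    tokens=["x", "y", "z"],
    first_score=lambda toks: {"x": 0.9, "y": 0.1, "z": 0.5},
    candidates_for=lambda w: {"y": ["p", "q"]}.get(w, []),
    second_score=lambda cands: {"p": 0.2, "q": 0.7},
)
```

Passing the stages as functions keeps the sketch independent of any particular embedding model or scoring corpus.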
In one exemplary embodiment, consider that the user selects Hindi as the first language and English as the second language for multilingual language generation. The NLP-based system then generates the monolingual text in the Hindi language. Consider that the generated monolingual text reads ‘Mei aapko kaise maddad kar sakti hu’. The system further calculates the tf-idf score for each word in the generated text, namely: Maddad, Kar, Hu, Kaise, Aapko, Mei, Sakti. These words are then ranked based on their tf-idf score values.
Further, the low ranked words are identified by using the tf-idf values. The low ranked words then undergo linear transformation to form English language words. Linear transformation essentially aligns the monolingual vectors of the Hindi language to the monolingual vectors of the English language. In the disclosed example, ‘Maddad’ is a low ranked word and therefore undergoes linear transformation to form an English language word.
Further, the semantically similar words surrounding the linearly transformed English language word are identified so that they can be ranked as high or low ranked words. In the above example, the word ‘Maddad’, after linear transformation, generates an English word which has the following semantically similar words in its surrounding: ‘Help, Aid, Assist’. The system further calculates the second tf-idf score for each of these semantically similar words. The words are then ranked based on their tf-idf score values.
Lastly, an appropriate multilingual language response is generated by replacing the low ranked word in the monolingual text ‘Mei aapko kaise maddad kar sakti hu’ with the top ranked word of the English language. Here, the top ranked word based on the tf-idf score turns out to be ‘Help’. Hence, the multilingual language response generated by the NLP-based system in this example would be ‘Mei aapko kaise help kar sakti hu’, wherein the word ‘Maddad’ of the Hindi language is replaced with the word ‘Help’ of the English language.
In another exemplary embodiment, consider that the user selects Spanish as the first language and English as the second language for multilingual language generation. The NLP-based system then generates the monolingual text in the Spanish language. Consider that the generated monolingual text reads ‘Hola, como puedo ayudarte’ in this exemplary embodiment. The system further calculates the tf-idf score for each word in the generated text, namely: Hola, como, puedo, ayudarte. These words are then ranked based on their tf-idf score values.
Further, the low ranked words are identified by using the tf-idf values. The low ranked words then undergo linear transformation to form English language words. Linear transformation essentially aligns the monolingual vectors of the Spanish language to the monolingual vectors of the English language. In the disclosed example, ‘Hola’ is a low ranked word and therefore undergoes linear transformation to form an English language word.
Further, the semantically similar words surrounding the linearly transformed English language word are identified so that they can be ranked as high or low ranked words. In the above example, the word ‘Hola’, after linear transformation, generates an English word which has the following semantically similar words in its surrounding: ‘Hello, Greetings, Welcome’. The system further calculates the second tf-idf score for each of these semantically similar words. The words are then ranked based on their tf-idf score values.
Lastly, an appropriate multilingual language response is generated by replacing the low ranked word in the monolingual text ‘Hola, como puedo ayudarte’ with the top ranked word of the English language. Here, the top ranked word based on the tf-idf score turns out to be ‘Hello’. Hence, the multilingual language response generated by the NLP-based system in this example would be ‘Hello, como puedo ayudarte’, wherein the word ‘Hola’ of the Spanish language is replaced with the word ‘Hello’ of the English language.
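Both working examples end with the same substitution step, which may be sketched as a whole-word, case-insensitive replacement that leaves surrounding punctuation untouched. The function name substitute_word and the use of a regular expression are assumptions made for this sketch; the low ranked word and its top ranked replacement are taken from the examples above.

```python
import re


def substitute_word(sentence: str, source_word: str, replacement: str) -> str:
    """Replace whole-word, case-insensitive occurrences of source_word with replacement."""
    pattern = r"\b" + re.escape(source_word) + r"\b"
    # A function is used as the replacement so that it is inserted literally.
    return re.sub(pattern, lambda match: replacement, sentence, flags=re.IGNORECASE)


hindi_response = substitute_word("Mei aapko kaise maddad kar sakti hu", "Maddad", "help")
spanish_response = substitute_word("Hola, como puedo ayudarte", "Hola", "Hello")

print(hindi_response)    # Mei aapko kaise help kar sakti hu
print(spanish_response)  # Hello, como puedo ayudarte
```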
It must be noted herein that although the above exemplary embodiments have been described considering Hindi, English and Spanish as the preferred languages, the present disclosure is not limited to generating multilingual language in these languages alone, but may be extended to any other known language. Further, although the aforementioned description has been described considering bilingual (e.g. Hindi-English, Spanish-English) text generation, the present disclosure can be extended to enable the generation of a multilingual text containing more than two languages. A person skilled in the art can easily generate a multilingual text containing two or more languages based on the methodology implemented by the present system (101) as described above.
The NLP-based multilingual language generation system (101) as described in the present disclosure may provide multiple advantages including but not limited to:
• The system (101) discloses an approach for generating more than one language per sentence, giving virtual assistants or chatbots a human-like conversation ability.
• The disclosed system (101) proposes a way to achieve the multilingual language conversation ability in an optimized way, without the requirement of any additional datasets (i.e. does not require mixed language dataset).
• The disclosed system (101) provides an ability to identify and understand a multilingual representation in a single text.
• The disclosed system (101) further provides an ability to create multiple language translation/transcription of the same text.
The embodiments, examples and alternatives of the preceding paragraphs or the description and drawings, including any of their various aspects or respective individual features, may be taken independently or in any combination. Features described in connection with one embodiment are applicable to all embodiments, unless such features are incompatible.
Although implementations for the NLP-based multilingual language generation system and the method thereof have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for the NLP-based multilingual language generation system and the method thereof.
The foregoing description shall be interpreted as illustrative and not in any limiting sense. A person of ordinary skill in the art would understand that certain modifications could come within the scope of this disclosure. For limiting the scope of the invention, a subsequent Complete Specification will be filed to determine the true scope and content of this disclosure.
| # | Name | Date |
|---|---|---|
| 1 | 202021011848-IntimationOfGrant20-03-2024.pdf | 2024-03-20 |
| 1 | 202021011848-STATEMENT OF UNDERTAKING (FORM 3) [19-03-2020(online)].pdf | 2020-03-19 |
| 2 | 202021011848-PatentCertificate20-03-2024.pdf | 2024-03-20 |
| 2 | 202021011848-REQUEST FOR EXAMINATION (FORM-18) [19-03-2020(online)].pdf | 2020-03-19 |
| 3 | 202021011848-POWER OF AUTHORITY [19-03-2020(online)].pdf | 2020-03-19 |
| 3 | 202021011848-CLAIMS [18-04-2022(online)].pdf | 2022-04-18 |
| 4 | 202021011848-FORM 18 [19-03-2020(online)].pdf | 2020-03-19 |
| 4 | 202021011848-FER_SER_REPLY [18-04-2022(online)].pdf | 2022-04-18 |
| 5 | 202021011848-OTHERS [18-04-2022(online)].pdf | 2022-04-18 |
| 5 | 202021011848-FORM 1 [19-03-2020(online)].pdf | 2020-03-19 |
| 6 | 202021011848-FIGURE OF ABSTRACT [19-03-2020(online)].pdf | 2020-03-19 |
| 6 | 202021011848-FER.pdf | 2021-11-01 |
| 7 | 202021011848-FORM 3 [21-09-2020(online)].pdf | 2020-09-21 |
| 7 | 202021011848-DRAWINGS [19-03-2020(online)].pdf | 2020-03-19 |
| 8 | 202021011848-COMPLETE SPECIFICATION [19-03-2020(online)].pdf | 2020-03-19 |
| 8 | 202021011848-Proof of Right [15-07-2020(online)].pdf | 2020-07-15 |
| 9 | Abstract1.jpg | 2020-04-20 |
| 1 | NPL1E_28-10-2021.pdf |
| 1 | SearchHistory(53)E_28-10-2021.pdf |