
System And Method For Training Translation Models

Abstract: The present disclosure relates to a method and a system for training language translation models. The method includes creating a training dataset with a set of source language sequences and a corresponding set of target language sequences, assigning one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in the training dataset, creating a corpus of unique tokens associated with the one or more stream tags, and training a machine learning model to translate input sequences into output sequences, wherein the model assigns a higher prediction score to tokens in the corpus that are associated with the stream tags in the input sequence compared to tokens not associated with the stream tags during generation of the output sequence.


Patent Information

Application #: 202341026751
Filing Date: 11 April 2023
Publication Number: 42/2024
Publication Type: INA
Invention Field: COMPUTER SCIENCE

Applicants

Flipkart Internet Private Limited
Building Alyssa Begonia & Clover, Embassy Tech Village, Outer Ring Road, Devarabeesanahalli Village, Bengaluru - 560103, Karnataka, India.

Inventors

1. PATIL, Amey
Flipkart Internet Private Limited, Building Alyssa Begonia & Clover, Embassy Tech Village, Outer Ring Road, Devarabeesanahalli Village, Bengaluru - 560103, Karnataka, India.
2. GARERA, Nikesh
Flipkart Internet Private Limited, Building Alyssa Begonia & Clover, Embassy Tech Village, Outer Ring Road, Devarabeesanahalli Village, Bengaluru - 560103, Karnataka, India.

Specification

Description:

TECHNICAL FIELD
[0001] The present disclosure relates generally to translation models. In particular, the present disclosure relates to a method and a system for training translation models.

BACKGROUND
[0002] Background description includes information that may be useful in understanding the present disclosure. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed disclosure, or that any publication specifically or implicitly referenced is prior art.
[0003] The process of translating text from one language to another is a one-to-many task involving several complex sub-tasks, such as understanding a source language and accurately conveying the meaning of the source text in a target language. The task becomes even more complex when translating texts with different styles, such as texts in different genres, different contexts, or different levels of formality. Currently, there are two main approaches to translating texts into different styles. The first approach is to fine-tune models to be specific to one particular domain, such as a genre or a style. For example, if a company wants to create a translation model for legal documents, it would train a model on a corpus of legal documents to create a model that is specific to that domain. The limitation of this approach is that managing multiple individual models for each language or style may be time-consuming and costly, and the final model may be slightly inferior to a model that has been trained on a larger, multi-domain dataset.
[0004] The second approach is to control translation style through domain-specific language model rescoring along with transformer decoder beam search. This approach does not require training a separate model for each language or style. However, as this approach is independent of the translation model, said translation model may not have any information on the desired translation choices, i.e. the user may not be able to choose a desired style of the translated language.
[0005] Further, existing translation models do not perform well with Indian languages, and do not reliably translate input texts into a linguistic style specified by a user. Specifically, existing models do not perform well when the user makes choices as to whether the translation must be made using colloquial or non-colloquial language.
[0006] There is, therefore, a need for a method and a system to address the above-mentioned problems.

OBJECTS OF THE INVENTION
[0007] An object of the present disclosure is to provide a method and a system for training translation models.
[0008] Another object of the present disclosure is to train a single model for translating input into multiple languages.
[0009] Another object of the present disclosure is to train a single model for translating inputs into multiple linguistic styles.
[0010] Yet another object of the present disclosure is to train a translation model using stream tags.
[0011] The other objects and advantages of the present disclosure will be apparent from the following description when read in conjunction with the accompanying drawings, which are incorporated for illustration of the preferred embodiments of the present disclosure and are not intended to limit the scope thereof.

SUMMARY
[0012] Aspects of the present disclosure relate generally to translation models. In particular, the present disclosure relates to a method and a system for training translation models.
[0013] In an aspect, a method for training language translation models may include creating, by a processor, a training dataset with a plurality of language sequence pairs, each language sequence pair having a set of source language sequences and a corresponding set of target language sequences. In an embodiment, the method may include assigning, by the processor, one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in the training dataset. Further, the method may include creating, by the processor, a corpus of tokens that may include a repository of unique tokens in the training dataset, each token in the corpus being associated with the one or more stream tags in the source language sequence and the target language sequence. In an embodiment, the method may include training, by the processor, a machine learning model with the training dataset to translate an input sequence received from a user into an output sequence, where the translation of the input sequence into the output sequence may include assigning a higher prediction score to tokens in the corpus associated with the one or more stream tags in the input sequence when compared to tokens in the corpus not associated with said one or more stream tags during generation of the output sequence.
[0014] In an embodiment, the set of source language sequences may be associated with a first natural language and the set of target language sequences may be associated with a second natural language.
[0015] In an embodiment, the one or more stream tags may be assigned to the training dataset by appending or prepending said one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in each of the plurality of language sequence pairs, where the one or more stream tags may be assigned based on a source of each of the plurality of language sequence pairs.
[0016] In an embodiment, each stream tag may be indicative of one or more linguistic attributes associated with the set of source language sequences and the set of target language sequences, said one or more linguistic attributes being any one or more of context, domain, style, and category of the respective sequence.
[0017] In an embodiment, the method may include generating, by the processor, the output sequence by sequentially selecting the tokens from the corpus based on a prediction score assigned to each token after each selection.
[0018] In an aspect, a system for training language translation models may include a processor, and a memory coupled to the processor, where the memory may include processor-executable instructions, which on execution, cause the processor to create a training dataset with a plurality of language sequence pairs, each language sequence pair having a set of source language sequences and a corresponding set of target language sequences. In an embodiment, the system may assign one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in the training dataset. In an embodiment, the system may create a corpus of tokens that may include a repository of unique tokens in the training dataset, each token in the corpus being associated with the one or more stream tags in the source language sequence and the target language sequence. In an embodiment, the system may train a machine learning model with the training dataset to translate an input sequence received from a user into an output sequence, where the translation of the input sequence into the output sequence may include assigning a higher prediction score to tokens in the corpus associated with the one or more stream tags in the input sequence when compared to tokens in the corpus not associated with said one or more stream tags during generation of the output sequence.
[0019] In an embodiment, the set of source language sequences may be associated with a first natural language and the set of target language sequences may be associated with a second natural language.
[0020] In an embodiment, the one or more stream tags may be assigned to the training dataset by appending or prepending said one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in each of the plurality of language sequence pairs, where the one or more stream tags may be assigned based on a source of each of the plurality of language sequence pairs.
[0021] In an embodiment, each stream tag may be indicative of one or more linguistic attributes associated with the set of source language sequences and the set of target language sequences, said one or more linguistic attributes being any one or more of context, domain, style, and category of the respective sequence.
[0022] In an embodiment, the system may generate the output sequence by sequentially selecting the tokens from the corpus based on a prediction score assigned to each token after each selection.
[0023] A non-transitory computer-readable medium for training language translation models may include processor-executable instructions that may cause a processor to create a training dataset with a plurality of language sequence pairs, each language sequence pair having a set of source language sequences and a corresponding set of target language sequences, assign one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in the training dataset, create a corpus of tokens that may include a repository of unique tokens in the training dataset, each token in the corpus being associated with the one or more stream tags in the source language sequence and the target language sequence, and train a machine learning model with the training dataset to translate an input sequence received from a user into an output sequence, where the translation of the input sequence into the output sequence may include assigning a higher prediction score to tokens in the corpus associated with the one or more stream tags in the input sequence when compared to tokens in the corpus not associated with said one or more stream tags during generation of the output sequence.
[0024] Various objects, features, aspects, and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.

BRIEF DESCRIPTION OF THE DRAWINGS
[0025] The accompanying drawings are included to provide a further understanding of the present disclosure and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
[0026] FIG. 1 illustrates an exemplary network architecture for implementing a proposed system for training translation models, according to embodiments of the present disclosure.
[0027] FIG. 2 illustrates an exemplary block diagram representation of the proposed system, according to embodiments of the present disclosure.
[0028] FIGs. 3A-C illustrate exemplary representations of prediction scores assigned to tokens by the proposed system, according to embodiments of the present disclosure.
[0029] FIG. 4 illustrates a flow chart depicting a method for training translation models, according to embodiments of the present disclosure.
[0030] FIG. 5 illustrates a hardware platform for the implementation of the proposed system, according to embodiments of the present disclosure.
[0031] The foregoing shall be more apparent from the following more detailed description of the invention.

DETAILED DESCRIPTION
[0032] In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein.
[0033] The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth.
[0034] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
[0035] Also, it is noted that individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[0036] The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive—in a manner similar to the term “comprising” as an open transition word—without precluding any additional or other elements.
[0037] As used herein, “connect,” “configure,” “couple,” and their cognate terms, such as “connects,” “connected,” “configured,” and “coupled,” may include a physical connection (such as a wired/wireless connection), a logical connection (such as through logical gates of a semiconducting device), other suitable connections, or a combination of such connections, as may be obvious to a skilled person.
[0038] As used herein, “send,” “transfer,” “transmit,” and their cognate terms like “sending,” “sent,” “transferring,” “transmitting,” “transferred,” “transmitted,” etc. include sending or transporting data or information from one unit or component to another unit or component, wherein the content may or may not be modified before or after sending, transferring, or transmitting.
[0039] Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[0040] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
[0041] Various embodiments of the present disclosure provide a system and a method for training translation models. In particular, the present disclosure relates to a method and a system for training translation models.
[0042] In an aspect, a system for training language translation models may include a processor, and a memory coupled to the processor, where the memory may include processor-executable instructions, which on execution, cause the processor to create a training dataset with a plurality of language sequence pairs, each language sequence pair having a set of source language sequences and a corresponding set of target language sequences. In an embodiment, the system may assign one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in the training dataset. In an embodiment, the system may create a corpus of tokens that may include a repository of unique tokens in the training dataset, each token in the corpus being associated with the one or more stream tags in the source language sequence and the target language sequence. In an embodiment, the system may train a machine learning model with the training dataset to translate an input sequence received from a user into an output sequence, where the translation of the input sequence into the output sequence may include assigning a higher prediction score to tokens in the corpus associated with the one or more stream tags in the input sequence when compared to tokens in the corpus not associated with said one or more stream tags during generation of the output sequence.
[0043] FIG. 1 illustrates an exemplary network architecture 100 for implementing a proposed system 110 for training translation models, according to embodiments of the present disclosure. The network architecture 100 may include an electronic device 108, the system 110, and a centralized server 118. The system 110 may be connected to the centralized server 118 via a communication network 106.
[0044] In an embodiment, the centralized server 118 may include, but is not limited to, a stand-alone server, a remote server, a cloud computing server, a dedicated server, a rack server, a server blade, a server rack, a bank of servers, a server farm, hardware supporting a part of a cloud service or system, a home server, hardware running a virtualized server, one or more processors executing code to function as a server, one or more machines performing server-side functionality as described herein, at least a portion of any of the above, some combination thereof, and the like.
[0045] In an embodiment, the communication network 106 may be a wired communication network or a wireless communication network. The wireless communication network may be any wireless communication network capable of transferring data between entities of that network such as, but not limited to, a Bluetooth network, a Zigbee network, a Near Field Communication (NFC) network, a Wireless Fidelity (Wi-Fi) network, a Light Fidelity (Li-Fi) network, a carrier network including a circuit-switched network, a packet-switched network, a Public Switched Telephone Network (PSTN), a Content Delivery Network (CDN), the Internet, intranets, Local Area Networks (LANs), Wide Area Networks (WANs), mobile communication networks including Second Generation (2G), Third Generation (3G), Fourth Generation (4G), Fifth Generation (5G), and Sixth Generation (6G) networks, a Long-Term Evolution (LTE) network, a New Radio (NR) network, a Narrow-Band Internet of Things (NB-IoT) network, a Global System for Mobile Communications (GSM) network, a Universal Mobile Telecommunications System (UMTS) network, combinations thereof, and the like.
[0046] In an embodiment, the system 110 may be implemented by way of a single device or a combination of multiple devices that may be operatively connected or networked together. For example, the system 110 may be implemented by way of a standalone device such as the centralized server 118, and the like, and may be communicatively coupled to the electronic device 108. In another example, the system 110 may be implemented in/associated with the electronic device 108. In yet another example, the system 110 may be implemented in/associated with respective one or more computing devices (104-1, 104-2…104-N) (individually referred to as the computing device 104 and collectively referred to as the computing devices 104), associated with one or more users (102-1, 102-2…102-N) (individually referred to as the user 102 and collectively referred to as the users 102). In such a scenario, the system 110 may be replicated in each of the computing devices 104. In an embodiment, the user 102 may be a user of, but not limited to, an electronic commerce (e-commerce) platform, a marketplace, a merchant platform, a hyperlocal platform, a super-mart platform, a media platform, a service providing platform, a social networking platform, a travel/services booking platform, a messaging platform, a bot processing platform, a virtual assistance platform, an Artificial Intelligence (AI) based platform, a blockchain platform, a blockchain marketplace, and the like. In some instances, the user 102 may correspond to an entity/administrator of platforms/services.
[0047] In an embodiment, the electronic device 108 and/or the computing devices 104 may be at least one of an electrical, an electronic, an electromechanical, and a computing device. The electronic device 108 and/or the computing devices 104 may include, but is not limited to, a mobile device, a smart-phone, a Personal Digital Assistant (PDA), a tablet computer, a phablet computer, a wearable computing device, a Virtual Reality/Augmented Reality (VR/AR) device, a laptop, a desktop, a server, and the like.
[0048] In an embodiment, the system 110 may be implemented in hardware or a suitable combination of hardware and software. The system 110 or the centralized server 118 may be associated with entities (not shown).
[0049] Further, the system 110 may include a processor 112, an Input/Output (I/O) interface 114, and a memory 116. The I/O interface 114 of the system 110 may be used to receive user inputs from the computing devices 104 associated with the users 102. Further, the system 110 may also include other units such as a display unit, an input unit, an output unit, and the like; however, these are not shown in FIG. 1 for the purpose of clarity. Also, only a few units are shown in FIG. 1; however, the system 110 or the network architecture 100 may include multiple such units, or any number of such units as would be obvious to a person skilled in the art or as required to implement the features of the present disclosure.
[0050] In an embodiment, the system 110 may be a hardware device including the processor 112 executing machine-readable program instructions to train translation models in a computing environment. Execution of the machine-readable program instructions by the processor 112 may enable the proposed system 110 to train translation models. The “hardware” may comprise a combination of discrete components, an integrated circuit, an application-specific integrated circuit, a field programmable gate array, a digital signal processor, or other suitable hardware. The “software” may comprise one or more objects, agents, threads, lines of code, subroutines, separate software applications, two or more lines of code, or other suitable software structures operating in one or more software applications or on one or more processors. The processor 112 may include, for example, but is not limited to, microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuits, and any devices that manipulate data or signals based on operational instructions, and the like. Among other capabilities, the processor 112 may fetch and execute computer-readable instructions in the memory 116 operationally coupled with the system 110 for performing tasks such as data processing, input/output processing, feature extraction, and/or any other functions. Any reference to a task in the present disclosure may refer to an operation being performed, or that may be performed, on data.
[0051] In the example that follows, assume that the user 102 of the system 110 desires to use the system 110 for training translation models. In this instance, the user 102 may include any entity including, but not limited to, software developers, programmers, end-users, software testers, enterprise customers, content providers, web developers, system integrators, database administrators, and the like, who may operate the system 110 for training translation models. The system 110, when associated with the electronic device 108 or the centralized server 118, may include, but is not limited to, a touch panel, a soft keypad, a hard keypad (including buttons), and the like, through which the user 102 operates said system 110.
[0052] In an aspect, the system 110 for training language translation models may, through the processor 112, create a training dataset with a plurality of language sequence pairs, each language sequence pair having a set of source language sequences and a corresponding set of target language sequences. In an embodiment, the set of source language sequences may be associated with a first natural language and the set of target language sequences may be associated with a second natural language.
[0053] In an embodiment, the system 110 may assign one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in the training dataset. In an embodiment, the one or more stream tags may be assigned to the training dataset by appending or prepending said one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in each of the plurality of language sequence pairs, where the one or more stream tags may be assigned based on a source of each of the plurality of language sequence pairs. In an embodiment, each stream tag may be indicative of one or more linguistic attributes associated with the set of source language sequences and the set of target language sequences, said one or more linguistic attributes being, but not limited to, context, domain, style, and category of the respective sequence.
[0054] In an embodiment, the system 110 may create a corpus of tokens that may include a repository of unique tokens in the training dataset, each token in the corpus being associated with the one or more stream tags in the source language sequence and the target language sequence. In an embodiment, the system 110 may train a machine learning model with the training dataset to translate an input sequence received from a user into an output sequence, where the translation of the input sequence into the output sequence may include assigning a higher prediction score to tokens in the corpus associated with the one or more stream tags in the input sequence when compared to tokens in the corpus not associated with said one or more stream tags during generation of the output sequence.
[0055] In an embodiment, the system 110 and/or the processor 112 may be configured to generate the output sequence by sequentially selecting the tokens from the corpus based on a prediction score assigned to each token after each selection.
[0056] In an embodiment, a non-transitory computer-readable medium for training language translation models may include processor-executable instructions that cause a processor to perform the steps of the method disclosed herein.
[0057] FIG. 2 illustrates an exemplary block diagram representation of the proposed system 110, according to embodiments of the present disclosure.
[0058] In an embodiment, data 202 may include source language data 206, target language data 208, stream tag data 210, token corpus data 212, machine learning (ML) model data 214, and other data 216. In an embodiment, the data 202 may be stored in the memory 116 in the form of various data structures. Additionally, the data 202 may be organized using data models, such as relational or hierarchical data models. The other data 216 may store data, including temporary data and temporary files, generated by modules 204 for performing the various functions of the system 110.
[0059] In an embodiment, the modules 204 may include a dataset creating module 218, an assigning module 220, a corpus creating module 222, a training module 224 and other modules 226.
[0060] In an embodiment, the data 202 stored in the memory 116 may be processed by the modules 204 of the system 110. The modules 204 may be stored within the memory 116. In an example, the modules 204, communicatively coupled to the processor 112 configured in the system 110, may also be present outside the memory 116, as shown in FIG. 2, and implemented as hardware. As used herein, the term modules may refer to an Application-Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.
[0061] In an embodiment, the dataset creating module 218 may create a training dataset with a plurality of language sequence pairs, each language sequence pair having a set of source language sequences and a corresponding set of target language sequences. In an embodiment, the source language sequence and the target language sequence may be implemented as the source language data 206 and the corresponding target language data 208, respectively. In an embodiment, the set of source language sequences may be associated with a first natural language and the set of target language sequences may be associated with a second natural language. In an example, the source language data 206 may be indicative of a sentence in English and the corresponding target language data 208 may be indicative of a sentence in Hindi conveying the same meaning or connotation of the source language data 206. In an embodiment, the training dataset may be stored in a database coupled to the system 110. In an embodiment, the training dataset may be developed from a wide range of sources, and may be an imbalanced dataset with respect to the number of language sequence pairs from each domain.
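By way of a non-limiting illustration, the following is a minimal sketch of how such a training dataset of language sequence pairs might be assembled as simple in-memory records. The field names `source`, `target`, and `origin`, and the example Hindi translations, are assumptions for illustration only and are not prescribed by the disclosure.

```python
# Minimal sketch of assembling a training dataset of language sequence pairs.
# Field names (`source`, `target`, `origin`) and the sample sentences are illustrative.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SequencePair:
    source: str   # sentence in the first natural language, e.g. English
    target: str   # corresponding sentence in the second natural language, e.g. Hindi
    origin: str   # where the pair was collected from, e.g. product descriptions or user reviews

def create_training_dataset(raw_pairs: List[Tuple[str, str, str]]) -> List[SequencePair]:
    """Collect (source, target, origin) tuples into a training dataset."""
    return [SequencePair(source=s, target=t, origin=o) for s, t, o in raw_pairs]

dataset = create_training_dataset([
    ("This product is amazing", "यह उत्पाद अद्भुत है", "product_description"),
    ("This product is amazing", "यह प्रोडक्ट कमाल का है", "user_review"),
])
```

Such a dataset may be imbalanced across domains, as noted above, since each origin may contribute a different number of pairs.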
[0062] In an embodiment, the assigning module 220 may assign one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in the training dataset. In an embodiment, the one or more stream tags may be implemented as stream tag data 210. In an embodiment, the one or more stream tags may be assigned to the training dataset by appending or prepending said one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in each of the plurality of language sequence pairs, wherein the one or more stream tags may be assigned based on a source of each of the plurality of language sequence pairs. In an embodiment, each stream tag may be indicative of one or more linguistic attributes associated with the set of source language sequences and the set of target language sequences, said one or more linguistic attributes being, but not limited to, context, domain, style, and category of the respective sequence.
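A minimal sketch of assigning stream tags by prepending them to both sides of each pair, based on the pair's source, is given below. The literal tag strings "&lt;description&gt;", "&lt;review&gt;", and "&lt;general&gt;" are assumptions for illustration; the disclosure does not prescribe particular tag tokens.

```python
from typing import List, Tuple

# Assumed illustrative mapping from the origin of a pair to a stream tag.
ORIGIN_TO_TAG = {
    "product_description": "<description>",
    "user_review": "<review>",
}

def assign_stream_tags(pairs: List[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Prepend the stream tag derived from each pair's origin to both source and target."""
    tagged = []
    for source, target, origin in pairs:
        tag = ORIGIN_TO_TAG.get(origin, "<general>")
        tagged.append((f"{tag} {source}", f"{tag} {target}"))
    return tagged

tagged_pairs = assign_stream_tags([
    ("This product is amazing", "यह उत्पाद अद्भुत है", "product_description"),
    ("This product is amazing", "यह प्रोडक्ट कमाल का है", "user_review"),
])
# e.g. ("<review> This product is amazing", "<review> यह प्रोडक्ट कमाल का है")
```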
[0063] In an embodiment, the corpus creating module 222 may create a corpus of tokens that may include a repository of unique tokens in the training dataset, each token in the corpus being associated with the one or more stream tags in the source language sequence and the target language sequence. In an embodiment, the corpus of tokens may be implemented as the token corpus data 212.
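The sketch below illustrates one possible way of building such a corpus: each unique token in the tagged training dataset is recorded together with the stream tags it co-occurs with. Whitespace tokenization and the tag strings are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def build_token_corpus(tagged_pairs: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Collect unique tokens and the stream tags they co-occur with."""
    token_to_tags: Dict[str, Set[str]] = defaultdict(set)
    for tagged_source, tagged_target in tagged_pairs:
        for sequence in (tagged_source, tagged_target):
            tag, *tokens = sequence.split()   # the stream tag is the prepended first token
            for token in tokens:
                token_to_tags[token].add(tag)
    return token_to_tags

corpus = build_token_corpus([
    ("<description> This product is amazing", "<description> यह उत्पाद अद्भुत है"),
    ("<review> This product is amazing", "<review> यह प्रोडक्ट कमाल का है"),
])
# corpus["उत्पाद"] -> {"<description>"}, corpus["प्रोडक्ट"] -> {"<review>"}
```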
[0064] In an embodiment, the training module 224 may train an ML model with the training dataset to translate an input sequence received from a user into an output sequence, where the translation of the input sequence into the output sequence may include assigning a higher prediction score to tokens in the corpus associated with the one or more stream tags in the input sequence when compared to tokens in the corpus not associated with said one or more stream tags during generation of the output sequence. In an embodiment, the input sequence may include one or more stream tags indicative of the style and/or language into which said input sequence is to be translated. In an embodiment, the ML model may be implemented as the ML model data 214. In an embodiment, the ML model data 214 may be indicative of any one or a combination of Rule-based Machine Translation, Statistical Machine Translation, deep neural networks, convolutional neural networks, recurrent neural networks, encoder-decoder models, and transformer models. In an embodiment, the processor 112 may be configured to generate the output sequence by sequentially selecting the tokens from the corpus based on a prediction score assigned to each token after each selection.
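By way of a non-limiting illustration, the sketch below captures the decoding-time preference described above: tokens associated with the stream tag present in the input sequence receive a higher prediction score than tokens that are not so associated, and the output is generated by sequentially selecting the best-scored token after each step. The boost factor, the renormalisation, the greedy selection, and the `next_token_scores` callable (standing in for the trained model's decoder) are assumptions for illustration; in practice, the trained ML model may learn this preference directly from the tagged training data.

```python
from typing import Callable, Dict, List, Set

def rescore(base_scores: Dict[str, float], stream_tag: str,
            token_to_tags: Dict[str, Set[str]], boost: float = 1.5) -> Dict[str, float]:
    """Raise the score of tokens associated with the input's stream tag, then renormalise."""
    boosted = {
        token: score * (boost if stream_tag in token_to_tags.get(token, set()) else 1.0)
        for token, score in base_scores.items()
    }
    total = sum(boosted.values())
    return {token: score / total for token, score in boosted.items()}

def generate(input_sequence: str,
             next_token_scores: Callable[[str, List[str]], Dict[str, float]],
             token_to_tags: Dict[str, Set[str]],
             max_len: int = 20, eos: str = "</s>") -> str:
    """Greedy sketch: after each selection, re-score candidate tokens and pick the best one."""
    stream_tag = input_sequence.split()[0]   # e.g. "<review>" prepended to the input sequence
    output: List[str] = []
    for _ in range(max_len):
        scores = rescore(next_token_scores(input_sequence, output), stream_tag, token_to_tags)
        token = max(scores, key=scores.get)
        if token == eos:
            break
        output.append(token)
    return " ".join(output)
```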
Exemplary scenario
[0065] Consider a scenario where a user 102 may require translation of a piece of text on a website in a particular style and language. In such a scenario, the piece of text to be translated may refer to a name, a description, and/or reviews associated with a product on an e-commerce website. In an embodiment, the website containing the piece of text may be communicatively coupled to the system 110 such that said piece of text may be translated when the user 102 accesses the website. The system 110 may then translate said piece of text into the required language, style, and/or dialect using the one or more stream tags associated with the piece of text. In the example that follows, the piece of text may be the input sequence indicative of an English sentence, i.e., “This product is amazing”, that is to be translated into the output sequence indicative of a Hindi sentence. Further, in the example, the sentence may be translated into a language or a linguistic style based on the one or more stream tags provided along with the English sentence.
[0066] FIG. 3A depicts an exemplary representation of the prediction score assigned to each token in the corpus by the system 110 when the one or more stream tags are not assigned to the input sequence. As shown, the system 110 may assign a prediction score of 0.34 to the word ‘उत्पाद’ and a prediction score of 0.56 to the word ‘प्रोडक्ट’ when no stream tag is provided with the input sequence. In such examples, the translation model may not show a significant preference for colloquial or non-colloquial words. FIG. 3B depicts an exemplary representation of the prediction score assigned to the tokens in the corpus by the system 110 based on the one or more stream tags associated with the input sequence. As shown, the system 110 may assign a higher prediction score of 0.84 to the word ‘उत्पाद’ compared to a score of 0.13 for the word ‘प्रोडक्ट’ when the stream tag associated with the input sequence indicates a product description. In such examples, the higher prediction score may be assigned to the word ‘उत्पाद’ as said word may be associated with the non-colloquial, formal language used in the context of providing a general description for a product. Here, the stream tag indicates to the ML model that the translated output sequence may relate to a description of the product. Further, FIG. 3C depicts an exemplary representation of the prediction score assigned to the tokens in the corpus by the system 110 based on the one or more stream tags associated with the input sequence. As shown, the system 110 may assign a higher prediction score of 0.91 to the word ‘प्रोडक्ट’ as compared to a score of 0.80 for the word ‘उत्पाद’ when the stream tag associated with the input sequence indicates a user review. In such examples, the higher prediction score may be assigned to the word ‘प्रोडक्ट’ as said word may be associated with the colloquial, informal language generally used in the context of user-provided reviews for the product. Here, the stream tag indicates to the ML model that the translated output sequence may relate to user-provided reviews. In FIGs. 3B and 3C, the system 110 may use the one or more stream tags as conditional variables for assigning prediction scores to each token in the corpus. By using one or more stream tags in the input sequence, the system 110 may be able to provide more appropriate output sequences.
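The short sketch below reproduces the prediction scores shown in FIGs. 3A-3C to illustrate how the preferred next token changes with the stream tag. The descriptive labels for the three scenarios are assumptions added for readability; only the scores are taken from the figures.

```python
# Illustration using the prediction scores shown in FIGs. 3A-3C; the scenario
# labels are assumed descriptions of the stream tags, added for readability.
scenarios = {
    "no stream tag (FIG. 3A)":         {"उत्पाद": 0.34, "प्रोडक्ट": 0.56},
    "description-style tag (FIG. 3B)": {"उत्पाद": 0.84, "प्रोडक्ट": 0.13},
    "review-style tag (FIG. 3C)":      {"उत्पाद": 0.80, "प्रोडक्ट": 0.91},
}

for name, scores in scenarios.items():
    best = max(scores, key=scores.get)
    print(f"{name}: preferred token -> {best} ({scores[best]:.2f})")
```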
[0067] FIG. 4 illustrates a flow chart depicting a method 400 for training translation models, according to embodiments of the present disclosure.
[0068] At step 402, the method 400 may include creating, by a processor such as the processor 112 of FIGs. 1 and 2, associated with a system such as 110, a training dataset with a plurality of language sequence pairs, each language sequence pair having a set of source language sequences and a corresponding set of target language sequences;
[0069] At step 404, the method 400 may include assigning, by the processor, one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in the training dataset;
[0070] At step 406, the method 400 may include creating, by the processor, a corpus of tokens including a repository of unique tokens in the training dataset, each token in the corpus being associated with the one or more stream tags in the source language sequence and the target language sequence; and
[0071] At step 408, the method 400 may include training, by the processor, an ML model with the training dataset to translate an input sequence received from a user into an output sequence, where the translation of the input sequence into the output sequence may include assigning a higher prediction score to tokens in the corpus associated with the one or more stream tags in the input sequence when compared to tokens in the corpus not associated with said one or more stream tags during generation of the output sequence.
[0072] The order in which the method 400 is described is not intended to be construed as a limitation, and any number of the described method blocks may be combined or otherwise performed in any order to implement the method 400 or an alternate method. Additionally, individual blocks may be deleted from the method 400 without departing from the spirit and scope of the present disclosure described herein. Furthermore, the method 400 may be implemented in any suitable hardware, software, firmware, or a combination thereof that exists in the related art or that is later developed. The method 400 describes, without limitation, the implementation of the system 110. A person of skill in the art will understand that method 400 may be modified appropriately for implementation in various manners without departing from the scope and spirit of the disclosure.
[0073] FIG. 5 illustrates a hardware platform 500 for implementation of the disclosed system 110, according to an example embodiment of the present disclosure. For the sake of brevity, the construction and operational features of the system 110, which are explained in detail above, are not explained in detail herein. Particularly, computing machines such as, but not limited to, internal/external server clusters, quantum computers, desktops, laptops, smartphones, tablets, and wearables may be used to execute the system 110 or may include the structure of the hardware platform 500. The hardware platform 500 may include additional components not shown, and some of the components described may be removed and/or modified. For example, a computer system with multiple Graphics Processing Units (GPUs) may be located on external cloud platforms, internal corporate cloud computing clusters, organizational computing resources, and the like.
[0074] The hardware platform 500 may be a computer system such as the system 110 that may be used with the embodiments described herein. The computer system may represent a computational platform that includes components that may be in a server or another computer system. The computer system may execute, by the processor 505 (e.g., a single or multiple processors) or other hardware processing circuit, the methods, functions, and other processes described herein. These methods, functions, and other processes may be embodied as machine-readable instructions stored on a computer-readable medium, which may be non-transitory, such as hardware storage devices (e.g., Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), hard drives, and flash memory). The computer system may include the processor 505 that executes software instructions or code stored on a non-transitory computer-readable storage medium 510 to perform methods of the present disclosure. The software code includes, for example, instructions to gather data and documents and analyze documents. In an example, the modules 204 may be software codes or components performing these steps. For example, the modules may include a dataset creating module 218, an assigning module 220, a corpus creating module 222, a training module 224, and other modules 226.
[0075] The instructions from the computer-readable storage medium 510 may be read and stored in the storage 515 or in Random Access Memory (RAM). The storage 515 may provide a space for keeping static data where at least some instructions could be stored for later execution. The stored instructions may be further compiled to generate other representations of the instructions and dynamically stored in a RAM such as the RAM 520. The processor 505 may read instructions from the RAM 520 and perform actions as instructed.
[0076] The computer system may further include the output device 525 to provide at least some of the results of the execution as output including, but not limited to, visual information to users, such as external agents. The output device 525 may include a display on computing devices and virtual reality glasses. For example, the display may be a mobile phone screen or a laptop screen, where a Graphical User Interface (GUI) and/or text may be presented as an output on the display screen. The computer system may further include an input device 530 to provide a user or another device with mechanisms for entering data and/or otherwise interacting with the computer system. The input device 530 may include, for example, a keyboard, a keypad, a mouse, or a touchscreen. Each of these output devices 525 and input device 530 may be joined by one or more additional peripherals. For example, the output device 525 may be used to display the results.
[0077] A network communicator 535 may be provided to connect the computer system to a network and, in turn, to other devices connected to the network including other clients, servers, data stores, and interfaces, for instance. The network communicator 535 may include, for example, a network adapter such as a Local Area Network (LAN) adapter or a wireless adapter. The computer system may include a data sources interface 540 to access the data source 545. The data source 545 may be an information resource. As an example, a database of exceptions and rules may be provided as the data source 545. Moreover, knowledge repositories and curated data may be other examples of the data source 545.
[0078] The present disclosure, therefore, addresses the need for a method and a system for training translation models.
[0079] While the foregoing describes various embodiments of the invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. The scope of the invention is determined by the claims that follow. The invention is not limited to the described embodiments, versions, or examples, which are included to enable a person having ordinary skill in the art to make and use the invention when combined with information and knowledge available to the person having ordinary skill in the art.

ADVANTAGES OF THE INVENTION
[0080] The present disclosure provides a method and a system for training translation models.
[0081] The present disclosure provides a method and a system for training a single model for translating input into multiple languages.
[0082] The present disclosure provides a method and a system for training a single model for translating inputs into multiple linguistic styles.
[0083] The present disclosure provides a method and a system for training a translation model using stream tags.
Claims:

1. A method for training language translation models, the method comprising:
creating, by a processor, a training dataset with a plurality of language sequence pairs, each language sequence pair having a set of source language sequences and a corresponding set of target language sequences;
assigning, by the processor, one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in the training dataset;
creating, by the processor, a corpus of tokens comprising a repository of unique tokens in the training dataset, each token in the corpus being associated with the one or more stream tags in the source language sequence and the target language sequence; and
training, by the processor, a machine learning model with the training dataset to translate an input sequence received from a user into an output sequence, wherein the translation of the input sequence into the output sequence comprises assigning a higher prediction score to tokens in the corpus associated with the one or more stream tags in the input sequence when compared to tokens in the corpus not associated with said one or more stream tags during generation of the output sequence.

2. The method as claimed in claim 1, wherein the set of source language sequences is associated with a first natural language and the set of target language sequences is associated with a second natural language.

3. The method as claimed in claim 1, wherein the one or more stream tags are assigned to the training dataset by appending or prepending said one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in each of the plurality of language sequence pairs, and wherein the one or more stream tags are assigned based on a source of each of the plurality of language sequence pairs.

4. The method as claimed in claim 1, wherein each stream tag is indicative of one or more linguistic attributes associated with the set of source language sequences and the set of target language sequences, said one or more linguistic attributes being any one or more of: context, domain, style, and category of the respective sequence.

5. The method as claimed in claim 1, comprising generating, by the processor, the output sequence by sequentially selecting the tokens from the corpus based on a prediction score assigned to each token after each selection.

6. A system for training language translation models, the system comprising:
a processor; and
a memory coupled to the processor, wherein the memory comprises processor-executable instructions, which on execution, cause the processor to:
create a training dataset with a plurality of language sequence pairs, each language sequence pair having a set of source language sequences and a corresponding set of target language sequences;
assign one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in the training dataset;
create a corpus of tokens comprising a repository of unique tokens in the training dataset, each token in the corpus being associated with the one or more stream tags in the source language sequence and the target language sequence;
train a machine learning model with the training dataset to translate an input sequence received from a user into an output sequence, wherein the translation of the input sequence into the output sequence comprises assigning a higher prediction score to tokens in the corpus associated with the one or more stream tags in the input sequence when compared to tokens in the corpus not associated with said one or more stream tags during generation of the output sequence.

7. The system as claimed in claim 6, wherein the set of source language sequences is associated with a first natural language and the set of target language sequences is associated with a second natural language.

8. The system as claimed in claim 6, wherein the one or more stream tags are assigned to the training dataset by appending or prepending said one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in each of the plurality of language sequence pairs, and wherein the one or more stream tags are assigned based on a source of each of the plurality of language sequence pairs.

9. The system as claimed in claim 6, wherein each stream tag is indicative of one or more linguistic attributes associated with the set of source language sequences and the set of target language sequences, said one or more linguistic attributes being any one or more of: context, domain, style, and category of the respective sequence.

10. The system as claimed in claim 6, wherein the processor is configured to generate the output sequence by sequentially selecting the tokens from the corpus based on a prediction score assigned to each token after each selection.

11. A non-transitory computer-readable medium for training language translation models comprising processor-executable instructions that cause a processor to:
create a training dataset with a plurality of language sequence pairs, each language sequence pair having a set of source language sequences and a corresponding set of target language sequences;
assign one or more stream tags to the set of source language sequences and the corresponding set of target language sequences in the training dataset;
create a corpus of tokens comprising a repository of unique tokens in the training dataset, each token in the corpus being associated with the one or more stream tags in the source language sequence and the target language sequence;
train a machine learning model with the training dataset to translate an input sequence received from a user into an output sequence, wherein the translation of the input sequence into the output sequence comprises assigning a higher prediction score to tokens in the corpus associated with the one or more stream tags in the input sequence when compared to tokens in the corpus not associated with said one or more stream tags during generation of the output sequence.

Documents

Application Documents

# Name Date
1 202341026751-STATEMENT OF UNDERTAKING (FORM 3) [11-04-2023(online)].pdf 2023-04-11
2 202341026751-POWER OF AUTHORITY [11-04-2023(online)].pdf 2023-04-11
3 202341026751-FORM 1 [11-04-2023(online)].pdf 2023-04-11
4 202341026751-DRAWINGS [11-04-2023(online)].pdf 2023-04-11
5 202341026751-DECLARATION OF INVENTORSHIP (FORM 5) [11-04-2023(online)].pdf 2023-04-11
6 202341026751-COMPLETE SPECIFICATION [11-04-2023(online)].pdf 2023-04-11
7 202341026751-ENDORSEMENT BY INVENTORS [22-04-2023(online)].pdf 2023-04-22
8 202341026751-FORM 18 [09-12-2024(online)].pdf 2024-12-09