
System And Method For Generation Of Synthetic Words For Flagging Out Of Vocabulary Words

Abstract: The present disclosure provides a system and a method for generation of synthetic words for flagging out of vocabulary (OOV) words. The system generates a pipeline for detection of OOV words with synthetic word handling where words are used in conjunction with context specificity and domain awareness. The system enables detection of synthetic words along with a phonetic awareness to generate synthetic words while retaining the context and enhancing the vocabulary.


Patent Information

Application #
Filing Date
29 April 2022
Publication Number
44/2023
Publication Type
INA
Invention Field
COMMUNICATION
Status
Parent Application

Applicants

JIO PLATFORMS LIMITED
Office-101, Saffron, Nr. Centre Point, Panchwati 5 Rasta, Ambawadi, Ahmedabad - 380006, Gujarat, India.

Inventors

1. JOSHI, Prasad Pradip
Bungalow #34 & 35, ‘Pratisaad’, Meadow gate CHS, Lodha Heaven, Palava, Dombivli - 421204, Maharashtra, India.
2. CHEMUDUPATI, Rajiv
189 Skylite Vesta, NH207, Sarjapur, Anekal Taluk, Bangalore - 562125, Karnataka, India.
3. BANERJEE, Soham
22254, Prestige Lakeside Habitat, Gunjur, Bengaluru, Karnataka - 560087, Karnataka, India.

Specification

RESERVATION OF RIGHTS
[0001] A portion of the disclosure of this patent document contains material which is subject to intellectual property rights such as, but not limited to, copyright, design, trademark, integrated circuit (IC) layout design, and/or trade dress protection, belonging to Jio Platforms Limited (JPL) or its affiliates (hereinafter referred to as the owner). The owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights whatsoever. All rights to such intellectual property are fully reserved by the owner.

FIELD OF INVENTION
[0002] The embodiments of the present disclosure generally relate to systems and methods for natural language processing (NLP), natural language understanding, and text mining. More particularly, the present disclosure relates to a system and a method for generation of synthetic words for flagging out of vocabulary (OOV) words.

BACKGROUND OF INVENTION
[0003] The following description of the related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section is used only to enhance the understanding of the reader with respect to the present disclosure, and not as admissions of the prior art.
[0004] Vocabulary consists of words and sub word character n-grams that create semantic expressions in any language. Most freely available dictionaries do not provide access to words from an Indian context. Also, with the ever-expanding list of shorthand languages on social media, out of vocabulary (OOV) words are not included in various dictionary representations. Solutions provided may be feasible for an individual domain but not applicable to a broad spectrum. Current systems lack relevance of search as relationships between words and word disambiguation are not captured. Further, current systems include unnecessary word corrections and new words are never learnt or referred for future needs.
[0005] There is, therefore, a need in the art to provide a system and a method that can mitigate the problems associated with the prior arts.

OBJECTS OF THE INVENTION
[0006] Some of the objects of the present disclosure, which at least one embodiment herein satisfies, are listed herein below.
[0007] It is an object of the present disclosure to provide a system and a method that can identify a word which is not a part of vocabulary and helps in spelling correction and information retrieval.
[0008] It is an object of the present disclosure to provide a system and a method that improves relevance of search by capturing relationships between words and word disambiguation.
[0009] It is an object of the present disclosure to provide a system and a method that avoids unnecessary word correction with the awareness of Indic context, shortened forms, and Urban dictionaries.
[0010] It is an object of the present disclosure to provide a system and a method that facilitates learning of new words by assimilating data from multiple sources which can be referred to for future needs.
[0011] It is an object of the present disclosure to provide a system and a method that enables synthetic word handling of Indic and English words (verb forms) used in conjunction with context specificity and domain awareness.
[0012] It is an object of the present disclosure to provide a system and a method that involves detection of words along with their phonetic awareness to capture the different variants.
[0013] It is an object of the present disclosure to provide a system and a method that provides Indic awareness, out of vocabulary (OOV) correction, and detection, thereby enhancing search relevance and potential revenue.

SUMMARY
[0014] This section is provided to introduce certain objects and aspects of the present disclosure in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
[0015] In an aspect, the present disclosure relates to a system for synthetic word detection. The system may include a processor, and a memory operatively coupled to the processor, where the memory stores instructions to be executed by the processor. The processor may receive one or more data parameters from one or more computing devices. One or more users may operate the one or more computing devices. The processor may extract the received one or more data parameters to generate pre-processed data. The received one or more data parameters may be based on a plurality of word vectors. The processor may enable a phonetic mapping of the pre-processed data to generate a plurality of word combinations in two or more languages based on the extracted one or more data parameters. The processor may encode the plurality of word combinations to generate one or more synthetic words and enable the synthetic word detection based on the phonetic mapping of the pre-processed data.
[0016] In an embodiment, the received one or more data parameters may include any or a combination of an abbreviated form, a short text, a synthetic word, and a disambiguated word.
[0017] In an embodiment, the processor may be configured to enable the phonetic mapping of the pre-processed data via a plurality of canonical and expanded forms.
[0018] In an embodiment, the processor may be configured to encode the plurality of word combinations by utilizing one or more combinations of Indic words and English verb forms.
[0019] In an embodiment, the processor may be configured to generate one or more phonetic codes to determine a plurality of synthetic word spell variations using one or more techniques.
[0020] In an embodiment, the one or more techniques may include any or a combination of a Soundex technique and a Metaphone technique.
[0021] In an embodiment, the processor may be configured to generate a custom database from a plurality of data sources for storing the generated one or more synthetic words.
[0022] In an aspect, the present disclosure relates to a method for synthetic word detection. The method may include receiving, by a processor, one or more data parameters from one or more computing devices. One or more users may operate the one or more computing devices. The method may include extracting, by the processor, the received one or more data parameters to generate pre-processed data. The received one or more data parameters may be based on a plurality of word vectors. The method may include enabling, by the processor, a phonetic mapping of the pre-processed data to generate a plurality of word combinations in two or more languages based on the extracted one or more data parameters. The method may include encoding, by the processor, the plurality of word combinations to generate one or more synthetic words and enabling the synthetic word detection based on the phonetic mapping of the pre-processed data.
[0023] In an embodiment, the method may include enabling, by the processor, the phonetic mapping of the pre-processed data via a plurality of canonical and expanded forms.
[0024] In an embodiment, the method may include encoding, by the processor, the plurality of word combinations by utilizing one or more combinations of Indic words and English verb forms.
[0025] In an embodiment, the method may include generating, by the processor, one or more phonetic codes to determine a plurality of synthetic word spell variations using one or more techniques.
[0026] In an embodiment, the method may include generating, by the processor, a custom database from a plurality of data sources for storing the generated one or more synthetic words.
[0027] In an aspect, a user equipment (UE) for synthetic word detection may include one or more processors communicatively coupled to a processor in a system. The one or more processors may be coupled with a memory and said memory may store instructions to be executed by the one or more processors. The one or more processors may transmit one or more data parameters to the processor via a network. The processor may be configured to receive the one or more data parameters from the UE. The processor may extract the received one or more data parameters to generate pre-processed data. The received one or more data parameters may be based on a plurality of word vectors. The processor may enable a phonetic mapping of the pre-processed data to generate a plurality of word combinations in two or more languages based on the extracted one or more data parameters. The processor may encode the plurality of word combinations to generate one or more synthetic words and enable the synthetic word detection based on the phonetic mapping of the pre-processed data.

BRIEF DESCRIPTION OF DRAWINGS
[0028] The accompanying drawings, which are incorporated herein and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems, in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes the disclosure of electrical components, electronic components, or circuitry commonly used to implement such components.
[0029] FIG. 1 illustrates an exemplary network architecture (100) of a proposed system (110), in accordance with an embodiment of the present disclosure.
[0030] FIG. 2 illustrates an exemplary block diagram (200) of a proposed system (110), in accordance with an embodiment of the present disclosure.
[0031] FIG. 3 illustrates an exemplary flow diagram (300) for creating look up dictionaries, in accordance with an embodiment of the present disclosure.
[0032] FIG. 4 illustrates an exemplary architecture (400) of a data assimilation pipeline, in accordance with an embodiment of the present disclosure.
[0033] FIG. 5 illustrates an exemplary synthetic word generation module (500), in accordance with an embodiment of the present disclosure.
[0034] FIG. 6 illustrates an exemplary computer system (600) in which or with which embodiments of the present disclosure may be implemented.
[0035] The foregoing shall be more apparent from the following more detailed description of the disclosure.

DETAILED DESCRIPTION OF THE INVENTION
[0036] In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein.
[0037] The ensuing description provides exemplary embodiments only and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth.
[0038] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail to avoid obscuring the embodiments.
[0039] Also, it is noted that individual embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[0040] The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as an open transition word without precluding any additional or other elements.
[0041] Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[0042] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
[0043] The various embodiments throughout the disclosure will be explained in more detail with reference to FIGs. 1-6.
[0044] FIG. 1 illustrates an exemplary network architecture (100) of a proposed system (110), in accordance with an embodiment of the present disclosure.
[0045] As illustrated in FIG. 1, the network architecture (100) may include a system (110). The system (110) may be connected to one or more computing devices (104-1, 104-2…104-N) via a network (106). The one or more computing devices (104-1, 104-2…104-N) may be interchangeably referred to as a user equipment (UE) (104) and may be operated by one or more users (102-1, 102-2…102-N). Further, the one or more users (102-1, 102-2…102-N) may be interchangeably referred to as a user (102) or users (102). The system (110) may generate one or more synthetic words based on one or more data parameters provided by the users (102).
[0046] In an embodiment, the computing devices (104) may include, but not be limited to, a mobile, a laptop, etc. Further, the computing devices (104) may include a smartphone, virtual reality (VR) devices, augmented reality (AR) devices, a general-purpose computer, desktop, personal digital assistant, tablet computer, and a mainframe computer. Additionally, input devices for receiving input from the user (102) such as a touch pad, touch-enabled screen, electronic pen, and the like may be used. A person of ordinary skill in the art will appreciate that the computing devices (104) may not be restricted to the mentioned devices and various other devices may be used.
[0047] In an embodiment, the network (106) may include, by way of example but not limitation, at least a portion of one or more networks having one or more nodes that transmit, receive, forward, generate, buffer, store, route, switch, process, or a combination thereof, etc. one or more messages, packets, signals, waves, voltage or current levels, some combination thereof, or so forth. The network (106) may also include, by way of example but not limitation, one or more of a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a Public-Switched Telephone Network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, or some combination thereof.
[0048] In an embodiment, the system (110) may receive one or more data parameters from the one or more computing devices (104). The received one or more data parameters may include, but not be limited to, an abbreviated form, a short text, a synthetic word, and a disambiguated word. For example, abbreviated forms (IND → India), short text (gr8 → great), synthetic words (puching → asking), and disambiguation of words with missing vowels and additional vowels (appl → apple, greaaaat → great) may be included as the received one or more data parameters. In an embodiment, the received one or more data parameters may further include out of vocabulary (OOV) words such as synthetic words (khareeding implying buying, boling implying saying). Further, the received one or more data parameters may include short text (Twitter/urban language) expansion and vowel elongations (cooool → cool, gr8 → great, etc.). Aliases (MSD → MS Dhoni, BigB → Amitabh Bachchan, etc.) may also be included as the received one or more data parameters. Further, the system (110) may be configured with Indic awareness, that is, awareness of the Indian language context, such that correct Indic words typed in English are not mistaken for typographical errors.
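By way of illustration only, the kinds of input listed above could be normalised as in the following minimal Python sketch. The tables VOCABULARY, SHORT_TEXT and ALIASES and the functions elongation_candidates and normalize are hypothetical names introduced for this example and do not appear in the disclosure; the table entries are taken from the examples above. Vowel elongation is handled by trying each run of three or more repeated letters at length one and length two and keeping the first spelling found in the vocabulary, which is an assumption of this sketch and not necessarily the technique of the disclosure.

```python
import re
from itertools import product

# Hypothetical tables; entries are taken from the examples in the text.
VOCABULARY = {"great", "cool", "apple"}
SHORT_TEXT = {"gr8": "great"}
ALIASES = {"ind": "India", "msd": "MS Dhoni", "bigb": "Amitabh Bachchan"}


def elongation_candidates(token):
    """Return spellings with each run of 3+ repeated letters cut to 1 or 2."""
    options = []
    for match in re.finditer(r"(.)\1*", token):
        run = match.group(0)
        options.append([run[0], run[0] * 2] if len(run) >= 3 else [run])
    return ["".join(parts) for parts in product(*options)]


def normalize(token):
    """Expand aliases and short text, then undo letter elongation."""
    key = token.lower()
    if key in ALIASES:
        return ALIASES[key]
    if key in SHORT_TEXT:
        return SHORT_TEXT[key]
    for candidate in elongation_candidates(key):
        if candidate in VOCABULARY:
            return candidate
    return key  # unknown tokens pass through unchanged
```

Under these assumptions, normalize("cooool") yields "cool" and normalize("greaaaat") yields "great", while a token absent from all tables, such as an Indic word typed in English, is returned unchanged and is not "corrected".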
[0049] In an embodiment, the system (110) may extract the received one or more data parameters to generate pre-processed data. The received one or more data parameters may be based on a plurality of word vectors. In an embodiment, the system (110) may enable a phonetic mapping of the pre-processed data to generate a plurality of word combinations in two or more languages based on the extracted one or more data parameters. The system (110) may generate the plurality of word combinations utilizing one or more combinations of Indic words and English verb forms and may map the pre-processed data via a plurality of canonical and expanded forms.
[0050] The system (110) may encode the plurality of word combinations to generate one or more synthetic words and enable the synthetic word detection based on the phonetic mapping of the pre-processed data. Further, the system (110) may generate one or more phonetic codes to determine a plurality of synthetic word spell variations using one or more techniques. The one or more techniques may include, but not be limited to, a Soundex technique and a Metaphone technique.
[0051] In an embodiment, the system (110) may include a text data scraping pipeline from publicly available resources and storage for the scraped data. Further, the system (110) may include a disambiguation layer where the cleaned and pre-processed text data may be mapped to the canonical and expanded forms using the phonetic mappings to account for vowel drop and elongations.
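As a non-limiting illustration of how a disambiguation layer might account for vowel drop and elongation, the following Python sketch maps a token to its canonical forms through a consonant skeleton. The skeleton is a simple orthographic stand-in chosen for this example and is not the phonetic mapping of the disclosure; the names skeleton, build_index and canonical_forms, and the sample vocabulary, are hypothetical.

```python
import re
from collections import defaultdict

VOWELS = set("aeiou")


def skeleton(word):
    """Keep the first letter, drop later vowels, collapse repeated letters."""
    word = word.lower()
    if not word:
        return ""
    kept = word[0] + "".join(ch for ch in word[1:] if ch not in VOWELS)
    return re.sub(r"(.)\1+", r"\1", kept)


def build_index(vocabulary):
    """Group canonical words by skeleton."""
    index = defaultdict(set)
    for word in vocabulary:
        index[skeleton(word)].add(word)
    return index


def canonical_forms(token, index):
    """Return the canonical words that share the token's skeleton."""
    return sorted(index.get(skeleton(token), ()))


index = build_index(["apple", "great", "cool"])
print(canonical_forms("appl", index))      # vowel drop
print(canonical_forms("greaaaat", index))  # vowel elongation
```

Because several canonical words may share one skeleton, the function returns a list of candidates; selecting among them would require context that this sketch does not model.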
[0052] In an embodiment, the system (110) may include a synthetic word generation layer where the synthetic words may be generated as a combination of Indic words and English verb forms as shown below:
i. Indic words (root) – soch (thought)
ii. English verb forms – “ing” / “ed” / “ify” / “fied”
[0053] Once the Indic words and English verb forms are combined to create the synthetic words, the phonetic codes may be generated by the system (110) to capture the synthetic word spell variations using Soundex and Metaphone techniques. A person with ordinary skill in the art may understand that Soundex and Metaphone may refer to phonetic techniques for indexing names by sound as pronounced in English. Further, Metaphone may include a wider set of English pronunciation rules and allow keys of varying length, whereas Soundex may produce a fixed-length key.
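For illustration, a minimal Python sketch of the widely documented American Soundex rules, together with a small index keyed by phonetic code, is given below. It is offered as one possible realisation and not as the implementation of the disclosure; Metaphone is omitted because its rule set is considerably larger. The names soundex and build_phonetic_index and the sample words are hypothetical.

```python
# Consonant groups of American Soundex; vowels and y carry no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the fixed-length (4 character) Soundex key of a word."""
    letters = [ch for ch in word.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue          # h and w do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code           # a vowel resets prev, so a repeat is re-coded
        if len(encoded) == 4:
            break
    return encoded.ljust(4, "0")


def build_phonetic_index(synthetic_words):
    """Group synthetic words by Soundex key to catch spelling variants."""
    index = {}
    for word in synthetic_words:
        index.setdefault(soundex(word), set()).add(word)
    return index


index = build_phonetic_index(["soching", "khareeding"])
print(index.get(soundex("sochng")))    # variant with a dropped vowel
print(index.get(soundex("kharidin")))  # variant with a different spelling
```

In this sketch, a misspelt variant is associated with a generated synthetic word whenever the two share a key. The fixed four-character key makes Soundex coarse, so unrelated words can collide; a variable-length Metaphone key would be expected to discriminate more finely.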
[0054] Although FIG. 1 shows exemplary components of the network architecture (100), in other embodiments, the network architecture (100) may include fewer components, different components, differently arranged components, or additional functional components than depicted in FIG. 1. Additionally, or alternatively, one or more components of the network architecture (100) may perform functions described as being performed by one or more other components of the network architecture (100).
[0055] FIG. 2 illustrates an exemplary block diagram (200) of a proposed system (110), in accordance with an embodiment of the present disclosure.
[0056] Referring to FIG. 2, the system (110) may comprise one or more processor(s) (202) that may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that process data based on operational instructions. Among other capabilities, the one or more processor(s) (202) may be configured to fetch and execute computer-readable instructions stored in a memory (204) of the system (110). The memory (204) may be configured to store one or more computer-readable instructions or routines in a non-transitory computer readable storage medium, which may be fetched and executed to create or share data packets over a network service. The memory (204) may comprise any non-transitory storage device including, for example, volatile memory such as random-access memory (RAM), or non-volatile memory such as erasable programmable read only memory (EPROM), flash memory, and the like.
[0057] In an embodiment, the system (110) may include an interface(s) (206). The interface(s) (206) may comprise a variety of interfaces, for example, interfaces for data input and output (I/O) devices, storage devices, and the like. The interface(s) (206) may also provide a communication pathway for one or more components of the system (110). Examples of such components include, but are not limited to, processing engine(s) (208) and a database (210), where the processing engine(s) (208) may include, but not be limited to, a data ingestion engine (212), a disambiguation engine (214), a mapping engine (216), and a word generation engine (218).
[0058] The processing engine(s) (208) may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s) (208). In examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processing engine(s) (208) may be processor-executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processing engine(s) (208) may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the machine-readable storage medium may store instructions that, when executed by the processing resource, implement the processing engine(s) (208). In such examples, the system (110) may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate but accessible to the system (110) and the processing resource. In other examples, the processing engine(s) (208) may be implemented by electronic circuitry.
[0059] In an embodiment, the processor (202) may receive one or more data parameters from one or more computing devices (104) via the data ingestion engine (212) and store the one or more data parameters in the database (210). In an embodiment, the received one or more data parameters may include, but not be limited to, an abbreviated form, a short text, a synthetic word, and a disambiguated word.
[0060] In an embodiment, the processor (202) may extract the received one or more data parameters to generate pre-processed data via the disambiguation engine (214). The received one or more data parameters may be based on a plurality of word vectors. The processor (202) may enable a phonetic mapping of the pre-processed data to generate a plurality of word combinations in two or more languages based on the extracted one or more data parameters. The processor (202) may be configured to enable the phonetic mapping of the pre-processed data via a plurality of canonical and expanded forms via the mapping engine (216).
[0061] In an embodiment, the processor (202) may encode the plurality of word combinations to generate one or more synthetic words via the word generation engine (218). The processor (202) may enable the synthetic word detection based on the phonetic mapping of the pre-processed data. The processor (202) may enable the encoding of the plurality of word combinations via the word generation engine (218) utilizing one or more combinations of Indic words and English verb forms. The processor (202) may be configured to generate one or more phonetic codes to determine a plurality of synthetic word spell variations using one or more techniques. The one or more techniques may include, but not be limited to, a Soundex technique and a Metaphone technique.
[0062] In an embodiment, the processor (202) may be configured to generate a custom database from a plurality of data sources for storing the generated one or more synthetic words.
[0063] FIG. 3 illustrates an exemplary flow diagram (300) for creating look up dictionaries, in accordance with an embodiment of the present disclosure.
[0064] A scraping layer (302) may scrape data from publicly available resources and store the scraped data. A disambiguation layer (304) may sanitize and pre-process the scraped data, where the scraped data may be mapped to canonical and expanded forms using the phonetic mappings. The output from the disambiguation layer (304) may be stored in a mapped vocabulary module (306). A synthetic word generation layer (308) may generate synthetic words as a combination of Indic words and English verb forms and store them in a look up dictionary (310). Further, information from the mapped vocabulary module (306) may be appended to the look up dictionary (310). The data collected may then be parsed and assimilated to create custom dictionaries which are domain specific. These dictionaries may be further usable for various downstream tasks including, but not limited to, spell-correction, machine translation, embedding generation, query understanding, named entity recognition, and part-of-speech tagging. Addition of these custom dictionaries may provide access to domain specific knowledge inclusive of Indic awareness.
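The flow of FIG. 3 may be sketched, purely by way of example, as a single Python function that merges the mapped vocabulary and the generated synthetic words into one look up dictionary. The function names sanitize and build_lookup, the canonical_map argument, and the sample inputs are hypothetical; scraping itself is replaced here by a list of already collected strings, and the canonical mapping is assumed to have been produced by the disambiguation layer (304).

```python
import re
from itertools import product


def sanitize(raw_text):
    """Lower-case the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", raw_text.lower())


def build_lookup(scraped_docs, canonical_map, indic_roots, verb_forms):
    """Return a dictionary mapping each surface form to its canonical form."""
    lookup = {}
    # Mapped vocabulary (306): tokens resolved to canonical forms.
    for doc in scraped_docs:
        for token in sanitize(doc):
            lookup[token] = canonical_map.get(token, token)
    # Synthetic word generation layer (308): root + verb form.
    for root, form in product(indic_roots, verb_forms):
        lookup.setdefault(root + form, root)
    return lookup


lookup = build_lookup(
    scraped_docs=["Gr8 phone, appl logo"],
    canonical_map={"gr8": "great", "appl": "apple"},
    indic_roots=["soch"],
    verb_forms=["ing", "ed"],
)
print(lookup)
```

A domain specific dictionary could be obtained in the same manner by restricting scraped_docs to sources from the domain of interest.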
[0065] FIG. 4 illustrates an exemplary architecture (400) of a data assimilation pipeline, in accordance with an embodiment of the present disclosure.
[0066] As illustrated, data may be collected from a plurality of sources such as knowledge bases, knowledge specific web crawls, and general web crawls (e-commerce websites, blogs, etc.). The proposed system (110) may receive word/short text inputs (402) from the plurality of sources. Freely available datasets on websites and the like may be parsed and assimilated to create custom dictionaries (406) that are domain specific. The system (110) may include processing (412) of textual data collected from knowledge bases, web crawls, and free datasets. The output from processing (412) may be provided to the dictionaries (406). Further, the system (110) may create a vocabulary lookup (404) to enable the phonetic mapping of the data and may generate the plurality of word combinations based on the data. The system (110) may generate a binary output (408) based on the processing of the data from the plurality of sources.
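The disclosure does not define the meaning of the binary output (408). Assuming, for illustration only, that the value 1 flags an out of vocabulary word and the value 0 indicates a known word, the check against the vocabulary lookup (404) and the dictionaries (406) might be sketched in Python as follows. The function name flag_oov and the sample sets are hypothetical.

```python
def flag_oov(token, vocabulary, synthetic_words):
    """Return 1 if the token is out of vocabulary, else 0 (assumed convention)."""
    key = token.lower()
    known = key in vocabulary or key in synthetic_words
    return 0 if known else 1


vocabulary = {"great", "apple", "soch"}
synthetic_words = {"soching", "khareeding"}

for token in ("apple", "Soching", "qwzrt"):
    print(token, flag_oov(token, vocabulary, synthetic_words))
```

Treating generated synthetic words as known is what prevents a word such as "soching" from being flagged and subjected to unnecessary correction.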
[0067] FIG. 5 illustrates an exemplary synthetic word generation module (500), in accordance with an embodiment of the present disclosure.
[0068] As illustrated, the synthetic word generation module (500) may combine words of two or more languages. For example, the synthetic word generation module (500) may combine English verb forms (502) and Indic words (504) to generate synthetic words and store the synthetic words in a synthetic word corpus (506). Further, a phonetic code generation module (508) may generate a trained model (510) based on the English verb forms (502) and the Indic words (504) for generating synthetic words and enabling the synthetic word detection.
[0069] FIG. 6 illustrates an exemplary computer system (600) in which or with which the proposed system may be implemented, in accordance with an embodiment of the present disclosure.
[0070] As shown in FIG. 6, the computer system (600) may include an external storage device (610), a bus (620), a main memory (630), a read-only memory (640), a mass storage device (650), a communication port(s) (660), and a processor (670). A person skilled in the art will appreciate that the computer system (600) may include more than one processor and communication port. The processor (670) may include various modules associated with embodiments of the present disclosure. The communication port(s) (660) may be any of an RS-232 port for use with a modem-based dialup connection, a 10/100 Ethernet port, a Gigabit or 10 Gigabit port using copper or fiber, a serial port, a parallel port, or other existing or future ports. The communication port(s) (660) may be chosen depending on a network, such as a Local Area Network (LAN), Wide Area Network (WAN), or any network to which the computer system (600) connects.
[0071] In an embodiment, the main memory (630) may be Random Access Memory (RAM), or any other dynamic storage device commonly known in the art. The read-only memory (640) may be any static storage device(s) e.g., but not limited to, a Programmable Read Only Memory (PROM) chip for storing static information e.g., start-up or basic input/output system (BIOS) instructions for the processor (670). The mass storage device (650) may be any current or future mass storage solution, which can be used to store information and/or instructions. Exemplary mass storage solutions include, but are not limited to, Parallel Advanced Technology Attachment (PATA) or Serial Advanced Technology Attachment (SATA) hard disk drives or solid-state drives (internal or external, e.g., having Universal Serial Bus (USB) and/or Firewire interfaces).
[0072] In an embodiment, the bus (620) may communicatively couple the processor(s) (670) with the other memory, storage, and communication blocks. The bus (620) may be, e.g., a Peripheral Component Interconnect (PCI) / PCI Extended (PCI-X) bus, Small Computer System Interface (SCSI), USB, or the like, for connecting expansion cards, drives, and other subsystems as well as other buses, such as a front side bus (FSB), which connects the processor (670) to the computer system (600).
[0073] In another embodiment, operator and administrative interfaces, e.g., a display, keyboard, and cursor control device may also be coupled to the bus (620) to support direct operator interaction with the computer system (600). Other operator and administrative interfaces can be provided through network connections connected through the communication port(s) (660). Components described above are meant only to exemplify various possibilities. In no way should the aforementioned exemplary computer system (600) limit the scope of the present disclosure.
[0074] While considerable emphasis has been placed herein on the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the disclosure. These and other changes in the preferred embodiments of the disclosure will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be implemented merely as illustrative of the disclosure and not as a limitation.

ADVANTAGES OF THE INVENTION
[0075] The present disclosure provides a system and a method that improves relevance of search by capturing relationships between words and word disambiguation.
[0076] The present disclosure provides a system and a method that facilitates learning of new words by assimilating data from multiple sources which can be referred to for future needs.
[0077] The present disclosure provides a system and a method that enables synthetic word handling of Indic and English words (verb forms) used in conjunction with context specificity and domain awareness.
[0078] The present disclosure provides a system and a method that involves detection of words along with their phonetic awareness to capture the different variants.
[0079] The present disclosure provides a system and a method that provides Indic awareness, out of vocabulary (OOV) correction and detection, thereby enhancing search relevance and potential revenue.
[0080] The present disclosure provides a system and a method that avoids unnecessary word correction with the awareness of Indic context, shortened forms, and urban dictionaries.
[0081] The present disclosure provides a system and a method that assimilates data from multiple sources where new words are learnt and referred to for future needs.

CLAIMS:
1. A system (110) for synthetic word detection, the system (110) comprising:
a processor (202); and
a memory (204) operatively coupled with the processor (202), wherein said memory (204) stores instructions which, when executed by the processor (202), cause the processor (202) to:
receive one or more data parameters from one or more computing devices (104), wherein one or more users (102) operate the one or more computing devices (104);
extract the received one or more data parameters to generate pre-processed data, wherein the received one or more data parameters are based on a plurality of word vectors;
enable a phonetic mapping of the pre-processed data to generate a plurality of word combinations in two or more languages based on the extracted one or more data parameters; and
encode the plurality of word combinations to generate one or more synthetic words and enable the synthetic word detection based on the phonetic mapping of the pre-processed data.

2. The system (110) as claimed in claim 1, wherein the received one or more data parameters comprise any or a combination of: an abbreviated form, a short text, a synthetic word, and a disambiguated word.

3. The system (110) as claimed in claim 1, wherein the processor (202) is configured to enable the phonetic mapping of the pre-processed data via a plurality of canonical and expanded forms.

4. The system (110) as claimed in claim 1, wherein the processor (202) is configured to encode the plurality of word combinations by utilizing one or more combinations of Indic words and English verb forms.

5. The system (110) as claimed in claim 1, wherein the processor (202) is configured to generate one or more phonetic codes to determine a plurality of synthetic word spell variations using one or more techniques.

6. The system (110) as claimed in claim 5, wherein the one or more techniques comprise any or a combination of: a Soundex technique and a Metaphone technique.

7. The system (110) as claimed in claim 1, wherein the processor (202) is configured to generate a custom database from a plurality of data sources for storing the generated one or more synthetic words.

8. A method for synthetic word detection, the method comprising:
receiving, by a processor (202), one or more data parameters from one or more computing devices (104), wherein one or more users (102) operate the one or more computing devices (104);
extracting, by the processor (202), the received one or more data parameters to generate pre-processed data, wherein the received one or more data parameters are based on a plurality of word vectors;
enabling, by the processor (202), a phonetic mapping of the pre-processed data to generate a plurality of word combinations in two or more languages based on the extracted one or more data parameters; and
encoding, by the processor (202), the plurality of word combinations to generate one or more synthetic words and enable the synthetic word detection based on the phonetic mapping of the pre-processed data.

9. The method as claimed in claim 8, comprising enabling, by the processor (202), the phonetic mapping of the pre-processed data via a plurality of canonical and expanded forms.

10. The method as claimed in claim 8, comprising encoding, by the processor (202), the plurality of word combinations by utilizing one or more combinations of Indic words and English verb forms.

11. The method as claimed in claim 8, comprising generating, by the processor (202), one or more phonetic codes to determine a plurality of synthetic word spell variations using one or more techniques.

12. The method as claimed in claim 11, wherein the one or more techniques comprise any or a combination of: a Soundex technique and a Metaphone technique.

13. The method as claimed in claim 8, comprising generating, by the processor (202), a custom database from a plurality of data sources for storing the generated one or more synthetic words.

14. A user equipment (UE) (104) for synthetic word detection, the UE (104) comprising:
one or more processors communicatively coupled to a processor (202) in a system (110), wherein the one or more processors are coupled with a memory, and wherein said memory stores instructions, which when executed by the one or more processors, cause the one or more processors to:
transmit one or more data parameters to the processor (202) via a network (106),
wherein the processor (202) is configured to:
receive the one or more data parameters from the UE (104);
extract the received one or more data parameters to generate pre-processed data, wherein the received one or more data parameters are based on a plurality of word vectors;
enable a phonetic mapping of the pre-processed data to generate a plurality of word combinations in two or more languages based on the extracted one or more data parameters; and
encode the plurality of word combinations to generate one or more synthetic words and enable the synthetic word detection based on the phonetic mapping of the pre-processed data.

Documents

Application Documents

# Name Date
1 202221025307-STATEMENT OF UNDERTAKING (FORM 3) [29-04-2022(online)].pdf 2022-04-29
2 202221025307-PROVISIONAL SPECIFICATION [29-04-2022(online)].pdf 2022-04-29
3 202221025307-POWER OF AUTHORITY [29-04-2022(online)].pdf 2022-04-29
4 202221025307-FORM 1 [29-04-2022(online)].pdf 2022-04-29
5 202221025307-DRAWINGS [29-04-2022(online)].pdf 2022-04-29
6 202221025307-DECLARATION OF INVENTORSHIP (FORM 5) [29-04-2022(online)].pdf 2022-04-29
7 202221025307-ENDORSEMENT BY INVENTORS [28-04-2023(online)].pdf 2023-04-28
8 202221025307-DRAWING [28-04-2023(online)].pdf 2023-04-28
9 202221025307-CORRESPONDENCE-OTHERS [28-04-2023(online)].pdf 2023-04-28
10 202221025307-COMPLETE SPECIFICATION [28-04-2023(online)].pdf 2023-04-28
11 202221025307-FORM-26 [01-05-2023(online)].pdf 2023-05-01
12 202221025307-Covering Letter [01-05-2023(online)].pdf 2023-05-01
13 202221025307-FORM-8 [02-05-2023(online)].pdf 2023-05-02
14 202221025307-FORM 18 [02-05-2023(online)].pdf 2023-05-02
15 202221025307-CORRESPONDENCE (IPO)(WIPO DAS)-12-05-2023.pdf 2023-05-12
16 Abstract1.jpg 2023-06-20
17 202221025307-FER.pdf 2025-04-02
18 202221025307-FORM 3 [02-07-2025(online)].pdf 2025-07-02
19 202221025307-FORM-5 [01-10-2025(online)].pdf 2025-10-01
20 202221025307-FORM-26 [01-10-2025(online)].pdf 2025-10-01
21 202221025307-FER_SER_REPLY [01-10-2025(online)].pdf 2025-10-01
22 202221025307-CORRESPONDENCE [01-10-2025(online)].pdf 2025-10-01
23 202221025307-COMPLETE SPECIFICATION [01-10-2025(online)].pdf 2025-10-01
24 202221025307-CLAIMS [01-10-2025(online)].pdf 2025-10-01

Search Strategy

1 202221025307E_09-10-2024.pdf