Abstract: Disclosed is a system for translating sign language into spoken language, comprising: a computing device equipped with a camera for capturing images of hand gestures, wherein the computing device captures an image of a hand gesture using the camera; a server configured to acquire the captured image from the computing device; detect a sign language gesture from the acquired image using an OpenCV framework, wherein the detection involves analyzing features of the hand gesture to recognize sign language symbols; generate a textual output corresponding to the recognized sign language gesture, wherein the textual output represents the meaning of the hand gesture in a first language; translate the generated textual output into a second language text using a machine learning protocol, wherein the translation involves processing the textual output through a trained model to output the second language text; generate an audio file based on the translated second language text using a Google Text-to-Speech (GTTS) library, wherein the audio file audibly represents the translated second language text; and transmit the audio file to the computing device. Fig. 1
Description: Field of the Invention
Generally, the present disclosure relates to communication systems. Particularly, the present disclosure relates to systems for translating sign language into spoken language.
Background
The background description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
In the current era, the facilitation of effective communication stands as a cornerstone for inclusive societal participation. The emphasis on removing communication barriers for individuals who rely on sign language for interaction underscores the evolution of assistive technologies. Despite significant progress in the field of communication aids, the challenge of providing seamless and immediate translation between sign language and spoken languages remains largely unaddressed. Traditional approaches to sign language interpretation predominantly involve human intermediaries. Such reliance on manual translation processes not only introduces delays but also imposes limitations on accessibility and immediacy of communication for the deaf and hard of hearing community.
Further exploration into the domain reveals that the conventional methods of sign language interpretation suffer from a plethora of limitations. One of the most notable shortcomings is the dependency on the availability and proficiency of human interpreters. This dependency often results in significant communication delays and restricts the spontaneity of interaction, thereby impacting the quality and efficacy of communication. Moreover, the existing systems and techniques for sign language interpretation frequently lack the capacity to provide real-time translation across multiple languages. This limitation severely hampers the potential for individuals with hearing impairments to engage in cross-linguistic communications, thereby isolating them from engaging in global discourse.
Furthermore, the accuracy and reliability of translations produced by current technologies are frequently questioned. The complexity of sign language, encompassing not just hand gestures but also facial expressions and body language, presents significant challenges for automated systems. Most existing solutions fail to adequately address these nuances, leading to mistranslations and misunderstandings. Additionally, the integration of advanced technologies such as machine learning, computer vision, and natural language processing in the context of sign language translation remains inadequately explored. Such technologies hold the promise of overcoming the limitations of traditional interpretation methods by enabling more precise and nuanced understanding of sign language gestures.
Collating the issues associated with traditional sign language interpretation methods, it becomes evident that besides the challenges of delay, accuracy, and accessibility, there exists a broader issue related to the empowerment of the deaf and hard of hearing community. The current landscape of communication aids does not fully support the aspirations of individuals with hearing impairments towards achieving complete societal integration and participation. The lack of effective and inclusive communication tools not only limits personal and professional opportunities for the deaf community but also contributes to their social isolation.
In light of the above discussion, there exists an urgent need for solutions that overcome the problems associated with conventional systems and techniques for facilitating more inclusive communication and empowering individuals with hearing impairments to participate fully in various aspects of daily life.
Summary
The following presents a simplified summary of various aspects of this disclosure in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements nor delineate the scope of such aspects. Its purpose is to present some concepts of this disclosure in a simplified form as a prelude to the more detailed description that is presented later.
The following paragraphs provide additional support for the claims of the subject application.
The present disclosure relates to a system for translating sign language into spoken language. A computing device equipped with a camera captures images of hand gestures, which are then processed to detect and translate sign language into spoken words. The system comprises a server that receives these images from the computing device and utilizes an OpenCV framework to detect sign language gestures within the images. Features of the hand gestures are analyzed to recognize sign language symbols, which are then translated into textual output representing the meaning of the hand gestures in a first language. This textual output is further translated into a second language text using a machine learning protocol, and an audio file is generated from this second language text using a Google Text-to-Speech (GTTS) library. The generated audio file, which audibly represents the translated second language text, is transmitted back to the computing device for output to the user.
In an embodiment, the computing device includes a display screen that provides visual feedback to the user regarding the capture and processing of the hand gestures. This feedback includes indications of successful capture and ongoing processing of the hand gesture, enhancing the user experience by keeping them informed of the system's status.
In an embodiment, the server is equipped with a preprocessing module that enhances the quality of the captured image before the sign language gesture is detected. This preprocessing module is capable of performing operations such as noise reduction, contrast adjustment, and edge enhancement, thereby facilitating improved recognition accuracy of the sign language gestures.
In an embodiment, the OpenCV framework employed by the server for detecting sign language gestures is configured to apply machine learning algorithms. These algorithms are specifically trained to recognize a wide range of sign language gestures, encompassing both static poses and dynamic movements, thus enabling the system to accurately interpret a broad spectrum of sign language symbols.
In an embodiment, the machine learning protocol utilized for translating the generated textual output into a second language text includes a language detection component. This component is capable of automatically identifying the first language of the textual output before translation, allowing for dynamic handling of multiple source languages and enhancing the versatility of the translation process.
In an embodiment, the server can customize the voice parameters of the Google Text-to-Speech (GTTS) library used to generate the audio file. These parameters, including gender, pitch, and speed of the spoken output, can be adjusted based on user preferences or the context of the communication, providing a more personalized and accessible experience.
In an embodiment, the computing device is configured to offer user interaction options such as replaying the received audio file, adjusting volume, or requesting a retranslation of the original hand gesture image into a different language. These options provide enhanced accessibility and flexibility in usage, catering to diverse user needs and preferences.
In an embodiment, the server includes a feedback mechanism that allows users to rate the accuracy and comprehensibility of the translated spoken language output. Such feedback is utilized to continuously improve the translation and detection algorithms through machine learning techniques, thereby enhancing the overall quality and reliability of the system.
In an embodiment, the computing device and server utilize secure communication protocols for the transmission of images and audio files. This ensures privacy and data security, protecting sensitive user information and content during the translation process.
The method for translating sign language into spoken language using the system involves capturing an image of a hand gesture with the computing device's camera, acquiring this image on the server, and detecting a sign language gesture from the acquired image using an OpenCV framework. A textual output representing the meaning of the hand gesture in a first language is generated on the server, which is then translated into a second language text using a machine learning protocol. An audio file based on the translated second language text is generated using a Google Text-to-Speech (GTTS) library, and this audio file is transmitted to the computing device and outputted to the user. Through such a system and method, sign language is effectively translated into spoken language, bridging communication gaps and enhancing accessibility for individuals who rely on sign language for communication.
Brief Description of the Drawings
The features and advantages of the present disclosure would be more clearly understood from the following description taken in conjunction with the accompanying drawings in which:
FIG. 1 illustrates a system for translating sign language into spoken language, in accordance with the embodiments of the present disclosure.
FIG. 2 illustrates a method (200) for translating sign language into spoken language using the system (100), in accordance with the embodiments of the present disclosure.
FIG. 3 (FIG. 3A to FIG. 3C) illustrates a user interface for sign language recognition, in accordance with the embodiments of the present disclosure.
FIG. 4 illustrates a flow chart of model training, in accordance with the embodiments of the present disclosure.
FIG. 5 illustrates a workflow diagram of system for translating sign language into spoken language, in accordance with the embodiments of the present disclosure.
Detailed Description
In the following detailed description of the invention, reference is made to the accompanying drawings that form a part hereof, and in which is shown, by way of illustration, specific embodiments in which the invention may be practiced. In the drawings, like numerals describe substantially similar components throughout the several views. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims and equivalents thereof.
The use of the terms “a” and “an” and “the” and “at least one” and similar referents in the context of describing the invention (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The use of the term “at least one” followed by a list of one or more items (for example, “at least one of A and B”) is to be construed to mean one item selected from the listed items (A or B) or any combination of two or more of the listed items (A and B), unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.
Pursuant to the "Detailed Description" section herein, whenever an element is explicitly associated with a specific numeral for the first time, such association shall be deemed consistent and applicable throughout the entirety of the "Detailed Description" section, unless otherwise expressly stated or contradicted by the context.
FIG. 1 illustrates a system (100) for translating sign language into spoken language, in accordance with the embodiments of the present disclosure. Said system (100) comprises a computing device (102) and a server (104) which together facilitate the conversion of sign language gestures into audible speech in a second language, through a series of computational and translation processes.
A computing device (102), equipped with a camera, forms an integral part of the system. The primary function of the camera attached to the computing device (102) involves capturing images of hand gestures made by a user. Once a hand gesture is performed in front of the camera, an image of this gesture is captured by said computing device. This process is crucial for the initial acquisition of data, which is necessary for the subsequent translation of the sign language gesture into spoken language.
Upon capturing an image of a hand gesture, the system involves a server (104) configured to acquire the captured image from the computing device (102). This acquisition step is fundamental as it facilitates the transfer of the image data from the computing device to the server, where further processing and analysis occur. The seamless transfer of the captured image from the computing device to the server is essential for the timely and accurate translation of sign language gestures.
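The transfer of the captured image from the computing device (102) to the server (104) can be illustrated as a simple encoded payload exchange. The following sketch is illustrative only; the transport mechanism, field names, and device identifier are assumptions, as the disclosure does not specify a wire format:

```python
import base64
import json

def encode_capture(image_bytes: bytes, device_id: str) -> str:
    """Device side: package a captured JPEG as a JSON payload for upload."""
    return json.dumps({
        "device_id": device_id,
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
    })

def decode_capture(payload: str) -> tuple:
    """Server side: recover the device id and the raw image bytes."""
    data = json.loads(payload)
    return data["device_id"], base64.b64decode(data["image_b64"])

# Round trip: the server recovers exactly the bytes the device captured.
payload = encode_capture(b"\xff\xd8\xff\xe0fake-jpeg", "cam-01")
device, image = decode_capture(payload)
```

Base64 encoding keeps binary image data intact inside a text payload; any HTTP or socket transport could carry the resulting string.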
The server (104) is further configured to detect a sign language gesture from the acquired image using an OpenCV framework. The detection process involves analyzing features of the hand gesture to recognize sign language symbols. This analytical process is performed by examining various characteristics of the captured image, such as the shape, orientation, and position of the hand, to accurately identify the sign language gesture being made. This step is critical for understanding the intended communication conveyed by the hand gesture.
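The feature analysis described above (shape, orientation, and position of the hand) can be sketched as follows. In practice the server would use OpenCV routines such as cv2.threshold and cv2.findContours; here NumPy stands in for illustration, and the threshold value and feature set are assumptions:

```python
import numpy as np

def hand_features(gray: np.ndarray, thresh: int = 128) -> dict:
    """Extract coarse features of a hand region from a grayscale image.

    Segments the bright (hand) region, then measures its bounding box;
    a classifier would map such features to sign language symbols.
    """
    mask = gray > thresh                # crude segmentation of the hand
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return {"present": False}
    height = ys.max() - ys.min() + 1    # bounding-box height
    width = xs.max() - xs.min() + 1     # bounding-box width
    return {"present": True,
            "aspect": width / height,   # orientation cue: tall vs. wide
            "area": int(mask.sum())}    # size of the segmented region

# A tall, narrow bright region, e.g. an upright index finger.
img = np.zeros((10, 10), dtype=np.uint8)
img[1:9, 4:6] = 255
feats = hand_features(img)
```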
Following the detection of the sign language gesture, the server (104) generates a textual output corresponding to the recognized sign language gesture. This textual output represents the meaning of the hand gesture in a first language. The generation of textual output from the recognized gesture is a pivotal step in translating the non-verbal sign language into a form that can be further processed for translation into spoken language.
The server (104) is additionally configured to translate the generated textual output into a second language text using a machine learning protocol. This translation process involves processing the textual output through a trained model to output the second language text. The use of a machine learning protocol for translation ensures that the nuances and context of the original sign language gesture are accurately captured and translated into the second language.
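The translation step can be sketched with a lookup table standing in for the trained model. This is a deliberate simplification: a real system would invoke the inference API of a trained sequence-to-sequence model, and the English-to-Spanish entries below are hypothetical examples, not part of the disclosure:

```python
def translate(text: str, model: dict) -> str:
    """Word-by-word translation through a lookup 'model'.

    A stand-in for a trained machine translation model; unknown words
    pass through unchanged rather than raising an error.
    """
    return " ".join(model.get(word, word) for word in text.split())

# Hypothetical first-to-second-language entries, for illustration only.
toy_model = {"hello": "hola", "friend": "amigo"}
second_language_text = translate("hello friend", toy_model)
```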
Furthermore, the server (104) generates an audio file based on the translated second language text using a Google Text-to-Speech (GTTS) library. This audio file audibly represents the translated second language text, providing a means for individuals who do not understand sign language to comprehend the message conveyed by the sign language gesture. The conversion of the second language text into an audible format is crucial for making the information accessible to a broader audience.
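Audio generation with the gTTS library can be sketched as below. The gTTS call contacts Google's text-to-speech service, so it requires network access and is imported lazily; the file-naming helper is an assumption added for illustration:

```python
def synthesize_speech(text: str, lang: str = "es",
                      out_path: str = "speech.mp3") -> str:
    """Render the translated second-language text to an MP3 file.

    Requires the gtts package and network access to Google's service.
    """
    from gtts import gTTS  # lazy import: module loads even without gtts
    gTTS(text=text, lang=lang).save(out_path)
    return out_path

def audio_filename(gesture_id: int, lang: str) -> str:
    """Deterministic name for the generated audio file (illustrative)."""
    return f"gesture_{gesture_id}_{lang}.mp3"
```

The returned path identifies the file that the server would then transmit to the computing device.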
Lastly, the server (104) is responsible for transmitting the audio file to the computing device (102). This transmission enables the audio representation of the sign language gesture, now translated into a second language, to be played back on the computing device. This final step completes the process of translating sign language into spoken language, allowing for effective communication between individuals who use sign language and those who do not.
In an embodiment, the computing device (102) incorporated within the system (100) for translating sign language into spoken language further comprises a display screen. Said display screen is designed for presenting visual feedback to a user regarding the capture of the hand gesture. Such visual feedback includes indications that the hand gesture has been successfully captured and is being processed. The provision of visual feedback is crucial as it reassures the user that their hand gesture has been recognized by the system and is currently under analysis. This feature enhances user experience by providing immediate and understandable cues that facilitate better interaction with the system. The effectiveness of the translation process is significantly dependent on the user's ability to accurately perform hand gestures. Therefore, the incorporation of a display screen for visual feedback supports the accurate capture of hand gestures, thereby improving the overall efficiency of the translation system.
In another embodiment, the server (104) associated with the system (100) is equipped with a preprocessing module. This preprocessing module is tasked with enhancing the quality of the acquired image before the detection of the sign language gesture is initiated. Capable of performing operations such as noise reduction, contrast adjustment, and edge enhancement, the preprocessing module plays a vital role in facilitating improved recognition accuracy. By enhancing the quality of the acquired image, the preprocessing module ensures that the features of the hand gesture are more pronounced and distinguishable. This, in turn, aids the subsequent detection and recognition processes by reducing the likelihood of errors in identifying sign language gestures. The preprocessing of images is an essential step towards achieving high accuracy in gesture recognition, thereby contributing to the effectiveness of the sign language to spoken language translation system.
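The preprocessing operations named above (noise reduction and contrast adjustment) can be sketched in plain NumPy. A production system would more likely use OpenCV calls such as cv2.GaussianBlur and cv2.equalizeHist; the 3x3 kernel size here is an illustrative assumption:

```python
import numpy as np

def preprocess(gray: np.ndarray) -> np.ndarray:
    """Noise reduction (3x3 box blur) followed by min-max contrast stretch."""
    img = gray.astype(np.float64)
    # Box blur via shifted averages over a reflect-padded image.
    padded = np.pad(img, 1, mode="edge")
    blurred = sum(
        padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
        for dy in range(3) for dx in range(3)
    ) / 9.0
    # Stretch intensities to occupy the full 0-255 range.
    lo, hi = blurred.min(), blurred.max()
    if hi == lo:
        return np.zeros_like(gray)      # flat image: nothing to stretch
    return ((blurred - lo) * 255.0 / (hi - lo)).astype(np.uint8)

out = preprocess((np.arange(25).reshape(5, 5) * 10).astype(np.uint8))
```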
In a further embodiment, the OpenCV framework utilized by the server (104) for detecting the sign language gesture is enhanced to apply machine learning algorithms. These algorithms are specifically trained to recognize a wide range of sign language gestures, encompassing both static poses and dynamic movements. The integration of machine learning algorithms within the OpenCV framework enables the system to accurately identify a broader spectrum of sign language gestures. This adaptability is crucial for a system designed to cater to the diverse sign language gestures used by individuals. By leveraging machine learning algorithms trained on extensive datasets of sign language gestures, the system achieves high levels of accuracy in gesture recognition. This ensures that the translation from sign language to spoken language is precise and reflective of the intended communication.
In yet another embodiment, the machine learning protocol for translating the generated textual output into a second language text on the server (104) incorporates a language detection component. This component is capable of automatically identifying the first language of the textual output before the translation process begins. The inclusion of a language detection component enables dynamic handling of multiple source languages, enhancing the versatility of the translation system. By automatically detecting the language of the generated textual output, the system can select the appropriate language model for translation. This feature is instrumental in ensuring that the translation process is both efficient and accurate, accommodating users who communicate in sign language across different languages.
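The language detection component can be sketched with a minimal script-based heuristic. A real detector would use a dedicated library (e.g. langdetect or a fastText model); this stand-in only distinguishes Devanagari from Latin script and the language codes are assumptions:

```python
def detect_language(text: str) -> str:
    """Tiny script-based language guess: 'hi' for Devanagari, else 'en'.

    Stands in for a statistical detector; checks Unicode code-point
    ranges rather than any linguistic model.
    """
    for ch in text:
        if "\u0900" <= ch <= "\u097f":   # Devanagari Unicode block
            return "hi"
    return "en"

source_lang = detect_language("नमस्ते")
```

Once the source language is identified, the server can select the matching translation model before the machine learning protocol runs.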
In an additional embodiment, the server (104) is further configured to customize the voice parameters of the Google Text-to-Speech (GTTS) library used to generate the audio file. The customization includes adjustments to the gender, pitch, and speed of the spoken output, based on user preferences or the context of the communication. This capability to customize voice parameters allows for a more personalized and accessible experience for users. By enabling adjustments to the voice output, the system can cater to individual user preferences, thereby enhancing the comprehensibility and naturalness of the spoken language output. This customization feature plays a significant role in ensuring that the translated spoken language is both pleasant to listen to and effective in conveying the intended message.
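Mapping user preferences onto the speech engine can be sketched as below. Note one practical caveat: the gTTS library itself exposes only language, accent (via top-level domain), and a slow-speech flag; pitch and gender control would require a different engine (for example a cloud TTS API or pyttsx3), so only speed is mapped in this sketch and the preference structure is an assumption:

```python
from dataclasses import dataclass

@dataclass
class VoicePrefs:
    """User-selected voice settings, per the embodiment above."""
    gender: str = "female"
    pitch: float = 1.0
    speed: str = "normal"   # "normal" or "slow"

def to_gtts_kwargs(prefs: VoicePrefs, lang: str) -> dict:
    """Translate preferences into gTTS constructor arguments.

    Only speed maps onto gTTS's 'slow' flag; gender and pitch are kept
    in the preference object for engines that can honor them.
    """
    return {"lang": lang, "slow": prefs.speed == "slow"}

kwargs = to_gtts_kwargs(VoicePrefs(speed="slow"), "en")
```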
In a subsequent embodiment, the computing device (102) is configured to offer user interaction options for replaying the received audio file, adjusting the volume, or requesting a retranslation of the original hand gesture image into a different language. These user interaction options provide enhanced accessibility and flexibility in the usage of the system. By allowing users to replay the audio, adjust volume levels, and request retranslations, the system accommodates a wide range of user needs and preferences. This level of interaction and control ensures that users can effectively communicate and understand the translated spoken language, thereby improving the overall usability and accessibility of the system.
In another embodiment, the server (104) includes a feedback mechanism allowing users to rate the accuracy and comprehensibility of the translated spoken language output. The feedback collected through this mechanism is used to continuously improve the translation and detection algorithms through machine learning techniques. By incorporating user feedback into the system's learning process, the accuracy and reliability of the translation are enhanced over time. This approach ensures that the system remains adaptive and responsive to user needs, leading to continual improvements in performance. The use of a feedback mechanism for iterative learning and enhancement is a critical component in maintaining the effectiveness and user satisfaction of the translation system.
In a final embodiment, the computing device (102) and the server (104) utilize secure communication protocols to ensure privacy and data security during the transmission of images and audio files. Protecting sensitive user information and content is paramount in a system that handles personal communication data. The use of secure communication protocols safeguards the data against unauthorized access and breaches, thereby maintaining the confidentiality and integrity of the information being transmitted. This security measure is essential for building user trust and ensuring that the system can be safely used in various environments without compromising privacy or data security.
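One concrete realization of such secure transmission is TLS on the device-to-server link. The sketch below, using only the Python standard library, builds a client-side TLS context with certificate and hostname verification enabled; the minimum protocol version chosen is an illustrative hardening assumption:

```python
import ssl

def secure_client_context() -> ssl.SSLContext:
    """TLS context suitable for wrapping the device-to-server socket.

    create_default_context() loads the system CA certificates and turns
    on both certificate validation and hostname checking by default.
    """
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy TLS/SSL
    return ctx

ctx = secure_client_context()
```

Wrapping the upload socket with this context ensures images and audio files are encrypted in transit and that the device only talks to an authenticated server.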
FIG. 2 illustrates a method (200) for translating sign language into spoken language using the system (100), in accordance with the embodiments of the present disclosure. At step (202), the method (200) involves capturing an image of a hand gesture. This is achieved through the camera of the computing device (102), ensuring the gesture is accurately recorded for further processing. At step (204), upon capturing the image, the server (104) acquires the captured image. This transfer is essential for enabling the subsequent analysis and translation processes to take place on the server. At step (206), once acquired, the server (104) uses an OpenCV framework to detect a sign language gesture from the image, analyzing the image to identify and interpret the specific sign language gesture made. Following this detection, the server (104) generates a textual output representing the meaning of the hand gesture in a first language, converting the visual gesture into text form for translation. At step (208), the generated textual output is translated into a second language text, a process facilitated by a machine learning protocol on the server (104) to ensure accurate translation. At step (210), the server (104) uses a Google Text-to-Speech (GTTS) library to generate an audio file that audibly represents the translated second language text. At step (212), the method (200) includes transmitting the audio file from the server (104) to the computing device (102), making the translated content available to the user. At step (214), the computing device (102) outputs the received audio file to the user, allowing the user to hear the translated spoken language and completing the translation process from sign language to spoken language.
FIG. 3 (FIG. 3A to FIG. 3C) illustrates a user interface for sign language recognition, in accordance with the embodiments of the present disclosure. FIG. 3A illustrates a welcoming and illustrative depiction of sign language interaction between two individuals. FIG. 3B presents a real-time detection interface where a user’s hand gesture is being recognized by the system (100); the hand is framed by a green bounding box with an “L” label, indicating the system (100)’s recognition of the specific letter or sign. Accompanying this are controls such as 'Add', 'Speech', 'Clear', and 'Space', which allow the user to manage the input and output of the sign recognition process. FIG. 3C demonstrates the translation interface where the recognized sign is translated into text, here shown as "HELLO" in English, which can then be vocalized via a speech synthesis feature, translated into other languages, or edited by the user. The interface provides a seamless and accessible means for users to convert sign language into written and spoken language, enhancing communication for the deaf and hard of hearing community.
FIG. 4 illustrates a flow chart of model training, in accordance with the embodiments of the present disclosure. The process starts with data collection, the essential step of gathering datasets for the model's learning. Creating a threshold value follows, indicating a benchmark setup to categorize or refine the data. The next phase is data augmentation, which enhances the dataset's size and variability to prevent overfitting and improve the model's robustness. This is succeeded by label-wise classification, where data is organized according to its categorical labels, a key step in supervised learning. The process continues with data normalization, a crucial preprocessing measure that scales the data to a common range, ensuring uniformity for effective processing by the model. The normalized data is then inputted into ResNet-50, an established convolutional neural network model known for its powerful image classification capabilities. Model tuning follows, the fine-tuning of hyperparameters to optimize performance, after which testing accuracy evaluates the model on unseen data. The concluding step involves saving the model in TensorFlow Lite format, a lightweight representation ready for deployment in environments with limited computational resources.
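The normalization and conversion stages of FIG. 4 can be sketched as follows. Only the pixel-scaling step is fully shown; the ResNet-50 fine-tuning and TensorFlow Lite export import TensorFlow lazily, the data pipeline is elided, and the classifier head and input shape are illustrative assumptions:

```python
import numpy as np

def normalize(batch: np.ndarray) -> np.ndarray:
    """Scale uint8 pixel values into the [0, 1] range for the network."""
    return batch.astype(np.float32) / 255.0

def build_and_convert(num_classes: int, out_path: str = "signs.tflite") -> str:
    """Fine-tune ResNet-50 and export a TensorFlow Lite model (sketch).

    Training (model.fit) on the augmented, label-wise classified dataset
    is elided; requires the tensorflow package.
    """
    import tensorflow as tf  # lazy import: heavy dependency
    base = tf.keras.applications.ResNet50(include_top=False, pooling="avg",
                                          input_shape=(224, 224, 3))
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    # ... model.fit(train_data, validation_data=val_data, ...) ...
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    with open(out_path, "wb") as f:
        f.write(converter.convert())   # lightweight model for deployment
    return out_path

scaled = normalize(np.array([[0, 255]], dtype=np.uint8))
```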
FIG. 5 illustrates a workflow diagram of the system for translating sign language into spoken language, in accordance with the embodiments of the present disclosure. The workflow begins with input using a camera, where a camera captures the hand gestures of a sign language user. The system then moves to detecting hand sign language, employing OpenCV, a computer vision library, to interpret the gestures. The subsequent step is giving text output, where the detected gestures are converted into textual form. This text is then processed through text-to-text translation, utilizing an ML toolkit, indicating that a machine learning model is employed to translate the text from one language to another, catering to different linguistic needs. The final output is the audio output of the translated language, created using the GTTS (Google Text-to-Speech) library, enabling the system to vocalize the translated text. The workflow represents a sophisticated sequence of technological interactions, converging computer vision, machine learning, and speech synthesis to create a seamless communication aid for the hearing-impaired, ultimately fostering better inclusivity and accessibility in everyday interactions.
Example embodiments herein have been described above with reference to block diagrams and flowchart illustrations of methods and apparatuses. It will be understood that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, respectively, can be implemented by various means including hardware, software, firmware, and a combination thereof. For example, in one embodiment, each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations can be implemented by computer program instructions. These computer program instructions may be loaded onto a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data processing apparatus create means for implementing the functions specified in the flowchart block or blocks.
Throughout the present disclosure, the term ‘processing means’ or ‘microprocessor’ or ‘processor’ or ‘processors’ includes, but is not limited to, a general purpose processor (such as, for example, a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a microprocessor implementing other types of instruction sets, or a microprocessor implementing a combination of types of instruction sets) or a specialized processor (such as, for example, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), or a network processor).
The term “non-transitory storage device” or “storage” or “memory,” as used herein relates to a random access memory, read only memory and variants thereof, in which a computer can store data or software for any duration.
Operations in accordance with a variety of aspects of the disclosure described above need not be performed in the precise order described. Rather, various steps can be handled in reverse order, simultaneously, or not at all.
While several implementations have been described and illustrated herein, a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein may be utilized, and each of such variations and/or modifications is deemed to be within the scope of the implementations described herein. More generally, all parameters, dimensions, materials, and configurations described herein are meant to be exemplary, and the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the teachings are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific implementations described herein. It is, therefore, to be understood that the foregoing implementations are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, implementations may be practiced otherwise than as specifically described and claimed. Implementations of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the scope of the present disclosure.
Claims
I/We claim:
1. A system (100) for translating sign language into spoken language, comprising:
a computing device (102) equipped with a camera for capturing images of hand gestures, wherein said computing device captures an image of a hand gesture using said camera;
a server (104) configured to:
acquire said captured image from said computing device (102);
detect a sign language gesture from said acquired image using an OpenCV framework, wherein said detection involves analyzing features of said hand gesture to recognize sign language symbols;
generate a textual output corresponding to said recognized sign language gesture, wherein said textual output represents the meaning of said hand gesture in a first language;
translate said generated textual output into a second language text using a machine learning protocol, wherein said translation involves processing said textual output through a trained model to output said second language text;
generate an audio file based on said translated second language text using a Google Text-to-Speech (GTTS) library, wherein said audio file audibly represents said translated second language text; and
transmit said audio file to said computing device (102).
2. The system (100) of claim 1, wherein said computing device (102) further comprises a display screen for presenting visual feedback to a user regarding the capture of the hand gesture, and wherein said visual feedback includes indications that the hand gesture has been successfully captured and is being processed.
3. The system (100) of claim 1, wherein said server (104) further comprises a preprocessing module for enhancing the quality of said acquired image before detecting the sign language gesture, said preprocessing module capable of performing operations including noise reduction, contrast adjustment, and edge enhancement to facilitate improved recognition accuracy.
4. The system (100) of claim 1, wherein said OpenCV framework utilized by said server (104) for detecting the sign language gesture is further configured to apply machine learning algorithms specifically trained to recognize a wide range of sign language gestures, including both static poses and dynamic movements.
5. The system (100) of claim 1, wherein said machine learning protocol for translating said generated textual output into a second language text on said server (104) includes a language detection component capable of automatically identifying the first language of said textual output before translation, thereby enabling dynamic handling of multiple source languages.
6. The system (100) of claim 1, wherein said server (104) is further configured to customize the voice parameters of the Google Text-to-Speech (GTTS) library used to generate said audio file, including gender, pitch, and speed of the spoken output, based on user preferences or context of the communication.
7. The system (100) of claim 1, wherein said computing device (102) is configured to offer user interaction options for replaying the received audio file, adjusting volume, or requesting a retranslation of the original hand gesture image into a different language, providing enhanced accessibility and flexibility in usage.
8. The system (100) of claim 1, wherein said server (104) includes a feedback mechanism allowing users to rate the accuracy and comprehensibility of the translated spoken language output, such feedback being used to continuously improve the translation and detection algorithms through machine learning techniques.
9. The system (100) of claim 1, wherein said computing device (102) and said server (104) utilize secure communication protocols to ensure privacy and data security during the transmission of images and audio files, protecting sensitive user information and content.
10. A method (200) for translating sign language into spoken language using the system (100), comprising the steps of:
capturing an image of a hand gesture using the camera of the computing device (102);
acquiring the captured image on the server (104);
detecting a sign language gesture from the acquired image using an OpenCV framework on the server (104);
generating a textual output representing the meaning of the hand gesture in a first language on the server (104);
translating the generated textual output into a second language text using a machine learning protocol on the server (104);
generating an audio file based on the translated second language text using a Google Text-to-Speech (GTTS) library on the server (104);
transmitting the audio file to the computing device (102); and
outputting the received audio file to a user through the computing device (102).
SYSTEM AND METHOD FOR TRANSLATING SIGN LANGUAGE INTO SPOKEN LANGUAGE
Disclosed is a system for translating sign language into spoken language, comprising: a computing device equipped with a camera for capturing images of hand gestures, wherein the computing device captures an image of a hand gesture using the camera; a server configured to acquire the captured image from the computing device; detect a sign language gesture from the acquired image using an OpenCV framework, wherein the detection involves analyzing features of the hand gesture to recognize sign language symbols; generate a textual output corresponding to the recognized sign language gesture, wherein the textual output represents the meaning of the hand gesture in a first language; translate the generated textual output into a second language text using a machine learning protocol, wherein the translation involves processing the textual output through a trained model to output the second language text; generate an audio file based on the translated second language text using a Google Text-to-Speech (GTTS) library, wherein the audio file audibly represents the translated second language text; and transmit the audio file to the computing device.
Fig. 1
Drawings
FIG. 1
FIG. 2
FIG. 4
FIG. 5
| # | Name | Date |
|---|---|---|
| 1 | 202421033112-OTHERS [26-04-2024(online)].pdf | 2024-04-26 |
| 2 | 202421033112-FORM FOR SMALL ENTITY(FORM-28) [26-04-2024(online)].pdf | 2024-04-26 |
| 3 | 202421033112-FORM 1 [26-04-2024(online)].pdf | 2024-04-26 |
| 4 | 202421033112-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [26-04-2024(online)].pdf | 2024-04-26 |
| 5 | 202421033112-EDUCATIONAL INSTITUTION(S) [26-04-2024(online)].pdf | 2024-04-26 |
| 6 | 202421033112-DRAWINGS [26-04-2024(online)].pdf | 2024-04-26 |
| 7 | 202421033112-DECLARATION OF INVENTORSHIP (FORM 5) [26-04-2024(online)].pdf | 2024-04-26 |
| 8 | 202421033112-COMPLETE SPECIFICATION [26-04-2024(online)].pdf | 2024-04-26 |
| 9 | 202421033112-FORM-9 [07-05-2024(online)].pdf | 2024-05-07 |
| 10 | 202421033112-FORM 18 [08-05-2024(online)].pdf | 2024-05-08 |
| 11 | 202421033112-FORM-26 [13-05-2024(online)].pdf | 2024-05-13 |
| 12 | 202421033112-FORM 3 [13-06-2024(online)].pdf | 2024-06-13 |
| 13 | 202421033112-RELEVANT DOCUMENTS [17-04-2025(online)].pdf | 2025-04-17 |
| 14 | 202421033112-POA [17-04-2025(online)].pdf | 2025-04-17 |
| 15 | 202421033112-FORM 13 [17-04-2025(online)].pdf | 2025-04-17 |
| 16 | 202421033112-FER.pdf | 2025-08-06 |
| 17 | 202421033112-FORM-8 [24-10-2025(online)].pdf | 2025-10-24 |
| 18 | 202421033112-FORM-26 [24-10-2025(online)].pdf | 2025-10-24 |
| 19 | 202421033112-FER_SER_REPLY [24-10-2025(online)].pdf | 2025-10-24 |
| 20 | 202421033112-DRAWING [24-10-2025(online)].pdf | 2025-10-24 |
| 21 | 202421033112-CORRESPONDENCE [24-10-2025(online)].pdf | 2025-10-24 |
| 22 | 202421033112-COMPLETE SPECIFICATION [24-10-2025(online)].pdf | 2025-10-24 |
| 23 | 202421033112-ABSTRACT [24-10-2025(online)].pdf | 2025-10-24 |
| 24 | 202421033112E_30-08-2024.pdf | 2024-08-30 |