Abstract: Communication is the most crucial means of interacting with others. The number of deaf and dumb persons has increased in recent years due to genetic anomalies, accidents, and infectious diseases. People who are deaf or dumb cannot speak with others in the ordinary way and must rely on specialised forms of communication. Whether these messages are conveyed through lip reading, lip sync, or sign language, they may be misinterpreted. American Sign Language (ASL) is a challenging language: each sign follows a particular standard gesture, which is communicated by the hands with support from body position and facial expression. ASL is the primary language used by the deaf and hard of hearing in North America and other regions of the world. The purpose of our project is to develop a computer programme and train a model that, when shown a real-time video of hand motions in American Sign Language, presents the sign's meaning in textual form or as audio. Because the majority of individuals lack sign language proficiency and few interpreters are available, we developed a real-time neural network method for American Sign Language fingerspelling. 5 Claims & 1 Figure
Description: Field of Invention
The current invention relates to artificial intelligence, namely a system and method for translating voice and hand gestures for deaf and dumb people.
The Objectives of this Invention
The main objective is to develop a computer programme and train a model that, when shown a real-time video of hand gestures made in American Sign Language, displays the sign's meaning in written or auditory form. We developed a real-time neural network method for American Sign Language fingerspelling because most individuals lack sign language proficiency and few interpreters are available.
Background of the Invention
According to CN2011/103136986B, the sign language recognition method to which that concept refers comprises the following steps: acquire an image containing the marked region; identify the hand attitude within the marked region; generate a steering command corresponding to the identified attitude; and transform the steering command into natural-language information. A specific form of sign language interpretation system is also offered. The method and system described above can increase recognition precision.

Another application, EP2011/2574220B1, observes that the main obstacle to communication for a person with physical restrictions, such as sensory, verbal, or visual impairments, for example a deaf, dumb, or blind person, is interaction with people without disabilities. People who are dumb or deaf cannot readily understand speech directed at them. Sign language, facial expression, lip reading, and written communication tools, typically intended for intimate interactions with relative proximity between the participants, are the usual solutions to this problem. Additionally, only trained persons who can understand the information presented in this abstract form can use these communication methods. The communication issue is therefore most noticeable when a disabled person and a typical person converse.

Another method was invented in CN2013/103646587B, which describes a set of smart glasses and a way to operate them. The intelligent eyewear set comprises spectacle lenses, a frame, and legs. For dual-face display, the eyeglass lenses are transparent displays. At the front of the frame are cameras and a pick-up device used to capture a gesture command and a voice signal, respectively. In the legs of the glasses are brain wave acquisition equipment and processors. The brain wave recognition devices gather brainwave signals, and the processing units receive gesture commands, voice commands, and brainwave signals and process them. The back-side lenses of the intelligent spectacles display graphic and textual data after a gesture instruction and a voice signal have been received and processed. This allows outside information to be transformed into visual and written content that can be seen by the person wearing the spectacles.
According to Guptha et al.'s research (2022 2nd International Conference on Innovative Sustainable Computational Technologies (CISCT), Dehradun, India, 2022, pp. 1-5), Dore Idioma's primary role is translating the hand gestures used in sign language, so that communication between hearing and deaf-mute people becomes simple for everyone. It is an intuitive application that enables the user to reach the translator swiftly. Using image labelling, input is taken from the user, compared against a previously trained recognition model, cross-referenced with the sign database, and the final output is provided. Other models and software have been developed, but they lacked effectiveness, usability, and practical implementation. Dore Idioma may help improve the efficiency and utility of communication between common and deaf-mute individuals. Dore Idioma was developed using MobileNet version 2, a pre-trained Convolutional Neural Network; the model targets 99% accuracy with a total loss of 0.38. According to Herbaz et al. (2022, Journal of ICT Standardization, 10(03), 411–426), a Convolutional Neural Network (CNN)-based system for recognising Moroccan Sign Language contains a significant dataset with over 20 files, each comprising 1,000 static photos of a sign captured by the camera from various angles. The proposed CNN approach was compared with various sign language models and achieved a success rate of 99.33% and the best efficiency with a 98.7% accuracy rate.
Summary of the Invention
The critical and precisely relevant feature is that deaf and dumb people can communicate with others much more quickly because of this hand movement recognition technology. An average person finds it challenging to understand sign language, so our model's primary function is to aid comprehension and improve communication.
Detailed Description of the Invention
This project transforms the meaning of the observed motion into text, shows the translated text to the user, and then turns the text into speech. The system recognises the hand motion and searches its database for previously created gestures corresponding to the activity. When a match is found, the class label is presented to the user. Vision serves as the conceptual foundation of the system: all signs are made with the hands, and no artificial devices are used to communicate. In our search for pre-made datasets for the project, we could not locate any raw photo datasets that matched our specifications; we only came across RGB-format datasets. As a result, we created our own dataset using the OpenCV programme, following the steps below. ASL training started with roughly 400 images of each symbol. The machine's camera records each frame displayed on the screen, and each frame defines a Region of Interest (ROI), denoted by a blue-bordered square. An image is represented as a 3D matrix whose height and width are proportional to the size of the image and whose depth is the number of channels (1 for grayscale and 3 for RGB). A CNN extracts useful features from these pixel data. CNNs use visual recognition and classification to identify faces and recognise objects. They are composed of neurons with trainable weight matrices. Multiple layers serve as filters and separate the patterns from the input photos. CNNs are frequently used to classify photos, group them based on similarities, and subsequently identify objects in the images. Such algorithms can differentiate between people, signs, wild animals, and objects.
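As an illustration of the dataset-collection step described above, the following is a minimal Python sketch that draws the blue-bordered ROI on the webcam feed and saves grayscale ROI images to a per-symbol folder. The folder layout, ROI coordinates, and key bindings are assumptions chosen for illustration, not fixed by this specification.

```python
import os
import cv2

# Assumed locations and ROI coordinates, for illustration only.
DATASET_DIR = "dataset/A"          # one folder per ASL symbol
ROI = (100, 100, 400, 400)         # (x1, y1, x2, y2) of the blue-bordered square

os.makedirs(DATASET_DIR, exist_ok=True)
cap = cv2.VideoCapture(0)
count = len(os.listdir(DATASET_DIR))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    x1, y1, x2, y2 = ROI
    cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)  # blue ROI border
    cv2.imshow("collect", frame)

    key = cv2.waitKey(1) & 0xFF
    if key == ord("c"):            # press 'c' to capture one ROI image
        roi = cv2.cvtColor(frame[y1:y2, x1:x2], cv2.COLOR_BGR2GRAY)
        cv2.imwrite(os.path.join(DATASET_DIR, f"{count}.jpg"), roi)
        count += 1
    elif key == ord("q"):          # press 'q' to stop collection
        break

cap.release()
cv2.destroyAllWindows()
```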
After feature extraction, a Gaussian blur filter and a threshold are applied to the frame captured by OpenCV to obtain the processed image. The processed image is given to the CNN model for prediction, and if a letter is recognised for more than 15 consecutive frames, it is printed and used in word formation. The blank sign stands for the space between words. We found several groups of symbols that, when recognised, led to the same results, so we used classifiers designed specifically for those groups to distinguish between them.
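A minimal sketch of this preprocessing and of the 15-frame stability rule is given below. The kernel size, threshold values, label order, and the `model.predict` interface are assumptions chosen for illustration.

```python
import string
import cv2
import numpy as np

LABELS = list(string.ascii_uppercase) + ["blank"]  # assumed label order for illustration

def preprocess(roi_bgr):
    """Grayscale -> Gaussian blur -> threshold, as described above."""
    gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 2)
    _, thresh = cv2.threshold(blur, 70, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return thresh

def update_sentence(model, processed, state):
    """Append a letter only after it has been predicted for 15 frames in a row."""
    x = processed.reshape(1, processed.shape[0], processed.shape[1], 1) / 255.0
    letter = LABELS[int(np.argmax(model.predict(x, verbose=0)))]
    if letter == state.get("last"):
        state["count"] += 1
    else:
        state["last"], state["count"] = letter, 1
    if state["count"] == 15:                       # stable for 15 frames
        state["sentence"] += " " if letter == "blank" else letter
    return state

# Usage sketch:
#   state = {"sentence": "", "last": None, "count": 0}
#   state = update_sentence(model, preprocess(roi_bgr), state)
```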
We substitute incorrect input with suggested replacements using the Hunspell Python module, then show the user similar words, which they can subsequently add to the sentence. This makes spelling more accurate and enables prediction of complicated words. The user begins training the model by demonstrating the signs and then pressing the key on the keyboard corresponding to the sign's class, so that the image is stored in the proper class. Using the training data collection module, the user can capture photos of various symbols to train the system. Clicking "start video" opens the web camera and begins building the database. The preprocessed, accurate, and feature-rich images from the stored images are then used to train the TensorFlow model.
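A minimal sketch of the word-suggestion step follows, assuming the pyhunspell binding and standard en_US dictionary paths; the exact module name and dictionary locations vary by installation and are not fixed by this specification.

```python
import hunspell  # pyhunspell binding; dictionary paths below are assumptions

# Typical en_US dictionary locations on Linux; adjust for your system.
spell = hunspell.HunSpell("/usr/share/hunspell/en_US.dic",
                          "/usr/share/hunspell/en_US.aff")

def suggest_words(fingerspelled):
    """Return the word as-is if correctly spelled, otherwise Hunspell's suggestions."""
    word = fingerspelled.lower()
    if spell.spell(word):
        return [word]
    return spell.suggest(word)

# Example: a misrecognised fingerspelled word.
print(suggest_words("helo"))   # e.g. ['hello', 'help', ...]
```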
The system architecture, represented in the figure below, shows how the components fit together and how data flows between them. Pre-processing techniques improve the quality of the photos collected by the web camera. The object is then separated from the background and the images are converted to bytes. When the photos are matched against the database, feature extraction and rearrangement help produce the required output as text, which is subsequently turned into audio. We capture the required hand section, transform it to black and white, and then use thresholding and Gaussian blur to detect the edges and capture the features.
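For the final text-to-audio step, one possible sketch uses the pyttsx3 offline text-to-speech engine; pyttsx3 is not named in this specification and is only an assumed choice for illustration.

```python
import pyttsx3  # assumed offline text-to-speech engine, not mandated by the specification

def speak(text):
    """Convert the recognised sentence into audible speech."""
    engine = pyttsx3.init()
    engine.setProperty("rate", 150)   # speaking rate in words per minute
    engine.say(text)
    engine.runAndWait()

speak("HELLO HOW ARE YOU")
```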
Hand Gesture Recognition involves the use of AI algorithms to interpret and understand the gestures made by a person using their hands. These gestures can be a form of sign language, which is often used by the deaf community to communicate. Here's how AI is applied in this context: A diverse dataset of hand gestures and corresponding meanings is collected. This dataset may include images or videos of people making various gestures in different contexts. Machine learning techniques, such as Convolutional Neural Networks (CNNs), are used to train a gesture recognition model. This model learns to identify patterns and features in hand gestures. When a person performs a hand gesture in front of a camera, the AI model analyzes the gesture and translates it into text or spoken language. This can help bridge the communication gap between individuals who understand sign language and those who do not. Gesture recognition can be integrated into devices like smartphones, tablets, or wearable devices, allowing deaf individuals to communicate with people who may not understand sign language.

Voice Conversion is the process of altering the characteristics of a person's voice, such as pitch, tone, and accent, to match another person's voice. In the context of helping mute individuals, this technology allows them to "speak" by converting their gestures or other input into spoken words. Here's how it works: A dataset of different voices is collected, which includes recordings of people speaking various sentences. This dataset is used to train the voice conversion model. Deep learning techniques, such as Generative Adversarial Networks (GANs) or Recurrent Neural Networks (RNNs), are often used to train a voice conversion model. The model learns to capture the nuances of different voices. When a person performs a hand gesture, the gesture recognition system processes it, and the recognized gesture is sent to the voice conversion model. The model then generates spoken words based on the gesture input. These AI-powered solutions have the potential to enhance accessibility and inclusivity, enabling more seamless interactions and communication for individuals with communication challenges.
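A minimal Keras sketch of such a gesture-classification CNN is given below. The layer sizes, the 28 output classes, and the input resolution are illustrative assumptions rather than the trained model of the invention.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 28              # assumed: 26 letters plus 'blank' and 'space'
INPUT_SHAPE = (128, 128, 1)   # assumed grayscale ROI resolution

def build_gesture_cnn():
    """Small CNN for classifying preprocessed hand-gesture images."""
    model = models.Sequential([
        layers.Input(shape=INPUT_SHAPE),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_gesture_cnn()
model.summary()
```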
Voice translation technology is designed to assist individuals who are deaf or hard of hearing in understanding spoken language. It converts spoken words into text or visual representations, making it easier for them to comprehend and communicate. Key features of voice translation for the deaf and dumb include:
- Converts spoken language into text format, allowing deaf individuals to read and understand conversations.
- Provides instant translation, enabling seamless communication in various situations, such as social interactions, meetings, or public events.
- Allows users to adjust the text size, font, color, and background for optimal readability based on their preferences.
- Offers offline translation options in some systems, ensuring accessibility even in areas with limited or no internet connectivity.
- Allows users to type responses, which the system then converts into spoken language, facilitating two-way communication.
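A minimal sketch of the speech-to-text side described above might use the SpeechRecognition Python library; this library, the online recogniser it calls, and the function names are assumptions for illustration and are not named in this specification.

```python
import speech_recognition as sr  # assumed library choice, not mandated by the specification

def transcribe_from_microphone():
    """Listen on the default microphone and return the recognised text."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)    # online recogniser
    except (sr.UnknownValueError, sr.RequestError):
        return ""                                    # speech not understood or service unavailable

print(transcribe_from_microphone())
```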
Hand gesture recognition technology is designed to facilitate communication for individuals who are deaf and dumb by translating their sign language gestures into textual or spoken language. Key features of hand gesture recognition for the deaf and dumb include:
- Converts sign language gestures into written text, enabling communication with people who do not understand sign language.
- Provides instant recognition and translation of gestures, enabling natural and fluid conversations.
- Provides visual feedback to the user, confirming that their gestures have been accurately recognized and interpreted.
- Includes a library of common gestures and their corresponding meanings, aiding users in learning and using the system effectively.
Both voice translation and hand gesture recognition technologies contribute to breaking down communication barriers for individuals who are deaf and dumb, enabling them to interact with a broader range of people and participate more fully in society.
5 Claims & 1 Figure
Brief description of Drawing
The figure illustrates an exemplary embodiment of the invention.
Figure 1: The Process of the Voice Translation and Hand Gesture Recognition System
Claims:
The scope of the invention is defined by the following claims:
1. A system/method for voice translation and hand gesture recognition for the deaf and dumb using artificial intelligence, said system/method comprising the steps of:
a) The system starts with input image data collected from a camera (1); unwanted data and the background are then removed during preprocessing (2), and the images are converted into binary values (3).
b) The system extracts the features from the images (4) and additionally stores them in a database (5) used by the model.
c) The CNN classifier then performs the comparison (6); if the output is matched, the text is converted to voice (7).
2. As mentioned in claim 1, when a person performs a hand gesture in front of a camera, the AI model analyzes the gesture and translates it into text or spoken language, helping bridge the communication gap between individuals who understand sign language and those who do not.
3. According to claim 1, gesture recognition can be integrated into devices such as smartphones, tablets, or wearable devices, allowing deaf individuals to communicate with people who may not understand sign language.
4. As per claim 1, a dataset of different voices, including recordings of people speaking various sentences, is collected and used to train the voice conversion model.
5. According to claim 1, voice conversion is the process of altering the characteristics of a person's voice, such as pitch, tone, and accent, to match another person's voice; in the context of helping mute individuals, this technology allows them to "speak" by converting their gestures or other input into spoken words.
| # | Name | Date |
|---|---|---|
| 1 | 202341077575-REQUEST FOR EARLY PUBLICATION(FORM-9) [15-11-2023(online)].pdf | 2023-11-15 |
| 2 | 202341077575-FORM-9 [15-11-2023(online)].pdf | 2023-11-15 |
| 3 | 202341077575-FORM FOR STARTUP [15-11-2023(online)].pdf | 2023-11-15 |
| 4 | 202341077575-FORM FOR SMALL ENTITY(FORM-28) [15-11-2023(online)].pdf | 2023-11-15 |
| 5 | 202341077575-FORM 1 [15-11-2023(online)].pdf | 2023-11-15 |
| 6 | 202341077575-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [15-11-2023(online)].pdf | 2023-11-15 |
| 7 | 202341077575-EVIDENCE FOR REGISTRATION UNDER SSI [15-11-2023(online)].pdf | 2023-11-15 |
| 8 | 202341077575-DRAWINGS [15-11-2023(online)].pdf | 2023-11-15 |
| 9 | 202341077575-COMPLETE SPECIFICATION [15-11-2023(online)].pdf | 2023-11-15 |