Abstract: Conversation between the deaf and the general population is becoming exceedingly challenging, and there is no reliable translator readily available in society to help. The proposed system enables real-time translation between voice and sign language, in particular: 1. recognizing hand signals; 2. establishing a machine learning model for interpreting image-to-text data; 3. identifying new terms; 4. constructing sentences; 5. assembling the complete content; and 6. producing the audio output. The goal of this invention is to develop a framework for a prospective sign language interpreter that could translate sign language conversations into written and spoken English. 4 Claims & 1 Figure
Description: Field of Invention
The present invention relates to a technique to recognize sign language and interpret it into text by applying deep learning-based image processing. By introducing this type of machine, we can break the communication barrier between society and people who are deaf or speech-impaired.
The Objectives of this Invention
The objective of this project is to identify symbolic expressions from images so that the communication gap between a hearing person and a hearing-impaired person can be easily bridged. The main advantage of this model is that the application learns from each training it receives and generalizes the outcome with better accuracy each time.
Background of the Invention
In (WO2021/081562A3), the application in question uses deep learning techniques to perform optical character recognition (OCR). An image is sent to a computing device along with an identifier indicating that the image's textual content is in a first language. The device processes the image with a multilingual word-recognition model that supports a multitude of languages and produces a feature sequence containing probability values corresponding to the image's written content. The feature sequence includes subsets of features corresponding to the various languages, and each probability value for a feature subset denotes the likelihood that a given portion of text corresponds to given characters in the dictionary of the related language. In order to identify the textual material, the device builds a sparse mask according to the first language and combines it with the feature sequence.
According to (CN2021/113724714A), the invention provides a voice-component registration method and system based on voice recognition. It recognizes the conceptual text corresponding to the voice to be identified by means of voice-comprehension technology to form the voice text together with start-stop time information for the individual phrases; it derives the phonological representation of the individual characters in the voice text and collects the vowel and consonant sounds in the phonological symbols; and it recognizes and extracts text sentences and text expressions in the voice text by means of natural language processing technology.
In addition, (RU2017/2691214C1) relates to artificial-intelligence-based character-recognition devices and techniques. The method acquires a text image containing one or more words in one or more sentences; the image is provided as initial input to a set of trained machine learning models that maintain data on word compatibility and the probability of words being used together in real sentences; one or more final outputs are obtained from the set of models; and one or more hypothesized text fragments, each containing possible word sequences, are extracted from the final output data, achieving excellent text-recognition performance through the use of a variety of machine learning methods.
In addition, (US2017/10262198B1) pertains to a computing device, a processor-readable storage medium, and a method and apparatus for recognizing sign language. First, a sign language video is obtained for recognition. Next, motion features and sign-change features are extracted from each frame of the sign language video. Then, sign language word information is extracted from a combined feature created by fusing the gesture feature and the motion-evolution feature. Finally, the sign language word information is combined into a sign language sentence based on contextual information corresponding to the sign language word data.
In addition, (WO2021/008320A1) covers a computing apparatus, a computer-readable storage medium, and a method and apparatus for recognizing sign language. First, a sign language video is acquired for recognition. Next, the gesture feature and gesture-change feature of each frame of the video are extracted. Then, sign language word details are extracted from a combined feature created by fusing the gesture feature and the gesture-evolution feature. Finally, the sign language word details are combined into a sign language sentence based on context-dependent knowledge corresponding to them.
In order to teach parents of deaf children ASL, Chuan et al. (2014) offer a comparable interpretation system, but without the flexibility of a variable dataset or the capacity to recognise gestures. The study compares the accuracy of k-Nearest Neighbour and Support Vector Machine classifiers in identifying only stationary hand shapes. Although a comparatively high accuracy of over 85% was attained, the authors acknowledge the hardware limitations of the sensors.
Parallel Hidden Markov Models (PaHMMs) were utilised by Vogler et al. [3] to recognise American Sign Language. They claimed that, for a continuous recognition system, phonemes may be employed in place of complete signs, presuming that, as in speech recognition, the two channels for the right and left hands allow any word to be broken down into its fundamental phonemes. Using a single channel of the HMM model, a small vocabulary of 22 signs was evaluated, and the findings showed an accuracy rate of 87.88%. For a more extensive vocabulary, hand arrangement, orientation and facial expressions cannot be recognised by the system. BSL uses a two-handed fingerspelling system, as opposed to ASL's and FSL's one-handed approach. Many American deaf people believe that one-handed fingerspelling is quicker than two-handed techniques, but according to anecdotal evidence, when BSL and ASL experts were pitted against each other, neither fingerspelling system was able to finish the alphabet faster, thus the alleged disadvantage is invalidated.
Summary of the Invention
A sign language recognition system is a powerful method to gather expert knowledge, detect edges, and combine imprecise information from several sources. Convolutional neural networks are used to obtain the correct categorization. The best feature of this CNN is that it allows us to plot the images while simultaneously tracking the accuracy and loss for each digit. It recognizes which class a gesture belongs to and can be used for multi-class classification. Finally, the predicted digit's text is printed and displayed on the screen. Additionally, we developed a functional framework and a set of algorithms that we believe are both practical and extensible.
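The following is a minimal sketch, not part of the specification, of the accuracy/loss plotting and on-screen text overlay mentioned above. It assumes a Keras-style training `history` object and an OpenCV frame; the function names and figure layout are illustrative assumptions.

```python
# Minimal sketch: plot training accuracy/loss and overlay the predicted class
# text on a frame. Assumes a Keras-style History object and an OpenCV image.
import cv2
import matplotlib.pyplot as plt

def plot_training_curves(history):
    """Plot the accuracy and loss recorded during training."""
    plt.figure(figsize=(10, 4))
    plt.subplot(1, 2, 1)
    plt.plot(history.history["accuracy"], label="train accuracy")
    plt.plot(history.history.get("val_accuracy", []), label="val accuracy")
    plt.legend()
    plt.title("Accuracy")
    plt.subplot(1, 2, 2)
    plt.plot(history.history["loss"], label="train loss")
    plt.plot(history.history.get("val_loss", []), label="val loss")
    plt.legend()
    plt.title("Loss")
    plt.show()

def overlay_prediction(frame, predicted_label):
    """Draw the predicted gesture text on the camera frame."""
    cv2.putText(frame, predicted_label, (10, 40),
                cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
    return frame
```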
Detailed Description of the Invention
A Convolutional Neural Network (CNN) extracts the most important features from the hand gestures. A maximum of three convolutional layers and three pooling layers are used. In the convolutional layers, the ReLU activation function is used. 32 convolutional filters are employed in the first two convolutional layers, increased to 64 in the subsequent convolutional layer to obtain deeper features from the pictures. In the pooling layers, a 2×2 filter is applied. Finally, the image features are passed to the fully connected layer. In the hidden layer, the ReLU activation function is also used. The hidden layer contains 256 neurons that implement the mapping between the fully connected layer's inputs and the output layer's outputs. For classification, the output layer contains a total of 24 nodes with the softmax activation function. The model is then trained and its accuracy assessed against the test set.
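A minimal sketch of this architecture, assuming TensorFlow/Keras, is given below. The 3×3 kernel size and 64×64 grayscale input shape are assumptions not stated in the description; the layer counts, filter sizes, activations, 256-neuron hidden layer and 24-class softmax output follow the text above.

```python
# Minimal sketch (assumes TensorFlow/Keras) of the CNN described above:
# three convolutional layers (32, 32, 64 filters, ReLU), three 2x2 pooling
# layers, a 256-neuron fully connected hidden layer and a 24-class softmax output.
from tensorflow.keras import layers, models

def build_sign_cnn(input_shape=(64, 64, 1), num_classes=24):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),             # fully connected hidden layer
        layers.Dense(num_classes, activation="softmax"),  # one node per gesture class
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```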
The application is set up to collect frames from the web camera, which are subjected to a variety of image processing algorithms. Next, the images are preprocessed using image preprocessing techniques such as RGB to HSV (Hue, Saturation, Value) conversion, which reduces performance fluctuation caused by skin tone. Furthermore, the hand region is extracted from the image, along with the hand sign. After that, the images are compared against the trained model. The captured images are saved in the input folder. This dataset contains the alphabet of hand gestures, with around 1,750 photos per category, i.e. one category for each letter of the alphabet. The images are then processed using the CNN module of TensorFlow, with the Sequential model. We used 65 epochs in this training. The model creates a .h5 file that contains a summary of the training output; this file is used to determine alphabets according to the model. Finally, the input picture is sent to the CNN model for prediction, which compares the input and recorded images and produces textual output.
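A minimal sketch of this capture, HSV preprocessing and prediction loop, assuming OpenCV and a trained Keras model, follows. The HSV skin thresholds, the "sign_model.h5" file name, the 64×64 input size and the 24-letter static alphabet (omitting J and Z) are illustrative assumptions.

```python
# Minimal sketch: webcam capture -> HSV skin mask -> CNN prediction -> text output.
# Thresholds, file name and label set are assumptions, not taken from the text.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

LABELS = list("ABCDEFGHIKLMNOPQRSTUVWXY")  # 24 static alphabet gestures (assumed)
model = load_model("sign_model.h5")        # assumed name of the trained .h5 file

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)           # reduce skin-tone variation
    mask = cv2.inRange(hsv, (0, 30, 60), (20, 150, 255))   # rough skin-colour mask
    hand = cv2.bitwise_and(frame, frame, mask=mask)         # keep only the hand region
    gray = cv2.cvtColor(hand, cv2.COLOR_BGR2GRAY)
    roi = cv2.resize(gray, (64, 64)).astype("float32") / 255.0
    pred = model.predict(roi.reshape(1, 64, 64, 1), verbose=0)
    letter = LABELS[int(np.argmax(pred))]
    cv2.putText(frame, letter, (10, 40), cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
    cv2.imshow("Sign Language Recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```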
Sign Language Recognition (SLR) is a vital technology that facilitates communication for individuals with hearing impairments. This document presents a comprehensive system/method for Sign Language Recognition utilizing Convolutional Neural Networks (CNNs). CNNs have proven to be highly effective in image-related tasks, making them a suitable choice for interpreting the intricate and dynamic nature of sign language gestures.
The foundation of the system lies in a vast dataset comprising sign language gestures. This dataset is diverse, encompassing various sign languages, gestures, and environmental conditions to ensure the model's generalization. Preprocessing steps involve image normalization, resizing, and augmentation to enhance the model's ability to handle different input conditions.
The core of the system is a well-designed CNN architecture tailored for sign language recognition. The architecture typically consists of multiple convolutional layers with pooling, followed by fully connected layers. The convolutional layers capture spatial hierarchies and patterns in the input images, while pooling layers reduce dimensionality. The fully connected layers then map these features to the output classes, representing different sign language gestures.
The CNN model is trained using a supervised learning approach. During training, the model learns to map input images to their corresponding sign language labels. The training process involves the optimization of weights through backpropagation and gradient descent. The choice of loss functions, optimizers, and learning rates is critical to the model's convergence and generalization.
The trained CNN model is deployed for real-time sign language recognition. Input video streams or sequences of images are fed into the model, and predictions are generated in near real-time. The system is designed to be efficient, enabling it to operate on various devices, from desktop computers to mobile devices. The system incorporates post-processing steps to enhance the recognition results; these may include temporal analysis to improve the consistency of predictions over time. A user-friendly interface is designed to facilitate interaction with the system, providing a seamless experience for both the hearing-impaired users and those interacting with them.
The performance of the system is rigorously evaluated using standard metrics such as accuracy, precision, recall, and F1 score. Cross-validation techniques ensure robustness, and the system is tested on various datasets to assess its generalization capabilities. The system is designed to be extensible, allowing for future enhancements. This may include incorporating more advanced deep learning architectures, expanding the dataset for improved diversity, and exploring multimodal approaches that leverage both visual and temporal information for enhanced accuracy.
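A minimal sketch of the training and evaluation steps outlined above, assuming Keras and scikit-learn, is given below. The directory name "dataset/", batch size, augmentation ranges and the reuse of `build_sign_cnn` from the earlier architecture sketch are illustrative assumptions; the 65 epochs, the .h5 output and the accuracy/precision/recall/F1 metrics follow the text.

```python
# Minimal sketch: augmentation, 65-epoch supervised training, saving the .h5
# model, and reporting precision/recall/F1 on a held-out split. Paths and
# hyperparameters are assumptions for illustration only.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import classification_report

datagen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=10,
                             zoom_range=0.1, width_shift_range=0.1,
                             height_shift_range=0.1, validation_split=0.2)

train_gen = datagen.flow_from_directory("dataset/", target_size=(64, 64),
                                        color_mode="grayscale", batch_size=32,
                                        class_mode="categorical", subset="training")
val_gen = datagen.flow_from_directory("dataset/", target_size=(64, 64),
                                      color_mode="grayscale", batch_size=32,
                                      class_mode="categorical", subset="validation")

model = build_sign_cnn()                      # from the architecture sketch above
history = model.fit(train_gen, validation_data=val_gen, epochs=65)
model.save("sign_model.h5")                   # summary of the training output

# Evaluate with standard metrics on the held-out split.
y_true, y_pred = [], []
for i in range(len(val_gen)):
    x_batch, y_batch = val_gen[i]
    y_true.extend(np.argmax(y_batch, axis=1))
    y_pred.extend(np.argmax(model.predict(x_batch, verbose=0), axis=1))
print(classification_report(y_true, y_pred))  # per-class precision, recall, F1
```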
Exemplary real-time use cases include:
1. Real-time Translation in Video Calls: The system can be integrated into video conferencing applications to provide real-time sign language translation during video calls. This ensures that individuals who use sign language as their primary means of communication can actively participate in discussions and meetings.
2. Education and E-Learning Platforms (Interactive Learning Environments): Educational platforms can employ the system to create interactive learning environments for students with hearing impairments. The system can interpret sign language gestures made by teachers or peers, facilitating a seamless learning experience.
3. Public Service Announcements (Accessibility in Public Spaces): Public spaces, such as transportation hubs or government offices, can utilize the system to display important information or announcements in sign language. This promotes inclusivity by ensuring that individuals with hearing impairments have access to critical information.
4. Assistive Devices for Daily Living (Smart Assistive Devices): Wearable devices equipped with cameras can leverage the system to interpret sign language for users in real time. This could include translating sign language into text or spoken language, providing a more inclusive and accessible daily living experience.
5. Customer Service and Support (Accessible Customer Support): Companies can implement the system in their customer service channels, enabling hearing-impaired customers to interact seamlessly with support representatives. This promotes inclusivity in customer service interactions, enhancing the overall customer experience.
6. Accessible Gaming: Game developers can integrate the system to create more inclusive gaming experiences. Players using sign language gestures can interact with the game environment, expanding accessibility in the gaming industry.
These real-time use cases demonstrate the versatility and impact of a Sign Language Recognition System using Convolutional Neural Networks across various domains, promoting inclusivity and equal access to information and services for individuals with hearing impairments.
The presented system/method for Sign Language Recognition using Convolutional Neural Networks provides an effective and scalable solution for bridging communication gaps for individuals with hearing impairments. Through the integration of advanced deep learning techniques, the system achieves high accuracy and real-time performance, making it a valuable tool for fostering inclusivity in communication.
4 Claims & 1 Figure
Brief description of Drawing
The figure illustrates an exemplary embodiment of the invention.
Figure 1: Architecture of the Proposed Invention. Claims: The scope of the invention is defined by the following claims:
Claim:
1. A system/method for the identification of various sign language gestures using CNN, said system/method comprising the steps of:
a) When the system starts up, data is gathered in an organised manner (1). Data resizing (2) is performed before data augmentation (3).
b) The developed system functions based on three different architectures: VGG16 (4), a hybrid architecture (5), and a natural language model (6).
c) Based on these models, the input can be analyzed efficiently to produce accurate results (7).
2. As mentioned in claim 1, the proposed invention is a real-time system wherein live sign gestures are processed using image and voice processing.
3. As per claim 1, existing systems in this area are improved in terms of response time and accuracy through the use of efficient algorithms such as CNN, OpenCV and machine learning to train high-quality datasets, along with better sensors.
4. As per claim 1, the output comparison is monitored and analyzed efficiently by three different architectures.
| # | Name | Date |
|---|---|---|
| 1 | 202341069029-REQUEST FOR EARLY PUBLICATION(FORM-9) [13-10-2023(online)].pdf | 2023-10-13 |
| 2 | 202341069029-FORM-9 [13-10-2023(online)].pdf | 2023-10-13 |
| 3 | 202341069029-FORM FOR STARTUP [13-10-2023(online)].pdf | 2023-10-13 |
| 4 | 202341069029-FORM FOR SMALL ENTITY(FORM-28) [13-10-2023(online)].pdf | 2023-10-13 |
| 5 | 202341069029-FORM 1 [13-10-2023(online)].pdf | 2023-10-13 |
| 6 | 202341069029-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [13-10-2023(online)].pdf | 2023-10-13 |
| 7 | 202341069029-EVIDENCE FOR REGISTRATION UNDER SSI [13-10-2023(online)].pdf | 2023-10-13 |
| 8 | 202341069029-EDUCATIONAL INSTITUTION(S) [13-10-2023(online)].pdf | 2023-10-13 |
| 9 | 202341069029-DRAWINGS [13-10-2023(online)].pdf | 2023-10-13 |
| 10 | 202341069029-COMPLETE SPECIFICATION [13-10-2023(online)].pdf | 2023-10-13 |