
Sign Language Recognition Using Convolution Neural Network

Abstract: The deaf and hard of hearing community generally uses sign language for communication. Sign Language Recognition (SLR) recognizes hand motions and produces text or speech. In sign language, hand movements fall into two categories, static and dynamic, and the ability to recognize both is crucial for effective communication within this community. Convolution neural networks and other techniques from computer vision and deep learning can be used to recognize hand motions. Over epochs, models are trained to learn and recognize images of hand gestures. Converting recognized gestures into equivalent English text facilitates communication, and the generated text can in turn be converted into speech. Sign language recognition is a human narrative as much as a technological achievement: it elevates voices that are frequently ignored, promotes empathy, and dismantles obstacles to communication. It is evidence of how technology can unite people and appreciate the variety of human expression. Sign language detection is a technologically driven ray of hope, akin to constructing a bridge of comprehension in which complex hand gestures are interpreted and their silent poetry converted into written or spoken words.


Patent Information

Application # 202441032326
Filing Date
24 April 2024
Publication Number
18/2024
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

MLR Institute of Technology
MLR Institute of Technology, Hyderabad

Inventors

1. Mrs. N Thulasi Chitra
Department of Computer Science and Information Technology, MLR Institute of Technology, Hyderabad
2. Dr. P Subhashini
Department of Computer Science and Information Technology, MLR Institute of Technology, Hyderabad
3. Ms. D Manogna
Department of Computer Science and Information Technology, MLR Institute of Technology, Hyderabad
4. Mrs. Munugapati Bhavana
Department of Computer Science and Information Technology, MLR Institute of Technology, Hyderabad

Specification

Description: Field of Invention
This invention relates to sign language recognition. It introduces various CNN architectures for sign language recognition that use multiple convolution layers. We have used a Convolution Neural Network (CNN) architecture in the proposed system to learn feature classification.
Objective of the Invention
The main objective of the invention is to build a model that converts sign gestures into readable text. Our model is capable of identifying words from gestures. In addition, video capture can be enabled, allowing the model to prepare sentences from video sessions.
Background of the Invention
Recent works focus on the development of technologies that allow people with hearing and speech loss to converse easily and frequently with other people [2]. The work so far has introduced varied technologies and techniques, such as Android applications and models such as CNNs and RNNs.
The Real-Time Hand Gesture Detection paper described an algorithm in which video was first captured and divided into frames; the frame containing the image was extracted, and features such as the difference of Gaussians were computed from it. Scale-space features were extracted using SIFT, which helped in gesture recognition. Archana S. Ghotkar, Rucha Khatal, Sanjana Khupase, Surbhi Asati and Mithila Hadop developed a different method for hand gesture recognition for Indian Sign Language, which used CamShift and the HSV model and then a genetic algorithm to recognize the gestures. Implementing the CamShift and HSV model was difficult because it was not easy to make it compatible with different versions of MATLAB, and the genetic algorithm takes a long time to develop.
P. Subha Rajan and Dr. G. Balakrishnan developed a method for Indian Sign Language that recognizes each gesture through a 7-bit orientation and generation process using RIGHT and LEFT scanning. The process required about six modules and was a tedious way to recognize characters [3]. T. Shanableh developed a method for user-independent recognition of isolated Arabic Sign Language gestures. In that method, signers wore gloves to facilitate segmentation of the signer's hands using color segmentation.
The effectiveness of the proposed user-independent feature extraction was evaluated using two different classification methods, namely K-NN and polynomial networks. Many researchers have used special equipment to recognize sign language. Byung-Woo Min et al. presented visual recognition of static and dynamic gestures, where hand movements obtained from visual images were recognized on a 2D image plane without external devices. Gestures were detected by task-specific spatial transitions based on natural human articulation [8]. Static gestures were detected using snapshots of hand positions, while dynamic gestures were detected by analyzing their trajectories using hidden Markov models (HMMs).
Jing-Hao Sun separated the human hand from the complex background and used the CamShift algorithm to detect hand gestures in real time. A convolutional neural network then recognized the detected hand-gesture region, identifying 10 common digits. The proposed system used a dataset of 1,600 training pictures and 4,000 hand-gesture images, 400 images for each type, and the experiment showed an accuracy of about 98.3 percent.
Hasan used scaled normalization to recognize gestures through brightness-factor matching. With a black background, thresholding techniques are used to segment the input images. The coordinates of each segmented image are shifted so that the centroid of the hand lies at the origin of the X and Y axes, and the image's center of mass is determined. Using a boundary histogram, Wysoski et al. [3] provided rotation-invariant postures: the input image was captured with a camera, a skin-color detection filter was applied, and a clustering procedure with a standard contour-tracking algorithm was used to find the boundary of each category in the pooled image. The picture was divided into grids and the boundaries were normalized.

Geethu Nath and Arun C. S. developed an ASL symbol recognition system based on the ARM Cortex-A8 processor; the machine recognizes numbers using the Jarvis algorithm and alphabets using a template-matching algorithm. Using Principal Component Analysis (PCA) and various distance classifiers, Kumud Tripathi [7] developed a framework for recognizing continuous ISL gestures; features are extracted from the keyframes of their own dataset using an orientation histogram and provided as input to the system. Noor Tubaiz [8] proposed the Modified k-Nearest Neighbor (MKNN) approach to classify sequential data, with data gloves used to detect hand movements; to supplement the raw data, window-based statistical features are calculated from past and future raw feature vectors. The proposed framework was developed using novel techniques based on existing systems to recognize terms in ISL. B. Bauer et al. describe a continuous sign language recognition method: a framework that relies on continuous hidden Markov models (HMMs), employs German Sign Language (GSL), and is fed feature vectors that represent manual signs.
Summary of the Invention
Several methods for improving the data used by the CNN model have been explored. In addition, CNN was employed in the proposed system for feature-classification learning. Adding successive layers to this model has been essential to correctly analyzing its performance, since a CNN works by embedding layers and linking neurons between them. The data collection used to train this model includes both genuine photos and heavily altered synthetic photos. This approach is effective in terms of photo recognition, and the dataset used places a premium on recognition.
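As an illustration of how such heavily altered synthetic photos could be produced, the sketch below applies standard augmentation with Keras's ImageDataGenerator. It is a minimal sketch, not the invention's actual pipeline; the parameter values, folder layout, and image size are assumptions.

# Hypothetical augmentation step: real gesture photos are expanded with
# synthetically altered copies. All parameter values below are assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,             # small rotations of the hand
    width_shift_range=0.1,         # horizontal jitter
    height_shift_range=0.1,        # vertical jitter
    zoom_range=0.2,                # scale variation
    brightness_range=(0.7, 1.3),   # lighting variation
    rescale=1.0 / 255,             # normalize pixel values
)

# flow_from_directory expects one sub-folder per gesture class,
# e.g. dataset/train/A, dataset/train/B, ... (folder layout is assumed).
train_flow = augmenter.flow_from_directory(
    "dataset/train", target_size=(64, 64), color_mode="grayscale",
    class_mode="categorical", batch_size=32,
)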
Detailed Description of the Invention
Image Acquisition: Image acquisition is the action of retrieving an image from a source, typically a hardware source, for image processing. A web camera is the hardware source in our project. It is the first step in the workflow sequence because no processing can be done without an image. The picture obtained at this stage has not been processed in any way.

Segmentation: Segmentation is the method of separating objects or signs from the background of a captured image.

Proposed Methodology: The motion and location of the hand must be detected and segmented in order to recognize gestures; TensorBoard is used to monitor the model.
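A minimal sketch of these acquisition and segmentation steps using OpenCV follows; the HSV skin-color bounds are illustrative assumptions and would need tuning for real lighting conditions.

# Acquire one raw frame from the webcam (the hardware source) and
# segment the hand from the background with an assumed skin-color range.
import cv2
import numpy as np

cap = cv2.VideoCapture(0)     # webcam
ret, frame = cap.read()       # unprocessed input picture
cap.release()

if ret:
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    lower_skin = np.array([0, 30, 60], dtype=np.uint8)     # assumed bound
    upper_skin = np.array([20, 150, 255], dtype=np.uint8)  # assumed bound
    mask = cv2.inRange(hsv, lower_skin, upper_skin)        # sign/background split
    hand = cv2.bitwise_and(frame, frame, mask=mask)        # keep only the hand region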
Feature Extraction: Predefined features such as shape, contour, geometrical features (position, angle, distance, etc.), color features, histograms, and others are extracted from the preprocessed images and used later for sign classification or recognition. Feature extraction is a dimensionality-reduction step that divides and organizes a large collection of raw data into smaller, easier-to-manage classes, so that processing becomes simpler. The most important characteristic of these massive datasets is that they have a large number of variables, and processing those variables requires a large amount of computational power. Feature extraction therefore helps derive the best features from large datasets by selecting and combining variables into features, reducing the size of the data. These features are simple to use while still describing the actual data collection accurately and uniquely.
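As one illustration of the geometrical features named above, the sketch below computes a few contour descriptors from a segmented binary mask with OpenCV; the specific descriptors are assumptions chosen for illustration, not the invention's feature set.

# Hypothetical shape/contour features computed from the segmentation mask.
import cv2
import numpy as np

def contour_features(mask: np.ndarray) -> dict:
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return {}
    hand = max(contours, key=cv2.contourArea)          # largest blob = hand
    x, y, w, h = cv2.boundingRect(hand)
    hull = cv2.convexHull(hand)
    return {
        "area": cv2.contourArea(hand),                 # geometrical feature
        "perimeter": cv2.arcLength(hand, True),        # contour feature
        "aspect_ratio": w / h,                         # shape feature
        "solidity": cv2.contourArea(hand) / max(cv2.contourArea(hull), 1e-6),
    }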
Preprocessing: Each picture frame is preprocessed to eliminate noise using a variety of filters, including erosion, dilation, and Gaussian smoothing, among others. The size of an image is reduced when a color image is transformed to grayscale; converting an image to grayscale is a common method of reducing the amount of data to be processed. The preprocessing phases are as follows: collecting different sign language gesture data, preprocessing it for consistency, and choosing a suitable machine learning model such as a Convolution Neural Network (CNN). Once the model is trained, real-time recognition is realized through a user-friendly interface that translates recognized gestures into text for effective communication. The process involves an iterative feedback loop: gathering user input and refining the model and system accordingly.
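A minimal sketch of the preprocessing filters named above, assuming OpenCV; the kernel sizes and the 64x64 network input size are assumptions.

# Grayscale conversion, Gaussian smoothing, erosion and dilation, then
# resizing to an assumed fixed CNN input size.
import cv2
import numpy as np

def preprocess(frame: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)       # grayscale reduces the data
    blur = cv2.GaussianBlur(gray, (5, 5), 0)             # Gaussian smoothing
    kernel = np.ones((3, 3), np.uint8)
    eroded = cv2.erode(blur, kernel, iterations=1)       # erosion removes speckle noise
    cleaned = cv2.dilate(eroded, kernel, iterations=1)   # dilation restores hand shape
    return cv2.resize(cleaned, (64, 64))                 # assumed network input size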
Dataset: The collected dataset is divided into individual sets, which are compared with the present system data. The data can be changed according to the testing outcomes and added back into the data files.
Training: Some data is sent for training and is stored again in the data files. It can be changed according to the required output results.
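The split and training steps might look like the following sketch; the file names, hyper-parameters, and the model object (a compiled Keras network like the one sketched under Feature Selection below) are assumptions. The TensorBoard callback corresponds to the TensorBoard monitoring mentioned in this specification.

# Hypothetical train/test split and training run with TensorBoard logging.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

X = np.load("gestures.npy")   # assumed: preprocessed images, shape (N, 64, 64, 1)
y = np.load("labels.npy")     # assumed: one-hot gesture labels, shape (N, num_classes)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

tb = tf.keras.callbacks.TensorBoard(log_dir="logs")       # training curves per epoch
model.fit(X_train, y_train, validation_data=(X_test, y_test),
          epochs=20, batch_size=32, callbacks=[tb])
model.save("sign_cnn.h5")     # stored so the data files can be updated and retrained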
Result: When image processing starts, the input image is checked against the trained dataset and the output is calculated. Fig. 1 is an example of a trained and tested dataset result: the image in Fig. 1 is given as input to our pre-trained model, and the neural network recognizes the sign present in the given image and generates the corresponding text.
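The recognition step could then be a single inference pass, as in the sketch below; it assumes the preprocess function, the saved model, and the alphabet label set from the earlier sketches in this section.

# Hypothetical inference: map a preprocessed frame to its text label.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("sign_cnn.h5")
labels = [chr(c) for c in range(ord("A"), ord("Z") + 1)]   # assumed gesture classes

img = preprocess(frame)                     # 64x64 grayscale, as in training
img = img.reshape(1, 64, 64, 1) / 255.0     # batch of one, normalized
pred = model.predict(img)
print("Recognized sign:", labels[int(np.argmax(pred))])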
CNN: A CNN is a neural network in which feature extraction is performed on the image and a feature map is computed from the input image. A CNN uses highly efficient structures to interpret 2D images in a way that mimics human visual processing, and it can effectively learn to extract and abstract 2D features. In particular, the CNN's max-pooling layer is especially good at absorbing shape variations. Moreover, because of its sparse connectivity with tied weights, a CNN requires fewer parameters than a fully connected network of comparable size.
Importantly, it can be trained with a gradient-based learning algorithm and is less susceptible to the vanishing-gradient problem. Using the gradient-based approach, the entire network is trained to directly minimize an error criterion.
Feature Selection: For feature selection, we employ a neural network composed of Convolutional Neural Network (CNN) layers [5]. We used 5 CNN (feature extraction) layers. To begin, we must first pre-process the images in order to remove the noise. Recognition is performed on a character-by-character basis, so words or texts not found in the training data can also be recognized.
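The following is a minimal Keras sketch of a network with five convolution (feature extraction) layers as described; the filter counts, kernel sizes, input size, and number of output classes are illustrative assumptions.

# A five-convolution-layer CNN sketch; all sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

num_classes = 26   # assumed: one class per alphabet gesture

model = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),    # feature extraction 1
    layers.MaxPooling2D(),                                      # pooling absorbs shape variation
    layers.Conv2D(64, 3, activation="relu", padding="same"),    # feature extraction 2
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),    # feature extraction 3
    layers.Conv2D(128, 3, activation="relu", padding="same"),   # feature extraction 4
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu", padding="same"),   # feature extraction 5
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),            # one output per sign
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])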

Brief description of Drawing
The figure illustrates an exemplary embodiment of the invention.
Figure 1. (a) Input image and trained image; (b) Output image.

Claims: The scope of the invention is defined by the following claims:
1. A system/method to perform sign language recognition using convolution neural network based machine learning algorithms, said system/method comprising the steps of:
a) the system starts by taking an image as input from the webcam;
b) the proposed system extracts the sign features, using segmentation, from the input image taken by the webcam;
c) the datasets are divided into training and testing sets, monitored using TensorBoard, and from these the model is built with CNN algorithms.
2. As mentioned in claim 1, pre-processing is used as a preliminary step in the sign language recognition system to eliminate noise from the image.
3. As mentioned in claim 1, Fig. 1 is given as input to the pre-trained model, and the image is passed to the neural network to recognize the sign in the image and generate the text.
4. As mentioned in claim 1, when compared with other results our outputs were clearer and the problems in data cleaning were solved; the images are well trained, and the recognition matches the tested and trained dataset images.

Documents

Application Documents

# Name Date
1 202441032326-REQUEST FOR EARLY PUBLICATION(FORM-9) [24-04-2024(online)].pdf 2024-04-24
2 202441032326-FORM-9 [24-04-2024(online)].pdf 2024-04-24
3 202441032326-FORM FOR SMALL ENTITY(FORM-28) [24-04-2024(online)].pdf 2024-04-24
4 202441032326-FORM 1 [24-04-2024(online)].pdf 2024-04-24
5 202441032326-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [24-04-2024(online)].pdf 2024-04-24
6 202441032326-EVIDENCE FOR REGISTRATION UNDER SSI [24-04-2024(online)].pdf 2024-04-24
7 202441032326-EDUCATIONAL INSTITUTION(S) [24-04-2024(online)].pdf 2024-04-24
8 202441032326-DRAWINGS [24-04-2024(online)].pdf 2024-04-24
9 202441032326-COMPLETE SPECIFICATION [24-04-2024(online)].pdf 2024-04-24