
A System/Method To Identify Hand Written Telugu Characters Using Convolutional Neural Network Algorithm

Abstract: Deep learning techniques, specifically Convolutional Neural Networks (CNNs), have superseded previous state-of-the-art machine learning methods in the field of computer vision. Deep learning is currently used extensively in applications that analyze text, voice, and sensor data, and it has made particularly significant progress in image analysis through convolutional neural networks. In this invention, a Convolutional Neural Network (CNN) is used to identify handwritten Telugu characters in an offline setting. CNNs diverge from the conventional method of Handwritten Telugu Character Recognition (HTCR) by extracting features autonomously. The CNN model is trained offline on the Telugu character dataset generated by HP Labs, India. The model shows excellent recognition performance on both the training and testing datasets. 4 Claims & 1 Figure


Patent Information

Application #
Filing Date
29 June 2024
Publication Number
27/2024
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

MLR Institute of Technology
Laxman Reddy Avenue, Dundigal-500043

Inventors

1. Mr. V. Nitin
Department of Computer Science and Information Technology, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043
2. Mrs. B. Varija
Department of Computer Science and Information Technology, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043
3. Dr. Venkata Nagaraju Thatha
Department of Computer Science and Information Technology, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043
4. Mr. D. Sandeep
Department of Computer Science and Information Technology, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043

Specification

Description: Field of Invention
The process of digitizing a handwritten document encompasses several tasks, including converting a color or grayscale image into a binary image, extracting the text from the background texture, eliminating noise, separating the lines, segmenting the words within each line, segmenting the characters within each word, and recognizing the individual characters. This invention focuses solely on the recognition of individual characters. The primary difficulty in Handwritten Character Recognition (HCR) lies in the diverse range of writing styles exhibited by different individuals; even a single individual's handwriting can vary from one instance to another. HCR encompasses many pre-processing procedures, including the division of pages, the separation of lines and characters, and the identification of each character using a trained model. Machine learning techniques have been employed to recognize characters in segmented character images, which are components of a larger image such as a scanned document. A conventional machine learning algorithm acquires knowledge about the characteristics of characters by analyzing a collection of training characters with known labels. Conventional machine learning methods have been employed in handwritten character recognition for an extended period, following a pipeline of pre-processing, segmentation, feature extraction, and classification. A recent advancement in machine learning is deep learning, which performs feature extraction autonomously and has demonstrated superior performance in various computer vision tasks, including character recognition.
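As a small illustration of the first step in this pipeline, the following sketch, assuming NumPy, converts a grayscale image to a binary image with a fixed global threshold. The threshold value and the random stand-in image are illustrative assumptions; practical systems typically use adaptive methods such as Otsu's.

    import numpy as np

    def binarize(gray, threshold=128):
        # Pixels darker than the threshold become foreground (1, the ink);
        # lighter pixels become background (0).
        return (gray < threshold).astype(np.uint8)

    gray = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in image
    binary = binarize(gray)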
Background of the Invention
In the early days, most documents were handwritten, and hence, in the process of digitization, handwriting recognition became one of the important applications. Dating back to the 1950s, automating handwriting recognition has been an area of continuous improvement. In 1990, the first handwritten digit classifier was developed by LeCun et al. (1990). It was developed to automate the sorting of mail in post offices based on pin codes (which contained only numbers). Several works exist in the literature on HCR using CNNs for languages such as Chinese, Japanese, English, and Arabic. Deep convolutional neural networks have been successful in handwritten Chinese Character Recognition (CCR). The first breakthrough of CNNs happened in 2012 in the ILSVRC competition, when AlexNet demonstrated near-human performance on the ImageNet dataset. However, Cireşan et al. (2012) had produced the best CNN results earlier in the same year (US20180137349A1). As one of their experiments, they tested the Chinese character dataset developed by the Institute of Automation of the Chinese Academy of Sciences (CASIA), composed of 300 samples each of 3,755 characters, using a Multi-Column Deep Neural Network (MCDNN). They implemented this network with a larger number of layers and feature maps per layer, and trained the model on GPUs for faster training on a dataset of one million characters. This MCDNN produced an error rate of 6.5% on the offline task and 5.61% on the online task. Apart from Chinese characters, they also tested the MCDNN on the Modified National Institute of Standards and Technology (MNIST) dataset and on Latin characters, producing error rates of 0.23% and 11.63% (all characters) respectively.
The OnLine HandWritten DataBase (OLHWDB) has more than 3 million isolated character samples, and the offline HandWriting Database (HWDB) has handwritten text pages with more than a million character samples. The isolated character samples are further classified as DB1.0 and DB1.1, which differ in the number of writers and the number of classes. DB1.1, which involves 3,755 classes, is the widely used dataset for CCR. Another database, SCUT-COUCH2009, developed by the South China University of Technology, includes Chinese as well as English letters, digits, traditional Chinese characters, Pinyin, and symbols. Although traditional Chinese has more than 50,000 characters, the datasets contain the character classes widely used in practice, mostly 3,755 classes. Benchmarking of online and offline CCR was performed on the HWDB1.0, HWDB1.1, OLHWDB1.0, and OLHWDB1.1 datasets by Liu et al. (2013) using traditional methods. They implemented a three-step process of normalization, feature extraction, and classification (JP6840871B2). Seven different normalization techniques and four different classifiers, such as MQDF and DLQDF, were used to achieve accuracies of 92.08% and 94.85% on the offline and online datasets respectively.
Arabic Handwritten Character Recognition (AHCR) has been of great interest to the Arabic scripts research community. To motivate further research, there have also been competitions and the development of various databases for Arabic scripts. ICDAR has been organizing Arabic handwritten word and text recognition competitions since 2005. The dataset consisted of seven subsets, of which five were used for training and two were closed test sets not written by the same writers. Arabic script is written from right to left and basically has 28 characters, each of which can take a different shape based on its position in the word. Most characters take four shapes, some take two shapes, and very few take more than four shapes. Some characters have dots above or below the script (US10185882B2).
The Devanagari script is used as the writing script for languages such as Hindi, Sanskrit, Marathi, Nepali, and Konkani. ISIDCHAR and V2DMDCHAR are the two benchmark databases for the Devanagari script (Bhattacharya and Chaudhuri, 2005). An artificial neural network based handwritten character recognizer for some of the Devanagari character classes was developed by Khanale and Chitnis (2011), with a recognition accuracy of 96%. A multilayer perceptron was used by Verma (1995) for recognizing handwritten Devanagari characters, along with Radial Basis Function and backpropagation algorithms. Pal et al. (2008) used zonal information in addition to shape features to classify printed Devanagari script with an accuracy of 96%. Patil et al. (2012) developed an OCR system for handwritten Devanagari character recognition that used a traditional approach, extracting features with Fourier Descriptors; the training set used 400 images, the test set used 100 images, and the reported accuracy ranged from 90% to 100%. Jangid and Srivastava (2018) performed layer-wise training of a CNN along with adaptive gradients: a block of convolution and pooling layers was added in phases during training, and various adaptive gradient optimizers were used. They achieved 98% recognition accuracy on two popular databases of handwritten Devanagari characters.
While deep learning approaches have been extensively employed in handwritten character recognition for many languages, including Chinese, Arabic, and English, most of the work on Telugu has relied on traditional methodologies. A conventional method for Handwritten Telugu Character Recognition (HTCR) utilizing traditional machine learning techniques typically involves several steps: pre-processing, character segmentation, feature extraction, classification, and prediction of new characters. In their study, Sureshkumar and Ravichandran (2010) analyzed character glyphs and identified several attributes, including height, width, number of horizontal lines, vertical lines, circles, horizontally oriented arcs, vertically oriented arcs, centroid of the image, position, and pixels in different regions. Before extracting features, several pre-processing approaches were employed, including skew detection, smoothing, thresholding, and skeletonization. Full printed pages were scanned as images, and each image was then divided into separate characters from which the features were retrieved. The collected features were classified using Support Vector Machines (SVM), Self-Organizing Maps (SOM), a fuzzy network, the RCS algorithm, and Radial Basis Functions. Nevertheless, the size of the dataset used for training and testing was not given.
Summary of the Invention
Deep learning techniques, specifically Convolutional Neural Networks (CNNs), have superseded all other machine learning techniques in the field of computer vision. Deep learning is currently used extensively in applications that analyze text, voice, and sensor data, and it has made particularly significant progress in image analysis through convolutional neural networks. The proposed invention examines the utilization of a CNN for the purpose of identifying handwritten Telugu letters in an offline setting. CNNs diverge from the conventional method of Handwritten Telugu Character Recognition (HTCR) by extracting features automatically. The CNN model is trained offline on the HP Labs India Telugu character dataset. The model has achieved favorable recognition results on both the training and testing datasets. With the help of state-of-the-art deep learning techniques, this invention intends to set a benchmark for offline HTCR.
Brief Description of Drawings
Figure 1: System Architecture of CNN
Detailed Description of the Invention
Convolutional Neural Networks (CNNs) are one form of deep learning model that has seen extensive application in computer vision over the last ten years. Among their many uses are image classification, object detection, captioning, face recognition, activity recognition, and pedestrian detection. CNNs are multi-layered neural networks whose goal is to build precise representations of attributes by methodically identifying and extracting features from the data layer by layer. At the first few layers of the network, simple patterns are recognized; later, at higher levels, these patterns are combined to create more complex abstractions. The training images are fed into the network, where each layer convolves the images with a specific set of weights and passes them forward through all the layers. At its final stage, the network computes the difference from the correct output by utilizing a loss function to measure errors. During backward propagation, the weights in each layer are modified based on these errors by an optimization function. A full iteration consists of both forward and backward propagation, during which the weights are updated, and this process continues until convergence is achieved. In contrast to fully connected neural networks, convolutional neural networks typically consist of convolution layers, pooling layers, and fully connected layers, as illustrated in Figure 1. The CNN model is constructed using these three building blocks, with the flexibility to modify the number of blocks by adding or removing them. Various topologies utilize distinct combinations of these layers, along with activation units and additional techniques like normalization and regularization.
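To make the convolution-pooling-fully-connected structure concrete, the following is a minimal sketch of such a model in Python using the Keras API. The input size (32 × 32 grayscale), layer widths, and class count are illustrative assumptions, not the configuration of the patented model.

    from tensorflow.keras import layers, models

    NUM_CLASSES = 400  # hypothetical number of Telugu character classes

    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu",
                      input_shape=(32, 32, 1)),       # convolution block 1
        layers.MaxPooling2D((2, 2)),                  # pooling block 1
        layers.Conv2D(64, (3, 3), activation="relu"), # convolution block 2
        layers.MaxPooling2D((2, 2)),                  # pooling block 2
        layers.Flatten(),                             # flatten for dense layers
        layers.Dense(128, activation="relu"),         # fully connected layer
        layers.Dense(NUM_CLASSES, activation="softmax"),  # per-class probabilities
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

Blocks of convolution and pooling layers can be added to or removed from this stack, mirroring the flexibility described above.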
The convolution layer is the key element of the CNN design. In contrast to the layers in a normal neural network, not every pixel (neuron) is directly connected to the next layer with a weight and bias. Instead, the entire image is divided into smaller regions, such as an n × n matrix, and weights and a bias are applied to these regions as a whole. The weights and bias in question are commonly known as filters or kernels. These filters are applied to each small region of the input image by convolution, resulting in the generation of feature maps. The filters represent basic characteristics that are sought in the input image during the convolution layer. The input consists of an array of pixels with dimensions w × h × d (w representing width, h representing height, and d representing depth of an image). A filter or kernel, with dimensions k × k, is then moved across the width and height of the input image, covering the entire area. As the filter slides over the image, each pixel value in the input image is multiplied by the corresponding value in the filter, and these products are added together to obtain a single pixel value in the output image. Every layer produces a collection of activation maps or feature maps, with each filter generating one map. These maps are then used as input for the subsequent convolution layer. The convolution operation is explicitly elucidated in Figure 1. In the context of a 5 × 5 image, a 3 × 3 filter is applied to the image by sliding it across one pixel at a time. This process continues until the filter reaches the last column, resulting in three convolution operations for the first row. Next, the filter is shifted one pixel downwards from the top left corner and then moved across the entire image until reaching the end. This horizontal and vertical sliding occurs until the filter reaches the bottom rightmost 3 × 3 block, resulting in a 3 × 3 activation map, as shown in the sketch below. Similarly, when there are n filters, n activation maps will be generated. The number of parameters needed for this convolution operation is small because the same filter is applied to the entire image for a single feature. Convolution facilitates parameter sharing by scanning for the same feature over the entire image. Another benefit is the sparse nature of connections, as each output value depends on a restricted number of inputs. The hyperparameters of a convolution layer include the number of filters (depth), the size of the local region, the stride, and the padding. To improve the results, the hyperparameters can be adjusted based on the size and genre of the input image.
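The sliding-window arithmetic described above can be written out directly. The following sketch, assuming NumPy, convolves a hypothetical 5 × 5 image with a 3 × 3 filter at stride 1, producing the 3 × 3 feature map described in the text; the pixel and filter values are made up for illustration.

    import numpy as np

    image = np.arange(25, dtype=float).reshape(5, 5)  # hypothetical 5x5 input
    kernel = np.array([[1, 0, -1],
                       [1, 0, -1],
                       [1, 0, -1]], dtype=float)      # simple vertical-edge filter

    out_h = image.shape[0] - kernel.shape[0] + 1      # 5 - 3 + 1 = 3
    out_w = image.shape[1] - kernel.shape[1] + 1
    feature_map = np.zeros((out_h, out_w))

    for i in range(out_h):                 # slide down one pixel at a time
        for j in range(out_w):             # slide across one pixel at a time
            region = image[i:i + 3, j:j + 3]             # current 3x3 region
            feature_map[i, j] = np.sum(region * kernel)  # multiply and sum

    print(feature_map.shape)  # (3, 3)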
The pooling layer aggregates neighboring outputs within the same kernel map. This layer decreases the size of the activation maps, excluding the depth, prior to passing them to the subsequent layer. Both the spatial dimension of the image and the number of parameters are reduced, resulting in a decrease in computational complexity. This layer carries out a predetermined function on the input and therefore does not add any parameters; consequently, no weights are updated in this layer during backpropagation. The pooling layer exhibits translational invariance, allowing for accurate image recognition even in the presence of minor positional variations. However, precise location information is lost when the size is decreased. There are various options for the pooling function; max pooling, illustrated below, is a common choice.
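As an illustration of how pooling shrinks an activation map, the following sketch applies 2 × 2 max pooling with stride 2 to a hypothetical 4 × 4 map, assuming NumPy; the values are made up.

    import numpy as np

    fmap = np.array([[1, 3, 2, 4],
                     [5, 6, 1, 2],
                     [7, 2, 9, 1],
                     [3, 4, 6, 8]], dtype=float)

    # Group the 4x4 map into 2x2 blocks and take the maximum of each block.
    pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
    print(pooled)  # [[6. 4.]
                   #  [7. 9.]]

Note that this operation has no weights: it simply selects the maximum of each block, which is why no parameters are updated in this layer during backpropagation.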
In the fully connected layer, the flattened output of the last pooling layer is used as the input. This layer functions similarly to a conventional neural network layer, with each neuron in the previous layer connected to every neuron in the current layer. Therefore, the number of parameters in this layer is greater than in the convolution layer. The fully connected layer is linked to an output layer, typically serving as a classifier that produces a probability score for each class. The fundamental element of an artificial neural network is the neuron, which is stimulated by an activation function. A neuron receives inputs and applies weights to them, producing an output determined by a threshold value. The threshold value can be derived by applying a function to the weighted total of all the inputs. Activation functions can be categorized as either linear or non-linear. Machine learning commonly employs many activation functions, including sigmoid, tanh, ReLU, and swish. Activation functions are utilized in each hidden layer as well as the output layer.
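The following sketch, assuming NumPy, shows the activation functions named above and a single fully connected neuron computing an output from the weighted sum of its inputs plus a bias; the weights and inputs are illustrative.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):
        return np.tanh(x)

    def relu(x):
        return np.maximum(0.0, x)

    def swish(x):
        return x * sigmoid(x)  # swish(x) = x * sigmoid(x)

    # One fully connected neuron: weighted sum of inputs plus bias,
    # passed through an activation function.
    x = np.array([0.5, -1.2, 3.0])   # inputs from the previous layer
    w = np.array([0.4, 0.1, -0.6])   # weights
    b = 0.2                          # bias
    output = relu(np.dot(w, x) + b)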
The backpropagation algorithm is used to calculate the partial derivative, i.e., the gradient, of every node of the network. These partial derivatives are used for training the network, where a loss function defines the deviation from the correct output. This error propagates backward from the model output to the initial layers. The weights and biases of the network that minimize the cost function over many iterations must be determined. To achieve this, several algorithms are available, popularly known as optimization algorithms. The basic optimization algorithm is gradient descent, which uses the first derivative to determine the parameters, i.e., the weights and biases of the network, so as to reduce the cost function. The main goal of these algorithms is to determine the global minimum of the cost function, which is attainable for cost functions that are convex in nature. For non-convex cost functions, the least value of the cost function within a neighbourhood, a local minimum, is identified instead. Optimization algorithms should speed up convergence and minimize error rates. For a bowl-shaped surface such as a convex function, it is required to find the bottom of the bowl, i.e., the minimum of the function. To attain this, starting from a random point, steps are taken downhill (in the negative gradient direction) to reach the global minimum. Two factors should be considered when descending to the minimum: (i) which direction to move, and (ii) how large the next step should be (the step size). The best direction leading to the minimum can be determined by computing the gradients, which are the partial derivatives with respect to the parameter vector. The step size is another important factor and is fixed by a parameter, the learning rate. The learning rate should be chosen such that learning is fast while the loss remains minimal. If the steps are too large, the loss may sometimes increase; if the learning rate is too small, the learning process takes a long time. Hence, the step size should be small enough to reach the minimum, but not too small. The gradient is computed repeatedly over many iterations to determine the global minimum, and the parameters are updated after each iteration. In a typical gradient descent algorithm, the gradient of all the samples in the training set is computed and the parameters are updated in every iteration; that is, the parameters are updated only after traversing the entire dataset.
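The following is a minimal sketch of the batch gradient descent procedure described above, applied to a simple one-parameter least-squares problem, assuming NumPy; the data and learning rate are illustrative.

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 8.1])  # roughly y = 2x

    w = 0.0    # parameter to learn
    lr = 0.01  # learning rate (step size)

    for epoch in range(200):
        pred = w * X
        # Gradient of the mean squared error with respect to w, computed
        # over the entire training set (batch gradient descent).
        grad = np.mean(2.0 * (pred - y) * X)
        w -= lr * grad  # step in the negative gradient direction

    print(w)  # converges near 2.0

Each pass computes the gradient over the whole dataset before updating the parameter, matching the description of typical gradient descent above; stochastic and mini-batch variants update the parameters more frequently.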
Claims: The scope of the invention is defined by the following claims:

1. A System/Method to Identify Handwritten Telugu Characters using a Convolutional Neural Network Algorithm, comprising:
a) A method designed to preprocess the given input. The convolution layer is the key element of the CNN design. In contrast to the layers in a normal neural network, not every pixel (neuron) is directly connected to the next layer with a weight and bias. Instead, the entire image is divided into smaller regions, such as an n × n matrix, and weights and a bias are applied to these regions as a whole. The weights and bias in question are commonly known as filters or kernels. These filters are applied to each small region of the input image by convolution, resulting in the generation of feature maps.
b) The convolution operation is explicitly elucidated. In the context of a 5 × 5 image, a 3 × 3 filter is applied to the image by sliding it across one pixel at a time. This process continues until the filter reaches the last column, resulting in three convolution operations for the first row.
c) A backpropagation algorithm is used to calculate the partial derivative, i.e., the gradient, of every node of the network. These partial derivatives are used for training the network, where a loss function defines the deviation from the correct output.
2. A System/Method to Identify Handwritten Telugu Characters using a Convolutional Neural Network Algorithm as claimed in claim 1, wherein the given input data is pre-processed with the help of the convolution layer.
3. A System/Method to Identify Handwritten Telugu Characters using a Convolutional Neural Network Algorithm as claimed in claim 1, wherein feature vectors are generated with the help of the pooling layer.
4. A System/Method to Identify Handwritten Telugu Characters using a Convolutional Neural Network Algorithm as claimed in claim 1, wherein the backpropagation algorithm is used to identify the handwritten characters.

Documents

Application Documents

# Name Date
1 202441049920-REQUEST FOR EARLY PUBLICATION(FORM-9) [29-06-2024(online)].pdf 2024-06-29
2 202441049920-OTHERS [29-06-2024(online)].pdf 2024-06-29
3 202441049920-FORM-9 [29-06-2024(online)].pdf 2024-06-29
4 202441049920-FORM FOR STARTUP [29-06-2024(online)].pdf 2024-06-29
5 202441049920-FORM FOR SMALL ENTITY(FORM-28) [29-06-2024(online)].pdf 2024-06-29
6 202441049920-FORM 1 [29-06-2024(online)].pdf 2024-06-29
7 202441049920-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [29-06-2024(online)].pdf 2024-06-29
8 202441049920-EDUCATIONAL INSTITUTION(S) [29-06-2024(online)].pdf 2024-06-29
9 202441049920-DRAWINGS [29-06-2024(online)].pdf 2024-06-29
10 202441049920-COMPLETE SPECIFICATION [29-06-2024(online)].pdf 2024-06-29