Method And Apparatus For Processing Content Written In An Application

Method And Apparatus For Processing Content Written In An Application Form Using An E Pen

Abstract: The present invention provides a method and apparatus for processing content written in an application form. In one embodiment, stroke data corresponding to content written in fields of an application form is obtained from an e-pen. Words corresponding to the written content are then extracted from the stroke data, and a confidence value is assigned to each of the words with respect to each of the fields in a template application form. Each of the words is mapped to one of the fields in the template application form based on the confidence value assigned to it. Moreover, a tag is assigned to each of the words indicating the mapping between the word and one of the fields, and the words along with the assigned tags are stored in a storage unit. Figure 4


Patent Information

Application #:
Filing Date: 18 April 2011
Publication Number: 25/2013
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Parent Application:

Applicants

Jeswill Hitech Solutions Pvt. Ltd.

3524/1 2nd Floor Service Road HAL 2nd stage Indiranagar Bangalore 560008

Inventors

1. Parthasarathy Srinivasa Moorthy

T-2 Sriram Apartments Oil Mill Road Bangalore 560008

2. Pailla Balakrishna Reddy

217 B 2 Yamuna Nagar Koramangala Bangalore-560047

Claims

1. A computer implemented method for processing content written in an application form by an electronic pen (e-pen), comprising: obtaining stroke data corresponding to content written in a plurality of fields of an application form; extracting words corresponding to the written content from the obtained stroke data; assigning a confidence value to each of the words with respect to each of plurality of fields; mapping each of the words to one of the plurality of fields based on the confidence value assigned to each of the words; and storing each of the words mapped to said corresponding one of the plurality of the fields in a database.

2. The method of claim 1, further comprising: correcting position errors in the obtained stroke data using a trained data set.

3. The method of claim 1, further comprising: computing a skew angle associated with the obtained stroke data; and correcting skew errors associated with the obtained stroke data based on the computed skew angle.

4. The method of claim 1, wherein assigning the confidence value to each of the words with respect to each of the plurality of fields comprises: assigning the confidence value to each of the words based on distance between the plurality of fields and distance between the words.

5. The method of claim 1, wherein storing said each of the words mapped to said corresponding one of the plurality of the fields in the database comprises: assigning a tag to each of the words mapped to the one of the plurality of fields, wherein the tag indicates a mapping between each of the words and one of the plurality of fields to which said each of the words belongs; storing the each of words and the assigned tag in the database.

6. An apparatus comprising: a processor; and memory coupled to the processor, wherein the memory includes a form processing module comprising a mapping module configured for: extracting words from stroke data obtained from an electronic pen (e-pen), wherein the stroke data corresponds to content written in fields of application form; assigning a confidence value to each of the words with respect to each of the plurality of fields; mapping each of the words to one of the plurality of fields based on the confidence value assigned to each of the words; and storing each of the words mapped to the one of the plurality of the fields.

7. The apparatus of claim 6, wherein the form processing module comprises a position correction module configured for correcting position errors in the obtained stroke data using a trained data set.

8. The apparatus of claim 6, wherein the form processing module comprises a skew correction module configured for: computing a skew angle associated with the obtained stroke data; and correcting skew errors associated with the obtained stroke data based on the computed skew angle.

9. The apparatus of claim 6, wherein the mapping module is configured for assigning the confidence value to each of the words based on distance between the plurality of fields and distance between the words.

10. The apparatus of claim 6, wherein the mapping module is operable for: assigning a tag to each of the words mapped to the one of the plurality of fields, wherein the tag indicates a mapping between each of the words and one of the plurality of fields to which said each of the words belongs; and storing the each of words and the assigned tag in a storage unit.

Specification

RELATED APPLICATION

Benefit is claimed to India provisional Application No. 1337/CHE/2011, titled METHOD AND APPARATUS FOR ACCURATE FIELD EXTRACTION USING A RELAXATION LABELLING TECHNIQUE by JESWILL HITECH SOLUTIONS PVT. LTD., filed on 18th April 2011, which is herein incorporated in its entirety by reference for all purposes.

FIELD OF THE INVENTION

The present invention relates to data processing systems, and more particularly relates to processing content written in an application form using an e-pen.

BACKGROUND OF THE INVENTION

An e-pen captures content written on an application form in the form of strokes. The stroke positions obtained from the e-pen may have non-linear distortions: the inaccuracies vary widely along the width and height of the application form. For example, tilting the e-pen while writing on the application form may introduce inaccuracies in the data. If the coordinates of the stroke data are mapped directly to the application form, many errors may occur due to the tilt of the e-pen and the inaccuracies of the stroke data.

Typically, content written using an e-pen on an application form is first recognized and then stored in a database in the corresponding fields of the application form. Many engines are currently known for recognizing written strokes of the e-pen. However, none of the existing recognition engines provides the corresponding position of the pen content needed to remove inaccuracies in the e-pen data.

Consider an empty application form with exact spatial locations of fields, as shown in Figure 1A. As shown, the application form includes a name field, a date field and an age field, along with their respective spatial locations. For example, the name field starts at 3" and ends at 4" in the Y direction and starts at 2" and ends at 10" in the X direction.

Consider that a user writes content in the fields of the application form using an e-pen, as shown in Figure 1B. When the content is filled in each field, the written content, such as 18 in the age field, is recorded in the e-pen along with the start location (X, Y) and end location (X, Y) of the word. The stored content or pen strokes, along with their respective spatial locations, are then mapped to the corresponding fields of a template application form. For example, upon mapping, the recognition engine may recognize that the pen strokes 18 belong to the age field and 25-02-2010 belong to the date field. This may be the case when the e-pen data does not have inaccuracies. In the majority of cases, however, the e-pen may provide inaccurate data due to the tilt of the e-pen as well as inherent errors in the e-pen, resulting in the pen strokes being mapped to the wrong fields of the template application form.
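The direct mapping described above can be sketched as a containment test of a word's recorded location against the field rectangles of the template. This is an illustrative sketch only: the `Field` class, the `direct_map` function, and every coordinate other than the name field's extent (2" to 10" in X, 3" to 4" in Y) are assumptions and are not taken from the figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """Rectangular field of the template form, in inches (hypothetical type)."""
    name: str
    x0: float
    x1: float
    y0: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


# Only the name field's extent comes from the text; the other two are made up.
TEMPLATE = [
    Field("name", 2.0, 10.0, 3.0, 4.0),
    Field("date", 2.0, 6.0, 5.0, 6.0),
    Field("age", 7.0, 10.0, 5.0, 6.0),
]


def direct_map(start, end, fields=TEMPLATE) -> Optional[str]:
    """Map a word to the field containing the midpoint of its start/end points."""
    cx = (start[0] + end[0]) / 2
    cy = (start[1] + end[1]) / 2
    for f in fields:
        if f.contains(cx, cy):
            return f.name
    return None


# A word captured accurately inside the (assumed) age field.
accurate = direct_map((7.5, 5.3), (8.1, 5.7))
# The same word with an assumed 2" horizontal capture error.
shifted = direct_map((5.5, 5.3), (6.1, 5.7))
```

With these assumed rectangles, the accurately captured word falls in the age field, while the shifted copy falls in the date field, which is the kind of wrong-field mapping the background describes.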

BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS

Figure 1A illustrates a schematic representation of an empty application form, in the context of the invention.

Figure 1B illustrates a schematic representation of an application form with content written using an e-pen, in the context of the invention.

Figure 2 illustrates a block diagram of a form processing system, according to one embodiment.

Figure 3 is an exploded view of a form processing module such as the one shown in Figure 2, according to one embodiment.

Figure 4 is a process flowchart illustrating an exemplary method of processing content written in an application form, according to one embodiment.

Figure 5 is a process flowchart illustrating an exemplary method of assigning confidence to words associated with stroke data written in the application form, according to one embodiment.

The drawings described herein are for illustration purposes only and are not intended to limit the scope of the present disclosure in any way.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a method and apparatus for processing content written in an application form using an electronic pen (e-pen). In the following detailed description of the embodiments of the invention, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims.

Figure 2 illustrates a block diagram of a form processing system 200, according to one embodiment. In Figure 2, the form processing system 200 includes an e-pen 202, a computing device 204, and a storage unit 206. The computing device 204 may be a personal computer, laptop, tablet, smart phone, and the like. The computing device 204 includes a processor 208 and memory 210 having a form processing module 212 stored therein.

When a user fills content in fields of an application form using the e-pen 202, the e-pen 202 captures stroke data corresponding to the written content and stores the stroke data in a non-volatile memory. The computing device 204 obtains the stroke data stored in the e-pen 202 when the e-pen 202 is connected to the computing device 204.

In an exemplary operation, the form processing module 212 enhances the stroke data corresponding to the written content by removing noise (e.g., wild points and distances between points within each stroke) from the stroke data and smoothing the stroke data. The form processing module 212 corrects position errors in the obtained stroke data using a trained data set 214. Typically, when a user fills content in an application form, the actual point and the captured point associated with the written content do not match due to inaccuracies in capturing the written content. Thus, in the present invention, the trained data set 214, indicating a relationship between the actual point and the captured point associated with the stroke data, is formed in an offline mode using a neural network and stored in the memory. The form processing module 212 uses the trained data set 214 to resolve position errors in the strokes captured by the e-pen while the content is filled in the application form. This helps the form processing module 212 to efficiently map the captured content to the fields in the application form.
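The idea of learning a captured-to-actual relationship offline and applying it to new strokes can be sketched as follows. Note that this sketch does not use a neural network as the text does: it swaps in a much simpler stand-in, inverse-distance-weighted interpolation of the offsets observed at a few calibration points. The calibration values and the names `TRAINED`, `correct_point` and `correct_stroke` are hypothetical.

```python
import math

# Hypothetical trained data set: (captured point, actual point) pairs in inches,
# gathered offline by writing at known positions on the form.
TRAINED = [
    ((1.0, 1.0), (1.0, 1.0)),
    ((9.0, 1.0), (9.2, 1.0)),
    ((1.0, 9.0), (1.0, 9.3)),
    ((9.0, 9.0), (9.2, 9.3)),
]


def correct_point(point, trained=TRAINED, power=2.0):
    """Shift a captured point by the inverse-distance-weighted average offset
    of the calibration pairs (a stand-in for the neural network in the text)."""
    x, y = point
    wsum = dx = dy = 0.0
    for (cx, cy), (ax, ay) in trained:
        d = math.hypot(x - cx, y - cy)
        if d == 0.0:
            return (ax, ay)  # exact calibration point: use its known position
        w = 1.0 / d ** power
        wsum += w
        dx += w * (ax - cx)
        dy += w * (ay - cy)
    return (x + dx / wsum, y + dy / wsum)


def correct_stroke(stroke):
    """Apply the correction to every point of one stroke."""
    return [correct_point(p) for p in stroke]


centre = correct_point((5.0, 5.0))
```

Because the offsets differ from one corner of the form to another, the correction varies across the page, which is what a position-dependent (non-linear) distortion requires.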

The form processing module 212 computes a skew angle associated with the obtained stroke data and corrects skew errors associated with the obtained stroke data based on the computed skew angle. The stroke data captured by the e-pen 202 may contain skew errors due to improper placement of the application form or improper clipping of the device to the application form.
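The text does not say how the skew angle is computed. One plausible approach, shown here purely as an assumption, is to fit a least-squares line through points that should lie on a horizontal line of writing, take the slope angle of that line as the skew, and rotate all points back by that angle. The function names `skew_angle` and `deskew` are hypothetical.

```python
import math


def skew_angle(points):
    """Estimate skew (radians) as the slope angle of the least-squares line
    through points assumed to lie on a nominally horizontal line of writing."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return math.atan2(sxy, sxx)


def deskew(points, angle, origin=(0.0, 0.0)):
    """Rotate points by -angle about origin to undo the skew."""
    ox, oy = origin
    c, s = math.cos(-angle), math.sin(-angle)
    return [
        (ox + c * (x - ox) - s * (y - oy), oy + s * (x - ox) + c * (y - oy))
        for x, y in points
    ]


# A baseline tilted by 5 degrees, as if the form were clipped at an angle.
tilt = math.radians(5)
baseline = [(t * math.cos(tilt), t * math.sin(tilt)) for t in range(1, 6)]
angle = skew_angle(baseline)
flat = deskew(baseline, angle)
```

After the rotation, the points of `flat` lie on the X axis again, up to floating-point error.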

The form processing module 212 extracts words corresponding to the written content from the stroke data. The form processing module 212 also assigns a confidence value to each of the words associated with the stroke data with respect to each of the fields in a template application form, based on the distance between the fields in the template application form and the distance between the words. In one embodiment, for each word, the confidence value is normalized with respect to the fields. This helps increase the confidence of the given word towards the field to which the word belongs. The process of assigning the confidence value is repeated until all the words are mapped to the correct fields in the application form. The process of assigning the confidence to the words associated with the stroke data is illustrated in greater detail in Figure 5.
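A minimal sketch of a per-word normalized confidence follows. The text only states that confidence depends on distance and is normalized with respect to the fields; the `1 / (1 + distance)` scoring, the function name `initial_confidence`, and all coordinates are assumptions made for illustration.

```python
import math


def initial_confidence(word_centres, field_centres):
    """Return one row per word of confidences over the fields, each row summing to 1.

    The 1 / (1 + distance) scoring is an assumption, not the patent's formula.
    """
    matrix = []
    for wx, wy in word_centres:
        raw = [1.0 / (1.0 + math.hypot(wx - fx, wy - fy)) for fx, fy in field_centres]
        total = sum(raw)
        matrix.append([r / total for r in raw])
    return matrix


fields = [(6.0, 3.5), (4.0, 5.5), (8.5, 5.5)]  # name, date, age (hypothetical centres)
words = [(5.8, 3.6), (4.4, 5.4), (8.0, 5.6)]   # centres of three written words
conf = initial_confidence(words, fields)
best = [row.index(max(row)) for row in conf]   # most confident field index per word
```

Normalizing each row lets confidences for different words be compared on the same 0 to 1 scale, whatever their absolute distances to the fields.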

Accordingly, the form processing module 212 maps each of the words corresponding to the written content to one of the fields in the template application form based on the confidence value assigned to each of the words. Finally, the form processing module 212 assigns a tag to each of the words indicating a mapping between each of the words and one of the fields, and stores the words corresponding to the written content and the associated tags in the storage unit 206.
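The tagging and storage step might look like the following sketch, which uses an in-memory SQLite database as a stand-in for the storage unit. The table name `form_words`, its two-column schema, and the choice of the field name as the tag are all assumptions; the text only requires that the tag indicate which field a word belongs to.

```python
import sqlite3


def store_tagged_words(conn, mapping):
    """Store (word, tag) pairs, where the tag names the field the word was mapped to."""
    conn.execute("CREATE TABLE IF NOT EXISTS form_words (word TEXT, tag TEXT)")
    conn.executemany("INSERT INTO form_words (word, tag) VALUES (?, ?)", mapping)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
store_tagged_words(conn, [("18", "age"), ("25-02-2010", "date")])
rows = conn.execute("SELECT word, tag FROM form_words ORDER BY tag").fetchall()
```

Storing the tag next to the word means a later query can retrieve a field's value directly, without repeating the mapping.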

Figure 3 is an exploded view of the form processing module 212, according to one embodiment. In Figure 3, the form processing module 212 includes a position correction module 302, a skew correction module 304, and a mapping module 306.

For example, the position correction module 302 corrects position errors in stroke data corresponding to content written in an application form using a trained data set 214. The skew correction module 304 computes a skew angle associated with the stroke data and corrects skew errors associated with the stroke data based on the computed skew angle.

The mapping module 306 extracts words corresponding to the written content from the stroke data and assigns a confidence value to each of the words associated with the stroke data with respect to each of the fields in the template application form. The mapping module 306 maps each of the words corresponding to the written content to one of the fields in the template application form based on the confidence value assigned to each of the words. Finally, the mapping module 306 assigns a tag to each of the words indicating a mapping between each of the words and one of the fields, and stores the words along with the assigned tags in the storage unit 206.

Figure 4 is a process flowchart 400 illustrating an exemplary method of processing content written in an application form, according to one embodiment. At step 402, stroke data corresponding to content written in fields of an application form is obtained from the e-pen 202. At step 404, position errors in the stroke data are corrected using the trained data set 214. At step 406, a skew angle associated with the stroke data is computed. At step 408, skew errors associated with the stroke data are corrected based on the computed skew angle.

At step 410, words corresponding to the written content are extracted from the stroke data. At step 412, a confidence value is assigned to each of the words with respect to each of the fields in the template application form. At step 414, each of the words corresponding to the written content is mapped to one of the fields in the template application form based on the confidence value assigned to each of the words. At step 416, a tag is assigned to each of the words indicating a mapping between each of the words and one of the fields. At step 418, the words along with the assigned tags are stored in the storage unit 206.

In one embodiment, the form processing module 212 may be stored in the memory 210 in the form of instructions that, when executed by the processor 208, cause the processor 208 to perform the method steps of Figure 4. In another embodiment, the form processing module 212 may be stored in a computer-readable storage medium in the form of instructions that, when executed by the processor 208, cause the processor 208 to perform the method steps of Figure 4.

Figure 5 is a process flowchart 500 illustrating an exemplary method of assigning confidence to words associated with the stroke data, according to one embodiment. At step 502, an initial confidence matrix is computed based on the distance between the fields in the template application form and each of the words corresponding to the written content. The initial confidence matrix includes the confidence of each of the words belonging to a specific field. At step 504, the initial confidence matrix is normalized and filtered so that it maps to a probability matrix. The probability matrix includes the confidence assigned to each of the words with respect to each of the fields, where the confidence assigned to each of the words is normalized to values ranging from 0 to 1.

At step 506, the distance between the words corresponding to the written content is computed. It can be noted that the computed distance between the words correlates with the distance between the fields of the template application form. At step 508, the best probable field for each of the words is determined based on the computed distance between the words and the distance between the fields. This helps in accurately mapping the words to the correct fields of the application form. However, some of the words may be mapped to the wrong fields. In such a case, for each of the incorrectly mapped words, a score of mapping one or more words to each of the fields is computed at step 510.

At step 512, the score for mapping the words to the correct field is transformed into a likelihood of mapping a particular word to a correct field, and the probability matrix is updated based on the likelihood of the word belonging to a particular field. At step 514, the updated probability matrix is normalized (e.g., to values ranging from 0 to 1). Steps 508-514 are repeated for a pre-determined number of iterations so that, upon completion of the pre-determined number of iterations, each of the words corresponding to the written content is mapped to the correct field of the template application form, at step 516.
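The iterative procedure of Figure 5 resembles a relaxation labelling loop, and can be sketched as below. The specification gives no formulas, so everything numeric here is an assumption: the exponential distance scoring, the compatibility function, the multiplicative update, and the `sigma` and `iterations` parameters. The sketch also computes scores for all words rather than only for incorrectly mapped ones, since it has no way of knowing which mappings are wrong.

```python
import math


def relax(words, fields, iterations=10, sigma=1.0):
    """Map word centres to field centres by relaxation labelling.

    Returns one field index per word. All formulas are illustrative assumptions.
    """
    def normalise(row):
        total = sum(row)
        return [v / total for v in row]

    # Steps 502-504: initial confidences from word-to-field distance, normalised per word.
    p = [normalise([math.exp(-math.dist(w, f) / sigma) for f in fields]) for w in words]

    def compat(i, a, j, b):
        # Steps 506-508: labels (a, b) for words (i, j) are compatible when the
        # displacement between the words matches the displacement between the fields.
        dw = (words[j][0] - words[i][0], words[j][1] - words[i][1])
        df = (fields[b][0] - fields[a][0], fields[b][1] - fields[a][1])
        return math.exp(-math.dist(dw, df) / sigma)

    n, m = len(words), len(fields)
    for _ in range(iterations):
        new_p = []
        for i in range(n):
            # Step 510: score of assigning word i to each field, given the other words.
            score = [
                sum(compat(i, a, j, b) * p[j][b]
                    for j in range(n) if j != i for b in range(m))
                for a in range(m)
            ]
            # Steps 512-514: fold the score into the probabilities and renormalise.
            new_p.append(normalise([p[i][a] * (1e-9 + score[a]) for a in range(m)]))
        p = new_p
    # Step 516: pick the most probable field for each word.
    return [row.index(max(row)) for row in p]


fields = [(6.0, 3.5), (4.0, 5.5), (8.5, 5.5)]  # name, date, age (hypothetical centres)
words = [(3.5, 3.5), (1.5, 5.5), (6.0, 5.5)]   # the same layout, captured 2.5" too far left
mapping = relax(words, fields)
```

In this made-up example every word is displaced by the same 2.5", so the nearest field to the first word is the date field rather than the name field. The relative layout of the words still matches the relative layout of the fields, and that is the information the iterations use to correct the mapping.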

Although the present embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the various embodiments. Furthermore, the various devices, modules, and the like described herein may be enabled and operated using hardware circuitry (for example, complementary metal oxide semiconductor based logic circuitry), firmware, software and/or any combination of hardware, firmware, and/or software embodied in a machine readable medium. For example, the various electrical structures and methods may be embodied using transistors, logic gates, and electrical circuits, such as an application specific integrated circuit.


Documents

Application Documents

#	Name	Date
1	Form-1.doc	2011-09-03
3	1337-CHE-2011 CLAIMS 29-03-2012.pdf	2012-03-29
4	1337-CHE-2011 POWER OF ATTORNEY 29-03-2012.pdf	2012-03-29
5	1337-CHE-2011 FORM-5 29-03-2012.pdf	2012-03-29
6	1337-CHE-2011 FORM-2 29-03-2012.pdf	2012-03-29
7	1337-CHE-2011 FORM-1 29-03-2012.pdf	2012-03-29
8	1337-CHE-2011 DRAWINGS 29-03-2012.pdf	2012-03-29
9	1337-CHE-2011 DESCRIPTION (COMPLETE) 29-03-2012.pdf	2012-03-29
10	1337-CHE-2011 CORRESPONDENCE OTHERS 29-03-2012.pdf	2012-03-29
11	1337-CHE-2011 ABSTRACT 29-03-2012.pdf	2012-03-29
12	1337-CHE-2011 CORRESPONDENCE OTHERS 18-04-2012.pdf	2012-04-18
13	1337-CHE-2011 POWER OF ATTORNEY 18-04-2012.pdf	2012-04-18
14	1337-CHE-2011 FORM-18 18-04-2012.pdf	2012-04-18
15	abstract1337-CHE-2011..jpg	2012-10-30
16	1337-CHE-2011-FER.pdf	2018-05-15
17	1337-CHE-2011-AbandonedLetter.pdf	2018-11-30

Search Strategy

1	SEARCH_STRATEGY_1337-CHE-2011_11-05-2018.pdf