Abstract: In many fields, there is high demand for storing the information contained in printed or handwritten documents and images on a computer so that it can later be reused. A simple way to do this is to scan the documents and store them as image files. However, it is very difficult to read or query text or other information directly from image files. A technique for automatically retrieving and storing information, in particular text, from image files is therefore needed. Optical character recognition (OCR) is an active research area that aims to develop computer systems that automatically extract and process text from images. The objective of OCR is to convert any form of text or text-containing document, such as handwritten text and printed or scanned text images, into an editable digital format for further processing. OCR thus enables a machine to recognize text in such documents automatically. Several major challenges must be recognized and handled to achieve successful automation; the font characteristics of the characters in paper documents and the quality of the images are only two of them. Because of these challenges, the computer system may sometimes recognize characters incorrectly. To address them, this project uses the Tesseract engine, which reads words automatically with reasonable accuracy. The output of the project is a system that extracts letters, digits, and symbols from an input image, converts them into an MS Excel file, and compares two Excel files to report the differences between them.
Description:
The invention generally relates to the field of optical character recognition, and in particular to converting image data into an MS Excel file and comparing two Excel files to determine the differences between them.
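As a rough illustration of the comparison step, the sketch below reports cell-by-cell differences between two sheets. It assumes each sheet has already been loaded into memory as a list of rows (for example with a spreadsheet library, which this specification does not name). The function names `cell_name` and `diff_sheets` and the sample data are hypothetical.

```python
from itertools import zip_longest

def cell_name(row_idx, col_idx):
    """Convert zero-based (row, column) indices to an Excel-style name such as 'B2'."""
    letters = ""
    n = col_idx + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row_idx + 1}"

def diff_sheets(sheet_a, sheet_b):
    """Return (cell, value_in_a, value_in_b) for every cell that differs.

    Rows or cells missing from one sheet are treated as None.
    """
    differences = []
    for r, (row_a, row_b) in enumerate(zip_longest(sheet_a, sheet_b, fillvalue=())):
        for c, (a, b) in enumerate(zip_longest(row_a, row_b, fillvalue=None)):
            if a != b:
                differences.append((cell_name(r, c), a, b))
    return differences

# Hypothetical sample data standing in for two loaded Excel sheets.
old = [["Name", "Age"], ["Asha", 31], ["Ravi", 45]]
new = [["Name", "Age"], ["Asha", 32], ["Ravi", 45], ["Meena", 28]]

changes = diff_sheets(old, new)
for cell, before, after in changes:
    print(f"{cell}: {before!r} -> {after!r}")
```

Reporting differences by cell name keeps the result readable for someone who then opens both files in Excel to inspect the changes.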
Background
[0002] The problem at hand is to recognize the textual information contained in a scanned image. The scanned images can come from a wide variety of source material, such as written answers to printed questions on forms, or mailing addresses on postal envelopes.
[0003] We often need to copy text from printed material such as a magazine or newspaper, or from images. Doing so by hand can mean spending hours retyping and correcting typos. A more modern approach is to convert the material into a digital format with fully editable text in a matter of minutes.
[0004] The process of OCR is prone to errors. Therefore, an OCR system may be designed to identify several alternatives for segmenting a connected component, and several character choices for each character inside a segmentation alternative.
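One way to picture these alternatives is as a small lattice: each segmentation alternative is a sequence of character slots, and each slot holds several scored character choices. The sketch below is illustrative only. The confidence values, the tiny lexicon, and the names `best_reading` and `LEXICON` are hypothetical and are not taken from any particular OCR engine.

```python
from itertools import product
from math import prod

# Hypothetical lexicon used to reject readings that are not known words.
LEXICON = {"corn", "cam", "can"}

# Each segmentation alternative is a list of slots;
# each slot is a list of (character, confidence) choices.
segmentations = [
    # Alternative 1: the ambiguous component is split into "r" + "n".
    [[("c", 0.9)], [("o", 0.8), ("a", 0.2)], [("r", 0.7)], [("n", 0.7)]],
    # Alternative 2: the same component is read as a single "m".
    [[("c", 0.9)], [("o", 0.8), ("a", 0.2)], [("m", 0.6)]],
]

def best_reading(segmentations, lexicon):
    """Return the highest-scoring (word, score) whose word is in the lexicon.

    The score is the product of the per-character confidences.
    """
    best_word, best_score = None, 0.0
    for slots in segmentations:
        # Enumerate every combination of character choices for this segmentation.
        for combo in product(*slots):
            word = "".join(ch for ch, _ in combo)
            score = prod(conf for _, conf in combo)
            if word in lexicon and score > best_score:
                best_word, best_score = word, score
    return best_word, best_score

word, score = best_reading(segmentations, LEXICON)
print(word, round(score, 4))
```

Here the unconstrained best reading would be "com", but it is not in the lexicon, so the lower-scoring segmentation that spells a known word is chosen instead. A plain product of confidences favors shorter segmentations, so a real system would likely normalize scores by length.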
[0005] Conventional OCR recognition engines exist, which recognize characters with reasonable accuracy. However, even a 90% accuracy rate at the character level can mean less than 50% accuracy at the word level for longer words, so that over half of such words contain at least one error.
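The arithmetic behind this claim can be checked with a short sketch. It assumes, as a simplification not stated in the specification, that character errors are independent, so a word of n characters is read correctly with probability p^n. The function name `word_accuracy` is hypothetical.

```python
def word_accuracy(char_accuracy, word_length):
    """Probability that every character of a word is correct,
    assuming character errors are independent."""
    return char_accuracy ** word_length

# Word-level accuracy at 90% character accuracy for several word lengths.
for length in (1, 3, 5, 7, 10):
    print(f"{length:2d} characters: {word_accuracy(0.90, length):.3f}")
```

Under this assumption, 0.9^6 is about 0.531 and 0.9^7 is about 0.478, so word-level accuracy drops below 50% for words of seven or more characters.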
Using an optical character recognition (OCR) system, we can capture an image of a page and pass it to the OCR engine, which extracts the text from that image.
[0006] At schools, airports, railway stations, banks, and similar places, staff often have to enter the details of our documents manually into their database (for example, an MS Excel sheet). With OCR, they can instead scan a document, extract all the details, and easily save the required ones.
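The scan-extract-save workflow described above might be sketched as follows. This is a minimal sketch and not the implementation claimed here: it assumes the third-party Python packages pytesseract (a wrapper around the Tesseract engine), Pillow, and openpyxl, none of which the specification names, and the function names `text_to_rows` and `image_to_excel` are hypothetical. The column-splitting rule (two or more spaces, or a tab) is also an assumption; real OCR output rarely preserves column spacing this neatly.

```python
import re

def text_to_rows(ocr_text):
    """Split OCR output into rows of cells.

    One row per non-empty line; cells are separated by a tab
    or by a run of two or more whitespace characters.
    """
    rows = []
    for line in ocr_text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}|\t", line))
    return rows

def image_to_excel(image_path, xlsx_path):
    """Run OCR on an image and save the recognized text as an .xlsx file.

    Assumption: pytesseract, Pillow, and openpyxl are installed, along with
    the Tesseract binary. They are imported here so that text_to_rows can
    be used without them.
    """
    import pytesseract
    from PIL import Image
    from openpyxl import Workbook

    text = pytesseract.image_to_string(Image.open(image_path))
    workbook = Workbook()
    sheet = workbook.active
    for row in text_to_rows(text):
        sheet.append(row)
    workbook.save(xlsx_path)

# Hypothetical OCR output standing in for a scanned form.
sample = "Name    Age\nAsha    31\n\nRavi    45\n"
rows = text_to_rows(sample)
print(rows)
```

A call such as `image_to_excel("form.png", "form.xlsx")`, with hypothetical file names, would then produce a spreadsheet with one row per recognized line.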
We Claim:
1. An optical character recognition (OCR) system comprising: means for producing a scan of an input image of text to be recognized; and a context analyzer, coupled to receive the scan, for checking the scan for consistency.
2. An OCR system as recited in claim further comprising means for facilitating modification of the library of semantic routines by a user of the OCR system.
3. A computer program product, for use with a processing system, comprising: a recording medium; means, recorded on the recording medium, for directing the processing system to operate the method.
| # | Name | Date |
|---|---|---|
| 1 | 202211071467-COMPLETE SPECIFICATION [12-12-2022(online)].pdf | 2022-12-12 |
| 2 | 202211071467-STATEMENT OF UNDERTAKING (FORM 3) [12-12-2022(online)].pdf | 2022-12-12 |
| 3 | 202211071467-DECLARATION OF INVENTORSHIP (FORM 5) [12-12-2022(online)].pdf | 2022-12-12 |
| 4 | 202211071467-REQUEST FOR EARLY PUBLICATION(FORM-9) [12-12-2022(online)].pdf | 2022-12-12 |
| 5 | 202211071467-DRAWINGS [12-12-2022(online)].pdf | 2022-12-12 |
| 6 | 202211071467-FORM 1 [12-12-2022(online)].pdf | 2022-12-12 |