
Methods And Systems Of Text Extraction From Images

Abstract: A method for extracting text from an image data is disclosed. The method includes pre-processing, via a processor, the image data to obtain a readable image data. The method further includes filtering, via the processor, a plurality of copies of the readable image data using a plurality of noise filters to obtain a corresponding plurality of noise removed images. Yet further, the method includes performing, via the processor, image data recognition on each of the plurality of noise removed images to obtain a text copy associated with each of the plurality of noise removed images. Moreover, the method includes ranking, via the processor, each word in the text copy associated with each of the plurality of noise removed images based on a predefined set of parameters. Finally, the method includes selecting, via the processor, highest ranked words within the text copy associated with each of the plurality of noise removed images to obtain output text for the image data.
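The pre-processing and multi-filter steps summarized above can be illustrated with a minimal pure-Python sketch. It is not the applicant's implementation, and it rests on several assumptions:

- The image is a small grayscale grid of integers from 0 to 255, stored as a list of rows.
- "Adjusting grayscale when the background is dark" (claim 5) is read as inverting the image when its mean intensity falls below a hypothetical threshold of 128.
- The three noise filters (3×3 median, 3×3 mean and binary threshold) and all function names are illustrative choices. The claims do not name specific filters.

```python
from statistics import median_low

# Assumption: a grayscale image is a list of rows of ints in the range 0..255.


def to_grayscale(rgb: list[list[tuple[int, int, int]]]) -> list[list[int]]:
    """Convert (R, G, B) pixels to luma using the ITU-R BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb]


def invert_if_dark(img: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Hypothetical dark-background rule: invert when mean intensity < threshold."""
    pixels = [p for row in img for p in row]
    if sum(pixels) / len(pixels) < threshold:
        return [[255 - p for p in row] for row in img]
    return [row[:] for row in img]


def _neighbourhood(img, y, x):
    """Pixels in the 3x3 window around (y, x), clipped at the image border."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def median_filter(img):
    """3x3 median filter: suppresses isolated salt-and-pepper pixels."""
    return [[median_low(_neighbourhood(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]


def _box_mean(img, y, x):
    """Rounded mean of the 3x3 window around (y, x)."""
    n = _neighbourhood(img, y, x)
    return round(sum(n) / len(n))


def mean_filter(img):
    """3x3 box (mean) filter: smooths fine-grained noise."""
    return [[_box_mean(img, y, x) for x in range(len(img[0]))] for y in range(len(img))]


def threshold_filter(img, t: int = 128):
    """Binarize: pixels >= t become white (255), others black (0)."""
    return [[255 if p >= t else 0 for p in row] for row in img]


# Illustrative filter bank; the claims do not name specific noise filters.
DEFAULT_FILTERS = {
    "median": median_filter,
    "mean": mean_filter,
    "threshold": threshold_filter,
}


def filter_bank(readable, filters=DEFAULT_FILTERS):
    """Apply each filter to its own copy of the readable image."""
    return {name: f([row[:] for row in readable]) for name, f in filters.items()}
```

Each filter receives its own copy of the readable image, so the original stays unchanged. Each noise removed image can then be passed to OCR independently.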


Patent Information

Filing Date: 16 March 2015
Publication Number: 15/2015
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Email: ipr@akshipassociates.com
Grant Date: 2022-05-25

Applicants

WIPRO LIMITED
Doddakannelli, Sarjapur Road, Bangalore 560035, Karnataka, India.

Inventors

1. HARIHARA VINAYAKARAM NATARAJAN
301, 10th Cross, 5th Main, ISRO Layout, Bangalore, Karnataka, India
2. TAMILSELVAN SUBRAMANIAN
5/216D, Teachers Colony West, Mohanur Road, Namakkal, Tamilnadu, India 637002

Specification

CLAIMS: We claim:
1. A method for extracting text from an image data, the method comprising:
pre-processing, via a processor, the image data to obtain a readable image data;
filtering, via the processor, a plurality of copies of the readable image data using a plurality of noise filters to obtain a corresponding plurality of noise removed images;
performing, via the processor, image data recognition on each of the plurality of noise removed images to obtain a text copy associated with each of the plurality of noise removed images;
ranking, via the processor, each word in the text copy associated with each of the plurality of noise removed images based on a predefined set of parameters; and
selecting, via the processor, highest ranked words within the text copy associated with each of the plurality of noise removed images to obtain output text for the image data.
2. The method of claim 1, wherein pre-processing comprises tilting the image data at a predefined angle of rotation until the text is readable.
3. The method of claim 1, wherein pre-processing comprises adjusting corners of the image data when the corners are not aligned, adjusting the corners comprising subjecting the image data to a plurality of movements selected from a group comprising left to right movement, right to left movement, top to bottom movement, and bottom to top movement.
4. The method of claim 1, wherein pre-processing comprises inverting the image data from a colored image to a grayscale image.
5. The method of claim 1, wherein pre-processing comprises adjusting grayscale of the image data when background in the image data is dark.
6. The method of claim 1, wherein pre-processing comprises adjusting brightness of the image data based on an optimum brightness threshold of the image data.
7. The method of claim 1, wherein pre-processing comprises adjusting Dots per Inch (DPI) of the image data.
8. The method of claim 1 further comprising creating, via the processor, the plurality of copies of the readable image data obtained in response to pre-processing.
9. The method of claim 1 further comprising storing, in a memory, each of the plurality of noise removed images obtained in response to filtering.
10. The method of claim 1, wherein the image data recognition comprises optical character recognition, face recognition, and object recognition in the image data.
11. The method of claim 10, wherein performing the optical character recognition comprises generating multi-dimensional text comprising an image data index, a word index, and a word.
12. The method of claim 1, wherein the predefined set of parameters for a word comprise string distance of the word across text copies associated with each of the plurality of noise removed images and a word index associated with the word, the word index being representative of a position of the word across the text copies.
13. The method of claim 12, wherein ranking comprises using a string distance algorithm to weigh each word based on associated word index, highest fit words being assigned highest numerical rank value.
14. A system for extracting text from an image data, the system comprising:
at least one processor; and
a computer-readable medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
pre-processing the image data to obtain a readable image data;
filtering a plurality of copies of the readable image data using a plurality of noise filters to obtain a corresponding plurality of noise removed images;
performing image data recognition on each noise removed image to obtain a text copy associated with each noise removed image;
ranking each word in the text copy associated with each noise removed image based on a predefined set of parameters; and
selecting highest ranked words within the text copy associated with each noise removed image to obtain output text for the image data.
15. The system of claim 14, wherein the pre-processing operation further comprises operation of at least one of tilting, adjusting corners, adjusting Dots per Inch (DPI), adjusting brightness, inverting image data, and adjusting gray scale.
16. The system of claim 14, wherein the operations further comprise creating the plurality of copies of the readable image data obtained in response to pre-processing.
17. The system of claim 14, wherein the operations further comprise storing, in a memory, the plurality of noise removed images obtained in response to filtering.
18. The system of claim 14, wherein performing image data recognition operation further comprises operations of optical character recognition, face recognition, and object recognition in the image data.
19. The system of claim 18, wherein performing optical character recognition operation comprises operation of generating multi-dimensional text comprising an image index, a word index, and a word.
20. The system of claim 14, wherein the predefined set of parameters for a word comprise string distance of the word across text copies associated with each of the plurality of noise removed images and a word index associated with the word, the word index being representative of a position of the word across the text copies.
21. The system of claim 20, wherein the ranking operation further comprises operation of using a string distance algorithm to weigh each word based on associated word index, highest fit words being assigned highest numerical rank value.
22. A non-transitory computer-readable storage medium storing instructions for extracting text from an image data, wherein the instructions, when executed by a computing device, cause the computing device to:
pre-process the image data to obtain a readable image data;
filter a plurality of copies of the readable image data using a plurality of noise filters to obtain a corresponding plurality of noise removed images;
perform image data recognition on each noise removed image to obtain a text copy associated with each noise removed image;
rank each word in the text copy associated with each noise removed image based on a predefined set of parameters; and
select highest ranked words within the text copy associated with each noise removed image to obtain output text for the image data.
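Claims 11 to 13 describe OCR output as (image index, word index, word) records. They also describe a ranking that weighs each word by string distance across the text copies at the same word index.

One possible reading, sketched below, is a per-position vote. Each candidate at a given word index is scored by its summed similarity to all candidates at that index, where similarity is 1 minus the normalized Levenshtein distance. The highest-scoring word is then selected.

The following are illustrative assumptions, not details given in the claims:

- Levenshtein distance as the string distance metric.
- Whitespace tokenization.
- Text copies that are aligned word for word.
- The function names.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, falling toward 0.0 as edits accumulate."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def to_records(text_copies: list[str]) -> list[tuple[int, int, str]]:
    """Multi-dimensional text: (image index, word index, word).

    Assumption: words are split on whitespace and copies are aligned by position.
    """
    return [(img_idx, w_idx, word)
            for img_idx, text in enumerate(text_copies)
            for w_idx, word in enumerate(text.split())]


def rank_words(text_copies: list[str]) -> dict[int, list[tuple[str, float]]]:
    """For each word index, rank candidates by summed similarity to all candidates there."""
    by_position = defaultdict(list)
    for _img_idx, w_idx, word in to_records(text_copies):
        by_position[w_idx].append(word)
    ranked = {}
    for w_idx, candidates in by_position.items():
        scored = {word: sum(similarity(word, other) for other in candidates)
                  for word in candidates}
        # Highest-fit word first, i.e. the highest numerical rank value.
        ranked[w_idx] = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    return ranked


def select_output(text_copies: list[str]) -> str:
    """Keep the top-ranked word at each position to form the output text."""
    ranked = rank_words(text_copies)
    return " ".join(ranked[w_idx][0][0] for w_idx in sorted(ranked))
```

Consider three noisy OCR copies: "Invoice Tota1 42.00", "lnvoice Total 42.00" and "Invoice Total 4Z.00". For these, `select_output` returns "Invoice Total 42.00", because each misread character is outvoted by the two copies that agree at that position.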

Dated this 16th day of March, 2015

Swetha S.N
Of K&S Partners
Agent for the Applicant
TECHNICAL FIELD
This disclosure relates generally to text extraction, and more particularly to methods and systems for text extraction from images.

Documents

Application Documents

# Name Date
1 1303-CHE-2015 FORM-9 16-03-2015.pdf 2015-03-16
2 1303-CHE-2015 FORM-18 16-03-2015.pdf 2015-03-16
3 1303CHE2015_Certifiedcopyrequest.pdf 2015-03-20
4 IP30469-spec.pdf 2015-03-28
5 IP30469-fig.pdf 2015-03-28
6 FORM 5-IP30469.pdf 2015-03-28
7 FORM 3-IP30469.pdf 2015-03-28
8 1303-CHE-2015 POWER OF ATTORNEY 25-06-2015.pdf 2015-06-25
9 1303-CHE-2015 FORM-1 25-06-2015.pdf 2015-06-25
10 1303-CHE-2015 CORRESPONDENCE OTHERS 25-06-2015.pdf 2015-06-25
11 1303-CHE-2015-FER.pdf 2019-11-28
12 1303-CHE-2015-FER_SER_REPLY [27-05-2020(online)].pdf 2020-05-27
13 1303-CHE-2015-US(14)-HearingNotice-(HearingDate-15-03-2022).pdf 2022-02-16
14 1303-CHE-2015-POA [28-02-2022(online)].pdf 2022-02-28
15 1303-CHE-2015-FORM 13 [28-02-2022(online)].pdf 2022-02-28
16 1303-CHE-2015-Correspondence to notify the Controller [28-02-2022(online)].pdf 2022-02-28
17 1303-CHE-2015-AMENDED DOCUMENTS [28-02-2022(online)].pdf 2022-02-28
18 1303-CHE-2015-Written submissions and relevant documents [29-03-2022(online)].pdf 2022-03-29
19 1303-CHE-2015-FORM-26 [29-03-2022(online)].pdf 2022-03-29
20 1303-CHE-2015-PatentCertificate25-05-2022.pdf 2022-05-25
21 1303-CHE-2015-IntimationOfGrant25-05-2022.pdf 2022-05-25

Search Strategy

1 SearchStrategyMatrix-converted_15-11-2019.pdf

ERegister / Renewals

3rd: 02 Aug 2022 (From 16/03/2017 - To 16/03/2018)
4th: 02 Aug 2022 (From 16/03/2018 - To 16/03/2019)
5th: 02 Aug 2022 (From 16/03/2019 - To 16/03/2020)
6th: 02 Aug 2022 (From 16/03/2020 - To 16/03/2021)
7th: 02 Aug 2022 (From 16/03/2021 - To 16/03/2022)
8th: 02 Aug 2022 (From 16/03/2022 - To 16/03/2023)
9th: 13 Mar 2023 (From 16/03/2023 - To 16/03/2024)
10th: 09 Mar 2024 (From 16/03/2024 - To 16/03/2025)
11th: 07 Mar 2025 (From 16/03/2025 - To 16/03/2026)