
Method And System For Extracting Text From An Engineering Drawing

Abstract: Embodiments of the present disclosure relate to a method and system for extracting text from an engineering drawing to enable accurate OCR. First, an image of the engineering drawing comprising a plurality of components is received. Each of the plurality of components in the image is classified as either a textual component or a non-textual component. At least one word element is then identified for the textual components based on segmentation of the plurality of components. The segmentation is performed by drawing a plurality of horizontal edge projections of a predefined length for each of the textual components; the textual components are identified as belonging to the at least one word element when the horizontal edge projection of each textual component overlaps with an adjacent textual component. The at least one word element is provided as extracted text for performing OCR on the engineering drawing. Figure 4
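The segmentation step summarised in the abstract can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: components are assumed to be already classified as textual and reduced to axis-aligned bounding rectangles; each horizontal edge projection is modelled as a band that starts at the right edge of a rectangle (as in claim 6) and spans that rectangle's height; and the predefined length is taken to be the adaptive threshold of claim 5, read here as the mean horizontal gap between left-to-right neighbours on one text line. The names `Box`, `adaptive_threshold`, `projection_hits` and `group_words` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding rectangle of one textual component, in pixels."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height

    @property
    def right(self) -> int:
        return self.x + self.w

    @property
    def bottom(self) -> int:
        return self.y + self.h


def adaptive_threshold(boxes: List[Box]) -> float:
    """Assumed reading of claim 5: mean gap between left-to-right neighbours.

    Intended for components on a single text line; overlapping neighbours
    contribute a gap of 0.
    """
    ordered = sorted(boxes, key=lambda b: b.x)
    gaps = [max(0, nxt.x - cur.right) for cur, nxt in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0


def projection_hits(src: Box, dst: Box, length: float) -> bool:
    """True if a projection of `length` px from src's right edge overlaps dst.

    The projection is modelled as the band x in [src.right, src.right + length],
    y in [src.y, src.bottom).
    """
    overlaps_x = dst.x <= src.right + length and dst.right >= src.right
    overlaps_y = dst.y < src.bottom and dst.bottom > src.y
    return overlaps_x and overlaps_y


def group_words(boxes: List[Box], length: float) -> List[List[Box]]:
    """Merge components whose projections overlap into word elements."""
    parent = list(range(len(boxes)))  # union-find forest over box indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, a in enumerate(boxes):
        for j, b in enumerate(boxes):
            if i != j and projection_hits(a, b, length):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Box]] = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # Characters left to right within a word; words top to bottom, then left to right.
    words = [sorted(g, key=lambda b: b.x) for g in groups.values()]
    words.sort(key=lambda w: (w[0].y, w[0].x))
    return words


if __name__ == "__main__":
    # Two close characters and a distant one on the first line,
    # plus one character on a lower line.
    chars = [Box(0, 0, 10, 20), Box(14, 0, 10, 20), Box(60, 0, 10, 20), Box(0, 50, 10, 20)]
    length = adaptive_threshold(chars[:3])  # gaps 4 and 36 -> 20.0
    for word in group_words(chars, length):
        print([(b.x, b.y) for b in word])
```

Union-find is used so that chains of overlapping projections (A reaches B, B reaches C) collapse into a single word element. The all-pairs loop is quadratic; on a dense drawing, a spatial index over the rectangles would likely be needed.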


Patent Information

Filing Date: 31 August 2018
Publication Number: 10/2020
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Email: bangalore@knspartners.com
Grant Date: 27 February 2024

Applicants

WIPRO LIMITED
Doddakannelli, Sarjapur Road, Bangalore

Inventors

1. ANIKET ANAND GURAV
D104, Kakde City, Karve Nagar, Pune
2. RUPESH BAPUJI WADIBHASME
8203, Green City apartments, Satav Nagar, Hadapsar, Pune
3. SWAPNIL DNYANESHWAR BELHE
37/2, Dhanasampada Soc, Erandawane, Pune 411038

Specification

Claims: We claim:

1. A method to extract text from an engineering drawing for performing Optical Character Recognition (OCR), wherein the method comprises:
receiving, by a text extraction system (101), an image (206) of an engineering drawing (102) comprising a plurality of components;
classifying, by the text extraction system (101), each of the plurality of components in the image (206) to be one of a textual component and a non-textual component;
identifying, by the text extraction system (101), at least one word element (209) for textual components from the plurality of components based on segmentation of the plurality of components, wherein the segmentation comprises:
drawing a plurality of horizontal edge projections of a predefined length (210) for each of the textual components; and
identifying the textual components to be associated with the at least one word element (209) when the horizontal edge projection of each of the textual components overlaps with an adjacent textual component; and
providing, by the text extraction system (101), the at least one word element (209) as extracted text for performing OCR on the engineering drawing (102).

2. The method as claimed in claim 1, wherein the classification of each of the plurality of components is performed using a deep learning classifier, wherein the deep learning classifier is trained using a plurality of predefined textual components and a plurality of predefined non-textual components.

3. The method as claimed in claim 1, wherein classifying each of the plurality of components comprises:
converting the image (206) to a gray-scale image;
drawing a rectangular boundary for each of the plurality of components upon the conversion; and
determining a probability (211) of each of the plurality of components being the textual component, wherein the probability (211) is compared with a predefined threshold (212) to classify each of the plurality of components as one of the textual component and the non-textual component.

4. The method as claimed in claim 3, wherein a component from the plurality of components associated with a probability (211) greater than the predefined threshold (212) is classified as the non-textual component, and a component from the plurality of components associated with a probability (211) less than or equal to the predefined threshold (212) is classified as the textual component.

5. The method as claimed in claim 1, wherein the predefined length (210) is equal to an adaptive threshold associated with the image (206), wherein the adaptive threshold is the average distance between the rectangular contours of every pair of sequential components from the plurality of components.

6. The method as claimed in claim 1, wherein each of the plurality of horizontal edge projections is drawn from the right edge of a rectangular contour associated with the corresponding textual component.

7. A text extraction system to extract text from an engineering drawing for performing Optical Character Recognition (OCR), the system comprising:
a processor (105); and
a memory (107) communicatively coupled to the processor (105), wherein the memory (107) stores processor-executable instructions, which, on execution, cause the processor (105) to:
receive an image (206) of an engineering drawing (102) comprising a plurality of components;
classify each of the plurality of components in the image (206) to be one of a textual component and a non-textual component;
identify at least one word element (209) for textual components from the plurality of components based on segmentation of the plurality of components, wherein the segmentation comprises:
draw a plurality of horizontal edge projections of a predefined length (210) for each of the textual components; and
identify the textual components to be associated with the at least one word element (209) when the horizontal edge projection of each of the textual components overlaps with an adjacent textual component; and
provide the at least one word element (209) as extracted text for performing OCR on the engineering drawing (102).

8. The text extraction system (101) as claimed in claim 7, wherein the classification of each of the plurality of components is performed using a deep learning classifier, wherein the deep learning classifier is trained using a plurality of predefined textual components and a plurality of predefined non-textual components.

9. The text extraction system (101) as claimed in claim 7, wherein classifying each of the plurality of components comprises:
converting the image (206) to a gray-scale image;
drawing a rectangular boundary for each of the plurality of components upon the conversion; and
determining a probability (211) of each of the plurality of components being the textual component, wherein the probability (211) is compared with a predefined threshold (212) to classify each of the plurality of components as one of the textual component and the non-textual component.

10. The text extraction system (101) as claimed in claim 9, wherein a component from the plurality of components associated with a probability (211) greater than the predefined threshold (212) is classified as the non-textual component, and a component from the plurality of components associated with a probability (211) less than or equal to the predefined threshold (212) is classified as the textual component.

11. The text extraction system (101) as claimed in claim 7, wherein the predefined length (210) is equal to an adaptive threshold associated with the image (206), wherein the adaptive threshold is the average distance between the rectangular contours of every pair of sequential components from the plurality of components.

12. The text extraction system (101) as claimed in claim 7, wherein each of the plurality of horizontal edge projections is drawn from the right edge of a rectangular contour associated with the corresponding textual component.
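Claims 3 and 4 (and their system counterparts, claims 9 and 10) describe grey-scale conversion, a rectangular boundary per component, and a probability-versus-threshold decision. The Python sketch below illustrates those steps under assumptions the claims do not state: components are taken to be 8-connected regions of pixels darker than a fixed level of 128, the grey-scale weights are the ITU-R BT.601 luma coefficients, and the per-component probabilities come from an unspecified classifier (for example the deep learning classifier of claim 2) rather than being computed here. The comparison direction follows claim 4 as worded: a probability above the threshold is classified as non-textual. The names `to_grayscale`, `component_rects` and `split_by_threshold` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), 0-255
Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def to_grayscale(rgb: List[List[Pixel]]) -> List[List[float]]:
    """Luma conversion with ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def component_rects(gray: List[List[float]], ink_below: float = 128.0) -> List[Rect]:
    """Bounding rectangle of every 8-connected region darker than `ink_below`."""
    height = len(gray)
    width = len(gray[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    rects: List[Rect] = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or gray[y0][x0] >= ink_below:
                continue
            # Breadth-first flood fill of one component, tracking its extent.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and not seen[ny][nx] and gray[ny][nx] < ink_below):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            rects.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return rects


def split_by_threshold(probabilities: Dict[Rect, float],
                       threshold: float) -> Tuple[List[Rect], List[Rect]]:
    """Claim 4 as worded: p > threshold -> non-textual, p <= threshold -> textual."""
    textual: List[Rect] = []
    non_textual: List[Rect] = []
    for rect, p in probabilities.items():
        (non_textual if p > threshold else textual).append(rect)
    return textual, non_textual


if __name__ == "__main__":
    W, K = (255, 255, 255), (0, 0, 0)  # white paper, black ink
    image = [
        [K, K, W, W, W, W],
        [K, W, W, W, K, W],
        [W, W, W, W, K, W],
    ]
    rects = component_rects(to_grayscale(image))
    print(rects)
    # Placeholder scores standing in for a trained classifier's output.
    scores = {rects[0]: 0.2, rects[1]: 0.9}
    print(split_by_threshold(scores, threshold=0.5))
```

In a production pipeline, OpenCV's `cv2.cvtColor`, `cv2.findContours` and `cv2.boundingRect` would typically replace the hand-written helpers; they are spelled out here only to keep the sketch dependency-free.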

Dated this 31st day of August, 2018

R Ramya Rao
Of K&S Partners
Agent for the Applicant
IN/PA-1607

Description:
TECHNICAL FIELD
The present subject matter relates in general to Optical Character Recognition (OCR) systems and more particularly, but not exclusively, to a method and system for extracting text from engineering drawings for performing OCR.

Documents

Application Documents

# Name Date
1 201841032861-STATEMENT OF UNDERTAKING (FORM 3) [31-08-2018(online)].pdf 2018-08-31
2 201841032861-REQUEST FOR EXAMINATION (FORM-18) [31-08-2018(online)].pdf 2018-08-31
3 201841032861-POWER OF AUTHORITY [31-08-2018(online)].pdf 2018-08-31
4 201841032861-FORM 18 [31-08-2018(online)].pdf 2018-08-31
5 201841032861-FORM 1 [31-08-2018(online)].pdf 2018-08-31
6 201841032861-DRAWINGS [31-08-2018(online)].pdf 2018-08-31
7 201841032861-DECLARATION OF INVENTORSHIP (FORM 5) [31-08-2018(online)].pdf 2018-08-31
8 201841032861-COMPLETE SPECIFICATION [31-08-2018(online)].pdf 2018-08-31
9 Abstract_201841032861.jpg 2018-09-03
10 201841032861-Request Letter-Correspondence [05-09-2018(online)].pdf 2018-09-05
11 201841032861-Power of Attorney [05-09-2018(online)].pdf 2018-09-05
12 201841032861-Form 1 (Submitted on date of filing) [05-09-2018(online)].pdf 2018-09-05
13 201841032861-Proof of Right (MANDATORY) [17-09-2018(online)].pdf 2018-09-17
14 Correspondence by Agent_Form30,Form1_24-09-2018.pdf 2018-09-24
15 201841032861-PETITION UNDER RULE 137 [25-08-2021(online)].pdf 2021-08-25
16 201841032861-FORM 3 [25-08-2021(online)].pdf 2021-08-25
17 201841032861-FER_SER_REPLY [25-08-2021(online)].pdf 2021-08-25
18 201841032861-FER.pdf 2021-10-17
19 201841032861-US(14)-HearingNotice-(HearingDate-06-02-2024).pdf 2024-01-09
20 201841032861-POA [15-01-2024(online)].pdf 2024-01-15
21 201841032861-FORM 13 [15-01-2024(online)].pdf 2024-01-15
22 201841032861-Correspondence to notify the Controller [15-01-2024(online)].pdf 2024-01-15
23 201841032861-AMENDED DOCUMENTS [15-01-2024(online)].pdf 2024-01-15
24 201841032861-FORM-26 [06-02-2024(online)].pdf 2024-02-06
25 201841032861-Written submissions and relevant documents [21-02-2024(online)].pdf 2024-02-21
26 201841032861-FORM 3 [21-02-2024(online)].pdf 2024-02-21
27 201841032861-PatentCertificate27-02-2024.pdf 2024-02-27
28 201841032861-IntimationOfGrant27-02-2024.pdf 2024-02-27
29 201841032861-FORM 4 [10-09-2024(online)].pdf 2024-09-10

Search Strategy

1 searchE_25-02-2021.pdf

ERegister / Renewals

3rd: 20 Jun 2024 (from 31/08/2020 to 31/08/2021)
4th: 20 Jun 2024 (from 31/08/2021 to 31/08/2022)
5th: 20 Jun 2024 (from 31/08/2022 to 31/08/2023)
6th: 20 Jun 2024 (from 31/08/2023 to 31/08/2024)
7th: 10 Sep 2024 (from 31/08/2024 to 31/08/2025)
8th: 20 Aug 2025 (from 31/08/2025 to 31/08/2026)