Method And Device For Extracting Images From Portable Document Format (PDF) Documents

Abstract: A method and device for extracting images from PDF documents are disclosed. The method includes performing a text recognition process on a PDF document that includes one or more images. The text recognition process replaces the one or more images with a plurality of contiguous newlines. The method further includes storing a location of each of the one or more images within the PDF document based on occurrence of the plurality of contiguous newlines within the PDF document. The method includes converting each page of the PDF document to an image format in order to generate an image document corresponding to the PDF document. The method further includes extracting each of the one or more images from the image document based on the location stored for each of the one or more images within the PDF document. FIG. 3
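The location step described in the abstract depends on finding runs of contiguous newlines in the recognized text. The following is a minimal sketch of that idea only, not the patented implementation: the function name `find_newline_runs` and the threshold `min_newlines` are hypothetical, and the sketch assumes the recognized text uses plain `\n` line endings.

```python
import re
from typing import List, Tuple


def find_newline_runs(text: str, min_newlines: int = 3) -> List[Tuple[int, int]]:
    """Locate runs of contiguous newlines in recognized text.

    Returns a list of (line_index, run_length) pairs, where line_index is the
    zero-based index of the text line that the run terminates, and run_length
    is the number of newline characters in the run.

    min_newlines is an assumed threshold; the source does not state how many
    newlines count as an image placeholder.
    """
    pattern = re.compile(r"\n{%d,}" % min_newlines)
    runs = []
    for match in pattern.finditer(text):
        # Count the newlines before the run to get the line it follows.
        line_index = text.count("\n", 0, match.start())
        runs.append((line_index, len(match.group())))
    return runs


if __name__ == "__main__":
    sample = "Intro line\n\n\n\n\nCaption\nMore text\n\nEnd"
    print(find_newline_runs(sample))
```

A run shorter than the threshold, such as an ordinary paragraph break, is ignored. Mapping a detected run back to page coordinates would need layout information from the recognition tool, which this sketch does not model.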

Patent Information

Application #:
Filing Date: 24 May 2017
Publication Number: 48/2018
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email: bangalore@knspartners.com
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2023-03-28
Renewal Date:

Applicants

WIPRO LIMITED
Doddakannelli, Sarjapur Road, Bangalore 560035, Karnataka, India.

Inventors

1. BALAJI JAGAN
32 Muthamil Nagar, Chinnalapatti, Dindigul District, Tamil Nadu – 624301, India.
2. NAVEEN KUMAR NANJAPPA
59, Lakshmi Nilaya, Sonnappa Layout, Virupakshapura, Kodigehalli, Bengaluru-560097, Karnataka, India.

Specification

Claims: WE CLAIM:
1. A method for extracting images from Portable Document Format (PDF) documents, the method comprising:
performing, by an image extraction device, a text recognition process on a PDF document comprising one or more images, wherein the text recognition process replaces the one or more images with a plurality of contiguous newlines;
storing, by the image extraction device, a location of each of the one or more images within the PDF document based on occurrence of the plurality of contiguous newlines within the PDF document;
converting, by the image extraction device, each page of the PDF document to an image format in order to generate an image document corresponding to the PDF document; and
extracting, by the image extraction device, each of the one or more images from the image document based on the location stored for each of the one or more images within the PDF document.
2. The method of claim 1, wherein at least one of the one or more images is a vector graphic image.
3. The method of claim 1, wherein storing a location of an image from the one or more images within the PDF document comprises associating a location metadata with the PDF document, wherein the location metadata comprises information related to the location of the image.
4. The method of claim 1, wherein a location of an image from the one or more images comprises a page number of a page including the image and coordinates of corners of the image within the page.
5. The method of claim 4, wherein extracting an image from the image document comprises incrementally scanning a page of the image document comprising the image based on the coordinates of the image.
6. The method of claim 5, wherein the scanning comprises tracing the contour of the image based on the coordinates of corners of the image within the page of the image document in at least one of a square and a rectangle pattern.
7. The method of claim 1 further comprising storing each of the one or more images in a predefined format in response to extracting each of the one or more images from the image document.
8. The method of claim 7, wherein an extracted image is tagged with an associated location metadata indicating location of the extracted image within the PDF document.
9. The method of claim 1, wherein the text recognition process is performed using an Open source Computer Vision (OpenCV) tool.
10. An image extraction device for extracting images from Portable Document Format (PDF) documents, the image extraction device comprising:
at least one processor;
a memory communicatively coupled to the processor, wherein the memory stores processor instructions, which, on execution, cause the processor to:
perform a text recognition process on a PDF document comprising one or more images, wherein the text recognition process replaces the one or more images with a plurality of contiguous newlines;
store a location of each of the one or more images within the PDF document based on occurrence of the plurality of contiguous newlines within the PDF document;
convert each page of the PDF document to an image format in order to generate an image document corresponding to the PDF document; and
extract each of the one or more images from the image document based on the location stored for each of the one or more images within the PDF document.
11. The image extraction device of claim 10, wherein at least one of the one or more images is a vector graphic image.
12. The image extraction device of claim 10, wherein to store a location of an image from the one or more images within the PDF document, the processor instructions further cause the processor to associate a location metadata with the PDF document, wherein the location metadata comprises information related to the location of the image.
13. The image extraction device of claim 10, wherein a location of an image from the one or more images comprises a page number of a page including the image and coordinates of corners of the image within the page.
14. The image extraction device of claim 13, wherein to extract an image from the image document, the processor instructions further cause the processor to incrementally scan a page of the image document comprising the image based on the coordinates of the image.
15. The image extraction device of claim 14, wherein to scan, the processor instructions further cause the processor to trace the contour of the image based on the coordinates of corners of the image within the page of the image document in at least one of a square and a rectangle pattern.
16. The image extraction device of claim 10, wherein the processor instructions further cause the processor to store each of the one or more images in a predefined format in response to extracting each of the one or more images from the image document.
17. The image extraction device of claim 16, wherein an extracted image is tagged with an associated location metadata indicating location of the extracted image within the PDF document.
18. The image extraction device of claim 10, wherein the text recognition process is performed using an Open source Computer Vision (OpenCV) tool.
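Claims 4, 8, 13 and 17 describe a location as a page number plus the coordinates of the image corners, and an extracted image tagged with that location. The sketch below illustrates one way such a record could be used to cut a region out of a rasterized page. It is an illustration under stated assumptions, not the claimed method: the names `ImageLocation`, `crop_region` and `extract_tagged` are hypothetical, pages are modeled as plain lists of pixel rows, and a simple rectangular crop stands in for the incremental scanning and contour tracing recited in claims 5, 6, 14 and 15.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) in pixels, origin at the top-left of the page


@dataclass(frozen=True)
class ImageLocation:
    """Hypothetical location record: a 1-based page number and four corners."""
    page_number: int
    corners: Tuple[Point, Point, Point, Point]

    def bounding_box(self) -> Tuple[int, int, int, int]:
        """Return (left, top, right, bottom); right and bottom are exclusive."""
        xs = [x for x, _ in self.corners]
        ys = [y for _, y in self.corners]
        return min(xs), min(ys), max(xs), max(ys)


def crop_region(page: Sequence[Sequence[int]],
                location: ImageLocation) -> List[List[int]]:
    """Copy the pixels inside the location's bounding box out of one page."""
    left, top, right, bottom = location.bounding_box()
    return [list(row[left:right]) for row in page[top:bottom]]


def extract_tagged(pages: Sequence[Sequence[Sequence[int]]],
                   location: ImageLocation) -> Dict[str, object]:
    """Crop the image and tag it with the location it came from."""
    page = pages[location.page_number - 1]  # page numbers are assumed 1-based
    return {"pixels": crop_region(page, location), "location": location}


if __name__ == "__main__":
    # A 4-row by 5-column page whose pixel value encodes row * 10 + column.
    page = [[r * 10 + c for c in range(5)] for r in range(4)]
    loc = ImageLocation(page_number=1, corners=((1, 1), (3, 1), (3, 3), (1, 3)))
    print(extract_tagged([page], loc)["pixels"])
```

Keeping the location alongside the cropped pixels means the extracted image can later be traced back to its page and position in the original PDF document.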

Dated this 24th day of May, 2017

Swetha SN
Of K&S Partners
Agent for the Applicant
Description: TECHNICAL FIELD
This disclosure relates generally to extracting images from documents and more particularly to a method and device for extracting images from Portable Document Format (PDF) documents.

Documents

Application Documents

# Name Date
1 Power of Attorney [24-05-2017(online)].pdf 2017-05-24
2 Form 5 [24-05-2017(online)].pdf 2017-05-24
3 Form 3 [24-05-2017(online)].pdf 2017-05-24
4 Form 18 [24-05-2017(online)].pdf_578.pdf 2017-05-24
5 Form 18 [24-05-2017(online)].pdf 2017-05-24
6 Form 1 [24-05-2017(online)].pdf 2017-05-24
7 Drawing [24-05-2017(online)].pdf 2017-05-24
8 Description(Complete) [24-05-2017(online)].pdf_577.pdf 2017-05-24
9 Description(Complete) [24-05-2017(online)].pdf 2017-05-24
10 REQUEST FOR CERTIFIED COPY [25-05-2017(online)].pdf 2017-05-25
11 abstract 201741018278 .jpg 2017-05-25
12 201741018278-Proof of Right (MANDATORY) [13-07-2017(online)].pdf 2017-07-13
13 Correspondence by Agent_Form 1_18-07-2017.pdf 2017-07-18
14 201741018278-REQUEST FOR CERTIFIED COPY [19-07-2017(online)].pdf 2017-07-19
15 201741018278-FER.pdf 2020-06-26
16 201741018278-PETITION UNDER RULE 137 [25-12-2020(online)].pdf 2020-12-25
17 201741018278-OTHERS [25-12-2020(online)].pdf 2020-12-25
18 201741018278-FORM 3 [25-12-2020(online)].pdf 2020-12-25
19 201741018278-FER_SER_REPLY [25-12-2020(online)].pdf 2020-12-25
20 201741018278-DRAWING [25-12-2020(online)].pdf 2020-12-25
21 201741018278-COMPLETE SPECIFICATION [25-12-2020(online)].pdf 2020-12-25
22 201741018278-CLAIMS [25-12-2020(online)].pdf 2020-12-25
23 201741018278-US(14)-HearingNotice-(HearingDate-01-03-2023).pdf 2023-02-10
24 201741018278-POA [17-02-2023(online)].pdf 2023-02-17
25 201741018278-FORM 13 [17-02-2023(online)].pdf 2023-02-17
26 201741018278-Correspondence to notify the Controller [17-02-2023(online)].pdf 2023-02-17
27 201741018278-AMENDED DOCUMENTS [17-02-2023(online)].pdf 2023-02-17
28 201741018278-Written submissions and relevant documents [16-03-2023(online)].pdf 2023-03-16
29 201741018278-FORM-26 [16-03-2023(online)].pdf 2023-03-16
30 201741018278-PatentCertificate28-03-2023.pdf 2023-03-28
31 201741018278-IntimationOfGrant28-03-2023.pdf 2023-03-28

Search Strategy

1 SearchStrategyMatrixE_26-06-2020.pdf

ERegister / Renewals

3rd: 12 Jun 2023

From 24/05/2019 - To 24/05/2020

4th: 12 Jun 2023

From 24/05/2020 - To 24/05/2021

5th: 12 Jun 2023

From 24/05/2021 - To 24/05/2022

6th: 12 Jun 2023

From 24/05/2022 - To 24/05/2023

7th: 12 Jun 2023

From 24/05/2023 - To 24/05/2024

8th: 22 May 2024

From 24/05/2024 - To 24/05/2025

9th: 21 May 2025

From 24/05/2025 - To 24/05/2026