Method And System For Detecting And Extracting A Tabular Data From A Document

Abstract: This disclosure relates generally to document processing, and more particularly to a method and system for detecting and extracting tabular data from a document. In one embodiment, the method may include generating a hierarchy of features, for a plurality of features of an image document derived from the document, based on relative spatial properties of the plurality of features. The method may further include segmenting the image document into a plurality of semantic segments based on the hierarchy of features, classifying each of the plurality of semantic segments into at least one of a plurality of tabular structures, and effecting at least one of a detection or an extraction of the tabular data from the image document based on the classification.

Figure 3

Patent Information

Application #:
Filing Date: 30 March 2018
Publication Number: 40/2019
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email: bangalore@knspartners.com
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2024-03-14
Renewal Date:

Applicants

WIPRO LIMITED
Doddakannelli, Sarjapur Road, Bangalore 560035, Karnataka, India.

Inventors

1. PRASHANTH KRISHNAPURA SUBBARAYA
#89/1, Rajeshwari Sannidhi, G004, 13th Cross, Ideal Homes Township, Rajarajeshwari Nagar, Bengaluru- 560098, Karnataka, India.
2. RAGHAVENDRA HOSABETIU
#3080/3081, 'Venkatadri Nilaya', 2nd Main, 3rd Cross, VBHCS layout, Banashankari 3rd Stage Near Kattriguppe Water Tank, Bangalore-560 050, Karnataka, India.

Specification

Claims:
WE CLAIM:
1. A method of detecting and extracting tabular data from a document, the method comprising:
generating, by a document processing device, a hierarchy of features, for a plurality of features of an image document derived from the document, based on relative spatial properties of the plurality of features;
segmenting, by the document processing device, the image document into a plurality of semantic segments based on the hierarchy of features;
classifying, by the document processing device, each of the plurality of semantic segments into at least one of a plurality of tabular structures; and
effecting, by the document processing device, at least one of a detection or an extraction of the tabular data from the image document based on the classification.
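
The four steps of claim 1 can be illustrated with a toy sketch. It is not the claimed implementation: it assumes each feature is an axis-aligned bounding box, uses box containment as the "relative spatial property", treats each top-level box as a semantic segment, and classifies with a simple row/column count rather than a learned model. The names `Feature`, `build_hierarchy`, `classify`, and `extract_rows` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), assumed representation


@dataclass
class Feature:
    name: str
    box: Box
    children: List["Feature"] = field(default_factory=list)


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies inside `outer` and the boxes are not identical."""
    return (outer != inner
            and outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def area(box: Box) -> int:
    return (box[2] - box[0]) * (box[3] - box[1])


def build_hierarchy(features: List[Feature]) -> List[Feature]:
    """Step 1: attach each feature to the smallest feature containing it."""
    roots = []
    for f in features:
        parents = [p for p in features if contains(p.box, f.box)]
        if parents:
            min(parents, key=lambda p: area(p.box)).children.append(f)
        else:
            roots.append(f)
    return roots  # Step 2 (toy): each root with its subtree is one segment


def classify(segment: Feature) -> str:
    """Step 3 (toy rule): a grid of >= 2 rows and >= 2 columns is a table."""
    rows = {c.box[1] for c in segment.children}
    cols = {c.box[0] for c in segment.children}
    return "table" if len(rows) >= 2 and len(cols) >= 2 else "non-table"


def extract_rows(segment: Feature) -> List[List[str]]:
    """Step 4: read the cells row by row, left to right."""
    by_row: Dict[int, List[str]] = {}
    for c in sorted(segment.children, key=lambda c: (c.box[1], c.box[0])):
        by_row.setdefault(c.box[1], []).append(c.name)
    return [by_row[y] for y in sorted(by_row)]


features = [
    Feature("grid", (0, 0, 100, 40)),
    Feature("A", (0, 0, 50, 20)), Feature("B", (50, 0, 100, 20)),
    Feature("1", (0, 20, 50, 40)), Feature("2", (50, 20, 100, 40)),
    Feature("caption", (0, 50, 100, 60)),
]
segments = build_hierarchy(features)
tables = [extract_rows(s) for s in segments if classify(s) == "table"]
```

In this example the `grid` box becomes a segment classified as a table and the `caption` box does not, so only the grid's cells are extracted.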

2. The method of claim 1, further comprising:
receiving the document;
splitting the document into a plurality of sub-documents corresponding to a plurality of pages of the document; and
converting each of the plurality of sub-documents into the image document.
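
The receive, split, and convert sequence of claim 2 can be sketched as follows. This is a deliberately simplified stand-in, not the claimed method: it assumes the document is plain text whose pages are separated by form-feed characters, and it "renders" each page into a grid of 0/1 cells where any non-space character counts as ink. A real system would rasterize PDF or scanned pages with an imaging library. The names `split_pages`, `page_to_image`, and `document_to_images` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # assumed image type: rows of 0/1 "pixels"


def split_pages(document: str) -> List[str]:
    """Split the document into one sub-document per page (form feed = page break)."""
    return document.split("\f")


def page_to_image(page: str) -> Image:
    """Convert one page to a binary grid; every line is padded to the page width."""
    lines = page.splitlines()
    width = max((len(line) for line in lines), default=0)
    return [[0 if ch == " " else 1 for ch in line.ljust(width)] for line in lines]


def document_to_images(document: str) -> List[Image]:
    """Produce one image document per page."""
    return [page_to_image(page) for page in split_pages(document)]


images = document_to_images("ab\nc d\fxyz")
```

Each resulting image can then be processed independently by the later steps, which is the main reason for splitting per page first.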

3. The method of claim 1, wherein generating the hierarchy of features comprises generating the hierarchy of features using a machine learning model.

4. The method of claim 3, wherein generating the hierarchy of features further comprises enhancing the plurality of features using the machine learning model.

5. The method of claim 1, wherein segmenting the image document comprises segmenting the image document based on a spatial information obtained from the hierarchy of features.

6. The method of claim 5, wherein segmenting the image document comprises deriving a semantic information for each of the plurality of semantic segments by correlating higher level information from the hierarchy of features with lower level information from the hierarchy of features.
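
One way to picture the correlation of higher level and lower level information in claim 6 is the fusion of a coarse map with a fine map, in the spirit of encoder-decoder segmentation networks. The sketch below is an assumption-laden illustration, not the disclosed model: it assumes the lower level is a fine grid of edge or ink responses, the higher level is a coarser grid of "table-likeness" scores, and a pixel belongs to a table segment when the product of the two passes a threshold. The names `upsample_nearest` and `fuse`, and the threshold of 0.5, are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def upsample_nearest(coarse: Grid, factor: int) -> Grid:
    """Repeat every coarse cell `factor` times in each direction."""
    height = len(coarse) * factor
    width = len(coarse[0]) * factor
    return [[coarse[r // factor][c // factor] for c in range(width)]
            for r in range(height)]


def fuse(fine: Grid, coarse: Grid, factor: int, threshold: float = 0.5) -> List[List[int]]:
    """Mark a pixel 1 when low-level evidence and high-level context agree."""
    up = upsample_nearest(coarse, factor)
    return [[1 if f * u >= threshold else 0 for f, u in zip(fine_row, up_row)]
            for fine_row, up_row in zip(fine, up)]


fine = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
coarse = [
    [0.9, 0.9],  # top half of the page looks table-like
    [0.1, 0.1],  # bottom half does not
]
mask = fuse(fine, coarse, factor=2)
```

Here the ink in the bottom-right corner is suppressed because the higher level context gives it a low score, while the ink in the top-left corner is kept.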

7. The method of claim 1, wherein classifying each of the plurality of semantic segments further comprises classifying each of the plurality of semantic segments based on a spatial information for each of the plurality of semantic segments.

8. The method of claim 1, wherein the plurality of tabular structures comprises a table border, a table structure, a nested table structure, a multi-level header, a cell, a row, and a column.
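
Because claim 1 classifies each segment into "at least one of" the tabular structures, a segment may carry several labels at once. A minimal sketch of such a label set, assuming Python's standard `enum.Flag` as the representation (the class name `TabularStructure` and the member names are hypothetical, derived from the list in claim 8):

```python
from enum import Flag, auto


class TabularStructure(Flag):
    TABLE_BORDER = auto()
    TABLE_STRUCTURE = auto()
    NESTED_TABLE_STRUCTURE = auto()
    MULTI_LEVEL_HEADER = auto()
    CELL = auto()
    ROW = auto()
    COLUMN = auto()


# A segment that is both a cell and part of a multi-level header.
label = TabularStructure.CELL | TabularStructure.MULTI_LEVEL_HEADER

is_cell = TabularStructure.CELL in label
is_row = TabularStructure.ROW in label
```

A flag type keeps multi-label membership checks cheap; a plain set of strings would work equally well for a small label vocabulary.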

9. A system for detecting and extracting tabular data from a document, the system comprising:
a document processing device comprising at least one processor and a computer-readable medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
generating a hierarchy of features, for a plurality of features of an image document derived from the document, based on relative spatial properties of the plurality of features;
segmenting the image document into a plurality of semantic segments based on the hierarchy of features;
classifying each of the plurality of semantic segments into at least one of a plurality of tabular structures; and
effecting at least one of a detection or an extraction of the tabular data from the image document based on the classification.

10. The system of claim 9, wherein the operations further comprise:
receiving the document;
splitting the document into a plurality of sub-documents corresponding to a plurality of pages of the document; and
converting each of the plurality of sub-documents into the image document.

11. The system of claim 9, wherein generating the hierarchy of features comprises generating the hierarchy of features using a machine learning model.

12. The system of claim 11, wherein generating the hierarchy of features further comprises enhancing the plurality of features using the machine learning model.

13. The system of claim 9, wherein segmenting the image document comprises segmenting the image document based on a spatial information obtained from the hierarchy of features.

14. The system of claim 13, wherein segmenting the image document comprises deriving a semantic information for each of the plurality of semantic segments by correlating higher level information from the hierarchy of features with lower level information from the hierarchy of features.

15. The system of claim 9, wherein classifying each of the plurality of semantic segments further comprises classifying each of the plurality of semantic segments based on a spatial information for each of the plurality of semantic segments.

16. The system of claim 9, wherein the plurality of tabular structures comprises a table border, a table structure, a nested table structure, a multi-level header, a cell, a row, and a column.

Dated this 30th day of March, 2018

R Ramya Rao
Of K&S Partners
Agent for the Applicant
IN/PA-1607
Description:
TECHNICAL FIELD
This disclosure relates generally to document processing, and more particularly to a method and system for detecting and extracting tabular data from a document.

Documents

Application Documents

# Name Date
1 201841012053-STATEMENT OF UNDERTAKING (FORM 3) [30-03-2018(online)].pdf 2018-03-30
2 201841012053-REQUEST FOR EXAMINATION (FORM-18) [30-03-2018(online)].pdf 2018-03-30
3 201841012053-POWER OF AUTHORITY [30-03-2018(online)].pdf 2018-03-30
4 201841012053-FORM 18 [30-03-2018(online)].pdf 2018-03-30
5 201841012053-FORM 1 [30-03-2018(online)].pdf 2018-03-30
6 201841012053-DRAWINGS [30-03-2018(online)].pdf 2018-03-30
7 201841012053-DECLARATION OF INVENTORSHIP (FORM 5) [30-03-2018(online)].pdf 2018-03-30
8 201841012053-COMPLETE SPECIFICATION [30-03-2018(online)].pdf 2018-03-30
9 201841012053-REQUEST FOR CERTIFIED COPY [04-05-2018(online)].pdf 2018-05-04
10 201841012053-Proof of Right (MANDATORY) [30-07-2018(online)].pdf 2018-07-30
11 Correspondence by Agent_Form1_01-08-2018.pdf 2018-08-01
12 201841012053-FER_SER_REPLY [10-05-2021(online)].pdf 2021-05-10
13 201841012053-FER.pdf 2021-10-17
14 201841012053-US(14)-HearingNotice-(HearingDate-15-02-2024).pdf 2024-01-24
15 201841012053-POA [30-01-2024(online)].pdf 2024-01-30
16 201841012053-FORM 13 [30-01-2024(online)].pdf 2024-01-30
17 201841012053-Correspondence to notify the Controller [30-01-2024(online)].pdf 2024-01-30
18 201841012053-AMENDED DOCUMENTS [30-01-2024(online)].pdf 2024-01-30
19 201841012053-Written submissions and relevant documents [01-03-2024(online)].pdf 2024-03-01
20 201841012053-PETITION UNDER RULE 137 [01-03-2024(online)].pdf 2024-03-01
21 201841012053-FORM 3 [01-03-2024(online)].pdf 2024-03-01
22 201841012053-PatentCertificate14-03-2024.pdf 2024-03-14
23 201841012053-IntimationOfGrant14-03-2024.pdf 2024-03-14

Search Strategy

1 SearchStrategyE_10-11-2020.pdf

ERegister / Renewals

3rd: 03 Jun 2024

From 30/03/2020 - To 30/03/2021

4th: 03 Jun 2024

From 30/03/2021 - To 30/03/2022

5th: 03 Jun 2024

From 30/03/2022 - To 30/03/2023

6th: 03 Jun 2024

From 30/03/2023 - To 30/03/2024

7th: 03 Jun 2024

From 30/03/2024 - To 30/03/2025

8th: 28 Mar 2025

From 30/03/2025 - To 30/03/2026