Abstract: The present disclosure relates to the field of machine learning and image processing, and discloses a method and a system for identifying cell regions of a table comprising cell borders from an image document. A table detecting system rescales a primary image document into a plurality of secondary image documents of different sizes and resolutions to detect a plurality of candidate regions comprising predefined table features in each secondary image document. Further, for each candidate region, a set of connected components is determined, and the connected components corresponding to the IDs that are present in more than one set of connected components are clustered. Subsequently, areas corresponding to the clusters that are determined to form a table are cropped from the primary image document, and each cell region of the table is identified by modifying pixel values of the clusters of connected components in the cropped area. FIG. 2A
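The clustering step summarised in the abstract (keeping only connected components whose IDs appear in more than one set) can be illustrated with a minimal Python sketch. The function name `cluster_shared_components` and the representation of each candidate region as a plain set of integer IDs are assumptions made for illustration only; the disclosure does not specify a data structure.

```python
from collections import defaultdict


def cluster_shared_components(id_sets):
    """Group connected-component IDs that occur in more than one set.

    id_sets: one collection of component IDs per candidate region
             (hypothetical representation, not taken from the disclosure).
    Returns a dict mapping each shared ID to the indices of the
    candidate regions in which it was found.
    """
    regions_by_id = defaultdict(list)
    for region_index, ids in enumerate(id_sets):
        for component_id in set(ids):  # ignore repeats within one region
            regions_by_id[component_id].append(region_index)
    # IDs seen in only one set are discarded before the table check
    return {cid: regions for cid, regions in regions_by_id.items()
            if len(regions) > 1}


# Three candidate regions with overlapping component IDs
clusters = cluster_shared_components([{1, 2}, {2, 3}, {2, 3, 4}])
```

In this example, IDs 1 and 4 occur in a single set and are dropped, while IDs 2 and 3 are retained as clusters.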
Claims: We claim:
1. A method of identifying cell region of a table comprising cell borders from an image document, the method comprising:
receiving, by a table detecting system (103), a primary image document from a data source among one or more data sources (101) associated with the table detecting system (103);
generating, by the table detecting system (103), a plurality of secondary image documents by rescaling the primary image document;
detecting, by the table detecting system (103), a plurality of candidate regions comprising one or more predefined table features in each of the plurality of secondary image documents, wherein the one or more predefined table features comprises at least one of L-shaped edge, laterally inverted L-shaped edge, elongated L-shaped edge, vertically inverted L-shaped edge, T-junction and an intersection;
determining, by the table detecting system (103), a set of connected components corresponding to each of the plurality of candidate regions, wherein each set of the connected components comprises IDs associated with the corresponding connected components, wherein a common ID is associated with the connected components that are interlinked;
generating, by the table detecting system (103), clusters of the connected components, wherein each cluster comprises the connected components corresponding to the IDs that are determined to be present in more than one set of the connected components;
cropping, by the table detecting system (103), areas corresponding to the clusters of the connected components, determined to form a table, from the primary image document, wherein the table is determined when the clusters of the connected components satisfy a predefined probability threshold; and
identifying, by the table detecting system (103), each cell region of the table by modifying pixel values of the clusters of the connected components in the cropped areas.
2. The method as claimed in claim 1, wherein the rescaling comprises sequentially incrementing size of the primary image document by a predefined increment value until a threshold size is reached, wherein image document generated at each sequential increment is the secondary image document, wherein size of each of the plurality of secondary image documents is different.
3. The method as claimed in claim 1, wherein detecting the plurality of candidate regions comprises:
performing, by the table detecting system (103), cross-correlation of each of the one or more predefined table features with each of the plurality of secondary image documents; and
detecting, by the table detecting system (103), regions in each of the plurality of secondary images whose cross-correlation with the one or more predefined table features is higher than a predefined correlation threshold, as the plurality of candidate regions.
4. The method as claimed in claim 1, wherein determining the table comprises:
evaluating, by the table detecting system (103), a probability of forming the table for each cluster of the connected components using pre-trained deep learning techniques; and
inferring, by the table detecting system (103), that the cluster of the connected components form the table when the probability is determined to be greater than the predefined probability threshold.
5. The method as claimed in claim 4 further comprises discarding, by the table detecting system (103), the clusters of the connected components whose probability of forming the table is determined to be less than the predefined probability threshold.
6. The method as claimed in claim 1 further comprises discarding, by the table detecting system (103), the connected components corresponding to each of the plurality of candidate regions that are not present in more than one set of the connected components, before determining existence of the table.
7. The method as claimed in claim 1, wherein each area cropped from the primary image document comprises the table and each cluster of the connected components in the area forms outline of each row and column of the corresponding table.
8. The method as claimed in claim 1, wherein modifying the pixel values comprises inverting the pixel values of the clusters of the connected components and the area enclosed by the clusters of the connected components.
9. The method as claimed in claim 1, wherein each cell region of the table is identified by determining connected components of the area enclosed by the clusters of the connected components that are determined to form the table, upon modifying the pixel values.
10. A table detecting system (103) for identifying cell region of a table comprising cell borders from an image document, the table detecting system (103) comprising:
a processor (105); and
a memory (109) communicatively coupled to the processor (105), wherein the memory (109) stores processor-executable instructions, which, on execution, cause the processor (105) to:
receive a primary image document from a data source among one or more data sources (101) associated with the table detecting system (103);
generate a plurality of secondary image documents by rescaling the primary image document;
detect a plurality of candidate regions comprising one or more predefined table features in each of the plurality of secondary image documents, wherein the one or more predefined table features comprises at least one of L-shaped edge, laterally inverted L-shaped edge, elongated L-shaped edge, vertically inverted L-shaped edge, T-junction and an intersection;
determine a set of connected components corresponding to each of the plurality of candidate regions, wherein each set of the connected components comprises IDs associated with the corresponding connected components, wherein a common ID is associated with the connected components that are interlinked;
generate clusters of the connected components, wherein each cluster comprises the connected components corresponding to the IDs that are determined to be present in more than one set of the connected components;
crop areas corresponding to the clusters of the connected components, determined to form a table, from the primary image document, wherein the table is determined when the clusters of the connected components satisfy a predefined probability threshold; and
identify each cell region of the table by modifying pixel values of the clusters of the connected components in the cropped areas.
Dated this 15th day of February 2019
NAVEEN SURIYA
K&S PARTNERS
AGENT FOR THE APPLICANT
IN/PA-1419
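The rescaling of claim 2 and the cross-correlation test of claim 3 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the increment is assumed to be an additive pixel step on the width with the aspect ratio preserved, the correlation is assumed to be zero-mean normalized cross-correlation (the claims say only "cross-correlation"), and all function names, the 3x3 L-shaped template and the 0.9 threshold are hypothetical.

```python
import math


def secondary_sizes(width, height, step, max_width):
    """Sizes reached by growing the width by `step` until `max_width`."""
    sizes = []
    w = width + step
    while w <= max_width:
        sizes.append((w, round(height * w / width)))  # keep aspect ratio
        w += step
    return sizes


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-sized grids."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) *
                    sum((b - mt) ** 2 for b in t))
    return num / den if den else 0.0  # flat patches carry no feature


def candidate_regions(image, template, threshold):
    """Top-left corners of patches whose correlation exceeds the threshold."""
    th, tw = len(template), len(template[0])
    hits = []
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            if ncc(patch, template) > threshold:
                hits.append((r, c))
    return hits


# Hypothetical L-shaped edge feature and a small binary image containing it
l_template = [[1, 0, 0],
              [1, 0, 0],
              [1, 1, 1]]
image = [[0, 0, 0, 0, 0],
         [0, 1, 0, 0, 0],
         [0, 1, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
hits = candidate_regions(image, l_template, 0.9)
```

In a full pipeline the same search would be repeated for every predefined table feature and for every secondary image document produced from the size schedule.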
Description: TECHNICAL FIELD
The present subject matter relates generally to the field of image processing and machine learning, and more particularly, but not exclusively, to a method and a system for identifying cell regions of a table comprising cell borders from an image document.
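As a rough illustration of the cell-identification step recited in claims 8 and 9 (inverting the pixel values and then determining the connected components of the enclosed area), the following Python sketch labels the cells of a small cropped table. It assumes a binary grid in which 1 marks a border pixel, a crop that is tight around the table outline, and 4-connectivity; the name `find_cells` and the bounding-box output format are hypothetical.

```python
from collections import deque


def find_cells(border_mask):
    """Return one bounding box (top, left, bottom, right) per cell.

    border_mask: binary grid where 1 marks a table-border pixel
                 (assumed representation of the cropped area).
    """
    rows, cols = len(border_mask), len(border_mask[0])
    # Invert: borders become 0, enclosed cell interiors become 1
    inverted = [[1 - v for v in row] for row in border_mask]
    seen = [[False] * cols for _ in range(rows)]
    cells = []
    for r in range(rows):
        for c in range(cols):
            if inverted[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected interior
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, left = min(top, y), min(left, x)
                bottom, right = max(bottom, y), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and inverted[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            cells.append((top, left, bottom, right))
    return cells


# A cropped table outline with two side-by-side cells
cropped = [[1, 1, 1, 1, 1, 1, 1],
           [1, 0, 0, 1, 0, 0, 1],
           [1, 0, 0, 1, 0, 0, 1],
           [1, 1, 1, 1, 1, 1, 1]]
cells = find_cells(cropped)
```

Each connected region of the inverted image corresponds to one cell, so the two interiors in the example yield two bounding boxes.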
| # | Name | Date |
|---|---|---|
| 1 | 201941006167-STATEMENT OF UNDERTAKING (FORM 3) [15-02-2019(online)].pdf | 2019-02-15 |
| 2 | 201941006167-REQUEST FOR EXAMINATION (FORM-18) [15-02-2019(online)].pdf | 2019-02-15 |
| 3 | 201941006167-POWER OF AUTHORITY [15-02-2019(online)].pdf | 2019-02-15 |
| 4 | 201941006167-FORM 18 [15-02-2019(online)].pdf | 2019-02-15 |
| 5 | 201941006167-FORM 1 [15-02-2019(online)].pdf | 2019-02-15 |
| 6 | 201941006167-DRAWINGS [15-02-2019(online)].pdf | 2019-02-15 |
| 7 | 201941006167-DECLARATION OF INVENTORSHIP (FORM 5) [15-02-2019(online)].pdf | 2019-02-15 |
| 8 | 201941006167-COMPLETE SPECIFICATION [15-02-2019(online)].pdf | 2019-02-15 |
| 9 | 201941006167-Request Letter-Correspondence [20-02-2019(online)].pdf | 2019-02-20 |
| 10 | 201941006167-Power of Attorney [20-02-2019(online)].pdf | 2019-02-20 |
| 11 | 201941006167-Form 1 (Submitted on date of filing) [20-02-2019(online)].pdf | 2019-02-20 |
| 12 | abstract 201941006167.jpg | 2019-02-21 |
| 13 | 201941006167-Proof of Right (MANDATORY) [21-05-2019(online)].pdf | 2019-05-21 |
| 14 | Correspondence By Agent_Proof of Right_24-05-2019.pdf | 2019-05-24 |
| 15 | 201941006167-PETITION UNDER RULE 137 [06-07-2021(online)].pdf | 2021-07-06 |
| 16 | 201941006167-Information under section 8(2) [06-07-2021(online)].pdf | 2021-07-06 |
| 17 | 201941006167-FORM 3 [06-07-2021(online)].pdf | 2021-07-06 |
| 18 | 201941006167-FER_SER_REPLY [08-07-2021(online)].pdf | 2021-07-08 |
| 19 | 201941006167-FER.pdf | 2021-10-17 |
| 20 | 201941006167-PatentCertificate09-08-2024.pdf | 2024-08-09 |
| 21 | 201941006167-IntimationOfGrant09-08-2024.pdf | 2024-08-09 |
| 22 | 201941006167-FORM 4 [07-05-2025(online)].pdf | 2025-05-07 |
| 1 | 2021-01-2017-11-01E_29-01-2021.pdf | |