Abstract: SYSTEM AND METHOD FOR RECONSTRUCTING A LAYOUT-AGNOSTIC TABLE DATA IN ELECTRONIC DOCUMENTS ABSTRACT A method (700) for reconstructing a layout-agnostic table data in electronic documents (104) using a data reconstruction system (108) is disclosed. The method (700) includes steps of: applying (704) a pre-processed image onto a deep machine learning model; determining (706) a plurality of layout elements in an uploaded electronic document (104); arranging (708) an unstructured text in the determined plurality of layout elements in a line-wise manner; generating (710) a response by rearranging the arranged unstructured text in line-wise manner; cropping (712) a table that is identified in the uploaded electronic document (104) as an image and sending the cropped image to identify each cell in the table; generating (714) the plurality of cells comprising the rearranged text as an output with a row and column number; and outputting (716) the generated plurality of cells in at least one of: JavaScript Object Notation (JSON) and Extensible Markup Language (XML) response. FIG. 7
DESC:EARLIEST PRIORITY DATE:
This Application claims priority from a Provisional patent application filed in India having Patent Application No. 202141026985, filed on August 17th, 2021, and titled “SYSTEMS AND METHODS FOR LAYOUT-AGNOSTIC TABLE DATA RECONSTRUCTION IN DOCUMENTS”.
FIELD OF INVENTION
[1] Embodiments of the present invention relate to electronic documents, and more particularly to a system and method for reconstructing a layout-agnostic table data in the electronic documents.
BACKGROUND
[2] In the field of Document Digitization to Decisioning, where obtaining correct document text is of utmost importance, one other important factor is to have this text placed in a correct structured format. As such, extraction and decisioning using the structured data would then become a much simpler task.
[3] Although identifying different layout elements in documents is not new, in most prior technologies, which involved non-deep-learning methods, the internal structure(s) of a table was identified based on different heuristics that involved identifying the vertical and horizontal alignments, so that all OCR text can be placed in a proper row-column arrangement. Although these heuristics worked well for standard tables, most of the tables in business documents such as invoices or bank statements fall in exceptional cases.
[4] For example, documents may include tables of different lengths, ranging from two-line tables to tables covering an entire A4 page, and the tables may include borders, semi-borders or no borders. In other scenarios, the tables may include a column header and subsequent rows that are arranged with different vertical alignments. Line items in the table rows may or may not span a consecutive number of rows, and a plurality of cells in the table may be empty.
[5] Therefore, ensuring a correct digitized table reconstruction becomes a challenging task, for example, when the business documents such as the bank statements and transaction details for a particular date are to be digitally fetched or for a particular goods, the corresponding quantity or an invoice amount is to be extracted in an invoice. More of these kinds of variations in characteristics of the tables are observed as a document list, which further extends to other categories like salary slips, financial statements, loan statements, and the like.
[6] Hence, there is a need for a system and method for reconstructing a layout-agnostic table data in electronic documents, to address the aforementioned problems thereof.
SUMMARY
[7] In accordance with one embodiment of the disclosure, a method for reconstructing a layout-agnostic table data in electronic documents using a data reconstruction system with a trained deep machine learning model is disclosed. The method includes the following steps: (a) uploading, using a hardware processor, an electronic document on at least one of: a user device or an external server; (b) applying, by the hardware processor, a pre-processed image onto a deep machine learning model; (c) determining, by the hardware processor, a plurality of layout elements in the uploaded electronic document using an output data of the deep machine learning model; (d) arranging, by the hardware processor, an unstructured text in the determined plurality of layout elements in a line-wise manner; (e) generating, by the hardware processor, a response by rearranging the arranged unstructured text in the determined plurality of layout elements in a line-wise manner; (f) cropping, by the hardware processor, the table that is identified in the uploaded electronic document as an image and sending the cropped image to a table cell detection subsystem to identify each cell in the table; (g) generating, by the hardware processor, a plurality of cells including the rearranged text as an output with a row and column number that are assigned to each of the plurality of cells in the table; and (h) outputting, by the hardware processor, the plurality of cells in at least one of: JavaScript Object Notation (JSON) and Extensible Markup Language (XML) response.
[8] In an embodiment, the hardware processor determines the plurality of layout elements in the uploaded electronic document by (a) identifying a type of the uploaded electronic document including at least one of: a portable document format (.pdf), a joint photograph group (.jpg), a portable network graphics (.png), and a tag image file (.tif) format; (b) converting the uploaded electronic document to an image with a standard format based on the type of the uploaded electronic document; and (c) pre-processing the image to determine the plurality of layout elements in the uploaded electronic document.
[9] In another embodiment, the method further includes upon determining the plurality of layout elements in the uploaded electronic document, mapping, by the hardware processor, the determined layout elements to a corresponding unstructured text in the pre-processed image using a bounding box coordination technique.
[10] In yet another embodiment, the hardware processor generates the response by rearranging the arranged unstructured text in the determined plurality of layout elements including identifying a plurality of cells in a table in the uploaded electronic document, and upon identifying the plurality of cells in the table, merging the plurality of cells that are spanned over a plurality of rows based on a reference column in the table.
[11] In yet another embodiment, the method further includes identifying, by the hardware processor, regions in the uploaded electronic document including a first set of predefined regions that include at least one of: financial_reports, audit_reports, cash_flow, and a balance_sheet based on a type of the uploaded electronic document. In yet another embodiment, the method further includes identifying, by the hardware processor, the regions in the uploaded electronic document including a second set of predefined regions that include at least one of: invoices, pay slips, and bank statements.
[12] In one aspect, a system for reconstructing a layout-agnostic table data in electronic documents using a trained deep machine learning model is disclosed. The system includes a hardware processor and a memory. The memory is coupled to the hardware processor and the memory includes a set of program instructions in the form of a plurality of subsystems configured to be executed by the hardware processor. The plurality of subsystems includes an image pre-processing subsystem, a layout determination subsystem, a text arrangement subsystem, a logical cell detection and reconciliation subsystem (LDRS), a table detection and table reconciliation subsystem (TDRS), and a table text blocks output subsystem. The hardware processor is configured to upload an electronic document in a user device or an external server. The electronic document is stored in at least one of: a database of the user device and a cloud database.
[13] The image pre-processing subsystem applies a pre-processed image onto a deep machine learning model. The layout determination subsystem determines a plurality of layout elements in the uploaded electronic document using an output data of the deep machine learning model. The text arrangement subsystem arranges an unstructured text in the determined plurality of layout elements in a line-wise manner.
[14] The logical cell detection and reconciliation subsystem (LDRS) generates a response by rearranging the arranged unstructured text in the determined plurality of layout elements in a line-wise manner. The table detection and table reconciliation subsystem (TDRS) crops the table that is identified in the uploaded electronic document as an image and sends the cropped image to a table cell detection subsystem to identify each cell in the table. In an embodiment, each of a plurality of cells in the table is the smallest unit formed by overlapping of row and column position.
[15] The table detection and table reconciliation subsystem generates the plurality of cells including the rearranged text as an output with a row and column number that are assigned to each of the plurality of cells in the table. The table text blocks output subsystem outputs the plurality of cells in at least one of: JavaScript Object Notation (JSON) and Extensible Markup Language (XML) response. The plurality of cells forms different hierarchical elements in the uploaded electronic document that preserve document structure along with coordinate information.
[16] In an embodiment, the plurality of layout elements is determined in the uploaded electronic document by (a) identifying a type of the uploaded electronic document, wherein the type of the uploaded electronic document includes at least one of: a portable document format (.pdf), a joint photograph group (.jpg), a portable network graphics (.png), and a tag image file (.tif) format, (b) converting the uploaded electronic document to an image with a standard format, and (c) pre-processing the image to determine the plurality of layout elements in the uploaded electronic document.
[17] In another embodiment, the plurality of subsystems includes a layout mapping subsystem that maps the determined layout elements to a corresponding unstructured text in the pre-processed image using a bounding box coordination technique upon determining the plurality of layout elements in the uploaded electronic document.
[18] In yet another embodiment, the logical cell detection and reconciliation subsystem generates the response by rearranging the arranged unstructured text in the determined plurality of layout elements by identifying a plurality of cells in a table in the uploaded electronic document; and upon identifying the plurality of cells in the table, merging the plurality of cells that are spanned over a plurality of rows based on a reference column in the table.
[19] In yet another embodiment, the plurality of subsystems includes a region prediction subsystem that identifies regions in the uploaded electronic document including a first set of predefined regions that include at least one of: financial_reports, audit_reports, cash_flow, and a balance_sheet based on a type of the uploaded electronic document. In another embodiment, the region prediction subsystem is further configured to identify the regions in the uploaded electronic document including a second set of predefined regions that include at least one of: invoices, pay slips, and bank statements.
[20] To further clarify the advantages and features of the present disclosure, a more particular description of the disclosure will follow by reference to specific embodiments thereof, which are illustrated in the appended figures. It is to be appreciated that these figures depict only typical embodiments of the disclosure and are therefore not to be considered limiting in scope. The disclosure will be described and explained with additional specificity and detail with the appended figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[21] The disclosure will be described and explained with additional specificity and detail with the accompanying figures in which:
[22] FIG. 1 is a block diagram of a system for reconstructing a layout-agnostic table data in electronic documents with a trained deep machine learning model using a data reconstruction system, in accordance with an embodiment of the present disclosure;
[23] FIG. 2 is a block diagram illustrating an exemplary data reconstruction system, such as those shown in FIG. 1, in accordance with an embodiment of the present disclosure;
[24] FIG. 3 is a process flow for reconstructing the layout-agnostic table data in the electronic documents with a plurality of formats using the data reconstruction system, such as those shown in FIG. 1, in accordance with an embodiment of the present disclosure;
[25] FIG. 4 is a tabular view illustrating an output generation of JSON/XML response including different hierarchical elements in the electronic document, in accordance with an embodiment of the present disclosure;
[26] FIG. 5 is an exemplary view of a determined plurality of layout elements in a plurality of documents, in accordance with an embodiment of the present disclosure;
[27] FIG. 6 is an exemplary view illustrating cropping of a table in the electronic document, in accordance with an embodiment of the present disclosure; and
[28] FIG. 7 is a flowchart illustrating a computer implemented method for reconstructing the layout-agnostic table data in the electronic documents with the trained deep machine learning model using the data reconstruction system, in accordance with an embodiment of the present disclosure.
[29] Further, those skilled in the art will appreciate that elements in the figures are illustrated for simplicity and may not have necessarily been drawn to scale. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the figures by conventional symbols, and the figures may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the figures with details that will be readily apparent to those skilled in the art having the benefit of the description herein.
DETAILED DESCRIPTION
[30] For the purpose of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiment illustrated in the figures and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the disclosure is thereby intended. Such alterations and further modifications in the illustrated online platform, and such further applications of the principles of the disclosure as would normally occur to those skilled in the art are to be construed as being within the scope of the present disclosure.
[31] The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method. Similarly, one or more devices or subsystems or elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other devices, subsystems, elements, structures, components, additional devices, additional subsystems, additional elements, additional structures or additional components. Appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but not necessarily do, all refer to the same embodiment.
[32] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this disclosure belongs. The system, methods, and examples provided herein are only illustrative and not intended to be limiting.
[33] In the following specification and the claims, reference will be made to a number of terms, which shall be defined to have the following meanings. The singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise.
[34] A computer system (standalone, client or server computer system) configured by an application may constitute a “module” (or “subsystem”) that is configured and operated to perform certain operations. In one embodiment, the “module” or “subsystem” may be implemented mechanically or electronically, so that a module may include dedicated circuitry or logic that is permanently configured (within a special-purpose processor) to perform certain operations. In another embodiment, a “module” or “subsystem” may also comprise programmable logic or circuitry (as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations.
[35] Accordingly, the term “module” or “subsystem” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (hardwired), or temporarily configured (programmed) to operate in a certain manner and/or to perform certain operations described herein.
[36] FIG. 1 is a block diagram of a system 100 for reconstructing a layout-agnostic table data in electronic documents 104 with a trained deep machine learning model using a data reconstruction system 108, in accordance with an embodiment of the present disclosure. The system 100 includes a user device 102 and a computing system 106. The user device 102 includes an electronic document 104 that is read by a user of the user device 102. The computing system 106 includes the data reconstruction system 108. The data reconstruction system 108 includes a plurality of subsystems 110. The data reconstruction system 108 using the plurality of subsystems 110 helps table text blocks (i.e., a plurality of cells) in the electronic document 104 to be generated with rearranged texts as an output with a correct row and column number that are assigned to each of the cell in the table of the electronic document 104 when the user uploads the electronic document 104 having the layout-agnostic table data.
[37] In an embodiment, the electronic document 104 includes a format with at least one of: a portable document format (.pdf), a joint photograph group (.jpg), a portable network graphics (.png), a tag image file (.tif) formats, and the like. In an embodiment, the data reconstruction system 108 can be installed in the user device 102. In an embodiment, the electronic document 104 is stored in a database 224 of the user device 102. In an embodiment, the electronic document 104 is stored in the computing system 106 or in a cloud database. In an embodiment, when the user uploads the electronic document 104, the user device 102 including the data reconstruction system 108 generates the table text blocks with the rearranged texts as an output with a correct row and column number that are assigned to each of the cell in the table.
[38] In another embodiment, when the user uploads the electronic document 104, the computing system 106 including the data reconstruction system 108 generates the table text blocks with the rearranged texts as an output with a correct row and column number that are assigned to each of the cell in the table. In an embodiment, the user device uploads the electronic document 104 on the computing system 106 through a communication interface 112. In an embodiment, the computing system 106 is an external server that includes the data reconstruction system 108. In an embodiment, the user device 102 is at least one of: a mobile phone, a smart phone, a laptop, a personal computer, a personal digital assistant (PDA), and the like. The computing system 106 is at least one of: a mobile phone, a smart phone, a laptop, a personal computer, a personal digital assistant (PDA), and the like.
[39] FIG. 2 is a block diagram illustrating an exemplary data reconstruction system 108, such as those shown in FIG. 1, in accordance with an embodiment of the present disclosure. The data reconstruction system 108 includes a hardware processor 224. The data reconstruction system 108 also includes a memory 202 coupled to the hardware processor 224. The memory 202 includes a set of program instructions in the form of the plurality of subsystems 110.
[40] The hardware processor(s) 224, as used herein, means any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing microprocessor, a reduced instruction set computing microprocessor, a very long instruction word microprocessor, an explicitly parallel instruction computing microprocessor, a digital signal processor, or any other type of processing circuit, or a combination thereof.
[41] The memory 202 includes the plurality of subsystems 110 stored in the form of an executable program which instructs the hardware processor 224 via a system bus 220 to perform the above-mentioned method steps. The plurality of subsystems 110 includes the following subsystems: an image pre-processing subsystem 204, a layout determination subsystem 206, a text arrangement subsystem 208, a layout mapping subsystem 210, a logical cell detection and reconciliation subsystem (LDRS) 212, a table detection and table reconciliation subsystem (TDRS) 214, a table text blocks output subsystem 216, and a region prediction subsystem 218.
[42] Computer memory elements may include any suitable memory device(s) for storing data and executable program, such as read only memory, random access memory, erasable programmable read only memory, electrically erasable programmable read only memory, hard drive, removable media drive for handling memory cards and the like. Embodiments of the present subject matter may be implemented in conjunction with program modules, including functions, procedures, data structures, and application programs, for performing tasks, or defining abstract data types or low-level hardware contexts. Executable program stored on any of the above-mentioned storage media may be executable by the hardware processor(s) 224.
[43] The data reconstruction system 108 includes the hardware processor(s) 224 that is configured to upload the electronic document 104 on the user device 102 or the computing system 106. The data reconstruction system 108 further includes the image pre-processing subsystem 204 that is communicatively connected to the hardware processor 224. The image pre-processing subsystem 204 applies a pre-processed image onto a deep machine learning model. The data reconstruction system 108 further includes the layout determination subsystem 206 that is communicatively connected to the hardware processor 224. The layout determination subsystem 206 determines a plurality of layout elements in the uploaded electronic document 104 using an output data of the deep machine learning model. In an embodiment, the plurality of layout elements is determined in the uploaded electronic document 104 by (a) identifying a type of the uploaded electronic document 104 (e.g., .pdf, .jpg, .png, and .tif formats), (b) converting the uploaded electronic document 104 to an image with a standard format based on the type of the uploaded electronic document 104 (i.e., the electronic document 104 with the scanned pdf format), and (c) pre-processing the image to determine the plurality of layout elements in the uploaded electronic document 104. In an embodiment, a single image is generated when a single page of the electronic document 104 is uploaded. In another embodiment, a plurality of images is generated when a plurality of pages of the electronic document 104 is uploaded. In another embodiment, the pre-processing of the images includes processes of deskewing, rotation correction of the image when edges of the image are identified, and dewarping.
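As a minimal sketch of step (a) and of the one-image-per-page behaviour described above, the type of an uploaded file can be identified from its extension and a list of page image names can be produced. The function names and the routing labels "pdf" and "image" are hypothetical and are not taken from the disclosure; the Page_N.jpg naming follows the example given later in the description.

```python
from pathlib import Path
from typing import List

# Extensions named in the description; the routing labels are hypothetical.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif"}


def identify_document_type(filename: str) -> str:
    """Return 'pdf' or 'image' based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    raise ValueError(f"unsupported document type: {suffix or filename}")


def page_image_names(page_count: int) -> List[str]:
    """One standard-format image name per page of the document."""
    return [f"Page_{i}.jpg" for i in range(1, page_count + 1)]
```

In such a sketch, a document identified as "pdf" would be converted page by page, while a document identified as "image" would go directly to pre-processing.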
[44] The data reconstruction system 108 further includes the text arrangement subsystem 208 that is communicatively connected to the hardware processor 224. The text arrangement subsystem 208 arranges an unstructured text in the determined plurality of layout elements in a line-wise manner.
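One possible way to arrange unstructured text in a line-wise manner is sketched below. It assumes each piece of text arrives as a token with top, left, width and height coordinates, groups tokens whose vertical centres are close, and orders each group from left to right. The token fields, the function name, and the tolerance heuristic are assumptions made for illustration; the disclosure does not specify how the text arrangement subsystem 208 is implemented.

```python
from typing import Dict, List


def arrange_line_wise(tokens: List[Dict], tolerance: float = 0.5) -> List[str]:
    """Group tokens into text lines by vertical centre, then order left to right.

    Each token is a dict with 'text', 'left', 'top', 'width' and 'height'.
    `tolerance` is a fraction of the token height (an assumed heuristic).
    """
    def centre(tok: Dict) -> float:
        return tok["top"] + tok["height"] / 2

    lines: List[List[Dict]] = []
    for tok in sorted(tokens, key=centre):
        if lines:
            last = lines[-1][-1]
            limit = tolerance * max(tok["height"], last["height"])
            if abs(centre(tok) - centre(last)) <= limit:
                lines[-1].append(tok)  # same visual line as the previous token
                continue
        lines.append([tok])  # start a new line
    return [
        " ".join(t["text"] for t in sorted(line, key=lambda t: t["left"]))
        for line in lines
    ]
```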
[45] The data reconstruction system 108 further includes a layout mapping subsystem 210 that is communicatively coupled to the hardware processor 224. The layout mapping subsystem 210 maps the determined layout elements to a corresponding unstructured text in the pre-processed image using a bounding box coordination technique upon determining the plurality of layout elements in the uploaded electronic document 104.
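The mapping of layout elements to text by bounding box coordinates can be illustrated with the following sketch, which assigns a text token to the first layout element whose box contains the centre of the token. The centre-containment rule, the field names, and the function names are assumptions chosen for illustration and are not stated in the disclosure.

```python
from typing import Dict, List


def contains_centre(region: Dict, token: Dict) -> bool:
    """True when the centre point of the token lies inside the region box."""
    cx = token["left"] + token["width"] / 2
    cy = token["top"] + token["height"] / 2
    return (
        region["left"] <= cx <= region["left"] + region["width"]
        and region["top"] <= cy <= region["top"] + region["height"]
    )


def map_text_to_layout(regions: List[Dict], tokens: List[Dict]) -> Dict[str, List[str]]:
    """Map each region label to the texts whose centres fall inside that region."""
    mapping: Dict[str, List[str]] = {r["label"]: [] for r in regions}
    for tok in tokens:
        for region in regions:
            if contains_centre(region, tok):
                mapping[region["label"]].append(tok["text"])
                break  # a token is assigned to at most one region
    return mapping
```

Tokens that fall outside every detected region are simply left unmapped in this sketch.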
[46] The data reconstruction system 108 further includes the logical cell detection and reconciliation subsystem 212 that is communicatively coupled to the hardware processor 224. The logical cell detection and reconciliation subsystem 212 generates a response by rearranging the arranged unstructured text in the determined plurality of layout elements in a line-wise manner. In an embodiment, the logical cell detection and reconciliation subsystem 212 generates the response by rearranging the arranged unstructured text in the determined plurality of layout elements by identifying a plurality of cells in a table in the uploaded electronic document 104; and upon identifying the plurality of cells in the table, merging the plurality of cells that are spanned over a plurality of rows based on a reference column in the table. In an embodiment, a plurality of cells in the reference column includes a top alignment or a middle alignment with the plurality of cells spanned in the plurality of rows. In an embodiment, the plurality of cells are grouped together in the table in a row and column pattern using a clustering subsystem. In an embodiment, the clustering subsystem groups all text in the identified table columns. Further, a row number for each of these texts is identified by the clustering subsystem using a clustering technique (e.g., K-means clustering) that belongs to a class of unsupervised learning. In an embodiment, the above clustering technique faces inaccuracies in cases where table rows are very closely spaced, or where the text arranged in a row does not follow the same alignment due to skewness (i.e., even a coordinate change of 1 or 2 pixels) or bad image scanning.
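The merging of cells spanned over several rows based on a reference column can be sketched as follows. The sketch assumes the top-alignment case only: a row whose reference-column cell is empty is treated as a continuation of the row above it, and its text is appended cell by cell. The function name, the list-of-strings row representation, and the empty-cell rule are assumptions for illustration; the middle-alignment case is not handled here.

```python
from typing import List


def merge_spanned_rows(rows: List[List[str]], reference_col: int = 0) -> List[List[str]]:
    """Merge continuation rows into the preceding row.

    A row is a continuation when its cell in `reference_col` is empty
    (assumed top alignment of the reference column). All rows are assumed
    to have the same number of cells.
    """
    merged: List[List[str]] = []
    for row in rows:
        is_continuation = bool(merged) and not row[reference_col].strip()
        if is_continuation:
            previous = merged[-1]
            for i, cell in enumerate(row):
                if cell.strip():
                    previous[i] = (previous[i] + " " + cell).strip()
        else:
            merged.append(list(row))
    return merged
```

For a bank statement whose date column is the reference column, a description wrapped over two physical rows would be joined into one logical row with a single date and amount.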
[47] The data reconstruction system 108 further includes a line detection subsystem that is communicatively coupled to the hardware processor 224. The line detection subsystem identifies rows and columns in the table when the table is a bordered table. In an embodiment, the rows and columns in the table are identified using an OpenCV technique.
[48] The data reconstruction system 108 further includes the table detection and table reconciliation subsystem (TDRS) 214 that is communicatively coupled to the hardware processor 224. The table detection and table reconciliation subsystem (TDRS) 214 crops the table, that is identified in the uploaded electronic document 104, as a new image. The table detection and table reconciliation subsystem (TDRS) 214 further sends the cropped image to a table cell detection subsystem to identify each cell in the table. In an embodiment, each of a plurality of cells in the table is the smallest unit formed by overlapping of row and column position. The table detection and table reconciliation subsystem (TDRS) 214 further generates a plurality of table text blocks (i.e., a plurality of table text cells) including the rearranged text as an output with a correct row and column number that are assigned to each of the plurality of cells in the table.
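One way to assign a row and column number to each detected cell is sketched below. It groups the top coordinates of the cell boxes into rows and the left coordinates into columns, treating coordinates that lie within a small pixel tolerance of the previous sorted value as the same row or column. The gap-based grouping, the 0-based numbering, the tolerance value, and all names are assumptions for illustration and are not the method recited in the disclosure.

```python
from typing import Dict, List


def index_by_position(values: List[float], tolerance: float) -> Dict[float, int]:
    """Map each coordinate to a 0-based group index.

    Sorted coordinates that differ from the previous one by at most
    `tolerance` share a group (an assumed gap-based heuristic).
    """
    indices: Dict[float, int] = {}
    group = -1
    previous = None
    for value in sorted(set(values)):
        if previous is None or value - previous > tolerance:
            group += 1
        indices[value] = group
        previous = value
    return indices


def assign_rows_and_columns(cells: List[Dict], tolerance: float = 5) -> List[Dict]:
    """Return copies of the cells with 'row' and 'column' numbers added."""
    row_of = index_by_position([c["top"] for c in cells], tolerance)
    col_of = index_by_position([c["left"] for c in cells], tolerance)
    return [
        {**c, "row": row_of[c["top"]], "column": col_of[c["left"]]}
        for c in cells
    ]
```

The tolerance absorbs the coordinate changes of 1 or 2 pixels that skewed scans introduce, although closely spaced rows would still require a smaller value.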
[49] The data reconstruction system 108 further includes the table text blocks output subsystem 216 that is communicatively coupled to the hardware processor 224. The table text blocks output subsystem 216 outputs the plurality of cells in at least one of: JavaScript Object Notation (JSON) and Extensible Markup Language (XML) response/format. The plurality of cells forms different hierarchical elements in the uploaded electronic document 104 that preserve document structure along with coordinate information. In an embodiment, each of the hierarchical elements in the JSON or XML response is assigned a unique incremental number for easy access of the different hierarchical elements.
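A minimal sketch of a JSON response in which every hierarchical element carries a unique incremental number is given below. The keys "filename", "result" and "cells" follow element names listed in the description of FIG. 4; the keys "id", "type", "row", "column" and "text", and the function name, are hypothetical and are used only for illustration.

```python
import json
from itertools import count
from typing import Dict, List


def build_response(filename: str, cells: List[Dict]) -> str:
    """Serialise one table and its cells, numbering each element incrementally."""
    next_id = count(1)  # shared counter so every element gets a unique number
    table = {"id": next(next_id), "type": "table", "cells": []}
    for cell in cells:
        table["cells"].append(
            {
                "id": next(next_id),
                "row": cell["row"],
                "column": cell["column"],
                "text": cell["text"],
            }
        )
    return json.dumps({"filename": filename, "result": [table]}, indent=2)
```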
[50] In an embodiment, the system 100 operates in two modes based on a type of the electronic document 104. The data reconstruction system 108 further includes the region prediction subsystem 218 that is communicatively coupled to the hardware processor 224. The region prediction subsystem 218 identifies regions of the uploaded electronic document 104 including a first set of predefined regions that include at least one of: financial_reports, audit_reports, cash_flow, a balance_sheet, and the like based on the type of the uploaded electronic document 104. The region prediction subsystem 218 addresses a plurality of layout elements in the layout-agnostic table data including at least one of: a heading that refers to a top part of the uploaded electronic document 104 that includes bold headings, a paragraph including paragraph information, a text including a single line text and sub headings, a table including outer table information with boundary detection, table cells that refer to the plurality of cells inside each table with the cropped image, a bottom part that refers to a lower part of the uploaded electronic document 104, a footer including a page number, a logical cell that refers to a closely grouped block of the text, and the like.
[51] The region prediction subsystem 218 further identifies the regions in the uploaded electronic document 104 including a second set of predefined regions that include at least one of: invoices, pay slips, bank statements, and the like. In an embodiment, the region prediction subsystem 218 addresses the plurality of layout elements in the layout-agnostic table data including a that includes information from same category, a table that refers to an outer table boundary detection, table cells inside each cropped table image, and the like.
[52] In an embodiment, a selection option of the region prediction subsystem 218 for identifying the first set of predefined regions and the second set of predefined regions is an input variable to be passed by the user while configuring the system 100 of the present invention. In another embodiment, an implementation of at least one of the region prediction subsystem 218 for identifying the first set of predefined regions, and the region prediction subsystem 218 for identifying the second set of predefined regions ensures that the electronic documents 104 with huge varieties of layouts are addressed by the system 100 of the present invention, which includes the electronic documents 104 having line-by-line text arrangement and the electronic documents 104 with the text arranged in the block/cell structure. In another embodiment, the table cells in the two operating modes are the same.
[53] FIG. 3 is a process flow 300 for reconstructing the layout-agnostic table data in the electronic documents 104 with a plurality of formats using the data reconstruction system 108, such as those shown in FIG. 1, in accordance with an embodiment of the present disclosure. At step 302, the data reconstruction system 108 uploads the electronic documents (e.g., files) 104 with a plurality of extensions (i.e., different types of the electronic documents 104). In an embodiment, the different types of electronic documents 104 include at least one of: pdf files, jpg files, joint photo expert group (jpeg) files, png files, tif files, and the like. In an embodiment, single or multi-page electronic documents 104 are uploaded. In another embodiment, password-protected electronic documents 104 are also uploaded. At step 304, a scanned or readable PDF identifier identifies whether the uploaded electronic document 104 is a readable or a scanned pdf document internally using a document checker algorithm that is developed in the data reconstruction system 108. The document checker algorithm fetches text lines for the initial portion of the pdf document through an open source pdf parser technique as in step 312. For example, the electronic documents 104 for which text is generated at the output are readable pdfs and the electronic documents 104 for which the text is not generated are scanned pdfs. When the electronic document 104 is the readable pdf, the text is detected from the electronic document 104 using a parser library. When the electronic document 104 is the scanned pdf, the text is detected from the electronic document 104 using an OCR engine.
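The readable-versus-scanned decision at step 304 can be sketched as follows. The sketch takes the parser as a caller-supplied function that returns the text lines found in the initial portion of the pdf, since the disclosure does not name a specific parser library. The function names and the returned labels are hypothetical.

```python
from typing import Callable, List


def classify_pdf(path: str, extract_text_lines: Callable[[str], List[str]]) -> str:
    """Return 'readable' if the parser yields any text, otherwise 'scanned'.

    `extract_text_lines` stands in for an open source pdf parser that
    returns text lines for the initial portion of the document.
    """
    lines = extract_text_lines(path)
    has_text = any(line.strip() for line in lines)
    return "readable" if has_text else "scanned"


def text_source_for(kind: str) -> str:
    """Readable pdfs use the parser library; scanned pdfs use the OCR engine."""
    return "pdf_parser" if kind == "readable" else "ocr_engine"
```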
[54] When the uploaded electronic document 104 is identified as the scanned pdf format, the data reconstruction system 108 converts the scanned pdf format to an image with a standard format (e.g., a jpg format), as shown in step 306. In an embodiment, a single image is generated (e.g., Page_1.jpg) when a single page of the electronic document 104 is uploaded. In another embodiment, a plurality of images is generated (e.g., Page_1.jpg, Page_2.jpg, and the like) when a plurality of pages of the electronic document 104 is uploaded.
[55] At step 308, all the images with a plurality of formats (e.g., scanned pdf, jpg, and tif formats) are pre-processed. In an embodiment, the pre-processing of the images includes at least one of: deskewing, rotation correction (in case all the edges of the image are identified), dewarping (to a certain extent only) operations, and the like. At step 310, the pre-processed images are sent through a detection model (i.e., the layout determination subsystem 206) to determine the different layout elements in the uploaded electronic document 104. In an embodiment, the detection model returns bounding box coordinates for each of the layout elements. At step 314, the bounding box coordinates are then mapped to the corresponding text in the image, and then passed through block reconciliation algorithms. The text information is obtained either from the OCR engine or from the PDF parser, as shown in step 312. At step 316, the data reconstruction system 108 outputs the identified region details, region coordinates, and text corresponding to the region. For a table region, the output for the electronic document 104 includes outer table coordinates, each table cell's coordinates, and the respective text in that cell, along with the column and row information for that cell.
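As an illustrative sketch only, the mapping of bounding box coordinates to the corresponding text (step 314) may be pictured as assigning each OCR or parser token to the layout region that contains the centre of the token. The centre-point rule, the `Box` and `Token` types, and the function name are assumptions made for this example; the disclosure does not specify the mapping rule, and the block reconciliation algorithms are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Bounding box in the top, left, width, height convention."""
    top: float
    left: float
    width: float
    height: float

    def contains_point(self, x: float, y: float) -> bool:
        return (self.left <= x <= self.left + self.width
                and self.top <= y <= self.top + self.height)


@dataclass(frozen=True)
class Token:
    """A piece of text with its box, as an OCR engine or pdf parser might return."""
    text: str
    box: Box


def map_text_to_regions(regions: Dict[str, Box],
                        tokens: List[Token]) -> Dict[str, List[str]]:
    """Assign each token to the first region containing the token's centre point."""
    mapped: Dict[str, List[str]] = {name: [] for name in regions}
    for token in tokens:
        cx = token.box.left + token.box.width / 2
        cy = token.box.top + token.box.height / 2
        for name, region in regions.items():
            if region.contains_point(cx, cy):
                mapped[name].append(token.text)
                break  # tokens outside every region are simply left unmapped
    return mapped
```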
[56] FIG. 4 is a tabular view 400 illustrating an output generation of JSON/XML response including different hierarchical elements in the electronic document 104, in accordance with an embodiment of the present disclosure. FIG. 4 shows that each element in the response is assigned a unique incremental number, for easy access of the elements in the electronic document 104. In an embodiment, the JSON/XML response includes different hierarchical elements in the electronic document 104 that preserve the document structure along with their coordinate information. The coordinate information relates to the bounding box and is arranged in at least one of: top, left, width, and height formats. In an embodiment, the same format of the coordinate information is preserved across each element in the different hierarchical elements. In an embodiment, the hierarchical elements include at least one of: filename, result, image_size, digitized_details, logical_cells, token, column_cords, rows, cells, and the like.
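For illustration, a response of this kind may be assembled as sketched below. The element names filename, result, image_size, digitized_details, and logical_cells are taken from the list above, whereas the nesting order, the `id` and `text` keys, and the function name are assumptions, since the exact schema of FIG. 4 is not reproduced here.

```python
import json
from itertools import count


def build_response(filename, image_size, logical_cells):
    """Assemble a response dict; every logical cell gets a unique incremental id.

    logical_cells is a list of (text, (top, left, width, height)) pairs.
    The nesting below is an assumed layout, not the schema of FIG. 4.
    """
    next_id = count(1)
    cells = []
    for text, (top, left, width, height) in logical_cells:
        cells.append({
            "id": next(next_id),
            "text": text,
            # The same coordinate format is used for every element.
            "top": top, "left": left, "width": width, "height": height,
        })
    return {
        "filename": filename,
        "image_size": {"width": image_size[0], "height": image_size[1]},
        "result": {"digitized_details": {"logical_cells": cells}},
    }


if __name__ == "__main__":
    response = build_response(
        "Page_1.jpg", (1000, 1400),
        [("Invoice", (10, 20, 100, 30)), ("Total", (50, 20, 80, 30))],
    )
    print(json.dumps(response, indent=2))
```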
[57] FIG. 5 is an exemplary view 500 of the determined plurality of layout elements in a plurality of documents 104A-C, in accordance with an embodiment of the present disclosure. The text arrangement subsystem 208 arranges an unstructured text in the determined plurality of layout elements (e.g., the first plurality of elements (502A, 502B, 502C) in a first document 104A, the second plurality of elements (504A, 504B, 504C) in a second document 104B, and the third plurality of elements (506A, 506B, 506C) in a third document 104C) in the line-wise manner, as shown in FIG. 5. The logical cell detection and reconciliation subsystem 212 generates the response by rearranging the unstructured text in the determined plurality of layout elements (502A-C, 504A-C, 506A-C) in a line-wise manner.
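One possible way to picture the line-wise arrangement is sketched below: words whose vertical centres lie close together are grouped into a line, and each line is then read from left to right. The grouping rule, the `tolerance` parameter, and the word record layout are assumptions for illustration and are not asserted to be the rule used by the text arrangement subsystem 208.

```python
from typing import List, Tuple

# Hypothetical word record: (text, top, left, width, height).
Word = Tuple[str, float, float, float, float]


def arrange_line_wise(words: List[Word], tolerance: float = 0.5) -> List[str]:
    """Group words into lines by vertical centre, then order each line by left edge."""
    lines = []  # each entry: [reference_centre, reference_height, members]
    for word in sorted(words, key=lambda w: w[1] + w[4] / 2):
        centre = word[1] + word[4] / 2
        # Join the current line if the centre is within a fraction of its height.
        if lines and abs(centre - lines[-1][0]) <= tolerance * lines[-1][1]:
            lines[-1][2].append(word)
        else:
            lines.append([centre, word[4], [word]])
    return [
        " ".join(w[0] for w in sorted(members, key=lambda w: w[2]))
        for _, _, members in lines
    ]
```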
[58] FIG. 6 is an exemplary view 600 illustrating cropping of a table 602 in the electronic document 104, in accordance with an embodiment of the present disclosure. The exemplary view 600 shows that the table detection and table reconciliation subsystem (TDRS) 214 crops the table 602 that is identified in the uploaded electronic document 104, as a new image 604. The table detection and table reconciliation subsystem 214 sends the cropped image of the table 604 to the table cell detection subsystem to identify each cell of the plurality of cells in the table 606.
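The cropping operation may be sketched, purely for illustration, on an image held as rows of pixel values. The `crop` helper follows the top, left, width, height convention used for coordinate information in this disclosure; the `to_page_coords` helper, which shifts a cell box found in the cropped image back into page coordinates, is an assumption added for the example and is not described in the source.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, width, height)


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by box as a new image."""
    top, left, width, height = box
    return [row[left:left + width] for row in image[top:top + height]]


def to_page_coords(cell_box: Box, table_box: Box) -> Box:
    """Translate a cell box located in the cropped table image to page coordinates."""
    top, left, width, height = cell_box
    return (top + table_box[0], left + table_box[1], width, height)
```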
[59] FIG. 7 is a flowchart illustrating a computer implemented method 700 for reconstructing the layout-agnostic table data in the electronic documents 104 using the data reconstruction system 108 with the trained deep machine learning model, in accordance with an embodiment of the present disclosure. At step 702, an electronic document 104 is uploaded on a user device 102 or an external server 106 using the hardware processor(s) 224. In an embodiment, the electronic document 104 is stored in at least one of: a database 222 of the user device 102 and a cloud database. At step 704, the image pre-processing subsystem 204 applies a pre-processed image onto a deep machine learning model. At step 706, the layout determination subsystem 206 determines a plurality of layout elements in the uploaded electronic document 104 using an output data of the deep machine learning model. In an embodiment, the plurality of layout elements is determined in the uploaded electronic document 104 by (a) identifying a type of the uploaded electronic document 104 including at least one of: .pdf, .jpg, .png, .tif formats, and the like, (b) converting the uploaded electronic document 104 to an image with a standard format based on the type of the uploaded electronic document 104, and (c) pre-processing the image to determine the plurality of layout elements in the uploaded electronic document 104 including processes of at least one of: deskewing, rotation correction of the image when all edges of the image are identified, dewarping, and the like. At step 708, the text arrangement subsystem 208 arranges an unstructured text in the determined plurality of layout elements in a line-wise manner. At step 710, the logical cell detection and reconciliation subsystem 212 generates a response by rearranging the unstructured text present in the determined plurality of layout elements in a line-wise manner.
At step 712, the table detection and table reconciliation subsystem 214 crops the table that is identified in the uploaded electronic document 104 as an image and sends the cropped image to the table cell detection subsystem to identify each cell in the table. In an embodiment, each of the plurality of cells in the table is the smallest unit formed by overlapping of row and column position.
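The notion of a cell as the smallest unit formed by the overlap of a row position and a column position may be sketched as follows, assuming that row and column extents (here called bands) have already been detected. The band representation, the 1-based numbering, and the function name are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

Band = Tuple[int, int]  # (start, end) extent of a row or a column, in pixels


def cells_from_bands(rows: List[Band], columns: List[Band]) -> List[Dict[str, int]]:
    """Form one cell per row/column overlap and number it (1-based, by assumption)."""
    cells = []
    for row_no, (top, bottom) in enumerate(rows, start=1):
        for col_no, (left, right) in enumerate(columns, start=1):
            cells.append({
                "row": row_no,
                "column": col_no,
                "top": top,
                "left": left,
                "width": right - left,
                "height": bottom - top,
            })
    return cells
```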
[60] At step 714, the table detection and table reconciliation subsystem 214 generates the plurality of cells including the rearranged text as an output with a correct row and column number that are assigned to each of the plurality of cells in the table. At step 716, the table text blocks output subsystem 216 outputs the plurality of table text blocks (i.e., the plurality of cells) in at least one of: JavaScript Object Notation (JSON) and Extensible Markup Language (XML) response. In an embodiment, the plurality of cells forms different hierarchical elements in the uploaded electronic document 104 that preserve the document structure along with coordinate information.
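As a hedged sketch of the XML form of the output at step 716, the numbered cells may be serialized with the Python standard library as shown below. The element and attribute names (`table`, `cell`, `row`, `column`) are assumptions; the disclosure does not fix an XML schema.

```python
import xml.etree.ElementTree as ET


def cells_to_xml(cells):
    """Serialize cells (dicts with "row", "column", and "text") to an XML string."""
    table = ET.Element("table")
    for cell in cells:
        node = ET.SubElement(table, "cell", {
            "row": str(cell["row"]),
            "column": str(cell["column"]),
        })
        node.text = cell["text"]  # special characters are escaped on output
    return ET.tostring(table, encoding="unicode")
```

A JSON response for the same cells could be produced from the same dictionaries with `json.dumps`.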
[61] In an embodiment, the computer implemented method 700 further includes identifying, using a region prediction subsystem 218, regions in the uploaded electronic document 104 including a first set of predefined regions that include at least one of: financial_reports, audit_reports, cash_flow, and a balance_sheet based on a type of the uploaded electronic document 104, and identifying, using the region prediction subsystem 218, the regions in the uploaded electronic document 104 including a second set of predefined regions that comprise at least one of: invoices, pay slips, and bank statements.
[62] The present disclosure provides the system 100 and method 700 that utilize the deep machine learning model to predict different layout elements in the electronic documents 104. The present disclosure enables the deep machine learning models to learn on the generic layout elements across different document types. In an embodiment, each of the plurality of layout elements utilizes a separate deep machine learning model. As such, enhancement in any deep machine learning model with further training is completely independent of the other deep machine learning models. In an embodiment, the present disclosure utilizes the Convolutional Neural Network architecture as a base architecture with PyTorch as a base deep learning library. The present disclosure further provides the output with the JSON/XML response that includes different hierarchical elements in the electronic document 104 that preserve the document structure along with their coordinate information.
[63] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[64] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, and the like. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[65] The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid-state memory, magnetic tape, a removable computer diskette, a random-access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
[66] Input/output (I/O) devices (including but not limited to keyboards, displays, pointing devices, and the like.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
[67] A representative hardware environment for practicing the embodiments may include a hardware configuration of an information handling/computer system in accordance with the embodiments herein. The system herein comprises at least one processor or central processing unit (CPU). The CPUs are interconnected via system bus 220 to various devices such as a random-access memory (RAM), read-only memory (ROM), and an input/output (I/O) adapter. The I/O adapter can connect to peripheral devices, such as disk units and tape drives, or other program storage devices that are readable by the system. The system can read the inventive instructions on the program storage devices and follow these instructions to execute the methodology of the embodiments herein.
[68] The system further includes a user interface adapter that connects a keyboard, mouse, speaker, microphone, and/or other user interface devices such as a touch screen device (not shown) to the bus to gather user input. Additionally, a communication adapter connects the bus to a data processing network, and a display adapter connects the bus to a display device which may be embodied as an output device such as a monitor, printer, or transmitter, for example.
[69] A description of an embodiment with several components in communication with each other does not imply that all such components are required. On the contrary, a variety of optional components are described to illustrate the wide variety of possible embodiments of the invention. When a single device or article is described herein, it will be apparent that more than one device/article (whether or not they cooperate) may be used in place of a single device/article. Similarly, where more than one device or article is described herein (whether or not they cooperate), it will be apparent that a single device/article may be used in place of the more than one device or article, or a different number of devices/articles may be used instead of the shown number of devices or programs. The functionality and/or the features of a device may be alternatively embodied by one or more other devices which are not explicitly described as having such functionality/features. Thus, other embodiments of the invention need not include the device itself.
[70] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, and the like, of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments. Also, the words "comprising," "having," "containing," and "including," and other similar forms are intended to be equivalent in meaning and be open-ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[71] Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the embodiments of the present invention are intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
[72] It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited to any order by these terms. These terms are used only to distinguish one element from another; where there are “second” or higher ordinals, there merely must be that many number of elements, without necessarily any difference or other relationship. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments or methods. As used herein, the term “and/or” includes all combinations of one or more of the associated listed items. The use of “etc.” is defined as “et cetera” and indicates the inclusion of all other elements belonging to the same group of the preceding items, in any “and/or” combinations.
[73] The data in each of the components of the system and method may be ‘encrypted’ and suitably ‘decrypted’ when required.
[74] The systems described herein can be made accessible through a portal or an interface which is a part of, or may be connected to, an internal network or an external network, such as the Internet or any similar portal. The portals or interfaces are accessed by one or more users through an electronic device, whereby the user may send and receive data to the portal or interface, which gets stored in at least one memory device or at least one data storage device or at least one server and utilises at least one processing unit. The portal or interface, in combination with one or more of the memory device, data storage device, processing unit and servers, forms an embedded computing setup, and may be used by, or used in, one or more of a non-transitory, computer readable medium. In at least one embodiment, the embedded computing setup and optionally one or more of a non-transitory, computer readable medium, in relation with, and in combination with the said portal or interface forms one of the systems of the invention. Typical examples of a portal or interface may be selected from but are not limited to a website, an executable software program or a software application.
[75] The systems and methods may simultaneously involve more than one user or more than one data storage device or more than one host server or any combination thereof.
[76] A user may provide user input through any suitable input device or input mechanism such as but not limited to a keyboard, a mouse, a joystick, a touchpad, a virtual keyboard, a virtual data entry user interface, a virtual dial pad, a software or a program, a scanner, a remote device, a microphone, a webcam, a camera, a fingerprint scanner, a cave, and a pointing stick.
[77] The systems and methods can be practiced using any electronic device which may be connected to one or more other electronic devices with wires or wirelessly, and which may use technologies such as but not limited to Bluetooth, WiFi, and WiMAX. This will also extend to use of the aforesaid technologies to provide an authentication key or access key, or an electronic device based unique key, or any combination thereof.
[78] In at least one embodiment, one or more users can be blocked or denied access to one or more of the aspects of the invention.
[79] Encryption can be accomplished using any encryption technology, such as the process of converting digital information into a new form using a key or a code or a program, wherein the new form is unintelligible or indecipherable to a user or a thief or a hacker or a spammer. The term ‘encryption’ includes encoding, compressing, or any other translating of the digital content. The encryption of the digital media content can be performed in accordance with any technology including utilizing an encryption algorithm. The encryption algorithm utilized is not hardware dependent and may change depending on the digital content. For example, a different algorithm may be utilized for different websites or programs. The term ‘encryption’ further includes one or more aspects of authentication, entitlement, data integrity, access control, confidentiality, segmentation, information control, and combinations thereof.
[80] The described embodiments may be implemented as a system, method, apparatus or article of manufacture using standard programming and/or engineering techniques related to software, firmware, hardware, or any combination thereof. The described operations may be implemented as code maintained in a “non-transitory, computer readable medium”, where a processor may read and execute the code from the non-transitory, computer readable medium. A non-transitory, computer readable medium may comprise media such as magnetic storage medium (e.g., hard disk drives, floppy disks, tape, etc.), optical storage (CDROMs, DVDs, optical disks, etc.), volatile and non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs, SRAMs, Flash Memory, firmware, programmable logic, etc.), etc. The code implementing the described operations may further be implemented in hardware logic (e.g., an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc.).
[81] Still further, the code implementing the described operations may be implemented in “transmission signals”, where transmission signals may propagate through space or through a transmission media, such as an optical fibre, copper wire, etc. The transmission signals in which the code or logic is encoded may further comprise a wireless signal, satellite transmission, radio waves, infrared signals, Bluetooth, etc. The transmission signals in which the code or logic is encoded are capable of being transmitted by a transmitting station and received by a receiving station, where the code or logic encoded in the transmission signal may be decoded and stored in hardware or a non-transitory, computer readable medium at the receiving and transmitting stations or devices. An “article of manufacture” comprises non-transitory, computer readable medium or hardware logic, and/or transmission signals in which code may be implemented. A device in which the code implementing the described embodiments of operations is encoded may comprise a non-transitory, computer readable medium or hardware logic. Of course, those skilled in the art will recognize that many modifications may be made to this configuration without departing from the scope of the present invention, and that the article of manufacture may comprise suitable information bearing medium known in the art.
[82] The database may include a number of records in accordance with embodiments of the present invention, including data and/or other information, which may be parsed and stored. The database may further comprise software, which may include and/or employ one or more database management systems ("DBMS"), such as any one of Oracle, DB2, Microsoft Access, Microsoft SQL Server, Postgres, MySQL, 4th Dimension, FileMaker and Alpha Five DBMS, and the like. The DBMS may be operable to query the database, parse the information into the records, execute rules for sorting the information parsed into the records, execute rules for performing operations (e.g., mathematical, statistical, logical, etc., operations) on the information parsed into the records, and the like.
[83] In many embodiments, the database software may be operable to apply the data from records into one or more models to form one or more output records. These output records include information that may be used to facilitate the methods as disclosed herein. In addition, the database software may be operable to interface with web-server software, to allow manipulation of the database via one or more web pages available to the administrator via the network.
[84] The systems and methods can be practiced using any electronic device. An electronic device for the purpose of this invention is selected from any device capable of processing or representing data to a user and providing access to a network or any system similar to the internet, wherein the electronic device may be selected from but not limited to, personal computers, tablet computers, mobile phones, laptop computers, palmtops, portable media players, and personal digital assistants. In an embodiment, the computer readable medium data storage unit or data storage device is selected from a set of but not limited to USB flash drive (pen drive), memory card, optical data storage discs, hard disk drive, magnetic disk, magnetic tape data storage device, data server and molecular memory.
[85] The term network means a system allowing interaction between two or more electronic devices and includes any form of inter/intra enterprise environment such as the world wide web, Local Area Network (LAN), Wide Area Network (WAN), Storage Area Network (SAN) or any form of Intranet.
[86] The network may comprise any network suitable for embodiments of the present invention. For example, the network may be a partial or full deployment of most any communication/computer network or link, including any of, any multiple of, any combination of or any combination of multiples of a public or private, terrestrial wireless or satellite, and wireline networks or links. The network may include, for example, network elements from a Public Switched Telephone Network (PSTN), the Internet, core and proprietary public networks, wireless voice and packet-data networks, such as 1G, 2G, 2.5G, 3G and 4G telecommunication networks, wireless office telephone systems (WOTS) and/or wireless local area networks (WLANs), including Bluetooth and/or IEEE 802.11 WLANs, wireless personal area networks (WPANs), wireless metropolitan area networks (WMANs) and the like; virtual local area networks (VLANs) and/or communication links, such as Universal Serial Bus (USB) links; parallel port links, Firewire links, RS232 links, RS-485 links, Controller Area Network (CAN) links, and the like.
[87] In accordance with many embodiments of the present invention, each of the parties associated with the system comprises the necessary electronic devices, having platforms and databases where applicable, to execute the methods as set forth by embodiments of the present invention. Alternative system architectures are contemplated by embodiments of the present invention provided such alternative architectures are capable of executing the various methods disclosed herein.
[88] While this detailed description has disclosed certain specific embodiments for illustrative purposes, various modifications will be apparent to those skilled in the art which do not constitute departures from the spirit and scope of the invention as defined in the following claims, and it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
CLAIMS:
WE CLAIM:
1. A method (700) for reconstructing a layout-agnostic table data in electronic documents (104) using a data reconstruction system (108), the method (700) comprising:
uploading (702), using a hardware processor (224), an electronic document (104) on at least one of: a user device (102) and an external server (106), wherein the electronic document (104) is stored in at least one of a database (222) of the user device (102) and a cloud database;
applying (704), by the hardware processor (224), a pre-processed image onto a deep machine learning model;
determining (706), by the hardware processor (224), a plurality of layout elements in the uploaded electronic document (104) using an output data of the deep machine learning model;
arranging (708), by the hardware processor (224), an unstructured text in the determined plurality of layout elements in a line-wise manner;
generating (710), by the hardware processor (224), a response by rearranging the arranged unstructured text in the determined plurality of layout elements in a line-wise manner;
cropping (712), by the hardware processor (224), a table that is identified in the uploaded electronic document (104) as an image and sending the cropped image to a table cell detection subsystem to identify each cell in the table, wherein each of a plurality of cells in the table is the smallest unit formed by overlapping of row and column position;
generating (714), by the hardware processor (224), the plurality of cells comprising the rearranged text as an output with a row and column number that are assigned to each of the plurality of cells in the table; and
outputting (716), by the hardware processor (224), the plurality of cells in at least one of: JavaScript Object Notation (JSON) and Extensible Markup Language (XML) response, wherein the plurality of cells forms different hierarchal elements in the uploaded electronic document (104).
2. The method (700) as claimed in claim 1, wherein determining (706) the plurality of layout elements in the uploaded electronic document (104) comprises:
identifying, by the hardware processor (224), a type of the uploaded electronic document (104), wherein the type of the uploaded electronic document (104) comprises a format with at least one of: a portable document format (.pdf), a Joint Photographic Experts Group format (.jpg), a portable network graphics format (.png), and a tagged image file format (.tif);
converting, by the hardware processor (224), the uploaded electronic document (104) to an image with a standard format based on the type of the uploaded electronic document (104), wherein a single image is generated when a single page of the electronic document (104) is uploaded and wherein a plurality of images is generated when a plurality of pages of the electronic document (104) is uploaded; and
pre-processing, by the hardware processor (224), the image to determine the plurality of layout elements in the uploaded electronic document (104), wherein the pre-processing of the image comprises processes of deskewing, rotation correction of the image when edges of the image are identified, and dewarping.
3. The method (700) as claimed in claim 1, further comprising:
upon determining (706) the plurality of layout elements in the uploaded electronic document (104), mapping, by the hardware processor (224), the determined layout elements to a corresponding unstructured text in the pre-processed image using a bounding box coordination technique.
4. The method (700) as claimed in claim 1, wherein generating (710) the response by rearranging the arranged unstructured text in the determined plurality of layout elements comprises:
identifying, by the hardware processor (224), a plurality of cells in the table in the uploaded electronic document (104); and
upon identifying the plurality of cells in the table, merging, by the hardware processor (224), the plurality of cells that are spanned over a plurality of rows based on a reference column in the table.
5. The method (700) as claimed in claim 1, further comprising:
identifying, by the hardware processor (224), regions in the uploaded electronic document (104) comprising a first set of predefined regions that comprise at least one of: financial_reports, audit_reports, cash_flow, and a balance_sheet based on a type of the uploaded electronic document (104).
6. The method (700) as claimed in claim 1, further comprising:
identifying, by the hardware processor (224), the regions in the uploaded electronic document (104) comprising a second set of predefined regions that comprise at least one of: invoices, pay slips, and bank statements.
7. The method (700) as claimed in claim 1, further comprising:
identifying, by the hardware processor (224), a plurality of rows and columns in the table when the table is a bordered table.
8. The method (700) as claimed in claim 1, wherein the plurality of layout elements in the uploaded electronic document (104) comprises at least one of: a heading that refers to a top part of the uploaded electronic document (104) that comprises bold headings, a paragraph comprising paragraph information, a text comprising a single line text and sub headings, a table comprising outer table information with boundary detection, table cells that refer to the plurality of cells inside each table with the cropped image, a bottom part that refers to a lower part of the uploaded electronic document (104), a footer comprising a page number, and a logical cell that refers to a closely grouped block of the text.
9. The method (700) as claimed in claim 1, wherein each of the hierarchal elements in at least one of: the JSON and the XML response is assigned a unique incremental number for easy access of the different hierarchal elements.
10. The method (700) as claimed in claim 4, wherein a plurality of cells in the reference column comprises at least one of: a top alignment and a middle alignment with the plurality of cells spanned in the plurality of rows.
11. A system (100) for reconstructing a layout-agnostic table data in electronic documents (104) using a data reconstruction system (108), the system (100) comprising:
a hardware processor (224); and
a memory (202) coupled to the hardware processor (224), wherein the memory (202) comprises a set of program instructions in the form of a plurality of subsystems (110), configured to be executed by the hardware processor (224), wherein the hardware processor (224) is configured to upload an electronic document (104) in at least one of: a user device (102) and an external server (106), wherein the electronic document (104) is stored in at least one of: a database (222) of the user device (102) and a cloud database, and wherein the plurality of subsystems (110) comprises:
an image pre-processing subsystem (204) that is configured to apply a pre-processed image onto a deep machine learning model;
a layout determination subsystem (206) that is configured to determine a plurality of layout elements in the uploaded electronic document (104) using an output data of the deep machine learning model;
a text arrangement subsystem (208) that is configured to arrange an unstructured text in the determined plurality of layout elements in a line-wise manner;
a logical cell detection and reconciliation subsystem (212) that is configured to generate a response by rearranging the arranged unstructured text in the determined plurality of layout elements in a line-wise manner;
a table detection and table reconciliation subsystem (TDRM) (214) that is configured to crop a table that is identified in the uploaded electronic document (104) as an image and send the cropped image to a table cell detection subsystem (206) to identify each cell in the table, wherein each of a plurality of cells in the table is the smallest unit formed by overlapping of row and column position;
the table detection and table reconciliation subsystem (214) that is further configured to generate the plurality of cells comprising the rearranged text as an output with a row and column number that are assigned to each of the plurality of cells in the table; and
a table text blocks output subsystem (216) that is configured to output the plurality of cells in at least one of: JavaScript Object Notation (JSON) and Extensible Markup Language (XML) response, wherein the plurality of cells forms different hierarchal elements in the uploaded electronic document (104).
12. The system (100) as claimed in claim 11, wherein the layout determination subsystem (206) determines the plurality of layout elements in the uploaded electronic document (104) by:
identifying, by the hardware processor (224), a type of the uploaded electronic document (104), wherein the type of the uploaded electronic document (104) comprises a format with at least one of: a Portable Document Format (.pdf), a Joint Photographic Experts Group format (.jpg), a Portable Network Graphics format (.png), and a Tagged Image File Format (.tif);
converting, by the hardware processor (224), the uploaded electronic document (104) to an image with a standard format, wherein a single image is generated when a single page of the electronic document (104) is uploaded and wherein a plurality of images is generated when a plurality of pages of the electronic document (104) is uploaded; and
pre-processing, by the hardware processor (224), the image to determine the plurality of layout elements in the uploaded electronic document (104), wherein the pre-processing of the image comprises processes of deskewing, correcting rotation of the image when edges of the image are identified, and dewarping.
13. The system (100) as claimed in claim 11, further comprising:
a layout mapping subsystem (210) that is configured to map the determined layout elements to a corresponding unstructured text in the pre-processed image using a bounding box coordination technique upon determining (706) the plurality of layout elements in the uploaded electronic document (104).
14. The system (100) as claimed in claim 11, wherein the logical cell detection and reconciliation subsystem (212) generates the response by rearranging the arranged unstructured text in the determined plurality of layout elements by:
identifying a plurality of cells in a table in the uploaded electronic document (104); and
upon identifying the plurality of cells in the table, merging the plurality of cells that are spanned over a plurality of rows based on a reference column in the table.
15. The system (100) as claimed in claim 11, further comprising:
a region prediction subsystem (218) that is configured to identify regions in the uploaded electronic document (104) comprising a first set of predefined regions that comprise at least one of: financial_reports, audit_reports, cash_flow, and a balance_sheet based on a type of the uploaded electronic document (104).
16. The system (100) as claimed in claim 11, wherein the region prediction subsystem (218) is further configured to identify the regions in the uploaded electronic document (104) comprising a second set of predefined regions that comprise at least one of: invoices, pay slips, and bank statements.
17. The system (100) as claimed in claim 11, wherein the plurality of layout elements in the uploaded electronic document (104) comprises at least one of: a heading that refers to a top part of the uploaded electronic document (104) that comprises bold headings, a paragraph comprising paragraph information, a text comprising a single line text and sub headings, a table comprising outer table information with boundary detection, table cells that refer to the plurality of cells inside each table with the cropped image, a bottom part that refers to a lower part of the uploaded electronic document (104), a footer comprising a page number, and a logical cell that refers to a closely grouped block of the text.
18. The system (100) as claimed in claim 11, wherein each of the hierarchal elements in at least one of: the JSON and the XML response is assigned a unique incremental number for easy access of the different hierarchal elements.
19. The system (100) as claimed in claim 14, wherein the plurality of cells in the reference column comprises at least one of: a top alignment and a middle alignment with the plurality of cells spanned in the plurality of rows.
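By way of illustration only, and without limiting the claims, the merging of cells spanned over a plurality of rows based on a reference column, followed by output of the cells with row and column numbers and unique incremental numbers in a JSON response, may be sketched as follows. The sketch assumes a top-aligned reference column in which a blank reference cell marks a continuation row; the function names `merge_spanned_rows` and `to_json`, the field names `id`, `row`, `col` and `text`, and the sample data are hypothetical and do not appear in the specification. Middle alignment is not handled in this sketch.

```python
import json
from itertools import count


def merge_spanned_rows(rows, ref_col=0):
    """Merge physical rows into logical rows using a top-aligned reference column.

    Assumption (not from the specification): a row whose reference-column
    cell is blank continues the nearest preceding row whose reference
    cell is non-blank.
    """
    merged = []
    for row in rows:
        if row[ref_col].strip() or not merged:
            # A non-blank reference cell starts a new logical row.
            merged.append([cell.strip() for cell in row])
        else:
            # Blank reference cell: fold this row's text into the previous row.
            last = merged[-1]
            for i, cell in enumerate(row):
                if cell.strip():
                    last[i] = (last[i] + " " + cell.strip()).strip()
    return merged


def to_json(rows):
    """Serialize cells with 1-based row/column numbers and incremental ids."""
    ids = count(1)
    table = {"id": next(ids), "type": "table", "cells": []}
    for r, row in enumerate(rows, start=1):
        for c, text in enumerate(row, start=1):
            table["cells"].append(
                {"id": next(ids), "row": r, "col": c, "text": text}
            )
    return json.dumps(table, indent=2)


# Hypothetical invoice rows: the second physical row continues the first.
rows = [
    ["1", "Widget, large", "10.00"],
    ["", "blue finish", ""],
    ["2", "Gadget", "5.50"],
]
merged = merge_spanned_rows(rows)
print(to_json(merged))
```

In this sketch the description text split over two physical rows is joined into a single cell of the first logical row, and every element of the response carries its own incremental number so that it can be addressed directly.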
Dated this 17th day of August 2022
Vidya Bhaskar Singh Nandiyal
Patent Agent (IN/PA-2912)
Agent for applicant
| # | Name | Date |
|---|---|---|
| 1 | 202141026985-PROVISIONAL SPECIFICATION [17-06-2021(online)].pdf | 2021-06-17 |
| 2 | 202141026985-PROOF OF RIGHT [17-06-2021(online)].pdf | 2021-06-17 |
| 3 | 202141026985-FORM FOR SMALL ENTITY(FORM-28) [17-06-2021(online)].pdf | 2021-06-17 |
| 4 | 202141026985-FORM FOR SMALL ENTITY [17-06-2021(online)].pdf | 2021-06-17 |
| 5 | 202141026985-FORM 3 [17-06-2021(online)].pdf | 2021-06-17 |
| 6 | 202141026985-FORM 1 [17-06-2021(online)].pdf | 2021-06-17 |
| 7 | 202141026985-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [17-06-2021(online)].pdf | 2021-06-17 |
| 8 | 202141026985-EVIDENCE FOR REGISTRATION UNDER SSI [17-06-2021(online)].pdf | 2021-06-17 |
| 9 | 202141026985-DRAWINGS [17-06-2021(online)].pdf | 2021-06-17 |
| 10 | 202141026985-FORM-26 [30-08-2021(online)].pdf | 2021-08-30 |
| 11 | 202141026985-FORM-26 [30-08-2021(online)]-1.pdf | 2021-08-30 |
| 12 | 202141026985-FORM-26 [16-06-2022(online)].pdf | 2022-06-16 |
| 13 | 202141026985-PostDating-(17-06-2022)-(E-6-151-2022-CHE).pdf | 2022-06-17 |
| 14 | 202141026985-POA [17-06-2022(online)].pdf | 2022-06-17 |
| 15 | 202141026985-FORM 13 [17-06-2022(online)].pdf | 2022-06-17 |
| 16 | 202141026985-APPLICATIONFORPOSTDATING [17-06-2022(online)].pdf | 2022-06-17 |
| 17 | 202141026985-Proof of Right [24-06-2022(online)].pdf | 2022-06-24 |
| 18 | 202141026985-DRAWING [17-08-2022(online)].pdf | 2022-08-17 |
| 19 | 202141026985-CORRESPONDENCE-OTHERS [17-08-2022(online)].pdf | 2022-08-17 |
| 20 | 202141026985-COMPLETE SPECIFICATION [17-08-2022(online)].pdf | 2022-08-17 |