Abstract: The present disclosure provides a system and a method for automatic key extraction from documents. The system receives a document that includes an invoice associated with one or more users. The system extracts one or more attributes from the document and generates one or more images from the document based on the one or more attributes. Further, the system detects a plurality of textual information from the one or more images. The system generates one or more bounding boxes associated with the textual information. The system detects a plurality of quick response (QR) codes from the document and extracts the plurality of QR codes to generate document related information. Further, the system generates one or more key-value pairs from the document via an artificial intelligence (AI) engine based on the one or more bounding boxes and the document related information.
DESC:RESERVATION OF RIGHTS
[0001] A portion of the disclosure of this patent document contains material, which is subject to intellectual property rights such as, but not limited to, copyright, design, trademark, integrated circuit (IC) layout design, and/or trade dress protection, belonging to Jio Platforms Limited (JPL) or its affiliates (hereinafter referred to as the owner). The owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights whatsoever. All rights to such intellectual property are fully reserved by the owner.
FIELD OF INVENTION
[0002] The embodiments of the present disclosure generally relate to systems and methods for facilitating documents digitization and key information extraction. More particularly, the present disclosure relates to a system and a method for automatic key information extraction from documents.
BACKGROUND OF THE INVENTION
[0003] The following description of the related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section is used only to enhance the understanding of the reader with respect to the present disclosure, and not as an admission of prior art.
[0004] Retail includes day-to-day purchase orders from various vendors. For all the retail orders, multiple invoices are generated with the retail products. Manual efforts include recording a purchase order and generating a list of articles from the purchase order. Further, amount validation from the purchase order adds to the time required for processing invoices. Additionally, manual efforts require excess time and generate high costs associated with the processing of the purchase orders.
[0005] Conventional solutions include multiple natural language processing (NLP)-based processing pipelines aimed at digitizing invoices. Most of the pipelines rely on open-source optical character recognition (OCR) engines to extract the text and use word relationships and a rule-based engine to extract values from the invoices. However, relevant information extraction from the invoices remains a challenge.
[0006] There is, therefore, a need in the art to provide a system and a method that can mitigate the problems associated with the prior art.
OBJECTS OF THE INVENTION
[0007] Some of the objects of the present disclosure, which at least one embodiment herein satisfies are listed herein below.
[0008] It is an object of the present disclosure to provide a system and a method that facilitates key information extraction from scanned images of retail invoices.
[0009] It is an object of the present disclosure to provide a system and a method that uses a neural network-based method for key information extraction from scanned images of retail invoices.
[0010] It is an object of the present disclosure to provide a system and a method that utilizes synthetic invoice generation to generate synthetic documents from scanned images.
SUMMARY
[0011] This section is provided to introduce certain objects and aspects of the present disclosure in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
[0012] In an aspect, the present disclosure relates to a system for generating information from documents. The system includes a processor and a memory operatively coupled with the processor, where said memory stores instructions which, when executed by the processor, cause the processor to receive a document via one or more computing devices. The document includes an invoice associated with one or more users. The processor extracts one or more attributes from the document. The processor generates one or more images from the document based on the one or more attributes. The processor detects textual information from the one or more images. The processor generates one or more bounding boxes associated with the textual information. The processor detects a plurality of quick response (QR) codes from the document and extracts the plurality of QR codes to generate document related information. The processor generates via an artificial intelligence (AI) engine, one or more key-value pairs from the document, based on the one or more bounding boxes and the document related information.
[0013] In an embodiment, the processor may extract one or more tabular regions from the one or more images to generate at least one of a cell, a row, a column, and a header associated with the one or more tabular regions.
[0014] In an embodiment, the one or more attributes may include at least one of an invoice number, an invoice date, an invoice amount, a vendor goods and services tax identification number (GSTIN), a buyer GSTIN, and one or more order details.
[0015] In an embodiment, the one or more order details may include at least one of a quantity, a maximum retail price (MRP), a description, and one or more codes.
[0016] In an embodiment, the processor may determine if at least one QR code among the plurality of QR codes exists in the document and, in response to a positive determination, prioritize the at least one QR code among the plurality of QR codes for generating the one or more key-value pairs.
[0017] In an embodiment, in response to a negative determination, the processor may use the textual information from the one or more images for generating the one or more key-value pairs.
[0018] In an embodiment, the processor may determine a location associated with the textual information and the plurality of QR codes to generate the one or more key-value pairs associated with the one or more bounding boxes, and the document related information.
[0019] In an aspect, the present disclosure relates to a method for generating information from documents. The method includes receiving, by a processor associated with a system, a document via one or more computing devices. The document includes an invoice associated with one or more users. The method includes extracting, by the processor, one or more attributes from the document. The method includes generating, by the processor, one or more images from the document based on the one or more attributes. The method includes detecting, by the processor, textual information from the one or more images. The method includes generating, by the processor, one or more bounding boxes associated with the textual information. The method includes detecting, by the processor, a plurality of QR codes from the document and extracting the plurality of QR codes to generate document related information. The method includes generating, by the processor, via an AI engine, one or more key-value pairs from the document, based on the one or more bounding boxes and the document related information.
[0020] In an embodiment, the method may include extracting, by the processor, one or more tabular regions from the one or more images to generate at least one of a cell, a row, a column, and a header associated with the one or more tabular regions.
[0021] In an embodiment, the one or more attributes may include at least one of an invoice number, an invoice date, an invoice amount, a vendor GSTIN, a buyer GSTIN, and one or more order details.
[0022] In an embodiment, the one or more order details may include at least one of a quantity, an MRP, a description, and one or more codes.
[0023] In an embodiment, the method may include determining, by the processor, if at least one QR code among the plurality of QR codes exists in the document and, in response to a positive determination, the method may include prioritizing, by the processor, the at least one QR code among the plurality of QR codes for generating the one or more key-value pairs.
[0024] In an embodiment, the method may include using, by the processor, in response to a negative determination, the textual information from the one or more images for generating the one or more key-value pairs.
[0025] In an embodiment, the method may include determining, by the processor, a location associated with the textual information and the plurality of QR codes to generate the one or more key-value pairs associated with the one or more bounding boxes and the document related information.
BRIEF DESCRIPTION OF DRAWINGS
[0026] The accompanying drawings, which are incorporated herein, and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes the disclosure of electrical components, electronic components, or circuitry commonly used to implement such components.
[0027] FIG. 1 illustrates an exemplary network architecture (100) of a proposed system (108), in accordance with an embodiment of the present disclosure.
[0028] FIG. 2 illustrates an exemplary representation (200) of a proposed system (108), in accordance with an embodiment of the present disclosure.
[0029] FIG. 3 illustrates an exemplary representation of an application flow (300) of invoices during the submission by vendors, in accordance with an embodiment of the present disclosure.
[0030] FIG. 4 illustrates an exemplary representation of a flow diagram across multiple vendors during invoice submission (400), in accordance with an embodiment of the present disclosure.
[0031] FIG. 5 illustrates an exemplary block diagram of an invoice processing backend (500), in accordance with an embodiment of the present disclosure.
[0032] FIG. 6 illustrates an exemplary computer system (600) in which or with which the proposed system (108) may be implemented, in accordance with an embodiment of the present disclosure.
[0033] The foregoing shall be more apparent from the following more detailed description of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0034] In the following description, for explanation, various specific details are outlined in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein.
[0035] The ensuing description provides exemplary embodiments only and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth.
[0036] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail to avoid obscuring the embodiments.
[0037] Also, it is noted that individual embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[0038] The word "exemplary" and/or "demonstrative" is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as "exemplary" and/or "demonstrative" is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms "includes," "has," "contains," and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive like the term "comprising" as an open transition word without precluding any additional or other elements.
[0039] Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[0040] The terminology used herein is to describe particular embodiments only and is not intended to limit the disclosure. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes any combination of one or more of the associated listed items.
[0041] The disclosed system and method relate to extracting key information from invoices such as, but not limited to, retail invoices, using the text, position, and appearance of the invoices. In an embodiment, this information is also used to generate digital twins for a scanned invoice and further generate different invoice layouts for improving the model accuracies. The system locates relevant fields and tabular information of the articles in the invoices using state-of-the-art text detection and recognition models. Further, the system generates key-value pairs using deep learning models.
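By way of a non-limiting illustration, the layout-variation idea described above may be sketched as follows. The field names, the normalized page coordinates, and the jitter strategy are assumptions made for the sketch and do not represent the disclosed implementation.

```python
import random

# Illustrative base layout: field name -> (x, y) position on a normalized page.
# Field names follow the attributes listed in the disclosure; coordinates are assumed.
BASE_LAYOUT = {
    "invoice_number": (0.70, 0.05),
    "invoice_date":   (0.70, 0.10),
    "vendor_gstin":   (0.05, 0.15),
    "buyer_gstin":    (0.05, 0.20),
    "invoice_amount": (0.70, 0.90),
}

def synthesize_layout(rng: random.Random, jitter: float = 0.03) -> dict:
    """Produce one synthetic layout variant by jittering each field position,
    emulating the layout diversity used to improve model accuracy."""
    return {
        field: (round(min(max(x + rng.uniform(-jitter, jitter), 0.0), 1.0), 4),
                round(min(max(y + rng.uniform(-jitter, jitter), 0.0), 1.0), 4))
        for field, (x, y) in BASE_LAYOUT.items()
    }

rng = random.Random(42)  # fixed seed so the synthetic set is reproducible
variants = [synthesize_layout(rng) for _ in range(3)]
```

In practice the generated layouts would be rendered into images and used as additional training data; the clamping keeps every jittered position on the page.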
[0042] The various embodiments throughout the disclosure will be explained in more detail with reference to FIGs. 1-6.
[0043] FIG. 1 illustrates an exemplary network architecture (100) of a proposed system (108), in accordance with an embodiment of the present disclosure. As illustrated in FIG. 1, one or more computing devices (104-1, 104-2…104-N), herein referred to as computing devices (104), may be connected to the system (108) through a network (106). The computing devices (104) may also be known as user equipment (UE) throughout the disclosure. One or more users (102) may provide a document to the system (108) via the computing devices (104).
[0044] In an embodiment, the computing devices (104) may include, but not be limited to, a mobile, a laptop, etc. Further, the computing devices (104) may include a smartphone, virtual reality (VR) devices, augmented reality (AR) devices, a general-purpose computer, desktop, personal digital assistant, tablet computer, and a mainframe computer. Additionally, input devices for receiving input from the one or more entities such as a touch pad, touch-enabled screen, electronic pen, and the like may be used. A person of ordinary skill in the art will appreciate that the computing devices (104) may not be restricted to the mentioned devices and various other devices may be used.
[0045] In an embodiment, the network (106) may include, by way of example but not limitation, at least a portion of one or more networks having one or more nodes that transmit, receive, forward, generate, buffer, store, route, switch, process, or a combination thereof, etc. one or more messages, packets, signals, waves, voltage or current levels, some combination thereof, or so forth. The network (106) may also include, by way of example but not limitation, one or more of a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a Public-Switched Telephone Network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, or some combination thereof.
[0046] In an embodiment, the system (108) may receive a document via one or more computing devices (104). The document may include an invoice associated with one or more users (102). The system (108) may extract one or more attributes from the document. The system (108) may extract one or more tabular regions from the one or more images to generate at least one of, but not limited to, a cell, a row, a column, and a header associated with the one or more tabular regions.
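By way of a non-limiting illustration, the decomposition of a tabular region into cells, rows, columns, and a header may be sketched as a simple coordinate clustering over detected cell bounding boxes. The pixel tolerance and the header-is-first-row convention are assumptions made for the sketch.

```python
from collections import defaultdict

def group_cells_into_rows(cells, y_tolerance=10):
    """Group detected cell bounding boxes (x, y, w, h) into table rows by
    clustering on the y coordinate; cells whose tops lie within y_tolerance
    pixels of each other are treated as one row. The first row is taken as
    the header, mirroring the cell/row/column/header decomposition."""
    rows = defaultdict(list)
    for cell in sorted(cells, key=lambda c: (c[1], c[0])):
        # Reuse an existing row key within tolerance, else start a new row.
        key = next((k for k in rows if abs(k - cell[1]) <= y_tolerance), cell[1])
        rows[key].append(cell)
    # Order rows top-to-bottom and cells left-to-right within each row.
    ordered = [sorted(r, key=lambda c: c[0]) for _, r in sorted(rows.items())]
    header, body = ordered[0], ordered[1:]
    return header, body
```

Column assignment would follow the same pattern on the x coordinate; a production system would derive the boxes from a table-detection model rather than receive them directly.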
[0047] In an embodiment, the system (108) may generate one or more images from the document based on the one or more attributes. The one or more attributes may include, but not be limited to, an invoice number, an invoice date, an invoice amount, a vendor goods and services tax identification number (GSTIN), a buyer GSTIN, and one or more order details. Further, the one or more order details may include, but not be limited to, a quantity, a maximum retail price (MRP), a description, and one or more codes.
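By way of a non-limiting illustration, the attribute fields listed above may be pulled out of raw OCR text with simple patterns. The regular expressions and label wording are assumptions made for the sketch; only the 15-character GSTIN structure (two-digit state code, ten-character PAN, entity code, the letter Z, and a check character) follows the standard format.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Standard 15-character GSTIN structure.
GSTIN_RE = re.compile(r"\b\d{2}[A-Z]{5}\d{4}[A-Z][A-Z\d]Z[A-Z\d]\b")
# Label wording below ("Invoice No", dd/mm/yyyy) is an assumption for the sketch.
INVOICE_NO_RE = re.compile(r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{2}[/-]\d{2}[/-]\d{4})\b")

@dataclass
class InvoiceAttributes:
    invoice_number: Optional[str] = None
    invoice_date: Optional[str] = None
    gstins: List[str] = field(default_factory=list)

def extract_attributes(ocr_text: str) -> InvoiceAttributes:
    """Project raw OCR text onto the attribute fields named in the disclosure."""
    attrs = InvoiceAttributes()
    if m := INVOICE_NO_RE.search(ocr_text):
        attrs.invoice_number = m.group(1)
    if m := DATE_RE.search(ocr_text):
        attrs.invoice_date = m.group(1)
    attrs.gstins = GSTIN_RE.findall(ocr_text)
    return attrs
```

A deployed system would combine such patterns with the positional and learned signals described elsewhere in the disclosure rather than rely on text alone.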
[0048] In an embodiment, the system (108) may detect textual information from the one or more images. The system (108) may generate one or more bounding boxes associated with the textual information. The system (108) may detect a plurality of quick response (QR) codes from the document and extract the plurality of QR codes to generate document related information. The system (108) may determine if at least one QR code among the plurality of QR codes exists in the document and, in response to a positive determination, prioritize the at least one QR code among the plurality of QR codes for generating the one or more key-value pairs. In response to a negative determination, the system (108) may use the textual information from the one or more images for generating the one or more key-value pairs.
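By way of a non-limiting illustration, the QR-first determination described above may be sketched as follows. The semicolon-delimited "key=value" payload format and the function names are assumptions made for the sketch; real e-invoice QR payloads (for example, signed JSON) would need their own parser.

```python
def parse_qr_payload(payload: str) -> dict:
    """Parse one decoded QR payload into fields. The 'key=value;...' format
    here is hypothetical, chosen only to keep the sketch self-contained."""
    fields = {}
    for part in payload.split(";"):
        if "=" in part:
            key, _, value = part.partition("=")
            fields[key.strip()] = value.strip()
    return fields

def build_document_information(qr_payloads, ocr_tokens):
    """If at least one QR code exists in the document, prioritize its content
    for key-value generation; otherwise fall back to the detected text."""
    if qr_payloads:  # positive determination: QR codes take priority
        merged = {}
        for payload in qr_payloads:
            merged.update(parse_qr_payload(payload))
        return {"source": "qr", "fields": merged}
    return {"source": "ocr", "tokens": list(ocr_tokens)}  # negative determination
```

The actual QR decoding step would be handled by a detector library upstream; this sketch only captures the prioritization logic.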
[0049] In an embodiment, the system (108) may generate, via an artificial intelligence (AI) engine (110), one or more key-value pairs from the document based on the one or more bounding boxes and the document related information. The system (108) may determine a location associated with the textual information and the plurality of QR codes to generate the one or more key-value pairs associated with the one or more bounding boxes and the document related information.
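By way of a non-limiting illustration, pairing keys with values from bounding-box locations may be sketched as a nearest-neighbor search to the right of each key on the same text line. The distance thresholds and the simplified (text, x, y) box representation are assumptions made for the sketch; the disclosure's AI engine would learn such associations rather than hard-code them.

```python
def pair_keys_with_values(boxes, known_keys, max_dx=200, y_tolerance=8):
    """Pair each recognized key box with the nearest text box to its right on
    the same line, using box locations as described. `boxes` is a list of
    (text, x, y) tuples; thresholds are illustrative pixel values."""
    pairs = {}
    for text, x, y in boxes:
        key = text.rstrip(":").lower()
        if key not in known_keys:
            continue
        # Candidate values: boxes to the right, roughly on the same line.
        candidates = [
            (vx - x, vtext)
            for vtext, vx, vy in boxes
            if vx > x and abs(vy - y) <= y_tolerance and (vx - x) <= max_dx
        ]
        if candidates:
            pairs[key] = min(candidates)[1]  # closest box wins
    return pairs
```

Below-the-key layouts would be handled symmetrically on the y axis; the learned model replaces these heuristics where layouts vary.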
[0050] Although FIG. 1 shows exemplary components of the network architecture (100), in other embodiments, the network architecture (100) may include fewer components, different components, differently arranged components, or additional functional components than depicted in FIG. 1. Additionally, or alternatively, one or more components of the network architecture (100) may perform functions described as being performed by one or more other components of the network architecture (100).
[0051] FIG. 2 illustrates an exemplary representation (200) of a proposed system (108), in accordance with an embodiment of the present disclosure. A person of ordinary skill in the art will understand that the system (108) of FIG. 2 may be similar to the system (108) of FIG. 1 in its functionality.
[0052] Referring to FIG. 2, the system (108) may comprise one or more processor(s) (202). The one or more processor(s) (202) may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that process data based on operational instructions. Among other capabilities, the one or more processor(s) (202) may be configured to fetch and execute computer-readable instructions stored in a memory (204) of the system (108). The memory (204) may be configured to store one or more computer-readable instructions or routines in a non-transitory computer readable storage medium, which may be fetched and executed to create or share data packets over a network service. The memory (204) may comprise any non-transitory storage device including, for example, volatile memory such as random-access memory (RAM), or non-volatile memory such as erasable programmable read only memory (EPROM), flash memory, and the like.
[0053] In an embodiment, the system (108) may include an interface(s) (206). The interface(s) (206) may comprise a variety of interfaces, for example, interfaces for data input and output (I/O) devices, storage devices, and the like. The interface(s) (206) may also provide a communication pathway for one or more components of the system (108). Examples of such components include, but are not limited to, processing engine(s) (208) and a database (210). Further, the processing engine (208) may include a parameter acquisition engine (212), an extraction engine (214), an AI engine (216), and other engine(s) (218). In an embodiment, the other engine(s) (218) may include, but not be limited to, a data management engine, an input/output engine, and a notification engine.
[0054] In an embodiment, the processing engine(s) (208) may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s) (208). In examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processing engine(s) (208) may be processor-executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processing engine(s) (208) may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the machine-readable storage medium may store instructions that, when executed by the processing resource, implement the processing engine(s) (208). In such examples, the system (108) may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate but accessible to the system (108) and the processing resource. In other examples, the processing engine(s) (208) may be implemented by electronic circuitry.
[0055] In an embodiment, the processor (202) may receive a document via the parameter acquisition engine (212). The processor (202) may store the document in the database (210). The document may be received from one or more computing devices (104). The document may include an invoice associated with one or more users (102). The processor (202) may extract one or more attributes from the document. The processor (202) may extract one or more tabular regions from the one or more images via the extraction engine (214) to generate at least one of, but not limited to, a cell, a row, a column, and a header associated with the one or more tabular regions.
[0056] In an embodiment, the processor (202) may generate one or more images from the document based on the one or more attributes. The one or more attributes may include, but not be limited to, an invoice number, an invoice date, an invoice amount, a vendor GSTIN, a buyer GSTIN, and one or more order details. Further, the one or more order details may include, but not be limited to, a quantity, an MRP, a description, and one or more codes.
[0057] In an embodiment, the processor (202) may detect textual information from the one or more images. The processor (202) may generate one or more bounding boxes associated with the textual information. The processor (202) may detect a plurality of quick response (QR) codes from the document and extract the plurality of QR codes to generate document related information. The processor (202) may determine if at least one QR code among the plurality of QR codes exists in the document and, in response to a positive determination, prioritize the at least one QR code among the plurality of QR codes for generating the one or more key-value pairs. In response to a negative determination, the processor (202) may use the textual information from the one or more images for generating the one or more key-value pairs.
[0058] In an embodiment, the processor (202) may generate, via an AI engine (216), one or more key-value pairs from the document based on the one or more bounding boxes and the document related information. The processor (202) may determine a location associated with the textual information and the plurality of QR codes to generate the one or more key-value pairs associated with the one or more bounding boxes and the document related information.
[0059] FIG. 3 illustrates an exemplary representation of an application flow (300) of invoices during the submission by vendors, in accordance with an embodiment of the present disclosure.
[0060] As illustrated in FIG. 3, vendors (302) may submit the invoices to a system such as the system (108) of FIGs. 1-2, using computing devices such as the computing devices (104) of FIG. 1. The vendors (302) may submit the invoices with or without an advance shipment notice (ASN). A person of ordinary skill in the art will understand that the vendors (302) may be similar to the users (102) of FIG. 1.
[0061] In an embodiment, the system (108) may include a supplier relationship management (SRM) portal (304) and a service access point (SAP) module (306). Further, in an embodiment, the system (108) may be associated with or accessible by an AI engine (308) and a single sign-on (SSO) team (310). In an embodiment, the invoices uploaded on the SRM portal (304) may be accessed by the AI engine (308) for processing. Further, invoices with certain exceptions may be sent to the SSO team (310) for analysis. In an embodiment, the AI engine (308) of FIG. 3 may be similar to the AI engine (216) of FIG. 2 in its functionality.
[0062] In an embodiment, the SRM portal (304) may generate an invoice number (IN) and a purchase order (PO). Further, the SAP module (306) may link the document/invoice based on the IN and PO from the SRM portal (304).
[0063] In an embodiment, the AI engine (308) may process the invoices submitted to the SRM portal (304) by the vendors (302) and generate invoice values. Further, the SSO team (310) may process the invoices under certain exceptional circumstances.
[0064] FIG. 4 illustrates an exemplary representation of a business flow diagram across multiple vendors during invoice submission (400), in accordance with an embodiment of the present disclosure.
[0065] As shown in the business flow diagram, the following steps may be observed.
[0066] At Step 418: The vendors/users may hand over the goods/invoice from a vendor truck (402) to a warehouse staff (404).
[0067] At Step 420: The warehouse staff (404) may create an invoice processing task and send the invoice processing task to an invoice processing application (408).
[0068] At Step 422: The invoice processing application (408) may create a task corresponding to the received invoice processing task.
[0069] At Step 424: The invoice processing backend (410) may update/upload task details at a task database (414).
[0070] At Step 426: Additionally, the invoice processing backend (410) may send a task identifier (ID) to the invoice processing application (408).
[0071] At Step 428: The warehouse staff (404) may scan/upload the invoice to the invoice processing application (408). In an exemplary embodiment, the invoice may be in a portable document format (PDF).
[0072] At Step 430: The invoice processing application (408) may upload a file associated with the uploaded invoice at the invoice processing backend (410).
[0073] At Step 432: The invoice processing backend (410) may update/upload task status corresponding to the task details at the task database (414).
[0074] At Step 434: The invoice processing backend (410) may also upload the file associated with the invoice to a managed object storage module (416).
[0075] At Step 436: Further, the warehouse staff (404) may send a request for schedule processing to the invoice processing application (408).
[0076] At Step 438: The invoice processing application (408), on receiving the request from the warehouse staff (404), may schedule the processing at the invoice processing backend (410). Based on such scheduling, the invoice processing backend (410) may update the task status at the task database (414).
[0077] At Step 440: The invoice processing backend (410) may also send an invoice task ID for processing to an invoice content extraction service (412).
[0078] At Step 442: The invoice content extraction service (412) may download the file associated with the invoice from the task database (414).
[0079] At Step 444: The invoice content extraction service (412) may perform extraction on the file and update the extracted content at the invoice processing backend (410).
[0080] At Step 446: The invoice processing backend (410) may update the task details based on the extracted content at the task database (414).
[0081] At Step 448: Further, the invoice content extraction service (412) may update the task status to complete at the invoice processing backend (410).
[0082] At Step 450: In an embodiment, the invoice processing application (408) may fetch data for completed tasks from the invoice processing backend (410). In an embodiment, the invoice processing application (408) may fetch such details on a periodic basis.
[0083] At Step 452: The invoice processing backend (410) may request task details from the task database (414).
[0084] At Step 454: A quality control (QC) staff (406) may verify content for completed tasks through the invoice processing application (408).
[0085] Although FIG. 4 shows exemplary steps, in other embodiments, the flow diagram may include fewer steps, different steps, differently arranged steps, or additional functional steps than depicted in FIG. 4.
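Merely by way of illustration, and not as part of the disclosure, the task lifecycle traversed in steps 428-454 above may be sketched as a minimal state machine. The status names, the `Task` record, and the transition table below are assumptions of this sketch, not elements recited in the specification.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskStatus(Enum):
    """Illustrative task states mirroring steps 428-454 of FIG. 4."""
    UPLOADED = "uploaded"      # step 430: file uploaded to the backend
    SCHEDULED = "scheduled"    # step 438: processing scheduled
    PROCESSING = "processing"  # step 440: task ID sent for extraction
    COMPLETE = "complete"      # step 448: extraction finished


# Allowed status transitions; any other transition is rejected.
_TRANSITIONS = {
    TaskStatus.UPLOADED: {TaskStatus.SCHEDULED},
    TaskStatus.SCHEDULED: {TaskStatus.PROCESSING},
    TaskStatus.PROCESSING: {TaskStatus.COMPLETE},
    TaskStatus.COMPLETE: set(),
}


@dataclass
class Task:
    """Hypothetical task record as kept in the task database (414)."""
    task_id: str
    status: TaskStatus = TaskStatus.UPLOADED
    extracted: dict = field(default_factory=dict)

    def advance(self, new_status: TaskStatus) -> None:
        if new_status not in _TRANSITIONS[self.status]:
            raise ValueError(
                f"illegal transition {self.status} -> {new_status}")
        self.status = new_status
```

Such a guard ensures, for example, that a task cannot be marked complete (step 448) before it has been scheduled (step 438).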
[0086] FIG. 5 illustrates an exemplary block diagram of an invoice processing backend (500) flow, in accordance with an embodiment of the present disclosure. A person of ordinary skill in the art will understand that the invoice processing backend (500) may be similar to the invoice processing backend (410) of FIG. 4 in its functionality.
[0087] As shown in FIG. 5, a validator (502) and an invoice source (504) may access an invoice validation application (506). The invoice validation application (506) may upload invoices in sets and further use a document upload application programming interface (API) (508) to store the invoices at a storage (514). In an embodiment, the storage (514) may be, but not limited to, a binary large object (BLOB) storage. A person of ordinary skill in the art will understand that BLOB storage is a type of cloud storage for unstructured data.
[0088] Referring to FIG. 5, the invoice validation application (506) may access a document interface trigger API (510) once the invoices are uploaded. The document interface trigger API (510) may further access a bulk invoice processing workflow (512) to categorize the invoices into an invoice number, an invoice date, an invoice amount, a vendor goods and services tax identification number (GSTIN), a buyer GSTIN, a purchase order (PO) number, and the like, and store this information in the storage (514). The invoice validation application (506) may also access this information from the storage (514) for further processing. In an embodiment, the storage (514) may include the updated invoice number, invoice date, vendor GSTIN, buyer GSTIN, and PO number. Further, a cache (516) may also be accessed for job state management. Further, the bulk invoice processing workflow (512) may send the processed invoice number, invoice date, invoice amount, vendor GSTIN, buyer GSTIN, and purchase order (PO) number (518) to a second storage (520). The document interface trigger API (510) may further access the second storage (520) for processing the invoice.
[0089] In an embodiment, the invoice processing backend (500) may include the following stages.
[0090] Pre-process Stage: Documents associated with a job/task ID may be fetched from the storage (514) and assigned a universal unique identifier (UUID) for tracking. Each document may then be split into pages in image format and stored in the storage (514) along with its metadata information, with the pages distributed uniformly across all instances for parallel processing.
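The pre-process stage above may be sketched, purely for illustration, as follows. The function below assigns a tracking UUID and partitions pages round-robin across instances; the real stage additionally rasterizes each page to an image and writes metadata to the storage (514), which this hypothetical sketch omits.

```python
import uuid


def preprocess(document_pages: list, n_instances: int):
    """Illustrative pre-process stage: assign a tracking UUID to the
    document and split its pages uniformly (round-robin) across the
    processing instances.
    """
    doc_uuid = str(uuid.uuid4())
    shards = [[] for _ in range(n_instances)]
    for i, page in enumerate(document_pages):
        shards[i % n_instances].append(page)
    metadata = {"uuid": doc_uuid, "n_pages": len(document_pages)}
    return metadata, shards
```

Round-robin assignment guarantees the shard sizes differ by at most one page, so no instance is disproportionately loaded.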
[0091] Detection Stage: For each job/task ID, all instances (nodes) may run in parallel. Each instance may use the job ID and an instance number from the pre-process stage. Further, the detection stage may pick metadata information of all the pages associated with that instance. Additionally, each page may be fetched from the storage (514) as data and passed onto a detection API to receive bounding box information as result. The results along with the associated cropped images may be stored in the storage (514) as page-wise directories.
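One illustrative shape for the per-instance detection loop described above is given below. Here `detect_fn` stands in for the detection API and `store` for the storage (514); both are hypothetical stand-ins introduced only for this sketch, and the page-wise key layout is likewise an assumption.

```python
def run_detection_instance(job_id, instance_pages, detect_fn, store):
    """Illustrative detection stage for one instance: for each page
    assigned to this instance, call the detection API stand-in and
    record the resulting bounding boxes page-wise.

    instance_pages: iterable of (page_number, page_image) pairs.
    detect_fn: callable returning bounding boxes, e.g. [(x, y, w, h), ...].
    store: mapping standing in for the storage (514).
    """
    for page_no, image in instance_pages:
        boxes = detect_fn(image)
        store[f"{job_id}/pages/{page_no}/boxes"] = boxes
    return store
```

Because each instance only reads the pages in its own shard and writes to page-wise keys, the instances can run in parallel without coordination.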
[0092] Detection Aggregator Stage: Detection box paths associated with the job/task ID may be fetched from the storage (514) and split uniformly across all instances.
[0093] QR Code Detection Stage: For each job/task ID, the QR code detection stage may pick metadata information of all the pages associated with that instance. Further, each page may be fetched from the storage (514) as data and may be passed onto a QR code detection API to receive QR code bounding box information as result. The results along with the associated cropped barcode images may be stored in the storage (514) as page-wise directories.
[0094] QR Code Detection Aggregator Stage: QR code detection box paths associated with the job/task ID may be fetched from the storage (514) and split uniformly across all instances.
[0095] QR Code Recognition Stage: For each job/task ID, all instances (nodes) may run in parallel. Each instance may use the job ID and instance number from the QR code detection aggregator stage. Further, the QR code recognition stage may pick metadata information of all the cropped QR code images associated with that instance and may fetch the cropped QR code image from the storage (514) as data. The QR code recognition stage may pass this information to the QR code recognition API to receive recognition information as result. The recognition results may be stored in the storage (514) as page-wise directories.
[0096] Recognition Stage: For each job/task ID, the recognition stage may pick metadata information of all the cropped images associated with that instance and split them into batches. Then, each batch may run through a method to fetch respective cropped images from the storage (514) as data and pass the information to the recognition API to receive recognition information as result. The recognition results may be stored in the storage (514) as page-wise directories.
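The batch-splitting step of the recognition stage above admits a very small sketch; the helper below is illustrative only and makes no assumption about the recognition API itself.

```python
def make_batches(items, batch_size):
    """Split the cropped-image metadata assigned to one instance into
    fixed-size batches before the recognition calls (illustrative)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

The final batch may be shorter than `batch_size`, which matches the usual behaviour expected when the item count is not an exact multiple of the batch size.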
[0097] Recognition Aggregator Stage: Recognition result paths associated with the job/task ID may be fetched from the storage (514) and split uniformly across instances.
[0098] Key-Value Extractor Stage: For each job/task ID, the key-value extractor stage may pick metadata information of all the recognized images associated with that instance and combine them into page OCR data. Then, each page of OCR data may be passed onto the IV pipeline to obtain key-value extracted information as result. The IV pipeline may include OCR data and invoice messages to be passed onto the key-value extraction neural network model. Further, the results may be stored in the storage (514) as page-wise directories.
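The combination of per-crop recognition results into page OCR data, as described above, may be sketched as follows. The record layout (`page`, `box`, `text`) and the reading-order sort are assumptions of this illustration, not details recited in the disclosure.

```python
def combine_page_ocr(recognized_crops):
    """Illustrative sketch: group per-crop recognition results into
    page-wise OCR data suitable as input to a key-value extraction
    model. Each crop is a dict with hypothetical keys
    "page", "box" (x, y, w, h), and "text".
    """
    pages = {}
    for crop in recognized_crops:
        pages.setdefault(crop["page"], []).append(
            {"box": crop["box"], "text": crop["text"]}
        )
    # Impose a stable reading order: top-to-bottom, then left-to-right.
    for words in pages.values():
        words.sort(key=lambda w: (w["box"][1], w["box"][0]))
    return pages
```

A deterministic reading order matters here because key-value models typically consume the OCR tokens as a sequence.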
[0099] Attribute Post Process Stage: Page-wise attribute extracted results associated with a job/task ID may be fetched from a second storage (520) and merged into document-wise results after post-processing. Further, the document-wise results may be stored in the storage (514) (or, first storage) as results of the job/task ID for the batch to complete processing the attribute extraction sub-graph. In an embodiment, the second storage (520) may be similar to the storage (514). In an embodiment, the second storage (520) may be, but not limited to, a BLOB storage.
[00100] QR Code Post Process Stage: Page-wise QR code extracted results associated with the job/task ID may be fetched from the second storage (520) and merged into document-wise results after post-processing. Further, the page-wise QR code extracted results may be stored in the storage (514) as results of that job ID for the batch to complete processing the QR code extraction sub-graph.
[00101] Post Process Stage: Document-wise attribute and QR code extracted results associated with the job ID may be fetched from the storage (514) and merged into document-wise results. Further, the document-wise attribute and QR code extracted results may be stored in the second storage (520) as results of that job ID for the batch to complete processing of the batch workflow.
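Purely as an illustration of the merge performed in the post-process stage above, the sketch below combines document-wise attribute results and QR code results into one record per document. Letting QR-derived values override attribute-derived values for the same key mirrors the prioritization of QR codes described in the claims; the dict-based result layout is an assumption of this sketch.

```python
def merge_document_results(attribute_results, qr_results):
    """Illustrative post-process merge: combine document-wise attribute
    and QR code extraction results into a single record per document,
    with QR-derived values taking precedence on key collisions.
    """
    merged = {}
    for doc_id in set(attribute_results) | set(qr_results):
        record = dict(attribute_results.get(doc_id, {}))
        record.update(qr_results.get(doc_id, {}))  # QR values win
        merged[doc_id] = record
    return merged
```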
[00102] FIG. 6 illustrates an exemplary computer system (600) in which or with which the proposed system (110) may be implemented, in accordance with an embodiment of the present disclosure.
[00103] As shown in FIG. 6, the computer system (600) may include an external storage device (610), a bus (620), a main memory (630), a read-only memory (640), a mass storage device (650), a communication port(s) (660), and a processor (670). A person skilled in the art will appreciate that the computer system (600) may include more than one processor and communication ports. The communication port(s) (660) may be chosen depending on a network, such as a Local Area Network (LAN), Wide Area Network (WAN), or any network to which the computer system (600) connects. The main memory (630) may be Random Access Memory (RAM), or any other dynamic storage device commonly known in the art. The read-only memory (640) may be any static storage device(s) e.g., but not limited to, a Programmable Read Only Memory (PROM) chip for storing static information e.g., start-up or basic input/output system (BIOS) instructions for the processor (670). The mass storage device (650) may be any current or future mass storage solution, which can be used to store information and/or instructions.
[00104] The bus (620) may communicatively couple the processor (670) with the other memory, storage, and communication blocks. Optionally, operator and administrative interfaces, e.g., a display, keyboard, and cursor control device may also be coupled to the bus (620) to support direct operator interaction with the computer system (600). Other operator and administrative interfaces can be provided through network connections connected through the communication port(s) (660). In no way should the aforementioned exemplary computer system (600) limit the scope of the present disclosure.
[00105] While considerable emphasis has been placed herein on the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the disclosure. These and other changes in the preferred embodiments of the disclosure will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be implemented merely as illustrative of the disclosure and not as a limitation.
ADVANTAGES OF THE INVENTION
[00106] The present disclosure provides a system and a method that facilitates key information extraction from scanned images of retail invoices.
[00107] The present disclosure provides a system and a method that uses a neural network-based method for information extraction from scanned images of retail invoices.
[00108] The present disclosure provides a system and a method that utilizes synthetic invoice generation to generate labelled synthetic documents from scanned images.
[00109] The present disclosure provides a system and a method that operates at a lower cost when compared to the continuous human labour needed to manually enter information for each attribute in the invoice.
[00110] The present disclosure provides a system and a method that is easily operable, as the setup only needs the system to scan the documents and upload them to the portal.
[00111] The present disclosure provides a system and a method that is highly scalable, as the process is completely automated and eliminates the need for skilled professionals and manual labour.
[00112] The present disclosure provides a system and a method that generates higher quality output, able to achieve human-level invoice digitization with a high degree of certainty in constant time for each document.
CLAIMS:
1. A system (108) for generating information from documents, the system (108) comprising:
a processor (202); and
a memory (204) operatively coupled with the processor (202), wherein said memory (204) stores instructions which, when executed by the processor (202), cause the processor (202) to:
receive a document from one or more computing devices (104), wherein the document comprises an invoice associated with one or more users (102);
extract one or more attributes from the document;
generate one or more images from the document based on the one or more attributes;
detect textual information from the one or more images;
generate one or more bounding boxes associated with the textual information;
detect a plurality of quick response (QR) codes from the document and extract the plurality of QR codes to generate document-related information; and
generate, via an artificial intelligence (AI) engine (110), one or more key-value pairs from the document, based on the one or more bounding boxes and the document-related information.
2. The system (108) as claimed in claim 1, wherein the processor (202) is to extract one or more tabular regions from the one or more images to generate at least one of: a cell, a row, a column, and a header associated with the one or more tabular regions.
3. The system (108) as claimed in claim 1, wherein the one or more attributes comprise at least one of: an invoice number, an invoice date, an invoice amount, a vendor goods and services tax identification number (GSTIN), a buyer GSTIN, and one or more order details.
4. The system (108) as claimed in claim 3, wherein the one or more order details comprise at least one of: a quantity, a maximum retail price (MRP), a description, and one or more codes.
5. The system (108) as claimed in claim 1, wherein the processor (202) is to determine if at least one QR code among the plurality of QR codes exists in the document, and in response to a positive determination, prioritize the at least one QR code among the plurality of QR codes for generating the one or more key-value pairs.
6. The system (108) as claimed in claim 5, wherein, in response to a negative determination, the processor (202) is to use the textual information from the one or more images for generating the one or more key-value pairs.
7. The system (108) as claimed in claim 1, wherein the processor (202) is to determine a location associated with the textual information and the plurality of QR codes to generate the one or more key-value pairs associated with the one or more bounding boxes, and the document-related information.
8. A method for generating information from documents, the method comprising:
receiving, by a processor (202) associated with a system (108), a document from one or more computing devices (104), wherein the document comprises an invoice associated with one or more users (102);
extracting, by the processor (202), one or more attributes from the document;
generating, by the processor (202), one or more images from the document based on the one or more attributes;
detecting, by the processor (202), textual information from the one or more images;
generating, by the processor (202), one or more bounding boxes associated with the textual information;
detecting, by the processor (202), a plurality of quick response (QR) codes from the document and extracting the plurality of QR codes to generate document-related information; and
generating, by the processor (202), via an artificial intelligence (AI) engine (110), one or more key-value pairs from the document, based on the one or more bounding boxes and the document-related information.
9. The method as claimed in claim 8, comprising extracting, by the processor (202), one or more tabular regions from the one or more images to generate at least one of: a cell, a row, a column, and a header associated with the one or more tabular regions.
10. The method as claimed in claim 8, wherein the one or more attributes comprise at least one of: an invoice number, an invoice date, an invoice amount, a vendor goods and services tax identification number (GSTIN), a buyer GSTIN, and one or more order details.
11. The method as claimed in claim 10, wherein the one or more order details comprise at least one of: a quantity, a maximum retail price (MRP), a description, and one or more codes.
12. The method as claimed in claim 8, comprising determining, by the processor (202), if at least one QR code among the plurality of QR codes exists in the document, and in response to a positive determination, the method comprises prioritizing the at least one QR code among the plurality of QR codes for generating the one or more key-value pairs.
13. The method as claimed in claim 12, comprising in response to a negative determination, generating the one or more key-value pairs using the textual information from the one or more images.
14. The method as claimed in claim 8, comprising determining, by the processor (202), a location associated with the textual information and the plurality of QR codes to generate the one or more key-value pairs associated with the one or more bounding boxes and the document-related information.
| # | Name | Date |
|---|---|---|
| 1 | 202221069213-STATEMENT OF UNDERTAKING (FORM 3) [30-11-2022(online)].pdf | 2022-11-30 |
| 2 | 202221069213-PROVISIONAL SPECIFICATION [30-11-2022(online)].pdf | 2022-11-30 |
| 3 | 202221069213-POWER OF AUTHORITY [30-11-2022(online)].pdf | 2022-11-30 |
| 4 | 202221069213-FORM 1 [30-11-2022(online)].pdf | 2022-11-30 |
| 5 | 202221069213-DRAWINGS [30-11-2022(online)].pdf | 2022-11-30 |
| 6 | 202221069213-DECLARATION OF INVENTORSHIP (FORM 5) [30-11-2022(online)].pdf | 2022-11-30 |
| 7 | 202221069213-ENDORSEMENT BY INVENTORS [30-11-2023(online)].pdf | 2023-11-30 |
| 8 | 202221069213-DRAWING [30-11-2023(online)].pdf | 2023-11-30 |
| 9 | 202221069213-CORRESPONDENCE-OTHERS [30-11-2023(online)].pdf | 2023-11-30 |
| 10 | 202221069213-COMPLETE SPECIFICATION [30-11-2023(online)].pdf | 2023-11-30 |
| 11 | 202221069213-FORM 18 [17-01-2024(online)].pdf | 2024-01-17 |
| 12 | 202221069213-FORM-8 [19-01-2024(online)].pdf | 2024-01-19 |
| 13 | Abstract1.jpg | 2024-03-07 |
| 14 | 202221069213-FER.pdf | 2025-08-04 |
| 15 | 202221069213-FORM 3 [04-11-2025(online)].pdf | 2025-11-04 |
| 16 | 202221069213_SearchStrategyNew_E_ExtensiveSearchhasbeencondutctedE_14-02-2025.pdf | 2025-02-14 |