Abstract: Current approaches for high-precision, fine-grained visual extraction from charts are highly data intensive, requiring thousands of annotated samples. Annotating a dataset and retraining for every new chart type with a shift in the spatial composition of chart elements and text role regions, legend preview styles, chart element shapes, and text-role definitions is a time-consuming and costly affair. The present disclosure provides a method and system for performing fine granular visual extractions from charts. The system first uses dynamic filtering, with a text role patch as the trigger, to detect text regions belonging to the same text role label. Thereafter, the system uses an attention mechanism to segment regions in the chart that follow the same style as that of the legend preview, which serves as the query patch. In particular, the system facilitates detection of chart elements independent of their spatial compositions and shapes, the style definitions of legend previews, and text-role definitions, thereby making it generalizable to unseen charts.
Description: FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
METHOD AND SYSTEM FOR PERFORMING FINE GRANULAR VISUAL EXTRACTIONS FROM CHARTS
Applicant:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The disclosure herein generally relates to data extraction, and, more particularly, to a method and a system for performing fine granular visual extractions from charts.
BACKGROUND
A chart is a graphical representation of data that is used for data visualization purposes. The elements of a chart need to be interpreted to understand the underlying data. So, automated interpretation of the elements of charts present in a document is desired to fully understand the document.
However, the fine-grained perception capabilities required to interpret the elements of a chart are one of the main bottlenecks in automated fact extraction from charts. Further, high-precision detection of the textual and visual elements present in documents, such as chart titles, labels, legend previews, bars, lines, dots, images, etc., is critical, as errors in the initial detection would cascade to downstream inference tasks, leading to substantial discrepancies in the final conclusions, especially in the case of numerical data.
Currently, the information/data extraction techniques that are available for extracting chart information are highly data intensive, thus requiring thousands of annotated samples for training purposes.
Additionally, charts are subject to variations across different dimensions, such as the spatial composition of chart elements, legend preview styles, chart element types, chart shapes, and the like. Due to these variations, the dataset needs to be annotated and the associated system needs to be retrained for every new chart type, thereby making it a time-consuming and costly affair.
SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one aspect, there is provided a processor implemented method for performing fine granular visual extractions from charts. The method comprises receiving, by a chart information extraction system (CIES) via one or more hardware processors, a document, the document comprising at least one chart; detecting, by the CIES via the one or more hardware processors, one or more text regions present in the at least one chart of the document using a text detection engine; determining, by the CIES via the one or more hardware processors, a text role label for each text region of the one or more text regions using a text region and role detection model, wherein the text role label comprises one of a chart title, a legend, an X-axis, a Y-axis, an X-axis tick label, and a Y-axis tick label; accessing, by the CIES via the one or more hardware processors, a location of the text role label identified as the legend, the location comprising one or more legend coordinates of the legend present in the at least one chart; extracting, by the CIES via the one or more hardware processors, one or more chart element styles that are present in the legend using the one or more legend coordinates of the legend; for each chart element style of the one or more chart element styles, performing: identifying, by the CIES via the one or more hardware processors, the chart element style as a query patch; identifying, by the CIES via the one or more hardware processors, one or more chart element regions that are identical to the query patch in the at least one chart using an attention mechanism; extracting, by the CIES via the one or more hardware processors, a chart element mask associated with each identified chart element region of the one or more chart element regions based, at least in part, on a respective chart element region and one or more element region coordinates of the associated chart element region, wherein the element region coordinates are accessed from the text region and role detection model; and storing, by the CIES via the one or more hardware processors, the extracted chart element mask for each identified chart element region in a mask repository, the mask repository comprising one or more chart element masks; and recreating, by the CIES via the one or more hardware processors, the at least one chart based, at least in part, on the one or more chart element masks that are extracted for each identified chart element region and the at least one chart.
In another aspect, there is provided a chart information extraction system for performing fine granular visual extractions from charts. The system comprises a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: receive a document, the document comprising at least one chart; detect one or more text regions present in the at least one chart of the document using a text detection engine; determine a text role label for each text region of the one or more text regions using a text region and role detection model, wherein the text role label comprises one of a chart title, a legend, an X-axis, a Y-axis, an X-axis tick label, and a Y-axis tick label; access a location of the text role label identified as the legend, the location comprising one or more legend coordinates of the legend present in the at least one chart; extract one or more chart element styles that are present in the legend using the one or more legend coordinates of the legend; for each chart element style of the one or more chart element styles, perform: identify the chart element style as a query patch; identify one or more chart element regions that are similar to the query patch in the at least one chart using an attention mechanism; extract a chart element mask associated with each identified chart element region of the one or more chart element regions based, at least in part, on a respective chart element region and one or more element region coordinates of the associated chart element region, wherein the element region coordinates are accessed from the text region and role detection model; store the extracted chart element mask for each identified chart element region in a mask repository, the mask repository comprising one or more chart element masks; and recreate the at least one chart based, at least in part, on the one or more chart element masks that are extracted for each identified chart element region and the at least one chart.
In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause a method for performing fine granular visual extractions from charts. The method comprises receiving, by a chart information extraction system (CIES), a document, the document comprising at least one chart; detecting, by the CIES, one or more text regions present in the at least one chart of the document using a text detection engine; determining, by the CIES, a text role label for each text region of the one or more text regions using a text region and role detection model, wherein the text role label comprises one of a chart title, a legend, an X-axis, a Y-axis, an X-axis tick label, and a Y-axis tick label; accessing, by the CIES, a location of the text role label identified as the legend, the location comprising one or more legend coordinates of the legend present in the at least one chart; extracting, by the CIES, one or more chart element styles that are present in the legend using the one or more legend coordinates of the legend; for each chart element style of the one or more chart element styles, performing: identifying, by the CIES, the chart element style as a query patch; identifying, by the CIES, one or more chart element regions that are identical to the query patch in the at least one chart using an attention mechanism; extracting, by the CIES, a chart element mask associated with each identified chart element region of the one or more chart element regions based, at least in part, on a respective chart element region and one or more element region coordinates of the associated chart element region, wherein the element region coordinates are accessed from the text region and role detection model; and storing, by the CIES, the extracted chart element mask for each identified chart element region in a mask repository, the mask repository comprising one or more chart element masks; and recreating, by the CIES, the at least one chart based, at least in part, on the one or more chart element masks that are extracted for each identified chart element region and the at least one chart.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG. 1 is an example representation of an environment, related to at least some example embodiments of the present disclosure.
FIG. 2 illustrates an exemplary block diagram of a system for performing fine granular visual extractions from charts, in accordance with an embodiment of the present disclosure.
FIG. 3 illustrates a schematic block diagram representation of an extraction process for performing fine granular visual extractions from a chart, in accordance with an embodiment of the present disclosure.
FIGS. 4A and 4B, collectively, illustrate an exemplary flow diagram of a method for performing fine granular visual extractions from charts, in accordance with an embodiment of the present disclosure.
FIG. 5 illustrates a schematic representation of a text region and role detection model for identifying text role labels for text regions, in accordance with an embodiment of the present disclosure.
FIG. 6 illustrates a schematic representation of a chart element extraction model for extracting chart element regions, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
Charts can be considered compact visualization techniques that are frequently used for illustrating facts in documents, such as scientific and financial documents, to summarize observations and draw conclusions about the underlying data. So, automated visual extraction from charts is required for a complete and easy understanding of the document.
As previously known, chart elements are distributed across textual elements, such as the chart title, X/Y-axis labels, X/Y-ticks and tick labels, legend previews, and legend labels, in addition to visual elements such as bars, lines, and dots. Such a distribution of chart elements increases the complexity of the automated fine granular visual extraction process, as automated systems may require thousands of annotated samples.
As discussed earlier, annotating a dataset and retraining a system for every new chart type with a shift in the spatial composition of chart elements and text role regions, legend preview styles, chart element shapes and text-role definitions, is a time-consuming and costly affair.
Further, some techniques that are available for extracting chart information require charts to be constructed from a predefined grammar, and some may use image processing for chart extraction, but the simplistic chart images they tackle are synthetically generated from the FigureQA and CQAC1 datasets. In particular, available chart extraction systems either work on rule-based methods over carefully selected examples or use deep model-based object detection algorithms in place of generic heuristics to extract chart elements. These systems are able to achieve a maximum mean average precision (mAP) of 93.44% at 0.90 intersection over union (IOU) for visual element detection, which is not considered good enough for consistent downstream extraction quality. Thus, the available techniques either require customized rules for chart images or require large amounts of data to train object detection models while still lacking the required accuracy.
Additionally, the available techniques only work on defined types of charts, such as pie charts, bar charts, etc., and they cannot be generalized to unseen types of charts, as it is infeasible to create an annotated dataset that comprises samples for each new chart dimension/type.
So, a technique that can perform automated fine granular visual extraction from charts and can rapidly adapt to unseen chart distributions with only a few labeled examples is still to be explored.
Embodiments of the present disclosure overcome the above-mentioned disadvantages by providing a method and a system for performing fine granular visual extractions from charts. The system of the present disclosure first detects a text role label for each text region present in the chart using a text region and role detection model. The system then identifies the text region whose text role label is determined as a legend since, irrespective of the chart type, legend previews replicate the pattern representations within the chart. Thereafter, the system uses the legend preview as a query patch to identify chart element regions that are similar to the query patch using an attention mechanism. The identified chart element regions are further utilized to recreate the chart.
In the present disclosure, the system and the method use only the legend preview to segment out all the matching chart element regions from the chart, thereby making them invariant to the spatial composition of text regions belonging to distinct roles and to chart element shapes/styles, which further allows generalization to unseen charts sharing the same set of text role classes. Further, the system and the method require only a small dataset for fine-tuning because of the generalization capabilities of the system, thereby eliminating the need to generate annotated samples, which further reduces the time spent on training the system.
Referring now to the drawings, and more particularly to FIGS. 1 through 6, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG. 1 illustrates an exemplary representation of an environment 100 related to at least some example embodiments of the present disclosure. Although the environment 100 is presented in one arrangement, other embodiments may include the parts of the environment 100 (or other parts) arranged otherwise depending on, for example, extracting text bounding box coordinates of text regions, identifying text role labels for text regions, etc. The environment 100 generally includes an electronic device, such as an electronic device 102, and a chart information extraction system (hereinafter referred to as ‘CIES’) 106, each coupled to, and in communication with (and/or with access to) a network 104. It should be noted that one electronic device is shown for the sake of explanation; there can be a greater number of electronic devices.
The network 104 may include, without limitation, a light fidelity (Li-Fi) network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a satellite network, the Internet, a fiber optic network, a coaxial cable network, an infrared (IR) network, a radio frequency (RF) network, a virtual network, and/or another suitable public and/or private network capable of supporting communication among two or more of the parts or users illustrated in FIG. 1, or any combination thereof.
Various entities in the environment 100 may connect to the network 104 in accordance with various wired and wireless communication protocols, such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), 2nd Generation (2G), 3rd Generation (3G), 4th Generation (4G), 5th Generation (5G) communication protocols, Long Term Evolution (LTE) communication protocols, or any combination thereof.
The electronic device 102 is associated with a user (e.g., a user or an entity such as an organization) who wants to extract information about the data underlying a document. Examples of the electronic device 102 include, but are not limited to, a personal computer (PC), a mobile phone, a tablet device, a Personal Digital Assistant (PDA), a server, a voice activated assistant, a smartphone, and a laptop.
The chart information extraction system (CIES) 106 includes one or more hardware processors and a memory. The CIES 106 is configured to perform one or more of the operations described herein. The CIES 106 is first configured to receive a document via the network 104 from the electronic device 102. The document includes at least one chart. Examples of the chart include, but are not limited to, bar charts, stacked bar charts, line charts, dot charts, pie charts, donut charts, boxplots, and the like.
The CIES 106 is then configured to detect text regions present in the at least one chart of the document using a text detection engine. Examples of the text detection engine include, but are not limited to, a connectionist text proposal network (CTPN), the efficient and accurate scene text detector (EAST), real-time scene text detection with differentiable binarization (DBNet), character-region awareness for text detection (CRAFT), and the like. Thereafter, the CIES 106 is configured to identify a text role label for each text region of the one or more text regions using a text region and role detection model. Once the text role label for each text region is available, the CIES 106 is configured to identify at least one text region amongst the one or more text regions whose text role label is determined as the legend.
Thereafter, the CIES 106 uses legend coordinate information to obtain one or more chart element styles that are present in the legend. Then, the CIES 106 identifies each chart element style as a query patch to identify chart element regions that are identical to the respective query patch using an attention mechanism. The attention mechanism is explained in detail with reference to FIG. 3.
Further, the CIES 106 extracts a chart element mask associated with each identified chart element region, which is further utilized by the CIES 106 to recreate the chart.
The number and arrangement of systems, devices, and/or networks shown in FIG. 1 are provided as an example. There may be additional systems, devices, and/or networks; fewer systems, devices, and/or networks; different systems, devices, and/or networks; and/or differently arranged systems, devices, and/or networks than those shown in FIG. 1. Furthermore, two or more systems or devices shown in FIG. 1 may be implemented within a single system or device, or a single system or device shown in FIG. 1 may be implemented as multiple, distributed systems or devices. Additionally, or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of the environment 100 may perform one or more functions described as being performed by another set of systems or another set of devices of the environment 100 (e.g., refer scenarios described above).
FIG. 2 illustrates an exemplary block diagram of a chart information extraction system 200 for performing fine granular visual extractions from charts, in accordance with an embodiment of the present disclosure. In an embodiment, the chart information extraction system 200 may also be referred as system 200 and may be interchangeably used herein. The chart information extraction system 200 is similar to the chart information extraction system 106 explained with reference to FIG. 1. In some embodiments, the system 200 is embodied as a cloud-based and/or SaaS-based (software as a service) architecture. In some embodiments, the system 200 may be implemented in a server system. In some embodiments, the system 200 may be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, and the like.
In an embodiment, the system 200 includes one or more processors 204, communication interface device(s) or input/output (I/O) interface(s) 206, and one or more data storage devices or memory 202 operatively coupled to the one or more processors 204. The one or more processors 204 may be one or more software processing modules and/or hardware processors. In an embodiment, the hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 200 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
The I/O interface device(s) 206 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
The memory 202 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, a database 208 can be stored in the memory 202, wherein the database 208 may comprise, but is not limited to, a pre-defined attention threshold, a similar chart element queue, a mask repository, and the like. The memory 202 further comprises (or may further comprise) information pertaining to input(s)/output(s) of each step performed by the systems and methods of the present disclosure. In other words, input(s) fed at each step and output(s) generated at each step are comprised in the memory 202 and can be utilized in further processing and analysis.
It is noted that the system 200 as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the present disclosure and, therefore, should not be taken to limit the scope of the present disclosure. It is noted that the system 200 may include fewer or more components than those depicted in FIG. 2.
FIG. 3, with reference to FIGS. 1-2, illustrates a schematic block diagram representation 300 of an extraction process associated with the system 200 of FIG. 2 or the CIES 106 of FIG. 1 for performing fine granular visual extractions from a chart, in accordance with an embodiment of the present disclosure.
As seen in FIG. 3, the system 200 receives a document as an input. The document can be in an image format or a document format. The received document includes a chart. The system 200 then detects the text regions present in the chart and obtains the bounding box coordinates associated with each text region using a text detector. The text detector can be any text detector available in the art. In an example embodiment, CRAFT can be used as the text detector for detecting text regions. Thereafter, the system 200 identifies a text role label for each detected text region using a text region and role detection model. Further, the location of the text role label identified as the legend is accessed by the system 200 to obtain the legend coordinates of the legend present in the chart. The legend coordinates are then used by the system 200 to obtain the chart element styles that are present in the legend using a chart element extraction model.
The chart element styles are further utilized by the chart element extraction model as query patches to obtain chart element masks. The process of obtaining the chart element masks is explained in detail with reference to FIGS. 4A and 4B. The system 200 then uses the chart element masks and the input chart to recreate the chart.
FIGS. 4A and 4B, with reference to FIGS. 1 through 3, collectively, represent an exemplary flow diagram of a method 400 for performing fine granular visual extractions from charts, in accordance with an embodiment of the present disclosure. The method 400 may use the system 200 of FIG. 2 or the chart information extraction system 106 of FIG. 1 for execution. In an embodiment, the system 200 comprises one or more data storage devices or the memory 202 operatively coupled to the one or more hardware processors 204 and is configured to store instructions for execution of steps of the method 400 by the one or more hardware processors 204. The sequence of steps of the flow diagram may not be necessarily executed in the same order as they are presented. Further, one or more steps may be grouped together and performed in the form of a single step, or one step may have several sub-steps that may be performed in parallel or in a sequential manner. The steps of the method of the present disclosure will now be explained with reference to the components of the system 200 as depicted in FIG. 2, and the chart information extraction system 106 of FIG. 1.
At step 402 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 receive a document. The document includes at least one chart. Examples of the document may include, but are not limited to, a financial document, a scientific document, a proposal document, etc. In one embodiment, the system 200 may receive the document in the form of an image file that includes a chart whose information needs to be analyzed.
At step 404 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 detect one or more text regions present in the at least one chart of the document using a text detection engine. It should be noted that the text detection engine used here can be any text detection engine available in the art. In an embodiment, the CRAFT model is used for detecting text regions present in the at least one chart. In particular, the CRAFT model provides bounding box coordinates for each text region of the one or more text regions that are present in the at least one chart.
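By way of illustration only, the text detection step 404 may be wrapped as in the following minimal Python sketch. The `detector` object and its `predict` method are hypothetical stand-ins, since the disclosure does not fix a specific detector API; actual engines such as CTPN, EAST, DBNet, and CRAFT expose different interfaces, and CRAFT in particular returns region polygons that can be reduced to axis-aligned bounding boxes as shown.

```python
from typing import List, Tuple
import numpy as np

BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)

def detect_text_regions(chart: np.ndarray, detector) -> List[BBox]:
    """Run a text detector over a chart image and return one bounding box per text region."""
    # `detector.predict` is a hypothetical call assumed to return a list of
    # (N, 2) polygon arrays, one per detected text region.
    polygons = detector.predict(chart)
    boxes = []
    for poly in polygons:
        xs, ys = poly[:, 0], poly[:, 1]
        # Reduce each polygon to the axis-aligned box used by the later steps.
        boxes.append((int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())))
    return boxes
```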
At step 406 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 determine a text role label for each text region of the one or more text regions using a text region and role detection model. Examples of the text role label include, but are not limited to, a chart title, a legend, an X-axis, a Y-axis, an X-axis tick label, and a Y-axis tick label. The above step 406 is better understood by way of the following description.
After receiving the one or more text regions that are detected by the text detection engine, the system 200 first performs pre-processing of each detected text region of the one or more text regions to obtain one or more pre-processed text regions. It should be noted that the pre-processing is performed by the text region and role detection model present in the system 200.
Generally, the text detection engine misses a few isolated characters and often yields partial detections of the text regions that are present in the at least one chart, which creates problems during the interpretation phase as the extracted information might not be correct. So, to overcome this, the text region and role detection model first performs pre-processing of each detected text region to correct the partially detected text present in the respective text region. The corrected text region is referred to as the pre-processed text region. Once the pre-processed text region is available corresponding to (or associated with) each detected text region, the text region and role detection model segments the pre-processed text regions from the at least one chart. It should be noted that, because the text region and role detection model also corrects the partially detected text, it eliminates the problem of wrong (or incorrect) interpretation that can occur during the interpretation phase while making fine granular extraction from the chart possible.
Once the one or more pre-processed text regions are segmented out from the at least one chart, the system 200 determines the text role label for each pre-processed text region of the one or more pre-processed text regions using the text region and role detection model.
In an embodiment, the text region and role detection model first identifies the text role label for a pre-processed text region based on one or more features of the respective pre-processed text region. The one or more features are obtained from deep learning image encoding models, such as AlexNet, VGG, ResNet, SqueezeNet, DenseNet, InceptionNet, GoogLeNet, ShuffleNet, and MobileNet. Once the text role label is determined for the text region, the text region and role detection model detects at least one pre-processed text region of the one or more pre-processed text regions whose text role label is identical to the identified text role label using a dynamic kernel approach. In particular, the text region and role detection model detects other pre-processed text regions whose text role label can be similar to the determined text role label using the dynamic kernel approach. In an exemplary scenario, if the text role label for a text region is determined to be ‘X-axis’, the text region and role detection model may detect other text regions present in the chart whose text role label can also be ‘X-axis’.
In one embodiment, the dynamic kernel approach uses a patch-based triggering mechanism in which, if a text role label is provided as input, the text region and role detection model determines the other text regions that may belong to the same text role label. The architecture of the text region and role detection model is explained in detail with reference to FIG. 5.
Thereafter, once the at least one pre-processed text region belonging to the same text role label is identified, the text region and role detection model assigns the identified text role label to the at least one pre-processed text region. The same process is repeated until a text role label is assigned to each pre-processed text region of the one or more pre-processed text regions.
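A schematic sketch of this iterative assignment follows, assuming hypothetical helpers `classify_role` and `detect_same_role` that wrap the text region and role detection model; the loop terminates once every pre-processed text region carries a label.

```python
def assign_text_roles(chart, regions, classify_role, detect_same_role):
    """Iteratively assign a text role label to every pre-processed text region."""
    labels = {}
    unassigned = set(regions)
    while unassigned:
        seed = unassigned.pop()               # pick any still-unlabeled region
        role = classify_role(chart, seed)     # identify its text role label
        labels[seed] = role
        # Dynamic-kernel pass: the seed region acts as the trigger patch, and
        # all regions sharing its role are detected and labeled in one shot.
        for region in detect_same_role(chart, seed):
            if region in unassigned:
                unassigned.discard(region)
                labels[region] = role
    return labels
```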
At step 408 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 access a location of the text role label identified as the legend. The above step 408 is better understood by way of the following description.
As it is known that legend previews replicate the pattern representations of the chart elements, the system 200 is built using the same insight. In an embodiment, once the text role label is assigned to each pre-processed text region, the system 200 first identifies at least one text region amongst the one or more text regions whose text role label is determined as the legend, as the chart element styles present in legend previews are replicated in the chart element regions present in the at least one chart. Once the text role label identified as the legend is found, the system 200 accesses the location of the respective text role label, i.e., the legend. The location includes one or more legend coordinates of the legend present in the at least one chart.
At step 410 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 extract one or more chart element styles that are present in the legend using the one or more legend coordinates of the legend. In an embodiment, the one or more legend coordinates are the bounding box coordinates of the text region identified as the legend. Once the bounding box coordinates of the legend are known, a chart element extraction model present in the system 200 uses the bounding box coordinates to extract the one or more chart element styles that are present in the legend of the at least one chart. In at least one example embodiment, the chart element styles are the different colours or patterns that are used in the chart for representing different categories. For example, in a bar chart comparing the populations of three countries, an orange colour may be used to represent ‘country A’, green for ‘country B’, and white for ‘country C’. So, the colours orange, green, and white must be present in the legend provided in the bar chart, and these colours may be considered the chart element styles.
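As a purely illustrative sketch of step 410, the preview swatches may be cropped from the legend bounding boxes as below. The assumption that each swatch occupies a roughly square region immediately to the left of its legend label is an illustration choice, not a requirement of the method, and the `gap` margin is a hypothetical parameter.

```python
import numpy as np

def extract_legend_styles(chart: np.ndarray, legend_boxes, gap: int = 4):
    """Crop one preview swatch per legend label box (x0, y0, x1, y1)."""
    patches = []
    for (x0, y0, x1, y1) in legend_boxes:
        h = y1 - y0
        # Assume the swatch sits in a square of side h just left of the label.
        sx0, sx1 = max(0, x0 - gap - h), max(0, x0 - gap)
        if sx1 > sx0:
            patches.append(chart[y0:y1, sx0:sx1])
    return patches
```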
At step 412 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 identify one or more chart element regions that are present in the at least one chart, and extract chart element masks for each chart element region by performing a plurality of steps 412a through 412d for each chart element style of the one or more chart element styles.
At step 412a of the present disclosure, the one or more hardware processors 204 of the system 200 identify a chart element style of the one or more chart element styles as a query patch. With reference to the previously explained bar chart example, as there were three chart element styles, i.e., the orange, green, and white colours, the chart element extraction model of the system 200 selects one chart element style, say orange, amongst the three available chart element styles and considers it as the query patch. So, the ‘orange colour’ is identified as the query patch at this step.
At step 412b of the present disclosure, the one or more hardware processors 204 of the system 200 identify one or more chart element regions that are identical to the query patch in the at least one chart using an attention mechanism. The above step 412b is better understood by way of the following description.
As there can be a plurality of chart element regions present in the at least one chart, the chart element extraction model of the system 200 first calculates an attention score between the query patch and each chart element region of the one or more chart element regions using a measuring function. In an embodiment, a modified von Mises-Fisher (vMF) based measuring function is used for calculating the attention score. Thereafter, the chart element extraction model of the system 200 performs a comparison between the calculated attention score and a pre-defined attention threshold. In at least one example embodiment, the pre-defined attention threshold is accessed from the database 208. In particular, the system 200 verifies whether the chart element region matches the query patch with the help of the attention score. Thus, the attention scores help in finding the chart element regions that match the query patch.
With reference to the previously explained bar chart example, the system 200 identifies the chart elements that are present in the ‘orange colour’ in the chart, as the ‘orange colour’ is identified as the query patch.
Based on the comparison, the chart element extraction model of the system 200 adds the chart element regions that are similar to the query patch to a similar chart element queue maintained for the respective query patch. In an embodiment, the similar chart element queue includes the one or more chart element regions that are similar to the corresponding query patch.
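A minimal sketch of this scoring-and-thresholding loop is given below. The exact form of the modified vMF measuring function is not reproduced here; the exponentiated cosine similarity is an assumption standing in for it, and `features` is a hypothetical function mapping a patch to a fixed-length embedding vector.

```python
import numpy as np

def vmf_score(q_feat: np.ndarray, r_feat: np.ndarray, kappa: float = 10.0) -> float:
    """Assumed vMF-style similarity: exp(kappa * (cos - 1)), giving a score in (0, 1]."""
    q = q_feat.ravel() / (np.linalg.norm(q_feat) + 1e-8)
    r = r_feat.ravel() / (np.linalg.norm(r_feat) + 1e-8)
    return float(np.exp(kappa * (np.dot(q, r) - 1.0)))

def match_regions(query_patch, regions, features, threshold: float = 0.8):
    """Queue every chart element region whose attention score clears the threshold."""
    similar_queue = []
    q_feat = features(query_patch)
    for region in regions:
        if vmf_score(q_feat, features(region)) >= threshold:
            similar_queue.append(region)
    return similar_queue
```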
Once the one or more chart element regions that are identical to the query patch are identified, the system 200 performs step 412c.
At step 412c of the present disclosure, the one or more hardware processors 204 of the system 200 extract a chart element mask associated with each identified chart element region of the one or more chart element regions based, at least in part, on a respective chart element region and one or more element region coordinates of the associated chart element region. The one or more element region coordinates are accessed from the text region and role detection model.
As the bounding box coordinate information is already available with the text region and role detection model, the chart element extraction model of the system 200 uses the one or more element region coordinates of each identified chart element region to extract the chart element mask associated with the respective identified chart element region. So, a chart element mask associated with each chart element region is extracted here, i.e., the chart element mask associated with each orange colour region in the chart is extracted at this step.
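For a box-shaped region, the coordinate-based part of step 412c reduces to the following sketch; it rasterizes a rectangular binary mask from the element region coordinates, whereas the decoder of FIG. 6 would produce the refined, pixel-accurate mask.

```python
import numpy as np

def extract_mask(chart_shape, region_box):
    """Return a binary mask the size of the chart, set inside the region box."""
    x0, y0, x1, y1 = region_box
    mask = np.zeros(chart_shape[:2], dtype=np.uint8)
    mask[y0:y1, x0:x1] = 1
    return mask
```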
At step 412d of the present disclosure, the one or more hardware processors 204 of the system 200 store the extracted chart element mask for each identified chart element region in the mask repository. The mask repository includes one or more chart element masks extracted for one or more chart element regions.
The steps 412a to 412d are repeated until the one or more chart element masks associated with each chart element style of the one or more chart element styles are extracted.
In an embodiment, at step 414 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 recreate the at least one chart based, at least in part, on the one or more chart element masks that are extracted for each identified chart element region and the at least one chart. A recreation module present in the system 200 uses the fine-grained information extracted from the received chart, along with the received chart itself, for recreating the chart.
FIG. 5, with reference to FIGS. 1 through 4B, illustrates a schematic representation of the text region and role detection model associated with the system 200 of FIG. 2 or the CIES 106 of FIG. 1 for identifying text role labels for text regions, in accordance with an embodiment of the present disclosure.
As can be seen in FIG. 5, the text region and role detection model includes three modules, viz., an encoder-decoder module, a trigger module, and a controller module. In an embodiment, the encoder-decoder module is a U-net model in which the encoder has four down-sampling steps, each consisting of a convolution layer with a 3 × 3 kernel followed by a rectified linear unit (ReLU) activation and a batch normalization layer. The encoder-decoder module receives the document containing the chart as an input from a user device (e.g., the electronic device 102). Suppose the encoder-decoder module receives the chart image x as input. The latent representation for an image x obtained from the U-net encoder is:
r = UNET_E(x) ... (1)
where r represents the output of the U-net encoder.
Further, the U-net encoder filters out the irrelevant information and extracts features from the chart image x. The output o provided by the U-net decoder for the chart image x is:
o = UNET_D(r) ... (2)
Thereafter, the chart image x is appended with the trigger patch p of a text region detected by the text detector. In particular, the trigger module takes the chart image x highlighted with the text patch as an input and extracts features from the highlighted image to get the output t, i.e.:
t = GAP(NN_T(x_p)) ... (3)
where NN_T is a convolutional feature extractor followed by a global average pooling (GAP) layer.
Further, the controller module takes the U-net encoder output and the trigger module output to generate the dynamic kernels that use the patch-based triggering mechanism for detecting text regions that may belong to the same text role label. In particular, the features of the trigger patch t are concatenated with the extracted encoder output features r and fed to the controller module to generate the dynamic kernels k:
k = NN_C(GAP(r) || t) ... (4)
The dynamic kernels help in getting the text-role-specific segmentation output s:
s = ((o ⊛ k1) ⊛ k2) ⊛ k3 ... (5)
where ⊛ represents the convolution operation, and k1, k2, and k3 represent the weights of the dynamic kernels k distributed across the three convolutions.
In particular, the controller module uses the features from the trigger module and adapts the output image from the U-net decoder such that the final output image s has the specific text roles highlighted and the corresponding text role label predicted.
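For concreteness, a condensed PyTorch sketch of equations (1)-(5) is given below. The U-net is collapsed to a single stage, the channel width and the 1 × 1 dynamic kernels are illustrative assumptions, and a batch size of one is assumed; the disclosure's encoder actually uses four down-sampling steps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextRoleModel(nn.Module):
    def __init__(self, ch: int = 16):
        super().__init__()
        # Single-stage stand-in for the four-step U-net encoder (conv -> ReLU -> BN).
        self.enc = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                 nn.BatchNorm2d(ch))               # UNET_E
        self.dec = nn.Conv2d(ch, ch, 3, padding=1)                 # UNET_D
        self.trigger = nn.Conv2d(3, ch, 3, padding=1)              # NN_T
        # Controller emits weights for three 1x1 dynamic kernels (k1, k2, k3).
        self.controller = nn.Linear(2 * ch, 3 * ch * ch)           # NN_C

    def forward(self, x, x_p):
        r = self.enc(x)                                 # (1) r = UNET_E(x)
        o = self.dec(r)                                 # (2) o = UNET_D(r)
        t = self.trigger(x_p).mean(dim=(2, 3))          # (3) t = GAP(NN_T(x_p))
        g = r.mean(dim=(2, 3))                          # GAP(r)
        k = self.controller(torch.cat([g, t], dim=1))   # (4) k = NN_C(GAP(r) || t)
        k1, k2, k3 = k.chunk(3, dim=1)
        c = o.shape[1]
        s = o
        for kw in (k1, k2, k3):                         # (5) s = ((o ⊛ k1) ⊛ k2) ⊛ k3
            w = kw.reshape(-1, c, c, 1, 1)[0]           # batch of one assumed
            s = F.conv2d(s, w)
        return s
```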
FIG. 6, with reference to FIGS. 1 through 5, illustrates a schematic representation of the chart element extraction model associated with the system 200 of FIG. 2 or the CIES 106 of FIG. 1 for extracting chart element regions, in accordance with an embodiment of the present disclosure.
As can be seen in FIG. 6, the one or more legend coordinates l_i of the legend present in the chart image x are scaled down to the spatial size of the chart feature map f. Thereafter, a legend region feature extractor module takes the mean coordinates of the bounding box of l_i and the features corresponding to these coordinates in f to generate the extracted features fl_i for the legend preview l_i. In particular, the legend region feature extractor module extracts the features which correspond to (or are associated with) a particular legend defined by the coordinates l_i. Thus, from the features f of the complete image, only selected features are taken.
Further, a legend attention module takes the complete image features f and the image features corresponding to the legend, i.e., the extracted features fl_i, as input to generate attention scores Sl_i over the entire image. The attention scores are generated to verify whether the extracted features fl_i match the rest of the image features f. In an embodiment, the attention score ranges from 0 to 1.
Additionally, a recreation module takes the extracted features fl_i and the attention scores Sl_i as input and multiplies them to generate the output features Ol_i. In particular, the attention scores generated by the legend attention module manifest the legend patterns across the chart image, indicating the regions where the corresponding chart entities are present. Thus, the chart element regions that are identical to the legend features are extracted.
Finally, the output features Ol_i are passed through a decoder that generates the corresponding chart element mask and the reconstructed image.
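The FIG. 6 flow can be summarized by the following sketch. The dot-product-plus-sigmoid scoring is an assumption (the disclosure only requires scores in the 0-1 range), and mean pooling over the scaled legend box stands in for the legend region feature extractor.

```python
import torch

def legend_attention(f: torch.Tensor, legend_box):
    """f: (C, H, W) chart feature map; legend_box: (x0, y0, x1, y1) scaled to f."""
    x0, y0, x1, y1 = legend_box
    f_li = f[:, y0:y1, x0:x1].mean(dim=(1, 2))    # extracted legend features f_{l_i}
    # Score every spatial location of f against the legend features;
    # the sigmoid keeps S_{l_i} within (0, 1), as the disclosure requires.
    s_li = torch.sigmoid((f * f_li[:, None, None]).sum(dim=0))
    # Recreation-module multiplication: broadcast the legend features over
    # the locations where the attention scores indicate matching regions.
    o_li = f_li[:, None, None] * s_li[None]       # output features O_{l_i}
    return s_li, o_li
```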
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
As discussed earlier, annotating a dataset and retraining the system for every new chart type with a shift in the spatial composition of chart elements and text role regions, legend preview styles, chart element shapes, and text-role definitions is a time-consuming and costly affair. The available techniques only work on defined types of charts, and they cannot be generalized to unseen types of charts, as it is infeasible to create an annotated dataset that comprises samples for each new chart dimension/type. So, to overcome these disadvantages, embodiments of the present disclosure provide a method and a system for performing fine granular visual extractions from charts. More specifically, the system uses the legend preview as a query patch to segment the chart element regions that are similar to the legend previews, thereby eliminating the need to create an annotated dataset for training purposes, as legend previews are known to replicate the chart patterns in the chart. The chart element extraction model and the text region and role detection model used in the system make the system invariant to the spatial composition of text regions belonging to distinct roles and to chart element shapes/styles, thereby allowing generalization to unseen charts sharing the same set of text role classes.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means, and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
Claims:
1. A processor implemented method, comprising:
receiving, by a chart information extraction system (CIES) via one or more hardware processors, a document, the document comprising at least one chart (402);
detecting, by the CIES via the one or more hardware processors, one or more text regions present in the at least one chart of the document using a text detection engine (404);
determining, by the CIES via the one or more hardware processors, a text role label for each text region of the one or more text regions using a text region and role detection model, wherein the text role label comprises one of a chart title, a legend, an X-axis, a Y-axis, an X-axis tick label, and a Y-axis tick label (406);
accessing, by the CIES via the one or more hardware processors, a location of the text role label identified as the legend, the location comprising one or more legend coordinates of the legend present in the at least one chart (408);
extracting, by the CIES via the one or more hardware processors, one or more chart element styles that are present in the legend using the one or more legend coordinates of the legend (410);
for each chart element style of the one or more chart element styles, performing (412):
identifying, by the CIES via the one or more hardware processors, the chart element style as a query patch (412a);
identifying, by the CIES via the one or more hardware processors, one or more chart element regions that are identical to the query patch in the at least one chart using an attention mechanism (412b);
extracting, by the CIES via the one or more hardware processors, a chart element mask associated with each identified chart element region of the one or more chart element regions based, at least in part, on a respective chart element region and one or more element region coordinates of the associated chart element region, wherein the element region coordinates are accessed from the text region and role detection model (412c); and
storing, by the CIES via the one or more hardware processors, the extracted chart element mask for each identified chart element region in a mask repository, the mask repository comprising one or more chart element masks (412d); and
recreating, by the CIES via the one or more hardware processors, the at least one chart based, at least in part, on the one or more chart element masks that are extracted for each identified chart element region and the at least one chart (414).
2. The processor implemented method as claimed in claim 1, wherein the step of accessing, by the CIES via the one or more hardware processors, the location of the text role label identified as the legend is preceded by:
identifying, by the CIES via the one or more hardware processors, at least one text region amongst the one or more text regions whose text role label is determined as the legend.
3. The processor implemented method as claimed in claim 1, wherein the attention mechanism comprises:
for each chart element region of a plurality of chart element regions that are present in the at least one chart of the document, performing:
calculating, by the CIES via the one or more hardware processors, an attention score between the query patch and the respective chart element region using a measuring function;
performing, by the CIES via the one or more hardware processors, a comparison of the calculated attention score and a pre-defined attention threshold; and
adding, by the CIES via the one or more hardware processors, the chart element region in a similar chart element queue maintained for the respective query patch based on the comparison, wherein the similar chart element queue comprises the one or more chart element regions that are similar to the corresponding query patch.
4. The processor implemented method as claimed in claim 1, wherein the step of determining, by the CIES via the one or more hardware processors, the text role label for each text region of the one or more text regions using the text region and role detection model is preceded by:
performing, by CIES via the one or more hardware processors, pre-processing of each detected text region of the one or more text regions to obtain one or more pre-processed text regions; and
segmenting, by CIES via the one or more hardware processors, the pre-processed text regions from the at least one chart.
5. The processor implemented method as claimed in claim 4, wherein the step of determining, by the CIES via the one or more hardware processors, the text role label for each text region of the one or more text regions using the text region and role detection model comprises:
for each pre-processed text region of the one or more pre-processed text regions, performing:
identifying, by the CIES via the one or more hardware processors, the text role label for the pre-processed text region based on one or more features of the respective pre-processed text region;
detecting, by the CIES via the one or more hardware processors, at least one pre-processed text region of the one or more pre-processed text regions whose text role label is identical to the identified text role label using a dynamic kernel approach; and
assigning, by the CIES via the one or more hardware processors, the identified text role label to the at least one pre-processed text region,
until the text role label is assigned to each pre-processed text region of the one or more pre-processed text regions.
6. A chart information extraction system (CIES) (200), comprising:
a memory (202) storing instructions;
one or more communication interfaces (206); and
one or more hardware processors (204) coupled to the memory (202) via the one or more communication interfaces (206), wherein the one or more hardware processors (204) are configured by the instructions to:
receive a document, the document comprising at least one chart;
detect one or more text regions present in the at least one chart of the document using a text detection engine;
determine a text role label for each text region of the one or more text regions using a text region and role detection model, wherein the text role label comprises one of a chart title, a legend, an X-axis, a Y-axis, an X-axis tick label, and a Y-axis tick label;
access a location of the text role label identified as the legend, the location comprising one or more legend coordinates of the legend present in the at least one chart;
extract one or more chart element styles that are present in the legend using the one or more legend coordinates of the legend;
for each chart element style of the one or more chart element styles, perform:
identify the chart element style as a query patch;
identify one or more chart element regions that are similar to the query patch in the at least one chart using an attention mechanism;
extract a chart element mask associated with each identified chart element region of the one or more chart element regions based, at least in part, on a respective chart element region and one or more element region coordinates of the associated chart element region, wherein the element region coordinates are accessed from the text region and role detection model; and
store the extracted chart element mask for each identified chart element region in a mask repository, the mask repository comprising one or more chart element masks; and
recreate the at least one chart based, at least in part, on the one or more chart element masks that are extracted for each identified chart element region and the at least one chart.
7. The system as claimed in claim 6, wherein the step of accessing the location of the text role label identified as the legend is preceded by:
identify at least one text region amongst the one or more text regions whose text role label is determined as the legend.
8. The system as claimed in claim 6, wherein the attention mechanism comprises:
for each chart element region of a plurality of chart element regions that are present in the at least one chart of the document, perform:
calculate an attention score between the query patch and the respective chart element region using a measuring function;
perform a comparison of the calculated attention score and a pre-defined attention threshold; and
add the chart element region to a similar chart element queue maintained for the respective query patch based on the comparison, wherein the similar chart element queue comprises the one or more chart element regions that are similar to the corresponding query patch.
9. The system as claimed in claim 6, wherein the step of determining the text role label for each text region of the one or more text regions using the text region and role detection model is preceded by:
perform pre-processing of each detected text region of the one or more text regions to obtain one or more pre-processed text regions; and
segment the pre-processed text regions from the at least one chart.
10. The system as claimed in claim 9, wherein the step of determining the text role label for each text region of the one or more text regions using the text region and role detection model comprises:
for each pre-processed text region of the one or more pre-processed text regions, perform:
identify the text role label for the pre-processed text region based on one or more features of the respective pre-processed text region;
detect at least one pre-processed text region of the one or more pre-processed text regions whose text role label is identical to the identified text role label using a dynamic kernel approach; and
assign the identified text role label to the at least one pre-processed text region,
until the text role label is assigned to each pre-processed text region of the one or more pre-processed text regions.