Abstract: State-of-the-art graph data extraction approaches require users to manually perform various activities associated with data extraction from graph images. However, manual intervention may be prone to human error, and the quality of the extracted data depends on the experience and skill of the users performing the extraction. This may lead to inconsistent quality. The disclosure herein generally relates to graph image processing, and, more particularly, to a method and system for data extraction from graph images. The system generates a set of scaling coefficients and a set of numerical data for a graph image collected as input, and then extracts information corresponding to the graph image by performing mapping between the set of numerical data and the set of scaling coefficients.
Description: FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
METHOD AND SYSTEM FOR DATA EXTRACTION FROM GRAPH IMAGES
Applicant:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
[001] The disclosure herein generally relates to graph image processing, and, more particularly, to a method and system for data extraction from graph images.
BACKGROUND
[002] The manufacturing industry performs extensive testing on various parts and products for quality control using various sensors, which results in a large amount of data. The data mostly contains plots, the majority of which are line plots that do not have a digital version. Test engineers have to manually collect data from these plot images and log the data for further analysis. Digitizing the plot data in images reduces human effort, speeds up the process involved, and improves accuracy. State-of-the-art approaches require users to manually perform various activities associated with data extraction from graph images. However, manual intervention may be prone to human error, and the quality of the results of the data extraction depends on the experience and skill of the users performing the data extraction.
SUMMARY
[003] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a processor implemented method is provided. The method involves the following steps. Initially, a) a graph image, b) information on number of data points to be sampled, c) information on number of line plots within the graph image, and d) information on types of grids, are received as input data. The graph image is then pre-processed via the one or more hardware processors, to generate a binary image. Further, a first intermediate data is generated based on the binary image, via the one or more hardware processors. Generating the first intermediate data includes the following steps. Initially an x-portion image and a y-portion image are generated from the binary image, by performing an axis detection. Further, a plurality of grid positions are obtained from the binary image after separating the x-portion image and the y-portion image, by performing a grid detection based on the information on types of grids, wherein a plurality of grids at the plurality of grid positions are removed. Further, a set of scaling coefficients is generated as the first intermediate data by performing a scale detection on the x-portion image and the y-portion image after removing the grids. A second intermediate data is generated via the one or more hardware processors, based on the graph image. Generating the second intermediate data includes the following steps. Initially, it is determined whether the graph image comprises a single plot or multiple plots, based on the information on number of line plots within the graph image. Further, a plurality of grids in the graph image are removed, based on the obtained plurality of grid positions of the binary image.
Further, a plurality of plots in the graph image are isolated if the graph image is determined as comprising multiple plots. Further, based on the information on number of data points to be sampled, each plot in the graph image is sampled to generate a set of numerical data as the second intermediate data, wherein the set of numerical data represents actual data comprised in the graph image. Further, information corresponding to the graph image is extracted via the one or more hardware processors, by performing mapping between the set of numerical data in the second intermediate data and the set of scaling coefficients in the first intermediate data, wherein performing the mapping comprises converting the numerical data by using the set of scaling coefficients, wherein the converted numerical data is stored in a dictionary format.
[004] In another aspect, a system is provided. The system includes one or more hardware processors, a communication interface, and a memory storing a plurality of instructions, wherein the plurality of instructions cause the one or more hardware processors to receive a) a graph image, b) information on number of data points to be sampled, c) information on number of line plots within the graph image, and d) information on types of grids, as input data. The graph image is then pre-processed via the one or more hardware processors, to generate a binary image. Further, a first intermediate data is generated based on the binary image, via the one or more hardware processors. Generating the first intermediate data includes the following steps. Initially an x-portion image and a y-portion image are generated from the binary image, by performing an axis detection. Further, a plurality of grid positions are obtained from the binary image after separating the x-portion image and the y-portion image, by performing a grid detection based on the information on types of grids, wherein a plurality of grids at the plurality of grid positions are removed. Further, a set of scaling coefficients is generated as the first intermediate data by performing a scale detection on the x-portion image and the y-portion image after removing the grids. A second intermediate data is generated via the one or more hardware processors, based on the graph image. Generating the second intermediate data includes the following steps. Initially, it is determined whether the graph image comprises a single plot or multiple plots, based on the information on number of line plots within the graph image. Further, a plurality of grids in the graph image are removed, based on the obtained plurality of grid positions of the binary image. Further, a plurality of plots in the graph image are isolated if the graph image is determined as comprising multiple plots.
Further, based on the information on number of data points to be sampled, each plot in the graph image is sampled to generate a set of numerical data as the second intermediate data, wherein the set of numerical data represents actual data comprised in the graph image. Further, information corresponding to the graph image is extracted via the one or more hardware processors, by performing mapping between the set of numerical data in the second intermediate data and the set of scaling coefficients in the first intermediate data, wherein performing the mapping comprises converting the numerical data by using the set of scaling coefficients, wherein the converted numerical data is stored in a dictionary format.
[005] In yet another aspect, a non-transitory computer readable medium is provided. The non-transitory computer readable medium includes a plurality of instructions which, when executed, cause one or more hardware processors to perform the following steps for extracting data from a graph image. Initially, a) a graph image, b) information on number of data points to be sampled, c) information on number of line plots within the graph image, and d) information on types of grids, are received as input data. The graph image is then pre-processed via the one or more hardware processors, to generate a binary image. Further, a first intermediate data is generated based on the binary image, via the one or more hardware processors. Generating the first intermediate data includes the following steps. Initially an x-portion image and a y-portion image are generated from the binary image, by performing an axis detection. Further, a plurality of grid positions are obtained from the binary image after separating the x-portion image and the y-portion image, by performing a grid detection based on the information on types of grids, wherein a plurality of grids at the plurality of grid positions are removed. Further, a set of scaling coefficients is generated as the first intermediate data by performing a scale detection on the x-portion image and the y-portion image after removing the grids. A second intermediate data is generated via the one or more hardware processors, based on the graph image. Generating the second intermediate data includes the following steps. Initially, it is determined whether the graph image comprises a single plot or multiple plots, based on the information on number of line plots within the graph image. Further, a plurality of grids in the graph image are removed, based on the obtained plurality of grid positions of the binary image.
Further, a plurality of plots in the graph image are isolated if the graph image is determined as comprising multiple plots. Further, based on the information on number of data points to be sampled, each plot in the graph image is sampled to generate a set of numerical data as the second intermediate data, wherein the set of numerical data represents actual data comprised in the graph image. Further, information corresponding to the graph image is extracted via the one or more hardware processors, by performing mapping between the set of numerical data in the second intermediate data and the set of scaling coefficients in the first intermediate data, wherein performing the mapping comprises converting the numerical data by using the set of scaling coefficients, wherein the converted numerical data is stored in a dictionary format.
[006] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[007] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[008] FIG. 1 illustrates an exemplary system for extracting data from graph images, according to some embodiments of the present disclosure.
[009] FIG. 2 is a flow diagram depicting steps involved in the process of extracting data from graph images, using the system of FIG. 1, according to some embodiments of the present disclosure.
[010] FIG. 3 is a flow diagram depicting steps involved in the process of generating a first intermediate data, using the system of FIG. 1, according to some embodiments of the present disclosure.
[011] FIG. 4 is a flow diagram depicting steps involved in the process of generating a second intermediate data, using the system of FIG. 1, according to some embodiments of the present disclosure.
[012] FIG. 5 is a flow diagram depicting steps involved in the process of performing an axis detection by the system of FIG. 1, according to some embodiments of the present disclosure.
[013] FIG. 6 is a flow diagram depicting steps involved in the process of performing a scale detection by the system of FIG. 1, according to some embodiments of the present disclosure.
[014] FIG. 7 is a flow diagram depicting steps involved in the process of isolating a plurality of plots by processing the graph image, using the system of FIG. 1, according to some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[015] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[016] State-of-the-art graph data extraction approaches require the user to manually perform various activities associated with data extraction from graph images. However, manual intervention may be prone to human error, and the quality of the results of the data extraction depends on the experience and skill of the users performing the data extraction. This may lead to inconsistent quality.
[017] The system and method disclosed in the embodiments herein provide a means for data extraction from graph images. The system generates a set of scaling coefficients and a set of numerical data for a graph image collected as input, and then extracts information corresponding to the graph image by performing mapping between the set of numerical data and the set of scaling coefficients.
[018] Referring now to the drawings, and more particularly to FIG. 1 through FIG. 7, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[019] FIG. 1 illustrates an exemplary system for extracting data from graph images, according to some embodiments of the present disclosure. The system 100 includes or is otherwise in communication with hardware processors 102, at least one memory such as a memory 104, and an I/O interface 112. The hardware processors 102, the memory 104, and the Input/Output (I/O) interface 112 may be coupled by a system bus such as a system bus 108 or a similar mechanism. In an embodiment, the hardware processors 102 can be one or more hardware processors.
[020] The I/O interface 112 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, a printer, and the like. Further, the I/O interface 112 may enable the system 100 to communicate with other devices, such as web servers and external databases.
[021] The I/O interface 112 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite. For this purpose, the I/O interface 112 may include one or more ports for connecting several computing systems or devices with one another or to another server.
[022] The one or more hardware processors 102 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, node machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 102 are configured to fetch and execute computer-readable instructions stored in the memory 104.
[023] The memory 104 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, the memory 104 includes a plurality of modules 106.
[024] The plurality of modules 106 include programs or coded instructions that supplement applications or functions performed by the system 100 for executing different steps involved in the process of data extraction from graph images, being performed by the system 100. The plurality of modules 106, amongst other things, can include routines, programs, objects, components, and data structures, which perform particular tasks or implement particular abstract data types. The plurality of modules 106 may also be used as signal processor(s), node machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the plurality of modules 106 can be used by hardware, by computer-readable instructions executed by the one or more hardware processors 102, or by a combination thereof. The plurality of modules 106 can include various sub-modules (not shown). The plurality of modules 106 may include computer-readable instructions that supplement applications or functions performed by the system 100 for the data extraction from graph images.
[025] The data repository (or repository) 110 may include a plurality of abstracted pieces of code for refinement, and data that is processed, received, or generated as a result of the execution of the plurality of modules in the module(s) 106.
[026] Although the data repository 110 is shown internal to the system 100, it will be noted that, in alternate embodiments, the data repository 110 can also be implemented external to the system 100, where the data repository 110 may be stored within a database (repository 110) communicatively coupled to the system 100. The data contained within such external database may be periodically updated. For example, new data may be added into the database (not shown in FIG. 1) and/or existing data may be modified and/or non-useful data may be deleted from the database. In one example, the data may be stored in an external system, such as a Lightweight Directory Access Protocol (LDAP) directory and a Relational Database Management System (RDBMS). Functions of the components of the system 100 are now explained with reference to the steps in flow diagrams in FIG. 2 through FIG. 7.
[027] FIG. 2 is a flow diagram depicting steps involved in the process of extracting data from graph images, using the system of FIG. 1, according to some embodiments of the present disclosure. Steps in the method 200 are explained with reference to the components of the system 100. In an embodiment, the system 100 comprises one or more data storage devices or the memory 104 operatively coupled to the processor(s) 102 and is configured to store instructions for execution of steps of the method 200 by the processor(s) or one or more hardware processors 102. The steps of the method 200 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIG. 1 and the steps of the flow diagrams as depicted in FIG. 2 through FIG. 7. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods, and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
[028] At step 202 of the method 200, the system 100 receives via one or more hardware processors, a) a graph image, b) information on number of data points to be sampled, c) information on number of line plots within the graph image, and d) information on types of grids, as input data. Further, at step 204 of the method 200, the system 100 pre-processes via the one or more hardware processors, the graph image to generate a binary image. The binary image contains representation of the graph image in the form of binary data i.e., 0s and 1s.
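By way of a non-limiting illustration, the pre-processing of step 204 may be sketched as follows. The thresholding rule is not specified in the disclosure; the sketch assumes a fixed threshold of 128 on a grayscale image and an inverted mapping in which dark pixels (plot lines, axes, and text) become 255 and the light background becomes 0, consistent with the later search for sequences of 255s. The function name `to_binary` and the sample image are hypothetical.

```python
def to_binary(gray, threshold=128):
    """Inverted fixed threshold: dark pixels (ink) become 255, light background becomes 0.

    The threshold value and the inversion are assumptions made for illustration.
    """
    return [[255 if px < threshold else 0 for px in row] for row in gray]


# Tiny grayscale image: a dark vertical stroke meeting a dark horizontal stroke.
gray = [
    [250, 250, 250, 250],
    [250,  10, 250, 250],
    [250,  10, 250, 250],
    [250,  10,  12,  11],
]
binary = to_binary(gray)
for row in binary:
    print(row)
```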
[029] Further, at step 206 of the method 200, the system 100 generates via the one or more hardware processors, a set of scaling coefficients as a first intermediate data. Various steps involved in the process of generating the first intermediate data are depicted in method 300 in FIG. 3, and are explained hereafter. At step 206a of the method 300, the system 100 generates an x-portion image and a y-portion image from the binary image, by performing an axis detection. Various steps involved in the process of axis detection are depicted in method 500 of FIG. 5 and are explained hereafter.
[030] During the axis detection, at step 302 of the method 500, the system 100 identifies a vertical axis in the binary image by scanning the binary image. At this step, the system 100 skips a pre-configured portion from the right side of the binary image, and scans up to a certain percentage, say 25%, from the right edge of the image, until a thick line is detected within the scanned space of the binary image. The system 100 may use a suitable scanning approach such as max continuous ones. The max continuous ones algorithm returns the longest sequence of continuous ones from a set of sequences. As the axes are expected to be the longest lines in the image, this algorithm is suitable. Instead of searching for a sequence of ones, the max continuous ones algorithm checks for the longest sequence of 255s in the binary image. For each pixel position along the horizontal, the algorithm scans along the vertical direction for the longest sequence of 255s, and the sequence with the largest length is taken as the vertical axis. All pixels of the longest sequence share the same x coordinate and differ only in their y coordinates. The x value is stored as the x intercept of the vertical axis, and the minimum and maximum y values become the start and end points of the vertical axis.
[031] Further, at step 304 of the method 500, the system 100 identifies a horizontal axis in the binary image by scanning the binary image. At this step, the system 100 finds the horizontal axis by performing a scanning approach similar to that in step 302, but along the horizontal direction, keeping the search space as the entire binary image to accommodate changes in the position of the horizontal axis.
[032] Further, at step 306 of the method 500, the system 100 determines the portion of the binary image from the left side of the binary image to the vertical axis as the y-portion image, and further, at step 308 of the method 500, the system 100 determines the portion of the binary image from the horizontal axis to the bottom of the binary image as the x-portion image.
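As a minimal sketch of the max continuous ones approach described above, the following example finds the longest vertical run of 255s and reports its x intercept and its start and end points. The function names and the `x_from`/`x_to` parameters, which stand in for the pre-configured scan window, are hypothetical and not taken from the disclosure.

```python
def longest_run(values, target=255):
    """Return (start, length) of the longest run of `target` in a sequence."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, v in enumerate(values):
        if v == target:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


def find_vertical_axis(binary, x_from=0, x_to=None):
    """Scan columns in [x_from, x_to) and return (x, y_min, y_max) of the longest run."""
    width = len(binary[0])
    x_to = width if x_to is None else x_to
    best, best_len = None, 0
    for x in range(x_from, x_to):
        column = [row[x] for row in binary]
        start, length = longest_run(column)
        if length > best_len:
            best, best_len = (x, start, start + length - 1), length
    return best


binary = [
    [0, 255,   0,   0,   0],
    [0, 255,   0,   0,   0],
    [0, 255,   0, 255,   0],
    [0, 255, 255, 255, 255],
    [0,   0,   0,   0,   0],
]
axis = find_vertical_axis(binary)
print(axis)  # (x intercept, first row of the axis, last row of the axis)
```

The horizontal axis may be found the same way by passing each row, rather than each column, to `longest_run`.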
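For illustration only, the separation of the y-portion and x-portion images may be sketched as simple cropping once the axis positions are known. Whether the axis pixels themselves belong to either portion is not stated in the disclosure; the sketch assumes they are excluded. The function name and the sample image are hypothetical.

```python
def split_axis_portions(binary, axis_x, axis_y):
    """Crop the regions that hold the axis labels.

    y-portion: all columns to the left of the vertical axis (column axis_x).
    x-portion: all rows below the horizontal axis (row axis_y).
    Excluding the axis pixels themselves is an assumption.
    """
    y_portion = [row[:axis_x] for row in binary]
    x_portion = [row[:] for row in binary[axis_y + 1:]]
    return x_portion, y_portion


binary = [
    [  0, 255,   0,   0,   0],
    [255, 255,   0,   0,   0],  # a label pixel to the left of the vertical axis
    [  0, 255,   0,   0,   0],
    [  0, 255, 255, 255, 255],  # horizontal axis on row 3
    [  0,   0, 255,   0,   0],  # a label pixel below the horizontal axis
]
x_portion, y_portion = split_axis_portions(binary, axis_x=1, axis_y=3)
print(x_portion)
print(y_portion)
```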
[033] Referring to the step 206, at step 206b the system 100 obtains a plurality of grid positions from the binary image after separating the x-portion image and the y-portion image, by performing a grid detection based on the information on types of grids, and the plurality of grids at the plurality of grid positions are removed. Various steps involved in the process of grid detection are explained below.
[034] The input data may indicate that the type of grid is one of a continuous line grids type or a dashed grids type. For the continuous line grids type, it is assumed that the grids have a length which is almost the length of the data portion of the image. The system 100 uses a line detection mechanism similar to the axis detection algorithm, which tries to find a continuous sequence of 255s along the horizontal at each vertical pixel. Continuous lines with length greater than a certain value (for example, 70% of the horizontal axis length) are logged as a grid. There may be cases with small breakages in a grid. To handle such cases, a small threshold is taken as the breakage length. If the breakage length is greater than the set threshold, the entire line is removed. For detecting the vertical grids, the system 100 rotates the binary image by 90 degrees and then applies the line detection mechanism. The above processes ensure that straight lines within the actual plot data are not removed. During this process, the system 100 detects the start and end points of each grid line and creates a straight line connecting the two points, and further creates a mask for the grids along the vertical and horizontal axes.
[035] For the dashed grids type, the system 100 may use an appropriate technique such as island detection. The island detection is used to find isolated groups of the same values. The grids form isolated portions of 255s of almost similar lengths. The system 100 uses a depth-first search methodology to quickly arrive at a solution. The system 100 finds short sequences of 255s in the binary image along the horizontal whose lengths are above a first threshold and below a second threshold (for example, 2 and 10 pixels respectively). Once the grids along the horizontal are located at their positions along the vertical direction, masks of continuous lines are formed from the start to the end of each grid line. For detecting the grids along the vertical axis, the system 100 rotates the image by 90 degrees and applies the island detection approach. The masks are images having the size of the data image and having lines at the locations of the grids. The masks are then used to perform a bitwise AND on the data image to remove the grids.
[036] Referring to the step 206, at step 206c, the system 100 generates the set of scaling coefficients as the first intermediate data, by performing a scale detection on the x-portion image and the y-portion image after removing the grids. Various steps involved in the process of generating the set of scaling coefficients by performing the scale detection on the x-portion image and the y-portion image are depicted in method 600 in FIG. 6 and are explained hereafter.
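The continuous-line grid detection described above may be sketched as follows, under stated assumptions. A row is logged as a grid when the span from its first to its last 255 pixel is at least a fraction of the axis length and no break inside that span exceeds the breakage threshold. The 70% figure follows the example in the description; the breakage threshold of 2 pixels, the clockwise rotation direction, and all function and parameter names are hypothetical.

```python
def detect_horizontal_grids(binary, axis_length, min_fraction=0.7, max_gap=2):
    """Return (row, start, end) for each row that looks like a continuous grid line."""
    grids = []
    for y, row in enumerate(binary):
        on = [x for x, v in enumerate(row) if v == 255]
        if not on:
            continue
        start, end = on[0], on[-1]
        span = end - start + 1
        # Largest run of non-255 pixels between two consecutive 255 pixels.
        largest_gap = max((b - a - 1 for a, b in zip(on, on[1:])), default=0)
        if span >= min_fraction * axis_length and largest_gap <= max_gap:
            grids.append((y, start, end))
    return grids


def rotate_90(binary):
    """Rotate clockwise by 90 degrees so that vertical lines become horizontal."""
    return [list(col) for col in zip(*binary[::-1])]


image = [
    [0] * 10,
    [255] * 4 + [0] + [255] * 5,            # grid line with a 1-pixel break
    [0, 0, 255, 255, 255, 0, 0, 0, 0, 0],   # short plot segment, too short
    [255, 255] + [0] * 5 + [255] * 3,       # 5-pixel break, rejected
]
horizontal = detect_horizontal_grids(image, axis_length=10)

column_image = [[0, 255, 0] for _ in range(10)]  # one vertical grid line
vertical = detect_horizontal_grids(rotate_90(column_image), axis_length=10)
print(horizontal, vertical)
```

Each returned tuple carries the start and end points of a grid line, from which a straight mask line connecting the two points can be drawn.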
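A simplified sketch of the dashed-grid masking is given below. It swaps the depth-first island search for a plain row-wise run-length scan, which finds the same short horizontal sequences of 255s in this one-dimensional case. It also assumes that a row is treated as a dashed grid line when it contains at least `min_dashes` such sequences, and that the mask is inverted before the bitwise AND so that pixels under a grid line are cleared. These choices, and all names, are assumptions rather than details taken from the disclosure.

```python
def short_runs(row, low=2, high=10):
    """Return (start, end) of runs of 255 whose length is above `low` and below `high`."""
    runs, start = [], None
    for x, v in enumerate(list(row) + [0]):  # trailing 0 closes a run at the edge
        if v == 255 and start is None:
            start = x
        elif v != 255 and start is not None:
            if low < x - start < high:
                runs.append((start, x - 1))
            start = None
    return runs


def dashed_grid_mask(binary, min_dashes=3):
    """Mask holding a continuous 255 line wherever a row looks like a dashed grid line."""
    mask = [[0] * len(row) for row in binary]
    for y, row in enumerate(binary):
        runs = short_runs(row)
        if len(runs) >= min_dashes:
            for x in range(runs[0][0], runs[-1][1] + 1):
                mask[y][x] = 255
    return mask


def remove_grids(image, mask):
    """Bitwise AND with the inverted mask: pixels under a grid line become 0."""
    return [[p & (255 - m) for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


dashed = [255, 255, 255, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 0]
plot = [0, 0, 0, 0, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0]
image = [dashed, plot]
mask = dashed_grid_mask(image)
cleaned = remove_grids(image, mask)
print(cleaned)
```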
[037] At step 402 of the method 600, the system 100 detects value and position of a plurality of numerical markings on the horizontal axis and the vertical axis for the x-portion image and the y-portion image. In an embodiment, the system 100 may use any suitable technique such as but not limited to Optical Character Recognition (OCR) to detect the value and position of the plurality of numerical markings on the x-axis and y-axis. The OCR may be performed locally, for example using Tesseract, or by using any OCR API. Further, at step 404 of the method 600, the system 100 determines distance between two closest numerical markings from among the plurality of numerical markings, based on the detected value and position of the plurality of numerical markings. Further, at step 406 of the method 600, the system 100 divides the determined distance between the two closest numerical markings by a pixel distance along the vertical axis to generate a y-scale coefficient. Further, at step 408 of the method 600, the system 100 divides the determined distance between the two closest numerical markings by a pixel distance along the horizontal axis to generate an x-scale coefficient, wherein the x-scale coefficient and the y-scale coefficient form the set of scaling coefficients. During the process of generating the set of scaling coefficients, the system 100 is configured to find at least 3 points within the binary image, as an average-based operation is used to find the distance between the 2 closest points. The difference between the values at the 2 closest points, divided by the pixel distance between them along the vertical (y) axis or along the horizontal (x) axis, forms the respective scaling coefficient.
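The scale detection arithmetic may be illustrated as follows. The sketch assumes that OCR has already returned each numerical marking as a (value, pixel position) pair, and it interprets the average-based operation as the mean of the value-per-pixel ratios of adjacent markings; the disclosure does not spell out the averaging, so this interpretation, the function name, and the sample markings are assumptions.

```python
def scale_coefficient(markings):
    """Average value-per-pixel ratio between adjacent axis markings.

    markings: list of (value, pixel_position) pairs read from one axis.
    """
    if len(markings) < 3:
        raise ValueError("at least 3 markings are needed")
    ordered = sorted(markings, key=lambda m: m[1])
    ratios = [
        (v2 - v1) / (p2 - p1)
        for (v1, p1), (v2, p2) in zip(ordered, ordered[1:])
    ]
    return sum(ratios) / len(ratios)


# Hypothetical OCR results: (value, pixel position along the axis).
x_markings = [(0, 100), (10, 200), (20, 300)]
y_markings = [(0, 400), (5, 300), (10, 200)]  # image rows grow downward

x_scale = scale_coefficient(x_markings)
y_scale = scale_coefficient(y_markings)
print(x_scale, y_scale)
```

In this sketch the y-scale coefficient comes out negative because pixel rows increase downward while the plotted values increase upward.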
[038] Referring to method 200, at step 208 of the method 200, the system 100 generates, via the one or more hardware processors, a second intermediate data. Steps involved in the process of generating the second intermediate data are depicted in method 400 in FIG. 4 and are explained hereafter. At step 208a of the method 400, the system 100 determines whether the graph image includes a single plot or multiple plots, based on the information on number of line plots within the graph image. If the graph image includes only a single plot, then the graph image is processed directly at step 208d. If the graph image includes multiple plots, then at step 208b, the system 100 removes a plurality of grids in the graph image, based on the obtained plurality of grid positions of the binary image. Further, at step 208c, the system 100 isolates a plurality of plots in the graph image. Various steps involved in the process of isolating the plurality of plots are depicted in method 700 in FIG. 7 and are explained hereafter. At step 502 of the method 700, the system 100 generates an enhanced image corresponding to the graph image, by processing the graph image using one or more image processing techniques. The processing technique may be any suitable technique, such as but not limited to Contrast Limited Adaptive Histogram Equalization (CLAHE). Further, the enhanced image is converted to a Blue-Green-Red (BGR) image, at step 504 of the method 700. Further, at step 506 of the method 700, the system 100 applies a K-means clustering on the BGR image to generate a plurality of clusters, wherein the number of clusters is selected to be equal to the number of plots in the BGR image.
Further, at step 508 of the method 700, the system 100 replaces values in each of the plurality of clusters with data from the centre of the respective cluster, wherein the resulting image obtained after this replacement comprises line plots in which the number of colors is the same as the number of plots. Further, at step 510 of the method 700, the system 100 converts the resulting image to a grayscale image, and then at step 512 of the method 700, the system 100 applies a contour detection on the grayscale image to isolate the plurality of plots.
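The clustering and replacement of steps 506 and 508 may be sketched with a small pure-Python K-means over BGR tuples. The sketch is only illustrative: a practical system would more likely rely on an image processing library, the initialisation from the first k distinct colours is an assumption, and the enhancement, grayscale conversion, and contour detection steps are omitted. All names and the sample pixels are hypothetical.

```python
def kmeans_colors(pixels, k, iterations=10):
    """Cluster BGR tuples into k groups; return (centres, labels)."""
    # Assumed initialisation: the first k distinct colours encountered.
    centres = []
    for p in pixels:
        if p not in centres:
            centres.append(p)
        if len(centres) == k:
            break
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(len(centres)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in pixels
        ]
        # Update step: each centre becomes the mean of its members.
        for c in range(len(centres)):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centres, labels


def quantize(pixels, k):
    """Replace every pixel with the (rounded) centre of its cluster."""
    centres, labels = kmeans_colors(pixels, k)
    return [tuple(round(v) for v in centres[lab]) for lab in labels]


# Pixels from two plots drawn in slightly varying shades of two colours.
pixels = [(250, 10, 10), (10, 10, 250), (240, 20, 10), (20, 10, 240), (245, 15, 10)]
flattened = quantize(pixels, k=2)
print(flattened)
```

After the replacement, each plot is represented by a single colour, which is what allows the later contour detection to separate the plots.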
[039] At step 208d, the system 100 samples, based on the information on number of data points to be sampled, each plot in the graph image (i.e. the single plot if the graph image includes only a single plot, and each of the isolated plots if the graph image includes multiple plots) to generate a set of numerical data as the second intermediate data, wherein the second intermediate data represents actual data comprised in the graph image. The set of numerical data forms the second intermediate data. A sampling algorithm used by the system 100 may initially convert the graph image to an intermediate format that represents the actual data in the graph image. The sampling algorithm then detects independent value axis out of the two-axis present. The algorithm may consider the axis with greatest length amongst the two axes as the independent axis and sampling is done with respect to that. The number of samples input by user is used by the algorithm to find the sampling interval by dividing the length of independent axis by the number of samples. The sampling algorithm tries to find the intercept of the plot to the other axis at each point in the independent axis. The points along the independent axis are taken at the sampling intervals. For independent axis along the horizontal, the sampling algorithm takes a sample by finding the first pixel along the vertical with value 255, the pixel location at this point is taken as minimum and the pixel location at the location further up the vertical axis at which the value transitions from 255 to 0 is taken as maximum. The sampling algorithm either takes the maximum or minimum values one after the other for consecutive samples. The samples returned from the sampling algorithm are in pixel coordinates. This information is converted to actual numerical data by rescaling the values. A re-scaler algorithm maybe used for this purpose, by the system 100.
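A minimal sketch of the sampling for a horizontal independent axis is shown below. It assumes a binary image holding a single plot, scans each sampled column from the bottom upward, and reports the vertical position as the height in pixels above the bottom row; it also assumes that even-numbered samples take the minimum and odd-numbered samples the maximum. The function name, the coordinate convention, and the sample image are hypothetical.

```python
def sample_plot(binary, n_samples):
    """Sample a single-plot binary image along the horizontal (independent) axis.

    Returns (x, height) pairs in pixel units, where height is counted upward
    from the bottom row of the image.
    """
    height, width = len(binary), len(binary[0])
    step = width / n_samples  # sampling interval in pixels
    samples = []
    for i in range(n_samples):
        x = int(i * step)
        column = [binary[y][x] for y in range(height - 1, -1, -1)]  # bottom to top
        if 255 not in column:
            continue  # no plot pixel in this column
        low = column.index(255)  # first 255 seen from the bottom: the minimum
        high = low
        while high + 1 < height and column[high + 1] == 255:
            high += 1  # last 255 before the 255 -> 0 transition: the maximum
        # Alternate between the minimum and the maximum on consecutive samples.
        samples.append((x, low if i % 2 == 0 else high))
    return samples


# A rising line two pixels thick.
binary = [
    [  0,   0,   0, 255],
    [  0,   0, 255, 255],
    [  0, 255, 255,   0],
    [255, 255,   0,   0],
    [255,   0,   0,   0],
]
samples = sample_plot(binary, n_samples=4)
print(samples)
```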
[040] Referring to the method 200, at step 210 of the method 200, the system 100 extracts, via the one or more hardware processors, information corresponding to the graph image, by performing mapping between the set of numerical data in the second intermediate data and the set of scaling coefficients in the first intermediate data, wherein performing the mapping comprises converting numerical data in the set of numerical data by using the set of scaling coefficients. At this step, the x and y coordinates are multiplied by the respective scaling coefficients to obtain the actual values, which are then stored in a dictionary format.
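By way of illustration only, the mapping at step 210 may be sketched as follows. The function name to_actual_values and the layout of the dictionary (one list per axis under the keys "x" and "y") are assumptions of the sketch; the description states only that the converted values are stored in a dictionary format.

```python
from typing import Dict, List, Tuple


def to_actual_values(samples: List[Tuple[float, float]],
                     x_scale: float, y_scale: float) -> Dict[str, List[float]]:
    """Multiply pixel coordinates by the scaling coefficients and store them in a dict."""
    return {
        "x": [x * x_scale for x, _ in samples],  # x pixel coordinate times x-scale coefficient
        "y": [y * y_scale for _, y in samples],  # y pixel coordinate times y-scale coefficient
    }
```

For example, with an x-scale coefficient of 0.5 and a y-scale coefficient of 2.0, the pixel sample (10, 4) is converted to the actual values 5.0 and 8.0.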
[041] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[042] The embodiments of the present disclosure herein address the unresolved problem of data extraction from graph images. The embodiments thus provide a method and system for data extraction from graph images.
[043] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[044] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[045] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[046] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[047] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
Claims:
1. A processor implemented method (200), comprising:
receiving (202), via one or more hardware processors, a) a graph image, b) information on number of data points to be sampled, c) information on number of line plots within the graph image, and d) information on types of grids in the graph image, as input data;
pre-processing (204), via the one or more hardware processors, the graph image to generate a binary image;
generating (206), via the one or more hardware processors, a first intermediate data based on the binary image, wherein generating the first intermediate data comprises:
generating (206a) an x-portion image and a y-portion image from the binary image, by performing an axis detection;
obtaining (206b) a plurality of grid positions from the binary image after separating the x-portion image and the y-portion image, by performing a grid detection based on the information on types of grids, wherein a plurality of grids at the plurality of grid positions are removed; and
generating (206c) a set of scaling coefficients as the first intermediate data, by performing a scale detection on the x-portion image and the y-portion image after removing the grids;
generating (208), via the one or more hardware processors, a second intermediate data based on the graph image, wherein generating the second intermediate data comprises:
determining (208a) whether the graph image comprises a single plot or multiple plots, based on the information on number of line plots within the graph image;
removing (208b) a plurality of grids in the graph image, based on the obtained plurality of grid positions of the binary image;
isolating (208c) a plurality of plots in the graph image if the graph image is determined as comprising the multiple plots; and
sampling (208d), based on the information on number of data points to be sampled, each plot in the graph image to generate a set of numerical data as the second intermediate data, wherein the set of numerical data represents actual data comprised in the graph image; and
extracting (210), via the one or more hardware processors, information corresponding to the graph image, by performing mapping between the set of numerical data in the second intermediate data and the set of scaling coefficients in the first intermediate data, wherein performing the mapping comprises converting numerical data in the set of numerical data by using the set of scaling coefficients, wherein the converted numerical data is stored in a dictionary format.
2. The method as claimed in claim 1, wherein performing the axis detection comprises:
identifying (302) a vertical axis in the binary image by scanning the binary image;
identifying (304) a horizontal axis in the binary image by scanning the binary image;
determining (306) portion of the binary image from left side of the binary image to the vertical axis as the y-portion image; and
determining (308) portion of the binary image from the horizontal axis to bottom of the binary image as the x-portion image.
3. The method as claimed in claim 1, wherein performing the scale detection on the x-portion image and the y-portion image comprises:
detecting (402) value and position of a plurality of numerical markings on a horizontal axis and a vertical axis for the x-portion image and the y-portion image;
determining (404) distance between two closest numerical markings from among the plurality of numerical markings, based on the detected value and position of the plurality of numerical markings;
dividing (406) the determined distance between the two closest numerical markings by a pixel distance along the vertical axis to generate a y-scale coefficient; and
dividing (408) the determined distance between the two closest numerical markings by a pixel distance along the horizontal axis to generate an x-scale coefficient,
wherein the x-scale coefficient and the y-scale coefficient form the set of scaling coefficients.
4. The method as claimed in claim 1, wherein isolating the plurality of plots in the graph image, comprises:
generating (502) an enhanced image corresponding to the graph image, by processing the graph image using one or more image processing techniques;
converting (504) the enhanced image to Blue-Green-Red (BGR) image;
applying (506) K-means clustering on the BGR image to generate a plurality of clusters, wherein number of clusters during the clustering is equal to number of plots in the BGR image;
replacing (508) values in each of the plurality of clusters with data from centre of respective clusters, wherein a resulting image obtained after replacing the values in each of the plurality of clusters with data from centre of respective clusters comprises line plots with number of colors same as that of number of plots;
converting (510) the resulting image to a grayscale image; and
applying (512) contour detection on the grayscale image to isolate the plurality of plots.
5. A system (100), comprising:
one or more hardware processors (102);
a communication interface (112); and
a memory (104) storing a plurality of instructions, wherein the plurality of instructions cause the one or more hardware processors to:
receive a) a graph image, b) information on number of data points to be sampled, c) information on number of line plots within the graph image, and d) information on types of grids in the graph image, as input data;
pre-process the graph image to generate a binary image;
generate a first intermediate data based on the binary image, by:
generating an x-portion image and a y-portion image from the binary image, by performing an axis detection;
obtaining a plurality of grid positions from the binary image after separating the x-portion image and the y-portion image, by performing a grid detection based on the information on types of grids, wherein a plurality of grids at the plurality of grid positions are removed; and
generating a set of scaling coefficients as the first intermediate data, by performing a scale detection on the x-portion image and the y-portion image after removing the grids;
generate a second intermediate data based on the graph image, by:
determining whether the graph image comprises a single plot or multiple plots, based on the information on number of line plots within the graph image;
removing a plurality of grids in the graph image, based on the obtained plurality of grid positions of the binary image;
isolating a plurality of plots in the graph image if the graph image is determined as comprising the multiple plots; and
sampling, based on the information on number of data points to be sampled, each plot in the graph image to generate a set of numerical data as the second intermediate data, wherein the set of numerical data represents actual data comprised in the graph image; and
extract information corresponding to the graph image, by performing mapping between the set of numerical data in the second intermediate data and the set of scaling coefficients in the first intermediate data, wherein performing the mapping comprises converting numerical data in the set of numerical data by using the set of scaling coefficients, wherein the converted numerical data is stored in a dictionary format.
6. The system as claimed in claim 5, wherein the one or more hardware processors are configured to perform the axis detection, by:
identifying a vertical axis in the binary image by scanning the binary image;
identifying a horizontal axis in the binary image by scanning the binary image;
determining portion of the binary image from left side of the binary image to the vertical axis as the y-portion image; and
determining portion of the binary image from the horizontal axis to bottom of the binary image as the x-portion image.
7. The system as claimed in claim 5, wherein the one or more hardware processors are configured to perform the scale detection on the x-portion image and the y-portion image, by:
detecting value and position of a plurality of numerical markings on a horizontal axis and a vertical axis for the x-portion image and the y-portion image;
determining distance between two closest numerical markings from among the plurality of numerical markings, based on the detected value and position of the plurality of numerical markings;
dividing the determined distance between the two closest numerical markings by a pixel distance along the vertical axis to generate a y-scale coefficient; and
dividing the determined distance between the two closest numerical markings by a pixel distance along the horizontal axis to generate an x-scale coefficient,
wherein the x-scale coefficient and the y-scale coefficient form the set of scaling coefficients.
8. The system as claimed in claim 5, wherein the one or more hardware processors are configured to isolate the plurality of plots in the graph image, by:
generating an enhanced image corresponding to the graph image, by processing the graph image using one or more image processing techniques;
converting the enhanced image to Blue-Green-Red (BGR) image;
applying K-means clustering on the BGR image to generate a plurality of clusters, wherein number of clusters during the clustering is equal to number of plots in the BGR image;
replacing values in each of the plurality of clusters with data from centre of respective clusters, wherein a resulting image obtained after replacing the values in each of the plurality of clusters with data from centre of respective clusters comprises line plots with number of colors same as that of number of plots;
converting the resulting image to a grayscale image; and
applying contour detection on the grayscale image to isolate the plurality of plots.
| # | Name | Date |
|---|---|---|
| 1 | 202221047400-STATEMENT OF UNDERTAKING (FORM 3) [19-08-2022(online)].pdf | 2022-08-19 |
| 2 | 202221047400-REQUEST FOR EXAMINATION (FORM-18) [19-08-2022(online)].pdf | 2022-08-19 |
| 3 | 202221047400-FORM 18 [19-08-2022(online)].pdf | 2022-08-19 |
| 4 | 202221047400-FORM 1 [19-08-2022(online)].pdf | 2022-08-19 |
| 5 | 202221047400-FIGURE OF ABSTRACT [19-08-2022(online)].pdf | 2022-08-19 |
| 6 | 202221047400-DRAWINGS [19-08-2022(online)].pdf | 2022-08-19 |
| 7 | 202221047400-DECLARATION OF INVENTORSHIP (FORM 5) [19-08-2022(online)].pdf | 2022-08-19 |
| 8 | 202221047400-COMPLETE SPECIFICATION [19-08-2022(online)].pdf | 2022-08-19 |
| 9 | 202221047400-FORM-26 [02-11-2022(online)].pdf | 2022-11-02 |
| 10 | Abstract1.jpg | 2022-11-29 |
| 11 | 202221047400-Proof of Right [21-12-2022(online)].pdf | 2022-12-21 |
| 12 | 202221047400-FER.pdf | 2025-06-02 |
| 13 | 202221047400-FORM 3 [14-07-2025(online)].pdf | 2025-07-14 |
| 1 | SearchHistoryE_14-10-2024.pdf | |