
A System And Method For Proofreading Integrated Object Having Text And Images

Abstract: The present invention discloses a system (100) and a method (700) for proofreading objects in documents. The system (100) includes a user device (102) monitored by a user (104) and a document analysis unit (106). The document analysis unit (106) is configured to receive a source image and a target image of a document from the user device (102) and to detect a plurality of objects on the received source image and the target image of the document based on a predetermined plurality of object classes. The document analysis unit (106) then assigns a label to each of the detected plurality of objects. The document analysis unit (106) is further configured to compare the source image and the target image to determine dissimilarities between the source image and the target image. Also, the document analysis unit (106) is configured to output the labelled objects and the dissimilarities by highlighting them in pixel coordinates at the user device (102). FIG. 1


Patent Information

Application #:
Filing Date: 23 November 2022
Publication Number: 22/2024
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email:
Parent Application:

Applicants

Karomi Technology Private Limited
VBC Solitaire Building, 9th Floor, 47 & 49, Bazullah Rd, Parthasarathi Puram, T. Nagar, Chennai, Tamil Nadu, 600017, India

Inventors

1. Sreenivas Narasimha Murali
G1 Urvi, Voora Prithvi Apartments, 141 Kamarajar Salai, Kottivakkam, Chennai - 600041

Specification

A System and Method For Proofreading Integrated Object Having Text and Images

FIELD OF INVENTION
The present invention relates to proofreading objects in documents having both text and images. More specifically, the present invention relates to a system and a method for proofreading objects in documents having both text and images that integrate visual and textual concepts for document analysis and proofreading.

BACKGROUND OF INVENTION
Decomposing the images and text of document pages into high-level semantic regions is fundamental for intelligent document understanding and editing in any industry. Further, document analysis and proofreading in any industry involve identifying and analysing various objects found on the documents, such as images, figures, text, tables, barcodes, symbols, etc. Object detection in documents is a very challenging task, as the objects found in documents may vary in size, texture, layout and various other features. It is also a critical task for a variety of document image analysis, document editing, content understanding and proofreading applications.

Most known industries and most existing techniques focus on only one aspect, extracting either text or visual information from documents. However, documents may contain a wide variety of objects that need to be analysed in order to perform proper document analysis and proofreading. Therefore, the known industries and techniques may not provide proper document analysis and proofreading, as some of the objects present on the documents may be missed or not considered by such techniques, thereby failing to provide effective and efficient results. Therefore, there is a need for a system and a method for proofreading objects in documents having both text and images that integrate visual and textual concepts for document analysis and proofreading. Further, there is a need for a system and a method for automatically proofreading objects in documents having both text and images, without any human intervention.

OBJECT OF INVENTION
The object of the present invention is to provide a system and a method for proofreading objects in documents having both text and images that integrate visual and textual concepts for document analysis and proofreading. More specifically, the object of the present invention is to provide a method and a system for automatically proofreading objects in documents having both text and images, without any human intervention.

SUMMARY
The present application discloses a system for proofreading objects in documents. The system includes a user device monitored by a user, and a document analysis unit. The document analysis unit is configured to detect a plurality of objects on a source image and a target image of a document and to provide dissimilarities between the source image and the target image based on the plurality of objects detected. The document analysis unit includes an object detection unit, an image comparison unit, a text comparison unit, and an output unit.

The object detection unit is configured to receive a source image and a target image of a document from a user device, and to detect a plurality of objects on the received source image and the target image of the document based on a predetermined plurality of object classes. The object detection unit is further configured to assign a label to each of the detected plurality of objects and generate a confidence score and a bounding box for each of the labelled plurality of objects.

The image comparison unit is also configured to receive the source image and the target image of the document from the user device and to align the target image with the source image to generate a wrapped target image when the source image and the target image do not have alignment similarities. The image comparison unit is further configured to analyze the source image and the wrapped target image to determine if the source image and the wrapped target image are similar based on a plurality of first predefined thresholds.

Further, the image comparison unit is configured to compare a plurality of pixel values of the source image and a plurality of pixel values of the wrapped target image to determine dissimilarities between the source image and the wrapped target image if the source image and the wrapped target image are not similar, wherein the image comparison unit determines dissimilarities between the source image and the wrapped target image as a set of contour points or bounding box coordinates.

The text comparison unit is also configured to receive the source image and the target image of the document from the user device and analyze the received source image and the target image to determine a number of pages of the received source image and the target image. The text comparison unit is further configured to extract text from each determined page of the source image and for each determined page of the target image based on a plurality of second predefined thresholds, and capture location index of the extracted text. Also, the text comparison unit is configured to extract a plurality of tables from each determined page of the source image and for each determined page of the target image.

Further, the text comparison unit is configured to form blocks for the extracted text and each of the extracted plurality of tables in the source image and the target image based on the extracted text and a plurality of predefined criteria. The text comparison unit is configured to find a plurality of dissimilarities between the source image and the target image by comparing each of the target image blocks inside each of the source image blocks based on a plurality of third predefined thresholds. Also, the text comparison unit is configured to draw a bounding box around dissimilar source image blocks and dissimilar target image blocks and to categorize each of the dissimilar source image blocks and the dissimilar target image blocks with a dissimilarity class.

The output unit is configured to receive the plurality of labelled objects along with their confidence score and bounding box from the object detection unit, the set of contour points or the bounding box coordinates representing dissimilarities between the source image and the wrapped target image from the image comparison unit, and the categorized dissimilar source image blocks and the dissimilar target image blocks from the text comparison unit. Further, the output unit is configured to output the labelled objects and the dissimilarities by highlighting them in pixel coordinates at the user device.

The present disclosure further discloses a method for proofreading objects in documents. The method includes receiving, by an object detection unit, an image comparison unit and a text comparison unit, a source image and a target image of a document from a user device. The method includes detecting, by the object detection unit, a plurality of objects on the received source image and the target image of the document based on a predetermined plurality of object classes. The method further includes assigning, by the object detection unit, a label to each of the detected plurality of objects. Also, the method includes generating, by the object detection unit, a confidence score and a bounding box for each of the labelled plurality of objects.

Further, the method includes aligning, by the image comparison unit, the target image with the source image to generate a wrapped target image when the source image and the target image do not have alignment similarities. The method includes analyzing, by the image comparison unit, the source image and the wrapped target image to determine if the source image and the wrapped target image are similar based on a plurality of first predefined thresholds. The method further includes comparing, by the image comparison unit, a plurality of pixel values of the source image and a plurality of pixel values of the wrapped target image to determine dissimilarities between the source image and the wrapped target image if the source image and the wrapped target image are not similar, wherein the image comparison unit determines dissimilarities between the source image and the wrapped target image as a set of contour points or bounding box coordinates.

Furthermore, the method includes analyzing, by the text comparison unit, the received source image and the target image to determine a number of pages of the received source image and the target image. The method includes extracting, by the text comparison unit, text from each determined page of the source image and for each determined page of the target image based on a plurality of second predefined thresholds and capturing location index of the extracted text. Also, the method includes extracting, by the text comparison unit, a plurality of tables from each determined page of the source image and for each determined page of the target image.

Also, the method includes forming, by the text comparison unit, blocks for the extracted text and each of the extracted plurality of tables in the source image and the target image based on the extracted text and a plurality of predefined criteria. The method includes finding, by the text comparison unit, a plurality of dissimilarities between the source image and the target image by comparing each of the target image blocks inside each of the source image blocks based on a plurality of third predefined thresholds.

Further, the method includes drawing, by the text comparison unit, a bounding box around dissimilar source image blocks and dissimilar target image blocks. The method includes categorizing, by the text comparison unit, each of the dissimilar source image blocks and the dissimilar target image blocks with a dissimilarity class.

Furthermore, the method includes receiving, by an output unit, the plurality of labelled objects along with their confidence score and bounding box from the object detection unit, the set of contour points or the bounding box coordinates representing dissimilarities between the source image and the wrapped target image from the image comparison unit, and the categorized dissimilar source image blocks and the dissimilar target image blocks from the text comparison unit. The method includes outputting, by the output unit, the labelled objects and the dissimilarities by highlighting them in pixel coordinates at the user device.

BRIEF DESCRIPTION OF DRAWINGS
The novel features and characteristics of the disclosure are set forth in the description. The disclosure itself, however, as well as a preferred mode of use, further objectives, and advantages thereof, will best be understood by reference to the following description of an illustrative embodiment when read in conjunction with the accompanying drawings. One or more embodiments are now described, by way of example only, with reference to the accompanying drawings wherein like reference numerals represent like elements and in which:

FIG. 1 illustrates a system 100 for proofreading objects in documents, in accordance with an embodiment of the present disclosure.
FIG. 2 illustrates an exemplary source image 202 and target image 204 of a document 200 inputted by the user 104, in accordance with an embodiment of the present disclosure.
FIG. 3A illustrates an exemplary nutritional table present on a document, in accordance with an embodiment of the present disclosure.
FIG. 3B illustrates an exemplary barcode present on a document, in accordance with an embodiment of the present disclosure.
FIG. 3C illustrates an exemplary nutri-score present on a document, in accordance with an embodiment of the present disclosure.
FIG. 3D illustrates an exemplary front of panel declaration (FOP) present on a document, in accordance with an embodiment of the present disclosure.
FIG. 3E illustrates an exemplary plurality of lines present on a document, in accordance with an embodiment of the present disclosure.
FIG. 3F illustrates an exemplary symbol present on a document, in accordance with an embodiment of the present disclosure.
FIG. 3G illustrates an exemplary image present on a document, in accordance with an embodiment of the present disclosure.
FIG. 3H illustrates an exemplary text present on a document, in accordance with an embodiment of the present disclosure.
FIG. 3I illustrates exemplary pantone colours present on a document, in accordance with an embodiment of the present disclosure.
FIG. 4 illustrates an exemplary display screen 400 illustrating the labelled objects detected on a source image 402 and a target image 404, in accordance with an embodiment of the present disclosure.
FIG. 5 illustrates an exemplary reverse matching method used in the Help-loc technique, in accordance with an embodiment of the present disclosure.
FIG. 6 illustrates an exemplary display screen 600 illustrating dissimilar source image blocks and dissimilar target image blocks categorized by the text comparison unit 112, in accordance with an embodiment of the present disclosure.
FIG. 7 illustrates a method 700 for proofreading objects in documents, in accordance with an embodiment of the present disclosure.

The figures depict embodiments of the disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the assemblies, structures and methods illustrated herein may be employed without departing from the principles of the disclosure described herein.

DETAILED DESCRIPTION
The best and other modes for carrying out the present invention are presented in terms of the embodiments, herein depicted in drawings provided. The embodiments are described herein for illustrative purposes and are subject to many variations. It is understood that various omissions and substitutions of equivalents are contemplated as circumstances may suggest or render expedient, but are intended to cover the application or implementation without departing from the spirit or scope of the present invention. Further, it is to be understood that the phraseology and terminology employed herein are for the purpose of the description and should not be regarded as limiting. Any heading utilized within this description is for convenience only and has no legal or limiting effect.

The terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items.

The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method. Similarly, one or more sub-systems or elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other, sub-systems, elements, structures, components, additional sub-systems, additional elements, additional structures or additional components. Appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but not necessarily do, all refer to the same embodiment.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this invention belongs. The system, methods, and examples provided herein are only illustrative and not intended to be limiting.

Embodiments of the present invention will be described below in detail with reference to the accompanying figures.

The present invention focuses on providing a system and a method for proofreading objects in documents having both text and images for document analysis and proofreading. Decomposing the images and text of document pages into high-level semantic regions is fundamental for intelligent document understanding and editing in any industry. Further, document analysis and proofreading in any industry involve identifying and analysing various objects found on the documents, such as images, figures, text, tables, barcodes, symbols, etc. Object detection in documents is a very challenging task, as the objects found in documents may vary in size, texture, layout and various other features.

Further, most known industries and most existing techniques focus on only one aspect, extracting either text or visual information from documents. However, documents may contain a wide variety of objects that need to be analysed in order to perform proper document analysis and proofreading. Therefore, the known industries and techniques may not provide proper document analysis and proofreading, as some of the objects present on the documents may be missed or not considered by such techniques, thereby failing to provide effective and efficient results.

Therefore, the present disclosure provides a system and a method for proofreading objects in documents having both text and images that integrate visual and textual concepts for document analysis and proofreading. Further, the present disclosure provides a system and a method for automatically proofreading objects in documents having both text and images, without any human intervention.

FIG. 1 illustrates a system 100 for proofreading objects in documents, in accordance with an embodiment of the present disclosure. The system 100 includes a user device 102 monitored by a user 104, and a document analysis unit 106. The user device 102 includes hardware components such as a keyboard, mouse, etc., which accept data from the user 104, and a hardware component such as the display screen of a desktop, laptop, tablet, etc., which displays data to the user 104. The user device 102 is configured to allow the user 104 to input a source image and a target image of a document. The source image and the target image are images of the same document.

In an embodiment of the present disclosure, a document may be an artwork related to packaging goods, a leaflet for packaging medicines, or any other packaging material used for packaging goods and having various objects. In another embodiment of the present disclosure, a document may be any document capable of being scanned and having various objects. The objects may include, but are not limited to, text, images, figures, tables, barcodes, symbols, nutri-scores, lines, etc. In an embodiment of the present disclosure, a source image may be a scanned image of a master document that has been prepared and digitally approved by the user 104, and a target image may be a copy or a scanned copy of a printed version of the same digitally approved master document. The source image and the target image are images of the same digitally approved master document.

The user 104 may be, but is not limited to, an official of an industry responsible for preparing the document, an employee of an industry monitoring the printing of documents, a third party who may have received an order for preparing the document, a person at a printing unit who may have received a printing order for documents, etc.

FIG. 2 illustrates an exemplary source image 202 and target image 204 of a document 200 inputted by the user 104, in accordance with an embodiment of the present disclosure. As illustrated in FIG. 2, the source image 202 and the target image 204 include various objects such as text, images, barcodes, symbols, etc. Also, the source image 202 and the target image 204 are images of the same document.

The user device 102 is further configured to send the source image and the target image to the document analysis unit 106. The document analysis unit 106 is a hardware component capable of processing any data or information it receives. In certain embodiments, the document analysis unit 106 may be part of any commonly used device, such as a laptop, desktop, tablet, mobile device, etc. The document analysis unit 106 is configured to detect a plurality of objects on the source image and the target image and to provide dissimilarities between the source image and the target image based on the plurality of objects detected. The document analysis unit 106 includes an object detection unit 108, an image comparison unit 110, a text comparison unit 112, and an output unit 114.

The object detection unit 108 is configured to receive the source image and the target image from the user device 102 and to detect a plurality of objects on the received source image and the target image based on a predetermined plurality of object classes. In an embodiment of the present disclosure, the object detection unit 108 uses the following predetermined plurality of object classes for detecting the plurality of objects on the received source image and the target image:
artworks – an artwork may include packaging information such as graphics, logos, barcodes and textual information.
tables – tables may include nutritional tables (for the USA and Europe food markets) and supplementary tables. A nutritional table may include information about selected basic foods, weight in grams, calories, and the amount of protein, carbohydrates, dietary fiber, fat, and saturated fat for each food. FIG. 3A illustrates an exemplary nutritional table present on a document, in accordance with an embodiment of the present disclosure.
barcodes – a barcode is a way of representing data in a machine-readable form. FIG. 3B illustrates an exemplary barcode present on a document, in accordance with an embodiment of the present disclosure.
nutri-scores – a nutri-score indicates the nutritional quality of food and beverages and may use 5 different colours for categorizing packaged food products. FIG. 3C illustrates an exemplary nutri-score present on a document, in accordance with an embodiment of the present disclosure.
Front of panel declarations (FOP) – an FOP is a key policy tool for regulating food products and indicating to consumers the excessive amounts of sugars, total fats, saturated fats, trans fats, and sodium in a food product. FIG. 3D illustrates an exemplary front of panel declaration (FOP) present on a document, in accordance with an embodiment of the present disclosure.
lines – a line may include a boundary line which separates colour and monochromatic areas or differently coloured areas of printing. A document may include different types of lines, and a line also helps to differentiate between a front panel and a back panel of a document. FIG. 3E illustrates an exemplary plurality of lines present on a document, in accordance with an embodiment of the present disclosure.
symbols – symbols span from food allergy symbols to recycling information, hazard symbols, vegan symbols, safety standard symbols, etc. Symbols provide important information to customers and help comply with government regulations. FIG. 3F illustrates an exemplary symbol present on a document, in accordance with an embodiment of the present disclosure.
images – a document may include multiple images to draw the attention of a customer to that particular document. FIG. 3G illustrates an exemplary image present on a document, in accordance with an embodiment of the present disclosure.
text – text may include any text present on a document that provides customers with information about the document. FIG. 3H illustrates an exemplary text present on a document, in accordance with an embodiment of the present disclosure.
others – others may include anything present on a document that is not categorized into any of the above-mentioned classes, for example Pantone colours. FIG. 3I illustrates exemplary Pantone colours present on a document, in accordance with an embodiment of the present disclosure.
In another embodiment of the present disclosure, the object detection unit 108 may use any other object classes known for detecting the plurality of objects on the received source image and the target image.

In an embodiment of the present disclosure, the object detection unit 108 may use a transfer learning (TL) model for detecting the plurality of objects on the received source image and the target image. Transfer learning (TL) is a methodology in which a pre-trained model is used to transfer knowledge already gained to a new task in order to improve learning. For example, the object detection unit 108 uses a Faster RCNN ResNet50 transfer learning model, because the learned features (edges, colour) of a TL model are also features of document objects. Also, a TL model gives accurate and reliable results on a small dataset. Thus, retraining the last layer of the TL model helps in achieving semi-automatic learning for accurate object detection in documents.

Further, the TL model is trained for the object classes mentioned above on a small dataset (with only 1,700 samples), where each sample is stored as an XML (extensible markup language) file for better processing. However, if new samples of documents are used, which may or may not include new object classes, the new samples may be added to the TL model. Once the new samples are added to the TL model, an XML file for the added samples is generated for further processing and saved in the dataset.
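
As an illustration only (the specification does not tie the TL model to a particular library), the following sketch shows how a pre-trained Faster R-CNN ResNet-50 detector could be adapted to document object classes. The class list, library choice and function names are assumptions, not the claimed implementation.

```python
# Hedged sketch: adapting a COCO-pre-trained Faster R-CNN ResNet-50 (torchvision)
# to the document object classes described above. Class names are assumptions.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

DOCUMENT_CLASSES = ["__background__", "artwork", "table", "barcode", "nutri-score",
                    "fop", "line", "symbol", "image", "text", "other"]

def build_document_detector(num_classes=len(DOCUMENT_CLASSES)):
    # Load a detector pre-trained on COCO so that low-level features
    # (edges, colour) transfer to document objects.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    # Replace the final prediction head with one sized for the document classes;
    # this is the layer retrained on the small (about 1,700 sample) dataset.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model
```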

After detecting the plurality of objects on the received source image and the target image, the object detection unit 108 is configured to assign a label to each of the detected plurality of objects based on the plurality of object classes. For example, if an object found on the received source image or the target image is a symbol, the object detection unit 108 labels it as “Symbol”. If an object found on the received source image or the target image is a text, the object detection unit 108 labels it as “Text”.

The object detection unit 108 is further configured to generate a confidence score for each of the labelled plurality of objects. In an embodiment of the present disclosure, a confidence score may be a number that lies between 0 and 1 and represents how likely it is that the assigned label is correct. In another embodiment of the present disclosure, a confidence score may be any number. Once the confidence score has been generated, the object detection unit 108 generates a bounding box for each of the labelled plurality of objects.
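
Purely as a sketch of this step (not the actual implementation), the snippet below runs such a detector on one image tensor and keeps the label, confidence score in [0, 1] and bounding box for each detection above an assumed score threshold.

```python
import torch

def detect_objects(model, image_tensor, score_threshold=0.5):
    # image_tensor: float tensor of shape (3, H, W) in [0, 1].
    # The 0.5 score threshold is an assumption, not a value from the specification.
    model.eval()
    with torch.no_grad():
        output = model([image_tensor])[0]
    detections = []
    for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
        if float(score) >= score_threshold:
            detections.append({
                "label_index": int(label),               # mapped to a class name such as "Symbol"
                "confidence": float(score),              # number between 0 and 1
                "bounding_box": [float(v) for v in box], # (x1, y1, x2, y2) in pixels
            })
    return detections
```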

FIG. 4 illustrates an exemplary display screen 400 illustrating the labelled objects detected on a source image 402 and a target image 404, in accordance with an embodiment of the present disclosure. As illustrated in FIG. 4, the detected labelled objects are displayed as highlighted contour points or bounding box coordinates.

The image comparison unit 110 also receives the source image and the target image from the user device 102. In an embodiment, the image comparison unit 110 may receive the source image and the target image in the form of a PDF image. In another embodiment, the image comparison unit 110 may receive the source image and the target image in a standard image format such as .png / .jpeg. In yet another embodiment, the image comparison unit 110 may receive the source image and the target image in any format known.

The image comparison unit 110 is configured to align the target image with the source image. Any two images can be compared perfectly only if they are aligned with each other. Image alignment, or image registration, is a process of wrapping images so that there is a perfect line-up between two images. Therefore, the image comparison unit 110 uses an image alignment technique to align the target image with the source image. In an image alignment technique, a sparse set of features is detected in one image and matched with the features in the other image. The image alignment technique also identifies interesting stable points, known as key points or feature points, in an image. The key points or feature points are similar to the points that a human notices when he/she sees that image for the first time.

In an embodiment of the present disclosure, the image comparison unit 110 uses an Oriented FAST and Rotated BRIEF (ORB) algorithm to detect the key points or feature points in the source image and the target image. ORB is a combination of two algorithms: FAST (Features from Accelerated Segments Test) and BRIEF (Binary Robust Independent Elementary Features). FAST identifies key points on the source image and the target image that are stable under image transformations like translation (shift), scale (increase/decrease in size), and rotation, and gives the (x, y) coordinates of such points. BRIEF takes the identified key points and turns them into a binary descriptor or a binary feature vector. The key points found by the FAST algorithm and the binary descriptors created by the BRIEF algorithm together represent an object of an image. A threshold of a maximum of 30,000 key points is defined for ORB to control the number of key points extracted. The advantages of ORB are that it is very fast, accurate, license-free, and gives a high recognition rate. In another embodiment of the present disclosure, the image comparison unit 110 may use any known technique for detecting the key points or feature points in the source image and the target image.
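
A minimal OpenCV sketch of this step is shown below; it only illustrates ORB key-point and binary-descriptor extraction with the 30,000 key-point cap mentioned above.

```python
import cv2

def extract_orb_features(image_path, max_keypoints=30000):
    # Detect stable key points and compute their binary (BRIEF-style) descriptors.
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=max_keypoints)
    keypoints, descriptors = orb.detectAndCompute(image, None)
    return keypoints, descriptors  # (x, y) key points and their binary descriptors
```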

After the key points have been identified for the source image and the target image and each detected key point has been converted into a binary descriptor, the image comparison unit 110 identifies alignment similarities between the source image and the target image by matching the binary descriptors of the source image with the binary descriptors of the target image. In an embodiment of the present disclosure, the image comparison unit 110 may use the hamming distance as a measure of similarity between a binary descriptor of the source image and a binary descriptor of the target image. The image comparison unit 110 may then sort the matches by goodness of match and retain the best matches using a goodness-of-match (15%) threshold.
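
The Hamming-distance matching and goodness-of-match selection could look like the following sketch; the 15% retention figure comes from the description above, while the brute-force matcher is an assumed choice.

```python
import cv2

def match_binary_descriptors(source_descriptors, target_descriptors, keep_fraction=0.15):
    # Hamming distance is the natural similarity measure for binary descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(source_descriptors, target_descriptors)
    # Sort by goodness of match (smaller Hamming distance is better) and keep the top 15%.
    matches = sorted(matches, key=lambda m: m.distance)
    return matches[: max(4, int(len(matches) * keep_fraction))]  # at least 4 needed for a homography
```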

In another embodiment of the present disclosure, the image comparison unit 110 may use homography with the Random Sample Consensus (RANSAC) method to find similarities between the binary descriptors of the source image and the binary descriptors of the target image. A homography may be computed when there are 4 or more corresponding key points in the source image and the target image. Basically, a homography is a 3x3 matrix. Let us assume that (x1, y1) are the coordinates of a key point in the source image and (x2, y2) are the coordinates of the same key point in the target image. Then, the homography (H) for the coordinates is represented by equation (1):
H = \begin{bmatrix} h_{00} & h_{01} & h_{02} \\ h_{10} & h_{11} & h_{12} \\ h_{20} & h_{21} & h_{22} \end{bmatrix} \qquad (1)

Once an accurate homography is calculated, the homography transformation is applied to all pixels in one image to map it to the other image. Therefore, the Homography (H) is then applied to all the pixels of the target image to obtain a wrapped target image, as represented by equation 2:
\begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} = H \begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} \qquad (2)

The image comparison unit 110 then aligns the wrapped target image with the source image. RANSAC is a robust estimation technique. RANSAC has the advantage that it produces a correct result even in the presence of a large number of bad matches or dissimilarities between the source image and the target image, by removing outlier features of both images. In another embodiment of the present disclosure, the image comparison unit 110 may use any known method to find similarities between the binary descriptors of the source image and the binary descriptors of the target image.
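
Combining equations (1) and (2) with RANSAC, the alignment step could be sketched as follows; the point ordering and the RANSAC reprojection threshold are assumptions.

```python
import cv2
import numpy as np

def align_target_to_source(source_image, target_image, source_kp, target_kp, matches):
    # Corresponding points: queryIdx indexes the source key points, trainIdx the target ones.
    source_pts = np.float32([source_kp[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    target_pts = np.float32([target_kp[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # Estimate H so that source ~ H * target (equation (2)), discarding outliers with RANSAC.
    H, _ = cv2.findHomography(target_pts, source_pts, cv2.RANSAC, ransacReprojThreshold=5.0)
    height, width = source_image.shape[:2]
    # Apply H to every pixel of the target image to obtain the wrapped target image.
    return cv2.warpPerspective(target_image, H, (width, height))
```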

After the wrapped target image has been aligned with the source image, the image comparison unit 110 is further configured to analyze the source image and the wrapped target image to determine if the source image and the wrapped target image are similar based on a plurality of predefined thresholds. In an embodiment of the present disclosure, the image comparison unit 110 uses the Structural Similarity Index (SSIM) to find similarities and/or dissimilarities between the wrapped target image and the source image. SSIM is a perceptual metric used to measure differences or dissimilarities between two similar images. SSIM scores the comparison of images on a scale of -1 to 1, where a score of 1 means that the images are very similar and a score of -1 means that the images are very different. Hence, SSIM fits well for document or image comparison. In another embodiment of the present disclosure, the image comparison unit 110 may use a mean square error (MSE) technique to find similarities and/or dissimilarities between the wrapped target image and the source image. In yet another embodiment of the present disclosure, the image comparison unit 110 may use any known technique for finding similarities and/or dissimilarities between the wrapped target image and the source image.
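
As one possible realization of this comparison (using scikit-image's SSIM, which the specification does not mandate), the sketch below returns the overall score and a per-pixel difference map that the thresholds listed next would operate on.

```python
from skimage.metrics import structural_similarity

def ssim_compare(source_gray, wrapped_target_gray, window_size=25):
    # window_size=25 follows the SSIM WINDOW SIZE threshold listed below; inputs
    # are assumed to be grayscale uint8 arrays of the same shape.
    score, diff = structural_similarity(
        source_gray, wrapped_target_gray, win_size=window_size, full=True)
    diff_map = (diff * 255).astype("uint8")  # 255 = identical, lower = more different
    return score, diff_map
```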

The image comparison unit 110 uses the following predefined thresholds to determine similarities and/or dissimilarities between the wrapped target image and the source image:
SSIM WINDOW SIZE = 25 – This value is based on a character size measured across hundreds of samples.
SSIM VERIFICATION VALUE = [0.925, 0.975] – These values are used for verifying each individual deviation or dissimilarity, for normal and high sensitivity, respectively.
MORPH KERNEL SIZE = 10 – This value is used for morphologically closing deviations or dissimilarities so that nearby ones are combined.
GAUSSIAN BLUR = (3,3) – This represents the values by which both the source image and the wrapped target image are blurred for smoothing.
RAW THRESHOLDS = (205, 253) – These threshold values are defined to choose between thresholds for normal and highly sensitive differences in the SSIM output, and to identify which pixels are potential deviations or dissimilarities.
THRESHOLD RANGE = (0.75, 3.0, 18.0, 50.0) – This threshold range is defined to choose threshold values based on the percentage of pixel differences or dissimilarities (extreme low range, low range, mid-range, high range).
THRESHOLDS NON 0 = (127.5, 160, 180, 240, 250) – These threshold values are used for documents with rotation/translation/scaling as the major difference or dissimilarity. For non-zero rotation, there may be interpolation artifacts; these threshold values help to ignore such artifacts.
127.5 – High range threshold – This value relates to pack inserts and digital versus print proof comparison (since a print proof has highly flattened images with high noise).
160 – Mid-range threshold – This value relates to a significant transformation between two versions, which may lead to registration issues.
180 – Lower mid-range threshold – This value relates to a minor transformation leading to registration issues.
240 – Low range threshold – This value relates to the same orientation and no transformation, with differences or dissimilarities.
250 – Extreme low range threshold – This value relates to no transformation with very minute differences or dissimilarities.
If, based on the above-mentioned predefined thresholds, the image comparison unit 110 determines that the source image and the wrapped target image are not similar, the image comparison unit 110 compares a plurality of pixel values of the source image and a plurality of pixel values of the wrapped target image to determine dissimilarities between the source image and the wrapped target image. The image comparison unit 110 determines dissimilarities between the source image and the wrapped target image as a set of contour points or bounding box coordinates. Contour points provide more accurate results than bounding box coordinates, as a bounding box drawn for an individual difference may include non-difference regions. The dynamically and automatically chosen contour points represent the actual differences. A contour is simply a curve joining all the continuous points along a boundary that have the same colour or intensity. Each individual contour is an array of (x, y) coordinates of the boundary points of the object.
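
A hedged sketch of turning the pixel-level differences into contour points or bounding box coordinates is given below; the exact threshold and the minimum contour area are illustrative values only, loosely based on the RAW THRESHOLDS above.

```python
import cv2

def dissimilarity_regions(diff_map, raw_threshold=205, min_area=10.0):
    # Pixels darker than the threshold in the SSIM difference map are treated
    # as potential deviations; contours are then traced around them.
    _, mask = cv2.threshold(diff_map, raw_threshold, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    regions = []
    for contour in contours:
        if cv2.contourArea(contour) >= min_area:
            regions.append({
                "contour_points": contour.reshape(-1, 2).tolist(),  # (x, y) boundary points
                "bounding_box": cv2.boundingRect(contour),          # (x, y, width, height)
            })
    return regions
```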

The text comparison unit 112 is also configured to receive the source image and the target image from the user device 102. On receiving the source image and the target image from the user device 102, the text comparison unit 112 checks if the source image and the target image are in a PDF format. If not, the text comparison unit 112 first converts the received source image and the target image into a PDF format. In an embodiment of the present disclosure, the text comparison unit 112 may provide a graphical user interface (GUI) to the user 104 on the user device 102 enabling the user 104 to manually select areas from the source image and the target image which need to be compared to find dissimilarities between the source image and the target image.

The text comparison unit 112 is further configured to analyze the source image and the target image individually to determine the number of pages of the source image and the target image. Once the number of pages has been determined, the text comparison unit 112 is configured to extract text from each determined page of the source image and from each determined page of the target image based on a plurality of predefined thresholds. The extracted text may include, but is not limited to, a word or a plurality of words (hereinafter interchangeably referred to as word or text), lines or blocks. The plurality of predefined thresholds is defined depending on the paper size of the source image and the target image, and the line spacing, word spacing, and column spacing in the source image and the target image. The plurality of predefined thresholds may include, but is not limited to, the following:
Document top threshold – 3: It represents a threshold for a word spacing;
Different location threshold – 10
Column line spacing factor for y axis – 12: This spacing provides a threshold value to avoid combining two lines. If the vertical gap is too large, the lines form separate columns.
Column line spacing factor for x axis – 12: This spacing provides a threshold value so that if two lines are below each other but the horizontal gap is too large, they form separate columns.
Block word spacing factor for y axis - 1.45: It represents a threshold to define y distance between two words for a non A4 size.
Block word spacing factor for y axis for A4 paper - 3.00: It represents a threshold to define y distance between two words for an A4 size.
Block word spacing factor for x axis - 1.45: It represents a threshold to define x distance between two words for a non A4 size.
Block word spacing factor for x axis for A4 paper - 3.00: It represents a threshold to define x distance between two words for an A4 size.
Document line spacing for x axis - 1.45: It represents line spacing factor for x axis.
Document line spacing for y axis - 1.45: It represents line spacing factor for y axis.
Same line spacing - 2.0: It represents a threshold for same line spacing.
Header threshold value – 0.1: It represents a threshold to remove header.
Footer threshold value – 0.9: It represents a threshold to remove footer.

The text comparison unit 112 also defines and sets tolerance values (for x and y) for each of the plurality of predefined thresholds so that words are not combined with each other during extraction of the text from the source image and the target image. In an embodiment of the present disclosure, the GUI helps the user 104 to customize the plurality of predefined thresholds based on the type of the source image and the target image and on the requirements of a particular industry. For example, different thresholds may be defined for the food industry and for the pharmaceutical industry, thereby allowing customization to all customer needs and scenarios.

After extracting text from the source image and the target image, the text comparison unit 112 captures the location or index of the extracted text. The text comparison unit 112 is further configured to extract a plurality of tables from each determined page of the source image and from each determined page of the target image. The text comparison unit 112 first identifies intersections of lines and rectangles to determine the tables on each page of the source image and each page of the target image. The text comparison unit 112 then checks if any of the extracted text lies within the determined tables. If an extracted text lies within a determined table, the text comparison unit 112 extracts rows for that extracted text and cells for the extracted rows. The text comparison unit 112 also extracts metadata related to the extracted text, rows and cells. The metadata may include, but is not limited to, position coordinates of the text, font size, page number and orientation of the text. In an embodiment of the present disclosure, the text comparison unit 112 extracts the tables using an object detection system. In another embodiment of the present disclosure, the text comparison unit 112 may extract the tables using any known technique.
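
The specification does not name an extraction library; purely for illustration, the sketch below uses pdfplumber to pull words (with their location index) and tables page by page from a PDF-format document.

```python
import pdfplumber

def extract_text_and_tables(pdf_path):
    pages = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            # Each word carries its position coordinates, which serve as the location index.
            words = [{"text": w["text"], "page": page_number,
                      "bbox": (w["x0"], w["top"], w["x1"], w["bottom"])}
                     for w in page.extract_words()]
            tables = page.extract_tables()  # each table is a list of rows of cell strings
            pages.append({"page": page_number, "words": words, "tables": tables})
    return pages
```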

The text comparison unit 112 is further configured to form blocks for the extracted text and each of the extracted plurality of tables in the source image and the target image based on the extracted text and a plurality of predefined criteria. The text comparison unit 112 uses the following predefined criteria to form blocks (a simplified sketch follows the list):
if an index of a first word or text on a first page (of the source image and the target image) is zero, a first block is formed,
if a page number of a current word on a current page (page currently considered for comparison), is not equal to a page number of a word on a previous page (page previous to the current page), a new block is formed,
if a current word is upright and a previous word is not upright or vice-versa, a new block is formed,
if a current word is upright and a previous word is also upright, the same block is continued,
if a current word has ‘.’ at the end, and if a difference between lower x coordinate of the current word and a previous word is less than an image top threshold, then same block is continued,
if a current word is bold and a previous word is not bold, then same block is continued,
if a difference between lower y coordinate of a current word and a previous word is greater than the multiplication of current word font size and block factor, then a new block is formed,
if a current word has ‘.’ at the end, and if a difference between lower y coordinate of current word and a previous word is less than the image top threshold, then same block is continued.
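
A deliberately simplified sketch of a few of these criteria (page change, orientation change and vertical spacing) is shown below; the word dictionaries are assumed to carry page, upright, font_size and bbox fields similar to the extraction sketch above, and the block factor is illustrative.

```python
def form_blocks(words, block_factor=1.45):
    # words: ordered list of dicts with "page", "upright", "font_size" and
    # "bbox" = (x0, top, x1, bottom) fields. Only a subset of the listed
    # criteria is implemented here, as a sketch.
    blocks, current = [], []
    for index, word in enumerate(words):
        if index == 0:                       # first word starts the first block
            current = [word]
            continue
        previous = words[index - 1]
        start_new_block = (
            word["page"] != previous["page"]              # page changed
            or word["upright"] != previous["upright"]     # orientation changed
            or (word["bbox"][3] - previous["bbox"][3])    # lower-y gap too large
               > word["font_size"] * block_factor
        )
        if start_new_block:
            blocks.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        blocks.append(current)
    return blocks
```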

Once the blocks have been formed in the source image and the target image, the text comparison unit 112 is configured to find dissimilarities between the source image and the target image by comparing each of the target image blocks inside each of the source image blocks based on a plurality of thresholds. The text comparison unit 112 compares two texts in the source image block and the target image block on various factors, such as the font size of the text, the orientation of the text, the spacing between text (block spacing, line spacing), etc. The text comparison unit 112 uses the following plurality of thresholds for comparing the target image blocks within the source image blocks:
match threshold - 0.47
reverse match threshold - 0.65
match distance - 1000
line spacing factor - 1.25
match max bits - 0
whitespace allowable percent - 70
block word spacing factor for y axis for A4 paper - 3.00
block word spacing factor for x axis - 1.45
block word spacing factor for x axis for A4 paper - 3.00
document line spacing for x axis - 1.45
document line spacing for y axis - 1.45
same line spacing - 2.0

In an embodiment of the present disclosure, the text comparison unit 112 uses the Diff-Match-Patch algorithm by Google to find dissimilarities between the source image and the target image. The Diff-Match-Patch algorithm compares two blocks of text and returns a list of differences. For example, given a string to search for, the Diff-Match-Patch algorithm finds the best fuzzy match in a text block, giving weight to both accuracy and location. A list of patches is then applied to the text in case the underlying text does not match.
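
A minimal usage sketch of the Python port of Google's diff-match-patch library follows, illustrating how two text blocks can be compared for differences; the clean-up call is an assumed convenience, not a step named in the specification.

```python
import diff_match_patch as dmp_module

def block_text_differences(source_block_text, target_block_text):
    dmp = dmp_module.diff_match_patch()
    diffs = dmp.diff_main(source_block_text, target_block_text)
    dmp.diff_cleanupSemantic(diffs)  # merge tiny edits into human-readable chunks
    # Each entry is (op, text): -1 = only in source, 0 = equal, 1 = only in target.
    return diffs
```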

In another embodiment of the present disclosure, the text comparison unit 112 uses a Help-loc technique to find dissimilarities between the source image and the target image. The Help-loc technique uses reverse matching between the source image blocks and the target image blocks to find similarities and dissimilarities between them, because a small target image block matched within a large source image block will have a similar backward location match based on the forward location, the length of the small target image block, and the length of the source image block, as represented by equation (3):
Help-loc = length(target image block) - 1 - (forward location + length(source image block) - 1)    (3)
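
Transcribed directly from equation (3), with no interpretation added, the Help-loc value could be computed as follows; taking block lengths as character counts is an assumption.

```python
def help_loc(target_image_block, source_image_block, forward_location):
    # Literal transcription of equation (3); lengths are taken as character counts.
    return len(target_image_block) - 1 - (forward_location + len(source_image_block) - 1)
```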

FIG. 5 illustrates an exemplary reverse matching method used in the Help-loc technique, in accordance with an embodiment of the present disclosure. The text comparison unit 112 uses the Help-loc technique to find a match of the target image blocks within the source image blocks. At step 502, when a first match is found between a target image block and a source image block, the forward location of the match is noted. At step 504, based on the Help-loc value (as represented in equation (3)) and the forward location of the match, reverse matching is performed for the target image block within the source image block.

At step 506, a location of an actual matched string is then found in the target image block and the source image block class objects. At step 508, a check is performed if the actual matched string is empty. If the actual matched string is empty, the method moves to step 510. If the actual matched string is not empty, the method moves back to step 506.

At step 510, dissimilarities are found between the source image block and the actual matched string from the target image block. At step 512, indexes are provided to the dissimilarities found for labelling, to indicate different dissimilarities based on different dissimilarity classes. At step 514, unanalysed text in the target image blocks is formed into a new block and the method continues at step 502.

The text comparison unit 112 finds a plurality of dissimilarities between the source image blocks and the target image blocks and draws a bounding box around the dissimilar source image blocks and the dissimilar target image blocks. The text comparison unit 112 then categorizes each of the dissimilar source image blocks and the dissimilar target image blocks with one of the following dissimilarity classes:
Text Difference – It represents dissimilarities in text between the source image blocks and the target image blocks, and includes variations in a text/word,
Case Difference – It represents dissimilarities in case (uppercase or lowercase) of text between the source image blocks and the target image blocks,
Format Difference – It represents dissimilarities in format (bold, italics) of the text between the source image blocks and the target image blocks,
Only in Source – It represents text that is present only in source image blocks and not in the target image blocks,
Only in Target – It represents text that is present in the target image blocks and not in the source image blocks.

FIG. 6 illustrates an exemplary display screen 600 illustrating dissimilar source image blocks and dissimilar target image blocks categorized by the text comparison unit 112, in accordance with an embodiment of the present disclosure. As illustrated in FIG. 6, the dissimilar source image blocks and the dissimilar target image blocks are displayed as highlighted contour points or bounding box coordinates and are categorized by their respective dissimilarity class.

The output unit 114 is configured to receive the plurality of labelled objects along with their confidence score and bounding box from the object detection unit 108; the set of contour points or the bounding box coordinates representing dissimilarities between the source image and the wrapped target image from the image comparison unit 110; and the categorized dissimilar source image blocks and the dissimilar target image blocks from the text comparison unit 112. The output unit 114 is further configured to output the labelled objects and the dissimilarities by highlighting them in pixel coordinates at the user device 102.
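
As a sketch only, and reusing the dictionary shapes assumed in the earlier snippets, the output step could highlight the results in pixel coordinates as follows; colours and line widths are arbitrary illustrative choices.

```python
import cv2

def highlight_output(image, detections, dissimilar_regions):
    # Draw labelled boxes for detected objects (green) and outline dissimilar
    # regions (red) on a copy of the image, in pixel coordinates.
    annotated = image.copy()
    for det in detections:
        x1, y1, x2, y2 = [int(v) for v in det["bounding_box"]]
        cv2.rectangle(annotated, (x1, y1), (x2, y2), (0, 255, 0), 2)
        caption = "{} {:.2f}".format(det.get("label", det.get("label_index")), det["confidence"])
        cv2.putText(annotated, caption, (x1, max(y1 - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    for region in dissimilar_regions:
        x, y, w, h = region["bounding_box"]
        cv2.rectangle(annotated, (x, y), (x + w, y + h), (0, 0, 255), 2)
    return annotated
```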

FIG. 7 illustrates a method 700 for proofreading objects in documents, in accordance with an embodiment of the present disclosure. At step 702, the method includes receiving, by an object detection unit 108, an image comparison unit 110 and a text comparison unit 112, a source image and a target image of a document from a user device 102. At step 704, the method includes detecting, by the object detection unit 108, a plurality of objects on the received source image and the target image of the document based on a predetermined plurality of object classes.

At step 706, the method includes assigning, by the object detection unit 108, a label to each of the detected plurality of objects. At step 708, the method includes generating, by the object detection unit 108, a confidence score and a bounding box for each of the labelled plurality of objects.

At step 710, the method includes aligning, by the image comparison unit 110, the target image with the source image to generate a wrapped target image when the source image and the target image do not have alignment similarities. At step 712, the method includes analyzing, by the image comparison unit 110, the source image and the wrapped target image to determine if the source image and the wrapped target image are similar based on a plurality of first predefined thresholds.

At step 714, the method includes comparing, by the image comparison unit 110, a plurality of pixel values of the source image and a plurality of pixel values of the wrapped target image to determine dissimilarities between the source image and the wrapped target image if the source image and the wrapped target image are not similar, wherein the image comparison unit 110 determines dissimilarities between the source image and the wrapped target image as a set of contour points or bounding box coordinates.

At step 716, the method includes analyzing, by the text comparison unit 112, the received source image and the target image to determine a number of pages of the received source image and the target image. At step 718, the method includes extracting, by the text comparison unit 112, text from each determined page of the source image and for each determined page of the target image based on a plurality of second predefined thresholds and capturing location index of the extracted text.

At step 720, the method includes extracting, by the text comparison unit 112, a plurality of tables from each determined page of the source image and for each determined page of the target image. At step 722, the method includes forming, by the text comparison unit 112, blocks for the extracted text and each of the extracted plurality of tables in the source image and the target image based on the extracted text and a plurality of predefined criteria.

At step 724, the method includes finding, by the text comparison unit 112, a plurality of dissimilarities between the source image and the target image by comparing each of the target image blocks inside each of the source image blocks based on a plurality of third predefined thresholds. At step 726, the method includes drawing, by the text comparison unit 112, a bounding box around dissimilar source image blocks and dissimilar target image blocks.

At step 728, the method includes categorizing, by the text comparison unit 112, each of the dissimilar source image blocks and the dissimilar target image blocks with a dissimilarity class.

At step 730, the method includes receiving, by an output unit 114, the plurality of labelled objects along with their confidence score and bounding box from the object detection unit 108, the set of contour points or the bounding box coordinates representing dissimilarities between the source image and the wrapped target image from the image comparison unit 110, and the categorized dissimilar source image blocks and the dissimilar target image blocks from the text comparison unit 112.

At step 732, the method includes outputting, by the output unit 114, the labelled objects and the dissimilarities by highlighting them in pixel coordinates at the user device 102.

The system and method for proofreading objects in documents disclosed in the present disclosure have numerous advantages. The disclosed system and method are used for proofreading objects in documents having both text and images, integrating visual and textual concepts for document analysis and proofreading. Further, the disclosed system and method are used for automatically proofreading objects in documents having both text and images, without any human intervention. Furthermore, the disclosed system and method efficiently compute differences in text, case and format between a source image and a target image of a document, and help to understand and detect differences across all the objects on the source image and the target image accurately.

The embodiments herein and the various features and advantageous details thereof are explained with reference to the non-limiting embodiments in the description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.

The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments.
It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the embodiments as described herein.

Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.

The use of the expression “at least” or “at least one” suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the disclosure to achieve one or more of the desired objects or results.

Any discussion of documents, acts, materials, devices, articles and the like that has been included in this specification is solely for the purpose of providing a context for the disclosure.

It is not to be taken as an admission that any or all of these matters form a part of the prior art base or were common general knowledge in the field relevant to the disclosure as it existed anywhere before the priority date of this application.

The numerical values mentioned for the various physical parameters, dimensions or quantities are only approximations and it is envisaged that the values higher/lower than the numerical values assigned to the parameters, dimensions or quantities fall within the scope of the disclosure, unless there is a statement in the specification specific to the contrary.

While considerable emphasis has been placed herein on the particular features of this disclosure, it will be appreciated that various modifications can be made, and that many changes can be made in the preferred embodiments without departing from the principles of the disclosure. These and other modifications in the nature of the disclosure or the preferred embodiments will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the disclosure and not as a limitation.
CLAIMS:
I/We Claim:
1. A system (100) for proofreading objects in documents, the system (100) comprising:
an object detection unit (108) configured to:
receive a source image and a target image of a document from a user device (102);
detect a plurality of objects on the received source image and the target image of the document based on a predetermined plurality of object classes;
assign a label to each of the detected plurality of objects; and
generate a confidence score and a bounding box for each of the labelled plurality of objects;
an image comparison unit (110) configured to:
receive the source image and the target image of the document from the user device (102);
align the target image with the source image to generate a wrapped target image when the source image and the target image do not have alignment similarities;
analyze the source image and the wrapped target image to determine if the source image and the wrapped target image are similar based on a plurality of first predefined thresholds; and
compare a plurality of pixel values of the source image and a plurality of pixel values of the wrapped target image to determine dissimilarities between the source image and the wrapped target image if the source image and the wrapped target image are not similar, wherein the image comparison unit (110) determines dissimilarities between the source image and the wrapped target image as a set of contour points or bounding box coordinates;
a text comparison unit (112) configured to:
receive the source image and the target image of the document from the user device (102);
analyze the received source image and the target image to determine a number of pages of the received source image and the target image;
extract text from each determined page of the source image and for each determined page of the target image based on a plurality of second predefined thresholds, and capture location index of the extracted text;
extract a plurality of tables from each determined page of the source image and for each determined page of the target image;
form blocks for the extracted text and each of the extracted plurality of tables in the source image and the target image based on the extracted text and a plurality of predefined criteria;
find a plurality of dissimilarities between the source image and the target image by comparing each of the target image blocks inside each of the source image blocks based on a plurality of third predefined thresholds;
draw a bounding box around dissimilar source image blocks and dissimilar target image blocks; and
categorize each of the dissimilar source image blocks and the dissimilar target image blocks with a dissimilarity class; and
an output unit (114) configured to:
receive the plurality of labelled objects along with their confidence score and bounding box from the object detection unit (108), the set of contour points or the bounding box coordinates representing dissimilarities between the source image and the wrapped target image from the image comparison unit (110), and the categorized dissimilar source image blocks and the dissimilar target image blocks from the text comparison unit (112); and
output the labelled objects and the dissimilarities by highlighting them in pixel coordinates at the user device (102).

2. The system (100) as claimed in claim 1, wherein the predetermined plurality of object classes comprises: artworks, tables, barcodes, nutri-scores, front of panel declarations (FOP), lines, symbols, images, text or pantone colours.

3. The system (100) as claimed in claim 1, wherein the object detection unit (108) uses a transfer learning (TL) model for detecting the plurality of objects on the received source image and the target image of the document.

4. The system (100) as claimed in claim 3, wherein the object detection unit (108) uses a Faster R-CNN ResNet50 transfer learning model.

5. The system (100) as claimed in claim 1, wherein the confidence score is a number that lies between 0 and 1.

6. The system (100) as claimed in claim 1, wherein the image comparison unit (110) is configured to:
identify key points on the source image and the target image that are stable under a plurality of image transformations;
convert each of the identified key points into a binary descriptor;
identify alignment similarities between the source image and the target image by matching binary descriptors of the source image and binary descriptors of the target image;
apply homography to a plurality of pixels of the target image to generate the wrapped target image when the source image and the target image do not have alignment similarities; and
align the wrapped target image with the source image.

7. The system (100) as claimed in claim 6, wherein the image comparison unit (110) uses Oriented FAST and Rotated BRIEF (ORB) technique to identify the key points and to convert the identified key points into binary descriptors.
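
By way of a non-limiting sketch of the ORB-based key-point step recited in claims 6 and 7, the following Python fragment detects key points, computes binary descriptors and matches them with a Hamming-distance matcher using OpenCV; the feature count and the match_orb_descriptors helper are assumptions for illustration.

import cv2

def match_orb_descriptors(source_gray, target_gray, n_features=5000):
    # ORB (Oriented FAST and Rotated BRIEF) yields key points and binary descriptors.
    orb = cv2.ORB_create(nfeatures=n_features)
    kp_src, des_src = orb.detectAndCompute(source_gray, None)
    kp_tgt, des_tgt = orb.detectAndCompute(target_gray, None)
    # Binary descriptors are compared with Hamming distance.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_src, des_tgt), key=lambda m: m.distance)
    return kp_src, kp_tgt, matches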

8. The system (100) as claimed in claim 6, wherein the plurality of image transformations comprises translation (shift), scale (increase/decrease in size), rotation, reflection, and dilation.

9. The system (100) as claimed in claim 1, wherein the image comparison unit (110) applies homography to the plurality of pixels of the target image using Random Sample Consensus (RANSAC), and wherein the plurality of pixels comprises all the pixels of the target image.
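
A minimal, non-limiting sketch of the RANSAC-based homography step follows, building on the ORB matches from the previous sketch; the reprojection threshold of 5.0 and the warp_target helper are illustrative assumptions.

import cv2
import numpy as np

def warp_target(target_img, kp_src, kp_tgt, matches, source_shape):
    src_pts = np.float32([kp_src[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    tgt_pts = np.float32([kp_tgt[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC discards outlier matches while estimating the 3x3 homography
    # that maps target pixels into the source image frame.
    H, mask = cv2.findHomography(tgt_pts, src_pts, cv2.RANSAC, 5.0)
    height, width = source_shape[:2]
    # Apply the homography to every pixel of the target image.
    return cv2.warpPerspective(target_img, H, (width, height))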

10. The system (100) as claimed in claim 1, wherein the image comparison unit (110) uses Structural Similarity Index (SSIM) to determine if the source image and the wrapped target image are similar.
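
The SSIM check may be sketched as below using scikit-image, assuming grayscale uint8 page images of identical size; the verification bound applied here mirrors the SSIM VERIFICATION VALUE recited in claim 11, while the images_are_similar helper and the way the bound is applied are assumptions for illustration.

from skimage.metrics import structural_similarity as ssim

SSIM_WINDOW_SIZE = 25
SSIM_VERIFICATION = (0.925, 0.975)  # assumed lower/upper verification bounds

def images_are_similar(source_gray, wrapped_target_gray):
    score, diff = ssim(source_gray, wrapped_target_gray,
                       win_size=SSIM_WINDOW_SIZE, full=True)
    # Images are treated as similar only when the score clears the lower
    # verification value; otherwise pixel-level comparison follows.
    return score >= SSIM_VERIFICATION[0], diff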

11. The system (100) as claimed in claim 1, wherein the plurality of first predefined thresholds comprises:
SSIM WINDOW SIZE (25), SSIM VERIFICATION VALUE (0.925, 0.975), MORPH KERNEL SIZE (10), GAUSSIAN BLUR (3,3), RAW THRESHOLDS (205, 253), THRESHOLD RANGE (0.75, 3.0, 18.0, 50.0), and THRESHOLDS NON 0 (127.5, 160, 180, 240, 250).

12. The system (100) as claimed in claim 1, wherein the plurality of second predefined thresholds are defined based on a paper size of the source image and the target image, and line spacing, word spacing, and column spacing in the source image and the target image.

13. The system (100) as claimed in claim 1, wherein the plurality of second predefined thresholds comprises: Image top threshold = 3, Different location threshold =10, Column line spacing factor for y axis = 12, Column line spacing factor for x axis = 12, Block word spacing factor for y axis = 1.45, Block word spacing factor for y axis for A4 paper = 3.00, Block word spacing factor for x axis = 1.45, Block word spacing factor for x axis for A4 paper = 3.00, Text line spacing for x axis = 1.45, Text line spacing for y axis = 1.45, Same line spacing = 2.0, Header threshold value = 0.1, and Footer threshold value = 0.9.

14. The system (100) as claimed in claim 1, wherein the text comparison unit (112) extracts the plurality of tables in the source image and the target image by:
identifying intersection of lines and rectangles to determine tables on each page of the source image and each page of the target image;
checking if the text extracted lies within the determined tables;
extracting rows for the extracted text that lies within the determined tables and extracting cells for the extracted rows; and
extracting metadata related to the extracted text, rows and cells, wherein the metadata comprises position coordinates of the text, font size, page number and orientation of the text.
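
As a non-limiting illustration of locating tables through intersecting lines, the following OpenCV sketch extracts horizontal and vertical ruled lines with morphological opening and returns their intersection points as candidate cell corners; the kernel sizes, threshold parameters and the find_table_cells helper are assumptions, not values recited in the claims.

import cv2
import numpy as np

def find_table_cells(page_gray):
    # Binarise so that ruled lines become white on black.
    binary = cv2.adaptiveThreshold(~page_gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, 15, -2)
    horizontal = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                                  cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1)))
    vertical = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                                cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40)))
    # Points where a horizontal and a vertical line cross mark cell corners.
    intersections = cv2.bitwise_and(horizontal, vertical)
    ys, xs = np.where(intersections > 0)
    return list(zip(xs.tolist(), ys.tolist()))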

15. The system (100) as claimed in claim 1, wherein the predefined criteria for forming blocks comprises:
• if an index of a first word or text on a first page is zero, a first block is formed,
• if a page number of a current word on a current page, is not equal to a page number of a word on a previous page, a new block is formed,
• if a current word is upright and a previous word is not upright or vice-versa, a new block is formed,
• if a current word is upright and a previous word is also upright, the same block is continued,
• if a current word has ‘.’ at the end, and if a difference between lower x coordinate of the current word and a previous word is less than an image top threshold, then same block is continued,
• if a current word is bold and a previous word is not bold, then same block is continued,
• if a difference between lower y coordinate of a current word and a previous word is greater than the multiplication of current word font size and block factor, then a new block is formed, and
• if a current word has ‘.’ at the end, and if a difference between lower y coordinate of current word and a previous word is less than the image top threshold, then same block is continued.
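
A compressed, non-limiting sketch of these block-forming rules is given below; the word-record fields, the IMAGE_TOP_THRESHOLD and BLOCK_FACTOR constants and the starts_new_block helper are illustrative assumptions that approximate, rather than reproduce, the recited criteria.

IMAGE_TOP_THRESHOLD = 3
BLOCK_FACTOR = 1.45

def starts_new_block(current, previous):
    """current/previous are assumed dicts with page, upright, font_size,
    lower_y and text fields extracted for each word."""
    if previous is None:                            # first word on the first page
        return True
    if current["page"] != previous["page"]:         # page change starts a new block
        return True
    if current["upright"] != previous["upright"]:   # orientation change starts a new block
        return True
    gap_y = current["lower_y"] - previous["lower_y"]
    if previous["text"].endswith(".") and gap_y < IMAGE_TOP_THRESHOLD:
        return False                                # sentence continues within the block
    if gap_y > current["font_size"] * BLOCK_FACTOR:  # large vertical gap starts a new block
        return True
    return False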

16. The system (100) as claimed in claim 1, wherein the text comparison unit (112) compares texts in the source image blocks and the target image blocks by comparing font size of text, orientation of the text, and spacing between text.

17. The system (100) as claimed in claim 1, wherein the plurality of third predefined thresholds used by the text comparison unit (112) comprises: match threshold = 0.47, reverse match threshold = 0.65, match distance = 1000, line spacing factor = 1.25, match max bits = 0, whitespace allowable percent = 70, block word spacing factor for y axis for A4 paper = 3.00, block word spacing factor for x axis = 1.45, block word spacing factor for x axis for A4 paper = 3.00, document line spacing for x axis = 1.45, document line spacing for y axis = 1.45, and same line spacing = 2.0.

18. The system (100) as claimed in claim 1, wherein the text comparison unit (112) uses a Diff-Match-Patch algorithm by Google to compare target image blocks within the source image blocks.
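
An illustrative use of Google's diff-match-patch library (the Python package diff-match-patch) for comparing a target block against a source block is shown below; the diff_blocks helper and the way the match threshold from claim 17 is applied are assumptions for illustration.

from diff_match_patch import diff_match_patch

def diff_blocks(source_text, target_text, match_threshold=0.47):
    dmp = diff_match_patch()
    # Match_Threshold governs the library's fuzzy match_main() location search;
    # 0.47 mirrors the "match threshold" recited in claim 17 (an assumption
    # about how that value is applied).
    dmp.Match_Threshold = match_threshold
    diffs = dmp.diff_main(source_text, target_text)
    dmp.diff_cleanupSemantic(diffs)
    # Each entry is (op, text): -1 = only in source, 0 = equal, 1 = only in target.
    return diffs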

19. The system (100) as claimed in claim 1, wherein the text comparison unit (112) uses Help-loc technique to compare target image blocks within the source image blocks.

20. The system (100) as claimed in claim 1, wherein the text comparison unit (112) categorizes the dissimilar source image blocks and the dissimilar target image blocks with one of the following dissimilarity classes:
• Text Difference for dissimilarities in text;
• Case Difference for dissimilarities in case (uppercase or lowercase) of the text;
• Format Difference for dissimilarities in format (bold, italics) of the text between the source image blocks and the target image blocks;
• Only in Source for text that is present only in the source image blocks and not in the target image blocks; and
• Only in Target for text that is present only in the target image blocks and not in the source image blocks.

21. A method (700) for proofreading objects in documents, the method (700) comprising:
receiving, by an object detection unit (108), an image comparison unit (110) and a text comparison unit (112), a source image and a target image of a document from a user device (102);
detecting, by the object detection unit (108), a plurality of objects on the received source image and the target image of the document based on a predetermined plurality of object classes;
assigning, by the object detection unit (108), a label to each of the detected plurality of objects;
generating, by the object detection unit (108), a confidence score and a bounding box for each of the labelled plurality of objects;
aligning, by the image comparison unit (110), the target image with the source image to generate a wrapped target image when the source image and the target image do not have alignment similarities;
analyzing, by the image comparison unit (110), the source image and the wrapped target image to determine if the source image and the wrapped target image are similar based on a plurality of first predefined thresholds;
comparing, by the image comparison unit (110), a plurality of pixel values of the source image and a plurality of pixel values of the wrapped target image to determine dissimilarities between the source image and the wrapped target image if the source image and the wrapped target image are not similar, wherein the image comparison unit (110) determines dissimilarities between the source image and the wrapped target image as a set of contour points or bounding box coordinates;
analyzing, by the text comparison unit (112), the received source image and the target image to determine a number of pages of the received source image and the target image;
extracting, by the text comparison unit (112), text from each determined page of the source image and for each determined page of the target image based on a plurality of second predefined thresholds, and capturing location index of the extracted text;
extracting, by the text comparison unit (112), a plurality of tables from each determined page of the source image and for each determined page of the target image;
forming, by the text comparison unit (112), blocks for the extracted text and each of the extracted plurality of tables in the source image and the target image based on the extracted text and a plurality of predefined criteria;
finding, by the text comparison unit (112), a plurality of dissimilarities between the source image and the target image by comparing each of the target image blocks inside each of the source image blocks based on a plurality of third predefined thresholds;
drawing, by the text comparison unit (112), a bounding box around dissimilar source image blocks and dissimilar target image blocks;
categorizing, by the text comparison unit (112), each of the dissimilar source image blocks and the dissimilar target image blocks with a dissimilarity class;
receiving, by an output unit (114), the plurality of labelled objects along with their confidence score and bounding box from the object detection unit (108), the set of contour points or the bounding box coordinates representing dissimilarities between the source image and the wrapped target image from the image comparison unit (110), and the categorized dissimilar source image blocks and the dissimilar target image blocks from the text comparison unit (112); and
outputting, by the output unit (114), the labelled objects and the dissimilarities by highlighting them in pixel coordinates at the user device (102).

22. The method (700) as claimed in claim 21, wherein aligning, by the image comparison unit (110) comprises:
identifying key points on the source image and the target image that are stable under a plurality of image transformations;
converting each of the identified key points into a binary descriptor;
identifying alignment similarities between the source image and the target image by matching binary descriptors of the source image and binary descriptors of the target image;
applying homography to a plurality of pixels of the target image to generate the wrapped target image when the source image and the target image do not have alignment similarities; and
aligning the wrapped target image with the source image.

23. The method (700) as claimed in claim 21, wherein the plurality of first predefined thresholds comprises:
SSIM WINDOW SIZE (25), SSIM VERIFICATION VALUE (0.925, 0.975), MORPH KERNEL SIZE (10), GAUSSIAN BLUR (3,3), RAW THRESHOLDS (205, 253), THRESHOLD RANGE (0.75, 3.0, 18.0, 50.0), and THRESHOLDS NON 0 (127.5, 160, 180, 240, 250).

24. The method (700) as claimed in claim 21, wherein the plurality of second predefined thresholds comprises: Image top threshold = 3, Different location threshold = 10, Column line spacing factor for y axis = 12, Column line spacing factor for x axis = 12, Block word spacing factor for y axis = 1.45, Block word spacing factor for y axis for A4 paper = 3.00, Block word spacing factor for x axis = 1.45, Block word spacing factor for x axis for A4 paper = 3.00, Text line spacing for x axis = 1.45, Text line spacing for y axis = 1.45, Same line spacing = 2.0, Header threshold value = 0.1, and Footer threshold value = 0.9.

25. The method (700) as claimed in claim 21, wherein the predefined criteria for forming blocks comprises:
• if an index of a first word or text on a first page is zero, a first block is formed,
• if a page number of a current word on a current page, is not equal to a page number of a word on a previous page, a new block is formed,
• if a current word is upright and a previous word is not upright or vice-versa, a new block is formed,
• if a current word is upright and a previous word is also upright, the same block is continued,
• if a current word has ‘.’ at the end, and if a difference between lower x coordinate of the current word and a previous word is less than an image top threshold, then same block is continued,
• if a current word is bold and a previous word is not bold, then same block is continued,
• if a difference between lower y coordinate of a current word and a previous word is greater than the multiplication of current word font size and block factor, then a new block is formed, and
• if a current word has ‘.’ at the end, and if a difference between lower y coordinate of current word and a previous word is less than the image top threshold, then same block is continued.

26. The method (700) as claimed in claim 21, wherein the plurality of third predefined thresholds used by the text comparison unit (112) comprises: match threshold = 0.47, reverse match threshold = 0.65, match distance = 1000, line spacing factor = 1.25, match max bits = 0, whitespace allowable percent = 70, block word spacing factor for y axis for A4 paper = 3.00, block word spacing factor for x axis = 1.45, block word spacing factor for x axis for A4 paper = 3.00, document line spacing for x axis = 1.45, document line spacing for y axis = 1.45, and same line spacing = 2.0.

Documents

Application Documents

# Name Date
1 202241029635-STATEMENT OF UNDERTAKING (FORM 3) [23-05-2022(online)].pdf 2022-05-23
2 202241029635-PROVISIONAL SPECIFICATION [23-05-2022(online)].pdf 2022-05-23
3 202241029635-PROOF OF RIGHT [23-05-2022(online)].pdf 2022-05-23
4 202241029635-POWER OF AUTHORITY [23-05-2022(online)].pdf 2022-05-23
5 202241029635-FORM FOR SMALL ENTITY(FORM-28) [23-05-2022(online)].pdf 2022-05-23
6 202241029635-FORM FOR SMALL ENTITY [23-05-2022(online)].pdf 2022-05-23
7 202241029635-FORM 1 [23-05-2022(online)].pdf 2022-05-23
8 202241029635-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [23-05-2022(online)].pdf 2022-05-23
9 202241029635-DRAWINGS [23-05-2022(online)].pdf 2022-05-23
10 202241029635-DECLARATION OF INVENTORSHIP (FORM 5) [23-05-2022(online)].pdf 2022-05-23
11 202241029635-DRAWING [18-05-2023(online)].pdf 2023-05-18
12 202241029635-CORRESPONDENCE-OTHERS [18-05-2023(online)].pdf 2023-05-18
13 202241029635-COMPLETE SPECIFICATION [18-05-2023(online)].pdf 2023-05-18
14 202241029635-PostDating-(12-06-2023)-(E-6-193-2023-CHE).pdf 2023-06-12
15 202241029635-APPLICATIONFORPOSTDATING [12-06-2023(online)].pdf 2023-06-12