
An Automated System And A Methodology For Artwork Proofreading

Abstract: The present invention discloses a system (100) and a method (500) for proofreading of artworks. The system (100) includes a user device (102) monitored by a user (104) and a text analysis unit (106). The text analysis unit (106) is configured to find dissimilarities between a source document and a target document related to an artwork. The text analysis unit (106) receives the source document and the target document and extracts text and a plurality of tables from the received source document and the target document. The text analysis unit (106) then forms blocks for the extracted text and each of the extracted plurality of tables in the source document and the target document, and finds a plurality of dissimilarities between the source document and the target document by comparing each of the target document blocks inside each of the source document blocks. The text analysis unit (106) then displays the plurality of dissimilarities found, along with their annotations, on the user device (102). FIG. 1


Patent Information

Application #
Filing Date
23 November 2022
Publication Number
22/2024
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

Karomi Technology Private Limited
VBC Solitaire Building, 9th Floor, 47 & 49, Bazullah Rd, Parthasarathi Puram, T. Nagar, Chennai, Tamil Nadu, 600017, India

Inventors

1. Sreenivas Narasimha Murali
G1 Urvi, Voora Prithvi Apartments, 141 Kamarajar Salai, Kottivakkam, Chennai - 600041

Specification

An Automated System and a Methodology for Artwork Proofreading

FIELD OF INVENTION
The present invention relates to a system and a method for proofreading of artworks. More specifically, the present invention relates to an automated system and method for proofreading of artworks to identify dissimilarities between an original document and a printed copy of the original document of an artwork.

BACKGROUND OF INVENTION
Investment in packaging is increasing day by day across various industries, especially pharmaceuticals, food and beverages, etc. The sale of any packaged good launched in the market by any industry depends on the content about the packaged good given on its packaging. The content present on the packaging is referred to as an artwork related to the packaged good, wherein the artwork includes images, brand name, text, composition, nutritional value table, etc. related to the packaged good. Hence, an artwork management system plays a crucial role in these industries.

Globally, thousands of artworks are approved every day in different industries and in different languages. During the development of these artworks, many versions of an artwork are formed, and these versions have some or many dissimilarities. Such dissimilarities may cause confusion and a lack of trust in the minds of customers as to whether they are buying the correct packaged good. This is especially true in the case of medicines, as customers need to buy the right composition and quantity of a medicine. Hence, proofreading of the artworks is required before the goods are packaged and dispatched to the market.

Most known industries do not have dedicated proofreaders for detecting the dissimilarities in the artworks. Moreover, in most industries, proofreading is done manually or by scanning the artworks, leading to a high probability of human errors. In such a scenario, the artworks can only be compared when the packaged products have been packed and are ready to be shipped. This wastes a lot of money and time because, if the artworks have dissimilarities at this stage, there is no way of correcting them. Industry officials either have to ship the packaged products as they are or repeat the entire packaging process, thereby incurring monetary losses.

Further, errors in proofreading, especially in the pharmaceutical and food processing industries, can lead to recalls and can be quite costly. For example, if a packaged product reaches the market with an unidentified mistake, the product may have to be recalled. In the past, there have been many expensive recalls in the pharmaceutical industry, which led to losses of billions of dollars.

Therefore, there is a need for a system and a method for proofreading of artworks for increasing accuracy and reducing time taken for proofreading of artworks. Further, there is a need for an automated system and method for proofreading of artworks to find dissimilarities well before packaged goods are packaged without any human intervention.

OBJECT OF INVENTION
The object of the present invention is to provide a system and a method for proofreading of artworks by comparing text for increasing accuracy and reducing time taken for proofreading of artworks. More specifically, the object of the present invention is to provide an automated system and method for proofreading of artworks to find dissimilarities well before packaged goods are packaged without any human intervention. Further, the object of the present invention is to provide an efficient mechanism for an automated system and method for proofreading of artworks to identify dissimilarities between an original document and a printed copy of the original document of an artwork.

SUMMARY
The present application discloses a system for proofreading of artworks. The system includes a user device monitored by a user, and a text analysis unit. The text analysis unit includes a text extraction unit, a text comparison unit, and an output unit. The text extraction unit is configured to receive a source document and a target document from the user device. The text extraction unit is further configured to analyze the received source document and target document to determine the number of pages of the received source document and the target document.

Further, the text extraction unit is configured to extract text from each determined page of the source document and from each determined page of the target document based on a plurality of predefined thresholds, and to capture a location index of the extracted text. The plurality of predefined thresholds are defined based on a paper size of the source document and the target document, and on line spacing, word spacing, and column spacing in the source document and the target document. Also, the text extraction unit is configured to extract a plurality of tables from each determined page of the source document and from each determined page of the target document. Once the text and the plurality of tables are extracted, the text extraction unit forms blocks for the extracted text and each of the extracted plurality of tables in the source document and the target document based on the extracted text and a plurality of predefined criteria.

The text comparison unit is configured to receive the source document blocks and the target document blocks from the text extraction unit and to find a plurality of dissimilarities between the source document and the target document by comparing each of the target document blocks inside each of the source document blocks based on a plurality of thresholds. The text comparison unit is further configured to annotate each of the plurality of dissimilarities found with a colour scheme.

The output unit is configured to receive the plurality of dissimilarities found along with their annotation from the text comparison unit and display the plurality of dissimilarities found along with their annotations on the user device.

The present disclosure further discloses a method for proofreading of artworks. The method includes receiving, by a text extraction unit, a source document and a target document from a user device. The method further includes analyzing, by the text extraction unit, the source document and the target document individually to determine the number of pages of the source document and the target document. Also, the method includes extracting, by the text extraction unit, text from each determined page of the source document and from each determined page of the target document based on a plurality of predefined thresholds. The method further includes capturing, by the text extraction unit, a location or index of the extracted text.

Further, the method includes extracting, by the text extraction unit, a plurality of tables from each determined page of the source document and for each determined page of the target document. The method includes forming, by the text extraction unit, blocks for the extracted text and each of the extracted plurality of tables in the source document and the target document based on the extracted text and a plurality of predefined criteria.

Furthermore, the method includes receiving, by a text comparison unit, the source document blocks and the target document blocks from the text extraction unit. The method includes finding, by the text comparison unit, a plurality of dissimilarities between the source document and the target document by comparing each of the target document blocks inside each of the source document blocks based on a plurality of thresholds. Also, the method includes annotating, by the text comparison unit, each of the found plurality of dissimilarities with a different colour scheme; and displaying, by an output unit, the plurality of dissimilarities along with their annotations on the source document and the target document.

BRIEF DESCRIPTION OF DRAWINGS
The novel features and characteristics of the disclosure are set forth in the description. The disclosure itself, however, as well as a preferred mode of use, further objectives, and advantages thereof, will best be understood by reference to the following description of an illustrative embodiment when read in conjunction with the accompanying drawings. One or more embodiments are now described, by way of example only, with reference to the accompanying drawings wherein like reference numerals represent like elements and in which:

FIG. 1 illustrates a system 100 for proofreading of artworks, in accordance with an embodiment of the present disclosure.
FIG. 2 illustrates an exemplary source document 202 and target document 204 related to an artwork 200 inputted by the user 104, in accordance with an embodiment of the present disclosure.
FIG. 3 illustrates an exemplary reverse matching method used in the Help-loc technique, in accordance with an embodiment of the present disclosure.
FIG. 4 illustrates an exemplary display 400 on the user device 102 displaying a dissimilarity between the source document 402 and the target document 404, in accordance with an embodiment of the present disclosure.
FIG. 5 illustrates a method 500 for proofreading of artworks, in accordance with an embodiment of the present disclosure.

The figures depict embodiments of the disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the assemblies, structures and methods illustrated herein may be employed without departing from the principles of the disclosure described herein.

DETAILED DESCRIPTION
The best and other modes for carrying out the present invention are presented in terms of the embodiments, herein depicted in drawings provided. The embodiments are described herein for illustrative purposes and are subject to many variations. It is understood that various omissions and substitutions of equivalents are contemplated as circumstances may suggest or render expedient, but are intended to cover the application or implementation without departing from the spirit or scope of the present invention. Further, it is to be understood that the phraseology and terminology employed herein are for the purpose of the description and should not be regarded as limiting. Any heading utilized within this description is for convenience only and has no legal or limiting effect.

The terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items.

The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method. Similarly, one or more sub-systems or elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other, sub-systems, elements, structures, components, additional sub-systems, additional elements, additional structures or additional components. Appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but not necessarily do, all refer to the same embodiment.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this invention belongs. The system, methods, and examples provided herein are only illustrative and not intended to be limiting.


Embodiments of the present invention will be described below in detail with reference to the accompanying figures.

The present invention focuses on providing a system and a method for proofreading of artworks for goods or products produced by diverse industries, such as consumer packaged goods, pharmaceuticals, etc. The sale of any packaged good launched in the market by any industry depends on the content about the packaged good given on its packaging. The content present on the packaging is referred to as an artwork related to the packaged good, wherein the artwork includes images, brand name, text, composition, nutritional value table, etc. related to the packaged good. Therefore, an artwork plays a crucial role in the sale of a packaged good. However, during the development of these artworks, many versions of an artwork are formed, and these versions have some or many dissimilarities. Such dissimilarities may cause confusion and a lack of trust in the minds of customers as to whether they are buying the correct packaged good.

Most known industries do not have dedicated proofreaders for detecting the dissimilarities in the artworks. Moreover, in most industries, proofreading is done manually or by scanning the artworks, leading to a high probability of human errors. Further, errors in proofreading, especially in the pharmaceutical and food processing industries, may lead to recalls and may be quite costly. For example, if a packaged product reaches the market with an unidentified mistake, the product may have to be recalled. Therefore, the present disclosure discloses a system and a method for proofreading of artworks for increasing accuracy and reducing the time taken for proofreading of artworks. Further, the present disclosure discloses an automated system and method for proofreading of artworks to find dissimilarities well before the goods are packaged, without any human intervention.

FIG. 1 illustrates a system 100 for proofreading of artworks, in accordance with an embodiment of the present disclosure. The system 100 is configured to perform proofreading of artworks by comparing text present on the artworks. The system 100 includes a user device 102 monitored by a user 104, and a text analysis unit 106. The user device 102 relates to hardware components such as a keyboard, mouse, etc., which accept data from the user 104, and also to hardware components such as the display screen of a desktop, laptop, tablet, etc., which display data to the user 104.

The user device 102 is configured to allow the user 104 to input a source document and a target document related to an artwork of a packaged good. The source document may be a scanned copy of an original artwork that has been prepared for a packaged good and digitally approved by the user 104, and the target document may be a scanned copy of a printed version of the same digitally approved source document. The source document and the target document relate to the same artwork. The source document and the target document may include information such as the name of the medicine, composition and quantity of the medicine, dosage and precaution instructions, storage guidelines, distributor information, bar code, etc. The user 104 may be, but is not limited to, an official of a packaged-goods industry responsible for preparing the artwork, a third party who may have received an order for artwork preparation, etc.

FIG. 2 illustrates an exemplary source document 202 and target document 204 related to an artwork 200 inputted by the user 104, in accordance with an embodiment of the present disclosure. As illustrated in FIG. 2, the source document 202 and the target document 204 relate to the same artwork.

The user device 102 is further configured to send the source document and the target document to the text analysis unit 106. In an embodiment of the present disclosure, the user device 102 may send the source document and the target document to the text analysis unit 106 in a PDF format. In another embodiment, the user device 102 may send the source document and the target document to the text analysis unit 106 as an excel document, a word document, a Brief document, a QRD document (Quality Review Document), non-structured documents such as labels, cartons, blister foils, etc. In yet another embodiment, the user device 102 may send the source document and the target document to the text analysis unit 106 in any format representing text in sentences or in tabular form.

The text analysis unit 106 is a hardware component capable of processing any data or information it receives. In certain embodiments, the text analysis unit 106 may be part of any commonly used device, such as a laptop, desktop, tablet, mobile device, etc. The text analysis unit 106 includes a text extraction unit 108, a text comparison unit 110, and an output unit 112.

The text extraction unit 108 is configured to receive the source document and the target document from the user device 102. On receiving the source document and the target document from the user device 102, the text extraction unit 108 checks if the source document and the target document are in a PDF format. If not, the text extraction unit 108 first converts the received source document and the target document into a PDF format. In an embodiment of the present disclosure, the text extraction unit 108 may provide a graphical user interface (GUI) to the user 104 on the user device 102 enabling the user 104 to manually select areas from the source document and the target document which need to be compared to find dissimilarities between the source document and the target document.

The text extraction unit 108 is further configured to analyze the source document and the target document individually to determine the number of pages of the source document and the target document. Once the number of pages has been determined, the text extraction unit 108 is configured to extract text from each determined page of the source document and from each determined page of the target document based on a plurality of predefined thresholds. The extracted text may include, but is not limited to, a word or a plurality of words (hereinafter interchangeably referred to as word or text), lines, or blocks. The plurality of predefined thresholds are defined depending on a paper size of the source document and the target document, and on line spacing, word spacing, and column spacing in the source document and the target document. The plurality of predefined thresholds may include, but are not limited to, the following:
• Document top threshold – 3: It represents a threshold for word spacing;
• Different location threshold – 10;
• Column line spacing factor for y axis – 12: This spacing provides a threshold value to avoid combining two lines; if the vertical gap is too large, they form separate columns.
• Column line spacing factor for x axis – 12: This spacing provides a threshold value so that if two lines are below each other but the horizontal gap is too large, they form separate columns.
• Block word spacing factor for y axis – 1.45: It represents a threshold defining the y distance between two words for a non-A4 size.
• Block word spacing factor for y axis for A4 paper – 3.00: It represents a threshold defining the y distance between two words for an A4 size.
• Block word spacing factor for x axis – 1.45: It represents a threshold defining the x distance between two words for a non-A4 size.
• Block word spacing factor for x axis for A4 paper – 3.00: It represents a threshold defining the x distance between two words for an A4 size.
• Artwork line spacing for x axis – 1.45: It represents the line spacing factor for the x axis.
• Artwork line spacing for y axis – 1.45: It represents the line spacing factor for the y axis.
• Same line spacing – 2.0: It represents a threshold for same-line spacing.
• Header threshold value – 0.1: It represents a threshold for removing the header.
• Footer threshold value – 0.9: It represents a threshold for removing the footer.
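The predefined thresholds above lend themselves to a simple configuration table. The following sketch shows one way such thresholds might be held and selected by paper size; the key names and the helper function are illustrative, not from the specification:

```python
# Hypothetical configuration of the predefined extraction thresholds
# listed above; values are as stated, key names are illustrative.
EXTRACTION_THRESHOLDS = {
    "document_top_threshold": 3,        # word-spacing threshold
    "different_location_threshold": 10,
    "column_line_spacing_y": 12,        # vertical gap before splitting columns
    "column_line_spacing_x": 12,        # horizontal gap before splitting columns
    "block_word_spacing_y": 1.45,       # non-A4 paper
    "block_word_spacing_y_a4": 3.00,    # A4 paper
    "block_word_spacing_x": 1.45,
    "block_word_spacing_x_a4": 3.00,
    "artwork_line_spacing_x": 1.45,
    "artwork_line_spacing_y": 1.45,
    "same_line_spacing": 2.0,
    "header_threshold": 0.1,            # relative page height for header removal
    "footer_threshold": 0.9,            # relative page height for footer removal
}

def word_spacing_threshold(axis: str, is_a4: bool) -> float:
    """Pick the block word-spacing factor for the given axis and paper size."""
    key = f"block_word_spacing_{axis}" + ("_a4" if is_a4 else "")
    return EXTRACTION_THRESHOLDS[key]
```

Keeping the thresholds in one mapping like this would also make the per-industry customization described below a matter of swapping configuration tables.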

The text extraction unit 108 also defines and sets tolerance values (for x and y) for each of the plurality of predefined thresholds so that words are not combined with each other during extraction of the text from the source document and the target document. In an embodiment of the present disclosure, the GUI helps the user 104 customize the plurality of predefined thresholds based on the type of the source document and the target document and on the requirements of a particular industry. For example, one set of thresholds may be defined for the food industry and a different one for the pharmaceutical industry, thereby catering to all customer needs and scenarios.

After extracting text from the source document and the target document, the text extraction unit 108 captures a location or index of the extracted text. The text extraction unit 108 is further configured to extract a plurality of tables from each determined page of the source document and from each determined page of the target document. The text extraction unit 108 first identifies intersections of lines and rectangles to determine tables on each page of the source document and each page of the target document. The text extraction unit 108 then checks if any of the extracted text lies within the determined tables. If an extracted text lies within a determined table, the text extraction unit 108 extracts rows for that extracted text and cells for the extracted rows. The text extraction unit 108 also extracts metadata related to the extracted text, rows, and cells. The metadata may include, but is not limited to, position coordinates of the text, font size, page number, and orientation of the text. In an embodiment of the present disclosure, the text extraction unit 108 extracts the tables using an object detection system. In another embodiment of the present disclosure, the text extraction unit 108 may extract the tables using any known technique.
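The containment check described above (deciding whether extracted text lies within a determined table) can be sketched with bounding boxes. This is an illustration only; the `(x0, y0, x1, y1)` coordinate convention and the sample boxes are assumptions:

```python
# A minimal sketch of the containment check: given a table's bounding box
# (found from line/rectangle intersections) and a word's bounding box,
# decide whether the word belongs to that table.
# Boxes are (x0, y0, x1, y1) tuples; names and values are illustrative.

def word_in_table(word_bbox, table_bbox):
    wx0, wy0, wx1, wy1 = word_bbox
    tx0, ty0, tx1, ty1 = table_bbox
    # the word lies in the table only if its box is fully enclosed
    return tx0 <= wx0 and ty0 <= wy0 and wx1 <= tx1 and wy1 <= ty1

table = (50, 100, 400, 300)
assert word_in_table((60, 110, 120, 125), table)   # word inside the table
assert not word_in_table((10, 10, 40, 25), table)  # word outside the table
```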

The text extraction unit 108 is further configured to form blocks for the extracted text and each of the extracted plurality of tables in the source document and the target document based on the extracted text and a plurality of predefined criteria. The text extraction unit 108 uses the following predefined criteria to form blocks:
• if an index of a first word or text on a first page (of the source document and the target document) is zero, a first block is formed,
• if a page number of a current word on a current page (the page currently considered for comparison) is not equal to a page number of a word on a previous page (the page previous to the current page), a new block is formed,
• if a current word is upright and a previous word is not upright, or vice versa, a new block is formed,
• if a current word is upright and a previous word is also upright, the same block is continued,
• if a current word has ‘.’ at the end, and a difference between the lower x coordinate of the current word and that of a previous word is less than the document top threshold, the same block is continued,
• if a current word is bold and a previous word is not bold, the same block is continued,
• if a difference between the lower y coordinate of a current word and that of a previous word is greater than the current word's font size multiplied by the block factor, a new block is formed,
• if a current word has ‘.’ at the end, and a difference between the lower y coordinate of the current word and that of a previous word is less than the document top threshold, the same block is continued.
Once the blocks have been formed in the source document and the target document, the text extraction unit 108 sends the source document blocks and the target document blocks to the text comparison unit 110.
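The block-formation criteria above can be sketched as a decision function over consecutive words. This is a simplified interpretation: the `Word` record, the default threshold values, and the subset of rules shown (page change, orientation change, sentence continuation, vertical gap) are illustrative:

```python
from dataclasses import dataclass

# Hypothetical record carrying the word metadata the extraction unit captures.
@dataclass
class Word:
    text: str
    page: int
    upright: bool
    lower_y: float
    font_size: float

def starts_new_block(cur: Word, prev: Word,
                     block_factor: float = 1.45, doc_top: float = 3.0) -> bool:
    """Decide whether `cur` should start a new block after `prev`."""
    if cur.page != prev.page:            # word moved to a different page
        return True
    if cur.upright != prev.upright:      # orientation changed
        return True
    # a word ending in '.' close to the previous word continues the block
    if cur.text.endswith(".") and abs(cur.lower_y - prev.lower_y) < doc_top:
        return False
    # vertical gap larger than font size * block factor starts a new block
    if abs(cur.lower_y - prev.lower_y) > cur.font_size * block_factor:
        return True
    return False
```

In a full implementation, each criterion would be driven by the configured thresholds rather than the hard-coded defaults shown here.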

The text comparison unit 110 is configured to receive the source document blocks and the target document blocks from the text extraction unit 108 and to find dissimilarities between the source document and the target document by comparing each of the target document blocks inside each of the source document blocks based on a plurality of thresholds. The text comparison unit 110 compares two texts in the source document block and the target document block on various factors, such as font size of the text, orientation of the text, spacing between text (block spacing, line spacing), etc. The text comparison unit 110 uses the following plurality of thresholds for comparing the target document blocks within the source document blocks:
• match threshold - 0.47
• reverse match threshold - 0.65
• match distance - 1000
• line spacing factor - 1.25
• match max bits - 0
• whitespace allowable percent - 70
• block word spacing factor for y axis for A4 paper - 3.00
• block word spacing factor for x axis - 1.45
• block word spacing factor for x axis for A4 paper - 3.00
• artwork line spacing for x axis - 1.45
• artwork line spacing for y axis - 1.45
• same line spacing - 2.0

In an embodiment of the present disclosure, the text comparison unit 110 uses the Diff-Match-Patch algorithm by Google to find dissimilarities between the source document and the target document. The Diff-Match-Patch algorithm compares two blocks of text and returns a list of differences. For example, given a string to search for, the Diff-Match-Patch algorithm finds the best fuzzy match in a text block, giving weightage to both accuracy and location. A list of patches is then applied to the text in case the underlying text does not match.
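Google's Diff-Match-Patch itself is a third-party library; the kind of difference list it produces can be illustrated with Python's standard `difflib`, used here purely as a stand-in (not the patent's exact algorithm, and without fuzzy matching). The sample strings are hypothetical:

```python
import difflib

# Stand-in illustration: stdlib difflib compares a source block and a
# target block and reports the non-matching operations, similar in spirit
# to the difference list returned by Diff-Match-Patch.
source = "Each tablet contains 500 g paracetamol"
target = "Each tablet contains 500 mg paracetamol"

matcher = difflib.SequenceMatcher(None, source, target)
diffs = [(op, source[i1:i2], target[j1:j2])
         for op, i1, i2, j1, j2 in matcher.get_opcodes()
         if op != "equal"]
print(diffs)  # a single 'insert' of the missing character
```

A dissimilarity such as "g" versus "mg" surfaces as one insert operation, which the annotation step below could then colour as a text variation.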

In another embodiment of the present disclosure, the text comparison unit 110 uses the Help-loc technique to find dissimilarities between the source document and the target document. The Help-loc technique uses reverse matching between the source document blocks and the target document blocks to find similarities and dissimilarities between them, as a small target document block matched within a large source document block will have a corresponding backward location of the match based on the forward location, the length of the small target document block, and the length of the source document block, represented as equation 1:
Help-loc = length (source document block) - 1 - (forward location + length (target document block) - 1) ----------------------------(1)
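As a sketch of the reverse-matching idea, assume a target block found at forward location f inside a source block of length n reappears at n − f − m in the reversed source block, where m is the target length. This is one interpretation of equation (1); exact `str.find` stands in for the fuzzy matcher, and the sample blocks are hypothetical:

```python
# Interpretation of the Help-loc reverse-matching relation using exact
# string search. A target block at forward location f in a source block of
# length n is expected at n - f - m in the reversed source (m = target length).
source_block = "Store below 25 degrees C away from light"
target_block = "25 degrees C"

forward_loc = source_block.find(target_block)
reverse_loc = source_block[::-1].find(target_block[::-1])

expected_reverse = len(source_block) - forward_loc - len(target_block)
assert reverse_loc == expected_reverse
print(forward_loc, reverse_loc)
```

A forward match whose reverse match lands far from this expected location would fail the reverse match threshold listed above.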
FIG. 3 illustrates an exemplary reverse matching method used in the Help-loc technique, in accordance with an embodiment of the present disclosure. The text comparison unit 110 uses the Help-loc technique to find a match of the target document blocks within the source document blocks. At step 302, when a first match is found between a target document block and a source document block, a forward location of the match is noted. At step 304, based on the Help-loc (as represented in equation (1)) and the forward location of the match, a reverse matching is performed for the target document block within the source document block.

At step 306, a location of an actual matched string is then found in the target document block and the source document block class objects. At step 308, a check is performed if the actual matched string is empty. If the actual matched string is empty, the method moves to step 310. If the actual matched string is not empty, the method moves back to step 306.

At step 310, dissimilarities are found between the source document block and the actual matched string from the target document block. At step 312, indexes are provided to the dissimilarities found for annotation to indicate different dissimilarities with different colour schemes. At step 314, unanalysed text in the target document blocks is formed into a new block and the method continues to step 302.

The text comparison unit 110 finds a plurality of dissimilarities between the source document blocks and the target document blocks and annotates each of the plurality of dissimilarities found with a different colour scheme. The text comparison unit 110 annotates the plurality of dissimilarities found with the following colours:
• Red colour – Red colour indicates dissimilarities in text between the source document blocks and the target document blocks, and includes variations in a text/word,
• Yellow colour – Yellow colour indicates dissimilarities in case (uppercase or lowercase) of text between the source document blocks and the target document blocks,
• Blue colour – Blue colour indicates dissimilarities in format (bold, italics) of the text between the source document blocks and the target document blocks,
• Pink colour – Pink colour indicates text with colour difference between the source document blocks and the target document blocks,
• Gray colour in Source Document – Gray colour in the source document indicates text that is present only in the source document blocks and not in the target document blocks,
• Gray colour in Target Document – Gray colour in target document indicates text that is present in the target document blocks and not in the source document blocks, and
• Black colour – Black colour indicates any unknown difference between the source document blocks and the target document blocks.
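The colour scheme above amounts to a mapping from dissimilarity type to annotation colour. A minimal sketch, in which the dissimilarity-type keys are hypothetical names rather than terms from the specification:

```python
# Illustrative mapping of the annotation colour scheme described above;
# the dissimilarity-type keys are hypothetical names.
ANNOTATION_COLOURS = {
    "text_variation": "red",       # variations in a text/word
    "case_difference": "yellow",   # uppercase/lowercase differences
    "format_difference": "blue",   # bold/italics differences
    "colour_difference": "pink",   # text colour differences
    "only_in_source": "gray",      # text present only in the source blocks
    "only_in_target": "gray",      # text present only in the target blocks
}

def annotation_colour(dissimilarity_type: str) -> str:
    """Return the annotation colour; unknown differences fall back to black."""
    return ANNOTATION_COLOURS.get(dissimilarity_type, "black")
```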

In an embodiment of the present disclosure, the text comparison unit 110 allows the user 104 to customize, through the GUI, the plurality of thresholds for comparing the source document blocks and the target document blocks. For example, when the source document and the target document are Brief documents, a small piece of text such as “sodium” may appear twice in the artwork (in the ingredients and in a no-sodium statement). In such a scenario, the text comparison unit 110 performs highest-to-lowest level block ordering to get correct matches between the source document blocks and the target document blocks. Further, when the source document and the target document are in the form of a leaflet, no such ordering is required, because the ordering is generally the same in a leaflet for both the source document and the target document.

The text comparison unit 110 sends the plurality of dissimilarities found between the source document blocks and the target document blocks, along with their annotations, to the output unit 112. The output unit 112 is configured to receive the plurality of dissimilarities and their annotations from the text comparison unit 110 and to display them on the user device 102, thereby making it easy for the user 104 to view the dissimilarities between the source document and the target document. The output unit 112 may display the plurality of dissimilarities by highlighting them in their annotated colour, forming a box of the annotated colour around them, underlining them in their annotated colour, etc.

FIG. 4 illustrates an exemplary display 400 on the user device 102 displaying a dissimilarity between the source document 402 and the target document 404, in accordance with an embodiment of the present disclosure. As illustrated in FIG. 4, since the dissimilarity is in text (g in the source document and mg in the target document), the text comparison unit 110 annotates the dissimilarity found with a red highlighted box, which the output unit 112 displays on the user device 102.
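The FIG. 4 example (“g” versus “mg”) can be reproduced with a generic text differ. The sketch below uses Python's standard `difflib` as a stand-in for the text comparison unit's own matcher; the block texts are hypothetical illustrations, not content from the figure.

```python
import difflib

# Hypothetical block texts illustrating the FIG. 4 dissimilarity ("g" vs "mg")
source_block = "Sodium 120 g per serving"
target_block = "Sodium 120 mg per serving"

matcher = difflib.SequenceMatcher(None, source_block, target_block)
# Keep every opcode that is not an exact match between the two blocks
dissimilarities = [
    (tag, source_block[i1:i2], target_block[j1:j2])
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag != "equal"
]
# A dissimilarity in text such as this one would be annotated with a red
# highlighted box per the disclosed colour scheme.
```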

FIG. 5 illustrates a method 500 for proofreading of artworks, in accordance with an embodiment of the present disclosure. The step 502 includes receiving, by the text extraction unit 108, a source document and a target document from a user device 102. The step 504 includes analyzing, by the text extraction unit 108, the source document and the target document individually to determine the number of pages of the source document and the target document. The step 506 includes extracting, by the text extraction unit 108, text from each determined page of the source document and from each determined page of the target document based on a plurality of predefined thresholds. The plurality of predefined thresholds are defined depending on the paper size of the source document and the target document, and the line spacing, word spacing, and column spacing in the source document and the target document.

The step 508 includes capturing, by the text extraction unit 108, location or index of the extracted text. The step 510 includes extracting, by the text extraction unit 108, a plurality of tables from each determined page of the source document and for each determined page of the target document. The step 512 includes forming, by the text extraction unit 108, blocks for the extracted text and each of the extracted plurality of tables in the source document and the target document based on the extracted text and a plurality of predefined criteria.

The step 514 includes receiving, by the text comparison unit 110, the source document blocks and the target document blocks from the text extraction unit 108. The step 516 includes finding, by the text comparison unit 110, a plurality of dissimilarities between the source document and the target document by comparing each of the target document blocks inside each of the source document blocks based on a plurality of thresholds. The text comparison unit 110 compares two texts in the source document block and the target document block on various factors, such as font size of the text, orientation of the text, spacing between text (block spacing, line spacing), etc.

The step 518 includes annotating, by the text comparison unit 110, each of the found plurality of dissimilarities with a different colour scheme. The step 520 includes displaying, by the output unit 112, the plurality of dissimilarities along with their annotation on the user device 102.
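The annotation step 518 can be illustrated as a simple lookup from dissimilarity type to the colour scheme disclosed herein. The category keys and the `annotate` helper below are hypothetical names introduced for this sketch, not part of the disclosure.

```python
# Mapping of dissimilarity types to the disclosed annotation colour scheme;
# the string keys are illustrative labels, not claimed terminology.
ANNOTATION_COLOURS = {
    "text": "red",
    "case": "yellow",
    "format": "blue",
    "colour": "pink",
    "source_only": "gray",
    "target_only": "gray",
    "unknown": "black",
}

def annotate(dissimilarities):
    """Attach the disclosed colour to each found dissimilarity record;
    unrecognized types fall back to black (unknown difference)."""
    return [
        {**d, "colour": ANNOTATION_COLOURS.get(d["type"], "black")}
        for d in dissimilarities
    ]
```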

The system and method for proofreading of artworks disclosed in the present disclosure have numerous advantages. The system and method proofread artworks by comparing text, thereby increasing accuracy and reducing the time taken for proofreading of artworks. Further, the system and method find dissimilarities automatically, without any human intervention, well before the goods are packaged.

The present disclosure provides an efficient mechanism for an automated system and method for proofreading of artworks to identify dissimilarities between an original document and a printed copy of the original document of an artwork. The system and the method disclosed in the present disclosure efficiently compute differences in text, case, and format between a source document and a target document, thereby providing accurate, reliable, and efficient artwork proofreading results.

Further, the graphical user interface (GUI) incorporated within the system helps reduce user effort to a great extent. The interactive GUI assists users in performing their job more efficiently with reduced effort. The interactive GUI allows the user to customize the plurality of predefined thresholds based on the type of the source document and the target document and on the requirements of a particular industry. For example, one set of thresholds may be defined for the food industry and another for the pharmaceutical industry, thereby catering to all customer needs and scenarios. This provides the user the ability to efficiently compare multiple documents and artworks, identify differences, and monitor visualization effects applied to the differences with comments, across many diverse industries such as consumer packaged goods, pharmaceuticals, etc.

The embodiments herein and the various features and advantageous details thereof are explained with reference to the non-limiting embodiments in the description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.

The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments.
It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the embodiments as described herein.

Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.

The use of the expression “at least” or “at least one” suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the disclosure to achieve one or more of the desired objects or results.

Any discussion of documents, acts, materials, devices, articles and the like that has been included in this specification is solely for the purpose of providing a context for the disclosure.

It is not to be taken as an admission that any or all of these matters form a part of the prior art base or were common general knowledge in the field relevant to the disclosure as it existed anywhere before the priority date of this application.

The numerical values mentioned for the various physical parameters, dimensions or quantities are only approximations and it is envisaged that the values higher/lower than the numerical values assigned to the parameters, dimensions or quantities fall within the scope of the disclosure, unless there is a statement in the specification specific to the contrary.

While considerable emphasis has been placed herein on the particular features of this disclosure, it will be appreciated that various modifications can be made, and that many changes can be made in the preferred embodiments without departing from the principles of the disclosure. These and other modifications in the nature of the disclosure or the preferred embodiments will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the disclosure and not as a limitation.
CLAIMS: I/We Claim:
1. An automated system (100) for proofreading of artworks comprising:
a text extraction unit (108) configured to:
receive a source document and a target document from a user device (102);
analyze the received source document and the target document to determine a number of pages of the received source document and the target document;
extract text from each determined page of the source document and for each determined page of the target document based on a plurality of predefined thresholds, and capture location index of the extracted text;
extract a plurality of tables from each determined page of the source document and for each determined page of the target document; and
form blocks for the extracted text and each of the extracted plurality of tables in the source document and the target document based on the extracted text and a plurality of predefined criteria;
a text comparison unit (110) configured to:
receive the source document blocks and the target document blocks from the text extraction unit (108);
find a plurality of dissimilarities between the source document and the target document by comparing each of the target document blocks inside each of the source document blocks based on a plurality of thresholds; and
annotate each of the plurality of dissimilarities found with a colour scheme; and
an output unit configured to:
display the plurality of dissimilarities found along with their annotations on the user device (102).

2. The system (100) as claimed in claim 1, wherein the source document is a scanned copy of an original artwork prepared for a packaged good and the target document is a scanned copy of a printed version of the original artwork.

3. The system (100) as claimed in claim 1, wherein the text extraction unit (108) checks if the received source document and the target document are in a PDF format and converts the received source document and the target document into a PDF format if they are not in a PDF format before analyzing the received source document and the target document.

4. The system (100) as claimed in claim 1, wherein the text comprises a word, a plurality of words, lines, or blocks.

5. The system (100) as claimed in claim 1, wherein the plurality of predefined thresholds are defined based on a paper size of the source document and the target document, and line spacing, word spacing, and column spacing in the source document and the target document.

6. The system (100) as claimed in claim 1, wherein the plurality of thresholds comprises: Document top threshold = 3, Different location threshold =10, Column line spacing factor for y axis = 12, Column line spacing factor for x axis = 12, Block word spacing factor for y axis = 1.45, Block word spacing factor for y axis for A4 paper = 3.00, Block word spacing factor for x axis = 1.45, Block word spacing factor for x axis for A4 paper = 3.00, Artwork line spacing for x axis = 1.45, Artwork line spacing for y axis = 1.45, Same line spacing = 2.0, Header threshold value = 0.1, and Footer threshold value = 0.9.

7. The system (100) as claimed in claim 1, wherein the text extraction unit (108) extracts the plurality of tables in the source document and the target document by:
identifying intersection of lines and rectangles to determine tables on each page of the source document and each page of the target document;
checking if the text extracted lies within the determined tables;
extracting rows for the extracted text that lies within the determined tables and extracting cells for the extracted rows; and
extracting metadata related to the extracted text, rows and cells, wherein the metadata comprises position coordinates of the text, font size, page number and orientation of the text.

8. The system (100) as claimed in claim 1, wherein the text extraction unit (108) uses an object detection system for extracting the plurality of tables.

9. The system (100) as claimed in claim 1, wherein the predefined criteria for forming blocks comprises:
• if an index of a first word or text on a first page is zero, a first block is formed,
• if a page number of a current word on a current page, is not equal to a page number of a word on a previous page, a new block is formed,
• if a current word is upright and a previous word is not upright or vice-versa, a new block is formed,
• if a current word is upright and a previous word is also upright, the same block is continued,
• if a current word has ‘.’ at the end, and if a difference between lower x coordinate of the current word and a previous word is less than a document top threshold, then same block is continued,
• if a current word is bold and a previous word is not bold, then same block is continued,
• if a difference between lower y coordinate of a current word and a previous word is greater than the multiplication of current word font size and block factor, then a new block is formed, and
• if a current word has ‘.’ at the end, and if a difference between lower y coordinate of current word and a previous word is less than the document top threshold, then same block is continued.
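A subset of the block-formation criteria above can be sketched as follows. The word-record fields (`page`, `upright`, `y0`, `font_size`) are assumed shapes for the extracted-word metadata, the block factor value is taken from the disclosed spacing factors, and the omitted criteria (the ‘.’-ending and bold rules) would extend the same pattern.

```python
def starts_new_block(current, previous):
    """A simplified, illustrative sketch of the claimed block-formation
    criteria; word records are plain dicts and only four of the disclosed
    rules are shown."""
    BLOCK_FACTOR = 1.45  # disclosed block word spacing factor
    if previous is None:
        # first word on the first page forms the first block
        return True
    if current["page"] != previous["page"]:
        # a page change starts a new block
        return True
    if current["upright"] != previous["upright"]:
        # an orientation change (upright vs. not) starts a new block
        return True
    if current["y0"] - previous["y0"] > current["font_size"] * BLOCK_FACTOR:
        # a vertical gap larger than font size times the block factor
        # starts a new block
        return True
    return False
```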

10. The system (100) as claimed in claim 1, wherein the text comparison unit (110) compares texts in the source document blocks and the target document blocks by comparing font size of text, orientation of the text, and spacing between text.

11. The system (100) as claimed in claim 1, wherein the plurality of thresholds used by the text comparison unit (110) comprises: match threshold = 0.47, reverse match threshold = 0.65, match distance = 1000, line spacing factor = 1.25, match max bits = 0, whitespace allowable percent = 70, block word spacing factor for y axis for A4 paper = 3.00, block word spacing factor for x axis = 1.45, block word spacing factor for x axis for A4 paper = 3.00, artwork line spacing for x axis = 1.45, artwork line spacing for y axis = 1.45, and same line spacing = 2.0.

12. The system (100) as claimed in claim 1, wherein the text comparison unit (110) uses a Diff-Match-Patch algorithm by Google to compare target document blocks within the source document blocks.

13. The system (100) as claimed in claim 1, wherein the text comparison unit (110) uses Help-loc technique to compare target document blocks within the source document blocks.

14. The system (100) as claimed in claim 1, wherein the text comparison unit (110) annotates the dissimilarities found with the following colour schemes:
• Red colour for dissimilarities in text,
• Yellow colour for dissimilarities in case (uppercase or lowercase) of text,
• Blue colour for dissimilarities in format (bold, italics) of the text,
• Pink colour for text with colour difference,
• Gray colour (in Source Document) for text that is present only in source documents blocks and not in the target document blocks,
• Gray colour (in Target Document) for text that is present in the target document blocks and not in the source document blocks, and
• Black colour for an unknown difference.

15. The system (100) as claimed in claim 1, wherein the system (100) provides a graphical user interface (GUI) to a user (104) to customize the plurality of predefined thresholds, the plurality of predefined criteria, and the plurality of thresholds based on the type of the source document and the target document and based on requirements of a particular industry, and
wherein the GUI allows the user (104) to manually select areas from the source document and the target document which need to be compared to find dissimilarities between the source document and the target document.

16. A method (500) for proofreading of artworks, the method (500) comprising:
receiving, by a text extraction unit (108), a source document and a target document from a user device (102);
analyzing, by the text extraction unit (108), the source document and the target document individually to determine the number of pages of the source document and the target document;
extracting, by the text extraction unit (108), text from each determined page of the source document and for each determined page of the target document based on a plurality of predefined thresholds;
capturing, by the text extraction unit (108), location or index of the extracted text;
extracting, by the text extraction unit (108), a plurality of tables from each determined page of the source document and for each determined page of the target document;
forming, by the text extraction unit (108), blocks for the extracted text and each of the extracted plurality of tables in the source document and the target document based on the extracted text and a plurality of predefined criteria;
receiving, by a text comparison unit (110), the source document blocks and the target document blocks from the text extraction unit (108);
finding, by the text comparison unit (110), a plurality of dissimilarities between the source document and the target document by comparing each of the target document blocks inside each of the source document blocks based on a plurality of thresholds;
annotating, by the text comparison unit (110), each of the found plurality of dissimilarities with a different colour scheme; and
displaying, by an output unit (112), the plurality of dissimilarities along with their annotation on the source document and the target document.

17. The method (500) as claimed in claim 16, wherein the plurality of thresholds comprises: Document top threshold = 3, Different location threshold =10, Column line spacing factor for y axis = 12, Column line spacing factor for x axis = 12, Block word spacing factor for y axis = 1.45, Block word spacing factor for y axis for A4 paper = 3.00, Block word spacing factor for x axis = 1.45, Block word spacing factor for x axis for A4 paper = 3.00, Artwork line spacing for x axis = 1.45, Artwork line spacing for y axis = 1.45, Same line spacing = 2.0, Header threshold value = 0.1, and Footer threshold value = 0.9.

18. The method (500) as claimed in claim 16, wherein the predefined criteria for forming blocks comprises:
• if an index of a first word or text on a first page is zero, a first block is formed,
• if a page number of a current word on a current page, is not equal to a page number of a word on a previous page, a new block is formed,
• if a current word is upright and a previous word is not upright or vice-versa, a new block is formed,
• if a current word is upright and a previous word is also upright, the same block is continued,
• if a current word has ‘.’ at the end, and if a difference between lower x coordinate of the current word and a previous word is less than a document top threshold, then same block is continued,
• if a current word is bold and a previous word is not bold, then same block is continued,
• if a difference between lower y coordinate of a current word and a previous word is greater than the multiplication of current word font size and block factor, then a new block is formed, and
• if a current word has ‘.’ at the end, and if a difference between lower y coordinate of current word and a previous word is less than the document top threshold, then same block is continued.

19. The method (500) as claimed in claim 16, wherein the plurality of thresholds used by the text comparison unit (110) comprises: match threshold = 0.47, reverse match threshold = 0.65, match distance = 1000, line spacing factor = 1.25, match max bits = 0, whitespace allowable percent = 70, block word spacing factor for y axis for A4 paper = 3.00, block word spacing factor for x axis = 1.45, block word spacing factor for x axis for A4 paper = 3.00, artwork line spacing for x axis = 1.45, artwork line spacing for y axis = 1.45, and same line spacing = 2.0.

20. The method (500) as claimed in claim 16, wherein the method comprises using Help-loc technique to compare target document blocks within the source document blocks, and
wherein the Help-loc technique comprises:
(1) finding a first match between a target document block and a source document block and noting a forward location of the match found;
(2) performing a reverse matching for the target document block with the source document block;
(3) finding a location of an actual match string in the target document block;
(4) checking if the actual match string is empty, wherein dissimilarities are found between the source document block and the actual matched string from the target document block when the actual match string is empty, and moving to step (3) when the actual match string is not empty;
(5) providing indexes to the dissimilarities found for annotation to indicate different dissimilarities with different colour schemes; and
(6) forming a new block for unanalyzed text in the target document and continuing to step (1).
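The claimed Help-loc flow can be loosely illustrated as follows. Plain substring search stands in for the forward and reverse matchers of steps (1) and (2), and the word-level granularity and record shape are assumptions made for brevity; the disclosure's actual matching is fuzzy and block-based.

```python
def help_loc(source_block: str, target_block: str):
    """An illustrative, simplified sketch of the Help-loc loop: match each
    target fragment inside the source block forward and in reverse, and
    record (with an index for annotation) any fragment with no match."""
    dissimilarities = []
    for index, word in enumerate(target_block.split()):
        # (1) forward match of the target fragment inside the source block
        forward_loc = source_block.find(word)
        # (2) reverse match, scanning from the end of the source block
        reverse_loc = source_block.rfind(word)
        if forward_loc < 0 and reverse_loc < 0:
            # (4)-(5) no actual match string: record and index the
            # dissimilarity so it can later be annotated with a colour
            dissimilarities.append({"index": index, "text": word})
    return dissimilarities
```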

Documents

Application Documents

# Name Date
1 202241029636-STATEMENT OF UNDERTAKING (FORM 3) [23-05-2022(online)].pdf 2022-05-23
2 202241029636-PROVISIONAL SPECIFICATION [23-05-2022(online)].pdf 2022-05-23
3 202241029636-PROOF OF RIGHT [23-05-2022(online)].pdf 2022-05-23
4 202241029636-POWER OF AUTHORITY [23-05-2022(online)].pdf 2022-05-23
5 202241029636-FORM FOR SMALL ENTITY(FORM-28) [23-05-2022(online)].pdf 2022-05-23
6 202241029636-FORM FOR SMALL ENTITY [23-05-2022(online)].pdf 2022-05-23
7 202241029636-FORM 1 [23-05-2022(online)].pdf 2022-05-23
8 202241029636-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [23-05-2022(online)].pdf 2022-05-23
9 202241029636-DRAWINGS [23-05-2022(online)].pdf 2022-05-23
10 202241029636-DECLARATION OF INVENTORSHIP (FORM 5) [23-05-2022(online)].pdf 2022-05-23
11 202241029636-DRAWING [02-05-2023(online)].pdf 2023-05-02
12 202241029636-CORRESPONDENCE-OTHERS [02-05-2023(online)].pdf 2023-05-02
13 202241029636-COMPLETE SPECIFICATION [02-05-2023(online)].pdf 2023-05-02
14 202241029636-PostDating-(12-06-2023)-(E-6-192-2023-CHE).pdf 2023-06-12
15 202241029636-APPLICATIONFORPOSTDATING [12-06-2023(online)].pdf 2023-06-12