
Method And System For Targeted Extraction Of Labelled Listings And Information From A Product Flyer

Abstract: Extracting valuable insights from retail or marketing flyers is crucial for both retailers and consumers. Existing techniques rely on training flyer images and struggle with abrupt variations in flyer layouts and designs. A flyer is received in a portable document format as an input document. Each page of the input document is processed by a mask region-based convolutional neural network (R-CNN) and converted into images using a PDF processing library. Regions of Interest (ROIs) are identified for each image based on attribute classes of the products by using (i) a class overlapping measure, and (ii) a class distance measure. Textual information is extracted from the identified ROIs within each image by implementing optical character recognition (OCR) based on the attribute classes. The extracted textual information is segmented by a price extraction technique to create bounding boxes around each character within a Price ROI of a Product ROI. Pricing information is extracted using the OCR within height-based clusters. [To be published with FIG. 2]


Patent Information

Application #
Filing Date
07 March 2024
Publication Number
37/2025
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

Inventors

1. VASUDEVAN, Bagya Lakshmi
Tata Consultancy Services Limited, Module-4, 3rd Floor South Block, Chennai One IT SEZ Phase-2, 200 Feet Radial Rd, MCN Nagar Extension, Pallavaram, Thoraipakkam, Kotivakkam, Chennai 600097, Tamil Nadu, India
2. ABRAHAM, Kuruvilla
Tata Consultancy Services Limited, Lucerna Tower, Plot A2B, Sector 125, Noida 201303, Uttar Pradesh, India
3. SHARMA, Gaurav
Tata Consultancy Services Limited, PTI, No. 4, Sansad Marg, Connaught Place, New Delhi 110001, Delhi, India
4. CHATTERJEE, Kallol
Tata Consultancy Services Limited, Lucerna Tower, Plot A2B, Sector 125, Noida 201303, Uttar Pradesh, India
5. CHAKRAPANI, Chakrapani
Tata Consultancy Services Limited, Lucerna Tower, Plot A2B, Sector 125, Noida 201303, Uttar Pradesh, India
6. SOM, Suvodip
Tata Consultancy Services Limited, Gitanjali Park. Plot-II/F/3 Action Area -II, Gitanjali Road, Newtown, Kolkata 700156, West Bengal, India

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
METHOD AND SYSTEM FOR TARGETED EXTRACTION OF LABELLED LISTINGS AND INFORMATION FROM A PRODUCT FLYER
Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description:
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
[001]
The disclosure herein generally relates to data management, and, more particularly, to a method and system for targeted extraction of labelled listings and information from a product flyer.
BACKGROUND
[002]
In an era marked by digitization and information abundance, ubiquitous promotional materials i.e., retail or marketing flyers persist as potent tools which influence consumer’s behaviour. Marketing flyers are physical advertisements that provide information about a store’s offerings. They can encourage customers to visit physical stores or shop online. The ubiquitous promotional materials bridge the gap between merchants and customers, yet their full potential remains largely untapped. Extracting valuable insights from retail or marketing flyers is crucial for both retailers and consumers. The retailers can use the data to stay competitive by analyzing pricing strategies, optimizing product placement, and adapting to market trends. The consumers, on the other hand, can use the extracted product prices to make informed purchasing decisions, compare offers across different stores, and save money. The business problem surrounding retail flyer information extraction lies in the inefficiency of manually gathering and analyzing data from diverse and multilingual retail flyers. Traditional methods of extracting relevant details such as pricing, promotions, and product information are time-consuming and prone to errors. Retailers face the challenge of processing vast amounts of flyer data across various formats, languages, font types, color, and text positioning and designs. Extracting actionable insights swiftly from the diverse materials becomes a bottleneck for timely decision-making, impacting promotional strategies, inventory management, and pricing competitiveness. Existing algorithms often falter when faced with dynamic changes in flyer layouts or designs, leading to inaccuracies and inefficiencies. The existing techniques rely on training flyer images, which struggle with abrupt variations in the flyer layouts and designs.
SUMMARY
[003]
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one aspect, a method of extracting labelled listings and information from a product flyer is provided. The processor implemented method includes: receiving via one or more hardware processors, a flyer as an input document; processing, via the one or more hardware processors, each page of the input document by a mask region-based convolutional neural network (R-CNN) to convert into one or more images based on a PDF processing library; identifying, via the one or more hardware processors, one or more Regions of Interest (ROIs) for each image based on one or more attribute classes of the one or more products by using (i) a class overlapping measures (COM), and (ii) a class distance measures (CDM); extracting, via the one or more hardware processors, a textual information from the identified ROIs within each image by implementing an optical character recognition (OCR) based on the one or more attribute classes; segmenting, via the one or more hardware processors, the extracted textual information by a price extraction technique to create one or more bounding boxes around each character within a Price ROI of a Product ROI; and extracting, via the one or more hardware processors, a pricing information using the OCR within height-based clusters. The flyer includes information associated with one or more products. The one or more regions of interest (ROIs) correspond to a relationship among one or more attribute classes associated with the flyer. The one or more bounding boxes are created by identifying and isolating individual segments or characters within each cropped image using an edge detection technique. A height of the one or more bounding boxes are computed by used regions in the cropped image. The height of the one or more bounding boxes are clustered based on similarity using a K-means clustering technique. A Paddle OCR extracts the pricing information within a separate grouped character. The separate grouped character are concatenated to extract the pricing information.
[004]
In an embodiment, the one or more entities of the one or more products corresponds to (i) a name, (ii) a price, and (iii) a discount. In an embodiment, the class overlapping measures (COM) corresponds to one or more overlapping ROIs among the one or more distinct class within a product region of the image. In an embodiment, the class overlapping measures (COM) is interpreted by Intersection over Minimum (IoM). In an embodiment, the IoM corresponds to a ratio between an intersection area and a minimum area between one or more bounding boxes. In an embodiment, the one or more distinct classes are (i) a complete overlapped position, or (ii) a partial overlapped position. In an embodiment, the class distance measures (CDM) corresponds to a distance between one or more distinct ROIs to generate a distance vector. In an embodiment, each bounding box is accompanied by one or more centroid coordinates to point a location of each detected class. In an embodiment, an Euclidean distance between corresponding centroids is computed to associate nearby relation using one or more coordinate values. In an embodiment, the one or more ranks are marked to each class based on the computed distance vector. Each calculated distance is ranked to assign one or more proximity scores. In an embodiment, the association of classes is determined by each proximity score of corresponding ROIs. In an embodiment, the extracted texts from a product name ROIs are input to categorize into entities by implementing a Bidirectional Encoder Representations from Transformers (BERT)-based Named Entity Recognition (NER) model. In an embodiment, the categorized entities are combined with extracted product prices. In an embodiment, the BERT-based NER model is employed which is trained on large number of product description OCR text, for identification of the product name and the description from the OCR extracted text.
[005]
In another aspect, there is provided a system for extraction of labelled listings and information from a product flyer. The system includes a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: receive, a flyer as an input document; process, each page of the input document by a mask region-based convolutional neural network (R-CNN) to convert into one or more images based on a PDF processing library; identify, one or more Regions of Interest (ROIs) for each image based on one or more attribute classes of the one or more products by using (i) a class overlapping measures (COM), and (ii) a class distance measures (CDM); extract, a textual information from the identified ROIs within each image by implementing an optical character recognition (OCR) based on the one or more attribute classes; segment, the extracted textual information by a price extraction technique to create one or more bounding boxes around each character within a Price ROI of a Product ROI; and extract, a pricing information using the OCR within height-based clusters. The flyer includes information associated with one or more products. The one or more regions of interest (ROIs) correspond to a relationship among one or more attribute classes associated with the flyer. The one or more bounding boxes are created by identifying and isolating individual segments or characters within each cropped image using an edge detection technique. A height of the one or more bounding boxes are computed by used regions in the cropped image. The height of the one or more bounding boxes are clustered based on similarity using a K-means clustering technique. A Paddle OCR extracts the pricing information within a separate grouped character. The separate grouped character are concatenated to extract the pricing information.
[006]
In an embodiment, the one or more entities of the one or more products corresponds to (i) a name, (ii) a price, and (iii) a discount. In an embodiment, the class overlapping measures (COM) corresponds to one or more overlapping ROIs among the one or more distinct class within a product region of the image. In an embodiment, the class overlapping measures (COM) is interpreted by Intersection over Minimum (IoM). In an embodiment, the IoM corresponds to a ratio between an intersection area and a minimum area between one or more bounding boxes. In an embodiment, the one or more distinct classes are (i) a complete overlapped position, or (ii) a partial overlapped position. In an embodiment, the class distance measures (CDM) corresponds to a distance between one or more distinct ROIs to generate a distance vector. In an embodiment, each bounding box is accompanied by one or more centroid coordinates to point a location of each detected class. In an embodiment, an Euclidean distance between corresponding centroids is computed to associate nearby relation using one or more coordinate values. In an embodiment, the one or more ranks are marked to each class based on the computed distance vector. Each calculated distance is ranked to assign one or more proximity scores. In an embodiment, the association of classes is determined by each proximity score of corresponding ROIs. In an embodiment, the extracted texts from a product name ROIs are input to categorize into entities by implementing a Bidirectional Encoder Representations from Transformers (BERT)-based Named Entity Recognition (NER) model. In an embodiment, the categorized entities are combined with extracted product prices. In an embodiment, the BERT-based NER model is employed which is trained on large number of product description OCR text, for identification of the product name and the description from the OCR extracted text.
[007]
In yet another aspect, a non-transitory computer readable medium comprising one or more instructions which when executed by one or more hardware processors causes at least one of: receiving, a flyer as an input document; processing, each page of the input document by a mask region-based convolutional neural network (R-CNN) to convert into one or more images based on a PDF processing library; identifying, one or more Regions of Interest (ROIs) for each image based on one or more attribute classes of the one or more products by using (i) a class overlapping measures (COM), and (ii) a class distance measures (CDM); extracting, a textual information from the identified ROIs within each image by implementing an optical character recognition (OCR) based on the one or more attribute classes; segmenting, the extracted textual information by a price extraction technique to create one or more bounding boxes around each character within a Price ROI of a Product ROI; and extracting, a pricing information using the OCR within height-based clusters. The flyer includes information associated with one or more products. The one or more regions of interest (ROIs) correspond to a relationship among one or more attribute classes associated with the flyer. The one or more bounding boxes are created by identifying and isolating individual segments or characters within each cropped image using an edge detection technique. A height of the one or more bounding boxes are computed by used regions in the cropped image. The height of the one or more bounding boxes are clustered based on similarity using a K-means clustering technique. A Paddle OCR extracts the pricing information within a separate grouped character. The separate grouped character are concatenated to extract the pricing information.
[008]
In an embodiment, the one or more entities of the one or more products corresponds to (i) a name, (ii) a price, and (iii) a discount. In an embodiment, the class overlapping measures (COM) corresponds to one or more overlapping ROIs among the one or more distinct class within a product region of the image. In an embodiment, the class overlapping measures (COM) is interpreted by Intersection over Minimum (IoM). In an embodiment, the IoM corresponds to a ratio between an intersection area and a minimum area between one or more bounding boxes. In an embodiment, the one or more distinct classes are (i) a complete overlapped position, or (ii) a partial overlapped position. In an embodiment, the class distance measures (CDM) corresponds to a distance between one or more distinct ROIs to generate a distance vector. In an embodiment, each bounding box is accompanied by one or more centroid coordinates to point a location of each detected class. In an embodiment, an Euclidean distance between corresponding centroids is computed to associate nearby relation using one or more coordinate values. In an embodiment, the one or more ranks are marked to each class based on the computed distance vector. Each calculated distance is ranked to assign one or more proximity scores. In an embodiment, the association of classes is determined by each proximity score of corresponding ROIs. In an embodiment, the extracted texts from a product name ROIs are input to categorize into entities by implementing a Bidirectional Encoder Representations from Transformers (BERT)-based Named Entity Recognition (NER) model. In an embodiment, the categorized entities are combined with extracted product prices. In an embodiment, the BERT-based NER model is employed which is trained on large number of product description OCR text, for identification of the product name and the description from the OCR extracted text.
[009]
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[010]
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[011]
FIG. 1 illustrates a system for targeted extraction of labelled listings and information from a product flyer, according to an embodiment of the present disclosure.
[012]
FIG. 2 illustrates an exemplary block diagram of the system of FIG.1 for targeted extraction of the labelled listings and the information from the product flyer, according to an embodiment of the present disclosure.
[013]
FIG. 3 is an exemplary schematic view illustrating derivation of an intersection over minimum (IoM) to interpret overlapping of two distinct regions of interest (ROI’s), according to an embodiment of the present disclosure.
[014]
FIG. 4A is an exemplary block diagram illustrating a complete overlapping of the two distinct ROI’s, according to an embodiment of the present disclosure.
[015]
FIG. 4B is an exemplary block diagram illustrating a partial overlapping of the two distinct ROI’s, according to an embodiment of the present disclosure.
[016]
FIG. 5 is an exemplary graphical representation illustrating a scatter plot depicting an association of one or more distinct ROIs, according to an embodiment of the present disclosure.
[017]
FIG. 6A is an exemplary graphical representation illustrating an ROI layout and extraction of one or more regions of interest (ROIs) from a page of the product flyer, according to an embodiment of the present disclosure.
[018]
FIG. 6B is an exemplary graphical representation illustrating measured distances between various distinct ROI’s, according to an embodiment of the present disclosure.
[019]
FIG. 6C is an exemplary graphical representation illustrating a ROI association after applying a class distance measure (CDM) and a class overlap measure (COM), according to an embodiment of the present disclosure.
[020]
FIG. 6D is an exemplary graphical representation illustrating a product details extraction by leveraging an optical character recognition (OCR) engine, according to an embodiment of the present disclosure.
[021]
FIG. 7 is an exemplary flow diagram illustrating a method of extracting the labelled listings and information from the product flyer, according to some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[022]
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[023]
There is a need for an automated approach to accurately extract information from a product flyer. Embodiments of the present disclosure provide a method and system for targeted extraction of labelled listings and information from the product flyer. The embodiments of the present disclosure provide an ensemble of an overlapping approach and a distance-based approach for the targeted extraction of labelled listings and information from the product flyer. The product flyer includes retail information associated with one or more products.
[024]
Referring now to the drawings, and more particularly to FIGS. 1 through 7, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments, and these embodiments are described in the context of the following exemplary system and/or method.
[025]
FIG. 1 illustrates a system 100 for targeted extraction of labelled listings and information from the product flyer, according to an embodiment of the present disclosure. In an embodiment, the system 100 includes one or more processor(s) 102, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 104 operatively coupled to the one or more processors 102. The memory 104 includes a database. The one or more processor(s) 102, the memory 104, and the I/O interface(s) 106 may be coupled by a system bus such as a system bus 108 or a similar mechanism. The one or more processor(s) 102 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more processor(s) 102 is configured to fetch and execute computer-readable instructions stored in the memory 104. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, a network cloud, and the like.
[026]
The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface device(s) 106 may include a variety of software and hardware interfaces, for example, interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, a camera device, and a printer. Further, the I/O interface device(s) 106 may enable the system 100 to communicate with other devices, such as external databases. The I/O interface device(s) 106 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite. In an embodiment, the I/O interface device(s) 106 can include one or more ports for connecting a number of devices to one another.
[027]
The memory 104 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, the memory 104 includes a plurality of modules 110 and a repository 112 for storing data processed, received, and generated by the plurality of modules 110. The plurality of modules 110 may include routines, programs, objects, components, data structures, and so on, which perform particular tasks or implement particular abstract data types.
[028]
Further, the database stores information pertaining to inputs fed to the system 100 and/or outputs generated by the system 100 (e.g., data/output generated at each stage of the data processing), specific to the methodology described herein. More specifically, the database stores information being processed at each step of the proposed methodology.
[029]
Additionally, the plurality of modules 110 may include programs or coded instructions that supplement applications and functions of the system 100. The repository 112, amongst other things, includes a system database 114 and other data 116. The other data 116 may include data generated as a result of the execution of one or more modules in the plurality of modules 110. Herein, the memory, for example the memory 104, and the computer program code configured to, with the hardware processor, for example the processor 102, cause the system 100 to perform various functions described hereinunder.
[030]
FIG. 2 illustrates an exemplary block diagram of the system 100 of FIG.1 for targeted extraction of the labelled listings and the information from the product flyer, according to an embodiment of the present disclosure. A system 200 may be an example of the system 100 (FIG. 1). In an example embodiment, the system 200 may be embodied in, or is in direct communication with the system, for example the system 100 (FIG. 1). The system 200 includes a flyer input processing unit 202, a regions of interest (ROI) detection unit 204, a text extraction unit 206, a label extraction unit 208, and a product flyer information extraction unit 210. The flyer input processing unit 202 receives a flyer as an input document for extraction. For example, the flyer includes retail information associated with one or more products. The flyer is obtained as a portable document format (PDF). Each page of the PDF is converted into one or more images by a PDF processing library to analyze each individual page of the product flyer. The system 200 employs a model, utilizing a deep learning architecture i.e., a mask region-based convolutional neural network (R-CNN) to convert scanned or text PDFs into images, and annotate to identify valid products within each page of the product flyer.
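The disclosure refers only to "a PDF processing library" for this page-to-image conversion without naming one. The snippet below is a minimal, illustrative sketch of the step, assuming the open-source pdf2image library; the function name and rendering resolution are assumptions, not part of the disclosure.

```python
# Minimal sketch of the page-to-image step of the flyer input processing unit,
# assuming the pdf2image library (the disclosure only says "a PDF processing
# library"); the DPI value and function name are illustrative assumptions.
from pdf2image import convert_from_path

def flyer_pdf_to_images(pdf_path, dpi=200):
    """Render each page of the flyer PDF as a PIL image for downstream ROI detection."""
    return convert_from_path(pdf_path, dpi=dpi)  # one PIL.Image per page

# Hypothetical usage:
# page_images = flyer_pdf_to_images("flyer.pdf")
```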
[031]
FIG. 6A is an exemplary graphical representation illustrating an ROI layout and extraction of one or more regions of interest (ROIs) from a page of the product flyer, according to an embodiment of the present disclosure. The regions of interest (ROI) detection unit 204 identifies one or more coordinates for one or more entities or one or more attribute classes, e.g., k pre-defined classes. Pre-trained weights from the R-CNN model are employed to accurately identify the one or more regions of interest (ROIs) of one or more products. The one or more entities of the one or more products are alternatively referred to as one or more attribute classes. The one or more attribute classes correspond to, but are not limited to: (a) price, (b) discount, (c) a product name, and (d) a description, which are associated based on minimum average distance. The regions of interest (ROI) detection unit 204 binds one or more classes based on a relationship, or association, among the one or more attribute classes present in each product flyer. The association among concerned regions determines a most suitable distinct class, which includes one or more specific properties associated with the one or more products. The ROIs are auto-annotated, to obtain likelihood probabilities across the one or more attribute classes. Each ROI carries a predictive probability for each class, facilitating subsequent categorization and segmentation. In an embodiment, the distinct class is determined based on training knowledge of the R-CNN model which is trained on a large number of annotated flyers. The distinct classes are predicted by the R-CNN model based on the likelihood probabilities, and one or more bounding boxes are created. The individual classes are identified and mapped into singular product ROIs. The regions of interest (ROI) detection unit 204 identifies the one or more Regions of Interest (ROIs) for each image of the one or more products by using (i) a class overlapping measures (COM), and (ii) a class distance measures (CDM).
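Purely as an illustration of how such a detector could be invoked, the sketch below assumes a torchvision Mask R-CNN fine-tuned on annotated flyers; the class list, checkpoint path, and score threshold are assumptions and not values stated in the disclosure.

```python
# Illustrative ROI detection sketch using a Mask R-CNN, assuming torchvision and
# a checkpoint fine-tuned on annotated flyers (class names, checkpoint path, and
# score threshold below are assumptions for illustration only).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

CLASS_NAMES = ["background", "price", "discount", "product_name", "description"]

def detect_rois(page_image, checkpoint_path="flyer_maskrcnn.pth", score_threshold=0.5):
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=len(CLASS_NAMES))
    model.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))
    model.eval()
    with torch.no_grad():
        output = model([to_tensor(page_image)])[0]
    rois = []
    for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
        if score >= score_threshold:
            rois.append({"class": CLASS_NAMES[label], "box": box.tolist(), "score": float(score)})
    return rois  # each ROI carries its predicted class and likelihood score
```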
[032]
The regions of interest (ROI) detection unit 204 includes an overlap detection unit 204A and a distance detection unit 204B. The overlap detection unit 204A detects overlapping ROIs within the image based on the class overlapping measures (COM). The class overlapping measures (COM) is an identification and tracking of a ROI overlapping possibility among the distinct classes to form a product region. The overlapping ROIs are grouped into one or more single product representations. The class overlapping measures (COM) is mathematically interpreted by Intersection over Minimum (IoM). The IoM is defined as a ratio between an intersection area and a minimum area between two bounding boxes, i.e., a minimum area of the bounding box is considered. FIG. 3 is an exemplary schematic view illustrating derivation of the intersection over minimum (IoM) to interpret overlapping of two distinct ROI’s, according to an embodiment of the present disclosure. Considering A and B as two distinct overlapping ROI’s, the intersection over minimum (IoM) is derived as:
IoM(T, S) = Area(T ∩ S) / min(Area(T), Area(S))
where T and S are the two regions and ∩ denotes the intersection of the two regions.
[033]
For example, consider two bounding boxes representing different detections within an image, i.e., one bounding box represents a product label, and the other represents a price tag. The IoM metric is calculated to check if both classes are mapped to a singular product ROI.
[034]
FIG. 4A is an exemplary block diagram illustrating a complete overlapping of the two distinct ROI’s, according to an embodiment of the present disclosure. In a first scenario, the ROIs of the two distinct classes are overlapping completely between areas A and B, i.e., one smaller ROI is fully merged under the comparatively bigger ROI.
Let P(A) = 0.8, P(B) = 0.4, P(A∩B) = 0.4, then P(A∪B) = 0.8
IoU = 0.4/0.8 = 0.50
IoM = 0.4/0.4 = 1
[035]
FIG. 4B is an exemplary block diagram illustrating a partial overlapping of the two distinct ROI’s, according to an embodiment of the present disclosure. In a second scenario, the ROIs of the two distinct classes are not completely merged; instead, some portion of one ROI is overlapping with the other distinct ROI.
Let P(A) = 0.8, P(B) = 0.6, P(A∩B) = 0.5, then P(A∪B) = 0.9
IoU = 0.5/0.9 ≈ 0.56
IoM = 0.5/0.6 ≈ 0.83
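A minimal sketch of the IoM computation follows, assuming axis-aligned bounding boxes given as (x1, y1, x2, y2) corner coordinates; the box format is an assumption for illustration. It reproduces the behaviour of the two scenarios above: a fully contained box yields IoM = 1 even when the IoU is much lower.

```python
# Minimal sketch of the Intersection over Minimum (IoM) measure for two
# axis-aligned bounding boxes given as (x1, y1, x2, y2); the box format is an
# illustrative assumption.
def iom(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    smaller = min(area_a, area_b)
    return inter / smaller if smaller > 0 else 0.0
```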
[036]
The distance detection unit 204B computes distances between one or more ROIs which are extracted and grouped into cohesive product details based on the class distance measures (CDM). The class distance measures (CDM) is a distance between the extracted ROIs, i.e., one or more distinct ROIs are measured, to generate a distance vector. The distance vector is generated, showcasing the spatial relationships between various entities. The distance vector encapsulates pairwise distances between each ROI in the image as depicted in FIG. 6B, which is an exemplary graphical representation illustrating measured distances between various distinct ROI’s, according to an embodiment of the present disclosure. Each bounding box is accompanied by centroid coordinates that pinpoint the location of the detected entity. An Euclidean distance between corresponding centroids is computed to associate nearby relation, using one or more coordinate values. One or more ranks are marked to each class based on the computed distance vector. The formula for distance calculation is a square root of a sum of squares of the differences in x and y coordinates, as given in the equation below.
D(C1→C2) = √((X1 − X2)² + (Y1 − Y2)²)
where D represents the distance between the two centroids C1 and C2 of the bounding boxes of two distinct classes. Here, X1, Y1 are the coordinates of the centroid of C1, and X2, Y2 are the coordinates of the centroid of C2.
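The following sketch illustrates the class distance measure: pairwise Euclidean distances between ROI centroids are computed and ranked so that smaller distances indicate closer association. The ROI dictionary layout follows the earlier detection sketch and is an assumption.

```python
# Sketch of the class distance measure (CDM): pairwise Euclidean distances
# between ROI centroids, sorted so the closest pairs rank first. The ROI
# dictionary layout follows the earlier detection sketch (an assumption).
import math

def centroid(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def distance_vector(rois):
    """Return all pairwise centroid distances, ranked by proximity (closest first)."""
    pairs = []
    for i in range(len(rois)):
        for j in range(i + 1, len(rois)):
            (xa, ya), (xb, yb) = centroid(rois[i]["box"]), centroid(rois[j]["box"])
            d = math.sqrt((xa - xb) ** 2 + (ya - yb) ** 2)
            pairs.append((d, i, j))
    pairs.sort(key=lambda p: p[0])  # smaller distance => higher proximity rank
    return pairs
```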
[037]
The calculated distances are ranked to assign one or more proximity scores. Smaller distances imply a closer proximity, indicating potential associations between ROIs. The one or more proximity scores are mapped to respective classes or categories based on predefined thresholds or class labels. The association of classes is determined by the proximity of corresponding ROIs and thereby the closest entities are associated in the final step, as depicted in FIG. 6C, which is an exemplary graphical representation illustrating a ROI association after applying the class distance measure (CDM) and the class overlap measure (COM), according to an embodiment of the present disclosure.
[038]
In an embodiment, a collective minimum mean distance is computed among three ROIs at a time. The three ROIs are grouped into cohesive product details based on calculated distances. FIG. 5 is an exemplary graphical representation illustrating a scatter plot depicting the association of the one or more distinct ROIs, according to an embodiment of the present disclosure. The relationship among the four distinct classes is clearly visible. Neighboring classes can be mapped together to represent specific product content.
[039]
The text extraction unit 206 extracts textual information from the identified ROIs of the one or more entities (e.g., the product name, the price, and the discount) within each image by implementing an optical character recognition (OCR). The textual information is extracted by the OCR from cropped images of the flyer corresponding to the associated ROIs.
[040]
FIG. 6D is an exemplary graphical representation illustrating product details extraction by leveraging the optical character recognition (OCR) engine, according to an embodiment of the present disclosure. The label extraction unit 208 focuses on price ROI areas of each image to segment the textual information by the price extraction technique to create one or more bounding boxes around each character within the Price ROI of a Product ROI. An Open Computer Vision Library based edge detection technique, e.g., a Canny Edge Detection, is performed to identify and isolate individual segments or characters within the cropped image, thereby the one or more bounding boxes are created. A height of the one or more bounding boxes is computed. The heights of the one or more bounding boxes are clustered based on similarity using a K-means clustering technique, i.e., by grouping characters or the one or more bounding boxes based on corresponding heights. A connected component analysis is utilized to compute the height of the one or more bounding boxes by used regions in the cropped image. Each bounding box is cropped separately and each ROI is input to the optical character recognition (OCR) engine (e.g., Paddle OCR) which involves a strategic association of the OCR text into distinct entities, particularly focusing on the product name, the description, the price, and the quantity. Part-Of-Speech (POS) tagging is employed, allowing for extraction of nouns and proper nouns from the OCR extracted text for isolating and categorizing one or more essential linguistic elements for identification. Pricing information is extracted using the OCR within height-based clusters, i.e., a list of heights is obtained from the bounding box dimensions. The list of heights is clustered into four clusters. A rule-based technique is applied to filter out unaligned prices, strike-out prices, and quantities. The Paddle OCR is leveraged to extract the pricing information within the separate grouped characters, and finally the grouped characters are concatenated to extract the price separately. In an embodiment, any unaligned characters are concatenated to associate into prices.
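As a rough sketch of this price-segmentation step, the code below derives character boxes from Canny edges and contours with OpenCV, clusters their heights into four groups with K-means, and runs PaddleOCR over the characters of one group before concatenating the results. The OpenCV, scikit-learn, and classic PaddleOCR calls, the thresholds, and the rule used to pick the main cluster are illustrative assumptions; the disclosure's own rule-based filtering of strike-out prices and quantities is not reproduced here.

```python
# Sketch of the price extraction step: character boxes from Canny edges and
# contours, box heights grouped into four clusters with K-means, and PaddleOCR
# run per character of the chosen group. Thresholds, the "tallest cluster"
# heuristic, and the classic PaddleOCR API used here are assumptions.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from paddleocr import PaddleOCR

ocr = PaddleOCR(lang="en")

def extract_price(price_roi_image):
    gray = cv2.cvtColor(price_roi_image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]  # (x, y, w, h) per character candidate
    if not boxes:
        return ""
    heights = np.array([[h] for (_, _, _, h) in boxes], dtype=float)
    n_clusters = min(4, len(boxes))
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(heights)
    # Assume the cluster with the tallest characters holds the main price digits.
    main = max(range(n_clusters), key=lambda k: heights[labels == k].mean())
    chars = sorted((b for b, l in zip(boxes, labels) if l == main), key=lambda b: b[0])
    texts = []
    for (x, y, w, h) in chars:
        crop = price_roi_image[y:y + h, x:x + w]
        result = ocr.ocr(crop, cls=False)
        if result and result[0]:
            texts.append(result[0][0][1][0])
    return "".join(texts)  # concatenate the grouped characters into the price string
```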
[041]
The extracted texts from the product name ROIs are inputted to categorize into entities, i.e., the product name and product quality, by implementing a Bidirectional Encoder Representations from Transformers (BERT)-based Named Entity Recognition (NER) model. The BERT-based NER model is employed which is trained on a large number of product description OCR text, for identification of the product name and the description from the OCR extracted text. The product flyer information extraction unit 210 combines the categorized entities with already extracted product prices. The extracted information is structured into a JSON format. The structured JSON data is compiled from all images/pages into a single comprehensive output file and saved as the final JSON file containing organized product details extracted from all pages of the flyer. In an embodiment, using fuzzy string-matching algorithms, a partial matching process ensues, comparing the elements from the list with the extracted linguistic components, with a predefined threshold of 60 per cent for matching accuracy comparing with the consolidated list of product name and description. The text or the nouns surpassing the predefined threshold are identified and marked as potential product names and descriptions respectively, highlighting a high degree of similarity to items present in the consolidated list.
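A compact sketch of this labelling step is given below, using a publicly available BERT NER pipeline as a stand-in for the disclosure's model (which is trained on product description OCR text) and the rapidfuzz library for the partial match with the 60 per cent threshold mentioned above; the model checkpoint, the library choice, and the consolidated product list are assumptions.

```python
# Sketch of product-name labelling: BERT-based NER over the OCR text, then fuzzy
# partial matching against a consolidated product list with a 60% threshold.
# The model checkpoint and the rapidfuzz library are stand-ins/assumptions; the
# actual model in the disclosure is trained on product description OCR text.
from transformers import pipeline
from rapidfuzz import fuzz

ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

def label_product_name(ocr_text, known_products, threshold=60):
    entities = [e["word"] for e in ner(ocr_text)] or [ocr_text]
    best, best_score = None, 0
    for candidate in entities:
        for product in known_products:
            score = fuzz.partial_ratio(candidate.lower(), product.lower())
            if score > best_score:
                best, best_score = product, score
    return best if best_score >= threshold else None  # marked as a product name only above threshold
```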
[042]
FIG. 7 is an exemplary flow diagram illustrating a method 700 of extracting the labelled listings and information from the product flyers, according to some embodiments of the present disclosure. In an embodiment, the system 100 comprises one or more data storage devices or the memory 104 operatively coupled to the one or more hardware processors 102 and is configured to store instructions for execution of steps of the method by the one or more processors 102. The flow diagram depicted is better understood by way of the following explanation/description. The steps of the method 700 of the present disclosure will now be explained with reference to the components of the system as depicted in FIG. 1 and FIG. 2.
[043]
At step 702, a flyer is received as an input document. The flyer includes information associated with one or more products. The flyer is obtained as a portable document format (PDF). At step 704, each page of the input document is processed by the mask region-based convolutional neural network (R-CNN) to convert into the one or more images based on the PDF processing library. At step 706, the one or more Regions of Interest (ROIs) for each image is identified based on the one or more attribute classes of the one or more products by using (i) the class overlapping measures (COM), and (ii) the class distance measures (CDM). The one or more regions of interest (ROIs) correspond to a relationship among one or more attribute classes associated with the flyer. The one or more attribute classes of the one or more products correspond to (i) a name, (ii) a price, and (iii) a discount. The class overlapping measures (COM) corresponds to the one or more overlapping ROIs among the one or more distinct classes within the product region of the image. The class overlapping measures (COM) is interpreted by the Intersection over Minimum (IoM) (as described in corresponding description of FIG. 3). The IoM corresponds to the ratio between the intersection area and the minimum area between the one or more bounding boxes. The one or more distinct classes are either in the complete overlapped position (as described in corresponding description of FIG. 4A), or in the partial overlapped position (as described in corresponding description of FIG. 4B). The class distance measures (CDM) (as described in corresponding description of FIG. 6B and FIG. 6C) correspond to the distance between one or more distinct ROIs to generate the distance vector. Each bounding box is accompanied by the one or more centroid coordinates to point the location of each detected class. The Euclidean distance between corresponding centroids is computed to associate nearby relation using the one or more coordinate values. The one or more ranks are marked to each class based on the computed distance vector. Each calculated distance is ranked to assign the one or more proximity scores. The association of classes is determined by each proximity score of corresponding ROIs. At step 708, a textual information is extracted from the identified ROIs within each image by implementing an optical character recognition (OCR) based on the one or more attribute classes.
[044]
At step 710, the extracted textual information is segmented by the price extraction technique to create one or more bounding boxes around each character within the Price ROI of the Product ROI. The one or more bounding boxes are created by identifying and isolating individual segments or characters within each cropped image using the edge detection technique e.g., the Canny Edge Detection. The height of the one or more bounding boxes are computed by used regions in the cropped image. The height of the one or more bounding boxes are clustered based on similarity using the K-means clustering technique i.e., by grouping characters or the one or more bounding boxes based on corresponding heights.
[045]
At step 712, the pricing information is extracted using the OCR within the height-based clusters. A Paddle OCR extracts the pricing information within the separate grouped character. The separate grouped character is concatenated to extract the pricing information. The extracted texts from a product name ROIs are input to categorize into entities by implementing the BERT-based NER model. The categorized entities are combined with extracted product prices. The BERT NER model is employed which is trained on large number of product description OCR text for identification of the product name and the description from the OCR extracted text.
[046]
The embodiments of the present disclosure herein address the unresolved problem of accuracy in information extraction from the product flyers. The embodiments of the present disclosure herein address the gap in accurately identifying and delineating product regions within the product flyer by an ensemble of the overlapping approach and the distance-based approach, allowing enhanced product detection within the product flyers with better accuracy and automatically correcting misclassification. The distance-based approach complements other methods by providing spatial context, enabling efficient ROI association, and improving the overall accuracy of object detection systems. The embodiments of the present disclosure leverage distinct techniques to identify and extract product entities accurately from the product flyers. Moreover, the embodiments of the present disclosure consolidate disparate product entities into a unified representation, overcoming the limitations of existing approaches. The disclosed approach is designed to adapt dynamically to changes in flyer layouts and designs, in addition to the training process, by employing the distance-based and the overlapping technique. The embodiments of the present disclosure create a robust framework capable of detecting, extracting, and aligning product information despite variations in flyer presentations. The emphasis on additional dynamic techniques rather than just static image-based training sets the disclosed approach apart, ensuring greater resilience to sudden alterations in the flyer designs. The adaptive nature of the disclosed approach allows for more reliable and accurate extraction of crucial retail information, promising significant advancements in the automation of the product flyer data extraction processes. The class overlapping measures (COM) consolidate and refine object detections within images, offering a systematic way to associate and merge ROIs that represent the same object or entity. The IoM (Intersection over Minimum area) is more accurate in associating ROIs. The embodiments of the present disclosure identify products from the flyer including the name, the price, and the discount using the association approach and the price tag extraction technique. The claimed approach identifies whole product information and also identifies the correct product using the optical character recognition technique and the entity-based extraction. Each character and price is identified and separated based on heights of the text, based on clusters.
[047]
The embodiments of the present disclosure ensure precise identification of one or more product areas and overcome the limitations of simplistic image-based approaches. Through the OCR integration, the disclosed approach efficiently captures textual information, mitigating the challenges posed by varying layouts, i.e., diverse flyer designs, and languages. The embodiments of the present disclosure employ the BERT-based NER model to identify names from a text and then correctly verify them using a fuzzy match program based on an input dictionary. The named entity recognition (NER) technique precisely identifies and labels product names, overcoming the limitations of simplistic keyword-based or template-driven recognition. The disclosed approach addresses the gap in precise and adaptable extraction of price details, by the price tag extraction technique which refines the extraction process by specifically targeting and accurately retrieving pricing information. Adaptive techniques and machine learning integration in the disclosed approach ensure robustness in the face of varying flyer formats, overcoming the rigidity of template-dependent techniques.
[048]
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[049]
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means, and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[050]
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include, but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[051]
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[052]
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[053]
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
We Claim:
1. A processor implemented method (700), comprising:
receiving (702), via one or more hardware processors, a flyer as an input document, wherein the flyer comprises information associated with one or more products;
processing (704), via the one or more hardware processors, each page of the input document by a mask region-based convolutional neural network (R-CNN) to convert into one or more images based on a PDF processing library;
identifying (706), via the one or more hardware processors, one or more Regions of Interest (ROIs) for each image based on one or more attribute classes of the one or more products by using (i) a class overlapping measures (COM), and (ii) a class distance measures (CDM), and wherein the one or more regions of interest (ROIs) correspond to a relationship among one or more attribute classes associated with the flyer;
extracting (708), via the one or more hardware processors, a textual information from the identified ROIs within each image by implementing an optical character recognition (OCR) based on the one or more attribute classes;
segmenting (710), via the one or more hardware processors, the extracted textual information by a price extraction technique to create one or more bounding boxes around each character within a Price ROI of a Product ROI, wherein the one or more bounding boxes are created by identifying and isolating individual segments or characters within each cropped image using an edge detection technique, wherein a height of the one or more bounding boxes are computed by used regions in the cropped image, and wherein the height of the one or more bounding boxes are clustered based on similarity using a K-means clustering technique; and
extracting (712), via the one or more hardware processors, a pricing information using the OCR within height-based clusters, wherein a Paddle OCR extracts the pricing information within at least one separate grouped character, and wherein at least one separate grouped character are concatenated to extract the pricing information.

2. The processor implemented method (700) as claimed in claim 1, wherein the flyer is obtained as a portable document format (PDF), and wherein the one or more attribute classes of the one or more products correspond to at least one of (i) a name, (ii) a price, and (iii) a discount.
3. The processor implemented method (700) as claimed in claim 1, wherein the class overlapping measures (COM) correspond to one or more overlapping ROIs among at least one distinct class within a product region of the image, wherein the class overlapping measures (COM) is interpreted by Intersection over Minimum (IoM), wherein the IoM correspond to a ratio between an intersection area and a minimum area between one or more bounding boxes, and wherein at least one distinct class is at least one of: (i) a complete overlapped position, or (ii) a partial overlapped position.
4. The processor implemented method (700) as claimed in claim 1, wherein the class distance measures (CDM) corresponds to a distance between one or more distinct ROIs to generate a distance vector, wherein each bounding box is accompanied by at least one centroid coordinate to point a location of each detected class, wherein an Euclidean distance between corresponding centroids is computed to associate nearby relation using one or more coordinate values, wherein one or more ranks are marked to each class based on the computed distance vector, wherein each calculated distance is ranked to assign one or more proximity scores, and wherein association of classes is determined by each proximity score of corresponding ROIs.
5. The processor implemented method (700) as claimed in claim 1, wherein the extracted texts from a product name ROIs are inputted to categorize into entities by implementing a Bidirectional Encoder Representations from Transformers (BERT)-based Named Entity Recognition (NER) model, wherein the categorized entities are combined with extracted product prices, and wherein the BERT-based NER model is employed which is trained on large number of product description OCR text, for identification of the product name and the description from the OCR extracted text.
6. A system (100), comprising:
a memory (104) storing a plurality of instructions;
one or more communication interfaces (106); and
one or more hardware processors (102) coupled to the memory (104) via the one or more communication interfaces (106), wherein the one or more hardware processors (102) are configured by the instructions to:
receive, a flyer as an input document, wherein the flyer comprises information associated with one or more products;
process, each page of the input document by a mask region-based convolutional neural network (R-CNN) to convert into one or more images based on a PDF processing library;
identify, one or more Regions of Interest (ROIs) for each image based on one or more attribute classes of the one or more products by using (i) a class overlapping measures (COM), and (ii) a class distance measures (CDM), and wherein the one or more regions of interest (ROIs) correspond to a relationship among one or more attribute classes associated with the flyer;
extract, a textual information from the identified ROIs within each image by implementing an optical character recognition (OCR) based on the one or more attribute classes;
segment, the extracted textual information by a price extraction technique to create one or more bounding boxes around each character within a Price ROI of a Product ROI, wherein the one or more bounding boxes are created by identifying and isolating individual segments or characters within each cropped image using an edge detection technique, wherein a height of the one or more bounding boxes are computed by used regions in the cropped image, and wherein the height of the one or more bounding boxes are clustered based on similarity using a K-means clustering technique; and
extract, a pricing information using the OCR within height-based clusters, wherein a Paddle OCR extracts the pricing information within at least one separate grouped character, and wherein at least one separate grouped character are concatenated to extract the pricing information.
7. The system (100) as claimed in claim 6, wherein the flyer is obtained as a portable document format (PDF), and wherein the one or more attribute classes of the one or more products correspond to at least one of (i) a name, (ii) a price, and (iii) a discount.
8. The system (100) as claimed in claim 6, wherein the class overlapping measures (COM) correspond to one or more overlapping ROIs among at least one distinct class within a product region of the image, wherein the class overlapping measures (COM) is interpreted by Intersection over Minimum (IoM), wherein the IoM correspond to a ratio between an intersection area and a minimum area between one or more bounding boxes, and wherein at least one distinct class is at least one of: (i) a complete overlapped position, or (ii) a partial overlapped position.
9. The system (100) as claimed in claim 6, wherein the class distance measures (CDM) correspond to a distance between one or more distinct ROIs to generate a distance vector, wherein each bounding box is accompanied by at least one centroid coordinate to point a location of each detected class, wherein an Euclidean distance between corresponding centroids is computed to associate nearby relation using one or more coordinate values, wherein one or more ranks are marked to each class based on the computed distance vector, wherein each calculated distance is ranked to assign one or more proximity scores, and wherein association of classes is determined by each proximity score of corresponding ROIs.

10. The system (100) as claimed in claim 6, wherein the extracted texts from a product name ROIs are inputted to categorize into entities by implementing a Bidirectional Encoder Representations from Transformers (BERT)-based Named Entity Recognition (NER) model, wherein the categorized entities are combined with extracted product prices, and wherein the BERT-based NER model is employed which is trained on large number of product description OCR text, for identification of the product name and the description from the OCR extracted text.

Documents

Application Documents

# Name Date
1 202421016170-STATEMENT OF UNDERTAKING (FORM 3) [07-03-2024(online)].pdf 2024-03-07
2 202421016170-REQUEST FOR EXAMINATION (FORM-18) [07-03-2024(online)].pdf 2024-03-07
3 202421016170-FORM 18 [07-03-2024(online)].pdf 2024-03-07
4 202421016170-FORM 1 [07-03-2024(online)].pdf 2024-03-07
5 202421016170-FIGURE OF ABSTRACT [07-03-2024(online)].pdf 2024-03-07
6 202421016170-DRAWINGS [07-03-2024(online)].pdf 2024-03-07
7 202421016170-DECLARATION OF INVENTORSHIP (FORM 5) [07-03-2024(online)].pdf 2024-03-07
8 202421016170-COMPLETE SPECIFICATION [07-03-2024(online)].pdf 2024-03-07
9 Abstract1.jpg 2024-04-08
10 202421016170-FORM-26 [08-05-2024(online)].pdf 2024-05-08
11 202421016170-FORM-26 [22-05-2025(online)].pdf 2025-05-22