Proximity Based Cluster Algorithm For Varied Character Recognition


Abstract: Retail flyer information extraction is important as it can benefit both retail competitors and consumers. However, existing optical character recognition (OCR) engines often struggle to accurately detect unaligned characters. The present disclosure provides a method and a system for recognizing unaligned characters present in an image. The system first applies a connected component analysis technique on an input image to isolate individual segments. Then, the system extracts a segment height of each individual segment, and the segment heights are used to create a segment height array. Thereafter, the system uses a proximity based clustering algorithm to group one or more segment heights present in the segment height array based on their proximity. Further, the system applies an OCR technique on the individual segment images corresponding to the grouped segment heights to identify one or more characters present in the grouped segment heights. Finally, the system concatenates the identified characters based on the grouped segment heights to obtain the original text present in the input image.


Patent Information

Application #

Filing Date

07 March 2024

Publication Number

37/2025

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

Tata Consultancy Services Limited

Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

Inventors

1. Vasudevan, Bagya Lakshmi

Tata Consultancy Services Limited, Module-4, 3rd Floor South Block, Chennai One IT SEZ Phase-2, 200 Feet Radial RD, MCN Nagar Extension, Pallavaram, Thoraipakkam, Kotivakkam Chennai 600097, Tamil Nadu, India

2. ABRAHAM, Kuruvilla

Tata Consultancy Services Limited, Lucerna Tower, Plot A2B, Sector 125, Noida 201303, Uttar Pradesh, India

3. SOM, Suvodip

Tata Consultancy Services Limited, Gitanjali Park, Plot-II/F/3 Action Area -II, Gitanjali Road, Newtown, Kolkata 700156, West Bengal, India

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
PROXIMITY BASED CLUSTER ALGORITHM FOR VARIED CHARACTER RECOGNITION
Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description:
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
[001]
The disclosure herein generally relates to character recognition, and, more particularly, to a method and a system for recognizing unaligned characters present in an image using a proximity based cluster algorithm.
BACKGROUND
[002]
Retail flyer information extraction plays a pivotal role in today's retail landscape as it can benefit both retail competitors and consumers alike. Retailers' ability to extract valuable data from competitors' flyers empowers them to stay competitive by analyzing and benchmarking pricing strategies, optimizing product placement, and adapting to market trends. Similarly, extracting product prices from flyers is essential for customers as it enables them to make informed purchasing decisions, compare offers across stores, and ultimately save money. At a time when consumers are increasingly price-conscious and retailers strive to remain competitive, effective information extraction, particularly regarding product prices, is integral to success in the dynamic retail industry.
[003]
However, unaligned text is now commonly found in retail flyers. Existing optical character recognition (OCR) engines often struggle to accurately detect unaligned characters, as most OCR systems are designed to consider only aligned and structured text. In particular, the existing OCR engines excel at recognizing well-structured documents but fall short in handling the unique and unaligned nature of text typically found in price tags, where characters can be at varying heights and alignments. As a result, the OCR engine performance suffers, leading to errors and inaccuracies in character recognition, which ultimately impacts the reliability and efficiency of the data extraction from these critical retail elements.
SUMMARY
[004]
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one aspect, there is provided a method for recognizing unaligned characters present in an image. The method comprises receiving, by a system via one or more hardware processors, an input image from a user device, the input image comprising one or more unaligned texts; applying, by the system via the one or more hardware processors, a connected component analysis technique on the input image for isolating one or more individual segments that are present in the input image, wherein the connected component analysis technique isolates the one or more individual segments by creating a bounding box around each individual segment of the one or more individual segments, and wherein each individual segment image available for each individual segment is arranged in an image array; determining, by the system via the one or more hardware processors, a segment height of each individual segment of the one or more individual segments; extracting, by the system via the one or more hardware processors, the segment height of each individual segment of the one or more individual segments, wherein the segment height of each individual segment is arranged to create a segment height array, and wherein each segment height present in the segment height array is connected with an associated individual segment image present in the image array; grouping, by the system via the one or more hardware processors, one or more segment heights present in the segment height array based on their proximity using a proximity based clustering algorithm; identifying, by the system via the one or more hardware processors, one or more characters present in the one or more grouped segment heights by applying an optical character recognition (OCR) technique on one or more individual segment images connected with the one or more grouped segment heights; and concatenating, by the system via the one or more hardware processors, the one or more identified characters based on the one or more grouped segment heights to obtain an original text present in the input image.
[005]
In an embodiment, the method comprises: displaying, by the system via the one or more hardware processors, the original text on the user device.
[006]
In an embodiment, the step of applying the connected component analysis technique on the input image for isolating the one or more individual segments that are present in the input image is preceded by: preprocessing, by the system via the one or more hardware processors, the input image using one or more pre-processing techniques to obtain a pre-processed input image.
[007]
In an embodiment, the step of determining the segment height of each individual segment of the one or more individual segments further comprises: filtering, by the system via the one or more hardware processors, the one or more individual segments based on one or more predefined filtering criteria to obtain at least one filtered individual segment of the one or more individual segments present in the pre-processed input image; and determining, by the system via the one or more hardware processors, the segment height of each individual segment of the at least one filtered individual segment.
[008]
In an embodiment, the proximity based clustering algorithm performs: sorting the one or more segment heights in a descending order to obtain a sorted height list, wherein the sorted height list comprises one or more numbers associated with the one or more segment heights; calculating an absolute difference between one or more pairs of consecutive numbers that are present in the sorted height list, wherein the calculated absolute difference for each pair of the one or more pairs of consecutive numbers is arranged in a difference list, and wherein the calculated absolute difference for each pair is arranged in the difference list in the same order as the corresponding pair in the sorted height list; sorting the difference list in an ascending order to obtain a sorted difference list; identifying a median position in the sorted difference list using a gradient based technique; grouping the one or more numbers present in the sorted difference list based on the identified median position to obtain a smaller group and a larger group, wherein the smaller group comprises at least one number of the one or more numbers that is present before the identified median, and wherein the larger group comprises at least one number of the one or more numbers that is present after the identified median; iterating through the difference list to mark the numbers that are common to the difference list and the larger group; identifying an index of each marked number present in the difference list; and segmenting the sorted height list based on the identified index of each marked number.
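The grouping and segmenting steps listed above can be sketched roughly as follows. This is a minimal Python illustration, not the disclosed implementation: the function name segment_sorted_heights and the parameter split_pos are hypothetical, and split_pos merely stands in for the median position that the disclosure identifies with a gradient based technique, which is not reproduced here. The height values and the split position of 5 are hand-picked for illustration only.

```python
def segment_sorted_heights(sorted_heights, split_pos):
    """Split a descending height list at the large gaps.

    split_pos is an ASSUMED input: the position in the sorted difference
    list that separates the smaller group from the larger group.
    """
    # Difference list, in the same order as the pairs in the sorted height list.
    diffs = [abs(a - b) for a, b in zip(sorted_heights, sorted_heights[1:])]
    # Sorted difference list and the larger group taken from its tail.
    sorted_diffs = sorted(diffs)
    larger_group = set(sorted_diffs[split_pos:])
    # Mark the indices of differences that also appear in the larger group.
    marked = [i for i, d in enumerate(diffs) if d in larger_group]
    # Cut the sorted height list after each marked index.
    segments, start = [], 0
    for i in marked:
        segments.append(sorted_heights[start:i + 1])
        start = i + 1
    segments.append(sorted_heights[start:])
    return segments


groups = segment_sorted_heights([100, 99, 66, 65, 64, 20, 9, 8], split_pos=5)
print(groups)  # [[100, 99], [66, 65, 64], [20, 9, 8]]
```

A different split position yields a different grouping, so the result depends entirely on how the median position is chosen.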
[009]
In another aspect, there is provided a system for recognizing unaligned characters present in an image. The system comprises a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: receive an input image from a user device, the input image comprising one or more unaligned texts; apply a connected component analysis technique on the input image for isolating one or more individual segments that are present in the input image, wherein the connected component analysis technique isolates the one or more individual segments by creating a bounding box around each individual segment of the one or more individual segments, and wherein each individual segment image available for each individual segment is arranged in an image array; determine a segment height of each individual segment of the one or more individual segments; extract the segment height of each individual segment of the one or more individual segments, wherein the segment height of each individual segment is arranged to create a segment height array, and wherein each segment height present in the segment height array is connected with an associated individual segment image present in the image array; group one or more segment heights present in the segment height array based on their proximity to each other using a proximity based clustering algorithm; identify one or more characters present in the one or more grouped segment heights by applying an optical character recognition (OCR) technique on one or more individual segment images connected with the one or more grouped segment heights; and concatenate the one or more identified characters based on the one or more grouped segment heights to obtain an original text present in the input image.
[010]
In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors recognize unaligned characters present in an image by receiving, by a system, an input image from a user device, the input image comprising one or more unaligned texts; applying, by the system, a connected component analysis technique on the input image for isolating one or more individual segments that are present in the input image, wherein the connected component analysis technique isolates the one or more individual segments by creating a bounding box around each individual segment of the one or more individual segments, and wherein each individual segment image available for each individual segment is arranged in an image array; determining, by the system, a segment height of each individual segment of the one or more individual segments; extracting, by the system, the segment height of each individual segment of the one or more individual segments, wherein the segment height of each individual segment is arranged to create a segment height array, and wherein each segment height present in the segment height array is connected with an associated individual segment image present in the image array; grouping, by the system, one or more segment heights present in the segment height array based on their proximity to each other using a proximity based clustering algorithm; identifying, by the system, one or more characters present in the one or more grouped segment heights by applying an optical character recognition (OCR) technique on one or more individual segment images connected with the one or more grouped segment heights; and concatenating, by the system, the one or more identified characters based on the one or more grouped segment heights to obtain an original text present in the input image.
[011]
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[012]
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[013]
FIG. 1 is an example representation of an environment, related to at least some example embodiments of the present disclosure.
[014]
FIG. 2 illustrates an exemplary block diagram of a system for recognizing unaligned characters present in an image, in accordance with an embodiment of the present disclosure.
[015]
FIG. 3 illustrates a schematic block diagram representation of an unaligned character recognition process performed by the system of FIG. 2, in accordance with an embodiment of the present disclosure.
[016]
FIGS. 4A and 4B, collectively, illustrate an exemplary flow diagram of a method for recognizing unaligned characters present in an image, in accordance with an embodiment of the present disclosure.
[017]
FIG. 5A illustrates a tabular representation showing a plurality of outputs obtained for a plurality of inputs using a proximity based clustering algorithm, in accordance with an embodiment of the present disclosure.
[018]
FIG. 5B illustrates a tabular representation showing a plurality of outputs obtained for a plurality of input images using the system of FIG. 2, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[019]
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[020]
As discussed earlier, existing OCR engines become unreliable on text where the characters have varying attributes, such as height, width, font type and the like. This happens because the OCR engines take into account the overall alignment and font types of the whole text, and thus a sudden change in these attributes can hamper the OCR engines' recognition accuracy.
[021]
Further, the challenge associated with price tags is that the numbers, i.e., the price denominations present in the price tags, have varying heights, and thus there is no set convention for how small or big the characters can be in comparison with each other. Most OCR engines cannot recognize such a text topology, in which the positioning of characters varies, i.e., which includes unaligned characters.
[022]
So, a technique that can identify unaligned characters present in an image without a considerable compromise in accuracy is still to be explored.
[023]
Embodiments of the present disclosure overcome the above-mentioned disadvantages by providing a method and a system for recognizing unaligned characters present in an image. The system of the present disclosure first applies a connected component analysis technique on an input image for isolating one or more individual segments that are present in the input image. Then, the system extracts a segment height of each individual segment, and the segment heights are used to create a segment height array. Thereafter, the system uses a proximity based clustering algorithm to group one or more segment heights present in the segment height array based on their proximity. Further, the system applies an optical character recognition (OCR) technique on one or more individual segment images corresponding to the one or more grouped segment heights to identify one or more characters present in the one or more grouped segment heights. Finally, the system concatenates the one or more identified characters based on the one or more grouped segment heights to obtain an original text present in the input image.
[024]
In the present disclosure, the system and the method first apply the proximity based clustering algorithm to group one or more segment heights and then apply the OCR technique to identify characters, thus ensuring improved accuracy as the OCR inaccuracies will not hamper the grouping of characters. Further, the system uses the connected component analysis technique to isolate the individual segments, so each individual character is considered as a single entity, and hence the alignment of the characters does not hinder the performance of the system. Additionally, the system does not require any sort of training or additional computation power, thereby ensuring a system that is less computationally intensive and has lower time complexity.
[025]
Referring now to the drawings, and more particularly to FIGS. 1 through 5B, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[026]
FIG. 1 illustrates an exemplary representation of an environment 100 related to at least some example embodiments of the present disclosure. Although the environment 100 is presented in one arrangement, other embodiments may include the parts of the environment 100 (or other parts) arranged otherwise depending on, for example, isolating individual segments using a connected component analysis technique, grouping segment heights using a proximity based clustering algorithm, etc. The environment 100 generally includes a system 102 and an electronic device 106 (hereinafter also referred to as a user device 106), each coupled to, and in communication with (and/or with access to) a network 104. It should be noted that one user device is shown for the sake of explanation; there can be more user devices.
[027]
The network 104 may include, without limitation, a light fidelity (Li-Fi) network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a satellite network, the Internet, a fiber optic network, a coaxial cable network, an infrared (IR) network, a radio frequency (RF) network, a virtual network, and/or another suitable public and/or private network capable of supporting communication among two or more of the parts or users illustrated in FIG. 1, or any combination thereof.
[028]
Various entities in the environment 100 may connect to the network 104 in accordance with various wired and wireless communication protocols, such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), 2nd Generation (2G), 3rd Generation (3G), 4th Generation (4G), 5th Generation (5G) communication protocols, Long Term Evolution (LTE) communication protocols, or any combination thereof.
[029]
The user device 106 is associated with a user (e.g., a retailer or a consumer) who wants to identify unaligned characters present in an image. Examples of the user device 106 include, but are not limited to, a personal computer (PC), a mobile phone, a tablet device, a Personal Digital Assistant (PDA), a server, a voice activated assistant, a smartphone, and a laptop.
[030]
The system 102 includes one or more hardware processors and a memory. The system 102 is first configured to receive an input image via the network 104 from the user device 106. The system 102 then applies a connected component analysis technique on the input image for isolating individual segments that are present in the input image. Thereafter, the system 102 performs segment height extraction to obtain the height of each connected component, and then each segment image is converted into a segment array.
[031]
Further, the system 102 groups one or more segment heights present in the segment height array based on their proximity using a proximity based clustering algorithm. The proximity based clustering algorithm is explained in detail with reference to FIGS. 4A and 4B. Once the grouped segment heights are available, the system applies an optical character recognition (OCR) technique on one or more individual segment images corresponding to the one or more grouped segment heights to identify one or more characters present in the one or more grouped segment heights. Additionally, the system concatenates the one or more identified characters based on the one or more grouped segment heights to obtain an original text present in the input image.
[032]
The process of recognizing unaligned characters present in an image is explained in detail with reference to FIGS. 4A and 4B.
[033]
The number and arrangement of systems, devices, and/or networks shown in FIG. 1 are provided as an example. There may be additional systems, devices, and/or networks; fewer systems, devices, and/or networks; different systems, devices, and/or networks; and/or differently arranged systems, devices, and/or networks than those shown in FIG. 1. Furthermore, two or more systems or devices shown in FIG. 1 may be implemented within a single system or device, or a single system or device shown in FIG. 1 may be implemented as multiple, distributed systems or devices. Additionally, or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of the environment 100 may perform one or more functions described as being performed by another set of systems or another set of devices of the environment 100 (e.g., refer to the scenarios described above).
[034]
FIG. 2 illustrates an exemplary block diagram of the system 102 for recognizing unaligned characters present in an image, in accordance with an embodiment of the present disclosure. In some embodiments, the system 102 is embodied as a cloud-based and/or SaaS-based (software as a service) architecture. In some embodiments, the system 102 may be implemented in a server system. In some embodiments, the system 102 may be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, and the like.
[035]
In an embodiment, the system 102 includes one or more processors 204, communication interface device(s) or input/output (I/O) interface(s) 206, and one or more data storage devices or memory 202 operatively coupled to the one or more processors 204. The one or more processors 204 may be one or more software processing modules and/or hardware processors. In an embodiment, the hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 102 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[036]
The I/O interface device(s) 206 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
[037]
The memory 202 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, a database 208 can be stored in the memory 202, wherein the database 208 may comprise, but is not limited to, the image array, the segment height array, the optical character recognition (OCR) technique, the predefined filtering criteria, the proximity based clustering algorithm, one or more processes and the like. The memory 202 further comprises (or may further comprise) information pertaining to input(s)/output(s) of each step performed by the systems and methods of the present disclosure. In other words, input(s) fed at each step and output(s) generated at each step are comprised in the memory 202 and can be utilized in further processing and analysis.
[038]
It is noted that the system 102 as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the present disclosure and, therefore, should not be taken to limit the scope of the present disclosure. It is noted that the system 102 may include fewer or more components than those depicted in FIG. 2.
[039]
FIG. 3 illustrates a schematic block diagram representation of an unaligned character recognition process performed by the system 102, in accordance with an embodiment of the present disclosure.
[040]
As seen in FIG. 3, the system 102 first receives an input image on which the unaligned character recognition process needs to be performed. The system 102 then applies the connected component analysis technique on the input image to segment the characters present in the input image. Then, the system 102 filters the characters based on one or more predefined filtering criteria to obtain filtered individual segments for further processing. In an embodiment, the filtering is performed to obtain bounding boxes of characters that may be of interest to the user. Once the filtered bounding boxes are obtained, the system 102 extracts the height of each filtered bounding box. Thereafter, the system 102 applies a clustering technique, i.e., the proximity based clustering algorithm, on the filtered bounding boxes to group heights into one or more segment heights. Further, the system 102 performs the OCR on the segmented character images corresponding to the one or more segment heights to identify one or more characters. Finally, the system 102 concatenates the one or more characters based on the grouped segment heights to obtain the original text present in the input image. In an embodiment, without limiting the scope of the invention, the original text can be a retail product price and thus the text may be displayed in a price format.
[041]
FIGS. 4A and 4B, collectively, with reference to FIGS. 1 to 3, represent an exemplary flow diagram of a method 400 for recognizing unaligned characters present in an image, in accordance with an embodiment of the present disclosure. The method 400 may use the system 102 of FIGS. 1 and 2 for execution. In an embodiment, the system 102 comprises one or more data storage devices or the memory 208 operatively coupled to the one or more hardware processors 206 and is configured to store instructions for execution of steps of the method 400 by the one or more hardware processors 206. The steps of the flow diagram may not necessarily be executed in the same order as they are presented. Further, one or more steps may be grouped together and performed in the form of a single step, or one step may have several sub-steps that may be performed in parallel or in a sequential manner. The steps of the method of the present disclosure will now be explained with reference to the components of the system 102 as depicted in FIG. 2 and FIG. 1.
[042]
At step 402 of the present disclosure, the one or more hardware processors 206 of the system 102 receive an input image from a user device, such as the user device 106. The input image includes one or more unaligned texts.
[043]
In an embodiment, the system 102 performs pre-processing of the input image using one or more pre-processing techniques to obtain a pre-processed input image. Examples of the pre-processing techniques that are applied on the input image include, but are not limited to, binarization, thresholding and the like. In particular, the pre-processing is performed by the system 102 to enhance the input image and make it suitable for further processing/analysis.
[044]
At step 404 of the present disclosure, the one or more hardware processors 206 of the system 102 apply a connected component analysis technique on the input image, i.e., the pre-processed input image, for isolating one or more individual segments that are present in the pre-processed input image. In an embodiment, the connected component analysis technique creates a bounding box around each individual segment/character of the one or more individual segments to isolate the one or more individual segments present in the input image. An example representation of the output of the connected component analysis is shown with reference to FIG. 3. Further, the system 102 arranges each individual segment image available corresponding to each individual segment in an image array. In particular, an image available corresponding to each segment is extracted and arranged in the image array.
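As a rough illustration of this step, the following Python sketch labels connected foreground regions in a small binary image, derives one bounding box per region, and crops each region into an image array. It is a sketch under stated assumptions rather than the disclosed implementation: the function names connected_component_boxes and crop are hypothetical, 4-connectivity is assumed because the disclosure does not state the connectivity used, and boxes are ordered left to right purely for convenience.

```python
from collections import deque


def connected_component_boxes(binary):
    """Return (top, left, height, width) boxes of 4-connected regions of 1s."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one component, tracking its extent.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, bottom, left, right = r, r, c, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom - top + 1, right - left + 1))
    # Assumed ordering: left to right by the box's left edge.
    return sorted(boxes, key=lambda box: box[1])


def crop(binary, box):
    """Cut the pixels inside one bounding box out of the image."""
    top, left, height, width = box
    return [row[left:left + width] for row in binary[top:top + height]]


image = [
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
]
boxes = connected_component_boxes(image)
image_array = [crop(image, box) for box in boxes]
print(boxes)  # [(0, 0, 3, 1), (1, 2, 2, 1)]
```

Here the two strokes have heights 3 and 2, which is the kind of per-segment height the later steps work with.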
[045]
At step 406 of the present disclosure, the one or more hardware processors 206 of the system 102 determine a segment height of each individual segment of the one or more individual segments.
[046]
In an embodiment, before obtaining the segment heights, the system 102 filters the one or more individual segments based on one or more predefined filtering criteria to obtain at least one filtered individual segment of the one or more individual segments present in the pre-processed input image. The one or more predefined filtering criteria may be defined based on the requirement of the user. For example, in the case of a retail user who is interested in knowing the prices of competitors from their price tags, the one or more predefined filtering criteria can require that the top coordinates of the bounding boxes are aligned, that the bounding box area is above a certain threshold, and that the width of the bounding boxes is sufficient to avoid any unwanted lines.
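The three example criteria could be applied along the following lines. This Python sketch is illustrative only: the function name filter_boxes, the (top, left, height, width) box layout, the use of the median top coordinate as the alignment reference, and the threshold values are all assumptions, since the disclosure leaves the criteria to the user's requirement.

```python
from statistics import median


def filter_boxes(boxes, top_tolerance=5, min_area=500, min_width=5):
    """Keep boxes that pass three example criteria (all thresholds assumed).

    Each box is a (top, left, height, width) tuple.
    """
    if not boxes:
        return []
    # Assumed alignment reference: the median top coordinate of all boxes.
    reference_top = median(box[0] for box in boxes)
    kept = []
    for top, left, height, width in boxes:
        aligned = abs(top - reference_top) <= top_tolerance
        big_enough = height * width > min_area
        wide_enough = width >= min_width
        if aligned and big_enough and wide_enough:
            kept.append((top, left, height, width))
    return kept


candidates = [
    (10, 5, 100, 40),   # large digit
    (11, 50, 66, 30),   # smaller digit, top-aligned
    (10, 85, 64, 28),   # smaller digit, top-aligned
    (90, 120, 8, 6),    # stray speck far below the others
    (10, 130, 100, 2),  # thin vertical line
]
filtered = filter_boxes(candidates)
print(filtered)  # [(10, 5, 100, 40), (11, 50, 66, 30), (10, 85, 64, 28)]
```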
[047]
So, in case the filtering is performed, the system 102 obtains the segment height of each individual segment of the at least one filtered individual segment.
[048]
At step 408 of the present disclosure, the one or more hardware processors 206 of the system 102 extract the segment height of each individual segment of the one or more individual segments, i.e., the at least one filtered individual segment.
In one embodiment, the system 102 arranges the segment height of each individual segment to create a segment height array. Further, the system 102 connects each segment height present in the segment height array with a corresponding individual segment image present in the image array.
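A minimal sketch of the segment height array and its connection to the image array follows. It assumes that boxes are hypothetical `(top, left, height, width)` tuples and that the link is kept simply by pairing entries that share the same index; the disclosure does not say how the connection is stored.

```python
def build_linked_arrays(boxes, images):
    """Return the segment height array and (height, image) pairs sharing one index."""
    if len(boxes) != len(images):
        raise ValueError("each bounding box needs exactly one segment image")
    height_array = [height for _top, _left, height, _width in boxes]
    linked = list(zip(height_array, images))
    return height_array, linked


# Strings stand in for the cropped segment images.
heights, linked = build_linked_arrays([(0, 0, 30, 10), (5, 12, 12, 8)], ["seg0", "seg1"])
print(heights)  # [30, 12]
print(linked)   # [(30, 'seg0'), (12, 'seg1')]
```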
[049]
At step 410 of the present disclosure, the one or more hardware processors 206 of the system 102 group one or more segment heights present in the segment height array based on their proximity to each other using a proximity based clustering algorithm.
[050]
In an embodiment, the proximity based clustering algorithm first sorts the one or more segment heights present in the segment height array in a descending order to obtain a sorted height list. The sorted height list includes one or more numbers corresponding to the one or more segment heights.
[051]
In an exemplary scenario, assume the segment height array includes ‘100, 99, 66, 8, 20, 9, 65, 64’. Then, the sorted height list includes ‘100, 99, 66, 65, 64, 20, 9, 8’.
[052]
Then, an absolute difference is calculated between one or more pairs of consecutive numbers that are present in the sorted height list. Thereafter, the calculated absolute difference for each pair of the one or more pairs of consecutive numbers is arranged in a difference list. It should be noted that the calculated absolute difference for each pair is arranged in the difference list in the same order as the corresponding pair in the sorted height list.
[053]
With reference to the previous exemplary scenario, the difference list may comprise ‘1, 33, 1, 1, 44, 11, 1’, as 100-99 is 1, 99-66 is 33 and the like.
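The sorting and differencing steps of the exemplary scenario can be reproduced with a short sketch. The function name `sorted_heights_and_differences` is hypothetical.

```python
def sorted_heights_and_differences(segment_heights):
    """Sort heights in descending order and list the gaps between neighbours."""
    sorted_heights = sorted(segment_heights, reverse=True)
    # One absolute difference per consecutive pair, kept in the pair's own order.
    difference_list = [abs(a - b) for a, b in zip(sorted_heights, sorted_heights[1:])]
    return sorted_heights, difference_list


sorted_heights, difference_list = sorted_heights_and_differences(
    [100, 99, 66, 8, 20, 9, 65, 64]
)
print(sorted_heights)   # [100, 99, 66, 65, 64, 20, 9, 8]
print(difference_list)  # [1, 33, 1, 1, 44, 11, 1]
```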
[054]
Further, the difference list is sorted in an ascending order to obtain a sorted difference list. The sorted difference list may comprise values ‘1, 1, 1, 1, 11, 33, 44’.
[055]
Once the sorted difference list is available, the system 102 identifies a median position in the sorted difference list using a gradient based technique.
[056]
In at least one example embodiment, the gradient based technique first computes a difference between the consecutive numbers present in the sorted difference list and divides the difference by the previous number. Thereafter, an absolute value of the resulting quotient is taken to obtain a list. A formula for the same can be represented as: $\left| (x_2 - x_1) / x_1 \right|$, where $x_2$ represents a next number and $x_1$ represents a previous number.
[057]
With reference to the previous example, for the sorted difference list ‘1, 1, 1, 1, 11, 33, 44’, the system may obtain the list ‘0, 0, 0, 10, 2, 0.3333’. Then a maximum value from the list is selected and an index position of the maximum value is determined. In this case, the maximum value is ‘10’ and the index position of the ‘10’ is ‘3’. So, the identified median position will be ‘3’.
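The gradient based technique can be sketched as below, using zero-based index positions as in the example. The function names are hypothetical. The handling of a zero previous number is an assumption added here to avoid division by zero; the disclosure does not address that case.

```python
def relative_gradients(sorted_differences):
    """Compute |(x2 - x1) / x1| for each consecutive pair (x1, x2)."""
    gradients = []
    for previous, current in zip(sorted_differences, sorted_differences[1:]):
        if previous == 0:
            # Assumption (not in the disclosure): a jump away from zero is
            # treated as infinitely steep, and 0 -> 0 as flat.
            gradients.append(0.0 if current == 0 else float("inf"))
        else:
            gradients.append(abs((current - previous) / previous))
    return gradients


def median_position(sorted_differences):
    """Return the zero-based index of the largest relative gradient."""
    gradients = relative_gradients(sorted_differences)
    if not gradients:
        raise ValueError("need at least two differences")
    return gradients.index(max(gradients))


print(median_position([1, 1, 1, 1, 11, 33, 44]))  # 3
```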
[058]
The system 102 then groups the one or more numbers present in the sorted difference list based on the identified median position to obtain a smaller group and a larger group. The smaller group includes at least one number of the one or more numbers that is present before the identified median. Similarly, the larger group includes the at least one number of the one or more numbers that is present after the identified median.
[059]
So, with reference to the previous example, as the identified median position is ‘3’, the sorted difference list ‘1, 1, 1, 1, 11, 33, 44’ can be divided into the smaller group of ‘1, 1, 1, 1’ and the larger group of ‘11, 33, 44’, as the number at index position ‘3’ (counting from zero) in the sorted difference list is the last ‘1’, which comes immediately before ‘11’.
[060]
Thereafter, the system 102 iterates through the difference list to mark the numbers that are common in the difference list and the larger group. So, ‘11, 33, 44’ are marked in the difference list ‘1, 33, 1, 1, 44, 11, 1’. Then, an index of each marked number present in the difference list is identified. So, the indices of 33, 44 and 11 are identified. The index values obtained include ‘1, 4, and 5’ corresponding to values ‘33, 44, and 11’.
[061]
Finally, the system 102 segments the sorted height list based on the identified index of each marked number, i.e., the list is split after the numbers at index positions 1, 4 and 5. As the sorted height list includes ‘100, 99, 66, 65, 64, 20, 9, 8’, the segmented height groups may include ‘100 - 99’, ‘66 - 65 - 64’, ‘20’ and ‘9 - 8’.
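The final grouping, marking and segmenting steps can be sketched as follows, taking the sorted height list, the difference list and the median position of the running example as inputs. The function name `segment_sorted_heights` is hypothetical, and index positions are zero-based as in the example.

```python
def segment_sorted_heights(sorted_heights, difference_list, median_position):
    """Split the sorted heights wherever the gap belongs to the larger group."""
    sorted_differences = sorted(difference_list)
    smaller_group = sorted_differences[:median_position + 1]
    larger_group = sorted_differences[median_position + 1:]
    # Mark every gap in the (unsorted) difference list that is also in the larger group.
    marked_indices = [i for i, gap in enumerate(difference_list) if gap in larger_group]
    groups, start = [], 0
    for index in marked_indices:
        # Gap i sits between heights i and i + 1, so cut after height i.
        groups.append(sorted_heights[start:index + 1])
        start = index + 1
    groups.append(sorted_heights[start:])
    return smaller_group, larger_group, marked_indices, groups


smaller, larger, marked, groups = segment_sorted_heights(
    [100, 99, 66, 65, 64, 20, 9, 8], [1, 33, 1, 1, 44, 11, 1], 3
)
print(marked)  # [1, 4, 5]
print(groups)  # [[100, 99], [66, 65, 64], [20], [9, 8]]
```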
[062]
At step 412 of the present disclosure, the one or more hardware processors 206 of the system 102 identify one or more characters present in the one or more grouped segment heights by applying an optical character recognition (OCR) technique on one or more individual segment images connected with the one or more grouped segment heights. In particular, the OCR technique is applied on each individual segment image present in the image array corresponding to each of the grouped segment heights. This application of the OCR improves the recognition accuracy of the system 102. It should be noted that any existing OCR technique can be used for identification of characters.
[063]
At step 414 of the present disclosure, the one or more hardware processors 206 of the system 102 concatenate the one or more identified characters based on the one or more grouped segment heights to obtain an original text present in the input image. In particular, the identified characters are concatenated in the same order as the grouped segment heights to obtain the original text of the input image. In an embodiment, the original text refers to a text of interest present in an image comprising a plurality of texts.
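Steps 412 and 414 can be sketched together as below. This is an illustration under stated assumptions: `recognize` is a hypothetical callable standing in for any existing OCR technique, segments are given as (height, image) pairs, and characters within a group keep the sorted-height order because the disclosure does not specify any other ordering within a group.

```python
def read_grouped_text(linked_segments, grouped_heights, recognize):
    """Apply `recognize` to each segment image and join the results per height group."""
    # Order the (height, image) pairs the same way as the sorted height list.
    ordered = sorted(linked_segments, key=lambda pair: pair[0], reverse=True)
    texts, position = [], 0
    for group in grouped_heights:
        members = ordered[position:position + len(group)]
        position += len(group)
        texts.append("".join(recognize(image) for _height, image in members))
    return texts


# Lower-case strings stand in for segment images; str.upper stands in for OCR.
linked = [(64, "c"), (100, "a"), (99, "b"), (66, "d")]
print(read_grouped_text(linked, [[100, 99], [66, 64]], str.upper))  # ['AB', 'DC']
```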
[064]
FIG. 5A illustrates a tabular representation showing a plurality of outputs obtained for a plurality of inputs using the proximity based clustering algorithm, in accordance with an embodiment of the present disclosure.
[065]
FIG. 5B illustrates a tabular representation showing a plurality of outputs obtained for a plurality of input images using the system 102, in accordance with an embodiment of the present disclosure.
[066]
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[067]
As discussed earlier, existing optical character recognition (OCR) engines often struggle when it comes to accurately detecting unaligned characters, as most of the OCR systems are designed in such a manner that they consider only aligned and structured text. So, to overcome the disadvantages, embodiments of the present disclosure provide a method and a system for recognizing unaligned characters present in an image. More specifically, the system and the method first apply the proximity based clustering algorithm to group one or more segment heights and then apply the OCR technique to identify characters, thus ensuring improved accuracy as the OCR inaccuracies will not hamper the grouping of characters. Further, the system uses the connected component analysis technique to isolate the individual segments, thus each individual character is considered as a single entity, and hence the alignment of the character doesn’t hinder the performance of the system. Additionally, the system doesn’t require any sort of training or additional computation power, thereby ensuring a less computationally intensive and less time complex system.
[068]
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means, and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[069]
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[070]
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[071]
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[072]
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
We Claim:
1. A processor implemented method (400), comprising:
receiving (402), by a system via one or more hardware processors, an input image from a user device, the input image comprising one or more unaligned texts;
applying (404), by the system via the one or more hardware processors, a connected component analysis technique on the input image for isolating one or more individual segments that are present in the input image, wherein the connected component analysis technique isolates the one or more individual segments by creating a bounding box around each individual segment of the one or more individual segments, and wherein each individual segment image available for each individual segment is arranged in an image array;
determining (406), by the system via the one or more hardware processors, a segment height of each individual segment of the one or more individual segments;
extracting (408), by the system via the one or more hardware processors, the segment height of each individual segment of the one or more individual segments, wherein the segment height of each individual segment is arranged to create a segment height array, and wherein each segment height present in the segment height array is connected with an associated individual segment image present in the image array;
grouping (410), by the system via the one or more hardware processors, one or more segment heights present in the segment height array based on their proximity to each other using a proximity based clustering algorithm;
identifying (412), by the system via the one or more hardware processors, one or more characters present in the one or more grouped segment heights by applying an optical character recognition (OCR) technique on one or more individual segment images connected with the one or more grouped segment heights; and

concatenating (414), by the system via the one or more hardware processors, the one or more identified characters based on the one or more grouped segment heights to obtain an original text present in the input image.
2. The processor implemented method (400) as claimed in claim 1,
comprising:
displaying, by the system via the one or more hardware processors, the original text on the user device.
3. The processor implemented method (400) as claimed in claim 1, wherein
the step of applying the connected component analysis technique on the input
image for isolating the one or more individual segments that are present in the
input image is preceded by:
preprocessing, by the system via the one or more hardware processors, the input image using one or more pre-processing techniques to obtain a pre-processed input image.
4. The processor implemented method (400) as claimed in claim 3, wherein
the step of determining the segment height of each individual segment of the one
or more individual segments further comprises:
filtering, by the system via the one or more hardware processors, the one or more individual segments based on one or more predefined filtering criteria to obtain at least one filtered individual segment of the one or more individual segments present in the pre-processed input image; and
determining, by the system via the one or more hardware processors, the segment height of each individual segment of the at least one filtered individual segment.
5. The processor implemented method (400) as claimed in claim 4, wherein
the proximity based clustering algorithm performs:

sorting the one or more segment heights present in the segment height array in a descending order to obtain a sorted height list, wherein the sorted height list comprises one or more numbers associated with the one or more segment heights;
calculating an absolute difference between one or more pairs of consecutive numbers that are present in the sorted height list, wherein the calculated absolute difference for each pair of the one or more pairs of consecutive numbers is arranged in a difference list, and wherein the calculated absolute difference for each pair is arranged in the difference list in same order as of the corresponding pair in the sorted height list;
sorting the difference list in an ascending order to obtain a sorted difference list;
identifying a median position in the sorted difference list using a gradient based technique;
grouping the one or more numbers present in the sorted difference list based on the identified median position to obtain a smaller group and a larger group, wherein the smaller group comprises at least one number of the one or more numbers that is present before the identified median, and wherein the larger group comprises the at least one number of the one or more numbers that is present after the identified median;
iterating through the difference list to mark the numbers that are common in the difference list and the larger group;
identifying an index of each marked number present in the difference list; and
segmenting the sorted height list based on the identified index of each marked number.
6. A system (102), comprising:
a memory (202) storing instructions;
one or more communication interfaces (206); and

one or more hardware processors (204) coupled to the memory (202) via the one or more communication interfaces (206), wherein the one or more hardware processors (204) are configured by the instructions to:
receive an input image from a user device, the input image comprising one or more unaligned texts;
apply a connected component analysis technique on the input image for isolating one or more individual segments that are present in the input image, wherein the connected component analysis technique isolates the one or more individual segments by creating a bounding box around each individual segment of the one or more individual segments, and wherein each individual segment image available for each individual segment is arranged in an image array;
determine a segment height of each individual segment of the one or more individual segments;
extract the segment height of each individual segment of the one or more individual segments, wherein the segment height of each individual segment is arranged to create a segment height array, and wherein each segment height present in the segment height array is connected with an associated individual segment image present in the image array;
group one or more segment heights present in the segment height array based on their proximity to each other using a proximity based clustering algorithm;
identify one or more characters present in the one or more grouped segment heights by applying an optical character recognition (OCR) technique on one or more individual segment images connected with the one or more grouped segment heights; and
concatenate the one or more identified characters based on the one or more grouped segment heights to obtain an original text present in the input image.
7. The system as claimed in claim 6, wherein the one or more hardware
processors (204) are configured by the instructions to: display the original text on the user device.

8. The system as claimed in claim 6, wherein before applying the connected
component analysis technique on the input image for isolating the one or more
individual segments that are present in the input image, the one or more hardware
processors (204) are configured by the instructions to:
preprocess the input image using one or more pre-processing techniques to obtain a pre-processed input image.
9. The system as claimed in claim 8, wherein for determining the segment
height of each individual segment of the one or more individual segments, the one
or more hardware processors (204) are further configured by the instructions to:
filter the one or more individual segments based on one or more predefined filtering criteria to obtain at least one filtered individual segment of the one or more individual segments present in the pre-processed input image; and
determine the segment height of each individual segment of the at least one filtered individual segment.
10. The system as claimed in claim 9, wherein the proximity based clustering
algorithm is configured to:
sort the one or more segment heights present in the segment height array in a descending order to obtain a sorted height list, wherein the sorted height list comprises one or more numbers associated with the one or more segment heights;
calculate an absolute difference between one or more pairs of consecutive numbers that are present in the sorted height list, wherein the calculated absolute difference for each pair of the one or more pairs of consecutive numbers is arranged in a difference list, and wherein the calculated absolute difference for each pair is arranged in the difference list in same order as of the corresponding pair in the sorted height list;
sort the difference list in an ascending order to obtain a sorted difference list;

identify a median position in the sorted difference list using a gradient based technique;
group the one or more numbers present in the sorted difference list based on the identified median position to obtain a smaller group and a larger group, wherein the smaller group comprises at least one number of the one or more numbers that is present before the identified median, and wherein the larger group comprises the at least one number of the one or more numbers that is present after the identified median;
iterate through the difference list to mark the numbers that are common in the difference list and the larger group;
identify an index of each marked number present in the difference list; and
segment the sorted height list based on the identified index of each marked number.

Documents

Application Documents

#	Name	Date
1	202421016169-STATEMENT OF UNDERTAKING (FORM 3) [07-03-2024(online)].pdf	2024-03-07
2	202421016169-REQUEST FOR EXAMINATION (FORM-18) [07-03-2024(online)].pdf	2024-03-07
3	202421016169-FORM 18 [07-03-2024(online)].pdf	2024-03-07
4	202421016169-FORM 1 [07-03-2024(online)].pdf	2024-03-07
5	202421016169-FIGURE OF ABSTRACT [07-03-2024(online)].pdf	2024-03-07
6	202421016169-DRAWINGS [07-03-2024(online)].pdf	2024-03-07
7	202421016169-DECLARATION OF INVENTORSHIP (FORM 5) [07-03-2024(online)].pdf	2024-03-07
8	202421016169-COMPLETE SPECIFICATION [07-03-2024(online)].pdf	2024-03-07
9	Abstract1.jpg	2024-04-08
10	202421016169-FORM-26 [08-05-2024(online)].pdf	2024-05-08
11	202421016169-FORM-26 [22-05-2025(online)].pdf	2025-05-22