Abstract: The present application provides a method and system for unsupervised word image clustering, comprising capturing one or more images, wherein the one or more images comprise at least one word image; extracting at least one feature vector using an untrained convolution neural network architecture, wherein the convolution filters are initialized by random filter based deep learning techniques using a Gaussian random variable with zero mean and unit standard deviation, and wherein the convolution filters are constrained to sum to zero. The extracted feature vectors are used for clustering, wherein clustering is performed in two stages. The first stage comprises clustering similar word images using graph connected component analysis. The second stage comprises clustering the remaining word images, which are not clustered during the first stage, by evaluating them against the clusters formed during the first stage and assigning them to clusters based on the evaluation.
Claims:
1. A method for unsupervised word image clustering; said method comprising processor implemented steps of:
capturing one or more image using at least one image capture device (200), wherein at least one of the one or more image comprises at least one word image;
extracting one or more feature vector for each of the at least one word image using an untrained convolution neural network architecture, wherein extraction comprises:
applying, by a convolution module (210), a first convolution to the at least one input image using a first plurality of filters, wherein the first plurality of filters are initialized by random filter based deep learning techniques using Gaussian random variable with zero mean and unit standard deviation, and wherein the plurality of filters are constrained to sum to zero,
applying, by the convolution module (210) a Rectified Linear Unit (ReLU) non-linearity to a first plurality of feature maps, wherein the first plurality of feature maps is generated as output of the first convolution,
applying, by a sub-sampling module (212) a first sub-sampling to increase the position invariance of the first plurality of feature maps, wherein sub-sampling comprises applying a pooling operation on non-overlapping segments of the first plurality of feature maps after application of the ReLU non-linearity,
applying, by the convolution module (210), a second convolution on the output of the first sub-sampling using a second plurality of filters wherein the second plurality of filters are initialized by random filter based deep learning techniques using Gaussian random variable with zero mean and unit standard deviation, and wherein the plurality of filters are constrained to sum to zero,
applying, by the convolution module (210), a Rectified Linear Unit (ReLU) non-linearity to a second plurality of feature maps, wherein the second plurality of feature maps are generated as output of the second convolution,
applying, by the sub-sampling module (212), a second sub-sampling to increase the position invariance of the second plurality of feature maps, wherein sub-sampling comprises applying a pooling operation on non-overlapping segments of the second plurality of feature maps after application of the ReLU non-linearity, and
combining, by a combination module (214), the plurality of feature maps generated at the output of the second subsampling for extracting one or more feature vector; and
clustering, by a graph clustering module (216), the one or more word images, wherein clustering is based on the one or more feature vector.
2. The method according to claim 1, wherein the extracted one or more feature vector is normalized such that the one or more feature vector has a zero mean and a unit norm.
3. The method according to claim 1, wherein clustering comprises a first stage clustering and a second stage clustering,
the first stage clustering comprises clustering word images which are similar using a graph connected component analysis, wherein similarity is determined based on the extracted feature vector; and
the second stage clustering comprises clustering the remaining word images which are not clustered during the first stage by evaluating the remaining images against the clusters formed during the first stage and assigning the remaining word images to the clusters based on the evaluation by using a refuse to predict analysis.
4. The method according to claim 3, wherein a first similarity threshold value (tstrong) is used for clustering during the first stage and a second similarity threshold (tsoft) is used for clustering during the second stage, wherein the first similarity threshold value is higher than the second similarity threshold value.
5. A system (102) for unsupervised word image clustering; said system (102) comprising at least one image capture device (200) operatively coupled to the system (102), a processor (202), an interface (204), and memory (206) comprising thereon instructions to:
capture one or more image using at least one image capture device (200), wherein at least one of the one or more image comprises at least one word image;
extract one or more feature vector for each of the at least one word image using an untrained convolution neural network architecture, wherein extraction is performed by:
a convolution module (210), configured to apply, a first convolution to the at least one input image using a first plurality of filters, wherein the first plurality of filters are initialized by random filter based deep learning techniques using Gaussian random variable with zero mean and unit standard deviation, and wherein the plurality of filters are constrained to sum to zero
the convolution module (210), further configured to apply, a Rectified Linear Unit (ReLU) non-linearity to a first plurality of feature maps, wherein the first plurality of feature maps is generated as output of the first convolution
a sub-sampling module (212), configured to apply, a first sub-sampling to increase the position invariance of the first plurality of feature maps, wherein sub-sampling comprises applying a pooling operation on non overlapping segments of the first plurality of feature maps after application of the ReLU non-linearity
the convolution module (210), further configured to apply, a second convolution on the output of the first sub-sampling using a second plurality of filters wherein the second plurality of filters are initialized by random filter based deep learning techniques using Gaussian random variable with zero mean and unit standard deviation, and wherein the plurality of filters are constrained to sum to zero
the convolution module (210), further configured to apply, a Rectified Linear Unit (ReLU) non-linearity to a second plurality of feature maps, wherein the second plurality of feature maps are generated as output of the second convolution
the sub-sampling module (212), further configured to apply, a second sub-sampling to increase the position invariance of the second plurality of feature maps, wherein sub-sampling comprises applying a pooling operation on non-overlapping segments of the second plurality of feature maps after application of the ReLU non-linearity, and
a combination module (214), configured to combine, the plurality of feature maps generated at the output of the second subsampling for extracting one or more feature vector; and
cluster, the one or more word images using a graph clustering module (216), wherein clustering is based on the one or more feature vector.
6. The system according to claim 5 wherein the combination module (214) is further configured to normalize the one or more extracted feature vector such that the one or more feature vector has a zero mean and a unit norm.
7. The system according to claim 5 wherein the graph clustering module (216) is configured to perform a first stage clustering and a second stage clustering,
the first stage clustering comprises clustering word images which are similar using a graph connected component analysis, wherein similarity is determined based on the extracted feature vector; and
the second stage clustering comprises clustering the remaining word images which are not clustered during the first stage by evaluating the remaining images against the clusters formed during the first stage and assigning the remaining word images to the clusters based on the evaluation.
8. The system according to claim 7, wherein a first similarity threshold value (tstrong) is used for clustering during the first stage and a second similarity threshold (tsoft) is used for clustering during the second stage, wherein the first similarity threshold value is higher than the second similarity threshold value.
Description:
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
METHOD AND SYSTEM FOR UNSUPERVISED WORD IMAGE CLUSTERING
Applicant:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
FIELD OF THE INVENTION
[001] The present application generally relates to machine learning. Particularly, the application provides a method and system for unsupervised word image clustering.
BACKGROUND OF THE INVENTION
[002] In countries like India, several government, bank, real estate etc. related transactions take place on paper. There is a strong recent initiative to reduce paper based transaction, however digitization of archival data remains a big challenge for achieving this goal. Robust character segmentation is a challenge for many Indic scripts, and hence the accuracies of Optical Character Recognition (OCR) remain poor.
[003] OCR engines fail on Indian scripts mainly because character segmentation is non-trivial. Segmenting words from scripts is relatively easier, and thus the creation of a word level dataset provides a viable alternative. This data can help applications such as indexing, transcription, OCR etc.
[004] Feature based word clustering is an alternative that is employed for word recognition. Further, randomly initialized deep networks work well for object recognition. However, the randomly initialized deep networks are not fine-tuned for shape feature extraction.
[005] Supervised feature based word clustering, which is the method that is currently employed for word clustering, is available; however, this method requires a large amount of training data and computing resources, and takes a long time for training.
[006] Prior art literature has illustrated various methods for digitization of paper based data; however, digitization of paper based data, especially for Indic languages, is still considered one of the biggest challenges of the technical domain.
SUMMARY OF THE INVENTION
[007] Before the present methods, systems, and hardware enablement are described, it is to be understood that this invention is not limited to the particular systems, and methodologies described, as there can be multiple possible embodiments of the present invention which are not expressly illustrated in the present disclosure. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims.
[008] The present application provides a method and system for unsupervised word image clustering.
[009] The present application provides a computer implemented method for unsupervised word image clustering, wherein said method comprises, capturing one or more image using at least one image capture device (200). In one embodiment at least one of the one or more image comprises at least one word image. The method further comprises extracting one or more feature vector for each of the at least one word image using an untrained convolution neural network architecture, wherein extraction comprises, applying, by a convolution module (210), a first convolution to the at least one input image using a first plurality of filters, wherein the first plurality of filters are initialized by random filter based deep learning techniques using Gaussian random variable with zero mean and unit standard deviation, and wherein the plurality of filters are constrained to sum to zero. The method further comprises, applying, by the convolution module (210) a Rectified Linear Unit (ReLU) non-linearity to a first plurality of feature maps. In an embodiment, the first plurality of feature maps is generated as output of the first convolution. Further the method comprises applying, by a sub-sampling module (212) a first sub-sampling to increase the position invariance of the first plurality of feature maps, wherein sub-sampling comprises applying a pooling operation on non-overlapping segments of the first plurality of feature maps after application of the ReLU non-linearity; applying, by the convolution module (210), a second convolution on the output of the first sub-sampling using a second plurality of filters wherein the second plurality of filters are initialized by random filter based deep learning techniques using Gaussian random variable with zero mean and unit standard deviation, and wherein the plurality of filters are constrained to sum to zero. 
Further the method comprises, applying, by the convolution module (210), a Rectified Linear Unit (ReLU) non-linearity to a second plurality of feature maps, wherein the second plurality of feature maps are generated as output of the second convolution. The method further comprises applying, by the sub-sampling module (212), a second sub-sampling to increase the position invariance of the second plurality of feature maps, wherein sub-sampling comprises applying a pooling operation on non-overlapping segments of the second plurality of feature maps after application of the ReLU non linearity. The method further comprises, combining, by a combination module (214), the plurality of feature maps generated at the output of the second subsampling for extracting one or more feature vector. Finally, a graph clustering module (216), clusters the one or more word images, wherein clustering is based on the one or more feature vector.
[0010] The present application provides a system (102) for unsupervised word image clustering; said system (102) comprising at least one image capture device (200) operatively coupled to the system (102), a processor (202), an interface (204), and memory (206) comprising thereon instructions to: capture one or more image using at least one image capture device (200), wherein at least one of the one or more image comprises at least one word image, extract one or more feature vector for each of the at least one word image using an untrained convolution neural network architecture. In an embodiment, extraction is performed by a convolution module (210), configured to apply, a first convolution to the at least one input image using a first plurality of filters, wherein the first plurality of filters are initialized by random filter based deep learning techniques using Gaussian random variable with zero mean and unit standard deviation, and wherein the plurality of filters are constrained to sum to zero. In another embodiment, the convolution module (210), is further configured to apply, a Rectified Linear Unit (ReLU) non-linearity to a first plurality of feature maps, wherein the first plurality of feature maps is generated as output of the first convolution. Feature extraction further comprises, a sub-sampling module (212), configured to apply, a first sub-sampling to increase the position invariance of the first plurality of feature maps, wherein sub-sampling comprises applying a pooling operation on non-overlapping segments of the first plurality of feature maps after application of the ReLU non-linearity. 
In yet another embodiment, the convolution module (210) is further configured to apply a second convolution on the output of the first sub-sampling using a second plurality of filters, wherein the second plurality of filters are initialized by random filter based deep learning techniques using a Gaussian random variable with zero mean and unit standard deviation, and wherein the plurality of filters are constrained to sum to zero. In another aspect, the convolution module (210) is further configured to apply a Rectified Linear Unit (ReLU) non-linearity to a second plurality of feature maps, wherein the second plurality of feature maps are generated as output of the second convolution. In yet another embodiment, the sub-sampling module (212) is further configured to apply a second sub-sampling to increase the position invariance of the second plurality of feature maps, wherein sub-sampling comprises applying a pooling operation on non-overlapping segments of the second plurality of feature maps after application of the ReLU non-linearity, and a combination module (214) is configured to combine the plurality of feature maps generated at the output of the second sub-sampling for extracting one or more feature vector. The system finally clusters the one or more word images using a graph clustering module (216), wherein clustering is based on the one or more feature vector.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The foregoing summary, as well as the following detailed description of preferred embodiments, are better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there is shown in the drawings exemplary constructions of the invention; however, the invention is not limited to the specific methods and system disclosed. In the drawings:
[0012] Figure 1: shows a network implementation (100) of a system (102) for unsupervised word image clustering in accordance with an embodiment of the disclosed subject matter;
[0013] Figure 2: shows a block diagram illustrating the system (102) for unsupervised word image clustering in accordance with an embodiment of the disclosed subject matter;
[0014] Figure 3: shows a flow chart illustrating steps for unsupervised word image clustering in accordance with an embodiment of the disclosed subject matter;
[0015] Figure 4: shows a flowchart illustrating steps for feature extraction during unsupervised word image clustering in accordance with an embodiment of the disclosed subject matter.
DETAILED DESCRIPTION OF THE INVENTION
[0016] Some embodiments of this invention, illustrating all its features, will now be discussed in detail.
[0017] The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items.
[0018] It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, the preferred systems and methods are now described.
[0019] The disclosed embodiments are merely exemplary of the invention, which may be embodied in various forms.
[0020] The elements illustrated in the Figures inter-operate as explained in more detail below. Before setting forth the detailed explanation, however, it is noted that all of the discussion below, regardless of the particular implementation being described, is exemplary in nature, rather than limiting. For example, although selected aspects, features, or components of the implementations are depicted as being stored in memories, all or part of the systems and methods consistent with the word image clustering system and method may be stored on, distributed across, or read from other machine-readable media.
[0021] The techniques described above may be implemented in one or more computer programs executing on (or executable by) a programmable computer including any combination of any number of the following: a processor, a storage medium readable and/or writable by the processor (including, for example, volatile and non-volatile memory and/or storage elements), plurality of input units, and plurality of output devices. Program code may be applied to input entered using any of the plurality of input units to perform the functions described and to generate an output displayed upon any of the plurality of output devices.
[0022] Each computer program within the scope of the claims below may be implemented in any programming language, such as assembly language, machine language, a high-level procedural programming language, or an object-oriented programming language. The programming language may, for example, be a compiled or interpreted programming language. Each such computer program may be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a computer processor.
[0023] Method steps of the invention may be performed by one or more computer processors executing a program tangibly embodied on a computer-readable medium to perform functions of the invention by operating on input and generating output. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, the processor receives (reads) instructions and data from a memory (such as a read-only memory and/or a random access memory) and writes (stores) instructions and data to the memory. Storage devices suitable for tangibly embodying computer program instructions and data include, for example, all forms of non-volatile memory, such as semiconductor memory devices, including EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROMs. Any of the foregoing may be supplemented by, or incorporated in, specially-designed ASICs (application-specific integrated circuits) or FPGAs (Field-Programmable Gate Arrays). A computer can generally also receive (read) programs and data from, and write (store) programs and data to, a non-transitory computer-readable storage medium such as an internal disk (not shown) or a removable disk.
[0024] Any data disclosed herein may be implemented, for example, in one or more data structures tangibly stored on a non-transitory computer-readable medium. Embodiments of the invention may store such data in such data structure(s) and read such data from such data structure(s).
[0025] The present application provides a computer implemented method and system for unsupervised word image clustering.
[0026] Referring to Fig. 1, a network implementation 100 of a system 102 for unsupervised word image clustering is illustrated, in accordance with an embodiment of the present subject matter. Although the present subject matter is explained considering that the system 102 is implemented on a server, it may be understood that the system 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and the like. In one implementation, the system 102 may be implemented in a cloud-based environment. It will be understood that the system 102 may be accessed by multiple users through one or more user devices 104-1, 104-2…104-N, collectively referred to as user devices 104 hereinafter, or applications residing on the user devices 104. Examples of the user devices 104 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user devices 104 are communicatively coupled to the system 102 through a network 106.
[0027] In one implementation, the network 106 may be a wireless network, a wired network or a combination thereof. The network 106 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network 106 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[0028] In one embodiment of the invention, referring to Fig. 2, a block diagram illustrating a system (102) for unsupervised word image clustering is disclosed. The system (102) comprises at least one image capture device (200) which is configured to capture one or more image. In an embodiment, at least one of the one or more image comprises at least one word image. The system (102) further comprises a convolution module (210) which is configured to apply convolution on the at least one input image using a first plurality of filters. In an embodiment, the first plurality of filters are initialized by random filter based deep learning techniques. In an aspect of the present invention, a Gaussian random variable with zero mean and unit standard deviation is used for initialization of the filters. Further, the first plurality of filters are constrained to sum to zero, which enables extraction of edge features from the one or more images. In an aspect, the convolution filters may be constrained to sum to zero as per equation (1).
Σi=1..M Σj=1..N f(i, j) = 0 (1)
[0029] Referring to equation (1), f denotes a random convolution filter, and M and N denote the width and height of the filter respectively.
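A minimal sketch of the zero-sum random filter initialization of equation (1) is given below. The specification does not fix the exact mechanism for enforcing the constraint; subtracting the filter mean is assumed here as one simple way to make a Gaussian filter sum to zero.

```python
import numpy as np

def random_zero_sum_filter(M, N, rng=None):
    """Draw an M x N filter from a Gaussian random variable with
    zero mean and unit standard deviation, then constrain its
    entries to sum to zero (equation (1)) by mean subtraction."""
    rng = np.random.default_rng() if rng is None else rng
    f = rng.standard_normal((M, N))
    return f - f.mean()   # entries now sum to ~0 (float precision)
```

Because the mean of a zero-sum filter's response over a constant region is zero, such filters respond only to intensity changes, which is consistent with the edge-feature behaviour described above.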
[0030] In an embodiment, the output of the convolution module (210) after application of a first convolution comprises a first plurality of feature maps. In an embodiment the convolution module (210) is further configured to apply a Rectified Linear Unit (ReLU) non-linearity to the first plurality of feature maps as per equation (2).
Wi = Ri max(0, Fi) (2)
[0031] Referring to equation (2), Fi denotes the ith feature map and Ri denotes the gain coefficient associated with it. In one aspect, the gain coefficients are chosen from a uniform random variable in the range (0, 1).
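The gated non-linearity of equation (2) can be sketched as follows; the per-map gains Ri are drawn uniformly from (0, 1) as stated above.

```python
import numpy as np

def relu_with_gain(feature_maps, rng=None):
    """Apply Wi = Ri * max(0, Fi) from equation (2): a ReLU
    followed by a random per-map gain Ri drawn from U(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    gains = rng.uniform(0.0, 1.0, size=len(feature_maps))
    return [r * np.maximum(0.0, f)
            for r, f in zip(gains, feature_maps)]
```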
[0032] The system (102) further comprises a sub-sampling module (212). In one aspect of the disclosed invention, the sub-sampling module (212) is configured to apply a first sub-sampling to increase the position invariance of the first plurality of feature maps. Sub-sampling comprises applying a pooling operation on non-overlapping segments of the first plurality of feature maps after application of the ReLU non-linearity by the convolution module (210). In an embodiment, a block of k x k is replaced by its average or maximum value, which in turn reduces the dimension of the first plurality of feature maps.
[0033] In an embodiment the convolution module (210) may further be configured to apply a second convolution on the output of the first sub-sampling using a second plurality of filters. In an embodiment, the second plurality of filters are initialized by random filter based deep learning techniques. In an aspect of the present invention, a Gaussian random variable with zero mean and unit standard deviation is used for initialization of the filters. Further, the second plurality of filters are constrained to sum to zero, which enables extraction of edge features from the one or more images. In an embodiment, the output of the convolution module (210) after application of the second convolution comprises a second plurality of feature maps. In an embodiment the convolution module (210) is further configured to apply a Rectified Linear Unit (ReLU) non-linearity to the second plurality of feature maps as per equation (2).
[0034] Further the sub-sampling module (212) is configured to apply a second sub-sampling. Sub-sampling is applied to increase the position invariance of the second plurality of feature maps. In an embodiment sub-sampling comprises applying a pooling operation on non-overlapping segments of the second plurality of feature maps after application of the ReLU non-linearity. In an embodiment, a block of k x k is replaced by its average or maximum value, which in turn reduces the dimension of the second plurality of feature maps.
[0035] The system (102) further comprises a combination module (214) configured to combine the plurality of feature maps generated at the output of the second sub-sampling for extracting one or more feature vector. In an embodiment the one or more feature vector is normalized to have a zero mean and unit norm.
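The zero-mean, unit-norm normalization described above can be sketched as follows; combining the feature maps by flattening and concatenation is an assumption, as the specification does not name the exact combination operation.

```python
import numpy as np

def combine_and_normalize(feature_maps):
    """Concatenate flattened feature maps into one feature vector,
    then normalize it to zero mean and unit L2 norm."""
    v = np.concatenate([f.ravel() for f in feature_maps])
    v = v - v.mean()                 # zero mean
    n = np.linalg.norm(v)
    return v / n if n > 0 else v    # unit norm (guard all-zero case)
```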
[0036] Further the system (102) comprises a graph clustering module (216) configured to cluster the one or more word images based on the extracted one or more feature vector. In an embodiment clustering comprises two stages. In a first stage, similar word images are clustered using graph connected component analysis, wherein similarity is determined based on the one or more feature vector extracted corresponding to each of the one or more word image. In a second stage, the remaining word images which are not clustered during the first stage are evaluated against the formed clusters and assigned to a cluster based on the evaluation.
[0037] In one embodiment, during the first stage a normalized cross correlation similarity graph G is generated on the word features as per equation (3).
G = UTU (3)
[0038] Referring to equation (3), U denotes the feature vector matrix where features are combined as columns. In an embodiment, the dimension of U is d x n where d denotes the dimension of the feature vector and n denotes the number of words in the dataset. Further, an adjacency matrix A is obtained by thresholding G with a pre-defined threshold as per equation (4).
A(i, j) = 1, if G(i, j) ≥ tstrong (4)
[0039] Referring to equation (4), tstrong indicates the first similarity threshold value used. Values of G less than tstrong are set to zero. Further, graph connected component analysis is applied on A to find strongly connected points in the data.
[0040] Advantages of this method are that it computes the number of clusters automatically and that it is computationally efficient. In an embodiment, in order to tackle noise in the data and to avoid errors in the initial clusters, tstrong is set to a high value. Thereafter, patterns with strong similarity are chosen in the process. The clusters generated are then used as a reference to carry out assignment for the remaining word images.
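The first-stage procedure of equations (3)-(4) can be sketched as follows: build the similarity graph on column-stacked features, threshold it, and label connected components. A simple breadth-first search is used here for the component analysis; the specification does not name a specific algorithm.

```python
import numpy as np

def first_stage_clusters(U, t_strong):
    """First-stage clustering: G = U^T U (equation (3)) on the
    d x n feature matrix U, thresholded at t_strong to form the
    adjacency matrix A (equation (4)); connected components of A
    become the initial clusters."""
    G = U.T @ U                      # n x n similarity graph
    A = G >= t_strong                # adjacency matrix
    n = G.shape[0]
    labels = -np.ones(n, dtype=int)  # -1 = unvisited
    next_label = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        frontier = [seed]            # BFS over one component
        labels[seed] = next_label
        while frontier:
            i = frontier.pop()
            for j in np.nonzero(A[i])[0]:
                if labels[j] == -1:
                    labels[j] = next_label
                    frontier.append(j)
        next_label += 1
    return next_label, labels
```

The number of clusters falls out of the component count automatically, matching the advantage noted above.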
[0041] In the second stage, a mean vector for each cluster obtained in the first stage is calculated. In an embodiment, horizontal and vertical linear shifts are applied, to each individual feature map of the mean vector. Therefore, from each cluster mean, multiple feature vectors are obtained which are shifted versions of each other.
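The shifted versions of a cluster-mean feature map can be sketched as follows; a circular shift (np.roll) is assumed here as a simple stand-in for the horizontal and vertical linear shifts, whose exact boundary handling the specification does not state.

```python
import numpy as np

def shifted_versions(fmap, max_shift=1):
    """Generate horizontally and vertically shifted copies of a
    cluster-mean feature map, including the unshifted original."""
    shifts = range(-max_shift, max_shift + 1)
    return [np.roll(np.roll(fmap, dy, axis=0), dx, axis=1)
            for dy in shifts for dx in shifts]
```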
[0042] In an aspect, the evaluation of similarity of remaining word images with clustered word images (clusters) is calculated as per equation (5)
Y(m×p) = Xᵀ(m×d) V(d×p) (5)
[0043] Referring to equation (5), X denotes the set of feature vectors obtained with transformations of the mean vectors while every column of V corresponds to the feature vector of an unassigned word image. The dimension of X is d x m where d indicates dimension of the feature vector and m indicates the number of vectors obtained after transformations of the mean vectors. The dimension of V is d x p where p indicates the number of unassigned points. The assignment of a test point is found by maximizing the similarity value across the clusters.
[0044] In an embodiment, a refuse-to-predict analysis is used and a word image is assigned to a cluster if the similarity exceeds a pre-defined threshold as per equation (6).
C(j) = argmax_i Y(i, j), if max_i Y(i, j) ≥ tsoft (6)
[0045] Referring to equation (6), Y(i, j) denotes the similarity of the jth test point with the ith training point. C(j) indicates the cluster label assigned to the jth test point. tsoft indicates a second similarity threshold used for cluster assignment. The value of tsoft is set relatively lower as compared to tstrong.
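The second-stage assignment of equations (5) and (6) can be sketched as follows. The function name, the mapping from mean-vector columns to cluster labels, and the example tsoft value are illustrative assumptions; the -1 "refused" marker is likewise a convention chosen for this sketch.

```python
import numpy as np

def second_stage_assign(X, V, cluster_of_column, t_soft=0.5):
    """Assign unclustered word images to first-stage clusters.

    X : d x m matrix of (shifted) cluster-mean feature vectors.
    V : d x p matrix of feature vectors of unassigned word images.
    cluster_of_column : length-m array mapping each column of X to a cluster.
    Returns length-p labels; -1 marks an image refused by the threshold.
    """
    Y = X.T @ V                            # m x p similarity matrix (equation 5)
    best = Y.argmax(axis=0)                # most similar mean vector per test point
    labels = np.asarray(cluster_of_column)[best]
    labels[Y.max(axis=0) < t_soft] = -1    # refuse-to-predict step (equation 6)
    return labels

X = np.array([[1.0, 0.0], [0.0, 1.0]])   # two cluster means, d = 2
V = np.array([[0.9, 0.1], [0.1, 0.9]])   # two unassigned word images
labels = second_stage_assign(X, V, [0, 1], t_soft=0.5)
# → array([0, 1])
```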
[0046] Referring now to Fig. 3, a flow chart illustrating steps for unsupervised word image clustering in accordance with an embodiment of the disclosed subject matter is shown. The process starts at step 302, where one or more image is captured using at least one image capture device (200), wherein at least one of the one or more image comprises at least one word image. At the step 304, one or more feature vector are extracted, for each of the at least one word image, using an untrained convolution neural network architecture. The steps for extraction of one or more feature vectors will be explained in detail in the following paragraphs. At the step 306, a first stage clustering is performed. In an embodiment, the first stage clustering comprises clustering word images which are similar using a graph connected component analysis, wherein similarity is determined based on the extracted feature vector. At the step 308, a second stage clustering is performed. In an embodiment, the second stage clustering comprises clustering the remaining word images which are not clustered during the first stage by evaluating the remaining images against the clusters formed during the first stage and assigning the remaining word images to the clusters based on the evaluation by using a refuse-to-predict analysis.
[0047] Referring now to Fig. 4 a flowchart illustrating steps for feature extraction during unsupervised word image clustering in accordance with an embodiment of the disclosed subject matter is shown. The process starts at step 402 wherein at least one input image is provided to the system for word image clustering.
[0048] At the step 404, a first convolution is applied, by a convolution module (210), to the at least one input image using a first plurality of filters. In an embodiment the first plurality of filters are initialized by random filter based deep learning techniques using Gaussian random variable with zero mean and unit standard deviation. In another embodiment the plurality of filters are constrained to sum to zero. Further in an embodiment a Rectified Linear Unit (ReLU) non-linearity is applied to a first plurality of feature maps, wherein the first plurality of feature maps is generated as output of the first convolution.
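The filter initialization described in step 404 can be sketched as below. The zero-sum constraint is enforced here by subtracting each filter's mean, which is one straightforward way to satisfy the stated condition; the function name and filter shapes are illustrative assumptions.

```python
import numpy as np

def random_zero_sum_filters(num_filters, k, seed=None):
    """Initialise k x k convolution filters from a Gaussian with zero mean
    and unit standard deviation, then constrain each filter to sum to zero
    by subtracting its per-filter mean."""
    rng = np.random.default_rng(seed)
    f = rng.standard_normal((num_filters, k, k))   # N(0, 1) entries
    f -= f.mean(axis=(1, 2), keepdims=True)        # enforce zero sum per filter
    return f

filters = random_zero_sum_filters(8, 5, seed=0)
# each of the eight 5x5 filters now sums to (numerically) zero
```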
[0049] At the step 406, a first sub-sampling is applied, by a sub-sampling module (212), wherein the sub-sampling increases the position invariance of the first plurality of feature maps. In an embodiment sub-sampling comprises applying a pooling operation on non-overlapping segments of the first plurality of feature maps after application of the ReLU non-linearity.
[0050] At the step 408, a second convolution is applied, by the convolution module (210), on the output of the first sub-sampling using a second plurality of filters. In an embodiment the second plurality of filters are initialized by random filter based deep learning techniques using Gaussian random variable with zero mean and unit standard deviation. In another embodiment the plurality of filters are constrained to sum to zero. Further in yet another embodiment a Rectified Linear Unit (ReLU) non-linearity is applied to a second plurality of feature maps, wherein the second plurality of feature maps is generated as output of the second convolution.
[0051] At the step 410, a second sub-sampling is applied, by the sub-sampling module (212), wherein sub-sampling increases the position invariance of the second plurality of feature maps. In an embodiment, sub-sampling comprises applying a pooling operation on non-overlapping segments of the second plurality of feature maps after application of the ReLU non-linearity.
[0052] At the step 412, the plurality of feature maps generated at the output of the second subsampling are combined by a combination module (214), for extracting one or more feature vector.
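The feature-extraction pipeline of steps 404 to 412 can be sketched end to end as follows. This is a minimal single-channel sketch under stated assumptions: filter sizes, the 2x2 max-pooling choice, and the concatenation used for the combination step are illustrative; a full implementation would convolve across all input channels of each stage.

```python
import numpy as np
from scipy.signal import convolve2d

def conv_relu_pool(img, filters):
    """One stage of convolution + ReLU + 2x2 non-overlapping max-pooling
    (steps 404-406 and 408-410)."""
    maps = []
    for f in filters:
        fm = np.maximum(convolve2d(img, f, mode='valid'), 0)       # conv + ReLU
        h, w = fm.shape[0] // 2 * 2, fm.shape[1] // 2 * 2          # crop to even
        fm = fm[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))  # pool
        maps.append(fm)
    return maps

def extract_feature_vector(img, filters1, filters2):
    """Two-stage untrained CNN feature extraction; the final concatenation
    stands in for the combination module (step 412)."""
    stage1 = conv_relu_pool(img, filters1)
    stage2 = [m for s in stage1 for m in conv_relu_pool(s, filters2)]
    return np.concatenate([m.ravel() for m in stage2])

rng = np.random.default_rng(0)
f1 = rng.standard_normal((4, 5, 5)); f1 -= f1.mean(axis=(1, 2), keepdims=True)
f2 = rng.standard_normal((4, 3, 3)); f2 -= f2.mean(axis=(1, 2), keepdims=True)
vec = extract_feature_vector(rng.standard_normal((32, 64)), f1, f2)
# a 32x64 word image yields a fixed-length feature vector
```

Because the filters are random rather than learned, this network requires no training data, which is what makes the overall clustering pipeline unsupervised.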