Abstract: In recent years, trillions of images have been shared on social media platforms every day. The sheer scale and diversity of this content make searching complex, and identifying relevant objects or images becomes nearly impossible owing to varying levels of redundancy. High-level image visuals are typically represented as feature vectors of numerical values and therefore lack a semantic representation of image features. Content Based Image Retrieval (CBIR) differentiates images based on low-level factors such as shape, spatial layout, color, and texture. Existing CBIR approaches use classic similarity measures that focus on extraction results but pay little attention to computation time and computational complexity. A CBIR system therefore needs effective and efficient image retrieval with minimal time and computational complexity. The use of deep learning techniques in CBIR is expanding rapidly. Image recognition relies on shape, features, and tags for identification; image tagging is employed in our invention to facilitate the identification of objects. The Geon similarity model is utilized to extract the closest matches among images through precise and efficient computational techniques. In our invention, an enhanced Grey Wolf Optimization (GWO) algorithm is used as the hashing technique, and a novel Convolutional Neural Network (CNN) model, specifically ResNet-50, is used as the classifier. 4 Claims & 2 Figures
Description:
Field of Invention
The huge volume of resources residing on the internet can be accessed and preserved by everyone. The extensive growth and development of digital technologies has led to an endless rise in the creation and storage of visual data. Content Based Image Retrieval (CBIR) constitutes a platform that overcomes these issues, since retrieval depends on visual examination of the data present in the query image. Deep learning techniques exhibit a high performance level and extract content from the data more effectively in the process. Deep learning constitutes one of the classification approaches of the soft computing paradigm and retrieves data from millions of distributed images.
Background of the Invention
The analysis of multimedia content is quite important for understanding the semantic meaning of multimedia documents. The performance of multimedia content analysis varies across real-world computer vision applications. A significant part of multimedia data is digital images. Over a trillion images are shared every day on Facebook, Instagram, and other social media platforms such as Twitter. The computer vision research community therefore finds it difficult to describe images and to identify relevant or closely related images from large image collections. Most search engines still rely on outdated text-based approaches that depend on metadata, keywords, and captions to retrieve images (US10346677B2). There is therefore a large discrepancy between how research represents picture elements and how humans understand visual content. As a result, there is extensive modern study of image categorization, analysis, and content-based image retrieval. In CBIR, feature vectors, one-dimensional matrices of numerical values, are used to mark high-level image views.
A study on CBIR using hybrid features and distance metrics showed that hybrid features were integrated with different feature descriptors comprising spatial features, the frequency domain, and the Gabor Wavelet Transform. Further, to increase the efficiency of the retrieval system, Binarized Statistical Image Features (BSIF) and Color and Edge Directivity Descriptors (CEDD) were used. Features were retrieved with the help of BSIF, CEDD, the HSV color histogram, and color moments, respectively. The features retrieved with the HSV histogram were found to involve color quantization and color space conversion (US10271098B2). Feature extraction using BSIF involves converting an RGB image to grayscale and selecting a patch from the grayscale image. Experiments performed on the introduced methodology demonstrated its efficiency in comparison with existing methodologies.
Content-based image retrieval techniques make it possible to find images in a dataset that are comparable to the query image. Search engines such as Google rely heavily on the well-known CBIR paradigm. The NUS-WIDE dataset includes low-level features, ground truth, tags, concepts, image lists, and image URLs; to retrieve images, this research uses a modified CNN. A median filter is used to preprocess the images. Feature extraction is the main component of retrieving images from content-based databases. After preliminary processing of the images, redundancy is calculated and duplicates are removed.
An image retagging schema utilizing a collaborative tag propagation mechanism begins its workflow with user-provided tags, which tend to be imprecise and incomplete. It then learns a tag-specific visual sub-vocabulary for the individual tags provided by the users (US10929671B2). On completing this procedure, a tag-specific image similarity graph, together with multiple tag-specific similarity graphs, is constructed. The estimation expenditure is minimized by partitioning the tag similarity graph into subgraphs using normalized-cut algorithms. The resulting retagging outcomes illustrate an improvement in tag quality. Yet the technique fails to consider the correlation among the labels concerned. For evaluating the introduced method, various experiments were conducted on the PRW, MARS, Market-1501, and CUHK03 databases.
Image retrieval has been considered an essential task as growing image collections and stored data have increased the size of memory disks in image processing. Sezavar et al. (2019) introduced a robust methodology for content-based image extraction using a combination of a convolutional neural network and sparse representation. Like any extraction algorithm, this model comprises two main procedures, namely feature extraction and online retrieval. The offline training procedure comprises fine-tuning the network parameters for the individual database and computing the feature matrix, while the online procedure receives a query image from the user and returns the nearest images using a combination of sparse representation and the CNN. For retrieving the features, a pre-trained AlexNet is used for training on the datasets. On completion of the training procedure, the filter parameters are trained appropriately for classifying the images into their respective classes. A feature vector is estimated for each image in the database, producing a bank of feature vectors called the feature matrix. The feature vector of a particular query image is then retrieved from the final layer of the CNN. One of the simplest methodologies for identifying the images nearest to the query image is Euclidean distance. Here, instead of estimating the distances between the images, a sparse representation technique can be adopted for extracting the images with similar features. Experimental outcomes on the Corel, ALOI, and MPEG-7 datasets illustrate the superior performance of the introduced system in terms of accuracy and speed, in comparison with other existing techniques.
Summary of the Invention
Content Based Image Retrieval is a procedure that uses tagging, recognition, and retrieval methods to gather the required images. Applying the proposed techniques to the NUS-WIDE dataset yields strong results in terms of recall, accuracy, precision, and Mean Average Precision compared with various existing methods. The methods include the Geon similarity model for feature extraction, modified Grey Wolf Optimization for feature selection, and an enhanced CNN classifier. Retrieval and similarity are examined by calculating the Wasserstein distance, as sketched below. To recover the images, the query is compared with the source image databases using minimum threshold values to guarantee high similarity and accuracy in the images. Images from the real-time dataset can be successfully retrieved using the deep learning methods employed in this study. The approach improves the identification rate, meets user needs with minimal response time, and works well with higher-dimensional datasets.
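As an illustration of the similarity check, SciPy's one-dimensional Wasserstein distance can compare two feature vectors treated as value distributions. This is a minimal sketch only: the 634-dimensional vectors and the threshold value are illustrative assumptions, since the summary does not fix the exact formulation.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(7)
query_feats = rng.random(634)     # stand-in for a query feature vector
db_feats = rng.random(634)        # stand-in for a database feature vector

score = wasserstein_distance(query_feats, db_feats)
THRESHOLD = 0.05                  # hypothetical minimum-threshold value
is_match = score <= THRESHOLD     # lower score -> more similar images
```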
Brief Description of Drawings
Figure 1: Block diagram of the CBIR framework
Figure 2: RGB information of query image after median filtering
Detailed Description of the Invention
Image preprocessing, duplicate image removal, feature extraction, recognition, and image retrieval are the stages of the technique used in this study. The images used in the research are from the NUS-WIDE dataset. As seen in Figure 1, the novel convolutional neural network method retrieves the output image with a high recognition rate and accuracy through the use of improved hashing algorithms.
Preprocessing includes resizing and noise removal. During the resizing phase, images of different dimensions (203×240, 240×180, etc.) are resized to 256×256. During the noise removal phase, a median filter is used to increase the clarity of the output images. The median filter outperforms the other filters considered because it is robust and less affected by extreme values; the median is unaffected by outlying neighborhood values. The median filter uses a windowing-based method: the pixel values within each window are sorted, and the processed pixel is replaced by the median of the sorted values. Median filtering is a non-linear method used to reduce impulsive noise; it helps keep the edges of the images intact while reducing random noise, which may be introduced by irregularities in data transmission. As the median filter window advances over the image, the median intensity value within the window determines the output intensity of the processed pixel. For example, assume the processed pixel has a value of 55 and the window pixels have values 5, 6, 55, 10, and 5. Sorting gives 5, 5, 6, 10, 55, so the median is 6, and the processed pixel's value of 55 is replaced by 6.
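For illustration, a minimal sketch of the windowed median filtering described above, assuming NumPy; the window size and the edge-padding strategy are illustrative choices not specified in the description.

```python
import numpy as np

def median_filter(image: np.ndarray, window: int = 3) -> np.ndarray:
    """Slide a window over the image and replace each pixel by the
    median of its neighborhood, suppressing impulsive noise while
    keeping edges largely intact."""
    pad = window // 2
    padded = np.pad(image, pad, mode="edge")  # edge padding is an assumption
    out = np.empty_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.median(padded[i:i + window, j:j + window])
    return out

# Worked example from the description: the window values 5, 6, 55, 10, 5
# sort to 5, 5, 6, 10, 55, so the noisy center value 55 is replaced by 6.
print(np.median([5, 6, 55, 10, 5]))  # 6.0
```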
The image is represented as three channels, each of 8 bits, in the RGB color space (Red = 8 bits, Green = 8 bits, Blue = 8 bits), as shown in Figure 2. Duplicated images in the processed dataset are removed. The dimensions of each image are 256 (width) × 256 (height) × 3 (channels); therefore, 65,536 pixels are used to represent an image. The similarity of the images is computed by comparing two images from the query and source image databases and generating a similarity value: the lower the output, the more similar the two images, and a result of 0 indicates that the two images are identical. The similarity computation in this context employs Complete Similarity (CS), Type Similarity (TS), Hypernym-Hyponym similarity (HH), and the newly developed Geon similarity (NS). Complete Similarity builds a correlation between concepts that share common characteristics; for example, the United States of America and the U.S.A. denote the identical geographical place. Type similarity refers to the correlation between two concepts that possess distinct features of the same item; for example, goldfish and angelfish are both classified as fish, although they possess distinct traits. A hypernym-hyponym relationship is a hierarchical connection indicating that object X belongs to the category of thing Y; for example, the river Nile is a specific instance of a geographical place, namely in Egypt, which provides the contextual meaning for both. Geon similarity is determined by analyzing the collinearity, symmetry, curvature, parallelism, and cotermination of the boundaries and edges of the images. In this context, the Geon similarity principle is utilized to determine the arrangement of a select number of geons from a limited collection that is capable of distinguishing at least 10 entities from a large pool of thousands. The image passes through the similarity model when it is provided as input for labeling. In labeling, the tags (classes or concepts) are allotted to every image in the database. At the end of this process, imprecise tags are transformed into precise tags for each database image with the help of the similarity models. The precise labeled dataset therefore contains m×n entries, where m is the number of images in the database and n is the number of unique tags (81 classes). The labeled images are then sent to the novel modified CNN.
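The exact similarity score is not formulated in the description. As a minimal sketch under the stated convention (lower is more similar, 0 means identical), a mean absolute pixel difference can serve, with duplicate removal as a thresholded comparison; both the score and the threshold are assumptions.

```python
import numpy as np

def image_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dissimilarity score for two 256x256x3 RGB arrays: 0 means the
    images are identical; lower values mean more similar images."""
    assert a.shape == b.shape == (256, 256, 3)
    return float(np.mean(np.abs(a.astype(np.float64) - b.astype(np.float64))))

def remove_duplicates(images, threshold=0.0):
    """Keep an image only if its score against every image kept so far
    exceeds the threshold (0.0 removes exact duplicates only)."""
    kept = []
    for img in images:
        if all(image_similarity(img, k) > threshold for k in kept):
            kept.append(img)
    return kept
```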
Feature extraction is a process that takes a set of computed data and transforms it into derived features that are both non-redundant and relevant, through a series of ordered processes that enhance human interpretation and facilitate learning. The color histogram of an image is a representation of the distribution of colors within the image: in digital photographs it records the number of pixels falling within specific color ranges, spanning all possible colors. Color moments examine the distribution of color characteristics within an image; by comparing the precomputed features of the digital image with those of another image and analyzing the probability distributions, it is feasible to detect and extract information about the similarities between the two images. The color correlogram provides a precise description of how the spatial correlation of color pairs evolves with distance; color histograms and moments capture only the color distribution and provide no information about spatial correlation. The edge direction histogram is used to determine the distribution of boundary points in each direction; the calculation involves counting the pixels in each direction specified by the user, and it is utilized to ascertain the accurate form and arrangement of the image by delineating its borders. The wavelet transform is applied to the high-frequency content of a digital image to identify the appropriate frequency range based on the signal's qualities and characteristics, and it relies on time-frequency analysis. The study involves extracting features of several dimensionalities from different image components: 64-dimensional features from the color histogram, 225-dimensional features from color moments, 144-dimensional features from the color correlogram, 73-dimensional features from the edge direction histogram, and 128-dimensional features from the wavelet texture. Consequently, a total of 634-dimensional characteristics is taken from each image in the database and utilized as input for the feature optimization process.
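As an illustration of one of the five descriptors, here is a sketch of a 64-dimensional color histogram, assuming 4 quantization levels per RGB channel (4×4×4 = 64 bins); the description does not fix the binning, so this split is an assumption.

```python
import numpy as np

def color_histogram_64(image: np.ndarray) -> np.ndarray:
    """64-D color histogram: quantize each 8-bit RGB channel to 4 levels,
    giving 4*4*4 = 64 color bins, then count the pixels per bin."""
    q = image.astype(np.uint16) // 64                    # each channel -> 0..3
    codes = q[..., 0] * 16 + q[..., 1] * 4 + q[..., 2]   # bin index 0..63
    hist = np.bincount(codes.ravel(), minlength=64).astype(np.float64)
    return hist / hist.sum()                             # normalize

# Concatenating the five blocks gives the full descriptor:
# 64 + 225 (moments) + 144 (correlogram) + 73 (edge) + 128 (wavelet) = 634
```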
The hashing technique is employed to create an index for multi-dimensional data, specifically multimedia material such as films and photographs. The research uses Grey Wolf Optimization (GWO) to accurately identify the most relevant aspects of the image. The objective value used for the dataset images is the hash function. The GWO algorithm is used to optimize the extracted feature images, which contain valuable information, by calculating their lower and upper bounds. To attain a high level of accuracy in image retrieval, the optimization procedure is iterated 500 times. The coefficient parameters are computed; each coefficient parameter is compared with the derived fitness value, and the scores and positions of the wolves are updated at each iteration. The GWO algorithm was originally designed to address continuous optimization problems: each wolf adjusts its position inside a search domain that consists of real-valued variables and is bounded by the constraints of the individual problem. Hence, to employ the GWO effectively here, it is necessary to develop a method for converting real values into binary space; various techniques can be employed at different phases of the optimization process to convert from continuous to binary representation, one of which is sketched below.
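One common continuous-to-binary conversion, sketched here as an assumption since the description does not name a specific transfer function, is the S-shaped (sigmoid) mapping used in binary metaheuristics.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_binary(position: np.ndarray) -> np.ndarray:
    """S-shaped transfer: squash each real-valued coordinate through a
    sigmoid and sample a 0/1 mask, so each bit marks a selected feature."""
    prob = 1.0 / (1.0 + np.exp(-position))
    return (rng.random(position.shape) < prob).astype(np.uint8)

mask = to_binary(rng.normal(size=634))  # real-valued wolf -> 0/1 feature mask
selected = np.flatnonzero(mask)         # indices of the features retained
```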
The proposed feature optimization technique is founded on the Grey Wolf Optimization (GWO) algorithm. During the hunting process, the alpha members of the pack identify prey, after which the pack proceeds to follow, pursue, and ultimately approach it. Next, they surround and intimidate the prey until it comes to a stop. The wolves steadily advance towards the prey until they reach close proximity, at which point the dominant alpha (a) and subordinate beta (b) wolves initiate the attack; the deltas and omegas remain in a state of anticipation. If the prey manages to evade capture, the predators adjust their positions based on the prey's updated location. As a result of the social hierarchy among wolves, the top three wolves (a, b, and d) are expected to possess greater knowledge of the whereabouts of the prey than the omegas; thus, all wolves adjust their positions in accordance with the positions of a, b, and d. By employing iterative surrounding and hunting steps, the optimal position of the target (prey) can be effectively pinpointed. During this stage, the 634-dimensional features are refined into 457-dimensional characteristics for each image; these features are then utilized as input for the improved CNN ResNet-50 classifier.
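A minimal sketch of the canonical GWO position update (Mirjalili et al.), in which every wolf moves toward the mean of the positions proposed by the alpha, beta, and delta; the toy sphere fitness below stands in for the hash-based objective, which the description does not formulate explicitly.

```python
import numpy as np

rng = np.random.default_rng(42)

def gwo_step(wolves: np.ndarray, fitness: np.ndarray, a: float) -> np.ndarray:
    """One canonical Grey Wolf iteration: rank wolves by fitness, take the
    three best as alpha/beta/delta, and move every wolf toward the mean
    of the positions those three leaders suggest."""
    order = np.argsort(fitness)          # lower fitness = better wolf
    leaders = wolves[order[:3]]          # alpha, beta, delta positions
    new = np.zeros_like(wolves)
    for k in range(3):
        r1, r2 = rng.random(wolves.shape), rng.random(wolves.shape)
        A = 2 * a * r1 - a               # exploration/exploitation coefficient
        C = 2 * r2
        D = np.abs(C * leaders[k] - wolves)
        new += leaders[k] - A * D
    return new / 3.0

# 500 iterations, as in the description; `a` decays linearly from 2 to 0.
wolves = rng.uniform(-1.0, 1.0, size=(20, 634))
sphere = lambda w: float(np.sum(w ** 2))   # placeholder fitness only
for t in range(500):
    fit = np.array([sphere(w) for w in wolves])
    wolves = gwo_step(wolves, fit, a=2.0 * (1 - t / 500))
```

Binarizing the best wolf's final position (for example with the transfer function sketched earlier) yields the selected feature subset, corresponding to the 457 retained dimensions described above.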
As a result, the proposed modified CNN combines features from the labeling stage and the hash-based GWO stage to perform hash learning in the latent space, with code lengths of 16, 24, 32, 48, 64, and 128 bits. In this work, 5,000 images are randomly chosen from the training dataset (150,679 images) in order to learn hash codes and thereby improve the discriminative ability of the classifier. During the learning process, every image is represented by a unique binary hash code.
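A minimal PyTorch sketch of a ResNet-50 classifier with a hash head of the listed code lengths; the head design (tanh during training, sign thresholding for retrieval) and the training loss are assumptions, as the description does not specify them.

```python
import torch
import torch.nn as nn
from torchvision import models

class HashResNet50(nn.Module):
    """ResNet-50 backbone whose 2048-D pooled feature is projected to
    `bits` dimensions, squashed with tanh for training, and thresholded
    to a binary code at retrieval time; 81 classes matches NUS-WIDE."""
    def __init__(self, bits: int = 48, num_classes: int = 81):
        super().__init__()
        backbone = models.resnet50(weights=None)
        backbone.fc = nn.Identity()            # expose the 2048-D feature
        self.backbone = backbone
        self.hash_head = nn.Linear(2048, bits)
        self.classifier = nn.Linear(bits, num_classes)

    def forward(self, x: torch.Tensor):
        h = torch.tanh(self.hash_head(self.backbone(x)))
        return h, self.classifier(h)

    @torch.no_grad()
    def hash_code(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.forward(x)
        return (h > 0).to(torch.uint8)         # binary hash code

model = HashResNet50(bits=48)                  # any of 16/24/32/48/64/128 bits
codes = model.hash_code(torch.randn(2, 3, 256, 256))  # -> shape (2, 48)
```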
Claims: The scope of the invention is defined by the following claims:
1. A System/Method for Recognition of Content-Based Images using Enhanced CNN Approach comprising the steps of:
a) A median filter is used to increase the clarity of the output images; the median filter outperforms the other filters considered because it is robust and less affected by extreme values, and the median is unaffected by outlying neighborhood values.
b) The image is represented as three channels, each of 8 bits, in the RGB color space (Red = 8 bits, Green = 8 bits, Blue = 8 bits); duplicated images in the processed dataset are removed. The dimensions of the image are 256 (width) × 256 (height) × 3 (channels); therefore, 65,536 pixels are used to represent an image. The similarity of the images is computed by comparing two images from the query and source image databases and generating a similarity value, where a lower output indicates that the two images are more similar.
c) A classifier is designed to improve the accuracy of the recognition system, and the hashing technique is employed to create an index for multi-dimensional data, specifically multimedia material such as films and photographs.
2. A System/Method for Recognition of Content-Based Images using Enhanced CNN Approach as claimed in claim 1, wherein the Geon similarity is applied to complex objects in order to identify visual similarities between objects.
3. A System/Method for Recognition of Content-Based Images using Enhanced CNN Approach as claimed in claim 1, wherein the Grey Wolf Optimization technique is designed to select highly relevant image features using a hash-based approach.
4. A System/Method for Recognition of Content-Based Images using Enhanced CNN Approach as claimed in claim 1, wherein the enhanced convolutional neural network is used to increase the classification performance.