Specification
FIELD OF INVENTION
[0001] This invention relates to image search and comparison. Several use case examples are provided from the field of intellectual property, viz. trademarks and designs, and further for image search as it applies to trademarks, designs, and copyrights. However, the embodiments described herein may be used elsewhere, such as in machine vision and artificial intelligence as applied to pattern recognition.
BACKGROUND
[0002] Trademarks play an important role in trade, commerce and industry. A trademark may be a device / logo of a company, and signifies the source of products or services in the market. Trademarks may also be associated with the quality or kind of service associated with a source.
[0003] It is customary across jurisdictions to apply for and obtain a trademark. Attorneys, applicants, users and proprietors of the trademark system usually conduct a search for existing marks in the jurisdiction-specific trademark journal before filing for one. A text search is easier as compared to a logo / device search.
[0004] Once a trademark is granted, it is the duty of the trademark owner to police the mark and weed out infringers thereof by conducting searches of granted marks or marks applied for. This search again is easier for word marks, and is difficult for device / logo marks.
[0005] The trademark office, while considering whether to grant a trademark applied for, also conducts an internal search. This requires comparison with existing applications / granted trademarks. Once again, this is easier for text marks but difficult for device / logo marks.
[0006] With the growth of economies, the trademark system has become widely adopted. This has led to computer / machine based searches to an extent, but there are very different methods by which trademark offices and / or users search for and compare device / logo marks.
[0007] There have been attempts globally to have a classification system for device marks (the International Classification of Figurative Marks), commonly known as the Vienna Codification. But this system is processed manually in certain jurisdictions, while others make use of computers / machines with a human operator. Regardless, there is a significant backlog at trademark offices for the examination of device trademark / design applications. Trademark examiners and supervisors face an ever-increasing backlog because of these issues.
[0008] For example, general notes to the International Classification of the Figurative Elements of Marks (hereafter “Vienna codification”) provide that: (a) Figurative elements should be placed in the different categories, divisions and sections on the basis of their shape, regardless of their material composition or the purpose of the object in which they are incorporated. Consequently, toys in the form of dolls, animals and vehicles are placed in the categories for human beings, animals and vehicles, respectively. Similarly, persons, animals or objects of any kind represented in pictures or sculptures, for instance, are placed in the categories for human beings, animals, or the objects concerned. If the pictures or sculptures are widely known and famous, they should also be placed in the division provided for that purpose (division 22.5). (b) The representation of an object forming part of another object should be classified under the same category, division and section as the object of which it forms part, unless expressly classified in another category, division and section. Thus, the bodywork of a motor vehicle is placed in sections 18.1.7 and 18.1.9, as are motor vehicles, whereas the tyres, wheels or steering wheel of such a vehicle are placed in section 18.1.21, expressly intended for these parts of motor vehicles. (c) If a figurative element is presented in such a way that it is not possible to determine clearly whether it belongs to a given division or section, it should be placed in both divisions or sections, unless there is a special note to the contrary. If, for example, the representation of a human being is not such that it can be clearly determined whether it is a man or a woman, it is placed in both of the divisions 2.1 and 2.3. (d) The representation of an object or of a living creature similar to that of an object or of a living creature
mentioned in the text of a given section is placed in that section even if it is not expressly mentioned there or in any other section. (e) It is understood that, if a mark comprises several figurative elements, each of which having its own distinctive characteristic and being classified in a different category, division and section, these figurative elements should be placed in the appropriate different categories, divisions and sections. Thus, the label of a bottle comprising the representation of a castle and a characteristic form of writing is placed in the appropriate divisions and sections of categories 7 and 27; in the same way, the representation of a man in uniform, on horseback and playing a trumpet, is placed in the three sections 2.1.2, 2.1.20 and 2.1.9.
[0009] As one can see, this system requires regular man–machine interaction for classification.
[0010] Hence there is a need for a simple machine-implemented system to compare an application with existing trademarks / logos that helps reduce operator time. There is also a need to machine-classify device marks according to the Vienna codification with either no human aid or minimal aid. There is also a need for a system that can quickly sort through and compare an application with existing marks, as the global trademark portfolio has grown beyond 100 million trademarks.
SUMMARY
[0011] Accordingly, it is an object of the present invention to provide a trademark search system that can be used to identify and compare existing device marks / logos quickly and accurately.
[0012] It is a general object of this invention to provide a system-agnostic, browser-based tool to compare and identify / classify images. Another object is to provide a methodology where the identification system can be used with multiple connected databases, such as the trademark journals of multiple jurisdictions, or commercial databases of companies, entities and the like.
[0013] Another object is to provide renditions of images that help in the analysis of images.
[0014] Other objects, features, and advantages of this invention will become apparent from the following detailed description.
BRIEF DESCRIPTION OF DRAWINGS
[0015] A more detailed understanding may be had from the following description, given by way of example in conjunction with the accompanying drawings wherein:
[0016] FIG. 1A illustrates a system for analyzing images under an embodiment of the invention;
[0017] FIG. 1B describes a flow chart of the process used in the system described in FIG. 1A;
[0018] FIG. 2A – 2C represent obtaining rotational variants of an image;
[0019] FIG. 2D represents obtaining and extracting key elements;
[0020] FIG. 3A – 3B represent cleaning of an image;
[0021] FIG. 4A – 4B depict determination of an image's background;
[0022] FIG. 5 represents the algorithm as applied for image re-sizing;
[0023] FIG. 6 – 7 represent the images with padding and after application of the algorithm in FIG. 5;
[0024] FIG. 8 represents extraction of Zernike moments for a normalized image;
[0025] FIG. 9 – 10 represent extraction of structural similarity between images;
[0026] FIG. 11 – 13 represent extraction of various other vectors from the images using Equations 2 – 4; and
[0027] FIG. 14A – 14D represent the results of feature extraction and comparison for different shape forms.
DETAILED DESCRIPTION
[0028] Embodiments are described herein of a system for creating a data collection of recognized logos / device marks from a trademark database. The system includes an image analysis module that is configured to programmatically analyze individual images and store each image data analysis in a database. Any new image is analyzed according to an algorithm and compared with the collection of images / image data to determine closeness with existing images. The system may also include a manual interface that is configured to (i) interface with one or more human operators, and (ii) display a plurality of panels concurrently.
[0029] Individual panels may be provided for one or more analyzed images, and individual panels may be configured to display information that is at least indicative of the one or more images of that panel and / or of the information determined from the one or more images.
[0030] Additionally, the manual interface enables the one or more human editors to view the plurality of panels concurrently and to interact with each of the plurality of panels to classify images from a panel according to the Vienna Codification system used to classify device marks.
[0031] One or more embodiments enable image analysis of content items that include images, logos, and combinations of text and images.
[0032] As used herein, the term “image data” is intended to mean data that corresponds to or is based on discrete portions of a captured image. For example, with digital images, such as those provided in a JPEG format, the image data may correspond to data or information about pixels that form the image, or data or information determined from pixels of the image. Another example of “image data” is the signature or other non-textual data that represents a classification or identity of an object, as well as a global or local feature.
[0033] The terms recognize, or recognition, or variants thereof, in the context of an image or image data (e.g. recognize an image), mean that a determination is made as to what the image correlates to, represents, identifies, means, and/or the context provided by the image. Recognition does not mean a determination of identity by name, unless stated so expressly, as name identification may require an additional step of correlation.
[0034] As used herein, the terms programmatic, programmatically or variations thereof mean through execution of code, programming or other logic. A programmatic action may be performed with software, firmware or hardware, and generally without user intervention, albeit not necessarily automatically, as the action may be manually triggered. For example, a user may upload an image to compare it with existing images in the database to determine closeness to existing trademarks or applications. Another user may upload an image to categorize it according to the Vienna codification of a matching existing image.
[0035] The embodiments described herein may be implemented using programmatic elements, often referred to as modules or components, although other names may be used. These programmatic elements may include a program, a subroutine, a portion of a program, or a software component or a hardware component capable of performing one or more stated tasks or functions. As used herein, a module or component can exist on a hardware component independently of other modules/components, or a module/component can be a shared element or process of other modules/components, programs or machines. A module or component may reside on one machine, such as on a client or on a server, or a module/component may be distributed amongst multiple machines, such as on multiple clients or server machines. Any system described may be implemented in whole or in part on a server, or as part of a network service. Alternatively, a system such as described herein may be implemented on a local computer or terminal, in whole or in part. In either case, implementation of a system provided for in this application may require use of memory, processors and network resources (including data ports and signal lines (optical, electrical, etc.)), unless stated otherwise.
[0036] The embodiments described herein generally require the use of computers, including processing and memory resources. For example, systems described herein may be implemented on a server or network service. Such servers may connect to and be used by users over networks such as the Internet, or by a combination of networks, such as cellular networks and the Internet. Alternatively, one or more embodiments described herein may be implemented locally, in whole or in part, on computing machines such as desktops, cellular phones, personal digital assistants or laptop computers. Thus, memory, processing and network resources may all be used in connection with the establishment, use or performance of any embodiment described herein (including with the performance of any method or with the implementation of any system).
[0037] Furthermore, one or more embodiments described herein may be implemented through the use of instructions that are executable by one or more
processors. These instructions may be carried on a computer-readable medium. Machines shown in figures below provide examples of processing resources and computer-readable mediums on which instructions for implementing embodiments of the invention can be carried and / or executed. In particular, the numerous machines shown with embodiments of the invention include processor(s) and various forms of memory for holding data and instructions. Examples of computer-readable mediums include permanent memory storage devices, such as hard drives on personal computers or servers. Other examples of computer storage mediums include portable storage units, such as CD or DVD units, flash memory (such as carried on many cell phones and personal digital assistants (PDAs)), and magnetic memory. Computers, terminals, network enabled devices (e.g. mobile devices such as cell phones) are all examples of machines and devices that utilize processors, memory, and instructions stored on computer-readable mediums.
[0038] FIG. 1A illustrates a system for analyzing images under an embodiment of the invention. A system such as shown and described by an embodiment of FIG. 1A includes applications for enabling search and retrieval, and / or enabling display of programmatically determined information. As described with other embodiments herein, a system such as described with an embodiment of FIG. 1A may be used to enable a look-up of published applications / trademarks and for classification of images according to the Vienna codification. FIG. 1A is a block diagram showing the implementation of the methods described herein at a processing unit.
[0039] FIG. 1A provides a system for image analysis according to the embodiments provided herein. The system of FIG. 1A includes an arrangement 102 for image acquisition and an image processing unit 104, with memory and display units. The image acquisition unit may be connected to an external scanner, may accept user uploads, or may link to an image database.
[0040] The image processing unit 104 is arranged to process each image and compare it with images stored in a database (not shown). The processing unit 104 of FIG. 1A is configured according to the described embodiments.
[0041] The processing unit 104 may comprise modules for analyzing images. The system includes modules in the form of an image segmentizer, a feature extractor, and an analysis data generator.
[0042] In performing various analysis operations, the processing unit may determine and / or use information that is descriptive of or identifiable to objects shown in the images of the content items.
[0043] The processing unit 104 determines image similarity according to the following principles: it quantifies shape similarity between images using a mixture of region-based and contour-based methods supplemented by gestalt principles. The unit 104 classifies regions in an image according to Zernike moments. Zernike moments are orthogonal polynomials and are used to represent properties of an image with no redundancy or overlap of information between them. The unit 104 further processes contours using Fourier descriptors of a concave hull, and applies gestalt principles for determining the contour tree structure and the arrangement of contours with respect to the center of mass of an image. Based on these principles, the processor 104 sorts results obtained from a combination of the contour, region and boundary techniques.
[0044] FIG. 1B describes a flow chart of the process used in the system described in FIG. 1A. Each image / logo / device mark / design uploaded to the system is programmatically scanned for artefacts. The process starts by reducing artefacts in the image that may be present due to scanning / compression differences. The resulting image is then converted to binary. Once converted to binary, the system detects the image background and converts the image into a black-on-white form. The resulting binary image is then padded and the image mass centered.
[0045] The system determines and plots relative positions of individual contours with respect to the center of mass of the image blob. From the plots generated, multiple attributes / descriptors / image properties are extracted. These attributes / image properties include weighted polygons, structural descriptors, Zernike moments, histograms of oriented gradients, the concave hull, and Fourier descriptors.
[0046] The extracted attributes are then merged to obtain specific image / trademark / device / design mark data, which is then stored in an image attribute database. The stored data may be individual attributes / merged attributes for easier comparison and classification.
[0047] The individual elements of the embodiments are now explained in more detail.
[0048] Application of Gestalt Principles: Under gestalt proximity, "objects or shapes that are close to one another appear to form groups". That is, even if the shapes, sizes, and objects are radically different, they will appear as a group if they are close enough. This grouping refers to the way smaller elements are "assembled" in a composition / image. This helps in outlining the collective presence of the set of elements, as it becomes more meaningful than their presence as separate elements. Further, elements which are grouped together create the illusion of shapes or planes in space, even if the elements are not touching.
[0049] Removal of scanning and compression artefacts; scale and position invariance: The image is cleaned using a Gaussian blur filter and related morphological operations. For rotational invariance, the principal axes of the image are calculated and the image is rotated along one of them. To prevent orientation issues (principal axis = principal axis + 180°), the image is oriented in the direction where its bottom half contains more points than its top half. This ensures that all rotational variants of an image are oriented in one particular direction, as shown in FIG. 2A – 2C. The image is also scanned for position invariance and the key elements extracted as shown in FIG. 2D.
[0050] Automatic background detection: After morphological operations and converting the image to binary format, the densities of black and white pixels at the boundaries are determined. Depending on the ratio of their magnitudes and the ratio of the overall density of black and white pixels across the image, it is determined whether the image is black on white or vice versa.
[0051] Image Normalization: After detecting the background of the image, the image is padded with color for normalization. The distance distribution of all the points in the image with respect to the center of mass of the binary image is determined, and the point that is at the farthest distance from the center of mass is identified. To simplify the process, a Canny filter may be applied. A distance profile is then determined, since all the points that lie within a region are at a shorter distance than those on the boundary. Following the determination of the distance of the farthest point from the center of mass, the image is padded with the background color, with the net effect of the center of the image becoming the center of mass of the combined blobs of the image.
[0052] Resizing image: The image may then be resized using nearest-neighbor interpolation to minimally affect the output following Canny filter application.
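By way of a non-limiting illustrative sketch, the resizing step may be expressed in Python with OpenCV; the library choice and the 256 × 256 target size are assumptions made for illustration only.

```python
import cv2

def resize_nearest(img, size=256):
    """Resize with nearest-neighbour interpolation ([0052]) so that binary
    edge pixels are not smoothed away. The square target size is an
    assumption; the embodiments do not fix a particular size."""
    return cv2.resize(img, (size, size), interpolation=cv2.INTER_NEAREST)
```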
[0053] Reducing illustration to a weighted polygon: After applying the Canny filter / transformation, all the independent edges in a given image are determined. For each of the contours in the image, the center of mass of the contour is determined, along with attributes such as area, eccentricity, aspect ratio, etc. The image is reduced to a "weighted polygon" by calculating the relative positions of the center of mass of individual contours with respect to the center of mass of the aggregate contours. This may be used to quantify similarity in terms of the relative arrangement of contours with respect to the center of the image.
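A minimal sketch of this weighted-polygon reduction follows, assuming OpenCV for contour extraction and a black-on-white uint8 input; the text lists area, eccentricity and aspect ratio as example attributes, and weighting by area alone here is an assumption.

```python
import cv2
import numpy as np

def weighted_polygon(binary):
    """Relative positions of each contour's centre of mass with respect to
    the centre of mass of the aggregate contours ([0053])."""
    contours, _ = cv2.findContours(255 - binary, cv2.RETR_LIST,
                                   cv2.CHAIN_APPROX_SIMPLE)
    feats = []
    for c in contours:
        m = cv2.moments(c)
        if m['m00'] > 0:
            feats.append((m['m10'] / m['m00'], m['m01'] / m['m00'],
                          cv2.contourArea(c)))
    feats = np.array(feats)
    # aggregate centre of mass, area-weighted (an assumption)
    center = np.average(feats[:, :2], axis=0, weights=feats[:, 2])
    offsets = feats[:, :2] - center   # vertex offsets from the aggregate centre
    return np.column_stack([offsets, feats[:, 2]])
```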
[0054] Contour Tree: Another metric that is used in conjunction with the weighted polygon is the structure of the contour tree. This helps to quantify the internal arrangements of all the contours in the image with respect to one another. The topological information contained in the image may be represented using a tree structure. Using this representation, the structural similarity between images is determined.
[0055] Zernike moments, Histogram of Oriented Gradients and Fourier Descriptors of the Concave hull of the image are now computed and merged into one large feature vector.
[0056] Similarity Computation: After applying the normalizing operation on the query image, the feature vectors – the combined contour, region and gradient histogram vector and the weighted polygon – are extracted and compared with corresponding vectors in the database. The images that are closest in terms of Minkowski distance with respect to the merged vector are sorted using the weighted polygon similarity and tree similarity results.
[0057] The embodiments are now described specifically. Any image is cleaned using Gaussian blur and morphological operations of kernel sizes (5, 5) and (3, 3) respectively, as shown in FIG. 3A – FIG. 3B.
$$G(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \qquad \ldots \text{Equation 1}$$
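A minimal sketch of this cleaning step in Python with OpenCV (the library is an assumption): the (5, 5) and (3, 3) kernel sizes follow the text, while the use of a morphological opening is an assumption, since the text says only "morphological operations".

```python
import cv2
import numpy as np

def clean_image(img):
    """Reduce scanning / compression artefacts: Gaussian blur with a 5x5
    kernel (the Gaussian of Equation 1) followed by a morphological
    operation with a 3x3 kernel. MORPH_OPEN is an assumption."""
    blurred = cv2.GaussianBlur(img, (5, 5), 0)   # sigma derived from kernel size
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(blurred, cv2.MORPH_OPEN, kernel)
```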
[0058] Calculating the extreme points and automatic background detection: After morphological operations and converting the image to binary format, the densities of black and white pixels at the boundaries are determined. Depending on the ratio of their magnitudes and the ratio of the overall density of black and white pixels across the image, it is determined whether the image is black on white or vice versa, as shown in FIG. 4A – FIG. 4B.
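A sketch of this boundary-density test, assuming a uint8 binary image; the simple majority thresholds are assumptions, since the text specifies only ratios of magnitudes.

```python
import numpy as np

def is_black_on_white(binary):
    """Compare white-pixel density on the image border against the density
    over the whole image ([0058]). The 0.5 thresholds are assumptions."""
    border = np.concatenate([binary[0, :], binary[-1, :],
                             binary[:, 0], binary[:, -1]])
    border_white = np.mean(border == 255)
    overall_white = np.mean(binary == 255)
    # a mostly white border and white-dominated image => black on white
    return border_white > 0.5 and overall_white > 0.5
```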
[0059] Image Normalization: After detecting the background of the image, the image can be padded with the right color for the purposes of normalization. A distance distribution of all the points in the image with respect to the center of mass of the binary image is determined, and the point that is at the farthest distance is obtained. To simplify the operation, a Canny filter may be applied. The distance profile is determined since all the points that lie within a region are at a shorter distance than those on the boundary. The image is padded with the background color, with the net effect of the center of the image becoming the center of mass of the combined blobs of the image, using the algorithm of FIG. 5; the resulting image is shown in FIG. 6 / FIG. 7.
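A sketch of the padding step, assuming a black-on-white uint8 binary image: foreground pixels are re-plotted onto a square canvas whose half-side is the farthest-point distance, so the centre of mass coincides with the canvas centre. The re-plotting approach (rather than border padding) is an implementation assumption.

```python
import numpy as np

def pad_to_center_of_mass(binary, background=255):
    """Pad with the background colour so the centre of mass of the combined
    blobs becomes the centre of the image ([0059], FIG. 5 - 7)."""
    ys, xs = np.nonzero(binary == 0)             # black foreground pixels
    cy, cx = ys.mean(), xs.mean()                # centre of mass
    # distance of the farthest foreground point from the centre of mass
    r = int(np.ceil(np.sqrt(((ys - cy) ** 2 + (xs - cx) ** 2).max())))
    canvas = np.full((2 * r + 1, 2 * r + 1), background, dtype=binary.dtype)
    # re-plot foreground with the centre of mass at the canvas centre
    canvas[np.round(ys - cy + r).astype(int),
           np.round(xs - cx + r).astype(int)] = 0
    return canvas
```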
[0060] For rotational invariance, the principal axes of the image are calculated using Principal Component Analysis and the image is rotated along the major axis. To prevent orientation issues (principal axis ~ principal axis + 180°), the image is oriented in the direction where its bottom half contains more points than its top half. This ensures that all rotational variants of an image are oriented in one particular direction, as shown in FIG. 7.
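A sketch of the PCA-based rotation normalization, assuming SciPy for the rotation; the choice to bring the major axis to the horizontal and the sign convention of the angle are assumptions.

```python
import numpy as np
from scipy import ndimage

def normalize_rotation(binary):
    """Rotate along the major principal axis and resolve the 180-degree
    ambiguity so the bottom half holds more foreground points ([0060])."""
    ys, xs = np.nonzero(binary == 0)
    coords = np.column_stack([xs, ys]).astype(float)
    coords -= coords.mean(axis=0)
    # principal axes = eigenvectors of the coordinate covariance matrix
    _, vecs = np.linalg.eigh(np.cov(coords.T))
    major = vecs[:, -1]                          # largest-eigenvalue axis
    angle = np.degrees(np.arctan2(major[1], major[0]))
    rotated = ndimage.rotate(binary, angle, cval=255, order=0)
    half = rotated.shape[0] // 2
    if np.sum(rotated[:half] == 0) > np.sum(rotated[half:] == 0):
        rotated = rotated[::-1, ::-1]            # rotate by 180 degrees
    return rotated
```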
[0061] Zernike moments for the normalized image are extracted as a 36-dimensional vector using the formula of Equation 2, as shown in FIG. 8.
$$Z_j(\rho, \theta) = \sum_{k=0}^{(n-m)/2} \frac{(-1)^k\,(n-k)!}{k!\left(\frac{n+m}{2}-k\right)!\left(\frac{n-m}{2}-k\right)!}\,\rho^{\,n-2k} \cdot \begin{cases} \cos(m\theta) & m \neq 0,\ j \text{ even} \\ \sin(m\theta) & m \neq 0,\ j \text{ odd} \\ 1 & m = 0 \end{cases} \qquad \ldots \text{Equation 2}$$
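The 36-dimensional vector may be obtained, for instance, with the mahotas library (a library assumption; any implementation of Equation 2 would serve). With mahotas, degree 10 yields exactly 36 moment magnitudes.

```python
import mahotas

def zernike_vector(binary):
    """36-dimensional Zernike moment vector per Equation 2. Taking the
    radius as half the normalized (square) image side is an assumption."""
    foreground = (binary == 0)                   # boolean foreground mask
    radius = binary.shape[0] // 2
    return mahotas.features.zernike_moments(foreground, radius, degree=10)
```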
[0062] Post extraction of Zernike moments, a histogram of oriented gradients (HOG) of the normalized image is obtained by using all / some of the following steps: global image normalization; computing the gradient image in x and y; computing gradient histograms; normalizing across blocks; and flattening the result into a feature vector.
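These steps mirror the hog routine of scikit-image, which may serve as a sketch; the library and the cell / block sizes are assumptions.

```python
from skimage.feature import hog

def hog_vector(image):
    """HOG per [0062]: global normalization, x/y gradients, orientation
    histograms, block normalization, and flattening."""
    return hog(image,
               orientations=9,
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2),
               block_norm='L2-Hys',
               transform_sqrt=True,    # global image normalization
               feature_vector=True)    # flatten into a single vector
```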
[0063] Contour arrangements: Structural similarity between two images is now described in further detail. Contours are extracted from the normalized image using topological structural analysis of digitized binary images by border following. A maximum of eight contours is chosen from these based on their relative sizes. A polygon is obtained by connecting the centers of mass of these islands. This polygon is then converted into a feature vector by plotting its turning function, as depicted in the accompanying figures.
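A sketch of the centroid-polygon turning function, assuming OpenCV for the border-following contour extraction:

```python
import cv2
import numpy as np

def turning_function(binary, max_contours=8):
    """Connect the centres of mass of up to eight of the largest contours
    into a closed polygon and return its turning function ([0063])."""
    contours, _ = cv2.findContours(255 - binary, cv2.RETR_LIST,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)[:max_contours]
    centroids = []
    for c in contours:
        m = cv2.moments(c)
        if m['m00'] > 0:
            centroids.append((m['m10'] / m['m00'], m['m01'] / m['m00']))
    pts = np.array(centroids)
    edges = np.roll(pts, -1, axis=0) - pts       # edges of the closed polygon
    headings = np.arctan2(edges[:, 1], edges[:, 0])
    turns = np.diff(np.concatenate([headings, headings[:1]]))
    return np.mod(turns + np.pi, 2 * np.pi) - np.pi   # wrap to (-pi, pi]
```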
[0064] In another embodiment, structural similarity may be obtained as follows. After applying the Canny transformation to the image matrix, all independent edges in a given image are determined. For each of the contours in the image, the center of mass of the contour is independently determined, and attributes such as area, eccentricity, aspect ratio, etc. are determined. The image is reduced to a “weighted polygon” by determining the relative positions of the center of mass of individual contours with respect to the center of mass of the aggregate contours. These parameters so determined may be used to quantify similarity in terms of the relative arrangement of contours with respect to the center of the image.
....Equation 3
[0065] Contour Tree: Another metric that is used in conjunction with the weighted polygon is the structure of the contour tree. The contour tree is used to quantify the internal arrangements of all the contours in the image with respect to one another. The topological information contained in the image may also be represented using a tree structure. Using this representation, structural similarity between images may be determined, as shown in FIG. 9 – 10.
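A sketch of the contour-tree construction, assuming OpenCV's RETR_TREE hierarchy output:

```python
import cv2

def contour_tree(binary):
    """Nesting tree of contours ([0065]). Each hierarchy row holds
    [next, previous, first_child, parent]; a parent -> children map is
    built, with parent index -1 marking root contours."""
    _, hierarchy = cv2.findContours(255 - binary, cv2.RETR_TREE,
                                    cv2.CHAIN_APPROX_SIMPLE)
    tree = {}
    for i, (_, _, _, parent) in enumerate(hierarchy[0]):
        tree.setdefault(int(parent), []).append(i)
    return tree
```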
[0066] Fourier Descriptors of the Concave Hull: Outlines of the concave hull of the image matrix, as well as the largest contour, are computed and described using Fourier descriptors. These are further flattened and merged to arrive at a composite shape descriptor.
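A sketch of the Fourier descriptor computation for one closed outline; the concave hull itself (e.g. via alpha shapes) is assumed to be computed upstream, and the 16-coefficient truncation is an assumption.

```python
import numpy as np

def fourier_descriptors(outline, n_coeffs=16):
    """Fourier descriptors of an (N, 2) closed outline ([0066]).
    Magnitudes are kept for rotation / starting-point invariance and
    divided by the first harmonic for scale invariance."""
    z = outline[:, 0] + 1j * outline[:, 1]   # points as complex numbers
    mags = np.abs(np.fft.fft(z))
    return mags[1:n_coeffs + 1] / mags[1]    # drop DC term, normalize scale
```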
....Equation 4
[0067] The operations may be seen in FIG. 11 – 13.
[0068] By combining all the aforementioned feature vectors, a multidimensional vector is obtained that can be used to quantify the shape of the image. The Minkowski measure was chosen as the distance metric based on the frequency of good results.
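A sketch of the merged-vector comparison, assuming SciPy; the Minkowski order p used here is an assumption, since the text states only that the Minkowski measure gave good results empirically.

```python
import numpy as np
from scipy.spatial.distance import cdist

def rank_by_similarity(query_vec, db_vectors, p=3):
    """Rank stored images by Minkowski distance between merged feature
    vectors ([0068]); returns database indices, nearest first."""
    d = cdist(query_vec[None, :], db_vectors, metric='minkowski', p=p)[0]
    return np.argsort(d)

# usage sketch: the merged vector is the concatenation of the extracted
# features, e.g. np.concatenate([zernike, hog_vec, fd, polygon_vec.ravel()])
```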
[0069] Embodiments for Vienna codification: A pre-classified image is obtained according to the Vienna codification. The embodiments for image / shape similarity are used to find images similar to the pre-classified image. A user may then automatically tag the similar image according to the pre-classified image.
[0070] The designed system achieved precision and recall values greater than 60% when tested on the MPEG-7 data set. Further manual inspection revealed that more than 90% of wrongly categorized classes are very closely related to the query class in shape (e.g., horses and cows, keys and guitars). The query results may be seen in FIG. 14A – 14D.
[0071] The above results show the promise of using this algorithm in conjunction with deep-learning to automatically tag query designs with the appropriate classes.
[0072] Of particular interest are the Vienna Codification classes that are used to classify trademarks / designs. By virtue of the exhaustive categorization and the sheer number of trademark designs that are meticulously classified using them, an automatic Vienna Codification identifier for designs / trademarks has been developed using the embodiments described herein in conjunction with convolutional neural networks.
[0073] Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of computer-readable media include electronic signals (transmitted over wired or wireless connections) and computer-readable storage media. Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs).
CLAIMS
1. A system for creating a database of recognized images, the system comprising:
an image analysis module that is configured to programmatically analyze each image in a collection to determine information about that image;
a manual interface that is configured to (i) interface with at least one human editor, and (ii) display images that are structurally similar to the image.
2. The system of claim 1 wherein the programmatic analysis includes:
removing artefacts from the image;
converting the image to binary;
converting the image to a black on white background;
padding the image and mass centering the image; and
generating plots.
3. The system of claim 1 wherein the programmatic analysis further includes extracting attributes for weighted polygons, structural descriptors, Zernike moments, histograms of oriented gradients, the concave hull and Fourier descriptors.
4. The system of claim 1 wherein the programmatic analysis further includes merging extracted attributes.
5. A system for image feature extraction and comparison, the system comprising:
an input module configured to receive an image;
a memory module configured to store image related data, connected to a digital signal processor, and
the digital signal processor configured to:
remove artefacts from the received image using any of Gaussian filtering, blur, selective blur, edge detection, discrete cosine transform, or similar techniques;
convert the resulting image to binary after removal of artefacts, and determine the mass center of the binary image;
determine individual contours of the image and extract individual attributes, such as weighted polygons, structural descriptors, and Zernike moments, from the cleaned, converted image;
classify each individual contour according to a predefined system stored in the connected memory;
merge the individual contour data and store it as a single vector file for each image.
6. The system of claim 5 wherein the predefined system to classify images is created using the International Classification of Figurative Marks.
7. The system of claim 5 wherein any new image is converted to a single vector file and compared with a plurality of single vector files saved in the memory.