Mechanism For Better Similarity Evaluation Using Enhanced Ontology

Mechanism For Better Similarity Evaluation Using Enhanced Ontology Concepts

Abstract: The main object of the present invention is to evaluate the similarities as well as the dissimilarities between the entities for enhancing the quality of recommendations. In a preferred embodiment, the present invention provides a method for evaluating similarity in a recommender system comprising the step of: Extending the notion of ontology for understanding the similarity/dissimilarity between the users for improving the accuracy of recommender relationship and consequently improving the quality of recommendations.

Patent Information

Application #

Filing Date

15 May 2009

Publication Number

47/2010

Publication Type

INA

Invention Field

ELECTRONICS

Status

Parent Application

Applicants

SAMSUNG ELECTRONICS COMPANY

416, MAETAN-DONG, YEONGTONG-GU, SUWON-SI, GYEONGGI-DO

Inventors

1. CHHAVI BHANDARI

SAMSUNG INDIA ELECTRONICS PRIVATE LIMITED. GROUND AND FIRST FLOOR, D-5, SECTOR 59, NODIA

2. REVOTI PRASAD BORA

SAMSUNG INDIA ELECTRONICS PRIVATE LIMITED. GROUND AND FIRST FLOOR, D-5, SECTOR 59, NODIA

3. ANISH MEHTA

SAMSUNG INDIA ELECTRONICS PRIVATE LIMITED. GROUND AND FIRST FLOOR, D-5, SECTOR 59, NODIA

Specification

FIELD OF THE INVENTION
The present invention generally relates to a method for evaluating similarity
in a recommender system. More particularly, it relates to enhancing the
profile comparison mechanism with the help of ontology in order to improve
the similarity evaluation process and, consequently, the recommendations.
BACKGROUND OF THE INVENTION
The Internet is growing at a tremendous pace and so is the amount of
content. Because of the overwhelming amount of content available, it is
usually difficult for people to find content of their choice. Search
engines ease the job of searching for content by filtering pages that match
explicit queries, but most of the time people are unable to provide the exact
query. Moreover, search engines do not take the user's taste into account
when searching for content. Recommender systems came into existence to
solve these problems. A recommender system can be defined as a specific type
of information filtering (IF) technique that attempts to present content that
is likely to be of interest to the user. Typically, a recommender system
dynamically maps the behavior of a user into a profile, compares it with
some reference characteristics and tries to estimate the user's liking for a
particular content item. The characteristics may be extracted from the
information about the item (the content based technique) or from other users
(collaborative filtering, CF). In the present invention only the latter is
considered, but the scheme is equally applicable to content based filtering.
Breese et al. [1] identify two basic subclasses of algorithms for CF: model
based and memory based. Model based algorithms build a model of the data and
use the model for prediction, whereas memory based algorithms do not build a
model, but use the data directly for prediction. Memory based algorithms
rely on similar users' ratings for a content item in making predictions.
Hence the performance of CF depends directly on finding similar users. In
order to calculate similarity, it is common practice to represent the user
profiles in the so-called bag-of-words (BOW) format. A BOW is a set of
weighted terms that best describe the entity, so that the similarity between
two entities can be computed using standard techniques such as cosine
similarity or the Pearson coefficient.
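The BOW comparison described above can be illustrated with a minimal sketch.
It assumes that a profile is held as a Python dictionary mapping each term to
its weight; the function name cosine_similarity and the sample profiles are
illustrative only and do not come from the specification.

```python
import math


def cosine_similarity(bow_a, bow_b):
    """Cosine similarity between two bag-of-words profiles (term -> weight)."""
    # Only terms present in both profiles contribute to the dot product.
    shared = set(bow_a) & set(bow_b)
    dot = sum(bow_a[t] * bow_b[t] for t in shared)
    norm_a = math.sqrt(sum(w * w for w in bow_a.values()))
    norm_b = math.sqrt(sum(w * w for w in bow_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical profiles: the weights are made up for illustration.
user_x = {"comedy": 2.0, "war": 1.0}
user_y = {"comedy": 1.0, "romance": 3.0}
print(cosine_similarity(user_x, user_y))
```

Only the shared term "comedy" contributes to the score here, which is the
exact-match behaviour that the remainder of the specification sets out to
extend.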
Ontology is a conceptualization of a domain into a human-understandable,
machine-readable format consisting of entities, attributes and their
relationships [2]. Ontology can provide a rich conceptualization of the
domain of any dataset (e.g. Movies tags) representing the main concepts and
relationships among these concepts. The term ontology is used here to refer
to the classification structure and instances within a knowledge base. The
latest recommender systems consider domain knowledge to improve the
accuracy and effectiveness of recommendations. An item can be expressed
simply by a feature vector where each dimension represents an item feature.
It is assumed that item features are dependent on user perspective and
therefore understanding item features and their relationships can help us gain
more information about the user's preferences. With ontology, item features
and relationships among them can be expressed in machine readable format
and therefore can be used to compute more accurate recommendations.
The similarity calculations using the BOW format have so far taken into
account only the 'Sameness Quotient' between the entities in question, i.e.
the technique tries to determine the keywords (or tags) that are common to
the BOWs of the two user profile vectors being compared. Users with more
matching keywords are considered more similar than users with fewer
matching keywords. Spreading Activation Methods (incorporating concepts
of ontology) have been used to improve similarity calculations, but their
approach has been limited to finding more and more similar keywords, for
instance by identifying synonyms. However, no consideration is given to any
polarities that the two profiles might have, and ignoring them has the
potential to make recommendations less accurate. We therefore explore this
aspect in greater detail.
In the present invention, we extend the notion of ontology to improve
recommender systems. Our approach uses ontology not just to find
similar tags (weight = 1) but also unrelated tags (weight = 0) and dissimilar
or opposite tags (weight = -1) in the two user profiles. This approach helps
the recommender system to better understand not only the similarity
between the users but also the dissimilarity between them, thus improving
the recommender list and consequently the recommendations.
The bag-of-words (BOW) model is a simplifying assumption used in natural
language processing and information retrieval. In this model, a text (such as
a sentence or a document) is represented as an unordered collection of
words, disregarding grammar and even word order. In the BOW format an
entity (E) is represented by a set of pairs, denoted as (ti, wi), where ti is a
term that describes the entity, ti ∈ terms of (E), and wi is the weight or
importance of the respective term in describing the entity. The similarity
computations are then done directly on the BOWs of different user profiles.
There exist several mathematical formulae to compute similarity between
two user profiles for example Pearson's correlation, cosine similarity,
Jaccard similarity coefficient, Dice's coefficient, Mountford's index of
similarity etc.
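Two of the set-based measures named above can be sketched as follows. This is
a minimal illustration that treats each profile as a plain set of terms and
ignores the weights wi; the function names jaccard and dice are illustrative.

```python
def jaccard(terms_a, terms_b):
    """Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(terms_a, terms_b):
    """Dice's coefficient: 2 |A ∩ B| / (|A| + |B|)."""
    a, b = set(terms_a), set(terms_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


profile_1 = {"War", "Comedy"}
profile_2 = {"Peace", "Comedy"}
print(jaccard(profile_1, profile_2))  # 1 shared term out of 3 distinct terms
print(dice(profile_1, profile_2))     # 2 * 1 / (2 + 2)
```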
In recent research, the concept of ontology is being used to improve
similarity evaluations. Semantic similarity measures play an important role
in information retrieval and information integration. Both Rajesh
Thiagarajan et al. [3] and Dipti Aswath et al. [4] extend the notion of
semantic similarity between two entities to consider the inherent
relationships between the concepts/keywords appearing in their
respective BOW representations, through a process called Spreading
Activation: the process of adding related terms to an entity
description by referring to an ontology. In the spreading activation stage of
Dipti Aswath et al.[4], a network is activated as a set of nodes representing
product descriptions or phrases (i.e. indexing terms of the product
descriptions) with relationships between the nodes specified by labeled
links. The 2-level node activation process starts in one direction placing an
initial activation weight on the hot phrase node and then proceeds to spread
the activation through the network one link at-a-time, activating product
description and phrase nodes relevant to the hot phrase. In the other
direction, the network originating with the hot phrase node activates its
synonym nodes that in turn activates their relevant product and phrase
nodes.
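The idea of adding related terms to an entity description can be sketched in
a much simplified form. The following is not the algorithm of [3] or [4]; it
is a one-step illustration that assumes a toy synonym table (SYNONYMS) and a
hypothetical decay factor applied to the weight that is spread to each
related term.

```python
def expand_profile(bow, synonyms, decay=0.5):
    """Return a copy of `bow` enriched with synonyms at a reduced weight."""
    expanded = dict(bow)
    for term, weight in bow.items():
        for related in synonyms.get(term, ()):
            spread = weight * decay
            # Keep the strongest activation seen so far for each term.
            if spread > expanded.get(related, 0.0):
                expanded[related] = spread
    return expanded


# Toy ontology and profile, invented for illustration only.
SYNONYMS = {"film": ["movie"], "funny": ["comedy"]}
profile = {"film": 1.0, "funny": 0.8}
expanded = expand_profile(profile, SYNONYMS)
print(expanded)
```

After expansion, a profile that mentions "film" can match another profile
that mentions "movie", which an exact string comparison would miss.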
Ming Mao et al. [5] present an approach to computing semantic similarity
that relaxes the requirement of a single ontology and accounts for
differences in the levels of explicitness and formalization of the different
ontology specifications. A similarity function determines similar entity
classes by using a matching process over synonym sets, semantic
neighborhoods, and distinguishing features that are classified into parts,
functions, and attributes. In [1], an ontology concept is represented using a
profile with words describing the concept. A propagation technique to enrich
the profile with words from neighboring concepts is also presented. Such
profiles are subsequently used to determine closeness (using cosine metric)
between concepts they represent.
As described above, all the approaches so far concentrate on finding
similarity between two entities that may not be explicit. Thus ontology
is used to find two entities that are similar but are overlooked by
traditional calculations because their match is only implicit. There is a
need to find dissimilarities as well as similarities between the entities in
order to improve similarity computations, leading to better recommendations.
SUMMARY OF THE INVENTION
The main object of the present invention is to evaluate the similarities as
well as the dissimilarities between the entities for enhancing the quality of
recommendations.
In a preferred embodiment, the present invention provides a method for
evaluating similarity in a recommender system comprising the step of:
Extending the notion of ontology for understanding the
similarity/dissimilarity between the users for improving the accuracy of
recommender relationship and consequently improving the quality of
recommendations.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The invention can now be described with the help of the accompanying
drawings, where:
Fig. 1 shows User Profiles for Users A and B
Fig. 2 shows Venn Diagram for the two user profiles
Fig. 3 & Fig. 4 show the frequency and rating related to each keyword
DETAILED DESCRIPTION
The user profile of a user can be assumed to be a set of vector components
(or keywords) as in 101 and 102. Similarity between two users can be
represented as a function of the two user profile vectors 201 and 202 as
follows:
Sim = f(P1, P2), where P1 and P2 are the profiles of the two users. In the
bag-of-words representation, the usual connotation of this function would
be f(P1, P2) = P1 ∩ P2, where ∩ represents the intersection between the
two sets of profiles. Classically, intersection would result in all those
keywords which are common to both the user profiles, i.e. it is based on a
simple string match function. The more string matches occur, the more
similar the two users are. Mathematically, ∩ would represent only
exact string matches, i.e. the region 207, in the classical approach.
Recent research has extended the concept of ontology to this field such that
each search keyword in a user profile is first expanded using ontology and
only then do the comparisons occur. This helps in identifying relations which
may not be explicit but are implicitly present in real terms. In other words,
intersection in this scenario is equivalent to any synonymous match that
might exist between the two profile vectors.
Hence, f represents string match (1)* + synonym match (1)*
where * indicates that the quantity in brackets is the weightage given to the
relation.
In our novel approach, we further extend the notion of ontology to include
implicit dissimilarities between the two profiles. This approach works in two
directions simultaneously, i.e. on one hand finding similarities in order to
bring two users closer to each other and on the other hand finding
dissimilarities in order to pull the two users away from each other. Both
the likeness and the unlikeness between the two users are thereby kept in
mind, which eventually results in better recommendations.
Mathematically, f = string match (1)* + synonym match (1)* + antonym
match (-1)*
To take a simplified example, consider three profiles A, B, C with their
keywords or BOW as follows :
A = {War, Comedy}
B = {Peace, Comedy}
C = {Romantic, Comedy}
Using the presently available techniques, A, B and C will each have the
same 'Similarity' value w.r.t. each other, because every pair of users has
one overlapping keyword and one non-overlapping keyword,
i.e. Sim (B, C) = Sim (B, A)
But going by human perception, it is clear that B and C are more likely to
share good recommendations than B and A (since B and A have clashing
interests in some areas), i.e.
Sim (B, C) > Sim (B, A). This fundamental aspect has been taken into
account in our approach.
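The three-profile example can be worked through with a minimal sketch. It
assumes a toy ontology in which War and Peace are the only antonym pair, and
it uses an unnormalised score (+1 per exact match, -1 per antonym pair); the
names ANTONYMS and signed_similarity are illustrative and not part of the
specification.

```python
# Toy ontology (assumption): War and Peace are antonyms.
ANTONYMS = {frozenset({"War", "Peace"})}


def signed_similarity(p1, p2, antonyms=ANTONYMS):
    """+1 for every exact keyword match, -1 for every antonym pair."""
    score = len(p1 & p2)
    for t1 in p1 - p2:
        for t2 in p2 - p1:
            if frozenset({t1, t2}) in antonyms:
                score -= 1
    return score


A = {"War", "Comedy"}
B = {"Peace", "Comedy"}
C = {"Romantic", "Comedy"}

print(signed_similarity(B, C))  # one match, no antonyms
print(signed_similarity(B, A))  # one match cancelled by War/Peace
```

With exact matching alone both pairs would score 1; once the antonym pair is
penalised, B and C score higher than B and A, as the text argues.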
In Fig. 2, P1, i.e. 201, and P2, i.e. 202, denote the User Profiles of User A
and User B respectively. I, 207, is the intersection, i.e. the common vector
components found by exact string match between 201 and 202:
I = P1 ∩ P2
S1, i.e. 203, denotes the set of profile vector elements in P1, 201, which
have an ontologically established synonymous relationship to the set of
elements in S2, i.e. 204. Earlier inventions use I, 207, and the set of pairs
(xi, yi) such that xi ∈ S1 and yi ∈ S2 and xi and yi are synonyms.
In our novel invention we extend the approach to a new dimension, i.e.
dissimilarity. Thus we also consider sets D1, i.e. 205, and D2, i.e. 206, such
that D1 denotes the set of profile vector elements in P1 which have an
ontologically established antonymous relationship to the set of elements in
D2. Thus we add another set of pairs (pi, qi) such that pi ∈ D1 and qi ∈ D2
and pi and qi are antonyms.
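The regions of the Venn diagram can be derived programmatically, as in the
following minimal sketch. It assumes that the ontology is available as two
lookup sets of unordered term pairs (SYNONYMS and ANTONYMS, both invented
here for illustration); partition_profiles is a hypothetical helper name.

```python
def partition_profiles(p1, p2, synonyms, antonyms):
    """Return (I, synonym pairs (xi, yi), antonym pairs (pi, qi))."""
    common = p1 & p2                      # I: exact string matches
    rest1, rest2 = p1 - common, p2 - common
    syn_pairs = {(x, y) for x in rest1 for y in rest2
                 if frozenset((x, y)) in synonyms}
    ant_pairs = {(p, q) for p in rest1 for q in rest2
                 if frozenset((p, q)) in antonyms}
    return common, syn_pairs, ant_pairs


# Toy ontology and profiles (assumptions for illustration).
SYNONYMS = {frozenset(("Film", "Movie"))}
ANTONYMS = {frozenset(("War", "Peace"))}
P1 = {"Comedy", "War", "Film"}
P2 = {"Comedy", "Peace", "Movie"}

intersection, syn_pairs, ant_pairs = partition_profiles(P1, P2, SYNONYMS, ANTONYMS)
S1 = {x for x, _ in syn_pairs}
S2 = {y for _, y in syn_pairs}
D1 = {p for p, _ in ant_pairs}
D2 = {q for _, q in ant_pairs}
print(intersection, S1, S2, D1, D2)
```

Here I, S1 and S2 are the regions used by earlier approaches, while D1 and D2
are the additional regions introduced for dissimilarity.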
Hence in our method the total similarity includes

Consider two users A and B with User Profile Vectors as shown in Fig. 1,
wherein each component or keyword is represented as Ki and K'i for users A
and B respectively. Each keyword has an associated frequency Fi, 302, or
F'i, 402. The rating associated (303 and 403) with each keyword is the average
rating over all occurrences of the keyword in the user's profile. Thus the
similarity calculation in terms of the cosine similarity formula would look
like the following equation:

where Ri, 303, is the average rating of the ith keyword in 'X' such that it
matches with the jth keyword in 'Y' with rating Rj, 403, and 'k' is the
number of keywords in X.

And 'a' is the coefficient of the differentiated weight scheme such that a = 1
if the match is a string match or an ontologically matched synonym and a = -1
if the match is an ontological antonym.
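One plausible reading of the weighted cosine computation is sketched below.
The sketch assumes that each matched keyword pair contributes a * Ri * Rj to
the numerator and that the denominator is the product of the rating vector
norms; the keyword frequencies Fi are not used. The helper names and sample
ratings are hypothetical.

```python
import math


def relation_coefficient(k1, k2, synonyms, antonyms):
    """a = 1 for a string or synonym match, -1 for an antonym, 0 otherwise."""
    if k1 == k2 or frozenset((k1, k2)) in synonyms:
        return 1
    if frozenset((k1, k2)) in antonyms:
        return -1
    return 0


def weighted_cosine(ratings_x, ratings_y, synonyms, antonyms):
    """Cosine-style similarity over keyword -> average rating profiles."""
    dot = 0.0
    for ki, ri in ratings_x.items():
        for kj, rj in ratings_y.items():
            a = relation_coefficient(ki, kj, synonyms, antonyms)
            dot += a * ri * rj
    norm_x = math.sqrt(sum(r * r for r in ratings_x.values()))
    norm_y = math.sqrt(sum(r * r for r in ratings_y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


# Toy ontology and ratings (assumptions for illustration).
SYN = set()
ANT = {frozenset(("War", "Peace"))}
user_a = {"War": 4.0, "Comedy": 3.0}
user_b = {"Peace": 4.0, "Comedy": 3.0}
user_c = {"Romantic": 4.0, "Comedy": 3.0}

print(weighted_cosine(user_b, user_c, SYN, ANT))
print(weighted_cosine(user_b, user_a, SYN, ANT))
```

With these ratings the antonym pair drives the score for users B and A below
zero, while B and C keep a positive score from their shared keyword.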
In the present invention ontology is used not just to find similar tags
(associated weight = 1) but also unrelated tags (associated weight = 0) and
dissimilar or opposite tags (associated weight = -1) in the two user profiles.
This approach will help the recommender system to better understand not
only the similarity between the users but also the dissimilarity between
them, thus improving the accuracy of recommender relationship and
consequently the quality of recommendations.
Associated weights can vary anywhere in the range [-1, 1] instead of taking
only the absolute values of -1, 0 and 1 considered in this patent.
In the present invention the cosine formula is used to evaluate similarity,
but any other formula, e.g. Pearson's correlation, Jaccard
similarity coefficient, Dice's coefficient, Mountford's index of similarity
etc., can be used in a similar manner.
Notion of ontology can be extended to any system that requires similarity
between any two entities to be evaluated.

References
Research Papers
1. Breese, J., D. Heckerman, and C. Kadie, "Empirical
Analysis of Predictive Algorithms for Collaborative
Filtering". Uncertainty in Artificial Intelligence. 1998.
2. Guarino, N. Giaretta, P, "Ontologies and Knowledge
bases: towards a terminological clarification". In N.
MARS (Ed.) Towards Very Large Knowledge Bases:
Knowledge Building and Knowledge Sharing, IOS Press,
25-32, 1995.
3. Rajesh Thiagarajan, Geetha Manjunath, and Markus
Stumptner HP Laboratories, "Computing Semantic
Similarity Using Ontologies". Submitted to ISWC 08, the
International Semantic Web Conference (ISWC), 2008,
Karlsruhe, Germany.
4. Dipti Aswath, James D'cunha, Syed Toufeeq Ahmed,
Hasan Davulcu, "Boosting Item Keyword Search with
Spreading Activation". Dept. of Computer Science,
Arizona State University, Tempe, AZ. Proceedings of the
2005 IEEE/WIC/ACM International Conference on Web
Intelligence Pages: 704 - 707
5. Ming Mao, "Ontology mapping: An information retrieval
and interactive activation network based approach". In
Proc. ISWC, 2007.

Patent References
1. Ontology-Content-Based Filtering Method for Personalized
Newspapers, US20080294628A1, Deutsche Telekom AG
2. Recommendation Systems and Methods Using Interest Correlation,
US20080294624A1, Ontogenix, Inc.
3. Method for Content Recommendation, US20090013002A1, Sony
Deutschland GmbH

WE CLAIM:
1. A method for evaluating similarity in a recommender system comprising
the step of:
- extending the notion of ontology for understanding the
similarity/dissimilarity between the users for improving the
accuracy of recommender relationship and consequently improving
the quality of recommendations.
2. The method as claimed in claim-1, wherein the evaluation involves
finding out similar tags (associated weight =1), unrelated tags
(associated weight = 0) and dissimilar or opposite tags (associated
weight = -1) in two user profiles.
3. The method as claimed in claim-2, wherein said associated weight
can vary anywhere in the range of -1 to 1 instead of absolute values
of -1, 0 and 1.
4. The method as claimed in claim-1, wherein evaluation of similarity is
based on cosine formula.
5. The method as claimed in claim-1, wherein evaluation of similarity is
based on Pearson's correlation.
6. The method as claimed in claim-1, wherein evaluation of similarity is
based on Jaccard similarity coefficient.

7. The method as claimed in claim-1, wherein evaluation of similarity is
based on Dice's coefficient.
8. The method as claimed in claim-1, wherein evaluation of similarity is
based on Mountford's index of similarity.
9. The method as claimed in claim-1, wherein the notion of ontology
can be extended to any system requiring similarity to be evaluated
between any two entities.
10. A method of evaluating similarity in a recommender system
substantially as herein described and illustrated in the accompanying
drawings.


Documents

Application Documents

#	Name	Date
1	abstract-744-kol-2009.jpg	2011-10-07
2	744-kol-2009-specification.pdf	2011-10-07
3	744-kol-2009-gpa.pdf	2011-10-07
4	744-kol-2009-form 3.pdf	2011-10-07
5	744-kol-2009-form 2.pdf	2011-10-07
6	744-KOL-2009-FORM 18.pdf	2011-10-07
7	744-kol-2009-form 1.pdf	2011-10-07
8	744-kol-2009-drawings.pdf	2011-10-07
9	744-kol-2009-description (complete).pdf	2011-10-07
10	744-kol-2009-correspondence.pdf	2011-10-07
11	744-kol-2009-claims.pdf	2011-10-07
12	744-kol-2009-abstract.pdf	2011-10-07
13	744-KOL-2009-(27-08-2013)-CORRESPONDENCE.pdf	2013-08-27
14	744-KOL-2009_EXAMREPORT.pdf	2016-06-30