Privacy Protection While Offering Data As A Service

Abstract: A method for privacy protection while offering data-as-a-service comprises receiving target group (TG) predicates defining a set of conditions for a network provider for selecting data to be provided to a service provider; clustering a plurality of users registered with the network provider into a plurality of clusters based on user profiles corresponding to each of the plurality of users; transforming each of the plurality of clusters to obtain a corresponding differential-privacy (DP) noise enabled cluster having a DP noise enabled low-dimensional cluster centroid; determining a best matching cluster from among a plurality of DP noise enabled clusters, the best matching cluster corresponding to the DP noise enabled low-dimensional cluster centroid having the highest similarity with a low-dimensional representation of the TG predicates; and obtaining one or more DP noise enabled dominant features of the user profiles corresponding to one or more users, from among the plurality of users, associated with the best matching cluster. To be published with figure 2.

Patent Information

Application #

Filing Date

12 December 2013

Publication Number

19/2016

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

ALCATEL LUCENT

3, AVENUE OCTAVE GREARD 75007 PARIS, FRANCE

Inventors

1. NANDI, AMINESH

ALCATEL-LUCENT INDIA LIMITED, NAGAWARA VILLAGE, KASABA TALUK, OUTER RING ROAD, MANYATA EMBASSY BUSINESS PK, 560045, BANGALORE, INDIA

2. GUPTA, HIMANSHU

HOUSE #26, PHASE-2, HOUSING BOARD COLONY SAPROON, SOLAN HIMACHAL PRADESH 173211, INDIA

3. MAJUMDER, ANIRBAN

ALCATEL-LUCENT INDIA LIMITED, NAGAWARA VILLAGE, KASABA TALUK, OUTER RING ROAD, MANYATA EMBASSY BUSINESS PK, 560045, BANGALORE, INDIA

4. SRIVASTAVA, NISHEETH

A1301, NCC ASTER PART YELAHANKA, BANGALORE KARNATAKA 560065, INDIA

5. JAISWAL, SHARED

ALCATEL-LUCENT INDIA LIMITED, NAGAWARA VILLAGE, KASABA TALUK, OUTER RING ROAD, MANYATA EMBASSY BUSINESS PK, 560045, BANGALORE, INDIA

Specification

FIELD OF INVENTION
[0001] The present subject matter relates to data-as-a-service and, particularly but not
exclusively, to privacy protection while offering data-as-a-service.
[0002] Communication networks, such as cellular networks, are often used for making
voice calls and video calls, and for accessing data available on the internet and so on. The
number of users using the internet has increased substantially over the years. Owing to the large
number of users, a huge volume of content related to the users is available with network
operators operating the communication networks. The network operators typically use the
content to provide personalized assistance and services to the end users. For instance, the
network operators may analyze the content to determine advertisements that may be provided to
the end users. Conventionally, various techniques, such as content-based recommendation,
collaborative recommendation, and data analytics, are used to provide personalized services to
the end users. In content-based recommendation, the end users are recommended content,
services, or products which are similar to the content, services or products used or liked by the
end users in the past or which match the interest or choice of the end user. In collaborative
recommendation, the end user is recommended content, services, or products which are similar
to the content, services or products used or liked by other end users having similar or same
interest or choices.
SUMMARY
[0003] This summary is provided to introduce concepts related to privacy protection
while offering data-as-a-service. This summary is not intended to identify essential features of
the claimed subject matter nor is it intended for use in determining or limiting the scope of the
claimed subject matter.
[0004] In one implementation, a method for privacy protection while offering data-as-a-service
is described. The method includes receiving target group (TG) predicates from a service
provider, where the TG predicates define a set of conditions for a network provider for selecting
data to be provided to the service provider as a part of offering the data-as-a-service. The method
further includes clustering a plurality of users registered with the network provider into a
plurality of clusters based on user profiles corresponding to each of the plurality of users.
Further, the method includes transforming each of the plurality of clusters to obtain a
corresponding differential-privacy (DP) noise enabled cluster having a DP noise enabled low-dimensional
cluster centroid. The method further includes determining a best matching cluster,
from among a plurality of DP noise enabled clusters, wherein the best matching cluster
corresponds to the DP noise enabled low-dimensional cluster centroid having a highest similarity
with a low-dimensional representation of the TG predicates. Further, the method includes
obtaining one or more DP noise enabled dominant features of the user profiles corresponding to
one or more users, from among the plurality of users, associated with the best matching cluster
for being provided to the service provider as a part of the data-as-a-service.
[0005] In another implementation, a privacy protection system is described. The privacy
protection system includes a processor and an interaction module coupled to the processor. The
interaction module receives target group (TG) predicates from a service provider, wherein the
TG predicates define a set of conditions for a network provider for selecting data to be provided
to the service provider as a part of offering the data-as-a-service. The privacy protection system
further includes a clustering module coupled to the processor to cluster a plurality of users
registered with the network provider into a plurality of clusters based on user profiles
corresponding to each of the plurality of users. Further, the privacy protection system includes a
cluster transformation module coupled to the processor to transform each of the plurality of
clusters to obtain a corresponding differential-privacy (DP) noise enabled cluster having a DP
noise enabled low-dimensional cluster centroid. The cluster transformation module further
determines a best matching cluster, from among a plurality of DP noise enabled clusters, wherein
the best matching cluster corresponds to the DP noise enabled low-dimensional cluster centroid
having a highest similarity with a low-dimensional representation of the TG predicates. Further,
the cluster transformation module obtains one or more DP noise enabled dominant features of the
user profiles corresponding to one or more users, from among the plurality of users, associated
with the best matching cluster for being provided to the service provider as a part of the data-as-a-service.
[0006] In another implementation, a non-transitory computer-readable medium having
embodied thereon a computer program for executing a method for privacy protection while offering
data-as-a-service is described. The method comprises receiving target group (TG) predicates from a
service provider, where the TG predicates define a set of conditions for a network provider for
selecting data to be provided to the service provider as a part of offering the data-as-a-service.
The method further includes clustering a plurality of users registered with the network provider
into a plurality of clusters based on user profiles corresponding to each of the plurality of users.
Further, the method includes transforming each of the plurality of clusters to obtain a
corresponding differential-privacy (DP) noise enabled cluster having a DP noise enabled low-dimensional
cluster centroid. The method further includes determining a best matching cluster,
from among a plurality of DP noise enabled clusters, wherein the best matching cluster
corresponds to the DP noise enabled low-dimensional cluster centroid having a highest similarity
with a low-dimensional representation of the TG predicates. Further, the method includes
obtaining one or more DP noise enabled dominant features of the user profiles corresponding to
one or more users, from among the plurality of users, associated with the best matching cluster
for being provided to the service provider as a part of the data-as-a-service.
[0007] The detailed description is described with reference to the accompanying figures.
In the figures, the left-most digit(s) of a reference number identifies the figure in which the
reference number first appears. The same numbers are used throughout the figures to reference
like features and components. Some embodiments of systems and/or methods in accordance with
embodiments of the present subject matter are now described, by way of example only, and with
reference to the accompanying figures, in which:
[0008] Figure 1 illustrates an exemplary network environment implementation of privacy
protection while offering data-as-a-service, according to an embodiment of the present subject
matter;
[0009] Figure 2 illustrates a method of privacy protection while offering data-as-a-service,
in accordance with an embodiment of the present subject matter; and
[0010] Figure 3 illustrates a graph depicting a comparison of data utility and data privacy
of user profiles provided to third party service providers using the present subject matter and
conventional techniques.
[0011] In the present document, the word "exemplary" is used herein to mean "serving as
an example, instance, or illustration." Any embodiment or implementation of the present subject
matter described herein as "exemplary" is not necessarily to be construed as preferred or
advantageous over other embodiments.
[0012] It should be appreciated by those skilled in the art that any block diagrams herein
represent conceptual views of illustrative systems embodying the principles of the present
subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, state
transition diagrams, pseudo code, and the like represent various processes which may be
substantially represented in computer readable medium and so executed by a computer or
processor, whether or not such computer or processor is explicitly shown.
[0013] Network operators today have a rich source of subscriber data having location
tracks gleaned from base-station associations, web URLs accessed by a user, call-data-records
(CDRs), billing and demographic information, etc. Conventionally, the network operators
leverage the rich source of subscriber data to drive analytics aimed at more effective
personalized service delivery. For instance, the network providers typically use the subscriber
data to attempt to provide personalized services, like providing content, such as videos, audio,
news, advertisements, etc. Such content is typically driven based on personal details,
preferences, and choices of the end users. However, owing to possible privacy related issues,
such as misuse of users' sensitive data, the subscriber data is usually not shared with third party
service providers and is used only by the network provider for providing the personalized
services. Typically, the network providers either have an in-house analytics team, or they partner
with trusted analytics providers.
[0014] Recently, different approaches for privacy protection while providing the
subscriber data to the third party service providers, such as advertisers and analysts as a part of
offering data-as-a-service have been proposed. The third party service providers may use the user
profiles for various purposes, such as offering personalized services to the users, providing
advertisements directed to the users based on their personal interests, and offering new services.
One such existing approach involves replacing user names with an arbitrary identifier before
providing the subscriber data to the third party service providers. Replacing the user name with
the arbitrary identifier may, however, not be sufficient, as a malicious third party service provider
may still be able to identify the user. For instance, in case the third party service provider possesses
some amount of information about the user, obtained from other sources, such as social
networking websites, the third party service provider may identify the user using the information.
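The weakness of merely replacing names with arbitrary identifiers can be illustrated with a minimal linkage sketch in Python. The field names (`age`, `gender`, `home_cell`, `topic`), the records, and the auxiliary information are all hypothetical; the point is only that the attributes left in a released record can act as quasi-identifiers that single out one user.

```python
import secrets


def pseudonymize(records):
    """Replace the 'name' field of each record with a random identifier.

    All other fields (the quasi-identifiers) are released unchanged.
    """
    released = []
    for rec in records:
        anon = {k: v for k, v in rec.items() if k != "name"}
        anon["id"] = secrets.token_hex(4)
        released.append(anon)
    return released


def link(released, auxiliary, keys):
    """Re-identify people whose known attributes match exactly one released record."""
    matches = {}
    for person, known in auxiliary.items():
        hits = [r for r in released if all(r.get(k) == known[k] for k in keys)]
        if len(hits) == 1:
            matches[person] = hits[0]
    return matches


# Hypothetical subscriber records held by the network provider.
subscribers = [
    {"name": "Alice", "age": 34, "gender": "F", "home_cell": "C17", "topic": "cricket"},
    {"name": "Bob", "age": 34, "gender": "M", "home_cell": "C17", "topic": "finance"},
    {"name": "Carol", "age": 51, "gender": "F", "home_cell": "C02", "topic": "health"},
]
released = pseudonymize(subscribers)

# Hypothetical information gleaned elsewhere, e.g. from a social networking profile.
auxiliary = {"Alice": {"age": 34, "gender": "F", "home_cell": "C17"}}
found = link(released, auxiliary, ["age", "gender", "home_cell"])
print(found["Alice"]["topic"])  # cricket
```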
[0015] For instance, in case the third party service provider wants to obtain information
about a specific user, the third party service provider may obtain all readily available information
about the user from the other sources. Based on the same, the third party service provider may
then create target group predicates, i.e., a set of conditions based on which the network provider
typically selects the subscriber data to be provided to the third party service provider such that
the subscriber data includes the user data corresponding to the specific user. The third party
service provider may then analyze the subscriber data to identify the specific user based on the
readily available information and thus obtain sensitive information about the specific user. For
example, a few instances of such privacy-related attacks on user profiles of users of social
networking websites have been reported recently.
[0016] Another approach for protecting private information of the users while offering
data-as-a-service to the third party service providers involves sharing the subscriber data after
adding certain noise, i.e., additional arbitrary data in each individual user's user data. In said
approach, each individual user's user data is first transformed to a low-dimensional space using a
technique of random projections, such as Johnson-Lindenstrauss (JL) lemma schemes. Subsequently, differentially private
noise is added to the low-dimensional user data to ensure privacy via the theoretical definition of
differential-privacy. The target group predicates provided by the third party service provider are
then transformed into low-dimensional representation using the same technique of random
projections as used for transforming the user data. As will be understood, transforming the target
group predicates into low-dimensional representation involves transforming the target group
predicates into low bit vectors.
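The conventional per-user scheme can be sketched in Python as follows, assuming a Gaussian JL projection and the Laplace mechanism (the text names JL-lemma schemes but no particular noise distribution). The function names, dimensions, example vectors, and the `sensitivity` value are hypothetical and purely illustrative; choosing a correct sensitivity bound for an individual's projected profile is precisely where the required noise becomes large.

```python
import math
import random


def jl_matrix(d, m, seed=0):
    """Gaussian random projection from d to m dimensions (entries ~ N(0, 1/m))."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(d)] for _ in range(m)]


def project(matrix, x):
    """Multiply the m x d matrix by the d-dimensional vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_user_signature(matrix, x, sensitivity, epsilon, rng):
    """Project one user's profile and add Laplace noise to every coordinate."""
    return [v + laplace(sensitivity / epsilon, rng) for v in project(matrix, x)]


d, m = 8, 3
R = jl_matrix(d, m, seed=42)
rng = random.Random(7)
user = [1, 0, 0, 1, 0, 1, 0, 0]       # hypothetical binary feature profile
predicate = [1, 0, 0, 1, 0, 0, 0, 0]  # hypothetical TG predicate over the same features
noisy_user = noisy_user_signature(R, user, sensitivity=2.0, epsilon=1.0, rng=rng)
low_dim_predicate = project(R, predicate)  # same matrix, no noise added
```

The predicate is projected with the same matrix as the user data so that the two low-dimensional vectors remain comparable.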
[0017] The low-dimensional user data is then compared with the low-dimensional
representation of the group predicates to identify the best matching users, i.e., the users that have
maximum features matching the target group predicates. The subscriber data having user data
corresponding to the best matching users may then be provided to the third party service
providers. Although this approach provides privacy protection to the user data, adding the noise
significantly compromises data utility. Addition of the noise to the subscriber data degrades
the quality of the subscriber data, as the analysis performed on such subscriber data may
not be fully accurate. This is because the amount of noise required to be added to each user's
data is significantly large, thus reducing the weightage of the actual features of the user data.
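The matching step of this conventional approach can be sketched as a ranking by cosine similarity. The text does not name the similarity measure used here, so cosine similarity (which the text uses elsewhere for clusters) is an assumption, as are the function names and toy vectors. Because every user vector carries its own noise, noise that is large relative to the vector entries can reorder this ranking, which is the loss of utility described above.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def best_matching_users(user_vectors, predicate_vector, top_n):
    """Rank users by similarity of their (noisy) low-dimensional vector to the predicate."""
    ranked = sorted(
        user_vectors,
        key=lambda uid: cosine(user_vectors[uid], predicate_vector),
        reverse=True,
    )
    return ranked[:top_n]


users = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [0.9, 0.1]}  # hypothetical vectors
print(best_matching_users(users, [1.0, 0.0], top_n=2))  # ['u1', 'u3']
```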
[0018] According to an implementation of the present subject matter, system(s) and
method(s) for privacy protection while offering data-as-a-service are described. The system(s)
and method(s) enable a network provider to provide user profiles of users registered with the
network provider to a third party service provider, as part of offering data-as-a-service, without
affecting the privacy of the users. As previously described, examples of the third party service
providers, hereinafter referred to as service providers, include, but are not limited to, advertisers,
marketing agencies, and analysts. The service providers may provide target group (TG)
predicates, i.e., a set of conditions or features based on which the network provider typically
selects the users whose user profiles may be provided to the service providers for their analysis.
[0019] In one implementation, the users are clustered into a plurality of clusters based on
their user profiles. Subsequently, differential-privacy (DP) noise may be added to the user
profiles and DP noise enabled dominant features, i.e., top k, say, 10 or 15 features of the users
associated with a DP noise enabled cluster that matches with the TG predicates may be provided
to the service providers. The system thus helps ensure privacy protection of the users
registered with the network providers. As will be understood, adding DP noise to a cluster
includes modifying the user profiles by modifying the weightage of features associated with the user
profile.
[0020] In said embodiment, upon receiving the TG predicates from the service providers,
the network providers may initially transform the user profile corresponding to each of a
plurality of users registered with the network provider. The user profiles may be transformed
using random projections, such as JL-Lemma and locality-sensitive hashing, to obtain a low-dimensional
signature corresponding to each of the user profiles. The users may then be clustered
into one or more clusters based on the low-dimensional signatures corresponding to each user
such that the number of users assigned to each cluster is greater than a cluster size threshold. The
cluster size threshold may be understood as a predetermined minimum number of users that need
to be subscribed to a cluster at any given point of time.
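A minimal Python sketch of the signature step, assuming sign-random-projection (SimHash-style) locality-sensitive hashing, which is one LSH family suited to cosine similarity. The text does not fix the hash family, the number of signature bits, or the grouping rule, so all of these, together with the function names and profiles, are hypothetical. Users with identical signatures are grouped, and groups below the cluster size threshold are flagged for the later merging step.

```python
import random


def random_hyperplanes(d, bits, seed=0):
    """`bits` random Gaussian hyperplanes through the origin in d dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(bits)]


def signature(x, planes):
    """Sign random projection: one bit per hyperplane, 1 if x is on its non-negative side."""
    return tuple(1 if sum(p * v for p, v in zip(plane, x)) >= 0 else 0 for plane in planes)


def bucket_users(profiles, planes):
    """Group users whose profiles hash to the same low-dimensional signature."""
    buckets = {}
    for uid, x in profiles.items():
        buckets.setdefault(signature(x, planes), []).append(uid)
    return buckets


def undersized(buckets, threshold):
    """Signatures whose bucket has fewer users than the cluster size threshold."""
    return [sig for sig, members in buckets.items() if len(members) < threshold]


planes = random_hyperplanes(d=3, bits=4, seed=1)
profiles = {"a": [1, 2, 3], "b": [2, 4, 6], "c": [-1, -2, -3]}  # hypothetical profiles
buckets = bucket_users(profiles, planes)
```

Since the signature depends only on the direction of a profile, "a" and "b" (which point the same way) always share a bucket, while "c" (the opposite direction) lands elsewhere.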
[0021] In one implementation, the steps of transforming the user profiles to obtain the
low-dimensional signatures and clustering the users based on the low-dimensional signatures
may be iterated for a predetermined number of times, corresponding to 'L' multiple independent
iterations, to obtain a predetermined number of sets. Each set of clusters contains one or more
clusters of the users such that each user is associated with a cluster in each of the predetermined
number of sets. Each user is thus associated with a predetermined number of clusters.
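Continuing the hashing assumption, the L independent iterations can be sketched by redrawing the hyperplanes with a different seed each time. Deriving the seeds as `base_seed + i`, and the function and variable names, are illustrative choices rather than anything the text specifies.

```python
import random


def signature(x, planes):
    """Sign random projection: one bit per hyperplane."""
    return tuple(1 if sum(p * v for p, v in zip(plane, x)) >= 0 else 0 for plane in planes)


def independent_clusterings(profiles, bits, L, base_seed=0):
    """Repeat the hash-and-group step L times with independent hyperplanes.

    Returns a list of L dicts mapping user id -> cluster key. The iteration index
    is part of the key, so clusters from different sets never collide.
    """
    d = len(next(iter(profiles.values())))
    sets = []
    for i in range(L):
        rng = random.Random(base_seed + i)
        planes = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(bits)]
        sets.append({uid: (i, signature(x, planes)) for uid, x in profiles.items()})
    return sets


profiles = {"a": [1.0, 0.2], "b": [0.9, 0.1], "c": [0.1, 1.0]}  # hypothetical
sets = independent_clusterings(profiles, bits=2, L=3)
clusters_of_a = [s["a"] for s in sets]  # one cluster per set, L in total
```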
[0022] Further, for each of the predetermined number of sets of clusters, a centroid of
each of the one or more clusters is computed. The cosine similarity between each user and the centroid
of each of the predetermined number of clusters with which the user is associated is computed to
determine the closest cluster for the user. Each user is then assigned to the corresponding closest
cluster to obtain the plurality of clusters such that each user is associated with just one cluster. In
one embodiment, each of the plurality of clusters may be further analyzed to determine if the
number of users subscribed to the cluster is less than the cluster size threshold. All the clusters
having the users less than the cluster size threshold may then be classified as small size clusters
and two or more small size clusters may be merged together to obtain clusters having number of
users greater than the cluster size threshold. In one implementation, the two or more small size
clusters that are to be merged may be determined based on cosine-similarity of cluster centroids
between the two or more small size clusters such that the small size clusters that are closest to
each other are merged together. Upon merging, the cluster size of each of the plurality of clusters
may thus be greater than the cluster size threshold.
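The reassignment and merging steps might look like the following Python sketch. It pools the candidate clusters from all L sets, keeps each user only in the candidate whose centroid is most cosine-similar, and then repeatedly merges the first undersized cluster into its most similar undersized neighbour. The iteration order and the fallback for a single leftover small cluster are assumptions the text leaves open; the function names and toy candidate clusters are hypothetical and simpler than real L-way memberships.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def centroid(vectors):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def assign_to_closest(profiles, candidates):
    """candidates: cluster key -> member ids, pooled from all L sets.

    Each user keeps only the candidate cluster whose centroid is most cosine-similar.
    """
    cents = {k: centroid([profiles[u] for u in m]) for k, m in candidates.items()}
    best = {}
    for k, members in candidates.items():
        for u in members:
            s = cosine(profiles[u], cents[k])
            if u not in best or s > best[u][1]:
                best[u] = (k, s)
    clusters = {}
    for u, (k, _) in best.items():
        clusters.setdefault(k, []).append(u)
    return clusters


def merge_small(profiles, clusters, threshold):
    """Merge clusters below `threshold` into the most centroid-similar small cluster.

    Assumption: a lone remaining small cluster is merged into its nearest cluster of any size.
    """
    clusters = {k: list(m) for k, m in clusters.items()}
    while len(clusters) > 1:
        small = [k for k, m in clusters.items() if len(m) < threshold]
        if not small:
            break
        k = small[0]
        pool = [j for j in small if j != k] or [j for j in clusters if j != k]
        c = centroid([profiles[u] for u in clusters[k]])
        target = max(pool, key=lambda j: cosine(c, centroid([profiles[u] for u in clusters[j]])))
        clusters[target].extend(clusters.pop(k))
    return clusters


profiles = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0], "d": [0.1, 1.0]}  # hypothetical
final = assign_to_closest(profiles, {"x": ["a", "b"], "y": ["a", "c"], "z": ["d"]})
merged = merge_small(profiles, final, threshold=2)
```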
[0023] Once the clusters are obtained, the user profile information included within the
clusters is processed and modified. In one implementation, the plurality of clusters may be
processed to obtain a corresponding differential-privacy (DP) noise enabled cluster having a DP
noise enabled low-dimensional cluster centroid. As will be understood, a DP noise enabled
cluster is a cluster in which the user profiles have been modified by modifying weightages of
features associated with the user profile of the users assigned to the DP noise enabled cluster.
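One possible way to obtain a DP noise enabled centroid is to apply the Laplace mechanism to the cluster mean, as in this Python sketch. It assumes feature weights normalized to [0, 1], neighbouring datasets that differ by replacing one user's profile, and a publicly known cluster size n; the text does not state the mechanism or these bounds, and the function names are hypothetical. Projecting the noisy centroid to low dimensions afterwards is post-processing, so it does not weaken the differential-privacy guarantee.

```python
import random


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def centroid_noise_scale(n, d, epsilon):
    """Laplace scale for a mean of n vectors in [0, 1]^d (L1 sensitivity d / n)."""
    return d / (n * epsilon)


def dp_centroid(member_vectors, epsilon, rng):
    """Noisy centroid of one cluster; n is treated as public (an assumption)."""
    n = len(member_vectors)
    d = len(member_vectors[0])
    mean = [sum(col) / n for col in zip(*member_vectors)]
    scale = centroid_noise_scale(n, d, epsilon)
    return [m + laplace(scale, rng) for m in mean]


rng = random.Random(3)
cluster = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 0.0]]  # hypothetical
noisy = dp_centroid(cluster, epsilon=1.0, rng=rng)
```

The perturbed coordinates are the modified feature weightages; a smaller epsilon or a smaller cluster yields larger perturbations.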
[0024] The DP noise enabled low-dimensional cluster centroid of each cluster is then
transformed into low-dimensional space and compared with a low-dimensional representation of
the TG predicate to determine similarity between the DP noise enabled low-dimensional cluster
centroid and the low-dimensional representation of the TG predicates. The DP noise enabled
low-dimensional cluster centroid having a highest similarity with the low-dimensional
representation of the TG predicates may then be identified based on the comparing and the
cluster corresponding to the DP noise enabled low-dimensional cluster centroid may be
determined as a best matching cluster. Further, one or more DP noise enabled dominant features
of the user profiles corresponding to the users associated with the best matching cluster may then
be obtained. For example, top k, say, 10 or 15 features of the users may be identified and
provided to the service provider for their analysis and further uses.
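The selection and feature-extraction steps can be sketched in Python as below. Cosine similarity is assumed as the similarity measure (the text uses it for clustering but does not name one here), and "dominant" is read as "largest noisy weight". The function names, feature names, and values are hypothetical; k would be around 10 or 15 in the example given above.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def best_matching_cluster(low_dim_centroids, low_dim_predicate):
    """Key of the noisy low-dimensional centroid most similar to the TG predicate."""
    return max(low_dim_centroids, key=lambda k: cosine(low_dim_centroids[k], low_dim_predicate))


def dominant_features(noisy_centroid, feature_names, k):
    """Top-k (feature, weight) pairs of a noisy centroid, highest weight first."""
    order = sorted(range(len(noisy_centroid)), key=lambda i: noisy_centroid[i], reverse=True)
    return [(feature_names[i], noisy_centroid[i]) for i in order[:k]]


centroids = {"c1": [1.0, 0.0], "c2": [0.2, 1.0]}  # hypothetical low-dimensional centroids
best = best_matching_cluster(centroids, [0.0, 1.0])
top = dominant_features([0.1, 0.7, 0.4], ["news", "sports", "travel"], k=2)
print(best, top)  # c2 [('sports', 0.7), ('travel', 0.4)]
```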
[0025] The system(s) and method(s) of the present subject matter thus facilitate the
network providers to offer data-as-a-service in a privacy protected environment. Adding the DP
noise to the cluster of the users helps ensure that the user profiles provided to the
service providers are not exact copies of the users' actual records, thereby significantly reducing
the chances of the service providers identifying the users. Further, clustering the users
into the plurality of clusters before adding the DP noise increases the utility of
the user profiles, as the amount of DP noise added to a cluster is less than the amount of noise added to
individual user profiles. This is because the amount of DP noise to be added to each cluster and
its centroid depends on how much a single user can affect the cluster and the centroid, and the more
users a cluster contains, the less any single user can shift its centroid. Further,
forming clusters of a predetermined minimum size, i.e., having a predetermined number of end
users greater than or equal to the cluster size threshold, helps ensure privacy-protected
clustering, as the DP noise to be added to the cluster is significantly low for the clusters having
the predetermined minimum size.
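The effect of cluster size on the noise can be made concrete with a short derivation. It assumes the Laplace mechanism, feature weights in [0, 1]^d, neighbouring datasets that differ in one user's profile, and a publicly known cluster size n; these are illustrative assumptions rather than the construction stated in the text. T denotes the cluster size threshold, b the Laplace scale, and epsilon the privacy parameter.

```latex
\begin{aligned}
\Delta_1^{\text{user}} &= \max_{x,\,x' \in [0,1]^d} \lVert x - x' \rVert_1 = d,
  & b_{\text{user}} &= \frac{\Delta_1^{\text{user}}}{\varepsilon} = \frac{d}{\varepsilon}, \\
\Delta_1^{\text{cluster}} &= \max_{x_j,\,x_j' \in [0,1]^d} \Bigl\lVert \frac{1}{n}\bigl(x_j - x_j'\bigr) \Bigr\rVert_1 = \frac{d}{n},
  & b_{\text{cluster}} &= \frac{\Delta_1^{\text{cluster}}}{\varepsilon} = \frac{d}{n\,\varepsilon} \le \frac{d}{T\,\varepsilon} \quad (n \ge T).
\end{aligned}
```

Under these assumptions, each coordinate of a cluster centroid receives noise with a scale n times smaller than a directly released profile, and the cluster size threshold T caps that scale from above.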
[0026] It should be noted that the description and figures merely illustrate the principles
of the present subject matter. It will thus be appreciated that those skilled in the art will be able to
devise various arrangements that, although not explicitly described or shown herein, embody the
principles of the present subject matter and are included within its spirit and scope. Furthermore,
all examples recited herein are principally intended expressly to be for pedagogical purposes to
aid the reader in understanding the principles of the present subject matter and the concepts
contributed by the inventor(s) to furthering the art, and are to be construed as being without
limitation to such specifically recited examples and conditions. Moreover, all statements herein
reciting principles, aspects, and embodiments of the present subject matter, as well as specific
examples thereof, are intended to encompass equivalents thereof.
[0027] It will also be appreciated by those skilled in the art that the words "during", "while",
and "when" as used herein are not exact terms that mean an action takes place instantly upon an
initiating action, but that there may be some small but reasonable delay, such as a propagation
delay, between the initial action and the reaction that is initiated by the initial action.
Additionally, the words "connected" and "coupled" are used throughout for clarity of the
description and can include either a direct connection or an indirect connection.
[0028] The manner in which the systems and the methods of the present subject matter
may be implemented is explained in detail with respect to Figures 1 and 2. While
aspects of the described system(s) and method(s) of the present subject matter can be implemented
in any number of different computing systems, environments, and/or configurations, the
embodiments are described in the context of the following system(s).
[0029] Figure 1 illustrates a network environment 100 implementation of privacy
protection while offering data-as-a-service, in accordance with an embodiment of the present
subject matter. The network environment 100 includes a privacy protection system 102,
hereinafter referred to as system 102 communicating with a third party server 104 through a
network 106.
[0030] The network 106 may be a wireless network, a wired network, or a combination
thereof. The network 106 can also be an individual network or a collection of many such
individual networks, interconnected with each other and functioning as a single large network,
e.g., the Internet or an intranet. The network 106 can be implemented as one of the different
types of networks, such as an intranet, local area network (LAN), wide area network (WAN), the
internet, and such. The network 106 may either be a dedicated network or a shared network,
which represents an association of the different types of networks that use a variety of protocols.
Further, the network 106 may include network devices, such as network switches, hubs, routers,
HBAs, for providing a communication link between the system 102 and the third party server
104.
[0031] The system 102 may be implemented as any of a variety of computing devices,
including, for example, servers, a workstation, and a mainframe computer. The third party server
104 may be implemented as any of a variety of computing devices, including, for example,
servers, a desktop PC, a notebook or portable computer, a workstation, a mainframe computer,
an entertainment device, cellular phones, smart phones, personal digital assistants (PDAs),
portable computers, desktop computers, tablet computers, phablets, and an internet appliance.
Although the system 102 is shown as an entity, the system 102 may also be implemented as a
distributed computing system including multiple intermediary nodes distributed over a network
where each node can be implemented as a computing device, such as a laptop computer, a
desktop computer, a notebook, a workstation, a mainframe computer, a server, and the like.
Further, the intermediary nodes may be connected through an intermediate network (not shown
in the figure) for the purpose of communications and exchange of data. Furthermore, one or
more replicas (not shown in the figure) of the system 102 may be implemented in the network
environment 100 with each of the replicas performing functions similar to the system 102.
[0032] In one implementation, the system 102 includes one or more processor(s) 108, I/O
interface(s) 110, and a memory 112 coupled to the processor 108. The processor(s) 108 may be
implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal
processors, central processing units, logic circuitries, and/or any devices that manipulate signals
based on operational instructions. Among other capabilities, the processor(s) 108 is configured to
fetch and execute computer-readable instructions stored in the memory 112.
[0033] The functions of the various elements shown in the figure, including any
functional blocks labeled as "processor(s)", may be provided through the use of dedicated
hardware as well as hardware capable of executing software in association with appropriate
software. When provided by a processor, the functions may be provided by a single dedicated
processor, by a single shared processor, or by a plurality of individual processors, some of which
may be shared. Moreover, explicit use of the term "processor" should not be construed to refer
exclusively to hardware capable of executing software, and may implicitly include, without
limitation, digital signal processor (DSP) hardware, network processor, application specific
integrated circuit (ASIC), field programmable gate array (FPGA), read only memory (ROM) for
storing software, random access memory (RAM), non-volatile storage. Other hardware,
conventional and/or custom, may also be included.
[0034] The I/O interface(s) 110 may include a variety of software and hardware
interfaces, for example, interfaces for peripheral device(s), such as a keyboard, a mouse, and an
external memory. Further, the I/O interfaces 110 may facilitate multiple communications within
a wide variety of protocol types, including operating system to application communication, inter-process
communication, etc.
[0035] The memory 112 can include any computer-readable medium known in the art
including, for example, volatile memory, such as static random access memory (SRAM) and
dynamic random access memory (DRAM), and/or non-volatile memory, such as read only
memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and
magnetic tapes.
[0036] Further, the system 102 may include module(s) 114 and data 116. The modules
114 and the data 116 may be coupled to the processor(s) 108. The modules 114, amongst other
things, include routines, programs, objects, components, data structures, etc., which perform
particular tasks or implement particular abstract data types. The modules 114 may also be
implemented as signal processor(s), state machine(s), logic circuitries, and/or any other device
or component that manipulate signals based on operational instructions. In another aspect of the
present subject matter, the modules 114 may be computer-readable instructions which, when
executed by a processor/processing unit, perform any of the described functionalities. The
machine-readable instructions may be stored on an electronic memory device, hard disk, optical
disk or other machine-readable storage medium or non-transitory medium. In one
implementation, the computer-readable instructions can also be downloaded to a storage
medium via a network connection.
[0037] In an implementation, the module(s) 114 include an interaction module 118, a
clustering module 120, a cluster transformation module 122, a computation module 124, and
other module(s) 126. The other module(s) 126 may include programs or coded instructions that
supplement applications or functions performed by the system 102. The data 116 includes user
data 128, clustering data 130, cluster transformation data 132, computation data 134, and other
data 136. The other data 136, amongst other things, may serve as a repository for storing data that
is processed, received, or generated as a result of the execution of one or more modules in the
module(s) 114. Although the data 116 is shown internal to the system 102, it may be understood
that the data 116 can reside in an external repository (not shown in the figure), which may be
coupled to the system 102. The system 102 may communicate with the external repository
through the I/O interface(s) 110 to obtain information from the data 116.
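The division of work among the modules and data stores can be illustrated with a Python skeleton. All class, method, and key names are hypothetical; the reference numerals in the comments map each piece to Figure 1. The clustering, noise, and matching strategies are injected as functions so that concrete versions of those steps could be plugged in; the placeholders used here add no noise and serve only to show the data flow. The computation module 124 and computation data 134 are not described in this part of the text and are therefore omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Data:
    """Hypothetical in-memory stand-in for data 116 and its stores."""
    user_data: dict = field(default_factory=dict)                    # user data 128
    clustering_data: dict = field(default_factory=dict)              # clustering data 130
    cluster_transformation_data: dict = field(default_factory=dict)  # cluster transformation data 132


class InteractionModule:  # interaction module 118
    def __init__(self, data):
        self.data = data

    def receive_tg_predicates(self, predicates):
        self.data.user_data["tg_predicates"] = predicates

    def load_profiles(self, subscriber_db):
        self.data.user_data["profiles"] = dict(subscriber_db)


class ClusteringModule:  # clustering module 120
    def __init__(self, data, cluster_fn):
        self.data, self.cluster_fn = data, cluster_fn

    def run(self):
        profiles = self.data.user_data["profiles"]
        self.data.clustering_data["clusters"] = self.cluster_fn(profiles)


class ClusterTransformationModule:  # cluster transformation module 122
    def __init__(self, data, noisy_centroid_fn, match_fn):
        self.data = data
        self.noisy_centroid_fn, self.match_fn = noisy_centroid_fn, match_fn

    def run(self):
        profiles = self.data.user_data["profiles"]
        clusters = self.data.clustering_data["clusters"]
        store = self.data.cluster_transformation_data
        store["centroids"] = {
            k: self.noisy_centroid_fn([profiles[u] for u in members])
            for k, members in clusters.items()
        }
        store["best"] = self.match_fn(store["centroids"], self.data.user_data["tg_predicates"])
        return store["best"]


def split_on_first_feature(profiles):
    """Placeholder clustering: split users on the first feature."""
    return {
        "k0": [u for u, x in profiles.items() if x[0] >= 0.5],
        "k1": [u for u, x in profiles.items() if x[0] < 0.5],
    }


def plain_mean(vectors):
    """Placeholder: mean without noise; a DP centroid would be used in practice."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dot_match(centroids, predicate):
    """Placeholder: pick the centroid with the largest dot product with the predicate."""
    return max(centroids, key=lambda k: sum(a * b for a, b in zip(centroids[k], predicate)))


data = Data()
interaction = InteractionModule(data)
interaction.load_profiles({"u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [0.0, 1.0]})
interaction.receive_tg_predicates([0.0, 1.0])
ClusteringModule(data, split_on_first_feature).run()
best = ClusterTransformationModule(data, plain_mean, dot_match).run()
```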
[0038] The system 102 may be implemented by a network provider, say, a
communication network provider, for providing subscriber data to the third party server 104. The
subscriber data may include, but is not limited to, user profiles of users registered with the network
provider. The third party server 104 may be hosted by a third party service provider, such as
an advertiser or an analyst. The third party service provider may use the user profiles for various
purposes, such as offering personalized services to the users, providing advertisements directed
to the users based on their personal interests, and offering new services. In order to obtain user
profiles of users having particular features relevant for their analysis, the service
providers may provide target group (TG) predicates, i.e., a set of conditions or features based on
which the network provider may select the users whose user profiles may be provided to the
service providers for their analysis. The network provider may thus host the system 102 to offer
data-as-a-service to the third party service provider, hereinafter referred to as service provider.
[0039] For instance, the owner of a sports goods shop located between an office
complex and a residential complex may wish to get insights about the type of sports inventory that
the shop owner should stock so as to attract the maximum number of people commuting via the
route between the residential complex and the office complex. The shop owner may further wish
to place advertisements for such products on relevant websites which such users have a high
chance of visiting. Further, amongst the users that commute via this route, the shop owner may
wish to target the users who have an interest in sports. From such a target group, the shop owner
may want to know which kinds of sports websites or sports products these users are interested in.
The shop owner may thus create, through the third party server 104, TG predicates having
demographics, such as age, gender, and income of the users; location, such as the residential
complex, the office complex, and the route for commuting; and semantics, such as sports.
The third party server 104 may further transmit the TG predicates to the system 102.
[0040] According to an embodiment of the present subject matter, the system 102 may
enable the network provider to provide the user profiles to the service provider, through the
third party server 104, while ensuring that the privacy of the users associated with the user profiles is
retained. Initially, the interaction module 118 may receive the TG predicates from the third party
server 104 and save the same in the user data 128. The interaction module 118 may further
obtain the subscriber data having the user profiles of a plurality of users registered with the
network provider. In one implementation, the interaction module 118 may access a subscriber
database 138 coupled to the system 102 to obtain the user profiles. The subscriber database 138
may be understood as a database maintained by the network provider for storing user profiles of
the users. The user profiles for each user may be generated and regularly updated by the network
provider based on location tracks gleaned from base-station associations, web URLs accessed by
a user, call data records (CDRs), billing and demographic information, etc.
[0041] For instance, in the above-described example of the sports goods shop, the
interaction module 118 may obtain user profiles of the users that commute along the route
between the office complex and the residential complex, are interested in sports-related activities,
and have demographic information as mentioned in the TG predicates.
[0042] Upon obtaining the user profiles of all the users, the clustering module 120 may
process the user profiles to cluster the users into a plurality of clusters. Initially, for
clustering the users, the clustering module 120 may transform the user profile corresponding to
each of the plurality of users to obtain a low-dimensional signature of dimensionality 'K' bits
corresponding to each user. In one implementation, K may be chosen using equation (1) as given
below:

$$K = O\left(\frac{\log N}{\gamma^{2}}\right) \qquad (1)$$

where N is the number of user profiles to be transformed, and γ is a utility parameter.
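Equation (1) fixes K only up to a constant factor. The following is a minimal sketch, assuming a hypothetical constant `c` (the text gives none) and the natural logarithm; the function name is illustrative only:

```python
import math


def signature_bits(n_profiles: int, gamma: float, c: float = 4.0) -> int:
    """One reading of equation (1): K = ceil(c * ln(N) / gamma**2).

    `c` is a hypothetical constant; equation (1) only fixes K up to O(.).
    """
    if n_profiles < 2:
        raise ValueError("need at least two user profiles")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma is expected to lie in (0, 1)")
    return math.ceil(c * math.log(n_profiles) / gamma ** 2)


# About 70,000 profiles (the size of the evaluation dataset) with gamma = 0.5
print(signature_bits(70_000, 0.5))
```

K grows only logarithmically with the number of profiles but quadratically as the permitted distortion γ shrinks.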
[0043] The user profiles may be transformed using random projection schemes, such as the
JL-Lemma and locality-sensitive hashing (LSH). For instance, in order to use LSH for
transforming the user profiles, the clustering module 120 may take each user profile, represented as
U1, and project it along 'K' independently chosen random vectors, say, r1, r2, ...,
rK, to obtain a projected user profile U1' of dimensionality 'K' bits. The random vectors
r1, r2, ..., rK may be drawn from a Gaussian distribution N(0, 1), i.e., with zero mean and unit
standard deviation. The projected user profile U1' is a K-bit bit-vector, where the value of the i-th bit depends
on whether the vector dot product (U1 · ri) is positive or negative. If the dot product
is positive, the i-th bit is 1; if the dot product is negative, the i-th bit is 0. The
K-bit bit-vector thus obtained for each profile may be determined as the low-dimensional
signature corresponding to the user. In one implementation, K may be chosen to ensure that the random projection
via the JL-Lemma or LSH technique maintains the distances between any pair of points within a
distortion bound of (1 + γ).
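A minimal sketch of the sign-of-dot-product signature described above (a scheme often called SimHash). The function names, the seed, and the toy four-feature profiles are illustrative assumptions; plain Python lists stand in for real profile vectors:

```python
import random
from typing import List, Sequence


def random_vectors(k: int, dim: int, seed: int) -> List[List[float]]:
    """Draw K independent random vectors r1..rK whose entries follow N(0, 1)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


def signature(profile: Sequence[float], vectors: Sequence[Sequence[float]]) -> List[int]:
    """K-bit signature: bit i is 1 if the dot product U . r_i is positive, else 0."""
    bits = []
    for r in vectors:
        dot = sum(u * x for u, x in zip(profile, r))
        bits.append(1 if dot > 0 else 0)  # a dot product of exactly 0 maps to 0 (assumption)
    return bits


vectors = random_vectors(k=16, dim=4, seed=7)
u1 = [0.9, 0.1, 0.0, 0.4]
u2 = [0.8, 0.2, 0.0, 0.5]  # a profile similar to u1
# Count of signature bits on which the two similar profiles agree
print(sum(a == b for a, b in zip(signature(u1, vectors), signature(u2, vectors))))
```

Because only the sign of each dot product is kept, the signature depends on the direction of a profile and not on its length, which matches the cosine-similarity comparisons used later.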
[0044] Further, the clustering module 120 may initiate an initial clustering process to
cluster the users into one or more clusters based on the low-dimensional signature corresponding
to each user. In one implementation, the clustering module 120 may cluster the users using a
variant of the random projection technique called LSHForest. In said technique, the clustering
module 120 forms a prefix tree of the low-dimensional signatures obtained for the users such that
all users are represented as leaves in the LSH-Prefix-Tree. Further, the clusters may be formed
such that the number of users assigned to each cluster is greater than a cluster size threshold, say,
a predetermined value 'M'. The cluster size threshold may be understood as a predetermined
minimum number of users that have to be subscribed to a cluster at any given point of time.
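The prefix-tree idea with a minimum cluster size can be sketched as follows. This is a simplified stand-in written for illustration, not the LSHForest algorithm itself: users are split top-down on successive signature bits, and a node is split only when both children keep at least `m` users. The text says "greater than" the threshold, so whether the boundary is inclusive is an assumption here, as are all names:

```python
from typing import Dict, Hashable, List, Sequence


def prefix_tree_clusters(signatures: Dict[Hashable, Sequence[int]], m: int) -> List[List[Hashable]]:
    """Split users top-down on successive signature bits, as in a binary prefix tree.

    A node is split on its next bit only when both children keep at least `m`
    users; otherwise the node becomes a cluster.
    """
    if not signatures:
        return []
    k = len(next(iter(signatures.values())))
    clusters: List[List[Hashable]] = []

    def split(users: List[Hashable], depth: int) -> None:
        if depth == k:
            clusters.append(users)
            return
        zeros = [u for u in users if signatures[u][depth] == 0]
        ones = [u for u in users if signatures[u][depth] == 1]
        if not zeros or not ones:
            split(users, depth + 1)  # bit does not separate these users; go deeper
        elif len(zeros) >= m and len(ones) >= m:
            split(zeros, depth + 1)
            split(ones, depth + 1)
        else:
            clusters.append(users)  # splitting would leave a child below the threshold

    split(list(signatures), 0)
    return clusters


sigs = {"a": (0, 0), "b": (0, 0), "c": (0, 1), "d": (1, 1), "e": (1, 1), "f": (1, 0)}
print(prefix_tree_clusters(sigs, m=2))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```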
[0045] The clustering module 120 may further repeat the initial clustering process, i.e.,
iterate the steps of transforming the user profiles to obtain the low-dimensional signatures and
clustering the users based on the low-dimensional signatures for a predetermined number of
times, say, L, to obtain a predetermined number of sets of clusters. For instance, in case the
random projection is LSH, 'L' may be a standard LSH parameter. The predetermined
number of sets is formed such that each set contains one or more clusters of the users and
each user is associated with one cluster in each of the predetermined number of sets. Thus, if L
sets are formed, each user will be associated with L clusters. Creating the predetermined number
of sets of clusters helps increase the utility of the user profiles, as such iterations help avoid
possible errors that may be caused by the random nature of the LSHForest technique.
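The L repetitions can be sketched as running a single clustering round with fresh randomness each time and recording, for every user, the one cluster it lands in per set. The round function passed in is hypothetical (here a toy random split); in practice it would combine the signature and prefix-tree steps:

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

# A hypothetical single clustering round: takes a seed, returns a list of clusters.
ClusterRound = Callable[[int], List[List[Hashable]]]


def repeat_clustering(cluster_once: ClusterRound, rounds: int) -> Dict[Hashable, List[Tuple[int, int]]]:
    """Run the initial clustering `rounds` (= L) times with fresh randomness.

    Returns, for every user, the (set index, cluster index) of the single
    cluster it falls into in each set, i.e. L candidate clusters per user.
    """
    membership: Dict[Hashable, List[Tuple[int, int]]] = defaultdict(list)
    for set_idx in range(rounds):
        seen = set()
        for cluster_idx, cluster in enumerate(cluster_once(set_idx)):
            for user in cluster:
                if user in seen:
                    raise ValueError(f"user {user!r} appears twice in set {set_idx}")
                seen.add(user)
                membership[user].append((set_idx, cluster_idx))
    return dict(membership)


# Toy round for illustration: shuffle four users with the given seed, split in halves.
USERS = ["u1", "u2", "u3", "u4"]


def toy_round(seed: int) -> List[List[str]]:
    rng = random.Random(seed)
    shuffled = USERS[:]
    rng.shuffle(shuffled)
    return [shuffled[:2], shuffled[2:]]


membership = repeat_clustering(toy_round, rounds=3)
print(membership["u1"])  # three (set, cluster) pairs, one per set
```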
[0046] Once the predetermined number of sets of clusters is formed, the clustering
module 120 refines the clusters to obtain the plurality of clusters such that each
user is associated with just one cluster from among the plurality of clusters. The clustering
module 120 initially computes a centroid of each of the one or more clusters, for each of the
predetermined number of sets of clusters. Further, the clustering module 120 ascertains a cosine
similarity of each user, i.e., of the user profile corresponding to the user, with the centroid of each of
the predetermined number of clusters with which the user is associated. Based on the cosine
similarity of the user with each centroid, the clustering module 120 determines a closest cluster
for each user and assigns the user to the corresponding closest cluster to obtain the plurality of
clusters such that each user is associated with just one cluster. In one embodiment, the clustering
module 120 may further analyze each of the plurality of clusters to determine small size clusters,
i.e., clusters having fewer users than the cluster size threshold. The clustering module
120 may then determine the cosine similarity between the centroids of the small size clusters
to determine two or more small size clusters that may be merged together to obtain clusters
having a number of users greater than the cluster size threshold. Thus, the small size clusters that
are closest to each other may be merged together by the clustering module 120 such that the
size of each of the plurality of clusters may be greater than the cluster size threshold.
Further, the clustering module 120 may save the plurality of clusters thus generated in the clustering
data 130.
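The refinement and merging steps above can be sketched in plain Python. The greedy merge order (always start from the first small cluster) is an assumption, and so is leaving a single remaining small cluster untouched, since the text does not say what happens in that case; all names are illustrative:

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Coordinate-wise mean of equally long vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def refine(profiles: Dict[str, Vector], sets: List[List[List[str]]]) -> List[List[str]]:
    """Keep, for each user, only the candidate cluster whose centroid is most similar."""
    best: Dict[str, Tuple[float, Tuple[int, int]]] = {}
    for s, clusters in enumerate(sets):
        for c, members in enumerate(clusters):
            cen = centroid([profiles[u] for u in members])
            for u in members:
                sim = cosine(profiles[u], cen)
                if u not in best or sim > best[u][0]:
                    best[u] = (sim, (s, c))
    groups: Dict[Tuple[int, int], List[str]] = {}
    for u, (_, key) in best.items():
        groups.setdefault(key, []).append(u)
    return list(groups.values())


def merge_small(profiles: Dict[str, Vector], clusters: List[List[str]], m: int) -> List[List[str]]:
    """Greedily merge clusters below size m with the most similar other small cluster."""
    clusters = [list(c) for c in clusters]
    while True:
        small = [i for i, c in enumerate(clusters) if len(c) < m]
        if len(small) < 2:
            return clusters  # a lone small cluster is left as-is
        i = small[0]
        ci = centroid([profiles[u] for u in clusters[i]])
        j = max(small[1:], key=lambda k: cosine(ci, centroid([profiles[u] for u in clusters[k]])))
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i stays valid


profiles = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
sets = [[["a", "b", "c", "d"]], [["a", "b"], ["c", "d"]]]  # L = 2 sets
print(refine(profiles, sets))  # [['a', 'b'], ['c', 'd']]
print(merge_small(profiles, [["a"], ["c"], ["b"], ["d"]], m=2))  # [['a', 'b'], ['c', 'd']]
```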
[0047] The plurality of clusters thus generated may be further analyzed and processed by
the cluster transformation module 122. The cluster transformation module 122 may transform the
cluster centroids of the plurality of clusters to a low-dimensional space using random
projections, such as the JL-Lemma and LSH technique. Subsequently, the cluster transformation
module 122 may perform a Laplacian transformation on each of the plurality of clusters to obtain a
corresponding differential-privacy (DP) noise enabled cluster having a DP noise enabled low-dimensional
cluster centroid.
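For the centroid transformation, a dense Gaussian (JL-style) random projection can be sketched as below. This shows only the JL-style variant, not LSH; the dimensions, the seed, and the function names are illustrative assumptions:

```python
import math
import random
from typing import List, Sequence


def gaussian_projection(k: int, dim: int, seed: int) -> List[List[float]]:
    """A K x D matrix with i.i.d. N(0, 1) entries, as used for JL-style projections."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


def project(vector: Sequence[float], matrix: Sequence[Sequence[float]]) -> List[float]:
    """Map a D-dimensional vector to K dimensions.

    The 1/sqrt(K) factor preserves squared Euclidean norms in expectation.
    """
    k = len(matrix)
    return [sum(r * x for r, x in zip(row, vector)) / math.sqrt(k) for row in matrix]


# The same matrix must be reused for every centroid (and for the TG predicates),
# otherwise the low-dimensional vectors are not comparable.
R = gaussian_projection(k=8, dim=5, seed=42)
centroid_low = project([0.5, 0.5, 0.0, 0.7, 0.1], R)
print(centroid_low)
```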
[0048] A DP noise enabled cluster may be defined as a cluster in which the user profiles
have been altered by modifying the weightages of the features associated with each user profile. In one
implementation, the weightages of the features associated with the user profile are modified by
adding random numerical values drawn from a Laplacian distribution to obtain the modified
weightages. Adding such DP noise helps ensure that the service provider is not able to
identify the users whose user profiles are shared with the service provider, as the shared user profiles are
no longer exact copies of the actual user profiles. Further, clustering the users such
that each user is associated with just one cluster helps decrease the DP noise added to the
clusters, as such a clustering ensures that each user can affect just one cluster and its associated
centroid. As will be understood, the amount of DP noise that needs to be added to compute the
DP noise enabled centroids is a function of how much each user can affect the centroid of the
cluster to which it belongs. Given that the minimum size of each cluster is 'M', the
amount of DP noise added to ensure (ε, δ)-differential privacy may be determined to be equal to the value computed
by equation 2 given below:
$$\text{DP noise} = \mathrm{Laplacian}\left(\frac{8 \times L \times D \times \ln(\ldots)}{M \times (2 \times \varepsilon)}\right) \qquad (2)$$
where D is the intrinsic dimensionality, i.e., the number of distinct features across all users in a
cluster, L is the number of iterations of the clustering process performed to obtain the clusters, M is
the threshold cluster size, and ε and δ satisfy $\Pr[f(D_1) = v] \le e^{\varepsilon} \cdot \Pr[f(D_2) = v] + \delta$,
where $D_1$ and $D_2$ are two datasets (with and without a single user record) that are
indistinguishable by looking at the results v of the function f(). The amount of DP noise thus
added is substantially lower than in other clustering methods, such as KMeans clustering, as the
above-described process of clustering ensures that each user affects at most one cluster in each of
the predetermined number of iterations.
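Adding Laplacian noise to a centroid can be sketched with the Python standard library, which has no Laplace sampler: the difference of two independent exponential draws with rate 1/b is Laplace(0, b). The noise scale b is taken as an input here and is not computed from equation (2); the function names are illustrative:

```python
import random
from typing import List, Sequence


def laplace_sample(scale: float, rng: random.Random) -> float:
    """One Laplace(0, scale) draw, as the difference of two Exponential(rate=1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_centroid(centroid: Sequence[float], scale: float, seed: int) -> List[float]:
    """Add independent Laplace(0, scale) noise to every coordinate of a centroid.

    `scale` is an input here; in the text it would follow from the noise equations.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    rng = random.Random(seed)
    return [c + laplace_sample(scale, rng) for c in centroid]


print(noisy_centroid([0.6, 0.2, 0.1], scale=0.05, seed=3))
```

A larger minimum cluster size M shrinks each user's influence on a centroid and therefore permits a smaller scale, which is the intuition behind M appearing in the denominator of the noise expression.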
[0049] Further, transforming the cluster centroids before performing the Laplacian
transformation helps reduce the amount of DP noise that needs to be added to the
clusters, as the DP noise to be added becomes independent of the intrinsic dimensionality
'D'. The resulting DP noise that needs to be added is thus equal to the value computed by
equation 3 given below:

$$\text{DP noise} = \mathrm{Laplacian}(\ldots) \qquad (3)$$
[0050] The DP noise enabled clusters and the DP noise enabled low-dimensional cluster
centroids thus obtained may be saved in the cluster transformation data 132. The computation
module 124 may then compare the DP noise enabled low-dimensional cluster centroid of each
cluster with a low-dimensional representation of the TG predicates. In one implementation, the
computation module 124 may obtain the low-dimensional representation by transforming the TG
predicates using the random projections, such as the JL-Lemma and LSH technique, used to
transform the cluster centroids of the plurality of clusters to the low-dimensional space. The
computation module 124 may subsequently determine, based on the comparison, a cosine similarity
between each DP noise enabled low-dimensional cluster centroid and the low-dimensional
representation of the TG predicates to identify a best matching cluster. In one implementation, the
cluster corresponding to the DP noise enabled low-dimensional cluster centroid that has the highest
cosine similarity with the low-dimensional representation of the TG predicates may be
determined as the best matching cluster by the computation module 124.
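Selecting the best matching cluster reduces to an argmax of cosine similarity. A minimal sketch, assuming the noisy centroids and the TG predicates have already been projected into the same low-dimensional space; the cluster ids and vectors are made up for illustration:

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def best_matching_cluster(noisy_centroids: Dict[str, Sequence[float]], tg_low_dim: Sequence[float]) -> str:
    """Return the id of the cluster whose DP noise enabled centroid is closest to the TG vector."""
    if not noisy_centroids:
        raise ValueError("no clusters to compare")
    return max(noisy_centroids, key=lambda cid: cosine(noisy_centroids[cid], tg_low_dim))


centroids = {"c1": [1.0, 0.0], "c2": [0.6, 0.8], "c3": [-1.0, 0.0]}
print(best_matching_cluster(centroids, [0.5, 0.9]))  # c2
```

Only noisy centroids enter this comparison, so the selection itself reveals nothing beyond what the DP noise enabled centroids already expose.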
[0051] For instance, in the above-described example of the sports goods shop, the
clustering module 120 may cluster the user profiles into the plurality of clusters and the
computation module 124 may determine the cluster that best matches the TG predicates provided
by the shop owner.
[0052] The computation module 124 further determines one or more DP noise enabled
dominant features of the user profiles corresponding to the users associated with the best
matching cluster by analyzing the DP noise enabled cluster. The computation module 124
thus computes a target group of the users, i.e., the users associated with the best matching cluster,
and the DP noise enabled dominant features of the target group. For example, the top k, say, 10 or 15,
features of the users may be identified as the DP noise enabled dominant features to be
provided to the third party server. In one implementation, the computation module 124 may
determine the DP noise enabled dominant features using one or more known top-K algorithms
based on the exponential mechanism of differential privacy. The computation module 124 may further
generate one or more histograms of the DP noise enabled dominant features to be provided
to the service provider. Further, the computation module 124 may save the DP noise enabled
dominant features and/or the histograms of the DP noise enabled dominant features in the
computation data 134. The interaction module 118 may subsequently transmit the DP noise
enabled dominant features and/or the histograms of the DP noise enabled dominant features to
the third party server 104.
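One common way to pick top-k features with the exponential mechanism is "peeling": select k features one at a time without replacement, spending ε/k per round and sampling each feature with probability proportional to exp(ε_round · score / (2Δ)). The sketch below shows that approach; it is not necessarily the top-K algorithm the text refers to, and the feature scores and the sensitivity Δ are assumptions supplied by the caller:

```python
import math
import random
from typing import Dict, List


def dp_top_k(scores: Dict[str, float], k: int, epsilon: float, sensitivity: float, seed: int) -> List[str]:
    """Pick k features without replacement via the exponential mechanism.

    Each of the k rounds uses budget epsilon / k and samples feature f with
    probability proportional to exp(eps_round * score(f) / (2 * sensitivity)).
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of features")
    rng = random.Random(seed)
    eps_round = epsilon / k
    remaining = dict(scores)
    chosen: List[str] = []
    for _ in range(k):
        names = list(remaining)
        top = max(remaining.values())  # subtracting the max keeps exp() from overflowing
        weights = [math.exp(eps_round * (remaining[n] - top) / (2 * sensitivity)) for n in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


# Hypothetical aggregate feature scores for the best matching cluster
scores = {"football": 50.0, "tennis": 30.0, "chess": 1.0, "golf": 0.0}
print(dp_top_k(scores, k=2, epsilon=1.0, sensitivity=1.0, seed=11))
```

With a small ε the output is close to random; as ε grows, the selection converges to the true top-k features.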
[0053] For instance, in the above-described example of the sports goods shop, the
computation module 124 may determine the DP noise enabled dominant features of the best
matching cluster, and the interaction module 118 may subsequently transmit the DP noise
enabled dominant features to the shop owner via the third party server 104. The computation
module 124 may further generate a histogram based on the DP noise enabled dominant features.
In said example, the DP noise enabled dominant features and the histogram allow the shop
owner to determine which kinds of sports-related websites the desired TG members, or users,
visit. The histogram may further include popular clickstream data that enables the shop owner
to determine the products whose advertisements were clicked on by the users. All this
information can be used by the shop owner to stock appropriate inventory, and also to place targeted
advertisements about his shop on the popular websites visited by these subscribers. The system
102 may further enable the shop owner to render advertisements directly to these TG
members using chargeable application programming interfaces (APIs) for pushing targeted
advertisements to the users, while ensuring privacy protection of its users by not revealing
the actual identities and user profiles of the users who were members of the best matching cluster.
[0054] Figure 2 illustrates a method 200 of privacy protection while offering data-as-a-service,
in accordance with an embodiment of the present subject matter. The order in which the
method is described is not intended to be construed as a limitation, and any number of the
described method blocks can be combined in any order to implement the method 200 or any
alternative methods. Additionally, individual blocks may be deleted from the methods without
departing from the spirit and scope of the subject matter described herein. Furthermore, the
method(s) can be implemented in any suitable hardware, software, firmware, or combination
thereof.
[0055] The method(s) may be described in the general context of computer executable
instructions. Generally, computer executable instructions can include routines, programs, objects,
components, data structures, procedures, modules, functions, etc., that perform particular
functions or implement particular abstract data types. The method may also be practiced in a
distributed computing environment where functions are performed by remote processing devices
that are linked through a communications network. In a distributed computing environment,
computer executable instructions may be located in both local and remote computer storage
media, including memory storage devices.
[0056] A person skilled in the art will readily recognize that steps of the method(s) can
be performed by programmed computers. Herein, some embodiments are also intended to cover
program storage devices, for example, digital data storage media, which are machine or
computer readable and encode machine-executable or computer-executable programs of
instructions, where said instructions perform some or all of the steps of the described method.
The program storage devices may be, for example, digital memories, magnetic storage media,
such as magnetic disks and magnetic tapes, hard drives, or optically readable digital data
storage media. The embodiments are also intended to cover both communication network and
communication devices configured to perform said steps of the exemplary method(s).
[0057] At block 202, target group (TG) predicates are received by a network provider
from a service provider. In one implementation, a privacy protection system, for example, the
system 102, may receive the TG predicates from a third party server, say, the third party server
104 hosted by the service provider. The TG predicates may be defined as a set of conditions
based on which the network provider may select user profiles to be provided to the service
provider.
[0058] At block 204, a plurality of users registered with the network provider is clustered
into a plurality of clusters using a method of random projection. In one implementation, the users
are clustered based on user profiles corresponding to each of the plurality of users. Initially, the
user profile corresponding to each user is transformed using random projections, such as the JL-Lemma
and locality-sensitive hashing, to obtain a low-dimensional signature corresponding to
each user profile. The users may then be clustered into one or more clusters based on the low-dimensional
signatures such that the number of users assigned to each cluster is greater than a
cluster size threshold, i.e., a predetermined minimum number of users that have to be subscribed
to a cluster at any given point of time. The steps of transforming the user profiles and clustering
the users based on the low-dimensional signatures may subsequently be iterated to obtain a
predetermined number of sets, such that each set of clusters contains one or more clusters of the
users and each user is associated with a cluster in each of the predetermined number of sets.
Each user is thus associated with a predetermined number of clusters.
[0059] Further, for each user, a closest cluster is identified from among the
predetermined number of clusters with which the user is associated, based on a cosine similarity
between the user profile of the user and the centroid of each cluster. Each user is then assigned to the
corresponding closest cluster to obtain the plurality of clusters such that each user is associated
with just one cluster. Each cluster may be further analyzed to determine if the number of users
subscribed to the cluster is less than the cluster size threshold. Two or more small size clusters,
having fewer users than the cluster size threshold, may then be merged together to obtain clusters
having a number of users greater than the cluster size threshold. Upon merging, the size of
each of the plurality of clusters may thus be greater than the cluster size threshold.
[0060] At block 206, a Laplacian transformation is performed on each of the plurality of
clusters to obtain a corresponding differential-privacy (DP) noise enabled cluster having a DP
noise enabled low-dimensional cluster centroid. In one implementation, the system 102 may
initially transform the centroid of each cluster to a low-dimensional space using random
projections, such as the JL-Lemma and LSH technique, so that the DP noise enabled low-dimensional
cluster centroids are computed in the low-dimensional space. As will be understood,
a DP noise enabled cluster is a cluster in which the user profiles have been modified by either
adding additional arbitrary data or replacing one or more fields of the user profiles with
arbitrary data.
[0061] At block 208, a best matching cluster is determined from among the plurality of
clusters. In one implementation, the best matching cluster corresponds to the DP noise enabled
low-dimensional cluster centroid having the highest cosine similarity with a low-dimensional
representation of the TG predicates.
[0062] At block 210, one or more DP noise enabled dominant features of the user
profiles are obtained. In one implementation, DP noise enabled dominant features are obtained
for one or more users associated with the best matching cluster. Once obtained, the DP noise
enabled dominant features may be provided to the service provider.
[0063] Fig. 3 illustrates a graph 300 depicting a comparison of data utility and data
privacy of user profiles provided to the third party service providers using the present subject
matter and conventional techniques.
[0064] In the graph 300, the number of users desired to be in a target group whose data is
provided to the service provider is represented along a horizontal axis 302, while the average cosine
similarity between the user profiles and the TG predicates is represented along a vertical axis 304.
Thus, for a particular set of TG predicates, any point (x, y) on the graph 300 denotes that for a
desired TG size of 'x', the average cosine similarity of the members of the returned user profiles
was 'y'.
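The y-value of the graph can be written as a short computation: the mean cosine similarity between each returned profile and the TG predicate vector. The sketch assumes profiles and predicates are plain vectors over the same features; the function names are illustrative:

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def average_cosine(returned_profiles: List[Sequence[float]], tg: Sequence[float]) -> float:
    """Mean cosine similarity of the returned profiles to the TG predicates."""
    if not returned_profiles:
        raise ValueError("empty target group")
    return sum(cosine(p, tg) for p in returned_profiles) / len(returned_profiles)


print(average_cosine([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0]))  # 0.5
```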
[0065] For the comparison between the different techniques, data was obtained from a
dataset of a microblogging site. The data was obtained using public APIs of the microblogging
site, and the URLs used by the users of the microblogging site were used to build semantic profiles of
every user. In order to do so, keywords in the HTML content of the URLs were extracted and
used to create keyword histograms of the URLs for mapping each URL to an ontological
category. By mapping all the URLs used by a user in their texts, an ontological profile of every
user was created as a histogram over approximately 1320 ontology categories. Thus, the
intrinsic dimensionality of the dataset was approximately 1320, and each user's profile was represented
using a unit-norm (L2) weighted histogram over the category weights of this ontology. The
dataset comprised approximately 70,000 users.
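A unit-norm (L2) category histogram for one user can be sketched as follows. Using raw URL counts per category as the weights is an assumption (the text does not say how category weights were computed), the category labels are hypothetical, and mapping each URL to a category is assumed to have been done upstream:

```python
import math
from collections import Counter
from typing import Dict, Iterable


def ontology_profile(url_categories: Iterable[str]) -> Dict[str, float]:
    """Unit-norm (L2) weighted histogram over the ontology categories of a user's URLs."""
    counts = Counter(url_categories)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {cat: c / norm for cat, c in counts.items()} if norm else {}


# Hypothetical category labels, one per URL posted by the user.
profile = ontology_profile(["Sports/Football", "Sports/Football", "News"])
print(profile)
```

Normalizing to unit L2 norm makes the dot product between two profiles equal to their cosine similarity.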
[0066] Curve 306 represents data utility and data privacy achieved when the user profiles
are provided as raw individual user-level records after replacing user names with an arbitrary
identifier. As illustrated, although this approach maximizes data utility, it suffers from privacy
leaks for the reasons previously described. Curve 308 represents
data utility and data privacy achieved when the user profiles are randomly selected for being
provided to the service provider. Curve 310 represents data utility and data privacy achieved
when the user profiles are provided after performing a DP noise enabled low-dimensional
transformation of individual user profiles using the JL-Lemma. As previously described, in said
approach, each individual user's user data is first transformed to a low-dimensional space using a
random projection technique called the JL-Lemma, and DP noise is subsequently added. Curve
312 represents data utility and data privacy achieved when the user profiles are provided after
performing a low-dimensional transformation of individual user profiles using the JL-Lemma. In said
approach, no DP noise is added to the user profiles.
[0067] Curve 314 represents data utility and data privacy achieved when the user profiles
are provided after transformation of each user profile using KMeans clustering algorithm. Curve
316 represents data utility and data privacy achieved when the user profiles are provided after
DP noise enabled transformation of each user profile using KMeans clustering algorithm. As
illustrated, using the KMeans clustering algorithm results in a high amount of DP noise, since each
user can affect the clustering process significantly.
[0068] Curve 318 represents data utility and data privacy achieved when the user profiles
are provided after transformation of each user profile using a method of clustering the user
profiles before transformation. Curve 320 represents data utility and data privacy achieved when
the user profiles are provided after DP noise enabled transformation of each user profile using
the present subject matter. As illustrated in the graph 300, the method proposed as per the present
subject matter offers significantly better data utility and data privacy than the other DP enabled
versions of the different approaches.
[0069] Although embodiments for the present subject matter have been described in a
language specific to structural features and/or method(s), it is to be understood that the invention
is not necessarily limited to the specific features or method(s) described. Rather, the specific
features and methods are disclosed as exemplary embodiments of the present subject matter.

I/We claim:
1. A method for privacy protection while offering data-as-a-service, the method comprising:
receiving target group (TG) predicates from a service provider, wherein the TG
predicates define a set of conditions for a network provider for selecting data to be provided to
the service provider as a part of offering the data-as-a-service;
clustering a plurality of users registered with the network provider into a plurality of
clusters based on user profiles corresponding to each of the plurality of users;
transforming each of the plurality of clusters to obtain a corresponding differential-privacy
(DP) noise enabled cluster having a DP noise enabled low-dimensional cluster
centroid;
determining a best matching cluster, from among a plurality of DP noise enabled clusters,
wherein the best matching cluster corresponds to the DP noise enabled low-dimensional
cluster centroid having a highest similarity with a low-dimensional representation of the TG
predicates; and
obtaining one or more DP noise enabled dominant features of the user profiles
corresponding to one or more users, from among the plurality of users, associated with the
best matching cluster for being provided to the service provider as a part of the data-as-a-service.
2. The method as claimed in claim 1, wherein the clustering further comprises:
creating a predetermined number of sets of clusters of the plurality of users, wherein each
set of clusters contains one or more clusters of the plurality of users, and wherein each of the
plurality of users is associated with one cluster in each of the predetermined number of sets,
and wherein the number of users assigned to each cluster is greater than a cluster size
threshold;
computing, for each of the predetermined number of sets, a centroid of each of the one or
more clusters;
ascertaining, for each of the predetermined number of sets, a cosine similarity of each of
the plurality of users with the centroid of the cluster with which the user is associated in the
predetermined number of sets to determine a closest cluster for the user; and
assigning each of the plurality of users to the corresponding closest cluster to obtain the
plurality of clusters, wherein each of the plurality of users is associated with at most one
cluster, and wherein the number of users assigned to each cluster is greater than the cluster size
threshold.
3. The method as claimed in claim 2, wherein the creating the predetermined number of sets of
clusters further comprises:
transforming the user profile corresponding to each of the plurality of users using random
projections to obtain a low-dimensional signature corresponding to each of the user profiles;
clustering the plurality of users into the one or more clusters based on the low-dimensional
signatures corresponding to each user, wherein the number of users assigned to
each cluster is greater than the cluster size threshold; and
iterating, for a predetermined number of times equal to the predetermined number of sets,
the transforming and the clustering the plurality of users into the one or more clusters to
obtain the predetermined number of sets.
4. The method as claimed in claim 2, wherein the clustering further comprises:
comparing, for each of the plurality of clusters, the number of users subscribed to the
cluster with the cluster size threshold to identify one or more small size clusters, wherein
the number of users subscribed to each small size cluster is less than the cluster size threshold;
and
merging two or more small size clusters from among the one or more small size clusters
to obtain clusters having a number of users greater than the cluster size threshold, wherein
the two or more small size clusters are merged based on the cosine similarity of the cluster centroids
of the two or more small size clusters.
5. The method as claimed in claim 1, wherein the determining the best matching cluster further
comprises comparing the DP noise enabled low-dimensional cluster centroid of each of the
plurality of clusters with the low-dimensional representation of the TG predicates to determine
a cosine similarity between the DP noise enabled low-dimensional cluster centroid and the low-dimensional
representation of the TG predicates.
6. The method as claimed in claim 1, wherein the determining the best matching cluster further
comprises transforming the TG predicates using random projections to obtain the low-dimensional
representation of the TG predicates.
7. The method as claimed in claim 1, wherein the transforming the clusters further comprises:
transforming the centroid of each of the plurality of clusters using random projections to obtain
a low-dimensional representation of the centroids; and
performing a Laplacian transformation on the plurality of clusters to obtain the
corresponding differential-privacy (DP) noise enabled cluster having the DP noise enabled
low-dimensional cluster centroid.
10 8. A privacy protection system (1 02) comprising:
a processor (1 08);
an interaction module (1 18) coupled to the processor (108) to receive target group (TG)
predicates from a service provider, wherein the TG predicates define a set of conditions for a
network provider for selecting data to be provided to the service provider as a part of offering
15 the data-as-a-service;
a clustering module (120) coupled to the processor (108) to cluster a plurality of users
registered with the network provider into a plurality of clusters based on user profiles
corresponding to each of the plurality of users;
a cluster transformation module (1 22) coupled to the processor (I 08) to transform each of
20 the plurality of clusters to obtain a corresponding differential-privacy (DP) noise enabled
cluster having a DP noise enabled low-dimensional cluster centroid; and
0 computation module (124) coupled to the processor (1 08) to:
determine a best matching cluster, from among a plurality of DP noise enabled
clusters, wherein the best matching cluster corresponds to the DP noise enabled low-
25 dimensional cluster centroid having a highest similarity with a low-dimensional
representation of the TG predicates; and
obtain one or more DP noise enabled dominant features of the user profiles
corresponding to one or more users, from among the plurality of users, associated with
the best matching cluster for being provided to the service provider as a part of the data-as-a-service.
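The claims do not say how the "DP noise enabled dominant features" are computed. This minimal sketch makes three assumptions. Profile attributes are binary. A feature counts as dominant when its noisy count within the best matching cluster is among the top k. The counts are perturbed with Laplace noise. `dp_dominant_features`, the feature names, and `top_k` are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_dominant_features(profiles, feature_names, top_k, epsilon, rng):
    """Return the top_k (feature, noisy_count) pairs for one cluster.

    profiles: 0/1 feature vectors of the users in the best matching cluster.
    Adding or removing one user changes each of the d counts by at most 1,
    so the count vector has L1 sensitivity d and the noise scale is d / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    d = len(feature_names)
    if any(len(p) != d for p in profiles):
        raise ValueError("every profile needs one entry per feature")
    counts = [sum(p[i] for p in profiles) for i in range(d)]
    scale = d / epsilon
    noisy = [(name, c + laplace_noise(scale, rng)) for name, c in zip(feature_names, counts)]
    # Rank by noisy count only, so the ordering is post-processing of the DP output.
    noisy.sort(key=lambda item: item[1], reverse=True)
    return noisy[:top_k]

cluster_profiles = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
names = ["sports", "travel", "music"]
print(dp_dominant_features(cluster_profiles, names, top_k=2, epsilon=1.0, rng=random.Random(0)))
```

In this sketch, only noisy aggregate counts for the cluster leave the function. No individual profile does.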
9. The privacy protection system (102) as claimed in claim 8, wherein the clustering module
(120) further:
creates a predetermined number of sets of clusters of the plurality of users, wherein each
set of clusters contains one or more clusters of the plurality of users, and wherein each of the
plurality of users is associated with at least one cluster in each of the predetermined number of
sets, and wherein the number of users assigned to each cluster is greater than a cluster size
threshold;
computes, for each of the predetermined number of sets, a centroid of each of the one or
more clusters;
ascertains, for each of the predetermined number of sets, a cosine similarity of each of the
plurality of users with the centroid of the at least one cluster with which the user is associated in
the predetermined number of sets to determine a closest cluster for the user; and
assigns each of the plurality of users to the corresponding closest cluster to obtain the
plurality of clusters, wherein each of the plurality of users is associated with at most one
cluster, and wherein the number of users assigned to each cluster is greater than the cluster
size threshold.
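The closest-cluster assignment in claim 9 can be sketched as follows. The sketch assumes that each set of clusters is given as a mapping from cluster id to member user ids, and that each profile is a numeric vector. The names `assign_closest`, `cosine`, and `centroid` are hypothetical. This sketch does not enforce the cluster size threshold.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def centroid(vectors):
    """Coordinate-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def assign_closest(users, cluster_sets):
    """Assign every user to exactly one cluster across several clusterings.

    users: dict user_id -> profile vector.
    cluster_sets: list of clusterings, each a dict cluster_id -> list of user_ids.
    Returns a dict (set_index, cluster_id) -> list of user_ids.
    """
    centroids = {}
    membership = {uid: [] for uid in users}
    for s, clustering in enumerate(cluster_sets):
        for cid, members in clustering.items():
            centroids[(s, cid)] = centroid([users[u] for u in members])
            for u in members:
                membership[u].append((s, cid))
    final = {}
    for uid, keys in membership.items():
        if not keys:
            raise ValueError(f"user {uid!r} is not in any cluster")
        # Keep the candidate cluster whose centroid is most similar to the user.
        best = max(keys, key=lambda k: cosine(users[uid], centroids[k]))
        final.setdefault(best, []).append(uid)
    return final

users = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
cluster_sets = [
    {0: ["a", "b"], 1: ["c", "d"]},
    {0: ["a", "c"], 1: ["b", "d"]},
]
print(assign_closest(users, cluster_sets))
```

If two candidate clusters tie, `max` keeps the first one, so the result can depend on the order of the sets.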
10. The privacy protection system (102) as claimed in claim 9, wherein the clustering module
(120) further:
transforms the user profile corresponding to each of the plurality of users using random
projections to obtain a low-dimensional signature corresponding to each of the user profiles;
clusters the plurality of users into the one or more clusters based on the low-dimensional
signatures corresponding to each user, wherein the number of users assigned to each cluster is
greater than the cluster size threshold; and
iterates, for a predetermined number of times equal to the predetermined number of sets,
the transforming and the clustering of the plurality of users into the one or more clusters to
obtain the predetermined number of sets.
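Claim 10 does not say how users are clustered from their low-dimensional signatures. This sketch assumes sign random projection: each user gets one bit per random hyperplane, and users with identical bit signatures share a cluster. Repeating this with fresh hyperplanes yields the predetermined number of sets. `build_cluster_sets` and `sign_signature` are hypothetical names. The cluster size threshold is not enforced here, so undersized buckets would still need a step such as the merging described in claim 11.

```python
import random

def sign_signature(vec, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )

def build_cluster_sets(users, n_sets, n_bits, seed=0):
    """Build n_sets clusterings by bucketing users on random sign signatures.

    users: dict user_id -> profile vector.
    Returns a list of dicts, each mapping a signature (cluster id) -> list of user_ids.
    """
    if not users:
        raise ValueError("need at least one user")
    rng = random.Random(seed)
    dim = len(next(iter(users.values())))
    sets = []
    for _ in range(n_sets):
        # Fresh random hyperplanes for every set, so the sets differ.
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        clustering = {}
        for uid, vec in users.items():
            clustering.setdefault(sign_signature(vec, planes), []).append(uid)
        sets.append(clustering)
    return sets

users = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [1.0, 0.0]}
cluster_sets = build_cluster_sets(users, n_sets=3, n_bits=2, seed=1)
for clustering in cluster_sets:
    print(clustering)
```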
11. The privacy protection system (102) as claimed in claim 8, wherein the clustering module
(120) further:
compares, for each of the plurality of clusters, the number of users subscribed to the
cluster with a cluster size threshold to identify one or more small size clusters, and wherein
the number of users subscribed to the small size cluster is less than the cluster size threshold;
and
merges two or more small size clusters from among the one or more small size clusters
to obtain clusters having a number of users greater than the cluster size threshold, and wherein
the two or more small size clusters are merged based on the cosine similarity between the cluster
centroids of the two or more small size clusters.
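The following is a greedy sketch of the merging step in claim 11. It assumes clusters are given as a mapping from cluster id to user ids. The smallest undersized cluster is merged into the other undersized cluster whose centroid has the highest cosine similarity with its own. When no other undersized cluster remains, this sketch falls back to the most similar cluster of any size; that fallback is an assumption, not part of the claim. All function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def centroid(vectors):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def merge_small_clusters(clusters, users, threshold):
    """Greedily merge clusters with fewer than `threshold` users.

    clusters: dict cluster_id -> list of user_ids (not modified).
    users: dict user_id -> profile vector.
    """
    clusters = {cid: list(members) for cid, members in clusters.items()}
    while True:
        small = [cid for cid, m in clusters.items() if len(m) < threshold]
        if not small or len(clusters) == 1:
            return clusters
        cid = min(small, key=lambda c: len(clusters[c]))
        # Prefer another small cluster; otherwise (assumed fallback) any other cluster.
        others = [c for c in small if c != cid] or [c for c in clusters if c != cid]
        cent = centroid([users[u] for u in clusters[cid]])
        target = max(
            others,
            key=lambda c: cosine(cent, centroid([users[u] for u in clusters[c]])),
        )
        clusters[target].extend(clusters.pop(cid))

users = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9], "e": [1.0, 0.05]}
initial = {0: ["a"], 1: ["b"], 2: ["c", "d"], 3: ["e"]}
print(merge_small_clusters(initial, users, threshold=2))
```

Each pass removes one cluster, so the loop always terminates.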
12. The privacy protection system (102) as claimed in claim 8, wherein the computation module
(124) further compares the DP noise enabled low-dimensional cluster centroid of each of the
plurality of clusters with the low-dimensional representation of the TG predicates to determine
a cosine similarity between the DP noise enabled low-dimensional cluster centroid and the low-dimensional
representation of the TG predicates.
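The comparison in claim 12 reduces to picking the cluster whose DP noise enabled centroid has the largest cosine similarity with the projected TG predicates. This is a minimal sketch that assumes both are plain lists of equal length. `best_matching_cluster` is a hypothetical name.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def best_matching_cluster(dp_centroids, tg_low):
    """Return the id of the DP noise enabled centroid most similar to tg_low.

    dp_centroids: dict cluster_id -> DP noise enabled low-dimensional centroid.
    tg_low: low-dimensional representation of the TG predicates.
    """
    if not dp_centroids:
        raise ValueError("no clusters to match against")
    return max(dp_centroids, key=lambda cid: cosine(dp_centroids[cid], tg_low))

dp_centroids = {"c1": [1.0, 0.1, 0.0], "c2": [0.0, 1.0, 0.2], "c3": [-1.0, 0.0, 0.0]}
print(best_matching_cluster(dp_centroids, [0.1, 0.9, 0.3]))  # c2
```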
13. The privacy protection system (102) as claimed in claim 8, wherein the computation module
(124) further transforms the TG predicates using random projections to obtain the low-dimensional
representation of the TG predicates.
14. The privacy protection system (102) as claimed in claim 8, wherein the cluster
transformation module (122) further performs Laplacian transformation on the plurality of
clusters to obtain the corresponding differential-privacy (DP) noise enabled cluster having the
DP noise enabled low-dimensional cluster centroid.
15. A non-transitory computer-readable medium having embodied thereon a computer program
for executing a method of providing privacy protection while offering data-as-a-service, the
method comprising:
receiving target group (TG) predicates from a first user, wherein the TG predicates define
a set of conditions for a network provider for selecting data to be provided to the first user as a
part of offering the data-as-a-service;
clustering a plurality of users registered with the network provider into a plurality of
clusters based on user profiles corresponding to each of the plurality of users;
transforming each of the plurality of clusters to obtain a corresponding differential-privacy
(DP) noise enabled cluster having a DP noise enabled low-dimensional cluster centroid;
determining a best matching cluster, from among a plurality of DP noise enabled clusters,
wherein the best matching cluster corresponds to the DP noise enabled low-dimensional
cluster centroid having a highest similarity with a low-dimensional representation of the TG
predicates; and
obtaining one or more DP noise enabled dominant features of the user profiles
corresponding to one or more users, from among the plurality of users, associated with the
best matching cluster for being provided to the first user as a part of the data-as-a-service.

Documents

Application Documents

#	Name	Date
1	3627-del-2013-GPA.pdf	2014-04-28
2	3627-del-2013-Form-5.pdf	2014-04-28
3	3627-del-2013-Form-3.pdf	2014-04-28
4	3627-del-2013-Form-2.pdf	2014-04-28
5	3627-del-2013-Form-1.pdf	2014-04-28
6	3627-del-2013-Drawings.pdf	2014-04-28
7	3627-del-2013-Description (Complete).pdf	2014-04-28
8	3627-del-2013-Correspondence-others.pdf	2014-04-28
9	3627-del-2013-Claims.pdf	2014-04-28
10	3627-del-2013-Abstract.pdf	2014-04-28
11	3627-del-2013-Correspondence-Others-(12-06-2014).pdf	2014-06-12