FORM 2
THE PATENTS ACT, 1970 (39 of 1970) & THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10, rule 13)
1. Title of the invention'.
DATA SECURITY AND PRIVACY
2. Appiicant(s)
NAME I NATIONALITY I ADDRESS
TATA CONSULTANCY Nirmal Building, 9th Floor, Nariman Point,
Indian
SERVICES LIMITED Mumbai-400021, Maharashtra, India
?. Preamble to the description
COMPLETE SPECIFICATION
The following specification particularly describes the invention and the manner in which it
is to be performed.
TECHNICAL FIELD
[001] The present subject matter relates, in general, to a system and a
method for data security and privacy and, in particular, to a system and a method for data aggregation while maintaining security and privacy of the data.
BACKGROUND
[002] Privacy, in general, can be defined as limited availability of access
to a system or a process, and to all features related to the system or the process. Preservation of privacy is important for customers of an organization providing information technology (IT) services. For example, customers might send private information to a database of the organization and may not want actual contents of the information to be known to the organization.
[003] Recent enhancements in computing and communication capabilities
of various computing systems and devices have made it easy to access and process, over a network, large amounts of information. This has led to security concerns and privacy issues for the information, such as a customer's data. Moreover, the World Wide Web makes it easy for the information to be accessed and collected from anywhere in the world.
[004] With increasing connectivity, people are using technologies, such as
satellite network, cellular network, and cloud computing in both organization-to-consumer and organization-to-organization environments to share information. Such technologies enable a third party individual or organization, which is not supposed to have access to the information, to covertly acquire the information shared amongst the organizations or consumers. However, depending upon the nature of the information, users may not be willing to divulge information to a third party individual or organization.
[005] [n order to alleviate the security and privacy concerns, a number of
techniques have been proposed. However, such techniques are computationally intensive and hence, are not suitable for limited resource environments.
SUMMARY
[006] This summary is provided to introduce concepts related to a system
and a method for data security and privacy, which is further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.
[007] The method for data security and privacy comprises receiving
encrypted randomized data from a source, selected from amongst a plurality of sources and decrypting the encrypted randomized data to obtain randomized data. The method further comprises selecting another source, from amongst the plurality of sources, which has not participated in the data aggregation. The randomized data is then encrypted to obtain another encrypted randomized data and the other encrypted randomized data is sent to the other source for aggregating another data element available with the other source.
BRIEF DESCRIPTION OF DRAWINGS
[008] The detailed description is provided with reference to the
accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to reference like features and components.
[009] Fig. I illustrates an exemplary network environment implementing
data aggregation, in accordance with the present subject matter.
[0010] Fig. 2 illustrates exemplary components of an aggregator
overseeing data aggregation, in accordance with an embodiment of the present subject matter.
[0011] Fig. 3 illustrates an exemplary method for implementing a data
security and privacy scheme, in accordance with an embodiment of the present subject matter.
[0012] Fig. 4 illustrates an exemplary plot showing a comparison of
probabilities of data disclosure between a cluster-based privacy-preserving data aggregation (CPDA) scheme and the data security and privacy scheme, in accordance with an embodiment of the present subject matter.
[0013] Fig. 5 illustrates an exemplary bar-chart showing a comparison of
computational time spent in computing an aggregated value using the data security and privacy scheme with that spent in computing the aggregated value using a CPDA scheme, in accordance with an embodiment of the present subject matter.
DETAILED DESCRIPTION
[0014] Devices, and methods thereof, implementing data security and
privacy are described herein. In an embodiment, a system and a method for secure data aggregation, while at the same time preserving privacy, is described. A data security and privacy scheme, used to employ the systems and methods, is also described herein.
[0015] Preservation of privacy has become a concern in today's context of
penetration of the Internet and other communication technologies, such as cellular communication, intranet, and Wireless Sensor Network (WSN). Privacy preservation is important from perspectives of both an individual as well as an organization. For example, two or more parties, either of which may be an individual or an organization, owning data may need to share their respective data, say for aggregation of their data, with a third party, which also may be an individual or an organization, without revealing actual content of their data. Data aggregation, i.e., performing a mathematical operation, such as summation, on a set of data, needs to be done in a
manner that maintains data security and privacy. The owners are hereinafter interchangeably referred to as sources.
[0016] Data security and privacy is also important in scenarios where data
collected in a network is processed within the network, say using collaborative computation. Conventionally, data mining techniques are implemented to preserve the privacy of the data. These techniques, however, are computationally intensive and are not suitable for environments with limited computational resources, such as processing and memory resources. For example, in a WSN, the sources typically have low computational resources and hence conventional techniques may not be possible to implement in a WSN.
[0017J To this end, a system, and methods thereof, are described, which
facilitate data security and privacy preservation with the help of a secure key management scheme and using data perturbation techniques. The secure key management scheme is implemented before any data aggregation process. The secure key management scheme ensures that no entity, which may be either a source or an aggregator, may be able to access data, to which the entity is not supposed to have access. Data communication takes place only after the secure key management scheme has been implemented. The data perturbation technique is implemented to preserve privacy of the data, when the data is shared between two sources or between a source and the aggregator, say for aggregation purpose.
[0018] The aggregation of data, in accordance with the present subject
matter, is based on the concept of Secure Multiparty Computation (SMC). Generally, SMC can be understood as computation of a function f(xj, x2,..., xn) on data xt, \2,..., xn in a network with n sources of data, where each source, say source i, knows only its data x* and f(xi, x2, ..., xn). In an example, let the function f(xi, X2, ..-, xn) be summation. The sources perturb their respective data using the data perturbation techniques, such as randomization and send perturbed data to an aggregator, which oversees the aggregation. The aggregator may send aggregated data, which is in the perturbed form, for further processing. All communication, for example, between two
sources or between a source and the aggregator, is done using the secure key management scheme.
[0019] The secure key management scheme is implemented to enhance
data security. Encryption keys, referred to as keys hereinafter, are distributed amongst
the sources and the aggregator for both source-to-source communication and source-
to-aggregator communication. The secure key management scheme randomizes key
distribution prior to exchange of the perturbed data. Randomizing key distribution for
both types of communication, i.e., source-to-source and source-to-aggregator,
enhances security of the data to be exchanged. In an implementation, a source may
share data with another source only via the aggregator, and not directly.
J0020J Once the secure key management scheme has been deployed, data
can be shared amongst the sources and the aggregator, say for the aggregation. Data is shared after perturbing the data using the data perturbation technique. In an embodiment the data perturbation technique employed is randomization, which converts raw data into randomized data. The randomisation of the raw data makes it difficult for an entity, which has no knowledge of the randomization, to extract raw data from the randomized data. In an embodiment, modular arithmetic concepts are used to randomize the data and to compute an aggregated value, in a randomized form, of the data available with the sources. Modular arithmetic concepts demand low computational resources from the sources, and hence, are computationally less intensive.
[0021] Some assumptions regarding the network and the systems, such as
the sources and the aggregator, have to be made before implementing the present
scheme. It is assumed that information is transmitted and received error-free, i.e., no
packet drop takes place. The sources in the network, for example, a WSN, and the
aggregator are cooperative towards computing the aggregated value, but
uncooperative towards deducing the data of other sources or the aggregator.
[0022] The method of maintaining data security and privacy described
herein has low computational complexity, which makes it suitable for practical
implementation, particularly in cases where the sources do not have much computational capabilities, for example in a WSN. The data security and privacy system of the present subject matter requires low computational resources and is, hence, cost-effective and simple. These and other aspects of the data security and privacy system are described in conjunction with the exemplary systems as provided below.
[0023] Fig. 1 illustrates an exemplary network environment 100
implementing data aggregation, in accordance with the present subject matter. The network environment J 00 includes sources 102-1, 102-2, ..., 102-n, collectively referred to as sources 102, which are owners or providers of data. In an embodiment, the sources 102 may be sensor nodes of a WSN, or a data receiver in the WSN. Accordingly, the sources 102 may either generate their data, or receive their data from any source such as a data transmitter, a sensor, and a manually provided input.
[0024] The sources 102 are connected through a network 104 to an
aggregator 106. The sources 102 communicate with the aggregator 106 to provide either their data or an intermediate aggregated value for further processing, such as aggregation of the data. The sources 102 do not share the data in raw form and instead provide the data in a randomized form, i.e., the sources 102 do not reveal content of their data or the intermediate aggregated value. The aggregator 106 includes a supervisor 108 that initiates and supervises data aggregation to be performed on a set of inputs, say X|, X2, ..., xn. The sources 102 may generate or accumulate the data on instructions provided by the supervisor 108.
[0025] Each of the sources 102, such as the source 102-1 includes a
randomizer, such as the randomizer 110-1, which randomizes the data using the concepts of modular arithmetic. Randomization decreases the probability that the aggregator 106, or any other source, would be able to understand the content of randomized data, thereby enhancing privacy of the data.
[0026] In operation, a secure key management scheme is deployed prior to
sharing any data within the network environment 100 to ensure security of the data. The secure key management scheme establishes two types of keys, namely source-source keys and source-aggregator keys. The source-source keys are distributed to secure data exchange amongst the sources 102, while the source-aggregator keys are distributed to secure data exchange amongst any source, such as the source 102-1, and the aggregator 106.
[0027] The secure key management scheme ensures that no entity, which
may be either the aggregator 106 or a source, say the source 102-1, may be able to access data to which the entity is not supposed to have access. Once the secure key management scheme has been deployed, the supervisor 108 may request the sources 102, one-by-one, to aggregate the data in a cumulative fashion.
[0028] In an implementation, the supervisor 108 requests a source, such as
the source 102-1, to initiate data aggregation. The source 102-1, which in this case is the initiator source, randomizes its data upon the request by the supervisor 108. The source 102-1 is now associated with an aggregated data element, i.e., with a data element that has been aggregated. The source 102-1 also performs a neighbourhood search to find out other sources, from amongst the sources 102, which it is connected to directly, and compiles a list of such other sources. The source 102-1 sends this list of the other sources to the aggregator 106. In an implementation, the supervisor 108 keeps note of all sources, which are part of the sources that have already participated, i.e., the sources that are associated with aggregated data elements.
[0029] If at least one of the other sources has not already participated, the
supervisor 108 randomly selects one source, say the source 102-2, from amongst the other sources that have not already participated. The source 102-2 may be said to be associated with an aggregated data element. The aggregator sends information pertinent to the source 102-2 to the source 102-t. Accordingly, the source 102-1 sends the randomized data to the source 102-2. In an implementation, the source 102-
] sends the randomized data to the source 102-2 via the aggregator 106. As a result, the source to aggregator, and vice-versa, communication is secured by virtue of the secure key management scheme.
[0030] The source 102-2 evaluates an intermediate aggregated value, in a
randomized form, using its own data and the randomized data received from the source 102-1. The source 102-2 performs a neighbourhood search, similarly to source 102-1, to find out other sources it is connected to directly, and compiles a list of such other sources. The source 102-2 then sends the intermediate aggregate value to another source, say the source 102-3, on similar lines as mentioned above for the source 102-1.
[0031] Similar procedure is followed for all the sources 102. Once all the
sources 102 have participated, the aggregator 106 requests the last source, say the source 102-n, to send the intermediate aggregated value, which becomes a final aggregated value and is still in the randomized form, to the aggregator 106, The aggregator, upon receiving the final aggregated value, requests the first source, 102-1, to compute the aggregated value in a raw form.
[0032] In case one or more of the sources 102 have not yet participated,
which are not directly connected to the last participating source, i.e., the source 102-n, the supervisor 108 randomly chooses one of the sources that have not yet participated. It then sends information, such as a source identification, related to the chosen source to the source that participated last, i.e., the source 102-n. The source 102-n sends its intermediate aggregated value to the aggregator 106, which forwards the intermediate aggregate value to the chosen source. Similarly, the aggregator 106 involves all such sources, which are not directly connected to the last participating source, in data aggregation.
[0033] It would be noted that all the communication within the network
environment 100 is secured by virtue of the secure key management scheme, while
privacy of data is ensured by the randomization of the data. The aggregator 106 may use the final aggregated value for further processing.
[0034] The data aggregation and randomization are explained below in
mathematical terms for the purpose of clarity. It would be understood that the following implementation is not limiting, and similar approaches would be within the scope of the present subject matter.
[0035] The term data aggregation refers to a mathematical operation
performed on a set of data and includes summation, multiplication, minima, maxima, sorting, finding median, mode and mean, statistical analysis, etc. Data aggregation can be represented as a mathematical function f (xi, x2, ..., xn) performed on a set of variables. As an example, the function may be defined using the following equation:
Y(t) = f(di(t),d1(t),...,dn(t)) (1)
, where dj(t), ..., d^Ct) are N data elements, and Y(t) is an aggregated value evaluated at a time instant "t" for which the N data elements are available.
[0036] Without loss of generality, the function f (xi, xi, ..., xn) may be
considered to be, for example, a sum function, as shown in equation (2). Hence,
no = !>„(/) (2)
)t=l
[0037] The present subject matter provides a randomization technique for
privacy preservation during data aggregation. The randomization technique, as explained below, implements concepts of modular arithmetic. It would be understood that the specific implementation of the technique is not to be construed as limiting.
[0038] In the present system, each of the sources 102 owns information; say
a source 102-1 owns a data element XI, which it is not willing to share with the other sources, which are part of the sources 102, and the aggregator 106. The output Y (t) has a value, say X, in a range of [0, M], where M is a real number. The value X is to
be evaluated without revealing the content of the data elements xu x2, ■■■, xn of the sources 102 to any source other than the source owning the data element as well as to the aggregator 106. The value X may be defined using the following equation:
(3)
, where i = 1, 2,..., N.
[0039| The supervisor 108 supervises the process of evaluating X. The
supervisor 108 randomly chooses one of the sources 102, say the source 102-1, and requests it to initiate the computation. The source 102-1 possesses a data element xt. Upon a request from the supervisor 108, the randomizer 110-1 generates a random number v\ in the range of [0, M] and computes a value R1, which is a randomized form of the data element x1, using the following equation;
(4)
, where P is an arbitrarily large number.
[0040] The source 102-1 also finds out other sources, which are part of the
sources 102, it is connected to directly. The source 102-1 passes information, such as a source identification, related to the other sources to the aggregator 106. The aggregator 106 obtains information related to the sources that have already participated in the aggregation. If at least one of the sources connected to the source 102-1 has not already participated in the aggregation, the supervisor 108 randomly chooses one of such sources and sends information related to the chosen source, say the source 102-2, to the source 102-1. Accordingly, the source 102-1 passes the value Ri to the source 102-2 via the aggregator 106.
[0041] The randomizer 110-2 computes a value R2, which is a randomized
form of an intermediate aggregated value of data elements X] and x2, using the following equation;
R2=(R}+x2)modP (5)
, where X2 is the data element available with the source 102-2.
[0042] Similar procedures, as for the source 102-1, are followed and the
source 102-2 sends R2 to a next source, say the source 102-3, via the aggregator 106. After multiple repetitions of the above procedure, starting from the source 102-1, a last remaining source, i.e., the source 102-n, may be reached. The randomizer 110-n computes Rn using the following equation:
(6)
where xn is a data element available with the source 102-n.
[0043] The supervisor 108, when it finds out that all the sources 102 have
participated, requests the last source, i.e., the source 102-n, to send the value Rn to it. The supervisor 108 now directs the first source, i.e., the source 102-1, to compute the summation X. The randomizer 110-1 computes X using the following equation:
[0044] The summation X is in a raw form, and is sent to the aggregator 106.
The aggregator 106 may use the summation X for further processing.
[0045] In case one or more of the sources 102 have not yet participated,
which are not directly connected to the last source, i.e., the source 102-n, the supervisor 108 randomly selects one of the sources that have not yet participated. It then sends information, such as a source identification, related to the selected source to the source that participated last, i.e., the source 102-n. The source 102-n sends the computed value Rn to the selected source. Similarly, the aggregator 106 involves all such sources, which are not directly connected to the last participating source, in data aggregation.
[0046] It can be observed that each of the sources 102 is assumed to have
used its correct data element X1. If there is no collusion, i.e., no two sources attempt to learn data element other than their own, a source 102-i may only learn the value Rj, which is an intermediate aggregated value of all the data elements up to the data element Xi. With this Rj, the source 102-i can calculate the sum of values of all the other sources, that have already participated using, for example, the equation (R; - Xj) mod P. However, if two or more sources collude, they can disclose more information related to the data elements. For example, if the two sources, for example, the sources 102-(i - 1) and 102-(i + 1), collude, they can learn the data element of the source 102-i. i.e., Xi = (R, - Rt.|) mod P. This possibility is averted by allowing the aggregator 106 to decide the next source, thereby averting any collusion. The aggregator 106 does this by choosing a source randomly from amongst the sources connected to the current source and not yet participated.
[0047J Nevertheless, there exists a possibility of colluding through
bypassing the aggregator 106. For example, a source, say the source 102-i, may directly send the computed value R, to a friend source, i.e., another colluding source, say a source 102i. To avoid such a scenario, source-to-source communication, except source-source key establishment, may be directed strictly via the aggregator 106 and any direct source-to-source communication may be prohibited. Such a scheme enhances data security, however, leading to increased communication costs.
[0048] Further, the aggregator 106 cannot be fully trusted. The aggregator
106 may attempt to get knowledge of a source's data, for example, by choosing each of the sources 102, one after another, to be both the initiating and the terminating source. In this way, the aggregator 106 can acquire the data of each of the sources 102. In order to avoid such a scenario, the initiating source, say the source 102-1, prior to sending the value X to the aggregator 106, compares the value X with its own data Xj. If both happen to be same, the source 102-1 indicates to the aggregator 106 that the aggregation cannot be performed. However, possibility of a false alarm arises when
the other sources' data are all zeros, because then the summation X would be equal to the initiating source's own data, X1. In a system with a large number of sources, such a possibility is, nevertheless, very less.
f 0049] It can also be observed that to acquire the data of each of the sources
102, the aggregator 106 does not need to be directly connected to all the sources 102. An indirect connection to the sources 102 would suffice. An example of such indirect connection would be that the source 102-1 may be connected to the source 102-2 and the source 102-2 may be connected to the source 102-n, which may be directly connected to the aggregator 106. If, however, less number of connections exist amongst the sources 102, then the probability of data disclosure decreases as in that case a number of the sources 102 would have to exchange data via the aggregator 106. However, key establishment for both source-source key and source-aggregator key becomes less secure due to the unavailability of a direct path from the aggregator 106 to some of the sources 102.
[0050] Fig. 2 illustrates exemplary components of the aggregator 106
overseeing data aggregation, in accordance with an embodiment of the present subject matter. The aggregator 106 may include one or more processor(s) 202, one or more I/O interface(s) 204, and a memory 206. The processor(s) 202 may include microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries and/or any devices that manipulate signals and data based on operational instructions. Among other capabilities, the processor(s) 202 are configured to fetch and execute computer-readable instructions stored in the memory 206.
[0051] The I/O interface(s) 204 may include a variety of software and
hardware interfaces, for example, interface for peripheral device(s) such as data input/output devices, storage devices, network devices, etc. The I/O interface(s) may include Universal Serial Bus (USB) ports, Ethernet ports, Host Bus Adaptors, etc and their corresponding device drivers. The I/O interface(s) 204, amongst other things,
facilitate receipt of information by the sources 102 from other devices in the network 104, such as the other sources 102, the aggregator 106, etc.
[0052] The memory 206 can include any computer-readable medium
known in the art including, for example, volatile memory (e.g., RAM) and/or nonvolatile memory (e.g., flash, etc.). The memory 206 further includes modules 208 and data 210. The modules 208 include the supervisor 108, a key manager 212, and other module(s) 214. The other module(s) 214 may include an operating system, or programs that supplement applications implemented by the aggregator 106. The data 210 serve as repositories for storing information associated with the modules 208 and any other information. The data 210 includes source data 216, key bank 218, and other data 220. The source data 216 stores data, such as the randomized for of a data element, received from the sources 102 during data aggregation.
[0053] The secure key management scheme and privacy preservation are
explained further with reference to the aggregator 106.
SECURE KEY MANAGEMENT SCHEME
[0054] The network environment 100 implements K number of keys to
provide secure communication amongst the sources 102 and the aggregator 106. The K keys are stored in each of the sources 102, such as the source 102-1. Out of the K keys, L keys are used for source-to-source communication, while the remaining (K-L) keys are used for source-to-aggregator communication. The (K-L) keys are shared amongst the sources 102 and the aggregator 106, so that the same keys are available with all the sources 102 and the aggregator 106.
SOURCE-AGGREGATOR KEY ESTABLISHMENT
[0055] Source-aggregator key sets, each of which contains the (K-L) keys,
for all source-aggregator pairs, such as the source-aggregator pair comprising the source 102-1 and the aggregator 106, are maintained in the key bank 218 in the aggregator 106. As the same keys are available with all of the sources 102 and the
aggregator 106, any source, such as the source 102-1, may understand the communication between another source, say the source 102-2, and the aggregator 106, thus leading to data security issues.
[0056} In order to alleviate such security concerns, the key manager 212
randomly permutes and reorders the source-aggregator key set for each source-aggregator pair, such as the source 102-1 and the aggregator 106, prior to distributing the source-aggregator keys amongst the sources 102 and the aggregator 106. Therefore, prior to each communication session, the key bank 218 contains source-aggregator key sets in arrangements different from respective previous arrangements. When a new source is added, the key manager 212 permutes and reorders the key bank 218 and the source-aggregator key set is stored in the key bank 218 for the new source.
[0057] Permuting and reordering the source-aggregator key set depends
upon the number of sources 102. For a large number of sources, for example, more than 12 sources, only permutation of the source-aggregator key set is enough to ensure security. However, for any number of sources below 12, both permutation and reordering are performed.
[0058] In operation, a source, say the source 102-n, communicates with the
aggregator 106 through one of the keys in a corresponding source-aggregator key set, stored in the source 102-n and the key bank 218, that the source 102-n shares with the aggregator 106. The randomizer 110-n generates a random number Rsa,n, where "sa" denotes source-aggregator and n denotes the source 102-n, between one and (K-L). This random number Rsa,n is sent to the aggregator 106 in plain text, i.e., in a raw form. The aggregator 106 understands that the source 102-n would encrypt its next message by the Rsan, key of the shared source-aggregator key set stored in the key bank 218. Every time the source 102-n communicates with the aggregator 106, it performs the steps mentioned above.
[0059] The permutation and ordering of the key bank 218 before each data
communication session lowers the probability of a security lapse during information exchange between any source-aggregator pair. It can also be observed that the random number Rsa,n is sent to the aggregator 106 in a raw form. However, acquisition of the random number Rsa,n by another source may not raise any vulnerability issue by virtue of the permutation and ordering of the key bank 218. There is a very high probability that the Rsa.n number key would be different for different source-aggregator pairs.
SOURCE-SOURCE KEY ESTABLISHMENT
[0060] Prior to establishing a secure source-source key, it is assumed that
all source-aggregator keys are securely established. In an embodiment, source-source key establishment is done through the aggregator 106 and not directly amongst the sources 102. Since the L keys are the same for all the sources 102, it is easy for a source, say the source 102-1, to understand the information exchanged between other sources. For example, the source 102-1 can understand the information exchanged between the source 102-1 and the source 102-j, where i is not equal to either 1 or j.
[0061] To avoid this situation, the randomizers 110-i and 110-j separately
permute the source-source key sets, available respectively with the sources 102-i and 102-j, containing the L keys dedicated for source-to-source communication and reorder the same randomly. After the reordering, the source 102-1and the source 102-j pass their respective permute and reordering functions, which is used for permuting and reordering their respective source-source key sets, to each other through the aggregator 106 using their respective source-aggregator keys shared with the aggregator 106.
[0062] After successful exchange of the permute functions, the source 102-i
sends a random number Rss.ij, where "ss" denotes source-source, i denotes the source 102-i and j denotes the source 102-j, between one and L to the source 102-j. The random number Rss,ij indicates a serial number of a particular key, from amongst the
source-source key set of the source 102-i, to be used by the source 102-i for encoding any information to be exchanged subsequently with the source 102-j. The Rss.ijth key will be used by the source 102-i for subsequent communications with the source 102-j until the data aggregation is complete. For a next session of data aggregation, the same key establishment procedure will be followed.
[0063] The secure key management scheme of the present subject matter
reduces security concerns during data communication amongst the sources 102 and the aggregator 106. The secure key management scheme ensures that no entity, which may be either a source or an aggregator, may be able to access data, to which the entity is not supposed to have access. Once, the keys, both source-source keys and source-aggregator keys, have been established, data communication may take place within the network environment 100.
[0064] Fig. 3 illustrates an exemplary method 300 for implementing a data
security and privacy scheme, in accordance with an embodiment of the present subject matter. The exemplary method may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, and the like that perform particular functions or implement particular abstract data types. The computer executable instructions can be stored on a computer readable medium and can be loaded or embedded in an appropriate device for execution.
[0065] The order in which the method is described is not intended to be
construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method, or an alternate method. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the invention described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof.
[0066] At block 305, a data element available with a source is randomized
to provide a randomized data element. In an implementation, the data element, say the data element Xi, is available with a source, such as the source 102-i. The data element is randomized based on the concepts of modular arithmetic, and the randomization is done by the randomizer 110-i. If the source 102-i is an initiator source, the randomized data element, say a randomized data element Rj, is a randomized form of the data element Xi. ff the source 102-i is not the initiator source, the randomized data element Ri is a randomized form of an intermediate aggregated value of all the data elements that have been aggregated. The source 102-i may now be referred to as a source associated with an aggregated data element.
[00671 At block 310, a list of one or more other sources directly connected
to the source is compiled. In an implementation, the source 102-i performs a neighborhood search and compiles the list. The list can be a collection of source identifications.
J0068J At block 315, the randomized data element is encrypted using a
source-aggregator key to provide an encrypted randomized data element. In an implementation, the source 102-i encrypts the randomized data element Rj using a source-aggregator key that it shares with an aggregator, such as the aggregator 106.
[0069] At block 320, the encrypted randomized data element and the list are
sent to an aggregator. In an implementation, the source 102-i sends the encrypted randomized data element and the list to the aggregator 106.
[0070] At block 325, the encrypted randomized data element is decrypted
using the source-aggregator key to provide the randomized data element. In an implementation, the decryption is performed by the aggregator 106 after it receives the encrypted randomized data element from the source 102-i. The decryption provides the randomized data element Ri.
[0071] At block 330, a source, which has not yet participated in data
aggregation and is, therefore, associated with an unaggregated data element, is selected from the list. In an implementation, the aggregator 106 obtains information pertaining to all the sources that are associated with Unaggregated data elements. The aggregator 106 randomly selects a source associated with an unaggregated data element from the list. The selection may be done by selecting sources, one-by-one, from the list and checking whether a selected source has participated yet in the data aggregation. The first such source, say the source 102-j; may be selected.
[00721 At block 335, the randomized data element is encrypted using
another source-aggregator key to provide another encrypted randomized data element. the selected source, i.e., the source 102-j, to encrypt the randomized data element.
[0073] At block 340, the other encrypted randomized data element is sent to
the selected source. In an implementation, the aggregator 106 sends the other encrypted randomized data element to the source 102-j,
[0074] Once, the source 102-j receives the other encrypted randomized data
element, the method 300 may be performed once again, such that the functions performed by the source 102-i, such as the blocks 305 to 320, would now be performed by the source 102-j. The method 300 is performed in a repetitive fashion, as mentioned above for the sources 102-i and 102-j, until &n the sources have participated in the data aggregation.
[0075] The data security and privacy scheme of the present subject matter
has low computational complexity by virtue of the modular arithmetic techniques applied for randomization. The low computational demand of the present scheme makes it suitable for implementation in scenarios where the sources of data, such as the sources 102, have limited computational capabilities for example in a WSN.
[0076] Performances, in respect of data privacy and computational
complexity, of the data security and privacy scheme according to the present subject matter are now explained below with reference to figures 4 and 5,
[0077] Fig. 4 illustrates an exemplary plot 400 showing a comparison of
probabilities of data disclosure, P (b), between a cluster-based privacy-preserving data aggregation (CPDA) scheme and the data security and privacy scheme according to the present subject matter.
[0078] Performance of the system, in accordance with the present subject
matter, can be assessed by simulation, and a comparison with the CPDA scheme. In the CPDA scheme, there exists a certain probability of data disclosure. This can only happen when sources within a cluster, i.e., a group of sources such as the sources 102, exchange data. This can be estimated using the following equation:
, where P (b) = probability of data disclosure, Dmax = maximum cluster size, pc = minimum cluster size (= 3, which includes two sources and one server), L = cluster size, b = probability tr size is m.
[0079] In the data sehat communication level privacy is compromised, P (k = m) =
probability that a clustecurity and privacy scheme of the present subject matter,
pc = Dmax = L = 3, P (k=m) =1. A plot of P (b) obtained after simulation for CPDA and the present data security and privacy scheme is shown in the figure.
[0080] An axis 405 depicts the probability P (b) in percentage, and an axis
410 depicts a probability in fraction that communication level privacy is compromised, denoted by the parameter b in the equation (8). A curve 415 shows the probability P (b) for the CPDA scheme, while a curve 420 shows the probability P (b) for the data security and privacy scheme of the present subject matter. It would be observed that the probability that privacy of data is compromised, i.e., P (b), in CPDA
has a steeper slope than that in the data security and privacy scheme as proposed in accordance with the present subject matter. Hence, the probability of data disclosure in a CPDA scheme rises sharply with a compromise in the communication level privacy, as compared to the probability of data disclosure in the data security and privacy scheme of the present subject matter.
[0081] Fig. 5 shows an exemplary bar-chart 500 illustrating a comparison of
computational time spent in computing an aggregated value using the data security and privacy scheme with that spent in computing the aggregated value using the CPDA scheme, according to an embodiment of the present subject matter. For the simulation, the sources 102 and a server implementing the aggregator 106 were chosen to be configured with Intel Core 2 duo processors with a processing speed of 3 gigahertz (GHz) and physical memories of 2 gigabytes (GB).
[0082] An axis 505 depicts computational time in seconds, while an axis
510 depicts a number of sources. A shaded bar 515 depicts the computational time spent by the CPDA scheme and a shaded bar 520 depicts the computational time spent by the data security and privacy scheme of the present subject matter.
[0083] The number of sources has been limited to five for the purposes of
the simulation. Fig. 5 shows that computational efficiency of the present scheme is enhanced as compared to the CPDA scheme. The computational time required in the present scheme for the aggregation is less than 10 microseconds (μ.s), whereas for the same configuration, the CPDA scheme requires about 30 s. In practical sensor nodes, which are same as the sources in the present embodiment, of a WSN, the available computational resources are less. Hence, the three-fold decrease in computational time, according the present scheme, would be substantial.
[0084] In the CPDA scheme, the computational time increases linearly with
the number of sources. The scheme of the present subject matter incurs less computational resources, as can be seen from the fig. 5. Moreover, the present scheme
is not affected considerably by addition of new sources, as is also evident from the fig. 5, The present scheme has an additional advantage of eliminating complex cluster formation required in the CPDA scheme. It is to be noted that the comparison is only with reference to the scheme of the present subject matter, and not with respect to any system thereof.
[0085] As can be seen from the foregoing description, the data security and
privacy system and method of the present subject matter has low computational complexity, which makes it suitable for practical implementation, particularly in cases where the sources have limited computational capabilities, for example in a
WSN.
[0086] Although embodiments for a device have been described in
language specific to structural features and methods, it is to be understood that the invention is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as exemplary implementations for the device.
I/We claim:
1. A method for data aggregation in a network environment (100) comprising a
plurality of sources (102), the method comprising:
receiving encrypted randomized data from a source associated with an
aggregated data element;
decrypting the encrypted randomized data to obtain randomized data;
encrypting the randomized data to obtain another encrypted randomized
data; and
transmitting the another encrypted randomized data to a source associated
with an unaggregated data element for aggregating the unaggregated data
element.
2. The method as claimed in claim 1, wherein the method further comprises receiving a list of sources directly connected to the source associated with the aggregated data element.
3. The method as claimed in claim 2, wherein the method further comprises:
obtaining information related to other sources associated with unaggregated data elements; and
selecting the source associated with the unaggregated data element based at least in part on the received list and the obtained information.
4. The method as claimed in claim I, wherein the method further comprises establishing at least one encryption key to secure data communication within the network environment (100).
5. The method as claimed in claim 4, wherein the encrypting further comprises:
obtaining a source-aggregator key, from amongst the at least one encryption key, associated at least with the source associated with the unaggregated data element; and encrypting the randomized data based on the source-aggregator key.
6. An aggregator (106) for supervising data aggregation in a network environment
(100) comprising a plurality of sources (102) of data elements that are to be .
aggregated, the aggregator (106) comprising:
at least one processor (202);
a memory (206) coupled to the at least one processor (202), the memory
(206) comprising:
a supervising module (108) configured to;
decrypt encrypted randomized data received from a source
associated with an aggregated data element to obtain randomized
data; and
encrypt the randomized data to obtain another encrypted
randomized data to be sent to a source associated with an
unaggregated data element for aggregating the unaggregated data
element.
7. The aggregator (106) as claimed in claim 6, wherein the memory (206) further comprises a key manager (212) configured to establish source-aggregator keys to secure data communication between at least one of the plurality of sources (102) and the aggregator (106).
8. The aggregator (106) as claimed in claim 6, wherein the memory (206) further comprises a key bank (218) to store the source-aggregator keys.
9. The aggregator (106) as claimed in claim 6. wherein the aggregator (106) is a part of a wireless sensor network.
10. The aggregator (106) as claimed in claim 6, wherein the aggregator (106) includes at least one of the plurality of sources (102).
11. A computer-readable medium having computer-executable instructions which when executed perform acts for aggregating data in a network environment (100) comprising a plurality of sources (102), the acts comprising:
decrypting encrypted randomized data generated by a source (102-1) associated with an aggregated data element to obtain randomized data;
obtaining information related to other sources associated with unaggregated
data elements;
selecting a source (102-2) associated with an unaggregated data element
based at least in part on the obtained information; and
encrypting the randomized data to obtain another encrypted randomized data
to be transmitted to the source (102-2) associated with the unaggregated data
element for aggregating the unaggregated data element.
12. The computer-readable medium as claimed in claim 11, wherein the acts further comprise instructing an aggregator (106) to establish at least one encryption key to secure data communication within the network environment (100).
13. The computer-readable medium as claimed in claim 11, wherein the selecting is further based on a list of sources directly connected to the source (102-1) associated with the aggregated data element.
14. The computer-readable medium as claimed in claim 11, wherein the encrypting is based on a source-aggregator key shared between an aggregator (106) and the source (102-2) associated with the unaggregated data element.
15. The computer-readable medium as claimed in claim 14, wherein the encrypting further comprises obtaining the source-aggregator key from a key bank (218) of the aggregator (106).