Abstract: Disclosed is a system and method for preserving privacy of data residing in a database. A db access and sensitive rule configuration module for enabling an admin user to configure access privilege corresponding to each user, a plurality of rules to be applied on a plurality of datasets in the data, and masking policy to be applied on the dataset. A query input module for receiving at least one query, from the user, to be executed on the database. The at least one query is received in order to retrieve the dataset. A query analysis module for analyzing the at least one query in order to determine presence of sensitive data in the dataset. A data transformation module for transforming the sensitive data by applying the masking policy configured on the dataset based on the rule, thereby preserving privacy of data residing in the database.
DESC:
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
SYSTEM AND METHOD FOR PRESERVING PRIVACY OF DATA RESIDING IN THE DATABASE
APPLICANT:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
The following specification describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
[001] The present disclosure relates, in general, to the field of database access and sensitive data management. More particularly, the present disclosure relates to system(s) and method(s) for preserving privacy of sensitive data in a database.
BACKGROUND
[002] Production databases deployed in organizations serve a plurality of requirements by storing large amounts of data. This data is used by a plurality of users while rendering services. Giving a plurality of users access to the production database increases the possibility of errors, including errors related to updates. Also, when the data is accessed by executing a query, the query needs to be verified. Further, the production database stores sensitive information about customers. The verification step results in a delay, as there is no procedure for restricting access to the production database.
[003] Most database management systems provide role-based access to the database. The sensitive information is encrypted to ensure its confidentiality. Such systems also filter queries to block unwanted queries, and queries are often modified during the execution phase. Masking techniques using query modification are also employed to provide safe query execution. Security policies have also been employed to restrict access to the database.
SUMMARY
[004] Before the present systems and methods are described, it is to be understood that this disclosure is not limited to the particular systems and methodologies described, as there can be multiple possible embodiments which are not expressly illustrated in the present disclosure. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the present application. This summary is provided to introduce concepts related to systems and methods for preserving privacy of data residing in a database, and the concepts are further described below in the detailed description. This summary is not intended to identify essential features of the disclosure nor is it intended for use in determining or limiting the scope of the disclosure.
[005] In one implementation, a system for preserving privacy of data residing in a database is disclosed. In one aspect, the system may comprise a processor and a memory coupled to the processor. The processor may execute a plurality of modules present in the memory. The plurality of modules may further comprise a db access and sensitive rule configuration module, a query input module, a query analysis module, and a data transformation module. The db access and sensitive rule configuration module enables an admin user to configure access privilege corresponding to each user of a plurality of users. In one aspect, the access privilege may be configured for accessing data stored in a database. The db access and sensitive rule configuration module may further enable the admin user to configure a plurality of rules to be applied on a plurality of datasets in the data. In one aspect, a rule may be applied based on the access privilege of a user responsible for executing a query on the database. In one aspect, the rule may indicate presence of sensitive data in a dataset. The db access and sensitive rule configuration module may further enable the admin user to configure masking policy to be applied on the dataset based upon the rule. The query input module may receive at least one query, from the user, to be executed on the database. The at least one query may be received in order to retrieve the dataset. The query analysis module may analyze the at least one query in order to determine presence of the sensitive data in the dataset based on the rule and the contextual information associated with the query. The data transformation module may transform the sensitive data by applying the masking policy on the dataset based on the rule, thereby preserving privacy of data residing in the database.
[006] In another implementation, a method for preserving privacy of data residing in a database is disclosed. In order to preserve privacy of data residing in the database, initially, an admin user may be enabled to configure access privilege corresponding to each user of a plurality of users. The access privilege may be configured for accessing data stored in a database. The admin user may further be enabled to configure a plurality of rules to be applied on a plurality of datasets in the data. In one aspect, a rule may be applied based on the access privilege of a user responsible for executing a query on the database. The rule may indicate presence of sensitive data in a dataset. The admin user may further be enabled to configure masking policy to be applied on the dataset based upon the rule. Upon configuration, at least one query to be executed on the database may be received from the user. The at least one query may be received in order to retrieve the dataset of the data stored in the database. Once the query is received, the at least one query may be analyzed in order to determine presence of the sensitive data in the dataset based on the rule and contextual information associated with the query. Subsequent to the analysis, the sensitive data may be transformed by applying the masking policy on the dataset based on the rule, thereby preserving privacy of data residing in the database. In one aspect, the aforementioned method for preserving the privacy of the data residing in the database is performed by a processor using programmed instructions stored in a memory.
[007] In yet another implementation, non-transitory computer readable medium embodying a program executable in a computing device for preserving privacy of data residing in a database is disclosed. The program may comprise a program code for enabling an admin user to configure access privilege corresponding to each user of a plurality of users, wherein the access privilege is configured for accessing data stored in a database, a plurality of rules to be applied on a plurality of datasets in the data, wherein a rule may be applied based on the access privilege of a user responsible for executing a query on the database, and wherein the rule indicates presence of sensitive data in a dataset, and masking policy to be applied on the dataset based upon the rule. The program may further comprise a program code for receiving at least one query, from the user, to be executed on the database, wherein the at least one query may be received in order to retrieve the dataset. The program may further comprise a program code for analyzing the at least one query in order to determine presence of the sensitive data in the dataset based on the rule and contextual information associated with the at least one query. The program may further comprise a program code for transforming the sensitive data by applying the masking policy configured on the dataset based on the rule, thereby preserving privacy of data residing in the database.
BRIEF DESCRIPTION OF DRAWINGS
[008] The foregoing summary, as well as the following detailed description of preferred embodiments, are better understood when read in conjunction with the appended drawing. For the purpose of illustrating the invention, there is shown in the drawing an exemplary construction of the invention, however, the invention is not limited to the specific methods and system illustrated.
[009] The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
[0010] Figure 1 illustrates a network implementation of a system for preserving privacy of data residing in a database, in accordance with an embodiment of the present disclosure.
[0011] Figure 2 illustrates the system, in accordance with an embodiment of the present disclosure.
[0012] Figure 3 illustrates a method for analyzing a query received, in accordance with an exemplary embodiment of the disclosure.
[0013] Figures 4, 5, and 6 illustrate a method for preserving the privacy of data residing in the database, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
[0014] Some embodiments of this disclosure, illustrating all its features, will now be discussed in detail. The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present disclosure, the exemplary systems and methods are now described. The disclosed embodiments are merely exemplary of the disclosure, which may be embodied in various forms.
[0015] Various modifications to the embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. However, one of ordinary skill in the art will readily recognize that the present disclosure is not intended to be limited to the embodiments illustrated, but is to be accorded the widest scope consistent with the principles and features described herein.
[0016] System(s) and method(s) for preserving privacy of data residing in a database are disclosed. It may be understood that various users access a production database storing sensitive data, such as salaries of employees, addresses, and contact numbers, which should not be disclosed to any user. In other words, some datasets stored in the production database should be hidden in order to preserve privacy, yet they are usually accessed when a user executes a query on the production database. In one aspect, the query is a Structured Query Language (SQL) query executed on the production database, and the database is a Relational Database Management System (RDBMS). Thus, in order to preserve the privacy of data residing in the production database, the present system enables an admin user to configure roles and privileges for a plurality of users responsible for accessing the database.
[0017] The present system and method enable the admin user to configure an access privilege corresponding to each user of the plurality of users. The access privilege may be configured for accessing the data stored in the database. The admin user may further be enabled to configure a plurality of rules to be applied on a plurality of datasets of the data. It may be understood that the plurality of datasets is a subset of the data stored in the database. In one aspect, a rule may indicate presence of sensitive data in a dataset, of the plurality of datasets, and the rule may be associated with a column, a table, or a schema. The admin user may further be enabled to configure a masking policy to be applied on the dataset based upon the rule.
[0018] It may be understood that whenever at least one query is received from the user, once the aforementioned configuration has been done by the admin user, the at least one query may be analyzed to determine presence of the sensitive data in the dataset retrieved upon executing the at least one query on the database. In one aspect, the at least one query may be analyzed by mapping each column, each table and each schema associated with the at least one query using a column list, a table list and a schema list extracted from the at least one query. After mapping, the rule, corresponding to the dataset retrieved from the column of the database, may be accessed. Upon accessing the rule, each column may be defined either as a sensitive column or a non-sensitive column based upon the rule. Based upon defining of the sensitive column, the query may be designated as a sensitive query in order to determine the presence of the sensitive data in the dataset retrieved upon executing the query on the database. In one aspect, the method(s) and the system(s) may facilitate identifying sensitive data in the dataset. The identification of the sensitive data may depend on the type of query and the contextual information associated with the query.
[0019] After the detection of the sensitive data, the sensitive data may be transformed by applying the masking policy on the dataset, thereby preserving the privacy of the data residing in the database. In one aspect, the masking policy applied on the dataset is one of a complete masking policy and a partial masking policy. Examples of the masking policy may include, but are not limited to, a Substitution policy, a Retention policy, a Fixed Replacement policy, and a Random policy. Thus, in this manner, the dataset may be masked to hide the sensitive data from the plurality of users unauthorized to access the data retrieved upon executing the query on the database.
[0020] While aspects of described system and method for preserving the privacy of data residing in a database may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system.
[0021] Referring now to Figure 1, a network implementation 100 of a system, hereinafter referred to as a system 102, for preserving privacy of data residing in a database is disclosed. In one embodiment, the system 102 initially enables an admin user to configure access privilege corresponding to each user of a plurality of users. The access privilege may be configured for accessing data stored in a database. The admin user may further be enabled to configure a plurality of rules to be applied on a plurality of datasets in the data. In one aspect, a rule is applied based on the access privilege of a user responsible for executing a query on the database. The rule indicates presence of sensitive data in a dataset. The admin user may further be enabled to configure masking policy to be applied on the dataset based upon the rule. Upon configuration, the system 102 receives at least one query to be executed on the database. The at least one query may be received in order to retrieve the dataset. Once the at least one query is received, the system 102 analyzes the at least one query in order to determine presence of the sensitive data in the dataset based on the rule and contextual information associated with the at least one query. Subsequent to the analysis, the system 102 transforms the sensitive data by applying the masking policy configured on the dataset based on the rule, thereby preserving the privacy of the data residing in the database.
[0022] Although the present disclosure is explained considering that the system 102 is implemented on a server, it may be understood that the system 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and a cloud-based computing environment. It will be understood that the system 102 may be accessed by multiple users through one or more user devices 104-1, 104-2…104-N, collectively referred to as user devices 104 hereinafter, or applications residing on the user devices 104. In one implementation, the system 102 may comprise the cloud-based computing environment in which a user may operate individual computing systems configured to execute remotely located applications. Examples of the user devices 104 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user devices 104 are communicatively coupled to the system 102 through a network 106.
[0023] In one implementation, the network 106 may be a wireless network, a wired network or a combination thereof. The network 106 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network 106 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[0024] Referring now to Figure 2, the system 102 is illustrated in accordance with an embodiment of the present disclosure. In one embodiment, the system 102 may include at least one processor 202, an input/output (I/O) interface 204, and a memory 206. The at least one processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor 202 is configured to fetch and execute computer-readable instructions stored in the memory 206.
[0025] The I/O interface 204 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 204 may allow the system 102 to interact with the user directly or through the user devices 104. Further, the I/O interface 204 may enable the system 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 204 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface 204 may include one or more ports for connecting a number of devices to one another or to another server.
[0026] The memory 206 may include any computer-readable medium and computer program product known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 206 may include modules 208 and data 210.
[0027] The modules 208 include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the modules 208 may include a db access and sensitive rule configuration module 212, a query input module 214, a query analysis module 216, a sensitive/rule configuration intelligence module 217, a data transformation module 218, a query filtration module 220, a sensitive region finder module 222, and other modules 224. The other modules 224 may include programs or coded instructions that supplement applications and functions of the system 102. The modules 208 described herein may be implemented as software modules that may be executed in the cloud-based computing environment of the system 102.
[0028] The data 210, amongst other things, serves as a repository for storing data processed, received, and generated by one or more of the modules 208. The data 210 may also include a database 226 and other data 228. The other data 228 may include data generated as a result of the execution of one or more modules in the other modules 224.
[0029] In one implementation, at first, a user may use the user devices 104 to access the system 102 via the I/O interface 204. The user may register using the I/O interface 204 in order to use the system 102. In one aspect, the user may access the I/O interface 204 of the system 102 for preserving privacy of data residing in a database 226. In order to preserve the privacy, the system 102 may employ the plurality of modules, i.e., the db access and sensitive rule configuration module 212, the query input module 214, the query analysis module 216, the sensitive/rule configuration intelligence module 217, the data transformation module 218, the query filtration module 220, and the sensitive region finder module 222. The detailed working of the plurality of modules is described below.
[0030] Further referring to Figure 2, it may be understood that various users access a production database storing sensitive data, such as salaries of employees, addresses, and contact numbers. In other words, some datasets stored in the production database should be hidden in order to preserve privacy, yet they are usually accessed when a user executes a query on the production database. In one aspect, the query is a Structured Query Language (SQL) query executed on the database 226, and the database is a Relational Database Management System (RDBMS). In order to preserve the privacy of the data, initially, the db access and sensitive rule configuration module 212 facilitates establishing a connection to the database 226, for which the system 102 acts as a proxy. In one aspect, the db access and sensitive rule configuration module 212 further specifies the number of connections to the database 226 to be kept in a connection pool, so that these connections may be reused when future requests require a connection with the database 226.
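The connection pooling described above can be sketched as follows. This is a minimal illustration only, assuming Python with the standard-library sqlite3 and queue modules; the class name ConnectionPool, the pool size, and the in-memory database are hypothetical and are not part of the disclosure.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical fixed-size pool of reusable database connections."""

    def __init__(self, dsn: str, size: int) -> None:
        self._idle: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a pooled connection be used by any thread.
            self._idle.put(sqlite3.connect(dsn, check_same_thread=False))

    @contextmanager
    def connection(self):
        conn = self._idle.get()       # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)      # hand it back for reuse instead of closing it


# Assumed setup: two pooled connections to an in-memory database.
pool = ConnectionPool(":memory:", size=2)
with pool.connection() as conn:
    value = conn.execute("SELECT 1").fetchone()[0]
```

After the with block exits, the connection is back in the pool and is handed out again on the next request.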
[0031] In order to establish the connection with the database 226, initially, the db access and sensitive rule configuration module 212 enables an admin user to configure access privilege for each user of a plurality of users. The access privilege may be configured for accessing data stored in the database 226. It may be understood that the plurality of users may have different privileges when it comes to accessing the database 226. In one embodiment, some users, of the plurality of users, may be assigned a privilege to access the entire data present in the database 226, whereas some other users, of the plurality of users, may be assigned a privilege to access partial data present in the database 226. It may be understood that the data may be accessed upon executing the query on the database 226. In another embodiment, one or more users, of the plurality of users, may also be denied access to the data present in the database 226. In yet another embodiment, the db access and sensitive rule configuration module 212 groups one or more users, of the plurality of users, responsible for executing the query under a specific category of access privilege.
[0032] The db access and sensitive rule configuration module 212 may further enable the admin user to configure a plurality of rules to be applied on a plurality of datasets in the data. In one aspect, a rule, of the plurality of rules, may be applied based on the access privilege of a user responsible for executing the query on the database 226. The rule indicates presence of sensitive data in a dataset of the plurality of datasets. It may be understood that the dataset is a subset of the data stored in the database 226. In one embodiment, the rule may be associated with a column, a table, or a schema, wherein the column, the table, and the schema are associated with the query. It may be understood that the rule may be applied on the response received upon executing the query on the database 226. It may be understood that when the one or more users grouped under the specific category of access privilege execute the query on the database 226, the rule may be applied on the dataset of the data retrieved in order to preserve the privacy of the data residing in the database 226. In one embodiment, the privacy of the data may be preserved by applying a masking policy on the dataset. In order to apply the masking policy, the db access and sensitive rule configuration module 212 may further enable the admin user to configure the masking policy to be applied on the dataset based upon the rule.
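One possible in-memory representation of this configuration is sketched below in Python. The names PrivacyConfig, SensitiveRule, and rules_for, the group labels, and the example users are all hypothetical and are chosen only to illustrate how privileges, rules, and masking policies could be tied together.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

ColumnRef = Tuple[str, str, str]  # (column, table, schema)


@dataclass(frozen=True)
class SensitiveRule:
    """Marks one column as sensitive and names the masking policy to apply."""
    column: ColumnRef
    masking_policy: str  # e.g. "substitution", "retention"


@dataclass
class PrivacyConfig:
    # user name -> access-privilege group (assumed grouping scheme)
    user_groups: Dict[str, str] = field(default_factory=dict)
    # access-privilege group -> rules applied to members of that group
    rules: Dict[str, List[SensitiveRule]] = field(default_factory=dict)

    def rules_for(self, user: str) -> Optional[List[SensitiveRule]]:
        """Return the rules for the user's group; None means access is denied."""
        group = self.user_groups.get(user)
        if group is None:
            return None
        return self.rules.get(group, [])


# Assumed example: "alice" sees the entire data, "bob" sees masked account numbers.
config = PrivacyConfig(
    user_groups={"alice": "full", "bob": "partial"},
    rules={"partial": [SensitiveRule(("account_number", "account", "bank"), "substitution")]},
)
```

In this sketch a user with no configured group is treated as denied, which corresponds to the embodiment in which some users are denied access altogether.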
[0033] In addition to the aforementioned configuration, the db access and sensitive rule configuration module 212 may further enable the admin user to configure one or more filtering rules, wherein each filtering rule may restrict the query from execution. Upon configuration, the one or more filtering rules are then applied by the query filtration module 220 on the query, thereby restricting the execution of the query and access to the database 226. In one aspect, whenever a new query is received for execution, the new query is validated against the one or more filtering rules. If none of the one or more filtering rules is configured for the new query, the new query is executed on the database 226. On the other hand, if at least one of the one or more filtering rules is configured for the new query, the new query is restricted from execution on the database 226.
[0034] Once the aforementioned configuration has been done by the admin user, at least one query is received from the user by the query input module 214 for execution. Upon receiving the at least one query, the at least one query may be analyzed by the query analysis module 216. In one aspect, the sensitive region finder module 222 determines the presence of the sensitive data in the dataset retrieved upon executing the at least one query on the database 226.
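The validation of a new query against the filtering rules can be sketched as below. The disclosure does not specify how a filtering rule is expressed; representing each rule as a regular expression, and the two example rules themselves, are assumptions made purely for illustration.

```python
import re
from typing import Iterable

# Hypothetical filtering rules: each regular expression describes a class of
# queries that the admin user wants to keep away from the database.
FILTERING_RULES = [
    re.compile(r"^\s*(DROP|TRUNCATE)\b", re.IGNORECASE),
    # DELETE with no WHERE clause
    re.compile(r"^\s*DELETE\s+FROM\s+\w+\s*;?\s*$", re.IGNORECASE),
]


def is_allowed(query: str, rules: Iterable[re.Pattern] = FILTERING_RULES) -> bool:
    """Return True only if no filtering rule matches the query."""
    return not any(rule.search(query) for rule in rules)
```

A query for which is_allowed returns False would be restricted from execution; all other queries would be passed on for analysis.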
[0035] Figure 3 is a flow diagram illustrating the functioning of the query analysis module 216. In one aspect, the at least one query is analyzed by the query analysis module 216 in a plurality of steps as mentioned below. The query analysis module 216 reads the query received from the user (as indicated at step 302). Upon receiving the at least one query, the query analysis module 216 extracts a SELECTION list and a FROM list from the at least one query (as indicated at steps 304 and 306 respectively). The query analysis module 216 further compiles a COLUMN list from the SELECTION list (as indicated at step 308). The query analysis module 216 further compiles a TABLE list from the FROM list and the SELECTION list (as indicated at step 310).
[0036] In the next step, the query analysis module 216 checks if a schema is selected by the user (as indicated at step 312). In one embodiment, if the schema is not selected by the user (as indicated at step 314), the query analysis module 216 compiles a SCHEMA list from the database 226 and from the at least one query. On the other hand, if the schema is selected by the user (as indicated at step 316), the query analysis module 216 parses the expressions in the COLUMN list in order to find out the actual columns. In one embodiment, the sensitive region finder module 222 further determines presence of the sensitive data in the dataset. Upon parsing the expressions in the COLUMN list, the query analysis module 216 maps the column, table and schema associated with the at least one query with the column list, the table list and the schema list extracted from the at least one query (as indicated at step 318). Further, the query analysis module 216 accesses the rule of the plurality of rules (as indicated at step 320) in order to define each column either as a sensitive column or a non-sensitive column (as indicated at step 322).
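Steps 304 to 310 can be sketched as below for very simple queries. This is only an illustration: it uses a regular expression rather than a full SQL parser, it handles a single SELECT with a comma-separated FROM list and an optional WHERE clause, it does not parse expressions in the SELECTION list, and it assumes that an unqualified column belongs to the first table. The function names extract_lists and compile_lists are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def extract_lists(query: str) -> Tuple[List[str], List[str]]:
    """Split a simple SELECT into its SELECTION list and FROM list."""
    match = re.match(
        r"\s*SELECT\s+(?P<select>.+?)\s+FROM\s+(?P<from>.+?)(?:\s+WHERE\b.*)?\s*;?\s*$",
        query,
        re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        raise ValueError("only simple SELECT ... FROM ... queries are handled")
    selection = [item.strip() for item in match.group("select").split(",")]
    from_list = [item.strip() for item in match.group("from").split(",")]
    return selection, from_list


def compile_lists(
    selection: List[str], from_list: List[str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Compile the COLUMN list as (column, table) pairs and the TABLE list."""
    aliases: Dict[str, str] = {}
    table_list: List[str] = []
    for entry in from_list:
        parts = entry.split()
        table = parts[0]
        aliases[parts[-1]] = table   # last token is the alias, or the table itself
        table_list.append(table)
    column_list: List[Tuple[str, str]] = []
    for item in selection:
        qualifier, _, column = item.rpartition(".")
        # Assumption: an unqualified column belongs to the first table listed.
        table = aliases.get(qualifier, qualifier) if qualifier else table_list[0]
        column_list.append((column, table))
    return column_list, table_list


selection, from_list = extract_lists(
    "SELECT account_number, a.address FROM bank.account a WHERE account_id = 1;"
)
column_list, table_list = compile_lists(selection, from_list)
```

A production implementation would more plausibly rely on a complete SQL parser, since joins, subqueries, and expressions fall outside what this sketch recognizes.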
[0037] The at least one query is called sensitive if there exists a tuple ((C, TN, SN), True) in the Query_sensitivity set. The mapping module performs the following:
[0038] Mapping_Col_C = (Col_C, Table_T, Schema_S), which means that Col_C is part of the table Table_T, which in turn is part of the schema Schema_S.
Query_sensitivity = Mappings × {True, False}
[0039] In one embodiment, the sensitivity of each column may be defined by:
If Mapping_Col_C is part of the rule specification
Then Mapping_Col_C is marked as sensitive.
Generate pair ((Col_C, Table_T, Schema_S), True).
Add generated pair in the Query_sensitivity set.
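The marking of columns and the resulting test for a sensitive query can be sketched in Python as follows. The function names are hypothetical, and recording non-sensitive columns with the flag False is an assumption drawn from the definition of Query_sensitivity as a set of (mapping, flag) pairs.

```python
from typing import Iterable, Set, Tuple

Mapping = Tuple[str, str, str]  # (Col_C, Table_T, Schema_S)


def build_query_sensitivity(
    mappings: Iterable[Mapping], rule_spec: Set[Mapping]
) -> Set[Tuple[Mapping, bool]]:
    """Pair every column mapping with True if the rule specification lists it."""
    return {(mapping, mapping in rule_spec) for mapping in mappings}


def is_sensitive_query(query_sensitivity: Set[Tuple[Mapping, bool]]) -> bool:
    """A query is sensitive if at least one of its mappings is flagged True."""
    return any(flag for _, flag in query_sensitivity)


# Assumed example: only account_number is covered by the rule specification.
rule_spec = {("account_number", "account", "bank")}
query_sensitivity = build_query_sensitivity(
    [("account_number", "account", "bank"), ("address", "account", "bank")],
    rule_spec,
)
```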
[0040] Subsequent to the defining of sensitivity corresponding to each column, the query analysis module 216 designates the at least one query as a sensitive query based upon the defining of the sensitive column in order to determine the presence of the sensitive data in the dataset retrieved upon executing the at least one query on the database 226 (as indicated at step 324). Subsequent to the defining of the sensitive query, the sensitive region finder module 222 may facilitate identifying sensitive data in the dataset. The identification of the sensitive data may depend on the type of query and the contextual information associated with the query.
[0041] In one embodiment, the sensitive region finder module 222 identifies the sensitive data in the result returned after the execution of the query, which will be used by the data transformation module 218 in order to hide the sensitive data. After the detection of the sensitive data, the data transformation module 218 transforms the sensitive data by applying the masking policy on the dataset based on the rule, thereby preserving the privacy of the data residing in the database 226. In one embodiment, the masking policy applied on the dataset is one of a complete masking policy and a partial masking policy. Examples of the masking policy may include, but are not limited to, a Substitution policy, a Retention policy, a Fixed Replacement policy, and a Random policy. In one example, when the masking policy applied is the Substitution policy, a specified number of digits or characters, counted either from the start or from the end, are replaced with a masking character. When the masking policy applied is the Retention policy, all digits or characters other than the specified number of digits or characters, counted either from the start or from the end, are replaced with the masking character. When the masking policy applied is the Fixed Replacement policy, a column value is replaced with a fixed string. When the masking policy applied is the Random policy, a random number is generated and the column value is replaced with the random number.
[0042] After applying the masking policy, the sensitive data present in the dataset may be masked by at least one of the masking policies and then displayed to the user. It may be observed that the at least one query may sometimes surpass (that is, circumvent) the masking policy, thereby displaying the sensitive data to the user. In one embodiment, the at least one query surpasses the masking policy when the masking policy applied is the partial masking policy. In another embodiment, the at least one query may also surpass the masking policy when the masking policy applied is the complete masking policy, as shown in scenario 1 mentioned below. Some of the scenarios where the at least one query surpasses the partial masking policy or the complete masking policy are mentioned below:
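The four example masking policies can be sketched in Python as follows. The function names, the default masking character "X", the default fixed string, and the choice to generate a random digit string of the same length as the original value are assumptions made for illustration only.

```python
import random
from typing import Optional


def substitution(value: str, n: int, from_start: bool = True, mask: str = "X") -> str:
    """Replace n characters, counted from the start or the end, with the mask."""
    n = min(n, len(value))
    if from_start:
        return mask * n + value[n:]
    return value[: len(value) - n] + mask * n


def retention(value: str, n: int, from_start: bool = True, mask: str = "X") -> str:
    """Keep n characters, counted from the start or the end, and mask the rest."""
    n = min(n, len(value))
    if from_start:
        return value[:n] + mask * (len(value) - n)
    return mask * (len(value) - n) + value[len(value) - n:]


def fixed_replacement(value: str, replacement: str = "****") -> str:
    """Replace the whole column value with a fixed string."""
    return replacement


def random_policy(value: str, rng: Optional[random.Random] = None) -> str:
    """Replace the column value with random digits of the same length (assumed)."""
    rng = rng or random.Random()
    return "".join(rng.choice("0123456789") for _ in value)
```

For example, substitution("234334433234", 4) masks the first four digits, while retention("234334433234", 4, from_start=False) keeps only the last four.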
[0043] Scenario 1: Admin user has created complete/partial/random masking Technique on Account_number (Number Data type)
[0044] Table: Account
| Account_id | Account_number | Account_type | Address |
|---|---|---|---|
| 1 | 234334433234 | Savings | Pune |
| 2 | 898403930334 | Current | Rajkot, Gujrat |
[0045] Steps:
[0046] 1) Assume a number (N) whose length equals the length of the value in the sensitive column. Let ‘i’ be initialized with the length of the assumed number (N).
[0047] 2) Subtract the account_number value of the sensitive column from the number (N), and assign the length of the result (the resultant length) to ‘i’.
[0048] 3) Increment or decrement the i-th digit of the number (N).
[0049] 4) Repeat steps 2 and 3 until the resultant length becomes zero.
[0050] 5) When the resultant length is zero, the number (N) equals the account_number, so the account_number may be guessed.
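The steps of Scenario 1 can be sketched as a small simulation. It assumes that the masked result of the subtraction still reveals how many digits it has, which is one possible reading of the "resultant length". The names `leaked_length` and `recover_account_number` are hypothetical, and the starting guess is chosen as the smallest number of the given length so that only increments are needed.

```python
def leaked_length(guess: int, secret: int) -> int:
    """Simulated query result: digit count of the difference, 0 when equal.

    Assumption: masking hides the digits but preserves the length.
    """
    difference = abs(guess - secret)
    return len(str(difference)) if difference else 0


def recover_account_number(length: int, oracle) -> int:
    """Recover a value of `length` digits using only the leaked lengths."""
    # Step 1: smallest number with `length` digits, so the secret is never
    # below the guess and incrementing is always the right direction.
    guess = 10 ** (length - 1)
    while True:
        i = oracle(guess)      # step 2: resultant length
        if i == 0:             # step 4: stop when the length is zero
            return guess       # step 5: the guess equals the secret
        # Step 3: increment the i-th digit (counted from the right).
        # A difference with i digits is at least 10**(i - 1), so this
        # increment never overshoots the secret.
        guess += 10 ** (i - 1)


secret = 234334433234  # sample value from the Account table
found = recover_account_number(12, lambda g: leaked_length(g, secret))
print(found == secret)
```

Each digit position needs at most nine increments, so a twelve-digit value is recovered in roughly a hundred queries in this simulation.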
[0051] Scenario 2: The admin user has created a partial masking technique on Account_type (varchar data type).
[0052] 1) Consider the masking technique: substitute the first ‘n’ characters of the Account_type with ‘X’ in order to mask the Account_type.
[0053] Steps:
[0054] i) Assume a string whose length equals ‘n’ characters.
[0055] ii) Using the query input module 214, fetch the sensitive column with the string concatenated to its values as a prefix, so that the mask falls on the string rather than on the Account_type.
[0056] iii) The unmasked characters will be the Account_type.
[0057] 2) Consider the masking technique: substitute the last ‘n’ characters of the Account_type with ‘X’ in order to mask the Account_type.
[0058] Steps:
[0059] i) Assume a string whose length equals ‘n’ characters.
[0060] ii) Using the query input module 214, fetch the sensitive column with the string concatenated to its values as a suffix, so that the mask falls on the string rather than on the Account_type.
[0061] iii) The unmasked characters will be the Account_type.
[0062] 3) Consider the masking technique: retain the last ‘n’ characters of the Account_type and mask the remaining characters with ‘X’.
[0063] i) Consider the length of the Account_type as ‘l’, together with the unmasked characters.
[0064] ii) Using the query input module 214, fetch ‘n’ characters of the Account_type, starting from ‘n1’ and ending at ‘n2’, by using the ‘substring’ function,
[0065] wherein ‘n1’ equals i*n, and wherein ‘n2’ equals j*n,
[0066] where ‘i’ is 0, 1, 2, 3, 4, … and where ‘j’ is 1, 2, 3, 4, …
[0067] iii) Repeat step ii while ‘n2’ is less than or equal to ‘l’.
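The three techniques of Scenario 2 can be simulated as follows. This is only a sketch: it assumes that the mask is applied to the result of the expression the user submits rather than to the stored column, and the helper names, the padding character '?' and the sample value are illustrative.

```python
def mask_first(value: str, n: int, mask_char: str = "X") -> str:
    """Substitute the first n characters with the masking character."""
    n = min(n, len(value))
    return mask_char * n + value[n:]


def mask_last(value: str, n: int, mask_char: str = "X") -> str:
    """Substitute the last n characters with the masking character."""
    n = min(n, len(value))
    return value[: len(value) - n] + mask_char * n


def retain_last(value: str, n: int, mask_char: str = "X") -> str:
    """Retain the last n characters and mask everything before them."""
    hidden = max(len(value) - n, 0)
    return mask_char * hidden + value[hidden:]


account_type = "Savings"   # sensitive value held by the database
n = 4
padding = "?" * n          # string of length n chosen by the user

# Technique 1: prefix the column so the mask lands on the padding.
leak_prefix = mask_first(padding + account_type, n)[n:]

# Technique 2: suffix the column so the mask lands on the padding.
leak_suffix = mask_last(account_type + padding, n)[:-n]

# Technique 3: fetch windows of n characters with a substring function;
# each window is no longer than n, so "retain last n" hides nothing.
windows = [account_type[start:start + n]
           for start in range(0, len(account_type), n)]
leak_substring = "".join(retain_last(window, n) for window in windows)

print(leak_prefix, leak_suffix, leak_substring)
```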
[0068] Scenario 3: Arithmetic operations that break partial masking techniques
[0069] a) SQL function: Add (this can be used to re-identify sensitive data). Method: substitute the first ‘n’ characters of the Account_number with ‘X’
[0070] Steps:
[0071] i) Consider the length of the Account_number as L, the unmasked digits of the Account_number as (UD), the length of the unmasked digits as (UDL) and the length of the masked digits of the Account_number as (MDL).
[0072] ii) Consider a number (AN) such that $UDL = L - MDL$ and $AN = 10^{2L - UDL - 1}$.
[0073] iii) Using the query input module 214, fetch the Account_number by adding AN to the Account_number.
[0074] iv) The unmasked digits will be the Account_number.
[0075] b) Function: Multiplication. Method: substitute the last ‘n’ digits of the Account_number with ‘X’
[0076] i) Consider the length of the Account_number as L, the unmasked digits of the Account_number as (UD), the length of the unmasked digits as (UDL) and the length of the masked digits of the Account_number as (MDL).
[0077] ii) Consider a number (AN) such that $AN = 10^{MDL}$.
[0078] iii) Using the query input module 214, fetch the Account_number by multiplying the Account_number by AN.
[0079] iv) The unmasked digits will be the Account_number.
[0080] c) Function: Modulo. Method: retain the first ‘n’ digits of the Account_number and mask the remaining digits of the Account_number with ‘X’
[0081] Steps:
[0082] i) Consider the length of the Account_number as L, the unmasked digits of the Account_number as (UD), the length of the unmasked digits as (UDL) and the length of the masked digits of the Account_number as (MDL).
[0083] ii) Consider a number (AN) such that $AN = 10^{n}$, $n = (L - UDL \cdot i) - 1$,
[0084] where i = 0, 1, 2, 3, …, n.
[0085] iii) Using the query input module 214, fetch the Account_number by applying the modulo operation with AN. It may be understood that upon fetching the Account_number, one or more digits corresponding to the Account_number may be unmasked.
[0086] iv) Repeat step ii and step iii till n > L.
[0087] v) Concatenate the unmasked digits to obtain the Account_number.
[0088] d) Function: Division, subtraction. Method: retain the last 4 digits of the Account_number and mask the remaining digits with ‘X’
[0089] Steps:
[0090] i) Consider the length of the Account_number as L, the unmasked digits of the Account_number recovered so far as (MUD), the length of the unmasked digits as (UDL) and the length of the masked digits of the Account_number as (MDL).
[0091] ii) Consider a number AN such that, with MUD initially 0, $AN = 10^{n}$, $n = UDL \cdot i$,
[0092] where i = 0, 1, 2, 3, …
[0093] iii) Using the query input module 214, fetch the account_number after subtracting MUD from the account_number to derive a subtracted account_number, and then dividing the subtracted account_number by AN to derive UD. It may be understood that upon fetching the Account_number, one or more digits corresponding to the Account_number may be unmasked, such that $MUD = UD \cdot 10^{n} + MUD$.
[0094] iv) Repeat step ii and step iii till L > n.
[0095] v) MUD will be the Account_number.
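The addition, multiplication and division/subtraction variants of Scenario 3 can be simulated as below. This is a sketch under stated assumptions: the mask is applied to the result of the submitted expression, the length L is known from the masked output, division is integer division, and the loop in the division variant runs while n is smaller than L. All function names are hypothetical. The modulo variant is not simulated here.

```python
def mask_first(digits: str, n: int) -> str:
    n = min(n, len(digits))
    return "X" * n + digits[n:]


def mask_last(digits: str, n: int) -> str:
    n = min(n, len(digits))
    return digits[: len(digits) - n] + "X" * n


def retain_last(digits: str, n: int) -> str:
    hidden = max(len(digits) - n, 0)
    return "X" * hidden + digits[hidden:]


def make_query_runner(secret: int, mask):
    """Evaluate an expression on the hidden value, return the masked result."""
    def run_query(expression) -> str:
        return mask(str(expression(secret)))
    return run_query


ACCOUNT = 234334433234      # hidden Account_number (sample table value)
L = len(str(ACCOUNT))       # length, visible from any masked result


def attack_add(run_query, mdl: int) -> int:
    udl = L - mdl
    an = 10 ** (2 * L - udl - 1)           # AN = 10^(2L - UDL - 1)
    shown = run_query(lambda acct: acct + an)
    return int(shown[mdl:])                # mask covered only AN's digits


def attack_multiply(run_query, mdl: int) -> int:
    an = 10 ** mdl                         # AN = 10^MDL
    shown = run_query(lambda acct: acct * an)
    return int(shown[:-mdl])               # mask covered only the zeros


def attack_divide(run_query, udl: int) -> int:
    mud, i = 0, 0
    while udl * i < L:
        n = udl * i                        # AN = 10^n
        shown = run_query(lambda acct: (acct - mud) // 10 ** n)
        ud = int(shown[-udl:])             # digits left visible by the mask
        mud = ud * 10 ** n + mud           # MUD = UD * 10^n + MUD
        i += 1
    return mud


print(attack_add(make_query_runner(ACCOUNT, lambda d: mask_first(d, 4)), 4))
print(attack_multiply(make_query_runner(ACCOUNT, lambda d: mask_last(d, 4)), 4))
print(attack_divide(make_query_runner(ACCOUNT, lambda d: retain_last(d, 4)), 4))
```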
[0096] In order to avoid the aforementioned scenarios, the sensitive/rule configuration intelligence module 217 generates one or more sub-queries from the at least one query. After the one or more sub-queries are generated, one or more datasets retrieved upon executing each of the one or more sub-queries may be previewed by the user. Further, the sensitive/rule configuration intelligence module 217 identifies at least one sub-query, from the one or more sub-queries, for which the dataset retrieved breaches the partial masking policy. The at least one sub-query may breach the masking policy based on presence of one or more parameters in the at least one sub-query. The one or more parameters may include, but are not limited to, concat, replace, substring, sum, avg, min, max, count, addition, subtraction, multiplication and modulo. Based on the identification of the at least one sub-query, the db access and sensitive rule configuration module 212 further enables the admin user to re-configure the masking policy, to the complete masking policy, to be applied on the dataset so that the dataset retrieved may be completely masked. Thus, in this manner, the system 102 facilitates preserving the privacy of the data residing in the database 226.
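A simple way to picture the parameter check is a textual scan of the query. The sketch below is a naive heuristic, not the disclosed module: the regular expressions, the function name `find_risky_parameters` and the special handling of `SELECT *` are assumptions, and a production implementation would more likely work on a parsed query.

```python
import re

# Parameters named in the description, split into functions and operators.
FUNCTION_PARAMETERS = ("concat", "replace", "substring",
                       "sum", "avg", "min", "max", "count")
OPERATOR_PARAMETERS = {"+": "addition", "-": "subtraction",
                       "*": "multiplication", "%": "modulo"}

_FUNCTION_RE = re.compile(
    r"\b(" + "|".join(FUNCTION_PARAMETERS) + r")\s*\(", re.IGNORECASE)
# An operator counts only when it sits between two operands.
_OPERATOR_RE = re.compile(r"[\w)]\s*([-+*%])\s*[\w(]")
_SELECT_STAR_RE = re.compile(r"\bselect\s+\*", re.IGNORECASE)


def find_risky_parameters(query: str) -> set:
    """Return the names of parameters that may breach a partial mask."""
    found = {m.group(1).lower() for m in _FUNCTION_RE.finditer(query)}
    # "SELECT *" is a column wildcard, not a multiplication.
    without_star = _SELECT_STAR_RE.sub("select", query)
    found |= {OPERATOR_PARAMETERS[m.group(1)]
              for m in _OPERATOR_RE.finditer(without_star)}
    return found


query = "SELECT substring(Account_type, 1, 4), Account_number % 10000 FROM Account"
risky = find_risky_parameters(query)
# A non-empty result is the cue for the admin user to consider
# re-configuring the policy to complete masking.
print(sorted(risky))
```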
[0097] In one embodiment, a skill calculator (not shown in Figure 2) uses a model with three parameters for dynamic assignment of activities. The parameters used by the model comprise consideration of successful operations, consideration of failed operations and consideration of the complexity of the operation.
[0098] The successful operations parameter considers the number of successful operations executed by the user, the failed operations parameter considers the number of failed operations executed by the user, and the complexity parameter considers the complexity of the query.
[0099] The skill calculator is configured to assign one or more activities to users and to constrain the activities of the users in order to prevent harmful operations on the database.
[00100] While assigning one or more activities to users, the Category of the users is considered. A Category is a set of users, and the Category p is the set of users with skill p. The system 100 further explains a category of the scheme.
[00101] After the activities are assigned to users, the score calculation function (S) is invoked. While evaluating the score calculation function, the system 102 considers the points contributed by the successful operations and the failed operations. The system 102 applies either the upgrade function or the downgrade function depending on the score assigned after calculation of S.
[00102] $S = \text{Complexity\_level} \times \text{Successful\_op} - \text{Complexity\_level} \times \text{Failed\_op}$
[00103] Upgrade function (U): upgrades the user from the Category p to the Category q when the score of the user, as returned by score( ), satisfies the Max_Threshold condition. Downgrade function (D): downgrades the user from the Category p to the Category q when the score of the user, as returned by score( ), satisfies the Min_Threshold condition.
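The score calculation and the upgrade/downgrade decision can be sketched as follows. The comparison directions (upgrade when the score reaches Max_Threshold, downgrade when it falls to Min_Threshold), the one-step category change and the integer category bounds are assumptions made for illustration, since the description does not spell them out.

```python
def score(complexity_level: int, successful_ops: int, failed_ops: int) -> int:
    """S = Complexity_level * Successful_op - Complexity_level * Failed_op."""
    return complexity_level * successful_ops - complexity_level * failed_ops


def adjust_category(category: int, user_score: int,
                    min_threshold: int, max_threshold: int,
                    lowest: int = 0, highest: int = 3) -> int:
    """Apply the upgrade or downgrade function (assumed thresholds)."""
    if user_score >= max_threshold:        # upgrade function U (assumed >=)
        return min(category + 1, highest)
    if user_score <= min_threshold:        # downgrade function D (assumed <=)
        return max(category - 1, lowest)
    return category                        # score between the thresholds


user_score = score(complexity_level=3, successful_ops=10, failed_ops=2)
print(user_score, adjust_category(1, user_score, min_threshold=-5, max_threshold=20))
```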
[00104] Referring now to Figure 4, a method 400 for preserving privacy of data residing in a database is shown, in accordance with an embodiment of the present disclosure. The method 400 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 400 may be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
[00105] The order in which the method 400 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 400 or alternate methods. Additionally, individual blocks may be deleted from the method 400 without departing from the spirit and scope of the disclosure described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method 400 may be considered to be implemented in the above-described system 102.
[00106] At block 402, an admin user may be enabled to configure access privilege corresponding to each user of a plurality of users. In one aspect, the access privilege may be configured for accessing data stored in a database. The admin user may further be enabled to configure a plurality of rules to be applied on a plurality of datasets in the data. In one aspect, a rule may be applied based on the access privilege of a user responsible for executing a query on the database. The rule indicates presence of sensitive data in a dataset. The admin user may further be enabled to configure masking policy to be applied on the dataset based upon the rule. In one implementation, the admin user may be enabled by the db access and sensitive rule configuration module 212.
[00107] At block 404, at least one query may be received, from the user, to be executed on the database, wherein the at least one query is received in order to retrieve the dataset. In one implementation, the at least one query may be received by the query input module 214.
[00108] At block 406, the at least one query may be analyzed in order to determine presence of the sensitive data in the dataset based on the rule and contextual information associated with the at least one query. In one implementation, the at least one query may be analyzed by the query analysis module 216.
[00109] At block 408, the sensitive data may be transformed by applying the masking policy configured on the dataset based on the rule thereby preserving privacy of data residing in the database. In one implementation, the sensitive data may be transformed by the data transformation module 218.
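Blocks 402 to 408 can be pictured end to end with a small in-memory sketch. The class name `PrivacyGateway`, the privilege labels "full" and "masked", and the representation of a query as a list of column names are all hypothetical simplifications; the disclosure does not prescribe this structure.

```python
class PrivacyGateway:
    """Toy stand-in for the configuration, analysis and transformation steps."""

    def __init__(self, table):
        self.table = table          # list of row dictionaries
        self.privileges = {}        # user -> "full" or "masked"
        self.rules = {}             # sensitive column -> masking function

    def configure(self, privileges, rules):
        """Block 402: the admin user sets privileges, rules and policies."""
        self.privileges = dict(privileges)
        self.rules = dict(rules)

    def execute(self, user, columns):
        """Blocks 404 to 408: receive, analyze and transform."""
        privilege = self.privileges.get(user)
        if privilege is None:
            raise PermissionError(f"no access privilege configured for {user}")
        # Block 406: the rule marks which requested columns are sensitive,
        # and it is applied according to the user's access privilege.
        sensitive = set()
        if privilege != "full":
            sensitive = {c for c in columns if c in self.rules}
        # Block 408: apply the configured masking policy to sensitive columns.
        return [
            {c: self.rules[c](str(row[c])) if c in sensitive else row[c]
             for c in columns}
            for row in self.table
        ]


accounts = [
    {"Account_id": 1, "Account_number": 234334433234, "Account_type": "Savings"},
    {"Account_id": 2, "Account_number": 898403930334, "Account_type": "Current"},
]
gateway = PrivacyGateway(accounts)
gateway.configure(
    privileges={"analyst": "masked", "auditor": "full"},
    rules={"Account_number": lambda v: "X" * (len(v) - 4) + v[-4:]},
)
print(gateway.execute("analyst", ["Account_id", "Account_number"]))
```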
[00110] Referring now to Figure 5, a method 406 for analyzing the at least one query in order to determine presence of the sensitive data is shown, in accordance with an embodiment of the present subject matter.
[00111] At block 502, column, table and schema associated with the at least one query may be mapped with the column list, the table list and the schema list extracted from the at least one query. In one implementation, the column, the table and the schema may be mapped by the query analysis module 216.
[00112] At block 504, the rule of the plurality of rules may be accessed. In one implementation, the rule may be accessed by the sensitive region finder module 222 or by the query analysis module 216.
[00113] At block 506, each column may be defined either as a sensitive column or a non-sensitive column based upon the rule. In one implementation, each column may be defined by the query analysis module 216.
[00114] At block 508, the at least one query may be designated as a sensitive query based upon the defining of the sensitive column in order to determine the presence of the sensitive data in the dataset retrieved upon executing the at least one query on the database. In one implementation, the at least one query may be designated by the query analysis module 216.
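Blocks 502 to 508 can be sketched for the simplest case of a single-table SELECT. The regular expression, the rule store `SENSITIVE_RULES`, the default schema name "bank" and the function name `analyze_query` are assumptions for illustration; real queries would need a proper SQL parser.

```python
import re

# Hypothetical rule store: (schema, table, column) triples marked sensitive.
SENSITIVE_RULES = {("bank", "account", "account_number")}

_SELECT_RE = re.compile(
    r"^\s*select\s+(?P<columns>.+?)\s+from\s+"
    r"(?:(?P<schema>\w+)\.)?(?P<table>\w+)",
    re.IGNORECASE | re.DOTALL,
)


def analyze_query(query: str, default_schema: str = "bank") -> dict:
    """Classify each selected column and designate the query."""
    match = _SELECT_RE.match(query)
    if match is None:
        raise ValueError("only simple SELECT ... FROM ... queries are handled")
    # Block 502: extract the column list, the table and the schema.
    schema = (match.group("schema") or default_schema).lower()
    table = match.group("table").lower()
    columns = [c.strip().lower() for c in match.group("columns").split(",")]
    # Blocks 504 and 506: access the rules and define each column as
    # sensitive (True) or non-sensitive (False).
    classification = {c: (schema, table, c) in SENSITIVE_RULES for c in columns}
    # Block 508: the query is sensitive if any selected column is sensitive.
    return {
        "schema": schema,
        "table": table,
        "columns": classification,
        "sensitive_query": any(classification.values()),
    }


result = analyze_query("SELECT Account_id, Account_number FROM bank.Account")
print(result)
```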
[00115] Referring now to Figure 6, a method 408 for transforming the sensitive data by applying a partial masking policy is shown, in accordance with an embodiment of the present subject matter.
[00116] At block 602, one or more queries may be generated from the query. In one implementation, the one or more queries may be generated by the sensitive/rule configuration intelligence module 217.
[00117] At block 604, one or more datasets retrieved upon executing each of the one or more queries may be previewed. In one implementation, the one or more datasets may be previewed by the sensitive/rule configuration intelligence module 217.
[00118] At block 606, at least one query, from the one or more queries, may be identified for which a dataset retrieved breaches the masking policy. In one aspect, the at least one query breaches the partial masking policy based on presence of one or more parameters in the at least one query. In one implementation, the at least one query may be identified by the sensitive/rule configuration intelligence module 217.
[00119] At block 608, the admin user may be enabled to re-configure the masking policy, to a complete masking policy, to be applied on the dataset. In one implementation, the admin user may be enabled to re-configure the masking policy by the sensitive/rule configuration intelligence module 217.
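Blocks 602 to 608 can be illustrated with a preview-based check. In this sketch the generated queries are represented by Python callables standing in for SQL expressions, a breach is assumed to mean that the full original value appears in the masked preview, and the final assignment merely stands in for the admin user's re-configuration decision. None of these choices comes from the disclosure.

```python
def mask_first(value: str, n: int) -> str:
    """Partial mask: substitute the first n characters with 'X'."""
    n = min(n, len(value))
    return "X" * n + value[n:]


def find_breaching_queries(sample_values, generated_queries, mask):
    """Return labels of generated queries whose masked preview leaks a value."""
    breaching = []
    for label, expression in generated_queries.items():
        # Block 604: preview the masked dataset for this generated query.
        previews = [mask(str(expression(v))) for v in sample_values]
        # Block 606: flag the query if any full original value is visible.
        if any(str(v) in p for v, p in zip(sample_values, previews)):
            breaching.append(label)
    return breaching


sample_values = [234334433234, 898403930334]
# Block 602: queries generated from the user's query (illustrative set).
generated_queries = {
    "Account_number": lambda v: v,
    "Account_number + 1000000000000000": lambda v: v + 10 ** 15,
    "Account_number * 10000": lambda v: v * 10 ** 4,
}

breaching = find_breaching_queries(
    sample_values, generated_queries, lambda text: mask_first(text, 4))
# Block 608: a breach is the cue to re-configure to complete masking.
masking_policy = "complete" if breaching else "partial"
print(breaching, masking_policy)
```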
[00120] Although implementations for methods and systems for preserving privacy of data residing in a database have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for preserving the privacy of the data.
[00121] Exemplary embodiments discussed above may provide certain advantages. Though not required to practice aspects of the disclosure, these advantages may include those provided by the following features.
[00122] Some embodiments enable a system and a method to define access privileges to the database for a plurality of users, thereby restricting one or more users from accessing a sensitive dataset present in a database.
[00123] Some embodiments enable a system and a method to define one or more filtering rules associated with the access privileges in order to restrict the execution of at least one query capable of retrieving the sensitive dataset present in the database.
[00124] Some embodiments enable a system and a method to break a query into one or more queries in order to identify at least one query for which a dataset retrieved breaches the partial masking policy.
CLAIMS:
WE CLAIM:
1. A method for preserving privacy of data residing in a database, the method comprising:
enabling, by a processor, an admin user to configure:
access privilege corresponding to each user of a plurality of users, wherein the access privilege is configured for accessing data stored in a database,
a plurality of rules to be applied on a plurality of datasets in the data, wherein a rule is applied based on the access privilege of a user responsible for executing a query on the database, and wherein the rule indicates presence of sensitive data in a dataset, and
masking policy to be applied on the dataset based upon the rule;
receiving, by the processor, at least one query, from the user, to be executed on the database, wherein the at least one query is received in order to retrieve the dataset;
analyzing, by the processor, the at least one query in order to determine presence of the sensitive data in the dataset based on the rule and contextual information associated with the at least one query; and
transforming, by the processor, the sensitive data by applying the masking policy on the dataset based on the rule, thereby preserving privacy of data residing in the database.
2. The method of claim 1, wherein the at least one query is a Structured Query Language (SQL) query executed on the database, and wherein the database is a Relational Database Management System (RDBMS).
3. The method of claim 1, wherein the rule is associated with a column, a table, or a schema.
4. The method of claim 1, wherein the admin user is further enabled to configure one or more filtering rules, and wherein the one or more filtering rules facilitate restricting execution of the at least one query and access to the database for at least one user.
5. The method of claim 1, wherein the masking policy is at least one of a Substitution policy, Retention policy, Fixed Replacement policy, and Random policy.
6. The method of claim 1 further comprising facilitating the admin user to preview the sensitive data transformed by applying the at least one masking policy.
7. The method of claim 1, wherein the masking policy applied on the dataset is one of a complete masking policy and a partial masking policy.
8. The method of claim 7, wherein, when the masking policy applied is the partial masking policy or the complete masking policy, the method further comprises:
generating one or more sub-queries from the query;
previewing one or more datasets retrieved upon executing each of the one or more sub-queries;
identifying at least one sub-query, from the one or more sub-queries, for which a dataset retrieved breaches the partial masking policy, wherein the at least one sub-query breaches the partial masking policy based on presence of one or more parameters in the at least one sub-query; and
enabling the admin user to re-configure the masking policy, to the complete masking policy, to be applied on the dataset.
9. The method of claim 8, wherein the one or more parameters comprise concat, replace, substring, sum, avg, min, max, count, addition, subtraction, multiplication and modulo.
10. The method of claim 1, wherein the contextual information comprises type of the query, order of the query, regular expression, primary key range and a SQL clause.
11. The method of claim 1, wherein the analyzing further comprises:
mapping the column, the table and the schema associated with the at least one query with a column list, a table list and a schema list extracted from the at least one query,
accessing the rule of the plurality of rules,
defining each column either as a sensitive column or a non-sensitive column based upon the rule, and
designating the at least one query as a sensitive query based upon the defining of the sensitive column in order to determine the presence of the sensitive data in the dataset retrieved upon executing the at least one query on the database.
12. A system for preserving privacy of data residing in a database, the system comprising:
a processor; and
a memory coupled to the processor, wherein the processor is capable of executing a plurality of modules stored in the memory, and wherein the plurality of modules comprises:
a db access and sensitive rule configuration module for enabling an admin user to configure:
access privilege corresponding to each user of a plurality of users, wherein the access privilege is configured for accessing data stored in a database,
a plurality of rules to be applied on a plurality of datasets in the data, wherein a rule is applied based on the access privilege of a user responsible for executing a query on the database, and wherein the rule indicates presence of sensitive data in a dataset, and
masking policy to be applied on the dataset based upon the rule;
a query input module for receiving at least one query, from the user, to be executed on the database, wherein the at least one query is received in order to retrieve the dataset;
a query analysis module for analyzing the at least one query in order to determine presence of the sensitive data in the dataset based on the rule and contextual information associated with the at least one query; and
a data transformation module for transforming the sensitive data by applying the masking policy on the dataset based on the rule, thereby preserving privacy of data residing in the database.
13. The system of claim 12, wherein the db access and sensitive rule configuration module further enables the admin user to configure one or more filtering rules, wherein the one or more filtering rules are associated with the access privilege and the at least one query, and wherein the one or more filtering rules facilitate restricting the execution of the at least one query.
14. The system of claim 12, wherein the query analysis module further comprises a sensitive region finder module for determining the presence of the sensitive data in the dataset, wherein the presence of the sensitive data is determined by:
mapping the column, the table and the schema associated with the at least one query with a column list, a table list and a schema list extracted from the at least one query,
accessing the rule of the plurality of rules,
defining each column either as a sensitive column or a non-sensitive column based upon the rule, and
designating the at least one query as a sensitive query based upon the defining of the sensitive column in order to determine the presence of the sensitive data in the dataset retrieved upon executing the at least one query on the database.
15. A non-transitory computer readable medium embodying a program executable in a computing device for preserving privacy of data residing in a database, the program comprising:
a program code for enabling an admin user to configure:
access privilege corresponding to each user of a plurality of users, wherein the access privilege is configured for accessing data stored in a database,
a plurality of rules to be applied on a plurality of datasets in the data, wherein a rule is applied based on the access privilege of a user responsible for executing a query on the database, and wherein the rule indicates presence of sensitive data in a dataset, and
masking policy to be applied on the dataset based upon the rule;
a program code for receiving at least one query, from the user, to be executed on the database, wherein the at least one query is received in order to retrieve the dataset;
a program code for analyzing the at least one query in order to determine presence of the sensitive data in the dataset based on the rule and contextual information associated with the at least one query; and
a program code for transforming the sensitive data by applying the masking policy configured on the dataset based on the rule, thereby preserving privacy of data residing in the database.
| # | Name | Date |
|---|---|---|
| 1 | 3724-MUM-2013-FORM 1(16-12-2013).pdf | 2013-12-16 |
| 2 | 3724-MUM-2013-CORRESPONDENCE(16-12-2013).pdf | 2013-12-16 |
| 3 | Form-2(Online).pdf | 2018-08-11 |
| 4 | Form 2.pdf | 2018-08-11 |
| 5 | Figure of Abstract.jpg | 2018-08-11 |
| 6 | 3724-MUM-2013-FORM 26(6-3-2014).pdf | 2018-08-11 |
| 7 | 3724-MUM-2013-CORRESPONDENCE(6-3-2014).pdf | 2018-08-11 |
| 8 | 3724-MUM-2013-FER.pdf | 2019-11-21 |
| 9 | 3724-MUM-2013-OTHERS [21-05-2020(online)].pdf | 2020-05-21 |
| 10 | 3724-MUM-2013-FER_SER_REPLY [21-05-2020(online)].pdf | 2020-05-21 |
| 11 | 3724-MUM-2013-DRAWING [21-05-2020(online)].pdf | 2020-05-21 |
| 12 | 3724-MUM-2013-COMPLETE SPECIFICATION [21-05-2020(online)].pdf | 2020-05-21 |
| 13 | 3724-MUM-2013-CLAIMS [21-05-2020(online)].pdf | 2020-05-21 |
| 14 | 3724-MUM-2013-US(14)-HearingNotice-(HearingDate-07-12-2022).pdf | 2022-11-18 |
| 15 | 3724-MUM-2013-FORM-26 [01-12-2022(online)].pdf | 2022-12-01 |
| 16 | 3724-MUM-2013-FORM-26 [01-12-2022(online)]-1.pdf | 2022-12-01 |
| 17 | 3724-MUM-2013-Correspondence to notify the Controller [01-12-2022(online)].pdf | 2022-12-01 |
| 18 | 3724-MUM-2013-Written submissions and relevant documents [12-12-2022(online)].pdf | 2022-12-12 |
| 19 | 3724-MUM-2013-RELEVANT DOCUMENTS [12-12-2022(online)].pdf | 2022-12-12 |
| 20 | 3724-MUM-2013-PETITION UNDER RULE 137 [12-12-2022(online)].pdf | 2022-12-12 |
| 21 | 3724-MUM-2013-PatentCertificate16-01-2023.pdf | 2023-01-16 |
| 22 | 3724-MUM-2013-IntimationOfGrant16-01-2023.pdf | 2023-01-16 |

| # | Name |
|---|---|
| 1 | searchstrattoupload_14-11-2019.pdf |