
A Data Anonymization System And Method Based On Contextual Identification Of Personally Identifiable Information

Abstract: The present invention provides a data anonymization system based on contextual identification of personally identifiable information to reduce loss in value of data. The system uses the relationship between different datasets to identify how known personally identifiable information datasets are connected to other datasets in specific contexts, via an input module (1), a pre-processing module (2), an identification module (3), a processor (4) and a masking module (5).


Patent Information

Application #
Filing Date
22 March 2023
Publication Number
15/2023
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
bd@ipquad.com
Parent Application

Applicants

SAPIO ANALYTICA PRIVATE LIMITED
604 A, B WING, KALEDONIA, SAHAR ROAD, SAMBHAJI NAGAR, ANDHERI EAST, MUMBAI 400069

Inventors

1. ASHWIN SRIVASTAVA
Fern 1104, Arkade Earth, Kanjur Marg E, Mumbai 400042 (Landmark: Off JVLR
2. ARPIT PALOD
72, Panna Dhay Colony, Chittaurgarh, Chittorgarh, Rajasthan - 312001

Specification

Description:
FIELD OF THE INVENTION
[0001] The present invention relates to the field of data anonymization. More particularly, the present invention is directed towards a data anonymization system and method based on contextual identification of personally identifiable information to reduce loss in value of data.
BACKGROUND OF THE INVENTION
[0002] Data anonymization refers to the process of removing personally identifiable information (PII) from a dataset in order to protect the privacy of individuals represented in that data. The goal of data anonymization is to make it impossible or difficult to identify specific individuals within a dataset.
[0003] There are various techniques used to anonymize data, such as masking or deleting certain fields that contain PII, replacing identifiable values with random ones, or aggregating data so that it is only presented in a generalized form. While anonymization can help protect privacy, it is not foolproof and may not provide complete protection against re-identification.
[0004] Data anonymization is particularly important in situations where sensitive or personal data is being used for research or analysis. In some cases, legal or ethical regulations may require data to be anonymized before it can be shared or used for certain purposes.
[0005] Data anonymization is needed to protect the privacy and security of individuals whose personal information is collected, processed, and stored by organizations. In many jurisdictions, there are laws and regulations that require organizations to protect the privacy of individuals by anonymizing their personal data. Failure to comply with these regulations can result in legal penalties and reputational damage.
[0006] Data breaches can occur when cybercriminals gain unauthorized access to sensitive personal information. Anonymizing data can reduce the risk of a data breach because it removes personally identifiable information that could be used to identify individuals. Anonymizing data is particularly important when dealing with sensitive information such as health records or financial data. This type of information can be used to discriminate against individuals or to commit identity theft.
[0007] Anonymized data can be shared with third parties for research or analysis without compromising the privacy of individuals. This can lead to increased collaboration and innovation. Overall, data anonymization is an important tool for protecting the privacy and security of personal information and promoting ethical and responsible data practices.
[0008] A data anonymization system is a tool that helps organizations anonymize personal data. These systems use various techniques to remove or obscure personally identifiable information (PII) from data sets, while still retaining the usefulness of the data for analysis or research purposes.
[0009] There are many different types of data anonymization systems available, ranging from simple scripts that replace PII with random values, to more sophisticated tools that use machine learning algorithms to identify and remove sensitive information.
[0010] Data anonymization systems typically rely on the following techniques. Masking involves replacing certain values in a data set with other values, such as replacing names with initials or masking part of a social security number. Generalization involves aggregating data so that it is presented in a more generalized form; for example, instead of reporting an individual's exact age, data may be grouped into age ranges. Perturbation involves adding random noise to data values to make it more difficult to re-identify individuals, for example by adding a random number to a person's income. Data synthesis involves creating a new data set that is statistically similar to the original data set but does not contain any PII. Data anonymization systems can be used by a variety of organizations, including government agencies, healthcare providers, and financial institutions. While data anonymization can help protect privacy, it is not foolproof and may not provide complete protection against re-identification.
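The first three techniques listed above can be illustrated with a minimal Python sketch. The function names, the masking character and the bucket width are assumptions chosen for illustration and are not taken from the specification. Data synthesis is left out because it would need a statistical model of the source data.

```python
import random


def mask_partial(value: str, visible: int = 4, fill: str = "X") -> str:
    """Masking: keep only the last `visible` characters of a value."""
    if visible <= 0:
        return fill * len(value)
    hidden = max(len(value) - visible, 0)
    return fill * hidden + value[hidden:]


def to_initials(name: str) -> str:
    """Masking: replace a full name with its initials."""
    return "".join(part[0].upper() + "." for part in name.split())


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: report an age range instead of the exact age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def perturb(value: float, spread: float, rng: random.Random) -> float:
    """Perturbation: add uniform random noise drawn from [-spread, +spread]."""
    return value + rng.uniform(-spread, spread)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so that repeated runs match
    print(mask_partial("123456789"))
    print(to_initials("Jane Doe"))
    print(generalize_age(37))
    print(perturb(50000.0, 500.0, rng))
```

Each function trades some precision for privacy in a different way, which is why a system may prefer one technique over another depending on the type of data in a column.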
[0011] CN102542209B discloses a data anonymization method and system. The data anonymization method comprises the steps of: carrying out text analysis on the attribute value of a text type in data; replacing the attribute value of the text type in the data with the attribute value of a value type or a class type according to the text analysis result; and carrying out anonymization processing on the data in which the attribute value of the text type is replaced by the attribute value of the value type or the class type. According to that invention, after anonymization processing, the data comprising the attribute value of the text type not only prevents privacy leakage based on the attribute value, but also retains its use value.
[0012] US9231920B1 discloses a method and system for anonymizing data to be transmitted to a destination computing device. An anonymization strategy for data anonymization is provided. Data to be transmitted is received from a user computer. Selective anonymization of the data is performed, based on a selected anonymization strategy, using an anonymization module. Accent preservation of data is selected. An accent value for the data is determined. The anonymized data with the determined accent value and data indicative of the selected anonymization strategy is transmitted to the destination computing device over a network.
[0013] In addition, certain datasets may be used for identifying an individual despite not being considered personally identifiable information in the conventional sense. This may happen because a particular dataset, when combined with another dataset in a particular context, may lead to extraction of information that is personally identifiable. Furthermore, complete anonymization sometimes removes so much information that the real value of the data is reduced. After all, the purpose of sharing big data is to perform analytics on it and derive intelligent information.
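The point that individually harmless fields can become identifying in combination can be made concrete with a small sketch. The helper `unique_fraction` and the sample records below are hypothetical; the helper simply measures what share of records is singled out by a given combination of columns.

```python
from collections import Counter


def unique_fraction(rows, columns):
    """Share of rows whose values in `columns` occur exactly once."""
    if not rows:
        return 0.0
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


# Invented sample data: no name or ID number is present.
rows = [
    {"pin": "400069", "birth_year": 1985, "gender": "F"},
    {"pin": "400069", "birth_year": 1985, "gender": "M"},
    {"pin": "400069", "birth_year": 1990, "gender": "F"},
    {"pin": "400042", "birth_year": 1990, "gender": "F"},
]

if __name__ == "__main__":
    print(unique_fraction(rows, ["pin"]))
    print(unique_fraction(rows, ["gender"]))
    print(unique_fraction(rows, ["pin", "birth_year", "gender"]))
```

In this invented sample, each column on its own singles out only one record in four, while the three columns together single out every record.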
[0014] Hence, it is necessary to have anonymization processes that retain the value of data while also ensuring that privacy is maintained even when specific contexts are known while analysing multiple sets of data. Such a process can allow sharing of useful data with significantly more confidence after anonymization. Such a process is missing today.
[0015] There are several systems and methods available in the public domain which relate to anonymization, but they do not take into account the identification of hidden personally identifiable information. There are also inventions directed to identification of personally identifiable information, but they do not address contextual identification.
[0016] Therefore, there is a need for a data anonymization system and method based on contextual identification of personally identifiable information to reduce loss in value of data.
OBJECTS OF THE INVENTION
[0017] The primary object of the present invention is to overcome the shortcomings of the prior art.
[0018] The present invention aims to provide a data anonymization system and method based on contextual identification of personally identifiable information.
[0019] The present invention aims to provide a data anonymization system and method to reduce value loss of data.
[0020] The present invention aims to provide a data anonymization system and method that uses the concept of relationship between different datasets to identify how known personally identifiable information datasets are connected to other datasets in specific contexts.
[0021] The present invention aims to provide a data anonymization system and method which include masking based on scoring of the level of personally identifiable information of data and the type of data.
SUMMARY OF THE INVENTION
[0022] The present invention pertains to a data anonymization system and method based on contextual identification of personally identifiable information to reduce value loss of data. It uses the relationship between different datasets to identify how known personally identifiable information datasets are connected to other datasets in specific contexts, and includes masking based on scoring of the level of personally identifiable information of the data and the type of data.
[0023] According to an embodiment, the present invention provides a data anonymization system based on contextual identification of personally identifiable information, comprising an input module for receiving an input context from an operator, a pre-processing module linked with said input module for extracting a list of available datasets that are mapped to said input context, an identification module connected with said pre-processing module, wherein said identification module is configured to extract known personally identifiable information along with other miscellaneous personally identifiable information across a plurality of cells, a processor and a masking module.
[0024] According to another embodiment, the present invention provides a data anonymization system based on contextual identification of personally identifiable information in which the processor is connected with said identification module for measuring a relationship between a plurality of columns across said available datasets and said processor is configured to, compute a depth of said relationship to produce a personally identifiable information score with respect to each column, identify a hidden contextual personally identifiable information which requires an action on anonymisation by processing said personally identifiable information score, evaluate an action of masking or redaction which has to be done on the basis of said score and a type of said input context, carry out an anonymisation to transform said input context into an anonymised data.
[0025] According to another embodiment, the present invention provides a data anonymization system based on contextual identification of personally identifiable information in which the masking module is linked with said processor for said action of masking or redaction, wherein said masking module is configured to extract said personally identifiable information and mapping to a set of actions, calculate a score of miscellaneous personally identifiable information, define a plurality of masking action via providing a set of instructions to perform said masking action.
[0026] According to another embodiment, the present invention provides a data anonymization method based on contextual identification of personally identifiable information, including the steps of receiving an input context from an operator and extracting a list of available datasets that are mapped to said input context, extracting known personally identifiable information along with other miscellaneous personally identifiable information across a plurality of cells, measuring a relationship between a plurality of columns across said available datasets and computing a depth of said relationship to produce a personally identifiable information score with respect to each column, identifying a hidden contextual personally identifiable information which requires an action on anonymisation by processing said personally identifiable information score, evaluating an action of masking or redaction which has to be done on the basis of said score and a type of said input context, and carrying out an anonymisation to transform said input context into an anonymised data.
[0027] While the invention has been discussed and shown with particular emphasis on the preferred form, it should be obvious that other variants are feasible and would come within the scope of the present invention.
BRIEF DESCRIPTION OF THE DRAWING
[0028] These and other features, aspects, and advantages of the present invention will become better understood through the following description, appended claims, and accompanying drawings where:
[0029] Figure 1 depicts a block diagram of a data anonymization system based on contextual identification of personally identifiable information according to an embodiment of the present invention.
[0030] Figure 2 depicts a flow chart of the working of a masking module according to an embodiment of the present invention.
[0031] Figure 3 depicts a flow chart of the data anonymization method based on contextual identification of personally identifiable information according to an embodiment of the present invention.
DESCRIPTION OF THE INVENTION
[0032] The following description includes the preferred best mode of one embodiment of the present invention. It will be clear from this description of the invention that the invention is not limited to these illustrated embodiments but that the invention also includes a variety of modifications and embodiments thereto. Therefore, the present description should be seen as illustrative and not limiting. While the invention is susceptible to various modifications and alternative constructions, it should be understood, that there is no intention to limit the invention to the specific form disclosed, but, on the contrary, the invention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention as defined in the claims.
[0033] In any embodiment described herein, the open-ended terms "comprising," "comprises," and the like (which are synonymous with "including," "having" and "characterized by") may be replaced by the respective partially closed phrases "consisting essentially of," "consists essentially of," and the like, or the respective closed phrases "consisting of," "consists of," and the like.
[0034] As used herein, the singular forms “a,” “an,” and “the” designate both the singular and the plural, unless expressly stated to designate the singular only.
[0035] The present invention pertains to a data anonymization system and method based on contextual identification of personally identifiable information to reduce value loss of data by using the relationship between different datasets to identify how known personally identifiable information datasets are connected to other datasets in specific contexts.
[0036] According to an embodiment, the present invention provides a data anonymization system based on contextual identification of personally identifiable information, comprising an input module for receiving an input context from an operator, a pre-processing module linked with said input module for extracting a list of available datasets that are mapped to said input context, an identification module connected with said pre-processing module, wherein said identification module is configured to extract known personally identifiable information along with other miscellaneous personally identifiable information across a plurality of cells, a processor and a masking module.
[0037] According to another embodiment, the present invention provides a data anonymization system based on contextual identification of personally identifiable information in which the processor is connected with said identification module for measuring a relationship between a plurality of columns across said available datasets and said processor is configured to, compute a depth of said relationship to produce a personally identifiable information score with respect to each column, identify a hidden contextual personally identifiable information which requires an action on anonymisation by processing said personally identifiable information score, evaluate an action of masking or redaction which has to be done on the basis of said score and a type of said input context, carry out an anonymisation to transform said input context into an anonymised data.
[0038] According to another embodiment, the present invention provides a data anonymization system based on contextual identification of personally identifiable information in which the masking module is linked with said processor for said action of masking or redaction, wherein said masking module is configured to extract said personally identifiable information and mapping to a set of actions, calculate a score of miscellaneous personally identifiable information, define a plurality of masking action via providing a set of instructions to perform said masking action.
[0039] According to another embodiment, the present invention provides a data anonymization method based on contextual identification of personally identifiable information, including the steps of receiving an input context from an operator and extracting a list of available datasets that are mapped to said input context, extracting known personally identifiable information along with other miscellaneous personally identifiable information across a plurality of cells, measuring a relationship between a plurality of columns across said available datasets and computing a depth of said relationship to produce a personally identifiable information score with respect to each column, identifying a hidden contextual personally identifiable information which requires an action on anonymisation by processing said personally identifiable information score, evaluating an action of masking or redaction which has to be done on the basis of said score and a type of said input context, and carrying out an anonymisation to transform said input context into an anonymised data.
[0040] Referring to Figure 1, a block diagram of the data anonymization system based on contextual identification of personally identifiable information is depicted. The system comprises an input module (1), a pre-processing module (2), an identification module (3), a processor (4) and a masking module (5).
[0041] The input module (1) receives an input context from an operator, and the pre-processing module (2), linked with said input module (1), extracts a list of available datasets that are mapped to said input context. The input module (1) includes, but is not limited to, a keyboard, mouse, scanner or audio input.
[0042] The identification module (3) is connected to said pre-processing module (2), wherein said identification module (3) is configured to extract known personally identifiable information along with other miscellaneous personally identifiable information across a plurality of cells. The known personally identifiable information includes, but is not limited to, PAN card and Aadhar card details. The miscellaneous personally identifiable information includes, but is not limited to, banking information and address information.
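As an illustration of how known personally identifiable information might be located across cells, the sketch below scans every cell with regular expressions. The specification does not state how the identification module detects PII, so the patterns and names here are assumptions: they check only the published formats (a PAN is five letters, four digits and one letter; an Aadhar number is twelve digits, often written in groups of four) and do not validate that a number was actually issued.

```python
import re

# Format-only patterns; hypothetical, not taken from the specification.
PATTERNS = {
    "PAN": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
    "AADHAR": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b"),
}


def find_known_pii(rows):
    """Return {column: set of PII labels} for columns with matching cells."""
    found = {}
    for row in rows:
        for column, cell in row.items():
            for label, pattern in PATTERNS.items():
                if pattern.search(str(cell)):
                    found.setdefault(column, set()).add(label)
    return found


if __name__ == "__main__":
    sample = [
        {"id_doc": "ABCDE1234F", "note": "paid"},
        {"id_doc": "1234 5678 9012", "note": "ok"},
    ]
    print(find_known_pii(sample))
```

Scanning cell by cell rather than relying on column headers means that PII placed in a free-text or mislabelled column can still be flagged.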
[0043] The processor (4) is connected with said identification module (3) for measuring a relationship between a plurality of columns across said available datasets and said processor is configured to, compute a depth of said relationship to produce a personally identifiable information score with respect to each column, identify a hidden contextual personally identifiable information which requires an action on anonymisation by processing said personally identifiable information score, evaluate an action of masking or redaction which has to be done on the basis of said score and a type of said input context, carry out an anonymisation to transform said input context into an anonymised data.
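The specification does not define how the depth of a relationship is computed. One possible reading, sketched below purely as an assumption, treats depth as the share of rows in which a column's value points to exactly one value of a known PII column, and takes a column's score as its strongest depth across all known PII columns. The names `relationship_depth` and `pii_scores` are hypothetical.

```python
from collections import defaultdict


def relationship_depth(rows, column, pii_column):
    """Share of rows whose `column` value maps to a single `pii_column` value."""
    if not rows:
        return 0.0
    seen = defaultdict(set)
    for row in rows:
        seen[row[column]].add(row[pii_column])
    determined = sum(1 for row in rows if len(seen[row[column]]) == 1)
    return determined / len(rows)


def pii_scores(rows, pii_columns):
    """Score each non-PII column by its strongest link to a known PII column."""
    if not rows or not pii_columns:
        return {}
    candidates = [c for c in rows[0] if c not in pii_columns]
    return {
        c: max(relationship_depth(rows, c, p) for p in pii_columns)
        for c in candidates
    }


if __name__ == "__main__":
    rows = [
        {"pan": "AAAAA0001A", "account_no": "111", "city": "Mumbai"},
        {"pan": "BBBBB0002B", "account_no": "222", "city": "Mumbai"},
        {"pan": "CCCCC0003C", "account_no": "333", "city": "Pune"},
        {"pan": "DDDDD0004D", "account_no": "444", "city": "Mumbai"},
    ]
    print(pii_scores(rows, ["pan"]))
```

Under this assumed measure, a column such as an account number that pins down one PAN per value scores high, while a coarse column such as city scores low, which matches the intent of flagging hidden contextual PII without penalising every column.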
[0044] The masking module (5) is linked with said processor (4) for said action of masking or redaction, wherein said masking module (5) is configured to extract said personally identifiable information and mapping to a set of actions, calculate a score of miscellaneous personally identifiable information, define a plurality of masking action via providing a set of instructions to perform said masking action as depicted in Figure 2.
[0045] The masking module (5), processor (4) and identification module (3) are preferably controllers. The plurality of masking actions includes, but is not limited to, removal of data, partial removal of data, conversion of data into a range of data, replacing data by a random data value, and replacing data by a non-random data value. The system works in both wired and wireless modes.
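A minimal sketch of how a masking action could be chosen from a score and a data type, and then applied, is given below. The score cut-offs, the action names and the two data types are assumptions introduced for illustration; the specification names the five kinds of action but does not fix how they are selected.

```python
import random
from typing import Optional


def choose_action(score: float, data_type: str) -> str:
    """Pick a masking action from a PII score in [0, 1] and a data type."""
    if score >= 0.9:
        return "remove"  # removal of data
    if score >= 0.6:
        # conversion into a range for numbers, partial removal for text
        return "range" if data_type == "number" else "partial"
    if score >= 0.3:
        # random replacement for numbers, fixed replacement for text
        return "random" if data_type == "number" else "constant"
    return "keep"


def apply_action(action: str, value, rng: Optional[random.Random] = None):
    """Apply one of the actions returned by choose_action to a single value."""
    if action == "remove":
        return None
    if action == "partial":
        text = str(value)
        return text[:2] + "*" * max(len(text) - 2, 0)
    if action == "range":
        low = (int(value) // 10) * 10
        return f"{low}-{low + 9}"
    if action == "random":
        rng = rng or random.Random()
        return rng.randint(0, 999)
    if action == "constant":
        return "REDACTED"
    return value


if __name__ == "__main__":
    print(apply_action(choose_action(0.7, "text"), "Mumbai"))
    print(apply_action(choose_action(0.7, "number"), 37))
```

Keeping the choice of action separate from its application means the cut-offs can be tuned per context without touching the masking code itself.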
[0046] Referring to Figure 3, a flow chart of the data anonymization method based on contextual identification of personally identifiable information is depicted. The method includes the steps of:
[0047] Step 101: receiving an input context from an operator and extracting a list of available datasets that are mapped to said input context, extracting a known personally identifiable information along with other miscellaneous personally identifiable information across a plurality of cells.
[0048] Step 102: measuring a relationship between a plurality of columns across said available datasets and computing a depth of said relationship to produce a personally identifiable information score with respect to each column.
[0049] Step 103: identifying a hidden contextual personally identifiable information which requires an action on anonymisation by processing said personally identifiable information score.
[0050] Step 104: evaluating an action of masking or redaction which has to be done on the basis of said score and a type of said input context and carrying out an anonymisation to transform said input context into an anonymised data.
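Steps 101 to 104 can be read as a simple pipeline. The sketch below is one hypothetical arrangement under several simplifying assumptions: a context maps to a single table in a `registry` dictionary, the known PII columns are passed in rather than detected, the scoring function is supplied by the caller, and masking is reduced to redaction or a fixed placeholder. None of these names or choices come from the specification.

```python
def anonymise(context, registry, known_pii, score_column, threshold=0.5):
    """Run a simplified version of Steps 101-104 for one context."""
    # Step 101: fetch the data mapped to the context; known PII is given.
    rows = registry[context]
    columns = list(rows[0]) if rows else []

    # Step 102: score every column that is not already known PII.
    scores = {c: score_column(rows, c) for c in columns if c not in known_pii}

    # Step 103: columns at or above the threshold count as hidden PII.
    hidden = {c for c, s in scores.items() if s >= threshold}

    # Step 104: redact known PII, mask hidden PII, keep everything else.
    anonymised = []
    for row in rows:
        out = {}
        for column, value in row.items():
            if column in known_pii:
                out[column] = None
            elif column in hidden:
                out[column] = "***"
            else:
                out[column] = value
        anonymised.append(out)
    return anonymised, scores


def distinct_ratio(rows, column):
    """Stand-in score: share of distinct values in a column."""
    return len({row[column] for row in rows}) / len(rows)


if __name__ == "__main__":
    registry = {
        "loan_review": [
            {"pan": "AAAAA0001A", "account_no": "111", "city": "Mumbai"},
            {"pan": "BBBBB0002B", "account_no": "222", "city": "Mumbai"},
            {"pan": "CCCCC0003C", "account_no": "333", "city": "Mumbai"},
            {"pan": "DDDDD0004D", "account_no": "444", "city": "Pune"},
        ]
    }
    result, scores = anonymise(
        "loan_review", registry, {"pan"}, distinct_ratio, threshold=0.75
    )
    print(scores)
    print(result[0])
```

The scoring function is deliberately a parameter, so a relationship-based score can replace the stand-in `distinct_ratio` without changing the flow of the four steps.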
[0051] In addition, the known personally identifiable information is identified by a tool, and identification of the miscellaneous personally identifiable information involves creation of libraries mapped to specific contexts. The hidden personally identifiable information is extracted by a tool that combines the multiple datasets required for a context, identifies a depth of relationship between columns, and identifies those columns that cross a threshold in their relationship with multiple personally identifiable information columns. The columns identified through the hidden personally identifiable information extraction tool are scored in terms of their potential to become personally identifiable information in a specific context. Further, the present invention uses a masking protocol that masks different columns of data in different ways as a function of their personally identifiable information score and type of data.
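The libraries mapped to specific contexts could, in their simplest form, be keyword sets looked up by context name and matched against column headers. The sketch below assumes exactly that; the contexts, keywords and the function name `misc_pii_columns` are hypothetical and are not drawn from the specification.

```python
# Hypothetical context libraries: context name -> header keywords.
CONTEXT_LIBRARIES = {
    "banking": {"account", "ifsc", "branch"},
    "address": {"street", "city", "landmark"},
}


def misc_pii_columns(columns, context):
    """Return the columns whose header contains a keyword for the context."""
    terms = CONTEXT_LIBRARIES.get(context, set())
    return [c for c in columns if any(term in c.lower() for term in terms)]


if __name__ == "__main__":
    headers = ["Account_No", "IFSC_Code", "amount"]
    print(misc_pii_columns(headers, "banking"))
```

Substring matching is crude and can over-match on short keywords, so a practical library would likely need stricter matching rules; the sketch only shows the context-to-library lookup.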
[0052] The present invention is technically advanced because it uses the relationship between different datasets to identify how known personally identifiable information datasets are connected to other datasets in specific contexts. The process starts with identification of generally known and specific known personally identifiable information, then proceeds to identification and scoring of other data closely connected to the PII data, followed by masking based on the scoring and the type of data. Masking based on a score of the level of personally identifiable information of the data and on the type of data makes the anonymisation driven by artificial intelligence, thus allowing a large number of ways of anonymising datasets.
[0053] The present invention also includes a machine-readable storage medium that may include read-only memory (ROM), random-access memory (RAM), magnetic disk storage media, optical storage media, flash-memory devices, and similar storage media. Further, the present invention includes a database, which is preferably a memory unit configured for storage and retrieval of input images and pre-stored images, and which includes, but is not limited to, a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), a Programmable read-only memory (PROM), an Erasable Programmable read only memory (EPROM), an Electrically erasable programmable read only memory (EEPROM), a flash memory, and so forth, and may further include or otherwise cover any type of memory or database, including known, related art, and/or later developed technologies.
[0054] The database of the present invention is not limited to the above-mentioned memory unit but also includes a computer readable storage medium, which may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
[0055] A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
[0056] A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fibre-optic cable), or electrical signals transmitted through a wire.
[0057] The components of the present invention are connected via a network which may comprise copper transmission cables, optical transmission fibres, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
[0058] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in machine readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
[0059] While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiments, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the submitted claims.
[0060] Although the field of the invention has been described herein with limited reference to specific embodiments, this description is not meant to be construed in a limiting sense. Various modifications of the disclosed embodiments, as well as alternative embodiments of the invention, will become apparent to persons skilled in the art upon reference to the description of the invention.
Claims: I/We Claim:
1. A data anonymization system based on contextual identification of personally identifiable information, comprising of:
an input module (1) for receiving an input context from an operator;
a pre-processing module (2) linked with said input module (1) for extracting a list of available datasets that are mapped to said input context;
an identification module (3) connected with said pre-processing module (2), wherein said identification module (3) is configured to extract known personally identifiable information along with other miscellaneous personally identifiable information across a plurality of cells;
a processor (4) connected with said identification module (3) for measuring a relationship between a plurality of columns across said available datasets and said processor (4) is configured to:
compute a depth of said relationship to produce a personally identifiable information score with respect to each column;
identify a hidden contextual personally identifiable information which requires an action on anonymisation by processing said personally identifiable information score;
evaluate an action of masking or redaction which has to be done on the basis of said score and a type of said input context;
carry out an anonymisation to transform said input context into an anonymised data;
a masking module (5) linked with said processor (4) for said action of masking or redaction, wherein said masking module is configured to:
extract said personally identifiable information and mapping to a set of actions;
calculate a score of miscellaneous personally identifiable information;
define a plurality of masking action via providing a set of instructions to perform said masking action.
2. The data anonymization system based on contextual identification of personally identifiable information as claimed in claim 1, wherein said input module (1) include but not limited to a form that asks for context, that can be entered through keyboard, mouse, scanner, audio.
3. The data anonymization system based on contextual identification of personally identifiable information as claimed in claim 1, wherein said masking module, processing module and identification module (3) are preferably controllers.
4. The data anonymization system based on contextual identification of personally identifiable information as claimed in claim 1, wherein said known personally identifiable information include but not limited to PAN card, Aadhar card.
5. The data anonymization system based on contextual identification of personally identifiable information as claimed in claim 1, wherein said miscellaneous personally identifiable information include but not limited to banking information, address information.
6. The data anonymization system based on contextual identification of personally identifiable information as claimed in claim 1, wherein said plurality of masking action include but not limited to removal of data, partial removal of data, conversion of data into a range of data, replacing data by a random data value, replacing a data by a non-random data value.
7. The data anonymization system based on contextual identification of personally identifiable information as claimed in claim 1, wherein said system works in both wired and wireless mode.
8. A data anonymization method based on contextual identification of personally identifiable information, including the steps of:
receiving an input context from an operator and extracting a list of available datasets that are mapped to said input context;
extracting a known personally identifiable information along with other miscellaneous personally identifiable information across a plurality of cells;
measuring a relationship between a plurality of columns across said available datasets and computing a depth of said relationship to produce a personally identifiable information score with respect to each column;
identifying a hidden contextual personally identifiable information which requires an action on anonymisation by processing said personally identifiable information score;
evaluating an action of masking or redaction which has to be done on the basis of said score and a type of said input context;
carrying out an anonymisation to transform said input context into an anonymised data.
9. The data anonymization method based on contextual identification of personally identifiable information as claimed in claim 8, wherein said method is executed on a computing platform.

Documents

Application Documents

# Name Date
1 202321019880-STATEMENT OF UNDERTAKING (FORM 3) [22-03-2023(online)].pdf 2023-03-22
2 202321019880-REQUEST FOR EARLY PUBLICATION(FORM-9) [22-03-2023(online)].pdf 2023-03-22
3 202321019880-FORM FOR SMALL ENTITY(FORM-28) [22-03-2023(online)].pdf 2023-03-22
4 202321019880-FORM 1 [22-03-2023(online)].pdf 2023-03-22
5 202321019880-FIGURE OF ABSTRACT [22-03-2023(online)].pdf 2023-03-22
6 202321019880-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [22-03-2023(online)].pdf 2023-03-22
7 202321019880-DRAWINGS [22-03-2023(online)].pdf 2023-03-22
8 202321019880-DECLARATION OF INVENTORSHIP (FORM 5) [22-03-2023(online)].pdf 2023-03-22
9 202321019880-COMPLETE SPECIFICATION [22-03-2023(online)].pdf 2023-03-22
10 202321019880-Proof of Right [03-04-2023(online)].pdf 2023-04-03
11 202321019880-FORM-26 [03-04-2023(online)].pdf 2023-04-03
12 Abstract.jpg 2023-04-05