
Multiplicative Data Perturbation Technique Using Gaussian Noise For Categorical Data In Privacy Preserving Data Mining

Abstract: Nowadays, privacy is a major concern for everyone, because vast amounts of data are generated and shared over the internet every day. Typical privacy-preserving techniques do not protect sensitive data well enough, which leaves that data open to malicious attacks, and traditional approaches have several limitations when faced with multiple kinds of attack on sensitive data. Data must therefore be protected before it is shared with others, and an efficient model is needed that maintains the data without compromising individual privacy. Typical data perturbation methods work only with numerical data, whereas the present invention is designed to operate on categorical data. The proposed framework perturbs several columns at once. Its performance is evaluated in two ways: first, it keeps the data mining model accurate; second, it protects the privacy of the original data while minimizing data loss. 4 Claims & 1 Figure


Patent Information

Application #:
Filing Date: 28 November 2022
Publication Number: 51/2022
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email: ipfc@mlrinstitutions.ac.in
Parent Application:

Applicants

MLR Institute of Technology
Laxman Reddy Avenue, Dundigal

Inventors

1. Dr Ajmeera Kiran
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043
2. Dr D Vasumathi
Department of Computer Science and Engineering, JNTUHCEH, JNTUH, Hyderabad
3. Dr. K Srinivas Rao
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043
4. Dr. E. Anupriya
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043
5. Dr. P. Subhashini
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043
6. Mr. M. Srinivasa Rao
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043
7. Mr P. Purushotham
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043
8. Mr S K Lokesh Naik
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043
9. Mr Mohammad Arshad
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043
10. Mr T Vinod
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043

Specification

Description:
Field of Invention
The present invention relates to preventing information leakage in data mining using a Gaussian noise method. By choosing appropriate privacy-preserving techniques, confidential information is secured against unwanted third parties when data is published.
The Objectives of this Invention
Traditional methods suffer from many drawbacks because of the various attacks on sensitive information. Hence, data needs to be protected before it is published to others. The primary objective of the proposed GNDP_C (Gaussian Noise based Data Perturbation for Categorical data) invention is to protect personal data.
Background of the Invention
The volume of consumer data gathered by automated systems on the Internet has made privacy-preserving data mining a hot topic in recent years. The growth of electronic commerce on the Internet has resulted in massive storage of users' transactional and personal data, and developments in hardware technology have made it easier to track information about individuals through ordinary transactions. One existing technique safeguards data privacy in a data mining application whose data records have varying levels of privacy: condensed groups are created from the data records based on their privacy levels, each condensed group keeps summary statistics, and those summary statistics provide pseudo-data that can be used in data mining applications. That approach provides a framework for privacy-preserving data mining in which the privacy requirement of each record can vary greatly, since in many real-world applications distinct groups of people have different privacy requirements.

Patent US2015/8966648B2 discloses improved privacy-preservation mechanisms for data mining, aimed at constructing data privacy protection algorithms with little loss in data mining outcomes. For example, its technique for maintaining data privacy in a data mining application includes the following steps/operations: the data records have varying levels of privacy; condensed groups are created from the data records based on these privacy levels, and summary statistics are kept for each condensed group; and the summary statistics provide pseudo-data that can be used in data mining applications.
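The condensation steps described above can be sketched in a few lines of Python. This is a simplified illustration, not the cited patent's procedure: it assumes one-dimensional numeric records and a single privacy level, forms groups of at least k records by sorting, keeps only each group's count, mean and standard deviation, and draws pseudo-data from a Gaussian with those statistics. The function names `condense` and `generate_pseudo_data` and the Gaussian generator are assumptions.

```python
import random
import statistics


def condense(values, k):
    """Group sorted values into groups of at least k records; keep (count, mean, std) per group."""
    ordered = sorted(values)
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # Merge a short trailing group into the previous one so no group has fewer than k records.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    # Only these summary statistics are released, never the raw values.
    return [(len(g), statistics.fmean(g), statistics.pstdev(g)) for g in groups]


def generate_pseudo_data(summaries, seed=None):
    """Draw pseudo-records from a Gaussian with each group's mean and standard deviation."""
    rng = random.Random(seed)
    pseudo = []
    for count, mean, std in summaries:
        pseudo.extend(rng.gauss(mean, std) for _ in range(count))
    return pseudo
```

For example, `condense([23, 25, 27, 31, 35, 38, 41, 44, 52, 60], k=3)` yields groups of 3, 3 and 4 records; only their summaries are released, and `generate_pseudo_data` turns them back into ten synthetic values. A larger k hides each individual value in a bigger group at the cost of coarser statistics.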
Patent US2020/10745694B2 discloses a computationally driven high-throughput (HTP) microbial genomic engineering platform that combines molecular biology, automation, and advanced machine learning methods. This integrated platform uses HTP molecular tool sets to construct genetic design libraries based on scientific insight and iterative pattern recognition. The HTP genomic engineering platform is host-agnostic, can be used across taxa, and can regulate or improve any microbial host parameter.
Patent US2015/9043250B2 describes a method for protecting the privacy of data in a database with n entries. In one embodiment, the apparatus includes memory containing computer program code configured to cause a processor to form a random matrix of dimension m by n, compress the dataset using the random matrix, form a pseudo-inverse of the random matrix, and decompress the dataset using that pseudo-inverse.
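The compress/decompress cycle described for US2015/9043250B2 can be illustrated with a dependency-free sketch. Assumptions: the dataset is a single vector of n numeric entries, the random matrix R has independent standard Gaussian entries, and the pseudo-inverse is applied as R^T (R R^T)^-1, which is valid when R has full row rank (true with probability 1 for such an R with m ≤ n). Rather than materializing the pseudo-inverse, the sketch solves the equivalent linear system. All helper names are hypothetical.

```python
import random


def random_matrix(m, n, seed=None):
    """Draw an m x n matrix with independent standard Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def mat_vec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def solve(a, b):
    """Solve a z = b for square a by Gaussian elimination with partial pivoting."""
    size = len(a)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * size
    for r in range(size - 1, -1, -1):
        z[r] = (aug[r][size] - sum(aug[r][c] * z[c] for c in range(r + 1, size))) / aug[r][r]
    return z


def compress(r, x):
    """Compress the n-entry dataset x into m values: y = R x."""
    return mat_vec(r, x)


def decompress(r, y):
    """Apply the pseudo-inverse R^T (R R^T)^-1 to y (assumes R has full row rank)."""
    gram = [[sum(p * q for p, q in zip(ri, rj)) for rj in r] for ri in r]
    return mat_vec(transpose(r), solve(gram, y))
```

When m < n, the decompressed vector is the minimum-norm vector consistent with the compressed values (the projection of the original data onto the row space of R), so individual entries are generally not recovered exactly; when m = n the pseudo-inverse reduces to the ordinary inverse and the data are recovered.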

Patent US2014/8650213B2 describes distributed privacy-preserving data mining. A first entity in a distributed computing environment exchanges summary information with a second entity via a privacy-preserving data sharing protocol, which maintains the privacy of the summary information. The first entity can then mine data from the second entity using the same protocol; for example, it may obtain from the second entity the number of transactions that contain a certain itemset or that satisfy a given rule. In US7475085B2, personal information is protected by separating it from user identification, and new strategies for privacy-preserving multidimensional data mining are disclosed. For example, the following steps/operations generate at least one output data set from at least one input data set for use in data mining: a relevance coefficient is used to choose at least one relevant attribute from each input data set, and the output data set includes the relevant attributes of the input data set as determined by the relevance coefficient.
Patent US2007/730242B2 discloses methods and apparatus for creating output data sets from input data sets for use in data mining. First, data statistics are built from one or more input data sets, and the output data set is then generated. The output data set differs from the input data set yet retains some of its correlations, which may be between the dimensions of a multidimensional input data set. A considerable percentage of the input data can be hidden to increase the privacy of the data mining process. Patent application US2010/0017870A1 offers a method and system for mining continually generated data from network sensors used to monitor data communication in a computer network; using privacy-preserving distributed data stream mining algorithms, the system computes global network threat statistics.
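The idea of building statistics from the input and then generating a different output data set that keeps its correlations can be illustrated for two numeric dimensions. This sketch is an assumption, not the cited patent's method: it keeps only the means, variances and covariance, and samples new points from a bivariate Gaussian built from a 2x2 Cholesky factor of that covariance. Function names are hypothetical, and the first dimension is assumed to have non-zero variance.

```python
import math
import random


def summarize(points):
    """Statistics kept from the 2-D input: means, variances and covariance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return mx, my, sxx, syy, sxy


def synthesize(stats, count, seed=None):
    """Sample new points with the same means and covariance via a 2x2 Cholesky factor."""
    mx, my, sxx, syy, sxy = stats
    # Covariance [[sxx, sxy], [sxy, syy]] = L L^T with L = [[l11, 0], [l21, l22]].
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(max(syy - l21 ** 2, 0.0))
    rng = random.Random(seed)
    out = []
    for _ in range(count):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        out.append((mx + l11 * z1, my + l21 * z1 + l22 * z2))
    return out


def correlation(points):
    """Pearson correlation between the two dimensions."""
    _, _, sxx, syy, sxy = summarize(points)
    return sxy / math.sqrt(sxx * syy)
```

The generated points are new values rather than copies of input records, yet the correlation between the two dimensions is approximately the same as in the input, which is what a correlation-based mining task needs.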

Summary of the Invention
The current invention may provide methods for privacy-preserving multidimensional data mining. The technique may retain the relationships between distinct dimensions in the new anonymized data set, so the anonymized data set still carries the implicit information of the original. Increasing the amount of masked data in the input multidimensional data set improves user privacy, because a larger number of records can be grouped into a single statistical group and homogenized. Statistically removing anomalies from the original data set can also improve classification accuracy.
Detailed Description of the Invention
With improved technologies, storage devices, and software, large traditional databases and real-time data are now widespread. Sources of real-time data include the stock market, satellite weather systems, online transactions, internet traffic, and communications. Unlike real-time data, traditional databases can be stored and accessed later; real-time data is a stream that must be processed as it arrives rather than saved. The two kinds of dataset are therefore mined differently.

Data perturbation is one technique used in privacy-preserving data mining, and it is useful for applications that need to export or publish sensitive data. Additive perturbation-based Privacy Preserving Data Mining (PPDM) is presented to preserve data accuracy without revealing information about individual values: each individual value is randomly perturbed before publication, which preserves privacy.
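A minimal sketch of additive perturbation, assuming numeric values and independent zero-mean Gaussian noise with a chosen standard deviation `sigma` (the function and parameter names and the Gaussian choice are assumptions for illustration):

```python
import random
import statistics


def additive_perturb(values, sigma, seed=None):
    """Publish y_i = x_i + e_i, with each e_i drawn independently from N(0, sigma^2)."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in values]


def estimate_original_variance(perturbed, sigma):
    """Because the noise is independent of the data, Var(x) can be estimated as Var(y) - sigma^2."""
    return statistics.pvariance(perturbed) - sigma ** 2
```

Each published value differs from the original, but aggregates survive: the mean of the perturbed values stays close to the original mean, and the original variance can be recovered approximately from the perturbed data and the known noise level.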
The strategy is as follows. The algorithm divides the original dataset D into groups of k records. Each group has two parts: a group center selected at random from the original dataset, and (k-1) members found as the center's k-1 nearest neighbors. These k records are removed from the dataset before the next group is generated. Because each group covers a small locality, its k records can be regenerated in a way that retains the group's covariance and distribution.
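The grouping step of the strategy above can be sketched as follows. It assumes numeric records given as tuples, uses Euclidean distance for the nearest-neighbor search, and attaches any leftover records (fewer than k) to the last group; the leftover policy and the function name `form_groups` are assumptions, since the description does not specify them.

```python
import math
import random


def form_groups(records, k, seed=None):
    """Split records into groups of k: a randomly chosen center plus its k-1 nearest neighbors."""
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    remaining = list(records)
    groups = []
    while len(remaining) >= k:
        # Pick the group center at random from the records not yet grouped.
        center = remaining.pop(rng.randrange(len(remaining)))
        # Order the rest by Euclidean distance to the center and take the k-1 closest.
        remaining.sort(key=lambda rec: math.dist(center, rec))
        groups.append([center] + remaining[:k - 1])
        # Remove the grouped records before forming the next group.
        remaining = remaining[k - 1:]
    if remaining:
        # Assumed policy: leftovers join the last group, or form one group if none exist.
        if groups:
            groups[-1].extend(remaining)
        else:
            groups.append(remaining)
    return groups
```

Each group's mean and covariance could then be used to regenerate k synthetic records for that locality; that regeneration step is omitted here.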

Figure 1 depicts the privacy-preserving geometric data perturbation technique of the framework used and highlights the two steps of the procedure. The top-left box of the figure shows the data sources: the MOA (Massive Online Analysis) generator for real-time/data stream data, or the UCI repository for traditional data. The data stream generator captures the data and stores it in the database. Dataset D is then delivered to a data mining system such as MOA, WEKA, RapidMiner, R, or Orange and classified. Next, dataset D is transformed using multiplicative data perturbation to obtain dataset D', the same classification technique is applied to D', and the results obtained on the two datasets are compared.
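The classify, perturb, reclassify and compare flow of the framework can be sketched end to end. Several assumptions are made, none of which come from the specification: records are numeric, multiplicative perturbation multiplies each value by an independent factor drawn from N(1, sigma^2), and a simple nearest-centroid classifier stands in for the classifiers run in tools such as WEKA. The same random train/test split is used for D and D' so that the two accuracies are comparable. All function names are hypothetical.

```python
import math
import random


def multiplicative_perturb(rows, sigma, seed=None):
    """Build D' by multiplying every value by an independent factor drawn from N(1, sigma^2)."""
    rng = random.Random(seed)
    return [[v * rng.gauss(1.0, sigma) for v in row] for row in rows]


def train_centroids(rows, labels):
    """Nearest-centroid model: the mean attribute vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def accuracy(centroids, rows, labels):
    """Fraction of rows whose nearest centroid has the correct label."""
    hits = 0
    for row, label in zip(rows, labels):
        predicted = min(centroids, key=lambda c: math.dist(row, centroids[c]))
        hits += int(predicted == label)
    return hits / len(rows)


def evaluate(rows, labels, seed=0):
    """Train on a random half of the rows and report accuracy on the other half."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    half = len(order) // 2
    train, test = order[:half], order[half:]
    model = train_centroids([rows[i] for i in train], [labels[i] for i in train])
    return accuracy(model, [rows[i] for i in test], [labels[i] for i in test])


def compare(rows, labels, sigma, seed=None):
    """Return (accuracy on D, accuracy on D') using the same train/test split."""
    perturbed = multiplicative_perturb(rows, sigma, seed)
    return evaluate(rows, labels), evaluate(perturbed, labels)
```

A small `sigma` keeps the class structure nearly intact, so the two accuracies stay close; increasing it hides individual values more strongly but eventually degrades accuracy, which is the trade-off the comparison step measures.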
Individual users may withhold data owing to privacy concerns, which leads to flawed analysis, because data mining demands precise input. The privacy of sensitive user data must therefore be respected. To address this issue, Privacy-Preserving Data Mining (PPDM) is introduced; to protect personal data, PPDM works with aggregate results computed over large amounts of data. Data perturbation, randomization, and anonymization are widely used techniques for protecting an individual's sensitive data. A novel privacy-preserving data mining architecture is built using three approaches. The GNDP_C technique protects personal data: individual sensitive information is protected by adding noise (Gaussian noise) to the original data. The proposed GNDP_Categorical technique achieved 83.26% accuracy with Naive Bayes (NB) and 85.80% with J48.
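The specification does not state how Gaussian noise is applied to categorical attributes, so the following is only a hypothetical sketch of a GNDP_C-style perturbation. It assumes each selected column's categories are sorted and mapped to 1-based integer codes, each code is multiplied by a factor drawn from N(1, sigma^2), and the result is rounded and clipped back to a valid code. Several columns are perturbed in one pass, in line with the abstract's statement that the framework perturbs several columns at once. The function name `perturb_categorical` and the encoding scheme are assumptions.

```python
import random


def perturb_categorical(rows, columns, sigma, seed=None):
    """Hypothetical GNDP_C-style perturbation of several categorical columns in one pass."""
    rng = random.Random(seed)
    # Assumed encoding: the sorted categories of each selected column map to codes 1..K.
    domains = {c: sorted({row[c] for row in rows}) for c in columns}
    perturbed = []
    for row in rows:
        new_row = list(row)  # copy so the original data are left untouched
        for c in columns:
            domain = domains[c]
            code = domain.index(row[c]) + 1
            # Multiplicative Gaussian noise on the code, rounded to the nearest integer.
            noisy = round(code * rng.gauss(1.0, sigma))
            # Clip into the valid range 1..K and map back to a category label.
            noisy = min(max(noisy, 1), len(domain))
            new_row[c] = domain[noisy - 1]
        perturbed.append(new_row)
    return perturbed
```

With `sigma=0` the data are returned unchanged, and larger values move more records to neighboring categories. Under this encoding, categories with larger codes receive proportionally larger noise, which is one reason a real scheme might prefer a different mapping.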

Brief description of Drawing
The accompanying figure illustrates an exemplary embodiment of the invention.

Figure 1: Proposed GNDP_Categorical Scheme of Privacy Preserving Data Mining.

Claims:
The scope of the invention is defined by the following claims:
1. A system/method to prevent information leakage in the field of data mining using a Gaussian noise method, said system/method comprising the steps of:
a) starting with the MOA stream generator or the UCI Machine Learning Repository (1), using the stream generator to remove unwanted files (2), and creating the required dataset from the result (3);
b) applying classification (4) to filter the original information, and then applying the geometric data perturbation technique (5) to preserve the data (6).
2. The system/method as claimed in claim 1, wherein the system starts by collecting all the data from the UCI Machine Learning Repository, and the data are then passed to the stream generator, which filters out the sensitive data.
3. The system/method as claimed in claim 1, wherein the geometric data perturbation technique is applied to the original dataset to enhance the privacy of the user data.
4. The system/method as claimed in claim 1, wherein applying the proposed invention increases the accuracy of data privacy to 90.42%.

Documents

Application Documents

# Name Date
1 202241068290-COMPLETE SPECIFICATION [28-11-2022(online)].pdf 2022-11-28
1 202241068290-REQUEST FOR EARLY PUBLICATION(FORM-9) [28-11-2022(online)].pdf 2022-11-28
2 202241068290-DRAWINGS [28-11-2022(online)].pdf 2022-11-28
2 202241068290-FORM-9 [28-11-2022(online)].pdf 2022-11-28
3 202241068290-EDUCATIONAL INSTITUTION(S) [28-11-2022(online)].pdf 2022-11-28
3 202241068290-FORM FOR SMALL ENTITY(FORM-28) [28-11-2022(online)].pdf 2022-11-28
4 202241068290-EVIDENCE FOR REGISTRATION UNDER SSI [28-11-2022(online)].pdf 2022-11-28
4 202241068290-FORM FOR SMALL ENTITY [28-11-2022(online)].pdf 2022-11-28
5 202241068290-FORM 1 [28-11-2022(online)].pdf 2022-11-28
5 202241068290-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [28-11-2022(online)].pdf 2022-11-28