Method And System For Generating Phonetically Similar Masked Data

Abstract: A method and system are provided for generating a group of phonetically similar masked data. The method comprises preprocessing input dataset values comprising a list of fictitious data values to be used as masked data; determining a plurality of groups of phonetically similar data values present in the dataset list; deriving a metaphone for each input data value to be masked; generating a first numeric code from the derived metaphone value of the input data value to be masked; selecting one group of phonetically similar data values out of the plurality of groups of phonetically similar data values based on the generated first numeric code; and generating a second numeric code from the input data value for selecting a masked value from a plurality of fictitious data groups.

Patent Information

Application #:
Filing Date: 18 June 2016
Publication Number: 51/2017
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email: ip@legasis.in
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2023-11-24
Renewal Date:

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point, Mumbai-400021, Maharashtra, India

Inventors

1. MANDPE, Ashvini Sakharam
Tata Consultancy Services Limited, Bldg. No. 7 (floors 4th - 7th), Commerzone, Survey No. 144/145, Samrat Ashok Path, Off Airport Road, Yerwada, Pune-411006, Maharashtra, India
2. GHODESWAR, Rahul Krushna
Tata Consultancy Services Limited, Bldg. No. 7 (floors 4th - 7th), Commerzone, Survey No. 144/145, Samrat Ashok Path, Off Airport Road, Yerwada, Pune-411006, Maharashtra, India
3. ROY, Ashim
Tata Consultancy Services Limited, Bldg. No. 7 (floors 4th - 7th), Commerzone, Survey No. 144/145, Samrat Ashok Path, Off Airport Road, Yerwada, Pune-411006, Maharashtra, India

Specification

Claims:

1. A method for generating a group of phonetically similar masked data; said method comprising processor implemented steps of:

a. preprocessing of input dataset values comprising a list of fictitious data values to be used as masked data; determining a plurality of groups of phonetically similar data values present in the dataset list; and deriving metaphone for each input data value to be masked, using a standard metaphone generator module (202);
b. generating a first numeric code from derived metaphone value of input data value to be masked, using a first numeric code generation module (204);
c. selecting one group of phonetically similar data values out of the plurality of groups of phonetically similar data values based on the generated first numeric code using a phonetically similar data values group selection module (206); and
d. generating a second numeric code from input data value for selecting a masked value from a plurality of fictitious data group using a second numeric code generation module (208).

2. The method as claimed in claim 1, further comprises of determining a group of masked output values from the plurality of fictitious data group by mapping the input data group to one of the data group in output.

3. The method as claimed in claim 2, wherein the group of masked output values are utilized to fetch a consistent masked value for input data.

4. The method as claimed in claim 2, wherein for a particular group the masked output values group is consistent irrespective of occurrences of groups.

5. The method as claimed in claim 1, wherein for a particular input data, the masked output values from said determined group is consistent irrespective of its occurrences.

6. A system (200) for generating a group of phonetically similar masked data; said system (200) comprising:

a. a processor;
b. a data bus coupled to said processor;
c. a computer-usable medium embodying computer code, said computer-usable medium being coupled to said data bus, said computer program code comprising instructions executable by said processor and configured for executing:

a standard metaphone generator module (202) adapted for preprocessing of input dataset values comprising a list of fictitious data values to be used as masked data; determining a plurality of groups of phonetically similar data values present in the dataset list; and deriving metaphone for each input data value to be masked;
a first numeric code generation module (204) adapted for generating a first numeric code from derived metaphone value of input data value to be masked;
a phonetically similar data values group selection module (206) adapted for selecting one group of phonetically similar data values out of the plurality of groups of phonetically similar data values based on the generated first numeric code; and
a second numeric code generation module (208) adapted for generating a second numeric code from input data value for selecting a masked value from a plurality of fictitious data group.
Description: FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:
METHOD AND SYSTEM FOR GENERATING PHONETICALLY SIMILAR MASKED DATA

Applicant:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

The following specification particularly describes the invention and the manner in which it is to be performed.
FIELD OF THE INVENTION
[001] The present application relates generally to data privacy and, in particular, to data masking. More particularly, the application provides a method and system for generating a group of phonetically similar masked data.

BACKGROUND OF THE INVENTION
[002] Data masking is an essential requirement in areas where customer-sensitive information needs to be protected from unauthorized access. In data masking, customer-sensitive data is replaced with fictitious values before being shared for testing activities. At the same time, the masked output should maintain the data variations, data distribution, data privacy, look and feel of the original data, data integrity, and data consistency for flawless data testing. Data may also contain phonetically similar words, which sound the same but are spelt differently. For example, multiple variations of a person's name are often observed in such data due to data entry from multiple sources within an enterprise.

[003] A majority of existing solutions rely on masking variations in input data with altogether different names. As a result, the masked output will be unique, random or consistent. Some of the prior art literature vaguely describes a masking system that masks phonetically similar data by replacing all the variants of input data with the same masked value, whereby the variance of the original production data is removed and the data distribution changes post masking. However, prior art literature has never considered the variants of input data, which are phonetically similar, as part of a single group. In addition, prior art literature has never explored masked output that maintains the look and feel of the original data, including data variation, by distributing the dataset values according to their phonetic properties.

[004] In addition, the prior art literature requires maintaining a list of words and their phonetic equivalents; thus, for any new data, the mapping has to be added to the list before the data is processed. However, prior art literature has never explored eliminating the need to maintain the map of original data and its metaphone, wherein the metaphones are generated at runtime, thus removing the possibility of backward traceability of the original data. Prior art literature also restricts existing data masking systems to being executed only on file and voice. However, prior art literature has never explored extending the same to different data sources, such as RDBMS databases, mainframe files, common files, log files, pdf, doc, docx etc. Prior art literature is also silent on providing a combination of metaphone generation and masking of phonetically similar words for maintaining the integrity and consistency of data enterprise wide.

[005] Prior art literature has illustrated various data masking tools and techniques; however, generating a group of phonetically similar masked data, wherein the masked output maintains the look and feel of the original data including data variation, is still considered one of the biggest challenges of the technical domain.

OBJECTIVES OF THE INVENTION
[006] In accordance with the present invention, the primary objective is to provide a method and system for generating a group of phonetically similar masked data.

[007] Another objective of the invention is to provide a method and system for generating a group of phonetically similar masked data, wherein masked output maintains look and feel of original data including data variation.

[008] Another objective of the invention is to provide a method and system for preprocessing of input dataset values comprising a list of fictitious data values to be used as masked data; determining a plurality of groups of phonetically similar data values present in the dataset list; and deriving metaphone for each input data value to be masked.

[009] Another objective of the invention is to provide a method and system for generating a first numeric code from derived metaphone value of input data value to be masked.

[0010] Another objective of the invention is to provide a method and system for selecting one group of phonetically similar data values out of the plurality of groups of phonetically similar data values based on the generated first numeric code.

[0011] Another objective of the invention is to provide a method and system for generating a second numeric code from input data value for selecting a masked value from a plurality of fictitious data group.

[0012] Other objectives and advantages of the present invention will be more apparent from the following description when read in conjunction with the accompanying figures, which are not intended to limit the scope of the present disclosure.

SUMMARY OF THE INVENTION
[0013] Before the present methods, systems, and hardware enablement are described, it is to be understood that this invention is not limited to the particular systems, and methodologies described, as there can be multiple possible embodiments of the present invention which are not expressly illustrated in the present disclosure. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims.

[0014] The present application provides a method and system for generating a group of phonetically similar masked data.

[0015] The present application provides a computer implemented method for generating a group of phonetically similar masked data, wherein said method comprises processor implemented steps of preprocessing of input dataset values comprising a list of fictitious data values to be used as masked data; determining a plurality of groups of phonetically similar data values present in the dataset list; and deriving metaphone for each input data value to be masked; generating a first numeric code from derived metaphone value of input data value to be masked; selecting one group of phonetically similar data values out of the plurality of groups of phonetically similar data values based on the generated first numeric code; and generating a second numeric code from input data value for selecting a masked value from a plurality of fictitious data group.

[0016] The present application provides a system (200) for generating a group of phonetically similar masked data; said system (200) comprising a processor; a data bus coupled to said processor; a computer-usable medium embodying computer code, said computer-usable medium being coupled to said data bus, said computer program code comprising instructions executable by said processor and configured for executing a standard metaphone generator module (202) adapted for preprocessing of input dataset values comprising a list of fictitious data values to be used as masked data; determining a plurality of groups of phonetically similar data values present in the dataset list; and deriving metaphone for each input data value to be masked; a first numeric code generation module (204) adapted for generating a first numeric code from derived metaphone value of input data value to be masked; a phonetically similar data values group selection module (206) adapted for selecting one group of phonetically similar data values out of the plurality of groups of phonetically similar data values based on the generated first numeric code; and a second numeric code generation module (208) adapted for generating a second numeric code from input data value for selecting a masked value from a plurality of fictitious data group.

BRIEF DESCRIPTION OF THE DRAWINGS
[0017] The foregoing summary, as well as the following detailed description of preferred embodiments, are better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there is shown in the drawings exemplary constructions of the invention; however, the invention is not limited to the specific methods and system disclosed. In the drawings:

[0018] Figure 1: shows a flow chart illustrating a method for generating a group of phonetically similar masked data; and

[0019] Figure 2: shows a block diagram illustrating system architecture for generating a group of phonetically similar masked data.

DETAILED DESCRIPTION OF THE INVENTION
[0020] Some embodiments of this invention, illustrating all its features, will now be discussed in detail.

[0021] The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items.

[0022] It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, the preferred systems and methods are now described.

[0023] The disclosed embodiments are merely exemplary of the invention, which may be embodied in various forms.

[0024] The elements illustrated in the Figures inter-operate as explained in more detail below. Before setting forth the detailed explanation, however, it is noted that all of the discussion below, regardless of the particular implementation being described, is exemplary in nature, rather than limiting. For example, although selected aspects, features, or components of the implementations are depicted as being stored in memories, all or part of the systems and methods consistent with the disclosed system and method may be stored on, distributed across, or read from other machine-readable media.

[0025] The techniques described above may be implemented in one or more computer programs executing on (or executable by) a programmable computer including any combination of any number of the following: a processor, a storage medium readable and/or writable by the processor (including, for example, volatile and non-volatile memory and/or storage elements), plurality of input units, and plurality of output devices. Program code may be applied to input entered using any of the plurality of input units to perform the functions described and to generate an output displayed upon any of the plurality of output devices.

[0026] Each computer program within the scope of the claims below may be implemented in any programming language, such as assembly language, machine language, a high-level procedural programming language, or an object-oriented programming language. The programming language may, for example, be a compiled or interpreted programming language. Each such computer program may be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a computer processor.

[0027] Method steps of the invention may be performed by one or more computer processors executing a program tangibly embodied on a computer-readable medium to perform functions of the invention by operating on input and generating output. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, the processor receives (reads) instructions and data from a memory (such as a read-only memory and/or a random access memory) and writes (stores) instructions and data to the memory. Storage devices suitable for tangibly embodying computer program instructions and data include, for example, all forms of non-volatile memory, such as semiconductor memory devices, including EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROMs. Any of the foregoing may be supplemented by, or incorporated in, specially-designed ASICs (application-specific integrated circuits) or FPGAs (Field-Programmable Gate Arrays). A computer can generally also receive (read) programs and data from, and write (store) programs and data to, a non-transitory computer-readable storage medium such as an internal disk (not shown) or a removable disk.

[0028] Any data disclosed herein may be implemented, for example, in one or more data structures tangibly stored on a non-transitory computer-readable medium. Embodiments of the invention may store such data in such data structure(s) and read such data from such data structure(s).

[0029] The present application provides a computer implemented method and system for generating a group of phonetically similar masked data.

[0030] Figure 1 is a flow chart illustrating a method for generating a group of phonetically similar masked data.

[0031] The process starts at step 102, where input dataset values comprising a list of fictitious data values to be used as masked data are preprocessed; a plurality of groups of phonetically similar data values present in the dataset list is determined; and a metaphone for each input data value to be masked is derived. At step 104, a first numeric code is generated from the derived metaphone value of the input data value to be masked. At step 106, one group of phonetically similar data values is selected out of the plurality of groups of phonetically similar data values based on the generated first numeric code. The process ends at step 108, where a second numeric code is generated from the input data value for selecting a masked value from a plurality of fictitious data groups.
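As an illustration only, steps 102 to 108 can be sketched in Python as follows. The specification does not state how the numeric codes are computed or which Metaphone implementation is used, so everything in the sketch is an assumption: `phonetic_key` is a crude stand-in for a Metaphone encoder (it is not the real algorithm), `numeric_code` takes a prefix of a SHA-256 digest, the modulo-based selection is one possible mapping, and the names `build_groups`, `mask`, `FICTITIOUS` and `GROUPS` are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def phonetic_key(word: str) -> str:
    """Crude stand-in for a Metaphone encoder (assumption, NOT real Metaphone)."""
    s = re.sub(r"[^A-Z]", "", word.upper())
    for src, dst in (("PH", "F"), ("CK", "K"), ("Z", "S"), ("C", "K"), ("V", "F")):
        s = s.replace(src, dst)
    if not s:
        return ""
    # Keep the first letter, drop later vowels and H/W/Y, collapse repeats.
    s = s[0] + re.sub(r"[AEIOUHWY]", "", s[1:])
    return re.sub(r"(.)\1+", r"\1", s)


def numeric_code(text: str) -> int:
    """Deterministic integer derived from a string (assumed: SHA-256 prefix)."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


def build_groups(fictitious):
    """Step 102: group the fictitious values by their phonetic key."""
    buckets = defaultdict(list)
    for value in fictitious:
        buckets[phonetic_key(value)].append(value)
    return [buckets[key] for key in sorted(buckets)]


def mask(value, groups):
    first = numeric_code(phonetic_key(value))   # step 104: code from the phonetic key
    group = groups[first % len(groups)]         # step 106: pick one fictitious group
    second = numeric_code(value)                # step 108: code from the raw value
    return group[second % len(group)]


FICTITIOUS = ["Catherine", "Kathryn", "Katherine", "Jon", "John",
              "Smith", "Smyth", "Philip", "Phillip", "Filip"]
GROUPS = build_groups(FICTITIOUS)

for name in ("Stephen", "Steven", "Stephen"):
    print(name, "->", mask(name, GROUPS))
```

In this sketch both spellings produce the same phonetic key, so they fall into the same fictitious group, and repeated occurrences of one spelling return the same masked value without any stored lookup table.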

[0032] Figure 2 is a block diagram illustrating system architecture for generating a group of phonetically similar masked data.

[0033] In an embodiment of the present invention, a system (200) is provided for generating a group of phonetically similar masked data.

[0034] The system (200) for generating a group of phonetically similar masked data comprises a processor; a data bus coupled to said processor; a computer-usable medium embodying computer code, said computer-usable medium being coupled to said data bus, said computer program code comprising instructions executable by said processor and configured for executing a standard metaphone generator module (202); a first numeric code generation module (204); a phonetically similar data values group selection module (206); and a second numeric code generation module (208).

[0035] In another embodiment of the present invention, said standard metaphone generator module (202) is adapted for preprocessing of input dataset values comprising a list of fictitious data values to be used as masked data; determining a plurality of groups of phonetically similar data values present in the dataset list; and deriving metaphone for each input data value to be masked.
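The grouping part of this preprocessing could look roughly like the sketch below. It assumes the encoder is pluggable: `toy_encoder` is a hypothetical, much-simplified stand-in for a standard Metaphone generator (it is not the real algorithm), and `group_by_sound` is an illustrative name that does not come from the specification.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def toy_encoder(word: str) -> str:
    """Hypothetical stand-in for a Metaphone encoder (NOT real Metaphone):
    keep the first letter, drop later vowels and H/W/Y, collapse repeats."""
    s = re.sub(r"[^A-Z]", "", word.upper()).replace("PH", "F")
    if not s:
        return ""
    s = s[0] + re.sub(r"[AEIOUHWY]", "", s[1:])
    return re.sub(r"(.)\1+", r"\1", s)


def group_by_sound(values: Iterable[str],
                   encode: Callable[[str], str] = toy_encoder) -> Dict[str, List[str]]:
    """Bucket fictitious values so that similar-sounding spellings share a group."""
    groups = defaultdict(list)
    for value in values:
        groups[encode(value)].append(value)
    return dict(groups)


groups = group_by_sound(["Jon", "Smith", "John", "Philip", "Smyth", "Filip"])
print(groups)
```

A real deployment would pass an actual Metaphone implementation as `encode`; the grouping logic itself would not need to change.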

[0036] In another embodiment of the present invention, the first numeric code generation module (204) is adapted for generating a first numeric code from the derived metaphone value of the input data value to be masked.

[0037] In another embodiment of the present invention, the phonetically similar data values group selection module (206) is adapted for selecting one group of phonetically similar data values out of the plurality of groups of phonetically similar data values based on the generated first numeric code.

[0038] In another embodiment of the present invention, the second numeric code generation module (208) is adapted for generating a second numeric code from the input data value for selecting a masked value from a plurality of fictitious data groups.
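One way the two numeric codes of modules (204) and (208) might drive the selection is sketched below. The specification does not define the code generation, so the salted SHA-256 prefix and the modulo indexing are assumptions, as are the names `numeric_code` and `select_masked_value`. The metaphone strings are supplied by the caller and are illustrative only.

```python
import hashlib


def numeric_code(text: str, salt: str) -> int:
    """Deterministic integer from a string (assumed: salted SHA-256 prefix)."""
    digest = hashlib.sha256((salt + ":" + text).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def select_masked_value(value: str, metaphone: str, groups: list) -> str:
    first = numeric_code(metaphone, "group")    # first numeric code (module 204)
    group = groups[first % len(groups)]         # group selection (module 206)
    second = numeric_code(value, "value")       # second numeric code (module 208)
    return group[second % len(group)]


# Fictitious groups, already bucketed by phonetic similarity.
groups = [["Jon", "John"], ["Smith", "Smyth"], ["Philip", "Filip"]]

a = select_masked_value("Stephen", "STFN", groups)
b = select_masked_value("Steven", "STFN", groups)
print(a, b)
```

Because the first code depends only on the metaphone, every variant of a name selects the same group; the second code depends on the exact spelling, so different variants may receive different members of that group.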

[0039] In another embodiment of the present invention, the method for generating a group of phonetically similar masked data further comprises determining a group of masked output values from the plurality of fictitious data groups by mapping the input data group to one of the data groups in the output.

[0040] In another embodiment of the present invention, the method for generating a group of phonetically similar masked data comprises determining the group of masked output values from a pool of the fictitious dataset list; further, the input data group is mapped to one of the data groups in the output. The group of masked output values is utilized to fetch a consistent masked value for input data. For a particular group, the masked output values group is consistent irrespective of occurrences of the group. For a particular input data value, the masked output value from said determined group is consistent irrespective of its occurrences.

[0041] The present invention provides the method and system for generating a group of phonetically similar masked data, which is a combination of metaphone generation and masking of phonetically similar words for maintaining the integrity and consistency of data enterprise wide. The present invention considers the variants of input data, which are phonetically similar, as part of a single group. Thus, the masked output retains real time characteristics of the original data by replacing variants of input data with different values which will also be variants of the masked data value. The present invention also eliminates the need to maintain the map of original data and its metaphone, wherein the metaphones are generated at runtime, thus removing the possibility of backward traceability of the original data. The present invention extends the method and system for generating a group of phonetically similar masked data to different data sources, such as RDBMS databases, mainframe files, common files, log files, pdf, doc, docx etc.

Documents

Application Documents

# Name Date
1 Form 3 [18-06-2016(online)].pdf 2016-06-18
2 Form 20 [18-06-2016(online)].jpg 2016-06-18
3 Form 18 [18-06-2016(online)].pdf_102.pdf 2016-06-18
4 Form 18 [18-06-2016(online)].pdf 2016-06-18
5 Drawing [18-06-2016(online)].pdf 2016-06-18
6 Description(Complete) [18-06-2016(online)].pdf 2016-06-18
7 Form 26 [03-08-2016(online)].pdf 2016-08-03
8 Other Patent Document [04-08-2016(online)].pdf 2016-08-04
9 REQUEST FOR CERTIFIED COPY [12-07-2017(online)].pdf 2017-07-12
10 201621020922-CORRESPONDENCE(IPO)-(CERTIFIED LETTER)-(14-07-2017).pdf 2017-07-14
11 201621020922-FORM 3 [13-10-2017(online)].pdf 2017-10-13
12 ABSTRACT1.jpg 2018-08-11
13 201621020922-Power of Attorney-100816.pdf 2018-08-11
14 201621020922-Form 1-100816.pdf 2018-08-11
15 201621020922-Correspondence-100816.pdf 2018-08-11
16 201621020922-FER.pdf 2020-01-31
17 201621020922-OTHERS [31-07-2020(online)].pdf 2020-07-31
18 201621020922-FER_SER_REPLY [31-07-2020(online)].pdf 2020-07-31
19 201621020922-COMPLETE SPECIFICATION [31-07-2020(online)].pdf 2020-07-31
20 201621020922-CLAIMS [31-07-2020(online)].pdf 2020-07-31
21 201621020922-US(14)-HearingNotice-(HearingDate-23-02-2023).pdf 2023-01-17
22 201621020922-FORM-26 [17-02-2023(online)].pdf 2023-02-17
23 201621020922-FORM-26 [17-02-2023(online)]-2.pdf 2023-02-17
24 201621020922-FORM-26 [17-02-2023(online)]-1.pdf 2023-02-17
25 201621020922-Correspondence to notify the Controller [17-02-2023(online)].pdf 2023-02-17
26 201621020922-Written submissions and relevant documents [06-03-2023(online)].pdf 2023-03-06
27 201621020922-PatentCertificate24-11-2023.pdf 2023-11-24
28 201621020922-IntimationOfGrant24-11-2023.pdf 2023-11-24
29 201621020922-RENEWAL OF PATENTS [12-06-2025(online)].pdf 2025-06-12

Search Strategy

1 2020-01-2717-33-05_27-01-2020.pdf

ERegister / Renewals

3rd: 23 Feb 2024

From 18/06/2018 - To 18/06/2019

4th: 23 Feb 2024

From 18/06/2019 - To 18/06/2020

5th: 23 Feb 2024

From 18/06/2020 - To 18/06/2021

6th: 23 Feb 2024

From 18/06/2021 - To 18/06/2022

7th: 23 Feb 2024

From 18/06/2022 - To 18/06/2023

8th: 23 Feb 2024

From 18/06/2023 - To 18/06/2024

9th: 17 Jun 2024

From 18/06/2024 - To 18/06/2025

10th: 12 Jun 2025

From 18/06/2025 - To 18/06/2026