A Method Of Identifying One Or More Unique Users From A Plurality Of Data Sources

Abstract: The present invention relates to a method of identifying one or more unique users from a plurality of data sources. In one embodiment, this is accomplished by receiving, in a system, a plurality of users’ profile data collected from various sources; categorizing each received user data profile into one of multiple classification parameter categories; and clustering the received user data profiles into homogeneous groups based on the classification parameters to form one or more clusters of user data. Uniquification logic and probability logic are then executed on each cluster to determine whether user profiles belong to the same user. If they do, a UUID is assigned to the user and updated to all user profiles having similar details, and the probability percentage is updated. FIG. 3


Patent Information

Application #

Filing Date

05 December 2018

Publication Number

24/2020

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

bhaskar@ipexcel.com

Parent Application

Applicants

Emart Solutions India Pvt Ltd

#102, Mezzanine Floor, Triguna Icon, #21, Arekempanahalli, Hosur Road, Bengaluru, Karnataka

Inventors

1. Birendra Kumar Sahu

#102, Mezzanine Floor, Triguna Icon, #21, Arekempanahalli, Hosur Road, Bengaluru, Karnataka 560027.

2. Aditya Bhamidipaty

#102, Mezzanine Floor, Triguna Icon, #21, Arekempanahalli, Hosur Road, Bengaluru, Karnataka 560027.

Specification

Claims: A method of identifying one or more unique users from a plurality of data sources, the method comprising:
receiving a plurality of users’ profile data and associated metadata in a system, wherein the plurality of users’ profile data and the associated metadata is collected from the plurality of data sources comprising social media, campaign management system, leads management system, customer relationship management system and response management system;
classifying each of the plurality of users’ profile data into a class based on at least one of one or more high confident parameters, one or more medium confident parameters and one or more low confident parameters;
clustering the classified plurality of users’ profile data into homogenous groups such that each of the plurality of users’ profile data having similar attributes of the one or more high confident parameters, the one or more medium confident parameters and the one or more low confident parameters are kept in one group thereby forming a plurality of clusters of the classified plurality of the users’ profile data; and
executing Uniquification logic and probability logic in sequential iterations in each of the plurality of clusters of the classified plurality of users’ profile data,
wherein a UUID (Universally Unique Identifier) is assigned if users are found to be the same based on the percentage of probability attributed,
wherein the UUID of the user is updated to the remaining users’ profile data having similar details and the probability percentage of the classified one or more users’ profile data is updated in the remaining one or more clusters; and
identifying one or more unique users based on the percentage of probability attributed to each user of the classified plurality of user’s profile data, wherein each of the one or more unique users identified having similar associated details.
The method as claimed in claim 1, wherein the one or more high confident parameters comprises an email ID, an AADHAR ID, a unique customer ID, a passport number, a Google+ account ID and a photograph.

The method as claimed in claim 1, wherein the one or more medium confident parameters comprises a contact number, a facsimile number, a Facebook ID, a Twitter ID, a LinkedIn ID, an account number, a date of birth, a user ID of a website, a MAC ID, a device IP, an IP address and a cookie ID.

The method as claimed in claim 1, wherein the one or more low confident parameters comprises a name, an address and a location.
The method as claimed in claim 1, wherein the remaining clusters are updated after each iteration of the Uniquification logic and probability logic until the one or more users on all the clusters are uniquified.

The method as claimed in claim 1, further comprising:
creating a FH UUI for new user data profile, wherein the new user data profile is categorized in an appropriate cluster and executing the Uniquification logic and probability logic on the new user data profile.
The method as claimed in claim 1, wherein the probability logic includes a high probability and a low probability, wherein the high probability refers when the data is uniquified based on the high confident parameter category of the user data profile and the low probability refers when the data is uniquified based on the low or medium confident parameter category of the user data profile.

The method as claimed in claim 1, wherein if the user is uniquified with high probability based on the medium and low confident parameter category, the average probability is validated with the high confident parameter category to retrieve the total probability percentage.

The method as claimed in claim 1, wherein, if any one high confident parameter matches between two users, the same UUID is assigned to all the matched users with the probability percentage returned by the probability algorithm: 80% when one parameter matches, 90% when two parameters match, 95% when three parameters match and 99% when four parameters match.

The method as claimed in claim 4, wherein, if any two medium confident parameters match between two users, the same UUID is assigned to all the matched users with the probability percentage returned by the probability algorithm: 70% when two parameters match, 75% when three parameters match and 80% when four parameters match.

The method as claimed in claim 4, wherein, if any one medium confident parameter and two low confident parameters match between users, the same UUID is assigned to all the matched users with the probability percentage returned by the probability algorithm.

The method as claimed in claim 4, wherein, if any three low confident parameters match between users, the same UUID is assigned to all the matched users with the probability percentage returned by the probability algorithm.

The method as claimed in claim 4, wherein each classification parameter has its own weight for calculating the probability.

The method as claimed in claim 1, wherein the clustering for user profile Uniquification includes filtering out all users that have NULL values for the defined classification attributes in the data table before applying the clustering algorithm based on the classification attributes.

The method as claimed in claim 9, wherein all the users which have values for high confident classification attributes are moved to cluster 1, wherein each cluster’s data is stored in a different temporary table, wherein all the users having values for medium confident classification attributes are moved to cluster 2 MINUS cluster 1, and wherein all the users having values for low confident classification attributes are moved to cluster 3 MINUS cluster 1 and cluster 2.

The method as claimed in claim 10, wherein the Uniquification logic and probability algorithm are executed in cluster 1 and, if users are found to be the same, the UUID of the first user is updated to all the users who have similar details and the probability percentage is updated, and the steps are repeated for all the clusters until the users in all the clusters are uniquified.

The method as claimed in claim 1, wherein classification logic comprises
1. Create clusters N0, N1, N2, N3, ...., Nn // where n is classification parameters and
// Ni is i th cluster
2. for all clusters
3. Do
4. Initialize Ni <- default initial entry for i th parameter
5. end loop;
6. mark each entry of input data
X1, X2, X3, ..., Xm as m different point in free space
7. for each of X1, X2, X3, ..., Xi //calculating distance of each point
8. Do // from each cluster
9. for each N0, N1, N2, N3, ....., Nj
10. Do
11. Calculate distance of each point from center as
D(x,c)= | di-cj| /2 // dj is distance vector of Xi point and Cj is mean centroid value
// D(x, c) is mean distance of Xi from Nj
12. Darray[k] <=: D(x,c)
13. End inner loop
14. For all entry of Darray
15. if(Darray[k] < Darray[k-1]) Then
16. T = k // cluster number
17. Dtemp = Darray[k]
18. Else
19. Dtemp = Dold
20. End If
21. End loop
22. CT = Σ_(ii=1)^l Pii + Dtemp / (k+1) // new center of cluster NT
23. End outer loop
24. Return set of final clusters
25. End

The method as claimed in claim 1, wherein the uniquification logic executed herein comprises
1. Let h1, h2, h3, …, hn are high level parameters
// high level params are those params which have a high chance of being unique
2. For each row R1, R2, R3, ..., Ri of ununified clustered data
3. Do
4. For each of h1, h2, h3, …, hn
5. Do
6. If Rj [hk] = Ri [hk] then
7. Ri[parent] <= Rj [id] and Ri [duplicate] <= yes
8. CALL FUZZIFICATION
9. End If
10. Move pointer to next parameter
11. End inner loop
12. If Ri [duplicate] = No then
13. Ri [parent] <= Ri [id]
14. End if
15. Move pointer to next data row
16. End loop //end of high parameter check
17. Let M1, M2, M3, ..., Mx are medium level parameters
18. For each R1, R2, R3, …, Ry
19. Do
20. If Ry[duplicate] = No Then
21. For each of M1, M2, M3, ..., Mx
22. Do
23. If Rj [Mii] = Ri [Mii] Then
24. For Mii+1 to Mx Do
25. If Rj [ Mk] = Ri[Mk] Then
26. Ri [parent] <= Rj [id] and Ri [duplicate] <= yes
27. CALL FUZZIFICATION
28. End If
29. End loop
30. End if
31. End loop
32. End if
33. End loop // end of medium level uniquification
34. Let L1, L2, L3, ..., Lx are low level parameters
35. Follow step 17 - 31
36. Return unified set of data
37. End

The method as claimed in claim 13, wherein the FUZZIFICATION logic employed herein comprises
Input: Set of parent child data
Output: Probability value of similarity
1. Let W[x][y] = {C1, V1}, {C2, V2}, {C3, V3}, …, {Ci, Vi}
//Ci is ith column and Vi is ith value of similarity
2. For each of {C1, V1}, {C2, V2}, {C3, V3}, …, {Ci, Vi}
3. Do
4. If Ci not in {Ch1, Ch2, Ch3, …, Chn} Then //Chi is most high impact parameters
5. If child [Ci] = parent [Ci] Then
6. SCORE(next) <= SCORE(prev) + Vi // SCORE(prev) is score upto last parameter
7. End inner if
8. Else
9. If child [Ci] = parent [Ci] Then
10. SCORE(next) <= SCORE(prev) + Vi
11. Else If child [Ci] != parent [Ci] Then
12. SCORE(next) <= SCORE(prev) - Vi //negative impact params not matched
13. End of inner if
14. End if
15. End loop
16. Probability <= (SCORE(final) / Σ_(ii=1)^n Vii) * 100
17. Return Probability value
18. End
Description:
FIELD OF THE INVENTION
[0001] This invention relates generally to systems and methods for automated marketing and, more particularly, to a method and system of identifying unique user/s from a plurality of data sources.
BACKGROUND
[0002] In recent years, social and business networking has become very popular as a way of allowing people to connect and communicate with each other and to share both information about themselves and digital content over the internet and across great distances, using websites and applications such as Twitter, LinkedIn, Google+, Facebook, CRM and POS systems. Such websites and applications allow users to post information about their interests, fields and activities for others to read, and to read the information that others have posted. Users are thus able to provide information to a large number of people by posting only once, rather than having to communicate directly with each person with whom they wish to share information. However, these sites are typically set up such that people must establish a connection with one another before they may view each other's information, so that communication and information sharing is generally limited to people who already know each other. Thus, while these sites allow people to find out about the fields and interests of people they already know, they do not allow them to find other people based solely on common fields and interests.
[0003] A number of professional and social sites allow people to indicate fields and interests and provide various ways of matching up with strangers. However, these sites expressly include demographic data (name, mobile, email ID, sex, age, race, marital status, education, employment, etc.) and physical attributes as part of their matching algorithms. In addition, it is believed that the scope and type of matching is limited, for example allowing a user only to designate one or more of a small number of very broad categories: Yahoo Personals lists categories such as arts, family, travel, cooking, outdoor activities and playing sports; Facebook lists social, entertainment, awareness, etc.; and LinkedIn lists profession, business, etc. As a result, these sites and the algorithms they use are typically not specific enough to be well suited to finding people with common specific fields and interests, even ignoring the potential presumptions of further interest by their users.
[0004] Further, the internet may make it easier in some ways to find people of similar fields and interests, for example by searching for an enterprise, business, club, organization or user forum, but only in relation to specified fields. Presently available marketing automation software often gathers large amounts of customer profile data, collected from different data sources such as social media, campaign management systems, leads management systems, customer relationship management systems and response management systems. However, there is no method available for searching for individuals who have similar fields across the various platforms or websites. Moreover, traditional approaches are no longer sufficient for preventing the re-identification of participants in large social/data science datasets.
[0005] Therefore, there is a need in the art for a method of identifying one or more unique users from a plurality of data sources.
SUMMARY OF THE INVENTION
[0006] In accordance with one embodiment of the disclosure, a method of identifying one or more unique users from a plurality of data sources is provided. The method includes receiving a plurality of users’ profile data and associated metadata in a system. The plurality of users’ profile data and the associated metadata are collected from the plurality of data sources comprising social media, campaign management system, leads management system, customer relationship management system and response management system. The method includes classifying each of the plurality of users’ profile data into a class based on at least one of one or more high confident parameters, one or more medium confident parameters and one or more low confident parameters. The method also includes clustering the classified plurality of users’ profile data into homogenous groups such that each of the plurality of users’ profile data having similar attributes of the one or more high confident parameters, the one or more medium confident parameters and the one or more low confident parameters are kept in one group, thereby forming a plurality of clusters of the classified plurality of users’ profile data. The method also includes executing Uniquification logic and probability logic in sequential iterations in each of the plurality of clusters of the classified plurality of users’ profile data. Finally, the method includes the step of identifying one or more unique users based on the percentage of probability attributed to each user of the classified plurality of users’ profile data, wherein each of the one or more unique users identified has similar associated details. This innovation creates 3 levels of identity building, called User Level of Interaction, Channel Level and Account Level. Each data point is tagged with a Persona ID, and each Persona ID is mapped to a Channel ID and an FHAccountID based on the Uniquification algorithm.
[0007] In a further embodiment of the present invention, the method further includes the step of creating a FH UUI for a new user data profile, wherein the new user data profile is categorized in an appropriate cluster and the Uniquification logic and probability logic are executed on the new user data profile.
[0008] To further clarify the advantages and features of the present disclosure, a more particular description of the disclosure will follow by reference to specific embodiments thereof, which are illustrated in the appended figures. It is to be appreciated that these figures depict only typical embodiments of the disclosure and are therefore not to be considered limiting in scope. The disclosure will be described and explained with additional specificity and detail with the appended figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The disclosure will be described and explained with additional specificity and detail with the accompanying figures in which:

[0010] FIG. 1 illustrates a block diagram of a conceptual example of a process for creating an activity-based social network or data network;

[0011] FIG. 2 illustrates customer profile users which are uniquified with positive probability and with negative probability, in accordance with one embodiment of the present invention;

[0012] FIG. 3 illustrates a flow chart of a method of identifying unique user/s from a multichannel marketing platform, according to one embodiment of the present invention;

[0013] FIG. 4 illustrates a block diagram of a computer system suitable for implementing the method of FIG. 3, according to an embodiment; and
[0014] FIG. 5 illustrates a schematic diagram of a computer system suitable for implementing the method of FIG. 3, according to an embodiment.
[0015] Further, those skilled in the art will appreciate that elements in the figures are illustrated for simplicity and may not have necessarily been drawn to scale. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the figures by conventional symbols, and the figures may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the figures with details that will be readily apparent to those skilled in the art having the benefit of the description herein.
DETAILED DESCRIPTION
[0016] For the purpose of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiment illustrated in the figures and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the disclosure is thereby intended. Such alterations and further modifications in the illustrated system, and such further applications of the principles of the disclosure as would normally occur to those skilled in the art are to be construed as being within the scope of the present disclosure.
[0017] The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method. Similarly, one or more devices or sub-systems or elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other devices, sub-systems, elements, structures, components, additional devices, additional sub-systems, additional elements, additional structures or additional components. Appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but not necessarily do, all refer to the same embodiment.
[0018] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this disclosure belongs. The system, methods, and examples provided herein are only illustrative and not intended to be limiting.
[0019] In the following specification and the claims, reference will be made to a number of terms, which shall be defined to have the following meanings. The singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise.
[0020] Embodiments of the present disclosure relate to a system and method for identifying one or more unique users from a plurality of data sources.

[0021] FIG. 1 discloses a process for accumulating data or information from an activity-based social network or any other data network. In the example of FIG. 1, data is accumulated from a plurality of users who are all involved in an activity. The activity may include any appropriate physical or virtual activity or activities. For example, the activity may be an event, such as a meeting, a conference, a party, a sports game, or simply a group of people present at a location at a particular time. In other examples, the activity may be virtual, such as an online group (e.g., gamers, bloggers, chatters, or the like) whose members are in electronic communication over a course of time. In other examples, the activity may be a combination of physical and virtual.

[0022] In the example of FIG. 1, each user in the activity has a computing device. The computing device may be, or include, any computing device, including but not limited to mobile phones, laptops and desktops. In this example, information is sent from each computing device to a system. The system may include one or more computer programs that are executable to identify, based on the information, participants who are all associated with the activity, and to output a portal (e.g., a Web page) (not shown in the figure).

[0023] The information may be based, e.g., on data obtained from the computing devices. In the example implementations described below, each computing device may have an application (app) installed thereon. That app may communicate with the system in order to enable creating and joining an activity-based social network. For example, participants in an activity may activate the apps on their respective computing devices. Following their activation, the apps may obtain information for use by the system in creating an activity-based social network. Different types of information may be obtained and used, as described in more detail below. Likewise, different mechanisms may be used to obtain the information, as also described in more detail below. Similar actions may be taken to join an existing activity-based social network.

[0024] The system may retrieve data, or have metadata associated therewith, that relates to one or more attributes of the corresponding activity. For example, for a particular activity, the data may identify the location of the corresponding activity, the date of that activity, participants in the activity, information about the participants (e.g., their contact information, title, and the like), the subject of the activity, and so forth. The systems and processes described herein, and variations thereof, may be implemented in any appropriate electronic system, with any appropriate computing devices and computing equipment. An example of such a system is described above with respect to FIG. 1. Other systems, however, may be used.

[0025] FIG. 2 illustrates customer profile users which are uniquified with positive probability and with negative probability, in accordance with one embodiment of the present invention. As shown in the figure, STAR (*) denotes the main customer profile user. All right-side cycles (customer profiles) are uniquified with positive probability, so they are called uniquified users. All left-side cycles (customer profiles) are uniquified with negative probability, so they are called related users.
Color code:
Red: uniquified with High probability
Brown: uniquified with Medium probability
Green: uniquified with Low probability

[0026] FIG. 3 illustrates a flow chart of a method of identifying unique user(s) from a multichannel marketing platform, according to one embodiment of the present invention.
[0027] The invention proposes Uniquification logic that is divided into a clustering algorithm based on classification parameters and a probability algorithm. The classification algorithm divides the data into clusters for Uniquification, and the probability algorithm tags each user with a probability percentage indicating the level of Uniquification. The present invention also proposes a visual representation to show the uniquified user details. All user data captured from different sources and uniquified with high probability can be merged into a Master card, together with the source from which the information was captured. One can view the master card details by clicking the STAR node, and can view all the cards for users who are uniquified by clicking the cycle node, as shown in figure 2(a).

[0028] FIG. 3 illustrates a flow chart of a method for Uniquification and probability algorithm for user profiles and social science datasets.

[0029] The method (300) receives a plurality of users’ profile data in a system in step 310, wherein the plurality of users’ profile data and the associated metadata is collected from various sources including but not limited to social media, campaign management system, leads management system, customer relationship management system and response management system.

[0030] The method (300) classifies each of the received plurality of users’ profile data into a class based on at least one of one or more high confident parameters, one or more medium confident parameters and one or more low confident parameters at step 320. There are three categories: the high confident parameter category, the medium confident parameter category and the low confident parameter category. The high confident parameter category includes email ID, Aadhar ID, unique customer ID, passport number, Google+ account ID and photo. The medium confident parameter category includes mobile number, phone number, facsimile number, Facebook ID, Twitter ID, LinkedIn ID, account number, date of birth, user ID of a website, MAC ID, device IP, IP address and cookie ID. The low confident parameter category includes name, address, location (area, city, country), etc. This three-level classification is based on how confidently each classification parameter identifies the same user. The user can dynamically configure their own classification parameters based on the uniqueness of their system.

[0031] The logic to identify the same user from the confident classification parameters is as follows:
If any one high confident parameter matches between two users, the same UUID (Universally Unique Identifier) is assigned to all the matched users with the probability percentage returned by the probability algorithm: for example, 80% when one parameter matches, 90% when two parameters match, 95% when three parameters match and 99% when four parameters match.

[0032] If any two medium confident parameters match between two users, the same UUID is assigned to all the matched users with the probability percentage returned by the probability algorithm: for example, 70% when two parameters match, 75% when three parameters match and 80% when four parameters match.

[0033] If any one medium confident parameter and two low confident parameters match between users, the same UUID is assigned to all the matched users with the probability percentage returned by the probability algorithm.

[0034] If any three low confident parameters match between users, the same UUID is assigned to all the matched users with the probability percentage returned by the probability algorithm, for example 40%.
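
The matching rules of paragraphs [0031] to [0034] can be sketched as a simple lookup. This is only an illustration: the percentages are the example values quoted in the text, while the function name `match_probability`, the order in which the rules are checked, and the capping of more than four matches at the four-match value are assumptions. The text gives no percentage for the one-medium-plus-two-low case, so the sketch returns `None` for it rather than guessing.

```python
from typing import Optional, Tuple

# Example percentages quoted in paragraphs [0031] to [0034].
HIGH_TIERS = {1: 80, 2: 90, 3: 95, 4: 99}
MEDIUM_TIERS = {2: 70, 3: 75, 4: 80}
LOW_THREE_MATCH = 40


def match_probability(high: int, medium: int, low: int) -> Tuple[bool, Optional[int]]:
    """Map counts of matching parameters to (same_user, probability percent)."""
    if high >= 1:
        # Assumption: more than four matches is treated like four.
        return True, HIGH_TIERS[min(high, 4)]
    if medium >= 2:
        return True, MEDIUM_TIERS[min(medium, 4)]
    if low >= 3:
        return True, LOW_THREE_MATCH
    if medium == 1 and low >= 2:
        # The text assigns the same UUID here but states no percentage.
        return True, None
    return False, None
```

For instance, `match_probability(2, 0, 0)` yields `(True, 90)`, while one medium and one low match yields `(False, None)` because no rule applies.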

[0035] Each classification parameter has its own weight for calculating the probability. For example, the weights of high confident parameters are higher than the weights of medium confident parameters.
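
A weighted score of this kind, following the FUZZIFICATION steps listed in the claims, might look like the sketch below. Matching columns add their weight, a mismatch on a high-impact column subtracts its weight, and the result is normalised by the sum of all weights. The column names, the weight values and the choice to skip columns where either value is missing are assumptions made for illustration; the source does not specify them.

```python
from typing import Dict, Optional, Set


def similarity_probability(
    parent: Dict[str, Optional[str]],
    child: Dict[str, Optional[str]],
    weights: Dict[str, float],
    high_impact: Set[str],
) -> float:
    """Return the similarity of child to parent as a percentage."""
    score = 0.0
    for column, weight in weights.items():
        p, c = parent.get(column), child.get(column)
        if p is None or c is None:
            continue  # assumption: missing values neither add nor subtract
        if p == c:
            score += weight
        elif column in high_impact:
            score -= weight  # mismatch on a high-impact column counts against
    total = sum(weights.values())
    return 100.0 * score / total if total else 0.0


# Hypothetical weights: the high confident column weighs the most.
WEIGHTS = {"email": 5.0, "phone": 3.0, "name": 2.0}
parent = {"email": "a@example.com", "phone": "111", "name": "Asha"}
child = {"email": "a@example.com", "phone": "222", "name": "Asha"}
probability = similarity_probability(parent, child, WEIGHTS, {"email"})
```

Because high-impact mismatches subtract weight, the result can be negative, which is consistent with the "negative probability" profiles described for FIG. 2.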

[0036] The method (300) includes clustering the classified plurality of users’ profile data into homogenous groups such that each of the plurality of users’ profile data having similar attributes of the one or more high confident parameters, the one or more medium confident parameters and the one or more low confident parameters are kept in one group, thereby forming a plurality of clusters of the classified plurality of users’ profile data, in step 330. Clustering groups the data into different buckets to improve Uniquification performance: Uniquification is performed within clusters for high probability Uniquification and across different clusters for low probability Uniquification.

Step 1: Use a learning algorithm to extract rules from (create a model of) the training data. The training data are pre-classified rules.

Step 2: Evaluate the rules on data.

Step 3: Apply the rules to (classify) new data.

Step 4: Calculate the linking distance and assign data with similar distances to the same cluster.

Step 5: Predict and assign new cases to clusters using the training data directly, by nearest neighbor classification (rule based).

[0037] How the proposed clustering works for user profile Uniquification:
Step 1: All users that have NULL values for all of the defined classification attributes are filtered out before the clustering algorithm is applied based on the classification attributes.

Step 2: All users that have values for high confident classification attributes are moved to cluster1 (each cluster's data is stored in a different temporary table).
Step 3: All users that have values for medium confident classification attributes are moved to cluster2, MINUS those already in cluster1.
Step 4: All users that have values for low confident classification attributes are moved to cluster3, MINUS those already in cluster1 and cluster2.

Step 5: The Uniquification logic and probability algorithm are executed in cluster1. If users are found to be the same, the UUID of the first user is updated to all users who have similar details, and the probability percentage is updated.

Step 6: Step 5 is repeated for all the clusters until the users in all the clusters are uniquified.

Step 7: The Uniquification logic and probability algorithm are then executed across the clusters: cluster 1 vs cluster 2, cluster 1 vs cluster 3 and cluster 2 vs cluster 3.
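
Steps 1 to 4 of the clustering in paragraph [0037] can be sketched as follows. The sketch uses in-memory lists in place of the temporary tables mentioned in the text, and the field names in `HIGH`, `MEDIUM` and `LOW` are hypothetical subsets of the attribute lists in paragraph [0030]. Treating an empty string like NULL is also an assumption.

```python
from typing import Dict, Iterable, List, Optional

Profile = Dict[str, Optional[str]]

# Hypothetical subsets of the attribute lists in paragraph [0030].
HIGH = ("email", "passport_no", "customer_id")
MEDIUM = ("phone", "facebook_id", "date_of_birth")
LOW = ("name", "address", "location")


def has_value(profile: Profile, fields: Iterable[str]) -> bool:
    """True if at least one of the fields is present and non-empty."""
    return any(profile.get(f) not in (None, "") for f in fields)


def assign_clusters(profiles: List[Profile]) -> Dict[int, List[Profile]]:
    clusters: Dict[int, List[Profile]] = {1: [], 2: [], 3: []}
    for profile in profiles:
        if has_value(profile, HIGH):
            clusters[1].append(profile)
        elif has_value(profile, MEDIUM):  # cluster2 MINUS cluster1
            clusters[2].append(profile)
        elif has_value(profile, LOW):  # cluster3 MINUS cluster1 and cluster2
            clusters[3].append(profile)
        # Profiles with no classification values at all are dropped (Step 1).
    return clusters
```

The `if`/`elif` chain implements the MINUS steps directly: a profile that qualifies for an earlier cluster is never considered for a later one.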

[0038] The method (300) includes executing the Uniquification logic and probability logic in sequential iterations in each of the plurality of clusters of the classified plurality of users’ profile data in order to determine whether user profiles belong to the same user, in step 340. If the user data is found to be the same, a UUID is assigned to the user and updated to all the similar users having similar details, and the probability percentage is updated.
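
One way to picture this step within a single cluster is the single-pass sketch below: the first row seen with a given high confident value keeps its UUID, and later rows sharing that value inherit it. The function name `uniquify`, the `uuid` and `duplicate` keys and the sample field names are assumptions. The sketch does not merge two groups that were already given different UUIDs before a linking row appears, and it omits the probability update.

```python
import uuid
from typing import Dict, Iterable, List, Tuple


def uniquify(rows: List[dict], fields: Iterable[str]) -> List[dict]:
    """Assign a shared UUID to rows that match on any of the given fields."""
    fields = tuple(fields)
    seen: Dict[Tuple[str, str], str] = {}  # (field, value) -> UUID of first row
    for row in rows:
        keys = [(f, row[f]) for f in fields if row.get(f)]
        matched = next((seen[k] for k in keys if k in seen), None)
        row["uuid"] = matched or str(uuid.uuid4())
        row["duplicate"] = matched is not None
        for key in keys:
            seen.setdefault(key, row["uuid"])
    return rows


rows = uniquify(
    [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": "a@example.com", "passport_no": "P1"},
        {"id": 3, "passport_no": "P1"},
        {"id": 4, "email": "b@example.com"},
    ],
    fields=("email", "passport_no"),
)
```

Here rows 1 to 3 end up with the same UUID (row 3 is linked through the passport number carried by row 2), and row 4 receives its own.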

[0039] The method (300) includes assigning a UUID (Universally Unique Identifier) if users are found to be the same based on the percentage of probability attributed, at step 350. The UUID is a 16-octet (128-bit) code represented by 32 lowercase hexadecimal digits, displayed in five groups separated by hyphens, in the form 8-4-4-4-12, for a total of 36 characters (32 alphanumeric characters and four hyphens). For example: 123e4567-e89b-12d3-a456-426655440000

[0040] The first 3 sequences are interpreted as complete hexadecimal numbers, while the final 2 as a plain sequence of bytes. The byte order is most significant byte first (known as network byte order) (note that GUID's (globally unique identifier) byte order is different). This form reflects UUID's division into fields which apparently originate from the structure of the initial time and MAC-based version.
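
The layout described above can be checked with Python's standard `uuid` module, used here only as a convenient illustration; the source does not name an implementation language. The example value is the one given in paragraph [0039].

```python
import uuid

u = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")

text = str(u)                  # canonical lowercase 8-4-4-4-12 form
groups = text.split("-")
group_lengths = [len(g) for g in groups]

# The first three groups are read as whole hexadecimal numbers.
time_low = u.time_low                 # 0x123e4567
time_mid = u.time_mid                 # 0xe89b
time_hi_version = u.time_hi_version   # 0x12d3

# The last group is the 48-bit node field, historically a MAC address.
node = u.node                         # 0x426655440000

raw = u.bytes                  # 16 octets, most significant byte first
```

The leading `1` of the third group is the version number, which is why this example is reported as a version 1 (time and MAC based) UUID.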

[0041] The low probability of duplicate UUIDs holds only when the UUIDs are generated using sufficient entropy. Otherwise, the probability of duplicates could be significantly higher, since the statistical dispersion might be lower. Unique identifiers are required for distributed applications so that UUIDs do not clash even when data from many devices is merged; the randomness of the seeds and generators used on every device must therefore be reliable for the life of the application. If this is not feasible, using a namespace variant instead is recommended.
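
As a rough illustration of both points, the sketch below estimates the collision probability for randomly generated (version 4) UUIDs, which carry 122 random bits, and shows a namespace-based (version 5) UUID that does not depend on randomness. The namespace name "profiles.example.com", the helper names and the "source:id" key format are assumptions made for this example only.

```python
import uuid

# Hypothetical namespace for this sketch; any fixed UUID would serve.
PROFILE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "profiles.example.com")

def namespace_uuid(source, source_user_id):
    """Deterministic UUID: the same source and id always give the same value."""
    return uuid.uuid5(PROFILE_NAMESPACE, f"{source}:{source_user_id}")

def collision_probability(n, random_bits=122):
    """Birthday-bound approximation n^2 / 2^(bits + 1) for n random UUIDs."""
    return n * n / 2 ** (random_bits + 1)

a = namespace_uuid("crm", "42")
b = namespace_uuid("crm", "42")
p = collision_probability(1_000_000_000)  # 1000M records

print(a == b, a.version, p)
```

The estimate assumes a well-seeded random generator; a weak generator invalidates it, which is when the deterministic namespace form becomes attractive.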

[0042] The method (300) includes updating the UUID of the user to the remaining users’ profile data having similar details and updating the probability percentage of the classified one or more users’ profile data in the remaining one or more clusters, at step 360. This is repeated until the users in all the clusters are uniquified, after which the Uniquification logic and probability logic are executed across the clusters.

[0043] The method (300) includes identifying one or more unique users based on the percentage of probability attributed to each user of the classified plurality of users’ profile data, wherein each of the one or more unique users identified has similar associated details, at step 370. The method thereby finds the same person having different profiles on different online platforms.

[0044] Further, the method creates a FH UUI for a new user data profile, categorizes the new user data profile in an appropriate cluster, and executes the Uniquification logic and probability logic on it. For new users, the FH UUI is created and inserted in the User Master Data as part of the normal process.

[0045] PROCEDURE CLASSIFICATION ALGORITHM
Input: unclassified data
Output: classified data as a set of N clusters.

1. Create clusters N0, N1, N2, N3, ...., Nn // where n is classification parameters and
// Ni is i th cluster
2. for all clusters
3. Do
4. Initialize Ni <- default initial entry for i th parameter
5. end loop;
6. mark each entry of input data
X1, X2, X3, ..., Xm as m different point in free space
7. for each of X1, X2, X3, ..., Xi //calculating distance of each point
8. Do // from each cluster
9. for each N0, N1, N2, N3, ....., Nj
10. Do
11. Calculate distance of each point from center as
D(x,c)= | di-cj| /2 // di is distance vector of Xi point and cj is mean centroid value
// D(x, c) is mean distance of Xi from Nj
12. Darray[k] <=: D(x,c)
13. End inner loop
14. For all entry of Darray
15. if(Darray[k] < Darray[k-1]) Then
16. T = k // cluster number
17. Dtemp = Darray[k]
18. Else
19. Dtemp = Dold
20. End If
21. End loop
22. CT = Σ_(ii=1)^l Pii + Dtemp / (k+1) // new center of cluster NT
23. End outer loop
24. Return set of final clusters
25. End
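
A minimal runnable sketch of the nearest-center assignment in the procedure above is given below. It makes several assumptions that the pseudocode does not fix: each profile is reduced to a single numeric value, the distance is |x - c| / 2 as in step 11, and the new center is taken as the running mean of the points assigned to the cluster (a stand-in for the center update in step 22, not a reconstruction of it).

```python
def classify(points, initial_centers):
    """Assign each 1-D point to its nearest center and update that center."""
    centers = list(initial_centers)
    clusters = [[] for _ in centers]
    for x in points:
        # Distance of the point from every cluster center (step 11).
        distances = [abs(x - c) / 2 for c in centers]
        # Index of the closest cluster (steps 14 to 21).
        t = min(range(len(centers)), key=distances.__getitem__)
        clusters[t].append(x)
        # Assumed update rule: running mean of the points in the cluster.
        centers[t] = sum(clusters[t]) / len(clusters[t])
    return clusters, centers

clusters, centers = classify([1.0, 1.2, 9.0, 8.5, 5.2], [0.0, 10.0])
print(clusters)
```

Because centers move as points arrive, the result depends on input order; a production version would normally iterate until the assignments stop changing.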

[0046] Positive Probability distribution:
Basic idea: choose the attribute for Uniquification which will result in the smallest tree.
Heuristic: choose the attribute that produces the “purest” nodes i.e. High Confident classification Parameters.

[0047] Properties we require from a purity measure: when a node is pure, the measure should be zero. When impurity is maximal (i.e. all classes equally likely), the measure should be maximal. The measure should obey the multistage property (i.e. decisions can be made in several stages). Entropy is the only function that satisfies all three properties.
[0048] Given a probability distribution (P1,P2,...,Pn), the information required to predict an event is the distribution’s entropy.
Entropy (P1, P2,...,Pn) = -P1* log(P1)-P2*log(P2) - ...- Pn*log(Pn). When the base of log is 2, then entropy is in bits.
Example: entropy of the class distribution in the User profile data (9 matched and 5 not matched), using log base 2. Entropy (9/14,5/14) = -(9/14)*log(9/14) - (5/14)*log(5/14) = 0.94 bits.
• Information in a set, Info ([9,5]) = Entropy (9/14,5/14).
• Attribute outlook splits the set in three subsets, [9,5] = [2,3] + [4,0] + [3,2]. The information in this partition is Info([2,3],[4,0],[3,2]) = (5/14)*Info([2,3]) + (4/14)*Info([4,0]) + (5/14)*Info([3,2]).
• Information gain = information before splitting – information after splitting.
• Probability (High) = Info([9,5]) - Info([2,3],[4,0],[3,2]).
• Probability (High) = 0.247, Probability (Medium)=0.029, Probability (Normal)=0.152, Probability (Low)=0.048.
• Best attribute: Probability (High) as highest score.
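
The figures in this example can be reproduced with a few lines of Python. The function names are chosen for this illustration; the counts [9,5] and the three subsets are taken from the text above.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def info_after_split(subsets):
    """Weighted average entropy of the subsets produced by a split."""
    total = sum(sum(s) for s in subsets)
    return sum(sum(s) / total * entropy(s) for s in subsets)

def information_gain(parent, subsets):
    """Information before splitting minus information after splitting."""
    return entropy(parent) - info_after_split(subsets)

before = entropy([9, 5])
gain = information_gain([9, 5], [[2, 3], [4, 0], [3, 2]])
print(round(before, 3), round(gain, 3))  # 0.94 0.247
```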

[0049] Negative Probability distribution:
The probability of the outcome of an experiment is never negative, but quasiprobability distributions can be defined that allow a negative probability for some events. These distributions may apply to unobservable events or conditional probabilities.

[0050] Consider the situation where data is collected for users from two different sources and the users are uniquified with high probability because their address, phone number, mail ID and location match. However, the two users cannot be classified as the same user, even though the positive probability of Uniquification is high, if their Passport numbers are different. The Passport number classification parameter nullifies the high positive probability with a negative probability, so the high probability of Uniquification is converted to a low probability. In this case, both data points are tagged as relative instead of uniquified.
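
One way to picture this override is sketched below. The 0.8 threshold, the field name "passport", the function name and the label "distinct" for low-scoring pairs are all assumptions introduced for the illustration; only the "uniquified" and "relative" outcomes come from the description above.

```python
def uniquification_outcome(a, b, positive_score, hard_ids=("passport",), threshold=0.8):
    """Return 'uniquified', 'relative' or 'distinct' for two profile dicts."""
    # A conflict exists when both profiles carry a hard identifier and the values differ.
    conflict = any(a.get(k) and b.get(k) and a[k] != b[k] for k in hard_ids)
    if positive_score >= threshold:
        # The negative probability from a conflicting identifier overrides the match.
        return "relative" if conflict else "uniquified"
    return "distinct"

first = {"phone": "111", "passport": "P1"}
second = {"phone": "111", "passport": "P2"}
third = {"phone": "111"}

print(uniquification_outcome(first, second, 0.95))  # relative
print(uniquification_outcome(first, third, 0.95))   # uniquified
```

A missing passport number is treated here as no evidence either way, so it does not trigger the override.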

[0051] PROCEDURE FUZZIFICATION ALGORITHM
Uniquification logic for users:
1) A UUID should be created for each registered user; UUIDs should be unique across 1000M records.

2) The same UUID should be assigned to the user across accounts based on the classification logic given below. The same UUID should also be assigned to a duplicate user, although that user's information remains stored in the database as a duplicate user.

3) All the data is unified in the first run; Uniquification between the clusters can be done over multiple runs.

4) Every new user is assigned a new UUID in real time when the user is created. The user data is inserted into the appropriate cluster based on the classification algorithm.

5) Uniquification of new users can take place based on the Uniquification logic on the particular cluster and the probability algorithm, which gives the probability ranking (probability percentage for Uniquification).

6) A database operation is triggered whenever an insert or update happens in the user master table, to insert the data into the appropriate cluster based on the classification algorithm. Further, a clean-up procedure can be executed once a month to maintain the quality of the data and keep UUIDs unique.

7) The process should run on high-performance tables using in-memory operations. Partitions should be created by hash on the mapping and user master tables, and updated as the data grows.

[0052] PROCEDURE UNIQUIFICATION ALGORITHM
1. Let h1, h2, h3, …, hn are high level parameters
// high level params are those params which have a high chance of being unique
2. For each row R1, R2, R3, ... , Ri of ununified clustered data
3. Do
4. For each of h1, h2, h3, …, hn
5. Do
6. If Rj [hk] = Ri [hk] then
7. Ri[parent] <= Rj [id] and Ri [duplicate] <= yes
8. CALL FUZZIFICATION
9. End If
10. Move pointer to next parameter
11. End inner loop
12. If Ri [duplicate] = No then
13. Ri [parent] <= Ri [id]
14. End if
15. Move pointer to next data row
16. End loop //end of high parameter check
17. Let M1, M2, M3, ..., Mx are medium level parameters
18. For each R1, R2, R3, …, Ry
19. Do
20. If Ry[duplicate] = No Then
21. For each of M1, M2, M3, ... , Mx
22. Do
23. If Rj [Mii] = Ri [Mii] Then
24. For Mii+1 to Mx Do
25. If Rj [ Mk] = Ri[Mk] Then
26. Ri [parent] <= Rj [id] and Ri [duplicate] <= yes
27. CALL FUZZIFICATION
28. End If
29. End loop
30. End if
31. End loop
32. End if
33. End loop // end of medium level uniquification
34. Let L1, L2, L3, ..., Lx are low level parameters
35. Follow step 17 - 31
36. Return unified set of data
37. End
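
The tiered matching in the procedure above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the field names in the example are hypothetical, a single pass over the rows replaces the three separate passes, the FUZZIFICATION call is omitted, and a duplicate inherits the parent of the row it matched so that chained duplicates share one root.

```python
def uniquify(rows, high, medium, low):
    """Set row['parent'] to the id of an earlier matching row, else to its own id."""
    def same(a, b, key):
        return a.get(key) is not None and a.get(key) == b.get(key)

    def matches(a, b, keys, needed):
        return sum(same(a, b, k) for k in keys) >= needed

    seen = []
    for row in rows:
        row["parent"], row["duplicate"] = row["id"], False
        # High-level parameters need one match; medium and low levels need two.
        for keys, needed in ((high, 1), (medium, 2), (low, 2)):
            parent = next((r for r in seen if matches(r, row, keys, needed)), None)
            if parent is not None:
                # Assumption: reuse the matched row's parent as the shared root.
                row["parent"], row["duplicate"] = parent["parent"], True
                break
        seen.append(row)
    return rows

rows = uniquify(
    [
        {"id": 1, "email": "a@x.com", "name": "Asha", "city": "Pune"},
        {"id": 2, "email": "a@x.com", "name": "A. Rao", "city": "Delhi"},
        {"id": 3, "email": "b@x.com", "name": "Asha", "city": "Pune"},
        {"id": 4, "email": "c@x.com", "name": "Ravi", "city": "Goa"},
    ],
    high=["email"], medium=["name", "city"], low=["gender", "age_band"],
)
print([r["parent"] for r in rows])  # [1, 1, 1, 4]
```

Row 2 matches row 1 on a high-level parameter, row 3 matches row 1 on two medium-level parameters, and row 4 matches nothing and so remains its own parent.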

[0053] FIG. 4 illustrates a block diagram of a computer system suitable for implementing method of FIG. 3, according to an embodiment. In various embodiments, the user device may comprise a personal computing device (e.g., smart phone, a computing tablet, a personal computer, laptop, PDA, Bluetooth device, key FOB, badge, etc.) capable of communicating with the network. The merchant server and/or service provider may utilize a network computing device (e.g., a network server) capable of communicating with the network. It should be appreciated that each of the devices utilized by users and service providers may be implemented as a computer system 400 in a manner as follows.

[0054] The computer system 400 includes a bus 402 or other communication mechanism for communicating information data, signals, and information between various components of computer system 400. The components include an input/output (I/O) component 404 that processes a user’s action, such as selecting keys from a keypad/keyboard, selecting one or more buttons, images, or links, and/or moving one or more images, etc., and sends a corresponding signal to bus 402. The I/O component 404 may also include an output component, such as a display 411 and a cursor control 413 (such as a keyboard, keypad, mouse, etc.). An optional audio input/output component 405 may also be included to allow a user to use voice for inputting information by converting audio signals. The audio I/O component 405 may allow the user to hear audio. A transceiver or network interface 406 transmits and receives signals between computer system 400 and other devices, such as another user device, a merchant server, or a service provider server via network 460. In another embodiment, the transmission is wireless, although other transmission mediums and methods may also be suitable. One or more processors 412, which may be a micro-controller, digital signal processor (DSP), or other processing component, processes these various signals, such as for display on the computer system 400 or transmission to other devices via a communication link 418. The processor(s) 412 may also control transmission of information, such as cookies or IP addresses, to other devices.

[0055] The components of computer system 400 also include a system memory component 414 (e.g., RAM), a static storage component 416 (e.g., ROM), and/or a disk drive 417. The computer system 400 performs specific operations by processor(s) 412 and other components by executing one or more sequences of instructions contained in the system memory component 414. Logic may be encoded in a computer readable medium, which may refer to any medium that participates in providing instructions to processor(s) 412 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. In various embodiments, non-volatile media includes optical or magnetic disks, volatile media includes dynamic memory, such as the system memory component 414, and transmission media includes coaxial cables, copper wire, and fibre optics, including wires that comprise the bus 402. In one embodiment, the logic is encoded in non-transitory computer readable medium. In an example, transmission media may take the form of acoustic or light waves, such as those generated during radio wave, optical and infrared data communications.

[0056] FIG. 5 illustrates a schematic diagram of a computer system suitable for implementing the method of FIG. 3, according to an embodiment. User data is extracted from various sources, including but not limited to social media, web/commerce clickstreams, CRM data, 3rd party data, email marketing data, call centre data and POS data. The extracted user data is processed using the Uniquification logic and probability logic in sequential iterations to identify the one or more unique users. The Uniquified data is then shared with a plurality of customers such as, but not limited to, Ad exchange services, email marketing services, website services, Ad Hoc audience segmentation services and operational reports for campaign performance.

[0057] Some common forms of computer readable media include, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EEPROM, FLASH-EEPROM, any other memory chip or cartridge, or any other medium from which a computer is adapted to read.

[0058] In various embodiments of the present disclosure, execution of instruction sequences to practice the present disclosure may be performed by the computer system 400. In various other embodiments of the present disclosure, a plurality of computer systems 400 coupled by communication link 418 to the network (e.g., a LAN, WLAN, PSTN, and/or various other wired or wireless networks, including telecommunications, mobile, and cellular phone networks) may perform instruction sequences to practice the present disclosure in coordination with one another.

[0059] Where applicable, various embodiments provided by the present disclosure may be implemented using hardware, software, or combinations of hardware and software. Also, where applicable, the various hardware components and/or software components set forth herein may be combined into composite components comprising software, hardware, and/or both without departing from the spirit of the present disclosure. Where applicable, the various hardware components and/or software components set forth herein may be separated into sub-components comprising software, hardware, or both without departing from the scope of the present disclosure. In addition, where applicable, it is contemplated that software components may be implemented as hardware components and vice-versa.
[0060] Software, in accordance with the present disclosure, such as program code and/or data, may be stored on one or more computer readable mediums. It is also contemplated that software identified herein may be implemented using one or more general purpose or specific purpose computers and/or computer systems, networked and/or otherwise. Where applicable, the ordering of various steps described herein may be changed, combined into composite steps, and/or separated into sub-steps to provide features described herein.

[0061] The foregoing disclosure is not intended to limit the present disclosure to the precise forms or particular fields of use disclosed. As such, it is contemplated that various alternate embodiments and/or modifications to the present disclosure, whether explicitly described or implied herein, are possible in light of the disclosure. Having thus described embodiments of the present disclosure, persons of ordinary skill in the art will recognize that changes may be made in form and detail without departing from the scope of the present disclosure. Thus, the present disclosure is limited only by the claims.

[0062] The invention provides various advantages. It offers solutions for identifying users who are the same, or nearly the same, based on data matches on different parameters. Further, it suggests the positive probability that users are the same and the negative probability that they are not. It also suggests how many fields must be combined before 100%, or nearly 100%, of users have at least one unique parameter, or combination of parameters, matched. It also suggests which fields make some users more vulnerable than others, helps determine the uniquifiability between users, and predicts factors about a user other than Uniquification.

[0063] While specific language has been used to describe the disclosure, any limitations arising on account of the same are not intended. As would be apparent to a person skilled in the art, various working modifications may be made to the method in order to implement the inventive concept as taught herein.
[0064] The figures and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, the order of processes described herein may be changed and is not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts need to be necessarily performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples.

Documents

Application Documents

#	Name	Date
1	201841046001-STATEMENT OF UNDERTAKING (FORM 3) [05-12-2018(online)].pdf	2018-12-05
2	201841046001-FORM 1 [05-12-2018(online)].pdf	2018-12-05
3	201841046001-DRAWINGS [05-12-2018(online)].pdf	2018-12-05
4	201841046001-DECLARATION OF INVENTORSHIP (FORM 5) [05-12-2018(online)].pdf	2018-12-05
5	201841046001-COMPLETE SPECIFICATION [05-12-2018(online)].pdf	2018-12-05