Sign In to Follow Application
View All Documents & Correspondence

System And Method For Activity Detection From Metadata Features Of E Mails

Abstract: Techniques for activity detection from metadata features of e-mails are disclosed. In an embodiment, metadata features of e-mails associated with a user are extracted. Further, the features are represented as a rectangular matrix. Furthermore, the matrix is projected onto a right singular vector space. Moreover, a dimension of each vector is reduced using t-SNE. Also, density based clustering is performed on the vectors to obtain e-mail clusters. Similarity between pairs of e-mail clusters is then determined based on the features in corresponding pair of e-mail clusters. Further, the e-mail clusters are modelled as nodes on a graph and edges are created between pairs of the nodes based on similarity between corresponding pairs of e-mail clusters. Furthermore, one or more of the e-mail clusters of a community are determined using the modeled graph. Also, an activity of the user is detected based on the one or more of the e-mail clusters.

Get Free WhatsApp Updates!
Notices, Deadlines & Correspondence

Patent Information

Application #
Filing Date
09 December 2016
Publication Number
24/2018
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
ip@legasis.in
Parent Application
Patent Number
Legal Status
Grant Date
2023-11-04
Renewal Date

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th floor, Nariman point, Mumbai-400021, Maharashtra, India

Inventors

1. AGARWAL, Puneet
Tata Consultancy Services Limited, Plot No. A-44 & A-45,Ground,1st to 5th Floor & 10th floor, Block - C & D, Sector- 62, Noida-201309, Uttar Pradesh, India
2. PATIDAR, Mayur
Tata Consultancy Services Limited, Plot No. A-44 & A-45,Ground,1st to 5th Floor & 10th floor, Block - C & D, Sector- 62, Noida-201309, Uttar Pradesh, India
3. ROHATGI, Shaurya
Tata Consultancy Services Limited, Plot No. A-44 & A-45,Ground,1st to 5th Floor & 10th floor, Block - C & D, Sector- 62, Noida-201309, Uttar Pradesh, India
4. SINGH, Mahesh Prasad
Tata Consultancy Services Limited, Plot No. A-44 & A-45,Ground,1st to 5th Floor & 10th floor, Block - C & D, Sector- 62, Noida-201309, Uttar Pradesh, India
5. SHROFF, Gautam
Tata Consultancy Services Limited, 10th floor, TCS Innovation labs Delhi, ASF Insignia Building, Canton Buildwel P Ltd.m, Block C, Gwal Pahari, Gurgaon-122003, Haryana, India
6. CHAUDHARY, Ashish
Tata Consultancy Services Limited, Plot No. A-44 & A-45,Ground,1st to 5th Floor & 10th floor, Block - C & D, Sector- 62, Noida-201309, Uttar Pradesh, India

Specification

Claims:
1. A processor-implemented method comprising:
receiving multiple electronic mails (e-mails) associated with a user;
extracting metadata features of each of the multiple e-mails, wherein the metadata features comprise unique document names in the multiple e-mails, unique people in the multiple e-mails, and subject strings in the multiple e-mails;
representing the extracted metadata features of the multiple e-mails as a rectangular matrix, wherein rows in the rectangular matrix are vector representations of the multiple e-mails;
projecting the rectangular matrix onto a right singular vector space;
reducing a dimension of each of the projected vectors using t-distributed stochastic neighbor embedding;
performing density based clustering on the projected vectors with reduced dimension to obtain a plurality of e-mail clusters;
determining similarity between each pair of the plurality of e-mail clusters based on the metadata features in a corresponding pair of the plurality of e-mail clusters;
modelling the plurality of e-mail clusters as nodes on a graph and creating edges between pairs of the nodes based on the similarity between corresponding pairs of the plurality of e-mail clusters;
determining one or more of the plurality of e-mail clusters of a community using the modeled graph; and
detecting an activity of the user based on the determined one or more of the plurality of e-mail clusters.

2. The method as claimed in claim 1, wherein columns of the rectangular matrix represent the multiple e-mails, people and document names.

3. The method as claimed in claim 2, further comprising:
computing values of the column representing the multiple e-mails based on subject similarity between corresponding pairs of the multiple e-mails;
computing values of the column representing the people based on relationship between the multiple e-mails and unique people; and
computing values of the column representing the document names based on relationship between the multiple e-mails and unique document names.

4. The method as claimed in claim 1, wherein projecting each of the vectors onto the right singular vector space, comprises:
projecting each of the vectors onto a latent vector space based on right singular vectors defined from a singular value decomposition (SVD).

5. The method as claimed in claim 1, wherein determining similarity between each pair of the plurality of e-mail clusters based on the metadata features in the corresponding pair of the plurality of e-mail clusters, comprises:
computing a first similarity score between each pair of the plurality of e-mail clusters based on the subject strings;
computing a second similarity score between each pair of the plurality of e-mail clusters based on the unique people;
computing a third similarity score between each pair of the plurality of e-mail clusters based on the unique document names;
computing a final similarity score as linear combination of the first, second and third similarity scores using a predetermined weight; and
determining the similarity between each pair of the plurality of e-mail clusters based on the computed final similarity score.

6. The method as claimed in claim 1, wherein creating edges between pairs of the nodes based on the similarity between corresponding pairs of the plurality of e-mail clusters, comprises:
creating an edge between each pair of the nodes when the similarity between associated pair of the plurality of e-mail clusters is greater than a predetermined threshold.

7. The method as claimed in claim 1, wherein detecting the activity of the user based on the determined one or more of the plurality of e-mail clusters comprises:
merging the determined e-mail clusters of the community; and
detecting the activity of the user based on the merged e-mail clusters.

8. A system comprising:
one or more memories; and
one or more hardware processors, the one or more memories coupled to the one or more hardware processors, wherein the one or more hardware processors are capable of executing programmed instructions stored in the one or more memories to:
receive multiple electronic mails (e-mails) associated with a user;
extract metadata features of each of the multiple e-mails, wherein the metadata features comprise unique document names in the multiple e-mails, unique people in the multiple e-mails, and subject strings in the multiple e-mails;
represent the extracted metadata features of the multiple e-mails as a rectangular matrix, wherein rows in the rectangular matrix are vector representations of the multiple e-mails;
project the rectangular matrix onto a right singular vector space;
reduce a dimension of each of the projected vectors using t-distributed stochastic neighbor embedding;
perform density based clustering on the projected vectors with reduced dimension to obtain a plurality of e-mail clusters;
determine similarity between each pair of the plurality of e-mail clusters based on the metadata features in a corresponding pair of the plurality of e-mail clusters;
model the plurality of e-mail clusters as nodes on a graph and create edges between pairs of the nodes based on the similarity between corresponding pairs of the plurality of e-mail clusters;
determine one or more of the plurality of e-mail clusters of a community using the modeled graph; and
detect an activity of the user based on the determined one or more of the plurality of e-mail clusters.

9. The system as claimed in claim 8, wherein columns of the rectangular matrix represent the multiple e-mails, people and document names.

10. The system as claimed in claim 9, wherein the one or more hardware processors are further capable of executing programmed instructions to:
compute values of the column representing the multiple e-mails based on subject similarity between corresponding pairs of the multiple e-mails;
compute values of the column representing the people based on relationship between the multiple e-mails and unique people; and
compute values of the column representing the document names based on relationship between the multiple e-mails and unique document names.

11. The system as claimed in claim 8, wherein the one or more hardware processors are capable of executing programmed instructions to:
project each of the vectors onto a latent vector space based on right singular vectors defined from a singular value decomposition (SVD).

12. The system as claimed in claim 8, wherein the one or more hardware processors are capable of executing programmed instructions to:
compute a first similarity score between each pair of the plurality of e-mail clusters based on the subject strings;
compute a second similarity score between each pair of the plurality of e-mail clusters based on the unique people;
compute a third similarity score between each pair of the plurality of e-mail clusters based on the unique document names;
compute a final similarity score as linear combination of the first, second and third similarity scores using a predetermined weight; and
determine the similarity between each pair of the plurality of e-mail clusters based on the computed final similarity score.

13. The system as claimed in claim 8, wherein the one or more hardware processors are capable of executing programmed instructions to:
create an edge between each pair of the nodes when the similarity between associated pair of the plurality of e-mail clusters is greater than a predetermined threshold.

14. The system as claimed in claim 8, wherein the one or more hardware processors are capable of executing programmed instructions to:
merge the determined e-mail clusters of the community; and
detect the activity of the user based on the merged e-mail clusters.
, Description:
FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:
SYSTEM AND METHOD FOR ACTIVITY DETECTION FROM METADATA FEATURES OF E-MAILS

Applicant:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

The following specification particularly describes the disclosure and the manner in which it is to be performed.
TECHNICAL FIELD
[001] The embodiments herein generally relate to activity detection from electronic mails (e-mails), and, more particularly, to system and method for activity detection from metadata features of the e-mails.
BACKGROUND
[002] Generally, large enterprises exchange a large number of electronic mails (e-mails) associated with various activities every day. Such e-mails exchanged in the large enterprises are different from those observed in other domains as a number of people involved in the e-mails is higher and a large fraction of e-mails contain attachments. Most of such e-mails are not machine or bot generated. Further, activities can contain e-mails with multiple different subjects. Therefore, there is no clear identifier to bring all e-mails of an activity together.

SUMMARY
[003] The following presents a simplified summary of some embodiments of the disclosure in order to provide a basic understanding of the embodiments. This summary is not an extensive overview of the embodiments. It is not intended to identify key/critical elements of the embodiments or to delineate the scope of the embodiments. Its sole purpose is to present some embodiments in a simplified form as a prelude to the more detailed description that is presented below.
[004] In view of the foregoing, an embodiment herein provides methods and systems for activity detection from metadata features of electronic mails (e-mails). In one aspect, a processor-implemented method includes steps of: receiving multiple e-mails associated with a user; extracting metadata features of each of the multiple e-mails, wherein the metadata features comprise unique document names in the multiple e-mails, unique people in the multiple e-mails, and subject strings in the multiple e-mails; representing the extracted metadata features of the multiple e-mails as a rectangular matrix, wherein rows in the rectangular matrix are vector representations of the multiple e-mails; projecting the rectangular matrix onto a right singular vector space; reducing a dimension of each of the projected vectors using t-distributed stochastic neighbor embedding; performing density based clustering on the projected vectors with reduced dimension to obtain a plurality of e-mail clusters; determining similarity between each pair of the plurality of e-mail clusters based on the metadata features in a corresponding pair of the plurality of e-mail clusters; modelling the plurality of e-mail clusters as nodes on a graph and creating edges between pairs of the nodes based on the similarity between corresponding pairs of the plurality of e-mail clusters; determining one or more of the plurality of e-mail clusters of a community using the modeled graph; and detecting an activity of the user based on the determined one or more of the plurality of e-mail clusters.
[005] In another aspect, a system for activity detection from metadata features of e-mails is provided. The system includes one or more memories; and one or more hardware processors, the one or more memories coupled to the one or more hardware processors wherein the one or more hardware processors are capable of executing programmed instructions stored in the one or more memories to: receive multiple e-mails associated with a user; extract metadata features of each of the multiple e-mails, wherein the metadata features comprise unique document names in the multiple e-mails, unique people in the multiple e-mails, and subject strings in the multiple e-mails; represent the extracted metadata features of the multiple e-mails as a rectangular matrix, wherein rows in the rectangular matrix are vector representations of the multiple e-mails; project the rectangular matrix onto a right singular vector space; reduce a dimension of each of the projected vectors using t-distributed stochastic neighbor embedding; perform density based clustering on the projected vectors with reduced dimension to obtain a plurality of e-mail clusters; determine similarity between each pair of the plurality of e-mail clusters based on the metadata features in a corresponding pair of the plurality of e-mail clusters; model the plurality of e-mail clusters as nodes on a graph and create edges between pairs of the nodes based on the similarity between corresponding pairs of the plurality of e-mail clusters; determine one or more of the plurality of e-mail clusters of a community using the modeled graph; and detect an activity of the user based on the determined one or more of the plurality of e-mail clusters.
[006] In yet another aspect, a non-transitory computer-readable medium having embodied thereon a computer program for executing a method for activity detection from metadata features of e-mails. The method includes steps of: receiving multiple e-mails associated with a user; extracting metadata features of each of the multiple e-mails, wherein the metadata features comprise unique document names in the multiple e-mails, unique people in the multiple e-mails, and subject strings in the multiple e-mails; representing the extracted metadata features of the multiple e-mails as a rectangular matrix, wherein rows in the rectangular matrix are vector representations of the multiple e-mails; projecting the rectangular matrix onto a right singular vector space; reducing a dimension of each of the projected vectors using t-distributed stochastic neighbor embedding; performing density based clustering on the projected vectors with reduced dimension to obtain a plurality of e-mail clusters; determining similarity between each pair of the plurality of e-mail clusters based on the metadata features in a corresponding pair of the plurality of e-mail clusters; modelling the plurality of e-mail clusters as nodes on a graph and creating edges between pairs of the nodes based on the similarity between corresponding pairs of the plurality of e-mail clusters; determining one or more of the plurality of e-mail clusters of a community using the modeled graph; and detecting an activity of the user based on the determined one or more of the plurality of e-mail clusters.
[007] It should be appreciated by those skilled in the art that any block diagram herein represents conceptual views of illustrative systems embodying the principles of the present subject matter. Similarly, it is appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computing device or processor, whether or not such computing device or processor is explicitly shown.

BRIEF DESCRIPTION OF THE FIGURES
[008] The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to reference like features and modules.
[009] FIG. 1 illustrates a block diagram of a system for activity detection from metadata features of electronic mails (e-mails), in accordance with an example embodiment.
[0010] FIG. 2 illustrates a t-distributed stochastic neighbor embedding plot of e-mails, in accordance with an example embodiment.
[0011] FIG. 3 illustrates a graph representing e-mail clusters, in accordance with an example embodiment.
[0012] FIG. 4 illustrates a flow diagram of a method for activity detection from metadata features of e-mails, in accordance with an example embodiment.
[0013] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems and devices embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
DETAILED DESCRIPTION
[0014] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
[0015] The present subject matter herein provides a system and method for activity detection from metadata features of electronic mails (e-mails), in accordance with an example embodiment. The present subject matter proposes a two stage process for automatic activity detection from users’ e-mails. In the first stage, e-mail dataset is represented as a rectangular matrix using metadata features such as e-mails, people involved, and names of the documents attached in the e-mails. Thus, only surface features of e-mails available in email metadata, such as people, document names, and subject strings, are used without considering contents of the e-mails or attachments. Further, the e-mails are represented in a latent feature space using singular value decomposition (SVD). Further dimensionality reduction is performed on the projected vectors using t-Distributed Stochastic Neighbor embedding (t-SNE). The e-mails are then clustered using a density based clustering algorithm in t-SNE space. The clusters thus obtained have high precision. However, at the same time, the e-mails of a single activity are also scattered into more than one cluster.
[0016] In the second stage, pair-wise similarity between all pairs of high-precision clusters is computed using group properties of the clusters. Further, the clusters are modeled as nodes of a graph and edges are modeled based on the pairwise cluster similarity. Furthermore, a community detection algorithm is applied on this cluster-graph and all clusters of a community are merged to obtain a final set of activity groupings.
[0017] The methods and systems are not limited to the specific embodiments described herein. In addition, the method and system can be practiced independently and separately from other modules and methods described herein. Each device element/module and method can be used in combination with other elements/modules and other methods.
[0018] The manner, in which the system and method for activity detection from metadata features of e-mails, has been explained in details with respect to the FIGS. 1 through 4. While aspects of described methods and systems for activity detection from metadata features of e-mails can be implemented in any number of different systems, utility environments, and/or configurations, the embodiments are described in the context of the following exemplary system(s).
[0019] FIG. 1 illustrates a block diagram of a system 100 for activity detection from metadata features of e-mails, in accordance with an example embodiment. In an example embodiment, the system 100 may be embodied in, or is in direct communication with a computing device. The system 100 includes or is otherwise in communication with one or more hardware processors such as processor(s) 102, one or more memories such as a memory 104, and a network interface unit such as a network interface unit 106. In an embodiment, the processor 102, memory 104, and the network interface unit 106 may be coupled by a system bus such as a system bus or a similar mechanism. Although FIG. 1 shows example components of the system 100, in other implementations, the system 100 may contain fewer components, additional components, different components, or differently arranged components than depicted in FIG. 1.
[0020] The processor 102 may include circuitry implementing, among others, audio and logic functions associated with the communication. For example, the processor 102 may include, but are not limited to, one or more digital signal processors (DSPs), one or more microprocessor, one or more special-purpose computer chips, one or more field-programmable gate arrays (FPGAs), one or more application-specific integrated circuits (ASICs), one or more computer(s), various analog to digital converters, digital to analog converters, and/or other support circuits. The processor 102 thus may also include the functionality to encode messages and/or data or information. The processor 102 may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor 102. Further, the processor 102 may include functionality to execute one or more software programs, which may be stored in the memory 104 or otherwise accessible to the processor 102.
[0021] The functions of the various elements shown in the figure, including any functional blocks labeled as “processor(s)”, may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation DSP hardware, network processor, application specific integrated circuit (ASIC), FPGA, read only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage. Other hardware, conventional, and/or custom, may also be included.
[0022] The interface(s) 106 may include a variety of software and hardware interfaces, for example, interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, and a printer. The interface(s) 106 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite.
[0023] The one or more memories such as a memory 104, may store any number of pieces of information, and data, used by the system to implement the functions of the system. The memory 104 may include for example, volatile memory and/or non-volatile memory. Examples of volatile memory may include, but are not limited to volatile random access memory. The non-volatile memory may additionally or alternatively comprise an electrically erasable programmable read only memory (EEPROM), flash memory, hard drive, or the like. Some examples of the volatile memory includes, but are not limited to, random access memory, dynamic random access memory, static random access memory, and the like. Some example of the non-volatile memory includes, but are not limited to, hard disks, magnetic tapes, optical disks, programmable read only memory, erasable programmable read only memory, electrically erasable programmable read only memory, flash memory, and the like. The memory 104 may be configured to store information, data, applications, instructions or the like for enabling the system 100 to carry out various functions in accordance with various example embodiments. Additionally or alternatively, the memory 104 may be configured to store instructions which when executed by the processor 102 causes the system to behave in a manner as described in various embodiments. The memory 104 includes an activity detection module 108 and other modules. The module 108 and other modules include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. The other modules may include programs or coded instructions that supplement applications and functions of the system 100.
[0024] In operation, the activity detection module 108 receives multiple e-mails associated with a user. Further, the activity detection module 108 extracts metadata features of each of the e-mails. For example, the metadata features include unique document names in the multiple e-mails, unique people in the multiple e-mails, and subject strings in the multiple e-mails. In an example embodiment, the activity detection module 108 receives a set of e-mails M = {M1, M2... Mm} associated with a user. An e-mail (Mi) can be represented as a set of key-value pairs, i.e., Mi = {{ID, Mi}, {From, pf }, {To, {pt1, pt2, ...}},{Cc,{pc1, pc2, ...}},{Date, time-stamp},{Subject, s1},{Body, bi}, {Attachments,{fn1, fn2, ...}}}. Here, ID is a unique identifier for an e-mail, values of the keys {From, To, CC} are sets of people, Date contains the date and time of the e-mail sent/received, subject refers to a title (i.e., subject string) of the e-mail, value of the key Body is all textual contents of the e-mail, and the key Attachment is a set of file-names of the attachments of the e-mail.
[0025] Against the set of e-mails, the activity detection module 108 extracts the set of all unique people P = {P1... Pp} by taking a union of the people involved in all e-mails, i.e., in From, To, CC fields. Similarly, the activity detection module 108 extracts a set of unique document names D = {D1... Dd} by taking the union of file-names of the attachments of all the e-mails in M. In an example, when e-mails are exchanged between people, if one of the participants in the activity deletes the attachment and then replies to all the people, the name of the files attached are still retained in the e-mail trail as a deleted document reference. When extracting document names of the e-mail, the activity detection module 108 extracts the document names present in such references also.
[0026] Furthermore, the activity detection module 108 represents the extracted metadata features as a rectangular matrix. Rows in the rectangular matrix are vector representations of the e-mails and columns of the rectangular matrix represent the e-mails, people and document names. For example, the set of emails are represented in form of rectangular matrix Am×n. Here, m is a number of e-mails and n is a number of metadata features. Each e-mail is represented as n = (m + p + d) dimension vector, as shown in below Table 1, having a dimension corresponding to every element of the set {M1... Mm, P1, ..., Pp, D1, ..., Dd}, where M1-Mm are set of e-mails, P1-Pp are people, and D1-Dp are document names.

Table 1
[0027] In an example embodiment, the activity detection module 108 computes values of the column representing the e-mails based on subject similarity between corresponding pairs of the multiple e-mails. In this example, a value for a cell type A[Mi][Mj], {i, j}E{1 ... m} is computed by subject similarity between e-mail pairs. Generally, e-mail writers attempt to summarize its contents in its subject string, using as few words as possible. The activity detection module 108 extracts all forms of noun’s (NN, NNS, NNP, NNPS) from a subject string of each e-mail using Stanford part-of-speech (POS) tagger and use these as proxies for named entities. If Ni and Nj represent the set of nouns extracted from subjects of emails Mi and Mj respectively, the activity detection module 108 calculates Jaccard similarity (JSC) between a pair of e-mail subjects (using only nouns) and uses the scores after tf-idf weighting as a similarity measure an. For example, the similarity measure an ¬computed using the following equation:
(1)
[0028] In some embodiments, the activity detection module 108 finds that some domain-specific bi-grams which are frequently used by users are not tagged as noun by the POS tagger. So, after filtering stop words from each subject string, the activity detection module 108 extracts sets of bi-grams Bi and Bj from subjects of e-mails Mi and Mj, respectively. Further, the activity detection module 108 calculates measure ab, similar to an of equation 1, using bi-grams Bi and Bj. Also, the activity detection module 108 computes a final score between the e-mail pair (Mi, Mj) as an average of an(Mi, Mj) and ab(Mi, Mj). The email-to-email cells of A, i.e., A[Mi][Mj ], are filled as follows :

where ts is a threshold for subject similarity required for creation of the matrix A and a value of ts is obtained using Bayesian optimization. The Bayesian optimization estimates an optimal choice of a parameter by sampling in areas of high uncertainty (exploration) and in areas where it is likely to have optimal value of an objective function (exploitation). Here, F-measure of activities is used as the objective function. The parameter chosen by using the Bayesian Optimization can be personalized for every user.
[0029] Further, the activity detection module 108 computes values of the column representing the people based on relationship between the multiple e-mails and unique people. The activity detection module 108 encodes the relatedness of people with an e-mail in the rectangular matrix A by setting A[Mi][Pj ] = ß(Mi, Pj). The ß(Mi, Pj) is computed using the below example equation.

where P: {P1… Pq} is a set of people present in an email Mi, Pj ? {P1… Pq}, and 1P (k) is an indicator function that have a value of 1, if the field k is not empty, otherwise zero.
[0030] Furthermore, the activity detection module 108 computes values of the column representing the document names based on relationship between the multiple e-mails and unique document names. For example, e-mail-document relationship is determined by considering all documents attached in an e-mail as equally important. Let D:{D1, ..., Dk} be the set of documents present in an e-mail Mi, then corresponding entry of the matrix, A[Mi][Dj ] = , is computed as:
= , where Dj {D1…Dk}
[0031] In addition, the activity detection module 108 projects the rectangular matrix onto a right singular vector space. In an example, the activity detection module 108 projects each of the vectors onto a latent vector space based on right singular vectors defined from SVD. As discussed earlier, an activity is characterized by a subset of features and it is difficult to determine those features. Therefore, the activity detection module 108 determines a low-rank approximation of the matrix (A) using k right singular vectors V from its SVD, A = USVT. The activity detection module 108 choses first k singular vectors such that at least p fraction of energy of singular values is preserved. If E = , where are the singular values, the first k right singular vectors are chosen such that , where, Ek = . Here, p, obtained using Bayesian optimization, is a fraction of total singular value energy to be retained that is required for latent feature representation of e-mail vectors. Then, the activity detection module 108 projects the matrix A into the latent vector space represented by k right singular vectors, i.e., columns of V. So, if V = [Vk, Vn-k+1] = [v1 … vk, vk+1 … vn] in the SVD and A = U , then A` = AVk = A[v1 … vk] Rmxk.
[0032] In addition, the activity detection module 108 reduces a dimension of each of the projected vectors using t-SNE. The activity detection module 108 reduces the dimension of these vectors to R2 using t-SNE. For example, the parameter perplexity, obtained using Bayesian optimization is used for t-SNE transformation of e-mail vectors in latent feature space to R2. Generally, t-SNE provides two dimensional (2D) representation of the e-mail. In the t-SNE transformation, distances between neighboring points remain small, while distant points remain far from each other. The similarity between data points is captured by probability distributions: the KL divergence between the probability distributions in a high-dimensional representation and that in reduced dimension representation is minimized using gradient descent. The t-SNE plot of a subset of e-mails of a user is shown in FIG. 2, here e-mails of the same activities are shown by same mark. From t-SNE Plot 200, it is observed that e-mails from one activity are represented by same marker.
[0033] Moreover, the activity detection module 108 performs density based clustering on the projected vectors with reduced dimension to obtain a plurality of e-mail clusters. For example, the activity detection module 108 cluster e-mails using density based clustering algorithm, such as Density-based spatial clustering of applications with noise (DBSCAN) to identify clusters as they are visible in FIG. 2. It can be observed from FIG. 2 that a close cluster of points usually contains e-mails of one activity only, while e-mails of an activity get distributed into multiple clusters. For getting emails which are part of high density region in one cluster, DBSCAN with low values for parameters, such as minpts and eps are used. For example, eps, minpts taken by DBSCAN algorithm for clustering the e-mails in t-SNE space are obtained using Bayesian optimization. As a result, e-mail clusters {c1... cr} which are large in number but have very low intra-cluster noise are obtained.
[0034] Also, the activity detection module 108 determines similarity between each pair of the plurality of e-mail clusters based on the metadata features in a corresponding pair of the plurality of e-mail clusters. In an example embodiment, the activity detection module 108 computes a first similarity score (CSS) between each pair of the plurality of e-mail clusters based on the subject strings. The CSS capturing the similarity on the basis of email-subjects present across two e-mail clusters is computed using the following example equation:
CSS (ci, cj) =
[0035] Further, in this example embodiment, the activity detection module 108 computes a second similarity score between each pair of the plurality of e-mail clusters based on the unique people. For obtaining similarity score (CSp) between a pair of e-mail clusters {ci, cj} based on the people involved, assume that every cluster is a document containing people as words. Each email-cluster is represented by a vector of people vpi = {p1…pp} with their tf-idf score. The activity detection module 108 then uses cosine-similarity for computing the similarity score between email- cluster pair. The activity detection module 108 computes the second similarity score using the following example equation:
CSP (ci, cj) = .
[0036] Further, the activity detection module 108 computes a third similarity score between each pair of the plurality of e-mail clusters based on the unique document names. For example, the activity detection module 108 computes the third similarity score based on documents CSd using the following equation:
CSd (ci, cj) = . .
[0037] In addition, the activity detection module 108 computes a final similarity score as linear combination of the first, second and third similarity scores using a predetermined weight. For example, the activity detection module 108 computes the final similarity score between a pair of e-mail clusters as linear combination of the three scores CS = (CSs, CSp, CSd), using a weight vector w = (ws, wp, wd), i.e., S (ci, cj) = WT X CS. For example, values of ws, wp, wd are obtained using the Bayesian optimization. Also, the similarity between each pair of the plurality of e-mail clusters is determined based on the computed final similarity score.
[0038] Further, the activity detection module 108 models the plurality of e-mail clusters as nodes on a graph and creates edges between pairs of the nodes based on the similarity between corresponding pairs of the e-mail clusters. In an embodiment, an edge between each pair of the nodes is created when the similarity between associated pair of the plurality of e-mail clusters is greater than a predetermined threshold. For example, the clusters are modeled as graph G = (V, E) having V ? {c1… cr} and an edge e ? E is created between a pair of nodes only if the cluster similarity CS (ci, cj) > tcm, where tcm is a predetermined threshold. For example, tcm is determined using Bayesian optimization. A sample graph of clusters 300 for sample dataset is illustrated in FIG. 3, after removing standalone nodes. In the graph 300, node-ids are written as .
[0039] Furthermore, the activity detection module 108 determines one or more of the plurality of e-mail clusters of a community using the modeled graph. Since clusters of an activity form communities, problem of cluster merging is transformed into community detection from email-cluster graph 300. In the e-mail cluster graph 300, edge weights are defined using similarity between email-cluster pairs, i.e., w(eij) = CS(ci, cj). The activity detection module 108 then uses modularity-based community detection algorithm (e.g., Louvain Modularity) to detect e-mail clusters of same community. In addition, the activity detection module 108 merges the determined e-mail clusters of same community. In other words, all e-mail clusters falling into the same community are merged. Moreover, the activity detection module 108 detects the activity of the user based on the merged e-mail clusters.
[0040] FIG. 4 illustrates a flow diagram of a method 400 for activity detection from metadata features of e-mails, in accordance with an example embodiment. The processor-implemented method 400 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 400 may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communication network. The order in which the method 400 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 400, or an alternative method. Furthermore, the method 400 can be implemented in any suitable hardware, software, firmware, or combination thereof. In an embodiment, the method 400 depicted in the flow chart may be executed by a system, for example, the system 100 of FIG. 1.
[0041] At block 402, multiple e-mails associated with a user are received. At block 404, metadata features of each of the multiple e-mails are extracted. For example, the metadata features include unique document names in the multiple e-mails, unique people in the multiple e-mails, and subject strings in the multiple e-mails. At block 406, the extracted metadata features of the multiple e-mails are represented as a rectangular matrix. For example, rows in the rectangular matrix are vector representations of the multiple e-mails and columns of the rectangular matrix represent the multiple e-mails, people and document names. In this example, values of the column representing the multiple e-mails are computed based on subject similarity between corresponding pairs of the multiple e-mails. Further, values of the column representing the people are computed based on relationship between the multiple e-mails and unique people. Furthermore, values of the column representing the document names are computed based on relationship between the multiple e-mails and unique document names.
[0042] At block 408, the rectangular matrix is projected onto a right singular vector space. In an embodiment, each of the vectors is projected onto a latent vector space based on right singular vectors defined from SVD. At block 410, a dimension of each of the projected vectors is reduced using t-SNE. At block 412, density based clustering is performed on the projected vectors with reduced dimension to obtain a plurality of e-mail clusters.
[0043] At block 414, similarity between each pair of the plurality of e-mail clusters is determined based on the metadata features in a corresponding pair of the plurality of e-mail clusters. In an example embodiment, a first similarity score between each pair of the plurality of e-mail clusters is computed based on the subject strings. Further, a second similarity score between each pair of the plurality of e-mail clusters is computed based on the unique people. Furthermore, a third similarity score between each pair of the plurality of e-mail clusters is computed based on the unique document names. In addition, a final similarity score is computed as linear combination of the first, second and third similarity scores using a predetermined weight. Also, the similarity between each pair of the plurality of e-mail clusters is determined based on the computed final similarity score.
[0044] At block 416, the plurality of e-mail clusters are modeled as nodes on a graph and edges are created between pairs of the nodes based on the similarity between corresponding pairs of the plurality of e-mail clusters. In an embodiment, an edge between each pair of the nodes is created when the similarity between associated pair of the plurality of e-mail clusters is greater than a predetermined threshold.
[0045] At block 418, one or more of the plurality of e-mail clusters of a community are determined using the modeled graph. At block 420, an activity of the user is detected based on the determined one or more of the plurality of e-mail clusters. In an example embodiment, the determined e-mail clusters of same community are merged. Further, the activity of the user is detected based on the merged e-mail clusters.
[0046] The various embodiments described in FIGS. 1-4 propose a technique for activity detection from metadata features of e-mails. The objective of activity detection is to group e-mails of a user such that every group characterizes ‘a fine-grained task that users need to perform collaboratively as part of their work’. Thus, handling heavy traffic of e-mails by grouping the user's e-mails into different groups which provides a summarized view of user's inbox.
[0047] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[0048] It is, however to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such non-transitory computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[0049] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[0050] The foregoing description of the specific implementations and embodiments will so fully reveal the general nature of the implementations and embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the embodiments as described herein.
[0051] The preceding description has been presented with reference to various embodiments. Persons having ordinary skill in the art and technology to which this application pertains will appreciate that alterations and changes in the described structures and methods of operation can be practiced without meaningfully departing from the principle, spirit and scope.

Documents

Application Documents

# Name Date
1 Form 3 [09-12-2016(online)].pdf 2016-12-09
2 Form 20 [09-12-2016(online)].jpg 2016-12-09
3 Form 18 [09-12-2016(online)].pdf_246.pdf 2016-12-09
4 Form 18 [09-12-2016(online)].pdf 2016-12-09
5 Drawing [09-12-2016(online)].pdf 2016-12-09
6 Description(Complete) [09-12-2016(online)].pdf_245.pdf 2016-12-09
7 Description(Complete) [09-12-2016(online)].pdf 2016-12-09
8 Other Patent Document [28-12-2016(online)].pdf 2016-12-28
9 201621042215-HARD COPY OF FORM 1-30-12-2016.pdf 2016-12-30
10 Form 26 [21-01-2017(online)].pdf 2017-01-21
11 ABSTRACT1.JPG 2018-08-11
12 201621042215-ORIGINAL UNDER RULE 6(1A) OTHERS-240117.pdf 2018-08-11
13 201621042215-FER.pdf 2020-07-10
14 201621042215-OTHERS [09-01-2021(online)].pdf 2021-01-09
15 201621042215-FER_SER_REPLY [09-01-2021(online)].pdf 2021-01-09
16 201621042215-COMPLETE SPECIFICATION [09-01-2021(online)].pdf 2021-01-09
17 201621042215-CLAIMS [09-01-2021(online)].pdf 2021-01-09
18 201621042215-ABSTRACT [09-01-2021(online)].pdf 2021-01-09
19 201621042215-PatentCertificate04-11-2023.pdf 2023-11-04
20 201621042215-IntimationOfGrant04-11-2023.pdf 2023-11-04

Search Strategy

1 searchE_10-07-2020.pdf

ERegister / Renewals

3rd: 08 Dec 2023

From 09/12/2018 - To 09/12/2019

4th: 08 Dec 2023

From 09/12/2019 - To 09/12/2020

5th: 08 Dec 2023

From 09/12/2020 - To 09/12/2021

6th: 08 Dec 2023

From 09/12/2021 - To 09/12/2022

7th: 08 Dec 2023

From 09/12/2022 - To 09/12/2023

8th: 08 Dec 2023

From 09/12/2023 - To 09/12/2024

9th: 09 Dec 2024

From 09/12/2024 - To 09/12/2025