
Method And System For Feature Specific Distance Measure Selection For Access Pattern Mining In Cloud

Abstract: The disclosure herein relates to a method and system for feature-specific distance measure selection for access pattern mining in cloud. Access control mechanisms must effectively capture and analyse user access patterns to prevent unauthorized activities and adhere to the principle of least privilege. Conventional methods of identifying user access patterns utilize a generic distance measure for all the features in event logs. The same distance measure may not be suitable to identify patterns in all the features. Thus, the embodiments of the present disclosure identify a feature-specific distance measure for each feature in the event log and generate a combined distance measure based on weights and correlation coefficients of the features. Access patterns are identified using a clustering model based on the combined distance measure for access pattern mining. The combined distance measure changes dynamically as the event log changes, due to which access patterns can be identified more accurately. [To be published with FIG. 2]


Patent Information

Application #
Filing Date
07 September 2023
Publication Number
11/2025
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point Mumbai Maharashtra India 400021

Inventors

1. PRAKASH, Vakkalagadda Satya Sai
Tata Consultancy Services Limited Plot No 1, Survey No. 64/2, Software Units Layout, Serilingampally Mandal, Madhapur, Hyderabad Telangana India 500034
2. REDDY, Rajidi Satish Chandra
Tata Consultancy Services Limited Plot No 1, Survey No. 64/2, Software Units Layout, Serilingampally Mandal, Madhapur, Hyderabad Telangana India 500034
3. GOPU, Srinivas Reddy
Tata Consultancy Services Limited Plot No 1, Survey No. 64/2, Software Units Layout, Serilingampally Mandal, Madhapur, Hyderabad Telangana India 500034

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
METHOD AND SYSTEM FOR FEATURE-SPECIFIC DISTANCE MEASURE SELECTION FOR ACCESS PATTERN MINING IN CLOUD
Applicant
Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the manner in which it is to be performed.

TECHNICAL FIELD [001] The disclosure herein generally relates to the field of cloud computing, and, more particularly, to a method and system for feature-specific distance measure selection for access pattern mining in cloud.
BACKGROUND [002] Cloud computing has revolutionized the way organizations store, process, and access their data. The cloud offers scalability, flexibility, and cost-efficiency and empowers businesses to rapidly adapt to changing market demands. While cloud computing offers many benefits, it has brought forth new security challenges, particularly with Identity and Access Management (IAM). The rapid deployment of applications on the cloud often leads to the misconfiguration of permissions and entitlements, with users receiving overly privileged and unnecessary permissions. These unwarranted permissions expand the attack surface and pose significant risks to organizations. Thus, IAM has become increasingly challenging and complex in a cloud environment. Recent high-profile attacks have underscored the urgent need for improved and effective access control mechanisms. For instance, the Capital One data breach in 2019 exposed the personal information of millions of customers due to a misconfiguration in the cloud infrastructure, highlighting the criticality of properly managing access privileges and configurations. Similarly, the Equifax data breach in 2017, which impacted millions of individuals, was a consequence of the exploitation of a vulnerability that could have been mitigated through effective access control and timely patch management. Malicious or compromised accounts may abuse excessive privileges assigned to a user or account to gain unauthorized access to sensitive resources. Attackers can also use such privileges in lateral movement techniques to compromise other parts of the system. These incidents highlight the pressing need for a robust IAM that employs the principle of least privilege which states that a user should only have required level of access to resources and the access should not last longer than required to perform a necessary action or task at hand. 
In other words, the access control system should be able to determine the minimum set of privileges required by a user to complete a task and guarantee that the user is granted only those privileges.
[003] Contextual access policies based on user access behavior can be an effective way to enforce the least access privilege in a cloud environment. While the major cloud vendors offer context-aware policy capabilities, such as Context-Aware Access and BeyondCorp by Google Cloud and global condition keys by Amazon Web Services, these capabilities have not been widely adopted by cloud consumers. The limited adoption of the cloud-native context-aware policy services can be attributed to the following challenges: inadequacy of available mechanisms that can provide contextual inputs to generate policies, and limited support in detecting accurate user access patterns that are needed to employ the principle of least access privilege. The core problem lies in the lack of comprehensive understanding and analysis of user access patterns in cloud environments. The cloud-native access control mechanisms, which often rely on static and predetermined authorization policies, grant privileges based on predefined roles or user profiles. These mechanisms fail to adapt to dynamic user behavior and may grant excessive privileges to users. These excessive privileges pose a greater security risk to the cloud services.
[004] Cloud access and audit logs can provide visibility into how, when and from where users access resources. This information can be useful in developing a robust ML powered pattern mining mechanism which can provide contextual inputs for least access privilege policies. Some conventional methods were proposed for extracting or mining user access patterns in access control systems. However, the said conventional methods have certain limitations such as: (i) Pre-processing of data: Each method requires data to be of a homogeneous datatype. This results in mass pre-processing of data wherein it may lose some of its invaluable feature-specific properties. (ii) Standard and generic distance measures: While certain distance measures used by the methods are based on frequency and data repetition, other distance measures consider all the features to be of a similar type and use the same process on all the features. To address these challenges, it is crucial to build an efficient Machine Learning (ML) based user access behavior/access pattern analyzer.

SUMMARY
[005] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for feature-specific distance measure selection for access pattern mining in cloud is provided. The method includes obtaining a plurality of event logs associated with services, principals, and entities in a cloud network. Each of the plurality of event logs comprises a plurality of features and associated feature values. Further, the method includes determining a feature-specific distance measure corresponding to each of the plurality of features based on the associated feature values and identifying dependencies among the plurality of features based on correlation coefficients of the plurality of features computed using one or more correlation coefficient techniques. The method further includes assigning a weight for each of the plurality of features based on the identified dependencies, number of features referred by the feature-specific distance measure, and complexity of the feature-specific distance measure. Further, the method includes determining a combined distance measure based on the feature-specific distance measure corresponding to each of the plurality of features, weight assigned to the plurality of features and correlation coefficients of the plurality of features. Furthermore, the method includes obtaining the access patterns by clustering the plurality of event logs using a clustering model based on the combined distance measure for access pattern mining in the cloud network.
[006] In another aspect, a system for feature-specific distance measure selection for access pattern mining in cloud is provided. The system includes a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: obtain a plurality of event logs associated with services, principals, and entities in a cloud network. Each of the plurality of event logs comprises a plurality of features and associated feature values. Further, the one or more hardware processors are configured to determine a feature-specific distance measure corresponding to each of the plurality of features based on the associated feature values and identify dependencies among the plurality of features based on correlation coefficients of the plurality of features computed using one or more correlation coefficient techniques. The one or more hardware processors are further configured to assign a weight for each of the plurality of features based on the identified dependencies, number of features referred by the feature-specific distance measure, and complexity of the feature-specific distance measure. Further, the one or more hardware processors are configured to determine a combined distance measure based on the feature-specific distance measure corresponding to each of the plurality of features, weight assigned to the plurality of features and correlation coefficients of the plurality of features. Furthermore, the one or more hardware processors are configured to obtain the access patterns by clustering the plurality of event logs using a clustering model based on the combined distance measure for access pattern mining in the cloud network.
[007] In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause a method for feature-specific distance measure selection for access pattern mining in cloud. The method includes obtaining a plurality of event logs associated with services, principals, and entities in a cloud network. Each of the plurality of event logs comprises a plurality of features and associated feature values. Further, the method includes determining a feature-specific distance measure corresponding to each of the plurality of features based on the associated feature values and identifying dependencies among the plurality of features based on correlation coefficients of the plurality of features computed using one or more correlation coefficient techniques. The method further includes assigning a weight for each of the plurality of features based on the identified dependencies, number of features referred by the feature-specific distance measure, and complexity of the feature-specific distance measure. Further, the method includes determining a combined distance measure based on the feature-specific distance measure corresponding to
each of the plurality of features, weight assigned to the plurality of features and correlation coefficients of the plurality of features. Furthermore, the method includes obtaining the access patterns by clustering the plurality of event logs using a clustering model based on the combined distance measure for access pattern mining in the cloud network.
[008] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS [009] The accompanying drawings, which are incorporated in and
constitute a part of this disclosure, illustrate exemplary embodiments and, together
with the description, serve to explain the disclosed principles:
[010] FIG. 1 illustrates an exemplary system for feature-specific distance
measure selection for access pattern mining in cloud, according to some
embodiments of the present disclosure.
[011] FIG. 2 illustrates a flow diagram of a processor implemented method
for feature-specific distance measure selection for access pattern mining in cloud,
according to some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS [012] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[013] Access control is crucial in securing cloud environments, and the principle of least privilege plays a vital role in minimizing potential security risks. One important aspect of cloud security is the management of user permissions, which controls access to resources and can be a critical line of defense against unauthorized access or data breaches. In a cloud environment, users can have complex and changing needs, creating a challenge for administrators to manage permissions effectively. Furthermore, cloud environments can be highly dynamic, requiring permissions to be managed in real-time and on a large scale. To address these challenges, it is necessary to understand how users are using their permissions in a cloud environment. This requires analysis of the permissions assigned to users, as well as how and when these permissions are being used. However, traditional approaches to permission management are often based on manual and ad-hoc methods, which can be time-consuming and error-prone. Over the years, significant research efforts have been dedicated to mining user access patterns in cloud environments. However, existing literature predominantly focuses on analyzing a particular set of features while overlooking the semantic information associated with these features. There is a need for automated techniques that can mine user access patterns from large-scale access and audit logs generated by cloud services.
[014] Traditional pattern mining approaches often rely on analyzing basic features such as timestamp, accessed resource, and user identity. While these features provide valuable insights, they limit the patterns mined by focusing on a specific format such as temporal patterns (timestamp-based) and periodic patterns (frequency and repetition based), due to which several other patterns go unnoticed. The limited effectiveness of pattern mining approaches could be attributed to the following reasons:
(i) Lack of flexibility: Traditional models are trained on a set of features with a requirement to mine patterns of a specific format, where the addition of new features or requirements requires re-training the complete model or developing a new model.
(ii) Lack of support for heterogeneity: Each of the features obtained from logs is preprocessed and converted into a standard data type that is suitable to be fed into the mining model, thereby losing the feature-specific semantic information in the process.
(iii) Lack of inclusion of feature properties in models: Each feature may represent a different kind of environmental aspect in or outside the cloud. Valuable patterns may go undetected by utilizing standard distance measures that simply look for the difference between two values to find their similarity score.
Therefore, it is necessary to introduce the semantic nature of the features into the models so that hidden patterns can also be effectively mined and the feature context can be effectively utilized. Further, the conventional methods use a generic distance measure for all the features in the logs. The same distance measure may not be suitable to identify patterns in all the features. Thus, there is a need for identifying feature-specific distance measures for effectively performing pattern mining.
[015] Embodiments of present disclosure provide a method and system for feature-specific distance measure selection for access pattern mining in cloud. Initially the system obtains event logs associated with services, principals, and entities in a cloud network. Each event log includes a plurality of features and associated feature values. Then, a feature-specific distance measure corresponding to each of the plurality of features is determined based on the associated feature values and dependencies among the plurality of features are identified based on correlation coefficients of the plurality of features computed using one or more correlation coefficient techniques. Later, a weight is assigned for each of the plurality of features based on the identified dependencies, number of features referred by the feature-specific distance measure, and complexity of the feature-specific distance measure. Finally, a combined distance measure is determined based on the feature-specific distance measure corresponding to each of the plurality of features, weight assigned to the plurality of features and correlation coefficients of the plurality of features and access patterns are obtained by clustering the event logs using a clustering model based on the combined distance measure for access pattern mining in the cloud network.
[016] Referring now to the drawings, and more particularly to FIGS. 1 and 2, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[017] FIG. 1 illustrates an exemplary block diagram of a system for feature-specific distance measure selection for access pattern mining in cloud,
according to some embodiments of the present disclosure. In an embodiment, the system 100 includes one or more processors (104), communication interface device(s) (106) or Input/Output (I/O) interface(s) (106) or user interface (106), and one or more data storage devices or memory (102) operatively coupled to the one or more processors (104). The one or more processors (104) that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud, and the like.
[018] The I/O interface device(s) (106) can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) (106) receives event logs from the cloud network as input and provides access patterns as output. The memory (102) may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. Functions of the components of system 100 are explained in conjunction with flow diagram depicted in FIG. 2 for feature-specific distance measure selection for access pattern mining in cloud.
[019] In an embodiment, the system 100 comprises one or more data storage devices or the memory (102) operatively coupled to the processor(s) (104) and is configured to store instructions for execution of steps of the method (200) depicted in FIG. 2 by the processor(s) or one or more hardware processors (104).

The steps of the method of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIG. 1 and the steps of the flow diagram as depicted in FIG. 2. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods, and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
[020] FIG. 2 is a flow diagram of a method 200 for feature-specific distance measure selection for access pattern mining in cloud, according to some embodiments of the present disclosure.
[021] At step 202 of the method 200, the one or more hardware processors 104 are configured to obtain a plurality of event logs associated with services, principals, and entities in a cloud network such as Google Cloud Platform (GCP). In an embodiment, the plurality of event logs may be access and audit logs associated with the cloud environment. Each of the plurality of event logs includes a plurality of features and associated feature values. In an embodiment, the plurality of features includes principal (action performer: a user, a service account or a daemon), timestamp (time of action), permission (privilege used by the principal), resource (resource accessed by the principal) and other environmental features like IP address, user agent string, location, version of resource, method used to perform the permission action, whether the access to utilize the permission is granted, and so on. In an embodiment, the plurality of event logs are obtained in a table format as in Table 1 (split into Tables 1A, 1B and 1C for better visualization).
Table 1A
Sl. No | Principal Email | callerIp | callerSuppliedUserAgent
1 | UserS | private | google-api-go-client/0.5,gzip(gfe)
2 | UserS | private | google-api-go-client/0.5,gzip(gfe)
3 | UserS | private | stubbyclient
4 | UserSV | 14.142.130.59 | None
5 | UserSV | 14.142.130.59 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.52,gzip(gfe)
6 | UserSV | 14.142.130.59 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.52,gzip(gfe)
7 | UserSV | 14.142.130.59 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.52,gzip(gfe)
8 | UserSV | 14.142.130.59 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.52,gzip(gfe)
9 | UserSV | 14.142.130.59 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.52,gzip(gfe)
10 | UserSV | 14.142.130.59 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.52,gzip(gfe)

Table 1B
Sl. No | Service Name | Method Name | permission | granted | Resource Name
1 | compute.googleapis.com | v1.compute.zones.list | compute.zones.list | TRUE | projects/project_ex/zones
2 | compute.googleapis.com | v1.compute.instances.list | compute.instances.list | TRUE | projects/project_ex/zones/europe-west1-d/instances
3 | containeranalysis.googleapis.com | grafeas.v1.Grafeas.ListOccurrences | containeranalysis.occurrences.list | TRUE | projects/0000006db8b312fe
4 | cloudbilling.googleapis.com | GetResourceBillingInfo | resourcemanager.projects.get | TRUE | projects/project_ex
5 | compute.googleapis.com | v1.compute.regions.list | compute.regions.list | TRUE | projects/project_ex/regions
6 | clouderrorreporting.googleapis.com | google.devtools.clouderrorreporting.v1beta1.ErrorStatsService.ListGroupStats | errorreporting.groups.list | TRUE | projects/project_ex/groupStats
7 | clouderrorreporting.googleapis.com | google.devtools.clouderrorreporting.v1beta1.ErrorStatsService.ListGroupStats | errorreporting.groups.list | TRUE | projects/project_ex/groupStats
8 | monitoring.googleapis.com | google.monitoring.v3.MetricService.ListMetricDescriptors | monitoring.metricDescriptors.list | TRUE | projects/project_ex
9 | monitoring.googleapis.com | google.monitoring.v3.MetricService.ListMetricDescriptors | monitoring.metricDescriptors.list | TRUE | projects/project_ex
10 | monitoring.googleapis.com | google.monitoring.v3.MetricService.ListMetricDescriptors | monitoring.metricDescriptors.list | TRUE | projects/project_ex

Table 1C
Sl. No | version | Location | projectId | timestamp | receiveTimestamp
1 | v1 | Global | projectex | 2023-01-17T08:35:07.307922Z | 2023-01-17T08:35:07.458547842Z
2 | None | None | projectex | 2023-01-17T08:35:09.052536Z | 2023-01-17T08:35:09.215391818Z
3 | None | None | projectex | 2023-01-17T08:35:10.800328963Z | 2023-01-17T08:35:10.800328963Z
4 | None | None | projectex | 2023-01-17T08:37:00.620582Z | 2023-01-17T08:37:01.005100847Z
5 | v1 | Global | projectex | 2023-01-17T08:37:01.753113Z | 2023-01-17T08:37:01.973928835Z
6 | None | None | projectex | 2023-01-17T08:37:02.913366182Z | 2023-01-17T08:37:03.615436886Z
7 | None | None | projectex | 2023-01-17T08:37:02.913543881Z | 2023-01-17T08:37:04.113697536Z
8 | None | None | projectex | 2023-01-17T08:37:03.082497192Z | 2023-01-17T08:37:03.933809993Z
9 | None | None | projectex | 2023-01-17T08:37:03.083283422Z | 2023-01-17T08:37:04.118273674Z
10 | None | None | projectex | 2023-01-17T08:37:03.189299Z | 2023-01-17T08:37:03.646440545Z
[022] In another embodiment, the plurality of event logs is in JSON (JavaScript Object Notation) format. An example log is as follows:
{"entries": [{
  "protoPayload": {
    "@type": "",
    "authenticationInfo": {"principalEmail": "", "principalSubject": ""},
    "requestMetadata": {
      "callerIp": "", "callerSuppliedUserAgent": "",
      "requestAttributes": {"time": "", "auth": {}},
      "destinationAttributes": {}
    },
    "serviceName": "", "methodName": "",
    "authorizationInfo": [{
      "permission": "", "granted": true,
      "resourceAttributes": {"service": "", "name": "", "type": ""}
    }],
    "resourceName": "", "numResponseItems": "",
    "request": {"@type": ""},
    "resourceLocation": {"currentLocations": [""]}
  },
  "insertId": "-45jfg0e3n7tk",
  "resource": {
    "type": "",
    "labels": {"method": "", "project_id": "", "service": "", "version": "", "location": ""}
  },
  "timestamp": "", "severity": "",
  "logName": "", "receiveTimestamp": ""
}]}
[023] The raw event logs obtained in JSON format are preprocessed to obtain a final dataset useful for training and analysis. Initially, the logs with complex JSON format are flattened into a simple JSON format where all the keys and values exist at the same level, as in a dictionary of key-value pairs. This makes it feasible to extract and convert the JSON data into a structured Comma-Separated Value (CSV) formatted dataset. The key-value pairs are then extracted from the flattened JSON log records and are stored in a CSV file. Thus, the raw logs are converted into a structured dataset. Then, irrelevant columns are dropped from the dataset and the plurality of features such as the user email, user string, accessed resource, utilized permission, and timestamp are extracted from the structured dataset. An example list of features extracted from the plurality of event logs and their associated feature value (one example) is given in Table 3.
Table 3
Feature | Example Value
principalEmail | user.email@mail.com
Timestamp | 2023-02-02T06:18:18.66Z
projectId | internal
Location | us-central
Version | 8.1.1
resourceName | projects/ops/global/ruleABC
Method | v1.compute.rules.insert
Service | compute.googleapis.com
Granted | true/false
Permission | compute.rules.create
callerSuppliedUserAgent | google-api-go-client/0.5
callerIp | 192.168.23.32
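
The flattening and CSV conversion described in paragraph [023] can be sketched as follows. This is a minimal illustration and not the implementation of the disclosure: the helper names `flatten` and `logs_to_csv`, the dotted-key naming scheme, and the choice of retained columns are all assumptions made for the example.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into one level, joining keys with dots (assumed scheme)."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        flat[prefix[:-1]] = obj  # drop the trailing dot
    return flat


def logs_to_csv(raw_json, columns):
    """Flatten every entry and keep only the requested columns (others are dropped)."""
    entries = json.loads(raw_json)["entries"]
    rows = [flatten(entry) for entry in entries]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            extrasaction="ignore", restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical single-entry log using example values from Table 3.
sample = json.dumps({"entries": [{
    "protoPayload": {
        "authenticationInfo": {"principalEmail": "user.email@mail.com"},
        "requestMetadata": {"callerIp": "192.168.23.32"},
    },
    "insertId": "-45jfg0e3n7tk",
    "timestamp": "2023-02-02T06:18:18.66Z",
}]})
columns = [
    "protoPayload.authenticationInfo.principalEmail",
    "protoPayload.requestMetadata.callerIp",
    "timestamp",
]
print(logs_to_csv(sample, columns))
```

Here `insertId` is treated as an irrelevant column and is silently dropped because it is not listed in `columns`.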
[024] Further, at step 204 of the method 200, the one or more hardware processors 104 are configured to determine a feature-specific distance measure (alternatively referred to as feature-specific distance method) corresponding to each of the plurality of features based on the associated feature values. Initially, a subset of the plurality of event logs is obtained based on a batch size of a clustering model used to obtain access patterns, and a representative regular expression is determined for each of the plurality of features based on associated feature values comprised in the subset of the plurality of event logs. The representative regular expression is then matched with a data format assigned to each of the plurality of distance measures to obtain a first set of distance measures. Further, the feature values comprised in the subset of the plurality of event logs are matched with one or more pre-defined feature databases to identify a matched feature database, and a plurality of distance measures linked to the matched feature database are selected as a second set of distance measures. Finally, a match percentage of the first set of distance measures and the second set of distance measures with the data format of the feature values is calculated, and the distance measure with the highest match percentage is selected as the feature-specific distance measure.
[025] A few example distance measures (alternatively referred to as distance methods or distance scores) available in the memory 102 are:
1. Method Name: Method A
Method: {Code/Steps}
Supporting data type: IP address
Supporting data format: *.*.*.*
Complexity of method: 4
2. Method Name: Method B
Method: {Code/Steps}
Supporting data type: IP address
Supporting data format: 192.*.*.*
Complexity of method: 6
3. Method Name: Method C
Method: {Code/Steps}
Supporting data type: Timestamp
Supporting data format: DD-MM-YYYYZHH:MM:SST
Complexity of method: 2
4. Method Name: Method D
Method: {Code/Steps}
Supporting data type: Email
Supporting data format: *@*.*
Complexity of method: 8
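
Each entry above leaves its {Code/Steps} abstract. As a hedged illustration of what such a field could hold, the sketch below shows two distance functions of the kinds this description discusses: a circular distance for timestamp sub-features and a hop count over a hierarchy graph. The function names and the small `HIERARCHY` graph are hypothetical and are not taken from the disclosure.

```python
from collections import deque


def circular_distance(a, b, period):
    """Shortest distance between a and b on a circle of the given period."""
    diff = abs(a - b) % period
    return min(diff, period - diff)


def hop_distance(graph, start, goal):
    """Number of edges on the shortest path between two nodes (breadth-first search).

    Returns None when the goal cannot be reached from the start node.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return hops + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


# Hypothetical undirected resource hierarchy, stored as adjacency lists.
HIERARCHY = {
    "cloud": ["compute", "storage"],
    "compute": ["cloud", "instances", "zones"],
    "storage": ["cloud", "buckets"],
    "instances": ["compute"],
    "zones": ["compute"],
    "buckets": ["storage"],
}

print(circular_distance(1, 7, period=7))              # day 1 vs day 7 of a week
print(hop_distance(HIERARCHY, "instances", "zones"))  # siblings under "compute"
```

With a circular measure, day 7 and day 1 of a week are one step apart rather than six, which is the wrap-around behaviour a plain integer difference would miss.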
[026] From Table 3, consider the list of features principalEmail (or user), callerIp (or IP address), user agent, service, method, permission, resource, location, and timestamp. One among the distance measures (Methods A, B, C and D) is selected for each of these features as the feature-specific distance measure at step 204. For example, consider the feature 'IP Address'. The data format is determined by deriving a regular expression representation of the feature values for IP Address. For example, the regular expression from the feature value for IP Address in Table 3 is "192.*.*.*". Method A and Method B closely match this expression. Using string similarity check techniques, the match percentage of the data format of each suitable method with the determined regular expression is calculated. The match percentage for Method A is 75% and for Method B is 100%. So, Method B, with the highest match percentage, is selected as the feature-specific distance measure for the IP Address feature.
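
The selection in this example can be sketched as below. The description does not name the string similarity technique; the sketch assumes a simple segment-wise comparison of dot-separated formats, which happens to reproduce the 75% and 100% figures quoted for Method A and Method B. The names `DistanceMethod`, `representative_format`, `match_percentage` and `select_method` are hypothetical, and the second IP value is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistanceMethod:
    name: str
    data_type: str
    data_format: str
    complexity: int


# Subset of the example registry from paragraph [025].
REGISTRY = [
    DistanceMethod("Method A", "IP address", "*.*.*.*", 4),
    DistanceMethod("Method B", "IP address", "192.*.*.*", 6),
    DistanceMethod("Method D", "Email", "*@*.*", 8),
]


def representative_format(values):
    """Keep a dot-separated segment only if every value shares it, else use '*'."""
    columns = zip(*(value.split(".") for value in values))
    return ".".join(col[0] if len(set(col)) == 1 else "*" for col in columns)


def match_percentage(pattern, data_format):
    """Assumed similarity: share of dot-separated segments that are identical."""
    p, f = pattern.split("."), data_format.split(".")
    if len(p) != len(f):
        return 0.0
    same = sum(a == b for a, b in zip(p, f))
    return 100.0 * same / len(p)


def select_method(values, registry=REGISTRY):
    """Pick the registered method whose data format best matches the values."""
    pattern = representative_format(values)
    return max(registry, key=lambda m: match_percentage(pattern, m.data_format))


batch = ["192.168.23.32", "192.10.4.7"]  # second value is made up
pattern = representative_format(batch)
for method in REGISTRY:
    print(method.name, match_percentage(pattern, method.data_format))
print(select_method(batch).name)
```

A production matcher would more likely use real regular expressions and the pre-defined feature databases mentioned in paragraph [024]; the segment comparison here only illustrates how a match percentage can rank candidate methods.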
[027] In an embodiment, the code/steps of the distance measures are defined based on certain factors related to the features. For example, considering the ‘principalEmail’ feature, the distance between two users may depend on various factors such as the roles assigned to each user, the projects of the users, etc. Thus, an amalgamation of user properties is utilized to arrive at a distance measure for the users. The distance between each user role or user permission, the distance between the projects of the users, and the distance between any other properties of the users are determined, and based on the weightage assigned to each property, a final distance measure between the users is defined. Similarly, for the ‘resource type’ and ‘service type’ features, the distance measure depends on determining the actual distance between such types in their hierarchy. For example, a service type of SQLFunctions is technically different from another service type of BigDataQuery considering their resource heads and package hierarchies. Thus, the distance measure for these features can be computed by using resource hierarchy graphical representations and determining the node hops between the individual values. For this, a resource hierarchy graph is maintained for all the resources and services in the cloud. The distance between two resources is measured as the number of hops required to reach from one resource node to another in the resource hierarchy graph. For the ‘method’ feature listed in table 3, the distance measure can be calculated by representing the services and the individual methods supported by each service in a graphical representation and thereby determining the distance based on the graph’s nodes and edges. For example, the distance between two methods com.sqlfunction.read and com.bigdata.query.print can be calculated based on the graphical representation of the hierarchical services and the inherent methods supported by the services. Consider another feature, ‘Permission’. The distance between two permissions can be calculated as the shortest distance between the two permissions in a permission hierarchy graph of the cloud. The permission hierarchy graph contains all the permissions and their associated roles along with the relationships between these roles. The ‘timestamp’ feature listed in table 3 has various sub-features such as day of the week, day, month, hour, and minute. Each of the sub-features is treated as a circular continuous feature instead of a continuous integer type. For example, for the 7 days in a week, the distance from day 1 to day 7 is larger than the distance from day 7 to day 1. Such distance measures need to be considered to effectively utilize time and date features. Similar measures can be defined for the other sub-features such as months, hours, and minutes. Different variants of these kinds of distance measures are considered and one among them is selected at the step 204.
[028] Once the feature-specific distance measure is selected for each of the plurality of features, at step 206 of the method 200, the one or more hardware processors 104 are configured to identify dependencies among the plurality of features based on correlation coefficients of the plurality of features computed using one or more correlation coefficient techniques such as Karl Pearson’s Coefficient of Correlation, Spearman’s rank correlation coefficient and the like. In an embodiment, the correlation coefficient of each of the plurality of features is computed as follows:
(1) Choose a target feature from among the plurality of features.
(2) Calculate correlation coefficients with each of the remaining features from the plurality of features using one or more correlation coefficient techniques (for example, Pearson coefficient).
(3) Calculate the average of all the correlation coefficients calculated in the previous step.
(4) Assign the average value as the correlation coefficient for the target feature selected in step 1.
(5) Repeat the same process for all the other features in the list of features.
For example, in step 1, principalEmail is chosen as the target feature. In step 2, for each of the remaining features from the list of features, correlation coefficients are calculated using the Pearson correlation technique. Among these, the correlation coefficient between the target feature ‘principalEmail’ and ‘callerIp’ is 0.6 (positive correlation); the correlation coefficient between ‘principalEmail’ and ‘permission’ is -0.2 (negative correlation); and the correlation coefficient between ‘principalEmail’ and ‘time of day’ is 0.1 (positive correlation). Next, in step 3, the average of the correlation coefficients over all the remaining features is calculated as 0.233. The calculated average, i.e., 0.233, is the correlation coefficient of the target feature ‘principalEmail’. This is repeated for all the remaining features from the list of features by considering each of them as the target feature.
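The five-step procedure of step 206 can be sketched as below, assuming that each feature has already been encoded as a non-constant numeric column (the disclosure does not specify the encoding). The Pearson coefficient is implemented directly; the column values and the helper names `pearson` and `feature_correlations` are hypothetical and chosen only so that the result is easy to verify by hand.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # assumes neither column is constant


def feature_correlations(columns):
    """For each target feature, average its correlation with every other feature."""
    result = {}
    for target, values in columns.items():
        others = [
            pearson(values, col)
            for name, col in columns.items()
            if name != target
        ]
        result[target] = sum(others) / len(others)
    return result


# Hypothetical numeric encodings of three features over four event logs.
columns = {
    "principalEmail": [1, 2, 3, 4],
    "callerIp": [2, 4, 6, 8],    # rises with principalEmail
    "permission": [4, 3, 2, 1],  # falls as the other two rise
}
coefficients = feature_correlations(columns)
print(coefficients)
```

In this toy data, ‘permission’ is perfectly negatively correlated with both other columns, so its averaged coefficient is -1, while the other two features each average one positive and one negative correlation.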
[029] Once the dependencies among the plurality of features are identified, at step 208 of the method 200, the one or more hardware processors 104 are configured to assign a weight for each of the plurality of features based on the identified dependencies, the number of features referred by the feature-specific distance measure, and the complexity of the feature-specific distance measure. An average of the identified dependencies, the number of features referred by the feature-specific distance measure (alternatively called inner features), and the complexity of the feature-specific distance measure is calculated for each of the plurality of features. The value of the identified dependencies is taken as the correlation coefficient computed to identify the dependencies. The complexity of the feature-specific distance measure is a score assigned based on its worst-case complexity using categorical value to numerical value conversion techniques. The calculated average is normalized for each of the plurality of features into [0,1) such that the cumulative average value for all the plurality of features is 1. The normalized values are assigned as weights to each of the plurality of features. For example, for the feature-specific distance measure ‘Method B’ of the feature ‘IP Address’, the number of inner features is 1 (assuming Method B does not contain other known features), the correlation coefficient is 0 (as calculated in step 206), and the complexity of the method is 6 (as defined in the description of Method B). Thus, the average is (1 + 0 + 6) / 3 = 2.33. Once the averages for all the features are calculated, they are normalized to get the weight of each feature.
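The weight assignment of step 208 can be sketched as follows. The IP Address inputs (correlation 0, one inner feature, complexity 6) come from the example above. The inputs for the other two features are hypothetical, and dividing each average by the sum of all averages is an assumed reading of the normalization described, chosen because it yields weights in [0,1) that add up to 1.

```python
def raw_score(correlation, inner_features, complexity):
    """Average of the three factors used for weighting a feature."""
    return (correlation + inner_features + complexity) / 3


def normalize(scores):
    """Scale the scores so that they sum to 1 (assumed normalization)."""
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


# IP Address with Method B, as in the example: (0 + 1 + 6) / 3 = 2.33
ip_score = raw_score(correlation=0, inner_features=1, complexity=6)

scores = {
    "callerIp": ip_score,
    # Hypothetical inputs for illustration only.
    "principalEmail": raw_score(correlation=0.233, inner_features=1, complexity=8),
    "timestamp": raw_score(correlation=0.1, inner_features=1, complexity=2),
}
weights = normalize(scores)
print(round(ip_score, 2), weights)
```

This normalization presumes that the averages are positive; a feature with a strongly negative correlation coefficient and a low complexity score would need separate handling.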
[030] Once the weights are assigned to the plurality of features, at step 210 of the method 200, a combined distance measure is determined based on the feature-specific distance measure corresponding to each of the plurality of features, the weight assigned to each of the plurality of features, and the correlation coefficients of the plurality of features according to equation 1.
$d(x, y) = \sum_{i=1}^{n} \min(d_i(x_i, y_i), max\_dist) \cdot w_i \cdot c_i$ (1)
In equation 1, $d(x, y)$ is the combined distance measure, $d_i(x_i, y_i)$ is the feature-specific distance measure corresponding to feature $i$, $max\_dist$ is a pre-defined threshold value, $w_i$ is the weight assigned to feature $i$, and $c_i$ is the correlation coefficient of feature $i$. For example, the feature-specific distance measure for the feature ‘principalEmail’ is MethodD, for ‘callerIp’ it is MethodA, and so on. The calculated weights $w_i$ are as follows: principalEmail=2.33, callerIp=0.8, timeofday=0.2 and so on. Similarly, the correlation coefficients $c_i$ are as follows: principalEmail=0.233, callerIp=-0.3, timeofday=0.33 and so on. The threshold value $max\_dist$ considered is 10. Thus, the combined distance measure is calculated as follows: $d(x, y) = \min(\text{MethodD}, 10) \cdot 2.33 \cdot 0.233 + \min(\text{MethodA}, 10) \cdot 0.8 \cdot (-0.3) + \dots + \min(\text{MethodS}, 10) \cdot 0.2 \cdot 0.33$.
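Equation 1 can be sketched in code as below. This is an illustrative sketch, not the disclosed implementation: the hop-count measure follows the resource hierarchy idea of paragraph [027], but the graph contents, the 0/1 mismatch measure for ‘location’, and all weights and correlation coefficients are hypothetical values.

```python
from collections import deque


def hop_distance(graph, start, goal):
    """Number of hops between two nodes of a hierarchy graph (breadth-first search)."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return hops + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return float("inf")  # the two nodes are not connected


def combined_distance(x, y, measures, weights, correlations, max_dist):
    """Sum over features of min(d_i(x_i, y_i), max_dist) * w_i * c_i."""
    return sum(
        min(measures[f](x[f], y[f]), max_dist) * weights[f] * correlations[f]
        for f in measures
    )


# Hypothetical service hierarchy, stored as an undirected adjacency list.
graph = {
    "cloud": ["sql", "bigdata"],
    "sql": ["cloud", "SQLFunctions"],
    "bigdata": ["cloud", "BigDataQuery"],
    "SQLFunctions": ["sql"],
    "BigDataQuery": ["bigdata"],
}
measures = {
    "service": lambda a, b: hop_distance(graph, a, b),
    "location": lambda a, b: 0 if a == b else 1,  # assumed mismatch measure
}
weights = {"service": 0.6, "location": 0.4}
correlations = {"service": 0.5, "location": 0.25}

x = {"service": "SQLFunctions", "location": "IN"}
y = {"service": "BigDataQuery", "location": "US"}
d = combined_distance(x, y, measures, weights, correlations, max_dist=10)
print(d)
```

The threshold caps the contribution of any single feature, so an unreachable node pair (infinite hop count) still contributes a finite term to the sum.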
[031] Once the combined distance measure is determined, at step 212 of method 200, the one or more hardware processors 104 are configured to obtain access patterns by clustering the plurality of event logs using a clustering model based on the combined distance measure for access pattern mining in the cloud network. In an embodiment, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering model is used. Different clustering models can be used in different embodiments based on user requirement. The clustering model divides the event logs into multiple clusters based on the combined distance measure using which access pattern mining is performed in the cloud network. The combined distance measure changes dynamically as the input data (event logs) changes due to which access patterns can be identified more accurately. Through an in-depth analysis of the patterns mined from the event logs (access and audit logs), effective access control mechanisms can be developed that mitigate risks, enhance security, and ensure the integrity of cloud networks.
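The clustering of step 212 can be illustrated with a compact, pure-Python DBSCAN that accepts an arbitrary distance function, so that a combined distance measure can be plugged in. This is a simplified sketch for illustration: a practical embodiment would more likely rely on an existing library implementation, and the `eps` and `min_pts` values, as well as the one-dimensional data with an absolute-difference distance standing in for the combined distance measure, are hypothetical.

```python
def dbscan(points, dist, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point, -1 meaning noise."""
    n = len(points)
    labels = {}   # point index -> cluster id (or -1 for noise)
    cluster = -1

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    for i in range(n):
        if i in labels:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels.get(j) == -1:
                labels[j] = cluster  # noise point becomes a border point
            if j in labels:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb_j)
    return [labels[i] for i in range(n)]


# Placeholder data: the absolute difference stands in for the combined distance.
events = [0, 1, 2, 10, 11, 12, 50]
labels = dbscan(events, dist=lambda a, b: abs(a - b), eps=1.5, min_pts=2)
print(labels)
```

Events labelled -1 fall outside every dense region and may be candidates for closer inspection as unusual accesses.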
[032] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[033] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means, and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[034] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[035] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[036] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[037] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

We Claim:
1. A processor implemented method comprising:
obtaining (202), via one or more hardware processors, a plurality of event logs associated with services, principals, and entities in a cloud network, wherein each of the plurality of event logs comprises a plurality of features and associated feature values;
determining (204), via the one or more hardware processors, a feature-specific distance measure corresponding to each of the plurality of features based on the associated feature values;
identifying (206), via the one or more hardware processors, dependencies among the plurality of features based on correlation coefficients of the plurality of features computed using one or more correlation coefficient techniques;
assigning (208), via the one or more hardware processors, a weight for each of the plurality of features based on the identified dependencies, number of features referred by the feature-specific distance measure, and complexity of the feature-specific distance measure;
determining (210), via the one or more hardware processors, a combined distance measure based on the feature-specific distance measure corresponding to each of the plurality of features, weight assigned to the plurality of features and correlation coefficients of the plurality of features; and
obtaining access patterns by clustering (212), via the one or more hardware processors, the plurality of event logs using a clustering model based on the combined distance measure for access pattern mining in the cloud network.
2. The method as claimed in claim 1, wherein determining a feature-specific distance measure corresponding to each of the plurality of features based on their associated feature values comprises:
obtaining a subset of the plurality of event logs based on a batch size of the clustering model;
determining a representative regular expression for each of the plurality of features based on associated feature values comprised in the subset of the plurality of event logs;
matching the representative regular expression of each of the plurality of features with a data format assigned to each of the plurality of distance measures to obtain a first set of distance measures;
matching the feature values comprised in the subset of the plurality of event logs with one or more pre-defined feature databases to identify a matched feature database;
selecting a plurality of distance measures linked to the matched feature database as a second set of distance measures;
calculating a match percentage of the first set of distance measures and the second set of distance measures with the data format of the feature values; and
selecting a distance measure among the first set of distance measures and the second set of distance measures based on the match percentage.
3. The method as claimed in claim 1, wherein the combined distance measure is determined based on the feature-specific distance measure corresponding to each of the plurality of features, weight assigned to the plurality of features and correlation coefficients of the plurality of features by using the equation: $d(x, y) = \sum_{i=1}^{n} \min(d_i(x_i, y_i), max\_dist) \cdot w_i \cdot c_i$, wherein $d(x, y)$ is the combined distance measure, $d_i(x_i, y_i)$ is the feature-specific distance measure corresponding to feature $i$, $max\_dist$ is a pre-defined threshold value, $w_i$ is weight assigned to feature $i$ and $c_i$ is correlation coefficient of feature $i$.
4. The method as claimed in claim 1, wherein assigning a weight for each of the plurality of features based on the identified dependencies, number of features referred by the feature-specific distance measure, and complexity of the feature-specific distance measure comprises:
calculating an average of the identified dependencies, number of features referred by the feature-specific distance measure, and complexity of the feature-specific distance measure for each of the plurality of features, wherein value of the identified dependencies is taken as the correlation coefficients computed to identify the dependencies, and wherein complexity of the feature-specific distance measure is a score assigned based on their worst-case complexity using categorical value to numerical value conversion techniques; and
normalizing the calculated average for each of the plurality of features into [0,1) such that the cumulative average value for all the plurality of features is 1.
5. A system (100), comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
obtain a plurality of event logs associated with services, principals, and entities in a cloud network, wherein each of the plurality of event logs comprises a plurality of features and associated feature values;
determine a feature-specific distance measure corresponding to each of the plurality of features based on the associated feature values;
identify dependencies among the plurality of features based on correlation coefficients of the plurality of features computed using one or more correlation coefficient techniques;
assign a weight for each of the plurality of features based on the identified dependencies, number of features referred by the feature-specific distance measure, and complexity of the feature-specific distance measure;
determine a combined distance measure based on the feature-specific distance measure corresponding to each of the plurality of features, weight assigned to the plurality of features and correlation coefficients of the plurality of features; and
obtain access patterns by clustering the plurality of event logs using a clustering model based on the combined distance measure for access pattern mining in the cloud network.
6. The system as claimed in claim 5, wherein the one or more hardware processors are configured to determine a feature-specific distance measure corresponding to each of the plurality of features based on their associated feature values by:
obtaining a subset of the plurality of event logs based on a batch size of the clustering model;
determining a representative regular expression for each of the plurality of features based on associated feature values comprised in the subset of the plurality of event logs;
matching the representative regular expression of each of the plurality of features with a data format assigned to each of the plurality of distance measures to obtain a first set of distance measures;
matching the feature values comprised in the subset of the plurality of event logs with one or more pre-defined feature databases to identify a matched feature database;
selecting a plurality of distance measures linked to the matched feature database as a second set of distance measures;

calculating a match percentage of the first set of distance measures and the second set of distance measures with the data format of the feature values; and
selecting a distance measure among the first set of distance measures and the second set of distance measures based on the match percentage.
7. The system as claimed in claim 5, wherein the combined distance measure is determined based on the feature-specific distance measure corresponding to each of the plurality of features, weight assigned to the plurality of features and correlation coefficients of the plurality of features by using the equation: $d(x, y) = \sum_{i=1}^{n} \min(d_i(x_i, y_i), max\_dist) \cdot w_i \cdot c_i$, wherein $d(x, y)$ is the combined distance measure, $d_i(x_i, y_i)$ is the feature-specific distance measure corresponding to feature $i$, $max\_dist$ is a pre-defined threshold value, $w_i$ is weight assigned to feature $i$ and $c_i$ is correlation coefficient of feature $i$.
8. The system as claimed in claim 5, wherein the one or more hardware processors are configured to assign a weight for each of the plurality of features based on the identified dependencies, number of features referred by the feature-specific distance measure, and complexity of the feature-specific distance measure by:
calculating an average of the identified dependencies, number of features referred by the feature-specific distance measure, and complexity of the feature-specific distance measure for each of the plurality of features, wherein value of the identified dependencies is taken as the correlation coefficients computed to identify the dependencies, and wherein complexity of the feature-specific distance measure is a score assigned based on their worst-case complexity using categorical value to numerical value conversion techniques; and

normalizing the calculated average for each of the plurality of features into [0,1) such that the cumulative average value for all the plurality of features is 1.

Documents

Application Documents

# Name Date
1 202321060285-STATEMENT OF UNDERTAKING (FORM 3) [07-09-2023(online)].pdf 2023-09-07
2 202321060285-REQUEST FOR EXAMINATION (FORM-18) [07-09-2023(online)].pdf 2023-09-07
3 202321060285-FORM 18 [07-09-2023(online)].pdf 2023-09-07
4 202321060285-FORM 1 [07-09-2023(online)].pdf 2023-09-07
5 202321060285-FIGURE OF ABSTRACT [07-09-2023(online)].pdf 2023-09-07
6 202321060285-DRAWINGS [07-09-2023(online)].pdf 2023-09-07
7 202321060285-DECLARATION OF INVENTORSHIP (FORM 5) [07-09-2023(online)].pdf 2023-09-07
8 202321060285-COMPLETE SPECIFICATION [07-09-2023(online)].pdf 2023-09-07
9 202321060285-Proof of Right [04-10-2023(online)].pdf 2023-10-04
10 202321060285-FORM-26 [19-01-2024(online)].pdf 2024-01-19
11 Abstract.jpg 2024-02-12
12 202321060285-FORM-26 [07-11-2025(online)].pdf 2025-11-07