
System For Machine Learning Enabled Forensic Artifact Triage And Correlation (ML-FATC)

Abstract: The present disclosure provides a system for machine learning enabled forensic artifact triage and correlation (ML-FATC), comprising a data collection module configured to collect digital forensic artifacts, a preprocessing module adapted to preprocess said artifacts, a feature extraction module for extracting features from said artifacts, an ML model application module configured to apply supervised and unsupervised learning algorithms to said extracted features to categorize and prioritize artifacts, and a correlation module configured to correlate artifacts determined as relevant to a forensic investigation.


Patent Information

Application #
Filing Date
26 April 2024
Publication Number
23/2024
Publication Type
INA
Invention Field
BIO-MEDICAL ENGINEERING
Status
Email
Parent Application

Applicants

MARWADI UNIVERSITY
MARWADI UNIVERSITY, RAJKOT- MORBI HIGHWAY, AT GAURIDAD, RAJKOT – 360003, GUJARAT, INDIA
PARTH PARMAR
MARWADI UNIVERSITY, RAJKOT- MORBI HIGHWAY, AT GAURIDAD, RAJKOT – 360003, GUJARAT, INDIA
DR. KRUNAL VAGHELA
MARWADI UNIVERSITY, RAJKOT- MORBI HIGHWAY, AT GAURIDAD, RAJKOT – 360003, GUJARAT, INDIA
DR. MUNINDRA LUNAGARIA
MARWADI UNIVERSITY, RAJKOT- MORBI HIGHWAY, AT GAURIDAD, RAJKOT – 360003, GUJARAT, INDIA

Inventors

1. PARTH PARMAR
MARWADI UNIVERSITY, RAJKOT- MORBI HIGHWAY, AT GAURIDAD, RAJKOT – 360003, GUJARAT, INDIA
2. DR. KRUNAL VAGHELA
MARWADI UNIVERSITY, RAJKOT- MORBI HIGHWAY, AT GAURIDAD, RAJKOT – 360003, GUJARAT, INDIA
3. DR. MUNINDRA LUNAGARIA
MARWADI UNIVERSITY, RAJKOT- MORBI HIGHWAY, AT GAURIDAD, RAJKOT – 360003, GUJARAT, INDIA

Specification

Description: Field of the Invention

Generally, the present disclosure relates to forensic analysis systems. Particularly, the present disclosure relates to a system for machine learning enabled forensic artifact triage and correlation.
Background
The background description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
Digital forensics has become an essential part of cybersecurity, focusing on the recovery and investigation of material found in digital devices, often in relation to computer crime. The field has evolved to address the growing sophistication of cyberattacks and the increasing volume of digital data that must be analyzed during investigations. Traditional methods for forensic artifact collection and analysis have relied heavily on manual processes, which are time-consuming and may not be efficient in handling large volumes of data or in identifying subtle patterns indicative of cyber threats.
One well-known system in the realm of digital forensics involves the use of static analysis techniques for the collection and examination of digital artifacts. These techniques are designed to extract data without altering or executing the content in any way. While effective for ensuring the integrity of the data, the static analysis is limited by its inability to detect sophisticated attacks that employ dynamic execution to evade detection. The primary drawback associated with static analysis is its limited scope in identifying only those threats that are visible without the execution of potential malware or malicious scripts.
Another commonly employed technique in digital forensics is dynamic analysis, which involves executing the digital artifacts in a controlled environment to observe their behavior. This method allows for the detection of malicious activities that only manifest during runtime. However, dynamic analysis faces challenges in terms of scalability and the risk of missing stealthy attacks that are designed to detect and evade analysis environments. Moreover, both static and dynamic analysis methods require significant manual effort to correlate artifacts across different sources and to prioritize them effectively for investigations, leading to delays and potential oversight in critical forensic analyses.
Furthermore, the increasing complexity and volume of digital forensic artifacts necessitate the adoption of advanced methods to efficiently process and analyze data. Manual correlation and prioritization of forensic artifacts are becoming increasingly impractical due to the sheer scale of data involved in modern digital investigations. These conventional methods struggle to keep pace with the rapid advancements in technology and the evolving tactics used by cyber adversaries.
In light of the above discussion, there exists an urgent need for solutions that overcome the problems associated with conventional methods for the collection, analysis, correlation, and prioritization of digital forensic artifacts. Solutions that can automate these processes, enhance the accuracy of analyses, and reduce the time required for investigations are crucial for improving the effectiveness of digital forensic investigations.

Summary
Generally, the present disclosure relates to forensic analysis systems. Particularly, the present disclosure relates to a system for machine learning enabled forensic artifact triage and correlation.
The following presents a simplified summary of various aspects of this disclosure in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements nor delineate the scope of such aspects. Its purpose is to present some concepts of this disclosure in a simplified form as a prelude to the more detailed description that is presented later.
The following paragraphs provide additional support for the claims of the subject application.
The Machine Learning-Enabled Forensic Artifact Triage and Correlation (ML-FATC) system represents a sophisticated approach to managing and analyzing digital forensic data. At its core, the system includes a suite of interconnected modules designed to streamline the identification, categorization, and correlation of digital artifacts, which are crucial for forensic investigations. These modules include a data collection module, a preprocessing module, a feature extraction module, an ML model application module, and a correlation module. Together, these components work seamlessly to enhance the efficiency and effectiveness of forensic analysis through the use of advanced machine learning techniques.
In an embodiment, the data collection module of the ML-FATC system is engineered to gather digital forensic artifacts from a wide array of digital sources. This includes, but is not limited to, computers, mobile devices, and cloud services. The ability to collect data from multiple sources is fundamental to the system's design, allowing for a comprehensive analysis that covers the full spectrum of digital footprints left by individuals across various platforms.
In an embodiment, the ML model application module within the system incorporates models that have been trained on labeled datasets. These models are adept at categorizing artifacts into pre-defined categories that are highly relevant to forensic investigations. This targeted categorization is crucial for sifting through vast amounts of data and identifying the pieces of information that are most pertinent to the investigation at hand.
In an embodiment, the feature extraction module leverages natural language processing (NLP) techniques to scrutinize textual artifacts. This module is capable of extracting key entities, identifying relevant keywords, and discerning sentiment from text-based data. The use of NLP techniques enriches the analysis by providing deeper insights into the textual content contained within the digital artifacts, which can be critical for understanding the context and intent behind the data.
In an embodiment, the ML model application module utilizes unsupervised learning algorithms to uncover patterns and relationships within the artifacts that may indicate suspicious activities. These algorithms are adept at detecting anomalies and patterns without the need for pre-labeled data, making them invaluable for identifying previously unknown or unexpected connections within the data.
In an embodiment, the application of deep learning models, such as convolutional neural networks and recurrent neural networks, is a highlight of the ML model application module. These models are particularly effective for analyzing multimedia artifacts, including images and videos, providing a level of analysis that goes beyond simple textual or data-driven investigations to include the rich content contained within multimedia files.
In an embodiment, the correlation module of the ML-FATC system employs graph analysis techniques. These techniques are instrumental in visualizing and analyzing the connections between different artifacts, offering investigators a powerful tool for uncovering relationships and interactions that may not be immediately apparent. Through graph analysis, the system can reveal complex networks of data interactions, aiding in the reconstruction of events or activities related to the investigation.
In an embodiment, the ML model application module is designed to continuously improve its performance through feedback mechanisms and active learning methodologies. This adaptability ensures that the system remains effective even as new types of digital artifacts emerge and as forensic investigation techniques evolve. By adjusting its models based on real-world feedback, the ML-FATC system stays at the forefront of forensic analysis technology.
In an embodiment, the ML-FATC system incorporates robust data management and privacy controls. These controls include encryption and access management features that safeguard the integrity and confidentiality of the forensic artifacts collected during an investigation. Such measures are critical for maintaining the trustworthiness of the analysis and ensuring that the system adheres to legal and ethical standards related to data privacy.
Finally, in an embodiment, the method for ML-FATC in digital forensic investigations outlines a comprehensive process that starts with the collection of digital forensic artifacts from various sources. This is followed by the preprocessing of the collected artifacts, feature extraction, and the application of both supervised and unsupervised machine learning algorithms to categorize and prioritize artifacts based on their relevance to the investigation. The culmination of this process is the correlation of relevant artifacts, which enables investigators to uncover hidden patterns indicative of criminal behavior and to reconstruct digital timelines. This methodical approach facilitates a deeper understanding of the data involved in forensic investigations, ultimately aiding in the pursuit of justice.

Brief Description of the Drawings

The features and advantages of the present disclosure would be more clearly understood from the following description taken in conjunction with the accompanying drawings in which:
FIG. 1 illustrates a system for machine learning enabled forensic artifact triage and correlation (ML-FATC), in accordance with the embodiments of the present disclosure;
FIG. 2 illustrates a method for machine learning enabled forensic artifact triage and correlation in digital forensic investigations, in accordance with the embodiments of the present disclosure; and
FIG. 3 illustrates a process flow diagram of a machine learning enabled forensic artifact triage and correlation, in accordance with the embodiments of the present disclosure.

Detailed Description
In the following detailed description of the invention, reference is made to the accompanying drawings that form a part hereof, and in which are shown, by way of illustration, specific embodiments in which the invention may be practiced. In the drawings, like numerals describe substantially similar components throughout the several views. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims and equivalents thereof.
The use of the terms “a” and “an” and “the” and “at least one” and similar referents in the context of describing the invention (especially in the context of the following claims) is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The use of the term “at least one” followed by a list of one or more items (for example, “at least one of A and B”) is to be construed to mean one item selected from the listed items (A or B) or any combination of two or more of the listed items (A and B), unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.
Generally, the present disclosure relates to forensic analysis systems. Particularly, the present disclosure relates to a system for machine learning enabled forensic artifact triage and correlation.
Pursuant to the "Detailed Description" section herein, whenever an element is explicitly associated with a specific numeral for the first time, such association shall be deemed consistent and applicable throughout the entirety of the "Detailed Description" section, unless otherwise expressly stated or contradicted by the context.
FIG. 1 illustrates a system (100) for machine learning enabled forensic artifact triage and correlation (ML-FATC), in accordance with the embodiments of the present disclosure. The system (100) includes multiple interconnected modules designed to facilitate the efficient processing and analysis of digital forensic artifacts. The disclosed system (100) aims to enhance the capabilities of forensic investigators by automating the identification, categorization, and correlation of relevant digital evidence.
In an embodiment, a data collection module (102) is configured to collect digital forensic artifacts from various digital sources. These artifacts may include files, logs, memory dumps, and other digital evidence that could be relevant to a forensic investigation. The data collection module (102) is designed to interface with a wide range of digital environments, ensuring the comprehensive gathering of potential forensic evidence.
In an embodiment, a preprocessing module (104) is adapted to preprocess said artifacts collected by the data collection module. Preprocessing involves preparing the collected artifacts for further analysis by performing operations such as formatting, normalization, and cleaning. This module ensures that the artifacts are in a suitable state for feature extraction and analysis, thereby improving the accuracy and efficiency of subsequent processes.
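By way of a non-limiting illustration only, the formatting, normalization, and cleaning operations described above may be sketched in Python as follows. The record fields (`source`, `content`, `epoch`) and the function name `preprocess` are hypothetical and are not specified by the present disclosure; an actual preprocessing module (104) may operate on entirely different artifact formats.

```python
from datetime import datetime, timezone


def preprocess(records):
    """Clean and normalize raw artifact records (illustrative field names only)."""
    seen = set()
    cleaned = []
    for rec in records:
        # Formatting: trim and lowercase the source label.
        source = rec.get("source", "").strip().lower()
        # Cleaning: collapse runs of whitespace and drop empty content.
        content = " ".join(rec.get("content", "").split())
        if not content:
            continue
        # Normalization: express every timestamp as UTC ISO 8601.
        ts = datetime.fromtimestamp(rec["epoch"], tz=timezone.utc).isoformat()
        key = (source, content, ts)
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        cleaned.append({"source": source, "content": content, "timestamp": ts})
    return cleaned
```

In this sketch, two records that differ only in whitespace or letter case of the source label collapse into a single normalized record, which keeps later feature extraction from counting the same artifact twice.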
In an embodiment, a feature extraction module (106) is responsible for extracting features from said artifacts that have been preprocessed. This module analyzes the artifacts to identify characteristics or attributes that are relevant for the machine learning analysis. The feature extraction process is crucial for transforming raw data into a structured format that can be utilized by machine learning algorithms.
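As a hedged example, the transformation of a raw artifact into a structured set of features may be sketched as below. The chosen features (size, byte entropy, printable-character ratio, and an extension flag) are assumptions made for illustration; the present disclosure does not enumerate the features used by the feature extraction module (106).

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the byte-level Shannon entropy in bits (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(name: str, data: bytes) -> dict:
    """Map one artifact to a flat feature dictionary (illustrative features)."""
    printable = sum(32 <= b < 127 for b in data)
    return {
        "size": len(data),
        "entropy": shannon_entropy(data),
        "printable_ratio": printable / len(data) if data else 0.0,
        "is_executable_ext": name.lower().endswith((".exe", ".dll", ".sh")),
    }
```

High byte entropy is commonly associated with compressed or encrypted content, which is one reason such a feature might be useful to downstream models; whether it is useful in a given investigation is an empirical question.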
In an embodiment, an ML model application module (108) is configured to apply supervised and unsupervised learning algorithms to said extracted features to categorize and prioritize artifacts. This module employs a variety of machine learning models to analyze the features and identify patterns or anomalies indicative of forensic relevance. The use of both supervised and unsupervised learning algorithms allows for the effective handling of labeled and unlabeled data, enhancing the system (100)'s ability to categorize and prioritize forensic artifacts accurately.
In an embodiment, a correlation module (110) is configured to correlate artifacts determined as relevant to a forensic investigation. This module analyzes the categorized and prioritized artifacts to identify relationships between different pieces of evidence. The correlation process is vital for reconstructing events, understanding the scope of an incident, and providing investigators with a comprehensive view of the evidence. The module employs advanced algorithms to ensure that correlations are accurately identified, thereby facilitating the investigative process.
In an embodiment, the data collection module (102) of the system (100) is further configured to collect artifacts from multiple digital sources, thereby enhancing the breadth and depth of forensic investigations. These sources encompass computers, mobile devices, cloud services, and other digital platforms where potential evidence might reside. By integrating capabilities to interface with a wide variety of digital environments, the module ensures comprehensive evidence gathering. This extensive collection capability is crucial for forensic investigations, as digital evidence can be scattered across various devices and platforms. The ability to collect data from such a diverse range of sources not only ensures that no potential evidence is overlooked but also aids in constructing a more complete picture of the digital activity associated with a forensic case. The inclusiveness of the data collection module (102) significantly boosts the effectiveness of the forensic analysis process by providing a rich set of artifacts for examination.
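For illustration, and without limiting the manner in which the data collection module (102) may be implemented, collection from several sources into uniform records may be sketched as follows. The `Artifact` record, the `collect` function, and the use of a SHA-256 digest to fingerprint each item are assumptions of this sketch and are not recited in the present disclosure.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    source: str   # e.g. "computer", "mobile", "cloud" (hypothetical labels)
    name: str
    sha256: str   # content fingerprint recorded at collection time
    size: int


def collect(sources):
    """Flatten {source: {name: bytes}} into a list of Artifact records."""
    artifacts = []
    for source, items in sources.items():
        for name, data in items.items():
            digest = hashlib.sha256(data).hexdigest()
            artifacts.append(Artifact(source, name, digest, len(data)))
    return artifacts
```

Recording a digest at collection time allows later stages to detect whether the same content was gathered from more than one source.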
In an embodiment, the ML model application module (108) includes models trained on labeled datasets, enhancing the system (100)'s ability to categorize artifacts into pre-defined categories relevant to forensic investigations. This capability is crucial for organizing vast amounts of digital evidence in a manner that is meaningful for forensic analysis. The employment of supervised learning techniques, wherein models are trained using datasets that have been labeled with correct outcomes, allows for the accurate categorization of artifacts. Such categorization enables investigators to quickly identify relevant pieces of evidence and prioritize their analysis efforts accordingly. The use of labeled datasets ensures that the models are well-informed and can make accurate predictions about the categorization of new artifacts. This approach streamlines the triage process, helping to filter out irrelevant data and focus resources on analyzing evidence that is most likely to contribute to the resolution of a case.
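The present disclosure does not name a particular supervised algorithm. Purely as a minimal sketch, a nearest-centroid classifier is used below as a stand-in to show how a model trained on labeled feature vectors may assign a new artifact to a pre-defined category. The category labels and the two-dimensional feature vectors are hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Compute one mean vector per label from (vector, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))
```

A production embodiment would more plausibly rely on an established machine learning library and a held-out evaluation set; the sketch only shows the train-then-predict shape of supervised categorization.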
In an embodiment, the feature extraction module (106) employs natural language processing (NLP) techniques to analyze textual artifacts, thereby extracting entities, keywords, and sentiment. This capability is instrumental in handling the vast amounts of unstructured text data that forensic investigations often encounter. By applying NLP techniques, the module can automatically identify and extract relevant information from documents, emails, chat logs, and other text-based evidence. The extraction of entities and keywords facilitates the categorization and prioritization of artifacts, while sentiment analysis provides insights into the emotional tone of communications, which can be pivotal in understanding the context of certain events or interactions. This sophisticated analysis of textual artifacts through NLP techniques significantly enhances the system (100)'s ability to process and analyze digital evidence, thereby aiding in the more effective and efficient resolution of forensic investigations.
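As a simplified illustration of extracting entities, keywords, and sentiment from a textual artifact, the sketch below uses regular expressions and small word lists in place of a trained NLP model. The patterns, the stop-word list, and the sentiment lexicons are toy assumptions made for this example and are not part of the present disclosure.

```python
import re
from collections import Counter

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
STOPWORDS = {"the", "and", "for", "will", "you", "are", "not", "that", "this"}
POSITIVE = {"thanks", "great", "good", "happy"}      # toy lexicon
NEGATIVE = {"delete", "threat", "destroy", "hate"}   # toy lexicon


def analyze_text(text: str) -> dict:
    """Extract simple entities, top keywords, and a lexicon sentiment score."""
    tokens = re.findall(r"[a-z']+", text.lower())
    keywords = Counter(t for t in tokens if len(t) > 2 and t not in STOPWORDS)
    sentiment = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return {
        "emails": EMAIL_RE.findall(text),
        "ips": IPV4_RE.findall(text),
        "keywords": [word for word, _ in keywords.most_common(3)],
        "sentiment": sentiment,  # below zero suggests a negative tone
    }
```

The IPv4 pattern does not validate that each octet is at most 255, and the lexicon score ignores negation and context, so both would need a more robust treatment in practice.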
In an embodiment, the ML model application module (108) utilizes unsupervised learning algorithms to detect patterns and relationships indicative of suspicious activities within the artifacts. This approach allows the system (100) to identify anomalies and patterns without the need for pre-labeled datasets. Unsupervised learning algorithms analyze the data to find hidden structures or relationships that may suggest suspicious or malicious activities. By detecting these patterns, the module provides valuable insights that can lead to the identification of potential threats or evidence of criminal activities. This capability is especially useful in situations where the nature of the malicious activity is unknown or when dealing with novel types of cyber threats. The use of unsupervised learning algorithms in analyzing artifacts ensures that the system (100) remains adaptable and effective in uncovering suspicious activities, thereby enhancing the overall efficacy of forensic investigations.
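By way of example only, anomaly detection without labeled data may be sketched with a robust outlier score based on the median and the median absolute deviation (MAD). This statistical technique is a stand-in chosen for brevity; the present disclosure does not specify which unsupervised algorithm is used. The threshold of 3.5 and the constant 0.6745 follow the commonly used modified z-score convention.

```python
import statistics


def flag_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        return []  # no spread around the median, so nothing can be scored
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - median) / mad) > threshold
    ]
```

For instance, if the values are hourly counts of outbound connections for a host, an hour with a count far above the typical level is flagged even though no example of malicious traffic was ever labeled.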
In an embodiment, the ML model application module (108) applies deep learning models, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs), for the analysis of multimedia artifacts. This implementation acknowledges the growing importance of multimedia data, such as images, videos, and audio recordings, in forensic investigations. Deep learning models, particularly CNNs and RNNs, are well-suited for analyzing complex patterns within multimedia content, enabling the system (100) to extract relevant features and identify significant information hidden within such data. The application of deep learning models facilitates the recognition of faces, objects, speech, and patterns of activity that can be crucial evidence in forensic cases. This capability significantly enhances the system (100)'s ability to process and analyze a wide range of digital evidence, thereby providing investigators with comprehensive insights into the multimedia artifacts associated with a case.
In an embodiment, the correlation module (110) employs graph analysis techniques to visualize and analyze connections between different artifacts. This approach leverages the power of graph theory to map the relationships and interactions between various pieces of evidence, facilitating the understanding of complex relationships and interactions within the data. Graph analysis enables the identification of patterns and networks of activity that might not be apparent through linear analysis methods. By visualizing these connections, investigators can more easily understand the relationships between different artifacts, uncover hidden associations, and construct a coherent narrative of events. The application of graph analysis techniques in the correlation module (110) significantly aids in the reconstruction of events and the identification of key evidence, enhancing the effectiveness of forensic investigations.
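One possible, non-limiting way to realize such graph analysis is sketched below: artifacts become nodes, an edge joins any two artifacts that share an indicator (for example an email address or IP address), and connected components reveal groups of related evidence. The artifact identifiers, the notion of an "indicator", and the function names are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(artifacts):
    """Link artifacts that share at least one indicator.

    artifacts: dict mapping artifact id -> set of indicator strings.
    Returns an adjacency dict mapping artifact id -> set of neighbor ids.
    """
    by_indicator = defaultdict(set)
    for art_id, indicators in artifacts.items():
        for indicator in indicators:
            by_indicator[indicator].add(art_id)
    graph = {art_id: set() for art_id in artifacts}
    for members in by_indicator.values():
        for art_id in members:
            graph[art_id] |= members - {art_id}
    return graph


def clusters(graph):
    """Return connected components using breadth-first search."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components
```

In this sketch a chat log and a server log that never reference each other directly still land in the same component when both share an indicator with a third artifact, which is the kind of indirect association the paragraph above describes.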
In an embodiment, the ML model application module (108) is further adapted to adjust its models based on feedback mechanisms and active learning methodologies. This adaptability ensures that the system (100) continuously improves its performance over time. Feedback mechanisms allow the system (100) to learn from its predictions and the outcomes of investigations, refining its models to increase accuracy and efficiency. Active learning methodologies enable the system (100) to query investigators for inputs on uncertain predictions, incorporating human expertise into the learning process. This dynamic adjustment of models based on feedback and active learning ensures that the system (100) remains effective even as new types of digital evidence and cyber threats emerge. The capacity for continuous learning and adaptation enhances the system (100)'s long-term utility and effectiveness in forensic investigations.
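The querying of investigators on uncertain predictions may be illustrated, under stated assumptions, by uncertainty sampling: the artifacts whose predicted relevance is closest to 0.5 are sent for human review, and the resulting labels are appended to the training data. The probability values, the review budget, and both function names are hypothetical and are given only to make the idea concrete.

```python
def select_for_review(predictions, budget=2):
    """Pick the artifacts whose relevance probability is nearest to 0.5.

    predictions: dict mapping artifact id -> probability of relevance.
    """
    ranked = sorted(predictions, key=lambda art_id: abs(predictions[art_id] - 0.5))
    return ranked[:budget]


def incorporate_feedback(training_set, features, feedback):
    """Return a new training set extended with investigator-supplied labels.

    training_set: list of (feature_vector, label) pairs.
    features: dict mapping artifact id -> feature_vector.
    feedback: dict mapping artifact id -> label given by the investigator.
    """
    additions = [(features[art_id], label) for art_id, label in feedback.items()]
    return training_set + additions
```

After the training set is extended, the model would be retrained on it; the retraining step is omitted here because it depends on the model chosen for a given embodiment.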
In an embodiment, the system (100) includes robust data management and privacy controls, including encryption and access controls, to maintain the integrity and confidentiality of the collected forensic artifacts. This aspect addresses the critical need for secure handling and storage of sensitive digital evidence. By implementing encryption, the system (100) ensures that all collected artifacts are protected from unauthorized access or tampering, preserving the evidence's integrity. Access controls further safeguard the evidence by restricting access to authorized personnel only, preventing potential breaches of confidentiality. These data management and privacy controls are essential for maintaining the trustworthiness of the forensic investigation process, ensuring that the evidence remains admissible in legal proceedings and that privacy concerns are adequately addressed. The inclusion of these controls underscores the system (100)'s commitment to upholding the highest standards of data security and privacy protection in forensic analysis.
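As a partial and hedged illustration of these controls, the sketch below shows tamper detection with a keyed hash (HMAC-SHA256) and a simple role-based access check. Encryption of artifacts at rest is deliberately not shown, since it would require a cryptographic library outside the Python standard library. The role names, permissions, and function names are assumptions of this example.

```python
import hashlib
import hmac

# Hypothetical roles and the actions each may perform.
ROLE_PERMISSIONS = {
    "investigator": {"read", "annotate"},
    "auditor": {"read"},
    "guest": set(),
}


def seal(data: bytes, key: bytes) -> str:
    """Return an HMAC-SHA256 tag that binds the artifact bytes to a secret key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, key: bytes, tag: str) -> bool:
    """Check the tag in constant time; False means the artifact was altered."""
    return hmac.compare_digest(seal(data, key), tag)


def can_access(role: str, action: str) -> bool:
    """Allow an action only if the role explicitly grants it."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

A tag computed when an artifact is collected can be rechecked before the artifact is presented as evidence, which gives a simple way to demonstrate that its contents have not changed in storage.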
FIG. 2 illustrates a method 200 for machine learning enabled forensic artifact triage and correlation in digital forensic investigations, in accordance with the embodiments of the present disclosure. At step 202, the method initiates with the collection of digital forensic artifacts from a wide array of digital sources, including but not limited to computers, mobile devices, and cloud-based services. This step ensures a comprehensive evidence base for the investigation. At step 204, the method involves preprocessing said collected artifacts, which includes cleaning, normalizing, and formatting the data to prepare it for further analysis. This step improves the quality and usability of the data. At step 206, the method encompasses extracting features from the preprocessed artifacts. This process identifies and isolates significant attributes or characteristics from the data, which are crucial for the analysis that follows. At step 208, the method involves applying both supervised and unsupervised machine learning algorithms to the extracted features. This step categorizes and prioritizes the artifacts according to their relevance to the ongoing investigation, aiding in efficient evidence analysis. At step 210, the method includes correlating relevant artifacts to uncover hidden patterns and behaviors indicative of criminal activity. This step also aids in reconstructing digital timelines, providing a comprehensive understanding of the events under investigation.
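The ordering of steps 202 to 210 may be sketched, in a deliberately simplified form, as a chain of functions. Each function is a toy stand-in: in particular, the keyword watch list used at steps 206 and 208 replaces the machine learning models of the disclosure, and every name and value below is an assumption made for illustration.

```python
ALERT_TERMS = ("transfer", "offshore", "password")  # hypothetical watch list


def collect(sources):
    """Step 202: gather raw text artifacts from every source."""
    return [item for items in sources.values() for item in items]


def preprocess(artifacts):
    """Step 204: drop empty items, lowercase, and collapse whitespace."""
    return [" ".join(a.lower().split()) for a in artifacts if a.strip()]


def extract_features(artifacts):
    """Step 206: record which watch-list terms each artifact contains."""
    return [(a, [t for t in ALERT_TERMS if t in a]) for a in artifacts]


def categorize_and_prioritize(featured):
    """Step 208: keep artifacts with at least one hit, most hits first."""
    relevant = [(a, terms) for a, terms in featured if terms]
    return sorted(relevant, key=lambda pair: len(pair[1]), reverse=True)


def correlate(ranked):
    """Step 210: group relevant artifacts that share a term."""
    groups = {}
    for artifact, terms in ranked:
        for term in terms:
            groups.setdefault(term, []).append(artifact)
    return {term: arts for term, arts in groups.items() if len(arts) > 1}


def run(sources):
    """Apply steps 202 to 210 in order."""
    featured = extract_features(preprocess(collect(sources)))
    return correlate(categorize_and_prioritize(featured))
```

Keeping each step as a separate function mirrors the modular structure of the system (100), so that any single stand-in can be replaced by a real model without changing the others.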
FIG. 3 illustrates a process flow diagram of a machine learning enabled forensic artifact triage and correlation, in accordance with the embodiments of the present disclosure. The process begins with the collection of forensic artifacts from varied digital sources. Subsequent to this initial step, the artifacts undergo a preprocessing phase to ensure data uniformity and to enhance analytical compatibility. The preprocessed data is then subjected to feature extraction, where distinct attributes of the artifacts are identified. These extracted features form the basis for the application of ML models, which undertake the task of analyzing and classifying the data. This classification is crucial for the triage of artifacts, which is represented by a decision diamond indicating a binary outcome: if artifacts are deemed relevant, they proceed to the correlation phase; if not, the triage process concludes. The correlation step is critical for unveiling patterns and relationships in the data that could indicate criminal behavior, completing the investigative process.
Example embodiments herein have been described above with reference to block diagrams and flowchart illustrations of methods and apparatuses. It will be understood that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, respectively, can be implemented by various means including hardware, software, firmware, and a combination thereof. For example, in one embodiment, each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations can be implemented by computer program instructions. These computer program instructions may be loaded onto a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data processing apparatus create means for implementing the functions specified in the flowchart block or blocks.
Throughout the present disclosure, the term ‘processing means’ or ‘microprocessor’ or ‘processor’ or ‘processors’ includes, but is not limited to, a general purpose processor (such as, for example, a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a microprocessor implementing other types of instruction sets, or a microprocessor implementing a combination of types of instruction sets) or a specialized processor (such as, for example, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), or a network processor).
The term “non-transitory storage device” or “storage” or “memory,” as used herein relates to a random access memory, read only memory and variants thereof, in which a computer can store data or software for any duration.
Operations in accordance with a variety of aspects of the disclosure described above need not be performed in the precise order described. Rather, various steps can be handled in reverse order or simultaneously or not at all.
While several implementations have been described and illustrated herein, a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein may be utilized, and each of such variations and/or modifications is deemed to be within the scope of the implementations described herein. More generally, all parameters, dimensions, materials, and configurations described herein are meant to be exemplary, and the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the teachings are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific implementations described herein. It is, therefore, to be understood that the foregoing implementations are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, implementations may be practiced otherwise than as specifically described and claimed. Implementations of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the scope of the present disclosure.

Claims

I/We claim:

1. A system (100) for machine learning enabled forensic artifact triage and correlation (ML-FATC), comprising: a data collection module (102) configured to collect digital forensic artifacts; a preprocessing module (104) adapted to preprocess said artifacts; a feature extraction module (106) for extracting features from said artifacts; an ML model application module (108) configured to apply supervised and unsupervised learning algorithms to said extracted features to categorize and prioritize artifacts; and a correlation module (110) configured to correlate artifacts determined as relevant to a forensic investigation.
2. The system (100) of claim 1, wherein the data collection module (102) is further configured to collect artifacts from multiple digital sources, including but not limited to computers, mobile devices, and cloud services.
3. The system (100) of claim 1, wherein the ML model application module (108) includes models trained on labeled datasets for the categorization of artifacts into pre-defined categories relevant to forensic investigations.
4. The system (100) of claim 1, wherein the feature extraction module (106) employs natural language processing techniques to analyze textual artifacts and extract entities, keywords, and sentiment.
5. The system (100) of claim 1, wherein the ML model application module (108) utilizes unsupervised learning algorithms to detect patterns and relationships indicative of suspicious activities within the artifacts.
6. The system (100) of claim 1, wherein the ML model application module (108) applies deep learning models, including convolutional neural networks and recurrent neural networks, for the analysis of multimedia artifacts.
7. The system (100) of claim 1, wherein the correlation module (110) employs graph analysis techniques to visualize and analyze connections between different artifacts.
8. The system (100) of claim 1, wherein the ML model application module (108) is further adapted to adjust its models based on feedback mechanisms and active learning methodologies.
9. The system (100) of claim 1, wherein the system includes robust data management and privacy controls, including encryption and access controls, to maintain the integrity and confidentiality of the collected forensic artifacts.
10. A method (200) for machine learning enabled forensic artifact triage and correlation in digital forensic investigations, the method (200) comprising: collecting digital forensic artifacts from various digital sources; preprocessing said collected artifacts; extracting features from the preprocessed artifacts; applying supervised and unsupervised machine learning algorithms to the extracted features to categorize and prioritize the artifacts based on their relevance to an investigation; and correlating relevant artifacts to uncover hidden patterns indicative of criminal behavior and to reconstruct digital timelines.
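
By way of a non-limiting illustration only, the flow recited in the system and method claims above (collect, preprocess, extract features, categorize and prioritize, correlate) may be sketched in Python as follows. Every name in the sketch (Artifact, SUSPICIOUS_TERMS, run_pipeline, and so on) is hypothetical and does not appear in the specification. A simple keyword count stands in for the trained supervised and unsupervised models, and grouping artifacts that share an e-mail address stands in for the graph analysis of the correlation module; neither is asserted to be the disclosed implementation.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Artifact:
    artifact_id: str
    source: str  # e.g. "computer", "mobile", "cloud"
    text: str
    features: dict = field(default_factory=dict)
    score: float = 0.0


# Assumption: a fixed keyword set stands in for a trained model.
SUSPICIOUS_TERMS = {"transfer", "password", "wipe"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def preprocess(artifact):
    """Normalize case and whitespace (stand-in for the preprocessing module)."""
    artifact.text = " ".join(artifact.text.lower().split())
    return artifact


def extract_features(artifact):
    """Extract e-mail entities and keyword hits (stand-in for NLP features)."""
    words = set(re.findall(r"[a-z]+", artifact.text))
    artifact.features = {
        "entities": set(EMAIL_RE.findall(artifact.text)),
        "keyword_hits": len(words & SUSPICIOUS_TERMS),
    }
    return artifact


def correlate(artifacts):
    """Group artifacts into connected components linked by shared entities."""
    by_entity = defaultdict(list)
    for a in artifacts:
        for entity in a.features["entities"]:
            by_entity[entity].append(a.artifact_id)
    adjacency = defaultdict(set)
    for ids in by_entity.values():
        for i in ids:
            adjacency[i].update(ids)
    seen, groups = set(), []
    for a in artifacts:
        if a.artifact_id in seen:
            continue
        stack, group = [a.artifact_id], set()
        while stack:  # depth-first walk of one component
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


def run_pipeline(raw_artifacts):
    """Return (relevant artifacts by descending score, correlated groups)."""
    artifacts = [extract_features(preprocess(a)) for a in raw_artifacts]
    for a in artifacts:
        a.score = a.features["keyword_hits"] / len(SUSPICIOUS_TERMS)
    relevant = sorted(
        (a for a in artifacts if a.score > 0),
        key=lambda a: a.score,
        reverse=True,
    )
    return relevant, correlate(relevant)
```

In this sketch an artifact counts as relevant when it contains at least one of the assumed keywords, and two relevant artifacts fall into the same group when they mention the same e-mail address, directly or through a chain of shared addresses. A practical embodiment would replace both stand-ins with trained models and a richer artifact graph.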

Drawings: FIG. 1, FIG. 2, FIG. 3

Documents

Application Documents

# | Name | Date
1 | 202421033143-OTHERS [26-04-2024(online)].pdf | 2024-04-26
2 | 202421033143-FORM FOR SMALL ENTITY(FORM-28) [26-04-2024(online)].pdf | 2024-04-26
3 | 202421033143-FORM 1 [26-04-2024(online)].pdf | 2024-04-26
4 | 202421033143-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [26-04-2024(online)].pdf | 2024-04-26
5 | 202421033143-EDUCATIONAL INSTITUTION(S) [26-04-2024(online)].pdf | 2024-04-26
6 | 202421033143-DRAWINGS [26-04-2024(online)].pdf | 2024-04-26
7 | 202421033143-DECLARATION OF INVENTORSHIP (FORM 5) [26-04-2024(online)].pdf | 2024-04-26
8 | 202421033143-COMPLETE SPECIFICATION [26-04-2024(online)].pdf | 2024-04-26
9 | 202421033143-FORM-9 [07-05-2024(online)].pdf | 2024-05-07
10 | 202421033143-FORM 18 [08-05-2024(online)].pdf | 2024-05-08
11 | 202421033143-FORM-26 [12-05-2024(online)].pdf | 2024-05-12
12 | 202421033143-FORM 3 [13-06-2024(online)].pdf | 2024-06-13
13 | 202421033143-RELEVANT DOCUMENTS [09-10-2024(online)].pdf | 2024-10-09
14 | 202421033143-POA [09-10-2024(online)].pdf | 2024-10-09
15 | 202421033143-FORM 13 [09-10-2024(online)].pdf | 2024-10-09