Abstract: A system (100) for network intrusion detection is disclosed. An input data collection module (110) collects one or more intrusion detection datasets from one or more sources. A data pre-processing module (120) pre-processes the one or more intrusion detection datasets. A feature selection module (130) selects one or more features from corresponding one or more intrusion detection datasets. A stacking ensemble model designing module (140) develops a base classification model for at least one of a binary classification, a multiclass classification or a combination thereof of each of one or more selected features using one or more base classifiers, and generates a meta-classification model using a decision jungle classifier to predict one or more attack types for the network intrusion detection. A model performance evaluation module (150) performs a statistical significance test for comparison of performance of each of the one or more base classifiers on each of the one or more intrusion detection datasets. FIG. 1
Claims:1. A system (100) for network intrusion detection comprising:
a processing subsystem (105) hosted on a server (108), wherein the processing subsystem (105) is configured to execute on a network to control bidirectional communications among a plurality of modules comprising:
an input data collection module (110) configured to collect one or more intrusion detection datasets from one or more sources;
a data pre-processing module (120) operatively coupled to the input data collection module (110), wherein the data pre-processing module (120) is configured to pre-process the one or more intrusion detection datasets collected by using one or more data pre-processing techniques;
a feature selection module (130) operatively coupled to the data pre-processing module (120), wherein the feature selection module (130) is configured to select one or more features from corresponding one or more intrusion detection datasets using one or more feature selection techniques;
a stacking ensemble model designing module (140) operatively coupled to the feature selection module (130), wherein the stacking ensemble model designing module (140) is configured to:
develop a base classification model for at least one of a binary classification, a multiclass classification or a combination thereof of each of one or more selected features using one or more base classifiers; and
generate a meta-classification model using a decision jungle classifier to predict one or more attack types for the network intrusion detection based on a classification result obtained from the base classification model; and
a model performance evaluation module (150) operatively coupled to the stacking ensemble model designing module (140), wherein the model performance evaluation module (150) is configured to perform a statistical significance test for comparison of performance of each of the one or more base classifiers on each of the one or more intrusion detection datasets.
2. The system (100) as claimed in claim 1, wherein the one or more intrusion detection datasets comprises at least one of UNSW NB-15, CICIDS 2017 and CICDDOS 2019.
3. The system (100) as claimed in claim 1, wherein the one or more data pre-processing techniques comprises at least one of a logarithmic scaling technique, minimum-maximum normalization technique or a combination thereof.
4. The system (100) as claimed in claim 1, wherein the one or more feature selection techniques comprises at least one of fisher score, mutual information, spearman correlation or a combination thereof.
5. The system (100) as claimed in claim 1, wherein the one or more base classifiers comprises at least one of a binary classifier, a multiclass classifier or a combination thereof.
6. The system (100) as claimed in claim 5, wherein the binary classifier comprises a binary locally deep support vector machine, a binary logistic regression and a binary Bayes point machine.
7. The system (100) as claimed in claim 5, wherein the multiclass classifier comprises a multiclass neural network, a multiclass logistic regression, and a multiclass decision forest.
8. The system (100) as claimed in claim 1, wherein the one or more attack types comprises at least one of benign, file transfer protocol-patator, secure shell-patator, denial-of-service golden eye, denial-of-service hulk, denial-of-service slowloris, denial-of-service slowhttptest, bot, portscan, heartbleed, structured query language injections, cross-site scripting, brute force, distributed denial-of-service, infiltration, lightweight directory access protocol, user datagram protocol, portmap or a combination thereof.
9. The system (100) as claimed in claim 1, wherein the processing subsystem (105) further comprises a virtual machine creation module (160) operatively coupled to the stacking ensemble model designing module (140), wherein the virtual machine creation module (160) is configured to create a virtual machine environment for execution of a corresponding stacking ensemble model to identify the one or more attack types for the network intrusion detection.
10. The system (100) as claimed in claim 1, wherein the processing subsystem (105) further comprises a stack ensemble model deployment module (170) configured to enable deployment of a stack ensemble model generated as a web-service for the network intrusion detection, wherein the stack ensemble model comprises the base classification model and the meta-classification model.
11. A method (300) comprising:
collecting, by an input data collection module of a processing subsystem, one or more intrusion detection datasets from one or more sources (310);
pre-processing, by a data pre-processing module of the processing subsystem, the one or more intrusion detection datasets collected by using one or more data pre-processing techniques (320);
selecting, by a feature selection module of the processing subsystem, one or more features from corresponding one or more intrusion detection datasets using one or more feature selection techniques (330);
developing, by a stacking ensemble model designing module of the processing subsystem, a base classification model for at least one of a binary classification, a multiclass classification or a combination thereof of each of one or more selected features using one or more base classifiers (340);
generating, by the stacking ensemble model designing module of the processing subsystem, a meta-classification model using a decision jungle classifier to predict one or more attack types for the network intrusion detection based on a classification result obtained from the base classification model (350); and
performing, by a model performance evaluation module of the processing subsystem, a statistical significance test for comparison of performance of each of the one or more base classifiers on each of the one or more intrusion detection datasets (360).
Dated this 28th day of September 2021
Signature
Harish Naidu
Patent Agent (IN/PA-2896)
Agent for the Applicant
Description:
BACKGROUND
[0001] Embodiments of the present disclosure relate to a system for monitoring network traffic, systems and applications and more particularly to a system and a method for network intrusion detection.
[0002] Technological advancements in the field of cybersecurity emphasize the application of Artificial Intelligence (AI) techniques to improve the security landscape. Over the years, both adversaries and the research community have relied on AI approaches to attack and defend computer networks. Cyber criminals use well-equipped tools to compromise organizational assets, whereas security experts count on modern machine learning algorithms to mitigate the ever-increasing cyber threats. In recent times, it has been observed that a vast amount of data is generated on networks. Thus, it becomes indispensable to apply machine learning techniques to distinguish between malicious and normal network instances. Typically, network-based intrusion detection systems are used to inspect network traffic to discover malicious events.
[0003] Conventionally, the system available for network intrusion detection predicts one or more attack types based on historical data using one or more classifiers. However, the prediction result generated by such classifiers varies across different types of datasets. Also, the accuracy of the one or more classifiers is compromised by irrelevant feature selection and by the complexity of the data. Moreover, as the complexity of data increases, machine learning based prediction models pose several challenges pertaining to processing speed, concept drift, variance and bias issues, noisy data, class imbalance and the like. Furthermore, the prediction models or approaches utilized by the conventional system for detecting network intrusion hamper confidentiality, integrity and availability of computer resources.
[0004] Hence, there is a need for an improved system and a method for network intrusion detection in order to address aforementioned issues.
BRIEF DESCRIPTION
[0005] In accordance with an embodiment of the present disclosure, a system for network intrusion detection is disclosed. The system includes a processing subsystem hosted on a server. The processing subsystem is configured to execute on a network to control bidirectional communications among a plurality of modules. The processing subsystem includes an input data collection module configured to collect one or more intrusion detection datasets from one or more sources. The processing subsystem also includes a data pre-processing module configured to pre-process the one or more intrusion detection datasets collected by using one or more data pre-processing techniques. The processing subsystem also includes a feature selection module configured to select one or more features from corresponding one or more intrusion detection datasets using one or more feature selection techniques. The processing subsystem also includes a stacking ensemble model designing module configured to develop a base classification model for at least one of a binary classification, a multiclass classification or a combination thereof of each of one or more selected features using one or more base classifiers. The stacking ensemble model designing module is configured to generate a meta-classification model using a decision jungle classifier to predict one or more attack types for the network intrusion detection based on a classification result obtained from the base classification model. The processing subsystem also includes a model performance evaluation module configured to perform a statistical significance test for comparison of performance of each of the one or more base classifiers on each of the one or more intrusion detection datasets.
[0006] In accordance with another embodiment of the present disclosure, a method for network intrusion detection is disclosed. The method includes collecting, by an input data collection module of a processing subsystem, one or more intrusion detection datasets from one or more sources. The method also includes pre-processing, by a data pre-processing module of the processing subsystem, the one or more intrusion detection datasets collected by using one or more data pre-processing techniques. The method also includes selecting, by a feature selection module of the processing subsystem, one or more features from corresponding one or more intrusion detection datasets using one or more feature selection techniques. The method also includes developing, by a stacking ensemble model designing module of the processing subsystem, a base classification model for at least one of a binary classification, a multiclass classification or a combination thereof of each of one or more selected features using one or more base classifiers. The method also includes generating, by the stacking ensemble model designing module of the processing subsystem, a meta-classification model using a decision jungle classifier to predict one or more attack types for the network intrusion detection based on a classification result obtained from the base classification model. The method also includes performing, by a model performance evaluation module of the processing subsystem, a statistical significance test for comparison of performance of each of the one or more base classifiers on each of the one or more intrusion detection datasets.
[0007] To further clarify the advantages and features of the present disclosure, a more particular description of the disclosure will follow by reference to specific embodiments thereof, which are illustrated in the appended figures. It is to be appreciated that these figures depict only typical embodiments of the disclosure and are therefore not to be considered limiting in scope. The disclosure will be described and explained with additional specificity and detail with the appended figures.
BRIEF DESCRIPTION OF THE DRAWINGS
The disclosure will be described and explained with additional specificity and detail with the accompanying figures in which:
[0008] FIG. 1 is a block diagram of a system for network intrusion detection in accordance with an embodiment of the present disclosure;
[0009] FIG. 2 is a schematic representation of an exemplary embodiment of a system for network intrusion detection of FIG. 1 in accordance with an embodiment of the present disclosure;
[0010] FIG. 3 is a block diagram of a computer or a server in accordance with an embodiment of the present disclosure; and
[0011] FIG. 4 (a) and FIG. 4 (b) are flow charts representing the steps involved in a method for network intrusion detection in accordance with an embodiment of the present disclosure.
[0012] Further, those skilled in the art will appreciate that elements in the figures are illustrated for simplicity and may not have necessarily been drawn to scale. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the figures by conventional symbols, and the figures may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the figures with details that will be readily apparent to those skilled in the art having the benefit of the description herein.
DETAILED DESCRIPTION
[0013] For the purpose of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiment illustrated in the figures and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the disclosure is thereby intended. Such alterations and further modifications in the illustrated system, and such further applications of the principles of the disclosure as would normally occur to those skilled in the art are to be construed as being within the scope of the present disclosure.
[0014] The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method. Similarly, one or more devices or sub-systems or elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other devices, sub-systems, elements, structures, components, additional devices, additional sub-systems, additional elements, additional structures or additional components. Appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but not necessarily do, all refer to the same embodiment.
[0015] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this disclosure belongs. The system, methods, and examples provided herein are only illustrative and not intended to be limiting.
[0016] In the following specification and the claims, reference will be made to a number of terms, which shall be defined to have the following meanings. The singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise.
[0017] Embodiments of the present disclosure relate to a system and a method for network intrusion detection. The system includes a processing subsystem hosted on a server. The processing subsystem is configured to execute on a network to control bidirectional communications among a plurality of modules. The processing subsystem includes an input data collection module configured to collect one or more intrusion detection datasets from one or more sources. The processing subsystem also includes a data pre-processing module configured to pre-process the one or more intrusion detection datasets collected by using one or more data pre-processing techniques. The processing subsystem also includes a feature selection module configured to select one or more features from corresponding one or more intrusion detection datasets using one or more feature selection techniques. The processing subsystem also includes a stacking ensemble model designing module configured to develop a base classification model for at least one of a binary classification, a multiclass classification or a combination thereof of each of one or more selected features using one or more base classifiers. The stacking ensemble model designing module is configured to generate a meta-classification model using a decision jungle classifier to predict one or more attack types for the network intrusion detection based on a classification result obtained from the base classification model. The processing subsystem also includes a model performance evaluation module configured to perform a statistical significance test for comparison of performance of each of the one or more base classifiers on each of the one or more intrusion detection datasets.
[0018] FIG. 1 is a block diagram of a system (100) for network intrusion detection in accordance with an embodiment of the present disclosure. The system (100) includes a processing subsystem (105) hosted on a server (108). In one embodiment, the server (108) may include a cloud server. In another embodiment, the server (108) may include a local server. The processing subsystem (105) is configured to execute on a network (not shown in FIG. 1) to control bidirectional communications among a plurality of modules. In one embodiment, the network may include a wired network such as a local area network (LAN). In another embodiment, the network may include a wireless network such as Wi-Fi, Bluetooth, Zigbee, near field communication (NFC), infra-red communication, radio-frequency identification (RFID) or the like.
[0019] The processing subsystem (105) includes an input data collection module (110) configured to collect one or more intrusion detection datasets from one or more sources. In one embodiment, the one or more intrusion detection datasets may include at least one of UNSW NB-15, CICIDS 2017 and CICDDOS 2019. In such embodiment, the one or more sources may include, but are not limited to, one or more research laboratories, universities, industries and the like. The UNSW NB-15 dataset has 42 features, and the samples found in the dataset belong to nine attack categories, namely analysis, backdoor, denial of service, exploits, fuzzers, generic, reconnaissance, shellcode and worms. Again, the CICIDS 2017 dataset is a comparatively newer offering that was created using a benign-profile system for recording the abstract behaviour of users. It encompasses benign and attack traces that were derived based on user characteristics of various protocols like HTTP/S, FTP, SSH and others. Seventy-eight features are found in this dataset. Network instances pertaining to fourteen attack types are found in this dataset, namely bots, infiltration, brute force, distributed denial of service, DOS golden eye, DOS HULK, DOS Slowhttptest, DOS Slowloris, FTP-patator, SSH-patator, heartbleed, portscan, sql injections and cross-site scripting. Similarly, CICDDOS 2019 is another new dataset that has only DDOS attack instances in its topology, with 87 features. Some attack scenarios captured in this dataset include LDAP, portmap, DNS, UDP-lag, UDP, NetBIOS, SSDP, MSSQL, NTP and Syn.
[0020] The processing subsystem (105) also includes a data pre-processing module (120) configured to pre-process the one or more intrusion detection datasets collected by using one or more data pre-processing techniques. In one embodiment, the one or more data pre-processing techniques may include at least one of a logarithmic scaling technique, minimum-maximum normalization technique or a combination thereof. The logarithmic scaling technique was used to reduce range of one or more features. Again, the minimum-maximum normalization technique was applied to determine the minimum and maximum value of ith feature so that each feature value could be transformed to [0,1] by using equation 1 as follows:
$$v' = \frac{v_i - \min_i}{\max_i - \min_i} \qquad (1)$$
where $\min_i$ and $\max_i$ denote the minimum and maximum values of the feature respectively, whereas $v_i$ is the value of the feature at time i.
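The two pre-processing steps can be illustrated with a minimal Python sketch. It assumes the logarithmic scaling is log(1 + v), since the disclosure does not state the base or offset, and the sample values in `raw` are hypothetical. The min-max step follows equation (1).

```python
import math


def log_scale(values):
    """Compress the range of a non-negative feature (assumed form: log(1 + v))."""
    return [math.log1p(v) for v in values]


def min_max_normalize(values):
    """Map feature values to [0, 1] following equation (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: the denominator would be zero, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values of a single feature column.
raw = [0, 9, 99, 999]
scaled = min_max_normalize(log_scale(raw))
```

Applying the logarithm first keeps a few very large values from squeezing the remaining values towards zero after normalization.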
[0021] The processing subsystem (105) also includes a feature selection module (130) configured to select one or more features from corresponding one or more intrusion detection datasets using one or more feature selection techniques. In one embodiment, the one or more feature selection techniques may include at least one of fisher score, mutual information, spearman correlation or a combination thereof. Nine features were selected based on mutual information from the set of 42 features found in the UNSW NB-15 dataset. Again, the Fisher score technique was used to select the 12 relevant features from the CICIDS 2017 dataset, which contains 78 attributes. A non-parametric measure called Spearman correlation was applied on the CICDDOS 2019 dataset to select 13 features from the available set of 87 features. As used herein, the term ‘Fisher Score’ is defined as a relatively simple method to obtain the scores of attributes in the dataset. For example, Fisher Scores (FS) are calculated, and 12 pertinent features are included for classification from the CICIDS 2017 dataset. The attribute space is decided based upon a threshold that is computed by considering the average FS value. The Fisher score of the ith feature is calculated using equation (2), in which a indexes the classes in consideration and b is the number of classes. $n_a$ refers to the number of samples found in the ath class. $\mu_i$ indicates the mean of the ith feature over the dataset and $\mu_{i,a}$ signifies the mean of the ith feature in the ath class. $\sigma_{i,a}^2$ indicates the variance of the ith feature in the ath class.
$$FS = \frac{\sum_{a=1}^{b} n_a (\mu_{i,a} - \mu_i)^2}{\sum_{a=1}^{b} n_a \sigma_{i,a}^2} \qquad (2)$$
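A minimal sketch of equation (2) and of the average-score threshold may help make the selection rule concrete. The feature names, values and labels below are hypothetical, and the use of the population variance within each class is an assumption, since the disclosure does not say which variance estimator is used.

```python
from statistics import fmean, pvariance


def fisher_score(feature, labels):
    """Fisher score of one feature: between-class scatter over within-class scatter."""
    overall_mean = fmean(feature)
    between, within = 0.0, 0.0
    for cls in set(labels):
        members = [v for v, y in zip(feature, labels) if y == cls]
        n_a = len(members)
        between += n_a * (fmean(members) - overall_mean) ** 2
        within += n_a * pvariance(members)  # assumption: population variance
    return between / within if within > 0 else float("inf")


def select_above_average(columns, labels):
    """Keep the features whose score exceeds the average score."""
    scores = {name: fisher_score(col, labels) for name, col in columns.items()}
    threshold = fmean(list(scores.values()))
    return [name for name, s in scores.items() if s > threshold], scores


# Hypothetical toy data: two classes, one informative and one uninformative feature.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "duration": [1, 2, 3, 7, 8, 9],
    "noise": [1, 5, 9, 2, 5, 8],
}
selected, scores = select_above_average(columns, labels)
```

Here `duration` separates the two classes and receives a high score, while `noise` has identical class means and scores zero, so only `duration` passes the threshold.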
[0022] Similarly, the term ‘Mutual information’ is defined as a technique in information theory to determine the dependency between two random variables X and Y. MI is a measure of information on Y furnished by X. Formally, MI can be calculated using equation (3) as follows:
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} P(x, y) \log \frac{P(x, y)}{P(x)\,P(y)} \qquad (3)$$
where $P(x)$ and $P(y)$ are the marginal distributions of X and Y.
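Equation (3) can be evaluated directly for discrete variables by estimating the probabilities from observed frequencies. The sketch below assumes the natural logarithm and discrete (or already discretised) feature values; neither choice is stated in the disclosure, and the sample lists are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px, py = Counter(xs), Counter(ys)  # marginal counts
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical feature/label pairs.
dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])    # feature determines the label
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])  # feature carries no information
```

A feature that fully determines a balanced binary label yields log 2, and an independent feature yields zero, which is why ranking features by this value favours the informative ones.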
[0023] Again, the Spearman correlation coefficient is calculated using the ranks of the variables rather than their actual values. Like the Pearson correlation coefficient, the Spearman coefficient ranges from -1 to +1, and it quantifies monotonic relationships between two variables.
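As an illustration, the Spearman coefficient can be computed as the Pearson coefficient of the ranks. The sketch below assigns average ranks to tied values, which is a common convention but an assumption here, and the sample data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; undefined (division by zero) for a constant input."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical data: a monotonic but non-linear relationship, and a reversed one.
x = [1, 2, 3, 4, 5]
rho_up = spearman(x, [1, 4, 9, 16, 25])
rho_down = spearman(x, [5, 4, 3, 2, 1])
```

Because only the ranks matter, the quadratic relationship in the first call still gives a coefficient of +1.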
[0024] The processing subsystem (105) also includes a stacking ensemble model designing module (140) configured to develop a base classification model for at least one of a binary classification, a multiclass classification or a combination thereof of each of one or more selected features using one or more base classifiers. In one embodiment, the one or more base classifiers may include at least one of a binary classifier, a multiclass classifier or a combination thereof. In such embodiment, the binary classifier may include a binary locally deep support vector machine (BLDSVM), a binary logistic regression (BLR) and a binary Bayes point machine (BBPM). As used herein, the term ‘BLDSVM’ is defined as a non-linear SVM that operates using a sigmoid function to gain consistent speed. The locally deep SVM is quite efficient in dealing with computationally deep features and is known to be exponentially faster than traditional implementations of SVM, which further helps in reducing computational costs. Again, the BLR is chosen as a base classifier to obtain accurate predictions. It works by applying a logistic function to the training set and thus predicting the probability for all the data points. In another embodiment, the multiclass classifier may include a multiclass neural network (MNN), a multiclass logistic regression (MLR), and a multiclass decision forest (MDF).
[0025] The stacking ensemble model designing module (140) is configured to generate a meta-classification model using a decision jungle classifier to predict one or more attack types for the network intrusion detection based on a classification result obtained from the base classification model. As used herein, the term ‘meta classification model’ is defined as a machine learning model, intrinsic to stacking, that works by acquiring knowledge from different base classifiers and aggregating their outputs to produce final predictions. As used herein, the term ‘decision jungle’ is defined as a recent extension to the decision forest, and it was selected as the meta-learner in the proposed work. Directed Acyclic Graphs (DAGs) form the core of this technique, which is also an established method to conserve memory space. The decision jungle contributes towards improving generalization. Apart from exhibiting strong discriminative power, it also limits the exponential growth of decision trees. The decision jungle has an innate ability to overlook noisy features and is used for meta-classification, but the execution time of the decision forest is considerably longer during the trials.
[0026] In a specific embodiment, the one or more attack types may include at least one of benign, file transfer protocol-patator, secure shell-patator, denial-of-service golden eye, denial-of-service hulk, denial-of-service slowloris, denial-of-service slowhttptest, bot, portscan, heartbleed, structured query language injections, cross-site scripting, brute force, distributed denial-of-service, infiltration, lightweight directory access protocol, user datagram protocol, portmap or a combination thereof.
[0027] The processing subsystem (105) also includes a model performance evaluation module (150) configured to perform a statistical significance test for comparison of performance of each of the one or more base classifiers on each of the one or more intrusion detection datasets. While building machine learning models, it often becomes imperative to compare the performance of classifiers, and the best way to achieve this is to perform statistical significance tests. In one embodiment, the significance test may include, but is not limited to, a Friedman test and Nemenyi post-hoc tests. As used herein, the term ‘Friedman test’ is defined as a nonparametric test for analysing the performance of classifiers on multiple datasets. Similarly, upon rejecting the null hypothesis, the post-hoc test is conducted to perform the pairwise comparisons. In this context, the null hypothesis (H0) suggests that there is no performance difference among classifiers, whereas the alternate hypothesis (H1) indicates that at least one classifier performs differently.
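The flow from base classifiers to meta-classifier can be sketched in a few lines of Python. This is only an illustration of the stacking idea under stated assumptions, not the disclosed implementation: the `Stump` class is a hypothetical stand-in for the base classifiers (BLDSVM, BLR, BBPM and the like), the `TableMetaLearner` is a hypothetical stand-in for the decision jungle, and training the meta-learner on a separate held-out portion of the data is an assumption, since the disclosure does not describe how the base outputs are partitioned. All data values are invented.

```python
from collections import Counter, defaultdict


class Stump:
    """Hypothetical base classifier: thresholds one feature midway between class means."""

    def __init__(self, feature_index):
        self.j = feature_index

    def fit(self, X, y):
        pos = [row[self.j] for row, label in zip(X, y) if label == 1]
        neg = [row[self.j] for row, label in zip(X, y) if label == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        self.threshold = (mean_pos + mean_neg) / 2
        self.attack_is_high = mean_pos >= mean_neg
        return self

    def predict(self, row):
        high = row[self.j] > self.threshold
        return int(high == self.attack_is_high)


class TableMetaLearner:
    """Hypothetical meta-learner: majority label for each tuple of base predictions."""

    def fit(self, Z, y):
        votes = defaultdict(Counter)
        for z, label in zip(Z, y):
            votes[tuple(z)][label] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in votes.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, z):
        return self.table.get(tuple(z), self.default)


def fit_stack(X_base, y_base, X_meta, y_meta, n_features):
    """Fit base classifiers, then fit the meta-learner on their held-out predictions."""
    bases = [Stump(j).fit(X_base, y_base) for j in range(n_features)]
    Z = [[b.predict(row) for b in bases] for row in X_meta]
    return bases, TableMetaLearner().fit(Z, y_meta)


def predict_stack(bases, meta, row):
    return meta.predict([b.predict(row) for b in bases])


# Invented data: feature 0 is informative, feature 1 is noisy. Label 1 = attack.
X_base, y_base = [[1, 10], [2, 12], [8, 11], [9, 13]], [0, 0, 1, 1]
X_meta, y_meta = [[1, 13], [2, 10], [9, 10], [8, 13]], [0, 0, 1, 1]
bases, meta = fit_stack(X_base, y_base, X_meta, y_meta, n_features=2)
```

In this toy run the meta-learner learns to follow the base classifier built on the informative feature and to disregard the one built on the noisy feature, which is the aggregation role the meta-classification model plays in the ensemble.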
[0028] Friedman test statistic can be calculated as shown in equation (4) as follows:
$$\text{Friedman statistic} = \frac{(d - 1)\,Q}{d\,(k - 1) - Q} \qquad (4)$$
where Q can be calculated as shown in equation (5):
$$Q = \frac{12}{d\,k\,(k + 1)} \sum_{j=1}^{k} \left( R_j - \frac{d\,(k + 1)}{2} \right)^2 \qquad (5)$$
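Equations (4) and (5) can be worked through numerically. The sketch below assumes that d is the number of datasets, k is the number of classifiers and R_j is the sum of the ranks of the jth classifier over all datasets; the disclosure does not define these symbols, so this reading is an assumption. The score table is hypothetical.

```python
def ranks_desc(row):
    """Rank the scores of one dataset: 1 = best, tied scores share the average rank."""
    return [1 + sum(o > v for o in row) + (sum(o == v for o in row) - 1) / 2
            for v in row]


def friedman_statistics(scores):
    """Return (Q, F) where scores[i][j] is the score of classifier j on dataset i."""
    d, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(ranks_desc(row)):
            rank_sums[j] += r
    expected = d * (k + 1) / 2  # rank sum of every classifier under the null hypothesis
    q = 12 / (d * k * (k + 1)) * sum((r - expected) ** 2 for r in rank_sums)
    denom = d * (k - 1) - q
    # The denominator vanishes when all datasets rank the classifiers identically.
    f = (d - 1) * q / denom if denom > 0 else float("inf")
    return q, f


# Hypothetical accuracies of three classifiers (columns) on three datasets (rows).
scores = [
    [0.99, 0.95, 0.90],
    [0.98, 0.96, 0.91],
    [0.97, 0.93, 0.94],
]
q, f = friedman_statistics(scores)
```

For this table the rank sums are 3, 7 and 8, giving Q = 14/3 and a statistic of 7 from equation (4). The value would then be compared against a critical value to decide whether to reject H0 and proceed to the post-hoc test.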
[0029] In a particular embodiment, the processing subsystem (105) further includes a virtual machine creation module (160) operatively coupled to the stacking ensemble model designing module (140). The virtual machine creation module (160) is configured to create a virtual machine environment for execution of a corresponding stacking ensemble model to identify the one or more attack types for the network intrusion detection. On virtual machine 1 (VM 1), the stacking ensemble operates to identify all the attack types found in the UNSW NB-15 dataset. Similarly, the stacking ensemble is executed on VM 2 and VM 3 to identify the attack types found in the CICIDS 2017 and CICDDOS 2019 datasets respectively. Each virtual machine provides a dedicated operating environment to users for mitigating various attack types belonging to heterogeneous data sources. Predictive solutions, when automated, can be consumed by cloud users.
[0030] In a specific embodiment, the processing subsystem (105) further includes a stack ensemble model deployment module (170) configured to enable deployment of a stack ensemble model generated as a web-service for the network intrusion detection, wherein the stack ensemble model includes the base classification model and the meta-classification model.
[0031] FIG. 2 is a schematic representation of an exemplary embodiment of a system for network intrusion detection of FIG. 1 in accordance with an embodiment of the present disclosure. Consider an example in which the system (100) is utilized to identify one or more attack types for the network intrusion detection. The system (100) provides an automated solution for reliable intrusion detection that is based on a stacking approach to identify one or more attack types. The system (100) for detecting the one or more attack types collects one or more intrusion detection datasets via an input data collection module (110). Here, the input data collection module is located on a processing subsystem (105) which is hosted on a cloud server (108). For example, suppose the system (100) is utilized for detecting the network intrusion in the case of the UNSW NB-15 dataset; then the input data collection module (110) collects the intrusion detection dataset from a corresponding research laboratory (102). Here, the UNSW NB-15 dataset has 42 features.
[0032] Once the dataset is collected, a data pre-processing module (120) pre-processes the one or more intrusion detection datasets collected by using one or more data pre-processing techniques. For example, the one or more data pre-processing techniques may include at least one of a logarithmic scaling technique, minimum-maximum normalization technique or a combination thereof. Upon processing of the intrusion dataset, a feature selection module (130) selects one or more features from corresponding one or more intrusion detection datasets using one or more feature selection techniques. In the example used herein, the one or more feature selection techniques may include at least one of fisher score, mutual information, spearman correlation or a combination thereof. Nine features were selected based on mutual information from the set of 42 features found in the UNSW NB-15 dataset.
[0033] Again, upon feature selection, a stacking ensemble model designing module (140) develops a base classification model for at least one of a binary classification, a multiclass classification or a combination thereof of each of one or more selected features using one or more base classifiers. For example, the one or more base classifiers may include at least one of a binary classifier, a multiclass classifier or a combination thereof. In such an example, the binary classifier may include a binary locally deep support vector machine (BLDSVM), a binary logistic regression (BLR) and a binary Bayes point machine (BBPM). Again, the multiclass classifier includes a multiclass neural network (MNN), a multiclass logistic regression (MLR), and a multiclass decision forest (MDF).
[0034] Based on a classification result obtained from the base classification model, the stacking ensemble model designing module (140) generates a meta-classification model using a decision jungle classifier to predict one or more attack types for the network intrusion detection. In the example used herein, the one or more attack type classes may include normal, analysis, backdoor, denial of service, exploits, fuzzers, generic, reconnaissance, shellcode, worms and the like. For example, for the UNSW NB-15 dataset, a neural network classifier used in the proposed work performed 100 iterations during the learning process. The learning rate is a significant hyper-parameter, since it determines how efficiently the neural network model adapts itself during the learning process. A small value of 0.1 is set as the learning rate with respect to the UNSW NB-15 dataset, whereas 0.2 is assigned as the learning rate.
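As a non-limiting illustrative sketch, the data flow of stacking, in which the outputs of the base classifiers become the input features of the meta-classifier, may be expressed as follows. The classes below are toy stand-ins and are assumptions: `ThresholdClassifier` is not one of the base classifiers named above, and `MajorityTableMeta` is a simple lookup table, not a decision jungle. Training the meta-classifier on a held-out split is also an assumption, and all data values are hypothetical.

```python
from collections import Counter, defaultdict

class ThresholdClassifier:
    """Toy base classifier: predicts 1 (attack) when one feature exceeds a threshold."""

    def __init__(self, feature_index):
        self.feature_index = feature_index
        self.threshold = 0.0

    def fit(self, rows, labels):
        # Place the threshold midway between the two class means of the feature.
        pos = [r[self.feature_index] for r, y in zip(rows, labels) if y == 1]
        neg = [r[self.feature_index] for r, y in zip(rows, labels) if y == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, rows):
        return [1 if r[self.feature_index] > self.threshold else 0 for r in rows]

class MajorityTableMeta:
    """Toy meta-classifier (stand-in for a decision jungle): maps each
    combination of base predictions to the majority label seen in training."""

    def fit(self, meta_rows, labels):
        votes = defaultdict(Counter)
        for row, y in zip(meta_rows, labels):
            votes[tuple(row)][y] += 1
        self.table = {key: c.most_common(1)[0][0] for key, c in votes.items()}
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, meta_rows):
        return [self.table.get(tuple(row), self.default) for row in meta_rows]

def base_outputs(base_models, rows):
    """Stack base predictions column-wise: one meta-feature per base model."""
    per_model = [m.predict(rows) for m in base_models]
    return [list(p) for p in zip(*per_model)]

# Hypothetical two-feature flows; label 1 = attack, 0 = normal.
base_rows = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
base_labels = [0, 0, 1, 1]
meta_rows = [[0.15, 0.1], [0.85, 0.95], [0.1, 0.3], [0.9, 0.7]]
meta_labels = [0, 1, 0, 1]

base_models = [ThresholdClassifier(0).fit(base_rows, base_labels),
               ThresholdClassifier(1).fit(base_rows, base_labels)]
meta_model = MajorityTableMeta().fit(base_outputs(base_models, meta_rows), meta_labels)

new_flows = [[0.05, 0.05], [0.95, 0.95]]
predictions = meta_model.predict(base_outputs(base_models, new_flows))
```

In this sketch the meta-classifier never sees the raw features; it learns only from what the base classifiers predicted, which is how one classifier's errors can be offset by the others.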
[0035] Further, a model performance evaluation module (150) performs a statistical significance test for comparison of performance of each of the one or more base classifiers on the intrusion detection dataset. While building machine learning models, it often becomes necessary to compare the performance of classifiers, and a reliable way to achieve this is to perform statistical significance tests. For example, the significance test may include, but is not limited to, a Friedman test and Nemenyi post-hoc tests. The system (100) further includes a virtual machine creation module (160) which is configured to create a virtual machine environment for execution of a corresponding stacking ensemble model to identify the one or more attack types for the network intrusion detection. On virtual machine 1 (VM 1), the stacking ensemble operates to identify all the attack types found in the UNSW NB-15 dataset. Such VM 1 provides a dedicated operating environment to users for mitigating various attack types belonging to heterogeneous data sources.
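As a non-limiting illustrative sketch, the Friedman statistic and the Nemenyi critical difference may be computed from a table of classifier scores as follows. The scores are hypothetical and are not results of the disclosed system. The statistic is computed without a tie correction, and the value q = 2.343 is the commonly tabulated Nemenyi critical value for three classifiers at a significance level of 0.05.

```python
import math

def average_ranks(scores):
    """scores[i][j] is the score of classifier j on dataset i (higher is better).

    Returns the mean rank of each classifier; rank 1 is best and tied
    scores share the average of the ranks they span.
    """
    n, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1 .. end+1
            for j in order[pos:end + 1]:
                totals[j] += shared
            pos = end + 1
    return [t / n for t in totals]

def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)

def nemenyi_critical_difference(k, n, q_alpha):
    """Two classifiers differ significantly if their mean ranks differ by more than this."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))

# Hypothetical accuracies: rows are three datasets, columns are three classifiers.
scores = [[0.90, 0.95, 0.99],
          [0.88, 0.93, 0.97],
          [0.91, 0.94, 0.98]]
ranks = average_ranks(scores)
statistic = friedman_statistic(scores)
cd = nemenyi_critical_difference(k=3, n=3, q_alpha=2.343)
```

The Friedman test indicates whether any difference exists among the classifiers, and the Nemenyi critical difference is then used to decide which pairs of mean ranks differ. With only a few datasets the chi-square approximation is coarse, so exact tables may be preferable.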
[0036] In addition, the processing subsystem (105) further includes a stack ensemble model deployment module (170) configured to enable deployment of the generated stack ensemble model as a web service for the network intrusion detection, wherein the stack ensemble model includes the base classification model and the meta-classification model. Thus, when the system (100) is executed on the UNSW NB-15 dataset with a 60% training dataset and a 40% testing dataset and with 42 features, the accuracy achieved is 92.9%, the false positive rate (FPR) is 5.2% and the testing time is 30 seconds. Similarly, for 9 features, the accuracy achieved is 99.8%, the FPR is 0.38% and the testing time is 4 seconds. Further, the results obtained on the testing dataset of UNSW NB-15 are as follows: accuracy is 0.998, precision is 0.998, recall is 0.999, F1-score is 0.998 and FPR is 0.38%. In addition, the confusion matrix on the testing set of UNSW NB-15 is computed as follows: attack-attack = 119232, attack-normal = 109, normal-attack = 213 and normal-normal = 55787. Therefore, the system (100) emphasizes a meta-classification approach to provide a network intrusion detection model in a cloud environment which is automated as well as deployed as a web service to mitigate security incidents in complex large-scale networks.
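As a non-limiting illustrative sketch, the reported metrics may be recomputed from the confusion matrix above. The sketch assumes that each entry is written as actual class followed by predicted class, with attack as the positive class; this reading is an assumption, and the function name is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard metrics for a binary confusion matrix (attack = positive)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
    }

# attack-attack = TP, attack-normal = FN, normal-attack = FP, normal-normal = TN
# (assumed reading of the labels as actual-predicted).
metrics = binary_metrics(tp=119232, fn=109, fp=213, tn=55787)
```

Under this reading, the accuracy, precision and recall round to 0.998, 0.998 and 0.999, and the FPR is 213 / 56000, or about 0.38%, which agrees with the figures stated above. The F1-score comes to about 0.9987.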
[0037] FIG. 3 is a block diagram of a computer or a server in accordance with an embodiment of the present disclosure. The server (200) includes processor(s) (230), and memory (210) operatively coupled to the bus (220). The processor(s) (230), as used herein, means any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing microprocessor, a reduced instruction set computing microprocessor, a very long instruction word microprocessor, an explicitly parallel instruction computing microprocessor, a digital signal processor, or any other type of processing circuit, or a combination thereof.
[0038] The memory (210) includes several subsystems stored in the form of an executable program which instructs the processor (230) to perform the method steps illustrated in FIG. 1. The memory (210) includes the processing subsystem (105) of FIG. 1. The processing subsystem (105) further has the following modules: an input data collection module (110), a data pre-processing module (120), a feature selection module (130), a stacking ensemble model designing module (140), a model performance evaluation module (150), a virtual machine creation module (160) and a stack ensemble model deployment module (170).
[0039] The input data collection module (110) is configured to collect one or more intrusion detection datasets from one or more sources. The data pre-processing module (120) is configured to pre-process the one or more intrusion detection datasets collected by using one or more data pre-processing techniques. The feature selection module (130) is configured to select one or more features from corresponding one or more intrusion detection datasets using one or more feature selection techniques. The stacking ensemble model designing module (140) is configured to develop a base classification model for at least one of a binary classification, a multiclass classification or a combination thereof of each of one or more selected features using one or more base classifiers. The stacking ensemble model designing module (140) is configured to generate a meta-classification model using a decision jungle classifier to predict one or more attack types for the network intrusion detection based on a classification result obtained from the base classification model. The model performance evaluation module (150) is configured to perform a statistical significance test for comparison of performance of each of the one or more base classifiers on each of the one or more intrusion detection datasets. The virtual machine creation module (160) is configured to create a virtual machine environment for execution of a corresponding stacking ensemble model to identify the one or more attack types for the network intrusion detection. The stack ensemble model deployment module (170) is configured to enable deployment of a stack ensemble model generated as a web-service for the network intrusion detection, wherein the stack ensemble model comprises the base classification model and the meta-classification model.
[0040] The bus (220), as used herein, refers to internal memory channels or a computer network used to connect computer components and transfer data between them. The bus (220) includes a serial bus or a parallel bus, wherein the serial bus transmits data in bit-serial format and the parallel bus transmits data across multiple wires. The bus (220), as used herein, may include, but is not limited to, a system bus, an internal bus, an external bus, an expansion bus, a frontside bus, a backside bus and the like.
[0041] FIG. 4 (a) and FIG. 4 (b) are flow charts representing the steps involved in a method (300) for network intrusion detection in accordance with an embodiment of the present disclosure. The method (300) includes collecting, by an input data collection module of a processing subsystem, one or more intrusion detection datasets from one or more sources in step 310. In one embodiment, collecting the one or more intrusion detection datasets from the one or more sources may include collecting at least one of UNSW NB-15, CICIDS 2017 and CICDDOS 2019. In such an embodiment, the one or more sources may include, but are not limited to, one or more research laboratories, universities, industries and the like.
[0042] The method (300) also includes pre-processing, by a data pre-processing module of the processing subsystem, the one or more intrusion detection datasets collected by using one or more data pre-processing techniques in step 320. In one embodiment, processing the one or more intrusion detection datasets collected by using one or more data pre-processing techniques may include processing each of the one or more datasets using at least one of a logarithmic scaling technique, minimum-maximum normalization technique or a combination thereof.
[0043] The method (300) also includes selecting, by a feature selection module of the processing subsystem, one or more features from the corresponding one or more intrusion detection datasets using one or more feature selection techniques in step 330. In some embodiments, selecting the one or more features from the corresponding one or more intrusion detection datasets using the one or more feature selection techniques may include selecting the one or more relevant features using at least one of Fisher score, mutual information, Spearman correlation or a combination thereof.
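As a non-limiting illustrative sketch, features may be ranked by mutual information with the class label, and the highest-scoring features retained, as follows. The sketch assumes discrete feature values, so continuous features would first need to be binned. The feature names and values are hypothetical, and the Fisher score and Spearman correlation are not shown.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Mutual information (in nats) between a discrete feature and the labels."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    count_x = Counter(feature)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y).
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi

def top_k_features(columns, labels, k):
    """Return the names of the k features with the highest mutual information."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical binary features for four flows; label 1 = attack, 0 = normal.
labels = [0, 0, 1, 1]
columns = {
    "feature_a": [0, 0, 1, 1],  # perfectly aligned with the label
    "feature_b": [0, 1, 0, 1],  # independent of the label
}
selected = top_k_features(columns, labels, k=1)
```

A feature that is independent of the label scores zero, so it is the first to be dropped when only the top-ranked features are kept.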
[0044] The method (300) also includes developing, by a stacking ensemble model designing module of the processing subsystem, a base classification model for at least one of a binary classification, a multiclass classification or a combination thereof of each of one or more selected features using one or more base classifiers in step 340. In one embodiment, developing the base classification model for the at least one of the binary classification, the multiclass classification or a combination thereof may include developing the base classification model using the one or more base classifiers including at least one of a binary classifier, a multiclass classifier or a combination thereof. In such embodiment, the binary classifier may include a binary locally deep support vector machine (BLDSVM), a binary logistic regression (BLR) and a binary Bayes point machine (BBPM). In another embodiment, the multiclass classifier may include a multiclass neural network (MNN), a multiclass logistic regression (MLR), and a multiclass decision forest (MDF).
[0045] The method (300) also includes generating, by the stacking ensemble model designing module of the processing subsystem, a meta-classification model using a decision jungle classifier to predict one or more attack types for the network intrusion detection based on a classification result obtained from the base classification model in step 350. In one embodiment, generating the meta-classification model using the decision jungle classifier to predict the one or more attack types may include generating the meta-classification model to predict at least one of benign, file transfer protocol-patator, secure shell-patator, denial-of-service GoldenEye, denial-of-service Hulk, denial-of-service slowloris, denial-of-service slowhttptest, bot, portscan, Heartbleed, structured query language injections, cross-site scripting, brute force, distributed denial-of-service, infiltration, lightweight directory access protocol, user datagram protocol, portmap or a combination thereof.
[0046] The method (300) also includes performing, by a model performance evaluation module of the processing subsystem, a statistical significance test for comparison of performance of each of the one or more base classifiers on each of the one or more intrusion detection datasets in step 360. In some embodiments, performing the statistical significance test for comparison of performance of each of the one or more base classifiers on each of the one or more intrusion detection datasets may include performing a significance test including, but not limited to, a Friedman test and Nemenyi post-hoc tests.
[0047] Various embodiments of the present disclosure provide an automated machine learning model to address the ongoing issues of network intrusion detection by amalgamation of cloud computing and machine learning using a concept called stacking.
[0048] Moreover, the presently disclosed system relies on pragmatic base learners to further boost the performance of the decision jungle, which serves as the meta-classifier, and combines multiple techniques, thereby improving the overall predictions. Thus, the inconsistencies of one technique are compensated by the other techniques involved in the prediction process.
[0049] Furthermore, the presently disclosed system provides an automated solution and is deployed as a web service. Whenever the web service is provisioned, it can serve as a mechanism to mitigate security incidents in complex large-scale networks. In addition, in order to offer better customization capabilities to cloud users upon subscription, one or more virtual machines are configured, each of which is dedicated to the detection of specific attack types found in the different datasets.
[0050] It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the disclosure and are not intended to be restrictive thereof.
[0051] While specific language has been used to describe the disclosure, any limitations arising on account of the same are not intended. As would be apparent to a person skilled in the art, various working modifications may be made to the method in order to implement the inventive concept as taught herein.
[0052] The figures and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, the order of processes described herein may be changed and is not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts necessarily need to be performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples.