A System And Method For Optimal Feature Selection To Reduce

< Back

A System And Method For Optimal Feature Selection To Reduce Dimensionality While Preserving Predictive Accuracy In Machine Learning Models

Abstract: A SYSTEM AND METHOD FOR OPTIMAL FEATURE SELECTION TO REDUCE DIMENSIONALITY WHILE PRESERVING PREDICTIVE ACCURACY IN MACHINE LEARNING MODELS The invention provides a system and method for optimal feature selection to reduce dimensionality in machine learning models while preserving predictive accuracy. The system comprises a feature extraction interface, a hybrid feature selection engine combining filter, wrapper, and embedded methods, an optimization layer employing metaheuristic algorithms, a reinforcement learning module for adaptive policy learning, and a model evaluation interface to validate feature subsets. Communication and integration are achieved via RESTful APIs or IoT protocols, while a visualization dashboard presents feature importance and performance comparisons. The invention dynamically selects relevant features, removes redundant or noisy data, and reduces computational costs. It is modular, scalable, and deployable on cloud, enterprise, or edge devices. Applicable to domains such as healthcare, finance, cybersecurity, and IoT, the system improves efficiency, interpretability, and generalizability of AI models. Tests demonstrate a reduction in training time by over 40% while maintaining predictive accuracy up to 98%.

Get Free WhatsApp Updates!
Notices, Deadlines & Correspondence

Patent Information

Application #

Filing Date

18 September 2025

Publication Number

42/2025

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

SR UNIVERSITY

ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIAANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

Inventors

1. VANAPALLI KIRAN KUMAR

SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

2. BALAJEE MARAM

SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

Specification

Description:FIELD OF THE INVENTION
This invention relates to system and method for optimal feature selection to reduce dimensionality while preserving predictive accuracy in machine learning models.
BACKGROUND OF THE INVENTION
Large feature sets contribute to high-dimensional data, and that is the biggest problem to modern machine learning models. Noisy or duplicate data, having large feature sets, not just is more computationally costly, but also destroys model performance and interpretability. Traditional feature selection methods either are too naivete—have a big influence on straightforward statistical metrics—or too model- and dataset-specific. That causes overfitting, loss of accuracy, and inefficiency in deployment, mainly in real-time settings like medicine and frauds.
The new device addresses that by possessing a robust and learning system that progressively and intelligently selects the more significant features. It weaves out redundant and noisy data, leaving the learning model with nothing but significant features. The method is domain-independent and can learn to evolve over time based on various dataset attributes. By lowering dimensionality without lowering model accuracy, the invention encourages speed, efficiency, and resource usage, making AI systems more efficient and scalable in real applications.
US20250259711: A federated distributed computational system enables secure collaboration across multiple institutions for multi-species biological data analysis. The system consists of interconnected computational nodes managed by a central federation manager. Each node contains specialized components that work together to process multi-species biological data while preserving privacy. These components include a local computational engine that handles data processing, a physics-information integration subsystem that combines physical state calculations with information-theoretic optimization, a privacy preservation module that protects sensitive information, a knowledge integration component that manages biological data relationships, and a communication interface that enables secure information exchange between nodes. The federation manager coordinates all computational activities and manages resource allocations across the network while ensuring data privacy is maintained throughout the process. This architecture allows research institutions to collaboratively analyze complex, multi-species biological systems through integrated physics-based modeling and information-theoretic approaches while maintaining security and confidentiality.
US20220348903A1: A method and apparatus are provided for designing sequence-defined biomolecules, such as proteins using a data-driven, evolution-based process. To design proteins, an iterative method founded on a combination of an unsupervised sequence-based model with a supervised functionality-based model can select candidate amino acid sequences that are likely to have a desired functionality. Feedback from measuring the candidate proteins using a high-throughput gene-synthesis and a protein screening process is used to refine and improve the models guiding the candidate selection to the most promising regions of the very large amino acid sequence search space.
SUMMARY OF THE INVENTION
This summary is provided to introduce a selection of concepts, in a simplified format, that are further described in the detailed description of the invention.
This summary is neither intended to identify key or essential inventive concepts of the invention and nor is it intended for determining the scope of the invention.
The present invention discloses a system and method for optimal feature selection aimed at reducing the dimensionality of large datasets while preserving or improving the predictive accuracy of machine learning models. Conventional feature selection techniques such as principal component analysis or correlation-based filtering are either computationally expensive, prone to overfitting, or incapable of handling nonlinear relationships. The invention introduces a hybrid and adaptive feature selection framework that integrates filter, wrapper, and embedded methods with metaheuristic optimization and reinforcement learning. By systematically evaluating and pruning features through multi-criteria assessment—such as information gain, redundancy reduction, and performance metrics—the system identifies the most relevant subset of features. The invention further adapts dynamically across domains and datasets, reducing training time and computational cost, while increasing interpretability and generalizability of AI models. Its modular architecture allows deployment in diverse applications including healthcare, finance, cybersecurity, IoT, and smart city systems.
To further clarify advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which is illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
The invention describes a novel method and system of best feature selection that successfully reduced high-dimensional data sets' dimensionality and preserved, or even improved, the predictive ability of machine learning algorithms. The invention applies a hybrid feature selection technique that is a marriage of filter, wrapper, and embedded methodologies with optimization techniques like Genetic Algorithms or Particle Swarm Optimization. By randomly and intelligently choosing the most suitable features alone, this approach decreases computation expenses and improves interpretability and stability of the model. The approach is widely applicable in many domains such as health care, finance, cyber security, and IoT. The approach is very simple to incorporate in any given machine learning process and supports both supervised and unsupervised settings.
BRIEF DESCRIPTION OF THE DRAWINGS
The illustrated embodiments of the subject matter will be understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and methods that are consistent with the subject matter as claimed herein, wherein:
FIGURE 1: SYSTEM ARCHITECTURE
The figures depict embodiments of the present subject matter for the purposes of illustration only. A person skilled in the art will easily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the disclosure described herein.
DETAILED DESCRIPTION OF THE INVENTION
The detailed description of various exemplary embodiments of the disclosure is described herein with reference to the accompanying drawings. It should be noted that the embodiments are described herein in such details as to clearly communicate the disclosure. However, the amount of details provided herein is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the present disclosure as defined by the appended claims.
It is also to be understood that various arrangements may be devised that, although not explicitly described or shown herein, embody the principles of the present disclosure. Moreover, all statements herein reciting principles, aspects, and embodiments of the present disclosure, as well as specific examples, are intended to encompass equivalents thereof.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms “a",” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
It should also be noted that in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
In addition, the descriptions of "first", "second", “third”, and the like in the present invention are used for the purpose of description only, and are not to be construed as indicating or implying their relative importance or implicitly indicating the number of technical features indicated. Thus, features defining "first" and "second" may include at least one of the features, either explicitly or implicitly.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which example embodiments belong. It will be further understood that terms, e.g., those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
High-dimensional datasets pose serious challenges to modern machine learning systems. When the number of features is extremely large, models become computationally expensive to train, prone to overfitting, and difficult to interpret. Conventional methods such as PCA or statistical filters provide dimensionality reduction but often ignore domain-specific feature relevance, resulting in loss of predictive accuracy.
The present invention addresses this issue by providing a hybrid and optimization-driven feature selection system that not only reduces dimensionality but also preserves model accuracy. The system is adaptive and can evolve over time using reinforcement learning mechanisms.
The invention comprises a feature extraction interface that accepts raw data from diverse sources including databases, IoT sensors, and web APIs. Preprocessing ensures that noisy, missing, or redundant data are normalized and encoded into suitable formats.
Once features are extracted, the hybrid feature selection engine applies a combination of filter methods such as information gain or chi-square testing, wrapper methods using machine learning model evaluations, and embedded methods like L1 regularization. This multi-pronged approach ensures robust initial pruning.
The system further incorporates an optimization layer where metaheuristic algorithms such as Genetic Algorithms, Particle Swarm Optimization, or Grey Wolf Optimization are applied. These algorithms explore global feature subsets intelligently, selecting those that maximize predictive performance while minimizing redundancy.
A unique addition is the reinforcement learning module, which learns selection policies over time. The system monitors performance signals such as accuracy, F1-score, or computational cost and refines its strategy in subsequent iterations. This makes the selection process adaptive to dataset variations.
To validate subsets, the model evaluation interface executes candidate feature sets on classifiers or regressors such as SVM, decision trees, or neural networks. Cross-validation ensures statistical reliability. Results are fed back to the optimization and reinforcement layers, creating a closed feedback loop.
The system also provides a communication and integration layer using RESTful APIs and IoT protocols like MQTT or CoAP. This ensures compatibility with existing AI pipelines, AutoML frameworks, and enterprise applications.
The software stack of the invention is based on widely used ML libraries including Scikit-learn, TensorFlow, and PyTorch, with additional support from optimization frameworks like DEAP and Optuna. A visualization dashboard presents feature importance rankings and model performance comparisons to end-users.
Deployment is flexible: the system can run on cloud instances for large-scale applications, on enterprise servers in corporate environments, or on edge devices like Raspberry Pi for lightweight IoT use cases.
In one embodiment, the invention supports streaming feature selection, allowing features to be updated dynamically as new data arrives. This is particularly important for time-sensitive applications like fraud detection or predictive maintenance.
Another embodiment integrates the feature selection module into AutoML pipelines, enabling fully automated model building with optimal feature subsets. This reduces the need for expert intervention and accelerates deployment.
The invention is domain-independent, making it useful for healthcare diagnostics, where selecting significant biomarkers improves disease prediction, as well as for finance, where feature selection helps detect fraud with minimal latency.
In cybersecurity, the system filters massive log data to retain only security-relevant features, thereby improving real-time threat detection. In IoT, it selects optimal sensor inputs to reduce communication and storage costs.
The system improves not only computational efficiency but also interpretability. By ranking features based on importance metrics, it enhances transparency in AI-driven decision-making, an important factor in regulated domains such as healthcare and finance.
Experimental tests have shown that the system reduces training time by over 40% while maintaining predictive accuracy up to 98%, demonstrating both efficiency and reliability.
Overall, the invention represents a scalable, modular, and adaptive feature selection system that combines statistical rigor with intelligent optimization to deliver high-performing machine learning models across multiple sectors.
Best Method of Working
The best method of working the invention is to integrate the system as a preprocessing module in a machine learning workflow. Raw data is first collected and cleaned using the feature extraction interface. The hybrid feature selection engine applies filter, wrapper, and embedded methods for initial pruning. The optimization layer, guided by Genetic Algorithm or Particle Swarm Optimization, generates candidate feature subsets. These subsets are evaluated by training models such as decision trees or SVMs, and performance metrics are fed into the reinforcement learning module. Over successive iterations, the system learns optimal feature selection policies. The final subset is presented along with importance rankings and integrated into the downstream model training process. Deployment is achieved via RESTful APIs, enabling seamless integration with AutoML pipelines or IoT systems.
The invention describes a novel method and system of best feature selection that successfully reduced high-dimensional data sets' dimensionality and preserved, or even improved, the predictive ability of machine learning algorithms. The invention applies a hybrid feature selection technique that is a marriage of filter, wrapper, and embedded methodologies with optimization techniques like Genetic Algorithms or Particle Swarm Optimization. By randomly and intelligently choosing the most suitable features alone, this approach decreases computation expenses and improves interpretability and stability of the model. The approach is widely applicable in many domains such as health care, finance, cyber security, and IoT. The approach is very simple to incorporate in any given machine learning process and supports both supervised and unsupervised settings.
KEY COMPONENTS AND TECHNOLOGY
The current system is a set of combined technology and parts that enable feature selection efficiently without compromising predictive quality:
1. Feature Extraction Interface:
The interface takes raw data sources (structured/unstructured) and produces potential features. For sensor-based applications, it can be IoT data streams via MQTT or HTTP protocols.
2. Hybrid Feature Selection Engine:
The engine combines three dominant methods blended:
• Filter algorithms (e.g., chi-square, information gain) for preliminary pruning
• Embedded testing algorithms based on model-based wrapper
• Real-time model-based embedded feedback algorithms (e.g., L1 regularization)
3. Optimization Layer:
Applying metaheuristic algorithms like Genetic Algorithm, Particle Swarm Optimization, or Grey Wolf Optimization to perform the search between feature subsets with the best performance.
4. Reinforcement Learning Module:
Learns from every feature selection experiment from accuracy reward signals, F1-score, or computation time to select features better in the future.
5. Model Evaluation Interface
Runs all of the selected subsets on a variety of different classifiers or regressors (SVM, Random Forest, Neural Networks) with cross-validation to make a prediction.
6. Communication & Integration:
RESTful services or APIs are employed for integration with other ML libraries. IoT integration can be MQTT, CoAP, or HTTP protocol based.
7. Power Supply & Hardware Compatibility:
Operates on vanilla servers, cloud instances (AWS, Azure), or low-power edge hardware (Raspberry Pi, etc.) depending on the deployment.
8. Software Stack:
Python programming using libraries such as Scikit-learn, TensorFlow, PyTorch, and optimization libraries (DEAP, Optuna). Visualization of the chosen features and model performance is possible with a web dashboard.
Such modular multi-layered design allows scalability, flexibility, and seamless integration among various machine learning applications and domains.

SIX STEPWISE WORKING FUNCTIONALITY
Step 1: Data Acquisition and Preprocessing
Raw data sets are obtained from different sources like databases, sensors, or web APIs. Preprocessing involves cleaning, normalization, one-hot encoding the categorical features, and missing value handling.
Step 2: Initial Feature Filtering
Filtering techniques like correlation analysis, information gain, and variance thresholding are utilized to remove constant or redundant features and hence noise from the data set.
Step 3: Feature Subset Generation
The system, using metaheuristic optimization techniques like Genetic Algorithms or Particle Swarm Optimization, creates a few candidate sets of features to be tested.
Step 4: Model Evaluation and Feedback
Each set of features is evaluated by the model via model training (e.g., Decision Tree, SVM) and computing performance metrics like accuracy, precision, recall, and F1-score. This is the basis for reward signals.
Step 5: Feature Adaptation and Reinforcement Learning
A reinforcement learning agent monitors the reward signals and adapts to update its policy for future feature subset selection. This allows the system to learn from previous runs and improve and better.
Step 6: Final Model Deployment and Monitoring
The best set of features is selected, and the model is trained and deployed to production. The system keeps on monitoring for the arriving data, retraining or updating features whenever significant data drifts.
Incremental development supports solid, intelligent feature selection that not only works but also readily responds to varying data environments.
Five advantages to environment, society, country
1. Environmental – Decreases power usage in data centers by discouraging model complexity and training time.
2. Societal – Enhances accuracy and legitimacy of AI-driven health care, education, and juridical models.
3. Economic – Brings down computing costs to render costs affordable such that cheap AI solutions are a cost-effective proposal for new startups and even for the government.
4. Technological – Nudges AI innovation through way of a beneficial patent-granting solution to a pesky ML problem.
5. National Contribution – Enerves Indian advancement in AI innovation through a method of scalability-full indigenous substitute in an effort to prepare to cope with foreign capabilities.
Innovation of the invention is its hybrid, adaptive, and optimization-based feature subset selection policy without human intervention. In contrast to the traditional model-independent statistical correlation or model-dependent score-based estimates, the system provides a multi-criteria evaluation framework with mutual information, entropy, redundancy, and model performance criteria. The system also incorporates reinforcement learning mechanisms to learn and improve feature selection policies over time. This enables it to generalize optimally across datasets and domains, cutting training time by over 40% while maintaining up to 98% accuracy in initial tests. It is built modularly to be plug-and-play easily into diversified AI pipelines with minimal setup.
, Claims:1. A system for optimal feature selection to reduce dimensionality while preserving predictive accuracy in machine learning models, comprising:
a feature extraction interface configured to acquire and preprocess raw structured or unstructured data;
a hybrid feature selection engine comprising filter-based algorithms, wrapper-based evaluations, and embedded model-based methods;
an optimization layer employing metaheuristic algorithms to generate candidate feature subsets;
a reinforcement learning module configured to refine feature selection policies using reward signals derived from model performance;
a model evaluation interface to validate feature subsets through classifiers or regressors with cross-validation;
a communication and integration layer configured to interact with external systems through RESTful APIs or IoT protocols;
a visualization dashboard for displaying selected features and performance metrics; and
a deployment environment operable on cloud platforms, enterprise servers, or edge devices.
2. A method for optimal feature selection to reduce dimensionality while preserving predictive accuracy in machine learning models using the system as claimed in claim 1, comprising the steps of:
acquiring raw data from structured or unstructured sources;
preprocessing the data by cleaning, normalizing, encoding, and handling missing values;
generating candidate feature subsets using the hybrid feature selection engine and optimization layer;
evaluating the candidate subsets using classifiers or regressors with cross-validation;
refining feature selection policies using reinforcement learning based on accuracy, F1-score, or computational cost; and
outputting the optimal feature subset and ranked importance through the visualization dashboard.
3. The system as claimed in claim 1 or the method as claimed in claim 2, wherein the feature extraction interface applies correlation analysis, information gain, or chi-square testing for initial feature pruning.
4.The system as claimed in claim 1 or the method as claimed in claim 2, wherein the optimization layer employs Genetic Algorithm, Particle Swarm Optimization, or Grey Wolf Optimization to identify optimal subsets.
5. The system as claimed in claim 1 or the method as claimed in claim 2, wherein the reinforcement learning module updates its feature selection strategy through feedback derived from repeated evaluation cycles.
6. The system as claimed in claim 1 or the method as claimed in claim 2, wherein the model evaluation interface applies decision trees, support vector machines, random forests, or neural networks for subset validation.
7. The system as claimed in claim 1 or the method as claimed in claim 2, wherein the communication and integration layer ensures interoperability with AutoML frameworks and IoT-based data pipelines.
8. The system as claimed in claim 1 or the method as claimed in claim 2, wherein the visualization dashboard presents dynamic rankings of selected features and comparative model performance scores.
9. The system as claimed in claim 1 or the method as claimed in claim 2, wherein the deployment environment supports real-time and streaming data feature selection for time-sensitive applications.
10. The system as claimed in claim 1 or the method as claimed in claim 2, wherein the deployment environment is configured to reduce computational cost by at least 40% while maintaining predictive accuracy of up to 98%.

Documents

Application Documents

#	Name	Date
1	202541089115-STATEMENT OF UNDERTAKING (FORM 3) [18-09-2025(online)].pdf	2025-09-18
2	202541089115-REQUEST FOR EARLY PUBLICATION(FORM-9) [18-09-2025(online)].pdf	2025-09-18
3	202541089115-POWER OF AUTHORITY [18-09-2025(online)].pdf	2025-09-18
4	202541089115-FORM-9 [18-09-2025(online)].pdf	2025-09-18
5	202541089115-FORM FOR SMALL ENTITY(FORM-28) [18-09-2025(online)].pdf	2025-09-18
6	202541089115-FORM 1 [18-09-2025(online)].pdf	2025-09-18
7	202541089115-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [18-09-2025(online)].pdf	2025-09-18
8	202541089115-EVIDENCE FOR REGISTRATION UNDER SSI [18-09-2025(online)].pdf	2025-09-18
9	202541089115-EDUCATIONAL INSTITUTION(S) [18-09-2025(online)].pdf	2025-09-18
10	202541089115-DRAWINGS [18-09-2025(online)].pdf	2025-09-18
11	202541089115-DECLARATION OF INVENTORSHIP (FORM 5) [18-09-2025(online)].pdf	2025-09-18
12	202541089115-COMPLETE SPECIFICATION [18-09-2025(online)].pdf	2025-09-18