Abstract: Embodiments of the present disclosure relate to a system (100) and a method (700) of quantification of cancer hallmark activity. The system (100) implements a high-throughput neural multi-task learning architecture to quantify all cancer hallmark activities using transcriptomic data from routine tumor biopsies. A data acquisition unit (104) receives single-cell or bulk transcriptomic profiles, and a hallmark scoring engine (110) computes digital scores using curated gene sets and applies thresholding methods to generate binary hallmark annotations. These labeled datasets train an N-TML framework (106), which learns a shared representation across all hallmarks and applies task-specific layers for independent hallmark prediction. A model training and validation unit (108) oversees training using repeated cross-validation and optimizes the network with balanced loss functions and early stopping criteria. A prediction output interface (112) displays hallmark probabilities through interpretable visualizations, and a clinical integration unit (114) correlates hallmark activity with staging, drug response, and survival outcomes.
Description:
TECHNICAL FIELD
[0001] The present disclosure relates to the field of computational oncology. More particularly, the present disclosure relates to a system and method of quantification of cancer hallmark activity.
BACKGROUND
[0002] Cancer is a highly heterogeneous disease driven by complex molecular mechanisms, yet current diagnostic methods, such as histopathological grading, immunohistochemistry, and molecular profiling, are limited in their ability to comprehensively assess these mechanisms. These approaches typically focus on isolated biological aspects and fail to quantify the full spectrum of cancer hallmark activities in a single experiment. They overlook the dynamic interactions between cancer cells and the tumor microenvironment and do not integrate transcriptomic data to capture hallmark-specific processes. Although recent AI and machine learning tools have advanced biomarker prediction and treatment target identification, existing technologies still lack a unified, high-throughput framework for simultaneously quantifying all cancer hallmarks, leaving a critical gap in enabling personalized and holistic cancer diagnostics.
[0003] To address these limitations, the present disclosure provides a novel system and method that overcome the shortcomings of the prior art.
OBJECTS OF THE PRESENT DISCLOSURE
[0004] It is a primary object of the present disclosure to provide a system that simultaneously quantifies all cancer hallmark activities using tumor transcriptomic data.
[0005] It is another object of the present disclosure to provide a system that supports precision oncology by enabling hallmark-based diagnosis and treatment planning.
SUMMARY
[0006] The present disclosure relates to the field of computational oncology. More particularly, the present disclosure relates to a system and method of quantification of cancer hallmark activity.
[0007] In an aspect, a system for quantification of cancer hallmark activity is disclosed. The system may include a processor and a memory coupled to the processor. The memory may include processor-executable instructions, which, on execution, cause the processor to execute a sequence of tasks. The system may receive transcriptomic data from a tumor biopsy sample. The system may construct a normalized gene expression profile by applying rank transformation, log2 normalization, and feature standardization to the received transcriptomic data. The system may compute a probability score corresponding to a cancer hallmark based on the normalized gene expression profile by implementing a Neural Multi-Task Learning (N-MTL) framework trained on synthetic pseudo-bulk transcriptomic datasets.
[0008] In an aspect, a method for quantification of cancer hallmark activity is disclosed. The method may begin with receiving, by the processor, transcriptomic data from a tumor biopsy sample. The method may proceed with constructing, by the processor, a normalized gene expression profile by applying rank transformation, log2 normalization, and feature standardization to the received transcriptomic data. The method may end with computing, by the processor, a probability score corresponding to a cancer hallmark based on the normalized gene expression profile by implementing a Neural Multi-Task Learning (N-MTL) framework trained on synthetic pseudo-bulk transcriptomic datasets.
BRIEF DESCRIPTION OF DRAWINGS
[0009] The accompanying drawings are included to provide a further understanding of the present disclosure and are incorporated in, and constitute a part of this specification. The drawings illustrate exemplary embodiments of the present disclosure, and together with the description, serve to explain the principles of the present disclosure.
[0010] In the figures, similar components, and/or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label with a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description applies to any one of the similar components having the same first reference label irrespective of the second reference label.
[0011] FIG. 1 illustrates an exemplary block diagram representation of a system for quantification of cancer hallmark activity, in accordance with an embodiment of the present disclosure.
[0012] FIG. 2 illustrates an exemplary representation of a workflow for cancer hallmark prediction using the system, in accordance with an embodiment of the present disclosure.
[0013] FIGs. 3A-3D illustrate exemplary representations of an evaluation of hallmark predictor performance on test and external validation datasets, in accordance with an embodiment of the present disclosure.
[0014] FIG. 4 illustrates an exemplary representation of probability distribution plots depicting predicted probabilities for hallmark-specific signatures shown across normal and cancer datasets, in accordance with an embodiment of the present disclosure.
[0015] FIGs. 5A-5D illustrate exemplary representations of heatmaps demonstrating the association between hallmark activity and clinical cancer staging in TCGA datasets, in accordance with an embodiment of the present disclosure.
[0016] FIG. 6 illustrates an exemplary representation of an impact of cancer drugs on hallmarks of cancer and overall patient survival, progression-free survival, and disease-free survival, in accordance with an embodiment of the present disclosure.
[0017] FIG. 7 illustrates an exemplary flowchart representation of a method for quantification of cancer hallmark activity, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
[0018] The following is a detailed description of embodiments of the disclosure depicted in the accompanying drawings. The embodiments are in such detail as to communicate the disclosure. However, the amount of detail offered is not intended to limit the anticipated variations of embodiments. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present disclosure as defined by the appended claims.
[0019] FIG. 1 illustrates an exemplary block diagram representation of a system for quantification of cancer hallmark activity, in accordance with an embodiment of the present disclosure.
[0020] Illustrated in Fig. 1 is a block diagram representation of the system for quantification of cancer hallmark activity 100 (hereafter referred to as system 100).
[0021] In an embodiment of the present disclosure, the system 100 may include a processing engine 102. The processing engine 102 may include a processor 102-2 operatively coupled with a memory 102-4. The processor 102-2 may include one or more microprocessors, Digital Signal Processors (DSPs), Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), System-On-Chip (SoC) architectures, or any combination thereof, to execute machine-readable instructions stored in the memory 102-4. The processor 102-2 may enable simultaneous quantification of all cancer hallmark activities from transcriptomic data, facilitating comprehensive tumor profiling for precision oncology.
[0022] In an embodiment of the present disclosure, the memory 102-4 may include one or more non-transitory, computer-readable storage media, including, but not limited to, Random-Access Memory (RAM), Read-Only Memory (ROM), flash memory, magnetic storage, optical storage, or any combination thereof. The memory 102-4 may store machine-readable instructions executable by the processor 102-2. The memory 102-4 may further include volatile and/or non-volatile storage elements operatively coupled to the processor 102-2 to facilitate real-time processing and data retention for operation. The processing engine 102 may serve as the central control unit of the system 100, responsible for managing and coordinating all connected components.
[0023] In an embodiment of the present disclosure, the processor 102-2 may be operatively coupled with a data acquisition unit 104. The data acquisition unit 104 may be responsible for receiving transcriptomic input data derived from patient tumor biopsy samples in standardized formats such as Fragments Per Kilobase of transcript per Million mapped reads (FPKM) or Transcripts Per Million (TPM). The data acquisition unit 104 may preprocess raw sequencing reads by filtering low-quality base calls and aligning them to a reference genome using alignment tools prior to feature extraction. Gene expression matrices derived from this process may be structured into a high-dimensional input space suitable for downstream normalization. The data acquisition unit 104 may apply initial quality control filters, removing genes and samples that fall below predefined thresholds for read depth or variability. The data acquisition unit 104 may also support integration of metadata, including, but not limited to, tissue type, tumor stage, and clinical annotations, enabling contextual learning within the neural multi-task learning framework. Feature vectors prepared by the data acquisition unit 104 may be adapted for ingestion by machine learning models, including UCell-based scoring engines and deep shared encoders used in the hallmark prediction pipeline. By enabling reliable ingestion and transformation of diverse transcriptomic profiles, the data acquisition unit 104 may ensure standardized input for hallmark quantification across multiple cancer types within the system 100.
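The gene- and sample-level quality-control filtering described above may be sketched as follows. This is a minimal illustration under stated assumptions: the input is a genes-by-samples matrix of raw read counts (column sums serve as a read-depth proxy, which would not be meaningful for TPM data, whose columns all sum to one million), and the function name `qc_filter` and every threshold are hypothetical placeholders rather than values taken from the disclosure.

```python
import numpy as np

def qc_filter(expr, min_sample_depth=1e5, min_gene_mean=1.0, min_gene_var=1e-3):
    """Hypothetical QC step: drop low-depth samples, then low-expression or
    near-constant genes.

    expr: 2-D array, rows = genes, columns = samples (raw read counts assumed).
    Returns the filtered matrix plus boolean masks of kept genes and samples.
    """
    expr = np.asarray(expr, dtype=float)
    # Sample-level filter: total counts per sample as a read-depth proxy.
    keep_samples = expr.sum(axis=0) >= min_sample_depth
    kept = expr[:, keep_samples]
    # Gene-level filters on the surviving samples: mean level and variability.
    keep_genes = (kept.mean(axis=1) >= min_gene_mean) & (kept.var(axis=1) >= min_gene_var)
    return kept[keep_genes, :], keep_genes, keep_samples
```

Filtering samples before genes means gene statistics are computed only on samples that will actually be used downstream.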
[0024] In an embodiment of the present disclosure, the processor 102-2 may be operatively coupled with a Neural Multi-Task Learning (N-TML) framework 106. The N-TML framework 106 may receive standardized gene expression vectors from the data acquisition unit 104 and encode them into a shared latent space capturing a hallmark-agnostic biological representation. This shared representation may be processed through task-specific neural heads, each corresponding to one of ten cancer hallmarks, enabling concurrent yet distinct learning paths within a unified architecture. The core hallmarks may include: (1) sustaining proliferative signaling to drive uncontrolled growth; (2) evading growth suppressors to bypass regulatory constraints; (3) resisting cell death to survive environmental and intracellular stress; (4) enabling replicative immortality to achieve limitless cell division; (5) inducing angiogenesis to ensure a continuous nutrient supply through neovascularization; and (6) activating invasion and metastasis to colonize distant tissues. The hallmarks framework has since been expanded to incorporate emerging hallmarks, such as (7) deregulating cellular energetics to sustain rapid proliferation, and (8) avoiding immune destruction by escaping immune surveillance. Enabling characteristics, including (9) genome instability and mutation, which accelerate tumor evolution, and (10) tumor-promoting inflammation, which supports a microenvironment conducive to malignancy, further illustrate the complexity of cancer biology. The N-TML framework 106 may utilize dense feedforward layers with non-linear activations, optimized using the Adam optimizer with task-specific loss weighting to balance predictive performance across hallmark tasks. During training, the N-TML framework 106 may employ binary cross-entropy as the primary loss function, augmented by early stopping and learning rate decay to enhance convergence stability. 
Regularization techniques, including, but not limited to, dropout and batch normalization, may be employed to mitigate overfitting, especially in high-dimensional transcriptomic feature spaces. The N-TML framework 106 may be trained on synthetic pseudo-bulk biopsy datasets labeled via UCell scoring and Otsu’s thresholding, simulating real-world hallmark expression profiles. Through this architecture, the N-TML framework 106 may enable the system 100 to model hallmark interdependencies and produce probabilistic hallmark activation scores with high precision and biological relevance.
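The shared-encoder, multi-head layout and the task-weighted binary cross-entropy objective described above may be illustrated with a NumPy forward pass. This is a sketch under stated assumptions, not the disclosed network: a single ReLU hidden layer, hypothetical layer sizes (`N_GENES`, `N_HIDDEN`), random untrained weights, and hypothetical function names; backpropagation with Adam, dropout, and batch normalization are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
N_GENES, N_HIDDEN, N_HALLMARKS = 50, 16, 10  # hypothetical sizes

# Shared encoder: one dense ReLU layer (deeper stacks are equally possible).
W_shared = rng.normal(0.0, 0.1, size=(N_GENES, N_HIDDEN))
b_shared = np.zeros(N_HIDDEN)
# One logistic output head per hallmark, all reading the same shared embedding.
W_heads = rng.normal(0.0, 0.1, size=(N_HALLMARKS, N_HIDDEN))
b_heads = np.zeros(N_HALLMARKS)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """x: (batch, N_GENES) normalized expression -> (batch, N_HALLMARKS) probabilities."""
    h = np.maximum(0.0, x @ W_shared + b_shared)  # shared, hallmark-agnostic representation
    logits = h @ W_heads.T + b_heads              # column k comes from head k only
    return sigmoid(logits)

def multitask_bce(probs, labels, task_weights, eps=1e-7):
    """Per-hallmark binary cross-entropy averaged over samples, then combined
    as a weighted mean using `task_weights`."""
    p = np.clip(probs, eps, 1.0 - eps)
    per_task = -(labels * np.log(p) + (1 - labels) * np.log(1 - p)).mean(axis=0)
    return float(np.dot(task_weights, per_task) / np.sum(task_weights)), per_task
```

Because every head reads the same embedding `h`, gradients from all ten losses would update `W_shared`, which is how a shared trunk can encode dependencies between hallmarks.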
[0025] In an embodiment of the present disclosure, the processor 102-2 may be operatively coupled with a model training and validation unit 108. The model training and validation unit 108 may orchestrate the supervised learning process by ingesting labeled hallmark-specific synthetic datasets acquired through the data acquisition unit 104 and annotated by the hallmark scoring engine 110. The model training and validation unit 108 may partition data using stratified five-fold cross-validation repeated across multiple iterations to ensure robustness and reduce variance in performance estimates. During training, the model training and validation unit 108 may optimize the N-TML framework 106 using the Adam optimizer with a low learning rate and gradient backpropagation to minimize binary cross-entropy loss across all hallmark tasks. The model training and validation unit 108 may apply early stopping criteria based on stagnation in validation loss and incorporate a learning rate scheduler to dynamically adjust model convergence rates. Validation outputs may be quantitatively evaluated using metrics, including, but not limited to, F1-score, balanced accuracy, precision, recall, Area Under the Receiver Operating Characteristic Curve (AUROC), and Area Under the Precision-Recall Curve (AUPRC) across both internal and external datasets. The model training and validation unit 108 may further assess model generalizability using independent transcriptomic cohorts, including those from The Cancer Genome Atlas and Genotype-Tissue Expression datasets. Through these processes, the model training and validation unit 108 may ensure that the N-TML framework 106 within the system 100 achieves high sensitivity, specificity, and reproducibility in cancer hallmark quantification.
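Early stopping on validation-loss stagnation combined with a plateau-based learning-rate scheduler may be sketched in pure Python as below. The class name `EarlyStoppingWithPlateauLR`, the patience values, the decay factor, and `min_delta` are hypothetical; the disclosure does not specify them.

```python
class EarlyStoppingWithPlateauLR:
    """Hypothetical helper: tracks validation loss once per epoch, multiplies the
    learning rate by `factor` after every `lr_patience` epochs without improvement,
    and signals a stop after `stop_patience` such epochs. All defaults are
    illustrative, not values from the disclosure."""

    def __init__(self, lr=1e-4, lr_patience=3, stop_patience=8, factor=0.5, min_delta=1e-4):
        self.lr = lr
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.factor = factor
        self.min_delta = min_delta
        self.best = float("inf")
        self.epochs_since_best = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            # Meaningful improvement: remember it and reset the stagnation counter.
            self.best = val_loss
            self.epochs_since_best = 0
        else:
            self.epochs_since_best += 1
            # Plateau: decay the learning rate every `lr_patience` stagnant epochs.
            if self.epochs_since_best % self.lr_patience == 0:
                self.lr *= self.factor
        return self.epochs_since_best >= self.stop_patience
```

In a training loop, the current `lr` would be handed to the optimizer each epoch and the loop would exit once `step` returns True.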
[0026] In an embodiment of the present disclosure, the processor 102-2 may be operatively coupled with a hallmark scoring engine 110. The hallmark scoring engine 110 may compute hallmark-specific digital scores from transcriptomic data ingested via the data acquisition unit 104 using gene set activity quantification techniques, including, but not limited to, UCell. These digital scores may reflect the enrichment of hallmark-associated gene sets, each curated through literature mining and validated using hazard ratio thresholds derived from Cox proportional hazards modeling. The hallmark scoring engine 110 may apply Otsu’s thresholding method to binarize these digital scores, thereby generating ground-truth labels for supervised training within the N-TML framework 106. To ensure tissue-specific precision, the hallmark scoring engine 110 may dynamically adjust threshold values based on tissue origin metadata embedded in the input. The binarized hallmark labels may be transmitted to the model training and validation unit 108 to supervise multi-task learning and enable probabilistic output calibration. Further, the hallmark scoring engine 110 may assist in post-inference benchmarking by comparing predicted hallmark scores against empirical distributions observed in normal and malignant tissues. Through these mechanisms, the hallmark scoring engine 110 may facilitate biologically grounded supervision, contributing to the robust hallmark classification capabilities of the system 100.
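A rank-based gene set score in the spirit of UCell may be sketched as below. This is a simplified approximation, not the UCell package itself: genes are ranked per cell in decreasing expression, ranks above `max_rank` are capped at `max_rank + 1`, and the signature's rank sum is converted to a Mann-Whitney U statistic normalized by `n * max_rank`. Ties are broken by order here, and the function name `ucell_like_score` is hypothetical; the UCell documentation should be consulted for its exact tie and cap handling.

```python
import numpy as np

def ucell_like_score(expr, gene_names, signature, max_rank=1500):
    """Simplified UCell-style score for one hallmark gene set.

    expr: (n_genes, n_cells) expression matrix; gene_names: list of row names;
    signature: list of hallmark genes. Returns one score per cell, where 1.0
    means the signature genes occupy the top ranks of that cell.
    """
    expr = np.asarray(expr, dtype=float)
    idx = [gene_names.index(g) for g in signature if g in gene_names]
    n = len(idx)
    # Double argsort of -expr turns values into descending ranks (1 = highest).
    ranks = np.argsort(np.argsort(-expr, axis=0, kind="stable"), axis=0, kind="stable") + 1
    ranks = np.minimum(ranks, max_rank + 1)  # cap ranks beyond max_rank
    rank_sum = ranks[idx, :].sum(axis=0)
    u_stat = rank_sum - n * (n + 1) / 2      # Mann-Whitney U from the rank sum
    return 1.0 - u_stat / (n * max_rank)
```

Because the score depends only on within-cell ranks, it is insensitive to per-cell library-size differences.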
[0027] In an embodiment of the present disclosure, the processor 102-2 may be operatively coupled with a prediction output interface 112. The prediction output interface 112 may receive probabilistic hallmark activation scores from the N-TML framework 106 following successful inference on preprocessed transcriptomic data supplied via the data acquisition unit 104. These scores may represent independent probability estimates for the presence or absence of each of the ten cancer hallmarks, derived from hallmark-specific output layers trained by the model training and validation unit 108. The prediction output interface 112 may apply a post-processing layer to calibrate these probabilities using techniques such as Platt scaling or isotonic regression to enhance interpretability for clinical contexts. Visualization modules within the prediction output interface 112 may render these scores as interactive plots, heatmaps, or radar charts, enabling oncologists to discern hallmark activity patterns across individual tumor profiles. The prediction output interface 112 may also integrate metadata from the hallmark scoring engine 110 to annotate predictions with gene-level or pathway-level insights, reinforcing biological interpretability. The prediction output interface 112 may further support longitudinal tracking of hallmark activity when serial biopsy data is available, facilitating dynamic treatment monitoring. Through these functionalities, the prediction output interface 112 may serve as an interpretive endpoint of the system 100, translating deep learning outputs into actionable clinical intelligence.
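Of the two calibration options named above, isotonic regression may be sketched with the pool-adjacent-violators (PAV) algorithm. The function names are hypothetical, and the sketch assumes distinct training scores so that `np.interp` receives strictly increasing knots; Platt scaling would instead fit a logistic curve to the raw scores.

```python
import numpy as np

def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit of a non-decreasing map from raw hallmark
    score to observed positive rate. Returns sorted score knots and fitted values."""
    order = np.argsort(scores, kind="stable")
    x = np.asarray(scores, dtype=float)[order]
    y = np.asarray(labels, dtype=float)[order]
    # Each block holds [sum of labels, count]; merge while monotonicity is violated.
    blocks = []
    for yi in y:
        blocks.append([yi, 1.0])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = np.concatenate([np.full(int(c), s / c) for s, c in blocks])
    return x, fitted

def calibrate(new_scores, knots_x, knots_y):
    """Piecewise-linear interpolation between fitted knots, clamped at the ends."""
    return np.interp(new_scores, knots_x, knots_y)
```

Isotonic calibration makes no parametric assumption about the score-to-probability curve, at the cost of needing more held-out data than Platt scaling.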
[0028] In an embodiment of the present disclosure, the processor 102-2 may be operatively coupled with a clinical integration unit 114. The clinical integration unit 114 may aggregate probabilistic hallmark scores generated by the N-TML framework 106 and visualized via the prediction output interface 112 with patient-specific clinical metadata, including, but not limited to, American Joint Committee on Cancer (AJCC) stage, Tumor, Node, Metastasis (TNM) classification, treatment regimen, and survival outcomes. The clinical integration unit 114 may apply statistical association models, including odds ratio estimation and Kolmogorov-Smirnov testing, to evaluate correlations between hallmark activity and clinical cancer staging data sourced through the data acquisition unit 104. The clinical integration unit 114 may leverage logistic regression models trained on large-scale datasets, including, but not limited to, The Cancer Genome Atlas, to compute hallmark-specific impact scores predictive of therapeutic response and patient survival metrics, including Overall Survival, Progression-Free Survival, and Disease-Free Survival. Time-to-event modeling using Cox proportional hazards regression may be applied to assess the prognostic significance of hallmark activations across diverse cancer types. The clinical integration unit 114 may also support drug-hallmark interaction mapping by analyzing treatment outcomes in relation to predicted hallmark engagement, thereby enabling hallmark-guided therapy optimization. Integration with electronic health records may allow bidirectional data flow, enabling real-time clinical decision support based on updated molecular and treatment information. Through these processes, the clinical integration unit 114 may translate deep learning-driven hallmark quantification into clinically actionable strategies within the system 100.
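The odds ratio estimation named above may be illustrated on a 2x2 table of hallmark status versus stage group. This sketch assumes a dichotomized stage (for example, advanced versus early) and uses a Woolf log-scale confidence interval with the Haldane-Anscombe correction for zero cells; these choices and the function name are assumptions of the sketch.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for the table:

                         advanced stage   early stage
      hallmark active          a               b
      hallmark inactive        c               d

    Adds 0.5 to every cell when any cell is zero (Haldane-Anscombe correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (v + 0.5 for v in (a, b, c, d))
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, (lo, hi)
```

An interval lying entirely above 1 would indicate that the hallmark is more often active in the advanced-stage group.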
[0029] In an embodiment of the present disclosure, the processor 102-2 may be operatively coupled with a user interface 116. The user interface 116 may enable clinicians and researchers to upload transcriptomic biopsy data to the data acquisition unit 104 through a secure, interactive web-based environment. The user interface 116 may visualize hallmark probability outputs computed by the N-TML framework 106 and passed through the prediction output interface 112 using dynamic graphical elements, including, but not limited to, violin plots, heatmaps, or dimensionality reduction embeddings (e.g., UMAP or t-SNE). The user interface 116 may also allow overlaying of clinical metadata processed by the clinical integration unit 114, enabling multi-layered interpretation of hallmark activities in relation to patient-specific outcomes. Comparative analysis modules may support benchmarking of current results against public cohorts used during training and validation by the model training and validation unit 108. Through these functionalities, the user interface 116 may serve as a primary point of interaction for exploring hallmark quantification results generated across components of the system 100, including outputs from the hallmark scoring engine 110.
[0030] In an embodiment of the present disclosure, the system 100 may receive the transcriptomic data from the tumor biopsy sample, construct the normalized gene expression profile by applying rank transformation, log2 normalization, and feature standardization to the received transcriptomic data, and compute the probability score corresponding to the cancer hallmark based on the normalized gene expression profile by implementing the N-TML framework 106 trained on synthetic pseudo-bulk transcriptomic datasets. The system 100 may initiate processing by acquiring the transcriptomic data through the data acquisition unit 104, which may ensure high-quality input by applying preliminary filtering and alignment procedures. The rank transformation, log2 normalization, and feature standardization may reduce batch effects and harmonize inter-sample variability. The normalized profile may be transmitted to the N-TML framework 106, where a shared representation may be extracted across all hallmarks using dense, non-linear layers trained on synthetic pseudo-bulk transcriptomic datasets annotated via the hallmark scoring engine 110. The N-TML framework 106 may then project the shared features into hallmark-specific subnetworks to compute independent probability scores for each cancer hallmark. The model training and validation unit 108 may supervise this architecture using multi-task learning objectives, with binary cross-entropy loss minimized through the Adam optimizer and validated via repeated cross-validation to assess generalizability. The resulting hallmark probabilities may be passed to the prediction output interface 112 for visual rendering and decision support, while the clinical integration unit 114 may correlate these scores with patient outcomes, staging, and treatment response. 
The user interface 116 may expose this pipeline through a secure portal, enabling real-time clinician access to molecular insights derived from multi-hallmark quantification in the system 100.
[0031] In an embodiment of the present disclosure, the system 100 may initiate molecular analysis by ingesting transcriptomic data through the data acquisition unit 104, which may apply filtering, alignment, and pre-formatting procedures to prepare gene expression matrices for normalization. The processor 102-2 may perform transcriptomic feature normalization by applying rank transformation to remove distributional biases across samples. Log2 scaling may be used to compress the dynamic range of expression values and stabilize variance. Batch-standardization techniques may be applied to mitigate batch effects and harmonize inter-sample variability across input datasets from different experimental platforms. Following normalization, the processor 102-2 may construct a high-dimensional feature space containing gene-level expression information. Dimensionality reduction may be executed by the N-TML framework 106 through shared hidden-layer representations that capture pan-hallmark biological signals. These shared layers may be trained on pseudo-bulk data synthesized from over 3.1 million single-cell transcriptomes, enabling robust generalization across tumor types.
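The rank transformation, log2 scaling, and standardization steps may be sketched as a fit/transform object, so that new biopsies are transformed with statistics learned from the training data. The exact order and form of each step are an assumed interpretation, since the text does not specify them: genes are ranked within each sample (1 = lowest, ties broken by order), mapped to log2(1 + rank), and z-scored per gene with the training mean and standard deviation. The class name is hypothetical.

```python
import numpy as np

class HallmarkInputNormalizer:
    """Hypothetical normalizer: within-sample rank -> log2(1 + rank) -> per-gene
    z-score using mean/std fitted on training samples."""

    def _rank_log(self, expr):
        # expr: (n_samples, n_genes); a double argsort yields 0-based ranks per row.
        ranks = np.argsort(np.argsort(expr, axis=1, kind="stable"), axis=1, kind="stable") + 1
        return np.log2(1.0 + ranks)

    def fit(self, expr):
        z = self._rank_log(np.asarray(expr, dtype=float))
        self.mean_ = z.mean(axis=0)
        self.std_ = z.std(axis=0)
        self.std_[self.std_ == 0] = 1.0  # constant genes: avoid division by zero
        return self

    def transform(self, expr):
        z = self._rank_log(np.asarray(expr, dtype=float))
        return (z - self.mean_) / self.std_
```

Because the first step uses within-sample ranks, any per-sample monotone rescaling of the input (for example, a different sequencing depth or a switch from FPKM to TPM units) leaves the output unchanged, which is one way such a pipeline can dampen platform and batch differences.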
[0032] In an embodiment of the present disclosure, the system 100 may implement shared encoding layers that model hallmark co-dependencies based on multi-label hallmark presence matrices derived from UCell scoring and Otsu’s thresholding. The shared representation may be critical for capturing the biological interplay between hallmarks, which often function in interdependent or compensatory patterns. Each hallmark-specific pathway may be represented by a parallel task-specific output head in the N-TML framework 106. The processor 102-2 may execute hallmark-specific classification using these parallel heads, trained independently with binary cross-entropy loss functions. The Adam optimizer may be used to update model weights during backpropagation, supporting convergence across all tasks simultaneously. The model training and validation unit 108 may regulate training using stratified five-fold cross-validation repeated twice to ensure statistical robustness. Validation loss monitoring and learning rate scheduling may further enhance model stability during training. The hallmark scoring engine 110 may provide binary hallmark labels derived from digital scores, which act as supervision signals for the N-TML framework 106.
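Stratified five-fold cross-validation repeated twice may be sketched as a split generator. This sketch stratifies on a single binary label vector (for example, one hallmark); stratifying jointly on all ten hallmark labels would require a multi-label scheme that is not attempted here. The function name is hypothetical; scikit-learn's `RepeatedStratifiedKFold` provides an equivalent single-label utility.

```python
import numpy as np

def repeated_stratified_kfold(labels, n_splits=5, n_repeats=2, seed=0):
    """Yield (train_idx, val_idx) pairs; each class is shuffled and dealt
    round-robin across folds so every fold keeps the class proportions."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        fold_of = np.empty(len(labels), dtype=int)
        for cls in np.unique(labels):
            idx = rng.permutation(np.flatnonzero(labels == cls))
            fold_of[idx] = np.arange(len(idx)) % n_splits
        for k in range(n_splits):
            yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)
```

Each repeat reshuffles the class members, so the two repeats produce different partitions and the averaged metrics have lower variance than a single five-fold run.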
[0033] In an embodiment of the present disclosure, to enable binary label generation, hallmark gene sets may be curated from multi-source literature databases and filtered by hazard ratio thresholds using Cox proportional hazards modeling. The hallmark scoring engine 110 may compute enrichment scores using UCell and binarize them using Otsu's method for each hallmark across tissue-specific samples. These labeled datasets may be fed to the processor 102-2 to train supervised models under the multi-task paradigm. The processor 102-2 may also execute cancer-normal tissue discrimination using supervised classification layers trained on probability distributions derived from normal datasets such as Genotype-Tissue Expression (GTEx) and Atlas of Normal Tissue Expression (ANTE), contrasted with hallmark-positive profiles from The Cancer Genome Atlas (TCGA), Cancer Cell Line Encyclopedia (CCLE), and Metastatic Tumor Project 500 (MET500). Discriminative performance may be evaluated using metrics such as Area Under the Receiver Operating Characteristic Curve (AUROC) and Area Under the Precision-Recall Curve (AUPRC).
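Otsu's method, used above to binarize enrichment scores, picks the cut that maximizes the between-class variance of the score histogram. A minimal NumPy sketch follows; the bin count and function name are assumptions, and scores greater than or equal to the returned threshold would be labeled hallmark-positive.

```python
import numpy as np

def otsu_threshold(scores, n_bins=256):
    """Return the histogram edge that maximizes between-class variance
    (Otsu's method) for a 1-D array of hallmark scores."""
    scores = np.asarray(scores, dtype=float)
    hist, edges = np.histogram(scores, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)              # counts in bins 0..k (lower class)
    w1 = w0[-1] - w0                  # counts in bins k+1.. (upper class)
    m0 = np.cumsum(hist * centers)    # running sum of values in the lower class
    valid = (w0 > 0) & (w1 > 0)
    mu0 = np.where(valid, m0 / np.maximum(w0, 1), 0.0)
    mu1 = np.where(valid, (m0[-1] - m0) / np.maximum(w1, 1), 0.0)
    between = np.where(valid, w0 * w1 * (mu0 - mu1) ** 2, -1.0)
    k = int(np.argmax(between))
    return edges[k + 1]  # lower class = bins 0..k, i.e. scores below this edge
```

Applied per hallmark and per tissue, this yields data-driven cut-offs without choosing a fixed score threshold by hand; `skimage.filters.threshold_otsu` offers a library implementation of the same idea.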
[0034] In an embodiment of the present disclosure, the system 100 may compute odds ratios for hallmark-staging associations by leveraging AJCC stage and TNM classification metadata integrated via the clinical integration unit 114. These statistical models may quantify the strength of hallmark activation across tumor stages and metastatic progression levels. The clinical integration unit 114 may further correlate hallmark activity with survival metrics such as Overall Survival, Progression-Free Survival, and Disease-Free Survival. The processor 102-2 may evaluate hallmark-staging relationships by applying Kolmogorov-Smirnov testing to measure the statistical divergence of hallmark distributions across stages. The model training and validation unit 108 may assess cancer-type generalization by holding out datasets from external studies and evaluating performance via stratified k-fold cross-validation. Hallmark-labeled synthetic biopsies representing at least fourteen tumor types may be used for this purpose, to assess whether the system 100 performs consistently across diverse tissue types.
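The two-sample Kolmogorov-Smirnov statistic used above to compare hallmark distributions between stages is the largest vertical gap between the two empirical cumulative distribution functions. A minimal sketch follows (the function name is hypothetical); `scipy.stats.ks_2samp` additionally returns a p-value.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic between, e.g., hallmark probabilities of
    stage I tumors (a) and stage IV tumors (b)."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])  # the ECDF gap can only peak at observed values
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))
```

A value near 0 means the stage groups have similar hallmark distributions; a value of 1 means they do not overlap at all.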
[0035] In an embodiment of the present disclosure, the system 100 may also perform co-expression dependency learning by implementing deep feedforward layers trained on aggregated pseudo-bulk transcriptomic vectors. These deep representations may facilitate learning of complex non-linear relationships between genes involved in hallmark pathways. Multi-task objectives applied across shared and task-specific layers may help balance learning across hallmarks. Regularization techniques such as dropout and batch normalization may be applied during training to prevent overfitting. The prediction output interface 112 may display hallmark probability scores in real time through visual modalities including heatmaps, bar plots, or dimensionality reduction scatter plots. These outputs may be rendered to the clinician through the user interface 116, allowing exploration of tumor-specific hallmark activations.
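The dropout and batch normalization regularizers named above may be sketched as plain NumPy functions applied to a hidden-layer activation matrix. This shows training-mode behavior only; the running statistics that batch normalization keeps for inference are omitted, and the function names are hypothetical.

```python
import numpy as np

def batch_norm(h, gamma, beta, eps=1e-5):
    """Training-mode batch normalization: standardize each hidden unit over the
    batch, then apply the learnable scale (gamma) and shift (beta)."""
    mu = h.mean(axis=0)
    var = h.var(axis=0)
    return gamma * (h - mu) / np.sqrt(var + eps) + beta

def inverted_dropout(h, rate, rng, training=True):
    """Zero each activation with probability `rate` during training and rescale
    survivors by 1 / (1 - rate) so expected activations match inference."""
    if not training or rate == 0.0:
        return h
    mask = rng.random(h.shape) >= rate
    return h * mask / (1.0 - rate)
```

Rescaling at training time ("inverted" dropout) means no change is needed when the trained network is later used for hallmark inference.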
[0036] In an embodiment of the present disclosure, the user interface 116 may overlay hallmark predictions with clinical metadata for enhanced interpretability. The hallmark probability scores may be dynamically updated when new transcriptomic or clinical data are uploaded. The clinical integration unit 114 may compute impact scores of hallmark-specific drug regimens using logistic regression models trained on TCGA patient outcomes. Through these interconnected modules, the processor 102-2 may enable the system 100 to quantify hallmark activities with high precision, interpret them in a clinical context, and support personalized cancer therapy planning.
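Logistic regression on patient outcomes may be sketched with full-batch gradient descent. How the disclosure derives an "impact score" from the model is not specified; treating a fitted coefficient (for example, on a hallmark probability or a hallmark-by-drug term) as that score is an assumption of this sketch, as are the function name and hyperparameters.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """L2-regularized logistic regression by gradient descent.

    X: (n_patients, n_features), e.g. hallmark probabilities and treatment
    indicators; y: binary outcome such as treatment response.
    Returns (weights, bias).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))     # predicted outcome probability
        grad_w = X.T @ (p - y) / len(y) + l2 * w   # mean log-loss gradient plus L2 term
        grad_b = float(np.mean(p - y))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Each coefficient is a log odds ratio per unit change of its feature, so its sign and size indicate how strongly that feature is associated with the outcome.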
[0037] In an embodiment of the present disclosure, the system 100 may implement the high-throughput neural multi-task learning (N-TML) framework 106, which may quantify all cancer hallmark activities simultaneously using transcriptomic data obtained from routine tumor biopsies. The N-TML framework 106 may be trained on a dataset comprising nine hundred and forty-one tumor samples across fourteen tissue types and over 3.1 million cells aggregated from fifty-six studies, validated using repeated five-fold cross-validation, and further tested on independent datasets, including publicly available repositories such as The Cancer Genome Atlas (TCGA), Genotype-Tissue Expression (GTEx), Metastatic Tumor Project 500 (MET500), Personalized OncoGenomics (POG570), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), Cancer Cell Line Encyclopedia (CCLE), and Pan-Cancer Analysis of Whole Genomes (PCAWG). Validation across four prespecified groups (an internal test set, an external test set, normal samples, and cancer samples) may confirm high performance, with balanced accuracy, precision, recall, and F1 scores possibly exceeding 99% in internal evaluations and reaching at least 96.6% in external validations. Specificity analyses may reliably distinguish normal tissues, while sensitivity tests may demonstrate robust detection of hallmark activities across a wide spectrum of cancer types. The N-TML framework 106 may capture hallmark interdependencies, enabling a comprehensive molecular characterization of tumors and supporting the identification of hallmark-specific vulnerabilities for personalized therapeutic interventions.
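The per-hallmark evaluation metrics listed above may be computed as in the following sketch. The 0.5 decision threshold and the function name are assumptions; AUROC is computed as the probability that a randomly chosen positive sample receives a higher predicted probability than a randomly chosen negative one, with ties counted as one half.

```python
import numpy as np

def binary_metrics(y_true, y_prob, threshold=0.5):
    """Balanced accuracy, precision, recall, F1, and AUROC for one hallmark."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = y_prob >= threshold
    tp = np.sum(y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Pairwise comparison of every positive with every negative (ties count 1/2).
    pos, neg = y_prob[y_true], y_prob[~y_true]
    diff = pos[:, None] - neg[None, :]
    auroc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))
    return {"balanced_accuracy": (recall + specificity) / 2, "precision": precision,
            "recall": recall, "f1": f1, "auroc": auroc}
```

Balanced accuracy averages sensitivity and specificity, so it is not inflated when one class (for example, hallmark-negative samples) dominates a test set.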
[0038] In an embodiment of the present disclosure, the user interface 116 may allow oncologists to visualize hallmark activity, integrate clinical metadata, and guide treatment planning. Altogether, the system 100 may represent a foundational advancement toward embedding hallmark-based diagnostics into routine precision oncology workflows.
[0039] In an embodiment of the present disclosure, the system 100 may initiate synthetic data construction by acquiring single-cell transcriptomic datasets through the data acquisition unit 104, where each cell profile may undergo stringent quality control to filter out low-quality cells based on metrics such as mitochondrial gene content and transcript count thresholds. Post-filtering, the hallmark scoring engine 110 may compute hallmark-specific digital enrichment scores using UCell, leveraging curated gene sets associated with each hallmark. These UCell scores may be thresholded using Otsu’s method to yield binary annotations indicating the presence or absence of hallmark activity for each cell. Synthetic pseudo-bulk samples may then be constructed by aggregating hallmark-annotated single-cell profiles from individual patients, preserving intra-sample biological fidelity and inter-sample independence. The aggregated datasets may be normalized using a pipeline of rank transformation, log2 normalization, and standard scaling to prepare input matrices for training. The normalized data may be used to train the N-TML framework 106, which may learn shared and task-specific representations under multi-task learning objectives, as orchestrated by the model training and validation unit 108. For inference, new transcriptomic data may be processed through the same normalization pipeline before being passed to the trained N-TML framework 106, which may generate hallmark activation probabilities. These probabilities may be visualized by the prediction output interface 112 and explored via the user interface 116, with downstream interpretation supported by the clinical integration unit 114.
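Pseudo-bulk construction from hallmark-annotated cells may be sketched as below. The disclosure does not state exactly how annotated cells are grouped, so the labeling scheme here is an assumption: for each patient, cells annotated positive for a hallmark are summed into one pseudo-bulk sample labeled 1 and negative cells into one labeled 0, which keeps every sample within a single patient. The function name is hypothetical.

```python
import numpy as np

def make_pseudobulk(counts, patient_ids, hallmark_labels):
    """Aggregate single-cell counts into labeled pseudo-bulk samples for one hallmark.

    counts: (n_cells, n_genes) raw counts; patient_ids: per-cell patient IDs;
    hallmark_labels: per-cell 0/1 annotations from the Otsu step.
    Returns (pseudo-bulk matrix, labels, patient ID per pseudo-bulk sample).
    """
    counts = np.asarray(counts, dtype=float)
    patient_ids = np.asarray(patient_ids)
    hallmark_labels = np.asarray(hallmark_labels).astype(int)
    X, y, groups = [], [], []
    for pid in np.unique(patient_ids):
        for label in (0, 1):
            mask = (patient_ids == pid) & (hallmark_labels == label)
            if mask.any():
                X.append(counts[mask].sum(axis=0))  # sum counts over the selected cells
                y.append(label)
                groups.append(pid)
    return np.vstack(X), np.array(y), np.array(groups)
```

Returning the patient ID of each pseudo-bulk sample allows cross-validation folds to be grouped by patient, so that samples from one patient do not appear in both training and validation sets.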
[0040] FIG. 2 illustrates an exemplary representation of a workflow for cancer hallmark prediction using the system, in accordance with an embodiment of the present disclosure.
[0041] Illustrated in Fig. 2 is a representation 200 of a workflow for cancer hallmark prediction using the system 100. Single-cell data from multiple cancer types may undergo quality control to filter out low-quality cells. Hallmark-specific gene expression may then be digitally scored, producing binary annotations (Yes/No) that indicate the presence or absence of each hallmark. The annotated cells may be used to generate synthetic (pseudo-bulk) datasets for each hallmark, which may be used to train a Multi-Task Neural Network (M-TNN), corresponding to the N-TML framework 106. The M-TNN may learn a shared representation across all hallmarks, followed by hallmark-specific layers that enable prediction of each individual hallmark's activity.
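A minimal sketch of pseudo-bulk construction for one hallmark follows, assuming a raw cell-by-gene count matrix, a patient identifier per cell, and the binary hallmark annotation per cell. The disclosure does not specify how many cells form a pseudo-bulk sample or how cells are drawn, so `make_pseudobulk`, `cells_per_sample`, `samples_per_group`, and sampling with replacement are hypothetical choices. Cells are only ever pooled within a single patient, which keeps each synthetic sample tied to one patient.

```python
import numpy as np

def make_pseudobulk(counts, patient_ids, labels, cells_per_sample=50,
                    samples_per_group=2, seed=0):
    """Pseudo-bulk samples for one hallmark.

    For every patient and label (1 = hallmark present, 0 = absent), draw
    cells_per_sample cells with replacement from that patient's cells with that
    label and sum their counts. Cells are never pooled across patients.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    patient_ids = np.asarray(patient_ids)
    labels = np.asarray(labels)
    X, y, groups = [], [], []
    for pid in np.unique(patient_ids):
        for lab in (0, 1):
            pool = np.flatnonzero((patient_ids == pid) & (labels == lab))
            if pool.size == 0:
                continue  # this patient has no cells with this label
            for _ in range(samples_per_group):
                pick = rng.choice(pool, size=cells_per_sample, replace=True)
                X.append(counts[pick].sum(axis=0))
                y.append(lab)
                groups.append(pid)
    return np.array(X), np.array(y), np.array(groups)
```

The returned `groups` array records the source patient of each synthetic sample, which is useful if training and test folds are later required to be patient-disjoint.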
[0042] In an embodiment of the present disclosure, in block 202, the system 100 may acquire single-cell transcriptomic data spanning multiple cancer types through the data acquisition unit 104, aggregating high-resolution expression profiles at the cellular level. In block 204, the system 100 may perform stringent filtering and quality control to remove low-quality cells based on metrics such as mitochondrial gene content and gene expression count thresholds, thereby ensuring the integrity of downstream analysis. In block 206, the hallmark scoring engine 110 may digitally score each cell for hallmark activity by computing gene set enrichment with a method such as, but not limited to, UCell, resulting in a continuous score matrix representing hallmark intensities. In block 208, the hallmark scoring engine 110 may apply Otsu’s thresholding to these scores to produce binary annotations for each hallmark, assigning "Yes" or "No" labels according to hallmark presence or absence. In block 210, synthetic data generation may be performed by aggregating hallmark-annotated single-cell profiles into pseudo-bulk datasets that preserve biological variability across patient samples. These synthetic biopsy datasets may be supplied to the model training and validation unit 108, which may oversee the training of the N-TML framework 106 using multi-task learning objectives. In block 212, the N-TML framework 106 may extract a shared representation of transcriptomic features through a common input layer and shared hidden layers, capturing biological patterns common to all hallmarks. In block 214, hallmark-specific output layers may independently learn to classify the presence of each hallmark from the shared embedding; because every output layer draws on the same embedding, the system 100 can model hallmark interdependencies. The prediction output interface 112 may receive the probabilistic outputs from each hallmark-specific layer and prepare them for visual rendering through the user interface 116.
The clinical integration unit 114 may integrate these outputs with staging, drug-response, and survival data, enabling clinicians to interpret hallmark activity in the broader context of precision oncology.
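Block 208 can be sketched as follows: Otsu's method picks the cut-off that maximizes the between-class variance of the score histogram, and scores at or above it are labeled "Yes" (1). The histogram resolution `nbins` and the helper names `otsu_threshold` and `binarize_hallmark` are assumptions made for illustration.

```python
import numpy as np

def otsu_threshold(scores, nbins=256):
    """Otsu's threshold for a 1-D score vector (maximizes between-class variance)."""
    scores = np.asarray(scores, dtype=float)
    hist, edges = np.histogram(scores, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist).astype(float)          # cells in bins 0..k ("absent" class)
    w1 = w0[-1] - w0                            # cells in bins k+1.. ("present" class)
    s0 = np.cumsum(hist * centers)
    mu0 = np.divide(s0, w0, out=np.zeros_like(s0), where=w0 > 0)
    mu1 = np.divide(s0[-1] - s0, w1, out=np.zeros_like(s0), where=w1 > 0)
    between = w0 * w1 * (mu0 - mu1) ** 2        # proportional to between-class variance
    return edges[np.argmax(between) + 1]        # upper edge of the last "absent" bin

def binarize_hallmark(scores):
    """Return (labels, threshold) with 1 = hallmark present, 0 = absent."""
    t = otsu_threshold(scores)
    return (np.asarray(scores, dtype=float) >= t).astype(int), t

labels, t = binarize_hallmark([0.1, 0.12, 0.15, 0.8, 0.85, 0.9])
```

Because the threshold is taken from each hallmark's own score distribution, no fixed cut-off has to be chosen per gene set.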
[0043] FIGS. 3A-3D illustrate exemplary representations of an evaluation of hallmark predictor performance on test and external validation datasets, in accordance with an embodiment of the present disclosure.
[0044] Illustrated in Figs. 3A-3D are graphical representations, across all hallmarks, of a mean Receiver Operating Characteristic (ROC) curve 300A and a mean precision-recall (PR) curve 300B on intrinsic test datasets from 5-fold cross-validation repeated twice, and of an ROC curve 300C and a PR curve 300D on external validation datasets. The label for each hallmark gives the Area Under the Curve (AUC) ± standard deviation.
[0045] In an embodiment of the present disclosure, 300A and 300B may reflect intrinsic test results from the model training and validation unit 108, showing that the N-TML framework 106 achieves near-perfect true positive rates and precision across all ten hallmarks, with area under the ROC curve (AUROC) and area under the precision-recall (PR) curve (AUPRC) values approaching 1.0 and minimal standard deviations. These predictions may be derived from hallmark-annotated synthetic datasets generated via the hallmark scoring engine 110 from data supplied by the data acquisition unit 104. 300C and 300D may demonstrate external validation performance on independent datasets, with consistently high AUROC and AUPRC values indicating strong generalizability of the N-TML framework 106. The output scores may be visualized through the prediction output interface 112 and interpreted by clinicians using the user interface 116, potentially informing clinical decision-making through integration with patient metadata by the clinical integration unit 114.
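The AUROC and AUPRC values summarized in 300A-300D can be computed as below. This is a minimal sketch, not the disclosed implementation: AUROC is taken as the probability that a randomly chosen positive sample scores above a randomly chosen negative one (the Mann-Whitney formulation, quadratic in sample count and intended only for illustration), and AUPRC is estimated as average precision, which does not treat tied scores specially.

```python
def auroc(y_true, y_score):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative samples")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))

def average_precision(y_true, y_score):
    """AUPRC estimated as average precision (precision at each true positive)."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    tp, ap = 0, 0.0
    for i, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += tp / i  # precision at this cut-off
    return ap / total_pos
```

Computing these per hallmark and per cross-validation fold, then averaging, gives the mean ± standard deviation format used in the curve labels.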
[0046] FIG. 4 illustrates an exemplary representation of probability distribution plots depicting predicted probabilities for hallmark-specific signatures shown across normal and cancer datasets, in accordance with an embodiment of the present disclosure.
[0047] Illustrated in Fig. 4 is a representation 400 of probability distribution plots depicting predicted probabilities for hallmark-specific signatures across normal and cancer datasets.
[0048] In an embodiment of the present disclosure, the normal datasets in the left panel may include Genotype-Tissue Expression (GTEx) and GSE120795 (Atlas of Normal Tissue Expression, ANTE), for which the system 100 may display distinct density distributions of predicted probabilities that emphasize hallmark-specific variation in normal tissues. These predictions may be derived from gene expression data processed through the data acquisition unit 104 and scored using the hallmark scoring engine 110. The right panel may feature cancer datasets, including The Cancer Genome Atlas (TCGA), Cancer Cell Line Encyclopedia (CCLE), Personalized OncoGenomics 570 (POG570), Pan-Cancer Analysis of Whole Genomes (PCAWG), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and Metastatic Tumor Project 500 (MET500), showing shifted distributions that may indicate elevated hallmark activity in malignant states. The N-TML framework 106 may compute these hallmark probabilities, while the model training and validation unit 108 may validate their accuracy across diverse data sources. The resulting outputs may be rendered via the prediction output interface 112 and visualized through the user interface 116 for downstream interpretation by clinicians with the support of the clinical integration unit 114.
[0049] FIGS. 5A-5D illustrate exemplary representations of heatmaps demonstrating the association between hallmark activity and clinical cancer staging in TCGA datasets, in accordance with an embodiment of the present disclosure.
[0050] Illustrated in Figs. 5A-5D are heatmaps 500A, 500B, 500C, and 500D demonstrating the association between hallmark activity and AJCC stage, metastasis stage, node stage, and tumor stage, respectively, in TCGA datasets. The color intensity in each heatmap reflects the magnitude of the association, with darker shades indicating stronger associations. Asterisks (*) mark statistically significant associations (p-value < 0.05).
[0051] In an embodiment of the present disclosure, 500A may illustrate associations between AJCC stage and hallmark scores computed by the N-TML framework 106 using transcriptomic input acquired via the data acquisition unit 104 and annotated through the hallmark scoring engine 110. 500B, 500C, and 500D may show the corresponding associations for metastasis stage (M0 vs. M1), node stage (N0 to N2/3), and tumor stage (T1 to T4), respectively, where hallmark activity may intensify with advancing clinical stage. These associations may be derived through odds ratio calculations and statistical significance testing by the model training and validation unit 108, with asterisks denoting significant hallmark-stage co-occurrence. The results may be visualized via the prediction output interface 112 and explored through the user interface 116, enabling oncologists to identify stage-specific hallmark patterns and inform personalized treatment strategies within the system 100.
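One way to carry out the odds ratio calculation and significance test is sketched below for a single hallmark and a dichotomized stage (for example, advanced versus early). The 2x2 layout, the 0.5 zero-cell correction, and the choice of a Wald test on the log odds ratio are assumptions; the disclosure does not name the test used, and Fisher's exact test is a common alternative when counts are small.

```python
import math

def odds_ratio_test(a, b, c, d, correction=0.5):
    """Odds ratio and two-sided Wald p-value for a 2x2 table.

                      hallmark+   hallmark-
    advanced stage        a           b
    early stage           c           d

    If any cell is zero, `correction` is added to every cell so that the
    log odds ratio stays finite (Haldane-Anscombe correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log OR
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided normal p-value
    return math.exp(log_or), p
```

An odds ratio above 1 with p < 0.05 would correspond to a starred cell indicating hallmark activity enriched in the more advanced stage.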
[0052] FIG. 6 illustrates an exemplary representation of the impact of cancer drugs on cancer hallmarks and on overall survival, progression-free survival, and disease-free survival, in accordance with an embodiment of the present disclosure.
[0053] Illustrated in Fig. 6 is a representation 600 of the impact of cancer drugs on cancer hallmarks and on overall survival, progression-free survival, and disease-free survival. Color intensity represents the effect of each drug on distinct cancer hallmarks, with darker shades indicating higher impact scores and, potentially, a greater contribution to improving patient survival outcomes.
[0054] In an embodiment of the present disclosure, the relationship between cancer hallmark activity and therapeutic response may be demonstrated across three survival metrics: Disease-Free Survival (A), Progression-Free Survival (B), and Overall Survival (C), as analyzed by the clinical integration unit 114 of the system 100. Each heatmap may show the impact of various drugs on specific hallmark pathways, based on transcriptomic profiles acquired through the data acquisition unit 104 and annotated by the hallmark scoring engine 110. The N-TML framework 106 may compute hallmark activation probabilities, while logistic regression models trained by the model training and validation unit 108 may quantify the association between drug efficacy and hallmark modulation. Darker shades may indicate higher impact scores, suggesting that the corresponding treatment may significantly influence survival outcomes through targeted hallmark suppression or alteration. These insights may be delivered to clinicians via the prediction output interface 112 and the user interface 116, supporting data-driven selection of hallmark-guided therapeutic regimens within the system 100.
Table 1 displays sample information along with the performance evaluation of the system 100 for the prediction of 10 cancer hallmarks. The predictive performance of the model was assessed using 5-fold cross-validation repeated twice on the primary dataset and further validated on five independent external datasets. Metrics include accuracy, precision, recall, F1-score, and balanced accuracy. Sample information includes the numbers of positive and negative samples, along with the number of patients from which the samples were generated.
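The Table 1 metrics follow standard confusion-matrix definitions. A minimal sketch is given below, assuming the predicted labels are already binary (for example, hallmark probabilities cut at 0.5, which is an assumption rather than a disclosed setting). Balanced accuracy, the mean of recall and specificity, is less sensitive than plain accuracy to unequal numbers of positive and negative samples.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1-score and balanced accuracy from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
    }
```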
Table 2 displays the Kolmogorov-Smirnov (K-S) test statistic and p-value for the hallmark-specific differences in predicted probability. Higher K-S statistics indicate a greater ability of the system 100 to distinguish the activation status of a given hallmark. The p-values, reported as zero, confirm that the observed distributional differences are statistically significant, reinforcing the model's effectiveness in capturing biologically relevant hallmark variations.
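The K-S statistic reported in Table 2 is the largest vertical gap between two empirical cumulative distribution functions, here assumed to be those of the predicted probabilities for hallmark-active and hallmark-inactive samples. A dependency-free sketch of the statistic follows; for p-values, SciPy's `scipy.stats.ks_2samp` computes both the statistic and the p-value.

```python
from bisect import bisect_right

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # both empirical CDFs are right-continuous step functions, so the
    # supremum of their difference is reached at one of the observed values
    for t in xs + ys:
        fx = bisect_right(xs, t) / len(xs)
        fy = bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d
```

A value of 1.0 means the two probability distributions do not overlap at all; 0.0 means they are identical.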
Table 1
Table 2
[0055] FIG. 7 illustrates an exemplary flowchart representation of a method for quantification of cancer hallmark activity, in accordance with an embodiment of the present disclosure.
[0056] Illustrated in Fig. 7 is a representation of a method 700 of quantification of cancer hallmark activity. The method 700 may begin with receiving 702, by the processor 102-2, transcriptomic data from a tumor biopsy sample. The method 700 may proceed with constructing 704, by the processor 102-2, a normalized gene expression profile by applying rank transformation, log2 normalization, and feature standardization to the received transcriptomic data. The method 700 may end with computing 706, by the processor 102-2, a probability score corresponding to a cancer hallmark based on the normalized gene expression profile by implementing a Neural Multi-Task Learning (N-MTL) framework trained on synthetic pseudo-bulk transcriptomic datasets.
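Step 704 can be illustrated with the following sketch. The disclosure lists rank transformation, log2 normalization, and feature standardization but not their exact definitions, so the choices here are assumptions: genes are ranked within each sample (tied values share their average rank), the ranks are mapped through log2(1 + rank), and each gene is z-scored with statistics taken from the training set. The helper names `average_ranks` and `normalize_profiles` are hypothetical.

```python
import numpy as np

def average_ranks(row):
    """1-based ranks within one sample; tied values share their average rank."""
    order = np.argsort(row, kind="stable")
    sorted_vals = row[order]
    starts = np.flatnonzero(np.r_[True, sorted_vals[1:] != sorted_vals[:-1]])
    ends = np.r_[starts[1:], row.size]
    ranks = np.empty(row.size)
    ranks[order] = np.repeat((starts + ends + 1) / 2.0, ends - starts)
    return ranks

def normalize_profiles(counts, mean=None, std=None):
    """Rank-transform, log2-scale and standardize a samples x genes matrix.

    Pass the mean/std returned for the training set when normalizing new
    samples, so inference uses the same feature scaling as training.
    """
    counts = np.asarray(counts, dtype=float)
    logged = np.log2(1.0 + np.apply_along_axis(average_ranks, 1, counts))
    if mean is None or std is None:
        mean, std = logged.mean(axis=0), logged.std(axis=0)
        std = np.where(std > 0, std, 1.0)   # constant genes: avoid division by zero
    return (logged - mean) / std, mean, std

train_counts = np.array([[5, 0, 0, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
z, mu, sd = normalize_profiles(train_counts)
z_new, _, _ = normalize_profiles([[10, 0, 0, 6]], mu, sd)   # first sample, doubled
```

Because only within-sample ranks enter the pipeline, multiplying a sample's counts by a constant (as with `z_new` above) leaves its normalized profile unchanged, which is one reason a rank step can help harmonize data from different sequencing depths.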
[0057] In an embodiment of the present disclosure, the system 100 may initiate operation by acquiring transcriptomic data from tumor biopsy samples through the data acquisition unit 104, which may apply preprocessing techniques including rank transformation, log2 normalization, and standardization to harmonize input features. The hallmark scoring engine 110 may then generate digital scores for hallmark-associated gene sets using UCell-based enrichment analysis and derive binary labels via Otsu's thresholding to annotate hallmark presence across samples. These annotated samples may be aggregated into pseudo-bulk datasets and passed to the model training and validation unit 108, which may partition the data using repeated five-fold cross-validation for robust model training and evaluation.
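Repeated five-fold cross-validation (five folds, repeated twice, as reported for Table 1) can be generated as below. The generator `repeated_kfold` is a hypothetical helper. The disclosure does not state whether folds were split by sample or by patient; shuffling patient identifiers instead of sample indices would prevent pseudo-bulk samples from the same patient from appearing in both training and test folds.

```python
import random

def repeated_kfold(n_samples, k=5, repeats=2, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)                        # new random partition per repeat
        folds = [idx[f::k] for f in range(k)]   # k near-equal, disjoint folds
        for f, test in enumerate(folds):
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield r, f, sorted(train), sorted(test)
```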
[0058] In an embodiment of the present disclosure, the system 100 may implement the N-TML framework 106, which may be trained on these datasets by learning a shared feature representation through deep feedforward layers and refining hallmark-specific outputs through parallel task-specific branches. Binary cross-entropy loss may be minimized using the Adam optimizer, while early stopping may be applied to prevent overfitting and learning rate decay to stabilize convergence. At inference, the N-TML framework 106 may produce an independent probability score for each hallmark for every new input sample. The prediction output interface 112 may visualize these scores as dynamic heatmaps, bar charts, or distribution plots to aid clinical interpretation. The clinical integration unit 114 may correlate hallmark probabilities with clinical metadata such as AJCC stages, TNM classification, drug treatments, and survival outcomes using logistic regression and odds ratio estimation. The user interface 116 may provide an interactive platform for clinicians to upload transcriptomic profiles, review hallmark-specific predictions, and support therapeutic decision-making workflows powered by the system 100.
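The training scheme described above (a shared feedforward trunk, parallel hallmark-specific branches, binary cross-entropy, Adam, learning rate decay, and early stopping) is sketched below in NumPy. Layer widths, the single hidden layer per branch, full-batch updates, the exponential decay schedule, and the patience value are all assumptions, not the disclosed configuration. The shared layer receives gradient from every hallmark's loss, which is how a shared representation can capture hallmark co-dependencies, while each branch still produces its own independent probability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d_in, n_tasks, shared=32, head=8, seed=0):
    """He-initialized weights: one shared trunk plus one small branch per hallmark."""
    rng = np.random.default_rng(seed)
    def he(fan_in, shape):
        return rng.normal(0.0, np.sqrt(2.0 / fan_in), shape)
    return {"W1": he(d_in, (d_in, shared)), "b1": np.zeros(shared),          # shared layer
            "Wt": he(shared, (n_tasks, shared, head)), "bt": np.zeros((n_tasks, head)),
            "Wo": he(head, (n_tasks, head)), "bo": np.zeros(n_tasks)}        # one logit per task

def forward(p, X):
    h_pre = X @ p["W1"] + p["b1"]
    h = np.maximum(h_pre, 0.0)                                   # shared representation
    g_pre = np.einsum("nh,thk->ntk", h, p["Wt"]) + p["bt"]
    g = np.maximum(g_pre, 0.0)                                   # task-specific features
    logits = np.einsum("ntk,tk->nt", g, p["Wo"]) + p["bo"]
    return logits, (X, h_pre, h, g_pre, g)

def bce(logits, Y):
    """Mean binary cross-entropy over samples and hallmarks (stable logit form)."""
    return np.mean(np.maximum(logits, 0) - logits * Y + np.log1p(np.exp(-np.abs(logits))))

def backward(p, cache, logits, Y):
    X, h_pre, h, g_pre, g = cache
    d_logits = (sigmoid(logits) - Y) / Y.size
    d_g = np.einsum("nt,tk->ntk", d_logits, p["Wo"]) * (g_pre > 0)
    d_h = np.einsum("ntk,thk->nh", d_g, p["Wt"]) * (h_pre > 0)  # all tasks meet here
    return {"W1": X.T @ d_h, "b1": d_h.sum(0),
            "Wt": np.einsum("nh,ntk->thk", h, d_g), "bt": d_g.sum(0),
            "Wo": np.einsum("nt,ntk->tk", d_logits, g), "bo": d_logits.sum(0)}

def train(X, Y, X_val, Y_val, epochs=500, lr=1e-2, decay=0.995, patience=20, seed=0):
    """Full-batch Adam with exponential learning-rate decay and early stopping."""
    p = init_params(X.shape[1], Y.shape[1], seed=seed)
    m = {k: np.zeros_like(a) for k, a in p.items()}
    s = {k: np.zeros_like(a) for k, a in p.items()}
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    best_loss, best_p, wait = np.inf, p, 0
    for t in range(1, epochs + 1):
        logits, cache = forward(p, X)
        grads = backward(p, cache, logits, Y)
        step = lr * decay ** (t - 1)                             # learning-rate decay
        for k in p:
            m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
            s[k] = beta2 * s[k] + (1 - beta2) * grads[k] ** 2
            m_hat, s_hat = m[k] / (1 - beta1 ** t), s[k] / (1 - beta2 ** t)
            p[k] = p[k] - step * m_hat / (np.sqrt(s_hat) + eps)
        val_loss = bce(forward(p, X_val)[0], Y_val)
        if val_loss < best_loss - 1e-6:
            best_loss, best_p, wait = val_loss, {k: a.copy() for k, a in p.items()}, 0
        else:
            wait += 1
            if wait >= patience:                                 # early stopping
                break
    return best_p

def predict_proba(p, X):
    """Independent per-hallmark activation probabilities."""
    return sigmoid(forward(p, X)[0])
```

Usage, on synthetic toy data with three hypothetical "hallmarks": `params = train(X_train, Y_train, X_val, Y_val)` followed by `predict_proba(params, X_new)`, which returns one probability per sample and hallmark.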
[0059] A use case of the system 100 is described herein. A clinician may use the system 100 to assess hallmark activity in a breast cancer patient by uploading transcriptomic data through the user interface 116. The data acquisition unit 104 may preprocess the input using rank transformation, log2 normalization, and batch-standardization to produce a harmonized gene expression matrix. The hallmark scoring engine 110 may apply UCell-based enrichment analysis to compute hallmark-specific digital scores and derive binary labels via Otsu’s thresholding for internal validation. The processed input may be analyzed by the N-TML framework 106, which may infer hallmark activation probabilities using a shared encoder and task-specific neural heads trained via multi-task learning. The model training and validation unit 108 may support inference reliability with reference to metrics such as F1-score and AUROC established during prior cross-validation and external testing. The prediction output interface 112 may visualize hallmark probabilities and overlay them with treatment outcomes and cancer staging data. The clinical integration unit 114 may interpret these predictions in the context of survival risk and drug response, guiding the selection of hallmark-targeted therapies.
ADVANTAGES OF THE INVENTION
[0060] The present disclosure provides a system that enables simultaneous quantification of all cancer hallmarks from biopsy transcriptomics.
[0061] The present disclosure provides a system that supports personalized treatment strategies through hallmark-driven molecular and clinical data integration.
Claims:
1. A system (100) for quantification of cancer hallmark activity, the system (100) comprising:
a processor (102-2); and
a memory (102-4) coupled to the processor (102-2), wherein the memory (102-4) comprises processor-executable instructions, which on execution, cause the processor (102-2) to:
receive transcriptomic data from a tumor biopsy sample;
construct a normalized gene expression profile by applying rank transformation, log2 normalization, and feature standardization to the received transcriptomic data; and
compute a probability score corresponding to a cancer hallmark based on the normalized gene expression profile by implementing a Neural Multi-Task Learning (N-MTL) framework trained on synthetic pseudo-bulk transcriptomic datasets.
2. The system (100) as claimed in claim 1, wherein the processor (102-2) performs transcriptomic feature normalization and harmonizes inter-sample variability across input datasets by implementing rank transformation, log2 scaling, and batch-standardization techniques.
3. The system (100) as claimed in claim 1, wherein the processor (102-2) executes gene expression dimensionality reduction by implementing shared hidden-layer representations in a Neural Multi-Task Learning (N-MTL) architecture trained on pseudo-bulk data synthesized from single-cell transcriptomes.
4. The system (100) as claimed in claim 1, wherein the processor (102-2) performs hallmark co-dependency modeling by implementing shared encoding layers within the N-MTL framework trained on multi-label hallmark presence matrices.
5. The system (100) as claimed in claim 1, wherein the processor (102-2) executes hallmark-specific classification by implementing parallel task-specific output layers trained using binary cross-entropy loss and optimized through an Adam optimizer.
6. The system (100) as claimed in claim 1, wherein the processor (102-2) executes cancer-normal tissue discrimination by implementing supervised classification layers trained on probability distributions from Genotype-Tissue Expression (GTEx) and Atlas of Normal Tissue Expression (ANTE) normal datasets compared against hallmark-positive samples from The Cancer Genome Atlas (TCGA), Cancer Cell Line Encyclopedia (CCLE), and Metastatic Tumor Project 500 (MET500).
7. The system (100) as claimed in claim 1, wherein the processor (102-2) executes hallmark-aware tumor staging correlation by implementing odds-ratio estimation models trained on predicted hallmark vectors and American Joint Committee on Cancer (AJCC)/Tumor, Node, Metastasis (TNM) stage labels from The Cancer Genome Atlas (TCGA) datasets.
8. The system (100) as claimed in claim 1, wherein the processor (102-2) performs cancer-type generalization by implementing k-fold cross-validation with external dataset holdouts trained on hallmark-labeled synthetic biopsies representing at least fourteen tumor types.
9. The system (100) as claimed in claim 1, wherein the processor (102-2) performs co-expression dependency learning by implementing deep feedforward layers in conjunction with multitask objectives, trained on pseudo-bulk transcriptomic vectors aggregated from hallmark-labeled single-cell inputs.
10. A method (700) for quantification of cancer hallmark activity, the method (700) comprising:
receiving (702), by a processor (102-2), transcriptomic data from a tumor biopsy sample;
constructing (704), by the processor (102-2), a normalized gene expression profile by applying rank transformation, log2 normalization, and feature standardization to the received transcriptomic data; and
computing (706), by the processor (102-2), a probability score corresponding to a cancer hallmark based on the normalized gene expression profile by implementing a Neural Multi-Task Learning (N-MTL) framework trained on synthetic pseudo-bulk transcriptomic datasets.
| # | Name | Date |
|---|---|---|
| 1 | 202511064233-STATEMENT OF UNDERTAKING (FORM 3) [04-07-2025(online)].pdf | 2025-07-04 |
| 2 | 202511064233-POWER OF AUTHORITY [04-07-2025(online)].pdf | 2025-07-04 |
| 3 | 202511064233-FORM FOR SMALL ENTITY(FORM-28) [04-07-2025(online)].pdf | 2025-07-04 |
| 4 | 202511064233-FORM 1 [04-07-2025(online)].pdf | 2025-07-04 |
| 5 | 202511064233-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [04-07-2025(online)].pdf | 2025-07-04 |
| 6 | 202511064233-EVIDENCE FOR REGISTRATION UNDER SSI [04-07-2025(online)].pdf | 2025-07-04 |
| 7 | 202511064233-EDUCATIONAL INSTITUTION(S) [04-07-2025(online)].pdf | 2025-07-04 |
| 8 | 202511064233-DRAWINGS [04-07-2025(online)].pdf | 2025-07-04 |
| 9 | 202511064233-DECLARATION OF INVENTORSHIP (FORM 5) [04-07-2025(online)].pdf | 2025-07-04 |
| 10 | 202511064233-COMPLETE SPECIFICATION [04-07-2025(online)].pdf | 2025-07-04 |
| 11 | 202511064233-FORM-9 [18-07-2025(online)].pdf | 2025-07-18 |
| 12 | 202511064233-FORM-8 [21-07-2025(online)].pdf | 2025-07-21 |
| 13 | 202511064233-FORM 18A [21-07-2025(online)].pdf | 2025-07-21 |
| 14 | 202511064233-EVIDENCE OF ELIGIBILTY RULE 24C1f [21-07-2025(online)].pdf | 2025-07-21 |
| 15 | 202511064233-Proof of Right [23-07-2025(online)].pdf | 2025-07-23 |