AACR 2026 Highlights AI Innovations for Cancer Diagnostics Using Pathomics and DNA Methylation
The American Association for Cancer Research Annual Meeting 2026 (AACR 2026) in San Diego, California featured significant strides in applying artificial intelligence (AI) to clinical oncology. Researchers unveiled computational platforms designed to solve persistent clinical challenges in cancer care. Dr. Marco A. De Velasco from Kindai University presented a machine learning model that utilizes DNA methylation patterns to identify the origin of cancers of unknown primary. Dr. Rukhmini Bandyopadhyay from The University of Texas MD Anderson Cancer Center introduced a deep learning pathomics platform capable of predicting immunotherapy responses in patients with non-small cell lung cancer (NSCLC). Both tools leverage complex biological data to guide personalized treatment decisions and improve patient outcomes.
The Challenge of Cancers of Unknown Primary
Cancers of unknown primary (CUP) present a severe diagnostic and therapeutic hurdle for oncologists. These metastatic malignancies manifest without a clearly identifiable primary site in the body, sharing common features such as early dissemination, aggressive clinical behavior, and unpredictable metastatic patterns. Even though they account for a small fraction of all cancer diagnoses, they consistently rank among the top ten most common causes of cancer-related deaths worldwide. Because physicians cannot pinpoint the source of the cancer, they must often rely on broad and nonspecific chemotherapy regimens rather than targeted treatments.
“Only between 15% and 20% of patients with CUP show features that allow physicians to treat them with site-specific therapies, which are associated with better outcomes,” explained De Velasco. “However, most patients, between 80% and 85%, receive more general chemotherapy, which is often less effective. Patients receiving site-directed therapy can survive up to 24 months, compared with 6 to 9 months for those receiving standard treatment.”
The biological origin of these tumors remains an enigma, with researchers theorizing that they may stem from small, dormant, or completely regressed primary lesions. Previous attempts to solve this diagnostic dilemma evaluated gene activity or other genetic alterations. While some earlier molecular profiling methods showed initial promise, they generally failed to demonstrate clear survival benefits in clinical trials.
Developing the Methylation Fingerprint
To address this diagnostic gap, De Velasco and a team led by Kazuto Nishio developed a machine learning model focused strictly on CpG DNA methylation, in which a methyl group is added to the fifth carbon of a cytosine base that is immediately followed by a guanine base (a CpG site). This epigenetic modification serves primarily as a stable regulatory switch that determines whether specific genes are expressed or silenced in different tissues. It also acts as a reliable molecular fingerprint for different tissue types across the human body.
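The definition of a CpG site can be made concrete with a short Python sketch. It only scans a DNA string for cytosines immediately followed by guanines; it does not measure methylation, and the function names and the example sequence are made up for illustration.

```python
def find_cpg_sites(sequence: str) -> list[int]:
    """Return 0-based positions of each cytosine immediately followed by a guanine."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def cpg_density(sequence: str) -> float:
    """Fraction of dinucleotide positions in the sequence that are CpG sites."""
    if len(sequence) < 2:
        return 0.0
    return len(find_cpg_sites(sequence)) / (len(sequence) - 1)


# Made-up example sequence, not taken from any real genome
example = "ATCGCGTTACGGCA"
print(find_cpg_sites(example))  # [2, 4, 9]
```

A methylation assay reports, for each such site or region, the fraction of DNA copies carrying the methyl group, and those per-site values are the kind of input a classifier like the one described here would consume.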
The research team analyzed methylation data from 7,421 patients across 21 distinct cancer types using The Cancer Genome Atlas and other public databases. Instead of relying on the entire genome, the researchers utilized a hybrid feature selection approach combining Shapley values and gradient boosting. They successfully narrowed the focus to exactly 1,000 CpG regions out of hundreds of thousands. The researchers also found that hypermethylation in these specific regions correlated strongly with reduced gene expression in certain tumors. Simplifying the complex molecular data into this focused signature makes the diagnostic test much more practical for future clinical implementation.
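The article does not describe how the hybrid selection was implemented. One common pattern, shown here purely as an assumption, is to score every feature by its mean absolute attribution across samples (for example, Shapley values computed from a gradient-boosted model) and keep the top k. The sketch below covers only that ranking step; the attribution numbers and region names are invented.

```python
def select_top_features(attributions: dict[str, list[float]], k: int) -> list[str]:
    """Rank features by mean absolute attribution across samples and keep the top k."""
    def mean_abs(values: list[float]) -> float:
        return sum(abs(v) for v in values) / len(values)

    ranked = sorted(attributions, key=lambda name: mean_abs(attributions[name]), reverse=True)
    return ranked[:k]


# Hypothetical per-sample attribution scores for four made-up CpG regions
toy_attributions = {
    "region_a": [0.40, -0.35, 0.50],
    "region_b": [0.01, 0.02, -0.01],
    "region_c": [-0.20, 0.25, 0.15],
    "region_d": [0.05, -0.04, 0.06],
}
print(select_top_features(toy_attributions, k=2))  # ['region_a', 'region_c']
```

In the study's setting, k would be 1,000 and the candidate pool would contain hundreds of thousands of regions, but the ranking logic would look the same under this assumption.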
Validating the Diagnostic Algorithm
The resulting ridge regression model correctly identified the cancer type in 94.7% of test cohort cases. The research team then validated the algorithm using an independent dataset from Kindai University involving 31 cases representing 17 different cancer types. The algorithm maintained a classification accuracy of 87.1% in this real-world validation.
The researchers noted that misclassifications often occurred between biologically similar cancers. For instance, when grouping colon and rectal cancers together, the model achieved a 93.8% accuracy rate. These errors reflect inherent biological similarities rather than model failures. Furthermore, the algorithm’s performance remained independent of training dataset biases like class size or tumor purity. The researchers also used unsupervised analysis to explore tumor phenotypes, grouping the data into 20 distinct clusters to highlight biological heterogeneity across cancer types. The team now plans to adapt the algorithm for blood-based liquid biopsies to analyze circulating tumor DNA.
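The effect of grouping related cancer types on reported accuracy can be illustrated with a small Python sketch. The labels below are hypothetical toy data, not the study's cases, and the helper name is an assumption.

```python
def accuracy(true_labels, predicted_labels, merge=None):
    """Share of correct predictions, optionally after mapping labels into merged groups."""
    merge = merge or {}
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(merge.get(t, t) == merge.get(p, p) for t, p in pairs)
    return correct / len(pairs)


# Hypothetical toy labels: the only errors are colon/rectal confusions
truth = ["colon", "rectal", "lung", "breast", "colon", "rectal"]
preds = ["rectal", "rectal", "lung", "breast", "colon", "colon"]
colorectal = {"colon": "colorectal", "rectal": "colorectal"}

strict = accuracy(truth, preds)                     # 4 of 6 correct
grouped = accuracy(truth, preds, merge=colorectal)  # 6 of 6 correct
print(round(strict, 3), grouped)
```

When the only mistakes fall between two closely related classes, merging them removes those errors from the tally, which is why grouped accuracy can be higher than strict accuracy on the same predictions.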
The Emergence of Pathomics in Oncology
As AI permeates cancer research, an emerging computational field known as pathomics is rapidly transforming digital pathology. Traditionally, human pathologists visually inspect hematoxylin and eosin-stained slides to diagnose diseases and assess tumor grades. However, routine human evaluation relies on subjective interpretation and cannot easily capture complex, large-scale spatial relationships. Pathomics applies advanced machine learning algorithms to extract high-dimensional, quantitative data from these standard images. The technology quantifies sub-visual morphological features, cellular arrangements, and spatial heterogeneity within the tumor microenvironment.
The software maps intricate tissue architectures and cellular interactions that the human eye cannot adequately measure. By capturing the precise spatial distribution of lymphocytes and tumor cells, the technology helps reveal underlying immune evasion mechanisms. This wealth of quantitative data bridges the gap between microscopic anatomy and molecular biology. Translating visual tissue phenotypes into mineable datasets provides a robust foundation for developing new predictive biomarkers. Scientists recently leveraged this computational approach to tackle the unpredictability of immunotherapy outcomes in lung cancer and presented their findings at AACR 2026.
Moving Beyond Traditional Biomarkers with Deep Learning
Immunotherapy has transformed oncology, yet only a fraction of patients experience durable benefits. Oncologists currently rely on the FDA-approved PD-L1 biomarker to guide NSCLC treatment decisions. However, manual PD-L1 scoring often fails to accurately predict patient outcomes because it ignores the broader spatial context of the tumor microenvironment. To build better predictive tools, Bandyopadhyay and her team developed a deep learning survival prediction model called Pathology-driven Immunotherapy Optimization (Path-IO). The platform uses pathomics to analyze routine digital whole-slide images and extracts quantitative data regarding complex tissue architecture.
Using an attention-based multiple instance learning framework, the system identifies distinct spatial niches and physical interactions between tumor cells and surrounding immune cells. The software maps specific tissue habitats across the entire slide, differentiating between tumor-dominant regions lacking immune infiltration and areas demonstrating active tumor-immune engagement. Tumors displaying robust immune cell interactions generally fall into the low-risk category. This mechanism provides clear biological evidence explaining why certain patients respond favorably to immune checkpoint inhibitors. Bandyopadhyay noted that capturing these precise spatial relationships allows the model to surpass the predictive limitations of traditional biomarkers.
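Path-IO's architecture is not detailed in this report. For readers unfamiliar with attention-based multiple instance learning, the following Python sketch shows a widely used generic formulation: each image tile gets a score from a small tanh layer, the scores are normalized with a softmax, and the slide-level representation is the weighted average of tile embeddings. The matrix `V`, the vector `w`, and the toy tile vectors are all hypothetical.

```python
import math


def attention_pool(tile_embeddings, V, w):
    """Attention-based MIL pooling over the tiles of one slide.

    tile_embeddings: list of tile feature vectors (each of length d)
    V: hidden-by-d projection matrix; w: scoring vector of length hidden
    Returns the pooled slide vector and the per-tile attention weights.
    """
    scores = []
    for h in tile_embeddings:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))
    max_score = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(tile_embeddings[0])
    slide_vector = [sum(a * h[j] for a, h in zip(weights, tile_embeddings)) for j in range(dim)]
    return slide_vector, weights


# Toy example: three tiles with 2-dimensional embeddings, one hidden unit
tiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
slide_vector, weights = attention_pool(tiles, V=[[1.0, -1.0]], w=[2.0])
print([round(a, 3) for a in weights])
```

In a trained model the parameters are learned, so the attention weights indicate which tiles drive the prediction. That is one way such a system can point to specific tissue regions on the slide.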
Validating the Pathomics Platform
The research team rigorously evaluated the algorithm using a discovery cohort of 797 NSCLC patients treated with immune checkpoint inhibitors at MD Anderson. They subsequently validated the computational tool across 280 additional patients sourced from the Mayo Clinic, Gustave Roussy, and the Lung-MAP phase three clinical trial. The results demonstrated that the software reliably stratifies patients into distinct clinical risk categories. In the primary discovery cohort, patients classified in the high-risk group faced more than double the risk of death or disease progression compared to their low-risk counterparts. This stratification provides oncologists with a concrete metric to anticipate treatment resistance.
To quantify predictive performance, the team utilized the concordance index to measure how accurately the model distinguished between different patient survival outcomes. A score of 0.5 indicates random chance while a score of 1.0 represents perfect prediction. The pathomics model consistently outperformed standard PD-L1 testing across all analyzed datasets. In the discovery cohort, the traditional PD-L1 test achieved a concordance index of 0.58 for overall survival and 0.57 for progression-free survival. The deep learning tool demonstrated stronger discriminative ability by reaching 0.69 for overall survival and 0.65 for progression-free survival. The independent validation cohorts mirrored these performance improvements.
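The concordance index can be computed directly from its definition. The Python sketch below implements Harrell's version on toy data; the patient values are invented, and the study's exact implementation is not specified in this report.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable patient pairs ranked correctly by risk.

    times: follow-up time per patient
    events: 1 if the event (e.g. death) was observed, 0 if the patient was censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: patient i had an observed event before patient j's time
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four hypothetical patients, the third one censored
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [0.9, 0.7, 0.4, 0.1]))  # 1.0 (perfect ranking)
print(concordance_index(times, events, [0.5, 0.5, 0.5, 0.5]))  # 0.5 (no discrimination)
```

Read this way, a value of 0.69 means the model ranked roughly 69% of comparable patient pairs in the correct order of survival, versus about 58% for PD-L1 in the discovery cohort.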
Multimodal Integration for Future Clinical Use
The MD Anderson researchers further enhanced the predictive power of Path-IO by combining the pathology-based data with radiomics and clinical information. This multimodal integration boosted the concordance index to 0.75 for overall survival and 0.70 for progression-free survival. Bandyopadhyay pointed out that the software grounds its predictions in biological tissue structures familiar to pathologists. Because the system relies on standard hematoxylin and eosin-stained slides, hospitals could potentially incorporate it into existing workflows without incurring the substantial costs associated with advanced molecular sequencing. Future steps involve prospective validation and the incorporation of comprehensive molecular profiling to refine the model's predictive capabilities.
As computational technology matures, these models exemplify the steady shift toward precision oncology. By unlocking hidden patterns in DNA methylation and routine pathology slides, researchers are equipping clinicians with practical tools to decode complex tumors and optimize patient care.









