This study addresses the challenge of generating synthetic histopathology images that preserve tissue heterogeneity and fine morphological details. While generative AI methods have shown success in natural image domains, their direct application to histopathology has been limited by a tendency to produce homogeneous tissue samples. The proposed framework, termed HeteroTissue-Diffuse, is a latent diffusion model that synthesizes heterogeneous histopathology images through a novel conditioning mechanism. The framework is designed to scale to both annotated and unannotated datasets, enabling the generation of realistic, diverse, and annotated synthetic tissue slides.
HeteroTissue-Diffuse is based on a latent diffusion model, operating in a compressed latent space rather than raw pixels to improve efficiency and stability. Instead of depending solely on text prompts or abstract embeddings, the model incorporates raw tissue exemplars alongside spatial information, ensuring that generated samples retain clinically relevant details such as nuclear texture, staining variations, and cellular morphology.
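As a rough illustration of this latent-space setup, the sketch below shows a single training step of a conditional latent diffusion model in PyTorch. The class names, the U-Net signature, and the cosine noise schedule are assumptions chosen for brevity, not details taken from the paper.

```python
# A minimal sketch of one conditional latent-diffusion training step,
# assuming a pretrained VAE encoder and a conditional U-Net.
# Names, signatures, and the noise schedule are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentDiffusionStep(nn.Module):
    def __init__(self, vae_encoder: nn.Module, unet: nn.Module):
        super().__init__()
        self.vae_encoder = vae_encoder  # maps RGB patches to compressed latents
        self.unet = unet                # predicts noise from (latent, t, condition)

    def forward(self, image, t, cond):
        # Operate in the compressed latent space rather than raw pixels.
        with torch.no_grad():
            z = self.vae_encoder(image)
        noise = torch.randn_like(z)
        # Variance-preserving forward noising with a cosine schedule (assumed);
        # t is a tensor of timesteps in [0, 1], one per batch element.
        alpha_t = torch.cos(t * torch.pi / 2).view(-1, 1, 1, 1)
        z_noisy = alpha_t * z + (1.0 - alpha_t**2).sqrt() * noise
        # Standard denoising objective: predict the injected noise given the
        # conditioning signal (here, the fused semantic/visual condition).
        noise_pred = self.unet(z_noisy, t, cond)
        return F.mse_loss(noise_pred, noise)
```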
The key innovation is the dual-conditioning mechanism, which combines:
- Semantic segmentation maps, which specify the spatial layout of tissue types within each patch; and
- Visual crops, i.e., raw tissue exemplars that anchor fine-grained appearance such as staining and nuclear texture.
One way the two streams can be fused is sketched below.
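The following is a minimal, illustrative fusion of the two conditioning streams in PyTorch; the embedding dimension, the small crop encoder, and the broadcast-add combination are assumptions for exposition, not the paper's exact design.

```python
# Illustrative fusion of the semantic-map and visual-crop conditioning
# streams; the encoders and fusion scheme are assumptions, not the
# paper's exact architecture.
import torch
import torch.nn as nn

class DualCondition(nn.Module):
    def __init__(self, n_classes: int, d_cond: int = 256):
        super().__init__()
        # Semantic stream: embed per-pixel class labels into feature channels.
        self.sem_embed = nn.Embedding(n_classes, d_cond)
        # Visual stream: a small CNN summarizing the raw tissue crop.
        self.crop_enc = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_cond, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, sem_map, crop):
        # sem_map: (B, H, W) integer labels; crop: (B, 3, h, w) raw exemplar.
        sem = self.sem_embed(sem_map).permute(0, 3, 1, 2)  # (B, d_cond, H, W)
        vis = self.crop_enc(crop)[:, :, None, None]        # (B, d_cond, 1, 1)
        # Broadcast-add the global appearance code onto the spatial layout.
        return sem + vis
```

In practice, the fused tensor would be handed to the denoising U-Net as its conditioning input, e.g. via concatenation with the latent or via cross-attention.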
To address the lack of large-scale pixel-wise annotations, a self-supervised extension was developed on the TCGA dataset, which comprises 11,765 whole-slide images. Patches from these slides were clustered into 100 tissue phenotypes using embeddings from a foundation model trained on histopathology. These clusters were then used to generate pseudo-semantic maps, enabling the diffusion model to be trained without manual annotation. This approach allows the framework to scale to massive unannotated datasets while preserving patient privacy.
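The clustering step can be sketched as follows, assuming a feature extractor `embed_fn` from a histopathology foundation model and scikit-learn's MiniBatchKMeans; the tiling layout and bookkeeping are hypothetical.

```python
# Sketch of deriving pseudo-semantic maps by clustering patch embeddings.
# `embed_fn` (a foundation-model feature extractor) is an assumed callable;
# the k=100 phenotype count follows the text above.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_pseudo_semantic_map(patches, embed_fn, kmeans):
    """patches: list of (row, col, image) tiles from one slide region."""
    feats = np.stack([embed_fn(img) for _, _, img in patches])
    labels = kmeans.predict(feats)  # phenotype id per patch, in [0, 100)
    n_rows = max(r for r, _, _ in patches) + 1
    n_cols = max(c for _, c, _ in patches) + 1
    pseudo_map = np.zeros((n_rows, n_cols), dtype=np.int64)
    for (r, c, _), lab in zip(patches, labels):
        pseudo_map[r, c] = lab
    # Each entry is a phenotype id; upsampling this grid to pixel resolution
    # yields a pseudo-semantic map usable as the conditioning signal.
    return pseudo_map

# Fitting step (done once, over embeddings sampled from all slides):
# kmeans = MiniBatchKMeans(n_clusters=100).fit(sampled_embeddings)
```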
Quantitative evaluation used the Fréchet Distance (FD) to measure fidelity. On Camelyon16, the proposed conditioning reduced FD from 430.1 to 72.0, roughly a six-fold improvement.
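For reference, the Fréchet Distance between two feature sets is computed with the standard Gaussian-fit formula, FD = ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2(C_r C_f)^{1/2}). A minimal NumPy/SciPy implementation is sketched below; the feature extractor the authors used is not shown here.

```python
# Minimal Fréchet Distance between two feature sets, following the
# standard FID-style formula over Gaussian fits of the features.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_fake):
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):
        # Numerical noise can introduce tiny imaginary parts; drop them.
        covmean = covmean.real
    diff = mu_r - mu_f
    return diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean)
```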
Consistent improvements were observed on the PANDA and TCGA datasets, where FD decreased by factors of two to three. For segmentation tasks, DeepLabv3+ models trained exclusively on synthetic data achieved test IoU scores of 0.71 (Camelyon16) and 0.95 (PANDA), compared to 0.72 and 0.96 for models trained on real data. This demonstrates that synthetic data can nearly match real data in downstream tasks.
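For clarity, the IoU metric behind these scores can be computed as follows; binary foreground/background masks are assumed for simplicity.

```python
# Intersection-over-Union for a predicted vs. reference segmentation mask.
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Empty union (both masks blank) is treated as a perfect match.
    return inter / union if union > 0 else 1.0
```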
Qualitative evaluation involved a blinded study in which a certified pathologist rated 120 randomly selected images, drawn from both real and synthetic datasets, without prior knowledge of their origin. The images were assessed on overall quality, structural detail, and nuclear morphology. Synthetic images generated with visual prompt conditioning received scores comparable to those of real images across all criteria.
The pathologist concluded that the two types of images were indistinguishable, noting that in some cases the synthetic images appeared to be of equal or higher quality: "The generated images tended to have equal or higher quality than the real images."
For downstream applications, segmentation models trained on synthetic datasets achieved nearly the same performance as models trained on real data.
On Camelyon16 and PANDA, the IoU gap between synthetic and real training was only 1-2 percentage points, demonstrating that synthetic datasets can serve not merely for augmentation but as a substitute for real patient data. Models trained on unconditioned synthetic data showed larger performance drops, confirming that semantic and visual conditioning are essential for clinically viable synthesis.
@InProceedings{Alfasly2025HeteroTissueDiffuse,
  author    = {Alfasly, Saghir and Uegami, Wataru and Hoq, MD Enamul and Alabtah, Ghazal and Tizhoosh, H.R.},
  title     = {Semantic and Visual Crop-Guided Diffusion Models for Heterogeneous Tissue Synthesis in Histopathology},
  booktitle = {Neural Information Processing Systems (NeurIPS)},
  month     = {December},
  year      = {2025}
}
KIMIA Lab, Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, USA