

The proposed whole slide image (WSI) analysis pipeline has three components. The first, \(FPS\), is a fast patch selection method that efficiently selects representative patches while preserving their spatial distribution. The second, \(HistoRotate\), is a \(360^\circ\) rotation augmentation for training histopathology models. Unlike in natural images, rotating a histopathology patch does not alter its contextual information, so the augmentation enhances learning. The third, \(PathDino\), is a compact histopathology Transformer with only five small vision transformer blocks and ≈\(9\) million parameters, markedly fewer than alternatives. Customized for histology images, PathDino demonstrates superior performance and mitigates overfitting, a common challenge in histology image analysis.

FPS: Fast Patch Selection

FPS is a non-clustering-based patch selection algorithm that identifies a compact yet highly representative subset of patches for analysis. It has been tuned to balance computational efficiency and diagnostic utility.

HistoRotate: Rotation-Agnostic Training

HistoRotate is a \(360^\circ\) rotation augmentation for training models on histopathology images. In natural images, rotation may change the context of the visual data. A histopathology patch keeps its context under rotation, so the augmentation helps the model learn more reliable embeddings.
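To make the idea of a full \(360^\circ\) rotation augmentation concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the HistoRotate implementation: the function names, the nearest-neighbour sampling, and the center crop to side \(n/\sqrt{2}\) (used here so that the empty corners of the rotated patch are discarded) are all choices made for this example.

```python
import math
import random

def rotate_patch(patch, angle_deg, fill=0):
    """Rotate a square 2D patch about its centre (nearest-neighbour sampling)."""
    n = len(patch)
    c = (n - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # Inverse mapping: locate the source pixel that lands on (x, y).
            dx, dy = x - c, y - c
            sx = cos_t * dx + sin_t * dy + c
            sy = -sin_t * dx + cos_t * dy + c
            ix, iy = round(sx), round(sy)
            if 0 <= ix < n and 0 <= iy < n:
                out[y][x] = patch[iy][ix]
    return out

def center_crop(patch, size):
    """Return the central size x size window of a square patch."""
    start = (len(patch) - size) // 2
    return [row[start:start + size] for row in patch[start:start + size]]

def random_rotation_augment(patch, rng=random):
    """Rotate by an angle drawn uniformly from the full circle, then crop.

    Assumption of this sketch: cropping to side n / sqrt(2) keeps the
    window inside the rotated content for any angle (up to pixel rounding).
    """
    angle = rng.uniform(0.0, 360.0)
    rotated = rotate_patch(patch, angle)
    size = int(len(patch) / math.sqrt(2))
    return center_crop(rotated, size), angle
```

In a real training pipeline the same effect would normally come from an image library's rotation transform applied to each patch before it is fed to the model. The pure-Python version above only shows the geometry.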

PathDino: Lightweight Histopathology Vision Transformer

PathDino is a lightweight histopathology transformer consisting of just five small vision transformer blocks. PathDino is a customized ViT architecture, finely tuned to the nuances of histology images. It not only exhibits superior performance but also effectively reduces susceptibility to overfitting, a common challenge in histology image analysis.
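As a rough plausibility check on the model size, the sketch below counts the learnable parameters of a standard vision transformer block. The embedding width of 384 and the MLP ratio of 4 are assumptions borrowed from common "small" ViT configurations. Neither value is stated on this page, and the count ignores the patch embedding, positional embeddings, and any projection head.

```python
def vit_block_params(dim, mlp_ratio=4):
    """Parameter count of one standard pre-norm ViT block (weights and biases)."""
    # Self-attention: fused qkv projection plus output projection.
    attn = 3 * (dim * dim + dim) + (dim * dim + dim)
    # Two-layer MLP with hidden width mlp_ratio * dim.
    hidden = mlp_ratio * dim
    mlp = (dim * hidden + hidden) + (hidden * dim + dim)
    # Two LayerNorms, each with a scale and a shift vector.
    norms = 2 * (2 * dim)
    return attn + mlp + norms

# Hypothetical configuration: five blocks at width 384.
per_block = vit_block_params(384)
total_blocks = 5 * per_block
print(per_block, total_blocks)
```

Under these assumptions, five blocks account for about 8.9 million parameters, which is in line with the ≈9 million quoted for the whole model.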

Attention Visualization. In attention map visualizations, our PathDino transformer outperforms HIPT-small and DinoSSLPath, despite being trained on a smaller dataset of \(6\) million TCGA patches. In contrast, DinoSSLPath and HIPT were trained on much larger datasets of \(19\) million and \(104\) million TCGA patches, respectively.


PathDino Pretraining Dataset. We extracted a total of \(6,087,558\) patches from \(11,765\) diagnostic TCGA WSIs. Of these, \(3,969,490\) patches are \(1024 \times 1024\) pixels and \(2,118,068\) patches are \(512 \times 512\) pixels. Patches were extracted at \(20\times\) magnification with a tissue threshold of \(90\%\). The list of TCGA WSIs used for pretraining is available (TCGA Diagnostic WSI List).
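The patch counts above can be checked with simple arithmetic, and the tissue threshold can be read as a filter on each candidate patch. The sketch below is only an illustration: the binary tissue mask, the function names, and treating the threshold as inclusive are assumptions, since the page does not say how tissue is detected.

```python
# Patch counts by side length in pixels, as reported for the pretraining set.
PATCH_COUNTS = {1024: 3_969_490, 512: 2_118_068}
TOTAL_PATCHES = sum(PATCH_COUNTS.values())

def tissue_fraction(mask):
    """Fraction of pixels flagged as tissue in a binary mask (rows of 0/1)."""
    total = sum(len(row) for row in mask)
    tissue = sum(sum(row) for row in mask)
    return tissue / total if total else 0.0

def keep_patch(mask, threshold=0.90):
    """Keep a patch only if at least `threshold` of its area is tissue."""
    return tissue_fraction(mask) >= threshold

# Hypothetical 4 x 5 masks: 19 of 20 tissue pixels, then 10 of 20.
mostly_tissue = [[1, 1, 1, 1, 1]] * 3 + [[1, 1, 1, 1, 0]]
half_tissue = [[1, 1, 1, 1, 1]] * 2 + [[0, 0, 0, 0, 0]] * 2
print(TOTAL_PATCHES, keep_patch(mostly_tissue), keep_patch(half_tissue))
```

The two reported counts sum to the stated total of 6,087,558 patches.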


Table 4 compares models on patch-level histopathology image search. The standout performer is our proposed model, PathDino-512, which leads not only in accuracy but also in macro average F1 score, a critical metric for robust evaluation. On the internal datasets Mayo-Breast and Mayo-Liver, PathDino-512 achieves the highest accuracy, 55.1% and 82.7%, respectively. It also achieves the highest macro average F1 scores on the same datasets, 49.1% and 69.5%. These findings extend to the public datasets PANDA and CAMELYON16, where PathDino-512 records accuracy and macro average F1 scores of 48.3% and 46.3% on PANDA, and 75.1% and 70.4% on CAMELYON16. While models like iBOT-Path and DinoSSLPathology perform strongly, especially on public datasets, PathDino-512 consistently outperforms them across multiple metrics and datasets.
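Because the results are reported as both accuracy and macro average F1, it helps to see how the two can diverge. The following sketch computes both from label lists using the standard definitions. The function names and the toy labels are made up for this example, and the page does not describe how search results are turned into predictions, so that step is left out.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples of its own scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy imbalanced case: always predicting the majority class.
y_true = ["tumor"] * 8 + ["normal"] * 2
y_pred = ["tumor"] * 10
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

In this toy case accuracy is 0.8 while macro F1 is only about 0.44, because the minority class contributes an F1 of zero. This is why macro F1 is the stricter measure on imbalanced datasets.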


      title={Rotation-Agnostic Image Representation Learning for Digital Pathology}, 
      author={Saghir Alfasly and Abubakr Shafique and Peyman Nejat and Jibran Khan and Areej Alsaafin and Ghazal Alabtah and H.R. Tizhoosh},

KIMIA Lab, Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, USA
