Author: Cai, Muyan
Intraoperative frozen diagnosis is a critical technique in surgical pathology1, allowing rapid tissue sample analysis to guide surgical decisions in real-time2. However, the accuracy of this diagnosis depends significantly on the quality of the frozen tissue sections, which is often compromised by the time-sensitive nature of surgery3. Tissue degeneration, contraction during freezing, and suboptimal staining frequently lead to artifacts or unclear nuclear and cytoplasmic details, all of which can obscure diagnostic features4,5. These quality issues make frozen section interpretation challenging for pathologists, affecting diagnostic reliability and confidence6. Therefore, improving frozen section quality is critical to advancing intraoperative diagnostic accuracy.
Deep learning has emerged as a powerful tool in digital pathology for various tasks, including image quality enhancement7,8. Generative adversarial networks (GANs) have shown particular promise, as they can learn from data distributions and generate synthetic data that closely mirrors real samples9,10. Previous studies have shown that GANs can be effectively applied to image enhancement, producing high-resolution synthetic images that can even deceive experienced pathologists11,12. The ability to generate high-quality synthetic images makes GANs a promising candidate for improving the quality of frozen histologic images. GANs have even been demonstrated to enhance cryosectioned image quality by transforming them to resemble formalin-fixed, paraffin-embedded (FFPE) tissue images, a standard in histopathology13,14. Despite these promising advances, there remains a lack of comprehensive evaluation on the clinical impact, reliability, and utility of these methods within the context of an end-to-end intraoperative diagnostic workflow.
To address these challenges, we introduce GAS, a multifunctional, end-to-end framework tailored for intraoperative frozen diagnosis, which includes three integrated modules: the Generation, Assessment, and Support modules. The Generation module, trained using a GAN-based multimodal network, effectively converts frozen section images into high-quality virtual FFPE-style images, guided by descriptive text annotations15. The Assessment module utilizes four quality control models, fine-tuned from foundational pathology models, to objectively assess the quality of these generated images. Substantial improvements in microstructural details are demonstrated over the original frozen sections. Finally, the Support module integrates GAS into routine clinical practice through a human-artificial intelligence (AI) collaboration platform, improving pathologists’ diagnostic confidence within prospective cohorts. Our study highlights the clinical utility of the GAS platform in intraoperative diagnostics, establishing a model for integrating end-to-end AI solutions into standard clinical workflow.
In this study, we introduced the GAS platform to provide a comprehensive diagnostic process that optimizes frozen section images, evaluates quality, and supports clinical decision-making through multiple integrated modules. To achieve this, we utilized eight cohorts comprising over 6700 whole-slide images (WSIs) for model training and validation, as shown in Fig. 1. The Internal GZCC cohort served as the training and internal validation dataset, consisting of 553 frozen and 844 FFPE slides from 325 patients. External validation included six retrospective cohorts and one prospective cohort. Specifically, the External JMCH cohort comprised 228 frozen slides and 310 FFPE slides from 191 patients; the External HZCH cohort contained 225 frozen and 225 FFPE slides from 225 patients; the External DGPH cohort contained 114 frozen and 115 FFPE slides from 115 patients; the External JMPH cohort had 103 frozen and 86 FFPE slides from 77 patients; the External ZSZL cohort had 1500 frozen slides from 674 patients; and the TCGA cohort contained 1180 frozen and 857 FFPE slides from 815 patients. The prospective cohort, Pro-External GZCC, included 189 frozen and 179 FFPE slides from 188 patients. All slides were initially scanned at ×40 magnification and subsequently downsampled to ×20 for model training, considering both clinical relevance and computational efficiency. Details of participant characteristics are presented in Table 1.
The datasets were derived from eight different cohorts, with the Internal GZCC data used for model training and data from the other cohorts utilized to externally test the model’s generalization performance. The number of included slides from different organs was illustrated in the column chart. The GAS platform comprises three modules: the Generation module, the Assessment module, and the Support module. The Generation module was trained using a GAN-based multimodal network guided by text descriptions of the FFPE style. The Assessment module included four microstructural quality control models, which were developed by fine-tuning a pathological foundation model using only a small number of patches. The Support module was implemented as human–artificial intelligence collaboration software to aid pathologists in making diagnoses. FFPE formalin-fixed, paraffin-embedded, WSI whole-slide image, GAN generative adversarial network.
To generate virtual FFPE-like images from frozen sections, we developed the Generation module of the GAS platform. This module employed a GAN-based multimodal unpaired transfer network, with both image- and patch-level supervision, making it adaptable to various tumor types. The transformation process was further guided and compared using text descriptions representing various histological styles. Among these, the style described as ‘formalin-fixed paraffin-embedded tissues’ consistently yielded the most favorable results within the Internal GZCC cohort (Supplementary Table 1). Consequently, this FFPE style was adopted to guide the transformation, enhancing the visual fidelity of the synthetic images to real FFPE samples (Fig. 2a). Notable quality improvements were observed across multiple organ types (Fig. 2b–d, Supplementary Fig. 1 and Supplementary Data no. 1).
a The architecture of the generative model consisted of three key components: an encoder, a style neck, and a decoder. The style neck was responsible for transferring the style from frozen to FFPE patches. b–d Examples showed that the GAS-generated images improved image quality across various organs. e–g Confusion matrices presented the classification results (generated or real FFPE patches) of three pathologists. FFPE formalin-fixed, paraffin-embedded, GAN generative adversarial network.
To assess the fidelity of the generated images compared to real FFPE images, a reader study was conducted. Three pathologists reviewed 200 image patches and classified them as either generated or real. The pathologists classified 50–72% of synthetic images as real FFPE and 26–47% of real FFPE images as generated (Fig. 2e–g). Low inter-observer agreement indicated that pathologists could not reliably differentiate real FFPE from synthetic images. An additional reader study was conducted to identify whether pathologists can distinguish frozen images from the generated images. Results showed a high classification accuracy ranging from 89.5 to 93.5% (Supplementary Fig. 2), indicating that the generated images were visually distinct from frozen images.
The Generation module of the GAS platform was assessed for its performance in producing FFPE-like images using the Fréchet Inception Distance (FID), a widely accepted metric for assessing image similarity and quality13. The FID values, which quantify the similarity of generated images to real FFPE images (lower values indicate greater similarity), were evaluated against state-of-the-art generative models across multiple datasets (Fig. 3a). In the independent test cohort of Internal GZCC, GAS achieved a superior FID of 23.021, outperforming other generative models with FID values ranging from 23.717 to 64.977. Similar advantages were observed in external validation cohorts (Fig. 3b–g and Supplementary Table 2). These results collectively demonstrated that GAS consistently produced high-quality FFPE-like images that closely resembled real FFPE images across diverse datasets.
a The workflow for calculating the FID. b–g The FID of different generative models across different cohorts: Internal GZCC, External JMCH, External HZCH, External DGPH, External JMPH, and Pro-External GZCC. Internal GZCC, the test cohort of the retrospective internal cohort; External JMCH, External HZCH, External DGPH, and External JMPH, four retrospective external cohorts; Pro-External GZCC, the prospective study. FFPE formalin-fixed, paraffin-embedded, FID Fréchet Inception Distance.
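As a reference for the FID workflow in panel a, the following is a minimal sketch of the FID computation, assuming 2048-dimensional Inception-v3 pooling features have already been extracted for the generated and real FFPE patch sets; the function name and the omitted feature-extraction step are illustrative, not the study’s exact implementation.

```python
# Minimal FID sketch: compare feature statistics of generated vs. real FFPE patches.
# Assumes feats_* are (N, 2048) arrays of Inception-v3 pooling features.
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_gen: np.ndarray, feats_real: np.ndarray) -> float:
    mu_g, mu_r = feats_gen.mean(axis=0), feats_real.mean(axis=0)
    cov_g = np.cov(feats_gen, rowvar=False)
    cov_r = np.cov(feats_real, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_g @ cov_r, disp=False)  # matrix square root
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary components from numerics
    diff = mu_g - mu_r
    return float(diff @ diff + np.trace(cov_g + cov_r - 2.0 * covmean))
```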
To complement the evaluation of similarity between generated and FFPE images, we established the second component of the GAS platform: the Assessment module. This module comprised four quality control models designed to automatically assess critical pathological features: nuclear detail, cytoplasmic detail, extracellular fibrosis, and overall stain quality (Fig. 4a). A total of 1020 image patches from frozen, generated, and FFPE sections across multiple organs were used for model training and validation (Supplementary Table 3), with 816 patches used to fine-tune a pathological foundation model into four binary classifiers. High- and low-quality labels were established by consensus between two expert pathologists, following criteria detailed in Supplementary Table 4. Compared to a 4-tier scoring system (excellent to very poor), the binary labeling showed higher inter-rater consistency (Supplementary Figs. 3 and 4). In independent testing, all four binary classification models achieved strong performance: nuclear detail (area under the curve (AUC) 0.973, accuracy 0.922), cytoplasmic detail (AUC 0.975, accuracy 0.941), extracellular fibrosis (AUC 0.871, accuracy 0.877), and overall stain quality (AUC 0.965, accuracy 0.941) (Fig. 4b, c).
a The workflow of developing quality control models in the Assessment module. The adapter architecture was applied to fine-tune the foundation model to adapt to the quality assessment task. b The area under the curve (AUC) of different items (nuclear detail, cytoplasmic detail, extracellular fibrosis, and overall stain quality) in the validation cohort of quality control models. c Confusion matrices presented the classification results (low or high quality) of the quality control models predicting different items in the validation cohort of quality control models. d The column chart displayed the percentage of high-quality and low-quality images for GAS-generated (G) and frozen (F) images in the test cohort of the Internal GZCC dataset. e Gradient-weighted Class Activation Mapping (Grad-CAM) highlighted the areas of focus for the quality control models.
To verify that the models effectively captured the intended features, Gradient-weighted Class Activation Mapping (Grad-CAM) was applied. Overlaying Grad-CAM heatmaps onto H&E patches revealed that the nuclear detail, cytoplasmic detail, and extracellular fibrosis models accurately focused on their respective features: nuclei, cytoplasm, and extracellular fibrosis. These visualizations confirmed that the Assessment module successfully evaluated the quality of the specified pathological features (Fig. 4d and Supplementary Fig. 5).
The Assessment module was utilized to assess and compare the quality of frozen images and GAS-generated images. Substantial improvements in the quality of generated patches were observed across all evaluated features in the Internal GZCC cohort. For nuclear detail, the proportion of high-quality patches increased from 38.26% in frozen patches to 61.19% in generated patches, marking a 22.93% improvement. For cytoplasmic detail, the proportion of high-quality patches rose from 31.19% in frozen patches to 64.78% in generated patches, an improvement of 33.59%. For extracellular fibrosis, the percentage of high-quality patches increased from 87.88% in frozen patches to 97.91% in generated patches, reflecting a 10.03% improvement. Regarding overall stain quality, high-quality patches increased from 70.93% in frozen patches to 92.23% in generated patches, demonstrating a 21.30% improvement (Fig. 4e).
The module was further applied across external validation cohorts, revealing consistent enhancements in image quality across diverse datasets. In the External JMCH cohort, generated images exhibited an increase in high-quality patches, with nuclear detail improving from 77.12 to 78.86%, cytoplasmic detail from 79.93 to 87.18%, extracellular fibrosis from 97.24 to 99.00%, and overall stain quality from 91.73 to 94.55%. In the External HZCH cohort, improvements were observed in nuclear detail (from 63.98 to 65.22%), cytoplasmic detail (from 69.14 to 78.75%), and overall stain quality (from 90.36 to 91.70%). In the External DGPH cohort, significant increases were seen in nuclear detail (from 56.64 to 69.38%), cytoplasmic detail (from 61.07 to 75.72%), extracellular fibrosis (from 95.74 to 98.31%), and overall stain quality (from 91.20 to 94.82%). The External JMPH cohort exhibited enhancements in nuclear detail (from 84.27 to 85.96%), cytoplasmic detail (from 87.29 to 92.36%), extracellular fibrosis (from 98.76 to 99.44%), and overall stain quality (from 96.97 to 97.89%). In the prospective Pro-External GZCC cohort, significant quality improvements were also observed in GAS-generated images compared to frozen images: nuclear detail increased from 8.74 to 25.45%, cytoplasmic detail improved from 20.45 to 40.45%, extracellular fibrosis rose from 80.32 to 87.50%, and overall stain quality improved from 41.57 to 69.27% (Supplementary Fig. 6). To assess its robustness, we applied our model to images with pronounced folds and poor quality. The generated images still demonstrated an increase in high-quality patches (Supplementary Table 5). These results underscore the effectiveness of the GAS platform in enhancing the quality of pathological microstructure, facilitating improved diagnostic evaluation.
To assess the impact of GAS-generated images on diagnostic performance, we conducted several deep-learning classification tasks, comparing the results derived from GAS-generated images with those using original frozen images as inputs. For assessing margin positivity in breast cancer, GAS-generated images achieved a higher AUC of 0.874, compared to 0.862 for frozen images (P = 0.788, t test). Similarly, in distinguishing between benign and malignant breast lesions, the GAS-generated images outperformed frozen images, achieving an AUC of 0.784 compared to 0.765 (P = 0.254, t test). In addition, for differentiating between breast carcinoma in situ and invasive breast cancer, GAS-generated images again demonstrated superior performance, with an AUC of 0.722, compared to 0.696 for frozen images (P = 0.057, t test) (Fig. 5a–c). We also tested three additional tasks: predicting sentinel lymph node metastasis in breast cancer, classifying lung cancers as adenocarcinoma or squamous cell carcinoma, and distinguishing benign from malignant thyroid lesions. In these tasks, the generated images demonstrated higher diagnostic performance compared to frozen images (Supplementary Table 6). These consistently higher AUCs for GAS-generated images underscore the platform’s ability to transform frozen images into high-quality virtual FFPE-like images, enhancing diagnostic performance in clinical applications.
a–c The violin plots illustrated the diagnostic performance of deep-learning models in margin assessment, distinguishing between benign and malignant breast lesions, and predicting breast carcinoma in situ and invasive breast cancer, respectively, using either GAS-generated images or frozen images as inputs. Statistical significance was assessed using the t test. d Screenshot of human–AI collaboration software. e The violin plots illustrated the initial diagnostic confidence of the three pathologists as well as their confidence after utilizing GAS assistance. Statistical significance was evaluated using the Wilcoxon signed-rank test. f–h Confusion matrices presented three pathologists’ initial diagnostic confidence and GAS-assisted confidence.
To further explore the clinical utility of GAS in intraoperative diagnostic workflows, we developed a human–AI collaboration software. The software allowed pathologists to review frozen WSIs and identify low-quality regions requiring closer evaluation. These regions could then be selectively converted into virtual FFPE-like images for enhanced clarity (Supplementary Movie 1). We conducted a prospective assessment using a sub-cohort from the Pro-External GZCC cohort, which included 45 breast lesion samples (34 benign and 11 malignant). Three pathologists initially provided diagnoses and assigned diagnostic confidence levels (on a scale of 1–3, where 1 represents low confidence and 3 high confidence) based on frozen WSIs. After employing the GAS platform to convert unclear regions into virtual FFPE-like regions, the pathologists finalized their diagnoses and confidence levels. The time required for the GAS platform is reported in Supplementary Table 7. While their diagnostic outcomes for distinguishing benign from malignant breast lesions remained unchanged with GAS assistance, their confidence levels significantly improved (all P < 0.001, Wilcoxon signed-rank test; Fig. 5d–h). To further validate this, we conducted two additional reader studies focusing on margin assessment and sentinel lymph node metastasis evaluation, respectively, to assess whether the Support module could enhance pathologists’ diagnostic confidence. Consistent results were observed, with all three pathologists showing significantly increased confidence when using the Support module (all P < 0.001, Wilcoxon signed-rank test; Supplementary Fig. 7). These findings highlight the GAS platform’s capability to enhance diagnostic confidence effectively. Moreover, the selective transformation approach enables efficient integration into routine testing workflows, underscoring the platform’s potential for broad clinical adoption.
Despite the growing interest in AI for histopathology, few systems have achieved clinical integration, largely due to the absence of end-to-end, multifunctional solutions16. To bridge this gap, we developed GAS—a comprehensive platform for intraoperative diagnosis that integrates three key modules: Generation for virtual FFPE-like image synthesis, Assessment for quality control, and Support for decision assistance via human–AI collaboration. Unlike previous approaches that focus on narrow tasks or specific tumor types13, GAS provides a complete diagnostic pipeline with broad applicability across organs and clinical scenarios. Its consistent performance in prospective settings demonstrates strong potential for real-world deployment, especially in time-sensitive intraoperative workflows.
A critical bottleneck in the development and deployment of AI pathology systems lies in data quality, which directly affects model performance17,18. Manual quality control of large WSI datasets is impractical, and existing AI tools often focus only on technical artifacts such as blur or focus19,20. In this study, we leveraged a pathology foundation model and introduced a lightweight adapter-based fine-tuning strategy to develop four quality control models targeting nuclear detail, cytoplasmic detail, extracellular fibrosis, and stain quality. Compared to full fine-tuning, the adapter framework minimizes annotation burden, reduces computational cost, and mitigates catastrophic forgetting21. This approach aligns with the practical needs of clinical AI systems and offers a scalable path for extending to other pathology tasks.
Although FID is commonly used to quantify image realism, we observed a discrepancy between high FID values and pathologists’ inability to distinguish generated images from real ones. This stems from fundamental differences in evaluation principles: FID captures global statistical differences based on Inception-v3 features22, while pathologists rely on local morphological cues. In addition, FID is sensitive to clinically irrelevant variations and does not directly reflect diagnostic utility. To address this, we introduced a domain-specific Assessment module that evaluates nuclear and cytoplasmic quality, providing a pathology-informed supplement to FID and enabling a more nuanced, task-relevant assessment of image fidelity.
Our study has several limitations. First, the quality control models were developed based on four microstructural features drawn from prior work on virtual staining evaluation23. While aligned with standard pathology criteria, these features may not fully capture all diagnostic subtleties. Future work could explore unsupervised methods for more comprehensive image quality assessment with reduced reliance on manual annotation. Second, the Generation model shows limited robustness on severely degraded slides, highlighting the need to enhance artifact tolerance. Third, despite meeting intraoperative time constraints, system latency remains a concern. Techniques such as model compression, pruning, and mixed-precision inference may help reduce computational load in future clinical deployments. Lastly, the dataset was enriched for some types of cancer, such as breast and lung cancers, potentially limiting generalizability. Expanding to more diverse tumor types will be critical for broader validation.
In conclusion, GAS offers an end-to-end platform that enhances the reliability of intraoperative diagnoses and facilitates the integration of AI into clinical practice, paving the way for more efficient intraoperative decision-making.
This multicenter study included retrospective development and validation of the GAS platform, followed by a prospective validation to assess its generalizability. The Generation module was trained using retrospectively collected data from the Internal GZCC. External validation involved six independent cohorts from JMCH, HZCH, DGPH, JMPH, ZSZL, and TCGA. A prospective cohort was enrolled at Pro-External GZCC. All included patients had intraoperative diagnoses and matched post-operative FFPE-confirmed diagnoses. Frozen and FFPE slides were derived from the same surgical specimens, though exact pairing was not required due to the unpaired GAN architecture. Exclusion criteria included patients with incomplete clinical data and slides with extensive tissue folds, large fractures, or severe out-of-focus regions. The study was approved by the ethics committee of Sun Yat-sen University Cancer Center (No. SL-B2023-416-02), with informed consent waived for retrospective data and obtained for the prospective cohort.
For each case, all available tumor slides were scanned at ×40 magnification (0.25 μm/pixel) for further analysis. Slides from the Internal GZCC, External HZCH, External DGPH, and Pro-External GZCC cohorts were scanned with a PHILIPS Ultra-Fast Scanner (Philips Electronics N.V., Amsterdam, Netherlands) and saved in iSyntax format. Meanwhile, slides from the External JMCH and External JMPH datasets were scanned using the Aperio AT2 scanner (Leica Biosystems, Wetzlar, Germany) and stored in SVS format.
For each digitized slide, CLAM’s preprocessing24 method was used to segment the tissue regions in the WSIs. Based on this segmentation mask, the tissue regions were partitioned into non-overlapping 512 × 512 patches at ×20 magnification, with coordinates retained for the final reconstruction of the entire slide. Owing to imperfections in the segmentation, background regions such as text or blank areas were sometimes included. These erroneously included background patches exhibited grayscale histogram distributions distinctly different from those of tissue patches (Supplementary Fig. 8). To eliminate them, all patches were converted to grayscale, and the mean and variance of their grayscale values were computed. Background patches were then filtered based on thresholds applied to these statistics. In addition, the remaining patches contained areas blurred by lens defocus during slide scanning, which were identified and removed using the Laplacian method25.
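A minimal sketch of this patch-filtering step is given below; the grayscale and Laplacian-variance thresholds are illustrative placeholders rather than the values used in the study.

```python
# Sketch: reject background patches by grayscale mean/variance, and defocused
# patches by the variance of the Laplacian. Threshold values are assumptions.
import cv2
import numpy as np

def keep_patch(patch_bgr: np.ndarray,
               mean_bounds=(40.0, 220.0),   # assumed: near-black/near-white = background
               min_gray_var=100.0,          # assumed: very low variance = blank area or text
               min_laplacian_var=50.0       # assumed: low Laplacian variance = out of focus
               ) -> bool:
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
    mean, var = float(gray.mean()), float(gray.var())
    if not (mean_bounds[0] < mean < mean_bounds[1]) or var < min_gray_var:
        return False  # background patch
    blur_score = cv2.Laplacian(gray, cv2.CV_64F).var()
    return blur_score >= min_laplacian_var  # drop defocused patches
```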
Model overview: To synthesize FFPE-like images from frozen sections, we developed the Generation module of the GAS platform, comprising an encoder, a style neck, and a decoder. The encoder downsampled inputs to expand the receptive field and reduce computational burden. The style neck enabled domain translation, and the decoder reconstructed the output, aided by skip connections to retain fine details lost during downsampling. Training optimized a weighted sum of adversarial26 and PatchNCE losses27. The adversarial loss drove global realism, while PatchNCE preserved local structural fidelity by maximizing mutual information between input and output patches.
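A minimal sketch of this combined objective follows; the least-squares adversarial term, the simplified PatchNCE formulation (one positive per sampled spatial location, with the other locations serving as negatives), and the weighting factor lambda_nce are illustrative assumptions rather than the exact GAS implementation.

```python
# Sketch of the generator objective: adversarial loss + weighted PatchNCE loss.
import torch
import torch.nn.functional as F

def patchnce_loss(feat_src: torch.Tensor, feat_out: torch.Tensor, tau: float = 0.07):
    # feat_src/feat_out: (N, C) features from the same N spatial locations of the
    # input frozen patch and the generated FFPE-like patch, respectively.
    q = F.normalize(feat_out, dim=1)
    k = F.normalize(feat_src, dim=1)
    logits = q @ k.t() / tau                              # (N, N) similarities
    targets = torch.arange(q.size(0), device=q.device)    # positives on the diagonal
    return F.cross_entropy(logits, targets)

def generator_loss(d_fake: torch.Tensor, feat_src, feat_out, lambda_nce: float = 1.0):
    adv = F.mse_loss(d_fake, torch.ones_like(d_fake))     # least-squares realism term
    return adv + lambda_nce * patchnce_loss(feat_src, feat_out)
```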
Style neck: The style neck was composed of stacked residual blocks28. Crucially, the normalization layers within the residual blocks were Adaptive Instance Normalization (AdaIN) layers29, which accomplished the style transformation of the input images. The AdaIN layer altered the style of the input image through adaptive affine transformations (Eq. (1)):
$${\mathrm{AdaIN}}\left(x,y\right)=\sigma \left(y\right)\left(\frac{x-\mu \left(x\right)}{\sigma \left(x\right)}\right)+\mu \left(y\right)$$
(1)
where x represents the input image and y the target style image. μ(·) and σ(·) denote the mean and standard deviation, respectively. The original AdaIN layer required multiple computations of the affine parameters (the mean and standard deviation) from the style input. We proposed that the affine parameters could be directly predicted from text descriptions of the FFPE style, without the need for additional input images of the FFPE style and repeated calculations. Therefore, we employed the text encoder of QuiltNet30 to encode the text representing the FFPE style, which was “formalin-fixed paraffin-embedded tissues”. The encoded text was then fed into a multilayer perceptron (MLP) to predict the affine parameters. Through this design, our generative model efficiently leverages pathology-specific knowledge embedded in the pretrained network to encode FFPE-style representations. These representations guide the image synthesis process, enabling the model to generate high-quality FFPE-like images from frozen section inputs.
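The following is a minimal sketch of such a text-guided AdaIN layer, assuming a 512-dimensional text embedding (the size of CLIP-style text encoders such as QuiltNet’s) and an illustrative two-layer MLP; it is a simplified stand-in for the module used in GAS.

```python
# Sketch of Eq. (1) with affine parameters predicted from a text embedding of
# the FFPE-style description instead of a style image.
import torch
import torch.nn as nn

class TextGuidedAdaIN(nn.Module):
    def __init__(self, num_channels: int, text_dim: int = 512):
        super().__init__()
        # MLP predicting per-channel (scale, shift) from the encoded style text.
        self.mlp = nn.Sequential(
            nn.Linear(text_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * num_channels),
        )

    def forward(self, x: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) content features; text_emb: (B, text_dim) FFPE-style embedding.
        scale, shift = self.mlp(text_emb).chunk(2, dim=1)
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = x.std(dim=(2, 3), keepdim=True) + 1e-6
        x_norm = (x - mu) / sigma                          # instance-normalize content
        return scale[..., None, None] * x_norm + shift[..., None, None]
```

In use, text_emb would be the QuiltNet text-encoder output for the prompt “formalin-fixed paraffin-embedded tissues”, computed once and reused across the residual blocks of the style neck.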
Training configurations: Our model was optimized using the Adam31 optimizer with a learning rate set to 0.0002. It underwent training with a batch size of 1 over a maximum of 200 epochs. To ensure a fair comparison, all baseline models were trained under consistent settings. The generative model was trained with flip-equivariance augmentation, where the input image to the generator was horizontally flipped, and the output features were flipped back before computing the PatchNCE loss.
Differences from our method: Unlike prior approaches, our method is tailored for intraoperative scenarios, where rapid inference is essential. To this end, we use a lightweight encoder–decoder architecture with fewer layers and incorporate skip connections to mitigate feature loss during downsampling. Rather than relying on computationally intensive attention modules, we leverage domain knowledge embedded in a pathology-pretrained model to generate textual descriptions of FFPE characteristics. These are integrated into the generator via AdaIN, enabling fast and high-fidelity generation of FFPE-style images.
Reader study: To assess the fidelity of the generated images compared to real FFPE images, a reader study was conducted. Three pathologists reviewed 200 image patches (100 generated and 100 real FFPE) and classified each as either generated or real; in a second study, they reviewed 200 image patches (100 generated and 100 frozen) and classified each as either generated or frozen. These image patches were randomly sampled from all test sets, and the sampling process was unbiased. Regarding tissue region selection, the preprocessing pipeline was consistent across datasets.
With the application of foundation models in computational pathology15,32,33,34, the ability of these models to extract general features from pathological images has significantly improved. We developed the Assessment module using pathological image quality control models derived from the foundation model. Then, the adapter architecture was introduced to enhance the model’s performance on the task of pathological image quality assessment. During training, the parameters of the foundation model were kept frozen; only the adapter and the projector used for prediction were trained. The foundation model consisted of 24 ViT blocks35. The adapter followed an MLP architecture. To retain the original knowledge encoded by the foundation model, a residual structure was introduced. Specifically, the output of the ViT block was fed into the adapter, and the adapter’s output was aggregated with the original features using a fixed parameter γ (Eq. (2)):
$${{\mathcal{F}}}_{l}^{* }=\gamma {A}_{l}{({F}_{l})}^{T}+(1-\gamma ){F}_{l}$$
(2)
where Al denotes the adapter at layer l, Fl represents the output of the ViT block at layer l, and \({{\mathcal{F}}}_{l}^{* }\) indicates the output feature, which serves as the input to the subsequent ViT block; γ was set to 0.8 in this experiment36,37.
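A minimal PyTorch sketch of Eq. (2) follows; γ = 0.8 follows the text, while the adapter hidden width is an illustrative assumption.

```python
# Sketch of the residual adapter: the frozen ViT block output F_l is passed
# through a small MLP adapter and blended back with a fixed gamma.
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    def __init__(self, dim: int, hidden: int = 64, gamma: float = 0.8):
        super().__init__()
        self.gamma = gamma
        self.adapter = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, f_l: torch.Tensor) -> torch.Tensor:
        # f_l: token features from a frozen ViT block, shape (B, tokens, dim)
        return self.gamma * self.adapter(f_l) + (1.0 - self.gamma) * f_l
```

During training, the backbone parameters would be frozen (e.g., via requires_grad_(False)) so that only the adapters and the prediction projector receive gradient updates, consistent with the description above.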
Compared to Low-Rank Adaptation (LoRA)38, a widely used parameter-efficient fine-tuning method that modifies model weights through low-rank matrix updates, the adapter approach adjusts features via external modules36,39,40. It combines original and adapted features using a scaling factor γ, allowing finer control over the degree of adaptation. As the foundation model, UNI21, was pretrained on over 100 million pathology images using the MoCoV341 framework and already captures extensive domain knowledge, the adapter strategy preserves core representations while introducing task-specific features for quality control, thereby improving overall performance.
The quality control models were trained using a batch size of 16, with training lasting up to 50 epochs. Optimization employed the Adam optimizer with an L2 weight decay set at 5e−5, along with a learning rate of 1e−5. Segmentation and patching of WSIs were performed on Intel(R) Xeon(R) Gold 6240 Central Processing Units (CPUs), and the models were trained on 4 NVIDIA GeForce RTX 2080 Ti Graphics Processing Units (GPUs).
To verify whether the quality control model attended to relevant features, we applied Grad-CAM. A forward pass produced the final convolutional feature map and the pre-softmax prediction score. Gradients of the score with respect to the feature map were then computed via backpropagation. Channel-wise importance weights were calculated, aggregated through a weighted sum, and passed through ReLU to generate the final localization map.
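A minimal Grad-CAM sketch following these steps is shown below, assuming a target layer that yields a spatial (B, C, H, W) feature map; the model and target_layer arguments are placeholders rather than the study’s exact implementation.

```python
# Sketch of Grad-CAM: capture activations and gradients of a target layer via
# hooks, weight channels by averaged gradients, sum, apply ReLU, and upsample.
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image: torch.Tensor, class_idx: int) -> torch.Tensor:
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    try:
        score = model(image)[0, class_idx]                   # pre-softmax prediction score
        model.zero_grad()
        score.backward()
        weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # channel-wise importance
        cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))  # weighted sum + ReLU
        cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)      # normalize to [0, 1]
    finally:
        h1.remove()
        h2.remove()
    return cam.squeeze()
```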
To evaluate whether the synthetic FFPE-like images generated by GAS provide greater utility in downstream classification tasks compared to frozen images, we designed a series of deep-learning classification tasks. These included: margin assessment, sentinel lymph node metastasis prediction in breast cancer, classification of benign versus malignant thyroid lesions, lung adenocarcinoma versus lung squamous cell carcinoma, benign versus malignant breast lesions, and breast carcinoma in situ versus invasive breast cancer.
For the task of distinguishing lung adenocarcinoma from lung squamous cell carcinoma, cases from TCGA were used to develop the model. For the classification of benign versus malignant thyroid lesions, benign versus malignant breast lesions, and carcinoma in situ versus invasive breast cancer, FFPE data were used for model development. In contrast, for margin assessment and sentinel lymph node metastasis tasks, diagnostic models were developed using either original frozen images or GAS-generated images only, without incorporating any FFPE data. The classification results obtained using GAS-generated images were compared with those using frozen images as inputs. The prediction process using the generated images was as follows: tissue regions in the frozen images were segmented and divided into 512 × 512 patches. Subsequently, synthetic FFPE-like patches were generated from frozen patches, and their features were extracted. Finally, the patch-level features were aggregated into a slide-level feature for prediction. The feature aggregation follows the attention-based pooling function introduced in CLAM (Eq. (3)):
$${a}_{i,k}=\frac{{e}^{\{{W}_{a,i}(\tanh ({V}_{a}{{\bf{h}}}_{k})\odot {\rm{sigm}}({U}_{a}{{\bf{h}}}_{k}))\}}}{{\sum }_{j=1}^{K}{e}^{\{{W}_{a,i}(\tanh ({V}_{a}{{\bf{h}}}_{j})\odot {\rm{sigm}}({U}_{a}{{\bf{h}}}_{j}))\}}}$$
(3)
$${{\boldsymbol{f}}}_{{slide},i}=\mathop{\sum }\limits_{k=1}^{K}{a}_{i,k}{{\boldsymbol{h}}}_{k}$$
Here, fslide,i denotes the aggregated feature representation for class i, ai,k is the attention score for patch k, and hk is the patch-level feature. Va and Ua are learnable fully connected layers, and Wa,i represents one of N parallel attention branches.
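A minimal sketch of this gated attention pooling is shown below; the feature and hidden dimensions are illustrative assumptions.

```python
# Sketch of Eq. (3): CLAM-style gated attention pooling of patch features h_k
# into one slide-level feature per class.
import torch
import torch.nn as nn

class GatedAttentionPool(nn.Module):
    def __init__(self, dim: int = 1024, hidden: int = 256, n_classes: int = 2):
        super().__init__()
        self.V = nn.Linear(dim, hidden)        # tanh branch (V_a)
        self.U = nn.Linear(dim, hidden)        # sigmoid gate branch (U_a)
        self.W = nn.Linear(hidden, n_classes)  # N parallel attention branches (W_{a,i})

    def forward(self, h: torch.Tensor):
        # h: (K, dim) patch-level features of one slide
        gated = torch.tanh(self.V(h)) * torch.sigmoid(self.U(h))  # (K, hidden)
        a = torch.softmax(self.W(gated), dim=0)                   # (K, n_classes) attention
        slide_features = a.t() @ h                                 # (n_classes, dim)
        return slide_features, a
```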
A prospective assessment using a sub-cohort from the Pro-External GZCC cohort was conducted to validate the clinical utility of GAS in intraoperative diagnostic workflows. We developed a human–AI collaboration software to facilitate the process. Once the frozen sections were prepared, they were scanned to create digital slides. After uploading these slides to the GAS software, pathologists could browse the frozen sections online. If any areas appeared unclear, the pathologists could click on the corresponding regions, and the software converted these areas into high-resolution FFPE-like images.
FID was used to measure image similarity for the generative models. Additionally, the quality control model evaluated the generative models using scoring as another metric. For the quality control model, accuracy and AUROC served as the primary evaluation metrics. The external test sets from different centers allowed for a more comprehensive assessment of the generalization properties of GAS. A P value less than 0.05 was considered statistically significant. Statistical significance was evaluated using the t test and the Wilcoxon signed-rank test. Data preprocessing and model development were conducted using Python (version 3.8.0) and the deep-learning platform PyTorch (version 2.3.0).
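For illustration only, the two tests named above can be run with SciPy as in the short sketch below; the arrays are synthetic placeholders, not study data.

```python
# Sketch: unpaired t test on AUCs and Wilcoxon signed-rank test on paired
# confidence ratings, using placeholder values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
auc_generated = rng.normal(0.87, 0.02, size=10)    # e.g., AUCs over repeated runs
auc_frozen = rng.normal(0.86, 0.02, size=10)
t_stat, p_t = stats.ttest_ind(auc_generated, auc_frozen)

conf_before = np.array([1, 2, 1, 2, 3, 1, 2, 1])   # paired confidence ratings (1-3)
conf_after = np.array([3, 3, 2, 3, 3, 2, 3, 3])
w_stat, p_w = stats.wilcoxon(conf_before, conf_after)
print(f"t test P = {p_t:.3f}, Wilcoxon signed-rank P = {p_w:.3f}")
```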
The digital whole-slide images (WSIs) from The Cancer Genome Atlas (TCGA) are publicly available at https://portal.gdc.cancer.gov. However, WSIs generated from other included cohorts are not publicly accessible due to data protection regulations and institutional guidelines. Statistical results derived from these WSIs can be provided upon reasonable request to the lead contact, following the completion of a data transfer agreement.
The source code for model training is publicly available on GitHub (https://github.com/LexieK7/GAS). The GAS platform is also accessible as an open-source software tool at the same link, accompanied by a detailed tutorial designed for pathologists, developers, and other users interested in utilizing it for intraoperative diagnosis. The pretrained weights and training dataset of QuiltNet are available at https://github.com/wisdomikezogwo/quilt1m, and the pretrained weights of UNI are available at https://github.com/mahmoodlab/UNI. Further information and requests for resources and code should be directed to and will be fulfilled by the corresponding author.
Vergara, R. et al. Surgical pathology and sustainable development: international landscape and prospects. J. Clin. Pathol. 78, 233–239 (2025).
Ackerman, L. V. & Ramirez, G. A. The indications for and limitations of frozen section diagnosis; a review of 1269 consecutive frozen section diagnoses. Br. J. Surg. 46, 336–350 (1959).
Jaafar, H. Intra-operative frozen section consultation: concepts, applications and limitations. Malays. J. Med. Sci. 13, 4–12 (2006).
Renne, S. L., Redaelli, S. & Paolini, B. Cryoembedder, automatic processor/stainer, liquid nitrogen freezing, and manual staining for frozen section examination: a comparative study. Acta Histochem. 121, 761–764 (2019).
Chen, Y., Anderson, K. R., Xu, J., Goldsmith, J. D. & Heher, Y. K. Frozen-section checklist implementation improves quality and patient safety. Am. J. Clin. Pathol. 151, 607–612 (2019).
Nakhleh, R., Coffin, C. & Cooper, K. Recommendations for quality assurance and improvement in surgical and autopsy pathology. Hum. Pathol. 37, 985–988 (2006).
Bera, K., Schalper, K. A., Rimm, D. L., Velcheti, V. & Madabhushi, A. Artificial intelligence in digital pathology—new tools for diagnosis and precision oncology. Nat. Rev. Clin. Oncol. 16, 703–715 (2019).
Madabhushi, A. & Lee, G. Image analysis and machine learning in digital pathology: challenges and opportunities. Med. Image Anal. 33, 170–175 (2016).
Iqbal, T. & Ali, H. Generative adversarial network for medical images (MI-GAN). J. Med. Syst. 42, 231 (2018).
Bentaieb, A. & Hamarneh, G. Adversarial stain transfer for histopathology image analysis. IEEE Trans. Med. Imaging 37, 792–802 (2018).
Li, W. et al. High resolution histopathology image generation and segmentation through adversarial training. Med. Image Anal. 75, 102251 (2022).
Ahmad, W., Ali, H., Shah, Z. & Azmat, S. A new generative adversarial network for medical images super resolution. Sci. Rep. 12, 9533 (2022).
Ozyoruk, K. B. et al. A deep-learning model for transforming the style of tissue images from cryosectioned to formalin-fixed and paraffin-embedded. Nat. Biomed. Eng. 6, 1407–1419 (2022).
Siller, M. et al. On the acceptance of “fake” histopathology: a study on frozen sections optimized with deep learning. J. Pathol. Inf. 13, 6 (2022).
Ikezogwo, W. O. et al. Quilt-1M: one million image-text pairs for histopathology. Adv. Neural Inf. Process. Syst. 36, 37995–38017 (2023).
Griffin, J. & Treanor, D. Digital pathology in clinical use: where are we now and what is holding us back? Histopathology 70, 134–145 (2017).
Haghighat, M. et al. Automated quality assessment of large digitised histology cohorts by artificial intelligence. Sci. Rep. 12, 5002 (2022).
Nakhleh, R. E. Quality in surgical pathology communication and reporting. Arch. Pathol. Lab. Med. 135, 1394–1397 (2011).
Schömig-Markiefka, B. et al. Quality control stress test for deep learning-based diagnostic model in digital pathology. Mod. Pathol. 34, 2098–2108 (2021).
Chen, Y. et al. Assessment of a computerized quantitative quality control tool for whole slide images of kidney biopsies. J. Pathol. 253, 268–278 (2021).
Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
Wu, Y. et al. A pragmatic note on evaluating generative models with Fréchet inception distance for retinal image synthesis. Preprint at https://arxiv.org/abs/2502.17160 (2025).
Rivenson, Y. et al. Virtual histological staining of unlabelled tissue-autofluorescence images via deep learning. Nat. Biomed. Eng. 3, 466–477 (2019).
Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
Pech-Pacheco, J.L., Cristóbal, G., Chamorro-Martinez, J. & Fernández-Valdivia, J. Diatom autofocusing in brightfield microscopy: a comparative study. In Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, Vol. 3, 314–317 (IEEE, 2000).
Goodfellow, I. J. et al. Generative adversarial nets. In Neural Information Processing Systems 27 (eds Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D. & Weinberger, K. Q.) 2672–2680 (Curran Associates, Inc., 2014).
Park, T., Efros, A. A., Zhang, R. & Zhu, J.-Y. Contrastive learning for unpaired image-to-image translation. In Computer Vision – ECCV 2020 (eds Vedaldi, A., Bischof, H., Brox, T. & Frahm, J.-M.) 319–345 (Springer International Publishing, 2020).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (IEEE, 2016).
Huang, X. & Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In 2017 IEEE International Conference on Computer Vision (ICCV) 1510–1519 (IEEE, 2017).
Ikezogwo, W. O. et al. Quilt-1M: one million image-text pairs for histopathology. Adv. Neural Inf. Process Syst. 36, 37995–38017 (2023).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Kang, M., Song, H., Park, S., Yoo, D. & Pereira, S. Benchmarking self-supervised learning on diverse pathology datasets. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 3344–3354 (IEEE, 2023).
Lu, M. Y. et al. A visual-language foundation model for computational pathology. Nat. Med. 30, 863–874 (2024).
Xu, H. et al. A whole-slide foundation model for digital pathology from real-world data. Nature 630, 181–188 (2024).
Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. Preprint at https://arxiv.org/abs/2010.11929 (2020).
Zhang, J. O. et al. Side-tuning: a baseline for network adaptation via additive side networks. In Computer Vision – ECCV 2020 (eds Vedaldi, A., Bischof, H., Brox, T. & Frahm, J.-M.) 698–714 (Springer International Publishing, 2020).
Sung, Y.-L., Cho, J. & Bansal, M. LST: ladder side-tuning for parameter and memory efficient transfer learning. Preprint at https://arxiv.org/abs/2206.06522 (2022).
Hu, J. E. et al. LoRA: Low-Rank Adaptation of Large Language Models. In Proceedings of the International Conference on Learning Representations (ICLR, 2021).
Han, Z., Gao, C., Liu, J., Zhang, J. & Zhang, S. Q. Parameter-efficient fine-tuning for large models: a comprehensive survey. Preprint at https://arxiv.org/abs/2403.14608 (2024).
Sung, Y.-L., Cho, J. & Bansal, M. LST: ladder side-tuning for parameter and memory efficient transfer learning. Adv. Neural Inf. Process. Syst. 35, 12991–13005 (2022).
Chen, X. et al. An empirical study of training self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision 9620–9629 (IEEE, 2021).
This work was supported by grants from the National Natural Science Foundation of China (Nos. 82172646 and 82403737), Guangdong Basic and Applied Basic Research Foundation (Nos. 2024A1515012448 and 2024A1515012828), China Postdoctoral Science Foundation (No. 2024M753786), Science and Technology Program of Guangzhou (No. SL2024A04J00711), Chih Kuang Scholarship for Outstanding Young Physician-Scientists of Sun Yat-sen University Cancer Center (No. CKS-SYSUCC-2023005), and Young Talents Program of Sun Yat-sen University Cancer Center (No. YTP-SYSUCC-0025).
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Zheng, X., Zheng, K., Wen, Y. et al. An end-to-end multifunctional AI platform for intraoperative diagnosis. npj Digit. Med. 8, 460 (2025). https://doi.org/10.1038/s41746-025-01808-7