RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance

Computer Aided Medical Procedures,
Technical University of Munich

Abstract

Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To keep the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems.

Methodology

RaDialog combines visual image features and structured pathology findings with a large language model, enabling clinically correct report generation as well as conversational downstream tasks on the same image. The LLM itself is adapted with parameter-efficient fine-tuning to teach it image understanding and radiology-specific knowledge. With this design, RaDialog achieves state-of-the-art clinical correctness in report generation.
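The sketch below illustrates one way such an architecture can be wired up: pooled features from a frozen image encoder are projected into a few prefix tokens in the LLM's embedding space, structured pathology findings are verbalized into the text prompt, and only small LoRA adapters are trained. This is a minimal, illustrative assumption rather than the authors' exact implementation; the backbone name, projection module, prompt template, and feature dimensions are placeholders.

```python
# Minimal sketch (assumed, not the authors' exact code) of combining projected
# image features, structured pathology labels, and an instruction into a single
# prompt for a LoRA-adapted causal LLM.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model


class VisualPrefix(nn.Module):
    """Maps pooled image-encoder features to a few LLM-space prefix tokens."""

    def __init__(self, img_dim: int, llm_dim: int, n_tokens: int = 8):
        super().__init__()
        self.n_tokens = n_tokens
        self.proj = nn.Linear(img_dim, llm_dim * n_tokens)

    def forward(self, img_feats: torch.Tensor) -> torch.Tensor:
        # img_feats: (batch, img_dim) -> (batch, n_tokens, llm_dim)
        b = img_feats.shape[0]
        return self.proj(img_feats).view(b, self.n_tokens, -1)


llm_name = "meta-llama/Llama-2-7b-hf"  # placeholder backbone
tokenizer = AutoTokenizer.from_pretrained(llm_name)
llm = AutoModelForCausalLM.from_pretrained(llm_name)
llm_dim = llm.config.hidden_size

# Parameter-efficient fine-tuning: only small LoRA adapters are trained,
# the original LLM weights stay frozen.
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                      lora_dropout=0.05, task_type="CAUSAL_LM")
llm = get_peft_model(llm, lora_cfg)

prefix = VisualPrefix(img_dim=1024, llm_dim=llm_dim)


def build_inputs(img_feats, findings, instruction):
    """Concatenate visual prefix tokens with the embedded text prompt."""
    prompt = (f"Predicted findings: {', '.join(findings)}.\n"
              f"Instruction: {instruction}\nAnswer:")
    tok = tokenizer(prompt, return_tensors="pt")
    text_emb = llm.get_input_embeddings()(tok.input_ids)
    img_emb = prefix(img_feats).to(text_emb.dtype)
    inputs_embeds = torch.cat([img_emb, text_emb], dim=1)
    attn = torch.ones(inputs_embeds.shape[:2], dtype=torch.long)
    return inputs_embeds, attn


# Example call with dummy features standing in for a frozen image encoder.
feats = torch.randn(1, 1024)
emb, attn = build_inputs(feats, ["Cardiomegaly", "Pleural Effusion"],
                         "Write the findings section of the radiology report.")
out = llm.generate(inputs_embeds=emb, attention_mask=attn, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```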

RaDialog Instruct Dataset

The RaDialog Instruct dataset is a diverse, image-grounded instruct dataset designed to maintain the general capabilities of the LLM while imparting radiology-specific knowledge and reporting style. It covers a variety of tasks, ranging from report generation to report correction and question answering, and is constructed from a combination of existing datasets and LLM-generated pseudo-ground-truth answers. Training with this instruct dataset substantially improves RaDialog's performance on interactive downstream tasks, as illustrated by the sketch below.
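The following sketch shows one plausible way such instruct samples could be assembled. The task names, prompt templates, and helper functions are illustrative assumptions; the actual dataset mixes annotations from existing sources (e.g., reports and structured pathology labels) with answers generated by an LLM as pseudo ground truth.

```python
# Illustrative (assumed) structure for image-grounded instruct samples covering
# report generation, report correction, and question answering.
from dataclasses import dataclass


@dataclass
class InstructSample:
    image_id: str     # chest X-ray the sample is grounded on
    task: str         # e.g. "report_generation", "correction", "qa"
    instruction: str  # prompt shown to the LLM
    response: str     # target answer used for supervision


def report_generation_sample(image_id: str, report: str) -> InstructSample:
    """Ground-truth report from an existing dataset serves as the target."""
    return InstructSample(image_id, "report_generation",
                          "Write the findings section of the radiology report.",
                          report)


def correction_sample(image_id: str, wrong_report: str, fixed_report: str,
                      error_hint: str) -> InstructSample:
    """Correction pairs built by perturbing a report and asking for a fix."""
    return InstructSample(image_id, "correction",
                          f"The report states: {wrong_report}\n"
                          f"Please correct it; note that {error_hint}.",
                          fixed_report)


def qa_sample(image_id: str, question: str, pseudo_answer: str) -> InstructSample:
    """Question-answer pairs whose answers were generated by an LLM."""
    return InstructSample(image_id, "qa", question, pseudo_answer)


dataset = [
    report_generation_sample("img_001", "The heart size is enlarged. ..."),
    correction_sample("img_001",
                      "The heart size is normal. ...",
                      "The heart size is enlarged. ...",
                      "the patient shows cardiomegaly"),
    qa_sample("img_001", "Is there evidence of pleural effusion?",
              "No pleural effusion is seen."),
]
```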

BibTeX

@inproceedings{pellegrini2025radialog,
  title={RaDialog: Large Vision-Language Models for X-Ray Reporting and Dialog-Driven Assistance},
  author={Pellegrini, Chantal and {\"O}zsoy, Ege and Busam, Benjamin and Wiestler, Benedikt and Navab, Nassir and Keicher, Matthias},
  booktitle={Medical Imaging with Deep Learning},
  year={2025}
}