RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance

Computer Aided Medical Procedures,
Technical University of Munich

*Indicates equal contribution.

Abstract

Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To keep the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems.

Methodology

RaDialog combines visual image features with structured pathology labels to generate clinically correct reports, and integrates this image information into a large language model (LLM) to enable conversational downstream tasks. The LLM is adapted with parameter-efficient fine-tuning, teaching it image understanding and radiological knowledge. With this design, RaDialog achieves state-of-the-art clinical correctness in report generation.
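The fusion of image information and structured findings can be illustrated with a minimal sketch. All names, the `<IMG>` placeholder token, and the prompt template below are hypothetical stand-ins, not the released implementation: pathology labels are serialized into text and combined with placeholder tokens that would be replaced by projected visual embeddings before the prompt reaches the LLM.

```python
def build_prompt(findings, num_image_tokens=32):
    """Assemble a report-generation prompt from structured pathology
    labels and placeholder tokens for projected image features.
    Illustrative only; the actual RaDialog template may differ."""
    # Placeholder tokens later swapped for projected visual embeddings.
    image_part = "<IMG>" * num_image_tokens
    findings_part = ", ".join(findings) if findings else "no positive findings"
    return (
        f"{image_part}\n"
        f"Predicted findings: {findings_part}.\n"
        "Instruction: Write a radiology report for the given chest X-ray."
    )

prompt = build_prompt(["Cardiomegaly", "Pleural Effusion"])
```

In a full pipeline, the placeholder positions would receive the image encoder's output after a learned projection, while the textual findings give the LLM an explicit, structured signal about the predicted pathologies.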

RaDialog Instruct Dataset

The RaDialog Instruct Dataset is a diverse, image-grounded instruct dataset designed to preserve the general capabilities of LLMs while imparting radiology-specific knowledge and reporting style. It covers a range of tasks, from report generation to report correction and question answering, and is constructed from a combination of existing datasets and LLM-generated pseudo-ground-truth answers. Training with this instruct dataset substantially improves RaDialog's performance on interactive downstream tasks.
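The construction described above can be sketched as follows. The task names, templates, and helper below are hypothetical illustrations of the idea, not the dataset's actual schema: each sample pairs a task-specific instruction with an answer that comes either from an existing dataset or from LLM-generated pseudo-ground truth.

```python
# Hypothetical templates mirroring the task types named above;
# the real dataset's templates and fields may differ.
TASK_TEMPLATES = {
    "report_generation": "Write a radiology report for this chest X-ray.",
    "report_correction": "The report misses {error}. Please correct it.",
    "question_answering": "Does the image show {finding}?",
}

def make_instruct_sample(task, answer, **slots):
    """Build one instruction/answer pair for the instruct dataset."""
    instruction = TASK_TEMPLATES[task].format(**slots)
    return {"task": task, "instruction": instruction, "answer": answer}

sample = make_instruct_sample(
    "question_answering",
    answer="Yes, there is a pleural effusion.",
    finding="pleural effusion",
)
```

Mixing such heterogeneous tasks in one training set is what lets a single fine-tuned model both generate reports and respond to follow-up instructions without losing its conversational ability.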

BibTeX

@misc{pellegrini2023radialog,
      title={RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance}, 
      author={Chantal Pellegrini and Ege Özsoy and Benjamin Busam and Nassir Navab and Matthias Keicher},
      year={2023},
      eprint={2311.18681},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}