Abstract
Background Cardiac MRI reports contain rich structured information that can be valuable for research. However, this information exists only within documents (e.g., PDF and Word files), so it must be extracted manually. Manual extraction of important clinical information from these reports is time- and resource-intensive for a health service under intense pressure. We are developing an AI tool that automates the processing of large volumes of reports and outputs structured data, which can then be made easily accessible for research and other uses. For cardiac MRI reports specifically, this automated approach offers significant savings over the manual effort that would otherwise be required to enter the data into a standard structured format.
Methodology Our AI tool uses natural language processing (NLP) techniques. NLP is a subfield of artificial intelligence that enables machines to process and interpret human language. Our NLP-based pipeline, developed initially as a proof of concept, consists of two sub-pipelines: (1) automatically processing large volumes of documents to extract document-level information (e.g., file details, sections, tables), and (2) automatically extracting key clinical information (e.g., patient details, checks, measurements). Both sub-pipelines are modular, so they can be reused beyond the scope of this project to process other types of reports or documents. The pipeline combines methods ranging from simple rule-based logic to more complex NLP approaches (e.g., large language models) within a fully automated and optimised flow.
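The two-stage structure described above can be illustrated with a minimal sketch. It covers only the simple rule-based end of the pipeline and is not the project's implementation: the section headings, the measurement pattern, the function names (`parse_report`, `extract_measurements`) and the sample text are all hypothetical, chosen purely for illustration, and the input is assumed to be plain text already extracted from a PDF or Word file.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedReport:
    """Output of stage 1: the report text split into named sections."""
    filename: str
    sections: dict = field(default_factory=dict)


# Hypothetical section headings; real reports may use different ones.
SECTION_HEADINGS = ("Patient details", "Findings", "Conclusion")

# Hypothetical rule: "<ABBREVIATION> <number> <unit>", e.g. "LVEF 58 %".
MEASUREMENT_RULE = re.compile(
    r"\b(?P<name>[A-Z]{2,6})\s*[:=]?\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>%|ml|mm)"
)


def parse_report(filename, text):
    """Stage 1: split plain report text into sections by heading lines."""
    report = ParsedReport(filename=filename)
    current = None
    for line in text.splitlines():
        heading = line.strip().rstrip(":")
        if heading in SECTION_HEADINGS:
            current = heading
            report.sections[current] = ""
        elif current is not None:
            report.sections[current] += line.strip() + "\n"
    return report


def extract_measurements(report):
    """Stage 2: apply the rule to the 'Findings' section only."""
    findings = report.sections.get("Findings", "")
    return {
        m["name"]: (float(m["value"]), m["unit"])
        for m in MEASUREMENT_RULE.finditer(findings)
    }


# Synthetic example text, not taken from any real report.
SAMPLE = """Patient details:
Age: 12 years
Findings:
LVEF 58 %
LVEDV: 110 ml
Conclusion:
Normal biventricular function.
"""

report = parse_report("example.txt", SAMPLE)
measurements = extract_measurements(report)
print(measurements)
```

Keeping the two stages as separate functions mirrors the modular design described above: the document-parsing stage could in principle be reused with a different extraction stage for other report types, and a rule like this one could sit alongside more complex model-based extractors.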
This tool will be deployed in the new GOSH DRE development environment, where it will process approximately 10,000 cardiac MRI reports.
This project was completed by Great Ormond Street Hospital NHS Foundation Trust and Roche Products Ltd as part of a collaborative working agreement. Roche Products Ltd had no influence on the results of this work or on the decision to publish it.