Your new analysis is a mixed-method evaluation of AI in chest diagnostics. What prompted your focus on this topic, and how did you conduct the evaluation?
Angus: The UK government and NHS leaders regard AI as a vital solution to many pressures and challenges facing the NHS. While there are numerous expected benefits in terms of operational efficiency and effectiveness, the evidence surrounding the selection, implementation, and outcomes of AI remains quite limited. Moreover, the experiences of key stakeholders, such as NHS staff and patients, have not been thoroughly explored.
RSET was tasked with evaluating the AI Diagnostic Fund, a significant NHS England initiative aimed at integrating AI tools into chest diagnostics across approximately half of the NHS trusts in England. This presented a valuable opportunity for us to bridge existing knowledge gaps. We approached the project in two phases. The initial phase examined the evidence surrounding AI in radiology diagnostics, combined with a mixed-methods empirical study focused on the procurement and preparation for deployment of AI tools.
In the second phase, we conducted a fully empirical study using a mixed-methods design. This included qualitative methods to explore implementation, usage, and stakeholder experiences, alongside a quantitative analysis assessing the impact on service delivery from multiple perspectives, leveraging both national and local datasets, as well as health economic analyses based on local data from participating sites.
It’s crucial to clarify that no clinical decisions were made without human oversight. While AI played a role in prioritizing cases, actual autonomous decision-making by AI was not implemented.
Chris: Our evaluation involved a real-world deployment. As our systematic review showed, few real-world deployments had been evaluated before this one; much of the existing research occurs within controlled environments. Understanding the effectiveness of AI in practical applications is of utmost importance.
Kevin: This gives us a genuine insight into what may transpire during such a rollout.
What were the primary findings from your work?
Chris: The impact of AI varied depending on its intended use across different hospitals. Each of the facilities we examined utilized AI to assist in clinical decision-making. After an x-ray image was taken, the AI would analyze it and rank it based on its likelihood of indicating abnormal findings. Images flagged as potentially abnormal were given priority, allowing human readers to review them first.
Where we could quantify it, approximately 90% of images with a high suspicion of cancer were prioritized by AI, leading to faster turnaround times for these urgent cases, often within 24 hours, than for less urgent images.
However, results were inconsistent among the various trusts studied. Some reported an increase in follow-up cases, while others saw a decrease. The reasons for these disparities were unclear but might have been linked to existing capacity backlogs, changes implemented to address AI-related risks, or differences in resources available among the trusts.
Angus: We discovered that the effort required to deploy AI tools was greater than anticipated. Additionally, there was considerable variation in how AI was implemented. Factors included the users of the AI tools—some sites had solely radiology or radiography teams engaged, while others involved a broader spectrum of clinicians. Given the complexities of implementing AI, it is essential for service leaders to focus on the problems they aim to solve rather than solely on the solutions.
Another significant finding was the overall positive perception of AI among both staff and patients, who recognized potential benefits in efficiency. Staff particularly appreciated the utility of the tools for prioritization and clinical decision-making, viewing them as supportive aids during their evaluations.
In the event of fully autonomous AI adoption, both staff and patients expressed the need for robust monitoring and governance to ensure AI tools did not overlook critical cases and that protocols were in place to manage any potential errors.
Does the use of AI in this manner lead to cost savings?
Kevin: Our findings indicate that AI is cost-effective, leading to lower overall costs while enhancing health outcomes compared to the preceding period. This represents an almost ideal scenario. However, it’s important to understand that while our analyses were consistent, the magnitude of health outcome improvements, measured in quality-adjusted life years (QALYs), was marginal.
The implementation costs for a radiology department within any given trust were relatively modest; however, this finding is set against a backdrop of trusts possibly experiencing financial pressures or staffing shortages. Additionally, we encountered significant gaps in reported implementation costs due to the retrospective nature of the data collection.
Local staff often found themselves seconded into project management roles—such as data management, IT, or even clinical positions—taking on these responsibilities alongside their regular duties or, in some cases, being released from their standard tasks entirely. In pressured departments, this can pose challenges, and relying on non-specialists for such a complex implementation can complicate processes and delay AI activation.
What factors may influence the effectiveness of this technology within a trust?
Chris: Some trusts lack proper monitoring capabilities, which can hinder effectiveness. We also observed better outcomes in trusts where there weren’t significant issues with CT scan capacity; if major capacity problems exist, the effectiveness of AI could be compromised. Conversely, if operations are already efficient and images are processed quickly, the benefits may be less pronounced.
Additionally, there is a preventive aspect to consider. A trust may currently think it has no need for AI, but rising pressures might worsen conditions in the future. In such cases, AI could play a crucial role in mitigating potential challenges before they escalate.
Resource allocation is another critical factor. Questions surrounding who reads the images and how operations are structured within the trust yielded varied answers across different trusts. For instance, one trust outsourced a significant portion of their reading and reporting, while others managed these tasks internally. Supplier stability also matters; if a vendor encounters financial difficulties, as one did, it can force a trust to cease usage of the tool.
AI captivates many. Based on your evaluation, how prepared is it for widespread rollout, and is it truly effective?
Angus: Readiness encompasses not just the AI tools themselves but also the NHS services into which they will be integrated. Challenges persist in areas such as procurement, selection, AI literacy, infrastructure differences, and governance—all of which complicate the large-scale implementation of AI. Nevertheless, there is a palpable willingness among stakeholders—including suppliers, NHS staff, and leadership—to embrace this challenge. It represents an opportunity to reinforce the infrastructure, capabilities, and capacity of NHS services for effective management moving forward. Yet, significant work remains.
Chris: In terms of effectiveness, there are indications that processes may improve, but the long-term impacts have yet to be fully examined. These include potential effects on organizational culture, training, and the skill development of personnel.
Ensuring the availability of accurate data is crucial for monitoring the effectiveness and safety of these tools—a current challenge that remains hit or miss.
What finding surprised you the most?
Angus: I was pleasantly surprised by how positively staff responded to the introduction of AI tools. There is often concern that innovations may intrude upon their roles; however, in practice, staff appreciated the AI tools and found value in them as complementary to their everyday practices.
Chris: I was struck by how little capacity many trusts have to monitor the impact of these tools, primarily due to a lack of accessible data. Surprisingly, this was often not a consideration when deciding funding for specific sites.
Kevin: It’s important to acknowledge that humans are not infallible, and while AI is effective, it too has limitations. With the possibility of autonomous reporting, the key question will be: how proficient must AI be for broad acceptance on regulatory, public, and perceptual fronts? Our findings raise intriguing conceptual questions about the essence of AI, its applications, and the standards of performance necessary.
What are the next steps for this project?
Angus: We are in the process of drafting several papers to compile our analyses and share our findings more broadly, with the goal of creating both national and international impacts. We are also looking forward to presenting our results at various national and international conferences in the coming months. Additionally, we are actively participating in national initiatives aimed at shaping future policies and regulations related to the use of AI in healthcare.
This project provided many of us with our first experience evaluating AI. Like many innovations, it evolves rapidly, but this experience significantly enhances our understanding of how to evaluate a fast-paced advancement of this nature. It will serve as a solid foundation for future endeavors in diverse settings.