The rapid advancement of artificial intelligence (AI) technologies has significantly impacted various sectors, including healthcare. Even generative AI – which is still met with a degree of scepticism in the healthcare space – appears to have outperformed human physicians when assessing medical case histories [1]. Evaluating such technologies is particularly fascinating because it involves additional layers of complexity and challenge, such as data bias, hallucinations, data linkage, data infrastructure and capability, user trust, and regulatory, legal and ethical considerations.
Whilst it might be Friday 13th, today is your lucky day! Drawing from our publication with NHSE on planning and implementing real-world AI evaluations across 13 technologies, here are some practical tips for addressing key challenges when evaluating AI technologies, in line with the principles of transparency, safety, responsibility and sustainability:
Data bias is a critical challenge in evaluating AI technologies. Bias can arise from training data that is not representative of the target population, from the design of the algorithms themselves, and from the interpretation of results, and it can lead to inaccurate and discriminatory outputs. In health economics, data bias can produce skewed outcomes that do not accurately reflect diverse patient populations. For example, if an AI model is trained on data from a specific ethnic group, it may not perform as well for other groups.
To address this, evaluators need to assess the quality of the training data and identify potential biases, especially where these could further widen health inequalities. They should also consider how the AI technology will be used in practice and whether it is appropriate for the intended population, with principles of representation, proportionality and generalisability in mind. Ultimately, this links to a key principle behind any safe AI technology: algorithmic explainability and transparency.
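To illustrate the kind of check an evaluator might run, here is a minimal sketch of a subgroup performance comparison. It assumes a pandas DataFrame with hypothetical columns for ethnic group, observed outcome and AI prediction; the column names and metrics are illustrative rather than taken from the NHSE publication.

```python
# A minimal sketch of a subgroup performance check, assuming a pandas
# DataFrame `df` with hypothetical columns: `ethnic_group`, `y_true`
# (observed outcome, 0/1) and `y_pred` (AI prediction, 0/1).
import pandas as pd

def subgroup_performance(df: pd.DataFrame, group_col: str = "ethnic_group") -> pd.DataFrame:
    rows = []
    for group, g in df.groupby(group_col):
        tp = ((g.y_pred == 1) & (g.y_true == 1)).sum()
        fn = ((g.y_pred == 0) & (g.y_true == 1)).sum()
        tn = ((g.y_pred == 0) & (g.y_true == 0)).sum()
        fp = ((g.y_pred == 1) & (g.y_true == 0)).sum()
        rows.append({
            "group": group,
            "n": len(g),
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        })
    # Large gaps in sensitivity or specificity between groups would flag a
    # potential equity concern that warrants further investigation.
    return pd.DataFrame(rows)
```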
Data linkage involves combining data from different sources to create a comprehensive dataset for analysis. While this enhances data richness, it also introduces challenges related to data privacy, security, and interoperability. Linking datasets from different sources is crucial for evaluating AI’s performance across the patient pathway. However, this can be challenging due to variations in data collection, formats, and governance.
Strong engagement with the sites where the AI technology is implemented is critical, so that the technology supplier, implementation site and evaluator can develop clear data flows and protocols for linking and anonymising data. This engagement should be ongoing to ensure data quality and consistency are maintained.
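By way of illustration, here is a minimal sketch of pseudonymised linkage between two hypothetical datasets (AI outputs and clinical outcomes) that share a patient identifier. The salted hashing approach and column names are assumptions made for the example; in practice, linkage and anonymisation would follow the information governance protocol agreed between the technology supplier, site and evaluator.

```python
# A minimal sketch of pseudonymised data linkage, assuming two pandas
# DataFrames (`ai_outputs` and `clinical_outcomes`) that both contain a
# patient identifier column `nhs_number`. Names and approach are illustrative.
import hashlib
import pandas as pd

SALT = "agreed-project-salt"  # held securely, per the data-sharing agreement

def pseudonymise(df: pd.DataFrame, id_col: str = "nhs_number") -> pd.DataFrame:
    df = df.copy()
    df["pseudo_id"] = df[id_col].astype(str).map(
        lambda x: hashlib.sha256((SALT + x).encode()).hexdigest()
    )
    return df.drop(columns=[id_col])  # remove the direct identifier

def link(ai_outputs: pd.DataFrame, clinical_outcomes: pd.DataFrame) -> pd.DataFrame:
    # Inner join on the pseudonymous key so only matched records are analysed.
    return pseudonymise(ai_outputs).merge(
        pseudonymise(clinical_outcomes), on="pseudo_id", how="inner"
    )
```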
Hallucination refers to AI generating outputs that seem plausible but are not based on real evidence, leading to incorrect or misleading results. Understandably, from a safety point of view in healthcare, this is a major challenge, as it could lead to life-altering consequences. This underscores the need for rigorous validation and verification processes to detect and address hallucinations in AI models. Early engagement with clinicians to validate the AI's outputs and identify potential hallucinations through parallel run studies is critical. Furthermore, mechanisms should be developed to monitor the AI's performance in real-world settings, for example by tracking the sources of retraining data post-deployment.
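To make the parallel run idea concrete, here is a minimal sketch that compares AI decisions against clinicians' independent decisions on the same cases and flags discordant ones for clinical review. The case structure and field names are hypothetical, not drawn from the evaluations themselves.

```python
# A minimal sketch of a parallel run comparison: both the AI output and the
# clinician's independent decision are recorded for each case, then compared.
from dataclasses import dataclass

@dataclass
class Case:
    case_id: str
    ai_decision: str
    clinician_decision: str

def parallel_run_summary(cases: list[Case]) -> dict:
    discordant = [c.case_id for c in cases if c.ai_decision != c.clinician_decision]
    agreement = 1 - len(discordant) / len(cases) if cases else float("nan")
    # Discordant cases go to clinical review to check whether the AI output
    # was a hallucination (plausible but unsupported by the evidence).
    return {"agreement_rate": agreement, "cases_for_review": discordant}
```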
Navigating the regulatory landscape is another significant challenge in the evaluation of AI technologies, and the landscape itself continues to change. One AI technology was initially classified as a lower-risk medical device but was later upgraded to a higher-risk class, requiring additional regulatory approvals. The publication highlights the importance of adhering to regulatory frameworks to ensure the safety, efficacy and ethical use of AI in healthcare, and of doing so from the outset, based on the expected intended use of the final product. Key regulatory bodies, such as the Medicines and Healthcare products Regulatory Agency (MHRA) and the National Institute for Health and Care Excellence (NICE), provide guidelines and standards for the development and deployment of AI technologies. Compliance with these regulations is essential for gaining approval and ensuring the responsible use of AI in healthcare.
Building trust in AI technologies is crucial for their successful implementation in healthcare. The media have, to date, painted a fairly negative picture of AI technologies, sensationalising their consequences and effects and thereby eroding public trust. Trust can be fostered through transparency, reliability and consistent performance of AI systems. The report emphasises the importance of engaging with stakeholders, including patients and the public, through Patient and Public Involvement and Engagement (PPIE) initiatives early on, at the design stage of the AI model. PPIE helps ensure that AI technologies are developed and evaluated with the needs and concerns of end-users in mind, thereby enhancing trust and acceptance.
Implementing AI technologies requires robust IT infrastructure and a workforce capable of using and managing them, yet many healthcare organisations lack the necessary resources and expertise. The fit of the AI technology to the relevant site and its associated infrastructure is key: for example, whether there is enough data across sites for the AI model to learn from, and whether the training dataset is representative of the local population.
Furthermore, beyond the initial requirements for site data maturity and infrastructure, AI technologies often need adjusting during implementation to fit the local context. One evaluation team found it took around three years for an AI technology to be fully optimised within its environment. Different sites may also use AI technologies in different ways, making it challenging to generalise evaluation findings; one team had to identify multiple comparators due to variations in care pathways and settings.
Many of these real-world findings echo this paper [2], which emphasises the need for balanced investment in AI development and in the supporting infrastructure necessary for its successful implementation. This brings me to my favourite subject, health economics, and the importance of building not only cost-effectiveness models that assess value for money against the effectiveness of the solution, but also wider cost-benefit analyses and spread and forecast models. These help capture a broader return-on-investment indicator for implementing such a technology, taking into account microeconomic parameters within the site's context as well as across wider sites for scale, and thus support informed decisions about adoption and implementation in healthcare settings.
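To illustrate the cost-effectiveness side, here is a minimal sketch of an incremental cost-effectiveness ratio (ICER) calculation comparing a hypothetical AI-supported pathway with standard care; all figures are placeholders rather than results from the evaluations.

```python
# A minimal sketch of an ICER calculation: incremental cost divided by
# incremental health gain (in quality-adjusted life years, QALYs).
# All figures below are illustrative placeholders.
def icer(cost_new: float, cost_old: float, qaly_new: float, qaly_old: float) -> float:
    """Incremental cost per QALY gained."""
    return (cost_new - cost_old) / (qaly_new - qaly_old)

if __name__ == "__main__":
    ratio = icer(cost_new=1200.0, cost_old=950.0, qaly_new=0.82, qaly_old=0.79)
    print(f"ICER: £{ratio:,.0f} per QALY gained")
```

With these placeholder figures the result is roughly £8,300 per QALY gained, which would then be compared against the decision-maker's willingness-to-pay threshold; wider cost-benefit and spread models would layer site-specific costs and scaling assumptions on top of this core calculation.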
Evaluating AI technologies in healthcare is an ongoing process of learning and adaptation. By understanding the specific challenges and adopting appropriate mitigation strategies, we can ensure that AI is implemented efficiently, safely and effectively to improve patient care.
[1] ChatGPT Defeated Doctors at Diagnosing Illness – The New York Times
[2] A Justifiable Investment in AI for Healthcare: Aligning Ambition with Reality | Minds and Machines