AI in Disease Detection

DOI: https://www.doi.org/10.53289/QKKC2474

Providing benefits for the whole population

Tobias Rijken

Tobias Rijken is the co-founder and CTO of Kheiron Medical, a leading developer of AI cancer diagnostics. Kheiron’s breast screening solution is already helping doctors improve early breast cancer detection. Tobias has an MSc in Computational Statistics and Machine Learning from UCL with a Deep Learning focus. Before founding Kheiron, he was a Machine Learning scientist at BenevolentAI applying Deep Learning in the pharmaceutical industry.

SUMMARY

  • Breast cancer screening is a very well-defined programme
  • AI can help meet a shortage of skilled clinicians
  • Getting AI into clinical practice is not at all easy
  • Generalisability is key to maximising the value of AI models
  • Being able to monitor and adjust for change is critical to success.

Cancer detection and treatment raise a series of information problems. Can we detect the cancer in the first place? How do we diagnose it, and at what stage? What is the best treatment plan, and how should it be monitored to verify its efficacy? Then, when a person regains their health, the whole process must start again because the cancer can come back.

At Kheiron, we began at the start of the process, which is detection: you need to detect a cancer before you can do anything else. We focussed on breast cancer screening because it is one of the best-defined screening programmes we have today. Depending on the country, women between the ages of 50 and 75 are screened using mammography every two to three years.

Adding resource

To introduce AI into the clinical workflow, the aim should be to keep everything else the same as far as possible and to help where there is a clear need. The problem for breast cancer screening is a huge workforce crisis. A recent report from the Royal College of Radiologists noted that there is already a 29% shortfall in clinical radiology, with 50% of vacancies remaining open for more than 12 months. The NHS is currently spending £223 million a year on overtime and outsourcing costs to address the shortage of radiologists. That is not sustainable.

This is where AI can help. The breast cancer task for radiologists is well defined: should we call this woman back for further examination, yes or no? There is no diagnosis here: this is detection. Only 1% of women in the screened population have breast cancer; the other 99% are healthy. Maybe 10% of those are difficult to decide, and those are the ones that matter.

How can we bring AI into the workflow in the simplest and most effective way? The Kheiron AI, called Mia, performs the same task as the radiologist, so it can be fitted into the process in a very flexible way (see Figure 1). The top left of the figure shows the current standard, which is double reading. Every mammogram is read by two radiologists: when they agree with each other, that decision stands; when they disagree, an arbitrator radiologist is called upon. This is incredibly wasteful in terms of resources because the two radiologists agree on the vast majority of cases. We want to reach a situation where Mia becomes one of those readers because that takes care of approximately 50% of all the reads in one go.
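The double-reading logic described above can be sketched in a few lines of Python. This is only an illustration of the workflow as described, not Kheiron's implementation: the names `Outcome`, `double_reading` and `first_is_ai` are hypothetical, and each read is simplified to a yes/no recall decision.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    recall: bool       # final decision: call the woman back or not
    human_reads: int   # number of reads performed by human radiologists


def double_reading(first_read: bool, second_read: bool,
                   arbitrator_read: bool, first_is_ai: bool = False) -> Outcome:
    """Double reading with arbitration (hypothetical sketch).

    Each read is True for "recall" and False for "no recall".
    In practice the arbitrator only reads when the two readers disagree;
    here the arbitrator's opinion is passed in up front for simplicity
    and is only used (and counted) on disagreement.
    """
    # If the first reader is the AI, only the second read costs human time.
    human_reads = 1 if first_is_ai else 2
    if first_read == second_read:
        return Outcome(recall=first_read, human_reads=human_reads)
    # Disagreement: a human arbitrator makes the final call.
    return Outcome(recall=arbitrator_read, human_reads=human_reads + 1)


# Two humans agree: two human reads.
print(double_reading(False, False, arbitrator_read=False))
# AI replaces the first reader and agrees with the human: one human read.
print(double_reading(False, False, arbitrator_read=False, first_is_ai=True))
```

Because the two readers agree on the vast majority of cases, most mammograms follow the first branch, which is where replacing one reader removes roughly half of the human reads.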

As another possibility, we have also identified that an additional arbitration step (see top right of Figure 1) can increase the cancer detection rate: we are seeing clinical evidence for this in Hungary. That does not reduce the current standard of care: there are still two human radiologists and an arbitrator. It simply adds Mia as an additional step to help identify some cancers earlier.

Clinical practice

Getting AI into clinical practice is not at all easy. Mia is already being used in Aberdeen in a service evaluation. Our Libra study is one of the first UK prospective studies, and Mia is already being deployed in 15 screening units. By the end of 2023, we should be screening half a million women per year in the UK.

Generalisability is a key issue: making sure that AI works for every woman everywhere in an unbiased way. AI models are built on existing datasets, but the aim is that they should work well on future data too. If the AI is trained on biased data, or on a small dataset that is not representative of the general population, then it will not generalise.

Figure 1. Options for incorporating AI into the breast-screening process

To address this, Kheiron decided early on to build one of the largest and most diverse datasets in this field. Our data comes from the UK, Europe, Asia, South America and North America, incorporating different genetic makeups, different hardware devices with different post-processing software, and even different screening programmes. We demonstrated our results on a very large retrospective study of 275,000 cases.

The purpose is to build AI that works for every woman everywhere. One area of focus for us is breast density, which is one of the risk factors. African American women have a slightly higher risk of being diagnosed with breast cancer, and we are working with Emory University, which has a large African American population. We are now testing how well the AI generalises and the results so far are looking good.

AI development is, however, just the beginning of the story. What happens next? AI and data are not stationary; indeed, data changes all the time. Those changes may affect how the AI performs. That is not necessarily a problem provided you monitor it and take appropriate action.

Early in the Covid pandemic, the NHS decided to stop screening for breast cancer. A couple of months later, when screening programmes reopened, we noticed that our AI started behaving differently, calling back more women. It was not just us: the radiologists were doing the same. That was completely natural because the cancers had had more time to grow and the distribution had changed. The cancers were now bigger, and we could detect that with our monitoring.

An AI system is more than just an AI model. It consists of the model itself and the hardware to run inference, but also modules that help determine whether the incoming data is representative and whether the outputs are in line with expectations. Other modules can help detect drift or bias, or check the quality of the image.

As an example, one of our monitoring tools noticed that one hospital suddenly reported a completely different number of call-backs for women. It turned out the hardware vendor for this hospital had upgraded their post-processing software, but they had not told the radiologist or the hospital. We told the radiologist and were able to recalibrate the model quite quickly. Being able to detect changes to the data or the model in real time is absolutely critical to the safe deployment of AI.
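A minimal sketch of this kind of call-back monitoring is shown below. It is an assumption about how such a check could work, not a description of Kheiron's monitoring tools: it compares a recent window of call-back decisions at one site against a baseline using a standard two-proportion z-test. The function name `recall_rate_shift`, the threshold of 3.0 and all the counts are hypothetical and purely illustrative.

```python
import math


def recall_rate_shift(baseline_recalls: int, baseline_total: int,
                      recent_recalls: int, recent_total: int,
                      z_threshold: float = 3.0) -> tuple[float, bool]:
    """Compare a recent call-back rate with a baseline rate.

    Returns (z, flagged), where z is the two-proportion z-statistic and
    flagged is True when |z| exceeds the (assumed) alert threshold.
    """
    p_base = baseline_recalls / baseline_total
    p_recent = recent_recalls / recent_total
    # Pooled call-back rate under the hypothesis that nothing has changed.
    pooled = (baseline_recalls + recent_recalls) / (baseline_total + recent_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / baseline_total + 1 / recent_total))
    if se == 0:
        # All cases recalled or none recalled in both windows: no detectable shift.
        return 0.0, False
    z = (p_recent - p_base) / se
    return z, abs(z) > z_threshold


# Illustrative numbers only: 4% baseline call-back rate, 8% in the recent window.
z, flagged = recall_rate_shift(400, 10_000, 80, 1_000)
print(f"z = {z:.2f}, flagged = {flagged}")
```

A flag from a check like this does not say why the rate moved. As in the hospital example, it only prompts someone to investigate whether the population, the imaging hardware or its post-processing software has changed.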

Bringing AI into clinical practice is hard. There is no clear path to adoption at the moment. This is not just a technical problem to be solved: there are questions about who pays, who decides which software is used in population-based screening programmes, and so on. These challenges still need to be addressed and resolved.
