Dealing with technical artefacts in building AI models in pathology

October 24, 2024

When building AI models in histopathology, particularly for working with whole slide images (WSIs), things can get messy, quite literally. Pathologists are well acquainted with artefacts such as air bubbles, tissue folds, and staining mishaps, and through experience they learn to look past them so that they do not interfere with diagnosis. While these imperfections might seem like minor nuisances, they can cause big headaches for AI models trying to diagnose diseases like prostate cancer.

What kinds of technical artefacts need to be considered?

There are several types of artefacts in histopathology, arising at every stage from tissue preparation through to slide imaging. Even with qualified and proficient histopathology technicians responsible for slide preparation, imperfections still appear on almost every slide, and pathologists routinely filter them out when completing their cases. Our model was trained using WSIs containing over 20 different tissue sectioning, staining and imaging artefacts. Some examples are shown below:

Air bubble
Diathermy artefact
Calcified tissue
Hematoxylin and Eosin under-staining
Hematoxylin and Eosin over-staining
Tissue processing artefact
Tissue fold
Tissue tear

Why are these technical artefacts a problem for AI?

AI, as impressive as it can be at times, can’t simply “brush off” a tissue tear or pretend an air bubble isn’t there. Like a person trying to drive through fog, it can struggle to distinguish the critical signposts (cancerous cells) from the traffic hazards (air bubbles). If these imperfections are not addressed, they can result in weaker classification performance.

A common approach in model development is to train on a “clean” dataset, where WSIs are quality screened so that the slides used for training are free of artefacts. However, in the real world of pathology, almost every slide has some degree of imperfection. So, if we only trained the AI on perfect slides, the model might struggle to perform consistently well in practice.

How did we tackle this challenge?

At Franklin.ai, instead of avoiding slides with artefacts, we embraced them. Our AI had a crash course in over 20 different types of technical imperfections, ranging from mild hiccups like staining variations to major offenders like tissue folds and calcifications. If an artefact could disrupt a diagnosis, we made sure our AI met it head-on.

About 71% of the slides we tested on had one or more of these imperfections, making our AI model feel right at home in a real-world lab where the slides are, let’s say, less than Instagram-ready. And we didn’t stop there: we also used techniques like data augmentation and normalization during training, helping the AI focus on what really mattered while ignoring the visual noise around it. A simplified sketch of what such an augmentation step can look like is shown below.
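To make that concrete, here is a minimal sketch of an augmentation step for tissue tiles. This is not our production pipeline; the library choice (torchvision) and every parameter value are illustrative assumptions, chosen only to show how flips, colour jitter, blur and normalization can expose a model to the kinds of variation it will meet on real slides.

```python
# Illustrative sketch only: a tile-level augmentation pipeline of the kind
# used to make models more tolerant of staining and imaging variation.
# Library (torchvision) and all parameter values are assumptions, not our
# actual configuration.
import torchvision.transforms as T

tile_augmentations = T.Compose([
    T.RandomHorizontalFlip(),                   # tissue orientation is arbitrary
    T.RandomVerticalFlip(),
    T.ColorJitter(brightness=0.2,               # mimic under-/over-staining
                  contrast=0.2,
                  saturation=0.2,
                  hue=0.05),
    T.GaussianBlur(kernel_size=3),              # mimic mild focus/scanning blur
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],     # channel-wise normalization
                std=[0.229, 0.224, 0.225]),
])

# Applied to each tissue tile at training time, for example:
# augmented_tile = tile_augmentations(tile_image)  # tile_image: an RGB PIL image
```

The idea is simple: by randomly perturbing colour, sharpness and orientation during training, the model learns that these properties are not diagnostic, so it leans on tissue morphology rather than slide-to-slide appearance.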

What were the results?

The AI performed strongly even when analysing slides with technical artefacts. Our model achieved AUC scores in the high 0.90s for most of the critical clinical markers we tested, showing it could handle whatever we threw at it, whether air bubbles, staining variations, or tissue tears. In other words, it didn’t sweat the small stuff (just like a real pathologist wouldn’t).
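For readers less familiar with the metric: AUC (area under the receiver operating characteristic curve) summarises how well a model’s scores separate positive from negative cases, where 1.0 means perfect separation and 0.5 is no better than chance. The snippet below is purely illustrative, using scikit-learn with made-up labels and scores rather than any of our study data.

```python
# Illustrative only: how an AUC score is typically computed with scikit-learn.
# The labels and scores below are invented for the example.
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]                              # 1 = marker present
y_score = [0.10, 0.45, 0.80, 0.92, 0.25, 0.38, 0.88, 0.40]      # model confidence

print(roc_auc_score(y_true, y_score))  # ~0.88 here; 1.0 = perfect, 0.5 = chance
```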

No doubt there will be technical artefacts in the real world that influence AI performance (and even human experts!), so pathologists should always be mindful of this when using diagnostic support tools. However, our analytical performance sub-study into technical artefacts should give some peace of mind that our tools have been designed from the ground up to be robust under standard clinical conditions.