Rare and aggressive skin cancers were incorrectly classified as low-risk by machine learning models designed to detect skin cancer, according to a new study presented at the European Academy of Dermatology and Venereology (EADV) Congress 2021.

The research suggests that making apps based on such models available directly to the public without transparency on performance metrics for rare but potentially life-threatening skin cancers is ethically questionable.

Researchers in London focused on two types of skin cancer, Merkel cell carcinoma (MCC) and amelanotic melanoma, both of which are rare but particularly aggressive cancers that tend to grow fast and require early treatment. They created a dataset of 116 images of these rare cancers and of two benign lesions, seborrhoeic keratoses and haemangiomas, and assessed these images with two machine learning models.

The first model studied (Model 1) was a certified medical device, sold directly to the public via the App Store and advertised as being able to diagnose 95% of skin cancers. The second model (Model 2) was available for research purposes only and was used as a reference.

The results showed that Model 1 incorrectly classified 17.9% of MCCs and 22.9% of amelanotic melanomas as low-risk. Conversely, 62.2% of benign lesions were classified as high-risk. For detecting malignancy, Model 1’s sensitivity was 79.4% [95% confidence interval (CI) 69.3-89.4%] and its specificity was 37.7% [95% CI 24.7-50.8%]. For Model 2, MCC did not appear among the top five diagnoses for any of the 28 MCC images analysed, raising the possibility that the model had not been trained to recognise this disease class.
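To illustrate how figures of this kind are typically derived, here is a minimal Python sketch that computes sensitivity and specificity with a normal-approximation (Wald) 95% confidence interval. The article gives only percentages, the 116-image total and the 28 MCC images; every other count below is back-calculated from those percentages and is an assumption, as are the choice of interval method and the function name. It is not the study's own analysis.

```python
import math


def proportion_with_wald_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) for a Wald (normal-approximation) interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because the Wald interval can overshoot near the extremes.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# ASSUMED counts, inferred from the reported percentages (not stated in the article).
malignant_total = 63    # 28 MCC images (reported) + 35 amelanotic melanomas (assumed)
malignant_flagged = 50  # malignant lesions rated high-risk (assumed)
benign_total = 53       # 116 images in total minus 63 malignant (assumed)
benign_cleared = 20     # benign lesions rated low-risk (assumed)

# Sensitivity: share of malignant lesions correctly rated high-risk.
sensitivity = proportion_with_wald_ci(malignant_flagged, malignant_total)
# Specificity: share of benign lesions correctly rated low-risk.
specificity = proportion_with_wald_ci(benign_cleared, benign_total)

for name, (p, lo, hi) in [("sensitivity", sensitivity), ("specificity", specificity)]:
    print(f"{name}: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Under these assumed counts the point estimates land close to the reported 79.4% and 37.7%. A Wald interval is only one option; with samples this small, a Wilson or exact interval would be a reasonable alternative and would give slightly different bounds.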

Questions over the safety of artificial intelligence models

The high false-positive rate of Model 1 could have negative consequences at both a personal and a societal level. The results also raise a broader question about the safety of other artificial intelligence (AI) models for detecting skin cancer that are available on the market.

Lloyd Steele, lead author of the study at the Blizard Institute, Queen Mary University of London, UK, explains: “In order to improve, machine learning model evaluations should consider the spectrum of diseases that will be seen in practice. At the moment, most of the performance of those models is driven by the imaging data available, which is particularly scarce when it comes to rare skin cancers.”

Global collaboration between research groups and hospitals could be a step towards closing the gap in skin cancer imaging data, which is crucial for high-performing machine learning models.

Marie-Aleth Richard, EADV Board Member and Professor at the University Hospital of La Timone, Marseille, said: “The number of skin cancer detection apps available for consumer use is growing, but as demonstrated in this research, there must be more transparency around the safety and efficacy of these apps. Furthermore, such devices detect only what they are shown to analyse and do not make systematic analysis of all the skin’s surface. Failure to be transparent could put lives at risk.”