Melanoma Detection with Uncertainty Quantification

ArXiv Preprint · View Code · Live Demo
The Core Philosophy

The Problem: Deep learning models are powerful but often "confidently incorrect." In melanoma detection, a confident false negative (telling a patient they are fine when they have cancer) is catastrophic.

Our Solution: It is safer for an AI to say "I don't know" than to guess wrong. We implemented an Entropy-Based Uncertainty Quantification framework. By measuring the model's confusion, we identify ambiguous cases and refer them to human experts instead of making a risky automated guess.
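The rejection rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the threshold value of 0.5 nats is purely for demonstration:

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy of a class-probability vector (in nats)."""
    probs = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(probs * np.log(probs), axis=-1)

def classify_or_refer(probs, threshold=0.5):
    """Return the predicted class index, or None to refer the case to a human expert."""
    if predictive_entropy(probs) > threshold:
        return None  # too uncertain: defer to a dermatologist
    return int(np.argmax(probs))

# A confident prediction is kept; an ambiguous one is referred.
print(classify_or_refer(np.array([0.97, 0.03])))  # low entropy -> 0
print(classify_or_refer(np.array([0.55, 0.45])))  # high entropy -> None
```

In practice the threshold is tuned on a validation set to trade coverage (fraction of cases the model answers) against accuracy on the cases it keeps.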

The Result: By rejecting uncertain predictions, we improved accuracy from 93.2% → 97.8% and reduced clinical misdiagnoses by 40.5%.

1. Methodology: Diversity & Calibration

To build a robust generalist model, we combined 10 open-source datasets (ISIC '16–'20, 7-point, PH2, etc.) and trained 24 different CNN architectures.

We then used Calibration Curves and Expected Calibration Error (ECE) to select the "Reject Threshold": the uncertainty level above which the model's predictions can no longer be trusted.
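ECE is typically estimated by binning predictions by confidence and averaging the gap between confidence and accuracy in each bin. A minimal sketch of that standard estimator (15 bins is a common convention, not necessarily the paper's choice):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Binned ECE: weighted mean |accuracy - confidence| over confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)  # predictions in this bin
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the bin's share of predictions
    return ece
```

A perfectly calibrated model (e.g. 80% accuracy among predictions made with 80% confidence) scores an ECE of 0, which is why the combined-dataset models that hug the diagonal in Figure 4 can be trusted for entropy-based rejection.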

Figure 4: Calibration Curves
Figure 4 Analysis: Models trained on combined datasets (right) align much more closely with the ideal diagonal ($y=x$) than models trained on single datasets (left). This lets us trust the model's probability scores when computing uncertainty.
2. Impact: Reducing Misdiagnoses

How much safer is the model? We compared the number of False Positives (FP) and False Negatives (FN) before and after applying our uncertainty rejection.

Figure 5 Data:
  • Overall: We prevented 353 misdiagnoses across all test sets.
  • Kaggle Dataset: We reduced False Negatives (missing a cancer diagnosis) by 81% (from 177 to 44).
  • Clinical Safety: The red bars (False Negatives) are significantly lower in the "With Rejection" group, minimizing the risk of missing malignant tumors.
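The before/after comparison above amounts to counting FP and FN twice: once over all test cases, and once over only the cases the model keeps after rejection. A toy sketch with synthetic numbers (the data here are illustrative, not the paper's test sets):

```python
import numpy as np

def misdiagnoses(y_true, y_pred, entropy=None, threshold=None):
    """Count (FP, FN), optionally after rejecting high-entropy cases."""
    keep = np.ones_like(y_true, dtype=bool)
    if entropy is not None:
        keep = entropy <= threshold  # rejected cases go to human review instead
    yt, yp = y_true[keep], y_pred[keep]
    fp = int(np.sum((yp == 1) & (yt == 0)))  # benign called malignant
    fn = int(np.sum((yp == 0) & (yt == 1)))  # malignant called benign
    return fp, fn

# Tiny synthetic example: the two errors are also the two most uncertain cases.
y_true  = np.array([1, 1, 0, 0, 1])
y_pred  = np.array([1, 0, 1, 0, 1])   # one FN (idx 1), one FP (idx 2)
entropy = np.array([0.1, 0.65, 0.7, 0.2, 0.15])
print(misdiagnoses(y_true, y_pred))                          # (1, 1)
print(misdiagnoses(y_true, y_pred, entropy, threshold=0.5))  # (0, 0)
```

The improvement reported in Figure 5 relies on exactly this effect: errors concentrate among high-entropy cases, so rejecting those cases removes a disproportionate share of the misdiagnoses.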
3. System Overview

Our pipeline integrates heterogeneous data, standardizes it, and adds a "Human-in-the-Loop" safety valve for uncertain cases.

Figure 1: System Overview
Figure 1: The workflow consists of (1) Data Integration, (2) Melanoma Recognition (CNNs), (3) Uncertainty Analysis (Entropy), and (4) Integration, where "Uncertain" cases are filtered out for human review.
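The four stages of Figure 1 can be expressed as a thin pipeline. All component names here are placeholders standing in for the paper's preprocessing, trained CNNs, and entropy computation, not its actual API:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    label: str      # "benign", "malignant", or "refer"
    entropy: float

def run_pipeline(image, model, preprocess, entropy_fn, threshold):
    """(1) standardize input, (2) CNN prediction, (3) entropy, (4) route."""
    x = preprocess(image)      # stage 1: standardize heterogeneous data
    probs = model(x)           # stage 2: CNN class probabilities [benign, malignant]
    h = entropy_fn(probs)      # stage 3: uncertainty analysis
    if h > threshold:          # stage 4: uncertain cases go to human review
        return Decision("refer", h)
    label = "malignant" if probs[1] >= probs[0] else "benign"
    return Decision(label, h)

# Toy usage with stand-in components (not the paper's trained models):
decision = run_pipeline(
    image=None,
    model=lambda x: [0.92, 0.08],
    preprocess=lambda img: img,
    entropy_fn=lambda p: 0.28,
    threshold=0.5,
)
print(decision.label)  # "benign"
```

Keeping the rejection logic as a separate final stage means the CNN ensemble can be retrained or swapped without touching the human-in-the-loop safety valve.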