17 models. 10,200 trials. One question.

Aggregate evaluation, multimodal LLMs on dermoscopy.
JAAD 2025 (in press)

Question · Method · Limit

Can a carefully prompted multimodal LLM recognize dermatoscopic features as well as a board-certified dermatologist? Across 6 prompting arms and 8 diagnoses, the short answer is: not yet, but the gap is closing.

The dataset spans 8 dermatologic diagnoses, tested across 17 multimodal LLMs from OpenAI and Google. The 6 prompting strategies cover zero-shot label list, few-shot exemplars, primer board, anti-anchoring, and free-form variants.


Method: Aggregate accuracy, sensitivity, and specificity across all model × prompt × diagnosis combinations.
Citation: Tadros AR, Zhuo W, Fathy RA, et al. JAAD 2025 (in press).
Disclaimer: Research dashboard. Not medical advice. Not for clinical use.
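
As a rough illustration of the metrics named under Method, the sketch below computes accuracy, sensitivity, and specificity for one diagnosis in a one-vs-rest fashion. It is only a minimal sketch under stated assumptions: the function name `one_vs_rest_metrics`, the `(true_label, predicted_label)` trial format, and the placeholder labels "A", "B", "C" are hypothetical and not taken from the study, which may aggregate its results differently.

```python
def one_vs_rest_metrics(trials, diagnosis):
    """Compute one-vs-rest metrics for a single diagnosis.

    trials: iterable of (true_label, predicted_label) pairs (assumed format).
    diagnosis: the label treated as the positive class.
    Returns a dict; a metric is None when its denominator is zero.
    """
    tp = fp = tn = fn = 0
    for truth, pred in trials:
        if truth == diagnosis and pred == diagnosis:
            tp += 1  # positive case correctly identified
        elif truth == diagnosis:
            fn += 1  # positive case missed
        elif pred == diagnosis:
            fp += 1  # negative case wrongly labeled positive
        else:
            tn += 1  # negative case correctly left alone

    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Placeholder labels only; these are not the study's diagnoses or data.
example_trials = [
    ("A", "A"), ("A", "B"),
    ("B", "B"), ("B", "A"), ("B", "B"),
    ("C", "C"),
]
metrics_for_a = one_vs_rest_metrics(example_trials, "A")
```

Running the same function once per model × prompt × diagnosis group, then summarizing the resulting values, would be one way to arrive at aggregate figures of the kind the dashboard reports.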
