ChatGPT's Limitations in Athlete ECG Interpretation: Evidence from a Multicenter Diagnostic Study.

Original abstract

Artificial intelligence (AI) has shown promise in the interpretation of electrocardiograms (ECGs) using signal-based deep learning models. In parallel, large language models (LLMs) have gained increasing visibility in clinical practice, including exploratory applications in ECG analysis. Whether a general-purpose LLM can meaningfully identify cardiovascular disease from athletes' ECGs during preparticipation screening (PPS) remains unknown. We aimed to evaluate the diagnostic performance of a general-purpose LLM for this task. In this multicentre diagnostic accuracy study, we evaluated a commercially available LLM (ChatGPT, version 5) in 2950 competitive athletes undergoing PPS. All athletes underwent resting 12-lead ECG, with second- and third-line investigations performed when clinically indicated. The reference outcome was cardiovascular disease confirmed after full diagnostic work-up (n = 450, 15.3%). For each ECG, the LLM generated a numeric score (0-100) representing the inferred likelihood of underlying disease, using a standardized prompt and no task-specific fine-tuning. Discriminative performance was assessed using receiver operating characteristic (ROC) analysis. Misclassification patterns were analysed according to the International ECG Criteria. GPT-derived scores demonstrated a marked floor effect, with a median value of 0 (IQR 0-2) in both diseased and non-diseased athletes and substantial overlap between groups. The area under the ROC curve was 0.52 (95% CI 0.49-0.55), indicating performance close to random classification. At the Youden-derived threshold, 79% of athletes with confirmed disease were incorrectly classified as negative. False-negative cases were predominantly characterized by borderline ECG patterns (82%), and a substantial number of red-flag ECG abnormalities were also missed. In this PPS cohort, a general-purpose LLM used in a naïve configuration showed no clinically meaningful ability to distinguish athletes with cardiovascular disease from those without on the basis of their ECGs.
Without task-specific training or domain adaptation, such models should not be used for diagnostic triage in athlete screening.
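
To make the reported metrics concrete, the following minimal Python sketch shows how an area under the ROC curve and a Youden-derived threshold can be computed from per-ECG scores. It is only an illustration: the scores are invented toy values chosen to mimic a floor effect (median 0 in both groups), not data from the study, and the function names `roc_auc` and `youden_threshold` are hypothetical helpers, not the authors' analysis code.

```python
from statistics import median


def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a diseased case scores higher than a
    non-diseased case, counting ties as one half (Mann-Whitney form)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def youden_threshold(scores_pos, scores_neg):
    """Return (threshold, sensitivity, specificity) maximizing
    J = sensitivity + specificity - 1.
    A case is called positive when its score is >= threshold."""
    best = None
    for t in sorted(set(scores_pos) | set(scores_neg)):
        sens = sum(s >= t for s in scores_pos) / len(scores_pos)
        spec = sum(s < t for s in scores_neg) / len(scores_neg)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Toy scores (assumption, NOT study data): most values sit at 0 in both groups.
diseased = [0, 0, 0, 0, 0, 0, 0, 5, 5, 10]
healthy = [0, 0, 0, 0, 0, 0, 0, 0, 0, 5]

auc = roc_auc(diseased, healthy)
threshold, sens, spec = youden_threshold(diseased, healthy)

print("median score, diseased:", median(diseased))
print("median score, healthy:", median(healthy))
print(f"AUC: {auc:.3f}")
print(f"Youden threshold: {threshold}")
print(f"sensitivity: {sens:.2f}, false-negative rate: {1 - sens:.2f}")
print(f"specificity: {spec:.2f}")
```

With scores piled at zero in both groups, most diseased cases tie with non-diseased ones, so the AUC stays near 0.5 and even the best available threshold leaves most diseased cases below the cut-off. This is the same qualitative pattern the abstract describes.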

How to cite:
Palermi S, Vecchiato M, Iacovone TR, Anselmino M, Adorisio R, Biffi A, Borrelli F, Brugin E, Cantarutti N, Cavarretta E, Cominacini M, Corsi M, D'Ascenzi F, De Feo V, Di Gioia G, Dorelli G, Foccardi G (2026). ChatGPT's Limitations in Athlete ECG Interpretation: Evidence from a Multicenter Diagnostic Study. Journal of Cardiovascular Development and Disease.
DOI: 10.3390/jcdd13050191
PMID: 42188077
