Vetrix Anesthesiology
Citation: Zhao Q, Zhang Y, An R, Yi B, Huang G. An interpretable machine learning model for predicting emergence agitation in children: a multicenter development and validation study. BMC Anesthesiol. 2026;[epub ahead of print].

This multicenter retrospective study from two Chinese hospitals developed and externally validated several machine learning models to predict emergence agitation in children after general anesthesia. Using five routine perioperative predictors, a compact multilayer perceptron model showed excellent discrimination internally but only moderate discrimination and suboptimal calibration in an external cohort. Because variable selection, model choice, and key predictors are highly data- and center-dependent, the tool is best seen as a proof of concept rather than a ready-made decision aid for routine pediatric anesthesia practice.

Study at a glance

- Design and setting: Retrospective multicenter prediction-model study using electronic records from two tertiary hospitals in China: Center I (Third Affiliated Hospital of Zunyi Medical University) for model development and internal validation, and Center II (First Affiliated Hospital of Army Medical University) as an independent external validation cohort. Children aged 3–12 years with American Society of Anesthesiologists physical status I–II undergoing elective surgery under general anesthesia were included.
- Participants and primary outcome: A total of 445 pediatric patients were analyzed (321 in the development center, 124 in the external validation center). In Center I, emergence agitation occurred in 95 children, an incidence of 29.6%; in Center II, the reported incidence was about 25.8% (32 or 33 events; the count is reported inconsistently across sections of the paper).
The primary outcome was emergence agitation within 30 minutes after post-anesthesia care unit admission, defined as a Pediatric Anesthesia Emergence Delirium (PAED) score >10, assessed after pain had been evaluated and treated using the Face, Legs, Activity, Cry, Consolability (FLACC) scale to limit misclassification from pain-related distress.
- Predictors and main model: From 63 perioperative variables, the authors used univariable screening followed by least absolute shrinkage and selection operator (LASSO) regression to select predictors, then trained six algorithms (logistic regression, support vector machine, multilayer perceptron, random forest, extreme gradient boosting, and Light Gradient Boosting Machine [LightGBM]). The final five-variable clinical model used parental educational level, preoperative alanine aminotransferase (ALT), postoperative patient-controlled analgesia (PCA) pump use, postoperative antagonist (reversal agent) use, and extubation suctioning frequency. A multilayer perceptron (MLP) was chosen as the primary clinical model because it performed best in external validation; a support vector machine was used for detailed interpretability analyses with SHAP values.
- Key performance results: In internal holdout validation at Center I, discrimination was high across models, with area under the receiver operating characteristic curve (AUC) around 0.87–0.92; the support vector machine achieved AUC 0.918 (95% confidence interval 0.844–0.973, Brier score 0.098), and logistic regression achieved AUC 0.915. In external validation at Center II, performance dropped noticeably. The primary clinical MLP model achieved AUC 0.705 (95% confidence interval 0.59–0.804) with a Brier score of 0.190, reflecting only moderate discrimination and imperfect calibration; other models performed worse (e.g., logistic regression AUC 0.587, LightGBM AUC 0.494). No decision-curve or net benefit analyses were reported.
- Risk of bias and applicability: Using a structured prediction-model appraisal, overall risk of bias was judged high, mainly due to the analysis domain. Concerns include data-driven predictor selection from many candidates, testing and informally selecting among six algorithms, limited optimism correction (formally reported only for the support vector machine), and a relatively small external validation cohort with a different case mix. Applicability is further constrained because three of the five final predictors (PCA pump use, antagonist use, suctioning frequency) are early postoperative management decisions that vary by center and over time, rather than stable baseline risk factors.
- Practice implications: For practicing clinicians, this study underscores that emergence agitation after pediatric anesthesia is common and potentially predictable, and it highlights perioperative features (such as parental education, preoperative ALT, and postoperative analgesia and reversal strategies) that may correlate with risk. However, the current models should not be used as stand-alone decision aids: external performance is only moderate, calibration is imperfect, and the models depend strongly on center-specific management choices. At present, these tools are best viewed as research prototypes and a stimulus for locally developed and rigorously validated prediction models, rather than as ready-to-implement clinical calculators.
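
The headline figures in this summary can be sanity-checked with a short Python sketch. It recomputes the reported incidences from the event counts, derives the Brier score that a non-informative model (one predicting the cohort prevalence for every child) would obtain, and shows how AUC and Brier score are calculated from predicted risks. The function names and the five-patient toy data are illustrative assumptions, not material from the paper, and the prevalence-only reference is a standard comparison that the summary does not say the authors reported.

```python
from typing import Sequence


def incidence(events: int, total: int) -> float:
    """Proportion of patients with the outcome."""
    return events / total


def reference_brier(prevalence: float) -> float:
    """Brier score of a model that predicts the prevalence for every patient."""
    return prevalence * (1.0 - prevalence)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random event is ranked above a random non-event.

    Ties count as half a win (Mann-Whitney formulation of the AUC).
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Event counts and cohort sizes as quoted in the summary.
dev_incidence = incidence(95, 321)      # Center I
ext_incidence_32 = incidence(32, 124)   # Center II, if 32 events
ext_incidence_33 = incidence(33, 124)   # Center II, if 33 events

# Brier score a prevalence-only prediction would obtain in Center II.
ext_reference_brier = reference_brier(ext_incidence_32)

# Hypothetical toy data (NOT from the study) to illustrate the two metrics.
toy_outcomes = [1, 0, 1, 0, 0]
toy_risks = [0.8, 0.3, 0.4, 0.5, 0.1]
toy_auc = auc(toy_outcomes, toy_risks)
toy_brier = brier_score(toy_outcomes, toy_risks)

print(f"Center I incidence: {dev_incidence:.1%}")
print(f"Center II incidence (32 events): {ext_incidence_32:.1%}")
print(f"Center II incidence (33 events): {ext_incidence_33:.1%}")
print(f"Center II prevalence-only Brier: {ext_reference_brier:.3f}")
print(f"Toy AUC: {toy_auc:.3f}, toy Brier: {toy_brier:.3f}")
```

With 32 events among 124 children, the reported 25.8% incidence is reproduced, and the prevalence-only reference Brier score comes out at about 0.191. The external MLP Brier score of 0.190 is therefore close to what a constant prediction would achieve, which fits the summary's reading of moderate discrimination with imperfect calibration, although this is only a rough comparison.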