An interpretable machine learning model for predicting emergence agitation in children: a multicenter development and validation study

Description

Citation: Zhao Q, Zhang Y, An R, Yi B, Huang G. An interpretable machine learning model for predicting emergence agitation in children: a multicenter development and validation study. BMC Anesthesiol. 2026;[epub ahead of print].

This multicenter retrospective study from two Chinese hospitals developed and externally validated several machine learning models to predict emergence agitation in children after general anesthesia. Using five routine perioperative predictors, a compact multilayer perceptron model showed excellent discrimination internally but only moderate performance and suboptimal calibration in an external cohort. Because variable selection, model choice, and key predictors are highly data- and center-dependent, the tool is best seen as a proof of concept rather than a ready-made decision aid for routine pediatric anesthesia practice.

Study at a glance

- Design and setting: Retrospective multicenter prediction-model study using electronic records from two tertiary hospitals in China: Center I (Third Affiliated Hospital of Zunyi Medical University) for model development and internal validation, and Center II (First Affiliated Hospital of Army Medical University) as an independent external validation cohort. Children aged 3–12 years with American Society of Anesthesiologists physical status I–II undergoing elective surgery under general anesthesia were included.
- Participants and primary outcome: A total of 445 pediatric patients were analyzed (321 in the development center, 124 in the external validation center). In Center I, emergence agitation occurred in 95 children, an incidence of 29.6%; in Center II, the reported incidence was about 25.8% (32 or 33 events; the count differs slightly across sections). The primary outcome was emergence agitation within 30 minutes after post-anesthesia care unit admission, defined as a Pediatric Anesthesia Emergence Delirium (PAED) score >10, after pain was assessed and treated using the FLACC scale to limit misclassification from pain-related distress.
- Predictors and main model: From 63 perioperative variables, the authors used univariable screening followed by least absolute shrinkage and selection operator (LASSO) regression to select predictors, then trained six algorithms (logistic regression, support vector machine, multilayer perceptron, random forest, extreme gradient boosting, and Light Gradient Boosting Machine). The final five-variable clinical model used parental educational level, preoperative alanine aminotransferase (ALT), postoperative patient-controlled analgesia (PCA) pump use, postoperative antagonist (reversal agent) use, and extubation suctioning frequency. A multilayer perceptron (MLP) was chosen as the primary clinical model because it performed best in external validation; a support vector machine was used for detailed interpretability analyses with SHAP values.
- Key performance results: In internal holdout validation at Center I, discrimination was high across models, with area under the receiver operating characteristic curve (AUC) around 0.87–0.92; the support vector machine achieved AUC 0.918 (95% confidence interval 0.844–0.973, Brier score 0.098), and logistic regression AUC 0.915. In external validation at Center II, performance dropped noticeably. The primary clinical MLP model achieved AUC 0.705 (95% confidence interval 0.59–0.804) with a Brier score of 0.190, reflecting only moderate discrimination and imperfect calibration; other models performed worse (e.g., logistic regression AUC 0.587, LightGBM AUC 0.494). No decision-curve or net benefit analyses were reported.
- Risk of bias and applicability: Using a structured prediction-model appraisal, overall risk of bias was judged high, mainly due to the analysis domain. Concerns include data-driven predictor selection from many candidates, testing and informally selecting among six algorithms, limited optimism correction (formally reported only for the support vector machine), and a relatively small external validation cohort with a different case mix. Applicability is further constrained because three of the five final predictors (PCA pump use, antagonist use, suctioning frequency) are early postoperative management decisions that vary by center and over time, rather than stable baseline risk factors.
- Practice implications: For practicing clinicians, this study underscores that emergence agitation after pediatric anesthesia is common and potentially predictable, and it highlights perioperative features (such as parental education, preoperative ALT, and postoperative analgesia and reversal strategies) that may correlate with risk. However, the current models should not be used as stand-alone decision aids: external performance is only moderate, calibration is imperfect, and the models depend strongly on center-specific management choices. At present, these tools are best viewed as research prototypes and a stimulus for locally developed and rigorously validated prediction models, rather than as ready-to-implement clinical calculators.
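One way to read the reported Brier scores is against a no-skill reference: a model that assigns every child the cohort prevalence p has an expected Brier score of p(1 − p). The minimal Python sketch below recomputes the reported incidences and that reference. It assumes the internal holdout set had roughly the same incidence as Center I overall (the holdout incidence is not given here) and uses 32 events for Center II; the function names are illustrative and do not come from the paper.

```python
def incidence(events, n):
    """Proportion of children with emergence agitation."""
    return events / n


def no_skill_brier(prevalence):
    """Expected Brier score when every child is assigned the cohort prevalence."""
    return prevalence * (1.0 - prevalence)


def scaled_brier(brier, prevalence):
    """1 = perfect predictions, 0 = no better than predicting the prevalence."""
    return 1.0 - brier / no_skill_brier(prevalence)


# Event counts and cohort sizes as reported in the summary.
center1 = incidence(95, 321)   # development center
center2 = incidence(32, 124)   # external center; 33 events is also reported

# ASSUMPTION: the internal holdout incidence equals the overall Center I incidence.
internal_svm = scaled_brier(0.098, center1)   # SVM, internal holdout
external_mlp = scaled_brier(0.190, center2)   # MLP, external validation

print(f"Center I incidence:  {center1:.1%}")
print(f"Center II incidence: {center2:.1%}")
print(f"No-skill Brier, Center I:  {no_skill_brier(center1):.3f}")
print(f"No-skill Brier, Center II: {no_skill_brier(center2):.3f}")
print(f"Scaled Brier, SVM internal: {internal_svm:.2f}")
print(f"Scaled Brier, MLP external: {external_mlp:.2f}")
```

Under these assumptions, the external Brier score of 0.190 sits very close to the no-skill reference of about 0.19, whereas the internal score of 0.098 is roughly half its reference of about 0.21. This is only a rough cross-check and does not replace a formal calibration assessment.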

Outcome Differences Between General and Neuraxial Anesthesia for Hip Fracture by Frailty and Age in the Elderly: A Retrospective Cohort Study

Citation: Giannakis P, Restrepo M, Stone AB, Zhuang ST, Wang J, Cozowicz C, et al. Outcome Differences Between General and Neuraxial Anesthesia for Hip Fracture by Frailty and Age in the Elderly: A Retrospective Cohort Study. Anesth Analg. 2026;XXX(00):00–300. doi:10.1213/ANE.0000000000008062

Using a large United States hospital claims database, Giannakis and colleagues compared neuraxial versus general anesthesia for more than six hundred thousand hip fracture surgeries across age and frailty strata. Neuraxial anesthesia was associated with very small differences in in-hospital mortality and a composite of major complications, a clearer reduction in high opioid use, and slightly more discharges home, but also small increases in some complications and intensive care admissions. Because anesthesia type was not randomized and key clinical confounders and outcomes were captured only through billing codes, overall certainty is very low and the results should inform, not dictate, anesthetic choice.

Study at a glance

- Design and setting: Retrospective cohort study using the Premier Healthcare Database, including 623,122 adults undergoing surgical treatment of hip fracture in United States hospitals between 2016 and 2023. Exposure was anesthesia type (general vs neuraxial) coded from billing data; outcomes (in-hospital mortality, major complications, intensive care unit admission, length of stay, opioid use, discharge disposition) were defined from ICD-10-CM diagnosis codes and billing records. Associations were estimated with mixed-effects multivariable logistic regression adjusted for demographics, comorbidities, hospital characteristics, procedure type, peripheral nerve block use, fracture type, and time to surgery.
- Primary outcome – composite of death and major complications: The prespecified primary endpoint was a composite of in-hospital mortality, respiratory complications, cardiac complications, acute renal failure, and delirium. Overall, neuraxial anesthesia versus general anesthesia was associated with an adjusted odds ratio (OR) of 0.97 (95% confidence interval [CI] 0.94–0.997; p=0.053), a very small relative difference compatible with little to no effect. Given the nonrandomized, claims-based design and serious residual confounding, GRADE certainty for this outcome is Very Low; the apparent benefit could easily be due to unmeasured differences between patients selected for each technique.
- In-hospital mortality: In-hospital death was lower in the neuraxial group overall, with an adjusted OR of 0.83 (95% CI 0.74–0.93; p=0.003), and a more pronounced association in older, more frail subgroups (for example, OR 0.77, 95% CI 0.65–0.91 in patients aged ≥87 years with intermediate/high frailty). However, choice of anesthesia is strongly confounded by clinical status, cognitive function, and hemodynamic reserve, which are incompletely measured in claims. With Serious overall risk of bias and no advanced causal methods, GRADE certainty for any mortality benefit is Very Low.
- Key secondary outcomes – opioid use, discharge home, length of stay: Neuraxial anesthesia was associated with a moderate reduction in high postoperative opioid use (overall adjusted OR 0.69, 95% CI 0.66–0.72; p<0.001), consistent across age and frailty strata, and with slightly higher odds of discharge to home among survivors (overall OR 1.08, 95% CI 1.04–1.12; p<0.001). Prolonged length of stay (≥75th percentile) showed a very small reduction with neuraxial anesthesia (overall OR 0.97, 95% CI 0.94–0.998; p=0.046). High opioid use is a process measure rather than a direct patient-important endpoint, and discharge disposition and length of stay are influenced by social and system factors; all three outcomes are rated Very Low certainty due to serious confounding and, for opioid use, additional indirectness.
- Potential harms – respiratory, cardiac, and ICU outcomes: Across the overall cohort, neuraxial anesthesia was associated with slightly higher rates of several coded complications and intensive care unit use: respiratory complications (OR 1.06, 95% CI 1.01–1.10; p=0.03), cardiac complications (OR 1.07, 95% CI 1.02–1.12; p=0.008), and intensive care unit admission (OR 1.07, 95% CI 1.03–1.12; p=0.002). Subgroup and sensitivity analyses showed some heterogeneity by age, frailty, and hospital neuraxial use, but effects remained small. Because these outcomes rely on diagnosis codes without validation and are highly susceptible to confounding by severity and practice patterns, GRADE certainty is Very Low, and the direction of true effect is uncertain.
- Risk of bias, certainty, and practice implications: Overall risk of bias is judged Serious due to residual confounding by indication, selection related to coding completeness, and outcome misclassification from claims data. All appraised outcomes, including the primary composite, mortality, complications, length of stay, opioid use, intensive care unit admission, and discharge home, are rated Very Low certainty with GRADE. Clinically, the study suggests that neuraxial and general anesthesia for hip fracture have broadly similar in-hospital risks, with neuraxial associated with less high opioid use and more home discharges but also small increases in some complications, all very uncertain. These findings should not, on their own, drive a major practice shift; instead, anesthetic choice should remain individualized, and system-level improvements in timely surgery, hemodynamic management, multimodal analgesia, delirium prevention, and early mobilization are likely to have larger and more reliable impact than anesthesia type alone.
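To see why odds ratios this close to 1 correspond to small absolute differences, an odds ratio can be converted to an absolute risk for a given baseline risk. The Python sketch below is illustrative only: the baseline risks are hypothetical placeholders, not figures from the study, and the function name is an assumption. It also treats each adjusted odds ratio as if it applied uniformly and causally, which the Very Low certainty ratings do not support.

```python
def risk_from_odds_ratio(odds_ratio, baseline_risk):
    """Risk in the comparison group implied by an odds ratio and a baseline risk."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    new_odds = odds_ratio * baseline_odds
    return new_odds / (1.0 + new_odds)


# Adjusted odds ratios (neuraxial vs general) as reported in the summary.
REPORTED_OR = {
    "in-hospital mortality": 0.83,
    "primary composite": 0.97,
    "ICU admission": 1.07,
}

# HYPOTHETICAL baseline risks under general anesthesia, chosen only to
# illustrate the conversion; they are not taken from the study.
HYPOTHETICAL_BASELINES = (0.02, 0.10, 0.30)

for outcome, odds_ratio in REPORTED_OR.items():
    for baseline in HYPOTHETICAL_BASELINES:
        risk = risk_from_odds_ratio(odds_ratio, baseline)
        per_1000 = (risk - baseline) * 1000
        print(
            f"{outcome}: OR {odds_ratio:.2f}, baseline {baseline:.0%} "
            f"-> {risk:.2%} ({per_1000:+.1f} per 1000)"
        )
```

The same odds ratio implies a different absolute change at each baseline, which is one reason relative effects from observational data are a weak basis for choosing between techniques.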

3 May 2026, 11 min
