(S-120) Glance, L.G., Sunday 9:15
TITLE: EFFECT OF VARYING THE CASE MIX ON THE PERFORMANCE OF APACHE II: A SIMULATION STUDY
AUTHORS: Laurent G. Glance, MD1, Turner Osler, MD2, Peter Papadakos, MD1
AFFILIATION: 1University of Rochester Medical Center, Rochester, NY; 2University of Vermont Medical College, Burlington, VT.
INTRODUCTION: Crude mortality rates cannot be used to compare ICU performance, since differences in mortality may simply reflect variation in case mix. Outcome prediction models can be used to calculate risk-adjusted mortality rates for ICUs because they (presumably) account for differences in case mix. For a scoring system to allow comparisons of ICUs with varying mortality rates, its performance must be stable across a range of case mixes. The purpose of this study was to assess the impact of case mix on the performance of APACHE II using measures of calibration and discrimination.
METHODS: APACHE II data were collected prospectively from the SICU at a single institution between 1990 and 1997 (cardiac, burn and pediatric patients were excluded). The patients in this data set were first ranked according to their APACHE II predicted mortality and then divided into ten groups of equal size (deciles of risk): 0-3.1%, 3.2-4.7%, …, 39-100%. A computer simulation was used to create 2000 different case mixes with mortality rates ranging from 5% to 18%. The number of patients drawn from each decile (x1 from the first decile, x2 from the 2nd decile, …, x10 from the 10th decile) constituted a distinct case mix. A virtual ICU (VICU) with a given case mix was created by randomly resampling (with replacement) x1 patients from the first decile, x2 from the 2nd decile, …, x10 from the 10th decile. One hundred different VICUs with identical case mixes were created for each of the 2000 case mixes. The area under the ROC curve and the Hosmer-Lemeshow C statistic were derived for each case mix.
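The resampling scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`auc_roc`, `hosmer_lemeshow_c`, `make_vicu`, `demo`) are hypothetical, the cohort is synthetic (the SICU data are not available here), and the Hosmer-Lemeshow C statistic is computed in its usual ten-group form. Because the synthetic outcomes are drawn directly from the predicted risks, the sketch shows the mechanics only and is not expected to reproduce the reported loss of calibration.

```python
import numpy as np

def auc_roc(pred, died):
    """Area under the ROC curve via the Mann-Whitney rank formulation."""
    pred = np.asarray(pred, dtype=float)
    died = np.asarray(died, dtype=bool)
    n_pos = int(died.sum())
    n_neg = died.size - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # AUC is undefined without both outcomes
    # Average ranks, so tied predictions count as half-concordant
    _, inv, counts = np.unique(pred, return_inverse=True, return_counts=True)
    cum = np.cumsum(counts)
    ranks = (cum - (counts - 1) / 2.0)[inv]
    u = ranks[died].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))

def hosmer_lemeshow_c(pred, died, groups=10):
    """Hosmer-Lemeshow C statistic over equal-sized groups of predicted risk."""
    pred = np.asarray(pred, dtype=float)
    died = np.asarray(died, dtype=float)
    chi2 = 0.0
    for idx in np.array_split(np.argsort(pred), groups):
        n = idx.size
        if n == 0:
            continue
        expected = pred[idx].sum()
        observed = died[idx].sum()
        denom = expected * (1.0 - expected / n)
        if denom > 0:
            chi2 += (observed - expected) ** 2 / denom
    return float(chi2)

def make_vicu(pred, died, case_mix, rng):
    """Resample (with replacement) case_mix[i] patients from risk decile i."""
    pred = np.asarray(pred, dtype=float)
    died = np.asarray(died, dtype=bool)
    deciles = np.array_split(np.argsort(pred), 10)
    picks = [rng.choice(d, size=int(k), replace=True)
             for d, k in zip(deciles, case_mix) if k > 0]
    idx = np.concatenate(picks)
    return pred[idx], died[idx]

def demo(seed=0, n_patients=5000, n_vicus=20, vicu_size=1000):
    """Build several VICUs sharing one case mix; all inputs are synthetic."""
    rng = np.random.default_rng(seed)
    pred = rng.beta(0.6, 4.0, size=n_patients)   # assumed risk distribution
    died = rng.random(n_patients) < pred         # assumed: perfectly calibrated
    weights = np.linspace(1.0, 3.0, 10)          # assumed tilt toward sicker deciles
    case_mix = rng.multinomial(vicu_size, weights / weights.sum())
    rows = []
    for _ in range(n_vicus):
        p, d = make_vicu(pred, died, case_mix, rng)
        rows.append((d.mean(), auc_roc(p, d), hosmer_lemeshow_c(p, d)))
    return np.array(rows)  # columns: mortality rate, AUC, HL C statistic

if __name__ == "__main__":
    print(demo().mean(axis=0))
```

Each row returned by `demo` corresponds to one VICU. The abstract does not state how the 100 VICU-level values were combined into a single figure per case mix, so the averaging in the final line is only one plausible choice.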
RESULTS: As the simulated mortality rate increased, the Hosmer-Lemeshow C statistic increased significantly, suggesting marked deterioration in calibration (R-Sq=0.90; p<0.001). There was significant variability in the area under the ROC curve with changes in case mix.
CONCLUSION: As the simulated mortality rate increased, APACHE II exhibited progressively worse calibration. Since all of the patients in the simulated data sets were cared for at a single institution, it is reasonable to assume that the level of care was uniform. The changes in calibration of APACHE II are therefore intrinsic to the scoring system itself. Consequently, it may not be possible to use APACHE II to compare ICUs with widely divergent mortality rates, since the accuracy of APACHE II is itself a function of case mix.