Limitations in the application of scoring systems

The way in which a scoring system is developed often differs from the way in which it is applied. Such differences may arise in the definitions and rules for data collection, the patient inclusion/exclusion criteria, the time period for data collection, the mode of data collection, the precise outcome variable measured, and the handling of data prior to analysis. Users implicitly assume that the method is unaffected by such differences, but whether the differences invalidate it is not known. They often arise because the detailed instructions required to apply the scoring system are not fully described in the original scientific paper, possibly because of space limitations. Some examples are given below.

Different patient inclusion criteria

The relationship between the APACHE II score and hospital mortality can be biased when the score is used to estimate probabilities of death before discharge from hospital for patients selected by inclusion criteria substantially different from those used in its development. APACHE II may not be properly calibrated for such selected patients and may systematically over- or underestimate their probabilities of death. Many examples of the application of APACHE II, a generic scoring system for intensive care patients, to specific groups of intensive care patients are found in the literature.
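One common way to check whether predicted probabilities are systematically too high or too low in a selected patient group is to compare observed deaths with expected deaths, overall and within bands of predicted risk. The following is a minimal sketch under stated assumptions, not a validated audit tool: the function names, the risk bands, and the eight example patients are all hypothetical and chosen only for illustration, and the predicted probabilities are taken as given rather than derived from any published equation.

```python
from typing import List, Sequence, Tuple


def standardized_mortality_ratio(predicted: Sequence[float],
                                 died: Sequence[bool]) -> float:
    """Observed deaths divided by expected deaths.

    Expected deaths are the sum of the predicted probabilities of death.
    A ratio above 1 suggests the predictions underestimate mortality in
    this group; a ratio below 1 suggests they overestimate it.
    """
    if len(predicted) != len(died):
        raise ValueError("predicted and died must have the same length")
    expected = sum(predicted)
    if expected == 0:
        raise ValueError("expected number of deaths is zero")
    observed = sum(1 for d in died if d)
    return observed / expected


def calibration_by_band(
    predicted: Sequence[float],
    died: Sequence[bool],
    edges: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),  # hypothetical bands
) -> List[Tuple[float, float, int, int, float]]:
    """Return (lower, upper, patients, observed deaths, expected deaths) per band."""
    rows = []
    last_band = len(edges) - 2
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # Bands are [lo, hi); the final band also includes its upper edge.
        idx = [k for k, p in enumerate(predicted)
               if lo <= p < hi or (i == last_band and p == hi)]
        observed = sum(1 for k in idx if died[k])
        expected = sum(predicted[k] for k in idx)
        rows.append((lo, hi, len(idx), observed, expected))
    return rows


# Hypothetical data for eight patients (illustration only, not real results).
example_predicted = [0.1, 0.1, 0.2, 0.4, 0.6, 0.8, 0.8, 0.9]
example_died = [False, False, True, True, True, True, True, True]

smr = standardized_mortality_ratio(example_predicted, example_died)
bands = calibration_by_band(example_predicted, example_died)
```

In this invented group there are more observed deaths than expected deaths, which is the pattern one would see if a score underestimated risk for the selected patients. With real data, any such comparison would also need an estimate of sampling uncertainty before conclusions were drawn.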

Different time period for data collection

The definition of what constitutes the first 24 h of intensive care can vary (Rowan). In addition, observance of the inclusion criteria of a randomized controlled trial might lead both to the inclusion of patients different from those used to develop the scoring system and to its application beyond the time period for data collection used in its development (e.g. use of APACHE II scores, based on data collected on the fourth day in intensive care, as inclusion criteria for patients in a trial). The scoring system may not be valid except for 'the initial 24 h after ICU admission', as stated in the original publication. Considerable variation in the time period for data collection is found in the literature.
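The effect of the window definition can be illustrated with a short sketch. It selects the readings that fall inside a 24 h window and reports the highest value, first with the window starting at ICU admission and then with a window starting at the beginning of the first full charting day. The second definition, the function names, and the heart rate readings are hypothetical and serve only to show that the same patient can yield different values under different definitions.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Reading = Tuple[datetime, float]  # (time of measurement, measured value)


def readings_in_window(start: datetime, readings: List[Reading],
                       hours: int = 24) -> List[Reading]:
    """Keep readings taken at or after `start` and before `start + hours`."""
    end = start + timedelta(hours=hours)
    return [(t, v) for t, v in readings if start <= t < end]


def highest_value(start: datetime, readings: List[Reading],
                  hours: int = 24) -> float:
    """Highest measured value inside the window beginning at `start`."""
    selected = readings_in_window(start, readings, hours)
    if not selected:
        raise ValueError("no readings fall inside the window")
    return max(v for _, v in selected)


# Hypothetical patient admitted at 22:00 with three heart rate readings.
admission = datetime(2000, 1, 1, 22, 0)
heart_rate = [
    (datetime(2000, 1, 1, 23, 0), 110.0),
    (datetime(2000, 1, 2, 10, 0), 95.0),
    (datetime(2000, 1, 3, 1, 0), 140.0),
]

# Definition A: the 24 h immediately following ICU admission.
from_admission = highest_value(admission, heart_rate)

# Definition B (assumed for illustration): 24 h from the start of the
# first full charting day, here taken to begin at 08:00.
chart_day_start = datetime(2000, 1, 2, 8, 0)
from_chart_day = highest_value(chart_day_start, heart_rate)
```

Under definition A the reading taken 27 h after admission is excluded, whereas under definition B it is included and becomes the highest value, so any score component derived from the most extreme value in the window could differ between the two definitions.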

Different outcome variable

TISS was developed solely as a proxy measure for the severity of illness of patients by quantifying the type and amount of treatment provided. However, TISS scores have also been employed to determine both nursing workload and nurse dependency. These applications may not be valid for the following reasons. A TISS score represents only direct nursing tasks. Labor-intensive nursing tasks are not included in TISS, nor are nursing tasks such as consoling bereaved relatives. A sedated patient receiving a greater number of TISS interventions may be less nurse dependent than an alert, agitated patient receiving far fewer TISS interventions. Such issues may affect the use of TISS as a measure of nurse workload or dependency.

