## Models and definitions

Most clinicians have an intuitive idea of what reliability means, and know that being able to demonstrate that one's measuring instruments have high reliability is a good thing. Reliability concerns the consistency of repeated measurements, where the repetitions might be repeated interviews by the same interviewer, alternative ratings of the same interview (as a video recording) by different raters, alternative forms or repeated administrations of a questionnaire, or even different subscales of a single questionnaire, and so on. One learns from elementary texts that reliability is estimated by a correlation coefficient (in the case of a quantitative rating) or by a kappa (κ) or weighted κ statistic (in the case of a qualitative judgement such as a diagnosis). Rarely are clinicians aware either of the formal definition of reliability or of its estimation through the use of various forms of the intraclass correlation coefficient, rho (ρ).

First consider a quantitative measurement X. We start with the assumption that it is fallible and that it is the sum of two components: the 'truth' T and 'error' E, so that X = T + E. If T and E are statistically independent (uncorrelated), then it can be shown that

Var(X) = Var(T) + Var(E)

where Var(X) is the variance of X (i.e. the square of its standard deviation), and so on. The reliability ρX of X is defined as the proportion of the total variability of X (i.e. Var(X)) that is explained by the variability of the true scores (i.e. Var(T)):

ρX = Var(T)/Var(X) = Var(T)/[Var(T) + Var(E)]
This ratio will approach zero as the variability of the measurement errors increases compared with that of the truth. Alternatively, it will approach 1 as the variability of the errors decreases. The standard deviation of the measurement errors (i.e. the square root of Var(E)) is usually known as the instrument's standard error of measurement. Note that reliability is not a fixed characteristic of an instrument, even when its standard error of measurement (i.e. its precision) is fixed. When the instrument is used on a population that is relatively homogeneous (low values of Var(T)), it will have a relatively low reliability. However, as Var(T) increases, so does the instrument's reliability. In many ways the standard error of measurement is a much more useful summary of an instrument's performance, but one should always bear in mind that it too might vary from one population to another, a possibility that must be carefully checked by both the developers and users of the instrument.
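The dependence of reliability on the spread of the true scores can be made concrete with a small simulation, a minimal sketch with illustrative variance values of my own choosing (not taken from the text): two populations are measured with the same standard error of measurement, differing only in the spread of their true scores.

```python
import random
from statistics import variance

random.seed(0)

def simulate_reliability(sd_true, sd_error, n=100_000):
    """Estimate Var(T)/Var(X) for X = T + E with independent T and E."""
    t = [random.gauss(0, sd_true) for _ in range(n)]
    x = [ti + random.gauss(0, sd_error) for ti in t]
    return variance(t) / variance(x)

# Same precision (sd_error = 2) in both populations:
print(simulate_reliability(sd_true=2, sd_error=2))  # homogeneous: near 0.5
print(simulate_reliability(sd_true=6, sd_error=2))  # heterogeneous: near 0.9
```

With the error variance held fixed at 4, the reliability rises from about 4/8 = 0.5 to about 36/40 = 0.9 purely because the population becomes more heterogeneous.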

Now let us complicate matters slightly. Suppose that a rating depends not only on the subject's so-called true score T and random measurement error E, but also on the identity of the interviewer or rater, R, say. That is, each rater has his or her own characteristic bias (constant from one assessment to another), and the biases can be thought of as varying randomly from one rater to another. Again, assuming statistical independence, we can show that, if X = T + R + E, then

Var(X) = Var(T) + Var(R) + Var(E)
But what is the instrument's reliability? It depends. If subjects in a survey or experiment, for example, are each going to be assessed by a rater randomly selected from a large pool of possible raters, then

ρXa = Var(T)/[Var(T) + Var(R) + Var(E)]
However, if only a single rater is to be used for all subjects in the proposed study, there will be no variation due to the rater and the reliability now becomes

ρXb = Var(T)/[Var(T) + Var(E)]
Of course, ρXb > ρXa. Again, the value of the instrument's reliability depends on the context of its use. This is the essence of generalizability theory.(1) The three versions of ρ given above are all intraclass correlation coefficients and are also examples of what generalizability theorists refer to as generalizability coefficients.
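The contrast between the two coefficients can be checked with a simulation sketch in the same spirit (the variance components below are illustrative values of my own, not from the text): when each subject is scored by a randomly drawn rater, the rater variance inflates the denominator, whereas a single fixed rater contributes none.

```python
import random
from statistics import variance

random.seed(1)

def icc_pair(sd_true=4.0, sd_rater=2.0, sd_error=2.0, n=100_000):
    """Return estimates of (rho_a, rho_b) from simulated variance components."""
    vt = variance([random.gauss(0, sd_true) for _ in range(n)])
    vr = variance([random.gauss(0, sd_rater) for _ in range(n)])
    ve = variance([random.gauss(0, sd_error) for _ in range(n)])
    rho_a = vt / (vt + vr + ve)  # rater drawn at random for each subject
    rho_b = vt / (vt + ve)       # one fixed rater: Var(R) drops out
    return rho_a, rho_b

rho_a, rho_b = icc_pair()
print(rho_a, rho_b)  # rho_b exceeds rho_a whenever Var(R) > 0
```

With these values, rho_a is about 16/24 ≈ 0.67 while rho_b is about 16/20 = 0.80, illustrating how the same instrument is more "reliable" in the single-rater design.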

