## Analysis of the results

Here we are particularly concerned with the last of the questions posed at the beginning of the section. In fact, it is a question that should be asked not only of prevalence surveys but of all investigations, whether they are epidemiological surveys, intervention studies, or laboratory experiments. How was the design incorporated in the analysis? Frequently the required information is missing: either the authors are ignorant of the implications of the design, or the journal editor has insisted that technical details be stripped from the report, or both.

Consider a hypothetical sample of 100 participants who have contributed to an estimate of prevalence of, say, depression using a definitive psychiatric interview.

Seventy of the participants have been given a diagnosis of depression. What is a valid estimate of prevalence? What is the standard error of the estimate? Assuming that the data have arisen through simple random sampling, the prevalence is estimated by $\hat{\pi} = 0.70$ and its variance is given by

$$\operatorname{var}(\hat{\pi}) = \frac{\hat{\pi}(1 - \hat{\pi})}{N}$$

where $N$ is the sample size. The standard error is then given by the square root of this expression.
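As a quick check of the arithmetic, the following Python sketch computes the simple-random-sampling estimate and its standard error for the hypothetical 70 cases out of 100. The function name `srs_prevalence` is ours, purely for illustration.

```python
import math


def srs_prevalence(cases: int, n: int) -> tuple[float, float]:
    """Prevalence and its standard error, assuming simple random sampling."""
    prevalence = cases / n
    variance = prevalence * (1 - prevalence) / n  # p(1 - p) / N
    return prevalence, math.sqrt(variance)


prevalence, se = srs_prevalence(70, 100)
print(f"prevalence = {prevalence:.2f}, SE = {se:.3f}")  # prevalence = 0.70, SE = 0.046
```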

Suppose that we are now told that the results were obtained from a two-phase survey. The size of the first-phase sample was 300. Of these, 100 were screen positive and 200 were screen negative. The second-phase sample consisted of 70 screen positives, of whom 65 were found to be depressed on interview, together with 30 screen negatives, of whom five were found to be depressed on interview. The estimate of prevalence, $\hat{\pi}$, is given by

$$
\begin{aligned}
\hat{\pi} &= P(\text{screen +ve}) \times P(\text{interview +ve} \mid \text{screen +ve}) \\
&\quad + P(\text{screen -ve}) \times P(\text{interview +ve} \mid \text{screen -ve}) \\
&= (100/300) \times (65/70) + (200/300) \times (5/30) \approx 0.42
\end{aligned}
\tag{10}
$$

where P(A) should be read as 'probability of A' and P(A|B) should be read as 'probability of A given B' or 'probability of A conditional on B having occurred'. The vertical | should not be confused with division, represented by /.
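The same calculation is easy to script. The Python sketch below reproduces the two-phase estimate from the first-phase proportions screening positive and negative (A and B) and the proportions of interviewed screen positives and negatives found to be depressed (p and q). It also computes an approximate standard error. For that we assume the usual large-sample approximation for two-phase (double) sampling with stratification, $AB(p-q)^2/N + A^2p(1-p)/n_1 + B^2q(1-q)/n_2$, where $N$ is the first-phase sample size and $n_1$, $n_2$ are the numbers interviewed among screen positives and screen negatives. This formula is our assumption, not necessarily the exact expression used in this chapter.

```python
import math

# Counts from the worked two-phase example in the text
N = 300                       # first-phase sample size
N_pos, N_neg = 100, 200       # first-phase screen positives and negatives
n_pos, cases_pos = 70, 65     # interviewed screen positives; of whom depressed
n_neg, cases_neg = 30, 5      # interviewed screen negatives; of whom depressed

A = N_pos / N                 # P(screen +ve)
B = N_neg / N                 # P(screen -ve)
p = cases_pos / n_pos         # P(interview +ve | screen +ve)
q = cases_neg / n_neg         # P(interview +ve | screen -ve)

prevalence = A * p + B * q

# ASSUMPTION: standard large-sample approximation for double sampling with
# stratification, ignoring finite-population corrections.
variance = (A * B * (p - q) ** 2 / N
            + A ** 2 * p * (1 - p) / n_pos
            + B ** 2 * q * (1 - q) / n_neg)
se = math.sqrt(variance)

print(f"prevalence = {prevalence:.2f}, SE = {se:.3f}")  # prevalence = 0.42, SE = 0.051
```

Naively treating the 100 interviewed subjects as a simple random sample gives 0.70; allowing for the design gives about 0.42, with a standard error of roughly 0.05 under the stated approximation.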

The prevalence estimate from the two-phase survey is considerably lower than if simple random sampling had been assumed. How has this arisen? Obviously, the second-phase sample has been enriched for people who are likely to be depressed. The sampling fraction for the screen positives is 70/100, i.e. each second-phase participant can be thought of as representing 100/70 screen positives from the original sample. Similarly, the sampling fraction for the screen negatives is 30/200, and each second-phase participant represents 200/30 screen-negative participants from the first-phase sample. The reciprocal of the sampling fraction is called the sampling weight. The total weighted second-phase sample size is 70 × (100/70) + 30 × (200/30) = 300, the size of the first-phase sample. Similarly, the total weighted number of cases of depression is 65 × (100/70) + 5 × (200/30) ≈ 126. The latter is the estimate of the number of cases in the first-phase sample. Hence the estimate of prevalence is 126/300 = 0.42, as before.

To recapitulate in a slightly more technical way, if the $i$th individual in the second-phase sample is assigned a sampling weight $w_i$, and if the interview outcome $y_i$ has a value of 1 if the $i$th subject is a case and zero otherwise, then the weighted prevalence estimate is given by

$$\hat{\pi} = \frac{\sum_i w_i y_i}{\sum_i w_i x_i}$$

where $\sum$ means 'sum over all observations in the second-phase sample' and $x_i$ is simply an indicator that the observation is, indeed, a second-phase observation ($x_i = 1$ for everyone). This estimator is an example of the well-known Horvitz-Thompson estimator from the sampling survey literature (9), but it is not particularly familiar to psychiatrists or medical statisticians. We shall discuss the use of weighting adjustments again below.
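The weighting argument can also be applied directly to individual records, which is how it would normally be done with a data file. This Python sketch assumes each second-phase subject is stored as a (screen stratum, case indicator) pair rebuilt from the worked example; the variable names are illustrative only.

```python
from collections import Counter

# One (screen_positive, depressed) pair per second-phase subject, rebuilt
# from the counts in the worked example.
records = ([(1, 1)] * 65 + [(1, 0)] * 5     # interviewed screen positives
           + [(0, 1)] * 5 + [(0, 0)] * 25)  # interviewed screen negatives

first_phase = {1: 100, 0: 200}              # first-phase stratum sizes
second_phase = Counter(s for s, _ in records)

# Sampling weight = reciprocal of the stratum's sampling fraction
weight = {s: first_phase[s] / second_phase[s] for s in first_phase}

weighted_cases = sum(weight[s] * y for s, y in records)   # sum of w_i * y_i
weighted_total = sum(weight[s] * 1 for s, _ in records)   # sum of w_i * x_i, x_i = 1
prevalence = weighted_cases / weighted_total

print(f"weighted cases = {weighted_cases:.1f}, total = {weighted_total:.1f}, "
      f"prevalence = {prevalence:.2f}")
```

The weighted total recovers the first-phase sample size of 300, and the weighted case count of about 126 recovers the estimate of 0.42.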

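Returning to our original two-phase calculations, let $A = P(\text{screen +ve})$ and $B = 1 - A = P(\text{screen -ve})$. Also, let $p = P(\text{interview +ve} \mid \text{screen +ve})$ and $q = P(\text{interview +ve} \mid \text{screen -ve})$, so that the prevalence estimate $\hat{\pi}$ of eqn (10) becomes

$$\hat{\pi} = Ap + Bq$$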

The variance of the estimate of prevalence from the two-phase design is given by

where $N_1$ is the number of first-phase screen positives and $N_2$ is the number of first-phase screen negatives.

## Validation of screening questionnaires

It is frequently the case that data from a two-phase survey designed to estimate prevalence are also used to examine the characteristics of the screening questionnaire (in particular, its sensitivity and specificity). Readers who are unfamiliar with these concepts are referred to Chapter 2.7 or to Goldberg and Williams (11). Sensitivity is the proportion of true cases who are screen positive. Specificity is the proportion of true non-cases who are screen negative. The difficulty arises because the screen was applied first and the definitive diagnostic interview was then carried out on differentially subsampled groups, so the interviewed cases and non-cases are not representative of all cases and non-cases. Readers familiar with Bayes' theorem will see how to solve the problem, but here we use another version of the Horvitz-Thompson estimator:

$$\widehat{\text{sensitivity}} = \frac{\sum_i w_i y_i z_i}{\sum_i w_i y_i} \tag{14}$$

$$\widehat{\text{specificity}} = \frac{\sum_i w_i (1 - y_i)(1 - z_i)}{\sum_i w_i (1 - y_i)} \tag{15}$$

Here, as before, $y_i$ indicates whether the $i$th subject was a true case of depression (1 = yes, 0 = no). This ensures that the calculations in eqn (14) are carried out only on the true cases and, similarly, that those in eqn (15) are carried out only on the non-cases. Again, $w_i$ is the second-phase sampling weight. The new variable $z_i$ indicates whether the screen result was positive (1 = yes, 0 = no). An alternative, and perhaps easier, approach is to split the second-phase sample into two subfiles: cases and non-cases. Estimation of sensitivity and specificity in these two subfiles (assuming that they are stored on a computer) is then computationally exactly the same as the weighted estimation of prevalence discussed in the previous section. In the first file, sensitivity is simply the weighted sum of the screen positives divided by the weighted sum of the cases. Similarly, in the second file, specificity is the weighted sum of the screen negatives divided by the weighted sum of the non-cases.
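Equations (14) and (15) translate almost line for line into code. In this Python sketch each second-phase subject is a (weight, z, y) triple; the data are again the worked example, in which the screen result happens to coincide with the sampling stratum. Unweighted, the same 100 interviews would give a sensitivity of 65/70 and a specificity of 25/30, which shows how strongly the subsampling distorts the raw proportions.

```python
def weighted_sens_spec(records):
    """records: iterable of (w_i, z_i, y_i) for second-phase subjects, where
    z_i = 1 if screen positive and y_i = 1 if a true case on interview."""
    records = list(records)
    sens_num = sum(w * y * z for w, z, y in records)              # eqn (14) numerator
    sens_den = sum(w * y for w, z, y in records)                  # weighted cases
    spec_num = sum(w * (1 - y) * (1 - z) for w, z, y in records)  # eqn (15) numerator
    spec_den = sum(w * (1 - y) for w, z, y in records)            # weighted non-cases
    return sens_num / sens_den, spec_num / spec_den


w_pos, w_neg = 100 / 70, 200 / 30  # sampling weights from the worked example
records = ([(w_pos, 1, 1)] * 65 + [(w_pos, 1, 0)] * 5
           + [(w_neg, 0, 1)] * 5 + [(w_neg, 0, 0)] * 25)

sens, spec = weighted_sens_spec(records)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")  # sensitivity = 0.736, specificity = 0.959
```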

Many readers will be familiar with the idea of choosing a range of cut-points for the screening questionnaire and then estimating sensitivity and specificity at each one. A plot of sensitivity against 1 − specificity is called a receiver operating characteristic (ROC) curve. If the screen is of no use, the plot will be a straight line through the origin with unit slope. A good screen will produce a curve that bows towards the top left-hand corner of the plot (the greater the area between the observed curve and the straight line with unit slope, the better the screen is at discriminating between cases and non-cases).

It is sometimes said that one cannot investigate ROC curves using two-phase data. This view is, in fact, mistaken. One can think of the two-phase sampling design as a mechanism by which one deliberately introduces the analogue of verification bias (12), a bias that the sampling weights then correct. Note that there is no need to restrict the first-phase stratification to just two strata (potential cases versus non-cases) when defining the sampling fractions for the second phase of the survey. We start by calculating the observed sampling fractions for each discrete outcome of the screening questionnaire. These define the corresponding sampling weights. We then consider all the possible ways of defining $z_i$ in eqns (14) and (15); there is no need for the $z_i$ to correspond to the way that the second-phase sampling fractions were determined. We then repeatedly apply eqns (14) and (15), keeping the weights constant as we change the definition of $z_i$.

One important point to bear in mind is that if the characteristics of the screen are not fairly well known beforehand, and if one of the major aims of the survey is to carry out an ROC analysis, then this is not a particularly efficient design to use. It would be better to go back to the simple random sample, with all subjects assessed by both screen and interview.
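The cut-point procedure can be sketched in Python as follows, assuming the screen yields integer scores and that the first-phase strata are the score levels themselves. The weights are computed once and held fixed while the cut-point that defines $z_i$ moves. The trapezoidal area is our own addition, a common summary of the curve rather than something prescribed in the text, and the function names are hypothetical. With the two-level worked example the curve has only one interior point; a screen with more score levels would trace out more.

```python
from collections import Counter


def weighted_roc(records, first_phase_counts):
    """records: (score, y) for each second-phase subject, where score is the
    discrete screen outcome used for stratification and y = 1 for a true case.
    first_phase_counts: {score: number of first-phase subjects with that score}.
    Assumes integer scores and at least one interviewee at every score level."""
    n_second = Counter(s for s, _ in records)
    # Weights are fixed once, from the stratum sampling fractions
    weight = {s: first_phase_counts[s] / n_second[s] for s in n_second}
    cases = sum(weight[s] for s, y in records if y == 1)
    noncases = sum(weight[s] for s, y in records if y == 0)

    points = []
    # z_i = 1 when score >= cut; the extra cut above the maximum makes everyone negative
    for cut in sorted(n_second) + [max(n_second) + 1]:
        tp = sum(weight[s] for s, y in records if y == 1 and s >= cut)
        tn = sum(weight[s] for s, y in records if y == 0 and s < cut)
        points.append((1 - tn / noncases, tp / cases))  # (1 - specificity, sensitivity)
    return sorted(points)


def trapezoid_auc(points):
    """Area under the piecewise-linear ROC curve through the sorted points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Worked example: score 1 = screen positive, 0 = screen negative
records = [(1, 1)] * 65 + [(1, 0)] * 5 + [(0, 1)] * 5 + [(0, 0)] * 25
points = weighted_roc(records, {1: 100, 0: 200})
print(points, trapezoid_auc(points))
```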

If one needs, say, confidence intervals for estimates of sensitivity and specificity, it is relatively straightforward to do this via a weighted logistic regression (see next section). The file can be split into cases and non-cases and then, using appropriate software (see below), one fits a logistic model containing a predictor variable which has the value of 1 for all subjects (i.e. just fitting a constant). One then obtains the confidence interval for the intercept term in the output. Finally, the inverse of the logistic transformation of the lower and upper confidence limits will yield the corresponding limits for the sensitivity (specificity) itself. Note that the interval will be asymmetric and will be within the permitted bounds of zero and unity.
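A sketch of this recipe in plain Python follows, written out by hand rather than with a statistics package so that each step is visible. For an intercept-only model the weighted maximum-likelihood estimate is simply the weighted proportion, so only the standard error needs work. Here we assume a robust (sandwich) standard error that treats the weights as fixed and ignores the stratified design, which is a simplification of what dedicated survey software would do. The function names are hypothetical.

```python
import math


def expit(x: float) -> float:
    """Inverse of the logistic (logit) transformation."""
    return 1 / (1 + math.exp(-x))


def weighted_proportion_ci(records, z_crit=1.959964):
    """Intercept-only weighted logistic regression on (w_i, outcome_i) pairs.

    Returns the estimated proportion and an approximate 95% interval obtained
    by back-transforming logit-scale limits. ASSUMPTION: a robust (sandwich)
    standard error that ignores stratification and finite-population
    corrections, so survey software may report slightly different limits."""
    total_w = sum(w for w, _ in records)
    mu = sum(w * o for w, o in records) / total_w        # weighted MLE of the proportion
    beta = math.log(mu / (1 - mu))                       # fitted intercept (logit scale)
    info = total_w * mu * (1 - mu)                       # weighted information for beta
    meat = sum((w * (o - mu)) ** 2 for w, o in records)  # squared weighted scores
    se = math.sqrt(meat) / info                          # sandwich SE of the intercept
    return mu, expit(beta - z_crit * se), expit(beta + z_crit * se)


# Sensitivity: the subfile of true cases, outcome = screen positive (z_i)
cases = [(100 / 70, 1)] * 65 + [(200 / 30, 0)] * 5
sens, lower, upper = weighted_proportion_ci(cases)
print(f"sensitivity = {sens:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

For specificity, pass the non-case subfile with outcome $1 - z_i$ instead. As the text notes, the back-transformed interval is asymmetric and stays within zero and unity.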
