Identification procedures apply explicit or implicit statistical models to one or more spontaneous reports in order to screen for product-AE pairs of interest. Much of the published statistical research concerning AE signalling methodology has been directed toward developing methods that are used at this step. Like sorting, the identification step is carried out using reports, not cases, so that the output of such activity is intended to undergo further refinement using case-based methods. It has long been well understood that the use of statistical models at this step is intended to automate the process of finding product-AE pairs that are "suspicious" or "interesting", and does not represent a form of hypothesis testing (Finney, 1971a, 1971b, 1982; DuMouchel, 1999). It is helpful to think of identification procedures as "screening tests" that are applied to report databases in order to find potentially important material.

As noted in Table 19.3, procedures employed at the identification (screening) step of signalling should be: (1) easy to use, (2) generalizable to many product-AE pairs, (3) associated with a low false positive rate, (4) amenable to automation, and (5) capable of producing uniform output. Although achieving all of these objectives simultaneously is difficult, program designers can improve the effectiveness of AE database screening programs by increasing the specificity (and, concomitantly, decreasing the sensitivity) of instruments used at the identification step, as illustrated in Figure 19.2.

Table 19.3. Characteristics of a good method used at the identification (screening) step of spontaneous signalling.

• Easy to use

• Generalizable to many product-AE pairs

• Associated with a low false positive rate

• Amenable to automation

• Capable of producing uniform output

Panel A of Figure 19.2 emphasizes that the function of the identification step is to look within product-AE databases for the presence of "interesting" safety outcomes. This process is screening-oriented (not patient- or diagnostic-oriented), and is associated with a sensitivity, specificity, positive predictive value, and likelihood ratio for "interesting" safety issues. Panel B of Figure 19.2 demonstrates that, in the presence of a low prevalence of "interesting" product-AE pairs (a typical situation in postmarketing surveillance), the likelihood ratio for those safety outcomes that are identified must be extremely high if a reasonable positive predictive value for the program is to be achieved. This latter attribute is critical because the positive predictive value of the report-based identification step is equivalent to the proportion of safety outcomes selected for intensive, case-based work-up that are subsequently found to be programmatically significant. Failure to maintain this proportion at an acceptable level quickly leads to saturation of evaluative resources, and, in the long run, to an ineffective AE surveillance effort (Institute of Medicine, 1999).

Figure 19.2.

A. Screening for "interesting" product AEs

Screening result | Interesting: Yes | Interesting: No
Yes | TP | FP
No | FN | TN

TP = true positive; FP = false positive; FN = false negative; TN = true negative

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad \text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{Positive Predictive Value} = \frac{TP}{TP + FP} \qquad \text{Likelihood Ratio} = \frac{\text{Sensitivity}}{1 - \text{Specificity}}$$

B. Report phase positive predictive value (PPV)

$$\mathrm{PPV} = \frac{(\text{Sensitivity})(X)}{(\text{Sensitivity})(X) + (1 - \text{Specificity})(Y)} \approx \frac{1}{1 + 1/(P \cdot \mathrm{LR})}$$

PPV = positive predictive value for screening; X = number of "interesting" AEs in product database*; Y = number of "not interesting" AEs in product database*; P = underlying probability of interesting product-AE pairs; LR = likelihood ratio for a positive screen

* X is assumed to be small in comparison to Y
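The dependence of positive predictive value on prevalence and likelihood ratio described for Panel B of Figure 19.2 can be illustrated numerically. The following is a minimal sketch, not part of the original figure; the function names and the database size (10 "interesting" pairs among 10,000) are hypothetical values chosen only for illustration.

```python
def likelihood_ratio(sensitivity, specificity):
    """Likelihood ratio for a positive screen: sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def ppv_exact(sensitivity, specificity, x, y):
    """PPV from counts of x 'interesting' and y 'not interesting' product-AE pairs."""
    true_positives = sensitivity * x
    false_positives = (1.0 - specificity) * y
    return true_positives / (true_positives + false_positives)


def ppv_low_prevalence(p, lr):
    """Approximation PPV ~ 1 / (1 + 1 / (P * LR)); assumes x is small relative to y."""
    return 1.0 / (1.0 + 1.0 / (p * lr))


if __name__ == "__main__":
    x, y = 10, 9990  # hypothetical database: 10 interesting pairs among 10,000
    p = x / (x + y)
    for sens, spec in [(0.9, 0.90), (0.7, 0.99), (0.5, 0.999)]:
        lr = likelihood_ratio(sens, spec)
        print(f"sens={sens} spec={spec} LR={lr:.0f} "
              f"PPV exact={ppv_exact(sens, spec, x, y):.3f} "
              f"approx={ppv_low_prevalence(p, lr):.3f}")
```

With these assumed values, moving from a sensitive but unspecific screen (0.9, 0.90) to a specific one (0.5, 0.999) raises the positive predictive value from under 1% to roughly one third, even though sensitivity falls.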

In recent years, much emphasis has been placed on the use of complex statistical models to carry out the identification step of signalling. There is no evidence, however, that departing from the basic idea of report-based screening, as originally articulated by Finney (Finney, 1965, 1974), has improved the efficiency of AE surveillance programs. On the contrary, the design principles discussed above, as well as actual US experience with "increased frequency" methods (see the section Serial Identification Methods below), suggest that statistical modelling will have to be used carefully and employed selectively if excessive false positive rates are to be avoided.

Imputation Screening (Causality) Assessments

Imputation screening assessments use single-report evaluations to identify a signal (i.e. they implement the signalling process by screening for reports that contain diagnostically suggestive information). When used at the identification step to search spontaneous report databases, imputation is not intended to establish or refute product-AE causality, but, rather, like all identification methods, attempts to find promising product-AE pairs (Venulet, 1992; Meyboom et al., 1997a; Stephens, 1999). Thus, the reader should recognize that the use of the term "causality assessment" to describe report-based imputation is misleading. In this chapter we will use instead the term "imputation screening".

Imputation methodology has undergone extensive refinement over the past four decades (Lane et al., 1987; Kramer and Lane, 1992; Naranjo and Lanctot, 1993). Starting from pure subjectivity, it evolved into formalized procedures based on rules, and then into more specific probabilistic calculations derived from Bayes' Theorem. More complex approaches, such as decision support algorithms, have also been suggested (Hoskins and Manning, 1992). However, essentially no experience exists in which Bayesian or other statistical models have been used systematically to screen spontaneous databases for product-AE signals. Although formal studies are lacking, it is likely that elaborate, diagnostically oriented models will have limited applicability at the identification step of spontaneous signalling due to their complexity and emphasis on individualization.

In the simplest form of imputation screening, subjective assessment (also called global introspection (Kramer, 1986) or unstructured clinical judgment (Jones, 1994)), an evaluator assigns a causal rating to an individual spontaneous AE report based on medical diagnostic experience (MacDonald and MacKay, 1964). Subjective assessments usually have involved classification into imputation categories, such as the designations "documented", "probable", "possible", and "doubtful" proposed in the 1960s and still used in modified form in AE evaluations today (Cluff et al., 1964; Seidl et al., 1965). It has been shown that subjectively generated causality assessments are associated with high levels of intra- and inter-rater variability (Karch et al., 1976; Koch-Weser and Greenblatt, 1976; Koch-Weser et al., 1977; Blanc et al., 1979; Naranjo et al., 1981) and produce results that can differ substantially from imputation methods based on more explicit methods (Miremont et al., 1994). This imprecision stems in large part from the multi-factorial nature of AE causality, which makes reproducible evaluations difficult to carry out in the absence of a formal rating procedure (Kramer, 1986). Nonetheless, global introspection probably remains the method most widely used for screening imputation (Meyboom and Royer, 1992; Jones, 1994; Hartmann et al., 1997).

Rule-Based Methods

With rule-based methods (also called standardized assessment methods (Hutchinson et al., 1983; Hutchinson, 1986) or standardized decision aids (Naranjo, 1986)), evaluation of an individual AE report is scored using a questionnaire or an algorithm (Lane et al., 1987; Hutchinson and Lane, 1989; Begaud et al., 1994a; Jones, 1994). Rule-based methods have been shown to decrease intra- and inter-rater variability, and can therefore improve the uniformity of ratings that are assigned to candidate reports in spontaneous databases (Hutchinson et al., 1979; Busto et al., 1982). A large number of such generalized, rule-based instruments have been published, beginning with that of Irey (Irey, 1976a, 1976b; Karch and Lasagna, 1977; Kramer et al., 1979; Naranjo et al., 1981). Rule-based methods intended for use with a particular AE have also been developed (Danan and Benichou, 1993). However, there is essentially no published experience in which such AE-specific rule-based methods have been used systematically to screen spontaneous databases.

Several concerns have been raised about the use of rule-based methods that have implications for their value as database screening methodologies (Naranjo and Lanctot, 1993). These include residual inter-rater variability (Case and Oszko, 1991), poor performance in actual surveillance environments (Leventhal et al., 1979; Grohmann et al., 1985; Louik et al., 1985; Schmidt et al., 1986; Meyboom, 1998), and the lack of inter-method comparability or translation methodologies (Michel and Knodel, 1986; Pere et al., 1986; Hutchinson and Lane, 1989; Frick et al., 1997) (see Chapter 18, Algorithms). The latter observation underlines the inappropriateness of the word "standardized" to describe rule-based methods, since such instruments neither conform to an accepted standard nor produce standard output. Rule-based methods are better thought of as a group of independently designed expert systems that possess similar report assessment features. The biggest problem facing the implementation of rule-based methods as identification procedures in AE report databases is the lack of information to evaluate their performance as screening tests. Given the absence of studies to support the use of any particular approach to screening imputation, health care product manufacturers and regulatory agencies will likely continue to use a multiplicity of instruments (and their modifications) for this purpose.

Bayesian analysis is the most individualized formal imputation method that has been applied thus far to the evaluation of single-report content (Lane, 1984; Auriche, 1985). The odds form of Bayes' expression can be thought of as the underlying statistical model from which more simplistic rule-based methods are derived. The model requires the evaluator to stipulate an initial estimate of the ratio of probabilities favoring causality versus non-causality for a given product-AE pair among exposed patients (the prior odds) (see the section Bayesian Odds Model in Chapter 18). This quantity is then multiplied by AE occurrence-specific likelihood ratios. Each likelihood ratio is formed by dividing the probability for a risk factor or test result (usually a temporal test) under the condition of causality by the probability for the same risk factor/test result under the condition of other causality. The resulting quantity (the posterior odds) is the theoretical best estimate for the odds in favor of product versus other causal explanations, and is calculated for each AE occurrence in each affected patient (Lane, 1984, 1986a, 1986b; Auriche, 1985; Lane et al., 1987).
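The arithmetic of the odds form described above (prior odds multiplied by a series of likelihood ratios to give posterior odds) can be sketched as follows. This is an illustration only: the function names are hypothetical, and the prior odds and likelihood ratios are invented numbers, not estimates for any real product-AE pair.

```python
from math import prod


def posterior_odds(prior_odds, likelihood_ratios):
    """Odds form of Bayes: posterior odds = prior odds x product of likelihood ratios."""
    return prior_odds * prod(likelihood_ratios)


def odds_to_probability(odds):
    """Convert odds in favor of product causality into a probability."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    prior = 0.25             # hypothetical: 1 to 4 in favor of product causality
    lrs = [4.0, 2.0, 0.5]    # hypothetical: timing, dechallenge, competing risk factor
    post = posterior_odds(prior, lrs)
    print(f"posterior odds = {post:.2f}, probability = {odds_to_probability(post):.2f}")
```

Each likelihood ratio above must be estimated separately for the report in hand, which is the source of the time burden discussed in the text.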

Bayesian assessments are time-intensive, requiring a large number of assumptions and calculations, and are individualized on a patient-by-patient basis (Jones, 1994). The latter attribute is reflected in relatively low correlations between Bayesian and rule-based assessments. Bayesian methodology is not algorithmic, but instead relies upon report-specific epidemiologic estimates and case data modelling that are strongly influenced by empirical observation (Lanctot and Naranjo, 1995). Although Bayesian calculations can be automated to an extent (Naranjo and Lanctot, 1993), they will likely always require significant time input on the part of the operator. These considerations suggest that the Bayesian odds model is fundamentally a diagnostic tool whose primary application is to formally evaluate etiologic alternatives in those unusual situations where single-case diagnosis of safety outcomes is warranted (investigatory purposes, for example). Thus, as a result of complexity and the need for recomputation on a report-by-report basis, the Bayesian odds model, as well as other statistical procedures that have been proposed for use with spontaneous data (Hoskins and Manning, 1992), have limited applicability as screening tools for AE surveillance programs.

Intra-product quantitative identification methods (serial (increased frequency) methods and temporospatial cluster identification methods) enumerate groups of reports of similar content for the same product over time or time-space in order to identify an AE signal. The statistical models that have been used for serial identification methods have often been based on a two-group Poisson model, while temporospatial cluster identification methods use established clustering procedures to identify localized increases in the reporting rate of a product-AE pair. Past experience indicates that knowledge of the limitations of serial methodology is important when these methods are applied routinely to collections of spontaneous reports. A good example of this occurred in the United States, where a regulatory requirement for serial signalling was eventually withdrawn as a result of a high "false alarm" (false positive) rate (Food and Drug Administration, 1996, 1997). In contrast to serial methods, procedures that focus on temporospatial clustering identification have received little attention from the AE signalling community, although there are both theoretical and practical bases for the application of these techniques in AE surveillance programs (Moussa, 1978; Jacquez et al., 1996a, 1996b; Clark et al., 1999). AE clustering methodology takes essentially the same approach that is commonly used by public health agencies in the investigation of both product- and non-product-induced disease outbreaks (Jacquez et al., 1996a).

Spontaneous serial methods monitor the number, proportion, or rate of a reported product-AE pair over time (Finney, 1971a, 1974; Royall, 1971; Royall and Venulet, 1972; Moussa, 1978; Lydick et al., 1990; Tsong, 1992; Praus et al., 1993; Amery, 1994; Lao, 1997). The first such method (Patwary signalling) was based on a current versus historical comparison of the proportion of index-product-attributed AE reports to all-product-attributed AE reports for a particular AE type in a multi-product batch. This "AE-specific, proportion of all products" strategy was subsequently changed to the "product-specific, proportion of all AEs" strategy which is commonly used today. The same statistical procedures can be used to carry out either calculation.

All serial signalling procedures are designed to identify either a sudden departure in a reported rate or proportion relative to prior experience or a sustained increased trend in reporting over time (Lao et al., 1998; Lao, 2000) (see Table 19.4). Methods for detecting spike increases were originally derived from a reporting rate comparison between current versus extensive historical time periods (Finney, 1974), whereas later approaches emphasized comparison between two successive equal time periods (Food and Drug Administration, 1985, 1992; Clark, 1988; Norwood and Sampson, 1988). Unequal time period procedures have also been suggested that involve comparing current to all (i.e. current + past, rather than just past) historical experience (Lao et al., 1998). Trend methods have been published in which a comparison is made over several contiguous time intervals (Finney, 1974; Mandel et al., 1976; Levine et al., 1977; Lao et al., 1998; Lao, 2000), or in which cumulative experience is compared with an externally specified standard (Moussa, 1978). The latter two types of procedures are aimed at ascertaining incremental increases over time and cumulative upward divergence beyond a pre-set level, respectively. Although valuable experience and data have accumulated regarding serial signalling methods, there is, at present, no empirical evidence to suggest that any single procedure would be superior as a screening tool in spontaneous databases.

Table 19.4. Kinds of serial signalling methods.

Detection of spikes

• Period-to-period methods

• Long history methods

Detection of gradual increases

• Trend methods

• Cumulative sum methods
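The idea behind cumulative sum methods (accumulating upward divergence from a reference level until a pre-set level is exceeded) can be sketched with a generic one-sided cumulative sum. This is a textbook form offered as an assumption-laden illustration; it is not the modified numerical cumulative sum test cited in this chapter, and the reference and threshold values are hypothetical.

```python
def one_sided_cusum(counts, reference, threshold):
    """Return the index of the first period whose cumulative excess over
    `reference` rises above `threshold`, or None if no period signals.

    counts    -- report counts per period for one product-AE pair
    reference -- externally specified expected count per period (assumed)
    threshold -- pre-set signalling level (assumed)
    """
    cumulative = 0.0
    for index, count in enumerate(counts):
        # Accumulate only upward divergence; reset to zero when below reference.
        cumulative = max(0.0, cumulative + (count - reference))
        if cumulative > threshold:
            return index
    return None


if __name__ == "__main__":
    quarterly_reports = [2, 3, 2, 4, 5, 6, 7]  # hypothetical counts
    print(one_sided_cusum(quarterly_reports, reference=3, threshold=5))
```

Because small excesses are summed across periods, a gradual increase can cross the threshold even when no single period-to-period change would be flagged.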

Two acknowledged methods for period-to-period serial testing are the conditional binomial method of Norwood and Sampson (Norwood and Sampson, 1988; Tsong, 1992) and the normal approximation for a difference between two proportions described in the US FDA 1991 reporting guidelines (Food and Drug Administration, 1992). Tsong published an evaluation of six period-to-period increased frequency methods based on simulated false positive rates that showed the FDA's 1991 method to be the optimal procedure of those that were investigated (Tsong, 1992), while Hillson suggested a method in which serial comparisons are adjusted for variations in the lag time between the dates of occurrence and reporting (Hillson et al., 1998). Regardless of what report-based serial test is used, the safety professional should carefully evaluate and adjust for false positive rates, since experience indicates that this is the major limitation associated with serial methodology (Food and Drug Administration, 1996, 1997). A listing of published serial methods is provided in Table 19.5.
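The two period-to-period approaches named above can be sketched in generic form. The code below is an assumption-based illustration: the conditional test uses the standard result that, for two Poisson counts with equal rates over equal periods, the current count given the total is binomial with probability 0.5, and the normal approximation is the usual pooled two-proportion z statistic. The details of the published Norwood and Sampson and FDA procedures may differ, and all function names are hypothetical.

```python
from math import comb, erfc, sqrt


def conditional_binomial_p(prev_count, curr_count, p0=0.5):
    """One-sided P(current >= observed | total), current ~ Binomial(total, p0).

    p0 = 0.5 assumes equal period lengths and equal product usage (assumption).
    """
    total = prev_count + curr_count
    return sum(comb(total, k) * p0 ** k * (1.0 - p0) ** (total - k)
               for k in range(curr_count, total + 1))


def two_proportion_z(x_prev, n_prev, x_curr, n_curr):
    """Pooled z statistic and one-sided p-value for an increase in the proportion
    of a product's reports (n) that mention the AE of interest (x)."""
    pooled = (x_prev + x_curr) / (n_prev + n_curr)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n_prev + 1.0 / n_curr))
    if se == 0.0:
        return 0.0, 0.5  # no variation in either period; nothing to test
    z = (x_curr / n_curr - x_prev / n_prev) / se
    return z, 0.5 * erfc(z / sqrt(2.0))


if __name__ == "__main__":
    print(conditional_binomial_p(prev_count=2, curr_count=10))
    print(two_proportion_z(x_prev=5, n_prev=200, x_curr=15, n_curr=220))
```

Whatever test is chosen, the p-value threshold applied across many product-AE pairs drives the false positive rate that the text identifies as the major limitation of serial methods.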

Spontaneous serial methods that assume a classical statistical distribution are limited by the presence of geographical clustering (Moussa, 1978; Clark et al., 1999). Specifically, when widely distributed products are monitored using a Poisson-based or negative binomial-based test, geographical clustering of reports (which occurs commonly in spontaneous reporting systems) can be shown to violate the distributional assumption. In the presence of geoclustering, an evaluator should therefore consider replacing Poisson-based or similar serial testing with a procedure that identifies high probability clusters. Other methodologic approaches in this setting include region-based serial testing, in which numerator counts are made of reporters and/or institutions instead of reports (Clark et al., 1999), or the use of specialized probability distributions for reports (Moussa, 1978; Lao, 1997, 2000; Lao et al., 1998).

Table 19.5. Spontaneous serial methods.

Type of method

Test method

References (a)

Comments

Period-to-period

Numerical increase Doubling

Normal approximation Conditional binomial Normal approximation

Yates' correction Normal approximation (log transformed data) Normal approximation (square root transformed data) Chi-squared with continuity correction

Normal approximation with lag time adjustment

Long history T-test

Score standardized to past experience compared to an arbitrary threshold (M-statistic)

Negative binomial exact test

Binomial-Poisson exact test

Zero-truncated Poisson exact test

Norwood and Sampson (1988) Norwood and Sampson (1988)

Tsong (1992)

Tsong (1992)

Tsong (1992)

Hillson et al. (1998) Finney (1974)

Mandel et al. (1976)

Lao (2000)

Sometimes referred to as safety "shift tables"

Assumes uniform product usage/reporting

1985 FDA "arithmetic" method

1985 FDA "Poisson" method

High false alarm rate

Exact method, rare event assumption

Derived by Norwood and Sampson 1991 FDA method Best false alarm rate of those tested Variant of normal approximation method

Variant of normal approximation method

Variant of normal approximation method High false alarm rate

Also referred to as "pairwise comparisons" Proposed for medical device reports Assumes uniform product usage/reporting Intended for comparisons when periods are short

Adjusts for lag time between event occurrence and reporting

Patwary method

Comparison of current to past report proportions

Mandel method

Comparison of current to past mean report numbers (or proportions) Specified threshold based on severity of AE

Proposed for medical device reports Detects a cluster of reports during a specified period

Assumes uniform product usage/reporting Comparison of current to all report proportions

Proposed for medical device reports Detects a cluster of reports during a specified period

Assumes uniform product usage/reporting Comparison of current to all report proportions

Proposed for medical device reports Detects a cluster of reports during a specified period

Assumes uniform product usage/reporting Comparison of current to all report proportions


Trend

Cumulative sum

Chi-squared

Exact probability for specified number of increases Linear trend threshold method of Mandel Centre-batch matrix for trend

Exact (or, if appropriate, asymptotic) test based on distributions derived from a centre-batch matrix

Cox-Stuart non-parametric

Graphical smoothing techniques

Modified one-sided numerical cumulative sum test (NCST)

Finney (1974)

Finney (1974)

Mandel et al. (1976)

Mandel et al. (1976)

Levine et al. (1977)

Lao (2000)

Moussa (1978)

Comparison of current to >2 past proportions

Nonparametric comparison of current to >2 past proportions

Assumes uniform product usage/reporting

Comparison of geographically stratified, report means or proportions over sequential batches Uses geographical dispersion of data sources to reduce false positives Intended for finding "latent" signals (i.e. detectable centrally but not by peripheral contributors)

Uses geographically dispersed trend data Assumes uniform product usage/reporting Proposed for medical device reports Detects a gradual trend in reports Proper interpretation is based on knowledge of product usage Proposed for medical device reports Use graphical smoothing to visually present reporting trends Assumes uniform product usage/reporting

Devised by WHO

Specified threshold based on severity of AE

a Indicates primary footnote reference for that method.

Reproduced from Clark et al. (2001), Epidemiologic Reviews 23(2), 191-210, Table 2, with permission from Oxford University Press.

Both region-based serial testing and the use of specialized probability distributions take geoclustering into account, the former by eliminating it from the calculation and the latter by incorporating its effects into the assumed probability distribution.

A number of product-AE outbreak investigations have been published that arose from reports that were clustered over time and space (Centers for Disease Control, 1984, 1989a, 1989b; Martone et al., 1986; Jolson et al., 1992; Bennett et al., 1995). Consequently, the safety professional may find it desirable to design a method that screens product-specific spontaneous AE reports for the presence of temporospatial clusters. Temporospatial clustering has been seen with lot-associated and other product defects (Centers for Disease Control, 1984, 1989a; Martone et al., 1986; Jolson et al., 1992), with localized patterns of use/misuse (Bennett et al., 1995), and with events later determined to occur commonly under permissive conditions (Centers for Disease Control, 1989b).

Although first described by WHO researchers in the 1970s, only recently have inter-product identification methods become the subject of intense discussion (Bate et al., 1998a, 1998b; Amery, 1999; DuMouchel, 1999). Such procedures evaluate disproportions found in multiple product reporting systems in order to identify suspected product-AE relationships. Like other identification methods, inter-product identification comparisons yield product-AE pairs that are then further evaluated through the use of more precise techniques (Bate et al., 1998b; DuMouchel, 1999; Lindquist et al., 1999). An increase in the number of products in the computational universe for inter-product quantitative identification methods is associated logically with both improvement in estimates for expected reporting behavior and a reduction in sensitivity to underreporting. Thus, inter-product identification techniques appear to be most useful when applied to large AE reporting systems, such as those found at the regional, national, or international levels.

In the early 1970s Patwary and Finney described an inter-product identification method in which the proportion of all index-product reports containing a specific AE type was compared with the same proportion derived from a multiple product database. They referred to this strategy as "reaction proportion signalling" and applied it to WHO's spontaneous reporting system (Finney, 1974). In 1994, Amery published a similar procedure called the relative adverse drug experience profile, which was applied to a manufacturer's multi-product report database (Amery, 1994, 1999). With these methods, signal identification occurs when standard statistical tests indicate that the product-AE to product-all-AE proportion exceeds an expected value calculated from the reference reporting system under the null assumption of product-AE independence.

The Medicines Control Agency (MCA) has described a procedure called the proportionate reporting ratio (PRR) method. Like the two previous procedures, the PRR portion of this calculation is based on a comparison of the proportion of index product reports that contain a specific AE type with the same proportion from a multi-product universe (Evans, 2000). The PRR itself is the ratio of these two proportions, defined as for reaction proportion signalling (see above). However, unlike the above methods, the PRR methodology selects suspect product-AE pairs by applying combination criteria derived from general experience, such as a PRR > 2 together with a chi-square score > 4, in the presence of at least two product-AE reports (Evans, 2000).
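The PRR calculation and the combination criteria quoted above can be sketched on a 2 x 2 table of report counts. This is a minimal illustration under stated assumptions: the comparator is taken to be all other products in the database, the chi-square statistic is the uncorrected Pearson form (whether a continuity correction is applied in practice is not stated here), and the function names and counts are hypothetical.

```python
def prr(a, b, c, d):
    """Proportionate reporting ratio.

    a -- index product reports with the AE
    b -- index product reports without the AE
    c -- comparator reports with the AE (assumed: all other products)
    d -- comparator reports without the AE
    """
    return (a / (a + b)) / (c / (c + d))


def chi_square(a, b, c, d):
    """Uncorrected Pearson chi-square for a 2 x 2 table (assumption: no Yates correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def flags_pair(a, b, c, d, min_prr=2.0, min_chi2=4.0, min_reports=2):
    """Apply the combination criteria quoted in the text as default thresholds."""
    return (a >= min_reports
            and prr(a, b, c, d) > min_prr
            and chi_square(a, b, c, d) > min_chi2)


if __name__ == "__main__":
    counts = (20, 80, 100, 9800)  # hypothetical report counts
    print(prr(*counts), chi_square(*counts), flags_pair(*counts))
```

The sketch does not guard against empty cells; a comparator with no reports of the AE (c = 0) would need separate handling.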

Two Bayesian data mining procedures have been published that extend inter-product quantitative identification to M x N tables (Bate et al., 1998a, 1998b; Lindquist et al., 1999; Amery, 1999; DuMouchel, 1999), where M refers to a large set of AEs, and N to a large set of monitored products (see Table 19.6). The first of these, the Bayesian confidence propagation neural network method, was proposed by Bate and colleagues from the WHO (Bate et al., 1998a); the second was developed by DuMouchel with reference to the FDA's spontaneous reporting system (DuMouchel, 1999). A stepwise approach to the development of Bayesian data mining techniques has been described (Louis and Shen, 1999) which involves: (1) the construction of a large M x N table, where cells contain counts of the 1st through Mth AE for the 1st through Nth product; (2) the creation of a null model that calculates expected counts for each cell; (3) derivation of a statistic that measures deviation beyond the expected value in each cell; and (4) quantification of the relationship between observed and expected counts in each cell to assist in selecting or evaluating product-AE pairs.
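The four steps listed above can be sketched on a toy table. The code below is a simplified illustration, not either published method: the null model is plain row-by-column independence, and the deviation statistic is an unshrunk log2 of observed over expected counts, whereas the Bayesian procedures cited here add prior information to stabilize estimates for sparse cells. All names and counts are hypothetical.

```python
from math import log2


def expected_counts(table):
    """Step 2: expected cell counts under independence of AE (rows) and product (columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def log_ratios(table):
    """Step 3: log2(observed / expected) per cell; None where the observed count is zero."""
    expected = expected_counts(table)
    return [[log2(o / e) if o > 0 else None for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(table, expected)]


def rank_pairs(table, aes, products):
    """Step 4: rank product-AE pairs by deviation, largest first."""
    scores = log_ratios(table)
    pairs = [(aes[i], products[j], scores[i][j])
             for i in range(len(aes)) for j in range(len(products))
             if scores[i][j] is not None]
    return sorted(pairs, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    # Step 1: a toy M x N table (M = 2 AEs, N = 2 products), hypothetical counts.
    counts = [[30, 10],
              [10, 50]]
    for ae, product, score in rank_pairs(counts, ["AE1", "AE2"], ["P1", "P2"]):
        print(ae, product, round(score, 3))
```

An unshrunk ratio of this kind is unstable when expected counts are very small, which is one motivation for the Bayesian treatment in the methods discussed in the text.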

The procedure of Bate and colleagues has been carried out for specific product-AE pairs at multiple time points, thereby incorporating interval changes into a time scan. This repeated comparison technique has been proposed as an enhancement of signal detection, since an upward change in the measurement statistic over time implies that increasing "awareness" of a product-AE pair has developed within the reporting system as time progresses. In contrast, DuMouchel's method employs time stratification in the overall model to adjust for secular reporting trends. Its aim is to use all accumulated evidence in the reporting system to rank order suspect product-AE pairs. Both the Bate and DuMouchel methods explicitly recognize that Bayesian data mining is an identification process only, and is not meaningful unless subsequent evaluation takes place.

Table 19.6. Comparison of two Bayesian data mining procedures used in spontaneous report signalling.

 | Bayesian Confidence Propagation Neural Network (BCPNN) method (a) | DuMouchel method (b)
Product-AE stratifications | Non-stratified | Stratified by (at least) gender and year of
