Description Of Data Mining Methodology Used By The

The UMC's main purpose is to find novel drug safety signals, that is, new information. Experience in drug safety has given rise to a central principle: if important signals are not to be missed, the first analysis of the information should be free from prejudice and a priori assumptions (Hand, 1999). Quantitative filtering of the data focuses clinical review on the potentially most important ADR combinations (Bate et al., 1998; Hand, 1999; Lindquist et al., 1999, 2000; Orre et al., 2000). Human intelligence and experience can generate hypotheses more effectively when the filtering method is transparent.

The BCPNN is a feed-forward neural network in which learning and inference follow the principles of Bayes' law. For routine output we use it as a one-layer model (Lansner and Ekeberg, 1989), although it has been extended to a multilayer network (Holst, 1997). Such a multilayer network can be used to investigate combinations of several variables in the WHO database, and it has already been applied successfully to diagnosis (Holst and Lansner, 1996), expert systems (Holst and Lansner, 1993) and data analysis in pulp and paper manufacturing (Orre and Lansner, 1996).

Estimates of precision (standard deviation) are provided for each point estimate of the information component (IC), so both the point estimate of unexpectedness and the certainty associated with it can be examined. The IC and its standard deviation can be calculated for any combination of variable values even when some data are missing, and the resulting probability distributions are intuitive to interpret.

The network is transparent, in that it is easy to see what has been calculated, and robust, in that valid, relevant results can still be generated despite missing data. This is an advantage because most reports in the database contain some empty fields. The results are reproducible, which makes validation and checking simple. The network is easy to train: it needs only one pass through the data, which makes it highly time-efficient. Only a small proportion of all possible drug-adverse reaction combinations actually occur in the database (have non-zero counts), so a sparse matrix method makes searches through the database quick and efficient.
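The one-pass training and the sparse storage can be illustrated with a small Python sketch. It assumes, for illustration only, that each report is a pair of sets (suspected drugs, reported ADRs); the function name `count_one_pass` is hypothetical. A single loop collects the total number of reports together with the drug, ADR and drug-ADR counts, and only combinations that actually occur get an entry, so the pair table stays sparse. A report with an empty ADR field still contributes to the total and to its drug counts.

```python
from collections import Counter
from itertools import product


def count_one_pass(reports):
    """Single pass over the reports; returns (C, cx, cy, cxy).

    Each report is assumed to be a (drugs, adrs) pair of sets.
    Only drug-ADR pairs that actually co-occur get an entry in cxy,
    so the pair table stays sparse.
    """
    total = 0
    cx = Counter()   # number of reports listing each drug
    cy = Counter()   # number of reports listing each ADR
    cxy = Counter()  # number of reports listing each drug-ADR pair
    for drugs, adrs in reports:
        total += 1
        cx.update(set(drugs))
        cy.update(set(adrs))
        # An empty field simply contributes no pairs: missing data is tolerated.
        cxy.update(product(set(drugs), set(adrs)))
    return total, cx, cy, cxy


reports = [
    ({"drugA"}, {"rash"}),
    ({"drugA", "drugB"}, {"rash", "nausea"}),
    ({"drugB"}, set()),          # ADR field missing
    ({"drugC"}, {"headache"}),
]
total, cx, cy, cxy = count_one_pass(reports)
print(total, len(cxy))  # 4 5
```

In this toy example only 5 of the 9 possible drug-ADR pairs are stored.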

The neural network provides an efficient computational model for analysing large amounts of data and combinations of variables, whether real-valued, discrete or binary. Efficiency is further enhanced because the IC is itself the weight in the neural network. The architecture allows the same framework to be used both for data analysis/data mining and for prediction, as in pattern recognition and classification. Bayesian statistics fits naturally into a neural network approach, since both build on the idea of adapting in the light of new data. The method has also been extended to detect dependencies between several variables, and it handles missing data robustly. Pattern recognition by the BCPNN does not depend on any a priori hypothesis, because it uses an unsupervised learning approach. This is useful for detecting new syndromes, finding age profiles of ADRs, and determining at-risk groups and dose relationships; more generally, it can find complex dependencies that have not necessarily been considered before. Naturally, changes in patterns over time may also be important (Hand, 1999).
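Because the IC is the weight, prediction in a one-layer network reduces to summing weights. The minimal Python sketch below assumes the independence assumptions of a one-layer BCPNN: the support for an output unit y is its bias log2 p(y) plus the IC weights from the active input units, which under those assumptions equals log2 P(y | inputs). The function names are hypothetical, and treating never-observed pairs as independent (weight 0) is a simplification made only for this sketch.

```python
import math


def bcpnn_weights(C, cx, cy, cxy):
    """Biases b[y] = log2 p(y) and IC weights w[x, y] = log2(p(x,y) / (p(x) p(y)))."""
    bias = {y: math.log2(n / C) for y, n in cy.items()}
    weights = {
        (x, y): math.log2((n / C) / ((cx[x] / C) * (cy[y] / C)))
        for (x, y), n in cxy.items()
    }
    return bias, weights


def support(y, active_inputs, bias, weights):
    """log2 P(y | active inputs) under the one-layer independence assumptions.

    Pairs never seen together are treated as independent (weight 0):
    a simplification for this sketch, not part of the original method.
    """
    return bias[y] + sum(weights.get((x, y), 0.0) for x in active_inputs)


# Toy counts: 100 reports, 20 with drugA, 10 with rash, 8 with both.
bias, weights = bcpnn_weights(100, {"drugA": 20}, {"rash": 10}, {("drugA", "rash"): 8})
print(round(2 ** support("rash", ["drugA"], bias, weights), 3))  # 0.4, i.e. 8/20
```

With a single active input the sum reproduces the conditional frequency P(rash | drugA) = 8/20 exactly; with several inputs the result depends on the independence assumptions.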

The BCPNN methodology thus uses a neural network architecture to identify unexpectedly strong dependencies between variables (e.g. drugs and adverse reactions) within the WHO database, and to track how these dependencies change as new data are added. The dependencies are selected using a measure of disproportionality called the information component (IC):
$$\mathrm{IC} = \log_2 \frac{p_{xy}}{p_x \, p_y}$$

where $p_x$ = the probability of a specific drug being listed on a case report; $p_y$ = the probability of a specific ADR being listed on a case report; and $p_{xy}$ = the probability that a specific drug-adverse reaction combination is listed on a case report.

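Thus the IC value is based on:

• the number of reports listing the specific drug (cx);

• the number of reports listing the specific ADR (cy);

• the number of reports with the specific combination (cxy); and

• the total number of reports (C).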

Positive IC values indicate that the particular combination of variables is reported to the database more often than would be statistically expected from the reports already in the database. The higher the IC value, the more the combination stands out from the background.
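As a minimal sketch, the IC can be computed directly from the counts by estimating each probability as a relative frequency (px = cx/C, and so on), which gives IC = log2(cxy × C / (cx × cy)). This is a plain point estimate, not the Bayesian estimate the BCPNN actually reports, and the function name `ic_point` is hypothetical. Writing it as observed over expected counts shows where the sign comes from.

```python
import math


def ic_point(C, cx, cy, cxy):
    """Frequency-based IC = log2(pxy / (px * py)), with px = cx / C etc.

    Simplifies to log2(observed / expected), where expected = cx * cy / C.
    Undefined when cxy == 0 (log of zero), a case a Bayesian estimate avoids.
    """
    expected = cx * cy / C
    return math.log2(cxy / expected)


# 10 000 reports, 200 listing the drug, 100 listing the ADR: 2 expected together.
for cxy in (8, 2, 1):
    print(cxy, ic_point(10_000, 200, 100, cxy))
# 8 2.0   reported four times more often than expected
# 2 0.0   exactly as expected
# 1 -1.0  less often than expected
```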

From the distribution of the IC, the expectation and variance are calculated using Bayesian statistics. The standard deviation of each IC provides a measure of the robustness of the value: the higher the counts cx, cy and cxy, the narrower the confidence interval. If a positive IC value increases over time while the confidence interval narrows, this indicates a likely positive quantitative association between the studied variables.

The UMC, as the WHO Collaborating Centre for International Drug Monitoring, is responsible for the technical and scientific maintenance and development of the WHO International Drug Monitoring Programme. The Programme now has more than 60 member countries, which annually contribute around 150 000 suspected ADR reports to the WHO database in Uppsala.
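The Bayesian estimation of the IC's expectation and standard deviation can be approximated by simulation. The Python sketch below rests on a simplifying assumption that is not taken from the text: px, py and pxy are given independent uniform Beta(1, 1) priors, so each posterior is a Beta distribution updated with the observed count. Sampling these posteriors and taking the mean and standard deviation of log2(pxy / (px py)) approximates the IC's expectation and standard deviation; the BCPNN itself uses its own priors and closed-form expressions. The function name `ic_posterior` and its `draws` and `seed` parameters are hypothetical.

```python
import math
import random
import statistics


def ic_posterior(C, cx, cy, cxy, draws=5000, seed=0):
    """Monte Carlo estimate of (E[IC], SD[IC]).

    Assumption for this sketch only: independent Beta(1, 1) priors on
    px, py and pxy, giving Beta(c + 1, C - c + 1) posteriors.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    samples = []
    for _ in range(draws):
        px = rng.betavariate(cx + 1, C - cx + 1)
        py = rng.betavariate(cy + 1, C - cy + 1)
        pxy = rng.betavariate(cxy + 1, C - cxy + 1)
        samples.append(math.log2(pxy / (px * py)))
    return statistics.mean(samples), statistics.stdev(samples)


small = ic_posterior(1_000, 50, 40, 10)
large = ic_posterior(10_000, 500, 400, 100)  # same proportions, ten times the reports
print(small, large)
```

With ten times as many reports in the same proportions, the estimated standard deviation shrinks roughly threefold, in line with the narrowing confidence interval described for higher counts.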

One of the main aims of the international pharmacovigilance programme is to identify early signals of safety problems related to medicines. To this end, a new ADR signalling system based on the BCPNN has been provided for national monitoring centres and authorities. It complements the previous signal generation procedure, in which an expert panel examined large, unwieldy amounts of sorted and tabulated material. An overview of the new signalling approach, including results from the first part of an evaluation and a comparison with another signalling system, has been published (Lindquist et al., 2000).

The new system uses the BCPNN to scan incoming ADR reports and compare them statistically with the reports already stored in the database.

A new quarterly output to national centres contains statistical information from the BCPNN scan. It also contains frequency counts for each drug and ADR listed, both individually and in combination. The figures from the previous quarter are also included, and the data are provided in computerised format.

Drug-adverse reaction combinations that are statistically significantly different from the background of reports ("associations") are sent to a panel of reviewers for evaluation and expert opinion. Within the WHO Programme, a "signal" refers to "information regarding a possible relationship between a drug and an adverse event or interaction". As before, signals of possible safety problems are circulated to all national centres participating in the international pharmacovigilance programme for consideration of public health implications.
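Selecting the associations forwarded to the reviewers can be sketched as a simple filter over each quarter's estimates. In this illustrative Python sketch the input format, the function name `screen` and the cut-off (IC minus two standard deviations above zero) are assumptions, not a description of the UMC's production criteria. The sketch also flags whether a combination shows a rising IC with a narrowing interval, the pattern the text describes as indicating a likely association.

```python
def screen(quarter):
    """Return {pair: strengthening} for combinations passing the cut-off.

    `quarter` maps (drug, adr) -> (prev_ic, prev_sd, cur_ic, cur_sd).
    Assumed cut-off: the lower bound cur_ic - 2 * cur_sd must exceed zero.
    `strengthening` is True when the IC rose and its SD fell since last quarter.
    """
    selected = {}
    for pair, (prev_ic, prev_sd, cur_ic, cur_sd) in quarter.items():
        if cur_ic - 2 * cur_sd > 0:
            selected[pair] = cur_ic > prev_ic and cur_sd < prev_sd
    return selected


quarter = {
    ("drugA", "rash"): (1.0, 0.6, 1.5, 0.4),       # passes, strengthening
    ("drugB", "nausea"): (0.5, 0.5, 0.6, 0.5),     # lower bound below zero
    ("drugC", "headache"): (2.0, 0.3, 1.8, 0.35),  # passes, but IC fell
}
print(screen(quarter))
# {('drugA', 'rash'): True, ('drugC', 'headache'): False}
```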
