THE DISTRIBUTED MODEL—THE HMO RESEARCH NETWORK CENTER FOR EDUCATION AND RESEARCH IN THERAPEUTICS
This group includes nine HMOs with a total population of approximately seven million, plus a coordinating center at the Channing Laboratory, a research unit of Harvard Medical School. Each site has a lead investigator; other HMO-based investigators also participate. All centers participate in the development of policies, procedures, infrastructure, and core studies. New studies are approved by a steering committee, after which centers decide individually whether to participate. Study leadership is distributed among investigators based in the different HMOs. Investigators at every site participate in protocol development, creation of work-plans, and writing of manuscripts. The coordinating center supports all studies, usually by leading the creation of study-specific analysis datasets.
The most important organizational principle is to extract data needed for each study from the HMOs' separate databases as the need for this information becomes clear, rather than to create a single merged dataset to support future, still unspecified, studies. This approach has both advantages and disadvantages. Among the advantages, it ensures that investigators and support staff who are knowledgeable about the individual HMOs and their data systems apply that expertise on an ongoing basis. This is important because administrative data systems are typically unique in a variety of ways. They may use locally modified coding conventions, they often contain discontinuities resulting from the use of different computer systems over time, and they often contain undocumented gaps or varying levels of detail that result from different contractual arrangements with selected providers or vendors. They also evolve rapidly, so this information must be updated frequently.
Maintaining HMOs' data on their host systems also avoids the very large cost and effort required to build and sustain a merged database. This includes the work of extracting and refreshing very large datasets, converting these to a uniform format, creating stable person-level identifiers that protect the confidentiality of individuals, and reconciling and/or annotating anomalies for the entire dataset. This work is required even though most of the data is never used for a multicenter study.
An additional reason for not creating a single large dataset is that it allows each organization to control access to, and use of, its data at all times. To maximize individual centers' control over their information, we have adopted the principle of having each center provide as little data for each study as possible. A method for minimizing the amount of data required is discussed below.
The principal disadvantages of not creating a single merged dataset are the extra time required to assemble a project-specific dataset, and the effort's dependence on individual HMOs to maintain their data in accessible form. Some organizations archive data after several years in a manner that makes it difficult to use for research purposes.
Several types of data sharing can fit within this distributed model. For simple studies, it may be sufficient for each site to ascertain frequencies or rates that can be combined across sites. In essence, each site creates an agreed-upon set of tables, which are then combined. An example of such a study that included three of this center's sites, plus others, is a recent analysis of contraindicated dispensing of cisapride (Smalley et al., 2000). For more complex studies, we have found it preferable to create a pooled analysis dataset. Doing so is essential for conducting multivariate analyses. It also allows more straightforward creation of derived variables, e.g. time windows of drug exposures, or combinations of diagnosis codes, and it reduces the overall amount of effort on the part of investigators and staff at each site.
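The table-combination approach described above can be illustrated with a brief sketch. The site names, strata, and counts below are invented for illustration; the essential point is that only stratum-level counts, not person-level records, leave each site.

```python
# Hypothetical sketch of the "agreed-upon tables" approach: each site reports
# stratum-level event counts and person-time, which the coordinating center
# sums before computing pooled rates. All names and numbers are illustrative.

from collections import defaultdict

# Each site submits {stratum: (events, person_years)}; no person-level data leave the site.
site_tables = {
    "site_a": {"age_65_74": (12, 8500.0), "age_75_plus": (9, 4200.0)},
    "site_b": {"age_65_74": (7, 6100.0), "age_75_plus": (11, 5300.0)},
}

# Combine the tables across sites.
pooled = defaultdict(lambda: [0, 0.0])
for table in site_tables.values():
    for stratum, (events, person_years) in table.items():
        pooled[stratum][0] += events
        pooled[stratum][1] += person_years

# Pooled rates per 1000 person-years.
rates = {s: events / py * 1000 for s, (events, py) in pooled.items()}
```

More complex designs (e.g. multivariate analyses) cannot be reduced to such marginal tables, which is why the text recommends a pooled analysis dataset for those studies.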
Capabilities Required to Support Distributed Multi-center Studies
Several kinds of capabilities are needed to support multi-center studies and to improve the interpretability of the results. Considerable effort is required to create mutually interpretable drug identifiers. Although all of the HMOs in our group use National Drug Codes (NDCs) to identify drugs in their dispensing files, there are important differences in their implementation of these codes. Some of these represent different formatting conventions, others are data entry errors, and some are the result of individual HMOs' creation of new codes for local use. Such codes are more common in older data files, but new codes may still be created; for instance, if a pharmacy repackages a bulk supply of medication into smaller units. For our first joint study, of elderly recipients of alendronate, involving approximately 120 000 person-years, we identified approximately 20 000 unique formulary entries, 10% of which required manual coding of drug identity. This experience led us to avoid merging the entire drug exposure lists of all the sites, in favor of incremental additions required by individual studies.
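The formatting differences mentioned above can be sketched in code. NDCs are conventionally written in segmented forms such as 4-4-2, 5-3-2, or 5-4-1 and normalized to an 11-digit 5-4-2 layout; codes that fit no known pattern (data entry errors, locally invented codes) fall through to manual review, as with the 10% figure cited above. This is a simplified illustration, not the group's actual program.

```python
# Hedged sketch: normalize segmented NDC strings to the 11-digit 5-4-2 format.
# Codes that fit no known pattern are returned as None and routed to manual review.

def normalize_ndc(raw: str):
    """Return an 11-digit NDC string, or None if the code needs manual coding."""
    text = raw.strip()
    parts = text.split("-")
    if len(parts) == 3:
        labeler, product, package = parts
        # Common segmented layouts: 4-4-2, 5-3-2, 5-4-1.
        if (len(labeler), len(product), len(package)) in {(4, 4, 2), (5, 3, 2), (5, 4, 1)}:
            return labeler.zfill(5) + product.zfill(4) + package.zfill(2)
    # Already-normalized 11-digit codes pass through unchanged.
    if text.isdigit() and len(text) == 11:
        return text
    return None  # unknown or local code: manual review
```

In practice a site's dispensing file would be run through such a function, with the `None` results queued for the manual coding the text describes.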
An additional centralized function is maintenance of NDC lists that map drugs to the disease categories used to compute the chronic disease score (Von Korff et al., 1992; Putnam et al., 2002), a comorbidity index that predicts mortality, hospitalization, and total medical resource utilization. The chronic disease score uses pharmacy dispensing as a surrogate for various chronic diseases by assigning empirically derived weights to classes of drugs that have been dispensed during the prior year. Weights are also assigned for age and gender. We have found the chronic disease score to be a useful case-mix adjuster in multi-center epidemiologic studies. Most drug codes have not yet been assigned to chronic disease categories. Our first attempt to use it required manual assignment of several thousand drug codes to chronic disease score categories.
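The scoring logic described above can be sketched as follows. The NDC-to-class map, class weights, and age/sex terms here are invented placeholders; the real score uses the empirically derived weights of Von Korff et al. (1992).

```python
# Illustrative sketch of chronic disease score computation. The mappings and
# weights below are placeholders, not the published empirically derived values.

NDC_TO_CLASS = {          # hypothetical NDC -> drug-class map
    "00093505698": "cardiac",
    "00173044202": "asthma",
}
CLASS_WEIGHTS = {"cardiac": 3, "asthma": 2}  # placeholder weights

def chronic_disease_score(dispensed_ndcs, age, male):
    """Score from drug classes dispensed in the prior year, plus age/sex terms."""
    # Each drug class counts once, no matter how many dispensings map to it.
    classes = {NDC_TO_CLASS[n] for n in dispensed_ndcs if n in NDC_TO_CLASS}
    score = sum(CLASS_WEIGHTS[c] for c in classes)
    score += age // 10            # placeholder age term
    score += 1 if male else 0     # placeholder sex term
    return score
```

Unmapped codes simply contribute nothing, which is why the manual assignment of several thousand codes mattered: unmapped dispensings silently understate comorbidity.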
An important feature of efficient multi-center studies is the creation of computer code that can be used at each site to extract and manipulate data, assign unique, arbitrary (for confidentiality) study identification numbers, and format the data so that they can be merged with information from other sites. While such programs must be modified to run on each system, they share the same core.
Typically, such code is developed at the coordinating center for each study. This approach has several advantages. It improves data quality by ensuring that algorithms and the protocol's logic are implemented in the same way at each site, and because programmers at each site test the same programs. It also reduces the amount of data that must be submitted to the data center, because more complex data manipulations can be performed at each site than would otherwise be desirable. In addition, it reduces the total amount of programmer effort, since only one person develops the core code.
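The confidentiality step mentioned above, replacing member identifiers with arbitrary study identification numbers, can be sketched as below. The function names and record layout are invented for illustration; the key property is that the crosswalk from member IDs to study IDs never leaves the site.

```python
# Sketch of site-level de-identification: members get random, non-derivable
# study IDs, and only the re-keyed records are sent to the coordinating center.

import secrets

def build_crosswalk(member_ids):
    """Assign each member a random study ID; the crosswalk stays at the site."""
    crosswalk = {}
    used = set()
    for mid in member_ids:
        sid = secrets.token_hex(8)
        while sid in used:          # guard against the (rare) collision
            sid = secrets.token_hex(8)
        used.add(sid)
        crosswalk[mid] = sid
    return crosswalk

def deidentify(records, crosswalk):
    """Replace member IDs with study IDs in extract records before transfer."""
    return [{**r, "id": crosswalk[r["id"]]} for r in records]
```

Because the IDs are random rather than derived (e.g. hashed) from member numbers, the coordinating center cannot reverse them, yet the site can still re-link records if chart review is later needed.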
In the near future, the sites will begin to ascertain the completeness of their dispensing records by directly querying a sample of members about their use of prescription medications. This will clarify the potential impact of "out-of-plan" pharmacy use on the ascertainment of HMO members' drug exposures. Such exposures have usually been assumed to be negligible, especially for drugs whose cost is high relative to the required co-payment. Although this rationale is still valid, many new pharmacy benefit variations have been introduced, including higher co-payments for some drugs and periodic out-of-pocket spending requirements before pharmacy benefits apply. Additionally, pharmacy data systems contain no information about drugs dispensed by clinicians as samples, or about drugs that some members with dual insurance coverage obtain through their "other" policy. For some studies it is also important to estimate out-of-plan dispensing of inexpensive drugs, either because such drugs are themselves the subject of study (for instance, a characterization of overall antibiotic use) or because of their contribution to the chronic disease score. All of these factors can cause dispensing files to be incomplete.
Examples of Multi-center Studies Using the Distributed System
Our experience thus far leads us to believe that it is possible to perform multi-center epidemiologic studies reasonably efficiently. We have completed a cohort study to determine whether alendronate is associated with an increased risk of hospitalization for gastrointestinal perforation, ulcer or bleeding (Donahue et al., 2002). This study used pharmacy dispensing and enrolment data to identify eligible, exposed individuals and age-, sex-, and HMO-matched unexposed individuals. Part of the eligibility determination included establishing that the individual had pharmacy benefits; there were more than 70 different pharmacy benefit plans in effect during the time covered by this study. We used ambulatory and inpatient claims to identify a second comparison cohort that had experienced a fracture (a surrogate for osteoporosis, the principal indication for alendronate). Hospital discharge diagnosis codes were used to screen for potential events of interest, then the full text records of these hospitalizations were reviewed to confirm the diagnoses. Approximately 80% of hospital records were retrievable for review. An additional "product" of this study was the determination of the sensitivity and specificity of individual International Classification of Diseases, ninth revision, Clinical Modification (ICD-9-CM) diagnosis codes for identification of these gastrointestinal events. These data also supported a case-control study of the relationship between fracture and prior exposure to statins (Chan et al., 2000).
In addition to studies of drug safety and effectiveness, we are planning descriptive studies of drug use (population-based rates and indications for antibiotic use in pediatrics), of the impact of changing co-payment levels on the use of prescription drugs (focusing on clinicians' prescribing for diabetes and on patients' adherence to prescribed regimens), of clinicians' adherence to prescribing guidelines (use of angiotensin converting enzyme inhibitors after hospitalization for congestive heart failure), and of the impact of programs to influence prescriber and consumer behavior regarding prescription drugs. This ensemble of HMOs is also a promising venue for pharmacoeconomic studies, because cost information is generally available, and for pharmacogenetic studies, because it is possible, with appropriate approval and oversight, to contact individuals with conditions of interest or with specified responses to specific medications. Although it is beyond the scope of this discussion, this center is also well positioned to work with delivery systems to disseminate information to their clinicians and their members about the appropriate use of medical therapies.
Multi-center studies like the ones described here require additional organizational and logistic capacity compared with single center studies. They are therefore only preferred for questions that cannot be addressed in a single delivery system. We believe there will be a substantial number of these situations for the foreseeable future. Although the efforts described here are a work in progress, we conclude that it is possible to create a durable multi-center organization that facilitates such work.
THE CENTRALIZED MODEL—THE VSD
A contrasting strategy for conducting multi-center studies is to create a large, centralized dataset that supports multiple studies of vaccine use and safety. This approach is used by the VSD, which is sponsored by the CDC to study rare adverse events following vaccination (Chen et al., 1997). From 1991 to 2000 the VSD included the Group Health Cooperative in Washington, the Northwest Kaiser Permanente in Oregon, and the Northern California Kaiser and Southern California Kaiser Permanente health programs. In the fall of 2000, Harvard Pilgrim Health Care and Kaiser Permanente of Colorado were added. All five current VSD members are also part of the HMO Research Network Center for Education and Research in Therapeutics.
The main advantage of the VSD database is the ability to address vaccine safety concerns relatively quickly as they arise. The member populations form a large cohort for which key demographic, enrolment, vaccination, and healthcare data from several different health plans are maintained in a standardized, edited database. Thus, data are readily available for designing and conducting studies, or for simply assessing whether a study is feasible.
Even in studies in which additional data collection (e.g. chart abstraction) is required, the centralized automated data provide efficiencies. The availability of automated diagnostic codes allows ready identification of potential cases, and provides a source of control selection for case-control studies. When a chart review is required, as often occurs in case-control studies, the amount of review can be minimized. Since the automated vaccination data are of high quality, frequently only cases need to have their charts abstracted to verify case status; chart review for exposure (i.e. vaccination) is not necessary, so chart review for controls is not required.
Capabilities Required to Support Centralized Multi-center Studies
Permanent data files maintained by the VSD include a unique study identification number, age and sex, vaccine records (including date and type of vaccination), diagnoses (including those assigned in hospitals, emergency departments, and outpatient visits), plus selected covariate information, e.g. census block codes. Originally, the VSD intended to rely on centrally maintained data to identify potential study subjects. For investigations of specific events or diseases, e.g. seizures, cases were identified from the automated data files maintained at the CDC. The medical records of potential cases were then reviewed according to the individual study protocol at each HMO by trained abstractors to verify case status, document the disease onset date, collect information on competing causes of illness, and gather covariate data. For many studies, however, it has not been practical to rely solely on the central automated datasets to screen for potential cases. In some instances, because of the rarity of the disease and lack of statistical power, it has been necessary to supplement the study sample with cases and controls from earlier years.
The creation and maintenance of high-quality data in a single centralized location comes at a cost. At each VSD site, the data manager and a team of programmers spend several months planning and creating data files. Considerable attention is needed to account for the constantly changing nature of HMO data systems, data collection procedures, and population (including changes in coverage plans). Once new coverage plans, data systems, or data dictionary changes are researched, new changes are incorporated into the previous year's data creation programs, and then the modified and updated files are tested using sample program runs. After these file modifications have been completed, at some HMOs more than 50 programs are run to identify the study cohort and enrolment intervals, extract utilization data and create the necessary data files. Discrepancy-check programs are then run, and discrepancies resolved when discovered (including occasionally entirely remaking some data files). Finally, a discrepancy summary report is created, and the files are transmitted electronically to a secure CDC site. This last step is time-consuming because of the number and size of the data files.
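One concrete flavor of the discrepancy-check programs mentioned above is a consistency check between files, for example, flagging vaccination records dated outside any enrolment interval so they can be resolved (or files remade) before transmission to the CDC. The data layout below is a simplified assumption, not the VSD's actual file structure.

```python
# Hypothetical discrepancy check: flag vaccinations dated outside any
# enrolment interval for that member. Record layouts are illustrative only.

from datetime import date

def find_discrepancies(enrolment, vaccinations):
    """enrolment: {member_id: [(start, end), ...]}; vaccinations: [(member_id, date), ...].

    Returns the (member_id, date) pairs not covered by any enrolment interval.
    """
    problems = []
    for member_id, vax_date in vaccinations:
        spans = enrolment.get(member_id, [])
        if not any(start <= vax_date <= end for start, end in spans):
            problems.append((member_id, vax_date))
    return problems
```

A discrepancy summary report like the one described in the text would then tally such flags by file and by type before the files are transmitted.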
Examples of Multi-center Studies Using the Centralized System
As noted above, many studies can be conducted utilizing the centralized automated data files alone. The vaccination data, in particular, are continuously monitored and have been shown to be of high quality. So studies of vaccination coverage (e.g. after introduction of a new vaccine or vaccination schedule, or in special populations) can be conducted quickly and without additional data gathering. In some circumstances, health outcomes can also be reliably ascertained using just the automated data. This is the case for conditions for which well-established, validated algorithms already exist, such as asthma, or when broader categories of health conditions, such as acute respiratory infections, are being evaluated. The centralized database also provides a sampling frame for selecting controls in nested case-control studies.
Specific examples of studies completed using centrally maintained data include those of immunization levels among premature and low-birth-weight infants and risk factors for delayed up-to-date immunization status (Davis et al., 1999), the epidemiology of diarrheal disease among children enrolled in four west coast HMOs (Parashar et al., 1998), the impact of influenza on the rates of respiratory disease hospitalizations among young children (Izurieta et al., 2000), the rate of influenza vaccination in children with asthma in HMOs (Kramarz et al., 2000a), the impact of influenza vaccination on exacerbations of asthma (Kramarz et al., 2000b), and the incidence of Kawasaki syndrome in west coast HMOs (Belay et al., 2000).
Examples of studies completed using expanded or specialized ad hoc datasets from VSD sites include the impact of the sequential inactivated polio vaccine/oral polio vaccine schedule on vaccination coverage levels in the United States (Davis et al., 2001), a comparison of adverse event rates after second-dose measles-mumps-rubella (MMR2) vaccination (Davis et al., 1997), assessment of the risk of chronic arthropathy among women after rubella vaccination (Ray et al., 1997), and assessment of the risk of hospitalization because of aseptic meningitis after measles-mumps-rubella vaccination in one-to-two-year-old children (Black et al., 1997).