The US Medicaid Program is a health insurance system created in 1965 to provide access to medical care for economically disadvantaged and disabled persons. It consists of 54 programs, supported jointly by federal and state funds and managed independently by states or jurisdictions. The Welfare Reform Act of 1996 defined eligible individuals to include children younger than six and pregnant women whose family income is at or below 133% of the federal poverty level; children younger than 19 in families with incomes at or below the federal poverty level; persons eligible for the Supplemental Security Income Program (SSI) because they are aged, blind, or disabled and have limited income; and other specified needy groups (Carson et al., 2000). Compared with the overall US population, the Medicaid population includes a disproportionate number of children, females, and non-whites. Because eligibility depends on employment and income, an individual's enrollment can change from year to year. Tennessee Medicaid, for example, lost a quarter of its enrollees within one year and retained only half after five years, because of loss of eligibility and deaths (Ray and Griffin, 1989).

Although the services covered for Medicaid recipients vary by state, the federal government requires a minimum set of services: physician and nurse-midwife services, home health care, care in skilled nursing facilities, inpatient and outpatient hospital care, rural health clinics, independent laboratory and radiology services, early and periodic screening of children, family planning services, and transportation to and from medical services. Virtually all states also reimburse prescribed drugs, although the list of reimbursable drugs varies by state.

To administer this large health care program, the US government developed the Medicaid Management Information System, a set of specifications for processing computerized claims and management information. Minimum standards were established for its six components: recipient; provider; claims processing; reference files; surveillance and utilization review; and management and administrative reporting (Carson et al., 2000).

The claims processing files include information on age, gender, state, inpatient and outpatient diagnoses (coded with the International Classification of Diseases, Ninth Revision, Clinical Modification, ICD-9-CM), outpatient drugs (coded with the National Drug Code, NDC), procedures such as laboratory tests and radiographs, and, in some states, deaths. Pharmacy data include records of all outpatient and nursing home prescriptions filled at the pharmacy for drugs, equipment, and supplies on the Medicaid formulary. Each record contains the date filled, the NDC, the quantity dispensed, the number of days the supply is anticipated to last, and identifiers for the pharmacy and the prescribing practitioner. Most drugs are dispensed for no more than 30 days at a time, although some states permit a larger supply of chronically used drugs.
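The pharmacy record described above can be represented as a small record type, and a common use of its days-supply field is to turn individual fills into continuous periods of exposure. The sketch below is illustrative only: the field names, the `exposure_episodes` helper, and its `grace_days` parameter are hypothetical, and real studies make further choices (for example, how to credit early refills that overlap) that this code handles only in the simplest way.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PharmacyClaim:
    """One outpatient pharmacy record with the fields described in the text.

    Field names are hypothetical; real Medicaid extracts use their own layouts.
    """
    date_filled: date
    ndc: str           # National Drug Code of the dispensed product
    quantity: float    # amount dispensed (units depend on dosage form)
    days_supply: int   # days the supply is anticipated to last
    pharmacy_id: str
    prescriber_id: str

    def supply_end(self) -> date:
        """Last day covered by this fill, counting the fill date as day 1."""
        return self.date_filled + timedelta(days=self.days_supply - 1)


def exposure_episodes(claims, grace_days=0):
    """Merge fills of a single drug into continuous exposure episodes.

    A fill that starts no more than `grace_days` days after the previous
    supply runs out extends the current episode; otherwise a new episode
    starts. Overlapping supply is not carried forward: the episode simply
    ends at the later of the two end dates. Returns (start, end) tuples.
    """
    episodes = []
    for claim in sorted(claims, key=lambda c: c.date_filled):
        start, end = claim.date_filled, claim.supply_end()
        if episodes and start <= episodes[-1][1] + timedelta(days=grace_days + 1):
            first_day, last_day = episodes[-1]
            episodes[-1] = (first_day, max(last_day, end))
        else:
            episodes.append((start, end))
    return episodes


# Usage with made-up fills of one product (placeholder NDC).
fills = [
    PharmacyClaim(date(2000, 1, 1), "00000-0000-00", 30, 30, "PH1", "DR1"),
    PharmacyClaim(date(2000, 1, 31), "00000-0000-00", 30, 30, "PH1", "DR1"),
    PharmacyClaim(date(2000, 4, 15), "00000-0000-00", 30, 30, "PH1", "DR1"),
]
episodes = exposure_episodes(fills)
print(episodes)
```

With no grace period, the first two fills form one episode and the April fill starts a second; a large enough `grace_days` would bridge the gap. Fills should be filtered to a single drug (or drug group) before calling the function.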

Diagnosis data contain records of outpatient claims for care provided and of hospitalizations, including the hospital identifier and admission and discharge dates. Primary and secondary diagnoses and surgical procedures are available in these files only for recipients not covered by Medicare; for enrollees over 65, for whom Medicare becomes the primary payer, this information can be obtained from Medicare files.

There is a lag period of up to 2 months between the time a drug is dispensed and the time it appears on the database; the lag time for diagnoses may be up to 3 months.
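Because claims arrive late, an analysis run on a recent extract should be restricted to service dates old enough for the data to be complete. A minimal sketch, assuming the maximum lags quoted above (2 months for drugs, 3 months for diagnoses) and simple calendar-month arithmetic; the function names are hypothetical, and the actual lag should be checked for the state and period studied.

```python
import calendar
from datetime import date

# Maximum lags quoted in the text; actual lags vary by state and period.
DRUG_LAG_MONTHS = 2
DIAGNOSIS_LAG_MONTHS = 3


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`.

    If the target month is shorter, the day is clamped to its last day
    (for example, 31 May minus 3 months gives the end of February).
    """
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def complete_through(extract_date: date) -> dict:
    """Conservative last service dates for which an extract taken on
    `extract_date` should already contain all claims."""
    return {
        "drugs": months_before(extract_date, DRUG_LAG_MONTHS),
        "diagnoses": months_before(extract_date, DIAGNOSIS_LAG_MONTHS),
    }


cutoffs = complete_through(date(2000, 5, 31))
print(cutoffs)
```

Diagnoses get the earlier cutoff because their lag is longer, so a study combining both kinds of data would usually be limited by the diagnosis cutoff.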

Historically, much research with Medicaid data was performed using COMPASS (Computerized On-Line Medical Pharmaceutical Analysis and Surveillance System). COMPASS is no longer available under that name; its Medicaid databases are now owned by Protocare Sciences of Herndon, Virginia, and are called the Protocare Sciences Proprietary Medicaid Database. The older COMPASS database included billing data for Medicaid patients in 11 states, with a total Medicaid population of over 8 million patients (Strom et al., 1985; Morse et al., 1986). New data are currently available only for approximately 1.25 million patients from Ohio. Although the data available for each patient are identical in the two databases, access to medical records is no longer possible because of increasing societal concern about confidentiality.

The State of Tennessee has contracted with Vanderbilt University School of Medicine to collect the claims files of the state's Medicaid program, analyze the quality of the claims, and perform research of interest to the state. As of 1997, 27% of the state's population, or 1.4 million people, were enrolled, of whom 11% were 65 years of age or older. Almost half of all births in Tennessee were to women enrolled in Tennessee Medicaid (Ray, unpublished data, 1999).

The Tennessee Medicaid program changed in 1994 from a fee-for-service program to a capitated model consisting of 12 managed care organizations, each with its own formulary that restricts the list of reimbursable drugs (Ruther et al., 1986; Mirvis et al., 1995). Although the quality of the database has generally been maintained, efforts to check the completeness and accuracy of the data are ongoing: because the managed care organizations are paid a fixed amount per enrollee rather than per claim, they no longer have a financial incentive to submit complete claims data.

The Tennessee system has developed linkages to a number of other data files, including Medicare files, vital statistics files, files linking mothers to their children, public health clinic files, motor vehicle files, and the Tennessee cancer registry, which broadens the research possible on Medicaid enrollees. Medical record abstraction has been possible with the permission of the Tennessee Medicaid Program and its constituent providers.
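Linkages of this kind are often built by matching records on identifiers common to both files. The text does not say how the Tennessee linkages are performed, so the following is only a sketch of exact (deterministic) matching on an assumed composite key; the field names `person_id`, `birth_date`, `mother_person_id`, and `mother_birth_date` are hypothetical. Keys shared by more than one enrollee are treated as ambiguous and left unlinked rather than guessed.

```python
def link_births_to_mothers(enrollees, births):
    """Exact (deterministic) linkage of birth records to Medicaid mothers.

    `enrollees` and `births` are lists of dicts. The key is the pair
    (person identifier, date of birth); these field names are hypothetical.
    Keys that occur for more than one enrollee are treated as ambiguous
    and are not linked. Returns (links, unlinked certificate ids).
    """
    index = {}
    ambiguous = set()
    for e in enrollees:
        key = (e["person_id"], e["birth_date"])
        if key in index:
            ambiguous.add(key)
        index[key] = e

    links, unlinked = [], []
    for b in births:
        key = (b["mother_person_id"], b["mother_birth_date"])
        if key in index and key not in ambiguous:
            links.append((index[key]["enrollee_id"], b["certificate_id"]))
        else:
            unlinked.append(b["certificate_id"])
    return links, unlinked


# Made-up example records.
enrollees = [
    {"enrollee_id": "M1", "person_id": "A100", "birth_date": "1975-03-02"},
    {"enrollee_id": "M2", "person_id": "A200", "birth_date": "1980-07-19"},
]
births = [
    {"certificate_id": "B1", "mother_person_id": "A100", "mother_birth_date": "1975-03-02"},
    {"certificate_id": "B2", "mother_person_id": "A300", "mother_birth_date": "1982-01-01"},
]
links, unlinked = link_births_to_mothers(enrollees, births)
print(links, unlinked)
```

Exact matching is the simplest design; real linkages may instead use probabilistic methods that tolerate errors in names or dates, at the cost of some false matches.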

Medicaid databases over-represent special populations, containing greater numbers of pregnant women, elderly people, nursing home residents, and African Americans than would be expected in a representative sample of the population. These populations are often excluded from or under-represented in pre-marketing trials. Although this composition is a disadvantage when a representative population is required, in analytic studies the relatively uniform socio-economic status of enrollees helps control for confounding by socio-economic status.

However, because Medicaid data are claims data (like most of the other databases described), they lack information on some variables often needed to control for confounding, such as smoking, environmental exposures, illicit drug use, alcohol use, occupation, family history, and use of over-the-counter drugs.

Experience suggests that many studies must obtain medical records to confirm the diagnosis, characterize the severity of disease, and obtain information on potential confounders not found in the computer data. Some states never developed a way to access their medical records, and outpatient records have always been difficult to obtain and often lack the information needed for a specific study. In any case, medical records are no longer accessible in most of these databases because of concerns about confidentiality. Confirmation from primary records is less important in studies of drug-drug relationships and in studies that can use drugs or procedures as markers of diagnoses.
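When a drug is used almost exclusively for one condition, its dispensing can stand in for a diagnosis that cannot be confirmed from records. A minimal sketch, assuming dispensings have already been mapped from NDCs to drug names; the marker table (insulin for diabetes mellitus, levodopa for parkinsonism) is illustrative only, and the validity of any marker would need to be established for the study at hand. The `min_fills` threshold is a hypothetical option for requiring repeated dispensing before a condition is inferred.

```python
from collections import Counter

# Illustrative marker table: drug -> condition its dispensing is taken to
# indicate. These entries are examples only; marker validity must be checked.
MARKER_DRUGS = {
    "insulin": "diabetes mellitus",
    "levodopa": "parkinsonism",
}


def infer_conditions(dispensings, min_fills=1):
    """Infer conditions from marker-drug dispensings.

    `dispensings` is an iterable of (patient_id, drug_name) pairs, with drug
    names assumed to be already mapped from NDCs. A condition is assigned
    only if its marker drug was dispensed at least `min_fills` times.
    Returns {patient_id: set of conditions}.
    """
    counts = Counter()
    for patient_id, drug in dispensings:
        condition = MARKER_DRUGS.get(drug)
        if condition is not None:
            counts[(patient_id, condition)] += 1

    conditions = {}
    for (patient_id, condition), n in counts.items():
        if n >= min_fills:
            conditions.setdefault(patient_id, set()).add(condition)
    return conditions


dispensings = [
    ("p1", "insulin"), ("p1", "insulin"),
    ("p2", "levodopa"),
    ("p3", "amoxicillin"),  # not a marker drug, so ignored
]
print(infer_conditions(dispensings))
print(infer_conditions(dispensings, min_fills=2))
```

Raising `min_fills` trades sensitivity for specificity: a single dispensing may reflect a trial of therapy or a miscoded claim, whereas repeated dispensing is more likely to reflect an established condition.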

