Many people have suggested that the regulation should not affect epidemiologic and outcomes research because such research generally does not require access to "individually identifiable" information. The statute defines "individually identifiable health information" as any information, including demographic information collected from an individual, that (A) is created or received by a health care provider, health plan, employer, or health care clearinghouse; and (B) relates to the past, present, or future physical or mental health or condition of an individual, the provision of health care to an individual, or the past, present, or future payment for the provision of health care to an individual, and (i) identifies the individual, or (ii) with respect to which there is a reasonable basis to believe that the information can be used to identify the individual.32
Under the statute, information that does not qualify as "individually identifiable" is not subject to the statutory or regulatory requirements.
Congress, the US Department of Health and Human Services, privacy advocates, the research community, and others have wrestled with which characteristics of data create a "reasonable basis to believe" that the data could be used to identify an individual. What would be a reasonable standard? At one extreme are researchers and public health advocates who may argue that all data should be considered exempt once the key "direct identifiers" are removed. They argue that the importance of research using these data outweighs the low probability that the data might be used (or misused) to re-identify individual patients. At the other extreme are those who are concerned that any database, even with all identifiers removed, could potentially be overlaid with other data sources and, through probability matching on certain information fields, used to re-identify individuals. Many advocates have maintained that even if a researcher has no interest in knowing the patients' identities, has no intent to link the files to other files for this purpose, and has established physical and procedural safeguards that make such linkage difficult or impossible for employees, the mere theoretical possibility that files could be linked to re-identify patients is a privacy risk that society should avoid.

31 Social Security Act ("SSA") § 1177, 42 U.S.C. 1320d-2.
32 SSA § 1171(6), 42 U.S.C. 1320d(6).
For its part, in implementing this definition, the Department of Health and Human Services set an extremely high standard for information to fall outside the category of individually identifiable health information; it termed such information "de-identified." It relied on statistical probability, as determined by a statistician, to define the practices that satisfy the "reasonable basis to believe" standard. The agency's approach is firmly grounded in the art and science of database manipulation. It does not ask whether a reasonable person looking at the data fields on an individual record could discern who the person is or how to contact him or her. Nor does it consider who will use the data, for what purpose, or how the data are protected from being used to identify individuals. Rather, it asks whether the data fields that appear in a data set also appear in generally available databases, which could therefore be used by someone attempting to identify data subjects. Examples of such generally available databases include state driver's license data, voter registration lists, the telephone book, and birth records.
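To make the linkage concern concrete, the following Python sketch shows how fields shared between a released data set and a generally available list can single out an individual. Everything in it is a hypothetical illustration: the field names (`zip`, `birth_date`, `sex`), the toy records, and the function `link_unique_matches` are assumptions, not anything specified by the regulation. It uses simple exact matching rather than probabilistic matching, and it is not the statistical determination the regulation calls for.

```python
# Hypothetical illustration of record linkage on shared fields.
# Field names and records are invented for this sketch only.
SHARED_FIELDS = ("zip", "birth_date", "sex")


def key(record):
    """Build the combination of shared fields used for matching."""
    return tuple(record[field] for field in SHARED_FIELDS)


def link_unique_matches(released, public):
    """Return (released_record, public_record) pairs where the shared-field
    combination matches exactly one entry in the public list."""
    index = {}
    for person in public:
        index.setdefault(key(person), []).append(person)
    matches = []
    for record in released:
        candidates = index.get(key(record), [])
        if len(candidates) == 1:  # a unique match singles out one person
            matches.append((record, candidates[0]))
    return matches


# Hypothetical "generally available" list (e.g., a voter roll) with names.
public = [
    {"name": "Voter A", "zip": "12345", "birth_date": "1950-03-02", "sex": "F"},
    {"name": "Voter B", "zip": "12345", "birth_date": "1961-11-20", "sex": "M"},
    {"name": "Voter C", "zip": "12345", "birth_date": "1961-11-20", "sex": "M"},
]

# Hypothetical released health records with direct identifiers removed.
released = [
    {"zip": "12345", "birth_date": "1950-03-02", "sex": "F", "diagnosis": "diabetes"},
    {"zip": "12345", "birth_date": "1961-11-20", "sex": "M", "diagnosis": "asthma"},
    {"zip": "67890", "birth_date": "1975-06-09", "sex": "F", "diagnosis": "migraine"},
]

matches = link_unique_matches(released, public)
for record, person in matches:
    print(person["name"], record["diagnosis"])
```

In this toy example only the first released record links to a single voter. The second matches two voters and remains ambiguous, and the third matches no one, which illustrates why combinations of fields shared by very few people drive re-identification risk.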
The regulation also offers a "safe harbor" method, under which (a) the covered entity must have no actual knowledge that the information could be used, alone or in combination with other information, to identify participants, and (b) all of the following must be removed from the data: