Empirical research associated with ICD-10

The classification in Chapter V (F) was based on the practical clinical experience of many researchers worldwide, but its goodness of fit, reliability, and validity still had to be demonstrated. The Mental Health Division of WHO therefore initiated several multicentre field trials of the classification, diagnosis, and assessment of mental and behavioural disorders, which were carried out by the WHO reference and training centres. Ten centres, located in China, Denmark, Egypt, Germany, India, Japan, Luxembourg, Spain, the United Kingdom, and the United States, took part, so that a wide range of languages and geographical areas was covered. Several papers describing these studies have been published.(5 58)

The first major studies were of the Clinical Descriptions and Diagnostic Guidelines, involving 112 centres in 39 countries, and the Diagnostic Criteria for Research, involving 151 centres in 32 countries. Cases were diagnosed by several clinicians in case conferences and from videotapes. The inter-rater reliability was much higher for the diagnostic criteria (κ = 0.8-1.0) than for the diagnostic guidelines (κ = 0.5). Kappa (κ) values reflect the chance-corrected inter-rater reliability, with high correspondence for κ > 0.81 and low correspondence for κ < 0.20.(5.§>) These two studies, which formed the largest international psychiatric scientific enterprise ever conducted, showed that such large studies are possible, but that many differences and uncertainties have to be considered if a study is undertaken in so many different centres with such a large number of raters.
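As an illustration of what a chance-corrected kappa value measures, the following is a minimal sketch for the simplest case of two raters (Cohen's kappa). It is not the procedure used in the field trials, which involved many raters and may have relied on a multi-rater generalization. The function name `cohen_kappa` and the ratings are hypothetical and serve only as an example; they are not trial data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters who rated the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: proportion of cases given the same diagnosis.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both raters used a single identical category throughout.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical diagnoses of ten cases by two raters (illustrative codes only).
rater_a = ["F20", "F20", "F32", "F32", "F41", "F20", "F32", "F41", "F20", "F32"]
rater_b = ["F20", "F20", "F32", "F41", "F41", "F20", "F32", "F41", "F32", "F32"]

kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 2))  # observed agreement 0.80, chance agreement 0.34, kappa about 0.70
```

In this example the raters agree on 8 of 10 cases, but about a third of that agreement would be expected by chance alone, so kappa is noticeably lower than the raw agreement rate.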

Another study initiated by WHO concerned the multiaxial classification. In the first part of this study, 10 cases provided by WHO were rated by many centres worldwide. In the next step, the inter-rater reliability and validity of the axes were examined in cases from the centres themselves. The inter-rater reliability of the primary health care version (PHC) has also been studied in an international research programme.

Although the inter-rater reliability of most diagnoses has been measured and is satisfactory, the same cannot be said for any measure of validity, whether this is taken to mean fitting well with clinical experience (face validity), predicting the outcome (predictive validity), or showing satisfactory associations with independent variables (construct validity).


The separate American development of DSM began about 20 years ago, at a time when WHO did not offer operationalized diagnosis. In 1980, shortly after the publication of ICD-9, the American Psychiatric Association published DSM-III, followed 7 years later by the revision DSM-III-R(60) and, in 1994, by DSM-IV.(32)

The introduction of DSM-III inaugurated the use of explicit and operationally defined diagnostic criteria, multiaxiality, and a classification that was, as far as possible, neutral with respect to aetiological theories. This version of DSM was revised in the 1980s and many changes were proposed. Finally, a systematic revision process was undertaken for DSM-IV, with reviews of the world literature and a final consensus process. WHO was involved in this process, and before it was finished representatives of the American Psychiatric Association and WHO met in several sessions. A consensus was reached on many questions, although differences persisted on others.

