Poor Agreement between Clinician Response Ratings and Calculated Response Measures in Patients with Chronic Graft-versus-Host Disease.

Publication Type: Journal Article


Biology of Blood and Marrow Transplantation: Journal of the American Society for Blood and Marrow Transplantation, Volume 18, Issue 11, pp. 1649-1655 (2012)


2012, Center-Authored Paper, Clinical Research Division, July 2012, Research Trials Office Core Facility - Biostatistics Service, Shared Resources


In 2005, a National Institutes of Health consensus conference was held to refine methods for research in patients with chronic graft-versus-host disease, including proposed objective response measures and a provisional algorithm for calculating organ-specific and overall response. In this study, we used weighted kappa statistics to evaluate the level of agreement between clinician response ratings and calculated response categories in patients with chronic graft-versus-host disease. The study included 290 patients who had paired enrollment and follow-up visits. Based on a set of objective measures, 37% of the patients had an overall complete or partial response, whereas clinicians reported an overall complete or partial response rate of 71% (slight to fair agreement, weighted kappa 0.20). Agreement rates between calculated organ-specific responses and clinician-reported changes in skin, mouth, and eyes were fair to moderate (weighted kappa, 0.28-0.54). We conclude that for both overall and organ-specific comparisons, clinician response ratings did not agree well with calculated response categories. Possible reasons for this discrepancy include a high clinical sensitivity for detecting response, a clinical predisposition to recognize selective improvements as overall response, the large change in objective measures proposed to define response, and the high incidence of progressive disease based on new manifestations. Conclusions from prior literature reporting high overall response rates based on clinician judgment would not be supported if the provisional algorithm had been applied to calculate response. Our analysis also highlights the need to define an overall response measure that incorporates both patient-reported and objective measures and accurately reflects the outcome in patients with a mixed response in which one organ or site improves, whereas another shows new involvement.
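
For readers unfamiliar with the agreement statistic named in the abstract, the following is a minimal sketch of Cohen's weighted kappa for two ordinal ratings of the same patients. It is illustrative only and is not the authors' analysis code. The abstract does not say whether linear or quadratic weights were used, so the `weights` option, the function name `weighted_kappa`, the four category labels, and the toy ratings are all assumptions introduced here; none of the values are study data. The statistic is computed as 1 minus the ratio of observed to chance-expected weighted disagreement.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two ordinal ratings of the same subjects.

    `categories` lists the ordinal levels from lowest to highest.
    `weights` is "linear" or "quadratic" (an assumption; the abstract
    does not state which weighting scheme was used).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if len(categories) < 2:
        raise ValueError("at least two categories are required")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(rater_a)

    # Joint counts of (rating A, rating B) and the marginal counts per rater.
    joint = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    row = Counter(index[a] for a in rater_a)
    col = Counter(index[b] for b in rater_b)

    def disagreement(i, j):
        # 0 on the diagonal, 1 for the most distant pair of categories.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(disagreement(i, j) * joint[(i, j)] / n for i, j in cells)
    expected = sum(disagreement(i, j) * row[i] * col[j] / (n * n) for i, j in cells)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed / expected


# Hypothetical ordinal response scale and toy ratings (not study data).
CATEGORIES = ["progression", "stable", "partial response", "complete response"]
calculated = ["progression", "stable", "stable", "partial response",
              "complete response", "progression"]
clinician = ["stable", "partial response", "stable", "partial response",
             "complete response", "partial response"]

if __name__ == "__main__":
    print(weighted_kappa(calculated, clinician, CATEGORIES))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. The verbal labels used in the abstract ("slight", "fair", "moderate") correspond to conventional interpretive bands for kappa values.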