Interrater reliability (Kappa)
Interrater reliability is a measure used to examine the agreement between two people (raters/observers) on the assignment of categories of a categorical variable. It is an important measure in determining how well an implementation of some coding or measurement system works.
A common statistical measure of interrater reliability is Cohen’s Kappa, which typically ranges from 0 to 1.0 (although negative values are possible). Larger values indicate better reliability, while values near or below zero suggest that agreement is attributable to chance alone.
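In symbols, Kappa compares the proportion of ratings on which the two raters actually agree with the proportion of agreement expected by chance alone; a standard formulation is

\[ \kappa = \frac{p_o - p_e}{1 - p_e} \]

where \(p_o\) is the observed proportion of agreement and \(p_e\) is the chance-expected proportion of agreement computed from the raters’ marginal totals.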
Example Interrater reliability analysis
Using an example from Fleiss (1981, p. 213), suppose you have 100 subjects whose diagnoses are rated by two raters on a scale that classifies each subject’s disorder as psychological, neurological, or organic. The data are given below (KAPPA.SAV):
|  | Rater A: Psychological | Rater A: Neurological | Rater A: Organic |
|---|---|---|---|
| Rater B: Psychological | 75 | 1 | 4 |
| Rater B: Neurological | 5 | 4 | 1 |
| Rater B: Organic | 0 | 0 | 10 |
The data set KAPPA.SAV contains the variables Rater_A, Rater_B, and Count. The figure below shows the data file in count (summarized) form.
To analyze this data follow these steps:
1. Open the file KAPPA.SAV. Before performing the analysis on these summarized data, you must tell SPSS that the Count variable is a “weight” variable. Select Data/Weight Cases..., choose the “Weight cases by” option, and specify Count as the Frequency Variable.
2. Select Analyze/Descriptive Statistics/Crosstabs.
3. Select Rater_A as the Row variable and Rater_B as the Column variable.
4. Click on the Statistics button, select Kappa and Continue.
5. Click OK to run the procedure; the results of the Kappa test appear in the output.
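If you would like to verify the Kappa value outside SPSS, the same contingency table can be analyzed directly. The following is a minimal sketch in Python; the array layout, variable names, and use of NumPy/scikit-learn are illustrative choices, not part of the SPSS procedure above.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Contingency table from the example: rows = Rater B, columns = Rater A
# (order: Psychological, Neurological, Organic)
table = np.array([[75, 1,  4],
                  [ 5, 4,  1],
                  [ 0, 0, 10]])

n = table.sum()                               # 100 subjects
p_o = np.trace(table) / n                     # observed agreement = 0.89
p_e = np.sum((table.sum(axis=1) / n) *        # chance agreement from the
             (table.sum(axis=0) / n))         # marginal proportions = 0.66
kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 3))                        # 0.676

# Cross-check with scikit-learn after expanding the table into
# one (Rater A, Rater B) rating pair per subject
labels = np.array(["Psych", "Neuro", "Organic"])
b_idx, a_idx = np.indices(table.shape)
counts = table.ravel()
rater_a = np.repeat(labels[a_idx.ravel()], counts)
rater_b = np.repeat(labels[b_idx.ravel()], counts)
print(round(cohen_kappa_score(rater_a, rater_b), 3))   # 0.676
```

Both the hand computation and the library call should reproduce the Kappa of 0.676 reported below.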
The results of the interrater analysis are Kappa = 0.676 with p < 0.001. This measure of agreement, while statistically significant, is only marginally convincing. As a rule of thumb, values of Kappa from 0.40 to 0.59 are considered moderate, 0.60 to 0.79 substantial, and 0.80 and above outstanding (Landis & Koch, 1977). Most statisticians prefer Kappa values of at least 0.6, and most often higher than 0.7, before claiming a good level of agreement. Although it is not displayed in the output, you can find an approximate 95% confidence interval using the generic formula for 95% confidence intervals:
Estimate ± 1.96 × SE
Using this formula and the results in the table, an approximate 95% confidence interval on Kappa is (0.504, 0.848). Some statisticians prefer a weighted Kappa, particularly when the categories are ordered. A weighted Kappa gives partial credit to “close” ratings rather than counting them simply as “misses.” However, SPSS does not calculate weighted Kappas.
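To make the arithmetic explicit, SPSS reports an asymptotic standard error alongside Kappa; working backwards from the interval quoted above, it is roughly 0.088 in this example, so

\[ 0.676 \pm 1.96 \times 0.088 \approx (0.504,\ 0.848). \]

If a weighted Kappa is needed for ordered categories, one alternative outside SPSS is scikit-learn’s cohen_kappa_score, which accepts a weights argument of 'linear' or 'quadratic'.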
A more complete guide to interpreting Kappa (Landis & Koch, 1977) is given in the following table:
| Kappa | Interpretation |
|---|---|
| < 0 | Poor agreement |
| 0.00 – 0.20 | Slight agreement |
| 0.21 – 0.40 | Fair agreement |
| 0.41 – 0.60 | Moderate agreement |
| 0.61 – 0.80 | Substantial agreement |
| 0.81 – 1.00 | Almost perfect agreement |
Reporting the results of an interrater reliability analysis
The following narratives illustrate how you might report this interrater analysis in a publication.
Narrative for the methods section:
“An interrater reliability analysis using the Kappa statistic was performed to determine consistency among raters.”
Narrative for the results section:
“The interrater reliability for the raters was found to be Kappa = 0.68 (p < 0.001), 95% CI (0.504, 0.848).”
References
Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York: Wiley.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
Source: http://www.stattutorials.com/SPSS/TUTORIAL-SPSS-Interrater-Reliability-Kappa.htm