Simstat help screen
Inter-rater agreement statistics
Inter-rater agreement measures are used to assess the concordance between the observed ratings of two different judges at the same point in time. Such measures can also be used to assess the reliability of the ratings of a single judge at different points in time. The simplest measure of agreement for nominal level variables is the proportion of concordant ratings out of the total number of ratings made. Unfortunately, this measure often yields spuriously high values because it does not take into account chance agreements that occur from guessing. Several adjustment techniques have been proposed in the literature to correct for this chance factor, three of which are available in the SIMSTAT program. Each correction technique makes the following assumptions:

Free marginal adjustment assumes either that all categories on a given scale have an equal probability of being observed, or that the judges have not based their decisions on any knowledge of the distribution of the ratings.

Scott's pi adjustment does not assume that all categories have an equal probability of being observed, but does assume that the distributions of the categories observed by the two judges are equal.

Cohen's kappa adjustment assumes neither that all categories have an equal probability of being observed nor that the distribution of the categories is the same for the two judges. Instead, it takes the differential tendencies of the judges into account when computing the chance factor.
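For illustration, the following sketch (not Simstat's own code) computes the uncorrected proportion of agreement and all three nominal corrections from a pair of rating lists. The function and variable names are hypothetical; only the underlying formulas come from the descriptions above.

    from collections import Counter

    def agreement_coefficients(rater1, rater2, categories):
        n = len(rater1)
        # Observed agreement: proportion of concordant ratings.
        p_o = sum(a == b for a, b in zip(rater1, rater2)) / n

        c1, c2 = Counter(rater1), Counter(rater2)

        # Free marginal: every category assumed equally likely.
        p_e_free = 1 / len(categories)

        # Scott's pi: both judges assumed to share a single distribution,
        # estimated here by pooling their ratings.
        p_e_scott = sum(((c1[c] + c2[c]) / (2 * n)) ** 2 for c in categories)

        # Cohen's kappa: each judge keeps an individual distribution.
        p_e_cohen = sum((c1[c] / n) * (c2[c] / n) for c in categories)

        adjust = lambda p_e: (p_o - p_e) / (1 - p_e)
        return {
            "percent agreement": p_o,
            "free marginal": adjust(p_e_free),
            "Scott's pi": adjust(p_e_scott),
            "Cohen's kappa": adjust(p_e_cohen),
        }

    # Example: two judges classify 10 items into 3 categories.
    r1 = ["a", "a", "b", "b", "c", "a", "b", "c", "a", "b"]
    r2 = ["a", "b", "b", "b", "c", "a", "a", "c", "a", "c"]
    print(agreement_coefficients(r1, r2, ["a", "b", "c"]))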
SIMSTAT also offers three adjustments for ordinal level variables. These are similar to the previous measures except that they also take into account the ordinal nature of the scales by adjusting the weights assigned to the various levels of agreement. They apply the same three models of chance agreement used in the measures for nominal data:

Free marginal adjustment for ordinal level variables likewise assumes that all categories on a given scale have an equal probability of being observed.

Krippendorff's R-bar adjustment is the ordinal extension of Scott's pi and assumes that the distributions of the categories are equal for the two sets of ratings.

Krippendorff's r adjustment is the ordinal extension of Cohen's kappa in that it adjusts for the differential tendencies of the judges in the computation of the chance factor.
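The sketch below shows, in the same hypothetical style, how ordinal weights enter the adjustment. It uses linear disagreement weights in a weighted-kappa-style computation with Cohen-style marginals; it illustrates the mechanism only and is not claimed to reproduce Simstat's exact Krippendorff formulas.

    from collections import Counter

    def weighted_agreement(rater1, rater2, levels):
        """Chance-corrected agreement with partial credit for near misses."""
        n = len(rater1)
        k = len(levels)
        rank = {level: i for i, level in enumerate(levels)}
        # Weight 1 for exact agreement, shading to 0 at maximal disagreement.
        w = lambda a, b: 1 - abs(rank[a] - rank[b]) / (k - 1)

        # Observed weighted agreement.
        p_o = sum(w(a, b) for a, b in zip(rater1, rater2)) / n

        # Chance weighted agreement from the judges' own marginals
        # (the Cohen-style model; a free-marginal variant would use a
        # uniform 1/k probability for every category instead).
        c1, c2 = Counter(rater1), Counter(rater2)
        p_e = sum(
            (c1[a] / n) * (c2[b] / n) * w(a, b)
            for a in levels for b in levels
        )
        return (p_o - p_e) / (1 - p_e)

    # Example: a 4-point ordinal scale.
    r1 = [1, 2, 2, 3, 4, 1, 3, 2]
    r2 = [1, 2, 3, 3, 4, 2, 4, 2]
    print(weighted_agreement(r1, r2, [1, 2, 3, 4]))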
NOTE: The expected frequencies displayed in the inter-rater agreement tables do not necessarily correspond to the expected frequencies used by the above correction techniques. Rather, they are the values used in the computation of the chi-square statistics for contingency tables. These values do, however, coincide with those used in the computation of Cohen's kappa and Krippendorff's r.
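A small worked check of this note, with made-up table values: the diagonal of the chi-square expected-frequency table, divided by the total count, equals the chance-agreement term of Cohen's kappa, since both are built from the same products of marginal totals.

    crosstab = [
        [20,  5,  2],   # rows: judge 1's categories
        [ 4, 15,  6],   # columns: judge 2's categories
        [ 1,  7, 10],
    ]
    n = sum(sum(row) for row in crosstab)
    row_totals = [sum(row) for row in crosstab]
    col_totals = [sum(col) for col in zip(*crosstab)]

    # Chi-square expected frequency for cell (i, j): row_i * col_j / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Cohen's kappa chance agreement from the marginals...
    p_e_kappa = sum((r / n) * (c / n) for r, c in zip(row_totals, col_totals))
    # ...equals the diagonal of the expected table divided by n.
    p_e_diag = sum(expected[i][i] for i in range(len(crosstab))) / n
    print(p_e_kappa, p_e_diag)  # identical values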