The kappa coefficient is a statistic that measures the agreement between raters. It is commonly used in mental health and psychosocial studies. The kappa coefficient can be used for scales with more than two categories, and its range of possible values is from −1 to 1.
Cohen’s kappa is a quantitative measure of reliability for two raters who are rating the same thing, correcting for how often the raters may agree by chance.

In other words, a model will have a high kappa score when there is a large difference between its observed accuracy and the accuracy expected by chance (the null error rate).
kappa = (observed agreement − expected agreement) / (1 − expected agreement)
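The formula above can be sketched in Python. This is a minimal illustration, not a reference implementation; the function name `cohens_kappa` and the example labels are my own. Observed agreement is the fraction of items on which the raters give the same label, and expected agreement is computed from each rater's marginal label frequencies.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Compute Cohen's kappa for two equal-length sequences of labels."""
    assert len(rater1) == len(rater2) and len(rater1) > 0
    n = len(rater1)
    labels = set(rater1) | set(rater2)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected agreement by chance, from each rater's marginal frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum((c1[lab] / n) * (c2[lab] / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)

# Perfect agreement yields kappa = 1; chance-level agreement yields kappa = 0.
print(cohens_kappa(["a", "b", "a", "b"], ["a", "b", "a", "b"]))  # 1.0
print(cohens_kappa([1, 1, 0, 0], [1, 0, 1, 0]))                  # 0.0
```

Note that the denominator `1 - p_e` is zero when the raters' marginal distributions force certain agreement; a production implementation (e.g. `sklearn.metrics.cohen_kappa_score`) handles such edge cases.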