This document provides an overview of methodological issues related to the assessment of inter-rater reliability (IRR), including aspects of study design, the selection and computation of appropriate IRR statistics, and the interpretation and reporting of results. Computational examples include SPSS and R syntax for calculating Cohen's kappa for nominal variables and intraclass correlations (ICCs) for ordinal, interval, and ratio variables. Although it is beyond the scope of this document to provide a comprehensive review of the many available IRR statistics, references are provided to other IRR statistics suited to designs not covered in this tutorial. As with Cohen's kappa, SPSS and R require the data to be structured with separate variables for each coder of interest, as shown for the empathy rating variable in Table 5. If multiple variables were rated for each subject, each variable for each coder would be entered in a new column in Table 5, and ICCs would be computed in separate analyses for each variable.

Measurement error (E) prevents direct observation of a subject's true score and can be introduced by several factors. For example, measurement error may be caused by ambiguous, vague, or otherwise poorly worded items within an instrument (i.e., internal consistency issues), by instability of the measuring instrument in assessing the same subject over time (i.e., test-retest reliability issues), and by instability of the measuring instrument across measurements made by different coders (i.e., IRR issues). Each of these issues can adversely affect reliability, and it is the last of them that is the focus of this document.

SPSS and the R irr package require users to specify a one-way or two-way model, an absolute-agreement or consistency type, and single or average measures units. The design of the hypothetical study informs the correct choice of ICC variant. Note that although SPSS, but not the R irr package, allows the user to specify random or mixed effects, the computations and results are identical for random and mixed effects. In this hypothetical study, all subjects were rated by all coders, meaning that the researcher should likely use a two-way ICC model, because the design is fully crossed, and an average-measures ICC unit, because the researcher is most likely interested in the reliability of the mean ratings provided by all coders. The researcher is interested in assessing the degree to which the coders' ratings correspond, such that higher ratings from one coder correspond to higher ratings from another coder, but not in the degree to which the coders agreed on the absolute values of their ratings, which warrants a consistency type of ICC.
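As a rough illustration of the two-way, consistency-type ICC described above, the following Python sketch computes both the single-measures and average-measures variants (often labeled ICC(C,1) and ICC(C,k)) from a small subjects-by-coders matrix. The ratings below are invented for illustration and are not the Table 5 data; in practice one would use SPSS or the R irr package on the actual study data.

```python
# Two-way consistency ICCs computed by hand from a two-way ANOVA decomposition.
# Rows are subjects, columns are coders; the ratings matrix is hypothetical.

def icc_consistency(ratings):
    n = len(ratings)          # number of subjects (rows)
    k = len(ratings[0])       # number of coders (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between coders
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc_c1 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)  # single measures
    icc_ck = (ms_rows - ms_error) / ms_rows                         # average measures
    return icc_c1, icc_ck

# Hypothetical ratings: 4 subjects rated by 3 coders on a 1-7 scale.
ratings = [[4, 5, 5],
           [2, 3, 2],
           [5, 6, 6],
           [1, 2, 3]]
single, average = icc_consistency(ratings)
print(round(single, 3), round(average, 3))  # → 0.933 0.977
```

Because the consistency type ignores the between-coders variance (ss_cols is removed from the residual), a coder who is uniformly more lenient than the others does not lower the estimate, which matches the researcher's interest in rank correspondence rather than absolute agreement.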

The coders were not randomly selected, so the researcher is interested in how well the coders agreed in their ratings within the current study, but not in generalizing these ratings to a larger population of coders, which warrants a mixed-effects model. The data presented in Table 5 are in their final form and will not be processed further, so these are the variables on which the IRR analysis should be performed.
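For the nominal-variable case mentioned earlier, Cohen's kappa works on the same column-per-coder data layout. The following Python sketch computes kappa from first principles for two hypothetical coders; the codes below are invented for illustration, and in practice the SPSS CROSSTABS procedure or the R irr package would be applied to the actual study data.

```python
# Cohen's kappa for two coders' nominal codes:
#   kappa = (p_o - p_e) / (1 - p_e),
# where p_o is the observed proportion of agreement and p_e is the agreement
# expected by chance, derived from each coder's marginal category rates.
from collections import Counter

def cohens_kappa(coder1, coder2):
    n = len(coder1)
    p_o = sum(a == b for a, b in zip(coder1, coder2)) / n     # observed agreement
    c1, c2 = Counter(coder1), Counter(coder2)
    categories = set(coder1) | set(coder2)
    p_e = sum((c1[c] / n) * (c2[c] / n) for c in categories)  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical nominal codes (e.g., 1 = low, 2 = medium, 3 = high empathy).
coder_a = [1, 2, 3, 3, 2, 1, 1, 2, 3, 2]
coder_b = [1, 2, 3, 3, 2, 2, 1, 2, 3, 3]
print(round(cohens_kappa(coder_a, coder_b), 3))  # → 0.697
```

Here the coders agree on 8 of 10 subjects (p_o = 0.80) while chance alone would produce p_e = 0.34, so kappa credits only the agreement beyond chance.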