## Loading required package: psych
As a first step in analyzing the data collected through the Survey Gizmo online platform (n = 42), we needed to clean up the data. To achieve this, I followed a number of steps.
With this clean database, I ran some statistical analyses to find interesting trends.
Fleiss' Kappa assesses the reliability of agreement between a fixed number of raters. In our case, each rater was exposed to only a fraction of the total number of pairs, hence our incomplete database. I wrote fleiss.EDM.R to pick, at random, the minimum number of ratings for every pair and compute the Kappa statistic. Because the selection is random, the procedure is repeated for a user-specified number of iterations. The function returns descriptive statistics of the Kappa values obtained across the iterations.
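The subsampling idea described above can be sketched in base R. This is a minimal illustration, not the contents of fleiss.EDM.R: the names `fleiss_kappa` and `subsample_kappa`, the list-of-vectors input format, and the toy data are all hypothetical. It also assumes that "the minimum number of ratings" means the smallest number of ratings any pair received, and it computes Kappa by hand rather than with the irr package so that the block has no dependencies.

```r
# Fleiss' kappa for a complete matrix: rows = pairs, columns = ratings.
fleiss_kappa <- function(ratings) {
  ratings <- as.matrix(ratings)
  n <- ncol(ratings)                              # ratings per pair
  cats <- sort(unique(as.vector(ratings)))        # observed categories
  # counts[i, j]: how many ratings of pair i fall in category j
  counts <- do.call(rbind, lapply(seq_len(nrow(ratings)), function(i) {
    tabulate(match(ratings[i, ], cats), nbins = length(cats))
  }))
  p_j <- colSums(counts) / (nrow(ratings) * n)    # category proportions
  P_i <- (rowSums(counts^2) - n) / (n * (n - 1))  # per-pair agreement
  P_e <- sum(p_j^2)                               # chance agreement
  (mean(P_i) - P_e) / (1 - P_e)
}

# Hypothetical helper: repeatedly draw the same number of ratings per pair.
# ratings_by_pair is a list with one numeric vector of ratings per pair.
subsample_kappa <- function(ratings_by_pair, iterations = 200) {
  m <- min(lengths(ratings_by_pair))  # assumed meaning of "minimum number"
  stopifnot(m >= 2)                   # kappa needs at least two ratings
  replicate(iterations, {
    sub <- do.call(rbind, lapply(ratings_by_pair, function(r) {
      r[sample.int(length(r), m)]     # m ratings, without replacement
    }))
    fleiss_kappa(sub)
  })
}

# Toy data (invented): 30 pairs, each with 5 to 9 ratings on a 4-point scale.
set.seed(1)
toy <- lapply(1:30, function(i) {
  sample(1:4, size = sample(5:9, 1), replace = TRUE)
})
kappas <- subsample_kappa(toy, iterations = 50)
summary(kappas)
```

Summarizing the vector of Kappa values, as `summary()` does here, plays the same role as the descriptive statistics reported for each run.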
# The current raw dataset was loaded as "rtim"
# The variable "conf.1.rm" stands for "confidence level 1: remove"
# The variable "polarize" changes the data from 4-point base to 2-point base.
fexample <- fleiss.EDM(data=rtim, conf.1.rm=FALSE, polarize=FALSE, iterations=200)
## Loading required package: irr
## Loading required package: lpSolve
## vars n mean sd median trimmed mad min max range
## X0.144861983091651 1 200 0.15 0.01 0.15 0.15 0.01 0.12 0.19 0.07
## skew kurtosis se
## X0.144861983091651 0.14 -0.32 0
With the 4-point scale, the Kappa values across the 200 iterations had a mean of 0.15, a minimum of 0.12, and a maximum of 0.19.
# The current raw dataset was loaded as "rtim"
# The variable "conf.1.rm" stands for "confidence level 1: remove"
# The variable "polarize" changes the data from 4-point base to 2-point base.
fexample <- fleiss.EDM(data=rtim, conf.1.rm=FALSE, polarize=TRUE, iterations=200)
## vars n mean sd median trimmed mad min max range
## X0.129523009315755 1 200 0.13 0.01 0.13 0.13 0.01 0.09 0.17 0.08
## skew kurtosis se
## X0.129523009315755 -0.04 0.9 0
With the 2-point scale, the Kappa values across the 200 iterations had a mean of 0.13, a minimum of 0.09, and a maximum of 0.17, slightly lower than with the 4-point scale.
Another measure of the reliability of agreement between (and also within) raters is the Intraclass Correlation Coefficient (ICC). This measure stands apart from other agreement measures because of its ability to analyze exchangeable measurements, that is, measurements with no inherent ordering within a group. As a correlation-based measure, it can take systematic differences among observers into account. However, unlike most correlations, the ICC works not only with pairs but also with larger groups.
The function icc.EDM.R uses the same procedure as fleiss.EDM.R: ratings are chosen at random, and the computation of the ICC statistic is repeated as many times as deemed necessary.
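For reference, a one-way ICC for a complete pairs-by-ratings matrix can be computed from the between-pair and within-pair mean squares. The sketch below is an illustration under stated assumptions, not icc.EDM.R itself: `icc_oneway` and the toy matrix are hypothetical, and the one-way model is assumed because the `icc()` output elsewhere in this report lists "Model: oneway".

```r
# One-way, single-score ICC for a complete matrix:
# rows = rated pairs (subjects), columns = ratings.
icc_oneway <- function(ratings) {
  ratings <- as.matrix(ratings)
  n <- nrow(ratings)                   # number of pairs
  k <- ncol(ratings)                   # ratings per pair
  row_means <- rowMeans(ratings)
  # Between-pair and within-pair mean squares from a one-way ANOVA
  msb <- k * sum((row_means - mean(ratings))^2) / (n - 1)
  msw <- sum((ratings - row_means)^2) / (n * (k - 1))
  c(ICC = (msb - msw) / (msb + (k - 1) * msw),
    F   = msb / msw,
    df1 = n - 1,
    df2 = n * (k - 1))
}

# Toy matrix (invented): 2 pairs, 2 ratings each.
toy_icc <- icc_oneway(matrix(c(1, 3, 2, 4), nrow = 2))
toy_icc
```

Under this model the degrees of freedom are n - 1 and n(k - 1), which is the pattern to look for in the df1 and df2 columns of the summaries.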
# The current raw dataset was loaded as "rtim"
# The variable "conf.1.rm" stands for "confidence level 1: remove"
# The variable "polarize" changes the data from 4-point base to 2-point base.
icc.EDM(data=rtim, conf.1.rm=FALSE, polarize=FALSE, iterations=200)
## type ICC F df1
## Length:200 Min. :0.129 Min. :2.19 Min. :189
## Class :character 1st Qu.:0.148 1st Qu.:2.39 1st Qu.:189
## Mode :character Median :0.156 Median :2.48 Median :189
## Mean :0.157 Mean :2.49 Mean :189
## 3rd Qu.:0.165 3rd Qu.:2.59 3rd Qu.:189
## Max. :0.189 Max. :2.86 Max. :189
## df2 p lower bound upper bound
## Min. :1330 Min. :0.00e+00 Min. :0.0889 Min. :0.179
## 1st Qu.:1330 1st Qu.:0.00e+00 1st Qu.:0.1057 1st Qu.:0.200
## Median :1330 Median :0.00e+00 Median :0.1124 Median :0.208
## Mean :1330 Mean :1.78e-17 Mean :0.1132 Mean :0.209
## 3rd Qu.:1330 3rd Qu.:0.00e+00 3rd Qu.:0.1211 3rd Qu.:0.219
## Max. :1330 Max. :1.78e-15 Max. :0.1421 Max. :0.244
# The current raw dataset was loaded as "rtim"
# The variable "conf.1.rm" stands for "confidence level 1: remove"
# The variable "polarize" changes the data from 4-point base to 2-point base.
icc.EDM(data=rtim, conf.1.rm=FALSE, polarize=TRUE, iterations=200)
## type ICC F df1
## Length:200 Min. :0.106 Min. :1.95 Min. :189
## Class :character 1st Qu.:0.126 1st Qu.:2.16 1st Qu.:189
## Mode :character Median :0.134 Median :2.24 Median :189
## Mean :0.135 Mean :2.25 Mean :189
## 3rd Qu.:0.142 3rd Qu.:2.33 3rd Qu.:189
## Max. :0.163 Max. :2.56 Max. :189
## df2 p lower bound upper bound
## Min. :1330 Min. :0.0e+00 Min. :0.0679 Min. :0.152
## 1st Qu.:1330 1st Qu.:0.0e+00 1st Qu.:0.0862 1st Qu.:0.175
## Median :1330 Median :0.0e+00 Median :0.0932 Median :0.184
## Mean :1330 Mean :5.5e-13 Mean :0.0935 Mean :0.185
## 3rd Qu.:1330 3rd Qu.:6.0e-15 3rd Qu.:0.1005 3rd Qu.:0.193
## Max. :1330 Max. :2.0e-11 Max. :0.1187 Max. :0.216
Another part of the project was to investigate whether subjects could rate the same segment pairs consistently. We asked one participant to rate the same 18 segment pairs six times. The analysis was run on both the 4-point and the 2-point scales.
timbreWPC <- read.csv("timbreWPC.csv")
kappam.fleiss(ratings=t(timbreWPC[,seq(22, 57, 2)]))
## Fleiss' Kappa for m Raters
##
## Subjects = 18
## Raters = 6
## Kappa = 0.544
##
## z = 14.4
## p-value = 0
icc(ratings=t(timbreWPC[,seq(22, 57, 2)]))
## Single Score Intraclass Correlation
##
## Model: oneway
## Type : consistency
##
## Subjects = 18
## Raters = 6
## ICC(1) = 0.792
##
## F-Test, H0: r0 = 0 ; H1: r0 > 0
## F(17,90) = 23.8 , p = 3.88e-26
##
## 95%-Confidence Interval for ICC Population Values:
## 0.654 < ICC < 0.901
# Collapse the 4-point scale to 2 points: {1, 2} -> 1 and {3, 4} -> 2
timbreWPC[timbreWPC == 2] <- 1
timbreWPC[timbreWPC == 3 | timbreWPC == 4] <- 2
kappam.fleiss(ratings=t(timbreWPC[,seq(22, 57, 2)]))
## Fleiss' Kappa for m Raters
##
## Subjects = 18
## Raters = 6
## Kappa = 0.643
##
## z = 10.6
## p-value = 0
icc(ratings=t(timbreWPC[,seq(22, 57, 2)]))
## Single Score Intraclass Correlation
##
## Model: oneway
## Type : consistency
##
## Subjects = 18
## Raters = 6
## ICC(1) = 0.657
##
## F-Test, H0: r0 = 0 ; H1: r0 > 0
## F(17,90) = 12.5 , p = 5.56e-17
##
## 95%-Confidence Interval for ICC Population Values:
## 0.477 < ICC < 0.824
For the rhythm experiment, Thomas has already analyzed the data with Fleiss' Kappa. Here I use the ICC function to gain some new insights into it.
# The current raw dataset was loaded as "rrhy"
# The variable "conf.1.rm" stands for "confidence level 1: remove"
# The variable "polarize" changes the data from 4-point base to 2-point base.
icc.EDM(data=rrhy, conf.1.rm=FALSE, polarize=FALSE, iterations=200)
## type ICC F df1
## Length:200 Min. :0.252 Min. :4.37 Min. :189
## Class :character 1st Qu.:0.280 1st Qu.:4.88 1st Qu.:189
## Mode :character Median :0.294 Median :5.16 Median :189
## Mean :0.294 Mean :5.18 Mean :189
## 3rd Qu.:0.307 3rd Qu.:5.43 3rd Qu.:189
## Max. :0.343 Max. :6.23 Max. :189
## df2 p lower bound upper bound
## Min. :1710 Min. :0 Min. :0.204 Min. :0.308
## 1st Qu.:1710 1st Qu.:0 1st Qu.:0.230 1st Qu.:0.338
## Median :1710 Median :0 Median :0.243 Median :0.353
## Mean :1710 Mean :0 Mean :0.243 Mean :0.353
## 3rd Qu.:1710 3rd Qu.:0 3rd Qu.:0.255 3rd Qu.:0.366
## Max. :1710 Max. :0 Max. :0.290 Max. :0.404
# The current raw dataset was loaded as "rrhy"
# The variable "conf.1.rm" stands for "confidence level 1: remove"
# The variable "polarize" changes the data from 4-point base to 2-point base.
icc.EDM(data=rrhy, conf.1.rm=FALSE, polarize=TRUE, iterations=200)
## type ICC F df1
## Length:200 Min. :0.252 Min. :4.37 Min. :189
## Class :character 1st Qu.:0.285 1st Qu.:4.98 1st Qu.:189
## Mode :character Median :0.297 Median :5.23 Median :189
## Mean :0.298 Mean :5.27 Mean :189
## 3rd Qu.:0.311 3rd Qu.:5.51 3rd Qu.:189
## Max. :0.347 Max. :6.32 Max. :189
## df2 p lower bound upper bound
## Min. :1710 Min. :0 Min. :0.204 Min. :0.308
## 1st Qu.:1710 1st Qu.:0 1st Qu.:0.234 1st Qu.:0.343
## Median :1710 Median :0 Median :0.246 Median :0.356
## Mean :1710 Mean :0 Mean :0.247 Mean :0.357
## 3rd Qu.:1710 3rd Qu.:0 3rd Qu.:0.259 3rd Qu.:0.370
## Max. :1710 Max. :0 Max. :0.294 Max. :0.408
As we can see, rhythm similarity looks more promising: the mean ICC is around 0.30 on both the 4-point and the 2-point scales, compared with 0.157 and 0.135 for the rtim dataset, indicating a positive correlation among the ratings.