Analysis of rating values

RQ: Do domain experts differ in the rating values they assign?

Results:

Setup

This dataset contains one record per rating, and each record carries attributes of both the user and the rating. We remove validation ratings, which ask the user to rate the same concept pair a second time.


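options(width = 120)
# stringsAsFactors = TRUE keeps r_condition a factor under R >= 4.0,
# where read.table() returns character columns and relevel() would fail
ratings <- read.table("dat/ratings.tsv", header = TRUE, sep = "\t", stringsAsFactors = TRUE)
# drop validation ratings (repeated concept pairs)
ratings <- ratings[ratings$r_condition != "validation", ]
ratings$r_id <- factor(ratings$r_id)
ratings$r_condition <- relevel(ratings$r_condition, ref = "mturk")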

For now, we remove non-scholars who are not Turkers (the general condition). There are too few of them to report meaningful effects:

ratings <- ratings[ratings$r_condition != "general", ]
ratings$r_condition <- droplevels(ratings$r_condition)  # drop the now-empty "general" level

We add an attribute (r_resid) holding each rating's residual after controlling for question-specific and user-specific effects (r_id and u_email), and a flag (r_common) indicating whether the rated concept pair is general common knowledge or field-specific.

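# residual rating after removing question (r_id) and user (u_email) effects;
# na.exclude keeps the residual vector aligned with the rows of ratings.
# Note: computed before ratings of unknown phrases (r_rating <= 0) are removed below.
ratings$r_resid <- resid(lm(r_rating ~ r_id + u_email, data = ratings, na.action = na.exclude))
ratings$r_common <- factor(ifelse(ratings$r_field == "general", "general", "specific"))
ratings$r_common <- relevel(ratings$r_common, ref = "general")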
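To make concrete what r_resid contains, here is a minimal sketch on a small hypothetical data frame (toy values, not the study data). With r_id and u_email entered as factors, least-squares residuals sum to zero within every question and within every user, so each residual measures how far a rating departs from what that question and that user predict on average.

```r
# Hypothetical toy data: 3 questions x 4 users, one rating each
toy <- expand.grid(r_id = factor(1:3), u_email = factor(c("a", "b", "c", "d")))
toy$r_rating <- c(3, 4, 5,  2, 4, 4,  1, 3, 5,  4, 5, 5)

# Same additive model as the setup code: question effect + user effect
fit <- lm(r_rating ~ r_id + u_email, data = toy, na.action = na.exclude)
toy$r_resid <- resid(fit)

# Residuals sum to (numerically) zero within each question and each user
print(round(tapply(toy$r_resid, toy$r_id, sum), 10))
print(round(tapply(toy$r_resid, toy$u_email, sum), 10))
```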

The total number of ratings is 8063 from 110 users.

ratings <- ratings[ratings$r_rating > 0, ]  # remove ratings for unknown phrases

The total number of defined ratings is 7947.
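Counts like the ones above come from nrow() and unique() on u_email. A minimal sketch with a hypothetical helper, count_ratings(), applied to an invented toy data frame before and after the r_rating > 0 filter (here a rating of 0 stands in for an unknown phrase):

```r
# Hypothetical helper: number of ratings and distinct users in a data frame
count_ratings <- function(df) {
  c(ratings = nrow(df), users = length(unique(df$u_email)))
}

# Invented toy data; the 0 rating plays the role of an unknown phrase
toy <- data.frame(
  u_email  = c("a", "a", "b", "c", "c"),
  r_rating = c(3, 0, 4, 5, 2)
)
print(count_ratings(toy))                      # before the filter
print(count_ratings(toy[toy$r_rating > 0, ]))  # after keeping only defined ratings
```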

Do domain experts rate differently than non-experts?

We perform an ANOVA on condition (Turker / scholar-in / scholar-out), including the question ID (r_id) as a factor to control for each question's average rating.
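Including r_id as a factor is what controls for question averages: a model with r_id alone reproduces each question's mean rating exactly. A small sketch on hypothetical toy data (invented values) illustrating this before fitting the same two-factor formula:

```r
# Hypothetical toy data: 2 questions, each rated under all 3 conditions
toy <- data.frame(
  r_id        = factor(c(1, 1, 1, 2, 2, 2)),
  r_condition = factor(c("mturk", "scholar-in", "scholar-out",
                         "mturk", "scholar-in", "scholar-out")),
  r_rating    = c(2, 4, 3, 3, 5, 5)
)

# A model with only r_id fits each question's mean rating
q_only <- lm(r_rating ~ r_id, data = toy)
question_means <- ave(toy$r_rating, toy$r_id)
print(all.equal(unname(fitted(q_only)), question_means))

# The two-factor formula used in this section: condition plus question
fit <- aov(r_rating ~ r_condition + r_id, data = toy)
print(summary(fit))
```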

fit <- aov(formula = r_rating ~ r_condition + r_id, data = ratings)
summary(fit)
##               Df Sum Sq Mean Sq F value Pr(>F)    
## r_condition    2    390   194.8   218.6 <2e-16 ***
## r_id         199   6442    32.4    36.3 <2e-16 ***
## Residuals   7745   6900     0.9                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(fit, "r_condition")
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = r_rating ~ r_condition + r_id, data = ratings)
## 
## $r_condition
##                           diff     lwr     upr p adj
## scholar-in-mturk        0.6211  0.5462  0.6960     0
## scholar-out-mturk       0.3653  0.3100  0.4206     0
## scholar-out-scholar-in -0.2558 -0.3266 -0.1850     0

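Overall, it appears that scholar-in ratings are about 0.26 points higher than scholar-out ratings and 0.62 points higher than Turker ratings, while scholar-out ratings are about 0.37 points higher than Turker ratings. All three pairwise differences are significant after Tukey adjustment.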
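summary() on an aov fit reports sequential (Type I) sums of squares, so the r_condition row is computed before r_id enters the model; this is why r_condition has the same sum of squares (390) in both ANOVA tables in this section. One way to test condition with question already in the model is drop1() with test = "F", which compares the full model against the model without each term. A sketch on hypothetical, deliberately unbalanced toy data:

```r
set.seed(1)
# Hypothetical toy data: 10 questions, each rated once under 3 conditions
toy <- expand.grid(
  r_condition = factor(c("mturk", "scholar-in", "scholar-out")),
  r_id        = factor(1:10)
)
# Drop a few rows so the design is unbalanced
toy <- toy[-c(1, 5, 9), ]
# Invented ratings: scholar-in about half a point higher, plus noise
toy$r_rating <- round(3 + 0.5 * (toy$r_condition == "scholar-in") + rnorm(nrow(toy)), 1)

fit <- aov(r_rating ~ r_condition + r_id, data = toy)
print(summary(fit))           # sequential: r_condition tested before r_id enters
res <- drop1(fit, test = "F")
print(res)                    # each term tested with all other terms in the model
```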

Is this a constant effect, independent of question?

This might not be particularly interesting, though. If the effect is constant, we may simply be able to “average adjust” each subject's ratings to get comparable scores.
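Under a constant-offset assumption, one simple reading of “average adjust” is to center each subject's ratings on that subject's mean. A minimal sketch with a hypothetical helper, average_adjust(), and invented toy data; this is an illustration, not necessarily the procedure the authors had in mind, and it is only appropriate if the offset is the same for every kind of question.

```r
# Hypothetical helper: subtract each user's mean rating from that user's ratings
average_adjust <- function(df) {
  df$r_adjusted <- df$r_rating - ave(df$r_rating, df$u_email)
  df
}

# Invented toy data: user b rates every question one point higher than user a
toy <- data.frame(
  u_email  = c("a", "a", "a", "b", "b", "b"),
  r_id     = factor(c(1, 2, 3, 1, 2, 3)),
  r_rating = c(2, 3, 4, 3, 4, 5)
)
adj <- average_adjust(toy)
print(adj)
```

After adjustment both users have identical scores on every question, which is exactly the situation in which average adjustment makes ratings comparable.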

We can't do this if the effects differ between common-knowledge and domain-specific questions. Do they?

Yes. The following analysis shows a significant r_condition:r_common interaction (F = 23.10, p = 1.6e-06):

fit <- aov(formula = r_rating ~ r_condition + r_common + r_condition:r_common + u_email + r_id, data = ratings)

summary(fit)
##                        Df Sum Sq Mean Sq F value  Pr(>F)    
## r_condition             2    390   194.8  280.68 < 2e-16 ***
## r_common                1      2     1.8    2.62    0.11    
## u_email               108   1703    15.8   22.73 < 2e-16 ***
## r_id                  198   6323    31.9   46.02 < 2e-16 ***
## r_condition:r_common    1     16    16.0   23.10 1.6e-06 ***
## Residuals            7636   5299     0.7                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(fit, "r_condition:r_common")
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = r_rating ~ r_condition + r_common + r_condition:r_common + u_email + r_id, data = ratings)
## 
## $`r_condition:r_common`
##                                             diff      lwr      upr  p adj
## scholar-in:general-mturk:general              NA       NA       NA     NA
## scholar-out:general-mturk:general         0.1586  0.02223  0.29489 0.0118
## mturk:specific-mturk:general             -0.2039 -0.32539 -0.08232 0.0000
## scholar-in:specific-mturk:general         0.4523  0.32357  0.58112 0.0000
## scholar-out:specific-mturk:general        0.2075  0.08889  0.32601 0.0000
## scholar-out:general-scholar-in:general        NA       NA       NA     NA
## mturk:specific-scholar-in:general             NA       NA       NA     NA
## scholar-in:specific-scholar-in:general        NA       NA       NA     NA
## scholar-out:specific-scholar-in:general       NA       NA       NA     NA
## mturk:specific-scholar-out:general       -0.3624 -0.45674 -0.26808 0.0000
## scholar-in:specific-scholar-out:general   0.2938  0.19030  0.39727 0.0000
## scholar-out:specific-scholar-out:general  0.0489 -0.04156  0.13936 0.6377
## scholar-in:specific-mturk:specific        0.6562  0.57316  0.73923 0.0000
## scholar-out:specific-mturk:specific       0.4113  0.34521  0.47740 0.0000
## scholar-out:specific-scholar-in:specific -0.2449 -0.32350 -0.16627 0.0000
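Every Tukey comparison involving scholar-in:general is NA, which suggests that no scholar-in rating falls on a general question, so that cell mean cannot be estimated. Cross-tabulating r_condition by r_common would confirm this. A sketch on hypothetical toy data that has the same empty cell:

```r
# Hypothetical toy data: scholar-in ratings occur only on specific questions
toy <- data.frame(
  r_condition = factor(c("mturk", "mturk", "scholar-in", "scholar-out", "scholar-out"),
                       levels = c("mturk", "scholar-in", "scholar-out")),
  r_common    = factor(c("general", "specific", "specific", "general", "specific"),
                       levels = c("general", "specific")),
  r_rating    = c(3, 2, 5, 4, 4)
)

# Number of ratings in each condition x commonality cell (0 marks an empty cell)
counts <- table(toy$r_condition, toy$r_common)
print(counts)

# Mean rating per cell; empty cells come back as NA
cell_means <- tapply(toy$r_rating, list(toy$r_condition, toy$r_common), mean)
print(cell_means)
```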