Next, we will compare mean book ratings based on where the author is from, posing the question: does an author's birthplace have an effect on the rating a book receives? There are a ton of different birthplaces in this dataset, so we will isolate a few. I've chosen the United States, Canada, and the United Kingdom. These three were selected because they seemed to appear most often when I looked through the data.
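$$ H_0:\text{Mean Book rating for authors from US, CAN, and UK is equal}\\ H_A:\text{Mean Book rating for authors from US, CAN, and UK is not equal}
$$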
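The choice of countries can be checked with a count instead of by eye. This is a minimal sketch, assuming the data frame has a `birthplace` column as in the code that follows; the `toy` frame and its values are invented for illustration and are not the real dataset.

```r
# Toy stand-in for the real data frame (values are invented for illustration).
toy <- data.frame(
  birthplace = c(rep("United States", 4), rep("United Kingdom", 3),
                 rep("Canada", 2), "France"),
  stringsAsFactors = FALSE
)

# Count rows per birthplace and put the most frequent ones first.
birthplace_counts <- sort(table(toy$birthplace), decreasing = TRUE)

# Names of the three most common birthplaces.
top3 <- names(birthplace_counts)[1:3]
print(birthplace_counts)
print(top3)
```

On the real data, the same two lines applied to `data$birthplace` would show whether these three countries really are the most common.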
Birthplace <- data[which(data$birthplace %in% c("United States", "Canada", "United Kingdom")),]
data$author_average_rating[is.na(data$author_average_rating)]<- 0
data$book_average_rating[is.na(data$book_average_rating)]<- 0
data[which(data$author_average_rating > data$book_average_rating),"Rating"]= "Less"
data[which(data$author_average_rating < data$book_average_rating),"Rating"]= "More"
data[which(data$author_average_rating == data$book_average_rating),"Rating"]= "Equal"
table(data$Rating)
##
## Equal Less More
## 1976 10733 10182
Let's add an indicator for female authors; we will use this later.
data[which(data$genre_1 %in% "female"), "Female"]= TRUE
data[which(!(data$genre_1 %in% "female")), "Female"]= FALSE
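The same indicator can be built in one vectorized step, since `%in%` already returns `TRUE` or `FALSE` for every row (and `FALSE` for missing values). This is a small sketch on an invented `toy` frame, assuming as above that the label lives in `genre_1`.

```r
# Toy stand-in for the real data frame (values are invented for illustration).
toy <- data.frame(
  genre_1 = c("female", "male", "female", NA),
  stringsAsFactors = FALSE
)

# %in% yields a logical vector with no NAs, so one assignment covers all rows.
toy$Female <- toy$genre_1 %in% "female"
print(toy)
```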
Next, I will run a one-way ANOVA to see how mean book rating differs based on where the author is from.
model <- aov(data$book_average_rating ~ data$birthplace, data=Birthplace)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## data$birthplace 434 88.5 0.20383 2.469 <2e-16 ***
## Residuals 22456 1854.1 0.08257
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
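Because my p-value is very small (< 2e-16), I am able to reject the null hypothesis of equal mean book ratings in favor of the alternative hypothesis. Note, however, that the birthplace term has 434 degrees of freedom rather than 2. The formula refers to `data$book_average_rating` and `data$birthplace` directly, so the model was fit to all 22,891 rows and all 435 birthplaces in `data`, not to the three-country `Birthplace` subset. The result therefore says that mean book rating differs across birthplaces in general; on its own it does not show a difference among the US, CAN, and UK.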
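To test the three-country question directly, the model can be fit to the subset using bare column names, because a formula written with `data$...` is evaluated against the full `data` frame even when `data = Birthplace` is supplied. The sketch below assumes the column names used above; the `toy` data frame and its values are invented for illustration, and `TukeyHSD` is included only as one possible follow-up for seeing which pairs of countries differ.

```r
set.seed(1)  # reproducible toy data; all values below are invented

# Toy stand-in for the real data, including one country outside the subset.
toy <- data.frame(
  birthplace = rep(c("United States", "Canada", "United Kingdom", "France"),
                   each = 10),
  book_average_rating = round(runif(40, min = 3, max = 5), 2),
  stringsAsFactors = FALSE
)

# Keep only the three countries of interest.
subset3 <- toy[toy$birthplace %in%
                 c("United States", "Canada", "United Kingdom"), ]

# Convert to a factor so only the three levels present are modelled.
subset3$birthplace <- factor(subset3$birthplace)

# Bare column names are looked up in `data = subset3`.
model_sub <- aov(book_average_rating ~ birthplace, data = subset3)
print(summary(model_sub))

# Degrees of freedom for the birthplace term: number of groups minus one.
df_birthplace <- summary(model_sub)[[1]][["Df"]][1]
print(df_birthplace)

# Pairwise comparisons between the three countries.
tukey <- TukeyHSD(model_sub)
print(tukey)
```

With three groups the birthplace term should show 2 degrees of freedom, which is a quick check that the subset was actually used.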
plot(model,1)
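In the residuals vs. fitted plot above, the red line runs through the middle of the points and the spread of the residuals looks similar across most fitted values, which suggests the equal-variance condition is roughly met.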
plot(model,2)
## Warning: not plotting observations with leverage one:
## 1462, 1780, 2088, 2098, 2360, 2385, 2648, 3728, 4345, 4950, 4995, 5452, 5605, 6234, 7384, 7817, 7843, 7983, 8033, 8410, 8909, 9089, 9433, 9461, 9502, 9845, 9961, 9963, 10210, 10271, 10335, 10769, 11067, 11118, 11547, 12172, 12469, 12654, 12721, 12848, 12934, 13015, 13573, 14295, 14337, 14365, 14390, 14591, 14691, 14868, 15129, 15139, 15184, 15278, 15580, 15586, 15794, 15945, 15974, 16257, 16434, 16594, 16743, 16924, 16962, 17058, 17171, 17371, 17372, 17390, 17431, 17680, 17745, 17801, 17836, 18086, 18280, 18301, 18520, 18590, 18865, 19063, 19066, 19215, 19220, 19247, 19268, 19375, 19639, 19652, 19677, 19719, 19726, 19727, 19734, 19899, 19995, 20104, 20206, 20266, 20315, 20355, 20466, 20474, 20543, 20648, 20714, 20851, 20939, 21070, 21089, 21396, 21561, 21657, 21776, 21812, 21835, 21858, 21879, 21990, 22005, 22069, 22185, 22189, 22310, 22348, 22443, 22808, 22827, 22835
The QQ plot above shows an almost straight line! This suggests the residuals are approximately normal, so the normality condition for ANOVA is reasonably met and the F-test above can be trusted.
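The visual checks can be backed up with formal tests. This is a hedged sketch using base R's `bartlett.test` (equal variances across groups) and `shapiro.test` (normality of residuals) on an invented `toy` data frame. Note that `shapiro.test` accepts at most 5000 values, so on a dataset of this size it would have to be run on a random sample of the residuals.

```r
set.seed(2)  # reproducible toy data; all values below are invented

toy <- data.frame(
  birthplace = factor(rep(c("United States", "Canada", "United Kingdom"),
                          each = 30)),
  book_average_rating = rnorm(90, mean = 4, sd = 0.3)
)
model_toy <- aov(book_average_rating ~ birthplace, data = toy)

# Bartlett's test: the null hypothesis is equal variances across groups.
var_test <- bartlett.test(book_average_rating ~ birthplace, data = toy)

# Shapiro-Wilk test on the residuals: the null hypothesis is normality.
norm_test <- shapiro.test(residuals(model_toy))

print(var_test$p.value)
print(norm_test$p.value)
```

A large p-value in either test means there is no evidence against that condition; with very large samples, even small departures can produce small p-values, so the plots remain useful.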
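For this two-way ANOVA we will be answering the following three hypothesis tests:
$$ H_{01}:\text{Mean Book rating for authors from US, CAN, and UK is equal}\\
H_{A1}:\text{Mean Book rating for authors from US, CAN, and UK is not equal}
$$
$$ H_{02}:\text{Mean Book rating for authors of each gender is equal}\\
H_{A2}:\text{Mean Book rating for authors of each gender is not equal} $$
$$ H_{03}:\text{Mean Book rating for authors of opposite gender from US, CAN, and UK is equal}\\ H_{A3}:\text{Mean Book rating for authors of opposite gender from US, CAN, and UK is not equal} $$
Okay, let's get some results! Maybe.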
model2 <- aov(data$book_average_rating ~ data$birthplace*data$author_gender, data=Birthplace)
summary(model2)
## Df Sum Sq Mean Sq F value Pr(>F)
## data$birthplace 434 88.5 0.2038 2.488 < 2e-16 ***
## data$author_gender 1 1.3 1.2840 15.674 7.55e-05 ***
## data$birthplace:data$author_gender 149 25.6 0.1715 2.094 1.86e-13 ***
## Residuals 22306 1827.3 0.0819
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
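We can reject all three null hypotheses because all p-values are small (< 2e-16 for birthplace, 7.55e-05 for author gender, and 1.86e-13 for their interaction). The 434 degrees of freedom for birthplace show that this model also uses every birthplace in `data` rather than only the US, CAN, and UK, so these conclusions apply to birthplaces in general.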
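A two-way model restricted to the three countries can be sketched the same way, with bare column names so that `data =` controls which rows are used. The column names follow the code above; the `toy` frame, its values, and the helper column `rep_id` are invented for illustration.

```r
set.seed(3)  # reproducible toy data; all values below are invented

# Balanced toy design: 3 birthplaces x 2 genders x 5 books per cell.
toy <- expand.grid(
  birthplace = c("United States", "Canada", "United Kingdom"),
  author_gender = c("female", "male"),
  rep_id = 1:5,
  stringsAsFactors = TRUE
)
toy$book_average_rating <- round(runif(nrow(toy), min = 3, max = 5), 2)

# birthplace * author_gender expands to both main effects plus the interaction.
model2_sub <- aov(book_average_rating ~ birthplace * author_gender, data = toy)
anova_table <- summary(model2_sub)[[1]]
print(anova_table)

# Degrees of freedom, in order: birthplace, gender, interaction, residuals.
print(anova_table[["Df"]])
```

With three birthplaces and two genders, the degrees of freedom should be 2, 1, and 2 for the three terms, which makes it easy to confirm that the intended groups were compared.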
plot(model2,1)
plot(model2,2)
## Warning: not plotting observations with leverage one:
## 469, 1462, 1780, 1794, 2088, 2098, 2360, 2385, 2648, 3225, 3728, 4345, 4950, 4995, 5452, 5605, 6234, 7384, 7817, 7843, 7983, 8033, 8159, 8347, 8410, 8909, 9089, 9134, 9433, 9461, 9502, 9845, 9961, 9963, 10210, 10271, 10335, 10769, 11067, 11085, 11118, 11547, 11896, 11908, 12172, 12403, 12469, 12654, 12721, 12848, 12934, 13015, 13404, 13444, 13573, 14295, 14337, 14365, 14390, 14591, 14691, 14868, 14982, 15129, 15139, 15184, 15278, 15508, 15580, 15586, 15652, 15794, 15910, 15945, 15974, 16228, 16235, 16257, 16434, 16594, 16627, 16743, 16924, 16930, 16962, 17058, 17166, 17171, 17337, 17371, 17372, 17390, 17431, 17680, 17745, 17801, 17836, 18030, 18086, 18092, 18280, 18301, 18520, 18590, 18824, 18865, 19031, 19063, 19066, 19215, 19220, 19247, 19252, 19268, 19375, 19639, 19652, 19659, 19677, 19718, 19719, 19726, 19727, 19734, 19899, 19995, 20033, 20069, 20104, 20206, 20246, 20266, 20315, 20355, 20396, 20466, 20474, 20543, 20648, 20714, 20762, 20851, 20939, 21070, 21089, 21144, 21396, 21561, 21657, 21776, 21812, 21832, 21835, 21858, 21879, 21883, 21990, 22005, 22046, 22069, 22185, 22189, 22310, 22348, 22443, 22700, 22808, 22827, 22833, 22835