ANOVA One

Next we will compare mean book ratings based on where the author is from, posing the question: does an author’s birthplace have an effect on the rating a book receives? There are a ton of different birthplaces in this dataset, so we will isolate a few. I’ve chosen the United States, Canada, and the UK. These 3 were selected because they seemed to come up the most when I looked through the data.

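$$ H_0:\text{Mean Book rating for authors from US, CAN, and UK is equal}\\ H_A:\text{Mean Book rating for authors from US, CAN, and UK is not equal} $$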

Birthplace <- data[which(data$birthplace %in% c("United States", "Canada", "United Kingdom")),]
data$author_average_rating[is.na(data$author_average_rating)]<- 0 
data$book_average_rating[is.na(data$book_average_rating)]<- 0 
data[which(data$author_average_rating > data$book_average_rating),"Rating"]= "Less"
data[which(data$author_average_rating < data$book_average_rating),"Rating"]= "More"
data[which(data$author_average_rating == data$book_average_rating),"Rating"]= "Equal"
table(data$Rating)
## 
## Equal  Less  More 
##  1976 10733 10182

Let's add an indicator for female authors; we will use this later.

data[which(data$genre_1 %in% "female"), "Female"]= TRUE
data[which(!(data$genre_1 %in% "female")), "Female"]= FALSE
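An indicator like this can also be built in one vectorized step, since `%in%` already returns `TRUE` or `FALSE` for every row (and `FALSE` for missing values). A minimal sketch, assuming a small made-up data frame `toy` that stands in for the real data:

```r
# Toy stand-in for the real data; the values are invented for illustration.
toy <- data.frame(genre_1 = c("female", "fiction", NA, "female"))

# %in% yields a logical vector directly, so no separate TRUE/FALSE passes are needed.
toy$Female <- toy$genre_1 %in% "female"
toy$Female
```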

One way or ANOVA

So next I will run a one-way ANOVA to see how mean book rating differs based on where the author is from.

model <- aov(data$book_average_rating ~ data$birthplace, data=Birthplace)
summary(model)
##                    Df Sum Sq Mean Sq F value Pr(>F)    
## data$birthplace   434   88.5 0.20383   2.469 <2e-16 ***
## Residuals       22456 1854.1 0.08257                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

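Because my p-value is very small (< 2e-16), I am able to reject the null hypothesis of equal mean book rating in favor of the alternative hypothesis. Note that the table shows 434 degrees of freedom for birthplace, which means the model compared all 435 birthplaces in `data` rather than only the US, CAN, and UK: terms written as `data$...` in the formula are taken from `data` directly, so `data = Birthplace` has no effect here. The rejection therefore applies to equality of means across all birthplaces, not specifically to the three selected countries.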
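To restrict the comparison to the three selected countries, the formula can use bare column names so that `aov()` looks them up in the subset passed as `data`. A minimal sketch on made-up data (the `toy` data frame and its ratings are assumptions for illustration, not the real dataset):

```r
# Toy data standing in for the real dataset (values are invented).
set.seed(1)
toy <- data.frame(
  birthplace = rep(c("United States", "Canada", "United Kingdom", "France"), each = 25),
  book_average_rating = round(rnorm(100, mean = 4, sd = 0.3), 2)
)

# Keep only the three countries of interest and rebuild the factor
# so the dropped country does not linger as an empty level.
keep <- c("United States", "Canada", "United Kingdom")
toy_sub <- toy[toy$birthplace %in% keep, ]
toy_sub$birthplace <- factor(toy_sub$birthplace)

# Bare column names are resolved inside `data = toy_sub`,
# so only the 3 selected groups enter the model.
toy_model <- aov(book_average_rating ~ birthplace, data = toy_sub)
toy_table <- summary(toy_model)[[1]]
toy_table
```

With three groups, the birthplace row of the table should report 2 degrees of freedom, which is a quick way to confirm the subset was actually used.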

plot(model,1)

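In the residuals vs. fitted plot above, the red line runs roughly flat through the middle, which suggests the residuals have no systematic trend and that the variances are approximately equal across groups.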
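The visual check can be paired with a formal test of equal variances. The sketch below uses Bartlett's test from base R, which is not part of the analysis above and is sensitive to non-normal data; the `toy` data frame is made up for illustration.

```r
# Toy data with three groups (values are invented).
set.seed(2)
toy <- data.frame(
  birthplace = factor(rep(c("United States", "Canada", "United Kingdom"), each = 30)),
  book_average_rating = rnorm(90, mean = 4, sd = 0.3)
)

# Null hypothesis: all groups share the same variance.
bt <- bartlett.test(book_average_rating ~ birthplace, data = toy)
bt$p.value
```

A large p-value here would be consistent with the equal-variance condition; a small one would suggest the group variances differ.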

plot(model,2)
## Warning: not plotting observations with leverage one:
##   1462, 1780, 2088, 2098, 2360, 2385, 2648, 3728, 4345, 4950, 4995, 5452, 5605, 6234, 7384, 7817, 7843, 7983, 8033, 8410, 8909, 9089, 9433, 9461, 9502, 9845, 9961, 9963, 10210, 10271, 10335, 10769, 11067, 11118, 11547, 12172, 12469, 12654, 12721, 12848, 12934, 13015, 13573, 14295, 14337, 14365, 14390, 14591, 14691, 14868, 15129, 15139, 15184, 15278, 15580, 15586, 15794, 15945, 15974, 16257, 16434, 16594, 16743, 16924, 16962, 17058, 17171, 17371, 17372, 17390, 17431, 17680, 17745, 17801, 17836, 18086, 18280, 18301, 18520, 18590, 18865, 19063, 19066, 19215, 19220, 19247, 19268, 19375, 19639, 19652, 19677, 19719, 19726, 19727, 19734, 19899, 19995, 20104, 20206, 20266, 20315, 20355, 20466, 20474, 20543, 20648, 20714, 20851, 20939, 21070, 21089, 21396, 21561, 21657, 21776, 21812, 21835, 21858, 21879, 21990, 22005, 22069, 22185, 22189, 22310, 22348, 22443, 22808, 22827, 22835

The Q-Q plot above shows an almost straight line! This suggests the residuals are approximately normal, so we meet the normality condition that the ANOVA relies on.
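A numeric companion to the Q-Q plot is the Shapiro-Wilk test on the model residuals. `shapiro.test()` accepts between 3 and 5000 values, so for a dataset of this size a random sample of residuals would be needed. A minimal sketch, assuming made-up data in `toy` and an arbitrary sample size of 1000:

```r
# Toy data large enough that sampling the residuals is required (values are invented).
set.seed(3)
toy <- data.frame(
  birthplace = factor(rep(c("United States", "Canada", "United Kingdom"), each = 2000)),
  book_average_rating = rnorm(6000, mean = 4, sd = 0.3)
)
toy_model <- aov(book_average_rating ~ birthplace, data = toy)

# shapiro.test() is limited to 5000 values, so test a random sample of residuals.
res_sample <- sample(residuals(toy_model), size = 1000)
sw <- shapiro.test(res_sample)
sw$p.value
```

With very large samples, this test tends to flag even minor departures from normality, so it is best read alongside the plot rather than instead of it.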

Two-Way ANOVA (running out of puns)

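For this two-way ANOVA we will be answering the following 3 hypothesis tests:

$$ H_{01}:\text{Mean Book rating for authors from US, CAN, and UK is equal}\\ H_{A1}:\text{Mean Book rating for authors from US, CAN, and UK is not equal} $$

$$ H_{02}:\text{Mean Book rating for authors of each gender is equal}\\ H_{A2}:\text{Mean Book rating for authors of each gender is not equal} $$

\[ H_{03}:\text{The difference in mean Book rating between author genders is the same for authors from US, CAN, and UK}\\ H_{A3}:\text{The difference in mean Book rating between author genders is not the same for authors from US, CAN, and UK} \]

Okay, let's get some results! Maybe.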

model2 <- aov(data$book_average_rating ~ data$birthplace*data$author_gender, data=Birthplace)
summary(model2)
##                                       Df Sum Sq Mean Sq F value   Pr(>F)    
## data$birthplace                      434   88.5  0.2038   2.488  < 2e-16 ***
## data$author_gender                     1    1.3  1.2840  15.674 7.55e-05 ***
## data$birthplace:data$author_gender   149   25.6  0.1715   2.094 1.86e-13 ***
## Residuals                          22306 1827.3  0.0819                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

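We can reject all 3 null hypotheses because all three p-values are small (each is below 0.001). As with the one-way model, the birthplace term has 434 degrees of freedom, so these tests cover every birthplace in `data`, not only the US, CAN, and UK.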
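For reference, a two-way model with an interaction restricted to three countries would have 2 degrees of freedom for birthplace, 1 for gender, and 2 for the interaction. A minimal sketch on made-up balanced data (the `toy` data frame, its gender labels, and the `replicate_id` column are assumptions for illustration):

```r
# Balanced toy design: 3 birthplaces x 2 genders x 20 replicates (values are invented).
set.seed(4)
toy <- expand.grid(
  birthplace = c("United States", "Canada", "United Kingdom"),
  author_gender = c("female", "male"),
  replicate_id = 1:20
)
toy$book_average_rating <- rnorm(nrow(toy), mean = 4, sd = 0.3)

# `a * b` expands to both main effects plus the a:b interaction.
toy_model2 <- aov(book_average_rating ~ birthplace * author_gender, data = toy)
toy_table2 <- summary(toy_model2)[[1]]
toy_table2
```

Checking the `Df` column against these expected values is a quick sanity check that the model was fit on the intended groups.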

plot(model2,1)

plot(model2,2)
## Warning: not plotting observations with leverage one:
##   469, 1462, 1780, 1794, 2088, 2098, 2360, 2385, 2648, 3225, 3728, 4345, 4950, 4995, 5452, 5605, 6234, 7384, 7817, 7843, 7983, 8033, 8159, 8347, 8410, 8909, 9089, 9134, 9433, 9461, 9502, 9845, 9961, 9963, 10210, 10271, 10335, 10769, 11067, 11085, 11118, 11547, 11896, 11908, 12172, 12403, 12469, 12654, 12721, 12848, 12934, 13015, 13404, 13444, 13573, 14295, 14337, 14365, 14390, 14591, 14691, 14868, 14982, 15129, 15139, 15184, 15278, 15508, 15580, 15586, 15652, 15794, 15910, 15945, 15974, 16228, 16235, 16257, 16434, 16594, 16627, 16743, 16924, 16930, 16962, 17058, 17166, 17171, 17337, 17371, 17372, 17390, 17431, 17680, 17745, 17801, 17836, 18030, 18086, 18092, 18280, 18301, 18520, 18590, 18824, 18865, 19031, 19063, 19066, 19215, 19220, 19247, 19252, 19268, 19375, 19639, 19652, 19659, 19677, 19718, 19719, 19726, 19727, 19734, 19899, 19995, 20033, 20069, 20104, 20206, 20246, 20266, 20315, 20355, 20396, 20466, 20474, 20543, 20648, 20714, 20762, 20851, 20939, 21070, 21089, 21144, 21396, 21561, 21657, 21776, 21812, 21832, 21835, 21858, 21879, 21883, 21990, 22005, 22046, 22069, 22185, 22189, 22310, 22348, 22443, 22700, 22808, 22827, 22833, 22835