Julia Ferris
2023-09-13
The data for this document is loaded from the .csv file exported by the MySQL script.
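The loading step is not shown here. As a minimal sketch, assuming the export follows MySQL's convention of writing NULL as `\N`, the file could be read so that those fields become `NA` right away. The file contents and the name `movies_demo` are hypothetical stand-ins, and this is an alternative to the replacement code used in this document, not a copy of it.

```r
# Hypothetical export: a small CSV in the style MySQL writes, where NULL appears as \N.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("person,AvatarWater,Barbie",
    "1,3,2",
    "2,\\N,4",
    "3,4,\\N"),
  csv_path
)

# na.strings turns every \N field into NA, so the rating columns are read as numbers.
movies_demo <- read.csv(csv_path, na.strings = "\\N")
str(movies_demo)
```

Reading the columns as numbers from the start avoids the repeated `as.numeric()` conversions later on.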
Some of the people in the data set had not seen every movie in the list. Each missing rating is replaced in the section below by the average rating for that movie, rounded to the nearest whole number. Table 1 shows the new values.
# For each movie: convert the ratings to numeric ("\\N" becomes NA, so the coercion
# warning is suppressed), then replace the "\\N" entries with the rounded mean of the
# observed ratings.
movie1 <- suppressWarnings(as.numeric(movies$AvatarWater))
movies$AvatarWater[movies$AvatarWater == "\\N"] <- round(mean(movie1, na.rm = TRUE))
movie2 <- suppressWarnings(as.numeric(movies$TopGunMaverick))
movies$TopGunMaverick[movies$TopGunMaverick == "\\N"] <- round(mean(movie2, na.rm = TRUE))
movie3 <- suppressWarnings(as.numeric(movies$Oppenheimer))
movies$Oppenheimer[movies$Oppenheimer == "\\N"] <- round(mean(movie3, na.rm = TRUE))
movie4 <- suppressWarnings(as.numeric(movies$SoundOfFreedom))
movies$SoundOfFreedom[movies$SoundOfFreedom == "\\N"] <- round(mean(movie4, na.rm = TRUE))
movie5 <- suppressWarnings(as.numeric(movies$Barbie))
movies$Barbie[movies$Barbie == "\\N"] <- round(mean(movie5, na.rm = TRUE))
movie6 <- suppressWarnings(as.numeric(movies$Boogeyman))
movies$Boogeyman[movies$Boogeyman == "\\N"] <- round(mean(movie6, na.rm = TRUE))
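The same replacement can be written once and applied to every rating column. The sketch below uses a hypothetical two-movie stand-in, `movies_demo`, with made-up ratings. It assumes every column other than `person` holds ratings stored as text.

```r
# Hypothetical stand-in for the movies data frame; "\\N" marks a missing rating.
movies_demo <- data.frame(
  person = 1:5,
  AvatarWater = c("3", "4", "\\N", "5", "5"),
  Barbie = c("2", "4", "4", "\\N", "3"),
  stringsAsFactors = FALSE
)

# Assumption: every column except `person` is a rating column.
rating_cols <- setdiff(names(movies_demo), "person")

for (col in rating_cols) {
  values <- movies_demo[[col]]
  values[values == "\\N"] <- NA           # mark missing entries explicitly
  values <- as.numeric(values)            # no coercion warning: only digits and NA remain
  values[is.na(values)] <- round(mean(values, na.rm = TRUE))
  movies_demo[[col]] <- values            # the column is now numeric
}

movies_demo
```

Unlike the code above, this version stores the columns as numbers, so later steps would not need `as.numeric()`.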
library(gt)
gt(head(movies)) |>
  tab_header(
    title = "Table 1",
    subtitle = "Movie Ratings"
  )

Table 1: Movie Ratings

| person | AvatarWater | TopGunMaverick | Oppenheimer | SoundOfFreedom | Barbie | Boogeyman |
|---|---|---|---|---|---|---|
| 1 | 3 | 4 | 4 | 4 | 2 | 4 |
| 2 | 4 | 5 | 4 | 5 | 4 | 2 |
| 3 | 4 | 3 | 4 | 5 | 4 | 5 |
| 4 | 5 | 5 | 3 | 5 | 3 | 4 |
| 5 | 5 | 4 | 4 | 5 | 3 | 4 |
To answer this question, the average rating for each movie was calculated. The average was based on the actual data plus the filled-in values.
one <- mean(as.numeric(movies$AvatarWater))
two <- mean(as.numeric(movies$TopGunMaverick))
three <- mean(as.numeric(movies$Oppenheimer))
four <- mean(as.numeric(movies$SoundOfFreedom))
five <- mean(as.numeric(movies$Barbie))
six <- mean(as.numeric(movies$Boogeyman))
newdf <- data.frame(Avatar = one, TopGun = two, Opp = three, Sound = four, Barbie = five, Boogey = six)
newdf
## Avatar TopGun Opp Sound Barbie Boogey
## 1 4.2 4.2 3.8 4.8 3.2 3.8
## [1] 4.8
Sound of Freedom had the highest average rating.
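The per-movie averages can also be computed in one step. This is a minimal sketch that re-enters the values from Table 1 in a data frame named `ratings` (a name chosen here for illustration) and uses `colMeans()` and `which.max()` to find the top-rated movie.

```r
# Ratings as shown in Table 1 (after the missing values were filled in).
ratings <- data.frame(
  AvatarWater    = c(3, 4, 4, 5, 5),
  TopGunMaverick = c(4, 5, 3, 5, 4),
  Oppenheimer    = c(4, 4, 4, 3, 4),
  SoundOfFreedom = c(4, 5, 5, 5, 5),
  Barbie         = c(2, 4, 4, 3, 3),
  Boogeyman      = c(4, 2, 5, 4, 4)
)

avg_rating <- colMeans(ratings)   # one average per movie
avg_rating
max(avg_rating)                   # highest average rating
names(which.max(avg_rating))      # movie with the highest average
```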
To answer this question, bar graphs were created for each person. Then, bar graphs were created for each movie. The bar graphs were used for visual comparison. The standard deviation was calculated for each person and for each movie for numerical comparison.
barplot(as.numeric(movies$AvatarWater),xlab = "Person",ylab = "Rating",main = "Avatar: The Way of Water Ratings",names.arg = c(1, 2, 3, 4, 5), col="blue")
barplot(as.numeric(movies$TopGunMaverick),xlab = "Person",ylab = "Rating",main = "Top Gun: Maverick Ratings",names.arg = c(1, 2, 3, 4, 5), col="blue")
barplot(as.numeric(movies$Oppenheimer),xlab = "Person",ylab = "Rating",main = "Oppenheimer Ratings",names.arg = c(1, 2, 3, 4, 5), col="blue")
barplot(as.numeric(movies$SoundOfFreedom),xlab = "Person",ylab = "Rating",main = "Sound of Freedom Ratings",names.arg = c(1, 2, 3, 4, 5), col="blue")
barplot(as.numeric(movies$Barbie),xlab = "Person",ylab = "Rating",main = "Barbie Ratings",names.arg = c(1, 2, 3, 4, 5), col="blue")
barplot(as.numeric(movies$Boogeyman),xlab = "Person",ylab = "Rating",main = "Boogeyman Ratings",names.arg = c(1, 2, 3, 4, 5), col="blue")
barplot(as.numeric(movies[1,2:7]),xlab = "Movie",ylab = "Rating",main = "Person 1 Ratings", names.arg = c("Avatar", "TopGun", "Opp", "Sound", "Barbie", "Boogey"),col="blue")
barplot(as.numeric(movies[2,2:7]),xlab = "Movie",ylab = "Rating",main = "Person 2 Ratings", names.arg = c("Avatar", "TopGun", "Opp", "Sound", "Barbie", "Boogey"),col="blue")
barplot(as.numeric(movies[3,2:7]),xlab = "Movie",ylab = "Rating",main = "Person 3 Ratings", names.arg = c("Avatar", "TopGun", "Opp", "Sound", "Barbie", "Boogey"),col="blue")
barplot(as.numeric(movies[4,2:7]),xlab = "Movie",ylab = "Rating",main = "Person 4 Ratings", names.arg = c("Avatar", "TopGun", "Opp", "Sound", "Barbie", "Boogey"),col="blue")
barplot(as.numeric(movies[5,2:7]),xlab = "Movie",ylab = "Rating",main = "Person 5 Ratings", names.arg = c("Avatar", "TopGun", "Opp", "Sound", "Barbie", "Boogey"),col="blue")
mean(c(sd(as.numeric(movies$AvatarWater)), sd(as.numeric(movies$TopGunMaverick)), sd(as.numeric(movies$Oppenheimer)), sd(as.numeric(movies$SoundOfFreedom)), sd(as.numeric(movies$Barbie)), sd(as.numeric(movies$Boogeyman))))
## [1] 0.7499754
mean(c(sd(as.numeric(movies[1,2:7])), sd(as.numeric(movies[2,2:7])), sd(as.numeric(movies[3,2:7])),
sd(as.numeric(movies[4,2:7])), sd(as.numeric(movies[5,2:7]))))
## [1] 0.8841685
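The eleven plotting calls above follow two patterns, so they could be generated by two loops. The sketch below re-enters the Table 1 values in a data frame named `ratings` (an illustrative name). It uses the column names as plot labels, so titles and axis labels differ slightly from the hand-written ones.

```r
# Ratings as shown in Table 1.
ratings <- data.frame(
  AvatarWater    = c(3, 4, 4, 5, 5),
  TopGunMaverick = c(4, 5, 3, 5, 4),
  Oppenheimer    = c(4, 4, 4, 3, 4),
  SoundOfFreedom = c(4, 5, 5, 5, 5),
  Barbie         = c(2, 4, 4, 3, 3),
  Boogeyman      = c(4, 2, 5, 4, 4)
)

# One bar chart per movie: each bar is a person.
for (movie in names(ratings)) {
  barplot(ratings[[movie]], names.arg = seq_len(nrow(ratings)),
          xlab = "Person", ylab = "Rating",
          main = paste(movie, "Ratings"), col = "blue")
}

# One bar chart per person: each bar is a movie.
for (i in seq_len(nrow(ratings))) {
  barplot(unlist(ratings[i, ]), names.arg = names(ratings),
          xlab = "Movie", ylab = "Rating",
          main = paste("Person", i, "Ratings"), col = "blue")
}
```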
Based on the visual comparison and the lower average standard deviation (about 0.75 by movie versus about 0.88 by person), ratings were more consistent by movie than by person. In other words, different people tended to give the same movie similar ratings, while a single person's ratings varied more from movie to movie. However, the sample of five people is small, so this result may not be representative of the larger population of people who watched these movies.
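The two averages of standard deviations can be reproduced compactly with `apply()`. This minimal sketch re-enters the Table 1 values as a matrix named `ratings` (an illustrative name) and takes the standard deviation down each column (by movie) and across each row (by person).

```r
# Ratings as shown in Table 1: rows are people, columns are movies.
ratings <- as.matrix(data.frame(
  AvatarWater    = c(3, 4, 4, 5, 5),
  TopGunMaverick = c(4, 5, 3, 5, 4),
  Oppenheimer    = c(4, 4, 4, 3, 4),
  SoundOfFreedom = c(4, 5, 5, 5, 5),
  Barbie         = c(2, 4, 4, 3, 3),
  Boogeyman      = c(4, 2, 5, 4, 4)
))

sd_by_movie  <- apply(ratings, 2, sd)   # spread of ratings within each movie
sd_by_person <- apply(ratings, 1, sd)   # spread of ratings within each person

mean(sd_by_movie)    # average spread by movie
mean(sd_by_person)   # average spread by person
```

A smaller average spread by movie than by person is what supports the conclusion that ratings agree more within a movie than within a person.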
To answer this question, the number of ratings equal to 4 or 5 was counted. This count was divided by the total number of ratings and multiplied by 100.
## [1] 71.42857
About 71.43% of ratings were greater than 3.
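The code for this calculation is not shown. The sketch below uses the values from Table 1 and assumes those five rows are the complete data set. Counting only the six rating columns gives 23 of 30 ratings above 3, about 76.67%. Applying the same comparison to the whole data frame, `person` column included, gives 25 of 35 and reproduces the printed 71.42857. The reported figure may therefore include the identifier column. The names `ratings` and `rating_cols` are illustrative.

```r
# Values as shown in Table 1, including the person identifier.
ratings <- data.frame(
  person         = 1:5,
  AvatarWater    = c(3, 4, 4, 5, 5),
  TopGunMaverick = c(4, 5, 3, 5, 4),
  Oppenheimer    = c(4, 4, 4, 3, 4),
  SoundOfFreedom = c(4, 5, 5, 5, 5),
  Barbie         = c(2, 4, 4, 3, 3),
  Boogeyman      = c(4, 2, 5, 4, 4)
)

# Only the movie columns hold ratings; `person` is an identifier.
rating_cols <- setdiff(names(ratings), "person")

# mean() of a logical matrix is the share of TRUE values.
pct_ratings_only <- mean(as.matrix(ratings[rating_cols]) > 3) * 100
pct_with_person  <- mean(as.matrix(ratings) > 3) * 100

pct_ratings_only   # share of ratings above 3
pct_with_person    # same comparison with the person column included
```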
Wagner, Donald. Stack Overflow. 2016. https://stackoverflow.com/questions/5941809/include-headers-when-using-select-into-outfile
Naveen. Spark By Examples. 2023. https://sparkbyexamples.com/r-programming/replace-values-in-r/
Xie, Yihui. Dervieux, Christophe. Riederer, Emily. R Markdown Cookbook. Bookdown. 2023. https://bookdown.org/yihui/rmarkdown-cookbook/figures-side.html