library(RMySQL)
library(dplyr)
host <- "localhost"
source("logincredentials.R")
dbname <- "movieRatingsData"
# Establish the database connection
con <- dbConnect(MySQL(), user = user, password = password, dbname = dbname, host = host)
query <- "SELECT * FROM movieRatings"
# Fetch data into a data frame
movieRatings <- dbGetQuery(con, query)
colnames(movieRatings) <- c("ID", "Name", "Gender", "Age", "Barbie_Rating", "Oppenheimer_Rating", "Passages_Rating", "Nimona_Rating", "TheLittleMermaid_Rating", "MissionImpossible_Rating")
# change rating and age columns to numeric
movieRatings <- movieRatings %>%
mutate_at(vars(Age, Barbie_Rating, Oppenheimer_Rating, Passages_Rating, Nimona_Rating, TheLittleMermaid_Rating, MissionImpossible_Rating), as.numeric)
In this analysis, I conducted a Google Forms survey asking people to rank 6 recently released movies on a scale of 1-5. The movies in question are: Barbie, Oppenheimer, The Little Mermaid, Passages, Nimona, and Mission Impossible - Dead Reckoning. Not all people surveyed had seen every movie.
Below is a glimpse of the data:
## ID Name Gender Age Barbie_Rating Oppenheimer_Rating Passages_Rating
## 1 1 Sydney Female 24 5 5 NA
## 2 2 Brielle Female 23 5 NA NA
## 3 3 Marty Male 38 NA NA NA
## 4 4 Caitlyn Female 24 5 NA NA
## 5 5 Jooj Female 22 NA 4 NA
## 6 6 John Male 59 NA 5 NA
## Nimona_Rating TheLittleMermaid_Rating MissionImpossible_Rating
## 1 NA NA 2
## 2 NA NA NA
## 3 NA NA NA
## 4 NA 5 NA
## 5 NA NA NA
## 6 NA NA 5
After the “Barbenheimer” debate, I was interested to see if there was a correlation in ratings for movies based on gender. From social media, it seemed like women were more excited for the Barbie movie, while men were more excited for Oppenheimer.
In this analysis, I want to see which movies (out of the 6 surveyed) each gender prefers.
Below is a table showing the average ratings for each movie:
library(tidyr)
library(gt)
# Reshape Data
avgRatings <- movieRatings %>%
pivot_longer(
cols = c(`Barbie_Rating`, `Oppenheimer_Rating`, `Passages_Rating`, `Nimona_Rating`, `TheLittleMermaid_Rating`, `MissionImpossible_Rating`),
names_to = "Movie",
values_to = "Rating"
)
# Calculate the average ratings for each movie and sort in decreasing order
avgRatings <- avgRatings %>%
group_by(Movie) %>%
summarise(Avg_Rating = mean(Rating, na.rm = TRUE)
)
# Arrange the results in descending order by Avg_Rating
avgRatings <- avgRatings %>%
arrange(desc(Avg_Rating))
# Round the Avg_Rating column to two decimal points
avgRatings$Avg_Rating <- round(avgRatings$Avg_Rating, 2)
# Rename the movie column contents
avgRatings$Movie <- sub("_Rating", "", avgRatings$Movie)
avgRatings$Movie <- sub("MissionImpossible", "Mission Impossible", avgRatings$Movie)
avgRatings$Movie <- sub("TheLittleMermaid", "The Little Mermaid", avgRatings$Movie)
## Create a gt table
avgRatings_tbl <- gt(avgRatings)
# Customize gt table
avgRatings_tbl <- avgRatings_tbl |>
tab_header(
title = md("**Average Rating per Movie**")
) |>
cols_label(
Movie = md("**Movie**"), Avg_Rating = md("**Average Rating**")
)
avgRatings_tbl
Average Rating per Movie | |
Movie | Average Rating |
---|---|
Oppenheimer | 4.54 |
Mission Impossible | 4.50 |
Barbie | 4.12 |
Passages | 3.60 |
The Little Mermaid | 3.46 |
Nimona | 3.17 |
Below is a table combining the average ratings for each movie for each gender:
colnames(movieRatings) <- c("ID","Name", "Gender", "Age", "Barbie", "Oppenheimer", "Passages", "Nimona", "The Little Mermaid", "Mission Impossible")
## female average ratings
# create a subset of the female ratings
femaleRatings <- subset(movieRatings, Gender == "Female")
# Reshape Data
femaleAvgRatings <- femaleRatings %>%
pivot_longer(
cols = c(`Barbie`, `Oppenheimer`, `Passages`, `Nimona`, `The Little Mermaid`, `Mission Impossible`),
names_to = "Movie",
values_to = "Rating"
)
# Calculate the average ratings for each movie and sort in decreasing order
femaleAvgRatings <- femaleAvgRatings %>%
group_by(Movie) %>%
summarise(Avg_Rating = mean(Rating, na.rm = TRUE)
)
# Round the Avg_Rating column to two decimal points
femaleAvgRatings$Avg_Rating <- round(femaleAvgRatings$Avg_Rating, 2)
## male average ratings
# create a subset of the male ratings
maleRatings <- subset(movieRatings, Gender == "Male")
# Reshape Data
maleAvgRatings <- maleRatings %>%
pivot_longer(
cols = c(`Barbie`, `Oppenheimer`, `Passages`, `Nimona`, `The Little Mermaid`, `Mission Impossible`),
names_to = "Movie",
values_to = "Rating"
)
# Calculate the average ratings for each movie and sort in decreasing order
maleAvgRatings <- maleAvgRatings %>%
group_by(Movie) %>%
summarise(Avg_Rating = mean(Rating, na.rm = TRUE)
)
# Round the Avg_Rating column to two decimal points
maleAvgRatings$Avg_Rating <- round(maleAvgRatings$Avg_Rating, 2)
# Merge the female and male average ratings by "Movie"
genderRatings <- merge(femaleAvgRatings, maleAvgRatings, by = "Movie", all = TRUE)
# Rename the columns
colnames(genderRatings) <- c("Movie", "Female_Avg_Rating", "Male_Avg_Rating")
# create a gt table
genderRatings_tbl <- gt(genderRatings)
genderRatings_tbl <- genderRatings_tbl |>
tab_header(
title = md("**Average Rating per Movie for Each Gender**")
) |>
cols_label(
Movie = md("**Movie**"), Female_Avg_Rating = md("**Female**"), Male_Avg_Rating = md("**Male**")
)
genderRatings_tbl
Average Rating per Movie for Each Gender | ||
Movie | Female | Male |
---|---|---|
Barbie | 4.45 | 3.50 |
Mission Impossible | 4.17 | 4.83 |
Nimona | 4.00 | 2.75 |
Oppenheimer | 4.43 | 4.67 |
Passages | 3.00 | 3.75 |
The Little Mermaid | 4.17 | 2.86 |
Below, we visualize the difference in ratings for each gender for each movie:
library(ggplot2)
movies <- c("Barbie", "Oppenheimer", "Passages", "The Little Mermaid", "Nimona", "Mission Impossible")
movie_data <- genderRatings %>%
filter(Movie %in% movies) %>%
select(Movie, Female_Avg_Rating, Male_Avg_Rating)
# Side-by-side bar chart
ggplot(movie_data, aes(x = Movie)) +
geom_bar(aes(y = Female_Avg_Rating, fill = "Female"), stat = "identity", position = "dodge", alpha = 0.75, color="hotpink") +
geom_bar(aes(y = Male_Avg_Rating, fill = "Male"), stat = "identity", position = "dodge", alpha = 0.5, color = "blue") +
labs(title = "Comparison of Male and Female Ratings",
x = "Movie",
y = "Rating") +
scale_fill_manual(values = c("Female" = "hotpink", "Male" = "blue")) +
theme_minimal() +
theme(legend.title = element_blank(),
legend.position = "top") +
coord_flip()
In conclusion, the top 2 movies for women are Barbie and Oppenheimer, while the top 2 movies for men are Mission Impossible and Oppenheimer. It was interesting to see that the average ratings for Oppenheimer were very close (being 4.43 for women and 4.67 for men). As predicted, ratings for Barbie from men were much lower than ratings from women. However, women seem to be just as interest in Oppenheimer as men. Overall, men seem to have more interest in action, thriller, and drama movies, while women seem to have more interest in action, thriller, and fantasy movies.
If someone were to further their research on this topic, I would suggest to conduct a survey with a larger amount of movies. This way, it would be more accurate to assume a gender’s preference based on genre of movie.
Retrieved from https://www.cnn.com/2023/07/20/media/barbie-oppenheimer-barbenheimer-reliable-sources/index.html