my_pass <- read_file("C:/Users/Shoshana/Documents/pass.txt")
con <- dbConnect(
Postgres(),
host = "localhost",
port = 5432,
user = "postgres",
password = my_pass,
dbname = "cuny-sps"
)
movie_ratings <- dbGetQuery(con, "SELECT * FROM movie_ratings")
raters <- dbGetQuery(con, "SELECT * FROM raters")
movies <- dbGetQuery(con, "SELECT * FROM movies")
# preview each table
head(movie_ratings)
## raterid movieid rating
## 1 1 1 NA
## 2 1 2 3
## 3 1 3 NA
## 4 1 4 3
## 5 1 5 NA
## 6 1 6 NA
head(raters)
## raterid name age
## 1 1 Sarah 24
## 2 2 Shani 22
## 3 3 Leah 61
## 4 4 Shimon 34
## 5 5 Dinah 23
## 6 6 Abe 61
head(movies)
## movieid movie_title release_date
## 1 1 Avatar The Way of Water 2022-12-16
## 2 2 Black Panther Wakanda Forever 2022-11-11
## 3 3 Knives Out Glass Onion 2022-09-10
## 4 4 Matilda the Musical 2022-12-02
## 5 5 Top Gun Maverick 2022-05-27
## 6 6 Bullet Train 2022-08-05
# want a table of just the raters, movies, and their ratings
movie_ratings <- movie_ratings %>%
left_join(movies, on = "movieID") %>%
left_join(raters, on = "raterID") %>%
transmute(name, movie_title, rating)
## Joining, by = "movieid"
## Joining, by = "raterid"
head(movie_ratings)
## name movie_title rating
## 1 Sarah Avatar The Way of Water NA
## 2 Sarah Black Panther Wakanda Forever 3
## 3 Sarah Knives Out Glass Onion NA
## 4 Sarah Matilda the Musical 3
## 5 Sarah Top Gun Maverick NA
## 6 Sarah Bullet Train NA
avg_rating <- movie_ratings %>%
group_by(movie_title) %>%
filter(rating != is.na(rating)) %>%
summarize(avg_rating = round(mean(rating), 2))
avg_rating
## # A tibble: 8 × 2
## movie_title avg_rating
## <chr> <dbl>
## 1 Avatar The Way of Water 4.17
## 2 Black Panther Wakanda Forever 3
## 3 Bullet Train 3.5
## 4 Don't Worry Darling 4
## 5 Knives Out Glass Onion 3
## 6 Matilda the Musical 4
## 7 Ticket to Paradise 4.5
## 8 Top Gun Maverick 3.71
avg_rating %>%
arrange(-avg_rating)
## # A tibble: 8 × 2
## movie_title avg_rating
## <chr> <dbl>
## 1 Ticket to Paradise 4.5
## 2 Avatar The Way of Water 4.17
## 3 Don't Worry Darling 4
## 4 Matilda the Musical 4
## 5 Top Gun Maverick 3.71
## 6 Bullet Train 3.5
## 7 Black Panther Wakanda Forever 3
## 8 Knives Out Glass Onion 3
Ticket to Paradise is the highest rated movie.
movie_ratings %>%
filter(movie_title == "Ticket to Paradise",
rating != is.na(rating))
## name movie_title rating
## 1 Dinah Ticket to Paradise 5
## 2 Leeor Ticket to Paradise 4
So ticket to paradise is the highest rated, but only 2/8 people actually watched it.
watched <- movie_ratings %>%
filter(rating != is.na(rating)) %>%
mutate(watched = 1)
watched %>%
group_by(movie_title) %>%
summarize(num_watched = sum(watched)) %>%
arrange(-num_watched)
## # A tibble: 8 × 2
## movie_title num_watched
## <chr> <dbl>
## 1 Matilda the Musical 7
## 2 Top Gun Maverick 7
## 3 Avatar The Way of Water 6
## 4 Knives Out Glass Onion 6
## 5 Don't Worry Darling 5
## 6 Black Panther Wakanda Forever 4
## 7 Bullet Train 4
## 8 Ticket to Paradise 2
watched %>%
ggplot(aes(y = movie_title)) +
geom_bar(fill = "plum") +
labs(title = "Total Watches per Movie")
Matilda the Musical and Top Gun Maverick were watched by the most people (7/8).
avg_rating_person <- movie_ratings %>%
group_by(name) %>%
filter(rating != is.na(rating)) %>%
summarize(avg_rating = round(mean(rating), 2))
avg_rating_person
## # A tibble: 8 × 2
## name avg_rating
## <chr> <dbl>
## 1 Abe 3.6
## 2 Dinah 4
## 3 Leah 4
## 4 Leeor 4.17
## 5 Sarah 2.33
## 6 Shani 4
## 7 Shimon 3.4
## 8 Talia 3.5
I connected to PostgreSQL server to access the
movie_ratings, movies, and raters
schemas directly from the database. raters has a primary
key of raterID, and movies has a primary
key of movieID. These are both foreign keys in
movie_ratings.
To analyze the data here, I reassigned movie_ratings to
be a table of just the names of the raters, the movie titles, and their
respective ratings. If I had more data, I would analyze popularity based
on age, as the raters table includes the ages of those who
rated. Additionally, all these movies are recent releases and were
released in the same year, but if I had many more movies from many more
years I would want to group based on year to see most popular movies per
year.