library(RPostgres)
Week_2_Assignment
Movies Ratings
Collecting Data
To collect the ratings of 6 of the recent popular movies, I created a Google Form and I sent the link to REAL friends. I also rated the movies myself; I watched none of the movies I included in the survey, yet I picked them because I am interested in watching them when I have time. Here is the link to the Google form: https://forms.gle/9PSCwsAJcQDfvAGPA
Connecting pgadmin4 with Rstudio:
Loading Packages:
<- dbDriver("Postgres")
drv <-dbCanConnect(drv = RPostgres::Postgres(),
con dbname = "Movies Ratings",
port = 5432,
user = "postgres",
password = "MOHAMEDChawki2018")
con
[1] TRUE
<-dbConnect(drv = RPostgres::Postgres(),
con_1 dbname = "Movies Ratings",
port = 5432,
user = "postgres",
password = "MOHAMEDChawki2018")
con_1
<PqConnection> Movies Ratings@/tmp:5432
dbListTables(con_1)
[1] "moviesratings" "IMDBmoviesRatings"
dbGetQuery(con_1, "SELECT * FROM moviesratings")
Name 1. Avengers: Endgame: 2. The Little Mermaid 3. Booksmart
1 Hiba NA 5 NA
2 Ali NA NA NA
3 Omar NA NA NA
4 Tony NA NA NA
5 Youssef 4 NA 2
6 Adil 3 NA 5
7 Jor 3 NA NA
8 Sarah NA NA NA
9 Lilly 4 NA 5
10 Lauren 5 NA NA
11 Jeannie 5 NA NA
12 Jonathan 5 NA NA
13 Saloua NA NA NA
14 Sara NA NA NA
15 Christine 5 4 NA
16 Nancy 4 NA NA
17 Stephen 2 1 3
18 Howard 4 NA 3
4. Talk to Me 5. Interstellar 6. The Irishman
1 NA NA NA
2 4 5 NA
3 NA NA 5
4 NA 3 5
5 NA NA 5
6 NA 5 NA
7 NA 3 NA
8 3 NA 4
9 NA 2 NA
10 NA 3 3
11 NA 5 3
12 NA 5 NA
13 NA NA NA
14 NA NA NA
15 NA 5 NA
16 NA NA NA
17 NA NA NA
18 4 NA NA
<- dbGetQuery(con_1, "SELECT * FROM moviesratings") ratings_df
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.2 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.3 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(gt)
library(ggplot2)
= FALSE echo
gt(ratings_df)
Name | 1. Avengers: Endgame: | 2. The Little Mermaid | 3. Booksmart | 4. Talk to Me | 5. Interstellar | 6. The Irishman |
---|---|---|---|---|---|---|
Hiba | NA | 5 | NA | NA | NA | NA |
Ali | NA | NA | NA | 4 | 5 | NA |
Omar | NA | NA | NA | NA | NA | 5 |
Tony | NA | NA | NA | NA | 3 | 5 |
Youssef | 4 | NA | 2 | NA | NA | 5 |
Adil | 3 | NA | 5 | NA | 5 | NA |
Jor | 3 | NA | NA | NA | 3 | NA |
Sarah | NA | NA | NA | 3 | NA | 4 |
Lilly | 4 | NA | 5 | NA | 2 | NA |
Lauren | 5 | NA | NA | NA | 3 | 3 |
Jeannie | 5 | NA | NA | NA | 5 | 3 |
Jonathan | 5 | NA | NA | NA | 5 | NA |
Saloua | NA | NA | NA | NA | NA | NA |
Sara | NA | NA | NA | NA | NA | NA |
Christine | 5 | 4 | NA | NA | 5 | NA |
Nancy | 4 | NA | NA | NA | NA | NA |
Stephen | 2 | 1 | 3 | NA | NA | NA |
Howard | 4 | NA | 3 | 4 | NA | NA |
%>%
ratings_df ::select(1:7) %>%
dplyr::slice(1:18) %>%
dplyrgt(rowname_col = "Names of Respondents") %>%
tab_spanner(label = "Movies Rated", columns = matches("1.|2.|3.|4.|5.|6.")) %>%
tab_header(
title = md("*Six Popular Movies Ratings*")
%>%
) opt_stylize(style = 6)
Six Popular Movies Ratings | ||||||
Name | Movies Rated | |||||
---|---|---|---|---|---|---|
1. Avengers: Endgame: | 2. The Little Mermaid | 3. Booksmart | 4. Talk to Me | 5. Interstellar | 6. The Irishman | |
Hiba | NA | 5 | NA | NA | NA | NA |
Ali | NA | NA | NA | 4 | 5 | NA |
Omar | NA | NA | NA | NA | NA | 5 |
Tony | NA | NA | NA | NA | 3 | 5 |
Youssef | 4 | NA | 2 | NA | NA | 5 |
Adil | 3 | NA | 5 | NA | 5 | NA |
Jor | 3 | NA | NA | NA | 3 | NA |
Sarah | NA | NA | NA | 3 | NA | 4 |
Lilly | 4 | NA | 5 | NA | 2 | NA |
Lauren | 5 | NA | NA | NA | 3 | 3 |
Jeannie | 5 | NA | NA | NA | 5 | 3 |
Jonathan | 5 | NA | NA | NA | 5 | NA |
Saloua | NA | NA | NA | NA | NA | NA |
Sara | NA | NA | NA | NA | NA | NA |
Christine | 5 | 4 | NA | NA | 5 | NA |
Nancy | 4 | NA | NA | NA | NA | NA |
Stephen | 2 | 1 | 3 | NA | NA | NA |
Howard | 4 | NA | 3 | 4 | NA | NA |
<- round(mean(ratings_df$`1. Avengers: Endgame:`, na.rm = TRUE),2)
mean1 <- round(mean(ratings_df$`2. The Little Mermaid`, na.rm = TRUE),2)
mean2 <- round(mean(ratings_df$`3. Booksmart`, na.rm = TRUE),2)
mean3 <-round(mean(ratings_df$`4. Talk to Me`, na.rm=TRUE),2)
mean4 <- round(mean(ratings_df$`5. Interstellar`, na.rm=TRUE),2)
mean5 <- round(mean(ratings_df$`6. The Irishman`, na.rm = TRUE),2) mean6
= data.frame(Movie_Name=c('Avengers: Endgame','The Little Mermaid', 'Booksmart', 'Talk To Me', 'Interstellar', 'The Irishman'),
Average_ratings Avg_ratings=c(mean1, mean2, mean3, mean4, mean5,mean6))
print(Average_ratings)
Movie_Name Avg_ratings
1 Avengers: Endgame 4.00
2 The Little Mermaid 3.33
3 Booksmart 3.60
4 Talk To Me 3.67
5 Interstellar 4.00
6 The Irishman 4.17
Visualizing the data:
ggplot(data = Average_ratings, ) + geom_bar(mapping = aes(x = Movie_Name, y = Avg_ratings ), stat = "identity")
<- data.frame(t(ratings_df))
ratings_df2 print(ratings_df2)
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
Name Hiba Ali Omar Tony Youssef Adil Jor Sarah Lilly Lauren
1. Avengers: Endgame: <NA> <NA> <NA> <NA> 4 3 3 <NA> 4 5
2. The Little Mermaid 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3. Booksmart <NA> <NA> <NA> <NA> 2 5 <NA> <NA> 5 <NA>
4. Talk to Me <NA> 4 <NA> <NA> <NA> <NA> <NA> 3 <NA> <NA>
5. Interstellar <NA> 5 <NA> 3 <NA> 5 3 <NA> 2 3
6. The Irishman <NA> <NA> 5 5 5 <NA> <NA> 4 <NA> 3
X11 X12 X13 X14 X15 X16 X17
Name Jeannie Jonathan Saloua Sara Christine Nancy Stephen
1. Avengers: Endgame: 5 5 <NA> <NA> 5 4 2
2. The Little Mermaid <NA> <NA> <NA> <NA> 4 <NA> 1
3. Booksmart <NA> <NA> <NA> <NA> <NA> <NA> 3
4. Talk to Me <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5. Interstellar 5 5 <NA> <NA> 5 <NA> <NA>
6. The Irishman 3 <NA> <NA> <NA> <NA> <NA> <NA>
X18
Name Howard
1. Avengers: Endgame: 4
2. The Little Mermaid <NA>
3. Booksmart 3
4. Talk to Me 4
5. Interstellar <NA>
6. The Irishman <NA>
Conclusion:
After calculating the Global Baseline Estimate using Excel, I found it was 3.89. Considering 3.89 as an average threshold, the most recommended movies are “Avengers: Endgame”, “Interstellar”, and ” The Irishman”. The movies that were rated have different Genres which makes it though to recommend any movie to any of the respondents, especially to Saloua and Sara. They both happened to not watch any of these movies. Based on the average ratings of each movie, the Irishman got the greatest average of all “4.17”
References:
- “A Simple Guide On Connecting Rstudio to a PostgreSQL Database” https://medium.com/geekculture/a-simple-guide-on-connecting-rstudio-to-a-postgresql-database-9e35ccdc08be
- “Great Looking Tables: gt (v0.2)“ https://posit.co/blog/great-looking-tables-gt-0-2/