I’ve selected six horror/thriller movies I’ve watched before in theater and thought were great. Several titles here have won awards and other had mixed reviews with diverse opinions among critics. However, I am interested in what a small group of randomly selected movie fan fanatics would say about these films. Hence, I surveyed 5 friends with diverse film tastes to rate these movies on a scale of 1 to 5. The movies are The Sustance, Nosferatu, Frankenstein, 28 years later, Sinner, and The Conjuring 4. I want to store these rating in a database where I can analyze and find out what is the most popular movie among this group.
My strategy moving forward would be to create a simple database structure that is normalized. In order to show the relational database connections, I will create 3 separate tables titled “ratings,”movies”, and “reviewers”.
The challenge here will be the existence of missing data and not
every participant will have seen every movie. My plan for the missing
data is to use R’s na.rm=TRUE function and NOT NULL OR NULL for
PostgreSQL. This way it would stop the lowering of rating of the movies
that weren’t watched by the majority. Source - (1)Gemini Pro
(2) https://happygitwithr.com/push-pull-github
(3) R for Data Science 2nd edition by Grolemund
I used R for data science 2nd edition book for the codes to fullfill deliverables I also when through several interactions with Gemini Pro LLM to: (1) Collect PostgreSQL statment required for creating multiple table with relational database connections (2) Create DROP statement to allowing for reset tables for anyone to run the SQL script
#Loading data I am connecting to the PostgreSQL database pgadmin. The SQL script created on there with the multiple tables is available in my Github repository using git according to instruction on Happygitwithr for uploading.
link <- dbConnect(RPostgres::Postgres(),
dbname = "DATA 607",
host = "localhost",
port = 5432,
user = "postgres",
password = db_password)
query <- "
SELECT r.name, m.title, rt.rating
FROM ratings rt
JOIN movies m ON rt.movie_id = m.movie_id
JOIN reviewers r ON rt.reviewer_id = r.reviewer_id;"
raw_data <- dbGetQuery(link, query)
head(raw_data)
## name title rating
## 1 Liz The Substance 5
## 2 Liz Nosferatu 5
## 3 Liz 28 Years Later 5
## 4 Jed Sinners 5
## 5 Jed The Substance 3
## 6 Brenda 28 Years Later 2
rating_conclusion <- raw_data |>
group_by(title) |>
summarise (average_rating = mean(rating, na.rm=TRUE),
review_amount= n()) |>
arrange(desc(average_rating))
view(rating_conclusion)
ggplot(rating_conclusion, aes(x= average_rating, y=title))+ geom_col(fill="blue");