Approach Deliverable

Approach

This project will collect and analyze a small set of movie ratings using a SQL database and R. I will select six recent popular movies and ask at least five people to rate each movie they have seen on a scale of 1 to 5. Participants may skip movies they have not watched, so the dataset will contain missing ratings.

The data will be stored in a relational database with a normalized structure: separate tables for users, movies, and ratings, with each row in the ratings table linked to one user and one movie by ID. All tables will be created and populated using SQL code.
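A minimal sketch of what such a schema could look like. It assumes SQLite as the database engine (the plan above does not name one) and uses Python's built-in sqlite3 module only as a convenient way to run the SQL. The table and column names, the composite primary key, and the CHECK constraint are illustrative assumptions, not decisions already made for this project.

```python
import sqlite3

# Hypothetical schema: names and constraints are assumptions for illustration.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE movies (
    movie_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL
);
CREATE TABLE ratings (
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    movie_id INTEGER NOT NULL REFERENCES movies(movie_id),
    rating   INTEGER CHECK (rating BETWEEN 1 AND 5),  -- NULL means "not seen"
    PRIMARY KEY (user_id, movie_id)                   -- one rating per user per movie
);
"""

def create_schema(conn):
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
create_schema(conn)
```

Leaving the rating column nullable lets a skipped movie be recorded explicitly, while the CHECK constraint still rejects values outside the 1 to 5 scale.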

After the data is stored, I will load it into R as a data frame by querying the database directly. In R, I will inspect the data, exclude missing ratings from calculations, and compute basic summaries such as the average rating and the number of ratings per movie and per user.
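The query-and-summarize step could be sketched as follows. This is not the R code itself: it uses Python's sqlite3 module as a stand-in so the example is self-contained, and the same query strings could in principle be sent from R (for instance through DBI::dbGetQuery). The schema, the names, and every row of data here are placeholders invented for illustration, not collected ratings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE ratings (
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    movie_id INTEGER NOT NULL REFERENCES movies(movie_id),
    rating   INTEGER,              -- NULL = movie not seen
    PRIMARY KEY (user_id, movie_id)
);
-- Placeholder rows, invented purely for illustration.
INSERT INTO users  VALUES (1, 'user1'), (2, 'user2');
INSERT INTO movies VALUES (1, 'Movie A'), (2, 'Movie B');
INSERT INTO ratings VALUES (1, 1, 4), (1, 2, NULL), (2, 1, 5), (2, 2, 3);
""")

# One row per (user, movie) pair: the long format a data frame would hold.
LONG_QUERY = """
SELECT u.name, m.title, r.rating
FROM ratings AS r
JOIN users  AS u ON u.user_id  = r.user_id
JOIN movies AS m ON m.movie_id = r.movie_id
ORDER BY u.name, m.title
"""
rows = conn.execute(LONG_QUERY).fetchall()

# AVG and COUNT(column) both skip NULLs, so unseen movies do not distort the summary.
SUMMARY_QUERY = """
SELECT m.title, COUNT(r.rating) AS n_ratings, AVG(r.rating) AS avg_rating
FROM movies AS m
LEFT JOIN ratings AS r ON r.movie_id = m.movie_id
GROUP BY m.movie_id, m.title
ORDER BY m.title
"""
summary = conn.execute(SUMMARY_QUERY).fetchall()
```

The LEFT JOIN from movies keeps a movie in the summary even if nobody rated it, in which case its count is 0 and its average is NULL.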

Anticipated Challenges

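The main challenge is missing data, since not every participant will rate every movie. Unrated movies will be stored as NULL in the database, which R reads as NA, and these values will be excluded from summary calculations (for example, with na.rm = TRUE) rather than treated as zeros. A second challenge is joining the tables correctly. I will address this by checking that every user ID and movie ID in the ratings table matches a row in the users and movies tables before relying on the joined results.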
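One way the ID validation could be done is with two simple checks run before any analysis. The sketch below again uses Python's sqlite3 module as a harness around the SQL. The helper names orphan_ratings and join_preserves_rows are hypothetical, and the demo tables deliberately omit foreign keys and contain placeholder rows so that a bad ID can be shown.

```python
import sqlite3

def orphan_ratings(conn):
    """Return (user_id, movie_id) pairs in ratings that match no user or no movie."""
    return conn.execute("""
        SELECT r.user_id, r.movie_id
        FROM ratings AS r
        LEFT JOIN users  AS u ON u.user_id  = r.user_id
        LEFT JOIN movies AS m ON m.movie_id = r.movie_id
        WHERE u.user_id IS NULL OR m.movie_id IS NULL
    """).fetchall()

def join_preserves_rows(conn):
    """True if the three-table inner join returns exactly one row per ratings row."""
    n_ratings = conn.execute("SELECT COUNT(*) FROM ratings").fetchone()[0]
    n_joined = conn.execute("""
        SELECT COUNT(*) FROM ratings AS r
        JOIN users  AS u ON u.user_id  = r.user_id
        JOIN movies AS m ON m.movie_id = r.movie_id
    """).fetchone()[0]
    return n_ratings == n_joined

# Demo with placeholder rows; no foreign keys are declared, so a bad ID can slip in.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER);
INSERT INTO users  VALUES (1, 'user1');
INSERT INTO movies VALUES (1, 'Movie A');
INSERT INTO ratings VALUES (1, 1, 4), (1, 99, NULL);  -- movie 99 does not exist
""")
print(orphan_ratings(conn))       # [(1, 99)]
print(join_preserves_rows(conn))  # False
```

An empty result from the first check and True from the second would suggest the joins neither drop nor duplicate ratings.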