Data 607 Project 3

PART 1 - Project Description

Create a short document with the names of the group members. You should briefly describe the collaboration tool(s) you'll use as a group, including for communication, code sharing, and project documentation. You should have identified your data sources, where the data can be found, and how to load it. And you should have created at least a logical model for your normalized database, and produced an Entity-Relationship (ER) diagram documenting your database design.

Project 3 Team Members

Ariann Chai
Lwin Shwe
Chun Shing Leung

Project Tools

We will use a Slack channel and Zoom for collaboration and communication. We will use RStudio to write the code and publish it to RPubs (https://rpubs.com). We will use draw.io to create the ER diagram. The source CSV and RMD files are stored in a GitHub repository so that all team members can access the code. We will use Google Slides for the presentation.

Project 3 Data Sources

The data used to answer our questions is the “Netflix TV Shows and Movies” dataset from Kaggle.

Source: https://www.kaggle.com/datasets/victorsoeiro/netflix-tv-shows-and-movies/code

Source CSV (uncleaned copy used for this project): https://github.com/tonyCUNY/tonyCUNY/blob/main/titles.csv
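The GitHub link above opens an HTML page for the file rather than the file itself, so a loader needs the matching raw-file URL. The sketch below is an illustration in Python (standard library only), not the team's R code. It assumes GitHub's usual `raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>` layout and a branch name without slashes, and the helper names `blob_to_raw` and `load_titles` are hypothetical. In the team's R workflow, the same raw URL can be passed directly to `readr::read_csv()`.

```python
import csv
import io
import urllib.request
from urllib.parse import urlparse


def blob_to_raw(blob_url):
    """Convert a GitHub 'blob' page URL into the raw-file URL (hypothetical helper).

    Assumes the form https://github.com/<owner>/<repo>/blob/<branch>/<path>
    with a branch name that contains no slashes.
    """
    parts = urlparse(blob_url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner, repo, _, branch, *path = segments
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{'/'.join(path)}"


def load_titles(blob_url):
    """Download the CSV and return one dict per row, all values as strings (hypothetical helper)."""
    with urllib.request.urlopen(blob_to_raw(blob_url)) as resp:
        text = resp.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

For example, `load_titles("https://github.com/tonyCUNY/tonyCUNY/blob/main/titles.csv")` would request `https://raw.githubusercontent.com/tonyCUNY/tonyCUNY/main/titles.csv` and return every cell as a string; converting types is left to the tidying step.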

Questions/Model for This Data

Load the data into R using the tidyverse package
Tidy and transform the data
Visualize the relationships between variables
Answer questions such as:
What are the top 10 movies with the highest scores?
Which movies have the largest number of voters?
How does this information change across countries, release years, age ratings, and genres?
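The first two questions come down to filtering for movies and sorting by a numeric column, and the last one to grouping and averaging. A minimal Python sketch follows, working on rows like those produced by `csv.DictReader`. It assumes the Kaggle file's column names `title`, `type` (with movies marked `"MOVIE"`), `imdb_score`, `imdb_votes`, and `release_year`, which should be checked against the actual CSV, and the function names are hypothetical. In tidyverse terms this corresponds roughly to `filter()`, `slice_max()`, `group_by()`, and `summarise()`.

```python
from collections import defaultdict


def to_float(value):
    """Parse a CSV cell as a float; blank or missing cells become None."""
    value = (value or "").strip()
    return float(value) if value else None


def top_movies(rows, by="imdb_score", n=10):
    """Return (title, value) pairs for the n movies with the largest `by` value.

    Assumes a 'type' column where movies are marked "MOVIE" (check against the CSV).
    Rows with a blank `by` value are skipped.
    """
    scored = []
    for r in rows:
        if r.get("type") != "MOVIE":
            continue
        v = to_float(r.get(by))
        if v is not None:
            scored.append((r["title"], v))
    # Stable sort: titles with equal values keep their input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


def mean_by(rows, group_col, value_col="imdb_score"):
    """Average `value_col` within each value of `group_col`, ignoring blank cells."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        v = to_float(r.get(value_col))
        if v is None:
            continue
        key = r.get(group_col)
        sums[key] += v
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Calling `top_movies(rows, by="imdb_votes")` answers the voter question with the same code. Grouping by a list-valued column such as genres or countries this way would treat each whole list string as one group, so those columns are better split into one row per value first.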

ER Diagram

https://github.com/tonyCUNY/tonyCUNY/blob/main/ER2.jpg
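The ER diagram is an image, so its exact tables are not reproduced here. As one possible illustration of the normalization step (not necessarily the layout in ER2.jpg), a list-valued column can be moved into a bridge table with one row per (title, value) pair, plus a lookup table of distinct values. The Python sketch below assumes that columns such as genres are stored as bracketed list literals like `['drama', 'crime']` (worth confirming against the CSV) and that each title has an `id` column; the function names are hypothetical. In R, `tidyr::separate_rows()` can play a similar role once the brackets and quotes are stripped.

```python
import ast


def bridge_rows(rows, id_col, list_col):
    """Turn a list-valued column into (id, value) pairs for a bridge table.

    Assumes cells look like "['drama', 'crime']"; blank cells yield no pairs.
    ast.literal_eval parses Python literals only, so no arbitrary code runs.
    """
    pairs = []
    for r in rows:
        raw = (r.get(list_col) or "").strip()
        values = ast.literal_eval(raw) if raw else []
        for value in values:
            pairs.append((r[id_col], value))
    return pairs


def lookup_table(pairs):
    """Sorted distinct values from a bridge table, for a lookup table."""
    return sorted({value for _, value in pairs})
```

The resulting pairs would populate a hypothetical `title_genre` table keyed by (title id, genre), while single-valued columns such as the title, release year, and scores stay in the titles table.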