Erase this, but note that you can reuse some of the sections below for the final project report.
Please replace the descriptions and questions below each section header with your group's relevant information.
Everyone must submit a proposal. If you are working in a group, your proposal should be identical to those of your team members.
Team members:
Student 1: Student name
Student 2: Student name
Student 3: Student name
Our project investigates the main characteristics of popular movies in recent years. We will be using the data available at: https://github.com/amanda-nathan/top_1000_other_fields/blob/main/imdb_top_1000.csv
The raw data is included in the data folder.
Note 1: Erase this line and put a .csv file of your raw data in the included data folder.
Note 2: Erase this too, but remember what the posted Moodle project guidelines say about your dataset: It is important that you choose a manageable dataset. This means that the data should be readily accessible and large enough that multiple relationships can be explored. As such, your dataset must have at least 50 observations and more than 5 variables/attributes. (Exceptions can be made, but you must notify me beforehand.) Ideally, the dataset's variables should include both categorical and numerical variables.
Replace this with a draft of your introduction or motivation. The introduction should present your general research topic and your raw data (where it came from, how it was collected, what the cases are, what the variables are, etc.).
Below is an example of what I mean. Replace with your data.
library(tidyverse)
# Read the data directly from the GitHub repository
movies <- read_csv("https://raw.githubusercontent.com/amanda-nathan/top_1000_other_fields/main/imdb_top_1000.csv")
glimpse(movies)
## Rows: 1,000
## Columns: 16
## $ Poster_Link <chr> "https://m.media-amazon.com/images/M/MV5BMDFkYTc0MGEtZmN…
## $ Series_Title <chr> "The Shawshank Redemption", "The Godfather", "The Dark K…
## $ Released_Year <chr> "1994", "1972", "2008", "1974", "1957", "2003", "1994", …
## $ Certificate <chr> "A", "A", "UA", "A", "U", "U", "A", "A", "UA", "A", "U",…
## $ Runtime <chr> "142 min", "175 min", "152 min", "202 min", "96 min", "2…
## $ Genre <chr> "Drama", "Crime, Drama", "Action, Crime, Drama", "Crime,…
## $ IMDB_Rating <dbl> 9.3, 9.2, 9.0, 9.0, 9.0, 8.9, 8.9, 8.9, 8.8, 8.8, 8.8, 8…
## $ Overview <chr> "Two imprisoned men bond over a number of years, finding…
## $ Meta_score <dbl> 80, 100, 84, 90, 96, 94, 94, 94, 74, 66, 92, 82, 90, 87,…
## $ Director <chr> "Frank Darabont", "Francis Ford Coppola", "Christopher N…
## $ Star1 <chr> "Tim Robbins", "Marlon Brando", "Christian Bale", "Al Pa…
## $ Star2 <chr> "Morgan Freeman", "Al Pacino", "Heath Ledger", "Robert D…
## $ Star3 <chr> "Bob Gunton", "James Caan", "Aaron Eckhart", "Robert Duv…
## $ Star4 <chr> "William Sadler", "Diane Keaton", "Michael Caine", "Dian…
## $ No_of_Votes <dbl> 2343110, 1620367, 2303232, 1129952, 689845, 1642758, 182…
## $ Gross <dbl> 28341469, 134966411, 534858444, 57300000, 4360000, 37784…
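The output above shows that Runtime and Released_Year are stored as character columns, so they would need converting before any numerical summaries. Below is a minimal sketch of one way to do that, not a required step. The `movies_sample` tibble is a hypothetical stand-in built from the first three rows shown above so that the example runs on its own, and the column name `Runtime_min` is also an assumption. In practice you would apply the same `mutate()` call to `movies`.

```r
library(tidyverse)

# Hypothetical stand-in for `movies`: the first three rows from the glimpse output
movies_sample <- tibble(
  Series_Title  = c("The Shawshank Redemption", "The Godfather", "The Dark Knight"),
  Released_Year = c("1994", "1972", "2008"),
  Runtime       = c("142 min", "175 min", "152 min")
)

movies_clean <- movies_sample %>%
  mutate(
    # parse_number() keeps the numeric part of strings such as "142 min"
    Runtime_min = parse_number(Runtime),
    # as.integer() converts year strings; any non-numeric entry would become NA with a warning
    Released_Year = as.integer(Released_Year)
  )

glimpse(movies_clean)
```

Keeping the original Runtime column alongside the new numeric one makes it easy to check that the conversion behaved as expected.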
Erase this, but note that what you are doing here is using a markdown table to do this.
This section should include exploratory data analysis of the data source. The proposal should include preliminary investigations of the following:
Using what you learn from the exploratory data analysis as a guide, formulate research questions that you can explore with the data chosen for your project.