Final Project Plan and Data Source Proposal

Erase this line, but note that you can reuse some of the sections below for the final project report.

Please replace the descriptions and questions below each section header with your group’s relevant information.

Everyone needs to submit a proposal. If you are working in a group, your proposal should be identical to the one submitted by your other team members.

Title of your project goes here

Team members:

Student 1: Student name
Student 2: Student name
Student 3: Student name

Overview

Our project investigates the main characteristics of popular movies in recent years. We will be using the data available at: https://github.com/amanda-nathan/top_1000_other_fields/blob/main/imdb_top_1000.csv

The raw data is included in the data folder.

Note 1: erase this line and put a .csv file of your raw data in the included data folder.

Note 2: erase this too, but remember what the posted Moodle project guidelines say about your dataset: It is important that you choose a manageable dataset. This means that the data should be readily accessible and large enough that multiple relationships can be explored. As such, your dataset must have at least 50 observations and more than 5 variables/attributes. (Exceptions can be made, but you must notify me beforehand.) Ideally, the dataset’s variables should include both categorical and numerical variables.

Section 1. Introduction

Replace this with a draft of your introduction or motivation here. The introduction should introduce your general research topic and your raw data (where it came from, how it was collected, what are the cases, what are the variables, etc.).

Section 2. Data Description

Print out a “glimpse” of the data frame.

Below is an example of what I mean. Replace with your data.

library(tidyverse)
# Read the raw data directly from the GitHub repository
movies <- read_csv("https://raw.githubusercontent.com/amanda-nathan/top_1000_other_fields/main/imdb_top_1000.csv")
glimpse(movies)

## Rows: 1,000
## Columns: 16
## $ Poster_Link   <chr> "https://m.media-amazon.com/images/M/MV5BMDFkYTc0MGEtZmN…
## $ Series_Title  <chr> "The Shawshank Redemption", "The Godfather", "The Dark K…
## $ Released_Year <chr> "1994", "1972", "2008", "1974", "1957", "2003", "1994", …
## $ Certificate   <chr> "A", "A", "UA", "A", "U", "U", "A", "A", "UA", "A", "U",…
## $ Runtime       <chr> "142 min", "175 min", "152 min", "202 min", "96 min", "2…
## $ Genre         <chr> "Drama", "Crime, Drama", "Action, Crime, Drama", "Crime,…
## $ IMDB_Rating   <dbl> 9.3, 9.2, 9.0, 9.0, 9.0, 8.9, 8.9, 8.9, 8.8, 8.8, 8.8, 8…
## $ Overview      <chr> "Two imprisoned men bond over a number of years, finding…
## $ Meta_score    <dbl> 80, 100, 84, 90, 96, 94, 94, 94, 74, 66, 92, 82, 90, 87,…
## $ Director      <chr> "Frank Darabont", "Francis Ford Coppola", "Christopher N…
## $ Star1         <chr> "Tim Robbins", "Marlon Brando", "Christian Bale", "Al Pa…
## $ Star2         <chr> "Morgan Freeman", "Al Pacino", "Heath Ledger", "Robert D…
## $ Star3         <chr> "Bob Gunton", "James Caan", "Aaron Eckhart", "Robert Duv…
## $ Star4         <chr> "William Sadler", "Diane Keaton", "Michael Caine", "Dian…
## $ No_of_Votes   <dbl> 2343110, 1620367, 2303232, 1129952, 689845, 1642758, 182…
## $ Gross         <dbl> 28341469, 134966411, 534858444, 57300000, 4360000, 37784…
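
The glimpse above shows Released_Year and Runtime stored as character columns, so they may need converting before any numerical summaries. The following is a minimal sketch of one way to do that, assuming readr's parse_integer() and parse_number(). It uses three rows copied from the glimpse output as a stand-in for the full data frame, and the names movies_sample, movies_clean, and Runtime_min are hypothetical.

```r
library(tidyverse)

# Three rows copied from the glimpse output, standing in for the full data.
# movies_sample is a hypothetical name used only for this illustration.
movies_sample <- tibble(
  Series_Title  = c("The Shawshank Redemption", "The Godfather", "The Dark Knight"),
  Released_Year = c("1994", "1972", "2008"),
  Runtime       = c("142 min", "175 min", "152 min")
)

movies_clean <- movies_sample %>%
  mutate(
    # Convert the year from text to integer; values that are not
    # valid integers become NA with a warning
    Released_Year = parse_integer(Released_Year),
    # Extract the number from strings such as "142 min"
    # (Runtime_min is a hypothetical new column name)
    Runtime_min = parse_number(Runtime)
  )

glimpse(movies_clean)
```

Running the same mutate() on the full movies data frame should work the same way, though it is worth checking for any NA values it introduces.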

Replace this with a data dictionary that is neatly formatted and easy to read.

Erase this, but note that you should use a markdown table for the data dictionary.
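
A markdown table can be typed by hand, or it can be generated from R. The sketch below shows one possible approach, assuming the knitr package is installed and using its kable() function with format = "pipe". The variable names and types come from the glimpse above, but the descriptions are placeholder wording that you should replace with your own, and data_dictionary is a hypothetical name.

```r
library(tidyverse)
library(knitr)

# One row per variable. The descriptions are illustrative placeholders,
# not official definitions from the data source.
data_dictionary <- tribble(
  ~Variable,      ~Type,       ~Description,
  "Series_Title", "character", "Title of the movie",
  "Runtime",      "character", "Running time stored as text, such as '142 min'",
  "IMDB_Rating",  "numeric",   "IMDb rating of the movie",
  "No_of_Votes",  "numeric",   "Number of votes the movie received"
)

# Print the data dictionary as a pipe-style markdown table
kable(data_dictionary, format = "pipe")
```

Only four of the 16 variables are listed here to keep the example short; a complete data dictionary would include every variable.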

Section 3. Preliminary Exploratory Data Analysis

For the data source proposal, this section should include preliminary exploratory data analysis covering the following:

Univariate summary statistics and data visualizations.
Bivariate and/or multivariate data visualizations and summary statistics if applicable.
Narrative about what you observe in the exploratory data analysis and what you learn about the data from the exploratory data analysis.
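
The items listed above could be sketched roughly as follows. This is only an illustration, not a required approach. It uses the first five values of IMDB_Rating, Meta_score, and Certificate copied from the glimpse output in Section 2 as a stand-in for a full data frame, and the object names (movies_sample, rating_summary, and so on) are hypothetical.

```r
library(tidyverse)

# First five values of three variables, copied from the glimpse output.
# movies_sample is a hypothetical stand-in for the full data frame.
movies_sample <- tibble(
  IMDB_Rating = c(9.3, 9.2, 9.0, 9.0, 9.0),
  Meta_score  = c(80, 100, 84, 90, 96),
  Certificate = c("A", "A", "UA", "A", "U")
)

# Univariate summary statistics for a numerical variable
rating_summary <- movies_sample %>%
  summarize(
    n             = n(),
    mean_rating   = mean(IMDB_Rating),
    median_rating = median(IMDB_Rating),
    sd_rating     = sd(IMDB_Rating)
  )

# Univariate summary for a categorical variable: counts per category
certificate_counts <- movies_sample %>%
  count(Certificate, sort = TRUE)

# Univariate visualization: histogram of ratings
rating_hist <- ggplot(movies_sample, aes(x = IMDB_Rating)) +
  geom_histogram(binwidth = 0.1) +
  labs(x = "IMDB rating", y = "Number of movies")

# Bivariate visualization: scatterplot of two numerical variables
score_scatter <- ggplot(movies_sample, aes(x = Meta_score, y = IMDB_Rating)) +
  geom_point() +
  labs(x = "Meta score", y = "IMDB rating")

rating_summary
certificate_counts
```

Printing rating_hist or score_scatter displays the corresponding plot. With a full dataset, remember that summary functions such as mean() return NA when a column contains missing values unless na.rm = TRUE is set.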

Section 4. Research Questions

Using what you learn from the exploratory data analysis as a guide, formulate research questions that you can explore with the data chosen for your project.