Data Preparation

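# load data
# install.packages("dplyr")

# install nycflights13 only if it is not already available
if (!requireNamespace("nycflights13", quietly = TRUE)) {
  install.packages("nycflights13", repos = "https://cloud.r-project.org")
}

library(nycflights13)

# one row per flight departing NYC in 2013
fl_data <- flights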

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

Among flights that departed New York City in 2013, is a flight's origin airport, or its destination airport, significantly associated with its arrival delay? Because the data are observational, the question asks about association rather than a causal effect.
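One possible way to test this question is a one-way ANOVA of arrival delay on each explanatory variable. The sketch below assumes that ANOVA is an acceptable first test; arrival delays are strongly right-skewed, so a Kruskal-Wallis test (a swapped-in non-parametric alternative, not something the proposal specifies) is included for comparison. The names delays, fit_origin, and fit_dest are illustrative.

```r
library(nycflights13)

# Keep only flights with a recorded arrival delay (e.g. cancelled flights are NA)
delays <- flights[!is.na(flights$arr_delay), c("arr_delay", "origin", "dest")]

# Treat the airport codes explicitly as categorical variables
delays$origin <- factor(delays$origin)
delays$dest <- factor(delays$dest)

# One-way ANOVA: does mean arrival delay differ by origin airport?
fit_origin <- aov(arr_delay ~ origin, data = delays)
summary(fit_origin)

# Same idea with destination airport as the explanatory variable
fit_dest <- aov(arr_delay ~ dest, data = delays)
summary(fit_dest)

# Rank-based alternative that does not assume normally distributed delays
kruskal.test(arr_delay ~ origin, data = delays)
```

ANOVA also assumes roughly equal variances across groups, which is worth checking against the group SDs before relying on its p-value.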

Cases

What are the cases, and how many are there?

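The cases are individual flights: each row of the flights data frame records one flight that departed NYC in 2013. There are 336,776 cases.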
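Since each row is one flight, the number of cases is simply the row count of the data frame. A minimal check, assuming the nycflights13 package is installed (n_cases is an illustrative name):

```r
library(nycflights13)

# One row per flight, so the row count is the number of cases
n_cases <- nrow(flights)
n_cases

# Number of variables recorded for each flight
ncol(flights)
```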

Data collection

Describe the method of data collection.

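The data were not collected for this project. The nycflights13 package provides airline on-time data for all flights departing NYC (JFK, LGA, and EWR) in 2013, along with ‘metadata’ on airlines, airports, weather, and planes. According to the package documentation, the flight records come from the US Bureau of Transportation Statistics (BTS).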

https://cran.r-project.org/web/packages/nycflights13/nycflights13.pdf

https://github.com/hadley/nycflights13/blob/master/data/flights.rda?raw=true
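The collection scope described above (flights departing the three NYC airports during 2013) can be confirmed directly from the data. A short sketch, assuming the package is loaded:

```r
library(nycflights13)

# Departure airports and the number of flights from each
table(flights$origin)

# All flights should fall within 2013
range(flights$year)

# The metadata tables shipped with the package
sapply(list(airlines = airlines, airports = airports,
            planes = planes, weather = weather), nrow)
```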

Type of study

What type of study is this (observational/experiment)?

Observational

Data Source

If you collected the data, state self-collected. If not, provide a citation/link.

Not self-collected. Source: the nycflights13 R package by Hadley Wickham, which contains airline on-time data for all flights departing NYC in 2013 plus ‘metadata’ on airlines, airports, weather, and planes.

https://cran.r-project.org/web/packages/nycflights13/nycflights13.pdf

https://github.com/hadley/nycflights13/blob/master/data/flights.rda?raw=true

Response

What is the response variable, and what type is it (numerical/categorical)?

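The response variable is arrival delay (arr_delay, in minutes; negative values are early arrivals). It is numerical.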

Explanatory

What is the explanatory variable, and what type is it (numerical/categorical)?

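There are two explanatory variables, origin airport (origin) and destination airport (dest). Both are categorical, recorded as airport codes.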
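The variable types can be checked in R. In this sketch the airport codes are stored as character vectors, which R treats as categorical when they are used in a model formula or grouping:

```r
library(nycflights13)

# Response: arrival delay in minutes (stored as a double)
class(flights$arr_delay)
summary(flights$arr_delay)  # also reports the number of missing values

# Explanatory variables: airport codes (categorical)
class(flights$origin)
class(flights$dest)

# Number of distinct categories in each
length(unique(flights$origin))
length(unique(flights$dest))
```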

Relevant summary statistics

Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
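A sketch of the requested summary statistics, assuming dplyr is installed (it is only referenced in a commented-out install line above). It reports the sample size, mean, SD, and median of arrival delay for each origin airport, then the same statistics by destination. Flights with a missing arr_delay are dropped first; by_origin and by_dest are illustrative names.

```r
library(nycflights13)
library(dplyr)

# Arrival delay (minutes) summarised by origin airport
by_origin <- flights %>%
  filter(!is.na(arr_delay)) %>%
  group_by(origin) %>%
  summarise(
    n = n(),
    mean_delay = mean(arr_delay),
    sd_delay = sd(arr_delay),
    median_delay = median(arr_delay)
  )
by_origin

# Same statistics by destination airport, largest mean delay first
by_dest <- flights %>%
  filter(!is.na(arr_delay)) %>%
  group_by(dest) %>%
  summarise(
    n = n(),
    mean_delay = mean(arr_delay),
    sd_delay = sd(arr_delay)
  ) %>%
  arrange(desc(mean_delay))
head(by_dest, 10)
```

The SD is NA for any destination with only one recorded flight, so very small groups may be worth filtering out (for example with filter(n >= 30)) before comparing destinations.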