# load data
library(dplyr)
library(nycflights13)
library(ggplot2)
head(flights,10)
names(flights)
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
Is there a relationship between origin airport and the probability of a departure delay?
What are the cases, and how many are there?
nrow(flights)
## [1] 336776
summary(flights$year)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2013 2013 2013 2013 2013 2013
Each case is one flight that departed from EWR, JFK, or LGA in 2013.
There are 336776 cases.
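Not every case has a recorded departure delay, which is why the frequency tables later in this document sum to fewer than 336776 flights (`table()` drops missing values by default). A minimal sketch of the check, assuming the `flights` data from ‘nycflights13’ as loaded above; the name `n_missing` is introduced here only for illustration.

```r
library(nycflights13)

# Number of flights with no recorded departure delay (NA in dep_delay).
n_missing <- sum(is.na(flights$dep_delay))
n_missing
# 8255

# Flights that do have a recorded departure delay; only these can be
# classified as delayed or not.
n_recorded <- nrow(flights) - n_missing
n_recorded
# 328521
```

The 328521 flights with a recorded delay are the ones that appear in the delay tables; the remaining 8255 are silently excluded unless `useNA = "ifany"` is passed to `table()`.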
Describe the method of data collection.
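The data were not self-collected. They come from the ‘nycflights13’ R package, which contains on-time data for all flights departing NYC in 2013; the package documentation gives the US Bureau of Transportation Statistics as the source.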
What type of study is this (observational/experiment)?
Observational study
If you collected the data, state self-collected. If not, provide a citation/link.
What is the response variable? Is it quantitative or qualitative?
The response variable is whether a flight's departure was delayed. It is qualitative, with two levels (‘delay’ and ‘ontime’).
# dep_delay is the departure delay in minutes; negative values are early departures.
# Flights that left at or after the scheduled time are labelled 'delay'.
flights <- flights %>%
  mutate(dep_delay_bool = ifelse(dep_delay >= 0, 'delay', 'ontime'))
class(flights$dep_delay_bool)
## [1] "character"
You should have two independent variables, one quantitative and one qualitative.
class(flights$origin)
## [1] "character"
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
table(flights$dep_delay_bool)
##
## delay ontime
## 144946 183575
prop.table(table(flights$dep_delay_bool))
##
## delay ontime
## 0.4412077 0.5587923
table(flights$origin)
##
## EWR JFK LGA
## 120835 111279 104662
prop.table(table(flights$origin))
##
## EWR JFK LGA
## 0.3587993 0.3304244 0.3107763
table(flights$origin, flights$dep_delay_bool)
##
## delay ontime
## EWR 58296 59300
## JFK 48270 61146
## LGA 38380 63129
prop.table(table(flights$origin, flights$dep_delay_bool),1)
##
## delay ontime
## EWR 0.4957311 0.5042689
## JFK 0.4411603 0.5588397
## LGA 0.3780946 0.6219054
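The prompt above also asks for visualizations, and ‘ggplot2’ is loaded but not yet used. The following is a minimal sketch, not a definitive choice of plot: a boxplot of the raw departure delay by origin, plus stacked proportion bars for the binary response when the `dep_delay_bool` column created earlier is available. The object names `p_box` and `p_bar` and the y-axis window of -20 to 60 minutes are assumptions made for illustration.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Boxplot of departure delay (minutes) by origin airport.
# coord_cartesian() zooms the y-axis without removing the long right tail
# from the boxplot statistics; the window itself is an arbitrary choice.
p_box <- flights %>%
  filter(!is.na(dep_delay)) %>%
  ggplot(aes(x = origin, y = dep_delay)) +
  geom_boxplot(outlier.alpha = 0.1) +
  coord_cartesian(ylim = c(-20, 60)) +
  labs(x = "Origin airport", y = "Departure delay (minutes)")
print(p_box)

# Stacked proportion bars for the binary response. This part only runs if
# the dep_delay_bool column defined earlier in the document exists.
if ("dep_delay_bool" %in% names(flights)) {
  p_bar <- flights %>%
    filter(!is.na(dep_delay_bool)) %>%
    ggplot(aes(x = origin, fill = dep_delay_bool)) +
    geom_bar(position = "fill") +
    labs(x = "Origin airport", y = "Proportion of flights", fill = "Departure")
  print(p_bar)
}
```

The bar chart shows the same row proportions as the last `prop.table()` call, while the boxplot shows how the underlying delay in minutes is distributed at each airport.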