Data Preparation

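# Load packages (the data itself is read from GitHub below)
#install.packages("fueleconomy")
library(plyr)       # load plyr before the tidyverse so that dplyr's functions mask plyr's, not the reverse
library(tidyverse)  # attaches dplyr, tidyr, readr (for read_csv) and ggplot2, among others
library(tidyselect)
#data("vehicles")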

I saved the data as a CSV file on my local drive and uploaded it to my GitHub repository, so that anyone running this file can access it.

#write.csv(vehicles, file = "vehicles.csv")  # row.names = TRUE by default, which is where the index column X1 seen below comes from
#vehicles <- read.csv(file = "vehicles.csv")
#head(vehicles)
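The local CSV and the GitHub copy can be combined into a single read step. The sketch below uses a hypothetical helper, `read_vehicles()`, that reads `vehicles.csv` when it exists locally and otherwise falls back to the GitHub raw URL used in this proposal. The helper name and the fallback logic are assumptions for illustration, not part of the original workflow.

```r
library(readr)

# Hypothetical helper: prefer a local copy of the data, otherwise download it.
# The default URL is the GitHub raw file this proposal reads from.
read_vehicles <- function(
    local_path = "vehicles.csv",
    url = "https://raw.githubusercontent.com/igukusamuel/DATA-606-Project-Proposal/master/vehicles.csv") {
  path <- if (file.exists(local_path)) local_path else url
  read_csv(path)
}
```

Usage: `vehicles <- read_vehicles()`. Reading locally first avoids a network round trip on repeated knits; the GitHub copy remains the fallback for anyone who does not have the file.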

Reading the data from my GitHub repository:

vehicles <- read_csv("https://raw.githubusercontent.com/igukusamuel/DATA-606-Project-Proposal/master/vehicles.csv")
head(vehicles)
## # A tibble: 6 x 13
##      X1    id make   model   year class trans drive   cyl displ fuel    hwy
##   <dbl> <dbl> <chr>  <chr>  <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <dbl>
## 1     1 27550 AM Ge~ DJ Po~  1984 Spec~ Auto~ 2-Wh~     4   2.5 Regu~    17
## 2     2 28426 AM Ge~ DJ Po~  1984 Spec~ Auto~ 2-Wh~     4   2.5 Regu~    17
## 3     3 27549 AM Ge~ FJ8c ~  1984 Spec~ Auto~ 2-Wh~     6   4.2 Regu~    13
## 4     4 28425 AM Ge~ FJ8c ~  1984 Spec~ Auto~ 2-Wh~     6   4.2 Regu~    13
## 5     5  1032 AM Ge~ Post ~  1985 Spec~ Auto~ Rear~     4   2.5 Regu~    17
## 6     6  1033 AM Ge~ Post ~  1985 Spec~ Auto~ Rear~     6   4.2 Regu~    13
## # ... with 1 more variable: cty <dbl>
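The `X1` column in the output above is a row index: `write.csv()` writes row names by default, and `read_csv()` turns that unnamed column into `X1` (more recent readr versions may name it `...1` instead). It carries no information about the vehicles. A minimal sketch of dropping it, using a hypothetical helper name:

```r
library(dplyr)

# Hypothetical helper: remove the leftover row-index column, whichever name
# readr gave it. any_of() silently ignores names that are not present.
drop_index_column <- function(df) {
  select(df, -any_of(c("X1", "...1")))
}
```

Usage: `vehicles <- drop_index_column(vehicles)`. Writing the CSV with `row.names = FALSE` would avoid the column in the first place.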

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

Is the number of cylinders in a vehicle predictive of fuel economy?
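One way to make this question testable, assuming a straight-line relationship is an adequate first model (the proposal does not specify a method), is a simple linear regression of highway mpg on cylinder count, followed by a t-test of whether the slope differs from zero. A sketch with a hypothetical helper:

```r
# Hypothetical helper: simple linear regression of highway mpg (hwy) on
# cylinder count (cyl). Under R's default na.action (na.omit), rows with a
# missing cyl or hwy are dropped.
fit_hwy_on_cyl <- function(df) {
  lm(hwy ~ cyl, data = df)
}
```

Usage: `summary(fit_hwy_on_cyl(vehicles))` reports the estimated change in highway mpg per additional cylinder and the p-value for the `cyl` slope. Fitting `hwy ~ factor(cyl)` instead would treat cylinder count as a qualitative variable and estimate a separate mean for each count.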

Cases

What are the cases, and how many are there?

Each case is a vehicle: a specific make, model, and model year with a given engine and transmission configuration. There are 33,442 cases in the data set.

n_cases <- nrow(vehicles)  # avoid naming a variable `nrow`, which shadows the base function
n_cases
## [1] 33442
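Not every case can be used for the research question. The summary of `cyl` below reports 58 missing values and the summary of `hwy` reports none, so 33,442 - 58 = 33,384 rows have both variables. A sketch of counting them directly, with a hypothetical helper:

```r
# Hypothetical helper: count rows with no missing values in the given columns
# (by default, the two variables the research question needs).
count_usable_cases <- function(df, vars = c("cyl", "hwy")) {
  sum(complete.cases(df[vars]))
}
```

Usage: `count_usable_cases(vehicles)`.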

Data collection

Describe the method of data collection.

Fuel economy data for all cars sold in the US from 1984 to 2015 are collected by the Environmental Protection Agency (EPA) (source: https://www.fueleconomy.gov/feg/download.shtml). The figures come from vehicle testing at the EPA's National Vehicle and Fuel Emissions Laboratory in Ann Arbor, Michigan, and from testing by vehicle manufacturers under EPA oversight.

Type of study

What type of study is this (observational/experiment)?

This is an observational study.

Data Source

If you collected the data, state self-collected. If not, provide a citation/link.

Fuel economy data for all cars sold in the US from 1984 to 2015 are collected by the Environmental Protection Agency and published at https://www.fueleconomy.gov/feg/download.shtml. We extracted the data with the R package fueleconomy (install.packages("fueleconomy")).

Dependent Variable

What is the response variable? Is it quantitative or qualitative?

The response variable is highway fuel economy (hwy, in miles per gallon); it is quantitative.

Independent Variable

You should have two independent variables, one quantitative and one qualitative.

The independent variable is the number of cylinders in the vehicle (cyl); it is quantitative (a discrete count).

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

summary(vehicles$cyl)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   2.000   4.000   6.000   5.772   6.000  16.000      58
hist(vehicles$cyl, breaks = 5)
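Because `cyl` is a discrete count, a histogram can blur it: in `hist()`, a single number for `breaks` is only a suggestion, so neighbouring cylinder counts may share a bin. A frequency table shows each count exactly. A sketch with a hypothetical helper:

```r
# Hypothetical helper: frequency of each cylinder count.
# useNA = "ifany" adds a count of missing values when any are present.
cyl_counts <- function(df) {
  table(cyl = df$cyl, useNA = "ifany")
}
```

Usage: `cyl_counts(vehicles)`, or `barplot(cyl_counts(vehicles), xlab = "Cylinders", ylab = "Vehicles")` for a bar chart with one bar per count.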

summary(vehicles$hwy)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9.00   19.00   23.00   23.55   27.00  109.00
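The maximum of 109 mpg lies far from the rest of the distribution. One common screen, used here only as an illustration, is Tukey's fences: values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR. From the quartiles reported above, IQR = 27 - 19 = 8, so the fences are 19 - 12 = 7 and 27 + 12 = 39; no value falls below the lower fence (the minimum is 9), while the maximum is well above the upper one. A sketch with a hypothetical helper:

```r
# Hypothetical helper: TRUE for values outside Tukey's fences
# [Q1 - k * IQR, Q3 + k * IQR]. Quartiles use quantile()'s default
# type 7, the same definition summary() reports. Missing values give FALSE.
outside_fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  !is.na(x) & (x < q[1] - k * iqr | x > q[2] + k * iqr)
}
```

Usage: `sum(outside_fences(vehicles$hwy))` counts the flagged cars, and `vehicles[outside_fences(vehicles$hwy), ]` lists them for inspection. `boxplot()` computes its hinges with `fivenum()`, so its whiskers can differ slightly from these fences.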
hist(vehicles$hwy, breaks = 30)

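# With a single argument, line() fits hwy against its row index (1, 2, ..., n),
# so the slope below describes row order rather than cylinder count.
line(vehicles$hwy)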
## 
## Call:
## line(vehicles$hwy)
## 
## Coefficients:
## [1]  2.086e+01  1.346e-04
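Called with two arguments, `line()` fits Tukey's resistant line of y on x. With `cyl` as x and `hwy` as y, the line relates highway mpg to cylinder count and is less sensitive than least squares to extreme values such as the 109 mpg maximum. A sketch, using a hypothetical wrapper that first drops incomplete rows:

```r
# Hypothetical wrapper: Tukey's resistant line of highway mpg (y) on
# cylinder count (x), fitted only to rows where both values are present.
resistant_line_hwy_cyl <- function(df) {
  ok <- complete.cases(df$cyl, df$hwy)
  line(df$cyl[ok], df$hwy[ok])
}
```

Usage: `coef(resistant_line_hwy_cyl(vehicles))` returns the intercept and the slope in mpg per cylinder, which can be compared with the least-squares slope from `lm(hwy ~ cyl)`.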
plot(vehicles$cyl, vehicles$hwy, xlab = "Cylinders", ylab = "Highway fuel economy (mpg)")

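# Same relationship with the axes swapped (response on the x-axis)
plot(vehicles$hwy, vehicles$cyl, xlab = "Highway fuel economy (mpg)", ylab = "Cylinders")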
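Because `cyl` takes only a handful of values, summarising `hwy` separately for each cylinder count gives a more direct view of the research question than a scatter plot in which many points overlap. A sketch using dplyr, with a hypothetical helper name:

```r
library(dplyr)

# Hypothetical helper: highway mpg summarised for each cylinder count.
# Rows missing cyl or hwy are dropped before grouping.
hwy_by_cyl <- function(df) {
  df %>%
    filter(!is.na(cyl), !is.na(hwy)) %>%
    group_by(cyl) %>%
    summarise(
      n = n(),
      mean_hwy = mean(hwy),
      median_hwy = median(hwy),
      sd_hwy = sd(hwy),  # NA for a group containing a single vehicle
      .groups = "drop"
    )
}
```

Usage: `hwy_by_cyl(vehicles)`. A matching side-by-side boxplot can be drawn with `boxplot(hwy ~ cyl, data = vehicles, xlab = "Cylinders", ylab = "Highway fuel economy (mpg)")`.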