Overview

This project examines whether there is a relationship between the speed of a car and the distance it takes to stop.

Introduction

The data used for this project come from the cars dataset in the datasets package in R. They were first published in Mordecai Ezekiel's 1930 book Methods of Correlation Analysis.

Exploring the Data

# Store data in environment
cars <- cars

# View the structure of dataset
str(cars)
## 'data.frame':    50 obs. of  2 variables:
##  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
##  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...

# Preview of first few lines of data
head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

The dataset contains 50 observations. Two variables were recorded for each observation: the speed of the car (in MPH) and the stopping distance (in feet). There is no missing data in the dataset.
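The claim that nothing is missing can be checked directly rather than read off the preview. The lines below are a minimal sketch using only base R on the built-in cars dataset; the variable names missing_per_column and no_missing are chosen here for illustration.

```r
# Count missing values (NA) in each column of the built-in cars dataset
missing_per_column <- colSums(is.na(cars))
print(missing_per_column)

# TRUE only if every row has a value for both speed and dist
no_missing <- all(complete.cases(cars))
print(no_missing)
```

Both columns should report a count of zero if the dataset is complete.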

# Summary of data
summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

The speeds of the cars range from 4 MPH to 25 MPH. The stopping distances range from 2 feet to 120 feet.

To see if there is a relationship between the two variables, we should look at a scatterplot of the data.

# Exploring data with a scatterplot
plot(cars$speed, cars$dist,
     xlab = "Speed (MPH)",
     ylab = "Distance (feet)",
     main = "Stopping Distance of Cars Based on Speed")

Based on the scatterplot, there appears to be a positive linear relationship between the two variables: stopping distance tends to increase as speed increases.

Analysis

Correlation Coefficient

# Calculate correlation coefficient
cor(cars$speed, cars$dist)
## [1] 0.8068949

The correlation coefficient is r = 0.807, which indicates a strong positive linear correlation between the two variables in this sample of 50 observations.
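Whether a correlation of this size is statistically significant for n = 50 can be checked with a hypothesis test. The following is a sketch, not part of the original analysis, using cor.test() from the stats package. It tests the null hypothesis that the population correlation is zero. The object name ct is chosen here for illustration.

```r
# Pearson correlation test; the null hypothesis is that the true correlation is 0
ct <- cor.test(cars$speed, cars$dist, method = "pearson")

# Sample correlation coefficient r
print(ct$estimate)

# p-value for the null hypothesis of zero correlation
print(ct$p.value)

# 95% confidence interval for the population correlation
print(ct$conf.int)
```

A p-value below the chosen significance level (commonly 0.05) would support the conclusion that the linear correlation is not due to chance alone.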

Linear Regression

# Create linear regression model
mod1 <- lm(dist~speed, data = cars)
mod1
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Coefficients:
## (Intercept)        speed  
##     -17.579        3.932

The regression equation of the linear model, with distance as the response variable (y) and speed as the explanatory variable (x), is \(y = 3.932x - 17.579\).
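As an illustration of how the regression equation can be used, the sketch below estimates stopping distances at a few speeds with predict() and repeats one calculation by hand from the coefficients. The speeds 10, 15, and 20 MPH are arbitrary example values inside the observed range. The names new_speeds, predicted, and by_hand are chosen here for illustration.

```r
# Refit the model so this example is self-contained
mod1 <- lm(dist ~ speed, data = cars)

# Example speeds (MPH) within the observed range of 4 to 25
new_speeds <- data.frame(speed = c(10, 15, 20))

# Predicted stopping distances (feet) from the fitted line
predicted <- predict(mod1, newdata = new_speeds)
print(cbind(new_speeds, predicted_dist = predicted))

# The same prediction at 15 MPH computed by hand: intercept + slope * 15
b <- coef(mod1)
by_hand <- unname(b["(Intercept)"] + b["speed"] * 15)
print(by_hand)
```

With the rounded coefficients, the estimate at 15 MPH is 3.932 × 15 − 17.579 ≈ 41.4 feet. The same equation gives a negative distance at 4 MPH (3.932 × 4 − 17.579 ≈ −1.85 feet), so the line should be read as an approximation over the middle of the observed range, not as a physical law.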

Explained Variation

# Extract R-squared from the summary of our linear model
summary(mod1)$r.squared
## [1] 0.6510794

The model has a coefficient of determination of \(r^2 = 0.6511\). That means approximately 65.11% of the variation in stopping distance is explained by the speed of the car.
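For a simple linear regression with one explanatory variable, \(r^2\) is the square of the correlation coefficient, and it also equals one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below checks both identities on this data. The variable names are chosen here for illustration.

```r
# Refit the model so this example is self-contained
mod1 <- lm(dist ~ speed, data = cars)

# R-squared as reported by the model summary
r_squared_from_model <- summary(mod1)$r.squared

# R-squared as the square of the correlation coefficient
r_squared_from_r <- cor(cars$speed, cars$dist)^2

# R-squared as 1 - SSE / SST (unexplained variation over total variation)
sse <- sum(residuals(mod1)^2)
sst <- sum((cars$dist - mean(cars$dist))^2)
r_squared_from_ss <- 1 - sse / sst

print(c(model = r_squared_from_model,
        from_r = r_squared_from_r,
        from_ss = r_squared_from_ss))
```

All three values should agree up to floating-point rounding.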

Conclusions

Based on the data used in this analysis, the stopping distance of a car does appear to depend on the speed at which the car was traveling. More specifically, within the observed range of 4 to 25 MPH, each additional 1 MPH of speed is associated with an average increase of about 3.9 feet in the distance needed to come to a complete stop.
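The slope used in this conclusion is a point estimate from one sample of 50 observations. As a sketch of how its uncertainty could be quantified (this step is not part of the original analysis), confint() from the stats package returns a confidence interval for the slope. The name slope_ci is chosen here for illustration.

```r
# Refit the model so this example is self-contained
mod1 <- lm(dist ~ speed, data = cars)

# 95% confidence interval for the slope (feet of stopping distance per MPH)
slope_ci <- confint(mod1, parm = "speed", level = 0.95)
print(slope_ci)
```

If the whole interval lies above zero, the data are consistent with stopping distance increasing as speed increases.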

Limitations

One major limitation of the study is that we have no other data on the cars that were used. Other variables, such as the weight or length of the car, may also have affected the stopping distances. We do not know whether the data were all collected from the same model of car, from just a few models, or from 50 different models. The results would be less reliable if several different models were used than if all of the data came from the same model.

Another limitation of the study is that the data were recorded in the 1920s. The results may not apply to cars today, since cars have evolved considerably since then.


This document was produced as a final project for MAT 143H - Introduction to Statistics (Honors) at North Shore Community College.
The course was led by Professor Billy Jackson.
Student Name: Billy Jackson
Semester: Fall 2017