Read in the mtcars dataset:

Make sure you have set RStudio’s Working Directory to your STATS 10 Lab Folder:

Session -> Set Working Directory -> Choose Directory

IMPORTANT: Run the chunk below FIRST! We will need this data to complete the lab!

# Glossary:
# mpg: Miles/(US) gallon
# cyl: Number of cylinders
# disp: Displacement (cu.in.)
# hp: Gross horsepower
# drat: Rear axle ratio
# wt: Weight (1000 lbs)
# qsec: 1/4 mile time
# vs: Engine (0 = V-shaped, 1 = straight)
# am: Transmission (0 = automatic, 1 = manual)
# gear: Number of forward gears 
# carb: Number of carburetors
# Source: https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/mtcars

mtcars_df <- mtcars
head(mtcars_df)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Code Examples

Scatterplot

# y ~ x means that we plot the variable y on the y-axis and the variable x on the x-axis
# We use these arguments to plot the data and label the scatter plot:
# main: Title the plot (character)
# xlab: Label the horizontal or x-axis (character)
# ylab: Label the vertical or y-axis (character)
# col: The colors of the points (character); see https://r-charts.com/colors/
# pch: The symbols of the points (numerical); see https://r-charts.com/base-r/pch-symbols/
# cex: The sizes of the points (numerical)

plot(
  mtcars_df$mpg ~ mtcars_df$wt,
  main = "Miles per Gallon versus Weight",
  xlab = "Weight (1000 lbs)",
  ylab = "Miles per Gallon",
  col = "red",
  pch = 3,
  cex = 1.5
)

Correlation

# cor(): Return the correlation coefficient between two vectors
cor(mtcars_df$mpg, mtcars_df$wt)
## [1] -0.8676594
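
As a rough illustration of what cor() computes, the sketch below rebuilds the Pearson correlation from its definition: the covariance divided by the product of the two standard deviations. The names x, y, and r_manual are made up for this example, and it reads the built-in mtcars data directly so that it runs on its own.

```r
# Sketch: Pearson correlation from its definition (illustrative names)
x <- mtcars$wt
y <- mtcars$mpg

# r = cov(x, y) / (sd(x) * sd(y))
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Print the value and check that it agrees with cor() up to rounding
r_manual
all.equal(r_manual, cor(x, y))
```

Because the covariance is divided by both standard deviations, the result has no units and always lies between -1 and 1.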

Simple Linear Regression Models

# lm: Fit a linear model.
# y ~ x: Specify that y is your DEPENDENT variable and x is your INDEPENDENT variable
# data: The data frame or table from which we are pulling the values
# summary(): Print a summary table for the model
# You can read the goodness of fit (R-squared) from the summary table
linear_model <- lm(mpg ~ wt, data = mtcars_df)
summary(linear_model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
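
Instead of reading numbers off the printed table, the values can usually be pulled out of the fitted model directly. The sketch below is one way to do that; the name fit is made up for this example and refits the same mpg ~ wt model on the built-in mtcars data so that the block runs on its own.

```r
# Sketch: extract coefficients and R-squared from a fitted model
fit <- lm(mpg ~ wt, data = mtcars)  # same formula as linear_model above

# Named vector holding the intercept and the slope for wt
coef(fit)

# Multiple R-squared, the same value shown in the summary table
summary(fit)$r.squared

# For a model with a single predictor, R-squared equals the squared correlation
all.equal(summary(fit)$r.squared, cor(mtcars$mpg, mtcars$wt)^2)
```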

Plotting Regression Lines

# We use abline(model_name) to add regression lines to scatterplots
# Below, we have the same scatterplot as above.
# We use abline(linear_model) to add the regression line to it.
# lwd: Line width
# lty: Line type (lty = 2 makes dashed lines); see https://r-charts.com/base-r/line-types/

plot(
  mtcars_df$mpg ~ mtcars_df$wt,
  main = "Miles per Gallon versus Weight",
  xlab = "Weight (1000 lbs)",
  ylab = "Miles per Gallon",
  col = "red",
  pch = 3,
  cex = 1.5
)
abline(linear_model, lwd = 1.5, lty = 2)
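
To make clear what abline(linear_model) is drawing, the sketch below passes the intercept and slope to abline() explicitly through its a and b arguments. The names fit and b are made up for this example; the line drawn should be the same one that abline(fit) would produce.

```r
# Sketch: draw the regression line from its intercept and slope
fit <- lm(mpg ~ wt, data = mtcars)  # same formula as linear_model above
b <- coef(fit)

plot(mpg ~ wt, data = mtcars)

# a = intercept, b = slope; equivalent to abline(fit)
abline(a = b[["(Intercept)"]], b = b[["wt"]], lwd = 1.5, lty = 2)
```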


Lab Exercises

Question 1

Explain the difference between the correlation coefficient and the goodness of fit value.

Answer: Delete this text and type in your answer.

Question 2

Choose two variables from the mtcars dataset other than the pair mpg and wt (you may use at most one of mpg or wt). Create a scatterplot and calculate/print the correlation coefficient. Interpret the correlation coefficient for your two chosen variables.

Scatterplot

# Fill in the dots based on the two variables you choose!
# Scroll up to the "Scatterplot" example to see how.

plot(
  mtcars_df$wt ~ mtcars_df$disp,
  main = "wt vs. disp",
  xlab = "disp",
  ylab = "wt",
  col = "red",
  pch = 3,
  cex = 1.5
)

Correlation

# Fill in the dots based on the two variables you choose!
# Scroll up to the "Correlation" example to see how.

cor(mtcars_df$wt, mtcars_df$disp)
## [1] 0.8879799

Answer: Delete this text and type in your answer.

Question 3

Using the same two variables from Question 2, fit a linear regression model and plot its line on the scatterplot. What are the intercept and slope coefficients, and what do they mean in the context of the data?

Linear Regression

# Fill in the dots based on the variables you choose!
# Scroll up to the "Simple Linear Regression Models" example to see how.

mtcars_model <- lm(wt ~ disp, data = mtcars_df)
summary(mtcars_model)
## 
## Call:
## lm(formula = wt ~ disp, data = mtcars_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.89044 -0.29775 -0.00684  0.33428  0.66525 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.5998146  0.1729964   9.248 2.74e-10 ***
## disp        0.0070103  0.0006629  10.576 1.22e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4574 on 30 degrees of freedom
## Multiple R-squared:  0.7885, Adjusted R-squared:  0.7815 
## F-statistic: 111.8 on 1 and 30 DF,  p-value: 1.222e-11

Regression Line

# Exactly the same scatterplot code as in Question 2
# We only add abline(mtcars_model) at the end to add the line

plot(
  mtcars_df$wt ~ mtcars_df$disp,
  main = "wt vs. disp",
  xlab = "disp",
  ylab = "wt",
  col = "red",
  pch = 3,
  cex = 1.5
)
abline(mtcars_model, lwd = 1.5, lty = 2)

Answer: Delete this text and type in your answer.

Question 4

Explain whether the linear regression line is a good model for the chosen variables from Question 2.

Answer: I would prefer a clustering strategy
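
One common check on whether a straight line is adequate is a residual plot. The sketch below is only an illustration under the assumption that wt and disp are the chosen variables; the name fit is made up for this example. If the points scatter around zero with no clear curve or funnel shape, a linear model is usually considered reasonable.

```r
# Sketch: residuals versus fitted values for the wt ~ disp model
fit <- lm(wt ~ disp, data = mtcars)  # same formula as mtcars_model

plot(
  fitted(fit), resid(fit),
  main = "Residuals versus Fitted Values",
  xlab = "Fitted wt (1000 lbs)",
  ylab = "Residual"
)

# Reference line at zero; residuals should fall evenly on both sides
abline(h = 0, lty = 2)
```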

Question 5

Using the same two variables from Question 2, find the correlation coefficient without using the cor() function. Check the result against your answer to Question 2.

# Hints:
# 1. What will taking the square root of the R-squared yield?
# From which table can you get the R-squared?
# 2. Remember that the correlation coefficient has a sign.
# From which plot can you see the direction of the relationship? 
summary(mtcars_model)$r.squared |> sqrt()
## [1] 0.8879799
## code here (OPTIONAL)
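
The square root alone always returns a non-negative number, so the second hint matters whenever the relationship is decreasing. One possible way to recover the sign is to take it from the fitted slope, as sketched below; the names fit and r_from_fit are made up for this example.

```r
# Sketch: signed correlation recovered from a simple linear regression
fit <- lm(wt ~ disp, data = mtcars)  # same formula as mtcars_model

# sqrt(R-squared) gives |r|; the sign of the slope gives the direction
r_from_fit <- sign(coef(fit)[["disp"]]) * sqrt(summary(fit)$r.squared)

# Print the value and compare it with cor() as a check
r_from_fit
all.equal(r_from_fit, cor(mtcars$wt, mtcars$disp))
```

For wt and disp the slope is positive, so the sign changes nothing here; for a pair such as mpg and wt it would flip the result to a negative value.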

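Answer: \(\sqrt{R^2} = |r_{xy}|\). The slope for disp is positive, so \(r_{xy} = +\sqrt{0.7885} \approx 0.888\), which matches the correlation coefficient from Question 2.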