Multiple Regression Practice Quiz

Getting the Data

mammals = read.csv("https://raw.githubusercontent.com/jfcross4/stats/refs/heads/master/mammals2.csv")
rents = read.csv("https://raw.githubusercontent.com/jfcross4/stats/refs/heads/master/rent_data.csv")
quiz = read.csv("https://raw.githubusercontent.com/jfcross4/stats/refs/heads/master/study_data.csv")

1. Mammals

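This data set records total sleep hours (total_sleep) for a number of mammal species, along with other variables such as danger, life_span, and gestation.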

Create the following model:

m = lm(total_sleep ~ danger, data=mammals)
summary(m)

Write out the equation this model uses to predict sleep hours.
Use your equation to predict total sleep for a mammal with a danger level of 5.
Interpret the coefficient of danger in your model.
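A quick way to check a hand-written equation is to pull the intercept and slope out with coef() and compare your hand calculation with predict(). The sketch below uses a small made-up data frame (toy) in place of mammals so that it runs on its own; its numbers are hypothetical and say nothing about the real answers.

```r
# Hypothetical toy data standing in for the mammals data frame
toy <- data.frame(
  danger      = c(1, 2, 3, 4, 5),
  total_sleep = c(15, 11, 10, 9, 5)   # made-up values
)

m_toy <- lm(total_sleep ~ danger, data = toy)
b <- coef(m_toy)   # named vector with "(Intercept)" and "danger"

# Fitted equation: predicted total_sleep = intercept + slope * danger
by_hand <- unname(b["(Intercept)"] + b["danger"] * 5)

# predict() applies the same equation to a new row of data
by_predict <- unname(predict(m_toy, newdata = data.frame(danger = 5)))

c(by_hand = by_hand, by_predict = by_predict)
```

In this toy fit the slope is -2.2, so each one-unit increase in danger goes with 2.2 fewer predicted hours of sleep; the coefficient in your own model is read the same way.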

Now, create a model to predict total_sleep from life_span and gestation:

m = lm(total_sleep ~ life_span + gestation, data=mammals)
summary(m)

Looking at the summary of this model, which appears to be a more important predictor of total sleep, gestation or life_span? Please explain your reasoning.
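Raw coefficients are in each predictor's own units, so a one-unit change in life_span is not directly comparable to a one-unit change in gestation. The t value and Pr(>|t|) columns of summary() are usually a better guide. Another common approach, sketched below, is to standardize the predictors with scale() so that each coefficient is the change in predicted sleep per one-standard-deviation change in that predictor. The data here are simulated and hypothetical; the column names only mirror the quiz for readability.

```r
set.seed(1)
n <- 40
# Hypothetical simulated predictors on very different scales
sim <- data.frame(
  life_span = runif(n, 2, 80),
  gestation = runif(n, 20, 600)
)
sim$total_sleep <- 15 - 0.02 * sim$life_span - 0.01 * sim$gestation + rnorm(n)

# Raw-unit model: Estimate, Std. Error, t value, Pr(>|t|) for each term
m_raw <- lm(total_sleep ~ life_span + gestation, data = sim)
coef(summary(m_raw))

# Standardized predictors: coefficients are per one-SD change
m_std <- lm(total_sleep ~ scale(life_span) + scale(gestation), data = sim)
coef(summary(m_std))
```

The t values match between the two fits: standardizing changes the units of the coefficients, not the strength of the evidence for each predictor.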

2. Rents

View(rents)

This data frame has the prices of rental apartments (“rent”), their square footage (“sqft”), and a column (“near_subway”) that takes the value yes or no depending on whether the apartment is near a subway.

Create and interpret the following model to predict rent from whether an apartment is near the subway:

m = lm(rent ~ near_subway, data=rents)
summary(m)
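Because near_subway holds the text values yes and no, lm() turns it into an indicator variable. By default R uses the alphabetically first level ("no") as the baseline, so the intercept is the average rent for apartments not near the subway and the near_subwayyes coefficient is the difference between the two group averages. A sketch with hypothetical toy rents (not the rents data set):

```r
# Hypothetical toy rents; not the rents data set
toy <- data.frame(
  near_subway = c("no", "no", "no", "yes", "yes", "yes"),
  rent        = c(1500, 1600, 1700, 2000, 2100, 2300)
)

m_toy <- lm(rent ~ near_subway, data = toy)
coef(m_toy)   # "(Intercept)" and "near_subwayyes"

# Group means for comparison with the coefficients
grp <- tapply(toy$rent, toy$near_subway, mean)
grp
```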

Now, add square footage as a second predictor (as shown below):

m = lm(rent ~ near_subway + sqft, data=rents)
summary(m)

How did adding “sqft” to the model affect the coefficient of “near_subway”? How would you explain this difference?
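One way to see why a coefficient can shrink when a second predictor is added is to simulate data in which apartments near the subway also tend to be larger. In the hypothetical simulation below, rent depends mostly on sqft and only a little on near_subway, yet the one-predictor model credits near_subway with much of the size effect. All numbers are made up.

```r
set.seed(42)
n <- 500
near <- rbinom(n, 1, 0.5)                            # 1 = near the subway
sqft <- round(700 + 300 * near + rnorm(n, 0, 150))   # near-subway units are larger in this simulation
rent <- 500 + 1.5 * sqft + 100 * near + rnorm(n, 0, 100)

sim <- data.frame(
  near_subway = ifelse(near == 1, "yes", "no"),
  sqft        = sqft,
  rent        = rent
)

# near_subway coefficient without and with sqft in the model
b_alone    <- coef(lm(rent ~ near_subway, data = sim))[["near_subwayyes"]]
b_adjusted <- coef(lm(rent ~ near_subway + sqft, data = sim))[["near_subwayyes"]]
c(alone = b_alone, adjusted = b_adjusted)
```

Without sqft, the near_subway coefficient absorbs the extra rent that comes from near-subway apartments being bigger; with sqft included, it is closer to the 100 built into the simulation.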

Predict the rental price for a 1000 square foot apartment that is not near the subway.
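Predictions from a model with a yes/no predictor work the same way as before: give predict() a one-row data frame with the predictor values, using the same text labels as the data. The sketch uses hypothetical toy numbers, not the rents data. Because "no" is the baseline level, the prediction for an apartment not near the subway is just the intercept plus the sqft slope times 1000.

```r
# Hypothetical toy data; not the rents data set
toy <- data.frame(
  near_subway = c("no", "no", "no", "yes", "yes", "yes"),
  sqft        = c(600, 800, 1200, 500, 700, 900),
  rent        = c(1400, 1700, 2300, 1600, 1900, 2200)
)
m_toy <- lm(rent ~ near_subway + sqft, data = toy)

# One new apartment: 1000 sqft, not near the subway
new_apt <- data.frame(near_subway = "no", sqft = 1000)
pred <- predict(m_toy, newdata = new_apt)

# Same number by hand; the near_subwayyes term is 0 for a "no" apartment
b <- coef(m_toy)
by_hand <- b[["(Intercept)"]] + b[["sqft"]] * 1000
c(predict = unname(pred), by_hand = by_hand)
```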

3. Studying

The “quiz” data frame has quiz scores along with the number of hours students studied in the week before the quiz and the number of hours they slept the night before:

View(quiz)

We can create a model to predict scores from sleep hours as follows:

m = lm(score ~ sleep_hours, data=quiz)
summary(m)

Bonus: What is the relationship between sleeping and quiz performance? How does adding study hours to your model affect your results? Please create additional models using this data set and explain what you find.
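One way to approach the bonus is to fit the model with and without study hours and compare them. The sketch below uses simulated, hypothetical data in which students who study more tend to sleep less, so the sleep coefficient changes once study hours are added; anova() then tests whether the larger model fits significantly better. The column name study_hours is an assumption; check names(quiz) for the actual name in the real data.

```r
set.seed(7)
n <- 200
# Hypothetical simulation: more studying goes with less sleep the night before
study_hours <- runif(n, 0, 10)   # assumed column name, not taken from the quiz data
sleep_hours <- 8 - 0.3 * study_hours + rnorm(n, 0, 1)
score <- 60 + 3 * study_hours + 2 * sleep_hours + rnorm(n, 0, 5)
sim <- data.frame(score, sleep_hours, study_hours)

m1 <- lm(score ~ sleep_hours, data = sim)
m2 <- lm(score ~ sleep_hours + study_hours, data = sim)

# sleep_hours coefficient with and without study_hours in the model
c(sleep_only = coef(m1)[["sleep_hours"]], with_study = coef(m2)[["sleep_hours"]])

# F-test comparing the two nested models
anova(m1, m2)
```

In this simulation, sleep appears harmful on its own because it stands in for studying less; once study_hours is in the model, the sleep coefficient turns positive. Whether anything similar happens in the real quiz data is what the bonus asks you to find out.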