Load some necessary packages:

library(ggplot2)

library(dplyr)


Question 1

A real estate appraiser wants to explore the relationship between the sale price of an apartment building and other characteristics of the property. The data are contained in the file MNSALES.csv. The variables of interest are:

Price: Sale price of the building (in dollars)

NumApts: Number of apartments in the building

Age: Age of the building in years

LotSize: Size of the lot on which the building is built (in square feet)

Parking: Number of parking spots

Condition: Condition of the building (F=Fair, G=Good, E=Excellent)


Read in the data:

MNSALES = read.csv('https://raw.githubusercontent.com/vittorioaddona/data/main/MNSALES.csv')


Use this dataset to answer the following questions.


(a) Find the R-squared value for each of the following models:

Price ~ Age

model=lm(formula=Price~Age,data=MNSALES)
ssr=sum((fitted(model)-mean(MNSALES$Price))^2) # SSR: variation explained by the model
ssr
## [1] 14076534339
sse=sum((fitted(model)-MNSALES$Price)^2) # SSE: residual (unexplained) variation
sse
## [1] 1.059793e+12
sst=ssr+sse # SST: total variation in Price
sst
## [1] 1.07387e+12
Rsquared=1-(sse/sst) # R-squared = 1 - SSE/SST (equivalently SSR/SST)
Rsquared
## [1] 0.0131082

Rsquared(Price ~ Age) = 0.0131082

Price ~ NumApts

model1=lm(formula=Price~NumApts,data=MNSALES)
ssr1=sum((fitted(model1)-mean(MNSALES$Price))^2)
ssr1
## [1] 915775843108
sse1=sum((fitted(model1)-MNSALES$Price)^2)
sse1
## [1] 158094104206
sst1=ssr1+sse1
sst1
## [1] 1.07387e+12
Rsquared1=1-(sse1/sst1)
Rsquared1
## [1] 0.852781

Rsquared(Price ~ NumApts) = 0.852781

Price ~ LotSize

model2=lm(formula=Price~LotSize,data=MNSALES)
ssr2=sum((fitted(model2)-mean(MNSALES$Price))^2)
ssr2
## [1] 590946204278
sse2=sum((fitted(model2)-MNSALES$Price)^2)
sse2
## [1] 482923743036
sst2=ssr2+sse2
sst2
## [1] 1.07387e+12
Rsquared2=1-(sse2/sst2)
Rsquared2
## [1] 0.5502959

Rsquared(Price ~ LotSize) = 0.5502959

Price ~ Parking

model3=lm(formula=Price~Parking,data=MNSALES)
ssr3=sum((fitted(model3)-mean(MNSALES$Price))^2)
ssr3
## [1] 54310439533
sse3=sum((fitted(model3)-MNSALES$Price)^2)
sse3
## [1] 1.01956e+12
sst3=ssr3+sse3
sst3
## [1] 1.07387e+12
Rsquared3=1-(sse3/sst3)
Rsquared3
## [1] 0.0505745

Rsquared(Price ~ Parking) = 0.0505745

Price ~ Condition

model4=lm(formula=Price~Condition,data=MNSALES)
ssr4=sum((fitted(model4)-mean(MNSALES$Price))^2)
ssr4
## [1] 89083840373
sse4=sum((fitted(model4)-MNSALES$Price)^2)
sse4
## [1] 984786106941
sst4=ssr4+sse4
sst4
## [1] 1.07387e+12
Rsquared4=1-(sse4/sst4)
Rsquared4
## [1] 0.0829559

Rsquared(Price ~ Condition) = 0.0829559
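
As a check (a minimal sketch reusing the model objects fitted above; outputs omitted), summary() reports the same R-squared values directly:

summary(model)$r.squared  # Price ~ Age
summary(model1)$r.squared # Price ~ NumApts
summary(model2)$r.squared # Price ~ LotSize
summary(model3)$r.squared # Price ~ Parking
summary(model4)$r.squared # Price ~ Condition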


(b) Using your answers to (a), rank the explanatory variables from best to worst.

1.) NumApts (0.852781) 2.) LotSize (0.5502959) 3.) Condition (0.0829559) 4.) Parking (0.0505745) 5.) Age (0.0131082)

(c) Now, find the R-squared values for each of the following models, and rank them from best to worst:

Price ~ NumApts + LotSize

model5=lm(formula=Price~NumApts+LotSize,data=MNSALES)
ssr5=sum((fitted(model5)-mean(MNSALES$Price))^2)
ssr5
## [1] 915809281076
sse5=sum((fitted(model5)-MNSALES$Price)^2)
sse5
## [1] 158060666239
sst5=ssr5+sse5
sst5
## [1] 1.07387e+12
Rsquared5=1-(sse5/sst5)
Rsquared5
## [1] 0.8528121

Rsquared(Price ~ NumApts + LotSize) = 0.8528121

Price ~ NumApts + Condition

model6=lm(formula=Price~NumApts+Condition,data=MNSALES)
ssr6=sum((fitted(model6)-mean(MNSALES$Price))^2)
ssr6
## [1] 986169504463
sse6=sum((fitted(model6)-MNSALES$Price)^2)
sse6
## [1] 87700442851
sst6=ssr6+sse6
sst6
## [1] 1.07387e+12
Rsquared6=1-(sse6/sst6)
Rsquared6
## [1] 0.9183323

Rsquared(Price ~ NumApts + Condition) = 0.9183323

Price ~ NumApts + Age

model7=lm(formula=Price~NumApts+Age,data=MNSALES)
ssr7=sum((fitted(model7)-mean(MNSALES$Price))^2)
ssr7
## [1] 926823801896
sse7=sum((fitted(model7)-MNSALES$Price)^2)
sse7
## [1] 147046145418
sst7=ssr7+sse7
sst7
## [1] 1.07387e+12
Rsquared7=1-(sse7/sst7)
Rsquared7
## [1] 0.8630689

Rsquared(Price ~ NumApts + Age) = 0.8630689

1.) NumApts+Condition (0.9183323) 2.) NumApts+Age (0.8630689) 3.) NumApts+LotSize (0.8528121)

(d) What might be surprising about your rankings in (c)? And how can you explain these results?

The rankings in (c) are almost the reverse of what (b) suggests: LotSize, the second-best predictor on its own, adds almost nothing to NumApts (R-squared moves only from 0.852781 to 0.8528121), while Condition, one of the weakest single predictors, produces the biggest improvement. A likely explanation is that LotSize is highly correlated with NumApts, so it carries largely redundant information, whereas Condition explains variation in Price that NumApts does not.



Question 2

HumanHeight = read.csv('https://raw.githubusercontent.com/vittorioaddona/data/main/HumanHeight.csv')

The Correlation Coefficient: Correlation (r, \(\rho\), R) is a measure of the strength of the linear relationship between two quantitative variables. Correlation is always between -1 (perfect negative relationship) and +1 (perfect positive relationship).

(a) To find the correlation between two variables, use the cor function in R. For example, using the HumanHeight.csv data, we can find the correlation between Height and FatherHeight using the command below (uncomment the command by deleting the “#”, and run it):

 HumanHeight %>% summarize( cor( Height , FatherHeight ) )
##   cor(Height, FatherHeight)
## 1                 0.2753548


(b) What if you had typed the following (again, uncomment and run the command):

 HumanHeight %>% summarize( cor( FatherHeight , Height ) )
##   cor(FatherHeight, Height)
## 1                 0.2753548


(c) Describe in your own words at least 1 implication of this property of correlation.

The correlation between two variables doesn't change based on which variable is treated as explanatory and which as response: cor(x, y) = cor(y, x).


(d) Consider the following two models for the average height of an individual:

Model 1: Height = m1*FatherHeight + b1

Model 2: FatherHeight = m2*Height + b2

Is m1 = m2?

No. In general m1 and m2 differ: the least-squares slopes are m1 = r(s_Height/s_FatherHeight) and m2 = r(s_FatherHeight/s_Height), so they agree only when the two standard deviations are equal.

Is b1 = b2?

No. Each intercept is determined by its own slope (b1 = mean(Height) - m1*mean(FatherHeight), and similarly for b2), so the intercepts generally differ as well.
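
A quick empirical check (a minimal sketch; outputs omitted) is to fit both regressions and compare their coefficients:

coef(lm(Height ~ FatherHeight, data = HumanHeight))
coef(lm(FatherHeight ~ Height, data = HumanHeight))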


(e) The notation often used for correlation is R (not the software!). Why do you think this is? Compare the R-squared values of the two models presented in (d). What do you notice?

model8=lm(formula=Height~FatherHeight,data=HumanHeight)
ssr8=sum((fitted(model8)-mean(HumanHeight$Height))^2)
ssr8
## [1] 873.0753
sse8=sum((fitted(model8)-HumanHeight$Height)^2)
sse8
## [1] 10641.99
sst8=ssr8+sse8
sst8
## [1] 11515.06
Rsquared8=1-(sse8/sst8)
Rsquared8
## [1] 0.0758203

Model1R2 = 0.0758203

model9=lm(formula=FatherHeight~Height,data=HumanHeight)
ssr9=sum((fitted(model9)-mean(HumanHeight$FatherHeight))^2)
ssr9
## [1] 415.013
sse9=sum((fitted(model9)-HumanHeight$FatherHeight)^2)
sse9
## [1] 5058.628
sst9=ssr9+sse9
sst9
## [1] 5473.641
Rsquared9=1-(sse9/sst9)
Rsquared9
## [1] 0.0758203

Model2R2 = 0.0758203. The R-squared is the same for both models, and it equals the square of the correlation from (a): 0.2753548^2 is about 0.0758. This is why correlation is written as R: in a simple linear regression, R-squared is literally the squared correlation, regardless of which variable plays the role of the response.
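
As a check (a one-line sketch; output omitted), squaring the correlation from (a) reproduces this R-squared:

HumanHeight %>% summarize(cor(Height, FatherHeight)^2)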


(f) Limitations of correlation: Think about the following two scenarios:

1. Suppose we wanted to find the correlation between Height and Sex. Try it. What happens and why?

"HumanHeight%>%
  summarize(cor(Height,Sex))"
## [1] "HumanHeight%>%\n  summarize(cor(Height,Sex))"

Running this produces an error: cor() only works with quantitative variables, and Sex is categorical, so no correlation can be computed.


2. Suppose we wanted to model Height by FatherHeight, MotherHeight, and Sex all in the same model. Can we find a correlation in this situation? Can we find an R-squared value?

"HumanHeight%>%
  summarize(cor(Height,FatherHeight,MotherHeight,Sex))"
## [1] "HumanHeight%>%\n  summarize(cor(Height,FatherHeight,MotherHeight,Sex))"
model10=lm(formula=Height~FatherHeight+MotherHeight+Sex,data=HumanHeight)
model10
## 
## Call:
## lm(formula = Height ~ FatherHeight + MotherHeight + Sex, data = HumanHeight)
## 
## Coefficients:
##  (Intercept)  FatherHeight  MotherHeight          SexM  
##      15.3448        0.4060        0.3215        5.2260
ssr10=sum((fitted(model10)-mean(HumanHeight$Height))^2)
ssr10
## [1] 7365.9
sse10=sum((fitted(model10)-HumanHeight$Height)^2)
sse10
## [1] 4149.162
sst10=ssr10+sse10
sst10
## [1] 11515.06
Rsquared10=1-(sse10/sst10)
Rsquared10
## [1] 0.6396752

We cannot find a single correlation in this situation (correlation is only defined between two quantitative variables), but we can find the R-squared value of the model, which is 0.6396752.

(g) Why do you think we will prefer R-squared to correlation in our class?

Because R-squared applies more broadly: it is defined for models with several explanatory variables and with categorical explanatory variables, while correlation only measures the linear relationship between two quantitative variables.



Question 3

Answer the following TRUE/FALSE questions regarding R-squared, AND provide a justification for each:

(a) TRUE or FALSE: Consider the following two models:

M1: y ~ x1

M2: y ~ x2 + x3 + x4

where x1, x2, x3, and x4, are explanatory variables. M2 will necessarily have a higher R-squared value than M1.

False. M1 is not nested in M2 (x1 does not even appear in M2), so nothing guarantees M2 a higher R-squared: if x1 is strongly related to y while x2, x3, and x4 are not, M1 will win. Adding variables to the same model never lowers R-squared, but M2 is not an extension of M1.
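
A small simulation illustrates this (a sketch with hypothetical data; y is constructed to depend only on x1):

set.seed(1)
x1=rnorm(100); x2=rnorm(100); x3=rnorm(100); x4=rnorm(100)
y=3*x1+rnorm(100)
summary(lm(y~x1))$r.squared        # large: y was built from x1
summary(lm(y~x2+x3+x4))$r.squared  # small: these predictors are pure noise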


(b) TRUE or FALSE: Consider the following four models and their R-squared values:

M1: y ~ x1 \(~\rightarrow~\) R-squared = 0.70

M2: y ~ x2 \(~\rightarrow~\) R-squared = 0.60

M3: y ~ x3 \(~\rightarrow~\) R-squared = 0.50

M4: y ~ x4 \(~\rightarrow~\) R-squared = 0.40

Form these two models: M5: y ~ x1 + x2 and M6: y ~ x3 + x4. M5 will necessarily have a higher R-squared value than M6.

False. The R-squared of a multiple regression is not determined by the R-squared values of the corresponding single-variable models. If x1 and x2 are highly correlated with each other, they explain much of the same variation in y, so M5 may improve little on M1 alone, while x3 and x4 may explain different parts of the variation and together surpass M5.



Question 4

This question reviews the concept of nested models.

(a) Which of the following models are nested in: A ~ B*C + D?

Model 1:\(~\) A ~ B

Model 2:\(~\) A ~ B + D

Model 3:\(~\) B ~ C

Model 4:\(~\) A ~ B + C + D

Model 5:\(~\) A ~ B*D + C

Model 6:\(~\) A ~ D


(b)

Consider the following models involving variables A, B, C, and D.

Model 1:\(~\) A ~ B

Model 2:\(~\) A ~ B + C

Model 3:\(~\) A ~ B + C + B:C

Model 4:\(~\) A ~ C + D

Model 5:\(~\) A ~ B*C

Model 6:\(~\) B ~ A

Model 7:\(~\) B ~ A*C

Answer TRUE or FALSE for each of the following:

(i) Model 1 is nested in Model 2.

(ii) Model 1 is nested in Model 3.

(iii) Model 1 is nested in Model 4.

(iv) Model 2 is nested in Model 3.

(v) Model 3 is nested in Model 2.

(vi) Model 2 is identical to Model 5.

(vii) Model 3 is identical to Model 5.

(viii) Model 1 is identical to Model 6.

(ix) Model 1 has the same R-squared as Model 6. Typically, one shouldn’t compare the R-squared of models with different response variables, but this is a special case.

(x) Model 2 is nested in Model 7.



Question 5

One measure of fitness is percent body fat. Unlike body mass index (BMI), percent body fat is not easy to calculate. BodyFat.csv has estimates of percent body fat (determined by underwater weighing) and various body circumference measurements for 252 men. The variables are:

BF: percent body fat from Siri’s equation (percentage, 0-100)

Age: individual’s age in years

Weight: individual’s weight in pounds (lbs)

Height: individual’s height in inches

Neck: individual’s neck circumference in cm

Chest: individual’s chest circumference in cm

Abdomen: individual’s abdomen circumference in cm

Hip: individual’s hip circumference in cm

Thigh: individual’s thigh circumference in cm

Knee: individual’s knee circumference in cm

Ankle: individual’s ankle circumference in cm

Biceps: individual’s bicep circumference in cm

Forearm: individual’s forearm circumference in cm

Wrist: individual’s wrist circumference in cm


Read in the data:

BodyFat = read.csv('https://raw.githubusercontent.com/vittorioaddona/data/main/BodyFat.csv')


Use this dataset to answer the following questions.


(a) Find the R-squared of the model: BF ~ Weight.

model1A=lm(formula=BF~Weight,data=BodyFat)
ssr1A=sum((fitted(model1A)-mean(BodyFat$BF))^2)
ssr1A
## [1] 6593.016
sse1A=sum((fitted(model1A)-BodyFat$BF)^2)
sse1A
## [1] 10985.97
sst1A=ssr1A+sse1A
sst1A
## [1] 17578.99
Rsquared1A=1-(sse1A/sst1A)
Rsquared1A
## [1] 0.3750509

R2(BF ~ Weight) = 0.3750509


(b) Find the R-squared of the model: BF ~ Height.

model2A=lm(formula=BF~Height,data=BodyFat)
ssr2A=sum((fitted(model2A)-mean(BodyFat$BF))^2)
ssr2A
## [1] 140.7976
sse2A=sum((fitted(model2A)-BodyFat$BF)^2)
sse2A
## [1] 17438.19
sst2A=ssr2A+sse2A
sst2A
## [1] 17578.99
Rsquared2A=1-(sse2A/sst2A)
Rsquared2A
## [1] 0.0080094

R2(BF ~ Height) = 0.0080094


(c) Based on your answers to (a) and (b), what do you think are the smallest and largest possible R-squared values for the model: BF ~ Weight + Height.

Because both single-variable models are nested inside BF ~ Weight + Height, its R-squared can be no smaller than the larger of the two individual values, 0.3750509, and no larger than 1.


(d) Find the R-squared value of the model: BF ~ Weight + Height.

model3A=lm(formula=BF~Weight+Height,data=BodyFat)
ssr3A=sum((fitted(model3A)-mean(BodyFat$BF))^2)
ssr3A
## [1] 8097.391
sse3A=sum((fitted(model3A)-BodyFat$BF)^2)
sse3A
## [1] 9481.599
sst3A=ssr3A+sse3A
sst3A
## [1] 17578.99
Rsquared3A=1-(sse3A/sst3A)
Rsquared3A
## [1] 0.4606289

R2(BF ~ Weight + Height) = 0.4606289

(e) By comparing the values found in (a), (b), and (d), explain in your own words what you have learned about R-squared.

Adding an explanatory variable never decreases R-squared: the two-variable model (0.4606289) does at least as well as the better of the single-variable models (0.3750509). At the same time, R-squared contributions are not additive: here the combined model even exceeds the sum of the two individual values (about 0.3831), so a variable's contribution depends on what else is already in the model.



Question 6

Answer the following short answer questions in your own words:

(a) How does R-squared summarize the quality of a model (that is, describe what R-squared is actually measuring)?


(b) What does it mean for one model to be nested inside another model?


(c) State at least 2 ways that the correlation coefficient differs from R-squared?


(d) Consider models of the form: response ~ 1. Briefly explain in your own words why R-squared = 0 for such models.


(e) Generally, in comparing two models with the same response variable, the model with the larger R-squared is better. Add some nuance to this rule of thumb.



Question 7

Consider the following four scatterplots with response variable y and four different explanatory variables x. If we were to fit a simple linear regression model y ~ x to each of these sets of data, order the R-squared values from largest to smallest for the four models.




Question 8

Our primary research question for the FEV data is: what is the impact of smoking on lung function in children? Forced expiratory volume (FEV) is a measure of lung capacity. Higher values indicate higher lung capacity - the ability to blow out more air in one second - and therefore better lung function. Data are available for 654 children, ages 3-19. Each row in the data corresponds to a child’s visit to a doctor. All children attended the same pediatric clinic. The date of the study is unknown, but it was prior to 2005. Variables collected include:

age: at time of measurement (years)

fev: forced expiratory volume (liters per second)

height: at time of measurement (inches)

sex: (male/female)

smoke: indicator of smoking habits (smoker/nonsmoker)


Load the FEV data:

fevdata = read.csv('https://raw.githubusercontent.com/vittorioaddona/data/main/fev.csv')


(a) We’ll be evaluating and comparing 4 models:

Model 1:\(~\) fev ~ age

Model 2:\(~\) fev ~ smoke

Model 3:\(~\) fev ~ smoke + age

Model 4:\(~\) fev ~ smoke + age + smoke:age


Uncomment (delete the ‘#’), and complete the code chunk below to fit these 4 models and save them as objects called mod1, mod2, mod3, and mod4.

mod1=lm(formula=fev~age,data=fevdata) 

mod2=lm(formula=fev~smoke,data=fevdata)

mod3=lm(formula=fev~smoke+age,data=fevdata)

mod4=lm(formula=fev~smoke+age+smoke:age,data=fevdata)


(b) Most of our model evaluation techniques focus on the model residuals. Answer the following 2 brief review questions to help remind ourselves of some key terms:


What is a fitted value?

the predicted response based on an explanatory value

What is a residual?

the difference between the observed value and the fitted value



(c) Using Model 1, calculate the fitted value and residual for an eight-year-old with an FEV of 1.724 liters/second.

The eight-year-old with an FEV of 1.724 liters/second is the second observation in the dataset, so we can extract its fitted value and residual directly instead of printing all 654 values:

fitted(mod1)[2]
##        2 
## 2.207976
resid(mod1)[2]
##            2 
## -0.483975913

Fitted value = 2.207976; residual = 1.724 - 2.207976 = -0.483975913.
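
Alternatively, a minimal sketch (assuming only mod1 from above): predict() computes the fitted value for any age, without locating the observation in the data:

predict(mod1, newdata = data.frame(age = 8))

The residual is then 1.724 minus this fitted value.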


(d) What does it tell us if we have a residual of exactly zero? What if our residual is positive? Negative?

If a residual is exactly 0, the fitted value and the observed value are identical. A positive residual means the observed value is greater than the fitted value; a negative residual means the observed value is less than the fitted value.


(e) Uncomment the commands in the chunk below, and complete the lines of code to store the fitted values and the residuals (according to mod1) for all of the observed FEV data:

residuals=resid(mod1)
  
fitted=fitted(mod1)


(f) Look at the second value of the residuals and fitted values you found in (e). These should match your answers to (c). Why?

Because the second observation in the dataset is the eight-year-old with an FEV of 1.724, the second entries of the stored fitted values and residuals are exactly the numbers computed in (c).


(g) An alternative to R-squared is obtained by calculating the standard deviation of the residuals (i.e., the residual standard error). This provides a measure of the typical size of a residual (a summary of how well a model can predict the response variable). Compute the standard deviation of the residuals for mod1.

sd(resid(mod1))
## [1] 0.5670923

SD=0.5670923
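
Note that sd() divides by n - 1, while the residual standard error reported for a regression divides by the residual degrees of freedom (here n - 2), so the two differ slightly. The conventional value can be pulled from the model summary:

summary(mod1)$sigma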


(h) Now compute the R-squared value for mod1, and write down an interpretation of it.

ssr1B=sum((fitted(mod1)-mean(fevdata$fev))^2)
ssr1B
## [1] 280.9192
sse1B=sum((fitted(mod1)-fevdata$fev)^2)
sse1B
## [1] 210.0007
sst1B=ssr1B+sse1B
sst1B
## [1] 490.9198
Rsquared1B=1-(sse1B/sst1B)
Rsquared1B
## [1] 0.5722302

R2 = 0.5722302. Age alone explains about 57% of the variation in children's FEV; the remaining 43% of the variation is left unexplained in the residuals.


(i) Now find the R-squared and the residual standard error for mod2, mod3, and mod4. Which model has the best residual standard error? The best R-squared?

"Rfunction(x)=function(){ssr=sum((fitted(x)-mean(fevdata$fev))^2)
sse=sum((fitted(x)-fevdata$fev)^2)
sst=ssr+sse
1-(ssr1B/sst1B)}"
## [1] "Rfunction(x)=function(){ssr=sum((fitted(x)-mean(fevdata$fev))^2)\nsse=sum((fitted(x)-fevdata$fev)^2)\nsst=ssr+sse\n1-(ssr1B/sst1B)}"
ssr1C=sum((fitted(mod2)-mean(fevdata$fev))^2)
ssr1C
## [1] 29.56968
sse1C=sum((fitted(mod2)-fevdata$fev)^2)
sse1C
## [1] 461.3502
sst1C=ssr1C+sse1C
sst1C
## [1] 490.9198
Rsquared1C=1-(ssr1C/sst1C)
Rsquared1C
## [1] 0.9397668
"print(Rfunction(mod2))
print(Rfunction(mod3))
print(Rfunction(mod4))"
## [1] "print(Rfunction(mod2))\nprint(Rfunction(mod3))\nprint(Rfunction(mod4))"

R2(mod2) = 0.0602332, R2(mod3) = 0.5765875, R2(mod4) = 0.5940849. Together with R2(mod1) = 0.5722302 from (h), mod4 has the best (largest) R-squared.
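
The residual standard errors were not computed above; a minimal sketch, reusing the sd-of-residuals approach from (g) (outputs omitted):

sd(resid(mod2))
sd(resid(mod3))
sd(resid(mod4))

Because all four models predict the same response on the same data, the model with the largest R-squared necessarily has the smallest residual standard deviation, so mod4 is best on both measures.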

Question 9

The high_peaks data includes information on hiking trails in the 46 “high peaks” in the Adirondack mountains of northern New York state. Our goal will be to understand the variability in the time in hours that it takes to complete each hike. In doing so, we’ll separately consider four possible predictors:

ascent: a hike’s vertical ascent (feet)

elevation: highest elevation (feet)

length: length (miles)

rating: difficulty rating (easy / moderate / difficult)

Read in the data:

peaks <- read.csv('https://raw.githubusercontent.com/vittorioaddona/data/main/high_peaks.csv')


(a) Construct a separate model of time by each predictor.
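
A sketch of one way to fit the four models (this assumes the response column in high_peaks.csv is named time; the predictor names come from the list above):

mod_ascent = lm(time ~ ascent, data = peaks)
mod_elevation = lm(time ~ elevation, data = peaks)
mod_length = lm(time ~ length, data = peaks)
mod_rating = lm(time ~ rating, data = peaks)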


(b) Graph the data and the model for each of the 3 quantitative predictors.
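
A sketch for one of the three plots using ggplot2 (loaded at the top); swapping in ascent or elevation gives the others, again assuming the column names above:

ggplot(peaks, aes(x = length, y = time)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)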


(c) Look at the model coefficients for the categorical explanatory variable. Interpret each coefficient, and use this model to predict the time it takes to complete an easy hike. Then, graph the data for this categorical variable.
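
A sketch for inspecting the categorical model from (a) and predicting an easy hike (it assumes rating is read in as a character/factor, so lm treats the alphabetically first level, difficult, as the reference level):

coef(mod_rating)
# predicted time for an easy hike = intercept + the ratingeasy coefficient:
predict(mod_rating, newdata = data.frame(rating = "easy"))
# one way to graph time by the categorical rating:
ggplot(peaks, aes(x = rating, y = time)) + geom_boxplot()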


(d) Which of the 4 relationships appears to be the strongest? Or, which variable seems to be the best predictor of the time required to complete the hike?


(e) Compute the R-squared values for each of the 4 models. Does this agree with your intuition from (d)?
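
A sketch that pulls the R-squared of each model from summary() (reusing the objects from (a); outputs omitted):

sapply(list(ascent = mod_ascent, elevation = mod_elevation,
            length = mod_length, rating = mod_rating),
       function(m) summary(m)$r.squared)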