HighPeaks

A.

lm(Time ~ Ascent, data = HighPeaks)

## 
## Call:
## lm(formula = Time ~ Ascent, data = HighPeaks)
## 
## Coefficients:
## (Intercept)       Ascent  
##    4.210054     0.002081

The equation of the least squares line for this model is y = 0.0020805x + 4.2100541, where y is the predicted Time (in hours) and x is the Ascent.
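To make the fitted equation concrete, here is a small sketch that plugs an ascent value into it. The coefficients are copied from the lm() output above; the helper name predict_time and the ascent value of 3000 are hypothetical, chosen only for illustration.

```r
# Coefficients copied from the lm() output above
b0 <- 4.210054   # intercept
b1 <- 0.002081   # slope for Ascent

# Hypothetical helper: predicted Time for a given Ascent
predict_time <- function(ascent) b0 + b1 * ascent

# Illustrative ascent value (not a specific peak in the data)
predict_time(3000)
```

With the fitted model object available, predict(mod1, newdata = data.frame(Ascent = 3000)) should give essentially the same number, up to rounding of the coefficients.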

B.

The code for the scatterplot and least squares line:

mod1 = lm(Time ~ Ascent, data = HighPeaks)
plot(Time ~ Ascent, data = HighPeaks)
abline(mod1)

C.

mod1 = lm(Time ~ Ascent, data = HighPeaks)
mod1$fitted.values

##         1         2         3         4         5         6         7         8 
## 10.796997 10.318476 11.637529 13.083494  9.484185 10.035524 12.902488 10.971761 
##         9        10        11        12        13        14        15        16 
## 11.803971 12.677791 10.139551 10.555656 12.636181 10.160356 10.919748  9.203314 
##        17        18        19        20        21        22        23        24 
## 10.430824 12.036989 11.824776 13.572417  9.086805 11.978735 10.867734 11.471087 
##        25        26        27        28        29        30        31        32 
##  9.567406  9.494588 10.035524 10.493240 11.739474 10.087537  8.610365 11.271356 
##        33        34        35        36        37        38        39        40 
## 11.471087  9.140899 10.399616  8.246273 10.555656  9.827472  8.641573 11.471087 
##        41        42        43        44        45        46 
## 11.387866 10.455790 11.013371  8.703988  8.046542 10.742903

The predicted time for climbing Couchsachraga Peak is 10.74 hours.

mod1 = lm(Time ~ Ascent, data = HighPeaks)
mod1$residuals

##           1           2           3           4           5           6 
## -0.79699655 -1.31847578  0.36247132  1.91650639 -0.98418522 -0.03552437 
##           7           8           9          10          11          12 
##  1.09751207 -2.47176066 -0.80397069 -3.67779122 -0.13955062 -3.05565564 
##          13          14          15          16          17          18 
## -2.63618072 -0.16035588  3.08025247 -2.20331434 -0.43082414  1.46301050 
##          19          20          21          22          23          24 
##  1.17522406 -2.57241701 -1.08680493 -0.97873480  0.13226560  5.52891332 
##          25          26          27          28          29          30 
##  1.43259377  3.50541215 -2.03552437 -2.99323989 -0.73947441 -2.08753750 
##          31          32          33          34          35          36 
##  0.88963531 -2.27135627  5.52891332  1.85910141 -1.39961626 -3.24627280 
##          37          38          39          40          41          42 
##  1.44434436 -4.32747186  1.35842743  6.52891332 -2.38786567  1.54420956 
##          43          44          45          46 
##  0.98662884  3.29601168  0.45345761  1.25709710

The residual for Couchsachraga Peak is 1.26 hours.
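The residual is the observed value minus the fitted value, so the two numbers above can be combined to recover the observed time. A minimal sketch, assuming (as the answers above do) that Couchsachraga Peak is observation 46 in the output:

```r
# Values for observation 46, copied from the output above
fitted_46 <- 10.742903
resid_46  <- 1.25709710

# residual = observed - fitted, so observed = fitted + residual
observed_46 <- fitted_46 + resid_46
round(observed_46, 2)
```

A positive residual means the actual hike took longer than the line predicts for that amount of ascent.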

D.

mod1 = lm(Time ~ Ascent, data = HighPeaks)
mod1$residuals

##           1           2           3           4           5           6 
## -0.79699655 -1.31847578  0.36247132  1.91650639 -0.98418522 -0.03552437 
##           7           8           9          10          11          12 
##  1.09751207 -2.47176066 -0.80397069 -3.67779122 -0.13955062 -3.05565564 
##          13          14          15          16          17          18 
## -2.63618072 -0.16035588  3.08025247 -2.20331434 -0.43082414  1.46301050 
##          19          20          21          22          23          24 
##  1.17522406 -2.57241701 -1.08680493 -0.97873480  0.13226560  5.52891332 
##          25          26          27          28          29          30 
##  1.43259377  3.50541215 -2.03552437 -2.99323989 -0.73947441 -2.08753750 
##          31          32          33          34          35          36 
##  0.88963531 -2.27135627  5.52891332  1.85910141 -1.39961626 -3.24627280 
##          37          38          39          40          41          42 
##  1.44434436 -4.32747186  1.35842743  6.52891332 -2.38786567  1.54420956 
##          43          44          45          46 
##  0.98662884  3.29601168  0.45345761  1.25709710

Mt. Emmons has the largest positive residual with a value of 6.53.

E.

mod1 = lm(Time ~ Ascent, data = HighPeaks)
mod1$residuals

##           1           2           3           4           5           6 
## -0.79699655 -1.31847578  0.36247132  1.91650639 -0.98418522 -0.03552437 
##           7           8           9          10          11          12 
##  1.09751207 -2.47176066 -0.80397069 -3.67779122 -0.13955062 -3.05565564 
##          13          14          15          16          17          18 
## -2.63618072 -0.16035588  3.08025247 -2.20331434 -0.43082414  1.46301050 
##          19          20          21          22          23          24 
##  1.17522406 -2.57241701 -1.08680493 -0.97873480  0.13226560  5.52891332 
##          25          26          27          28          29          30 
##  1.43259377  3.50541215 -2.03552437 -2.99323989 -0.73947441 -2.08753750 
##          31          32          33          34          35          36 
##  0.88963531 -2.27135627  5.52891332  1.85910141 -1.39961626 -3.24627280 
##          37          38          39          40          41          42 
##  1.44434436 -4.32747186  1.35842743  6.52891332 -2.38786567  1.54420956 
##          43          44          45          46 
##  0.98662884  3.29601168  0.45345761  1.25709710

Porter Mtn. has the most negative residual with a value of -4.33.
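Rather than scanning the printed residuals by eye, which.max() and which.min() can locate the extremes for parts D and E. The sketch below types in the residuals from the output above so that it runs on its own; with the fitted model available, res could simply be mod1$residuals.

```r
# Residuals copied from the printed output (observations 1 to 46)
res <- c(
  -0.79699655, -1.31847578,  0.36247132,  1.91650639, -0.98418522, -0.03552437,
   1.09751207, -2.47176066, -0.80397069, -3.67779122, -0.13955062, -3.05565564,
  -2.63618072, -0.16035588,  3.08025247, -2.20331434, -0.43082414,  1.46301050,
   1.17522406, -2.57241701, -1.08680493, -0.97873480,  0.13226560,  5.52891332,
   1.43259377,  3.50541215, -2.03552437, -2.99323989, -0.73947441, -2.08753750,
   0.88963531, -2.27135627,  5.52891332,  1.85910141, -1.39961626, -3.24627280,
   1.44434436, -4.32747186,  1.35842743,  6.52891332, -2.38786567,  1.54420956,
   0.98662884,  3.29601168,  0.45345761,  1.25709710
)

# Row index and value of the largest positive residual
i_max <- which.max(res)
c(index = i_max, residual = res[i_max])

# Row index and value of the most negative residual
i_min <- which.min(res)
c(index = i_min, residual = res[i_min])
```

The row index can then be used to look up the peak name, for example HighPeaks$Peak[i_max], assuming the name column is called Peak.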

F.

plot(mod1$residuals ~ mod1$fitted.values)
abline(a = 0, b = 0)

The plot of residuals versus fitted values supports the second condition (zero mean). The points are scattered roughly symmetrically above and below the horizontal line at zero, with many of them close to it. There is no clear curve or other pattern in the plot, so a linear form looks adequate. Lastly, there are no extreme outliers that distort the distribution.
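One caveat about the zero-mean condition: for any least squares fit that includes an intercept, the residuals average to zero by construction, so the plot is mainly useful for spotting curvature or uneven spread. A quick sketch of that fact, using R's built-in cars data as a stand-in (an assumption for illustration; it is not the HighPeaks data):

```r
# Stand-in example with the built-in cars data (not HighPeaks)
fit <- lm(dist ~ speed, data = cars)

# Mean of the residuals is zero up to floating-point error
mean_resid <- mean(residuals(fit))
abs(mean_resid) < 1e-10
```

The same check on this assignment's model would be mean(mod1$residuals).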

hist(mod1$residuals)

The histogram of residuals is skewed to the right: the largest positive residuals (5.53, 5.53, and 6.53) lie farther from zero than the most negative residual (-4.33). These large positive residuals look like outliers that stretch the upper tail. The plot does not satisfy the fifth condition (normality) because the residuals do not follow a normal distribution.

# Q-Q plot of the model residuals (not of simulated normal draws)
qqnorm(mod1$residuals)
qqline(mod1$residuals)

A normal Q-Q plot checks whether the residuals are normally distributed. It does not assess constant variance (homoscedasticity), which is judged from the plot of residuals versus fitted values. Most of the points fall close to the reference line, but there is some curvature at two or three points in the tails, which indicates that the residuals may be skewed. The curvature may also be another indication of outliers in the data. This conclusion agrees with the histogram: the residuals do not appear to be normally distributed.

Perch

A.

mod1 = lm(Weight ~ Length, data = Perch)
plot(Weight ~ Length, data = Perch)
abline(mod1)

The plot shows that the means of Y do not vary as a linear function of X: the points follow a curve rather than a line. Additionally, the plot indicates heteroscedasticity, which means the variance of Weight is not the same for all values of Length. The linearity condition is clearly not met.

B.

mod2 = lm(log(Weight) ~ Length, data = Perch)
plot(log(Weight) ~ Length, data = Perch)
abline(mod2)

I tried using a logarithmic function with the response variable. The plot did not demonstrate linearity.

mod3 = lm(Weight ~ log(Length), data = Perch)
plot(Weight ~ log(Length), data = Perch)
abline(mod3)

I tried using a logarithmic function with the predictor variable. This plot did not demonstrate linearity either.

mod4 = lm(log(Weight) ~ log(Length), data = Perch)
plot(log(Weight) ~ log(Length), data = Perch)
abline(mod4)

Lastly, I tried using a logarithmic function with both the response and predictor variables. The plot did indeed demonstrate linearity. The summary output for this model is provided below:

mod4 = lm(log(Weight) ~ log(Length), data = Perch)
summary(mod4)

## 
## Call:
## lm(formula = log(Weight) ~ log(Length), data = Perch)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.21976 -0.07130 -0.00080  0.03965  0.36650 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.07884    0.15212  -33.39   <2e-16 ***
## log(Length)  3.16269    0.04542   69.64   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1157 on 54 degrees of freedom
## Multiple R-squared:  0.989,  Adjusted R-squared:  0.9888 
## F-statistic:  4849 on 1 and 54 DF,  p-value: < 2.2e-16
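On the original scale, a straight line in log(Weight) versus log(Length) corresponds to a power relationship, Weight = exp(b0) * Length^b1. A short sketch of that back-transformation, using the coefficients from the summary above (the names b0, b1, power_weight and log_scale_weight are hypothetical, and R's log() is the natural logarithm):

```r
# Coefficients copied from summary(mod4)
b0 <- -5.07884   # intercept on the log scale
b1 <- 3.16269    # slope for log(Length)

# Two equivalent ways to write the fitted curve on the original scale
power_weight     <- function(len) exp(b0) * len^b1
log_scale_weight <- function(len) exp(b0 + b1 * log(len))

# Both forms agree for an illustrative length of 20
all.equal(power_weight(20), log_scale_weight(20))

# Factor by which predicted weight changes when length doubles
2^b1
```

A slope a little above 3 is plausible for fish, since weight scales roughly with volume.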

C.

hist(mod4$residuals)

Using my transformation, the histogram shows an approximately normal distribution of the residuals. There is no indication of outliers skewing the data, and most of the residuals are centered around 0. There is a slight hint of a longer right tail, but it is mild. Overall, the plot satisfies the fifth condition (normality).

plot(mod4$residuals ~ mod4$fitted.values)
abline(a = 0, b = 0)

Also using my transformation, the plot of residuals versus fitted values shows the points fairly evenly scattered around zero. The scatter is not perfectly even, because a majority of the points fall toward the right side of the plot. Additionally, there appears to be at least one outlier, which may contribute to skewness in the residuals. However, there is no clear shape to the plot, which supports linearity. I conclude that the plot satisfies the second condition: the distribution of the errors is centered at zero.

mod4 = lm(log(Weight) ~ log(Length), data = Perch)
plot(log(Weight) ~ log(Length), data = Perch)
abline(mod4)

Lastly, I feel more confident about my transformation because the least squares line fits the points very well. The spread of log(Weight) around the line is about the same at each value of log(Length), which indicates homoscedasticity. This shows that the new equation predicts values relatively close to the actual values. The third condition (constant variance) is satisfied because the variance is about the same for almost all values of X.

D.

lm(log(Weight) ~ log(Length), data = Perch)

## 
## Call:
## lm(formula = log(Weight) ~ log(Length), data = Perch)
## 
## Coefficients:
## (Intercept)  log(Length)  
##      -5.079        3.163

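The least squares line equation is log(Weight) = 3.163 log(Length) - 5.079, where log is the natural logarithm. Because the model is fit on the log scale, the prediction for a perch that is 19 cm long uses log(19) and is then back-transformed: log(Weight) = 3.163 log(19) - 5.079 ≈ 3.163(2.9444) - 5.079 ≈ 4.234, so Weight ≈ e^4.234 ≈ 69.0 grams.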
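Because mod4 is fit on the log scale, a prediction in grams needs log(19) as the input and exp() applied to the result. A minimal sketch using the rounded coefficients printed above (the variable names are hypothetical):

```r
# Rounded coefficients copied from the lm() output above
b0 <- -5.079   # intercept on the log scale
b1 <- 3.163    # slope for log(Length)

# Predict on the log scale, then back-transform to grams
log_pred    <- b0 + b1 * log(19)   # log() is the natural log in R
weight_pred <- exp(log_pred)
round(weight_pred, 1)
```

With the model object available, exp(predict(mod4, newdata = data.frame(Length = 19))) should give nearly the same value, differing only through rounding of the coefficients.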

Shreya Patnaik - STOR 455 HW1
