Name: Kirk Swanson

Collaborators: None

Stat 202 HW 1

1) Sparrows: Complete exercise 1.2 in the text. The slope of the sparrows regression line is 0.467 grams per mm. This means that when one sparrow's wing length is 1 mm longer than another's, we expect its weight to be 0.467 grams greater than the other sparrow's.

2) Sparrows: Complete exercise 1.4 in the text. The regression standard error estimate is labeled S in Minitab output, so we see that the size of a typical error is 1.39959 grams.
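For reference, the quantity Minitab labels S is the usual regression standard error for a simple linear model. The symbols below (n observations, observed weights \(y_i\), fitted weights \(\hat{y}_i\)) are standard notation assumed here, not taken from the exercise. S is the square root of the residual sum of squares divided by its degrees of freedom:

\[ S = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}} \]

Because S is in the units of the response, it reads directly as a typical prediction error in grams.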

# Use R as your calculator.

3) Cereal: Complete exercise 1.8 in the text.

load("Cereal.RData")
plot(Cereal$Sugar, Cereal$Calories)

[Plot: Calories versus Sugar]

cor(Cereal$Sugar, Cereal$Calories)
## [1] 0.5154

The scatterplot shows a positive, roughly linear trend with sugar as the predictor variable and calories as the response variable. The correlation coefficient of approximately 0.52 supports this, so we will fit a linear model.

m <- lm(Cereal$Calories ~ Cereal$Sugar)
# Redraw the scatterplot in this chunk so abline() has an open plot to draw on
plot(Cereal$Sugar, Cereal$Calories)
abline(m, col = "Red")
m
## 
## Call:
## lm(formula = Cereal$Calories ~ Cereal$Sugar)
## 
## Coefficients:
##  (Intercept)  Cereal$Sugar  
##        87.43          2.48

Fill in values for a and b below. \[ \widehat{\text{calories}} = 87.428 + 2.481 \cdot \text{sugar} \] The slope is 2.481 calories per gram of sugar, with both measured per serving. When one cereal has one more gram of sugar per serving than another, we expect it to have 2.481 more calories per serving.
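To make the slope interpretation concrete, the fitted equation can be evaluated by hand in R. This is only a sketch: it hard-codes the rounded coefficients reported above, and the helper name `predict_calories` and the sugar values of 5 g and 6 g are illustrative assumptions, not part of the exercise.

```r
# Rounded coefficients copied from the fitted line above
a <- 87.428   # intercept: predicted calories at 0 g of sugar
b <- 2.481    # slope: calories per additional gram of sugar

# Hypothetical helper: predicted calories per serving for a given sugar content
predict_calories <- function(sugar) a + b * sugar

# Two illustrative cereals that differ by exactly 1 g of sugar per serving
pred <- predict_calories(c(5, 6))
pred          # predicted calories for 5 g and 6 g of sugar
diff(pred)    # the difference equals the slope, 2.481
```

With the data loaded, `predict(m, ...)` would give the same kind of answer from the unrounded coefficients.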

4) Caterpillar waste (fun!): Complete exercise 1.12 in the text.

load("Caterpillars.RData")
plot(Caterpillars$Mass, Caterpillars$WetFrass)

[Plot: WetFrass versus Mass]

plot(Caterpillars$LogMass, Caterpillars$LogWetFrass)

[Plot: LogWetFrass versus LogMass]

On the original scale (WetFrass versus Mass) there is a general positive trend, but the scatterplot shows substantial curvature as mass increases.

# Redraw the log-log scatterplot in this chunk so abline() has an open plot to draw on
plot(Caterpillars$LogMass, Caterpillars$LogWetFrass)
abline(lm(Caterpillars$LogWetFrass ~ Caterpillars$LogMass), col = "Red")
lm(Caterpillars$LogWetFrass ~ Caterpillars$LogMass)
## 
## Call:
## lm(formula = Caterpillars$LogWetFrass ~ Caterpillars$LogMass)
## 
## Coefficients:
##          (Intercept)  Caterpillars$LogMass  
##               -0.739                 1.054
symb <- 1:5
plot(Caterpillars$LogMass, Caterpillars$LogWetFrass, pch = symb[Caterpillars$Instar])

[Plot: LogWetFrass versus LogMass, plotting symbol by Instar]

Based on the symbol groupings, the relationship between LogWetFrass and LogMass (both base-10 logs) appears most linear for instar stage five.

symb <- 1:2
plot(Caterpillars$LogMass, Caterpillars$LogWetFrass, pch = symb[Caterpillars$Fgp])

[Plot: LogWetFrass versus LogMass, plotting symbol by Fgp]

Based on the symbol groupings, the relationship between LogWetFrass and LogMass (both base-10 logs) appears most linear for growth period two.
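The log-log line fitted earlier in this problem (intercept -0.739, slope 1.054) can also be read on the original scale. Taking the logs as base 10, as stated above, and using the rounded coefficients from the output, exponentiating both sides gives a power-law relationship:

\[ \log_{10}(\widehat{\text{WetFrass}}) = -0.739 + 1.054 \cdot \log_{10}(\text{Mass}) \quad\Longrightarrow\quad \widehat{\text{WetFrass}} = 10^{-0.739} \cdot \text{Mass}^{1.054} \approx 0.182 \cdot \text{Mass}^{1.054} \]

An exponent close to 1 suggests that predicted wet frass grows roughly in proportion to mass.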

5) Baseball: Complete exercise 1.27 in the text.

load("BaseballTimes.RData")
hist(BaseballTimes$Time, breaks = 10)

[Plot: histogram of Time]

summary(BaseballTimes$Time)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     133     154     168     183     194     317
sd(BaseballTimes$Time)
## [1] 46.21

The histogram shows a right-skewed distribution with a large outlier at 317 minutes. The mean is 182.7 minutes with a standard deviation of 46.2 minutes. Looking at the data table, the longest game was also the game with the most pitchers, which hints at a possible explanation, although I don't know much about baseball. It also had the second-largest attendance and one of the smallest margins.
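The claim that 317 minutes is an outlier can be checked numerically. The sketch below applies the common 1.5 × IQR rule and a simple distance-from-the-mean calculation to the summary values printed above. Those values are rounded in the output, so the results are approximate, and the variable names are illustrative.

```r
# Summary values copied from the output above (minutes)
q1 <- 154
q3 <- 194
game_mean <- 182.7
game_sd   <- 46.21
longest   <- 317

# 1.5 * IQR rule: values above the upper fence are flagged as outliers
upper_fence <- q3 + 1.5 * (q3 - q1)
upper_fence            # 254
longest > upper_fence  # TRUE, so the 317-minute game is flagged

# Distance of the longest game from the mean, in standard deviations
z <- (longest - game_mean) / game_sd
round(z, 2)            # about 2.91
```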

par(mfrow = c(2, 2))
plot(BaseballTimes$Runs, BaseballTimes$Time)
plot(BaseballTimes$Margin, BaseballTimes$Time)
plot(BaseballTimes$Pitchers, BaseballTimes$Time)
plot(BaseballTimes$Attendance, BaseballTimes$Time)

[Plot: 2 x 2 panel of Time versus Runs, Margin, Pitchers, and Attendance]

summary(lm(BaseballTimes$Time ~ BaseballTimes$Pitchers))
## 
## Call:
## lm(formula = BaseballTimes$Time ~ BaseballTimes$Pitchers)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -37.94  -8.44  -3.10   9.75  50.79 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               94.84      13.39    7.08  8.2e-06 ***
## BaseballTimes$Pitchers    10.71       1.49    7.21  6.9e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.5 on 13 degrees of freedom
## Multiple R-squared:   0.8,   Adjusted R-squared:  0.784 
## F-statistic: 51.9 on 1 and 13 DF,  p-value: 6.88e-06
plot(BaseballTimes$Time ~ BaseballTimes$Pitchers)
abline(lm(BaseballTimes$Time ~ BaseballTimes$Pitchers), col = "Red")

[Plot: Time versus Pitchers with fitted line]

Fill in values for a, b, and the predictor below. \[ \widehat{\text{Time}} = 94.84 + 10.71\cdot \text{Pitchers} \]
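The summary output for the regression of Time on Pitchers can be sanity-checked with two standard identities for simple linear regression. The sketch below uses only the rounded values printed in that output, so agreement is up to rounding; the variable names are illustrative.

```r
# Values copied from the summary output above
t_slope   <- 7.21   # t statistic for the Pitchers slope
f_stat    <- 51.9   # overall F statistic
r_squared <- 0.8    # Multiple R-squared

# With one predictor, the F statistic equals the squared slope t statistic
t_slope^2            # 51.98, matching F = 51.9 up to rounding

# The correlation is the square root of R-squared, with the sign of the slope
r <- sqrt(r_squared) # the slope is positive, so r is positive
round(r, 3)          # 0.894
```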

# Take residuals from the fitted model rather than from the rounded coefficients
resid <- residuals(lm(BaseballTimes$Time ~ BaseballTimes$Pitchers))
qqnorm(resid)
qqline(resid)

[Plot: normal Q-Q plot of residuals]

plot(BaseballTimes$Pitchers, resid)
abline(0, 0)

[Plot: residuals versus Pitchers]

6) Caterpillar Metabolic Rates: Complete exercise 1.30 in the text.

load("Metabolic.RData")
plot(MetabolicRate$Computer, MetabolicRate$Mrate)

[Plot: Mrate versus Computer]

plot(MetabolicRate$BodySize, MetabolicRate$Mrate)

[Plot: Mrate versus BodySize]

plot(MetabolicRate$LogBodySize, MetabolicRate$Mrate)

[Plot: Mrate versus LogBodySize]

plot(MetabolicRate$LogBodySize, MetabolicRate$LogMrate)

[Plot: LogMrate versus LogBodySize]

plot(MetabolicRate$BodySize, MetabolicRate$LogMrate)

[Plot: LogMrate versus BodySize]

m <- lm(MetabolicRate$LogMrate ~ MetabolicRate$LogBodySize)
summary(m)
## 
## Call:
## lm(formula = MetabolicRate$LogMrate ~ MetabolicRate$LogBodySize)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.5920 -0.1119  0.0036  0.1212  0.4729 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 1.3066     0.0136    96.3   <2e-16 ***
## MetabolicRate$LogBodySize   0.9164     0.0124    74.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.175 on 303 degrees of freedom
## Multiple R-squared:  0.948,  Adjusted R-squared:  0.948 
## F-statistic: 5.51e+03 on 1 and 303 DF,  p-value: <2e-16
plot(MetabolicRate$LogBodySize, MetabolicRate$LogMrate)
abline(m, col = "Red")

[Plot: LogMrate versus LogBodySize with fitted line]

m
## 
## Call:
## lm(formula = MetabolicRate$LogMrate ~ MetabolicRate$LogBodySize)
## 
## Coefficients:
##               (Intercept)  MetabolicRate$LogBodySize  
##                     1.307                      0.916

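For a caterpillar with a body size of 1 gram, LogBodySize = log(1) = 0, so we predict LogMrate = 1.3066 + 0.9164(0) = 1.3066. Assuming the log variables are base-10 logs, this back-transforms to Mrate = 10^1.3066, or approximately 20.26 metabolic rate units.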
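The prediction for a 1-gram caterpillar can be recomputed in R as a check on the hand calculation. This is a sketch that hard-codes the rounded coefficients from the output above and assumes LogBodySize and LogMrate are base-10 logs. If they were natural logs, the back-transform would use `exp()` instead. The variable names are illustrative.

```r
# Rounded coefficients copied from the fitted model above
b0 <- 1.3066   # intercept
b1 <- 0.9164   # slope on LogBodySize

# Assumption: LogBodySize and LogMrate are base-10 logs
body_size <- 1                            # grams
log_mrate <- b0 + b1 * log10(body_size)   # log10(1) = 0, so this is the intercept
mrate     <- 10^log_mrate                 # back-transform to the original scale

log_mrate        # 1.3066
round(mrate, 2)  # about 20.26
```

Since the log of 1 is 0 in any base, the predicted LogMrate equals the intercept; only the back-transformed rate depends on the base.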