Problem 47 Chapter 8

## enter the data: two-variable quantitative data
FatGrams = c(19,31,34,35,39,39,43)
Calories = c(410,580,590,570,640,680,660)

## make the scatterplot
plot(FatGrams, Calories, col = "purple", type ='p', pch = 16)

## calculate the linear regression model
lm.r = lm(Calories~FatGrams)

## add the regression line to the scatterplot
abline(lm.r, col = "dark green")

## compute the correlation coefficient
cor(FatGrams, Calories)
## [1] 0.9606329
## summary() reports the coefficients, standard errors, t-tests, and R-squared
summary(lm.r)
## 
## Call:
## lm(formula = Calories ~ FatGrams)
## 
## Residuals:
##       1       2       3       4       5       6       7 
## -11.009  26.325   3.159 -27.897  -2.119  37.881 -26.341 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   210.95      50.10   4.211 0.008404 ** 
## FatGrams       11.06       1.43   7.732 0.000578 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27.33 on 5 degrees of freedom
## Multiple R-squared:  0.9228, Adjusted R-squared:  0.9074 
## F-statistic: 59.78 on 1 and 5 DF,  p-value: 0.0005782
## look at the residuals
resid(lm.r)
##          1          2          3          4          5          6 
## -11.008600  26.325254   3.158718 -27.896794  -2.118843  37.881157 
##          7 
## -26.340891
## plot the residuals against the predictor (FatGrams), not the response
plot(FatGrams, resid(lm.r), col = "red", type = 'p', pch = 16, main = "Residual Plot")
abline(h = 0, lty = 2)

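  1. You cannot estimate a burger's fat content by solving the Calories-on-FatGrams equation for FatGrams, because regressing x on y gives a different line than regressing y on x. To predict fat from calories, you have to fit the reverse model, with FatGrams as the response.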
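To see why the two regressions differ: the slope of Calories on FatGrams is r * s_Calories / s_FatGrams, while the slope of FatGrams on Calories is r * s_FatGrams / s_Calories, so their product is r^2 rather than 1. Solving the first equation for FatGrams would reproduce the second line only if r were exactly 1 or -1. A short check on the same data:

```r
# Burger data from Problem 47
FatGrams <- c(19, 31, 34, 35, 39, 39, 43)
Calories <- c(410, 580, 590, 570, 640, 680, 660)

# slope of Calories regressed on FatGrams (y on x)
b_cal_on_fat <- coef(lm(Calories ~ FatGrams))[["FatGrams"]]
# slope of FatGrams regressed on Calories (x on y)
b_fat_on_cal <- coef(lm(FatGrams ~ Calories))[["Calories"]]
r <- cor(FatGrams, Calories)

# If one line were the algebraic inverse of the other, this product would be 1;
# for two least-squares lines it equals r^2 instead.
slope_product <- b_cal_on_fat * b_fat_on_cal
print(c(slope_product = slope_product, r_squared = r^2))
```

Here the product is about 0.92, so inverting the first equation gives a slightly different answer than fitting the reverse regression, and the gap grows as the correlation weakens.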

## enter the data: two-variable quantitative data
FatGrams = c(19,31,34,35,39,39,43)
Calories = c(410,580,590,570,640,680,660)

## make the scatterplot
plot(Calories, FatGrams, col = "purple", type ='p', pch = 16)

## calculate the linear regression model
lm.r = lm(FatGrams~Calories)

## add the regression line to the scatterplot
abline(lm.r, col = "dark green")

## compute the correlation coefficient
cor(FatGrams, Calories)
## [1] 0.9606329
## summary() reports the coefficients, standard errors, t-tests, and R-squared
summary(lm.r)
## 
## Call:
## lm(formula = FatGrams ~ Calories)
## 
## Residuals:
##       1       2       3       4       5       6       7 
## -0.2609 -2.4510 -0.2857  2.3837  0.5407 -2.7981  2.8713 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -14.96222    6.43253  -2.326 0.067545 .  
## Calories      0.08347    0.01080   7.732 0.000578 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.375 on 5 degrees of freedom
## Multiple R-squared:  0.9228, Adjusted R-squared:  0.9074 
## F-statistic: 59.78 on 1 and 5 DF,  p-value: 0.0005782
## look at the residuals
resid(lm.r)
##          1          2          3          4          5          6 
## -0.2609209 -2.4510035 -0.2857143  2.3837072  0.5407320 -2.7981110 
##          7 
##  2.8713105
plot(Calories,resid(lm.r), col = "red", type ='p', pch = 16, main = "Residual Plot")

## predicted fat (g) for a 600-calorie burger
lm.r$coefficients[1] + lm.r$coefficients[2] * 600
## (Intercept) 
##    35.12043
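The same prediction can be made with `predict()`, which looks up the coefficients by name instead of by position and can also return a prediction interval for a single burger. A minimal sketch, refitting the `FatGrams ~ Calories` model so the block runs on its own:

```r
FatGrams <- c(19, 31, 34, 35, 39, 39, 43)
Calories <- c(410, 580, 590, 570, 640, 680, 660)
fat_on_cal <- lm(FatGrams ~ Calories)

# predicted fat (g) for a 600-calorie burger; about 35.12 g, as in the manual calculation above
fat_600 <- predict(fat_on_cal, newdata = data.frame(Calories = 600))
print(fat_600)

# 95% prediction interval for one 600-calorie burger
predict(fat_on_cal, newdata = data.frame(Calories = 600), interval = "prediction")
```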

Problem 31

  1. I removed the Costa Rica data point because it is implausible that women there give birth to 25 children on average.

## enter the data: two-variable quantitative data
BirthsPerWoman = c(2.3,2.3,1.7,3.0,3.7,2.3,1.5,2.0,2.4,2.8,2.7,2.8,4.4,3.6,2.4,2.2,3.2,2.6,3.7,2.8,1.9,2.0,2.1,2.7,2.2)
LifeExp = c(74.6,70.5,75.4,71.9,64.5,70.9,79.8,78.0,72.6,67.8,74.5,71.1,67.6,68.2,70.8,75.1,70.1,75.1,71.2,70.4,77.5,77.4,75.2,73.7,78.6)

## make the scatterplot
plot(BirthsPerWoman, LifeExp, col = "purple", type ='p', pch = 16)

## calculate the linear regression model
lm.r = lm(LifeExp~BirthsPerWoman)

## add the regression line to the scatterplot
abline(lm.r, col = "dark green")

## compute the correlation coefficient
cor(BirthsPerWoman, LifeExp)
## [1] -0.7956443
## summary() reports the coefficients, standard errors, t-tests, and R-squared
summary(lm.r)
## 
## Call:
## lm(formula = LifeExp ~ BirthsPerWoman)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.2653 -1.5492  0.3147  1.9628  3.8707 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     84.4971     1.9019  44.427  < 2e-16 ***
## BirthsPerWoman  -4.4399     0.7048  -6.299 1.99e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.387 on 23 degrees of freedom
## Multiple R-squared:  0.633,  Adjusted R-squared:  0.6171 
## F-statistic: 39.68 on 1 and 23 DF,  p-value: 1.991e-06
## look at the residuals
resid(lm.r)
##           1           2           3           4           5           6 
##  0.31474220 -3.78525780 -1.54921510  0.72269239 -3.56935743 -3.38525780 
##           7           8           9          10          11          12 
##  1.96279913  2.38276355 -1.24126491 -4.26529338  1.99071374 -0.96529338 
##          13          14          15          16          17          18 
##  2.63859276 -0.31335031 -3.04126491  0.37074932 -0.18932184  2.14672085 
##          19          20          21          22          23          24 
##  3.13064257 -1.66529338  1.43877067  1.78276355  0.02675644  1.19071374 
##          25 
##  3.87074932
plot(BirthsPerWoman,resid(lm.r), col = "red", type ='p', pch = 16, main = "Residual Plot")

  1. The correlation is -0.7956443, meaning there is a fairly strong negative linear relationship between births per woman and life expectancy. R-squared is 0.633, meaning that about 63% of the variation in life expectancy is explained by the linear relationship with births per woman.

  2. predicted LifeExp = 84.50 - 4.44 * BirthsPerWoman

  3. Yes, the line is appropriate, since the residual plot shows random scatter with no curve or other pattern.

  4. For each additional birth per woman, predicted life expectancy decreases by about 4.4 years. The intercept says that a country averaging 0 births per woman would have a predicted life expectancy of about 84.5 years, but this is extrapolation far beyond the data (the lowest observed value is 1.5 births per woman).

  5. They could, but they could also help more directly, for example by building more hospitals and providing better education and services.
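Answers 1 and 4 can be checked directly: in simple linear regression, R-squared equals the squared correlation, and the slope is the difference between predictions one birth per woman apart. A sketch on the same data (Costa Rica already removed); the choice of 2 and 3 births is arbitrary, since any two values one unit apart give the same difference:

```r
BirthsPerWoman <- c(2.3,2.3,1.7,3.0,3.7,2.3,1.5,2.0,2.4,2.8,2.7,2.8,4.4,3.6,2.4,2.2,3.2,2.6,3.7,2.8,1.9,2.0,2.1,2.7,2.2)
LifeExp <- c(74.6,70.5,75.4,71.9,64.5,70.9,79.8,78.0,72.6,67.8,74.5,71.1,67.6,68.2,70.8,75.1,70.1,75.1,71.2,70.4,77.5,77.4,75.2,73.7,78.6)
life_fit <- lm(LifeExp ~ BirthsPerWoman)

# Answer 1: R-squared is the squared correlation (about 0.633)
r <- cor(BirthsPerWoman, LifeExp)
r_sq <- summary(life_fit)$r.squared

# Answer 4: predictions at 2 and 3 births per woman differ by the slope (about -4.44 years)
pred <- predict(life_fit, newdata = data.frame(BirthsPerWoman = c(2, 3)))
change_per_birth <- unname(pred[2] - pred[1])
print(c(r_squared = r_sq, r_squared_from_r = r^2, change_per_birth = change_per_birth))
```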

Problem 32

## enter the data: two-variable quantitative data
AverageSpeed = c(25.3,24.3,27.3,40.3,39.56,40.02,39.93,40.94,40.53,41.65,40.78,38.97,40.50)
YearPassed1900 = c(3,4,5,99,100,101,102,103,104,105,106,107,108)

## make the scatterplot
plot(YearPassed1900, AverageSpeed, col = "purple", type ='p', pch = 16)

## calculate the linear regression model
lm.r = lm(AverageSpeed~YearPassed1900)

## add the regression line to the scatterplot
abline(lm.r, col = "dark green")

## compute the correlation coefficient
cor(YearPassed1900, AverageSpeed)
## [1] 0.9900163
## summary() reports the coefficients, standard errors, t-tests, and R-squared
summary(lm.r)
## 
## Call:
## lm(formula = AverageSpeed ~ YearPassed1900)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.85605 -0.23521  0.07753  0.65206  1.49483 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    25.068848   0.574205   43.66 1.11e-13 ***
## YearPassed1900  0.147264   0.006322   23.30 1.03e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9573 on 11 degrees of freedom
## Multiple R-squared:  0.9801, Adjusted R-squared:  0.9783 
## F-statistic: 542.7 on 1 and 11 DF,  p-value: 1.035e-10
## look at the residuals
resid(lm.r)
##          1          2          3          4          5          6 
## -0.2106385 -1.3579021  1.4948343  0.6520568 -0.2352068  0.0775296 
##          7          8          9         10         11         12 
## -0.1597340  0.7030024  0.1457388  1.1184752  0.1012116 -1.8560519 
##         13 
## -0.4733155
plot(YearPassed1900,resid(lm.r), col = "red", type ='p', pch = 16, main = "Residual Plot")

  1. The relationship between average speed and year is very strong, positive, and linear, with r = 0.99. However, there is no data between 1905 and 1999, so the high correlation may come mainly from the gap between two clusters of points rather than from a steady trend, which makes me doubt that it is really that strong.

  2. predicted AverageSpeed = 25.07 + 0.147 * YearPassed1900

  3. Yes, the relationship is clearly linear and positive, with a strong correlation.
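One way to probe the doubt raised in answer 1 is to compute the correlation separately within each cluster of years. A sketch, assuming a split at YearPassed1900 < 50 (the early 1903 to 1905 races versus 1999 to 2008); note that the early cluster has only three points:

```r
AverageSpeed <- c(25.3,24.3,27.3,40.3,39.56,40.02,39.93,40.94,40.53,41.65,40.78,38.97,40.50)
YearPassed1900 <- c(3,4,5,99,100,101,102,103,104,105,106,107,108)

early <- YearPassed1900 < 50   # TRUE for 1903-1905, FALSE for 1999-2008

r_all    <- cor(YearPassed1900, AverageSpeed)
r_early  <- cor(YearPassed1900[early], AverageSpeed[early])
r_recent <- cor(YearPassed1900[!early], AverageSpeed[!early])
print(round(c(all = r_all, early = r_early, recent = r_recent), 3))
```

Within each cluster the correlation is much weaker (roughly 0.65 for the three early years and 0.15 for the recent ones), so most of the overall r comes from the jump between the clusters.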

Problem 34

## enter the data: two-variable quantitative data
AverageSpeed = c(40.3,39.56,40.02,39.93,40.94,40.53,41.65,40.78,38.97,40.50)
YearPassed1900 = c(99,100,101,102,103,104,105,106,107,108)

## make the scatterplot
plot(YearPassed1900, AverageSpeed, col = "purple", type ='p', pch = 16)

## calculate the linear regression model
lm.r = lm(AverageSpeed~YearPassed1900)

## add the regression line to the scatterplot
abline(lm.r, col = "dark green")

## compute the correlation coefficient
cor(YearPassed1900, AverageSpeed)
## [1] 0.1518561
## summary() reports the coefficients, standard errors, t-tests, and R-squared
summary(lm.r)
## 
## Call:
## lm(formula = AverageSpeed ~ YearPassed1900)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4799 -0.2995  0.0820  0.3241  1.2754 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    36.41636    8.98195   4.054  0.00366 **
## YearPassed1900  0.03770    0.08675   0.435  0.67537   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7879 on 8 degrees of freedom
## Multiple R-squared:  0.02306,    Adjusted R-squared:  -0.09906 
## F-statistic: 0.1888 on 1 and 8 DF,  p-value: 0.6754
## look at the residuals
resid(lm.r)
##           1           2           3           4           5           6 
##  0.15163636 -0.62606061 -0.20375758 -0.33145455  0.64084848  0.19315152 
##           7           8           9          10 
##  1.27545455  0.36775758 -1.47993939  0.01236364
plot(YearPassed1900,resid(lm.r), col = "red", type ='p', pch = 16, main = "Residual Plot")

  1. The relationship is very weak (r = 0.15, R-squared = 0.023). With only the recent years there is no clear linear pattern, so the conditions for a useful regression are not met.

  2. The slope is almost 0 (0.038 per year, p = 0.675), meaning that year and average speed are nearly unrelated over these years.

  3. Bernard Hinault, because for his era his speed was more standard deviations from the mean than Lance Armstrong's was in 2005.
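Answers 2 and 3 can be backed with numbers. The 95% confidence interval from `confint()` shows whether 0 is a plausible slope, and a z-score measures how many standard deviations a value lies from a mean. The z-score below assumes that the comparison group for Armstrong's 2005 speed is the 1999 to 2008 winning speeds in this data (2005 is YearPassed1900 = 105); Hinault's z-score would need speeds from his era, which are not in this data.

```r
AverageSpeed <- c(40.3,39.56,40.02,39.93,40.94,40.53,41.65,40.78,38.97,40.50)
YearPassed1900 <- c(99,100,101,102,103,104,105,106,107,108)
recent_fit <- lm(AverageSpeed ~ YearPassed1900)

# Answer 2: 95% confidence interval for the slope; it contains 0
slope_ci <- confint(recent_fit, "YearPassed1900", level = 0.95)
print(slope_ci)

# Answer 3 (assumed comparison group): z-score of the 2005 speed among the 1999-2008 speeds
speed_2005 <- AverageSpeed[YearPassed1900 == 105]
z_2005 <- (speed_2005 - mean(AverageSpeed)) / sd(AverageSpeed)
print(round(z_2005, 2))
```

With these assumptions the 2005 speed sits roughly 1.8 standard deviations above the 1999 to 2008 mean.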