Suppose that you want to build a regression model that predicts the weight of individuals using a data set named bdims. To answer the questions below, replace both the name of the data and the name of the variables in the given code below.

To learn more about the bdims data, including the name of the variables, Google “openintro r package” and see its manual, which is usually posted as a pdf file on the CRAN website. Search bdims within the manual. Or click the link here.

Q1. Describe the relationship between the two variables - height (hgt) and weight (wgt).

Hint: Interpret both the direction and the magnitude of the relationship. For the direction of the relationship, see the scatterplot: an upward sloping line indicates a positive relationship while a downward sloping line suggests a negative relationship. A positive (negative) relationship is also reflected as a positive (negative) correlation coefficient. For the magnitude of the relationship, see the absolute value of the correlation coefficient: the relationship may be viewed as strong when the coefficient’s absolute value is greater than 0.6; moderate when it’s greater than 0.4 but less than 0.6; and weak when it’s less than 0.4. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association.

According to the scatterplot, height and weight have a strong positive relationship: the correlation coefficient is 0.72, and the plotted points trend upward to the right.

## 'data.frame':    507 obs. of  25 variables:
##  $ bia.di: num  42.9 43.7 40.1 44.3 42.5 43.3 43.5 44.4 43.5 42 ...
##  $ bii.di: num  26 28.5 28.2 29.9 29.9 27 30 29.8 26.5 28 ...
##  $ bit.di: num  31.5 33.5 33.3 34 34 31.5 34 33.2 32.1 34 ...
##  $ che.de: num  17.7 16.9 20.9 18.4 21.5 19.6 21.9 21.8 15.5 22.5 ...
##  $ che.di: num  28 30.8 31.7 28.2 29.4 31.3 31.7 28.8 27.5 28 ...
##  $ elb.di: num  13.1 14 13.9 13.9 15.2 14 16.1 15.1 14.1 15.6 ...
##  $ wri.di: num  10.4 11.8 10.9 11.2 11.6 11.5 12.5 11.9 11.2 12 ...
##  $ kne.di: num  18.8 20.6 19.7 20.9 20.7 18.8 20.8 21 18.9 21.1 ...
##  $ ank.di: num  14.1 15.1 14.1 15 14.9 13.9 15.6 14.6 13.2 15 ...
##  $ sho.gi: num  106 110 115 104 108 ...
##  $ che.gi: num  89.5 97 97.5 97 97.5 ...
##  $ wai.gi: num  71.5 79 83.2 77.8 80 82.5 82 76.8 68.5 77.5 ...
##  $ nav.gi: num  74.5 86.5 82.9 78.8 82.5 80.1 84 80.5 69 81.5 ...
##  $ hip.gi: num  93.5 94.8 95 94 98.5 95.3 101 98 89.5 99.8 ...
##  $ thi.gi: num  51.5 51.5 57.3 53 55.4 57.5 60.9 56 50 59.8 ...
##  $ bic.gi: num  32.5 34.4 33.4 31 32 33 42.4 34.1 33 36.5 ...
##  $ for.gi: num  26 28 28.8 26.2 28.4 28 32.3 28 26 29.2 ...
##  $ kne.gi: num  34.5 36.5 37 37 37.7 36.6 40.1 39.2 35.5 38.3 ...
##  $ cal.gi: num  36.5 37.5 37.3 34.8 38.6 36.1 40.3 36.7 35 38.6 ...
##  $ ank.gi: num  23.5 24.5 21.9 23 24.4 23.5 23.6 22.5 22 22.2 ...
##  $ wri.gi: num  16.5 17 16.9 16.6 18 16.9 18.8 18 16.5 16.9 ...
##  $ age   : int  21 23 28 23 22 21 26 27 23 21 ...
##  $ wgt   : num  65.6 71.8 80.7 72.6 78.8 74.8 86.4 78.4 62 81.6 ...
##  $ hgt   : num  174 175 194 186 187 ...
##  $ sex   : int  1 1 1 1 1 1 1 1 1 1 ...

## [1] 0.7173011
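The rule of thumb in the Q1 hint can be sketched as a small helper. This is only an illustration: the function name `describe_correlation` is hypothetical, and the treatment of values exactly equal to 0.4 or 0.6 is an assumption, since the hint does not say which band they belong to.

```r
# Classify a correlation coefficient using the thresholds from the Q1 hint.
# Assumption: values exactly equal to 0.4 or 0.6 fall into the lower band.
describe_correlation <- function(r) {
  stopifnot(is.numeric(r), length(r) == 1, abs(r) <= 1)
  if (r == 0) {
    return("no linear relationship")
  }
  direction <- if (r > 0) "positive" else "negative"
  strength <- if (abs(r) > 0.6) {
    "strong"
  } else if (abs(r) > 0.4) {
    "moderate"
  } else {
    "weak"
  }
  paste(strength, direction)
}

# Correlation between hgt and wgt reported in the output above
describe_correlation(0.7173011)
```

The label describes association only; as the hint notes, it says nothing about causation.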

Run a regression model for weight (wgt) with one explanatory variable, height (hgt), and answer Q2 through Q5.

Q2. See model 1. Is the coefficient of height statistically significant at 5%? Interpret the coefficient.

Hint: One place where this information is stored is the last column on the far right, Pr(>|t|), under Coefficients. One could conclude that the coefficient is significant at 0.1% if Pr < 0.001 (three stars); significant at 1% if Pr < 0.01 (two stars); and significant at 5% if Pr < 0.05 (one star). When significant, changes in the explanatory variable are highly likely to be meaningful in explaining changes in the response (or dependent) variable. The same can be said for the y-intercept. When interpreting the magnitude of the coefficient, make sure that you use the correct unit of the data. The definition of the variables can be found in the manual of the openintro package.

Yes, the coefficient is statistically significant at 5%. It has three stars in the model output, which means it is significant even at the 0.1% level. This suggests that an increase in height is associated with an increase in weight.
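The star thresholds from the Q2 hint can be sketched as a lookup. The function name `significance_level` is hypothetical, and the strict `<` comparisons follow the wording of the hint rather than any particular R function.

```r
# Map a p-value to the significance level described in the Q2 hint.
significance_level <- function(p) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 0, p <= 1)
  if (p < 0.001) {
    "significant at 0.1% (***)"
  } else if (p < 0.01) {
    "significant at 1% (**)"
  } else if (p < 0.05) {
    "significant at 5% (*)"
  } else {
    "not significant at 5%"
  }
}

significance_level(3.66e-14)  # p-value printed for sex in Model 2
significance_level(0.03)      # hypothetical p-value, for illustration only
```

Because 0.001 is smaller than 0.05, any coefficient that earns three stars is also significant at the 5% level.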

Q3. See model 1. What weight does the model predict for an individual who is 190 centimeters tall?

Hint: The predicted value can be found by: y-intercept + coefficient of the height * 190. In addition, make sure to use the correct units of the variables.

88 kg
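The formula in the Q3 hint can be checked against `predict()`. The sketch below uses a small made-up data frame named `toy` (hypothetical values, not rows from bdims), so its numbers are not the answer to Q3; with the real data, `toy` would be replaced by `bdims`. Note that the response, `wgt`, goes on the left of `~`.

```r
# Hypothetical toy data standing in for bdims (heights in cm, weights in kg).
toy <- data.frame(
  hgt = c(160, 165, 170, 175, 180, 185, 190),
  wgt = c(54, 61, 63, 70, 76, 79, 88)
)

# Response on the left of ~, explanatory variable on the right.
model <- lm(wgt ~ hgt, data = toy)

# Manual prediction: y-intercept + coefficient of height * 190.
b <- coef(model)
manual <- unname(b["(Intercept)"] + b["hgt"] * 190)

# The same prediction via predict().
auto <- unname(predict(model, newdata = data.frame(hgt = 190)))

all.equal(manual, auto)
```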

Q4. See model 1. What is the reported residual standard error? What does it mean?

Hint: The residual standard error shows the typical difference between the actual value of the response (dependent) variable and the value of the response variable predicted by the model.

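The reported residual standard error is 6.561, which means the model's predictions typically miss the actual values by about 6.561 units of the response variable. Because Model 1 as printed was fit as lm(hgt ~ wgt), the response is height, so those units are centimeters rather than kilograms.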
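To make the definition concrete, the residual standard error can be recomputed by hand as the square root of the residual sum of squares divided by the residual degrees of freedom. This sketch again uses a hypothetical `toy` data frame rather than bdims.

```r
# Hypothetical toy data standing in for bdims (heights in cm, weights in kg).
toy <- data.frame(
  hgt = c(160, 165, 170, 175, 180, 185, 190),
  wgt = c(54, 61, 63, 70, 76, 79, 88)
)
model <- lm(wgt ~ hgt, data = toy)

# Residual sum of squares: squared gaps between actual and fitted weights.
rss <- sum(residuals(model)^2)

# Divide by the residual degrees of freedom (n minus estimated coefficients).
rse_manual <- sqrt(rss / df.residual(model))

# The value that summary() prints as "Residual standard error".
rse_reported <- summary(model)$sigma

all.equal(rse_manual, rse_reported)
```

The result is in the units of the response variable, which is why the left-hand side of the model formula matters when interpreting it.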

Q5. See model 1. What is the reported adjusted R squared? What does it mean?

Hint: The adjusted R squared shows how much of the variation in the response variable is explained by the model.

The adjusted R-squared of 0.5136 means that the model explains about 51.4% of the variation in the response variable.
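The adjusted R squared can be reproduced from the multiple R squared, the number of observations n, and the number of explanatory variables p, using the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1). The helper name `adjusted_r2` is hypothetical. The inputs are taken from the printed output in this report: 507 observations, the correlation of 0.7173011 (whose square is the multiple R squared of a one-variable model), and the multiple R squared of 0.5668 for Model 2.

```r
# Adjusted R-squared from multiple R-squared, sample size n,
# and number of explanatory variables p.
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Model 1: one explanatory variable, R-squared = correlation squared.
round(adjusted_r2(0.7173011^2, n = 507, p = 1), 4)  # 0.5136

# Model 2: two explanatory variables, R-squared as printed.
round(adjusted_r2(0.5668, n = 507, p = 2), 4)  # 0.5651
```

The denominators n - p - 1 are 505 and 504, matching the degrees of freedom printed in the two model summaries.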

Run a second regression model for weight (wgt) with two explanatory variables: height (hgt) and sex (sex), and answer Q6.

Q6. Compare model 1 and model 2. Which of the two models better fits the data? Discuss your answer by comparing the residual standard error and the adjusted R squared between the two models. Replace passengers by sex in model 2.

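On adjusted R-squared, Model 2 fits the data better: it explains more of the variation in weight, with an adjusted R-squared of 0.5651 compared to 0.5136 for Model 1. Model 2's residual standard error of 8.802 is higher than the 6.561 reported for Model 1, but the two are not directly comparable, because Model 1 as printed below was fit as lm(hgt ~ wgt): its residual standard error is in centimeters of height, while Model 2's is in kilograms of weight.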
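A side-by-side comparison of the two fit statistics might look like the sketch below. It uses a hypothetical `toy` data frame (made-up values, with `sex` coded 0/1 as an assumption), and the helper `fit_stats` is likewise illustrative. The first check confirms that both models share the same response variable, which is what makes their residual standard errors comparable.

```r
# Hypothetical toy data standing in for bdims.
toy <- data.frame(
  hgt = c(160, 165, 170, 175, 180, 185, 190, 172),
  wgt = c(54, 61, 63, 70, 76, 79, 88, 74),
  sex = c(0, 0, 0, 1, 1, 1, 1, 0)
)

m1 <- lm(wgt ~ hgt, data = toy)
m2 <- lm(wgt ~ hgt + sex, data = toy)

# Both models must predict the same response for the comparison to be fair.
stopifnot(identical(formula(m1)[[2]], formula(m2)[[2]]))

# Pull the two statistics that Q6 asks about out of summary().
fit_stats <- function(model) {
  s <- summary(model)
  c(rse = s$sigma, adj_r2 = s$adj.r.squared)
}

comparison <- rbind(model1 = fit_stats(m1), model2 = fit_stats(m2))
comparison
```

A lower residual standard error and a higher adjusted R squared both point to the better-fitting model.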

## (Intercept)         wgt 
## 136.1818561   0.5056136
## 
## Call:
## lm(formula = hgt ~ wgt, data = bdims)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -23.7162  -3.8782   0.0083   4.6532  18.6882 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 136.18186    1.53908   88.48   <2e-16 ***
## wgt           0.50561    0.02186   23.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.561 on 505 degrees of freedom
## Multiple R-squared:  0.5145, Adjusted R-squared:  0.5136 
## F-statistic: 535.2 on 1 and 505 DF,  p-value: < 2.2e-16

## (Intercept)         hgt         sex 
## -56.9494889   0.7129752   8.3659935
## 
## Call:
## lm(formula = wgt ~ hgt + sex, data = bdims)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.184  -5.978  -1.356   4.709  43.337 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.94949    9.42444  -6.043 2.95e-09 ***
## hgt           0.71298    0.05707  12.494  < 2e-16 ***
## sex           8.36599    1.07296   7.797 3.66e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.802 on 504 degrees of freedom
## Multiple R-squared:  0.5668, Adjusted R-squared:  0.5651 
## F-statistic: 329.7 on 2 and 504 DF,  p-value: < 2.2e-16