Quiz 4

Q1 Build a regression model to predict life expectancy using gdp per capita.
Q2 Is the coefficient of gdpPercap statistically significant at 5%?
Q3 Interpret the coefficient of gdpPercap.
Q4 Interpret the Intercept.
Q5 Build another model that predicts life expectancy using gdpPercap, but also controls for another important variable, year.
Q6 Which of the two models is better?
Q7 Interpret the coefficient of year.
Q7.a Based on the second model, what is the predicted life expectancy for a country with gdpPercap of $40,000 a year in 1997?
Q8 Hide the messages, but display the code and its results on the webpage.
Q9 Display the title and your name correctly at the top of the webpage.
Q10 Use the correct slug.

Make sure to include the unit of the values whenever appropriate.

Q1 Build a regression model to predict life expectancy using gdp per capita.

Hint: The variables are available in the gapminder data set from the gapminder package. Note that the data set and package both have the same name, gapminder.

data(gapminder, package = "gapminder")

# Simple linear regression: life expectancy (years) on GDP per capita (US$)
gdp_lm <- lm(lifeExp ~ gdpPercap,
             data = gapminder)

summary(gdp_lm)
## 
## Call:
## lm(formula = lifeExp ~ gdpPercap, data = gapminder)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -82.754  -7.758   2.176   8.225  18.426 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5.396e+01  3.150e-01  171.29   <2e-16 ***
## gdpPercap   7.649e-04  2.579e-05   29.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.49 on 1702 degrees of freedom
## Multiple R-squared:  0.3407, Adjusted R-squared:  0.3403 
## F-statistic: 879.6 on 1 and 1702 DF,  p-value: < 2.2e-16

Q2 Is the coefficient of gdpPercap statistically significant at 5%?

Hint: Your answer must include a discussion on the p-value.

Yes. The p-value for the gdpPercap coefficient is reported as <2e-16, that is, smaller than 0.0000000000000002, which is far below the 0.05 threshold. The coefficient is therefore statistically significant at the 5% level.
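As a rough check, the p-value can be recomputed from the t value and the degrees of freedom printed in the summary. This is a minimal sketch: the variable names are illustrative, and the inputs are the rounded values from the output above, so the result is only approximate.

```r
# Values copied from the summary output (rounded as printed)
t_value <- 29.66    # t value for gdpPercap
df_resid <- 1702    # residual degrees of freedom

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# Compare against the 5% significance level
p_value < 0.05
```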

Q3 Interpret the coefficient of gdpPercap.

Hint: Discuss both its sign and magnitude.

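The estimate is printed without a minus sign, so the coefficient is positive: life expectancy tends to rise with GDP per capita. The coefficient is 7.649e-04, which translates to 0.0007649. On average, each additional $1 of gdpPercap is associated with an increase of 0.0007649 years in life expectancy; equivalently, an additional $1,000 is associated with about 0.76 years.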
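Because the per-dollar effect is tiny, it can help to rescale it to larger increases in GDP per capita. The sketch below uses the coefficient as printed in the summary; the increases chosen are hypothetical and only for illustration.

```r
# Coefficient copied from the summary output (years per US$ of gdpPercap)
b_gdp <- 7.649e-04

# Hypothetical increases in GDP per capita, in US$
increase <- c(1, 1000, 10000)

# Associated change in predicted life expectancy, in years
change_years <- b_gdp * increase
change_years   # about 0.00076, 0.76 and 7.65 years
```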

Q4 Interpret the Intercept.

Hint: Provide a technical interpretation.

The intercept of our model is 5.396e+01, or 53.96. Technically, it is the predicted life expectancy, 53.96 years, for a country with a gdpPercap of $0.
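Whether a gdpPercap of $0 is a realistic input can be checked against the data. The sketch below assumes the gapminder package is installed; `gdp_range` and `zero_in_range` are illustrative names.

```r
# Assumes the gapminder package is installed
data(gapminder, package = "gapminder")

# Smallest and largest observed GDP per capita (US$)
gdp_range <- range(gapminder$gdpPercap)
gdp_range

# Is $0 inside the observed range? If not, the intercept is an extrapolation.
zero_in_range <- gdp_range[1] <= 0 && 0 <= gdp_range[2]
zero_in_range
```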

Q5 Build another model that predicts life expectancy using gdpPercap, but also controls for another important variable, year.

Hint: This is a model with two explanatory variables. Insert another code chunk below.

data(gapminder, package = "gapminder")

# Multiple regression: add year as a second explanatory variable
gdp_year_lm <- lm(lifeExp ~ gdpPercap + year,
                  data = gapminder)

summary(gdp_year_lm)
## 
## Call:
## lm(formula = lifeExp ~ gdpPercap + year, data = gapminder)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -67.262  -6.954   1.219   7.759  19.553 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.184e+02  2.762e+01  -15.15   <2e-16 ***
## gdpPercap    6.697e-04  2.447e-05   27.37   <2e-16 ***
## year         2.390e-01  1.397e-02   17.11   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.694 on 1701 degrees of freedom
## Multiple R-squared:  0.4375, Adjusted R-squared:  0.4368 
## F-statistic: 661.4 on 2 and 1701 DF,  p-value: < 2.2e-16

Q6 Which of the two models is better?

Hint: Discuss in terms of both residual standard error and reported adjusted R squared.

The second model is the better of the two. The residual standard error of the first model is 10.49, meaning that its predictions miss the actual life expectancy by about 10.49 years on average. For the second model it is 9.694 years, a smaller typical error. The adjusted R squared is 0.3403 for the first model and 0.4368 for the second, so the second model explains about 44% of the variation in life expectancy, compared with about 34% for the first. Because the second model has both the lower residual standard error and the higher adjusted R squared, it is the better model.
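The two criteria can also be pulled out of the fitted models and placed side by side. This is a minimal sketch, assuming the gapminder package is installed; `model_1`, `model_2` and `comparison` are illustrative names, not objects defined elsewhere in this quiz.

```r
# Assumes the gapminder package is installed
data(gapminder, package = "gapminder")

# Refit both models under separate names so they can be compared
model_1 <- lm(lifeExp ~ gdpPercap, data = gapminder)
model_2 <- lm(lifeExp ~ gdpPercap + year, data = gapminder)

# Residual standard error (years) and adjusted R squared for each model
comparison <- data.frame(
  model = c("gdpPercap", "gdpPercap + year"),
  residual_se = c(summary(model_1)$sigma, summary(model_2)$sigma),
  adj_r_squared = c(summary(model_1)$adj.r.squared,
                    summary(model_2)$adj.r.squared)
)
comparison
```

A lower `residual_se` and a higher `adj_r_squared` both point to the better-fitting model.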

Q7 Interpret the coefficient of year.

Hint: Discuss both its sign and magnitude.

The estimate is printed without a minus sign, so the coefficient is positive. The model reports it as 2.390e-01, which is equal to 0.239. This means that, holding gdpPercap constant, each additional calendar year is associated with an increase of 0.239 years in life expectancy.

Q7.a Based on the second model, what is the predicted life expectancy for a country with gdpPercap of $40,000 a year in 1997?

Hint: We had this discussion in class while watching the video on DataCamp, Correlation and Regression in R. The video is titled “Interpretation of Regression” in Chapter 4: Interpreting Regression Models.

Using the coefficients reported for the second model, the predicted life expectancy for a country with gdpPercap of $40,000 in the year 1997 is -418.4 + 0.0006697 × 40,000 + 0.2390 × 1997, or about 85.7 years.
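The prediction can be checked by plugging the inputs into the fitted equation. The sketch below uses the rounded coefficients printed in the second model's summary, so the result is approximate; the variable names are illustrative.

```r
# Coefficients copied from the second model's summary (rounded as printed)
b0 <- -4.184e+02      # intercept
b_gdp <- 6.697e-04    # gdpPercap (years per US$)
b_year <- 2.390e-01   # year (years per calendar year)

# Predicted life expectancy for gdpPercap = $40,000 in 1997
predicted <- b0 + b_gdp * 40000 + b_year * 1997
predicted   # about 85.67 years
```

With the fitted model object at hand, `predict()` with `newdata = data.frame(gdpPercap = 40000, year = 1997)` would use the unrounded coefficients instead.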

Q8 Hide the messages, but display the code and its results on the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
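One way to apply these options, sketched under the assumption that the page is knitted with knitr, is to set them once in a setup chunk. The same options can instead be written in an individual chunk header, for example `{r, message=FALSE, echo=TRUE, results='markup'}`.

```r
# Assumes the knitr package (used by R Markdown) is installed
knitr::opts_chunk$set(
  message = FALSE,     # hide messages, e.g. from loading packages
  echo = TRUE,         # show the source code
  results = "markup"   # show the printed results (the knitr default)
)
```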