Quiz 4

Q1 Build a regression model to predict life expectancy using gdp per capita.
Q2 Is the coefficient of gdpPercap statistically significant at 5%?
Q3 Interpret the coefficient of gdpPercap.
Q4 Interpret the Intercept.
Q5 Build another model that predicts life expectancy using gdpPercap, but also controls for another important variable, year.
Q6 Which of the two models is better?
Q7 Interpret the coefficient of year.
Q7.a Based on the second model, what is the predicted life expectancy for a country with gdpPercap of $40,000 a year in 1997?
Q8 Hide the messages, but display the code and its results on the webpage.
Q9 Display the title and your name correctly at the top of the webpage.
Q10 Use the correct slug.

Make sure to include the unit of the values whenever appropriate.

Q1 Build a regression model to predict life expectancy using gdp per capita.

Hint: The variables are available in the gapminder data set from the gapminder package. Note that the data set and package both have the same name, gapminder.

library(tidyverse)
options(scipen=999)

data(gapminder, package="gapminder")
# The response (lifeExp) goes on the left of ~ and the predictor (gdpPercap) on the right
gdp_lm <- lm(lifeExp ~ gdpPercap,
                data = gapminder)
summary(gdp_lm)
## 
## Call:
## lm(formula = lifeExp ~ gdpPercap, data = gapminder)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -82.754  -7.758   2.176   8.225  18.426 
## 
## Coefficients:
##                Estimate  Std. Error t value            Pr(>|t|)    
## (Intercept) 53.95556088  0.31499494  171.29 <0.0000000000000002 ***
## gdpPercap    0.00076488  0.00002579   29.66 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.49 on 1702 degrees of freedom
## Multiple R-squared:  0.3407, Adjusted R-squared:  0.3403 
## F-statistic: 879.6 on 1 and 1702 DF,  p-value: < 0.00000000000000022

Q2 Is the coefficient of gdpPercap statistically significant at 5%?

Hint: Your answer must include a discussion on the p-value.

Yes. The p-value of the gdpPercap coefficient is reported as less than 0.0000000000000002, which is far below the 0.05 threshold, so the coefficient is statistically significant at the 5% level.
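The p-value can also be checked in code rather than read off the printed table. This is a minimal sketch, assuming the gapminder package is installed; the names model1 and p_value are illustrative and not part of the quiz.

```r
# Fit the Q1 model: life expectancy explained by GDP per capita.
data(gapminder, package = "gapminder")
model1 <- lm(lifeExp ~ gdpPercap, data = gapminder)

# summary()$coefficients is a matrix with one row per term and the
# columns "Estimate", "Std. Error", "t value", and "Pr(>|t|)".
p_value <- summary(model1)$coefficients["gdpPercap", "Pr(>|t|)"]

# TRUE when the coefficient is significant at the 5% level.
p_value < 0.05
```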

Q3 Interpret the coefficient of gdpPercap.

Hint: Discuss both its sign and magnitude.

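The coefficient of gdpPercap is 0.00076488. The sign is positive, so higher GDP per capita is associated with higher life expectancy. In magnitude, each additional $1 of GDP per capita is associated with an increase of 0.00076488 years in a country's predicted life expectancy, or about 0.76 years per additional $1,000.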
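Because a $1 change is tiny on the scale of GDP per capita, it can help to rescale the slope to a $1,000 change. A minimal sketch, assuming the gapminder package is installed; model1, slope, and years_per_1000 are illustrative names.

```r
# Refit the Q1 model and pull out the slope on gdpPercap.
data(gapminder, package = "gapminder")
model1 <- lm(lifeExp ~ gdpPercap, data = gapminder)
slope <- coef(model1)[["gdpPercap"]]   # years of life expectancy per $1

# Change in predicted life expectancy (years) per additional $1,000.
years_per_1000 <- slope * 1000
years_per_1000
```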

Q4 Interpret the Intercept.

Hint: Provide a technical interpretation.

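The intercept is 53.96. Technically, it is the predicted life expectancy, 53.96 years, for a country with a gdpPercap of $0. No country in the data has a GDP per capita of zero, so this value is an extrapolation rather than a realistic prediction.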

Q5 Build another model that predicts life expectancy using gdpPercap, but also controls for another important variable, year.

Hint: This is a model with two explanatory variables. Insert another code chunk below.

data(gapminder, package="gapminder")
# Both predictors belong in the formula, joined with +.
# Passing gdpPercap as a separate argument makes lm() treat it as `subset`.
gdp_year_lm <- lm(lifeExp ~ gdpPercap + year,
                data = gapminder)
summary(gdp_year_lm)
## 
## Call:
## lm(formula = lifeExp ~ gdpPercap + year, data = gapminder)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -67.262  -6.954   1.219   7.759  19.553 
## 
## Coefficients:
##                  Estimate    Std. Error t value            Pr(>|t|)    
## (Intercept) -418.42425945   27.61713769  -15.15 <0.0000000000000002 ***
## gdpPercap      0.00066973    0.00002447   27.37 <0.0000000000000002 ***
## year           0.23898283    0.01397107   17.11 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.694 on 1701 degrees of freedom
## Multiple R-squared:  0.4375, Adjusted R-squared:  0.4368 
## F-statistic: 661.4 on 2 and 1701 DF,  p-value: < 0.00000000000000022

Q6 Which of the two models is better?

Hint: Discuss in terms of both residual standard error and reported adjusted R squared.

The first model (lifeExp ~ gdpPercap) has a residual standard error of 10.49 years and an adjusted R-squared of 0.3403. The second model (lifeExp ~ gdpPercap + year) has a residual standard error of 9.694 years and an adjusted R-squared of 0.4368. The residual standard error is the typical size of a prediction miss, so the second model's predictions are off by about 9.7 years compared with about 10.5 years for the first. Adjusted R-squared is the share of the variation in life expectancy that the model explains, after a penalty for extra predictors: about 44% for the second model versus about 34% for the first. The second model is better on both measures, since it has the smaller residual standard error and the larger adjusted R-squared.
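Both measures can be collected side by side instead of being read from two separate summaries. A minimal sketch, assuming the gapminder package is installed; model1, model2, and comparison are illustrative names.

```r
# Fit the one-predictor and two-predictor models on the same data.
data(gapminder, package = "gapminder")
model1 <- lm(lifeExp ~ gdpPercap, data = gapminder)
model2 <- lm(lifeExp ~ gdpPercap + year, data = gapminder)

# summary()$sigma is the residual standard error (in years here);
# summary()$adj.r.squared is the adjusted R-squared.
comparison <- data.frame(
  model         = c("lifeExp ~ gdpPercap", "lifeExp ~ gdpPercap + year"),
  residual_se   = c(summary(model1)$sigma, summary(model2)$sigma),
  adj_r_squared = c(summary(model1)$adj.r.squared,
                    summary(model2)$adj.r.squared)
)
comparison
```

The preferred model is the one with the lower residual_se and the higher adj_r_squared.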

Q7 Interpret the coefficient of year.

Hint: Discuss both its sign and magnitude.

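In the model lifeExp ~ gdpPercap + year, the coefficient of year is 0.239. The sign is positive: holding gdpPercap constant, each additional calendar year is associated with an increase of about 0.239 years (roughly 2.9 months) in a country's predicted life expectancy.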

Q7.a Based on the second model, what is the predicted life expectancy for a country with gdpPercap of $40,000 a year in 1997?

Hint: We had this discussion in class while watching the video at DataCamp, Correlation and Regression in R. The video is titled as “Interpretation of Regression” in Chapter 4: Interpreting Regression Models.

Based on the second model (lifeExp ~ gdpPercap + year), the predicted life expectancy for a country with a gdpPercap of $40,000 in 1997 is about 85.6 years: -418.424 + 0.00066973 × 40,000 + 0.238983 × 1997 ≈ 85.6.
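The prediction can be computed with predict() and cross-checked against the regression equation by hand. A minimal sketch, assuming the gapminder package is installed; model2, new_country, predicted, and by_hand are illustrative names.

```r
# Fit the two-predictor model from Q5.
data(gapminder, package = "gapminder")
model2 <- lm(lifeExp ~ gdpPercap + year, data = gapminder)

# The new data frame must use the same column names as the predictors.
new_country <- data.frame(gdpPercap = 40000, year = 1997)
predicted <- predict(model2, newdata = new_country)

# Same calculation by hand: intercept + slope * value for each predictor.
b <- coef(model2)
by_hand <- b[["(Intercept)"]] + b[["gdpPercap"]] * 40000 + b[["year"]] * 1997

predicted   # predicted life expectancy in years
```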

Q8 Hide the messages, but display the code and its results on the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
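One way to get this behavior, assuming the page is knitted with knitr, is to set the three options as defaults in a setup chunk. The same options can instead be written in an individual chunk header, for example {r, message=FALSE, echo=TRUE, results='markup'}. This is a sketch of one reasonable setup, not the only accepted answer.

```r
# Assumes the knitr package is installed (it renders R Markdown documents).
# These defaults apply to every chunk that follows in the document.
knitr::opts_chunk$set(
  message = FALSE,     # hide messages, such as package start-up notes
  echo    = TRUE,      # display the source code
  results = "markup"   # display printed results as output
)
```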