Make sure to include the unit of the values whenever appropriate.

Q1 Build a regression model to predict life expectancy using GDP per capita.

Hint: The variables are available in the gapminder data set from the gapminder package. Note that the data set and package both have the same name, gapminder.

library(tidyverse)
options(scipen=999)

data(gapminder, package="gapminder")
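# Fit a simple linear regression.
# In an lm() formula the response goes on the left of ~, so as written this
# models gdpPercap as a function of lifeExp (the summary below matches this form).
gdp_lm <- lm(gdpPercap ~ lifeExp,
             data = gapminder)

# View summary of model 1
summary(gdp_lm)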
## 
## Call:
## lm(formula = gdpPercap ~ lifeExp, data = gapminder)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -11483  -4539  -1223   2482 106950 
## 
## Coefficients:
##              Estimate Std. Error t value            Pr(>|t|)    
## (Intercept) -19277.25     914.09  -21.09 <0.0000000000000002 ***
## lifeExp        445.44      15.02   29.66 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8006 on 1702 degrees of freedom
## Multiple R-squared:  0.3407, Adjusted R-squared:  0.3403 
## F-statistic: 879.6 on 1 and 1702 DF,  p-value: < 0.00000000000000022
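
The question asks for life expectancy as the response. A minimal sketch of that specification follows, assuming the gapminder package is installed. The name `life_lm` is hypothetical and not part of the original code, and no output is shown because this model was not part of the original run.

```r
# Sketch: life expectancy as the response, GDP per capita as the predictor.
# `life_lm` is an illustrative name, not part of the original code.
data(gapminder, package = "gapminder")

life_lm <- lm(lifeExp ~ gdpPercap, data = gapminder)

# The slope here is in years of life expectancy per inflation-adjusted US dollar.
coef(life_lm)
```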

Q2 Is the coefficient of gdpPercap statistically significant at 5%?

Hint: Your answer must include a discussion on the p-value.

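Yes. In the model as fitted, gdpPercap is the response, so the slope being tested is the one on lifeExp. Its p-value is below 0.0000000000000002, which is far smaller than 0.05, so the relationship between GDP per capita and life expectancy is statistically significant at the 5% level.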
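
As a rough check, the p-value can be recomputed from the t value and the residual degrees of freedom printed in the summary. The variable names below are illustrative, and the numbers are copied by hand from the output rather than extracted from the model object.

```r
# Values copied from the lifeExp row of the first summary output.
t_value <- 29.66
df_resid <- 1702

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# TRUE means the slope is significant at the 5% level.
p_value < 0.05
```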

Q3 Interpret the coefficient of gdpPercap.

Hint: Discuss both its sign and magnitude.

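The slope is positive, so higher life expectancy goes with higher GDP per capita. Because the model as fitted has gdpPercap as the response, the slope of 445.44 is in dollars per year: each additional year of life expectancy at birth is associated with GDP per capita that is about 445.44 inflation-adjusted US dollars higher, on average.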
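
To make the magnitude concrete, the slope printed in the summary can be scaled to a larger gap in life expectancy. This is a small arithmetic sketch. The 10-year gap is an arbitrary illustration, not a value from the assignment.

```r
# Slope copied from the first summary output:
# inflation-adjusted US dollars of gdpPercap per year of lifeExp.
slope <- 445.44

# Hypothetical gap in life expectancy between two countries, in years.
extra_years <- 10

# Predicted difference in GDP per capita, in inflation-adjusted US dollars.
gdp_diff <- slope * extra_years
gdp_diff
```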

Q4 Interpret the Intercept.

Hint: Provide a technical interpretation.

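The intercept is -19277.25. It is the model's predicted GDP per capita, in inflation-adjusted US dollars, for a country with a life expectancy of 0 years. That value lies far outside the observed data, so it has no practical meaning. It is statistically significant at 5% because its p-value (below 0.0000000000000002) is smaller than 0.05.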
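
The role of the intercept is easiest to see by writing the fitted line as a function. The sketch below uses the coefficients printed in the first summary. `predict_gdp` is a hypothetical helper, and the life expectancy of 60 years is only an illustrative input.

```r
# Coefficients copied from the first summary output.
intercept <- -19277.25
slope <- 445.44

# Fitted line: predicted gdpPercap (inflation-adjusted US dollars)
# for a given life expectancy in years.
predict_gdp <- function(life_exp) intercept + slope * life_exp

predict_gdp(0)   # equals the intercept
predict_gdp(60)  # a value inside the range of realistic life expectancies
```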

Q5 Build another model that predicts life expectancy using gdpPercap, but also controls for another important variable, year.

Hint: This is a model with two explanatory variables. Insert another code chunk below.

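data(gapminder, package="gapminder")

# Second model: same response as model 1, with year added as a control.
# Stored under a separate name so model 1 is not overwritten.
gdp_year_lm <- lm(gdpPercap ~ lifeExp + year,
                  data = gapminder)

# View summary of model 2
summary(gdp_year_lm)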
## 
## Call:
## lm(formula = gdpPercap ~ lifeExp + year, data = gapminder)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -11206  -4584  -1266   2330 106539 
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept) 17657.83   24286.80   0.727               0.467    
## lifeExp       456.50      16.68  27.369 <0.0000000000000002 ***
## year          -18.99      12.48  -1.522               0.128    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8003 on 1701 degrees of freedom
## Multiple R-squared:  0.3416, Adjusted R-squared:  0.3408 
## F-statistic: 441.3 on 2 and 1701 DF,  p-value: < 0.00000000000000022

Q6 Which of the two models is better?

Hint: Discuss in terms of both residual standard error and reported adjusted R squared.
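
The two fit statistics named in the hint can be set side by side using the values printed in the two summaries. In this sketch the numbers are copied by hand and the data frame layout is only illustrative. The second model has a slightly lower residual standard error (8003 versus 8006) and a slightly higher adjusted R squared (0.3408 versus 0.3403), so it fits marginally better, although both differences are very small.

```r
# Fit statistics copied from the two summary outputs.
models <- data.frame(
  model = c("gdpPercap ~ lifeExp", "gdpPercap ~ lifeExp + year"),
  resid_std_error = c(8006, 8003),   # lower is better
  adj_r_squared = c(0.3403, 0.3408)  # higher is better
)

# Which model wins on each criterion.
best_rse <- models$model[which.min(models$resid_std_error)]
best_adj <- models$model[which.max(models$adj_r_squared)]

models
best_rse
best_adj
```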

Q7 Interpret the coefficient of year.

Hint: Discuss both its sign and magnitude.
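
In the second summary the coefficient of year is -18.99 with a p-value of 0.128. Holding life expectancy fixed, each later calendar year is associated with GDP per capita about 18.99 inflation-adjusted US dollars lower, but the estimate is not statistically significant at 5%. A rough 95% confidence interval, sketched below from the printed estimate and standard error, shows the same thing by including zero. The variable names are illustrative.

```r
# Values copied from the year row of the second summary output.
estimate <- -18.99
std_error <- 12.48
df_resid <- 1701

# Critical value for a two-sided 95% interval.
t_crit <- qt(0.975, df = df_resid)

# Approximate 95% confidence interval for the year coefficient.
ci <- estimate + c(-1, 1) * t_crit * std_error
ci
```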

Q7.a Based on the second model, what is the predicted life expectancy for a country with gdpPercap of $40,000 a year in 1997?

Hint: We had this discussion in class while watching the video at DataCamp, Correlation and Regression in R. The video is titled “Interpretation of Regression” in Chapter 4: Interpreting Regression Models.
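
A prediction of life expectancy needs a model with lifeExp as the response. The sketch below fits that specification and calls `predict()` with a one-row data frame. It assumes the gapminder package is installed. `life_year_lm` and `new_country` are hypothetical names, and no predicted value is quoted because this model was not part of the original run.

```r
# Sketch: life expectancy as the response, controlling for year.
data(gapminder, package = "gapminder")

life_year_lm <- lm(lifeExp ~ gdpPercap + year, data = gapminder)

# One hypothetical country: GDP per capita of $40,000 in 1997.
new_country <- data.frame(gdpPercap = 40000, year = 1997)

# Predicted life expectancy at birth, in years.
predicted_life_exp <- predict(life_year_lm, newdata = new_country)
predicted_life_exp
```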

Q8 Hide the messages, but display the code and its results on the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
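
One possible way to apply these options to every chunk is to set them once in a setup chunk with `knitr::opts_chunk$set()`, as sketched below. This assumes the document is knitted with knitr. The same three options can instead be written in an individual chunk header.

```r
# Hide messages, but show the code (echo) and its printed results.
knitr::opts_chunk$set(
  message = FALSE,    # suppress package start-up and other messages
  echo = TRUE,        # display the source code
  results = "markup"  # display printed results as regular output
)
```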

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.