Make sure to include the unit of the values whenever appropriate.
Hint: The variables are available in the gapminder data set from the gapminder package. Note that the data set and package both have the same name, gapminder.
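library(tidyverse)
options(scipen=999)  # print numbers in fixed rather than scientific notation
data(gapminder, package="gapminder")
# Model 1: life expectancy (years) regressed on GDP per capita (US$)
model1 <- lm(lifeExp ~ gdpPercap,
             data = gapminder)
# View summary of model 1
summary(model1)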
##
## Call:
## lm(formula = lifeExp ~ gdpPercap, data = gapminder)
##
## Residuals:
## Min 1Q Median 3Q Max
## -82.754 -7.758 2.176 8.225 18.426
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 53.95556088 0.31499494 171.29 <0.0000000000000002 ***
## gdpPercap 0.00076488 0.00002579 29.66 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.49 on 1702 degrees of freedom
## Multiple R-squared: 0.3407, Adjusted R-squared: 0.3403
## F-statistic: 879.6 on 1 and 1702 DF, p-value: < 0.00000000000000022
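Hint: Your answer must include a discussion on the p-value. The coefficient on gdpPercap is statistically significant: its p-value is reported as < 0.0000000000000002, far below the 5% significance level, so we reject the null hypothesis that GDP per capita has no linear relationship with life expectancy.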
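The t value and p-value in the summary can be reproduced from the estimate and its standard error. A minimal base R sketch, copying the gdpPercap row and the 1,702 residual degrees of freedom from the model 1 output above; the variable names are illustrative.

```r
# gdpPercap row of the model 1 summary output
estimate <- 0.00076488
std_error <- 0.00002579
df_resid <- 1702

# t statistic for H0: coefficient = 0
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with df_resid degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = df_resid)

cat("t =", round(t_value, 2), "\n")
cat("p < 0.05:", p_value < 0.05, "\n")
```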
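Hint: Discuss both its sign and magnitude. The coefficient on gdpPercap is 0.00076488. Its sign is positive: each $1 increase in GDP per capita is associated with an increase in life expectancy of 0.00076488 years, or about 0.76 years per $1,000 increase.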
Hint: Provide a technical interpretation.
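The intercept is 53.956 years: the model predicts a life expectancy at birth of about 53.96 years for a country with a GDP per capita of $0. Because no country in the data set has a GDP per capita of $0, this is an extrapolation beyond the observed data rather than a meaningful prediction.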
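To see the intercept and slope working together, the fitted line can be written as a small function. This sketch hard-codes the model 1 coefficients from the output above; `predict_life_exp()` is a hypothetical helper, not part of any package. For a model object fitted in the session, `predict()` with a `newdata` data frame does the same job.

```r
# Model 1 coefficients, copied from the summary output
b0 <- 53.95556088     # intercept (years)
b_gdp <- 0.00076488   # slope (years per US$1 of GDP per capita)

# Hypothetical helper: fitted life expectancy (years) at a given GDP per capita (US$)
predict_life_exp <- function(gdp_per_cap) {
  b0 + b_gdp * gdp_per_cap
}

predict_life_exp(0)                               # the intercept, about 53.96 years
predict_life_exp(10000)                           # about 61.6 years
predict_life_exp(11000) - predict_life_exp(10000) # a $1,000 increase adds about 0.76 years
```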
Hint: This is a model with two explanatory variables.
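data(gapminder, package="gapminder")
# Model 2: both explanatory variables go on the right-hand side of the formula,
# joined by "+". The output printed below came from
# lm(lifeExp ~ year, gdpPercap, data = gapminder), which passes gdpPercap to
# lm()'s subset argument and fits year alone (see its Call line).
model2 <- lm(lifeExp ~ year + gdpPercap,
             data = gapminder)
# View summary of model 2
summary(model2)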
##
## Call:
## lm(formula = lifeExp ~ year, data = gapminder, subset = gdpPercap)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.221 -9.436 1.517 11.201 21.581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -573.69800 56.15343 -10.22 <0.0000000000000002 ***
## year 0.31998 0.02837 11.28 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.86 on 580 degrees of freedom
## (1122 observations deleted due to missingness)
## Multiple R-squared: 0.1799, Adjusted R-squared: 0.1784
## F-statistic: 127.2 on 1 and 580 DF, p-value: < 0.00000000000000022
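The Call line above shows `subset = gdpPercap`, and the reason can be checked with base R alone. This sketch uses the built-in `mtcars` data as a stand-in for gapminder so it runs without extra packages: the first three formal arguments of `lm()` are `formula`, `data` and `subset`, so an unnamed variable after the formula is matched to `subset` once `data` is named, while a `+` inside the formula adds a second explanatory variable.

```r
# Argument order of lm(): an unnamed value after the formula lands in the
# first unmatched formal, which is subset once data is supplied by name
arg_order <- names(formals(lm))[1:3]
print(arg_order)  # "formula" "data" "subset"

# Two explanatory variables: both go in the formula, joined by "+"
fit_two <- lm(mpg ~ wt + hp, data = mtcars)
print(names(coef(fit_two)))  # "(Intercept)" "wt" "hp"
```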
Q6 Which of the two models is better? Hint: Discuss in terms of both residual standard error and reported adjusted R squared.
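Based on the reported output, the first model is better on both criteria. Its residual standard error is lower (10.49 vs. 11.86 years), so its predictions miss the observed life expectancy by less on average, and its adjusted R-squared is higher (0.3403 vs. 0.1784), so it explains a larger share of the variation in life expectancy. One caveat: the second summary was estimated on only 582 of the 1,704 observations (1,122 were dropped), so the two fits are not based on the same sample.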
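The comparison can be tabulated directly from the two summaries. A minimal sketch: the numbers are copied from the printed output, and `adj_r2()` is a hypothetical helper applying the standard adjustment 1 - (1 - R²)(n - 1)/(n - p - 1), which reproduces the reported adjusted R-squared values up to the rounding of R-squared.

```r
# Hypothetical helper: adjusted R-squared from R-squared, sample size n
# and number of explanatory variables p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Values copied from the two summary outputs; n = residual df + 2 estimated coefficients
models <- data.frame(
  model     = c("model 1: lifeExp ~ gdpPercap", "model 2 as printed: lifeExp ~ year"),
  rse_years = c(10.49, 11.86),
  r2        = c(0.3407, 0.1799),
  adj_r2    = c(0.3403, 0.1784),
  n         = c(1702 + 2, 580 + 2)
)

# Recompute adjusted R-squared (each printed model has p = 1 explanatory variable)
models$adj_r2_check <- adj_r2(models$r2, models$n, p = 1)
print(models)

# Lower residual standard error and higher adjusted R-squared both indicate a better fit
best_by_rse <- models$model[which.min(models$rse_years)]
best_by_adj <- models$model[which.max(models$adj_r2)]
cat("Lowest RSE:", best_by_rse, "\nHighest adjusted R-squared:", best_by_adj, "\n")
```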
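Hint: Discuss both its sign and magnitude. The coefficient on year is 0.31998, which is positive: each additional calendar year is associated with an increase in life expectancy of about 0.32 years, or roughly 3.2 years per decade.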
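Magnitude is easier to judge with a confidence interval alongside the point estimate. A sketch using the year row of the second summary output (estimate 0.31998, standard error 0.02837, 580 residual degrees of freedom); with a fitted model object, `confint()` reports the same kind of interval.

```r
# year row of the second summary output
estimate <- 0.31998   # years of life expectancy per calendar year
std_error <- 0.02837
df_resid <- 580

# 95% confidence interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df = df_resid)
ci_95 <- estimate + c(-1, 1) * t_crit * std_error
print(round(ci_95, 3))  # roughly 0.264 to 0.376 years per year

# The same slope expressed per decade
print(estimate * 10)    # about 3.2 years
```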
Hint: We had this discussion in class while watching the video at DataCamp, Correlation and Regression in R. The video is titled “Interpretation of Regression” in Chapter 4: Interpreting Regression Models.
The coefficient on year is positive. The predicted life expectancy for a country with a GDP per capita of $40,000 in 1997 is 76 years.
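A prediction like the 1997 one is usually computed with `predict()` and a one-row `newdata` data frame; for the gapminder model that would be `data.frame(year = 1997, gdpPercap = 40000)`. The sketch below uses the built-in `mtcars` data as a stand-in so it runs with base R only, and checks that `predict()` matches plugging the values into the fitted equation by hand.

```r
# Stand-in two-variable model on built-in data
fit <- lm(mpg ~ wt + hp, data = mtcars)

# One row per scenario; column names must match the explanatory variables
new_car <- data.frame(wt = 3, hp = 150)
pred_auto <- predict(fit, newdata = new_car)

# Same prediction by hand: intercept + b_wt * wt + b_hp * hp
b <- coef(fit)
pred_manual <- b[["(Intercept)"]] + b[["wt"]] * 3 + b[["hp"]] * 150

print(c(predict = unname(pred_auto), by_hand = pred_manual))
```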
Hint: Use message, echo and results in the chunk options. Refer to the R Markdown Reference Guide.
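For reference, chunk options go inside the braces of the chunk header. A sketch, assuming the goal is to hide package start-up messages and show the code without its printed results; knitr accepts `message=FALSE`, `echo=TRUE` and `results='hide'` among its options, and the chunk label `load-data` is illustrative.

````markdown
```{r load-data, message=FALSE, echo=TRUE, results='hide'}
library(tidyverse)
data(gapminder, package="gapminder")
```
````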