We have already loaded several relevant packages above, including
ggplot2 and dplyr, to support data
visualization for the gapminder dataset. Next, let's
install and then load the gapminder package, which provides
the dataset. Run the following chunk of code to install and load
gapminder:
Learn more about the gapminder dataset here:
https://cran.r-project.org/web/packages/gapminder/readme/README.html
# Install gapminder if it is not already available, then load it
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder", repos = "http://cran.us.r-project.org")
}
library(gapminder)
Run a linear regression using gdpPercap to predict
lifeExp.
lm_original <- lm(lifeExp ~ gdpPercap, data = gapminder)
summary(lm_original)
##
## Call:
## lm(formula = lifeExp ~ gdpPercap, data = gapminder)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -82.754  -7.758   2.176   8.225  18.426
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.396e+01  3.150e-01  171.29   <2e-16 ***
## gdpPercap   7.649e-04  2.579e-05   29.66   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.49 on 1702 degrees of freedom
## Multiple R-squared: 0.3407, Adjusted R-squared: 0.3403
## F-statistic: 879.6 on 1 and 1702 DF, p-value: < 2.2e-16
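To read the fitted line concretely, the two estimates printed above can be plugged into lifeExp = intercept + slope * gdpPercap. The sketch below hard-codes those estimates from the summary output rather than refitting the model. The function name predict_life_exp and the example GDP values are illustrative assumptions, not part of the assignment.

```r
# Coefficient estimates copied from the summary(lm_original) output above
intercept <- 5.396e+01
slope <- 7.649e-04

# Hypothetical helper: predicted life expectancy for a given GDP per capita
predict_life_exp <- function(gdp_per_cap) {
  intercept + slope * gdp_per_cap
}

# Illustrative GDP per capita values (assumed, not taken from the data)
example_gdp <- c(1000, 10000, 50000)
predicted <- predict_life_exp(example_gdp)
print(data.frame(gdpPercap = example_gdp, predicted_lifeExp = predicted))
```

With these inputs the line predicts roughly 54.7, 61.6, and 92.2 years. The last value is a reminder that a straight line on the raw GDP scale keeps rising without bound, which is consistent with the large negative residuals in the output.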
Sort the values of gdpPercap and lifeExp,
independently of each other.
# Sort each variable on its own; this deliberately breaks the
# country-year pairing between gdpPercap and lifeExp
sorted_gapminder_gdp <- gapminder[order(gapminder$gdpPercap), ]
sorted_gapminder_life <- gapminder[order(gapminder$lifeExp), ]
# Combine the two independently sorted columns into one dataset
combined_sorted_gapminder <- data.frame(
  gdpPercap = sorted_gapminder_gdp$gdpPercap,
  lifeExp = sorted_gapminder_life$lifeExp
)
# Show the first few rows of the combined sorted dataset
head(combined_sorted_gapminder)
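Sorting two columns independently pairs the smallest value of one with the smallest value of the other, and so on up the ranks, regardless of which observations the values came from. The sketch below illustrates the effect on simulated data, not on gapminder: x and y are drawn independently, so any relationship after sorting is produced by the sorting itself. The seed, sample size, and variable names are assumptions chosen for illustration.

```r
# Simulated data: x and y are generated independently of each other
set.seed(42)
n <- 1000
x <- rnorm(n)
y <- rnorm(n)

# Correlation with the original (random) pairing
cor_paired <- cor(x, y)

# Correlation after sorting each vector on its own
cor_sorted <- cor(sort(x), sort(y))

print(c(paired = cor_paired, sorted = cor_sorted))
```

The paired correlation should be close to zero, while the sorted correlation should be close to one, even though the two variables are unrelated by construction.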
Run a linear regression on the independently sorted variables,
again using gdpPercap to predict lifeExp.
lm_sorted <- lm(lifeExp ~ gdpPercap, data = combined_sorted_gapminder)
summary(lm_sorted)
##
## Call:
## lm(formula = lifeExp ~ gdpPercap, data = combined_sorted_gapminder)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -76.257  -6.063   2.031   7.872  10.117
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.273e+01  2.718e-01     194   <2e-16 ***
## gdpPercap   9.349e-04  2.226e-05      42   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.054 on 1702 degrees of freedom
## Multiple R-squared: 0.509, Adjusted R-squared: 0.5087
## F-statistic: 1764 on 1 and 1702 DF, p-value: < 2.2e-16
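The two fits are easier to compare side by side. The sketch below collects the slope, R-squared, and residual standard error from the two summary outputs above into one small table. The values are copied by hand from those outputs, and the object names (fit_comparison, r_squared_gain, slope_ratio) are illustrative assumptions.

```r
# Values copied from summary(lm_original) and summary(lm_sorted) above
fit_comparison <- data.frame(
  model = c("original", "independently sorted"),
  slope = c(7.649e-04, 9.349e-04),
  r_squared = c(0.3407, 0.509),
  resid_se = c(10.49, 9.054)
)

# Change in R-squared and relative change in slope after sorting
r_squared_gain <- diff(fit_comparison$r_squared)
slope_ratio <- fit_comparison$slope[2] / fit_comparison$slope[1]

print(fit_comparison)
print(c(r_squared_gain = r_squared_gain, slope_ratio = slope_ratio))
```

R-squared rises by about 0.17 after sorting. That gain does not reflect a stronger real relationship: the sorted dataset no longer pairs each GDP value with the life expectancy observed alongside it, so the apparently better fit comes from the sorting itself.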
# Scatterplot of original data with the fitted regression line
plot(lifeExp ~ gdpPercap,
  xlab = "GDP per capita",
  ylab = "Life expectancy",
  main = "Life expectancy vs. GDP per capita",
  data = gapminder
)
abline(lm_original)
# Scatterplot of independently sorted data with the fitted regression line
plot(lifeExp ~ gdpPercap,
  xlab = "GDP per capita",
  ylab = "Life expectancy",
  main = "Life expectancy vs. GDP per capita (independently sorted)",
  data = combined_sorted_gapminder
)
abline(lm_sorted)
This assignment references and cites the following sources:
Gapminder Dataset. Download Dataset
Gapminder. Gapminder