Preparation

We’ve already loaded several relevant packages above, including ggplot2 and dplyr, to support data visualization with the gapminder dataset. Next, let’s install and load the gapminder package, which provides the dataset. Run the code chunk below to do so.

Learn more about the gapminder dataset here:
https://cran.r-project.org/web/packages/gapminder/readme/README.html

# Install the gapminder package from CRAN, then load it
install.packages("gapminder", repos = "http://cran.us.r-project.org")
## 
## The downloaded binary packages are in
##  /var/folders/gg/hw0c_7m17zz__l5mpsm9j6vw0000gq/T//RtmpW1KeNq/downloaded_packages
library(gapminder)


Perform linear regression of GDP per capita to predict life expectancy.

Run a linear regression using gdpPercap to predict lifeExp.

lm_original <- lm(lifeExp ~ gdpPercap, data = gapminder)
summary(lm_original)
## 
## Call:
## lm(formula = lifeExp ~ gdpPercap, data = gapminder)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -82.754  -7.758   2.176   8.225  18.426 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5.396e+01  3.150e-01  171.29   <2e-16 ***
## gdpPercap   7.649e-04  2.579e-05   29.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.49 on 1702 degrees of freedom
## Multiple R-squared:  0.3407, Adjusted R-squared:  0.3403 
## F-statistic: 879.6 on 1 and 1702 DF,  p-value: < 2.2e-16
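
To make the slope easier to read, the short sketch below plugs the rounded coefficients printed above into the fitted line. The helper name predict_life_exp and the example GDP levels are illustrative assumptions, not part of the original analysis.

```r
# Rounded coefficients copied from the summary(lm_original) output above
intercept <- 53.96      # (Intercept) estimate, in years
slope     <- 7.649e-04  # gdpPercap estimate, in years per unit of GDP per capita

# Hypothetical helper: fitted life expectancy at a given GDP per capita
predict_life_exp <- function(gdp_percap) {
  intercept + slope * gdp_percap
}

# Fitted values at a few illustrative GDP levels
gdp_levels <- c(1000, 10000, 30000)
fitted_values <- predict_life_exp(gdp_levels)
print(data.frame(gdpPercap = gdp_levels, fitted_lifeExp = fitted_values))

# Change in fitted life expectancy per additional 1,000 units of GDP per capita
slope_per_1000 <- slope * 1000
print(slope_per_1000)
```

In a live session, predict(lm_original, newdata = data.frame(gdpPercap = 10000)) should give a closely matching value, since it uses the unrounded coefficients.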

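Sort each variable independently to demonstrate why doing so is not appropriate.

Sort the values of gdpPercap and lifeExp independently of each other. This breaks the pairing between each observation's GDP per capita and its life expectancy, so a row of the combined dataset no longer describes a real country-year.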

# Sort the dataset twice: once by gdpPercap and once by lifeExp
sorted_gapminder_gdp <- gapminder[order(gapminder$gdpPercap), ]
sorted_gapminder_life <- gapminder[order(gapminder$lifeExp), ]

# Pair the i-th smallest gdpPercap with the i-th smallest lifeExp
combined_sorted_gapminder <- data.frame(
  gdpPercap = sorted_gapminder_gdp$gdpPercap,
  lifeExp = sorted_gapminder_life$lifeExp
)

# Show the first few rows of the combined sorted dataset
head(combined_sorted_gapminder)
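
The problem with independent sorting can be seen without gapminder at all. The sketch below uses simulated data (the names x and y, the seed, and the sample size are assumptions chosen for illustration): two unrelated vectors show almost no linear relationship while they stay paired, but a very strong one after each is sorted separately, because sorting forces both to increase together.

```r
set.seed(42)  # arbitrary seed so the simulation is reproducible

n <- 500
x <- rnorm(n)  # simulated predictor
y <- rnorm(n)  # simulated response, generated independently of x

# Correlation with the original pairing kept intact
cor_paired <- cor(x, y)

# Correlation after sorting each vector on its own, which discards the pairing
cor_sorted <- cor(sort(x), sort(y))

# R-squared of a simple linear regression in each case
r2_paired <- summary(lm(y ~ x))$r.squared
sorted_df <- data.frame(x = sort(x), y = sort(y))
r2_sorted <- summary(lm(y ~ x, data = sorted_df))$r.squared

print(c(cor_paired = cor_paired, cor_sorted = cor_sorted))
print(c(r2_paired = r2_paired, r2_sorted = r2_sorted))
```

Any strong fit in the sorted case is produced by the sorting itself, not by a relationship in the data, which is why a regression on independently sorted variables should not be interpreted.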

Perform linear regression on the sorted variables, using GDP per capita to predict life expectancy.

Run a linear regression on the independently sorted variables, using gdpPercap to predict lifeExp.

lm_sorted <- lm(lifeExp ~ gdpPercap, data = combined_sorted_gapminder)
summary(lm_sorted)
## 
## Call:
## lm(formula = lifeExp ~ gdpPercap, data = combined_sorted_gapminder)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -76.257  -6.063   2.031   7.872  10.117 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5.273e+01  2.718e-01     194   <2e-16 ***
## gdpPercap   9.349e-04  2.226e-05      42   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.054 on 1702 degrees of freedom
## Multiple R-squared:  0.509,  Adjusted R-squared:  0.5087 
## F-statistic:  1764 on 1 and 1702 DF,  p-value: < 2.2e-16

Compare scatterplots of the two regressions, each with its fitted line.

# Scatterplot of the original data with the fitted regression line
plot(lifeExp ~ gdpPercap,
     xlab = "GDP per capita",
     ylab = "Life expectancy",
     main = "Life expectancy vs. GDP per capita (original data)",
     data = gapminder
     )
abline(lm_original)

# Scatterplot of the independently sorted data with the fitted regression line
plot(lifeExp ~ gdpPercap,
     xlab = "GDP per capita",
     ylab = "Life expectancy",
     main = "Life expectancy vs. GDP per capita (sorted data)",
     data = combined_sorted_gapminder
     )
abline(lm_sorted)

Works Cited

This assignment references and cites the following sources: