Description of the data, data_quiz5
country
continent
lifeExp: life expectancy in years
pop: total population
gdpPercap: GDP per capita in U.S. dollars
Hint: The data is posted in Moodle. Look for data_quiz5.csv under the Data Files section.
myClusterData <- read.csv("data_quiz5.csv")
Hint: Use head() to display the first six rows.
head(myClusterData,6)
##     country continent lifeExp      pop gdpPercap
## 1   Albania    Europe  76.423  3600523  5937.030
## 2   Algeria    Africa  72.301 33333216  6223.367
## 3 Argentina  Americas  75.320 40301927 12779.380
## 4 Australia   Oceania  81.235 20434176 34435.367
## 5   Austria    Europe  79.829  8199783 36126.493
## 6   Bahrain      Asia  75.635   708573 29796.048
Hint: Create a scatter plot to examine the relationship between GDP per capita (mapped to y-axis) and life expectancy (mapped to x-axis).
library(tidyverse)
ggplot(myClusterData,
       aes(x = lifeExp,
           y = gdpPercap)) +
  geom_point() +
  geom_smooth(method = "lm")
gdpPercap_lm <- lm(gdpPercap ~ lifeExp + pop,
                   data = myClusterData)
summary(gdpPercap_lm)
##
## Call:
## lm(formula = gdpPercap ~ lifeExp + pop, data = myClusterData)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -17337  -4536    -82   3463  23993
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.151e+05  1.833e+04 -11.735   <2e-16 ***
## lifeExp      3.073e+03  2.408e+02  12.762   <2e-16 ***
## pop         -6.064e-07  5.660e-06  -0.107    0.915
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7625 on 80 degrees of freedom
## Multiple R-squared: 0.6741, Adjusted R-squared: 0.666
## F-statistic: 82.75 on 2 and 80 DF, p-value: < 2.2e-16
options(scipen = 999)
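The estimated coefficient on lifeExp is about 3,073, and its p-value (< 2e-16) is far below the 5% significance level, so life expectancy is a statistically significant predictor of GDP per capita.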
Hint: Discuss both its sign and magnitude.
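The coefficient on life expectancy is positive, so it affects GDP per capita positively: each additional year of life expectancy is associated with roughly $3,073 higher GDP per capita, holding population constant.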
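To make the magnitude concrete, the short sketch below applies the reported lifeExp estimate to a few changes in life expectancy. The coefficient is copied from the regression output; the variable names and the example changes (1, 5, and 10 years) are illustrative assumptions, not part of the original analysis.

```r
# Coefficient on lifeExp, copied from the regression output
b_lifeExp <- 3072.606

# Hypothetical increases in life expectancy (years), chosen for illustration
delta_years <- c(1, 5, 10)

# Implied change in predicted GDP per capita (U.S. dollars), holding pop constant
delta_gdp <- b_lifeExp * delta_years

data.frame(delta_years, delta_gdp)
```

Because the model is linear, the implied change scales proportionally with the change in life expectancy.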
Hint: Make your argument using the relevant test results, such as p-value.
gdpPercap_lm <- lm(gdpPercap ~ lifeExp + pop,
                   data = myClusterData)
summary(gdpPercap_lm)
##
## Call:
## lm(formula = gdpPercap ~ lifeExp + pop, data = myClusterData)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -17337  -4536    -82   3463  23993
##
## Coefficients:
##                       Estimate       Std. Error t value
## (Intercept) -215081.8386328746 18328.1060846602 -11.735
## lifeExp        3072.6060571578   240.7532777810  12.762
## pop              -0.0000006064     0.0000056600  -0.107
##                        Pr(>|t|)
## (Intercept) <0.0000000000000002 ***
## lifeExp     <0.0000000000000002 ***
## pop                       0.915
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7625 on 80 degrees of freedom
## Multiple R-squared: 0.6741, Adjusted R-squared: 0.666
## F-statistic: 82.75 on 2 and 80 DF, p-value: < 0.00000000000000022
options(scipen = 999)
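The p-values in the coefficient table can be reproduced by hand, which makes the significance argument explicit. This is a minimal sketch, assuming the estimates, standard errors, and the 80 residual degrees of freedom are copied from the output above; the names est, se, df_resid, t_val, and p_val are illustrative.

```r
# Estimates and standard errors copied from the coefficient table
est <- c(lifeExp = 3072.6060571578, pop = -0.0000006064)
se  <- c(lifeExp = 240.7532777810,  pop = 0.0000056600)

# Residual degrees of freedom reported by summary()
df_resid <- 80

# t statistic = estimate / standard error
t_val <- est / se

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_val), df = df_resid)

round(t_val, 3)
signif(p_val, 3)

# Compare each p-value against the 5% significance level
p_val < 0.05
```

Under these inputs, lifeExp falls below the 5% level while pop does not, which is consistent with the p-value of 0.915 reported for pop.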