Description of the data, data_quiz5
country
continent
lifeExp: life expectancy in years
pop: total population
gdpPercap: GDP per capita in U.S. dollars
Hint: The data is posted in Moodle. Look for data_quiz5.csv under the Data Files section.
myRegressionData <- read.csv("data_quiz5.csv")
Hint: Use head() to display the first six rows.
head(myRegressionData, 6)
## country continent lifeExp pop gdpPercap
## 1 Albania Europe 76.423 3600523 5937.030
## 2 Algeria Africa 72.301 33333216 6223.367
## 3 Argentina Americas 75.320 40301927 12779.380
## 4 Australia Oceania 81.235 20434176 34435.367
## 5 Austria Europe 79.829 8199783 36126.493
## 6 Bahrain Asia 75.635 708573 29796.048
Hint: Create a scatter plot to examine the relationship between GDP per capita (mapped to y-axis) and life expectancy (mapped to x-axis).
library(ggplot2)
library(tidyquant)
ggplot(myRegressionData,
       aes(x = lifeExp,
           y = gdpPercap)) +
  geom_point() +
  geom_smooth(method = "lm")
Regression_lm <- lm(gdpPercap ~ lifeExp,
                    data = myRegressionData)
summary(Regression_lm)
##
## Call:
## lm(formula = gdpPercap ~ lifeExp, data = myRegressionData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17319.8 -4512.4 -63.2 3443.1 24014.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -215340.5 18057.2 -11.93 <2e-16 ***
## lifeExp 3075.6 237.6 12.94 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7578 on 81 degrees of freedom
## Multiple R-squared: 0.6741, Adjusted R-squared: 0.6701
## F-statistic: 167.5 on 1 and 81 DF, p-value: < 2.2e-16
This question comes down to the p-value. I believe life expectancy is statistically significant because its p-value (<2e-16) is far below the 5 percent significance level (0.05).
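As a rough check on this reasoning, the p-value can be recomputed from the t value and the residual degrees of freedom printed in the summary. This is only a sketch: the numbers are typed in by hand from the output (so they are rounded), and the names t_lifeExp, df_resid, and alpha are my own, not part of the quiz.

```r
# Values copied by hand from summary(Regression_lm); rounded, so results are approximate.
t_lifeExp <- 12.94   # t value for lifeExp
df_resid  <- 81      # residual degrees of freedom
alpha     <- 0.05    # 5 percent significance level (assumed threshold)

# Two-sided p-value from the t distribution
p_lifeExp <- 2 * pt(-abs(t_lifeExp), df = df_resid)

# TRUE means the coefficient is statistically significant at the 5 percent level
is_significant <- p_lifeExp < alpha
print(p_lifeExp)
print(is_significant)
```

With the fitted model in memory, the same p-value should also be available as coef(summary(Regression_lm))["lifeExp", "Pr(>|t|)"], which avoids the rounding.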
Hint: Discuss both its sign and magnitude.
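The life expectancy coefficient tells us how much GDP per capita is expected to change for each one-year increase in life expectancy. The coefficient is positive, so GDP per capita is positively associated with life expectancy. Its magnitude is 3075.6, meaning each additional year of life expectancy goes with about $3,076 more GDP per capita.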
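To make the magnitude more concrete, the sketch below uses the rounded estimate and standard error from the summary to get an approximate 95 percent confidence interval for the slope, plus the change implied by a few extra years of life expectancy. The variable names and the five-year example are my own assumptions for illustration.

```r
# Values copied by hand from summary(Regression_lm); rounded.
b_lifeExp  <- 3075.6   # estimated slope: dollars of GDP per capita per year of life expectancy
se_lifeExp <- 237.6    # standard error of the slope
df_resid   <- 81       # residual degrees of freedom

# Approximate 95 percent confidence interval for the slope
t_crit <- qt(0.975, df = df_resid)
ci_slope <- c(lower = b_lifeExp - t_crit * se_lifeExp,
              upper = b_lifeExp + t_crit * se_lifeExp)
print(ci_slope)

# Change in predicted GDP per capita for a hypothetical 5-year increase in life expectancy
extra_years <- 5
print(b_lifeExp * extra_years)
```

An interval that stays above zero agrees with the positive sign of the coefficient. With the model object available, confint(Regression_lm) should give the interval directly.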
Hint: Make your argument using the relevant test results, such as p-value.
gdp_lm <- lm(gdpPercap ~ lifeExp + pop,
             data = myRegressionData)
summary(gdp_lm)
##
## Call:
## lm(formula = gdpPercap ~ lifeExp + pop, data = myRegressionData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17337 -4536 -82 3463 23993
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.151e+05 1.833e+04 -11.735 <2e-16 ***
## lifeExp 3.073e+03 2.408e+02 12.762 <2e-16 ***
## pop -6.064e-07 5.660e-06 -0.107 0.915
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7625 on 80 degrees of freedom
## Multiple R-squared: 0.6741, Adjusted R-squared: 0.666
## F-statistic: 82.75 on 2 and 80 DF, p-value: < 2.2e-16
At the 5 percent significance level, the population predictor is not statistically significant. Its p-value is 0.915, which is well above 0.05; it would need to be below 0.05 to be significant. I’d say the friend’s theory seems to be incorrect, because a country’s population is not a statistically significant predictor of GDP per capita in this model.
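The population result can be checked the same way. The sketch below recomputes the p-value for pop from the rounded t value in the second summary and compares the adjusted R-squared of the two models. The names t_pop, adj_r2_without_pop, and adj_r2_with_pop are my own, and the inputs are typed in by hand from the output.

```r
# Values copied by hand from summary(gdp_lm) and summary(Regression_lm); rounded.
t_pop    <- -0.107   # t value for pop
df_resid <- 80       # residual degrees of freedom in the two-predictor model
alpha    <- 0.05     # 5 percent significance level

# Two-sided p-value; should land close to the reported 0.915
p_pop <- 2 * pt(-abs(t_pop), df = df_resid)
print(p_pop)
print(p_pop < alpha)   # FALSE means pop is not significant at 5 percent

# Adjusted R-squared with and without pop
adj_r2_without_pop <- 0.6701
adj_r2_with_pop    <- 0.666
print(adj_r2_with_pop < adj_r2_without_pop)
```

The adjusted R-squared drops slightly once pop is added, which fits the conclusion that population adds little to the model beyond life expectancy.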