Description of the data, data_quiz5

library(tidyverse)
library(scales)

Q1 Import data

Hint: The data is posted in Moodle. Look for data_quiz5.csv under the Data Files section.

data_quiz5 <- read.csv("data_quiz5.csv")
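Every later question depends on the columns shown in Q2, so the import can also check that those columns exist. The sketch below uses a hypothetical helper, `read_quiz_data()`, and takes the expected column names from the `head()` output in Q2.

```r
# Hypothetical helper: read the quiz CSV with base R, and stop with a clear
# error if any column used later in the quiz is missing.
read_quiz_data <- function(path = "data_quiz5.csv") {
  df <- read.csv(path)
  expected <- c("country", "continent", "lifeExp", "pop", "gdpPercap")
  missing_cols <- setdiff(expected, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  df
}
```

Usage: `data_quiz5 <- read_quiz_data("data_quiz5.csv")` works as a drop-in replacement for the `read.csv()` call above.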

Q2 Review data

Hint: Use head() to display the first six rows.

head(data_quiz5)
##     country continent lifeExp      pop gdpPercap
## 1   Albania    Europe  76.423  3600523  5937.030
## 2   Algeria    Africa  72.301 33333216  6223.367
## 3 Argentina  Americas  75.320 40301927 12779.380
## 4 Australia   Oceania  81.235 20434176 34435.367
## 5   Austria    Europe  79.829  8199783 36126.493
## 6   Bahrain      Asia  75.635   708573 29796.048

Q3 Visualize data

Hint: Create a scatter plot to examine the relationship between GDP per capita (mapped to the y-axis) and life expectancy (mapped to the x-axis).

# Scatter plot: life expectancy on the x-axis, GDP per capita on the y-axis
ggplot(data_quiz5, 
       aes(x = lifeExp, 
           y = gdpPercap)) +
  geom_point()
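The page loads `library(scales)` at the top but never uses it. One possible use is to print the GDP-per-capita axis with thousands separators. The plot can also overlay the least-squares line that Q4 estimates. The helper name `plot_gdp_life()` below is hypothetical.

```r
library(ggplot2)
library(scales)

# Hypothetical helper: scatter plot of GDP per capita (y) against life
# expectancy (x), with an OLS line and comma-formatted y-axis labels.
plot_gdp_life <- function(df) {
  ggplot(df, aes(x = lifeExp, y = gdpPercap)) +
    geom_point() +
    # method = "lm" with y ~ x fits gdpPercap ~ lifeExp, the same model as Q4
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    scale_y_continuous(labels = label_comma()) +
    labs(x = "Life expectancy (years)", y = "GDP per capita")
}
```

Usage: `plot_gdp_life(data_quiz5)`. Setting `se = FALSE` hides the confidence band so that the points stay readable.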

Q4 Build a regression model to predict GDP per capita using life expectancy.

# Simple linear regression of GDP per capita on life expectancy
gdp_lm <- lm(gdpPercap ~ lifeExp, 
             data = data_quiz5)
summary(gdp_lm)
## 
## Call:
## lm(formula = gdpPercap ~ lifeExp, data = data_quiz5)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17319.8  -4512.4    -63.2   3443.1  24014.4 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -215340.5    18057.2  -11.93   <2e-16 ***
## lifeExp        3075.6      237.6   12.94   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7578 on 81 degrees of freedom
## Multiple R-squared:  0.6741, Adjusted R-squared:  0.6701 
## F-statistic: 167.5 on 1 and 81 DF,  p-value: < 2.2e-16

Q5 Is the coefficient of life expectancy statistically significant at 5%?

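Yes. The p-value of the life expectancy coefficient is reported as <2e-16 (t = 12.94), which is far below 0.05, so the coefficient is statistically significant at the 5% level.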
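The same check can be done in code by pulling the p-value out of the coefficient matrix that `summary()` prints. Below is a minimal sketch using a hypothetical helper, `is_significant()`.

```r
# Hypothetical helper: TRUE if the two-sided p-value of coefficient `term`
# in the fitted lm object `fit` is below the significance level `alpha`.
is_significant <- function(fit, term, alpha = 0.05) {
  # coef(summary(fit)) is a matrix with a "Pr(>|t|)" column, one row per term
  p_value <- coef(summary(fit))[term, "Pr(>|t|)"]
  p_value < alpha
}
```

Applied to the Q4 model with `term = "lifeExp"`, it should return `TRUE`, because the reported p-value is below 2e-16.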

Q6 Interpret the coefficient of life expectancy.

Hint: Discuss both its sign and magnitude.

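The coefficient of life expectancy is positive (3075.6), so countries with higher life expectancy tend to have higher GDP per capita. In terms of magnitude, each additional year of life expectancy is associated with an increase of about 3,075.6 in predicted GDP per capita, on average. The estimate is highly significant (p < 2e-16, not "-2e16"), and life expectancy alone explains about 67% of the variation in GDP per capita (R-squared = 0.6741).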
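The fitted line from the Q4 output can be written as an equation to make the magnitude concrete. Take two countries whose life expectancies differ by five years (75 versus 80, values picked only for illustration). The predicted gap in GDP per capita is five times the slope:

```latex
\widehat{\text{gdpPercap}} = -215340.5 + 3075.6 \times \text{lifeExp}

\widehat{\text{gdpPercap}}\big|_{\text{lifeExp}=80} - \widehat{\text{gdpPercap}}\big|_{\text{lifeExp}=75}
  = 3075.6 \times (80 - 75) = 15378
```

The intercept (-215,340.5) is the prediction at a life expectancy of zero. That value lies far outside the data, so the intercept has no practical interpretation on its own.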

Q7 Your friend suggests that the more populous a country, the higher its standard of living (GDP per capita). Create a new model below by adding an additional predictor to the regression model above to test this hypothesis. Is the new variable statistically significant? What would you say to your friend regarding his/her claim?

Hint: Make your argument using the relevant test results, such as p-value.

# Scatter plot of GDP per capita against population
ggplot(data_quiz5, 
       aes(x = pop, 
           y = gdpPercap)) +
  geom_point()

# Simple regression of population on GDP per capita
pop_lm <- lm(pop ~ gdpPercap, 
             data = data_quiz5)
summary(pop_lm)
## 
## Call:
## lm(formula = pop ~ gdpPercap, data = data_quiz5)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
##  -60003181  -46650502  -28125705   -7160866 1256933472 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 67575854   28089529   2.406   0.0184 *
## gdpPercap      -1175       1255  -0.936   0.3519  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 149900000 on 81 degrees of freedom
## Multiple R-squared:  0.01071,    Adjusted R-squared:  -0.001508 
## F-statistic: 0.8765 on 1 and 81 DF,  p-value: 0.3519

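Since the p-value of 0.3519 is greater than 0.05, the association between population and GDP per capita is not statistically significant at the 5% level. The data therefore do not support your friend's claim that more populous countries have a higher standard of living.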
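The question asks for population to be added as a second predictor to the Q4 model, that is, `gdpPercap ~ lifeExp + pop`. That model tests population's association with GDP per capita while holding life expectancy constant. A minimal sketch follows. The helper names `fit_with_pop()` and `pop_pvalue()` are hypothetical, and no output for this model is shown on this page, so its p-value has to be read from the fitted model itself.

```r
# Hypothetical helper: the Q4 model with population added as a second predictor.
fit_with_pop <- function(df) {
  lm(gdpPercap ~ lifeExp + pop, data = df)
}

# Hypothetical helper: two-sided p-value of the `pop` coefficient in that model.
pop_pvalue <- function(fit) {
  coef(summary(fit))["pop", "Pr(>|t|)"]
}
```

With the data from Q1 loaded, `summary(fit_with_pop(data_quiz5))` prints the full coefficient table. If `pop_pvalue(fit_with_pop(data_quiz5))` is above 0.05, population adds no statistically significant explanatory power once life expectancy is in the model, and the answer to your friend is unchanged.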

Q8 Hide the messages, but display the code and its results on the webpage.
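In R Markdown, one way to do this is to set global chunk options in a setup chunk at the top of the document. In the sketch below, which assumes the page is knit with knitr, `echo = TRUE` shows the code and printed results appear by default. `message = FALSE` hides messages such as the package start-up notes from `library(tidyverse)`.

```r
# Global knitr chunk options: show code and results, hide messages.
knitr::opts_chunk$set(echo = TRUE, message = FALSE)
```

Individual chunks can still override these defaults, for example with `message = TRUE` in a chunk header.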

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.