Q1 Import data.

Hint: The data file is posted in Moodle under Module 5. It is named “gapminder.csv”.

data <- read.csv("~/busstat/data/gapminder.csv")
head(data)
##       country continent year lifeExp      pop gdpPercap
## 1 Afghanistan      Asia 1952  28.801  8425333  779.4453
## 2 Afghanistan      Asia 1957  30.332  9240934  820.8530
## 3 Afghanistan      Asia 1962  31.997 10267083  853.1007
## 4 Afghanistan      Asia 1967  34.020 11537966  836.1971
## 5 Afghanistan      Asia 1972  36.088 13079460  739.9811
## 6 Afghanistan      Asia 1977  38.438 14880372  786.1134
summary(data)
##    country           continent              year         lifeExp     
##  Length:1704        Length:1704        Min.   :1952   Min.   :23.60  
##  Class :character   Class :character   1st Qu.:1966   1st Qu.:48.20  
##  Mode  :character   Mode  :character   Median :1980   Median :60.71  
##                                        Mean   :1980   Mean   :59.47  
##                                        3rd Qu.:1993   3rd Qu.:70.85  
##                                        Max.   :2007   Max.   :82.60  
##       pop              gdpPercap       
##  Min.   :6.001e+04   Min.   :   241.2  
##  1st Qu.:2.794e+06   1st Qu.:  1202.1  
##  Median :7.024e+06   Median :  3531.8  
##  Mean   :2.960e+07   Mean   :  7215.3  
##  3rd Qu.:1.959e+07   3rd Qu.:  9325.5  
##  Max.   :1.319e+09   Max.   :113523.1
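
Before relying on head() and summary(), it can help to confirm that the file path resolves and that each column was read with the expected type. The sketch below is only an illustration: it writes a three-row stand-in file (rows taken from the head() output above) to a temporary location, so `tmp` and `toy` are hypothetical names and not part of the assignment.

```r
# Hypothetical stand-in for gapminder.csv, written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "country,continent,year,lifeExp,pop,gdpPercap",
  "Afghanistan,Asia,1952,28.801,8425333,779.4453",
  "Afghanistan,Asia,1957,30.332,9240934,820.8530",
  "Afghanistan,Asia,1962,31.997,10267083,853.1007"
), tmp)

# Fail early with a clear error if the path is wrong
stopifnot(file.exists(tmp))

toy <- read.csv(tmp)

str(toy)             # one line per column: name, type, first values
colSums(is.na(toy))  # number of missing values in each column
```

The same three checks (file.exists(), str(), colSums(is.na())) can be pointed at the real file path and data frame.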

Q2 Create a scatter plot to visualize the relationship between life expectancy and GDP per capita.

Hint: For the code, refer to one of our textbooks, Data Visualization with R: Chapter 4.2. Map lifeExp to the y-axis and gdpPercap to the x-axis.

library(tidyverse)
ggplot(data, 
       aes(x = gdpPercap, 
           y = lifeExp)) +
  geom_point()

Q3 Calculate and interpret the Pearson correlation coefficient.

Hint: Interpret both the direction and the strength of the correlation.

cor(data$gdpPercap, data$lifeExp)
## [1] 0.5837062

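The Pearson correlation coefficient measures the direction and strength of the linear association between two variables. It ranges from -1 to 1. Our value of 0.58 is positive, so life expectancy tends to be higher where GDP per capita is higher, and its size indicates a moderately strong association.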
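
To see what cor() computes, the coefficient can be worked out by hand as the covariance of the two variables divided by the product of their standard deviations. This is a minimal sketch on invented toy vectors (`x` and `y` are not from the gapminder data); it only checks that the hand calculation agrees with the built-in function.

```r
# Toy vectors, invented for illustration only
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, 3, 6, 5)

# Pearson's r: sum of cross-products of deviations, scaled by the
# square root of the product of the two sums of squared deviations
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# cor() uses method = "pearson" by default
r_builtin <- cor(x, y)

all.equal(r_manual, r_builtin)
```

Because the numerator and denominator share the same units, the result is unit-free and always falls between -1 and 1.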

Q4 Based on your analysis in Q2 and Q3, can you conclude that the standard of living (measured by GDP per capita) causes life expectancy to rise? Why or why not?

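No. The scatter plot in Q2 and the correlation of 0.58 in Q3 show that countries with higher GDP per capita tend to have higher life expectancy, but a correlation alone does not establish causation. Other variables may influence both, and the effect could also run in the opposite direction.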
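
A small simulation can illustrate why a correlation on its own cannot settle the causation question. In the sketch below all data are invented: a hypothetical hidden variable `z` drives both `x` and `y`, and `x` never enters the formula for `y`. The two still correlate, and the correlation largely disappears once `z` is accounted for.

```r
set.seed(1)
n <- 2000

z <- rnorm(n)       # hypothetical hidden variable
x <- z + rnorm(n)   # depends on z only
y <- z + rnorm(n)   # depends on z only, never on x

# Raw correlation is clearly positive even though x does not cause y
r_raw <- cor(x, y)

# Remove the part of x and y explained by z, then correlate the residuals
r_adj <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))

round(c(raw = r_raw, adjusted_for_z = r_adj), 2)
```

This does not say anything about which variables matter in the gapminder data; it only shows that a positive coefficient is compatible with no causal link at all.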

Q5 You suspect that there may be other variables that are associated with life expectancy. Create a correlation plot.

Hint: For the code, refer to one of our textbooks, Data Visualization with R: Chapter 8.1.


# select numeric variables
df <- dplyr::select_if(data, is.numeric)

# calculate the correlations
r <- cor(df, use="complete.obs")
round(r,2) 
##           year lifeExp   pop gdpPercap
## year      1.00    0.44  0.08      0.23
## lifeExp   0.44    1.00  0.06      0.58
## pop       0.08    0.06  1.00     -0.03
## gdpPercap 0.23    0.58 -0.03      1.00

library(ggcorrplot)
ggcorrplot(r) 


ggcorrplot(r, 
           hc.order = TRUE, 
           type = "lower",
           lab = TRUE)
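
The numeric-column selection and correlation matrix can also be produced with base R alone, which may be useful when dplyr is not loaded. This is a sketch on the built-in `iris` data set rather than the gapminder data; `num_cols`, `df_num` and `r_iris` are illustrative names.

```r
# iris ships with R: four numeric columns and one factor (Species)
num_cols <- sapply(iris, is.numeric)   # TRUE for each numeric column
df_num   <- iris[, num_cols]           # keep only those columns

# Pairwise Pearson correlations, using only complete rows
r_iris <- cor(df_num, use = "complete.obs")
round(r_iris, 2)
```

The resulting matrix has the same shape as `r` above, so it could be passed to ggcorrplot() in the same way.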

Q6 List any variable with a strong or moderate positive association with life expectancy, if any.

gdpPercap has the strongest positive association with life expectancy (r = 0.58). year has a moderate positive association with life expectancy (r = 0.44).

Q7 Your classmate argues that the world has gotten better in the recent past and people tend to live longer each year. Would you agree? Argue your case based on the correlation coefficient between life expectancy and year.

Hint: A correct answer must include all of the following: 1) direction and strength of the correlation coefficient, and 2) linear versus non-linear relationship.

Yes, I would agree. The correlation coefficient between year and life expectancy is 0.44, which is positive and moderate in strength, so life expectancy tends to rise as the years go by. The coefficient only measures the linear part of a relationship, so 0.44 indicates a moderate linear upward trend rather than proof that the increase is the same every year.

A scatter plot of life expectancy against year would show whether the relationship is linear or non-linear.
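
The linear versus non-linear distinction can also be checked numerically. The sketch below uses invented data in which `y` rises with `x` along a curve; the Pearson coefficient understates the relationship, while the rank-based Spearman coefficient and a log-transformed Pearson coefficient capture it fully. The choice of a log curve is an assumption made for illustration.

```r
# Invented example: y always increases with x, but along a curve
x <- 1:50
y <- log(x)

pearson  <- cor(x, y)                       # measures linear association
spearman <- cor(x, y, method = "spearman")  # measures monotonic association
pearson_log <- cor(log(x), y)               # linear after transforming x

round(c(pearson = pearson, spearman = spearman, pearson_log = pearson_log), 3)
```

A gap between the Pearson and Spearman values is a hint that the scatter plot is worth inspecting for curvature.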

Q8 Hide the messages, but display the code and its results on the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
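
One way to apply these options to every chunk at once is to set them as defaults in a setup chunk near the top of the document. This is a sketch assuming the knitr package is installed; the same three options can instead be written in an individual chunk header.

```r
# Set document-wide chunk defaults (assumes knitr is available)
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(
    message = FALSE,     # hide package start-up and other messages
    echo    = TRUE,      # show the code
    results = "markup"   # show the printed results
  )
}
```

Options given in a chunk header override these defaults for that chunk only.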

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.