Q1 Import data.

Hint: The data file is posted in Moodle (see Module 5) and is named “gapminder.csv”.

# Read the Gapminder data (the path points to a folder in the author's home directory)
data <- read.csv("~/busStats/Data/gapminder.csv")
head(data)
##       country continent year lifeExp      pop gdpPercap
## 1 Afghanistan      Asia 1952  28.801  8425333  779.4453
## 2 Afghanistan      Asia 1957  30.332  9240934  820.8530
## 3 Afghanistan      Asia 1962  31.997 10267083  853.1007
## 4 Afghanistan      Asia 1967  34.020 11537966  836.1971
## 5 Afghanistan      Asia 1972  36.088 13079460  739.9811
## 6 Afghanistan      Asia 1977  38.438 14880372  786.1134
summary(data)
##    country           continent              year         lifeExp     
##  Length:1704        Length:1704        Min.   :1952   Min.   :23.60  
##  Class :character   Class :character   1st Qu.:1966   1st Qu.:48.20  
##  Mode  :character   Mode  :character   Median :1980   Median :60.71  
##                                        Mean   :1980   Mean   :59.47  
##                                        3rd Qu.:1993   3rd Qu.:70.85  
##                                        Max.   :2007   Max.   :82.60  
##       pop              gdpPercap       
##  Min.   :6.001e+04   Min.   :   241.2  
##  1st Qu.:2.794e+06   1st Qu.:  1202.1  
##  Median :7.024e+06   Median :  3531.8  
##  Mean   :2.960e+07   Mean   :  7215.3  
##  3rd Qu.:1.959e+07   3rd Qu.:  9325.5  
##  Max.   :1.319e+09   Max.   :113523.1
str(data)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ continent: chr  "Asia" "Asia" "Asia" "Asia" ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num  779 821 853 836 740 ...
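A quick way to confirm that an import worked is to check each column's type and count the missing values. The sketch below is self-contained: instead of the real file, it reads three rows copied from the head(data) output above through read.csv()'s `text` argument. The object names `csv_text` and `toy` are illustrative only, and the toy data is not the full data set.

```r
# Toy stand-in for gapminder.csv: the first three rows shown by head(data) above
csv_text <- "country,continent,year,lifeExp,pop,gdpPercap
Afghanistan,Asia,1952,28.801,8425333,779.4453
Afghanistan,Asia,1957,30.332,9240934,820.8530
Afghanistan,Asia,1962,31.997,10267083,853.1007"

# stringsAsFactors = FALSE keeps country/continent as character,
# matching the str(data) output above
toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Column types: character for the text columns, integer/numeric for the rest
sapply(toy, class)

# Number of missing values in each column
colSums(is.na(toy))
```

On the real data, `colSums(is.na(data))` gives the same per-column count of missing values.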

Q2 Create a scatter plot to visualize the relationship between life expectancy and GDP per capita.

Hint: For the code, refer to one of our textbooks, Data Visualization with R: Chapter 4.2. Map lifeExp to the y-axis and gdpPercap to the x-axis.

library(tidyverse)
## -- Attaching packages -------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts ----------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# Scatter plot: GDP per capita (x) against life expectancy (y)
ggplot(data,
       aes(x = gdpPercap,
           y = lifeExp)) +
  geom_point()
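The summary in Q1 shows that gdpPercap is strongly right-skewed (a mean of 7215.3 against a median of 3531.8, with a maximum of 113523.1), so points in a plot like this tend to crowd toward low x-values. An optional variant, not required by Q2, puts the x-axis on a log10 scale with `scale_x_log10()` and adds axis labels. This sketch uses the six Afghanistan rows from head(data) as a toy input; replace `toy` with `data` to plot the full data set.

```r
library(ggplot2)

# Toy input: gdpPercap and lifeExp from the six rows shown by head(data)
toy <- data.frame(
  gdpPercap = c(779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134),
  lifeExp   = c(28.801, 30.332, 31.997, 34.020, 36.088, 38.438)
)

# Same mapping as Q2, with a log10 x-axis and descriptive axis labels
p <- ggplot(toy, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +
  labs(x = "GDP per capita (log10 scale)",
       y = "Life expectancy (years)")

p
```

On a log scale, equal distances on the x-axis correspond to equal ratios of GDP per capita, which usually makes a skewed variable easier to read.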

Q3 Calculate and interpret the Pearson correlation coefficient.

Hint: Interpret both the direction and the strength of the correlation.

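cor(data$gdpPercap, data$lifeExp)
## [1] 0.5837062

The correlation coefficient is about 0.58. Its sign is positive, so higher GDP per capita tends to go with higher life expectancy, and its size (well above 0 but far from 1) indicates a moderate-to-strong linear relationship.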
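To see what cor() computes, Pearson's r can be written out by hand: the sum of the products of the deviations from the means, divided by the square root of the product of the two sums of squared deviations. The result always lies between -1 and 1; the sign gives the direction and the magnitude gives the strength of the linear association. The sketch uses the six Afghanistan rows from head(data) as toy vectors, so its value is not expected to match the 0.58 computed on the full data.

```r
# Toy vectors: gdpPercap and lifeExp from the six rows shown by head(data)
x <- c(779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134)
y <- c(28.801, 30.332, 31.997, 34.020, 36.088, 38.438)

# Pearson's r:
#   r = sum((x - mean(x)) * (y - mean(y))) /
#       sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Should agree with the built-in cor() up to floating-point rounding
all.equal(r_manual, cor(x, y))
```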

Q4 Based on your analysis in Q2 and Q3, can you conclude that the standard of living (measured by GDP per capita) causes life expectancy to rise? Why or why not?

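No. The scatter plot and the correlation coefficient of 0.58 show only that life expectancy tends to be higher where GDP per capita is higher. Correlation does not establish causation: the association could run in the opposite direction, or both variables could be driven by other factors (for example, year is positively correlated with both, as the correlation matrix in Q5 shows).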
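One concrete reason a correlation coefficient cannot show causation is that Pearson's r is symmetric: swapping the two variables gives the same value, so the number carries no information about which variable influences the other. A minimal sketch, again using the six head(data) rows as illustrative toy vectors:

```r
# Toy vectors: gdpPercap and lifeExp from the six rows shown by head(data)
gdp  <- c(779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134)
life <- c(28.801, 30.332, 31.997, 34.020, 36.088, 38.438)

# Swapping the arguments gives the same coefficient (up to rounding)
r_xy <- cor(gdp, life)
r_yx <- cor(life, gdp)
all.equal(r_xy, r_yx)
```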

Q5 You suspect that there may be other variables that are associated with life expectancy. Create a correlation plot.

Hint: For the code, refer to one of our textbooks, Data Visualization with R: Chapter 8.1.

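# Keep only the numeric columns (year, lifeExp, pop, gdpPercap)
df <- dplyr::select_if(data, is.numeric)
# Pairwise Pearson correlations, using only rows with no missing values
r <- cor(df, use = "complete.obs")
round(r, 2)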
##           year lifeExp   pop gdpPercap
## year      1.00    0.44  0.08      0.23
## lifeExp   0.44    1.00  0.06      0.58
## pop       0.08    0.06  1.00     -0.03
## gdpPercap 0.23    0.58 -0.03      1.00
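library(ggcorrplot)
## Warning: package 'ggcorrplot' was built under R version 4.0.3
# Basic correlation plot of the matrix r
ggcorrplot(r)

# Variables reordered by hierarchical clustering, lower triangle only,
# with the correlation coefficients printed in each cell
ggcorrplot(r,
           hc.order = TRUE,
           type = "lower",
           lab = TRUE)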

Q6 List any variable with a strong or moderate positive association with life expectancy, if any.

gdpPercap has a strong positive association with life expectancy (r = 0.58), and year has a moderate positive association with life expectancy (r = 0.44).
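Rather than reading the plot cell by cell, the lifeExp row of the correlation matrix can be pulled out and sorted. For a self-contained sketch, the matrix below is rebuilt from the rounded round(r, 2) output in Q5; the name `r_toy` is illustrative. With the real data, the last two lines work unchanged if `r_toy` is replaced by `r`.

```r
# Rounded correlation matrix copied from the round(r, 2) output in Q5
vars <- c("year", "lifeExp", "pop", "gdpPercap")
r_toy <- matrix(c(1.00, 0.44,  0.08,  0.23,
                  0.44, 1.00,  0.06,  0.58,
                  0.08, 0.06,  1.00, -0.03,
                  0.23, 0.58, -0.03,  1.00),
                nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Correlation of every other variable with lifeExp, strongest first
r_life <- r_toy["lifeExp", colnames(r_toy) != "lifeExp"]
sort(r_life, decreasing = TRUE)
```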

Q7 Your classmate argues that the world has gotten better in the recent past and people tend to live longer each year. Would you agree? Argue your case based on the correlation coefficient between life expectancy and year.

Hint: A correct answer must include all of the following: 1) direction and strength of the correlation coefficient, and 2) linear versus non-linear relationship.

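Yes, I agree. The correlation coefficient between year and life expectancy is 0.44, a moderate positive correlation: on average, life expectancy is higher in later years. Because Pearson's r measures only linear association, 0.44 indicates a moderate (not strong) linear relationship. It does not rule out a non-linear relationship, which would need to be checked with a scatter plot of lifeExp against year.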
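To illustrate the linear versus non-linear point, the sketch below compares Pearson's r with Spearman's rank correlation (`cor(..., method = "spearman")`), a technique not covered in the assignment. On toy data with a perfectly monotonic but exponential trend, Spearman's rho equals 1 while Pearson's r is noticeably lower, because the points do not lie on a straight line. On the real data, the equivalent call would be `cor(data$year, data$lifeExp, method = "spearman")`.

```r
# Toy data: y always increases with x, but along a curve, not a line
x <- 1:10
y <- exp(x)

# Pearson's r measures linear association only
pearson <- cor(x, y, method = "pearson")

# Spearman's rho correlates the ranks, so any monotonic trend scores 1
spearman <- cor(x, y, method = "spearman")

c(pearson = pearson, spearman = spearman)
```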

Q8 Hide the messages, but display the code and its results on the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the R Markdown Reference Guide.
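These options can be set per chunk in the chunk header, for example `{r, message=FALSE, echo=TRUE, results='markup'}`. Alternatively, a sketch of setting them once as defaults for every chunk with knitr's `opts_chunk$set()`, typically in the first chunk of the .Rmd file. With `message = FALSE`, messages such as the package start-up notes printed by library(tidyverse) above should no longer appear in the rendered page.

```r
# Set default chunk options for the whole document
knitr::opts_chunk$set(
  echo    = TRUE,      # show the R code
  message = FALSE,     # hide messages (e.g. package start-up notes)
  results = "markup"   # show printed results (knitr's default)
)

# Confirm the new default for message
knitr::opts_chunk$get("message")
```

An option given in an individual chunk header overrides these defaults for that chunk only.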

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.