Hint: The data file is posted in Moodle (see Module 5). It is named “gapminder.csv”.
data <- read.csv("~/Bisstats/Data/gapminder.csv")
head(data)
##       country continent year lifeExp      pop gdpPercap
## 1 Afghanistan      Asia 1952  28.801  8425333  779.4453
## 2 Afghanistan      Asia 1957  30.332  9240934  820.8530
## 3 Afghanistan      Asia 1962  31.997 10267083  853.1007
## 4 Afghanistan      Asia 1967  34.020 11537966  836.1971
## 5 Afghanistan      Asia 1972  36.088 13079460  739.9811
## 6 Afghanistan      Asia 1977  38.438 14880372  786.1134
summary(data)
##    country           continent              year         lifeExp
##  Length:1704        Length:1704        Min.   :1952   Min.   :23.60
##  Class :character   Class :character   1st Qu.:1966   1st Qu.:48.20
##  Mode  :character   Mode  :character   Median :1980   Median :60.71
##                                        Mean   :1980   Mean   :59.47
##                                        3rd Qu.:1993   3rd Qu.:70.85
##                                        Max.   :2007   Max.   :82.60
##       pop              gdpPercap
##  Min.   :6.001e+04   Min.   :   241.2
##  1st Qu.:2.794e+06   1st Qu.:  1202.1
##  Median :7.024e+06   Median :  3531.8
##  Mean   :2.960e+07   Mean   :  7215.3
##  3rd Qu.:1.959e+07   3rd Qu.:  9325.5
##  Max.   :1.319e+09   Max.   :113523.1
Hint: For the code, refer to one of our textbooks, Data Visualization with R: Chapter 4.2. Map lifeExp to the y-axis and gdpPercap to the x-axis.
library(tidyverse)
ggplot(data, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "cornflowerblue",
             size = 2,
             alpha = .8)
Hint: Interpret both the direction and the strength of the correlation.
cor(data$gdpPercap, data$lifeExp)
## [1] 0.5837062
The correlation coefficient is roughly 0.58, a moderately strong positive correlation: as GDP per capita increases, life expectancy tends to increase as well.
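To make the coefficient less of a black box, here is a minimal sketch that computes Pearson's r by hand and compares it with `cor()`. It uses only the six Afghanistan rows printed by `head(data)` above, so the value it produces is not the 0.58 reported for the full data set. The vector names `gdp` and `life` are introduced here for illustration.

```r
# Six rows copied from the head(data) output above (Afghanistan, 1952-1977).
gdp  <- c(779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134)
life <- c(28.801, 30.332, 31.997, 34.020, 36.088, 38.438)

# Deviations of each value from its own mean.
dx <- gdp - mean(gdp)
dy <- life - mean(life)

# Pearson's r: sum of cross-products divided by the square root of the
# product of the two sums of squares.
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# The built-in function should agree up to floating-point error.
r_builtin <- cor(gdp, life)
all.equal(r_manual, r_builtin)
```

The sign of r gives the direction of the relationship and its absolute value gives the strength, which is why both need to be interpreted.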
No. Although the correlation is moderately strong, most of the points still fall in the lower half of the plot. The variables are indeed correlated, but correlation alone does not establish causation, so these data do not show that the standard of living has an effect on life expectancy.
Hint: For the code, refer to one of our textbooks, Data Visualization with R: Chapter 8.1.
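# Keep only the numeric columns (year, lifeExp, pop, gdpPercap).
df <- dplyr::select_if(data, is.numeric)
# Pairwise Pearson correlations, using only rows with no missing values.
r <- cor(df, use = "complete.obs")
round(r, 2)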
##           year lifeExp   pop gdpPercap
## year      1.00    0.44  0.08      0.23
## lifeExp   0.44    1.00  0.06      0.58
## pop       0.08    0.06  1.00     -0.03
## gdpPercap 0.23    0.58 -0.03      1.00
library(ggcorrplot)
## Warning: package 'ggcorrplot' was built under R version 4.0.3
ggcorrplot(r,
           hc.order = TRUE,
           type = "lower",
           lab = TRUE)
GDP per capita has a moderately strong positive correlation with life expectancy (0.58), and year has a moderate positive correlation with life expectancy (0.44).
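One way to read a correlation matrix systematically is to rank the variable pairs by the absolute value of their coefficient. The sketch below re-enters the rounded matrix printed above by hand so that it runs without the data file. The names `vars`, `r_rounded`, `idx` and `ranked` are introduced here for illustration.

```r
# Rounded correlation matrix copied from the round(r, 2) output above.
vars <- c("year", "lifeExp", "pop", "gdpPercap")
r_rounded <- matrix(
  c(1.00, 0.44,  0.08,  0.23,
    0.44, 1.00,  0.06,  0.58,
    0.08, 0.06,  1.00, -0.03,
    0.23, 0.58, -0.03,  1.00),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# Row/column positions of the upper triangle, so each pair appears once.
idx <- which(upper.tri(r_rounded), arr.ind = TRUE)

# One row per variable pair, sorted from strongest to weakest association.
ranked <- data.frame(
  var1 = rownames(r_rounded)[idx[, "row"]],
  var2 = colnames(r_rounded)[idx[, "col"]],
  r    = r_rounded[idx]
)
ranked <- ranked[order(-abs(ranked$r)), ]
ranked
```

Sorting on the absolute value keeps strong negative correlations near the top as well, although none appear in this matrix.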
Hint: A correct answer must include all of the following: 1) direction and strength of the correlation coefficient, and 2) linear versus non-linear relationship.
If my classmate is referring to the linear correlation with life expectancy, I would have to disagree; but if my classmate is referring to all of the variables associated with this correlation, I would have to agree.
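The linear versus non-linear distinction in the hint matters because Pearson's r only measures linear association. The sketch below uses simulated data, not the gapminder file, and the variable names `income` and `outcome` are hypothetical. It shows how a curved relationship can give a lower Pearson coefficient on the raw scale than on a log scale.

```r
set.seed(42)

# Simulated predictor spread evenly on a log scale (hypothetical values).
income <- exp(seq(log(300), log(50000), length.out = 200))

# Outcome that is linear in log10(income) plus random noise,
# so it is curved when plotted against raw income.
outcome <- 10 * log10(income) + rnorm(200, sd = 2)

# Pearson correlation on the raw scale.
r_raw <- cor(income, outcome)

# Pearson correlation after log-transforming the predictor.
r_log <- cor(log10(income), outcome)

# Spearman rank correlation, which only assumes a monotonic relationship.
r_rank <- cor(income, outcome, method = "spearman")

round(c(raw = r_raw, log = r_log, spearman = r_rank), 2)
```

If the scatterplot of lifeExp against gdpPercap looks curved, the same comparison could be run on the real columns to check whether a single linear coefficient understates the relationship.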
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
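As a rough sketch of what the hint points to, the three options can be set once for the whole document with `knitr::opts_chunk$set()`, typically in a setup chunk at the top. The values below are illustrative assumptions, not the settings the assignment requires.

```r
# Document-wide defaults applied to every chunk (values are illustrative).
knitr::opts_chunk$set(
  echo    = TRUE,      # show the R code in the knitted document
  message = FALSE,     # hide messages, such as package start-up text
  results = "markup"   # print results as usual; "hide" would suppress them
)
```

The same options can also be written in an individual chunk header to override the defaults for that chunk only. The "built under R version" line shown earlier is a warning rather than a message, so it is controlled by the separate `warning` option.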