Hint: The data file is posted in Moodle. See Module 5. It is named “gapminder.csv”.
data <- read.csv("~/busstat/data/gapminder.csv")
head(data)
## country continent year lifeExp pop gdpPercap
## 1 Afghanistan Asia 1952 28.801 8425333 779.4453
## 2 Afghanistan Asia 1957 30.332 9240934 820.8530
## 3 Afghanistan Asia 1962 31.997 10267083 853.1007
## 4 Afghanistan Asia 1967 34.020 11537966 836.1971
## 5 Afghanistan Asia 1972 36.088 13079460 739.9811
## 6 Afghanistan Asia 1977 38.438 14880372 786.1134
summary(data)
## country continent year lifeExp
## Length:1704 Length:1704 Min. :1952 Min. :23.60
## Class :character Class :character 1st Qu.:1966 1st Qu.:48.20
## Mode :character Mode :character Median :1980 Median :60.71
## Mean :1980 Mean :59.47
## 3rd Qu.:1993 3rd Qu.:70.85
## Max. :2007 Max. :82.60
## pop gdpPercap
## Min. :6.001e+04 Min. : 241.2
## 1st Qu.:2.794e+06 1st Qu.: 1202.1
## Median :7.024e+06 Median : 3531.8
## Mean :2.960e+07 Mean : 7215.3
## 3rd Qu.:1.959e+07 3rd Qu.: 9325.5
## Max. :1.319e+09 Max. :113523.1
Hint: For the code, refer to one of our textbooks, Data Visualization with R: Chapter 4.2. Map lifeExp to the y-axis and gdpPercap to the x-axis.
library(tidyverse)
ggplot(data,
       aes(x = gdpPercap,
           y = lifeExp)) +
  geom_point()
Hint: Interpret both the direction and the strength of the correlation
cor(data$gdpPercap, data$lifeExp)
## [1] 0.5837062
The correlation coefficient measures the strength and direction of the linear association between two variables. It ranges from -1 to 1, so our value of 0.58 indicates a moderately strong, positive correlation.
Yes, because the graph shows that life expectancy rises as GDP per capita increases.
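To make the -1 to 1 scale concrete, here is a minimal sketch using made-up vectors (they are not taken from gapminder.csv). It shows a perfect positive line, a perfect negative line, and a curved pattern whose linear correlation is close to zero even though the variables are clearly related.

```r
# Illustrative vectors only; these are assumptions, not the gapminder data
x <- 1:10

y_up    <- 2 * x + 1      # rises with x in a straight line
y_down  <- -3 * x + 5     # falls with x in a straight line
y_curve <- (x - 5.5)^2    # U-shaped: related to x, but not linearly

r_up    <- cor(x, y_up)     # perfect positive correlation (1)
r_down  <- cor(x, y_down)   # perfect negative correlation (-1)
r_curve <- cor(x, y_curve)  # approximately 0 despite the clear pattern

print(c(up = r_up, down = r_down, curve = r_curve))
```

The last case is a reminder that cor() only summarizes linear association, so the scatter plot should always be read alongside the coefficient.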
Hint: For the code, refer to one of our textbooks, Data Visualization with R: Chapter 8.1.
# select numeric variables
df <- dplyr::select_if(data, is.numeric)
# calculate the correlations
r <- cor(df, use="complete.obs")
round(r,2)
## year lifeExp pop gdpPercap
## year 1.00 0.44 0.08 0.23
## lifeExp 0.44 1.00 0.06 0.58
## pop 0.08 0.06 1.00 -0.03
## gdpPercap 0.23 0.58 -0.03 1.00
library(ggcorrplot)
ggcorrplot(r)
ggcorrplot(r,
           hc.order = TRUE,
           type = "lower",
           lab = TRUE)
The variable most strongly associated with life expectancy is gdpPercap (r = 0.58). Year is moderately associated with life expectancy (r = 0.44).
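The same ranking can be read off the matrix programmatically. The sketch below retypes the rounded correlation matrix printed above (so it runs without the CSV file) and sorts the other variables by the absolute size of their correlation with lifeExp. The object names life_r and ranked are illustrative.

```r
# Rounded correlations copied from the printed matrix above
vars <- c("year", "lifeExp", "pop", "gdpPercap")
r <- matrix(c(1.00, 0.44,  0.08,  0.23,
              0.44, 1.00,  0.06,  0.58,
              0.08, 0.06,  1.00, -0.03,
              0.23, 0.58, -0.03,  1.00),
            nrow = 4, byrow = TRUE,
            dimnames = list(vars, vars))

# Keep the lifeExp row and drop its correlation with itself
life_r <- r["lifeExp", colnames(r) != "lifeExp"]

# Rank by strength (absolute value), strongest first
ranked <- sort(abs(life_r), decreasing = TRUE)
print(ranked)
```

Sorting on the absolute value matters in general, because a strong negative correlation would otherwise end up at the bottom of the list.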
Hint: A correct answer must include all of the following: 1) direction and strength of the correlation coefficient, and 2) linear versus non-linear relationship.
Yes, I would agree. The correlation between year and life expectancy is positive and moderate in strength (r = 0.44), which means life expectancy tends to increase over time. The relationship appears to be linear because life expectancy rises steadily as the years go by.
You can check for non-linear relationships by inspecting the scatter plot.
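One way to make that visual check easier is to overlay a straight-line fit and a flexible smoother on the scatter plot. If the two lines diverge noticeably, the relationship is probably not linear. The sketch below uses simulated values as a stand-in (the data frame sim and its numbers are assumptions, not the real gapminder data). With the real data, the same layers could be added to the ggplot() call used earlier.

```r
library(ggplot2)

# Simulated stand-in data; hypothetical values chosen to produce a curved pattern
set.seed(1)
sim <- data.frame(gdpPercap = exp(runif(200, log(300), log(50000))))
sim$lifeExp <- 20 + 6 * log(sim$gdpPercap) + rnorm(200, sd = 4)

p <- ggplot(sim, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) +
  # straight-line fit: what a linear relationship would look like
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  # flexible fit: follows the actual shape of the point cloud
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "blue")

print(p)
```

In this simulated example the blue curve bends away from the red line, which is the kind of pattern that signals a non-linear relationship.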
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
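As a rough sketch of how these three options work, they can be set once for the whole document with knitr::opts_chunk$set(), typically in a setup chunk near the top of the .Rmd file. The values below are illustrative assumptions. The assignment may call for different settings, and the same options can also be written in an individual chunk header, for example {r, message=FALSE}.

```r
library(knitr)

# Document-wide defaults; the values shown are examples, not the required answer
opts_chunk$set(
  echo    = TRUE,      # show the R code in the knitted document
  message = FALSE,     # hide package start-up and other messages
  results = "markup"   # show printed output; "hide" would suppress it
)
```

Options given in a chunk header override these defaults for that chunk only.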