Hint: The data file is posted in Moodle under Module 5. It is named “gapminder.csv”.
data <- read.csv("~/Business Stats/data/gapminder.csv")
head(data)
##       country continent year lifeExp      pop gdpPercap
## 1 Afghanistan      Asia 1952  28.801  8425333  779.4453
## 2 Afghanistan      Asia 1957  30.332  9240934  820.8530
## 3 Afghanistan      Asia 1962  31.997 10267083  853.1007
## 4 Afghanistan      Asia 1967  34.020 11537966  836.1971
## 5 Afghanistan      Asia 1972  36.088 13079460  739.9811
## 6 Afghanistan      Asia 1977  38.438 14880372  786.1134
summary(data)
##    country           continent              year         lifeExp
##  Length:1704        Length:1704        Min.   :1952   Min.   :23.60
##  Class :character   Class :character   1st Qu.:1966   1st Qu.:48.20
##  Mode  :character   Mode  :character   Median :1980   Median :60.71
##                                        Mean   :1980   Mean   :59.47
##                                        3rd Qu.:1993   3rd Qu.:70.85
##                                        Max.   :2007   Max.   :82.60
##       pop              gdpPercap
##  Min.   :6.001e+04   Min.   :   241.2
##  1st Qu.:2.794e+06   1st Qu.:  1202.1
##  Median :7.024e+06   Median :  3531.8
##  Mean   :2.960e+07   Mean   :  7215.3
##  3rd Qu.:1.959e+07   3rd Qu.:  9325.5
##  Max.   :1.319e+09   Max.   :113523.1
str(data)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ continent: chr "Asia" "Asia" "Asia" "Asia" ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ pop : int 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num 779 821 853 836 740 ...
Hint: For the code, refer to one of our textbooks, Data Visualization with R: Chapter 4.2. Map lifeExp to the y-axis and gdpPercap to the x-axis.
library(tidyverse)
## -- Attaching packages --------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
ggplot(data,
       aes(x = gdpPercap,
           y = lifeExp)) +
  geom_point()
Hint: Interpret both the direction and the strength of the correlation
cor(data$lifeExp, data$gdpPercap)
## [1] 0.5837062
The Pearson correlation coefficient is 0.58, which indicates a moderate positive linear association between GDP per capita and life expectancy: countries with higher GDP per capita tend to have higher life expectancy.
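To make the coefficient above less of a black box, here is a minimal sketch of the Pearson formula computed by hand and compared with cor(). The vectors x and y are hypothetical toy values chosen for illustration, not the gapminder data.

```r
# Hypothetical toy vectors, for illustration only (not the gapminder data)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Deviations of each observation from its variable's mean
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson r: sum of cross-products divided by the square root of
# the product of the two sums of squared deviations
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in equivalent (method = "pearson" is the default)
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```

The sign of the result gives the direction of the association and its absolute value gives the strength, which is how the 0.58 above is read.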
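No, we cannot conclude from this result alone that a higher standard of living causes life expectancy to rise. The positive correlation shows that GDP per capita and life expectancy tend to move together, but correlation does not establish causation; other factors may influence both.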
Hint: For the code, refer to one of our textbooks, Data Visualization with R: Chapter 8.1.
df <- dplyr::select_if(data, is.numeric)
r <- cor(df, use="complete.obs")
round(r,2)
##           year lifeExp   pop gdpPercap
## year      1.00    0.44  0.08      0.23
## lifeExp   0.44    1.00  0.06      0.58
## pop       0.08    0.06  1.00     -0.03
## gdpPercap 0.23    0.58 -0.03      1.00
library(ggcorrplot)
## Warning: package 'ggcorrplot' was built under R version 4.0.3
ggcorrplot(r,
           hc.order = TRUE,
           type = "lower",
           lab = TRUE)
GDP per capita has a moderate positive association with life expectancy (0.58), and life expectancy has a positive association with year (0.44).
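One way to check which pairs stand out is to rank the off-diagonal entries of the correlation matrix by absolute value. The sketch below re-enters the rounded values printed above by hand, so it is only as precise as those two decimals; the names r_rounded and pairs are illustrative and not part of the original code.

```r
# Rounded correlations copied from the printed matrix above
vars <- c("year", "lifeExp", "pop", "gdpPercap")
r_rounded <- matrix(c(1.00, 0.44,  0.08,  0.23,
                      0.44, 1.00,  0.06,  0.58,
                      0.08, 0.06,  1.00, -0.03,
                      0.23, 0.58, -0.03,  1.00),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(vars, vars))

# Keep each pair once by taking only the lower triangle
idx <- which(lower.tri(r_rounded), arr.ind = TRUE)
pairs <- data.frame(var1 = rownames(r_rounded)[idx[, "row"]],
                    var2 = colnames(r_rounded)[idx[, "col"]],
                    r    = r_rounded[idx])

# Sort from strongest to weakest association, ignoring sign
pairs <- pairs[order(-abs(pairs$r)), ]
print(pairs, row.names = FALSE)
```

With the real data, the same steps could be applied directly to the matrix returned by cor() instead of a hand-entered copy.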
I would agree with that: based on the chart, life expectancy and year have a moderate positive linear correlation (0.44).
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
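As a rough illustration of the three options named in the hint, the sketch below sets them as document-wide defaults with knitr::opts_chunk$set(). The particular values are an assumption about what the report should show (code visible, package start-up messages hidden, results printed as usual), not a requirement of the assignment.

```r
library(knitr)

# Assumed defaults for every chunk in the document:
#   echo    = TRUE     show the R source code in the report
#   message = FALSE    hide messages such as the tidyverse attach notice
#   results = "markup" print results in the usual "##" style
opts_chunk$set(echo = TRUE, message = FALSE, results = "markup")
```

The same options can also be set for a single chunk in its header, for example {r, message = FALSE}, which overrides the defaults for that chunk only.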