library(Hmisc)
library(tidyverse)
# Raw CSV of the WHO life expectancy data, hosted on GitHub
data_url <- "https://raw.githubusercontent.com/peterphung2043/DATA-606---Final-Project/main/Life%20Expectancy%20Data.csv"
life_expectancy_data <- read.csv(data_url)
# Keep only the two variables used in this proposal
parsed_life_expectancy_data <- life_expectancy_data %>%
  select(Life.expectancy, Schooling)
Is spending more years in school associated with higher or lower life expectancy?
Each case is one country in one year. The dataset covers 193 countries with roughly 15 years of annual records per country, for 2,938 country-year rows in total (2,928 non-missing life expectancy values plus 10 missing, per the summary output). I plan to focus on the most recent year for each country (2015). I may also compare the earliest year with the latest year for each country.
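The plan to restrict the analysis to the most recent year, and possibly to compare the earliest and latest years, could be sketched roughly as follows. This is only an illustration on a made-up toy data frame: the Country and Year column names are assumptions about the raw file, and the values are not real data.

```r
library(dplyr)

# Toy stand-in for life_expectancy_data; values are invented for illustration.
# The Country and Year column names are assumed, not confirmed by the text.
toy <- data.frame(
  Country = c("A", "A", "B", "B"),
  Year = c(2000, 2015, 2000, 2015),
  Life.expectancy = c(60.0, 65.0, 70.0, 74.0),
  Schooling = c(8.0, 10.0, 12.0, 13.5)
)

# Keep only rows from the most recent year present in the data.
latest <- toy %>%
  filter(Year == max(Year)) %>%
  select(Country, Life.expectancy, Schooling)

# Change from the earliest to the latest year, computed per country.
change <- toy %>%
  group_by(Country) %>%
  arrange(Year, .by_group = TRUE) %>%
  summarise(
    life_expectancy_change = last(Life.expectancy) - first(Life.expectancy),
    schooling_change = last(Schooling) - first(Schooling)
  )
```

Filtering on the overall maximum year assumes every country has a record for that year; if some do not, grouping by country before filtering would be the safer choice.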
The World Health Organization (WHO) keeps track of the health status of every country in the world. The data were collected from the WHO and United Nations websites and uploaded to Kaggle.
This is an observational study.
KumarRajarshi. (2018, February 10). Life expectancy (WHO). Kaggle. Retrieved October 31, 2021, from https://www.kaggle.com/kumarajarshi/life-expectancy-who.
The dependent variable is life expectancy (the Life.expectancy column), which is quantitative.
The independent variable is the number of years of schooling (the Schooling column), which is also quantitative.
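With two quantitative variables, one possible way to examine the relationship is a Pearson correlation followed by a simple linear regression. The proposal does not state which method will be used, so the sketch below is an assumption, and it runs on invented, perfectly linear toy values rather than the WHO data.

```r
# Toy data for illustration only; not taken from the WHO dataset.
# The non-missing points lie exactly on Life.expectancy = 44 + 2 * Schooling.
toy <- data.frame(
  Schooling = c(8, 10, 12, 14, NA),
  Life.expectancy = c(60, 64, 68, 72, 75)
)

# Pearson correlation, using only rows where both variables are present.
r <- cor(toy$Schooling, toy$Life.expectancy, use = "complete.obs")

# Simple linear regression of the dependent variable on the independent one.
# lm() drops rows with missing values by default (na.action = na.omit).
fit <- lm(Life.expectancy ~ Schooling, data = toy)
coef(fit)
```

On the real data the same calls would take parsed_life_expectancy_data in place of the toy data frame; the slope would then estimate the change in life expectancy associated with one additional year of schooling.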
describe(parsed_life_expectancy_data$Life.expectancy)
## parsed_life_expectancy_data$Life.expectancy
## n missing distinct Info Mean Gmd .05 .10
## 2928 10 362 1 69.22 10.62 51.4 54.8
## .25 .50 .75 .90 .95
## 63.1 72.1 75.7 79.7 82.0
##
## lowest : 36.3 39.0 41.0 41.5 42.3, highest: 85.0 86.0 87.0 88.0 89.0
describe(parsed_life_expectancy_data$Schooling)
## parsed_life_expectancy_data$Schooling
## n missing distinct Info Mean Gmd .05 .10
## 2775 163 173 1 11.99 3.713 5.8 7.7
## .25 .50 .75 .90 .95
## 10.1 12.3 14.3 15.9 16.8
##
## lowest : 0.0 2.8 2.9 3.0 3.1, highest: 20.3 20.4 20.5 20.6 20.7
parsed_life_expectancy_data %>%
  pivot_longer(cols = c(Life.expectancy, Schooling)) %>%
  ggplot(mapping = aes(x = value)) +
  geom_histogram() +
  facet_wrap(~name, scales = "free_x")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 173 rows containing non-finite values (stat_bin).
parsed_life_expectancy_data %>%
  ggplot(mapping = aes(x = Schooling, y = Life.expectancy)) +
  geom_point()
## Warning: Removed 170 rows containing missing values (geom_point).
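The two plot warnings report different counts (173 and 170), and both look consistent with the summaries above. The histogram is drawn from the stacked long-format data, so it drops every missing value (10 + 163 = 173), while the scatterplot drops each row in which at least one variable is missing (170, which would imply that 3 rows are missing both). The distinction can be sketched on a small made-up data frame; the NA pattern below is invented for illustration.

```r
# Toy data for illustration only; the NA pattern is made up.
toy <- data.frame(
  Life.expectancy = c(70, NA, 65, NA, 80),
  Schooling = c(12, 10, NA, NA, 15)
)

# Missing values per column (what describe() reports as "missing").
missing_per_column <- colSums(is.na(toy))

# Total missing values across both columns: the quantity a histogram drops
# once pivot_longer() has stacked the two columns into one.
total_missing_values <- sum(is.na(toy))

# Rows with at least one missing value: the quantity geom_point() drops.
incomplete_rows <- sum(!complete.cases(toy))
```

Here the total of missing values is 4 while only 3 rows are incomplete, because one row is missing both variables.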