Introduction

We want to examine how basic education is associated with different aspects of society. It is commonly assumed that education benefits society and that, as people become more educated, society becomes more civilized. This is paired with the idea that, on a scale from animal instinct to conscious human action, it’s better for society if its members are more like wise sages and less like impulsive cavemen.

The data we will use to explore these assumptions, and the effects of education on different facets of society, comes from Gapminder.

To measure basic education by proxy, we’ll look at adult literacy rates, primary school completion rates, and expenditure per student as a % of GDP. Primary school completion rates can be influenced by many factors, but our initial assumption is that they are a suitable proxy for how much a society values schooling and how capable it is of putting its children through school.

We’ll then look at a few other factors that we believe may be related to education and discuss the results of regression models.

Preparing the data

In this section, let’s grab all the data from Gapminder and tidy it into one data frame that we’ll then use to create the models.

All data has been downloaded from Gapminder.

Explanatory Variables

Let’s download, join, and tidy the three data sources we’ll be looking at for our predictor variables. All are CSV files that contain metrics by country and year.

Since all of the Gapminder datasets seem to be in the same format, with countries as rows and years as columns, let’s use a function to unpivot them so that each row is a single country-year pair, since those will be our observations.
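A minimal sketch of such an unpivot function follows, in base R. The helper name `unpivot_gapminder`, the toy table, and the assumption that the first column is called `country` are ours for illustration and are not the code used to produce the output in this report.

```r
# Hypothetical helper (not the report's own code): reshape a wide table with
# one row per country and one column per year into one row per country-year.
unpivot_gapminder <- function(wide, value_name) {
  year_cols <- setdiff(names(wide), "country")
  long <- data.frame(
    country = rep(wide$country, times = length(year_cols)),
    year = rep(as.integer(year_cols), each = nrow(wide)),
    value = unlist(wide[year_cols], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  names(long)[names(long) == "value"] <- value_name
  # Keep only country-years that actually have a measurement
  long <- long[!is.na(long[[value_name]]), ]
  rownames(long) <- NULL
  long
}

# Toy input with made-up values. A real file would be read with
# read.csv(path, check.names = FALSE) so year headers stay "1975", not "X1975".
toy_wide <- data.frame(
  country = c("A", "B"),
  `1975` = c(10, 20),
  `1976` = c(NA, 30),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
literacy_long <- unpivot_gapminder(toy_wide, "literacy_rate")
print(literacy_long)
```

Dropping the missing country-years at this stage is a design choice: it keeps each long table limited to real measurements, so row counts per dataset reflect actual coverage.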

##                    country year literacy_rate pschool_crate pschool_erate
## 1             Burkina Faso 1975          8.69          7.63            NA
## 2 Central African Republic 1975         18.20         36.20            NA
## 3                   Kuwait 1975         59.60         60.60            NA
## 4                   Turkey 1975         61.60            NA            NA
## 5     United Arab Emirates 1975         53.50         41.70            NA
## 6                  Uruguay 1975         93.90            NA            NA
## [1] 144

It looks like there are only 144 country-year observations with all three metrics, which may make it difficult to use all three predictor variables in one model.

Let’s evaluate which variables have enough data to use:

## # A tibble: 3 x 5
##       n    ns    ls    cs    es
##   <dbl> <int> <dbl> <dbl> <dbl>
## 1     3   144   144   144   144
## 2     2  1303   256  1274  1076
## 3     1  3684   161  3197   326

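This table shows how many of the three variables are available per country-year observation.

  • n = the number of variables, out of 3, that exist per observation.

  • ns = the number of observations with exactly n variables.

  • ls, cs, es = how many of those observations include the literacy rate, the primary school completion rate, and the expenditure per student, respectively.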

There are 144 observations with all 3 variables, 1303 observations with 2 variables, and 3684 with only 1 variable.
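A table of this shape can be built by counting non-missing values per row and then summing within each count. The sketch below uses made-up values and reads `ns` as the number of observations and `ls`, `cs`, `es` as per-variable counts; that reading is consistent with the table, where for n = 2 we have 256 + 1274 + 1076 = 2606 = 2 × 1303.

```r
# Toy stand-in for the joined predictor data frame (made-up values)
edu <- data.frame(
  literacy_rate = c(50, NA, 70, NA),
  pschool_crate = c(60, 80, NA, 90),
  pschool_erate = c(15, 12, NA, NA)
)

present <- !is.na(edu)          # logical matrix: TRUE where a value exists
n_present <- rowSums(present)   # variables available per observation

# Sum within each value of n: ns counts rows, the rest count non-missing values
counts <- rowsum(cbind(ns = 1, present), group = n_present)
colnames(counts) <- c("ns", "ls", "cs", "es")
availability <- data.frame(n = as.integer(rownames(counts)), counts,
                           row.names = NULL)
availability <- availability[order(-availability$n), ]
print(availability)

# Consistency check: each row's per-variable counts must add up to n * ns
stopifnot(with(availability, ls + cs + es == n * ns))
```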

## Number of Rows for literacy rate dataframe: 561 
## Number of Distinct Countries: 150
## Number of Rows for primary school completion rate dataframe: 4615 
## Number of Distinct Countries: 185
## Number of Rows for primary school expenditure rate dataframe: 1546 
## Number of Distinct Countries: 163

It looks like the primary school completion rate has the most observations, but we’ll still need to determine which variables can be used based on how well their country-year pairs match those of the response variables.
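Counts like the ones printed above can be produced with a small reporting helper. This is a sketch only: `describe_coverage` and the toy data frame are hypothetical names and values, and the message format simply mirrors the output shown.

```r
# Hypothetical helper: report row and distinct-country counts for one data frame
describe_coverage <- function(df, label) {
  n_rows <- nrow(df)
  n_countries <- length(unique(df$country))
  cat("Number of Rows for", label, "dataframe:", n_rows, "\n")
  cat("Number of Distinct Countries:", n_countries, "\n")
  # Return the counts invisibly so they can be reused without reprinting
  invisible(list(rows = n_rows, countries = n_countries))
}

# Toy long-format data with made-up values
toy_long <- data.frame(
  country = c("A", "A", "B"),
  year = c(1975L, 1976L, 1975L),
  literacy_rate = c(10, 12, 20),
  stringsAsFactors = FALSE
)
coverage <- describe_coverage(toy_long, "literacy rate")
```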

Murder and Suicide

For most people, murder and suicide are rare occurrences in civilized society. We rarely hear of these acts of violence among people we know firsthand, but unfortunately we hear of them quite often in the news.

It wouldn’t be a huge leap to say that people who are more educated are less likely to be involved in these kinds of situations, but we’ll look at the data to see whether it supports this association.

##                    country year literacy_rate pschool_crate pschool_erate
## 1             Burkina Faso 1975          8.69          7.63            NA
## 2 Central African Republic 1975         18.20         36.20            NA
## 3                   Kuwait 1975         59.60         60.60            NA
## 4                   Turkey 1975         61.60            NA            NA
## 5     United Arab Emirates 1975         53.50         41.70            NA
## 6                  Uruguay 1975         93.90            NA            NA
##   murder_rate suicide_rate
## 1          NA           NA
## 2          NA           NA
## 3        1.64        0.504
## 4          NA           NA
## 5          NA           NA
## 6        2.96        9.930
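The preview above keeps education rows even where the murder and suicide rates are missing, which is what a join that preserves unmatched rows produces. Below is a minimal sketch using base R `merge()` with `all.x = TRUE` (a left join); the join type actually used is an assumption on our part, and the two rows are taken from the preview.

```r
# Two rows from the preview above, split into per-metric tables
edu <- data.frame(
  country = c("Kuwait", "Turkey"),
  year = c(1975L, 1975L),
  literacy_rate = c(59.6, 61.6),
  stringsAsFactors = FALSE
)
murder <- data.frame(country = "Kuwait", year = 1975L, murder_rate = 1.64,
                     stringsAsFactors = FALSE)
suicide <- data.frame(country = "Kuwait", year = 1975L, suicide_rate = 0.504,
                      stringsAsFactors = FALSE)

# Join on the country-year key; all.x = TRUE keeps education rows that have
# no match, filling the response columns with NA (assumed join type)
keys <- c("country", "year")
combined <- merge(edu, murder, by = keys, all.x = TRUE)
combined <- merge(combined, suicide, by = keys, all.x = TRUE)
print(combined)
```

Note that `merge()` sorts the result by the key columns, so row order may differ from the input.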

Vaccination Rate

Childhood vaccinations are one way in which successful societies protect their population. Here in the United States, children are vaccinated against a variety of diseases. Since the coverage rate for any of these vaccines could serve as a proxy for societal wellness and mindfulness, we pulled in each of the vaccination datasets to see which was the most complete.

##    country               year      literacy_rate    pschool_crate   
##  Length:7369        Min.   :1970   Min.   :  8.69   Min.   :  1.52  
##  Class :character   1st Qu.:1987   1st Qu.: 64.90   1st Qu.: 62.50  
##  Mode  :character   Median :1997   Median : 85.10   Median : 90.70  
##                     Mean   :1997   Mean   : 77.04   Mean   : 79.38  
##                     3rd Qu.:2007   3rd Qu.: 94.10   3rd Qu.: 98.70  
##                     Max.   :2019   Max.   :100.00   Max.   :135.00  
##                                    NA's   :6808     NA's   :2754    
##  pschool_erate       DTP_rate        MCV_rate        PAB_rate    
##  Min.   : 0.235   Min.   : 0.00   Min.   : 0.00   Min.   : 1.00  
##  1st Qu.:10.500   1st Qu.:66.00   1st Qu.:62.00   1st Qu.:42.00  
##  Median :15.000   Median :86.00   Median :84.00   Median :67.00  
##  Mean   :15.791   Mean   :76.87   Mean   :75.49   Mean   :60.31  
##  3rd Qu.:20.000   3rd Qu.:95.00   3rd Qu.:94.00   3rd Qu.:83.00  
##  Max.   :65.100   Max.   :99.00   Max.   :99.00   Max.   :99.00  
##  NA's   :5823     NA's   :1817    NA's   :1937    NA's   :4479   
##    hepb3_rate   
##  Min.   : 1.00  
##  1st Qu.:77.00  
##  Median :91.00  
##  Mean   :82.47  
##  3rd Qu.:96.00  
##  Max.   :99.00  
##  NA's   :5106

Looking at the number of NAs for each variable in the summary, it is clear that there is significantly more data on DTP and measles (MCV) vaccinations. For this reason we will exclude the PAB and HepB vaccination datasets.
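To make the comparison concrete, the NA counts from the summary can be turned into the share of rows missing per variable. The sketch below copies the counts and the row total (7369) from the summary output; the toy data frame at the end uses made-up values and only illustrates how such counts are obtained from a data frame.

```r
# Counts copied from the summary() output above
n_rows <- 7369
na_counts <- c(DTP_rate = 1817, MCV_rate = 1937,
               PAB_rate = 4479, hepb3_rate = 5106)
na_share <- round(100 * na_counts / n_rows, 1)   # percent missing per variable
print(na_share)

# On a data frame, the same counts come from colSums(is.na(df));
# toy example with made-up values:
toy <- data.frame(DTP_rate = c(90, NA, 85), PAB_rate = c(NA, NA, 60))
toy_na <- colSums(is.na(toy))
print(toy_na)
```

Roughly a quarter of the rows lack DTP or MCV values, while well over half lack PAB or HepB values.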

##                    country year literacy_rate pschool_crate pschool_erate
## 1             Burkina Faso 1975          8.69          7.63            NA
## 2 Central African Republic 1975         18.20         36.20            NA
## 3                   Kuwait 1975         59.60         60.60            NA
## 4                   Turkey 1975         61.60            NA            NA
## 5     United Arab Emirates 1975         53.50         41.70            NA
## 6                  Uruguay 1975         93.90            NA            NA
##   murder_rate suicide_rate DTP_rate MCV_rate
## 1          NA           NA       NA       NA
## 2          NA           NA       NA       NA
## 3        1.64        0.504       NA       NA
## 4          NA           NA       NA       NA
## 5          NA           NA       NA       NA
## 6        2.96        9.930       NA       NA

Inequality

The Gini coefficient, a measure of inequality, is another metric in this dataset we wanted to explore. The Gini coefficient here is on a scale from 0 to 100, with higher numbers implying greater inequality. In this dataset, it appears that each country has a Gini coefficient value for all years, making it ideal for our purposes.
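The observation that every country has a value for every year can be checked rather than eyeballed. Here is a minimal sketch, assuming a long-format data frame with `country` and `year` columns; the helper name `is_complete_panel`, the column name `gini`, and the toy values are all hypothetical.

```r
# Hypothetical check: TRUE when every country has a non-missing value
# for every year that appears in the data
is_complete_panel <- function(df, value_col) {
  pairs <- unique(df[c("country", "year")])
  expected <- length(unique(df$country)) * length(unique(df$year))
  nrow(pairs) == expected && !anyNA(df[[value_col]])
}

# Toy data with made-up values
complete <- data.frame(
  country = rep(c("A", "B"), each = 2),
  year = rep(c(2000L, 2001L), times = 2),
  gini = c(30, 31, 45, 44),
  stringsAsFactors = FALSE
)
gappy <- complete[-4, ]   # drop one country-year to simulate a gap

print(is_complete_panel(complete, "gini"))
print(is_complete_panel(gappy, "gini"))
```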