Data Preparation

library(readr)
# Read the harmonized estimates (sheet 3) directly into comp_edu
comp_edu <- read_csv("https://raw.githubusercontent.com/candrewxs/Project_Proposal_D606/main/data/Comparable%20Returns%20to%20Education%20sheet3.csv")
# View() opens RStudio's data viewer, so only call it in an interactive session
if (interactive()) View(comp_edu)

Research question

Do returns to education influence an individual's capital investment?

Which gender has the higher returns to education?

Cases

The data set on private returns to education includes estimates for 142 economies from 1970 to 2014, based on 853 harmonized household surveys. Each case is one survey, so the data set contains 853 observations.

library(dplyr)
n_distinct(comp_edu$Economy)
## [1] 142
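A base-R sketch of checking the unit of observation: if each row is one survey, the row count equals the number of surveys, and `table()` shows how many surveys each economy contributes. The data frame `toy` and its values are hypothetical stand-ins for `comp_edu`.

```r
# Hypothetical toy data standing in for comp_edu (made-up economies and years)
toy <- data.frame(
  Economy = c("A", "A", "B", "C", "C", "C"),
  Year    = c(2001, 2005, 2003, 1999, 2004, 2010)
)

n_obs       <- nrow(toy)                    # one row per survey
n_economies <- length(unique(toy$Economy))  # base-R counterpart of dplyr::n_distinct()
surveys_per_economy <- table(toy$Economy)   # surveys contributed by each economy

n_obs                # 6
n_economies          # 3
surveys_per_economy  # A: 2, B: 1, C: 3
```

On the real data, `nrow(comp_edu)` should agree with the 853 observations stated above.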

Take a look at the dataset

library(reactable)
reactable(comp_edu)

Variable names in the dataset

names(comp_edu)
##  [1] "Economy"   "Year"      "Survey"    "Unit wage" "ß1"        "ß1sd"     
##  [7] "ß2"        "ß2sd"      "ß1 M"      "ß1Msd"     "ß2 M"      "ß2Msd"    
## [13] "ß1 F"      "ß1Fsd"     "ß2 F"      "ß2Fsd"     "Pri"       "Sec"      
## [19] "Ter"       "Pri M"     "Sec M"     "Ter M"     "Pri F"     "Sec F"    
## [25] "Ter F"
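Several of these names contain spaces (for example `Unit wage` and `Pri M`), so they are not syntactic R names. A minimal sketch of the usual ways to refer to such columns, on a hypothetical two-column toy data frame: backticks with `$`, or string indexing with `[[ ]]`. `make.names()` shows what base R would rename them to if they were made syntactic.

```r
# Hypothetical toy data frame; check.names = FALSE keeps the spaces in the names
toy <- data.frame(`Pri M` = c(9.0, 12.6),
                  `Unit wage` = c("hourly", "monthly"),  # made-up labels
                  check.names = FALSE)

via_backticks <- toy$`Pri M`              # backticks around a non-syntactic name
via_string    <- toy[["Pri M"]]           # string indexing, convenient inside functions
safe_names    <- make.names(names(toy))   # syntactic versions base R would produce

identical(via_backticks, via_string)  # TRUE
safe_names                            # "Pri.M" "Unit.wage"
```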

Data collection

The data were collected by the World Bank. The research was published online by Emerald Insight in the International Journal of Manpower. The authors built the database from existing national household surveys, which were initially compiled by the World Bank's World Development Report unit over the period 2005 to 2011 and now fall under the World Bank's Global Working Poverty Group.

Type of study

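This research study is observational: the estimates come from existing household surveys, and no treatment is assigned by the researchers.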

Data Source

The data set is published in Montenegro, C.E. and Patrinos, H.A. (2021), "A data set of comparable estimates of the private rate of return to schooling in the world, 1970–2014", International Journal of Manpower, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/IJM-03-2021-0184

Dependent Variable

The dependent variable is the private return to an additional year of schooling. It is a numeric rate, so it is quantitative rather than qualitative.

Independent Variables

The quantitative variables are: (1) level of education, defined by the highest grade attended and completed; (2) returns to primary education; (3) returns to secondary education; (4) returns to tertiary education; (5) returns to primary education for males; (6) returns to secondary education for males; (7) returns to tertiary education for males; (8) returns to primary education for females; (9) returns to secondary education for females; (10) returns to tertiary education for females.

The qualitative variables are: (1) economy; (2) year; (3) survey; (4) unit wage.
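As a sketch, the quantitative/qualitative split can be checked programmatically by testing each column's storage type with `sapply()` and `is.numeric()`. A numeric column such as `Year` is flagged as quantitative by this test even when it is treated as a label in the analysis, so the result is a starting point rather than a verdict. The toy data frame below is hypothetical.

```r
# Hypothetical toy data with a mix of character and numeric columns (made-up values)
toy <- data.frame(
  Economy = c("A", "B"),
  Year    = c(2001, 2005),
  Survey  = c("S1", "S2"),
  Pri     = c(9.7, NA),
  Ter     = c(14.9, 18.8)
)

is_quant     <- sapply(toy, is.numeric)   # named logical vector, one entry per column
quantitative <- names(toy)[is_quant]
qualitative  <- names(toy)[!is_quant]

quantitative  # "Year" "Pri" "Ter"
qualitative   # "Economy" "Survey"
```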

Relevant summary statistics

The authors estimate an average private rate of return to schooling of 10%, which is close to the mean of the ß1 column (10.02) in the summary statistics below. This estimate should be useful for a variety of empirical work, including as information for young people deciding how much schooling to pursue.
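To make the 10% figure concrete, the sketch below assumes a Mincer-style log-linear reading (an assumption; this section does not state the estimating equation), in which each extra year of schooling raises log earnings by the return expressed as a fraction. The function name `earnings_multiplier` and its inputs are illustrative.

```r
# Earnings ratio implied by a log-linear return to schooling (Mincer-style reading assumed)
earnings_multiplier <- function(return_pct, extra_years) {
  exp(return_pct / 100 * extra_years)  # e.g. 10% per year -> exp(0.10 * years)
}

earnings_multiplier(10, 1)  # about 1.105
earnings_multiplier(10, 4)  # about 1.492
```

Reading 10% as simple annual compounding instead gives 1.10^4, about 1.464, for four extra years; the two readings diverge as the number of years grows.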

Summary statistics of all variables

summary(comp_edu)
##    Economy               Year         Survey           Unit wage        
##  Length:853         Min.   :1970   Length:853         Length:853        
##  Class :character   1st Qu.:2001   Class :character   Class :character  
##  Mode  :character   Median :2005   Mode  :character   Mode  :character  
##                     Mean   :2004                                        
##                     3rd Qu.:2009                                        
##                     Max.   :2014                                        
##                                                                         
##        ß1             ß1sd                 ß2              ß2sd          
##  Min.   : 0.20   Min.   :0.0000000   Min.   :-2.500   Min.   :0.0000000  
##  1st Qu.: 8.00   1st Qu.:0.0001000   1st Qu.: 2.100   1st Qu.:0.0000000  
##  Median : 9.90   Median :0.0001000   Median : 3.000   Median :0.0001000  
##  Mean   :10.02   Mean   :0.0006216   Mean   : 3.021   Mean   :0.0002681  
##  3rd Qu.:11.60   3rd Qu.:0.0002000   3rd Qu.: 3.800   3rd Qu.:0.0001000  
##  Max.   :26.30   Max.   :0.0193000   Max.   : 8.900   Max.   :0.0116000  
##                                                                          
##       ß1 M            ß1Msd                ß2 M            ß2Msd          
##  Min.   : 0.500   Min.   :0.0000000   Min.   :-2.500   Min.   :0.0000000  
##  1st Qu.: 7.800   1st Qu.:0.0001000   1st Qu.: 2.300   1st Qu.:0.0001000  
##  Median : 9.400   Median :0.0002000   Median : 3.200   Median :0.0001000  
##  Mean   : 9.601   Mean   :0.0007952   Mean   : 3.312   Mean   :0.0003619  
##  3rd Qu.:11.200   3rd Qu.:0.0003000   3rd Qu.: 4.200   3rd Qu.:0.0002000  
##  Max.   :24.500   Max.   :0.0272000   Max.   :11.600   Max.   :0.0148000  
##                                                                           
##       ß1 F           ß1Fsd                ß2 F            ß2Fsd          
##  Min.   : 1.80   Min.   :0.0000000   Min.   :-3.000   Min.   :0.0000000  
##  1st Qu.: 9.50   1st Qu.:0.0001000   1st Qu.: 1.800   1st Qu.:0.0001000  
##  Median :11.30   Median :0.0002000   Median : 2.400   Median :0.0001000  
##  Mean   :11.61   Mean   :0.0009892   Mean   : 2.619   Mean   :0.0004313  
##  3rd Qu.:13.40   3rd Qu.:0.0004000   3rd Qu.: 3.400   3rd Qu.:0.0002000  
##  Max.   :28.40   Max.   :0.0341000   Max.   :10.900   Max.   :0.0193000  
##                                                                          
##       Pri              Sec              Ter            Pri M       
##  Min.   :  0.00   Min.   : 0.000   Min.   : 0.20   Min.   :  0.00  
##  1st Qu.:  6.50   1st Qu.: 4.900   1st Qu.:10.80   1st Qu.:  5.90  
##  Median :  9.70   Median : 6.400   Median :14.90   Median :  9.00  
##  Mean   : 11.04   Mean   : 7.388   Mean   :15.06   Mean   : 10.41  
##  3rd Qu.: 13.45   3rd Qu.: 9.500   3rd Qu.:18.80   3rd Qu.: 12.60  
##  Max.   :135.50   Max.   :26.800   Max.   :43.20   Max.   :129.70  
##  NA's   :278      NA's   :200      NA's   :62      NA's   :284     
##      Sec M            Ter M           Pri F           Sec F       
##  Min.   : 0.000   Min.   : 0.10   Min.   : 0.40   Min.   : 0.000  
##  1st Qu.: 4.900   1st Qu.:10.70   1st Qu.: 5.80   1st Qu.: 5.525  
##  Median : 6.300   Median :14.95   Median : 9.80   Median : 7.600  
##  Mean   : 7.198   Mean   :15.09   Mean   :11.15   Mean   : 8.859  
##  3rd Qu.: 8.800   3rd Qu.:18.82   3rd Qu.:13.60   3rd Qu.:11.500  
##  Max.   :28.400   Max.   :42.70   Max.   :80.90   Max.   :45.600  
##  NA's   :206      NA's   :77      NA's   :324     NA's   :219     
##      Ter F      
##  Min.   : 0.40  
##  1st Qu.:12.45  
##  Median :16.20  
##  Mean   :16.69  
##  3rd Qu.:20.10  
##  Max.   :45.10  
##  NA's   :90
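For the second research question, a first descriptive step is to compare the mean overall return for males and females while skipping missing values. A minimal base-R sketch: the hypothetical helper `mean_return_by_gender()` takes the column names as arguments, defaulting to `ß1 M` and `ß1 F` as printed by `names(comp_edu)` (written with the `\u00df` escape so the script is encoding-safe), and the toy values are made up. In the summary above the means are 9.601 for `ß1 M` and 11.61 for `ß1 F`; a formal comparison would also need to account for the survey-level standard errors.

```r
# Mean return by gender, ignoring surveys with missing estimates
mean_return_by_gender <- function(df,
                                  male_col   = "\u00df1 M",   # "ß1 M"
                                  female_col = "\u00df1 F") { # "ß1 F"
  c(Male   = mean(df[[male_col]],   na.rm = TRUE),
    Female = mean(df[[female_col]], na.rm = TRUE))
}

# Hypothetical toy data (made-up values, not the real estimates)
toy <- setNames(data.frame(c(9, 10, NA), c(11, 12, 13)),
                c("\u00df1 M", "\u00df1 F"))

means <- mean_return_by_gender(toy)
means                    # Male 9.5, Female 12
names(which.max(means))  # "Female"
```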

Variable Data Dictionary

library(readr)
data_dict <- read_csv("https://raw.githubusercontent.com/candrewxs/Project_Proposal_D606/main/data/Comparable%20Returns%20to%20Education%20Data%20Dictionary.csv")
## New names:
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## Rows: 21 Columns: 6
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (2): Variable, Description
## lgl (4): ...3, ...4, ...5, ...6
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Keep the two named columns; ...3 to ...6 come from blank headers in the CSV
data_dict %>% select(Variable, Description)

Visualizations

library(ggplot2)

# Males' return to schooling across survey years, with a linear (OLS) trend line
year_male <- ggplot(comp_edu, aes(x = Year, y = `β1 M`))

year_male +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Males' Return to Schooling by Survey Year",
       x = "Survey year", y = "Males' return to schooling")
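The trend line drawn by `stat_smooth()` with the `lm` method is an ordinary least-squares fit, and its slope can be read off numerically with `lm()` and `coef()`. A sketch on hypothetical toy data, with the return as the response and survey year as the predictor; with the real data the formula would use the backtick-quoted return column in place of the made-up `ret`.

```r
# Hypothetical toy data: survey year and a return to schooling (made-up values)
toy <- data.frame(Year = c(2000, 2004, 2008, 2012),
                  ret  = c(10.0, 9.7, 9.1, 8.8))

fit   <- lm(ret ~ Year, data = toy)  # OLS fit of return on year
slope <- coef(fit)[["Year"]]         # change in return per survey year
slope                                # -0.105
```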

# Distribution of returns to primary education; na.rm = TRUE skips surveys with no Pri estimate
ggplot(comp_edu, aes(x = Pri)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black", na.rm = TRUE) +
  labs(title = "Distribution of Returns to Primary Education",
       x = "Return to primary education", y = "Number of surveys")
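The histogram omits 278 surveys because `Pri` is missing for them, matching the `NA's` count for `Pri` in the summary statistics. A base-R sketch for tabulating missing values per column and the share of rows that have all three education levels; the toy data are hypothetical.

```r
# Hypothetical toy data with some missing estimates (made-up values)
toy <- data.frame(Pri = c(9.7, NA, 6.5, NA),
                  Sec = c(6.4, 4.9, NA, 9.5),
                  Ter = c(14.9, 10.8, 18.8, 15.0))

na_counts      <- colSums(is.na(toy))                                # NA count per column
complete_share <- mean(complete.cases(toy[c("Pri", "Sec", "Ter")]))  # rows with all three levels

na_counts       # Pri 2, Sec 1, Ter 0
complete_share  # 0.25
```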

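# Returns to primary, secondary, and tertiary education side by side (NAs are ignored)
boxplot(comp_edu[, c("Pri", "Sec", "Ter")],
        xlab = "Education level", ylab = "Return to schooling")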
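`boxplot()` on a data frame draws one box per column. An equivalent sketch first reshapes the three education levels into long format with base R's `stack()` and then uses the formula interface, which makes it easier to add a grouping such as gender later. The toy data are hypothetical.

```r
# Hypothetical toy data (made-up values)
toy <- data.frame(Pri = c(9.7, NA, 6.5),
                  Sec = c(6.4, 4.9, 9.5),
                  Ter = c(14.9, 10.8, 18.8))

long <- stack(toy[c("Pri", "Sec", "Ter")])  # columns: values, ind (source column as a factor)
long <- long[!is.na(long$values), ]         # drop missing estimates
boxplot(values ~ ind, data = long,
        xlab = "Education level", ylab = "Return to schooling")
```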

Source: GitHub

RPubs