library(readr)
comp_edu <- read_csv("https://raw.githubusercontent.com/candrewxs/Project_Proposal_D606/main/data/Comparable%20Returns%20to%20Education%20sheet3.csv")
View(comp_edu)
Will the returns on education influence an individual’s capital investment?
Which gender has the highest returns on education?
The data set on private returns to education includes estimates for 142 economies from 1970 to 2014, drawn from 853 harmonized household surveys. There are 853 observations in the given data set.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
n_distinct(comp_edu$Economy)
## [1] 142
Take a look at the data set
library(reactable)
reactable(comp_edu)
Variable names in the data set
names(comp_edu)
## [1] "Economy" "Year" "Survey" "Unit wage" "β1" "β1sd"
## [7] "β2" "β2sd" "β1 M" "β1Msd" "β2 M" "β2Msd"
## [13] "β1 F" "β1Fsd" "β2 F" "β2Fsd" "Pri" "Sec"
## [19] "Ter" "Pri M" "Sec M" "Ter M" "Pri F" "Sec F"
## [25] "Ter F"
The data were collected by the World Bank. The research was published online on Emerald Insight in the International Journal of Manpower. The authors developed a database from existing national household surveys that was initially compiled by the World Bank’s World Development Report unit over the period 2005-2011 and is now under the World Bank’s Global Poverty Working Group.
This research study is observational.
The data are from Montenegro, C.E. and Patrinos, H.A. (2021), “A data set of comparable estimates of the private rate of return to schooling in the world, 1970–2014”, International Journal of Manpower, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/IJM-03-2021-0184
The dependent variable is the yearly return to schooling, which is quantitative.
The quantitative variables are: (1) level of education, which is defined by the highest grade attended and completed; (2) returns to primary education; (3) returns to secondary education; (4) returns to tertiary education; (5) returns to primary education for males; (6) returns to secondary education for males; (7) returns to tertiary education for males; (8) returns to primary education for females; (9) returns to secondary education for females; (10) returns to tertiary education for females.
The qualitative variables are: (1) economy (2) year (3) survey (4) unit wage
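One way to check this classification is to inspect the class R assigns to each column. The sketch below uses a small made-up data frame; `toy_edu` and its values are hypothetical and are not rows from the actual data set.

```r
# Toy stand-in for comp_edu; the values are invented for illustration only.
toy_edu <- data.frame(
  Economy = c("A", "B", "C"),
  Year    = c(2001, 2005, 2009),
  Survey  = c("LFS", "HBS", "LFS"),
  Pri     = c(9.7, NA, 13.4),
  stringsAsFactors = FALSE
)

# Class of each column: character columns are qualitative, numeric ones quantitative.
col_classes <- sapply(toy_edu, class)
print(col_classes)

# Year is stored as a number; convert it to a factor to treat it as a category.
toy_edu$Year <- factor(toy_edu$Year)
year_levels <- nlevels(toy_edu$Year)
print(year_levels)
```

Treating the year as a factor suits grouped summaries, while keeping it numeric suits trend lines, so the choice depends on the analysis that follows.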
The authors estimate an average private rate of return to schooling of 10%. This provides a reasonable estimate of the returns to education and should be useful for a variety of empirical work, including critical information for youth.
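As background, rates of return of this kind are commonly obtained from a Mincer-type earnings regression, in which log wages are regressed on years of schooling and potential experience, and the schooling coefficient is read as the return to one more year of school. The sketch below simulates such data and recovers the coefficient. Everything in it (the sample size, the coefficients, the variable names) is an assumption for illustration and is not taken from the authors' surveys or code.

```r
set.seed(606)  # arbitrary seed so the simulation is reproducible

n <- 5000
schooling  <- sample(0:18, n, replace = TRUE)  # simulated years of schooling
experience <- sample(0:40, n, replace = TRUE)  # simulated years of potential experience

# Simulated log wage with an assumed 10% return per year of schooling.
log_wage <- 1 + 0.10 * schooling + 0.03 * experience -
  0.0005 * experience^2 + rnorm(n, sd = 0.5)

# Mincer-type regression: log wage on schooling, experience, and experience squared.
fit <- lm(log_wage ~ schooling + experience + I(experience^2))

# Estimated return to one extra year of schooling, in percent.
return_pct <- 100 * coef(fit)[["schooling"]]
print(round(return_pct, 1))
```

Because the data are simulated with a 10% return built in, the estimate should land close to 10; with real survey data the coefficient is what the regression has to discover.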
Summary statistics of all variables
summary(comp_edu)
## Economy Year Survey Unit wage
## Length:853 Min. :1970 Length:853 Length:853
## Class :character 1st Qu.:2001 Class :character Class :character
## Mode :character Median :2005 Mode :character Mode :character
## Mean :2004
## 3rd Qu.:2009
## Max. :2014
##
## β1 β1sd β2 β2sd
## Min. : 0.20 Min. :0.0000000 Min. :-2.500 Min. :0.0000000
## 1st Qu.: 8.00 1st Qu.:0.0001000 1st Qu.: 2.100 1st Qu.:0.0000000
## Median : 9.90 Median :0.0001000 Median : 3.000 Median :0.0001000
## Mean :10.02 Mean :0.0006216 Mean : 3.021 Mean :0.0002681
## 3rd Qu.:11.60 3rd Qu.:0.0002000 3rd Qu.: 3.800 3rd Qu.:0.0001000
## Max. :26.30 Max. :0.0193000 Max. : 8.900 Max. :0.0116000
##
## β1 M β1Msd β2 M β2Msd
## Min. : 0.500 Min. :0.0000000 Min. :-2.500 Min. :0.0000000
## 1st Qu.: 7.800 1st Qu.:0.0001000 1st Qu.: 2.300 1st Qu.:0.0001000
## Median : 9.400 Median :0.0002000 Median : 3.200 Median :0.0001000
## Mean : 9.601 Mean :0.0007952 Mean : 3.312 Mean :0.0003619
## 3rd Qu.:11.200 3rd Qu.:0.0003000 3rd Qu.: 4.200 3rd Qu.:0.0002000
## Max. :24.500 Max. :0.0272000 Max. :11.600 Max. :0.0148000
##
## β1 F β1Fsd β2 F β2Fsd
## Min. : 1.80 Min. :0.0000000 Min. :-3.000 Min. :0.0000000
## 1st Qu.: 9.50 1st Qu.:0.0001000 1st Qu.: 1.800 1st Qu.:0.0001000
## Median :11.30 Median :0.0002000 Median : 2.400 Median :0.0001000
## Mean :11.61 Mean :0.0009892 Mean : 2.619 Mean :0.0004313
## 3rd Qu.:13.40 3rd Qu.:0.0004000 3rd Qu.: 3.400 3rd Qu.:0.0002000
## Max. :28.40 Max. :0.0341000 Max. :10.900 Max. :0.0193000
##
## Pri Sec Ter Pri M
## Min. : 0.00 Min. : 0.000 Min. : 0.20 Min. : 0.00
## 1st Qu.: 6.50 1st Qu.: 4.900 1st Qu.:10.80 1st Qu.: 5.90
## Median : 9.70 Median : 6.400 Median :14.90 Median : 9.00
## Mean : 11.04 Mean : 7.388 Mean :15.06 Mean : 10.41
## 3rd Qu.: 13.45 3rd Qu.: 9.500 3rd Qu.:18.80 3rd Qu.: 12.60
## Max. :135.50 Max. :26.800 Max. :43.20 Max. :129.70
## NA's :278 NA's :200 NA's :62 NA's :284
## Sec M Ter M Pri F Sec F
## Min. : 0.000 Min. : 0.10 Min. : 0.40 Min. : 0.000
## 1st Qu.: 4.900 1st Qu.:10.70 1st Qu.: 5.80 1st Qu.: 5.525
## Median : 6.300 Median :14.95 Median : 9.80 Median : 7.600
## Mean : 7.198 Mean :15.09 Mean :11.15 Mean : 8.859
## 3rd Qu.: 8.800 3rd Qu.:18.82 3rd Qu.:13.60 3rd Qu.:11.500
## Max. :28.400 Max. :42.70 Max. :80.90 Max. :45.600
## NA's :206 NA's :77 NA's :324 NA's :219
## Ter F
## Min. : 0.40
## 1st Qu.:12.45
## Median :16.20
## Mean :16.69
## 3rd Qu.:20.10
## Max. :45.10
## NA's :90
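The summary reports a higher mean overall return for females (11.61) than for males (9.601). A possible next step toward the second research question is a paired comparison, since each survey yields both a male and a female estimate. The sketch below uses invented numbers, and `b1_m` and `b1_f` are hypothetical names standing in for the male and female return columns.

```r
# Invented estimates for six surveys; not values from the data set.
toy_returns <- data.frame(
  b1_m = c(9.1, 8.4, 10.2, 7.9, 11.0, 9.6),
  b1_f = c(11.3, 9.0, 12.1, 9.5, 12.4, 10.8)
)

# Survey-level gap: female return minus male return.
gap <- toy_returns$b1_f - toy_returns$b1_m
mean_gap <- mean(gap)

# Paired t-test, because both estimates come from the same survey.
paired <- t.test(toy_returns$b1_f, toy_returns$b1_m, paired = TRUE)
print(mean_gap)
print(paired$p.value)
```

On the real data, repeated surveys from the same economy are unlikely to be independent, so a test like this is only a first look and not a final answer.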
Variables Data Dictionary
library(readr)
data_dict <- read_csv("https://raw.githubusercontent.com/candrewxs/Project_Proposal_D606/main/data/Comparable%20Returns%20to%20Education%20Data%20Dictionary.csv")
## New names:
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## Rows: 21 Columns: 6
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (2): Variable, Description
## lgl (4): ...3, ...4, ...5, ...6
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Drop the four unnamed columns (...3 to ...6), keeping Variable and Description
data_dict %>% select(-(3:6))
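The four unnamed columns were read as logical, which is how `read_csv` typically types a column that contains no values. Assuming they are entirely empty, they can also be dropped by content instead of by position. A base R sketch on a made-up dictionary follows; `toy_dict`, its entries, and the `extra_a` and `extra_b` column names are hypothetical.

```r
# Made-up dictionary with two real columns and two empty ones.
toy_dict <- data.frame(
  Variable    = c("Economy", "Year"),
  Description = c("Name of the economy", "Survey year"),
  extra_a     = c(NA, NA),
  extra_b     = c(NA, NA),
  stringsAsFactors = FALSE
)

# Keep only columns that contain at least one non-missing value.
has_values <- colSums(!is.na(toy_dict)) > 0
clean_dict <- toy_dict[, has_values, drop = FALSE]
print(names(clean_dict))
```

Selecting by content keeps working if the source file later gains or loses empty trailing columns, whereas a positional selection would need to be updated.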
Visualizations
library(ggplot2)
year_male <- ggplot(comp_edu, aes(x = `β1 M`, y = Year))
year_male +
  geom_point() +
  stat_smooth(method = "lm") +
  xlab("Males Return to School") +
  labs(title = "Economies Survey Graph of Year by Males Return to School")
## `geom_smooth()` using formula 'y ~ x'
ggplot(comp_edu, aes(x = Pri)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black") +
  labs(title = "Observations with Primary Education return to Schooling")
## Warning: Removed 278 rows containing non-finite values (stat_bin).
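The warning matches the 278 missing values reported for `Pri` in the summary: rows whose x value is missing are dropped before binning. Counting the missing values first, or filtering them out explicitly, makes that step visible. A sketch with an invented vector follows; `toy_pri` is hypothetical.

```r
# Invented returns to primary education, with two missing entries.
toy_pri <- c(9.7, NA, 13.4, 6.5, NA, 11.0)

# Number of values a histogram would drop.
n_missing <- sum(is.na(toy_pri))

# Keep only the observed values before plotting or summarising.
observed <- toy_pri[!is.na(toy_pri)]
print(n_missing)
print(length(observed))
```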
# Columns 17 to 19 are Pri, Sec and Ter; selecting by name is easier to read
boxplot(comp_edu[, c("Pri", "Sec", "Ter")])
Source: GitHub