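library(DT)
# Load the FiveThirtyEight "recent grads" data directly from GitHub
RecentGrads = read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/recent-grads.csv", stringsAsFactors = FALSE)
# Share of the counted jobs (college, non-college, low-wage) that require a college degree
RecentGrads$CollegeDegreePct = RecentGrads$College_jobs/(RecentGrads$College_jobs + RecentGrads$Non_college_jobs + RecentGrads$Low_wage_jobs)
# Interactive table of the full data set
datatable(RecentGrads)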
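The `CollegeDegreePct` line above divides by the sum of three job counts directly. If those counts sum to zero for some major, R returns `NaN`, which `summary()` counts under `NA's`; a missing count produces `NA` the same way. A minimal sketch of a guarded version, assuming an explicit `NA` is the desired result in that case; `college_degree_pct` is a hypothetical helper name and the counts below are made up.

```r
# Hypothetical helper: share of counted jobs that require a college degree.
# Returns NA (instead of NaN) when the three counts sum to zero.
college_degree_pct <- function(college, non_college, low_wage) {
  total <- college + non_college + low_wage
  ifelse(total > 0, college / total, NA_real_)
}

# Made-up counts for three majors, for illustration only
college_degree_pct(c(100, 0, 30), c(50, 0, 10), c(50, 0, 10))
# returns 0.5, NA, 0.6
```

Applied to the real data this would read `RecentGrads$CollegeDegreePct = college_degree_pct(RecentGrads$College_jobs, RecentGrads$Non_college_jobs, RecentGrads$Low_wage_jobs)`, which matches the original ratio wherever the denominator is positive.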
Is there a relationship between median full-time earnings and major category?
The cases are individual college majors. There are 173 of them.
The data come from the American Community Survey (ACS), an ongoing survey run by the US Census Bureau. Specifically, they are drawn from the ACS Public Use Microdata Sample (PUMS), which provides individual-level records for a sample of people and households across the United States; the per-major counts and statistics in this file are aggregates computed from those records.
This is an observational study: graduates chose their own majors, and no treatment was assigned by the researcher.
The data is collected by the US Census Bureau and can be found here:
https://www.census.gov/programs-surveys/acs/data/pums.html
For this project, I read the data directly from the copy FiveThirtyEight posted on GitHub, available here:
https://github.com/fivethirtyeight/data/blob/master/college-majors/recent-grads.csv
Documentation for PUMS can be found here:
https://www.census.gov/programs-surveys/acs/technical-documentation/pums.html
The response variable is median full-time earnings. This variable is numeric.
The explanatory variable is major category. This variable is categorical.
Further analysis may also consider other explanatory variables, including the proportion of women in the major, the unemployment rate, and the proportion of graduates in jobs requiring a college degree.
summary(RecentGrads$Median)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   22000   33000   36000   40151   45000  110000
hist(RecentGrads$Median, xlab = "Median Annual Income", main = "Distribution of Median Annual Income")
table(RecentGrads$Major_category)
##
##     Agriculture & Natural Resources                                Arts
##                                  10                                   8
##              Biology & Life Science                            Business
##                                  14                                  13
##         Communications & Journalism             Computers & Mathematics
##                                   4                                  11
##                           Education                         Engineering
##                                  16                                  29
##                              Health           Humanities & Liberal Arts
##                                  12                                  15
##  Industrial Arts & Consumer Services                   Interdisciplinary
##                                   7                                   1
##                 Law & Public Policy                   Physical Sciences
##                                   5                                  10
##            Psychology & Social Work                      Social Science
##                                   9                                   9
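To address the research question directly, earnings can be compared across the 16 major categories in the table above. The proposal does not name a test, so the sketch below assumes a one-way ANOVA as a first pass, plus a rank-based Kruskal-Wallis test as a check, since the `Median` summary shows a mean (40151) above the median (36000) and a long right tail up to 110000. `compare_categories` is a hypothetical helper and the three generic categories and their earnings are made up.

```r
# Hypothetical helper: per-category medians, a one-way ANOVA p-value,
# and a Kruskal-Wallis p-value for Median ~ Major_category.
compare_categories <- function(df) {
  df$Major_category <- factor(df$Major_category)
  group_medians <- tapply(df$Median, df$Major_category, median)
  fit <- aov(Median ~ Major_category, data = df)
  p_value <- summary(fit)[[1]][["Pr(>F)"]][1]   # p-value for the category term
  kw_p_value <- kruskal.test(Median ~ Major_category, data = df)$p.value
  list(group_medians = group_medians, p_value = p_value, kw_p_value = kw_p_value)
}

# Made-up earnings for three generic categories, for illustration only
toy <- data.frame(
  Major_category = rep(c("A", "B", "C"), each = 4),
  Median = c(30000, 32000, 29000, 31000,
             40000, 42000, 38000, 41000,
             57000, 60000, 55000, 62000)
)
compare_categories(toy)
```

On the real data the call would be `compare_categories(RecentGrads)`, and `boxplot(Median ~ Major_category, data = RecentGrads, las = 2)` gives the matching picture. Note that Interdisciplinary contains a single major, so that category contributes no within-group variation.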
summary(RecentGrads$ShareWomen)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##  0.0000  0.3360  0.5340  0.5222  0.7033  0.9690       1
plot(RecentGrads$ShareWomen, RecentGrads$Median, xlab = "Proportion of Women",
     ylab = "Median Annual Income", main = "Proportion of Women vs. Median Annual Income")
# Overlay the least-squares regression line
abline(lm(Median ~ ShareWomen, data = RecentGrads), col = "red")
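The red line shows the trend visually; its slope and the strength of the association can also be reported as numbers. One detail matters here: `ShareWomen` has one missing value, which `lm()` drops by default but `cor()` does not (it returns `NA` unless incomplete pairs are removed). A minimal sketch, assuming Pearson correlation is the summary wanted; `trend_stats` is a hypothetical helper and the five values are made up.

```r
# Hypothetical helper: slope of the least-squares line, Pearson correlation,
# and R-squared for one predictor, using only complete (x, y) pairs.
trend_stats <- function(x, y) {
  ok <- complete.cases(x, y)   # drop pairs with a missing value
  fit <- lm(y[ok] ~ x[ok])     # same kind of fit that abline() draws
  c(slope     = unname(coef(fit)[2]),
    r         = cor(x[ok], y[ok]),
    r_squared = summary(fit)$r.squared)
}

# Made-up values for five majors, one with a missing share of women
share  <- c(0.2, 0.4, NA, 0.6, 0.8)
income <- c(50000, 44000, 40000, 41000, 35000)
trend_stats(share, income)
```

With the real data the call would be `trend_stats(RecentGrads$ShareWomen, RecentGrads$Median)`, and the same helper applies unchanged to `Unemployment_rate` and `CollegeDegreePct`.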
summary(RecentGrads$Unemployment_rate)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.00000 0.05031 0.06796 0.06819 0.08756 0.17723
plot(RecentGrads$Unemployment_rate, RecentGrads$Median, xlab = "Unemployment Rate",
     ylab = "Median Annual Income", main = "Unemployment Rate vs. Median Annual Income")
# Overlay the least-squares regression line
abline(lm(Median ~ Unemployment_rate, data = RecentGrads), col = "red")
summary(RecentGrads$CollegeDegreePct)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
## 0.05068 0.30403 0.41001 0.45609 0.62991 0.84764       1
plot(RecentGrads$CollegeDegreePct, RecentGrads$Median, xlab = "College Degree Percentage",
     ylab = "Median Annual Income", main = "College Degree Percentage vs. Median Annual Income")
# Overlay the least-squares regression line
abline(lm(Median ~ CollegeDegreePct, data = RecentGrads), col = "red")
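Each scatterplot above looks at one variable at a time. Since further analysis may bring the proportion of women, the unemployment rate, and the college-degree share in alongside major category, one model containing all of them is a natural next step. A minimal sketch, assuming ordinary least squares via `lm()` is acceptable; `fit_earnings_model` is a hypothetical helper, and the data below are simulated, not taken from the survey.

```r
# Hypothetical helper: one OLS model with major category plus the three
# numeric candidates. lm() drops any row with a missing value
# (its default na.action is na.omit).
fit_earnings_model <- function(df) {
  lm(Median ~ Major_category + ShareWomen + Unemployment_rate +
       CollegeDegreePct, data = df)
}

# Simulated data for 30 made-up majors in three generic categories;
# NOT the survey data, just enough rows to show the model runs.
set.seed(1)
n <- 30
toy <- data.frame(
  Major_category    = rep(c("A", "B", "C"), length.out = n),
  ShareWomen        = runif(n),
  Unemployment_rate = runif(n, 0, 0.15),
  CollegeDegreePct  = runif(n, 0.05, 0.85)
)
toy$Median <- 30000 - 10000 * toy$ShareWomen +
  20000 * toy$CollegeDegreePct + rnorm(n, sd = 2000)
toy$ShareWomen[5] <- NA  # one missing value, like ShareWomen in the real file

fit <- fit_earnings_model(toy)
summary(fit)
```

On the real data, `fit_earnings_model(RecentGrads)` would estimate each category as an offset from the first level alphabetically (Agriculture & Natural Resources). Because this is an observational study, the coefficients describe associations, not causal effects.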