All done behind the scenes. :)
# Load Libraries
library("DT")
library("knitr")
library("dplyr")
library("tidyr")
library("stringr")
library("psych")
# Load data
# All Students
url1 <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/all-ages.csv"
all.students <- read.csv(url1, sep = ",", header = TRUE, stringsAsFactors = FALSE)
# as_tibble() replaces the deprecated tbl_df()
all.students <- all.students %>%
  as_tibble() %>%
  arrange(Major_category)
head(all.students, 2)
## # A tibble: 2 x 11
## Major_code Major Majo~ Total Empl~ Empl~ Unem~ Unemp~ Medi~ P25th P75th
## <int> <chr> <chr> <int> <int> <int> <int> <dbl> <int> <int> <dbl>
## 1 1100 GENE~ Agri~ 128148 90245 74078 2423 0.0261 50000 34000 80000
## 2 1101 AGRI~ Agri~ 95326 76865 64240 2266 0.0286 54000 36000 80000
# Grad Students
url2 <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/grad-students.csv"
grad.students <- read.csv(url2, sep = ",", header = TRUE, stringsAsFactors = FALSE)
# as_tibble() replaces the deprecated tbl_df()
grad.students <- grad.students %>%
  as_tibble() %>%
  arrange(Major_category)
head(grad.students, 2)
## # A tibble: 2 x 22
## Major~ Major Major_~ Grad_~ Grad~ Grad~ Grad~ Grad~ Grad_~ Grad~ Grad~
## <int> <chr> <chr> <int> <int> <int> <int> <int> <dbl> <dbl> <int>
## 1 1101 AGRICU~ Agricu~ 17488 386 13104 11207 473 0.0348 67000 41600
## 2 1100 GENERA~ Agricu~ 44306 764 28930 23024 874 0.0293 68000 45000
## # ... with 11 more variables: Grad_P75 <dbl>, Nongrad_total <int>,
## # Nongrad_employed <int>, Nongrad_full_time_year_round <int>,
## # Nongrad_unemployed <int>, Nongrad_unemployment_rate <dbl>,
## # Nongrad_median <dbl>, Nongrad_P25 <int>, Nongrad_P75 <dbl>, Grad_share
## # <dbl>, Grad_premium <dbl>
# Undergrad Students (recent graduates)
url3 <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/recent-grads.csv"
ungrad.students <- read.csv(url3, sep = ",", header = TRUE, stringsAsFactors = FALSE)
# as_tibble() replaces the deprecated tbl_df()
ungrad.students <- ungrad.students %>%
  as_tibble() %>%
  arrange(Major_category)
head(ungrad.students, 2)
## # A tibble: 2 x 21
## Rank Majo~ Major Total Men Women Majo~ Share~ Samp~ Empl~ Full~ Part~
## <int> <int> <chr> <int> <int> <int> <chr> <dbl> <int> <int> <int> <int>
## 1 22 1104 FOOD~ NA NA NA Agri~ NA 36 3149 2558 1121
## 2 64 1101 AGRI~ 14240 9658 4582 Agri~ 0.322 273 12323 11119 2196
## # ... with 9 more variables: Full_time_year_round <int>, Unemployed <int>,
## # Unemployment_rate <dbl>, Median <int>, P25th <int>, P75th <int>,
## # College_jobs <int>, Non_college_jobs <int>, Low_wage_jobs <int>
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
College Majors The Economic Guide to picking a College Major
I come from India. There is a joke in my country which goes like this: “Indian parents give their kids FULL freedom to select a career of their choice, as long as it is engineer, doctor or lawyer!”
I have always been curious about how the choice of a college major influences a person’s success. I want to examine which fields are most strongly associated with financial success by performing hypothesis tests after analyzing employability and median incomes.
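One way the hypothesis test mentioned above could be set up is a one-way ANOVA of median income across major categories. The sketch below is only an assumption about the eventual analysis, not the method chosen in this proposal. The data frame `toy` and its values are made up for illustration; only the column names `Major_category` and `Median` come from the loaded datasets.

```r
# Toy data standing in for all.students. The values are invented for
# illustration and are NOT taken from the FiveThirtyEight files.
toy <- data.frame(
  Major_category = rep(c("Engineering", "Education", "Arts"), each = 4),
  Median = c(80000, 75000, 90000, 85000,
             42000, 40000, 45000, 43000,
             38000, 41000, 36000, 40000)
)

# H0: the mean of the per-major median incomes is the same in every category.
# HA: at least one category has a different mean.
fit <- aov(Median ~ Major_category, data = toy)

# summary() on an aov fit returns a list whose first element is the ANOVA table
anova_table <- summary(fit)[[1]]
p_value <- anova_table[["Pr(>F)"]][1]

# Reject H0 at the 5% level when the p-value is below 0.05
reject_h0 <- p_value < 0.05
print(anova_table)
```

With the real data, `toy` would be replaced by `all.students`. Because the study is observational, a significant result would indicate an association between major category and income, not a causal effect.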
What are the cases, and how many are there?
All_ages: Each case is one of the 173 majors offered by colleges in the USA, covering both undergraduate and graduate degree holders.
Grad Students: This data is a subset of the above. Each case is one of the 173 majors, restricted to graduate degree holders aged 25 and over.
Undergrad Students: This data is a subset of the above. Each case is one of the 173 majors, restricted to recent undergraduate degree holders under 28 years of age.
Describe the method of data collection.
The data was obtained from the FiveThirtyEight data repository: https://github.com/fivethirtyeight/data/tree/master/college-majors
The repository contains the data and code behind the story “The Economic Guide To Picking A College Major.” All data is from the American Community Survey 2010-2012 Public Use Microdata Series. Download data here: http://www.census.gov/programs-surveys/acs/data/pums.html Documentation here: http://www.census.gov/programs-surveys/acs/technical-documentation/pums.html
majors-list.csv: List of majors with their FOD1P codes and major categories. Major categories are from Carnevale et al, “What’s It Worth?: The Economic Value of College Majors.” Georgetown University Center on Education and the Workforce, 2011. http://cew.georgetown.edu/whatsitworth
Three main data files: all-ages.csv, recent-grads.csv (ages <28), and grad-students.csv (ages 25+).
All contain basic earnings and labor force information. recent-grads.csv contains a more detailed breakdown, including by sex and by the type of job they got. grad-students.csv contains details on graduate school attendees. Additionally, women-stem.csv contains data for scatter plot in associated DataLab post on women in science/technology jobs. It is a subset of recent-grads.csv.
What type of study is this (observational/experiment)?
This is an observational study.
If you collected the data, state self-collected. If not, provide a citation/link.
The data was obtained from the FiveThirtyEight data repository: https://github.com/fivethirtyeight/data/tree/master/college-majors
What is the response variable, and what type is it (numerical/categorical)?
The response variable is the college major, and it is categorical.
What is the explanatory variable, and what type is it (numerical/categorical)?
The explanatory variables are median income and the numbers of employed and unemployed degree holders; all of them are numerical.
Provide summary statistics relevant to your research question. For example, if you are comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
summary(all.students)
## Major_code Major Major_category Total
## Min. :1100 Length:173 Length:173 Min. : 2396
## 1st Qu.:2403 Class :character Class :character 1st Qu.: 24280
## Median :3608 Mode :character Mode :character Median : 75791
## Mean :3880 Mean : 230257
## 3rd Qu.:5503 3rd Qu.: 205763
## Max. :6403 Max. :3123510
## Employed Employed_full_time_year_round Unemployed
## Min. : 1492 Min. : 1093 Min. : 0
## 1st Qu.: 17281 1st Qu.: 12722 1st Qu.: 1101
## Median : 56564 Median : 39613 Median : 3619
## Mean : 166162 Mean : 126308 Mean : 9725
## 3rd Qu.: 142879 3rd Qu.: 111025 3rd Qu.: 8862
## Max. :2354398 Max. :1939384 Max. :147261
## Unemployment_rate Median P25th P75th
## Min. :0.00000 Min. : 35000 Min. :24900 Min. : 45800
## 1st Qu.:0.04626 1st Qu.: 46000 1st Qu.:32000 1st Qu.: 70000
## Median :0.05472 Median : 53000 Median :36000 Median : 80000
## Mean :0.05736 Mean : 56816 Mean :38697 Mean : 82506
## 3rd Qu.:0.06904 3rd Qu.: 65000 3rd Qu.:42000 3rd Qu.: 95000
## Max. :0.15615 Max. :125000 Max. :78000 Max. :210000
summary(grad.students)
## Major_code Major Major_category Grad_total
## Min. :1100 Length:173 Length:173 Min. : 1542
## 1st Qu.:2403 Class :character Class :character 1st Qu.: 15284
## Median :3608 Mode :character Mode :character Median : 37872
## Mean :3880 Mean : 127672
## 3rd Qu.:5503 3rd Qu.: 148255
## Max. :6403 Max. :1184158
## Grad_sample_size Grad_employed Grad_full_time_year_round
## Min. : 22 Min. : 1008 Min. : 770
## 1st Qu.: 314 1st Qu.: 12659 1st Qu.: 9894
## Median : 688 Median : 28930 Median : 22523
## Mean : 2251 Mean : 94037 Mean : 72861
## 3rd Qu.: 2528 3rd Qu.:109944 3rd Qu.: 80794
## Max. :21994 Max. :915341 Max. :703347
## Grad_unemployed Grad_unemployment_rate Grad_median Grad_P25
## Min. : 0 Min. :0.00000 Min. : 47000 Min. :24500
## 1st Qu.: 453 1st Qu.:0.02607 1st Qu.: 65000 1st Qu.:45000
## Median : 1179 Median :0.03665 Median : 75000 Median :50000
## Mean : 3506 Mean :0.03934 Mean : 76756 Mean :52597
## 3rd Qu.: 3329 3rd Qu.:0.04805 3rd Qu.: 90000 3rd Qu.:60000
## Max. :35718 Max. :0.13851 Max. :135000 Max. :85000
## Grad_P75 Nongrad_total Nongrad_employed
## Min. : 65000 Min. : 2232 Min. : 1328
## 1st Qu.: 93000 1st Qu.: 20564 1st Qu.: 15914
## Median :108000 Median : 68993 Median : 50092
## Mean :112087 Mean : 214720 Mean : 154554
## 3rd Qu.:130000 3rd Qu.: 184971 3rd Qu.: 129179
## Max. :294000 Max. :2996892 Max. :2253649
## Nongrad_full_time_year_round Nongrad_unemployed Nongrad_unemployment_rate
## Min. : 980 Min. : 0 Min. :0.00000
## 1st Qu.: 11755 1st Qu.: 880 1st Qu.:0.04198
## Median : 38384 Median : 3157 Median :0.05103
## Mean : 120737 Mean : 8486 Mean :0.05395
## 3rd Qu.: 103629 3rd Qu.: 7409 3rd Qu.:0.06439
## Max. :1882507 Max. :136978 Max. :0.16091
## Nongrad_median Nongrad_P25 Nongrad_P75 Grad_share
## Min. : 37000 Min. :25000 Min. : 48000 Min. :0.09632
## 1st Qu.: 48700 1st Qu.:34000 1st Qu.: 72000 1st Qu.:0.26757
## Median : 55000 Median :38000 Median : 80000 Median :0.39875
## Mean : 58584 Mean :40078 Mean : 84333 Mean :0.40059
## 3rd Qu.: 65000 3rd Qu.:44000 3rd Qu.: 97000 3rd Qu.:0.49912
## Max. :126000 Max. :80000 Max. :215000 Max. :0.93117
## Grad_premium
## Min. :-0.0250
## 1st Qu.: 0.2308
## Median : 0.3208
## Mean : 0.3285
## 3rd Qu.: 0.4000
## Max. : 1.6471
summary(ungrad.students)
## Rank Major_code Major Total
## Min. : 1 Min. :1100 Length:173 Min. : 124
## 1st Qu.: 44 1st Qu.:2403 Class :character 1st Qu.: 4550
## Median : 87 Median :3608 Mode :character Median : 15104
## Mean : 87 Mean :3880 Mean : 39370
## 3rd Qu.:130 3rd Qu.:5503 3rd Qu.: 38910
## Max. :173 Max. :6403 Max. :393735
## NA's :1
## Men Women Major_category ShareWomen
## Min. : 119 Min. : 0 Length:173 Min. :0.0000
## 1st Qu.: 2178 1st Qu.: 1778 Class :character 1st Qu.:0.3360
## Median : 5434 Median : 8386 Mode :character Median :0.5340
## Mean : 16723 Mean : 22647 Mean :0.5222
## 3rd Qu.: 14631 3rd Qu.: 22554 3rd Qu.:0.7033
## Max. :173809 Max. :307087 Max. :0.9690
## NA's :1 NA's :1 NA's :1
## Sample_size Employed Full_time Part_time
## Min. : 2.0 Min. : 0 Min. : 111 Min. : 0
## 1st Qu.: 39.0 1st Qu.: 3608 1st Qu.: 3154 1st Qu.: 1030
## Median : 130.0 Median : 11797 Median : 10048 Median : 3299
## Mean : 356.1 Mean : 31193 Mean : 26029 Mean : 8832
## 3rd Qu.: 338.0 3rd Qu.: 31433 3rd Qu.: 25147 3rd Qu.: 9948
## Max. :4212.0 Max. :307933 Max. :251540 Max. :115172
##
## Full_time_year_round Unemployed Unemployment_rate Median
## Min. : 111 Min. : 0 Min. :0.00000 Min. : 22000
## 1st Qu.: 2453 1st Qu.: 304 1st Qu.:0.05031 1st Qu.: 33000
## Median : 7413 Median : 893 Median :0.06796 Median : 36000
## Mean : 19694 Mean : 2416 Mean :0.06819 Mean : 40151
## 3rd Qu.: 16891 3rd Qu.: 2393 3rd Qu.:0.08756 3rd Qu.: 45000
## Max. :199897 Max. :28169 Max. :0.17723 Max. :110000
##
## P25th P75th College_jobs Non_college_jobs
## Min. :18500 Min. : 22000 Min. : 0 Min. : 0
## 1st Qu.:24000 1st Qu.: 42000 1st Qu.: 1675 1st Qu.: 1591
## Median :27000 Median : 47000 Median : 4390 Median : 4595
## Mean :29501 Mean : 51494 Mean : 12323 Mean : 13284
## 3rd Qu.:33000 3rd Qu.: 60000 3rd Qu.: 14444 3rd Qu.: 11783
## Max. :95000 Max. :125000 Max. :151643 Max. :148395
##
## Low_wage_jobs
## Min. : 0
## 1st Qu.: 340
## Median : 1231
## Mean : 3859
## 3rd Qu.: 3466
## Max. :48207
##
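The prompt asks for means, SDs, and sample sizes of each group when comparing means across groups. A minimal sketch of such a per-group summary is given below, assuming the groups are the major categories. The `toy` data frame and its values are invented for illustration; `Major_category` and `Median` are the real column names, while `n_majors`, `mean_median`, and `sd_median` are hypothetical names introduced here.

```r
library(dplyr)

# Toy data standing in for all.students. The values are invented for
# illustration and are NOT taken from the FiveThirtyEight files.
toy <- data.frame(
  Major_category = rep(c("Engineering", "Education", "Arts"), each = 4),
  Median = c(80000, 75000, 90000, 85000,
             42000, 40000, 45000, 43000,
             38000, 41000, 36000, 40000)
)

# One row per major category: number of majors, mean and SD of median income
group_stats <- toy %>%
  group_by(Major_category) %>%
  summarise(
    n_majors = n(),
    mean_median = mean(Median),
    sd_median = sd(Median)
  )

print(group_stats)
```

Replacing `toy` with `all.students` would give the same table for the real major categories.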
hist(all.students$Median, main = "All Student Median Income", xlab = "Median Incomes (USD)", col = "blue")
# One column per dataset; dividing by the row count makes each stacked bar sum to the mean rate
combine.unemployment <- cbind(all.students$Unemployment_rate, ungrad.students$Unemployment_rate, grad.students$Grad_unemployment_rate)
barplot(combine.unemployment/nrow(combine.unemployment), names.arg = c("All", "Recent Grad", "Grad Student"), ylab = "Mean Unemployment Rate", col = heat.colors(nrow(combine.unemployment)))
The graphs and summary statistics above already give a flavor of the data: graduate degree holders have a clearly higher median income than recent undergraduate degree holders (the mean of the per-major medians is 76,756 USD versus 40,151 USD).
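The income comparison between graduate degree holders and recent graduates could also be made major by major, by joining the two datasets on `Major_code`. The following is a rough sketch under that assumption. The data frames `toy_grad` and `toy_recent` hold made-up values, and `income_gap` is a hypothetical column name. `Grad_median` and `Median` are the real columns in `grad.students` and `ungrad.students`.

```r
library(dplyr)

# Toy stand-ins for grad.students and ungrad.students. The codes and
# incomes are invented for illustration, NOT real values from the data.
toy_grad <- data.frame(
  Major_code = c(1, 2, 3),
  Grad_median = c(70000, 90000, 60000)
)
toy_recent <- data.frame(
  Major_code = c(1, 2, 3),
  Median = c(40000, 55000, 30000)
)

# Match each major across the two datasets and compute the income gap
paired <- inner_join(toy_grad, toy_recent, by = "Major_code") %>%
  mutate(income_gap = Grad_median - Median)

# Average gap across majors
mean_gap <- mean(paired$income_gap)
print(paired)
```

Joining on `Major_code` rather than relying on row order avoids mismatches, since sorting by `Major_category` alone does not guarantee the same order of majors within a category in both files.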