This story interests me because my niece is about to transfer from a two-year college to a four-year college. I’d like to send her the story and run some analysis of my own using the data published by fivethirtyeight.com: https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/
Data is found here: https://github.com/fivethirtyeight/data/tree/master/college-majors
Three main data files:
- all-ages.csv
- recent-grads.csv (ages <28)
- grad-students.csv (ages 25+)

All contain basic earnings and labor force information. recent-grads.csv contains a more detailed breakdown, including by sex and by the type of job they got. grad-students.csv contains details on graduate school attendees.
# Load packages: tidyverse attaches dplyr, tidyr, and readr (which provides read_csv)
library(tidyverse)
# Load data
majors <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/all-ages.csv")
majors
#women_stem <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/women-stem.csv")
#head(women_stem)
recentgrads <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/recent-grads.csv")
recentgrads
Which college majors are best for a woman who wants low unemployment and good pay? Which majors should a woman avoid if she wants good pay?
(My niece will be transferring from a two-year college to a four-year college next year, so I find this interesting to share with her.)
What are the cases, and how many are there?
Each case is a college major, summarizing recent graduates under the age of 28 who hold that major. There are 173 cases (majors) in the data set.
Describe the method of data collection.
All data come from the American Community Survey 2010-2012 Public Use Microdata Series. A description taken from the PDF that accompanies the data: “The American Community Survey (ACS) Public Use Microdata Sample (PUMS) files are a set of untabulated records about individual people or housing units. The Census Bureau produces the PUMS files so that data users can create custom tables that are not available through pretabulated (or summary) ACS data products.”
What type of study is this (observational/experiment)?
This is an observational study of college majors and income. The files contain basic earnings and labor force information, with breakdowns by sex and by the type of job graduates got.
If you collected the data, state self-collected. If not, provide a citation/link.
I obtained the data from: https://github.com/fivethirtyeight/data/tree/master/college-majors
What is the response (dependent) variable, that is, the variable being measured or tested? Is it quantitative or qualitative?
The response variables are the major code, the major, and the major category. They are qualitative.
You should have two independent variables (variables that are changed or controlled in an experiment), one quantitative and one qualitative.
The quantitative independent variables are Total, Sample_size, ShareWomen, Employed, Full_time, Part_time, Full_time_year_round, Unemployed, Unemployment_rate, Median, P25th, P75th, College_jobs, Non_college_jobs, and Low_wage_jobs.
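Some of these variables appear to be related by construction. As a rough check, the sketch below assumes that ShareWomen is simply Women divided by Total (an assumption inferred from the column names, not something the data description states) and recomputes it for the six top-paying majors whose rows are printed in this document. The names `check` and `share_recomputed` are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Values copied from the six top-paying majors printed in this document
check <- tibble(
  Major_code = c(2419, 2416, 2415, 2417, 2405, 2418),
  Total      = c(2339, 756, 856, 1258, 32260, 2573),
  Women      = c(282, 77, 131, 135, 11021, 373),
  ShareWomen = c(0.121, 0.102, 0.153, 0.107, 0.342, 0.145)
) %>%
  # Assumption: ShareWomen = Women / Total, rounded to match the printed precision
  mutate(share_recomputed = round(Women / Total, 3))

check
```

If the assumption holds, the `ShareWomen` and `share_recomputed` columns should agree row by row.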
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
summary(recentgrads)
##       Rank        Major_code      Major               Total
##  Min.   :  1   Min.   :1100   Length:173         Min.   :   124
##  1st Qu.: 44   1st Qu.:2403   Class :character   1st Qu.:  4550
##  Median : 87   Median :3608   Mode  :character   Median : 15104
##  Mean   : 87   Mean   :3880                      Mean   : 39370
##  3rd Qu.:130   3rd Qu.:5503                      3rd Qu.: 38910
##  Max.   :173   Max.   :6403                      Max.   :393735
##                                                  NA's   :1
##       Men             Women        Major_category       ShareWomen
##  Min.   :   119   Min.   :     0   Length:173         Min.   :0.0000
##  1st Qu.:  2178   1st Qu.:  1778   Class :character   1st Qu.:0.3360
##  Median :  5434   Median :  8386   Mode  :character   Median :0.5340
##  Mean   : 16723   Mean   : 22647                      Mean   :0.5222
##  3rd Qu.: 14631   3rd Qu.: 22554                      3rd Qu.:0.7033
##  Max.   :173809   Max.   :307087                      Max.   :0.9690
##  NA's   :1        NA's   :1                           NA's   :1
##   Sample_size        Employed        Full_time        Part_time
##  Min.   :   2.0   Min.   :     0   Min.   :   111   Min.   :     0
##  1st Qu.:  39.0   1st Qu.:  3608   1st Qu.:  3154   1st Qu.:  1030
##  Median : 130.0   Median : 11797   Median : 10048   Median :  3299
##  Mean   : 356.1   Mean   : 31193   Mean   : 26029   Mean   :  8832
##  3rd Qu.: 338.0   3rd Qu.: 31433   3rd Qu.: 25147   3rd Qu.:  9948
##  Max.   :4212.0   Max.   :307933   Max.   :251540   Max.   :115172
##
##  Full_time_year_round   Unemployed    Unemployment_rate      Median
##  Min.   :   111       Min.   :    0   Min.   :0.00000   Min.   : 22000
##  1st Qu.:  2453       1st Qu.:  304   1st Qu.:0.05031   1st Qu.: 33000
##  Median :  7413       Median :  893   Median :0.06796   Median : 36000
##  Mean   : 19694       Mean   : 2416   Mean   :0.06819   Mean   : 40151
##  3rd Qu.: 16891       3rd Qu.: 2393   3rd Qu.:0.08756   3rd Qu.: 45000
##  Max.   :199897       Max.   :28169   Max.   :0.17723   Max.   :110000
##
##      P25th           P75th         College_jobs    Non_college_jobs
##  Min.   :18500   Min.   : 22000   Min.   :     0   Min.   :     0
##  1st Qu.:24000   1st Qu.: 42000   1st Qu.:  1675   1st Qu.:  1591
##  Median :27000   Median : 47000   Median :  4390   Median :  4595
##  Mean   :29501   Mean   : 51494   Mean   : 12323   Mean   : 13284
##  3rd Qu.:33000   3rd Qu.: 60000   3rd Qu.: 14444   3rd Qu.: 11783
##  Max.   :95000   Max.   :125000   Max.   :151643   Max.   :148395
##
##  Low_wage_jobs
##  Min.   :    0
##  1st Qu.:  340
##  Median : 1231
##  Mean   : 3859
##  3rd Qu.: 3466
##  Max.   :48207
##
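The prompt above also asks for visualizations related to the research question. One possible starting point, offered as a minimal sketch rather than the analysis itself, is a scatter plot of ShareWomen against Median with one point per major. The helper `plot_pay_by_share()` and the `toy` data frame are hypothetical; the toy values are made up for illustration and are not taken from the data set.

```r
library(ggplot2)

# Hypothetical helper: one point per major, share of women vs. median earnings
plot_pay_by_share <- function(df) {
  ggplot(df, aes(x = ShareWomen, y = Median)) +
    geom_point(na.rm = TRUE) +  # silently skip majors with a missing ShareWomen
    labs(x = "Share of women", y = "Median earnings")
}

# Toy rows for illustration only (not real values from recent-grads.csv)
toy <- data.frame(
  ShareWomen = c(0.12, 0.34, 0.56, 0.80),
  Median     = c(70000, 65000, 40000, 30000)
)

p <- plot_pay_by_share(toy)
p
```

With the real data the call would be `plot_pay_by_share(recentgrads)`; the same pattern works for Unemployment_rate by changing the `y` aesthetic.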
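# Top majors by median pay. top_n() keeps ties, so 6 rows are returned here
# because two majors share the fifth-highest Median.
top5majorsbyPay <- top_n(recentgrads, 5, Median)
top5majorsbyPay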
## # A tibble: 6 x 21
##    Rank Major_code Major Total   Men Women Major_category ShareWomen Sample_size
##   <dbl>      <dbl> <chr> <dbl> <dbl> <dbl> <chr>               <dbl>       <dbl>
## 1     1       2419 PETR…  2339  2057   282 Engineering         0.121          36
## 2     2       2416 MINI…   756   679    77 Engineering         0.102           7
## 3     3       2415 META…   856   725   131 Engineering         0.153           3
## 4     4       2417 NAVA…  1258  1123   135 Engineering         0.107          16
## 5     5       2405 CHEM… 32260 21239 11021 Engineering         0.342         289
## 6     6       2418 NUCL…  2573  2200   373 Engineering         0.145          17
## # … with 12 more variables: Employed <dbl>, Full_time <dbl>, Part_time <dbl>,
## #   Full_time_year_round <dbl>, Unemployed <dbl>, Unemployment_rate <dbl>,
## #   Median <dbl>, P25th <dbl>, P75th <dbl>, College_jobs <dbl>,
## #   Non_college_jobs <dbl>, Low_wage_jobs <dbl>
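The output above has six rows even though five were requested, because top_n() keeps ties. In dplyr 1.0.0 and later, top_n() is superseded by slice_max(), which has a `with_ties` argument to control this. The sketch below uses a made-up `toy` tibble (the majors and medians are illustrative, not the real values) to show the difference.

```r
library(dplyr)

# Toy data (not the real values): two majors tie for the 5th-highest median
toy <- tibble(
  major  = c("A", "B", "C", "D", "E", "F", "G"),
  Median = c(110000, 75000, 73000, 70000, 65000, 65000, 60000)
)

# Default with_ties = TRUE behaves like top_n(): the tie at 5th place is kept
with_ties <- slice_max(toy, order_by = Median, n = 5)

# with_ties = FALSE returns exactly n rows, dropping the extra tied row
exactly_five <- slice_max(toy, order_by = Median, n = 5, with_ties = FALSE)

nrow(with_ties)
nrow(exactly_five)
```

Unlike top_n(), slice_max() also returns the rows sorted from highest to lowest, which is usually what a "top 5" table needs.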
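# Five majors with the highest unemployment rates. top_n() keeps the original
# row order, so the result is not sorted by Unemployment_rate.
top5majorswhighunempl <- top_n(recentgrads, 5, Unemployment_rate)
top5majorswhighunempl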
## # A tibble: 5 x 21
##    Rank Major_code Major Total   Men Women Major_category ShareWomen Sample_size
##   <dbl>      <dbl> <chr> <dbl> <dbl> <dbl> <chr>               <dbl>       <dbl>
## 1     6       2418 NUCL…  2573  2200   373 Engineering         0.145          17
## 2    30       5402 PUBL…  5978  2639  3339 Law & Public …      0.559          55
## 3    85       2107 COMP…  7613  5291  2322 Computers & M…      0.305          97
## 4    90       5401 PUBL…  5629  2947  2682 Law & Public …      0.476          46
## 5   171       5202 CLIN…  2838   568  2270 Psychology & …      0.800          13
## # … with 12 more variables: Employed <dbl>, Full_time <dbl>, Part_time <dbl>,
## #   Full_time_year_round <dbl>, Unemployed <dbl>, Unemployment_rate <dbl>,
## #   Median <dbl>, P25th <dbl>, P75th <dbl>, College_jobs <dbl>,
## #   Non_college_jobs <dbl>, Low_wage_jobs <dbl>
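The two tables above look at pay and unemployment separately and for all graduates. To get closer to the research question, one option is to restrict to majors where women make up at least some share of graduates and then rank by low unemployment and high median pay. The sketch below is one way to do that; the helper `shortlist_majors()`, its `min_share` threshold of 0.5, and the `toy` rows are all hypothetical choices for illustration, not part of the data set or of the original analysis.

```r
library(dplyr)

# Hypothetical helper: majors where women are at least `min_share` of graduates,
# ranked by lowest unemployment rate, then by highest median pay.
shortlist_majors <- function(df, min_share = 0.5, n = 5) {
  df %>%
    filter(!is.na(ShareWomen), ShareWomen >= min_share) %>%
    arrange(Unemployment_rate, desc(Median)) %>%
    select(Major, ShareWomen, Unemployment_rate, Median) %>%
    head(n)
}

# Toy rows for illustration only (not real values from recent-grads.csv)
toy <- tibble(
  Major             = c("A", "B", "C", "D"),
  ShareWomen        = c(0.80, 0.60, 0.30, NA),
  Unemployment_rate = c(0.09, 0.04, 0.02, 0.05),
  Median            = c(30000, 45000, 70000, 40000)
)

shortlist <- shortlist_majors(toy)
shortlist
```

With the real data the call would be `shortlist_majors(recentgrads)`. Note that the data report earnings per major, not separately for women and men, so this only identifies majors with many women graduates; it does not measure what women in those majors earn.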