Data Preparation

This story interests me because my niece is about to transfer from a 2-year college to a 4-year college. I’d like to send her the story and run some analysis of my own using the data published by fivethirtyeight.com: https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/

Data is found here: https://github.com/fivethirtyeight/data/tree/master/college-majors

Three main data files:

- all-ages.csv
- recent-grads.csv (ages <28)
- grad-students.csv (ages 25+)

All contain basic earnings and labor force information. recent-grads.csv contains a more detailed breakdown, including by sex and by the type of job they got. grad-students.csv contains details on graduate school attendees.

# load packages: tidyverse attaches readr (read_csv), dplyr, and tidyr
library(tidyverse)

# all-ages data: one row per major
majors <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/all-ages.csv")
majors
# optional women-in-STEM data, not used below
#women_stem <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/women-stem.csv")
#head(women_stem)
# recent graduates (ages <28): includes the breakdown by sex and by type of job
recentgrads <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/recent-grads.csv")
recentgrads

Research question

Which college majors are best for a woman who wants low unemployment and good pay? Which majors should a woman avoid if she wants good pay?

(My niece will be transferring from a 2-year college to a 4-year college next year, so I find this interesting to share with her.)

Cases

What are the cases, and how many are there?

Each case is a college major, summarized over recent graduates under the age of 28. There are 173 cases (majors) in the recent-grads data set.
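The case count can be checked by comparing the number of rows with the number of distinct majors. This is a minimal sketch, assuming a tiny stand-in table `toy_grads` (hypothetical, with made-up values) so that it runs without downloading anything. With the real data, `recentgrads` would take its place.

```r
library(dplyr)

# Hypothetical stand-in for recentgrads: one row per major, values are made up
toy_grads <- tibble(
  Major_code = c(1001, 1002, 1003),
  Major      = c("MAJOR A", "MAJOR B", "MAJOR C"),
  Total      = c(2000, 15000, 800)
)

n_cases  <- nrow(toy_grads)              # number of rows, i.e. cases
n_majors <- n_distinct(toy_grads$Major)  # number of distinct majors

# TRUE means every case is exactly one major
n_cases == n_majors
```

If the two counts agree on the real data, the unit of observation is the major rather than the individual graduate.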

Data collection

Describe the method of data collection.

All data is from the American Community Survey 2010-2012 Public Use Microdata Series. A description of the data, taken from the PDF that comes with the data: “The American Community Survey (ACS) Public Use Microdata Sample (PUMS) files are a set of untabulated records about individual people or housing units. The Census Bureau produces the PUMS files so that data users can create custom tables that are not available through pretabulated (or summary) ACS data product”.

Type of study

What type of study is this (observational/experiment)?

This is an observational study of college majors and income. The files contain basic earnings and labor force information, including breakdowns by sex and by the type of job graduates got.

Data Source

If you collected the data, state self-collected. If not, provide a citation/link.

I obtained the data from: https://github.com/fivethirtyeight/data/tree/master/college-majors

Dependent Variable

What is the response (dependent) variable, that is, the variable being measured or tested in an experiment? Is it quantitative or qualitative?

The response variables are the major code, the major, and the major category. They are qualitative.

Independent Variable

You should have two independent variables (variables that are changed or controlled in an experiment), one quantitative and one qualitative.

The quantitative independent variables are Total, Sample_size, ShareWomen, Employed, Full_time, Part_time, Full_time_year_round, Unemployed, Unemployment_rate, Median, P25th, P75th, College_jobs, Non_college_jobs, and Low_wage_jobs.
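A quick way to sort columns into quantitative and qualitative candidates is to look at how each one is stored. This is a rough sketch on a hypothetical stand-in table `toy_grads` (made-up values, column names taken from the data set). Storage type is only a first pass: Major_code is stored as a number but works as a label, so it still counts as qualitative.

```r
# Hypothetical stand-in for recentgrads, values are made up
toy_grads <- data.frame(
  Major_code     = c(1001, 1002, 1003),
  Major          = c("MAJOR A", "MAJOR B", "MAJOR C"),
  Major_category = c("Category X", "Category X", "Category Y"),
  ShareWomen     = c(0.12, 0.56, 0.80),
  Median         = c(65000, 36000, 30000),
  stringsAsFactors = FALSE
)

# TRUE for columns stored as numbers, FALSE otherwise
is_num <- vapply(toy_grads, is.numeric, logical(1))

numeric_cols   <- names(toy_grads)[is_num]   # candidates for quantitative
character_cols <- names(toy_grads)[!is_num]  # candidates for qualitative

numeric_cols
character_cols
```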

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

summary(recentgrads)
##       Rank       Major_code      Major               Total       
##  Min.   :  1   Min.   :1100   Length:173         Min.   :   124  
##  1st Qu.: 44   1st Qu.:2403   Class :character   1st Qu.:  4550  
##  Median : 87   Median :3608   Mode  :character   Median : 15104  
##  Mean   : 87   Mean   :3880                      Mean   : 39370  
##  3rd Qu.:130   3rd Qu.:5503                      3rd Qu.: 38910  
##  Max.   :173   Max.   :6403                      Max.   :393735  
##                                                  NA's   :1       
##       Men             Women        Major_category       ShareWomen    
##  Min.   :   119   Min.   :     0   Length:173         Min.   :0.0000  
##  1st Qu.:  2178   1st Qu.:  1778   Class :character   1st Qu.:0.3360  
##  Median :  5434   Median :  8386   Mode  :character   Median :0.5340  
##  Mean   : 16723   Mean   : 22647                      Mean   :0.5222  
##  3rd Qu.: 14631   3rd Qu.: 22554                      3rd Qu.:0.7033  
##  Max.   :173809   Max.   :307087                      Max.   :0.9690  
##  NA's   :1        NA's   :1                           NA's   :1       
##   Sample_size        Employed        Full_time        Part_time     
##  Min.   :   2.0   Min.   :     0   Min.   :   111   Min.   :     0  
##  1st Qu.:  39.0   1st Qu.:  3608   1st Qu.:  3154   1st Qu.:  1030  
##  Median : 130.0   Median : 11797   Median : 10048   Median :  3299  
##  Mean   : 356.1   Mean   : 31193   Mean   : 26029   Mean   :  8832  
##  3rd Qu.: 338.0   3rd Qu.: 31433   3rd Qu.: 25147   3rd Qu.:  9948  
##  Max.   :4212.0   Max.   :307933   Max.   :251540   Max.   :115172  
##                                                                     
##  Full_time_year_round   Unemployed    Unemployment_rate     Median      
##  Min.   :   111       Min.   :    0   Min.   :0.00000   Min.   : 22000  
##  1st Qu.:  2453       1st Qu.:  304   1st Qu.:0.05031   1st Qu.: 33000  
##  Median :  7413       Median :  893   Median :0.06796   Median : 36000  
##  Mean   : 19694       Mean   : 2416   Mean   :0.06819   Mean   : 40151  
##  3rd Qu.: 16891       3rd Qu.: 2393   3rd Qu.:0.08756   3rd Qu.: 45000  
##  Max.   :199897       Max.   :28169   Max.   :0.17723   Max.   :110000  
##                                                                         
##      P25th           P75th         College_jobs    Non_college_jobs
##  Min.   :18500   Min.   : 22000   Min.   :     0   Min.   :     0  
##  1st Qu.:24000   1st Qu.: 42000   1st Qu.:  1675   1st Qu.:  1591  
##  Median :27000   Median : 47000   Median :  4390   Median :  4595  
##  Mean   :29501   Mean   : 51494   Mean   : 12323   Mean   : 13284  
##  3rd Qu.:33000   3rd Qu.: 60000   3rd Qu.: 14444   3rd Qu.: 11783  
##  Max.   :95000   Max.   :125000   Max.   :151643   Max.   :148395  
##                                                                    
##  Low_wage_jobs  
##  Min.   :    0  
##  1st Qu.:  340  
##  Median : 1231  
##  Mean   : 3859  
##  3rd Qu.: 3466  
##  Max.   :48207  
## 
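For the visualization part, one option is a scatter plot of the share of women in a major against median earnings. This is only a sketch on a hypothetical stand-in table `toy_grads` with made-up values. With the real data, `recentgrads` would be used instead. The summary above reports one NA in ShareWomen, so the sketch drops missing shares before plotting.

```r
library(ggplot2)

# Hypothetical stand-in for recentgrads, values are made up
toy_grads <- data.frame(
  ShareWomen = c(0.12, 0.34, 0.56, 0.80, NA),
  Median     = c(65000, 50000, 36000, 30000, 40000)
)

# Keep only rows where the share of women is known
plot_data <- toy_grads[!is.na(toy_grads$ShareWomen), ]

# One point per major: share of women on x, median earnings on y
p <- ggplot(plot_data, aes(x = ShareWomen, y = Median)) +
  geom_point() +
  labs(x = "Share of women in major", y = "Median earnings")
p
```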
# top 5 majors by median pay; top_n() keeps ties, so more than 5 rows can come back (6 here)
top5majorsbyPay <- top_n(recentgrads, 5, Median)
top5majorsbyPay
## # A tibble: 6 x 21
##    Rank Major_code Major Total   Men Women Major_category ShareWomen Sample_size
##   <dbl>      <dbl> <chr> <dbl> <dbl> <dbl> <chr>               <dbl>       <dbl>
## 1     1       2419 PETR…  2339  2057   282 Engineering         0.121          36
## 2     2       2416 MINI…   756   679    77 Engineering         0.102           7
## 3     3       2415 META…   856   725   131 Engineering         0.153           3
## 4     4       2417 NAVA…  1258  1123   135 Engineering         0.107          16
## 5     5       2405 CHEM… 32260 21239 11021 Engineering         0.342         289
## 6     6       2418 NUCL…  2573  2200   373 Engineering         0.145          17
## # … with 12 more variables: Employed <dbl>, Full_time <dbl>, Part_time <dbl>,
## #   Full_time_year_round <dbl>, Unemployed <dbl>, Unemployment_rate <dbl>,
## #   Median <dbl>, P25th <dbl>, P75th <dbl>, College_jobs <dbl>,
## #   Non_college_jobs <dbl>, Low_wage_jobs <dbl>
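The result above has 6 rows even though 5 were requested. top_n() includes extra rows when values tie for the last place, which is the likely reason here. In current dplyr, top_n() is superseded by slice_max() and slice_min(), which make the tie handling explicit. A small sketch on a hypothetical table `toy_grads` with made-up values:

```r
library(dplyr)

# Hypothetical stand-in for recentgrads; two majors tie at 65000
toy_grads <- tibble(
  Major  = c("MAJOR A", "MAJOR B", "MAJOR C", "MAJOR D"),
  Median = c(70000, 65000, 65000, 40000)
)

# Default with_ties = TRUE keeps every row tied for the last place
top2_with_ties <- slice_max(toy_grads, Median, n = 2)

# with_ties = FALSE returns exactly n rows
top2_exact <- slice_max(toy_grads, Median, n = 2, with_ties = FALSE)

nrow(top2_with_ties)  # 3
nrow(top2_exact)      # 2
```

Keeping the ties is probably the fairer choice for this analysis, since dropping one of two equally paid majors would be arbitrary.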
# the 5 majors with the highest unemployment rate (rows stay in their original order)
top5majorswhighunempl <- top_n(recentgrads, 5, Unemployment_rate)
top5majorswhighunempl
## # A tibble: 5 x 21
##    Rank Major_code Major Total   Men Women Major_category ShareWomen Sample_size
##   <dbl>      <dbl> <chr> <dbl> <dbl> <dbl> <chr>               <dbl>       <dbl>
## 1     6       2418 NUCL…  2573  2200   373 Engineering         0.145          17
## 2    30       5402 PUBL…  5978  2639  3339 Law & Public …      0.559          55
## 3    85       2107 COMP…  7613  5291  2322 Computers & M…      0.305          97
## 4    90       5401 PUBL…  5629  2947  2682 Law & Public …      0.476          46
## 5   171       5202 CLIN…  2838   568  2270 Psychology & …      0.800          13
## # … with 12 more variables: Employed <dbl>, Full_time <dbl>, Part_time <dbl>,
## #   Full_time_year_round <dbl>, Unemployed <dbl>, Unemployment_rate <dbl>,
## #   Median <dbl>, P25th <dbl>, P75th <dbl>, College_jobs <dbl>,
## #   Non_college_jobs <dbl>, Low_wage_jobs <dbl>
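The two tables above rank all majors, so they do not yet speak to the research question about women. One possible next step is sketched below on a hypothetical table `toy_grads` with made-up values. It rests on two assumptions of mine: a major counts as "majority women" when ShareWomen is above 0.5, and a "good option" has median pay at or above, and unemployment at or below, the middle of that group. The data reports earnings per major rather than per sex, so ShareWomen is only a rough proxy.

```r
library(dplyr)

# Hypothetical stand-in for recentgrads, values are made up
toy_grads <- tibble(
  Major             = c("MAJOR A", "MAJOR B", "MAJOR C", "MAJOR D", "MAJOR E"),
  ShareWomen        = c(0.12, 0.56, 0.80, 0.65, 0.70),
  Median            = c(65000, 36000, 30000, 48000, 42000),
  Unemployment_rate = c(0.05, 0.09, 0.15, 0.04, 0.06)
)

# Assumed threshold: more than half of the graduates are women
majority_women <- filter(toy_grads, ShareWomen > 0.5)

# Keep majors with above-middle pay and below-middle unemployment,
# both measured within the majority-women group, highest pay first
good_options <- majority_women %>%
  filter(
    Median >= median(Median),
    Unemployment_rate <= median(Unemployment_rate)
  ) %>%
  arrange(desc(Median))

good_options$Major
```

Flipping the two comparisons would give the majors to avoid.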