Introduction

According to debt.org, the average borrower for the college class of 2017 will carry over $38,000 in student debt. They go on to state that since 1980, the tuition costs for public institutions have increased by 344% and 241% for private universities. However, a college education is an entry requirement for many of the highest-earning professions and graduates can expect to make more than 1.3 million more in their lifetime than high school graduates (debt.org). As of July 2018, student debt is now the second highest consumer debt category with the total student loan debt is over 1.5 trillion and there are 44.2 million American borrowers with a student loan debt, with a whopping 10.7% either delinquency or in default (Friedman, 2018).

The U.S. Bureau of Labor Statistics (BLS) that, “the more you learn, the more you earn” (Torpey, 2018). This statement highlights that their data shows that those with the higher levels of educational attainment had more than triple the median weekly earnings than those with less than a high school diploma in 2017. Unemployment rates also drop as the level of education increases (Torpey, 2018).

Georgetown University’s Center on Education and the Workforce, as of 2015, reported the economic value of a college major plays an immense factor with top-paying college majors earning $3.4 million more over a lifetime than the lowest-paying majors. STEM, health, and business majors are the highest paying with STEM and business also being the most popular and accounting for 46 percent of college graduates (Georgetown.Edu).

As the only MBA student in my Computer Science Data Analysis elective, I thought it would be interesting to compare the two for my research project.

Research Questions

What is the future economic outlook for both a Business and Computer Science degrees?
How does Washington State compare to the National averages?

The Data

IPUMS USA provided microdata from the American Community Surveys and was created by the UNIVERSITY OF MINNESOTA (IPUMS-USA, University of Minnesota, www.ipums.org). When you use this source to create a data set, it also creates a user document of the extract for all items that were included, which was extremely useful.

iplums_ACS2017 <- read_csv("C:/Users/Dan Ward/Desktop/CSC530/ResearchProject/ResearchProject/DATA/usa_00001.csv", col_types = cols(GQ = col_skip(), HHWT = col_skip()))

Clean Data

We’ll get started by glimpsing the data we’ve just imported.

It took a few minutes to import this data set with its 3,190,040 observations and 18 variables.

## Observations: 3,190,040
## Variables: 18
## $ YEAR          <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, ...
## $ DATANUM       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ SERIAL        <dbl> 1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, ...
## $ CBSERIAL      <dbl> 2.017e+12, 2.017e+12, 2.017e+12, 2.017e+12, 2.01...
## $ US2017A_ST    <chr> "01", "01", "01", "01", "01", "01", "01", "01", ...
## $ US2017A_HINCP <chr> "0010000", "0038500", "0090700", "0090700", "009...
## $ PERNUM        <dbl> 1, 1, 1, 2, 3, 1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, ...
## $ PERWT         <dbl> 206, 45, 136, 121, 111, 18, 13, 23, 21, 27, 21, ...
## $ US2017A_AGEP  <chr> "73", "31", "41", "48", "16", "37", "18", "17", ...
## $ US2017A_COW   <chr> "B", "1", "1", "B", "B", "4", "B", "B", "B", "B"...
## $ US2017A_MIL   <chr> "4", "4", "4", "4", "B", "4", "4", "4", "B", "B"...
## $ US2017A_SCH   <chr> "1", "1", "1", "1", "2", "1", "2", "2", "2", "1"...
## $ US2017A_SCHL  <chr> "10", "21", "17", "16", "13", "16", "14", "13", ...
## $ US2017A_ESR   <chr> "6", "1", "1", "6", "6", "1", "6", "6", "B", "B"...
## $ US2017A_FOD1P <chr> "BBBB", "6100", "BBBB", "BBBB", "BBBB", "BBBB", ...
## $ US2017A_FOD2P <chr> "BBBB", "BBBB", "BBBB", "BBBB", "BBBB", "BBBB", ...
## $ US2017A_PERNP <chr> "0000000", "0038500", "0072000", "0000000", "000...
## $ US2017A_SOCP  <chr> "BBBBBB", "119111", "472061", "BBBBBB", "BBBBBB"...

Improve Readability

To make the data easier to use and read:
* Removed unnecessary variables
* Renamed variables to user-friendly names
* Remove potential N/A values
* Modified some variables to show as numeric instead of characters
* Filter ages to between 18 and 60 years old

I’m also going to a precaution by putting this data into a new dataframe. This might useful later, avoiding to import this large dataset if a mistake is made throughout our analysis.

iplums_ACS2017 %>%
  select(US2017A_ST, US2017A_AGEP, US2017A_COW, US2017A_SCHL, US2017A_ESR, US2017A_FOD1P, US2017A_FOD2P, US2017A_PERNP, US2017A_SOCP, US2017A_SCH) %>%
  # Make the columns names user-friendly
  rename(State_code = US2017A_ST,           # State Code
         Age = US2017A_AGEP,                # Age
         Class = US2017A_COW,               # Class of worker
         School_Enrollment = US2017A_SCH,   # School Enrollment
         Education_Att = US2017A_SCHL,      # Educational Attainment
         Employment_Status = US2017A_ESR,   # Employment Status
         Degree1 = US2017A_FOD1P,           # Recorded field of 1st degree
         Degree2 = US2017A_FOD2P,           # Recorded field of 2nd degree
         Earnings = US2017A_PERNP,          # Total Person's Earnings
         SOC = US2017A_SOCP) %>%            # SOC Code for 2012 and later
  # Remove unreported stats before converting to numeric
  filter(Class != 'B',                      # Remove N/A (less than 16 years old)
         School_Enrollment != 'B',          # Remove N/A (less than 3 years old)
         Education_Att != 'B',              # Remove N/A (less than 3 years old)
         Employment_Status != 'B',          # Remove N/A (less than 16 years old)
         Earnings != 'BBBBBBB') %>%         # Remove unreported earnings
  # Convert Age & Earnings to numeric 
  mutate(Age = as.numeric(Age),
         Education_Att = as.numeric(Education_Att),
         Earnings = as.numeric(Earnings),
         Class = as.numeric(Class),
         School_Enrollment = as.numeric(School_Enrollment),
         Employment_Status = as.numeric(Employment_Status)) %>%
  # Set Ages to between 18 and 60
  filter(Age >=18 & Age <= 60)-> cleaned_ACS2017              

# View summary
summary(cleaned_ACS2017)

##   State_code             Age            Class       Education_Att  
##  Length:1512703     Min.   :18.00   Min.   :1.000   Min.   : 1.00  
##  Class :character   1st Qu.:29.00   1st Qu.:1.000   1st Qu.:16.00  
##  Mode  :character   Median :40.00   Median :1.000   Median :19.00  
##                     Mean   :39.62   Mean   :2.042   Mean   :18.54  
##                     3rd Qu.:51.00   3rd Qu.:2.000   3rd Qu.:21.00  
##                     Max.   :60.00   Max.   :9.000   Max.   :24.00  
##  Employment_Status   Degree1            Degree2             Earnings      
##  Min.   :1.000     Length:1512703     Length:1512703     Min.   :  -9000  
##  1st Qu.:1.000     Class :character   Class :character   1st Qu.:  12000  
##  Median :1.000     Mode  :character   Mode  :character   Median :  31500  
##  Mean   :1.703                                           Mean   :  46695  
##  3rd Qu.:1.000                                           3rd Qu.:  60000  
##  Max.   :6.000                                           Max.   :1130000  
##      SOC            School_Enrollment
##  Length:1512703     Min.   :1.000    
##  Class :character   1st Qu.:1.000    
##  Mode  :character   Median :1.000    
##                     Mean   :1.161    
##                     3rd Qu.:1.000    
##                     Max.   :3.000

Filter Again

We’re at 1,512,703 observations. We should also filter Educational Attainment to a Bachelor’s degree (21) or higher and see what the effects this has.

# Filter Educational Attainment
cleaned_ACS2017 %>%
  filter(Education_Att >= 21) -> cleaned_ACS2017

glimpse(cleaned_ACS2017)

## Observations: 515,009
## Variables: 10
## $ State_code        <chr> "01", "01", "01", "01", "01", "01", "01", "0...
## $ Age               <dbl> 31, 32, 41, 58, 30, 26, 54, 45, 44, 50, 43, ...
## $ Class             <dbl> 1, 3, 1, 1, 1, 1, 1, 3, 3, 4, 1, 3, 1, 3, 1,...
## $ Education_Att     <dbl> 21, 22, 22, 21, 21, 21, 21, 22, 22, 24, 23, ...
## $ Employment_Status <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1,...
## $ Degree1           <chr> "6100", "3401", "5200", "2102", "6203", "400...
## $ Degree2           <chr> "BBBB", "BBBB", "BBBB", "BBBB", "BBBB", "BBB...
## $ Earnings          <dbl> 38500, 43500, 25700, 344000, 120000, 50000, ...
## $ SOC               <chr> "119111", "254021", "533030", "411012", "411...
## $ School_Enrollment <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...

The observations decreased from 1,512,703 to 515,009 but the average earnings increased from 46,695 to $75,909.

I was shocked to see that about 34% of the Americans surveyed between the ages of 19-60 had at least a Bachelor’s degree and that increased the average earnings by $29,214.

State Names

It might be easier for us to refer to the states by name, rather than using the state code provided in the survey. We can use the stateTable data set to add State Names by state code.

# Load State Table 
load("C:/Users/Dan Ward/Desktop/CSC530/Assignments/Source Data/stateTable.RDA")
stateTable %>% 
  select(FIPS, state_name) %>%
  rename(State_code = FIPS, State = state_name) -> stateTable

# Merge stateTable and ACS2017 dataset
cleaned_ACS2017 <- merge(cleaned_ACS2017, stateTable, by = "State_code")

# Glimpse updated dataset
glimpse(cleaned_ACS2017)

## Observations: 515,009
## Variables: 11
## $ State_code        <chr> "01", "01", "01", "01", "01", "01", "01", "0...
## $ Age               <dbl> 31, 32, 41, 58, 30, 26, 54, 45, 44, 50, 43, ...
## $ Class             <dbl> 1, 3, 1, 1, 1, 1, 1, 3, 3, 4, 1, 3, 1, 3, 1,...
## $ Education_Att     <dbl> 21, 22, 22, 21, 21, 21, 21, 22, 22, 24, 23, ...
## $ Employment_Status <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1,...
## $ Degree1           <chr> "6100", "3401", "5200", "2102", "6203", "400...
## $ Degree2           <chr> "BBBB", "BBBB", "BBBB", "BBBB", "BBBB", "BBB...
## $ Earnings          <dbl> 38500, 43500, 25700, 344000, 120000, 50000, ...
## $ SOC               <chr> "119111", "254021", "533030", "411012", "411...
## $ School_Enrollment <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ State             <chr> "Alabama", "Alabama", "Alabama", "Alabama", ...

Job Descriptions

I also thought it would be useful to include the Job Descriptions for each of the SOC Codes, which I was able to download from: https://www.census.gov/topics/employment/industry-occupation/guidance/code-lists.html

Clean SOC Codes

Before we can use the SOC Job Descriptions, we’ll have to remove the ‘-’ in the SOC Codes used in the CSV file.

#Clean SOC_codes data by renaming columns and filter N/A values 
SOC_codes %>%
  rename(Description = `Occupation 2010 Description`,
         SOC = `2010 SOC Code`) %>%
  filter(!is.na(Description)) -> SOC_codes

# Remove '-' characters from SOC_codes
SOC_codes$SOC <- gsub('-', '', SOC_codes$SOC)

Add Job Descriptions to Data

Now, we can add the Job Descriptions by the SOC Codes.

## Observations: 451,230
## Variables: 12
## $ SOC               <chr> "111021", "111021", "111021", "111021", "111...
## $ State_code        <chr> "21", "36", "47", "34", "48", "50", "38", "0...
## $ Age               <dbl> 37, 49, 37, 49, 59, 49, 57, 42, 45, 35, 27, ...
## $ Class             <dbl> 1, 2, 1, 1, 7, 1, 7, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ Education_Att     <dbl> 21, 22, 22, 21, 21, 21, 21, 21, 21, 21, 22, ...
## $ Employment_Status <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1,...
## $ Degree1           <chr> "3301", "5200", "5200", "5506", "6200", "400...
## $ Degree2           <chr> "BBBB", "BBBB", "BBBB", "BBBB", "BBBB", "360...
## $ Earnings          <dbl> 38000, 55000, 55000, 595000, 92000, 60000, 1...
## $ School_Enrollment <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ State             <chr> "Kentucky", "New York", "Tennessee", "New Je...
## $ Description       <chr> "General and operations managers", "General ...

You may have noticed that we’ve gone down from We’re down to 515,009 observations to 451,230 observations. Looks like we lost some due to bad SOC codes, I guess it happens. But let’s move on to the next step.

Identify Business & Computer Science Degrees

To begin, we’ll narrow down the type of degrees. There are hundreds of different degrees available in the American Community Survey. Let’s start with Business Management and Administration (6203) and Computer Science (2102) degrees.

Then we’ll compare it to other potential combinations

Business
* 6200 GENERAL BUSINESS
* 6201 ACCOUNTING
* 6203 BUSINESS MANAGEMENT AND ADMINISTRATION
* 6206 MARKETING AND MARKETING RESEARCH
* 6207 FINANCE
* 6209 HUMAN RESOURCES AND PERSONNEL MANAGEMENT
* 6210 INTERNATIONAL BUSINESS
* 6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS

Computer Science
* 2100 COMPUTER AND INFORMATION SYSTEMS
* 2101 COMPUTER PROGRAMMING AND DATA PROCESSING
* 2102 COMPUTER SCIENCE
* 2105 INFORMATION SCIENCES
* 2106 COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY
* 2107 COMPUTER NETWORKING AND TELECOMMUNICATIONS
* 2407 COMPUTER ENGINEERING
* 4005 MATHEMATICS AND COMPUTER SCIENCE

I wouldn’t want to type that more than once, so let’s work smarter and not harder by putting this into a vector.

Business_Degrees <- c("6200", "6201", "6203", "6206", "6207", "6209", "6210", "6212")
Computer_Degrees <- c("2100", "2101", "2102", "2105", "2106", "2107", "2407", "4005")

Compare Field of Degrees

General Degrees

We’ll start by determining individuals with Business (6203) or Computer Science (2102) Degrees. This dataframe produces 41,428 observations, with a mean earnings of 79,857.

# Start with new dataframe for either Business or Computer Science degrees 
cleaned_ACS2017 %>%
  filter((Degree1 == "6203" | Degree1 == "2102") |
         (Degree2 == "6203" | Degree2 == "2102")) -> Combined_Gen
summarise(Combined_Gen, AvgEarnings = mean(Earnings), Count = n())

General Degrees Bar Chart

Let’s use a bar chart to visualize the comparison of individuals with Business and Computer Science reported as their field of study.

Significantly more Business majors and very few with both a Business and Computer Science degree.

General Degrees Boxplot

However, this boxplot tells a different story. Individuals with a Computer Science or Both a Business and Computer Science degree earned more on average.

Additional Degrees

Then create a second dataframe for any of the 16 degrees identified above. However, when we added the additional degrees, you’ll notice we have 114,909 observations and mean earnings increased slightly to 81,607.

cleaned_ACS2017 %>%
  filter((Degree1 %in% Business_Degrees | Degree2 %in% Business_Degrees) |
           (Degree1 %in% Computer_Degrees | Degree2 %in% Computer_Degrees)) -> Combined_Add
summarise(Combined_Add, AvgEarnings = mean(Earnings), Count = n())

Additional Degrees Bar Chart

This bar chart looks similar to the previous chart for only general Business and Computer Science degrees.

Additional Degrees Boxplot

Here, it seems that those with both degrees seemed to drop slightly. While Business and Computer Science had many outliers, an individual with Both stuck out well above the rest.

Cleveland Plot of Earnings by Degree Major

Next, we’ll take a side-by-side comparison of the General and Additional degree datasets.

You probably noticed the scales changed, as Business Majors earned more while Both Majors and Computer Science Majors made slightly less.

Additional Degree Comparison

I also thought it would be interesting to see how the different degrees we identified earlier might affect Median Earnings. Without wanting to look at a list of the many variations, let’s visualize this with a Cleveland Plot.

Looks like 4005 MATHEMATICS AND COMPUTER SCIENCE degree graduates earn much more than any other degree.

Degree Codes

It looked like the minimums and maximums were very close to the same. We’ll use the second dataframe going forward because the additional observations could prove beneficial. Do you agree with the additional degrees?

Before moving on, I don’t want to try remembering all the Degree Field Codes or have to keep looking them up, so let’s add them to our dataframe. This will also help limit the amount of retyping code, like in the previous graphics.

While we’re here, let’s also define a new variable called Major so we’ll quickly know if it’s a Business, Computer Science, or Both. We can also update Educational Attainment.

Combined_Add %>%
  # Use the type of degree to define a new variable 'Major' as Business, Computer, or Both
  mutate(Major = case_when(Degree1 %in% Business_Degrees & Degree2 %in% Computer_Degrees ~ "Both",
                         Degree1 %in% Computer_Degrees & Degree2 %in% Business_Degrees ~ "Both",
                         Degree1 %in% Business_Degrees ~ "Business",
                         Degree2 %in% Business_Degrees ~ "Business",
                         Degree1 %in% Computer_Degrees ~ "Computer",
                         Degree2 %in% Computer_Degrees ~ "Computer")) %>%
  # Replace the Education Attainment code with the level of education
  mutate(Education_Att = case_when(Education_Att == 21 ~ "Bachelor", 
            Education_Att == 22 ~ "Master", 
            Education_Att == 23 ~ "Professional", 
            Education_Att == 24 ~ "Doctorate",
            TRUE ~ "")) -> Combined_Add

Other Variables

We can also look at the other variables to determine if any other modifications are needed.

Employment Status

According to the study:
1. Civilian employed, at work
2. Civilian employed, with a job but not at work
3. Unemployed
4. Armed forces, at work
5. Armed forces, with a job but not at work
6. Not in labor force

As one would image, it appears Employment Status has an impact on earnings. Overall, unemployment looks low for those for this segment of individuals. Out of the 114,909, only 2,754 reported being unemployed, which is roughly 2.4%. Those that reported employed and at work reported average earnings of $88,040.

To help visualize this, let show a bar chart of the counts and show average earnings.

NOTE: to display the values as currency, I used the dollar_format from the scales package.

Employment Status WA State

I think we might want to remove 3-6 but first let’s look at these in Washington State to see if there are any potential issues we should know about.

Combined_Add %>%
  mutate(Employment_Status = case_when(Employment_Status == "1" ~ "Employed",
                             Employment_Status == "2" ~ "Employed, not at work",
                             Employment_Status == "3" ~ "Unemployed",
                             Employment_Status == "4" ~ "Armed forces",
                             Employment_Status == "5" ~ "Armed forces, not at work",
                             Employment_Status == "6" ~ "Not in labor force",
                             TRUE ~ "")) %>%
  filter(State == "Washington") %>%
  group_by(Employment_Status) %>%
  summarise(AvgEarnings = mean(Earnings), Count = n())

I don’t see anything unusual so I’m going to get rid of 3-6.

Combined_Add %>%
  filter(Employment_Status <= 2) -> Combined_Add

Class of Worker

According to the study:
1. Employee of a private for-profit company or business, or of an individual, for wages, salary, or commissions
2. Employee of a private not-for-profit, tax-exempt, or charitable organization
3. Local government employee (city, county, etc)
4. State government employee
5. Federal government employee
6. Self-employed in own not incorporated business, professional practice, or farm
7. Self-employed in own incorporated business, professional practice or farm
8. Working without pay in family business or farm
9. Unemployed and last worked 5 years ago or earlier or never worked

I didn’t see any Class 9, which is good because we just removed unemployed from our data. I not working without pay in a family business or farm, so we’ll also drop 8.

Combined_Add %>%
  filter(Employment_Status <= 7) -> Combined_Add

School Enrollment

According to the study:
1. No, has not attended in the last 3 months
2. Yes, public school or public college
3. Yes, private school or college or homeschool

Individuals that reported as currently attending school earned less than those that have not attended school in the last three months. I was a little surprised to see the average earnings from those attending public or private school. We’ll assume their attending college based on the filtering we’ve done so far. Overall this is interesting, but I don’t see any issues so we’ll leave it and move on.

The Outcome of Variables Cleanup

Let’s see what this looks like now. We dropped from 114,909 observations to 104,951.

## Observations: 104,951
## Variables: 13
## $ SOC               <chr> "111021", "111021", "111021", "111021", "111...
## $ State_code        <chr> "48", "38", "08", "37", "06", "26", "36", "1...
## $ Age               <dbl> 59, 57, 47, 34, 39, 51, 47, 44, 43, 51, 39, ...
## $ Class             <dbl> 7, 7, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1,...
## $ Education_Att     <chr> "Bachelor", "Bachelor", "Master", "Bachelor"...
## $ Employment_Status <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ Degree1           <chr> "6200", "6203", "6200", "6200", "6200", "620...
## $ Degree2           <chr> "BBBB", "BBBB", "BBBB", "BBBB", "BBBB", "BBB...
## $ Earnings          <dbl> 92000, 120000, 52000, 63000, 275000, 150000,...
## $ School_Enrollment <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ State             <chr> "Texas", "North Dakota", "Colorado", "North ...
## $ Description       <chr> "General and operations managers", "General ...
## $ Major             <chr> "Business", "Business", "Business", "Busines...

Business, Computer Science, or Both

After removing those additional items in the previous steps, let’s see what the counts and average earnings are for Business, Computer Science, and Both Degrees.

Business

Computer Science

Both

We could have combined these into one statement…

Combined_Add %>%
  group_by(Major) %>%
  summarise(AvgEarnings = mean(Earnings), Counts = n())

I think that looks a lot cleaner.

Washington State

# Filter for WA State 
Combined_Add %>%
  filter(State == "Washington") -> Combined_WA

# Group By Major and show Average Earnings & Counts
Combined_WA %>%
  group_by(Major) %>%
  summarise(AvgEarnings = mean(Earnings), Counts = n())

While individuals with either a Business or Computer Science degree earned more in Washington than the national average, individuals with Both earned less.

Density Plot for Earnings

A density plot should provide an effective way to view the distribution of the Earnings variable. Let’s do a side-by-side comparison of the national versus Washington State.

TOP JOBS

Next, let’s look at the Job Descriptions to see if we can find the most common jobs for individuals with Business and Computer Science degrees. We’ll start by gathering the TOP 20 Jobs for each and then look for duplicates to see if there’s any overlap.

We had 10 jobs overlap from our two TOP 20 Jobs lists.

Both Degrees Top 5 Jobs

Let’s take a closer look at individuals with both a Business and Computer Science degrees by looking at the top Job Descriptions for these individuals.

Nothing surprising here.

Washington State Jobs

Let’s see if the top Jobs are different in Washington State by pulling the Top 5 Job Descriptions.

Cleveland Dotplot

Let’s look at another Cleveland dotplot to show the average Earnings by State. Plus, it’s Dr. Nelson’s favorite.

Seemed like the State make a pretty large difference for Earnings.

Simple Boxplot for Educational Attainment

Next, let’s create a boxplot to look at how Education Attainment affects income in Washington.

Seems like the statement, “the more you learn, the more you earn” is an accurate statement.

Level of Education

Now, let’s see if we can find a better to visualize how the level of educational attainment affects earnings.

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

It was interesting to see how age affects earnings, though makes sense when you take experience into consideration. There was a pretty significant bump for individuals with a Professional degree, just before 50 years old.

Majors

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

We can see these bumps better in this graphic. Those with Both degrees have a larger variation. I thought it was interesting to see those with Both degrees and those with Computer Science degrees seemed to start dropping off in their mid-50s. I wonder if it has something to do with early retirement or the pace of change in technology.

Map

I also thought it might be beneficial to see earnings on a map of the states.

First, we’ll use TidyCensus to get the Median Income.

## Get ACS data for Median Income for both Bachelor and Master/Professional
WA_State <- get_acs(geography = "state",
                    variables = "B20004_001", 
                    geometry = TRUE,
                    shift_geo = TRUE)

Next, we’ll create a new dataframe to merge our datasets. After merging the datasets, we’ll create a new variable to take the Average Earnings for Business, Computer Science, & Both groups by State, and then subtract it from each States Median Income.

Show the Difference in Map

Now we’re ready to map the results.

References

Friedman, Z. (2018, June 13). Student Loan Debt Statistics In 2018: A $1.5 Trillion Crisis. Retrieved from https://www.forbes.com/sites/zackfriedman/2018/06/13/student-loan-debt-statistics-2018/#124d2cea7310

NACE Staff (2017, January 25). COMPUTER SCIENCE PROJECTED AS TOP-PAID MASTER’S GRADS. Retrieved from http://www.naceweb.org/job-market/compensation/computer-science-projected-as-top-paid-masters-grads/

Occupation Definitions: https://www2.census.gov/programs-surveys/demo/guidance/industry-occupation/soc_2018_definitions.pdf

The Economic Value of College Majors. (2018, June 12). Retrieved from https://cew.georgetown.edu/cew-reports/valueofcollegemajors/

Torpey, E. (2018, April). Measuring the value of education: Career Outlook. Retrieved from https://www.bls.gov/careeroutlook/2018/data-on-display/education-pays.htm

The Labor Market for Recent College Graduates. (2018, January 25). Retrieved from https://www.newyorkfed.org/research/college-labor-market/index.html

Steven Ruggles, Sarah Flood, Ronald Goeken, Josiah Grover, Erin Meyer, Jose Pacas, and Matthew Sobek. IPUMS USA: Version 8.0 [dataset]. Minneapolis, MN: IPUMS, 2018. https://doi.org/10.18128/D010.V8.0

Student Loan Resources: Financial Aid & Loan Debt Management. (n.d.). Retrieved October 20, 2018, from https://www.debt.org/students/

WardD_ResearchProject_Doc

Dan Ward

December 10, 2018