Introduction:
The article that I choose is titled “The Economic Guide To Picking a College Major”. The article explores the ideas of increasing education costs, and how to choose a major with earnings power to match the rising costs. The bottom line of the article is that engineering is the field to be in for employment security and higher than average earnings.
The article can be found here:
https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/
library(readr)
url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/grad-students.csv"
data <- read_csv(url)
## Rows: 173 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Major, Major_category
## dbl (20): Major_code, Grad_total, Grad_sample_size, Grad_employed, Grad_full...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
## # A tibble: 6 × 22
## Major_code Major Major_category Grad_total Grad_sample_size Grad_employed
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 5601 CONSTRUCT… Industrial Ar… 9173 200 7098
## 2 6004 COMMERCIA… Arts 53864 882 40492
## 3 6211 HOSPITALI… Business 24417 437 18368
## 4 2201 COSMETOLO… Industrial Ar… 5411 72 3590
## 5 2001 COMMUNICA… Computers & M… 9109 171 7512
## 6 3201 COURT REP… Law & Public … 1542 22 1008
## # ℹ 16 more variables: Grad_full_time_year_round <dbl>, Grad_unemployed <dbl>,
## # Grad_unemployment_rate <dbl>, Grad_median <dbl>, Grad_P25 <dbl>,
## # Grad_P75 <dbl>, Nongrad_total <dbl>, Nongrad_employed <dbl>,
## # Nongrad_full_time_year_round <dbl>, Nongrad_unemployed <dbl>,
## # Nongrad_unemployment_rate <dbl>, Nongrad_median <dbl>, Nongrad_P25 <dbl>,
## # Nongrad_P75 <dbl>, Grad_share <dbl>, Grad_premium <dbl>
Explanation:
In the above code I load the dataset via raw data url from github, and read the csv file with the function read_csv from the readr library. I then preview with the head function.
All variables included in the original dataset appear like they could be relevant for data analysis, and all are named appropriately - easily understood.
library(ggplot2)
ggplot(data = data, aes(x = Major_category , y = Grad_unemployment_rate))+
geom_point() +
scale_x_discrete(guide = guide_axis(angle = 90))
ggplot(data = data, aes(x = Major_category , y = Grad_median))+
geom_point() +
scale_x_discrete(guide = guide_axis(angle = 90))
Explanation:
The graphs above give us a baseline understanding of the potential outcomes for a student deciding which grad major area to pursue. For example the majors in the education category show a low level of unemployment, but also a lower level of earnings power. Engineering has above average median earnings, with a relatively average unemployment rate.
ggplot(data = data, aes(x = Grad_median)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Explanation:
The histogram above shows us the range of reported median salaries. This visualization reminds the student of the economic implications of their major choice. We can see that the different choices of majors can give vastly different financial situations, from earning roughly 50K to 110K
Conclusions:
Major choice can have a significant impact on future earnings, and having the data available prior to selection would benefit an informed decision.
To extend the work done in the article I would take a more active help perspective on presentation. Perhaps taking the variable “major category” and comparing major category medians can provide insight for readers actively looking for help. Choosing a career field might be a good first step to choosing a specific major.
In addition, this article was written in 2014, an update of the data for timeliness would be warrented, but to include new majors are well.