For Assignment 1, I used the college majors data set from FiveThirtyEight, which covers 173 college majors. For each major it reports earnings (25th percentile, median, and 75th percentile) and employment counts for graduates. More detail about the data set is in the linked article:
https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/
library(ggplot2)
library(readr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ stringr 1.5.0
## ✔ forcats 1.0.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Here I read the college majors data set (all-ages.csv) directly from the GitHub repository.
data <- read.csv(url("https://raw.githubusercontent.com/MAB592/Data-607-Assignments/main/all-ages.csv"))
glimpse(data)
## Rows: 173
## Columns: 11
## $ Major_code <int> 1100, 1101, 1102, 1103, 1104, 1105, 1106…
## $ Major <chr> "GENERAL AGRICULTURE", "AGRICULTURE PROD…
## $ Major_category <chr> "Agriculture & Natural Resources", "Agri…
## $ Total <int> 128148, 95326, 33955, 103549, 24280, 794…
## $ Employed <int> 90245, 76865, 26321, 81177, 17281, 63043…
## $ Employed_full_time_year_round <int> 74078, 64240, 22810, 64937, 12722, 51077…
## $ Unemployed <int> 2423, 2266, 821, 3619, 894, 2070, 264, 2…
## $ Unemployment_rate <dbl> 0.02614711, 0.02863606, 0.03024832, 0.04…
## $ Median <int> 50000, 54000, 63000, 46000, 62000, 50000…
## $ P25th <int> 34000, 36000, 40000, 30000, 38500, 35000…
## $ P75th <dbl> 80000, 80000, 98000, 72000, 90000, 75000…
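Before deriving new columns, it helps to know how the existing rate column is defined. The sketch below uses only the first three rows shown in the output above and checks one plausible definition, `Unemployed / (Employed + Unemployed)`. That formula is an assumption inferred from these rows, not something stated in this write-up, and the name `first_rows` is introduced purely for illustration.

```r
# First three rows of the relevant columns, copied from the glimpse() output above
first_rows <- data.frame(
  Employed          = c(90245, 76865, 26321),
  Unemployed        = c(2423, 2266, 821),
  Unemployment_rate = c(0.02614711, 0.02863606, 0.03024832)
)

# Assumed definition: unemployed share of the labor force (employed + unemployed)
recomputed <- first_rows$Unemployed /
  (first_rows$Employed + first_rows$Unemployed)

# Largest gap between the recomputed and the published rates
max_gap <- max(abs(recomputed - first_rows$Unemployment_rate))
print(max_gap)
```

For these rows the recomputed values agree with the published column to within rounding, which suggests the denominator of `Unemployment_rate` is the labor force rather than `Total`.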
To make the column names clearer, I renamed Total to Total_Graduates and Employed_full_time_year_round to Employed_fulltime.
data <- data %>%
  rename(Total_Graduates = Total, Employed_fulltime = Employed_full_time_year_round)
Here I created a new column, Employment_rate, defined as the share of all graduates in a major who are employed full-time, year-round.
data <- data %>%
  mutate(Employment_rate = Employed_fulltime / Total_Graduates)
glimpse(data)
## Rows: 173
## Columns: 12
## $ Major_code <int> 1100, 1101, 1102, 1103, 1104, 1105, 1106, 1199, 1301…
## $ Major <chr> "GENERAL AGRICULTURE", "AGRICULTURE PRODUCTION AND M…
## $ Major_category <chr> "Agriculture & Natural Resources", "Agriculture & Na…
## $ Total_Graduates <int> 128148, 95326, 33955, 103549, 24280, 79409, 6586, 85…
## $ Employed <int> 90245, 76865, 26321, 81177, 17281, 63043, 4926, 6392…
## $ Employed_fulltime <int> 74078, 64240, 22810, 64937, 12722, 51077, 4042, 5074…
## $ Unemployed <int> 2423, 2266, 821, 3619, 894, 2070, 264, 261, 4736, 21…
## $ Unemployment_rate <dbl> 0.02614711, 0.02863606, 0.03024832, 0.04267890, 0.04…
## $ Median <int> 50000, 54000, 63000, 46000, 62000, 50000, 63000, 520…
## $ P25th <int> 34000, 36000, 40000, 30000, 38500, 35000, 39400, 350…
## $ P75th <dbl> 80000, 80000, 98000, 72000, 90000, 75000, 88000, 750…
## $ Employment_rate <dbl> 0.5780660, 0.6738980, 0.6717715, 0.6271137, 0.523970…
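As a quick check on the derived column, the sketch below repeats the rename and mutate steps on the first five rows taken from the output above. The data frame name `first_rows` and the range check are additions for illustration only.

```r
library(dplyr)

# First five rows of the relevant columns, copied from the glimpse() output above
first_rows <- data.frame(
  Major_code = c(1100, 1101, 1102, 1103, 1104),
  Total = c(128148, 95326, 33955, 103549, 24280),
  Employed_full_time_year_round = c(74078, 64240, 22810, 64937, 12722)
)

checked <- first_rows %>%
  rename(Total_Graduates = Total,
         Employed_fulltime = Employed_full_time_year_round) %>%
  mutate(Employment_rate = Employed_fulltime / Total_Graduates)

# A share of graduates must lie between 0 and 1
stopifnot(all(checked$Employment_rate >= 0 & checked$Employment_rate <= 1))

print(round(checked$Employment_rate, 4))
```

The printed values should match the first entries of `Employment_rate` shown in the output above. Because the denominator is all graduates, including those not in the labor force, this rate is not the complement of `Unemployment_rate`.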
Below is the average employment rate of the majors in each major category.
ggplot(data, aes(x = Major_category, y = Employment_rate)) +
  # Average the per-major rates within each category;
  # stat = "identity" would stack (sum) the rates of every major in a category
  geom_bar(stat = "summary", fun = "mean", fill = "blue") +
  labs(
    title = "Average Employment Rate by Major Category",
    x = "Major category",
    y = "Employment rate"
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
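Since each category contains several majors, getting one rate per category requires some form of aggregation. A minimal sketch of one option, pooling counts within each category so that larger majors carry more weight, is shown here. The `toy` data frame and its values are made up for illustration, and the real analysis would pass the renamed `data` instead.

```r
library(dplyr)
library(ggplot2)

# Toy rows with made-up values; the real analysis would use `data` instead
toy <- data.frame(
  Major_category    = c("A", "A", "B"),
  Total_Graduates   = c(100, 300, 200),
  Employed_fulltime = c(50, 210, 120)
)

# Pool the counts per category, then divide once
category_rates <- toy %>%
  group_by(Major_category) %>%
  summarise(
    Employment_rate = sum(Employed_fulltime) / sum(Total_Graduates),
    .groups = "drop"
  )

# Sort the bars by rate and flip the axes so category names stay readable
p <- ggplot(category_rates,
            aes(x = reorder(Major_category, Employment_rate),
                y = Employment_rate)) +
  geom_col(fill = "blue") +
  coord_flip() +
  labs(x = "Major category", y = "Share employed full-time, year-round")
print(p)
```

Pooling gives the rate for a randomly chosen graduate in the category, whereas averaging per-major rates gives the rate for a typical major. Which one is appropriate depends on the question being asked.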
Below is the total number of graduates in each major category in the data set.
ggplot(data, aes(x = Major_category, y = Total_Graduates)) +
  # geom_col() stacks the per-major counts, so each bar is the category total
  geom_col(fill = "red") +
  labs(
    title = "Graduates by Major Category",
    x = "Major category",
    y = "Graduates"
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
Finally, the boxplot below shows how the median salaries of individual majors are distributed within each major category.
ggplot(data, aes(x = Major_category, y = Median)) +
  geom_boxplot(fill = "blue") +
  labs(
    title = "Median Salary Distribution by Major Category",
    x = "Major category",
    y = "Median salary"
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
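Comparing categories in a boxplot is easier when the boxes are sorted rather than alphabetical. The sketch below orders categories by their median value and formats the axis as dollars. The `toy` data frame holds made-up values for illustration, and the dollar formatting assumes the salaries are in US dollars.

```r
library(ggplot2)

# Toy rows with made-up values; the real analysis would use `data` instead
toy <- data.frame(
  Major_category = c("A", "A", "A", "B", "B", "B"),
  Median         = c(60000, 70000, 80000, 40000, 45000, 50000)
)

# reorder() sorts the categories by the median of their Median values
p <- ggplot(toy, aes(x = reorder(Major_category, Median, FUN = median),
                     y = Median)) +
  geom_boxplot(fill = "blue") +
  scale_y_continuous(labels = scales::dollar) +
  coord_flip() +
  labs(x = "Major category", y = "Median salary")
print(p)
```

Flipping the coordinates is a design choice that avoids rotating long category names by 90 degrees.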
## Conclusions
From the plots, engineering majors appear to have the highest employment rates and median salaries of all the major categories, followed closely by computer and mathematics majors. At the other end of the spectrum, psychology and arts majors appear to have lower employment rates and median salaries. Here, employment rate means the share of graduates employed full-time, year-round.
In summary, the data suggest that majors within the engineering discipline are associated with the highest full-time employment rates and salary potential among college majors.
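Rankings read off a plot can be confirmed with a small summary table. The sketch below shows one way to do that, assuming the renamed columns used in this report. The `toy` data frame and its values are made up for illustration, and the column names `n_majors`, `pooled_rate`, and `typical_median` are hypothetical.

```r
library(dplyr)

# Toy rows with made-up values, standing in for the renamed `data`
toy <- data.frame(
  Major_category    = c("A", "A", "B", "B", "C"),
  Total_Graduates   = c(100, 300, 200, 200, 50),
  Employed_fulltime = c(50, 210, 120, 100, 20),
  Median            = c(60000, 70000, 45000, 50000, 40000)
)

category_summary <- toy %>%
  group_by(Major_category) %>%
  summarise(
    n_majors       = n(),                                          # majors per category
    pooled_rate    = sum(Employed_fulltime) / sum(Total_Graduates), # full-time share
    typical_median = median(Median),                               # median of major medians
    .groups = "drop"
  ) %>%
  arrange(desc(typical_median))

print(category_summary)
```

Including `n_majors` makes it easy to see when a category's position rests on only a handful of majors.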