This study examines whether college majors classified as STEM (Science, Technology, Engineering, and Mathematics) are associated with higher median annual salaries than non-STEM majors in the United States. Using a dataset of 10 majors with median earnings, unemployment rates, and major categories, the analysis combines exploratory visualizations, summary statistics, null hypothesis testing, and regression modeling. Distribution-focused visualizations from the ggdist package support clearer interpretation of group differences. In this sample, STEM majors have a mean median salary of $76,000 versus $45,400 for non-STEM majors, a $30,600 difference that is statistically significant (one-sided Welch t-test, p = 0.0152) and practically meaningful.
Selecting a college major influences future career opportunities and long-term earning potential. STEM disciplines are widely considered to provide higher economic returns. This project evaluates that claim using statistical evidence.
Do STEM majors earn higher median annual salaries than non-STEM majors?
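median – median annual salary
STEM_Status – STEM vs Non-STEM classification
major, major_category, unemployment_rate
library(tidyverse)   # read_csv(), glimpse(), dplyr verbs, ggplot2
library(scales)      # dollar() axis labels
library(ggdist)      # stat_halfeye(), stat_dots(), stat_interval()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
majors <- read_csv("college_majors.csv")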
glimpse(majors)
## Rows: 10
## Columns: 5
## $ major <chr> "Petroleum Engineering", "Mining Engineering", "Civi…
## $ major_category <chr> "Engineering", "Engineering", "Engineering", "Comput…
## $ median <dbl> 110000, 75000, 60000, 80000, 55000, 40000, 42000, 38…
## $ unemployment_rate <dbl> 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.06, 0.08, 0.04…
## $ STEM_Status <chr> "STEM", "STEM", "STEM", "STEM", "STEM", "Non-STEM", …
majors <- majors %>%
  mutate(STEM_Status = factor(STEM_Status, levels = c("Non-STEM", "STEM")))
colSums(is.na(majors))
## major major_category median unemployment_rate
## 0 0 0 0
## STEM_Status
## 0
ggplot(majors, aes(x = median)) +
  geom_density(fill = "#3C6E71", alpha = 0.45, color = "#284B63") +
  geom_rug(alpha = 0.2) +
  labs(
    title = "Distribution of Median Salaries Across Majors",
    subtitle = "The right-skew suggests a subset of majors with notably higher earnings",
    x = "Median Salary (USD)", y = "Density"
  ) +
  scale_x_continuous(labels = dollar)
Interpretation:
The distribution shows substantial variability in median earnings across
majors, from $38,000 to $110,000. The right-skewed shape suggests that a
small subset of majors earns considerably higher salaries; the group
comparisons that follow show these are primarily STEM majors.
ggplot(majors, aes(x = median, y = STEM_Status, fill = STEM_Status)) +
  stat_halfeye(alpha = 0.7, adjust = 0.7) +
  scale_fill_manual(values = c("Non-STEM" = "#C4C4C4", "STEM" = "#3C6E71")) +
  labs(
    title = "Salary Distribution by STEM Classification (Half-Eye Plot)",
    subtitle = "Combines distribution shape with interval estimates",
    x = "Median Salary (USD)", y = ""
  ) +
  scale_x_continuous(labels = dollar)
Interpretation:
The half-eye plot shows STEM majors with a higher central tendency and a
distribution shifted to the right of non-STEM majors. The groups are not
fully separated: the highest non-STEM median ($62,000) exceeds the lowest
STEM median ($55,000).
stats <- majors %>%
  group_by(STEM_Status) %>%
  summarise(
    Mean = mean(median),
    Median = median(median),
    SD = sd(median),
    Min = min(median),
    Max = max(median),
    Count = n()
  )
kable(stats, caption = "Summary Statistics for STEM vs Non-STEM Majors") %>%
  kable_styling()
| STEM_Status | Mean | Median | SD | Min | Max | Count |
|---|---|---|---|---|---|---|
| Non-STEM | 45400 | 42000 | 9633.276 | 38000 | 62000 | 5 |
| STEM | 76000 | 75000 | 21621.748 | 55000 | 110000 | 5 |
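Interpretation:
STEM majors exceed non-STEM majors on both the mean ($76,000 vs $45,400)
and the median ($75,000 vs $42,000). The highest-earning major ($110,000)
belongs to a STEM field, and STEM salaries are also more variable, with a
standard deviation more than twice that of non-STEM majors.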
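To put the gap in the summary table on a standardized scale, the following is a minimal sketch that computes a pooled standard deviation and Cohen's d from the reported group means, SDs, and counts. Cohen's d is not part of the original analysis; it is added here as one assumed way of quantifying "practically meaningful", and the variable names are hypothetical.

```r
# Summary statistics copied from the table above
n_non  <- 5;  mean_non  <- 45400;  sd_non  <- 9633.276
n_stem <- 5;  mean_stem <- 76000;  sd_stem <- 21621.748

# Raw difference in group means (USD)
mean_diff <- mean_stem - mean_non

# Pooled SD weights each group's variance by its degrees of freedom
pooled_sd <- sqrt(((n_non - 1) * sd_non^2 + (n_stem - 1) * sd_stem^2) /
                    (n_non + n_stem - 2))

# Cohen's d: mean difference expressed in pooled-SD units
cohens_d <- mean_diff / pooled_sd

round(c(mean_diff = mean_diff, pooled_sd = pooled_sd, cohens_d = cohens_d), 2)
```

The difference works out to roughly 1.8 pooled standard deviations. With only five majors per group this estimate is imprecise, so it is best read as a descriptive complement to the raw dollar difference.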
ggplot(majors, aes(y = STEM_Status, x = median, fill = STEM_Status)) +
  stat_dots(side = "both", alpha = 0.7) +
  stat_interval(.width = c(.80, .95), size = 1.2) +
  scale_fill_manual(values = c("Non-STEM" = "#D0D0D0", "STEM" = "#3C6E71")) +
  scale_x_continuous(
    labels = function(x) paste0("$", x / 1000, "K"),
    breaks = seq(40000, 120000, 10000)
  ) +
  labs(
    title = "Median Salary Dot + Interval Plot",
    subtitle = "Shows individual salaries along with 80% and 95% interval estimates",
    x = "Median Salary (USD)",
    y = ""
  )
Interpretation:
This visualization reinforces the salary gap: the STEM intervals sit
further to the right, and STEM salaries are concentrated at the upper
end. With five majors per group, the intervals summarize the observed
values rather than provide a formal test.
ggplot(majors %>% arrange(median),
       aes(x = reorder(major, median), y = median, fill = STEM_Status)) +
  geom_col(alpha = 0.9) +
  geom_text(aes(label = dollar(median)), hjust = -0.1, size = 3.6) +
  coord_flip() +
  scale_fill_manual(values = c("Non-STEM" = "#E0E0E0", "STEM" = "#3C6E71")) +
  scale_y_continuous(labels = dollar, expand = expansion(mult = c(0, 0.15))) +
  labs(
    title = "Median Salary by Major (Ranked)",
    subtitle = "STEM majors dominate the upper portion of the ranking",
    x = "", y = "Median Salary (USD)"
  )
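Interpretation:
STEM majors hold the top three positions in the ranking, illustrating
strong practical differences in salary outcomes. The separation is not
complete: the highest-paid non-STEM major ($62,000) ranks above the two
lowest-paid STEM majors ($60,000 and $55,000).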
ggplot(majors, aes(x = unemployment_rate, y = median, color = STEM_Status)) +
  geom_point(size = 4, alpha = 0.8) +
  geom_smooth(method = "lm", linewidth = 1) +
  scale_color_manual(values = c("Non-STEM" = "#8E8E8E", "STEM" = "#284B63")) +
  labs(
    title = "Unemployment Rate vs Median Salary",
    subtitle = "STEM majors earn more across comparable unemployment rates",
    x = "Unemployment Rate", y = "Median Salary (USD)"
  ) +
  scale_y_continuous(labels = dollar)
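Interpretation:
Unemployment rate does not appear to account for the salary difference:
where the unemployment rates of the two groups overlap, STEM majors still
have higher median salaries. With five majors per group, the fitted lines
should be read as descriptive.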
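One way to examine this more directly is to include unemployment_rate as a covariate alongside STEM_Status. The following is a minimal sketch, not part of the original analysis; the helper name `fit_adjusted` is hypothetical, and it assumes a data frame with the `median`, `STEM_Status`, and `unemployment_rate` columns shown in the `glimpse()` output.

```r
# Hypothetical helper: regress median salary on STEM status while
# holding unemployment rate constant.
fit_adjusted <- function(df) {
  lm(median ~ STEM_Status + unemployment_rate, data = df)
}

# Usage with the data frame loaded earlier (requires college_majors.csv):
# adjusted <- fit_adjusted(majors)
# summary(adjusted)$coefficients
```

If the STEM_StatusSTEM coefficient stays close to its unadjusted value once unemployment_rate is in the model, that would support the interpretation; with 10 observations, the adjusted estimates will be imprecise.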
Null Hypothesis (H₀): μ_Non-STEM − μ_STEM = 0
Alternative Hypothesis (Hₐ): μ_Non-STEM − μ_STEM < 0
t_test <- t.test(median ~ STEM_Status,
                 data = majors,
                 alternative = "less",
                 var.equal = FALSE)
t_test
##
## Welch Two Sample t-test
##
## data: median by STEM_Status
## t = -2.8907, df = 5.5278, p-value = 0.0152
## alternative hypothesis: true difference in means between group Non-STEM and group STEM is less than 0
## 95 percent confidence interval:
## -Inf -9710.808
## sample estimates:
## mean in group Non-STEM mean in group STEM
## 45400 76000
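Interpretation:
The one-sided Welch t-test gives t = -2.89 on about 5.5 degrees of
freedom, with p = 0.0152. Because p is below 0.05, we reject the null
hypothesis in favor of the alternative: STEM majors have higher median
annual salaries than non-STEM majors. The one-sided 95% confidence bound
indicates that the STEM mean exceeds the non-STEM mean by at least about
$9,711.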
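The Welch statistic, its degrees of freedom, and the one-sided p-value can be reproduced by hand from the group summaries reported earlier. The following is a minimal sketch under that assumption; the variable names are hypothetical, and the SDs are taken at the rounding shown in the summary table, so results agree with the `t.test()` output only up to rounding.

```r
# Group summaries reported earlier in this analysis
n1 <- 5; m1 <- 45400; s1 <- 9633.276    # Non-STEM
n2 <- 5; m2 <- 76000; s2 <- 21621.748   # STEM

# Squared standard error contributed by each group
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch t statistic for the difference (Non-STEM minus STEM)
t_stat <- (m1 - m2) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# One-sided p-value for the alternative "difference < 0"
p_value <- pt(t_stat, df = df_welch)

# Upper bound of the one-sided 95% confidence interval
upper_bound <- (m1 - m2) + qt(0.95, df = df_welch) * sqrt(v1 + v2)

round(c(t = t_stat, df = df_welch, p = p_value, upper = upper_bound), 4)
```

The fractional degrees of freedom (about 5.5 rather than 8) reflect the unequal group variances, which is the reason for using the Welch form instead of the pooled-variance test.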
model <- lm(median ~ STEM_Status, data = majors)
summary(model)
##
## Call:
## lm(formula = median ~ STEM_Status, data = majors)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21000 -6900 -2200 2900 34000
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 45400 7485 6.065 0.000301 ***
## STEM_StatusSTEM 30600 10586 2.891 0.020179 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16740 on 8 degrees of freedom
## Multiple R-squared: 0.5109, Adjusted R-squared: 0.4497
## F-statistic: 8.356 on 1 and 8 DF, p-value: 0.02018
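Interpretation:
The regression coefficient for STEM_Status is positive and statistically
significant (p = 0.0202): STEM majors have a mean median salary $30,600
higher than the non-STEM baseline of $45,400. STEM classification
accounts for about 51% of the variance in median salary (R-squared =
0.5109). The p-value differs from the Welch test because the regression
test is two-sided and assumes equal variances.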
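The precision of the coefficient can be expressed as a confidence interval. The following is a minimal sketch that rebuilds a two-sided 95% interval and the R-squared value from the numbers printed in the `summary(model)` output; the variable names are hypothetical, and the inputs are the rounded values from that output.

```r
# Values copied from the summary(model) output above
estimate  <- 30600   # STEM_StatusSTEM coefficient
std_error <- 10586   # its standard error
df_resid  <- 8       # residual degrees of freedom

# Two-sided 95% confidence interval: estimate +/- t critical value * SE
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error

# R-squared recovered from the t statistic, valid for a single predictor
t_value   <- estimate / std_error
r_squared <- t_value^2 / (t_value^2 + df_resid)

round(c(lower = ci[1], upper = ci[2], r_squared = r_squared), 4)
```

When the fitted model object is available, `confint(model)` returns the same interval directly. The interval is wide (roughly $6,200 to $55,000), which reflects the small sample of 10 majors.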
STEM majors demonstrate a consistent and meaningful salary advantage: in this sample of 10 majors, their mean median salary is $30,600 higher than that of non-STEM majors. Visual and statistical evidence both support an affirmative answer to the research question.