Abstract

This study examines whether college majors classified as STEM (Science, Technology, Engineering, and Mathematics) are associated with higher median annual salaries than non-STEM majors in the United States. Using a dataset of 10 majors with median earnings, unemployment rates, and major categories, the analysis combines exploratory visualizations, summary statistics, null hypothesis testing, and regression modeling. Distribution-focused visualizations from the ggdist package support the interpretation of group differences. Results show a statistically significant and practically meaningful salary advantage for STEM majors: a $30,600 difference in group means (one-sided Welch t-test, p = 0.015).

1. Introduction

Selecting a college major influences future career opportunities and long-term earning potential. STEM disciplines are widely considered to provide higher economic returns. This project evaluates that claim using statistical evidence.

Research Question

Do STEM majors earn higher median annual salaries than non-STEM majors?

Variables

  • Cases: U.S. college majors
  • Response variable: median – median annual salary
  • Explanatory variable: STEM_Status – STEM vs Non-STEM classification
  • Additional variables: major, major_category, unemployment_rate

2. Data Loading and Preparation

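# Packages used in this report: tidyverse (readr, dplyr, ggplot2),
# ggdist (half-eye, dot, and interval plots), scales (dollar labels),
# knitr and kableExtra (summary table)
library(tidyverse)
library(ggdist)
library(scales)
library(knitr)
library(kableExtra)
majors <- read_csv("college_majors.csv")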
glimpse(majors)
## Rows: 10
## Columns: 5
## $ major             <chr> "Petroleum Engineering", "Mining Engineering", "Civi…
## $ major_category    <chr> "Engineering", "Engineering", "Engineering", "Comput…
## $ median            <dbl> 110000, 75000, 60000, 80000, 55000, 40000, 42000, 38…
## $ unemployment_rate <dbl> 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.06, 0.08, 0.04…
## $ STEM_Status       <chr> "STEM", "STEM", "STEM", "STEM", "STEM", "Non-STEM", …
majors <- majors %>%
  mutate(STEM_Status = factor(STEM_Status, levels = c("Non-STEM", "STEM")))
colSums(is.na(majors))
##             major    major_category            median unemployment_rate 
##                 0                 0                 0                 0 
##       STEM_Status 
##                 0
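
The explicit `levels` argument above fixes Non-STEM as the first factor level, which R then treats as the reference group in the t-test and the regression. The following minimal sketch illustrates this with a small hypothetical vector (`status_example` is invented for illustration and is not part of the dataset).

```r
# Hypothetical labels for illustration only; not rows from college_majors.csv
status_example <- c("STEM", "Non-STEM", "STEM")

# Explicit level order, mirroring the preparation step above
status_factor <- factor(status_example, levels = c("Non-STEM", "STEM"))
level_order <- levels(status_factor)

# The first level is the reference group: t.test() reports
# (first level - second level), and lm() reports a coefficient for the
# second level relative to the first
reference_group <- level_order[1]

# Underlying integer codes: Non-STEM = 1, STEM = 2
level_codes <- as.integer(status_factor)

print(level_order)
print(reference_group)
print(level_codes)
```

Setting the order explicitly documents the intended direction of every later comparison rather than leaving it to the default alphabetical ordering.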

3. Exploratory Data Analysis

3.1 Distribution of Median Salaries

ggplot(majors, aes(x = median)) +
  geom_density(fill = "#3C6E71", alpha = 0.45, color = "#284B63") +
  geom_rug(alpha = 0.2) +
  labs(
    title = "Distribution of Median Salaries Across Majors",
    subtitle = "The right-skew suggests a subset of majors with notably higher earnings",
    x = "Median Salary (USD)", y = "Density"
  ) +
  scale_x_continuous(labels = dollar)

Interpretation:
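The distribution shows substantial variability in median earnings across majors. The right-skewed shape reflects a small number of majors, primarily STEM, with considerably higher salaries. With only 10 observations, the density curve is a rough guide to shape rather than a precise estimate.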


3.2 Half-Eye Plot (ggdist)

ggplot(majors, aes(x = median, y = STEM_Status, fill = STEM_Status)) +
  stat_halfeye(alpha = 0.7, adjust = 0.7) +
  scale_fill_manual(values = c("Non-STEM"="#C4C4C4","STEM"="#3C6E71")) +
  labs(
    title = "Salary Distribution by STEM Classification (Half-Eye Plot)",
    subtitle = "Combines distribution shape with interval estimates",
    x = "Median Salary (USD)", y = ""
  ) +
  scale_x_continuous(labels = dollar)

Interpretation:
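The half-eye plot shows STEM majors centered well above non-STEM majors (group medians of $75,000 and $42,000), with the STEM distribution shifted to the right. The groups are not fully separated: the highest non-STEM median ($62,000) exceeds the two lowest STEM medians ($55,000 and $60,000).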


3.3 Summary Statistics

stats <- majors %>%
  group_by(STEM_Status) %>%
  summarise(
    Mean = mean(median),
    Median = median(median),
    SD = sd(median),
    Min = min(median),
    Max = max(median),
    Count = n()
  )

kable(stats, caption = "Summary Statistics for STEM vs Non-STEM Majors") %>%
  kable_styling()
Summary Statistics for STEM vs Non-STEM Majors

STEM_Status     Mean   Median          SD     Min      Max   Count
Non-STEM       45400    42000    9633.276   38000    62000       5
STEM           76000    75000   21621.748   55000   110000       5

Interpretation:
STEM majors have both the higher mean ($76,000 vs $45,400) and the higher median ($75,000 vs $42,000). STEM salaries are also more variable (SD of about $21,600 vs $9,600), and the highest-earning major ($110,000) belongs to a STEM field.
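
To put the size of the gap in context, the summary statistics can be turned into a standardized effect size. The sketch below computes Cohen's d with a pooled standard deviation; this measure is not part of the original analysis and is shown only as one common way to express practical significance. All inputs are copied from the summary table above.

```r
# Group summaries copied from the summary statistics table
mean_non <- 45400;  sd_non <- 9633.276;   n_non <- 5
mean_stem <- 76000; sd_stem <- 21621.748; n_stem <- 5

# Raw gap in group means (USD)
mean_gap <- mean_stem - mean_non

# Pooled standard deviation across the two groups
pooled_sd <- sqrt(((n_non - 1) * sd_non^2 + (n_stem - 1) * sd_stem^2) /
                    (n_non + n_stem - 2))

# Cohen's d: the mean gap expressed in pooled-SD units
cohens_d <- mean_gap / pooled_sd

# STEM mean as a multiple of the non-STEM mean
mean_ratio <- mean_stem / mean_non

print(round(c(gap = mean_gap, pooled_sd = pooled_sd,
              d = cohens_d, ratio = mean_ratio), 2))
```

A d of roughly 1.8 would conventionally be described as large, although an estimate based on five majors per group is itself imprecise.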


3.4 Dot + Interval Plot (ggdist)

ggplot(majors, aes(y = STEM_Status, x = median, fill = STEM_Status)) +
  stat_dots(side = "both", alpha = 0.7) +
  stat_interval(.width = c(.80, .95), size = 1.2) +
  scale_fill_manual(values = c("Non-STEM" = "#D0D0D0", "STEM" = "#3C6E71")) +
  scale_x_continuous(
    labels = function(x) paste0("$", x/1000, "K"),
    breaks = seq(40000, 120000, 10000)
  ) +
  labs(
    title = "Median Salary Dot + Interval Plot",
    subtitle = "Shows individual salaries along with 80% and 95% interval estimates",
    x = "Median Salary (USD)",
    y = ""
  )

Interpretation:
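This visualization shows the salary gap at the level of individual majors: STEM salaries extend to the upper end of the scale, while most non-STEM salaries cluster near $40,000. The intervals summarize the spread of the observed salaries, not uncertainty about the group means, and the two groups overlap between $55,000 and $62,000.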
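
The intervals in this plot can be reproduced by hand. The sketch below assumes that `stat_interval()` is using its default quantile-based interval, and it computes the 80% and 95% central intervals for the five STEM salaries shown in the `glimpse()` output. The helper `quantile_interval()` is a hypothetical name introduced here for illustration.

```r
# STEM salaries as listed in the glimpse() output (first five rows)
stem_salaries <- c(110000, 75000, 60000, 80000, 55000)

# Central quantile interval covering `width` of the observed values
# (hypothetical helper; uses base R's default quantile definition)
quantile_interval <- function(x, width) {
  tail_prob <- (1 - width) / 2
  unname(quantile(x, probs = c(tail_prob, 1 - tail_prob)))
}

interval_80 <- quantile_interval(stem_salaries, 0.80)
interval_95 <- quantile_interval(stem_salaries, 0.95)

print(interval_80)  # approximately 57000 and 98000
print(interval_95)  # approximately 55500 and 107000
```

With only five values per group, these quantiles are interpolated between neighboring observations, so the intervals track the observed range closely.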


3.5 Ranked Salary Chart

ggplot(majors %>% arrange(median),
       aes(x = reorder(major, median), y = median, fill = STEM_Status)) +
  geom_col(alpha=0.9) +
  geom_text(aes(label=dollar(median)), hjust=-0.1, size=3.6) +
  coord_flip() +
  scale_fill_manual(values=c("#E0E0E0","#3C6E71")) +
  scale_y_continuous(labels=dollar, expand=expansion(mult=c(0,0.15))) +
  labs(
    title = "Median Salary by Major (Ranked)",
    subtitle = "STEM majors dominate the upper portion of the ranking",
    x = "", y = "Median Salary (USD)"
  )

Interpretation:
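STEM majors hold the top three positions in the ranking ($110,000, $80,000, and $75,000), illustrating a practical difference in salary outcomes. The ranking is not perfectly sorted by group: the highest non-STEM median ($62,000) places fourth, ahead of two STEM majors.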


3.6 Salary vs Unemployment Rate

ggplot(majors, aes(x = unemployment_rate, y = median, color = STEM_Status)) +
  geom_point(size=4, alpha=0.8) +
  geom_smooth(method="lm", linewidth=1) +
  scale_color_manual(values=c("Non-STEM"="#8E8E8E","STEM"="#284B63")) +
  labs(
    title = "Unemployment Rate vs Median Salary",
    subtitle = "STEM majors earn more across comparable unemployment rates",
    x = "Unemployment Rate", y = "Median Salary (USD)"
  ) +
  scale_y_continuous(labels=dollar)

Interpretation:
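Where the unemployment rates of the two groups overlap, STEM majors still show higher median salaries, which suggests that the gap is not explained by unemployment alone. With five majors per group, the fitted lines are imprecise, so this plot is best read as descriptive.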


4. Statistical Analysis

4.1 Hypothesis Test

Null Hypothesis (H₀): μ_Non-STEM − μ_STEM = 0
Alternative Hypothesis (Hₐ): μ_Non-STEM − μ_STEM < 0

t_test <- t.test(median ~ STEM_Status,
                 data=majors,
                 alternative="less",
                 var.equal=FALSE)
t_test
## 
##  Welch Two Sample t-test
## 
## data:  median by STEM_Status
## t = -2.8907, df = 5.5278, p-value = 0.0152
## alternative hypothesis: true difference in means between group Non-STEM and group STEM is less than 0
## 95 percent confidence interval:
##       -Inf -9710.808
## sample estimates:
## mean in group Non-STEM     mean in group STEM 
##                  45400                  76000

Interpretation:
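The one-sided Welch t-test gives t = -2.89 on about 5.5 degrees of freedom, with p = 0.015, which is below the 0.05 threshold. We therefore reject the null hypothesis in favor of the alternative: the data provide evidence that the mean of median annual salaries is higher for STEM majors than for non-STEM majors. The one-sided 95% confidence bound places the Non-STEM minus STEM difference below -$9,711.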
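
As a check on the output above, the Welch statistic can be rebuilt from the group summaries alone. The sketch below uses only the means, standard deviations, and counts from the summary statistics table, together with the standard Welch-Satterthwaite formula for the degrees of freedom. Variable names are chosen here for illustration.

```r
# Group summaries taken from the summary statistics table
mean_non <- 45400;  sd_non <- 9633.276;   n_non <- 5
mean_stem <- 76000; sd_stem <- 21621.748; n_stem <- 5

# Variance of each group's sample mean
var_mean_non <- sd_non^2 / n_non
var_mean_stem <- sd_stem^2 / n_stem

# Standard error of the difference and Welch t statistic (Non-STEM - STEM)
se_diff <- sqrt(var_mean_non + var_mean_stem)
t_stat <- (mean_non - mean_stem) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (var_mean_non + var_mean_stem)^2 /
  (var_mean_non^2 / (n_non - 1) + var_mean_stem^2 / (n_stem - 1))

# One-sided p-value for the alternative "difference < 0"
p_value <- pt(t_stat, df = df_welch)

# Upper bound of the one-sided 95% confidence interval
ci_upper <- (mean_non - mean_stem) + qt(0.95, df = df_welch) * se_diff

print(round(c(t = t_stat, df = df_welch, p = p_value, ci_upper = ci_upper), 4))
```

The results should agree with the `t.test()` output up to rounding of the standard deviations.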


5. Regression Model

model <- lm(median ~ STEM_Status, data = majors)
summary(model)
## 
## Call:
## lm(formula = median ~ STEM_Status, data = majors)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -21000  -6900  -2200   2900  34000 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        45400       7485   6.065 0.000301 ***
## STEM_StatusSTEM    30600      10586   2.891 0.020179 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16740 on 8 degrees of freedom
## Multiple R-squared:  0.5109, Adjusted R-squared:  0.4497 
## F-statistic: 8.356 on 1 and 8 DF,  p-value: 0.02018

Interpretation:
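The coefficient for STEM_Status is positive and statistically significant (two-sided p = 0.020): STEM majors are associated with a median salary that is $30,600 higher on average than the non-STEM baseline of $45,400. STEM status accounts for about 51% of the variance in median salary (R-squared = 0.51). Because the model has a single binary predictor, it is equivalent to a pooled-variance two-sample t-test, and with equal group sizes its t value (2.891) matches the Welch statistic in magnitude.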
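
The regression output can also be recovered from the group summaries, which makes the link to a pooled-variance t-test explicit. The sketch below is an illustration under the assumption of a single binary predictor, using values from the summary statistics table. Variable names are chosen here and do not come from the original analysis.

```r
# Group summaries taken from the summary statistics table
mean_non <- 45400;  sd_non <- 9633.276;   n_non <- 5
mean_stem <- 76000; sd_stem <- 21621.748; n_stem <- 5

# lm() assumes one common error variance, so the two SDs are pooled;
# the pooled SD is the residual standard error
df_resid <- n_non + n_stem - 2
resid_se <- sqrt(((n_non - 1) * sd_non^2 + (n_stem - 1) * sd_stem^2) / df_resid)

# Intercept = reference-group mean; slope = difference in group means
intercept <- mean_non
slope <- mean_stem - mean_non

# Standard error, t value, and two-sided p-value for the slope
se_slope <- resid_se * sqrt(1 / n_non + 1 / n_stem)
t_slope <- slope / se_slope
p_slope <- 2 * pt(-abs(t_slope), df = df_resid)

# With a single predictor, R-squared follows from t and the residual df
r_squared <- t_slope^2 / (t_slope^2 + df_resid)

print(round(c(slope = slope, se = se_slope, t = t_slope,
              p = p_slope, r2 = r_squared), 4))
```

The regression p-value (0.020) is larger than the t-test p-value (0.015) mainly because the regression reports a two-sided test, while the hypothesis test in Section 4.1 is one-sided.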


6. Discussion

Key Findings

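  • STEM majors earn substantially higher salaries: a mean of $76,000 versus $45,400 for non-STEM majors.
  • Multiple visualizations, including the ggdist plots, show the same pattern.
  • The hypothesis test (one-sided p = 0.015) and the regression (two-sided p = 0.020) both find the difference statistically significant.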

Limitations

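  • Small dataset: 10 majors, with 5 in each group
  • Observational data: the results show an association, not a causal effect of choosing a STEM major
  • Median salary does not account for geographic variation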
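
The effect of the small sample can be made concrete with a two-sided confidence interval for the difference in means. This interval is not part of the original analysis, which used a one-sided test; the sketch below builds it from the summary statistics table under the same Welch approximation.

```r
# Group summaries taken from the summary statistics table
mean_non <- 45400;  sd_non <- 9633.276;   n_non <- 5
mean_stem <- 76000; sd_stem <- 21621.748; n_stem <- 5

# Difference in means (STEM - Non-STEM) and its standard error
diff_means <- mean_stem - mean_non
var_mean_non <- sd_non^2 / n_non
var_mean_stem <- sd_stem^2 / n_stem
se_diff <- sqrt(var_mean_non + var_mean_stem)

# Welch-Satterthwaite degrees of freedom
df_welch <- (var_mean_non + var_mean_stem)^2 /
  (var_mean_non^2 / (n_non - 1) + var_mean_stem^2 / (n_stem - 1))

# Two-sided 95% confidence interval for the difference
half_width <- qt(0.975, df = df_welch) * se_diff
ci_two_sided <- c(lower = diff_means - half_width,
                  upper = diff_means + half_width)

print(round(ci_two_sided))
```

The interval excludes zero but spans from a few thousand dollars to more than $50,000, so the direction of the gap is better supported than its exact size.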

Conclusion

In this sample, STEM majors show a consistent and meaningful salary advantage. The visual and statistical evidence agree on the answer to the research question, although a larger dataset would be needed to estimate the size of the gap precisely.