Objective

This analysis investigates the relationship between education level, field of study, employment status, annual income, and perceived return on investment (ROI). Our goal is to understand how educational factors relate to income and job satisfaction, offering insights for students and policymakers.

Data Preparation

Load and preview the data to ensure we have all relevant columns for analysis.

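# Load the packages and the data
library(readxl)   # provides read_excel()
library(ggplot2)  # used for all plots below
df <- read_excel("~/Data/Education.xlsx")
head(df)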
## # A tibble: 6 × 9
##     Age Gender `Education Level` `Field of Study` `Employment Status`
##   <dbl> <chr>  <chr>             <chr>            <chr>              
## 1    20 Male   Master's          Business         Part-time          
## 2    36 Male   Bachelor's        Social Sciences  Part-time          
## 3    23 Female Master's          Business         Full-time          
## 4    24 Female Bachelor's        Humanities       Full-time          
## 5    21 Female Master's          STEM             Part-time          
## 6    24 Male   High School       Humanities       Part-time          
## # ℹ 4 more variables: `Annual Income (INR)` <dbl>, `Years to Employment` <dbl>,
## #   `Job Satisfaction (1-5)` <dbl>, `ROI (1-5)` <dbl>
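Because every later step refers to columns by their exact names, a quick programmatic check can confirm they are all present before any plotting. The sketch below is a minimal illustration; `missing_columns()` and `required_cols` are hypothetical names, not part of any package, and the small data frame is illustrative rather than survey data.

```r
# Hypothetical helper: return the required columns that are absent from a data frame
missing_columns <- function(data, required) {
  setdiff(required, names(data))
}

# Columns this analysis relies on (names as shown in the preview above)
required_cols <- c("Age", "Education Level", "Field of Study",
                   "Annual Income (INR)", "Years to Employment",
                   "Job Satisfaction (1-5)", "ROI (1-5)")

# Illustrative data frame (not the survey data) that lacks the ROI column;
# check.names = FALSE keeps column names containing spaces or brackets intact
example <- data.frame(Age = 20, check.names = FALSE)
missing_columns(example, c("Age", "ROI (1-5)"))
```

With the survey data loaded, `missing_columns(df, required_cols)` should return `character(0)`, since all of these columns appear in the preview.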

1. Histogram of Annual Income

This histogram displays the distribution of Annual Income among respondents.

ggplot(df, aes(x = `Annual Income (INR)`)) + 
  geom_histogram(bins = 20, fill = "steelblue", color = "black") +
  labs(title = "Histogram of Annual Income (INR)", x = "Annual Income (INR)", y = "Frequency") +
  theme_minimal()

Conclusion

The histogram shows a right-skewed distribution: most respondents earn within a moderate range (₹3,00,000 to ₹11,00,000), while a smaller number report considerably higher incomes.
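Right skew can also be checked numerically rather than by eye. A minimal sketch using the simple moment-based skewness estimator (no small-sample correction); `sample_skewness()` is a hypothetical helper and the income values are illustrative, not taken from the survey.

```r
# Moment-based sample skewness: positive values indicate a longer right tail
sample_skewness <- function(x) {
  x <- x[!is.na(x)]      # ignore missing values
  d <- x - mean(x)       # deviations from the mean
  mean(d^3) / mean(d^2)^1.5
}

# Illustrative incomes (not the survey data): one large value creates a right tail
incomes <- c(300000, 400000, 450000, 500000, 2000000)
sample_skewness(incomes)
mean(incomes) > median(incomes)  # TRUE: the tail pulls the mean above the median
```

Applied to the survey, `sample_skewness(df$`Annual Income (INR)`)` should be positive if the distribution is right-skewed as the histogram suggests.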

2. Distribution of Education Levels

The bar chart shows the number of respondents at each Education Level.

ggplot(df, aes(x = `Education Level`, fill = `Education Level`)) +
  geom_bar() +
  labs(title = "Distribution of Education Levels", x = "Education Level", y = "Count") +
  theme_minimal() +
  scale_fill_brewer(palette = "Pastel1")

Conclusion

Most respondents hold a Bachelor’s or Master’s degree, making these the most common qualifications in the sample.
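The bar chart reports raw counts; the share of respondents at each level can be tabulated with base R's `table()` and `prop.table()`. A minimal sketch; `education_shares()` is a hypothetical helper and the input vector is illustrative, not survey data.

```r
# Share of respondents at each education level, as proportions summing to 1
education_shares <- function(levels) {
  round(prop.table(table(levels)), 3)
}

# Illustrative input (not the survey data)
shares <- education_shares(c("Bachelor's", "Master's", "Master's", "PhD"))
shares
```

For the survey itself, the call would be `education_shares(df$`Education Level`)`.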

3. Boxplot of Annual Income by Education Level

The boxplot compares Annual Income across different Education Levels.

ggplot(df, aes(x = `Education Level`, y = `Annual Income (INR)`, fill = `Education Level`)) +
  geom_boxplot() +
  labs(title = "Boxplot of Annual Income by Education Level", x = "Education Level", y = "Annual Income (INR)") +
  theme_minimal() +
  scale_fill_brewer(palette = "Pastel1")

Conclusion

Median income increases visibly with education level, suggesting that advanced degrees (such as Master’s degrees and PhDs) are associated with better income prospects.
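The medians drawn in the boxplot can be computed directly with `tapply()`, which makes the ordering of education levels explicit. A minimal sketch; `median_income_by_level()` is a hypothetical helper and the values are illustrative, not survey data.

```r
# Median income per education level, sorted from lowest to highest
median_income_by_level <- function(income, level) {
  m <- tapply(income, level, median, na.rm = TRUE)  # one median per level
  sort(setNames(as.vector(m), names(m)))            # plain named vector, ascending
}

# Illustrative values (not the survey data)
income <- c(300000, 350000, 600000, 650000, 1000000, 1100000)
level  <- c("High School", "High School", "Bachelor's", "Bachelor's", "Master's", "Master's")
median_income_by_level(income, level)
```

With the survey data: `median_income_by_level(df$`Annual Income (INR)`, df$`Education Level`)`.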

4. Scatter Plot: Annual Income vs. Years to Employment

This scatter plot with a regression line visualizes the relationship between Years to Employment and Annual Income.

ggplot(df, aes(x = `Years to Employment`, y = `Annual Income (INR)`)) +
  geom_point(color = "steelblue") +
  geom_smooth(method = "lm", color = "black") +
  labs(title = "Annual Income vs. Years to Employment", x = "Years to Employment", y = "Annual Income (INR)") +
  theme_minimal()

Conclusion

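The fitted line slopes downward: shorter times to employment are associated with higher incomes. One possible explanation is that respondents who find work quickly start on stronger career trajectories. However, this simple trend does not account for other factors; in the multiple regression in section 9, the Years to Employment coefficient is not statistically significant (p ≈ 0.56) once Education Level and Field of Study are included.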
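The slope of the fitted line and its p-value can be read off a simple regression, which makes it easy to compare with the multiple regression in section 9. A minimal sketch; `simple_slope()` is a hypothetical helper and the values are illustrative, not survey data.

```r
# Slope and p-value of a simple linear regression of y on x
simple_slope <- function(x, y) {
  fit <- lm(y ~ x)
  est <- summary(fit)$coefficients["x", ]  # Estimate, Std. Error, t value, Pr(>|t|)
  c(slope = unname(est["Estimate"]), p_value = unname(est["Pr(>|t|)"]))
}

# Illustrative values (not the survey data)
res <- simple_slope(1:6, c(10, 9.5, 8, 7.5, 6, 5.2))
res
```

With the survey data, `simple_slope(df$`Years to Employment`, df$`Annual Income (INR)`)` gives the slope of the line in the plot, before education and field are controlled for.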

5. Correlation Heatmap

The heatmap shows pairwise Pearson correlations between Age, Annual Income, Years to Employment, Job Satisfaction, and ROI.

# Pearson correlation matrix of the numeric variables
corr_data <- cor(df[, c("Age", "Annual Income (INR)", "Years to Employment", "Job Satisfaction (1-5)", "ROI (1-5)")])
# Reshape to long format (columns Var1, Var2, value) for geom_tile()
melted_corr <- reshape2::melt(corr_data)
ggplot(data = melted_corr, aes(x = Var1, y = Var2, fill = value)) + 
  geom_tile() +
  scale_fill_gradient2(low = "steelblue", mid = "white", high = "indianred", midpoint = 0, limits = c(-1, 1), space = "Lab", name = "Correlation") +
  geom_text(aes(label = round(value, 2)), color = "black", size = 5) +
  labs(title = "Correlation Heatmap") +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
        axis.text.x = element_text(angle = 90, hjust = 1))

Conclusion

Annual Income shows moderate positive correlations with both Job Satisfaction and ROI, suggesting that higher income is associated with greater job satisfaction and a better perception of education’s return on investment.
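The heatmap shows the size of each correlation but not whether it is distinguishable from zero. Base R's `cor.test()` adds a significance test for a single pair. A minimal sketch; `correlation_with_p()` is a hypothetical helper and the values are illustrative, not survey data.

```r
# Pearson correlation with a significance test for one pair of variables
correlation_with_p <- function(x, y) {
  test <- cor.test(x, y, method = "pearson")
  c(r = unname(test$estimate), p_value = test$p.value)
}

# Illustrative values (not the survey data)
res <- correlation_with_p(c(1, 2, 3, 4, 5), c(2, 1, 4, 3, 5))
res
```

For the income and satisfaction pair in the survey: `correlation_with_p(df$`Annual Income (INR)`, df$`Job Satisfaction (1-5)`)`.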

6. Violin Plot of ROI by Field of Study

This violin plot displays ROI distributions across different Fields of Study.

ggplot(df, aes(x = `Field of Study`, y = `ROI (1-5)`, fill = `Field of Study`)) +
  geom_violin(trim = FALSE) +
  labs(title = "Violin Plot of ROI by Field of Study", x = "Field of Study", y = "ROI (1-5)") +
  theme_minimal() +
  scale_fill_brewer(palette = "Pastel1")

Conclusion

The plot shows that STEM respondents generally report higher and more consistent perceived ROI scores, suggesting that they view their education as a stronger investment than respondents in other fields, such as Humanities, whose scores vary more widely.
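"Higher and more consistent" can be quantified as a higher median and a smaller interquartile range (IQR). A minimal sketch; `roi_summary()` is a hypothetical helper and the scores are illustrative, not survey data.

```r
# Median and interquartile range of ROI scores for each field of study
roi_summary <- function(roi, field) {
  med <- tapply(roi, field, median)
  iqr <- tapply(roi, field, IQR)
  data.frame(field = names(med), median = as.vector(med), iqr = as.vector(iqr))
}

# Illustrative scores (not the survey data)
s <- roi_summary(c(5, 5, 4, 1, 3, 5), c("STEM", "STEM", "STEM", "Hum", "Hum", "Hum"))
s
```

With the survey data: `roi_summary(df$`ROI (1-5)`, df$`Field of Study`)`. A higher median with a smaller IQR would support the reading of the violin plot.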

7. ANOVA for Annual Income by Education Level

A one-way ANOVA tests whether mean Annual Income differs across Education Levels.

aov_result <- aov(`Annual Income (INR)` ~ `Education Level`, data = df)
summary(aov_result)
##                    Df    Sum Sq   Mean Sq F value   Pr(>F)    
## `Education Level`   3 1.446e+13 4.819e+12    32.7 3.19e-15 ***
## Residuals         111 1.636e+13 1.474e+11                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion

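The ANOVA indicates that mean income differs significantly across education levels (F(3, 111) = 32.7, p = 3.19e-15). This supports the boxplot's suggestion that advanced degrees are linked to higher income, although ANOVA alone shows only that at least one group mean differs, not which pairs of levels differ.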
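A post-hoc test identifies which pairs of education levels differ. The sketch below uses Tukey's Honest Significant Difference test from base R (`TukeyHSD()`), which compares all pairs of group means while adjusting p-values for multiple comparisons. `tukey_pairs()` is a hypothetical helper and the values are illustrative, not survey data.

```r
# Tukey's HSD: all pairwise differences in group means, with adjusted p-values
tukey_pairs <- function(income, level) {
  group <- factor(level)
  fit <- aov(income ~ group)
  TukeyHSD(fit)$group   # matrix with columns diff, lwr, upr, p adj
}

# Illustrative values (not the survey data): B is higher than A and C
income <- c(1, 2, 3, 11, 12, 13, 1, 2, 3)
level  <- rep(c("A", "B", "C"), each = 3)
tukey_pairs(income, level)
```

For the model already fitted above, `TukeyHSD(aov_result)` should apply the same test directly to the education levels.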

8. T-Test for Bachelor’s and Master’s Income

A two-sample Student’s t-test (which assumes equal variances, via var.equal = TRUE) compares mean income between Bachelor’s and Master’s degree holders.

bachelors_income <- subset(df, `Education Level` == "Bachelor's")$`Annual Income (INR)`
masters_income <- subset(df, `Education Level` == "Master's")$`Annual Income (INR)`
t_test_result <- t.test(bachelors_income, masters_income, var.equal = TRUE)
t_test_result
## 
##  Two Sample t-test
## 
## data:  bachelors_income and masters_income
## t = -5.0859, df = 87, p-value = 2.081e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -590287.9 -258552.4
## sample estimates:
## mean of x mean of y 
##  625054.7 1049474.8

Conclusion

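The t-test shows a statistically significant difference (t = -5.09, df = 87, p = 2.08e-06): Master’s degree holders earn on average about ₹4,24,420 more than Bachelor’s degree holders (95% CI: ₹2,58,552 to ₹5,90,288), indicating that higher degrees are associated with higher earnings.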
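A p-value indicates that the difference is unlikely to be zero but not how large it is relative to the spread of incomes. Cohen's d expresses the mean difference in pooled standard deviations. A minimal sketch; `cohens_d()` is a hypothetical helper written for illustration.

```r
# Cohen's d using the pooled standard deviation (matches the equal-variance t-test)
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

cohens_d(c(1, 2, 3), c(2, 3, 4))  # -1: the means differ by one pooled SD
```

With the vectors defined above, `cohens_d(bachelors_income, masters_income)` would be negative because Master's incomes are higher. If equal variances are doubtful, calling `t.test(bachelors_income, masters_income)` without `var.equal = TRUE` runs Welch's t-test instead.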

9. Regression Analysis

Multiple linear regression examines how Annual Income is associated with Years to Employment, Education Level, and Field of Study.

lm_model <- lm(`Annual Income (INR)` ~ `Years to Employment` + `Education Level` + `Field of Study`, data = df)
summary(lm_model)
## 
## Call:
## lm(formula = `Annual Income (INR)` ~ `Years to Employment` + 
##     `Education Level` + `Field of Study`, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -966169 -230874  -10224  172380 1111054 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       717673     123209   5.825 6.08e-08 ***
## `Years to Employment`             -28852      49496  -0.583 0.561183    
## `Education Level`High School     -291667     124237  -2.348 0.020730 *  
## `Education Level`Master's         392211     107366   3.653 0.000403 ***
## `Education Level`PhD              926250     143690   6.446 3.38e-09 ***
## `Field of Study`Humanities        -35010     119011  -0.294 0.769196    
## `Field of Study`Social Sciences    -8878     118464  -0.075 0.940403    
## `Field of Study`STEM              -68762     104150  -0.660 0.510528    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 389100 on 107 degrees of freedom
## Multiple R-squared:  0.4743, Adjusted R-squared:  0.4399 
## F-statistic: 13.79 on 7 and 107 DF,  p-value: 1.28e-12

Conclusion

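Education Level is the only significant predictor of income in this model. Relative to Bachelor’s degree holders (the reference level), Master’s holders earn about ₹3,92,211 more (p < 0.001) and PhD holders about ₹9,26,250 more (p < 0.001), while High School graduates earn about ₹2,91,667 less (p ≈ 0.021), holding the other variables constant. Neither Years to Employment nor Field of Study is significant, and the model explains about 47% of the variance in income (R² = 0.474, adjusted R² = 0.440).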
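To show how the coefficients combine, the sketch below reproduces the model's prediction by hand, using the estimates copied from the `summary(lm_model)` output above. It assumes Bachelor's and Business are the reference levels, since they are the levels missing from the coefficient table (R's default treatment coding). `predicted_income()` and the two lookup vectors are hypothetical names; with the fitted model, `predict(lm_model, newdata = ...)` is the usual route.

```r
# Estimates copied from summary(lm_model); reference levels (Bachelor's, Business) are 0
edu_effect <- c("Bachelor's" = 0, "High School" = -291667,
                "Master's" = 392211, "PhD" = 926250)
field_effect <- c("Business" = 0, "Humanities" = -35010,
                  "Social Sciences" = -8878, "STEM" = -68762)

# Hypothetical helper: intercept + slope * years + education effect + field effect
predicted_income <- function(years, education, field) {
  unname(717673 + (-28852) * years + edu_effect[education] + field_effect[field])
}

predicted_income(1, "Master's", "Business")  # 717673 - 28852 + 392211 = 1081032
```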

10. Cluster Analysis

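K-means clustering segments respondents based on Annual Income, Job Satisfaction, and ROI. The variables are not standardized, so Annual Income, measured in rupees, dominates the distances between respondents, and the clusters mainly separate respondents by income.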

set.seed(123)  # For reproducibility
df_cluster <- df[, c("Annual Income (INR)", "ROI (1-5)", "Job Satisfaction (1-5)")]
kmeans_result <- kmeans(df_cluster, centers = 3)
df$Cluster <- as.factor(kmeans_result$cluster)

# Visualize clusters
ggplot(df, aes(x = `Job Satisfaction (1-5)`, y = `Annual Income (INR)`, color = Cluster)) +
  geom_point() +
  labs(title = "Cluster Analysis: Income vs. Job Satisfaction", x = "Job Satisfaction (1-5)", y = "Annual Income (INR)") +
  theme_minimal() +
  scale_color_brewer(palette = "Pastel1", name = "Cluster")

Conclusion

Cluster analysis reveals three distinct groups:

  • Cluster 3: High-income and high-satisfaction individuals, typically with advanced degrees.
  • Cluster 2: Moderate-income individuals with high ROI perception.
  • Cluster 1: Lower-income respondents with mixed satisfaction and ROI perceptions.

These clusters offer a rough typology of respondents based on income, satisfaction, and perceived ROI. Cluster numbers are arbitrary labels assigned by k-means, so they may change with a different seed or dataset.
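To let Job Satisfaction and ROI influence the clustering as much as income, each variable can be standardized (z-scored) before running k-means. A minimal sketch; `cluster_standardized()` is a hypothetical helper, `nstart = 25` is an illustrative choice of random restarts, and the toy data are not survey data.

```r
# K-means on standardized variables so each contributes comparably to distances
cluster_standardized <- function(data, k = 3, seed = 123) {
  set.seed(seed)                             # k-means starting centers are random
  scaled <- scale(as.matrix(data))           # z-score each column (mean 0, sd 1)
  kmeans(scaled, centers = k, nstart = 25)   # keep the best of 25 random starts
}

# Illustrative data (not the survey data): two clear groups
toy <- data.frame(
  income       = c(300000, 320000, 310000, 900000, 920000, 910000),
  satisfaction = c(2, 1, 2, 5, 4, 5)
)
fit <- cluster_standardized(toy, k = 2)
fit$cluster
```

Running `cluster_standardized(df_cluster)` on the survey data may produce different groupings from those described above, so the clusters would need to be re-interpreted.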


Final Conclusion

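This analysis shows that higher education is strongly associated with higher income: mean income differs significantly across education levels, and Master’s and PhD holders earn substantially more than Bachelor’s holders. Field of Study shows no significant association with income once education level is controlled for, although STEM respondents report higher perceived ROI, and higher income goes together with greater job satisfaction and perceived ROI. Advanced degrees are therefore associated with better economic outcomes, suggesting they can be a valuable investment for students seeking career growth.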