This analysis investigates the relationship between education level, field of study, employment status, annual income, and perceived return on investment (ROI). Our goal is to understand how educational factors relate to income and job satisfaction, offering insights for students and policymakers.
Load and preview the data to ensure we have all relevant columns for analysis.
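# Load the packages and data
library(readxl)   # read_excel()
library(ggplot2)  # all plots below
library(reshape2) # melt() for the correlation heatmap
df <- read_excel("~/Data/Education.xlsx")
head(df)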
## # A tibble: 6 × 9
## Age Gender `Education Level` `Field of Study` `Employment Status`
## <dbl> <chr> <chr> <chr> <chr>
## 1 20 Male Master's Business Part-time
## 2 36 Male Bachelor's Social Sciences Part-time
## 3 23 Female Master's Business Full-time
## 4 24 Female Bachelor's Humanities Full-time
## 5 21 Female Master's STEM Part-time
## 6 24 Male High School Humanities Part-time
## # ℹ 4 more variables: `Annual Income (INR)` <dbl>, `Years to Employment` <dbl>,
## # `Job Satisfaction (1-5)` <dbl>, `ROI (1-5)` <dbl>
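Before plotting, it can help to confirm that every column used later actually exists and to count missing values per column. The sketch below does this in base R; `required_cols` is a hypothetical helper, and the small `toy` data frame is made-up stand-in data (one income deliberately set to `NA`), not the survey. On the real data, replace `toy` with `df`.

```r
# Columns the rest of the analysis relies on (names copied from the head(df) output)
required_cols <- c("Age", "Gender", "Education Level", "Field of Study",
                   "Employment Status", "Annual Income (INR)",
                   "Years to Employment", "Job Satisfaction (1-5)", "ROI (1-5)")

# Hypothetical toy data standing in for df; one income value is missing on purpose
toy <- data.frame(
  "Age" = c(25, 31, 28),
  "Gender" = c("Male", "Female", "Female"),
  "Education Level" = c("Master's", "Bachelor's", "PhD"),
  "Field of Study" = c("Business", "STEM", "Humanities"),
  "Employment Status" = c("Part-time", "Full-time", "Full-time"),
  "Annual Income (INR)" = c(800000, NA, 950000),
  "Years to Employment" = c(1, 2, 1),
  "Job Satisfaction (1-5)" = c(4, 3, 5),
  "ROI (1-5)" = c(4, 3, 5),
  check.names = FALSE  # keep the spaces and parentheses in column names
)

# Which required columns are absent? character(0) means none are missing
missing_cols <- setdiff(required_cols, names(toy))

# Number of NA values in each column
na_counts <- colSums(is.na(toy))
print(missing_cols)
print(na_counts)
```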
This histogram displays the distribution of Annual Income among respondents.
ggplot(df, aes(x = `Annual Income (INR)`)) +
geom_histogram(bins = 20, fill = "steelblue", color = "black") +
labs(title = "Histogram of Annual Income (INR)", x = "Annual Income (INR)", y = "Frequency") +
theme_minimal()
The histogram shows a right-skewed distribution: most respondents earn within a moderate range (roughly ₹3,00,000 to ₹11,00,000), while a smaller group of higher earners forms a long upper tail.
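The skew can also be checked numerically: for a right-skewed variable the mean sits above the median and the sample skewness is positive. This is a minimal sketch using the moment-based skewness formula written out by hand (`skewness` here is a hypothetical helper, not a package function), applied to made-up incomes rather than the survey data.

```r
# Moment-based sample skewness: third central moment / (second central moment)^(3/2)
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^(3 / 2)
}

# Hypothetical incomes (INR) with a long upper tail, not the survey data
income <- c(300000, 350000, 420000, 500000, 560000, 640000,
            700000, 820000, 1100000, 2400000)

summary_stats <- c(mean = mean(income), median = median(income),
                   skew = skewness(income))
print(round(summary_stats, 2))
```

On the real data, `skewness(df$\`Annual Income (INR)\`)` would give the corresponding value.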
The bar chart shows the number of respondents at each Education Level.
ggplot(df, aes(x = `Education Level`, fill = `Education Level`)) +
geom_bar() +
labs(title = "Distribution of Education Levels", x = "Education Level", y = "Count") +
theme_minimal() +
scale_fill_brewer(palette = "Pastel1")
Most respondents hold a Bachelor’s or Master’s degree, suggesting that these are the most common entry points into the workforce in this sample.
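To report shares rather than raw counts, `prop.table()` converts a frequency table into proportions. A sketch on a made-up vector of education levels (not the survey responses); on the real data the input would be `df$\`Education Level\``.

```r
# Hypothetical education levels, not the survey responses
edu <- c("Bachelor's", "Master's", "Bachelor's", "PhD",
         "High School", "Master's", "Bachelor's", "Master's")

counts <- table(edu)          # count per level (what geom_bar() plots)
shares <- prop.table(counts)  # each count divided by the total
print(round(100 * shares, 1)) # percentages
```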
The boxplot compares Annual Income across different Education Levels.
ggplot(df, aes(x = `Education Level`, y = `Annual Income (INR)`, fill = `Education Level`)) +
geom_boxplot() +
labs(title = "Boxplot of Annual Income by Education Level", x = "Education Level", y = "Annual Income (INR)") +
theme_minimal() +
scale_fill_brewer(palette = "Pastel1")
Median income rises visibly with each step up in education level, suggesting that advanced degrees (Master’s and PhD) are associated with better income prospects.
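The center line of each box is the group median, which can be computed directly with `tapply()`. Setting the factor levels explicitly also orders the groups by degree instead of alphabetically, which makes a trend easier to read in the plot as well. A sketch on made-up values; the level order is an assumed ranking of the degrees.

```r
# Hypothetical incomes (INR) and education levels, not the survey data
income <- c(250000, 300000, 500000, 600000, 650000,
            900000, 1000000, 1500000, 1700000)
level  <- c("High School", "High School", "Bachelor's", "Bachelor's", "Bachelor's",
            "Master's", "Master's", "PhD", "PhD")

# Order levels by degree rather than alphabetically (assumed ranking)
level <- factor(level, levels = c("High School", "Bachelor's", "Master's", "PhD"))

# Median income per education level, in the order set above
medians <- tapply(income, level, median)
print(medians)
```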
This scatter plot with a regression line visualizes the relationship between Years to Employment and Annual Income.
ggplot(df, aes(x = `Years to Employment`, y = `Annual Income (INR)`)) +
geom_point(color = "steelblue") +
geom_smooth(method = "lm", color = "black") +
labs(title = "Annual Income vs. Years to Employment", x = "Years to Employment", y = "Annual Income (INR)") +
theme_minimal()
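A negative trend is visible: shorter times to employment are associated with higher incomes. This may reflect early career momentum or efficient job placement that translates into higher pay over time. The effect is modest, however; in the regression below, Years to Employment is not a significant predictor (p = 0.56) once Education Level and Field of Study are taken into account.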
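The line drawn by `geom_smooth(method = "lm")` is an ordinary least-squares fit, so its slope and a test of whether that slope differs from zero can be read off `lm()`. This simple slope ignores education, which is why a multiple regression can tell a different story. A sketch on made-up data; on the real data the formula would use the `df` columns.

```r
# Hypothetical data: years until first job and annual income (INR)
years  <- c(0.5, 1, 1, 2, 2, 3, 3, 4)
income <- c(950000, 900000, 820000, 760000, 800000, 640000, 700000, 600000)

fit <- lm(income ~ years)
slope   <- coef(fit)[["years"]]                            # change in income per extra year
p_value <- summary(fit)$coefficients["years", "Pr(>|t|)"]  # test of slope = 0
print(c(slope = slope, p_value = p_value))
```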
The heatmap shows correlations between Age, Annual Income, Years to Employment, Job Satisfaction, and ROI.
corr_data <- cor(df[, c("Age", "Annual Income (INR)", "Years to Employment", "Job Satisfaction (1-5)", "ROI (1-5)")])
melted_corr <- reshape2::melt(corr_data) # long format with columns Var1, Var2, value
ggplot(data = melted_corr, aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
# Diverging scale centered at 0; "linen" was too close to white to show positive correlations
scale_fill_gradient2(low = "steelblue", mid = "white", high = "firebrick", midpoint = 0, limits = c(-1, 1), name = "Correlation") +
geom_text(aes(label = round(value, 2)), color = "black", size = 5) +
labs(title = "Correlation Heatmap") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
Annual Income shows moderate positive correlations with both Job Satisfaction and ROI: respondents with higher incomes tend to report greater job satisfaction and rate their education’s return on investment more favorably.
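`cor()` computes Pearson correlations by default. Because Job Satisfaction and ROI are 1-5 ratings, a rank-based Spearman correlation is a reasonable robustness check, and `cor.test()` adds a p-value. A sketch on made-up values, not the survey data.

```r
# Hypothetical incomes (INR) and 1-5 satisfaction ratings, not the survey data
income       <- c(300000, 450000, 520000, 610000, 700000, 880000, 1200000, 1500000)
satisfaction <- c(2, 2, 3, 3, 4, 3, 5, 4)

pearson  <- cor(income, satisfaction)                       # default method
spearman <- cor(income, satisfaction, method = "spearman")  # correlation of ranks
# exact = FALSE because tied ratings rule out an exact p-value
test <- cor.test(income, satisfaction, method = "spearman", exact = FALSE)
print(c(pearson = pearson, spearman = spearman, p = test$p.value))
```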
This violin plot displays ROI distributions across different Fields of Study.
ggplot(df, aes(x = `Field of Study`, y = `ROI (1-5)`, fill = `Field of Study`)) +
geom_violin(trim = FALSE) +
labs(title = "Violin Plot of ROI by Field of Study", x = "Field of Study", y = "ROI (1-5)") +
theme_minimal() +
scale_fill_brewer(palette = "Pastel1")
STEM fields report higher and more consistent ROI scores, suggesting that STEM graduates perceive stronger returns on their educational investment than graduates in other fields such as Humanities, whose scores vary more widely. Because ROI here is a self-reported 1-5 rating, this reflects perceived rather than measured economic return.
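"Higher and more consistent" can be quantified as a per-field mean (level) and standard deviation (spread). A sketch with `tapply()` on made-up ratings; on the real data the inputs would be the `ROI (1-5)` and `Field of Study` columns of `df`.

```r
# Hypothetical ROI ratings (1-5) by field, not the survey data
roi   <- c(4, 5, 4, 4, 2, 5, 1, 3, 3, 4, 3, 3)
field <- c("STEM", "STEM", "STEM", "STEM",
           "Humanities", "Humanities", "Humanities", "Humanities",
           "Business", "Business", "Business", "Business")

means <- tapply(roi, field, mean)  # typical ROI per field
sds   <- tapply(roi, field, sd)    # spread of ROI per field
print(round(rbind(mean = means, sd = sds), 2))
```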
A one-way ANOVA tests whether mean Annual Income differs across Education Levels.
aov_result <- aov(`Annual Income (INR)` ~ `Education Level`, data = df)
summary(aov_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## `Education Level` 3 1.446e+13 4.819e+12 32.7 3.19e-15 ***
## Residuals 111 1.636e+13 1.474e+11
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
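The ANOVA indicates highly significant differences in income across education levels (F(3, 111) = 32.7, p < 0.001), supporting the observation that advanced degrees are linked to higher income. ANOVA alone shows only that at least one group mean differs; it does not identify which levels differ from one another.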
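A post hoc comparison such as Tukey's HSD, available as `TukeyHSD()` in base R's stats package, shows which pairs of groups differ while controlling the family-wise error rate. A sketch on made-up data with three groups; on the real data the call would be `TukeyHSD(aov_result)`.

```r
# Hypothetical incomes (INR) for three education levels, not the survey data
income <- c(500000, 550000, 600000, 580000,
            900000, 950000, 1000000, 980000,
            1500000, 1600000, 1550000, 1650000)
level  <- factor(rep(c("Bachelor's", "Master's", "PhD"), each = 4))

fit   <- aov(income ~ level)
tukey <- TukeyHSD(fit)  # all pairwise mean differences with adjusted p-values
print(tukey)

pairs <- tukey$level    # matrix with columns diff, lwr, upr, p adj
```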
A two-sample t-test compares mean income between Bachelor’s and Master’s degree holders.
bachelors_income <- subset(df, `Education Level` == "Bachelor's")$`Annual Income (INR)`
masters_income <- subset(df, `Education Level` == "Master's")$`Annual Income (INR)`
t_test_result <- t.test(bachelors_income, masters_income, var.equal = TRUE)
t_test_result
##
## Two Sample t-test
##
## data: bachelors_income and masters_income
## t = -5.0859, df = 87, p-value = 2.081e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -590287.9 -258552.4
## sample estimates:
## mean of x mean of y
## 625054.7 1049474.8
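The t-test shows a statistically significant income difference between Bachelor’s and Master’s degree holders (p < 0.001). Master’s holders earn on average ₹10,49,475 versus ₹6,25,055 for Bachelor’s holders, a gap of about ₹4,24,420 (95% CI: ₹2,58,552 to ₹5,90,288), indicating that higher degrees tend to be associated with higher earnings.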
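The comparison above assumes equal variances (`var.equal = TRUE`). Welch's t-test, which is the default in `t.test()`, drops that assumption and is a common robustness check when one group is more spread out than the other. A sketch on made-up incomes, not the survey data.

```r
# Hypothetical incomes (INR), not the survey data; the second group is more spread out
bachelors <- c(500000, 560000, 610000, 640000, 700000, 720000)
masters   <- c(700000, 850000, 1000000, 1150000, 1300000, 1400000)

pooled <- t.test(bachelors, masters, var.equal = TRUE)  # Student's t-test
welch  <- t.test(bachelors, masters)                     # Welch's t-test (default)

print(c(pooled_p = pooled$p.value, welch_p = welch$p.value,
        pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter)))
```

If both versions lead to the same conclusion, the equal-variance assumption is not driving the result.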
A multiple linear regression examines how Annual Income relates to Years to Employment, Education Level, and Field of Study when all three are considered together.
lm_model <- lm(`Annual Income (INR)` ~ `Years to Employment` + `Education Level` + `Field of Study`, data = df)
summary(lm_model)
##
## Call:
## lm(formula = `Annual Income (INR)` ~ `Years to Employment` +
## `Education Level` + `Field of Study`, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -966169 -230874 -10224 172380 1111054
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 717673 123209 5.825 6.08e-08 ***
## `Years to Employment` -28852 49496 -0.583 0.561183
## `Education Level`High School -291667 124237 -2.348 0.020730 *
## `Education Level`Master's 392211 107366 3.653 0.000403 ***
## `Education Level`PhD 926250 143690 6.446 3.38e-09 ***
## `Field of Study`Humanities -35010 119011 -0.294 0.769196
## `Field of Study`Social Sciences -8878 118464 -0.075 0.940403
## `Field of Study`STEM -68762 104150 -0.660 0.510528
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 389100 on 107 degrees of freedom
## Multiple R-squared: 0.4743, Adjusted R-squared: 0.4399
## F-statistic: 13.79 on 7 and 107 DF, p-value: 1.28e-12
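Education Level is the only significant predictor of income. Relative to the reference group (Bachelor’s), High School graduates earn about ₹2,91,667 less, while Master’s and PhD holders earn about ₹3,92,211 and ₹9,26,250 more, holding the other variables constant. Neither Years to Employment nor any Field of Study coefficient is significant (all p > 0.5), and the model explains about 47% of the variance in income (adjusted R² = 0.44).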
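Factor coefficients in `lm()` are offsets from a reference level, which R takes as the first factor level (alphabetical for character data). `relevel()` changes the baseline without changing the fitted values, and `predict()` turns the model into an expected income for a specific profile. A sketch on made-up data with a single factor; `new_person` is a hypothetical profile.

```r
# Hypothetical data, not the survey; Education Level is the only factor here
toy <- data.frame(
  income = c(300000, 350000, 600000, 650000, 1000000, 1050000, 1500000, 1600000),
  years  = c(2, 1, 1, 2, 1, 2, 1, 2),
  edu    = factor(c("High School", "High School", "Bachelor's", "Bachelor's",
                    "Master's", "Master's", "PhD", "PhD"))
)

# Default baseline is the first level alphabetically ("Bachelor's")
fit_default <- lm(income ~ years + edu, data = toy)

# Make "High School" the baseline so every degree is compared against it
toy$edu <- relevel(toy$edu, ref = "High School")
fit_hs <- lm(income ~ years + edu, data = toy)

# Expected income for a hypothetical Master's graduate employed after 1 year
new_person <- data.frame(years = 1, edu = factor("Master's", levels = levels(toy$edu)))
pred <- predict(fit_hs, newdata = new_person)
print(round(coef(fit_hs)))
print(round(pred))
```

Changing the reference level only changes how the coefficients are expressed; predictions from both fits are identical.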
K-means clustering segments respondents based on Annual Income, Job Satisfaction, and ROI.
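set.seed(123) # For reproducibility
# Standardize the variables; unscaled, income (hundreds of thousands of INR) would dominate the 1-5 scales
df_cluster <- scale(df[, c("Annual Income (INR)", "ROI (1-5)", "Job Satisfaction (1-5)")])
kmeans_result <- kmeans(df_cluster, centers = 3, nstart = 25) # 25 random starts, keep the best
df$Cluster <- as.factor(kmeans_result$cluster)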
# Visualize clusters
ggplot(df, aes(x = `Job Satisfaction (1-5)`, y = `Annual Income (INR)`, color = Cluster)) +
geom_point() +
labs(title = "Cluster Analysis: Income vs. Job Satisfaction", x = "Job Satisfaction (1-5)", y = "Annual Income (INR)") +
theme_minimal() +
scale_color_brewer(palette = "Pastel1", name = "Cluster")
Cluster analysis reveals three distinct groups:
These clusters group respondents by combinations of income, job satisfaction, and perceived ROI, giving a more nuanced picture of respondent types than any single variable provides.
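Using three clusters is a modeling choice rather than something the data dictates. One common heuristic is to run `kmeans()` for several values of k on standardized data and look for an "elbow" where the total within-cluster sum of squares stops dropping sharply. A sketch on made-up respondents; the column names `income`, `roi`, and `sat` are hypothetical.

```r
set.seed(123)  # reproducible random starts

# Hypothetical respondents: income (INR), ROI (1-5), job satisfaction (1-5)
toy <- data.frame(
  income = c(300000, 320000, 350000, 800000, 850000, 900000, 1500000, 1600000, 1700000),
  roi    = c(2, 2, 3, 3, 4, 3, 5, 4, 5),
  sat    = c(2, 3, 2, 3, 3, 4, 5, 5, 4)
)

# Standardize so income does not swamp the 1-5 scales in the distance calculation
scaled <- scale(toy)

# Total within-cluster sum of squares for k = 1..5 (25 random starts each)
wss <- sapply(1:5, function(k) kmeans(scaled, centers = k, nstart = 25)$tot.withinss)
print(round(wss, 2))
```

Plotting `wss` against k and picking the bend is a heuristic; silhouette widths are a common alternative.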
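This analysis shows that higher education levels are consistently associated with higher income, and that higher income in turn goes along with greater job satisfaction and perceived ROI. STEM graduates report higher and more consistent ROI, although Field of Study was not a significant predictor of income once Education Level was accounted for. Advanced degrees are associated with better economic outcomes, suggesting they can be a valuable investment for students seeking career growth.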