This report uses Stack Overflow 2025 developer survey data to study developers’ job satisfaction. We apply the multi-classification method to predict the job satisfaction of developers into three orderly categories based on demographic characteristics, occupational characteristics and technology-related characteristics. Understanding these drivers is of great value to developers, employers, and human resources researchers working in rapidly developing technological environments.
Can we accurately predict the developer’s job satisfaction (dissatisfied, neutral or satisfied) based on his professional background, working conditions and technical usage patterns?
The job satisfaction of software developers affects employee retention, mental health and productivity. We adopt the three-level classification method to achieve a balance between distinguishing different categories and ensuring sufficient sample size, thus achieving stable modelling. Employers can use these insights to develop interventions and identify key working conditions, while developers can understand the relationship between career choice and satisfaction. In view of the impact of artificial intelligence-assisted development, this analysis is particularly timely.
The data set used is the Stack Overflow Annual Developer
Survey 2025, which can be publicly accessed: https://survey.stackoverflow.co/. The survey collected
feedback from global developers, including demographic information,
educational background, employment, programming language, tools and
attitudes towards artificial intelligence. The target variable of this
project is JobSat (overall job satisfaction), which was
initially recorded as a value of 0 to 10, and we re-encoded it into a
three-level ordered factor: Dissatisfied (0–6), Neutral
(7–8) and Satisfied (9–10).
The dataset satisfies four of the five required criteria:
library(tidyverse)
df <- read_csv("survey_results_public.csv", show_col_types = FALSE)
cat(
"Number of rows:", nrow(df), "\n",
"Number of columns:", ncol(df), "\n"
)
## Number of rows: 49191
## Number of columns: 172
The data set contains 49,191 rows, far exceeding the threshold of 10,000 rows. In view of this scale, we will use hierarchical sampling to control the calculation cost during model training, and use cross-verification instead of a single training set-test set division to evaluate the model.
cols_of_interest <- c(
"JobSat", "Age", "EdLevel", "Employment", "RemoteWork",
"WorkExp", "OrgSize", "ICorPM", "Country", "CompTotal",
"AISelect", "AISent", "AIThreat", "DevType", "YearsCode"
)
miss_df <- df %>%
select(all_of(cols_of_interest)) %>%
summarise(across(everything(), ~mean(is.na(.)))) %>%
pivot_longer(everything(), names_to = "Variable", values_to = "Missing_Rate") %>%
arrange(desc(Missing_Rate))
ggplot(miss_df, aes(x = reorder(Variable, Missing_Rate), y = Missing_Rate)) +
geom_col(fill = "#e05c5c") +
coord_flip() +
scale_y_continuous(labels = scales::percent_format()) +
labs(
title = "Missing Value Rate for Key Variables",
x = NULL,
y = "Proportion Missing"
) +
theme_minimal(base_size = 11) +
theme(
axis.text.y = element_text(size = 6),
axis.text.x = element_text(size = 10),
plot.title = element_text(face = "bold", size = 12)
)
The target variable JobSat is missing in
45.78% of the respondents. Some predictors also have a
large number of missing values (for example, the missing value of
CompTotal is about 49%, and the missing value of
OrgSize is about 31%). Therefore, it is necessary to adopt
a cautious interpolation strategy and conduct sensitivity analysis.
num_features <- sum(sapply(df, is.numeric))
cat_features <- sum(sapply(df, is.character))
cat(
"Total features:", ncol(df), "\n",
"Numerical features:", num_features, "\n",
"Categorical features:", cat_features, "\n"
)
## Total features: 172
## Numerical features: 53
## Categorical features: 119
The data set contains 172 features - 53 numerical features and 119 subtype features - which brings significant dimensional challenges. Many classification columns are coded with multiple-choice responses (separated by semicolons), which need to be parsed and binaryed before modelling. Feature selection or dimension reduction (for example, principal component analysis (PCA) of encoded features, or using LASSO punishment regression) is crucial.
jobsat_dist <- df %>%
filter(!is.na(JobSat)) %>%
mutate(JobSat_Class = case_when(
JobSat <= 6 ~ "Dissatisfied (0-6)",
JobSat <= 8 ~ "Neutral (7-8)",
JobSat <= 10 ~ "Satisfied (9-10)"
)) %>%
mutate(JobSat_Class = factor(JobSat_Class,
levels = c("Dissatisfied (0-6)", "Neutral (7-8)", "Satisfied (9-10)"))) %>%
count(JobSat_Class) %>%
mutate(Proportion = n / sum(n))
ggplot(jobsat_dist, aes(x = JobSat_Class, y = Proportion, fill = JobSat_Class)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = paste0(n, "\n(", scales::percent(Proportion, accuracy = 0.1), ")")),
vjust = -0.3, size = 3.2) +
scale_y_continuous(labels = scales::percent_format(), limits = c(0, 0.55)) +
scale_fill_manual(values = c("#d73027", "#fee090", "#1a9850")) +
labs(
title = "Distribution of Job Satisfaction (3-Class Recoding)",
x = "Job Satisfaction Class",
y = "Proportion"
) +
theme_minimal(base_size = 11)
The original 0-10 points numerical JobSat score was
recoded into three ordered classes: Dissatisfied (0-6),
Neutral (7-8) and Satisfied (9-10). The number of samples in these three
categories is about 7,386, 12,562 and 6,522, respectively, showing a
moderate imbalance, with the neutral category dominating. We will solve
this problem by using hierarchical sampling and macro average F1 scores
as the main evaluation indicators.
[To be completed by the group — include visualisations and summaries of key predictors, target variable, and relationships of interest.]
[To be completed by the group — outline modelling strategy, train/test split plan, algorithms to be trialled, and evaluation framework.]
Stack Overflow. (2025). Stack Overflow Annual Developer Survey 2025. https://survey.stackoverflow.co/