Problem Statement
In an increasingly competitive job market, understanding what drives
job offers for computer science students is more important than ever.
This project investigates the predictors of job offers based on student
performance, experience, and personal development, while comparing these
findings to real-world hiring trends using Stack Overflow industry
data.
The goal is to uncover meaningful patterns that
could inform educational planning, career advising, and student
behavior.
Research Questions
Primary Question:
- What factors best predict whether a student receives a job
offer?
Secondary Questions:
- How does the number of projects completed influence
job offer likelihood?
- What is the impact of specific technical skills
(e.g., Python, SQL) on offers?
- Does extracurricular involvement correlate with job
outcomes?
- How do industry-level attributes (like salary, education, and
gender) influence employment?
Data Sources
Student Dataset
- 20,000 students who were job fair candidates
- Key features: experience_years, course_grades,
projects_completed, extracurriculars, skills,
job_offer (binary target variable)
Industry Dataset
- Sourced from Kaggle: records for over 70,000 job applicants and
their qualifications and characteristics, used for industry reporting.
- Key features: Age, Gender, EdLevel,
PreviousSalary, YearsCodePro, ComputerSkills,
MentalHealth, Employed, Employment
Both datasets were preprocessed in R using dplyr,
tidyr, and stringr. No major cleaning or
re-organization was needed.
Techniques Used
- Data preprocessing: Standardized missing values,
converted categories, split skills into individual rows
- Exploratory visualizations: Bar plots, boxplots,
percent breakdowns using
ggplot2
- Feature binning: Course grades and projects
bucketed for interpretability and usability
- Modeling:
  - Decision trees (rpart) using information gain as the
    split criterion
  - Random Forest attempted for industry comparison
    (future expansion)
  - Visuals built using rpart.plot
- Interactive dashboard: Delivered using
shiny and shinydashboard
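
The preprocessing and binning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy data frame, the comma separator in skills, the student_id column, and every bin boundary are assumptions made for the example.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the student dataset; all values are invented.
students <- data.frame(
  student_id         = 1:4,                      # hypothetical ID column
  course_grades      = c(72, 88, 93, 97),
  projects_completed = c(1, 3, 6, 9),
  skills             = c("Python, SQL", "C++", "Python", "SQL, Java, Python"),
  job_offer          = c(0, 1, 1, 0),
  stringsAsFactors   = FALSE
)

# Split the skills string into one row per (student, skill) pair.
# Assumes skills are comma-separated; adjust `sep` for other delimiters.
skills_long <- students %>%
  separate_rows(skills, sep = ",\\s*")

# Bucket grades and projects into labeled bins.
# The break points below are illustrative, not the ones used in the project.
students_binned <- students %>%
  mutate(
    grade_bin = cut(course_grades,
                    breaks = c(-Inf, 75, 90, 95, Inf),
                    labels = c("<75", "75-89", "90-94", "95+"),
                    right  = FALSE),
    project_bin = cut(projects_completed,
                      breaks = c(-Inf, 2, 5, Inf),
                      labels = c("0-2", "3-5", "6+"))
  )
```

Keeping the long skills table separate from the binned table avoids duplicating each student's row in the modeling data.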
Exploratory Findings
Student Dataset Highlights
- Projects: Students with course grades of 90-95, 4 years of
experience, 3 projects, and 1 extracurricular had higher job
offer rates.
- Extracurriculars: The sweet spot was 1-2
activities. More or fewer slightly reduced offer rates.
- Grades: Students with grades below 75 saw fewer offers;
above that threshold, grades mattered less.
- Skills: Python and SQL were the most common skills among
students with offers. C++ alone was less predictive.
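
The offer rates compared in these highlights are group-wise means of the binary job_offer column. A minimal sketch of one such percent breakdown, assuming dplyr and a toy data frame with invented values:

```r
library(dplyr)

# Toy data; the real student dataset has 20,000 rows.
students <- data.frame(
  extracurriculars = c(0, 1, 1, 2, 2, 3),
  job_offer        = c(0, 1, 1, 1, 0, 0)
)

# Offer rate per extracurricular count: the mean of a 0/1 column
# is the share of students in that group who received an offer.
offer_rates <- students %>%
  group_by(extracurriculars) %>%
  summarise(
    n          = n(),
    offer_rate = mean(job_offer),
    .groups    = "drop"
  )
```

The resulting table can be passed directly to ggplot2 for a bar plot of offer_rate by group.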
Industry Dataset Highlights
- Computer Skills: Strongly correlated with
employment (boxplots showed clear separability)
- Years of Professional Coding: Employment increased
with each year of experience
- Education: Undergraduate degrees (the largest group) and Master's
degrees had the highest employment counts; PhD holders were more often
unemployed than employed.
- Gender: Male developers far outnumbered female and
non-binary developers, which may reflect a hiring bias or pipeline issue
- Country: U.S., Germany, and India topped employment
counts; salary distributions were also higher in these regions
Modeling Results
Student Decision Tree
Model trained on: experience_years,
projects_completed, extracurriculars
Key Paths:
- Students with fewer than 6 projects, exactly 2 extracurriculars, and
fewer than 2 years of experience: highest offer rates
- Students with >= 2 extracurriculars and >= 2 years of experience:
low offer probability
- Grades were tested but not included due to overfitting issues.
The decision tree clearly emphasized practical engagement
(projects) and well-rounded involvement as
dominant predictors of success.
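
A tree of this kind can be fit with rpart as sketched below. This is an illustration under stated assumptions, not the project's model: the data are synthetic, the rule generating job_offer is invented, and the cp value is an arbitrary choice. Only the three predictor names and the information-gain split criterion come from the description above.

```r
library(rpart)

set.seed(42)
n <- 500

# Synthetic stand-in for the student dataset (values are invented).
students <- data.frame(
  experience_years   = sample(0:5,  n, replace = TRUE),
  projects_completed = sample(0:10, n, replace = TRUE),
  extracurriculars   = sample(0:4,  n, replace = TRUE)
)

# Invented target: offers are more likely with 3 or more projects.
p_offer <- ifelse(students$projects_completed >= 3, 0.8, 0.2)
students$job_offer <- factor(rbinom(n, 1, p_offer),
                             levels = c(0, 1), labels = c("no", "yes"))

# Classification tree split on information gain instead of the default Gini index.
fit <- rpart(
  job_offer ~ experience_years + projects_completed + extracurriculars,
  data    = students,
  method  = "class",
  parms   = list(split = "information"),
  control = rpart.control(cp = 0.01)   # assumed complexity parameter
)

# In-sample accuracy; a held-out split would give a fairer estimate.
pred     <- predict(fit, students, type = "class")
accuracy <- mean(pred == students$job_offer)

# Optional plot, if rpart.plot is installed:
# rpart.plot::rpart.plot(fit)
```

Printing fit lists each split with its class probabilities, which is how paths like the ones above can be read off.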
Industry Decision Tree
Model trained on: Age, Gender,
YearsCodePro, PreviousSalary,
MentalHealth, EdLevel,
Employment, MainBranch,
YearsCode
Observations:
- Respondents under 39 with more than 5 years of coding experience,
education beyond a bachelor's degree, and a developer role had a higher
chance of being employed than others.
- Respondents who were not developers (MainBranch = NotDev) were most
likely unemployed in the field.
- Gender had a noticeable split, with males more often retained
- Mental health did not significantly split branches, but could be
underreported
Final Conclusions
Summary of Key Findings
- Students with strong project portfolios and
extracurricular involvement received more job offers
- Grades alone were not as predictive; experience
mattered more
- In industry, technical depth (e.g., years
coding and skill set) outweighed formal education levels
- The U.S., Germany, and India dominate the
developer market, especially at higher salaries
Insights from the Models
- Both trees emphasized practical experience over academic
scores, but education level did matter for employment in the
industry dataset.
- Student predictors: projects_completed,
extracurriculars, experience_years
- Industry predictors: YearsCode, EdLevel,
MainBranch
Real-World Implications
- Universities should emphasize experiential learning
and project-based curricula
- Students should focus on depth in one or two key technical
areas
- Employers could benefit from holistic candidate evaluation, not just
GPA or degree
Future Work
Modeling Expansion
- Add Random Forest to improve prediction
accuracy
- Test logistic regression to interpret odds and
marginal effects
- Use k-means clustering to identify student
"archetypes" for advising
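
As a rough sketch of the logistic regression idea, base R's glm can fit the model, and exponentiating the coefficients gives odds ratios. The data and the rule generating job_offer below are synthetic assumptions; nothing here reflects results from the project.

```r
set.seed(1)
n <- 400

# Synthetic data: offer odds rise with projects_completed (invented effect).
students <- data.frame(
  projects_completed = sample(0:10, n, replace = TRUE),
  experience_years   = sample(0:5,  n, replace = TRUE)
)
log_odds <- -2 + 0.4 * students$projects_completed
students$job_offer <- rbinom(n, 1, plogis(log_odds))

# Logistic regression on the binary target.
logit_fit <- glm(job_offer ~ projects_completed + experience_years,
                 data = students, family = binomial())

# Odds ratios: multiplicative change in offer odds per one-unit increase.
odds_ratios <- exp(coef(logit_fit))
```

An odds ratio above 1 for a predictor means higher values are associated with higher odds of an offer, holding the other predictors fixed.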
Data Expansion
- Add internship experience,
leadership, and club
participation
- Merge salary/job offer outcomes from platforms like LinkedIn or
Indeed
- Incorporate regional economic indicators and hiring
growth projections
New Questions to Explore
- How do soft skills (e.g., communication, teamwork)
impact hiring outcomes?
- Does participation in student-run startups or
competitions correlate with employment?
- What impact does networking, such as LinkedIn
presence, have on offers?
Appendix
- App built using:
shiny, shinydashboard,
ggplot2, rpart, DT,
tidyverse
- Source code available upon request
- Dashboard designed to be fully interactive and
stakeholder-friendly