📌 Problem Statement

In an increasingly competitive job market, understanding what drives job offers for computer science students is more important than ever. This project investigates the predictors of job offers based on student performance, experience, and personal development, while comparing these findings to real-world hiring trends using Stack Overflow industry data.

The goal is to uncover meaningful patterns that could inform educational planning, career advising, and student behavior.


❓ Research Questions

Primary Question:

  • What factors best predict whether a student receives a job offer?

Secondary Questions:

  • How does the number of projects completed influence job offer likelihood?
  • What is the impact of specific technical skills (e.g., Python, SQL) on offers?
  • Does extracurricular involvement correlate with job outcomes?
  • How do industry-level attributes (like salary, education, and gender) influence employment?

💾 Data Sources

Student Dataset

  • Records for 20,000 students who were job fair candidates
  • Key features:
    • experience_years
    • course_grades
    • projects_completed
    • extracurriculars
    • skills
    • job_offer (binary target variable)

Industry Dataset

  • Sourced from Kaggle: records for over 70,000 job applicants and their qualifications and characteristics, used for industry reporting.
  • Key features:
    • Age, Gender, EdLevel
    • PreviousSalary, YearsCodePro, ComputerSkills
    • MentalHealth, Employed, Employment

Both datasets were preprocessed in R using dplyr, tidyr, and stringr. No major cleaning or re-organization was needed.
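To study the effect of individual skills such as Python or SQL, the skills field has to become one indicator per skill. The project did this work in R; the following is only a minimal Python sketch of the idea. It assumes skills is stored as a comma-separated string, which the source does not state, and the helper name skill_flags is hypothetical.

```python
def skill_flags(skills, tracked=("Python", "SQL", "C++")):
    """Return {skill: 0 or 1} for each tracked skill.

    Assumes `skills` is a comma-separated string such as "Python, SQL".
    Matching is case-insensitive and ignores surrounding whitespace.
    """
    present = {s.strip().lower() for s in skills.split(",") if s.strip()}
    return {name: int(name.lower() in present) for name in tracked}


# Made-up rows, used only to illustrate the transformation.
rows = [
    {"skills": "Python, SQL", "job_offer": 1},
    {"skills": "C++", "job_offer": 0},
]

# Attach one indicator column per tracked skill to each row.
for row in rows:
    row.update(skill_flags(row["skills"]))
```

Each row then carries Python, SQL, and C++ indicator fields that can be compared against job_offer.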


🔍 Techniques Used


📊 Exploratory Findings

Student Dataset Highlights

  • Projects: Students with course grades of 90-95, 4 years of experience, 3 projects, and 1 extracurricular had higher job offer rates.
  • Extracurriculars: Offer rates peaked at 1–2 activities. More or fewer activities lowered offer rates slightly.
  • Grades: Students with grades below 75 received fewer offers; above that threshold, grades mattered less.
  • Skills: Python and SQL were the most common skills among students with offers. C++ alone was less predictive.
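The highlights above are comparisons of offer rates across groups. As a rough illustration of that calculation (a Python sketch, not the project's R code), the function below computes the share of students with an offer for each value of a feature. The helper name offer_rate_by and the toy records are invented for the example and are not drawn from the dataset.

```python
from collections import defaultdict


def offer_rate_by(records, key, target="job_offer"):
    """Return {group value: share of records with target == 1}."""
    totals = defaultdict(int)
    offers = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        offers[record[key]] += record[target]
    return {group: offers[group] / totals[group] for group in sorted(totals)}


# Toy records (made up) keyed by number of extracurricular activities.
toy = [
    {"extracurriculars": 0, "job_offer": 0},
    {"extracurriculars": 1, "job_offer": 1},
    {"extracurriculars": 1, "job_offer": 1},
    {"extracurriculars": 2, "job_offer": 1},
    {"extracurriculars": 2, "job_offer": 0},
    {"extracurriculars": 4, "job_offer": 0},
]

rates = offer_rate_by(toy, "extracurriculars")
```

The same function works for any categorical or count feature, for example projects_completed.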

Industry Dataset Highlights

  • Computer Skills: Strongly correlated with employment (boxplots showed clear separability)
  • Years of Professional Coding: Employment increased with each year of experience
  • Education: Undergraduate and Master’s degree holders had the highest employment counts, with undergraduates the largest group. Among PhD holders, the unemployed outnumbered the employed.
  • Gender: Male developers far outnumber female and non-binary developers, which may reflect a hiring bias or a pipeline issue.
  • Country: The U.S., Germany, and India topped employment counts; salary distributions were also higher in these countries.

🌲 Modeling Results

Student Decision Tree

Model trained on: experience_years, projects_completed, extracurriculars

Key Paths:

  • Students with < 6 projects, = 2 extracurriculars, and < 2 years of experience: highest offer rates
  • Students with >= 2 extracurriculars and >= 2 years of experience: low offer probability
  • Grades were tested but excluded because of overfitting.

The decision tree clearly emphasized practical engagement (projects) and well-rounded involvement as dominant predictors of success.
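A decision tree reaches paths like these by choosing, at each node, the threshold that most reduces class impurity. The source does not say which criterion or library was used, so the Python sketch below assumes Gini impurity, a common choice in CART-style trees. The function names and the toy rows are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def split_gain(rows, feature, threshold, target="job_offer"):
    """Impurity reduction from splitting rows on feature < threshold."""
    parent = [r[target] for r in rows]
    left = [r[target] for r in rows if r[feature] < threshold]
    right = [r[target] for r in rows if r[feature] >= threshold]
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Toy rows (made up): offers go only to students with 3 or more projects.
toy_rows = [
    {"projects_completed": 1, "job_offer": 0},
    {"projects_completed": 2, "job_offer": 0},
    {"projects_completed": 3, "job_offer": 1},
    {"projects_completed": 4, "job_offer": 1},
]

gain_at_3 = split_gain(toy_rows, "projects_completed", 3)
gain_at_2 = split_gain(toy_rows, "projects_completed", 2)
```

On the toy rows, the threshold of 3 separates the classes perfectly and therefore yields the larger gain, which is why a tree would pick it first.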

Industry Decision Tree

Model trained on: Age, Gender, YearsCodePro, PreviousSalary, MentalHealth, EdLevel, Employment, MainBranch, YearsCode

Observations:

  • Respondents under 39 with more than 5 years of coding, education beyond a bachelor's degree, and a developer role were more likely to be employed than others.
  • Respondents who were not developers (MainBranch = NotDev) were most likely unemployed in the field.
  • Gender produced a noticeable split, with males more often retained.
  • Mental health did not significantly split branches, but it could be underreported.

✅ Final Conclusions

Summary of Key Findings

  • 🧩 Students with strong project portfolios and extracurricular involvement received more job offers
  • 🔍 Grades alone were not as predictive; experience mattered more
  • 🧠 In industry, technical depth (e.g., years coding and skill set) outweighed formal education level
  • 🌍 The U.S., Germany, and India dominate the developer market, especially at higher salaries

Insights from the Models

  • Both trees emphasized practical experience over academic scores, but education level did matter for employment in the industry data.
  • Student predictors: projects_completed, extracurriculars, experience_years
  • Industry predictors: YearsCode, EdLevel, MainBranch

Real-World Implications

  • Universities should emphasize experiential learning and project-based curricula
  • Students should focus on depth in one or two key technical areas
  • Employers could benefit from holistic candidate evaluation, not just GPA or degree

🔮 Future Work

Modeling Expansion

  • Add Random Forest to improve prediction accuracy
  • Test logistic regression to interpret odds and marginal effects
  • Use k-means clustering to identify student “archetypes” for advising
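To make the logistic regression item concrete: in such a model each coefficient is a change in log-odds, so exponentiating it gives an odds ratio. The Python sketch below shows only that interpretation step. The coefficient values are hypothetical placeholders, not fitted results from this project, and the function names are illustrative.

```python
import math


def odds_ratio(coefficient):
    """Multiplicative change in odds for a one-unit increase in a predictor."""
    return math.exp(coefficient)


def predicted_probability(intercept, coefficients, values):
    """Logistic model probability for one observation."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficient for projects_completed (not estimated from data).
beta_projects = math.log(2)

# exp(log 2) = 2: under this assumption, each extra project doubles the odds.
ratio = odds_ratio(beta_projects)

# Probability of an offer for a student with 1 project and an intercept of -1.
p_offer = predicted_probability(-1.0, [beta_projects], [1])
```

Marginal effects would then follow from how predicted_probability changes as a single predictor varies with the others held fixed.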

Data Expansion

  • Add internship experience, leadership, and club participation
  • Merge salary/job offer outcomes from platforms like LinkedIn or Indeed
  • Incorporate regional economic indicators and hiring growth projections

New Questions to Explore

  • How do soft skills (e.g., communication, teamwork) impact hiring outcomes?
  • Does participation in student-run startups or competitions correlate with employment?
  • What impact does networking, such as LinkedIn presence, have on offers?

📎 Appendix