Data science has become one of the most in-demand and interdisciplinary career paths. Practitioners are expected to combine technical, analytical, and communication skills to extract insights from data and use them to drive decisions.
This project explores the question:
“Which are the most valued data science skills?”
Using a dataset from Kaggle (Data Science Job Postings & Skills, 2024), we analyze more than 12,000 job listings to identify the skills employers request most often. How frequently a skill is mentioned serves as a proxy for how much it is valued.
Dataset: Data Science Job Postings & Skills (2024)
Author: asaniczka
Platform: Kaggle
Source: LinkedIn job postings
License: ODC Attribution License
This dataset provides a raw dump of data-science-related job postings collected from LinkedIn. It includes details such as job titles, companies, and locations and, most importantly, the list of skills mentioned in each posting. Its stated purpose is to let users practice data cleaning and explore which skills are most relevant in the current job market. The analysis below uses only the job_skills.csv file, which pairs each posting's LinkedIn link with a single comma-separated string of skills.
library(tidyverse)
job_skills <- read_csv("job_skills.csv")
head(job_skills)
## # A tibble: 6 × 2
## job_link job_skills
## <chr> <chr>
## 1 https://www.linkedin.com/jobs/view/senior-machine-learning-enginee… Machine L…
## 2 https://www.linkedin.com/jobs/view/principal-software-engineer-ml-… C++, Pyth…
## 3 https://www.linkedin.com/jobs/view/senior-etl-data-warehouse-speci… ETL, Data…
## 4 https://www.linkedin.com/jobs/view/senior-data-warehouse-developer… Data Lake…
## 5 https://www.linkedin.com/jobs/view/lead-data-engineer-at-dice-3805… Java, Sca…
## 6 https://www.linkedin.com/jobs/view/senior-data-engineer-at-univers… Data Ware…
glimpse(job_skills)
## Rows: 12,217
## Columns: 2
## $ job_link <chr> "https://www.linkedin.com/jobs/view/senior-machine-learning…
## $ job_skills <chr> "Machine Learning, Programming, Python, Scala, Java, Data E…
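skills_clean <- job_skills %>%
  separate_rows(job_skills, sep = ",") %>%       # one row per comma-separated skill
  mutate(job_skills = str_trim(job_skills)) %>%  # strip surrounding whitespace
  filter(job_skills != "")                       # drop empty entries (rows with NA are dropped too)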
skills_clean
## # A tibble: 314,950 × 2
## job_link job_skills
## <chr> <chr>
## 1 https://www.linkedin.com/jobs/view/senior-machine-learning-engine… Machine L…
## 2 https://www.linkedin.com/jobs/view/senior-machine-learning-engine… Programmi…
## 3 https://www.linkedin.com/jobs/view/senior-machine-learning-engine… Python
## 4 https://www.linkedin.com/jobs/view/senior-machine-learning-engine… Scala
## 5 https://www.linkedin.com/jobs/view/senior-machine-learning-engine… Java
## 6 https://www.linkedin.com/jobs/view/senior-machine-learning-engine… Data Engi…
## 7 https://www.linkedin.com/jobs/view/senior-machine-learning-engine… Distribut…
## 8 https://www.linkedin.com/jobs/view/senior-machine-learning-engine… Statistic…
## 9 https://www.linkedin.com/jobs/view/senior-machine-learning-engine… Optimizat…
## 10 https://www.linkedin.com/jobs/view/senior-machine-learning-engine… Data Pipe…
## # ℹ 314,940 more rows
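To make the splitting step concrete, here is a minimal sketch that runs the same separate_rows() / str_trim() / filter() pipeline on a tiny hypothetical tibble. The posting links and skill strings are invented for illustration and are not taken from the dataset. The sketch shows how one comma-separated string becomes one row per skill, and how stray spaces and a trailing comma are handled.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical toy data: two postings with comma-separated skill lists.
# Note the inconsistent spacing and the trailing comma in the second posting.
toy <- tibble(
  job_link   = c("posting-a", "posting-b"),
  job_skills = c("Python, SQL,Machine Learning", "SQL, Tableau, ")
)

toy_clean <- toy %>%
  separate_rows(job_skills, sep = ",") %>%       # one row per comma-separated piece
  mutate(job_skills = str_trim(job_skills)) %>%  # " SQL" becomes "SQL", " " becomes ""
  filter(job_skills != "")                       # remove the empty piece left by the trailing comma

print(toy_clean)
```

The five resulting rows keep the original job_link, so each skill can still be traced back to its posting.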
skill_counts <- skills_clean %>%
  count(job_skills, sort = TRUE)  # mentions per skill, most frequent first
head(skill_counts, 10)
## # A tibble: 10 × 2
## job_skills n
## <chr> <int>
## 1 Python 4801
## 2 SQL 4606
## 3 Communication 2498
## 4 Data Analysis 2181
## 5 Machine Learning 1966
## 6 AWS 1740
## 7 Tableau 1685
## 8 Data Visualization 1562
## 9 R 1542
## 10 Java 1414
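Strictly speaking, count() on the long table counts mentions, not postings. If a posting listed the same skill twice, that posting would be counted twice. This analysis does not check whether such repeats occur in the dataset. The sketch below, using a hypothetical toy table shaped like skills_clean, shows one way to count each skill at most once per posting and express the result as a share of all postings. The share column is an illustrative addition, not part of the original analysis.

```r
library(dplyr)

# Hypothetical long-format data (one row per posting-skill pair), shaped like skills_clean.
# Posting "b" lists SQL twice to show the difference between mentions and postings.
toy_long <- tibble(
  job_link   = c("a", "a", "b", "b", "b", "c"),
  job_skills = c("Python", "SQL", "SQL", "SQL", "Tableau", "Python")
)

n_postings <- n_distinct(toy_long$job_link)  # 3 postings in total

skill_share <- toy_long %>%
  distinct(job_link, job_skills) %>%  # keep each skill at most once per posting
  count(job_skills, sort = TRUE) %>%  # number of postings mentioning each skill
  mutate(share = n / n_postings)      # fraction of all postings

print(skill_share)
```

Here SQL has three mentions but appears in only two of the three postings, so its share is 2/3. Applied to skills_clean, the same pattern would show what fraction of the 12,217 postings request each skill.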
skill_counts %>%
  slice_max(n, n = 10) %>%                                  # keep the 10 most frequent skills
  ggplot(aes(x = reorder(job_skills, n), y = n)) +          # order bars by frequency
  geom_col(fill = "orange") +
  coord_flip() +                                            # horizontal bars for readable labels
  labs(
    title = "Top 10 Most Valued Data Science Skills (Job Postings)",
    x = "Skill",
    y = "Frequency"
  ) +
  theme_minimal()
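Python and SQL lead by a wide margin, with 4,801 and 4,606 mentions respectively, nearly twice as many as the third-ranked skill, Communication (2,498). Data Analysis (2,181) and Machine Learning (1,966) follow. The rest of the top 10 mixes the cloud platform AWS, the visualization tool Tableau, the broader skill of Data Visualization, and the programming languages R and Java. Communication and Data Visualization rank alongside programming and querying skills, which suggests that employers want data scientists who balance technical proficiency with analytical and storytelling skills.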