Research Question: What specific skills are essential for various data science positions across industries?

By: Daniel Brusche, Tiffany Hugh, Luis Fernando Munoz Grass

#install.packages("tidyverse")
library(tidyverse)
# Read the CSV file from the URL
url <- "https://raw.githubusercontent.com/tiffhugh/Data-Acquisition-Mangement-/refs/heads/main/articles.csv"
article_data <- read.csv(url)
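
The summary further down groups by the `Source` and `Type` columns, so it may be worth confirming they are present right after the download. The sketch below is one way to do that. `check_columns` is a hypothetical helper introduced here, and `demo` is a small stand-in for `article_data` so the block runs offline.

```r
# Hypothetical helper (not part of the original analysis): stop early if
# any column the later code relies on is missing from the data frame.
check_columns <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(df)
}

# Stand-in for article_data; only the two columns used later are assumed.
demo <- data.frame(
  Source = c("Coursera", "DataCamp"),
  Type   = c("Soft", "Technical")
)

check_columns(demo, c("Source", "Type"))
```

In the real workflow the call would be `check_columns(article_data, c("Source", "Type"))`.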

Analysis

For the analysis, we summarize the data by source and skill type to see how technical skills, programming languages, and soft skills break down across the articles. We then visualize the counts in a grouped bar chart.

#install.packages("dplyr")
#install.packages("ggplot2")
library(dplyr)
library(ggplot2)


source_skill_count <- article_data %>%
  group_by(Source, Type) %>%
  summarise(Count = n(), .groups = 'drop')

# Print the result
print(source_skill_count)
## # A tibble: 14 × 3
##    Source             Type                  Count
##    <chr>              <chr>                 <int>
##  1 Coursera           Soft                      2
##  2 Coursera           Technical                 4
##  3 DataCamp           Programming Languages     4
##  4 DataCamp           Soft                      5
##  5 DataCamp           Technical                 6
##  6 Geek4Geek          Programming Languages     2
##  7 Geek4Geek          Soft                      3
##  8 Geek4Geek          Technical                 5
##  9 Linkedin           Programming Languages     1
## 10 Linkedin           Soft                      2
## 11 Linkedin           Technical                 4
## 12 Tableau            Soft                      7
## 13 Tableau            Technical                 3
## 14 TowardsDataScience Soft                      5
# Grouped bar chart of skill counts per source.
# reorder() sorts sources by the mean of -Count across their rows (its default summary).
ggplot(source_skill_count, aes(x = reorder(Source, -Count), y = Count, fill = Type)) +
  geom_col(position = "dodge") +  # same result as geom_bar(stat = "identity")
  labs(title = "Source Count by Skill Type", x = "Source", y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),  # Rotate x-axis labels for readability
        legend.title = element_blank()) +  # Remove legend title
  scale_fill_viridis_d(option = "C")
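
The dodged bar chart leaves a gap wherever a source has no skills of a given type. As a complementary view, the sketch below reshapes the same counts into one row per source with explicit zeros. The counts are copied from the printed summary table so the block runs without the CSV. `wide_counts` and `Total` are names introduced here for illustration, not part of the original analysis.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Counts copied from the printed summary table.
source_skill_count <- tribble(
  ~Source,              ~Type,                   ~Count,
  "Coursera",           "Soft",                  2,
  "Coursera",           "Technical",             4,
  "DataCamp",           "Programming Languages", 4,
  "DataCamp",           "Soft",                  5,
  "DataCamp",           "Technical",             6,
  "Geek4Geek",          "Programming Languages", 2,
  "Geek4Geek",          "Soft",                  3,
  "Geek4Geek",          "Technical",             5,
  "Linkedin",           "Programming Languages", 1,
  "Linkedin",           "Soft",                  2,
  "Linkedin",           "Technical",             4,
  "Tableau",            "Soft",                  7,
  "Tableau",            "Technical",             3,
  "TowardsDataScience", "Soft",                  5
)

# One row per source, one column per skill type; absent combinations become 0.
wide_counts <- source_skill_count %>%
  pivot_wider(names_from = Type, values_from = Count, values_fill = 0) %>%
  mutate(Total = Soft + Technical + `Programming Languages`) %>%
  arrange(desc(Total))

print(wide_counts)
```

Sorting by `Total` is a design choice: it ranks sources by how many skills they list overall, which can differ from the mean-based ordering used on the chart's x-axis.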

Findings

Based on our analysis of data science articles, we identified key skills and grouped them into programming languages, technical skills, and soft skills. Programming languages, such as Python, SQL, and R, are the core tools for analyzing and manipulating data, which makes them relevant to data scientists across industries. Technical skills cover specialized tools and software, such as Tableau and TensorFlow, used for tasks like data visualization and machine learning. Soft skills, such as leadership and communication, are the interpersonal abilities that help professionals manage teams, collaborate effectively, and solve complex problems.

Across the analyzed sources, technical skills are emphasized consistently, highlighting their foundational role in data science: they appear in five of the six sources and account for 22 of the 53 skills recorded. Soft skills are at least as prominent, appearing in all six sources with 24 entries, which reflects their growing importance in a collaborative field like data science. One source, TowardsDataScience, listed only soft skills, and Tableau listed more soft skills than technical ones, likely because these attributes are essential for effective teamwork and communication in diverse work environments. Programming languages appear in only three sources: DataCamp, GeeksforGeeks (labeled Geek4Geek in the data), and LinkedIn. The well-known coding resources DataCamp and GeeksforGeeks list the most, underscoring the demand for skills like Python, SQL, and R. This focus may reflect industry needs, but it may also serve to promote their coding programs, positioning them as key players in the education and training of aspiring data scientists.
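
A rough way to check how the three skill types compare is to total the counts per type and count how many sources mention each one. The sketch below does this with the counts copied from the printed summary table. `type_summary` and `single_type_sources` are names introduced here for illustration, not part of the original analysis.

```r
library(dplyr)
library(tibble)

# Counts copied from the printed summary table.
source_skill_count <- tribble(
  ~Source,              ~Type,                   ~Count,
  "Coursera",           "Soft",                  2,
  "Coursera",           "Technical",             4,
  "DataCamp",           "Programming Languages", 4,
  "DataCamp",           "Soft",                  5,
  "DataCamp",           "Technical",             6,
  "Geek4Geek",          "Programming Languages", 2,
  "Geek4Geek",          "Soft",                  3,
  "Geek4Geek",          "Technical",             5,
  "Linkedin",           "Programming Languages", 1,
  "Linkedin",           "Soft",                  2,
  "Linkedin",           "Technical",             4,
  "Tableau",            "Soft",                  7,
  "Tableau",            "Technical",             3,
  "TowardsDataScience", "Soft",                  5
)

# Total skills per type and the number of sources that list that type.
type_summary <- source_skill_count %>%
  group_by(Type) %>%
  summarise(Total = sum(Count), Sources = n_distinct(Source), .groups = "drop") %>%
  arrange(desc(Total))

# Sources that list skills of only one type.
single_type_sources <- source_skill_count %>%
  group_by(Source) %>%
  filter(n() == 1) %>%
  ungroup()

print(type_summary)
print(single_type_sources)
```

With these counts, soft skills total 24 across six sources, technical skills 22 across five, and programming languages 7 across three. TowardsDataScience is the only source that lists a single skill type.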

Building on these insights into the skills the articles emphasize, we next aim to determine how closely those skills align with industry demand, with the goal of understanding what prepares data scientists for success in real-world roles.