Let’s load the dataset and take a quick look at the first few rows to understand its structure.
# Load data
data <- read_csv("https://raw.githubusercontent.com/simonchy/DATA607/refs/heads/main/week%208/Cleaned_Augmented_Data_Science_Skills.csv")
## Rows: 32 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (14): Timestamp, First Name or Nickname, Email Address, Any data science...
## dbl (1): Age
## lgl (1): List the 5 most valuable data science skills (separated by commas)
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Display the first few rows of the data
head(data)
## # A tibble: 6 × 16
## Timestamp First Name or Nickna…¹ List the 5 most valu…² `Email Address` Age
## <chr> <chr> <lgl> <chr> <dbl>
## 1 10/23/24 … Md Asaduzzaman NA m.zaman3201@gm… 42
## 2 10/23/24 … Alex NA alexander.ptac… 27
## 3 10/23/24 … Inna NA innayedzinovic… 29
## 4 10/23/24 … Cindy NA cindylin90@gma… 34
## 5 10/23/24 … Sarika NA ssgupta.phd@gm… 46
## 6 10/24/24 … Halyna NA shgalin11@gmai… 27
## # ℹ abbreviated names: ¹`First Name or Nickname`,
## # ²`List the 5 most valuable data science skills (separated by commas)`
## # ℹ 11 more variables: `Any data science/data analytics experience?` <chr>,
## # `Any software engineering experience?` <chr>,
## # `Which programming languages do you use most frequently?` <chr>,
## # `What resources do you use for learning new data science skills?` <chr>,
## # `What areas of data science are you most interested in learning more about?` <chr>, …
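The column specification above reports one logical column, which is typically what readr guesses when every value in a column is missing. A minimal sketch of how such columns could be spotted is shown here on a small made-up tibble; the names `toy`, `na_counts`, and `empty_cols` are hypothetical and are not part of the original analysis.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the survey data: one column is entirely NA
toy <- tibble(
  name = c("A", "B", "C"),
  age = c(42, NA, 29),
  free_text_skills = c(NA, NA, NA)
)

# Count the missing values in every column
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Names of the columns that contain no data at all
empty_cols <- names(toy)[colSums(is.na(toy)) == nrow(toy)]
print(na_counts)
print(empty_cols)
```

Running the same two steps on `data` would show whether the free-text skills column is empty in every row, in which case it can be ignored in the rest of the analysis.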
We’ll clean up the data by standardizing the column names, reshaping the five ranked-skill columns into long format, and dropping rows where the skill is missing.
# Standardize column names to avoid spaces and special characters
colnames(data) <- make.names(colnames(data))
# Select only the columns with the top skills
skills_data <- data %>%
  select(Name..1.most.most.valuable.data.science.skill,
         Name..2.most.most.valuable.data.science.skill,
         Name..3.most.most.valuable.data.science.skill,
         Name..4.most.most.valuable.data.science.skill,
         Name..5.most.most.valuable.data.science.skill) %>%
  pivot_longer(cols = everything(),
               names_to = "skill_rank",
               values_to = "skill") %>%
  filter(!is.na(skill)) # Remove rows with NA skills
# Check cleaned and reshaped data
head(skills_data)
## # A tibble: 6 × 2
## skill_rank skill
## <chr> <chr>
## 1 Name..1.most.most.valuable.data.science.skill R language skill
## 2 Name..2.most.most.valuable.data.science.skill Python skill
## 3 Name..3.most.most.valuable.data.science.skill Statistics and math skill
## 4 Name..4.most.most.valuable.data.science.skill Data Visualization skill
## 5 Name..5.most.most.valuable.data.science.skill SQL skill
## 6 Name..1.most.most.valuable.data.science.skill data cleaning
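To make the two cleaning steps easier to follow, here is a small sketch on made-up inputs. The raw header strings and the `wide` table are assumptions for illustration only; the survey's actual headers may differ. It shows how `make.names()` replaces spaces and symbols with dots, and how `pivot_longer()` stacks one column per rank into one row per skill.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical raw headers, assumed to resemble the survey's originals
raw_names <- c("Email Address", "Name #1 most most valuable data science skill")

# Each character that is not valid in an R name becomes a dot
clean_names <- make.names(raw_names)

# Toy wide table: one row per respondent, one column per ranked skill
wide <- tibble(
  skill_1 = c("Python", "SQL"),
  skill_2 = c("Statistics", NA)
)

# Stack the rank columns, keep only the rank number, drop missing skills
long <- wide %>%
  pivot_longer(cols = everything(),
               names_to = "skill_rank",
               names_prefix = "skill_",
               values_to = "skill") %>%
  filter(!is.na(skill))
print(clean_names)
print(long)
```

The `names_prefix` argument is optional; it is used here only to shorten the `skill_rank` labels. Passing `values_drop_na = TRUE` to `pivot_longer()` would be an alternative to the separate `filter()` step.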
Let’s calculate the frequency of each skill to determine which are the most valued.
# Count the occurrences of each skill
skill_counts <- skills_data %>%
  group_by(skill) %>%
  summarise(count = n()) %>%
  arrange(desc(count))
# Display the top skills
head(skill_counts, 10)
## # A tibble: 10 × 2
## skill count
## <chr> <int>
## 1 Resourcefulness 10
## 2 Critical Thinking 9
## 3 Collaboration 8
## 4 Creativity 8
## 5 Data Visualization 8
## 6 Data Cleaning 7
## 7 Python 7
## 8 SQL 7
## 9 Statistical Analysis 7
## 10 Teamwork 7
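Counts like these are sensitive to spelling variants: `group_by()` treats "Data Cleaning" and "data cleaning" as different skills. If the responses contain such variants, a normalization step before counting may help. The following is a sketch on made-up values, assuming that lower-casing and trimming whitespace is an acceptable way to merge labels; `toy_skills` and `normalized_counts` are hypothetical names.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical responses with inconsistent case and spacing
toy_skills <- tibble(
  skill = c("Data Cleaning", "data cleaning ", "SQL", " sql", "Python")
)

# Trim and collapse whitespace, lower-case the labels, then count
normalized_counts <- toy_skills %>%
  mutate(skill = str_to_lower(str_squish(skill))) %>%
  count(skill, name = "count", sort = TRUE)
print(normalized_counts)
```

Here `count(skill, name = "count", sort = TRUE)` is shorthand for the `group_by()`, `summarise()`, and `arrange()` sequence used above.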
Finally, we visualize the most valued data science skills using a bar plot.
# Plot the top 10 most valued skills
library(ggplot2)
ggplot(skill_counts[1:10, ], aes(x = reorder(skill, count), y = count)) +
  geom_col(fill = "steelblue") + # geom_col() is equivalent to geom_bar(stat = "identity")
  coord_flip() +
  labs(title = "Top 10 Most Valued Data Science Skills",
       x = "Skill",
       y = "Count") +
  theme_minimal()
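The positional subset `skill_counts[1:10, ]` cuts the table at row 10 even when later rows share the same count as row 10, and several skills in the table are tied at 7. A possible alternative, sketched below on made-up counts, is `slice_max()` with `with_ties = TRUE`, which keeps every row tied with the last selected one. The tibble `toy_counts` is hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical counts with a tie at the cutoff
toy_counts <- tibble(
  skill = c("A", "B", "C", "D"),
  count = c(5L, 3L, 3L, 1L)
)

# Positional subset: keeps exactly two rows and drops the tied skill "C"
top_fixed <- toy_counts[1:2, ]

# slice_max() keeps all rows tied with the second-highest count
top_with_ties <- toy_counts %>%
  slice_max(count, n = 2, with_ties = TRUE)
print(top_fixed)
print(top_with_ties)
```

Whether to keep ties is a judgment call: keeping them avoids an arbitrary cutoff, but the plot may then show more than ten bars.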
Based on our analysis, the top skills identified in the dataset are:
# Display top skills as a list
skill_counts[1:10, ]
## # A tibble: 10 × 2
## skill count
## <chr> <int>
## 1 Resourcefulness 10
## 2 Critical Thinking 9
## 3 Collaboration 8
## 4 Creativity 8
## 5 Data Visualization 8
## 6 Data Cleaning 7
## 7 Python 7
## 8 SQL 7
## 9 Statistical Analysis 7
## 10 Teamwork 7
According to the 32 survey responses, these are the most valued abilities in the data science field. Resourcefulness ranks first with 10 mentions, followed by Critical Thinking with 9.