I have introduced the term “Data Practitioner” as a generic job descriptor because there are so many different job titles for individuals whose work activities overlap, including Data Scientist, Data Engineer, Data Analyst, Business Analyst, Data Architect, etc. For this story we will answer the question, “How much do we get paid?” Your analysis and data visualizations must address the variation in average salary based on role descriptor and state.
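library(dplyr)    # %>%, group_by(), summarise(), arrange(), left_join()
library(ggplot2)  # plotting; map_data() also needs the maps package installed
library(scales)   # dollar() label formatting
url <- "https://raw.githubusercontent.com/crystaliquezada/data608_story4/main/Data%20Practitioner%20Salaries%20-%20Sheet1.csv"
data <- read.csv(url)
head(data)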
##              Job State Salary
## 1 Data Scientist    AL  99040
## 2 Data Scientist    AK  91710
## 3 Data Scientist    AZ 112470
## 4 Data Scientist    AR 117250
## 5 Data Scientist    CA 140490
## 6 Data Scientist    CO 120320
First, let’s define the data practitioner roles:
Data Scientist: A professional who cleans, analyzes, and interprets complex data to build predictive models, with the goal of informing and guiding business strategy.
Data Engineer: A professional who designs, builds, and maintains data pipelines to ensure efficient data collection, storage, and retrieval.
Data Architect: A professional who oversees the structure of data systems. They design the frameworks that data engineers implement and manage an organization’s overall data infrastructure.
Data Analyst: Similar to data scientists, data analysts collect, clean, and interpret data. However, they typically focus on historical data to generate insights and support business decisions.
Business Analyst: A professional who bridges business stakeholders and technical teams by analyzing processes and using data-driven insights to implement improvements.
So, what do we get paid?
# Average salary per role, sorted from lowest to highest
role_avg <- data %>%
  group_by(Job) %>%
  summarise(avg_salary = mean(Salary)) %>%
  arrange(avg_salary)

# Role drawn in blue; every other bar is drawn in gray
highlight_role <- "Data Architect"

ggplot(role_avg, aes(x = reorder(Job, avg_salary), y = avg_salary, fill = Job == highlight_role)) +
  geom_col(show.legend = FALSE) +
  # Print the average at the end of each bar instead of using an axis
  geom_text(aes(label = dollar(avg_salary)), hjust = -0.05, size = 3.5) +
  coord_flip() +
  scale_fill_manual(values = c("TRUE" = "#2C7BE5", "FALSE" = "gray80")) +
  labs(
    title = "What Data Practitioner Role Gets Paid the Most?",
    subtitle = "Data Architects have the highest average salary",
    x = "",
    y = ""
  ) +
  # Extra room on the right so the bar labels are not clipped
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.text.y = element_text(size = 11)
  )
The average salary of data practitioners ranges from about $65,000 to almost $140,000, depending on the role. Of the five roles, data architects have the highest average salary, which may reflect the importance businesses place on how their data is organized.
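The range quoted above can be read straight off the per-role averages rather than off the chart. The following is a minimal sketch in base R, not the code used for the figure: `toy` is a small stand-in for the `data` frame, `summarise_roles()` is a hypothetical helper, the three Data Scientist salaries come from the `head(data)` output, and the Business Analyst salaries are made-up placeholders.

```r
# Toy stand-in for `data`. Data Scientist values are from head(data);
# Business Analyst values are hypothetical placeholders.
toy <- data.frame(
  Job    = c("Data Scientist", "Data Scientist", "Data Scientist",
             "Business Analyst", "Business Analyst", "Business Analyst"),
  State  = c("AL", "AK", "AZ", "AL", "AK", "AZ"),
  Salary = c(99040, 91710, 112470, 60000, 65000, 70000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: average salary per role, sorted lowest to highest
summarise_roles <- function(df) {
  avg <- aggregate(Salary ~ Job, data = df, FUN = mean)
  names(avg)[names(avg) == "Salary"] <- "avg_salary"
  avg[order(avg$avg_salary), ]
}

role_summary <- summarise_roles(toy)
lowest  <- role_summary[1, ]
highest <- role_summary[nrow(role_summary), ]

# Relative gap between the best- and worst-paid role, in percent
gap_pct <- 100 * (highest$avg_salary / lowest$avg_salary - 1)

print(role_summary)
print(round(gap_pct, 1))
```

Calling `summarise_roles(data)` on the full data set should give the lowest and highest role averages behind the range stated in the text.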
Next, we examine how salary varies across the states.
# Average salary per state across all five roles
state_avg <- data %>%
  group_by(State) %>%
  summarise(avg_salary = mean(Salary))

# Map two-letter abbreviations to the lowercase state names used by map_data()
state_lookup <- data.frame(
  State = state.abb,
  region = tolower(state.name)
)

state_avg <- state_avg %>%
  left_join(state_lookup, by = "State")

# Polygon outlines for the contiguous US; AK and HI are not part of this map
us_map <- map_data("state")

map_data_final <- us_map %>%
  left_join(state_avg, by = "region")

ggplot(map_data_final, aes(x = long, y = lat, group = group, fill = avg_salary)) +
  geom_polygon(color = "white") +
  scale_fill_gradient(
    low = "gray",
    high = "#2C7BE5",
    labels = dollar
  ) +
  labs(
    title = "Average Data Practitioner Salary by State",
    fill = "Salary"
  ) +
  theme_void() +
  theme(plot.title = element_text(face = "bold", size = 14))
States that are home to major technology hubs, such as New York, California, and Washington, have the highest average salaries across all data practitioner roles (over $110,000). Central states pay less but still average more than $90,000, so geography is a clear driver of pay.
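Two joins sit behind the map, and either one can silently drop a state. The sketch below shows one way to check for that in base R, under stated assumptions: `toy_state_avg` is a made-up stand-in for `state_avg` (only the AL, AK, and AZ values appear in the `head(data)` output, and those are Data Scientist salaries rather than state averages), and whether the real file contains a DC row is not known from the text. It relies on two facts: `state.abb` lists the 50 states only, and the `"state"` map covers the contiguous states, so Alaska and Hawaii are never drawn.

```r
# Toy stand-in for `state_avg`; the HI and DC values are hypothetical.
toy_state_avg <- data.frame(
  State      = c("AL", "AK", "AZ", "HI", "DC"),
  avg_salary = c(99040, 91710, 112470, 100000, 120000),
  stringsAsFactors = FALSE
)

# Same lookup as in the report: abbreviation -> lowercase state name
lookup <- data.frame(
  State  = state.abb,
  region = tolower(state.name),
  stringsAsFactors = FALSE
)

# Left join that keeps every row of the salary table
joined <- merge(toy_state_avg, lookup, by = "State", all.x = TRUE)

# Abbreviations with no state name (these would be gray or missing on the map)
no_region <- joined$State[is.na(joined$region)]

# States that have a salary but no polygon on the contiguous-US map
off_map <- joined$State[joined$State %in% c("AK", "HI")]

print(no_region)
print(off_map)
```

Running the same two checks on the real `state_avg` would show whether any state's average is missing from the map, which matters when reading regional patterns off it.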
Finally, we examine how role and geography work together.
# One gray point per state and role; the blue dot marks each role's average
ggplot(data, aes(x = Salary, y = reorder(Job, Salary, FUN = mean))) +
  # Jitter vertically only (width = 0) so salary values are not shifted
  geom_jitter(alpha = 0.4, color = "gray60", height = 0.2, width = 0) +
  stat_summary(fun = mean, geom = "point", size = 4, color = "#2C7BE5") +
  labs(
    title = "Salary Variation Across States",
    subtitle = "Gray point = one state, blue dot = role average",
    x = "",
    y = ""
  ) +
  # Keep the dollar axis visible so the spread can be read off the plot
  scale_x_continuous(labels = scales::dollar) +
  theme_minimal() +
  theme(
    axis.text.y = element_text(size = 11),
    plot.title = element_text(face = "bold")
  )
Data architects and data scientists have two of the highest average salaries among data practitioner roles, and their salaries also vary more from state to state than those of the other roles. This suggests that pay in the higher-paying roles depends on both position and geography.
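The variability claim is made by eye from the jitter plot, and it can be backed with a number. The following is a minimal sketch, assuming nothing beyond the `Job` and `Salary` columns: `spread_by_role()` is a hypothetical helper and every salary in `toy` is a made-up placeholder. Because roles with higher averages tend to have larger absolute spreads, the sketch also reports the coefficient of variation (standard deviation divided by the mean), which compares spread relative to pay level.

```r
# Toy stand-in for `data`; all salaries here are hypothetical placeholders.
toy <- data.frame(
  Job    = rep(c("Data Architect", "Data Analyst"), each = 4),
  State  = rep(c("AL", "CA", "NY", "TX"), times = 2),
  Salary = c(120000, 160000, 150000, 130000,
             70000, 80000, 78000, 72000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: spread of state-level salaries within each role
spread_by_role <- function(df) {
  stats <- lapply(split(df$Salary, df$Job), function(s) {
    c(mean = mean(s),          # role average
      sd   = sd(s),            # absolute spread across states
      iqr  = IQR(s),           # spread of the middle half of states
      cv   = sd(s) / mean(s))  # spread relative to the role average
  })
  as.data.frame(do.call(rbind, stats))
}

role_spread <- spread_by_role(toy)

# Roles ordered from most to least variable relative to their average
print(role_spread[order(-role_spread$cv), ])
```

If the ordering by `cv` on the real data matches the ordering by `sd`, the higher-paying roles are more variable even after accounting for their higher pay level; if not, the wider scatter is partly a matter of scale.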
Data practitioner salaries vary by both role and geographic location, with more technical roles and coastal states commanding higher pay. Ultimately, data architects hold the highest average salary of the five data practitioner roles examined here.
Sources
https://www.bls.gov/oes/2023/may/oes152051.htm#
https://www.zippia.com/salaries/data-engineer/
https://www.zippia.com/advice/data-analyst-salary-by-state/
https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Business-Analyst-Salary-by-State
https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Lead-DATA-Architect-Salary-by-State