Comment:

- **Nationally, Data Engineers and Data Scientists earn meaningfully higher and more variable salaries than Business Analysts and Data Analysts.** The shapes of the distributions also show that technical depth and engineering complexity correlate with both higher medians and wider pay ranges.
- Business Analyst (BA) roles tend to be more structured and less specialized, so compensation is predictable and has a lower ceiling.
- Data Analyst (DA) roles can scale upward when analytics blends into engineering or modeling, but the core market rate remains modest.
- Data Engineer (DE) is one of the most in-demand roles; its salary reflects the complexity of building scalable data systems.
- Data Scientist (DS) compensation varies widely depending on whether the role is analytics-focused or ML-engineering-focused, but the ceiling is high.
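The comparison of medians and pay ranges above can be checked numerically. The following is a minimal base-R sketch, assuming a long table with `role` and `salary` columns like `Data_Practitioner_Salaries`; the `toy` data frame and its numbers are made up for illustration and are not the report's data.

```r
# Hypothetical toy data in the same long shape as Data_Practitioner_Salaries.
# The salaries are invented for illustration only.
toy <- data.frame(
  state  = rep(c("A", "B", "C", "D"), times = 2),
  role   = rep(c("Data_Scientist", "Business_Analyst"), each = 4),
  salary = c(100000, 120000, 140000, 160000,
              70000,  75000,  80000,  85000),
  stringsAsFactors = FALSE
)

# Median (typical pay) and interquartile range (spread) for each role
role_median <- tapply(toy$salary, toy$role, median)
role_iqr    <- tapply(toy$salary, toy$role, IQR)

role_summary <- data.frame(
  role   = names(role_median),
  median = as.numeric(role_median),
  iqr    = as.numeric(role_iqr)
)
role_summary
```

A role with both a higher `median` and a larger `iqr` matches the pattern described for DS and DE; swapping `toy` for the real long table would give the figures behind the boxplot.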
```{r}
# One two-column table per role: the role's salary plus State
Data_Scientist1 <- select(Data_Practitioner, c("Data_Scientist", "State"))
Data_Engineer1 <- select(Data_Practitioner, c("Data_Engineer", "State"))
Data_Analyst1 <- select(Data_Practitioner, c("Data_Analyst", "State"))
Business_Analyst1 <- select(Data_Practitioner, c("Business_Analyst", "State"))

# Bar chart: Data Scientist
# Columns are referenced by bare name inside aes(), not with `$`
bc_DS <- ggplot(Data_Scientist1,
                aes(x = fct_reorder(State, Data_Scientist), y = Data_Scientist, fill = State)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = label_comma()) +
  theme_minimal() +
  theme(legend.position = "none",
        text = element_text(size = 8),
        axis.title = element_text(size = 12),
        plot.title = element_text(size = 10)) +
  labs(title = "US Average Data Scientist Salaries by State / Territory",
       x = "State or Territory",
       y = "Annual Average Salary")
bc_DS
```
```{r}
# Bar chart: Data Engineer
bc_DE <- ggplot(Data_Engineer1,
                aes(x = fct_reorder(State, Data_Engineer), y = Data_Engineer, fill = State)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = label_comma()) +
  theme_minimal() +
  theme(legend.position = "none",
        text = element_text(size = 8),
        axis.title = element_text(size = 12),
        plot.title = element_text(size = 10)) +
  labs(title = "US Average Data Engineer Salaries by State / Territory",
       x = "State or Territory",
       y = "Annual Average Salary")
bc_DE
```
```{r}
# Bar chart: Data Analyst
bc_DA <- ggplot(Data_Analyst1,
                aes(x = fct_reorder(State, Data_Analyst), y = Data_Analyst, fill = State)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = label_comma()) +
  theme_minimal() +
  theme(legend.position = "none",
        text = element_text(size = 8),
        axis.title = element_text(size = 12),
        plot.title = element_text(size = 10)) +
  labs(title = "US Average Data Analyst Salaries by State / Territory",
       x = "State or Territory",
       y = "Annual Average Salary")
bc_DA
```
```{r}
# Bar chart: Business Analyst
bc_BA <- ggplot(Business_Analyst1,
                aes(x = fct_reorder(State, Business_Analyst), y = Business_Analyst, fill = State)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = label_comma()) +
  theme_minimal() +
  theme(legend.position = "none",
        text = element_text(size = 8),
        axis.title = element_text(size = 12),
        plot.title = element_text(size = 10)) +
  labs(title = "US Average Business Analyst Salaries by State / Territory",
       x = "State or Territory",
       y = "Annual Average Salary")
bc_BA
```
```{r}
### Bar chart of the average Data Practitioner salary across the four roles, by state
Data_Practitioner$Avg_Salary <- rowMeans(
  Data_Practitioner[, c("Data_Scientist", "Data_Engineer", "Data_Analyst", "Business_Analyst")],
  na.rm = TRUE
)

bc_state <- ggplot(Data_Practitioner,
                   aes(x = fct_reorder(State, Avg_Salary), y = Avg_Salary, fill = State)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = label_comma()) +
  theme_minimal() +
  theme(legend.position = "none",
        text = element_text(size = 8),
        axis.title = element_text(size = 12),
        plot.title = element_text(size = 10)) +
  labs(title = "US Average Data Practitioner Salaries by State / Territory",
       x = "State or Territory",
       y = "Average Salary Across Roles")
bc_state
```
```{r}
# Faceted bar chart of the top 20 states for all four roles in one plot
# (states ranked by Data Scientist salary)
top20_states <- Data_Practitioner |>
  arrange(desc(Data_Scientist)) |>
  slice(1:20) |>
  pull(State)

dp_top20 <- Data_Practitioner |>
  filter(State %in% top20_states) |>
  pivot_longer(cols = c(Data_Scientist, Data_Engineer, Data_Analyst, Business_Analyst),
               names_to = "Role",
               values_to = "Salary")

ggplot(dp_top20, aes(x = fct_reorder(State, Salary), y = Salary, fill = State)) +
  geom_col() +
  coord_flip() +
  facet_wrap(~ Role, scales = "free_y") +
  scale_y_continuous(labels = scales::label_comma()) +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(title = "Top 20 Highest-Paying States for Data Practitioner Roles",
       x = "State",
       y = "Average Salary")
```
**Interpretation:**

Geography amplifies salary differences but does not change the ranking of roles. High-cost states lift all roles upward; low-cost states compress salaries downward.
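One way to test the claim that the ranking of roles holds across states is to write out each state's role ordering and look for states that deviate from the most common one. This is a base-R sketch under stated assumptions: `toy` is a hypothetical wide table shaped like `Data_Practitioner`, and its values are invented.

```r
# Hypothetical wide table (one row per state, one column per role).
# Values are invented for illustration only.
toy <- data.frame(
  State            = c("A", "B", "C"),
  Data_Scientist   = c(150000, 110000, 95000),
  Data_Engineer    = c(145000, 105000, 97000),
  Data_Analyst     = c(95000, 75000, 70000),
  Business_Analyst = c(90000, 72000, 68000),
  stringsAsFactors = FALSE
)
roles <- c("Data_Scientist", "Data_Engineer", "Data_Analyst", "Business_Analyst")

# For each state, list the roles from highest to lowest salary
role_order <- apply(toy[, roles], 1, function(x) {
  paste(names(sort(x, decreasing = TRUE)), collapse = " > ")
})
names(role_order) <- toy$State

# The most common ordering, and the states that deviate from it
modal_order <- names(which.max(table(role_order)))
deviating   <- role_order[role_order != modal_order]
deviating
```

An empty `deviating` vector on the real table would support the statement as written; any states listed there are exceptions worth naming.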
```{r}
# Reshape to long format: one row per state and role
dp_long <- Data_Practitioner |>
  pivot_longer(cols = c(Data_Scientist, Data_Engineer, Data_Analyst, Business_Analyst),
               names_to = "Role",
               values_to = "Salary")

# Heatmap of salary by state and role
ggplot(dp_long, aes(x = Role, y = State, fill = Salary)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(option = "plasma") +
  theme_minimal() +
  labs(title = "Heatmap of Data Practitioner Salaries by State and Role",
       x = "Role",
       y = "State",
       fill = "Salary")
```
**Interpretation:**

The heatmap confirms a stable national hierarchy of compensation across data roles, suggesting that role complexity and technical depth drive salary more than geography does.

Some states show **uniformly higher salaries across all roles**, indicated by lighter colors:

- These are likely high-cost, high-demand states (e.g., CA, NY, MA, WA).
- They show bright yellow for DS/DE and lighter orange tones for BA/DA.

Other states show **uniformly lower salaries**, indicated by darker purples:

- These are typically lower-cost regions (e.g., southern or midwestern states).
- Even Data Scientist salaries in these states fall into mid-range colors.
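The "uniformly higher" and "uniformly lower" groups can be made explicit instead of read off the colors. The sketch below uses one possible definition, which is an assumption rather than the report's method: a state is uniformly high if it is above the national median for every role, and uniformly low if it is below the median for every role. The `toy` table and its values are hypothetical.

```r
# Hypothetical wide table shaped like Data_Practitioner; values are invented.
toy <- data.frame(
  State            = c("A", "B", "C", "D"),
  Data_Scientist   = c(160000, 150000, 110000, 100000),
  Data_Engineer    = c(155000, 150000, 105000, 108000),
  Data_Analyst     = c(100000,  95000,  75000,  70000),
  Business_Analyst = c( 95000,  90000,  70000,  72000),
  stringsAsFactors = FALSE
)
roles <- c("Data_Scientist", "Data_Engineer", "Data_Analyst", "Business_Analyst")

# Logical matrices: is each state above / below the median for each role?
above <- sapply(toy[, roles], function(x) x > median(x))
below <- sapply(toy[, roles], function(x) x < median(x))

# States that sit on the same side of the median for all four roles
uniformly_high <- toy$State[rowSums(above) == length(roles)]
uniformly_low  <- toy$State[rowSums(below) == length(roles)]

uniformly_high
uniformly_low
```

A stricter cutoff, such as the top quartile, would only require replacing `median(x)` with `quantile(x, 0.75)` in the `above` line.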
Comment: The heatmap shows **consistent salary stratification by role** across nearly all states: **Data Scientists and Data Engineers earn the most**, while **Business Analysts and Data Analysts earn less**, with noticeable geographic clusters where all roles command higher pay.

The heatmap quickly communicates **relative differences** across both roles and states.

- Some states show **greater spread** between roles (large color contrast), while others show **compressed ranges** (similar tones across roles).
- High-tech hubs show **wider spreads**; DS/DE salaries spike upward more dramatically.
- Smaller or lower-demand states show **narrow spreads**, meaning the market differentiates less between roles.
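The spread between roles described in the list above can be quantified per state as the gap between the highest-paid and lowest-paid role. This is a minimal base-R sketch; `toy` is a hypothetical table shaped like `Data_Practitioner`, with invented values, and the `Spread` column name is introduced here for illustration.

```r
# Hypothetical wide table shaped like Data_Practitioner; values are invented.
toy <- data.frame(
  State            = c("A", "B", "C"),
  Data_Scientist   = c(160000, 110000, 90000),
  Data_Engineer    = c(155000, 105000, 88000),
  Data_Analyst     = c(100000,  75000, 80000),
  Business_Analyst = c( 95000,  70000, 78000),
  stringsAsFactors = FALSE
)
roles <- c("Data_Scientist", "Data_Engineer", "Data_Analyst", "Business_Analyst")

# Spread = highest role salary minus lowest role salary within each state
toy$Spread <- apply(toy[, roles], 1, function(x) max(x) - min(x))

# Rank states from widest to narrowest spread
spread_ranked <- toy[order(toy$Spread, decreasing = TRUE), c("State", "Spread")]
spread_ranked
```

States at the top of `spread_ranked` correspond to the high-contrast rows in the heatmap, and states at the bottom to the rows with similar tones across roles.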
Across nearly every state:

- **Data Scientist** salaries fall in the highest band (yellow/bright tones).
- **Data Engineer** salaries closely follow, often slightly lower but still in the upper range.
- **Data Analyst** salaries fall into mid-range colors (pinks and purples).
- **Business Analyst** salaries are consistently the lowest (darker purples).
```{r}
# Highlight the 20 highest-paying states for Data Scientists
top20_states <- Data_Scientist1 %>%
  arrange(desc(Data_Scientist)) %>%
  slice(1:20) %>%
  pull(State)

Data_Scientist1 <- Data_Scientist1 %>%
  mutate(Group = ifelse(State %in% top20_states, "Top 20", "Other States"))

bc_DS <- ggplot(Data_Scientist1,
                aes(x = fct_reorder(State, Data_Scientist), y = Data_Scientist, fill = Group)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = c("Top 20" = "#1f78b4",          # blue
                               "Other States" = "#b2df8a")) + # green
  scale_y_continuous(labels = label_comma()) +
  theme_minimal() +
  theme(legend.position = "none",
        text = element_text(size = 8),
        axis.title = element_text(size = 12),
        plot.title = element_text(size = 10)) +
  labs(title = "US Average Data Scientist Salaries by State / Territory",
       x = "State or Territory",
       y = "Annual Average Salary")
bc_DS
```
Source Code

```yaml
---
title: "DATA_608_Story_04"
author: "Henock Montcho"
format:
  html:
    code-fold: true
    code-tools: true
date: 2026-05-22
editor: visual
warning: false
---
```

```{r}
# Load libraries
library(tidytext)
library(readxl)
library(tidyverse)
library(sf)
library(tigris)
library(viridis)
library(usmap)
library(ggplot2)
library(forcats)
library(scales)
library(dplyr)

setwd("C:/Users/month/Downloads")

# Read each role's salary table and convert the formatted Salary text to numeric
Data_Scientist <- read_excel("C:/Users/month/Downloads/Data_Scientist.xlsx")
Data_Scientist$Salary <- as.numeric(gsub("[$,\\s]", "", Data_Scientist$Salary))
head(Data_Scientist)

Data_Engineer <- read_excel("C:/Users/month/Downloads/Data_Engineer.xlsx")
Data_Engineer$Salary <- as.numeric(gsub("[$,\\s]", "", Data_Engineer$Salary))
head(Data_Engineer)

Data_Analyst <- read_excel("C:/Users/month/Downloads/Data_Analyst.xlsx")
Data_Analyst$Salary <- as.numeric(gsub("[$,\\s]", "", Data_Analyst$Salary))
head(Data_Analyst)

Business_Analyst <- read_excel("C:/Users/month/Downloads/Business_Analyst.xlsx")
Business_Analyst$Salary <- as.numeric(gsub("[$,\\s]", "", Business_Analyst$Salary))
head(Business_Analyst)

# Join the four tables on State. Every table has a Salary column, so the joins
# add suffixes to the clashing names; rename them to one column per role.
Data_Practitioner <- Data_Scientist |>
  left_join(Data_Engineer, by = "State") |>
  left_join(Data_Analyst, by = "State") |>
  left_join(Business_Analyst, by = "State") |>
  rename(Data_Scientist = Salary.x,
         Data_Engineer = Salary.y,
         Data_Analyst = Salary.x.x,
         Business_Analyst = Salary.y.y)

# Convert to long
Data_Practitioner_Salaries <- Data_Practitioner |>
  pivot_longer(cols = c(Data_Scientist, Data_Engineer, Data_Analyst, Business_Analyst),
               names_to = "role",
               values_to = "salary") |>
  mutate(state = str_trim(as.character(State)),
         salary = as.numeric(salary))
head(Data_Practitioner_Salaries)

# Roles boxplot: one panel per role
palette <- c("#9BB8AD", "#A39FA1", "#DEB3A0", "#FEC6AF")
bp_role <- ggplot(Data_Practitioner_Salaries, aes(x = " ", y = salary, group = role)) +
  geom_boxplot(aes(fill = role)) +
  theme_minimal() +
  scale_y_continuous(labels = label_comma()) +
  facet_grid(. ~ role) +
  scale_fill_manual(values = palette) +
  theme(legend.position = "none") +
  theme(text = element_text(size = 12), axis.title = element_text(size = 12))
bp_role
```