I have introduced the term “Data Practitioner” as a generic job descriptor because we have so many different job role titles for individuals whose work activities overlap including Data Scientist, Data Engineer, Data Analyst, Business Analyst, Data Architect, etc.
For this story we will answer the question, “How much do we get paid?” Your analysis and data visualizations must address the variation in average salary based on role descriptor and state.
Notes:
1. You will need to identify reliable sources for salary data and assemble the data sets that you will need.
2. Your visualization(s) must show the most salient information (variation in average salary by role and by state).
3. For this Story you must use a code library and code that you have written in R, Python or JavaScript (additional coding in other languages is allowed).
4. Post-generation enhancements to your generated visualization will be allowed (e.g., addition of kickers and labels).
The modern data profession has evolved into a diverse ecosystem of roles — including Data Scientists, Data Engineers, Data Analysts, Business Analysts, and Data Architects. While these titles share overlapping skill sets, they differ in focus, responsibility, and compensation. This project explores the question: “How much do we get paid?” — analyzing salary variation across job roles and states within the United States.
The data for this analysis was collected from ZipRecruiter.com, a widely used public source of job market compensation data. Salary information was compiled for all 50 states and five data-related job titles (250 rows in total), creating a dataset suitable for comparing both role-based and geographical salary patterns.
This story is designed with clarity and purpose in mind. The visualizations were chosen to emphasize accuracy and interpretability:
Box plots reveal the distribution of salaries across job titles and states, providing insight into variability and outliers (Fidelity & Simplicity).
A choropleth map illustrates the average annual salary by state, highlighting regional differences at a glance (Utility & Saliency).
Overall, this dashboard presents a cohesive and truthful representation of salary patterns in the U.S. data profession. It demonstrates how both job specialization and location significantly influence pay, offering a valuable perspective for anyone interested in understanding compensation dynamics within the data workforce.
The central issue explored in this story is the uneven distribution of pay within the U.S. data profession. Despite similar skill overlaps, compensation often varies dramatically by role and geography.
This raises important questions for data practitioners:
- Is pay determined more by what we do or where we work?
- How can we visualize these disparities clearly and fairly?
This analysis investigates those questions using structured salary data and visual encoding principles.
I used ZipRecruiter.com as my data source for the salaries of Data Scientist, Data Engineer, Data Analyst, Business Analyst, and Data Architect. Note that the by-state table wasn't present for Data Architect, so I had to look up each state individually. Everything was compiled into a Google Sheet, which was exported as a .csv file.
# URL for Job Salaries
jobs <- c("Data-Scientist", "Data-Engineer", "Data-Analyst", "Business-Analyst", "Data-Architect")
url_link <- 'https://www.ziprecruiter.com/Salaries/What-Is-the-Average-%s-Salary-by-State'
for (job in jobs) {
  url <- sprintf(url_link, job)
  print(url)
}
[1] "https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Data-Scientist-Salary-by-State"
[1] "https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Data-Engineer-Salary-by-State"
[1] "https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Data-Analyst-Salary-by-State"
[1] "https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Business-Analyst-Salary-by-State"
[1] "https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Data-Architect-Salary-by-State"
# read and load Data from CSV
df <- read.csv("D:/Cuny_sps/DATA_608/Story-4/Job_State_Salary.csv")
str(df)
'data.frame': 250 obs. of 6 variables:
$ State : chr "New York" "Vermont" "California" "Maine" ...
$ Annual.Salary: chr "136,172.00" "133,828.00" "131,441.00" "127,644.00" ...
$ Monthly.Pay : chr "11,347.00" "11,152.00" "10,953.00" "10,637.00" ...
$ Weekly.Pay : chr "2,618.00" "2,573.00" "2,527.00" "2,454.00" ...
$ Hourly.Wage : num 65.5 64.3 63.2 61.4 60.7 ...
$ Job : chr "Data Scientist" "Data Scientist" "Data Scientist" "Data Scientist" ...
# Converting String to Numeric
df$`Annual.Salary` <- as.numeric(gsub(",", "", df$`Annual.Salary`))
df$`Monthly.Pay` <- as.numeric(gsub(",", "", df$`Monthly.Pay`))
df$`Weekly.Pay` <- as.numeric(gsub(",", "", df$`Weekly.Pay`))
head(df)
       State Annual.Salary Monthly.Pay Weekly.Pay Hourly.Wage            Job
1   New York        136172       11347       2618       65.47 Data Scientist
2    Vermont        133828       11152       2573       64.34 Data Scientist
3 California        131441       10953       2527       63.19 Data Scientist
4      Maine        127644       10637       2454       61.37 Data Scientist
5      Idaho        126275       10522       2428       60.71 Data Scientist
6 Washington        125289       10440       2409       60.24 Data Scientist
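The three `gsub()` conversions above repeat the same pattern. A minimal sketch of a reusable helper follows. The name `parse_money` is hypothetical (it is not part of the original analysis), and the sample strings are the first values shown in the `str(df)` output.

```r
# Hypothetical helper (not in the original analysis): strip thousands
# separators, dollar signs, and whitespace, then convert to numeric.
parse_money <- function(x) {
  cleaned <- gsub("[$,[:space:]]", "", x)
  out <- suppressWarnings(as.numeric(cleaned))
  # Fail loudly if a non-missing input could not be parsed
  bad <- is.na(out) & !is.na(x)
  if (any(bad)) {
    stop("Could not parse: ", paste(x[bad], collapse = ", "))
  }
  out
}

# Sample strings taken from the str(df) output above
annual <- parse_money(c("136,172.00", "133,828.00", "131,441.00"))
print(annual)
```

Applied to the data frame, the same helper could be called as `df$Annual.Salary <- parse_money(df$Annual.Salary)`, which would surface any unparseable cell instead of silently producing `NA`.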
# Copy of df
df2 <- df
# Average Salary by Job
library(dplyr)  # provides %>%, group_by(), and summarize()
Avg_Job <- df2 %>%
  group_by(Job) %>%
  summarize(Avg_Annual_Salary = mean(`Annual.Salary`))
# Average Annual Salary By State
Avg_State <- aggregate(`Annual.Salary` ~ State, data = df2, FUN = mean)
colnames(Avg_State) <- c("State", "Avg_Annual_Salary")
# State Abbreviation
data("state")
Avg_State$Abbreviation <- state.abb[match(Avg_State$State, state.name)]
Avg_State <- Avg_State[order(-Avg_State$Avg_Annual_Salary), ]
print(Avg_Job)
# A tibble: 5 × 2
Job Avg_Annual_Salary
<chr> <dbl>
1 Business Analyst 90439.
2 Data Analyst 77605.
3 Data Architect 138570.
4 Data Engineer 121282.
5 Data Scientist 112832.
State Avg_Annual_Salary Abbreviation
47 Washington 126682.0 WA
32 New York 126019.2 NY
21 Massachusetts 122342.2 MA
2 Alaska 121887.2 AK
37 Oregon 120781.2 OR
34 North Dakota 120589.2 ND
45 Vermont 119791.2 VT
11 Hawaii 118263.0 HI
6 Colorado 116922.2 CO
5 California 116585.0 CA
38 Pennsylvania 115432.8 PA
28 Nevada 115248.6 NV
30 New Jersey 114326.4 NJ
41 South Dakota 113969.8 SD
19 Maine 113532.2 ME
46 Virginia 113152.2 VA
49 Wisconsin 112992.4 WI
29 New Hampshire 112388.6 NH
8 Delaware 111996.4 DE
20 Maryland 111035.8 MD
50 Wyoming 110364.0 WY
39 Rhode Island 109948.0 RI
12 Idaho 109211.4 ID
23 Minnesota 108872.0 MN
27 Nebraska 108819.0 NE
14 Indiana 108727.2 IN
31 New Mexico 108540.8 NM
13 Illinois 107957.2 IL
3 Arizona 106479.2 AZ
36 Oklahoma 105382.6 OK
26 Montana 104911.8 MT
35 Ohio 104852.0 OH
15 Iowa 103908.2 IA
24 Mississippi 103650.6 MS
1 Alabama 103565.4 AL
40 South Carolina 102972.0 SC
7 Connecticut 102900.4 CT
25 Missouri 101600.4 MO
43 Texas 101191.4 TX
42 Tennessee 101107.8 TN
33 North Carolina 100846.4 NC
44 Utah 99910.8 UT
16 Kansas 97589.2 KS
22 Michigan 96869.0 MI
10 Georgia 96479.2 GA
18 Louisiana 95055.2 LA
17 Kentucky 94945.4 KY
4 Arkansas 92122.4 AR
48 West Virginia 89149.8 WV
9 Florida 85413.0 FL
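The abbreviation lookup relies on every state name matching an entry in R's built-in `state.name` vector exactly. A small sketch of how that assumption could be checked is shown here. The "District of Columbia" entry is an illustrative assumption (it does not appear in the table above) and is included only to show what an unmatched name looks like.

```r
# state.name and state.abb ship with base R (datasets package)
states <- c("Washington", "New York", "Florida", "District of Columbia")
abbr <- state.abb[match(states, state.name)]

# Names with no match come back as NA and would have no location code
# to pass to the map
unmatched <- states[is.na(abbr)]
print(data.frame(State = states, Abbreviation = abbr))
print(unmatched)
```

Running the same check on `Avg_State$State` before plotting would flag misspelled or non-state rows early.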
Data Visualization:
This dashboard uses:
- Position and range (boxplots) to show salary variation with precision.
- Color hue and saturation (choropleth map) to depict geographic differences intuitively.
These encoding choices balance interpretability with visual appeal, ensuring that differences in pay are immediately visible and comparable.
These visuals collectively highlight how both specialization and geography shape the earning potential of data professionals in the U.S.
To showcase salary, there are four graphics: a box plot of annual salary distribution by job title, a box plot of annual salary distribution by state, a choropleth map of average salary by state, and a heatmap of average salary by role for the top 10 states.
Box Plot: Average Salary by Data Role and State
# Box plot by Job
library(plotly)  # provides plot_ly(), layout(), and the %>% pipe
job_box <- plot_ly(df, x = ~Job, y = ~`Annual.Salary`, type = 'box',
                   marker = list(color = 'rgb(110, 164, 214)')) %>%
  layout(
    title = 'Annual Salary Distribution by Data Role',
    xaxis = list(title = 'Data Job Title'),
    yaxis = list(title = 'Annual Salary (USD)'),
    annotations = list(
      list(x = 0.5, y = 0.95, xref = 'paper', yref = 'paper',
           text = "Box = Interquartile Range (IQR), Line = Median Salary",
           showarrow = FALSE, font = list(size = 10, color = 'gray')))
  )
# Box plot by State
state_box <- plot_ly(df, x = ~State, y = ~`Annual.Salary`, type = 'box',
                     marker = list(color = 'rgb(110, 164, 214)')) %>%
  layout(title = 'Annual Salary Distribution by State',
         xaxis = list(tickfont = list(size = 12), tickangle = -45),
         yaxis = list(title = 'Average Annual Salary($)'))
# Plots
job_box
This boxplot compares the salary distribution across the five data-related job titles.
It reveals that Data Architects, Data Engineers, and Data Scientists tend to have higher median salaries with wider variation, indicating opportunities for growth and specialization.
Meanwhile, the Data Analyst and Business Analyst roles show lower medians and tighter spreads, reflecting more standardized pay scales.
Overall, this chart highlights how technical depth and modeling expertise translate into higher compensation within the data profession.
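To make the box plot annotation ("Box = Interquartile Range (IQR), Line = Median Salary") concrete, here is a minimal base-R sketch of those statistics. It uses only the six Data Scientist salaries visible in the `head(df)` output, which are the highest-paying states, so the numbers illustrate the calculation rather than the full distribution. R's default `quantile()` interpolation is assumed, and plotly may use a slightly different quartile rule.

```r
# Six Data Scientist salaries shown in head(df); these are the top rows
# of the table, so they are NOT a representative sample of all 50 states.
salary <- c(136172, 133828, 131441, 127644, 126275, 125289)

med <- median(salary)                  # the line inside the box
q <- quantile(salary, c(0.25, 0.75))   # lower and upper box edges
iqr <- unname(q[2] - q[1])             # height of the box (IQR)

cat("Median:", med, "\n")
cat("Q1:", q[1], " Q3:", q[2], "\n")
cat("IQR:", iqr, "\n")
```

On the full data, the same statistics could be computed per role with `aggregate(Annual.Salary ~ Job, data = df, FUN = median)`.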
This boxplot displays the distribution of average annual salaries across all U.S. states for data professionals.
The wide spread in salary values highlights regional disparities: states like Washington, New York, and Massachusetts generally offer higher salaries, while several Southern states such as Florida, West Virginia, and Arkansas show lower averages.
This visualization emphasizes how geographical location strongly influences pay levels, reflecting differences in cost of living, local demand, and the concentration of technology jobs.
From the visualization, we can observe that Data Architects and Data Engineers command the highest average salaries ($138,570 and $121,282), followed by Data Scientists ($112,832). In contrast, Business Analysts ($90,439) and Data Analysts ($77,605) have lower average earnings, reflecting their comparatively broader entry paths and varying technical requirements.
This comparison reveals how role specialization and technical depth drive earning potential within the data profession.
Choropleth Map of Average Salary by State
Together, the job-based boxplot
and the state-based boxplot reveal two dimensions of salary variation —
one professional, one geographic.
To connect these insights, the next visualization translates the same
salary data onto a geospatial scale, revealing how
economic opportunity for data practitioners clusters regionally across
the U.S.
The following map provides a geographic perspective on data professional salaries, visually showing where pay levels are highest and lowest across the United States.
# Choropleth Map for Avg Salary by State
state_map <- plot_ly(
  Avg_State,
  z = ~Avg_Annual_Salary,
  locations = ~Abbreviation,
  locationmode = 'USA-states',
  type = 'choropleth',
  colorscale = 'Viridis',
  zmin = min(Avg_State$Avg_Annual_Salary),
  zmax = max(Avg_State$Avg_Annual_Salary),
  text = ~paste('State:', State, '<br>Avg Annual Salary:', round(Avg_Annual_Salary, 2))) %>%
  layout(
    title = 'Average Annual Salary in the US by State',
    geo = list(
      scope = 'usa',
      projection = list(type = 'albers usa'),
      showlakes = TRUE,
      lakecolor = 'rgb(255, 255, 255)'),
    annotations = list(
      list(
        x = 0.00,
        y = -0.05,
        xref = "paper",
        yref = "paper",
        text = "Arkansas, West Virginia, and Florida have the Lowest Average Salary.",
        showarrow = FALSE,
        font = list(size = 12)),
      list(
        x = 0.00, # X-coordinate of the note
        y = 0.00, # Y-coordinate of the note
        xref = "paper",
        yref = "paper",
        text = "Washington, New York and Massachusetts have the Highest Average Salary.",
        showarrow = FALSE))
  )
# Plot
state_map
This choropleth map displays how the average salary for data professionals varies by state across the United States. Each state is color-coded by its average compensation; on the Viridis scale used here, lighter (yellow) shades represent higher salaries and darker (purple) shades lower ones.
The visualization highlights a clear regional disparity in pay. Washington, New York, and Massachusetts show the highest average salaries, each above $120,000, reflecting the concentration of tech hubs and large data-driven enterprises. California, at about $116,600, ranks tenth.
Meanwhile, many southern and some midwestern states generally offer lower salaries, correlating with smaller tech markets and differing cost-of-living levels.
Overall, this map emphasizes the strong geographic influence on salary levels, showing that location is nearly as impactful as job title in determining a data practitioner’s earning potential.
When viewed alongside the boxplot of job roles, the state-level
salary map completes the overall picture of compensation trends in the
U.S. data industry.
The analysis reveals two dominant forces shaping data professionals’
earnings: job specialization and geographic location.
States with major technology hubs such as California, Washington, New
York, and Massachusetts consistently offer higher pay across
all roles, while salaries are more moderate in regions with smaller or
emerging data markets.
This demonstrates that the highest earning potential occurs when technical expertise (for example, Data Architect or Data Scientist roles) aligns with high-demand regions. The visualization makes it clear that both skill level and location play equally critical roles in determining how much a data practitioner gets paid in today’s job market.
Heatmap: Average Salary by Role and State (Top 10 States)
# Compute average salary by State and Job
heat_df <- df2 %>%
  group_by(State, Job) %>%
  summarise(Avg_Salary = mean(`Annual.Salary`), .groups = "drop")
# Select Top 10 states by overall average salary
top_states <- heat_df %>%
  group_by(State) %>%
  summarise(Overall = mean(Avg_Salary)) %>%
  top_n(10, Overall) %>%
  pull(State)
heat_df_top <- heat_df %>% filter(State %in% top_states)
# Plot
library(ggplot2)  # ggplot(), geom_tile(), scale_fill_viridis_c()
library(plotly)   # ggplotly()
heat_plot <- ggplot(heat_df_top, aes(x = Job, y = reorder(State, -Avg_Salary), fill = Avg_Salary)) +
  geom_tile(color = "white", linewidth = 0.5) +
  scale_fill_viridis_c(option = "plasma", direction = -1) +
  labs(
    title = "Average Salary by Role and State (Top 10 States)",
    x = "Job Title",
    y = "State",
    fill = "Avg Salary ($)"
  ) +
  theme_minimal(base_size = 14) + # increased font
  theme(
    axis.text.x = element_text(angle = 30, hjust = 1, size = 12, face = "bold"),
    axis.text.y = element_text(size = 12, face = "bold"),
    plot.title = element_text(face = "bold", size = 16),
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 11)
  )
ggplotly(heat_plot, height = 600, width = 900)
This heatmap adds a multidimensional view, connecting roles and states simultaneously.
It shows that the more technical roles (Data Architect, Data Engineer, Data Scientist) maintain higher salaries across most states, while regional differences remain consistent with the map and boxplots.
Including this heatmap demonstrates how tabular and spatial perspectives align, reinforcing the data story.
How Much Do We Get Paid? Depends on Title and Location
Comparing the five job titles (Data Scientist, Data Engineer, Data Analyst, Business Analyst, and Data Architect) shows a clear difference. As the title becomes more “specialized” in terminology, salaries increase substantially. Since a more specialized title typically entails more responsibility, each step up in title corresponds to roughly a 16% increase in average salary, from Data Analyst at $77,605 annually at the bottom to Data Architect at $138,570 annually at the top. Because of recent title inflation within the data world, it seems that specialization and knowledge, rather than the title alone, are what correlate with a higher annual income.
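The step-to-step increase between titles can be checked directly from the averages printed in the `Avg_Job` table. The sketch below hard-codes those five rounded averages (sorted from lowest to highest) rather than reading the original CSV, so it is a worked calculation under that assumption, not a rerun of the analysis.

```r
# Average annual salary by job, from the Avg_Job table, sorted ascending
avg_salary <- c(
  "Data Analyst"     = 77605,
  "Business Analyst" = 90439,
  "Data Scientist"   = 112832,
  "Data Engineer"    = 121282,
  "Data Architect"   = 138570
)

# Percentage increase from each title to the next higher-paid one
step_pct <- (avg_salary[-1] / avg_salary[-length(avg_salary)] - 1) * 100
print(round(step_pct, 1))
cat("Mean step increase (%):", round(mean(step_pct), 1), "\n")
```

The individual steps are uneven, so the mean is only a rough summary of how pay rises across titles.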
The other influence on salary is geographical location within the United States. As seen on the map of average salary by state, Washington, New York, and Massachusetts have the highest average salaries, while Arkansas, West Virginia, and Florida have the lowest. Side by side, Washington's average of about $126,700 per year is a staggering 48% higher than Florida's average of about $85,400. However, this comparison is very generalized, since the cost of living differs widely even within a state, let alone from state to state. Where the jobs are actually located also matters, since areas with a lower supply of jobs and higher demand can have lower salaries than if the scenario were reversed. Overall, if we hold a more specialized title and work in a place such as Washington or New York, there is a high chance we will earn a higher average salary.
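As a rough way to compare the two effects, the top-to-bottom gap can be computed for states and for roles from the averages reported earlier. The values below are copied from the `Avg_State` and `Avg_Job` outputs, and `pct_gap` is a hypothetical helper name introduced only for this sketch.

```r
# Values copied from the Avg_State and Avg_Job outputs
state_avg <- c(Washington = 126682.0, Florida = 85413.0)
job_avg <- c("Data Architect" = 138570, "Data Analyst" = 77605)

# Hypothetical helper: percentage by which the higher average exceeds the lower
pct_gap <- function(high, low) (high / low - 1) * 100

state_gap <- pct_gap(state_avg[["Washington"]], state_avg[["Florida"]])
job_gap <- pct_gap(job_avg[["Data Architect"]], job_avg[["Data Analyst"]])

cat("State gap (%):", round(state_gap, 1), "\n")
cat("Role gap (%):", round(job_gap, 1), "\n")
```

This compares only the extremes of each dimension, so it ignores cost of living and the spread of values in between.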
In summary, data professionals earn significantly more when they hold specialized roles and work in high-demand regions such as Washington and New York. Overall, the analysis shows that both role specialization and geographic clustering drive pay inequality among U.S. data professionals. The combination of boxplots, heatmaps, and maps provides a layered understanding of this issue, moving from distribution to geography to intersection. From a visual analytics perspective, this dashboard demonstrates how effective encoding choices and connected visuals can transform raw salary data into actionable insight.
Resources
https://www.ziprecruiter.com/
https://www.businessinsider.com/how-title-inflation-hurt-employees-careers-companies-morale-2022-12