Story 4: How much do we get paid?

Instructions

I have introduced the term “Data Practitioner” as a generic job descriptor because we have so many different job role titles for individuals whose work activities overlap including Data Scientist, Data Engineer, Data Analyst, Business Analyst, Data Architect, etc.

For this story we will answer the question, “How much do we get paid?” Your analysis and data visualizations must address the variation in average salary based on role descriptor and state.

Introduction

Imagine you are finishing a data science degree and looking at job postings: Data Scientist, Data Engineer, BI Analyst, Operations Research Analyst, Statistician…
All of them sound “data-heavy,” but they do not all pay the same, and pay is very different in New York than in Arkansas.

In this story I treat all of these as data practitioner roles and ask a simple question:

How much can a data practitioner expect to be paid, and how does pay vary by occupation and by state?

To answer this, I use official salary data from the U.S. Bureau of Labor Statistics Occupational Employment and Wage Statistics (OEWS), May 2023 edition.
From the full OEWS file I build a focused dataset in R; the next section explains how I restrict the data to state-level records and data-related occupations.

Data Source and Preprocessing

The original BLS OEWS file (all_data_M_2023.xlsx) is large, with over 400,000 rows covering every occupation, industry, and geography in the United States. For this story, I narrow it down to the pieces that matter for data practitioners:

  1. State level only – I keep AREA_TYPE = 2, which corresponds to states.
  2. All industries combined – NAICS = "000000" and I_GROUP = "cross-industry" give a cross-industry view, so salaries are not tied to a single sector.
  3. Data-related occupations – I select 13 SOC codes that represent analytical and data-intensive roles.

The code below implements these filters and constructs a smaller working dataset with only the variables used in the analysis.

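library(readxl)   # read_excel()
library(janitor)  # clean_names()
library(dplyr)    # filter(), transmute(), %>%
library(tidyr)    # drop_na()
library(readr)    # write_csv()

# 1. Read the raw OEWS workbook; a large guess_max makes read_excel()
#    inspect more rows before guessing column types
oews_raw <- read_excel("all_data_M_2023.xlsx", guess_max = 200000) %>% 
  clean_names()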

# 2. Filter to state-level, all industries
oews_state_allind <- oews_raw %>%
  filter(
    area_type == 2,          # 2 = State
    naics == "000000",       # all industries
    i_group == "cross-industry"
  )

# 3. Pick the data occupations you care about
data_soc_codes <- c(
  "13-1111",  # Management Analysts
  "13-2031",  # Budget Analysts
  "13-2041",  # Credit Analysts
  "13-2051",  # Financial and Investment Analysts
  "13-2054",  # Financial Risk Specialists
  "15-1211",  # Computer Systems Analysts
  "15-1221",  # Computer and Information Research Scientists
  "15-1242",  # Database Administrators
  "15-1243",  # Database Architects
  "15-2031",  # Operations Research Analysts
  "15-2041",  # Statisticians
  "15-2099",  # Mathematical Science Occupations, All Other
  "15-2051"   # Data Scientists
)

# 4. Reuse the state-level, cross-industry subset and keep only the selected roles
oews_small <- oews_state_allind %>%
  filter(occ_code %in% data_soc_codes) %>%
  transmute(
    state              = area_title,
    state_abbr         = prim_state,
    soc_code           = occ_code,
    bls_title          = occ_title,
    total_emp          = tot_emp,
    annual_mean_wage   = a_mean,
    annual_median_wage = a_median
  ) %>%
  drop_na(annual_mean_wage)

write_csv(oews_small, "data_practitioner_salaries_2023.csv")

Load and clean the data

I start from the trimmed OEWS file data_practitioner_salaries_2023.csv on GitHub. Each row is a combination of state and occupation, with annual mean and median salary stored as text in the raw file.

In this step I:

  • load the trimmed dataset

  • standardize the state and role labels

  • convert employment and salary fields from character to numeric for analysis.
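
To illustrate the character-to-numeric conversion in the last step, here is a minimal sketch using made-up wage strings, not real OEWS values. It assumes the text columns may contain grouping commas and non-numeric markers such as "*"; parse_number() from readr strips the commas and returns NA when a string has no digits.

```r
library(readr)

# Made-up wage strings in the style of a text column (not real OEWS values)
wage_text <- c("95,120", "*", "110043")

# parse_number() drops grouping commas; strings without digits become NA.
# suppressWarnings() hides the parsing-failure warning raised for "*".
wage_num <- suppressWarnings(parse_number(wage_text))

# Count how many entries could not be converted
n_missing <- sum(is.na(wage_num))
```

Rows that come back as NA are the ones a later drop_na() call removes.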

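library(tidyverse)  # dplyr, stringr, readr, tidyr, ggplot2
library(janitor)    # clean_names()
library(scales)     # dollar_format()

# load trimmed dataset
salary_raw <- read.csv("https://raw.githubusercontent.com/JaydeeJan/Data-608-Story-4/refs/heads/main/data_practitioner_salaries_2023.csv") %>%
  clean_names()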

# clean and prepare
salary <- salary_raw %>%
  mutate(
    state = str_trim(state),
    state_abbr = str_trim(state_abbr),
    role = str_trim(bls_title),
    
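    # convert fields that represent numbers into numeric; as.character()
    # guards against columns read.csv() has already parsed as numbers,
    # because parse_number() only accepts character input
    total_emp = parse_number(as.character(total_emp)),
    mean_salary = parse_number(as.character(annual_mean_wage)),
    median_salary = parse_number(as.character(annual_median_wage))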
  ) %>%
  select(
    state, state_abbr, role, soc_code, total_emp, mean_salary, median_salary
  ) %>%
  drop_na(mean_salary)

n_states <- n_distinct(salary$state)
n_roles <- n_distinct(salary$role)

The cleaned dataset contains 51 state-level areas (the 50 states plus the District of Columbia) and 13 distinct data-practitioner occupations. Salaries are stored as annual amounts in U.S. dollars.

National Salary Differences By Occupation

Here I summarize the average salary across all states for each data-practitioner occupation. This shows how clearly some roles sit higher on the pay ladder than others.
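
A summary of this kind can be sketched as follows. The code that produced the table is not shown in this story, so this is an assumption about its shape: it gives every state equal weight (it is not weighted by employment) and reports both the mean and the median of the state means. The toy data frame salary_toy and its values are made up for illustration.

```r
library(dplyr)

# Toy stand-in for the cleaned `salary` table (made-up values)
salary_toy <- tibble(
  state       = c("A", "B", "C", "A", "B", "C"),
  role        = rep(c("Role X", "Role Y"), each = 3),
  mean_salary = c(80000, 90000, 130000, 60000, 70000, 80000)
)

# One row per role: equal-weight mean and median of the state-level means
role_summary <- salary_toy %>%
  group_by(role) %>%
  summarise(
    avg_salary            = mean(mean_salary, na.rm = TRUE),
    median_of_state_means = median(mean_salary, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(avg_salary)  # lowest-paid role first
```

An employment-weighted alternative would replace mean() with weighted.mean(mean_salary, total_emp), which pulls the figure toward the states where most of the jobs are.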

Role                                          Average salary (USD)   Median of state means (USD)
Mathematical Science Occupations, All Other   75363                  72175
Credit Analysts                               84791                  83790
Budget Analysts                               85597                  84380
Operations Research Analysts                  91483                  90540
Statisticians                                 97084                  95510
Database Administrators                       98563                  99430
Financial and Investment Analysts             99780                  98070
Financial Risk Specialists                    101576                 100250
Management Analysts                           102659                 98615
Computer Systems Analysts                     104255                 101740
Data Scientists                               110043                 107000
Database Architects                           127109                 125650
Computer and Information Research Scientists  133538                 131240

This chart shows the national average salary for each of the 13 data-practitioner occupations.

At the upper end, roles such as Computer and Information Research Scientists and Database Architects earn around $130,000 per year on average, forming the top of the pay ladder. At the lower end, occupations such as Mathematical Science Occupations, All Other and Credit Analysts are closer to $75,000–$85,000.

The difference between the lowest- and highest-paid occupations in this set is about $58,000 per year ($133,538 for Computer and Information Research Scientists versus $75,363 for Mathematical Science Occupations, All Other). For a data-science student, this means that occupation choice alone can shift expected salary by more than $50,000 per year, even before considering which state you live in.
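
The gap can be checked directly from the role averages in the table above. This short sketch copies those 13 values into a named vector; the variable names are illustrative.

```r
# Average salaries (USD) copied from the role summary table
role_avg <- c(
  "Mathematical Science Occupations, All Other"  = 75363,
  "Credit Analysts"                              = 84791,
  "Budget Analysts"                              = 85597,
  "Operations Research Analysts"                 = 91483,
  "Statisticians"                                = 97084,
  "Database Administrators"                      = 98563,
  "Financial and Investment Analysts"            = 99780,
  "Financial Risk Specialists"                   = 101576,
  "Management Analysts"                          = 102659,
  "Computer Systems Analysts"                    = 104255,
  "Data Scientists"                              = 110043,
  "Database Architects"                          = 127109,
  "Computer and Information Research Scientists" = 133538
)

# Absolute and relative gap between the highest- and lowest-paid roles
gap     <- max(role_avg) - min(role_avg)   # 58175
ratio   <- max(role_avg) / min(role_avg)   # about 1.77
highest <- names(which.max(role_avg))
lowest  <- names(which.min(role_avg))
```

In relative terms, the top occupation averages roughly 1.77 times the pay of the bottom one.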

Which States Pay Data Practitioners The Most and The Least?

This chart ranks states by the average salary across all 13 data-practitioner occupations, highlighting the ten highest-paying (green) and ten lowest-paying (orange) states.

At the top of the ladder sit states like New York, California, and the District of Columbia, where the average data-practitioner salary is around $115,000–$125,000 per year. At the bottom are states such as Mississippi, Louisiana, Arkansas, and Kentucky, where average salaries are closer to $75,000–$85,000.

The gap between the typical top-10 state and the typical bottom-10 state is on the order of $35,000–$45,000 per year. For a data-science student, this means that simply starting your career in a high-paying state can translate into tens of thousands of extra dollars annually, even before you choose a specific job title.
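
One way to compute a top-versus-bottom gap like this is sketched below. It assumes "typical" means the mean of each group and that the input has one row per state with a column named avg_state_salary, matching the state summary built later in this story. The helper state_gap() and the toy values are hypothetical.

```r
library(dplyr)

# Hypothetical helper: mean of the n highest-paying states minus
# the mean of the n lowest-paying states
state_gap <- function(state_summary, n = 10) {
  top    <- slice_max(state_summary, avg_state_salary, n = n, with_ties = FALSE)
  bottom <- slice_min(state_summary, avg_state_salary, n = n, with_ties = FALSE)
  mean(top$avg_state_salary) - mean(bottom$avg_state_salary)
}

# Toy example with six made-up states and n = 2
toy_states <- tibble(
  state            = c("A", "B", "C", "D", "E", "F"),
  avg_state_salary = c(120000, 110000, 100000, 90000, 80000, 70000)
)

toy_gap <- state_gap(toy_states, n = 2)  # 115000 - 75000 = 40000
```

Using the median of each group instead of the mean would make the comparison less sensitive to a single very high-paying state.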

How Much Does Location Matter Within Each Data Role?

ggplot(salary,
  aes(
    x = reorder(role, mean_salary, FUN = median),  # order by median salary
    y = mean_salary
  )
) +
  geom_boxplot(fill = "#88CCEE") +
  coord_flip() +
  scale_y_continuous(labels = dollar_format()) +
  labs(
    title    = "State-level pay spreads are substantial for most data occupations",
    subtitle = "Each box shows the distribution of state mean salaries for that occupation (2023)",
    x        = NULL,
    y        = "Annual mean salary by state (USD)",
    caption  = "Source: BLS OEWS 2023 State Occupational Employment and Wage Estimates"
  ) +
  theme_story4()
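
The plotting code calls theme_story4(), whose definition is not shown in this story. To run the chunk on its own, a stand-in is needed. The sketch below is a hypothetical replacement built on theme_minimal(); it is an assumption about the general look, not the author's actual theme.

```r
library(ggplot2)

# Hypothetical stand-in for theme_story4(); the original definition is not shown
theme_story4 <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title          = element_text(face = "bold"),
      plot.title.position = "plot",           # align the title with the full plot
      panel.grid.minor    = element_blank(),  # drop minor grid lines
      plot.caption        = element_text(hjust = 0, colour = "grey40")
    )
}

# Quick check on a built-in dataset
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  theme_story4()
```

The chunk above also relies on dollar_format() from the scales package, so library(scales) must be loaded before it runs.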

This figure shows how much salaries vary from state to state within the same occupation. Each box summarizes the distribution of state-level mean salaries for one role. For many jobs, the distance from the lower whisker to the upper whisker is on the order of $40,000–$60,000. For example, a Data Scientist or Database Architect position in a high-paying state can pay tens of thousands of dollars more per year than the same role in a low-paying state.

For a data-science student, the key takeaway is that location matters almost as much as job title: even after you choose a role, your choice of state can substantially raise or lower your expected pay.

Together with the boxplots, the minimum and maximum state mean salary for each role show that some occupations have very wide state-level pay ranges. The estimated spreads (maximum minus minimum) range from roughly $57,000 for Management Analysts to more than $110,000 for Computer and Information Research Scientists. In other words, for some advanced research roles, the same job title can pay over $100,000 more per year depending on which state you work in.
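
A per-role spread of this kind can be computed as in the sketch below. It uses a made-up toy table in place of the cleaned salary data, and the column names min_state_mean, max_state_mean, and spread are illustrative.

```r
library(dplyr)

# Toy stand-in for the cleaned `salary` table (made-up values)
salary_toy <- tibble(
  state       = c("A", "B", "C", "A", "B", "C"),
  role        = rep(c("Role X", "Role Y"), each = 3),
  mean_salary = c(80000, 90000, 130000, 60000, 70000, 80000)
)

# Lowest and highest state mean per role, and the distance between them
role_spread <- salary_toy %>%
  group_by(role) %>%
  summarise(
    min_state_mean = min(mean_salary, na.rm = TRUE),
    max_state_mean = max(mean_salary, na.rm = TRUE),
    spread         = max_state_mean - min_state_mean,
    .groups = "drop"
  ) %>%
  arrange(desc(spread))  # widest range first
```

Unlike the boxplot whiskers, this min-to-max range includes outlying states, which is why it can be much larger than the whisker-to-whisker distance.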

Which States Pay Data Practitioners The Most?

state_summary <- salary %>%
  group_by(state) %>%
  summarise(
    avg_state_salary = mean(mean_salary, na.rm = TRUE),
    .groups = "drop"
    ) %>%
  arrange(desc(avg_state_salary))

top_states <- head(state_summary$state, 3)
bottom_states <- tail(state_summary$state, 3)

top_states
## [1] "New York"             "California"           "District of Columbia"
bottom_states
## [1] "Arkansas"    "Louisiana"   "Mississippi"

Across all 13 data occupations, the three highest-paying locations are New York, California, and the District of Columbia, while the three lowest-paying are Arkansas, Louisiana, and Mississippi. In the high-pay states, almost every data role shifts upward by tens of thousands of dollars compared with the low-pay states. For example, a typical data role in New York or California can easily sit $40,000–$60,000 above the same occupation in Arkansas or Mississippi.

For a data-science student, this means that occupation and location together determine your pay: moving from a lower-paying state to a higher-paying state can matter as much as moving from an entry-level analyst role into a more advanced research or architecture position.

Which States Sit At The Top and Bottom Of The Data-salary Ladder?

focus_states <- c(top_states, bottom_states)

salary_focus <- salary |>
  filter(state %in% focus_states) |>
  mutate(mean_salary_k = mean_salary / 1000)   # salary in thousands

ggplot(salary_focus,
       aes(x = mean_salary_k,
           y = role)) +
  geom_col(fill = "#4477AA") +
  facet_wrap(~ state, nrow = 2) +
  scale_x_continuous(
    breaks = c(60, 100, 140),                  # adjust if your range is different
    labels = function(x) paste0("$", x, "k"),  # 60 -> $60k
    expand = expansion(mult = c(0, 0.05))
  ) +
  labs(
    title = "High-pay states boost salaries for every data occupation",
    subtitle = "Top 3 and bottom 3 states by average data-practitioner salary, 2023",
    x = "Annual mean salary (thousand USD)",
    y = NULL,
    caption = "Source: BLS OEWS 2023 State Occupational Employment and Wage Estimates"
  ) +
  theme_story4() +
  theme(
    strip.text = element_text(face = "bold"),
    axis.text.y = element_text(size = 8),
    axis.text.x = element_text(size = 8)
  )

This panel chart compares the three highest-paying locations for data practitioners (New York, California, and the District of Columbia) with the three lowest-paying (Arkansas, Louisiana, and Mississippi). Each bar is the average 2023 salary for a given data occupation in that state.

Across all panels, the bars in New York, California, and D.C. are consistently shifted to the right: most data roles there fall roughly in the $100k–$140k range. In the lower-paying states, many of the same occupations cluster closer to $70k–$95k.

For a data-science student, the key message is that state choice systematically lifts or compresses salaries across almost every data job. Choosing a high-pay state such as New York or California can easily add $30,000–$40,000 per year compared with starting in Arkansas, Louisiana, or Mississippi, even when the job title is identical.
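
The size of this lift can be estimated role by role. The sketch below assumes the comparison is the mean salary across the top states minus the mean across the bottom states for each occupation. The state names and salaries are placeholders, not OEWS values.

```r
library(dplyr)

# Placeholder state groups and made-up salaries
top_states_toy    <- c("High1", "High2")
bottom_states_toy <- c("Low1", "Low2")

salary_toy <- tibble(
  state       = rep(c("High1", "High2", "Low1", "Low2"), times = 2),
  role        = rep(c("Role X", "Role Y"), each = 4),
  mean_salary = c(130000, 120000, 90000, 80000,
                  100000, 110000, 70000, 80000)
)

# Per-role gap: average in the top states minus average in the bottom states.
# A role with no rows in one group yields NaN rather than an error.
role_gap <- salary_toy %>%
  group_by(role) %>%
  summarise(
    top_avg    = mean(mean_salary[state %in% top_states_toy]),
    bottom_avg = mean(mean_salary[state %in% bottom_states_toy]),
    gap        = top_avg - bottom_avg,
    .groups = "drop"
  )
```

With the real data, top_states and bottom_states from the earlier chunk would replace the placeholder vectors.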

Conclusion: What This Means For Future Data Practitioners

Looking across the 2023 BLS OEWS data, two patterns stand out clearly. First, job title matters a lot: nationally, average salaries for data-practitioner roles range from roughly $75,000 for more general mathematical science roles up to around $130,000 for advanced positions such as Computer and Information Research Scientists and Database Architects. Choosing a more technical, research- or architecture-oriented role can add nearly $60,000 per year compared with the lowest-paid occupations in this set.

Second, where you work is almost as important as what you do. When we average across all 13 occupations, high-pay states such as New York, California, and the District of Columbia typically pay around $115,000–$125,000 per year, while lower-pay states such as Mississippi, Louisiana, and Arkansas are closer to $75,000–$85,000. For many occupations, the gap between the lowest- and highest-paying states is on the order of $80,000–$100,000 or more. The facet charts show that in high-pay states nearly every role shifts upward together, and in low-pay states nearly every role shifts downward together.

For a data-science student planning a career, the message is straightforward:

  • Occupation choice can move your expected salary by tens of thousands of dollars per year. Roles that involve research, architecture, or systems-level responsibility tend to sit at the top of the pay ladder.

  • Location choice can add another large adjustment on top of that, especially when comparing the highest- and lowest-paying states.

  • Together, role and state account for most of the variation we see in pay: a data scientist or database architect in a high-pay state can plausibly earn $40,000–$60,000 more than the same occupation in a low-pay state.

These results do not say where you should live, because cost of living, personal preferences, and visa or family constraints all matter, but they provide a quantitative baseline. As you evaluate offers, it is worth asking not only “What is my title?” but also “How does this salary compare to other states for the same role?” The OEWS data show that informed choices about both role and location can significantly change your long-term earning potential as a future data practitioner.