This project explores the relationship between the growth of AI as an economic sector and shifts in the software development labor market. As AI technologies such as large language models, generative tools, and enterprise AI platforms have become mainstream, many have been asking (and debating): is AI creating new software jobs, displacing them, or simply reshaping what they require?

Focus Question 1: How has AI market revenue growth affected the volume of software development job postings between 2020 and 2025?

Focus Question 2: Is the share of software job postings that mention AI skills growing proportionally with AI market expansion?

Focus Question 3: Did the post-ChatGPT period (2022–2023) produce a measurable disruption in software development hiring patterns?

To answer these questions, we use two publicly available datasets: Statista’s AI and software market revenue/user data, and Indeed Hiring Lab’s job posting indices. Together, they allow us to compare macroeconomic AI growth against real-world employer behavior.

Understanding this relationship matters for students, policymakers, and educators who are trying to anticipate how AI integration into the workforce will affect career pathways.

Ethical Questions and Limitations

Several important limitations apply to this analysis. The Indeed data is filtered to the United States only, while Statista represents global markets — creating a geographic mismatch. The Indeed platform also skews toward certain industries and company sizes, underrepresenting companies that recruit through LinkedIn, internal portals, or specialized platforms. We narrowed our focus to “Software Development,” which excludes adjacent AI-heavy roles like ML engineering and AI research, potentially underestimating AI’s total labor market impact.

Additionally, the Statista data from 2025 onward represents projected figures rather than observed ones, and should be interpreted with appropriate caution. Finally, our dataset cannot speak to who is benefiting from AI economic growth, as we have no demographic data to assess whether this transition is equitable or concentrated among already-privileged groups.


Source: Statista — a leading global provider of market and consumer statistics, drawing from over 22,500 sources including Gartner, IDC, and industry financial reports.

What Was Measured: This dataset captures three annual metrics at the global level: total AI market revenue (in billions USD), the number of people and organizations actively using AI tools (in millions), and the total size of the broader software industry (in billions USD). Each of these was recorded annually from 2020 through 2030, with figures from 2020–2024 representing historical observations and 2025–2030 representing forward-looking projections from major market research firms.

Sample Description: The data is aggregated at the global level with no individual-level observations — each row represents a single year. Because Statista compiles its figures from published financial reports, industry surveys, and enterprise licensing data, the historical values are broadly verifiable. However, the projection years depend on growth models whose assumptions are not fully disclosed, which introduces meaningful uncertainty. For our purposes, this sample allows us to track AI’s financial footprint over time and situate it within the larger software economy, though it cannot tell us about regional variation or which types of AI products are driving growth.

Benefit / Shortcoming: The primary benefit is the breadth of sources Statista synthesizes, giving a more comprehensive picture than any single report. The main shortcoming is limited methodological transparency, especially for projections, which may reflect optimistic industry assumptions.

Source: Indeed Hiring Lab — the economic research arm of Indeed, one of the world’s largest job platforms, with over 250 million monthly unique visitors.

What Was Measured: Two files were used. The first tracks a daily index of software development job postings in the United States, baselined to February 1, 2020 = 100. The second tracks the daily percentage of all U.S. job postings that mention “AI” or related terms in the title or description. Both series span approximately 2019–2025 with daily granularity, yielding over 2,000 observations per file.

Sample Description: The data captures actual employer-posted job listings, meaning it reflects genuine hiring demand rather than survey-based estimates. Recordings were made daily by Indeed’s platform systems, which index postings automatically. Geographic scope is limited to the United States for this analysis. The software sector index is computed for the “Software Development” sector specifically, using Indeed’s internal sector classification system.

Benefit / Shortcoming: The primary benefit is the daily resolution and large sample size, which allows detection of short-term labor market shifts (e.g., post-ChatGPT disruptions). The key shortcoming is platform bias — Indeed does not capture all job postings, and skews toward companies that actively advertise publicly, potentially underrepresenting enterprise and internal hiring.

How the Datasets Are Combined

The primary join key is Year. Statista data is already annual; Indeed’s daily data is aggregated to annual averages using group_by(Year) and summarize(). A left join on Year preserves all Statista rows (including projected years 2025–2030), with NA values appearing for Indeed columns in future years where no data yet exists.

Before joining, the Statista datasets required a pivot_longer() transformation from wide format (years as columns) to long format (one row per year). Date strings in the Indeed data were converted to proper Date objects using as.Date(), and the year was extracted with lubridate::year().
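The reshaping, date handling, aggregation, and join described above can be sketched on toy data. This is a minimal illustration rather than our exact pipeline: the two AI market values come from our combined dataset, while the daily index values and the software_index column name are hypothetical stand-ins for the Indeed file.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical wide-format Statista extract: one row per metric, one column per year
statista_wide <- tibble(
  Metric = "AI_Market_Billions",
  `2020` = 16.87,
  `2021` = 36.09
)

# Wide -> long -> one row per year, with Year stored as a number for joining
statista_long <- statista_wide %>%
  pivot_longer(cols = -Metric, names_to = "Year", values_to = "Value") %>%
  mutate(Year = as.numeric(Year)) %>%
  pivot_wider(names_from = Metric, values_from = Value)

# Hypothetical daily Indeed extract (values and software_index name are assumptions)
indeed_daily <- tibble(
  dateString = c("2020-02-01", "2020-02-02", "2020-02-03"),
  software_index = c(100, 99.5, 98.9)
)

# Parse dates, extract the year, then collapse to one row per year
indeed_annual <- indeed_daily %>%
  mutate(Date = as.Date(dateString), Year = year(Date)) %>%
  group_by(Year) %>%
  summarize(
    avg_software_index = mean(software_index, na.rm = TRUE),
    observations = n(),
    .groups = "drop"
  )

# Left join keeps every Statista year; years without Indeed data get NA
combined <- left_join(statista_long, indeed_annual, by = "Year")
combined
```

In this toy example the 2021 row survives the join with NA in the Indeed columns, which is the same behavior that produces NA values for projected years in the full dataset.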

Potential Challenges: The geographic mismatch (U.S. labor data vs. global market data) is the most significant limitation of the merged dataset. Additionally, the COVID-era baseline (Feb 2020) for the software index amplifies the appearance of recovery in 2021–2022.
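The baseline effect can be made concrete with the annual averages reported in our final combined dataset. The sketch below re-indexes the software series so that the 2020 annual average equals 100, as our growth indices do. Because the 2020 average sits below 100 on Indeed's original scale, dividing by it scales every later year upward.

```r
# Annual averages on Indeed's original scale (Feb 1, 2020 = 100),
# copied from the final combined dataset in this report
avg_software_index <- c(`2020` = 78.92, `2021` = 148.42, `2022` = 195.70,
                        `2023` = 90.46, `2024` = 69.59, `2025` = 64.91)

# Re-index so the 2020 annual average equals 100
rebased <- avg_software_index / avg_software_index[["2020"]] * 100

# Compare the two scales side by side
round(rbind(original = avg_software_index, rebased_2020 = rebased), 1)
```

The year-to-year pattern is identical on both scales; only the apparent height of the 2021–2022 peak relative to the baseline changes.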

| Variable | Description | Unit | Source |
|---|---|---|---|
| Year | Calendar year | Numeric (2020–2030) | Both |
| AI_Market_Billions | Total AI market revenue | Billions USD | Statista |
| AI_Users_Millions | Number of AI tool users | Millions | Statista |
| Software_Total_Billions | Total software industry revenue | Billions USD | Statista |
| avg_software_index | Avg software job posting index | Index (Feb 2020 = 100) | Indeed |
| avg_ai_share | Avg % of jobs mentioning AI | Percentage (0–100) | Indeed |
| avg_ai_index | AI share normalized to base 100 | Index (Jan 2019 = 100) | Indeed (calc.) |
| observations | Daily data points per year | Count | Indeed (calc.) |

How has AI market revenue growth affected the volume of software development job postings between 2020 and 2025?

As the global AI market expanded from approximately $16.87 billion in 2020 toward a projected $46.99 billion in 2025, we examine whether this economic growth corresponds to increased hiring demand for software developers, or whether AI’s productivity gains are allowing companies to accomplish more with fewer hires. We compare AI_Market_Billions against avg_software_index year-over-year.

# Build indexed comparison dataset for Q1
viz_data <- final_combined %>%
  filter(!is.na(avg_software_index)) %>%
  select(Year, AI_Market_Billions, avg_software_index) %>%
  mutate(
    AI_Market_Index = (AI_Market_Billions / first(AI_Market_Billions)) * 100,
    Software_Jobs_Index = (avg_software_index / first(avg_software_index)) * 100
  ) %>%
  pivot_longer(
    cols = c(AI_Market_Index, Software_Jobs_Index),
    names_to = "Metric",
    values_to = "Index_Value"
  ) %>%
  mutate(
    Metric = recode(Metric,
      "AI_Market_Index"     = "AI Market Revenue",
      "Software_Jobs_Index" = "Software Dev Job Postings"
    )
  )

# We index both to 2020 = 100 so they can be compared on the same axis
# despite having very different raw scales (billions vs. a job-posting index)
ggplot(viz_data, aes(x = Year, y = Index_Value, color = Metric)) +
  geom_line(linewidth = 1.4) +
  geom_point(size = 3) +
  annotate("rect", xmin = 2022, xmax = 2023, ymin = -Inf, ymax = Inf,
           alpha = 0.08, fill = "gray30") +
  annotate("text", x = 2022.5, y = 260, label = "Post-ChatGPT\ndivergence",
           size = 3, color = "gray40", hjust = 0.5) +
  scale_color_manual(values = c(
    "AI Market Revenue"          = "hotpink",
    "Software Dev Job Postings"  = "hotpink4"
  )) +
  scale_y_continuous(labels = comma) +
  scale_x_continuous(breaks = 2020:2025) +
  labs(
    title = "AI Market Growth vs. Software Development Job Postings (2020–2025)",
    subtitle = "Both metrics indexed to 2020 = 100 | Shaded area = post-ChatGPT period",
    x = "Year",
    y = "Index (2020 = 100)",
    color = "",
    caption = "Sources: Statista (AI market revenue) | Indeed Hiring Lab (software dev job postings, U.S.)"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(color = "gray50", size = 11),
    legend.position = "bottom",
    panel.grid.minor = element_blank()
  )

While AI market revenue reached approximately 279% of its 2020 baseline by 2025 (an increase of about 179%), software development job postings fell below their 2020 baseline in 2024 and 2025. This is a striking divergence: as the AI economy expanded rapidly, employer demand for software developers did not rise in tandem and in fact declined. The shaded region marks the period immediately following ChatGPT’s launch in November 2022, where the two trends begin to move in opposite directions.

This visualization uses mutate() to create growth indices (dividing each year’s value by the 2020 baseline and multiplying by 100), and pivot_longer() to reshape both metrics into a single column for plotting with ggplot2. Indexing is necessary here because the raw values — billions of dollars vs. an arbitrary job-posting index — cannot be compared on the same axis without normalization.
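The indexing arithmetic is worth spelling out, because an index level and a percent change are easy to confuse. A minimal sketch using the 2020 and 2025 AI market values from our combined dataset (the variable names are illustrative):

```r
# AI market revenue in billions USD, from the final combined dataset
ai_market <- c(`2020` = 16.87, `2025` = 46.99)

# Index level: the 2025 value expressed relative to 2020 = 100
index_2025 <- ai_market[["2025"]] / ai_market[["2020"]] * 100

# Percent change since 2020: the index level minus the baseline of 100
growth_2025 <- index_2025 - 100

round(c(index = index_2025, pct_growth = growth_2025), 1)
```

An index of about 278.5 therefore corresponds to growth of about 178.5% over the baseline, not 278.5%.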

# Summarize the key numbers from Q1
viz_data %>%
  pivot_wider(names_from = Metric, values_from = Index_Value) %>%
  rename(
    `AI Market Index` = `AI Market Revenue`,
    `Software Jobs Index` = `Software Dev Job Postings`
  ) %>%
  kable(digits = 1, caption = "Indexed Values by Year (2020 = 100)")

Indexed Values by Year (2020 = 100)

| Year | AI_Market_Billions | avg_software_index | AI Market Index | Software Jobs Index |
|---|---|---|---|---|
| 2020 | 16.9 | 78.9 | 100.0 | 100.0 |
| 2021 | 36.1 | 148.4 | 213.9 | 188.1 |
| 2022 | 23.6 | 195.7 | 140.0 | 248.0 |
| 2023 | 25.6 | 90.5 | 151.7 | 114.6 |
| 2024 | 34.9 | 69.6 | 206.9 | 88.2 |
| 2025 | 47.0 | 64.9 | 278.5 | 82.2 |

Is the share of software development job postings that mention AI skills growing proportionally with AI market expansion?

Using avg_ai_share from Indeed alongside AI_Market_Billions and AI_Users_Millions from Statista, we examine whether AI is gradually integrating into existing software developer roles or whether AI market growth is happening largely independent of changes in what those roles require. A proportional relationship would suggest AI is reshaping the job itself; a decoupled relationship would suggest economic growth is concentrated elsewhere (e.g., in infrastructure or enterprise licensing, not front-line developer roles).

# Prepare data: all three metrics indexed to their first observed value = 100
# This lets us compare growth rates regardless of original scale
growth_long <- growth_indexed %>%
  filter(!is.na(Software_Jobs_Index)) %>%
  select(Year, AI_Market_Index, AI_Users_Index, AI_JobShare_Index) %>%
  pivot_longer(
    cols = -Year,
    names_to = "Metric",
    values_to = "Index_Value"
  ) %>%
  mutate(
    Metric = case_when(
      Metric == "AI_Market_Index"    ~ "AI Market Revenue",
      Metric == "AI_Users_Index"     ~ "AI Tool Users",
      Metric == "AI_JobShare_Index"  ~ "AI Share of Job Postings",
      TRUE ~ Metric
    )
  )

ggplot(growth_long, aes(x = Year, y = Index_Value, color = Metric)) +
  geom_line(linewidth = 1.3) +
  geom_point(size = 2.8) +
  scale_color_manual(values = c(
    "AI Market Revenue"         = "hotpink",
    "AI Tool Users"             = "hotpink3",
    "AI Share of Job Postings"  = "deeppink4"
  )) +
  scale_y_continuous(labels = comma) +
  scale_x_continuous(breaks = 2020:2025) +
  labs(
    title = "AI Market Growth vs. AI Skill Demand in Job Postings (2020–2025)",
    subtitle = "All metrics indexed to 2020 = 100",
    x = "Year",
    y = "Growth Index (2020 = 100)",
    color = "",
    caption = "Sources: Statista | Indeed Hiring Lab (U.S.)"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom",
    panel.grid.minor = element_blank()
  )

The visualization shows that AI’s share of job postings grew over the period, but more slowly than either AI market revenue or AI tool user adoption. By 2024, the AI market revenue index stood at 206.9 (roughly 107% above its 2020 baseline) and the AI tool users index at 217.8, while the AI share of job postings index was 128.2 (about 28% above baseline), having dipped in 2023. This decoupling suggests that AI’s economic expansion is not yet fully reflected in the formal skill requirements listed in job postings.

One interpretation is that companies are integrating AI tools into existing workflows without explicitly advertising for them as skills, meaning actual AI usage on the job may be outpacing what job descriptions signal. Another interpretation is that the economic growth is concentrated in AI infrastructure, cloud services, or enterprise licensing — sectors that don’t directly translate to “AI skills required” in a software developer job posting. Either way, the proportionality hypothesis does not hold: AI market growth and AI job-skill demand are not moving together.
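One simple way to check proportionality is to divide the AI job-share index by the AI market index for each year. This is a sketch, not part of our original pipeline: the index values are copied from the "All Growth Indices by Year" table in this report, and Share_to_Market_Ratio is a hypothetical helper column. Under proportional growth the ratio would stay near 1.

```r
library(dplyr)

# Growth indices (2020 = 100) copied from the "All Growth Indices by Year" table
indices <- tibble(
  Year = 2020:2025,
  AI_Market_Index   = c(100.0, 213.9, 140.0, 151.7, 206.9, 278.5),
  AI_JobShare_Index = c(100.0, 137.2, 164.3, 106.7, 128.2, 179.5)
)

# Hypothetical helper column: a ratio near 1 would indicate proportional growth
ratio_tbl <- indices %>%
  mutate(Share_to_Market_Ratio = round(AI_JobShare_Index / AI_Market_Index, 2))

ratio_tbl
```

The ratio falls below 1 in every year after the baseline except 2022, the year market revenue dipped, which is consistent with the decoupling described above.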

# Show the raw AI share values and their YoY change to make the finding concrete
yoy_growth %>%
  filter(!is.na(avg_ai_share)) %>%
  select(Year, avg_ai_share, AI_Share_YoY, AI_Market_YoY) %>%
  rename(
    `AI Share of Jobs (%)` = avg_ai_share,
    `AI Share YoY Growth (%)` = AI_Share_YoY,
    `AI Market YoY Growth (%)` = AI_Market_YoY
  ) %>%
  kable(digits = 2, caption = "AI Share of Job Postings vs. AI Market Growth (YoY %)")

AI Share of Job Postings vs. AI Market Growth (YoY %)

| Year | AI Share of Jobs (%) | AI Share YoY Growth (%) | AI Market YoY Growth (%) |
|---|---|---|---|
| 2020 | 1.72 | NA | NA |
| 2021 | 2.36 | 37.21 | 113.93 |
| 2022 | 2.83 | 19.71 | -34.58 |
| 2023 | 1.84 | -35.01 | 8.43 |
| 2024 | 2.21 | 20.08 | 36.33 |
| 2025 | 3.09 | 40.00 | 34.64 |

Did the post-ChatGPT period (2022–2023) produce a measurable disruption in software development hiring patterns?

The software job posting index dropped sharply in 2023 (from approximately 195.7 to 90.5) — precisely when AI market interest was surging post-ChatGPT. We investigate whether this decline reflects AI tools reducing demand for software developers, or whether it is better explained by broader macroeconomic factors: the 2022–2023 tech layoffs, rising interest rates, and post-pandemic correction. The Growth_Phase categorical variable and year-over-year growth rates in our wrangled dataset are central to answering this question.

# Create phase-level summary for grouped bar chart
# filter() keeps only years where we have actual Indeed data (not projected)
# summarize() collapses to one row per phase, calculating phase averages
phase_summary <- yoy_growth %>%
  filter(!is.na(Software_Jobs_Index)) %>%
  group_by(Growth_Phase) %>%
  summarize(
    Avg_AI_Market_Index    = mean(AI_Market_Index, na.rm = TRUE),
    Avg_Software_Jobs_Index = mean(Software_Jobs_Index, na.rm = TRUE),
    Avg_AI_Share           = mean(avg_ai_share, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  pivot_longer(
    cols = -Growth_Phase,
    names_to = "Metric",
    values_to = "Value"
  ) %>%
  mutate(
    Metric = recode(Metric,
      "Avg_AI_Market_Index"     = "AI Market Index (2020=100)",
      "Avg_Software_Jobs_Index" = "Software Jobs Index (2020=100)",
      "Avg_AI_Share"            = "AI Share of Jobs (%)"
    )
  )

ggplot(phase_summary, aes(x = Growth_Phase, y = Value, fill = Metric)) +
  geom_col(position = "dodge", width = 0.65) +
  scale_fill_manual(values = c(
    "AI Market Index (2020=100)"     = "hotpink",
    "Software Jobs Index (2020=100)" = "hotpink3",
    "AI Share of Jobs (%)"           = "deeppink4"
  )) +
  labs(
    title = "AI Growth Phase Comparison: Market vs. Labor Market Response",
    subtitle = "Average values within each AI growth phase",
    x = "Growth Phase",
    y = "Average Value",
    fill = "",
    caption = "Sources: Statista | Indeed Hiring Lab (U.S.)"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold", size = 14),
    legend.position = "bottom",
    axis.text.x = element_text(size = 10),
    panel.grid.minor = element_blank()
  )

The grouped bar chart compares phase averages, which on their own understate the shift: because the Rapid Expansion phase (2022–2024) includes the 2022 hiring peak, its average software jobs index (118.58) is slightly higher than the Early Stage average (113.67). The disruption shows up within the phase. Indexed to 2020 = 100, software job postings fell from 248.0 in 2022 to 88.2 in 2024, dropping below the 2020 baseline, while the AI market index rose from 140.0 to 206.9 over the same years. This is the clearest evidence of disruption in the data. However, attributing this entirely to AI would be an overreach. The 2022–2023 period also saw the Federal Reserve’s most aggressive interest rate hiking cycle in decades, widespread tech layoffs at major firms (Amazon, Google, Meta, Microsoft), and a broader post-pandemic correction in tech valuations.

The most defensible interpretation is that the job posting decline is multi-causal: macroeconomic headwinds were likely the primary driver of the 2023 collapse, but AI-driven productivity gains may have contributed by allowing companies to sustain output with fewer new hires during the recovery. The AI share of job postings did not spike in 2023; it fell by about 35% (from 2.83% to 1.84%), which argues against a simple “AI replaced developers” narrative. Rather, the data suggest a more complex story in which both economic cycles and AI adoption are reshaping the labor market simultaneously.

This section uses case_when() to create the Growth_Phase categorical variable — this is our intermediate tool (Conditional Transformation). Without this, we could not group years into meaningful analytical phases, and the phase-level bar chart comparison would not be possible.
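A minimal sketch of how such a phase variable can be built. The phase labels match those in our summary tables, but the year cutoffs written here are inferred from the labels and should be read as an assumption rather than a copy of our wrangling code.

```r
library(dplyr)

years <- tibble(Year = 2020:2025)

phased <- years %>%
  mutate(
    # Conditions are checked in order; the first TRUE condition wins
    Growth_Phase = case_when(
      Year <= 2021 ~ "Early Stage (2020-2021)",
      Year <= 2024 ~ "Rapid Expansion (2022-2024)",
      TRUE         ~ "Projected Maturity (2025+)"
    ),
    # Ordered factor levels so phases plot chronologically, not alphabetically
    Growth_Phase = factor(Growth_Phase, levels = c(
      "Early Stage (2020-2021)",
      "Rapid Expansion (2022-2024)",
      "Projected Maturity (2025+)"
    ))
  )

# Number of years that fall into each phase
count(phased, Growth_Phase)
```

Setting explicit factor levels matters for the bar chart: without them, ggplot2 would order the phases alphabetically on the x-axis.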

yoy_growth %>%
  filter(!is.na(avg_software_index)) %>%
  group_by(Growth_Phase) %>%
  summarize(
    `Years` = paste(Year, collapse = ", "),
    `Avg AI Market ($B)` = mean(AI_Market_Billions, na.rm = TRUE),
    `Avg Software Jobs Index` = mean(avg_software_index, na.rm = TRUE),
    `Avg AI Share (%)` = mean(avg_ai_share, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  kable(digits = 2, caption = "Summary Statistics by Growth Phase")

Summary Statistics by Growth Phase

| Growth_Phase | Years | Avg AI Market ($B) | Avg Software Jobs Index | Avg AI Share (%) |
|---|---|---|---|---|
| Early Stage (2020-2021) | 2020, 2021 | 26.48 | 113.67 | 2.04 |
| Rapid Expansion (2022-2024) | 2022, 2023, 2024 | 28.04 | 118.58 | 2.29 |
| Projected Maturity (2025+) | 2025 | 46.99 | 64.91 | 3.09 |

This section documents the key data transformations performed across the analysis. Each tool described here either directly helps answer our research questions, checks an assumption, or produces output that improves our conclusions.

Step 1 — Pivot Wider to Longer (pivot_longer): The Statista datasets arrived in wide format, with years as column headers. We used pivot_longer() to convert each into a tidy long format with one row per year, which is necessary for joining on a Year key and for ggplot2 plotting.

Step 2 — Date Conversion (as.Date, lubridate::year): Indeed’s dateString column was a character field. We used as.Date() to convert it to a proper Date object and then lubridate::year() to extract the year for annual aggregation. This is our intermediate tool: Dates. Without converting to Date format, we could not correctly extract year, calculate date-based gaps, or align the daily series with Statista’s annual data.

Step 3 — Conditional Transformation (case_when): We used case_when() to classify each year into one of three growth phases (Early Stage, Rapid Expansion, Projected Maturity). This is our intermediate tool: Conditional Transformation. An alternative would be manually filtering to each year range, but case_when() is far more readable and maintainable, and allows us to treat the phase as a factor variable for grouped analysis.

Step 4 — summarize and mutate for YoY rates: We computed year-over-year percentage changes using mutate() with lag(), and aggregated daily data to annual averages using group_by() + summarize(). These are basic tools from the required list.

Step 5 — filter and arrange: Used throughout to restrict data to the United States and Software Development sector (Indeed), and to ensure chronological ordering before applying lag().

Step 6 — left_join and inner_join: Joining the Statista and Indeed datasets on Year — left join to preserve future projection years from Statista with NA for Indeed columns.

lubridate::year() This function extracts the four-digit year from a Date object as a numeric value. It solves the problem of needing to group daily data by year before joining with Statista’s annual data. We could not have achieved this cleanly with base R’s format() alone, as that returns a string rather than a numeric type compatible with the Statista Year column. We learned to use it from the lubridate package documentation and the course’s date handling reference materials.
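The type difference can be checked directly. A small sketch with an arbitrary example date:

```r
library(lubridate)

d <- as.Date("2023-11-30")  # arbitrary example date

class(format(d, "%Y"))  # base R formatting yields a character string
class(year(d))          # lubridate::year() yields a number

# The numeric result compares directly against a numeric Year column
year(d) == 2023
```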

lag() inside mutate() The lag() function returns the previous row’s value within a vector, making it ideal for computing year-over-year differences. Without lag(), we would have had to join the dataset to a shifted version of itself — a far more complex operation. We learned this from the dplyr documentation and R for Data Science (Hadley Wickham, Chapter 5).
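As a sketch of the pattern, the year-over-year rates for AI market revenue can be recomputed from the values in our final combined dataset. The object name yoy is illustrative.

```r
library(dplyr)

# AI market revenue in billions USD, from the final combined dataset
ai_market <- tibble(
  Year = 2020:2025,
  AI_Market_Billions = c(16.87, 36.09, 23.61, 25.60, 34.90, 46.99)
)

yoy <- ai_market %>%
  arrange(Year) %>%  # lag() depends on row order, so sort first
  mutate(
    # Percent change relative to the previous row; the first year has no predecessor
    AI_Market_YoY = (AI_Market_Billions / lag(AI_Market_Billions) - 1) * 100
  )

round(yoy$AI_Market_YoY, 1)
```

The first year has no previous row, so lag() returns NA there, which is why 2020 shows NA in our year-over-year tables.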

geom_smooth(method = "loess") Used in the appendix daily visualization, LOESS (locally estimated scatterplot smoothing) fits a flexible, non-parametric smooth curve to noisy time series data. It is more appropriate than a linear trend line here because the relationship between date and job postings is clearly non-linear. We could not produce this with tools covered in class and learned it from the ggplot2 documentation.
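A minimal sketch of the technique on simulated data. The series below is randomly generated for illustration and is not the Indeed data, and the span value is an arbitrary choice.

```r
library(ggplot2)

# Simulated daily index (NOT the Indeed data): a smooth curve plus random noise
set.seed(42)
sim <- data.frame(Date = seq(as.Date("2020-02-01"), by = "day", length.out = 400))
sim$index <- 100 + 40 * sin(seq_len(400) / 60) + rnorm(400, sd = 8)

p <- ggplot(sim, aes(x = Date, y = index)) +
  geom_line(alpha = 0.4) +
  # span controls smoothness; 0.3 is an arbitrary illustrative value
  geom_smooth(method = "loess", formula = y ~ x, span = 0.3, se = FALSE) +
  labs(x = "Date", y = "Index (simulated)")

p
```

A smaller span follows short-term swings more closely, while a larger span produces a flatter curve, so the choice affects how pronounced a dip appears.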

# Show the final combined dataset used for all analyses
kable(
  final_combined %>%
    filter(!is.na(avg_software_index)) %>%
    select(Year, AI_Market_Billions, AI_Users_Millions, Software_Total_Billions,
           avg_software_index, avg_ai_share, observations),
  digits = 2,
  caption = "Final Combined Dataset (Years with Both Statista + Indeed Data)",
  col.names = c("Year", "AI Market ($B)", "AI Users (M)", "Software Market ($B)",
                "Avg SW Jobs Index", "Avg AI Share (%)", "Daily Observations")
)

Final Combined Dataset (Years with Both Statista + Indeed Data)

| Year | AI Market ($B) | AI Users (M) | Software Market ($B) | Avg SW Jobs Index | Avg AI Share (%) | Daily Observations |
|---|---|---|---|---|---|---|
| 2020 | 16.87 | 48.13 | 270.86 | 78.92 | 1.72 | 335 |
| 2021 | 36.09 | 59.72 | 286.85 | 148.42 | 2.36 | 365 |
| 2022 | 23.61 | 75.07 | 313.56 | 195.70 | 2.83 | 365 |
| 2023 | 25.60 | 84.10 | 338.22 | 90.46 | 1.84 | 365 |
| 2024 | 34.90 | 104.84 | 363.39 | 69.59 | 2.21 | 366 |
| 2025 | 46.99 | 129.08 | 379.29 | 64.91 | 3.09 | 334 |

kable(
  yoy_growth %>%
    filter(!is.na(Software_Jobs_Index)) %>%
    select(Year, Growth_Phase, AI_Market_YoY, Software_Jobs_YoY, AI_Share_YoY),
  digits = 1,
  caption = "Year-over-Year Growth Rates (%) by Phase",
  col.names = c("Year", "Growth Phase", "AI Market YoY %", "SW Jobs YoY %", "AI Share YoY %")
)

Year-over-Year Growth Rates (%) by Phase

| Year | Growth Phase | AI Market YoY % | SW Jobs YoY % | AI Share YoY % |
|---|---|---|---|---|
| 2020 | Early Stage (2020-2021) | NA | NA | NA |
| 2021 | Early Stage (2020-2021) | 113.9 | 88.1 | 37.2 |
| 2022 | Rapid Expansion (2022-2024) | -34.6 | 31.9 | 19.7 |
| 2023 | Rapid Expansion (2022-2024) | 8.4 | -53.8 | -35.0 |
| 2024 | Rapid Expansion (2022-2024) | 36.3 | -23.1 | 20.1 |
| 2025 | Projected Maturity (2025+) | 34.6 | -6.7 | 40.0 |

Takeaway 1: AI economic growth and software developer hiring have decoupled sharply since 2022.

Our most striking finding is the divergence between AI market revenue (which reached ~279% of its 2020 level by 2025, an increase of ~179%) and software development job postings (which fell below their 2020 baseline by 2024). This directly answers Focus Question 1: AI market growth has not translated into proportional hiring demand for software developers. Rather than expanding the workforce, AI’s economic gains appear to be concentrated in ways that do not require proportional increases in developer headcount, likely through productivity gains, infrastructure spending, and enterprise licensing that does not generate front-line developer jobs at scale.

Takeaway 2: AI skill requirements in job postings are growing, but far more slowly than the AI market itself.

AI’s share of job postings rose over our study period (from 1.72% in 2020 to 3.09% in 2025, despite a dip in 2023), but not in proportion to the revenue and user growth in the AI market. This answers Focus Question 2: AI market expansion and AI skill demand are not growing proportionally. One likely explanation is that many companies are adopting AI tools internally without explicitly listing them as required skills in job postings. Another is that much of the economic value creation is in AI infrastructure and enterprise software sales, neither of which necessarily requires new developer hires who list “AI skills” as their primary qualification.

Takeaway 3: The 2022–2023 software hiring downturn is multi-causal and cannot be attributed to AI alone.

The sharpest drop in software job postings occurred in 2023, coinciding with ChatGPT’s mainstream breakthrough, but also with one of the most severe tech-sector contractions in recent memory. This addresses Focus Question 3: the disruption is real, but disentangling AI’s contribution from macroeconomic headwinds (interest rates, post-ZIRP tech corrections, mass layoffs) is not possible with this dataset alone. Our data is consistent with both a “macro-driven contraction” narrative and an “AI-boosted productivity requiring fewer hires” narrative.

Is our analysis over- or underestimating AI’s labor market impact?

We believe our analysis likely underestimates AI’s impact on the job market. The Indeed data captures only jobs explicitly mentioning AI in titles or descriptions, missing the many roles where AI tools are used daily without being listed as a formal skill requirement. Additionally, filtering to the “Software Development” sector excludes adjacent roles (such as ML engineers, AI researchers, data scientists, and prompt engineers) where AI’s impact on hiring is even more pronounced. The geographic restriction to the United States also misses global labor market dynamics.

Future Directions

The most valuable extension of this project would be incorporating salary data, which would allow us to test whether AI-adjacent skills command premium compensation even as total job volume declines. We would also benefit from demographic breakdowns to assess whether AI’s labor market transformation is equitably distributed. Finally, expanding the sector scope beyond “Software Development” to include data science, AI research, product management, and operations roles would give a fuller picture of how AI is reshaping the knowledge economy broadly. Longitudinal survey data on actual worker experiences (rather than job posting counts) would complement our posting-based analysis meaningfully.

# Reproduce the key visualization from Q1 as a summary
growth_long_all <- growth_indexed %>%
  filter(!is.na(Software_Jobs_Index)) %>%
  select(Year, AI_Market_Index, Software_Market_Index,
         Software_Jobs_Index, AI_JobShare_Index) %>%
  pivot_longer(cols = -Year, names_to = "Metric", values_to = "Index_Value") %>%
  mutate(
    Metric = case_when(
      Metric == "AI_Market_Index"      ~ "AI Market Revenue",
      Metric == "Software_Market_Index" ~ "Total Software Market",
      Metric == "Software_Jobs_Index"   ~ "Software Job Postings",
      Metric == "AI_JobShare_Index"     ~ "AI Share of Jobs",
      TRUE ~ Metric
    )
  )

ggplot(growth_long_all, aes(x = Year, y = Index_Value, color = Metric)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2.5) +
  scale_color_manual(values = c(
    "AI Market Revenue"     = "lightpink",
    "Total Software Market" = "hotpink",
    "Software Job Postings" = "hotpink3",
    "AI Share of Jobs"      = "deeppink4"
  )) +
  scale_y_continuous(labels = comma) +
  scale_x_continuous(breaks = 2020:2025) +
  labs(
    title = "All Metrics Compared: 2020–2025",
    subtitle = "Indexed to 2020 = 100",
    x = "Year", y = "Growth Index (2020 = 100)", color = "",
    caption = "Sources: Statista | Indeed Hiring Lab (U.S.)"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom", panel.grid.minor = element_blank())
# Snapshot of final year values for all key metrics
growth_indexed %>%
  filter(!is.na(avg_software_index)) %>%
  select(Year, AI_Market_Index, AI_Users_Index,
         Software_Market_Index, Software_Jobs_Index, AI_JobShare_Index) %>%
  rename(
    `AI Market` = AI_Market_Index,
    `AI Users` = AI_Users_Index,
    `SW Market` = Software_Market_Index,
    `SW Jobs` = Software_Jobs_Index,
    `AI Job Share` = AI_JobShare_Index
  ) %>%
  kable(digits = 1, caption = "All Growth Indices by Year (2020 = 100)")

All Growth Indices by Year (2020 = 100)

| Year | AI Market | AI Users | SW Market | SW Jobs | AI Job Share |
|---|---|---|---|---|---|
| 2020 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| 2021 | 213.9 | 124.1 | 105.9 | 188.1 | 137.2 |
| 2022 | 140.0 | 156.0 | 115.8 | 248.0 | 164.3 |
| 2023 | 151.7 | 174.7 | 124.9 | 114.6 | 106.7 |
| 2024 | 206.9 | 217.8 | 134.2 | 88.2 | 128.2 |
| 2025 | 278.5 | 268.2 | 140.0 | 82.2 | 179.5 |

Topic: Analyzing the relationship between AI market growth and software development job market trends.

This group is investigating how the growth of AI technologies is reshaping the software development job market. Specifically, we examine whether AI’s economic expansion, measured by market revenue and user adoption, corresponds to changes in hiring demand and skill requirements in the software industry. Our analysis draws on market data from Statista and labor market data from Indeed Hiring Lab to explore this question empirically.

Group Section: BG-5 | Teaching Assistant: Lexeigh Kolakowski

Lauren Hughes

Applied Math | University of Washington | lhughes@uw.edu

Lauren is an Applied Mathematics major. She chose this topic out of interest in how actual market trends compare with their portrayal in the news, and in the mathematical modeling behind labor market indices and growth curves. Lauren enjoys applying quantitative methods to economic questions, as well as seeing how others work with data, and is pursuing a career in either applied mathematics/tech or medicine, both of which depend on accurate, clearly communicated information.

Jonah Calague

Informatics | University of Washington | jcalague@uw.edu

Jonah is interested in seeing how the growth of AI is transforming the software industry and influencing the job market. His informatics background gives him a strong foundation in understanding how technology, data, and human behavior relate.

Oliver Boctor

University of Washington

Oliver is a member of the project team contributing to the analysis of AI market trends and their relationship to software development employment patterns.

Tools from Class Used in This Project
  • filter, select, mutate, group_by, summarize, arrange — core dplyr verbs for data wrangling
  • pivot_longer — reshaping Statista wide-format data to long format
  • inner_join, left_join — combining Statista and Indeed datasets
  • ggplot2 with geom_line, geom_point, geom_col, geom_area, geom_smooth — visualization
  • knitr::kable — producing formatted summary tables
  • scales::comma — human-readable number formatting on axis labels
Additional Resources Consulted
  • ggplot2 Book (https://ggplot2-book.org/themes.html) — for custom color palettes and theme adjustments
  • Indeed Hiring Lab methodology (https://www.hiringlab.org/about/) — for understanding index baseline conventions
  • lubridate documentation — for year() function usage
  • R for Data Science (Hadley Wickham) — for lag() within mutate() patterns
sessionInfo()
R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.7.3

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] knitr_1.51      scales_1.4.0    lubridate_1.9.5 forcats_1.0.1  
 [5] stringr_1.6.0   dplyr_1.1.4     purrr_1.2.1     readr_2.1.6    
 [9] tidyr_1.3.2     tibble_3.3.1    ggplot2_4.0.1   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] bit_4.6.0          gtable_0.3.6       jsonlite_2.0.0     crayon_1.5.3      
 [5] compiler_4.5.2     tidyselect_1.2.1   parallel_4.5.2     yaml_2.3.12       
 [9] fastmap_1.2.0      R6_2.6.1           labeling_0.4.3     generics_0.1.4    
[13] htmlwidgets_1.6.4  pillar_1.11.1      RColorBrewer_1.1-3 tzdb_0.5.0        
[17] rlang_1.1.7        stringi_1.8.7      xfun_0.56          S7_0.2.1          
[21] bit64_4.6.0-1      otel_0.2.0         timechange_0.4.0   cli_3.6.5         
[25] withr_3.0.2        magrittr_2.0.4     digest_0.6.39      grid_4.5.2        
[29] vroom_1.7.0        rstudioapi_0.18.0  hms_1.1.4          lifecycle_1.0.5   
[33] vctrs_0.7.1        evaluate_1.0.5     glue_1.8.0         farver_2.1.2      
[37] rmarkdown_2.30     tools_4.5.2        pkgconfig_2.0.3    htmltools_0.5.9