Background and Problem Definition

I chose the AI Job Market dataset to explore because AI is on the rise, and there are many different ideas about what will happen with AI and whether it is going to take over everyone's jobs. The rise of AI has also created high demand for jobs that require programming, machine learning and data engineering skills. The dataset contains 2,000 AI-related job listings. Each listing includes the company name, industry, company size, job title, experience level, employment type (for example full-time, contract, internship or remote), location, required skills, preferred tools, salary range and posting date.

Classic AI meme

Data Wrangling, Munging and Cleaning

Importing and cleaning up the AI Job Market Dataset

I want to show the relationship between AI jobs in Arizona and their salaries, so all of the data cleaning is below. The output shows just the first six rows (the head) of the AI job postings.

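# packages used in this report (ggplot2 is needed for the plots below)
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)

# df holds the AI Job Market data, read in earlier (the loading step is not shown)

# split "min-max" salary strings into numeric min and max columns
df_num <- df %>% 
  separate(salary_range_usd, into = c("min_salary", "max_salary"),
           sep = "-", remove = FALSE) %>%
  mutate(
    min_salary = as.numeric(min_salary),
    max_salary = as.numeric(max_salary)
  )

# valid U.S. state abbreviations (built into base R)
valid_states <- state.abb

# extract the two-letter code after ", " at the end of location,
# then keep only codes that are U.S. state abbreviations
df_states <- df %>%
  mutate(state = str_extract(location, "(?<=, )[:upper:]{2}$")) %>%
  filter(!is.na(state), state %in% valid_states)

# top 10 states by number of postings
state_counts <- df_states %>%
  count(state, sort = TRUE) %>%
  slice_max(n, n = 10) %>%
  as.data.frame()

# first six rows of the raw postings
head(df)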

##   job_id             company_name   industry                job_title
## 1      1          Foster and Sons Healthcare             Data Analyst
## 2      2  Boyd, Myers and Ramirez       Tech Computer Vision Engineer
## 3      3                 King Inc       Tech         Quant Researcher
## 4      4 Cooper, Archer and Lynch       Tech       AI Product Manager
## 5      5                 Hall LLC    Finance           Data Scientist
## 6      6                Ellis PLC E-commerce       AI Product Manager
##                                                      skills_required
## 1 NumPy, Reinforcement Learning, PyTorch, Scikit-learn, GCP, FastAPI
## 2                                    Scikit-learn, CUDA, SQL, Pandas
## 3                          MLflow, FastAPI, Azure, PyTorch, SQL, GCP
## 4                       Scikit-learn, C++, Pandas, LangChain, AWS, R
## 5                                    Excel, Keras, SQL, Hugging Face
## 6                                   GCP, Excel, Scikit-learn, MLflow
##   experience_level employment_type              location salary_range_usd
## 1              Mid       Full-time         Tracybury, AR     92860-109598
## 2           Senior       Full-time        Lake Scott, CU     78523-144875
## 3            Entry       Full-time        East Paige, CM    124496-217204
## 4              Mid       Full-time         Perezview, FI     50908-123743
## 5           Senior        Contract North Desireeland, NE     98694-135413
## 6           Senior          Remote       South Kevin, TZ     92632-180718
##   posted_date company_size                 tools_preferred
## 1  2025-08-20        Large                 KDB+, LangChain
## 2  2024-03-22        Large       FastAPI, KDB+, TensorFlow
## 3  2025-09-18        Large BigQuery, PyTorch, Scikit-learn
## 4  2024-05-08        Large    TensorFlow, BigQuery, MLflow
## 5  2025-02-24        Large              PyTorch, LangChain
## 6  2025-08-07        Large    PyTorch, TensorFlow, FastAPI
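The separate() step above assumes that every salary_range_usd value contains exactly one hyphen. As a quick check, here is a minimal base-R sketch of the same split applied to three values copied from the head() output. It then confirms that nothing parsed to NA and that the minimum never exceeds the maximum. The helper split_salary_range() is hypothetical and is not part of the original cleaning code.

```r
# Base-R sketch: split "min-max" salary strings into numeric columns and check
# the result. split_salary_range() is a hypothetical helper, not part of the report.
split_salary_range <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  data.frame(
    salary_range_usd = x,
    min_salary = as.numeric(vapply(parts, `[`, character(1), 1)),
    max_salary = as.numeric(vapply(parts, `[`, character(1), 2))
  )
}

# three values copied from the salary_range_usd column printed above
ranges <- c("92860-109598", "78523-144875", "124496-217204")
salaries <- split_salary_range(ranges)

# sanity checks: nothing failed to parse, and min never exceeds max
parsed_ok <- !anyNA(salaries$min_salary) && !anyNA(salaries$max_salary)
ordered_ok <- all(salaries$min_salary <= salaries$max_salary)
print(salaries)
```

On the full data, the same two checks could be run directly on df_num$min_salary and df_num$max_salary.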

These are the AI job postings located in Arizona (AZ):

# keep rows whose location ends in ", AZ"
az <- df[grepl(", AZ$", df$location), ]
az

##      job_id                company_name   industry                job_title
## 81       81                 Johnson Inc  Education            AI Researcher
## 561     561               Dixon-Sanchez       Tech       AI Product Manager
## 597     597               Erickson-Hill    Finance              ML Engineer
## 1035   1035                Peterson Ltd E-commerce         Quant Researcher
## 1206   1206              Richards-Adams    Finance         Quant Researcher
## 1272   1272              Johnson-Peters       Tech             NLP Engineer
## 1401   1401 Grant, Rosario and Williams     Retail            AI Researcher
## 1420   1420                Cook-Francis Automotive         Quant Researcher
## 1569   1569       Walls, Young and Cook E-commerce           Data Scientist
## 1682   1682 Schmitt, James and Campbell E-commerce Computer Vision Engineer
##                                                      skills_required
## 81                Reinforcement Learning, Python, Pandas, TensorFlow
## 561                               Excel, Azure, Pandas, PyTorch, SQL
## 597           LangChain, Reinforcement Learning, CUDA, R, SQL, Excel
## 1035 Reinforcement Learning, Python, Pandas, Excel, LangChain, Azure
## 1206                                      FastAPI, C++, NumPy, Flask
## 1272                                LangChain, GCP, NumPy, Python, R
## 1401                              Keras, AWS, SQL, CUDA, Python, GCP
## 1420                                        SQL, GCP, Excel, CUDA, R
## 1569       Reinforcement Learning, Keras, MLflow, Flask, Pandas, AWS
## 1682                        FastAPI, SQL, Pandas, TensorFlow, MLflow
##      experience_level employment_type               location salary_range_usd
## 81                Mid      Internship       Lake Kristen, AZ    111184-166588
## 561               Mid          Remote         Adamsshire, AZ     95822-165764
## 597            Senior      Internship          Aaronview, AZ      48914-60214
## 1035              Mid        Contract       Staffordstad, AZ    147838-240312
## 1206            Entry        Contract Lake Kathleenville, AZ     67422-166675
## 1272            Entry        Contract       North Daniel, AZ     74842-147593
## 1401              Mid          Remote     Lake Ryanville, AZ    127351-220767
## 1420            Entry          Remote        Bridgesberg, AZ     94512-159116
## 1569           Senior        Contract         New Nicole, AZ     55477-100381
## 1682              Mid        Contract        Port Joanne, AZ     74357-107271
##      posted_date company_size                tools_preferred
## 81    2025-08-03        Large  Scikit-learn, MLflow, PyTorch
## 561   2025-05-21      Startup      PyTorch, TensorFlow, KDB+
## 597   2024-10-13          Mid                           KDB+
## 1035  2025-04-30        Large PyTorch, LangChain, TensorFlow
## 1206  2024-08-17        Large              MLflow, LangChain
## 1272  2025-04-18      Startup                      LangChain
## 1401  2024-12-28        Large    TensorFlow, MLflow, PyTorch
## 1420  2024-10-06      Startup                 BigQuery, KDB+
## 1569  2024-02-12      Startup                         MLflow
## 1682  2024-09-15        Large    KDB+, TensorFlow, LangChain
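A pattern such as ", AZ" with no anchor matches ", AZ" anywhere in the location string. The pattern ", AZ$" only matches when AZ is the final code. The small sketch below shows the difference. The last location string is invented purely for illustration and does not come from the dataset.

```r
# Hypothetical location strings: the last one is invented to show a false match.
locations <- c("Lake Kristen, AZ", "Tracybury, AR", "Port Joanne, AZ", "Sample Town, AZX")

unanchored <- grepl(", AZ", locations)   # also TRUE for the invented ", AZX"
anchored   <- grepl(", AZ$", locations)  # only when ", AZ" ends the string

print(locations[anchored])
```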

Data Visualization

Histogram of Maximum Salary

The histogram shows the distribution of maximum salaries across all 2,000 AI job postings. The blue curve is a Gaussian kernel density estimate (bandwidth of about 8,210 USD, per the density output below), which traces the overall shape of the distribution.

## List of 8
##  $ x         : num [1:512] 29532 30007 30482 30957 31432 ...
##  $ y         : num [1:512] 8.40e-10 1.02e-09 1.23e-09 1.49e-09 1.79e-09 ...
##  $ bw        : num 8210
##  $ n         : int 2000
##  $ old.coords: logi FALSE
##  $ call      : language density.default(x = df_num$max_salary, kernel = "gaussian")
##  $ data.name : chr "df_num$max_salary"
##  $ has.na    : logi FALSE
##  - attr(*, "class")= chr "density"
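The plotting code for this figure is not shown. The str() output above is from a density object created by density(df_num$max_salary, kernel = "gaussian"). Here is a minimal base-R sketch of one way to draw a histogram on the density scale with that kind of blue curve on top. It uses hist() and lines() instead of whatever the original chunk used. The helper plot_max_salary() is hypothetical, and the example runs on made-up salaries because df_num is not available here.

```r
# Sketch: histogram of maximum salary on the density scale with a blue
# Gaussian kernel density curve. plot_max_salary() is a hypothetical helper.
plot_max_salary <- function(max_salary) {
  d <- density(max_salary, kernel = "gaussian")
  h <- hist(max_salary, breaks = 30, plot = FALSE)
  plot(h, freq = FALSE,
       main = "Histogram of Maximum Salary",
       xlab = "Maximum Salary (USD)",
       ylim = c(0, max(h$density, d$y)))  # leave room for both bars and curve
  lines(d, col = "blue", lwd = 2)
  invisible(d)
}

# made-up salaries for illustration only; in the report this would be df_num$max_salary
set.seed(1)
toy_max <- round(runif(200, min = 60000, max = 250000))

pdf(tempfile(fileext = ".pdf"))  # draw to a temporary file rather than the screen
d <- plot_max_salary(toy_max)
invisible(dev.off())
print(d)
```

Plotting the histogram with freq = FALSE puts the bars on the same density scale as the curve, which is what lets the two overlay.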

Top 10 Skills Required

I wanted to show the top 10 skills needed for these AI jobs. Each job lists several required skills in one comma-separated string. To handle this, I first split the skills_required column into one row per skill with separate_rows() and then counted how often each skill appears.

skills_list <- df %>%
  separate_rows(skills_required, sep = ",\\s*")

top_skills <- skills_list %>% 
  count(skills_required, sort = TRUE) %>% 
  slice_head(n = 10)

top_skills

## # A tibble: 10 × 2
##    skills_required            n
##    <chr>                  <int>
##  1 TensorFlow               452
##  2 Excel                    432
##  3 Pandas                   427
##  4 FastAPI                  419
##  5 NumPy                    416
##  6 Reinforcement Learning   414
##  7 Azure                    413
##  8 Hugging Face             408
##  9 SQL                      408
## 10 Keras                    406
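The same split-and-count idea can be written in base R, which shows what separate_rows() and count() are doing. This is a minimal sketch using strsplit() and table() in place of the tidyverse calls. The helper count_skills() is hypothetical, and the three input strings are copied from the head() output above.

```r
# Base-R sketch of the skill count: split each comma-separated skills string,
# flatten, and tabulate. count_skills() is a hypothetical helper.
count_skills <- function(skills, top_n = 10) {
  all_skills <- unlist(strsplit(skills, ",\\s*"))
  counts <- sort(table(all_skills), decreasing = TRUE)
  head(counts, top_n)
}

# three skills strings copied from the head() output above
skills <- c(
  "NumPy, Reinforcement Learning, PyTorch, Scikit-learn, GCP, FastAPI",
  "Scikit-learn, CUDA, SQL, Pandas",
  "MLflow, FastAPI, Azure, PyTorch, SQL, GCP"
)
top <- count_skills(skills, top_n = 3)
print(top)
```

With only three rows, several skills tie at two mentions each, so the order among tied skills is not meaningful here.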

Scatter Plot: Relationship Between Minimum and Maximum Salaries

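The scatter plot shows a positive relationship between the minimum and maximum salaries across all AI job listings: postings with a higher minimum salary tend to have a higher maximum salary. Each point is colored by industry, so we can look for industries that sit toward the higher or lower end of the salary range. The black line is a single linear fit across all industries.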

df_num %>% 
  ggplot(aes(x = min_salary, y = max_salary, color = industry)) + 
  geom_point(alpha = 0.7, size = 2.5) + 
  geom_smooth(method = "lm", se = FALSE, color = "black", lwd = 1.2) + 
  labs(
    title = "Relationship Between Minimum and Maximum Salary (By Industry)",
    x = "Minimum Salary (USD)",
    y = "Maximum Salary (USD)",
    color = "Industry"
  ) + 
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'
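To put numbers on the scatter plot, a short base-R sketch could compute three things:

- the correlation between minimum and maximum salary,
- the slope of the same straight-line fit that geom_smooth(method = "lm") draws,
- the mean maximum salary for each industry.

The helper summarise_salaries() and the four toy rows are hypothetical and are not results from the dataset. In the report you would pass df_num instead.

```r
# Sketch: numeric summary behind the scatter plot.
# summarise_salaries() is a hypothetical helper; pass df_num in the report.
summarise_salaries <- function(df_num) {
  list(
    correlation = cor(df_num$min_salary, df_num$max_salary),
    fit = coef(lm(max_salary ~ min_salary, data = df_num)),
    by_industry = aggregate(max_salary ~ industry, data = df_num, FUN = mean)
  )
}

# made-up rows for illustration only (not the AI Job Market data)
toy <- data.frame(
  industry   = c("Tech", "Tech", "Finance", "Finance"),
  min_salary = c(50000, 100000, 60000, 120000),
  max_salary = c(90000, 160000, 100000, 200000)
)
res <- summarise_salaries(toy)
print(res)
```

The per-industry means give a direct way to compare which industries pay more, which is hard to judge by eye when many colored points overlap.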

What states have the most job postings?

I wanted to show the U.S. states with the highest number of AI-related job postings. First I extracted the two-letter code at the end of each location. Then I kept only codes that match a U.S. state abbreviation (R's built-in state.abb), because the location column also contains codes such as CU, FI and TZ that are not U.S. states. After that I counted the postings for each state and plotted the counts as a bar chart. The state with the most AI job postings is South Carolina, and the state with the fewest is South Dakota. Arizona also appears on the list with 10 postings, the same ten listings shown in the Arizona table above.

# recount postings for every state (this replaces the top-10 table made
# earlier, so the bar chart shows all states)
state_counts <- df_states %>%
  count(state, sort = TRUE)

ggplot(state_counts, aes(x = reorder(state, n), y = n)) +
  geom_col(fill = "pink", color = "black", width = 0.7) +
  coord_flip() +
  labs(
    title = "U.S. States by Number of AI Job Postings",
    x = "State",
    y = "Number of Job Postings"
  ) +
  theme_minimal()
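The extract, filter and count steps described above can also be sketched in base R. This version uses regexpr() with the same lookbehind pattern, then regmatches() and table() in place of str_extract() and count(). The helper count_states() is hypothetical. The locations are copied from the printed tables in this report.

```r
# Base-R sketch of the state step: take the two-letter code after the final
# ", ", keep U.S. state abbreviations only, then count and rank.
# count_states() is a hypothetical helper, not the code used for the chart above.
count_states <- function(location, top_n = 10) {
  codes <- regmatches(location,
                      regexpr("(?<=, )[A-Z]{2}$", location, perl = TRUE))
  codes <- codes[codes %in% state.abb]   # drops codes such as CU, FI, TZ
  head(sort(table(codes), decreasing = TRUE), top_n)
}

# locations copied from the printed tables above
locs <- c("Tracybury, AR", "Lake Scott, CU", "East Paige, CM", "Perezview, FI",
          "North Desireeland, NE", "South Kevin, TZ",
          "Lake Kristen, AZ", "Adamsshire, AZ", "Aaronview, AZ")
top_states <- count_states(locs)
print(top_states)
```

The top_n argument plays the same role as slice_max(n, n = 10) in the cleaning chunk.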

Conclusion

Using these data manipulation steps, we were able to see which states have the most AI job postings and which skills are required most often. We were also able to see the relationship between the minimum and maximum salaries of AI jobs and how it varies across industries.

Project 1

Heather Macias

2025-11-10