#1. PREPARE

The rapid adoption of artificial intelligence (AI) across industries is transforming the global job market, reshaping traditional roles, and driving demand for new skills. Using the “AI-Powered Job Market Insights” dataset, retrieved from Kaggle, I aim to explore how AI integration influences various aspects of employment, including job roles, salary trends, and workforce dynamics. This dataset offers a unique opportunity to analyze patterns in AI adoption across different industries and its implications for job growth, company size, and skill requirements.

Through a detailed exploration, I seek to answer critical questions:

- How does AI adoption vary by industry?
- What is its impact on salary and job roles?
- How do company size and AI implementation intersect?
- What is the relationship between AI adoption levels and projected job growth across different industries?
- What skills are becoming essential in an AI-driven workforce?
- Which roles are most at risk of automation?

This research provides valuable insights into the evolving nature of work, helping businesses, policymakers, and individuals navigate the challenges and opportunities of an AI-powered economy.

Dataset Features:

Job_Title: The title of the job role. Type: Categorical. Example values: “Data Scientist”, “Software Engineer”, “HR Manager”.

Industry: The industry in which the job is located. Type: Categorical. Example values: “Technology”, “Healthcare”, “Finance”.

Company_Size: The size of the company offering the job. Type: Categorical. Categories: “Small”, “Medium”, “Large”.

Location: The geographic location of the job. Type: Categorical. Example values: “New York”, “San Francisco”, “London”.

AI_Adoption_Level: The extent to which the company has adopted AI in its operations. Type: Categorical. Categories: “Low”, “Medium”, “High”.

Automation_Risk: The estimated risk that the job could be automated within the next 10 years. Type: Categorical. Categories: “Low”, “Medium”, “High”.

Required_Skills: The key skills required for the job role. Type: Categorical. Example values: “Python”, “Data Analysis”, “Project Management”.

Salary_USD: The annual salary offered for the job in USD. Type: Numerical. Value range: $30,000 - $200,000.

Remote_Friendly: Indicates whether the job can be performed remotely. Type: Categorical. Categories: “Yes”, “No”.

Job_Growth_Projection: The projected growth or decline of the job role over the next five years. Type: Categorical. Categories: “Decline”, “Stable”, “Growth”.

Load Necessary Libraries

library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
library(scales)
## Warning: package 'scales' was built under R version 4.3.3
## 
## Attaching package: 'scales'
## The following object is masked from 'package:readr':
## 
##     col_factor
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.3.3
## corrplot 0.95 loaded
library(RColorBrewer)

Load dataset

job_data <- read_csv("ai_job_market_insights.csv")
## Rows: 500 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (9): Job_Title, Industry, Company_Size, Location, AI_Adoption_Level, Aut...
## dbl (1): Salary_USD
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Preview the dataset

head(job_data)

Check for missing values

missing_values <- sapply(job_data, function(x) sum(is.na(x)))
print("Missing Values per Column:")
## [1] "Missing Values per Column:"
print(missing_values)
##             Job_Title              Industry          Company_Size 
##                     0                     0                     0 
##              Location     AI_Adoption_Level       Automation_Risk 
##                     0                     0                     0 
##       Required_Skills            Salary_USD       Remote_Friendly 
##                     0                     0                     0 
## Job_Growth_Projection 
##                     0
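
The per-column check above can also be written without an anonymous function. The sketch below is a minimal illustration in base R on a made-up data frame (`toy` and its values are assumptions, not rows from the dataset). It counts missing values with `colSums()` and also reports the share of rows missing per column.

```r
# Toy data frame standing in for job_data (values are made up for illustration)
toy <- data.frame(
  Job_Title  = c("Data Scientist", NA, "HR Manager"),
  Salary_USD = c(95000, 88000, NA),
  Industry   = c("Finance", "Retail", "Energy")
)

# is.na() on a data frame returns a logical matrix; colSums() counts TRUEs per column
missing_counts <- colSums(is.na(toy))
print(missing_counts)

# colMeans() gives the share of rows missing in each column
missing_share <- colMeans(is.na(toy))
print(round(missing_share, 2))
```

On the real data, `colSums(is.na(job_data))` should give the same zero counts as the `sapply()` call.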

View dataset structure and summary

str(job_data)
## spc_tbl_ [500 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Job_Title            : chr [1:500] "Cybersecurity Analyst" "Marketing Specialist" "AI Researcher" "Sales Manager" ...
##  $ Industry             : chr [1:500] "Entertainment" "Technology" "Technology" "Retail" ...
##  $ Company_Size         : chr [1:500] "Small" "Large" "Large" "Small" ...
##  $ Location             : chr [1:500] "Dubai" "Singapore" "Singapore" "Berlin" ...
##  $ AI_Adoption_Level    : chr [1:500] "Medium" "Medium" "Medium" "Low" ...
##  $ Automation_Risk      : chr [1:500] "High" "High" "High" "High" ...
##  $ Required_Skills      : chr [1:500] "UX/UI Design" "Marketing" "UX/UI Design" "Project Management" ...
##  $ Salary_USD           : num [1:500] 111392 93793 107170 93028 87753 ...
##  $ Remote_Friendly      : chr [1:500] "Yes" "No" "Yes" "No" ...
##  $ Job_Growth_Projection: chr [1:500] "Growth" "Decline" "Growth" "Growth" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Job_Title = col_character(),
##   ..   Industry = col_character(),
##   ..   Company_Size = col_character(),
##   ..   Location = col_character(),
##   ..   AI_Adoption_Level = col_character(),
##   ..   Automation_Risk = col_character(),
##   ..   Required_Skills = col_character(),
##   ..   Salary_USD = col_double(),
##   ..   Remote_Friendly = col_character(),
##   ..   Job_Growth_Projection = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
summary(job_data)
##   Job_Title           Industry         Company_Size         Location        
##  Length:500         Length:500         Length:500         Length:500        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  AI_Adoption_Level  Automation_Risk    Required_Skills      Salary_USD    
##  Length:500         Length:500         Length:500         Min.   : 31970  
##  Class :character   Class :character   Class :character   1st Qu.: 78512  
##  Mode  :character   Mode  :character   Mode  :character   Median : 91998  
##                                                           Mean   : 91222  
##                                                           3rd Qu.:103971  
##                                                           Max.   :155210  
##  Remote_Friendly    Job_Growth_Projection
##  Length:500         Length:500           
##  Class :character   Class :character     
##  Mode  :character   Mode  :character     
##                                          
##                                          
## 

#2. WRANGLE: Data Cleaning

A review of the dataset shows no missing values in any column, and `read_csv()` parsed Salary_USD as numeric and the nine remaining columns as character. No cleaning or imputation is therefore required. The only preparation step is converting the categorical columns from character to factor, which is done at the start of the next section. The analysis can then move directly to exploring and modeling the data.

#3. EXPLORE & MODEL: Data Analysis

# Convert categorical (character) variables to factors
# across() + where() is the current dplyr idiom; mutate_if() is superseded
job_data <- job_data %>%
  mutate(across(where(is.character), as.factor))
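
One detail of the conversion above is worth illustrating: `as.factor()` sorts levels alphabetically, so the adoption levels come out as High, Low, Medium rather than in their natural order. The sketch below uses a made-up vector (`adoption` is an assumption standing in for the AI_Adoption_Level column) to show the difference and one way to set the order explicitly.

```r
# Toy vector standing in for job_data$AI_Adoption_Level (made-up values)
adoption <- c("Medium", "Low", "High", "Low", "Medium")

# Default conversion: levels are sorted alphabetically
default_factor <- as.factor(adoption)
print(levels(default_factor))

# Explicit levels keep the natural Low < Medium < High order
ordered_factor <- factor(adoption,
                         levels = c("Low", "Medium", "High"),
                         ordered = TRUE)
print(levels(ordered_factor))

# With ordered levels, the integer codes are meaningful ranks (1 = Low, 3 = High)
print(as.integer(ordered_factor))
```

Setting the levels explicitly is optional for the analysis, but it controls the order of legend entries and axis categories in the plots.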

##3.1 Analyze the Impact of AI Adoption on Different Industries

Explore the relationship between AI adoption and industry by creating a stacked bar chart

ggplot(job_data, aes(x = Industry, fill = AI_Adoption_Level)) +
  geom_bar(position = "fill") +  # Stacked bar with proportional representation
  labs(
    title = "Proportional Relationship of AI Adoption Level by Industry",
    x = "Industry",
    y = "Proportion of Job Listings",
    fill = "AI Adoption Level"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_manual(values = c("High" = "red", "Medium" = "blue", "Low" = "green"))

This stacked bar chart shows the proportion of job listings in each industry by AI Adoption Level. Each bar represents one industry, and its colored segments give the share of listings classed as High (red), Medium (blue), and Low (green) AI adoption.

Healthcare, Technology, and Retail stand out with a higher proportion of listings in the High AI Adoption category (red). These industries are leading in AI integration, which indicates greater demand for AI-related skills and roles. Energy, Finance, and Manufacturing show a mix of Medium (blue) and Low (green) adoption levels. Transportation and Telecommunications have a higher proportion of listings in the Low AI Adoption category (green), indicating that AI adoption is less prevalent in these industries or that they are at an earlier stage of AI integration.
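
The proportions that `position = "fill"` draws can also be computed as numbers, which makes the comparison between industries easier to check. The sketch below is a minimal base R illustration on made-up rows (`toy` is an assumption, not the real data).

```r
# Toy data standing in for job_data (made-up rows)
toy <- data.frame(
  Industry = c("Finance", "Finance", "Finance", "Finance", "Retail", "Retail"),
  AI_Adoption_Level = c("High", "Low", "Low", "Medium", "High", "High")
)

# Cross-tabulate counts: one row per industry, one column per adoption level
counts <- table(toy$Industry, toy$AI_Adoption_Level)

# margin = 1 divides each row by its row total, so every industry sums to 1
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

Applied to the real data, `prop.table(table(job_data$Industry, job_data$AI_Adoption_Level), margin = 1)` would give the segment heights of the chart.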

##3.2 Explore the impact of AI integration on job roles across different industries

# Analyze high vs low demand roles
high_low_demand <- job_data %>%
  group_by(Industry, Job_Title) %>%
  summarise(
    count = n(),
    top_skills = list(Required_Skills),
    .groups = 'drop'
  ) %>%
  arrange(Industry, desc(count))

# Print top skills for each industry
industry_top_skills <- job_data %>%
  group_by(Industry) %>%
  count(Required_Skills) %>%
  arrange(Industry, desc(n)) %>%
  slice_head(n = 3)

print("Top 3 Skills by Industry:")
## [1] "Top 3 Skills by Industry:"
print(industry_top_skills)
## # A tibble: 30 × 3
## # Groups:   Industry [10]
##    Industry      Required_Skills        n
##    <fct>         <fct>              <int>
##  1 Education     Project Management    10
##  2 Education     Cybersecurity          9
##  3 Education     Data Analysis          9
##  4 Energy        UX/UI Design           9
##  5 Energy        Data Analysis          6
##  6 Energy        Machine Learning       6
##  7 Entertainment Cybersecurity          8
##  8 Entertainment Marketing              7
##  9 Entertainment JavaScript             6
## 10 Finance       Python                 9
## # ℹ 20 more rows

This table lists the three most frequently required skills in each industry, where n is the number of listings that require the skill. In Education, Project Management is the most sought-after skill (10 listings), followed by Cybersecurity and Data Analysis (9 each). In Energy, UX/UI Design leads with 9 listings, while Data Analysis and Machine Learning follow with 6 each. In Entertainment, Cybersecurity is the most common skill (8), and in Finance, Python ranks first (9). The table highlights industry-specific skill priorities: technical and analytical skills such as Data Analysis and Cybersecurity feature in Education and Entertainment, while UX/UI Design and Python lead in Energy and Finance, respectively.

# Summarize demand, average salary, and the three most common skills per role
role_demand <- job_data %>%
 group_by(Job_Title) %>%
 summarize(
   count = n(),
   avg_salary = mean(Salary_USD),
   common_skills = list(names(sort(table(Required_Skills), decreasing = TRUE)[1:3]))
 ) %>%
 arrange(desc(count)) %>%
 # Categorize roles as high/low demand based on median split
 mutate(demand_level = ifelse(count > median(count), "High Demand", "Low Demand"))

# Create visualization comparing high vs low demand roles
ggplot(role_demand, aes(x = reorder(Job_Title, count), y = count, fill = demand_level)) +
 geom_bar(stat = "identity") +
 coord_flip() +
 scale_fill_manual(values = c("High Demand" = "darkblue", "Low Demand" = "lightblue")) +
 labs(title = "Job Roles by Demand Level",
      subtitle = "With count of positions available",
      x = "Job Title",
      y = "Number of Positions",
      fill = "Demand Level") +
 theme_minimal() +
 theme(
   plot.title = element_text(size = 14, face = "bold"),
   axis.text.y = element_text(size = 10)
 )

This horizontal bar chart shows job roles by demand level, where a role is classed as High Demand if its number of listings is above the median across roles and Low Demand otherwise.

High Demand roles (dark blue) include positions such as Data Scientist, HR Manager, Cybersecurity Analyst, UX Designer, AI Researcher, and Sales Manager.

Low Demand roles (light blue) include positions such as Marketing Specialist, Operations Manager, Software Engineer, and Product Manager.
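
The median split used for `demand_level` can be seen on a small example. The role names and counts below are made up for illustration. Only the rule (`count > median(count)`) comes from the code above.

```r
# Made-up position counts for six hypothetical roles
count <- c(RoleA = 62, RoleB = 57, RoleC = 55, RoleD = 49, RoleE = 47, RoleF = 41)

# Same rule as in role_demand: strictly greater than the median is "High Demand"
cutoff <- median(count)
demand_level <- ifelse(count > cutoff, "High Demand", "Low Demand")

print(cutoff)
print(table(demand_level))
```

Because the comparison is strict, a role whose count equals the median is labelled Low Demand, and at most half of the roles can fall in the High Demand group.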

##3.3 Analyze AI Adoption and Salary Across Industries

# Summarize average salary by Industry and AI Adoption Level
salary_analysis <- job_data %>%
  group_by(Industry, AI_Adoption_Level) %>%
  summarise(Average_Salary = mean(Salary_USD, na.rm = TRUE), 
            Median_Salary = median(Salary_USD, na.rm = TRUE), 
            .groups = 'drop') %>%
  arrange(Industry, desc(Average_Salary))

# View summary table
print(salary_analysis)
## # A tibble: 30 × 4
##    Industry      AI_Adoption_Level Average_Salary Median_Salary
##    <fct>         <fct>                      <dbl>         <dbl>
##  1 Education     Low                       98748.        94787.
##  2 Education     High                      93822.        91830.
##  3 Education     Medium                    87453.        91464.
##  4 Energy        Medium                   102880.       102429.
##  5 Energy        Low                       92919.        89202.
##  6 Energy        High                      83115.        82017.
##  7 Entertainment High                      96553.       100703.
##  8 Entertainment Low                       95338.        86385.
##  9 Entertainment Medium                    91515.        93970.
## 10 Finance       Medium                   100481.       102594.
## # ℹ 20 more rows
# Visualize with bar chart
ggplot(salary_analysis, aes(x = AI_Adoption_Level, y = Average_Salary, fill = AI_Adoption_Level)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ Industry) +
  labs(title = "Average Salary by AI Adoption Level Across Industries",
       x = "AI Adoption Level",
       y = "Average Salary (USD)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The figure above shows the average salary of job listings in each industry by AI Adoption Level. Each bar gives the average salary (in USD) for one adoption level within an industry: red for High, blue for Medium, and green for Low AI adoption. In most industries the average salary is broadly similar across adoption levels. This suggests that a company's AI adoption level is not, on its own, a strong determinant of salary. Other factors, such as company size, could also play an important role in determining salary levels.

Explore Company size and Salary Across Industries

# Create the 'salary_summary' dataset by calculating average salary by company size and industry
# .groups = "drop" returns an ungrouped tibble and silences the regrouping message
salary_summary <- job_data %>%
  group_by(Industry, Company_Size) %>%
  summarise(Average_Salary = mean(Salary_USD, na.rm = TRUE), .groups = "drop")
# Create the bar plot with facet grid for different industries
ggplot(salary_summary, aes(x = Company_Size, y = Average_Salary, fill = Company_Size)) +
  geom_col() +  # Create a bar plot
  facet_grid(~Industry, scales = "free_x") +  # Facet by industry with independent x scales
  labs(title = "Average Salary by Company Size Across Industries",
       x = "Company Size",
       y = "Average Salary (USD)") +
  theme_minimal() +  # Apply a minimal theme
  scale_fill_manual(values = c("lightblue", "lightgreen", "lightcoral")) +  # Custom fill colors
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for readability

This figure shows the average salary by Company Size (Large, Medium, Small) within each industry. In the Entertainment, Energy, Education, and Telecommunications sectors, small companies offer the highest average salaries. In the Finance, Manufacturing, and Retail sectors, by contrast, the highest average salaries are found at large companies. In the Transportation sector, large and medium-sized companies pay more on average than small companies. The Technology sector shows a more balanced picture across company sizes, with small companies offering relatively higher salaries.

Correlation Between AI Adoption and Salary by Industry

# Assign numeric values to AI Adoption Levels
job_data$AI_Adoption_Numeric <- as.numeric(factor(job_data$AI_Adoption_Level, levels = c("Low", "Medium", "High")))

# Correlation between AI Adoption and Salary within each industry
correlation_analysis <- job_data %>%
  group_by(Industry) %>%
  summarise(Correlation = cor(AI_Adoption_Numeric, Salary_USD, use = "complete.obs"), 
            .groups = 'drop') %>%
  arrange(desc(Correlation))

# View correlation results
print(correlation_analysis)
## # A tibble: 10 × 2
##    Industry           Correlation
##    <fct>                    <dbl>
##  1 Entertainment           0.0153
##  2 Healthcare             -0.0356
##  3 Retail                 -0.0397
##  4 Finance                -0.0467
##  5 Telecommunications     -0.0673
##  6 Education              -0.136 
##  7 Manufacturing          -0.172 
##  8 Energy                 -0.178 
##  9 Transportation         -0.224 
## 10 Technology             -0.268
# Visualize correlations
ggplot(correlation_analysis, aes(x = reorder(Industry, Correlation), y = Correlation, fill = Correlation)) +
  geom_bar(stat = "identity") +
  labs(title = "Correlation Between AI Adoption and Salary by Industry",
       x = "Industry",
       y = "Correlation Coefficient") +
  coord_flip() +
  theme_minimal()

The correlations between AI adoption and salary are weak in every industry and negative in nine of the ten, ranging from 0.015 (Entertainment) to -0.268 (Technology). The Technology sector shows the strongest negative correlation (-0.27), followed by Transportation (-0.22) and Energy (-0.18), while Entertainment is the only industry with a positive, though near-zero, value. This unexpected pattern suggests that higher AI adoption levels are associated with slightly lower salaries in most industries, although coefficients this small indicate a weak relationship whose strength varies by industry.
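
Since AI adoption is an ordinal variable coded as 1, 2, 3, a rank-based correlation is a natural cross-check on the Pearson coefficients above. The sketch below is an illustration on simulated data, not a result from the dataset: `adoption_rank` and `salary` are made-up vectors, and the choice of Spearman's method is an assumption about what a useful robustness check might look like.

```r
# Simulated data: adoption rank (1 = Low, 2 = Medium, 3 = High) and salary
set.seed(42)
adoption_rank <- sample(1:3, size = 50, replace = TRUE)
salary <- rnorm(50, mean = 90000, sd = 15000)

# Pearson treats the 1/2/3 coding as equally spaced numbers
pearson <- cor(adoption_rank, salary, method = "pearson")

# Spearman uses ranks only, which suits an ordinal variable
spearman <- cor(adoption_rank, salary, method = "spearman")

# cor.test() adds a p-value; exact = FALSE because tied ranks rule out the exact test
test <- cor.test(adoption_rank, salary, method = "spearman", exact = FALSE)

print(c(pearson = pearson, spearman = spearman, p_value = test$p.value))
```

In the grouped pipeline, the same idea would amount to passing `method = "spearman"` to `cor()` inside `summarise()`.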

##3.4 Analyze AI Adoption and Company Size Across Industries

ggplot(job_data, aes(x = Industry, fill = Company_Size)) + 
  geom_bar(position = "dodge") +  # Position 'dodge' places bars next to each other
  facet_wrap(~ AI_Adoption_Level, scales = "free_x") +  # Facet by AI Adoption Level
  labs(title = "AI Adoption and Company Size Across Industries",
       x = "Industry",
       y = "Count of Job Listings") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

This figure shows the number of job listings per industry, split by company size, with one panel for each AI adoption level (High, Low, Medium). In the High AI Adoption panel, the distribution varies across industries. The Technology sector is represented across all company sizes, while the Finance sector has a strong concentration of large companies. Manufacturing shows a consistent presence across company sizes. Education and Healthcare show a mixed distribution across company sizes. In the Transportation sector, the mix of company sizes differs from one adoption level to the next.

##3.5 Summarize Job Growth Projection by AI Adoption Level and Industry

# Count the distribution of Job Growth Projection by Industry and AI Adoption Level
job_growth_summary <- job_data %>%
  group_by(Industry, AI_Adoption_Level, Job_Growth_Projection) %>%
  summarise(Count = n(), .groups = 'drop')

# View the summary
print(job_growth_summary)
## # A tibble: 89 × 4
##    Industry  AI_Adoption_Level Job_Growth_Projection Count
##    <fct>     <fct>             <fct>                 <int>
##  1 Education High              Decline                   5
##  2 Education High              Growth                    8
##  3 Education High              Stable                    3
##  4 Education Low               Decline                   4
##  5 Education Low               Growth                   10
##  6 Education Low               Stable                    9
##  7 Education Medium            Decline                   7
##  8 Education Medium            Growth                    6
##  9 Education Medium            Stable                    5
## 10 Energy    High              Decline                   4
## # ℹ 79 more rows

Visualize Job Growth Projection by AI Adoption Level Using Bar Charts

# Plot Job Growth Projection Distribution
ggplot(job_growth_summary, aes(x = AI_Adoption_Level, y = Count, fill = Job_Growth_Projection)) +
  geom_bar(stat = "identity", position = "fill") +  # Position "fill" for proportional comparison
  facet_wrap(~ Industry, scales = "free_y") +
  labs(title = "Distribution of Job Growth Projection by AI Adoption Level Across Industries",
       x = "AI Adoption Level",
       y = "Proportion of Job Growth Projection",
       fill = "Job Growth Projection") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

This figure shows the relationship between AI Adoption Levels (High, Low, Medium) and Job Growth Projections (Growth, Stable, Decline) across industries, with each bar scaled to proportions. High AI Adoption industries, such as Finance, Technology, and Healthcare, show a higher proportion of jobs with growth or stable projections. Low AI Adoption industries, including Education and Manufacturing, have a greater share of jobs projected to decline. Medium AI Adoption industries, like Retail and Telecommunications, present a balanced distribution across all job growth categories. This suggests that higher AI adoption is generally associated with better job growth prospects.
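
The proportions in a single panel can be reproduced by hand. The sketch below uses the nine Education rows printed in `job_growth_summary` above and computes, in base R, the share of each growth projection within each adoption level. The object names `education`, `counts`, and `shares` are introduced here for illustration.

```r
# Education counts copied from the first nine rows of job_growth_summary above
education <- data.frame(
  AI_Adoption_Level     = rep(c("High", "Low", "Medium"), each = 3),
  Job_Growth_Projection = rep(c("Decline", "Growth", "Stable"), times = 3),
  Count                 = c(5, 8, 3, 4, 10, 9, 7, 6, 5)
)

# Reshape to a table of counts: rows = adoption level, columns = projection
counts <- xtabs(Count ~ AI_Adoption_Level + Job_Growth_Projection, data = education)

# Row-wise shares, which is what position = "fill" draws in the Education panel
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

For Education, the Decline share is 4 of 23 listings (about 17%) at Low adoption, 5 of 16 (about 31%) at High, and 7 of 18 (about 39%) at Medium, so the overall pattern described above does not hold uniformly within every industry.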