Question 1a: Number of Cases (Rows)

case_count <- nrow(data)
case_count
## [1] 500

Question 1b: Number of Variables (Columns)

variable_count <- ncol(data)
variable_count
## [1] 10

Question 1c: First 10 Instances

# Needs %>% (dplyr or magrittr), kable() from knitr, and kable_styling() from kableExtra
head(data, 10) %>% kable() %>% kable_styling()
| Job_Title | Industry | Company_Size | Location | AI_Adoption_Level | Automation_Risk | Required_Skills | Salary_USD | Remote_Friendly | Job_Growth_Projection |
|---|---|---|---|---|---|---|---|---|---|
| Cybersecurity Analyst | Entertainment | Small | Dubai | Medium | High | UX/UI Design | 111392.17 | Yes | Growth |
| Marketing Specialist | Technology | Large | Singapore | Medium | High | Marketing | 93792.56 | No | Decline |
| AI Researcher | Technology | Large | Singapore | Medium | High | UX/UI Design | 107170.26 | Yes | Growth |
| Sales Manager | Retail | Small | Berlin | Low | High | Project Management | 93027.95 | No | Growth |
| Cybersecurity Analyst | Entertainment | Small | Tokyo | Low | Low | JavaScript | 87752.92 | Yes | Decline |
| UX Designer | Education | Large | San Francisco | Medium | Medium | Cybersecurity | 102825.01 | No | Growth |
| HR Manager | Finance | Medium | Singapore | Low | High | Sales | 102065.72 | Yes | Growth |
| Cybersecurity Analyst | Technology | Small | Dubai | Medium | Low | Machine Learning | 86607.32 | Yes | Decline |
| AI Researcher | Retail | Large | London | High | Low | JavaScript | 75015.86 | No | Stable |
| Sales Manager | Entertainment | Medium | Singapore | High | Low | Cybersecurity | 96834.58 | Yes | Decline |

Question 1d: Missing Values Analysis

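# Count missing values in each column
missing_summary <- sapply(data, function(x) sum(is.na(x)))
# Keep only the columns with at least one missing value;
# an empty result means no column has any
missing_summary <- missing_summary[missing_summary > 0]
missing_summary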
## named integer(0)
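
The empty `named integer(0)` result above is what the filter returns when no column contains an `NA`. As a point of comparison, here is a minimal sketch of the same check on a data frame that does have gaps. The `toy` data frame and its values are hypothetical, invented only so the example runs; `colSums(is.na(...))` is used as an equivalent shortcut for the `sapply()` call, and it returns a named numeric vector rather than an integer one.

```r
# Hypothetical toy data with a few deliberate NAs (not the real dataset)
toy <- data.frame(
  Salary_USD = c(90000, NA, 75000, 82000),
  Industry = c("Retail", "Finance", NA, NA),
  Remote_Friendly = c("Yes", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Count NAs per column, then keep only the columns that have any
na_counts <- colSums(is.na(toy))
na_counts <- na_counts[na_counts > 0]
print(na_counts)

# Number of rows with no missing value in any column
complete_rows <- sum(complete.cases(toy))
print(complete_rows)
```

Here the filtered vector is non-empty and names only the columns with gaps, whereas the real dataset produces an empty vector.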

Part 2: Research Questions

Part 2a: Research Questions and Hypotheses

Question 1: Is a company's AI_Adoption_Level associated with the Salary_USD of jobs within that company?

Hypothesis 1: Higher AI_Adoption_Level is associated with higher Salary_USD.

Question 2: Can the Automation_Risk level of a job predict its Job_Growth_Projection?

Hypothesis 2: Jobs with higher Automation_Risk are more likely to have a “Decline” in Job_Growth_Projection.

Part 2b: Relevant Variables for Each Research Question

Question 1: Relevant Variables - AI_Adoption_Level, Salary_USD

Question 2: Relevant Variables - Automation_Risk, Job_Growth_Projection

Part 2c: Identify Response Variables

Question 1: Response Variable: Salary_USD

Question 2: Response Variable: Job_Growth_Projection

Part 2d: Missing Values for Response Variables

missing_salary <- sum(is.na(data$Salary_USD))
missing_growth <- sum(is.na(data$Job_Growth_Projection))

missing_values_responses <- list(Salary_USD = missing_salary, Job_Growth_Projection = missing_growth)
missing_values_responses
## $Salary_USD
## [1] 0
## 
## $Job_Growth_Projection
## [1] 0

Part 2e: Distribution of Response Variables

Salary Distribution

ggplot(data, aes(x = Salary_USD)) +
  geom_histogram(binwidth = 5000) +
  labs(title = "Salary Distribution", x = "Salary (USD)", y = "Count")

Job Growth Projection Distribution

ggplot(data, aes(x = Job_Growth_Projection)) +
  geom_bar() +
  labs(title = "Job Growth Projection Distribution", x = "Job Growth Projection", y = "Count")
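
The bar chart shows counts per category; the same distribution can also be tabulated as counts and shares. The sketch below assumes nothing beyond base R. The `growth` vector is hypothetical, invented only so the example runs; on the real data one would pass `data$Job_Growth_Projection` instead.

```r
# Hypothetical toy values (not the real dataset)
growth <- c("Growth", "Decline", "Growth", "Stable",
            "Growth", "Decline", "Stable", "Growth")

# Frequency of each category
counts <- table(growth)
print(counts)

# Share of each category (sums to 1)
shares <- prop.table(counts)
print(round(shares, 2))
```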

Part 2f: Relationship Between Response and Explanatory Variables

Relationship between Salary and AI Adoption Level

ggplot(data, aes(x = AI_Adoption_Level, y = Salary_USD, fill = AI_Adoption_Level)) +
  geom_boxplot() +
  labs(title = "Salary by AI Adoption Level", x = "AI Adoption Level", y = "Salary (USD)") +
  theme_minimal()
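
A boxplot gives a visual impression of Hypothesis 1; one possible way to check it formally is a Kruskal-Wallis rank-sum test, which asks whether salary differs across the adoption levels. This is only a sketch of one option, not the method used in this report. The `toy` data frame is hypothetical and invented so the example runs; on the real data the same calls would take `data`. Note that the test is not directional: it detects any difference between groups, so the group medians are printed alongside to see which way the difference runs.

```r
# Hypothetical toy data (not the real dataset)
toy <- data.frame(
  AI_Adoption_Level = factor(rep(c("Low", "Medium", "High"), each = 5),
                             levels = c("Low", "Medium", "High")),
  Salary_USD = c(70000, 72000, 75000, 78000, 80000,
                 82000, 85000, 88000, 90000, 93000,
                 95000, 98000, 101000, 104000, 110000)
)

# Median salary within each adoption level
group_medians <- tapply(toy$Salary_USD, toy$AI_Adoption_Level, median)
print(group_medians)

# Kruskal-Wallis test: do the salary distributions differ across levels?
kw <- kruskal.test(Salary_USD ~ AI_Adoption_Level, data = toy)
print(kw)
```

A rank-based test is chosen here because it does not assume normally distributed salaries; a one-way ANOVA via `aov()` would be an alternative if that assumption looks reasonable in the histogram.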

Relationship between Salary and Company Size

ggplot(data, aes(x = Company_Size, y = Salary_USD, fill = Company_Size)) +
  geom_boxplot() +
  labs(title = "Salary by Company Size", x = "Company Size", y = "Salary (USD)") +
  theme_minimal() 

Relationship between Job Growth Projection and Automation Risk

ggplot(data, aes(x = Automation_Risk, fill = Job_Growth_Projection)) +
  geom_bar(position = "dodge") +
  labs(title = "Job Growth Projection by Automation Risk", x = "Automation Risk", y = "Count")
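
Because both variables in Hypothesis 2 are categorical, one possible formal check is a chi-squared test of independence on their contingency table. The sketch below is an illustration under stated assumptions, not the analysis performed in this report. The `toy` data frame is hypothetical, with counts invented so the example runs; on the real data the table would be built from `data$Automation_Risk` and `data$Job_Growth_Projection`.

```r
# Hypothetical toy data: 30 jobs per risk level (not the real dataset)
toy <- data.frame(
  Automation_Risk = rep(c("Low", "Medium", "High"), times = c(30, 30, 30)),
  Job_Growth_Projection = c(
    rep(c("Growth", "Stable", "Decline"), times = c(15, 10, 5)),   # Low risk
    rep(c("Growth", "Stable", "Decline"), times = c(10, 10, 10)),  # Medium risk
    rep(c("Growth", "Stable", "Decline"), times = c(5, 10, 15))    # High risk
  )
)

# Contingency table: rows are risk levels, columns are projections
tab <- table(toy$Automation_Risk, toy$Job_Growth_Projection)
print(tab)

# Share of each projection within a risk level (each row sums to 1)
row_props <- prop.table(tab, margin = 1)
print(round(row_props, 2))

# Chi-squared test of independence between the two variables
chi <- chisq.test(tab)
print(chi)
```

The row proportions show the direction of any association (for example, whether "Decline" is more common among high-risk jobs), which the test statistic alone does not reveal.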

Relationship between Job Growth Projection and Industry

ggplot(data, aes(x = Industry, fill = Job_Growth_Projection)) +
  geom_bar(position = "dodge") +
  labs(title = "Job Growth Projection by Industry", x = "Industry", y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))