Project 2

Author

Ashley Ramirez

Analyst positions employment predictions 2023-2033

My data set comes from the occupational handbook of the U.S. Bureau of Labor Statistics (BLS). It shows several job-related measures for 2023 and 2033, including the employment change, the percent of the industry that each occupation makes up, and the total employment in each position. I have also used some information from the Occupational Employment and Wage Statistics, from the same source, to see how the 2023-2033 projections could affect the state of Maryland. Both data sets are filtered by the category “Analyst”, which gives the details for each position, and are later merged for analysis. I chose this topic because I thought it would be interesting to see how different types of jobs, analysts in this case, will change over the next 10 years. This is meaningful to me because I hope to use my analytical skills in my future career or to work as an analyst someday, and the projections show me whether that is a good idea and how competitive the field may become.

Load libraries

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

setwd("/Users/ashleyramirez/Desktop/data110")

Load Dataset

national_employement <- read_csv("National Employment Matrix_IND_TE1100.csv")

Rows: 882 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Occupation Title, Occupation Code, Occupation Type
dbl (8): 2023 Percent of Industry, 2023 Percent of Occupation, Projected 203...
num (2): 2023 Employment, Projected 2033 Employment

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Cleaning and filtering dataset

cleaned_data_employement <- na.omit(national_employement)

Final_employement_data <- cleaned_data_employement %>%
  select(-`Display Level` , -`Occupation Code`, -`Occupation Sort`)

Analyst_Data_National <- Final_employement_data %>%
  filter(str_detect(`Occupation Title`, regex("analyst", ignore_case = TRUE)))
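To illustrate what the case-insensitive filter above keeps, here is a minimal sketch on a made-up table. The titles and the `toy_titles` name are hypothetical, not rows from the BLS file. It also shows an optional stricter pattern with word boundaries, which is an assumption about what one might want and is not used in this analysis.

```r
library(dplyr)
library(stringr)

# Hypothetical titles, for illustration only
toy_titles <- tibble(
  `Occupation Title` = c("Budget analysts", "Financial Analysts",
                         "Psychoanalyst assistants", "Registered nurses")
)

# Same pattern as above: matches "analyst" anywhere, in any letter case
loose <- toy_titles %>%
  filter(str_detect(`Occupation Title`, regex("analyst", ignore_case = TRUE)))

# Optional stricter pattern: "analyst" or "analysts" as a whole word
strict <- toy_titles %>%
  filter(str_detect(`Occupation Title`, regex("\\banalysts?\\b", ignore_case = TRUE)))

nrow(loose)   # 3
nrow(strict)  # 2
```

The loose pattern keeps any title that contains the letters "analyst", so it is worth scanning the filtered titles once to confirm they are all jobs of interest.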

Adding another data set with more details

Full_employement_details <- read_csv("state_M2023_dl.csv")

Rows: 37676 Columns: 32
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (26): AREA, AREA_TITLE, PRIM_STATE, NAICS, NAICS_TITLE, I_GROUP, OCC_COD...
dbl  (2): AREA_TYPE, OWN_CODE
lgl  (4): PCT_TOTAL, PCT_RPT, ANNUAL, HOURLY

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
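The column specification above shows most columns in this file being read as text (`chr`), and the merged preview further down lists `TOT_EMP`, `A_MEAN`, and `JOBS_1000` as `<chr>`. If numeric versions are needed, one possible approach is `readr::parse_number()`. The sketch below uses made-up values, and the `"*"`, `"**"`, and `"#"` placeholders are an assumption about how missing or suppressed values might be marked, so the file's own documentation should be checked first.

```r
library(dplyr)
library(readr)

# Made-up rows, for illustration only
toy_oews <- tibble(
  TOT_EMP   = c("12,340", "560", "**"),
  JOBS_1000 = c("4.512", "0.205", "**"),
  A_MEAN    = c("98,760", "#", "*")
)

# Treat the assumed placeholder symbols as NA, then parse the rest as numbers
toy_numeric <- toy_oews %>%
  mutate(across(c(TOT_EMP, JOBS_1000, A_MEAN),
                ~ parse_number(.x, na = c("", "NA", "*", "**", "#"))))

toy_numeric
```

With numeric columns, functions such as `format(..., big.mark = ",")` and `lm()` operate on actual numbers instead of text.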

Clean and filter data set

Analyst_Jobs<- Full_employement_details %>%
  filter(str_detect(`OCC_TITLE`, regex("analyst", ignore_case = TRUE)))

MD_Analyst <- Analyst_Jobs %>%
  filter(AREA_TITLE == "Maryland")

Merge both data sets with the information wanted

# Clean up the 'OCC_TITLE' and 'Occupation Title' columns
MD_Analyst <- MD_Analyst %>%
  mutate(OCC_TITLE = str_trim(tolower(OCC_TITLE)),
         OCC_TITLE = str_replace_all(OCC_TITLE, "[^[:alnum:] ]", ""),
         OCC_TITLE = case_when(
           OCC_TITLE == "software quality assurance analysts and testers" ~ "software quality assurance analyst",
           TRUE ~ OCC_TITLE
         ))

Analyst_Data_National_1 <- Analyst_Data_National %>%
  mutate(`Occupation Title` = str_trim(tolower(`Occupation Title`)),
         `Occupation Title` = str_replace_all(`Occupation Title`, "[^[:alnum:] ]", ""),
         `Occupation Title` = case_when(
           `Occupation Title` == "software quality assurance analysts and testers" ~ "software quality assurance analyst",
           TRUE ~ `Occupation Title`
         ))

# Context for this code: the occupation "software quality assurance analysts and testers" was not in one of the data sets, so I recoded its title in both so that the data sets could merge successfully

# Merge the two datasets
merged_data <- MD_Analyst %>%
  inner_join(Analyst_Data_National_1, by = c("OCC_TITLE" = "Occupation Title"))

head(merged_data)

# A tibble: 6 × 41
  AREA  AREA_TITLE AREA_TYPE PRIM_STATE NAICS  NAICS_TITLE    I_GROUP   OWN_CODE
  <chr> <chr>          <dbl> <chr>      <chr>  <chr>          <chr>        <dbl>
1 24    Maryland           2 MD         000000 Cross-industry cross-in…     1235
2 24    Maryland           2 MD         000000 Cross-industry cross-in…     1235
3 24    Maryland           2 MD         000000 Cross-industry cross-in…     1235
4 24    Maryland           2 MD         000000 Cross-industry cross-in…     1235
5 24    Maryland           2 MD         000000 Cross-industry cross-in…     1235
6 24    Maryland           2 MD         000000 Cross-industry cross-in…     1235
# ℹ 33 more variables: OCC_CODE <chr>, OCC_TITLE <chr>, O_GROUP <chr>,
#   TOT_EMP <chr>, EMP_PRSE <chr>, JOBS_1000 <chr>, LOC_QUOTIENT <chr>,
#   PCT_TOTAL <lgl>, PCT_RPT <lgl>, H_MEAN <chr>, A_MEAN <chr>,
#   MEAN_PRSE <chr>, H_PCT10 <chr>, H_PCT25 <chr>, H_MEDIAN <chr>,
#   H_PCT75 <chr>, H_PCT90 <chr>, A_PCT10 <chr>, A_PCT25 <chr>, A_MEDIAN <chr>,
#   A_PCT75 <chr>, A_PCT90 <chr>, ANNUAL <lgl>, HOURLY <lgl>,
#   `Occupation Type` <chr>, `2023 Employment` <dbl>, …
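Because `inner_join()` silently drops rows that have no match, it can help to list the titles that fail to match before merging. A minimal sketch with made-up tables follows. `toy_state`, `toy_national`, and `normalize_title()` are hypothetical names, and the helper only mirrors the lower-casing, trimming, and punctuation stripping used above.

```r
library(dplyr)
library(stringr)

# Hypothetical helper mirroring the cleaning steps above
normalize_title <- function(x) {
  x %>% tolower() %>% str_trim() %>% str_replace_all("[^[:alnum:] ]", "")
}

# Made-up tables, for illustration only
toy_state <- tibble(
  OCC_TITLE = c("Budget Analysts", "Credit Analysts", "News Analysts, Reporters")
)
toy_national <- tibble(
  `Occupation Title` = c("budget analysts ", "Credit analysts")
)

toy_state <- toy_state %>%
  mutate(OCC_TITLE = normalize_title(OCC_TITLE))
toy_national <- toy_national %>%
  mutate(`Occupation Title` = normalize_title(`Occupation Title`))

# Rows in the state table with no partner in the national table
unmatched <- toy_state %>%
  anti_join(toy_national, by = c("OCC_TITLE" = "Occupation Title"))

unmatched$OCC_TITLE  # "news analysts reporters"
```

Running the same `anti_join()` in both directions on the real tables would show which occupations are lost in the merge.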

Background Research

The U.S. economy is projected to add 6.7 million jobs from 2023 to 2033, the U.S. Bureau of Labor Statistics (BLS) reported. Total employment is projected to increase to 174.6 million and grow 0.4 percent annually, which is slower than the 1.3 percent annual growth recorded over the 2013−23 decade. Technological advancements may also lead to increased productivity for some occupations. The growth of e-commerce as well as advances in technology are expected to limit demand for sales workers, leading to employment declines. Similarly, automated systems and related technology, including AI, are expected to contribute to declines in employment of office and administrative support workers. Computer and mathematical occupations are projected to grow the second fastest of any occupational group, at 12.9 percent. The growth of computer and mathematical occupations is expected to stem from demand for upgraded computer services, continued development of artificial intelligence (AI) solutions, and an increasing amount of data available for analysis. In addition, the number and severity of cyberattacks and data breaches on U.S. businesses is expected to lead to greater demand for information security analysts.

Source link : https://www.bls.gov/news.release/pdf/ecopro.pdf

Linear regression analysis for the state of Maryland

#linear regression model
linear_reg <- lm(JOBS_1000 ~ `Projected 2033 Employment` + `2023 Percent of Industry`, data = merged_data)

#Extracting any coefficients 
coefficients <- summary(linear_reg)$coefficients
intercept <- coefficients[1, 1]
slope1 <- coefficients[2, 1]
slope2 <- coefficients[3, 1]

# Building the equation 
equation <- paste("JOBS_1000 =", round(intercept, 2), "+", round(slope1, 2), "* Projected 2033 Employment +", round(slope2, 2), "* 2023 Percent of Industry")
print(equation)

[1] "JOBS_1000 = 1.53 + 0.55 * Projected 2033 Employment + -51.54 * 2023 Percent of Industry"

# Show a summary of what was found
summary(linear_reg)


Call:
lm(formula = JOBS_1000 ~ `Projected 2033 Employment` + `2023 Percent of Industry`, 
    data = merged_data)

Residuals:
        1         2         3         4         5         6         7 
-0.003163 -0.159818 -0.731715 -1.503464  1.564193  0.367799  0.466168 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)  
(Intercept)                   1.5337     0.5652   2.713   0.0533 .
`Projected 2033 Employment`   0.5503     0.1820   3.024   0.0390 *
`2023 Percent of Industry`  -51.5360    18.3797  -2.804   0.0486 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.185 on 4 degrees of freedom
Multiple R-squared:  0.8778,    Adjusted R-squared:  0.8167 
F-statistic: 14.36 on 2 and 4 DF,  p-value: 0.01494

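What this means is that, in this model, both Projected 2033 Employment and 2023 Percent of Industry are statistically significant predictors of employment per 1,000 jobs for analyst occupations (p = 0.039 and p = 0.049), and together they explain about 88% of the variation (R-squared = 0.8778). The coefficient for Projected 2033 Employment says that for each one-unit increase in projected employment, employment per 1,000 jobs increases by 0.55, holding percent of industry constant. The coefficient for 2023 Percent of Industry indicates that each one-unit increase in percent of industry is associated with a decrease of 51.54 in employment per 1,000 jobs. Overall, as an occupation's percent of the industry grows, employment per 1,000 jobs decreases, but if the projection is correct and employment increases, jobs per 1,000 will increase with it. Because the model is fit on only 7 occupations, these results should be read with caution.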
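As a small worked example of how the fitted equation turns inputs into a prediction, the sketch below plugs numbers into it by hand. The coefficients are the rounded estimates from the summary output above. The two input values are hypothetical and chosen only to show the arithmetic, and the variable names are made up for this illustration.

```r
# Rounded coefficients copied from the summary output above
intercept  <- 1.5337
b_proj2033 <- 0.5503
b_pct_ind  <- -51.5360

# Hypothetical inputs, chosen only to show the arithmetic
proj_2033_employment <- 10
pct_of_industry_2023 <- 0.05

predicted_jobs_1000 <- intercept +
  b_proj2033 * proj_2033_employment +
  b_pct_ind * pct_of_industry_2023

round(predicted_jobs_1000, 2)  # 4.46
```

The large negative coefficient on percent of industry looks dramatic, but its effect depends on the scale of that variable: a change of 0.05 in the input moves the prediction by about 2.58.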

Plot 1

# Create a new column that labels the change between 2023 and projected 2033 employment
merged_data$Employment_Change <- ifelse(merged_data$`Projected 2033 Employment` > merged_data$`2023 Employment`, "Increase",
                                        ifelse(merged_data$`Projected 2033 Employment` < merged_data$`2023 Employment`, "Decrease", "Same"))

# Dot Plot 
ggplot(merged_data, aes(x = OCC_TITLE)) +

  geom_point(aes(y = `2023 Employment`, color = "2023 Employment"), size = 3, alpha = 0.5) +
  
  geom_point(aes(y = `Projected 2033 Employment`, color = Employment_Change), size = 3, position = position_jitter(width = 0.2, height = 0)) +
  
  scale_color_manual(values = c("Increase" = "green", "Decrease" = "red", "Same" = "orange", "2023 Employment" = "grey")) +
  labs(title = "2023 and Projected 2033 Employment by Job Title",
       x = "Occupation Title",
       y = "Employment",
       color = "Legend") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
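The three-way label built with nested `ifelse()` calls for this plot could also be written with `dplyr::case_when()`, which some readers find easier to scan. This is a sketch on made-up numbers (`toy_emp` is a hypothetical table), offered as an alternative style and not as the method used above.

```r
library(dplyr)

# Made-up values, for illustration only
toy_emp <- tibble(
  `2023 Employment`           = c(10, 20, 30),
  `Projected 2033 Employment` = c(12, 20, 25)
)

toy_emp <- toy_emp %>%
  mutate(Employment_Change = case_when(
    `Projected 2033 Employment` > `2023 Employment` ~ "Increase",
    `Projected 2033 Employment` < `2023 Employment` ~ "Decrease",
    TRUE ~ "Same"
  ))

toy_emp$Employment_Change  # "Increase" "Same" "Decrease"
```

One difference to keep in mind: if either value is missing, the nested `ifelse()` version returns `NA`, while this `case_when()` version falls through to "Same".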

Plot 2

library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

Create the data that I want to show in the plot

merged_data <- merged_data %>%
  mutate(
    hover_text_2023 = paste(
      "2023 Employment: ", format(`2023 Employment`, big.mark = ","),
      "<br>Annual Mean Wage MD : $", format(`A_MEAN`, big.mark = ",", nsmall = 2),
      "<br>Industry Percentage: ", round(`2023 Percent of Occupation`, 2), "%",
      "<br>Total Employment MD : ", format(`TOT_EMP`, big.mark = ",", nsmall = 2),
      "<br>Employment per 1000 jobs : ", format(`JOBS_1000`, big.mark = ",")
    ),
    ),
    
    hover_text_2033 = paste(
      "Projected 2033 Employment: ", format(`Projected 2033 Employment`, big.mark = ","),
      # round() to 2 decimals; format(x, 2) passed 2 as the `trim` argument
      "<br>Percent Change: ", round(`Employment Change, 2023-2033`, 2), "%",
      "<br>Industry Percentage: ", round(`Projected 2033 Percent of Occupation`, 2), "%",
      "<br>Employment Change: ", Employment_Change
      
    )
  )
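One detail worth knowing about the hover text: `paste()` inserts a space between its arguments by default, so labels built this way can show gaps such as `$ 95,000` or `2.5 %`. Here is a minimal sketch with made-up numbers, using `paste0()` to control the spacing explicitly. The variable names and values are hypothetical.

```r
# Made-up values, for illustration only
wage <- 95000
pct  <- 2.5

# paste() adds a space between every argument
with_paste <- paste("Annual Mean Wage: $", format(wage, big.mark = ","),
                    "<br>Share: ", pct, "%")

# paste0() joins the pieces exactly as written
with_paste0 <- paste0("Annual Mean Wage: $", format(wage, big.mark = ","),
                      "<br>Share: ", pct, "%")

with_paste0  # "Annual Mean Wage: $95,000<br>Share: 2.5%"
```

Note that `format(..., big.mark = ",")` only adds thousands separators to numeric input, so text columns would need converting first.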

# Create the bar plot
plot_ly(data = merged_data) %>%
  add_trace(
    x = ~OCC_TITLE,
    y = ~`2023 Employment`,
    type = 'bar',
    name = '2023 Employment',
    text = ~hover_text_2023,
    hoverinfo = 'text'
  ) %>%
  add_trace(
    x = ~OCC_TITLE,
    y = ~`Projected 2033 Employment`,
    type = 'bar',
    name = 'Projected 2033 Employment',
    marker = list(color = '#FF7F50'),
    text = ~hover_text_2033,
    hoverinfo = 'text'
  ) %>%
  layout(
    title = "2023 vs Projected 2033 Employment by Occupation Title",
    barmode = 'group',
    xaxis = list(title = "Occupation Title", tickangle = 45),
    yaxis = list(title = "Employment"),
    showlegend = TRUE,
    legend = list(x = 1, y = 1),
    annotations = list(
      list(
        text = "Data Source: U.S. Bureau of Labor Statistics",
        x = 0.5,
        xanchor = "center",
        y = -0.1,
        showarrow = FALSE,
        font = list(size = 10)
      )
    )
  )

Visualization Findings

What I could see from this is that the employment level for budget analysts will stay the same, which could be either because employment is competitive or because technology will continue to improve and they won't be as needed. The positions that will keep increasing are related to management and marketing, which probably means that individuals in those positions will also need more education. However, I am curious about how wages will change as employment increases, and how a decrease in total employment would differ in Maryland.