My data set comes from the occupational handbook of the U.S. Bureau of Labor Statistics (BLS). It covers the years 2023 to 2033 and reports several job-related factors, including the employment change, the percent that each occupation makes up of its industry, and the total employment in each position. I also used data from the Occupational Employment and Wage Statistics, from the same source, to see how the 2023-2033 projections could affect the state of Maryland. Both data sets are filtered to the category “Analyst”, which provides the details for each position, and are later merged for analysis. I chose this topic because I thought it would be interesting to see how different types of jobs, analysts in this case, will change over the next 10 years. This is meaningful to me because I hope to use my analyst skills in my future career or to work as an analyst someday, and the projections show me whether that is a good idea and how competitive the field may become.
Load libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 882 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Occupation Title, Occupation Code, Occupation Type
dbl (8): 2023 Percent of Industry, 2023 Percent of Occupation, Projected 203...
num (2): 2023 Employment, Projected 2033 Employment
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 37676 Columns: 32
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (26): AREA, AREA_TITLE, PRIM_STATE, NAICS, NAICS_TITLE, I_GROUP, OCC_COD...
dbl (2): AREA_TYPE, OWN_CODE
lgl (4): PCT_TOTAL, PCT_RPT, ANNUAL, HOURLY
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Clean up the 'OCC_TITLE' and 'Occupation Title' columns so the titles match
MD_Analyst <- MD_Analyst %>%
  mutate(
    OCC_TITLE = str_trim(tolower(OCC_TITLE)),
    OCC_TITLE = str_replace_all(OCC_TITLE, "[^[:alnum:] ]", ""),
    OCC_TITLE = case_when(
      OCC_TITLE == "software quality assurance analysts and testers" ~ "software quality assurance analyst",
      TRUE ~ OCC_TITLE
    )
  )

Analyst_Data_National_1 <- Analyst_Data_National %>%
  mutate(
    `Occupation Title` = str_trim(tolower(`Occupation Title`)),
    `Occupation Title` = str_replace_all(`Occupation Title`, "[^[:alnum:] ]", ""),
    `Occupation Title` = case_when(
      `Occupation Title` == "software quality assurance analysts and testers" ~ "software quality assurance analyst",
      TRUE ~ `Occupation Title`
    )
  )

# Context: the occupation "software quality assurance analysts and testers" did not
# line up between the two data sets, so the code recodes it to one shorter title in
# both of them in order for the data sets to merge successfully.
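To see what the cleaning steps above do to a single title, here is a minimal sketch that reuses the same stringr calls. The helper name `normalize_title` and the sample string are made up for illustration and are not part of the original analysis.

```r
library(stringr)

# Hypothetical helper wrapping the same cleaning steps used above
normalize_title <- function(x) {
  x <- str_trim(tolower(x))                # lowercase, then drop leading/trailing spaces
  str_replace_all(x, "[^[:alnum:] ]", "")  # remove anything that is not a letter, digit, or space
}

# Made-up sample title with stray spaces and punctuation
cleaned <- normalize_title("  Budget Analysts* ")
print(cleaned)
```

Applying one shared helper to both title columns would keep the two cleaning steps identical, which matters because the join only succeeds on exact string matches.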
# Merge the two datasets
merged_data <- MD_Analyst %>%
  inner_join(Analyst_Data_National_1, by = c("OCC_TITLE" = "Occupation Title"))

head(merged_data)
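Because `inner_join()` silently drops rows whose titles do not match, it can help to list the unmatched titles on each side before merging. The sketch below uses `dplyr::anti_join()` on two tiny made-up tibbles; `md_toy` and `national_toy` are hypothetical stand-ins for `MD_Analyst` and `Analyst_Data_National_1`, and their values are not real data.

```r
library(dplyr)

# Made-up stand-ins for the two cleaned data sets (values are illustrative only)
md_toy <- tibble(
  OCC_TITLE = c("budget analysts", "financial analysts"),
  JOBS_1000 = c(1.2, 3.4)
)
national_toy <- tibble(
  `Occupation Title` = c("budget analysts", "management analysts"),
  `2023 Employment` = c(50, 900)
)

# Rows in the Maryland table with no matching national title
unmatched_md <- anti_join(md_toy, national_toy, by = c("OCC_TITLE" = "Occupation Title"))

# Rows in the national table with no matching Maryland title
unmatched_national <- anti_join(national_toy, md_toy, by = c("Occupation Title" = "OCC_TITLE"))

print(unmatched_md$OCC_TITLE)
print(unmatched_national$`Occupation Title`)
```

Any title that shows up in either result would be lost in the inner join, so it is a candidate for a manual recode like the one used for the software quality assurance occupation.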
The U.S. economy is projected to add 6.7 million jobs from 2023 to 2033, the U.S. Bureau of Labor Statistics (BLS) reported today. Total employment is projected to increase to 174.6 million and grow 0.4 percent annually, which is slower than the 1.3 percent annual growth recorded over the 2013−23 decade.
Technological advancements may also lead to increased productivity for some occupations. The growth of e-commerce as well as advances in technology are expected to limit demand for sales workers leading to employment declines. Similarly, automated systems and related technology, including AI, are expected to contribute to declines in employment of office and administrative support workers.
Computer and mathematical occupations are projected to grow the second fastest of any occupational group, at 12.9 percent. The growth of computer and mathematical occupations is expected to stem from demand for upgraded computer services, continued development of artificial intelligence (AI) solutions, and an increasing amount of data available for analysis. In addition, the number and severity of cyberattacks and data breaches on U.S. businesses is expected to lead to greater demand for information security analysts.
Source link: https://www.bls.gov/news.release/pdf/ecopro.pdf
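The annual growth figure in the quoted release can be sanity-checked from the two totals it gives. This is a rough sketch under two assumptions: that the 2023 base equals the projected 2033 total minus the jobs added, and that growth compounds annually over 10 years. The variable names are illustrative.

```r
# Figures quoted in the BLS release (millions of jobs)
jobs_2033 <- 174.6
jobs_added <- 6.7

# Assumption: 2023 base = 2033 total minus the jobs added over the decade
jobs_2023 <- jobs_2033 - jobs_added

# Compound annual growth rate over the 10-year projection window
annual_growth <- (jobs_2033 / jobs_2023)^(1 / 10) - 1
annual_growth_pct <- round(100 * annual_growth, 1)

print(jobs_2023)
print(annual_growth_pct)
```

Under these assumptions the rounded result agrees with the 0.4 percent annual growth stated in the release.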
Linear regression analysis for the state of Maryland
# Linear regression model
linear_reg <- lm(JOBS_1000 ~ `Projected 2033 Employment` + `2023 Percent of Industry`, data = merged_data)

# Show a summary of what was found
summary(linear_reg)
Call:
lm(formula = JOBS_1000 ~ `Projected 2033 Employment` + `2023 Percent of Industry`,
    data = merged_data)

Residuals:
        1         2         3         4         5         6         7
-0.003163 -0.159818 -0.731715 -1.503464  1.564193  0.367799  0.466168

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                   1.5337     0.5652   2.713   0.0533 .
`Projected 2033 Employment`   0.5503     0.1820   3.024   0.0390 *
`2023 Percent of Industry`  -51.5360    18.3797  -2.804   0.0486 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.185 on 4 degrees of freedom
Multiple R-squared:  0.8778,    Adjusted R-squared:  0.8167
F-statistic: 14.36 on 2 and 4 DF,  p-value: 0.01494
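What this means is that Projected 2033 Employment and 2023 Percent of Industry are both statistically significant predictors of employment per 1,000 jobs (JOBS_1000) for analyst occupations at the 5% level (p = 0.039 and p = 0.049), and together they explain about 88% of the variation (R-squared = 0.8778). The coefficient for Projected 2033 Employment says that for each one-unit increase in projected employment, employment per 1,000 jobs increases by about 0.55, holding 2023 Percent of Industry constant. The coefficient for 2023 Percent of Industry indicates that for each one-unit increase in that percentage, employment per 1,000 jobs decreases by about 51.54, holding projected employment constant. Overall, as an occupation's percent of industry grows, employment per 1,000 jobs decreases, but if the projection is correct and employment increases, jobs per 1,000 will increase with it. Because the model is fit on only seven occupations (4 residual degrees of freedom), these estimates should be read with caution.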
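To make the coefficients concrete, the fitted equation can be evaluated by hand. The sketch below plugs the estimates from the summary output into the regression formula. The two input values are hypothetical and chosen only for illustration, and the function name `predict_jobs_1000` is not part of the original code.

```r
# Coefficient estimates copied from the summary(linear_reg) output
b0 <- 1.5337       # intercept
b_proj <- 0.5503   # `Projected 2033 Employment`
b_pct <- -51.5360  # `2023 Percent of Industry`

# Hypothetical helper: fitted JOBS_1000 for given predictor values
predict_jobs_1000 <- function(projected_2033, pct_industry_2023) {
  b0 + b_proj * projected_2033 + b_pct * pct_industry_2023
}

# Made-up inputs, not taken from the data
base_pred <- predict_jobs_1000(10, 0.05)
more_jobs <- predict_jobs_1000(11, 0.05)

print(base_pred)
print(more_jobs - base_pred)
```

Raising projected employment by one unit while keeping the percentage fixed changes the prediction by exactly the 0.5503 coefficient, which is what "holding the other variable constant" means in the interpretation above.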
Plot 1
# Create a new column that labels each occupation's projected employment change
merged_data$Employment_Change <- ifelse(
  merged_data$`Projected 2033 Employment` > merged_data$`2023 Employment`, "Increase",
  ifelse(merged_data$`Projected 2033 Employment` < merged_data$`2023 Employment`, "Decrease", "Same")
)
# Create the bar plot
# Note: plot_ly() comes from the plotly package, so library(plotly) must be loaded first.
# hover_text_2023 and hover_text_2033 must be defined before this chunk runs
# (they are not created in the code shown here).
plot_ly(data = merged_data) %>%
  add_trace(
    x = ~OCC_TITLE,
    y = ~`2023 Employment`,
    type = 'bar',
    name = '2023 Employment',
    text = ~hover_text_2023,
    hoverinfo = 'text'
  ) %>%
  add_trace(
    x = ~OCC_TITLE,
    y = ~`Projected 2033 Employment`,
    type = 'bar',
    name = 'Projected 2033 Employment',
    marker = list(color = '#FF7F50'),
    text = ~hover_text_2033,
    hoverinfo = 'text'
  ) %>%
  layout(
    title = "2023 vs Projected 2033 Employment by Occupation Title",
    barmode = 'group',
    xaxis = list(title = "Occupation Title", tickangle = 45),
    yaxis = list(title = "Employment"),
    showlegend = TRUE,
    legend = list(x = 1, y = 1),
    annotations = list(
      list(
        text = "Data Source: U.S. Bureau of Labor Statistics",
        x = 0.5,
        xanchor = "center",
        y = -0.1,
        showarrow = FALSE,
        font = list(size = 10)
      )
    )
  )
Visualization Findings
What I could see from this is that the employment level for budget analysts will stay the same, which could be either because the field is competitive or because technology will continue to improve and they won't be as needed. The positions that will keep increasing are related to management and marketing, which probably means that individuals in those roles will also need more education. However, I am curious about how wages will change as employment increases, and how any decrease in total employment will differ in Maryland.