My data set covers employment in Maryland. It gives the total number of jobs in the state for every year from 2010 to 2020, followed by projections at five-year intervals through 2050. The same totals are broken down across Maryland's 24 jurisdictions (23 counties plus Baltimore City) for each of those years. Employment is further split by job category, such as State Government, Construction, and Retail. My cleaning consisted of the general Saidi cleaning, followed by splitting the data into the statewide Maryland rows and the county rows.
library(tidyverse)
library(plotly)
library(highcharter)
Rows: 425 Columns: 31
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (23): Data Created, Jurisdiction, Forestry, fishing, and related activit...
dbl (8): FIBS, Year, Total employment (number of jobs), Farm employment, Go...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Saidi cleaning with an extra step
data1 <- data1 %>%
  rename(Total_Employment = `Total employment (number of jobs)`)
names(data1) <- tolower(names(data1))
names(data1) <- gsub(" ", "_", names(data1))
# Split the data into the statewide Maryland rows and the county rows
dataMD <- data1 |>
  filter(jurisdiction == "Maryland")
dataCounty <- data1 |>
  filter(jurisdiction != "Maryland")
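A quick sanity check on this split can be sketched as follows. The `toy` table, its jurisdiction labels, and its numbers are invented for illustration and are not values from the data set; only the column names `jurisdiction`, `year`, and `total_employment` come from the cleaned data. The sketch confirms that every row lands in exactly one subset, then compares the county sum with the statewide total for each year.

```r
library(dplyr)

# Toy stand-in for the cleaned data1; labels and values are invented.
toy <- tibble(
  jurisdiction = c("Maryland", "County A", "County B",
                   "Maryland", "County A", "County B"),
  year = c(2010, 2010, 2010, 2011, 2011, 2011),
  total_employment = c(300, 100, 200, 330, 110, 220)
)

# Same split as above: statewide rows versus county rows.
toy_md <- toy |> filter(jurisdiction == "Maryland")
toy_county <- toy |> filter(jurisdiction != "Maryland")

# Every row should land in exactly one of the two subsets.
stopifnot(nrow(toy_md) + nrow(toy_county) == nrow(toy))

# Compare the sum over counties with the statewide total, year by year.
check <- toy_county |>
  group_by(year) |>
  summarise(county_sum = sum(total_employment)) |>
  left_join(
    toy_md |> select(year, state_total = total_employment),
    by = "year"
  )
print(check)
```

In the toy table the two columns match by construction. Whether they match in the real data depends on how the source defines the statewide total, so a mismatch there would be something to investigate rather than necessarily an error.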
p1 <- dataMD |>
  ggplot(aes(x = total_employment, y = military, color = year)) +
  geom_point() +
  labs(
    title = "Military jobs vs total employment in Maryland",
    caption = "Source: NAICS (North American Industry Classification System)",
    x = "Total Employment",
    y = "Military Employment"
  ) +
  geom_smooth(method = "lm", se = TRUE, color = "purple") +
  theme_classic()
p1
`geom_smooth()` using formula = 'y ~ x'
# Playing around with plots
p2 <- dataCounty |>
  ggplot(aes(x = total_employment, y = military, color = jurisdiction)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(
    title = "Military jobs vs total employment in Maryland",
    caption = "Source: NAICS (North American Industry Classification System)",
    x = "Total Employment",
    y = "Military Employment"
  ) +
  theme_classic()
p2
`geom_smooth()` using formula = 'y ~ x'
# Just playing around with plots
p8 <- dataMD |>
  ggplot(aes(x = total_employment, y = year, color = jurisdiction)) +
  geom_point()
p8
# Just playing around with plots
p9 <- dataCounty |>
  ggplot(aes(x = total_employment, y = year, color = jurisdiction)) +
  geom_point()
p9
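# geom_alluvium() comes from ggalluvial and scale_fill_viridis() from viridis;
# neither package is loaded in the code shown above.
library(ggalluvial)
library(viridis)

# Avoid scientific notation on the axis labels
options(scipen = 999)

p3 <- data1 |>
  ggplot(aes(x = year, y = total_employment, alluvium = jurisdiction)) +
  theme_bw() +
  geom_alluvium(
    aes(fill = jurisdiction),
    color = "white", width = .2, alpha = .9, decreasing = FALSE
  ) +
  scale_fill_viridis(discrete = TRUE, option = "D") +
  # USED CHATGPT FOR ASSISTANCE
  labs(
    title = "Projected Employment Growth Maryland",
    y = "Total Employment",
    x = "Year",
    fill = "Jurisdiction",
    caption = "Source: NAICS (North American Industry Classification System)"
  )
p3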
ggplotly(p3)
My plot shows total employment in Maryland and its projected growth through 2050. The statewide Maryland total is the largest band, and the bands below it show employment by county. The interactive version is built with Plotly and rendered separately from the static plot, so the non-interactive graph stays easy to view if Plotly has any issues. The graph shows fairly strong growth at the beginning, followed by a sharp drop in 2020, likely caused by the Covid pandemic. It then projects fairly consistent growth until 2030 and a slightly slower but steady rate of growth from 2030 to 2050.
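The growth rates described here are read off the plot by eye. One way to put numbers on them is sketched below. Because the data are annual through 2020 and then spaced five years apart, the sketch converts each step to a compound annual rate so the periods are comparable. The helper `annual_growth()` is hypothetical, and the toy numbers are made up for illustration, not taken from the data set.

```r
# Hypothetical helper: compound annual growth rate between consecutive
# observations, allowing for unequal gaps between years.
annual_growth <- function(year, employment) {
  stopifnot(length(year) == length(employment), length(year) >= 2)
  ord <- order(year)
  year <- year[ord]
  employment <- employment[ord]
  gap <- diff(year)  # number of years between observations
  ratio <- employment[-1] / employment[-length(employment)]
  data.frame(
    from = year[-length(year)],
    to = year[-1],
    annual_growth = ratio^(1 / gap) - 1
  )
}

# Made-up numbers for illustration only (not values from the data set).
toy_year <- c(2019, 2020, 2025, 2030)
toy_jobs <- c(100, 90, 99, 108.9)
growth <- annual_growth(toy_year, toy_jobs)
print(growth)
```

On the real data, the same helper could be called as `annual_growth(dataMD$year, dataMD$total_employment)`, assuming both columns are numeric as the column specification suggests.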