Project 2

Author

M Loukinov

My data set covers employment in Maryland. It gives the total number of jobs in Maryland for every year from 2010 to 2020, followed by projections at 5-year intervals until 2050. It also breaks these totals down by jurisdiction for all of those years, showing the spread across Maryland's 24 jurisdictions (23 counties and Baltimore City). It then goes further and splits employment into the category each job falls into, for example State Government, Construction, and Retail. My cleaning consisted of the general Saidi cleaning (lowercasing the column names and replacing spaces with underscores), plus an extra step to rename the total employment column, and then splitting the data into the Maryland statewide rows and the county rows.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

library(highcharter)

Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
Highcharts (www.highcharts.com) is a Highsoft software product which is
not free for commercial and Governmental use

library(dplyr)
library(alluvial)
library(ggalluvial)
library(viridis)

Loading required package: viridisLite

#Import Data
setwd("C:/Users/Thecr/OneDrive/Desktop/Data 110 Notes")
data1 <- read_csv("MDjobs.csv")

Rows: 425 Columns: 31
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (23): Data Created, Jurisdiction, Forestry, fishing, and related activit...
dbl  (8): FIBS, Year, Total employment (number of jobs), Farm employment, Go...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
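
The column specification above reports 23 columns as character, including what looks like an industry count (the truncated "Forestry, fishing, and related activit..." column). Below is a minimal sketch of one way to convert such a column to numeric. It assumes the text values are numbers with thousands separators plus a suppression marker; the toy data frame, the `construction` column, and the "(D)" marker are all hypothetical and are not taken from MDjobs.csv.

```r
library(readr)

# Hypothetical toy data: an industry count stored as text.
# The "(D)" suppression marker is an assumption, not confirmed for MDjobs.csv.
toy <- data.frame(
  jurisdiction = c("Maryland", "Toy County"),
  construction = c("1,234", "(D)"),
  stringsAsFactors = FALSE
)

# parse_number() drops grouping marks such as commas;
# values listed in `na` become NA instead of raising a parsing warning.
toy$construction <- parse_number(toy$construction, na = c("", "NA", "(D)"))

toy
```

If the real file uses a different placeholder, that value would go in `na` instead. Checking `unique()` on a character column first is a quick way to see which non-numeric values it actually contains.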

#Saidi Cleaning with an extra step
data1 <- data1 %>%
  rename(Total_Employment = `Total employment (number of jobs)`)

names(data1) <- tolower(names(data1))
names(data1) <- gsub(" ","_",names(data1))

#Split the data into the Maryland statewide rows and the county rows
dataMD <- data1 |>
  filter(jurisdiction == "Maryland")
dataCounty <- data1 |>
  filter(jurisdiction != "Maryland")
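
A split like this can be sanity-checked in two ways: the two pieces should add back up to the full row count, and the county rows should roughly sum to the statewide total for each year. The sketch below shows both checks on a small hypothetical data frame; the jurisdiction labels and numbers are made up for illustration and are not values from MDjobs.csv.

```r
library(dplyr)

# Hypothetical toy data with one statewide row and two county rows per year
toy <- data.frame(
  jurisdiction = c("Maryland", "Toy County A", "Toy County B",
                   "Maryland", "Toy County A", "Toy County B"),
  year = c(2010, 2010, 2010, 2011, 2011, 2011),
  total_employment = c(300, 100, 200, 330, 110, 220)
)

toy_md <- toy |> filter(jurisdiction == "Maryland")
toy_county <- toy |> filter(jurisdiction != "Maryland")

# Check 1: the two subsets together cover every row.
# filter() drops rows where the condition is NA, so a missing
# jurisdiction would make this check fail.
stopifnot(nrow(toy_md) + nrow(toy_county) == nrow(toy))

# Check 2: compare the sum of county rows with the statewide row for each year
check <- toy_county |>
  group_by(year) |>
  summarise(county_sum = sum(total_employment)) |>
  left_join(select(toy_md, year, state_total = total_employment), by = "year") |>
  mutate(gap = state_total - county_sum)

check
```

In the toy data the gap is zero by construction. With real data a small gap would not necessarily be an error, since published totals can be rounded or include unallocated jobs.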

#Statewide military jobs vs total employment, colored by year, with a linear fit
p1 <- dataMD |>
  ggplot(aes(x = total_employment, y = military, color = year)) +
  geom_point() +
  labs(title = "Military jobs vs total employment in Maryland",
       caption = "Source: NAICS (North American Industry Classification System)",
       x = "Total Employment",
       y = "Military Employment") +
  geom_smooth(method = "lm", se = TRUE, color = "purple") +
  theme_classic()
p1

`geom_smooth()` using formula = 'y ~ x'
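
The `geom_smooth(method = "lm")` layer fits an ordinary least-squares line using the formula `y ~ x` reported in the message above. The same fit can be run directly with `lm()` to read off the slope and its confidence interval as numbers rather than from the plot. This is only a sketch: the toy values below are hypothetical and do not come from the Maryland data.

```r
# Hypothetical toy values standing in for the statewide rows
toy <- data.frame(
  total_employment = c(100, 200, 300, 400),
  military = c(12, 15, 15, 18)
)

# Same model that geom_smooth(method = "lm") draws: military ~ total_employment
fit <- lm(military ~ total_employment, data = toy)

coef(fit)     # intercept and slope of the fitted line
confint(fit)  # 95% confidence intervals, related to the shaded band when se = TRUE
```

The slope is the estimated change in military jobs per additional job overall. Applied to the real data, the call would presumably be `lm(military ~ total_employment, data = dataMD)`, assuming both columns are numeric.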

#County-level version of the plot above, with one linear fit per jurisdiction
p2 <- dataCounty |>
  ggplot(aes(x = total_employment, y = military, color = jurisdiction)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Military jobs vs total employment by Maryland jurisdiction",
       caption = "Source: NAICS (North American Industry Classification System)",
       x = "Total Employment",
       y = "Military Employment") +
  theme_classic()
p2

`geom_smooth()` using formula = 'y ~ x'

#Exploratory plot: statewide total employment by year
p8 <- dataMD |>
  ggplot(aes(x = total_employment, y = year, color = jurisdiction)) +
  geom_point()
p8

#Exploratory plot: county total employment by year
p9 <- dataCounty |>
  ggplot(aes(x = total_employment, y = year, color = jurisdiction)) +
  geom_point()
p9

options(scipen = 999)

p3 <- data1 |>
  ggplot(aes(x = year, y = total_employment, alluvium = jurisdiction)) +
  theme_bw() +
  geom_alluvium(aes(fill = jurisdiction),
  color ="white", width = .2, alpha = .9,
  decreasing = FALSE) +
  scale_fill_viridis(discrete = TRUE, option = "D") +      #USED CHATGPT FOR ASSISTANCE
  labs(title = "Projected Employment Growth in Maryland",
  y = "Total Employment",
  x = "Year",
  fill = "Jurisdiction",
  caption = "Source: NAICS (North American Industry Classification System)")
  
p3

ggplotly(p3)

My plot shows total employment in Maryland and its projected growth through 2050. The statewide Maryland total is the largest band, and the bands below it show how employment is split across the counties. The plot is made interactive with Plotly; the interactive version is rendered separately so that the static graph is still easy to view if Plotly causes any issues. The graph shows fairly significant growth at the beginning of the period before a sharp drop in 2020, likely caused by the Covid pandemic. It then projects fairly consistent growth until 2030, followed by a slightly lower but steady rate of growth from 2030 to 2050.
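
Because the data are yearly up to 2020 and then move to 5-year steps, raw differences between consecutive rows are not directly comparable. One way to put the growth claims above on the same footing is to compute an annualized (compound) growth rate for each interval. The sketch below uses hypothetical totals, not figures from MDjobs.csv, and the column names `years_elapsed` and `annual_growth_pct` are introduced only for illustration.

```r
library(dplyr)

# Hypothetical statewide totals with uneven spacing between years
toy <- data.frame(
  year = c(2019, 2020, 2025, 2030),
  total_employment = c(1000, 950, 1100, 1200)
)

growth <- toy |>
  arrange(year) |>
  mutate(
    # Length of each interval in years (1 for yearly data, 5 for projections)
    years_elapsed = year - lag(year),
    # Compound annual growth rate over the interval, in percent
    annual_growth_pct =
      ((total_employment / lag(total_employment))^(1 / years_elapsed) - 1) * 100
  )

growth
```

The first row has no earlier year to compare against, so its growth rate is `NA`. Running the same steps on `dataMD` would show whether the projected annual rate after 2030 is actually lower than the rate before it.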