DATA110_Project1

DATA110 Project 1: Employment and Income in Maryland

By: Nura

Introduction:

The file I will be using comes from the USDA. It contains data from 2000 to 2023 for every county in every U.S. state (state and county are categorical variables). For each county, it reports median household income, labor force size, unemployment rate, and the numbers of employed and unemployed people (all quantitative variables). I will explore the labor force size, the numbers of employed and unemployed people, and the median household income in 2022 for three Maryland counties: Montgomery County, Prince George’s County, and Howard County.

#Loading dataset
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.3     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

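#dplyr is already attached by library(tidyverse); loading it again is harmless
library(dplyr)
library(dslabs)
#The paths below are specific to the author's machine
setwd("C:/Users/24680/Downloads")
#Reading the USDA file into a data frame
filename <- read_csv("C:/Users/24680/Downloads/Unemployment2023.csv")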

Rows: 329726 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): State, Area_Name, Attribute
dbl (2): FIPS_Code, Value

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#Filtering and cleaning the data
#Keeping only the three Maryland counties of interest
filtered_MD <- filename %>%
  filter(State == "MD",
         Area_Name %in% c("Montgomery County, MD", "Prince George's County, MD", "Howard County, MD"))
#Keeping only the four 2022 attributes
Attributes <- filtered_MD %>%
  filter(Attribute %in% c("Civilian_labor_force_2022", "Employed_2022", "Unemployed_2022", "Median_Household_Income_2022"))

#Grouping Attributes
filtered<-Attributes%>%group_by(Attribute)
filtered

# A tibble: 12 × 5
# Groups:   Attribute [4]
   FIPS_Code State Area_Name                  Attribute                    Value
       <dbl> <chr> <chr>                      <chr>                        <dbl>
 1     24027 MD    Howard County, MD          Civilian_labor_force_2022   187053
 2     24027 MD    Howard County, MD          Employed_2022               182520
 3     24027 MD    Howard County, MD          Unemployed_2022               4533
 4     24027 MD    Howard County, MD          Median_Household_Income_20… 133068
 5     24031 MD    Montgomery County, MD      Civilian_labor_force_2022   542899
 6     24031 MD    Montgomery County, MD      Employed_2022               528308
 7     24031 MD    Montgomery County, MD      Unemployed_2022              14591
 8     24031 MD    Montgomery County, MD      Median_Household_Income_20… 118020
 9     24033 MD    Prince George's County, MD Civilian_labor_force_2022   491617
10     24033 MD    Prince George's County, MD Employed_2022               475409
11     24033 MD    Prince George's County, MD Unemployed_2022              16208
12     24033 MD    Prince George's County, MD Median_Household_Income_20…  93833

#First visualization: side-by-side bars for each attribute within each county
area <- filtered %>%
  ggplot(aes(x = Area_Name, y = Value, fill = Attribute)) +
  geom_col(position = "dodge")
visual1 <- area +
  scale_fill_manual(values = c("lightgreen", "lightblue", "purple", "pink")) +
  labs(x = "Counties", y = "Value", title = "Maryland's Employment and Income", subtitle = "In Montgomery, Prince George's, and Howard Counties", caption = "Source: USDA")
visual1 + theme_grey()

#Second visualization: one panel per county and attribute
visual2 <- filtered %>%
  ggplot(aes(x = Area_Name, y = Value)) +
  geom_col() + facet_grid(Area_Name ~ Attribute) +
  labs(x = "Counties", y = "Value", title = "Maryland's Employment and Income", subtitle = "In Montgomery, Prince George's, and Howard Counties", caption = "Source: USDA")
visual2

Essay (Conclusion):

Using the USDA data, I set out to look at employment and income in Maryland, specifically in Montgomery, Prince George’s, and Howard Counties. To do this, I used the filter() function with the %in% operator to sift through the data, which left me with 2022 data for those three counties. I then used filter() again to keep four attributes (civilian labor force, employed, unemployed, and median household income) and group_by() to group the rows by attribute.

The first visualization is a side-by-side bar graph that shows the number of civilians in the labor force, the numbers of employed and unemployed people, and the median household income in each county. What I noticed in this first visualization is that Montgomery County and Prince George’s County have a much larger labor force and many more employed people than Howard County. My assumption was that those two counties may have a larger population or more work opportunities than Howard County. The second visualization is a facet graph with one panel for each combination of county and attribute. Just like in the first visual, Prince George’s County and Montgomery County have a much larger labor force and number of employed people than Howard County. I also noticed that Prince George’s County has the smallest median household income of the three counties (although the difference is not that large). Both graphs show the same four measurements in different ways: one as a side-by-side bar graph and one as a facet graph.
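Because the raw counts depend heavily on county size, one rough way to compare the counties on equal footing is to divide the unemployed and employed counts by the civilian labor force. The sketch below is not part of the original analysis: it retypes the 2022 values from the table printed earlier, and the names labor_2022, rates, unemployed_share, and employed_share are made up for illustration.

```r
library(dplyr)
library(tidyr)

# 2022 values copied by hand from the filtered table printed earlier in this report
labor_2022 <- tibble::tribble(
  ~Area_Name,                   ~Attribute,                  ~Value,
  "Howard County, MD",          "Civilian_labor_force_2022", 187053,
  "Howard County, MD",          "Employed_2022",             182520,
  "Howard County, MD",          "Unemployed_2022",             4533,
  "Montgomery County, MD",      "Civilian_labor_force_2022", 542899,
  "Montgomery County, MD",      "Employed_2022",             528308,
  "Montgomery County, MD",      "Unemployed_2022",            14591,
  "Prince George's County, MD", "Civilian_labor_force_2022", 491617,
  "Prince George's County, MD", "Employed_2022",             475409,
  "Prince George's County, MD", "Unemployed_2022",            16208
)

# Spread the attributes into one column each, then compute shares of the labor force
rates <- labor_2022 %>%
  pivot_wider(names_from = Attribute, values_from = Value) %>%
  mutate(
    unemployed_share = Unemployed_2022 / Civilian_labor_force_2022,
    employed_share   = Employed_2022 / Civilian_labor_force_2022
  ) %>%
  arrange(desc(unemployed_share))

rates
```

With these values, Prince George’s County has the highest unemployed share (about 3.3%) and Howard County the lowest (about 2.4%), so the ordering by share is not the same as the ordering by raw counts.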

One thing I wanted to show but could not get to work was the proportion of people employed and unemployed, presented as a bidirectional horizontal bar chart or a stacked bar graph. In the future, I could also try to visualize all the counties in Maryland, not just the selected ones, using a treemap.
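One possible way to get the stacked proportion chart is to let ggplot2 do the scaling with position = "fill", which rescales each county's bar so that the employed and unemployed parts add up to 1. This is only a sketch under assumptions: it retypes the 2022 values from the table printed earlier rather than reading the USDA file, and the names employment_2022 and stacked are made up for illustration.

```r
library(ggplot2)

# Employed and unemployed counts for 2022, copied by hand from the table printed earlier
employment_2022 <- tibble::tribble(
  ~Area_Name,                   ~Attribute,        ~Value,
  "Howard County, MD",          "Employed_2022",   182520,
  "Howard County, MD",          "Unemployed_2022",   4533,
  "Montgomery County, MD",      "Employed_2022",   528308,
  "Montgomery County, MD",      "Unemployed_2022",  14591,
  "Prince George's County, MD", "Employed_2022",   475409,
  "Prince George's County, MD", "Unemployed_2022",  16208
)

# position = "fill" stacks the two groups and rescales every bar to a total of 1
stacked <- ggplot(employment_2022, aes(x = Area_Name, y = Value, fill = Attribute)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  coord_flip() +
  labs(x = "Counties", y = "Share of civilian labor force",
       title = "Employed and Unemployed Shares in 2022", caption = "Source: USDA")

stacked
```

Because the unemployed share is only a few percent in each county, that slice will be thin on a 0 to 100% axis, so a chart of the unemployed share alone may be easier to read.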