Question: How did population change differ across Maryland counties from 2010 to 2017?
The dataset used to explore this question covers 3,142 counties in the United States and comes from Openintro.org. For each county it records the following variables: population counts in 2000, 2010, and 2017; population change from 2010-2017; percentage of the population in poverty in 2017; home ownership rate from 2006-2010; percentage of multi-unit housing from 2006-2010; unemployment rate in 2017; whether or not the county contains a metropolitan area; median education level from 2013-2017; per capita income from 2013-2017; median household income; and type of county-level smoking ban in 2010.
Answering the question requires only the Maryland counties and their population change from 2010-2017.
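The report does not show how County.df is created. The sketch below is one possible way to obtain it, assuming the data come from the `county` data frame in the openintro R package (an assumption; the original analysis may have read a downloaded file instead). The fallback rows are made-up placeholders so the snippet still runs when the package is not installed.

```r
# Assumption: County.df is the `county` data frame shipped with the openintro package.
if (requireNamespace("openintro", quietly = TRUE)) {
  County.df <- as.data.frame(openintro::county)
} else {
  # Hypothetical stand-in rows (placeholder values, not real figures)
  County.df <- data.frame(
    name = c("Example County A", "Example County B"),
    state = c("Maryland", "Virginia"),
    pop_change = c(1.0, -0.5),
    stringsAsFactors = FALSE
  )
}

# Inspect the available columns before filtering
str(County.df)
```

Checking the column names with str() first helps confirm that name, state, and pop_change exist under those names in whichever copy of the data is used.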
#Data Analysis
The analysis is a visualization: a bar graph comparing population change across Maryland (MD) counties from 2010-2017. First, I create a new dataset called Md_counties.df that contains only MD counties by filtering all non-MD counties out of the County.df dataset. Second, I select only the county name and the 2010-2017 population change for Md_counties.df, which is the data used for the visualization. To make the graph more readable, I abbreviate the county names in Md_counties.df using the Archives of Maryland Historical List Abbreviations by Papenfuse et al. Finally, I plot a bar graph to help address my question.
#Data Wrangling
library(dplyr)
# Keep only Maryland rows, then only the county name and population change columns
Md_counties.df <- County.df |>
  filter(state == "Maryland") |>
  select(name, pop_change)
#Data Cleaning & EDA
is.na(Md_counties.df)
## name pop_change
## [1,] FALSE FALSE
## [2,] FALSE FALSE
## [3,] FALSE FALSE
## [4,] FALSE FALSE
## [5,] FALSE FALSE
## [6,] FALSE FALSE
## [7,] FALSE FALSE
## [8,] FALSE FALSE
## [9,] FALSE FALSE
## [10,] FALSE FALSE
## [11,] FALSE FALSE
## [12,] FALSE FALSE
## [13,] FALSE FALSE
## [14,] FALSE FALSE
## [15,] FALSE FALSE
## [16,] FALSE FALSE
## [17,] FALSE FALSE
## [18,] FALSE FALSE
## [19,] FALSE FALSE
## [20,] FALSE FALSE
## [21,] FALSE FALSE
## [22,] FALSE FALSE
## [23,] FALSE FALSE
## [24,] FALSE FALSE
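The row-by-row output above shows that no value is missing. As a more compact alternative, the same check can be summarized per column. This is a minimal sketch on a hypothetical stand-in table (md_example and its values are placeholders, not the real data), with one missing value included on purpose so the result is visible.

```r
# Hypothetical stand-in for Md_counties.df (placeholder values, not real data)
md_example <- data.frame(
  name = c("County A", "County B", "County C"),
  pop_change = c(1.2, NA, -0.4),
  stringsAsFactors = FALSE
)

# Count missing values per column instead of printing one row per county
na_counts <- colSums(is.na(md_example))
print(na_counts)

# TRUE if any cell in the data frame is missing
has_missing <- any(is.na(md_example))
print(has_missing)
```

Applied to Md_counties.df, every count would be 0, consistent with the all-FALSE output above.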
Md_counties.df$name[Md_counties.df$name == "Allegany County"] <- "AL"
Md_counties.df$name[Md_counties.df$name == "Anne Arundel County"] <- "AA"
Md_counties.df$name[Md_counties.df$name == "Baltimore County"] <- "BA"
Md_counties.df$name[Md_counties.df$name == "Calvert County"] <- "CV"
Md_counties.df$name[Md_counties.df$name == "Caroline County"] <- "CA"
Md_counties.df$name[Md_counties.df$name == "Carroll County"] <- "CR"
Md_counties.df$name[Md_counties.df$name == "Cecil County"] <- "CE"
Md_counties.df$name[Md_counties.df$name == "Charles County"] <- "CH"
Md_counties.df$name[Md_counties.df$name == "Dorchester County"] <- "DO"
Md_counties.df$name[Md_counties.df$name == "Frederick County"] <- "FR"
Md_counties.df$name[Md_counties.df$name == "Garrett County"] <- "GA"
Md_counties.df$name[Md_counties.df$name == "Harford County"] <- "HA"
Md_counties.df$name[Md_counties.df$name == "Howard County"] <- "HO"
Md_counties.df$name[Md_counties.df$name == "Kent County"] <- "KE"
Md_counties.df$name[Md_counties.df$name == "Montgomery County"] <- "MO"
Md_counties.df$name[Md_counties.df$name == "Prince George's County"] <- "PG"
Md_counties.df$name[Md_counties.df$name == "Queen Anne's County"] <- "QA"
Md_counties.df$name[Md_counties.df$name == "St. Mary's County"] <- "SM"
Md_counties.df$name[Md_counties.df$name == "Somerset County"] <- "SO"
Md_counties.df$name[Md_counties.df$name == "Talbot County"] <- "TA"
Md_counties.df$name[Md_counties.df$name == "Washington County"] <- "WA"
Md_counties.df$name[Md_counties.df$name == "Wicomico County"] <- "WI"
Md_counties.df$name[Md_counties.df$name == "Worcester County"] <- "WO"
Md_counties.df$name[Md_counties.df$name == "Baltimore city"] <- "BC"
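The 24 assignments above could also be written as a single lookup. The sketch below shows the idea on a hypothetical three-row stand-in (md_example, md_abbrev, and the pop_change values are illustrative placeholders), using three of the abbreviations from the list above. It is an alternative pattern, not the code used in this report.

```r
# Hypothetical three-row stand-in for Md_counties.df (placeholder values)
md_example <- data.frame(
  name = c("Howard County", "Baltimore city", "Kent County"),
  pop_change = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Named vector mapping full county name -> abbreviation (subset of the list above)
md_abbrev <- c(
  "Howard County" = "HO",
  "Baltimore city" = "BC",
  "Kent County" = "KE"
)

# Look up each name; keep the original name when no abbreviation is defined
matched <- unname(md_abbrev[md_example$name])
md_example$name <- ifelse(is.na(matched), md_example$name, matched)
print(md_example)
```

Keeping the mapping in one named vector makes it easier to compare against the abbreviations list and to spot a misspelled county name, since unmatched names are left unchanged.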
#Visualization
library(ggplot2)
ggplot(Md_counties.df, aes(x = name, y = pop_change)) +
  geom_col(fill = "#1f77b4", color = "black") +
  labs(
    title = "Bar Graph of Pop. Change of MD Counties in 2010-2017",
    x = "MD Counties Abbreviated",
    y = "Pop. Change"
  ) +
  ylim(-3, 6) +
  theme_minimal()
#Conclusion
Howard, Frederick, Charles, and Montgomery Counties had the largest population increases from 2010-2017 among all MD counties. Allegany, Garrett, Kent, and Talbot Counties had the largest population decreases over the same period. Worcester, Carroll, and Somerset Counties had the smallest population change in either direction.
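Rankings like these are read off the bar graph, but they can also be checked numerically by sorting the table. The sketch below illustrates this on a hypothetical stand-in (md_example and its values are placeholders, not the real Maryland figures).

```r
# Hypothetical stand-in for Md_counties.df (placeholder values, not real data)
md_example <- data.frame(
  name = c("AA", "BB", "CC", "DD"),
  pop_change = c(2.5, -1.0, 0.1, 4.0),
  stringsAsFactors = FALSE
)

# Largest increases first, largest decreases last
by_change <- md_example[order(md_example$pop_change, decreasing = TRUE), ]
print(by_change)

# Smallest change in either direction first
by_magnitude <- md_example[order(abs(md_example$pop_change)), ]
print(by_magnitude)
```

Running the same two sorts on Md_counties.df would list the counties in the order needed to confirm the top and bottom groups named in the conclusion.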
These results can inform urban planning projects, such as building more housing, expanding transportation services, and investing in more educational programs, to accommodate the counties with the largest population increases. A potential avenue for future research is a follow-up that looks at population change in 2026 and assesses what needs to be adjusted.
#References
Data Sets. (2017). Openintro.org. https://www.openintro.org/data/index.php?data=county
Edward C. Papenfuse, et al., Archives of Maryland, An Historical List of Public Officials of Maryland, new series, Vol. 1. Annapolis, MD: Maryland State Archives, 1990. https://msa.maryland.gov/msa/speccol/sc2600/sc2685/html/abbrev.html#:~:text=Abbreviations%20List,in%201906%20and%201908%20only)