Exploring the Atlas of Rural and Small Town America Dataset
The data used for this project come from the Atlas of Rural and Small Town America, published by the U.S. Department of Agriculture's Economic Research Service. The dataset groups county-level statistics into broad categories of socioeconomic factors: demographic data from the American Community Survey (ACS), economic data from the Bureau of Labor Statistics, categorical variables (codes) for various county classifications, data on income, and data on veterans.
For this project, we are only going to look at County Classifications. Let’s first import the Excel workbook, read the County Classifications sheet, and store the result as a tibble.
A tibble is itself a data frame, so it works directly with the data wrangling functions used below. The import code has been hidden so as not to pull the entire data frame into view. Instead, we’ll look at the head.
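For readers who want to reproduce the hidden step, here is a minimal sketch of what such an import might look like, assuming the readxl package. The file name `RuralAtlasData23.xlsx`, the sheet name `County Classifications`, and the helper `load_classifications()` are assumptions for illustration, not the original code.

```r
library(readxl)

# Hypothetical file and sheet names; adjust them to the downloaded workbook.
atlas_path  <- "RuralAtlasData23.xlsx"
atlas_sheet <- "County Classifications"

# Hypothetical helper: read_excel() already returns a tibble,
# so no separate conversion step is needed.
load_classifications <- function(path, sheet) {
  read_excel(path, sheet = sheet)
}

# Only attempt the import if the workbook is actually present.
if (file.exists(atlas_path)) {
  RuralAtlasData23 <- load_classifications(atlas_path, atlas_sheet)
}
```

With the data loaded, the first rows look like this: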
head(RuralAtlasData23)
# A tibble: 6 x 45
FIPStxt State County RuralUrbanContinuumCode2~ UrbanInfluenceCode2~
<chr> <chr> <chr> <dbl> <dbl>
1 01001 AL Autauga 2 2
2 01003 AL Baldwin 3 2
3 01005 AL Barbour 6 6
4 01007 AL Bibb 1 1
5 01009 AL Blount 1 1
6 01011 AL Bullock 6 6
# ... with 40 more variables: RuralUrbanContinuumCode2003 <dbl>,
# UrbanInfluenceCode2003 <dbl>, Metro2013 <dbl>,
# Nonmetro2013 <dbl>, Micropolitan2013 <dbl>,
# Type_2015_Update <dbl>, Type_2015_Farming_NO <dbl>,
# Type_2015_Manufacturing_NO <dbl>, Type_2015_Mining_NO <dbl>,
# Type_2015_Government_NO <dbl>, Type_2015_Recreation_NO <dbl>,
# Low_Education_2015_update <dbl>,
# Low_Employment_2015_update <dbl>,
# Population_loss_2015_update <dbl>,
# Retirement_Destination_2015_Update <dbl>, Perpov_1980_0711 <dbl>,
# PersistentChildPoverty_1980_2011 <dbl>, Hipov <dbl>,
# HiAmenity <dbl>, HiCreativeClass2000 <dbl>, Gas_Change <dbl>,
# Oil_Change <dbl>, Oil_Gas_Change <dbl>, Metro2003 <dbl>,
# NonmetroNotAdj2003 <dbl>, NonmetroAdj2003 <dbl>,
# Noncore2003 <dbl>, EconomicDependence2000 <dbl>,
# Nonmetro2003 <dbl>, Micropolitan2003 <dbl>,
# FarmDependent2003 <dbl>, ManufacturingDependent2000 <dbl>,
# LowEducation2000 <dbl>, RetirementDestination2000 <dbl>,
# PersistentPoverty2000 <dbl>, Noncore2013 <dbl>,
# Type_2015_Nonspecialized_NO <dbl>, Metro_Adjacent2013 <dbl>,
# PersistentChildPoverty2004 <dbl>, RecreationDependent2000 <dbl>
With just a quick glance, we can see a few interesting tidbits.
Forty-five columns is of course quite a bit to work with. We are going to keep only eight columns for this project: three identifiers (the Unique County ID, the State, and the County) and five classification variables. The classification variables indicate whether the county is classified as Nonmetro (it lies outside a metropolitan statistical area), whether it is classified as Micropolitan (centered on an urban cluster with a population of at least 10,000 but less than 50,000), whether it experienced population loss in the past decade (2005 - 2015), whether it was in persistent poverty over the past three decades (1970 - 2000), and whether it has high natural amenities.
RuralAtlasData23 <- select(RuralAtlasData23,
                           "FIPStxt",
                           "State",
                           "County",
                           "Nonmetro2013",
                           "Micropolitan2013",
                           "Population_loss_2015_update",
                           "PersistentPoverty2000",
                           "HiAmenity")
Next, let’s rename those long columns into something more digestible.
RuralAtlasData23 <- rename(RuralAtlasData23,
                           UniqueID = "FIPStxt",
                           Nonmetro = "Nonmetro2013",
                           Micropolitan = "Micropolitan2013",
                           Population_Loss = "Population_loss_2015_update",
                           Persistent_Poverty = "PersistentPoverty2000")
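As an aside, dplyr's `select()` can rename columns while selecting them, so the two steps above could be collapsed into one call. The sketch below uses a small toy tibble (`atlas_toy`, with illustrative values) rather than the real data; the toy names and values are assumptions.

```r
library(dplyr)

# Toy stand-in for the atlas data; values are illustrative only.
atlas_toy <- tibble(
  FIPStxt      = c("01001", "01003", "01005"),
  State        = c("AL", "AL", "AL"),
  County       = c("Autauga", "Baldwin", "Barbour"),
  Nonmetro2013 = c(0, 0, 1),
  Metro2013    = c(1, 1, 0)   # extra column that will be dropped
)

# new_name = old_name inside select() keeps and renames in one step.
atlas_small <- atlas_toy %>%
  select(
    UniqueID = FIPStxt,
    State,
    County,
    Nonmetro = Nonmetro2013
  )

names(atlas_small)
```

Keeping the two steps separate, as done above, is just as valid; the combined form mainly saves a little repetition.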
While the columns / variables are now easier to understand, the coded responses are not. We’ll need to recode those 0s and 1s to better reflect what they are identifying.
RuralAtlasData23 <- RuralAtlasData23 %>%
  mutate(Nonmetro = recode(Nonmetro, '0' = "Urban", '1' = "Rural"),
         Micropolitan = recode(Micropolitan, '0' = "No", '1' = "Yes"),
         Population_Loss = recode(Population_Loss, '0' = "No", '1' = "Yes"),
         Persistent_Poverty = recode(Persistent_Poverty, '0' = "No", '1' = "Yes"),
         HiAmenity = recode(HiAmenity, '0' = "No", '1' = "Yes")
  )
head(RuralAtlasData23)
# A tibble: 6 x 8
UniqueID State County Nonmetro Micropolitan Population_Loss
<chr> <chr> <chr> <chr> <chr> <chr>
1 01001 AL Autauga Urban No No
2 01003 AL Baldwin Urban No No
3 01005 AL Barbour Rural No No
4 01007 AL Bibb Urban No No
5 01009 AL Blount Urban No No
6 01011 AL Bullock Rural No No
# ... with 2 more variables: Persistent_Poverty <chr>,
# HiAmenity <chr>
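Since four of the five recodes apply the same No/Yes mapping, one possible way to reduce the repetition is `across()`. This is only a sketch on a toy tibble (`atlas_toy` and `yes_no_cols` are hypothetical names), and it assumes the columns contain nothing but 0 and 1.

```r
library(dplyr)

# Toy stand-in with 0/1 indicator columns; values are illustrative only.
atlas_toy <- tibble(
  UniqueID        = c("A", "B", "C"),
  Micropolitan    = c(0, 1, 0),
  Population_Loss = c(0, 0, 1),
  HiAmenity       = c(1, 0, 0)
)

# Columns that share the same 0 -> "No", 1 -> "Yes" mapping.
yes_no_cols <- c("Micropolitan", "Population_Loss", "HiAmenity")

# Apply the same mapping to every listed column.
atlas_recoded <- atlas_toy %>%
  mutate(across(all_of(yes_no_cols), ~ if_else(.x == 1, "Yes", "No")))

atlas_recoded
```

Note that `if_else()` here maps any non-missing value other than 1 to "No", so it is only a safe substitute for `recode()` when the columns are strictly 0/1.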
As the last step in this data wrangling process, let’s filter the data down to Texas only (my home state!). When exploring geographic units of analysis, it’s often better to home in on a smaller frame to find potentially richer information. While information is limited by the dataset, I think localizing the data will help us better answer some research questions.
# A tibble: 254 x 8
UniqueID State County Nonmetro Micropolitan Population_Loss
<chr> <chr> <chr> <chr> <chr> <chr>
1 48001 TX Anderson Rural Yes No
2 48003 TX Andrews Rural Yes No
3 48005 TX Angelina Rural Yes No
4 48007 TX Aransas Urban No No
5 48009 TX Archer Urban No No
6 48011 TX Armstrong Urban No No
7 48013 TX Atascosa Urban No No
8 48015 TX Austin Urban No No
9 48017 TX Bailey Rural No No
10 48019 TX Bandera Urban No No
# ... with 244 more rows, and 2 more variables:
# Persistent_Poverty <chr>, HiAmenity <chr>
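The filtering code is not shown above. A minimal sketch of such a filter, demonstrated on a toy tibble (`atlas_toy` is a hypothetical stand-in, not the real data), might look like this:

```r
library(dplyr)

# Toy stand-in with counties from two states.
atlas_toy <- tibble(
  UniqueID = c("01001", "48001", "48003"),
  State    = c("AL", "TX", "TX"),
  County   = c("Autauga", "Anderson", "Andrews")
)

# Keep only the rows whose State code is Texas.
texas_toy <- atlas_toy %>%
  filter(State == "TX")

texas_toy
```

Applied to the full dataset, the same condition would leave the 254 Texas counties shown in the output above.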
Now that the data is cleaned and filtered, let’s consider some exploratory research questions.
For this project, we shall only explore the first question.
Let’s select the relevant columns for this question, filter on only Rural counties that experienced population loss, and provide a count.
Question1 <- RuralAtlasData23 %>%
  select("UniqueID",
         "County",
         "Nonmetro",
         "Population_Loss"
  ) %>%
  filter(Nonmetro == "Rural",
         Population_Loss == "Yes")
print(Question1)
# A tibble: 38 x 4
UniqueID County Nonmetro Population_Loss
<chr> <chr> <chr> <chr>
1 48023 Baylor Rural Yes
2 48033 Borden Rural Yes
3 48045 Briscoe Rural Yes
4 48047 Brooks Rural Yes
5 48069 Castro Rural Yes
6 48079 Cochran Rural Yes
7 48083 Coleman Rural Yes
8 48087 Collingsworth Rural Yes
9 48101 Cottle Rural Yes
10 48109 Culberson Rural Yes
# ... with 28 more rows
Thirty-eight Rural Texas counties experienced population loss, out of a total of 172 Rural Texas counties (of 254 counties overall). A quick calculation gives the percentage of Rural counties that experienced population loss.
(38 / 172) * 100
[1] 22.09302
About 22% of all Texas Rural counties experienced population loss. This did not meet the threshold set by the research question (25%). Put differently, roughly 78% of Texas Rural counties were not flagged for population loss, although that alone does not show that they are growing.
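The same figure can be computed without typing the counts by hand, by taking the mean of a logical comparison. The sketch below rebuilds a synthetic vector from the counts reported above (38 "Yes" out of 172). The name `rural_loss_flags` is hypothetical, and treating the threshold as "at least 25%" is an assumption about how the research question is worded.

```r
# Synthetic stand-in rebuilt from the reported counts:
# 38 Rural counties with population loss, 134 without.
rural_loss_flags <- rep(c("Yes", "No"), times = c(38, 134))

# The mean of a logical vector is the share of TRUE values.
rural_loss_pct <- 100 * mean(rural_loss_flags == "Yes")

# Assumed reading of the research question: "at least 25%".
threshold_pct   <- 25
meets_threshold <- rural_loss_pct >= threshold_pct

round(rural_loss_pct, 1)  # 22.1
meets_threshold           # FALSE
```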
We could explore a related question by comparing population loss across the Rural / Urban divide, and see what percentage of Texas Urban counties experienced population loss. Let’s take a quick look.
Question1vU <- RuralAtlasData23 %>%
  select("UniqueID",
         "County",
         "Nonmetro",
         "Population_Loss"
  ) %>%
  filter(Nonmetro == "Urban",
         Population_Loss == "Yes")
print(Question1vU)
# A tibble: 4 x 4
UniqueID County Nonmetro Population_Loss
<chr> <chr> <chr> <chr>
1 48065 Carson Urban Yes
2 48107 Crosby Urban Yes
3 48305 Lynn Urban Yes
4 48359 Oldham Urban Yes
(4 / 82) * 100
[1] 4.878049
There’s much less population loss among Texas Urban counties. Only about 4.9% experienced population loss in the past decade (2005 - 2015). From this we can conclude that, while Rural counties have not met the threshold of substantial population loss, the share of Rural counties with population loss is roughly 4.5 times the Urban share (22.1% versus 4.9%).
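Both percentages, and the ratio between them, can also be produced in a single grouped summary. The sketch below uses a synthetic tibble rebuilt from the counts reported above (172 Rural counties with 38 losses, 82 Urban counties with 4 losses); `texas_toy` and `loss_summary` are hypothetical names, and the real analysis would run on the filtered Texas data instead.

```r
library(dplyr)

# Synthetic stand-in rebuilt from the reported counts.
texas_toy <- tibble(
  Nonmetro = rep(c("Rural", "Urban"), times = c(172, 82)),
  Population_Loss = c(
    rep(c("Yes", "No"), times = c(38, 134)),  # Rural counties
    rep(c("Yes", "No"), times = c(4, 78))     # Urban counties
  )
)

# One row per group: county count, loss count, and loss percentage.
loss_summary <- texas_toy %>%
  group_by(Nonmetro) %>%
  summarise(
    counties  = n(),
    with_loss = sum(Population_Loss == "Yes"),
    pct_loss  = 100 * mean(Population_Loss == "Yes")
  )

# Ratio of the Rural share to the Urban share.
rural_pct <- loss_summary$pct_loss[loss_summary$Nonmetro == "Rural"]
urban_pct <- loss_summary$pct_loss[loss_summary$Nonmetro == "Urban"]
loss_ratio <- rural_pct / urban_pct

loss_summary
round(loss_ratio, 2)  # 4.53
```

This is a simple ratio of two percentages, not a formal statistical test.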
There is more we can look at. For the Final Project, we shall clean up some of this code so that it iterates more smoothly through the data, and provide some visualizations / tables. Until then, this should suffice for Homework 3.