Visualizing the Atlas of Rural and Small Town America Dataset
For Homework 4, we will continue using the Atlas of Rural and Small Town America. Most of the steps from the Homework 3 module (importing, writing to a data frame, cleaning, and wrangling) are hidden from this report. However, where I have selected, renamed, or recoded new variables, I include the code chunk.
library(dplyr)  # provides select(), rename(), mutate(), recode(), and %>%

# Keep the identifier columns plus the county classification flags
RuralAtlasData23 <- select(RuralAtlasData23,
                           "FIPStxt",
                           "State",
                           "County",
                           "Nonmetro2013",
                           "Micropolitan2013",
                           "Low_Education_2015_update",
                           "Low_Employment_2015_update",
                           "Population_loss_2015_update",
                           "Retirement_Destination_2015_Update",
                           "PersistentChildPoverty2004",
                           "PersistentPoverty2000",
                           "HiAmenity")
We’ve added Low Education, Low Employment, Retirement Destination, and Persistent Child Poverty. Let’s rename and recode these columns.
# Give the columns shorter, consistent names (new name = old name)
RuralAtlasData23 <- rename(RuralAtlasData23,
                           UniqueID = "FIPStxt",
                           Nonmetro = "Nonmetro2013",
                           Micropolitan = "Micropolitan2013",
                           Low_Education = "Low_Education_2015_update",
                           Low_Employment = "Low_Employment_2015_update",
                           Population_Loss = "Population_loss_2015_update",
                           Retirement_Destination = "Retirement_Destination_2015_Update",
                           Persistent_Child_Poverty = "PersistentChildPoverty2004",
                           Persistent_Poverty = "PersistentPoverty2000")

# Recode the 0/1 flags into readable labels
RuralAtlasData23 <- RuralAtlasData23 %>%
  mutate(Nonmetro = recode(Nonmetro, '0' = "Urban", '1' = "Rural"),
         Micropolitan = recode(Micropolitan, '0' = "No", '1' = "Yes"),
         Low_Education = recode(Low_Education, '0' = "No", '1' = "Yes"),
         Low_Employment = recode(Low_Employment, '0' = "No", '1' = "Yes"),
         Population_Loss = recode(Population_Loss, '0' = "No", '1' = "Yes"),
         Retirement_Destination = recode(Retirement_Destination, '0' = "No", '1' = "Yes"),
         Persistent_Child_Poverty = recode(Persistent_Child_Poverty, '0' = "No", '1' = "Yes"),
         Persistent_Poverty = recode(Persistent_Poverty, '0' = "No", '1' = "Yes"),
         HiAmenity = recode(HiAmenity, '0' = "No", '1' = "Yes")
  )
head(RuralAtlasData23)
# A tibble: 6 x 12
UniqueID State County Nonmetro Micropolitan Low_Education
<chr> <chr> <chr> <chr> <chr> <chr>
1 01001 AL Autauga Urban No No
2 01003 AL Baldwin Urban No No
3 01005 AL Barbour Rural No Yes
4 01007 AL Bibb Urban No Yes
5 01009 AL Blount Urban No Yes
6 01011 AL Bullock Rural No Yes
# ... with 6 more variables: Low_Employment <chr>,
# Population_Loss <chr>, Retirement_Destination <chr>,
# Persistent_Child_Poverty <chr>, Persistent_Poverty <chr>,
# HiAmenity <chr>
The last step is to keep only the rows for counties in Texas.
# A tibble: 254 x 12
UniqueID State County Nonmetro Micropolitan Low_Education
<chr> <chr> <chr> <chr> <chr> <chr>
1 48001 TX Anderson Rural Yes Yes
2 48003 TX Andrews Rural Yes Yes
3 48005 TX Angelina Rural Yes No
4 48007 TX Aransas Urban No No
5 48009 TX Archer Urban No No
6 48011 TX Armstrong Urban No No
7 48013 TX Atascosa Urban No Yes
8 48015 TX Austin Urban No No
9 48017 TX Bailey Rural No Yes
10 48019 TX Bandera Urban No No
# ... with 244 more rows, and 6 more variables: Low_Employment <chr>,
# Population_Loss <chr>, Retirement_Destination <chr>,
# Persistent_Child_Poverty <chr>, Persistent_Poverty <chr>,
# HiAmenity <chr>
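The Texas subset above was produced by a hidden chunk. One way such a step could be written is with dplyr's filter(); this is a minimal sketch on a toy stand-in data frame (atlas_toy and texas_toy are hypothetical names, and the rows are only for illustration), not necessarily the code used in this report.

```r
library(dplyr)

# Toy stand-in for the full Atlas data frame (three columns, three rows)
atlas_toy <- data.frame(
  UniqueID = c("01001", "48001", "48003"),
  State    = c("AL", "TX", "TX"),
  County   = c("Autauga", "Anderson", "Andrews")
)

# Keep only the rows whose State column equals "TX"
texas_toy <- filter(atlas_toy, State == "TX")
texas_toy
```

Applied to the real data frame, the same filter() call on the State column should leave the 254 Texas counties shown in the output above.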
Now’s the time to explore some frequency tables. We don’t have any numeric variables, so we will rely solely on frequency tables to determine the percentage of Texas counties that fall into each category of a given variable. For the Final Project, I am looking at joining this data frame to another worksheet in the dataset to bring in numeric values, along with writing a for loop to pull these proportions into one data frame. We shall see if there is enough time to do so.
.
Rural Urban
67.71654 32.28346
68% of counties in Texas are classified as Rural, while 32% are Urban. We can repeat this process for the remaining eight (8) columns.
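The chunk that produced these percentages is hidden, so the following is only a sketch of one way to get them with base R. The Rural and Urban counts (172 and 82) are the row totals of the crosstab printed later in this report; the vector name nonmetro is a stand-in for the Nonmetro column.

```r
# Rebuild the Nonmetro column from its counts: 172 Rural and 82 Urban counties
nonmetro <- rep(c("Rural", "Urban"), times = c(172, 82))

# table() counts each category; prop.table() converts counts to proportions
pct <- prop.table(table(nonmetro)) * 100
round(pct, 1)
```

With the real data, the same two calls can be applied directly to RuralAtlasData23$Nonmetro.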
.
No Yes
81.88976 18.11024
Nothing remarkable here. I would need to cross Micropolitan with Rural/Urban to see what percentage of Rural counties have a population of more than 10K but less than 50K.
.
No Yes
62.99213 37.00787
But this is worrisome. About 37% of all Texas counties are classified as having Low Education. The Variable Classification does not define how the Atlas codes a county as Low Education, but I can make an educated guess that a county is flagged when the share of residents lacking a given credential crosses some threshold, whether that credential is a high school diploma or a bachelor’s degree.
.
No Yes
72.04724 27.95276
Another worrisome statistic. About 28% of all Texas counties are classified as having Low Employment. As with Low Education, the classification almost certainly depends on a specific threshold that the documentation does not spell out.
.
No Yes
83.46457 16.53543
Population loss – particularly rural vs urban – is a research question we reviewed in Homework #3. We shall not delve deep into that question here.
.
No Yes
81.49606 18.50394
Retirement Destination is a variable we can examine in the Data Visualization section. Nothing much to say about it here.
.
No Yes
60.62992 39.37008
Persistent Child Poverty (39% of counties) is roughly double Persistent Poverty (18% of counties). PP and PCP are defined as a county having 20% of the relevant population below the poverty line for 20+ years. This shows that nearly two in five Texas counties are classified as having deeply entrenched child poverty.
.
No Yes
81.88976 18.11024
See discussion above. Not much more to say.
What’s interesting here is that 47% of all Texas counties are classified as having high amenities. I would definitely want to know how this is classified and would love to dig deeper, using some visualizations against other variables to see if there’s a pattern anywhere.
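The for loop mentioned earlier as a Final Project idea, one that gathers every column's percentages into a single data frame, might look something like the sketch below. The toy data frame and its values are hypothetical stand-ins for the recoded columns, and freq_summary is an illustrative name.

```r
# Toy stand-in with three of the recoded columns (hypothetical values)
toy <- data.frame(
  Micropolitan  = c("No", "Yes", "No", "No"),
  Low_Education = c("Yes", "Yes", "No", "No"),
  HiAmenity     = c("No", "No", "No", "Yes")
)

# Loop over the columns and store one small data frame of percentages per column
results <- list()
for (col in names(toy)) {
  pct <- prop.table(table(toy[[col]])) * 100
  results[[col]] <- data.frame(
    variable = col,
    level    = names(pct),
    percent  = as.numeric(pct)
  )
}

# Stack the per-column results into one long data frame
freq_summary <- do.call(rbind, results)
rownames(freq_summary) <- NULL
freq_summary
```

A long layout (one row per variable and level) is used here because it is easy to filter, for example to keep only the "Yes" rows, and easy to pass to ggplot later.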
We’ll end this section with a crosstab and a proportional crosstab to help answer Research Question #2 from Homework #3: Are Rural Texas counties more likely to experience persistent poverty than their Urban counterparts?
xtabs(~ Nonmetro + Persistent_Poverty, RuralAtlasData23)
Persistent_Poverty
Nonmetro No Yes
Rural 137 35
Urban 71 11
And then the proportional crosstabs.
prop.table(xtabs(~ Nonmetro + Persistent_Poverty, RuralAtlasData23))*100
Persistent_Poverty
Nonmetro No Yes
Rural 53.937008 13.779528
Urban 27.952756 4.330709
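The table above reports each cell as a share of all 254 counties, so comparing 13.8% with 4.3% mostly reflects that Texas has about twice as many Rural counties as Urban ones. Within each group, 35 of 172 Rural counties (20.3%) and 11 of 82 Urban counties (13.4%) are classified as experiencing Persistent Poverty, so Rural counties are about 1.5 times as likely as their Urban counterparts to carry that classification, not 3 times. Taken together with Research Question #1, Rural Texas counties appear more likely than Urban counties to experience both population loss and persistent poverty, although the size of each gap should be read from within-group percentages.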
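To compare rates within the Rural and Urban groups, the proportions need to be taken row-wise. Here is a minimal sketch that rebuilds the table from the counts printed above; the object names counts, row_pct, and ratio are illustrative.

```r
# Counts copied from the crosstab above
counts <- matrix(
  c(137, 35,
     71, 11),
  nrow = 2, byrow = TRUE,
  dimnames = list(Nonmetro = c("Rural", "Urban"),
                  Persistent_Poverty = c("No", "Yes"))
)

# margin = 1 computes proportions within each row, so every row sums to 100
row_pct <- prop.table(counts, margin = 1) * 100
round(row_pct, 1)

# Ratio of the Rural persistent-poverty rate to the Urban rate
ratio <- row_pct["Rural", "Yes"] / row_pct["Urban", "Yes"]
round(ratio, 2)
```

With the real data, adding margin = 1 to the existing prop.table(xtabs(...)) call should give the same row-wise view.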
While there are plenty of other proportional crosstabs that would make for interesting research questions, we’re going to stop here for now and explore two data visualizations.
Due to the categorical nature of this current dataset, we are going to use bar charts for our univariate and bivariate graphs. We’ll focus on Retirement Destinations for both plots.
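The univariate chart comes from a chunk that is not shown, so here is a sketch of how a basic count bar chart could be drawn. The 207 "No" and 47 "Yes" counts are back-calculated from the 81.5% / 18.5% frequency table earlier in this report; retire and p_uni are illustrative names.

```r
library(ggplot2)

# Rebuild the Retirement_Destination column from its counts (207 No, 47 Yes)
retire <- data.frame(
  Retirement_Destination = rep(c("No", "Yes"), times = c(207, 47))
)

# geom_bar() with no y aesthetic counts the rows in each category
p_uni <- ggplot(retire, aes(Retirement_Destination)) +
  geom_bar() +
  labs(x = "Retirement Destination", y = "Number of Counties")
p_uni
```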
Nothing really amazing here. Counties that are not retirement destinations outnumber those that are by a bit more than 4:1 (about 81% to 19%). Let’s add some color to this graph.
ggplot(RuralAtlasData23,
aes(Retirement_Destination,
fill = Nonmetro)) +
geom_bar(position = "stack")
Now we’re getting somewhere! It looks like there are more Urban Retirement Destinations, both in count and in frequency. So older individuals appear to be moving to the city rather than to the countryside.
But this simple bar chart is pretty boring, still. We can look at proportions by editing the position from stack to fill and updating the colors / labels.
ggplot(RuralAtlasData23,
aes(Retirement_Destination,
fill = Nonmetro)) +
geom_bar(position = "fill") +
scale_fill_brewer(palette = "Paired") +
labs(y = "Percent",
x = "Retirement Destination",
title = "More Than Half of All Texas Retirement Destinations are in Urban Counties") +
theme_minimal()
That’s much better. And while a bar chart is still not that exciting of a data visualization, it tells us a little bit about the Retirement Destination column. For the Final Project, I hope to add more categorical data visualization, such as geom_tile and geom_count.
There are of course limitations to this bar chart. We could improve it by showing percentage labels within the bars, detailing what share of counties in each Retirement Destination category are Rural or Urban. The Y axis could also be changed from a decimal to a percentage.
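Both improvements can be sketched by computing the shares first and then plotting them. The counts below are hypothetical and only illustrate the mechanics; they are not the real Texas split. With the real data, count(RuralAtlasData23, Retirement_Destination, Nonmetro) should produce a table of the same shape.

```r
library(dplyr)
library(ggplot2)

# Hypothetical counts for illustration only, not the real Texas figures
toy_counts <- data.frame(
  Retirement_Destination = c("No", "No", "Yes", "Yes"),
  Nonmetro = c("Rural", "Urban", "Rural", "Urban"),
  n = c(30, 10, 4, 6)
)

# Share of Rural and Urban counties within each Retirement_Destination bar
labelled <- toy_counts %>%
  group_by(Retirement_Destination) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

p_lab <- ggplot(labelled, aes(Retirement_Destination, share, fill = Nonmetro)) +
  geom_col() +
  # Centre a percentage label inside each stacked segment
  geom_text(aes(label = scales::percent(share, accuracy = 1)),
            position = position_stack(vjust = 0.5)) +
  # Show the y axis as percentages instead of decimals
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Retirement Destination", y = "Percent")
p_lab
```

Computing the shares up front, rather than relying on position = "fill", keeps the label values and the bar heights tied to the same numbers.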
For readability’s sake, we should choose a palette that works for colorblind readers. Different shades of the same color are easy on the eyes, but that doesn’t help if a colorblind person can’t easily distinguish between the blues! That’s not difficult to fix, but we are aiming for a simple visualization here. Other readability changes could include paring down the grid lines, changing the alpha value, and amending the legend.
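One option is to set the two fill colors by hand. The sketch below uses two colors from the Okabe-Ito palette, which was designed to remain distinguishable under common forms of color blindness. The toy data frame is hypothetical, and okabe_ito and p_cb are illustrative names.

```r
library(ggplot2)

# Orange and blue from the Okabe-Ito palette, keyed by the Nonmetro labels
okabe_ito <- c(Rural = "#E69F00", Urban = "#0072B2")

# Hypothetical rows for illustration only
toy <- data.frame(
  Retirement_Destination = c("No", "No", "No", "Yes", "Yes"),
  Nonmetro = c("Rural", "Rural", "Urban", "Rural", "Urban")
)

p_cb <- ggplot(toy, aes(Retirement_Destination, fill = Nonmetro)) +
  geom_bar(position = "fill") +
  # Map each Nonmetro label to its named color
  scale_fill_manual(values = okabe_ito) +
  theme_minimal()
p_cb
```

Naming the vector elements after the category labels means the colors stay attached to Rural and Urban even if the factor order changes.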
I think this bivariate plot opens the door to more unanswered questions. How do other variables, such as persistent poverty and high amenities, relate to retirement destinations? I will need to explore further categorical data visualizations that incorporate multiple variables to see what we can produce. A purely categorical dataset limits what these plots can show, so it may be fruitful to add numeric variables to the analysis.
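As a starting point for those multi-variable plots, geom_count() draws one point per combination of two categorical variables and sizes it by the number of rows. This is a minimal sketch on hypothetical rows, not real Atlas values; toy and p_count are illustrative names.

```r
library(ggplot2)

# Hypothetical rows for illustration only
toy <- data.frame(
  Retirement_Destination = c("No", "No", "No", "No", "No", "Yes"),
  Persistent_Poverty     = c("No", "No", "No", "Yes", "Yes", "No")
)

# One point per category pair; point area scales with the number of counties
p_count <- ggplot(toy, aes(Retirement_Destination, Persistent_Poverty)) +
  geom_count() +
  scale_size_area(max_size = 12) +
  labs(x = "Retirement Destination", y = "Persistent Poverty", size = "Counties")
p_count
```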