Homework 3

Importing the Data

The data for this project comes from the Atlas of Rural and Small Town America, published by the U.S. Department of Agriculture, Economic Research Service. The dataset provides statistics in broad categories covering various socioeconomic factors, including demographic data from the American Community Survey (ACS), economic data from the Bureau of Labor Statistics, categorical variables (codes) for various county classifications, data on income, and data on veterans.

For this project, we are only going to look at County Classifications. Let’s first import that sheet from the Excel workbook; read_excel() returns it as a tibble.

library(readxl)
library(tidyverse)
library(dplyr)
RuralAtlasData23 <- read_excel("RuralAtlasData23.xlsx", 
    sheet = "County Classifications")

I then convert it to a data frame for easier data wrangling. That code is hidden so as not to pull the entire data frame into view. Instead, we’ll look at the head.

head(RuralAtlasData23)

# A tibble: 6 x 45
  FIPStxt State County  RuralUrbanContinuumCode2~ UrbanInfluenceCode2~
  <chr>   <chr> <chr>                       <dbl>                <dbl>
1 01001   AL    Autauga                         2                    2
2 01003   AL    Baldwin                         3                    2
3 01005   AL    Barbour                         6                    6
4 01007   AL    Bibb                            1                    1
5 01009   AL    Blount                          1                    1
6 01011   AL    Bullock                         6                    6
# ... with 40 more variables: RuralUrbanContinuumCode2003 <dbl>,
#   UrbanInfluenceCode2003 <dbl>, Metro2013 <dbl>,
#   Nonmetro2013 <dbl>, Micropolitan2013 <dbl>,
#   Type_2015_Update <dbl>, Type_2015_Farming_NO <dbl>,
#   Type_2015_Manufacturing_NO <dbl>, Type_2015_Mining_NO <dbl>,
#   Type_2015_Government_NO <dbl>, Type_2015_Recreation_NO <dbl>,
#   Low_Education_2015_update <dbl>,
#   Low_Employment_2015_update <dbl>,
#   Population_loss_2015_update <dbl>,
#   Retirement_Destination_2015_Update <dbl>, Perpov_1980_0711 <dbl>,
#   PersistentChildPoverty_1980_2011 <dbl>, Hipov <dbl>,
#   HiAmenity <dbl>, HiCreativeClass2000 <dbl>, Gas_Change <dbl>,
#   Oil_Change <dbl>, Oil_Gas_Change <dbl>, Metro2003 <dbl>,
#   NonmetroNotAdj2003 <dbl>, NonmetroAdj2003 <dbl>,
#   Noncore2003 <dbl>, EconomicDependence2000 <dbl>,
#   Nonmetro2003 <dbl>, Micropolitan2003 <dbl>,
#   FarmDependent2003 <dbl>, ManufacturingDependent2000 <dbl>,
#   LowEducation2000 <dbl>, RetirementDestination2000 <dbl>,
#   PersistentPoverty2000 <dbl>, Noncore2013 <dbl>,
#   Type_2015_Nonspecialized_NO <dbl>, Metro_Adjacent2013 <dbl>,
#   PersistentChildPoverty2004 <dbl>, RecreationDependent2000 <dbl>

With just a quick glance, we can see a few interesting tidbits.

There are 3,215 rows.
There are 45 columns, or variables.
The first three columns are character columns:

FIPStxt, the county’s unique ID;
State; and
County.

The remaining 42 are all doubles (floating-point numbers).

We will recode some of these variables to characters in a later step.
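The row count and the column types listed above are not all visible from head() alone. Here is a minimal sketch of how they could be checked. It uses a small hand-built tibble, toy, as a stand-in for the full workbook: the first three rows of the head() output, with hypothetical names Code_A and Code_B for the two truncated code columns.

```r
library(dplyr)

# Stand-in for RuralAtlasData23 (assumption: only three rows and five columns,
# with Code_A / Code_B as hypothetical names for the truncated code columns)
toy <- tibble(
  FIPStxt = c("01001", "01003", "01005"),
  State   = c("AL", "AL", "AL"),
  County  = c("Autauga", "Baldwin", "Barbour"),
  Code_A  = c(2, 3, 6),
  Code_B  = c(2, 2, 6)
)

dims        <- dim(toy)             # c(number of rows, number of columns)
col_types   <- sapply(toy, typeof)  # storage type of each column
type_counts <- table(col_types)     # how many columns share each type

print(dims)
print(type_counts)
```

Run against the real RuralAtlasData23, the same three calls should report the dimensions and the character / double split described above.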

Wrangling the Data

Forty-five columns is of course quite a bit to work with. We are going to keep only eight of them for this project: three identifiers (the unique county ID, the state, and the county) and five classification variables. The five variables indicate whether the county is classified as Nonmetro (the county does not have an Urbanized Area or Urbanized Cluster in its jurisdiction), whether it is classified as Micropolitan (population of at least 10,000 but less than 50,000), whether it experienced population loss in the past decade (2005 - 2015), whether it was in persistent poverty over the past three decades (1970 - 2000), and whether it had high natural amenities.

RuralAtlasData23 <- select(RuralAtlasData23, "FIPStxt", 
                           "State", 
                           "County", 
                           "Nonmetro2013", 
                           "Micropolitan2013", 
                           "Population_loss_2015_update", 
                           "PersistentPoverty2000", 
                           "HiAmenity")

Next, let’s rename those long columns into something more digestible.

RuralAtlasData23 <- rename(RuralAtlasData23, 
                           UniqueID = "FIPStxt", 
                           Nonmetro = "Nonmetro2013", 
                           Micropolitan = "Micropolitan2013", 
                           Population_Loss = "Population_loss_2015_update", 
                           Persistent_Poverty = "PersistentPoverty2000")
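As a side note, select() accepts new_name = old_name pairs, so the selecting and renaming steps could be combined. This is only a sketch on a small hypothetical tibble (toy, with an Unused column standing in for the columns we drop), not the code used in this homework.

```r
library(dplyr)

# Hypothetical miniature of the County Classifications sheet
toy <- tibble(
  FIPStxt = c("01001", "01003", "01005"),
  State   = c("AL", "AL", "AL"),
  County  = c("Autauga", "Baldwin", "Barbour"),
  Nonmetro2013 = c(0, 0, 1),
  Population_loss_2015_update = c(0, 0, 0),
  Unused  = c(1, 2, 3)  # stand-in for a column we do not keep
)

# Keep five columns and rename three of them in a single call
slim <- toy %>%
  select(UniqueID = FIPStxt,
         State,
         County,
         Nonmetro = Nonmetro2013,
         Population_Loss = Population_loss_2015_update)

print(names(slim))
```

Keeping the two steps separate, as above, is arguably easier to read; the combined form just saves one reassignment.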

While the columns / variables are now easier to understand, the coded responses are not. We’ll need to recode those 0s and 1s to better reflect what they are identifying.

RuralAtlasData23 <- RuralAtlasData23 %>%
  mutate(Nonmetro = recode(Nonmetro, '0' = "Urban", '1' = "Rural"),
         Micropolitan = recode(Micropolitan, '0' = "No", '1' = "Yes"),
         Population_Loss = recode(Population_Loss, '0' = "No", '1' = "Yes"),
         Persistent_Poverty = recode(Persistent_Poverty, '0' = "No", '1' = "Yes"),
         HiAmenity = recode(HiAmenity, '0' = "No", '1' = "Yes")
         )
head(RuralAtlasData23)

# A tibble: 6 x 8
  UniqueID State County  Nonmetro Micropolitan Population_Loss
  <chr>    <chr> <chr>   <chr>    <chr>        <chr>          
1 01001    AL    Autauga Urban    No           No             
2 01003    AL    Baldwin Urban    No           No             
3 01005    AL    Barbour Rural    No           No             
4 01007    AL    Bibb    Urban    No           No             
5 01009    AL    Blount  Urban    No           No             
6 01011    AL    Bullock Rural    No           No             
# ... with 2 more variables: Persistent_Poverty <chr>,
#   HiAmenity <chr>
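One alternative to recode() that avoids quoting the numeric codes as names is if_else(), applied to several columns at once with across(). The sketch below uses a hypothetical three-row tibble (toy) whose values match the first rows shown above. Note one difference in behavior: here any value other than 1 becomes "No" or "Urban", so it assumes the columns contain only 0, 1, or NA.

```r
library(dplyr)

# Hypothetical miniature of the selected columns, still coded as 0 / 1
toy <- tibble(
  County          = c("Autauga", "Baldwin", "Barbour"),
  Nonmetro        = c(0, 0, 1),
  Micropolitan    = c(0, 0, 0),
  Population_Loss = c(0, 0, 0)
)

recoded <- toy %>%
  mutate(
    # Nonmetro gets its own labels
    Nonmetro = if_else(Nonmetro == 1, "Rural", "Urban"),
    # The remaining indicators all share the same Yes / No labels
    across(c(Micropolitan, Population_Loss),
           ~ if_else(.x == 1, "Yes", "No"))
  )

print(recoded)
```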

As the last step in this data wrangling process, let’s filter out all the states save Texas (my home state!). When exploring geographic units of analysis, it’s often better to home in on a smaller frame to find potentially richer information. While the information is limited by the dataset, I think localizing the data will help us better answer some research questions.

RuralAtlasData23 <- RuralAtlasData23 %>%
  filter(State == "TX")
print(RuralAtlasData23)

# A tibble: 254 x 8
   UniqueID State County    Nonmetro Micropolitan Population_Loss
   <chr>    <chr> <chr>     <chr>    <chr>        <chr>          
 1 48001    TX    Anderson  Rural    Yes          No             
 2 48003    TX    Andrews   Rural    Yes          No             
 3 48005    TX    Angelina  Rural    Yes          No             
 4 48007    TX    Aransas   Urban    No           No             
 5 48009    TX    Archer    Urban    No           No             
 6 48011    TX    Armstrong Urban    No           No             
 7 48013    TX    Atascosa  Urban    No           No             
 8 48015    TX    Austin    Urban    No           No             
 9 48017    TX    Bailey    Rural    No           No             
10 48019    TX    Bandera   Urban    No           No             
# ... with 244 more rows, and 2 more variables:
#   Persistent_Poverty <chr>, HiAmenity <chr>

Research Questions

Now that the data is cleaned and filtered, let’s consider some exploratory research questions.

Did a substantial share of Rural Texas counties (25% or more) experience population loss?
Are Rural Texas counties more likely to experience persistent poverty compared with their Urban counterparts?
Do Rural Texas counties with high amounts of natural amenities consistently experience both (a) population loss and (b) persistent poverty?

For this project, we shall only explore the first question.

Question 1: Rural Population Loss

Let’s select the relevant columns for this question, filter on only Rural counties that experienced population loss, and provide a count.

Question1 <- RuralAtlasData23 %>%
  select("UniqueID", 
         "County",
         "Nonmetro",
         "Population_Loss"
         ) %>%
  filter(Nonmetro == "Rural", 
         Population_Loss == "Yes")
print(Question1)

# A tibble: 38 x 4
   UniqueID County        Nonmetro Population_Loss
   <chr>    <chr>         <chr>    <chr>          
 1 48023    Baylor        Rural    Yes            
 2 48033    Borden        Rural    Yes            
 3 48045    Briscoe       Rural    Yes            
 4 48047    Brooks        Rural    Yes            
 5 48069    Castro        Rural    Yes            
 6 48079    Cochran       Rural    Yes            
 7 48083    Coleman       Rural    Yes            
 8 48087    Collingsworth Rural    Yes            
 9 48101    Cottle        Rural    Yes            
10 48109    Culberson     Rural    Yes            
# ... with 28 more rows

38 Rural Texas counties experienced population loss. There are a total of 172 Rural Texas counties (of 254 total). Some quick math gives the percentage of those that experienced population loss.

(38 / 172) * 100

[1] 22.09302

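About 22% of all Rural Texas counties experienced population loss. This does not meet the 25% threshold set by the research question, so by that standard rural population loss is not substantial. Put another way, roughly 78% of Rural Texas counties were not flagged for population loss, although that alone does not show that they are growing.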
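Rather than typing the counts in by hand, the share could be computed directly from the data with group_by() and summarise(). This is a sketch on a hypothetical six-county tibble (toy, using county labels that appear in the outputs of this document), so its percentages are illustrative and are not the Texas results.

```r
library(dplyr)

# Hypothetical six-county sample; not the full Texas data
toy <- tibble(
  County          = c("Anderson", "Andrews", "Baylor", "Borden", "Aransas", "Carson"),
  Nonmetro        = c("Rural", "Rural", "Rural", "Rural", "Urban", "Urban"),
  Population_Loss = c("No", "No", "Yes", "Yes", "No", "Yes")
)

loss_share <- toy %>%
  group_by(Nonmetro) %>%
  summarise(
    counties  = n(),                                  # counties in the group
    with_loss = sum(Population_Loss == "Yes"),        # counties flagged for loss
    pct_loss  = 100 * mean(Population_Loss == "Yes")  # share flagged, in percent
  )

print(loss_share)
```

Applied to the filtered RuralAtlasData23, the same pipeline would return one row per Nonmetro value, with the count and percentage side by side.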

We could explore a related question by comparing population loss between Rural and Urban counties, and see what percentage of Texas Urban counties experienced population loss. Let’s examine that quickly.

Question1vU <- RuralAtlasData23 %>%
  select("UniqueID", 
         "County",
         "Nonmetro",
         "Population_Loss"
         ) %>%
  filter(Nonmetro == "Urban", 
         Population_Loss == "Yes")
print(Question1vU)

# A tibble: 4 x 4
  UniqueID County Nonmetro Population_Loss
  <chr>    <chr>  <chr>    <chr>          
1 48065    Carson Urban    Yes            
2 48107    Crosby Urban    Yes            
3 48305    Lynn   Urban    Yes            
4 48359    Oldham Urban    Yes

(4 / 82) * 100

[1] 4.878049

There’s much less population loss among Texas Urban counties. Only about 4.9% experienced population loss in the past decade (2005 - 2015). From this we can conclude that, while rural counties have not met the threshold of substantial population loss, the share of rural counties with population loss is roughly 4.5 times the urban share (22.1% versus 4.9%).
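The rural-to-urban comparison can be made explicit by dividing one share by the other. A short check using only the counts reported above (38 of 172 rural counties, 4 of 82 urban counties):

```r
# Counts taken from the outputs above
rural_share <- 38 / 172   # share of rural counties with population loss
urban_share <- 4 / 82     # share of urban counties with population loss

# How many times larger the rural share is than the urban share
ratio <- rural_share / urban_share

print(round(100 * c(rural = rural_share, urban = urban_share), 1))
print(round(ratio, 2))
```

This compares shares of counties, not populations, so it says nothing about how many people are affected in either group.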

There is more we can look at, and for the Final Project, we shall clean up some of this code to iterate more smoothly through the data and provide some visualizations / tables. Until then, this should suffice for Homework 3.