Ranking of Nashville-Adjacent County Districts, by Income

Task 1: This code checks whether the tidyverse package is installed, installs it if it is missing, and then loads it. It then uses the read.csv() function to load the Inc2022.csv data into a data frame named Inc2022, and repeats the process to load the Inc2017.csv data into a data frame named Inc2017.

#Task 1
if(!require(tidyverse))install.packages("tidyverse")
library(tidyverse)

Inc2022 <- read.csv("https://drkblake.com/wp-content/uploads/2024/02/Inc2022.csv")
Inc2017 <- read.csv("https://drkblake.com/wp-content/uploads/2024/02/Inc2017.csv")

Task 2: This code uses the left_join() function to add the columns of the Inc2022 data frame to the Inc2017 data frame, matching rows on the GEOID variable, which serves as the “key column” in both. The combined data frame is stored under the name Income. Finally, the head() function displays the first 10 rows of Income.

#Task 2
Income <- left_join(Inc2017,
                    Inc2022,
                    by = join_by(GEOID == GEOID))
head(Income, 10)
##         GEOID HHInc2017   District          County HHInc2022   Significance
## 1  4702190022     44091 District 1 Cheatham County     59741    Significant
## 2  4702190212     55247 District 2 Cheatham County     73634    Significant
## 3  4702190402     73802 District 3 Cheatham County     89861 Nonsignificant
## 4  4702190592     49731 District 4 Cheatham County     73293    Significant
## 5  4702190782     60793 District 5 Cheatham County     78380    Significant
## 6  4702190972     66750 District 6 Cheatham County     92305    Significant
## 7  4703790038     56258 District 1 Davidson County     75038    Significant
## 8  4703790228     32487 District 2 Davidson County     54346    Significant
## 9  4703790418     49402 District 3 Davidson County     67953    Significant
## 10 4703790608     79381 District 4 Davidson County    106260    Significant
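To illustrate what the join does when a key has no match, here is a minimal sketch using two small made-up tables. The names toy2017, toy2022, and toyJoined and the letter GEOIDs are hypothetical; only the income figures are borrowed from the output above. It uses by = "GEOID", which is an equivalent way of writing the same join condition.

```r
library(dplyr)

# Hypothetical toy tables, not the real files; GEOID "C" has no 2022 row
toy2017 <- data.frame(GEOID = c("A", "B", "C"),
                      HHInc2017 = c(44091, 55247, 73802))
toy2022 <- data.frame(GEOID = c("A", "B"),
                      HHInc2022 = c(59741, 73634))

# left_join() keeps every row of the first table and fills
# unmatched columns from the second table with NA
toyJoined <- left_join(toy2017, toy2022, by = "GEOID")
toyJoined
```

Because left_join() keeps every row of its first argument, any district that appears in Inc2017 but not in Inc2022 would remain in Income with an NA for HHInc2022.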

Task 3: This code adds a new variable, Change, to the Income data frame via the mutate() function. Change equals the value in the HHInc2022 column minus the value in the HHInc2017 column, so a positive value means the district's income rose between 2017 and 2022. Lastly, the head() function is reused to display the first 10 rows of the updated Income data frame.

#Task 3
Income <- Income %>%
  mutate(Change = HHInc2022 - HHInc2017)
head(Income, 10)
##         GEOID HHInc2017   District          County HHInc2022   Significance
## 1  4702190022     44091 District 1 Cheatham County     59741    Significant
## 2  4702190212     55247 District 2 Cheatham County     73634    Significant
## 3  4702190402     73802 District 3 Cheatham County     89861 Nonsignificant
## 4  4702190592     49731 District 4 Cheatham County     73293    Significant
## 5  4702190782     60793 District 5 Cheatham County     78380    Significant
## 6  4702190972     66750 District 6 Cheatham County     92305    Significant
## 7  4703790038     56258 District 1 Davidson County     75038    Significant
## 8  4703790228     32487 District 2 Davidson County     54346    Significant
## 9  4703790418     49402 District 3 Davidson County     67953    Significant
## 10 4703790608     79381 District 4 Davidson County    106260    Significant
##    Change
## 1   15650
## 2   18387
## 3   16059
## 4   23562
## 5   17587
## 6   25555
## 7   18780
## 8   21859
## 9   18551
## 10  26879

Task 4: This code reuses the mutate() function to add a new variable called Level, whose value is assigned by the case_when() function. If a district's HHInc2022 value is greater than or equal to 100,000, the district is labeled “$100k+”; if it is less than 100,000, the district is labeled “<$100k”.

#Task 4
Income <- Income %>%
  mutate(Level = case_when(HHInc2022 >= 100000 ~ "$100k+",
                           HHInc2022 < 100000 ~ "<$100k"))
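The behavior at the boundary can be checked with a small sketch. The toyIncome data frame below is hypothetical and exists only to exercise the two conditions, including a missing value, which the real data may or may not contain.

```r
library(dplyr)

# Hypothetical incomes: just below the cutoff, exactly at it, above it, and missing
toyIncome <- data.frame(HHInc2022 = c(99999, 100000, 106260, NA))

toyIncome <- toyIncome %>%
  mutate(Level = case_when(HHInc2022 >= 100000 ~ "$100k+",
                           HHInc2022 < 100000 ~ "<$100k"))

# A missing income matches neither condition, so its Level is NA
toyIncome$Level
```

A district at exactly 100,000 lands in “$100k+” because the first condition uses >=.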

Task 5: This code uses four functions, three of which transform the Income data frame into a new data frame named LevelByCounty. First, the group_by() function groups Income by the County variable and then, within each County, by the Level variable. Next, the summarize() function counts how many rows (districts) fall in each County and Level group. After that, the pivot_wider() function reorganizes the counts into a table with one row per county and one column per Level value. A county with no districts at a given Level shows NA in that column, as Cheatham County does for “$100k+”. Finally, the head() function displays the first 10 rows of LevelByCounty, which has only seven rows, one per county.

#Task 5
LevelByCounty <- Income %>%
  group_by(County, Level) %>%
  summarize(Count = n()) %>%
  pivot_wider(names_from = Level,
              values_from = Count)
head(LevelByCounty, 10)
## # A tibble: 7 × 3
## # Groups:   County [7]
##   County            `<$100k` `$100k+`
##   <chr>                <int>    <int>
## 1 Cheatham County          6       NA
## 2 Davidson County         29        6
## 3 Robertson County        11        1
## 4 Rutherford County       18        3
## 5 Sumner County           10        2
## 6 Williamson County        2       10
## 7 Wilson County           15       10
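The NA for Cheatham County means that no Cheatham district has the “$100k+” Level, not that data is missing. As a sketch of one way to show a zero instead, the example below runs the same pipeline on a hypothetical four-row data frame (toyIncome) and adds pivot_wider()'s values_fill argument. The .groups = "drop" argument is also an addition here; it only removes the leftover grouping and does not change the counts.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy rows; only County and Level matter for the tally
toyIncome <- data.frame(
  County = c("Cheatham County", "Cheatham County",
             "Davidson County", "Davidson County"),
  Level  = c("<$100k", "<$100k", "<$100k", "$100k+")
)

toyLevelByCounty <- toyIncome %>%
  group_by(County, Level) %>%
  summarize(Count = n(), .groups = "drop") %>%
  pivot_wider(names_from = Level,
              values_from = Count,
              values_fill = 0L)  # show 0 instead of NA for absent combinations
toyLevelByCounty
```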

Task 6: The code for Task 6 is a “return to form” of sorts, going back to making new data frames out of the Income data frame. Income is filtered via the filter() function to keep only the districts whose Level value is “$100k+”, and the result is stored in a new data frame: RichDistricts. The code ends by using the head() function to display the first 10 rows of RichDistricts.

#Task 6
RichDistricts <- filter(Income, Level == "$100k+")
head(RichDistricts, 10)
##         GEOID HHInc2017    District            County HHInc2022   Significance
## 1  4703790608     79381  District 4   Davidson County    106260    Significant
## 2  4703794218     86768 District 23   Davidson County    117474    Significant
## 3  4703794408     79959 District 24   Davidson County    110739    Significant
## 4  4703794598     90998 District 25   Davidson County    120206    Significant
## 5  4703796460    128421 District 34   Davidson County    161370    Significant
## 6  4703796650     99666 District 35   Davidson County    117294    Significant
## 7  4714791098     65179  District 6  Robertson County    106058    Significant
## 8  4714991480     75875  District 8 Rutherford County    102862    Significant
## 9  4714991670     77481  District 9 Rutherford County    103006    Significant
## 10 4714992620     95708 District 14 Rutherford County    105750 Nonsignificant
##    Change  Level
## 1   26879 $100k+
## 2   30706 $100k+
## 3   30780 $100k+
## 4   29208 $100k+
## 5   32949 $100k+
## 6   17628 $100k+
## 7   40879 $100k+
## 8   26987 $100k+
## 9   25525 $100k+
## 10  10042 $100k+

Task 7: This code takes the RichDistricts data frame and uses the arrange() function to sort its rows by one of its variables. In this case, the desc() function indicates that the rows should be sorted in descending order of the HHInc2022 variable, so the highest-income district comes first. Finally, the head() function is reused to display the first 10 rows of the newly arranged RichDistricts.

#Task 7
RichDistricts <- arrange(RichDistricts, desc(HHInc2022))
head(RichDistricts, 10)
##         GEOID HHInc2017    District            County HHInc2022   Significance
## 1  4718791328    150106  District 7 Williamson County    181709    Significant
## 2  4718791138    154149  District 6 Williamson County    178665    Significant
## 3  4703796460    128421 District 34   Davidson County    161370    Significant
## 4  4718790948    121622  District 5 Williamson County    159737    Significant
## 5  4718791708    117526  District 9 Williamson County    144924    Significant
## 6  4718993040     74528 District 16     Wilson County    125785    Significant
## 7  4718990570     80577  District 3     Wilson County    125324    Significant
## 8  4718790758    127552  District 4 Williamson County    124237 Nonsignificant
## 9  4718790378    101687  District 2 Williamson County    123609    Significant
## 10 4718990380     98846  District 2     Wilson County    120302    Significant
##    Change  Level
## 1   31603 $100k+
## 2   24516 $100k+
## 3   32949 $100k+
## 4   38115 $100k+
## 5   27398 $100k+
## 6   51257 $100k+
## 7   44747 $100k+
## 8   -3315 $100k+
## 9   21922 $100k+
## 10  21456 $100k+
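Tasks 6 and 7 can also be chained into a single pipeline. The sketch below does so on a hypothetical four-row data frame (toyIncome) whose figures are copied from the output above. It filters on HHInc2022 directly rather than on Level, which selects the same rows given how Level was defined in Task 4.

```r
library(dplyr)

# Hypothetical subset of districts; values copied from the printed output
toyIncome <- data.frame(
  District  = c("District 4", "District 7", "District 34", "District 1"),
  County    = c("Davidson County", "Williamson County",
                "Davidson County", "Cheatham County"),
  HHInc2022 = c(106260, 181709, 161370, 59741)
)

toyRich <- toyIncome %>%
  filter(HHInc2022 >= 100000) %>%   # keep the $100k+ districts
  arrange(desc(HHInc2022))          # highest 2022 income first
toyRich
```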

Task 8: I see two notable patterns in this data. First, the only county without a single district in the RichDistricts data frame is Cheatham County, which suggests, though does not prove, that Cheatham may be the least affluent county in the area. Second, and more interestingly, only one district in all of RichDistricts had a negative Change value: District 4 in Williamson County, with a Change of -3315. It is also one of four districts whose Change was “Nonsignificant”. Despite the decline, it still had the eighth-highest HHInc2022 value.
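Observations like these can be checked directly rather than read off the printed rows. The sketch below shows one possible approach on a hypothetical three-row stand-in for RichDistricts (toyRich, with values copied from the output above); running the same two calls on the real RichDistricts data frame would be needed to confirm the counts cited in Task 8.

```r
library(dplyr)

# Hypothetical stand-in for RichDistricts; values copied from the printed output
toyRich <- data.frame(
  District     = c("District 4", "District 7", "District 14"),
  County       = c("Williamson County", "Williamson County", "Rutherford County"),
  HHInc2022    = c(124237, 181709, 105750),
  Change       = c(-3315, 31603, 10042),
  Significance = c("Nonsignificant", "Significant", "Nonsignificant")
)

# Districts whose income fell between 2017 and 2022
toyDeclines <- filter(toyRich, Change < 0)

# Number of districts at each Significance value
toySigCounts <- count(toyRich, Significance)

toyDeclines
toySigCounts
```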