Background

Apart of the 2024 STAR Scholars Research Program at Drexel University, this markdown showcases the step-by-step process as to how the final analytic data set was generated, using the data provided from Urban Institute’s Upward Mobility Framework (https://upward-mobility.urban.org/mobility-metrics-framework). Ultimately, this data set will contain the most recent data measured for each metric.

This is a key component in the research objective of finding the correlation between the framework’s metrics within each pillar across all the US counties.

Examining the Raw Data

First thing to do is to load the data into R as shown below. Ultimately, the raw data set should have 28,281 observations and 91 variables.

my_data <- read.csv("https://urban-data-catalog.s3.amazonaws.com/drupal-root-live/2023/04/28/00_mobility-metrics_longitudinal.csv")

The next thing to do is to examine the data. That means either getting the first few rows of the data, getting the structure, or getting the summary statistics of the data.

# getting the first few rows
head(my_data)
##   year state county state_name    county_name share_in_preschool
## 1 2014     1      1    Alabama Autauga County                 NA
## 2 2014     1      3    Alabama Baldwin County                 NA
## 3 2014     1      5    Alabama Barbour County                 NA
## 4 2014     1      7    Alabama    Bibb County                 NA
## 5 2014     1      9    Alabama  Blount County                 NA
## 6 2014     1     11    Alabama Bullock County                 NA
##   share_in_preschool_quality rate_learning rate_learning_lb rate_learning_ub
## 1                         NA     0.9788868        0.8690220        1.0887516
## 2                         NA     0.7482612        0.6836774        0.8128450
## 3                         NA     1.0440179        0.8648752        1.2231606
## 4                         NA     0.9113139        0.7220252        1.1006026
## 5                         NA     0.8065096        0.6962805        0.9167386
## 6                         NA     0.5768345        0.2778593        0.8758097
##   rate_learning_quality meps20_black meps20_black_quality meps20_hispanic
## 1                     1    0.3640572                    1       0.2390438
## 2                     1    0.4734082                    1       0.6186122
## 3                     1    0.9208135                    1       0.9242424
## 4                     1    1.0000000                    1       1.0000000
## 5                     1    0.5031447                    1       0.8573508
## 6                     1    1.0000000                    1       1.0000000
##   meps20_hispanic_quality meps20_white meps20_white_quality share_hs_degree
## 1                       1    0.3246145                    1              NA
## 2                       1    0.3776975                    1              NA
## 3                       1    0.9927471                    1              NA
## 4                       1    1.0000000                    1              NA
## 5                       1    0.5788806                    1              NA
## 6                       1    1.0000000                    2              NA
##   share_hs_degree_quality share_digital_access share_digital_access_quality
## 1                      NA                   NA                           NA
## 2                      NA                   NA                           NA
## 3                      NA                   NA                           NA
## 4                      NA                   NA                           NA
## 5                      NA                   NA                           NA
## 6                      NA                   NA                           NA
##   pctl_income_20 pctl_income_50 pctl_income_80 pctl_income_quality
## 1             NA             NA             NA                  NA
## 2             NA             NA             NA                  NA
## 3             NA             NA             NA                  NA
## 4             NA             NA             NA                  NA
## 5             NA             NA             NA                  NA
## 6             NA             NA             NA                  NA
##   share_debt_col share_debt_col_quality share_debt_col_lb share_debt_col_ub
## 1             NA                     NA                NA                NA
## 2             NA                     NA                NA                NA
## 3             NA                     NA                NA                NA
## 4             NA                     NA                NA                NA
## 5             NA                     NA                NA                NA
## 6             NA                     NA                NA                NA
##   ratio_black_nh_house_value_households
## 1                           19.0%:27.2%
## 2                             3.9%:8.2%
## 3                           31.7%:50.8%
## 4                           39.7%:58.2%
## 5                             2.0%:7.3%
## 6                           31.7%:50.8%
##   ratio_black_nh_house_value_households_quality
## 1                                             3
## 2                                             3
## 3                                             3
## 4                                             3
## 5                                             3
## 6                                             3
##   ratio_hispanic_house_value_households
## 1                             1.4%:2.3%
## 2                             1.3%:3.3%
## 3                             4.3%:1.5%
## 4                             0.3%:0.5%
## 5                             0.8%:3.4%
## 6                             4.3%:1.5%
##   ratio_hispanic_house_value_households_quality
## 1                                             3
## 2                                             3
## 3                                             3
## 4                                             3
## 5                                             3
## 6                                             3
##   ratio_other_nh_house_value_households
## 1                             1.5%:1.9%
## 2                             1.1%:1.7%
## 3                             0.8%:0.9%
## 4                             0.3%:0.6%
## 5                             0.6%:0.9%
## 6                             0.8%:0.9%
##   ratio_other_nh_house_value_households_quality
## 1                                             3
## 2                                             3
## 3                                             3
## 4                                             3
## 5                                             3
## 6                                             3
##   ratio_white_nh_house_value_households
## 1                           78.1%:68.7%
## 2                           93.7%:86.8%
## 3                           63.1%:46.7%
## 4                           59.7%:40.8%
## 5                           96.5%:88.5%
## 6                           63.1%:46.7%
##   ratio_white_nh_house_value_households_quality ratio_population_pc_physician
## 1                                             3                          <NA>
## 2                                             1                          <NA>
## 3                                             3                          <NA>
## 4                                             3                          <NA>
## 5                                             2                          <NA>
## 6                                             3                          <NA>
##   ratio_population_pc_physician_quality rate_low_birth_weight
## 1                                    NA                    NA
## 2                                    NA                    NA
## 3                                    NA                    NA
## 4                                    NA                    NA
## 5                                    NA                    NA
## 6                                    NA                    NA
##   rate_low_birth_weight_lb rate_low_birth_weight_ub
## 1                       NA                       NA
## 2                       NA                       NA
## 3                       NA                       NA
## 4                       NA                       NA
## 5                       NA                       NA
## 6                       NA                       NA
##   rate_low_birth_weight_quality rate_injury_death rate_injury_death_lb
## 1                            NA                NA                   NA
## 2                            NA                NA                   NA
## 3                            NA                NA                   NA
## 4                            NA                NA                   NA
## 5                            NA                NA                   NA
## 6                            NA                NA                   NA
##   rate_injury_death_ub rate_injury_death_quality share_affordable_80_ami
## 1                   NA                        NA                1.673513
## 2                   NA                        NA                1.928290
## 3                   NA                        NA                1.565417
## 4                   NA                        NA                1.316445
## 5                   NA                        NA                1.643577
## 6                   NA                        NA                1.565417
##   share_affordable_50_ami share_affordable_30_ami share_affordable_quality
## 1                1.651920                1.674228                        2
## 2                2.311711                2.630169                        1
## 3                1.647200                1.647200                        2
## 4                1.381865                1.469404                        2
## 5                1.749218                1.542356                        2
## 6                1.647200                1.647200                        2
##   count_homeless count_homeless_lb count_homeless_ub share_homeless
## 1             73                73                73    0.007553808
## 2            356               356               356    0.011635507
## 3              1                 0                 2    0.001081081
## 4             31                31                31    0.009234436
## 5            179               179               179    0.018710149
## 6              7                 7                 7    0.004608295
##   homeless_quality share_election_turnout share_election_turnout_quality
## 1                1                     NA                             NA
## 2                1                     NA                             NA
## 3                3                     NA                             NA
## 4                1                     NA                             NA
## 5                1                     NA                             NA
## 6                1                     NA                             NA
##   share_desc_rep_asian_other share_desc_rep_black_nonhispanic
## 1                         NA                               NA
## 2                         NA                               NA
## 3                         NA                               NA
## 4                         NA                               NA
## 5                         NA                               NA
## 6                         NA                               NA
##   share_desc_rep_hispanic share_desc_rep_white_nonhispanic
## 1                      NA                               NA
## 2                      NA                               NA
## 3                      NA                               NA
## 4                      NA                               NA
## 5                      NA                               NA
## 6                      NA                               NA
##   share_poverty_exposure share_poverty_exposure_quality share_black_nh_exposure
## 1                     NA                             NA                      NA
## 2                     NA                             NA                      NA
## 3                     NA                             NA                      NA
## 4                     NA                             NA                      NA
## 5                     NA                             NA                      NA
## 6                     NA                             NA                      NA
##   share_black_nh_exposure_quality share_hispanic_exposure
## 1                              NA                      NA
## 2                              NA                      NA
## 3                              NA                      NA
## 4                              NA                      NA
## 5                              NA                      NA
## 6                              NA                      NA
##   share_hispanic_exposure_quality share_other_nh_exposure
## 1                              NA                      NA
## 2                              NA                      NA
## 3                              NA                      NA
## 4                              NA                      NA
## 5                              NA                      NA
## 6                              NA                      NA
##   share_other_nh_exposure_quality share_white_nh_exposure
## 1                              NA                      NA
## 2                              NA                      NA
## 3                              NA                      NA
## 4                              NA                      NA
## 5                              NA                      NA
## 6                              NA                      NA
##   share_white_nh_exposure_quality transportation_cost index_transit_trips
## 1                              NA                  NA                  NA
## 2                              NA                  NA                  NA
## 3                              NA                  NA                  NA
## 4                              NA                  NA                  NA
## 5                              NA                  NA                  NA
## 6                              NA                  NA                  NA
##   transportation_cost_quality index_transit_trips_quality index_air_quality
## 1                          NA                          NA                 6
## 2                          NA                          NA                38
## 3                          NA                          NA                 9
## 4                          NA                          NA                12
## 5                          NA                          NA                26
## 6                          NA                          NA                 6
##   index_air_quality_quality ratio_high_low_ses_fb_friends
## 1                         1                            NA
## 2                         1                            NA
## 3                         1                            NA
## 4                         1                            NA
## 5                         1                            NA
## 6                         1                            NA
##   ratio_high_low_ses_fb_friends_quality count_membership_associations_per_10k
## 1                                    NA                                    NA
## 2                                    NA                                    NA
## 3                                    NA                                    NA
## 4                                    NA                                    NA
## 5                                    NA                                    NA
## 6                                    NA                                    NA
##   count_membership_associations_per_10k_quality rate_violent_crime
## 1                                            NA                 NA
## 2                                            NA                 NA
## 3                                            NA                 NA
## 4                                            NA                 NA
## 5                                            NA                 NA
## 6                                            NA                 NA
##   rate_property_crime rate_crime_quality rate_juv_arrest
## 1                  NA                 NA              NA
## 2                  NA                 NA              NA
## 3                  NA                 NA              NA
## 4                  NA                 NA              NA
## 5                  NA                 NA              NA
## 6                  NA                 NA              NA
##   rate_juv_arrest_quality share_employed share_employed_quality
## 1                      NA             NA                     NA
## 2                      NA             NA                     NA
## 3                      NA             NA                     NA
## 4                      NA             NA                     NA
## 5                      NA             NA                     NA
## 6                      NA             NA                     NA
##   ratio_average_to_living_wage ratio_average_to_living_wage_quality
## 1                    0.6283695                                    1
## 2                    0.5996189                                    1
## 3                    0.6486415                                    1
## 4                    0.6772858                                    1
## 5                    0.5960493                                    1
## 6                    0.6246241                                    1
# getting the structure of data
str(my_data)
## 'data.frame':    28281 obs. of  91 variables:
##  $ year                                         : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
##  $ state                                        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ county                                       : int  1 3 5 7 9 11 13 15 17 19 ...
##  $ state_name                                   : chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
##  $ county_name                                  : chr  "Autauga County" "Baldwin County" "Barbour County" "Bibb County" ...
##  $ share_in_preschool                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_in_preschool_quality                   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_learning                                : num  0.979 0.748 1.044 0.911 0.807 ...
##  $ rate_learning_lb                             : num  0.869 0.684 0.865 0.722 0.696 ...
##  $ rate_learning_ub                             : num  1.089 0.813 1.223 1.101 0.917 ...
##  $ rate_learning_quality                        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ meps20_black                                 : num  0.364 0.473 0.921 1 0.503 ...
##  $ meps20_black_quality                         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ meps20_hispanic                              : num  0.239 0.619 0.924 1 0.857 ...
##  $ meps20_hispanic_quality                      : int  1 1 1 1 1 1 2 1 1 1 ...
##  $ meps20_white                                 : num  0.325 0.378 0.993 1 0.579 ...
##  $ meps20_white_quality                         : int  1 1 1 1 1 2 1 1 1 1 ...
##  $ share_hs_degree                              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_hs_degree_quality                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_digital_access                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_digital_access_quality                 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ pctl_income_20                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ pctl_income_50                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ pctl_income_80                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ pctl_income_quality                          : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_debt_col                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_debt_col_quality                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_debt_col_lb                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_debt_col_ub                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ ratio_black_nh_house_value_households        : chr  "19.0%:27.2%" "3.9%:8.2%" "31.7%:50.8%" "39.7%:58.2%" ...
##  $ ratio_black_nh_house_value_households_quality: int  3 3 3 3 3 3 3 1 3 3 ...
##  $ ratio_hispanic_house_value_households        : chr  "1.4%:2.3%" "1.3%:3.3%" "4.3%:1.5%" "0.3%:0.5%" ...
##  $ ratio_hispanic_house_value_households_quality: int  3 3 3 3 3 3 3 3 3 3 ...
##  $ ratio_other_nh_house_value_households        : chr  "1.5%:1.9%" "1.1%:1.7%" "0.8%:0.9%" "0.3%:0.6%" ...
##  $ ratio_other_nh_house_value_households_quality: int  3 3 3 3 3 3 3 3 3 3 ...
##  $ ratio_white_nh_house_value_households        : chr  "78.1%:68.7%" "93.7%:86.8%" "63.1%:46.7%" "59.7%:40.8%" ...
##  $ ratio_white_nh_house_value_households_quality: int  3 1 3 3 2 3 3 1 3 3 ...
##  $ ratio_population_pc_physician                : chr  NA NA NA NA ...
##  $ ratio_population_pc_physician_quality        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_low_birth_weight                        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_low_birth_weight_lb                     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_low_birth_weight_ub                     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_low_birth_weight_quality                : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_injury_death                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_injury_death_lb                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_injury_death_ub                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_injury_death_quality                    : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_affordable_80_ami                      : num  1.67 1.93 1.57 1.32 1.64 ...
##  $ share_affordable_50_ami                      : num  1.65 2.31 1.65 1.38 1.75 ...
##  $ share_affordable_30_ami                      : num  1.67 2.63 1.65 1.47 1.54 ...
##  $ share_affordable_quality                     : int  2 1 2 2 2 2 2 1 2 2 ...
##  $ count_homeless                               : int  73 356 1 31 179 7 189 828 33 456 ...
##  $ count_homeless_lb                            : int  73 356 0 31 179 7 189 827 33 456 ...
##  $ count_homeless_ub                            : int  73 356 2 31 179 7 189 829 33 456 ...
##  $ share_homeless                               : num  0.00755 0.01164 0.00108 0.00923 0.01871 ...
##  $ homeless_quality                             : int  1 1 3 1 1 1 1 1 1 1 ...
##  $ share_election_turnout                       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_election_turnout_quality               : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_desc_rep_asian_other                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_desc_rep_black_nonhispanic             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_desc_rep_hispanic                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_desc_rep_white_nonhispanic             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_poverty_exposure                       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_poverty_exposure_quality               : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_black_nh_exposure                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_black_nh_exposure_quality              : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_hispanic_exposure                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_hispanic_exposure_quality              : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_other_nh_exposure                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_other_nh_exposure_quality              : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_white_nh_exposure                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_white_nh_exposure_quality              : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ transportation_cost                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ index_transit_trips                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ transportation_cost_quality                  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ index_transit_trips_quality                  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ index_air_quality                            : int  6 38 9 12 26 6 14 16 12 24 ...
##  $ index_air_quality_quality                    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ ratio_high_low_ses_fb_friends                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ ratio_high_low_ses_fb_friends_quality        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ count_membership_associations_per_10k        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ count_membership_associations_per_10k_quality: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_violent_crime                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_property_crime                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_crime_quality                           : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_juv_arrest                              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_juv_arrest_quality                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_employed                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_employed_quality                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ ratio_average_to_living_wage                 : num  0.628 0.6 0.649 0.677 0.596 ...
##  $ ratio_average_to_living_wage_quality         : int  1 1 1 1 1 1 1 1 1 1 ...
# summary statistics with every year in the data frame 
summary(my_data$year)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2014    2016    2018    2018    2020    2022

Then, you would also look for the most recent year within this data set which should be 2022. This would be the most recent data that you are working with.

# looking at the most recent year, which is 2022
max(my_data$year)
## [1] 2022

However, not every metric was measured in 2022, so instead you will collect data from the most recent year that the metric was measured in. To start, the data dictionary provided by the Urban Institute (https://ui-research.github.io/mobility-from-poverty/00_mobility-metrics_longitudinal.html) should be utilized to determine the most recent year that each metric was measured in.

Extracting Specific Data and Deleting Unneccesary Data

Though every aspect of this data set is vital, we are only working with the most recent data. So the goal is to filter out the data for each metric that was not recently measured within a given year.

Here, we are extracting the data for 2018, creating a separate data frame for the 2018 data, and getting its structure. (Note: the order of which years you are extracting data from does not matter. Also, the “observations” can be interpreted as counties, while the “variables” can be interpreted as the metrics within each pillar of the framework)

# extracting data for year 2018
my_data.2018 <- my_data[my_data$year == 2018, ]

# looking at the structure for the 2018 data
str(my_data.2018)
## 'data.frame':    3142 obs. of  91 variables:
##  $ year                                         : int  2018 2018 2018 2018 2018 2018 2018 2018 2018 2018 ...
##  $ state                                        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ county                                       : int  1 3 5 7 9 11 13 15 17 19 ...
##  $ state_name                                   : chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
##  $ county_name                                  : chr  "Autauga County" "Baldwin County" "Barbour County" "Bibb County" ...
##  $ share_in_preschool                           : num  0.194 0.514 0.349 0.456 0.231 ...
##  $ share_in_preschool_quality                   : int  2 3 3 3 3 3 2 3 3 3 ...
##  $ rate_learning                                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_learning_lb                             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_learning_ub                             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_learning_quality                        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ meps20_black                                 : num  0.4 0.39 0.717 0.881 0.196 ...
##  $ meps20_black_quality                         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ meps20_hispanic                              : num  0.333 0.471 0.744 0.636 0.566 ...
##  $ meps20_hispanic_quality                      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ meps20_white                                 : num  0.402 0.233 0.33 0.554 0.398 ...
##  $ meps20_white_quality                         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ share_hs_degree                              : num  0.853 0.812 0.843 0.889 0.904 ...
##  $ share_hs_degree_quality                      : int  3 3 3 3 3 3 2 3 3 3 ...
##  $ share_digital_access                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_digital_access_quality                 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ pctl_income_20                               : num  20161 25327 12157 10131 22592 ...
##  $ pctl_income_50                               : num  53694 55315 34040 28164 52468 ...
##  $ pctl_income_80                               : num  102323 112960 75982 73247 95940 ...
##  $ pctl_income_quality                          : int  2 1 2 2 2 2 2 1 2 2 ...
##  $ share_debt_col                               : num  0.36 0.283 0.421 0.434 0.342 ...
##  $ share_debt_col_quality                       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ share_debt_col_lb                            : num  0.329 0.268 0.371 0.378 0.311 ...
##  $ share_debt_col_ub                            : num  0.391 0.298 0.471 0.49 0.374 ...
##  $ ratio_black_nh_house_value_households        : chr  "17.4%:24.6%" "3.7%:7.9%" "37.4%:51.7%" "39.5%:56.4%" ...
##  $ ratio_black_nh_house_value_households_quality: int  3 3 3 3 3 3 3 1 3 3 ...
##  $ ratio_hispanic_house_value_households        : chr  "0.1%:1.5%" "1.3%:3.4%" "1.5%:2.6%" "0.1%:0.0%" ...
##  $ ratio_hispanic_house_value_households_quality: int  3 3 3 3 3 3 3 3 3 3 ...
##  $ ratio_other_nh_house_value_households        : chr  "0.2%:1.5%" "2.7%:2.6%" "3.0%:3.4%" "2.0%:1.1%" ...
##  $ ratio_other_nh_house_value_households_quality: int  3 3 3 3 3 3 3 3 3 3 ...
##  $ ratio_white_nh_house_value_households        : chr  "82.2%:72.4%" "92.2%:86.0%" "58.0%:42.2%" "58.4%:42.5%" ...
##  $ ratio_white_nh_house_value_households_quality: int  3 1 3 3 2 3 3 1 3 3 ...
##  $ ratio_population_pc_physician                : chr  NA NA NA NA ...
##  $ ratio_population_pc_physician_quality        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_low_birth_weight                        : num  0.1007 0.0857 0.1007 0.1007 0.1007 ...
##  $ rate_low_birth_weight_lb                     : num  0.0968 0.0742 0.0968 0.0968 0.0968 ...
##  $ rate_low_birth_weight_ub                     : num  0.1047 0.0971 0.1047 0.1047 0.1047 ...
##  $ rate_low_birth_weight_quality                : int  3 1 3 3 3 3 3 1 3 3 ...
##  $ rate_injury_death                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_injury_death_lb                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_injury_death_ub                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_injury_death_quality                    : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_affordable_80_ami                      : num  1.68 1.77 1.52 1.37 1.62 ...
##  $ share_affordable_50_ami                      : num  1.58 1.49 1.36 1.51 1.56 ...
##  $ share_affordable_30_ami                      : num  1.53 1.92 1.36 1.39 1.76 ...
##  $ share_affordable_quality                     : int  2 1 2 2 2 2 2 1 2 2 ...
##  $ count_homeless                               : int  58 229 32 29 127 7 86 281 9 262 ...
##  $ count_homeless_lb                            : int  58 229 32 29 127 7 86 281 9 262 ...
##  $ count_homeless_ub                            : int  58 229 32 29 127 7 86 281 9 262 ...
##  $ share_homeless                               : num  0.00638 0.0071 0.0059 0.00896 0.0166 ...
##  $ homeless_quality                             : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ share_election_turnout                       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_election_turnout_quality               : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_desc_rep_asian_other                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_desc_rep_black_nonhispanic             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_desc_rep_hispanic                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_desc_rep_white_nonhispanic             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_poverty_exposure                       : num  0 0 0 0 0 ...
##  $ share_poverty_exposure_quality               : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ share_black_nh_exposure                      : num  0.74 0.8 0.49 0.615 0.97 ...
##  $ share_black_nh_exposure_quality              : int  1 1 1 1 2 1 1 1 1 1 ...
##  $ share_hispanic_exposure                      : num  0.929 0.924 0.935 0.964 0.852 ...
##  $ share_hispanic_exposure_quality              : int  2 1 1 2 1 3 3 1 2 2 ...
##  $ share_other_nh_exposure                      : num  0.958 0.958 0.97 0.987 0.972 ...
##  $ share_other_nh_exposure_quality              : int  1 1 2 2 1 3 2 1 1 2 ...
##  $ share_white_nh_exposure                      : num  0.233 0.158 0.51 0.198 0.121 ...
##  $ share_white_nh_exposure_quality              : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ transportation_cost                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ index_transit_trips                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ transportation_cost_quality                  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ index_transit_trips_quality                  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ index_air_quality                            : int  18 46 18 18 26 15 26 24 15 24 ...
##  $ index_air_quality_quality                    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ ratio_high_low_ses_fb_friends                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ ratio_high_low_ses_fb_friends_quality        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ count_membership_associations_per_10k        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ count_membership_associations_per_10k_quality: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_violent_crime                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_property_crime                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_crime_quality                           : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_juv_arrest                              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_juv_arrest_quality                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_employed                               : num  0.763 0.816 0.641 0.603 0.721 ...
##  $ share_employed_quality                       : int  2 1 2 2 2 2 2 1 2 2 ...
##  $ ratio_average_to_living_wage                 : num  0.665 0.628 0.687 0.687 0.61 ...
##  $ ratio_average_to_living_wage_quality         : int  1 1 1 1 1 1 1 1 1 1 ...

Then, we delete the metrics with data that is NOT recently measured in 2018. So all the metrics listed below will have data that is not recently measured in 2018. This data frame should have 3,142 observations (or counties) with 19 variables after cleaning the data.

# deleting data from variables that were NOT most recently measured in 2018
my_data.2018$rate_learning <- NULL
my_data.2018$rate_learning_lb <- NULL
my_data.2018$rate_learning_quality <- NULL
my_data.2018$rate_learning_ub <- NULL
my_data.2018$share_affordable_30_ami <- NULL
my_data.2018$share_affordable_50_ami<- NULL
my_data.2018$share_affordable_80_ami<- NULL
my_data.2018$count_homeless <- NULL
my_data.2018$share_homeless <- NULL
my_data.2018$count_homeless_lb <- NULL
my_data.2018$count_homeless_ub <- NULL
my_data.2018$homeless_quality <- NULL
my_data.2018$share_poverty_exposure <- NULL
my_data.2018$share_poverty_exposure_quality <- NULL
my_data.2018$share_black_nh_exposure <- NULL
my_data.2018$share_black_nh_exposure_quality <- NULL
my_data.2018$share_hispanic_exposure <- NULL
my_data.2018$share_hispanic_exposure_quality <- NULL
my_data.2018$share_other_nh_exposure <- NULL
my_data.2018$share_other_nh_exposure_quality <- NULL
my_data.2018$share_white_nh_exposure <- NULL
my_data.2018$share_white_nh_exposure_quality <- NULL
my_data.2018$share_in_preschool <- NULL
my_data.2018$share_in_preschool_quality <- NULL
my_data.2018$share_hs_degree <- NULL
my_data.2018$share_hs_degree_quality <- NULL
my_data.2018$ratio_average_to_living_wage <- NULL
my_data.2018$ratio_average_to_living_wage_quality <-NULL
my_data.2018$share_debt_col <- NULL
my_data.2018$share_debt_col_quality <- NULL
my_data.2018$share_debt_col_lb <- NULL
my_data.2018$share_debt_col_ub <- NULL
my_data.2018$ratio_black_nh_house_value_households <- NULL
my_data.2018$ratio_black_nh_house_value_households_quality <- NULL
my_data.2018$ratio_hispanic_house_value_households <- NULL
my_data.2018$ratio_hispanic_house_value_households_quality <- NULL
my_data.2018$ratio_other_nh_house_value_households <- NULL
my_data.2018$ratio_other_nh_house_value_households_quality <- NULL
my_data.2018$ratio_white_nh_house_value_households <- NULL
my_data.2018$ratio_white_nh_house_value_households_quality <- NULL
my_data.2018$rate_low_birth_weight <- NULL
my_data.2018$rate_low_birth_weight_lb <- NULL
my_data.2018$rate_low_birth_weight_ub <- NULL
my_data.2018$rate_low_birth_weight_quality <- NULL
my_data.2018$count_membership_associations_per_10k <- NULL
my_data.2018$count_membership_associations_per_10k_quality <- NULL
my_data.2018$ratio_high_low_ses_fb_friends <- NULL
my_data.2018$ratio_high_low_ses_fb_friends_quality <- NULL
my_data.2018$transportation_cost <- NULL
my_data.2018$transportation_cost_quality <- NULL  
my_data.2018$index_transit_trips <- NULL 
my_data.2018$index_transit_trips_quality <- NULL
my_data.2018$share_digital_access <- NULL
my_data.2018$share_digital_access_quality <- NULL
my_data.2018$ratio_population_pc_physician <- NULL 
my_data.2018$ratio_population_pc_physician_quality <- NULL  
my_data.2018$rate_injury_death <- NULL  
my_data.2018$rate_injury_death_lb <- NULL
my_data.2018$rate_injury_death_ub <- NULL
my_data.2018$rate_injury_death_quality <- NULL  
my_data.2018$share_election_turnout <- NULL
my_data.2018$share_election_turnout_quality <- NULL
my_data.2018$share_desc_rep_asian_other <- NULL
my_data.2018$share_desc_rep_black_nonhispanic <- NULL
my_data.2018$share_desc_rep_hispanic <- NULL
my_data.2018$share_desc_rep_white_nonhispanic <- NULL
my_data.2018$share_desc_rep_asian_other_quality <- NULL
my_data.2018$rate_violent_crime <- NULL
my_data.2018$rate_property_crime <- NULL
my_data.2018$rate_crime_quality <- NULL
my_data.2018$rate_juv_arrest <- NULL
my_data.2018$rate_juv_arrest_quality <- NULL
my_data.2018$share_affordable_quality <- NULL

That way, the data that was most recently measured in 2018, remains.

For the following years covered, the process is overall the same. Here, we are now extracting data from 2022.

# extracting data for 2022 from the main data frame
my_data.2022 <- my_data[my_data$year == 2022, ]

Then, we are deleting data that was not recently measured in 2022, so the data recently measured in 2020 for each metric remains. The data frame for the 2022 data should have 3,143 observations and 11 variables after cleaning the data.

# deleting data from columns that were not most recently measured in 2022
my_data.2022$rate_violent_crime <- NULL
my_data.2022$rate_property_crime <- NULL
my_data.2022$rate_crime_quality <- NULL
my_data.2022$rate_juv_arrest <- NULL
my_data.2022$rate_juv_arrest_quality <- NULL
my_data.2022$share_affordable_quality <- NULL
my_data.2022$ratio_black_nh_house_value_households <- NULL
my_data.2022$ratio_black_nh_house_value_households_quality <- NULL
my_data.2022$ratio_hispanic_house_value_households <- NULL
my_data.2022$ratio_hispanic_house_value_households_quality <- NULL
my_data.2022$ratio_other_nh_house_value_households <- NULL
my_data.2022$ratio_other_nh_house_value_households_quality <- NULL
my_data.2022$ratio_white_nh_house_value_households <- NULL
my_data.2022$ratio_white_nh_house_value_households_quality <- NULL
my_data.2022$rate_low_birth_weight <- NULL
my_data.2022$rate_low_birth_weight_lb <- NULL
my_data.2022$rate_low_birth_weight_ub <- NULL
my_data.2022$rate_low_birth_weight_quality <- NULL
my_data.2022$count_membership_associations_per_10k <- NULL
my_data.2022$count_membership_associations_per_10k_quality <- NULL

my_data.2022$transportation_cost <- NULL
my_data.2022$transportation_cost_quality <- NULL  
my_data.2022$index_transit_trips <- NULL 
my_data.2022$index_transit_trips_quality <- NULL
my_data.2022$share_digital_access <- NULL
my_data.2022$share_digital_access_quality <- NULL
my_data.2022$ratio_population_pc_physician <- NULL 
my_data.2022$ratio_population_pc_physician_quality <- NULL  
my_data.2022$rate_injury_death <- NULL  
my_data.2022$rate_injury_death_lb <- NULL
my_data.2022$rate_injury_death_ub <- NULL
my_data.2022$rate_injury_death_quality <- NULL  
my_data.2022$share_election_turnout <- NULL
my_data.2022$share_election_turnout_quality <- NULL
my_data.2022$share_desc_rep_asian_other <- NULL
my_data.2022$share_desc_rep_black_nonhispanic <- NULL
my_data.2022$share_desc_rep_hispanic <- NULL
my_data.2022$share_desc_rep_white_nonhispanic <- NULL
my_data.2022$share_desc_rep_asian_other_quality <- NULL
my_data.2022$rate_violent_crime <- NULL
my_data.2022$rate_learning <- NULL
my_data.2022$rate_learning_lb <- NULL
my_data.2022$rate_learning_quality <- NULL
my_data.2022$rate_learning_ub <- NULL
my_data.2022$share_affordable_30_ami <- NULL
my_data.2022$share_affordable_50_ami<- NULL
my_data.2022$share_affordable_80_ami<- NULL
my_data.2022$count_homeless <- NULL
my_data.2022$share_homeless <- NULL
my_data.2022$count_homeless_lb <- NULL
my_data.2022$count_homeless_ub <- NULL
my_data.2022$homeless_quality <- NULL
my_data.2022$share_poverty_exposure <- NULL
my_data.2022$share_poverty_exposure_quality <- NULL
my_data.2022$share_black_nh_exposure <- NULL
my_data.2022$share_black_nh_exposure_quality <- NULL
my_data.2022$share_hispanic_exposure <- NULL
my_data.2022$share_hispanic_exposure_quality <- NULL
my_data.2022$share_other_nh_exposure <- NULL
my_data.2022$share_other_nh_exposure_quality <- NULL
my_data.2022$share_white_nh_exposure <- NULL
my_data.2022$share_white_nh_exposure_quality <- NULL
my_data.2022$share_in_preschool <- NULL
my_data.2022$share_in_preschool_quality <- NULL
my_data.2022$share_hs_degree <- NULL
my_data.2022$share_hs_degree_quality <- NULL
my_data.2022$ratio_average_to_living_wage <- NULL
my_data.2022$ratio_average_to_living_wage_quality <-NULL
my_data.2022$meps20_black <- NULL
my_data.2022$meps20_black_quality <- NULL
my_data.2022$meps20_hispanic <- NULL
my_data.2022$meps20_hispanic_quality <- NULL
my_data.2022$meps20_white <- NULL
my_data.2022$meps20_white_quality <- NULL
my_data.2022$share_employed <- NULL
my_data.2022$share_employed_quality <- NULL
my_data.2022$pctl_income_20 <- NULL
my_data.2022$pctl_income_50 <- NULL
my_data.2022$pctl_income_80 <- NULL
my_data.2022$pctl_income_quality <- NULL
my_data.2022$index_air_quality <- NULL
my_data.2022$index_air_quality_quality <- NULL

Now, we are focusing on 2016: The data frame for the 2016 data should have 3,142 observations and 9 variables after cleaning.

# extracting data from 2016
my_data.2016 <- my_data[my_data$year == 2016, ]

#getting the structure of the data frame for 2016
str(my_data.2016)
## 'data.frame':    3142 obs. of  91 variables:
##  $ year                                         : int  2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
##  $ state                                        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ county                                       : int  1 3 5 7 9 11 13 15 17 19 ...
##  $ state_name                                   : chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
##  $ county_name                                  : chr  "Autauga County" "Baldwin County" "Barbour County" "Bibb County" ...
##  $ share_in_preschool                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_in_preschool_quality                   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_learning                                : num  0.949 0.871 0.861 0.916 0.794 ...
##  $ rate_learning_lb                             : num  0.837 0.809 0.699 0.729 0.681 ...
##  $ rate_learning_ub                             : num  1.06 0.934 1.022 1.102 0.906 ...
##  $ rate_learning_quality                        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ meps20_black                                 : num  0.364 0.403 1 0.886 0.38 ...
##  $ meps20_black_quality                         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ meps20_hispanic                              : num  0.272 0.506 1 0.793 0.676 ...
##  $ meps20_hispanic_quality                      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ meps20_white                                 : num  0.333 0.302 1 0.67 0.472 ...
##  $ meps20_white_quality                         : int  1 1 1 1 1 2 1 1 1 1 ...
##  $ share_hs_degree                              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_hs_degree_quality                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_digital_access                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_digital_access_quality                 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ pctl_income_20                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ pctl_income_50                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ pctl_income_80                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ pctl_income_quality                          : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_debt_col                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_debt_col_quality                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_debt_col_lb                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_debt_col_ub                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ ratio_black_nh_house_value_households        : chr  "19.2%:28.5%" "3.3%:6.1%" "36.7%:49.0%" "29.9%:57.6%" ...
##  $ ratio_black_nh_house_value_households_quality: int  3 3 3 3 3 3 3 1 3 3 ...
##  $ ratio_hispanic_house_value_households        : chr  "0.6%:1.5%" "1.4%:4.2%" "0.3%:1.8%" "0.3%:0.4%" ...
##  $ ratio_hispanic_house_value_households_quality: int  3 3 3 3 3 3 3 3 3 3 ...
##  $ ratio_other_nh_house_value_households        : chr  "0.6%:0.9%" "3.2%:2.0%" "1.1%:1.0%" "0.8%:0.6%" ...
##  $ ratio_other_nh_house_value_households_quality: int  3 3 3 3 3 3 3 3 3 3 ...
##  $ ratio_white_nh_house_value_households        : chr  "79.6%:69.0%" "92.1%:87.7%" "61.9%:48.1%" "69.0%:41.4%" ...
##  $ ratio_white_nh_house_value_households_quality: int  3 1 3 3 2 3 3 1 3 3 ...
##  $ ratio_population_pc_physician                : chr  NA NA NA NA ...
##  $ ratio_population_pc_physician_quality        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_low_birth_weight                        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_low_birth_weight_lb                     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_low_birth_weight_ub                     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_low_birth_weight_quality                : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_injury_death                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_injury_death_lb                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_injury_death_ub                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_injury_death_quality                    : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_affordable_80_ami                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_affordable_50_ami                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_affordable_30_ami                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_affordable_quality                     : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ count_homeless                               : int  49 383 36 40 146 9 168 394 20 444 ...
##  $ count_homeless_lb                            : int  49 383 36 40 146 9 168 394 20 444 ...
##  $ count_homeless_ub                            : int  49 383 36 40 146 9 168 394 20 444 ...
##  $ share_homeless                               : num  0.00526 0.01225 0.00751 0.01215 0.01572 ...
##  $ homeless_quality                             : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ share_election_turnout                       : num  0.614 0.627 0.514 0.501 0.603 ...
##  $ share_election_turnout_quality               : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ share_desc_rep_asian_other                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_desc_rep_black_nonhispanic             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_desc_rep_hispanic                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_desc_rep_white_nonhispanic             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_poverty_exposure                       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_poverty_exposure_quality               : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_black_nh_exposure                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_black_nh_exposure_quality              : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_hispanic_exposure                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_hispanic_exposure_quality              : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_other_nh_exposure                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_other_nh_exposure_quality              : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_white_nh_exposure                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_white_nh_exposure_quality              : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ transportation_cost                          : num  16.43 27.47 1.29 21.53 20.31 ...
##  $ index_transit_trips                          : num  18.5 22.51 7.16 26.37 20.54 ...
##  $ transportation_cost_quality                  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ index_transit_trips_quality                  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ index_air_quality                            : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ index_air_quality_quality                    : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ ratio_high_low_ses_fb_friends                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ ratio_high_low_ses_fb_friends_quality        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ count_membership_associations_per_10k        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ count_membership_associations_per_10k_quality: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_violent_crime                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_property_crime                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_crime_quality                           : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_juv_arrest                              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_juv_arrest_quality                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_employed                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_employed_quality                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ ratio_average_to_living_wage                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ ratio_average_to_living_wage_quality         : int  NA NA NA NA NA NA NA NA NA NA ...
# deleting data from columns that were NOT most recently measured in 2016
my_data.2016$rate_violent_crime <- NULL
my_data.2016$rate_property_crime <- NULL
my_data.2016$rate_crime_quality <- NULL
my_data.2016$rate_juv_arrest <- NULL
my_data.2016$rate_juv_arrest_quality <- NULL
my_data.2016$share_affordable_quality <- NULL
my_data.2016$ratio_black_nh_house_value_households <- NULL
my_data.2016$ratio_black_nh_house_value_households_quality <- NULL
my_data.2016$ratio_hispanic_house_value_households <- NULL
my_data.2016$ratio_hispanic_house_value_households_quality <- NULL
my_data.2016$ratio_other_nh_house_value_households <- NULL
my_data.2016$ratio_other_nh_house_value_households_quality <- NULL
my_data.2016$ratio_white_nh_house_value_households <- NULL
my_data.2016$ratio_white_nh_house_value_households_quality <- NULL
my_data.2016$rate_low_birth_weight <- NULL
my_data.2016$rate_low_birth_weight_lb <- NULL
my_data.2016$rate_low_birth_weight_ub <- NULL
my_data.2016$rate_low_birth_weight_quality <- NULL
my_data.2016$count_membership_associations_per_10k <- NULL
my_data.2016$count_membership_associations_per_10k_quality <- NULL
my_data.2016$ratio_high_low_ses_fb_friends <- NULL
my_data.2016$ratio_high_low_ses_fb_friends_quality <- NULL

my_data.2016$share_digital_access <- NULL
my_data.2016$share_digital_access_quality <- NULL
my_data.2016$ratio_population_pc_physician <- NULL 
my_data.2016$ratio_population_pc_physician_quality <- NULL  
my_data.2016$rate_injury_death <- NULL  
my_data.2016$rate_injury_death_lb <- NULL
my_data.2016$rate_injury_death_ub <- NULL
my_data.2016$rate_injury_death_quality <- NULL  
my_data.2016$share_election_turnout <- NULL
my_data.2016$share_election_turnout_quality <- NULL
my_data.2016$share_desc_rep_asian_other <- NULL
my_data.2016$share_desc_rep_black_nonhispanic <- NULL
my_data.2016$share_desc_rep_hispanic <- NULL
my_data.2016$share_desc_rep_white_nonhispanic <- NULL
my_data.2016$share_desc_rep_asian_other_quality <- NULL
my_data.2016$rate_violent_crime <- NULL
my_data.2016$rate_learning <- NULL
my_data.2016$rate_learning_lb <- NULL
my_data.2016$rate_learning_quality <- NULL
my_data.2016$rate_learning_ub <- NULL
my_data.2016$share_affordable_30_ami <- NULL
my_data.2016$share_affordable_50_ami<- NULL
my_data.2016$share_affordable_80_ami<- NULL
my_data.2016$count_homeless <- NULL
my_data.2016$share_homeless <- NULL
my_data.2016$count_homeless_lb <- NULL
my_data.2016$count_homeless_ub <- NULL
my_data.2016$homeless_quality <- NULL
my_data.2016$share_poverty_exposure <- NULL
my_data.2016$share_poverty_exposure_quality <- NULL
my_data.2016$share_black_nh_exposure <- NULL
my_data.2016$share_black_nh_exposure_quality <- NULL
my_data.2016$share_hispanic_exposure <- NULL
my_data.2016$share_hispanic_exposure_quality <- NULL
my_data.2016$share_other_nh_exposure <- NULL
my_data.2016$share_other_nh_exposure_quality <- NULL
my_data.2016$share_white_nh_exposure <- NULL
my_data.2016$share_white_nh_exposure_quality <- NULL
my_data.2016$share_in_preschool <- NULL
my_data.2016$share_in_preschool_quality <- NULL
my_data.2016$share_hs_degree <- NULL
my_data.2016$share_hs_degree_quality <- NULL
my_data.2016$ratio_average_to_living_wage <- NULL
my_data.2016$ratio_average_to_living_wage_quality <-NULL
my_data.2016$meps20_black <- NULL
my_data.2016$meps20_black_quality <- NULL
my_data.2016$meps20_hispanic <- NULL
my_data.2016$meps20_hispanic_quality <- NULL
my_data.2016$meps20_white <- NULL
my_data.2016$meps20_white_quality <- NULL
my_data.2016$share_employed <- NULL
my_data.2016$share_employed_quality <- NULL
my_data.2016$pctl_income_20 <- NULL
my_data.2016$pctl_income_50 <- NULL
my_data.2016$pctl_income_80 <- NULL
my_data.2016$pctl_income_quality <- NULL
my_data.2016$index_air_quality <- NULL
my_data.2016$index_air_quality_quality <- NULL
my_data.2016$share_debt_col_lb <- NULL
my_data.2016$share_debt_col_ub <- NULL
my_data.2016$share_debt_col <- NULL
my_data.2016$share_debt_col_quality <- NULL

Next, we focus on 2019: The data frame should have 3,142 observations and 16 variables.

# extracting data for year 2019
my_data.2019 <- my_data[my_data$year == 2019, ]

#getting the structure of the 2019 data frame
str(my_data.2019)
## 'data.frame':    3142 obs. of  91 variables:
##  $ year                                         : int  2019 2019 2019 2019 2019 2019 2019 2019 2019 2019 ...
##  $ state                                        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ county                                       : int  1 3 5 7 9 11 13 15 17 19 ...
##  $ state_name                                   : chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
##  $ county_name                                  : chr  "Autauga County" "Baldwin County" "Barbour County" "Bibb County" ...
##  $ share_in_preschool                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_in_preschool_quality                   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_learning                                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_learning_lb                             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_learning_ub                             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_learning_quality                        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ meps20_black                                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ meps20_black_quality                         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ meps20_hispanic                              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ meps20_hispanic_quality                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ meps20_white                                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ meps20_white_quality                         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_hs_degree                              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_hs_degree_quality                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_digital_access                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_digital_access_quality                 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ pctl_income_20                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ pctl_income_50                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ pctl_income_80                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ pctl_income_quality                          : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_debt_col                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_debt_col_quality                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_debt_col_lb                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_debt_col_ub                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ ratio_black_nh_house_value_households        : chr  "16.9%:28.9%" "7.2%:8.3%" "42.7%:50.9%" "39.7%:59.6%" ...
##  $ ratio_black_nh_house_value_households_quality: int  3 3 3 3 3 3 3 1 3 3 ...
##  $ ratio_hispanic_house_value_households        : chr  "5.3%:3.1%" "1.2%:3.0%" "3.5%:3.6%" "0.3%:0.6%" ...
##  $ ratio_hispanic_house_value_households_quality: int  3 3 3 3 3 3 3 3 3 3 ...
##  $ ratio_other_nh_house_value_households        : chr  "1.2%:2.0%" "1.7%:1.8%" "2.5%:3.0%" "0.3%:0.6%" ...
##  $ ratio_other_nh_house_value_households_quality: int  3 3 3 3 3 3 3 3 3 3 ...
##  $ ratio_white_nh_house_value_households        : chr  "76.6%:66.0%" "89.8%:86.9%" "51.4%:42.5%" "59.8%:39.2%" ...
##  $ ratio_white_nh_house_value_households_quality: int  3 1 3 3 2 3 3 1 3 3 ...
##  $ ratio_population_pc_physician                : chr  "2235:1" "1450:1" "2743:1" "1723:1" ...
##  $ ratio_population_pc_physician_quality        : int  1 1 1 1 1 2 1 1 1 1 ...
##  $ rate_low_birth_weight                        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_low_birth_weight_lb                     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_low_birth_weight_ub                     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_low_birth_weight_quality                : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_injury_death                            : num  68.6 74.8 82.1 103.3 104.8 ...
##  $ rate_injury_death_lb                         : num  58.8 69.7 66.3 84.5 93 ...
##  $ rate_injury_death_ub                         : num  78.3 80 98 122.2 116.5 ...
##  $ rate_injury_death_quality                    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ share_affordable_80_ami                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_affordable_50_ami                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_affordable_30_ami                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_affordable_quality                     : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ count_homeless                               : int  57 167 17 41 104 11 108 250 28 276 ...
##  $ count_homeless_lb                            : int  57 167 16 41 104 11 108 250 27 276 ...
##  $ count_homeless_ub                            : int  57 167 18 41 104 11 108 250 29 276 ...
##  $ share_homeless                               : num  0.00625 0.00509 0.00266 0.01264 0.01359 ...
##  $ homeless_quality                             : int  1 1 3 1 1 1 1 1 2 1 ...
##  $ share_election_turnout                       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_election_turnout_quality               : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_desc_rep_asian_other                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_desc_rep_black_nonhispanic             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_desc_rep_hispanic                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_desc_rep_white_nonhispanic             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_poverty_exposure                       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_poverty_exposure_quality               : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_black_nh_exposure                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_black_nh_exposure_quality              : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_hispanic_exposure                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_hispanic_exposure_quality              : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_other_nh_exposure                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_other_nh_exposure_quality              : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_white_nh_exposure                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_white_nh_exposure_quality              : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ transportation_cost                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ index_transit_trips                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ transportation_cost_quality                  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ index_transit_trips_quality                  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ index_air_quality                            : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ index_air_quality_quality                    : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ ratio_high_low_ses_fb_friends                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ ratio_high_low_ses_fb_friends_quality        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ count_membership_associations_per_10k        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ count_membership_associations_per_10k_quality: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_violent_crime                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_property_crime                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_crime_quality                           : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_juv_arrest                              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rate_juv_arrest_quality                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_employed                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_employed_quality                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ ratio_average_to_living_wage                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ ratio_average_to_living_wage_quality         : int  NA NA NA NA NA NA NA NA NA NA ...
# deleting data from columns that were NOT most recently measured in 2019
my_data.2019$rate_violent_crime <- NULL
my_data.2019$rate_property_crime <- NULL
my_data.2019$rate_crime_quality <- NULL
my_data.2019$rate_juv_arrest <- NULL
my_data.2019$rate_juv_arrest_quality <- NULL
my_data.2019$share_affordable_quality <- NULL
my_data.2019$ratio_black_nh_house_value_households <- NULL
my_data.2019$ratio_black_nh_house_value_households_quality <- NULL
my_data.2019$ratio_hispanic_house_value_households <- NULL
my_data.2019$ratio_hispanic_house_value_households_quality <- NULL
my_data.2019$ratio_other_nh_house_value_households <- NULL
my_data.2019$ratio_other_nh_house_value_households_quality <- NULL
my_data.2019$ratio_white_nh_house_value_households <- NULL
my_data.2019$ratio_white_nh_house_value_households_quality <- NULL
my_data.2019$rate_low_birth_weight <- NULL
my_data.2019$rate_low_birth_weight_lb <- NULL
my_data.2019$rate_low_birth_weight_ub <- NULL
my_data.2019$rate_low_birth_weight_quality <- NULL
my_data.2019$count_membership_associations_per_10k <- NULL
my_data.2019$count_membership_associations_per_10k_quality <- NULL
my_data.2019$ratio_high_low_ses_fb_friends <- NULL
my_data.2019$ratio_high_low_ses_fb_friends_quality <- NULL

my_data.2019$share_digital_access <- NULL
my_data.2019$share_digital_access_quality <- NULL

  
my_data.2019$share_election_turnout <- NULL
my_data.2019$share_election_turnout_quality <- NULL
my_data.2019$share_desc_rep_asian_other <- NULL
my_data.2019$share_desc_rep_black_nonhispanic <- NULL
my_data.2019$share_desc_rep_hispanic <- NULL
my_data.2019$share_desc_rep_white_nonhispanic <- NULL
my_data.2019$share_desc_rep_asian_other_quality <- NULL
my_data.2019$rate_violent_crime <- NULL
my_data.2019$rate_learning <- NULL
my_data.2019$rate_learning_lb <- NULL
my_data.2019$rate_learning_quality <- NULL
my_data.2019$rate_learning_ub <- NULL
my_data.2019$share_affordable_30_ami <- NULL
my_data.2019$share_affordable_50_ami<- NULL
my_data.2019$share_affordable_80_ami<- NULL


my_data.2019$share_poverty_exposure <- NULL
my_data.2019$share_poverty_exposure_quality <- NULL
my_data.2019$share_black_nh_exposure <- NULL
my_data.2019$share_black_nh_exposure_quality <- NULL
my_data.2019$share_hispanic_exposure <- NULL
my_data.2019$share_hispanic_exposure_quality <- NULL
my_data.2019$share_other_nh_exposure <- NULL
my_data.2019$share_other_nh_exposure_quality <- NULL
my_data.2019$share_white_nh_exposure <- NULL
my_data.2019$share_white_nh_exposure_quality <- NULL
my_data.2019$share_in_preschool <- NULL
my_data.2019$share_in_preschool_quality <- NULL
my_data.2019$share_hs_degree <- NULL
my_data.2019$share_hs_degree_quality <- NULL
my_data.2019$ratio_average_to_living_wage <- NULL
my_data.2019$ratio_average_to_living_wage_quality <-NULL
my_data.2019$meps20_black <- NULL
my_data.2019$meps20_black_quality <- NULL
my_data.2019$meps20_hispanic <- NULL
my_data.2019$meps20_hispanic_quality <- NULL
my_data.2019$meps20_white <- NULL
my_data.2019$meps20_white_quality <- NULL
my_data.2019$share_employed <- NULL
my_data.2019$share_employed_quality <- NULL
my_data.2019$pctl_income_20 <- NULL
my_data.2019$pctl_income_50 <- NULL
my_data.2019$pctl_income_80 <- NULL
my_data.2019$pctl_income_quality <- NULL
my_data.2019$index_air_quality <- NULL
my_data.2019$index_air_quality_quality <- NULL
my_data.2019$share_debt_col_lb <- NULL
my_data.2019$share_debt_col_ub <- NULL
my_data.2019$share_debt_col <- NULL
my_data.2019$share_debt_col_quality <- NULL
my_data.2019$transportation_cost <- NULL
my_data.2019$transportation_cost_quality <- NULL

my_data.2019$index_transit_trips <- NULL
my_data.2019$index_transit_trips_quality <- NULL

Then, we have 2017. The data frame should have 3,142 observations and 19 variables when finished cleaning.

#extracting data from 2017
my_data.2017 <- my_data[my_data$year == 2017, ]
# deleting data from columns that were NOT most recently measured in 2017
my_data.2017$rate_violent_crime <- NULL
my_data.2017$rate_property_crime <- NULL
my_data.2017$rate_crime_quality <- NULL
my_data.2017$rate_juv_arrest <- NULL
my_data.2017$rate_juv_arrest_quality <- NULL
my_data.2017$share_affordable_quality <- NULL
my_data.2017$ratio_black_nh_house_value_households <- NULL
my_data.2017$ratio_black_nh_house_value_households_quality <- NULL
my_data.2017$ratio_hispanic_house_value_households <- NULL
my_data.2017$ratio_hispanic_house_value_households_quality <- NULL
my_data.2017$ratio_other_nh_house_value_households <- NULL
my_data.2017$ratio_other_nh_house_value_households_quality <- NULL
my_data.2017$ratio_white_nh_house_value_households <- NULL
my_data.2017$ratio_white_nh_house_value_households_quality <- NULL
my_data.2017$rate_low_birth_weight <- NULL
my_data.2017$rate_low_birth_weight_lb <- NULL
my_data.2017$rate_low_birth_weight_ub <- NULL
my_data.2017$rate_low_birth_weight_quality <- NULL
my_data.2017$count_membership_associations_per_10k <- NULL
my_data.2017$count_membership_associations_per_10k_quality <- NULL
my_data.2017$ratio_high_low_ses_fb_friends <- NULL
my_data.2017$ratio_high_low_ses_fb_friends_quality <- NULL

my_data.2017$share_digital_access <- NULL
my_data.2017$share_digital_access_quality <- NULL
my_data.2017$ratio_population_pc_physician <- NULL 
my_data.2017$ratio_population_pc_physician_quality <- NULL  
my_data.2017$rate_injury_death <- NULL  
my_data.2017$rate_injury_death_lb <- NULL
my_data.2017$rate_injury_death_ub <- NULL
my_data.2017$rate_injury_death_quality <- NULL  
my_data.2017$share_election_turnout <- NULL
my_data.2017$share_election_turnout_quality <- NULL
my_data.2017$share_desc_rep_asian_other <- NULL
my_data.2017$share_desc_rep_black_nonhispanic <- NULL
my_data.2017$share_desc_rep_hispanic <- NULL
my_data.2017$share_desc_rep_white_nonhispanic <- NULL
my_data.2017$share_desc_rep_asian_other_quality <- NULL
my_data.2017$rate_violent_crime <- NULL

my_data.2017$share_affordable_30_ami <- NULL
my_data.2017$share_affordable_50_ami<- NULL
my_data.2017$share_affordable_80_ami<- NULL
my_data.2017$count_homeless <- NULL
my_data.2017$share_homeless <- NULL
my_data.2017$count_homeless_lb <- NULL
my_data.2017$count_homeless_ub <- NULL
my_data.2017$homeless_quality <- NULL
my_data.2017$share_poverty_exposure <- NULL
my_data.2017$share_poverty_exposure_quality <- NULL
my_data.2017$share_black_nh_exposure <- NULL
my_data.2017$share_black_nh_exposure_quality <- NULL
my_data.2017$share_hispanic_exposure <- NULL
my_data.2017$share_hispanic_exposure_quality <- NULL
my_data.2017$share_other_nh_exposure <- NULL
my_data.2017$share_other_nh_exposure_quality <- NULL
my_data.2017$share_white_nh_exposure <- NULL
my_data.2017$share_white_nh_exposure_quality <- NULL
my_data.2017$share_in_preschool <- NULL
my_data.2017$share_in_preschool_quality <- NULL
my_data.2017$share_hs_degree <- NULL
my_data.2017$share_hs_degree_quality <- NULL
my_data.2017$ratio_average_to_living_wage <- NULL
my_data.2017$ratio_average_to_living_wage_quality <-NULL
my_data.2017$meps20_black <- NULL
my_data.2017$meps20_black_quality <- NULL
my_data.2017$meps20_hispanic <- NULL
my_data.2017$meps20_hispanic_quality <- NULL
my_data.2017$meps20_white <- NULL
my_data.2017$meps20_white_quality <- NULL
my_data.2017$share_employed <- NULL
my_data.2017$share_employed_quality <- NULL
my_data.2017$pctl_income_20 <- NULL

my_data.2017$pctl_income_50 <- NULL
my_data.2017$pctl_income_80 <- NULL
my_data.2017$pctl_income_quality <- NULL
my_data.2017$index_air_quality <- NULL
my_data.2017$index_air_quality_quality <- NULL
my_data.2017$share_debt_col_lb <- NULL
my_data.2017$share_debt_col_ub <- NULL
my_data.2017$share_debt_col <- NULL
my_data.2017$share_debt_col_quality <- NULL
my_data.2017$transportation_cost <- NULL
my_data.2017$transportation_cost_quality <- NULL
my_data.2017$index_transit_trips <- NULL
my_data.2017$index_transit_trips_quality <- NULL

Then, we have 2021. The data frame should have 3,143 observations and 44 variables.

# extracting data from 2021
my_data.2021 <- my_data[my_data$year == 2021, ]
# deleting data from columns that were NOT most recently measured in 2021
my_data.2021$count_homeless <- NULL
my_data.2021$count_homeless_lb <- NULL
my_data.2021$count_homeless_ub <- NULL
my_data.2021$share_homeless <- NULL
my_data.2021$homeless_quality <- NULL
my_data.2021$count_membership_associations_per_10k <- NULL
my_data.2021$count_membership_associations_per_10k_quality <- NULL
my_data.2021$ratio_high_low_ses_fb_friends <- NULL
my_data.2021$ratio_high_low_ses_fb_friends_quality <- NULL

  
my_data.2021$transportation_cost <- NULL
my_data.2021$transportation_cost_quality <- NULL
my_data.2021$index_transit_trips <- NULL
my_data.2021$index_transit_trips_quality <- NULL
my_data.2021$rate_learning <- NULL
my_data.2021$rate_learning_lb <- NULL
my_data.2021$rate_learning_ub <- NULL
my_data.2021$rate_learning_quality <- NULL

my_data.2021$meps20_black <- NULL
my_data.2021$meps20_black_quality <- NULL  
my_data.2021$meps20_hispanic <- NULL  
my_data.2021$meps20_hispanic_quality <- NULL  
my_data.2021$meps20_white <- NULL  
my_data.2021$meps20_white_quality <- NULL  
my_data.2021$share_employed <- NULL
my_data.2021$share_employed_quality <- NULL
my_data.2021$pctl_income_20 <- NULL
my_data.2021$pctl_income_50 <- NULL
my_data.2021$pctl_income_80 <- NULL
my_data.2021$pctl_income_quality <- NULL
my_data.2021$share_debt_col <- NULL
my_data.2021$share_debt_col_lb <- NULL
my_data.2021$share_debt_col_quality <- NULL
my_data.2021$share_debt_col_ub <- NULL
my_data.2021$ratio_population_pc_physician <- NULL
my_data.2021$ratio_population_pc_physician_quality <- NULL  
my_data.2021$rate_low_birth_weight <- NULL
my_data.2021$rate_low_birth_weight_lb <- NULL
my_data.2021$rate_low_birth_weight_ub <- NULL
my_data.2021$rate_low_birth_weight_quality <- NULL
my_data.2021$index_air_quality <- NULL
my_data.2021$index_air_quality_quality <- NULL
my_data.2021$rate_injury_death <- NULL
my_data.2021$rate_injury_death_lb <- NULL
my_data.2021$rate_injury_death <- NULL
my_data.2021$rate_injury_death_quality <- NULL
my_data.2021$rate_injury_death_ub <- NULL
my_data.2021$share_election_turnout <- NULL
my_data.2021$share_election_turnout_quality <- NULL

Lastly, we will clean the data for 2020. This data frame should have 3,143 observations and 13 variables when finished.

#extracting data from 2020
my_data.2020 <- my_data[my_data$year == 2020, ]

# deleting data from columns that were not most recently measured in 2020
my_data.2020$count_homeless <- NULL
my_data.2020$count_homeless_lb <- NULL
my_data.2020$count_homeless_ub <- NULL
my_data.2020$share_homeless <- NULL
my_data.2020$homeless_quality <- NULL

my_data.2020$ratio_high_low_ses_fb_friends <- NULL
my_data.2020$ratio_high_low_ses_fb_friends_quality <- NULL

my_data.2020$transportation_cost <- NULL
my_data.2020$transportation_cost_quality <- NULL
my_data.2020$index_transit_trips <- NULL
my_data.2020$index_transit_trips_quality <- NULL
my_data.2020$rate_learning <- NULL
my_data.2020$rate_learning_lb <- NULL
my_data.2020$rate_learning_ub <- NULL
my_data.2020$rate_learning_quality <- NULL

my_data.2020$meps20_black <- NULL
my_data.2020$meps20_black_quality <- NULL  
my_data.2020$meps20_hispanic <- NULL  
my_data.2020$meps20_hispanic_quality <- NULL  
my_data.2020$meps20_white <- NULL  
my_data.2020$meps20_white_quality <- NULL  
my_data.2020$share_employed <- NULL
my_data.2020$share_employed_quality <- NULL
my_data.2020$pctl_income_20 <- NULL
my_data.2020$pctl_income_50 <- NULL
my_data.2020$pctl_income_80 <- NULL

my_data.2020$pctl_income_quality <- NULL
my_data.2020$share_debt_col <- NULL
my_data.2020$share_debt_col_lb <- NULL
my_data.2020$share_debt_col_quality <- NULL
my_data.2020$share_debt_col_ub <- NULL
my_data.2020$ratio_population_pc_physician <- NULL
my_data.2020$ratio_population_pc_physician_quality <- NULL  

my_data.2020$index_air_quality <- NULL
my_data.2020$index_air_quality_quality <- NULL
my_data.2020$rate_injury_death <- NULL
my_data.2020$rate_injury_death_lb <- NULL
my_data.2020$rate_injury_death <- NULL
my_data.2020$rate_injury_death_quality <- NULL
my_data.2020$rate_injury_death_ub <- NULL

my_data.2020$share_affordable_80_ami <- NULL
my_data.2020$share_affordable_30_ami <- NULL
my_data.2020$share_affordable_50_ami <- NULL
my_data.2020$share_affordable_quality <- NULL

my_data.2020$share_in_preschool <- NULL
my_data.2020$share_in_preschool_quality <- NULL
my_data.2020$share_hs_degree <- NULL
my_data.2020$share_hs_degree_quality <- NULL
my_data.2020$share_digital_access <- NULL
my_data.2020$share_digital_access_quality <- NULL
my_data.2020$ratio_black_nh_house_value_households <- NULL
my_data.2020$ratio_black_nh_house_value_households_quality <- NULL
my_data.2020$ratio_hispanic_house_value_households <- NULL
my_data.2020$ratio_hispanic_house_value_households_quality <- NULL
my_data.2020$ratio_other_nh_households <- NULL
my_data.2020$ratio_other_nh_house_value_households <- NULL
my_data.2020$ratio_other_nh_house_value_households_quality <- NULL
my_data.2020$ratio_white_nh_house_value_households <- NULL
my_data.2020$ratio_white_nh_house_value_households_quality <- NULL
my_data.2020$share_desc_rep_asian_other <- NULL
my_data.2020$share_desc_rep_black_nonhispanic <- NULL
my_data.2020$share_desc_rep_hispanic <- NULL
my_data.2020$share_desc_rep_white_nonhispanic <- NULL
my_data.2020$share_poverty_exposure <- NULL
my_data.2020$share_poverty_exposure_quality <- NULL
my_data.2020$share_black_nh_exposure <- NULL
my_data.2020$share_black_nh_exposure_quality <- NULL
my_data.2020$share_hispanic_exposure <- NULL
my_data.2020$share_hispanic_exposure_quality <- NULL
my_data.2020$share_other_nh_exposure <- NULL
my_data.2020$share_other_nh_exposure_quality <- NULL
my_data.2020$share_white_nh_exposure <- NULL
my_data.2020$share_white_nh_exposure_quality <- NULL
my_data.2020$rate_violent_crime <- NULL
my_data.2020$rate_property_crime <- NULL
my_data.2020$rate_crime_quality <- NULL
my_data.2020$rate_juv_arrest <- NULL
my_data.2020$rate_juv_arrest_quality <- NULL
my_data.2020$ratio_average_to_living_wage <- NULL
my_data.2020$ratio_average_to_living_wage_quality <- NULL

Out of preference and for better data-merging, the programmer deleted the year column for all of the data frames. This means that each data frame should now have one less variable.

# deleting the year column for all variables 
my_data.2016$year <- NULL
my_data.2017$year <- NULL
my_data.2018$year <- NULL
my_data.2019$year <- NULL
my_data.2020$year <- NULL
my_data.2021$year <- NULL
my_data.2022$year <- NULL

Merging the data into a final data set

Finally, we merge the data by columns (vertically join them). First, the tidyverse package is called. Then, a full_join function is used to join every data frame created for each year into one whole data frame. Every data frame is joined by the common/overlapping variables state, county, state name, and county name:

#calling the tidyverse package in order to use the pipe operator %>% 
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# merge the data frames by overlapping variables/common variables which are
# state, county, state_name, and county_name, the final data frame is called result

result <- my_data.2016 %>%
  full_join(my_data.2017, by = c("state", "county","state_name","county_name")) %>%
  full_join(my_data.2018, by = c("state", "county","state_name","county_name")) %>%
  full_join(my_data.2019, by = c("state", "county","state_name","county_name")) %>% 
  full_join(my_data.2020, by = c("state", "county","state_name","county_name")) %>% 
  full_join(my_data.2021, by = c("state", "county","state_name","county_name")) %>% 
  full_join(my_data.2022, by = c("state", "county","state_name","county_name"))

# getting the dimensions of the final analytic data set 
dim(result)
## [1] 3144   90

The final data frame should have 3,144 observations and 90 variables.