library(tidyverse)
library(dplyr)
There are 3142 rows and 35 columns within the county_data data set.
county_data <- read_csv("acs_2015_county_data_revised.csv")
dim(county_data)
## [1] 3142 35
In order to know what data types need to be changed we need to see what are the current data types in the county_data data set.
glimpse(county_data)
## Rows: 3,142
## Columns: 35
## $ census_id <dbl> 1001, 1003, 1005, 1007, 1009, 1011, 1013, 1015, 1017...
## $ state <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama...
## $ county <chr> "Autauga", "Baldwin", "Barbour", "Bibb", "Blount", "...
## $ total_pop <dbl> 55221, 195121, 26932, 22604, 57710, 10678, 20354, 11...
## $ men <dbl> 26745, 95314, 14497, 12073, 28512, 5660, 9502, 56274...
## $ women <dbl> 28476, 99807, 12435, 10531, 29198, 5018, 10852, 6037...
## $ hispanic <dbl> 2.6, 4.5, 4.6, 2.2, 8.6, 4.4, 1.2, 3.5, 0.4, 1.5, 7....
## $ white <dbl> 75.8, 83.1, 46.2, 74.5, 87.9, 22.2, 53.3, 73.0, 57.3...
## $ black <dbl> 18.5, 9.5, 46.7, 21.4, 1.5, 70.7, 43.8, 20.3, 40.3, ...
## $ native <dbl> 0.4, 0.6, 0.2, 0.4, 0.3, 1.2, 0.1, 0.2, 0.2, 0.6, 0....
## $ asian <dbl> 1.0, 0.7, 0.4, 0.1, 0.1, 0.2, 0.4, 0.9, 0.8, 0.3, 0....
## $ pacific <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0....
## $ citizen <dbl> 40725, 147695, 20714, 17495, 42345, 8057, 15581, 886...
## $ income <dbl> 51281, 50254, 32964, 38678, 45813, 31938, 32229, 417...
## $ income_per_cap <dbl> 24974, 27317, 16824, 18431, 20532, 17580, 18390, 213...
## $ poverty <dbl> 12.9, 13.4, 26.7, 16.8, 16.7, 24.6, 25.4, 20.5, 21.6...
## $ child_poverty <dbl> 18.6, 19.2, 45.3, 27.9, 27.2, 38.4, 39.2, 31.6, 37.2...
## $ professional <dbl> 33.2, 33.1, 26.8, 21.5, 28.5, 18.8, 27.5, 27.3, 23.3...
## $ service <dbl> 17.0, 17.7, 16.1, 17.9, 14.1, 15.0, 16.6, 17.7, 14.5...
## $ office <dbl> 24.2, 27.1, 23.1, 17.8, 23.9, 19.7, 21.9, 24.2, 26.3...
## $ construction <dbl> 8.6, 10.8, 10.8, 19.0, 13.5, 20.1, 10.3, 10.5, 11.5,...
## $ production <dbl> 17.1, 11.2, 23.1, 23.7, 19.9, 26.4, 23.7, 20.4, 24.4...
## $ drive <dbl> 87.5, 84.7, 83.8, 83.2, 84.9, 74.9, 84.5, 85.3, 85.1...
## $ carpool <dbl> 8.8, 8.8, 10.9, 13.5, 11.2, 14.9, 12.4, 9.4, 11.9, 1...
## $ transit <dbl> 0.1, 0.1, 0.4, 0.5, 0.4, 0.7, 0.0, 0.2, 0.2, 0.2, 0....
## $ walk <dbl> 0.5, 1.0, 1.8, 0.6, 0.9, 5.0, 0.8, 1.2, 0.3, 0.6, 1....
## $ other_transp <dbl> 1.3, 1.4, 1.5, 1.5, 0.4, 1.7, 0.6, 1.2, 0.4, 0.7, 1....
## $ work_at_home <dbl> 1.8, 3.9, 1.6, 0.7, 2.3, 2.8, 1.7, 2.7, 2.1, 2.5, 1....
## $ mean_commute <dbl> 26.5, 26.4, 24.1, 28.8, 34.9, 27.5, 24.6, 24.1, 25.1...
## $ employed <dbl> 23986, 85953, 8597, 8294, 22189, 3865, 7813, 47401, ...
## $ private_work <dbl> 73.6, 81.5, 71.8, 76.8, 82.0, 79.5, 77.4, 74.1, 85.1...
## $ public_work <dbl> 20.9, 12.3, 20.8, 16.1, 13.5, 15.1, 16.2, 20.8, 12.1...
## $ self_employed <dbl> 5.5, 5.8, 7.3, 6.7, 4.2, 5.4, 6.2, 5.0, 2.8, 7.9, 4....
## $ family_work <dbl> 0.0, 0.4, 0.1, 0.4, 0.4, 0.0, 0.2, 0.1, 0.0, 0.5, 0....
## $ unemployment <dbl> 7.6, 7.5, 17.6, 8.3, 7.7, 18.0, 10.9, 12.3, 8.9, 7.9...
From observations, it appears that all but two of the columns (state and county) have the type dbl, when in reality many of these varibles should be int instead because they are whole numbers. These variables include census_id, total_pop, men, women, citizen, income,income_per_cap, and employed.
To change the type of these variables we need to use the following code:
When we look at the data types in county_data data set again using glimpse we can see that the data types have now changed
glimpse(county_data)
## Rows: 3,142
## Columns: 35
## $ census_id <int> 1001, 1003, 1005, 1007, 1009, 1011, 1013, 1015, 1017...
## $ state <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama...
## $ county <chr> "Autauga", "Baldwin", "Barbour", "Bibb", "Blount", "...
## $ total_pop <int> 55221, 195121, 26932, 22604, 57710, 10678, 20354, 11...
## $ men <int> 26745, 95314, 14497, 12073, 28512, 5660, 9502, 56274...
## $ women <int> 28476, 99807, 12435, 10531, 29198, 5018, 10852, 6037...
## $ hispanic <dbl> 2.6, 4.5, 4.6, 2.2, 8.6, 4.4, 1.2, 3.5, 0.4, 1.5, 7....
## $ white <dbl> 75.8, 83.1, 46.2, 74.5, 87.9, 22.2, 53.3, 73.0, 57.3...
## $ black <dbl> 18.5, 9.5, 46.7, 21.4, 1.5, 70.7, 43.8, 20.3, 40.3, ...
## $ native <dbl> 0.4, 0.6, 0.2, 0.4, 0.3, 1.2, 0.1, 0.2, 0.2, 0.6, 0....
## $ asian <dbl> 1.0, 0.7, 0.4, 0.1, 0.1, 0.2, 0.4, 0.9, 0.8, 0.3, 0....
## $ pacific <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0....
## $ citizen <int> 40725, 147695, 20714, 17495, 42345, 8057, 15581, 886...
## $ income <int> 51281, 50254, 32964, 38678, 45813, 31938, 32229, 417...
## $ income_per_cap <int> 24974, 27317, 16824, 18431, 20532, 17580, 18390, 213...
## $ poverty <dbl> 12.9, 13.4, 26.7, 16.8, 16.7, 24.6, 25.4, 20.5, 21.6...
## $ child_poverty <dbl> 18.6, 19.2, 45.3, 27.9, 27.2, 38.4, 39.2, 31.6, 37.2...
## $ professional <dbl> 33.2, 33.1, 26.8, 21.5, 28.5, 18.8, 27.5, 27.3, 23.3...
## $ service <dbl> 17.0, 17.7, 16.1, 17.9, 14.1, 15.0, 16.6, 17.7, 14.5...
## $ office <dbl> 24.2, 27.1, 23.1, 17.8, 23.9, 19.7, 21.9, 24.2, 26.3...
## $ construction <dbl> 8.6, 10.8, 10.8, 19.0, 13.5, 20.1, 10.3, 10.5, 11.5,...
## $ production <dbl> 17.1, 11.2, 23.1, 23.7, 19.9, 26.4, 23.7, 20.4, 24.4...
## $ drive <dbl> 87.5, 84.7, 83.8, 83.2, 84.9, 74.9, 84.5, 85.3, 85.1...
## $ carpool <dbl> 8.8, 8.8, 10.9, 13.5, 11.2, 14.9, 12.4, 9.4, 11.9, 1...
## $ transit <dbl> 0.1, 0.1, 0.4, 0.5, 0.4, 0.7, 0.0, 0.2, 0.2, 0.2, 0....
## $ walk <dbl> 0.5, 1.0, 1.8, 0.6, 0.9, 5.0, 0.8, 1.2, 0.3, 0.6, 1....
## $ other_transp <dbl> 1.3, 1.4, 1.5, 1.5, 0.4, 1.7, 0.6, 1.2, 0.4, 0.7, 1....
## $ work_at_home <dbl> 1.8, 3.9, 1.6, 0.7, 2.3, 2.8, 1.7, 2.7, 2.1, 2.5, 1....
## $ mean_commute <dbl> 26.5, 26.4, 24.1, 28.8, 34.9, 27.5, 24.6, 24.1, 25.1...
## $ employed <int> 23986, 85953, 8597, 8294, 22189, 3865, 7813, 47401, ...
## $ private_work <dbl> 73.6, 81.5, 71.8, 76.8, 82.0, 79.5, 77.4, 74.1, 85.1...
## $ public_work <dbl> 20.9, 12.3, 20.8, 16.1, 13.5, 15.1, 16.2, 20.8, 12.1...
## $ self_employed <dbl> 5.5, 5.8, 7.3, 6.7, 4.2, 5.4, 6.2, 5.0, 2.8, 7.9, 4....
## $ family_work <dbl> 0.0, 0.4, 0.1, 0.4, 0.4, 0.0, 0.2, 0.1, 0.0, 0.5, 0....
## $ unemployment <dbl> 7.6, 7.5, 17.6, 8.3, 7.7, 18.0, 10.9, 12.3, 8.9, 7.9...
First, we need to check if there are any missing values in the first place?
colSums(is.na(county_data))
## census_id state county total_pop men
## 0 0 0 0 0
## women hispanic white black native
## 0 0 0 0 0
## asian pacific citizen income income_per_cap
## 0 0 0 1 0
## poverty child_poverty professional service office
## 0 1 0 0 0
## construction production drive carpool transit
## 0 0 0 0 0
## walk other_transp work_at_home mean_commute employed
## 0 0 0 0 0
## private_work public_work self_employed family_work unemployment
## 0 0 0 0 0
From this, we can see that there is one missing values within the following variables: income child_poverty
Since there are only 2 occurrences of missing values, I decided to leave the values alone since we are not using them in our study of county_data data set.
We can use the summary function to see if there are any unusual values within the county_data data set.
summary(county_data)
## census_id state county total_pop
## Min. : 1001 Length:3142 Length:3142 Min. : 85
## 1st Qu.:18178 Class :character Class :character 1st Qu.: 11028
## Median :29176 Mode :character Mode :character Median : 25768
## Mean :30384 Mean : 100737
## 3rd Qu.:45081 3rd Qu.: 67552
## Max. :56045 Max. :10038388
##
## men women hispanic white
## Min. : 42 Min. : 43 Min. : 0.000 Min. : 0.90
## 1st Qu.: 5546 1st Qu.: 5466 1st Qu.: 1.900 1st Qu.:65.60
## Median : 12826 Median : 12907 Median : 3.700 Median :84.60
## Mean : 49565 Mean : 51171 Mean : 8.826 Mean :77.28
## 3rd Qu.: 33319 3rd Qu.: 34122 3rd Qu.: 9.000 3rd Qu.:93.30
## Max. :4945351 Max. :5093037 Max. :98.700 Max. :99.80
##
## black native asian pacific
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.00000
## 1st Qu.: 0.600 1st Qu.: 0.100 1st Qu.: 0.200 1st Qu.: 0.00000
## Median : 2.100 Median : 0.300 Median : 0.500 Median : 0.00000
## Mean : 8.879 Mean : 1.766 Mean : 1.258 Mean : 0.08475
## 3rd Qu.:10.175 3rd Qu.: 0.600 3rd Qu.: 1.200 3rd Qu.: 0.00000
## Max. :85.900 Max. :92.100 Max. :41.600 Max. :35.30000
##
## citizen income income_per_cap poverty
## Min. : 80 Min. : 19328 Min. : 8292 Min. : 1.4
## 1st Qu.: 8254 1st Qu.: 38826 1st Qu.:20471 1st Qu.:12.0
## Median : 19434 Median : 45111 Median :23577 Median :16.0
## Mean : 70804 Mean : 46830 Mean :24338 Mean :16.7
## 3rd Qu.: 50728 3rd Qu.: 52250 3rd Qu.:27138 3rd Qu.:20.3
## Max. :6046749 Max. :123453 Max. :65600 Max. :53.3
## NA's :1
## child_poverty professional service office
## Min. : 0.00 Min. :13.50 Min. : 5.00 Min. : 4.10
## 1st Qu.:16.10 1st Qu.:26.70 1st Qu.:15.90 1st Qu.:20.20
## Median :22.50 Median :30.00 Median :18.00 Median :22.40
## Mean :23.29 Mean :31.04 Mean :18.26 Mean :22.13
## 3rd Qu.:29.50 3rd Qu.:34.40 3rd Qu.:20.20 3rd Qu.:24.30
## Max. :72.30 Max. :74.00 Max. :36.60 Max. :35.40
## NA's :1
## construction production drive carpool
## Min. : 1.70 Min. : 0.00 Min. : 5.20 Min. : 0.00
## 1st Qu.: 9.80 1st Qu.:11.53 1st Qu.:76.60 1st Qu.: 8.50
## Median :12.20 Median :15.40 Median :80.60 Median : 9.90
## Mean :12.74 Mean :15.82 Mean :79.08 Mean :10.33
## 3rd Qu.:15.00 3rd Qu.:19.40 3rd Qu.:83.60 3rd Qu.:11.88
## Max. :40.30 Max. :55.60 Max. :94.60 Max. :29.90
##
## transit walk other_transp work_at_home
## Min. : 0.0000 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.1000 1st Qu.: 1.400 1st Qu.: 0.900 1st Qu.: 2.800
## Median : 0.4000 Median : 2.400 Median : 1.300 Median : 4.000
## Mean : 0.9675 Mean : 3.307 Mean : 1.614 Mean : 4.697
## 3rd Qu.: 0.8000 3rd Qu.: 4.000 3rd Qu.: 1.900 3rd Qu.: 5.700
## Max. :61.7000 Max. :71.200 Max. :39.100 Max. :37.200
##
## mean_commute employed private_work public_work
## Min. : 4.90 Min. : 62 Min. :25.00 Min. : 5.80
## 1st Qu.:19.30 1st Qu.: 4524 1st Qu.:70.90 1st Qu.:13.10
## Median :22.90 Median : 10644 Median :75.80 Median :16.10
## Mean :23.15 Mean : 46387 Mean :74.44 Mean :17.35
## 3rd Qu.:26.60 3rd Qu.: 29254 3rd Qu.:79.80 3rd Qu.:20.10
## Max. :44.00 Max. :4635465 Max. :88.30 Max. :66.20
##
## self_employed family_work unemployment
## Min. : 0.000 Min. :0.0000 Min. : 0.000
## 1st Qu.: 5.400 1st Qu.:0.1000 1st Qu.: 5.500
## Median : 6.900 Median :0.2000 Median : 7.500
## Mean : 7.921 Mean :0.2915 Mean : 7.815
## 3rd Qu.: 9.400 3rd Qu.:0.3000 3rd Qu.: 9.700
## Max. :36.600 Max. :9.8000 Max. :29.400
##
An usual value can be seen by observing the minimum and maximum to see if there are any observations within the variable that are out of the desired range or is not in the correct format.
For example, the variable employed is described to be a percentage. However, its maximum is equal to 4635465, and its minimum value is 62, showing that in fact it is not a percentage. To fix this, we must manipulate employed through equations to convert it into percentages for each observation.
county_data <- county_data %>%
mutate(employed = as.double(employed),employed = employed / total_pop)
summary(county_data$employed)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1657 0.3901 0.4367 0.4337 0.4820 0.7624
employed was the only observation found of unusual values.
county_data %>%
mutate(more_women = women - men) %>%
count(more_women > 0)
## # A tibble: 2 x 2
## `more_women > 0` n
## <lgl> <int>
## 1 FALSE 1157
## 2 TRUE 1985
There are 1985 counties that have more women than men.
county_data %>%
count(unemployment < .10)
## # A tibble: 2 x 2
## `unemployment < 0.1` n
## <lgl> <int>
## 1 FALSE 3138
## 2 TRUE 4
There are 4 counties with an unemployment rate that is lower than 10%.
county_data %>%
arrange(mean_commute) %>%
select(census_id, county, state, mean_commute) %>%
top_n(10)
## # A tibble: 10 x 4
## census_id county state mean_commute
## <int> <chr> <chr> <dbl>
## 1 54015 Clay West Virginia 41.4
## 2 36047 Kings New York 41.7
## 3 8093 Park Colorado 42.4
## 4 51193 Westmoreland Virginia 42.5
## 5 36081 Queens New York 42.6
## 6 36085 Richmond New York 42.6
## 7 51187 Warren Virginia 42.7
## 8 24017 Charles Maryland 42.8
## 9 36005 Bronx New York 43
## 10 42103 Pike Pennsylvania 44
The top 10 counties with the highest mean_commute are, respectively: Clay, West Virginia; Kings, New York; Park, Colorado; Westmoreland, Virginia; Queens, New York: Richmond, New York; Warren, New York; Charles, Maryland; Bronx, New York; Pike, Pennsylvania.
county_data %>%
mutate(perc_women = women / total_pop) %>%
arrange(perc_women) %>%
select(census_id, county, state, perc_women) %>%
top_n(-10)
## # A tibble: 10 x 4
## census_id county state perc_women
## <int> <chr> <chr> <dbl>
## 1 42053 Forest Pennsylvania 0.268
## 2 8011 Bent Colorado 0.314
## 3 51183 Sussex Virginia 0.315
## 4 13309 Wheeler Georgia 0.321
## 5 6035 Lassen California 0.332
## 6 48095 Concho Texas 0.333
## 7 13053 Chattahoochee Georgia 0.334
## 8 2013 Aleutians East Borough Alaska 0.335
## 9 22125 West Feliciana Louisiana 0.336
## 10 32027 Pershing Nevada 0.337
The 10 counties with the lowest percentage of women are, respectively: Forest, Pennsylvania; Bent, Colorado; Sussex, Virginia; Wheeler, Georgia; Lassen, California; Concho, Texas; Chattahoochee, Georgia; Aleutians East Borough, Alaska; West Feliciana, Louisiana; Pershing, Nevada.
county_data <- county_data %>%
mutate(sum_races = hispanic + white + black + native + asian + pacific)
county_data %>%
arrange(sum_races) %>%
select(census_id, county, state, sum_races) %>%
top_n(-10)
## # A tibble: 10 x 4
## census_id county state sum_races
## <int> <chr> <chr> <dbl>
## 1 15001 Hawaii Hawaii 76.4
## 2 15009 Maui Hawaii 79.2
## 3 40097 Mayes Oklahoma 79.7
## 4 15003 Honolulu Hawaii 81.5
## 5 40123 Pontotoc Oklahoma 82.8
## 6 47061 Grundy Tennessee 83.
## 7 2282 Yakutat City and Borough Alaska 83.4
## 8 40069 Johnston Oklahoma 84
## 9 15007 Kauai Hawaii 84.1
## 10 40003 Alfalfa Oklahoma 85.1
The top 10 counties with the lowest sum of these race percentage variables respectively are: Hawaii, Hawaii; Maui, Hawaii; Mayes, Oklahoma; Honolulu, Hawaii; Pontotoc, Oklahoma; Grundy, Tennessee; Yakutat City and Borough, Alaska; Johnston, Oklahoma; Kauai, Hawaii; Alfalfa, Oklahoma.
county_data %>%
group_by(state) %>%
summarise(avg_races = mean(sum_races, na.rm = TRUE)) %>%
top_n(-1)
## # A tibble: 1 x 2
## state avg_races
## <chr> <dbl>
## 1 Hawaii 84
Hawaii has the lowest sum of these race percentage variables average.
county_data %>%
arrange(sum_races) %>%
select(census_id, county, state, sum_races) %>%
filter(sum_races > 100)
## # A tibble: 11 x 4
## census_id county state sum_races
## <int> <chr> <chr> <dbl>
## 1 28021 Claiborne Mississippi 100.
## 2 48131 Duval Texas 100.
## 3 48261 Kenedy Texas 100.
## 4 48263 Kent Texas 100.
## 5 48377 Presidio Texas 100.
## 6 49001 Beaver Utah 100.
## 7 31125 Nance Nebraska 100.
## 8 31091 Hooker Nebraska 100.
## 9 48017 Bailey Texas 100.
## 10 48137 Edwards Texas 100.
## 11 31073 Gosper Nebraska 100.
There are 11 counties that have a sum greater than 100%.
county_data %>%
group_by(state) %>%
filter(sum_races == 100) %>%
summarise(sum_races)
## # A tibble: 27 x 2
## # Groups: state [13]
## state sum_races
## <chr> <dbl>
## 1 Alabama 100
## 2 Alabama 100
## 3 Georgia 100
## 4 Georgia 100
## 5 Kansas 100
## 6 Kentucky 100
## 7 Mississippi 100
## 8 Montana 100
## 9 Montana 100
## 10 Montana 100
## # ... with 17 more rows
There are 27 states that have a sum that exactly equal to 100, but because sum_races is by county, some states are listed multiple times.
carpool variable,dplyr::min_rank() function to create a new variable called carpool_rank where the highest ranked county (rank = 1) is the county with the highest carpool value.county_data <- county_data %>%
mutate(carpool_rank = min_rank(desc(carpool))) %>%
arrange(carpool_rank)
county_data %>%
select(census_id, county, state, carpool, carpool_rank) %>%
top_n(-10)
## # A tibble: 10 x 5
## census_id county state carpool carpool_rank
## <int> <chr> <chr> <dbl> <int>
## 1 13061 Clay Georgia 29.9 1
## 2 18087 LaGrange Indiana 27 2
## 3 13165 Jenkins Georgia 25.3 3
## 4 5133 Sevier Arkansas 24.4 4
## 5 20175 Seward Kansas 23.4 5
## 6 48079 Cochran Texas 22.8 6
## 7 48247 Jim Hogg Texas 22.6 7
## 8 48393 Roberts Texas 22.4 8
## 9 39075 Holmes Ohio 21.8 9
## 10 21197 Powell Kentucky 21.6 10
The 10 highest ranked counties for carpooling are respectively: Clay, Georgia; LaGrange, Indiana; Jenkins, Georgia; Sevier, Arkansas; Seward, Kansas; Cochran, Texas; Jim Hogg, Texas; Roberts, Texas; Holmes, Ohio; Powell, Kentucky.
county_data %>%
select(census_id, county, state, carpool, carpool_rank) %>%
top_n(10)
## # A tibble: 11 x 5
## census_id county state carpool carpool_rank
## <int> <chr> <chr> <dbl> <int>
## 1 46069 Hyde South Dakota 2.8 3132
## 2 51720 Norton city Virginia 2.8 3132
## 3 30019 Daniels Montana 2.6 3134
## 4 31057 Dundy Nebraska 2.6 3134
## 5 13309 Wheeler Georgia 2.3 3136
## 6 38029 Emmons North Dakota 2.3 3136
## 7 36061 New York New York 1.9 3138
## 8 31183 Wheeler Nebraska 1.3 3139
## 9 48235 Irion Texas 0.9 3140
## 10 48261 Kenedy Texas 0 3141
## 11 48269 King Texas 0 3141
The 10 lowest ranked counties for carpooling are respectively: Hype, South Dakota; Norton City, Virginia; Daniels, Montana; Dundy, Nebraska;Wheeler, Georgia; Emmons, North Dakota; New York, New York; Wheeler, Nebraska; Irion, Texas; Kenedy, Texas; King, Texas.
county_data %>%
group_by(state) %>%
summarise(avg_carpool = mean(carpool, na.rm = TRUE)) %>%
top_n(1)
## # A tibble: 1 x 2
## state avg_carpool
## <chr> <dbl>
## 1 Alaska 12.1
Alaska is best ranked state for carpooling, on average.
county_data %>%
group_by(state) %>%
summarise(avg_carpool = mean(carpool, na.rm = TRUE)) %>%
top_n(5)
## # A tibble: 5 x 2
## state avg_carpool
## <chr> <dbl>
## 1 Alaska 12.1
## 2 Arkansas 11.9
## 3 Nevada 11.7
## 4 Texas 11.8
## 5 Utah 11.9
The top 5 states for carpooling are Alaska, Arkansa, Utah, Nevada, and Texas.