(20SS-2nd)BANA 7025(001) Data Wrangling in R

Q1 - Import the data set using a Tidyverse function and NOT with a Base R function. How many rows and columns are in the data set?

library(tidyverse)

acs_country_data <- read_csv("C:/Users/Anjali Shalimar/Documents/acs_2015_county_data_revised.csv")
dim(acs_country_data)

## [1] 3142   35

There are 3142 rows and 35 columns in the dataset

Q2 - Do any data types need changed? Show any code to change variable types and show code/output for a command after you’re finished.

We look at the structure of the dataset

str(acs_country_data)

## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 3142 obs. of  35 variables:
##  $ census_id     : num  1001 1003 1005 1007 1009 ...
##  $ state         : chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
##  $ county        : chr  "Autauga" "Baldwin" "Barbour" "Bibb" ...
##  $ total_pop     : num  55221 195121 26932 22604 57710 ...
##  $ men           : num  26745 95314 14497 12073 28512 ...
##  $ women         : num  28476 99807 12435 10531 29198 ...
##  $ hispanic      : num  2.6 4.5 4.6 2.2 8.6 4.4 1.2 3.5 0.4 1.5 ...
##  $ white         : num  75.8 83.1 46.2 74.5 87.9 22.2 53.3 73 57.3 91.7 ...
##  $ black         : num  18.5 9.5 46.7 21.4 1.5 70.7 43.8 20.3 40.3 4.8 ...
##  $ native        : num  0.4 0.6 0.2 0.4 0.3 1.2 0.1 0.2 0.2 0.6 ...
##  $ asian         : num  1 0.7 0.4 0.1 0.1 0.2 0.4 0.9 0.8 0.3 ...
##  $ pacific       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ citizen       : num  40725 147695 20714 17495 42345 ...
##  $ income        : num  51281 50254 32964 38678 45813 ...
##  $ income_per_cap: num  24974 27317 16824 18431 20532 ...
##  $ poverty       : num  12.9 13.4 26.7 16.8 16.7 24.6 25.4 20.5 21.6 19.2 ...
##  $ child_poverty : num  18.6 19.2 45.3 27.9 27.2 38.4 39.2 31.6 37.2 30.1 ...
##  $ professional  : num  33.2 33.1 26.8 21.5 28.5 18.8 27.5 27.3 23.3 29.3 ...
##  $ service       : num  17 17.7 16.1 17.9 14.1 15 16.6 17.7 14.5 16 ...
##  $ office        : num  24.2 27.1 23.1 17.8 23.9 19.7 21.9 24.2 26.3 19.5 ...
##  $ construction  : num  8.6 10.8 10.8 19 13.5 20.1 10.3 10.5 11.5 13.7 ...
##  $ production    : num  17.1 11.2 23.1 23.7 19.9 26.4 23.7 20.4 24.4 21.5 ...
##  $ drive         : num  87.5 84.7 83.8 83.2 84.9 74.9 84.5 85.3 85.1 83.9 ...
##  $ carpool       : num  8.8 8.8 10.9 13.5 11.2 14.9 12.4 9.4 11.9 12.1 ...
##  $ transit       : num  0.1 0.1 0.4 0.5 0.4 0.7 0 0.2 0.2 0.2 ...
##  $ walk          : num  0.5 1 1.8 0.6 0.9 5 0.8 1.2 0.3 0.6 ...
##  $ other_transp  : num  1.3 1.4 1.5 1.5 0.4 1.7 0.6 1.2 0.4 0.7 ...
##  $ work_at_home  : num  1.8 3.9 1.6 0.7 2.3 2.8 1.7 2.7 2.1 2.5 ...
##  $ mean_commute  : num  26.5 26.4 24.1 28.8 34.9 27.5 24.6 24.1 25.1 27.4 ...
##  $ employed      : num  23986 85953 8597 8294 22189 ...
##  $ private_work  : num  73.6 81.5 71.8 76.8 82 79.5 77.4 74.1 85.1 73.1 ...
##  $ public_work   : num  20.9 12.3 20.8 16.1 13.5 15.1 16.2 20.8 12.1 18.5 ...
##  $ self_employed : num  5.5 5.8 7.3 6.7 4.2 5.4 6.2 5 2.8 7.9 ...
##  $ family_work   : num  0 0.4 0.1 0.4 0.4 0 0.2 0.1 0 0.5 ...
##  $ unemployment  : num  7.6 7.5 17.6 8.3 7.7 18 10.9 12.3 8.9 7.9 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   census_id = col_double(),
##   ..   state = col_character(),
##   ..   county = col_character(),
##   ..   total_pop = col_double(),
##   ..   men = col_double(),
##   ..   women = col_double(),
##   ..   hispanic = col_double(),
##   ..   white = col_double(),
##   ..   black = col_double(),
##   ..   native = col_double(),
##   ..   asian = col_double(),
##   ..   pacific = col_double(),
##   ..   citizen = col_double(),
##   ..   income = col_double(),
##   ..   income_per_cap = col_double(),
##   ..   poverty = col_double(),
##   ..   child_poverty = col_double(),
##   ..   professional = col_double(),
##   ..   service = col_double(),
##   ..   office = col_double(),
##   ..   construction = col_double(),
##   ..   production = col_double(),
##   ..   drive = col_double(),
##   ..   carpool = col_double(),
##   ..   transit = col_double(),
##   ..   walk = col_double(),
##   ..   other_transp = col_double(),
##   ..   work_at_home = col_double(),
##   ..   mean_commute = col_double(),
##   ..   employed = col_double(),
##   ..   private_work = col_double(),
##   ..   public_work = col_double(),
##   ..   self_employed = col_double(),
##   ..   family_work = col_double(),
##   ..   unemployment = col_double()
##   .. )

We see that the state and country have datatype as chr. We have modified this to factor to fit perfectly with our future modelling of the dataset.

acs_country_data$state <- as.factor(acs_country_data$state)
acs_country_data$county <- as.factor(acs_country_data$county)

Rest all datatypes look perfect. We do not require to rename any attributes as they clearly demonstrate the meaning of the dataset.

glimpse(acs_country_data)

## Observations: 3,142
## Variables: 35
## $ census_id      <dbl> 1001, 1003, 1005, 1007, 1009, 1011, 1013, 1015, 1017...
## $ state          <fct> Alabama, Alabama, Alabama, Alabama, Alabama, Alabama...
## $ county         <fct> Autauga, Baldwin, Barbour, Bibb, Blount, Bullock, Bu...
## $ total_pop      <dbl> 55221, 195121, 26932, 22604, 57710, 10678, 20354, 11...
## $ men            <dbl> 26745, 95314, 14497, 12073, 28512, 5660, 9502, 56274...
## $ women          <dbl> 28476, 99807, 12435, 10531, 29198, 5018, 10852, 6037...
## $ hispanic       <dbl> 2.6, 4.5, 4.6, 2.2, 8.6, 4.4, 1.2, 3.5, 0.4, 1.5, 7....
## $ white          <dbl> 75.8, 83.1, 46.2, 74.5, 87.9, 22.2, 53.3, 73.0, 57.3...
## $ black          <dbl> 18.5, 9.5, 46.7, 21.4, 1.5, 70.7, 43.8, 20.3, 40.3, ...
## $ native         <dbl> 0.4, 0.6, 0.2, 0.4, 0.3, 1.2, 0.1, 0.2, 0.2, 0.6, 0....
## $ asian          <dbl> 1.0, 0.7, 0.4, 0.1, 0.1, 0.2, 0.4, 0.9, 0.8, 0.3, 0....
## $ pacific        <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0....
## $ citizen        <dbl> 40725, 147695, 20714, 17495, 42345, 8057, 15581, 886...
## $ income         <dbl> 51281, 50254, 32964, 38678, 45813, 31938, 32229, 417...
## $ income_per_cap <dbl> 24974, 27317, 16824, 18431, 20532, 17580, 18390, 213...
## $ poverty        <dbl> 12.9, 13.4, 26.7, 16.8, 16.7, 24.6, 25.4, 20.5, 21.6...
## $ child_poverty  <dbl> 18.6, 19.2, 45.3, 27.9, 27.2, 38.4, 39.2, 31.6, 37.2...
## $ professional   <dbl> 33.2, 33.1, 26.8, 21.5, 28.5, 18.8, 27.5, 27.3, 23.3...
## $ service        <dbl> 17.0, 17.7, 16.1, 17.9, 14.1, 15.0, 16.6, 17.7, 14.5...
## $ office         <dbl> 24.2, 27.1, 23.1, 17.8, 23.9, 19.7, 21.9, 24.2, 26.3...
## $ construction   <dbl> 8.6, 10.8, 10.8, 19.0, 13.5, 20.1, 10.3, 10.5, 11.5,...
## $ production     <dbl> 17.1, 11.2, 23.1, 23.7, 19.9, 26.4, 23.7, 20.4, 24.4...
## $ drive          <dbl> 87.5, 84.7, 83.8, 83.2, 84.9, 74.9, 84.5, 85.3, 85.1...
## $ carpool        <dbl> 8.8, 8.8, 10.9, 13.5, 11.2, 14.9, 12.4, 9.4, 11.9, 1...
## $ transit        <dbl> 0.1, 0.1, 0.4, 0.5, 0.4, 0.7, 0.0, 0.2, 0.2, 0.2, 0....
## $ walk           <dbl> 0.5, 1.0, 1.8, 0.6, 0.9, 5.0, 0.8, 1.2, 0.3, 0.6, 1....
## $ other_transp   <dbl> 1.3, 1.4, 1.5, 1.5, 0.4, 1.7, 0.6, 1.2, 0.4, 0.7, 1....
## $ work_at_home   <dbl> 1.8, 3.9, 1.6, 0.7, 2.3, 2.8, 1.7, 2.7, 2.1, 2.5, 1....
## $ mean_commute   <dbl> 26.5, 26.4, 24.1, 28.8, 34.9, 27.5, 24.6, 24.1, 25.1...
## $ employed       <dbl> 23986, 85953, 8597, 8294, 22189, 3865, 7813, 47401, ...
## $ private_work   <dbl> 73.6, 81.5, 71.8, 76.8, 82.0, 79.5, 77.4, 74.1, 85.1...
## $ public_work    <dbl> 20.9, 12.3, 20.8, 16.1, 13.5, 15.1, 16.2, 20.8, 12.1...
## $ self_employed  <dbl> 5.5, 5.8, 7.3, 6.7, 4.2, 5.4, 6.2, 5.0, 2.8, 7.9, 4....
## $ family_work    <dbl> 0.0, 0.4, 0.1, 0.4, 0.4, 0.0, 0.2, 0.1, 0.0, 0.5, 0....
## $ unemployment   <dbl> 7.6, 7.5, 17.6, 8.3, 7.7, 18.0, 10.9, 12.3, 8.9, 7.9...

As specified we have use the function glimpse(). Its a combination of both dim(),str() and head() functions. It gives us the dimensions of the dataset, the structure as well as the first few observations of the datasets.

Q3 - Are there any missing values? How will you handle missing values? Will you impute a missing value with, for example, a mean or median value for the entire column, or will you remove the entire observation? Give a rationale for your decision and show any code/output to handle missing values.

sum(is.na(acs_country_data))

## [1] 2

There are 2 missing values in the entire dataset.

apply(is.na(acs_country_data),2,sum)

##      census_id          state         county      total_pop            men 
##              0              0              0              0              0 
##          women       hispanic          white          black         native 
##              0              0              0              0              0 
##          asian        pacific        citizen         income income_per_cap 
##              0              0              0              1              0 
##        poverty  child_poverty   professional        service         office 
##              0              1              0              0              0 
##   construction     production          drive        carpool        transit 
##              0              0              0              0              0 
##           walk   other_transp   work_at_home   mean_commute       employed 
##              0              0              0              0              0 
##   private_work    public_work  self_employed    family_work   unemployment 
##              0              0              0              0              0

From the above observation, there is one missing value in income and 1 in child_poverty. We can remove this two observations as this won’t impact much moving forward with data modelling.

acs_country_data <- acs_country_data %>% filter(is.na(income)==FALSE)
acs_country_data <- acs_country_data %>% filter(is.na(child_poverty)==FALSE)

sum(is.na(acs_country_data))

## [1] 0

apply(is.na(acs_country_data),2,sum)

##      census_id          state         county      total_pop            men 
##              0              0              0              0              0 
##          women       hispanic          white          black         native 
##              0              0              0              0              0 
##          asian        pacific        citizen         income income_per_cap 
##              0              0              0              0              0 
##        poverty  child_poverty   professional        service         office 
##              0              0              0              0              0 
##   construction     production          drive        carpool        transit 
##              0              0              0              0              0 
##           walk   other_transp   work_at_home   mean_commute       employed 
##              0              0              0              0              0 
##   private_work    public_work  self_employed    family_work   unemployment 
##              0              0              0              0              0

If we verify we can see that the missing values are been handled.

Q4 - Summary of the Dataset

The dataset definitely contains skewed numeric variables. Though we are not removing the outlier values in variables such as population, ‘women’ , ‘men’ etc. at this point as we would need further information to ensure these values are infact data descrepancies. Since numeric variables such proportion of a particular race in a county can vary widely, we have currently not removed the extreme values from the dataset .

summary(acs_country_data)

##    census_id          state             county       total_pop       
##  Min.   : 1001   Texas   : 253   Washington:  31   Min.   :     267  
##  1st Qu.:18179   Georgia : 159   Jefferson :  26   1st Qu.:   11036  
##  Median :29176   Virginia: 133   Franklin  :  25   Median :   25793  
##  Mean   :30383   Kentucky: 120   Jackson   :  24   Mean   :  100801  
##  3rd Qu.:45080   Missouri: 115   Lincoln   :  24   3rd Qu.:   67620  
##  Max.   :56045   Kansas  : 105   Madison   :  20   Max.   :10038388  
##                  (Other) :2255   (Other)   :2990                     
##       men              women            hispanic          white      
##  Min.   :    136   Min.   :    131   Min.   : 0.000   Min.   : 0.90  
##  1st Qu.:   5551   1st Qu.:   5488   1st Qu.: 1.900   1st Qu.:65.67  
##  Median :  12838   Median :  12916   Median : 3.700   Median :84.65  
##  Mean   :  49597   Mean   :  51204   Mean   : 8.819   Mean   :77.31  
##  3rd Qu.:  33328   3rd Qu.:  34123   3rd Qu.: 9.000   3rd Qu.:93.33  
##  Max.   :4945351   Max.   :5093037   Max.   :98.700   Max.   :99.80  
##                                                                      
##      black            native           asian           pacific        
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.00000  
##  1st Qu.: 0.600   1st Qu.: 0.100   1st Qu.: 0.200   1st Qu.: 0.00000  
##  Median : 2.100   Median : 0.300   Median : 0.500   Median : 0.00000  
##  Mean   : 8.885   Mean   : 1.763   Mean   : 1.253   Mean   : 0.07357  
##  3rd Qu.:10.200   3rd Qu.: 0.600   3rd Qu.: 1.200   3rd Qu.: 0.00000  
##  Max.   :85.900   Max.   :92.100   Max.   :41.600   Max.   :11.10000  
##                                                                       
##     citizen            income       income_per_cap     poverty    
##  Min.   :    199   Min.   : 19328   Min.   : 8292   Min.   : 1.4  
##  1st Qu.:   8276   1st Qu.: 38826   1st Qu.:20470   1st Qu.:12.0  
##  Median :  19454   Median : 45095   Median :23575   Median :16.0  
##  Mean   :  70849   Mean   : 46824   Mean   :24331   Mean   :16.7  
##  3rd Qu.:  50795   3rd Qu.: 52248   3rd Qu.:27138   3rd Qu.:20.3  
##  Max.   :6046749   Max.   :123453   Max.   :65600   Max.   :53.3  
##                                                                   
##  child_poverty    professional      service          office     
##  Min.   : 0.00   Min.   :13.50   Min.   : 5.00   Min.   : 4.10  
##  1st Qu.:16.10   1st Qu.:26.70   1st Qu.:15.90   1st Qu.:20.20  
##  Median :22.50   Median :30.00   Median :18.00   Median :22.40  
##  Mean   :23.29   Mean   :31.05   Mean   :18.25   Mean   :22.13  
##  3rd Qu.:29.50   3rd Qu.:34.42   3rd Qu.:20.20   3rd Qu.:24.30  
##  Max.   :72.30   Max.   :74.00   Max.   :36.60   Max.   :35.40  
##                                                                 
##   construction     production        drive         carpool     
##  Min.   : 1.70   Min.   : 0.00   Min.   : 5.2   Min.   : 0.00  
##  1st Qu.: 9.80   1st Qu.:11.50   1st Qu.:76.6   1st Qu.: 8.50  
##  Median :12.20   Median :15.40   Median :80.6   Median : 9.90  
##  Mean   :12.75   Mean   :15.82   Mean   :79.1   Mean   :10.33  
##  3rd Qu.:15.00   3rd Qu.:19.40   3rd Qu.:83.6   3rd Qu.:11.90  
##  Max.   :40.30   Max.   :55.60   Max.   :94.6   Max.   :29.90  
##                                                                
##     transit             walk         other_transp    work_at_home   
##  Min.   : 0.0000   Min.   : 0.000   Min.   : 0.00   Min.   : 0.000  
##  1st Qu.: 0.1000   1st Qu.: 1.400   1st Qu.: 0.90   1st Qu.: 2.800  
##  Median : 0.4000   Median : 2.400   Median : 1.30   Median : 4.000  
##  Mean   : 0.9681   Mean   : 3.294   Mean   : 1.61   Mean   : 4.694  
##  3rd Qu.: 0.8000   3rd Qu.: 4.000   3rd Qu.: 1.90   3rd Qu.: 5.700  
##  Max.   :61.7000   Max.   :71.200   Max.   :39.10   Max.   :37.200  
##                                                                     
##   mean_commute      employed        private_work    public_work   
##  Min.   : 4.90   Min.   :    166   Min.   :29.50   Min.   : 5.80  
##  1st Qu.:19.30   1st Qu.:   4532   1st Qu.:70.90   1st Qu.:13.07  
##  Median :22.90   Median :  10657   Median :75.85   Median :16.10  
##  Mean   :23.15   Mean   :  46416   Mean   :74.45   Mean   :17.33  
##  3rd Qu.:26.60   3rd Qu.:  29272   3rd Qu.:79.80   3rd Qu.:20.10  
##  Max.   :44.00   Max.   :4635465   Max.   :88.30   Max.   :66.20  
##                                                                   
##  self_employed     family_work      unemployment   
##  Min.   : 0.000   Min.   :0.0000   Min.   : 0.000  
##  1st Qu.: 5.400   1st Qu.:0.1000   1st Qu.: 5.500  
##  Median : 6.900   Median :0.2000   Median : 7.500  
##  Mean   : 7.922   Mean   :0.2917   Mean   : 7.815  
##  3rd Qu.: 9.400   3rd Qu.:0.3000   3rd Qu.: 9.700  
##  Max.   :36.600   Max.   :9.8000   Max.   :29.400  
##

Q5 - How many counties have more women than men

For this question the analysis should count a combination of county and state, as multiple states would have the same county name.There are 1984 county-state combinations where number of women are greater than men.

nrow(subset(acs_country_data, women > men))

## [1] 1984

Q6 - How many counties have an unemployment rate lower than 10%?

There are 2419 county-state combinations with unemployment rate lower than 10%.

nrow(subset(acs_country_data, unemployment < 10.0))

## [1] 2419

Q7 - What are the top 10 counties with the highest mean commute?

top_n(acs_country_data[order(acs_country_data$mean_commute,decreasing = T),c(1,2,3,29)],10)

## Selecting by mean_commute

## # A tibble: 10 x 4
##    census_id state         county       mean_commute
##        <dbl> <fct>         <fct>               <dbl>
##  1     42103 Pennsylvania  Pike                 44  
##  2     36005 New York      Bronx                43  
##  3     24017 Maryland      Charles              42.8
##  4     51187 Virginia      Warren               42.7
##  5     36081 New York      Queens               42.6
##  6     36085 New York      Richmond             42.6
##  7     51193 Virginia      Westmoreland         42.5
##  8      8093 Colorado      Park                 42.4
##  9     36047 New York      Kings                41.7
## 10     54015 West Virginia Clay                 41.4

Q8- Top ten counties with lowest percentage of women

acs_country_data$percent_women=(acs_country_data$women/acs_country_data$total_pop)*100
acs_women_df<-acs_country_data[order(acs_country_data$percent_women),]
acs_women_df<-select(acs_women_df,1,2,3,36)
head(acs_women_df,10)

## # A tibble: 10 x 4
##    census_id state        county                 percent_women
##        <dbl> <fct>        <fct>                          <dbl>
##  1     42053 Pennsylvania Forest                          26.8
##  2      8011 Colorado     Bent                            31.4
##  3     51183 Virginia     Sussex                          31.5
##  4     13309 Georgia      Wheeler                         32.1
##  5      6035 California   Lassen                          33.2
##  6     48095 Texas        Concho                          33.3
##  7     13053 Georgia      Chattahoochee                   33.4
##  8      2013 Alaska       Aleutians East Borough          33.5
##  9     22125 Louisiana    West Feliciana                  33.6
## 10     32027 Nevada       Pershing                        33.7

Q9- Sum of Race Percentages

Part A: Top Ten lowest race percentage sums

acs_country_data$percent_race=rowSums(acs_country_data[, c(7:12)])
acs_race_df<-acs_country_data[order(acs_country_data$percent_race),]
acs_race_df<-select(acs_race_df,3,37)
head(acs_race_df,10)

## # A tibble: 10 x 2
##    county                   percent_race
##    <fct>                           <dbl>
##  1 Hawaii                           76.4
##  2 Maui                             79.2
##  3 Mayes                            79.7
##  4 Honolulu                         81.5
##  5 Pontotoc                         82.8
##  6 Grundy                           83  
##  7 Yakutat City and Borough         83.4
##  8 Johnston                         84  
##  9 Kauai                            84.1
## 10 Alfalfa                          85.1

The top ten countries with teh lowest racal percentage sums are Hawaii, Maui, Mayes, Honolulu, Pontotoc, Grundy, Yakutat City and Borough, Johnston, Kauai and Alfalfa.

Part B: Top ten states with lowest race percentage sums

acs_race_state_df=group_by(acs_country_data, state) %>% summarize(average_race_percent = mean(percent_race))
acs_race_state_df=acs_race_state_df[order(acs_race_state_df$average_race_percent),]
head(acs_race_state_df,1)

## # A tibble: 1 x 2
##   state  average_race_percent
##   <fct>                 <dbl>
## 1 Hawaii                 80.3

Hawaii has the lowest average racial percentage sum at 84%

Part C: counties with sum greater than 100%

nrow(subset(acs_country_data,percent_race>100))

## [1] 5

5 Entries have a racial percentage sum above 100%

Part D: Counties with sum at 100%

nrow(subset(acs_country_data,percent_race==100))

## [1] 38

38 countires have a racial percentage sum of exactly 100%

Question 10: Carpool Rank

Part A: Generating Carpool Rank

acs_country_data$carpool_rank=min_rank(-(acs_country_data$carpool))
acs_carpool_data=acs_country_data[order(acs_country_data$carpool_rank),]

Part B: Top 10 highest ranked countries for carpooling

acs_carpool<-select(acs_carpool_data,1,2,3,24,38)
head(acs_carpool,10)

## # A tibble: 10 x 5
##    census_id state    county   carpool carpool_rank
##        <dbl> <fct>    <fct>      <dbl>        <int>
##  1     13061 Georgia  Clay        29.9            1
##  2     18087 Indiana  LaGrange    27              2
##  3     13165 Georgia  Jenkins     25.3            3
##  4      5133 Arkansas Sevier      24.4            4
##  5     20175 Kansas   Seward      23.4            5
##  6     48079 Texas    Cochran     22.8            6
##  7     48247 Texas    Jim Hogg    22.6            7
##  8     48393 Texas    Roberts     22.4            8
##  9     39075 Ohio     Holmes      21.8            9
## 10     21197 Kentucky Powell      21.6           10

Top 10 highest ranked countries for carpooling are Clay, LaGrange, Jenkins, Sevier, Seward, Cochran, Jim Hogg, Roberts, Holmes, and Powell

Part C: Top 10 lowest ranked Countries for Carpooling

tail(acs_carpool,10)

## # A tibble: 10 x 5
##    census_id state        county      carpool carpool_rank
##        <dbl> <fct>        <fct>         <dbl>        <int>
##  1     51720 Virginia     Norton city     2.8         3130
##  2     30019 Montana      Daniels         2.6         3132
##  3     31057 Nebraska     Dundy           2.6         3132
##  4     13309 Georgia      Wheeler         2.3         3134
##  5     38029 North Dakota Emmons          2.3         3134
##  6     36061 New York     New York        1.9         3136
##  7     31183 Nebraska     Wheeler         1.3         3137
##  8     48235 Texas        Irion           0.9         3138
##  9     48261 Texas        Kenedy          0           3139
## 10     48269 Texas        King            0           3139

Top 10 lowest ranked countries for carpooling are Norton City, Daniels, Dundy, Wheeler, Emmons, New York, Wheeler, Irion, Kendedy, and King

Part D: Top Ranked State for carpooling

acs_carpool_state=group_by(acs_country_data, state) %>% summarize(average_carpool_rate = mean(carpool))
acs_carpool_state=acs_carpool_state[order(-acs_carpool_state$average_carpool_rate),]
head(acs_carpool_state,1)

## # A tibble: 1 x 2
##   state  average_carpool_rate
##   <fct>                 <dbl>
## 1 Hawaii                 12.8

Number one state for highest average rate of carpooling is Alaska

Part E: Top 5 ranked states for carpooling

head(acs_carpool_state,5)

## # A tibble: 5 x 2
##   state    average_carpool_rate
##   <fct>                   <dbl>
## 1 Hawaii                   12.8
## 2 Alaska                   12.1
## 3 Arkansas                 11.9
## 4 Utah                     11.9
## 5 Texas                    11.8

The top 5 states for carpooling on average are Alsaka, Arkansas, Utah, Texas, and Nevada