Working with Cleveland plots - looking at the data using glimpse

library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
library(ggplot2)
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following objects are masked from 'package:readr':
## 
##     col_factor, col_numeric
library(openintro)
## Please visit openintro.org for free statistics materials
## 
## Attaching package: 'openintro'
## The following object is masked from 'package:datasets':
## 
##     cars
library(dplyr)

glimpse(countyComplete)
## Observations: 3,143
## Variables: 53
## $ state                                     <fctr> Alabama, Alabama, A...
## $ name                                      <fctr> Autauga County, Bal...
## $ FIPS                                      <dbl> 1001, 1003, 1005, 10...
## $ pop2010                                   <dbl> 54571, 182265, 27457...
## $ pop2000                                   <dbl> 43671, 140415, 29038...
## $ age_under_5                               <dbl> 6.6, 6.1, 6.2, 6.0, ...
## $ age_under_18                              <dbl> 26.8, 23.0, 21.9, 22...
## $ age_over_65                               <dbl> 12.0, 16.8, 14.2, 12...
## $ female                                    <dbl> 51.3, 51.1, 46.9, 46...
## $ white                                     <dbl> 78.5, 85.7, 48.0, 75...
## $ black                                     <dbl> 17.7, 9.4, 46.9, 22....
## $ native                                    <dbl> 0.4, 0.7, 0.4, 0.3, ...
## $ asian                                     <dbl> 0.9, 0.7, 0.4, 0.1, ...
## $ pac_isl                                   <dbl> NA, NA, NA, NA, NA, ...
## $ two_plus_races                            <dbl> 1.6, 1.5, 0.9, 0.9, ...
## $ hispanic                                  <dbl> 2.4, 4.4, 5.1, 1.8, ...
## $ white_not_hispanic                        <dbl> 77.2, 83.5, 46.8, 75...
## $ no_move_in_one_plus_year                  <dbl> 86.3, 83.0, 83.0, 90...
## $ foreign_born                              <dbl> 2.0, 3.6, 2.8, 0.7, ...
## $ foreign_spoken_at_home                    <dbl> 3.7, 5.5, 4.7, 1.5, ...
## $ hs_grad                                   <dbl> 85.3, 87.6, 71.9, 74...
## $ bachelors                                 <dbl> 21.7, 26.8, 13.5, 10...
## $ veterans                                  <dbl> 5817, 20396, 2327, 1...
## $ mean_work_travel                          <dbl> 25.1, 25.8, 23.8, 28...
## $ housing_units                             <dbl> 22135, 104061, 11829...
## $ home_ownership                            <dbl> 77.5, 76.7, 68.0, 82...
## $ housing_multi_unit                        <dbl> 7.2, 22.6, 11.1, 6.6...
## $ median_val_owner_occupied                 <dbl> 133900, 177200, 8820...
## $ households                                <dbl> 19718, 69476, 9795, ...
## $ persons_per_household                     <dbl> 2.70, 2.50, 2.52, 3....
## $ per_capita_income                         <dbl> 24568, 26469, 15875,...
## $ median_household_income                   <dbl> 53255, 50147, 33219,...
## $ poverty                                   <dbl> 10.6, 12.2, 25.0, 12...
## $ private_nonfarm_establishments            <dbl> 877, 4812, 522, 318,...
## $ private_nonfarm_employment                <dbl> 10628, 52233, 7990, ...
## $ percent_change_private_nonfarm_employment <dbl> 16.6, 17.4, -27.0, -...
## $ nonemployment_establishments              <dbl> 2971, 14175, 1527, 1...
## $ firms                                     <dbl> 4067, 19035, 1667, 1...
## $ black_owned_firms                         <dbl> 15.2, 2.7, NA, 14.9,...
## $ native_owned_firms                        <dbl> NA, 0.4, NA, NA, NA,...
## $ asian_owned_firms                         <dbl> 1.3, 1.0, NA, NA, NA...
## $ pac_isl_owned_firms                       <dbl> NA, NA, NA, NA, NA, ...
## $ hispanic_owned_firms                      <dbl> 0.7, 1.3, NA, NA, NA...
## $ women_owned_firms                         <dbl> 31.7, 27.3, 27.0, NA...
## $ manufacturer_shipments_2007               <dbl> NA, 1410273, NA, 0, ...
## $ mercent_whole_sales_2007                  <dbl> NA, NA, NA, NA, NA, ...
## $ sales                                     <dbl> 598175, 2966489, 188...
## $ sales_per_capita                          <dbl> 12003, 17166, 6334, ...
## $ accommodation_food_service                <dbl> 88157, 436955, NA, 1...
## $ building_permits                          <dbl> 191, 696, 10, 8, 18,...
## $ fed_spending                              <dbl> 331142, 1119082, 240...
## $ area                                      <dbl> 594.44, 1589.78, 884...
## $ density                                   <dbl> 91.8, 114.6, 31.0, 3...

Select only the variables needed: state and pop2010. Cleveland plots generally pair one categorical variable with one numeric variable; anything more would make the plot too crowded.

cc <- countyComplete %>%
  select(state, pop2010)

# Group by state (the result is only printed here, not saved back to cc)
cc %>%
  group_by(state)
## Source: local data frame [3,143 x 2]
## Groups: state [51]
## 
##      state pop2010
## *   <fctr>   <dbl>
## 1  Alabama   54571
## 2  Alabama  182265
## 3  Alabama   27457
## 4  Alabama   22915
## 5  Alabama   57322
## 6  Alabama   10914
## 7  Alabama   20947
## 8  Alabama  118572
## 9  Alabama   34215
## 10 Alabama   25989
## # ... with 3,133 more rows
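# Bar chart of the mean county population per state
# (with ggplot2 older than 3.3.0, use fun.y instead of fun)
ggplot(cc, aes(x = factor(state), y = pop2010)) +
  stat_summary(fun = "mean", geom = "bar")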
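
The bar chart computes the mean inside stat_summary, so the per-state numbers never appear as a table. A minimal sketch of computing them explicitly is below. It uses a small made-up stand-in for cc (the names toy_cc, state_means and mean_pop2010, and all the numbers, are illustrative and not taken from countyComplete); swap in cc to run it on the real data.

```r
library(dplyr)

# Made-up stand-in for `cc`: same two columns, illustrative values only.
toy_cc <- data.frame(
  state   = factor(c("Alabama", "Alabama", "Alaska", "Alaska", "Arizona")),
  pop2010 = c(100, 300, 50, 150, 1000)
)

# group_by() only marks the groups; summarise() collapses them
# to one row per state holding the mean of pop2010.
state_means <- toy_cc %>%
  group_by(state) %>%
  summarise(mean_pop2010 = mean(pop2010))

print(state_means)
```

Note that group_by on its own leaves the rows unchanged, which is why the printed output above still shows all 3,143 counties.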

With the states on the x-axis of the bar chart, all the names are jumbled together. Rotating the labels 90 degrees helps, but they are still difficult to read. A Cleveland plot swaps the x and y axes, so many factor levels stay readable without losing accuracy.
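
For comparison, here is a rough sketch of the 90 degree label rotation mentioned above, done through theme(). It runs on a small made-up stand-in for cc (toy_cc and its values are illustrative only) and assumes ggplot2 3.3.0 or newer for the fun argument.

```r
library(ggplot2)

# Made-up stand-in for `cc`: same two columns, illustrative values only.
toy_cc <- data.frame(
  state   = factor(c("Alabama", "Alabama", "Alaska", "Alaska", "Arizona")),
  pop2010 = c(100, 300, 50, 150, 1000)
)

# Same bar chart as before, with the x-axis labels turned 90 degrees.
# hjust/vjust keep each rotated label lined up with its tick mark.
rotated <- ggplot(toy_cc, aes(x = state, y = pop2010)) +
  stat_summary(fun = "mean", geom = "bar") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

print(rotated)
```

With only a few states this looks fine; with all 51 the reader still has to tilt their head, which is the case for flipping the axes instead.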

# Increased the plot size (10 and 11) to make room for the state labels
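cc1 <- ggplot(cc, aes(x = factor(state), y = pop2010)) +
  # one red dot per county
  geom_point(col = "red", size = 2) +
  # dashed guide line for each state, spanning the overall min and max of pop2010
  geom_segment(aes(x = state,
                   xend = state,
                   y = min(pop2010),
                   yend = max(pop2010)),
               linetype = "dashed",
               size = 0.1) +
  labs(title = "State Dot Plot",
       subtitle = "State vs. Pop2010") +
  coord_flip()  # flip the axes so the state names read horizontally

print(cc1)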
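
The plot above draws one dot per county. A common variant of the Cleveland dot plot shows a single summary dot per category, sorted by value. The sketch below is one way to do that, not the method used above: it again uses a made-up stand-in for cc (toy_cc, state_means, mean_pop2010 and the numbers are all illustrative) and orders the states with reorder().

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for `cc`: same two columns, illustrative values only.
toy_cc <- data.frame(
  state   = factor(c("Alabama", "Alabama", "Alaska", "Alaska", "Arizona")),
  pop2010 = c(100, 300, 50, 150, 1000)
)

# Collapse to one row per state.
state_means <- toy_cc %>%
  group_by(state) %>%
  summarise(mean_pop2010 = mean(pop2010))

# States go straight on the y-axis (no coord_flip needed),
# sorted from the smallest mean at the bottom to the largest at the top.
ordered_dots <- ggplot(state_means,
                       aes(x = mean_pop2010,
                           y = reorder(state, mean_pop2010))) +
  geom_point(colour = "red", size = 2) +
  labs(x = "Mean county population (2010)", y = "State")

print(ordered_dots)
```

Sorting by value makes the ranking readable at a glance, at the cost of losing the alphabetical lookup order.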