Harold Nelson
2/9/2022
Load the tidyverse and the dataset “county_clean.Rdata”. Import the file “state_region.csv” into the dataframe state_region. Use left_join() to add the region data to county_clean, then glimpse county_clean to confirm that the new columns are present.
library(tidyverse)
load("county_clean.Rdata")
state_region <- read_csv("state_region.csv")
county_clean <- county_clean %>%
  left_join(state_region, by = c("state" = "State"))
glimpse(county_clean)
## Rows: 3,135
## Columns: 17
## $ name <fct> Autauga County, Baldwin County, Barbour County, Bibb…
## $ state <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama…
## $ pop2000 <dbl> 43671, 140415, 29038, 20826, 51024, 11714, 21399, 11…
## $ pop2010 <dbl> 54571, 182265, 27457, 22915, 57322, 10914, 20947, 11…
## $ pop2017 <int> 55504, 212628, 25270, 22668, 58013, 10309, 19825, 11…
## $ pop_change <dbl> 1.48, 9.19, -6.22, 0.73, 0.68, -2.28, -2.69, -1.51, …
## $ poverty <dbl> 13.7, 11.8, 27.2, 15.2, 15.6, 28.5, 24.4, 18.6, 18.8…
## $ homeownership <dbl> 77.5, 76.7, 68.0, 82.9, 82.0, 76.9, 69.0, 70.7, 71.4…
## $ multi_unit <dbl> 7.2, 22.6, 11.1, 6.6, 3.7, 9.9, 13.7, 14.3, 8.7, 4.3…
## $ unemployment_rate <dbl> 3.86, 3.99, 5.90, 4.39, 4.02, 4.93, 5.49, 4.93, 4.08…
## $ metro <fct> yes, yes, no, yes, yes, no, no, yes, no, no, yes, no…
## $ median_edu <fct> some_college, some_college, hs_diploma, hs_diploma, …
## $ per_capita_income <dbl> 27841.70, 27779.85, 17891.73, 20572.05, 21367.39, 15…
## $ median_hh_income <int> 55317, 52562, 33368, 43404, 47412, 29655, 36326, 436…
## $ `State Code` <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"…
## $ Region <chr> "South", "South", "South", "South", "South", "South"…
## $ Division <chr> "East South Central", "East South Central", "East So…
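Beyond glimpsing the result, one optional sanity check is to look for counties that picked up no region in the join. The following is a minimal sketch, assuming county_clean already holds the joined result from the step above; rows whose state had no match in state_region would show NA in Region.

```r
library(tidyverse)

# Assumes county_clean is the joined dataframe created above.
# left_join() keeps every county, so unmatched states appear as NA in Region.
county_clean %>%
  filter(is.na(Region)) %>%
  count(state)
```

An empty result would suggest that every state name matched.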
Review Histogram. Display the distribution of per_capita_income using a histogram.
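One way to produce this histogram is sketched below, assuming county_clean is loaded as in the setup step. No bin settings are supplied, so ggplot2 falls back on its default of 30 bins and prints a message saying so.

```r
library(tidyverse)

# Assumes county_clean is already in the workspace.
county_clean %>%
  ggplot(aes(x = per_capita_income)) +
  geom_histogram()
```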
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
By default, a histogram displays the count of observations in each bin. You can instead ask that the data be normalized so that the total area of the histogram is equal to 1.0. To see this, insert aes(y = ..density..) in the call to geom_histogram().
The .. .. indicates to ggplot2 that this “variable” is not a column of the dataframe, but is computed by the stat when the plot is built.
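A possible version of the normalized histogram is sketched below, again assuming county_clean is in the workspace. Only the y aesthetic changes; the bins are the same as before.

```r
library(tidyverse)

# Assumes county_clean is already in the workspace.
# ..density.. asks for the computed density instead of the raw count.
# On ggplot2 3.4.0 and later, after_stat(density) is the preferred spelling.
county_clean %>%
  ggplot(aes(x = per_capita_income)) +
  geom_histogram(aes(y = ..density..))
```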
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
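The claim that the bars of a density-scaled histogram have total area 1 can be checked numerically. The sketch below uses base R's hist() rather than ggplot2, so the bins will not match the plot above exactly; histogram_area is a hypothetical helper written for this illustration, and the built-in faithful dataset stands in for county_clean so that the code runs on its own.

```r
# Hypothetical helper: total area of a density-scaled histogram of x.
histogram_area <- function(x, bins = 30) {
  h <- hist(x, breaks = bins, plot = FALSE)
  # Each bar's area is its height (density) times its width.
  sum(h$density * diff(h$breaks))
}

# faithful ships with base R; it is only a stand-in dataset here.
area <- histogram_area(faithful$eruptions)
print(area)
```

The same helper could be applied to county_clean$per_capita_income; the result should be 1 up to floating-point rounding.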
geom_density() is an alternative to geom_histogram() for displaying the distribution of a single quantitative variable.
Display the distribution of per_capita_income using geom_density() in place of geom_histogram().
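A minimal sketch of this step, assuming county_clean is in the workspace, swaps the geom and leaves everything else unchanged:

```r
library(tidyverse)

# Assumes county_clean is already in the workspace.
# geom_density() draws a smoothed estimate of the distribution.
county_clean %>%
  ggplot(aes(x = per_capita_income)) +
  geom_density()
```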
Add geom_histogram() to the previous graph for comparison.
county_clean %>%
  ggplot(aes(x = per_capita_income)) +
  geom_histogram(aes(y = ..density..)) +
  geom_density()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
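geom_density() has an optional parameter, adjust, which multiplies the default smoothing bandwidth. The default value is 1; smaller values give a more jagged curve and larger values a smoother one. Try setting it to 0.5 and 2.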
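The two settings might be tried as follows, assuming county_clean is in the workspace. Each call draws its own plot so the curves can be compared side by side.

```r
library(tidyverse)

# Assumes county_clean is already in the workspace.
# adjust = 0.5 uses half the default bandwidth.
county_clean %>%
  ggplot(aes(x = per_capita_income)) +
  geom_density(adjust = 0.5)

# adjust = 2 uses twice the default bandwidth.
county_clean %>%
  ggplot(aes(x = per_capita_income)) +
  geom_density(adjust = 2)
```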
Use geom_density() and add geom_rug(). The rug draws one tick mark along the axis for each observation.
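One possible version, assuming county_clean is in the workspace, simply adds the rug as a second layer. With only an x aesthetic mapped, the ticks appear along the bottom axis.

```r
library(tidyverse)

# Assumes county_clean is already in the workspace.
county_clean %>%
  ggplot(aes(x = per_capita_income)) +
  geom_density() +
  geom_rug()
```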
Facet the last graph by Region.
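A sketch of the faceted version follows, assuming county_clean is in the workspace and that the region column is named Region, as shown in the glimpse() output. facet_wrap() is used here; facet_grid() would be another reasonable choice.

```r
library(tidyverse)

# Assumes county_clean is already in the workspace.
# One panel per value of Region; any NA regions would get their own panel.
county_clean %>%
  ggplot(aes(x = per_capita_income)) +
  geom_density() +
  geom_rug() +
  facet_wrap(~ Region)
```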