Harold Nelson
2/5/2022
Load the tidyverse and the dataset “county_clean.Rdata”. Import the file “state_region.csv” into the dataframe state_region.
Do a glimpse of county_clean and state_region.
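A minimal sketch of this setup step. It assumes both files sit in the current working directory and that the .Rdata file stores an object named county_clean. read_csv() is used because the output below shows character columns and the unmodified column name `State Code`.

```r
library(tidyverse)

# Assumption: both files are in the current working directory.
load("county_clean.Rdata")                    # restores the object(s) saved in the file
state_region = read_csv("state_region.csv")   # readr keeps names such as `State Code` unchanged

glimpse(county_clean)
glimpse(state_region)
```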
## Rows: 3,135
## Columns: 14
## $ name <fct> Autauga County, Baldwin County, Barbour County, Bibb…
## $ state <fct> Alabama, Alabama, Alabama, Alabama, Alabama, Alabama…
## $ pop2000 <dbl> 43671, 140415, 29038, 20826, 51024, 11714, 21399, 11…
## $ pop2010 <dbl> 54571, 182265, 27457, 22915, 57322, 10914, 20947, 11…
## $ pop2017 <int> 55504, 212628, 25270, 22668, 58013, 10309, 19825, 11…
## $ pop_change <dbl> 1.48, 9.19, -6.22, 0.73, 0.68, -2.28, -2.69, -1.51, …
## $ poverty <dbl> 13.7, 11.8, 27.2, 15.2, 15.6, 28.5, 24.4, 18.6, 18.8…
## $ homeownership <dbl> 77.5, 76.7, 68.0, 82.9, 82.0, 76.9, 69.0, 70.7, 71.4…
## $ multi_unit <dbl> 7.2, 22.6, 11.1, 6.6, 3.7, 9.9, 13.7, 14.3, 8.7, 4.3…
## $ unemployment_rate <dbl> 3.86, 3.99, 5.90, 4.39, 4.02, 4.93, 5.49, 4.93, 4.08…
## $ metro <fct> yes, yes, no, yes, yes, no, no, yes, no, no, yes, no…
## $ median_edu <fct> some_college, some_college, hs_diploma, hs_diploma, …
## $ per_capita_income <dbl> 27841.70, 27779.85, 17891.73, 20572.05, 21367.39, 15…
## $ median_hh_income <int> 55317, 52562, 33368, 43404, 47412, 29655, 36326, 436…
## Rows: 51
## Columns: 4
## $ State <chr> "Alaska", "Alabama", "Arkansas", "Arizona", "California",…
## $ `State Code` <chr> "AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DC", "DE", "FL…
## $ Region <chr> "West", "South", "South", "West", "West", "West", "Northe…
## $ Division <chr> "Pacific", "East South Central", "West South Central", "M…
Do a simple scatterplot of per_capita_income on the y-axis against homeownership on the x-axis.
Use the alpha and size parameters of geom_point() to clean up the overplotting. Add a smoother.
county_clean %>%
  ggplot(aes(x = homeownership, y = per_capita_income)) +
  geom_point(size = .2, alpha = .2) +
  geom_smooth(color = "red")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
The variable median_edu is categorical and the variable per_capita_income is quantitative. Put the education variable on the x-axis and the income variable on the y-axis. Use geom_point().
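One possible version of the geom_point() plot, assuming county_clean is already loaded. Because the x variable is categorical, every point in a category falls on the same vertical line, so the overplotting is severe.

```r
library(tidyverse)

# Assumes county_clean is already loaded in the session.
p_point = county_clean %>%
  ggplot(aes(x = median_edu, y = per_capita_income)) +
  geom_point()

p_point
```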
Repeat the previous exercise with geom_jitter(). Play with the size parameter to get a value you like.
Make the graph look more professional by adding axis labels, a title, and a subtitle.
county_clean %>%
  ggplot(aes(x = median_edu, y = per_capita_income)) +
  geom_jitter(size = .5) +
  labs(x = "Median Educational Level",
       y = "Per Capita Income",
       title = "Per Capita Income by Education Level",
       subtitle = "US Counties 2017")
Use left_join() to join state_region to county_clean, matching the column state in county_clean to the column State in state_region. Glimpse the result.
county_clean = county_clean %>%
  left_join(state_region, by = c("state" = "State"))
glimpse(county_clean)
## Rows: 3,135
## Columns: 17
## $ name <fct> Autauga County, Baldwin County, Barbour County, Bibb…
## $ state <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama…
## $ pop2000 <dbl> 43671, 140415, 29038, 20826, 51024, 11714, 21399, 11…
## $ pop2010 <dbl> 54571, 182265, 27457, 22915, 57322, 10914, 20947, 11…
## $ pop2017 <int> 55504, 212628, 25270, 22668, 58013, 10309, 19825, 11…
## $ pop_change <dbl> 1.48, 9.19, -6.22, 0.73, 0.68, -2.28, -2.69, -1.51, …
## $ poverty <dbl> 13.7, 11.8, 27.2, 15.2, 15.6, 28.5, 24.4, 18.6, 18.8…
## $ homeownership <dbl> 77.5, 76.7, 68.0, 82.9, 82.0, 76.9, 69.0, 70.7, 71.4…
## $ multi_unit <dbl> 7.2, 22.6, 11.1, 6.6, 3.7, 9.9, 13.7, 14.3, 8.7, 4.3…
## $ unemployment_rate <dbl> 3.86, 3.99, 5.90, 4.39, 4.02, 4.93, 5.49, 4.93, 4.08…
## $ metro <fct> yes, yes, no, yes, yes, no, no, yes, no, no, yes, no…
## $ median_edu <fct> some_college, some_college, hs_diploma, hs_diploma, …
## $ per_capita_income <dbl> 27841.70, 27779.85, 17891.73, 20572.05, 21367.39, 15…
## $ median_hh_income <int> 55317, 52562, 33368, 43404, 47412, 29655, 36326, 436…
## $ `State Code` <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"…
## $ Region <chr> "South", "South", "South", "South", "South", "South"…
## $ Division <chr> "East South Central", "East South Central", "East So…
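A left join keeps every county even when its state has no match in state_region, so a quick check for unmatched rows can be worthwhile. The sketch below assumes the join above has been run and that state_region itself has no missing Region values. Under that assumption, any row with a missing Region is a county whose state name found no match.

```r
library(tidyverse)

# Assumes county_clean is the joined dataframe from the step above.
unmatched = county_clean %>%
  filter(is.na(Region)) %>%   # rows where the join found no matching State
  count(state)                # one row per unmatched state name

unmatched
```

An empty result suggests that every state name matched.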
Use geom_bar() and the fill aesthetic to describe the relationship between Region and median_edu. Make Region a factor first. In the call to geom_bar(), set color = "white".
county_clean = county_clean %>%
  mutate(Region = factor(Region))
county_clean %>%
  ggplot(aes(x = Region, fill = median_edu)) +
  geom_bar(color = "white")
The default value of the parameter position is “stack”. Try setting position = “dodge” and “dodge2”.
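A sketch of both variants, assuming county_clean has been joined and Region converted to a factor as above. The names p_dodge and p_dodge2 are only illustrative.

```r
library(tidyverse)

# Assumes county_clean already contains Region as a factor.

# "dodge" places the bars for each education level side by side.
p_dodge = county_clean %>%
  ggplot(aes(x = Region, fill = median_edu)) +
  geom_bar(color = "white", position = "dodge")

# "dodge2" also places bars side by side, with a little padding between them.
p_dodge2 = county_clean %>%
  ggplot(aes(x = Region, fill = median_edu)) +
  geom_bar(color = "white", position = "dodge2")

p_dodge
p_dodge2
```

Side-by-side bars make it easier to compare counts across education levels within a region, while the stacked default makes the regional totals easier to read.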