We’ll be using the county_complete data set from the openintro package. It has 188 variables, but we’ll only be using the following columns:
In the code chunk below:
The resulting data set should have a total of 3,107 rows. To clean the data, use two or three dplyr verbs we’ve seen in class so far.
## # A tibble: 3,107 × 9
## name state fips households_2019 Spanish European Asian_Pacific Other
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Autauga Cou… Alab… 1001 21397 2.9 0.3 1.8 0.2
## 2 Baldwin Cou… Alab… 1003 80930 4.6 1.8 0.6 0
## 3 Barbour Cou… Alab… 1005 9345 5.2 1.1 0.6 0
## 4 Bibb County Alab… 1007 6891 1.9 0.5 0 0
## 5 Blount Coun… Alab… 1009 20847 6.6 0.9 0.1 0.2
## 6 Bullock Cou… Alab… 1011 3521 3.6 1.6 0.4 0
## 7 Butler Coun… Alab… 1013 6506 1.4 0.6 0.2 0
## 8 Calhoun Cou… Alab… 1015 44605 3.1 0.8 0.8 0.1
## 9 Chambers Co… Alab… 1017 13448 1.4 0.5 1 0
## 10 Cherokee Co… Alab… 1019 10737 2 0.1 0.5 0.1
## # ℹ 3,097 more rows
## # ℹ 1 more variable: limited <dbl>
In Question 1, you’ll make a graph of the percentage of people who speak limited English by state.
Create a data set with 48 rows (one for each continental state) and the following three columns:
Save the results as states_q1a. Display the results from highest to lowest in the knitted document (make sure to use tibble() as well to not show all 48 states). You can check your results in Brightspace.
## # A tibble: 48 × 3
## state households limited_per
## <chr> <dbl> <dbl>
## 1 California 13044266 8.95
## 2 New York 7343234 8.00
## 3 Texas 9691647 7.73
## 4 New Jersey 3231874 7.02
## 5 Florida 7736311 6.92
## 6 Nevada 1098602 5.98
## 7 Massachusetts 2617497 5.84
## 8 Rhode Island 410489 5.47
## 9 Connecticut 1370746 5.22
## 10 New Mexico 780249 5.21
## # ℹ 38 more rows
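As a rough illustration of the kind of state-level summary described above, here is a minimal sketch on made-up toy data (not the real county_complete values). It assumes that limited_per is a household-weighted average of the county percentages; that weighting is an assumption, and the names toy_counties and toy_states are hypothetical.

```r
library(dplyr)

# Toy county-level data (made-up numbers, not from county_complete)
toy_counties <- tibble(
  state           = c("A", "A", "B", "B"),
  households_2019 = c(100, 300, 200, 200),
  limited         = c(2, 6, 1, 3)
)

toy_states <- toy_counties %>%
  group_by(state) %>%
  summarize(
    # Total households in the state
    households  = sum(households_2019),
    # Assumption: weight each county's percentage by its household count
    limited_per = sum(households_2019 * limited) / sum(households_2019)
  ) %>%
  # Sort from highest to lowest percentage
  arrange(desc(limited_per))

toy_states
```

Because summarize() returns a tibble when given a tibble, printing the result shows only the first rows rather than every state.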
Create a data set with the three columns from Question 1A, plus the latitude and longitude of the outline for each state. Save the results as state_lines, and use the code provided at the bottom to display the same columns as seen in Brightspace.
## long lat group order region subregion households
## 1 -122.37806 37.40841 4 941 california <NA> 13044266
## 2 -81.25114 31.55852 10 2376 georgia <NA> 3758798
## 3 -114.39675 46.66168 11 2832 idaho <NA> 630008
## 4 -90.31534 30.32665 17 4500 louisiana <NA> 1739497
## 5 -68.77785 44.52455 18 5249 maine <NA> 559921
## 6 -75.21790 38.02721 19 5479 maryland <NA> 2205204
## 7 -76.58727 39.23615 19 5733 maryland <NA> 2205204
## 8 -79.24007 33.36906 47 11508 south carolina <NA> 1921862
## 9 -98.04454 33.99932 50 13146 texas <NA> 9691647
## 10 -73.34433 44.94281 52 13478 vermont <NA> 260029
## limited_per
## 1 8.9457728
## 2 2.8300776
## 3 2.0222857
## 4 1.9297767
## 5 0.9558641
## 6 3.2835147
## 7 3.2835147
## 8 1.4331921
## 9 7.7322729
## 10 0.7482015
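The join behind a data set like this can be sketched on toy data. The sketch below assumes the outline has a lowercase region column (as in the output above, and as ggplot2::map_data("state") produces), so the state names are lowercased before joining. The objects toy_outline, toy_states, and toy_lines are hypothetical stand-ins, and the numbers are made up.

```r
library(dplyr)

# Toy outline: one row per boundary point, lowercase region names
toy_outline <- tibble(
  long   = c(-1, 0, 1, 5, 6),
  lat    = c(0, 1, 0, 2, 3),
  group  = c(1, 1, 1, 2, 2),
  order  = 1:5,
  region = c("state a", "state a", "state a", "state b", "state b")
)

# Toy state summary: one row per state, capitalized names
toy_states <- tibble(
  state       = c("State A", "State B"),
  households  = c(400, 400),
  limited_per = c(5, 2)
)

toy_lines <- toy_outline %>%
  left_join(
    # Build a lowercase key that matches the outline's region column
    toy_states %>% mutate(region = tolower(state)) %>% select(-state),
    by = "region"
  )

toy_lines
```

Each boundary point keeps its own row, and the state-level values are repeated along the outline, which is what a polygon layer needs for a fill color.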
Create the graph seen in Brightspace. To get the color guide to be similar to what is in the solutions, use scale_fill_fermenter() (you’ll need to use some specific arguments!)
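To show how scale_fill_fermenter() bins a continuous fill into color steps, here is a small sketch on two made-up squares. The palette, direction, and breaks chosen here are illustrative assumptions only, not the values used in the solutions, and toy_polys and toy_map are hypothetical names.

```r
library(ggplot2)

# Two toy squares standing in for state outlines (made-up values)
toy_polys <- data.frame(
  long        = c(0, 1, 1, 0, 1, 2, 2, 1),
  lat         = c(0, 0, 1, 1, 0, 0, 1, 1),
  group       = rep(c(1, 2), each = 4),
  limited_per = rep(c(2, 7), each = 4)
)

toy_map <- ggplot(
  toy_polys,
  aes(x = long, y = lat, group = group, fill = limited_per)
) +
  geom_polygon(color = "white") +
  # Binned ColorBrewer scale; these argument values are assumptions
  scale_fill_fermenter(
    palette   = "Blues",
    direction = 1,
    breaks    = c(4, 6)
  ) +
  coord_fixed() +
  theme_void()

toy_map
```

Changing palette, direction, and breaks is the usual way to adjust how the stepped color guide looks.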
Next, you’ll create a graph that displays the most spoken language group (Spanish, Other European, Asian/Pacific Islander, Other) by county.
For each county, find the column with the largest percentage between Spanish, European, Asian_Pacific, and Other and save the result in most_common. The resulting data set should have 5 columns:
In the appropriate dplyr verb, include the argument with_ties = F to keep only 1 row per county (in case of a tie).
You’ll need to use the counties6 data set created at the beginning. Save the results as counties_2a. Once you’ve finished, uncomment the code at the bottom of the code chunk to display the counties seen in the solutions in Brightspace.
Hint: you’ll only need to use two (max three) dplyr verbs, and none of them is mutate().
## # A tibble: 3,107 × 5
## name state fips language lang_per
## <chr> <chr> <dbl> <chr> <dbl>
## 1 Rock County Wisconsin 55105 Spanish 5.2
## 2 Hancock County Kentucky 21091 Spanish 1.3
## 3 Mason County Illinois 17125 Spanish 0.9
## 4 Wahkiakum County Washington 53069 Asian_Pacific 2.3
## 5 San Patricio County Texas 48409 Spanish 45.5
## 6 Riley County Kansas 20161 Spanish 5.7
## 7 Decatur County Georgia 13087 Spanish 2.1
## 8 Skamania County Washington 53059 Spanish 3.1
## 9 Murray County Oklahoma 40099 Spanish 4.3
## 10 Caledonia County Vermont 50005 European 3.4
## # ℹ 3,097 more rows
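One possible way to find the largest of several columns per row is to reshape to long format and keep the top row per group. The sketch below does this on made-up toy data; it is an illustration under assumptions, not necessarily the intended solution. Note that pivot_longer() comes from tidyr, and toy_counties and toy_most_common are hypothetical names.

```r
library(dplyr)
library(tidyr)

# Toy data: County 3 has a tie between Spanish and European
toy_counties <- tibble(
  name          = c("County 1", "County 2", "County 3"),
  Spanish       = c(5.2, 0.4, 1.0),
  European      = c(1.1, 3.4, 1.0),
  Asian_Pacific = c(0.3, 0.2, 0.5),
  Other         = c(0.0, 0.1, 0.2)
)

toy_most_common <- toy_counties %>%
  # One row per county and language group
  pivot_longer(
    cols      = Spanish:Other,
    names_to  = "language",
    values_to = "lang_per"
  ) %>%
  group_by(name) %>%
  # Keep the single largest percentage; with_ties = FALSE drops tied extras
  slice_max(lang_per, n = 1, with_ties = FALSE) %>%
  ungroup()

toy_most_common
```

Without with_ties = FALSE, County 3 would contribute two rows, so the result would no longer have exactly one row per county.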
Similar to Question 1B), merge the data set created in Question 2A) with a data set that has the longitude and latitude outlines of each of the counties. Save it as county_lines. Make sure that the final data set has exactly 87,949 rows and 8 columns.
See the results in Brightspace. It will probably take multiple steps! Make sure that the resultin
## # A tibble: 87,949 × 8
## long lat group order fips state_county language lang_per
## <dbl> <dbl> <dbl> <int> <dbl> <chr> <chr> <dbl>
## 1 -86.5 32.3 1 1 1001 alabama,autauga Spanish 2.9
## 2 -86.5 32.4 1 2 1001 alabama,autauga Spanish 2.9
## 3 -86.5 32.4 1 3 1001 alabama,autauga Spanish 2.9
## 4 -86.6 32.4 1 4 1001 alabama,autauga Spanish 2.9
## 5 -86.6 32.4 1 5 1001 alabama,autauga Spanish 2.9
## 6 -86.6 32.4 1 6 1001 alabama,autauga Spanish 2.9
## 7 -86.6 32.4 1 7 1001 alabama,autauga Spanish 2.9
## 8 -86.6 32.4 1 8 1001 alabama,autauga Spanish 2.9
## 9 -86.6 32.4 1 9 1001 alabama,autauga Spanish 2.9
## 10 -86.6 32.4 1 10 1001 alabama,autauga Spanish 2.9
## # ℹ 87,939 more rows
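A multi-step merge of this kind might look like the sketch below, which uses tiny toy tables in place of the real data. It assumes the outline has region and subregion columns (as ggplot2::map_data("county") does) and that a lookup table links a "state,county" name to a FIPS code (as maps::county.fips does with its polyname column). Whether the solutions use these sources is an assumption, and all toy_ names are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy outline points for one county
toy_outline <- tibble(
  long      = c(-86.5, -86.5, -86.6),
  lat       = c(32.3, 32.4, 32.4),
  group     = 1,
  order     = 1:3,
  region    = "alabama",
  subregion = "autauga"
)

# Toy lookup from "state,county" name to FIPS code
toy_fips <- tibble(fips = 1001, polyname = "alabama,autauga")

# Toy version of the Question 2A result
toy_2a <- tibble(fips = 1001, language = "Spanish", lang_per = 2.9)

toy_county_lines <- toy_outline %>%
  # Step 1: combine state and county into a single key
  unite("state_county", region, subregion, sep = ",") %>%
  # Step 2: attach the FIPS code through the lookup table
  inner_join(toy_fips, by = c("state_county" = "polyname")) %>%
  # Step 3: attach the language columns by FIPS code
  inner_join(toy_2a, by = "fips") %>%
  select(long, lat, group, order, fips, state_county, language, lang_per)

toy_county_lines
```

Joining on the numeric FIPS code avoids mismatches caused by differences in how county names are spelled across data sets.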
Create the graph seen in Brightspace. The hex codes for the colors in the graph are: