access to the shared posit.cloud space for this course (link in Brightspace)
an RPubs account (free)
GitHub Copilot account approved and enabled in posit.cloud
Go to the shared posit.cloud workspace for this class and open the lab03_assign03 project. Open the lab03.qmd file and complete the exercises. Below is an annotated guide to assist you. There is also a video in the Brightspace Todo section for this module.
We will be using the same two files from the joins chapter in your text. This time, however, we will clean up the files a bit before the join to illustrate how joining is easier when the columns you are matching on have identical names. Let’s start by loading the tidyverse family of packages, readxl to read in an xlsx file, gt for making pretty tables, and ggrepel for a better layout of the labels in our geom_label() chart, and then read in the two data files. We’ll use the message: false option to suppress the messages printed when the packages load.
There are eight exercises in this lab. Grading is shown in Section 3 at the end of the document. First we’ll glimpse the maine_age_by_county_2000_2020 file to see the column names and data.
The second tibble has better naming conventions, so we’ll change the column names in the first file to match the names and formatting of the second tibble. We’ll also drop Location since we don’t need it. After we complete this, we’ll perform a simple inner_join() on the tibbles.
Exercise 1
Change the names of the following columns in maine_county_pop_1960_2020:
`County Name` to county
State to state
Year to year
Population to population
We can do this with a simple select. Notice, since we don’t select Location it isn’t in our new version of the tibble.
maine_county_pop_1960_2020 <- maine_county_pop_1960_2020 |>
  select(county = `County Name`, state = State, year = Year, population = Population)

glimpse(maine_county_pop_1960_2020)
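The way select() renames and drops columns in one step can be seen on a small made-up tibble. This is a minimal sketch, assuming dplyr and tibble are installed; the toy tibble and its values are invented for illustration and are not the lab data. It also shows rename(), which keeps every column unless you drop it yourself.

```r
library(dplyr)
library(tibble)

# Made-up tibble that mimics the original column names (values are invented)
toy_pop <- tribble(
  ~`County Name`, ~State,  ~Year, ~Population, ~Location,
  "Alpha",        "Maine", 2020,  1000,        "Alpha, Maine",
  "Beta",         "Maine", 2020,  2000,        "Beta, Maine"
)

# select() renames and keeps only the listed columns, so Location is dropped
toy_selected <- toy_pop |>
  select(county = `County Name`, state = State, year = Year, population = Population)

# rename() changes names but keeps every column, so Location is removed separately
toy_renamed <- toy_pop |>
  rename(county = `County Name`, state = State, year = Year, population = Population) |>
  select(-Location)

names(toy_selected)
names(toy_renamed)
```

Both versions end with the same four column names; select() simply gets there in one call.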
In the text, we used a join_by clause, which was required because the column names were different.
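Once the key columns share names, inner_join() can work out the keys on its own. The following is a minimal sketch with invented toy tibbles (not the lab files), assuming dplyr 1.1.0 or later so that join_by() is available.

```r
library(dplyr)
library(tibble)

# Invented toy data; the names toy_pop and toy_age are hypothetical
toy_pop <- tribble(
  ~county, ~year, ~population,
  "Alpha", 2010,  900,
  "Alpha", 2020,  1000,
  "Beta",  2020,  2000
)

toy_age <- tribble(
  ~county, ~year, ~average_age,
  "Alpha", 2020,  41.5,
  "Beta",  2020,  44.0
)

# No join_by(): dplyr joins on every column the two tibbles share
# (here county and year) and prints a message naming those columns
joined_natural <- toy_pop |>
  inner_join(toy_age)

# Spelling the keys out gives the same result without the message
joined_explicit <- toy_pop |>
  inner_join(toy_age, join_by(county, year))

nrow(joined_natural)
names(joined_natural)
```

Relying on shared names is convenient, but it joins on all shared columns, so it is worth reading the message to confirm the keys are the ones you intended.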
Exercise 2
Use an inner_join() without a join_by(), as in maine_county_pop_1960_2020 |> inner_join(maine_age_by_county), to create a new tibble called maine_age_pop. Glimpse the data to see if the join worked.
Tip: Remember that maine_age_pop_2000_2020 only has three decades’ worth of data, comprising 48 records, while maine_county_pop_1960_2020 has several more decades and 112 records. Since we expect all of the maine_age_pop_2000_2020 records to match, our joined tibble should have 48 rows.
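A row-count expectation like this can be checked in code. The sketch below uses invented toy tibbles (not the lab files) and assumes each county and year pair appears at most once in the population tibble; with duplicate keys, the row counts would not line up this simply.

```r
library(dplyr)
library(tibble)

# Invented toy data: population covers three years, age covers only two
toy_pop <- tribble(
  ~county, ~year, ~population,
  "Alpha", 2000,  800,
  "Alpha", 2010,  900,
  "Alpha", 2020,  1000,
  "Beta",  2000,  1800,
  "Beta",  2010,  1900,
  "Beta",  2020,  2000
)

toy_age <- tribble(
  ~county, ~year, ~average_age,
  "Alpha", 2010,  40.1,
  "Alpha", 2020,  41.5,
  "Beta",  2010,  43.2,
  "Beta",  2020,  44.0
)

joined <- inner_join(toy_pop, toy_age, by = c("county", "year"))

# If every age row found a match, the joined tibble has as many rows as toy_age
all_matched <- nrow(joined) == nrow(toy_age)

# anti_join() returns the rows of toy_age with no match in toy_pop;
# zero rows means nothing was lost in the join
unmatched <- anti_join(toy_age, toy_pop, by = c("county", "year"))

all_matched
nrow(unmatched)
```

If the counts differ, the rows returned by anti_join() show which keys failed to match, often because of a spelling or formatting difference in the key columns.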
The code below shows us the county, year, average_age, and population for the county with the highest average age in 2020. Modify the code so it shows the county with the lowest average age in 2020.
# A tibble: 1 × 4
  county      year average_age population
  <chr>      <dbl>       <dbl>      <dbl>
1 Cumberland  2020        42.4     303312
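One way to pick the row with the highest or lowest value of a column is dplyr’s slice_max() and slice_min(). The sketch below is an illustration on an invented toy tibble and is not necessarily the approach used in the lab’s own code chunk; the name toy_age_pop and all of its values are hypothetical.

```r
library(dplyr)
library(tibble)

# Invented toy data with the same column names as the joined tibble
toy_age_pop <- tribble(
  ~county, ~year, ~average_age, ~population,
  "Alpha", 2020,  41.5,         1000,
  "Beta",  2020,  44.0,         2000,
  "Gamma", 2020,  39.8,         3000,
  "Gamma", 2010,  38.0,         2800
)

# Row with the highest average age among the 2020 records
oldest <- toy_age_pop |>
  filter(year == 2020) |>
  slice_max(average_age, n = 1)

# Row with the lowest average age among the 2020 records
youngest <- toy_age_pop |>
  filter(year == 2020) |>
  slice_min(average_age, n = 1)

oldest
youngest
```

Filtering to 2020 first matters here: without it, slice_min() would return the 2010 row for Gamma rather than a 2020 record.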
Exercise 5
Below is the code for a replica of the graph from the Joins chapter in your text. Instead of geom_label(), modify the code to use geom_label_repel(), which comes from the ggrepel package we loaded earlier. Describe the difference between the new graph and the old one from your book.
library(ggrepel)

maine_age_pop |>
  filter(year == 2020) |>
  ggplot(aes(x = average_age, y = population)) +
  geom_label_repel(aes(label = county), color = "blue") +
  scale_y_continuous(labels = scales::comma) +
  theme_minimal() +
  ggtitle("Relationship between average age and population for Maine counties in 2020")
On the new graph, the names don’t overlap so it is easier to read the chart.
Exercise 6
To submit your lab:
Change the author name to your name in the YAML portion at the top of this document
Render your document to HTML and publish it to RPubs.
Submit the link to your RPubs document in the Brightspace comments section for this assignment.
Click on the “Add a File” button and upload your .qmd file for this assignment to Brightspace.
Publish your document to quarto.pub and copy/paste the link to the Lab 3 assessment in Brightspace.