INFO 201 Data Assignment #5

Washington State Vehicle Registrations

Author

Samyak Shrestha

Published

March 20, 2025

Overview

In this data assignment, you’ll write code step-by-step to create two visualizations that can be used to understand patterns in vehicle ownership in Washington state. You’ll then come up with your own question about vehicles or population in Washington state and create a small analysis and visualization to answer that question. To complete this assignment, you should use tools covered in class, except in cases where you explain them. Complete each exercise as indicated in Analysis 1 and 2, and then fill in the four sections in Analysis 3 as instructed for you to create a visualization to answer a novel question of your own. If a code cell includes code to display data (such as glimpse calls), please leave that code. If you use coding tools not covered in class, reference them in the coding notes at the bottom following the instructions there.

This assignment only requires using the most basic plotly interactive visualization function (ggplotly).

Setup

Load the necessary libraries.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

options(scipen = 999)

Load the datasets!

The first contains information about vehicle registrations at the county level in Washington State. This is a slightly altered version of the dataset available from the Washington state public data repository.

vehicles <- read.csv("registrations.csv")
vehicles |> glimpse()

Rows: 763,406
Columns: 7
$ Fiscal.Year        <int> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 201…
$ Transaction.Date   <chr> "01/31/2017", "01/31/2017", "01/31/2017", "01/31/20…
$ Transaction.County <chr> "Ferry", "King", "Pierce", "Pierce", "King", "Skagi…
$ Residential.County <chr> "Ferry", "Grays Harbor", "Lewis", "Snohomish", "Lew…
$ Fuel.Type          <chr> "Unpowered", "Diesel", "Unpowered", "Unpowered", "G…
$ Primary.Use.Class  <chr> "Camper", "Motorhome", "Truck", "Passenger Vehicle"…
$ Counts             <int> 2, 1, 1, 3, 5, 1, 5, 2, 1, 3, 1, 1, 14, 1, 19, 191,…

Some notes on these features:

Fiscal.Year: Fiscal years are different than standard Western calendar years, in similar ways to the way that an academic year is different from the calendar year. In Washington State, a “fiscal year” starts in July and goes through June. For example, July 1, 2021 through June 30, 2022 is called “Fiscal Year 2022” and “Fiscal Year 2025” will actually end in the summer of 2025. The reasons for this are tied to the history and rhythms of accounting and accountability in governments and businesses. For instance, imagine that you need to finish reporting on the annual economic state of a state and there is a legal reason why the report must be submitted by January 1 – the report might take a long time to prepare, so using a year that ends in Summer allows organizations to have time to complete the report. For the purposes of this assignment, we’ll use Fiscal.Year as if it was referring to calendar years – but note that conclusions might be slightly different if we used calendar years!
Transaction.Date: The date (adjusted to the end of the month) when vehicles were registered.
Transaction.County: Vehicles can be registered at offices in different locations or through the Department of Licensing Headquarters. The place where a vehicle is registered might differ from the place where its owner lives.
Residential.County: We’ll pay more attention to this. This is the county that the owner of a registered vehicle lists as their place of residence (or the headquarters of a company if it is a corporate-owned commercial vehicle). For our purposes, we’ll treat this county as the “home” for vehicles – though note that there is no real guarantee that it is the county where a vehicle is used the most or stored.
Fuel.Type: This lists a number of different ways that vehicles may be powered (such as “electric” or “steam” or “diesel”). For us, an important note is that there are multiple options here that include gasoline in addition to the “gasoline” label (for instance, hybrid vehicles use gasoline, as do flex fuel vehicles).
Primary.Use.Class: This is the “kind” of vehicle that is being registered. This is a combination of categories based on use (“Logging”, “Farm Use”, “Cab”), vehicle construction (“Motorcycle”, “Moped”), and even age (“Antique Vehicle”)
Counts: The total number of vehicles registered for each combination of the other features.
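
The fiscal-year rule described above can be checked against the Transaction.Date column. This is a minimal sketch in base R, assuming the MM/DD/YYYY date format seen in the glimpse output; the helper name fiscal_year_from_date is hypothetical and not part of the assignment.

```r
# Hypothetical helper: map a "MM/DD/YYYY" date string to a Washington State
# fiscal year (July 1 through June 30, labeled by the year in which it ends).
fiscal_year_from_date <- function(date_string) {
  date  <- as.Date(date_string, format = "%m/%d/%Y")
  year  <- as.integer(format(date, "%Y"))
  month <- as.integer(format(date, "%m"))
  # July through December belong to the fiscal year that ends the next June.
  ifelse(month >= 7, year + 1L, year)
}

fiscal_years <- fiscal_year_from_date(c("07/31/2023", "06/30/2022", "01/31/2017"))
fiscal_years
```

The first and last dates appear in the glimpse output in this document: 07/31/2023 is listed under Fiscal.Year 2024 and 01/31/2017 under Fiscal.Year 2017, which is what the rule predicts.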

counties <- read.csv("counties.csv")
counties |> glimpse()

Rows: 39
Columns: 22
$ Residential.County <chr> "Adams", "Asotin", "Benton", "Chelan", "Clallam", "…
$ POP_2020           <int> 20613, 22285, 206873, 79141, 77155, 503311, 3952, 1…
$ POP_2021           <int> 20900, 22500, 209400, 80000, 77750, 513100, 3950, 1…
$ POP_2022           <int> 21100, 22600, 212300, 80650, 77625, 520900, 3950, 1…
$ POP_2023           <int> 21200, 22650, 215500, 81500, 78075, 527400, 3950, 1…
$ POP_2024           <int> 21475, 22725, 217850, 82300, 78550, 536300, 3975, 1…
$ NC_20.21           <int> 287, 215, 2527, 859, 595, 9789, -2, 770, 612, 72, 1…
$ NC_21.22           <int> 200, 100, 2900, 650, -125, 7800, 0, 850, 450, 50, 1…
$ NC_22.23           <int> 100, 50, 3200, 850, 450, 6500, 0, 650, 500, 0, 1350…
$ NC_23.24           <int> 275, 75, 2350, 800, 475, 8900, 25, 900, 650, 50, 12…
$ PC_20.21           <dbl> 0.01, 0.01, 0.01, 0.01, 0.01, 0.02, 0.00, 0.01, 0.0…
$ PC_21.22           <dbl> 0.0096, 0.0044, 0.0139, 0.0081, -0.0016, 0.0152, 0.…
$ PC_22.23           <dbl> 0.0047, 0.0022, 0.0151, 0.0105, 0.0058, 0.0125, 0.0…
$ PC_23.24           <dbl> 1.30, 0.33, 1.09, 0.98, 0.61, 1.69, 0.63, 0.80, 1.4…
$ RANK_2020          <int> 31, 30, 10, 17, 18, 5, 38, 12, 25, 36, 14, 39, 13, …
$ RANK_2021          <int> 31, 30, 10, 17, 18, 5, 38, 12, 25, 36, 14, 39, 13, …
$ RANK_2022          <int> 31, 30, 10, 17, 18, 5, 38, 12, 25, 36, 14, 39, 13, …
$ RANK_2023          <int> 31, 30, 10, 17, 18, 5, 38, 12, 25, 36, 14, 39, 13, …
$ RANK_2024          <int> 31, 30, 10, 17, 18, 5, 38, 12, 25, 36, 14, 39, 13, …
$ State              <chr> "WA", "WA", "WA", "WA", "WA", "WA", "WA", "WA", "WA…
$ Longitude          <dbl> -118.5333, -117.2278, -119.5169, -120.6185, -123.93…
$ Latitude           <dbl> 47.00484, 46.18186, 46.22807, 47.85989, 48.11301, 4…

Some notes on some of these features:

Years refer to fiscal years
POP_ here refers to the estimated total resident population in a county in the year marked
NC_ refers to changes in estimated resident population sizes from one year to another (growth/shrinking)
RANK_ is the ranking of counties in terms of estimated resident population, where 1 is the most populous county (expect it to be King County, home to Seattle) and 39 is the least populous (there have been 39 counties for more than a century).
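
The NC_ columns can be reproduced from the POP_ columns, which is a quick way to confirm what they mean. The PC_ columns are not described in the notes above; the sketch below assumes they are the net change divided by the starting population. That is an inference from the first row (Adams County) of the glimpse output only, so treat it as a hypothesis rather than documentation. Under that assumption, PC_23.24 appears to be stored on a percent scale while the earlier PC_ columns look like proportions.

```r
# Adams County values copied from the glimpse output above.
pop_2020 <- 20613
pop_2021 <- 20900
pop_2023 <- 21200
pop_2024 <- 21475

# NC_20.21: net change in estimated population from 2020 to 2021.
nc_20_21 <- pop_2021 - pop_2020
nc_20_21            # 287, matching NC_20.21

# Assumption: PC_ is the net change divided by the starting population.
pc_20_21 <- nc_20_21 / pop_2020
round(pc_20_21, 2)  # 0.01, matching PC_20.21

# PC_23.24 only matches if the same ratio is multiplied by 100.
pc_23_24 <- (pop_2024 - pop_2023) / pop_2023 * 100
round(pc_23_24, 2)  # 1.3, matching the 1.30 shown for PC_23.24
```

If the PC_ columns are ever used together, it would be worth checking their scales first.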

Analysis 1: Vehicle Ownership and County Size (1.5 pts)

In the states of the United States, there are nearly as many working vehicles (cars, trucks, motorcycles) as there are living human beings. However, there is a fair amount of geographic variability in the “per capita” number of vehicles found. According to some estimates, there are about 2 vehicles for every person living in Montana (a per capita vehicle rate of 2) while there is a little under 1 vehicle for every two people living in New York State (a per capita vehicle rate of 0.5). One big picture trend is that more dense, urban areas tend to have fewer vehicles while sparsely populated, rural areas tend to have more vehicles. Does this pattern appear to be the case in Washington State at the county level?

In Washington state, the denser counties also tend to be the more populous counties. In this analysis, you’ll create a visualization that lets you compare the per capita vehicle rate against the overall population size for Washington counties, to see whether populous counties generally have a lower rate and sparse counties a higher rate, and to identify counties that might buck that trend.

This analysis will require you to use various data wrangling and visualization tools, including a join function and converting ggplot to plotly graphs.

Exercise 1

Filter the vehicle registration to include only data from year 2024. Save the filtered data set with a new name.

vehicles2024 <- vehicles |>
  filter(Fiscal.Year == 2024)

vehicles2024 |> glimpse()

Rows: 99,638
Columns: 7
$ Fiscal.Year        <int> 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024, 202…
$ Transaction.Date   <chr> "07/31/2023", "07/31/2023", "07/31/2023", "07/31/20…
$ Transaction.County <chr> "Skagit", "Pacific", "Okanogan", "Lewis", "Island",…
$ Residential.County <chr> "Pierce", "Pacific", "Lincoln", "Adams", "Snohomish…
$ Fuel.Type          <chr> "Gasoline", "Electric", "Unpowered", "Gasoline", "U…
$ Primary.Use.Class  <chr> "All Terrain Vehicle (WATV)", "Truck", "Travel Trai…
$ Counts             <int> 1, 1, 1, 1, 5, 2, 1, 4, 2, 1, 4, 2028, 1, 1, 1, 1, …

Exercise 2

Create a new dataframe called counties24 that includes only two columns: One that lists the name of a county and one that shows the county population size in 2024.

counties24 <- counties |>
  select("County Name" = Residential.County, "2024 Population Size" = POP_2024)

counties24 |> glimpse()

Rows: 39
Columns: 2
$ `County Name`          <chr> "Adams", "Asotin", "Benton", "Chelan", "Clallam…
$ `2024 Population Size` <int> 21475, 22725, 217850, 82300, 78550, 536300, 397…

Exercise 3

For each row in the filtered vehicle registration data append values for the population size of the residential county. In other words, create a new column that shows the 2024 population for each residential county.

vehicles2024 <- left_join(vehicles2024, counties24, by = c("Residential.County" = "County Name"))

vehicles2024 |> glimpse()

Rows: 99,638
Columns: 8
$ Fiscal.Year            <int> 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024,…
$ Transaction.Date       <chr> "07/31/2023", "07/31/2023", "07/31/2023", "07/3…
$ Transaction.County     <chr> "Skagit", "Pacific", "Okanogan", "Lewis", "Isla…
$ Residential.County     <chr> "Pierce", "Pacific", "Lincoln", "Adams", "Snoho…
$ Fuel.Type              <chr> "Gasoline", "Electric", "Unpowered", "Gasoline"…
$ Primary.Use.Class      <chr> "All Terrain Vehicle (WATV)", "Truck", "Travel …
$ Counts                 <int> 1, 1, 1, 1, 5, 2, 1, 4, 2, 1, 4, 2028, 1, 1, 1,…
$ `2024 Population Size` <int> 952600, 23950, 11300, 21475, 867100, 559400, 95…

Exercise 4

Create a new dataframe called “ownership_df” that includes four columns: One for the name of each county, one with the population in that county, one for the total number of vehicles registered for people living in that county in 2024, and one that shows the average number of vehicles registered per person in the county (the per resident or per capita vehicle rate). You can name these variables as you see fit.

ownership_df <- vehicles2024 |>
  group_by(Residential.County) |>
  summarise(
    Population = first(`2024 Population Size`), 
    Total_Vehicles = sum(Counts),
    Vehicles_Per_Capita = Total_Vehicles / Population
  ) |>
  rename(
    County_Name = Residential.County
  )

ownership_df |>
  glimpse()

Rows: 40
Columns: 4
$ County_Name         <chr> "Adams", "Asotin", "Benton", "Chelan", "Clallam", …
$ Population          <int> 21475, 22725, 217850, 82300, 78550, 536300, 3975, …
$ Total_Vehicles      <int> 26173, 27418, 249268, 113943, 101247, 513662, 6684…
$ Vehicles_Per_Capita <dbl> 1.2187660, 1.2065127, 1.1442185, 1.3844836, 1.2889…
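
The summary above has 40 rows, although counties.csv lists 39 counties. The source does not say why. One possibility is a Residential.County value with no match in the county table, which left_join() keeps and fills with an NA population. The sketch below uses made-up toy data (the "Unknown" label and all numbers are hypothetical) to show how anti_join() surfaces such a value.

```r
library(dplyr)

# Toy stand-ins for the real tables; every value here is invented.
toy_vehicles <- tibble(
  Residential.County = c("Adams", "Adams", "King", "Unknown"),
  Counts = c(3, 2, 10, 4)
)
toy_counties <- tibble(
  `County Name` = c("Adams", "King"),
  `2024 Population Size` = c(100, 1000)
)

# left_join() keeps every vehicle row; unmatched counties get an NA population.
joined <- left_join(toy_vehicles, toy_counties,
                    by = c("Residential.County" = "County Name"))

# anti_join() returns only the vehicle rows with no matching county.
unmatched <- anti_join(toy_vehicles, toy_counties,
                       by = c("Residential.County" = "County Name"))
unmatched
```

Running the same anti_join() on the filtered 2024 registrations and counties24 would list any real rows of this kind, which could then be dropped or labeled before plotting.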

Exercise 5

Create a visualization that lets you see how the number of vehicles per resident might covary with the resident population size of each Washington county in 2024.

You should use the ggplotly-based approach we covered in class to create this visualization. The visualization should be a scatterplot that relates the rate of vehicles per resident and the resident population for each county in Washington for year 2024. Color should not be used to communicate any information. The name of the county should be visible when you hover over a data point with a cursor. This figure should include a meaningful title with a take-away, readable labels for the axes, and alt-text that includes the three types of information covered in the class.

#| fig-alt: "Interactive scatterplot of the relationship between population size and vehicles per capita in Washington counties, showing that counties with larger populations have fewer vehicles per person, illustrating a trend where car ownership rates can be affected by urban density."
library(ggplot2)
library(plotly)

# Chunk options such as fig-alt must be the first lines of the chunk to take effect.
vehicle_plot <- ggplot(ownership_df, aes(x = Population, y = Vehicles_Per_Capita, text = paste("County:", County_Name))) +
  geom_point() +
  labs(
    title = "Population Size and Vehicle Ownership Per Capita in Washington Counties",
    x = "Population Size",
    y = "Vehicles Per Capita"
  ) +
  theme_minimal()

ggplotly(vehicle_plot)
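
Exercise 5 asks for the county name to be visible on hover. By default, ggplotly() shows every mapped aesthetic in the hover box; its tooltip argument can restrict that to the text aesthetic. The sketch below uses invented toy data (the county names and values are hypothetical) rather than ownership_df.

```r
library(ggplot2)
library(plotly)

# Toy data; the county names and numbers are invented for illustration.
toy_df <- data.frame(
  County_Name = c("A", "B", "C"),
  Population = c(1000, 50000, 900000),
  Vehicles_Per_Capita = c(1.6, 1.2, 0.9)
)

toy_plot <- ggplot(toy_df, aes(x = Population, y = Vehicles_Per_Capita,
                               text = paste("County:", County_Name))) +
  geom_point()

# tooltip = "text" limits the hover box to the text aesthetic only.
toy_fig <- ggplotly(toy_plot, tooltip = "text")
```

Printing toy_fig displays the interactive figure. Whether to hide the x and y values on hover is a design choice; leaving the default keeps them visible alongside the county name.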

Exercise 6: Interpretation

Consider the general pattern across counties in population size and the per resident vehicle rate. Which county do you think has the most surprising values – values that run counter to the general trend? Look up that county and what is famous there and try to think of a reason why this might be the case (if 2 minutes of investigation doesn’t give you a clue – that’s OK! You can speculate wildly). Write 1-3 sentences describing your answer.

I think Snohomish was the most surprising: although it is one of the most populous counties, it still has a fairly large number of vehicles per person. I think this might be because many Snohomish residents commute to jobs outside their county.

Analysis 2: Electric Vehicle Ownership Across Time (1.5 pts)

Seattle is known for having a relatively high presence of electric vehicles compared to other metropolitan areas and regions in Washington and North America and compared to the number of gasoline-powered vehicles. How does the growth of electric vehicles relative to gas-powered vehicles in Seattle’s King County compare to other Washington counties? Has it always been higher? In this analysis, you will create a visualization that lets you investigate questions related to this topic.

This analysis will require you to use various data wrangling and visualization tools, including case_when, pivot functions, creating line graphs, and converting ggplot to plotly graphs.

Exercise 7

The state vehicle registry tracks fuel type. In the dataset, there is a category for fully electric vehicles. However, there is not actually a single category for gasoline powered vehicles – instead, there are several sub-categories. Create a new column in the vehicle registration data frame called Fuel.Category that shows “Electric” for fully-electric vehicles and shows “Gas” for vehicles listed as being powered with “Gasoline”, “Hybrid”, or “Flex Fuel/Gasoline” systems. Label any other kind of vehicle “Other”. You may want to double-check the spelling of categories first (using unique) before writing code.

vehicles |> pull(Fuel.Type) |> unique()

 [1] "Unpowered"               "Diesel"                 
 [3] "Gasoline"                "Electric"               
 [5] "Propane"                 "Hybrid"                 
 [7] "Flex Fuel/Gasoline"      "Compressed Natural Gas" 
 [9] "Liquefied Natural Gas"   "Butane"                 
[11] "Other"                   "Liquefied Petroleum Gas"
[13] "Steam"                   "Hydrogen Fuel Cell"

vehicles <- vehicles |>
  mutate(Fuel.Category = case_when(
    Fuel.Type == "Electric" ~ "Electric",
    Fuel.Type %in% c("Gasoline", "Hybrid", "Flex Fuel/Gasoline") ~ "Gas",
    TRUE ~ "Other"
  ))

vehicles |> glimpse()

Rows: 763,406
Columns: 8
$ Fiscal.Year        <int> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 201…
$ Transaction.Date   <chr> "01/31/2017", "01/31/2017", "01/31/2017", "01/31/20…
$ Transaction.County <chr> "Ferry", "King", "Pierce", "Pierce", "King", "Skagi…
$ Residential.County <chr> "Ferry", "Grays Harbor", "Lewis", "Snohomish", "Lew…
$ Fuel.Type          <chr> "Unpowered", "Diesel", "Unpowered", "Unpowered", "G…
$ Primary.Use.Class  <chr> "Camper", "Motorhome", "Truck", "Passenger Vehicle"…
$ Counts             <int> 2, 1, 1, 3, 5, 1, 5, 2, 1, 3, 1, 1, 14, 1, 19, 191,…
$ Fuel.Category      <chr> "Other", "Other", "Other", "Other", "Gas", "Other",…

Exercise 8

Create a new data frame called fuel_summary that shows the total number of “Gas”, “Electric”, or “Other”-fuelled vehicles registered in each residential county for each year in our data. This data frame should have 4 columns (that show the year, county, fuel type, and total count of cars).

fuel_summary <- vehicles |>
  group_by(Fiscal.Year, Residential.County, Fuel.Category) |>
  summarise(Total_Count = sum(Counts), .groups = "drop")



fuel_summary |> glimpse()

Rows: 1,073
Columns: 4
$ Fiscal.Year        <int> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 201…
$ Residential.County <chr> "Adams", "Adams", "Adams", "Asotin", "Asotin", "Aso…
$ Fuel.Category      <chr> "Electric", "Gas", "Other", "Electric", "Gas", "Oth…
$ Total_Count        <int> 1, 9897, 2790, 7, 11133, 3531, 94, 99023, 23057, 55…

Exercise 9

Create a new dataframe called fuel_summary2 that contains all the information from fuel_summary, but is reformatted and now contains exactly 5 columns called “Fiscal.Year”, “Residential.County”, “Electric”, “Gas”, and “Other”.

fuel_summary2 <- fuel_summary |>
  pivot_wider(
    names_from = Fuel.Category,
    values_from = Total_Count,
    values_fill = list(Total_Count = 0)
  )

fuel_summary2 |> glimpse()

Rows: 360
Columns: 5
$ Fiscal.Year        <int> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 201…
$ Residential.County <chr> "Adams", "Asotin", "Benton", "Chelan", "Clallam", "…
$ Electric           <int> 1, 7, 94, 55, 57, 490, 1, 30, 29, 1, 14, 0, 37, 26,…
$ Gas                <int> 9897, 11133, 99023, 42993, 39854, 210136, 2403, 565…
$ Other              <int> 2790, 3531, 23057, 11557, 11587, 35569, 1136, 13087…
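
The values_fill argument matters here because fuel_summary has 1,073 rows rather than 360 × 3 = 1,080, so seven county/year/category combinations have no row at all. A minimal sketch with invented toy data shows what pivot_wider() does with such gaps, with and without values_fill (the scalar form values_fill = 0 is used here as a shorter alternative to the named list).

```r
library(tidyr)
library(tibble)

# Toy long-format data; county "B" has no "Electric" row at all.
toy_long <- tibble(
  Residential.County = c("A", "A", "B"),
  Fuel.Category = c("Electric", "Gas", "Gas"),
  Total_Count = c(5, 100, 80)
)

# Without values_fill, the missing combination becomes NA.
wide_na <- pivot_wider(toy_long, names_from = Fuel.Category,
                       values_from = Total_Count)

# With values_fill = 0, the missing combination becomes 0 instead.
wide_zero <- pivot_wider(toy_long, names_from = Fuel.Category,
                         values_from = Total_Count, values_fill = 0)
wide_zero
```

Filling with 0 is reasonable for counts, since no registrations in a category means a count of zero rather than an unknown value.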

Exercise 10

Create a new column in fuel_summary2 that tells you how many electric vehicles there are in each county and year as a percent of the total number of gas vehicles in that county/year. Note that if a proportion is 0.31, then the equivalent percent is 31.

fuel_summary2 <- fuel_summary2 |>
  mutate(Electric_Percent = round((Electric / Gas) * 100, 2))

fuel_summary2 |> glimpse()

Rows: 360
Columns: 6
$ Fiscal.Year        <int> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 201…
$ Residential.County <chr> "Adams", "Asotin", "Benton", "Chelan", "Clallam", "…
$ Electric           <int> 1, 7, 94, 55, 57, 490, 1, 30, 29, 1, 14, 0, 37, 26,…
$ Gas                <int> 9897, 11133, 99023, 42993, 39854, 210136, 2403, 565…
$ Other              <int> 2790, 3531, 23057, 11557, 11587, 35569, 1136, 13087…
$ Electric_Percent   <dbl> 0.01, 0.06, 0.09, 0.13, 0.14, 0.23, 0.04, 0.05, 0.1…
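
As a quick arithmetic check, the first two Electric_Percent values can be recomputed by hand from the counts in the glimpse output. Note that the denominator is the Gas count alone, not the total of all vehicles, so this is electric vehicles as a percent of gas vehicles.

```r
# Electric and Gas counts for the first two rows (Adams and Asotin, 2017),
# copied from the glimpse output above.
electric <- c(1, 7)
gas <- c(9897, 11133)

proportion <- electric / gas           # electric vehicles per gas vehicle
percent <- round(proportion * 100, 2)  # convert the proportion to a percent
percent
```

The results agree with the 0.01 and 0.06 shown for Adams and Asotin in the Electric_Percent column.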

Exercise 11

Create a visualization that lets you see how the number of electric vehicles relative to gas vehicles in each Washington county has changed from 2017 to 2025. Users of this visualization should be able to hide the line for different counties or show only a single county at a time interactively. The exact percentage of electric vehicles relative to gas vehicles should also be displayed when you hover a cursor over a line.

You should use the ggplotly-based approach we covered in class to create this visualization. The visualization should be a line graph. You can color lines separately. This figure should include a meaningful title with a take-away, readable labels for the axes, and alt-text that includes the three types of information covered in the class.

#| fig-alt: "Interactive line graph showing how the percent of electric vehicles compared to gas vehicles has changed in each Washington county from 2017 to 2025. Some counties have higher or faster-growing percentages than others."
library(ggplot2)
library(plotly)

# Chunk options such as fig-alt must be the first lines of the chunk to take effect.
electric_vs_gas <- fuel_summary2 |>
  filter(Fiscal.Year >= 2017, Fiscal.Year <= 2025) |>
  ggplot(aes(x = Fiscal.Year, y = Electric_Percent, color = Residential.County,
             group = Residential.County,
             text = paste("County:", Residential.County,
                          "Year:", Fiscal.Year,
                          "Electric Percentage:", Electric_Percent))) +
  geom_line() +
  labs(
    title = "Electric Vehicles Relative to Gas Vehicles (2017–2025)",
    x = "Fiscal Year",
    y = "Electric Vehicles (% of Gas Vehicles)"
  ) +
  theme_minimal()

electric_vs_gas |> ggplotly()

Exercise 12

Since the year 2020, have there been any years when another county in Washington had a greater percentage of electric vehicles registered relative to gas vehicles than King County? If yes, which county and when? If no, how can you tell? Write around 1-2 sentences to answer.

Yes. From 2020 to 2022, San Juan County had a greater percentage of electric vehicles relative to gas vehicles than King County, as shown in the graph.

Analysis 3 (2 pts)

For analysis three, you should form a question that can be answered by exploring an interactive visualization that uses the vehicle registrations data and/or the county population data.

You should think of a kind of question you could ask and a visualization that could be used in an analysis to answer the questions. The analysis that you create must be a little complex (require multiple steps to complete).

Your analysis/visualization must require you to do at least one of the following:

Pivot a data frame at least once
Join two or more data frames
Use case_when to create categories that you couldn’t easily access otherwise
Make meaningful use of at least 4 variables either in data wrangling or visualization (in total from one or both of the data frames)
Combining two or more graphs into a single figure using either facetting or subplot
Creating an animation with ggplot and plotly

Additionally, when you create a visualization, it must include:

Meaningful alt-text
A title and axis labels
At least basic interactivity (such as the ability to hover a cursor over points and see values).

Describe your goal or question (.5 pts)

What is the visualization you are going to create useful for? Write around 2 sentences that describe the kind of thing you hope to find out using the analysis:

I want to see which counties in Washington had the most electric vehicles per person in 2024. This can show where electric cars are most popular and well suited.

Complete analysis code (.5 pts)

Complete the data wrangling and coding work to create your visualization below. This could be completed in a single code chunk or multiple code chunks. This should incorporate tools used in class (unless you explain them below).

#| fig-alt: "Interactive bar chart showing the number of electric vehicles per person in each Washington county in 2024. Some counties have more EVs per person than others, indicating regional differences in electric vehicle ownership across the state."
library(ggplot2)
library(plotly)

# Chunk options such as fig-alt must be the first lines of the chunk to take effect.
(vehicles |>
  filter(Fiscal.Year == 2024, Fuel.Type == "Electric") |>
  left_join(counties |> select("County Name" = Residential.County, Population = POP_2024),
            by = c("Residential.County" = "County Name")) |>
  group_by(Residential.County) |>
  summarise(EVs = sum(Counts), Population = first(Population)) |>
  mutate(EVs_Per_Person = round(EVs / Population, 4)) |>
  ggplot(aes(x = Residential.County, y = EVs_Per_Person,
             text = paste("County:", Residential.County, "EVs per person:", EVs_Per_Person))) +
  geom_col() +
  labs(title = "Electric Vehicles Per Person by County (2024)",
       x = "Counties", y = "EVs Per Person") +
  theme_minimal()
) |> ggplotly(tooltip = "text")

Results (.5 pts)

In 2-3 sentences, describe what you were able to find out with the visualization.

I found out through the visualization that it is not the most populous counties that have adopted electric vehicles the most. This indicates that factors other than population size can influence the number of electric vehicles per person in each county.

Debrief (.5 pts)

In around 3-4 sentences, describe what the trickiest part of creating this analysis/visualization was and how you figured out how to complete it. Alternatively, describe where you learned how to use tools that you used for the analysis. Mention specific class lessons/activities/readings that you referenced and provide links to outside sources that were particularly useful or that showed you a tool that you had not seen before.

The trickiest part was figuring out how to join the two data sets so I could calculate EVs per person. I used left_join(), as shown in Chapter 19, Joins, of R for Data Science (2e), to combine the datasets. I also used mutate() to do the per capita calculation and ggplotly(), which I learned about from online resources, to make the graphs interactive. Links: https://www.rdocumentation.org/packages/plotly/versions/4.10.4/topics/ggplotly

Coding Notes

All of the tools (R functions, operators, and code formalisms) needed to complete these activities are covered in the lecture activities and labs. You can use R tools that were not covered in class for these activities, but if you do so, you need to be able to relate them to material covered in class.

If you didn’t use any functions, operators, or formalisms that were not covered in class, then you should write something along the lines of “I used tools covered in class materials” for the first bullet point below and delete other bullet points. If you used outside tools (functions, operators, or formalisms), then for each such tool, you should write 3-4 sentences. These sentences must describe: a) What the goal of the tools is (what is it used for in general – not just in the code here), b) Which tool that has been covered in class does the closest thing to the tool that you decided to use, and c) How you learned about the tool and a web link to a resource on how to learn how the tool works.

If you do not include the information below, then you may be marked off on exercises that use outside tools. If you later realize you accidentally used an outside tool and forgot to mark it, you can attend office hours and explain its use to have those points reassigned.

Tools Used:

Tool 1: ggplotly(): ggplotly() makes a ggplot chart interactive. It uses the same kind of plot we’ve been making with ggplot in class and adds features like hover and zoom. I learned of this tool through brief mentions in class and while working on this assignment; however, we didn’t really have the chance to go over it, so I learned it through the website https://www.rdocumentation.org/packages/plotly/versions/4.10.4/topics/ggplotly and by asking ChatGPT for clarification when needed.

That’s all!

Render your document, check that it looks as you expect, and upload both the qmd and html!