DS Labs Graph Assignment

Author

A Porambo


Load libraries.

library(tidyverse)
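library(ggthemes)     # additional ggplot2 themes
library(ggplot2)      # already attached by tidyverse; loaded explicitly for clarity
library(dslabs)       # provides the us_contagious_diseases dataset
library(dplyr)        # already attached by tidyverse; used for filter() and mutate()
library(RColorBrewer) # ColorBrewer color palettes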

I started by loading the packages this project uses: tidyverse and dplyr for data wrangling, dslabs for the dataset, and ggplot2, ggthemes and RColorBrewer for plotting and styling.

Load data.

For this assignment I used the Contagious Disease Data for US States dataset (“us_contagious_diseases”) from the dslabs package. The data were originally compiled by Project Tycho at the University of Pittsburgh.

data("us_contagious_diseases")

The code above loads the “Contagious Disease Data for US States” dataset into the R session.
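Before filtering, it can help to look at which variables the dataset provides. The per-capita rate computed later in this assignment uses the count and population columns. This is a minimal sketch using base R's str(), levels() and range(), assuming the dslabs package is installed.

```r
library(dslabs)

data("us_contagious_diseases")

# Column names and types (disease, state, year, weeks_reporting, count, population)
str(us_contagious_diseases)

# Diseases covered by the dataset (disease is stored as a factor)
levels(us_contagious_diseases$disease)

# First and last year available
range(us_contagious_diseases$year)
```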

Create Maryland Contagious Diseases dataset.

md_contagious_diseases <- us_contagious_diseases |> filter(state == 'Maryland')

I then created an “md_contagious_diseases” dataset by filtering the US contagious diseases dataset to observations from the state of Maryland only.
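A quick sanity check on the filter might look like the following. This sketch assumes the dslabs and dplyr packages are installed. It repeats the filtering step, confirms that only Maryland rows remain, and counts how many yearly observations each disease contributes.

```r
library(dslabs)
library(dplyr)

data("us_contagious_diseases")

# Same filter as above: keep only rows for Maryland
md_contagious_diseases <- us_contagious_diseases |> filter(state == "Maryland")

# Should print only "Maryland"
unique(as.character(md_contagious_diseases$state))

# Number of yearly observations kept for each disease (stored in column n)
md_contagious_diseases |> count(disease)
```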

Create Variable for Cases per Hundred Thousand People.

# Cases per 100,000 people = count / population * 100,000
md_contagious_diseases <- md_contagious_diseases %>%
  mutate(cases_per_hund_thou = count / population * 100000)

Next, I added a variable to the Maryland Contagious Diseases dataset for cases per 100,000 people. It is calculated by dividing the count of cases by the population and multiplying the result by 100,000.
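As a worked check of the formula, consider hypothetical numbers that are not taken from the dataset: 500 cases in a population of 2,000,000 works out to 500 / 2,000,000 × 100,000 = 25 cases per 100,000 people. The sketch below reproduces that arithmetic on a made-up two-row data frame.

It also shows an optional variant, which is an assumption and not part of this assignment. The variant scales each rate up to a full 52-week year using the weeks_reporting column, after dropping rows with zero reporting weeks to avoid dividing by zero.

```r
library(dplyr)

# Hypothetical rows with made-up numbers (not taken from us_contagious_diseases)
example <- data.frame(
  disease         = c("Disease A", "Disease B"),
  count           = c(500, 90),
  population      = c(2000000, 3000000),
  weeks_reporting = c(52, 26)
)

example <- example |>
  # Same formula as the assignment: cases per 100,000 people
  # Disease A: 500 / 2,000,000 * 100,000 = 25; Disease B: 90 / 3,000,000 * 100,000 = 3
  mutate(cases_per_hund_thou = count / population * 100000) |>
  # Optional variant (an assumption, not used in the assignment): drop rows with
  # no reporting weeks to avoid dividing by zero, then scale to a 52-week year
  filter(weeks_reporting > 0) |>
  mutate(cases_per_hund_thou_adj = cases_per_hund_thou * 52 / weeks_reporting)

example
```

The adjusted rate equals the unadjusted one whenever weeks_reporting is 52. It only changes the picture for years with incomplete reporting, such as the hypothetical Disease B row, where 3 becomes 6.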

Dotted Line Plot.

# Year vs. cases per 100,000 people for each observation, one color per disease
ggplot(md_contagious_diseases, aes(x = year, y = cases_per_hund_thou, color = disease)) +
  labs(title = "Cases of Infectious Diseases in Maryland, 1928 - 2011", x = "Year",
       y = "Annual Number of Cases (per 100,000 people)",
       caption = "Source: Project Tycho at the University of Pittsburgh") + # title, axis labels, caption
  geom_line(aes(group = disease)) +      # separate line for each disease
  geom_point() +                         # point for each disease and year
  scale_color_brewer(palette = "Set3") + # ColorBrewer "Set3" palette
  theme_dark()                           # dark background for contrast with the light Set3 colors

Finally, I created the data visualization. It is a line plot with a point at each observation (a dotted line plot) showing the annual number of cases per 100,000 people by year. Each disease is tracked on its own line and distinguished by color, and the points mark where each recorded observation falls. I colored the lines with the ColorBrewer palette “Set3”. The colors in this palette are considerably lighter in tone than those in ggplot2's default palette, so I applied a dark theme to give the lines greater contrast.
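For scale_color_brewer() to give every disease its own color, the chosen palette needs at least as many colors as there are diseases. This is a minimal check, assuming the dslabs, dplyr and RColorBrewer packages are installed. RColorBrewer's brewer.pal.info table lists the maximum number of colors for each palette, and the check compares that value with the number of distinct diseases in the Maryland data.

```r
library(dslabs)
library(dplyr)
library(RColorBrewer)

data("us_contagious_diseases")
md_contagious_diseases <- us_contagious_diseases |> filter(state == "Maryland")

# Number of distinct diseases, each of which needs its own line color
n_diseases <- n_distinct(md_contagious_diseases$disease)

# Maximum number of colors the Set3 palette provides
set3_max <- brewer.pal.info["Set3", "maxcolors"]

n_diseases
set3_max

# TRUE when every disease can be assigned a distinct Set3 color
n_diseases <= set3_max
```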