library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.0 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.1 ✔ tibble 3.1.8
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
setwd("~/Desktop/RWD")
diseases <- read_csv("us_contagious_diseases.csv")
## Rows: 18870 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): disease, state
## dbl (4): year, weeks_reporting, count, population
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# The five states to compare, then only the polio rows for those states
states <- c("California", "Arizona", "Alabama", "Connecticut", "New York")
Polio <- diseases %>%
  filter(disease == "Polio", state %in% states)
# Polio cases per year, one colored line per state
ggplot(Polio, aes(x = year, y = count, color = state)) +
  # point size is mapped to the case count; alpha is a fixed setting, so it
  # goes outside aes() (inside aes() it would add a spurious "0.5" legend)
  geom_point(aes(size = count), alpha = .5) +
  # ggplot2 >= 3.4.0 uses `linewidth` rather than `size` for line thickness
  geom_line(linewidth = .3) +
  scale_color_manual(values = c("#DE7CEB", "#FF79C5", "#574143", "#00C4B9", "#FFCAFF")) +
  labs(title = "Total Number of Polio Cases in the US by State",
       x = "Year",
       y = "Total Cases",
       color = "State") +
  # reference lines at the two years discussed in the write-up
  geom_vline(xintercept = 1956, color = "pink") +
  geom_vline(xintercept = 1973, color = "pink") +
  theme_minimal(base_size = 14, base_family = "serif")
I chose the US contagious diseases data set and graphed polio cases over time in five randomly selected states. First, I filtered the data so that my new data frame, Polio, contains only the polio records for those five states. I then plotted each state's yearly case count, using a custom color palette and a serif font to make the graph more attractive. I added vertical lines at 1956, when cases dropped to zero, and at 1973, when some cases came back. The points are translucent, and their size scales with the number of cases. Finally, I used a minimal theme so the viewer can focus on the points themselves.
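The vertical lines at 1956 and 1973 are based on when cases dropped to zero and when they reappeared. One way to check those years against the data, rather than reading them off the plot, is the small base R sketch below. It assumes the Polio data frame has the numeric `year` and `count` columns shown in the column specification above. The helper names `yearly_totals()`, `zero_case_years()`, and `first_return_year()` are hypothetical, not part of any package, and they pool the five states together, so a single state dropping to zero would not show up on its own.

```r
# A minimal base R sketch (assumption: `df` has numeric `year` and `count`
# columns, like the Polio data frame filtered above). The helper names are
# hypothetical, not part of any package.

# Combined case count per year across every state in `df`.
# aggregate()'s default na.action drops rows where `count` is NA.
yearly_totals <- function(df) {
  aggregate(count ~ year, data = df, FUN = sum)
}

# Years in which the combined count across the selected states is zero.
zero_case_years <- function(df) {
  totals <- yearly_totals(df)
  totals$year[totals$count == 0]
}

# First year after `from_year` with a positive combined count,
# or NA if cases never reappear in the data.
first_return_year <- function(df, from_year) {
  totals <- yearly_totals(df)
  later <- totals$year[totals$year > from_year & totals$count > 0]
  if (length(later) == 0) NA else min(later)
}
```

With the data loaded as above, zero_case_years(Polio) lists the years whose pooled total is zero, and first_return_year(Polio, 1956) gives the first later year with any reported cases. To check one state at a time, the same helpers can be called on a subset such as subset(Polio, state == "California").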