library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(plotly) # for interactivity
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(viridis)
Loading required package: viridisLite
# loading DSLabs dataset
library(dslabs)
data("us_contagious_diseases")
unique(us_contagious_diseases$disease) # for listing different diseases recorded in the dataset
[1] Hepatitis A Measles Mumps Pertussis Polio Rubella
[7] Smallpox
Levels: Hepatitis A Measles Mumps Pertussis Polio Rubella Smallpox
# Creating a new/modified data set
polio_data <- us_contagious_diseases %>%
  filter(disease == "Polio" &
           !state %in% c("Hawaii", "Alaska") &
           !is.na(year) & !is.na(count) &
           !is.na(population) & !is.na(weeks_reporting)) %>%
  # Scale the rate to cases per 10,000 people over a full 52-week year; NA when no weeks were reported
  mutate(rate = ifelse(weeks_reporting > 0,
                       count / population * 10000 / (weeks_reporting / 52),
                       NA)) %>%
  filter(!is.na(rate)) # Drop the rows with 0 reporting weeks
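As a quick sanity check on the rate formula, here is a minimal sketch in base R. The helper name `annualized_rate` and the input numbers are hypothetical. They are chosen only to make the arithmetic easy to follow and are not taken from the dataset.

```r
# Hypothetical helper mirroring the mutate() step: cases per 10,000 people,
# scaled up to a full 52-week year when only some weeks were reported.
annualized_rate <- function(count, population, weeks_reporting) {
  ifelse(weeks_reporting > 0,
         count / population * 10000 / (weeks_reporting / 52),
         NA_real_)
}

# Made-up inputs: 100 cases, 1,000,000 people, 26 of 52 weeks reported.
# 100 / 1e6 * 10000 = 1 case per 10,000; dividing by 26/52 = 0.5 doubles it.
half_year <- annualized_rate(100, 1e6, 26)
half_year

# Zero reporting weeks yields NA instead of dividing by zero.
no_weeks <- annualized_rate(100, 1e6, 0)
no_weeks
```

Dividing by `weeks_reporting / 52` assumes that cases in the unreported weeks occurred at the same pace as in the reported ones, so rates for states with few reporting weeks are rough estimates.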
p <- ggplot(polio_data, aes(x = year, y = rate, color = state, group = state)) +
  geom_line() + # Plot the line for each state
  geom_vline(xintercept = 1955, linetype = "dashed", color = "blue") + # Vertical line at the introduction of the vaccine
  theme_dark() + # Dark theme for a dark background
  labs(
    title = "Polio Incidence Over Time by State",
    x = "Year",
    y = "Polio Incidence Rate (per 10,000 people)",
    caption = "Source: Tycho Project"
  ) +
  theme(
    legend.position = "right", # Move the legend to the right
    legend.box = "vertical", # Arrange legend items vertically
    legend.key.size = unit(0.4, "cm"), # Adjust the size of the legend keys
    legend.text = element_text(size = 8), # Adjust legend text size for better readability
    axis.text.x = element_text(angle = 45, hjust = 1) # Rotate x-axis labels for readability
  ) +
  scale_color_viridis(discrete = TRUE) + # Using viridis colors (discrete scale)
  guides(color = guide_legend(ncol = 3)) # Using three columns in the legend

p <- ggplotly(p) # Convert the static ggplot into an interactive plotly chart
p
Week 7 Dslabs Notes
I used the “us_contagious_diseases” dataset from dslabs. I wanted every US state included except Hawaii and Alaska, because I was not getting the desired results when they were included. I also wanted to know which states recorded the highest and lowest rates, so I used plotly to make the graph interactive and let me read specific values from it. I wanted to use highchart, but had difficulty with the implementation. I created a new data set called “polio_data” with no NAs and used the calculation provided by Professor Saidi to get meaningful numbers to plot. For the graph I used a dark theme and the viridis color palette for the states, and aligned the legend on the right side. I added a vertical line at 1955 to compare the rates before and after the introduction of the vaccine in the United States on April 12, 1955.

With plotly I can see the highest polio rates reported over the years for different states. Some interesting points to note: Nebraska and South Dakota had among the highest values reported in 1952, considered the worst outbreak year across the country, with rates of 15.72 and 15.16 per 10,000 people respectively. Pennsylvania, South Carolina, and even New York were among the states that maintained low rates in the 1950s.
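The per-state values I read off the interactive plot can also be checked directly in code. The following is a minimal sketch, assuming the dslabs and dplyr packages are installed. The name `top_1952` and the choice of five rows are my own. The filter is a slightly condensed version of the one used for the plot.

```r
library(dplyr)
library(dslabs)

data("us_contagious_diseases")

# Rebuild the polio table with the same rate formula used for the plot.
# Filtering on weeks_reporting > 0 also drops rows where it is NA.
polio_data <- us_contagious_diseases %>%
  filter(disease == "Polio",
         !state %in% c("Hawaii", "Alaska"),
         !is.na(count), !is.na(population),
         weeks_reporting > 0) %>%
  mutate(rate = count / population * 10000 / (weeks_reporting / 52))

# Five states with the highest rate in 1952, sorted from highest to lowest.
top_1952 <- polio_data %>%
  filter(year == 1952) %>%
  arrange(desc(rate)) %>%
  select(state, year, rate) %>%
  head(5)

top_1952
```

Replacing `desc(rate)` with `rate` lists the lowest-rate states for the same year instead.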