Assignment 7: Using DSLabs Datasets

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(dslabs)
options(repos = c(CRAN = "https://cloud.r-project.org"))
library(highcharter)
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
Highcharts (www.highcharts.com) is a Highsoft software product which is
not free for commercial and Governmental use

Attaching package: 'highcharter'
The following object is masked from 'package:dslabs':

    stars
library(RColorBrewer)
setwd("C:/Users/24680/Downloads")
file<-read.csv("us_contagious_diseases.csv")
head(file)
      disease   state year weeks_reporting count population
1 Hepatitis A Alabama 1966              50   321    3345787
2 Hepatitis A Alabama 1967              49   291    3364130
3 Hepatitis A Alabama 1968              52   314    3386068
4 Hepatitis A Alabama 1969              49   380    3412450
5 Hepatitis A Alabama 1970              51   413    3444165
6 Hepatitis A Alabama 1971              51   378    3481798
selected<-select(file, disease, state, count, population, year)
filtered<-filter(selected, disease=="Measles" & state%in% c("California", "New York", "Texas")) |>
  arrange(year)
# Color palette for the graph
cols <- brewer.pal(4, "Set2")
# Default formation of the graph
highchart() |>
  hc_add_series(data = filtered,
                type = "line",
                hcaes(x = year,
                      y = count,
                      group = state)) |>
  # Adding color to the graph
  hc_colors(cols) |>
  # Titles and text on the graph
  hc_title(text = "Measles Cases Over the Years") |>
  hc_subtitle(text = "In Top 3 Most Populated States") |>
  # Adding the source as a caption
  hc_caption(text = "Source: DSLabs") |>
  # Adding axis titles
  hc_xAxis(title = list(text = "Year")) |>
  hc_yAxis(title = list(text = "Count of Cases of Measles")) |>
  # Designing the marker symbol and the tooltip
  hc_plotOptions(series = list(marker = list(symbol = "circle"))) |>
  hc_tooltip(shared = TRUE, borderColor = "black")

Insights:

The data I used from DSLabs covers US contagious diseases from the 1920s to the early 2000s. I narrowed it down to the three most populated states (California, New York, and Texas) and focused on measles cases. After selecting and filtering the data, I made a Highcharter line chart showing the count of cases in those states over that period. From the graph, I noticed that measles peaked from around the 1940s to the 1950s, since all three states had their most cases during that time, with New York having the most of the three. I also noticed that by the 2000s, when cases had fallen to 0 in New York and Texas, California was still reporting some cases. What was interesting is that California started off with the fewest cases and ended up being the last of the three states to reach 0 cases.
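
The observations above were read off the chart by eye. A minimal sketch of how they could be checked against the numbers follows. It assumes the `us_contagious_diseases` data frame bundled with the dslabs package has the same columns as the CSV used above; the names `measles`, `peak_years`, and `last_case_years` are illustrative and not part of the original assignment.

```r
library(dplyr)
library(dslabs)

# Assumption: the dslabs copy of the data matches the CSV read in above
measles <- us_contagious_diseases |>
  filter(disease == "Measles",
         state %in% c("California", "New York", "Texas"))

# For each state, the single year with the highest reported count
peak_years <- measles |>
  group_by(state) |>
  slice_max(count, n = 1, with_ties = FALSE) |>
  ungroup() |>
  select(state, year, count)

# For each state, the most recent year with at least one reported case
last_case_years <- measles |>
  filter(count > 0) |>
  group_by(state) |>
  summarise(last_year_with_cases = max(year))

print(peak_years)
print(last_case_years)
```

If the chart reading is right, `peak_years` should fall in the 1940s to 1950s for all three states, and California should have the latest `last_year_with_cases`.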