Introduction

This data set comes from the dslabs package. us_contagious_diseases contains yearly counts of several dreadful contagious diseases for each US state. It has 6 variables and 16,065 observations.

Variable Summary

Variable names and descriptions:

• disease: A factor containing disease names.
• state: A factor containing state names.
• year: The year reported.
• weeks_reporting: Number of weeks counts were reported that year.
• count: Total number of reported cases.
• population: State population.

Load the dslabs library and list its datasets and scripts

library("dslabs")
# List the datasets shipped with dslabs
data(package = "dslabs")
# List the scripts used to build those datasets
list.files(system.file("script", package = "dslabs"))
##  [1] "make-admissions.R"                   
##  [2] "make-brca.R"                         
##  [3] "make-brexit_polls.R"                 
##  [4] "make-death_prob.R"                   
##  [5] "make-divorce_margarine.R"            
##  [6] "make-gapminder-rdas.R"               
##  [7] "make-greenhouse_gases.R"             
##  [8] "make-historic_co2.R"                 
##  [9] "make-mnist_27.R"                     
## [10] "make-movielens.R"                    
## [11] "make-murders-rda.R"                  
## [12] "make-na_example-rda.R"               
## [13] "make-nyc_regents_scores.R"           
## [14] "make-olive.R"                        
## [15] "make-outlier_example.R"              
## [16] "make-polls_2008.R"                   
## [17] "make-polls_us_election_2016.R"       
## [18] "make-reported_heights-rda.R"         
## [19] "make-research_funding_rates.R"       
## [20] "make-stars.R"                        
## [21] "make-temp_carbon.R"                  
## [22] "make-tissue-gene-expression.R"       
## [23] "make-trump_tweets.R"                 
## [24] "make-weekly_us_contagious_diseases.R"
## [25] "save-gapminder-example-csv.R"

Load Libraries

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(ggthemes)
library(ggrepel)
library(RColorBrewer)
library(scales)
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo 
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
## 
## Attaching package: 'highcharter'
## 
## The following object is masked from 'package:dslabs':
## 
##     stars
library(streamgraph)
library(devtools)
## Loading required package: usethis

US Contagious Disease Dataset

# Save a CSV copy of the dataset; missing values are written as empty fields
write_csv(us_contagious_diseases, "us_contagious_diseases.csv", na = "")

Summarise and view the structure of the dataset

us_disease <- us_contagious_diseases 
summary(us_disease)
##         disease            state            year      weeks_reporting
##  Hepatitis A:2346   Alabama   :  315   Min.   :1928   Min.   : 0.00  
##  Measles    :3825   Alaska    :  315   1st Qu.:1950   1st Qu.:31.00  
##  Mumps      :1785   Arizona   :  315   Median :1975   Median :46.00  
##  Pertussis  :2856   Arkansas  :  315   Mean   :1971   Mean   :37.38  
##  Polio      :2091   California:  315   3rd Qu.:1990   3rd Qu.:50.00  
##  Rubella    :1887   Colorado  :  315   Max.   :2011   Max.   :52.00  
##  Smallpox   :1275   (Other)   :14175                                 
##      count          population      
##  Min.   :     0   Min.   :   86853  
##  1st Qu.:     7   1st Qu.: 1018755  
##  Median :    69   Median : 2749249  
##  Mean   :  1492   Mean   : 4107584  
##  3rd Qu.:   525   3rd Qu.: 4996229  
##  Max.   :132342   Max.   :37607525  
##                   NA's   :214
str(us_disease)
## 'data.frame':    16065 obs. of  6 variables:
##  $ disease        : Factor w/ 7 levels "Hepatitis A",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ state          : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year           : num  1966 1967 1968 1969 1970 ...
##  $ weeks_reporting: num  50 49 52 49 51 51 45 45 45 46 ...
##  $ count          : num  321 291 314 380 413 378 342 467 244 286 ...
##  $ population     : num  3345787 3364130 3386068 3412450 3444165 ...

Count the number of observations (state-year rows) for each disease

table(us_disease$disease)
## 
## Hepatitis A     Measles       Mumps   Pertussis       Polio     Rubella 
##        2346        3825        1785        2856        2091        1887 
##    Smallpox 
##        1275
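The table above counts rows (one per state and year), not cases. A minimal base-R sketch, assuming only the dslabs package, that totals the count column per disease instead; the object name cases_by_disease is illustrative.

```r
library(dslabs)

# table() counts state-year rows; summing `count` gives total reported cases
cases_by_disease <- aggregate(count ~ disease, data = us_contagious_diseases, FUN = sum)

# Sort from most to fewest reported cases
cases_by_disease <- cases_by_disease[order(-cases_by_disease$count), ]
cases_by_disease
```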

Top 10 state-year records with the most "Measles" cases from 1990 to 2010

us_disease %>%
  filter(year >= 1990, year <= 2010, disease == "Measles") %>%
  arrange(desc(count)) %>%
  head(10)
##    disease        state year weeks_reporting count population
## 1  Measles   California 1990              48  9598   29760021
## 2  Measles        Texas 1990              43  3898   16986510
## 3  Measles   California 1991              48  1963   30311890
## 4  Measles     New York 1991              37  1407   18091650
## 5  Measles Pennsylvania 1991              47  1402   11905415
## 6  Measles     Illinois 1990              43  1037   11430602
## 7  Measles   New Jersey 1991              44  1019    7793388
## 8  Measles        Texas 1992              32  1002   17644041
## 9  Measles     New York 1990              45   806   17990455
## 10 Measles    Wisconsin 1990              46   657    4891769
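The query above ranks individual state-year rows, so one state can appear several times. To rank states by their total measles cases over the whole 1990 to 2010 period, one option is to sum per state first. This is a base-R sketch assuming only dslabs; measles_90s, state_totals and top5_states are illustrative names.

```r
library(dslabs)

# Keep measles records from 1990 through 2010 (inclusive)
measles_90s <- subset(us_contagious_diseases,
                      disease == "Measles" & year >= 1990 & year <= 2010)

# Total cases per state over the whole period, then the five largest totals
state_totals <- aggregate(count ~ state, data = measles_90s, FUN = sum)
top5_states <- head(state_totals[order(-state_totals$count), ], 5)
top5_states
```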

Filter the data to six diseases (all except Mumps)

# Keep every disease except Mumps, then sort by year and state
us_d_disease <- us_disease %>%
  filter(disease %in% c("Measles", "Pertussis", "Hepatitis A",
                        "Polio", "Rubella", "Smallpox")) %>%
  arrange(year, state)

Visualization

To create the chart, I use three variables (year, count, and disease) to compare the diseases and follow their progression each year. I use a highcharter column chart because it is interactive and easy for readers to explore. The hc_colors function gives each disease its own color from an RColorBrewer palette. hc_xAxis, hc_yAxis, and hc_title add the axis titles and the chart title. hc_plotOptions configures options for each series type, hc_legend adds a box listing each series, hc_tooltip sets what appears when hovering over a column, and hc_chart sets the font style of the chart text.

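# One color per disease: six diseases remain after filtering (Dark2 has up to 8)
cols <- brewer.pal(6, "Dark2")

# us_d_disease has one row per state and year; sum over states so each
# disease gets a single column per year instead of 51 overlapping ones
yearly_cases <- us_d_disease %>%
  group_by(year, disease) %>%
  summarise(count = sum(count, na.rm = TRUE), .groups = "drop")

highchart() %>%
  hc_add_series(data = yearly_cases,
                type = "column",
                mapping = hcaes(x = year, y = count, group = disease)) %>%
  hc_colors(cols) %>%
  hc_xAxis(title = list(text = "Year")) %>%
  hc_yAxis(title = list(text = "Reported Cases")) %>%
  hc_plotOptions(series = list(marker = list(symbol = "circle"))) %>%
  hc_title(text = "Comparison of Reported Cases per Year") %>%
  hc_legend(align = "right",
            verticalAlign = "top",
            layout = "vertical") %>%
  hc_tooltip(shared = TRUE,
             borderColor = "Blue",
             pointFormat = "{series.name}: {point.y:,.0f}<br>") %>%
  hc_chart(style = list(fontFamily = "Arial",
                        fontWeight = "bold"))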

Paragraph

The chart of contagious diseases from 1928 to 2011 is a multivariable chart. It shows three variables: year, count (the number of reported cases), and disease, which is encoded by color. I thought it would be interesting to compare the number of reported cases of these diseases each year, specifically to look for trends that coincide with the development of vaccines. The main insight is that reported cases have decreased strongly over time: compared with the early years of the data (1928, 1930, and 1940), both the number of cases and the number of diseases still being reported are much smaller. Measles in particular drops abruptly in the mid-1960s, possibly because of the measles vaccine, which was first licensed in the United States in 1963. The chart shows that the decline coincides with this medical advance, although it cannot by itself prove that the vaccine caused it.
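One way to put a number on the mid-1960s measles drop is to compare average national yearly cases in the decade before and after 1963. This is a base-R sketch assuming only dslabs; the two ten-year windows (1953 to 1962 and 1964 to 1973) are an arbitrary illustrative choice, and raw counts are not adjusted for population growth or for the number of weeks reported.

```r
library(dslabs)

# National measles totals per year (summed over all states)
measles <- subset(us_contagious_diseases, disease == "Measles")
measles_year <- aggregate(count ~ year, data = measles, FUN = sum)

# Mean yearly cases in the ten years before and after 1963
# (window choice is illustrative, not from the original analysis)
before <- mean(measles_year$count[measles_year$year >= 1953 & measles_year$year <= 1962])
after  <- mean(measles_year$count[measles_year$year >= 1964 & measles_year$year <= 1973])
c(before = before, after = after, ratio = after / before)
```

A ratio well below 1 is consistent with the drop seen in the chart, but like the chart it describes timing, not cause.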