This data set comes from the dslabs package. us_contagious_diseases contains yearly case counts for several serious contagious diseases in US states. It has 6 variables and 16065 observations.
Variable names and descriptions:
• disease: A factor containing disease names.
• state: A factor containing state names.
• year: The year reported.
• weeks_reporting: Number of weeks counts were reported that year.
• count: Total number of reported cases.
• population: State population.
library("dslabs")
data(package="dslabs")
list.files(system.file("script", package = "dslabs"))
## [1] "make-admissions.R"
## [2] "make-brca.R"
## [3] "make-brexit_polls.R"
## [4] "make-death_prob.R"
## [5] "make-divorce_margarine.R"
## [6] "make-gapminder-rdas.R"
## [7] "make-greenhouse_gases.R"
## [8] "make-historic_co2.R"
## [9] "make-mnist_27.R"
## [10] "make-movielens.R"
## [11] "make-murders-rda.R"
## [12] "make-na_example-rda.R"
## [13] "make-nyc_regents_scores.R"
## [14] "make-olive.R"
## [15] "make-outlier_example.R"
## [16] "make-polls_2008.R"
## [17] "make-polls_us_election_2016.R"
## [18] "make-reported_heights-rda.R"
## [19] "make-research_funding_rates.R"
## [20] "make-stars.R"
## [21] "make-temp_carbon.R"
## [22] "make-tissue-gene-expression.R"
## [23] "make-trump_tweets.R"
## [24] "make-weekly_us_contagious_diseases.R"
## [25] "save-gapminder-example-csv.R"
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(ggplot2)
library(readr)
library("dslabs")
library(ggthemes)
library(ggrepel)
library(RColorBrewer)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
library(dplyr)
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
##
## Attaching package: 'highcharter'
##
## The following object is masked from 'package:dslabs':
##
## stars
library(streamgraph)
library(devtools)
## Loading required package: usethis
write_csv(us_contagious_diseases , "us_contagious_diseases.csv", na="")
us_disease <- us_contagious_diseases
summary(us_disease)
## disease state year weeks_reporting
## Hepatitis A:2346 Alabama : 315 Min. :1928 Min. : 0.00
## Measles :3825 Alaska : 315 1st Qu.:1950 1st Qu.:31.00
## Mumps :1785 Arizona : 315 Median :1975 Median :46.00
## Pertussis :2856 Arkansas : 315 Mean :1971 Mean :37.38
## Polio :2091 California: 315 3rd Qu.:1990 3rd Qu.:50.00
## Rubella :1887 Colorado : 315 Max. :2011 Max. :52.00
## Smallpox :1275 (Other) :14175
## count population
## Min. : 0 Min. : 86853
## 1st Qu.: 7 1st Qu.: 1018755
## Median : 69 Median : 2749249
## Mean : 1492 Mean : 4107584
## 3rd Qu.: 525 3rd Qu.: 4996229
## Max. :132342 Max. :37607525
## NA's :214
str(us_disease)
## 'data.frame': 16065 obs. of 6 variables:
## $ disease : Factor w/ 7 levels "Hepatitis A",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ state : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : num 1966 1967 1968 1969 1970 ...
## $ weeks_reporting: num 50 49 52 49 51 51 45 45 45 46 ...
## $ count : num 321 291 314 380 413 378 342 467 244 286 ...
## $ population : num 3345787 3364130 3386068 3412450 3444165 ...
table(us_disease$disease)
##
## Hepatitis A Measles Mumps Pertussis Polio Rubella
## 2346 3825 1785 2856 2091 1887
## Smallpox
## 1275
us_disease %>%
filter(year >= 1990, year <= 2010, disease == "Measles") %>%
arrange(desc(count)) %>%
head(10)
## disease state year weeks_reporting count population
## 1 Measles California 1990 48 9598 29760021
## 2 Measles Texas 1990 43 3898 16986510
## 3 Measles California 1991 48 1963 30311890
## 4 Measles New York 1991 37 1407 18091650
## 5 Measles Pennsylvania 1991 47 1402 11905415
## 6 Measles Illinois 1990 43 1037 11430602
## 7 Measles New Jersey 1991 44 1019 7793388
## 8 Measles Texas 1992 32 1002 17644041
## 9 Measles New York 1990 45 806 17990455
## 10 Measles Wisconsin 1990 46 657 4891769
us_d_disease <- us_disease %>%
filter(disease %in% c("Measles", "Pertussis", "Hepatitis A", "Polio", "Rubella", "Smallpox")) %>%
arrange(year, state)
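Because us_d_disease still has one row per state per year, each year carries many state-level values for every disease. One way to get a single national figure per disease and year is to sum the counts across states before plotting. The sketch below is only an illustration and not part of the original analysis; the helper name yearly_totals and the toy data frame (with invented values) are assumptions made for the example.

```r
library(dplyr)

# Hypothetical helper: sum state-level counts into one national total
# per disease and year. Expects columns `disease`, `year`, and `count`.
yearly_totals <- function(df) {
  df %>%
    group_by(disease, year) %>%
    summarise(total = sum(count, na.rm = TRUE), .groups = "drop") %>%
    arrange(disease, year)
}

# Toy data (invented values) using the same column names as us_disease
toy <- data.frame(
  disease = c("Measles", "Measles", "Measles", "Polio"),
  state   = c("Alabama", "Alaska", "Alabama", "Alabama"),
  year    = c(1990, 1990, 1991, 1990),
  count   = c(10, 5, 7, 2)
)

totals <- yearly_totals(toy)
totals
```

With the real data, yearly_totals(us_d_disease) would return one row per disease and year, which could then be charted with y = total instead of the state-level count.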
To create the chart, I will use three variables (year, count, and disease) to compare how each disease progressed from year to year. I will use a highcharter column chart so that the result is interactive and easy to understand. I will use the hc_colors function to apply colors from an RColorBrewer palette, and hc_xAxis, hc_yAxis, and hc_title to add the axis labels and the title. In addition, I will use hc_plotOptions to configure options for each series type, hc_legend to position the legend box that lists each series, hc_tooltip to format the text shown on hover, and hc_chart to set the font style of the chart text.
cols <- brewer.pal(6, "Dark2") # one color for each of the six diseases in us_d_disease
highchart() %>%
hc_add_series(data = us_d_disease,
type = "column", hcaes(x= year, y = count,
group = disease)) %>%
hc_colors(cols)%>%
hc_xAxis(title = list(text="Year")) %>%
hc_yAxis(title = list(text="Number of Reported Cases"))%>%
hc_plotOptions(series = list(marker = list(symbol = "circle"))) %>%
hc_title(text="Comparison of Number of Infected People per Year")%>%
hc_legend(align = "right",
verticalAlign = "top",
layout = "vertical")%>%
hc_tooltip(shared = TRUE,
borderColor = "Blue",
pointFormat = "{point.disease}: {point.count:.0f}<br>") %>%
hc_chart(style = list(fontFamily = "Arial",
fontWeight = "bold"))
Contagious Diseases From 1928 to 2011 is a multivariable chart. The variables represented in the visualization are year, the number of reported cases (count), and disease, with the type of disease identified by color. I thought it would be interesting to compare the number of people infected by these diseases each year, specifically to look for trends that coincide with the development of vaccines. The main insight from this graph is that the number of reported cases of contagious diseases has decreased strongly over time. Both the case counts and the number of different diseases being reported are much lower than in the early years (1928, 1930, and 1940). It is important to note how abruptly Measles cases dropped during the mid-1960s, possibly because of the measles vaccine. We might therefore attribute this change to that medical advancement.
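Raw counts can be hard to compare across decades, because state populations grew over the period and the number of weeks reported differs from year to year (see weeks_reporting and population in the summary above). A common adjustment is to express cases as a rate per 100,000 people, scaled up to a full 52-week year. The following is a sketch under that assumption and is not the method used for the chart above; add_rate is a hypothetical helper and the toy values are invented.

```r
library(dplyr)

# Hypothetical helper: cases per 100,000 people, extrapolated to 52 weeks.
# Rows with zero weeks reported or a missing population are dropped,
# since no meaningful rate can be computed for them.
add_rate <- function(df) {
  df %>%
    filter(weeks_reporting > 0, !is.na(population)) %>%
    mutate(rate = count / population * 100000 * 52 / weeks_reporting)
}

# Toy data (invented values) using the same column names as us_disease
toy <- data.frame(
  state           = c("A", "B", "C"),
  year            = c(1960, 1960, 1960),
  weeks_reporting = c(52, 26, 0),
  count           = c(100, 50, 0),
  population      = c(1e6, 1e6, 1e6)
)

rates <- add_rate(toy)
rates$rate
```

In the toy data, states A and B end up with the same rate even though B reported half as many cases, because B only reported for half the year. Applying add_rate(us_d_disease) before plotting would make a drop such as the one for Measles easier to separate from changes in population or reporting.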