DSlab

Author

Ava Haghighi

US Contagious Diseases Data

The data chosen for this assignment from the dslabs library is the US Contagious Diseases data set, which reports case counts of several diseases by state and year.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(scales)

Attaching package: 'scales'

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor
library(treemap)
library(ggstream)
Warning: package 'ggstream' was built under R version 4.4.1
library(dslabs)
data("us_contagious_diseases")
head(us_contagious_diseases)
      disease   state year weeks_reporting count population
1 Hepatitis A Alabama 1966              50   321    3345787
2 Hepatitis A Alabama 1967              49   291    3364130
3 Hepatitis A Alabama 1968              52   314    3386068
4 Hepatitis A Alabama 1969              49   380    3412450
5 Hepatitis A Alabama 1970              51   413    3444165
6 Hepatitis A Alabama 1971              51   378    3481798

I created a new variable that calculates the rate of the disease by dividing the number of cases observed in a specific state and year by the population of that state, then multiplying by 100,000.

# Calculating the disease rate per 100,000 people
disease_data <- us_contagious_diseases %>%
  mutate(disease_rate = (count / population) * 100000)
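
As a quick check of the formula, the first row shown by head() above (Alabama, 1966) can be worked by hand. This is only a sketch; the variable names are placeholders and not part of the original analysis.

```r
# Values copied from the first row of head(us_contagious_diseases)
count <- 321           # Hepatitis A cases, Alabama, 1966
population <- 3345787  # Alabama population, 1966

# Same formula as in mutate(): cases per 100,000 people
disease_rate <- (count / population) * 100000
round(disease_rate, 2)  # about 9.59
```

So roughly 9.6 cases of Hepatitis A were reported per 100,000 Alabama residents in 1966.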

From the diseases in the data set, I chose Hepatitis A to visualize how its rate changed over the years.

Hepatitisa <- disease_data |>
  filter( disease == "Hepatitis A")

To compare the states, I calculated each state's average disease rate across all reported years.

# Calculate each state's average rate with mean()
Data <- Hepatitisa |>
  group_by(state) |>
  summarise(avg_disease_rate = mean(disease_rate)) |>
  # slice_max() keeps the 10 states with the highest average rate
  slice_max(order_by = avg_disease_rate, n = 10, with_ties = TRUE)
print(Data)
# A tibble: 10 × 2
   state        avg_disease_rate
   <fct>                   <dbl>
 1 Alaska                   30.8
 2 Oregon                   24.2
 3 Arizona                  22.8
 4 New Mexico               22.4
 5 California               17.4
 6 Washington               14.8
 7 Utah                     14.0
 8 Nevada                   13.6
 9 Idaho                    13.2
10 Rhode Island             13.0

For a clearer visualization, I kept only the 10 states with the highest average rates of Hepatitis A.

# filtering the 10 highest  state 
state_data <- Hepatitisa |>
  filter( state == c("Alaska", "Oregon", "Arizona", "New Mexico", "California", "Washington", "Utah", "Nevada", "Nevada", "Idaho", "Rhode Island" ))
Warning: There were 2 warnings in `filter()`.
The first warning was:
ℹ In argument: `==...`.
Caused by warning in `==.default`:
! longer object length is not a multiple of shorter object length
ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
print(state_data)
       disease        state year weeks_reporting count population disease_rate
1  Hepatitis A       Alaska 1975              37   409     345253  118.4638511
2  Hepatitis A       Alaska 1986              21    66     491942   13.4162157
3  Hepatitis A       Alaska 1997              24    20     609112    3.2834684
4  Hepatitis A       Alaska 2008              41     3     691147    0.4340611
5  Hepatitis A      Arizona 1975              50   437    2186365   19.9875135
6  Hepatitis A      Arizona 1986              37  1015    3273618   31.0054502
7  Hepatitis A      Arizona 1997              27  1078    4658333   23.1413254
8  Hepatitis A      Arizona 2008              47   101    6161152    1.6393038
9  Hepatitis A   California 1973              49  6875   20972407   32.7811681
10 Hepatitis A   California 1984              43  6021   25948031   23.2040728
11 Hepatitis A   California 1995              41  5130   32133526   15.9646346
12 Hepatitis A   California 2006              52   688   35870957    1.9179862
13 Hepatitis A        Idaho 1973              46   211     771163   27.3612712
14 Hepatitis A        Idaho 1984              31    85     978566    8.6861796
15 Hepatitis A        Idaho 1995              41   296    1122377   26.3726003
16 Hepatitis A        Idaho 2006              48     8    1468781    0.5446694
17 Hepatitis A       Nevada 1972              27   116     542597   21.3786659
18 Hepatitis A       Nevada 1973              26   111     571543   19.4211109
19 Hepatitis A       Nevada 1983              40   137     903179   15.1686432
20 Hepatitis A       Nevada 1984              46   286     938546   30.4726673
21 Hepatitis A       Nevada 1994              27   178    1476255   12.0575375
22 Hepatitis A       Nevada 1995              39   262    1557866   16.8178778
23 Hepatitis A       Nevada 2005              46     9    2379808    0.3781818
24 Hepatitis A       Nevada 2006              47     9    2446420    0.3678845
25 Hepatitis A   New Mexico 1973              51   491    1077656   45.5618491
26 Hepatitis A   New Mexico 1984              46   254    1399412   18.1504803
27 Hepatitis A   New Mexico 1995              41   686    1656994   41.4002706
28 Hepatitis A   New Mexico 2006              47    12    1973608    0.6080235
29 Hepatitis A       Oregon 1970              51  1008    2091385   48.1977254
30 Hepatitis A       Oregon 1981              48   475    2666441   17.8140075
31 Hepatitis A       Oregon 1992              35   287    2925245    9.8111440
32 Hepatitis A       Oregon 2003              45    55    3578065    1.5371437
33 Hepatitis A Rhode Island 1975              48   267     950344   28.0950898
34 Hepatitis A Rhode Island 1986              19    27     975280    2.7684357
35 Hepatitis A Rhode Island 1997              25    44    1039802    4.2315749
36 Hepatitis A Rhode Island 2008              50    13    1053331    1.2341799
37 Hepatitis A         Utah 1972              46   302    1123473   26.8809308
38 Hepatitis A         Utah 1983              47   142    1552360    9.1473627
39 Hepatitis A         Utah 1994              29   426    1892358   22.5115966
40 Hepatitis A         Utah 2005              50    21    2505982    0.8379948
41 Hepatitis A   Washington 1976              50   445    3838897   11.5918713
42 Hepatitis A   Washington 1987              50  2431    4624310   52.5700050
43 Hepatitis A   Washington 1998              33   710    5687112   12.4843682
44 Hepatitis A   Washington 2009              48    42    6648698    0.6317026
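
The warning about object lengths, together with the fact that each state appears for only a few years in the printed output, suggests that `==` compared the state column against the vector of names position by position, recycling the shorter vector, rather than testing membership. Below is a minimal sketch of the difference on a small made-up table (the names `toy` and `keep` are hypothetical), using `%in%` for the membership test:

```r
library(dplyr)

# Made-up data: three states observed in two years
toy <- tibble(
  state = c("Alaska", "Oregon", "Texas", "Alaska", "Oregon", "Texas"),
  year  = rep(c(2000, 2001), each = 3)
)
keep <- c("Alaska", "Oregon")

# `==` recycles `keep` down the column and compares element by element,
# so a row is kept only when its position happens to line up with its name
with_eq <- toy |> filter(state == keep)

# `%in%` asks, for every row, whether its state appears anywhere in `keep`
with_in <- toy |> filter(state %in% keep)

nrow(with_eq)  # 2
nrow(with_in)  # 4
```

Applied to this analysis, something like filter(state %in% Data$state) would presumably keep every year for the ten selected states, without retyping their names.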
p1 <- state_data |>
  ggplot(aes(x = year, y = disease_rate, fill = state)) +
  # geom_stream() from ggstream draws the streamgraph with its default settings
  geom_stream() +
  labs(title = "Rate of Hepatitis A by State over the Years (1970-2010)",
       x = "Year", y = "Disease Rate",
       fill = "States") +
  scale_fill_brewer(palette = "Spectral") +
  theme_minimal()
p1

This streamgraph depicts the rate of Hepatitis A over the years, with each color representing a different state. Note that it is a comparative chart: the bands are stacked around a central baseline, so the y-axis does not give each state's actual rate. Instead, the thickness of each band shows the relative changes and trends in rates over time among the states.
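
If the actual rates need to be read off the axis, a line chart is one alternative to the streamgraph. The sketch below is an assumption about how that might look, not part of the original analysis. It uses a few Alaska and Arizona rows copied (and rounded) from the printed output above, and the names `rates` and `p_line` are hypothetical.

```r
library(ggplot2)

# A handful of rows taken from the printed state_data output, rounded
rates <- data.frame(
  state = rep(c("Alaska", "Arizona"), each = 4),
  year = rep(c(1975, 1986, 1997, 2008), times = 2),
  disease_rate = c(118.46, 13.42, 3.28, 0.43, 19.99, 31.01, 23.14, 1.64)
)

# One line per state; the y-axis now shows cases per 100,000 directly
p_line <- ggplot(rates, aes(x = year, y = disease_rate, color = state)) +
  geom_line() +
  labs(x = "Year", y = "Cases per 100,000", color = "States") +
  theme_minimal()
p_line
```

The trade-off is that ten overlapping lines can be harder to follow than stacked bands, which may be why a streamgraph suits an overview.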

As observed, Alaska exhibits the highest rate compared to the other states, particularly in the earliest years shown.