The data chosen for this assignment is the US Contagious Diseases data from the dslabs library, which records cases of various diseases by state over the years.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(scales)
Attaching package: 'scales'
The following object is masked from 'package:purrr':
discard
The following object is masked from 'package:readr':
col_factor
library(treemap)
library(ggstream)
Warning: package 'ggstream' was built under R version 4.4.1
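The later code relies on a data frame named `Hepatitisa` with a `disease_rate` column, but its construction is not shown. The printed values are consistent with cases per 100,000 residents (`count / population * 100000`), so a minimal sketch under that assumption might look like the following. The names `hep_a_raw` and `hep_a` are hypothetical, and the two rows are copied from the output printed further down.

```r
library(dplyr)

# Two rows copied from the printed output in this document (Alaska and Arizona, 1975).
hep_a_raw <- data.frame(
  disease    = "Hepatitis A",
  state      = c("Alaska", "Arizona"),
  year       = c(1975, 1975),
  count      = c(409, 437),
  population = c(345253, 2186365)
)

# Assumed definition: reported cases per 100,000 residents.
hep_a <- hep_a_raw |>
  mutate(disease_rate = count / population * 100000)

print(hep_a)
```

With the full data, the starting point would presumably be `us_contagious_diseases` from dslabs filtered to `disease == "Hepatitis A"`, though that step is an assumption here.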
To compare the states, I calculated each state's average Hepatitis A rate across all years and kept the 10 highest.
# calculate each state's average rate with mean()
Data <- Hepatitisa |>
  group_by(state) |>
  summarise(avg_disease_rate = mean(disease_rate)) |>
  # slice_max() keeps the 10 states with the highest average
  slice_max(order_by = avg_disease_rate, n = 10, with_ties = TRUE)
print(Data)
# A tibble: 10 × 2
state avg_disease_rate
<fct> <dbl>
1 Alaska 30.8
2 Oregon 24.2
3 Arizona 22.8
4 New Mexico 22.4
5 California 17.4
6 Washington 14.8
7 Utah 14.0
8 Nevada 13.6
9 Idaho 13.2
10 Rhode Island 13.0
For a clearer visualization, I selected only the 10 states with the highest average rates of Hepatitis A.
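# filter to the 10 states with the highest average rate
# NOTE: `==` compares row by row against the recycled vector instead of testing
# membership (hence the warning below); `state %in% c(...)` would keep every matching row
state_data <- Hepatitisa |>
  filter(state == c("Alaska", "Oregon", "Arizona", "New Mexico", "California",
                    "Washington", "Utah", "Nevada", "Nevada", "Idaho", "Rhode Island"))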
Warning: There were 2 warnings in `filter()`.
The first warning was:
ℹ In argument: `==...`.
Caused by warning in `==.default`:
! longer object length is not a multiple of shorter object length
ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
print(state_data)
disease state year weeks_reporting count population disease_rate
1 Hepatitis A Alaska 1975 37 409 345253 118.4638511
2 Hepatitis A Alaska 1986 21 66 491942 13.4162157
3 Hepatitis A Alaska 1997 24 20 609112 3.2834684
4 Hepatitis A Alaska 2008 41 3 691147 0.4340611
5 Hepatitis A Arizona 1975 50 437 2186365 19.9875135
6 Hepatitis A Arizona 1986 37 1015 3273618 31.0054502
7 Hepatitis A Arizona 1997 27 1078 4658333 23.1413254
8 Hepatitis A Arizona 2008 47 101 6161152 1.6393038
9 Hepatitis A California 1973 49 6875 20972407 32.7811681
10 Hepatitis A California 1984 43 6021 25948031 23.2040728
11 Hepatitis A California 1995 41 5130 32133526 15.9646346
12 Hepatitis A California 2006 52 688 35870957 1.9179862
13 Hepatitis A Idaho 1973 46 211 771163 27.3612712
14 Hepatitis A Idaho 1984 31 85 978566 8.6861796
15 Hepatitis A Idaho 1995 41 296 1122377 26.3726003
16 Hepatitis A Idaho 2006 48 8 1468781 0.5446694
17 Hepatitis A Nevada 1972 27 116 542597 21.3786659
18 Hepatitis A Nevada 1973 26 111 571543 19.4211109
19 Hepatitis A Nevada 1983 40 137 903179 15.1686432
20 Hepatitis A Nevada 1984 46 286 938546 30.4726673
21 Hepatitis A Nevada 1994 27 178 1476255 12.0575375
22 Hepatitis A Nevada 1995 39 262 1557866 16.8178778
23 Hepatitis A Nevada 2005 46 9 2379808 0.3781818
24 Hepatitis A Nevada 2006 47 9 2446420 0.3678845
25 Hepatitis A New Mexico 1973 51 491 1077656 45.5618491
26 Hepatitis A New Mexico 1984 46 254 1399412 18.1504803
27 Hepatitis A New Mexico 1995 41 686 1656994 41.4002706
28 Hepatitis A New Mexico 2006 47 12 1973608 0.6080235
29 Hepatitis A Oregon 1970 51 1008 2091385 48.1977254
30 Hepatitis A Oregon 1981 48 475 2666441 17.8140075
31 Hepatitis A Oregon 1992 35 287 2925245 9.8111440
32 Hepatitis A Oregon 2003 45 55 3578065 1.5371437
33 Hepatitis A Rhode Island 1975 48 267 950344 28.0950898
34 Hepatitis A Rhode Island 1986 19 27 975280 2.7684357
35 Hepatitis A Rhode Island 1997 25 44 1039802 4.2315749
36 Hepatitis A Rhode Island 2008 50 13 1053331 1.2341799
37 Hepatitis A Utah 1972 46 302 1123473 26.8809308
38 Hepatitis A Utah 1983 47 142 1552360 9.1473627
39 Hepatitis A Utah 1994 29 426 1892358 22.5115966
40 Hepatitis A Utah 2005 50 21 2505982 0.8379948
41 Hepatitis A Washington 1976 50 445 3838897 11.5918713
42 Hepatitis A Washington 1987 50 2431 4624310 52.5700050
43 Hepatitis A Washington 1998 33 710 5687112 12.4843682
44 Hepatitis A Washington 2009 48 42 6648698 0.6317026
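The warning raised by `filter()` and the shape of the printed result (four rows per state, eleven years apart, and eight rows for Nevada, which is listed twice) both point to vector recycling: `state == c(...)` compares row 1 with the first name, row 2 with the second, and so on, repeating the 11 names down the column. The sketch below reproduces the effect on a small made-up data frame and contrasts it with `%in%`, which tests membership. The data frame `toy` and the vector `keep` are hypothetical.

```r
library(dplyr)

# Made-up data: three states, three years each.
toy <- data.frame(
  state = rep(c("Alaska", "Utah", "Texas"), each = 3),
  year  = rep(2001:2003, times = 3)
)

# "Utah" is duplicated on purpose, mirroring the repeated "Nevada" above.
keep <- c("Alaska", "Utah", "Utah")

# `==` recycles `keep` down the rows, so a row survives only when its state
# happens to line up with the recycled position.
by_equals <- toy |> filter(state == keep)

# `%in%` keeps every row whose state appears anywhere in `keep`.
by_in <- toy |> filter(state %in% keep)

print(by_equals)
print(by_in)
```

In the analysis itself, one option would be `filter(state %in% Data$state)`, which reuses the top-10 table computed earlier instead of retyping the names; the table and plot would then need to be regenerated.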
# streamgraph of the yearly rate, one band per state
# (geom_stream() uses the mirrored, symmetric layout by default)
p1 <- state_data |>
  ggplot(aes(x = year, y = disease_rate, fill = state)) +
  geom_stream() +
  labs(title = "Rate of Hepatitis A by State over the Years (1970-2010)",
       x = "Year", y = "Disease Rate", fill = "States") +
  scale_fill_brewer(palette = "Spectral") +
  theme_minimal()
This streamgraph shows the rate of Hepatitis A over the years, with each color representing a different state. Note that it is a comparative chart: the bands are stacked around a central baseline, so the y-axis does not give any state's actual rate. Instead, the thickness of each band shows how that state's rate changes over time relative to the others.
Alaska shows the highest rate among these states, most visibly in the earliest years of the data (the mid-1970s).
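To make the point about the y-axis concrete, here is a small sketch of the classic symmetric ("silhouette") stacking for a single year. It is an illustration with made-up rates, not necessarily the exact algorithm ggstream uses: the bands are stacked and then shifted down by half the total, so the axis positions are offsets from the center while each band's thickness still equals that state's rate.

```r
# Made-up rates for three states in one year.
rates <- c(A = 30, B = 20, C = 10)

# Stack the bands, then center the stack on zero.
total <- sum(rates)
upper <- cumsum(rates) - total / 2
lower <- upper - rates

# Positions on the y-axis are offsets; thickness recovers the original rate.
print(data.frame(lower, upper, thickness = upper - lower))
```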