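The data set is from a case-control study of smoking and Alzheimer’s disease. The data set has two variables of main interest: smoking (cigarettes per day: None, <10, 10-20, >20) and disease (Alzheimer, Other dementias, Other diagnoses).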

library(tidyverse)
## ── Attaching packages ────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.5
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ───────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# Import data
data("alzheimer", package = "coin")

# create a table
tbl <- xtabs(~disease + smoking, alzheimer)
ftable(tbl)
##                 smoking None <10 10-20 >20
## disease                                   
## Alzheimer                126  15    30  27
## Other dementias           79   8    33  44
## Other diagnoses          104   5    47  20
# create a mosaic plot from the table
library(vcd)
## Loading required package: grid
mosaic(tbl, 
       shade = TRUE,
       legend = TRUE,
       labeling_args = list(set_varnames = c(disease = "")),
       set_labels = list(disease = c("Alzheimer", "Other\ndementias", "Other\ndiagnoses")))
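
The shading in a mosaic plot drawn with shade = TRUE is based on Pearson residuals, (observed - expected) / sqrt(expected), where the expected counts assume that disease and smoking are independent. As a rough sketch, the residuals can be recomputed in base R from the counts printed above. The matrix counts below is retyped by hand from that output rather than pulled from the coin package, and the object names are only illustrative.

```r
# Counts retyped from the ftable() output above (assumption: copied without typos)
counts <- matrix(
  c(126, 15, 30, 27,
     79,  8, 33, 44,
    104,  5, 47, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    disease = c("Alzheimer", "Other dementias", "Other diagnoses"),
    smoking = c("None", "<10", "10-20", ">20")
  )
)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson residuals: positive values mean more cases than expected
pearson <- (counts - expected) / sqrt(expected)

round(expected, 1)
round(pearson, 2)
```

Residuals far from zero (a common rule of thumb is beyond about 2 in absolute value) mark the cells that depart most from independence, which is what the colored tiles in the plot are meant to highlight.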

Q1 Describe the largest group that has other dementias. Discuss it by number of cigarettes per day.

The largest group with other dementias is the non-smokers (79 cases), who smoke zero cigarettes per day.

Q2 Describe one group that has more cases than expected given independence (by chance). Discuss it by number of cigarettes per day.

One group with more cases than expected under independence is people with other dementias who smoke more than 20 cigarettes per day.

Q3 Does smoking seem to matter in determining other dementias? Discuss your reason using the mosaic chart above.

Based on the data, smoking does not seem to matter in determining dementia: people who smoke get Alzheimer's at nearly the same rate as those who don't. Also, the Pearson residual is 0, meaning they get it at the expected rate.

Q4 Create a correlation plot for RailTrail. Hint: The RailTrail data set is from the mosaicData package.

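data(RailTrail, package = "mosaicData")
# Correlations among the numeric variables, drawn with ggcorrplot (assumed to be installed)
r <- cor(RailTrail[sapply(RailTrail, is.numeric)], use = "complete.obs")
ggcorrplot::ggcorrplot(r)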

Q5 List all four variables that have negative correlation with the number of trail users (volume).

Variables that have a positive relationship with the number of trail users are: hightemp, avgtemp, lowtemp, and summer.

Q6 What season seems to be least popular for trail users?

The season that seems to be the most popular for trail users is summer because it has the highest positive correlation of all the seasons.

Q7 The correlation coefficient between hightemp and cloudcover is quite small. Would you be sure that the two variables are not related at all? Hint: A one-word answer (e.g., yes or no) is NOT enough. Explain why.

# hightemp and cloudcover are columns of RailTrail, not separate data sets
data(RailTrail, package = "mosaicData")
plot(cloudcover ~ hightemp, data = RailTrail)
cor(RailTrail$hightemp, RailTrail$cloudcover)
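
A correlation coefficient only measures linear association, so a value near zero does not by itself rule out a relationship. The following is a minimal sketch with made-up data (not the RailTrail values): y is completely determined by x, yet the Pearson correlation is essentially zero because the relationship is symmetric rather than linear.

```r
# Synthetic illustration only: these are not RailTrail data
x <- seq(-1, 1, length.out = 201)
y <- x^2                      # y depends entirely on x, but not linearly

r_linear <- cor(x, y)         # Pearson correlation, essentially 0
r_abs    <- cor(abs(x), y)    # after folding x, the dependence becomes visible

r_linear
r_abs
```

For hightemp and cloudcover, a scatterplot would be a natural check before concluding that the two variables are unrelated.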

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.