RailTrail.hightemp and cloudcover is quite small. Would you be sure that the two variables are not related at all? Create a scatter plot. After examining the scatter plot, would you conclude that the two variables are not related at all?
The data set is from a case-control study of smoking and Alzheimer’s disease. The data set has two variables of main interest:
smoking: a factor with four levels “None”, “<10”, “10-20”, and “>20” (cigarettes per day)
disease: a factor with three levels “Alzheimer”, “Other dementias”, and “Other diagnoses”.
library(tidyverse)
# Import data
data("alzheimer", package = "coin")
# create a table
tbl <- xtabs(~disease + smoking, alzheimer)
ftable(tbl)
##                 smoking None <10 10-20 >20
## disease
## Alzheimer                126  15    30  27
## Other dementias           79   8    33  44
## Other diagnoses          104   5    47  20
# create a mosaic plot from the table
library(vcd)
mosaic(tbl,
       shade = TRUE,
       legend = TRUE,
       labeling_args = list(set_varnames = c(disease = "")),
       set_labels = list(disease = c("Alzheimer", "Other\ndementias", "Other\ndiagnoses")))
Non-smokers are the largest group among the Alzheimer’s cases (126 of 198). In this sample, people who do not smoke appear more likely to have Alzheimer’s than people who smoke.
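One way to check this reading is to turn the counts into proportions. The sketch below rebuilds the table by hand from the counts printed above, so it runs without the coin package. The name `counts` is an assumption made for illustration, and with the real data `tbl` could be passed to `prop.table()` directly.

```r
# Counts copied from the ftable() output above; `counts` is a hypothetical
# stand-in for `tbl` so that the example is self-contained.
counts <- matrix(
  c(126, 15, 30, 27,
     79,  8, 33, 44,
    104,  5, 47, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    disease = c("Alzheimer", "Other dementias", "Other diagnoses"),
    smoking = c("None", "<10", "10-20", ">20")
  )
)

# Share of each diagnosis within each smoking level (each column sums to 1)
col_share <- prop.table(counts, margin = 2)
round(col_share, 2)

# Share of each smoking level within each diagnosis (each row sums to 1)
row_share <- prop.table(counts, margin = 1)
round(row_share, 2)
```

Because the data come from a case-control study, these shares describe the sample and should not be read as population risks.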
A group that has more cases than expected is patients with other dementias who smoke more than 20 cigarettes a day.
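"More cases than expected" can be made concrete by comparing the observed counts with the counts expected if disease and smoking were independent. The following is a minimal sketch using base R's `chisq.test()`. The `counts` matrix is a hand-built, hypothetical stand-in for `tbl`, filled with the counts printed above.

```r
# Counts copied from the ftable() output above (hypothetical stand-in for `tbl`)
counts <- matrix(
  c(126, 15, 30, 27,
     79,  8, 33, 44,
    104,  5, 47, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    disease = c("Alzheimer", "Other dementias", "Other diagnoses"),
    smoking = c("None", "<10", "10-20", ">20")
  )
)

# Chi-squared test of independence between disease and smoking
test <- chisq.test(counts)

# Expected counts under independence: row total * column total / grand total
round(test$expected, 1)

# Pearson residuals: (observed - expected) / sqrt(expected).
# Large positive values mark cells with more cases than expected.
round(test$residuals, 2)
```

Cells with large positive residuals should correspond to the cells that the shaded mosaic plot highlights as over-represented.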
No, I don’t think it matters, because non-smokers are the largest group with Alzheimer’s.
RailTrail.
Hint: The RailTrail data set is from the mosaicData package.
data(RailTrail, package="mosaicData")
# select numeric variables
df <- dplyr::select_if(RailTrail, is.numeric)
# calculate the correlations
r <- cor(df, use="complete.obs")
round(r,2)
##            hightemp lowtemp avgtemp spring summer  fall cloudcover precip
## hightemp       1.00    0.66    0.92  -0.33   0.67 -0.40      -0.10   0.13
## lowtemp        0.66    1.00    0.90  -0.39   0.74 -0.41       0.37   0.37
## avgtemp        0.92    0.90    1.00  -0.39   0.77 -0.44       0.14   0.27
## spring        -0.33   -0.39   -0.39   1.00  -0.74 -0.47      -0.10  -0.25
## summer         0.67    0.74    0.77  -0.74   1.00 -0.24       0.17   0.34
## fall          -0.40   -0.41   -0.44  -0.47  -0.24  1.00      -0.08  -0.09
## cloudcover    -0.10    0.37    0.14  -0.10   0.17 -0.08       1.00   0.37
## precip         0.13    0.37    0.27  -0.25   0.34 -0.09       0.37   1.00
## volume         0.58    0.18    0.43  -0.04   0.23 -0.25      -0.37  -0.23
##            volume
## hightemp     0.58
## lowtemp      0.18
## avgtemp      0.43
## spring      -0.04
## summer       0.23
## fall        -0.25
## cloudcover  -0.37
## precip      -0.23
## volume       1.00
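The matrix reports a correlation of -0.10 between hightemp and cloudcover. Since a correlation coefficient only measures linear association, a scatter plot is one way to look for other patterns. The sketch below uses base graphics and assumes the mosaicData package is installed. The name `rt` and the lowess smoother are illustrative choices and not part of the original exercise.

```r
# Assumes mosaicData is installed, as in the data() call above
data(RailTrail, package = "mosaicData")

# Keep only rows where both variables are observed (`rt` is a hypothetical name)
rt <- RailTrail[complete.cases(RailTrail[, c("hightemp", "cloudcover")]), ]

# Scatter plot of cloud cover against the daily high temperature
plot(cloudcover ~ hightemp, data = rt,
     xlab = "hightemp",
     ylab = "cloudcover",
     main = "Cloud cover vs. high temperature")

# Lowess smoother to reveal any non-linear trend that cor() would miss
lines(lowess(rt$hightemp, rt$cloudcover), col = "blue")
```

A curved or clustered pattern in the plot would suggest a relationship that the linear correlation does not capture.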
The variables that are positively correlated with the number of trail users (volume) are hightemp (0.58), avgtemp (0.43), summer (0.23), and lowtemp (0.18).
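The same list can be read off programmatically by filtering and sorting the volume correlations. In this sketch the vector `vol_cor` is typed in by hand from the rounded output above so that it runs on its own. With the real matrix, `r["volume", colnames(r) != "volume"]` should give the unrounded equivalent.

```r
# Correlations with volume, copied from the rounded matrix above
# (`vol_cor` is a hypothetical name used only for this illustration)
vol_cor <- c(hightemp = 0.58, lowtemp = 0.18, avgtemp = 0.43, spring = -0.04,
             summer = 0.23, fall = -0.25, cloudcover = -0.37, precip = -0.23)

# Keep the positive correlations and order them from strongest to weakest
positive <- sort(vol_cor[vol_cor > 0], decreasing = TRUE)
positive
```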
I would say that summer seems to be the most popular season for trail users because it is the only season variable that is positively correlated with volume (0.23), compared with -0.04 for spring and -0.25 for fall.
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
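As a rough illustration of the three options named in the hint, they can be set for all chunks from a setup chunk with `knitr::opts_chunk$set()`. The values below are assumptions chosen for illustration, because the question this hint belongs to is not shown here. The same options can also be given in an individual chunk header.

```r
# Illustrative values only; pick the ones the exercise actually asks for.
knitr::opts_chunk$set(
  message = FALSE,     # do not show messages, e.g. from library(tidyverse)
  echo    = TRUE,      # show the R code in the knitted document
  results = "markup"   # show printed output; "hide" would suppress it
)
```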