RailTrail.hightemp and cloudcover is quite small. Would you be sure that the two variables are not related at all?
The data set is from a case-control study of smoking and Alzheimer’s disease. The data set has two variables of main interest:
smoking: a factor with four levels “None”, “<10”, “10-20”, and “>20” (cigarettes per day)
disease: a factor with three levels “Alzheimer”, “Other dementias”, and “Other diagnoses”.
library(tidyverse)
# Import data
data("alzheimer", package = "coin")
# create a table
tbl <- xtabs(~disease + smoking, alzheimer)
ftable(tbl)
# create a mosaic plot from the table
library(vcd)
mosaic(tbl,
       shade = TRUE,
       legend = TRUE,
       labeling_args = list(set_varnames = c(disease = "")),
       set_labels = list(disease = c("Alzheimer", "Other\ndementias", "Other\ndiagnoses")))
Among people with dementia, the largest group is those who smoke no cigarettes.
Among people with dementia, those who smoke more than 20 cigarettes a day have more cases than expected.
Yes, because smoking is known to increase the risk of dementia. According to the chart, people who smoke more than 20 cigarettes a day are more likely to have dementia.
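The "more cases than expected" reading of the mosaic shading can be checked numerically. The following is a minimal sketch, assuming the coin package is installed and that the shading reflects Pearson residuals from an independence model (the default in vcd, as far as I know). The name chi is only illustrative.

```r
# Assumes the coin package is installed (it supplies the alzheimer data)
data("alzheimer", package = "coin")
tbl <- xtabs(~ disease + smoking, data = alzheimer)

# Pearson chi-squared test of independence between disease and smoking
chi <- chisq.test(tbl)

# Counts expected if disease and smoking were independent
round(chi$expected, 1)

# Pearson residuals: (observed - expected) / sqrt(expected)
# Positive values mark cells with more cases than expected
round(chi$residuals, 2)
```

Cells with large positive residuals should correspond to the tiles shaded as over-represented in the mosaic plot.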
RailTrail.
Hint: The RailTrail data set is from the mosaicData package.
# import data
data(RailTrail, package="mosaicData")
# select numeric variables
df <- dplyr::select_if(RailTrail, is.numeric)
# calculate the correlations
r <- cor(df, use="complete.obs")
round(r,2)
library(ggplot2)
library(ggcorrplot)
# visualize the correlations
ggcorrplot(r,
           hc.order = TRUE,
           type = "lower",
           lab = TRUE)
The negative correlation coefficients are -0.04, -0.25, -0.37, and -0.23.
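On the question of whether a small correlation, such as the one between hightemp and cloudcover, means two variables are not related at all: the correlation coefficient only measures linear association. The sketch below uses made-up vectors x and y (not the RailTrail data) to show a perfectly dependent pair whose correlation is essentially zero.

```r
# Hypothetical data: y is completely determined by x, but not linearly
x <- seq(-3, 3, by = 0.1)
y <- x^2

# Pearson correlation is essentially zero because the relationship is symmetric
r_xy <- cor(x, y)
r_xy

# A scatterplot reveals the relationship that the coefficient misses
plot(x, y)
```

So a coefficient near zero rules out a strong linear trend, not a relationship in general. Plotting the two variables is a reasonable next step.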
Fall seems to be the least popular season for trail users.
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
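As one possible illustration of the hint, the same three options can be set once for the whole document with knitr::opts_chunk$set() in a setup chunk. This is a sketch, assuming the document is knitted with knitr. The values shown are examples only, and the options can instead be written per chunk in the chunk header, for example {r, message=FALSE}.

```r
# Place in the first (setup) chunk of the .Rmd file
knitr::opts_chunk$set(
  echo = FALSE,      # do not show the R source code in the output
  message = FALSE,   # suppress messages such as package start-up notes
  results = "hide"   # do not show printed text results (plots are unaffected)
)
```

Options given in an individual chunk header override these defaults for that chunk.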