The data set is from a case-control study of smoking and Alzheimer’s disease. The data set has two variables of main interest:
smoking: a factor with four levels “None”, “<10”, “10-20”, and “>20” (cigarettes per day)
disease: a factor with three levels “Alzheimer”, “Other dementias”, and “Other diagnoses”

library(tidyverse)
## ── Attaching packages ────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 0.8.5
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ───────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# Import data
data("alzheimer", package = "coin")
# create a table
tbl <- xtabs(~disease + smoking, alzheimer)
ftable(tbl)
##                 smoking None <10 10-20 >20
## disease
## Alzheimer                126  15    30  27
## Other dementias           79   8    33  44
## Other diagnoses          104   5    47  20
# create a mosaic plot from the table
library(vcd)
## Loading required package: grid
mosaic(tbl,
       shade = TRUE,
       legend = TRUE,
       labeling_args = list(set_varnames = c(disease = "")),
       set_labels = list(disease = c("Alzheimer", "Other\ndementias", "Other\ndiagnoses")))
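With shade = TRUE, the mosaic plot colors each tile by its Pearson residual under an independence model. As a rough cross-check of the shading, the same residuals can be computed in base R. The sketch below re-enters the counts from the table output above by hand, and the object names (counts, indep_test, pearson_resid) are illustrative assumptions rather than part of the original analysis.

```r
# Counts re-entered by hand from the ftable output above
counts <- matrix(
  c(126, 15, 30, 27,
     79,  8, 33, 44,
    104,  5, 47, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    disease = c("Alzheimer", "Other dementias", "Other diagnoses"),
    smoking = c("None", "<10", "10-20", ">20")
  )
)

# Chi-squared test of independence between disease and smoking
indep_test <- chisq.test(counts)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_resid <- round(indep_test$residuals, 2)
pearson_resid
```

Positive residuals mark cells with more observations than independence would predict, and negative residuals mark cells with fewer.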
The largest group is the non-smokers; they smoke the least each day.
People who smoke more than 20 cigarettes per day.
People who have dementias tend to smoke more cigarettes per day.

## Q4 Create a correlation plot for RailTrail.
Hint: The RailTrail data set is from the mosaicData package.
As seen in the chart, the variables that have a negative correlation with the number of trail users are spring, fall, cloudcover, and precipitation.
The most popular season is fall (-0.25).

## Q7 The correlation coefficient between hightemp and cloudcover is quite small. Would you be sure that the two variables are not related at all?
Hint: One word answer (e.g., yes or no) is NOT enough. Explain why.
You can't really tell with 100% certainty. A small correlation coefficient only shows that there is little linear association; the two variables could still be related in a nonlinear way.

## Q7.a Continued from Q7. Create scatter plot. After examining the scatter plot, would you conclude that the two variables are not related at all?
Hint: Discuss your reason by explaining your scatter plot.
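To illustrate why a small correlation coefficient alone cannot show that two variables are unrelated, here is a minimal sketch in base R. It uses made-up data (an assumption for illustration, not the RailTrail data) in which y is completely determined by x, yet the correlation is essentially zero because the relationship is curved rather than linear.

```r
# Synthetic data, NOT the RailTrail data: y depends entirely on x,
# but the relationship is a symmetric curve instead of a straight line
x <- seq(-3, 3, by = 0.1)
y <- x^2

# The Pearson correlation only measures linear association
r <- cor(x, y)
round(r, 3)

# A scatter plot reveals the curved pattern that the coefficient misses
plot(x, y, main = "Strongly related, yet near-zero correlation")
```

This is why the scatter plot of hightemp against cloudcover is worth inspecting before drawing a conclusion from the coefficient.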
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.