In SaratogaHouses, the correlation between lotSize and pctCollege is quite small. Can we be sure that the two variables are not related at all?

The data set is from a case-control study of smoking and Alzheimer’s disease. The data set has two variables of main interest:
smoking: a factor with four levels “None”, “<10”, “10-20”, and “>20” (cigarettes per day)
disease: a factor with three levels “Alzheimer”, “Other dementias”, and “Other diagnoses”
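To see the counts the answers below describe, the data can be cross-tabulated by smoking level and diagnosis. This is a minimal sketch that assumes the data set is the `Alzheimer` data frame from the vcdExtra package, stored in frequency form with a `Freq` count column; neither the package nor that column is named in the assignment.

```r
# Assumption: the data are vcdExtra's Alzheimer data frame
# (columns smoking, disease, Freq); vcdExtra must be installed
data("Alzheimer", package = "vcdExtra")

# Cross-tabulate: one row per smoking level, one column per diagnosis
tab <- xtabs(Freq ~ smoking + disease, data = Alzheimer)
tab

# Column proportions: how smoking is distributed within each diagnosis group
props <- prop.table(tab, margin = 2)
round(props, 2)
```

Raw counts depend on how many cases of each diagnosis were sampled, so the column proportions are usually easier to compare across diagnoses than the counts themselves.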
Among patients diagnosed with Other dementias, the smallest group is the one that smokes fewer than 10 cigarettes a day, followed by the group that smokes more than 20 cigarettes a day.
Among patients with Other diagnoses, a large number smoke 10-20 cigarettes a day.
Smoking does seem to be related to Alzheimer’s disease. Other diagnoses also appear to be strongly related to smoking, in some cases even more so.
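One way to check whether smoking and diagnosis are related, rather than judging from the counts by eye, is a Pearson chi-squared test of independence on the cross-table. This sketch again assumes the data are vcdExtra's `Alzheimer` data frame with a `Freq` column. In a case-control study a significant result indicates an association, not that smoking causes the diagnosis.

```r
# Assumption: vcdExtra's Alzheimer data frame (smoking, disease, Freq)
data("Alzheimer", package = "vcdExtra")
tab <- xtabs(Freq ~ smoking + disease, data = Alzheimer)

# Pearson chi-squared test of independence between smoking and diagnosis
test <- chisq.test(tab)
test

# Standardized residuals: cells with large absolute values (roughly > 2)
# are the ones contributing most to any association
round(test$stdres, 2)
```

If R warns that the chi-squared approximation may be incorrect (some expected counts are small), `chisq.test(tab, simulate.p.value = TRUE)` computes a Monte Carlo p-value instead.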
SaratogaHouses. Hint: The SaratogaHouses data set is from the mosaicData package.
Living area has the highest correlation with home price.
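The claim about living area can be checked by computing the correlation of every numeric variable with price and sorting the results. A minimal sketch, assuming mosaicData is installed and that the price column is named `price` as in the package data:

```r
# Load SaratogaHouses from mosaicData without attaching the package
data("SaratogaHouses", package = "mosaicData")

# Keep only the numeric columns; factors such as heating are dropped
nums <- Filter(is.numeric, SaratogaHouses)

# Correlation of each numeric variable with price, largest first
cors <- cor(nums, use = "complete.obs")[, "price"]
cors <- sort(cors[names(cors) != "price"], decreasing = TRUE)
round(cors, 3)
```

The first entry of `cors` is the variable most strongly (positively) correlated with price.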
Houses with more living area tend to be more expensive.
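A simple linear regression turns "more living area, higher price" into an estimated average change in price per unit of living area, which is a tendency rather than a guarantee for any single house. This sketch assumes mosaicData is installed and that, as in its documentation, `price` is in US dollars and `livingArea` in square feet.

```r
# Load SaratogaHouses from mosaicData
data("SaratogaHouses", package = "mosaicData")

# Regress price on living area
fit <- lm(price ~ livingArea, data = SaratogaHouses)

# Slope: estimated average change in price per additional square foot
coef(fit)

# R-squared: share of the variation in price explained by living area alone
summary(fit)$r.squared
```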
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
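Following the hint, one way to keep package start-up messages (such as the tidyverse banner) and source code out of the knitted output is to set default chunk options with knitr's `opts_chunk$set()` in a setup chunk near the top of the .Rmd file. Which options to set globally is a judgment call; this sketch turns off `echo` and `message` everywhere and leaves `results` to individual chunks.

```r
# Run inside the first (setup) chunk of the .Rmd file
knitr::opts_chunk$set(
  echo = FALSE,    # do not print the R source code in the knitted output
  message = FALSE  # suppress messages, e.g. package start-up banners
)

# `results` is usually set per chunk instead, in the chunk header, e.g.
#   {r load-data, results = 'hide'}
# which still runs the chunk but hides its printed output.
```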