The data set is from a case-control study of smoking and Alzheimer’s disease. The data set has two variables of main interest:

- smoking: a factor with four levels, “None”, “<10”, “10-20”, and “>20” (cigarettes per day)
- disease: a factor with three levels, “Alzheimer”, “Other dementias”, and “Other diagnoses”

## ── Attaching packages ──────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.2
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 0.8.3 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ─────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## Loading required package: grid
Non-smokers are the largest group among the Alzheimer's cases. In this sample, not smoking cigarettes tends to be associated with a higher chance of having Alzheimer's.

## Q2 Describe one group that has more cases than expected given independence (by chance). Discuss it by number of cigarettes per day.

The one group that has more cases than expected is people who smoke 10 or more cigarettes a day.

## Q3 Does smoking seem to matter in determining Alzheimer? Discuss your reason using the mosaic chart above.

No, because non-smokers have the highest percentage of people with Alzheimer's.
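The phrase "more cases than expected given independence" in Q2 can be checked numerically. The sketch below is one possible way to do that, not necessarily how the mosaic chart above was produced. It assumes the data are the `alzheimer` data frame from the coin package (an assumption, since the source does not name the package); the helper names `expected_counts` and `pearson_residuals` are hypothetical.

```r
# Expected counts under independence:
# (row total * column total) / grand total, for every cell.
expected_counts <- function(tab) {
  outer(rowSums(tab), colSums(tab)) / sum(tab)
}

# Pearson residuals: positive values mark cells with more cases
# than independence would predict, negative values mark fewer.
pearson_residuals <- function(tab) {
  expected <- expected_counts(tab)
  (tab - expected) / sqrt(expected)
}

# Assumption: the quiz data are coin::alzheimer, with columns
# `smoking` and `disease`. Skipped if coin is not installed.
if (requireNamespace("coin", quietly = TRUE)) {
  data("alzheimer", package = "coin")
  tab <- table(alzheimer$smoking, alzheimer$disease)
  print(tab)                                # observed counts
  print(round(expected_counts(tab), 1))     # counts expected by chance
  print(round(pearson_residuals(tab), 2))   # which cells are over or under
}
```

Cells with a large positive residual in the "Alzheimer" column are the smoking groups with more cases than chance alone would suggest; these correspond to the shaded tiles of a mosaic plot.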
Hint: The RailTrail data set is from the mosaicData package.
The variables with positive correlation are hightemp, lowtemp, avgtemp, and summer.

## Q6 What season seems to be most popular for trail users?

The most popular season seems to be summer.

## Q7 The correlation coefficient between hightemp and cloudcover is quite small. Would you be sure that the two variables are not related at all? Create a scatter plot. After examining the scatter plot, would you conclude that the two variables are not related at all?

The two variables do not appear to be related at all. The scatter plot shows no direction; the points are scattered all over the plot.

Hint: Discuss your reason by explaining your scatter plot.
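As a rough sketch of the check Q7 asks for, the code below computes the correlation and draws the scatter plot. It assumes the mosaicData and ggplot2 packages are installed; the helper name `linear_cor` is hypothetical. A correlation coefficient only measures linear association, so the plot is worth inspecting for curved patterns before concluding that nothing is going on.

```r
# Pearson correlation captures linear association only.
linear_cor <- function(x, y) cor(x, y, use = "complete.obs")

# A perfectly related but non-linear pair still has zero correlation.
x <- -3:3
print(linear_cor(x, x^2))

# Assumption: RailTrail comes from mosaicData, as the hint states.
if (requireNamespace("mosaicData", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  data("RailTrail", package = "mosaicData")
  print(linear_cor(RailTrail$hightemp, RailTrail$cloudcover))

  # Scatter plot with a smooth curve to reveal any non-linear trend.
  p <- ggplot2::ggplot(RailTrail,
                       ggplot2::aes(x = cloudcover, y = hightemp)) +
    ggplot2::geom_point() +
    ggplot2::geom_smooth(method = "loess", se = FALSE)
  print(p)
}
```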
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
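A minimal sketch of what such a chunk header might look like, assuming the goal is to keep package start-up messages out of the knitted report. The chunk label `load-packages` is hypothetical; `message`, `echo`, and `results` are standard knitr chunk options.

```{r load-packages, message=FALSE, echo=FALSE, results='hide'}
# message=FALSE  suppresses messages such as the tidyverse attach banner
# echo=FALSE     hides the source code in the knitted output
# results='hide' hides any printed results from this chunk
library(tidyverse)
```

Which of the three options to set depends on what should remain visible; for example, keeping `echo=TRUE` shows the `library()` call while still hiding its messages.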