The data set is from a case-control study of smoking and Alzheimer’s disease. The data set has two variables of main interest:
smoking
a factor with four levels “None”, “<10”, “10-20”, and “>20” (cigarettes per day)
disease
a factor with three levels “Alzheimer”, “Other dementias”, and “Other diagnoses”.
## ── Attaching packages ────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.2
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 0.8.3 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
##                 smoking None <10 10-20 >20
## disease
## Alzheimer                126  15    30  27
## Other dementias           79   8    33  44
## Other diagnoses          104   5    47  20
## Loading required package: grid
The largest group with Alzheimer’s is the non-smokers (126 of 198 cases). Non-smokers are also the largest smoking group overall (309 of 538 people).
The one group with clearly more cases than expected under independence is “Other dementias” among people who smoke more than 20 cigarettes per day.
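The comparison with "expected given independence" can be checked by hand from the table above. The following is a minimal sketch in base R, not necessarily how the original plot was produced: the counts are typed in from the printed table, and the names `counts`, `expected`, and `pearson` are chosen here for illustration. It assumes the plot shades cells by Pearson residual, which is a common convention for mosaic plots.

```r
# Observed counts, copied from the printed table
# (rows: disease, columns: smoking).
counts <- matrix(
  c(126, 15, 30, 27,
     79,  8, 33, 44,
    104,  5, 47, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    disease = c("Alzheimer", "Other dementias", "Other diagnoses"),
    smoking = c("None", "<10", "10-20", ">20")
  )
)

# Expected count in each cell if disease and smoking were independent:
# row total * column total / grand total.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson residuals: how far each observed count is from its expected
# count, scaled by sqrt(expected).
pearson <- (counts - expected) / sqrt(expected)

print(round(expected, 1))
print(round(pearson, 2))
```

Cells with a large positive residual have more cases than independence would predict. `chisq.test(counts)$expected` and `chisq.test(counts)$residuals` should return the same two matrices.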
Smoking does not seem to matter for Alzheimer’s, because among the people who have Alzheimer’s, non-smokers are the largest group.
## Q4 Create correlation plot for RailTrail.
Hint: The RailTrail data set is from the mosaicData package.
##            hightemp lowtemp avgtemp spring summer  fall cloudcover precip volume
## hightemp       1.00    0.66    0.92  -0.33   0.67 -0.40      -0.10   0.13   0.58
## lowtemp        0.66    1.00    0.90  -0.39   0.74 -0.41       0.37   0.37   0.18
## avgtemp        0.92    0.90    1.00  -0.39   0.77 -0.44       0.14   0.27   0.43
## spring        -0.33   -0.39   -0.39   1.00  -0.74 -0.47      -0.10  -0.25  -0.04
## summer         0.67    0.74    0.77  -0.74   1.00 -0.24       0.17   0.34   0.23
## fall          -0.40   -0.41   -0.44  -0.47  -0.24  1.00      -0.08  -0.09  -0.25
## cloudcover    -0.10    0.37    0.14  -0.10   0.17 -0.08       1.00   0.37  -0.37
## precip         0.13    0.37    0.27  -0.25   0.34 -0.09       0.37   1.00  -0.23
## volume         0.58    0.18    0.43  -0.04   0.23 -0.25      -0.37  -0.23   1.00
The temperature variables hightemp (0.58), avgtemp (0.43), and lowtemp (0.18) are all positively correlated with the number of trail users (volume).
## Q6 What season seems to be most popular for trail users?
Summer is the most popular season for trail users: it is the only season indicator positively correlated with volume (0.23, versus -0.04 for spring and -0.25 for fall).
## Q7 The correlation coefficient between hightemp and cloudcover is quite small. Would you be sure that the two variables are not related at all? Create a scatter plot. After examining the scatter plot, would you conclude that the two variables are not related at all? Hint: Discuss your reason by explaining your scatter plot.
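One reason a small correlation coefficient is not proof of "no relationship" is that Pearson's correlation only measures linear association. The sketch below uses made-up data (not the RailTrail data) in which `y` is completely determined by `x`, yet the correlation is essentially zero. The variable names are illustrative only.

```r
# Synthetic illustration only; these are NOT the RailTrail data.
x <- seq(-3, 3, by = 0.1)
y <- x^2   # y depends entirely on x, but not linearly

# Pearson correlation picks up only the linear part of the relationship,
# which cancels out here because the curve is symmetric around x = 0.
r <- cor(x, y)
print(round(r, 3))

# The scatter plot shows the relationship that the coefficient misses.
plot(x, y, main = "Strong relationship, near-zero correlation")
```

The same check applies to hightemp and cloudcover: the scatter plot, not the coefficient alone, shows whether there is a curved or clustered pattern.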
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
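As a rough illustration of the hint, a chunk header along these lines would hide the package start-up messages, the code, and the printed output. This is a sketch of one possible combination, not the assignment's required answer. In practice you would pick only the options you need for each chunk.

```{r, message=FALSE, echo=FALSE, results="hide"}
# message = FALSE  : drop messages such as the tidyverse attach banner
# echo = FALSE     : do not show this source code in the output
# results = "hide" : do not show printed text output
library(tidyverse)
```

Plots are not affected by `results="hide"`, so a chunk that draws a figure still shows it.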