1.8 Smoking Habits of U.K. Residents
#Import the smoking csv file
smoking<-read.csv(url("https://raw.githubusercontent.com/jbryer/DATA606Fall2017/master/Data/Data%20from%20openintro.org/Ch%201%20Exercise%20Data/smoking.csv"))
Each row represents one observation: a single U.K. resident who took part in the survey.
By using the dim function we see the smoking dataset has 1691 observations and 12 variables.
dim(smoking)
## [1] 1691 12
split(names(smoking),sapply(smoking,function(x) paste(class(x),collapse=" ")))
## $factor
## [1] "gender" "maritalStatus" "highestQualification"
## [4] "nationality" "ethnicity" "grossIncome"
## [7] "region" "smoke" "type"
##
## $integer
## [1] "age" "amtWeekends" "amtWeekdays"
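The split/sapply idiom above can be easier to follow on a small example. The sketch below uses a made-up three-row data frame (the `toy` object and its values are hypothetical, not taken from the smoking data) and groups the column names by class in two steps. Note that since R 4.0 `read.csv` leaves text columns as character unless `stringsAsFactors = TRUE` is passed, so a fresh run of the import may report `$character` instead of `$factor`.

```r
# Hypothetical toy data frame; values are illustrative only
toy <- data.frame(
  age    = c(38L, 42L, 40L),
  smoke  = factor(c("No", "Yes", "No")),
  region = factor(c("London", "Scotland", "Wales"))
)

# Step 1: one class label per column (collapsed in case a column has several classes)
col_classes <- sapply(toy, function(x) paste(class(x), collapse = " "))

# Step 2: group the column names by that label
vars_by_class <- split(names(toy), col_classes)
vars_by_class
```

`str(toy)` gives a similar overview in a single call, at the cost of a less compact listing.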
1.10 Cheaters, scope of inference
The population of interest is all children between the ages of 5 and 15; the 160 children in the study are the sample.
The results cannot be generalized to the population, since half of the sample was induced with a reward.
1.28 Reading the Paper
Many factors can contribute to the development of dementia, and smoking is one of them. The question asks whether smoking causes dementia; based on this study we cannot say that it does, only that smoking is associated with an increased risk.
The statement is not justified. Close to a third of the students were identified as having bullying issues or disruptive behaviors, and one could just as well state that these behavioral patterns lead to sleep disorders.
1.36 Exercise and Mental Health
An experiment
The treatment group consists of the participants who are instructed to exercise, while the control group consists of those who are instructed not to exercise.
Yes, the study uses blocking, and age is the blocking variable. Within each age group, half of the participants are randomly assigned to one group and half to the other.
No, the study is not blinded, because it is stated that the control group was instructed not to participate in exercise, so participants know which group they are in.
Yes, the results of the study can be used to establish a causal relationship, because participants were randomly assigned to the groups. The conclusions can also be generalized to the population, since several age groups are represented and each has both baseline and end-of-study values.
I would add one more time point midway through the study; for example, if the study ran 52 weeks, I would have participants monitored at week 26.
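The blocked assignment described above can be sketched in R. This is a minimal illustration under assumed inputs: the `participants` data frame, the age-group labels, and the group size of four are all hypothetical and are not taken from the study.

```r
set.seed(1)  # fixed seed so the assignment is reproducible

# Hypothetical participants: 4 people in each of 3 age blocks
participants <- data.frame(
  id        = 1:12,
  age_group = rep(c("18-30", "31-40", "41-55"), each = 4)
)

# Within each block, shuffle an even mix of treatment and control labels
participants$group <- NA_character_
for (g in unique(participants$age_group)) {
  rows   <- which(participants$age_group == g)
  labels <- rep(c("treatment", "control"), length.out = length(rows))
  participants$group[rows] <- sample(labels)
}

# Every age group should be split evenly between the two arms
table(participants$age_group, participants$group)
```

Randomizing inside each block, instead of across the whole sample, guarantees that age is balanced between the two arms.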
1.48 Stats Scores
#Import the scores csv file
scores<-read.csv(url("https://raw.githubusercontent.com/jbryer/DATA606Fall2017/master/Data/Data%20from%20openintro.org/Ch%201%20Exercise%20Data/stats_scores.csv"))
#Load the psych package (run install.packages("psych") first if it is not installed)
library("psych")
summary(scores)
## scores
## Min. :57.00
## 1st Qu.:72.75
## Median :78.50
## Mean :77.70
## 3rd Qu.:82.25
## Max. :94.00
boxplot(scores)
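As a rough check on what the box plot should show, the 1.5 x IQR rule can be applied to the quartiles printed by `summary(scores)`. This is a sketch only: `boxplot()` draws its box from the hinges returned by `fivenum()`, which can differ slightly from the `summary()` quartiles, so the fences below are approximate.

```r
# Values copied from the summary(scores) output above
q1        <- 72.75
q3        <- 82.25
min_score <- 57
max_score <- 94

iqr         <- q3 - q1          # interquartile range
lower_fence <- q1 - 1.5 * iqr   # points below this are flagged as outliers
upper_fence <- q3 + 1.5 * iqr   # points above this are flagged as outliers

c(iqr = iqr, lower = lower_fence, upper = upper_fence)
min_score < lower_fence   # is the minimum below the lower fence?
max_score > upper_fence   # is the maximum above the upper fence?
```

Under this approximation the minimum of 57 falls just below the lower fence of 58.5, while the maximum of 94 stays inside the upper fence.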
1.50 Mix-and-Match
Unimodal; matches boxplot #2
Multimodal; matches boxplot #3
Right-skewed; matches boxplot #1
1.56 Distributions and Appropriate Statistics Pt II
Right skewed; median and IQR. Per the text, the median and IQR are often more helpful for describing the center and spread of a skewed distribution.
Symmetric; the mean and standard deviation would be the better descriptive statistics.
Left skewed; median and IQR. Per the text, the median and IQR are often more helpful for describing the center and spread of a skewed distribution.
Symmetric; the mean and standard deviation would be the better descriptive statistics.
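A small made-up example may help show why the median and IQR are preferred for skewed data. The two vectors below are hypothetical: they differ only in their largest value, which is pulled far to the right in the second one.

```r
# Hypothetical data: identical except for the largest value
symmetric <- c(1, 2, 3, 4, 5)
skewed    <- c(1, 2, 3, 4, 50)

# Mean and standard deviation are dragged toward the extreme value
c(mean_sym = mean(symmetric), mean_skew = mean(skewed))
c(sd_sym = sd(symmetric), sd_skew = sd(skewed))

# Median and IQR are unchanged by it
c(median_sym = median(symmetric), median_skew = median(skewed))
c(iqr_sym = IQR(symmetric), iqr_skew = IQR(skewed))
```

The mean jumps from 3 to 12 while the median stays at 3, which is the behavior the answers above rely on.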
1.70 Heart Transplants
#Import the heartTr csv file
heartTr<-read.csv(url("https://raw.githubusercontent.com/jbryer/DATA606Fall2017/master/Data/Data%20from%20openintro.org/Ch%201%20Exercise%20Data/heartTr.csv"))
88% of the control group died, while 65% of the treatment group died. Because these proportions differ noticeably, survival does not appear to be independent of whether or not a patient got a transplant.
Compared to the control group, the treatment group's median is higher than the control group's upper whisker. Based on a visual inspection of the box plots, the treatment appears to have been effective.
See the results in A
Di) That patients receiving a heart transplant will survive longer than patients who don’t
Dii) face card, non-face card, 24, 24, 12, high
Diii) It is slightly skewed to the right
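The randomization approach in part D can also be sketched in code instead of with cards. The counts below are assumptions: 30 deaths among 34 control patients and 45 deaths among 69 treatment patients, chosen because they reproduce the 88% and 65% reported above. They should be checked against the imported data, for example with `table(heartTr$transplant, heartTr$survived)` (column names assumed). The seed and the number of shuffles are arbitrary choices.

```r
set.seed(606)  # arbitrary seed for reproducibility

# Assumed counts, consistent with the 88% and 65% reported above
n_control      <- 34
n_treatment    <- 69
dead_control   <- 30
dead_treatment <- 45

# Observed difference in proportion dead (treatment - control)
observed_diff <- dead_treatment / n_treatment - dead_control / n_control

# One "card" per patient: dead or alive, ignoring group membership
n_dead   <- dead_control + dead_treatment
n_alive  <- n_control + n_treatment - n_dead
outcomes <- c(rep("dead", n_dead), rep("alive", n_alive))

# Shuffle the cards, deal them into two groups, and record the difference
sim_diffs <- replicate(1000, {
  shuffled  <- sample(outcomes)
  treatment <- shuffled[seq_len(n_treatment)]
  control   <- shuffled[-seq_len(n_treatment)]
  mean(treatment == "dead") - mean(control == "dead")
})

# Fraction of shuffles at least as extreme as the observed difference
p_value <- mean(sim_diffs <= observed_diff)
observed_diff
p_value
hist(sim_diffs, main = "Simulated differences under independence")
```

If the fraction is small, a difference this large would be unlikely under independence, which would support the claim that the transplant program is effective.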