#1.8
##a) Each row represents an individual participant, with data on sex, age, marital status, gross income,
##whether the person smokes, and the number of cigarettes smoked on weekends and on weekdays.
##b) 1691 participants
##c) Sex = Categorical, not ordinal
##Age = Numerical, continuous
##Marital = Categorical, not ordinal
##Gross income = Categorical, ordinal
##Smoke = Categorical, not ordinal
##amtWeekends = Numerical, discrete
##amtWeekdays = Numerical, discrete
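##A minimal sketch of how these variable types could be stored in R. The data frame `smoking_sketch`,
##its three rows and the income levels are all invented for illustration; they are not the survey data.

```r
# Hypothetical toy rows (values invented for illustration only)
smoking_sketch <- data.frame(
  sex         = factor(c("Male", "Female", "Female")),   # categorical, not ordinal
  age         = c(38, 42, 40),                           # numerical
  grossIncome = factor(c("low", "middle", "high"),
                       levels = c("low", "middle", "high"),
                       ordered = TRUE),                  # categorical, ordinal
  smoke       = factor(c("No", "Yes", "No")),            # categorical, not ordinal
  amtWeekends = c(NA, 12L, NA)                           # numerical, discrete (integer counts)
)

# str() shows the storage type chosen for each column
str(smoking_sketch)

# Only an ordered factor supports comparisons between its levels
smoking_sketch$grossIncome < "high"
```

##An ordered factor keeps the ranking of the income bands, while a plain factor treats its levels as unordered labels.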
#1.10
##a) Population is children between the ages 5 to 15
##Sample is the 160 children on whom the experiment was conducted
##b) The results can be generalized to the population only if the 160 children are representative of it (for example, a random sample).
##Because the study was an experiment, the findings may be used to establish causal relationships.
#1.28
##a) No, the study is observational and not an experiment. Although the research shows an association, we cannot conclude that smoking is the factor that causes dementia in later life.
##b) No, the study is observational. Although there is an association between sleep disorders and bullying, we cannot say that sleep disorders caused the bullying.
##The study can only conclude that there is a relationship between sleep disorders and bullying in schoolchildren.
#1.36
##a) Experiment
##b) Treatment group is 18-30, 31-40 and 41-55 year olds who exercise twice a week
## Control group is 18-30, 31-40 and 41-55 year olds who did not exercise for the period of study.
##c) Yes, the blocking variable is age.
##d) Yes, the person conducting the mental health exam does so for everyone without knowing whether they exercised or not.
##e) Yes, the results can be used to establish a causal relationship, as the study is an experiment that controlled the variable of interest.
##If the participants were randomly sampled, the results may also be generalized to the population at large.
##f) I would have reservations because some people in the no-exercise group may exercise anyway and not report it, and vice versa.
##Ethically, funding a study that asks people not to exercise for a long time is not acceptable.
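##A minimal sketch of random assignment within age blocks. The roster `participants` (12 people, 4 per block)
##and the helper `assign_in_block` are hypothetical; the real study sizes are not given here.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical roster: 4 participants in each of the three age blocks
participants <- data.frame(
  id        = 1:12,
  age_block = rep(c("18-30", "31-40", "41-55"), each = 4)
)

# Within one block, build a balanced set of labels and shuffle it
assign_in_block <- function(ids) {
  labels <- rep(c("treatment", "control"), length.out = length(ids))
  sample(labels)
}

# Apply the assignment block by block, then restore the original row order
participants$group <- unsplit(
  lapply(split(participants$id, participants$age_block), assign_in_block),
  participants$age_block
)

# Each age block should contribute equally to both groups
table(participants$age_block, participants$group)
```

##Randomizing inside each block keeps the age mix identical in the treatment and control groups.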
#1.48
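##The vector is named scores so that it does not reuse the name of base R's table() function, which is called in 1.70
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
boxplot(scores, horizontal = TRUE, col = "Red")

quantile(scores)
## 0% 25% 50% 75% 100%
## 57.00 72.75 78.50 82.25 94.00
summary(scores)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 57.00 72.75 78.50 77.70 82.25 94.00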
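##A short sketch of the 1.5 x IQR rule that decides which points a boxplot draws as outliers, applied to the twenty values above.
##The names `exam_values`, `lower_fence` and `upper_fence` are introduced here for illustration. Note that boxplot() itself uses the
##hinges from fivenum(), which can differ slightly from the quantile() quartiles used below.

```r
exam_values <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
                 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Quartiles as reported by quantile() and summary()
q1 <- quantile(exam_values, 0.25, names = FALSE)
q3 <- quantile(exam_values, 0.75, names = FALSE)
iqr <- q3 - q1  # same value as IQR(exam_values)

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
outliers <- exam_values[exam_values < lower_fence | exam_values > upper_fence]

c(q1 = q1, q3 = q3, iqr = iqr, lower = lower_fence, upper = upper_fence)
outliers

# The points boxplot() would draw individually, for comparison
boxplot.stats(exam_values)$out
```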
#1.50
##a) The given histogram is symmetric, so we expect the first and third quartiles of the boxplot to be roughly equidistant from the median.
##The matching boxplot is 2. The values are centered between 50 and 70.
##b) The values in the histogram range from 0 to 100.
##The matching boxplot is 3.
##c) The histogram is right (positively) skewed. The values range from 0 to 6.
##The matching boxplot is 1.
#1.56
##a) The housing prices are right skewed. A meaningful number of houses cost over $6m.
##Median and IQR best represent skewed data as they are not affected by outliers.
##b) The housing prices are symmetric, as roughly equal numbers of houses fall in each $300,000 band, with few outliers.
##Mean and standard deviation are the best representatives.
##c) The distribution of the number of drinks is right skewed, as a few people drink excessively.
##Median and IQR are the best representatives.
##d) The distribution is right skewed as a few employees earn much higher salaries.
##Median and IQR are the best representatives.
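##A small sketch of why median and IQR suit skewed data: adding one extreme value moves the mean and standard deviation far more
##than the median and IQR. The vector `prices` (in $1,000s) is invented for illustration and is not taken from the exercise.

```r
# Hypothetical house prices in $1,000s
prices <- c(300, 350, 400, 450, 500, 600, 1000)

# The same prices with one invented extreme value appended
prices_with_outlier <- c(prices, 6000)

# Summarise center and spread both ways
describe <- function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x), IQR = IQR(x))
}

comparison <- rbind(
  without_outlier = describe(prices),
  with_outlier    = describe(prices_with_outlier)
)
round(comparison, 1)

# How far each statistic moved when the extreme value was added
shift <- abs(comparison["with_outlier", ] - comparison["without_outlier", ])
round(shift, 1)
```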
#1.70
library(openintro)
data(heartTr)
mosaicplot(heartTr$transplant~heartTr$survived, col="Blue")

##a) From the plot, a larger proportion of patients in the control group died than in the treatment group.
##This suggests that survival and transplant status are associated.
##b) The median survival time of the treatment group is considerably higher, suggesting the treatment is somewhat effective.
##c)
table(heartTr$survived, heartTr$transplant)
##
## control treatment
## alive 4 24
## dead 30 45
30/34
## [1] 0.8823529
##88.24% died in the control group
45/(45+24)
## [1] 0.6521739
##65.22% died in the treatment group
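##The two proportions can also be read off in one step with prop.table(). This sketch re-enters the counts from the table above
##by hand so that it runs without the openintro package; the name `counts` is introduced here for illustration.

```r
# Counts copied from the table above (rows: outcome, columns: group)
counts <- matrix(
  c(4, 30,    # control:   alive, dead
    24, 45),  # treatment: alive, dead
  nrow = 2,
  dimnames = list(survived   = c("alive", "dead"),
                  transplant = c("control", "treatment"))
)

# margin = 2 divides each column by its total, giving proportions within each group
props <- prop.table(counts, margin = 2)
round(100 * props, 2)

# Proportion who died in each group
props["dead", ]
```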
##d)i) The claim being tested is that the heart transplant treatment has an impact on the survival rate (the alternative hypothesis),
##against the null hypothesis that survival is independent of the treatment.
##ii) We write alive on 28 cards representing patients who were alive at the end of the study, and dead on
##75 cards representing patients who were not. Then we shuffle the cards and split them into two groups: one group of size 69
##representing treatment and another group of size 34 representing control. We calculate the difference between the proportions of alive cards in the two groups (treatment - control).
##We repeat this 100 times to build a distribution centered at 0. Lastly, we calculate the fraction of simulations where the
##simulated differences in proportions are at least the difference observed in the study outcome (24/69 - 4/34 = 23%). If this fraction is low, we conclude that it is unlikely to have observed such an outcome by chance
##and that the null hypothesis should be rejected.
##iii) There are only 2 simulations with a difference of at least 23%. It is unlikely that the outcome is due to chance.
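##The card-shuffling procedure in ii) can be sketched in R as follows. The seed, the helper `simulate_diff` and the other names
##are assumptions made for this sketch, and its simulated fraction will not match the textbook's figure exactly because the shuffles are random.

```r
set.seed(2024)  # arbitrary seed so the sketch is reproducible

# One card per patient: 28 alive and 75 dead
cards <- c(rep("alive", 28), rep("dead", 75))
n_treatment <- 69  # the remaining 34 cards form the control group

# Observed difference in the proportion alive (treatment - control)
observed_diff <- 24 / 69 - 4 / 34

# Shuffle the cards, deal them into two groups, and return the difference
simulate_diff <- function() {
  shuffled  <- sample(cards)
  treatment <- shuffled[1:n_treatment]
  control   <- shuffled[-(1:n_treatment)]
  mean(treatment == "alive") - mean(control == "alive")
}

# Repeat 100 times to build the null distribution
sim_diffs <- replicate(100, simulate_diff())

mean(sim_diffs)                   # should be close to 0
mean(sim_diffs >= observed_diff)  # fraction at least as large as observed
```

##A small fraction in the last line means a difference as large as the observed one rarely arises from shuffling alone.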