Data606HW1

#1.8
##a) Each row represents an individual case with data on gender, age, marital status, income,
##whether the person smokes, and the number of cigarettes smoked on weekends and weekdays.
##b) 1691 participants
##c) Sex = Categorical, not ordinal
##Age = Numerical, continuous
##Marital = Categorical, not ordinal
##Gross income = Categorical, ordinal
##Smoke = Categorical, not ordinal
##amtWeekends = Numerical, discrete
##amtWeekdays = Numerical, discrete

#1.10
##a) Population is children between the ages 5 to 15
##Sample is the 160 children on whom the experiment was conducted
##b) Yes, the results can be generalized to the population, provided the sample is representative of it. Yes, the findings may be used to establish causal relationships, as this was an experiment and differences were observed between the groups.

#1.28
##a) No, the study is observational and not an experiment. Although the research shows an association, we cannot conclude that smoking is the factor that causes dementia in later life.
##b) No, the study is observational. Although there is an association between sleep disorders and bullying, we cannot say sleep disorders caused the bullying.
## The study can only conclude that there is a relationship between sleep disorders and bullying in schoolchildren.

#1.36
##a) Experiment
##b) Treatment group is 18-30, 31-40 and 41-55 year olds who exercise twice a week
## Control group is 18-30, 31-40 and 41-55 year olds who did not exercise for the period of study.
##c) Yes, the blocking variable is age.
##d) Yes, the person conducting the mental health exam does so for everyone without knowing whether they exercised or not.
##e) Yes, the results can be used to establish a causal relationship, as the experiment controlled for the variable of interest.
##The results may therefore be generalized to the population at large.
##f) I would have reservations because some people in the no-exercise group may lie about not exercising, and vice versa.
##Ethically, funding a study that asks people not to exercise for a long time is not acceptable.
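
##A minimal sketch of how blocking on age could be carried out in R, assuming subjects are randomly split into the two groups within each age block.
##The roster, block sizes, and seed below are hypothetical and only for illustration; they are not taken from the study.

```r
set.seed(606)  # arbitrary seed so the sketch is reproducible

# Hypothetical roster: 12 participants, 4 in each age block
participants <- data.frame(
  id = 1:12,
  age_block = rep(c("18-30", "31-40", "41-55"), each = 4),
  stringsAsFactors = FALSE
)

# Within each block, shuffle an equal number of group labels
participants$group <- NA_character_
for (block in unique(participants$age_block)) {
  rows <- which(participants$age_block == block)
  labels <- rep(c("exercise", "no exercise"), length.out = length(rows))
  participants$group[rows] <- sample(labels)
}

# Each age block should contribute equally to both groups
assignment_counts <- table(participants$age_block, participants$group)
assignment_counts
```

##Randomizing inside each block keeps the age mix the same in both groups, so age cannot explain a difference between them.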

#1.48
# The 20 observations from the exercise
scores<-c(57,66,69,71,72,73,74,77,78,78,79,79,81,81,82,83,83,88,89,94)
boxplot(scores, horizontal=TRUE, col="Red")

quantile(scores)

##    0%   25%   50%   75%  100% 
## 57.00 72.75 78.50 82.25 94.00

summary(scores)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   57.00   72.75   78.50   77.70   82.25   94.00
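
##As a rough check on the boxplot, the 1.5 x IQR rule can be applied by hand. This sketch uses the quartiles reported by quantile();
##boxplot() itself uses the hinges from fivenum(), which can differ slightly, so the fences here are approximate. Object names are my own.

```r
# The 20 observations from exercise 1.48
scores <- c(57,66,69,71,72,73,74,77,78,78,79,79,81,81,82,83,83,88,89,94)

# Quartiles using R's default quantile method (the same values summary() shows)
q1 <- unname(quantile(scores, 0.25))
q3 <- unname(quantile(scores, 0.75))
iqr <- q3 - q1

# Fences at 1.5 * IQR beyond the quartiles
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Observations falling outside the fences
outliers <- scores[scores < lower_fence | scores > upper_fence]
outliers
```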

#1.50
##a) The given histogram is symmetric, so we expect the first and third quartiles of the boxplot to be symmetric around the median.
##The values are centered between 50 and 70. The matching boxplot is 2.
##b) The values in the histogram range from 0 to 100.
##The matching boxplot is 3.
##c) The histogram is negatively skewed. The values range from 0 to 6.
##The matching boxplot is 1.

#1.56
##a) The housing prices are right skewed: a meaningful number of houses cost over $6m.
##Median and IQR best represent skewed data, as they are not strongly affected by outliers.
##b) The housing prices are symmetric, as there are roughly equal numbers of houses in each $300,000 interval, with few outliers.
##Mean and standard deviation are the best representatives.
##c) The distribution of the number of drinks is right skewed, as a few people drink excessively.
##Median and IQR are the best representatives.
##d) The distribution is right skewed, as a few employees earn much higher salaries.
##Median and IQR are the best representatives.
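
##To illustrate why median and IQR suit skewed data, here is a small sketch with made-up salaries.
##The values are hypothetical (in thousands of dollars) and are not from the exercise; one very large salary plays the role of the outlier.

```r
# Hypothetical salaries: most are modest, one is very large
salaries <- c(40, 42, 45, 47, 50, 52, 55, 58, 60, 400)

# Same data with the extreme value dropped
trimmed <- salaries[salaries < 400]

# How far each summary moves when the extreme value is removed
shift_mean   <- abs(mean(salaries) - mean(trimmed))
shift_median <- abs(median(salaries) - median(trimmed))
shift_sd     <- abs(sd(salaries) - sd(trimmed))
shift_iqr    <- abs(IQR(salaries) - IQR(trimmed))

c(mean = shift_mean, median = shift_median, sd = shift_sd, IQR = shift_iqr)
```

##The mean and standard deviation shift far more than the median and IQR, which is why the latter are preferred for skewed distributions.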

#1.70
library(openintro)

data(heartTr)
mosaicplot(heartTr$transplant~heartTr$survived, col="Blue")

##a) From the plot, a larger proportion of the control group died compared to the treatment group.
##This suggests that survival and transplant status are associated.
##b) The median survival time of the treatment group is much higher, suggesting the treatment is somewhat effective.
##c)
table(heartTr$survived, heartTr$transplant)

##        
##         control treatment
##   alive       4        24
##   dead       30        45

30/34

## [1] 0.8823529

##88.24% died in the control group
45/(45+24)

## [1] 0.6521739

##65.22% died in the treatment group
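
##The same percentages could also be obtained in one step with prop.table(). This is a sketch with the counts typed in by hand
##from the table above, so it runs without the openintro package; the object names are my own.

```r
# Counts copied from the table above (rows: outcome, columns: group)
counts <- matrix(c(4, 30, 24, 45), nrow = 2,
                 dimnames = list(c("alive", "dead"),
                                 c("control", "treatment")))

# Column proportions: share of each group that was alive or dead
props <- prop.table(counts, margin = 2)

# Percentage who died in each group, rounded to two decimals
pct_dead <- round(100 * props["dead", ], 2)
pct_dead
```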
##d)i) The claim being tested is that the heart transplant treatment has an impact on the survival rate. The null hypothesis is that it has no impact.
##ii) We write alive on 28 cards representing patients who were alive at the end of the study, and dead on
##75 cards representing patients who were not. Then we shuffle and split the cards into two groups: one group of size 69
##representing treatment and another group of size 34 representing control. We calculate the difference between the proportions of alive cards (treatment - control).
##We repeat this 100 times to build a distribution centered at 0. Lastly, we calculate the fraction of simulations where the
##simulated difference in proportions is at least the difference observed in the study (24/69 - 4/34 = 23%). If this fraction is low, we conclude that the observed outcome is unlikely to be due to chance
##and that the null hypothesis should be rejected.
##iii) There are only 2 simulations with a difference of at least 23%. It is unlikely that the outcome is due to chance.
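
##The card-shuffling procedure in ii) could be sketched in R as follows. The seed and object names are assumptions of this sketch,
##and because the shuffles are random, the count of extreme simulations will not necessarily match the 2 reported in iii).

```r
set.seed(606)  # arbitrary seed, chosen only for reproducibility

# 28 "alive" cards and 75 "dead" cards, as described in ii)
cards <- c(rep("alive", 28), rep("dead", 75))

# Observed difference in proportion alive (treatment - control)
observed_diff <- 24 / 69 - 4 / 34

# One shuffle: deal 69 cards to treatment and the remaining 34 to control
simulate_diff <- function() {
  shuffled <- sample(cards)
  treatment <- shuffled[1:69]
  control <- shuffled[70:103]
  mean(treatment == "alive") - mean(control == "alive")
}

n_sims <- 100  # number of repetitions used in ii)
sim_diffs <- replicate(n_sims, simulate_diff())

# Fraction of shuffles with a difference at least as large as the observed one
# (small tolerance guards against floating-point ties)
frac_extreme <- mean(sim_diffs >= observed_diff - 1e-9)
frac_extreme
```

##A small fraction means a difference this large rarely arises from shuffling alone, which supports rejecting the null hypothesis.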

Farhana Zahir

February 3, 2019