1.8
- Each row of the data matrix represents one individual surveyed (one observation).
- 1691 total participants
- Sex: nominal categorical; Age: discrete numerical; Gross Income: ordinal categorical; Smoke: nominal categorical; AmtWeekends: discrete numerical; AmtWeekdays: discrete numerical
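As a rough illustration of how these variable types might be encoded in R, the sketch below builds a tiny data frame. The column names follow the variables listed above, but every value (including the income labels) is invented for illustration and is not taken from the survey data.

```r
# Hypothetical rows: the values are made up, only the variable types matter
survey <- data.frame(
  Sex         = factor(c("Female", "Male", "Male")),        # nominal categorical
  Age         = c(42L, 44L, 53L),                           # discrete numerical
  GrossIncome = factor(c("low", "high", "middle"),
                       levels = c("low", "middle", "high"),
                       ordered = TRUE),                     # ordinal categorical
  Smoke       = factor(c("Yes", "No", "Yes")),              # nominal categorical
  AmtWeekends = c(12L, 0L, 6L),                             # discrete numerical (counts)
  AmtWeekdays = c(10L, 0L, 4L)                              # discrete numerical (counts)
)

# str() shows the storage type chosen for each column
str(survey)
```

An ordered factor keeps the ranking of the income bands, so comparisons such as `survey$GrossIncome[1] < survey$GrossIncome[2]` are meaningful, while the unordered factors only support equality checks.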
1.10
- The population of interest would be children aged 5 to 15.
- The relationship between age and honesty/self-control likely would generalize to the population of interest, assuming the children were chosen at random. If the children were chosen at random and assigned to each group randomly, a causal relationship could be established. If, for instance, only children with behavioral problems were chosen, a causal relationship could not be established because behavioral problems could be a confounding variable.
1.28
- We cannot determine a causal relationship because people do not choose to smoke completely at random. Heavy smoking could be linked to other activities, such as heavy drinking, and the drinking could be a confounding variable.
- Your friend’s statement is not justified. A more accurate statement would be that bullying is correlated with sleep disorders. A causal relationship cannot be determined. If children were randomly selected, deprived of sleep, and then found to bully, one could say that lack of sleep causes bullying.
1.36
- Experiment
- Treatment group: those told to exercise. Control group: those told not to exercise.
- The blocking variable is age.
- It does not use blinding, as people would be aware of which group they’re in.
- This experiment does have the basic features needed to determine a causal relationship: treatment and control groups. The results could generalize to the population, assuming the researcher chooses people from the proposed age groups at random.
- I would not fund this experiment because a baseline level of exercise for each individual is not established. Those who already exercise frequently would not actually be “receiving treatment” if they were in the treatment group. The lack of blinding would also be a major hang-up for me.
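To make the blocking idea concrete, here is a minimal sketch of a blocked random assignment in R. The roster, the block labels, and the block sizes are all assumptions made for illustration; the point is only that treatment and control labels are shuffled separately within each age block.

```r
set.seed(136)  # arbitrary seed so the assignment is reproducible

# Hypothetical roster: 12 participants in three age blocks (labels assumed)
roster <- data.frame(
  id        = 1:12,
  age_group = rep(c("18-30", "31-40", "41-55"), each = 4)
)

# Within each age block, shuffle an equal number of treatment/control labels
roster$group <- ave(
  rep("", nrow(roster)), roster$age_group,
  FUN = function(x) sample(rep(c("treatment", "control"), length.out = length(x)))
)

# Each block should contribute the same number of people to each group
table(roster$age_group, roster$group)
```

Because the shuffle happens inside each block, age is balanced across the two groups by design rather than by chance.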
1.48
library(ggplot2)
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
thedf <- data.frame(scores)
ggplot(data = thedf, aes(x = 1, y = scores)) + geom_boxplot()
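
As a quick check on the plot, the numbers behind a box plot can be computed directly. This is a sketch using base R's default quantile definition and the usual 1.5 × IQR rule for flagging outliers; the plotted box may differ slightly if a different quantile definition is used.

```r
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79,
            81, 81, 82, 83, 83, 88, 89, 94)

# Quartiles (default type 7 quantiles) and the interquartile range
q   <- quantile(scores, c(0.25, 0.5, 0.75))
iqr <- IQR(scores)

# Fences for the 1.5 * IQR outlier rule
fences <- c(lower = q[[1]] - 1.5 * iqr, upper = q[[3]] + 1.5 * iqr)

# Scores that fall outside the fences would be drawn as individual points
outliers <- scores[scores < fences[["lower"]] | scores > fences[["upper"]]]

q         # Q1 = 72.75, median = 78.5, Q3 = 82.25
fences    # lower = 58.5, upper = 96.5
outliers  # 57
```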

1.50
- unimodal, symmetric boxplot 2
- uniform, boxplot 3
- right skewed, unimodal, boxplot 1
1.56
- Right skewed; median over mean, IQR over standard deviation. The significant number of houses valued at 6 times the 75th percentile would keep the mean and standard deviation from representing the population well.
- Symmetric; either median or mean, standard deviation. This distribution looks quite symmetric, so either measure of center would work to represent a typical observation. The standard deviation gives a little extra information about observations outside the IQR, so it would be preferred.
- Right skewed; mean over median, standard deviation over IQR. Here many of the observations will be zero, so the median and the 75th percentile could both be zero. Using the median and IQR could give us no sense of the data for the heavy drinkers.
- Right skewed; median over mean, IQR over standard deviation. It’s better to use the robust measures so that the few salaries that are possibly orders of magnitude greater than the others don’t dominate the measures. I might log-transform before analyzing this data.
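
The reasoning about robust measures can be seen with a small made-up example. The salary figures below are invented purely for illustration; the sketch compares how much the mean and standard deviation move versus the median and IQR when a single extreme value is added.

```r
# Made-up salaries (in thousands); not data from the exercise
typical      <- c(38, 41, 45, 47, 52, 55, 58, 61)
with_extreme <- c(typical, 900)   # add one deliberately extreme salary

# Helper (hypothetical name) returning center and spread measures
center_spread <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), IQR = IQR(x))
}

comparison <- rbind(
  typical      = center_spread(typical),
  with_extreme = center_spread(with_extreme)
)
round(comparison, 1)
```

In this toy sample the median and IQR shift by only a couple of units, while the mean and standard deviation are pulled far toward the extreme value, which is why the robust measures are preferred for strongly skewed data.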
1.70
library(repmis)
heart <- source_data("https://raw.githubusercontent.com/jbryer/DATA606Spring2017/master/Data/Data%20from%20openintro.org/Ch%201%20Exercise%20Data/heartTr.csv")
## Downloading data from: https://raw.githubusercontent.com/jbryer/DATA606Spring2017/master/Data/Data%20from%20openintro.org/Ch%201%20Exercise%20Data/heartTr.csv
## SHA-1 hash of the downloaded data file is:
## de9cf481e5b32237f9bd9de886f31b7395b2a80b
table(heart$survived, heart$transplant)
##
##         control treatment
##   alive       4        24
##   dead       30        45
- The length of the alive bar on the “alive/dead” axis appears to be several times greater for treatment than for control. It appears that survival is not independent of treatment.
- The boxplots suggest a significantly different distribution of survival time for the treatment group vs. the control group. The IQR is not even discernible for control, but it runs from about 100 days to 700 days for treatment.
- About 88% (30/34) of the control group died, and about 65% (45/69) of the treatment group died.
- i) Heart transplants increase lifespan for people with grave heart conditions.
- 28, 75, 69, 34, 0, greater
- It was effective, as the difference in the proportion who died is about 0.23 (30/34 in control vs. 45/69 in treatment). Only 3 simulations showed a greater difference.
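
The proportions and the card-shuffling simulation described above can be sketched in R. The counts come from the table printed earlier; the seed, the number of shuffles (10,000), and the variable names are my own choices, so the exact simulated fraction will differ from the figure in the exercise.

```r
# Counts copied from the table above
dead  <- c(control = 30, treatment = 45)
alive <- c(control = 4,  treatment = 24)
n     <- dead + alive                      # group sizes: 34 and 69

# Observed proportion who died in each group, and the difference
prop_dead <- dead / n
obs_diff  <- prop_dead[["treatment"]] - prop_dead[["control"]]

# Randomization: shuffle the 103 outcome "cards" and deal them into
# groups of 69 (treatment) and 34 (control), many times over
set.seed(606)                              # arbitrary seed
outcomes <- rep(c("dead", "alive"), times = c(sum(dead), sum(alive)))
sim_diff <- replicate(10000, {
  shuffled <- sample(outcomes)
  treat    <- shuffled[seq_len(n[["treatment"]])]
  ctrl     <- shuffled[-seq_len(n[["treatment"]])]
  mean(treat == "dead") - mean(ctrl == "dead")
})

# Fraction of shuffles at least as extreme as the observed difference
# (small tolerance guards against floating-point ties)
p_value <- mean(sim_diff <= obs_diff + 1e-9)

round(prop_dead, 3)
round(obs_diff, 3)
p_value
```

A small `p_value` means a difference this large rarely arises from random assignment alone, which supports the conclusion that the transplant program was effective.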