DATA606

1.8 Smoking habits of UK residents.

Each row represents a single person and displays information about their gender, age, marital status, income, whether or not they smoke, and how much they smoke on weekends and on weekdays.
There are 1691 participants in this survey.
sex: Categorical, Nominal
age: Numerical, Continuous
marital: Categorical, Nominal
grossIncome: Categorical, Ordinal
smoke: Categorical, Nominal
amtWeekends: Numerical, Discrete
amtWeekdays: Numerical, Discrete

1.10 Cheaters, scope of inference.

Population: Children between the ages of 5 and 15
Sample: 160 children between the ages of 5 and 15
The sample is fairly large at 160 children, but whether we can generalize the findings to the population of children between the ages of 5 and 15 depends mainly on how the children were selected; for example, were they chosen randomly? If so, we could generalize to that population, although we should never extrapolate beyond it. Additionally, since the study is an experiment, the findings can be used to establish causal relationships.

1.28 Reading the paper.

To say that smoking causes dementia would be confusing correlation with causation. Smokers may be more likely to develop dementia, but this study does not show that smoking causes it. Other variables may explain the association; for example, smokers might also tend to have poor eating or drinking habits, which could increase a person's risk of developing dementia. From this study, all we can say is that people who smoke more are more likely to develop dementia later in life.
This statement cannot be justified for the same reason: it confuses correlation with causation. Perhaps children who have sleep disorders do bully more, but the statement loses the caution of the original study. The researchers stated that children “who had behavioral issues and those who were identified as bullies were twice as likely to have shown symptoms of sleep disorders”. The association is therefore with symptoms associated with sleep disorders, and not necessarily with having a sleep disorder. In any case, there is no evidence of causation here, and these symptoms may reflect other variables such as trouble at home or other medical issues.
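The smoking and dementia argument above turns on a confounding variable. As a rough sketch, the simulation below invents a hypothetical confounder (poor eating or drinking habits) that raises both the chance of smoking and the chance of dementia, while smoking itself has no effect by construction. Every probability here is an assumption chosen for illustration, not an estimate from the study.

```r
set.seed(606)
n <- 100000

# Hypothetical confounder: TRUE for people with poor eating/drinking habits (assumed 30%)
poor_habits <- runif(n) < 0.3

# Smoking depends only on the confounder (assumed probabilities)
smokes <- runif(n) < ifelse(poor_habits, 0.6, 0.2)

# Dementia also depends only on the confounder; smoking has no effect by construction
dementia <- runif(n) < ifelse(poor_habits, 0.15, 0.05)

# Ignoring the confounder, smokers still show a higher dementia rate
rate_smokers <- mean(dementia[smokes])
rate_nonsmokers <- mean(dementia[!smokes])
print(c(smokers = rate_smokers, nonsmokers = rate_nonsmokers))

# Within each level of the confounder, the gap essentially vanishes
by_group <- tapply(dementia, list(poor_habits = poor_habits, smokes = smokes), mean)
print(by_group)
```

The raw comparison makes smoking look harmful, but comparing smokers and non-smokers within each level of the confounder shows the association is explained by it, which is why an observational study alone cannot establish causation.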

1.36 Exercise and mental health.

Experiment
Treatment: Exercise twice a week. Control: No exercise.
Yes, the study uses blocking; the blocking variable is age.
No, the study is not blinded, because the subjects necessarily know whether they are in the treatment group (exercising).
Since this is an experiment that uses a stratified random sample, we can make a causal statement that can be generalized to the population.
My concern is the level of exercise being imposed on the people involved. What does “no exercise” entail: staying completely sedentary, or simply doing your normal daily tasks? Each person has a different activity level, and I don’t think having people not exercise at all makes a good control. Activity should instead be monitored, perhaps with activity logs or a pedometer.
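Blocking by age means randomizing separately within each age group, so that every age block contributes to both arms. The sketch below assumes a hypothetical roster of 12 participants in three made-up age blocks; the block boundaries and sizes are illustrative, not taken from the study.

```r
set.seed(606)

# Hypothetical roster: 12 participants split into three assumed age blocks
participants <- data.frame(
  id = 1:12,
  age_group = rep(c("18-30", "31-50", "51+"), each = 4),
  stringsAsFactors = FALSE
)

# Within each block, shuffle an equal number of treatment and control labels
participants$group <- NA_character_
for (g in unique(participants$age_group)) {
  rows <- which(participants$age_group == g)
  labels <- rep(c("exercise", "control"), length.out = length(rows))
  participants$group[rows] <- sample(labels)
}

# Each age block ends up with the same number of people in each arm
counts <- table(participants$age_group, participants$group)
print(counts)
```

Because assignment is random within each block, any age effect is balanced across the exercise and control groups rather than left to chance.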

1.48 Stats scores.

StatsScores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
boxplot(StatsScores)
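R's boxplot() draws its box from Tukey's five-number summary and flags points more than 1.5 times the hinge spread beyond the hinges as outliers. A short base-R check of the numbers behind the plot:

```r
StatsScores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(StatsScores)
print(five)   # 57.0 72.5 78.5 82.5 94.0

# Fences at 1.5 * (upper hinge - lower hinge) beyond the hinges
spread <- five[4] - five[2]
lower_fence <- five[2] - 1.5 * spread
upper_fence <- five[4] + 1.5 * spread
outliers <- StatsScores[StatsScores < lower_fence | StatsScores > upper_fence]
print(outliers)   # 57

# boxplot.stats() applies the same rule that boxplot() draws
print(boxplot.stats(StatsScores)$out)
```

The lowest score, 57, falls just below the lower fence of 57.5, so the box plot shows it as a separate point and the lower whisker stops at 66.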

1.50 Mix-and-match.

matches (2) – Normal Distribution
matches (3) – Uniform Distribution
matches (1) – Skewed Right

1.56 Distributions and appropriate statistics, Part II.

Very strongly skewed right; the median would best describe the data.

#Using made-up data to better visualize the question
Prices <- c(100,200,300,300,300,375,375,375,400,425,500,550,600,700,800,900,1000,2000,5000,7000,8000)
hist(Prices)
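Computing the center and spread of the made-up prices above shows why the median is preferred for strongly right-skewed data: the tail drags the mean upward, while the median and IQR barely react. The values are the same illustrative numbers, not real housing data.

```r
# Made-up prices from the answer above (illustrative, not real data)
Prices <- c(100,200,300,300,300,375,375,375,400,425,500,550,600,700,800,900,1000,2000,5000,7000,8000)

# The long right tail pulls the mean far above the median
center <- c(mean = mean(Prices), median = median(Prices))
print(round(center, 1))   # mean 1438.1, median 500.0

# Spread: the IQR ignores the tail, the standard deviation does not
spread <- c(IQR = IQR(Prices), sd = sd(Prices))
print(round(spread, 1))

# Making the most expensive house ten times pricier moves only the mean
Prices_extreme <- replace(Prices, which.max(Prices), 80000)
print(c(mean = mean(Prices_extreme), median = median(Prices_extreme)))
```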

Skewed right; the median and mode may both do an adequate job of describing the data.

#Using made-up data to better visualize the question
Prices <- c(100,200,300,300,300,375,375,375,400,425,500,550,600,700,800,900,1000,1000,1100,1200,1250)
hist(Prices)

Skewed left, with the mean to the left of the median; the median likely represents the data set better.
Skewed right, with the mean to the right of the median; the median represents the data set better.
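The rule used in the answers above, that the mean is pulled toward the longer tail, can be checked on small made-up vectors. The helper skew_direction is hypothetical and only compares the mean with the median; it is a rough heuristic, not a formal skewness measure.

```r
# Hypothetical helper: label skew by where the mean sits relative to the median
skew_direction <- function(x) {
  m <- mean(x)
  med <- median(x)
  if (m > med) "right" else if (m < med) "left" else "symmetric"
}

# Made-up data: most values small, a few large (long right tail)
right_tail <- c(1, 1, 2, 2, 2, 3, 3, 4, 10, 20)
# Mirror image of the same data (long left tail)
left_tail <- 25 - right_tail

print(c(mean = mean(right_tail), median = median(right_tail)))  # 4.8 and 2.5
print(c(mean = mean(left_tail), median = median(left_tail)))    # 20.2 and 22.5
print(skew_direction(right_tail))  # "right"
print(skew_direction(left_tail))   # "left"
```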

1.70 Heart Transplants.

Survival is not independent of whether or not the patient got a transplant: a larger proportion of patients who received the treatment were alive at the end of the study than of those who did not.
The box plots show that survival time is noticeably higher for patients who received a transplant. The typical survival time increases by about 200 days, and the longest survival time in the control group would be unremarkable in the treatment group.
The plot does not make these proportions easy to read. It is visually misleading because the columns have different widths, so it is difficult to tell which area is bigger.

The claim being tested is that a transplant program, in which gravely ill patients receive new hearts, increases patients’ life span. Under the null hypothesis, survival is independent of whether a patient received a transplant and any observed difference is due to chance; the alternative is that the transplant improves survival.
We write alive on (ten) cards representing patients who were alive at the end of the study, and dead on (ten) cards representing patients who were not. Then, we shuffle these cards and split them into two groups: one group of size (ten) representing treatment, and another group of size (ten) representing control. We calculate the difference between the proportion of dead cards in the treatment and control groups (treatment - control) and record this value. We repeat this 100 times to build a distribution centered at (zero). Lastly, we calculate the fraction of simulations where the simulated differences in proportions are (less than or equal to the observed difference). If this fraction is low, we conclude that it is unlikely to have observed such an outcome by chance and that the null hypothesis should be rejected in favor of the alternative.
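The card-shuffling procedure above can be sketched in base R. The card counts mirror the values written in the answer (ten of each); the observed difference of -0.4 is a hypothetical placeholder, since this write-up does not report the study's observed difference, so the real counts and observed value would need to be substituted before drawing any conclusion. The sketch counts simulated differences at or below the observed one, the usual one-sided p-value for this design.

```r
# Shuffle alive/dead cards into treatment and control groups and record
# the difference in the proportion of dead cards (treatment - control).
simulate_diffs <- function(n_alive, n_dead, n_treatment, n_control, reps = 100) {
  stopifnot(n_alive + n_dead == n_treatment + n_control)
  cards <- c(rep("alive", n_alive), rep("dead", n_dead))
  replicate(reps, {
    shuffled <- sample(cards)                      # shuffle the whole deck
    treatment <- shuffled[seq_len(n_treatment)]    # first cards: treatment group
    control <- shuffled[-seq_len(n_treatment)]     # remaining cards: control group
    mean(treatment == "dead") - mean(control == "dead")
  })
}

set.seed(606)
# Card counts as written in the answer; reps raised from 100 for a steadier estimate
sims <- simulate_diffs(n_alive = 10, n_dead = 10, n_treatment = 10, n_control = 10,
                       reps = 10000)

# Hypothetical observed difference: 3/10 dead in treatment vs 7/10 in control
observed_diff <- 3/10 - 7/10

# One-sided p-value: share of simulated differences at or below the observed one.
# A small tolerance guards against floating-point rounding in the proportions.
p_value <- mean(sims <= observed_diff + 1e-9)
print(p_value)
```

With these assumed inputs the estimate should land near the exact hypergeometric tail probability phyper(3, 10, 10, 10), about 0.089, because shuffling the cards simulates that distribution. Calling hist(sims) draws the simulated null distribution centered at zero.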
It appears to be skewed right, so the transplant program is a success.

DATA606_HW1

Michele Bradley

9/3/2017