Smoking habits of UK residents. A survey was conducted to study the smoking habits of UK residents. Below is a data matrix displaying a portion of the data collected in this survey. Note that “£” stands for British Pounds Sterling, “cig” stands for cigarettes, and “N/A” refers to a missing component of the data.
Smoking Habits
What does each row of the data matrix represent? Each row represents one case, the answer by one individual to the study’s survey.
How many participants were included in the survey? 1691
Indicate whether each variable in the study is numerical or categorical. If numerical, identify as continuous or discrete. If categorical, indicate if the variable is ordinal. Sex: categorical; Age: numerical, discrete; Marital: categorical; Gross Income: categorical, ordinal; Smoke: categorical; amtWeekends: numerical, discrete; amtWeekdays: numerical, discrete.
Cheaters, scope of inference. Exercise 1.5 introduces a study where researchers studying the relationship between honesty, age, and self-control conducted an experiment on 160 children between the ages of 5 and 15. The researchers asked each child to toss a fair coin in private and to record the outcome (white or black) on a paper sheet, and said they would only reward children who report white. Half the students were explicitly told not to cheat and the others were not given any explicit instructions. Differences were observed in the cheating rates in the instruction and no instruction groups, as well as some differences across children’s characteristics within each group.
Identify the population of interest and the sample in this study.
The population is all children between the ages of 5 and 15. The sample is the 160 children in that age range who took part in the study.
Comment on whether or not the results of the study can be generalized to the population, and if the findings of the study can be used to establish causal relationships. If the sample of 160 children was taken at random, then the results can be generalized to the population. A random sample reduces bias towards any particular group within the population’s age range. It isn’t a guarantee of no bias, but it is a good mitigation step. The results of this experiment do point to a causal relationship. The results are not completely presented, but they do state that there is a difference between the two groups, those given instructions and those not. So there is a causal relationship between giving instructions and cheating, although the direction of that relationship is not stated. The case also mentions differences within each group. Without knowing more about these differences, it is not possible to say whether they are due to any causal relationship.
Haters are gonna hate, study confirms. A study published in the Journal of Personality and Social Psychology asked a group of 200 randomly sampled men and women to evaluate how they felt about various subjects, such as camping, health care, architecture, taxidermy, crossword puzzles, and Japan in order to measure their dispositional attitude towards mostly independent stimuli. Then, they presented the participants with information about a new product: a microwave oven. This microwave oven does not exist, but the participants didn’t know this, and were given three positive and three negative fake reviews. People who reacted positively to the subjects on the dispositional attitude measurement also tended to react positively to the microwave oven, and those who reacted negatively also tended to react negatively to it. Researchers concluded that “some people tend to like things, whereas others tend to dislike things, and a more thorough understanding of this tendency will lead to a more thorough understanding of the psychology of attitudes.”
What are the cases?
The cases are the 200 randomly sampled men and women. For each case we have the responses to a questionnaire about how they feel about various subjects, and the response on how they feel about a microwave for which positive and negative reviews were given.
What is (are) the response variable(s) in this study? There is one response variable: how the individuals feel about the microwave for which positive and negative reviews were presented.
What is (are) the explanatory variable(s) in this study? The explanatory variables are the participants’ attitudes towards each of the subjects they were asked to evaluate.
Does the study employ random sampling? Yes, the case states that the women and men in the sample were chosen at random.
Is this an observational study or an experiment? Explain your reasoning. This is an experiment. Men and women were first asked a series of control questions. Then afterwards they were asked about a particular item. These questionnaires were provided in a controlled setting. They did not represent samples taken from previous observations.
Can we establish a causal link between the explanatory and response variables? No, not really. Although the study was run in a controlled setting, the participants’ attitudes towards the subjects were measured rather than assigned, so what we can show is an association between the variables. Feeling negatively about the different items doesn’t cause negative feelings about the microwave. The experiment could have been run interchanging the microwave with one of the other items and also providing good and bad reviews.
Can the results of the study be generalized to the population at large? Yes, because the women and men in the sample were chosen at random, they have a good chance of being representative of a larger population.
Reading the paper. Below are excerpts from two articles published in the NY Times.
An article titled Risks: Smokers Found More Prone to Dementia. Based on this study, can we conclude that smoking causes dementia later in life? Explain your reasoning. The article does not mention a control sample being part of the study. Without this, the study appears to be an observational study, not an experiment. If this is the case, we can only conclude association, not a causal relationship. The results only show the number of individuals with dementia who are smokers, but how many dementia cases are there among individuals who are not smokers? This would have to be considered and used in the analysis to be able to conclude causality.
Another article titled The School Bully Is Sleepy. A friend of yours who read the article says, “The study shows that sleep disorders lead to bullying in school children.” Is this statement justified? If not, how best can you describe the conclusion that can be drawn from this study? No, the statement is not justified. The study described is not an experiment but an observational study. There is certainly an association between sleep and behavior, but we cannot conclude that one causes the other.
Exercise and mental health. A researcher is interested in the effects of exercise on mental health and he proposes the following study: Use stratified random sampling to ensure representative proportions of 18-30, 31-40 and 41-55 year olds from the population. Next, randomly assign half the subjects from each age group to exercise twice a week, and instruct the rest not to exercise. Conduct a mental health exam at the beginning and at the end of the study, and compare the results.
What type of study is this?
Experiment
What are the treatment and control groups in this study?
Treatment: group asked to exercise twice a week. Control: group instructed not to exercise.
Does this study make use of blocking? If so, what is the blocking variable? Yes, as individuals are first assigned to age groups and then randomly assigned to treatment or control within each group. Age is the blocking variable.
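As a rough sketch of the assignment step described in the proposal, the code below randomly assigns half of each age block to exercise. The data frame `subjects`, its 60 rows, and its column names are hypothetical and only illustrate the design; the proposal does not give a sample size.

```r
set.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical sample: 20 subjects in each of the three age blocks
subjects <- data.frame(
  id = 1:60,
  age_group = rep(c("18-30", "31-40", "41-55"), each = 20)
)

# Randomize treatment separately inside each block
subjects$group <- NA_character_
for (block in unique(subjects$age_group)) {
  idx <- which(subjects$age_group == block)
  labels <- rep(c("exercise", "no exercise"), length.out = length(idx))
  subjects$group[idx] <- sample(labels)  # shuffle the labels within the block
}

# Each block should contribute equally to both groups
assignment <- table(subjects$age_group, subjects$group)
print(assignment)
```

Because the shuffle happens inside each block, age is balanced across the two groups by construction rather than by chance.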
Does this study make use of blinding?
No, because participants in both groups know what group they belong to. They know whether they are exercising or not.
Comment on whether or not the results of the study can be used to establish a causal relationship between exercise and mental health, and indicate whether or not the conclusions can be generalized to the population at large. Because this is an experiment with a control group, we can conclude there is a causal relationship. A shortcoming might be the lack of blinding: individuals knowing what group they were in might have affected the results. But other than that, the results would suggest causality. Stratifying the sample by age group helps with generalizing the results. There might still be some bias which hasn’t been thought of, but steps have been taken to be able to generalize the results.
Suppose you are given the task of determining if this proposed study should get funding. Would you have any reservations about the study proposal?
Stats scores. Below are the final exam scores of twenty introductory statistics students.
57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94
Create a box plot of the distribution of these scores
scores=c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
boxplot(scores)
summary(scores)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 57.00 72.75 78.50 77.70 82.25 94.00
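The quartiles in the summary can also be used to check for outliers. This is a minimal sketch, assuming the usual 1.5 × IQR rule; note that `boxplot()` itself draws its box from hinges, which can differ slightly from the `quantile()` values used here.

```r
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
            79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Quartiles as reported by summary()
q <- quantile(scores, c(0.25, 0.75))
iqr <- IQR(scores)

# Fences under the 1.5 x IQR rule
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[2]] + 1.5 * iqr

# Scores falling outside the fences
outliers <- scores[scores < lower_fence | scores > upper_fence]
print(c(lower = lower_fence, upper = upper_fence))
print(outliers)
```

The points that `boxplot()` actually flags can be read from `boxplot.stats(scores)$out`.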
Mix-and-match. Describe the distribution in the histograms below and match them to the box plots.
Mix-and-match
Distributions and appropriate statistics, Part II. For each of the following, state whether you expect the distribution to be symmetric, right skewed, or left skewed. Also specify whether the mean or median would best represent a typical observation in the data, and whether the variability of observations would be best represented using the standard deviation or IQR. Explain your reasoning.
Housing prices in a country where 25% of the houses cost below $350,000, 50% of the houses cost below $450,000, 75% of the houses cost below $1,000,000 and there are a meaningful number of houses that cost more than $6,000,000.
Distribution: right skewed Mean or Median: median Standard Deviation or IQR: IQR
Housing prices in a country where 25% of the houses cost below $300,000, 50% of the houses cost below $600,000, 75% of the houses cost below $900,000 and very few houses that cost more than $1,200,000.
Distribution: symmetric Mean or Median: median Standard Deviation or IQR: standard deviation
Distribution: symmetric Mean or Median: mean Standard Deviation or IQR: standard deviation
Distribution: right skewed Mean or Median: median Standard Deviation or IQR: IQR
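The preference for the median and IQR in skewed cases can be illustrated with a small simulation. The prices below are simulated from a log-normal distribution and are not real housing data; the parameters are assumptions chosen only to produce a right-skewed sample.

```r
set.seed(42)  # fixed seed so the illustration is reproducible

# Simulated right-skewed "house prices" (hypothetical, not survey data)
prices <- rlnorm(1000, meanlog = log(450000), sdlog = 0.8)

# Add a handful of very expensive houses to the sample
prices_extreme <- c(prices, rep(6e6, 10))

# How far each statistic moves when the extreme values are added
shift <- c(
  mean   = mean(prices_extreme) - mean(prices),
  median = median(prices_extreme) - median(prices),
  sd     = sd(prices_extreme) - sd(prices),
  IQR    = IQR(prices_extreme) - IQR(prices)
)
print(round(shift))
```

The mean and standard deviation move much more than the median and IQR, which is why the latter pair is the more robust description of a typical observation and its variability when the distribution has a long right tail.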
The Stanford University Heart Transplant Study was conducted to determine whether an experimental heart transplant program increased lifespan. Each patient entering the program was designated an official heart transplant candidate, meaning that he was gravely ill and would most likely benefit from a new heart. Some patients got a transplant and some did not. The variable transplant indicates which group the patients were in; patients in the treatment group got a transplant and those in the control group did not. Another variable called survived was used to indicate whether or not the patient was alive at the end of the study.
library(openintro)
data(heartTr)
Heart Transplants
Based on the mosaic plot, is survival independent of whether or not the patient got a transplant? Explain your reasoning.
No, it is not independent. The proportion of survivors is visibly higher among patients who received the treatment than in the control group.
What do the box plots below suggest about the efficacy (effectiveness) of the heart transplant treatment?
They suggest that the chances of survival increase with the treatment, although there is a large spread in the number of survival days. Even with that large IQR, the second quartile (median) in the treatment group is higher than the upper 1.5 × IQR limit in the control group. So survival days with treatment are mostly higher; only a few outliers in the control group show as many days as the treatment IQR, and only one falls in the treatment group’s upper quartile.
What proportion of patients in the treatment group and what proportion of patients in the control group died? The proportions are computed from the heartTr data rather than measured off the mosaic plot. Control:
dim(heartTr)
## [1] 103 8
nrow(subset(heartTr, transplant=='control' & survived=='dead'))/nrow(subset(heartTr, transplant=='control'))
## [1] 0.8823529
88.24% dead
nrow(subset(heartTr, transplant=='control' & survived=='alive'))/nrow(subset(heartTr, transplant=='control'))
## [1] 0.1176471
11.76% alive
Treatment:
nrow(subset(heartTr, transplant=='treatment' & survived=='dead'))/nrow(subset(heartTr, transplant=='treatment'))
## [1] 0.6521739
65.22% dead
nrow(subset(heartTr, transplant=='treatment' & survived=='alive'))/nrow(subset(heartTr, transplant=='treatment'))
## [1] 0.3478261
34.78% alive
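All four proportions can also be read from a single two-way table. The sketch below does not load the data set; it rebuilds the counts from the proportions above and the 103 rows reported by dim(heartTr) (30 of 34 control patients and 45 of 69 treatment patients dead), so the matrix `counts` is an assumption reconstructed from that output.

```r
# Counts reconstructed from the proportions above (assumed, not read from heartTr)
counts <- matrix(
  c(4, 30,
    24, 45),
  nrow = 2, byrow = TRUE,
  dimnames = list(transplant = c("control", "treatment"),
                  survived = c("alive", "dead"))
)

# Row proportions: share alive and dead within each group
props <- prop.table(counts, margin = 1)
print(round(props, 4))
```

With the data loaded, the same counts should come from table(heartTr$transplant, heartTr$survived).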
One approach for investigating whether or not the treatment is effective is to use a randomization technique.
What are the claims being tested?
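The null hypothesis is that survival is independent of the treatment, so any difference between the groups is due to chance. The alternative is that the treatment has a causal relationship with the number of people alive, survival days or time. By randomly reshuffling the outcomes between the groups, we can see how large a difference chance alone would produce.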
The paragraph below describes the set up for such approach, if we were to do it without using statistical software. Fill in the blanks with a number or phrase, whichever is appropriate.
We write alive on ____28____ cards representing patients who were alive at the end of the study, and dead on ____75____ cards representing patients who were not. Then, we shuffle these cards and split them into two groups: one group of size ____69____ representing treatment, and another group of size ____34____ representing control. We calculate the difference between the proportion of dead cards in the treatment and control groups (treatment - control) and record this value. We repeat this 100 times to build a distribution centered at ____0____. Lastly, we calculate the fraction of simulations where the simulated differences in proportions are ____less than or equal to 0.6522 - 0.8824 = -0.2302____. If this fraction is low, we conclude that it is unlikely to have observed such an outcome by chance and that the null hypothesis should be rejected in favor of the alternative.
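The card-shuffling procedure can be sketched in R. The counts used here (28 alive, 75 dead, 69 treatment, 34 control) are assumptions reconstructed from the proportions computed earlier and the 103 rows of heartTr, and the 10,000 repetitions are a choice made for a smoother distribution rather than the 100 described in the text.

```r
set.seed(2024)  # fixed seed so the illustration is reproducible

# One "card" per patient, labelled with the outcome (assumed counts)
outcomes <- c(rep("alive", 28), rep("dead", 75))
n_treatment <- 69  # the remaining 34 cards form the control group

# Observed difference in death proportions (treatment - control)
observed_diff <- 45 / 69 - 30 / 34

# Shuffle the cards, deal them into two groups, and compute the difference
simulate_diff <- function() {
  shuffled <- sample(outcomes)
  treatment <- shuffled[seq_len(n_treatment)]
  control <- shuffled[-seq_len(n_treatment)]
  mean(treatment == "dead") - mean(control == "dead")
}

sim_diffs <- replicate(10000, simulate_diff())

# Fraction of shuffles at least as extreme as the observed difference
# (small tolerance guards against floating-point ties)
p_value <- mean(sim_diffs <= observed_diff + 1e-12)
print(mean(sim_diffs))
print(p_value)
```

The simulated differences center near 0 because shuffling breaks any link between group and outcome, which is what the null hypothesis assumes.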
What do the simulation results shown below suggest about the effectiveness of the transplant program?
It shows it is effective. It shows that it is hard to get the observed result by chance alone. The results of the study show a difference between the treatment and control death proportions of -0.23:
nrow(subset(heartTr, transplant=='treatment' & survived=='dead'))/nrow(subset(heartTr, transplant=='treatment'))-nrow(subset(heartTr, transplant=='control' & survived=='dead'))/nrow(subset(heartTr, transplant=='control'))
## [1] -0.230179
The simulation shows a minimum difference of only about -0.25, and only two simulated differences below -0.23. This means that to see a difference in proportions of -0.23, something other than chance has to be at play. We conclude that the null hypothesis, the chance-only status quo, is not plausible, since the probability of getting the observed result under it is too small. So we reject the null in favor of the alternative: the treatment does affect survival rates, and thus the transplant program is effective.
Simulated Differences