CHAPTER 1 HOMEWORK - 1.8, 1.10, 1.28, 1.36, 1.48, 1.50, 1.56, 1.70
What does each row of the data matrix represent? A SINGLE PARTICIPANT
How many participants were included in the survey? 1691 PARTICIPANTS
Indicate whether each variable in the study is numerical or categorical. If numerical, identify as continuous or discrete. If categorical, indicate if the variable is ordinal.
sex = CATEGORICAL_NOMINAL
age = NUMERICAL_CONTINUOUS
marital = CATEGORICAL_NOMINAL
grossIncome = CATEGORICAL_ORDINAL
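smoke = CATEGORICAL_NOMINAL
amtWeekends = NUMERICAL_DISCRETE (NUMBER OF CIGARETTES SMOKED PER DAY ON WEEKENDS)
amtWeekdays = NUMERICAL_DISCRETE (NUMBER OF CIGARETTES SMOKED PER DAY ON WEEKDAYS)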
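As a minimal R sketch, using made-up values rather than rows from the survey, the nominal/ordinal distinction maps onto R's unordered and ordered factors. Only an ordered factor supports comparisons such as "high" > "mid". The income labels below are hypothetical placeholders, not the survey's actual income bands.

```r
# Made-up values for illustration only; these are not rows from the survey.
sex <- factor(c("Male", "Female", "Female"))        # categorical, nominal
grossIncome <- factor(c("low", "high", "mid"),      # hypothetical income bands
                      levels = c("low", "mid", "high"),
                      ordered = TRUE)               # categorical, ordinal
age <- c(38.5, 42.0, 40.25)                         # numerical, continuous

is.ordered(sex)                    # FALSE: no natural order among the levels
is.ordered(grossIncome)            # TRUE: the levels have a meaningful order
grossIncome[2] > grossIncome[3]    # TRUE: "high" ranks above "mid"
```

Comparing two values of the unordered factor `sex` with `>` returns NA with a warning, which is R's way of saying the comparison is not meaningful for nominal data.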
Identify the population of interest and the sample in this study
POPULATION OF INTEREST = ALL CHILDREN AGE 5-15
SAMPLE = 160 CHILDREN AGE 5-15
Comment on whether or not the results of the study can be generalized to the population, and if the findings of the study can be used to establish a causal relationship. INFORMATION ABOUT HOW THE SAMPLE WAS SELECTED IS REQUIRED TO DETERMINE WHETHER THE RESULTS CAN BE GENERALIZED: ONLY IF THE 160 CHILDREN WERE A RANDOM SAMPLE OF ALL CHILDREN AGED 5-15 CAN THE RESULTS BE GENERALIZED TO THAT POPULATION. THE FINDINGS CAN BE USED TO ESTABLISH A CAUSAL RELATIONSHIP BECAUSE THE STUDY WAS AN EXPERIMENT, PROVIDED THE CHILDREN WERE RANDOMLY ASSIGNED TO THE TREATMENT CONDITIONS.
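Random assignment is what supports a causal conclusion in an experiment. Below is a minimal R sketch of randomly splitting 160 children into two equal groups. The condition names are hypothetical placeholders, and equal group sizes are an assumption made only for illustration.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible

n_children <- 160
# Hypothetical condition labels; assumes two equal-sized conditions.
conditions <- rep(c("condition A", "condition B"), each = n_children / 2)

# Shuffling the labels assigns each child to a condition completely at random.
groups <- sample(conditions)
table(groups)  # 80 children in each condition
```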
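An article titled "Risks: Smokers Found More Prone to Dementia." Based on this study, can we conclude that smoking causes dementia later in life? Explain your reasoning. NO. THIS IS AN OBSERVATIONAL STUDY, SO IT CAN ONLY SHOW AN ASSOCIATION BETWEEN SMOKING AND DEMENTIA, NOT CAUSATION. CONFOUNDING VARIABLES (FOR EXAMPLE, OTHER LIFESTYLE OR HEALTH FACTORS THAT DIFFER BETWEEN SMOKERS AND NONSMOKERS) COULD EXPLAIN THE RELATIONSHIP. THE FACT THAT 25% OF THE SAMPLE LATER DEVELOPED DEMENTIA DOES NOT BY ITSELF SAY ANYTHING ABOUT CAUSATION.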
Another article titled "The School Bully Is Sleepy." A friend of yours who read the article says, "The study shows that sleep disorders lead to bullying in school children." Is this statement justified? If not, how best can you describe the conclusion that can be drawn from this study? NO, THE STATEMENT IS NOT JUSTIFIED. THIS IS AN OBSERVATIONAL STUDY, SO IT CANNOT SHOW THAT SLEEP DISORDERS CAUSE BULLYING. A BETTER CONCLUSION IS THAT THERE IS AN ASSOCIATION BETWEEN SLEEP DISORDERS AND BULLYING (BEHAVIORAL ISSUES) IN SCHOOL CHILDREN.
What type of study is this? THIS IS AN EXPERIMENTAL STUDY
What are the treatment and control groups in this study? CONTROL GROUP = THE SUBJECTS INSTRUCTED NOT TO EXERCISE
TREATMENT GROUP = THE SUBJECTS INSTRUCTED TO EXERCISE
Does this study make use of blocking? If so, what is the blocking variable? YES. THE BLOCKING VARIABLE IS AGE GROUP. BECAUSE AGE MAY ALSO AFFECT MENTAL HEALTH, SUBJECTS ARE FIRST GROUPED BY AGE AND THEN RANDOMLY ASSIGNED TO TREATMENT OR CONTROL WITHIN EACH AGE GROUP.
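A minimal R sketch of blocked random assignment, assuming three age groups of 10 participants each (the group labels and sizes are hypothetical). The randomization happens separately inside each age block, so every block contributes equally to the exercise and control arms.

```r
set.seed(2)  # arbitrary seed for reproducibility

# Hypothetical participants: 3 age blocks with 10 people each (made-up sizes).
participants <- data.frame(
  id = 1:30,
  age_group = rep(c("age group 1", "age group 2", "age group 3"), each = 10)
)

participants$arm <- NA_character_
for (g in unique(participants$age_group)) {
  rows <- which(participants$age_group == g)
  # Within this block, shuffle an equal number of "exercise" and "control" labels.
  participants$arm[rows] <- sample(rep(c("exercise", "control"),
                                       length.out = length(rows)))
}

table(participants$age_group, participants$arm)  # 5 per arm in every block
```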
Does this study make use of blinding? NO. PARTICIPANTS KNOW WHETHER OR NOT THEY ARE EXERCISING, SO THEY ARE AWARE OF WHICH GROUP THEY ARE IN.
Comment on whether or not the results of the study can be used to establish a causal relationship between exercise and mental health, and indicate whether or not the conclusions can be generalized to the population at large. THE RESULTS CAN BE USED TO ESTABLISH A CAUSAL RELATIONSHIP BETWEEN EXERCISE AND MENTAL HEALTH BECAUSE THIS IS AN EXPERIMENT IN WHICH SUBJECTS ARE RANDOMLY ASSIGNED TO TREATMENT OR CONTROL. THE RESULTS MAY ALSO BE GENERALIZED TO THE POPULATION IN THE REPRESENTED AGE GROUPS BECAUSE PARTICIPANTS WERE SELECTED USING STRATIFIED RANDOM SAMPLING.
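Suppose you are given the task of determining if this proposed study should get funding. Would you have any reservations about the study proposal? I WOULD HAVE TWO PRIMARY CONCERNS. (1) THE EXERCISE SHOULD BE ADMINISTERED OR SUPERVISED BY THE RESEARCHERS AND STANDARDIZED (SAME TYPE, DURATION, AND INTENSITY) SO ALL PARTICIPANTS IN THE TREATMENT GROUP ARE TREATED THE SAME WAY; OTHERWISE IT IS HARD TO KNOW WHETHER PARTICIPANTS IN EITHER GROUP FOLLOWED THEIR INSTRUCTIONS. (2) MANY FACTORS THAT MAY AFFECT MENTAL HEALTH AS MUCH AS EXERCISE, SUCH AS STRESS LEVELS, DIET, AND SOCIAL ACTIVITY, ARE NOT MEASURED. RANDOM ASSIGNMENT SHOULD BALANCE THESE FACTORS ACROSS GROUPS ON AVERAGE, BUT THEY ADD VARIABILITY THAT CAN MAKE AN EFFECT OF EXERCISE HARDER TO DETECT.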
Scores: 57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94. Create a box plot of the distribution of these scores. The five-number summary may be useful: MIN 57, Q1 72.5, MEDIAN 78.5, Q3 82.5, MAX 94.
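# Store the 20 scores and draw a box plot; R marks points beyond 1.5 * IQR from the hinges as outliers
SCORES <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
boxplot(SCORES, ylab = "Score", main = "Distribution of scores")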
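The five-number summary and the outlier rule that R's boxplot() applies can be checked directly with base R's fivenum() and boxplot.stats(). With IQR = 82.5 - 72.5 = 10, the lower fence is 72.5 - 1.5 * 10 = 57.5, so 57 should be drawn as a separate point and the lower whisker should stop at 66.

```r
SCORES <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

fivenum(SCORES)       # 57.0 72.5 78.5 82.5 94.0 (min, lower hinge, median, upper hinge, max)

bp <- boxplot.stats(SCORES)
bp$stats              # 66.0 72.5 78.5 82.5 94.0 (whisker ends and hinges actually drawn)
bp$out                # 57, below the lower fence 72.5 - 1.5 * 10 = 57.5
```

Note that quantile(SCORES) uses a different default definition (type 7) and reports Q1 = 72.75 and Q3 = 82.25, so its values differ slightly from the hinges the box plot is built from.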
Describe the distribution in the histograms below and match them to the box plots.
Housing prices in a country where 25% of the houses cost below $350,000, 50% of the houses cost below $450,000, 75% of the houses cost below $1,000,000 and there are a meaningful number of houses that cost more than $6,000,000. RIGHT SKEWED (THE MEDIAN-TO-Q3 DISTANCE IS MUCH LARGER THAN THE Q1-TO-MEDIAN DISTANCE). THE MEDIAN WOULD BEST REPRESENT A TYPICAL OBSERVATION AND THE IQR WOULD BEST REPRESENT VARIABILITY, SINCE BOTH ARE ROBUST TO THE VERY EXPENSIVE HOUSES.
Housing prices in a country where 25% of the houses cost below $300,000, 50% of the houses cost below $600,000, 75% of the houses cost below $900,000 and very few houses that cost more than $1,200,000. SYMMETRIC (Q1 TO MEDIAN AND MEDIAN TO Q3 ARE BOTH $300,000). THE MEAN WOULD BEST REPRESENT A TYPICAL OBSERVATION AND THE STANDARD DEVIATION WOULD BEST REPRESENT VARIABILITY.
Number of alcoholic drinks consumed by college students in a given week. Assume that most of these students don't drink since they are under 21 years old, and only a few drink excessively. RIGHT SKEWED (MOST VALUES ARE AT OR NEAR 0, WITH A LONG RIGHT TAIL OF A FEW HEAVY DRINKERS). THE MEDIAN WOULD BEST REPRESENT A TYPICAL OBSERVATION AND THE IQR WOULD BEST REPRESENT VARIABILITY.
Annual salaries of the employees at a Fortune 500 company where only a few high level executives earn much higher salaries than all other employees. RIGHT SKEWED (THE FEW VERY HIGH EXECUTIVE SALARIES FORM A LONG RIGHT TAIL). THE MEDIAN WOULD BEST REPRESENT A TYPICAL OBSERVATION AND THE IQR WOULD BEST REPRESENT VARIABILITY.
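One way to check the direction of skew from quartiles alone is Bowley's quartile skewness coefficient, (Q3 + Q1 - 2 * Q2) / (Q3 - Q1): it is 0 when the median sits halfway between the quartiles, positive for right skew, and negative for left skew. This R sketch applies it to the two housing-price examples; the helper name bowley_skew is hypothetical. Because it only looks at the middle 50% of the data, it ignores tail information such as the houses above $6,000,000.

```r
# Bowley (quartile) skewness coefficient; bowley_skew is a hypothetical helper name.
bowley_skew <- function(q1, q2, q3) {
  (q3 + q1 - 2 * q2) / (q3 - q1)
}

bowley_skew(350000, 450000, 1000000)  # about 0.69: right skewed
bowley_skew(300000, 600000, 900000)   # 0: symmetric quartiles
```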
Based on the mosaic plot, is survival independent of whether or not the patient got a transplant? Explain your reasoning. IT IS NOT INDEPENDENT. THE PROPORTION OF PATIENTS WHO DIED IS NOTICEABLY LOWER IN THE TREATMENT GROUP (PATIENTS WHO RECEIVED A TRANSPLANT) THAN IN THE CONTROL GROUP (PATIENTS WHO DID NOT).
What do the box plots below suggest about the efficacy (effectiveness) of the heart transplant treatment? THE BOX PLOTS SUGGEST THAT THE TREATMENT WAS EFFECTIVE: PATIENTS IN THE TREATMENT GROUP TENDED TO SURVIVE LONGER THAN PATIENTS IN THE CONTROL GROUP.
What proportion of patients in the treatment group and what proportion of patients in the control group died? CONTROL GROUP 30/34 (ABOUT 0.88), TREATMENT GROUP 45/69 (ABOUT 0.65)
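The same arithmetic in R, including the difference in death proportions (treatment - control) that the randomization test compares against:

```r
died_control   <- 30; n_control   <- 34
died_treatment <- 45; n_treatment <- 69

p_control   <- died_control / n_control      # about 0.882
p_treatment <- died_treatment / n_treatment  # about 0.652
p_treatment - p_control                      # about -0.230
```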
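What are the claims being tested? Is the treatment effective? H0 (NULL): SURVIVAL IS INDEPENDENT OF WHETHER THE PATIENT RECEIVED A TRANSPLANT; THE TREATMENT IS NOT EFFECTIVE AND THE OBSERVED DIFFERENCE IS DUE TO CHANCE. HA (ALTERNATIVE): SURVIVAL IS NOT INDEPENDENT OF TREATMENT; THE TREATMENT IS EFFECTIVE, SO A LOWER PROPORTION OF TRANSPLANT PATIENTS DIE.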
The paragraph below describes the setup for such an approach, if we were to do it without using statistical software. Fill in the blanks with a number or phrase, whichever is appropriate.
We write alive on 28 cards representing patients who were alive at the end of the study, and dead on 75 cards representing patients who were not. Then, we shuffle these cards and split them into two groups: one group of size 69 representing treatment, and another group of size 34 representing control. We calculate the difference between the proportion of dead cards in the treatment and control groups (treatment - control) and record this value. We repeat this 100 times to build a distribution centered at 0. Lastly, we calculate the fraction of simulations where the simulated differences in proportions are -0.23 or lower (at least as low as the observed difference of 45/69 - 30/34). If this fraction is low, we conclude that it is unlikely to have observed such an outcome by chance and that the null hypothesis should be rejected in favor of the alternative.
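A minimal R sketch of the card-shuffling simulation described above, using base R's sample() and replicate(). The seed is arbitrary, and with only 100 repetitions the estimated p-value will vary from run to run; increasing n_sim gives a more stable estimate.

```r
set.seed(3)  # arbitrary seed for reproducibility

cards <- c(rep("alive", 28), rep("dead", 75))  # 103 patients in total
n_treatment <- 69
n_control   <- 34
observed_diff <- 45 / 69 - 30 / 34             # about -0.23 (treatment - control)

simulate_once <- function() {
  shuffled  <- sample(cards)                                   # shuffle the cards
  treatment <- shuffled[1:n_treatment]                         # first 69 = treatment
  control   <- shuffled[(n_treatment + 1):length(shuffled)]    # remaining 34 = control
  mean(treatment == "dead") - mean(control == "dead")          # difference in proportion dead
}

n_sim <- 100                                   # repetitions, matching the text
sim_diffs <- replicate(n_sim, simulate_once())

mean(sim_diffs)                                # close to 0, as expected under the null
p_value <- mean(sim_diffs <= observed_diff)    # fraction at least as low as observed
p_value
```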
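Simulated differences as low as the observed -0.23 are unusual when the null hypothesis is true, so we have evidence against the null hypothesis: the data suggest the transplant treatment is effective.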