Each row is an observation, in this case an individual resident.
There are a total of 1691 participants.
The population of interest appears to be children aged 5 to 15. The sample is simply the 160 children who took part in the study.
The results could be generalized to the population, assuming the participants in the sample were chosen randomly and generally represent the population at large. As for causality, the findings could indeed be used to establish such a relationship.
I don’t think we can conclude anything from the brief excerpt. We would need more information to reach that conclusion, such as how the study was conducted, who the participants were, and how the “…adjusting for other factors…” was done.
No, that statement is not justified by the brief excerpt. Some other variable that was not controlled for could affect both sleep and behavior. At best this shows a correlation between bullying behavior and trouble sleeping; no causality should be inferred.
This is an experiment.
The treatment group are those who were instructed to exercise twice a week. The control group are those who were instructed not to exercise.
No, this study doesn’t seem to make use of blocking.
No, not according to the text.
No, I don’t believe that the study, as stated, can be used to establish a causal relationship between exercise and mental health. This is mostly because there don’t seem to be any controls in place for confounding variables. For example, if some of the participants have a job requiring a good deal of manual labor, that may influence the results depending on which group they were placed in. This is also why I don’t feel that the results can be generalized to the population at large.
In its current form, yes, I would have reservations. I would require that their approach be refined to control for more variables.
Below are the final exam scores of twenty introductory statistics students.
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
Create a box plot of the distribution of these scores. The five number summary provided below may be useful.
boxplot(scores, main = "Intro to Statistics - Final Exams", ylab = "Score")
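The five-number summary mentioned in the prompt is not reproduced here, so the sketch below recomputes it from the scores. It assumes base R only; the variable names `five_num` and `outliers` are illustrative and not part of the original answer.

```r
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
            79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# fivenum() returns Tukey's five-number summary:
# minimum, lower hinge, median, upper hinge, maximum
five_num <- fivenum(scores)
names(five_num) <- c("min", "q1", "median", "q3", "max")
print(five_num)

# boxplot.stats() applies the same 1.5 * IQR rule that boxplot()
# uses to decide which points are drawn beyond the whiskers
outliers <- boxplot.stats(scores)$out
print(outliers)
```

With these scores the hinges are 72.5 and 82.5, so the lower fence is 72.5 - 1.5 × 10 = 57.5 and the lowest score (57) is drawn as an outlier point on the box plot.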
I would expect this distribution to be right (positively) skewed. The IQR here would be 650,000, which puts the upper outlier cutoff (Q3 + 1.5 × IQR) at 1,975,000. Since there are “…a meaningful number of houses that cost more than $6,000,000”, the median would be more representative of a typical observation, and the IQR would be more appropriate to describe variability.
This seems to be the opposite of (a) above. The IQR is 600,000, which puts the upper outlier cutoff (Q3 + 1.5 × IQR) at $1.8M. Since few houses are even above $1.2M, the data appear to have no real positive skew. Thus, the mean would be a good measure of a “typical” house and the standard deviation would be a good measure of variability.
Since, in this example, few college students drink alcohol, most observations would be zero. Thus, there may be a positive skew. However, if “most” is taken to mean more than 50% of the students, then the median value would be 0, which is a bit misleading (it makes it sound like college students do not drink at all). Since only a “few” drink excessively, the positive skew shouldn’t be too bad. I would probably use the mean as a “typical” value and the standard deviation as a measure of variability (the large number of zeros will make this small, which reflects the stated “most of these students don’t drink”).
Similarly to (a), this distribution would have a positive skew. Because the executives are likely earning salaries an order of magnitude larger (or more!), the skew may be more pronounced. Thus, I would lean towards using median as a measure of the typical salary. I’d also tend to use IQR as a measure of variability.
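The outlier cutoffs in the two housing-price answers follow the usual 1.5 × IQR rule, which can be checked with a few lines of R. This is only a sketch: the quartile values are assumptions taken from the exercise statement (which is not reproduced here), namely Q1 = 350,000 and Q3 = 1,000,000 for (a), and Q1 = 300,000 and Q3 = 900,000 for (b). The helper name `upper_fence` is hypothetical.

```r
# Upper outlier cutoff under the 1.5 * IQR rule: Q3 + 1.5 * (Q3 - Q1)
upper_fence <- function(q1, q3) {
  q3 + 1.5 * (q3 - q1)
}

# Quartiles below are assumed from the exercise statement, not from this document
fence_a <- upper_fence(q1 = 350000, q3 = 1000000)  # part (a)
fence_b <- upper_fence(q1 = 300000, q3 = 900000)   # part (b)

print(c(a = fence_a, b = fence_b))
```

Under those assumed quartiles, houses above $6,000,000 in (a) sit far beyond the cutoff, while in (b) even the $1,200,000 mark falls inside it.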
The Stanford University Heart Transplant Study was conducted to determine whether an experimental heart transplant program increased lifespan. Each patient entering the program was designated an official heart transplant candidate, meaning that he was gravely ill and would most likely benefit from a new heart. Some patients got a transplant and some did not. The variable transplant indicates which group the patients were in; patients in the treatment group got a transplant and those in the control group did not. Another variable called survived was used to indicate whether or not the patient was alive at the end of the study.
It does not appear that survival is independent of whether the patient had a transplant. The proportion surviving among those who had a transplant was more than double the proportion among those who did not, so it appears that transplants had a positive impact on survival.
There is a significant increase in length of survival for those who had the transplant. The median survival time of the transplant group is several times higher than that of the control group. The middle 50% of the transplant group’s data (the box) lies considerably higher than the middle 50% of the control group’s.
| | alive | dead |
|---|---|---|
| control | 0.118 | 0.882 |
| treatment | 0.348 | 0.652 |
It appears that about 88% of the control group died by the end of the study, whereas only about 65% of the treatment group died.
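The row proportions above can be reproduced from a table of counts. The counts in this sketch are an assumption: they are back-calculated from the reported proportions and the group sizes of 34 (control) and 69 (treatment), not read from the original data set.

```r
# Assumed counts, back-calculated from the proportions and group sizes
counts <- matrix(
  c(4, 30,
    24, 45),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("control", "treatment"),
                  outcome = c("alive", "dead"))
)

# margin = 1 divides each cell by its row total,
# giving the proportion alive/dead within each group
row_props <- round(prop.table(counts, margin = 1), 3)
print(row_props)
```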
What are the claims being tested? The null claim is that survival is independent of whether the patient received a transplant. The alternative claim is that the experimental transplant treatment extends the lifespan of patients.
The paragraph below describes the setup for such an approach, if we were to do it without using statistical software. Fill in the blanks with a number or phrase, whichever is appropriate.
We write alive on 28 cards representing patients who were alive at the end of the study, and dead on 75 cards representing patients who were not. Then, we shuffle these cards and split them into two groups: one group of size 69 representing treatment, and another group of size 34 representing control. We calculate the difference between the proportion of dead cards in the treatment and control groups (treatment - control) and record this value. We repeat this 100 times to build a distribution centered at 0. Lastly, we calculate the fraction of simulations where the simulated differences in proportions are equal to or less than what we observed in our experiment (0.652 - 0.882 = -0.230), that is, at least as favorable to the treatment. If this fraction is low, we conclude that it is unlikely to have observed such an outcome by chance and that the null hypothesis should be rejected in favor of the alternative.
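The card-shuffling procedure can also be sketched in R. This is an illustration rather than the study's actual analysis: it assumes 28 alive and 103 - 28 = 75 dead cards (the group sizes 69 and 34 sum to 103), dead counts of 45 (treatment) and 30 (control) back-calculated from the proportions table, and an arbitrary seed. The helper name `simulate_diff` is hypothetical.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# One "card" per patient: 28 alive, 75 dead (103 in total)
outcomes <- c(rep("alive", 28), rep("dead", 75))
n_treatment <- 69  # the remaining 34 cards form the control group

# Observed difference in proportion dead (treatment - control),
# using assumed counts of 45/69 and 30/34
observed_diff <- 45 / 69 - 30 / 34

# Shuffle the cards, deal them into two groups, and return the
# difference in the proportion of dead cards (treatment - control)
simulate_diff <- function() {
  shuffled <- sample(outcomes)
  treatment <- shuffled[1:n_treatment]
  control <- shuffled[(n_treatment + 1):length(shuffled)]
  mean(treatment == "dead") - mean(control == "dead")
}

sim_diffs <- replicate(100, simulate_diff())

# Fraction of shuffles at least as favorable to the treatment as the
# observed result (a difference equal to or less than the observed one)
p_value <- mean(sim_diffs <= observed_diff)
print(p_value)
```

A small value of `p_value` suggests the observed difference would rarely arise from shuffling alone. With only 100 repetitions the estimate is coarse, so more repetitions would give a steadier fraction.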