Question 1
- Each row of the matrix represents a single person.
- The total number of participants is 1693, which was found using the line in the code chunk below.
- Variable types:
  - Sex: Categorical
  - Age: Numerical, Discrete
  - Marital Status: Categorical
  - Highest Qualification: Categorical, Ordinal*1
  - Nationality: Categorical
  - Ethnicity: Categorical
  - Gross Income: Categorical, Ordinal*2
  - Region: Categorical
  - Smoke?: Categorical
  - Amount Weekends: Numerical, Discrete
  - Amount Weekdays: Numerical, Discrete
  - Type: Categorical
*1: Highest qualification can be treated as ordinal because its levels can be ranked from lowest to highest, assuming every qualification fits into a single hierarchy.
*2: Income would be numerical and continuous if we were given exact amounts. Because it is instead recorded as a fixed set of income ranges, each person falls into exactly one bracket, which makes the variable categorical even though the brackets are described with numbers. The brackets have a natural order, so I classified it as ordinal.
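As a rough illustration of the row count and the type classification above, here is a minimal Python sketch. It is not the code chunk referred to in this answer. The three rows, the column names, and the income bracket labels are all hypothetical stand-ins, since the actual data file is not shown here.

```python
import csv
import io

# Hypothetical three-row excerpt. The real file, its column names,
# and its bracket labels are assumptions made for illustration only.
SAMPLE_CSV = """sex,age,gross_income,amt_weekends
Male,38,2600 to 5200,
Female,42,Under 2600,12
Male,40,5200 to 10400,
"""

def count_rows(csv_text):
    """Count data rows (one per participant), excluding the header."""
    return sum(1 for _ in csv.DictReader(io.StringIO(csv_text)))

def classify(values):
    """Crude guess: numerical if every non-empty value parses as a number."""
    non_empty = [v for v in values if v != ""]
    try:
        for v in non_empty:
            float(v)
    except ValueError:
        return "categorical"
    return "numerical"

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
types = {col: classify([r[col] for r in rows]) for col in rows[0]}

print(count_rows(SAMPLE_CSV))
print(types)
```

Under these assumptions, income stored as bracket labels is read as text and so comes out categorical, which matches the reasoning in note 2. Deciding that the brackets are ordinal still takes a human judgment about their order.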
Question 2
- The population of interest is children between the ages of 5 and 15 years old. The sample is the 160 children between ages 5 and 15 that were selected.
- A sample of 160 may be too small, and it is difficult to know whether the sample was unbiased and truly representative of all children ages 5 to 15. The study would allow researchers to comment on the sample and to generate hypotheses about the population as a whole, but generalizing the results to the population would require additional trials with similar groups to collect more accurate data. The findings could be used to establish a causal relationship between the explicit instruction against cheating and the outcomes the children reported, provided the children were randomly assigned to the two groups. The control group would be the group that was not told not to cheat, and it would be compared with the group that was told not to cheat. Under random assignment, any difference in the two groups’ rate of success could be attributed to the instruction.
Question 3
- The sample size is large, but this is a voluntary survey taken by members of a specific health plan, so the sample may not be unbiased. We can conclude that smoking was associated with a higher rate of dementia in this sample, but we cannot conclude that smoking caused the difference, because the participants were surveyed and not assigned to smoke or not smoke. Smoking may also cause other problems that worsen symptoms of dementia, so many questions remain before a causal conclusion can be drawn. Without further studies, it is not appropriate to claim that smoking causes dementia.
- This statement is only partially justified. If the friend had said, “The study shows that in this sample, sleep disorders led to bullying in school children,” the claim would be somewhat more defensible. Even that claim may be inaccurate, because it assumes that the sleep issues led to the bullying and not the other way around. The two variables might have no causal relationship at all. All we truly know is that the two variables were associated in the study. A more appropriate conclusion would be that the study showed that sleep disorders may be linked to bullying in school children. It is not appropriate to claim that the study proved the friend’s statement without knowing the sample size and other details about the sample. For example, the study could have shown an even stronger association between parents’ annual income and bullying in school. The only conclusions that can be drawn here are about relationships within the sample itself, and no extrapolation should be done without further details about the study.
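To illustrate how two variables can be associated without either one causing the other, here is a small simulation sketch in Python. Everything in it is assumed for illustration: the lurking variable, the noise model, and the variable names are hypothetical and do not come from either study.

```python
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(n=5000, seed=1):
    """Two scores driven by a shared lurking variable, with no direct link."""
    rng = random.Random(seed)
    lurking = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical third factor
    sleep_score = [z + rng.gauss(0, 1) for z in lurking]
    bullying_score = [z + rng.gauss(0, 1) for z in lurking]
    return pearson_r(sleep_score, bullying_score)

r = simulate()
print(round(r, 2))
```

In this setup neither score influences the other, yet the theoretical correlation is 0.5 because both depend on the same lurking variable. An observational study alone cannot distinguish this situation from a causal one.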
Question 4
- This study is an experiment. The researchers assign participants to a control group or a treatment group and then compare the outcomes.
- The control group is the group that does not exercise. The treatment group is the group that exercises twice per week.
- Blocking was used in this study because the participants were separated into three age groups, with a treatment and a control group within each. The blocking variable in this case is age. This may enable researchers to draw different conclusions for different age groups. One group may be unaffected by the exercise while another group is heavily affected.
- Blinding, in its simplest sense, was not used. The group that was exercising knew that they were exercising, and the group that was resting knew that they were resting. Blinding would be in use only if the participants did not know which group they were in.
- This study may not be valid for establishing causal relationships that apply to the population at large. A true control group would be instructed to live as they normally would, with no change in exercise habits, while the treatment group would be given a specific behavioral adjustment to test for changes in mental health. Here the control group does not exercise at all and the treatment group exercises only twice per week, so there is no true control group and not enough variation in the treatment. If the researcher is only interested in the impact of exercising two days per week, the treatment group may be valid. To make generalizations about the population, the study would need to include additional amounts of exercise and a true control group.
- I would have the same reservations I presented in part E. The researcher needs a true control group and multiple treatment groups that are told to exercise different amounts in order to assess the impact of exercise on mental health more appropriately.
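The blocking described in this question can be sketched as random assignment carried out separately inside each age group. The Python sketch below is only an illustration under stated assumptions: the block labels, the subject IDs, and the even split within each block are hypothetical and are not details taken from the study.

```python
import random

def blocked_assignment(subjects, seed=0):
    """Randomly split each block in half: treatment vs. control.

    subjects: list of (subject_id, block_label) pairs.
    Returns a dict mapping subject_id to "treatment" or "control".
    """
    rng = random.Random(seed)
    by_block = {}
    for subject_id, block in subjects:
        by_block.setdefault(block, []).append(subject_id)

    assignment = {}
    for ids in by_block.values():
        shuffled = ids[:]
        rng.shuffle(shuffled)  # randomize order within the block only
        half = len(shuffled) // 2
        for subject_id in shuffled[:half]:
            assignment[subject_id] = "treatment"  # exercises twice per week
        for subject_id in shuffled[half:]:
            assignment[subject_id] = "control"    # does not exercise
    return assignment

# Hypothetical roster: three age blocks with four subjects each.
blocks = ("younger", "middle", "older")
subjects = [(f"{block}-{i}", block) for block in blocks for i in range(4)]
assignment = blocked_assignment(subjects)
print(assignment)
```

Because the shuffle happens inside each block, every age group contributes equally to both arms, which is what lets the researcher compare the effect of exercise across age groups.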