Q1

We will continue working with the dataset “employee_data.sav” that was collected to assess a number of educational and demographic characteristics and their relationship to salary. For this homework, we will look at variables “salary” (current salary, the outcome), “gender” (male or female), and “educ_cat” (educational categories). Carry out analyses to evaluate the effects of educational category (high school, some or full BA, and above BA) and gender on salary. State your research questions and hypotheses. Notice that since there are 2 IVs, you can ask more than 1 research question; be explicit about all research questions and hypotheses. What statistical test is appropriate for answering these research questions? Make sure to evaluate test assumptions, summarize relevant descriptive statistics and test results. Summarize and interpret your findings as you would have done for a brief scientific paper. There is no need to make the response long; however, make sure to include all necessary and relevant information.

Hypothesis

\(H1_{0}:\) There is no difference in salary between people of different genders.
\(H1_{a}:\) There is a difference in salary between people of different genders.
\(H2_{0}:\) There is no difference in salary between people with different levels of education.
\(H2_{a}:\) There is a difference in salary between people with different levels of education.
\(H3_0:\) There is no interaction between gender and educational level: the difference in salary between genders is the same at every level of education.
\(H3_a:\) There is an interaction between gender and educational level: the difference in salary between genders depends on the level of education.
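\[H_1: \hat{y}_{f} = \hat{y}_{m} \\ H_2: \hat{y}_{<=HS} = \hat{y}_{<=BD} = \hat{y}_{>=GS} \\ H_3: \hat{y}_{<=HS}^{m} - \hat{y}_{<=HS}^{f} = \hat{y}_{<=BD}^{m} - \hat{y}_{<=BD}^{f} = \hat{y}_{>=GS}^{m} - \hat{y}_{>=GS}^{f} \]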

From graphical examination, it appears that educational level will have the largest effect, followed by gender, and that there will likely also be a significant interaction between the two.

Assumptions

Homogeneity of Variance

Levene's Test for Log of Salary across Educational Level

        Df  F value   Pr(>F)
group    2   12.029  0.00001
       471

Levene's Test for Log of Salary across Gender

        Df  F value    Pr(>F)
group    1   31.781  <0.00001
       472

Levene's test rejects homogeneity of variance for both educational level and gender (p < .001 in each case), so the variance will not be pooled.
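
For readers who want to see what the Levene statistic measures, here is a minimal Python sketch. It assumes the median-centred variant (the source does not say which centring was used) and runs on made-up toy data, not on employee_data.sav. The function name `levene_f` is hypothetical. The statistic is a one-way ANOVA F computed on each observation's absolute deviation from its group median.

```python
from statistics import mean, median

def levene_f(groups):
    """Return (F, df_between, df_within) for a median-centred Levene test."""
    # Absolute deviation of every observation from its own group median.
    z = [[abs(x - median(g)) for x in g] for g in groups]
    n_total = sum(len(g) for g in z)
    k = len(z)
    grand = mean([v for g in z for v in g])
    # One-way ANOVA on the absolute deviations.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    ss_within = sum((v - mean(g)) ** 2 for g in z for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up toy data (an assumption for illustration only).
toy_groups = [[1, 2, 3], [2, 4, 6]]
f_stat, df1, df2 = levene_f(toy_groups)
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

A large F relative to the F distribution with those degrees of freedom indicates unequal spread across groups; the p-value itself is left out because the standard library has no F distribution.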

Normality

We know from previous tests that the response variable fails the Shapiro-Wilk test of normality; however, the deviation from normality is not so pronounced that ANOVA is inappropriate.

Two-Way ANOVA

               Sum Sq   Df  F value  Pr(>F)
educat         25.231    2  203.267  <0.001
gender          3.793    1   61.117  <0.001
educat:gender   0.426    2    3.431   0.033
Residuals      29.046  468

The two-way ANOVA presents evidence that gender has a significant effect on salary, F(1, 468) = 61.12, p < .001, as does educational level, F(2, 468) = 203.27, p < .001. The interaction between the two is also significant, although weaker, F(2, 468) = 3.43, p = .033.
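
As a quick check on the table, each F value is the term's mean square divided by the residual mean square. The sketch below recomputes them in Python from the rounded sums of squares reported above, so the last digit can differ slightly from the table.

```python
# Sums of squares and degrees of freedom copied from the two-way ANOVA table.
anova_rows = {
    "educat": (25.231, 2),
    "gender": (3.793, 1),
    "educat:gender": (0.426, 2),
}
ss_resid, df_resid = 29.046, 468

# Residual mean square: the common denominator of every F ratio.
ms_resid = ss_resid / df_resid

f_values = {}
for term, (ss, df) in anova_rows.items():
    f_values[term] = (ss / df) / ms_resid
    print(f"{term}: F({df}, {df_resid}) = {f_values[term]:.2f}")
```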

From the graphical examination, it would appear that the gender difference is more pronounced among those with a graduate education than among those with other levels of education. We can test this with contrasts.

Contrasts of gender for the Graduate school level of education:

## lm model parameter contrast
## 
##   Contrast      S.E.     Lower    Upper    t  df Pr(>|t|)
##  0.8233166 0.2516563 0.3288005 1.317833 3.27 468   0.0011

Contrasts of gender for the Bachelor's degree level of education:

## lm model parameter contrast
## 
##   Contrast       S.E.      Lower     Upper    t  df Pr(>|t|)
##  0.1683638 0.03986684 0.09002358 0.2467039 4.22 468        0

The evidence is actually stronger for the Bachelor’s degree level (t = 4.22) than for the graduate level (t = 3.27), despite the intuition derived from the graph. Notably, the graduate-level contrast is about five times larger (0.82 versus 0.17), yet its standard error is about six times larger (0.25 versus 0.04), so the larger estimate comes with the weaker test statistic. The gap in standard errors appears to be driven by the number of individuals in each category, shown in the counts below: only one woman in the sample has a graduate school education, compared with 57 women and 124 men at the Bachelor’s level.

##       
##                  f           m
##   <=HS 0.333333333 0.179324895
##   <=BD 0.120253165 0.261603376
##   >=GS 0.002109705 0.103375527
##       
##          f   m
##   <=HS 158  85
##   <=BD  57 124
##   >=GS   1  49
##      <=HS      <=BD      >=GS 
## 0.2068560 0.3222403 0.2934343
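
The role of cell size can be made concrete with a short Python sketch. It assumes the contrast standard errors come from the pooled residual variance of the ANOVA, in which case SE = sqrt(MS_resid * (1/n_f + 1/n_m)). Under that assumption the reported standard errors and t values are reproduced from the cell counts above.

```python
from math import sqrt

# Pooled residual variance from the ANOVA table (Residuals: SS = 29.046, df = 468).
ms_resid = 29.046 / 468

# Cell counts (women, men) from the count table, with the reported contrast estimates.
cells = {
    ">=GS": {"n_f": 1, "n_m": 49, "estimate": 0.8233166},
    "<=BD": {"n_f": 57, "n_m": 124, "estimate": 0.1683638},
}

results = {}
for level, c in cells.items():
    # Standard error of a difference between two cell means with pooled variance.
    se = sqrt(ms_resid * (1 / c["n_f"] + 1 / c["n_m"]))
    results[level] = (se, c["estimate"] / se)
    print(f"{level}: SE = {se:.4f}, t = {results[level][1]:.2f}")
```

With a single woman in the graduate cell, the term 1/n_f equals 1 and dominates the standard error, which is why the larger contrast yields the smaller t.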

Q2

  1. Write one to two examples of research questions that can be answered with each of the following statistical tests:
  • Independent-sample t-test
  • Paired-sample t-test
  • Single-sample t-test
  • One-way factorial ANOVA
  • Two-way factorial ANOVA
Make sure it is clear what your independent and dependent variables are, how they are measured, and what hypothesis you are testing.

Independent Sample t-test

Is there a significant difference in salary (DV) between those who are considered a minority (IV) and those who aren’t?

\(H_0: \hat{y}_0=\hat{y}_1\)
\(H_a: \hat{y}_0\neq\hat{y}_1\)

Paired-sample t-test

Is there a significant difference in salary (DV) between the time of hiring and the present (IV: time point, beginning versus current) for this sample?

\(H_0: \hat{y}_{begin}=\hat{y}_{current}\)
\(H_a: \hat{y}_{begin}\neq\hat{y}_{current}\)
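
A paired-sample t-test reduces to a single-sample test on the within-person differences. The Python sketch below illustrates this with made-up salaries; the numbers and the function name `paired_t` are assumptions for illustration, not values or code from the dataset.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, df) for a paired-sample t-test of after minus before."""
    # One difference per person; the test asks whether their mean is zero.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1

# Made-up salaries in thousands of dollars (hypothetical).
begin_salary = [10, 12, 14, 16]
current_salary = [13, 14, 18, 19]
t_stat, df = paired_t(begin_salary, current_salary)
print(f"t({df}) = {t_stat:.2f}")
```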

Single-sample t-test

Is the mean salary (DV) of individuals with a bachelor’s degree education in this sample different from the US national average? A single-sample test has no IV; the national average is a fixed comparison value.

\(H_0: \hat{y}_{sample}=\hat{y}_{natl}\)
\(H_a: \hat{y}_{sample}\neq\hat{y}_{natl}\)

One-way factorial ANOVA

Does the level of education (IV) of an individual have an effect on salary (DV)?

\(H_0: \hat{y}_{<=HS}=\hat{y}_{<=BD}=\hat{y}_{>=GS}\)
\(H_a: \text{at least one of } \hat{y}_{<=HS},\ \hat{y}_{<=BD},\ \hat{y}_{>=GS} \text{ differs from the others}\)

Two-way factorial ANOVA

Do the level of education (IV) and the minority status (IV) of an individual each have an effect on salary (DV), and do they interact?

\[\text{Educat across Minority status}\\ H_1: \hat{y}_{<=HS} = \hat{y}_{<=BD} = \hat{y}_{>=GS} \\ \text{Minority status across Educat}\\ H_{2}: \hat{y}^{0} = \hat{y}^{1} \\ \text{Interaction: Each level of Educat to each level of Minority status}\\ H_3: \hat{y}_{<=HS}^1 - \hat{y}_{<=HS}^0 = \hat{y}_{<=BD}^1 - \hat{y}_{<=BD}^0 = \hat{y}_{>=GS}^1 - \hat{y}_{>=GS}^0 \\ \text{Educat across Minority status}\\ H_{1a}: \text{at least one of } \hat{y}_{<=HS},\ \hat{y}_{<=BD},\ \hat{y}_{>=GS} \text{ differs from the others} \\ \text{Minority status across Educat}\\ H_{2a}: \hat{y}^{0} \neq \hat{y}^{1} \\ \text{Interaction: Each level of Educat to each level of Minority status}\\ H_{3a}: \text{the difference } \hat{y}^1 - \hat{y}^0 \text{ is not the same at every level of Educat} \]

Q3

  1. Researchers designed a weight-loss program, where daily reminders of healthy eating and physical activity were delivered to college students via text messaging. To evaluate efficacy of the program, researchers randomly assigned young men and women to either experimental or control groups. The figure below summarizes results of the study.
  • Write out hypotheses for this study (there should be 3!)
  • Based on the graph above, comment on whether each hypothesis was supported or not & interpret the results. You won’t be able to say whether findings are statistically significant, but you would be able to make general conclusions (at least, predicting whether findings could be statistically significant).
  • \[\text{Treatment Across Gender}\\ H_1: \hat{y}_{exp} = \hat{y}_{con} \\ \text{Gender across Treatment}\\ H_{2}: \hat{y}^{m} = \hat{y}^{f} \\ \text{Interaction: Each level of Treatment to each level of Gender status}\\ H_3: \hat{y}_{exp}^m - \hat{y}_{con}^m = \hat{y}_{exp}^f - \hat{y}_{con}^f \\ \text{Treatment Across Gender}\\ H_{1a}: \hat{y}_{exp} \neq \hat{y}_{con} \\ \text{Gender across Treatment}\\ H_{2a}: \hat{y}^{m} \neq \hat{y}^{f} \\ \text{Interaction: Each level of Treatment to each level of Gender status}\\ H_{3a}: \hat{y}_{exp}^m - \hat{y}_{con}^m \neq \hat{y}_{exp}^f - \hat{y}_{con}^f \]
  • \(\text{Treatment Across Gender} H_1:\) Based on the difference in slopes, the experimental group does not appear to differ across gender, whereas the control group does. Taken together, the effect of treatment appears to differ across gender, supporting the alternative hypothesis.

    \(\text{Gender across Treatment} H_{2}:\) The gap between the two lines is sizeable, so this test would have a high likelihood of supporting the alternative hypothesis with statistical significance.

    \(\text{Interaction: Each level of Treatment to each level of Gender status} H_3:\) Imagining lines between the two existing ones and comparing the gaps between them suggests that there would be at most marginal evidence for the alternative hypothesis. The interaction between treatment and gender would likely not be significant.
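
In a 2 x 2 design, the three effects can be read off the four cell means, and the interaction is a difference of differences. The Python sketch below uses hypothetical cell means (the study's figure is not reproduced here, so these numbers are assumptions chosen only to show the arithmetic).

```python
# Hypothetical mean weight loss (kg) per cell; not read from the study's figure.
cell_means = {
    ("men", "control"): 1.0,
    ("men", "experimental"): 4.0,
    ("women", "control"): 3.0,
    ("women", "experimental"): 4.5,
}

def treatment_effect(gender):
    """Experimental minus control mean within one gender."""
    return cell_means[(gender, "experimental")] - cell_means[(gender, "control")]

def gender_mean(gender):
    """Mean of one gender's two cells (equal weighting assumed)."""
    return (cell_means[(gender, "control")] + cell_means[(gender, "experimental")]) / 2

# Main effect of treatment: treatment effect averaged over gender.
treatment_main = (treatment_effect("men") + treatment_effect("women")) / 2
# Main effect of gender: men minus women, averaged over treatment.
gender_main = gender_mean("men") - gender_mean("women")
# Interaction: how much the treatment effect differs between genders.
interaction = treatment_effect("men") - treatment_effect("women")

print(treatment_main, gender_main, interaction)
```

Parallel lines in the plot correspond to an interaction of zero; the more the slopes differ, the larger this difference of differences.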

Q4

  1. The same researchers decided to enhance the intervention component and added a third experimental group that, in addition to text messages, received weekly educational group sessions. Below are the graphical results of the efficacy study.
  • Write out hypotheses for this study (there should be 3!)
  • Based on the graph above, comment on whether each hypothesis was supported or not & interpret the results.

Hypotheses

\[\text{There is no difference between men and women across treatment groups} \\ H_1: \hat{y}^{men} = \hat{y}^{women}\\ \text{There is no difference between treatment groups across gender} \\ H_2: \hat{y}_{con} = \hat{y}_{txt} = \hat{y}_{txt+grp} \\ \text{There is no interaction between treatment and gender} \\ H_3: \hat{y}_{con}^{men} - \hat{y}_{con}^{women} = \hat{y}_{txt}^{men} - \hat{y}_{txt}^{women} = \hat{y}_{txt+grp}^{men} - \hat{y}_{txt+grp}^{women} \\ \text{There is a difference between men and women across treatment groups} \\ H_{1a}: \hat{y}^{men} \neq \hat{y}^{women}\\ \text{There is a difference between treatment groups across gender} \\ H_{2a}: \text{at least one of } \hat{y}_{con},\ \hat{y}_{txt},\ \hat{y}_{txt+grp} \text{ differs from the others} \\ \text{There is an interaction between treatment and gender} \\ H_{3a}: \text{the difference } \hat{y}^{men} - \hat{y}^{women} \text{ is not the same in every treatment group}\]

Results

\(H_1:\) The change in slope of the two lines suggests that there is indeed a difference between men and women across treatment groups.
\(H_2:\) The change in y-value along the x-axis supports the alternative hypothesis that there is a difference between treatment groups across gender.
\(H_3:\) The sharp change in slope from txt to txt & group between gender lines supports the alternative hypothesis that there is an interaction between treatment & gender.