Problem set 1

Scenario 1

Explain what type of test you will use and why.
- I will use a dependent-samples (paired) t-test because this is a pre-post design. Reading scores were collected from each participant before and after the intervention, so the two sets of scores are repeated measures on the same people rather than independent samples.
In words, state the null hypothesis being tested.
- H0: In the population, the mean reading score at pretest is equal to the mean reading score at post-test; that is, the mean of the difference scores is zero.



Table: Before and After Intervention

|             | Overall (N=500) |
|:------------|:---------------:|
|time1        |                 |
|-  Mean (SD) | 40.697 (3.887)  |
|-  Range     | 32.250 - 51.000 |
|time2        |                 |
|-  Mean (SD) | 46.889 (9.288)  |
|-  Range     | 13.500 - 76.000 |

The mean reading score is 40.697 at pretest and 46.889 at post-test. The standard deviation is 3.887 at pretest and 9.288 at post-test.
Based on the descriptive statistics, I don’t think that all students’ scores changed in the same way between the pretest and post-test: the standard deviation more than doubled after the intervention, and the range widened from 32.25-51.00 to 13.50-76.00. If every student had changed by the same amount, the spread of scores would have stayed the same.


    Paired t-test

data:  repeat0$time2 and repeat0$time1
t = 15.392, df = 499, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 5.401601 6.982399
sample estimates:
mean difference 
          6.192

Because the p-value (< 2.2e-16) is less than 𝛼 = .05, we reject the null hypothesis and conclude, for a two-tailed test at 𝛼 = .05, that the mean reading score at post-test is statistically significantly different from the mean reading score at pretest. On average, scores were 6.192 points higher at post-test.

The observed t value is 15.392.

The 95% confidence interval for the mean difference (post-test minus pretest) is [5.401601, 6.982399]. Across repeated random samples, 95% of the intervals constructed this way would contain the true population mean difference. Because this interval does not contain zero, it supports the decision to reject the null.
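
The t statistic, the mean difference, and the confidence interval in the output are linked by t = mean difference / SE and CI = mean difference ± (critical t) × SE. As a rough consistency check, the Python sketch below backs the standard error, the standard deviation of the difference scores, and the implied critical value out of the printed numbers. It uses only the rounded values shown in the output, so small rounding error is expected; the variable names are illustrative and are not part of the original analysis.

```python
import math

# Values copied from the paired t-test output above (rounded as printed).
n = 500
t_obs = 15.392
mean_diff = 6.192
ci_low, ci_high = 5.401601, 6.982399

# The interval is symmetric, so its midpoint should equal the mean difference.
ci_mid = (ci_low + ci_high) / 2
half_width = (ci_high - ci_low) / 2

# t = mean_diff / SE, so the standard error can be recovered from the output.
se = mean_diff / t_obs

# SE = SD_diff / sqrt(n), which gives the SD of the difference scores.
sd_diff = se * math.sqrt(n)

# half_width = t_crit * SE, which gives the critical value behind the interval.
t_crit_implied = half_width / se

print(f"CI midpoint          = {ci_mid:.3f}")
print(f"standard error       = {se:.4f}")
print(f"SD of differences    = {sd_diff:.3f}")
print(f"implied critical t   = {t_crit_implied:.4f}")
print(f"interval excludes 0  = {ci_low > 0 or ci_high < 0}")
```

The implied critical value should come out close to 1.96, which is what a two-tailed 95% interval with 499 degrees of freedom would use, and the midpoint should match the reported mean difference.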

Scenario 2

What is the name of the type of test you will use?
- Chi-Square Test of association
In words, state the null hypothesis being tested.
- H0: There is no relationship between gender and employment category.

   
      1   2   3
  0 206   0  10
  1 157  27  74

   
           1       2        3
  0 165.4177 12.3038 38.27848
  1 197.5823 14.6962 45.72152

        
         Clerical Custodial Manager
  Female      206         0      10
  Male        157        27      74



Table: Gender distribution by employment categories

|          | Clerical (N=363) | Custodial (N=27) | Manager (N=84) | Total (N=474) | p value|
|:---------|:----------------:|:----------------:|:--------------:|:-------------:|-------:|
|gender    |                  |                  |                |               | < 0.001|
|-  Female |   206 (56.7%)    |     0 (0.0%)     |   10 (11.9%)   |  216 (45.6%)  |        |
|-  Male   |   157 (43.3%)    |   27 (100.0%)    |   74 (88.1%)   |  258 (54.4%)  |        |

Among clerical workers, 56.7% are female and 43.3% are male. Among custodial workers, 0% are female and 100% are male. Among managers, 11.9% are female and 88.1% are male. The pattern shows that women are heavily concentrated in clerical positions (206 of the 216 women, or 95.4%), while men hold all of the custodial positions and most of the managerial ones.
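
The percentages in the table are column percentages (the share of each gender within a job category). The claim that women are concentrated in clerical work is really a statement about row percentages (the share of each job category within a gender). A minimal Python sketch, assuming the observed counts from the table above, computes both; the dictionary layout and names are illustrative only.

```python
# Observed counts copied from the gender-by-employment table above.
counts = {
    "Female": {"Clerical": 206, "Custodial": 0, "Manager": 10},
    "Male": {"Clerical": 157, "Custodial": 27, "Manager": 74},
}
jobs = ["Clerical", "Custodial", "Manager"]

# Marginal totals.
col_totals = {job: sum(counts[g][job] for g in counts) for job in jobs}
row_totals = {g: sum(counts[g].values()) for g in counts}

# Column percentages: within each job category, what share is each gender?
col_pct = {
    g: {job: round(100 * counts[g][job] / col_totals[job], 1) for job in jobs}
    for g in counts
}

# Row percentages: within each gender, what share holds each job category?
row_pct = {
    g: {job: round(100 * counts[g][job] / row_totals[g], 1) for job in jobs}
    for g in counts
}

for g in counts:
    print(g, "column %:", col_pct[g])
    print(g, "row %:   ", row_pct[g])
```

Reading the two sets of percentages side by side keeps the description of "who holds each job" separate from "which jobs each gender holds".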


    Pearson's Chi-squared test

data:  Table2
X-squared = 79.277, df = 2, p-value < 2.2e-16

Because the p-value (< 2.2e-16) is less than the significance level of .05, we reject the null hypothesis and conclude that there is a statistically significant association between gender and employment category (χ²(2, N = 474) = 79.277, p < .001). This is consistent with the pattern observed in the descriptive table above: custodial positions are held entirely by men, and women are severely underrepresented in management roles.
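
To show where the expected counts and the test statistic come from, the Python sketch below recomputes them by hand from the observed counts: each expected count is (row total × column total) / N, and the statistic is the sum of (observed − expected)² / expected over all cells. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability is exp(−x/2); that shortcut does not apply to other degrees of freedom. This is an illustrative check, not the code used for the original analysis.

```python
import math

# Observed counts: rows = Female, Male; columns = Clerical, Custodial, Manager.
observed = [
    [206, 0, 10],
    [157, 27, 74],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (O - E)^2 / E over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail probability; this closed form is valid only because df == 2.
p_value = math.exp(-chi2 / 2)

# Common rule of thumb: every expected count should be at least 5.
min_expected = min(min(row) for row in expected)

print("expected counts:", [[round(e, 4) for e in row] for row in expected])
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3g}")
print(f"smallest expected count = {min_expected:.4f}")
```

The recomputed expected counts and statistic should agree with the output above, and the smallest expected count (custodial women) is above 5, so the usual chi-square approximation is reasonable even though that observed cell is zero.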