Data

1. What is the demographic profile of the respondents in terms of:


Sex

Strand

The tables above provide the distribution of respondents by sex and strand. Of the 100 respondents, 58 are female and 42 are male; by strand, 24 are from ABM, 23 from GAS, 37 from HUMSS, and 16 from STEM.
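
The counts above can also be expressed as percentages of the 100 respondents. The Python sketch below is illustrative only. It rebuilds the respondent lists from the reported counts, and `frequency_table` is a hypothetical helper, not part of the original analysis.

```python
from collections import Counter

def frequency_table(values):
    """Count each category and express it as a percentage of all respondents."""
    counts = Counter(values)
    total = sum(counts.values())
    return {level: (n, 100 * n / total) for level, n in sorted(counts.items())}

# Respondent lists rebuilt from the counts reported above (one entry per respondent).
sex = ["FEMALE"] * 58 + ["MALE"] * 42
strand = ["ABM"] * 24 + ["GAS"] * 23 + ["HUMSS"] * 37 + ["STEM"] * 16

for name, values in [("Sex", sex), ("Strand", strand)]:
    print(name)
    for level, (n, pct) in frequency_table(values).items():
        print(f"  {level:<7}{n:>4}{pct:>7.1f}%")
```

With exactly 100 respondents, each percentage equals the corresponding count.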

2. Is there a significant difference in background knowledge and vocabulary when respondents are grouped according to:

2.1 Sex


Call:
lm(formula = `Background Knowledge` ~ Vocabulary, data = Data)

Coefficients:
(Intercept)   Vocabulary  
     1.4028       0.5336  

From this, we may deduce that the data fail to satisfy two assumptions of parametric testing: linearity and homogeneity of variance.
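
For a single predictor, `lm()` estimates the coefficients by ordinary least squares: slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x). A minimal Python sketch follows. It uses toy data rather than the study data, and `simple_ols` and `residuals` are hypothetical names. The residuals it returns are what a residuals-versus-fitted plot displays when checking linearity and constant variance.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x (one predictor)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(x, y, intercept, slope):
    """Observed minus fitted values; used in diagnostic plots."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Toy data (not the study data), chosen to lie exactly on a line.
x = [2.0, 2.5, 3.0, 3.5, 4.0]
y = [2.5, 2.75, 3.0, 3.25, 3.5]
b0, b1 = simple_ols(x, y)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
print(residuals(x, y, b0, b1))
```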

2.1.1 Sex and Background Knowledge

Normality Test


    Shapiro-Wilk normality test

data:  Data$`Background Knowledge`
W = 0.97339, p-value = 0.04043

Since p = 0.04043 < 0.05, we reject the null hypothesis of normality. That is, we cannot assume that the Background Knowledge scores are normally distributed.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.2384 0.6265
      98               

Since p = 0.6265 > 0.05, the assumption of homogeneity of variance is met.
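
By default, R's `car::leveneTest` centers on the median (`center = median`), which is the Brown-Forsythe variant of Levene's test. Each score is replaced by its absolute deviation from its group median, and a one-way ANOVA F is computed on those deviations. The Python sketch below uses a hypothetical function name and is illustrative only. It stops at the F statistic because the p-value needs the upper tail of an F distribution, which the Python standard library does not provide.

```python
from statistics import median

def levene_median(*groups):
    """Brown-Forsythe (median-centered Levene) statistic.

    Returns (F, df_between, df_within) for the one-way ANOVA on
    absolute deviations from each group's median.
    """
    k = len(groups)
    z = [[abs(v - median(g)) for v in g] for g in groups]
    n_total = sum(len(g) for g in z)
    grand_mean = sum(sum(g) for g in z) / n_total
    group_means = [sum(g) / len(g) for g in z]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(z, group_means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical scores for two groups (not the study data).
print(levene_median([2.0, 3.0, 3.5, 4.0], [1.5, 2.5, 3.0, 3.0, 4.0]))
```

With two groups of 58 and 42 respondents, the degrees of freedom are 1 and 98, matching the table above.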


# A tibble: 2 × 11
  Sex    variable             n   min   max median   iqr  mean    sd    se    ci
  <fct>  <fct>            <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FEMALE Background Know…    58   2       4   3    0.938  3    0.56  0.074 0.147
2 MALE   Background Know…    42   1.5     4   2.88 0.75   2.83 0.604 0.093 0.188

The mean Background Knowledge scores of female and male respondents are 3.000 and 2.827, respectively.
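
The summary columns follow standard definitions. `sd` is the sample standard deviation (n - 1 denominator) and `se` = sd / sqrt(n); for the female group, 0.56 / sqrt(58) ≈ 0.074. The `ci` column appears to be a t-based 95% half-width (roughly t × se), which is left out below because the standard library has no t quantile function. The Python sketch uses a hypothetical function name and is illustrative only. Its IQR uses the "inclusive" quartile method, which matches R's default (type 7) quantiles.

```python
import math
import statistics

def summary_stats(values):
    """n, mean, median, IQR, sample SD and standard error of the mean."""
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return {
        "n": n,
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
        "sd": sd,
        "se": sd / math.sqrt(n),
    }

# Hypothetical scores (not the study data).
print(summary_stats([2.0, 2.5, 3.0, 3.0, 3.25, 3.5, 4.0]))
```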

Mann-Whitney U Test (computed as a two-group Kruskal-Wallis test)

# A tibble: 1 × 6
  .y.                      n statistic    df     p method        
* <chr>                <int>     <dbl> <int> <dbl> <chr>         
1 Background Knowledge   100      1.40     1 0.237 Kruskal-Wallis

Since p = 0.237 > 0.05, there is no significant difference in Background Knowledge when respondents are grouped according to sex.
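
The output above comes from a Kruskal-Wallis test with df = 1. For two groups, the tie-corrected Kruskal-Wallis H equals the square of the Mann-Whitney normal-approximation z without continuity correction, so the two yield the same large-sample p-value. The Python sketch below computes both from average ranks so that the equivalence can be checked. It is an illustration, not the code that produced the output, and all function names are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks 1..N; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def tie_sum(values):
    """Sum of t**3 - t over each set of t tied values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t ** 3 - t for t in counts.values())

def mann_whitney_z(x, y):
    """U for sample x and its tie-corrected normal z (no continuity correction)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum(pooled) / (n * (n - 1)))
    return u, (u - n1 * n2 / 2) / math.sqrt(var_u)

def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H for any number of groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h / (1 - tie_sum(pooled) / (n ** 3 - n))

# Hypothetical scores for two groups (not the study data).
group_a = [2.0, 2.5, 3.0, 3.0, 3.5]
group_b = [1.5, 2.5, 2.75, 3.0]
u, z = mann_whitney_z(group_a, group_b)
h = kruskal_wallis_h(group_a, group_b)
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p; equals the chi-square(1) tail of H
print(f"U = {u}, z**2 = {z ** 2:.4f}, H = {h:.4f}, p = {p:.4f}")
```

R's `wilcox.test` applies a continuity correction by default and can use exact p-values for small samples without ties. Its p-value can therefore differ slightly from the Kruskal-Wallis form reported above.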

2.1.2 Sex and Vocabulary

Normality Test


    Shapiro-Wilk normality test

data:  Data$Vocabulary
W = 0.96332, p-value = 0.006982

Since p = 0.006982 < 0.05, we reject the null hypothesis of normality. That is, we cannot assume that the Vocabulary scores are normally distributed.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.5562 0.4576
      98               

Since p = 0.4576 > 0.05, the assumption of homogeneity of variance is met.

# A tibble: 2 × 11
  Sex    variable       n   min   max median   iqr  mean    sd    se    ci
  <fct>  <fct>      <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FEMALE Vocabulary    58  1.75     4   2.75  0.75  2.86 0.554 0.073 0.146
2 MALE   Vocabulary    42  1.75     4   2.75  0.75  2.85 0.595 0.092 0.185

The mean Vocabulary scores of female and male respondents are 2.862 and 2.851, respectively.

Mann-Whitney U Test (computed as a two-group Kruskal-Wallis test)

# A tibble: 1 × 6
  .y.            n statistic    df     p method        
* <chr>      <int>     <dbl> <int> <dbl> <chr>         
1 Vocabulary   100  0.000609     1  0.98 Kruskal-Wallis

Since p = 0.98 > 0.05, there is no significant difference in Vocabulary when respondents are grouped according to sex.

2.2 Strand

2.2.1 Strand and Background Knowledge

Normality Test


    Shapiro-Wilk normality test

data:  Data$`Background Knowledge`
W = 0.97339, p-value = 0.04043

Since p = 0.04043 < 0.05, we reject the null hypothesis of normality. That is, we cannot assume that the Background Knowledge scores are normally distributed.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3   1.233  0.302
      96               

Since p = 0.302 > 0.05, the assumption of homogeneity of variance is met.

# A tibble: 4 × 11
  Strand variable             n   min   max median   iqr  mean    sd    se    ci
  <fct>  <fct>            <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 STEM   Background Know…    16  1.5      4   3.12 1      2.95 0.759 0.19  0.405
2 ABM    Background Know…    24  2        4   3    0.625  2.98 0.566 0.116 0.239
3 HUMSS  Background Know…    37  1.75     4   2.75 1      2.78 0.555 0.091 0.185
4 GAS    Background Know…    23  2.25     4   3    0.625  3.10 0.469 0.098 0.203

The mean Background Knowledge scores of the STEM, ABM, HUMSS, and GAS strands are 2.953, 2.979, 2.777, and 3.098, respectively.

Kruskal-Wallis Test

# A tibble: 1 × 6
  .y.                      n statistic    df     p method        
* <chr>                <int>     <dbl> <int> <dbl> <chr>         
1 Background Knowledge   100      4.99     3 0.173 Kruskal-Wallis

Since p = 0.173 > 0.05, there is no significant difference in Background Knowledge among the four strands.
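
The Kruskal-Wallis p-value is the upper tail of a chi-square distribution with k - 1 degrees of freedom: 3 for the four strands, and 1 for the two sexes. For integer degrees of freedom, this tail has a closed form built from `erfc` and `exp`, so it can be sketched with the Python standard library alone. The function name is hypothetical, and the sketch is an illustration rather than the routine R ran. Feeding in the rounded statistics from the tables gives p-values close to the printed ones.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    else:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q

# Rounded H statistics and degrees of freedom taken from the tables in this section.
for h, df in [(1.40, 1), (4.99, 3), (1.49, 3)]:
    print(f"H = {h}, df = {df}: p = {chi2_sf(h, df):.3f}")
```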

Pairwise Comparisons

# A tibble: 6 × 9
  .y.              group1 group2    n1    n2 statistic      p p.adj p.adj.signif
* <chr>            <chr>  <chr>  <int> <int>     <dbl>  <dbl> <dbl> <chr>       
1 Background Know… STEM   ABM       16    24    -0.285 0.776  1     ns          
2 Background Know… STEM   HUMSS     16    37    -1.47  0.143  0.855 ns          
3 Background Know… STEM   GAS       16    23     0.316 0.752  1     ns          
4 Background Know… ABM    HUMSS     24    37    -1.32  0.186  1     ns          
5 Background Know… ABM    GAS       24    23     0.668 0.504  1     ns          
6 Background Know… HUMSS  GAS       37    23     2.04  0.0413 0.248 ns          

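The unadjusted p-value for HUMSS versus GAS (0.0413) is below 0.05, but after adjustment for multiple comparisons (p.adj = 0.248) the difference is not significant. None of the pairwise differences in Background Knowledge is significant.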
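
The p.adj column is consistent with a Bonferroni correction for the six pairwise comparisons: each raw p is multiplied by 6 and capped at 1, so 0.0413 × 6 ≈ 0.248 for HUMSS versus GAS. The statistic column appears to hold standard-normal z values, as in Dunn's test; that reading is an assumption based on the output format. The Python sketch below uses hypothetical function names and recomputes the raw and adjusted p-values from the rounded z values in the table.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# z statistics for the six strand pairs, copied (rounded) from the table above.
pairs = ["STEM-ABM", "STEM-HUMSS", "STEM-GAS", "ABM-HUMSS", "ABM-GAS", "HUMSS-GAS"]
z = [-0.285, -1.47, 0.316, -1.32, 0.668, 2.04]

raw = [two_sided_p(v) for v in z]
adjusted = bonferroni(raw)
for name, p, p_adj in zip(pairs, raw, adjusted):
    flag = "significant" if p_adj < 0.05 else "ns"
    print(f"{name:<11} p = {p:.4f}  p.adj = {p_adj:.3f}  {flag}")
```

Multiplying by the number of comparisons keeps the chance of at least one false positive across all six tests at or below 0.05. This is why a single raw p near 0.04 does not survive adjustment.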

2.2.2 Strand and Vocabulary

Normality Test


    Shapiro-Wilk normality test

data:  Data$Vocabulary
W = 0.96332, p-value = 0.006982

Since p = 0.006982 < 0.05, we reject the null hypothesis of normality. That is, we cannot assume that the Vocabulary scores are normally distributed.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3  0.5704 0.6359
      96               

Since p = 0.6359 > 0.05, the assumption of homogeneity of variance is met.

# A tibble: 4 × 11
  Strand variable       n   min   max median   iqr  mean    sd    se    ci
  <fct>  <fct>      <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 STEM   Vocabulary    16  1.75  3.5    3.25 0.812  2.97 0.531 0.133 0.283
2 ABM    Vocabulary    24  2     3.75   2.75 0.562  2.78 0.496 0.101 0.21 
3 HUMSS  Vocabulary    37  1.75  4      2.75 0.75   2.82 0.609 0.1   0.203
4 GAS    Vocabulary    23  2     4      2.75 0.75   2.91 0.615 0.128 0.266

The mean Vocabulary scores of the STEM, ABM, HUMSS, and GAS strands are 2.969, 2.781, 2.824, and 2.913, respectively.

Kruskal-Wallis Test

# A tibble: 1 × 6
  .y.            n statistic    df     p method        
* <chr>      <int>     <dbl> <int> <dbl> <chr>         
1 Vocabulary   100      1.49     3 0.685 Kruskal-Wallis

Since p = 0.685 > 0.05, there is no significant difference in Vocabulary among the four strands.

Pairwise Comparisons

# A tibble: 6 × 9
  .y.        group1 group2    n1    n2 statistic     p p.adj p.adj.signif
* <chr>      <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <chr>       
1 Vocabulary STEM   ABM       16    24    -1.18  0.240     1 ns          
2 Vocabulary STEM   HUMSS     16    37    -0.921 0.357     1 ns          
3 Vocabulary STEM   GAS       16    23    -0.589 0.556     1 ns          
4 Vocabulary ABM    HUMSS     24    37     0.395 0.693     1 ns          
5 Vocabulary ABM    GAS       24    23     0.643 0.520     1 ns          
6 Vocabulary HUMSS  GAS       37    23     0.316 0.752     1 ns          

3. Is there a significant relationship between background knowledge and vocabulary?

Normality Test


    Shapiro-Wilk normality test

data:  Data1$Scores
W = 0.97258, p-value = 0.0005945

Since p = 0.0005945 < 0.05, we reject the null hypothesis of normality. That is, we cannot assume that the pooled scores are normally distributed.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   1  0.0966 0.7563
      198               

Since p = 0.7563 > 0.05, the assumption of homogeneity of variance is met.

# A tibble: 2 × 11
  Variables      variable     n   min   max median   iqr  mean    sd    se    ci
  <fct>          <fct>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Background Kn… Scores     100  1.5      4   3     0.75  2.93 0.582 0.058 0.115
2 Vocabulary     Scores     100  1.75     4   2.75  0.75  2.86 0.569 0.057 0.113

The mean scores for Background Knowledge and Vocabulary are 2.928 and 2.857, respectively.
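
The test in this part runs on `Data1`, a long-format table with 200 rows. The 100 Background Knowledge scores and the 100 Vocabulary scores are stacked into one `Scores` column and labelled by a `Variables` column. In R, this reshaping is commonly done with `tidyr::pivot_longer`. The Python sketch below shows the same reshaping and the per-variable means, using hypothetical records and function names.

```python
from statistics import fmean

def stack(records, columns, key="Variables", value="Scores"):
    """Long format: one row per (respondent, variable) pair instead of one row per respondent."""
    return [{key: col, value: rec[col]} for col in columns for rec in records]

def mean_by(rows, key="Variables", value="Scores"):
    """Mean of the value column within each level of the key column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    return {level: fmean(vals) for level, vals in groups.items()}

# Hypothetical respondents (not the study data): one dict per respondent.
data = [
    {"Background Knowledge": 3.0, "Vocabulary": 2.75},
    {"Background Knowledge": 2.5, "Vocabulary": 3.0},
    {"Background Knowledge": 3.5, "Vocabulary": 2.5},
]
long_rows = stack(data, ["Background Knowledge", "Vocabulary"])
print(len(long_rows))  # twice the number of respondents
print(mean_by(long_rows))
```

Stacking treats the two sets of scores as independent samples, even though each respondent contributes one score to each. For paired scores like these, the Wilcoxon signed-rank test is the usual rank-based alternative.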

Kruskal-Wallis Test

# A tibble: 1 × 6
  .y.        n statistic    df     p method        
* <chr>  <int>     <dbl> <int> <dbl> <chr>         
1 Scores   200     0.820     1 0.365 Kruskal-Wallis

Since p = 0.365 > 0.05, there is no significant difference between the Background Knowledge and Vocabulary scores.

4. Which variable has the most significant impact on comprehension in the English language?

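Based on the output above, it is Background Knowledge, which has the higher mean score (2.928 versus 2.857 for Vocabulary). The difference between the two variables is not statistically significant (p = 0.365).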