2. Is there a significant difference in topic presentation and language proficiency when grouped according to:

2.1 Sex


Call:
lm(formula = `Topic Presentation` ~ `Language Proficiency`, data = Data)

Coefficients:
           (Intercept)  `Language Proficiency`  
                 1.487                   0.528

From this, we may deduce that the data fail to satisfy the two assumptions – Linearity and Homogeneity of Variance.

2.1.1 Sex and Topic Presentation

The mean topic presentation scores for male and female respondents are 3.147 and 3.076, respectively.

The graph above plots the data by sex (male and female).

The graph shows a difference in topic presentation when the respondents are grouped according to sex. However, we still need to check whether this difference is statistically significant.

As seen above, the histogram does not resemble a bell curve, which suggests that the residuals are not normally distributed. Moreover, the points in the QQ-plot do not follow the straight line, with the majority of them falling outside the confidence bands.

Normality Test


    Shapiro-Wilk normality test

data:  res_aov$residuals
W = 0.94742, p-value = 0.0005633

The Shapiro-Wilk p-value = 0.0005633 on the residuals is less than the usual significance level of 0.05. Thus, we reject the hypothesis that residuals have a normal distribution.
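
As a minimal sketch of the normality check above, the following R code fits a one-way model and applies `shapiro.test()` to its residuals. The data frame `toy`, its group sizes, and the simulated scores are hypothetical stand-ins, not the study's data; only the name `res_aov` follows the output shown above.

```r
# Hypothetical data: 100 scores on a 2-4 scale and a two-level grouping factor.
set.seed(1)
toy <- data.frame(
  Sex   = factor(rep(c("Male", "Female"), times = c(40, 60))),
  Score = round(runif(100, min = 2, max = 4), 1)
)

# Fit a one-way model and test its residuals for normality.
res_aov <- aov(Score ~ Sex, data = toy)
sw <- shapiro.test(res_aov$residuals)

# Decision rule used in the report: reject normality when p < 0.05.
reject_normality <- sw$p.value < 0.05
print(sw)
```

When `reject_normality` is `TRUE`, a rank-based test is the usual alternative to a t-test or ANOVA.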

Equality of Variance

Warning in leveneTest.default(y = y, group = group, ...): group coerced to
factor.

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.3517 0.5545
      98

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.
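
To make the Levene output easier to read, here is a sketch of what a median-centred Levene test computes, written in base R: a one-way ANOVA on the absolute deviations of each observation from its group median. The data frame `toy` and its columns are hypothetical; the report's own output came from `leveneTest()`.

```r
# Hypothetical data: two groups of 50 with slightly different spread.
set.seed(2)
toy <- data.frame(
  group = factor(rep(c("A", "B"), each = 50)),
  y     = c(rnorm(50, mean = 3, sd = 0.4), rnorm(50, mean = 3, sd = 0.5))
)

# Absolute deviation of each observation from its own group median.
toy$abs_dev <- abs(toy$y - ave(toy$y, toy$group, FUN = median))

# A one-way ANOVA on those deviations gives the median-centred Levene F test.
levene_tab <- anova(lm(abs_dev ~ group, data = toy))
f_value <- levene_tab[["F value"]][1]
p_value <- levene_tab[["Pr(>F)"]][1]

# Homogeneity of variance is retained when p_value is at least 0.05.
print(levene_tab)
```

With two groups and 100 observations, the degrees of freedom are 1 and 98, as in the output above.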

Wilcoxon Rank Sum Test


    Wilcoxon rank sum test

data:  a and b
W = 1291, p-value = 0.4559
alternative hypothesis: true location shift is not equal to 0

Since the p-value (0.4559) is greater than 0.05, we fail to reject the null hypothesis; that is, there is no significant difference in topic presentation when grouped according to sex.
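
The rank sum test above can be sketched in base R as follows. The vectors `a` and `b` reuse the names from the output, but their contents here are simulated and hypothetical. The sketch also recomputes W by hand to show what the statistic counts.

```r
# Hypothetical scores for two groups, rounded to one decimal so that ties occur.
set.seed(3)
a <- round(runif(40, min = 2, max = 4), 1)
b <- round(runif(60, min = 2, max = 4), 1)

# exact = FALSE requests the normal approximation, which suits tied data.
wt <- wilcox.test(a, b, exact = FALSE)

# W counts the pairs (a_i, b_j) with a_i > b_j; each tied pair counts as one half.
w_by_hand <- sum(outer(a, b, ">")) + 0.5 * sum(outer(a, b, "=="))

print(wt)
```

A half-integer W, such as 1397.5, is a sign that the data contain tied pairs.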

2.1.2 Sex and Language Proficiency

The mean language proficiency scores for male and female respondents are 3.206 and 2.979, respectively.

The graph above plots the data by sex (male and female).

The graph shows a difference in language proficiency when the respondents are grouped according to sex. However, we still need to check whether this difference is statistically significant.

As seen above, the histogram does not resemble a bell curve, which suggests that the residuals are not normally distributed. Moreover, although the points in the QQ-plot roughly follow the straight line, the majority of them fall outside the confidence bands.

Normality Test


    Shapiro-Wilk normality test

data:  res_aov$residuals
W = 0.95943, p-value = 0.003663

The Shapiro-Wilk p-value = 0.003663 on the residuals is less than the usual significance level of 0.05. Thus, we reject the hypothesis that residuals have a normal distribution.

Equality of Variance

Warning in leveneTest.default(y = y, group = group, ...): group coerced to
factor.

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1   0.491 0.4852
      98

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.

Wilcoxon Rank Sum Test


    Wilcoxon rank sum test

data:  c and d
W = 1397.5, p-value = 0.02573
alternative hypothesis: true location shift is not equal to 0

Since the p-value (0.02573) is less than 0.05, we reject the null hypothesis; that is, there is a significant difference in language proficiency when grouped according to sex.

2.2 Strand

2.2.1 Strand and Topic Presentation

Normality Test


    Shapiro-Wilk normality test

data:  Data$`Topic Presentation`
W = 0.92912, p-value = 4.436e-05

Since the p-value (4.436e-05) is less than 0.05, we reject the null hypothesis of normality; that is, we cannot assume that the data are normally distributed.

Equality of Variance

Warning in leveneTest.default(y = y, group = group, ...): group coerced to
factor.

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3  1.1061 0.3506
      96

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.


# A tibble: 4 × 11
  Strand variable             n   min   max median   iqr  mean    sd    se    ci
  <fct>  <fct>            <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 GAS    Topic Presentat…    25   2     4        3   0.4  2.98 0.477 0.095 0.197
2 ABM    Topic Presentat…    25   2     4        3   0.6  3.12 0.507 0.101 0.209
3 HUMSS  Topic Presentat…    25   2.6   3.6      3   0.4  3.10 0.278 0.056 0.115
4 STEM   Topic Presentat…    25   2.2   4        3   0.6  3.2  0.458 0.092 0.189

The mean topic presentation scores for GAS, ABM, HUMSS, and STEM are 2.976, 3.120, 3.104, and 3.200, respectively.

Kruskal-Wallis Test

# A tibble: 1 × 6
  .y.                    n statistic    df     p method        
* <chr>              <int>     <dbl> <int> <dbl> <chr>         
1 Topic Presentation   100      4.87     3 0.182 Kruskal-Wallis

Based on the p-value (0.182 > 0.05), no significant difference in topic presentation was observed among the strands.
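
A base-R sketch of the same kind of test is given below. The data frame `toy` and its simulated scores are hypothetical; only the strand labels and the group size of 25 are taken from the summary table above. The report itself used a tidy wrapper, whereas this sketch calls `kruskal.test()` directly.

```r
# Hypothetical data: four strands with 25 simulated scores each.
set.seed(4)
toy <- data.frame(
  Strand = factor(rep(c("GAS", "ABM", "HUMSS", "STEM"), each = 25),
                  levels = c("GAS", "ABM", "HUMSS", "STEM")),
  Score  = round(runif(100, min = 2, max = 4), 1)
)

# Rank-based comparison of the four groups; df = number of groups - 1.
kw <- kruskal.test(Score ~ Strand, data = toy)

# Decision rule used in the report: no significant difference when p >= 0.05.
no_difference <- kw$p.value >= 0.05
print(kw)
```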

Pairwise Comparisons

# A tibble: 6 × 9
  .y.              group1 group2    n1    n2 statistic      p p.adj p.adj.signif
* <chr>            <chr>  <chr>  <int> <int>     <dbl>  <dbl> <dbl> <chr>       
1 Topic Presentat… GAS    ABM       25    25     1.43  0.153  0.919 ns          
2 Topic Presentat… GAS    HUMSS     25    25     1.58  0.114  0.683 ns          
3 Topic Presentat… GAS    STEM      25    25     2.11  0.0352 0.211 ns          
4 Topic Presentat… ABM    HUMSS     25    25     0.152 0.879  1     ns          
5 Topic Presentat… ABM    STEM      25    25     0.678 0.498  1     ns          
6 Topic Presentat… HUMSS  STEM      25    25     0.525 0.599  1     ns

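After adjustment for multiple comparisons, no pair of strands differs significantly (all p.adj > 0.05). The GAS and STEM comparison has an unadjusted p-value of 0.0352, but its adjusted p-value is 0.211, which agrees with the non-significant Kruskal-Wallis result.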
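
The `p.adj` column can be checked by hand. The sketch below assumes a Bonferroni adjustment (each p-value multiplied by the six comparisons and capped at 1), which is consistent with the printed values but is not stated in the output. Because the inputs are the rounded p-values from the table, the last digit may differ slightly from the printed `p.adj`.

```r
# Unadjusted p-values as printed in the pairwise table (already rounded).
p_raw <- c("GAS-ABM"    = 0.153,
           "GAS-HUMSS"  = 0.114,
           "GAS-STEM"   = 0.0352,
           "ABM-HUMSS"  = 0.879,
           "ABM-STEM"   = 0.498,
           "HUMSS-STEM" = 0.599)

# Assumption: Bonferroni adjustment over the 6 pairwise comparisons.
p_adj <- p.adjust(p_raw, method = "bonferroni")

# Adjusted values, and whether any pair stays below the 0.05 level.
print(round(p_adj, 3))
any_significant <- any(p_adj < 0.05)
print(any_significant)
```

Under this assumption, the GAS and STEM comparison moves from 0.0352 to about 0.211, so `any_significant` is `FALSE`.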

2.2.2 Strand and Language Proficiency

Normality Test


    Shapiro-Wilk normality test

data:  Data$`Language Proficiency`
W = 0.92795, p-value = 3.813e-05

Since the p-value (3.813e-05) is less than 0.05, we reject the null hypothesis of normality; that is, we cannot assume that the data are normally distributed.

Equality of Variance

Warning in leveneTest.default(y = y, group = group, ...): group coerced to
factor.

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3  1.9735 0.1231
      96

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.

# A tibble: 4 × 11
  Strand variable             n   min   max median   iqr  mean    sd    se    ci
  <fct>  <fct>            <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 GAS    Language Profic…    25   2     3.6      3   0.2  2.95 0.333 0.067 0.137
2 ABM    Language Profic…    25   2     4        3   0.8  3.07 0.55  0.11  0.227
3 HUMSS  Language Profic…    25   2.6   4        3   0.2  3.16 0.374 0.075 0.154
4 STEM   Language Profic…    25   2.2   4        3   0.2  3.04 0.44  0.088 0.181

The mean language proficiency scores for GAS, ABM, HUMSS, and STEM are 2.952, 3.072, 3.160, and 3.040, respectively.

Kruskal-Wallis Test

# A tibble: 1 × 6
  .y.                      n statistic    df     p method        
* <chr>                <int>     <dbl> <int> <dbl> <chr>         
1 Language Proficiency   100      2.58     3 0.462 Kruskal-Wallis

Based on the p-value (0.462 > 0.05), no significant difference in language proficiency was observed among the strands.

Pairwise Comparisons

# A tibble: 6 × 9
  .y.               group1 group2    n1    n2 statistic     p p.adj p.adj.signif
* <chr>             <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <chr>       
1 Language Profici… GAS    ABM       25    25    0.687  0.492 1     ns          
2 Language Profici… GAS    HUMSS     25    25    1.60   0.110 0.659 ns          
3 Language Profici… GAS    STEM      25    25    0.725  0.469 1     ns          
4 Language Profici… ABM    HUMSS     25    25    0.912  0.362 1     ns          
5 Language Profici… ABM    STEM      25    25    0.0379 0.970 1     ns          
6 Language Profici… HUMSS  STEM      25    25   -0.874  0.382 1     ns

3. Is there a significant difference between the variables topic presentation and language proficiency?

Normality Test


    Shapiro-Wilk normality test

data:  Data1$Scores
W = 0.93032, p-value = 3.58e-08

Since the p-value (3.58e-08) is less than 0.05, we reject the null hypothesis of normality; that is, we cannot assume that the data are normally distributed.

Equality of Variance

Warning in leveneTest.default(y = y, group = group, ...): group coerced to
factor.

Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   1  0.0673 0.7956
      198

The p-value (0.7956) is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.

# A tibble: 2 × 11
  Variables      variable     n   min   max median   iqr  mean    sd    se    ci
  <fct>          <fct>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Topic Present… Scores     100     2     4      3   0.4  3.1  0.44  0.044 0.087
2 Language Prof… Scores     100     2     4      3   0.4  3.06 0.432 0.043 0.086

The means of topic presentation and language proficiency are 3.100 and 3.056, respectively.

Mann-Whitney U Test

# A tibble: 1 × 6
  .y.        n statistic    df     p method        
* <chr>  <int>     <dbl> <int> <dbl> <chr>         
1 Scores   200     0.442     1 0.506 Kruskal-Wallis

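Based on the p-value (0.506 > 0.05), there is no significant difference between topic presentation and language proficiency. Note that the output above was produced by a Kruskal-Wallis test; with only two groups, it leads to the same conclusion as the Mann-Whitney U test.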
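
The output under this heading is labelled Kruskal-Wallis, so it may help to see that, for two groups, the Kruskal-Wallis test and the Mann-Whitney U test (normal approximation, no continuity correction) give the same p-value. The sketch below uses simulated, hypothetical scores; the names `scores` and `variable` are illustrative only.

```r
# Hypothetical data: 100 simulated scores for each of the two variables.
set.seed(5)
scores   <- round(runif(200, min = 2, max = 4), 1)
variable <- factor(rep(c("Topic Presentation", "Language Proficiency"), each = 100))

# Kruskal-Wallis with two groups has 1 degree of freedom.
kw <- kruskal.test(scores ~ variable)

# Mann-Whitney U via wilcox.test(), normal approximation without continuity correction.
mw <- wilcox.test(scores ~ variable, exact = FALSE, correct = FALSE)

# The two p-values coincide up to floating-point error.
print(c(kruskal = kw$p.value, mann_whitney = mw$p.value))
```

With the default `correct = TRUE`, `wilcox.test()` applies a continuity correction and its p-value differs slightly from the Kruskal-Wallis one.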

4. Which has the most significant impact?

Based on the outputs above, we can say that it is the topic presentation.

MYKA POMOY GROUP STATISTICAL ANALYSIS

Kyle Kenneth Ruaya

2024-02-16
