Data

1. What is the demographic profile of the respondents in terms of:


Age

Sex

Strand

The tables above provide the distributions of respondents in terms of age, sex, and strand. Of the 100 respondents, 26 are aged 15-16, 72 are aged 17-18, and 2 are aged 19-20. Furthermore, 64 respondents are female and 36 are male, with 25 respondents coming from each of the four strands.
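The counts reported above can be tabulated with base R. The following is a minimal sketch, assuming a data frame with columns named Age, Sex, and Strand (hypothetical names). The rows are rebuilt from the reported totals rather than taken from the actual data set, so the pairing of values across columns is arbitrary.

```r
# Hypothetical reconstruction from the reported totals; not the original data.
respondents <- data.frame(
  Age    = rep(c("15-16", "17-18", "19-20"), times = c(26, 72, 2)),
  Sex    = rep(c("Female", "Male"), times = c(64, 36)),
  Strand = rep(c("STEM", "ABM", "HUMSS", "GAS"), each = 25)
)

# Frequency table for each demographic variable
age_counts    <- table(respondents$Age)
sex_counts    <- table(respondents$Sex)
strand_counts <- table(respondents$Strand)

print(age_counts)
print(sex_counts)
print(strand_counts)

# Percentage share of each sex
sex_percent <- round(100 * prop.table(sex_counts), 1)
print(sex_percent)
```

Because the sample size is exactly 100, each count is also the percentage for its category.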

2. Is there a significant difference in academic performance and level of retention and attention when grouped according to:

2.1 Sex


Call:
lm(formula = `Academic Performance` ~ `Level of Retention and Attention`, 
    data = Data)

Coefficients:
                       (Intercept)  `Level of Retention and Attention`  
                             1.390                               0.556  

From this, we may deduce that the data fail to satisfy the two assumptions – Linearity and Homogeneity of Variance.
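The two coefficients in the output above define a fitted line that can be evaluated for any score. The sketch below uses only the reported intercept and slope; the helper name `predict_performance` and the input value 3 are hypothetical and chosen for illustration. Since the assumptions are reported as not satisfied, such predictions should be read with caution.

```r
# Coefficients copied from the lm() output above
intercept <- 1.390
slope     <- 0.556

# Hypothetical helper: predicted academic performance for a given
# level of retention and attention
predict_performance <- function(retention_attention) {
  intercept + slope * retention_attention
}

predicted <- predict_performance(3)
print(predicted)
```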

2.1.1 Sex and Academic Performance

The mean academic performance of female and male respondents is 2.969 and 3.033, respectively.

The graph above plots the data by sex (male and female).

The plot shows only a small difference in academic performance between the sexes. However, we still need to check whether this difference is statistically significant.

The histogram above does not resemble a bell curve, which suggests that the residuals are not normally distributed. The points in the QQ-plot roughly follow the straight line, with the majority of them falling within the confidence bands. However, this does not guarantee that the residuals follow a normal distribution, since the histogram on the left suggests the opposite. Thus, the two plots should be read together and checked against a formal test.

Normality Test


    Shapiro-Wilk normality test

data:  res_aov$residuals
W = 0.92096, p-value = 1.583e-05

The Shapiro-Wilk p-value = 1.583e-05 on the residuals is less than the usual significance level of 0.05. Thus, we reject the hypothesis that residuals have a normal distribution.
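The output above refers to `res_aov$residuals`, which suggests that the test was run on the residuals of a fitted ANOVA model. The model formula is not shown, so the sketch below assumes `Score ~ Sex` and uses simulated stand-in data, not the study's data. The column names `Sex` and `Score` are hypothetical.

```r
set.seed(1)

# Simulated stand-in data (NOT the study's data); group sizes follow
# the reported 64 female and 36 male respondents.
sim <- data.frame(
  Sex   = factor(rep(c("Female", "Male"), times = c(64, 36))),
  Score = round(runif(100, min = 1.6, max = 4), 1)
)

# Fit the one-way model (assumed formula) and test its residuals
res_aov   <- aov(Score ~ Sex, data = sim)
normality <- shapiro.test(res_aov$residuals)
print(normality)

# Decision rule used in the text: reject normality when p < 0.05
reject_normality <- normality$p.value < 0.05
print(reject_normality)
```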

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
      Df F value  Pr(>F)  
group  1  3.3205 0.07147 .
      98                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.
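To make the test above less of a black box, the median-centred Levene statistic can be computed by hand as a one-way ANOVA on the absolute deviations from each group median. This is a sketch in base R on simulated stand-in data, with the hypothetical helper name `levene_median`; it is not the code used to produce the output. The last lines recompute the reported p-value from the reported F statistic and degrees of freedom.

```r
# Hypothetical helper: Levene's test with center = median, in base R
levene_median <- function(y, group) {
  group <- factor(group)                               # grouping must be a factor
  deviations <- abs(y - ave(y, group, FUN = median))   # distance from group median
  fit <- anova(lm(deviations ~ group))                 # one-way ANOVA on deviations
  list(df1 = fit[["Df"]][1], df2 = fit[["Df"]][2],
       F = fit[["F value"]][1], p = fit[["Pr(>F)"]][1])
}

set.seed(2)
# Simulated stand-in data (NOT the study's data)
sex   <- rep(c("Female", "Male"), times = c(64, 36))
score <- round(runif(100, min = 1.6, max = 4), 1)

result <- levene_median(score, sex)
print(result)

# Cross-check of the reported p-value: F = 3.3205 on 1 and 98 df
reported_p <- pf(3.3205, df1 = 1, df2 = 98, lower.tail = FALSE)
print(round(reported_p, 4))
```

With two groups and 100 observations, the degrees of freedom are 1 and 98, as in the output above.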

Wilcoxon Rank Sum Test


    Wilcoxon rank sum test

data:  a and b
W = 1244, p-value = 0.3517
alternative hypothesis: true location shift is not equal to 0

Since the p-value (0.3517) is larger than 0.05, we fail to reject the null hypothesis. That is, there is no significant difference in academic performance when grouped according to sex.
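A call of the following shape would produce output like the one above. This is a sketch on simulated stand-in data. The vectors `a` and `b` take their names from the output, but which sex each one holds is not stated, and the arguments `exact = FALSE` and `correct = FALSE` are assumptions.

```r
set.seed(3)

# Simulated stand-in scores (NOT the study's data); sizes follow the
# reported 64 female and 36 male respondents. The assignment of a and b
# to the two sexes is an assumption.
a <- round(runif(64, min = 1.6, max = 4), 1)
b <- round(runif(36, min = 1.6, max = 4), 1)

# Normal approximation without continuity correction (assumed settings)
test <- wilcox.test(a, b, exact = FALSE, correct = FALSE)
print(test)

# Decision rule used in the text
significant <- test$p.value < 0.05
print(significant)
```

The statistic W counts the pairs in which a value from `a` exceeds a value from `b` (ties counted as one half), so it always lies between 0 and the product of the two group sizes.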

2.1.2 Sex and Level of Retention and Attention

The mean level of retention and attention of female and male respondents is 2.888 and 2.872, respectively.

The graph above plots the data by sex (male and female).

The plot shows a difference in the level of retention and attention when grouped according to sex. However, we still need to check whether this difference is statistically significant.

The histogram above does not resemble a bell curve, which suggests that the residuals are not normally distributed. Moreover, although the points in the QQ-plot roughly follow the straight line, the majority of them fall outside the confidence bands.

Normality Test


    Shapiro-Wilk normality test

data:  res_aov$residuals
W = 0.92064, p-value = 1.523e-05

The Shapiro-Wilk p-value = 1.523e-05 on the residuals is less than the usual significance level of 0.05. Thus, we reject the hypothesis that residuals have a normal distribution.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1    2.42  0.123
      98               

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.

Wilcoxon Rank Sum Test


    Wilcoxon rank sum test

data:  c and d
W = 1102.5, p-value = 0.8146
alternative hypothesis: true location shift is not equal to 0

Since the p-value (0.8146) is larger than 0.05, we fail to reject the null hypothesis. That is, there is no significant difference in the level of retention and attention when grouped according to sex.

2.2 Strand

2.2.1 Strand and Academic Performance

Normality Test


    Shapiro-Wilk normality test

data:  Data$`Academic Performance`
W = 0.91194, p-value = 5.39e-06

Since the p-value (5.39e-06) is less than 0.05, we reject the null hypothesis of normality. That is, we cannot assume that academic performance is normally distributed.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3  1.1746 0.3236
      96               

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.


# A tibble: 4 × 11
  Strand variable             n   min   max median   iqr  mean    sd    se    ci
  <fct>  <fct>            <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 STEM   Academic Perfor…    25   1.8   4        3   0.6  3.11 0.483 0.097 0.2  
2 ABM    Academic Perfor…    25   1.6   3.4      3   0    2.90 0.379 0.076 0.156
3 HUMSS  Academic Perfor…    25   1.8   4        3   0.4  2.98 0.437 0.087 0.18 
4 GAS    Academic Perfor…    25   1.8   3.8      3   0.4  2.98 0.5   0.1   0.206

The mean academic performance of STEM, ABM, HUMSS, and GAS is 3.112, 2.896, 2.976, and 2.984, respectively.
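The se and ci columns in the table above can be recovered from sd and n. The sketch below assumes that se is the standard error of the mean and that ci is the half-width of a 95% t-based confidence interval; the tabulated values appear consistent with this reading. The inputs are the rounded sd values from the table, so the results can differ from the table in the third decimal place.

```r
# Standard deviations copied from the summary table (rounded to 3 decimals)
strand_sd <- c(STEM = 0.483, ABM = 0.379, HUMSS = 0.437, GAS = 0.5)
n <- 25  # respondents per strand

# Standard error of the mean
se <- strand_sd / sqrt(n)

# Half-width of the 95% confidence interval (assumed definition of ci)
ci <- qt(0.975, df = n - 1) * se

print(round(se, 3))
print(round(ci, 3))
```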

Kruskal-Wallis Test

# A tibble: 1 × 6
  .y.                      n statistic    df     p method        
* <chr>                <int>     <dbl> <int> <dbl> <chr>         
1 Academic Performance   100      2.61     3 0.456 Kruskal-Wallis

Based on the p-value (0.456 > 0.05), no significant difference in academic performance was observed among the four strands.
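The p-value in the table above follows from comparing the Kruskal-Wallis statistic with a chi-square distribution whose degrees of freedom equal the number of groups minus one. A short check using only the reported values:

```r
statistic <- 2.61   # reported Kruskal-Wallis statistic
groups    <- 4      # STEM, ABM, HUMSS, GAS
df        <- groups - 1

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)
print(round(p_value, 3))

# Decision rule used in the text
significant <- p_value < 0.05
print(significant)
```

Because the reported statistic is rounded to two decimals, the recomputed p-value can differ from the table in the third decimal place.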

Pairwise Comparisons

# A tibble: 6 × 9
  .y.               group1 group2    n1    n2 statistic     p p.adj p.adj.signif
* <chr>             <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <chr>       
1 Academic Perform… STEM   ABM       25    25    -1.54  0.124 0.742 ns          
2 Academic Perform… STEM   HUMSS     25    25    -0.684 0.494 1     ns          
3 Academic Perform… STEM   GAS       25    25    -0.351 0.726 1     ns          
4 Academic Perform… ABM    HUMSS     25    25     0.856 0.392 1     ns          
5 Academic Perform… ABM    GAS       25    25     1.19  0.234 1     ns          
6 Academic Perform… HUMSS  GAS       25    25     0.333 0.739 1     ns          
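The p.adj column above is the raw p-value corrected for the six pairwise comparisons. The adjustment method is not stated in the output. As the sketch below shows, the Bonferroni and Holm corrections give identical results for these particular values, so either is consistent with the table. The inputs are the rounded p-values, which is why the first adjusted value differs slightly from the tabulated 0.742.

```r
# Raw p-values copied from the pairwise comparison table (rounded)
p_raw <- c(0.124, 0.494, 0.726, 0.392, 0.234, 0.739)

# Two common corrections for multiple comparisons
p_bonferroni <- p.adjust(p_raw, method = "bonferroni")
p_holm       <- p.adjust(p_raw, method = "holm")

print(round(p_bonferroni, 3))
print(round(p_holm, 3))

# Both methods agree for this set of p-values
same_result <- isTRUE(all.equal(p_bonferroni, p_holm))
print(same_result)
```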

2.2.2 Strand and Level of Retention and Attention

Normality Test


    Shapiro-Wilk normality test

data:  Data$`Level of Retention and Attention`
W = 0.91648, p-value = 9.2e-06

Since the p-value (9.2e-06) is less than 0.05, we reject the null hypothesis of normality. That is, we cannot assume that the level of retention and attention is normally distributed.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3  0.5041 0.6804
      96               

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.

# A tibble: 4 × 11
  Strand variable             n   min   max median   iqr  mean    sd    se    ci
  <fct>  <fct>            <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 STEM   Level of Retent…    25   2.2   3.6      3   0.6  2.99 0.398 0.08  0.164
2 ABM    Level of Retent…    25   2     3.4      3   0.4  2.83 0.34  0.068 0.14 
3 HUMSS  Level of Retent…    25   1     3.6      3   0.2  2.83 0.562 0.112 0.232
4 GAS    Level of Retent…    25   1.8   3.6      3   0.4  2.87 0.461 0.092 0.19 

The mean level of retention and attention of STEM, ABM, HUMSS, and GAS is 2.992, 2.832, 2.832, and 2.872, respectively.

Kruskal-Wallis Test

# A tibble: 1 × 6
  .y.                                  n statistic    df     p method        
* <chr>                            <int>     <dbl> <int> <dbl> <chr>         
1 Level of Retention and Attention   100      2.08     3 0.555 Kruskal-Wallis

Based on the p-value (0.555 > 0.05), no significant difference in the level of retention and attention was observed among the four strands.
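A test of this kind can also be run with the base R function `kruskal.test`. The sketch below uses simulated stand-in data with four strands of 25 respondents each, not the study's data. The tibble output above suggests a wrapper from another package was used, and the column names `Strand` and `Score` are hypothetical.

```r
set.seed(4)

# Simulated stand-in data (NOT the study's data)
sim <- data.frame(
  Strand = factor(rep(c("STEM", "ABM", "HUMSS", "GAS"), each = 25)),
  Score  = round(runif(100, min = 1, max = 3.6), 1)
)

# Rank-based comparison of the four strands
kw <- kruskal.test(Score ~ Strand, data = sim)
print(kw)

# Degrees of freedom: number of groups minus one
kw_df <- unname(kw$parameter)
print(kw_df)
```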

Pairwise Comparisons

# A tibble: 6 × 9
  .y.               group1 group2    n1    n2 statistic     p p.adj p.adj.signif
* <chr>             <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <chr>       
1 Level of Retenti… STEM   ABM       25    25    -1.40  0.162 0.971 ns          
2 Level of Retenti… STEM   HUMSS     25    25    -0.878 0.380 1     ns          
3 Level of Retenti… STEM   GAS       25    25    -0.995 0.320 1     ns          
4 Level of Retenti… ABM    HUMSS     25    25     0.521 0.602 1     ns          
5 Level of Retenti… ABM    GAS       25    25     0.404 0.686 1     ns          
6 Level of Retenti… HUMSS  GAS       25    25    -0.117 0.907 1     ns          

3. Is there a significant difference between academic performance and level of retention and attention?

Normality Test


    Shapiro-Wilk normality test

data:  Data1$Scores
W = 0.9308, p-value = 3.911e-08

Since the p-value (3.911e-08) is less than 0.05, we reject the null hypothesis of normality. That is, we cannot assume that the scores are normally distributed.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   1  0.1413 0.7074
      198               

The p-value (0.7074) is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.

# A tibble: 2 × 11
  Variables      variable     n   min   max median   iqr  mean    sd    se    ci
  <fct>          <fct>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Academic Perf… Scores     100   1.6   4        3   0.4  2.99 0.452 0.045 0.09 
2 Level of Rete… Scores     100   1     3.6      3   0.4  2.88 0.446 0.045 0.088

The mean of academic performance and level of retention and attention is 2.992 and 2.882, respectively.

Mann-Whitney U Test (computed as a two-group Kruskal-Wallis test)

# A tibble: 1 × 6
  .y.        n statistic    df      p method        
* <chr>  <int>     <dbl> <int>  <dbl> <chr>         
1 Scores   200      4.41     1 0.0358 Kruskal-Wallis

Based on the p-value (0.0358 < 0.05), there is a significant difference between academic performance and level of retention and attention.
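The heading of this test names the Mann-Whitney U test, while the method column of the output reads Kruskal-Wallis. With exactly two groups the two tests give the same p-value, provided the Mann-Whitney test uses the normal approximation without continuity correction. The sketch below first recomputes the reported p-value from the reported statistic, then illustrates the equivalence on simulated stand-in data, not the study's data.

```r
# Reported statistic 4.41 on 1 degree of freedom
reported_p <- pchisq(4.41, df = 1, lower.tail = FALSE)
print(round(reported_p, 4))

set.seed(5)
# Simulated stand-in scores for the two variables (NOT the study's data)
x <- round(runif(100, min = 1, max = 4), 1)
y <- round(runif(100, min = 1, max = 4), 1)

scores   <- c(x, y)
variable <- factor(rep(c("Academic Performance",
                         "Level of Retention and Attention"), each = 100))

# Two-group Kruskal-Wallis versus Mann-Whitney (normal approximation,
# no continuity correction)
kw_p <- kruskal.test(scores, variable)$p.value
mw_p <- wilcox.test(x, y, exact = FALSE, correct = FALSE)$p.value
print(c(kruskal_wallis = kw_p, mann_whitney = mw_p))
```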

4. On which of the two variables does the Copernican plan schedule have the more significant impact?

Based on the outputs above, we can say that it is academic performance, which has the higher mean score (2.992 versus 2.882).