The tables above provide the distribution of respondents by age, sex, and strand. There are 26 respondents aged 15-16, 72 aged 17-18, and 2 aged 19-20. Furthermore, 64 respondents are female and 36 are male, and each of the four strands contributes 25 respondents.
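A minimal sketch of how such frequency tables can be produced in base R. The counts are taken from the text; the data frame `respondents` and its column names are hypothetical, and the columns are filled independently, so only the marginal counts are meaningful.

```r
# Hypothetical reconstruction: only the marginal counts come from the report.
age_group <- rep(c("15-16", "17-18", "19-20"), times = c(26, 72, 2))
sex       <- rep(c("Female", "Male"), times = c(64, 36))
strand    <- rep(c("STEM", "ABM", "HUMSS", "GAS"), each = 25)

respondents <- data.frame(age_group, sex, strand)

# Frequency table for each demographic variable
age_counts    <- table(respondents$age_group)
sex_counts    <- table(respondents$sex)
strand_counts <- table(respondents$strand)

print(age_counts)
print(sex_counts)
print(strand_counts)

# With 100 respondents, the percentages equal the counts
print(prop.table(sex_counts) * 100)
```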
Call:
lm(formula = `Academic Performance` ~ `Level of Retention and Attention`,
data = Data)
Coefficients:
(Intercept) `Level of Retention and Attention`
1.390 0.556
From this, we may deduce that the data fail to satisfy two assumptions: linearity and homogeneity of variance.
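For reference, the coefficients in the `lm()` output define the fitted line: academic performance = 1.390 + 0.556 × level of retention and attention. The sketch below evaluates that line for one illustrative input; the helper name `predict_performance` is hypothetical, and because the model assumptions are not satisfied, such predictions should be read with caution.

```r
# Coefficients copied from the lm() output in the report
intercept <- 1.390
slope     <- 0.556

# Hypothetical helper: fitted academic performance for a given
# level of retention and attention
predict_performance <- function(retention) {
  intercept + slope * retention
}

# Example: a retention score of 3 gives 1.390 + 0.556 * 3 = 3.058
fitted_at_3 <- predict_performance(3)
print(fitted_at_3)
```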
The mean academic performance for female and male respondents is 2.969 and 3.033, respectively.
The graph above plots the data by sex (male and female).
The graph suggests little difference in academic performance when respondents are grouped by sex. However, we still need to test whether this difference is statistically significant.
The histogram above does not resemble a bell curve, which suggests that the residuals are not normally distributed. The points in the Q-Q plot roughly follow the straight line, with the majority of them falling within the confidence bands. However, this alone does not guarantee that the residuals are normal, since the histogram on the left suggests the opposite. It is therefore better to examine the two plots together.
Shapiro-Wilk normality test
data: res_aov$residuals
W = 0.92096, p-value = 1.583e-05
The Shapiro-Wilk p-value = 1.583e-05 on the residuals is less than the usual significance level of 0.05. Thus, we reject the hypothesis that residuals have a normal distribution.
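The residual check above can be sketched as follows. The data are simulated stand-ins (the report's `Data` object is not available here), so the printed statistic will differ from the one reported; only the workflow of fitting the model, extracting the residuals, and testing them is illustrated.

```r
set.seed(1)

# Hypothetical stand-in data: 100 scores on a 0.2-step scale, 64 female and 36 male
scores <- round(pmin(4, pmax(1, rnorm(100, mean = 3, sd = 0.45))) * 5) / 5
sex    <- factor(rep(c("Female", "Male"), times = c(64, 36)))

# One-way model of score by sex, as suggested by the name res_aov in the output
res_aov <- aov(scores ~ sex)

# Shapiro-Wilk test on the model residuals
sw <- shapiro.test(res_aov$residuals)
print(sw)

# Decision rule used in the report: reject normality when p < 0.05
reject_normality <- sw$p.value < 0.05
print(reject_normality)
```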
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 3.3205 0.07147 .
98
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.
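To make the test concrete: Levene's test with `center = median` is a one-way ANOVA on the absolute deviations of each observation from its group median. The base-R sketch below reproduces that computation on simulated stand-in data (the variable names are hypothetical), so the F value will not match the report, although the degrees of freedom (1 and 98) will.

```r
set.seed(3)

# Hypothetical stand-in data with the same group sizes as the report
score <- sample(seq(1.6, 4, by = 0.2), 100, replace = TRUE)
group <- factor(rep(c("Female", "Male"), times = c(64, 36)))

# Absolute deviation of each score from its own group's median
abs_dev <- abs(score - ave(score, group, FUN = median))

# One-way ANOVA on those deviations gives the Levene (median-centred) F test
levene_tab <- anova(lm(abs_dev ~ group))
print(levene_tab)

# Decision rule used in the report: variances are treated as homogeneous when p > 0.05
levene_p <- levene_tab[["Pr(>F)"]][1]
variances_homogeneous <- levene_p > 0.05
print(variances_homogeneous)
```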
Wilcoxon rank sum test
data: a and b
W = 1244, p-value = 0.3517
alternative hypothesis: true location shift is not equal to 0
Since the p-value (0.3517) is larger than 0.05, we fail to reject the null hypothesis; that is, there is no significant difference in academic performance when respondents are grouped by sex.
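A minimal sketch of the Wilcoxon rank sum test and the decision rule applied above. The vectors `a` and `b` are simulated here; the report does not state which sex each one holds, so the assignment of 64 and 36 observations is an assumption. Under the null hypothesis the expected value of W is n1 × n2 / 2 = 64 × 36 / 2 = 1152.

```r
set.seed(2)

# Hypothetical stand-in scores for the two groups
a <- sample(seq(1.6, 4, by = 0.2), 64, replace = TRUE)
b <- sample(seq(1.6, 4, by = 0.2), 36, replace = TRUE)

# exact = FALSE requests the normal approximation, which is what R uses
# anyway for samples this large or when ties are present
wt <- wilcox.test(a, b, exact = FALSE)
print(wt)

# W counts the (a, b) pairs in which a exceeds b (ties count one half),
# so it ranges from 0 to length(a) * length(b)
expected_w <- length(a) * length(b) / 2
print(expected_w)

decision <- if (wt$p.value < 0.05) "reject H0" else "fail to reject H0"
print(decision)
```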
The mean level of retention and attention for female and male respondents is 2.888 and 2.872, respectively.
The graph above plots the data by sex (male and female).
The graph suggests a slight difference in the level of retention and attention when respondents are grouped by sex. However, we still need to test whether this difference is statistically significant.
The histogram above does not resemble a bell curve, which suggests that the residuals are not normally distributed. Moreover, although the points in the Q-Q plot roughly follow the straight line, the majority of them fall outside the confidence bands.
Shapiro-Wilk normality test
data: res_aov$residuals
W = 0.92064, p-value = 1.523e-05
The Shapiro-Wilk p-value = 1.523e-05 on the residuals is less than the usual significance level of 0.05. Thus, we reject the hypothesis that residuals have a normal distribution.
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 2.42 0.123
98
The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.
Wilcoxon rank sum test
data: c and d
W = 1102.5, p-value = 0.8146
alternative hypothesis: true location shift is not equal to 0
Since the p-value (0.8146) is larger than 0.05, we fail to reject the null hypothesis; that is, there is no significant difference in the level of retention and attention when respondents are grouped by sex.
Shapiro-Wilk normality test
data: Data$`Academic Performance`
W = 0.91194, p-value = 5.39e-06
Since the p-value (5.39e-06) is less than 0.05, we reject the null hypothesis of normality. That is, we cannot assume that academic performance is normally distributed.
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 3 1.1746 0.3236
96
The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.
# A tibble: 4 × 11
  Strand variable             n   min   max median   iqr  mean    sd    se    ci
  <fct>  <fct>            <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 STEM   Academic Perfor…    25   1.8   4        3   0.6  3.11 0.483 0.097 0.2
2 ABM    Academic Perfor…    25   1.6   3.4      3   0    2.90 0.379 0.076 0.156
3 HUMSS  Academic Perfor…    25   1.8   4        3   0.4  2.98 0.437 0.087 0.18
4 GAS    Academic Perfor…    25   1.8   3.8      3   0.4  2.98 0.5   0.1   0.206
The mean academic performance of the STEM, ABM, HUMSS, and GAS strands is 3.112, 2.896, 2.976, and 2.984, respectively.
# A tibble: 1 × 6
.y. n statistic df p method
* <chr> <int> <dbl> <int> <dbl> <chr>
1 Academic Performance 100 2.61 3 0.456 Kruskal-Wallis
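Since the Kruskal-Wallis p-value (0.456) is greater than 0.05, there is no significant difference in academic performance among the four strands. The pairwise comparisons that follow are likewise all non-significant after adjustment.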
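The same kind of test can be sketched with base R's `kruskal.test()`, used here in place of the tidy-output function that produced the tibble above. The scores are simulated stand-ins, so the statistic and p-value will differ from the report; the degrees of freedom equal the number of groups minus one, which is 3 for four strands.

```r
set.seed(4)

# Hypothetical stand-in data: 25 respondents in each of the four strands
strand <- factor(rep(c("STEM", "ABM", "HUMSS", "GAS"), each = 25),
                 levels = c("STEM", "ABM", "HUMSS", "GAS"))
score  <- sample(seq(1.6, 4, by = 0.2), 100, replace = TRUE)

# Kruskal-Wallis rank sum test of score across strands
kw <- kruskal.test(score ~ strand)
print(kw)

# Degrees of freedom: 4 groups - 1
kw_df <- unname(kw$parameter)
print(kw_df)

decision <- if (kw$p.value < 0.05) "reject H0" else "fail to reject H0"
print(decision)
```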
# A tibble: 6 × 9
  .y.               group1 group2    n1    n2 statistic     p p.adj p.adj.signif
* <chr>             <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <chr>
1 Academic Perform… STEM   ABM       25    25    -1.54  0.124 0.742 ns
2 Academic Perform… STEM   HUMSS     25    25    -0.684 0.494 1     ns
3 Academic Perform… STEM   GAS       25    25    -0.351 0.726 1     ns
4 Academic Perform… ABM    HUMSS     25    25     0.856 0.392 1     ns
5 Academic Perform… ABM    GAS       25    25     1.19  0.234 1     ns
6 Academic Perform… HUMSS  GAS       25    25     0.333 0.739 1     ns
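The `p.adj` column can be approximately reproduced from the `p` column. The report does not state which adjustment method was used; the sketch below assumes either Bonferroni or Holm, which give identical results for these six p-values. The small gap between the computed 0.744 and the reported 0.742 is presumably due to the displayed p-values being rounded to three decimals.

```r
# Unadjusted p-values as displayed (rounded) in the pairwise table above
p_raw <- c(0.124, 0.494, 0.726, 0.392, 0.234, 0.739)

# Bonferroni: multiply each p-value by the number of comparisons (6), cap at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Holm: step-down version; for these values it yields the same adjusted p-values
p_holm <- p.adjust(p_raw, method = "holm")

print(p_bonf)  # 0.744 1.000 1.000 1.000 1.000 1.000
print(p_holm)

# None of the adjusted p-values falls below 0.05
any_significant <- any(p_bonf < 0.05)
print(any_significant)
```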
Shapiro-Wilk normality test
data: Data$`Level of Retention and Attention`
W = 0.91648, p-value = 9.2e-06
Since the p-value (9.2e-06) is less than 0.05, we reject the null hypothesis of normality. That is, we cannot assume that the level of retention and attention is normally distributed.
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 3 0.5041 0.6804
96
The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.
# A tibble: 4 × 11
  Strand variable             n   min   max median   iqr  mean    sd    se    ci
  <fct>  <fct>            <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 STEM   Level of Retent…    25   2.2   3.6      3   0.6  2.99 0.398 0.08  0.164
2 ABM    Level of Retent…    25   2     3.4      3   0.4  2.83 0.34  0.068 0.14
3 HUMSS  Level of Retent…    25   1     3.6      3   0.2  2.83 0.562 0.112 0.232
4 GAS    Level of Retent…    25   1.8   3.6      3   0.4  2.87 0.461 0.092 0.19
The mean level of retention and attention of the STEM, ABM, HUMSS, and GAS strands is 2.992, 2.832, 2.832, and 2.872, respectively.
# A tibble: 1 × 6
.y. n statistic df p method
* <chr> <int> <dbl> <int> <dbl> <chr>
1 Level of Retention and Attention 100 2.08 3 0.555 Kruskal-Wallis
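Since the Kruskal-Wallis p-value (0.555) is greater than 0.05, there is no significant difference in the level of retention and attention among the four strands. The pairwise comparisons that follow are likewise all non-significant after adjustment.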
# A tibble: 6 × 9
  .y.               group1 group2    n1    n2 statistic     p p.adj p.adj.signif
* <chr>             <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <chr>
1 Level of Retenti… STEM   ABM       25    25    -1.40  0.162 0.971 ns
2 Level of Retenti… STEM   HUMSS     25    25    -0.878 0.380 1     ns
3 Level of Retenti… STEM   GAS       25    25    -0.995 0.320 1     ns
4 Level of Retenti… ABM    HUMSS     25    25     0.521 0.602 1     ns
5 Level of Retenti… ABM    GAS       25    25     0.404 0.686 1     ns
6 Level of Retenti… HUMSS  GAS       25    25    -0.117 0.907 1     ns
Shapiro-Wilk normality test
data: Data1$Scores
W = 0.9308, p-value = 3.911e-08
Since the p-value (3.911e-08) is less than 0.05, we reject the null hypothesis of normality. That is, we cannot assume that the combined scores are normally distributed.
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 0.1413 0.7074
198
The p-value (0.7074) is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.
# A tibble: 2 × 11
  Variables      variable     n   min   max median   iqr  mean    sd    se    ci
  <fct>          <fct>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Academic Perf… Scores     100   1.6   4        3   0.4  2.99 0.452 0.045 0.09
2 Level of Rete… Scores     100   1     3.6      3   0.4  2.88 0.446 0.045 0.088
The mean of academic performance and level of retention and attention is 2.992 and 2.882, respectively.
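These overall means can be cross-checked against the sex-specific means reported earlier: with 64 female and 36 male respondents, the weighted average of the two group means should reproduce each overall mean. A short sketch of that arithmetic (the variable names are illustrative):

```r
# Group sizes and group means as reported earlier in the document
n_by_sex <- c(female = 64, male = 36)
academic_means  <- c(female = 2.969, male = 3.033)
retention_means <- c(female = 2.888, male = 2.872)

# Weighted average of the group means, weighted by group size
overall_academic  <- weighted.mean(academic_means, n_by_sex)
overall_retention <- weighted.mean(retention_means, n_by_sex)

print(round(overall_academic, 3))   # 2.992
print(round(overall_retention, 3))  # 2.882
```

Both values agree with the means in the summary table, which supports reading the first sex comparison as academic performance and the second as level of retention and attention.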
# A tibble: 1 × 6
.y. n statistic df p method
* <chr> <int> <dbl> <int> <dbl> <chr>
1 Scores 200 4.41 1 0.0358 Kruskal-Wallis
Since the p-value (0.0358) is less than 0.05, there is a significant difference between the academic performance scores and the level of retention and attention scores.
Based on the outputs above, we can say that academic performance has the higher mean score (2.992 compared with 2.882).