The tables above provide the distributions of respondents in terms of sex, year level, and strand. Of the 100 respondents, 65 are female and 35 are male; 14 are from ABM, 22 from GAS, 42 from HUMSS, and 22 from STEM. Moreover, 12 said that they always read, 47 sometimes, 28 often, and 13 never.
Call:
lm(formula = `Reading Habit` ~ `Writing Practice` + `Language Exposure`,
data = Data)
Coefficients:
(Intercept) `Writing Practice` `Language Exposure`
0.3807 0.4226 0.4489
From this, we may deduce that the data fail to satisfy two assumptions: linearity and homogeneity of variance.
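The coefficients reported above define a fitted equation, Reading Habit = 0.3807 + 0.4226 × Writing Practice + 0.4489 × Language Exposure. The sketch below only evaluates that equation; the function name `predict_reading` and the input scores of 3 are hypothetical and chosen for illustration.

```r
# Coefficients copied from the lm() output above.
coefs <- c(intercept = 0.3807, writing = 0.4226, exposure = 0.4489)

# Hypothetical helper: evaluates the fitted equation for given predictor scores.
predict_reading <- function(writing, exposure) {
  unname(coefs["intercept"] +
           coefs["writing"] * writing +
           coefs["exposure"] * exposure)
}

# Illustrative input: a respondent scoring 3 on both predictors.
predicted <- predict_reading(writing = 3, exposure = 3)
print(predicted)
```

Since the linearity and homogeneity assumptions are not met, such predictions should be read with caution.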
The mean Reading Habit scores for male and female respondents are 2.954 and 3.114, respectively.
The graph above plots the data by sex (male and female).
The plot shows a difference in Reading Habit between the sexes. However, we still need to test whether this difference is statistically significant.
The histogram above does not resemble a bell curve, which suggests that the residuals are not normally distributed. Moreover, the points in the Q-Q plot do not follow the straight line, with the majority of them falling outside the confidence bands.
Shapiro-Wilk normality test
data: res_aov$residuals
W = 0.95858, p-value = 0.003191
The Shapiro-Wilk p-value = 0.003191 on the residuals is less than the usual significance level of 0.05. Thus, we reject the hypothesis that residuals have a normal distribution.
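A minimal sketch of this residual check follows. The output above names the object `res_aov`; that it came from `aov()` is an assumption, and the data frame `sim` is simulated, so the numbers it produces will not match the report.

```r
set.seed(1)

# Simulated stand-in for the survey data: 35 males, 65 females,
# scores on a 2-4 scale rounded to one decimal (all values hypothetical).
sim <- data.frame(
  sex = factor(rep(c("Male", "Female"), times = c(35, 65))),
  score = round(runif(100, min = 2, max = 4), 1)
)

# One-way model of score by sex, then Shapiro-Wilk on its residuals.
res_aov <- aov(score ~ sex, data = sim)
sw <- shapiro.test(residuals(res_aov))

# Reject normality of the residuals when the p-value is below 0.05.
reject_normality <- sw$p.value < 0.05
print(sw)
```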
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 4.2091 0.04288 *
98
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is less than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is not met.
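Levene's test with `center = median` is equivalent to a one-way ANOVA on the absolute deviations from each group's median. The sketch below shows this in base R on simulated data (the helper `levene_median` and the vectors `y` and `group` are hypothetical), then recomputes the reported p-value from the F value and degrees of freedom printed above.

```r
# Hypothetical helper: median-centered Levene test as ANOVA on absolute deviations.
levene_median <- function(y, group) {
  group <- factor(group)
  abs_dev <- abs(y - ave(y, group, FUN = median))
  anova(lm(abs_dev ~ group))
}

set.seed(2)
# Simulated data with the same group sizes as the sex split (35 and 65).
group <- rep(c("Male", "Female"), times = c(35, 65))
y <- rnorm(100, mean = 3, sd = ifelse(group == "Male", 0.5, 0.3))
tab <- levene_median(y, group)
print(tab)

# Recompute the p-value from the reported F = 4.2091 on 1 and 98 degrees of freedom.
p_reported <- pf(4.2091, df1 = 1, df2 = 98, lower.tail = FALSE)
print(p_reported)
```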
Wilcoxon rank sum test
data: a and b
W = 926.5, p-value = 0.09339
alternative hypothesis: true location shift is not equal to 0
Since the p-value (0.09339) is greater than 0.05, we fail to reject the null hypothesis; that is, there is no significant difference in reading habit between male and female respondents.
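To make the W statistic concrete, the sketch below runs the Wilcoxon rank sum test on two simulated samples and reproduces W by hand as the rank sum of the first sample minus n1(n1 + 1)/2. The vectors `a` and `b` reuse the names from the output above, but their contents and the group sizes of 35 and 65 are assumptions.

```r
set.seed(3)

# Simulated scores for two independent groups (values are hypothetical).
a <- rnorm(35, mean = 2.95, sd = 0.4)
b <- rnorm(65, mean = 3.11, sd = 0.4)

res <- wilcox.test(a, b)

# W is the sum of the ranks of `a` in the pooled sample, minus its minimum possible value.
ranks <- rank(c(a, b))
n1 <- length(a)
w_manual <- sum(ranks[seq_len(n1)]) - n1 * (n1 + 1) / 2

print(res)
print(w_manual)
```

Because the test works on ranks, it does not require normal residuals, which is why it is used here after the Shapiro-Wilk test rejects normality.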
The mean Writing Practice scores for male and female respondents are 3.046 and 3.132, respectively.
The graph above plots the data by sex (male and female).
The plot shows a difference in writing practice between the sexes. However, we still need to test whether this difference is statistically significant.
The histogram above does not resemble a bell curve, which suggests that the residuals are not normally distributed. Moreover, although the points in the Q-Q plot roughly follow the straight line, the majority of them fall outside the confidence bands.
Shapiro-Wilk normality test
data: res_aov$residuals
W = 0.96289, p-value = 0.0065
The Shapiro-Wilk p-value = 0.0065 on the residuals is less than the usual significance level of 0.05. Thus, we reject the hypothesis that residuals have a normal distribution.
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 5.8523 0.0174 *
98
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is less than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is not met.
Wilcoxon rank sum test
data: c and d
W = 962.5, p-value = 0.2894
alternative hypothesis: true location shift is not equal to 0
Since the p-value (0.2894) is greater than 0.05, we fail to reject the null hypothesis; that is, there is no significant difference in writing practice between male and female respondents.
The mean Language Exposure scores for male and female respondents are 3.000 and 3.068, respectively.
The graph above plots the data by sex (male and female).
The plot suggests little difference in language exposure between the sexes; whether the difference is statistically significant still has to be tested.
The histogram above does not resemble a bell curve, which suggests that the residuals are not normally distributed. However, the points in the Q-Q plot roughly follow the straight line, with the majority of them falling within the confidence bands. Because the two plots disagree, no conclusion can be drawn from them alone, so we turn to the formal test results that follow.
Shapiro-Wilk normality test
data: res_aov$residuals
W = 0.95868, p-value = 0.00324
The Shapiro-Wilk p-value = 0.00324 on the residuals is less than the usual significance level of 0.05. Thus, we reject the hypothesis that residuals have a normal distribution.
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 2.5243 0.1153
98
The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.
Wilcoxon rank sum test
data: e and f
W = 991, p-value = 0.3964
alternative hypothesis: true location shift is not equal to 0
Since the p-value (0.3964) is greater than 0.05, we fail to reject the null hypothesis; that is, there is no significant difference in language exposure between male and female respondents.
Shapiro-Wilk normality test
data: Data$`Reading Habit`
W = 0.94046, p-value = 0.0002056
Since the p-value (0.0002056) is less than 0.05, we reject the null hypothesis of normality. That is, we cannot assume that Reading Habit is normally distributed.
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 3 2.0878 0.1069
96
The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.
# A tibble: 4 × 11
`Reading Frequency` variable n min max median iqr mean sd se
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Always Reading … 12 2.8 4 3.1 0.25 3.23 0.389 0.112
2 Sometimes Reading … 47 2 4 3 0.3 2.99 0.391 0.057
3 Often Reading … 28 2.2 3.8 3.2 0.4 3.14 0.388 0.073
4 Never Reading … 13 2 4 3 1 2.95 0.639 0.177
# ℹ 1 more variable: ci <dbl>
The mean Reading Habit scores for the always, sometimes, often, and never groups are 3.233, 2.991, 3.143, and 2.954, respectively.
# A tibble: 1 × 6
.y. n statistic df p method
* <chr> <int> <dbl> <int> <dbl> <chr>
1 Reading Habit 100 5.53 3 0.137 Kruskal-Wallis
Based on the p-value (0.137 > 0.05), no significant difference in Reading Habit was observed among the reading-frequency groups.
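For readers unfamiliar with the Kruskal-Wallis statistic, the sketch below computes it on simulated data with base R's `kruskal.test()` and checks it against the textbook formula H = 12 / (N(N + 1)) × Σ(R_i² / n_i) − 3(N + 1). The group sizes come from the table above; the scores are simulated and hypothetical, and the report itself appears to use a different wrapper for the same test.

```r
set.seed(4)

# Group sizes from the reading-frequency table; scores are simulated.
sizes <- c(Always = 12, Sometimes = 47, Often = 28, Never = 13)
group <- factor(rep(names(sizes), times = sizes), levels = names(sizes))
score <- rnorm(sum(sizes), mean = 3, sd = 0.4)

kw <- kruskal.test(score ~ group)

# Manual H: rank all scores together, then sum the ranks within each group.
N <- length(score)
rank_sums <- tapply(rank(score), group, sum)
h_manual <- 12 / (N * (N + 1)) * sum(rank_sums^2 / sizes) - 3 * (N + 1)

print(kw)
print(h_manual)
```

With tied scores, as in real Likert-type data, `kruskal.test()` also applies a tie correction, so the manual value would differ slightly.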
Shapiro-Wilk normality test
data: Data$`Writing Practice`
W = 0.95226, p-value = 0.001172
Since the p-value (0.001172) is less than 0.05, we reject the null hypothesis of normality. That is, we cannot assume that Writing Practice is normally distributed.
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 3 2.9223 0.03787 *
96
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is less than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is not met.
# A tibble: 4 × 11
`Reading Frequency` variable n min max median iqr mean sd se
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Always Writing … 12 2 3.8 3 0.3 3.12 0.455 0.131
2 Sometimes Writing … 47 2.2 4 3 0.3 3.05 0.284 0.041
3 Often Writing … 28 2.4 4 3.2 0.6 3.26 0.418 0.079
4 Never Writing … 13 2.2 3.6 3 0.6 2.92 0.413 0.114
# ℹ 1 more variable: ci <dbl>
The mean Writing Practice scores for the always, sometimes, often, and never groups are 3.117, 3.051, 3.264, and 2.923, respectively.
# A tibble: 1 × 6
.y. n statistic df p method
* <chr> <int> <dbl> <int> <dbl> <chr>
1 Writing Practice 100 8.52 3 0.0363 Kruskal-Wallis
Based on the p-value (0.0363 < 0.05), a significant difference in Writing Practice was observed among the reading-frequency groups.
Shapiro-Wilk normality test
data: Data$`Language Exposure`
W = 0.94327, p-value = 0.0003071
Since the p-value (0.0003071) is less than 0.05, we reject the null hypothesis of normality. That is, we cannot assume that Language Exposure is normally distributed.
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 3 0.8257 0.4829
96
The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.
# A tibble: 4 × 11
`Reading Frequency` variable n min max median iqr mean sd se
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Always Language… 12 2.4 3.8 3.1 0.45 3.18 0.376 0.109
2 Sometimes Language… 47 2.4 3.8 3 0.2 2.94 0.312 0.046
3 Often Language… 28 2.6 3.8 3.2 0.4 3.19 0.355 0.067
4 Never Language… 13 2.2 3.6 3 0 2.98 0.341 0.095
# ℹ 1 more variable: ci <dbl>
The mean Language Exposure scores for the always, sometimes, often, and never groups are 3.183, 2.936, 3.193, and 2.985, respectively.
# A tibble: 1 × 6
.y. n statistic df p method
* <chr> <int> <dbl> <int> <dbl> <chr>
1 Language Exposure 100 13.2 3 0.0042 Kruskal-Wallis
Based on the p-value (0.0042 < 0.05), a significant difference in Language Exposure was observed among the reading-frequency groups.
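The three Kruskal-Wallis decisions by reading frequency can be checked directly from the reported statistics, since each p-value is the upper tail of a chi-square distribution with 3 degrees of freedom. The data frame `kw_results` below is a hypothetical container for the values printed above; because the statistics are rounded in the output, the recomputed p-values agree only approximately.

```r
# Statistics and degrees of freedom copied from the three Kruskal-Wallis outputs above.
kw_results <- data.frame(
  variable  = c("Reading Habit", "Writing Practice", "Language Exposure"),
  statistic = c(5.53, 8.52, 13.2),
  df        = 3
)

# Upper-tail chi-square probability gives the p-value of each test.
kw_results$p <- pchisq(kw_results$statistic, df = kw_results$df, lower.tail = FALSE)

# Decision at the 0.05 level of significance.
kw_results$significant <- kw_results$p < 0.05

print(kw_results)
```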
Shapiro-Wilk normality test
data: Data1$Scores
W = 0.95374, p-value = 3.921e-08
Since the p-value (3.921e-08) is less than 0.05, we reject the null hypothesis of normality. That is, we cannot assume that the scores are normally distributed.
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 2 0.6812 0.5068
297
The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.
# A tibble: 3 × 11
Variables variable n min max median iqr mean sd se ci
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Reading Habit Scores 100 2 4 3 0.6 3.06 0.433 0.043 0.086
2 Writing Pract… Scores 100 2 4 3 0.3 3.10 0.377 0.038 0.075
3 Language Expo… Scores 100 2.2 3.8 3 0.4 3.04 0.352 0.035 0.07
The means of reading habit, writing practice, and language exposure are 3.058, 3.102, and 3.044, respectively.
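As a consistency check, these overall means can be recovered from the by-sex means reported earlier, weighted by the 35 male and 65 female respondents. The object names below are hypothetical; the inputs are the rounded values from the text, so agreement is expected only to about three decimals.

```r
# Group sizes and by-sex means (male, female) as reported earlier in the text.
n <- c(male = 35, female = 65)
means_by_sex <- rbind(
  reading  = c(2.954, 3.114),
  writing  = c(3.046, 3.132),
  exposure = c(3.000, 3.068)
)

# Weighted mean per variable: sum of (group mean x group size) over the total n.
overall <- as.vector(means_by_sex %*% n) / sum(n)
names(overall) <- rownames(means_by_sex)

print(round(overall, 3))
```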
# A tibble: 1 × 6
.y. n statistic df p method
* <chr> <int> <dbl> <int> <dbl> <chr>
1 Scores 300 1.93 2 0.381 Kruskal-Wallis
Based on the p-value (0.381 > 0.05), no significant difference was observed among the three variables.
# A tibble: 3 × 9
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
* <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
1 Scores Reading Habit Writin… 100 100 0.666 0.506 1 ns
2 Scores Reading Habit Langua… 100 100 -0.723 0.470 1 ns
3 Scores Writing Practice Langua… 100 100 -1.39 0.165 0.495 ns
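The `p.adj` column above is consistent with a Bonferroni correction, in which each raw p-value is multiplied by the number of comparisons (three) and capped at 1. This is an inference from the printed numbers, not something the output states. A short sketch with base R's `p.adjust()`:

```r
# Raw pairwise p-values copied from the table above.
p_raw <- c(
  "Reading Habit vs Writing Practice"     = 0.506,
  "Reading Habit vs Language Exposure"    = 0.470,
  "Writing Practice vs Language Exposure" = 0.165
)

# Bonferroni: multiply by the number of comparisons and cap at 1.
p_adj <- p.adjust(p_raw, method = "bonferroni")

# A pair differs significantly only if its adjusted p-value is below 0.05.
any_significant <- any(p_adj < 0.05)

print(p_adj)
```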
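Based on the output above, writing practice stands out, having the highest mean score (3.102), although none of the pairwise comparisons is statistically significant (all adjusted p-values exceed 0.05).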