Data

1. What is the demographic profile of the respondents in terms of:


Sex

Strand

The tables above show the distribution of the 120 respondents by sex, year level, and strand. There are 70 females and 50 males; 24 respondents are from ABM, 29 from GAS, 29 from HUMSS, and 38 from STEM.
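The reported counts can be turned into percentages and cross-checked with a few lines of code. This is a minimal Python sketch, not the R code that produced the tables; the variable and function names are illustrative, and only the counts come from the paragraph above.

```python
# Counts as reported in the paragraph above.
sex_counts = {"Female": 70, "Male": 50}
strand_counts = {"ABM": 24, "GAS": 29, "HUMSS": 29, "STEM": 38}


def percentages(counts):
    """Return each category's share of the total, in percent (1 decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Both breakdowns should describe the same set of respondents.
total_by_sex = sum(sex_counts.values())
total_by_strand = sum(strand_counts.values())

sex_pct = percentages(sex_counts)
strand_pct = percentages(strand_counts)

for name, pct in {**sex_pct, **strand_pct}.items():
    print(f"{name}: {pct}%")
```

If the two totals disagreed, that would point to missing or miscoded responses in one of the grouping variables.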

2. Is there a significant difference in the availability of reading materials, instructional facilities, and parental involvement when grouped according to:

2.1 Sex


Call:
lm(formula = `Availability of Reading Materials` ~ `Instructional Facilities` + 
    `Parental Involvement`, data = Data)

Coefficients:
               (Intercept)  `Instructional Facilities`  
                   0.68197                     0.70948  
    `Parental Involvement`  
                   0.09139  

From this, we may deduce that the data fail to satisfy the two assumptions – Linearity and Homogeneity of Variance.

2.1.1 Sex and Availability of Reading Materials

The mean for male and female is 3.172 and 3.011, respectively.

The graph above plots the data by sex (male and female).

The plot suggests a difference between the sexes in the availability of reading materials in terms of its effects on reading comprehension. However, we still need to test whether this difference is statistically significant.

The histogram does not resemble a bell curve, which suggests that the residuals are not normally distributed. The points in the QQ-plot, however, roughly follow the straight line, with the majority of them falling within the confidence bands. Because the two plots point in opposite directions, visual inspection alone is inconclusive, and a formal normality test is needed.

Normality Test


    Shapiro-Wilk normality test

data:  res_aov$residuals
W = 0.97713, p-value = 0.0386

The Shapiro-Wilk p-value = 0.0386 on the residuals is less than the usual significance level of 0.05. Thus, we reject the hypothesis that the residuals have a normal distribution, and a nonparametric test is used for the comparison.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   1  0.6531 0.4206
      118               

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.

Wilcoxon Rank Sum Test


    Wilcoxon rank sum test

data:  a and b
W = 2046, p-value = 0.1099
alternative hypothesis: true location shift is not equal to 0

Since the p-value is greater than 0.05, we fail to reject the null hypothesis; that is, there is no significant difference in the availability of reading materials when grouped according to sex.

2.1.2 Sex and Instructional Facilities

The mean for male and female is 3.108 and 3.009, respectively.

The graph above plots the data by sex (male and female).

The plot suggests a difference between the sexes in instructional facilities in terms of its effects on reading comprehension. However, we still need to test whether this difference is statistically significant.

The histogram resembles a bell curve, which suggests that the residuals are normally distributed. Moreover, the points in the QQ-plot roughly follow the straight line, with the majority of them falling within the confidence bands.

Normality Test


    Shapiro-Wilk normality test

data:  res_aov$residuals
W = 0.98276, p-value = 0.1283

The Shapiro-Wilk p-value = 0.1283 on the residuals is greater than the usual significance level of 0.05. Thus, we fail to reject the hypothesis that residuals have a normal distribution.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   1  0.7184 0.3984
      118               

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.

Two Sample T-test


    Welch Two Sample t-test

data:  c and d
t = 1.3413, df = 103.37, p-value = 0.1828
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.03992289  0.20678003
sample estimates:
mean of x mean of y 
 3.112000  3.028571 

Since the p-value is greater than 0.05, we fail to reject the null hypothesis; that is, there is no significant difference in instructional facilities, in terms of its effects on reading comprehension, when grouped according to sex.
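The numbers in the Welch t-test output are tied together by simple arithmetic, which can be checked directly. The following is a minimal Python sketch (not the R code that produced the output) using only the values printed above; the variable names are illustrative.

```python
# Values copied from the Welch two-sample t-test output above.
mean_x, mean_y = 3.112000, 3.028571
t_stat = 1.3413
ci_low, ci_high = -0.03992289, 0.20678003

# The point estimate is the difference in sample means;
# it should sit at the centre of the confidence interval.
diff = mean_x - mean_y
ci_centre = (ci_low + ci_high) / 2

# t = diff / SE, so the standard error can be recovered from the reported t.
std_error = diff / t_stat

# CI half-width = t_crit * SE, so the implied critical value is:
half_width = (ci_high - ci_low) / 2
t_crit = half_width / std_error

# A 95% interval that contains 0 goes with a p-value above 0.05.
ci_contains_zero = ci_low < 0 < ci_high

print(f"difference = {diff:.4f}, CI centre = {ci_centre:.4f}")
print(f"standard error = {std_error:.4f}, implied critical t = {t_crit:.3f}")
print(f"CI contains zero: {ci_contains_zero}")
```

The implied critical value lies a little above the normal-distribution value of 1.96, as expected for a t distribution with roughly 103 degrees of freedom.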

2.1.3 Sex and Parental Involvement

The mean for male and female is 2.446 and 2.680, respectively.

The graph above plots the data by sex (male and female).

The plot suggests a difference between the sexes in the impact of parental involvement. However, we still need to test whether this difference is statistically significant.

The histogram does not resemble a bell curve, which suggests that the residuals are not normally distributed. The points in the QQ-plot, however, roughly follow the straight line, with the majority of them falling within the confidence bands. Because the two plots point in opposite directions, visual inspection alone is inconclusive, and a formal normality test is needed.

Normality Test


    Shapiro-Wilk normality test

data:  res_aov$residuals
W = 0.98829, p-value = 0.3943

The Shapiro-Wilk p-value = 0.3943 on the residuals is greater than the usual significance level of 0.05. Thus, we fail to reject the hypothesis that the residuals have a normal distribution.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   1  0.0426 0.8369
      118               

The p-value (0.8369) is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.

Two Sample T-test


    Welch Two Sample t-test

data:  e and f
t = 1.7805, df = 112.37, p-value = 0.0777
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.02459003  0.46064637
sample estimates:
mean of x mean of y 
 2.680000  2.461972 

Since the p-value is greater than 0.05, we fail to reject the null hypothesis; that is, there is no significant difference in the effects of parental involvement on the reading comprehension of the students when grouped according to sex.

2.2 Strand

2.2.1 Strand and Availability of Reading Materials

Normality Test


    Shapiro-Wilk normality test

data:  Data$`Availability of Reading Materials`
W = 0.96388, p-value = 0.002625

Since p-value = 0.002625 < 0.05, we reject the null hypothesis of normality. That is, we cannot assume normality.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
       Df F value  Pr(>F)  
group   3  2.4218 0.06948 .
      116                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.


# A tibble: 4 × 11
  Strand variable             n   min   max median   iqr  mean    sd    se    ci
  <fct>  <fct>            <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 GAS    Availability of…    29   2     4      2.8   0.6  2.98 0.519 0.096 0.197
2 ABM    Availability of…    24   2.8   3.8    3.1   0.3  3.13 0.293 0.06  0.124
3 HUMSS  Availability of…    29   2.2   3.8    3     0.4  2.91 0.332 0.062 0.126
4 STEM   Availability of…    38   2.2   4      3.2   0.6  3.25 0.404 0.066 0.133

The means of GAS, ABM, HUMSS, and STEM are 2.979, 3.133, 2.910, and 3.247, respectively.

Kruskal-Wallis Test

# A tibble: 1 × 6
  .y.                                   n statistic    df       p method        
* <chr>                             <int>     <dbl> <int>   <dbl> <chr>         
1 Availability of Reading Materials   120      15.5     3 0.00146 Kruskal-Wallis

Since p = 0.00146 < 0.05, there is a significant difference in the availability of reading materials among the strands.
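The Kruskal-Wallis p-value is the upper-tail probability of a chi-square distribution evaluated at the test statistic. The sketch below is a minimal Python illustration (not the R code behind the table) using the closed-form tails for 2 and 3 degrees of freedom; the function name `chi2_sf` is illustrative. Because the table prints the statistic rounded to 15.5, the result only approximates the reported p-value.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Only the closed forms for df = 2 and df = 3 are implemented,
    which is enough for the Kruskal-Wallis tables in this report.
    """
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        tail = math.erfc(math.sqrt(x / 2))
        return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df = 2 or df = 3 are supported in this sketch")


# Statistic and degrees of freedom as printed in the table above.
p_reading_materials = chi2_sf(15.5, 3)
print(f"approximate p-value: {p_reading_materials:.5f}")
```

The same function applies to any Kruskal-Wallis result with three or four groups, since the degrees of freedom equal the number of groups minus one.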

Pairwise Comparisons

# A tibble: 6 × 9
  .y.           group1 group2    n1    n2 statistic       p   p.adj p.adj.signif
* <chr>         <chr>  <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
1 Availability… GAS    ABM       29    24     1.75  8.04e-2 0.482   ns          
2 Availability… GAS    HUMSS     29    29    -0.289 7.72e-1 1       ns          
3 Availability… GAS    STEM      29    38     3.08  2.09e-3 0.0126  *           
4 Availability… ABM    HUMSS     24    29    -2.02  4.30e-2 0.258   ns          
5 Availability… ABM    STEM      24    38     1.06  2.90e-1 1       ns          
6 Availability… HUMSS  STEM      29    38     3.38  7.13e-4 0.00428 **          

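Based on the adjusted p-values, there is a significant difference between GAS and STEM (p.adj = 0.0126) and between HUMSS and STEM (p.adj = 0.00428). The difference between ABM and HUMSS has an unadjusted p-value below 0.05 (p = 0.043) but is not significant after adjustment (p.adj = 0.258).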
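The adjusted p-values in the table are numerically consistent with a Bonferroni correction, in which each unadjusted p-value is multiplied by the number of comparisons (six here) and capped at 1. The adjustment method is not stated in the output, so this is an inference from the numbers. A minimal Python sketch, with illustrative names, reproduces the adjustment:

```python
# Unadjusted p-values copied from the pairwise comparison table above.
raw_p = {
    ("GAS", "ABM"): 8.04e-2,
    ("GAS", "HUMSS"): 7.72e-1,
    ("GAS", "STEM"): 2.09e-3,
    ("ABM", "HUMSS"): 4.30e-2,
    ("ABM", "STEM"): 2.90e-1,
    ("HUMSS", "STEM"): 7.13e-4,
}


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


adjusted = bonferroni(raw_p)

# Pairs that remain significant at the 0.05 level after adjustment.
significant = sorted(pair for pair, p in adjusted.items() if p < 0.05)

for pair, p in adjusted.items():
    print(f"{pair[0]} vs {pair[1]}: adjusted p = {p:.4g}")
print("significant pairs:", significant)
```

Small differences from the printed p.adj values come from the rounding of the unadjusted p-values in the table.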

2.2.2 Strand and Instructional Facilities

Normality Test


    Shapiro-Wilk normality test

data:  Data$`Instructional Facilities`
W = 0.96524, p-value = 0.003411

Since p-value = 0.003411 < 0.05, we reject the null hypothesis of normality. That is, we cannot assume normality.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   3  0.9743 0.4074
      116               

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.

# A tibble: 4 × 11
  Strand variable             n   min   max median   iqr  mean    sd    se    ci
  <fct>  <fct>            <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 GAS    Instructional F…    29   2.2   3.8    3     0.6  3.03 0.397 0.074 0.151
2 ABM    Instructional F…    24   2.4   3.6    3.2   0.3  3.08 0.311 0.063 0.131
3 HUMSS  Instructional F…    29   2.4   3.6    3     0.4  2.97 0.328 0.061 0.125
4 STEM   Instructional F…    38   2.4   3.8    3.2   0.2  3.10 0.289 0.047 0.095

The means of GAS, ABM, HUMSS, and STEM are 3.034, 3.075, 2.972, and 3.105, respectively.

Kruskal-Wallis Test

# A tibble: 1 × 6
  .y.                          n statistic    df     p method        
* <chr>                    <int>     <dbl> <int> <dbl> <chr>         
1 Instructional Facilities   120      2.88     3  0.41 Kruskal-Wallis

Since p = 0.41 > 0.05, there is no significant difference in instructional facilities among the strands.

Pairwise Comparisons

# A tibble: 6 × 9
  .y.               group1 group2    n1    n2 statistic     p p.adj p.adj.signif
* <chr>             <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <chr>       
1 Instructional Fa… GAS    ABM       29    24     0.629 0.530 1     ns          
2 Instructional Fa… GAS    HUMSS     29    29    -0.680 0.496 1     ns          
3 Instructional Fa… GAS    STEM      29    38     0.839 0.402 1     ns          
4 Instructional Fa… ABM    HUMSS     24    29    -1.28  0.202 1     ns          
5 Instructional Fa… ABM    STEM      24    38     0.128 0.898 1     ns          
6 Instructional Fa… HUMSS  STEM      29    38     1.56  0.118 0.708 ns          

2.2.3 Strand and Parental Involvement

Normality Test


    Shapiro-Wilk normality test

data:  Data$`Parental Involvement`
W = 0.97876, p-value = 0.05464

Since p-value = 0.05464 > 0.05, we fail to reject the null hypothesis of normality. That is, normality can be assumed.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   3  0.6647 0.5754
      116               

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.

# A tibble: 4 × 11
  Strand variable             n   min   max median   iqr  mean    sd    se    ci
  <fct>  <fct>            <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 GAS    Parental Involv…    29   1     4      2.4  1     2.58 0.766 0.142 0.291
2 ABM    Parental Involv…    24   1.2   3.6    2.8  0.9   2.74 0.65  0.133 0.274
3 HUMSS  Parental Involv…    29   1     3.8    2.6  0.6   2.46 0.61  0.113 0.232
4 STEM   Parental Involv…    38   1     3.8    2.4  0.75  2.45 0.645 0.105 0.212

The means of GAS, ABM, HUMSS, and STEM are 2.579, 2.742, 2.462, and 2.453, respectively.

One-way ANOVA

             Df Sum Sq Mean Sq F value Pr(>F)
Strand        3   1.49  0.4952   1.105   0.35
Residuals   116  51.97  0.4480               

Since p-value = 0.35 > 0.05, we fail to reject the null hypothesis, that is, there is no significant difference between the effects of parental involvement towards reading comprehension when grouped according to strand.
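The entries of the ANOVA table are related by two formulas: each mean square is the sum of squares divided by its degrees of freedom, and F is the ratio of the between-group to the within-group mean square. A minimal Python sketch (illustrative names, values copied from the table above) checks this:

```python
# Values copied from the one-way ANOVA table above.
ss_between, df_between, ms_between = 1.49, 3, 0.4952
ss_within, df_within, ms_within = 51.97, 116, 0.4480

# Mean square = sum of squares / degrees of freedom.
# The printed sums of squares are rounded, so these only
# approximately match the printed mean squares.
ms_between_check = ss_between / df_between
ms_within_check = ss_within / df_within

# F = between-group mean square / within-group mean square.
f_value = ms_between / ms_within

print(f"MS between: {ms_between_check:.4f} (table: {ms_between})")
print(f"MS within:  {ms_within_check:.4f} (table: {ms_within})")
print(f"F = {f_value:.3f}")
```

An F value close to 1 means the variation between strands is about the same size as the variation within strands, which is in line with the non-significant result.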

3. Is there a significant relationship between availability of reading materials, instructional facilities, and parental involvement in terms of their effects on reading comprehension?

Normality Test


    Shapiro-Wilk normality test

data:  Data1$Scores
W = 0.95992, p-value = 2.402e-07

Since p-value = 2.402e-07 < 0.05, we reject the null hypothesis of normality. That is, we cannot assume normality.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
       Df F value    Pr(>F)    
group   2  15.745 3.167e-07 ***
      297                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is less than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is not met.

# A tibble: 3 × 11
  Variables      variable     n   min   max median   iqr  mean    sd    se    ci
  <fct>          <fct>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Availability … Scores     100   2     4      3    0.45  3.07 0.417 0.042 0.083
2 Instructional… Scores     100   2.2   3.8    3    0.4   3.05 0.339 0.034 0.067
3 Parental Invo… Scores     100   1     4      2.6  0.8   2.58 0.657 0.066 0.13 

The means of Availability of Reading Materials, Instructional Facilities, and Parental Involvement are 3.072, 3.048, and 2.576, respectively.

Kruskal-Wallis Test

# A tibble: 1 × 6
  .y.        n statistic    df        p method        
* <chr>  <int>     <dbl> <int>    <dbl> <chr>         
1 Scores   300      47.4     2 5.21e-11 Kruskal-Wallis

Since p = 5.21e-11 < 0.05, there is a significant difference among the three variables.

Pairwise Comparisons

# A tibble: 3 × 9
  .y.    group1        group2    n1    n2 statistic       p   p.adj p.adj.signif
* <chr>  <chr>         <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
1 Scores Availability… Instr…   100   100   -0.0978 9.22e-1 1   e+0 ns          
2 Scores Availability… Paren…   100   100   -6.01   1.88e-9 5.64e-9 ****        
3 Scores Instructiona… Paren…   100   100   -5.91   3.42e-9 1.03e-8 ****        

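Based on the adjusted p-values, there is a significant difference between availability of reading materials and parental involvement, as well as between instructional facilities and parental involvement. The difference between availability of reading materials and instructional facilities is not significant.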

4. Which factor has the most significant impact?

Based on the output above, it can be seen that availability of reading materials is the factor with the most significant impact on the reading comprehension of the students.