2. Is there a significant difference in reading habit, writing practice, and language exposure when grouped according to:

2.1 Sex


Call:
lm(formula = `Reading Habit` ~ `Writing Practice` + `Language Exposure`, 
    data = Data)

Coefficients:
        (Intercept)   `Writing Practice`  `Language Exposure`  
             0.3807               0.4226               0.4489

From this, we may deduce that the data fail to satisfy two assumptions: linearity and homogeneity of variance.
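As a quick sanity check on the fitted equation, the sketch below plugs the overall means of Writing Practice and Language Exposure reported in Section 3 (3.102 and 3.044) into the coefficients printed above. The helper name `predict_reading_habit` is hypothetical, and the coefficients are the rounded values from the output, so the result is approximate.

```r
# Coefficients copied (rounded) from the lm() output above.
coefs <- c(intercept = 0.3807, writing = 0.4226, exposure = 0.4489)

# Hypothetical helper: predicted Reading Habit for given predictor scores.
predict_reading_habit <- function(writing, exposure) {
  unname(coefs["intercept"] +
           coefs["writing"] * writing +
           coefs["exposure"] * exposure)
}

# Overall means of the two predictors as reported in Section 3.
pred_at_means <- predict_reading_habit(writing = 3.102, exposure = 3.044)
print(round(pred_at_means, 3))
```

A least-squares line passes through the point of means, so the prediction should land close to the mean Reading Habit score reported in Section 3 (3.058).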

2.1.1 Sex and Reading Habit

The mean for male and female is 2.954 and 3.114, respectively.

The graph above plots the data by sex (male and female).

The plot suggests a difference in reading habit between the sexes. However, we still need to test whether this difference is statistically significant.

The histogram above does not resemble a bell curve, which suggests that the residuals are not normally distributed. Moreover, the points in the Q-Q plot do not follow the straight line, and the majority of them fall outside the confidence bands.

Normality Test


    Shapiro-Wilk normality test

data:  res_aov$residuals
W = 0.95858, p-value = 0.003191

The Shapiro-Wilk p-value = 0.003191 on the residuals is less than the usual significance level of 0.05. Thus, we reject the hypothesis that residuals have a normal distribution.
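The output above tests `res_aov$residuals`, which suggests the residuals of a one-way ANOVA fit. A minimal sketch of that kind of check follows; the data frame `demo` and its columns are simulated stand-ins (assumptions), not the survey data.

```r
set.seed(1)

# Simulated stand-in for the survey data (assumption, not the real data).
demo <- data.frame(
  sex   = factor(rep(c("Male", "Female"), each = 50)),
  score = c(rnorm(50, mean = 3.0, sd = 0.4),
            rnorm(50, mean = 3.1, sd = 0.4))
)

# Fit the one-way model and test its residuals for normality.
fit <- aov(score ~ sex, data = demo)
sw  <- shapiro.test(residuals(fit))
print(sw)

# TRUE when normality is not rejected at the 0.05 level.
normal_ok <- sw$p.value >= 0.05
```

When `normal_ok` is `FALSE`, as in the report, a rank-based test is the usual fallback.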

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
      Df F value  Pr(>F)  
group  1  4.2091 0.04288 *
      98                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is less than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is not met.
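For readers without the `car` package, the median-centered Levene test can be sketched in base R as a one-way ANOVA on the absolute deviations from each group's median. This should be equivalent to the `center = median` variant shown above; the data are simulated stand-ins (assumption).

```r
set.seed(2)

# Simulated stand-in data with unequal spread between groups (assumption).
demo <- data.frame(
  sex   = factor(rep(c("Male", "Female"), each = 50)),
  score = c(rnorm(50, mean = 3.0, sd = 0.3),
            rnorm(50, mean = 3.1, sd = 0.5))
)

# Absolute deviation of each score from its own group's median.
group_median <- ave(demo$score, demo$sex, FUN = median)
demo$abs_dev <- abs(demo$score - group_median)

# One-way ANOVA on the deviations gives the Levene F statistic and p-value.
levene <- anova(lm(abs_dev ~ sex, data = demo))
print(levene)
```

With two groups and 100 observations, the degrees of freedom are 1 and 98, as in the output above.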

Wilcoxon Rank Sum Test


    Wilcoxon rank sum test

data:  a and b
W = 926.5, p-value = 0.09339
alternative hypothesis: true location shift is not equal to 0

Since the p-value is greater than 0.05, we fail to reject the null hypothesis; that is, there is no significant difference in reading habit when grouped according to sex.
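The test above was run on two vectors (`a` and `b`). A sketch of the same test using the formula interface of `wilcox.test()` is shown below, on simulated stand-in data (assumption).

```r
set.seed(3)

# Simulated stand-in for the survey data (assumption).
demo <- data.frame(
  sex   = factor(rep(c("Male", "Female"), each = 50),
                 levels = c("Male", "Female")),
  score = c(rnorm(50, mean = 3.0, sd = 0.4),
            rnorm(50, mean = 3.1, sd = 0.4))
)

# Two-sided Wilcoxon rank sum test of score by sex.
wt <- wilcox.test(score ~ sex, data = demo)
print(wt)

# W is the rank sum of the first group minus n1 * (n1 + 1) / 2.
ranks    <- rank(demo$score)
n_male   <- sum(demo$sex == "Male")
w_manual <- sum(ranks[demo$sex == "Male"]) - n_male * (n_male + 1) / 2
```

The formula interface keeps the grouping variable in the data frame, which avoids splitting the scores into separate vectors by hand.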

2.1.2 Sex and Writing Practice

The mean for male and female is 3.046 and 3.132, respectively.

The graph above plots the data by sex (male and female).

The plot suggests a difference in writing practice between the sexes. However, we still need to test whether this difference is statistically significant.

Normality Test


    Shapiro-Wilk normality test

data:  res_aov$residuals
W = 0.96289, p-value = 0.0065

The Shapiro-Wilk p-value = 0.0065 on the residuals is less than the usual significance level of 0.05. Thus, we reject the hypothesis that residuals have a normal distribution.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)  
group  1  5.8523 0.0174 *
      98                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is less than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is not met.

Wilcoxon Rank Sum Test


    Wilcoxon rank sum test

data:  c and d
W = 962.5, p-value = 0.2894
alternative hypothesis: true location shift is not equal to 0

Since the p-value is greater than 0.05, we fail to reject the null hypothesis; that is, there is no significant difference in writing practice when grouped according to sex.

2.1.3 Sex and Language Exposure

The mean for male and female is 3.000 and 3.068, respectively.

The graph above plots the data by sex (male and female).

The plot suggests little difference in language exposure between the sexes. However, we still need to test this formally.

The histogram above does not resemble a bell curve, which suggests that the residuals are not normally distributed. However, the points in the Q-Q plot roughly follow the straight line, with the majority of them falling within the confidence bands. Because the two plots disagree, no conclusion can be drawn from them alone, so we rely on the formal tests that follow.

Normality Test


    Shapiro-Wilk normality test

data:  res_aov$residuals
W = 0.95868, p-value = 0.00324

The Shapiro-Wilk p-value = 0.00324 on the residuals is less than the usual significance level of 0.05. Thus, we reject the hypothesis that residuals have a normal distribution.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  2.5243 0.1153
      98

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.

Wilcoxon Rank Sum Test


    Wilcoxon rank sum test

data:  e and f
W = 991, p-value = 0.3964
alternative hypothesis: true location shift is not equal to 0

Since the p-value is greater than 0.05, we fail to reject the null hypothesis; that is, there is no significant difference in language exposure when grouped according to sex.

2.2 Reading Frequency

2.2.1 Reading Frequency and Reading Habit

Normality Test


    Shapiro-Wilk normality test

data:  Data$`Reading Habit`
W = 0.94046, p-value = 0.0002056

Since the p-value (0.0002056) is less than 0.05, we reject the null hypothesis of normality. That is, we cannot assume normality.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3  2.0878 0.1069
      96

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.


# A tibble: 4 × 11
  `Reading Frequency` variable      n   min   max median   iqr  mean    sd    se
  <fct>               <fct>     <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
1 Always              Reading …    12   2.8   4      3.1  0.25  3.23 0.389 0.112
2 Sometimes           Reading …    47   2     4      3    0.3   2.99 0.391 0.057
3 Often               Reading …    28   2.2   3.8    3.2  0.4   3.14 0.388 0.073
4 Never               Reading …    13   2     4      3    1     2.95 0.639 0.177
# ℹ 1 more variable: ci <dbl>

The means for the Always, Sometimes, Often, and Never groups are 3.233, 2.991, 3.143, and 2.954, respectively.

Kruskal-Wallis Test

# A tibble: 1 × 6
  .y.               n statistic    df     p method        
* <chr>         <int>     <dbl> <int> <dbl> <chr>         
1 Reading Habit   100      5.53     3 0.137 Kruskal-Wallis

Since the p-value (0.137) is greater than 0.05, no significant difference in reading habit was observed among the reading frequency groups.
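The tibble above appears to come from a tidy wrapper around the Kruskal-Wallis test. A base R sketch of the same test with `kruskal.test()` follows; the group sizes match the summary table (12, 47, 28, 13), but the scores are simulated (assumption).

```r
set.seed(4)

freq_levels <- c("Always", "Sometimes", "Often", "Never")

# Group sizes follow the summary table; scores are simulated (assumption).
demo <- data.frame(
  frequency = factor(rep(freq_levels, times = c(12, 47, 28, 13)),
                     levels = freq_levels),
  score     = rnorm(100, mean = 3, sd = 0.4)
)

# Omnibus rank-based test across the four groups.
kw <- kruskal.test(score ~ frequency, data = demo)
print(kw)
```

With four groups the test has 3 degrees of freedom, matching the `df` column above.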

2.2.2 Reading Frequency and Writing Practice

Normality Test


    Shapiro-Wilk normality test

data:  Data$`Writing Practice`
W = 0.95226, p-value = 0.001172

Since the p-value (0.001172) is less than 0.05, we reject the null hypothesis of normality. That is, we cannot assume normality.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
      Df F value  Pr(>F)  
group  3  2.9223 0.03787 *
      96                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is less than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is not met.

# A tibble: 4 × 11
  `Reading Frequency` variable      n   min   max median   iqr  mean    sd    se
  <fct>               <fct>     <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
1 Always              Writing …    12   2     3.8    3     0.3  3.12 0.455 0.131
2 Sometimes           Writing …    47   2.2   4      3     0.3  3.05 0.284 0.041
3 Often               Writing …    28   2.4   4      3.2   0.6  3.26 0.418 0.079
4 Never               Writing …    13   2.2   3.6    3     0.6  2.92 0.413 0.114
# ℹ 1 more variable: ci <dbl>

The means for the Always, Sometimes, Often, and Never groups are 3.117, 3.051, 3.264, and 2.923, respectively.

Kruskal-Wallis Test

# A tibble: 1 × 6
  .y.                  n statistic    df      p method        
* <chr>            <int>     <dbl> <int>  <dbl> <chr>         
1 Writing Practice   100      8.52     3 0.0363 Kruskal-Wallis

Since the p-value (0.0363) is less than 0.05, there is a significant difference in writing practice among the reading frequency groups. The test itself does not indicate which groups differ.

2.2.3 Reading Frequency and Language Exposure

Normality Test


    Shapiro-Wilk normality test

data:  Data$`Language Exposure`
W = 0.94327, p-value = 0.0003071

Since the p-value (0.0003071) is less than 0.05, we reject the null hypothesis of normality. That is, we cannot assume normality.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3  0.8257 0.4829
      96

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.

# A tibble: 4 × 11
  `Reading Frequency` variable      n   min   max median   iqr  mean    sd    se
  <fct>               <fct>     <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
1 Always              Language…    12   2.4   3.8    3.1  0.45  3.18 0.376 0.109
2 Sometimes           Language…    47   2.4   3.8    3    0.2   2.94 0.312 0.046
3 Often               Language…    28   2.6   3.8    3.2  0.4   3.19 0.355 0.067
4 Never               Language…    13   2.2   3.6    3    0     2.98 0.341 0.095
# ℹ 1 more variable: ci <dbl>

The means for the Always, Sometimes, Often, and Never groups are 3.183, 2.936, 3.193, and 2.985, respectively.

Kruskal-Wallis Test

# A tibble: 1 × 6
  .y.                   n statistic    df      p method        
* <chr>             <int>     <dbl> <int>  <dbl> <chr>         
1 Language Exposure   100      13.2     3 0.0042 Kruskal-Wallis

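Since the p-value (0.0042) is less than 0.05, there is a significant difference in language exposure among the reading frequency groups. The test itself does not indicate which groups differ.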
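A significant Kruskal-Wallis result is usually followed by pairwise comparisons to locate the differing groups. The sketch below uses base R's `pairwise.wilcox.test()` with a Bonferroni adjustment as one possible follow-up. This is a substituted technique, not necessarily the one used elsewhere in this report, and the data are simulated (assumption).

```r
set.seed(5)

freq_levels <- c("Always", "Sometimes", "Often", "Never")
sizes       <- c(12, 47, 28, 13)   # group sizes from the summary table
shifts      <- c(3.2, 2.9, 3.2, 3.0)  # illustrative group centers (assumption)

# Simulated scores around the illustrative group centers.
demo <- data.frame(
  frequency = factor(rep(freq_levels, times = sizes), levels = freq_levels),
  score     = rnorm(100, mean = rep(shifts, times = sizes), sd = 0.35)
)

# All pairwise Wilcoxon rank sum tests, Bonferroni-adjusted.
pw <- pairwise.wilcox.test(demo$score, demo$frequency,
                           p.adjust.method = "bonferroni")
print(pw)
```

The result holds a lower-triangular matrix of adjusted p-values, one per pair of groups; pairs with an adjusted p-value below 0.05 would be the ones reported as different.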

3. Is there a significant difference between reading habit, writing practice, and language exposure?

Normality Test


    Shapiro-Wilk normality test

data:  Data1$Scores
W = 0.95374, p-value = 3.921e-08

Since the p-value (3.921e-08) is less than 0.05, we reject the null hypothesis of normality. That is, we cannot assume normality.

Equality of Variance

Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   2  0.6812 0.5068
      297

The p-value is greater than the 0.05 level of significance. Thus, the homogeneity assumption of the variance is met.

# A tibble: 3 × 11
  Variables      variable     n   min   max median   iqr  mean    sd    se    ci
  <fct>          <fct>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Reading Habit  Scores     100   2     4        3   0.6  3.06 0.433 0.043 0.086
2 Writing Pract… Scores     100   2     4        3   0.3  3.10 0.377 0.038 0.075
3 Language Expo… Scores     100   2.2   3.8      3   0.4  3.04 0.352 0.035 0.07

The means of reading habit, writing practice, and language exposure are 3.058, 3.102, and 3.044, respectively.

Kruskal-Wallis Test

# A tibble: 1 × 6
  .y.        n statistic    df     p method        
* <chr>  <int>     <dbl> <int> <dbl> <chr>         
1 Scores   300      1.93     2 0.381 Kruskal-Wallis

Since the p-value (0.381) is greater than 0.05, no significant difference was observed among the three variables.

Pairwise Comparisons

# A tibble: 3 × 9
  .y.    group1           group2     n1    n2 statistic     p p.adj p.adj.signif
* <chr>  <chr>            <chr>   <int> <int>     <dbl> <dbl> <dbl> <chr>       
1 Scores Reading Habit    Writin…   100   100     0.666 0.506 1     ns          
2 Scores Reading Habit    Langua…   100   100    -0.723 0.470 1     ns          
3 Scores Writing Practice Langua…   100   100    -1.39  0.165 0.495 ns
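The `p.adj` column above is consistent with a Bonferroni adjustment for three comparisons, in which each raw p-value is multiplied by 3 and capped at 1. The sketch below checks this with base R's `p.adjust()`, using the raw p-values copied from the table; that the report used Bonferroni is an inference from the numbers, not something the output states.

```r
# Raw p-values copied from the `p` column of the table above.
raw_p <- c(0.506, 0.470, 0.165)

# Bonferroni: multiply by the number of comparisons and cap at 1.
adj_p <- p.adjust(raw_p, method = "bonferroni")
print(adj_p)

# The same adjustment written out by hand.
manual <- pmin(raw_p * length(raw_p), 1)
```

Both `adj_p` and `manual` reproduce the `p.adj` values in the table (1, 1, and 0.495).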

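4. Which has the most significant impact?

Based on the output above, writing practice has the highest mean score (3.102). However, the Kruskal-Wallis test and the pairwise comparisons found no significant differences among the three variables, so this ranking is descriptive only.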

JESEL ESPUERTA GROUP STATISTICAL ANALYSIS

Kyle Kenneth Ruaya

2024-02-16

Data

1. What is the demographic profile of the respondents in terms of:

Sex

Strand

Reading Frequency
