Independent variables: SSOverall, STOverall, and SCOverall
Dependent variable: GWA (1st sem SY: 2021-2022)
Q1. How many observations are at least 21 years old?
Q2. How many observations have grades above 1.25 and up to 1.75?
Q3. Provide the results of checking the assumptions of multiple
regression analysis.
Q4. Which of the independent variables significantly predict the
dependent variable?
library(readxl)
ken <- read_excel("D:/Regression Analysis/withage.xlsx")
ken
## # A tibble: 113 × 33
## Age GWA (1…¹ SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SSOve…² ST1
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21 1.54 5 4 4 5 5 5 5 5 4.75 5
## 2 20 1.27 5 5 5 4 5 5 1 1 3.88 3
## 3 15 1.4 4 4 4 4 5 5 5 5 4.5 4
## 4 15 1.19 5 5 5 5 5 5 5 5 5 4
## 5 18 1.47 1 5 5 4 5 4 3 3 3.75 3
## 6 17 1.85 3 3 3 3 3 3 3 3 3 3
## 7 20 1.4 2 4 4 4 5 2 4 4 3.62 5
## 8 25 1.52 4 3 5 3 3 3 4 3 3.5 5
## 9 25 1.2 3 3 3 3 3 3 3 3 3 2
## 10 26 2 3 5 4 3 5 5 5 5 4.38 5
## # … with 103 more rows, 21 more variables: ST2 <dbl>, ST3 <dbl>, ST4 <dbl>,
## # ST5 <dbl>, ST6 <dbl>, ST7 <dbl>, ST8 <dbl>, STOverall <dbl>, SC1 <dbl>,
## # SC2 <dbl>, SC3 <dbl>, SC4 <dbl>, SC5 <dbl>, SC6 <dbl>, SC7 <dbl>,
## # SC8 <dbl>, SCOverall <dbl>, Q1 <chr>, Q2 <chr>, Q3 <chr>, Q4 <chr>, and
## # abbreviated variable names ¹`GWA (1st sem SY: 2021-2022)`, ²SSOverall
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
ken %>%
  mutate(Agecode = ifelse(Age >= 21, "At least 21 years old", "Less than 21 years old")) %>%
  group_by(Agecode) %>%
  summarise(count = n()) %>%
  mutate(Percentage = round(count / sum(count) * 100, 2))
## # A tibble: 2 × 3
## Agecode count Percentage
## <chr> <int> <dbl>
## 1 At least 21 years old 72 63.7
## 2 Less than 21 years old 41 36.3
Based on the table above, 72 of the 113 observations (63.7%) are at least 21 years old.
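As a cross-check, the same count can be obtained without creating a recoded column. The sketch below is only illustrative: `toy` is a hypothetical stand-in for `ken`, built from the ten Age values visible in the printed preview, and the names `n_at_least_21` and `age_counts` are not part of the original analysis.

```r
library(dplyr)

# Stand-in for `ken`: the ten Age values shown in the printed preview.
toy <- data.frame(Age = c(21, 20, 15, 15, 18, 17, 20, 25, 25, 26))

# Base R: TRUE counts as 1, so sum() gives the number of rows with Age >= 21.
n_at_least_21 <- sum(toy$Age >= 21)

# dplyr: count() groups by the logical condition and tallies in one step.
age_counts <- toy %>%
  count(at_least_21 = Age >= 21, name = "count") %>%
  mutate(Percentage = round(count / sum(count) * 100, 2))

n_at_least_21
age_counts
```

Replacing `toy` with `ken` should give the same counts as the table above, since both approaches apply the same condition `Age >= 21`.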
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ stringr 1.4.1
## ✔ tidyr 1.2.1 ✔ forcats 0.5.2
## ✔ readr 2.1.3
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(ggpubr)
library(rstatix)
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
library(dplyr)
ken %>%
  mutate(GWAcode = ifelse(`GWA (1st sem SY: 2021-2022)` >= 1.25 & `GWA (1st sem SY: 2021-2022)` <= 1.75,
                          "GWA is in the interval [1.25, 1.75]",
                          "Not in the given interval of GWA")) %>%
  group_by(GWAcode) %>%
  summarise(count = n()) %>%
  mutate(Percentage = round(count / sum(count) * 100, 2))
## # A tibble: 2 × 3
## GWAcode count Percentage
## <chr> <int> <dbl>
## 1 GWA is in the interval [1.25, 1.75] 92 81.4
## 2 Not in the given interval of GWA 21 18.6
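As shown in the table, 92 of the 113 observations (81.4%) have a GWA in the closed interval [1.25, 1.75]. Note that the code uses >= for the lower bound, so a GWA of exactly 1.25 is included; a strict reading of "above 1.25" would use > instead.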
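The wording "above 1.25" and the `>=` used in the code differ only for a GWA exactly equal to 1.25. A minimal sketch of the two readings follows; the vector `gwa` holds hypothetical values chosen to sit on and around the bounds, not values taken from the data set.

```r
# Hypothetical GWA values chosen to sit on and around the interval bounds.
gwa <- c(1.19, 1.25, 1.27, 1.40, 1.75, 1.85)

# Closed interval [1.25, 1.75], as in the code above.
in_closed <- gwa >= 1.25 & gwa <= 1.75

# Half-open interval (1.25, 1.75]: strictly above 1.25, up to and including 1.75.
in_half_open <- gwa > 1.25 & gwa <= 1.75

sum(in_closed)     # counts the value 1.25
sum(in_half_open)  # excludes the value 1.25
```

If no observation in `ken` has a GWA of exactly 1.25, both conditions give the same count.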
head(ken)
## # A tibble: 6 × 33
## Age GWA (1s…¹ SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SSOve…² ST1
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21 1.54 5 4 4 5 5 5 5 5 4.75 5
## 2 20 1.27 5 5 5 4 5 5 1 1 3.88 3
## 3 15 1.4 4 4 4 4 5 5 5 5 4.5 4
## 4 15 1.19 5 5 5 5 5 5 5 5 5 4
## 5 18 1.47 1 5 5 4 5 4 3 3 3.75 3
## 6 17 1.85 3 3 3 3 3 3 3 3 3 3
## # … with 21 more variables: ST2 <dbl>, ST3 <dbl>, ST4 <dbl>, ST5 <dbl>,
## # ST6 <dbl>, ST7 <dbl>, ST8 <dbl>, STOverall <dbl>, SC1 <dbl>, SC2 <dbl>,
## # SC3 <dbl>, SC4 <dbl>, SC5 <dbl>, SC6 <dbl>, SC7 <dbl>, SC8 <dbl>,
## # SCOverall <dbl>, Q1 <chr>, Q2 <chr>, Q3 <chr>, Q4 <chr>, and abbreviated
## # variable names ¹`GWA (1st sem SY: 2021-2022)`, ²SSOverall
multiple <- lm(`GWA (1st sem SY: 2021-2022)` ~ SSOverall + STOverall + SCOverall, data = ken)
multiple
##
## Call:
## lm(formula = `GWA (1st sem SY: 2021-2022)` ~ SSOverall + STOverall +
## SCOverall, data = ken)
##
## Coefficients:
## (Intercept) SSOverall STOverall SCOverall
## 1.996962 -0.047674 -0.067324 -0.005764
library(performance)
check_model(multiple)
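check_model() draws a panel of diagnostic plots (for example linearity, homogeneity of variance, collinearity, influential observations, and normality of residuals), and the subtitle of each plot states the pattern to expect when the assumption holds. Each plot was compared with its subtitle and was judged to be satisfied. Thus, the conditions of application for multiple regression analysis are considered met.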
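The visual checks can be complemented with formal tests from the same package. The sketch below is an illustration under assumptions: it fits a stand-in model on the built-in `mtcars` data (the name `fit` and the chosen variables are hypothetical), and in the actual analysis `fit` would be replaced by `multiple`.

```r
library(performance)

# Stand-in model on built-in data; substitute `multiple` in the real analysis.
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Normality of residuals (based on a Shapiro-Wilk test).
norm_res <- check_normality(fit)

# Constant error variance (based on a Breusch-Pagan test).
het_res <- check_heteroscedasticity(fit)

# Multicollinearity: one variance inflation factor (VIF) per predictor.
collin <- check_collinearity(fit)

norm_res
het_res
collin
```

A p-value above the chosen significance level in the first two checks gives no evidence against the assumption, and low VIF values suggest that the predictors are not strongly collinear. These tests are an addition to, not a substitute for, reading the plots.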
summary(multiple)
##
## Call:
## lm(formula = `GWA (1st sem SY: 2021-2022)` ~ SSOverall + STOverall +
## SCOverall, data = ken)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.45511 -0.11195 -0.02104 0.10446 0.57345
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.996962 0.173046 11.540 <2e-16 ***
## SSOverall -0.047674 0.038355 -1.243 0.217
## STOverall -0.067324 0.047959 -1.404 0.163
## SCOverall -0.005764 0.048030 -0.120 0.905
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1993 on 109 degrees of freedom
## Multiple R-squared: 0.07516, Adjusted R-squared: 0.0497
## F-statistic: 2.953 on 3 and 109 DF, p-value: 0.03583
The Coefficients table gives the estimate of each parameter (column Estimate), together with the p-value for the test that the parameter equals zero (column Pr(>|t|)).
Null and alternative hypotheses are:
H_0: β_j = 0
H_a: β_j ≠ 0.
Testing β_j = 0 amounts to asking whether the dependent variable is
associated with independent variable j, all other things being equal,
that is, with the other independent variables held at a constant
level.
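Each t value in the Coefficients table is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a t distribution with the residual degrees of freedom (109 here). A short sketch that recomputes both from the numbers printed in the summary; the vector names `est`, `se`, `t_value`, and `p_value` are introduced only for this illustration.

```r
# Estimates and standard errors copied from the summary(multiple) output.
est <- c(SSOverall = -0.047674, STOverall = -0.067324, SCOverall = -0.005764)
se  <- c(SSOverall =  0.038355, STOverall =  0.047959, SCOverall =  0.048030)

# t statistic for H_0: beta_j = 0.
t_value <- est / se

# Two-sided p-value with 109 residual degrees of freedom.
p_value <- 2 * pt(-abs(t_value), df = 109)

round(t_value, 3)
round(p_value, 3)
```

The results should agree with the t value and Pr(>|t|) columns up to rounding.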
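The coefficients in the output are unstandardized: each estimate is the expected change in GWA for a one-unit increase in that variable, with the other two held constant. STOverall has the largest estimate in absolute value (-0.067) and the largest absolute t value (1.404), but its p-value (0.163) is above 0.05, as are those of SSOverall (0.217) and SCOverall (0.905). Thus, none of the independent variables is individually a significant predictor of GWA at the 5% level. The model as a whole is significant (F = 2.953 on 3 and 109 DF, p-value = 0.03583), but it explains only about 7.5% of the variance in GWA (Multiple R-squared = 0.07516).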
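To compare the relative importance of predictors measured on different scales, standardized coefficients can be computed by refitting the model on z-scored variables, or equivalently by rescaling each slope by sd(x) / sd(y). The sketch below is a hedged illustration on the built-in `mtcars` data, used as a stand-in because `ken` is not reproduced here; the names `vars`, `fit_raw`, `fit_z`, and `beta_std` are hypothetical.

```r
# Stand-in data: one response (mpg) and three predictors from mtcars.
vars <- c("mpg", "wt", "hp", "disp")

# Model on the original scale: unstandardized slopes.
fit_raw <- lm(mpg ~ wt + hp + disp, data = mtcars[vars])

# Route 1: z-score every variable, then refit.
z <- as.data.frame(scale(mtcars[vars]))
fit_z <- lm(mpg ~ wt + hp + disp, data = z)

# Route 2: rescale each raw slope by sd(x) / sd(y).
b <- coef(fit_raw)[-1]
beta_std <- b * sapply(mtcars[names(b)], sd) / sd(mtcars$mpg)

round(coef(fit_z)[-1], 4)
round(beta_std, 4)
```

Both routes give the same values. Standardizing changes the scale of the slopes but not their t values or p-values, so it does not change which predictors are significant.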