Independent variables: SSOverall, STOverall, and SCOverall
Dependent variable: GWA (1st sem SY: 2021-2022)
Q1. How many of the observations are at least 21 years old?
Q2. How many of the observations have grades in the range 1.25 to 1.75?
Q3. Provide the results of checking the assumptions of multiple regression analysis.
Q4. Which of the independent variables significantly predicts the dependent variable?
library(readxl)
withage <- read_excel("D:/MARV BS MATH/Marv 4th year, 1st sem/Regression Analysis/withage.xlsx")
View(withage)
withage
# A tibble: 113 × 33
     Age GWA (1s…¹   SS1   SS2   SS3   SS4   SS5   SS6   SS7   SS8 SSOve…²   ST1
   <dbl>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl>
 1    21      1.54     5     4     4     5     5     5     5     5    4.75     5
 2    20      1.27     5     5     5     4     5     5     1     1    3.88     3
 3    15      1.4      4     4     4     4     5     5     5     5    4.5      4
 4    15      1.19     5     5     5     5     5     5     5     5    5        4
 5    18      1.47     1     5     5     4     5     4     3     3    3.75     3
 6    17      1.85     3     3     3     3     3     3     3     3    3        3
 7    20      1.4      2     4     4     4     5     2     4     4    3.62     5
 8    25      1.52     4     3     5     3     3     3     4     3    3.5      5
 9    25      1.2      3     3     3     3     3     3     3     3    3        2
10    26      2        3     5     4     3     5     5     5     5    4.38     5
# … with 103 more rows, 21 more variables: ST2 <dbl>, ST3 <dbl>, ST4 <dbl>,
# ST5 <dbl>, ST6 <dbl>, ST7 <dbl>, ST8 <dbl>, STOverall <dbl>, SC1 <dbl>,
# SC2 <dbl>, SC3 <dbl>, SC4 <dbl>, SC5 <dbl>, SC6 <dbl>, SC7 <dbl>,
# SC8 <dbl>, SCOverall <dbl>, Q1 <chr>, Q2 <chr>, Q3 <chr>, Q4 <chr>, and
# abbreviated variable names ¹​`GWA (1st sem SY: 2021-2022)`, ²​SSOverall
library(dplyr)
Marv <- withage %>%
  mutate(Agecode = ifelse(Age >= 21, "at least 21 years old", "Less than 21 years old")) %>%
  group_by(Agecode) %>%
  summarise(count = n()) %>%
  mutate(Percentage = round(count / sum(count) * 100, 2))
Marv
# A tibble: 2 × 3
  Agecode                count Percentage
  <chr>                  <int>      <dbl>
1 at least 21 years old     72       63.7
2 Less than 21 years old    41       36.3
Hence, 72 of the 113 observations (63.7%) are at least 21 years old.
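The same kind of count can be cross-checked in base R without the dplyr pipeline. The sketch below is only an illustration: it uses the ages of the first ten rows copied from the printed tibble, not the full data set, and the names age and at_least_21 are introduced here for the example.

```r
# Ages of the first ten rows, copied from the printed tibble above.
age <- c(21, 20, 15, 15, 18, 17, 20, 25, 25, 26)

# Logical vector: TRUE where the observation is at least 21 years old.
at_least_21 <- age >= 21

# sum() counts the TRUE values; mean() gives their proportion.
n_at_least_21   <- sum(at_least_21)
pct_at_least_21 <- round(mean(at_least_21) * 100, 2)

n_at_least_21
pct_at_least_21
```

With the full data, sum(withage$Age >= 21) should agree with the count reported in the table.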
Marv1 <- withage %>%
  mutate(GWAcode = ifelse(
    `GWA (1st sem SY: 2021-2022)` >= 1.25 & `GWA (1st sem SY: 2021-2022)` <= 1.75,
    "GWA is the interval [1.25, 1.75]",
    "Not in the given interval of GWA"
  )) %>%
  group_by(GWAcode) %>%
  summarise(count = n()) %>%
  mutate(Percentage = round(count / sum(count) * 100, 2))
Marv1
# A tibble: 2 × 3
  GWAcode                          count Percentage
  <chr>                            <int>      <dbl>
1 GWA is the interval [1.25, 1.75]    92       81.4
2 Not in the given interval of GWA    21       18.6
Hence, there are 92 observations whose GWA is in the interval [1.25, 1.75].
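The count depends on whether the endpoints are included. The code above uses the closed interval [1.25, 1.75]; if "above 1.25" is read strictly, the lower bound would use > instead of >=. The sketch below shows the difference on hypothetical GWA values chosen to sit on and around the boundaries; they are not taken from the data set.

```r
# Hypothetical GWA values, chosen to sit on and around the interval boundaries.
gwa <- c(1.19, 1.25, 1.40, 1.75, 1.85)

# Closed interval [1.25, 1.75]: both endpoints are counted.
closed <- sum(gwa >= 1.25 & gwa <= 1.75)

# Half-open interval (1.25, 1.75]: a GWA of exactly 1.25 is excluded.
half_open <- sum(gwa > 1.25 & gwa <= 1.75)

closed     # 3
half_open  # 2
```

The two counts differ only when some observation has a GWA of exactly 1.25, so it may be worth checking whether any such value exists in withage before settling on one reading.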
library(performance)
multiple <- lm(`GWA (1st sem SY: 2021-2022)` ~ SSOverall + STOverall + SCOverall, data = withage)
summary(multiple)
Call:
lm(formula = `GWA (1st sem SY: 2021-2022)` ~ SSOverall + STOverall +
    SCOverall, data = withage)
Residuals:
     Min       1Q   Median       3Q      Max
-0.45511 -0.11195 -0.02104  0.10446  0.57345
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.996962   0.173046  11.540   <2e-16 ***
SSOverall   -0.047674   0.038355  -1.243    0.217
STOverall   -0.067324   0.047959  -1.404    0.163
SCOverall   -0.005764   0.048030  -0.120    0.905
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1993 on 109 degrees of freedom
Multiple R-squared:  0.07516,  Adjusted R-squared:  0.0497
F-statistic: 2.953 on 3 and 109 DF,  p-value: 0.03583
check_model(multiple)
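check_model() reports its diagnostics as plots, which are not reproduced in this text. As a rough numeric complement, the usual assumptions can also be checked in base R. The sketch below is an illustration under stated assumptions: it runs on simulated stand-in data, and the names d, x1, x2, x3, y, and fit are hypothetical, not the author's variables.

```r
# Simulated stand-in data; the column names x1, x2, x3, y are hypothetical.
set.seed(1)
n <- 113
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 2 - 0.05 * d$x1 - 0.07 * d$x2 + rnorm(n, sd = 0.2)

fit <- lm(y ~ x1 + x2 + x3, data = d)

# Normality of residuals: Shapiro-Wilk test (H0: residuals are normal).
sw <- shapiro.test(residuals(fit))

# Multicollinearity: variance inflation factor, VIF_j = 1 / (1 - R_j^2),
# where R_j^2 comes from regressing predictor j on the other predictors.
predictors <- c("x1", "x2", "x3")
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = d))$r.squared
  1 / (1 - r2)
})

# Influential observations: Cook's distance for each row.
cooks <- cooks.distance(fit)

sw$p.value   # a large p-value gives no evidence against normality
vif          # values near 1 indicate little collinearity
max(cooks)   # large values flag potentially influential rows
```

Linearity and constant variance are usually judged visually; plot(fit, which = c(1, 3)) draws the residuals-vs-fitted and scale-location plots for a fitted lm object. To apply the sketch to the real model, fit would be replaced by multiple and the predictor names by SSOverall, STOverall, and SCOverall.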
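Hence, none of the independent variables significantly predicts the dependent variable, GWA (1st sem SY: 2021-2022), at the 0.05 level: SSOverall (p = 0.217), STOverall (p = 0.163), and SCOverall (p = 0.905) all have p-values above 0.05. STOverall has the coefficient with the largest absolute value (-0.067), but the size of an unstandardized coefficient does not by itself establish significance or importance. The overall model is significant (F = 2.953 on 3 and 109 DF, p = 0.03583), yet it explains only about 7.5% of the variance in GWA (Multiple R-squared = 0.07516).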
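As a check on the coefficient table, the two-sided p-values can be recomputed from the printed estimates, standard errors, and residual degrees of freedom. The sketch below copies those numbers from the summary(multiple) output; the 0.05 threshold and the names est, se, and alpha are assumptions made for the example.

```r
# Estimates and standard errors copied from the summary(multiple) output.
est <- c(SSOverall = -0.047674, STOverall = -0.067324, SCOverall = -0.005764)
se  <- c(SSOverall =  0.038355, STOverall =  0.047959, SCOverall =  0.048030)
df_resid <- 109   # residual degrees of freedom from the same output
alpha    <- 0.05  # conventional significance level (an assumption)

# t statistic and two-sided p-value for H0: coefficient = 0.
t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df = df_resid)

# Overall F-test p-value from the printed F-statistic (2.953 on 3 and 109 DF).
p_overall <- pf(2.953, df1 = 3, df2 = df_resid, lower.tail = FALSE)

data.frame(t = round(t_val, 3), p = round(p_val, 3), significant = p_val < alpha)
p_overall
```

Comparing each p-value with alpha, not the size of the estimate, is what decides whether a predictor is significant. With the fitted object available, the same table is returned directly by summary(multiple)$coefficients.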