Independent variables: SSOverall, STOverall, and SCOverall
Dependent variable: GWA (1st sem SY: 2021-2022)

Q1. How many observations have an age of at least 21 years?

Q2. How many observations have grades from 1.25 to 1.75?

Q3. Provide the results of checking the assumptions of multiple regression analysis.

Q4. Which of the independent variables significantly predicts the dependent variable?

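library(readxl)
# Read the survey data; adjust the path to wherever withage.xlsx is stored.
withage <- read_excel("D:/MARV BS MATH/Marv 4th year, 1st sem/Regression Analysis/withage.xlsx")
# View() opens the spreadsheet-style viewer, so call it only in an interactive session.
if (interactive()) View(withage)
withage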
# A tibble: 113 × 33
     Age GWA (1s…¹   SS1   SS2   SS3   SS4   SS5   SS6   SS7   SS8 SSOve…²   ST1
   <dbl>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl>
 1    21      1.54     5     4     4     5     5     5     5     5    4.75     5
 2    20      1.27     5     5     5     4     5     5     1     1    3.88     3
 3    15      1.4      4     4     4     4     5     5     5     5    4.5      4
 4    15      1.19     5     5     5     5     5     5     5     5    5        4
 5    18      1.47     1     5     5     4     5     4     3     3    3.75     3
 6    17      1.85     3     3     3     3     3     3     3     3    3        3
 7    20      1.4      2     4     4     4     5     2     4     4    3.62     5
 8    25      1.52     4     3     5     3     3     3     4     3    3.5      5
 9    25      1.2      3     3     3     3     3     3     3     3    3        2
10    26      2        3     5     4     3     5     5     5     5    4.38     5
# … with 103 more rows, 21 more variables: ST2 <dbl>, ST3 <dbl>, ST4 <dbl>,
#   ST5 <dbl>, ST6 <dbl>, ST7 <dbl>, ST8 <dbl>, STOverall <dbl>, SC1 <dbl>,
#   SC2 <dbl>, SC3 <dbl>, SC4 <dbl>, SC5 <dbl>, SC6 <dbl>, SC7 <dbl>,
#   SC8 <dbl>, SCOverall <dbl>, Q1 <chr>, Q2 <chr>, Q3 <chr>, Q4 <chr>, and
#   abbreviated variable names ¹​`GWA (1st sem SY: 2021-2022)`, ²​SSOverall

Q1. How many observations have an age of at least 21 years?

library(dplyr)
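# Label each observation by age group, then count each group and its share of the total.
Marv <- withage %>%
  mutate(Agecode = ifelse(Age >= 21, "at least 21 years old", "Less than 21 years old")) %>%
  group_by(Agecode) %>%
  summarise(count = n()) %>%
  mutate(Percentage = round(count / sum(count) * 100, 2))
Marv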
# A tibble: 2 × 3
  Agecode                count Percentage
  <chr>                  <int>      <dbl>
1 at least 21 years old     72       63.7
2 Less than 21 years old    41       36.3

Hence, 72 of the 113 observations (63.7%) are at least 21 years old.
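The same kind of count can be cross-checked in base R, because summing a logical vector counts its TRUE values. The sketch below uses only the ten Age values visible in the tibble preview, not the full 113 rows; the names `age_preview`, `n_at_least_21`, and `pct_at_least_21` are illustrative.

```r
# Age values of the first ten rows, copied from the printed tibble preview.
# This is NOT the full data set, so the count differs from the full-data answer.
age_preview <- c(21, 20, 15, 15, 18, 17, 20, 25, 25, 26)

# TRUE counts as 1 and FALSE as 0, so sum() counts matches and mean() gives the share.
n_at_least_21 <- sum(age_preview >= 21)
pct_at_least_21 <- round(mean(age_preview >= 21) * 100, 2)

n_at_least_21    # 4 of the ten previewed rows
pct_at_least_21  # 40
```

On the full data, `sum(withage$Age >= 21, na.rm = TRUE)` should agree with the dplyr count, assuming Age has no missing values.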

Q2. How many observations have grades from 1.25 to 1.75?

# Flag observations whose GWA lies in the closed interval [1.25, 1.75], then count each group.
Marv1 <- withage %>%
  mutate(GWAcode = ifelse(`GWA (1st sem SY: 2021-2022)` >= 1.25 & `GWA (1st sem SY: 2021-2022)` <= 1.75, "GWA is in the interval [1.25, 1.75]", "Not in the given interval of GWA")) %>%
  group_by(GWAcode) %>%
  summarise(count = n()) %>%
  mutate(Percentage = round(count / sum(count) * 100, 2))
Marv1
# A tibble: 2 × 3
  GWAcode                             count Percentage
  <chr>                               <int>      <dbl>
1 GWA is in the interval [1.25, 1.75]    92       81.4
2 Not in the given interval of GWA       21       18.6

Hence, 92 of the 113 observations (81.4%) have a GWA in the closed interval [1.25, 1.75], with both endpoints counted as inside the range.
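Whether the endpoints are included matters only for observations that fall exactly on 1.25 or 1.75. As a small illustration, the sketch below compares the closed interval used in the code with a half-open version that excludes 1.25, using only the ten GWA values as displayed (rounded) in the tibble preview. The variable names are illustrative.

```r
# GWA values of the first ten rows as displayed in the tibble preview
# (rounded for printing; NOT the full data set).
gwa_preview <- c(1.54, 1.27, 1.40, 1.19, 1.47, 1.85, 1.40, 1.52, 1.20, 2.00)

# Closed interval [1.25, 1.75]: both endpoints included, as in the analysis code.
closed_count <- sum(gwa_preview >= 1.25 & gwa_preview <= 1.75)

# Half-open interval (1.25, 1.75]: a GWA of exactly 1.25 would be excluded.
half_open_count <- sum(gwa_preview > 1.25 & gwa_preview <= 1.75)

closed_count     # 6
half_open_count  # 6, since none of the previewed values equals 1.25
```

If the question is read as strictly above 1.25, replacing `>=` with `>` in the full-data code gives the corresponding count; the two answers differ only by the number of observations with a GWA of exactly 1.25.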

Q3. Provide the results of checking the assumptions of multiple regression analysis.

library(performance)
multiple <- lm(`GWA (1st sem SY: 2021-2022)` ~ SSOverall + STOverall + SCOverall, data = withage)
summary(multiple)

Call:
lm(formula = `GWA (1st sem SY: 2021-2022)` ~ SSOverall + STOverall + 
    SCOverall, data = withage)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.45511 -0.11195 -0.02104  0.10446  0.57345 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.996962   0.173046  11.540   <2e-16 ***
SSOverall   -0.047674   0.038355  -1.243    0.217    
STOverall   -0.067324   0.047959  -1.404    0.163    
SCOverall   -0.005764   0.048030  -0.120    0.905    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1993 on 109 degrees of freedom
Multiple R-squared:  0.07516,   Adjusted R-squared:  0.0497 
F-statistic: 2.953 on 3 and 109 DF,  p-value: 0.03583
check_model(multiple)
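check_model() produces diagnostic plots, so it prints no numbers in the console output. As a rough numeric companion to those plots, the sketch below computes common checks with base R only: Shapiro-Wilk for normality of residuals, a studentized Breusch-Pagan statistic for constant variance, the Durbin-Watson statistic for independence, and variance inflation factors for multicollinearity. `check_lm_assumptions` is a hypothetical helper, not a function of the performance package or part of the original analysis. It is demonstrated on R's built-in mtcars data because withage.xlsx is not reproduced here.

```r
# Hypothetical helper: numeric assumption checks for a fitted lm object (base R only).
check_lm_assumptions <- function(model) {
  e <- residuals(model)
  X <- model.matrix(model)[, -1, drop = FALSE]  # predictor columns, intercept dropped
  n <- length(e)

  # Normality of residuals: Shapiro-Wilk test.
  normality_p <- shapiro.test(e)$p.value

  # Constant variance: studentized (Koenker) Breusch-Pagan test, computed as
  # n * R-squared from regressing the squared residuals on the predictors.
  bp_stat <- n * summary(lm(e^2 ~ X))$r.squared
  bp_p <- pchisq(bp_stat, df = ncol(X), lower.tail = FALSE)

  # Independence: Durbin-Watson statistic (values near 2 suggest no first-order
  # autocorrelation; only meaningful if the row order carries information).
  durbin_watson <- sum(diff(e)^2) / sum(e^2)

  # Multicollinearity: VIFs are the diagonal of the inverse correlation matrix
  # of the predictors.
  vif <- setNames(diag(solve(cor(X))), colnames(X))

  list(normality_p = normality_p, bp_p = bp_p,
       durbin_watson = durbin_watson, vif = vif)
}

# Demonstration on built-in data (a stand-in, not the survey data).
demo_fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
demo_checks <- check_lm_assumptions(demo_fit)
demo_checks
```

For the model in this section, the call would be `check_lm_assumptions(multiple)`. Under the usual conventions, p-values above 0.05 give no evidence against normality or constant variance, and VIFs well below about 5 are commonly read as low multicollinearity; these thresholds are rules of thumb rather than strict cutoffs.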

Q4. Which of the independent variables significantly predicts the dependent variable?

summary(multiple)

Call:
lm(formula = `GWA (1st sem SY: 2021-2022)` ~ SSOverall + STOverall + 
    SCOverall, data = withage)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.45511 -0.11195 -0.02104  0.10446  0.57345 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.996962   0.173046  11.540   <2e-16 ***
SSOverall   -0.047674   0.038355  -1.243    0.217    
STOverall   -0.067324   0.047959  -1.404    0.163    
SCOverall   -0.005764   0.048030  -0.120    0.905    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1993 on 109 degrees of freedom
Multiple R-squared:  0.07516,   Adjusted R-squared:  0.0497 
F-statistic: 2.953 on 3 and 109 DF,  p-value: 0.03583

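Hence, at the 0.05 level of significance, none of the independent variables individually significantly predicts the dependent variable, GWA (1st sem SY: 2021-2022): the p-values for SSOverall (0.217), STOverall (0.163), and SCOverall (0.905) all exceed 0.05. STOverall has the coefficient with the largest absolute value (-0.067324), but the size of a coefficient alone does not establish significance or importance; that is judged from the t-test on each coefficient. The model as a whole is significant (F = 2.953 on 3 and 109 DF, p-value = 0.03583), although it explains only about 7.5% of the variation in GWA (Multiple R-squared = 0.07516). A significant overall F-test with no individually significant predictor may indicate that the predictors are correlated with one another.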
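The individual tests can be recomputed from the estimates and standard errors reported in the summary output: each t value is the estimate divided by its standard error, and the two-sided p-value comes from the t distribution with 109 residual degrees of freedom. The sketch below does this with base R. The 0.05 significance level is an assumption (the source does not state one), and the object names are illustrative.

```r
# Estimates and standard errors copied from the summary(multiple) output.
coefs <- data.frame(
  term      = c("SSOverall", "STOverall", "SCOverall"),
  estimate  = c(-0.047674, -0.067324, -0.005764),
  std_error = c(0.038355, 0.047959, 0.048030)
)
df_resid <- 109  # residual degrees of freedom from the output
alpha <- 0.05    # assumed significance level

# t statistic and two-sided p-value for each coefficient.
coefs$t_value <- coefs$estimate / coefs$std_error
coefs$p_value <- 2 * pt(-abs(coefs$t_value), df = df_resid)
coefs$significant <- coefs$p_value < alpha

# Overall F-test p-value from the reported F statistic (2.953 on 3 and 109 DF).
overall_p <- pf(2.953, df1 = 3, df2 = df_resid, lower.tail = FALSE)

print(coefs)
overall_p
```

Ranking predictors by the absolute size of their coefficients is a different question from testing significance, and it is only meaningful when the predictors are on comparable scales.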