Multiple Linear Regression

Author

Dr Omar, Dr Zaim, Dr Hafiz, Dr Syuaib

1 Introduction : Chapter 1

1.1 Team Member :

Dr Omar Bin Nazmi

Dr Muhammad Zaim Bin Muhd Samsuri

Dr Mohd Abdul Hafiz Bin Kamarul Zaman

Dr Syuaib Aiman Amir bin Kamarudin

1.2 Workflow

1.2.1 Datasets

This exercise will use dataset mental health literacy, The dataset belonged to a study on mental health literacy among house officers. The followings are regarding the datasets:

  1. ID : Subject’s ID.
  2. Age : Respondents’s Age (Numerical variable)
  3. Gender : gender (Categorical) , (labelled ; Male, female)
  4. Current Hospital Posting : (Categorical)
  5. Duration of employment : ( Numerical)
  6. Marital Status : (Categorical)
  7. Previous experiences of being bullied: (Categorical)
  8. Family members spouse: (Categorical)
  9. Peers friends: ( Categorical)

Literature review shows Age, female gender, Hospital with good mental support, duration of employment, marital status, previous experiences of being bullied, family member spouse who are supportive, and peerd friends shows better mental health literacy.

1.3 Prepare the Environment

1.3.1 Install packages

library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.4.2
Warning: package 'dplyr' was built under R version 4.4.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(here)
Warning: package 'here' was built under R version 4.4.2
here() starts at C:/Users/H P/OneDrive - The Goose and Duck/Desktop/mlinear final
library(broom)
Warning: package 'broom' was built under R version 4.4.2
library(broom.helpers)
Warning: package 'broom.helpers' was built under R version 4.4.2
library(haven)
Warning: package 'haven' was built under R version 4.4.2
library(gtsummary)
Warning: package 'gtsummary' was built under R version 4.4.2

Attaching package: 'gtsummary'

The following objects are masked from 'package:broom.helpers':

    all_categorical, all_continuous, all_contrasts, all_dichotomous,
    all_interaction, all_intercepts
library(dplyr)
library(corrplot)
Warning: package 'corrplot' was built under R version 4.4.2
corrplot 0.95 loaded
library(knitr)
Warning: package 'knitr' was built under R version 4.4.2
library(readxl)
Warning: package 'readxl' was built under R version 4.4.2
library(janitor)
Warning: package 'janitor' was built under R version 4.4.2

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
library(skimr)
Warning: package 'skimr' was built under R version 4.4.2
library(GGally)
Warning: package 'GGally' was built under R version 4.4.2
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
library(rsq)
Warning: package 'rsq' was built under R version 4.4.2

1.3.2 Read Data

data1 <- read_excel("C:/Users/H P/OneDrive - The Goose and Duck/Desktop/mlinear final/Data1.xlsx")
data1<-clean_names(data1)
summary(data1)
       id        current_hospital_posting      age           gender         
 Min.   :  1.0   Length:299               Min.   :25.00   Length:299        
 1st Qu.: 75.5   Class :character         1st Qu.:26.00   Class :character  
 Median :151.0   Mode  :character         Median :27.00   Mode  :character  
 Mean   :150.6                            Mean   :26.86                     
 3rd Qu.:225.5                            3rd Qu.:27.00                     
 Max.   :300.0                            Max.   :34.00                     
 marital_status     duration_of_employment previous_experience_of_being_bullied
 Length:299         Min.   : 4.00          Length:299                          
 Class :character   1st Qu.: 8.00          Class :character                    
 Mode  :character   Median :12.00          Mode  :character                    
                    Mean   :13.34                                              
                    3rd Qu.:18.00                                              
                    Max.   :24.00                                              
 family_members_spouse peers_friends      literacy_score  
 Length:299            Length:299         Min.   : 75.00  
 Class :character      Class :character   1st Qu.: 94.00  
 Mode  :character      Mode  :character   Median : 98.00  
                                          Mean   : 99.26  
                                          3rd Qu.:104.00  
                                          Max.   :155.00  
glimpse(data1)
Rows: 299
Columns: 10
$ id                                   <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11…
$ current_hospital_posting             <chr> "HSIP", "HSIP", "HSIP", "HSIP", "…
$ age                                  <dbl> 27, 26, 25, 26, 27, 26, 27, 29, 3…
$ gender                               <chr> "Female", "Female", "Female", "Fe…
$ marital_status                       <chr> "Single", "Single", "Single", "Si…
$ duration_of_employment               <dbl> 10, 9, 9, 9, 10, 8, 10, 23, 9, 9,…
$ previous_experience_of_being_bullied <chr> "No", "No", "Yes", "No", "Yes", "…
$ family_members_spouse                <chr> "No", "No", "Yes", "No", "No", "N…
$ peers_friends                        <chr> "No", "No", "Yes", "No", "Yes", "…
$ literacy_score                       <dbl> 102, 105, 102, 107, 111, 98, 97, …

1.3.3 Data exploration

1.3.3.1 Descriptive

data2<-
  data1 %>% 
  mutate(across(where(is.character),as.factor))
data2 <- data2 %>%
  mutate(age = as.numeric(as.character(age)))
data2 <- data2 %>%
  mutate(age = as.integer(age))
data2 %>% 
  select(duration_of_employment, current_hospital_posting, previous_experience_of_being_bullied, family_members_spouse, peers_friends, literacy_score)
# A tibble: 299 × 6
   duration_of_employment current_hospital_posting previous_experience_of_bein…¹
                    <dbl> <fct>                    <fct>                        
 1                     10 HSIP                     No                           
 2                      9 HSIP                     No                           
 3                      9 HSIP                     Yes                          
 4                      9 HSIP                     No                           
 5                     10 HSIP                     Yes                          
 6                      8 HSIP                     Yes                          
 7                     10 HSIP                     No                           
 8                     23 HSIP                     No                           
 9                      9 HSIP                     No                           
10                      9 HSIP                     No                           
# ℹ 289 more rows
# ℹ abbreviated name: ¹​previous_experience_of_being_bullied
# ℹ 3 more variables: family_members_spouse <fct>, peers_friends <fct>,
#   literacy_score <dbl>
str(data2)
tibble [299 × 10] (S3: tbl_df/tbl/data.frame)
 $ id                                  : num [1:299] 1 2 3 4 5 6 7 8 9 10 ...
 $ current_hospital_posting            : Factor w/ 4 levels "HRPZ","HSIP",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ age                                 : int [1:299] 27 26 25 26 27 26 27 29 31 28 ...
 $ gender                              : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...
 $ marital_status                      : Factor w/ 2 levels "Married","Single": 2 2 2 2 2 2 2 1 1 2 ...
 $ duration_of_employment              : num [1:299] 10 9 9 9 10 8 10 23 9 9 ...
 $ previous_experience_of_being_bullied: Factor w/ 2 levels "No","Yes": 1 1 2 1 2 2 1 1 1 1 ...
 $ family_members_spouse               : Factor w/ 2 levels "No","Yes": 1 1 2 1 1 1 1 1 1 1 ...
 $ peers_friends                       : Factor w/ 2 levels "No","Yes": 1 1 2 1 2 1 1 1 1 2 ...
 $ literacy_score                      : num [1:299] 102 105 102 107 111 98 97 98 108 84 ...
# Change the reference level for the gender factor
data2$gender <- relevel(data2$gender, ref = "Male")
# Check the levels to confirm the change
levels(data2$gender)
[1] "Male"   "Female"

According to the literature review, gender female are better in mental health literacy compared to gender male.

1.3.3.2 Summary data

data2 %>% 
  tbl_summary(
    label = list(
      duration_of_employment ~ "Duration of employment",
      age~"Age",
      current_hospital_posting ~ "Current hospital posting", 
      previous_experience_of_being_bullied ~ "Previous experience", 
      family_members_spouse ~ "Family members", 
      peers_friends ~ "Peers", 
      literacy_score ~ "Literacy score"
    ),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})"
    )
  ) %>% 
  bold_labels() %>% 
  italicize_levels()
Characteristic N = 2991
id 151 (87)
Current hospital posting
    HRPZ 96 (32%)
    HSIP 57 (19%)
    HTM 38 (13%)
    HUSM 108 (36%)
Age
    25 24 (8.0%)
    26 106 (35%)
    27 106 (35%)
    28 40 (13%)
    29 12 (4.0%)
    30 4 (1.3%)
    31 4 (1.3%)
    32 1 (0.3%)
    34 2 (0.7%)
gender
    Male 109 (36%)
    Female 190 (64%)
marital_status
    Married 47 (16%)
    Single 252 (84%)
Duration of employment 13.3 (5.9)
Previous experience 91 (30%)
Family members 31 (10%)
Peers 87 (29%)
Literacy score 99 (8)
1 Mean (SD); n (%)

1.3.3.3 Explore and Wrangle Data

  1. Age
ggplot(data = data2,aes(age))+
  geom_histogram(fill= "black")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

geom_boxplot()
geom_boxplot: outliers = TRUE, outlier.colour = NULL, outlier.fill = NULL, outlier.shape = 19, outlier.size = 1.5, outlier.stroke = 0.5, outlier.alpha = NULL, notch = FALSE, notchwidth = 0.5, staplewidth = 0, varwidth = FALSE, na.rm = FALSE, orientation = NA
stat_boxplot: na.rm = FALSE, orientation = NA
position_dodge2 
  1. Gender
ggplot(data2, aes(x = gender)) + 
  geom_bar( fill= "black")

  1. Current Hospital Posting

    ggplot(data2, aes(x = current_hospital_posting )) + 
      geom_bar(fill= "black")

  2. Duration of employment

    ggplot(data = data2,aes(duration_of_employment))+
      geom_histogram(fill ="black")
    `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

  3. Marital status

    ggplot(data2, aes(x = marital_status)) + 
      geom_bar( fill= "black")

  4. Previous Experiences Being Bullied

    ggplot(data2, aes(x = previous_experience_of_being_bullied )) + 
      geom_bar(fill="black")

  5. Family members spouse

    ggplot(data2, aes(x = family_members_spouse  )) + 
      geom_bar(fill= "black")

  6. peers friends

    ggplot(data2, aes(x = peers_friends  )) + 
      geom_bar(fill= "black")

1.3.4 Check Correlation

data3<-
data2 %>% 
select(where(is.numeric), -id)

cor.data3 <-
  cor(data3, use = "complete.obs", method = "pearson")
head(round(cor.data3,2))
                        age duration_of_employment literacy_score
age                    1.00                   0.34           0.08
duration_of_employment 0.34                   1.00          -0.12
literacy_score         0.08                  -0.12           1.00
corrplot(cor.data3, method = "circle")

Age has shown moderate correlation with 0.34

1.4 Perform Linear Regression

1.4.1 Univariable Analysis

In the univariable analysis, we will perform simple linear regression ( SLR) for each of the predictors. We aim to select variables that will be included in the multivariable model. in exploratory research, we want to choose only variables with p-values <0.25 and clinically significant to be included in the Multivariable Linear Regression ( MLR) model.

  1. Age

    sl_age<-lm(literacy_score~age,data = data2)
    summary(sl_age)
    
    Call:
    lm(formula = literacy_score ~ age, data = data2)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -24.331  -5.331  -0.845   4.455  52.269 
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  86.2138     9.7774   8.818   <2e-16 ***
    age           0.4858     0.3636   1.336    0.183    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 8.186 on 297 degrees of freedom
    Multiple R-squared:  0.005974,  Adjusted R-squared:  0.002627 
    F-statistic: 1.785 on 1 and 297 DF,  p-value: 0.1826
    tidy(sl_age, conf.int= TRUE)
    # A tibble: 2 × 7
      term        estimate std.error statistic  p.value conf.low conf.high
      <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
    1 (Intercept)   86.2       9.78       8.82 1.01e-16   67.0      105.  
    2 age            0.486     0.364      1.34 1.83e- 1   -0.230      1.20
    tbl_regression(sl_age)
    Characteristic Beta 95% CI1 p-value
    age 0.49 -0.23, 1.2 0.2
    1 CI = Confidence Interval
    1. Gender
    sl_gender<-lm(literacy_score~gender ,data = data2)
    summary(sl_gender)
    
    Call:
    lm(formula = literacy_score ~ gender, data = data2)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -24.376  -5.376  -1.195   4.624  55.805 
    
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
    (Intercept)   99.3761     0.7864 126.365   <2e-16 ***
    genderFemale  -0.1814     0.9865  -0.184    0.854    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 8.21 on 297 degrees of freedom
    Multiple R-squared:  0.0001138, Adjusted R-squared:  -0.003253 
    F-statistic: 0.03381 on 1 and 297 DF,  p-value: 0.8542
    tidy(sl_gender, conf.int= TRUE)
    # A tibble: 2 × 7
      term         estimate std.error statistic   p.value conf.low conf.high
      <chr>           <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
    1 (Intercept)    99.4       0.786   126.    3.18e-260    97.8     101.  
    2 genderFemale   -0.181     0.987    -0.184 8.54e-  1    -2.12      1.76
    tbl_regression(sl_gender)
    Characteristic Beta 95% CI1 p-value
    gender


        Male
        Female -0.18 -2.1, 1.8 0.9
    1 CI = Confidence Interval
  1. Current Hospital Posting

    sl_currenthospital<-lm(literacy_score~current_hospital_posting ,data = data2)
    summary(sl_age)
    
    Call:
    lm(formula = literacy_score ~ age, data = data2)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -24.331  -5.331  -0.845   4.455  52.269 
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  86.2138     9.7774   8.818   <2e-16 ***
    age           0.4858     0.3636   1.336    0.183    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 8.186 on 297 degrees of freedom
    Multiple R-squared:  0.005974,  Adjusted R-squared:  0.002627 
    F-statistic: 1.785 on 1 and 297 DF,  p-value: 0.1826
    tidy(sl_currenthospital, conf.int= TRUE)
    # A tibble: 4 × 7
      term                 estimate std.error statistic   p.value conf.low conf.high
      <chr>                   <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
    1 (Intercept)            98.5       0.834   118.    2.38e-250   96.8      100.  
    2 current_hospital_po…    2.86      1.37      2.10  3.69e-  2    0.175      5.55
    3 current_hospital_po…    0.505     1.57      0.322 7.47e-  1   -2.58       3.59
    4 current_hospital_po…    0.503     1.15      0.439 6.61e-  1   -1.75       2.76
    tbl_regression(sl_currenthospital)
    Characteristic Beta 95% CI1 p-value
    current_hospital_posting


        HRPZ
        HSIP 2.9 0.17, 5.6 0.037
        HTM 0.50 -2.6, 3.6 0.7
        HUSM 0.50 -1.8, 2.8 0.7
    1 CI = Confidence Interval
  2. Duration of emloyment

    sl_duration<-lm(literacy_score~duration_of_employment ,data = data2 )
    summary(sl_duration)
    
    Call:
    lm(formula = literacy_score ~ duration_of_employment, data = data2)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -23.156  -4.983  -0.654   4.761  54.191 
    
    Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
    (Intercept)            101.47225    1.16035  87.450   <2e-16 ***
    duration_of_employment  -0.16580    0.07949  -2.086   0.0379 *  
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 8.151 on 297 degrees of freedom
    Multiple R-squared:  0.01444,   Adjusted R-squared:  0.01112 
    F-statistic:  4.35 on 1 and 297 DF,  p-value: 0.03786
    tidy(sl_duration, conf.int= TRUE)
    # A tibble: 2 × 7
      term                 estimate std.error statistic   p.value conf.low conf.high
      <chr>                   <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
    1 (Intercept)           101.       1.16       87.4  5.23e-214   99.2   104.     
    2 duration_of_employm…   -0.166    0.0795     -2.09 3.79e-  2   -0.322  -0.00936
    tbl_regression(sl_duration)
    Characteristic Beta 95% CI1 p-value
    duration_of_employment -0.17 -0.32, -0.01 0.038
    1 CI = Confidence Interval
  3. Marital status

    sl_marital<-lm(literacy_score~marital_status ,data = data2 )
    summary(sl_marital)
    
    Call:
    lm(formula = literacy_score ~ marital_status, data = data2)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -24.194  -5.194  -1.194   4.383  55.383 
    
    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
    (Intercept)           99.6170     1.1975  83.189   <2e-16 ***
    marital_statusSingle  -0.4226     1.3044  -0.324    0.746    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 8.209 on 297 degrees of freedom
    Multiple R-squared:  0.0003533, Adjusted R-squared:  -0.003013 
    F-statistic: 0.105 on 1 and 297 DF,  p-value: 0.7462
    tidy(sl_marital, conf.int= TRUE)
    # A tibble: 2 × 7
      term                 estimate std.error statistic   p.value conf.low conf.high
      <chr>                   <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
    1 (Intercept)            99.6        1.20    83.2   8.11e-208    97.3     102.  
    2 marital_statusSingle   -0.423      1.30    -0.324 7.46e-  1    -2.99      2.14
    tbl_regression(sl_marital)
    Characteristic Beta 95% CI1 p-value
    marital_status


        Married
        Single -0.42 -3.0, 2.1 0.7
    1 CI = Confidence Interval
  4. Previous experience of being bully

    sl_experience<-lm(literacy_score~previous_experience_of_being_bullied ,data = data2 )
    summary(sl_experience)
    
    Call:
    lm(formula = literacy_score ~ previous_experience_of_being_bullied, 
        data = data2)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -25.879  -4.879  -0.553   4.447  54.121 
    
    Coefficients:
                                            Estimate Std. Error t value Pr(>|t|)
    (Intercept)                              98.5529     0.5644 174.604   <2e-16
    previous_experience_of_being_bulliedYes   2.3262     1.0231   2.274   0.0237
    
    (Intercept)                             ***
    previous_experience_of_being_bulliedYes *  
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 8.14 on 297 degrees of freedom
    Multiple R-squared:  0.01711,   Adjusted R-squared:  0.0138 
    F-statistic: 5.169 on 1 and 297 DF,  p-value: 0.0237
    tidy(sl_experience, conf.int= TRUE)
    # A tibble: 2 × 7
      term                 estimate std.error statistic   p.value conf.low conf.high
      <chr>                   <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
    1 (Intercept)             98.6      0.564    175.   2.27e-301   97.4       99.7 
    2 previous_experience…     2.33     1.02       2.27 2.37e-  2    0.313      4.34
    tbl_regression(sl_experience)
    Characteristic Beta 95% CI1 p-value
    previous_experience_of_being_bullied


        No
        Yes 2.3 0.31, 4.3 0.024
    1 CI = Confidence Interval
  5. Family member spouse

    sl_family<-lm(literacy_score~family_members_spouse ,data = data2 )
    summary(sl_family)
    
    Call:
    lm(formula = literacy_score ~ family_members_spouse, data = data2)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -23.724  -4.724  -0.724   4.276  51.097 
    
    Coefficients:
                             Estimate Std. Error t value Pr(>|t|)    
    (Intercept)               98.7239     0.4921 200.601  < 2e-16 ***
    family_members_spouseYes   5.1793     1.5284   3.389 0.000797 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 8.057 on 297 degrees of freedom
    Multiple R-squared:  0.03722,   Adjusted R-squared:  0.03398 
    F-statistic: 11.48 on 1 and 297 DF,  p-value: 0.0007971
    tidy(sl_family, conf.int= TRUE)
    # A tibble: 2 × 7
      term                 estimate std.error statistic   p.value conf.low conf.high
      <chr>                   <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
    1 (Intercept)             98.7      0.492    201.   4.02e-319    97.8      99.7 
    2 family_members_spou…     5.18     1.53       3.39 7.97e-  4     2.17      8.19
    tbl_regression(sl_family)
    Characteristic Beta 95% CI1 p-value
    family_members_spouse


        No
        Yes 5.2 2.2, 8.2 <0.001
    1 CI = Confidence Interval
  6. Peers friends

    sl_peers<-lm(literacy_score~peers_friends ,data = data2 )
    summary(sl_peers)
    
    Call:
    lm(formula = literacy_score ~ peers_friends, data = data2)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -23.632  -4.793  -0.632   4.207  54.207 
    
    Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
    (Intercept)       98.6321     0.5599 176.173   <2e-16 ***
    peers_friendsYes   2.1610     1.0379   2.082   0.0382 *  
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 8.152 on 297 degrees of freedom
    Multiple R-squared:  0.01439,   Adjusted R-squared:  0.01107 
    F-statistic: 4.335 on 1 and 297 DF,  p-value: 0.03819
    tidy(sl_peers, conf.int= TRUE)
    # A tibble: 2 × 7
      term             estimate std.error statistic   p.value conf.low conf.high
      <chr>               <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
    1 (Intercept)         98.6      0.560    176.   1.63e-302   97.5       99.7 
    2 peers_friendsYes     2.16     1.04       2.08 3.82e-  2    0.118      4.20
    tbl_regression(sl_peers)
    Characteristic Beta 95% CI1 p-value
    peers_friends


        No
        Yes 2.2 0.12, 4.2 0.038
    1 CI = Confidence Interval
rsq(sl_age)
[1] 0.005973705
rsq(sl_currenthospital)
[1] 0.01584071
rsq(sl_duration)
[1] 0.01443512
rsq(sl_experience)
[1] 0.01710795
rsq(sl_family)
[1] 0.03722474
rsq(sl_gender)
[1] 0.0001138381
rsq(sl_marital)
[1] 0.0003532607
rsq(sl_peers)
[1] 0.01438671

Summary Table

#create a summary table for each model with confidence intervals
tbl_age <- tbl_regression(sl_age, conf.level = 0.95)

tbl_gender <- tbl_regression(sl_gender, conf.level = 0.95)

tbl_currenthospital <- tbl_regression(sl_currenthospital, conf.level = 0.95)

tbl_duration <- tbl_regression(sl_duration, conf.level = 0.95)

tbl_experience <- tbl_regression(sl_experience, conf.level = 0.95)

tbl_family <- tbl_regression(sl_family, conf.level = 0.95)

tbl_marital <- tbl_regression(sl_marital, conf.level = 0.95)

tbl_peers <- tbl_regression(sl_peers, conf.level = 0.95)


#Combine all tables into one
SLR_table <- tbl_stack(
  list(tbl_age, tbl_peers, tbl_marital, tbl_gender, tbl_family,tbl_experience,tbl_duration,tbl_currenthospital)
) %>%
  as_gt() %>%
  gt::tab_header(
    title = "Simple Linear Regression Model"
  )

#Print the summary table
SLR_table
Simple Linear Regression Model
Characteristic Beta 95% CI1 p-value
age 0.49 -0.23, 1.2 0.2
peers_friends


    No
    Yes 2.2 0.12, 4.2 0.038
marital_status


    Married
    Single -0.42 -3.0, 2.1 0.7
gender


    Male
    Female -0.18 -2.1, 1.8 0.9
family_members_spouse


    No
    Yes 5.2 2.2, 8.2 <0.001
previous_experience_of_being_bullied


    No
    Yes 2.3 0.31, 4.3 0.024
duration_of_employment -0.17 -0.32, -0.01 0.038
current_hospital_posting


    HRPZ
    HSIP 2.9 0.17, 5.6 0.037
    HTM 0.50 -2.6, 3.6 0.7
    HUSM 0.50 -1.8, 2.8 0.7
1 CI = Confidence Interval

1.4.2 Multivariable Analysis

Model 1

mlr_1<-lm(literacy_score~ duration_of_employment + family_members_spouse + 
    current_hospital_posting + previous_experience_of_being_bullied + peers_friends, data = data2)
summary(mlr_1)

Call:
lm(formula = literacy_score ~ duration_of_employment + family_members_spouse + 
    current_hospital_posting + previous_experience_of_being_bullied + 
    peers_friends, data = data2)

Residuals:
    Min      1Q  Median      3Q     Max 
-23.136  -4.873  -0.020   4.363  48.467 

Coefficients:
                                        Estimate Std. Error t value Pr(>|t|)
(Intercept)                             98.72898    1.47587  66.895  < 2e-16
duration_of_employment                  -0.13017    0.07935  -1.641  0.10197
family_members_spouseYes                 4.46020    1.54353   2.890  0.00415
current_hospital_postingHSIP             2.57927    1.34851   1.913  0.05677
current_hospital_postingHTM              0.87073    1.54378   0.564  0.57317
current_hospital_postingHUSM             0.74146    1.12609   0.658  0.51078
previous_experience_of_being_bulliedYes  2.01095    1.01586   1.980  0.04870
peers_friendsYes                         1.11174    1.05039   1.058  0.29075
                                           
(Intercept)                             ***
duration_of_employment                     
family_members_spouseYes                ** 
current_hospital_postingHSIP            .  
current_hospital_postingHTM                
current_hospital_postingHUSM               
previous_experience_of_being_bulliedYes *  
peers_friendsYes                           
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.96 on 291 degrees of freedom
Multiple R-squared:  0.07917,   Adjusted R-squared:  0.05702 
F-statistic: 3.574 on 7 and 291 DF,  p-value: 0.001048
rsq(mlr_1)
[1] 0.07916771
tidy(mlr_1, conf.int = TRUE)
# A tibble: 8 × 7
  term                 estimate std.error statistic   p.value conf.low conf.high
  <chr>                   <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1 (Intercept)            98.7      1.48      66.9   1.02e-178  95.8     102.    
2 duration_of_employm…   -0.130    0.0793    -1.64  1.02e-  1  -0.286     0.0260
3 family_members_spou…    4.46     1.54       2.89  4.15e-  3   1.42      7.50  
4 current_hospital_po…    2.58     1.35       1.91  5.68e-  2  -0.0748    5.23  
5 current_hospital_po…    0.871    1.54       0.564 5.73e-  1  -2.17      3.91  
6 current_hospital_po…    0.741    1.13       0.658 5.11e-  1  -1.47      2.96  
7 previous_experience…    2.01     1.02       1.98  4.87e-  2   0.0116    4.01  
8 peers_friendsYes        1.11     1.05       1.06  2.91e-  1  -0.956     3.18  
tbl_regression(mlr_1) %>%  add_glance_table(include = c(adj.r.squared)) %>% 
  bold_labels() %>% italicize_levels() %>% 
  as_gt() %>%
  gt::tab_header(title = "Table 1. Multiple Linear Regression Model 1",
                 subtitle = "Without Interaction")
Table 1. Multiple Linear Regression Model 1
Without Interaction
Characteristic Beta 95% CI1 p-value
duration_of_employment -0.13 -0.29, 0.03 0.10
family_members_spouse


    No
    Yes 4.5 1.4, 7.5 0.004
current_hospital_posting


    HRPZ
    HSIP 2.6 -0.07, 5.2 0.057
    HTM 0.87 -2.2, 3.9 0.6
    HUSM 0.74 -1.5, 3.0 0.5
previous_experience_of_being_bullied


    No
    Yes 2.0 0.01, 4.0 0.049
peers_friends


    No
    Yes 1.1 -0.96, 3.2 0.3
Adjusted R² 0.057

1 CI = Confidence Interval

Model 2 with interaction

mlr_2<-lm(literacy_score~ duration_of_employment + family_members_spouse + 
    current_hospital_posting + previous_experience_of_being_bullied + current_hospital_posting * peers_friends + peers_friends, data = data2)
summary(mlr_2)

Call:
lm(formula = literacy_score ~ duration_of_employment + family_members_spouse + 
    current_hospital_posting + previous_experience_of_being_bullied + 
    current_hospital_posting * peers_friends + peers_friends, 
    data = data2)

Residuals:
    Min      1Q  Median      3Q     Max 
-23.341  -4.946  -0.034   4.600  47.272 

Coefficients:
                                              Estimate Std. Error t value
(Intercept)                                   99.11538    1.53482  64.578
duration_of_employment                        -0.12880    0.07945  -1.621
family_members_spouseYes                       4.42397    1.55014   2.854
current_hospital_postingHSIP                   2.30526    1.69846   1.357
current_hospital_postingHTM                    1.00719    1.71626   0.587
current_hospital_postingHUSM                  -0.11865    1.33047  -0.089
previous_experience_of_being_bulliedYes        1.80133    1.03688   1.737
peers_friendsYes                               0.01111    1.80628   0.006
current_hospital_postingHSIP:peers_friendsYes  0.94493    2.81025   0.336
current_hospital_postingHTM:peers_friendsYes  -1.95424    3.99613  -0.489
current_hospital_postingHUSM:peers_friendsYes  3.01034    2.49465   1.207
                                              Pr(>|t|)    
(Intercept)                                    < 2e-16 ***
duration_of_employment                         0.10610    
family_members_spouseYes                       0.00463 ** 
current_hospital_postingHSIP                   0.17576    
current_hospital_postingHTM                    0.55776    
current_hospital_postingHUSM                   0.92900    
previous_experience_of_being_bulliedYes        0.08341 .  
peers_friendsYes                               0.99510    
current_hospital_postingHSIP:peers_friendsYes  0.73693    
current_hospital_postingHTM:peers_friendsYes   0.62519    
current_hospital_postingHUSM:peers_friendsYes  0.22853    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.97 on 288 degrees of freedom
Multiple R-squared:  0.08646,   Adjusted R-squared:  0.05474 
F-statistic: 2.726 on 10 and 288 DF,  p-value: 0.003237
rsq(mlr_2)
[1] 0.08645867

Based on literature review, hospital with good advocates for mental health will likely to have a strong and positive peer support. This insteraction were chosen.

tidy(mlr_2)
# A tibble: 11 × 5
   term                                   estimate std.error statistic   p.value
   <chr>                                     <dbl>     <dbl>     <dbl>     <dbl>
 1 (Intercept)                             99.1       1.53    64.6     2.28e-173
 2 duration_of_employment                  -0.129     0.0795  -1.62    1.06e-  1
 3 family_members_spouseYes                 4.42      1.55     2.85    4.63e-  3
 4 current_hospital_postingHSIP             2.31      1.70     1.36    1.76e-  1
 5 current_hospital_postingHTM              1.01      1.72     0.587   5.58e-  1
 6 current_hospital_postingHUSM            -0.119     1.33    -0.0892  9.29e-  1
 7 previous_experience_of_being_bulliedY…   1.80      1.04     1.74    8.34e-  2
 8 peers_friendsYes                         0.0111    1.81     0.00615 9.95e-  1
 9 current_hospital_postingHSIP:peers_fr…   0.945     2.81     0.336   7.37e-  1
10 current_hospital_postingHTM:peers_fri…  -1.95      4.00    -0.489   6.25e-  1
11 current_hospital_postingHUSM:peers_fr…   3.01      2.49     1.21    2.29e-  1
tbl_regression(mlr_2) %>%  add_glance_table(include = c(adj.r.squared)) %>% 
  bold_labels() %>% italicize_levels() %>% add_n() %>% 
  as_gt() %>%
  gt::tab_header(title = "Multiple Linear Regression Model",
                 subtitle = "With Interaction")
Multiple Linear Regression Model
With Interaction
Characteristic N Beta 95% CI1 p-value
duration_of_employment 299 -0.13 -0.29, 0.03 0.11
family_members_spouse 299


    No

    Yes
4.4 1.4, 7.5 0.005
current_hospital_posting 299


    HRPZ

    HSIP
2.3 -1.0, 5.6 0.2
    HTM
1.0 -2.4, 4.4 0.6
    HUSM
-0.12 -2.7, 2.5 >0.9
previous_experience_of_being_bullied 299


    No

    Yes
1.8 -0.24, 3.8 0.083
peers_friends 299


    No

    Yes
0.01 -3.5, 3.6 >0.9
current_hospital_posting * peers_friends 299


    HSIP * Yes
0.94 -4.6, 6.5 0.7
    HTM * Yes
-2.0 -9.8, 5.9 0.6
    HUSM * Yes
3.0 -1.9, 7.9 0.2
Adjusted R²
0.055

1 CI = Confidence Interval

1.4.3 Model Comparison

anova(mlr_1,mlr_2)
Analysis of Variance Table

Model 1: literacy_score ~ duration_of_employment + family_members_spouse + 
    current_hospital_posting + previous_experience_of_being_bullied + 
    peers_friends
Model 2: literacy_score ~ duration_of_employment + family_members_spouse + 
    current_hospital_posting + previous_experience_of_being_bullied + 
    current_hospital_posting * peers_friends + peers_friends
  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1    291 18438                           
2    288 18292  3    145.99 0.7662 0.5138

Model Assessment We took Model 1 (MLR model without interaction) as our preliminary model

1.4.4 Assessment of model fitness

prelim.final.m <- lm (literacy_score~ duration_of_employment + family_members_spouse + current_hospital_posting + peers_friends + current_hospital_posting + previous_experience_of_being_bullied, data = data2)
rsq(prelim.final.m)
[1] 0.07916771
tidy(prelim.final.m)
# A tibble: 8 × 5
  term                                    estimate std.error statistic   p.value
  <chr>                                      <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)                               98.7      1.48      66.9   1.02e-178
2 duration_of_employment                    -0.130    0.0793    -1.64  1.02e-  1
3 family_members_spouseYes                   4.46     1.54       2.89  4.15e-  3
4 current_hospital_postingHSIP               2.58     1.35       1.91  5.68e-  2
5 current_hospital_postingHTM                0.871    1.54       0.564 5.73e-  1
6 current_hospital_postingHUSM               0.741    1.13       0.658 5.11e-  1
7 peers_friendsYes                           1.11     1.05       1.06  2.91e-  1
8 previous_experience_of_being_bulliedYes    2.01     1.02       1.98  4.87e-  2

A diagnostic Plot

plot(prelim.final.m,which=1)

Comment

The red line through the scatter plot straight, horizontal and not curved indicating linearity assumption is satisfied. The homoscedasticity assumption is met as the residuals are equally spread around the y = 0 line.

plot(prelim.final.m,which=2)

In Q-Q plot, Only few data points (observation 13,56 and 39) have large residuals, while other observation lie well along the 45 degree line indicating normality holds

plot(prelim.final.m,which=3)

R flagged observations 13, 56, and 39. Besides that, we do see a horizontal line with randomly scattered data points around it, suggesting that the homoscedasticity assumption is satisfied.

plot(prelim.final.m,which=5)

B.Plot residuals against numerical independent variables (eg : Duration_of_employment) in the model to check for individual linearity. Residual vs Duration of employment

augment(prelim.final.m) %>%
  ggplot(aes(x = duration_of_employment, y = .resid))+
  geom_point()+
  geom_smooth()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

#Fitted values and residuals We produce diagnostic values to look for outliers or any influential observations within the model.

res.mod <- residuals(prelim.final.m)
head(res.mod)
        1         2         3         4         5         6 
 1.993493  4.863320 -5.719574  6.863320  7.870800 -4.277808 
hist(res.mod)

Histogram shows normally distributed

1.5 Prediction

data2.pred.res <- augment(prelim.final.m)
data2.pred.res
# A tibble: 299 × 12
   literacy_score duration_of_employment family_members_spouse
            <dbl>                  <dbl> <fct>                
 1            102                     10 No                   
 2            105                      9 No                   
 3            102                      9 Yes                  
 4            107                      9 No                   
 5            111                     10 No                   
 6             98                      8 No                   
 7             97                     10 No                   
 8             98                     23 No                   
 9            108                      9 No                   
10             84                      9 No                   
# ℹ 289 more rows
# ℹ 9 more variables: current_hospital_posting <fct>, peers_friends <fct>,
#   previous_experience_of_being_bullied <fct>, .fitted <dbl>, .resid <dbl>,
#   .hat <dbl>, .sigma <dbl>, .cooksd <dbl>, .std.resid <dbl>

1.6 Check for influential

#Testing for Influential Observations Keep standardized residuals between 2 and -2 (values above 2 or lower than −2 considered as influential observations)

non.influen.obs <- 
  data2.pred.res %>% 
  filter(.std.resid < 2 & .std.resid > -2 )

Re-run the Model with the non-influential observations (final model)

prelim.final.m2 <- lm (literacy_score~ duration_of_employment + family_members_spouse + current_hospital_posting + previous_experience_of_being_bullied + peers_friends, data = non.influen.obs)
summary ( prelim.final.m2)

Call:
lm(formula = literacy_score ~ duration_of_employment + family_members_spouse + 
    current_hospital_posting + previous_experience_of_being_bullied + 
    peers_friends, data = non.influen.obs)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.5639  -4.6679  -0.0265   4.3125  15.1355 

Coefficients:
                                        Estimate Std. Error t value Pr(>|t|)
(Intercept)                             98.64511    1.20382  81.943   <2e-16
duration_of_employment                  -0.07618    0.06512  -1.170   0.2431
family_members_spouseYes                 1.88305    1.27971   1.471   0.1423
current_hospital_postingHSIP             2.08555    1.11908   1.864   0.0634
current_hospital_postingHTM             -0.06297    1.28411  -0.049   0.9609
current_hospital_postingHUSM            -0.09498    0.91836  -0.103   0.9177
previous_experience_of_being_bulliedYes  1.83272    0.84150   2.178   0.0303
peers_friendsYes                         1.21412    0.86686   1.401   0.1624
                                           
(Intercept)                             ***
duration_of_employment                     
family_members_spouseYes                   
current_hospital_postingHSIP            .  
current_hospital_postingHTM                
current_hospital_postingHUSM               
previous_experience_of_being_bulliedYes *  
peers_friendsYes                           
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.402 on 279 degrees of freedom
Multiple R-squared:  0.06422,   Adjusted R-squared:  0.04074 
F-statistic: 2.735 on 7 and 279 DF,  p-value: 0.009244

#Model Assumptions for final model 1.Run Diagnostic Plots

plot(prelim.final.m2, which = 1)

Comment

The red line through the scatter plot straight, horizontal and not curved indicating linearity assumption is satisfied. The homoscedasticity assumption is met as the residuals are equally spread around the y = 0 line

plot(prelim.final.m2, which = 2)

Comment

In Q-Q plot, No obvious data point has large residuals, almost all observation lie well along the 45 degree line indicating normality holds

plot(prelim.final.m2, which = 3)

Comment

We do see a horizontal line with randomly scattered data points around it, suggesting that the homoscedasticity assumption is satisfied.

plot(prelim.final.m2, which = 5)

Plot residuals against numerical independent variables (eg : SBP and BMI) in the model to check for individual linearity.

plot(prelim.final.m2, which = 5)

Plot residuals against numerical independent variables (eg : Duration of employement) in the model to check for individual linearity.

augment(prelim.final.m2) %>%
  ggplot(aes(x =duration_of_employment, y = .resid))+
  geom_point()+
  geom_smooth()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Comment : Residuals for final model shows individual linearity against duration_of_employment.

Histogram of residual of final model

res.mod_2 <- residuals(prelim.final.m2)
head(res.mod_2)
        1         2         3         4         5         6 
 2.031155  4.954973 -2.974909  6.954973  7.984319 -3.953924 
hist(res.mod_2)

Comment : Residual of final model appear normally distributed

1.7 Final Model table

tbl_regression(prelim.final.m2) %>%  add_glance_table(include = c(adj.r.squared)) %>% 
  bold_labels() %>% italicize_levels() %>%
  as_gt() %>%
  gt::tab_header(title = "Multiple Linear Regression Model",
                 subtitle = "Without Interaction")
Multiple Linear Regression Model
Without Interaction
Characteristic Beta 95% CI1 p-value
duration_of_employment -0.08 -0.20, 0.05 0.2
family_members_spouse


    No
    Yes 1.9 -0.64, 4.4 0.14
current_hospital_posting


    HRPZ
    HSIP 2.1 -0.12, 4.3 0.063
    HTM -0.06 -2.6, 2.5 >0.9
    HUSM -0.09 -1.9, 1.7 >0.9
previous_experience_of_being_bullied


    No
    Yes 1.8 0.18, 3.5 0.030
peers_friends


    No
    Yes 1.2 -0.49, 2.9 0.2
Adjusted R² 0.041

1 CI = Confidence Interval

Model Equation literacy_score = 98.65 - 0.08 (Duration_of_employment) + 2.1 (HSIP) -0.0.6 (HTM) -0.09 (HUSM) + 1.8 (previous_experience_of_being_bulliedYes) + 1.9 (familymembers_spouseYes) + 1.2 (peers_friendsYes)

1.8 Interpretation

  1. The model explained 4.2% variability of mental health literacy score among house officer in the study sample after removing the influential residuals.
  2. For each year increase in duration of employment (or other unit of time), their literacy score slightly decreases by 0.08 points. This might suggest that the longer someone remains in their current employment, perhaps the less they engage in activities that boost their literacy, or it might reflect other life pressures.
  3. House officer working in HSIP has significantly increase a person’s literacy score by 2.1 points, indicating that this Hospital has a positive impact on mental health literacy compared to Hospital Tanah Merah which decrease 0.06 literacy score and Hospital Universiti Sains Malaysia which decrease 0.09 literacy score.
  4. House officer who has previously experienced being bullied, their literacy score is expected to increase by 1.8 units, relative to someone who has not been bullied, keeping all other factors the same.
  5. This suggests that having house officer who has family members or a spouse in medical field had increase of 1.9 in the literacy score, compared to not having family members or a spouse involved, other conditions remaining constant.
  6. House officer having a good peer support had mental health literacy score of increase 1.2 compared to house officer who didnt have peer support, when other varible are controlled

1.9 Thank You