Replace “Your Name” with your actual name.
This lab will focus on conducting multiple regression analyses and interpreting the coefficients (main effects) with a special emphasis on handling categorical variables using effect coding. You will work with various datasets to predict different outcomes, interpret the results, and understand how effect coding influences the interpretation of categorical variables.
Dataset: You are given a dataset with variables Work_Hours, Job_Complexity, Salary, and Job_Satisfaction. Your task is to predict Job_Satisfaction based on the other three predictors.
Dataset Creation:
# Create the dataset
set.seed(100)
data_ex1 <- data.frame(
Work_Hours = c(40, 35, 45, 50, 38, 42, 48, 37, 44, 41, 40, 35, 45, 50, 38, 42, 48, 37, 44, 41, 40, 35, 45, 50, 38, 42, 48, 37, 44, 41, 40, 35, 45, 50, 38, 42, 48, 37, 44, 41, 40, 35, 45, 50, 38, 42, 48, 37, 44, 41, 40, 35, 45, 50, 38, 42, 48, 37, 44, 41, 40, 35, 45, 50, 38, 42, 48, 37, 44, 41, 40, 35, 45, 50, 38, 42, 48, 37, 44, 41, 40, 35, 45, 50, 38, 42, 48, 37, 44, 41, 40, 35, 45, 50, 38, 42, 48, 37, 44, 41),
Job_Complexity = c(7, 6, 8, 9, 5, 7, 8, 6, 7, 8, 7, 6, 8, 9, 5, 7, 8, 6, 7, 8, 7, 6, 8, 9, 5, 7, 8, 6, 7, 8, 7, 6, 8, 9, 5, 7, 8, 6, 7, 8, 7, 6, 8, 9, 5, 7, 8, 6, 7, 8, 7, 6, 8, 9, 5, 7, 8, 6, 7, 8, 7, 6, 8, 9, 5, 7, 8, 6, 7, 8, 7, 6, 8, 9, 5, 7, 8, 6, 7, 8, 7, 6, 8, 9, 5, 7, 8, 6, 7, 8, 7, 6, 8, 9, 5, 7, 8, 6, 7, 8),
Salary = c(50000, 48000, 52000, 55000, 47000, 51000, 53000, 46000, 54000, 49500, 50000, 48000, 52000, 55000, 47000, 51000, 53000, 46000, 54000, 49500, 50000, 48000, 52000, 55000, 47000, 51000, 53000, 46000, 54000, 49500, 50000, 48000, 52000, 55000, 47000, 51000, 53000, 46000, 54000, 49500, 50000, 48000, 52000, 55000, 47000, 51000, 53000, 46000, 54000, 49500, 50000, 48000, 52000, 55000, 47000, 51000, 53000, 46000, 54000, 49500, 50000, 48000, 52000, 55000, 47000, 51000, 53000, 46000, 54000, 49500, 50000, 48000, 52000, 55000, 47000, 51000, 53000, 46000, 54000, 49500, 50000, 48000, 52000, 55000, 47000, 51000, 53000, 46000, 54000, 49500, 50000, 48000, 52000, 55000, 47000, 51000, 53000, 46000, 54000, 49500),
Job_Satisfaction = c(78, 72, 85, 80, 70, 82, 79, 75, 81, 76, 78, 72, 85, 80, 70, 82, 79, 75, 81, 76, 78, 72, 85, 80, 70, 82, 79, 75, 81, 76, 78, 72, 85, 80, 70, 82, 79, 75, 81, 76, 78, 72, 85, 80, 70, 82, 79, 75, 81, 76, 78, 72, 85, 80, 70, 82, 79, 75, 81, 76, 78, 72, 85, 80, 70, 82, 79, 75, 81, 76, 78, 72, 85, 80, 70, 82, 79, 75, 81, 76, 78, 72, 85, 80, 70, 82, 79, 75, 81, 76, 78, 72, 85, 80, 70, 82, 79, 75, 81, 76)
)
# View the first few rows of the dataset
head(data_ex1)
## Work_Hours Job_Complexity Salary Job_Satisfaction
## 1 40 7 50000 78
## 2 35 6 48000 72
## 3 45 8 52000 85
## 4 50 9 55000 80
## 5 38 5 47000 70
## 6 42 7 51000 82
Task:
1. Conduct a multiple regression analysis to predict Job_Satisfaction using Work_Hours, Job_Complexity, and Salary as predictors. Be sure to use the data argument in the lm() function.
2. Interpret the main effects of each predictor. What does each coefficient tell you about its relationship with Job_Satisfaction?
# Multiple regression model
mod.1 <- lm(Job_Satisfaction ~ Work_Hours + Job_Complexity + Salary, data = data_ex1)
summary(mod.1)
##
## Call:
## lm(formula = Job_Satisfaction ~ Work_Hours + Job_Complexity +
## Salary, data = data_ex1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.367 -2.304 -0.491 2.131 5.056
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.3102797 7.1988322 3.933 0.000159 ***
## Work_Hours -0.1148592 0.1737588 -0.661 0.510179
## Job_Complexity 1.3367244 0.4796182 2.787 0.006411 **
## Salary 0.0008867 0.0002455 3.612 0.000485 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.852 on 96 degrees of freedom
## Multiple R-squared: 0.5925, Adjusted R-squared: 0.5798
## F-statistic: 46.53 on 3 and 96 DF, p-value: < 2.2e-16
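The Salary estimate in the summary above looks tiny only because it is expressed per single dollar. As a rough sketch, the same model can be refit with salary in thousands of dollars to get a more readable slope. The column Salary_K and the model name mod.1k are hypothetical names introduced here; the data frame is rebuilt with rep() on the assumption that the lab's vectors are the same ten values repeated ten times.

```r
# Rebuild data_ex1 compactly (assumes the lab's vectors repeat these 10 values 10 times)
data_ex1 <- data.frame(
  Work_Hours       = rep(c(40, 35, 45, 50, 38, 42, 48, 37, 44, 41), times = 10),
  Job_Complexity   = rep(c(7, 6, 8, 9, 5, 7, 8, 6, 7, 8), times = 10),
  Salary           = rep(c(50000, 48000, 52000, 55000, 47000, 51000, 53000, 46000, 54000, 49500), times = 10),
  Job_Satisfaction = rep(c(78, 72, 85, 80, 70, 82, 79, 75, 81, 76), times = 10)
)

# Original model: Salary in dollars
mod.1 <- lm(Job_Satisfaction ~ Work_Hours + Job_Complexity + Salary, data = data_ex1)

# Salary_K is a hypothetical helper column: salary in thousands of dollars
data_ex1$Salary_K <- data_ex1$Salary / 1000
mod.1k <- lm(Job_Satisfaction ~ Work_Hours + Job_Complexity + Salary_K, data = data_ex1)

# The rescaled slope is 1000 times the per-dollar slope; the fit itself does not change
coef(mod.1)["Salary"] * 1000
coef(mod.1k)["Salary_K"]
```

Using the estimate reported in the summary (0.0008867 per dollar), this corresponds to roughly 0.89 points of job satisfaction per additional $1,000 of salary. Rescaling changes only the units of the coefficient, not the t value or p-value.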
Work_Hours: Not significant (p = 0.51).
Job_Complexity: For every 1 unit increase in job complexity, job satisfaction increases by 1.34, holding the other predictors constant.
Salary: For every 1 unit increase in salary (one dollar increase), job satisfaction increases by 0.0009, holding the other predictors constant.
Dataset: You are provided with a dataset containing Study_Hours, Attendance, Parent_Education_Level, and GPA. Your task is to predict GPA based on the other predictors.
Dataset Creation:
# Create the dataset with a larger sample size
set.seed(200)
data_ex2 <- data.frame(
Study_Hours = c(15, 12, 20, 18, 14, 17, 16, 13, 19, 14, 18, 16, 21, 13, 15, 20, 19, 18, 17, 16, 12, 14, 13, 20, 21, 22, 17, 19, 15, 16),
Attendance = c(90, 85, 95, 92, 88, 91, 89, 87, 93, 86, 91, 89, 95, 87, 90, 96, 94, 93, 89, 90, 85, 88, 87, 95, 96, 97, 92, 94, 88, 89),
Parent_Education_Level = factor(rep(c("High School", "College"), 15))
)
# Create GPA with stronger relationships to predictors for significance
# A factor cannot be multiplied directly, so as.numeric() converts it to its
# integer codes first (College = 1, High School = 2)
data_ex2$GPA <- 2.5 + 0.07 * data_ex2$Study_Hours + 0.03 * data_ex2$Attendance + 0.4 * as.numeric(data_ex2$Parent_Education_Level) + rnorm(30, 0, 0.1)
# View the first few rows of the dataset
head(data_ex2)
Task:
1. Conduct a multiple regression analysis to predict GPA using Study_Hours, Attendance, and Parent_Education_Level (coded as -1 for “High School” and 1 for “College”) as predictors.
2. Interpret the main effects. How does each predictor contribute to predicting GPA?
# Check the current level order of the factor
levels(data_ex2$Parent_Education_Level)
## [1] "College" "High School"
data_ex2$Parent_Education_Level_releveled <-
relevel(data_ex2$Parent_Education_Level, ref = "High School")
levels(data_ex2$Parent_Education_Level_releveled)
## [1] "High School" "College"
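Note that relevel() only changes which level serves as the reference under R's default dummy coding; it does not produce the -1/1 coding the task describes. A minimal sketch of the effect-coded fit follows. The names toy, Parent_Edu_Effect, mod.2, and mod.2_dummy are hypothetical, and the GPA values are made up purely so the block runs on its own; only the predictor values are copied from the first eight rows of data_ex2.

```r
# Toy data: predictors from the first 8 rows of data_ex2; GPA values are made up
toy <- data.frame(
  Study_Hours = c(15, 12, 20, 18, 14, 17, 16, 13),
  Attendance  = c(90, 85, 95, 92, 88, 91, 89, 87),
  Parent_Education_Level = factor(rep(c("High School", "College"), 4)),
  GPA = c(3.1, 2.9, 3.6, 3.4, 3.0, 3.3, 3.2, 2.9)  # hypothetical values
)

# Effect coding: -1 for "High School", 1 for "College"
toy$Parent_Edu_Effect <- ifelse(toy$Parent_Education_Level == "College", 1, -1)

# Effect-coded model
mod.2 <- lm(GPA ~ Study_Hours + Attendance + Parent_Edu_Effect, data = toy)

# Default dummy-coded model for comparison (reference level: "College")
mod.2_dummy <- lm(GPA ~ Study_Hours + Attendance + Parent_Education_Level, data = toy)

# The effect-coded slope is minus one half of the dummy-coded High School coefficient
coef(mod.2)["Parent_Edu_Effect"]
coef(mod.2_dummy)["Parent_Education_LevelHigh School"]
```

Under effect coding, the coefficient is half of the adjusted College-minus-High-School difference, and the intercept sits midway between the two groups instead of describing the reference group.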
Study_Hours: For every one unit increase in study hours (one hour), GPA increases by 0.05.
Attendance: For every one unit increase in attendance (one day), GPA increases by 0.04.
Parent_Education_Level: In comparison to having a parent with a college degree, having a parent with a high school degree is associated with a GPA that is 0.39 higher.
Dataset: You are provided with a dataset containing Exercise_Frequency, Diet_Quality, Sleep_Duration, and Health_Index. Your task is to predict Health_Index based on the other predictors.
Dataset Creation:
# Create the dataset with a larger sample size
set.seed(300)
data_ex3 <- data.frame(
Exercise_Frequency = c(4, 5, 3, 6, 2, 5, 4, 3, 5, 4, 6, 7, 3, 6, 2, 5, 7, 8, 4, 5, 3, 6, 7, 2, 4, 5, 6, 3, 7, 8),
Diet_Quality = c(8, 7, 9, 6, 5, 8, 7, 6, 8, 7, 9, 8, 6, 7, 5, 8, 9, 7, 8, 7, 9, 6, 8, 5, 7, 6, 9, 8, 7, 6),
Sleep_Duration = c(7, 8, 6, 7, 5, 8, 7, 6, 7, 7, 8, 7, 6, 7, 5, 8, 7, 8, 6, 7, 6, 7, 8, 5, 7, 8, 7, 6, 7, 8)
)
# Create Health_Index with stronger relationships to predictors for significance
data_ex3$Health_Index <- 50 + 2 * data_ex3$Exercise_Frequency + 1.5 * data_ex3$Diet_Quality + 1 * data_ex3$Sleep_Duration + rnorm(30, 0, 2)
# View the first few rows of the dataset
head(data_ex3)
## Exercise_Frequency Diet_Quality Sleep_Duration Health_Index
## 1 4 8 7 79.74758
## 2 5 7 8 80.22421
## 3 3 9 6 76.44698
## 4 6 6 7 79.40253
## 5 2 5 5 66.32989
## 6 5 8 8 83.13740
Task:
1. Conduct a multiple regression analysis to predict Health_Index using Exercise_Frequency, Diet_Quality, and Sleep_Duration as predictors.
2. How do the coefficients inform you about the relative importance of each predictor in determining health outcomes?
Standardizing the coefficients puts every predictor on a common scale, so you can compare the weight of each one and understand its relative effect.
# Multiple regression model
mod.3 <- lm(Health_Index ~ Diet_Quality + Sleep_Duration + Exercise_Frequency, data = data_ex3)
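The answer above refers to standardized coefficients, while mod.3 is fit on the raw scales. One common way to obtain standardized coefficients, sketched here on the assumption that z-scoring every variable with scale() is acceptable for this lab, is to refit the model on the standardized data. The names data_ex3_z and mod.3_z are hypothetical; the data are rebuilt from the lab's own creation code so the block runs by itself.

```r
# Recreate data_ex3 exactly as in the lab
set.seed(300)
data_ex3 <- data.frame(
  Exercise_Frequency = c(4, 5, 3, 6, 2, 5, 4, 3, 5, 4, 6, 7, 3, 6, 2, 5, 7, 8, 4, 5, 3, 6, 7, 2, 4, 5, 6, 3, 7, 8),
  Diet_Quality = c(8, 7, 9, 6, 5, 8, 7, 6, 8, 7, 9, 8, 6, 7, 5, 8, 9, 7, 8, 7, 9, 6, 8, 5, 7, 6, 9, 8, 7, 6),
  Sleep_Duration = c(7, 8, 6, 7, 5, 8, 7, 6, 7, 7, 8, 7, 6, 7, 5, 8, 7, 8, 6, 7, 6, 7, 8, 5, 7, 8, 7, 6, 7, 8)
)
data_ex3$Health_Index <- 50 + 2 * data_ex3$Exercise_Frequency + 1.5 * data_ex3$Diet_Quality + 1 * data_ex3$Sleep_Duration + rnorm(30, 0, 2)

# Raw-scale model, as in the lab
mod.3 <- lm(Health_Index ~ Diet_Quality + Sleep_Duration + Exercise_Frequency, data = data_ex3)

# z-score every column (mean 0, SD 1), then refit on the standardized data
data_ex3_z <- as.data.frame(scale(data_ex3))
mod.3_z <- lm(Health_Index ~ Diet_Quality + Sleep_Duration + Exercise_Frequency, data = data_ex3_z)

# Standardized slopes: change in Health_Index (in SDs) per 1 SD change in each predictor
round(coef(mod.3_z), 3)
```

Each standardized slope equals the raw slope multiplied by sd(predictor) / sd(outcome), so the ranking of predictors can differ from the ranking of the raw coefficients when the predictors have different spreads.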
Exercise_Frequency: For every one unit increase in exercise frequency (one hour), health index increases by 1.84.
Diet_Quality: For every one unit increase in diet quality, health index increases by 1.85.
Sleep_Duration: For every one unit increase in sleep duration (one hour), health index increases by 1.37.
Dataset: You have a dataset with variables Work_Experience, Education_Level, Gender, and Salary. The Gender variable is categorical with levels “Male” and “Female”.
Dataset Creation:
# Create the dataset with a larger sample size
set.seed(400)
data_ex4 <- data.frame(
Work_Experience = c(5, 7, 3, 6, 8, 4, 9, 6, 7, 5, 8, 9, 4, 6, 7, 5, 9, 10, 6, 7, 4, 5, 7, 6, 8, 9, 10, 5, 6, 8),
Education_Level = c(12, 14, 10, 16, 13, 15, 17, 12, 16, 14, 18, 19, 11, 14, 15, 13, 18, 20, 14, 15, 11, 13, 15, 14, 17, 18, 19, 13, 15, 17),
Gender = factor(rep(c("Male", "Female"), 15))
)
# Effect coding for Gender: -1 for Male, 1 for Female
data_ex4$Gender_Effect <- ifelse(data_ex4$Gender == "Male", -1, 1)
# Create Salary with stronger relationships to predictors for significance
data_ex4$Salary <- 30000 + 3000 * data_ex4$Work_Experience + 1500 * data_ex4$Education_Level + 5000 * data_ex4$Gender_Effect + rnorm(30, 0, 2000)
# View the first few rows of the dataset
head(data_ex4)
## Work_Experience Education_Level Gender Gender_Effect Salary
## 1 5 12 Male -1 55926.90
## 2 7 14 Female 1 78230.57
## 3 3 10 Male -1 51945.87
## 4 6 16 Female 1 75634.63
## 5 8 13 Male -1 67296.32
## 6 4 15 Female 1 66794.78
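The manual ifelse() recoding used for Gender_Effect can also be expressed through R's built-in sum-to-zero contrasts. The sketch below is only an illustration of that equivalence; Gender_sum and X are hypothetical names, and the factor is rebuilt here so the block runs independently of the lab's data frame.

```r
# Same factor as in the lab; levels sort alphabetically: "Female", "Male"
Gender <- factor(rep(c("Male", "Female"), 15))

# Manual effect code, as in the lab: -1 for Male, 1 for Female
Gender_Effect <- ifelse(Gender == "Male", -1, 1)

# Built-in alternative: attach sum-to-zero contrasts to a copy of the factor
Gender_sum <- Gender
contrasts(Gender_sum) <- contr.sum(2)
contrasts(Gender_sum)  # first level (Female) gets 1, last level (Male) gets -1

# The design-matrix column generated from the contrast matches the manual code
X <- model.matrix(~ Gender_sum)
all(X[, "Gender_sum1"] == Gender_Effect)
```

With contr.sum(), the last level of the factor always receives -1, so the sign of the resulting coefficient depends on the level order; checking contrasts() before interpreting the output avoids reading the sign backwards.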
Task:
1. Conduct a multiple regression analysis to predict Salary using Work_Experience, Education_Level, and Gender_Effect as predictors.
2. Interpret the coefficients, especially focusing on the effect of Gender_Effect.
3. Discuss how effect coding impacts the interpretation of the Gender_Effect variable.
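# Multiple regression model
# Note: Gender is entered as a factor here, so R dummy-codes it with "Female" as
# the reference level; this call does not use the effect-coded Gender_Effect column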
mod.4 <- lm(Salary ~ Work_Experience + Education_Level + Gender, data = data_ex4)
summary(mod.4)
##
## Call:
## lm(formula = Salary ~ Work_Experience + Education_Level + Gender,
## data = data_ex4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4401.7 -1568.7 165.7 1265.8 3439.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35249.9 2677.3 13.166 5.22e-13 ***
## Work_Experience 3501.8 434.2 8.064 1.52e-08 ***
## Education_Level 1239.9 317.0 3.912 0.000588 ***
## GenderMale -9647.4 767.9 -12.563 1.51e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2025 on 26 degrees of freedom
## Multiple R-squared: 0.9688, Adjusted R-squared: 0.9652
## F-statistic: 269 on 3 and 26 DF, p-value: < 2.2e-16
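The call above passes the factor Gender, so R applies its default dummy (treatment) coding with "Female" as the reference level, and the GenderMale row is the full adjusted Male-minus-Female difference. A sketch of the effect-coded fit that the task describes is given below. The name mod.4_effect is hypothetical, and the data are rebuilt from the lab's own creation code so the block runs by itself.

```r
# Recreate data_ex4 exactly as in the lab
set.seed(400)
data_ex4 <- data.frame(
  Work_Experience = c(5, 7, 3, 6, 8, 4, 9, 6, 7, 5, 8, 9, 4, 6, 7, 5, 9, 10, 6, 7, 4, 5, 7, 6, 8, 9, 10, 5, 6, 8),
  Education_Level = c(12, 14, 10, 16, 13, 15, 17, 12, 16, 14, 18, 19, 11, 14, 15, 13, 18, 20, 14, 15, 11, 13, 15, 14, 17, 18, 19, 13, 15, 17),
  Gender = factor(rep(c("Male", "Female"), 15))
)
data_ex4$Gender_Effect <- ifelse(data_ex4$Gender == "Male", -1, 1)  # -1 Male, 1 Female
data_ex4$Salary <- 30000 + 3000 * data_ex4$Work_Experience + 1500 * data_ex4$Education_Level + 5000 * data_ex4$Gender_Effect + rnorm(30, 0, 2000)

# Dummy-coded model, as fit in the lab (reference level: Female)
mod.4 <- lm(Salary ~ Work_Experience + Education_Level + Gender, data = data_ex4)

# Effect-coded model, using the numeric Gender_Effect column
mod.4_effect <- lm(Salary ~ Work_Experience + Education_Level + Gender_Effect, data = data_ex4)

# Gender_Effect slope = -1/2 * GenderMale; intercept = dummy intercept + 1/2 * GenderMale
coef(mod.4)[c("(Intercept)", "GenderMale")]
coef(mod.4_effect)[c("(Intercept)", "Gender_Effect")]
```

Both models give identical fitted values and identical slopes for Work_Experience and Education_Level; only the intercept and the gender term are re-expressed. Applying the relationship to the summary above, a GenderMale estimate of -9647.4 corresponds to a Gender_Effect slope of about 4823.7 (each group sits that far from the midpoint) and an intercept of about 30426.2, which is the midpoint between the two groups, not the value for females.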
Work_Experience: For every one unit increase in work experience (one year), salary increases by 3,502.
Education_Level: For every one unit increase in education level (one year), salary increases by 1,240.
Gender (GenderMale): In comparison to females, males have a salary that is 9,647 lower, holding work experience and education level constant.
Gender_Effect: The difference between male and female pay is larger than the effect of one additional year of education and one additional year of experience combined.
Submission Instructions:
Be sure to knit your document to HTML format, checking that all content is correctly displayed before submission.