data <- read.csv("C:\\Users\\91814\\Desktop\\Statistics\\nurses.csv")

Interaction Term: An interaction term is the product of two predictors. Here it captures how the effect of Location_Quotient on the response variable, Annual_Salary_Avg, changes with Employed_Standard_Error (and vice versa), rather than treating the two effects as purely additive.
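
As a small illustration of what the `*` operator does in an R formula, the sketch below builds a design matrix on made-up data. The data frame `toy` and its values are assumptions for demonstration only; the column names are borrowed from the model.

```r
# Synthetic stand-in data (hypothetical values, not taken from nurses.csv)
set.seed(1)
toy <- data.frame(
  Location_Quotient       = runif(5, 0.5, 1.5),
  Employed_Standard_Error = runif(5, 1, 10)
)

# In a formula, `a * b` expands to a + b + a:b
X <- model.matrix(~ Location_Quotient * Employed_Standard_Error, data = toy)
colnames(X)

# The interaction column equals the elementwise product of the two predictors
all.equal(
  unname(X[, "Location_Quotient:Employed_Standard_Error"]),
  toy$Location_Quotient * toy$Employed_Standard_Error
)
```

The design matrix therefore contains both main effects as well as the product column, which is why the model summary reports three coefficients for these two variables.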

Including Hourly_Wage_Avg as a predictor estimates its direct association with Annual_Salary_Avg while controlling for the other variables, that is, how much annual salary changes for each one-unit change in hourly wage.
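
One caveat may be worth checking here. If annual salary in this dataset is derived from hourly wage under a full-time convention of 40 hours per week for 52 weeks (an assumption about the data, not something the file confirms), the hourly wage coefficient would be expected to land near 2080 and the fit would be nearly perfect. The sketch below uses synthetic values to show that pattern; `hourly` and `annual` are hypothetical names.

```r
# Assumed full-time convention
hours_per_week <- 40
weeks_per_year <- 52
hours_per_year <- hours_per_week * weeks_per_year
hours_per_year  # 2080

# Synthetic illustration: annual salary built from hourly wage plus small noise
set.seed(42)
hourly <- runif(100, 25, 60)
annual <- hours_per_year * hourly + rnorm(100, sd = 5)

fit <- lm(annual ~ hourly)
coef(fit)["hourly"]      # close to 2080
summary(fit)$r.squared   # very close to 1
```

If the real data behave this way, Hourly_Wage_Avg explains almost all of the variation in Annual_Salary_Avg, and the remaining predictors are fitted to what is left over, possibly little more than rounding.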

lm_model <- lm(Annual_Salary_Avg ~ Location_Quotient * Employed_Standard_Error + Hourly_Wage_Avg, data = data)

summary(lm_model)
## 
## Call:
## lm(formula = Annual_Salary_Avg ~ Location_Quotient * Employed_Standard_Error + 
##     Hourly_Wage_Avg, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16.2337  -4.8747  -0.0873   5.1256  14.8843 
## 
## Coefficients:
##                                             Estimate Std. Error   t value
## (Intercept)                                  2.13585    2.91288     0.733
## Location_Quotient                           -3.60681    2.09060    -1.725
## Employed_Standard_Error                     -0.44969    0.28979    -1.552
## Hourly_Wage_Avg                           2080.00753    0.04828 43084.177
## Location_Quotient:Employed_Standard_Error    0.65412    0.29730     2.200
##                                           Pr(>|t|)    
## (Intercept)                                 0.4637    
## Location_Quotient                           0.0850 .  
## Employed_Standard_Error                     0.1213    
## Hourly_Wage_Avg                             <2e-16 ***
## Location_Quotient:Employed_Standard_Error   0.0282 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.506 on 587 degrees of freedom
##   (650 observations deleted due to missingness)
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 4.94e+08 on 4 and 587 DF,  p-value: < 2.2e-16
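
With an interaction in the model, the Location_Quotient coefficient alone is not its overall effect. The slope depends on Employed_Standard_Error. A minimal sketch of that arithmetic follows, using the estimates printed above. The values in `ese_values` are hypothetical, chosen only to illustrate the calculation.

```r
# Coefficients copied from the summary output
b_lq  <- -3.60681   # Location_Quotient
b_int <-  0.65412   # Location_Quotient:Employed_Standard_Error

# Conditional slope of Location_Quotient at a given Employed_Standard_Error
slope_lq <- function(ese) b_lq + b_int * ese

# Hypothetical values of Employed_Standard_Error, for illustration only
ese_values <- c(2, 5, 10)
data.frame(Employed_Standard_Error = ese_values,
           slope = slope_lq(ese_values))

# Value of Employed_Standard_Error at which the slope changes sign
-b_lq / b_int   # about 5.5
```

Under these estimates the association between Location_Quotient and Annual_Salary_Avg is negative when Employed_Standard_Error is low and positive when it is high, with the sign change at roughly 5.5.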

To visualize how the two interacting predictors relate to the dependent variable, we can plot Annual_Salary_Avg against Location_Quotient in a scatterplot and color the points by Employed_Standard_Error.

# Load the required library
library(ggplot2)

# Scatterplot of Annual_Salary_Avg vs. Location_Quotient, colored by Employed_Standard_Error
ggplot(data, aes(x = Location_Quotient, y = Annual_Salary_Avg, color = Employed_Standard_Error)) +
  geom_point() +
  labs(title = "Interaction Effect on Annual Salary",
       x = "Location Quotient",
       y = "Annual Salary Avg",
       color = "Employed Standard Error")
## Warning: Removed 650 rows containing missing values (`geom_point()`).
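
The warning reflects rows with missing values in the plotted variables. As a rough sketch of how such rows can be counted before plotting or modeling, the example below uses a small made-up data frame. `toy` and its values are hypothetical; with the real data the same calls would be applied to `data`.

```r
# Hypothetical data with some missing values
toy <- data.frame(
  Annual_Salary_Avg       = c(70000, NA, 82000, 91000),
  Location_Quotient       = c(1.1, 0.9, NA, 1.0),
  Employed_Standard_Error = c(2.5, 3.0, 4.1, NA)
)

vars <- c("Annual_Salary_Avg", "Location_Quotient", "Employed_Standard_Error")

# Rows with at least one NA among the variables of interest
n_dropped <- sum(!complete.cases(toy[, vars]))
n_dropped

# Missing values per column, to see which variable drives the loss
colSums(is.na(toy[, vars]))
```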

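Interaction Term (between Location_Quotient and Employed_Standard_Error): The estimate is 0.654 (p = 0.0282), significant at the 5% level. The slope of Annual_Salary_Avg on Location_Quotient increases by about 0.65 for each one-unit increase in Employed_Standard_Error.

Hourly Wage Average: The estimate is 2080.008 (p < 2e-16). Each one-unit increase in Hourly_Wage_Avg is associated with an increase of about 2080 in Annual_Salary_Avg. With a t value of about 43,084, this predictor dominates the model, whose R-squared rounds to 1.

Employed Standard Error: The main-effect estimate is -0.450 (p = 0.1213), which is not significant at the 5% level. Because the model includes the interaction, this coefficient is the slope for Employed_Standard_Error when Location_Quotient equals 0.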

lm_model <- lm(Annual_Salary_Avg ~ Location_Quotient * Employed_Standard_Error + Hourly_Wage_Avg, data = data)

# Diagnostic plots
par(mfrow=c(2, 2)) # Arrange plots in a 2x2 grid

# Residuals vs Fitted Values Plot
plot(lm_model, which = 1)

# Normal Q-Q Plot
plot(lm_model, which = 2)

# Scale-Location Plot
plot(lm_model, which = 3)

# Residuals vs Leverage Plot
plot(lm_model, which = 5)

# Restore the default single-plot layout
par(mfrow = c(1, 1))

Scale-Location Plot: