Psych 250C - Problem Set 5

Introduction

attitudedata <- read.csv('/Users/dgkamper/Library/Mobile Documents/com~apple~CloudDocs/Axis - HQ/PhD Terms/Classes/Spring 2024/Psych 250C/Problem Sets/Psych 250C Problems Sets/attitude.csv')

Question 1

Fit a “full model” predicting rating using all 6 other variables in the data set: complaints, privileges, learning, raises, critical, advance.

1.1) Provide the code and output a summary which includes coefficient estimates, ANOVA table, and R² results.

# Linear Model Predicting Rating
attitudemodel <- lm(rating ~ complaints + privileges + learning + raises + critical + advance, data = attitudedata)

attitudemodel_standarized <- lm(scale(rating) ~ scale(complaints) + scale(privileges) + scale(learning) + scale(raises) + scale(critical) + scale(advance), data = attitudedata)

options(scipen=999)

# Summary unstandardized
summary(attitudemodel)

## 
## Call:
## lm(formula = rating ~ complaints + privileges + learning + raises + 
##     critical + advance, data = attitudedata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.9418  -4.3555   0.3158   5.5425  11.5990 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.78708   11.58926   0.931 0.361634    
## complaints   0.61319    0.16098   3.809 0.000903 ***
## privileges  -0.07305    0.13572  -0.538 0.595594    
## learning     0.32033    0.16852   1.901 0.069925 .  
## raises       0.08173    0.22148   0.369 0.715480    
## critical     0.03838    0.14700   0.261 0.796334    
## advance     -0.21706    0.17821  -1.218 0.235577    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.068 on 23 degrees of freedom
## Multiple R-squared:  0.7326, Adjusted R-squared:  0.6628 
## F-statistic:  10.5 on 6 and 23 DF,  p-value: 0.0000124

# Summary standardized
summary(attitudemodel_standarized)

## 
## Call:
## lm(formula = scale(rating) ~ scale(complaints) + scale(privileges) + 
##     scale(learning) + scale(raises) + scale(critical) + scale(advance), 
##     data = attitudedata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.89889 -0.35781  0.02595  0.45533  0.95288 
## 
## Coefficients:
##                                 Estimate             Std. Error t value
## (Intercept)       -0.0000000000000009102  0.1060116360621554121   0.000
## scale(complaints)  0.6707252044117290035  0.1760887390206634096   3.809
## scale(privileges) -0.0734274276655839109  0.1364256721519378768  -0.538
## scale(learning)    0.3088702436569228382  0.1624904566210603474   1.901
## scale(raises)      0.0698117150313682655  0.1891757357915896109   0.369
## scale(critical)    0.0311997486054349411  0.1194905655336550993   0.261
## scale(advance)    -0.1834644500438649128  0.1506293303830832375  -1.218
##                   Pr(>|t|)    
## (Intercept)       1.000000    
## scale(complaints) 0.000903 ***
## scale(privileges) 0.595594    
## scale(learning)   0.069925 .  
## scale(raises)     0.715480    
## scale(critical)   0.796334    
## scale(advance)    0.235577    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5806 on 23 degrees of freedom
## Multiple R-squared:  0.7326, Adjusted R-squared:  0.6628 
## F-statistic:  10.5 on 6 and 23 DF,  p-value: 0.0000124

# ANOVA
anova(attitudemodel)
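The ANOVA table and the R² value reported by `summary()` carry the same information. As a minimal sketch, assuming attitude.csv has the same contents as R's built-in `attitude` data frame (used here only so the example runs without the file), R² can be rebuilt from the sums of squares in the table:

```r
# Assumption: attitude.csv matches R's built-in `attitude` data frame.
fit <- lm(rating ~ complaints + privileges + learning + raises + critical + advance,
          data = attitude)

tab <- anova(fit)                              # sequential (Type I) sums of squares
is_resid <- rownames(tab) == "Residuals"

ss_model <- sum(tab[["Sum Sq"]][!is_resid])    # explained by the six predictors together
ss_resid <- tab[["Sum Sq"]][is_resid]          # left unexplained

r2_from_anova <- ss_model / (ss_model + ss_resid)
print(r2_from_anova)
print(summary(fit)$r.squared)                  # should agree with the line above
```

The individual rows of this table are sequential, so they depend on the order in which the predictors were entered; only their total is order-free.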

1.2) Select one regression coefficient from the model (not the intercept) and write out an interpretation for the unstandardized coefficient, interpretation for the standardized coefficient, inference, and conclusion (see writing guide for distinction) relevant to that regression coefficient.

The regression coefficient I chose from the model is the coefficient for complaints.

The unstandardized regression coefficient for complaints is 0.61319. For two individuals who differ by one unit in complaints but have the same values of privileges, learning, raises, critical, and advance, ratings are expected to differ by 0.61319 units, with the individual who has the higher complaints score expected to have the higher rating. The coefficient is statistically significant, t(23) = 3.809, p = 0.000903. Therefore, at the conventional alpha level of 0.05, we reject the null hypothesis that the true partial unstandardized regression coefficient for complaints predicting rating is zero.

The standardized regression coefficient for complaints is 0.6707. For two individuals who differ by one standard deviation in complaints but have the same values of privileges, learning, raises, critical, and advance, ratings are expected to differ by 0.6707 standard deviations, with the individual who has the higher complaints score expected to have the higher rating. The test is the same as for the unstandardized coefficient, t(23) = 3.809, p = 0.000903, because standardizing only rescales the variables. Therefore, at the conventional alpha level of 0.05, we reject the null hypothesis that the true partial standardized regression coefficient for complaints predicting rating is zero.
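The two coefficients are linked by the standard deviations of the predictor and the outcome. A short sketch of that conversion, again assuming the built-in `attitude` data frame stands in for attitude.csv:

```r
# Assumption: attitude.csv matches R's built-in `attitude` data frame.
fit_raw <- lm(rating ~ complaints + privileges + learning + raises + critical + advance,
              data = attitude)
fit_std <- lm(scale(rating) ~ scale(complaints) + scale(privileges) + scale(learning) +
                scale(raises) + scale(critical) + scale(advance),
              data = attitude)

# Standardized slope = unstandardized slope * sd(predictor) / sd(outcome)
b_complaints    <- coef(fit_raw)[["complaints"]]
beta_complaints <- b_complaints * sd(attitude$complaints) / sd(attitude$rating)

print(beta_complaints)
print(coef(fit_std)[["scale(complaints)"]])    # same value, obtained by refitting
```

Because the conversion multiplies the estimate and its standard error by the same constant, the t statistic and p-value do not change.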

1.3) Conduct your regression diagnostics for this analysis. Look at dfbetas, linearity, normality, heterogeneity, and independence. Summarize your results as if you were including them in the results section of a paper (single paragraph form).

library(ggformula)

## Loading required package: ggplot2

## Loading required package: scales

## Loading required package: ggridges

## 
## New to ggformula?  Try the tutorials: 
##  learnr::run_tutorial("introduction", package = "ggformula")
##  learnr::run_tutorial("refining", package = "ggformula")

library(mosaic)

## Registered S3 method overwritten by 'mosaic':
##   method                           from   
##   fortify.SpatialPolygonsDataFrame ggplot2

## 
## The 'mosaic' package masks several functions from core packages in order to add 
## additional features.  The original behavior of these functions should not be affected by this.

## 
## Attaching package: 'mosaic'

## The following objects are masked from 'package:dplyr':
## 
##     count, do, tally

## The following object is masked from 'package:Matrix':
## 
##     mean

## The following object is masked from 'package:scales':
## 
##     rescale

## The following object is masked from 'package:ggplot2':
## 
##     stat

## The following objects are masked from 'package:stats':
## 
##     binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
##     quantile, sd, t.test, var

## The following objects are masked from 'package:base':
## 
##     max, mean, min, prod, range, sample, sum

library(car) #used for qqPlot function

## Loading required package: carData

## 
## Attaching package: 'car'

## The following objects are masked from 'package:mosaic':
## 
##     deltaMethod, logit

## The following object is masked from 'package:dplyr':
## 
##     recode

library(lmtest) # used for bptest and gqtest functions

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(olsrr) # used for measures of influence

## 
## Attaching package: 'olsrr'

## The following object is masked from 'package:datasets':
## 
##     rivers

attitudemodel <- lm(rating ~ complaints + privileges + learning + raises + critical + advance, data = attitudedata)

attitudedata$predictions <- predict(attitudemodel) #Predictions
attitudedata$residuals <- resid(attitudemodel) # Residuals

head(attitudedata) #Check dataset

library (ggplot2)

# Scatterplot with a model line to identify unusual cases
ggplot(attitudedata, aes(x = complaints, y = rating)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Scatterplot of Complaints vs. Rating",
       x = "Complaints",
       y = "Rating") +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text = element_text(size = 10, face = "plain"),
    axis.title = element_text(size = 11, face = "plain"),
    plot.title = element_text(size = 13, face = "bold", hjust = 0.5)
  )

## `geom_smooth()` using formula = 'y ~ x'

# Scatterplot for residuals versus observed data
ggplot(attitudedata, aes(x = rating, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Plot of Residuals vs. Observed Values",
       x = "Observed Rating",
       y = "Residuals") +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text = element_text(size = 10, face = "plain"),
    axis.title = element_text(size = 11, face = "plain"),
    plot.title = element_text(size = 13, face = "bold", hjust = 0.5)
  )

library(ggplot2)
library(reshape2)

dfbetas_df <- as.data.frame(dfbetas(attitudemodel))
dfbetas_df$Observation <- 1:nrow(dfbetas_df)
dfbetas_melted <- melt(dfbetas_df, id.vars = "Observation")

# Plotting DFbetas
ggplot(dfbetas_melted, aes(x = Observation, y = value, color = variable)) +
  geom_hline(yintercept = c(-1, 1), linetype = "dashed") +  # Dashed lines at -1 and 1
  geom_point(alpha = 0.5) +
  geom_smooth(se = FALSE, method = "loess") +
  labs(title = "DFBetas for each Observation and Predictor",
       x = "Observation Number",
       y = "DFBetas Value",
       color = "Predictor") +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text = element_text(size = 10, face = "plain"),
    axis.title = element_text(size = 11, face = "plain"),
    plot.title = element_text(size = 13, face = "bold", hjust = 0.5)
  )

## `geom_smooth()` using formula = 'y ~ x'

# Residuals vs. Fitted plot
plot(attitudemodel, which = 1)

# Normality: Checking the residuals - Q-Q plot
plot(attitudemodel, which = 2)

# Heteroskedasticity: Scale-Location plot
plot(attitudemodel, which = 3)

# Cook's Distance
plot(attitudemodel, which = 4)

# Influence: Residuals vs. Leverage plot
plot(attitudemodel, which = 5)

# Histogram
ggplot(attitudedata, aes(x = residuals)) +
  geom_histogram(bins = 30, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Histogram of Residuals", x = "Residuals", y = "Frequency") +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text = element_text(size = 10, face = "plain"),
    axis.title = element_text(size = 11, face = "plain"),
    plot.title = element_text(size = 13, face = "bold", hjust = 0.5)
  )

library(lmtest)
library(gvlma)

# Breusch-Pagan Test
bptest(attitudemodel)

## 
##  studentized Breusch-Pagan test
## 
## data:  attitudemodel
## BP = 10.257, df = 6, p-value = 0.1143

# Goldfeld-Quandt Test
gqtest(attitudemodel, point = 0.5, fraction = 0, alternative = "two.sided",
       order.by = NULL, data = attitudedata)

## 
##  Goldfeld-Quandt test
## 
## data:  attitudemodel
## GQ = 0.63396, df1 = 8, df2 = 8, p-value = 0.5338
## alternative hypothesis: variance changes from segment 1 to 2

I first inspected the data for unusual cases. The scatterplot of complaints against rating shows a positive linear relationship, represented by the fitted line with a positive slope. The plot of residuals against observed ratings shows a reasonably random scatter around the zero line, which suggests that the assumptions of linearity and homoscedasticity are generally met, although a few points stand apart from the rest. The remaining diagnostics examine these cases more closely.

The DFBETAS plot, which identifies observations that are influential for each coefficient, shows that most observations have little influence on the estimates: the majority of points lie within ±0.5 for all predictors, well inside the ±1 reference lines drawn on the plot. The smoothed lines for each predictor hover around zero, indicating no systematic pattern of influence across observation number. A few observations have DFBETAS values slightly beyond ±0.5, particularly for complaints and privileges. These cases merit closer inspection to determine whether they are outliers or high-leverage points that could distort the regression estimates.
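Reading individual cases off a plot is error-prone, so it can help to list the flagged cases directly. The following is a sketch rather than part of the original analysis; it assumes the built-in `attitude` data frame stands in for attitude.csv and uses the ±0.5 cutoff mentioned above as an adjustable `threshold`:

```r
# Assumption: attitude.csv matches R's built-in `attitude` data frame.
fit <- lm(rating ~ complaints + privileges + learning + raises + critical + advance,
          data = attitude)

db <- dfbetas(fit)          # one row per observation, one column per coefficient
threshold <- 0.5            # cutoff taken from the text; change to 1 for the plotted lines

# Row/column positions of every value beyond the cutoff
hits <- which(abs(db) > threshold, arr.ind = TRUE)

flagged <- data.frame(
  observation = rownames(db)[hits[, "row"]],
  coefficient = colnames(db)[hits[, "col"]],
  dfbetas     = db[hits],
  row.names   = NULL
)

# Largest absolute values first
print(flagged[order(-abs(flagged$dfbetas)), ])
```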

Turning to the remaining diagnostic plots, the residuals versus fitted values plot shows a relatively even spread of residuals across the range of fitted values, with the smoothed line close to zero, suggesting adequate model fit without obvious signs of non-linearity or heteroscedasticity. The Cook's distance plot flags observations 6, 1, and 9 as having the largest values, suggesting that they have the most influence on the model parameters. The Q-Q plot of standardized residuals shows slight deviations from normality in the tails, so the residuals are not perfectly normally distributed, which could affect inference. The plot labels observations 6, 23, and 12 as the most extreme residuals, falling above or below the reference line. The deviations in the tails suggest heavier tails than a normal distribution, that is, excess kurtosis, with more extreme values than normality would predict. The Scale-Location plot shows some increase in the spread of residuals at higher fitted values, hinting at possible heteroscedasticity despite the general homogeneity seen in the residuals versus fitted values plot. The histogram of residuals likewise shows a few values set apart from the bulk of the distribution.

The Residuals vs. Leverage plot, combined with Cook's distance, identifies the same observations as potentially influential, so they warrant consideration for exclusion to check the robustness of the model's conclusions. The Breusch-Pagan test was not significant, χ²(6) = 10.257, p = 0.1143, indicating no significant heterogeneity of variance in the residuals. The Goldfeld-Quandt test was likewise not significant, F(8, 8) = 0.63396, p = 0.5338, suggesting that the variance of the residuals is consistent across the two segments of the data compared by the test.
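None of the checks above addresses independence directly. One possible check is the Durbin-Watson statistic, sketched here by hand in base R. It is only informative if the row order of the data reflects something like collection order, which the source does not state, and it assumes the built-in `attitude` data frame stands in for attitude.csv:

```r
# Assumption: attitude.csv matches R's built-in `attitude` data frame,
# and the row order is meaningful (not stated in the source).
fit <- lm(rating ~ complaints + privileges + learning + raises + critical + advance,
          data = attitude)

e <- resid(fit)

# Durbin-Watson statistic: squared successive differences over squared residuals
dw <- sum(diff(e)^2) / sum(e^2)
print(dw)
```

The statistic ranges from 0 to 4; values near 2 indicate little first-order autocorrelation between adjacent residuals. This sketch gives the statistic only, not a p-value.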

1.4) Consider the dfbetas from this model and compare them to a model where all of the predictors and the outcomes have been standardized. Do the same observations have extreme dfbetas across the two models?

library(ggplot2)
library(reshape2)

# Unstandardized Model
dfbetas_df <- as.data.frame(dfbetas(attitudemodel))
dfbetas_df$Observation <- 1:nrow(dfbetas_df)
dfbetas_melted <- melt(dfbetas_df, id.vars = "Observation")

# Plotting DFbetas
ggplot(dfbetas_melted, aes(x = Observation, y = value, color = variable)) +
  geom_hline(yintercept = c(-1, 1), linetype = "dashed") +  # Dashed lines at -1 and 1
  geom_point(alpha = 0.5) +
  geom_smooth(se = FALSE, method = "loess") +
  labs(title = "DFBetas for each Observation and Predictor",
       x = "Observation Number",
       y = "DFBetas Value",
       color = "Predictor") +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text = element_text(size = 10, face = "plain"),
    axis.title = element_text(size = 11, face = "plain"),
    plot.title = element_text(size = 13, face = "bold", hjust = 0.5)
  )

## `geom_smooth()` using formula = 'y ~ x'

# Standardized Model
attitudemodel_standarized <- lm(scale(rating) ~ scale(complaints) + scale(privileges) + scale(learning) + scale(raises) + scale(critical) + scale(advance), data = attitudedata)

dfbetas_df_2 <- as.data.frame(dfbetas(attitudemodel_standarized))
dfbetas_df_2$Observation <- 1:nrow(dfbetas_df_2)
dfbetas_melted_2 <- melt(dfbetas_df_2, id.vars = "Observation")

# Plotting DFbetas
ggplot(dfbetas_melted_2, aes(x = Observation, y = value, color = variable)) +
  geom_hline(yintercept = c(-1, 1), linetype = "dashed") +  # Dashed lines at -1 and 1
  geom_point(alpha = 0.5) +
  geom_smooth(se = FALSE, method = "loess") +
  labs(title = "DFBetas for each Observation and Predictor",
       x = "Observation Number",
       y = "DFBetas Value",
       color = "Predictor") +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text = element_text(size = 10, face = "plain"),
    axis.title = element_text(size = 11, face = "plain"),
    plot.title = element_text(size = 13, face = "bold", hjust = 0.5)
  )

## `geom_smooth()` using formula = 'y ~ x'

dfbetas_melted_histogram_1 <- melt(dfbetas_df, id.vars = "Observation", variable.name = "Predictor", value.name = "DFBeta")

dfbetas_melted_histogram_2 <- melt(dfbetas_df_2, id.vars = "Observation", variable.name = "Predictor", value.name = "DFBeta")

# Plotting histogram of DFbetas Unstandardized
ggplot(dfbetas_melted_histogram_1, aes(x = DFBeta)) +
  geom_histogram(bins = 30, fill = "blue", color = "black") +
  labs(title = "Histogram of DFbetas Unstandardized", x = "DFBeta Value", y = "Frequency") +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text = element_text(size = 10, face = "plain"),
    axis.title = element_text(size = 11, face = "plain"),
    plot.title = element_text(size = 13, face = "bold", hjust = 0.5)
  )

# Plotting histogram of DFbetas Standardized
ggplot(dfbetas_melted_histogram_2, aes(x = DFBeta)) +
  geom_histogram(bins = 30, fill = "blue", color = "black") +
  labs(title = "Histogram of DFbetas Standardized", x = "DFBeta Value", y = "Frequency") +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text = element_text(size = 10, face = "plain"),
    axis.title = element_text(size = 11, face = "plain"),
    plot.title = element_text(size = 13, face = "bold", hjust = 0.5)
  )

From the plots, the DFBETAS for both models show similar trends across observations. The smoothed lines for each predictor follow roughly parallel paths in both plots, suggesting that standardizing the predictors and the outcome does not dramatically change which observations are identified as influential. When the same observations have extreme DFBETAS values (beyond ±0.5 or ±1) in both models, those observations disproportionately affect the regression coefficients regardless of the scale of the variables. In the unstandardized plot, two values appear to stand out beyond these thresholds, for the complaints and critical predictors, whereas in the standardized plot only one does, for the critical predictor. Because DFBETAS are already scaled by each coefficient's standard error, the values for the slopes do not change under linear rescaling of the variables, so an apparent difference of this kind most likely reflects the intercept, whose meaning changes once the variables are centered, rather than a real change in influence.

The two histograms show the same pattern: the unstandardized model has two outlying values, whereas the standardized model has a single outlying value in the right tail. In both models the DFBETAS values are concentrated near zero.
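The comparison can also be made numerically instead of by eye. A minimal sketch, assuming the built-in `attitude` data frame stands in for attitude.csv, that compares the slope DFBETAS of the two models column by column (the intercept column is set aside because centering changes what the intercept means):

```r
# Assumption: attitude.csv matches R's built-in `attitude` data frame.
fit_raw <- lm(rating ~ complaints + privileges + learning + raises + critical + advance,
              data = attitude)
fit_std <- lm(scale(rating) ~ scale(complaints) + scale(privileges) + scale(learning) +
                scale(raises) + scale(critical) + scale(advance),
              data = attitude)

db_raw <- dfbetas(fit_raw)
db_std <- dfbetas(fit_std)

# Column 1 is the intercept; the remaining columns are the six slopes
max_slope_diff     <- max(abs(db_raw[, -1] - db_std[, -1]))
max_intercept_diff <- max(abs(db_raw[, 1] - db_std[, 1]))

print(max_slope_diff)       # largest slope discrepancy between the two models
print(max_intercept_diff)   # intercept DFBETAS are free to differ
```

A slope discrepancy at the level of floating-point error means the same observations are extreme for the same predictors in both models.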

Question 2

Calculate eta-squared, partial eta-squared, and Cohen’s f for each predictor. Rank order the predictors based on how “important” they are according to these metrics (where 1 is most important and 6 is least important). Explain why the rank order of these variables will be the same across these 3 measures. Explain what differences there are between these 3 measures with respect to relative importance (e.g., Variable 1 is twice as important as Variable 2).

library(lsr)

# Linear Model Predicting Rating
attitudemodel <- lm(rating ~ complaints + privileges + learning + raises + critical + advance, data = attitudedata)

anova_model <- aov(attitudemodel)

attitudemodel_r2 <- summary(attitudemodel)$r.squared

print(attitudemodel_r2)

## [1] 0.732602

# Eta-squared and Partial Eta-squared
etaSquared(anova_model)

##                  eta.sq eta.sq.part
## complaints 0.1686772130 0.386807608
## privileges 0.0033678656 0.012438294
## learning   0.0420074921 0.135768408
## raises     0.0015832740 0.005886187
## critical   0.0007926205 0.002955437
## advance    0.0172470605 0.060591461

etaSquared_dataframe = data.frame(etaSquared(anova_model))

# Cohen's f-squared: eta-squared divided by the unexplained proportion (1 - R-squared)
etaSquared_dataframe$cohens_f2 = (etaSquared_dataframe$eta.sq / (1 - attitudemodel_r2))
print(etaSquared_dataframe)

##                  eta.sq eta.sq.part   cohens_f2
## complaints 0.1686772130 0.386807608 0.630809536
## privileges 0.0033678656 0.012438294 0.012594954
## learning   0.0420074921 0.135768408 0.157097252
## raises     0.0015832740 0.005886187 0.005921039
## critical   0.0007926205 0.002955437 0.002964197
## advance    0.0172470605 0.060591461 0.064499585

From these outputs, the rank ordering of the predictors is as follows:

	Eta-Squared	Partial Eta-Squared	Cohen’s F-squared
1) complaints	0.1686772130	0.386807608	0.630809536
2) learning	0.0420074921	0.135768408	0.157097252
3) advance	0.0172470605	0.060591461	0.064499585
4) privileges	0.0033678656	0.012438294	0.012594954
5) raises	0.0015832740	0.005886187	0.005921039
6) critical	0.0007926205	0.002955437	0.002964197

The rank order is the same across the three measures, as the table confirms, because all three are built from the same quantity: the sum of squares uniquely attributable to each predictor once the other five predictors are in the model. Eta-squared divides that sum of squares by the total sum of squares, so it is the proportion of the total variance in rating uniquely accounted for by the predictor. Partial eta-squared divides it by the predictor's sum of squares plus the residual sum of squares, so it is the proportion accounted for after the variance explained by the other predictors has been set aside. Cohen's f-squared divides it by the residual sum of squares, giving the ratio of variance explained by the predictor to unexplained variance. The total and residual sums of squares are the same for every predictor in the model, so each measure is an increasing function of the predictor's unique sum of squares, and ordering the predictors by any one of them gives the same ranking.

Although the rank order is identical, the numerical values differ, and so do the statements about relative importance. Eta-squared and Cohen's f-squared differ only by a constant factor (f-squared equals eta-squared divided by 1 - R²), so ratios between predictors are identical under the two measures: complaints is about 4.0 times as important as learning on either one (0.1687/0.0420 and 0.6308/0.1571). Partial eta-squared is always at least as large as eta-squared because its denominator is smaller, and the increase is proportionally greater for weaker predictors, so it compresses the ratios: complaints is only about 2.8 times as important as learning (0.3868/0.1358). The three measures therefore agree on which predictor matters more but not on how much more.
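The three measures can be rebuilt from sums of squares to make the shared numerator explicit. This is a base-R sketch, not the `lsr` implementation; it assumes the built-in `attitude` data frame stands in for attitude.csv and uses `drop1()` to obtain each predictor's unique sum of squares:

```r
# Assumption: attitude.csv matches R's built-in `attitude` data frame.
fit <- lm(rating ~ complaints + privileges + learning + raises + critical + advance,
          data = attitude)

d <- drop1(fit)                                   # row 1 is "<none>"; the rest drop one term each
ss_unique <- setNames(d[-1, "Sum of Sq"], rownames(d)[-1])
ss_resid  <- deviance(fit)                        # residual sum of squares
ss_total  <- sum((attitude$rating - mean(attitude$rating))^2)

effects <- data.frame(
  eta_sq         = ss_unique / ss_total,
  partial_eta_sq = ss_unique / (ss_unique + ss_resid),
  cohens_f2      = ss_unique / ss_resid
)
print(effects)

# Ranks (1 = most important) under each measure
ranks <- sapply(effects, function(x) rank(-x))
rownames(ranks) <- rownames(effects)
print(ranks)

# Ratio of complaints to learning under each measure
print(effects["complaints", ] / effects["learning", ])
```

The last line shows equal ratios for eta-squared and Cohen's f-squared and a smaller ratio for partial eta-squared.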

The matrices below give, for each pair of predictors, the ratio of the row predictor's value to the column predictor's value for eta-squared, partial eta-squared, and Cohen’s f-squared, respectively.

# Names of predictors, taken from the effect-size table so the labels match its row order
predictors <- rownames(etaSquared_dataframe)

# Relative importance matrix for eta-squared
relative_importance_matrix_etasquared <- outer(etaSquared_dataframe$eta.sq, etaSquared_dataframe$eta.sq, FUN = "/")

# Relative importance matrix for partial eta-squared
relative_importance_matrix_partialetasquared <- outer(etaSquared_dataframe$eta.sq.part, etaSquared_dataframe$eta.sq.part, FUN = "/")

# Relative importance matrix for Cohen's f-squared
relative_importance_matrix_cohensfsquared <- outer(etaSquared_dataframe$cohens_f2, etaSquared_dataframe$cohens_f2, FUN = "/")

# Set the row and column names
dimnames(relative_importance_matrix_etasquared) <- list(predictors, predictors)
dimnames(relative_importance_matrix_partialetasquared) <- list(predictors, predictors)
dimnames(relative_importance_matrix_cohensfsquared) <- list(predictors, predictors)

# Print the relative importance matrix
print(relative_importance_matrix_etasquared)

##             complaints privileges   learning      raises   critical    advance
## complaints 1.000000000  50.084306 4.01540784 106.5369686 212.809558 9.78005575
## privileges 0.019966334   1.000000 0.08017298   2.1271527   4.249027 0.19527186
## learning   0.249040705  12.473031 1.00000000  26.5320418  52.998242 2.43563198
## raises     0.009386413   0.470112 0.03769028   1.0000000   1.997518 0.09179964
## critical   0.004699037   0.235348 0.01886855   0.5006212   1.000000 0.04595684
## advance    0.102248906   5.121066 0.41057106  10.8932885  21.759544 1.00000000

print(relative_importance_matrix_partialetasquared)

##             complaints privileges   learning    raises   critical    advance
## complaints 1.000000000 31.0981231 2.84902515 65.714466 130.880010 6.38386338
## privileges 0.032156281  1.0000000 0.09161405  2.113133   4.208614 0.20528131
## learning   0.350997253 10.9153558 1.00000000 23.065597  45.938524 2.24071851
## raises     0.015217350  0.4732310 0.04335461  1.000000   1.991647 0.09714548
## critical   0.007640586  0.2376079 0.02176822  0.502097   1.000000 0.04877646
## advance    0.156644956  4.8713641 0.44628542 10.293840  20.501693 1.00000000

print(relative_importance_matrix_cohensfsquared)

##             complaints privileges   learning      raises   critical    advance
## complaints 1.000000000  50.084306 4.01540784 106.5369686 212.809558 9.78005575
## privileges 0.019966334   1.000000 0.08017298   2.1271527   4.249027 0.19527186
## learning   0.249040705  12.473031 1.00000000  26.5320418  52.998242 2.43563198
## raises     0.009386413   0.470112 0.03769028   1.0000000   1.997518 0.09179964
## critical   0.004699037   0.235348 0.01886855   0.5006212   1.000000 0.04595684
## advance    0.102248906   5.121066 0.41057106  10.8932885  21.759544 1.00000000

Question 3

Conduct a dominance analysis using all 6 predictors in the attitude model predicting rating. Report your code and output. Provide a rank order of the variables (where 1 is most important and 6 is least important) based on their importance according to the results (remember there might be “ties”).

library (domir)

# Dominance Analysis
domin(rating ~ complaints + privileges + learning + raises + critical + advance,
             lm,
             list(summary, "r.squared"), 
             data = attitudedata)

## Overall Fit Statistic:      0.732602 
## 
## General Dominance Statistics:
##            General Dominance Standardized Ranks
## complaints       0.370816194  0.506163235     1
## privileges       0.050903793  0.069483558     4
## learning         0.155765990  0.212620211     2
## raises           0.120345079  0.164270751     3
## critical         0.006588723  0.008993591     6
## advance          0.028182213  0.038468655     5
## 
## Conditional Dominance Statistics:
##                IVs: 1      IVs: 2      IVs: 3      IVs: 4       IVs: 5
## complaints 0.68131416 0.494246920 0.373554784 0.286700889 0.2204031958
## privileges 0.18157559 0.075443067 0.029225025 0.010689428 0.0051217859
## learning   0.38897445 0.227010614 0.136869545 0.084488083 0.0552457544
## raises     0.34826403 0.193532066 0.105971630 0.052691787 0.0200276856
## critical   0.02447321 0.006996336 0.004280956 0.002299817 0.0006893976
## advance    0.02405175 0.021636513 0.034375193 0.039288706 0.0324940601
##                  IVs: 6
## complaints 0.1686772130
## privileges 0.0033678656
## learning   0.0420074921
## raises     0.0015832740
## critical   0.0007926205
## advance    0.0172470605
## 
## Complete Dominance Designations:
##                    Dmnated?complaints Dmnated?privileges Dmnated?learning
## Dmnates?complaints                 NA               TRUE             TRUE
## Dmnates?privileges              FALSE                 NA            FALSE
## Dmnates?learning                FALSE               TRUE               NA
## Dmnates?raises                  FALSE                 NA            FALSE
## Dmnates?critical                FALSE              FALSE            FALSE
## Dmnates?advance                 FALSE                 NA            FALSE
##                    Dmnated?raises Dmnated?critical Dmnated?advance
## Dmnates?complaints           TRUE             TRUE            TRUE
## Dmnates?privileges             NA             TRUE              NA
## Dmnates?learning             TRUE             TRUE            TRUE
## Dmnates?raises                 NA             TRUE              NA
## Dmnates?critical            FALSE               NA              NA
## Dmnates?advance                NA               NA              NA

Question 4

Provide a reflection on whether the different variable importance metrics provided similar or dissimilar information about the variables in the model. Make sure to compare standardized coefficients, variance explained measures, and dominance analysis.

Psych 250C - Problem Set 5

305946624

May 2024

