Loading the necessary packages

library(car) #outlier detection by case
## Warning: package 'car' was built under R version 4.2.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.2.3
library(psych) #for descriptives and histogram
## Warning: package 'psych' was built under R version 4.2.3
## 
## Attaching package: 'psych'
## The following object is masked from 'package:car':
## 
##     logit
library(haven) #for reading SPSS .sav files
PSY772ProblemSet2 <- read_sav("C:/Users/John Majoubi/Downloads/PSY772ProblemSet2.sav")

1a) Creating a new dataset that removes the missing data

#1a
#keep columns 3-6 and drop any cases with missing data (listwise deletion)
listwise.PS2 = na.exclude(PSY772ProblemSet2[c(3:6)])
describe(listwise.PS2)
##          vars   n mean   sd median trimmed  mad  min   max range  skew kurtosis
## Enjoy       1 149 7.40 2.05   8.00    7.50 1.48 1.00 11.00 10.00 -0.50     0.34
## Happy       2 149 9.40 2.14  10.00    9.79 1.48 1.00 11.00 10.00 -1.47     1.79
## Focus       3 149 7.94 2.87   8.00    8.30 2.97 1.00 11.00 10.00 -0.85    -0.10
## SREISavg    4 149 3.62 0.39   3.68    3.63 0.39 2.63  4.42  1.79 -0.30    -0.43
##            se
## Enjoy    0.17
## Happy    0.18
## Focus    0.24
## SREISavg 0.03
#confirming the descriptives are unchanged with na.rm = FALSE, since missing cases were already removed listwise
describe(listwise.PS2, na.rm = F)
##          vars   n mean   sd median trimmed  mad  min   max range  skew kurtosis
## Enjoy       1 149 7.40 2.05   8.00    7.50 1.48 1.00 11.00 10.00 -0.50     0.34
## Happy       2 149 9.40 2.14  10.00    9.79 1.48 1.00 11.00 10.00 -1.47     1.79
## Focus       3 149 7.94 2.87   8.00    8.30 2.97 1.00 11.00 10.00 -0.85    -0.10
## SREISavg    4 149 3.62 0.39   3.68    3.63 0.39 2.63  4.42  1.79 -0.30    -0.43
##            se
## Enjoy    0.17
## Happy    0.18
## Focus    0.24
## SREISavg 0.03
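
As a quick sanity check (an addition here, not part of the original assignment), the number of cases removed by listwise deletion can be counted, assuming the original PSY772ProblemSet2 object is still loaded:

#count of cases dropped by listwise deletion
nrow(PSY772ProblemSet2) - nrow(listwise.PS2)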

Loading the four packages needed for the regression analysis

library(psych) #for descriptives
library(car) #for regression diagnostics (VIF)
library(ppcor) #for semipartial correlations as effect sizes
## Loading required package: MASS
## Warning: package 'MASS' was built under R version 4.2.3
library(lm.beta) #for standardized beta weights
## Warning: package 'lm.beta' was built under R version 4.2.3

Getting the column locations

names(listwise.PS2)
## [1] "Enjoy"    "Happy"    "Focus"    "SREISavg"

1C) Screening for outliers using Cook's distance statistics

# creating the regression model object; naming it 'Rate' (for rate of happiness) is fine for the purposes of this assignment
Rate = lm(Happy ~ Enjoy + Focus + SREISavg, data = listwise.PS2)
# screening for outliers using Cook's Distance
summary(cooks.distance(Rate))
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0000005 0.0001960 0.0012960 0.0094096 0.0056513 0.2128113

1D) There are no outliers in this model: the maximum Cook's distance (.213) is well below the conventional cutoff of 1.00.
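
One way to verify this programmatically (a supplementary sketch, not part of the original output) is to list any cases exceeding the cutoff:

#list any cases whose Cook's distance exceeds the 1.00 cutoff (none expected here)
which(cooks.distance(Rate) > 1)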

1E) Running the zero-order correlations (to check linearity of outcome prediction and to informally screen for collinearity and singularity among predictors)

#correlations on the listwise-deleted data
cor(listwise.PS2)
##              Enjoy      Happy     Focus   SREISavg
## Enjoy    1.0000000 0.33471188 0.1798894 0.13840350
## Happy    0.3347119 1.00000000 0.5063710 0.06416197
## Focus    0.1798894 0.50637101 1.0000000 0.21230915
## SREISavg 0.1384035 0.06416197 0.2123091 1.00000000
cor(listwise.PS2, use = "complete") #identical result, since the data are already complete
##              Enjoy      Happy     Focus   SREISavg
## Enjoy    1.0000000 0.33471188 0.1798894 0.13840350
## Happy    0.3347119 1.00000000 0.5063710 0.06416197
## Focus    0.1798894 0.50637101 1.0000000 0.21230915
## SREISavg 0.1384035 0.06416197 0.2123091 1.00000000

There appears to be linearity of outcome prediction, since predictor-outcome correlations of r > .20 exist (Enjoy and Focus). Separately, there appears to be no concern about collinearity or singularity, since all inter-predictor correlations are r < .60.
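
This screen could also be run programmatically; the sketch below (an addition, not original output) applies the same r < .60 cutoff stated above:

#check all inter-predictor correlations against the r < .60 collinearity cutoff
pred.cors = cor(listwise.PS2[c("Enjoy", "Focus", "SREISavg")])
any(abs(pred.cors[lower.tri(pred.cors)]) >= .60)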

1F) Creating a scatterplot array for all four variables

scatterplotMatrix(listwise.PS2)

There are no curvilinear relationships among these four variables.

Data Visualization

multi.hist(listwise.PS2)

All four distributions are reasonably close to Gaussian, though Happy shows some negative skew (-1.47).
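
As a more formal complement to eyeballing the histograms, Shapiro-Wilk tests could be run on each variable (a supplementary sketch; as.numeric strips any haven value labels):

#Shapiro-Wilk normality test p-values for each variable
sapply(listwise.PS2, function(x) shapiro.test(as.numeric(x))$p.value)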

1G) Assessing Collinearity Using Variance Inflation Factors (VIF)

vif(Rate)
##    Enjoy    Focus SREISavg 
## 1.044797 1.073156 1.058709

All VIFs are only slightly above 1.00; by the common standard of flagging VIFs ≥ 3.00 in this literature, there is no collinearity concern.
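
Equivalently, tolerance, the reciprocal of VIF, can be inspected; values near 1.00 indicate the predictors share little variance (a supplementary sketch):

#tolerance = 1 / VIF; values close to 1.00 indicate minimal shared variance
1 / vif(Rate)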

1H) Assessing Homogeneity of Residuals

describe(Rate$residuals)
##    vars   n mean   sd median trimmed  mad   min  max range  skew kurtosis   se
## X1    1 149    0 1.76   0.31    0.19 1.17 -6.39 3.33  9.73 -1.28     2.14 0.14

There appears to be homogeneity of residuals because the difference between the mean and the median is within half a standard deviation.
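
A residuals-versus-fitted plot is a complementary visual check of this assumption (a supplementary sketch using base R's plot method for lm objects):

#residuals vs. fitted values; a flat, even spread supports homoscedasticity
plot(Rate, which = 1)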

We have met all crucial assumptions.

1J) OLS multiple regression NHSTs

summary(Rate)
## 
## Call:
## lm(formula = Happy ~ Enjoy + Focus + SREISavg, data = listwise.PS2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.3931 -0.4797  0.3112  1.0783  3.3324 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.03408    1.41953   4.251 3.80e-05 ***
## Enjoy        0.27159    0.07317   3.712 0.000293 ***
## Focus        0.35433    0.05280   6.711 4.05e-10 ***
## SREISavg    -0.40144    0.38883  -1.032 0.303593    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.782 on 145 degrees of freedom
## Multiple R-squared:  0.3227, Adjusted R-squared:  0.3087 
## F-statistic: 23.03 on 3 and 145 DF,  p-value: 2.991e-12

1K) The model is a good fit for predicting the outcome, as the omnibus test shows a significant effect, with a multiple R-squared of .323.

#expected R-squared under the null hypothesis: k / (N - 1)
Expected_Value_R2 = (3/(149-1))
print(Expected_Value_R2)
## [1] 0.02027027

1L) The expected value of R\(^{2}\) under the null hypothesis is .020, far smaller than the observed R\(^{2}\) of .323.
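
The same quantity, k/(N - 1), can be computed directly from the fitted model instead of hard-coding k and N (a generic sketch):

#expected R-squared under the null: k predictors over N - 1 observations
k = length(coef(Rate)) - 1
N = nobs(Rate)
k / (N - 1)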

Requesting the standardized beta weights

lm.beta(Rate)
## 
## Call:
## lm(formula = Happy ~ Enjoy + Focus + SREISavg, data = listwise.PS2)
## 
## Standardized Coefficients::
## (Intercept)       Enjoy       Focus    SREISavg 
##          NA  0.25928713  0.47514190 -0.07260125

1M) Focalism (Focus) moves the outcome the most, with the largest standardized beta (.475). As focalism increases, so does the students' prediction of their happiness. Also, as current mood (Enjoy) increases, so does the students' prediction of their happiness.
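
As a sanity check on what the standardized betas mean (a one-SD increase in a predictor corresponds to beta SDs of change in the outcome), the same coefficients can be reproduced by regressing z-scored variables (a supplementary sketch):

#refitting on z-scored variables reproduces the standardized betas
lm(scale(Happy) ~ scale(Enjoy) + scale(Focus) + scale(SREISavg), data = listwise.PS2)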

1N) Requesting the semi-partial correlations as effect size

#request semipartial cor matrix
RateSPcor = spcor(listwise.PS2)
#display the semipartial correlations with the outcome, taken from the Happy row (row 2) of the estimate matrix
RateSPcor$estimate[2,]
##       Enjoy       Happy       Focus    SREISavg 
##  0.25366757  1.00000000  0.45866101 -0.07055955

Focalism predicts affective forecasting above and beyond the other predictors, as it has the largest semipartial correlation when controlling for the other two predictors, r\(_{a(b.cd)}\) = .459.
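
Squaring the semipartial correlations gives each predictor's unique share of outcome variance (a supplementary sketch): .459 squared is about .21, so focalism uniquely accounts for roughly 21% of the variance in affective forecasting.

#squared semipartials: each predictor's unique share of outcome variance
RateSPcor$estimate[2, ]^2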

APA Style Conclusion

1O) A multiple regression analysis on affective forecasting revealed a significant overall effect of the model, F(3, 145) = 23.03, p < .001, R\(^{2}\) = .323. Specifically, focalism was positively correlated with affective forecasting scores, b\(*\) = .475, t(145) = 6.71, p < .001, r\(_{a(b.cd)}\) = .459. That means as focalism increased, so did the students' affective forecasting (prediction of their happiness). Also, current mood was positively predictive of affective forecasting, b\(*\) = .259, t(145) = 3.71, p < .001, r\(_{a(b.cd)}\) = .253. This indicates that as students' current mood increased, so did their affective forecasting. Emotional intelligence was not a statistically significant predictor, p = .304.

1P) The study finds that there are some meaningful relationships between our variables of interest and the students' prediction of how happy they will be on summer vacation. Specifically, focalism (the tendency to place too much focus or emphasis on a single factor or piece of information when making judgments or predictions) is the best predictor of the students' emotional prediction. Students' mood (although not as strongly as focalism) was also a predictor of students' affective forecasting. Lastly, there was no link between emotional intelligence and students' predictions of their happiness on summer vacation.