library(ggplot2)
library(haven)
library(tidyverse)
wvs_usa <- read_sav("data/wvs-usa.sav")
pallant_cor <- read_sav("data/pallant-cor.sav")

Correlation

First, working more with continuous variables from the (11) Pallant correlation survey:

Does optimism seem to be correlated with more self-esteem? 
Is age correlated with self-control? 

For both of these, report the Pearson r coefficient, and the whole relationship, the way I write it out in the class examples.

Does optimism seem to be correlated with more self-esteem?

R provides the “cor()” function, which computes the Pearson correlation coefficient.

optimism_cor <- cor(pallant_cor$optimism, pallant_cor$self_esteem, use="complete.obs")

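# Classify strength on the absolute value so negative correlations are handled,
# and use contiguous cut-offs so values such as .495 do not fall through
r_size <- abs(optimism_cor)
if (r_size >= .5) {
    paste("Large correlation (DING!  DING!  DING!)")
} else if (r_size >= .3) {
    paste("Medium correlation")
} else if (r_size >= .1) {
    paste("Small correlation (don't get too excited)")
} else {
    paste("Negligible correlation")
}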
## [1] "Large correlation (DING!  DING!  DING!)"
optimism_cor
## [1] 0.5649459
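
As a rough sketch of what “cor()” computes: Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The vectors below are simulated stand-ins, not the Pallant data, and the names x, y, r_manual, and r_builtin are made up for illustration. The last lines show why use = "complete.obs" matters when a value is missing.

```r
# Simulated stand-in data; NOT the Pallant survey values
set.seed(42)
x <- rnorm(100, mean = 20, sd = 4)
y <- 0.6 * x + rnorm(100, mean = 10, sd = 3)

# Pearson r = covariance / (sd of x * sd of y)
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)

# The two values agree up to floating-point error
all.equal(r_manual, r_builtin)

# With a missing value, cor() returns NA unless incomplete pairs are dropped
x_missing <- x
x_missing[5] <- NA
cor(x_missing, y)                        # NA
cor(x_missing, y, use = "complete.obs")  # computed on the 99 complete pairs
```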

R also provides the “cor.test()” function, which reports more detail about a correlation, including the t statistic, degrees of freedom, p-value, and a 95% confidence interval.

optimism_test <- cor.test(pallant_cor$optimism, pallant_cor$self_esteem)

optimism_test
## 
##  Pearson's product-moment correlation
## 
## data:  pallant_cor$optimism and pallant_cor$self_esteem
## t = 14.214, df = 431, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4971777 0.6258629
## sample estimates:
##       cor 
## 0.5649459
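
The t statistic in this output can be recovered from r and the degrees of freedom, using t = r * sqrt(df) / sqrt(1 - r^2). A quick check with the values printed above (the variable names r, df, t_stat, and p_value are just for illustration):

```r
# Values copied from the cor.test() output above
r  <- 0.5649459
df <- 431

# t statistic for a Pearson correlation
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
round(t_stat, 3)   # 14.214, matching the printed t

# Two-sided p-value from the t distribution with df degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)
p_value < 2.2e-16  # TRUE, consistent with "p-value < 2.2e-16"
```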

A scatterplot of the distribution of values can also provide insight into the shape of a possible correlation.

ggplot(pallant_cor) +
    aes(x = optimism, y = self_esteem) +
    geom_point() +
    geom_smooth(method = lm)
## Don't know how to automatically pick scale for object of type haven_labelled/vctrs_vctr/double. Defaulting to continuous.
## Don't know how to automatically pick scale for object of type haven_labelled/vctrs_vctr/double. Defaulting to continuous.
## `geom_smooth()` using formula 'y ~ x'

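Our reading discussed the paired-samples t-test, which R provides through “t.test()” with paired = TRUE. Note that this test asks whether the mean difference between the two scores is zero, not whether the scores are correlated.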

t.test(pallant_cor$optimism, pallant_cor$self_esteem, paired = TRUE, alternative = "two.sided")
## 
##  Paired t-test
## 
## data:  pallant_cor$optimism and pallant_cor$self_esteem
## t = -51.036, df = 432, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.88173 -11.00049
## sample estimates:
## mean of the differences 
##               -11.44111
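
A small sketch of how a paired t-test and a correlation answer different questions, using simulated scores (the names before and after, and the numbers, are invented for illustration). Two measures can be almost perfectly correlated and still differ in their means, and the paired test is the same as a one-sample t-test on the differences.

```r
# Simulated paired scores; NOT the Pallant survey values
set.seed(3)
before <- rnorm(50, mean = 20, sd = 3)
after  <- before + 11 + rnorm(50, sd = 0.5)  # same ordering, shifted mean

# Strong correlation: the scores rise and fall together
cor(before, after)

# Paired t-test: is the mean of (before - after) different from zero?
paired <- t.test(before, after, paired = TRUE)

# Equivalent one-sample t-test on the differences
diffs <- t.test(before - after, mu = 0)
all.equal(unname(paired$statistic), unname(diffs$statistic))
```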

Is age correlated with self-control?

Other Tests

Next, think about the OTHER tests we’ve talked about (not necessarily just with continuous variables: think about the chi-square, the t-test, ANOVAs) to answer these questions:

Who has more stress, men or women?  
Does marital status seem to impact levels of optimism? 
Does marital status seem to impact levels of self-esteem?
Does having kids seem to affect your "total perceived stress"?

What are those differences, and are they significant? What kind of test did you use for this and why?

Write out the answers like I write out in the class examples…like you are writing a formal report.

You don’t need to prepare this as a PowerPoint this week; just send in your answers as part of a Word doc in the Assignments tab! Do also include your SPSS output.

===

If you’ve got two categorical/dichotomous variables (or you can reasonably and logically make them dichotomous), you do a “Chi-Square Test.”

Men/women and the question, “YES or NO: would you walk 5 miles through the snow for a bag of hot, salty, buttery popcorn?”

That’s a chi-square situation. And a heart risk.
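
A minimal sketch of the chi-square case in R. The counts in the table are invented purely for illustration, and the names popcorn and popcorn_test are hypothetical.

```r
# Invented counts for illustration only (rows: gender, columns: answer)
popcorn <- matrix(c(30, 20,
                    18, 32),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(gender = c("Men", "Women"),
                                  answer = c("Yes", "No")))

# Chi-square test of independence on the 2x2 table
# (R applies Yates' continuity correction to 2x2 tables by default)
popcorn_test <- chisq.test(popcorn)
popcorn_test$statistic  # chi-square value
popcorn_test$parameter  # degrees of freedom
popcorn_test$p.value
```

With real survey data, the table would usually come from table() applied to the two categorical columns rather than being typed in by hand.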

If you’ve got two variables, and one is nominal/dichotomous (2 values), and the other is continuous (like a scale), that’s an independent-samples t-test situation:

On a scale of “respect for the environment” where 3 = “less respect” and 10 = “more respect,” who respects the environment more: people in Poland or people in Latvia?

That’s a t-test situation.
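
A sketch of the same kind of comparison in R, using simulated scores (the data frame env, its columns, and the means are all invented for illustration).

```r
# Simulated scores; NOT real survey data
set.seed(1)
env <- data.frame(
    country = rep(c("Poland", "Latvia"), each = 40),
    respect = c(rnorm(40, mean = 7, sd = 1.5),
                rnorm(40, mean = 6, sd = 1.5))
)

# Independent-samples t-test: continuous outcome ~ two-level grouping variable
# (t.test() runs Welch's version by default; add var.equal = TRUE for the
# classic equal-variances test)
env_test <- t.test(respect ~ country, data = env)
env_test$estimate  # the two group means
env_test$p.value
```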

If you’ve got two variables, and one is nominal with 3-5 values, and the other is continuous, that’s an ANOVA situation.

On a scale of “respect for the environment” where 3 = “less respect” and 10 = “more respect,” who respects the environment more: people in Poland, people in Latvia, or people in Argentina?

This calls for an ANOVA.
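
A sketch of a one-way ANOVA in R on simulated scores (the data frame env3, its columns, and the group means are invented for illustration).

```r
# Simulated scores for three groups; NOT real survey data
set.seed(2)
env3 <- data.frame(
    country = factor(rep(c("Poland", "Latvia", "Argentina"), each = 30)),
    respect = c(rnorm(30, mean = 7,   sd = 1.5),
                rnorm(30, mean = 6,   sd = 1.5),
                rnorm(30, mean = 6.5, sd = 1.5))
)

# One-way ANOVA: continuous outcome ~ grouping variable with 3 levels
env_aov <- aov(respect ~ country, data = env3)
summary(env_aov)

# Tukey's HSD follow-up shows which pairs of countries differ
TukeyHSD(env_aov)
```

The ANOVA itself only says whether the group means differ somewhere; the follow-up comparison is what identifies which groups differ from which.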

You can probably figure out the LAST situation we still have to cover: what if you have 2 continuous variables that you want to compare?

Yes, dear readers: this IS the time for a “correlation.”