CMPU4091 Visualising Data

Correlation Analysis

Author

Fajar Albalushi


1. Introduction

This report explores relationships between psychological well-being variables, focusing on correlations. The dataset was adapted from the SPSS survey.sav file.


Concept | Scale Used | Type
Optimism | 6-item Life Orientation Test, developed by Scheier and Carver (1985) | Continuous - Ratio
Perceived Control of Internal States | 18-item PCOISS, developed by Pallant (2000) | Continuous - Ratio
Perceived Stress | 10-item Perceived Stress Scale, developed by Cohen, Kamarck and Mermelstein (1983) | Continuous - Ratio
Life Satisfaction | 5-item Satisfaction with Life Scale, developed by Diener, Emmons, Larson and Griffin (1985) | Continuous

Table 1: Psychological Factors of Interest

2. Exploration

2.1 Histograms with Density Plots

This section explores the distributions of four key psychological well-being measures: Optimism, Positive Affect, Negative Affect, and Mastery. Each histogram shows how scores are spread across participants, and an overlaid density curve shows the overall shape of each distribution.
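The histograms in this section use `y = after_stat(density)`, which ggplot2 documents as the density of points in each bin, scaled to integrate to 1. Each bar's height is therefore its count divided by (number of observations × bin width), which puts the bars on the same scale as the density curve. A minimal base-R sketch of that scaling, using made-up scores rather than the survey data:

```r
# Hypothetical integer scores (not survey data), binned with width 1
scores <- c(12, 15, 15, 16, 18, 18, 18, 20, 21, 24)
h <- hist(scores, breaks = seq(11.5, 24.5, by = 1), plot = FALSE)

# On the density scale, each bar height is count / (n * bin width)
bin_width <- diff(h$breaks)
manual_density <- h$counts / (length(scores) * bin_width)
all.equal(h$density, manual_density)

# The bar areas (height * width) therefore sum to 1,
# the same total area as under a density curve
sum(h$density * bin_width)
```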

Code
# Load plotting libraries (ggplot2 for the plots, cowplot for plot_grid)
library(ggplot2)
library(cowplot)

# Listwise deletion: drop every row that has an NA in any column
survey <- na.omit(survey)

# Shared theme for the four histograms
hist_theme <- theme_minimal() +
  theme(
    plot.title = element_text(size = 10, face = "bold"),
    axis.title = element_text(size = 9),
    axis.text = element_text(size = 8),
    legend.text = element_text(size = 8),
    plot.margin = margin(5, 10, 5, 5)
  )

# Histogram on the density scale with a kernel density curve overlaid
density_hist <- function(var, colour, title, x_label) {
  ggplot(survey, aes(x = .data[[var]])) +
    geom_histogram(aes(y = after_stat(density)), binwidth = 1, fill = colour, alpha = 0.5) +
    geom_density(color = colour, alpha = 0.7) +
    labs(title = title, x = x_label, y = "Density") +
    hist_theme
}

plot1 <- density_hist("toptim",  "blue",   "Total Optimism",        "Optimism Score")
plot2 <- density_hist("tposaff", "green",  "Total Positive Affect", "Positive Affect Score")
plot3 <- density_hist("tnegaff", "red",    "Total Negative Affect", "Negative Affect Score")
plot4 <- density_hist("tmast",   "purple", "Total Mastery",         "Mastery Score")

# Combine histograms in a 2 x 2 grid
plot_grid(plot1, plot2, plot3, plot4, ncol = 2)
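The call `survey <- na.omit(survey)` in the code above performs listwise deletion across every column of `survey`, so a participant missing a value on any variable, even one that is never plotted, is dropped from all later analyses. A minimal sketch on a made-up data frame contrasts this with checking only the analysed columns via `complete.cases()`; the column `extra` is hypothetical, and which approach is appropriate depends on the analysis:

```r
# Hypothetical data frame: 'extra' stands in for a column that is never plotted
df <- data.frame(
  toptim = c(20, 25, NA, 18, 22),
  tmast  = c(21, 24, 19, 17, NA),
  extra  = c(NA, 1, 2, 3, 4)
)

# na.omit() drops any row with an NA in ANY column (listwise deletion)
n_listwise <- nrow(na.omit(df))

# complete.cases() on the analysed columns only keeps row 1 as well
cols <- c("toptim", "tmast")
n_analysed <- nrow(df[complete.cases(df[, cols]), ])

c(listwise = n_listwise, analysed_only = n_analysed)
```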

Observations:

Key Findings

  1. Total Optimism (Blue - Top Left)

    • Follows an approximately normal distribution with a slight right skew.

    • Most scores range between 15 and 25, indicating moderate to high optimism.

  2. Total Positive Affect (Green - Top Right)

    • Nearly normal distribution, with scores peaking around 30-40.

    • Suggests that most respondents experience frequent positive emotions.

  3. Total Negative Affect (Red - Bottom Left)

    • Highly right-skewed, meaning most respondents report low negative affect.

    • Only a few individuals experience frequent distress.

  4. Total Mastery (Purple - Bottom Right)

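    • Appears bimodal, with peaks near 15 and 25.

    • This may indicate two subgroups, one with lower and one with higher perceived mastery, although narrow (width 1) bins can also produce peaks that reflect sampling noise rather than distinct groups.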
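The shape descriptions above (right skew, approximate normality) are read from the plots; a numeric summary can back them up. A minimal sketch of the moment-based sample skewness g1 = m3 / m2^(3/2) in base R; the function name `skewness` is our own, not a package function:

```r
# Moment-based sample skewness: g1 = m3 / m2^(3/2)
# (a hypothetical helper, not taken from a package)
skewness <- function(x) {
  x <- x[!is.na(x)]          # ignore missing values
  d <- x - mean(x)           # deviations from the mean
  mean(d^3) / mean(d^2)^(3 / 2)
}

skewness(c(1, 2, 3, 4, 5))          # symmetric values give 0
skewness(c(1, 1, 1, 2, 2, 3, 10))   # a long right tail gives a positive value
```

Applied to the report's data, `sapply(survey[c("toptim", "tposaff", "tnegaff", "tmast")], skewness)` would give one value per scale, with positive values indicating a right skew. Skewness does not detect bimodality, so the Mastery histogram still needs to be read visually.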

2.2 Bar Chart for Smoking Status

This bar chart shows the number of respondents who smoke (YES) and who do not smoke (NO).

Code
# Convert 'smoke' to a factor for categorical plotting
survey$smoke <- as.factor(survey$smoke)

ggplot(survey, aes(x = smoke, fill = smoke)) +
  geom_bar(alpha = 0.7) +
  labs(title = "Distribution of Smoking Status", x = "Smoking Status", y = "Count") +
  scale_fill_manual(values = c("lightblue", "darkblue")) + # Custom colors
  theme_minimal() +
  theme(
    plot.title = element_text(size = 10, face = "bold"),
    axis.title = element_text(size = 9),
    axis.text = element_text(size = 8),
    legend.text = element_text(size = 8),
    plot.margin = margin(5, 10, 5, 5)
  )
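The bar heights in this chart are simply the counts of each smoking category. A minimal base-R sketch of the same tabulation, with percentages added; the `smoke` vector here is made-up example data standing in for `survey$smoke`:

```r
# Hypothetical smoking-status values standing in for survey$smoke
smoke <- factor(c("NO", "NO", "YES", "NO", "YES", "NO"))

counts <- table(smoke)                         # bar heights in the chart
percent <- round(100 * prop.table(counts), 1)  # same counts as percentages

counts
percent
```

On the real data, `table(survey$smoke)` would give the exact counts the bars display.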

Observations (these refer to the optimism boxplot in Section 2.3):

Key Findings

  1. Median Optimism

    • Smokers appear to have a slightly higher median optimism score than non-smokers, though the difference is minimal.

  2. Spread & Variability

    • Both groups show a similar interquartile range (IQR), suggesting similar variability in optimism scores.

    • The whiskers are of roughly equal length, indicating a comparable overall spread.

  3. Outliers

    • Non-smokers have two low-score outliers (below 10), indicating a few individuals with very low optimism.

    • Smokers do not show any visible outliers, suggesting a more consistent optimism range.
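The median comparison above is visual only; whether the small difference exceeds chance variation could be checked with a two-sample test. A minimal sketch on simulated data (the data frame `sim` is made up to mimic the `toptim` and `smoke` columns), running a Wilcoxon rank-sum test and a Welch t-test:

```r
# Simulated data mimicking the 'smoke' and 'toptim' columns (not survey data)
set.seed(42)
sim <- data.frame(
  smoke  = factor(rep(c("NO", "YES"), times = c(150, 50))),
  toptim = round(c(rnorm(150, mean = 22, sd = 4), rnorm(50, mean = 22.5, sd = 4)))
)

# Wilcoxon rank-sum test: compares the groups without assuming normality;
# exact = FALSE uses the normal approximation, which handles tied scores
wil <- wilcox.test(toptim ~ smoke, data = sim, exact = FALSE)

# Welch two-sample t-test (t.test's default, no equal-variance assumption)
wel <- t.test(toptim ~ smoke, data = sim)

c(wilcoxon_p = wil$p.value, welch_p = wel$p.value)
```

Replacing `sim` with `survey` would apply the same tests to the real data; the simulated p-values themselves carry no meaning.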

2.3 Boxplot: Optimism vs Smoking Status

This boxplot compares total optimism scores between smokers (YES) and non-smokers (NO).

Code
# Load ggplot2
library(ggplot2)

ggplot(survey, aes(x = smoke, y = toptim, fill = smoke)) +
  geom_boxplot(alpha = 0.8, width = 0.6, outlier.color = "red", outlier.shape = 16, outlier.size = 3) + # Outliers as large solid red points
  labs(title = "Comparison of Optimism Between Smokers and Non-Smokers", 
       x = "Smoking Status", 
       y = "Total Optimism") +
  scale_fill_manual(values = c("lightblue", "darkblue")) + # Custom colors
  theme_minimal() +
  theme(
    plot.title = element_text(size = 11, face = "bold"),
    axis.title = element_text(size = 9),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 8),
    legend.position = "none"
  )
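The red points in this boxplot are values that `geom_boxplot()` flags as outliers: those lying more than 1.5 × IQR below the first quartile or above the third. A minimal base-R sketch of that rule, assuming the default `coef = 1.5` and R's default (type 7) quantiles, which is how we understand ggplot2 computes the box; the helper name `iqr_outliers` is hypothetical:

```r
# Flag outliers with the 1.5 x IQR rule used for boxplot whiskers
# (hypothetical helper; type 7 is R's default quantile method)
iqr_outliers <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE, type = 7)
  iqr <- q[2] - q[1]
  lower <- q[1] - coef * iqr   # lower fence
  upper <- q[2] + coef * iqr   # upper fence
  x[x < lower | x > upper]
}

# Quartiles are 21.75 and 28.25, so the fences are 12 and 38
iqr_outliers(c(1, 20:30))
```

If that reading of ggplot2's defaults is right, `iqr_outliers(survey$toptim[survey$smoke == "NO"])` should list the low scores drawn as red points for non-smokers.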

Observations (these refer to the scatterplots in Section 3.2):

1. Optimism vs Positive Affect

  • A positive correlation is observed, meaning individuals with higher optimism tend to experience higher positive affect.

  • The red regression line slopes upward, reinforcing this relationship.

  • The points show some scatter around the line, but the upward trend is clear.

2. Optimism vs Negative Affect

  • A negative correlation exists, where higher optimism is linked to lower negative affect.

  • The blue regression line slopes downward, confirming this inverse relationship.

  • While the trend is present, data points show considerable variability, indicating that other factors may influence negative affect.

3. Optimism vs Mastery

  • A strong positive correlation is evident (r = 0.569 in Section 3.1), suggesting that higher optimism is associated with a greater sense of mastery.

  • The purple regression line slopes upward; this is the strongest of the three associations, stronger than the association with positive affect (r = 0.418).

  • The data points show some spread, meaning optimism alone may not fully explain mastery levels.
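Regression slopes cannot be compared directly across these plots, because Positive Affect, Negative Affect and Mastery are measured on different scales; the correlation coefficient is the slope after both variables are standardised. A minimal sketch with small made-up vectors, checking the identity r = b × sd(x) / sd(y):

```r
# Small made-up vectors (not survey data)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

fit <- lm(y ~ x)                  # least-squares line, as geom_smooth(method = "lm")
b <- unname(coef(fit)["x"])       # slope in units of y per unit of x

# Standardising the slope by sd(x) / sd(y) recovers Pearson's r
b_std <- b * sd(x) / sd(y)
c(slope = b, standardised = b_std, r = cor(x, y))
```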

3. Correlation

3.1 Correlation Coefficients with p-values

The table presents the Pearson correlation coefficients and p-values for four psychological variables: Total Optimism (toptim), Positive Affect (tposaff), Negative Affect (tnegaff), and Mastery (tmast).

Code
# Load required libraries (Hmisc for rcorr, dplyr for %>% and select)
library(Hmisc)
library(dplyr)

# Select relevant columns for correlation analysis
cor_data <- survey %>% dplyr::select(toptim, tposaff, tnegaff, tmast)

# Pearson correlation matrix with pairwise p-values (rcorr's default type)
cor_results <- rcorr(as.matrix(cor_data))

# Extract correlation coefficients and p-values, rounded to 3 decimals
# (p-values below 0.0005 therefore print as 0)
cor_matrix <- round(cor_results$r, 3)  # Correlation coefficients
p_matrix <- round(cor_results$P, 3)  # P-values

# Print results
cat("Correlation Coefficients:\n")
Correlation Coefficients:
Code
print(cor_matrix)
        toptim tposaff tnegaff  tmast
toptim   1.000   0.418  -0.341  0.569
tposaff  0.418   1.000  -0.363  0.463
tnegaff -0.341  -0.363   1.000 -0.427
tmast    0.569   0.463  -0.427  1.000
Code
cat("\nP-Values:\n")

P-Values:
Code
print(p_matrix)
        toptim tposaff tnegaff tmast
toptim      NA       0       0     0
tposaff      0      NA       0     0
tnegaff      0       0      NA     0
tmast        0       0       0    NA
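Each p-value in the table tests whether the population correlation is zero. The textbook test for a Pearson correlation uses t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom; `cor.test()` performs this test, and to our understanding `rcorr()` reports the same p-value for each pair. A minimal sketch on made-up vectors:

```r
# Small made-up vectors (not survey data)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
n <- length(x)

# Pearson r: covariance divided by the product of the standard deviations
r <- cov(x, y) / (sd(x) * sd(y))

# t statistic with n - 2 degrees of freedom and its two-sided p-value
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

round(c(r = r, t = t_stat, p = p_val), 4)
```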

Observations:

Optimism & Mastery (r = 0.569, p < 0.001)

  • Strongest positive correlation in the dataset.

  • Individuals with higher optimism tend to experience greater mastery (sense of control).

Optimism & Positive Affect (r = 0.418, p < 0.001)

  • Moderate positive correlation, suggesting that optimistic individuals tend to experience more positive emotions.

Optimism & Negative Affect (r = -0.341, p < 0.001)

  • Negative correlation, meaning higher optimism is associated with lower negative affect. This is consistent with the idea that optimism acts as a protective factor against negative emotions, although correlational data cannot establish the direction of the effect.

Negative Affect & Mastery (r = -0.427, p < 0.001)

  • Moderate negative correlation, indicating that individuals with higher negative affect tend to feel less mastery over their circumstances.

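All p-values print as 0

  • The p-values were rounded to three decimal places, so a printed 0 means p < 0.0005, not exactly zero. Every correlation is therefore statistically significant at p < 0.001.

  • For each pair we reject the null hypothesis of zero correlation, so these associations are unlikely to be due to chance alone.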
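To avoid printing very small p-values as 0, they can be reported as "< 0.001" instead of being rounded. A minimal sketch; `format_p` is a hypothetical helper, and the 0.001 threshold follows the convention used in the observations above:

```r
# Hypothetical helper: show p-values to 3 decimals, or "< 0.001" when smaller,
# instead of rounding very small values to 0
format_p <- function(p) {
  ifelse(is.na(p), NA_character_,
         ifelse(p < 0.001, "< 0.001", sprintf("%.3f", p)))
}

format_p(c(0.0000004, 0.0123, NA))
```

Because `ifelse()` keeps the dimensions of its test argument, `format_p(cor_results$P)` would return a character matrix shaped like the p-value table, with NA on the diagonal.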

3.2 Scatterplots with Regression Lines

The scatterplots illustrate the relationships between Total Optimism and three psychological factors: Positive Affect, Negative Affect, and Mastery. Each plot includes a least-squares regression line showing the direction of the association.

Code
# Load necessary libraries
library(ggplot2)
library(patchwork)  # For arranging multiple plots

# Define a base theme with better spacing
base_theme <- theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    axis.title.x = element_text(size = 12, margin = margin(t = 10)),
    axis.title.y = element_text(size = 12, margin = margin(r = 10)),
    axis.text = element_text(size = 10),
    legend.position = "none",
    plot.margin = margin(15, 15, 15, 15), # Increase margin spacing
    axis.text.y = element_text(angle = 0, hjust = 1) # Keep y-axis labels horizontal and right-aligned
  )

# Scatterplot: Optimism vs Positive Affect
plot1 <- ggplot(survey, aes(x = toptim, y = tposaff)) +
  geom_point(alpha = 0.6, color = "blue", size = 1.5) +
  geom_smooth(method = "lm", formula = y ~ x, color = "red", se = FALSE) + # Least-squares line
  labs(title = "Optimism vs Positive Affect", x = "Total Optimism", y = "Total Positive Affect") +
  base_theme +
  coord_cartesian(clip = "off") # Do not clip points or lines at the panel edge

# Scatterplot: Optimism vs Negative Affect
plot2 <- ggplot(survey, aes(x = toptim, y = tnegaff)) +
  geom_point(alpha = 0.6, color = "red", size = 1.5) +
  geom_smooth(method = "lm", formula = y ~ x, color = "blue", se = FALSE) +
  labs(title = "Optimism vs Negative Affect", x = "Total Optimism", y = "Total Negative Affect") +
  base_theme +
  coord_cartesian(clip = "off")

# Scatterplot: Optimism vs Mastery
plot3 <- ggplot(survey, aes(x = toptim, y = tmast)) +
  geom_point(alpha = 0.6, color = "green", size = 1.5) +
  geom_smooth(method = "lm", formula = y ~ x, color = "purple", se = FALSE) +
  labs(title = "Optimism vs Mastery", x = "Total Optimism", y = "Total Mastery") +
  base_theme +
  coord_cartesian(clip = "off")

# Arrange plots in a vertical layout with increased spacing (patchwork)
(plot1 / plot2 / plot3) +
  plot_annotation(
    title = "Scatterplots with Regression Lines",
    theme = theme(plot.title = element_text(size = 16, face = "bold", hjust = 0.5))
  ) &
  theme(plot.margin = margin(20, 20, 20, 20)) # Extra margin to avoid overlap

Observations:

  • Optimism vs Positive Affect: A moderate positive correlation (r = 0.418) is observed; individuals with higher optimism tend to report higher positive affect, as the upward regression line shows.

  • Optimism vs Negative Affect: A moderate negative correlation (r = -0.341) is present; as optimism increases, negative affect tends to decrease, although the points are widely scattered around the line.

  • Optimism vs Mastery: A strong positive correlation (r = 0.569), the largest of the three, indicates that higher optimism is associated with a stronger sense of mastery and control over one’s life.

  • Overall Trend: The relationships range from moderate to strong and align with expectations that optimism is linked to more positive emotions, fewer negative emotions, and a greater sense of control.
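One way to gauge the strength of these associations is r², the proportion of variance the two scores share. A minimal sketch using the coefficients reported in Section 3.1:

```r
# Correlations with optimism (toptim) as reported in Section 3.1
r <- c(mastery = 0.569, positive_affect = 0.418, negative_affect = -0.341)

# r squared: the proportion of variance the two scores share
r_squared <- round(r^2, 3)
r_squared
```

Even for mastery, about two thirds of the variance (1 − 0.324) is not shared with optimism, which fits the wide scatter of points around each regression line.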

3.3 Correlation Heatmap

This heatmap visualizes the correlations between optimism and key psychological factors, showing positive associations with mastery and positive affect, while indicating negative relationships with negative affect.

Code
# Load necessary library
library(ggcorrplot)

# Create the correlation heatmap
ggcorrplot(cor_matrix, 
           type = "lower", 
           lab = TRUE, 
           lab_size = 4, 
           colors = c("blue", "white", "red"), 
           title = "Correlation Heatmap: Optimism & Psychological Factors", 
           ggtheme = theme_minimal()) +
    theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"))

Observations:

  • Optimism and Mastery (0.57): There is a strong positive correlation, indicating that individuals with higher optimism tend to have a greater sense of mastery and control over their lives.

  • Optimism and Positive Affect (0.42): A moderate positive correlation suggests that optimistic individuals generally experience higher positive emotions.

  • Optimism and Negative Affect (-0.34): A moderate negative correlation implies that higher optimism is associated with lower negative emotions.

  • Positive Affect and Mastery (0.46): A moderate positive correlation indicates that individuals with high positive affect also tend to feel a stronger sense of mastery.

  • Negative Affect and Mastery (-0.43): A moderate negative correlation suggests that individuals with a strong sense of control tend to experience fewer negative emotions.

  • Positive Affect and Negative Affect (-0.36): A moderate negative correlation means that positive and negative emotions are related but are not simply opposite ends of one scale.
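The observations above read each pair from the lower triangle of the heatmap. A minimal base-R sketch that lists those six pairs from the Section 3.1 matrix (values copied from that output), orders them by absolute strength, and labels them with Cohen's (1988) conventional cut-offs of .10, .30 and .50 for small, medium and large effects; the labels are a convention, not a test:

```r
# Correlation matrix as printed in Section 3.1
vars <- c("toptim", "tposaff", "tnegaff", "tmast")
cor_matrix <- matrix(c( 1.000,  0.418, -0.341,  0.569,
                        0.418,  1.000, -0.363,  0.463,
                       -0.341, -0.363,  1.000, -0.427,
                        0.569,  0.463, -0.427,  1.000),
                     nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Keep each pair once: the lower triangle, excluding the diagonal
idx <- which(lower.tri(cor_matrix), arr.ind = TRUE)
cor_pairs <- data.frame(
  var1 = rownames(cor_matrix)[idx[, "row"]],
  var2 = colnames(cor_matrix)[idx[, "col"]],
  r    = cor_matrix[idx]
)

# Label |r| using Cohen's cut-offs: [.1, .3) small, [.3, .5) medium, [.5, 1] large
cor_pairs$strength <- cut(abs(cor_pairs$r), breaks = c(0, 0.1, 0.3, 0.5, 1),
                          labels = c("negligible", "small", "medium", "large"),
                          right = FALSE, include.lowest = TRUE)

# Order from strongest to weakest association, regardless of sign
cor_pairs <- cor_pairs[order(-abs(cor_pairs$r)), ]
rownames(cor_pairs) <- NULL
cor_pairs
```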

Conclusion

This report highlights key findings on psychological well-being and optimism:

  1. Optimism is positively associated with both Mastery (r = 0.569, the strongest association) and Positive Affect (r = 0.418).

  2. Smokers and non-smokers differ only minimally in median optimism, and no significance test was run to establish a difference.

  3. Individuals with high optimism tend to have lower negative affect.

  4. The correlation heatmap and scatterplots illustrate these relationships and are consistent with the correlation coefficients.

Further research could explore causality between these psychological factors and their real-world implications.