Psych 205 - Lab 2

I USED CHAT GPT TO SUPPORT THIS ASSIGNMENT

Prof. Key

The goal of this lab is to get practice describing data and working in Quarto. Download this file as a .qmd from the course page.

In Lecture.

  1. One advantage of Quarto (and R Markdown) is that you can run code in a document. You do this using a “code block”. In the space below, insert an R code block, type out a math equation that used to give you difficulty as a kid into the code block below, and run the code to see the result. Below the code block, add text for humans to determine whether R got the math problem correct or not. Then render the document as a .pdf and .html file. Did this work?

  2. Load the “grad onboarding” survey into the code block below, and answer the following questions. Note : to successfully render code in the document, you must a) explicitly load the dataset into your Quarto document and b) make sure you have no errors in your code. :)

    • Graph the variables self.skills and class.skills side by side using the par() function. Change the formatting of the graph to make it look ready for presentation. Add vertical lines to each graph to illustrate the mean (solid line) and standard deviation (dashed lines).
    • Below each graph, report the mean and standard deviation of both variables, and interpret what these statistics tell you about the individuals in our class. (Who cares? What do these statistics tell us?)
data <- read.csv("/Users/ccnlab/Documents/Classes/Psych205/grad_onboard_SP25.csv", stringsAsFactors = TRUE)
## thinking about epistemology

# Put the two histograms side by side
par(mfrow = c(1, 2))

# First plot
hist(data$class.skills, main = 'Perceptions of Class Skills', xlab = '', ylim = c(0,20))
mean_class = mean(data$class.skills, na.rm = TRUE)
std_class = sd(data$class.skills, na.rm = TRUE)  # na.rm is needed here too, or sd() returns NA when values are missing
abline(v = mean_class, col = "blue", lwd = 2)  # Mean as a solid line
abline(v = mean_class + std_class, col = "red", lty = 2)  # Mean + 1 SD as a dashed line
abline(v = mean_class - std_class, col = "red", lty = 2)  # Mean - 1 SD as a dashed line
mtext(paste("Mean:", round(mean_class, 2), "SD:", round(std_class, 2), 'people see others as having a higher skillset'), side = 1, line = 3)

# Second plot
hist(data$self.skills, main = 'Self Perceptions of Skills', xlab = '', ylim = c(0,20))
mean_self = mean(data$self.skills, na.rm = TRUE)
std_self = sd(data$self.skills, na.rm = TRUE)
abline(v = mean_self, col = "blue", lwd = 2)  # Mean as a solid line
abline(v = mean_self + std_self, col = "red", lty = 2)  # Mean + 1 SD as a dashed line
abline(v = mean_self - std_self, col = "red", lty = 2)  # Mean - 1 SD as a dashed line
mtext(paste("Mean:", round(mean_self, 2), "SD:", round(std_self, 2), '\n people rated themselves as having a higher-than-average skillset'), side = 1, line = 3)

# Reset plotting parameters
par(mfrow = c(1, 1))
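The two histogram blocks above repeat the same steps, so one way to tidy this up is a small helper function. The sketch below runs on made-up scores rather than the survey data, and `plot_with_stats` is a hypothetical name, not something from the course materials.

```r
# Hypothetical helper: histogram with mean (solid) and mean +/- 1 SD (dashed) lines
plot_with_stats <- function(x, title) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  hist(x, main = title, xlab = "", col = "grey85", border = "white")
  abline(v = m, col = "blue", lwd = 2)               # mean, solid
  abline(v = c(m - s, m + s), col = "red", lty = 2)  # mean +/- 1 SD, dashed
  mtext(paste("Mean:", round(m, 2), " SD:", round(s, 2)),
        side = 1, line = 2.5, cex = 0.8)
  invisible(c(mean = m, sd = s))  # hand the statistics back for reuse
}

# Made-up ratings standing in for self.skills and class.skills
toy_self  <- c(3, 4, 5, 4, NA, 2, 5, 4)
toy_class <- c(4, 5, 5, 3, 4, NA, 5, 4)

old_par <- par(mfrow = c(1, 2))   # two panels side by side
stats_self  <- plot_with_stats(toy_self,  "Self (toy data)")
stats_class <- plot_with_stats(toy_class, "Class (toy data)")
par(old_par)                      # restore the previous layout
```

Returning the statistics invisibly means the same call can both draw the panel and feed the numbers into the write-up below the graph.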
  3. Split your graphics window into a 2x5 grid, and graph each of the 10 “can.*” variables in the dataset (e.g., “can.import”, “can.clean”, etc.). Make sure each graph contains the name of the variable and that the graph looks good / is intelligible. Note : you can and should use a for-loop to do this! Then, report the frequencies of these variables - what are some things you observe about the data? Does this make sense given what you know about the participants?

    # Set up a 2x5 plotting area
    par(mfrow = c(2, 5), mar = c(4, 4, 2, 1))
    
    # List of "can.*" variables
    can_vars <- grep("^can\\.", names(data), value = TRUE)
    
    # Plot each variable
    for (var in can_vars) {
      barplot(table(data[[var]]), main = var, col = c("lightblue", "lightcoral"), ylim = c(0, 60))
    }

    # Reset plotting parameters
    par(mfrow = c(1, 1))

    # Report frequencies
    for (var in can_vars) {
      cat("\nFrequencies for", var, ":\n")
      print(table(data[[var]]))
    }     
    
    Frequencies for can.import :
    
    Maybe    No   Yes 
        6     2    25 
    
    Frequencies for can.clean :
    
    Maybe    No   Yes 
       10     5    18 
    
    Frequencies for can.graph :
    
    Maybe    No   Yes 
       10     5    18 
    
    Frequencies for can.render :
    
    Maybe    No   Yes 
       11    11    11 
    
    Frequencies for can.lm :
    
          Maybe    No   Yes 
        1    12     8    12 
    
    Frequencies for can.interp :
    
    Maybe    No   Yes 
        8     5    20 
    
    Frequencies for can.pvalue :
    
    Maybe    No   Yes 
       10     2    21 
    
    Frequencies for can.sevsd :
    
    Maybe    No   Yes 
       14     7    12 
    
    Frequencies for can.95ci :
    
    Maybe    No   Yes 
       15     3    15 
    
    Frequencies for can.forloop :
    
    Maybe    No   Yes 
       15    10     8 
    
    Frequencies for can.science :
    
     3  4  5 
     3 14 16 

The results make sense to me, as the class self-reported higher-than-average skill in statistics.
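To compare the items at a glance, the separate frequency tables can be stacked into one matrix of counts. This is a rough sketch on made-up responses (the data frame `toy` and its values are invented for illustration). Fixing the factor levels first keeps every column in the same Maybe / No / Yes order, and turns stray values, such as the blank response visible in the can.lm output, into NA so they drop out of the counts.

```r
# Made-up responses; not the class survey
toy <- data.frame(
  can.import = c("Yes", "Yes", "Maybe", "No", "Yes"),
  can.clean  = c("Yes", "Maybe", "Maybe", "No", "Yes"),
  can.render = c("No", "No", "Maybe", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

lvls <- c("Maybe", "No", "Yes")

# One column of counts per variable, rows in a fixed level order
counts <- sapply(toy, function(x) table(factor(x, levels = lvls)))
print(counts)

# Column-wise proportions, which are easier to compare than raw counts
props <- round(prop.table(counts, margin = 2), 2)
print(props)
```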

In Section

  1. Choose one of the datasets from the class folder (avoid the one(s) labeled [repeated measures] as we will get to these later. I’ll be focusing on the “Perceptions of the Wealthy” dataset if you want to follow along with my key when it’s posted). Look over the accompanying article for a guide to the variables, and identify two numeric variables from the dataset that seem interesting to you. Graph these variables, report the relevant descriptive statistics for each variable. Then explain what these statistics and graphs tell you about the individuals in the dataset, and what other questions you might ask about these individuals (e.g., what do the data NOT tell you?)

On Your Own

Use these notes to answer the following questions. Let me know if anything is unclear!

Psychologists often want to combine information from multiple questions (that measure the same construct) into one variable. There are lots of different ways to do this, but the first that we will talk about is just the humble average. For example, the Rosenberg (1965) self-esteem scale has 10 questions all related to self-esteem; rather than work with 10 different variables, it might be nice to just average these 10 and report one average number. These are called “Likert” scales. It’s technically pronounced ‘lick-ert’, but no one says that because it sounds kind of gross.

Here’s an overview (from my 101 class notes) of how to work with such Likert scales conceptually and computationally (in R). You’ll need to load the psych library into R; to do this, run the following code in your console.

#install.packages("psych") # installs the "psych" package. you only need to do this once.

#library(psych) # loads the library. you need to do this every R session.
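As a minimal illustration of the "humble average", the sketch below averages three made-up items per person with base R's `rowMeans()`. The item names and scores are invented, and ignoring missing answers with `na.rm = TRUE` is an assumption made for the example, not a rule from these notes.

```r
# Three made-up items answered on a 1-4 scale by four people
items <- data.frame(
  item1 = c(4, 3, 2, 4),
  item2 = c(3, 3, NA, 4),
  item3 = c(4, 2, 1, 3)
)

# One composite score per person: the mean of that person's answered items
items$composite <- rowMeans(items[, c("item1", "item2", "item3")], na.rm = TRUE)
print(items)

# Descriptive statistics of the composite
mean(items$composite)
sd(items$composite)
```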
  1. Self-Esteem Problems. Use the self-esteem dataset (from the class datasets folder). You should find a codebook that describes these data in the same folder.

Check to make sure the data loaded correctly. (Note that 0s in this dataset mean the person was missing data.) Report the sample size of the dataset.

#install.packages("psych")
library(psych)

Create a self-esteem scale from the 10-items. Make sure to reverse-score the negatively-keyed items so that for every question, higher numbers measure higher self-esteem. Graph this variable as a histogram, and make the graph look nice (ready for publication). Report the alpha reliability, mean, and standard deviation of this variable. Below your graph, describe what these statistics tell you about the self-esteem of the participants. What other questions do you have about this variable, or the output?

data <- read.csv('/Users/ccnlab/Documents/Classes/Psych205/Lab2/Self-Esteem-dataset/data.csv', stringsAsFactors = TRUE, sep = "\t")

data[data == 0] <- NA
print(paste('sample size', nrow(data)))
[1] "sample size 47974"
nn = nrow(data)
# Set up a 2x5 plotting area
par(mfrow = c(2, 5), mar = c(4, 4, 2, 1))

# List of "Q*" variables
q_vars <- grep("^Q", names(data), value = TRUE)
selected_data <- data[,c("Q1","Q2","Q3","Q4", "Q5" , "Q6" , "Q7" , "Q8" , "Q9" , "Q10")]
alpha_result <- alpha(selected_data, na.rm=T)
Warning in alpha(selected_data, na.rm = T): Some items were negatively correlated with the first principal component and probably 
should be reversed.  
To do this, run the function again with the 'check.keys=TRUE' option
Some items ( Q3 Q5 Q8 Q9 Q10 ) were negatively correlated with the first principal component and 
probably should be reversed.  
To do this, run the function again with the 'check.keys=TRUE' option
a <- alpha_result$total$raw_alpha
print(alpha_result$total$raw_alpha)
[1] -0.1743972
# Plot each variable
for (i in seq_along(q_vars)) {
  var <- q_vars[[i]]
  mean_var <- mean(data[[var]], na.rm=T)
  std_var <- sd(data[[var]], na.rm=T) 
  barplot(table(data[[var]]), main = var, col = c("lightblue", "lightcoral"), ylim = c(0, nn))
  a <- alpha_result$alpha.drop$raw_alpha[[i]]
  mtext(paste("alpha:",round(a,2), "\nMean:", round(mean_var, 2), "SD:", round(std_var, 2)), side = 1, line = 3, cex = 0.5)  
}

# Reset plotting parameters
par(mfrow = c(1, 1))
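A negative alpha like the one above is typical when negatively keyed items are left unreversed, which is what the warning points to. Below is a hedged sketch of reverse-scoring, with Cronbach's alpha computed by hand, on made-up data. It assumes a 1 to 4 response scale, so a reversed score is 5 minus the original, and the item names `p1`, `p2`, `n1`, `n2` are invented. With the real data, the items flagged in the warning (Q3, Q5, Q8, Q9, Q10) would be the candidates, after checking them against the codebook.

```r
# Made-up answers on an assumed 1-4 scale:
# p1, p2 are positively keyed; n1, n2 are negatively keyed
toy <- data.frame(
  p1 = c(4, 3, 2, 1, 4, 2),
  p2 = c(4, 4, 2, 1, 3, 2),
  n1 = c(1, 2, 3, 4, 1, 3),
  n2 = c(1, 1, 3, 4, 2, 2)
)

# Cronbach's alpha from its definition:
# k / (k - 1) * (1 - sum of item variances / variance of the total score)
cronbach_alpha <- function(df) {
  df <- df[complete.cases(df), ]
  k <- ncol(df)
  (k / (k - 1)) * (1 - sum(apply(df, 2, var)) / var(rowSums(df)))
}

alpha_raw <- cronbach_alpha(toy)   # items left as they were answered

# Reverse-score the negatively keyed items: 1 <-> 4, 2 <-> 3
scored <- toy
neg_items <- c("n1", "n2")
scored[neg_items] <- 5 - scored[neg_items]

alpha_rev <- cronbach_alpha(scored)

# Composite score per person, then a histogram of the scale
scored$esteem <- rowMeans(scored[, c("p1", "p2", "n1", "n2")], na.rm = TRUE)
hist(scored$esteem, main = "Composite (toy data)", xlab = "Mean item score")

print(c(raw = alpha_raw, reversed = alpha_rev))
```

After reversing, every item points in the same direction, so the mean and standard deviation of `esteem` describe a single scale rather than ten separate questions.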

Then, graph the variable gender as a categorical factor, and report the number of people who identified as “female”, “male”, and “other”. You will need to do some data cleaning here.

# Install and load the dplyr package
#install.packages("dplyr")
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
# Convert numeric gender codes to labels; 0s were already set to NA above
# (double-check the code-to-label mapping against the codebook)
data_cleaned <- data %>%
  mutate(gender = case_when(
    gender == 1 ~ "male",
    gender == 3 ~ "female",
    gender == 2 ~ "other",
    TRUE ~ NA_character_  # NA and any unexpected values become missing
  )) %>%
  filter(!is.na(gender))

# Display the cleaned data
#print(data_cleaned)
#install.packages("ggplot2")
library(ggplot2)

Attaching package: 'ggplot2'
The following objects are masked from 'package:psych':

    %+%, alpha
# Create a bar plot of the gender variable
ggplot(data_cleaned, aes(x = gender)) +
  geom_bar(fill = "skyblue", color = "black") +
  labs(title = "Distribution of Gender", x = "Gender", y = "Count") +
  theme_minimal()
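The question also asks for the number of people in each group. One base-R way to get counts is to turn the numeric codes into a labelled factor and call `table()`. The sketch uses made-up codes, and the code-to-label mapping simply mirrors the `case_when()` call above; treat it as an assumption to be confirmed against the codebook.

```r
# Made-up gender codes; 0 stands for missing, as in the dataset description
codes <- c(1, 3, 3, 2, 0, 1, 3, 0, 2, 3)
codes[codes == 0] <- NA

# Mapping copied from the case_when() call above (an assumption to verify)
gender <- factor(codes, levels = c(1, 3, 2), labels = c("male", "female", "other"))

counts <- table(gender)   # NA values are dropped by default
print(counts)

barplot(counts, main = "Gender (toy data)", ylab = "Count", col = "skyblue")
```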

  2. Self-Esteem is Conditional. Report the mean and standard deviation of self-esteem for people who are identified as female, male, and “other” in the dataset. What differences do you observe? How might we use this knowledge / what other questions do you have? Note : there are MANY ways to do this in R; you can use the subset function or indexing to divide the dataset into three groups - “females”, “males”, and people who reported “other”. Or other fancier methods we will talk about later. See how many different ways you can do it.
# Initialize a list to store summary results
summary_results <- list()
#data_cleaned$gender <- factor(data_cleaned$gender, levels = c("male", "female", "other"))
# Loop over each variable and calculate mean and standard deviation
for (var in q_vars) {
  summary_stats <- data_cleaned %>%
    group_by(gender) %>%
    summarize(
      mean = mean(.data[[var]], na.rm = TRUE),
      sd = sd(.data[[var]], na.rm = TRUE),
      .groups = "drop"
    )
  
  # Store the result in the list
  summary_results[[var]] <- summary_stats
}

# Print the summary results for each variable (tables appear in Q1 to Q10 order)
for (var in q_vars) {
  print(summary_results[[var]])
}
# A tibble: 3 × 3
  gender  mean    sd
  <chr>  <dbl> <dbl>
1 female  2.59 0.968
2 male    3.12 0.855
3 other   2.95 0.862
# A tibble: 3 × 3
  gender  mean    sd
  <chr>  <dbl> <dbl>
1 female  2.71 0.925
2 male    3.24 0.748
3 other   3.04 0.796
# A tibble: 3 × 3
  gender  mean    sd
  <chr>  <dbl> <dbl>
1 female  2.78 0.993
2 male    2.22 0.953
3 other   2.35 0.948
# A tibble: 3 × 3
  gender  mean    sd
  <chr>  <dbl> <dbl>
1 female  2.57 0.932
2 male    3.05 0.798
3 other   2.85 0.797
# A tibble: 3 × 3
  gender  mean    sd
  <chr>  <dbl> <dbl>
1 female  2.81 0.942
2 male    2.35 0.988
3 other   2.40 0.969
# A tibble: 3 × 3
  gender  mean    sd
  <chr>  <dbl> <dbl>
1 female  2.27 0.944
2 male    2.68 0.930
3 other   2.50 0.912
# A tibble: 3 × 3
  gender  mean    sd
  <chr>  <dbl> <dbl>
1 female  2.11 0.941
2 male    2.50 0.943
3 other   2.42 0.927
# A tibble: 3 × 3
  gender  mean    sd
  <chr>  <dbl> <dbl>
1 female  2.78 0.941
2 male    2.61 0.977
3 other   2.75 0.941
# A tibble: 3 × 3
  gender  mean    sd
  <chr>  <dbl> <dbl>
1 female  3.20 0.948
2 male    2.67 1.01 
3 other   2.85 0.970
# A tibble: 3 × 3
  gender  mean    sd
  <chr>  <dbl> <dbl>
1 female  3.11  1.02
2 male    2.43  1.08
3 other   2.66  1.06
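The tables above are per item. For a single composite score, the question mentions several routes (the subset function, indexing, and fancier methods). Here is a rough sketch of three base-R versions on made-up data; the data frame `toy` and the column `esteem` are hypothetical names.

```r
# Made-up composite scores and group labels
toy <- data.frame(
  gender = c("female", "male", "other", "female", "male", "other", "female", "male"),
  esteem = c(2.5, 3.0, 2.8, 3.1, 3.4, 2.2, 2.9, 2.6)
)

# 1. subset(): pull out one group, then summarise it
females <- subset(toy, gender == "female")
mean_f_subset <- mean(females$esteem, na.rm = TRUE)

# 2. Logical indexing: same result without creating a new data frame
mean_f_index <- mean(toy$esteem[toy$gender == "female"], na.rm = TRUE)

# 3. tapply(): all groups at once
group_means <- tapply(toy$esteem, toy$gender, mean, na.rm = TRUE)
group_sds   <- tapply(toy$esteem, toy$gender, sd,   na.rm = TRUE)

print(round(cbind(mean = group_means, sd = group_sds), 2))
```

The first two routes are handy for checking one group by hand; `tapply()` scales better once there are more than a couple of groups.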

Appendix of gpt usage