Week 9 coding goals

My goals for this week were quite ambitious, and unfortunately I was not able to complete all of them because I had an essay worth 60% due for another subject. But I promise I tried my best to at least make some progress. Here were my goals for this week:

Challenges and successes

1. Keep cleaning up code

Using Jenny’s advice from my last learning log, I was able to clean up my code for descriptive statistics slightly more. First, as usual, I loaded the packages and excluded participants.

#load packages
library(readspss) #package to read the original datafile from OFS
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.4     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.6
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(dplyr) #already attached by tidyverse above, so this line is optional

#read data
data <- read.sav("Humiston & Wamsley 2019 data.sav")

#remove excluded participants
cleandata <- data %>%
  filter(exclude == "no")

As a reminder, here is what a large chunk of our descriptive code looked like:

#original code, before cleaning up
BIB <- cleandata %>% #Calculating baseline implicit bias from "cleandata" data
  select( #select two variables baseline IAT scores for race and gender 
    base_IAT_race,
    base_IAT_gen) %>% 
  summarise( #using summarise() to calculate means and sds
    BIBaverage = mean(rbind(base_IAT_race, base_IAT_gen)), #use rbind() to bind together the race and gender baseline IAT values, then calculate the mean and sd of the bound values
    BIBsd = sd(rbind(base_IAT_race, base_IAT_gen))
            )

print(BIB)
##   BIBaverage     BIBsd
## 1  0.5565373 0.4058619
PrenapIB <- cleandata %>% #Calculating prenap implicit bias from "cleandata" data
  select( #select two variables prenap IAT scores for race and gender 
    pre_IAT_race,
    pre_IAT_gen) %>% 
  summarise( #using summarise() to calculate means and sds
    PrenapIBaverage = mean(
      rbind( #use rbind() function to bind together the race and gender prenap IAT values
        pre_IAT_race, 
        pre_IAT_gen)
      ),
    PrenapIBsd = sd( #now calculate the sd of the bound values
      rbind(
        pre_IAT_race, 
        pre_IAT_gen))
            )

print(PrenapIB)
##   PrenapIBaverage PrenapIBsd
## 1       0.2566674  0.4776418
PostnapIB <- cleandata %>% #Calculating postnap implicit bias from "cleandata" data
  select( #select two variables postnap IAT scores for race and gender 
    post_IAT_race,
    post_IAT_gen) %>% 
  summarise( #using summarise() to calculate means and sds
    PostnapIBaverage = mean(
      rbind( #use rbind() function to bind together the race and gender postnap IAT values
        post_IAT_race,
        post_IAT_gen
      )),
    PostnapIBsd = sd( #now calculate the sd of the bound values
      rbind(
        post_IAT_race,
        post_IAT_gen
      ))
  )

print(PostnapIB)
##   PostnapIBaverage PostnapIBsd
## 1        0.2776836   0.4585372
OWDIB <- cleandata %>% #Calculating one-week delay implicit bias from "cleandata" data
  select( #select two variables one-week delay IAT scores for race and gender 
    week_IAT_race,
    week_IAT_gen) %>% 
  summarise( #using summarise() to calculate means and sds
    OWDIBaverage = mean(
      rbind( #use rbind() function to bind together the race and gender one-week delay IAT values
        week_IAT_race,
        week_IAT_gen
      )
    ),
    OWDIBsd = sd( #now calculate the sd of the bound values
      rbind(
        week_IAT_race,
        week_IAT_gen
      )
    )
  )

print(OWDIB)
##   OWDIBaverage   OWDIBsd
## 1    0.3994186 0.4254629

While there wasn’t any other way to use rbind() for every variable like I initially wanted, Jenny suggested I try putting everything into one pipe or writing my own function, since those descriptive chunks were almost completely identical. So, I attempted to write my own function using Jenny’s advice from the Week 7 Q and A.
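For reference, the "one pipe" route could look roughly like the sketch below. It is only an illustration under assumptions: the toy data frame stands in for cleandata with invented values, the names toy_data, descriptives, time, measure and score are hypothetical, and pivot_longer() from tidyr is one possible way to stack the race and gender scores instead of rbind().

```r
library(dplyr)
library(tidyr)

# Toy stand-in for cleandata; the values are invented for illustration only
toy_data <- data.frame(
  base_IAT_race = c(0.2, 0.4), base_IAT_gen = c(0.6, 0.8),
  pre_IAT_race  = c(0.1, 0.3), pre_IAT_gen  = c(0.5, 0.7)
)

descriptives <- toy_data %>%
  # Stack every column into long format and split names like "base_IAT_race"
  # into a time part ("base") and a measure part ("race")
  pivot_longer(
    cols = everything(),
    names_to = c("time", "measure"),
    names_pattern = "(.*)_IAT_(.*)",
    values_to = "score"
  ) %>%
  # Race and gender scores for the same time point now share one column,
  # so one grouped summarise() gives the mean and sd for each time point
  group_by(time) %>%
  summarise(average = mean(score), sd = sd(score))

print(descriptives)
```

With the real data, the select() call that keeps only the eight IAT columns would come before pivot_longer(), so that everything() does not pick up unrelated variables.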

Before I started any of that, I decided to follow Jenny’s advice and select only variables of interest to make things easier to work with.

To write the function, I then copied and pasted one of the code chunks from above and replaced the specific variable names with general ones: time_race for the variables called (time)_IAT_race, and time_gen for the variables called (time)_IAT_gen.

#selecting variables of interest
implicit_bias_time <- cleandata %>%
  select(base_IAT_race, base_IAT_gen, 
         pre_IAT_race, pre_IAT_gen, 
         post_IAT_race, post_IAT_gen, 
         week_IAT_race, week_IAT_gen)

#the function
function_implicit_bias_av_sd <- function(time_race, time_gen) {
  implicit_bias_time %>% 
  select(
    time_race,
    time_gen) %>% 
  summarise(
    BIBaverage = mean(rbind(time_race, time_gen)),
    BIBsd = sd(rbind(time_race, time_gen))
            )
}

#running the function
function_implicit_bias_av_sd(time_race = base_IAT_race, time_gen = base_IAT_gen)

I thought I had done everything correctly, but unfortunately when I tried to run my function, an error came up: Error: object 'base_IAT_race' not found. So, I tried referring to the variable directly using $.

#running function with called variable
function_implicit_bias_av_sd(time_race = implicit_bias_time$base_IAT_race, time_gen = implicit_bias_time$base_IAT_gen)

But unfortunately, this produced a different error: Error: Must subset columns with a valid subscript vector. x Can't convert from <double> to <integer> due to loss of precision. When I tried to Google an answer, the only Stack Overflow discussion I found told me to rename my variables with underscores (which they already had) or to call the variable with $, which I had already tried.

So, for now, this has been a failure, and I pose this question to Jenny: how do I make my function work when this error comes up?
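One likely explanation, offered as something to try rather than a confirmed fix: inside the function, select() looks for a column literally named time_race, does not find one, and falls back to the value of the argument. In the first call that value is an object R cannot find, and in the second call it is a vector of decimal scores that select() tries to read as column positions. dplyr's embrace operator {{ }} is designed to pass column names through function arguments. The sketch below assumes dplyr 1.0 or later; the function name implicit_bias_av_sd, the data argument, the output column names and the toy values are all invented for illustration, and c() is used in place of rbind() because it gives the same mean and sd.

```r
library(dplyr)

# Toy stand-in for implicit_bias_time; these values are invented for illustration
toy_bias <- data.frame(
  base_IAT_race = c(0.2, 0.4),
  base_IAT_gen  = c(0.6, 0.8)
)

# Hypothetical rewrite of the function: {{ }} tells summarise() to treat the
# arguments as column names inside `data`, so no select() step is needed
implicit_bias_av_sd <- function(data, time_race, time_gen) {
  data %>%
    summarise(
      bias_average = mean(c({{ time_race }}, {{ time_gen }})),
      bias_sd      = sd(c({{ time_race }}, {{ time_gen }}))
    )
}

# Column names are written bare: no quotes and no $
base_result <- implicit_bias_av_sd(toy_bias, base_IAT_race, base_IAT_gen)
print(base_result)
```

If this works on the real data, the same function could be called once per time point, for example with pre_IAT_race and pre_IAT_gen.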

2. Come up with another exploratory analysis question

I haven’t completely come up with a new question, but in our workshop on Thursday I had a great chat with Laura, who gave me some ideas.

Firstly, Jenny told us that we didn’t need completely unique questions, which was very reassuring, so I think I may create another scatterplot to look at sleep quality in one of my variables.

Secondly, in my chat with Laura, she suggested that I do a histogram and sent me the code for hers. This is a great idea as our group did not work with a histogram in our original verification exercise.
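A minimal sketch of what such a histogram could look like with ggplot2, which is already attached through tidyverse. This is not Laura's code: the toy values, the object names toy_scores and iat_histogram, the bin width and the axis labels are all invented for illustration, and only the column name base_IAT_race comes from the dataset used above.

```r
library(ggplot2)

# Toy stand-in for the cleaned data; the scores are invented for illustration
toy_scores <- data.frame(
  base_IAT_race = c(0.1, 0.3, 0.35, 0.5, 0.55, 0.6, 0.9)
)

# One bar per 0.2-wide range of scores, counting participants in each range
iat_histogram <- ggplot(toy_scores, aes(x = base_IAT_race)) +
  geom_histogram(binwidth = 0.2) +
  labs(x = "Baseline race IAT score", y = "Number of participants")

print(iat_histogram)
```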

Unfortunately, this goal was a failure in the sense that I was unable to completely come up with a question due to lack of time, but I have some great ideas now.

3. Start statistical analysis for question 1

Again, unfortunately, I was incredibly pressed for time this week due to other subjects, so I was unable to attempt this.

4. Start plotting questions 2 and 3 for exploratory analysis

Again, unfortunately, I ran out of time to start these plots this week.

Next steps in my coding journey

Thankfully, the last assessment I have for this term is the final verification report. This means I have an entire week to concentrate solely on this, so I plan to get a lot done.

Questions for Jenny

Firstly, thank you so much for answering my questions last week! I really appreciate that you’ve taken the time to do this outside of work hours and for the entire class too, thank you so much. I really hope you, and the rest of the team on PSYC3361 get a raise, recognition and some well-deserved rest these holidays.

My question this week relates to my function in the “1. Keep cleaning up code” section of this learning log. For some reason, I keep getting the following error when trying to call my function: Error: Must subset columns with a valid subscript vector. x Can't convert from <double> to <integer> due to loss of precision. The solutions I found on Google haven’t worked, and I’m not sure why this function isn’t working. Do you have any tips on how I might be able to solve this issue?