This document repeats much of the code from the first two R Assignments for PSYC 1490. Much of it will be useful as you do the analyses that you’ll write up for your third assignment. You may also want to refer back to those two assignments for additional techniques: you can open a fresh Markdown file from either of those previous labs in the Files pane of this project, or you can go back to your own version of those assignments to find the code you wrote previously.

And as usual, ask your TA or a classmate for help! R takes a while to learn, but the more help you get, the faster that process will go.

Loading raw data into R

The first thing you do in any data analysis document is to load your fresh raw data into R so you can later work with and calculate statistics on your data.

Read in your data file using read_csv("classdata_2020.csv"), and assign it to the dataframe name IntroSurvey using the <- left-arrow operator.

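# read_csv(), mutate(), and ggplot() all come from the tidyverse; load it first if you have not already
library(tidyverse)
IntroSurvey <- read_csv("classdata_2020.csv")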
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   time_planning = col_character(),
##   dec_mode = col_character(),
##   class = col_character(),
##   school = col_character(),
##   gender = col_character(),
##   handed = col_character(),
##   major = col_character(),
##   concentration = col_character()
## )
## See spec(...) for full column specifications.

Cleaning and manipulating your data

Before we calculate any statistics on our data, we may need to clean the raw data, or calculate additional values of interest in our data. It is important to do this before calculating any statistics, so that any values you may need for your statistics are already present in your data by the time you run your “analysis” code.

Continuous variables: calculating overall scores for Regret and Maximization scales

The code below will calculate new continuous columns based on existing raw data in IntroSurvey:

  • regret_total, for overall Regret Score on the Regret Scale, by averaging all 5 individual regret scores, regret1 to regret5
  • maxi_total, for overall Maximizing score on the Maximizing Scale, by averaging all 6 individual maximizing scores, maxi1 to maxi6
  • maxi_as, for Maximizing Scale: Alternative Search subscale (maxi1 & maxi3)
  • maxi_dd, for Maximizing Scale: Decision Difficulty subscale (maxi2 & maxi5)
  • maxi_hs, for Maximizing Scale: High Standards subscale (maxi4 & maxi6)

If you want to create any new columns of data, you can put them in the chunk below.

IntroSurvey <- mutate(IntroSurvey,
                      regret_total = (regret1 + regret2 + regret3 + regret4 + regret5) / 5,
                      maxi_total = (maxi1 + maxi2 + maxi3 + maxi4 + maxi5 + maxi6) / 6,
                      maxi_as = (maxi1 + maxi3) / 2,
                      maxi_dd = (maxi2 + maxi5) / 2,
                      maxi_hs = (maxi4 + maxi6) / 2)
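
One thing to keep in mind about averaging with `+`: if a participant skipped even one item, the sum is `NA`, so that participant's overall score is `NA` too. The sketch below illustrates this with a small made-up dataframe. The name `toy_survey` and its values are hypothetical (not from the class data), and only three regret items are used to keep it short.

```r
library(dplyr)

# Hypothetical toy data: the second participant skipped regret2
toy_survey <- data.frame(
  regret1 = c(4, 5, 2),
  regret2 = c(3, NA, 2),
  regret3 = c(5, 4, 2)
)

# Same averaging pattern as above, with three items instead of five
toy_survey <- mutate(toy_survey,
                     regret_total = (regret1 + regret2 + regret3) / 3)

# The second participant's total is NA because one item was missing
toy_survey$regret_total
```

Missing values like this are one reason the summary code later in this document passes `na.rm = TRUE` to `mean()` and `sd()`.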

Categorical variables: recoding decision mode, school affiliation, and handedness as BINARY

Above, we used mutate() to create new continuous variables by averaging the values of other continuous variables together. We can also create new categorical variables by recoding the values of existing categorical variables into levels that are more convenient for the statistics we want to calculate.

The code below will calculate new categorical columns based on existing raw data in IntroSurvey:

  • dec_mode_binary: a binary variable indicating whether participants reported using a “calculation-based” decision mode or not
  • school_binary: a binary variable indicating whether participants are CC/Barnard-affiliated (grouped together) or GS/SPS-affiliated (grouped together)
  • handed_binary: a binary variable indicating whether participants are right-handed vs. anything else

IntroSurvey <- mutate(IntroSurvey,
                      dec_mode_binary = ifelse(startsWith(dec_mode, "calculation-based"),
                                          "calc",
                                          "not-calc"),
                      school_binary = ifelse(school %in% c("Barnard", "CC"),
                                             "B_CC",
                                             "GS_SPS"),
                      handed_binary = ifelse(handed == "R",
                                             "R",
                                             "notR"))
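
After recoding, it can help to check the new columns against the originals. The sketch below does this with `count()` on a small made-up dataframe. The name `toy_survey` and the values "GS" and "SPS" are assumptions for illustration; the real values in the class file may be spelled differently.

```r
library(dplyr)

# Hypothetical toy data; the real values in the class file may differ
toy_survey <- data.frame(
  school = c("CC", "Barnard", "GS", "SPS", NA),
  handed = c("R", "L", "R", NA, "R"),
  stringsAsFactors = FALSE
)

# Same recoding pattern as above
toy_survey <- mutate(toy_survey,
                     school_binary = ifelse(school %in% c("Barnard", "CC"),
                                            "B_CC",
                                            "GS_SPS"),
                     handed_binary = ifelse(handed == "R",
                                            "R",
                                            "notR"))

# Count each combination of original and recoded values to check the recoding
count(toy_survey, school, school_binary)
count(toy_survey, handed, handed_binary)
```

Note how the two patterns treat missing values differently. `%in%` returns `FALSE` for `NA`, so a participant with no reported school lands in "GS_SPS". `==` returns `NA` for `NA`, so a participant with no reported handedness stays `NA` in `handed_binary`.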

Calculating descriptive statistics about the participant population

The following code will generate summary statistics about the entire participant population who filled out the Intro Survey. You will need to report these in your Methods section of Assignment 3.

Mean, SD, and # no-responses of reported age

# First, create a summary dataframe to hold the relevant info
age_summary <- summarize(IntroSurvey,
                         mean_age = mean(age, na.rm = TRUE),
                         sd_age = sd(age, na.rm = TRUE),
                         no_responses_age = sum(is.na(age)))

# Then print the dataframe to show the summary values inside
age_summary
## # A tibble: 1 x 3
##   mean_age sd_age no_responses_age
##      <dbl>  <dbl>            <int>
## 1     23.0   3.85                2

Counts of reported gender

gender_count <- count(IntroSurvey,
                     gender)

gender_count
## # A tibble: 3 x 2
##   gender                    n
##   <chr>                 <int>
## 1 F                        45
## 2 M                        23
## 3 Prefer not to specify     1
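
If you would also like to report percentages alongside the counts, one possible approach is to add a column with `mutate()`. The sketch below rebuilds the counts from the output above by hand so that it runs on its own; in your own document you could apply the same `mutate()` call directly to `gender_count`. The column name `percent` is just an illustrative choice.

```r
library(dplyr)

# Counts copied from the gender_count output shown above
gender_count <- data.frame(
  gender = c("F", "M", "Prefer not to specify"),
  n = c(45, 23, 1)
)

# Convert each count to a percentage of all participants, rounded to 1 decimal
gender_count <- mutate(gender_count,
                       percent = round(100 * n / sum(n), 1))

gender_count
```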

Calculating & reporting inferential statistics

Correlation

Here is example code for you to use to generate all the statistics you need to report a Pearson correlation for Assignment 3. The example code is written to correlate age and psych_courses across all participants. This is not a truly meaningful correlation, so you need to change the variables in the code below to variables you want to calculate a correlation for.

In APA format, when reporting the results of a statistical test, you must first report some descriptive statistics about the variables going into the test. When reporting a Pearson correlation, you must first report the mean and standard deviation of both of the variables going into the correlation.

Use the code below, featuring the function summarize(), to calculate these descriptive statistics. Change the variables age and psych_courses to whatever two variables you want to run in your correlation.

cor_summaries <- summarize(IntroSurvey,
                               mean_var1 = mean(age, na.rm = TRUE),
                               sd_var1 = sd(age, na.rm = TRUE),
                               mean_var2 = mean(psych_courses, na.rm = TRUE),
                               sd_var2 = sd(psych_courses, na.rm = TRUE))

cor_summaries
## # A tibble: 1 x 4
##   mean_var1 sd_var1 mean_var2 sd_var2
##       <dbl>   <dbl>     <dbl>   <dbl>
## 1      23.0    3.85      4.24    2.27

Use the code below, featuring the function cor.test(), to calculate your Pearson correlation. Change the variables age and psych_courses to whatever two variables you want to run in your correlation.

cor_report <- cor.test(~ age + psych_courses, method = "pearson", data = IntroSurvey)

cor_report
## 
##  Pearson's product-moment correlation
## 
## data:  age and psych_courses
## t = 0.9975, df = 54, p-value = 0.323
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1330988  0.3838356
## sample estimates:
##       cor 
## 0.1345084

Use the code below, without any changes, to see the estimated Pearson’s r for the test above.

cor_report$estimate
##       cor 
## 0.1345084

Use the code below, without any changes, to see the estimated p-value for the test above.

cor_report$p.value
## [1] 0.322973
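
A correlation write-up usually also includes the degrees of freedom, which `cor.test()` stores in `$parameter`. The sketch below pulls the pieces together into one line of text. It uses R's built-in `mtcars` dataset as a stand-in so that it runs without the class data; `example_cor` and `cor_text` are names made up for this illustration, and the exact formatting your write-up needs is something to confirm against the assignment instructions.

```r
# Stand-in example using the built-in mtcars data, not the class survey
example_cor <- cor.test(~ mpg + qsec, method = "pearson", data = mtcars)

# Degrees of freedom for the test (number of complete pairs minus 2)
example_cor$parameter

# Assemble df, r, and p into one line of text
cor_text <- sprintf("r(%d) = %.2f, p = %.3f",
                    as.integer(example_cor$parameter),
                    example_cor$estimate,
                    example_cor$p.value)

cor_text
```

`sprintf()` prints a leading zero (for example 0.42), whereas APA style normally drops it for r and p, so you may want to adjust that by hand. The same three pieces are available from your own `cor_report` object.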

T-test

Here is example code for you to use to generate all the statistics you need to report a t-test for Assignment 3. The example code is written to test for a difference in age by binary school affiliation (B_CC vs. GS_SPS) across all participants. This is obviously not a meaningful t-test, so you need to change the variables in the code below to variables you want to calculate a t-test for.

In APA format, when reporting the results of a statistical test, you must first report some descriptive statistics about the variables going into the test. When reporting a t-test, you must first report the mean and standard deviation of the dependent variable, separately for participants in both of the independent variable groups.

Use the code below, featuring the functions group_by() and summarize(), to calculate these descriptive statistics. Change the variables: change school_binary to your binary IV of choice for your t-test, and change age to your DV of choice.

t_test_summaries <- IntroSurvey %>%
  group_by(school_binary) %>%
  # n() counts every row in each group, including rows where age is missing
  summarize(count_dv = n(),
            mean_dv = mean(age, na.rm = TRUE),
            sd_dv = sd(age, na.rm = TRUE))
## `summarise()` ungrouping output (override with `.groups` argument)
t_test_summaries
## # A tibble: 2 x 4
##   school_binary count_dv mean_dv sd_dv
##   <chr>            <int>   <dbl> <dbl>
## 1 B_CC                40    20.4 0.788
## 2 GS_SPS              29    26.5 3.60

Use the code below, featuring the function t.test(), to calculate your t-test. Change the variables school_binary and age to your IV and DV of choice for your t-test. Remember that the formula is dv ~ iv, so the DV goes BEFORE the tilde (~) and the IV goes AFTER the tilde.

t_test_report <- t.test(age ~ school_binary, data = IntroSurvey)

t_test_report
## 
##  Welch Two Sample t-test
## 
## data:  age by school_binary
## t = -8.8272, df = 28.87, p-value = 1.074e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.513391 -4.686243
## sample estimates:
##   mean in group B_CC mean in group GS_SPS 
##             20.43590             26.53571

Use the code below, without any changes, to see the estimated mean DV value in each of the two IV groups for the test above. The difference between these two means is the estimated difference between the groups.

t_test_report$estimate
##   mean in group B_CC mean in group GS_SPS 
##             20.43590             26.53571

Use the code below, without any changes, to see the estimated t-value for the test above.

t_test_report$statistic
##         t 
## -8.827248

Use the code below, without any changes, to see the estimated p-value for the test above.

t_test_report$p.value
## [1] 1.074077e-09
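
A t-test write-up usually also needs the degrees of freedom and, often, the size of the difference between the group means. The sketch below shows where those values live in the object that `t.test()` returns. It uses R's built-in `mtcars` dataset as a stand-in so that it runs without the class data; `example_t` and `mean_difference` are names made up for this illustration.

```r
# Stand-in example using the built-in mtcars data, not the class survey
example_t <- t.test(mpg ~ am, data = mtcars)

# $estimate holds the two group means; subtract them to get the difference
mean_difference <- unname(example_t$estimate[1] - example_t$estimate[2])
mean_difference

# Degrees of freedom (usually fractional, because this is a Welch t-test)
example_t$parameter

# 95% confidence interval for the difference in means
example_t$conf.int
```

The same fields (`$estimate`, `$parameter`, `$conf.int`) are available from your own `t_test_report` object.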

Graphing data

Run this code first to make your plot theme before running any other plot code.

bwtheme <- theme_bw() + 
  theme(strip.background = element_rect(fill = 'white')) +
  theme(plot.title = element_text(size = rel(1))) + 
  theme(axis.title = element_text(size = rel(1)))

In your Assignment 3 write-up, you can choose to include either a scatterplot to go with your Pearson correlation or a boxplot to go with your t-test. We have included example code below for both options. Choose the one you want, and edit the code for that one to plot the variables you want and put the correct x-axis label, y-axis label, and title on your plot. You can ignore the code for the other plot.

To save the image file for your plot, select the Plots tab (usually just after the Files tab in the bottom right-hand pane of your RStudio window), and click “Export.” You can specify the filename and type, and also adjust the aspect ratio before saving. By default, the plot will save in your current working directory.

Scatter plot

This example scatterplot has psych_courses on the x-axis and age on the y-axis.

scatterplot1 <- ggplot(IntroSurvey, aes(x = psych_courses, y = age)) +
              # shape = 1 is a hollow circle. play around with other numbers to find other shapes!
              geom_point(shape = 1) +
              # geom_smooth() adds a regression line
              geom_smooth(method = lm) +
              # labs() lets you modify various labels in a ggplot
              labs(title = "Age as a function of psych courses",
                x = "Psych courses",
                y = "Age (years)") +
             bwtheme

scatterplot1
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 13 rows containing non-finite values (stat_smooth).
## Warning: Removed 13 rows containing missing values (geom_point).
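
The warnings above mean that rows with a missing value in either plotted variable were left out of the plot. If you want to know in advance how many rows are affected, or to drop them yourself before plotting, something like the sketch below may help. It uses a small made-up dataframe; `toy_survey` and `toy_complete` are hypothetical names, and the values are not from the class data.

```r
library(dplyr)

# Hypothetical toy data standing in for IntroSurvey
toy_survey <- data.frame(
  psych_courses = c(2, NA, 5, 1),
  age = c(20, 21, NA, 23)
)

# Number of rows missing at least one of the two plotted variables
sum(is.na(toy_survey$psych_courses) | is.na(toy_survey$age))

# Keep only rows where both variables are present
toy_complete <- filter(toy_survey, !is.na(psych_courses), !is.na(age))

nrow(toy_complete)
```

Passing a filtered dataframe like `toy_complete` to `ggplot()` plots the same points without the warnings.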

Box plot

This example boxplot has school_binary on the x-axis and age on the y-axis.

boxplot1 <- ggplot(IntroSurvey, aes(x = school_binary, y = age)) +
            geom_boxplot() +
            geom_jitter(position=position_jitter(width = .1, height = 0)) +
            labs(title = "Age as a function of binary of schools",
              x = "Binary of Schools",
              y = "Age (years)") +
            bwtheme

boxplot1
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
## Warning: Removed 2 rows containing missing values (geom_point).