Weekly Lab Homework Assignment: Computation

Objective:

This lab assignment aims to reinforce your understanding of data cleaning and descriptive analysis using the dplyr and psych packages in R. You will apply these concepts through practical exercises, focusing on using and stacking dplyr functions with the %>% operator.

Instructions:

Complete each exercise by writing the necessary R code.
Ensure you use the %>% operator to chain multiple dplyr functions together.
Interpret the results for each exercise.
Knit your R Markdown file to a PDF and submit it as per the submission instructions.
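The sketch below shows what chaining buys you. It runs the same two steps twice: once as nested calls, which you read from the inside out, and once with `%>%`, which you read from top to bottom. It assumes dplyr is installed (dplyr exports `%>%`). The small `toy` data frame is a hypothetical example and is not part of the assignment.

```r
library(dplyr)

# Hypothetical toy data, used only to illustrate chaining
toy <- data.frame(x = c(3, NA, 1), y = c("a", "b", "c"))

# Nested form: read inside-out (filter runs first, then arrange)
nested <- arrange(filter(toy, !is.na(x)), x)

# Piped form: %>% passes the left-hand result as the first argument
# of the next function, so the steps read top to bottom
piped <- toy %>%
  filter(!is.na(x)) %>%
  arrange(x)

identical(nested, piped)
```

Both forms return the same data frame. The piped version just makes the order of the steps easier to follow as more functions are stacked.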

Homework Exercises:

Exercise 1: Cleaning Data with `dplyr`

Clean a dataset using various dplyr functions.

Use the following dataset for the exercise:

data <- data.frame(
  participant_id = 1:10,
  reaction_time = c(250, 340, 295, NA, 310, 275, 325, 290, 360, NA),
  gender = c("M", "F", "F", "M", "M", "F", "M", "F", "M", "F"),
  accuracy = c(95, 87, 92, 88, 94, 91, 85, 89, 93, NA)
)

print(data)

##    participant_id reaction_time gender accuracy
## 1               1           250      M       95
## 2               2           340      F       87
## 3               3           295      F       92
## 4               4            NA      M       88
## 5               5           310      M       94
## 6               6           275      F       91
## 7               7           325      M       85
## 8               8           290      F       89
## 9               9           360      M       93
## 10             10            NA      F       NA

Clean the dataset by performing the following steps:
- Remove rows with missing values.
- Rename the reaction_time column to response_time.
- Create a new column performance_group based on accuracy (High if accuracy >= 90, otherwise Low).
- Remove outliers from the response_time column.
- Relevel the performance_group column to set “Low” as the reference level.

# Install the dplyr package (if not already installed)
if (!require(dplyr)) {
  install.packages("dplyr", dependencies = TRUE)
}

# Load the library
library(dplyr)

remove_outliers <- function(data, column) {
  # Pull the column once (tidy evaluation), then calculate quartiles and IQR
  values <- pull(data, {{ column }})
  Q1 <- quantile(values, 0.25, na.rm = TRUE)
  Q3 <- quantile(values, 0.75, na.rm = TRUE)
  IQR_val <- Q3 - Q1
  lower_bound <- Q1 - 1.5 * IQR_val
  upper_bound <- Q3 + 1.5 * IQR_val

  # Keep rows inside the 1.5 * IQR fences; .env$ makes filter() use the
  # local bounds even if the data contain columns with the same names
  data %>%
    filter({{ column }} >= .env$lower_bound,
           {{ column }} <= .env$upper_bound)
}
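`remove_outliers()` keeps rows whose values lie between Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. The base-R sketch below checks that arithmetic by hand. It applies the same fences to the eight response times that remain after `na.omit()` and uses the default (type 7) quartiles from `quantile()`, as the helper does.

```r
# Response times left after dropping the two rows with missing values
rt <- c(250, 340, 295, 310, 275, 325, 290, 360)

# Quartiles with quantile()'s default method (type 7), as in remove_outliers()
q1 <- unname(quantile(rt, 0.25))   # 286.25
q3 <- unname(quantile(rt, 0.75))   # 328.75
iqr_val <- q3 - q1                 # 42.5

# Tukey fences: 1.5 * IQR beyond each quartile
lower_bound <- q1 - 1.5 * iqr_val  # 222.5
upper_bound <- q3 + 1.5 * iqr_val  # 392.5

# Values that would be flagged as outliers (none here: numeric(0))
rt[rt < lower_bound | rt > upper_bound]
```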

# Create cleaned_data by chaining each cleaning step with %>%
# (tip: Ctrl + Shift + M inserts %>% in RStudio)
cleaned_data <- data %>%
  na.omit() %>%                                # drop rows with any NA
  rename(response_time = reaction_time) %>%    # rename the column
  mutate(performance_group = ifelse(accuracy >= 90, "High", "Low")) %>%
  remove_outliers(response_time) %>%           # 1.5 * IQR rule
  mutate(performance_group = relevel(factor(performance_group), ref = "Low"))
print(cleaned_data)

##   participant_id response_time gender accuracy performance_group
## 1              1           250      M       95              High
## 2              2           340      F       87               Low
## 3              3           295      F       92              High
## 4              5           310      M       94              High
## 5              6           275      F       91              High
## 6              7           325      M       85               Low
## 7              8           290      F       89               Low
## 8              9           360      M       93              High

Interpretation: Cleaning removed the 2 rows with missing data (participants 4 and 10) and left 8 participants. No response time fell outside the 1.5 × IQR fences, so no outliers were removed. We renamed the reaction_time column to response_time. We also added a performance_group column that classifies each participant's accuracy as High (90 or above) or Low, and we set Low as the reference level.
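Two claims in this interpretation are easy to confirm in code: how many rows `na.omit()` drops, and which level `relevel()` puts first. The base-R sketch below rebuilds the example data and checks both. The intermediate names `complete_rows` and `group` are for illustration only.

```r
data <- data.frame(
  participant_id = 1:10,
  reaction_time = c(250, 340, 295, NA, 310, 275, 325, 290, 360, NA),
  gender = c("M", "F", "F", "M", "M", "F", "M", "F", "M", "F"),
  accuracy = c(95, 87, 92, 88, 94, 91, 85, 89, 93, NA)
)

# na.omit() drops every row with at least one NA (rows 4 and 10)
complete_rows <- na.omit(data)
nrow(data) - nrow(complete_rows)   # 2

# factor() sorts levels alphabetically, so "High" would be the reference...
group <- factor(ifelse(complete_rows$accuracy >= 90, "High", "Low"))
levels(group)                      # "High" "Low"

# ...and relevel() moves "Low" to the front, making it the reference level
group <- relevel(group, ref = "Low")
levels(group)                      # "Low" "High"
```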

Exercise 2: Generating Descriptive Statistics with `psych`

Use the following dataset for the exercise:

study_hours <- data.frame(
  participant_id = 1:10,
  hours = c(5, 6, 4, 7, 5, 3, 8, 6, 5, 7)
)

Generate descriptive statistics using the describe() function from the psych package.

# Install the psych package (if not already installed)
if (!require(psych)) {
  install.packages("psych", dependencies = TRUE)
}

# Load the psych package
library(psych)

# Generate descriptive statistics
describe(study_hours)

##                vars  n mean   sd median trimmed  mad min max range  skew
## participant_id    1 10  5.5 3.03    5.5    5.50 3.71   1  10     9  0.00
## hours             2 10  5.6 1.51    5.5    5.62 1.48   3   8     5 -0.08
##                kurtosis   se
## participant_id    -1.56 0.96
## hours             -1.18 0.48

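Interpretation: The mean number of hours studied is 5.6 (SD = 1.51), and study hours ranged from 3 to 8. The median (5.5) is very close to the mean, which suggests a roughly symmetric distribution that is not pulled toward either end by extreme values. The skew (-0.08) is very close to zero, which is consistent with symmetry. The negative kurtosis (-1.18) indicates a flatter shape than a normal distribution, and with only 10 observations these statistics cannot establish normality. The participant_id row only summarizes the ID numbers and is not meaningful.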
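A mean close to the median points to symmetry, but it does not by itself rule out outliers. The sketch below uses base R to recompute the headline `describe()` numbers for `hours`. It then applies the 1.5 × IQR rule from Exercise 1 as a direct check for outliers.

```r
hours <- c(5, 6, 4, 7, 5, 3, 8, 6, 5, 7)

# Headline statistics reported by describe()
mean(hours)                      # 5.6
median(hours)                    # 5.5
sd(hours)                        # about 1.51 (n - 1 denominator)
sd(hours) / sqrt(length(hours))  # standard error, about 0.48

# 1.5 * IQR fences with quantile()'s default quartiles (5.00 and 6.75)
q <- unname(quantile(hours, c(0.25, 0.75)))
fences <- c(q[1] - 1.5 * diff(q), q[2] + 1.5 * diff(q))
fences                           # 2.375 9.375

# Study times outside the fences (none here)
hours[hours < fences[1] | hours > fences[2]]
```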

Create graphical summaries of a dataset using the psych package.

Use the following dataset for the exercise:

experiment_data <- data.frame(
  response_time = c(250, 340, 295, 310, 275, 325, 290, 360, 285, 310),
  accuracy = c(95, 87, 92, 88, 94, 91, 85, 89, 93, 90),
  age = c(23, 35, 29, 22, 30, 31, 27, 40, 24, 32)
)

Create a correlation plot using the corPlot() function.

# Create the correlation plot
corPlot(cor(experiment_data))

Interpretation: There is a strong negative correlation between accuracy and response time (r = -0.59): the longer the response time, the lower the accuracy. There is a strong positive correlation between age and response time (r = 0.78): as age increases, so does response time. There is a small negative correlation between accuracy and age (r = -0.28), so older participants tended to be slightly less accurate. With only 10 participants, these estimates are imprecise.

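`corPlot()` shades each cell by the size of its correlation. Printing the matrix makes the "strong" and "small" labels concrete. The sketch below computes the Pearson correlations with base R's `cor()`, which is its default method. It then runs `cor.test()` on each pair to check whether a correlation can be distinguished from zero, which matters in a sample of 10. The helper names `var_pairs` and `p_values` are illustrative.

```r
experiment_data <- data.frame(
  response_time = c(250, 340, 295, 310, 275, 325, 290, 360, 285, 310),
  accuracy = c(95, 87, 92, 88, 94, 91, 85, 89, 93, 90),
  age = c(23, 35, 29, 22, 30, 31, 27, 40, 24, 32)
)

# Pearson correlation matrix, rounded to two decimals
r_mat <- round(cor(experiment_data), 2)
r_mat
# response_time vs accuracy: -0.59, response_time vs age: 0.78,
# accuracy vs age: -0.28

# Significance test for each pair of variables (n = 10, so df = 8)
var_pairs <- combn(names(experiment_data), 2, simplify = FALSE)
p_values <- sapply(var_pairs, function(v) {
  cor.test(experiment_data[[v[1]]], experiment_data[[v[2]]])$p.value
})
names(p_values) <- sapply(var_pairs, paste, collapse = " ~ ")
round(p_values, 3)
```

With 8 degrees of freedom, only the response time and age correlation should fall below .05. The other two correlations, including the fairly large r = -0.59, cannot be reliably distinguished from zero in a sample this small.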
Create pair panels using the pairs.panels() function.

# Create the pair panels
experiment_data %>%
  pairs.panels()

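Interpretation: The histograms show that each variable is fairly evenly spread out across its range. The scatter plots show a clear positive relationship between age and response time and a negative relationship between response time and accuracy. The relationship between accuracy and age is weak.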
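Histograms of only 10 points per variable are easy to over-read, so a numeric summary is a useful companion. The sketch below applies base R's `fivenum()` to each column of `experiment_data`. For each variable it returns Tukey's minimum, lower hinge, median, upper hinge and maximum.

```r
experiment_data <- data.frame(
  response_time = c(250, 340, 295, 310, 275, 325, 290, 360, 285, 310),
  accuracy = c(95, 87, 92, 88, 94, 91, 85, 89, 93, 90),
  age = c(23, 35, 29, 22, 30, 31, 27, 40, 24, 32)
)

# Tukey's five-number summary for every column; sapply() returns a
# 5 x 3 matrix with one column per variable
spread <- sapply(experiment_data, fivenum)
rownames(spread) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
spread
```

For response time and accuracy, the hinges sit roughly midway between the extremes. Age has a longer upper tail (upper hinge 32, maximum 40) than lower tail (minimum 22, lower hinge 24), which is worth keeping in mind when reading the age histogram.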

Weekly Lab Homework Assignment: Computation

James Cronin

09 March, 2025