Overview

This file contains a set of tasks that you need to complete in R for the lab assignment. The tasks may require you to add a code chunk, type code into a chunk, and/or execute code. Don’t forget that you need to acknowledge if you used any resources beyond class materials or got help to complete the assignment.

Additional information about this assignment, including examples and code, can be found in the file “VisualizingRelationshipsBetween2Variables.html”.

The data set you will use is different from the one used in the instructions. Pay attention to differences in the Excel file name, variable names, and object names. You will need to adjust your code accordingly.

Once you have completed the assignment, you will need to knit this R Markdown file to produce an .html file. You will then need to upload the .html file and this .Rmd file to AsULearn.

1. Add your Name and the Date

The first thing you need to do in this file is to add your name and date in the lines underneath this document’s title.

2. Identify and Set Your Working Directory

You need to identify and set your working directory in this section. If you are working in the cloud version of RStudio, enter a note here to tell us that you did not need to change the working directory because you are working in the cloud.

getwd()
## [1] "/Users/summersimpson/Downloads/VisualizingRelationshipsFall2025"
setwd("/Users/summersimpson/Downloads/VisualizingRelationshipsFall2025")
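
As a rough sketch (not part of the assignment instructions), one way to confirm that the working directory is set correctly is to check that the data file is visible from it before trying to read it. The helper name `check_data_file` is hypothetical; the file name is the one read later in this lab.

```r
# Hypothetical helper: report whether a data file can be found in a
# directory (by default, the current working directory).
check_data_file <- function(file_name, dir = getwd()) {
  full_path <- file.path(dir, file_name)
  found <- file.exists(full_path)
  if (!found) {
    message("Could not find '", file_name, "' in ", dir)
  }
  found
}

# TRUE if the Excel file sits in the working directory, FALSE otherwise
check_data_file("VisualizingRelationshipsData.xlsx")
```

If this returns `FALSE`, the working directory probably points at a different folder than the one holding the Excel file.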

3. Installing and Loading Packages and Data Set

You need to install and load the packages and data set you’ll use for the lab assignment in this section. In this lab, we will use the following packages: dplyr, tidyverse, forcats, ggplot2, janitor and openxlsx. We have not used the package janitor in previous labs, so you will need to install it before you can load it.

library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.2     ✔ stringr   1.5.1
## ✔ lubridate 1.9.4     ✔ tibble    3.3.0
## ✔ purrr     1.1.0     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("forcats")
library("ggplot2")
library("openxlsx")
VisualizingRelationshipsData <- read.xlsx("VisualizingRelationshipsData.xlsx")
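
Because janitor has not been used in earlier labs, a sketch like the following could install a package only when it is missing and then load it. The helper `load_or_install` is a hypothetical name, not something from the class materials, and the CRAN mirror URL is an assumption. Many instructors prefer that `install.packages()` be run once in the console rather than inside a file that gets knit, so treat this as one option among several.

```r
# Hypothetical helper: install a package only if it is not already
# available, then attach it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # A CRAN mirror is given explicitly because knitting can fail when
    # no mirror has been set for the session (assumed mirror URL).
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  library(pkg, character.only = TRUE)
  invisible(pkg)
}

# Example call for this lab (left commented out so nothing installs here):
# load_or_install("janitor")
```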

4. Scatterplot of Price for Rib Plate and Driving Distance

Create a scatterplot showing the relationship between the variables that identify how much respondents are willing to pay for a plate of ribs and how far they are willing to drive for good BBQ. When making this scatterplot, assume you are interested in whether the amount respondents are willing to pay for a plate of ribs influences how far they are willing to drive for good BBQ.

ggplot(VisualizingRelationshipsData, aes(x = Ribs.Price, y = Minutes.Driving)) +
  geom_point(color = "darkseagreen2") +
  geom_smooth(method = "lm", se = FALSE, color = "darkorchid4") +
  labs(
    x = "Amount Willing to Pay",
    y = "Distance Willing to Drive",
    title = "Relationship Between Cost and Driving Distance for BBQ"
  ) +
  theme_minimal() +
  theme(
    panel.grid = element_blank(),
    panel.background = element_blank()
  )
## `geom_smooth()` using formula = 'y ~ x'

5. Describe Scatterplot of Price for Rib Plate and Driving Distance

Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

The amount someone is willing to pay for ribs is the independent variable, and the distance they are willing to drive is the dependent variable. The scatterplot shows a positive relationship, which means people who are willing to spend more tend to drive farther for BBQ.
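
One way to double-check a description like this is to look at the sign of the correlation, or of the slope of the straight line that `geom_smooth(method = "lm")` draws. The sketch below uses invented toy values (not the lab data) under the same variable names. A positive sign suggests a positive relationship, a negative sign a negative one, and a value near zero a null one.

```r
# Toy values invented for illustration only; they are NOT the lab data.
toy <- data.frame(
  Ribs.Price      = c(10, 12, 15, 18, 20, NA),
  Minutes.Driving = c(15, 20, 30, 35, 45, 25)
)

# Correlation between the two variables, ignoring rows with missing values
r <- cor(toy$Ribs.Price, toy$Minutes.Driving, use = "complete.obs")

# Slope of the least-squares line (lm() drops incomplete rows by default)
slope <- coef(lm(Minutes.Driving ~ Ribs.Price, data = toy))[["Ribs.Price"]]

r
slope
```

The two numbers always share a sign, so either one can support the wording of the description.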

6. Scatterplot of Driving Distance and Age

Create a scatterplot showing the relationship between how far a respondent is willing to drive for good BBQ and their age. When making this scatterplot, let’s assume you are interested in whether how far someone is willing to drive is a function of their age.

- Change the labels on the x- and y-axes to words instead of using the variable names. Make sure that the labels on both the x- and y-axes are capitalized.

ggplot(VisualizingRelationshipsData, aes(x = Age, y = Minutes.Driving)) +
  geom_point(color = "deepskyblue") +
  geom_smooth(method = "lm", se = FALSE, color = "firebrick2") +
  labs(
    x = "Age of Respondent",
    y = "Distance Willing to Drive",
    title = "Relationship Between Age and Driving Distance for BBQ"
  ) +
  theme_minimal() +
  theme(
    panel.grid = element_blank(),
    panel.background = element_blank()
  )
## `geom_smooth()` using formula = 'y ~ x'

7. Describe Scatterplot of Driving Distance and Age

Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

Age is the independent variable, and driving distance is the dependent variable. The scatterplot shows a null relationship, meaning age does not appear to strongly influence how far someone is willing to drive for BBQ.

8. Creating Dichotomous Variables

You need to create three dichotomous variables based on existing variables in the data set in this section. The first should be named “Prefers.Brisket” and should take on a value of “1” if a respondent identified beef brisket as their preferred type of BBQ meat and a value of “0” if they did not. The second should be named “Prefers.Fries” and should take on a value of “1” if a respondent identified french fries as their preferred side dish and a value of “0” if they did not. The third should be named “Longer.Distances” and should take on a value of “1” if a respondent is willing to drive longer than average for good BBQ and a value of “0” if they are not.

VisualizingRelationshipsData <- VisualizingRelationshipsData %>%
  mutate(
    Prefers.Brisket = ifelse(Favorite.Meat == "Beef Brisket", 1, 0),
    Prefers.Fries = ifelse(Favorite.Side == "French Fries", 1, 0),
    Longer.Distances = ifelse(Minutes.Driving > mean(Minutes.Driving, na.rm = TRUE), 1, 0)
  )
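
A quick way to sanity-check new dichotomous variables is to tabulate them, including missing values. The sketch below uses a small invented data frame (not the lab data) and base R assignment instead of `mutate()` so that it runs without extra packages. It illustrates that `ifelse()` returns `NA`, rather than 0, when the underlying value is missing.

```r
# Toy rows invented for illustration only; they are NOT the lab data.
toy <- data.frame(
  Favorite.Meat   = c("Beef Brisket", "Pulled Pork", NA, "Beef Brisket"),
  Minutes.Driving = c(10, 50, 30, NA)
)

# Same logic as the lab code, written with base R assignment
toy$Prefers.Brisket <- ifelse(toy$Favorite.Meat == "Beef Brisket", 1, 0)
toy$Longer.Distances <- ifelse(
  toy$Minutes.Driving > mean(toy$Minutes.Driving, na.rm = TRUE), 1, 0
)

# useNA = "ifany" adds a column for missing values when there are any
table(toy$Prefers.Brisket, useNA = "ifany")
table(toy$Longer.Distances, useNA = "ifany")
```

If the count of 1s looks too low, a common cause is a category label in the data that does not exactly match the string being compared (spelling, capitalization, or extra spaces).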

9. Scatterplot of Price of Dinner Plate and Prefers Brisket

We want to know if those who prefer brisket over other types of meat are willing to pay more for a dinner plate than those who do not prefer brisket. Create a scatterplot between the variable for the price someone is willing to pay for a dinner plate and the dichotomous variable you created indicating if someone prefers brisket.

ggplot(VisualizingRelationshipsData, aes(x = Prefers.Brisket, y = Dinner.Plate.Price)) +
  geom_jitter(color = "lightsteelblue4", width = 0.15, height = 0) +
  scale_x_continuous(breaks = c(0, 1),
                     labels = c("Does Not Prefer Brisket", "Prefers Brisket")) +
  labs(
    x = "Brisket Preference",
    y = "Dinner Plate Price",
    title = "Dinner Plate Price by Brisket Preference"
  ) +
  theme_classic() +
  theme(
    panel.background = element_blank(),
    panel.grid = element_blank()
  )

10. Describe Scatterplot of Price of Dinner Plate and Prefers Brisket

Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

The independent variable in this scatterplot is whether someone prefers brisket, and the dependent variable is the price they are willing to pay for a dinner plate. The relationship is null, because there is no clear difference in dinner plate price between those who prefer brisket and those who do not.
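
When the independent variable is dichotomous, comparing the mean of the dependent variable across the two groups can back up what the jittered scatterplot suggests. The sketch below uses invented toy values (not the lab data) under the same variable names. A difference close to zero would be consistent with a null relationship.

```r
# Toy values invented for illustration only; they are NOT the lab data.
toy <- data.frame(
  Prefers.Brisket    = c(0, 0, 0, 1, 1, 1),
  Dinner.Plate.Price = c(12, 15, NA, 14, 16, 18)
)

# Mean price within each group, ignoring missing prices
group_means <- tapply(
  toy$Dinner.Plate.Price, toy$Prefers.Brisket, mean, na.rm = TRUE
)

# Difference in means: group coded 1 minus group coded 0
group_diff <- unname(group_means["1"] - group_means["0"])

group_means
group_diff
```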

11. Scatterplot of Age and Prefers Fries

We want to know if those who prefer fries are older than those who do not prefer fries. Create a scatterplot between the variable for the respondent’s age and the dichotomous variable you created indicating if someone prefers fries as their favorite side.

ggplot(VisualizingRelationshipsData, aes(x = Prefers.Fries, y = Age)) +
  geom_jitter(color = "goldenrod3", width = 0.2, height = 0) +
  labs(
    x = "Prefers Fries",
    y = "Age",
    title = "Relationship Between Age and Preference for Fries"
  ) +
  theme_minimal() +
  theme(
    panel.grid = element_blank(),
    panel.background = element_blank()
  ) +
  scale_x_continuous(
    breaks = c(0, 1),
    labels = c("Does Not Prefer Fries", "Prefers Fries")
  )

12. Describe Scatterplot of Age and Prefers Fries

Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

The independent variable is whether the respondent prefers fries, and the dependent variable is age. The scatterplot shows a null relationship: preferring fries does not appear to be associated with a respondent’s age.

14. Contingency Table of Prefers Brisket and Prefers Fries

We are interested in whether someone who prefers brisket is more or less likely to also prefer fries than someone who does not prefer brisket. Construct a contingency table between the two dichotomous variables that you created indicating if someone prefers brisket and if someone prefers fries.

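# Cross-tabulate the two dichotomous variables (rows = brisket, columns = fries)
table_brisket_fries <- table(PrefersBrisket = VisualizingRelationshipsData$Prefers.Brisket,
                             PrefersFries = VisualizingRelationshipsData$Prefers.Fries)

# Row percentages: share preferring fries within each brisket group
percent_table <- prop.table(table_brisket_fries, margin = 1) * 100
percent_table <- round(percent_table, 1)

# Combine counts and percentages into one display table
final_table <- matrix(
  paste0(table_brisket_fries, " (", percent_table, "%)"),
  nrow = nrow(table_brisket_fries),
  dimnames = dimnames(table_brisket_fries)
)

# Print the table so it appears in the knitted document
final_table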
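
The `margin` argument of `prop.table()` decides which question the percentages answer. As a sketch with invented toy values (not the lab data): `margin = 1` computes percentages within each row, and `margin = 2` computes them within each column. With brisket preference on the rows, the row percentages show how likely each brisket group is to prefer fries.

```r
# Toy responses invented for illustration only; they are NOT the lab data.
toy_table <- table(
  PrefersBrisket = c(0, 0, 0, 0, 1, 1, 1, 1),
  PrefersFries   = c(0, 0, 0, 1, 0, 0, 1, 1)
)

# Within each brisket group (each row sums to 100)
row_pct <- round(prop.table(toy_table, margin = 1) * 100, 1)

# Within each fries group (each column sums to 100)
col_pct <- round(prop.table(toy_table, margin = 2) * 100, 1)

row_pct
col_pct
```

Comparing the "prefers fries" percentage across the two rows of `row_pct` is what indicates a positive, negative, or null relationship when the independent variable is on the rows.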

15. Describe Contingency Table of Prefers Brisket and Prefers Fries

Write a brief description of the relationship between these two variables identified in the contingency table you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

The independent variable is whether someone prefers brisket, and the dependent variable is whether they prefer fries. The table shows a null relationship, meaning preferring brisket does not influence whether someone also prefers fries.

Publish Document

Click the “Knit” button to publish your work as an .html document. This file will appear in the folder specified by your working directory. You will need to upload both this R Markdown file and the .html file it produces to AsULearn to get all of the points associated with this lab.