Overview

This file contains a set of tasks that you need to complete in R for the lab assignment. The tasks may require you to add a code chunk, type code into a chunk, and/or execute code. Don’t forget that you need to acknowledge any resources you used beyond class materials and any help you received to complete the assignment.

Additional information about this assignment, including examples and code, can be found in the file “VisualizingRelationshipsBetween2Variables.html”.

The data set you will use is different from the one used in the instructions. Pay attention to differences in the Excel file names, variable names, and object names, and adjust your code accordingly.

Once you have completed the assignment, you will need to knit this R Markdown file to produce an .html file. You will then need to upload the .html file and this .Rmd file to AsULearn.

1. Add your Name and the Date

The first thing you need to do in this file is to add your name and date in the lines underneath this document’s title.

2. Identify and Set Your Working Directory

You need to identify and set your working directory in this section. If you are working in the cloud version of RStudio, enter a note here to tell us that you did not need to change the working directory because you are working in the cloud.

getwd()

## [1] "C:/Users/brcla/Downloads/VisualizingRelationshipsFall2025/VisualizingRelationshipsFall2025"

setwd("C:/Users/brcla/Downloads/VisualizingRelationshipsFall2025/VisualizingRelationshipsFall2025")

3. Installing and Loading Packages and Data Set

You need to install and load the packages and data set you’ll use for the lab assignment in this section. In this lab, we will use the following packages: dplyr, tidyverse, forcats, ggplot2, janitor and openxlsx. We have not used the package janitor in previous labs, so you will need to install it before you can load it.

library("dplyr")

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library("tidyverse")

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.2     ✔ stringr   1.5.1
## ✔ lubridate 1.9.4     ✔ tibble    3.3.0
## ✔ purrr     1.1.0     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

install.packages("janitor")

## Installing package into 'C:/Users/brcla/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)

## package 'janitor' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\brcla\AppData\Local\Temp\RtmpGy7JRY\downloaded_packages

library("openxlsx")
library("forcats")
library("ggplot2")
library("janitor")

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
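Because install.packages() runs again every time the file is knit, one option is to install a package only when it is missing. The following is a minimal sketch and not part of the lab instructions; missing_packages is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages named in this lab.
lab_packages <- c("dplyr", "tidyverse", "forcats", "ggplot2", "janitor", "openxlsx")
to_install <- missing_packages(lab_packages)
print(to_install)

# To install only what is missing, run the next line by hand:
# install.packages(to_install)
```

After installing, the packages are still loaded with library() as shown above.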

VisualizingBivariateData <- read.xlsx("VisualizingRelationshipsData.xlsx")

names(VisualizingBivariateData)

##  [1] "Observation"        "Sex"                "Age"               
##  [4] "Hometown"           "Favorite.Meat"      "Favorite.Sauce"    
##  [7] "Sweetness"          "Favorite.Side"      "Restaurant.City"   
## [10] "Restaurant.Name"    "Minutes.Driving"    "Sandwich.Price"    
## [13] "Dinner.Plate.Price" "Ribs.Price"

4. Scatterplot of Price for Rib Plate and Driving Distance

Create a scatterplot showing the relationship between variables that identify how much respondents are willing to pay for a plate of ribs and how far they are willing to drive for good BBQ. When making this scatterplot, let’s assume you are interested in whether how much respondents are willing to pay for a plate of ribs influences how far they are willing to drive for good BBQ.

ggplot(VisualizingBivariateData,                           
       aes(x=Ribs.Price,  y=Minutes.Driving)) + 
    geom_point()

Change the labels on the x- and y-axes to words instead of using the variable names. Make sure that the labels on both the x- and y-axes are capitalized.

ggplot(VisualizingBivariateData, aes(x=Ribs.Price, y=Minutes.Driving)) +   
  geom_point()   +     
    labs(x="Ribs Price",  y="Minutes")

Change the color of the dots to darkseagreen2.

ggplot(VisualizingBivariateData, aes(x=Ribs.Price, y=Minutes.Driving)) +   
  geom_point(color="darkseagreen2")   +     
    labs(x="Ribs Price",  y="Minutes")

There should be no color or grid lines in the background.

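ggplot(VisualizingBivariateData, aes(x=Ribs.Price, y=Minutes.Driving)) +
  geom_point(color="darkseagreen2") +
  labs(x="Ribs Price", y="Minutes") +
  # Theme elements only take effect inside theme(); blank elements remove the grid and background
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.line = element_line())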

The graph should have a line of best fit in darkorchid4.

ggplot(VisualizingBivariateData,                           
       aes(x=Ribs.Price,  y=Minutes.Driving)) + 
    geom_point(color="darkseagreen2")+
    stat_smooth(method = "lm", formula = y ~ x,  geom = "smooth", se = FALSE, color="darkorchid4")+
  labs(x="Ribs Price",y="Minutes")  +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line())

The graph should have a title.

ggplot(VisualizingBivariateData,                           
       aes(x=Ribs.Price,  y=Minutes.Driving)) + 
    geom_point(color="darkseagreen2")+
    stat_smooth(method = "lm", formula = y ~ x,  geom = "smooth", se = FALSE, color="darkorchid4")+
  labs(x="Ribs Price",y="Minutes")  +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line()) +
  ggtitle("Bivariate Relationship Between Ribs Price and Minutes Driving")

5. Describe Scatterplot of Price for Rib Plate and Driving Distance

Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

The scatter plot shows a slight upward trend, which indicates that as the price of ribs increases, the number of minutes someone is willing to drive also increases, but not drastically. The independent variable is the price of ribs, while the dependent variable is the number of minutes someone is willing to drive to get good BBQ. The relationship in this case is positive.
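The direction of a relationship like this can also be checked numerically. The sketch below uses a small made-up data frame (the values are hypothetical, not the lab data) with base R's cor() and lm(); the sign of the correlation and of the slope matches the direction of the line of best fit.

```r
# Hypothetical toy data standing in for Ribs.Price and Minutes.Driving.
toy <- data.frame(
  Ribs.Price      = c(10, 12, 15, 18, 20, 25),
  Minutes.Driving = c(20, 35, 30, 45, 40, 50)
)

# Pearson correlation between the two variables.
r <- cor(toy$Ribs.Price, toy$Minutes.Driving)

# Slope of the least-squares line, the same kind of line that
# stat_smooth(method = "lm") draws.
fit <- lm(Minutes.Driving ~ Ribs.Price, data = toy)
slope <- unname(coef(fit)["Ribs.Price"])

# Classify by the sign of the slope. An exactly zero slope is rare in
# real data; in practice a slope close to zero is read as null.
direction <- if (slope > 0) "positive" else if (slope < 0) "negative" else "null"

print(c(correlation = r, slope = slope))
print(direction)
```

With the real data, the same two calls would use VisualizingBivariateData in place of toy.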

6. Scatterplot of Driving Distance and Age

Create a scatterplot showing the relationship between how far a respondent is willing to drive for good BBQ and their age. When making this scatterplot, let’s assume you are interested in whether how far someone is willing to drive is a function of their age.

ggplot(VisualizingBivariateData,                           
       aes(x=Minutes.Driving,  y=Age)) + 
    geom_point()

Change the labels on the x- and y-axes to words instead of using the variable names. Make sure that the labels on both the x- and y-axes are capitalized.

ggplot(VisualizingBivariateData, aes(x=Minutes.Driving, y=Age)) +   
  geom_point()   +     
    labs(x="Minutes Driving",  y="Age")

Change the color of the dots to deepskyblue.

ggplot(VisualizingBivariateData, aes(x=Minutes.Driving, y=Age)) +   
  geom_point(color = "deepskyblue")   +     
    labs(x="Minutes Driving",  y="Age")

There should be no color or grid lines in the background.

ggplot(VisualizingBivariateData,                           
       aes(x=Minutes.Driving,  y=Age)) + 
    geom_point(color="deepskyblue")+
  labs(x="Minutes Driving",y="Age")  +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line())

The graph should have a line of best fit in firebrick2.

ggplot(VisualizingBivariateData,                           
       aes(x=Minutes.Driving,  y=Age)) + 
    geom_point(color="deepskyblue")+
  stat_smooth(method = "lm", formula = y ~ x, geom = "smooth", se = FALSE, color = "firebrick2") +
  labs(x="Minutes Driving",y="Age")  +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line())

The graph should have a title.

ggplot(VisualizingBivariateData,                           
       aes(x=Minutes.Driving,  y=Age)) + 
    geom_point(color="deepskyblue")+
  stat_smooth(method = "lm", formula = y ~ x, geom = "smooth", se = FALSE, color = "firebrick2") +
  labs(x="Minutes Driving",y="Age")  +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line()) +
    ggtitle("Bivariate Relationship Between Minutes Driving and Age")

7. Describe Scatterplot of Driving Distance and Age

The scatter plot shows a slight upward trend, which indicates that as the number of minutes people are willing to drive increases, their age also rises, but not drastically. The independent variable is the number of minutes someone is willing to drive, while the dependent variable is that person’s age. The relationship in this case is positive.

8. Creating Dichotomous Variables

You need to create three dichotomous variables based on existing variables in the data set in this section. The first should be named “Prefers.Brisket” and should take on a value of “1” if a respondent identified beef brisket as their preferred type of BBQ meat and a value of “0” if they did not. The second should be named “Prefers.Fries” and should take on a value of “1” if a respondent identified french fries as their preferred side dish and a value of “0” if they did not. The third should be named “Longer.Distances” and should take on a value of “1” if a respondent is willing to drive longer than average for good BBQ and a value of “0” if they are not.

library(dplyr)

colnames(VisualizingBivariateData)

##  [1] "Observation"        "Sex"                "Age"               
##  [4] "Hometown"           "Favorite.Meat"      "Favorite.Sauce"    
##  [7] "Sweetness"          "Favorite.Side"      "Restaurant.City"   
## [10] "Restaurant.Name"    "Minutes.Driving"    "Sandwich.Price"    
## [13] "Dinner.Plate.Price" "Ribs.Price"

VisualizingBivariateData %>%
  mutate(Prefers.Brisket=NA) %>%
  mutate(Prefers.Brisket=replace(Prefers.Brisket, Favorite.Meat==3, 1)) %>%
  mutate(Prefers.Brisket=replace(Prefers.Brisket, Favorite.Meat < 3, 0)) %>%
  mutate(Prefers.Brisket=replace(Prefers.Brisket, Favorite.Meat > 3, 0)) -> VisualizingBivariateData

VisualizingBivariateData %>%
  mutate(Prefers.Fries=NA) %>%
  mutate(Prefers.Fries=replace(Prefers.Fries, Favorite.Side==6, 1)) %>%
  mutate(Prefers.Fries=replace(Prefers.Fries, Favorite.Side < 6, 0)) %>%
  mutate(Prefers.Fries=replace(Prefers.Fries, Favorite.Side > 6, 0)) -> VisualizingBivariateData

mean(VisualizingBivariateData$Minutes.Driving)

## [1] 39.11346

names(VisualizingBivariateData)

##  [1] "Observation"        "Sex"                "Age"               
##  [4] "Hometown"           "Favorite.Meat"      "Favorite.Sauce"    
##  [7] "Sweetness"          "Favorite.Side"      "Restaurant.City"   
## [10] "Restaurant.Name"    "Minutes.Driving"    "Sandwich.Price"    
## [13] "Dinner.Plate.Price" "Ribs.Price"         "Prefers.Brisket"   
## [16] "Prefers.Fries"

library(dplyr)

names(VisualizingBivariateData)

##  [1] "Observation"        "Sex"                "Age"               
##  [4] "Hometown"           "Favorite.Meat"      "Favorite.Sauce"    
##  [7] "Sweetness"          "Favorite.Side"      "Restaurant.City"   
## [10] "Restaurant.Name"    "Minutes.Driving"    "Sandwich.Price"    
## [13] "Dinner.Plate.Price" "Ribs.Price"         "Prefers.Brisket"   
## [16] "Prefers.Fries"

VisualizingBivariateData <- VisualizingBivariateData %>%
  mutate(Longer.Distances = ifelse(Minutes.Driving > 39, 1, 0))

table(VisualizingBivariateData$Longer.Distances)

## 
##   0   1 
## 241 138
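A more compact way to build 0/1 variables is to convert a logical comparison directly, and to compare against the computed mean rather than a typed-in number. This is a minimal sketch on hypothetical toy values (not the lab data); it assumes, as the code above does, that brisket is coded 3 in Favorite.Meat.

```r
# Hypothetical toy data; the codes and minutes are made up for illustration.
toy <- data.frame(
  Favorite.Meat   = c(1, 3, 2, 3, 5),
  Minutes.Driving = c(10, 60, 30, 45, 20)
)

# A comparison returns TRUE/FALSE; as.integer() turns that into 1/0.
# Missing values stay NA.
toy$Prefers.Brisket <- as.integer(toy$Favorite.Meat == 3)

# Compare against the mean itself so the cutoff is never rounded by hand.
cutoff <- mean(toy$Minutes.Driving, na.rm = TRUE)
toy$Longer.Distances <- as.integer(toy$Minutes.Driving > cutoff)

print(cutoff)
print(toy)
```

Using the computed mean matters when the data contain values that fall between a rounded cutoff and the true mean; whether that applies to the lab data is not checked here.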

9. Scatterplot of Price of Dinner Plate and Prefers Brisket

We want to know if those who prefer brisket over other types of meat are willing to pay more for a dinner plate than those who do not prefer brisket. Create a scatter plot between the variable for the price someone is willing to pay for a dinner plate and the dichotomous variable you created indicating if someone prefers brisket.

library(ggplot2)

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) + 
    geom_point()

Your continuous variable should be along the y-axis and the dichotomous variable should be on the x-axis.

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) + 
    geom_point()

Use the “jitter” option to spread out the data points.

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) + 
    geom_point(position = "jitter")
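By default, position = "jitter" moves points randomly in both directions, so the plotted prices shift slightly as well and the picture changes on each knit. One option, sketched here on hypothetical toy data, is position_jitter() with height = 0 and a fixed seed, which spreads the points only horizontally and reproducibly. The width of 0.2 is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical toy data standing in for Prefers.Brisket and Dinner.Plate.Price.
toy <- data.frame(
  Prefers.Brisket    = c(0, 0, 0, 1, 1, 1),
  Dinner.Plate.Price = c(12, 12, 15, 12, 15, 15)
)

p <- ggplot(toy, aes(x = as.factor(Prefers.Brisket), y = Dinner.Plate.Price)) +
  # Jitter horizontally only; the seed makes the layout repeatable.
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1))

print(p)
```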

Change the labels on the x- and y-axes to words instead of using the variable names. Make sure that the labels on both the x- and y-axes are capitalized.

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) + 
    geom_point(position = "jitter") +
  labs(x="Prefers Brisket",y="Dinner Plate Price")

Change the color of the dots to lightsteelblue4.

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) + 
    geom_point(position = "jitter", color="lightsteelblue4") +
  labs(x="Prefers Brisket",y="Dinner Plate Price")

There should be no color or grid lines in the background.

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) + 
    geom_point(position = "jitter", color="lightsteelblue4") +
  labs(x="Prefers Brisket",y="Dinner Plate Price") +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line())

Change the values “0” and “1” on the x-axis to words describing each category.

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) + 
    geom_point(position = "jitter", color="lightsteelblue4") +
  labs(x="Prefers Brisket",y="Dinner Plate Price") +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line())+
  scale_x_discrete(labels = c("0" = "No Brisket", "1" = "Prefers Brisket"))

The graph should have a title.

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) + 
    geom_point(position = "jitter", color="lightsteelblue4") +
  labs(x="Prefers Brisket",y="Dinner Plate Price") +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line()) +
  ggtitle("Bivariate Relationship Between Brisket Preference and Dinner Plate Price") +
  scale_x_discrete(labels = c("0" = "No Brisket", "1" = "Prefers Brisket"))

10. Describe Scatterplot of Price of Dinner Plate and Prefers Brisket

In this scatter plot, the independent variable is “Prefers Brisket” on the x-axis and the dependent variable is “Dinner Plate Price” on the y-axis. The relationship between the two is null: those who prefer brisket are not consistently willing to pay either more or less for a dinner plate than those who do not.
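One way to back up a description like this is to compare the average price in each group. The sketch below uses base R's tapply() on hypothetical toy values (not the lab data); similar group means are consistent with a null relationship, while a clear gap suggests a positive or negative one.

```r
# Hypothetical toy data standing in for Prefers.Brisket and Dinner.Plate.Price.
toy <- data.frame(
  Prefers.Brisket    = c(0, 0, 0, 1, 1, 1),
  Dinner.Plate.Price = c(12, 14, 16, 13, 14, 15)
)

# Mean price within each category of the dichotomous variable.
group_means <- tapply(toy$Dinner.Plate.Price, toy$Prefers.Brisket, mean)

# Difference in means: brisket group minus non-brisket group.
difference <- unname(group_means["1"] - group_means["0"])

print(group_means)
print(difference)
```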

11. Scatterplot of Age and Prefers Fries

We want to know if those who prefer fries are older than those who do not prefer fries. Create a scatter plot between the variable for the respondent’s age and the dichotomous variable you created indicating if someone prefers fries as their favorite side.

Your continuous variable should be along the y-axis and the dichotomous variable should be on the x-axis.

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Fries), y=Age)) + 
    geom_point()

Use the “jitter” option to spread out the data points.

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Fries), y=Age)) + 
    geom_point(position = "jitter")

Change the labels on the x- and y-axes to words instead of using the variable names. Make sure that the labels on both the x- and y-axes are capitalized.

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Fries), y=Age)) + 
    geom_point(position = "jitter") +
  labs(x="Prefers Fries", y="Age")

Change the color of the dots to goldenrod3.

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Fries), y=Age)) + 
    geom_point(position = "jitter", color="goldenrod3") +
  labs(x="Prefers Fries", y="Age")

There should be no color or grid lines in the background.

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Fries), y=Age)) + 
    geom_point(position = "jitter", color="goldenrod3") +
  labs(x="Prefers Fries", y="Age") +
    theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line())

Change the values “0” and “1” on the x-axis to words describing each category.

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Fries), y=Age)) + 
    geom_point(position = "jitter", color="goldenrod3") +
  labs(x="Prefers Fries", y="Age") +
    theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line()) +
scale_x_discrete(labels = c("0" = "No Fries", "1" = "Prefers Fries"))

The graph should have a title.

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Fries), y=Age)) + 
    geom_point(position = "jitter", color="goldenrod3") +
  labs(x="Prefers Fries", y="Age") +
    theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line()) +
  ggtitle("Bivariate Relationship Between Fries Preference and Age") +
  scale_x_discrete(labels = c("0" = "No Fries", "1" = "Prefers Fries"))

12. Describe Scatterplot of Age and Prefers Fries

In this scatter plot, the independent variable is “Prefers Fries” on the x-axis and the dependent variable is “Age” on the y-axis. The relationship between the two variables is negative because younger respondents tend to prefer fries while older respondents do not.

14. Contingency Table of Prefers Brisket and Prefers Fries

We are interested in whether someone who prefers brisket is more or less likely to also prefer fries than someone who does not prefer brisket. Construct a contingency table between the two dichotomous variables that you created indicating if someone prefers brisket and if someone prefers fries.

Have prefers fries as the x variable (on the side) and prefers brisket as the y variable (across the top).

library(janitor)

VisualizingBivariateData %>%      
  tabyl(Prefers.Brisket, Prefers.Fries)%>%  
  adorn_title()

##                  Prefers.Fries   
##  Prefers.Brisket             0  1
##                0           253 55
##                1            60 11

Include the variable names as titles for the top and side of the table.

VisualizingBivariateData %>%      
  tabyl(Prefers.Brisket, Prefers.Fries)%>%  
  adorn_title()

##                  Prefers.Fries   
##  Prefers.Brisket             0  1
##                0           253 55
##                1            60 11

Construct a variable and use the variable to label the columns and rows.

VisualizingBivariateData %>%
  mutate(Prefers.Brisket.Label=NA) %>%
  mutate(Prefers.Brisket.Label=replace(Prefers.Brisket.Label, Prefers.Brisket==1,"Prefers Brisket")) %>%
  mutate(Prefers.Brisket.Label=replace(Prefers.Brisket.Label, Prefers.Brisket==0, "No Brisket")) %>%
  mutate(Prefers.Fries.Label=NA) %>%
  mutate(Prefers.Fries.Label=replace(Prefers.Fries.Label, Prefers.Fries==1, "Prefers Fries")) %>%
  mutate(Prefers.Fries.Label=replace(Prefers.Fries.Label, Prefers.Fries==0, "No Fries")) -> VisualizingBivariateData

VisualizingBivariateData %>%                   
  tabyl(Prefers.Brisket.Label,Prefers.Fries.Label) %>%  
  adorn_title(row_name = "Prefers Brisket", col_name = "Prefers Fries")

##                  Prefers Fries              
##  Prefers Brisket      No Fries Prefers Fries
##       No Brisket           253            55
##  Prefers Brisket            60            11

Calculate the percentages for the columns.

VisualizingBivariateData %>%      
  tabyl(Prefers.Brisket.Label, Prefers.Fries.Label) %>%  
  adorn_percentages("col") %>%
  adorn_title(row_name = "Prefers Brisket", col_name = "Prefers Fries")

##                      Prefers Fries                  
##  Prefers Brisket          No Fries     Prefers Fries
##       No Brisket 0.808306709265176 0.833333333333333
##  Prefers Brisket 0.191693290734824 0.166666666666667

Report the percentages to the 1st decimal place.

VisualizingBivariateData %>%      
  tabyl(Prefers.Brisket.Label, Prefers.Fries.Label) %>%  
  adorn_percentages("col") %>%
    adorn_pct_formatting(digits = 1) %>%
  adorn_ns() %>%
  adorn_title(row_name = "Prefers Brisket", col_name = "Prefers Fries")

##                  Prefers Fries              
##  Prefers Brisket      No Fries Prefers Fries
##       No Brisket   80.8% (253)    83.3% (55)
##  Prefers Brisket   19.2%  (60)    16.7% (11)

Include the number of observations in each cell.

VisualizingBivariateData %>%      
  tabyl(Prefers.Brisket.Label, Prefers.Fries.Label) %>%  
  adorn_totals(c("row", "col")) %>%
  adorn_percentages("col") %>%
    adorn_pct_formatting(digits = 1) %>%
  adorn_ns("front") %>%
  adorn_title(row_name = "Prefers Brisket", col_name = "Prefers Fries")

##                  Prefers Fries                           
##  Prefers Brisket      No Fries Prefers Fries        Total
##       No Brisket  253  (80.8%)   55  (83.3%) 308  (81.3%)
##  Prefers Brisket   60  (19.2%)   11  (16.7%)  71  (18.7%)
##            Total  313 (100.0%)   66 (100.0%) 379 (100.0%)

15. Describe Contingency Table of Prefers Brisket and Prefers Fries

Write a brief description of the relationship between these two variables identified in the contingency table you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

The independent variable is “Prefers Fries” and the dependent variable is “Prefers Brisket”. The relationship between the variables is null.
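The column percentages reported above can be reproduced in base R as a cross-check. This sketch types in the four cell counts from the contingency table and uses prop.table() with margin = 2; it is an alternative to the janitor pipeline, not a replacement for it.

```r
# Cell counts copied from the contingency table above.
counts <- matrix(
  c(253, 60, 55, 11),
  nrow = 2,
  dimnames = list(
    "Prefers Brisket" = c("No Brisket", "Prefers Brisket"),
    "Prefers Fries"   = c("No Fries", "Prefers Fries")
  )
)

# margin = 2 divides each cell by its column total (column percentages).
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
print(col_pct)
```

The share preferring brisket is similar in both columns (19.2% among those who do not prefer fries, 16.7% among those who do), which is why the relationship is described as null.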

Publish Document

Click the “Knit” button to publish your work as an html document. This file will appear in the folder specified by your working directory. You will need to upload both this R Markdown file and the html file it produces to AsULearn to get all of the points associated with this lab.

PS3115 Fall 2025 Visualizing Bivariate Relationships Lab Assignment

(Brett Clayton)

(10/26/2025)