Overview

This file contains a set of tasks that you need to complete in R for the lab assignment. The tasks may require you to add a code chunk, type code into a chunk, and/or execute code. Don’t forget that you need to acknowledge if you used any resources beyond class materials or got help to complete the assignment.

Additional information about this assignment, including examples and code, can be found in the file “VisualizingRelationshipsBetween2Variables.html”.

The data set you will use is different from the one used in the instructions. Pay attention to differences in the Excel file names, variable names, and object names, and adjust your code accordingly.

Once you have completed the assignment, you will need to knit this R Markdown file to produce an .html file. You will then need to upload the .html file and this .Rmd file to AsULearn.

1. Add your Name and the Date

The first thing you need to do in this file is to add your name and date in the lines underneath this document’s title.

2. Identify and Set Your Working Directory

You need to identify and set your working directory in this section. If you are working in the cloud version of RStudio, enter a note here to tell us that you did not need to change the working directory because you are working in the cloud.

getwd()
## [1] "C:/Users/brcla/Downloads/VisualizingRelationshipsFall2025/VisualizingRelationshipsFall2025"
setwd("C:/Users/brcla/Downloads/VisualizingRelationshipsFall2025/VisualizingRelationshipsFall2025")
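Because `read.xlsx()` opens the Excel file by name relative to the working directory, it can help to confirm the file is actually visible there before reading it. A minimal base-R sketch; `data_file_visible()` is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: TRUE if `fname` exists inside `dir`
# (defaults to the current working directory), FALSE otherwise
data_file_visible <- function(fname, dir = getwd()) {
  file.exists(file.path(dir, fname))
}

# Usage in this lab (assumes the working directory set above):
# data_file_visible("VisualizingRelationshipsData.xlsx")
```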

3. Installing and Loading Packages and Data Set

You need to install and load the packages and data set you’ll use for the lab assignment in this section. In this lab, we will use the following packages: dplyr, tidyverse, forcats, ggplot2, janitor and openxlsx. We have not used the package janitor in previous labs, so you will need to install it before you can load it.

library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.2     ✔ stringr   1.5.1
## ✔ lubridate 1.9.4     ✔ tibble    3.3.0
## ✔ purrr     1.1.0     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
install.packages("janitor")
## Installing package into 'C:/Users/brcla/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'janitor' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\brcla\AppData\Local\Temp\RtmpGy7JRY\downloaded_packages
library("openxlsx")
library("forcats")
library("ggplot2")
library("janitor")
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
VisualizingBivariateData <- read.xlsx("VisualizingRelationshipsData.xlsx")
names(VisualizingBivariateData)
##  [1] "Observation"        "Sex"                "Age"               
##  [4] "Hometown"           "Favorite.Meat"      "Favorite.Sauce"    
##  [7] "Sweetness"          "Favorite.Side"      "Restaurant.City"   
## [10] "Restaurant.Name"    "Minutes.Driving"    "Sandwich.Price"    
## [13] "Dinner.Plate.Price" "Ribs.Price"
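Since the variable names in this data set differ from the ones in the instructions, a quick check that every column the later plots rely on is present can catch typos early. A minimal base-R sketch; `check_columns()` is a hypothetical helper, and the test uses a made-up toy data frame:

```r
# Hypothetical helper: stop with an informative message if any
# expected column is missing from a data frame
check_columns <- function(df, required) {
  missing <- setdiff(required, names(df))
  if (length(missing) > 0) {
    stop("Missing columns: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage in this lab:
# check_columns(VisualizingBivariateData,
#               c("Ribs.Price", "Minutes.Driving", "Age",
#                 "Favorite.Meat", "Favorite.Side", "Dinner.Plate.Price"))
```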

4. Scatterplot of Price for Rib Plate and Driving Distance

Create a scatterplot showing the relationship between the variables that identify how much respondents are willing to pay for a plate of ribs and how far they are willing to drive for good BBQ. When making this scatterplot, assume you are interested in whether the amount respondents are willing to pay for a plate of ribs influences how far they are willing to drive for good BBQ.

ggplot(VisualizingBivariateData,                           
       aes(x=Ribs.Price,  y=Minutes.Driving)) + 
    geom_point()   

ggplot(VisualizingBivariateData, aes(x=Ribs.Price, y=Minutes.Driving)) +   
  geom_point()   +     
    labs(x="Ribs Price",  y="Minutes")

ggplot(VisualizingBivariateData, aes(x=Ribs.Price, y=Minutes.Driving)) +   
  geom_point(color="darkseagreen2")   +     
    labs(x="Ribs Price",  y="Minutes")

ggplot(VisualizingBivariateData, aes(x=Ribs.Price, y=Minutes.Driving)) +   
  geom_point(color="darkseagreen2")   +     
    labs(x="Ribs Price",  y="Minutes") + theme_minimal() 

# Add a linear trend line and remove the grid lines and grey panel background
ggplot(VisualizingBivariateData,                           
       aes(x=Ribs.Price,  y=Minutes.Driving)) + 
    geom_point(color="darkseagreen2")+
    stat_smooth(method = "lm", formula = y ~ x,  geom = "smooth", se = FALSE, color="darkorchid4")+
  labs(x="Ribs Price",y="Minutes")  +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line())

ggplot(VisualizingBivariateData,                           
       aes(x=Ribs.Price,  y=Minutes.Driving)) + 
    geom_point(color="darkseagreen2")+
    stat_smooth(method = "lm", formula = y ~ x,  geom = "smooth", se = FALSE, color="darkorchid4")+
  labs(x="Ribs Price",y="Minutes")  +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line()) +
  ggtitle("Bivariate Relationship Between Ribs Price and Minutes Driving")

5. Describe Scatterplot of Price for Rib Plate and Driving Distance

Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

The scatterplot shows a slight upward trend: as the price respondents are willing to pay for a plate of ribs increases, the number of minutes they are willing to drive for good BBQ also tends to increase, though only modestly. The independent variable is the price of a rib plate, and the dependent variable is the number of minutes someone is willing to drive for good BBQ. The relationship is positive.
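The direction of the trend line can also be checked numerically. A minimal base-R sketch, assuming both columns are numeric; with the lab data you would pass `VisualizingBivariateData$Ribs.Price` as `x` and `VisualizingBivariateData$Minutes.Driving` as `y`. `trend_summary()` is a hypothetical helper, and the test values are made up:

```r
# Hypothetical helper: Pearson correlation and least-squares slope of y on x,
# dropping rows where either value is missing
trend_summary <- function(x, y) {
  ok <- complete.cases(x, y)
  fit <- lm(y[ok] ~ x[ok])
  c(correlation = cor(x[ok], y[ok]),
    slope       = unname(coef(fit)[2]))
}

# Usage in this lab:
# trend_summary(VisualizingBivariateData$Ribs.Price,
#               VisualizingBivariateData$Minutes.Driving)
```

A positive correlation and slope correspond to the upward-sloping line drawn by `stat_smooth(method = "lm")`.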

6. Scatterplot of Driving Distance and Age

Create a scatterplot showing the relationship between how far a respondent is willing to drive for good BBQ and their age. When making this scatterplot, assume you are interested in whether how far someone is willing to drive depends on their age.

# Age is the independent variable, so it goes on the x-axis
ggplot(VisualizingBivariateData,                           
       aes(x=Age,  y=Minutes.Driving)) + 
    geom_point() 

ggplot(VisualizingBivariateData, aes(x=Age, y=Minutes.Driving)) +   
  geom_point()   +     
    labs(x="Age",  y="Minutes Driving")

ggplot(VisualizingBivariateData, aes(x=Age, y=Minutes.Driving)) +   
  geom_point(color = "deepskyblue")   +     
    labs(x="Age",  y="Minutes Driving")

ggplot(VisualizingBivariateData,                           
       aes(x=Age,  y=Minutes.Driving)) + 
    geom_point(color="deepskyblue")+
  labs(x="Age",y="Minutes Driving")  +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line())

ggplot(VisualizingBivariateData,                           
       aes(x=Age,  y=Minutes.Driving)) + 
    geom_point(color="deepskyblue")+
  stat_smooth(method = "lm", formula = y ~ x, geom = "smooth", se = FALSE, color = "firebrick2") +
  labs(x="Age",y="Minutes Driving")  +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line())

ggplot(VisualizingBivariateData,                           
       aes(x=Age,  y=Minutes.Driving)) + 
    geom_point(color="deepskyblue")+
  stat_smooth(method = "lm", formula = y ~ x, geom = "smooth", se = FALSE, color = "firebrick2") +
  labs(x="Age",y="Minutes Driving")  +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line()) +
    ggtitle("Bivariate Relationship Between Age and Minutes Driving")

7. Describe Scatterplot of Driving Distance and Age

Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

The scatterplot shows a slight upward trend: older respondents tend to be willing to drive somewhat longer for good BBQ, though the increase is modest. The independent variable is the respondent’s age, and the dependent variable is the number of minutes they are willing to drive for good BBQ. The relationship is positive.
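Whether a slight slope should be called positive or null is partly a judgment call. One common convention, assumed here rather than required by the assignment, is to call the relationship null when a correlation test does not reject zero at the 0.05 level. A sketch using `stats::cor.test()`; `classify_direction()` and the `alpha = 0.05` default are assumptions, and the test data are made up:

```r
# Hypothetical helper: label a relationship "positive", "negative", or "null"
# from Pearson's r and the p-value reported by cor.test()
# (alpha = 0.05 is an assumed cutoff)
classify_direction <- function(x, y, alpha = 0.05) {
  test <- stats::cor.test(x, y)  # incomplete pairs are dropped by default
  if (test$p.value >= alpha) {
    "null"
  } else if (test$estimate > 0) {
    "positive"
  } else {
    "negative"
  }
}

# Usage in this lab:
# classify_direction(VisualizingBivariateData$Age,
#                    VisualizingBivariateData$Minutes.Driving)
```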

8. Creating Dichotomous Variables

You need to create three dichotomous variables based on existing variables in the data set in this section. The first should be named “Prefers.Brisket” and should take on a value of “1” if a respondent identified beef brisket as their preferred type of BBQ meat and a value of “0” if they did not. The second should be named “Prefers.Fries” and should take on a value of “1” if a respondent identified french fries as their preferred side dish and a value of “0” if they did not. The third should be named “Longer.Distances” and should take on a value of “1” if a respondent is willing to drive longer than average for good BBQ and a value of “0” if they are not.

library(dplyr)
colnames(VisualizingBivariateData)
##  [1] "Observation"        "Sex"                "Age"               
##  [4] "Hometown"           "Favorite.Meat"      "Favorite.Sauce"    
##  [7] "Sweetness"          "Favorite.Side"      "Restaurant.City"   
## [10] "Restaurant.Name"    "Minutes.Driving"    "Sandwich.Price"    
## [13] "Dinner.Plate.Price" "Ribs.Price"
# Prefers.Brisket: 1 if Favorite.Meat is code 3 (beef brisket), 0 for any other code
VisualizingBivariateData %>% 
  mutate(Prefers.Brisket=NA) %>%
  mutate(Prefers.Brisket=replace(Prefers.Brisket, Favorite.Meat==3, 1)) %>%
  mutate(Prefers.Brisket=replace(Prefers.Brisket, Favorite.Meat < 3, 0)) %>%
  mutate(Prefers.Brisket=replace(Prefers.Brisket, Favorite.Meat > 3, 0)) -> VisualizingBivariateData
# Prefers.Fries: 1 if Favorite.Side is code 6 (french fries), 0 for any other code
VisualizingBivariateData %>% 
  mutate(Prefers.Fries=NA) %>%
  mutate(Prefers.Fries=replace(Prefers.Fries, Favorite.Side==6, 1)) %>%
  mutate(Prefers.Fries=replace(Prefers.Fries, Favorite.Side < 6, 0)) %>%
  mutate(Prefers.Fries=replace(Prefers.Fries, Favorite.Side > 6, 0)) -> VisualizingBivariateData
mean(VisualizingBivariateData$Minutes.Driving)
## [1] 39.11346
names(VisualizingBivariateData)
##  [1] "Observation"        "Sex"                "Age"               
##  [4] "Hometown"           "Favorite.Meat"      "Favorite.Sauce"    
##  [7] "Sweetness"          "Favorite.Side"      "Restaurant.City"   
## [10] "Restaurant.Name"    "Minutes.Driving"    "Sandwich.Price"    
## [13] "Dinner.Plate.Price" "Ribs.Price"         "Prefers.Brisket"   
## [16] "Prefers.Fries"
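# Longer.Distances: 1 if a respondent will drive longer than the mean, 0 otherwise.
# Comparing against mean() avoids hard-coding the rounded value 39.
VisualizingBivariateData <- VisualizingBivariateData %>%
  mutate(Longer.Distances = ifelse(Minutes.Driving > mean(Minutes.Driving), 1, 0))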
table(VisualizingBivariateData$Longer.Distances)
## 
##   0   1 
## 241 138
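The same three indicators can be built more compactly by coercing a logical comparison to an integer. A minimal base-R sketch; the helper names are hypothetical, and the codes 3 (brisket) and 6 (fries) are taken from the code above:

```r
# Hypothetical helper: 1 if x equals `code`, 0 otherwise; NA stays NA
code_indicator <- function(x, code) as.integer(x == code)

# Hypothetical helper: 1 if x is strictly above its own mean, 0 otherwise
above_mean_indicator <- function(x) as.integer(x > mean(x, na.rm = TRUE))

# Usage in this lab:
# VisualizingBivariateData$Prefers.Brisket  <- code_indicator(VisualizingBivariateData$Favorite.Meat, 3)
# VisualizingBivariateData$Prefers.Fries    <- code_indicator(VisualizingBivariateData$Favorite.Side, 6)
# VisualizingBivariateData$Longer.Distances <- above_mean_indicator(VisualizingBivariateData$Minutes.Driving)
```

These helpers return integer 0/1 values, whereas the `replace()` approach produces doubles; both work with `as.factor()` and `tabyl()`.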

9. Scatterplot of Price of Dinner Plate and Prefers Brisket

We want to know if those who prefer brisket over other types of meat are willing to pay more for a dinner plate than those who do not prefer brisket. Create a scatter plot between the variable for the price someone is willing to pay for a dinner plate and the dichotomous variable you created indicating if someone prefers brisket.

library(ggplot2)
ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) + 
    geom_point()               

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) + 
    geom_point(position = "jitter")

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) + 
    geom_point(position = "jitter") +
  labs(x="Prefers Brisket",y="Dinner Plate Price")

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) + 
    geom_point(position = "jitter", color="lightsteelblue4") +
  labs(x="Prefers Brisket",y="Dinner Plate Price")

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) + 
    geom_point(position = "jitter", color="lightsteelblue4") +
  labs(x="Prefers Brisket",y="Dinner Plate Price") +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line())

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) + 
    geom_point(position = "jitter", color="lightsteelblue4") +
  labs(x="Prefers Brisket",y="Dinner Plate Price") +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line())+
  scale_x_discrete(labels = c("0" = "No Brisket", "1" = "Prefers Brisket"))

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) + 
    geom_point(position = "jitter", color="lightsteelblue4") +
  labs(x="Prefers Brisket",y="Dinner Plate Price") +
  theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line()) +
  ggtitle("Bivariate Relationship Between Brisket Preference and Dinner Plate Price") +
  scale_x_discrete(labels = c("0" = "No Brisket", "1" = "Prefers Brisket"))

10. Describe Scatterplot of Price of Dinner Plate and Prefers Brisket

Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

In this scatterplot, the independent variable is “Prefers Brisket” (x-axis) and the dependent variable is “Dinner Plate Price” (y-axis). The relationship is null: there is no consistent pattern of respondents who prefer brisket being willing to pay more, or less, for a dinner plate than respondents who do not prefer brisket.
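With a dichotomous x-variable, comparing the mean of the dependent variable in each group is a useful companion to the jittered scatterplot: if the two means are close relative to the spread of the points, that supports calling the relationship null. A minimal base-R sketch; `group_means()` is a hypothetical helper, and the test values are made up:

```r
# Hypothetical helper: mean and count of y within each level of a 0/1 group,
# ignoring rows where either value is missing
group_means <- function(y, group) {
  ok <- !is.na(y) & !is.na(group)
  data.frame(
    group = sort(unique(group[ok])),
    mean  = as.vector(tapply(y[ok], group[ok], mean)),
    n     = as.vector(table(group[ok]))
  )
}

# Usage in this lab:
# group_means(VisualizingBivariateData$Dinner.Plate.Price,
#             VisualizingBivariateData$Prefers.Brisket)
```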

11. Scatterplot of Age and Prefers Fries

We want to know if those who prefer fries are older than those who do not prefer fries. Create a scatter plot between the variable for the respondent’s age and the dichotomous variable you created indicating if someone prefers fries as their favorite side.

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Fries), y=Age)) + 
    geom_point()              

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Fries), y=Age)) + 
    geom_point(position = "jitter") 

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Fries), y=Age)) + 
    geom_point(position = "jitter") +
  labs(x="Prefers Fries", y="Age")

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Fries), y=Age)) + 
    geom_point(position = "jitter", color="goldenrod3") +
  labs(x="Prefers Fries", y="Age")

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Fries), y=Age)) + 
    geom_point(position = "jitter", color="goldenrod3") +
  labs(x="Prefers Fries", y="Age") +
    theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line())

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Fries), y=Age)) + 
    geom_point(position = "jitter", color="goldenrod3") +
  labs(x="Prefers Fries", y="Age") +
    theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line()) +
  scale_x_discrete(labels = c("0" = "No Fries", "1" = "Prefers Fries"))

ggplot(VisualizingBivariateData,                      
       aes(x=as.factor(Prefers.Fries), y=Age)) + 
    geom_point(position = "jitter", color="goldenrod3") +
  labs(x="Prefers Fries", y="Age") +
    theme(panel.grid.major = element_blank(),     
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line()) +
  ggtitle("Bivariate Relationship Between Fries Preference and Age") +
  scale_x_discrete(labels = c("0" = "No Fries", "1" = "Prefers Fries"))
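`position = "jitter"` moves points randomly both horizontally and vertically, so jittered points no longer sit exactly at the recorded ages, and the layout changes each time the document is knit. A sketch using ggplot2’s `position_jitter()` with `height = 0` (horizontal jitter only) and a fixed `seed`; the toy data frame is made up and stands in for `VisualizingBivariateData`:

```r
library(ggplot2)

# Made-up toy data standing in for VisualizingBivariateData
toy <- data.frame(
  Prefers.Fries = c(0, 0, 0, 1, 1, 1),
  Age           = c(45, 52, 38, 22, 27, 31)
)

# Jitter only horizontally (height = 0) so Age values stay exact,
# and fix the seed so the layout is the same on every knit
p <- ggplot(toy, aes(x = as.factor(Prefers.Fries), y = Age)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 123),
             color = "goldenrod3") +
  labs(x = "Prefers Fries", y = "Age") +
  scale_x_discrete(labels = c("0" = "No Fries", "1" = "Prefers Fries"))
p
```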

12. Describe Scatterplot of Age and Prefers Fries

Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

In this scatterplot, the independent variable is “Prefers Fries” (x-axis) and the dependent variable is “Age” (y-axis). The relationship is negative: younger respondents tend to prefer fries, while older respondents tend not to.

14. Contingency Table of Prefers Brisket and Prefers Fries

We are interested in whether someone who prefers brisket is more or less likely to also prefer fries than someone who does not prefer brisket. Construct a contingency table between the two dichotomous variables that you created indicating if someone prefers brisket and if someone prefers fries.

library(janitor)
VisualizingBivariateData %>%      
  tabyl(Prefers.Brisket, Prefers.Fries)%>%  
  adorn_title()                  
##                  Prefers.Fries   
##  Prefers.Brisket             0  1
##                0           253 55
##                1            60 11
VisualizingBivariateData %>%
  mutate(Prefers.Brisket.Label=NA) %>%
  mutate(Prefers.Brisket.Label=replace(Prefers.Brisket.Label, Prefers.Brisket==1,"Prefers Brisket")) %>%
  mutate(Prefers.Brisket.Label=replace(Prefers.Brisket.Label, Prefers.Brisket==0, "No Brisket")) %>%
  mutate(Prefers.Fries.Label=NA) %>%
  mutate(Prefers.Fries.Label=replace(Prefers.Fries.Label, Prefers.Fries==1, "Prefers Fries")) %>%
  mutate(Prefers.Fries.Label=replace(Prefers.Fries.Label, Prefers.Fries==0, "No Fries")) -> VisualizingBivariateData
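The NA-then-`replace()` pattern above works; base R’s `factor()` can attach the same labels in one step and also fixes the order in which the categories appear in tables. A minimal sketch; `label_dummy()` is a hypothetical helper:

```r
# Hypothetical helper: turn a 0/1 indicator into a labelled factor,
# with the level order fixed as 0 first, then 1 (NA stays NA)
label_dummy <- function(x, no_label, yes_label) {
  factor(x, levels = c(0, 1), labels = c(no_label, yes_label))
}

# Usage in this lab:
# VisualizingBivariateData$Prefers.Brisket.Label <-
#   label_dummy(VisualizingBivariateData$Prefers.Brisket, "No Brisket", "Prefers Brisket")
# VisualizingBivariateData$Prefers.Fries.Label <-
#   label_dummy(VisualizingBivariateData$Prefers.Fries, "No Fries", "Prefers Fries")
```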
VisualizingBivariateData %>%                   
  tabyl(Prefers.Brisket.Label,Prefers.Fries.Label) %>%  
  adorn_title(row_name = "Prefers Brisket", col_name = "Prefers Fries")  
##                  Prefers Fries              
##  Prefers Brisket      No Fries Prefers Fries
##       No Brisket           253            55
##  Prefers Brisket            60            11
VisualizingBivariateData %>%      
  tabyl(Prefers.Brisket.Label, Prefers.Fries.Label) %>%  
  adorn_percentages("col") %>%
  adorn_title(row_name = "Prefers Brisket", col_name = "Prefers Fries")    
##                      Prefers Fries                  
##  Prefers Brisket          No Fries     Prefers Fries
##       No Brisket 0.808306709265176 0.833333333333333
##  Prefers Brisket 0.191693290734824 0.166666666666667
VisualizingBivariateData %>%      
  tabyl(Prefers.Brisket.Label, Prefers.Fries.Label) %>%  
  adorn_percentages("col") %>%
    adorn_pct_formatting(digits = 1) %>%
  adorn_ns() %>%
  adorn_title(row_name = "Prefers Brisket", col_name = "Prefers Fries") 
##                  Prefers Fries              
##  Prefers Brisket      No Fries Prefers Fries
##       No Brisket   80.8% (253)    83.3% (55)
##  Prefers Brisket   19.2%  (60)    16.7% (11)
VisualizingBivariateData %>%      
  tabyl(Prefers.Brisket.Label, Prefers.Fries.Label) %>%  
  adorn_totals(c("row", "col")) %>%
  adorn_percentages("col") %>%
    adorn_pct_formatting(digits = 1) %>%
  adorn_ns("front") %>%
  adorn_title(row_name = "Prefers Brisket", col_name = "Prefers Fries") 
##                  Prefers Fries                           
##  Prefers Brisket      No Fries Prefers Fries        Total
##       No Brisket  253  (80.8%)   55  (83.3%) 308  (81.3%)
##  Prefers Brisket   60  (19.2%)   11  (16.7%)  71  (18.7%)
##            Total  313 (100.0%)   66 (100.0%) 379 (100.0%)
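Because the question treats brisket preference as the independent variable, the most direct comparison uses percentages computed within each brisket group, which are row percentages here since brisket is on the rows (in janitor, `adorn_percentages("row")`). A base-R sketch using the counts reported in the table above:

```r
# Counts copied from the contingency table above
# (rows: brisket preference, columns: fries preference)
counts <- matrix(c(253, 55,
                   60, 11),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Prefers.Brisket = c("No Brisket", "Prefers Brisket"),
                                 Prefers.Fries   = c("No Fries", "Prefers Fries")))

# Percentages within each row, i.e., within each brisket group
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
row_pct
```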

15. Describe Contingency Table of Prefers Brisket and Prefers Fries

Write a brief description of the relationship between these two variables identified in the contingency table you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

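The independent variable is “Prefers Brisket” and the dependent variable is “Prefers Fries,” since the question asks whether preferring brisket changes the likelihood of also preferring fries. The relationship is null: among respondents who prefer brisket, 15.5% (11 of 71) prefer fries, compared with 17.9% (55 of 308) of respondents who do not prefer brisket, a difference of only about 2.4 percentage points.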
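To check whether the difference between the brisket groups in the table above could plausibly arise by chance, a chi-square test of independence on the same counts is one option. janitor masks `stats::chisq.test` (see the package-loading output above), so this sketch calls the stats version explicitly; reading a p-value above 0.05 as support for “null” is a conventional assumption, not part of the assignment:

```r
# Same 2 x 2 counts as the contingency table above
counts <- matrix(c(253, 55, 60, 11), nrow = 2, byrow = TRUE,
                 dimnames = list(c("No Brisket", "Prefers Brisket"),
                                 c("No Fries", "Prefers Fries")))

# Pearson chi-square test of independence; for a 2 x 2 table,
# Yates' continuity correction is applied by default
result <- stats::chisq.test(counts)
result$p.value
```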

Publish Document

Click the “Knit” button to publish your work as an .html document. The .html file will be saved in the same folder as this .Rmd file. You will need to upload both this R Markdown file and the .html file it produces to AsULearn to get all of the points associated with this lab.