This file contains a set of tasks that you need to complete in R for the lab assignment. The tasks may require you to add a code chunk, type code into a chunk, and/or execute code. Don’t forget that you need to acknowledge if you used any resources beyond class materials or got help to complete the assignment.
Additional information, including examples and code, about this assignment can be found in the file “VisualizingRelationshipsBetween2Variables.html”.
The data set you will use is different from the one used in the instructions. Pay attention to the differences in the Excel files’ names, any variable names, or object names. You will need to adjust your code accordingly.
Once you have completed the assignment, you will need to knit this R Markdown file to produce an .html file. You will then need to upload the .html file and this .Rmd file to AsULearn.
The first thing you need to do in this file is to add your name and date in the lines underneath this document’s title.
You need to identify and set your working directory in this section. If you are working in the cloud version of RStudio, enter a note here to tell us that you did not need to change the working directory because you are working in the cloud.
getwd()
## [1] "C:/Users/brcla/Downloads/VisualizingRelationshipsFall2025/VisualizingRelationshipsFall2025"
setwd("C:/Users/brcla/Downloads/VisualizingRelationshipsFall2025/VisualizingRelationshipsFall2025")
You need to install and load the packages and data set you’ll use for the lab assignment in this section. In this lab, we will use the following packages: dplyr, tidyverse, forcats, ggplot2, janitor and openxlsx. We have not used the package janitor in previous labs, so you will need to install it before you can load it.
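Because the .Rmd is re-run from top to bottom every time it is knitted, it can help to check which packages are actually missing before installing anything. The sketch below is one way to do that in base R; `missing_packages()` is a hypothetical helper name, not part of any package, and it relies only on `requireNamespace()`, which returns `FALSE` (rather than failing) when a package is not installed.

```r
# Sketch: report which of the lab's packages still need to be installed.
# missing_packages() is a hypothetical helper, not a function from any package.
lab_packages <- c("dplyr", "tidyverse", "forcats", "ggplot2", "janitor", "openxlsx")

missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_packages(lab_packages)
# install.packages(missing_packages(lab_packages))  # run once in the console, not in the knitted file
```

If the result is `character(0)`, everything is already installed and only the `library()` calls are needed.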
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ ggplot2 3.5.2 ✔ stringr 1.5.1
## ✔ lubridate 1.9.4 ✔ tibble 3.3.0
## ✔ purrr 1.1.0 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
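if (!requireNamespace("janitor", quietly = TRUE)) install.packages("janitor")  # install only if missing, so knitting does not reinstall it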
## Installing package into 'C:/Users/brcla/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'janitor' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\brcla\AppData\Local\Temp\RtmpGy7JRY\downloaded_packages
library("openxlsx")
library("forcats")
library("ggplot2")
library("janitor")
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
VisualizingBivariateData <- read.xlsx("VisualizingRelationshipsData.xlsx")
names(VisualizingBivariateData)
## [1] "Observation" "Sex" "Age"
## [4] "Hometown" "Favorite.Meat" "Favorite.Sauce"
## [7] "Sweetness" "Favorite.Side" "Restaurant.City"
## [10] "Restaurant.Name" "Minutes.Driving" "Sandwich.Price"
## [13] "Dinner.Plate.Price" "Ribs.Price"
Create a scatterplot showing the relationship between variables that identify how much respondents are willing to pay for a plate of ribs and how far they are willing to drive for good BBQ. When making this scatterplot, let’s assume you are interested in whether how much respondents are willing to pay for a plate of ribs influences how far they are willing to drive for good BBQ.
ggplot(VisualizingBivariateData,
aes(x=Ribs.Price, y=Minutes.Driving)) +
geom_point()
ggplot(VisualizingBivariateData, aes(x=Ribs.Price, y=Minutes.Driving)) +
geom_point() +
labs(x="Ribs Price", y="Minutes")
ggplot(VisualizingBivariateData, aes(x=Ribs.Price, y=Minutes.Driving)) +
geom_point(color="darkseagreen2") +
labs(x="Ribs Price", y="Minutes")
ggplot(VisualizingBivariateData, aes(x=Ribs.Price, y=Minutes.Driving)) +
geom_point(color="darkseagreen2") +
labs(x="Ribs Price", y="Minutes") +
theme_minimal() +
# theme settings only take effect inside theme(), not as standalone lines
theme(panel.grid = element_blank(),
axis.line = element_line())
ggplot(VisualizingBivariateData,
aes(x=Ribs.Price, y=Minutes.Driving)) +
geom_point(color="darkseagreen2")+
stat_smooth(method = "lm", formula = y ~ x, geom = "smooth", se = FALSE, color="darkorchid4")+
labs(x="Ribs Price",y="Minutes") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line())
ggplot(VisualizingBivariateData,
aes(x=Ribs.Price, y=Minutes.Driving)) +
geom_point(color="darkseagreen2")+
stat_smooth(method = "lm", formula = y ~ x, geom = "smooth", se = FALSE, color="darkorchid4")+
labs(x="Ribs Price",y="Minutes") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line()) +
ggtitle("Bivariate Relationship Between Ribs Price and Minutes Driving")
Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.
The scatterplot shows a slight upward trend, which indicates that as the price someone is willing to pay for a plate of ribs increases, the number of minutes they are willing to drive for good BBQ also increases, though not by much. The independent variable is the price of ribs and the dependent variable is the number of minutes someone is willing to drive for good BBQ. The relationship is therefore positive.
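The direction read off the scatterplot can also be checked numerically. The sketch below fits a least-squares line with the dependent variable as y and reports the slope together with the Pearson correlation; a positive slope and a positive r support calling the relationship positive. `describe_relationship()` is a hypothetical helper, and the toy vectors are made-up values, not the survey data.

```r
# Sketch: summarize the direction of a linear relationship numerically.
# describe_relationship() is a hypothetical helper; the toy data are made up.
describe_relationship <- function(x, y) {
  ok  <- complete.cases(x, y)          # drop pairs with a missing value
  fit <- lm(y[ok] ~ x[ok])             # y is the dependent variable
  c(slope = unname(coef(fit)[2]),      # change in y per one-unit change in x
    r     = cor(x[ok], y[ok]))         # Pearson correlation
}

toy_price   <- c(15, 18, 20, 22, 25, 30)
toy_minutes <- c(20, 25, 22, 35, 30, 40)
res <- describe_relationship(toy_price, toy_minutes)
res

# With the lab data (in your own chunk):
# describe_relationship(VisualizingBivariateData$Ribs.Price,
#                       VisualizingBivariateData$Minutes.Driving)
```

How close to zero a slope must be to count as "null" is a judgment call; `cor.test()` adds a formal significance test if one is needed.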
Create a scatterplot showing the relationship between how far a respondent is willing to drive for good BBQ and their age. When making this scatterplot, let’s assume you are interested in whether how far someone is willing to drive is a function of their age.
ggplot(VisualizingBivariateData,
aes(x=Age, y=Minutes.Driving)) +
geom_point()
ggplot(VisualizingBivariateData, aes(x=Age, y=Minutes.Driving)) +
geom_point() +
labs(x="Age", y="Minutes Driving")
ggplot(VisualizingBivariateData, aes(x=Age, y=Minutes.Driving)) +
geom_point(color = "deepskyblue") +
labs(x="Age", y="Minutes Driving")
ggplot(VisualizingBivariateData,
aes(x=Age, y=Minutes.Driving)) +
geom_point(color="deepskyblue")+
labs(x="Age",y="Minutes Driving") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line())
ggplot(VisualizingBivariateData,
aes(x=Age, y=Minutes.Driving)) +
geom_point(color="deepskyblue")+
stat_smooth(method = "lm", formula = y ~ x, geom = "smooth", se = FALSE, color = "firebrick2") +
labs(x="Age",y="Minutes Driving") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line())
ggplot(VisualizingBivariateData,
aes(x=Age, y=Minutes.Driving)) +
geom_point(color="deepskyblue")+
stat_smooth(method = "lm", formula = y ~ x, geom = "smooth", se = FALSE, color = "firebrick2") +
labs(x="Age",y="Minutes Driving") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line()) +
ggtitle("Bivariate Relationship Between Age and Minutes Driving")
Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.
The scatterplot shows a slight upward trend, which indicates that as respondents’ age increases, the number of minutes they are willing to drive for good BBQ also increases, though not by much. Because the question asks whether driving distance is a function of age, the independent variable is age (on the x-axis) and the dependent variable is the number of minutes someone is willing to drive (on the y-axis). The relationship is positive.
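Choosing which variable is independent decides what goes on the x-axis and which line `stat_smooth()` fits, but it cannot flip the direction of the relationship. The sketch below illustrates this with made-up toy ages and driving times (hypothetical values, not the survey data): the two regression slopes have the same sign because both share the sign of the covariance, and their product equals r squared.

```r
# Sketch: swapping the dependent and independent variables changes the slope's
# size but not its sign. Toy data only (hypothetical ages and driving minutes).
toy_age     <- c(19, 22, 25, 31, 40, 52, 60)
toy_minutes <- c(15, 30, 20, 35, 25, 45, 40)

slope_minutes_on_age <- unname(coef(lm(toy_minutes ~ toy_age))[2])  # age as independent variable
slope_age_on_minutes <- unname(coef(lm(toy_age ~ toy_minutes))[2])  # minutes as independent variable

# Same sign in both directions
sign(slope_minutes_on_age) == sign(slope_age_on_minutes)

# The product of the two slopes equals the squared correlation
all.equal(slope_minutes_on_age * slope_age_on_minutes,
          cor(toy_age, toy_minutes)^2)
```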
You need to create three dichotomous variables based on existing variables in the data set in this section. The first should be named “Prefers.Brisket” and should take on a value of “1” if a respondent identified beef brisket as their preferred type of BBQ meat and a value of “0” if they did not. The second should be named “Prefers.Fries” and should take on a value of “1” if a respondent identified french fries as their preferred side dish and a value of “0” if they did not. The third should be named “Longer.Distances” and should take on a value of “1” if a respondent is willing to drive longer than average for good BBQ and a value of “0” if they are not.
library(dplyr)
colnames(VisualizingBivariateData)
## [1] "Observation" "Sex" "Age"
## [4] "Hometown" "Favorite.Meat" "Favorite.Sauce"
## [7] "Sweetness" "Favorite.Side" "Restaurant.City"
## [10] "Restaurant.Name" "Minutes.Driving" "Sandwich.Price"
## [13] "Dinner.Plate.Price" "Ribs.Price"
# Code 3 in Favorite.Meat is treated as beef brisket; every other code becomes 0
VisualizingBivariateData %>%
mutate(Prefers.Brisket=NA) %>%
mutate(Prefers.Brisket=replace(Prefers.Brisket, Favorite.Meat==3, 1)) %>%
mutate(Prefers.Brisket=replace(Prefers.Brisket, Favorite.Meat < 3, 0)) %>%
mutate(Prefers.Brisket=replace(Prefers.Brisket, Favorite.Meat > 3, 0)) -> VisualizingBivariateData
# Code 6 in Favorite.Side is treated as french fries; every other code becomes 0
VisualizingBivariateData %>%
mutate(Prefers.Fries=NA) %>%
mutate(Prefers.Fries=replace(Prefers.Fries, Favorite.Side==6, 1)) %>%
mutate(Prefers.Fries=replace(Prefers.Fries, Favorite.Side < 6, 0)) %>%
mutate(Prefers.Fries=replace(Prefers.Fries, Favorite.Side > 6, 0)) -> VisualizingBivariateData
mean(VisualizingBivariateData$Minutes.Driving)
## [1] 39.11346
names(VisualizingBivariateData)
## [1] "Observation" "Sex" "Age"
## [4] "Hometown" "Favorite.Meat" "Favorite.Sauce"
## [7] "Sweetness" "Favorite.Side" "Restaurant.City"
## [10] "Restaurant.Name" "Minutes.Driving" "Sandwich.Price"
## [13] "Dinner.Plate.Price" "Ribs.Price" "Prefers.Brisket"
## [16] "Prefers.Fries"
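# Compare each respondent to the sample mean (about 39.1 minutes) instead of a hard-coded 39
VisualizingBivariateData <- VisualizingBivariateData %>%
mutate(Longer.Distances = ifelse(Minutes.Driving > mean(Minutes.Driving, na.rm = TRUE), 1, 0))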
table(VisualizingBivariateData$Longer.Distances)
##
## 0 1
## 241 138
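The same three indicators can be built in a few lines of base R, which can be handy for checking the dplyr version. This is a sketch under two assumptions taken from the code in this section: Favorite.Meat code 3 means beef brisket and Favorite.Side code 6 means french fries. `make_indicators()` is a hypothetical helper and `toy_df` is made-up data. A missing meat or side code stays `NA`, just as it does with the `replace()` approach.

```r
# Sketch: build the three 0/1 indicators in base R.
# Assumptions: Favorite.Meat == 3 is beef brisket and Favorite.Side == 6 is
# french fries (as in the dplyr code above). toy_df is made-up data.
make_indicators <- function(df) {
  cutoff <- mean(df$Minutes.Driving, na.rm = TRUE)  # sample mean as the "average" cutoff
  df$Prefers.Brisket  <- as.integer(df$Favorite.Meat == 3)
  df$Prefers.Fries    <- as.integer(df$Favorite.Side == 6)
  df$Longer.Distances <- as.integer(df$Minutes.Driving > cutoff)
  df
}

toy_df <- data.frame(Favorite.Meat   = c(1, 3, 3, 2, NA),
                     Favorite.Side   = c(6, 1, 6, 2, 6),
                     Minutes.Driving = c(10, 60, 45, 20, 30))
out <- make_indicators(toy_df)
out
```

Here the toy mean is 33 minutes, so only the 45- and 60-minute rows get `Longer.Distances = 1`. The indicators come out as integers rather than doubles, which `as.factor()` and `tabyl()` treat the same way.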
We want to know if those who prefer brisket over other types of meat are willing to pay more for a dinner plate than those who do not prefer brisket. Create a scatter plot between the variable for the price someone is willing to pay for a dinner plate and the dichotomous variable you created indicating if someone prefers brisket.
library(ggplot2)
ggplot(VisualizingBivariateData,
aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) +
geom_point()
ggplot(VisualizingBivariateData,
aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) +
geom_point(position = "jitter")
ggplot(VisualizingBivariateData,
aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) +
geom_point(position = "jitter") +
labs(x="Prefers Brisket",y="Dinner Plate Price")
ggplot(VisualizingBivariateData,
aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) +
geom_point(position = "jitter", color="lightsteelblue4") +
labs(x="Prefers Brisket",y="Dinner Plate Price")
ggplot(VisualizingBivariateData,
aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) +
geom_point(position = "jitter", color="lightsteelblue4") +
labs(x="Prefers Brisket",y="Dinner Plate Price") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line())
ggplot(VisualizingBivariateData,
aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) +
geom_point(position = "jitter", color="lightsteelblue4") +
labs(x="Prefers Brisket",y="Dinner Plate Price") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line())+
scale_x_discrete(labels = c("0" = "No Brisket", "1" = "Prefers Brisket"))
ggplot(VisualizingBivariateData,
aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) +
geom_point(position = "jitter", color="lightsteelblue4") +
labs(x="Prefers Brisket",y="Dinner Plate Price") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line()) +
ggtitle("Bivariate Relationship Between Brisket Preference and Dinner Plate Price") +
scale_x_discrete(labels = c("0" = "No Brisket", "1" = "Prefers Brisket"))
Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.
In this scatterplot, the independent variable is “Prefers Brisket” (on the x-axis) and the dependent variable is “Dinner Plate Price” (on the y-axis). The relationship between the two is null: respondents who prefer brisket are not consistently willing to pay more, or less, for a dinner plate than respondents who do not prefer brisket.
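A "null" reading of a jittered plot can be backed up by comparing the average of the dependent variable in each group. The sketch below does that with `tapply()`; `group_means()` is a hypothetical helper and the toy prices are made-up values, not the survey data. The commented lines show how it might be pointed at the lab data, along with `t.test()` as one possible formal check.

```r
# Sketch: average of a numeric dependent variable within each 0/1 group.
# group_means() is a hypothetical helper; the toy vectors are made-up values.
group_means <- function(value, group) {
  tapply(value, group, mean, na.rm = TRUE)  # one mean per group level ("0", "1")
}

toy_plate_price <- c(18, 22, 20, 25, 19, 21)
toy_brisket     <- c(0, 0, 0, 1, 1, 1)

means <- group_means(toy_plate_price, toy_brisket)
diff_in_means <- unname(means["1"] - means["0"])  # brisket group minus everyone else
means

# With the lab data, plus a formal two-sample test:
# group_means(VisualizingBivariateData$Dinner.Plate.Price,
#             VisualizingBivariateData$Prefers.Brisket)
# t.test(Dinner.Plate.Price ~ Prefers.Brisket, data = VisualizingBivariateData)
```

A difference in means that is small relative to the spread within each group is consistent with a null relationship.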
We want to know if those who prefer fries are older than those who do not prefer fries. Create a scatter plot between the variable for the respondent’s age and the dichotomous variable you created indicating if someone prefers fries as their favorite side.
ggplot(VisualizingBivariateData,
aes(x=as.factor(Prefers.Fries), y=Age)) +
geom_point()
ggplot(VisualizingBivariateData,
aes(x=as.factor(Prefers.Fries), y=Age)) +
geom_point(position = "jitter")
ggplot(VisualizingBivariateData,
aes(x=as.factor(Prefers.Fries), y=Age)) +
geom_point(position = "jitter") +
labs(x="Prefers Fries", y="Age")
ggplot(VisualizingBivariateData,
aes(x=as.factor(Prefers.Fries), y=Age)) +
geom_point(position = "jitter", color="goldenrod3") +
labs(x="Prefers Fries", y="Age")
ggplot(VisualizingBivariateData,
aes(x=as.factor(Prefers.Fries), y=Age)) +
geom_point(position = "jitter", color="goldenrod3") +
labs(x="Prefers Fries", y="Age") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line())
ggplot(VisualizingBivariateData,
aes(x=as.factor(Prefers.Fries), y=Age)) +
geom_point(position = "jitter", color="goldenrod3") +
labs(x="Prefers Fries", y="Age") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line()) +
scale_x_discrete(labels = c("0" = "No Fries", "1" = "Prefers Fries"))
ggplot(VisualizingBivariateData,
aes(x=as.factor(Prefers.Fries), y=Age)) +
geom_point(position = "jitter", color="goldenrod3") +
labs(x="Prefers Fries", y="Age") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line()) +
ggtitle("Bivariate Relationship Between Fries Preference and Age") +
scale_x_discrete(labels = c("0" = "No Fries", "1" = "Prefers Fries"))
Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.
In this scatterplot, the independent variable is “Prefers Fries” (on the x-axis) and the dependent variable is “Age” (on the y-axis). The relationship between the two variables is negative: respondents who prefer fries tend to be younger, while older respondents tend not to prefer fries, so moving from 0 (does not prefer fries) to 1 (prefers fries) is associated with a lower age.
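Coding the group as 0/1 is what makes "positive" or "negative" meaningful for a dichotomous independent variable. With a 0/1 variable, the sign of the Pearson correlation (the point-biserial correlation), the sign of the difference in group means, and the slope of a regression line all agree, and the slope equals that difference in means. The sketch below shows this with made-up toy ages, not the survey data.

```r
# Sketch: with a 0/1 independent variable, the correlation, the difference in
# group means, and the regression slope all point the same way.
# Toy data only: these ages and fries preferences are made up.
toy_fries <- c(1, 1, 1, 0, 0, 0, 0)
toy_age   <- c(20, 23, 27, 30, 45, 38, 51)

r_pb     <- cor(toy_fries, toy_age)                                    # point-biserial r
mean_gap <- mean(toy_age[toy_fries == 1]) - mean(toy_age[toy_fries == 0])
slope    <- unname(coef(lm(toy_age ~ toy_fries))[2])                   # change in age from 0 to 1

c(r = r_pb, mean_gap = mean_gap, slope = slope)
```

In this toy example the fries group averages about 17.7 years younger, so all three numbers are negative.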
We are interested in whether someone who prefers brisket is more or less likely to also prefer fries than someone who does not prefer brisket. Construct a contingency table between the two dichotomous variables that you created indicating if someone prefers brisket and if someone prefers fries.
library(janitor)
VisualizingBivariateData %>%
tabyl(Prefers.Brisket, Prefers.Fries)%>%
adorn_title()
## Prefers.Fries
## Prefers.Brisket 0 1
## 0 253 55
## 1 60 11
VisualizingBivariateData %>%
mutate(Prefers.Brisket.Label=NA) %>%
mutate(Prefers.Brisket.Label=replace(Prefers.Brisket.Label, Prefers.Brisket==1,"Prefers Brisket")) %>%
mutate(Prefers.Brisket.Label=replace(Prefers.Brisket.Label, Prefers.Brisket==0, "No Brisket")) %>%
mutate(Prefers.Fries.Label=NA) %>%
mutate(Prefers.Fries.Label=replace(Prefers.Fries.Label, Prefers.Fries==1, "Prefers Fries")) %>%
mutate(Prefers.Fries.Label=replace(Prefers.Fries.Label, Prefers.Fries==0, "No Fries")) -> VisualizingBivariateData
VisualizingBivariateData %>%
tabyl(Prefers.Brisket.Label,Prefers.Fries.Label) %>%
adorn_title(row_name = "Prefers Brisket", col_name = "Prefers Fries")
## Prefers Fries
## Prefers Brisket No Fries Prefers Fries
## No Brisket 253 55
## Prefers Brisket 60 11
VisualizingBivariateData %>%
tabyl(Prefers.Brisket.Label, Prefers.Fries.Label) %>%
adorn_percentages("col") %>%
adorn_title(row_name = "Prefers Brisket", col_name = "Prefers Fries")
## Prefers Fries
## Prefers Brisket No Fries Prefers Fries
## No Brisket 0.808306709265176 0.833333333333333
## Prefers Brisket 0.191693290734824 0.166666666666667
VisualizingBivariateData %>%
tabyl(Prefers.Brisket.Label, Prefers.Fries.Label) %>%
adorn_percentages("col") %>%
adorn_pct_formatting(digits = 1) %>%
adorn_ns() %>%
adorn_title(row_name = "Prefers Brisket", col_name = "Prefers Fries")
## Prefers Fries
## Prefers Brisket No Fries Prefers Fries
## No Brisket 80.8% (253) 83.3% (55)
## Prefers Brisket 19.2% (60) 16.7% (11)
VisualizingBivariateData %>%
tabyl(Prefers.Brisket.Label, Prefers.Fries.Label) %>%
adorn_totals(c("row", "col")) %>%
adorn_percentages("col") %>%
adorn_pct_formatting(digits = 1) %>%
adorn_ns("front") %>%
adorn_title(row_name = "Prefers Brisket", col_name = "Prefers Fries")
## Prefers Fries
## Prefers Brisket No Fries Prefers Fries Total
## No Brisket 253 (80.8%) 55 (83.3%) 308 (81.3%)
## Prefers Brisket 60 (19.2%) 11 (16.7%) 71 (18.7%)
## Total 313 (100.0%) 66 (100.0%) 379 (100.0%)
Write a brief description of the relationship between these two variables identified in the contingency table you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.
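The independent variable is “Prefers Brisket” and the dependent variable is “Prefers Fries”, because the question asks whether preferring brisket changes how likely someone is to also prefer fries. The relationship between the two variables is null: 11 of the 71 respondents who prefer brisket (about 15.5%) also prefer fries, compared with 55 of the 308 respondents who do not prefer brisket (about 17.9%), a difference too small to indicate a meaningful relationship.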
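A common convention is to put the independent variable in the columns and take column percentages, so each column answers "what share of this group prefers fries?". The sketch below does this in base R with `prop.table()`, using the four counts from the tabyl output in this section. Swapping the order of the two variables in the `tabyl()` call would give the same orientation with janitor.

```r
# Sketch: independent variable (brisket preference) in the columns, column percentages.
# The four counts are copied from the tabyl output in this section.
counts <- matrix(c(253, 55,   # No Brisket column: No Fries, Prefers Fries
                   60, 11),   # Prefers Brisket column: No Fries, Prefers Fries
                 nrow = 2,
                 dimnames = list(Fries   = c("No Fries", "Prefers Fries"),
                                 Brisket = c("No Brisket", "Prefers Brisket")))

col_pct <- round(100 * prop.table(counts, margin = 2), 1)  # each column sums to 100
col_pct

# With the lab data the same counts could be tabulated directly:
# table(Fries = VisualizingBivariateData$Prefers.Fries,
#       Brisket = VisualizingBivariateData$Prefers.Brisket)
```

The "Prefers Fries" row reads 17.9 for the no-brisket column and 15.5 for the brisket column, which is the small gap behind the null reading.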
Click the “Knit” button to publish your work as an html document. The html file will appear in the same folder as this R Markdown file. You will need to upload both this R Markdown file and the html file it produces to AsULearn to get all of the points associated with this lab.