This file contains a set of tasks that you need to complete in R for the lab assignment. The tasks may require you to add a code chunk, type code into a chunk, and/or execute code. Don’t forget to acknowledge any resources beyond class materials that you used, or any help you received, to complete the assignment.
Additional information about this assignment, including examples and code, can be found in the file “VisualizingRelationshipsBetween2Variables.html”.
The data set you will use is different from the one used in the instructions. Pay attention to the differences in the Excel files’ names, variable names, and object names. You will need to adjust your code accordingly.
Once you have completed the assignment, you will need to knit this R Markdown file to produce an .html file. You will then need to upload the .html file and this .Rmd file to AsULearn.
The first thing you need to do in this file is to add your name and date in the lines underneath this document’s title.
You need to identify and set your working directory in this section. If you are working in the cloud version of RStudio, enter a note here to tell us that you did not need to change the working directory because you are working in the cloud.
getwd()
## [1] "/Users/rlmcollins/Desktop"
setwd("/Users/rlmcollins/Desktop")
You need to install and load the packages and data set you’ll use for the lab assignment in this section. In this lab, we will use the following packages: dplyr, tidyverse, forcats, ggplot2, janitor and openxlsx. We have not used the package janitor in previous labs, so you will need to install it before you can load it.
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ ggplot2 3.5.2 ✔ stringr 1.5.1
## ✔ lubridate 1.9.4 ✔ tibble 3.3.0
## ✔ purrr 1.1.0 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("openxlsx")
library("forcats")
library("ggplot2")
install.packages("janitor")
##
## The downloaded binary packages are in
## /var/folders/wg/xl6pvy053zx8fzvllrfstzmc0000gn/T//RtmpxAD56Y/downloaded_packages
library("janitor")
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
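Because `install.packages()` runs again every time the file is knit, one option is to install only the packages that are not yet available. The lines below are a minimal sketch of that idea, not part of the lab instructions: `missing_packages()` is a hypothetical helper name, and the install call is left commented out so nothing is downloaded unless you choose to run it.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1),
                      quietly = TRUE, USE.NAMES = FALSE)
  pkgs[!installed]
}

# Packages named in this lab.
lab_packages <- c("dplyr", "tidyverse", "forcats", "ggplot2", "janitor", "openxlsx")
to_install <- missing_packages(lab_packages)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

If `to_install` is empty, every package is already installed and the `library()` calls can run as they are.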
VisualizeBivariateData <- read.xlsx("VisualizingRelationshipsData.xlsx")
names(VisualizeBivariateData)
## [1] "Observation" "Sex" "Age"
## [4] "Hometown" "Favorite.Meat" "Favorite.Sauce"
## [7] "Sweetness" "Favorite.Side" "Restaurant.City"
## [10] "Restaurant.Name" "Minutes.Driving" "Sandwich.Price"
## [13] "Dinner.Plate.Price" "Ribs.Price"
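Since the variable names in this data set differ from those in the instructions, it can help to confirm that the columns a plot needs are present before plotting. The following is a small sketch under that assumption: `check_columns()` is a hypothetical helper, and the toy data frame only stands in for VisualizeBivariateData.

```r
# Hypothetical helper: stop with a readable message if an expected column is absent.
check_columns <- function(data, expected) {
  absent <- setdiff(expected, names(data))
  if (length(absent) > 0) {
    stop("Missing columns: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy data frame standing in for VisualizeBivariateData (values are made up).
toy <- data.frame(Ribs.Price = c(12, 18), Minutes.Driving = c(20, 45))
check_columns(toy, c("Ribs.Price", "Minutes.Driving"))
```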
Create a scatterplot showing the relationship between variables that identify how much respondents are willing to pay for a plate of ribs and how far they are willing to drive for good BBQ. When making this scatterplot, let’s assume you are interested in whether how much respondents are willing to pay for a plate of ribs influences how far they are willing to drive for good BBQ.
ggplot(VisualizeBivariateData, aes(x=Ribs.Price, y=Minutes.Driving)) +
geom_point()
ggplot(VisualizeBivariateData, aes(x=Ribs.Price, y=Minutes.Driving)) +
geom_point() + labs (x="Ribs Price", y="Minutes Driving")
darkseagreen2.
ggplot(VisualizeBivariateData, aes(x=Ribs.Price, y=Minutes.Driving)) +
geom_point(color="darkseagreen2") + labs (x="Ribs Price", y="Minutes Driving")
ggplot(VisualizeBivariateData, aes(x=Ribs.Price, y=Minutes.Driving)) +
geom_point(color="darkseagreen2") + labs (x="Ribs Price", y="Minutes Driving") + theme (panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line())
darkorchid4.
ggplot(VisualizeBivariateData, aes(x=Ribs.Price, y=Minutes.Driving)) +
geom_point(color="darkseagreen2") + labs (x="Ribs Price", y="Minutes Driving") + stat_smooth(method = "lm", formula = y ~ x, geom = "smooth", se = FALSE, color="darkorchid4") + theme (panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line())
ggplot(VisualizeBivariateData, aes(x=Ribs.Price, y=Minutes.Driving)) +
geom_point(color="darkseagreen2") + labs (x="Ribs Price", y="Minutes Driving") + stat_smooth(method = "lm", formula = y ~ x, geom = "smooth", se = FALSE, color="darkorchid4") + theme (panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line()) + ggtitle("Bivariate Relationship Between Ribs Price and Minutes Driving")
Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.
The independent variable is the price of ribs and the dependent variable is how many minutes someone is willing to drive. The relationship between the two variables is positive.
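A description such as "positive" can be cross-checked numerically. The sketch below uses invented toy values, not the lab data, and shows two quantities whose sign indicates the direction of a linear relationship: the Pearson correlation, and the slope of the least-squares line that `stat_smooth(method = "lm")` draws.

```r
# Toy values, invented for illustration only.
toy <- data.frame(
  Ribs.Price      = c(10, 12, 15, 18, 20, 25),
  Minutes.Driving = c(15, 20, 20, 30, 35, 45)
)

# Pearson correlation: the sign gives the direction of the linear relationship.
r <- cor(toy$Ribs.Price, toy$Minutes.Driving, use = "complete.obs")

# Slope of the least-squares line of Minutes.Driving on Ribs.Price.
fit <- lm(Minutes.Driving ~ Ribs.Price, data = toy)
slope <- unname(coef(fit)["Ribs.Price"])

sign(r)      # 1 = positive, -1 = negative, 0 = no linear relationship
sign(slope)
```

With the real data, the same two calls would use VisualizeBivariateData in place of `toy`. A value close to zero suggests a null relationship, though how close counts as "null" is a judgment call.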
Create a scatterplot showing the relationship between how far a respondent is willing to drive for good BBQ and their age. When making this scatterplot, let’s assume you are interested in whether how far someone is willing to drive is a function of their age.
ggplot(VisualizeBivariateData, aes(x=Age, y=Minutes.Driving)) +
geom_point()
ggplot(VisualizeBivariateData, aes(x=Age, y=Minutes.Driving)) +
geom_point() + labs (x="Age", y="Minutes Driving")
deepskyblue.
ggplot(VisualizeBivariateData, aes(x=Age, y=Minutes.Driving)) +
geom_point(color="deepskyblue") + labs (x="Age", y="Minutes Driving")
ggplot(VisualizeBivariateData, aes(x=Age, y=Minutes.Driving)) +
geom_point(color="deepskyblue") + labs (x="Age", y="Minutes Driving") + theme (panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line())
firebrick2.
ggplot(VisualizeBivariateData, aes(x=Age, y=Minutes.Driving)) +
geom_point(color="deepskyblue") + labs (x="Age", y="Minutes Driving") + stat_smooth(method = "lm", formula = y ~ x, geom = "smooth", se = FALSE, color="firebrick2") + theme (panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line())
ggplot(VisualizeBivariateData, aes(x=Age, y=Minutes.Driving)) +
geom_point(color="deepskyblue") + labs (x="Age", y="Minutes Driving") + stat_smooth(method = "lm", formula = y ~ x, geom = "smooth", se = FALSE, color="firebrick2") + theme (panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line()) + ggtitle("Bivariate Relationship Between Age and Minutes Driving")
Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.
The independent variable is age and the dependent variable is how many minutes someone is willing to drive. The relationship between the two variables is null.
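The scatterplots above repeat the same `theme()` settings in every step. One way to avoid retyping them, sketched here with made-up toy data, is to store the settings in an object and add that object to each plot. The name `clean_theme` is hypothetical.

```r
library(ggplot2)

# Store the repeated theme settings once (the name clean_theme is hypothetical).
clean_theme <- theme(
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  panel.background = element_blank(),
  axis.line = element_line()
)

# Toy data, invented for illustration only.
toy <- data.frame(
  Age             = c(22, 35, 41, 58, 63),
  Minutes.Driving = c(30, 20, 45, 25, 40)
)

p <- ggplot(toy, aes(x = Age, y = Minutes.Driving)) +
  geom_point(color = "deepskyblue") +
  labs(x = "Age", y = "Minutes Driving") +
  clean_theme
```

Printing `p` draws the plot. Any later plot can reuse the same look by ending with `+ clean_theme`.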
You need to create three dichotomous variables based on existing variables in the data set in this section. The first should be named “Prefers.Brisket” and should take on a value of “1” if a respondent identified beef brisket as their preferred type of BBQ meat and a value of “0” if they did not. The second should be named “Prefers.Fries” and should take on a value of “1” if a respondent identified french fries as their preferred side dish and a value of “0” if they did not. The third should be named “Longer.Distances” and should take on a value of “1” if a respondent is willing to drive longer than average for good BBQ and a value of “0” if they are not.
VisualizeBivariateData %>%
mutate(Prefers.Brisket=NA) %>%
mutate(Prefers.Brisket=replace(Prefers.Brisket, Favorite.Meat==5, 1)) %>%
mutate(Prefers.Brisket=replace(Prefers.Brisket, Favorite.Meat < 5, 0)) ->VisualizeBivariateData
VisualizeBivariateData %>%
mutate(Prefers.Fries=NA) %>%
mutate(Prefers.Fries=replace(Prefers.Fries, Favorite.Side==5, 1)) %>%
mutate(Prefers.Fries=replace(Prefers.Fries, Favorite.Side < 5, 0)) ->VisualizeBivariateData
VisualizeBivariateData %>%
mutate(Longer.Distances=NA) %>%
mutate(Longer.Distances=replace(Longer.Distances, Minutes.Driving > mean(Minutes.Driving, na.rm = TRUE), 1)) %>%
mutate(Longer.Distances=replace(Longer.Distances, Minutes.Driving <= mean(Minutes.Driving, na.rm = TRUE), 0)) ->VisualizeBivariateData
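The three indicators described in this section can also be built with logical comparisons inside a single `mutate()`. This is a sketch on invented toy data, not the lab data. It assumes, as the recodes above do, that the code 5 marks brisket in Favorite.Meat and french fries in Favorite.Side, and it reads "longer than average" as strictly above the sample mean of Minutes.Driving.

```r
library(dplyr)

# Toy data, invented for illustration; the code 5 is assumed to mean
# brisket in Favorite.Meat and french fries in Favorite.Side.
toy <- data.frame(
  Favorite.Meat   = c(5, 2, 5, 1),
  Favorite.Side   = c(3, 5, 5, 1),
  Minutes.Driving = c(10, 20, 30, 60)
)

toy <- toy %>%
  mutate(
    # A comparison returns TRUE/FALSE (or NA); as.integer() turns that into 1/0/NA.
    Prefers.Brisket  = as.integer(Favorite.Meat == 5),
    Prefers.Fries    = as.integer(Favorite.Side == 5),
    # "Longer than average" is read here as strictly above the sample mean.
    Longer.Distances = as.integer(Minutes.Driving > mean(Minutes.Driving, na.rm = TRUE))
  )
```

Because a comparison with a missing value is itself missing, respondents with no recorded answer stay `NA` instead of being counted as 0.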
We want to know if those who prefer brisket over other types of meat are willing to pay more for a dinner plate than those who do not prefer brisket. Create a scatter plot between the variable for the price someone is willing to pay for a dinner plate and the dichotomous variable you created indicating if someone prefers brisket.
ggplot(VisualizeBivariateData,
aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) +
geom_point()
ggplot(VisualizeBivariateData,
aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) +
geom_point(position = "jitter")
ggplot(VisualizeBivariateData,
aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) +
geom_point(position = "jitter")+
labs(x="Prefers Brisket",y="Dinner Plate Price")
lightsteelblue4.
ggplot(VisualizeBivariateData,
aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) +
geom_point(position = "jitter", color="lightsteelblue4")+
labs(x="Prefers Brisket",y="Dinner Plate Price")
ggplot(VisualizeBivariateData, aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) +
geom_point(position = "jitter", color="lightsteelblue4")+
labs(x="Prefers Brisket",y="Dinner Plate Price")+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line())
ggplot(VisualizeBivariateData, aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) +
geom_point(position = "jitter", color="lightsteelblue4")+
labs(x="Prefers Brisket",y="Dinner Plate Price")+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line())+ scale_x_discrete(labels = c("0" = "Not Preferred", "1" = "Preferred"))
ggplot(VisualizeBivariateData, aes(x=as.factor(Prefers.Brisket), y=Dinner.Plate.Price)) +
geom_point(position = "jitter", color="lightsteelblue4")+
labs(x="Prefers Brisket",y="Dinner Plate Price")+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line())+ scale_x_discrete(labels = c("0" = "Not Preferred", "1" = "Preferred"))+ ggtitle("Bivariate Relationship Between Preferring Brisket and Dinner Plate Price")
Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.
The independent variable is whether someone prefers brisket and the dependent variable is how much someone is willing to pay for a dinner plate. The relationship between the two variables is null.
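By default, `position = "jitter"` moves points vertically as well as horizontally, so the plotted prices are shifted slightly from the recorded ones. The sketch below, on invented toy data, uses `geom_jitter()` with `height = 0` so that only the horizontal position is jittered. The width of 0.2 is an arbitrary choice for illustration.

```r
library(ggplot2)

# Toy data, invented for illustration only.
toy <- data.frame(
  Prefers.Brisket    = c(0, 0, 0, 1, 1, 1),
  Dinner.Plate.Price = c(12, 12, 15, 12, 15, 15)
)

p <- ggplot(toy, aes(x = as.factor(Prefers.Brisket), y = Dinner.Plate.Price)) +
  # width spreads points within each category; height = 0 leaves prices unchanged
  geom_jitter(width = 0.2, height = 0, color = "lightsteelblue4") +
  scale_x_discrete(labels = c("0" = "Not Preferred", "1" = "Preferred")) +
  labs(x = "Prefers Brisket", y = "Dinner Plate Price")
```

Keeping the vertical values exact makes it easier to compare the two groups by eye, because every point sits at the price the respondent actually reported.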
We want to know if those who prefer fries are older than those who do not prefer fries. Create a scatter plot between the variable for the respondent’s age and the dichotomous variable you created indicating if someone prefers fries as their favorite side.
ggplot(VisualizeBivariateData,
aes(x=as.factor(Prefers.Fries), y=Age)) +
geom_point()
ggplot(VisualizeBivariateData,
aes(x=as.factor(Prefers.Fries), y=Age)) +
geom_point(position = "jitter")
ggplot(VisualizeBivariateData,
aes(x=as.factor(Prefers.Fries), y=Age)) +
geom_point(position = "jitter")+
labs(x="Prefers Fries",y="Age")
goldenrod3.
ggplot(VisualizeBivariateData,
aes(x=as.factor(Prefers.Fries), y=Age)) +
geom_point(position = "jitter", color="goldenrod3")+
labs(x="Prefers Fries",y="Age")
ggplot(VisualizeBivariateData, aes(x=as.factor(Prefers.Fries), y=Age)) +
geom_point(position = "jitter", color="goldenrod3")+
labs(x="Prefers Fries",y="Age")+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line())
ggplot(VisualizeBivariateData, aes(x=as.factor(Prefers.Fries), y=Age)) +
geom_point(position = "jitter", color="goldenrod3")+
labs(x="Prefers Fries",y="Age")+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line())+ scale_x_discrete(labels = c("0" = "Not Preferred", "1" = "Preferred"))
ggplot(VisualizeBivariateData, aes(x=as.factor(Prefers.Fries), y=Age)) +
geom_point(position = "jitter", color="goldenrod3")+
labs(x="Prefers Fries",y="Age")+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line())+ scale_x_discrete(labels = c("0" = "Not Preferred", "1" = "Preferred"))+ ggtitle("Bivariate Relationship Between Preferring Fries and Age")
Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.
The independent variable is whether someone prefers fries and the dependent variable is how old someone is. The relationship between the two variables is null.
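When one variable is dichotomous, comparing the group means is a simple way to back up what the jittered scatterplot suggests. The sketch below uses invented toy values, not the lab data, and base R's `tapply()`.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  Prefers.Fries = c(0, 0, 0, 1, 1, 1),
  Age           = c(30, 40, 50, 28, 41, 51)
)

# Mean age within each group of the dichotomous variable.
group_means <- tapply(toy$Age, toy$Prefers.Fries, mean, na.rm = TRUE)

# Difference in means: group coded 1 minus group coded 0.
difference <- unname(group_means["1"] - group_means["0"])
```

A difference near zero is consistent with a null relationship. A clearly positive or negative difference points to a positive or negative one.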
We are interested in whether someone who prefers brisket is more or less likely to also prefer fries than someone who does not prefer brisket. Construct a contingency table between the two dichotomous variables that you created indicating if someone prefers brisket and if someone prefers fries.
# Prefers.Brisket is reset to NA here, so mean(Prefers.Brisket) is NA, neither
# comparison is ever TRUE, and the variable stays NA for every respondent.
VisualizeBivariateData %>%
mutate(Prefers.Brisket=NA) %>%
mutate(Prefers.Brisket=replace(Prefers.Brisket, Favorite.Meat > mean(Prefers.Brisket), 1)) %>%
mutate(Prefers.Brisket=replace(Prefers.Brisket, Favorite.Meat <= mean(Prefers.Brisket), 0)) -> VisualizeBivariateData
# The same happens to Prefers.Fries: mean(Prefers.Fries) is NA after the reset.
VisualizeBivariateData %>%
mutate(Prefers.Fries=NA) %>%
mutate(Prefers.Fries=replace(Prefers.Fries, Favorite.Side > mean(Prefers.Fries), 1)) %>%
mutate(Prefers.Fries=replace(Prefers.Fries, Favorite.Side <= mean(Prefers.Fries), 0)) -> VisualizeBivariateData
VisualizeBivariateData %>%
tabyl(Prefers.Fries, Prefers.Brisket)%>%
adorn_title()
## Prefers.Brisket
## Prefers.Fries NA_
## <NA> 379
VisualizeBivariateData %>%
mutate(Prefers.Fries.Label=NA) %>%
mutate(Prefers.Fries.Label=replace(Prefers.Fries.Label,Prefers.Fries==1,"Preferred")) %>%
mutate(Prefers.Fries.Label=replace(Prefers.Fries.Label,Prefers.Fries==0,"Not Preferred")) %>%
mutate(Prefers.Brisket.Label=NA) %>%
mutate(Prefers.Brisket.Label=replace(Prefers.Brisket.Label,Prefers.Brisket==1,"Preferred")) %>%
mutate(Prefers.Brisket.Label=replace(Prefers.Brisket.Label,Prefers.Brisket==0,"Not Preferred")) ->VisualizeBivariateData
VisualizeBivariateData %>%
tabyl(Prefers.Fries.Label,Prefers.Brisket.Label) %>%
adorn_title(row_name = "Prefers Fries", col_name = "Prefers Brisket")
## Prefers Brisket
## Prefers Fries NA_
## <NA> 379
VisualizeBivariateData %>%
tabyl(Prefers.Fries.Label, Prefers.Brisket.Label) %>%
adorn_percentages("col") %>%
adorn_title()
## Prefers.Brisket.Label
## Prefers.Fries.Label NA_
## <NA> 1
VisualizeBivariateData %>%
tabyl(Prefers.Fries.Label, Prefers.Brisket.Label) %>%
adorn_percentages("col") %>%
adorn_pct_formatting(digits = 1) %>%
adorn_title()
## Prefers.Brisket.Label
## Prefers.Fries.Label NA_
## <NA> 100.0%
VisualizeBivariateData %>%
tabyl(Prefers.Fries.Label, Prefers.Brisket.Label) %>%
adorn_percentages("col") %>%
adorn_pct_formatting(digits = 1) %>%
adorn_ns() %>%
adorn_title()
## Prefers.Brisket.Label
## Prefers.Fries.Label NA_
## <NA> 100.0% (379)
Write a brief description of the relationship between these two variables identified in the contingency table you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.
The independent variable is whether someone prefers brisket and the dependent variable is whether someone prefers fries. The relationship between the two variables is null.
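The contingency tables above show a single NA cell holding all 379 observations. That is what a recode produces when it compares against the mean of a column that has just been set to NA. The sketch below first reproduces that behavior on a short vector, then builds the two indicators directly on invented toy data and tabulates them with column percentages. It assumes the code 5 marks brisket and french fries, as in the earlier recodes. The toy counts are not results from the lab data.

```r
library(dplyr)
library(janitor)

# Why a recode against the mean of an all-NA column changes nothing:
x <- rep(NA, 4)
condition <- c(5, 1, 3, 5) > mean(x)   # mean(x) is NA, so every comparison is NA
x_after <- replace(x, condition, 1)    # no element is replaced; all stay NA

# Toy data, invented for illustration; code 5 is assumed to mean
# brisket in Favorite.Meat and french fries in Favorite.Side.
toy <- data.frame(
  Favorite.Meat = c(5, 1, 5, 3, 2, 5, 4, 1),
  Favorite.Side = c(5, 5, 2, 1, 5, 3, 5, 2)
) %>%
  mutate(
    Prefers.Brisket = if_else(Favorite.Meat == 5, "Preferred", "Not Preferred"),
    Prefers.Fries   = if_else(Favorite.Side == 5, "Preferred", "Not Preferred")
  )

# Counts, with the independent variable (brisket) in the columns.
counts <- toy %>% tabyl(Prefers.Fries, Prefers.Brisket)

# Column percentages with counts in parentheses.
counts %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 1) %>%
  adorn_ns() %>%
  adorn_title(row_name = "Prefers Fries", col_name = "Prefers Brisket")
```

With the independent variable in the columns, comparing percentages across a row shows whether the share who prefer fries differs between respondents who prefer brisket and those who do not.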
Click the “Knit” button to publish your work as an .html document. This file will appear in the folder specified by your working directory. You will need to upload both this R Markdown file and the .html file it produces to AsULearn to get all of the points associated with this lab.