This file contains a set of tasks that you need to complete in R for the lab assignment. The tasks may require you to add a code chunk, type code into a chunk, and/or execute code. Don’t forget that you need to acknowledge if you used any resources beyond class materials or got help to complete the assignment.
Additional information, including examples and code, about this assignment can be found in the file “VisualizingRelationshipsBetween2Variables.html”.
The data set you will use is different from the one used in the instructions. Pay attention to the differences in the Excel files’ names, variable names, and object names. You will need to adjust your code accordingly.
Once you have completed the assignment, you will need to knit this R Markdown file to produce an .html file. You will then need to upload the .html file and this .Rmd file to AsULearn.
The first thing you need to do in this file is to add your name and date in the lines underneath this document’s title.
You need to identify and set your working directory in this section. If you are working in the cloud version of RStudio, enter a note here to tell us that you did not need to change the working directory because you are working in the cloud.

> getwd()
> setwd("/Users/corddoss/Desktop/Research methods class/Week 9")
You need to install and load the packages and data set you’ll use for the lab assignment in this section. In this lab, we will use the following packages: dplyr, tidyverse, forcats, ggplot2, janitor, and openxlsx. We have not used the package janitor in previous labs, so you will need to install it before you can load it.

> library("dplyr")
> library("tidyverse")
> library("openxlsx")
> library("forcats")
> library("ggplot2")
> install.packages("janitor")
> library(janitor)
> VisualizingRelationshipsData <- read.xlsx("VisualizingRelationshipsData.xlsx")
> names(VisualizingRelationshipsData)

# 4. Scatterplot of Price for Rib Plate and Driving Distance

Create a scatterplot showing the relationship between variables that identify how much respondents are willing to pay for a plate of ribs and how far they are willing to drive for good BBQ. When making this scatterplot, let’s assume you are interested in whether how much respondents are willing to pay for a plate of ribs influences how far they are willing to drive for good BBQ.
> ggplot(VisualizingRelationshipsData, aes(x = Ribs.Price, y = Minutes.Driving)) +
>   geom_point(color = "darkseagreen2") +
>   stat_smooth(method = "lm", formula = y ~ x, geom = "smooth", color = "darkorchid4", se = FALSE) +
>   labs(
>     x = "Price for Rib Plate",
>     y = "Minutes Willing to Drive",
>     title = "Relationship between Rib Plate Price and Driving Distance"
>   ) +
>   theme(
>     panel.grid.major = element_blank(),
>     panel.grid.minor = element_blank(),
>     panel.background = element_blank(),
>     plot.background = element_blank(),
>     axis.line = element_line()
>   )

- Change the labels on the x- and y-axes to words instead of using the variable names. Make sure that the labels on both the x- and y-axes are capitalized.
- Change the color of the dots to darkseagreen2.
- There should be no color or grid lines in the background.
- The graph should have a line of best fit in darkorchid4.
- The graph should have a title.
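Before describing a scatterplot, it can help to check the direction of the fitted line numerically. The sketch below is only an illustration: the data frame `toy` and its values are made up (they are not the lab data), and it uses base R's `lm()` and `cor()`. The sign of the slope matches the direction of the straight line that `stat_smooth(method = "lm")` draws.

```r
# Hypothetical toy data standing in for Ribs.Price and Minutes.Driving
toy <- data.frame(
  Ribs.Price      = c(10, 12, 15, 18, 20, 25),
  Minutes.Driving = c(15, 20, 20, 30, 25, 40)
)

# Fit the same kind of straight line that stat_smooth(method = "lm") draws
fit <- lm(Minutes.Driving ~ Ribs.Price, data = toy)
slope <- unname(coef(fit)["Ribs.Price"])

# Pearson correlation: the sign gives the direction, the size gives the strength
r <- cor(toy$Ribs.Price, toy$Minutes.Driving)

# A positive slope means the line rises from left to right.
# In real data, a null relationship shows up as a slope close to zero,
# not exactly zero, so treat this label as a rough guide only.
direction <- if (slope > 0) "positive" else if (slope < 0) "negative" else "null"
print(direction)
```

In a simple two-variable fit like this, the slope and the correlation always share the same sign, so either one can be used to name the direction.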
Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

> The scatterplot shows that people who are willing to pay a higher price for a plate of ribs are also willing to drive a little farther for good BBQ. It isn’t a strong relationship, but the trend is positive. The price for a rib plate is on the x-axis (independent variable) and the minutes driving is on the y-axis (dependent variable).

# 6. Scatterplot of Driving Distance and Age

Create a scatterplot showing the relationship between how far a respondent is willing to drive for good BBQ and their age. When making this scatterplot, let’s assume you are interested in whether how far someone is willing to drive is a function of their age.

> ggplot(VisualizingRelationshipsData, aes(x = Age, y = Minutes.Driving)) +
>   geom_point(color = "deepskyblue") +
>   stat_smooth(method = "lm", formula = y ~ x, geom = "smooth", color = "firebrick2", se = FALSE) +
>   labs(
>     x = "Age",
>     y = "Minutes Willing to Drive",
>     title = "Relationship between Age and Driving Distance for BBQ"
>   ) +
>   theme(
>     panel.grid.major = element_blank(),
>     panel.grid.minor = element_blank(),
>     panel.background = element_blank(),
>     plot.background = element_blank(),
>     axis.line = element_line()
>   )

- Change the labels on the x- and y-axes to words instead of using the variable names. Make sure that the labels on both the x- and y-axes are capitalized.
- Change the color of the dots to deepskyblue.
- There should be no color or grid lines in the background.
- The graph should have a line of best fit in firebrick2.
- The graph should have a title.
Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

> The scatterplot suggests that age doesn’t affect how far someone is willing to drive for BBQ. The line of best fit shows very little change, so the relationship aligns most closely with null. The independent variable is age (x-axis) and the dependent variable is the minutes they’re willing to drive (y-axis).

# 8. Creating Dichotomous Variables

You need to create three dichotomous variables based on existing variables in the data set in this section. The first should be named “Prefers.Brisket” and should take on a value of “1” if a respondent identified beef brisket as their preferred type of BBQ meat and a value of “0” if they did not. The second should be named “Prefers.Fries” and should take on a value of “1” if a respondent identified french fries as their preferred side dish and a value of “0” if they did not. The third should be named “Longer.Distances” and should take on a value of “1” if a respondent is willing to drive longer than average for good BBQ and a value of “0” if they are not.

> VisualizingRelationshipsData %>%
>   mutate(
>     Prefers.Brisket = ifelse(Favorite.Meat == "Beef Brisket", 1, 0),
>     Prefers.Fries = ifelse(Favorite.Side == "French Fries", 1, 0),
>     Longer.Distances = ifelse(Minutes.Driving > mean(Minutes.Driving, na.rm = TRUE), 1, 0)
>   ) -> VisualizingRelationshipsData

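To see what the `ifelse()` logic does row by row, here is a minimal base R sketch on a made-up data frame. The name `toy` and its values are hypothetical, and base R assignment is used in place of `mutate()` only to keep the example free of package dependencies. It also shows why `na.rm = TRUE` matters when the comparison is against a mean.

```r
# Hypothetical toy data; NA marks a respondent who skipped the question
toy <- data.frame(
  Favorite.Meat   = c("Beef Brisket", "Pulled Pork", "Ribs", "Beef Brisket"),
  Minutes.Driving = c(10, 30, NA, 50)
)

# 1 if the respondent chose brisket, 0 otherwise
toy$Prefers.Brisket <- ifelse(toy$Favorite.Meat == "Beef Brisket", 1, 0)

# Mean of the non-missing values: (10 + 30 + 50) / 3 = 30.
# Without na.rm = TRUE the mean would be NA and every comparison would be NA.
avg_minutes <- mean(toy$Minutes.Driving, na.rm = TRUE)

# 1 if strictly above the mean, 0 otherwise; a missing answer stays NA
toy$Longer.Distances <- ifelse(toy$Minutes.Driving > avg_minutes, 1, 0)

print(toy)
```

Note that a respondent exactly at the mean is coded 0 here because the comparison uses `>`, which matches "longer than average" in the task description.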
We want to know if those who prefer brisket over other types of meat
are willing to pay more for a dinner plate than those who do not prefer
brisket. Create a scatter plot between the variable for the price
someone is willing to pay for a dinner plate and the dichotomous
variable you created indicating if someone prefers brisket.

> ggplot(VisualizingRelationshipsData, aes(x = as.factor(Prefers.Brisket), y = Dinner.Plate.Price)) +
>   geom_point(position = "jitter", color = "lightsteelblue4") +
>   labs(
>     x = "Preference for Brisket",
>     y = "Price for Dinner Plate",
>     title = "Relationship Between Preferring Brisket and Dinner Plate Price"
>   ) +
>   theme(
>     panel.grid.major = element_blank(),
>     panel.grid.minor = element_blank(),
>     panel.background = element_blank(),
>     plot.background = element_blank(),
>     axis.line = element_line()
>   ) +
>   scale_x_discrete(labels = c("0" = "Does Not Prefer Brisket", "1" = "Prefers Brisket"))

- Your continuous variable should be along the y-axis and the dichotomous variable should be on the x-axis.
- Use the “jitter” option to spread out the data points.
- Change the labels on the x- and y-axes to words instead of using the variable names. Make sure that the labels on both the x- and y-axes are capitalized.
- Change the color of the dots to lightsteelblue4.
- There should be no color or grid lines in the background.
- Change the values “0” and “1” on the x-axis to words describing each category.
- The graph should have a title.
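A jittered scatterplot can be hard to read by eye, so one optional check is to compare the average of the continuous variable in each group. The sketch below uses a made-up data frame (`toy` and its values are hypothetical, not the lab data) and base R's `tapply()`.

```r
# Hypothetical toy data: 0 = does not prefer brisket, 1 = prefers brisket
toy <- data.frame(
  Prefers.Brisket    = c(0, 0, 0, 1, 1, 1),
  Dinner.Plate.Price = c(12, 15, 18, 14, 15, 16)
)

# Mean price within each group; the result is named "0" and "1"
group_means <- tapply(toy$Dinner.Plate.Price, toy$Prefers.Brisket, mean)
print(group_means)

# Difference between the groups (prefers minus does not prefer)
difference <- unname(group_means["1"] - group_means["0"])
print(difference)
```

In this toy data both groups average the same price, which is what a null relationship looks like. A clearly higher mean in the "1" group would point to a positive relationship, and a clearly lower one to a negative relationship.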
Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

> The scatterplot shows that people who prefer brisket are not willing to pay noticeably more or less for a dinner plate than those who don’t. The points are spread fairly evenly in both groups, so there isn’t a clear relationship. The independent variable is the preference for brisket (x-axis), while the dependent variable is the price for a dinner plate (y-axis).
We want to know if those who prefer fries are older than those who do
not prefer fries. Create a scatter plot between the variable for the
respondent’s age and the dichotomous variable you created indicating if
someone prefers fries as their favorite side.

> ggplot(VisualizingRelationshipsData, aes(x = as.factor(Prefers.Fries), y = Age)) +
>   geom_point(position = "jitter", color = "goldenrod3") +
>   labs(
>     x = "Preference for Fries",
>     y = "Age",
>     title = "Relationship Between Preferring Fries and Age"
>   ) +
>   theme(
>     panel.grid.major = element_blank(),
>     panel.grid.minor = element_blank(),
>     panel.background = element_blank(),
>     plot.background = element_blank(),
>     axis.line = element_line()
>   ) +
>   scale_x_discrete(labels = c("0" = "Does Not Prefer Fries", "1" = "Prefers Fries"))

- Your continuous variable should be along the y-axis and the dichotomous variable should be on the x-axis.
- Use the “jitter” option to spread out the data points.
- Change the labels on the x- and y-axes to words instead of using the variable names. Make sure that the labels on both the x- and y-axes are capitalized.
- Change the color of the dots to goldenrod3.
- There should be no color or grid lines in the background.
- Change the values “0” and “1” on the x-axis to words describing each category.
- The graph should have a title.
Write a brief description of the relationship between these two variables identified in the scatterplot you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

> The scatterplot shows that people who prefer fries aren’t particularly older or younger than those who don’t. There isn’t a real pattern, so the relationship is null. The independent variable is the preference for fries (x-axis) and the dependent variable is age (y-axis).
We are interested in whether someone who prefers brisket is more or less likely to also prefer fries than someone who does not prefer brisket. Construct a contingency table between the two dichotomous variables that you created indicating if someone prefers brisket and if someone prefers fries.

> VisualizingRelationshipsData <- VisualizingRelationshipsData %>%
>   mutate(
>     Prefers.Brisket = ifelse(Favorite.Meat == "Beef Brisket", 1, 0),
>     Prefers.Fries = ifelse(Favorite.Side == "French Fries", 1, 0)
>   )

- Have prefers fries as the x variable (on the side) and prefers brisket as the y variable (across the top).

> VisualizingRelationshipsData <- VisualizingRelationshipsData %>%
>   mutate(
>     Prefers.Brisket.Label = ifelse(Prefers.Brisket == 1, "Prefers Brisket", "Does Not Prefer Brisket"),
>     Prefers.Fries.Label = ifelse(Prefers.Fries == 1, "Prefers Fries", "Does Not Prefer Fries")
>   )

- Include the variable names as titles for the top and side of the table.

> VisualizingRelationshipsData %>%
>   tabyl(Prefers.Fries.Label, Prefers.Brisket.Label) %>%
>   adorn_percentages("col") %>%
>   adorn_pct_formatting(digits = 1) %>%
>   adorn_ns() %>%
>   adorn_title(row_name = "Prefers Fries", col_name = "Prefers Brisket")

- Construct a variable and use the variable to label the columns and rows.
- Calculate the percentages for the columns.
- Report the percentages to the 1st decimal place.
- Include the number of observations in each cell.
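To make the column-percentage idea concrete, here is a small base R sketch of the same kind of table. It is not the janitor workflow the lab asks for: the data frame `toy` and its values are made up, and `table()` with `prop.table(margin = 2)` is used only to show where the percentages come from.

```r
# Hypothetical toy data with labeled categories
toy <- data.frame(
  Prefers.Fries = c("Prefers Fries", "Does Not Prefer Fries",
                    "Does Not Prefer Fries", "Does Not Prefer Fries",
                    "Prefers Fries", "Does Not Prefer Fries"),
  Prefers.Brisket = c("Prefers Brisket", "Prefers Brisket",
                      "Does Not Prefer Brisket", "Does Not Prefer Brisket",
                      "Does Not Prefer Brisket", "Does Not Prefer Brisket")
)

# Rows = fries preference (side), columns = brisket preference (top)
counts <- table(toy$Prefers.Fries, toy$Prefers.Brisket,
                dnn = c("Prefers Fries", "Prefers Brisket"))

# margin = 2 divides each cell by its column total, so each column sums to 100
col_pct <- round(100 * prop.table(counts, margin = 2), 1)

print(counts)
print(col_pct)
```

With column percentages, the comparison runs across a row: in this toy table the share preferring fries can be read for the brisket column and for the non-brisket column, and the gap between those two numbers is what indicates a positive, negative, or null relationship.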
Write a brief description of the relationship between these two variables identified in the contingency table you created. Your description should identify which variable is your dependent variable and which variable is your independent variable. The description should also describe the relationship as positive, negative, or null.

> The independent variable is whether someone prefers brisket, and the dependent variable is whether someone prefers fries. Looking at the table, most people who do not prefer brisket also do not prefer fries (86.3%), while a smaller share (13.7%) do prefer fries. This shows that there is not a strong difference between the two groups, suggesting a weak relationship between preferring brisket and preferring fries.
Click the “Knit” button to publish your work as an .html document. This file will appear in the folder specified by your working directory. You will need to upload both this R Markdown file and the .html file it produces to AsULearn to get all of the points associated with this lab.