---
title: "PS/CJ 3115 Fall 2025 Analyzing Bivariate Relationships Lab Assignment"
author: "(Cord Doss)"
date: "(11/7/25)"
output: html_document
---
This file contains a set of tasks that you need to complete in R for the lab assignment. The tasks may require you to add a code chunk, type code into a chunk, and/or execute code. Don’t forget that you need to acknowledge if you used any resources beyond class materials or got help to complete the assignment.
Instructions associated with this assignment can be found in the file “AnalyzingBivariateRelationshipsTutorial.html”.
The data set you will use is different from the one used in the tutorial. Pay attention to the differences in the Excel files’ names, any variable names, or object names. You will need to adjust your code accordingly.
When asked to describe a relationship, your answer needs to directly engage with the statistical analysis you conducted. Your discussion should include the following:
Once you have completed the assignment, you will need to knit this R Markdown file to produce an .html file. You will then need to upload the .html file and this .Rmd file to AsULearn.
The first thing you need to do in this file is to add your name and date in the lines underneath this document’s title.
You need to identify and set your working directory in this section. If you are working in the cloud version of RStudio, enter a note here to tell us that you did not need to change the working directory because you are working in the cloud.
setwd("/Users/corddoss/Desktop/Research methods class/Week 10/")
You need to install and load the packages and data set you’ll use for the lab assignment in this section. In this lab, we will use the following packages: dplyr, tidyverse, forcats, ggplot2, janitor, and openxlsx.
# Load all required packages
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(forcats)
library(ggplot2)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(openxlsx)
read.xlsx("BivariateRelationshipsData.xlsx") -> AnalyzeBivariateData
names(AnalyzeBivariateData)
## [1] "Observation" "Sex" "Age"
## [4] "Hometown" "Favorite.Meat" "Favorite.Sauce"
## [7] "Sweetness" "Favorite.Side" "Restaurant.City"
## [10] "Restaurant.Name" "Minutes.Driving" "Sandwich.Price"
## [13] "Dinner.Plate.Price" "Ribs.Price"
You want to know if there is a relationship between the price someone is willing to pay for a plate of ribs and how far they are willing to drive. Calculate the Pearson’s correlation coefficient between the variables that identify how much respondents are willing to pay for a plate of ribs and how far they are willing to drive for good BBQ. Round the coefficient to the fourth decimal place.
round(cor(AnalyzeBivariateData$Ribs.Price,AnalyzeBivariateData$Minutes.Driving, use="pairwise.complete.obs", method="pearson"), digits = 4)
## [1] 0.1795
Calculate the test statistic and \(p\)-value for the correlation between Rib Plate and Driving Distance. Do not try to round the coefficient when calculating the significance.
cor.test(AnalyzeBivariateData$Ribs.Price,AnalyzeBivariateData$Minutes.Driving,use="pairwise.complete.obs", method="pearson")
##
## Pearson's product-moment correlation
##
## data: AnalyzeBivariateData$Ribs.Price and AnalyzeBivariateData$Minutes.Driving
## t = 3.5432, df = 377, p-value = 0.0004448
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08023825 0.27527828
## sample estimates:
## cor
## 0.1795218
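As a rough check on the output above, the test statistic can be recomputed by hand from the reported coefficient and degrees of freedom with t = r * sqrt(df) / sqrt(1 - r^2). This is only a sketch: the two input values are copied from the printed `cor.test()` output rather than recomputed from the data, and the object names `r`, `deg_free`, `t_stat`, and `p_value` are made up for illustration.

```r
# Values copied from the cor.test() output above (not recomputed from the data)
r        <- 0.1795218
deg_free <- 377

# t statistic for a Pearson correlation coefficient
t_stat <- r * sqrt(deg_free) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), deg_free)

round(t_stat, 4)
signif(p_value, 4)
```

The results should agree with the `t` and `p-value` lines in the printed test, which is a quick way to confirm that the coefficient and the significance test describe the same relationship.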
Write a brief description of the relationship between Rib Plate and Driving Distance based on your calculations from tasks 4 and 5.
The correlation coefficient is 0.1795 and the p-value is 0.0004448. The 95% confidence interval (0.080 to 0.275) does not include 0, so the relationship between Rib Plate Price and Driving Distance is statistically significant, although the positive correlation is weak.
# 7. Correlation of Driving Distance and Age
You want to know if there is a relationship between how far someone is willing to drive for good BBQ and their age. Calculate the Pearson’s correlation coefficient between the variables that identify how far someone is willing to drive for good BBQ and their age. Round the coefficient to the third decimal place.
round(cor(AnalyzeBivariateData$Minutes.Driving,AnalyzeBivariateData$Age, use="pairwise.complete.obs", method="pearson"),digits = 3)
## [1] 0.075
Calculate the test statistic and \(p\)-value for the correlation between Driving Distance and Age. Do not try to round the coefficient when calculating the significance.
cor.test(AnalyzeBivariateData$Minutes.Driving,AnalyzeBivariateData$Age,use="pairwise.complete.obs", method="pearson")
##
## Pearson's product-moment correlation
##
## data: AnalyzeBivariateData$Minutes.Driving and AnalyzeBivariateData$Age
## t = 1.4536, df = 377, p-value = 0.1469
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.02627864 0.17407908
## sample estimates:
## cor
## 0.07465359
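The 95 percent confidence interval printed above can be reproduced approximately with the Fisher z-transformation, which is the approach `cor.test()` uses for Pearson correlations. The sketch below assumes the sample size is the reported degrees of freedom plus two; the coefficient is copied from the output, and the object names are made up for illustration.

```r
# Values copied from the cor.test() output above
r <- 0.07465359
n <- 377 + 2          # assumption: n = df + 2 for a Pearson correlation test

# Fisher z-transformation and its standard error
z  <- atanh(r)
se <- 1 / sqrt(n - 3)

# Build the interval on the z scale, then transform back to the r scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(ci, 4)
```

Because the lower bound is negative and the upper bound is positive, the interval contains 0, which is consistent with the p-value being above 0.05.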
Write a brief description of the relationship between Driving Distance and Age based on your calculations from tasks 7 and 8.
The correlation coefficient is 0.075 and the p-value is 0.1469, and the 95% confidence interval (-0.026 to 0.174) includes 0. The correlation between Driving Distance and Age is very weak and not statistically significant, so we cannot conclude that the two are related.
# 10. Creating Dichotomous Variables
You need to create three dichotomous variables based on existing variables in the data set in this section. The first should be named “Prefers.Poultry” and should take on a value of “1” if a respondent identified poultry as their preferred type of BBQ meat and a value of “0” if they did not. The second should be named “Prefers.Beans” and should take on a value of “1” if a respondent identified baked beans as their preferred side dish and a value of “0” if they did not. The third should be named “Longer.Distances” and should take on a value of “1” if a respondent is willing to drive longer than average for good BBQ and a value of “0” if they are not.
AnalyzeBivariateData %>%
mutate(Prefers.Poultry=NA) %>%
mutate(Prefers.Poultry=replace(Prefers.Poultry, Favorite.Meat==5, 1)) %>%
mutate(Prefers.Poultry=replace(Prefers.Poultry, Favorite.Meat < 5, 0)) ->AnalyzeBivariateData
AnalyzeBivariateData %>%
mutate(Prefers.Beans=NA) %>%
mutate(Prefers.Beans=replace(Prefers.Beans, Favorite.Side==1, 1)) %>%
mutate(Prefers.Beans=replace(Prefers.Beans, Favorite.Side > 1, 0))-> AnalyzeBivariateData
# Longer than the 39-minute cutoff (the value used here as the average) = 1; otherwise = 0
AnalyzeBivariateData %>%
mutate(Longer.Distances=NA) %>%
mutate(Longer.Distances=replace(Longer.Distances, Minutes.Driving > 39, 1)) %>%
mutate(Longer.Distances=replace(Longer.Distances, Minutes.Driving <= 39, 0)) ->AnalyzeBivariateData
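The "longer than average" rule can also be written so that the cutoff is computed from the data instead of typed in by hand. The sketch below uses base R on a small made-up data frame; the object `toy` and its values are hypothetical and are not taken from the lab data set.

```r
# Hypothetical toy data; these are NOT the lab's actual values
toy <- data.frame(Minutes.Driving = c(10, 20, 30, 60, NA))

# Average driving time, ignoring missing responses
avg_minutes <- mean(toy$Minutes.Driving, na.rm = TRUE)

# 1 if strictly longer than average, 0 otherwise; missing values stay NA
toy$Longer.Distances <- as.integer(toy$Minutes.Driving > avg_minutes)

toy
```

In this toy example the respondent who sits exactly at the average is coded 0, since the task asks for people willing to drive longer than average.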
We want to know if those who prefer poultry over other types of meat are willing to pay more for a dinner plate than those who do not prefer poultry. Perform a difference-of-means test between the price someone is willing to pay for a dinner plate and their preference for poultry.
t.test(Dinner.Plate.Price ~ Prefers.Poultry , data = AnalyzeBivariateData)
##
## Welch Two Sample t-test
##
## data: Dinner.Plate.Price by Prefers.Poultry
## t = 1.122, df = 47.113, p-value = 0.2676
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -1.026275 3.614791
## sample estimates:
## mean in group 0 mean in group 1
## 18.63636 17.34211
Write a brief description of the relationship between the price of a dinner plate and someone’s preference for poultry based on your calculations from task 11.
> The p-value is 0.2676, and the 95% confidence interval (-1.03 to 3.61) includes 0. The difference in the mean dinner plate price between those who do not prefer poultry (18.64) and those who do (17.34) is therefore not statistically significant.
# 13. Relationship between Age and Prefers Baked Beans
We want to know if those who prefer baked beans are older than those who do not prefer baked beans. Perform a difference-of-means test between the variables that identify a respondent’s age and their preference for baked beans.
t.test(Age ~ Prefers.Beans , data = AnalyzeBivariateData)
##
## Welch Two Sample t-test
##
## data: Age by Prefers.Beans
## t = -1.0651, df = 62.541, p-value = 0.2909
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -6.479797 1.974504
## sample estimates:
## mean in group 0 mean in group 1
## 26.05505 28.30769
Write a brief description of the relationship between the respondent’s age and the preference for baked beans based on your calculations from task 13.
> The p-value is 0.2909, and the 95% confidence interval (-6.48 to 1.97) includes 0. The difference in mean age between those who prefer baked beans (28.31) and those who do not (26.06) is not statistically significant.
We are interested in whether someone who prefers poultry is more or less likely to also prefer baked beans than someone who does not prefer poultry. Construct a contingency table between the two dichotomous variables that you created indicating if someone prefers poultry and if someone prefers baked beans.
Have prefers baked beans as the x (on the side) and prefers poultry as the y (across the top).
Include the variable names as titles for the top and side of the table.
Construct a variable and use the variable to label the columns and rows.
Calculate the percentages for the columns.
Report the percentages to the 1st decimal place.
Include the number of observations in each cell.
AnalyzeBivariateData %>%
mutate(Prefers.Poultry.Label=NA) %>%
mutate(Prefers.Poultry.Label=replace(Prefers.Poultry.Label, Prefers.Poultry==1, "Prefers Poultry")) %>%
mutate(Prefers.Poultry.Label=replace(Prefers.Poultry.Label, Prefers.Poultry==0,"Does Not Prefer Poultry")) ->AnalyzeBivariateData
AnalyzeBivariateData %>%
mutate(Prefers.Beans.Label=NA) %>%
mutate(Prefers.Beans.Label=replace(Prefers.Beans.Label, Prefers.Beans==1,"Prefers Beans")) %>%
mutate(Prefers.Beans.Label=replace(Prefers.Beans.Label, Prefers.Beans==0, "Does Not Prefer Beans")) ->AnalyzeBivariateData
AnalyzeBivariateData %>%
tabyl(Prefers.Beans.Label, Prefers.Poultry.Label) %>%
adorn_percentages("col") %>%
adorn_pct_formatting(digits = 1) %>%
adorn_ns() %>%
adorn_title(row_name = "Beans Preference" , col_name = "Poultry Preference")
## Poultry Preference
## Beans Preference Does Not Prefer Poultry Prefers Poultry
## Does Not Prefer Beans 85.6% (292) 92.1% (35)
## Prefers Beans 14.4% (49) 7.9% (3)
Test to see if there is a relationship between preferring poultry and preferring baked beans.
chisq.test(AnalyzeBivariateData$Prefers.Poultry, AnalyzeBivariateData$Prefers.Beans)
# 17. Describe the relationship between prefers poultry and prefers baked beans.
Write a brief description describing the relationship between preferring poultry and preferring baked beans using your results from tasks 15 and 16.
The p-value is 0.3943. Among those who prefer poultry, 92.1% do not prefer beans, compared with 85.6% of those who do not prefer poultry. Because the p-value is well above 0.05, this difference is not statistically significant, and we cannot conclude that preferring poultry is related to preferring beans.
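One way to double-check the chi-squared result is to rebuild the contingency table from the cell counts reported in the table for task 15 and test it directly. This is a sketch under that assumption: the counts are typed in from the printed table, not taken from the data set, and the object names are made up for illustration. For a 2x2 table, `chisq.test()` applies the Yates continuity correction by default.

```r
# Cell counts copied from the printed contingency table (filled column by column)
counts <- matrix(
  c(292, 49,    # Does Not Prefer Poultry: does not prefer beans, prefers beans
     35,  3),   # Prefers Poultry: does not prefer beans, prefers beans
  nrow = 2,
  dimnames = list(
    Beans   = c("Does Not Prefer Beans", "Prefers Beans"),
    Poultry = c("Does Not Prefer Poultry", "Prefers Poultry")
  )
)

# Column percentages, matching adorn_percentages("col")
col_pct <- round(100 * prop.table(counts, margin = 2), 1)

# Chi-squared test of independence (stats:: avoids janitor's masked version)
result <- stats::chisq.test(counts)

col_pct
result$p.value
```

Calling `stats::chisq.test()` explicitly avoids any ambiguity from janitor masking `chisq.test()` when the package is loaded.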
Click the “Knit” button to publish your work as an html document. This document or file will appear in the folder specified by your working directory. You will need to upload both this R Markdown file and the html file it produces to AsULearn to get all of the points associated with this lab.