Overview

This file contains a set of tasks that you need to complete in R for the lab assignment. The tasks may require you to add a code chunk, type code into a chunk, and/or execute code. Don’t forget that you need to acknowledge any resources you used beyond class materials and any help you received to complete the assignment.

Instructions associated with this assignment can be found in the file “AnalyzingBivariateRelationshipsTutorial.html”.

The data set you will use is different from the one used in the tutorial. Pay attention to the differences in the Excel file names, variable names, and object names. You will need to adjust your code accordingly.

When asked to describe a relationship, your answer needs to directly engage with the statistical analysis you conducted. Your discussion should include the following:

Once you have completed the assignment, you will need to knit this R Markdown file to produce an .html file. You will then need to upload the .html file and this .Rmd file to AsULearn.

1. Add your Name and the Date

The first thing you need to do in this file is to add your name and date in the lines underneath this document’s title.

2. Identify and Set Your Working Directory

You need to identify and set your working directory in this section. If you are working in the cloud version of RStudio, enter a note here to tell us that you did not need to change the working directory because you are working in the cloud.

getwd()
## [1] "/Users/summersimpson/Downloads/AnalyzingBivariateRelationshipsFall2025 2"
setwd("/Users/summersimpson/Downloads/AnalyzingBivariateRelationshipsFall2025 2")

3. Installing and Loading Packages and Data Set

You need to install and load the packages and data set you’ll use for the lab assignment in this section. In this lab, we will use the following packages: dplyr, tidyverse, forcats, ggplot2, janitor and openxlsx.

install.packages("janitor")
## 
## The downloaded binary packages are in
##  /var/folders/76/w01_ncvd5pn4r8v5nxz870fh0000gn/T//RtmpsoXFhp/downloaded_packages
library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.2     ✔ stringr   1.5.1
## ✔ lubridate 1.9.4     ✔ tibble    3.3.0
## ✔ purrr     1.1.0     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("openxlsx")
library("forcats")
library("ggplot2")
library("janitor")
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
AnalyzeBivariateData <- read.xlsx("BivariateRelationshipsData.xlsx")
names(AnalyzeBivariateData)
##  [1] "Observation"        "Sex"                "Age"               
##  [4] "Hometown"           "Favorite.Meat"      "Favorite.Sauce"    
##  [7] "Sweetness"          "Favorite.Side"      "Restaurant.City"   
## [10] "Restaurant.Name"    "Minutes.Driving"    "Sandwich.Price"    
## [13] "Dinner.Plate.Price" "Ribs.Price"

4. Correlation between Rib Plate and Driving Distance

You want to know if there is a relationship between the price someone is willing to pay for a plate of ribs and how far they are willing to drive. Calculate the Pearson’s correlation coefficient between the variables that identify how much respondents are willing to pay for a plate of ribs and how far they are willing to drive for good BBQ. Round the coefficient to the fourth decimal place.

round(cor(AnalyzeBivariateData$Ribs.Price,
          AnalyzeBivariateData$Minutes.Driving,
          use = "pairwise.complete.obs",
          method = "pearson"), 4)
## [1] 0.1795
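To see what `cor()` is doing, here is a minimal sketch that computes Pearson’s coefficient from its definition and compares it with the built-in function. The vectors `price` and `minutes` are made-up toy values, not the lab data, and one value is missing on purpose to show the effect of `use = "pairwise.complete.obs"`.

```r
# Hypothetical toy vectors (NOT the lab data); one value is missing on purpose
price   <- c(12, 15, 18, 20, NA, 25)
minutes <- c(10, 25, 20, 35, 40, 30)

# Keep only the pairs where both values are observed
ok <- complete.cases(price, minutes)
x <- price[ok]
y <- minutes[ok]

# Pearson's r from its definition: co-variation divided by the
# square root of the product of the two sums of squares
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# The built-in function drops the incomplete pair for us
r_builtin <- cor(price, minutes,
                 use = "pairwise.complete.obs",
                 method = "pearson")

round(c(manual = r_manual, builtin = r_builtin), 4)
```

The two numbers should agree, which confirms that the `use` argument only controls how missing pairs are handled.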

5. Calculate the Significance of the Correlation between Rib Plate and Driving Distance

Calculate the test statistic and \(p\)-value for the correlation between Rib Plate and Driving Distance. Do not try to round the coefficient when calculating the significance.

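# Note: the second variable here is Sandwich.Price, so the output below tests
# Ribs.Price against Sandwich.Price rather than Minutes.Driving.
cor.test(AnalyzeBivariateData$Ribs.Price,
         AnalyzeBivariateData$Sandwich.Price,
         use = "pairwise.complete.obs",
         method = "pearson")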
## 
##  Pearson's product-moment correlation
## 
## data:  AnalyzeBivariateData$Ribs.Price and AnalyzeBivariateData$Sandwich.Price
## t = 12.976, df = 377, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4818736 0.6215804
## sample estimates:
##       cor 
## 0.5556368
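The test statistic that `cor.test()` prints can be recovered from the coefficient and the degrees of freedom with \(t = r\sqrt{df}/\sqrt{1 - r^2}\). The sketch below checks this against the values printed above and then applies the same formula to the coefficient from task 4. The helper name `t_from_r` is hypothetical, and using 377 degrees of freedom for the task 4 pair is an assumption (it holds only if neither variable has missing values).

```r
# Hypothetical helper (not part of the lab template): t statistic implied by
# a Pearson correlation r with df = n - 2 degrees of freedom
t_from_r <- function(r, df) {
  r * sqrt(df) / sqrt(1 - r^2)
}

# Values copied from the cor.test() output above
t_printed <- t_from_r(0.5556368, 377)
round(t_printed, 3)  # 12.976, matching the printed t

# Coefficient from task 4; df = 377 is an ASSUMPTION (no missing values)
t_task4 <- t_from_r(0.1795, 377)
p_task4 <- 2 * pt(-abs(t_task4), df = 377)  # two-sided p-value
c(t = t_task4, p = p_task4)
```

Because the task 4 coefficient was rounded, the second result is only an approximation of what `cor.test()` would report for that pair.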

6. Describe the Relationship between Rib Plate and Driving Distance

Write a brief description of the relationship between Rib Plate and Driving Distance based on your calculations from tasks 4 and 5.

The correlation between how much someone is willing to pay for a rib plate and how far they are willing to drive is positive but weak (r = 0.1795), so one is a poor predictor of the other.

7. Correlation of Driving Distance and Age

You want to know if there is a relationship between how far someone is willing to drive for good BBQ and their age. Calculate the Pearson’s correlation coefficient between the variables that identify how far someone is willing to drive for good BBQ and their age. Round the coefficient to the third decimal place.

round(cor(AnalyzeBivariateData$Minutes.Driving,
          AnalyzeBivariateData$Age,
          use = "pairwise.complete.obs",
          method = "pearson"), 3)
## [1] 0.075

8. Calculate the Significance of the Correlation between Driving Distance and Age

Calculate the test statistic and \(p\)-value for the correlation between Driving Distance and Age. Do not try to round the coefficient when calculating the significance.

cor.test(AnalyzeBivariateData$Minutes.Driving,
         AnalyzeBivariateData$Age,
         use = "pairwise.complete.obs",
         method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  AnalyzeBivariateData$Minutes.Driving and AnalyzeBivariateData$Age
## t = 1.4536, df = 377, p-value = 0.1469
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.02627864  0.17407908
## sample estimates:
##        cor 
## 0.07465359
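The 95 percent confidence interval in this output comes from Fisher’s z transformation of the coefficient. As a rough check, the sketch below rebuilds the interval from the printed coefficient and degrees of freedom only; the sample size is taken as `df + 2`.

```r
# Values copied from the cor.test() output above
r  <- 0.07465359
n  <- 377 + 2          # sample size implied by df = n - 2

# Fisher's z transformation and its approximate standard error
z  <- atanh(r)
se <- 1 / sqrt(n - 3)

# Build the interval on the z scale, then transform back to the r scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
round(ci, 4)  # -0.0263  0.1741
```

The interval contains zero, which is another way of seeing that the correlation is not statistically significant at the 0.05 level.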

9. Describe Relationship between Driving Distance and Age

Write a brief description of the relationship between Driving Distance and Age based on your calculations from tasks 7 and 8.

The independent variable in this relationship is Age, and the dependent variable is Driving Distance. The correlation is positive but very small (r = 0.075) and is not statistically significant (t = 1.4536, df = 377, p = 0.1469); the 95 percent confidence interval (-0.026 to 0.174) includes zero. We therefore cannot conclude that age is related to how far someone is willing to drive for good BBQ.

10. Creating Dichotomous Variables

You need to create three dichotomous variables based on existing variables in the data set in this section. The first should be named “Prefers.Poultry” and should take on a value of “1” if a respondent identified poultry as their preferred type of BBQ meat and a value of “0” if they did not. The second should be named “Prefers.Beans” and should take on a value of “1” if a respondent identified baked beans as their preferred side dish and a value of “0” if they did not. The third should be named “Longer.Distances” and should take on a value of “1” if a respondent is willing to drive longer than average for good BBQ and a value of “0” if they are not.

AnalyzeBivariateData <- AnalyzeBivariateData %>%
  mutate(
    # 1. Prefers Poultry (1 = Poultry, 0 = Not Poultry)
    Prefers.Poultry = ifelse(Favorite.Meat == 3, 1, 0),

    # 2. Prefers Beans (1 = Baked Beans, 0 = Not Baked Beans)
    # Favorite.Side is compared to the code 4, the same code used in task 13
    Prefers.Beans = ifelse(Favorite.Side == 4, 1, 0),

    # 3. Longer Distances (1 = Above Average Driving Distance, 0 = At/Below Average)
    Longer.Distances = ifelse(Minutes.Driving > mean(Minutes.Driving, na.rm = TRUE), 1, 0)
  )
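It is worth checking a new dichotomous variable before using it. The sketch below uses a small made-up data frame (`toy`, not the lab data) and assumes, for illustration only, that the side dish is stored as a numeric code with 4 standing for baked beans. It shows how `ifelse()` treats missing values and what happens when a numeric code is compared to a text label.

```r
# Hypothetical toy data (NOT the lab data); codes and values are made up
toy <- data.frame(
  Favorite.Side   = c(1, 4, 2, NA, 4),
  Minutes.Driving = c(10, 45, 30, 20, NA)
)

# Comparing against the numeric code flags the intended rows;
# a missing input stays missing (NA) rather than becoming 0
toy$Prefers.Beans <- ifelse(toy$Favorite.Side == 4, 1, 0)

# Comparing a numeric code to a text label never matches,
# so every observed row is coded 0
wrong_label <- ifelse(toy$Favorite.Side == "Baked Beans", 1, 0)

# Above-average driving time; na.rm = TRUE keeps the mean from being NA
toy$Longer.Distances <- ifelse(
  toy$Minutes.Driving > mean(toy$Minutes.Driving, na.rm = TRUE), 1, 0
)

# Quick checks: counts of each value, including missing ones
table(toy$Prefers.Beans, useNA = "ifany")
table(wrong_label, useNA = "ifany")
toy
```

A table with no 1s, as in `wrong_label`, is a sign that the comparison value does not match how the variable is coded.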

11. Relationship between Price of Dinner Plate and Prefers Poultry

We want to know if those who prefer poultry over other types of meat are willing to pay more for a dinner plate than those who do not prefer poultry. Perform a difference-of-means test between the price someone is willing to pay for a dinner plate and their preference for poultry.

t.test(AnalyzeBivariateData$Dinner.Plate.Price ~ AnalyzeBivariateData$Prefers.Poultry)
## 
##  Welch Two Sample t-test
## 
## data:  AnalyzeBivariateData$Dinner.Plate.Price by AnalyzeBivariateData$Prefers.Poultry
## t = -1.3334, df = 89.33, p-value = 0.1858
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -3.7128025  0.7307283
## sample estimates:
## mean in group 0 mean in group 1 
##        18.22727        19.71831
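`t.test()` reports a Welch two-sample test by default, which does not assume equal variances in the two groups. The sketch below computes the Welch statistic and its degrees of freedom by hand on two made-up groups (`g0` and `g1`, not the lab data) and compares them with what `t.test()` returns.

```r
# Hypothetical toy groups (NOT the lab data)
g0 <- c(15, 18, 20, 17, 22, 19)   # e.g. group coded 0
g1 <- c(21, 16, 24, 20, 23)       # e.g. group coded 1

# Each group's contribution to the variance of the difference in means
v0 <- var(g0) / length(g0)
v1 <- var(g1) / length(g1)

# Welch t statistic: difference in means over its standard error
t_manual <- (mean(g0) - mean(g1)) / sqrt(v0 + v1)

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- (v0 + v1)^2 /
  (v0^2 / (length(g0) - 1) + v1^2 / (length(g1) - 1))

fit <- t.test(g0, g1)  # Welch test is the default (var.equal = FALSE)

c(t_manual = t_manual, t_builtin = unname(fit$statistic))
c(df_manual = df_manual, df_builtin = unname(fit$parameter))
```

Note that the statistic is group 0 minus group 1, so a negative t means the group coded 1 has the larger mean.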

12. Describe the Relationship between Price of Dinner Plate and Prefers Poultry

Write a brief description of the relationship between the price of a dinner plate and someone’s preference for poultry based on your calculations from task 11.

Respondents who prefer poultry were willing to pay slightly more on average for a dinner plate (19.72) than those who do not (18.23), but the difference is not statistically significant (t = -1.3334, df = 89.33, p = 0.1858), and the 95 percent confidence interval for the difference (-3.71 to 0.73) includes zero. We therefore cannot conclude that preferring poultry affects the price someone is willing to pay.

13. Relationship between Age and Prefers Baked Beans

We want to know if those who prefer baked beans are older than those who do not prefer baked beans. Perform a difference-of-means test between the variables that identify a respondent’s age and their preference for baked beans.

AnalyzeBivariateData$Prefers.Beans <- ifelse(AnalyzeBivariateData$Favorite.Side == 4, 1, 0)
table(AnalyzeBivariateData$Prefers.Beans)
## 
##   0   1 
## 276 103
t.test(Age ~ Prefers.Beans, data = AnalyzeBivariateData)
## 
##  Welch Two Sample t-test
## 
## data:  Age by Prefers.Beans
## t = 3.0143, df = 293.26, p-value = 0.002801
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  1.205436 5.740603
## sample estimates:
## mean in group 0 mean in group 1 
##        27.30797        23.83495
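When reading this output, it helps to confirm which direction the difference runs. The short sketch below uses only the numbers printed above: the difference is the group 0 mean minus the group 1 mean, and the confidence interval is centered on that difference.

```r
# Values copied from the t.test() output above
mean_no_beans <- 27.30797   # group 0: does not prefer baked beans
mean_beans    <- 23.83495   # group 1: prefers baked beans
ci            <- c(1.205436, 5.740603)

# Difference in means, taken as group 0 minus group 1
diff_means <- mean_no_beans - mean_beans
diff_means

# The midpoint of the interval should equal that difference
mean(ci)

# Does the interval exclude zero?
ci[1] > 0 | ci[2] < 0
```

A positive difference with an interval that excludes zero means group 0 is older on average than group 1.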

14. Describe the Relationship between Age and Prefers Baked Beans

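Write a brief description of the relationship between the respondent’s age and the preference for baked beans based on your calculations from task 13.

There is a statistically significant difference in age between the two groups (t = 3.0143, df = 293.26, p = 0.002801). Respondents who do not prefer baked beans are older on average (27.31) than those who do (23.83), and the 95 percent confidence interval for the difference (1.21 to 5.74) excludes zero. Those who prefer baked beans are therefore younger, not older, than those who do not.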

15. Contingency Table of Prefers Poultry and Prefers Baked Beans

We are interested in whether someone who prefers poultry is more or less likely to also prefer baked beans than someone who does not prefer poultry. Construct a contingency table between the two dichotomous variables that you created indicating if someone prefers poultry and if someone prefers baked beans.

cont_table <- AnalyzeBivariateData %>%
  tabyl(Prefers.Poultry, Prefers.Beans) %>%   
  adorn_percentages("col") %>%                
  adorn_pct_formatting(digits = 1) %>%       
  adorn_ns()                                  

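# A tabyl prints without row names, so label the rows through the first column
cont_table$Prefers.Poultry <- c("Does Not Prefer Poultry", "Prefers Poultry")

# The table has three columns: the row labels plus the two baked beans groups
colnames(cont_table) <- c("Poultry Preference",
                          "Does Not Prefer Baked Beans",
                          "Prefers Baked Beans")

cont_table
##       Poultry Preference Does Not Prefer Baked Beans Prefers Baked Beans
##  Does Not Prefer Poultry                 81.5% (225)          80.6% (83)
##          Prefers Poultry                 18.5%  (51)          19.4% (20)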
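The percentages in this table are column percentages: each one is a share of a baked beans group. As a cross-check, the sketch below rebuilds the table in base R from the four counts printed above and also computes row percentages, which condition on the poultry groups instead. The object name `counts` is only for illustration.

```r
# Counts copied from the contingency table above
# (rows: Prefers.Poultry 0/1, columns: Prefers.Beans 0/1)
counts <- matrix(
  c(225, 51, 83, 20), nrow = 2,
  dimnames = list(Prefers.Poultry = c("0", "1"),
                  Prefers.Beans   = c("0", "1"))
)

# Column percentages: share of each baked beans group that prefers poultry
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
col_pct

# Row percentages: share of each poultry group that prefers baked beans
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
row_pct
```

Which set to report depends on which variable you treat as the independent variable; the question in this task is phrased in terms of the poultry groups.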

16. Perform a \(\chi^2\) test on the contingency table.

Test to see if there is a relationship between preferring poultry and preferring baked beans.

poultry_beans_table <- table(AnalyzeBivariateData$Prefers.Poultry, AnalyzeBivariateData$Prefers.Beans)

chisq.test(poultry_beans_table)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  poultry_beans_table
## X-squared = 0.0036617, df = 1, p-value = 0.9517
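For a 2 x 2 table, the Yates-corrected statistic can also be computed by hand from the four counts. The sketch below uses the counts from the contingency table in task 15 and compares the hand calculation with `stats::chisq.test()`; the function is called with the `stats::` prefix because loading janitor masks `chisq.test`.

```r
# Counts copied from the contingency table in task 15
# (rows: Prefers.Poultry 0/1, columns: Prefers.Beans 0/1)
counts <- matrix(c(225, 51, 83, 20), nrow = 2)

n        <- sum(counts)
row_tot  <- rowSums(counts)
col_tot  <- colSums(counts)

# Expected counts if the two preferences were independent
expected <- outer(row_tot, col_tot) / n

# Yates' correction subtracts 0.5 from each absolute deviation
x2_manual <- sum((abs(counts - expected) - 0.5)^2 / expected)

# Built-in test; correct = TRUE (the default) applies the same correction
res <- stats::chisq.test(counts, correct = TRUE)

c(manual = x2_manual, builtin = unname(res$statistic), p = res$p.value)
```

Both calculations should reproduce the statistic and p-value printed above.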

17. Describe the relationship between prefers poultry and prefers baked beans.

Write a brief description of the relationship between preferring poultry and preferring baked beans using your results from tasks 15 and 16.

There is no statistically significant relationship between preferring poultry and preferring baked beans (\(\chi^2\) = 0.0037, df = 1, p = 0.9517). The share of respondents who prefer poultry is nearly identical among those who prefer baked beans (19.4%) and those who do not (18.5%), so a person’s preference for poultry does not predict whether they also prefer baked beans.

Publish Document

Click the “Knit” button to publish your work as an .html document. The file will appear in the folder specified by your working directory. You will need to upload both this R Markdown file and the .html file it produces to AsULearn to get all of the points associated with this lab.