Overview

This file contains a set of tasks that you need to complete in R for the lab assignment. The tasks may require you to add a code chunk, type code into a chunk, and/or execute code. Don’t forget that you need to acknowledge if you used any resources beyond class materials or got help to complete the assignment.

Instructions associated with this assignment can be found in the file “AnalyzingBivariateRelationshipsTutorial.html”.

The data set you will use is different than the one used in the tutorial. Pay attention to the differences in the Excel files’ names, any variable names, or object names. You will need to adjust your code accordingly.

When asked to describe a relationship, your answer needs to directly engage with the statistical analysis you conducted. Your discussion should include the following:

Identify your dependent and independent variables.
Identify what type of analysis you did to examine this relationship (for example, correlation, difference-in-means).
The results from that analysis. What is the correlation coefficient? What are the means of the two groups? How are the observations distributed across the cells in the contingency table?
Direction of the relationship (positive, negative, or null). For correlation coefficients, you should also discuss the strength of the relationship.
Is the relationship statistically significant and how do you know this?

Once you have completed the assignment, you will need to knit this R Markdown file to produce an .html file. You will then need to upload the .html file and this .Rmd file to AsULearn.

1. Add your Name and the Date

The first thing you need to do in this file is to add your name and date in the lines underneath this document’s title.

2. Identify and Set Your Working Directory

You need to identify and set your working directory in this section. If you are working in the cloud version of RStudio, enter a note here to tell us that you did not need to change the working directory because you are working in the cloud.

getwd()

## [1] "/Users/rlmcollins/Desktop"

setwd("/Users/rlmcollins/Desktop")

3. Installing and Loading Packages and Data Set

You need to install and load the packages and data set you’ll use for the lab assignment in this section. In this lab, we will use the following packages: dplyr, tidyverse, forcats, ggplot2, janitor and openxlsx.
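Before loading the packages, it can help to check which of them still need to be installed. The following is a minimal sketch, not part of the tutorial: the object names `lab_packages` and `missing_packages` are hypothetical, and the code only reports what is missing rather than installing anything.

```r
# Packages named in the lab instructions
lab_packages <- c("dplyr", "tidyverse", "forcats", "ggplot2", "janitor", "openxlsx")

# requireNamespace() returns FALSE when a package is not installed
is_installed <- vapply(lab_packages, requireNamespace, logical(1), quietly = TRUE)
missing_packages <- lab_packages[!is_installed]

if (length(missing_packages) > 0) {
  # Report only; run install.packages(missing_packages) yourself if needed
  message("Not installed: ", paste(missing_packages, collapse = ", "))
} else {
  message("All lab packages are installed.")
}
```

Any package listed as missing can then be installed once with `install.packages()` before the `library()` calls are run.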

library("dplyr")

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library("tidyverse")

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.2     ✔ stringr   1.5.1
## ✔ lubridate 1.9.4     ✔ tibble    3.3.0
## ✔ purrr     1.1.0     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library("openxlsx")
library("forcats")
library("ggplot2")
library("janitor")

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

AnalyzeBivariateData <- read.xlsx("BivariateRelationshipsData.xlsx")

names(AnalyzeBivariateData)

##  [1] "Observation"        "Sex"                "Age"               
##  [4] "Hometown"           "Favorite.Meat"      "Favorite.Sauce"    
##  [7] "Sweetness"          "Favorite.Side"      "Restaurant.City"   
## [10] "Restaurant.Name"    "Minutes.Driving"    "Sandwich.Price"    
## [13] "Dinner.Plate.Price" "Ribs.Price"

4. Correlation between Rib Plate and Driving Distance

You want to know if there is a relationship between the price someone is willing to pay for a plate of ribs and how far they are willing to drive. Calculate the Pearson’s correlation coefficient between the variables that identify how much respondents are willing to pay for a plate of ribs and how far they are willing to drive for good BBQ. Round the coefficient to the fourth decimal place.

cor(AnalyzeBivariateData$Ribs.Price,AnalyzeBivariateData$Minutes.Driving, use="pairwise.complete.obs", method="pearson")

## [1] 0.1795218

round(cor(AnalyzeBivariateData$Ribs.Price,AnalyzeBivariateData$Minutes.Driving, use="pairwise.complete.obs", method="pearson"),digits=4)

## [1] 0.1795
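To see what `cor()` computes with `use="pairwise.complete.obs"`, the sketch below reproduces Pearson's coefficient by hand on a small made-up example. The vectors `x` and `y` are hypothetical and are not taken from the lab data; the point is only that rows with a missing value in either variable are dropped before the covariance is divided by the product of the standard deviations.

```r
# Hypothetical toy data with missing values (not the lab data)
x <- c(10, 12, NA, 15, 18, 20)
y <- c(30, 25, 40, NA, 45, 60)

# Keep only the pairs where both values are observed
ok <- complete.cases(x, y)

# Pearson's r = covariance / (sd of x * sd of y)
r_manual <- cov(x[ok], y[ok]) / (sd(x[ok]) * sd(y[ok]))

# Same calculation using the call pattern from the lab
r_builtin <- cor(x, y, use = "pairwise.complete.obs", method = "pearson")

print(round(c(manual = r_manual, builtin = r_builtin), digits = 4))
```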

5. Calculate the Significance of the Correlation between Rib Plate and Driving Distance

Calculate the test statistic and \(p\)-value for the correlation between Rib Plate and Driving Distance. Do not try to round the coefficient when calculating the significance.

cor.test(AnalyzeBivariateData$Ribs.Price,AnalyzeBivariateData$Minutes.Driving,use="pairwise.complete.obs", method="pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  AnalyzeBivariateData$Ribs.Price and AnalyzeBivariateData$Minutes.Driving
## t = 3.5432, df = 377, p-value = 0.0004448
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08023825 0.27527828
## sample estimates:
##       cor 
## 0.1795218
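The test statistic reported above can be recovered from the correlation coefficient and the degrees of freedom with the standard formula \(t = r\sqrt{df}/\sqrt{1-r^2}\). The sketch below plugs in the values printed in the output; the object names `r`, `df`, `t_stat`, and `p_value` are chosen here for illustration.

```r
# Values copied from the cor.test() output above
r  <- 0.1795218
df <- 377

# t statistic for testing whether the true correlation is zero
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

print(c(t = t_stat, p = p_value))
```

The results should agree with the `t` and `p-value` lines in the output, up to rounding of the coefficient.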

6. Describe the Relationship between Rib Plate and Driving Distance

Write a brief description of the relationship between Rib Plate and Driving Distance based on your calculations from tasks 4 and 5.

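A Pearson’s correlation shows a positive relationship between Ribs Price and Minutes Driving (r = 0.1795). However, the correlation is weak, meaning the relationship exists but is not very strong. The relationship is statistically significant (t = 3.5432, df = 377, p = 0.0004448): the p-value is below the conventional 0.05 level, and the 95% confidence interval (0.0802 to 0.2753) excludes zero, which suggests that the true correlation in the population is small but positive.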

7. Correlation of Driving Distance and Age

You want to know if there is a relationship between how far someone is willing to drive for good BBQ and their age. Calculate the Pearson’s correlation coefficient between the variables that identify how far someone is willing to drive for good BBQ and their age. Round the coefficient to the third decimal place.

cor(AnalyzeBivariateData$Minutes.Driving,AnalyzeBivariateData$Age, use="pairwise.complete.obs", method="pearson")

## [1] 0.07465359

round(cor(AnalyzeBivariateData$Minutes.Driving,AnalyzeBivariateData$Age, use="pairwise.complete.obs", method="pearson"),digits=3)

## [1] 0.075

8. Calculate the Significance of the Correlation between Driving Distance and Age

Calculate the test statistic and \(p\)-value for the correlation between Driving Distance and Age. Do not try to round the coefficient when calculating the significance.

cor.test(AnalyzeBivariateData$Minutes.Driving,AnalyzeBivariateData$Age,use="pairwise.complete.obs", method="pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  AnalyzeBivariateData$Minutes.Driving and AnalyzeBivariateData$Age
## t = 1.4536, df = 377, p-value = 0.1469
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.02627864  0.17407908
## sample estimates:
##        cor 
## 0.07465359
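The 95 percent confidence interval in this output comes from Fisher's z transformation of the coefficient. As a rough check, the sketch below rebuilds the interval from the printed values, assuming the number of complete pairs is `df + 2 = 379`; the object names are illustrative only.

```r
# Values copied from the cor.test() output above
r <- 0.07465359
n <- 377 + 2          # complete pairs = degrees of freedom + 2

# Fisher's z transformation and its standard error
z  <- atanh(r)
se <- 1 / sqrt(n - 3)

# Build the interval on the z scale, then transform back to the r scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(ci)
```

Because the lower bound is negative and the upper bound is positive, the interval includes zero, which matches the non-significant p-value.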

9. Describe Relationship between Driving Distance and Age

Write a brief description of the relationship between Driving Distance and Age based on your calculations from tasks 7 and 8.

The Pearson’s correlation between Minutes Driving and Age is weak (r = 0.075) and not statistically significant (t = 1.4536, df = 377, p = 0.1469). This indicates that there is no meaningful linear relationship between how far respondents are willing to drive for good BBQ and their age. The 95% confidence interval (-0.0263 to 0.1741) includes zero, further suggesting that age does not reliably predict driving distance in this sample.

10. Creating Dichotomous Variables

You need to create three dichotomous variables based on existing variables in the data set in this section. The first should be named “Prefers.Poultry” and should take on a value of “1” if a respondent identified poultry as their preferred type of BBQ meat and a value of “0” if they did not. The second should be named “Prefers.Beans” and should take on a value of “1” if a respondent identified baked beans as their preferred side dish and a value of “0” if they did not. The third should be named “Longer.Distances” and should take on a value of “1” if a respondent is willing to drive longer than average for good BBQ and a value of “0” if they are not.

AnalyzeBivariateData %>%                            
mutate(Prefers.Poultry=NA) %>%
mutate(Prefers.Poultry=replace(Prefers.Poultry, Favorite.Meat==5, 1)) %>%
mutate(Prefers.Poultry=replace(Prefers.Poultry, Favorite.Meat < 5, 0)) ->AnalyzeBivariateData

AnalyzeBivariateData %>%                            
mutate(Prefers.Beans=NA) %>%
mutate(Prefers.Beans=replace(Prefers.Beans, Favorite.Side==5, 1)) %>%
mutate(Prefers.Beans=replace(Prefers.Beans, Favorite.Side < 5, 0)) ->AnalyzeBivariateData

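# Code 1 if the respondent will drive longer than the sample average, 0 otherwise
AnalyzeBivariateData %>%
mutate(Longer.Distances=NA) %>%
mutate(Longer.Distances=replace(Longer.Distances, Minutes.Driving > mean(Minutes.Driving, na.rm=TRUE), 1)) %>%
mutate(Longer.Distances=replace(Longer.Distances, Minutes.Driving <= mean(Minutes.Driving, na.rm=TRUE), 0)) ->AnalyzeBivariateData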
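A "longer than average" dichotomous variable compares each value to the sample mean. The sketch below shows one possible way to write that comparison with `dplyr::if_else()` on a small made-up data frame; `toy` and its values are hypothetical and stand in for the lab data, and this is an alternative to the `replace()` pattern rather than the tutorial's method.

```r
library(dplyr)

# Hypothetical toy data (not the lab data); one value is missing
toy <- data.frame(Minutes.Driving = c(10, 20, 30, 60, NA))

toy <- toy %>%
  mutate(
    # 1 if above the sample mean, 0 otherwise; missing values stay NA
    Longer.Distances = if_else(
      Minutes.Driving > mean(Minutes.Driving, na.rm = TRUE), 1, 0
    )
  )

print(toy)
```

Here the mean of the observed values is 30, so only the respondent at 60 minutes is coded 1.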

11. Relationship between Price of Dinner Plate and Prefers Poultry

We want to know if those who prefer poultry over other types of meat are willing to pay more for a dinner plate than those who do not prefer poultry. Perform a difference-of-means test between the price someone is willing to pay for a dinner plate and their preference for poultry.

t.test(Dinner.Plate.Price ~ Prefers.Poultry, data = AnalyzeBivariateData)

## 
##  Welch Two Sample t-test
## 
## data:  Dinner.Plate.Price by Prefers.Poultry
## t = 1.122, df = 47.113, p-value = 0.2676
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.026275  3.614791
## sample estimates:
## mean in group 0 mean in group 1 
##        18.63636        17.34211
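The Welch test statistic is the difference between the two group means divided by the square root of the sum of each group's variance over its size. The sketch below checks that formula against `t.test()` on made-up numbers; the vectors `price` and `poultry` are hypothetical and are not the lab data. Note that the difference is taken as group 0 minus group 1, which is why a positive `t` means the group coded 0 has the higher mean.

```r
# Hypothetical toy data (not the lab data)
price   <- c(18, 20, 17, 21, 19, 16, 18, 15, 17)
poultry <- c(0, 0, 0, 0, 0, 1, 1, 1, 1)

# Group means, variances, and sizes
m <- tapply(price, poultry, mean)
v <- tapply(price, poultry, var)
n <- tapply(price, poultry, length)

# Welch t statistic: (mean of group 0 - mean of group 1) / standard error
t_manual <- as.numeric((m["0"] - m["1"]) / sqrt(v["0"] / n["0"] + v["1"] / n["1"]))

# Same test using the formula interface from the lab
res <- t.test(price ~ poultry)
t_builtin <- as.numeric(res$statistic)

print(c(manual = t_manual, builtin = t_builtin))
```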

12. Describe the Relationship between Price of Dinner Plate and Prefers Poultry

Write a brief description of the relationship between the price of a dinner plate and someone’s preference for poultry based on your calculations from task 11.

The difference-of-means test indicates that there is no statistically significant difference in Dinner Plate Price between those who prefer poultry and those who do not (t = 1.122, p = 0.2676). The average Dinner Plate Price was slightly higher for respondents who do not prefer poultry (18.64) than for those who do (17.34), but this difference is not statistically significant. The 95% confidence interval for the difference (-1.03 to 3.61) includes zero, further suggesting that the observed difference may be due to chance rather than a real effect.

13. Relationship between Age and Prefers Baked Beans

We want to know if those who prefer baked beans are older than those who do not prefer baked beans. Perform a difference-of-means test between the variables that identify a respondent’s age and their preference for baked beans.

t.test(Age ~ Prefers.Beans, data = AnalyzeBivariateData)

## 
##  Welch Two Sample t-test
## 
## data:  Age by Prefers.Beans
## t = 6.7353, df = 100.63, p-value = 1.03e-09
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  4.993434 9.163112
## sample estimates:
## mean in group 0 mean in group 1 
##        27.38596        20.30769

14. Describe the Relationship between Age and Prefers Baked Beans

Write a brief description of the relationship between the respondent’s age and the preference for baked beans based on your calculations from task 13.

The difference-of-means test indicates a statistically significant difference in age between those who prefer baked beans and those who do not (t = 6.7353, p = 1.03e-09). On average, respondents who do not prefer baked beans are older (27.39) than those who do (20.31). The 95% confidence interval for the difference in means (4.99 to 9.16) excludes zero, so this age difference is unlikely to be due to chance.

15. Contingency Table of Prefers Poultry and Prefers Baked Beans

We are interested in whether someone who prefers poultry is more or less likely to also prefer baked beans than someone who does not prefer poultry. Construct a contingency table between the two dichotomous variables that you created indicating if someone prefers poultry and if someone prefers baked beans.

AnalyzeBivariateData %>%                            
  mutate(Prefers.Poultry=NA) %>%                               
  mutate(Prefers.Poultry=replace(Prefers.Poultry, Favorite.Meat > mean(Favorite.Meat), 1)) %>% 
  mutate(Prefers.Poultry=replace(Prefers.Poultry, Favorite.Meat <= mean(Favorite.Meat), 0)) -> AnalyzeBivariateData

AnalyzeBivariateData %>%                            
  mutate(Prefers.Beans=NA) %>%                               
  mutate(Prefers.Beans=replace(Prefers.Beans, Favorite.Side > mean(Favorite.Side), 1)) %>% 
  mutate(Prefers.Beans=replace(Prefers.Beans, Favorite.Side <= mean(Favorite.Side), 0)) -> AnalyzeBivariateData

Have prefers baked beans as the x (on the side) and prefers poultry as the y (across the top).

AnalyzeBivariateData %>%      
  tabyl(Prefers.Beans, Prefers.Poultry)%>%  
  adorn_title()

##                Prefers.Poultry   
##  Prefers.Beans               0  1
##              0             163 65
##              1              85 66

Include the variable names as titles for the top and side of the table.

AnalyzeBivariateData %>%  
  mutate(Prefers.Beans.Label=NA)  %>% 
  mutate(Prefers.Beans.Label=replace(Prefers.Beans.Label,Prefers.Beans==1,"Preferred")) %>%
  mutate(Prefers.Beans.Label=replace(Prefers.Beans.Label,Prefers.Beans==0,"Not Preferred")) %>%
  mutate(Prefers.Poultry.Label=NA)  %>% 
  mutate(Prefers.Poultry.Label=replace(Prefers.Poultry.Label,Prefers.Poultry==1,"Preferred")) %>%
  mutate(Prefers.Poultry.Label=replace(Prefers.Poultry.Label,Prefers.Poultry==0,"Not Preferred")) ->AnalyzeBivariateData

Construct a variable and use the variable to label the columns and rows.

AnalyzeBivariateData %>%
  tabyl(Prefers.Beans.Label,Prefers.Poultry.Label) %>%  
  adorn_title(row_name = "Prefers Beans", col_name = "Prefers Poultry")

##                Prefers Poultry          
##  Prefers Beans   Not Preferred Preferred
##  Not Preferred             163        65
##      Preferred              85        66

Calculate the percentages for the columns.

AnalyzeBivariateData %>% 
  tabyl(Prefers.Beans.Label, Prefers.Poultry.Label) %>%  
  adorn_percentages("col") %>%
  adorn_title()

##                      Prefers.Poultry.Label                 
##  Prefers.Beans.Label         Not Preferred        Preferred
##        Not Preferred     0.657258064516129 0.49618320610687
##            Preferred     0.342741935483871 0.50381679389313

Report the percentages to the 1st decimal place.

AnalyzeBivariateData %>% 
  tabyl(Prefers.Beans.Label, Prefers.Poultry.Label) %>%  
  adorn_percentages("col") %>%
    adorn_pct_formatting(digits = 1) %>%
  adorn_title()

##                      Prefers.Poultry.Label          
##  Prefers.Beans.Label         Not Preferred Preferred
##        Not Preferred                 65.7%     49.6%
##            Preferred                 34.3%     50.4%

Include the number of observations in each cell.

AnalyzeBivariateData %>%      
  tabyl(Prefers.Beans.Label, Prefers.Poultry.Label) %>%  
  adorn_percentages("col") %>%
    adorn_pct_formatting(digits = 1) %>%
  adorn_ns() %>%
  adorn_title()

##                      Prefers.Poultry.Label           
##  Prefers.Beans.Label         Not Preferred  Preferred
##        Not Preferred           65.7% (163) 49.6% (65)
##            Preferred           34.3%  (85) 50.4% (66)
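Column percentages divide each cell by its column total, so each column sums to 100%. As a cross-check that does not depend on `janitor`, the sketch below rebuilds the percentages in base R from the four counts shown in the table; the object names `counts` and `col_pct` are chosen here for illustration.

```r
# Cell counts copied from the contingency table above
# (rows: Prefers Beans, columns: Prefers Poultry)
counts <- matrix(
  c(163, 85, 65, 66),
  nrow = 2,
  dimnames = list(
    Prefers.Beans   = c("Not Preferred", "Preferred"),
    Prefers.Poultry = c("Not Preferred", "Preferred")
  )
)

# margin = 2 divides each cell by its column total
col_pct <- round(100 * prop.table(counts, margin = 2), digits = 1)

print(col_pct)
```

The result should match the 65.7%, 34.3%, 49.6%, and 50.4% reported above.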

16. Perform a \(\chi^2\) test on the contingency table.

Test to see if there is a relationship between preferring poultry and preferring baked beans.

chisq.test(AnalyzeBivariateData$Prefers.Beans,
               AnalyzeBivariateData$Prefers.Poultry)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  AnalyzeBivariateData$Prefers.Beans and AnalyzeBivariateData$Prefers.Poultry
## X-squared = 8.6192, df = 1, p-value = 0.003326
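The same test can be run directly on the table of counts, which makes it easier to see the expected counts under independence. The sketch below uses the four counts from the contingency table in task 15 and calls `stats::chisq.test()` explicitly, since `janitor` masks `chisq.test`; the object names are illustrative only.

```r
# Cell counts from the contingency table in task 15
# (rows: Prefers Beans, columns: Prefers Poultry)
counts <- matrix(c(163, 85, 65, 66), nrow = 2)

# Yates' continuity correction is applied by default for 2 x 2 tables
chi_res <- stats::chisq.test(counts)

# Expected counts if the two preferences were independent:
# (row total * column total) / grand total
print(round(chi_res$expected, digits = 1))
print(chi_res$statistic)
print(chi_res$p.value)
```

The statistic and p-value should agree with the output above, because the test depends only on the cell counts.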

17. Describe the relationship between prefers poultry and prefers baked beans.

Write a brief description of the relationship between preferring poultry and preferring baked beans using your results from tasks 15 and 16.

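The chi-squared test indicates a statistically significant relationship between preferring poultry and preferring baked beans (X-squared = 8.6192, df = 1, p = 0.003326). This means that the two preferences are not independent. In the contingency table, 50.4% of respondents who prefer poultry also prefer baked beans, compared with 34.3% of those who do not prefer poultry, so the association is positive.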

Publish Document

Click the “Knit” button to publish your work as an .html document. This file will appear in the folder specified by your working directory. You will need to upload both this R Markdown file and the .html file it produces to AsULearn to get all of the points associated with this lab.

PS/CJ 3115 Fall 2025 Analyzing Bivariate Relationships Lab Assignment

(Raequel Collins)

(11/7/2025)