This file contains a set of tasks that you need to complete in R for the lab assignment. The tasks may require you to add a code chunk, type code into a chunk, and/or execute code. Don’t forget that you need to acknowledge if you used any resources beyond class materials or got help to complete the assignment.
Instructions associated with this assignment can be found in the file “AnalyzingBivariateRelationshipsTutorial.html”.
The data set you will use is different than the one used in the tutorial. Pay attention to the differences in the Excel files’ names, any variable names, or object names. You will need to adjust your code accordingly.
When asked to describe a relationship, your answer needs to directly engage with the statistical analysis you conducted. Your discussion should include the following:
Once you have completed the assignment, you will need to knit this R Markdown file to produce an .html file. You will then need to upload the .html file and this .Rmd file to AsULearn.
The first thing you need to do in this file is to add your name and date in the lines underneath this document’s title.
You need to identify and set your working directory in this section. If you are working in the cloud version of RStudio, enter a note here to tell us that you did not need to change the working directory because you are working in the cloud.
getwd()
## [1] "/Users/rlmcollins/Desktop"
setwd("/Users/rlmcollins/Desktop")
You need to install and load the packages and data set you’ll use for the lab assignment in this section. In this lab, we will use the following packages: dplyr, tidyverse, forcats, ggplot2, janitor, and openxlsx.
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ ggplot2 3.5.2 ✔ stringr 1.5.1
## ✔ lubridate 1.9.4 ✔ tibble 3.3.0
## ✔ purrr 1.1.0 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("openxlsx")
library("forcats")
library("ggplot2")
library("janitor")
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
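If a package has not been installed yet, `library()` stops with an error. A minimal sketch of one way to check first, assuming the package names listed in this section; the helper name `missing_packages` is made up for this illustration and is not part of the tutorial:

```r
# Packages named in this lab's instructions.
lab_packages <- c("dplyr", "tidyverse", "forcats", "ggplot2", "janitor", "openxlsx")

# Hypothetical helper: returns the packages that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(lab_packages)

# Install only in an interactive session, and only what is missing.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

The `interactive()` guard keeps the install step from running while the file is being knitted.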
AnalyzeBivariateData <- read.xlsx("BivariateRelationshipsData.xlsx")
names(AnalyzeBivariateData)
## [1] "Observation" "Sex" "Age"
## [4] "Hometown" "Favorite.Meat" "Favorite.Sauce"
## [7] "Sweetness" "Favorite.Side" "Restaurant.City"
## [10] "Restaurant.Name" "Minutes.Driving" "Sandwich.Price"
## [13] "Dinner.Plate.Price" "Ribs.Price"
You want to know if there is a relationship between the price someone is willing to pay for a plate of ribs and how far they are willing to drive. Calculate the Pearson’s correlation coefficient between the variables that identify how much respondents are willing to pay for a plate of ribs and how far they are willing to drive for good BBQ. Round the coefficient to the fourth decimal place.
cor(AnalyzeBivariateData$Ribs.Price,AnalyzeBivariateData$Minutes.Driving, use="pairwise.complete.obs", method="pearson")
## [1] 0.1795218
round(cor(AnalyzeBivariateData$Ribs.Price,AnalyzeBivariateData$Minutes.Driving, use="pairwise.complete.obs", method="pearson"),digits=4)
## [1] 0.1795
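Pearson’s r is the covariance of the two variables divided by the product of their standard deviations, \(r = \operatorname{cov}(x, y) / (s_x s_y)\). A small sketch on made-up numbers (the vectors `price` and `minutes` are hypothetical and are not the lab data) that checks the by-hand value against `cor()`:

```r
# Made-up illustration data; not taken from BivariateRelationshipsData.xlsx.
price   <- c(12, 15, 18, 20, 25, 30)
minutes <- c(10, 30, 20, 45, 40, 60)

# Pearson's r = cov(x, y) / (sd(x) * sd(y))
r_by_hand <- cov(price, minutes) / (sd(price) * sd(minutes))
r_builtin <- cor(price, minutes, method = "pearson")

all.equal(r_by_hand, r_builtin)  # the two values should agree
round(r_by_hand, digits = 4)     # rounding happens only at the reporting step
```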
Calculate the test statistic and \(p\)-value for the correlation between Rib Plate and Driving Distance. Do not try to round the coefficient when calculating the significance.
cor.test(AnalyzeBivariateData$Ribs.Price,AnalyzeBivariateData$Minutes.Driving,use="pairwise.complete.obs", method="pearson")
##
## Pearson's product-moment correlation
##
## data: AnalyzeBivariateData$Ribs.Price and AnalyzeBivariateData$Minutes.Driving
## t = 3.5432, df = 377, p-value = 0.0004448
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08023825 0.27527828
## sample estimates:
## cor
## 0.1795218
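The test statistic reported by `cor.test()` can be recovered from the coefficient and the degrees of freedom with \(t = r\sqrt{n-2}/\sqrt{1-r^2}\). A minimal sketch using the coefficient and df printed in the output above (the variable names are illustrative):

```r
r  <- 0.1795218   # sample correlation from the output above
df <- 377         # degrees of freedom from the output above (n - 2)

t_stat  <- r * sqrt(df) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = df)   # two-sided p-value

round(t_stat, 4)   # about 3.5432, matching the cor.test() output
p_value            # should be close to the p-value reported above
```

Because the unrounded coefficient goes into the formula, rounding it first would shift both the statistic and the \(p\)-value slightly.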
Write a brief description of the relationship between Rib Plate and Driving Distance based on your calculations from tasks 4 and 5.
There is a positive, statistically significant relationship between Ribs Price and Minutes Driving (r = 0.1795, t = 3.5432, df = 377, p = 0.0004). However, the correlation is weak: respondents who are willing to pay more for a plate of ribs tend to be willing to drive only slightly longer. The 95% confidence interval (0.080 to 0.275) excludes zero, which suggests that the true correlation in the population is small but positive.
You want to know if there is a relationship between how far someone is willing to drive for good BBQ and their age. Calculate the Pearson’s correlation coefficient between the variables that identify how far someone is willing to drive for good BBQ and their age. Round the coefficient to the third decimal place.
cor(AnalyzeBivariateData$Minutes.Driving,AnalyzeBivariateData$Age, use="pairwise.complete.obs", method="pearson")
## [1] 0.07465359
round(cor(AnalyzeBivariateData$Minutes.Driving,AnalyzeBivariateData$Age, use="pairwise.complete.obs", method="pearson"),digits=3)
## [1] 0.075
Calculate the test statistic and \(p\)-value for the correlation between Driving Distance and Age. Do not try to round the coefficient when calculating the significance.
cor.test(AnalyzeBivariateData$Minutes.Driving,AnalyzeBivariateData$Age,use="pairwise.complete.obs", method="pearson")
##
## Pearson's product-moment correlation
##
## data: AnalyzeBivariateData$Minutes.Driving and AnalyzeBivariateData$Age
## t = 1.4536, df = 377, p-value = 0.1469
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.02627864 0.17407908
## sample estimates:
## cor
## 0.07465359
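The 95 percent confidence interval that `cor.test()` reports for Pearson’s r is based on Fisher’s z transformation. A sketch that approximately reproduces the interval from the coefficient and sample size in the output above (n = df + 2 = 379; the variable names are illustrative):

```r
r <- 0.07465359   # sample correlation from the output above
n <- 377 + 2      # df + 2 complete pairs

z  <- atanh(r)            # Fisher's z transformation
se <- 1 / sqrt(n - 3)     # standard error of z
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(ci, 4)   # roughly -0.0263 and 0.1741; the interval contains zero
```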
Write a brief description of the relationship between Driving Distance and Age based on your calculations from tasks 7 and 8.
The correlation between Minutes Driving and Age is weak and not statistically significant (r = 0.075, t = 1.4536, df = 377, p = 0.1469). There is no evidence of a linear relationship between how far respondents are willing to drive and their age. The 95% confidence interval (-0.026 to 0.174) includes zero, so the data are consistent with no correlation in the population.
You need to create three dichotomous variables based on existing variables in the data set in this section. The first should be named “Prefers.Poultry” and should take on a value of “1” if a respondent identified poultry as their preferred type of BBQ meat and a value of “0” if they did not. The second should be named “Prefers.Beans” and should take on a value of “1” if a respondent identified baked beans as their preferred side dish and a value of “0” if they did not. The third should be named “Longer.Distances” and should take on a value of “1” if a respondent is willing to drive longer than average for good BBQ and a value of “0” if they are not.
AnalyzeBivariateData %>%
mutate(Prefers.Poultry=NA) %>%
mutate(Prefers.Poultry=replace(Prefers.Poultry, Favorite.Meat==5, 1)) %>%
mutate(Prefers.Poultry=replace(Prefers.Poultry, Favorite.Meat < 5, 0)) ->AnalyzeBivariateData
AnalyzeBivariateData %>%
mutate(Prefers.Beans=NA) %>%
mutate(Prefers.Beans=replace(Prefers.Beans, Favorite.Side==5, 1)) %>%
mutate(Prefers.Beans=replace(Prefers.Beans, Favorite.Side < 5, 0)) ->AnalyzeBivariateData
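# Longer.Distances is 1 when Minutes.Driving is above the sample mean and 0 otherwise.
AnalyzeBivariateData %>%
mutate(Longer.Distances=NA) %>%
mutate(Longer.Distances=replace(Longer.Distances, Minutes.Driving > mean(Minutes.Driving, na.rm=TRUE), 1)) %>%
mutate(Longer.Distances=replace(Longer.Distances, Minutes.Driving <= mean(Minutes.Driving, na.rm=TRUE), 0)) ->AnalyzeBivariateData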
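A compact alternative for building a 0/1 indicator is to convert a logical comparison directly to an integer. A minimal sketch on made-up values (the vector `minutes` is hypothetical and is not the lab data); missing values stay `NA`:

```r
# Made-up driving times in minutes, including one missing value.
minutes <- c(10, 20, 30, 60, NA, 45)

avg <- mean(minutes, na.rm = TRUE)   # 33

# TRUE/FALSE becomes 1/0; NA stays NA.
longer_distances <- as.integer(minutes > avg)
longer_distances
# 0 0 0 1 NA 1
```

Inside a pipeline, the same expression could go in a single `mutate()` call.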
We want to know if those who prefer poultry over other types of meat are willing to pay more for a dinner plate than those who do not prefer poultry. Perform a difference-of-means test between the price someone is willing to pay for a dinner plate and their preference for poultry.
t.test(Dinner.Plate.Price ~ Prefers.Poultry, data = AnalyzeBivariateData)
##
## Welch Two Sample t-test
##
## data: Dinner.Plate.Price by Prefers.Poultry
## t = 1.122, df = 47.113, p-value = 0.2676
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -1.026275 3.614791
## sample estimates:
## mean in group 0 mean in group 1
## 18.63636 17.34211
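By default `t.test()` runs Welch’s test, which does not assume equal variances in the two groups. A sketch on made-up data (the data frame `toy` and its columns are hypothetical) showing how the statistic is built from the group means, variances, and sizes:

```r
# Made-up data: a price and a 0/1 group indicator.
toy <- data.frame(
  price = c(18, 20, 17, 22, 19, 21, 15, 16, 18, 14),
  group = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1)
)

g0 <- toy$price[toy$group == 0]
g1 <- toy$price[toy$group == 1]

# Welch t = (mean0 - mean1) / sqrt(var0/n0 + var1/n1)
t_by_hand <- (mean(g0) - mean(g1)) /
  sqrt(var(g0) / length(g0) + var(g1) / length(g1))

t_builtin <- unname(t.test(price ~ group, data = toy)$statistic)

all.equal(t_by_hand, t_builtin)  # the two values should agree
```

The sign of t follows the order of the groups: group 0 minus group 1.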
Write a brief description of the relationship between the price of a dinner plate and someone’s preference for poultry based on your calculations from task 11.
The results indicate no statistically significant difference in Dinner Plate Price between those who prefer poultry and those who do not (t = 1.122, df = 47.113, p = 0.2676). The mean Dinner Plate Price was slightly higher for respondents who do not prefer poultry (18.64) than for those who do (17.34), but this difference is not statistically significant. The 95% confidence interval for the difference in means (-1.03 to 3.61) includes zero, so the observed difference may be due to chance rather than a real effect.
We want to know if those who prefer baked beans are older than those who do not prefer baked beans. Perform a difference-of-means test between the variables that identify a respondent’s age and their preference for baked beans.
t.test(Age ~ Prefers.Beans, data = AnalyzeBivariateData)
##
## Welch Two Sample t-test
##
## data: Age by Prefers.Beans
## t = 6.7353, df = 100.63, p-value = 1.03e-09
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## 4.993434 9.163112
## sample estimates:
## mean in group 0 mean in group 1
## 27.38596 20.30769
Write a brief description of the relationship between the respondent’s age and the preference for baked beans based on your calculations from task 13.
The results indicate a statistically significant difference in age between those who prefer baked beans and those who do not (t = 6.7353, df = 100.63, p = 1.03e-09). On average, respondents who do not prefer baked beans are older (mean age 27.39) than those who do (mean age 20.31). The 95% confidence interval for the difference in means (4.99 to 9.16) excludes zero, so the age difference is unlikely to be due to chance.
We are interested in whether someone who prefers poultry is more or less likely to also prefer baked beans than someone who does not prefer poultry. Construct a contingency table between the two dichotomous variables that you created indicating if someone prefers poultry and if someone prefers baked beans.
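# Recode Prefers.Poultry as 1 when Favorite.Meat is above its mean and 0 otherwise.
# This overwrites the Prefers.Poultry variable created earlier (Favorite.Meat == 5).
AnalyzeBivariateData %>%
mutate(Prefers.Poultry=NA) %>%
mutate(Prefers.Poultry=replace(Prefers.Poultry, Favorite.Meat > mean(Favorite.Meat), 1)) %>%
mutate(Prefers.Poultry=replace(Prefers.Poultry, Favorite.Meat <= mean(Favorite.Meat), 0)) -> AnalyzeBivariateData
# Recode Prefers.Beans the same way, overwriting the earlier version (Favorite.Side == 5).
AnalyzeBivariateData %>%
mutate(Prefers.Beans=NA) %>%
mutate(Prefers.Beans=replace(Prefers.Beans, Favorite.Side > mean(Favorite.Side), 1)) %>%
mutate(Prefers.Beans=replace(Prefers.Beans, Favorite.Side <= mean(Favorite.Side), 0)) -> AnalyzeBivariateData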
AnalyzeBivariateData %>%
tabyl(Prefers.Beans, Prefers.Poultry)%>%
adorn_title()
##                Prefers.Poultry
##  Prefers.Beans               0  1
##              0             163 65
##              1              85 66
AnalyzeBivariateData %>%
mutate(Prefers.Beans.Label=NA) %>%
mutate(Prefers.Beans.Label=replace(Prefers.Beans.Label,Prefers.Beans==1,"Preferred")) %>%
mutate(Prefers.Beans.Label=replace(Prefers.Beans.Label,Prefers.Beans==0,"Not Preferred")) %>%
mutate(Prefers.Poultry.Label=NA) %>%
mutate(Prefers.Poultry.Label=replace(Prefers.Poultry.Label,Prefers.Poultry==1,"Preferred")) %>%
mutate(Prefers.Poultry.Label=replace(Prefers.Poultry.Label,Prefers.Poultry==0,"Not Preferred")) ->AnalyzeBivariateData
AnalyzeBivariateData %>%
tabyl(Prefers.Beans.Label,Prefers.Poultry.Label) %>%
adorn_title(row_name = "Prefers Beans", col_name = "Prefers Poultry")
##                Prefers Poultry
##  Prefers Beans   Not Preferred Preferred
##  Not Preferred             163        65
##      Preferred              85        66
AnalyzeBivariateData %>%
tabyl(Prefers.Beans.Label, Prefers.Poultry.Label) %>%
adorn_percentages("col") %>%
adorn_title()
##                      Prefers.Poultry.Label
##  Prefers.Beans.Label         Not Preferred        Preferred
##        Not Preferred     0.657258064516129 0.49618320610687
##            Preferred     0.342741935483871 0.50381679389313
AnalyzeBivariateData %>%
tabyl(Prefers.Beans.Label, Prefers.Poultry.Label) %>%
adorn_percentages("col") %>%
adorn_pct_formatting(digits = 1) %>%
adorn_title()
##                      Prefers.Poultry.Label
##  Prefers.Beans.Label         Not Preferred Preferred
##        Not Preferred                 65.7%     49.6%
##            Preferred                 34.3%     50.4%
AnalyzeBivariateData %>%
tabyl(Prefers.Beans.Label, Prefers.Poultry.Label) %>%
adorn_percentages("col") %>%
adorn_pct_formatting(digits = 1) %>%
adorn_ns() %>%
adorn_title()
##                      Prefers.Poultry.Label
##  Prefers.Beans.Label         Not Preferred  Preferred
##        Not Preferred           65.7% (163) 49.6% (65)
##            Preferred            34.3% (85) 50.4% (66)
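The column percentages can be cross-checked with base R. A minimal sketch that rebuilds the table from the counts printed above (the object names `counts` and `col_pct` are illustrative) and uses `prop.table()` with `margin = 2` for column proportions:

```r
# Counts copied from the contingency table above.
counts <- matrix(
  c(163, 85, 65, 66), nrow = 2,
  dimnames = list(
    Prefers.Beans   = c("Not Preferred", "Preferred"),
    Prefers.Poultry = c("Not Preferred", "Preferred")
  )
)

# margin = 2 divides each cell by its column total.
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
col_pct
```

Column percentages are the natural choice here because the question compares beans preference across the two poultry groups.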
Test to see if there is a relationship between preferring poultry and preferring baked beans.
chisq.test(AnalyzeBivariateData$Prefers.Beans,
AnalyzeBivariateData$Prefers.Poultry)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: AnalyzeBivariateData$Prefers.Beans and AnalyzeBivariateData$Prefers.Poultry
## X-squared = 8.6192, df = 1, p-value = 0.003326
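The chi-squared statistic depends only on the four cell counts, so it can be reproduced from the contingency table. A sketch using a matrix of the counts reported above (the object names are illustrative); `stats::` is written out because janitor masks `chisq.test`:

```r
# Counts from the contingency table: rows are Prefers.Beans, columns are Prefers.Poultry.
counts <- matrix(c(163, 85, 65, 66), nrow = 2)

# Expected counts under independence: row total * column total / N.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Yates' continuity correction subtracts 0.5 from each |observed - expected|.
x2_by_hand <- sum((abs(counts - expected) - 0.5)^2 / expected)

res <- stats::chisq.test(counts)   # correct = TRUE by default for 2x2 tables

round(x2_by_hand, 4)               # about 8.6192
unname(res$statistic)
```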
Write a brief description of the relationship between preferring poultry and preferring baked beans using your results from tasks 15 and 16.
The results indicate a statistically significant relationship between preferring poultry and preferring baked beans (X-squared = 8.6192, df = 1, p = 0.003326), so the two preferences are not independent. Among respondents who prefer poultry, 50.4% also prefer baked beans, compared with 34.3% of those who do not prefer poultry.
Click the “Knit” button to publish your work as an .html document. This file will appear in the folder specified by your working directory. You will need to upload both this R Markdown file and the .html file it produces to AsULearn to get all of the points associated with this lab.