Overview

This file contains a set of tasks that you need to complete in R for the lab assignment. The tasks may require you to add a code chunk, type code into a chunk, and/or execute code. Don’t forget that you need to acknowledge any resources you used beyond class materials and any help you received to complete the assignment.

Instructions associated with this assignment can be found in the file “RegressionTutorial.html”.

The data set you will use is different from the one used in the instructions. Pay attention to differences in the Excel file names, variable names, and object names, and adjust your code accordingly.

When asked to describe a relationship, your answer needs to directly engage with the statistical analysis you conducted. The instructions file provides a detailed explanation of how to describe your results.

Once you have completed the assignment, you will need to knit this R Markdown file to produce an .html file. You will then need to upload the .html file and this .Rmd file to AsULearn.

1. Add your Name and the Date

The first thing you need to do in this file is to add your name and date in the lines underneath this document’s title.

2. Identify and Set Your Working Directory

You need to identify and set your working directory in this section. If you are working in the cloud version of RStudio, enter a note here to tell us that you did not need to change the working directory because you are working in the cloud.

setwd("/Users/corddoss/Desktop/Research methods class/Week 11")
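
A quick way to confirm that the working directory took effect is sketched below. It assumes the Excel file read later in this lab, RegressionBBQ.xlsx, is stored in that folder; the object names `current_dir` and `data_found` are illustrative only.

```r
# Print the directory R is currently using
current_dir <- getwd()
print(current_dir)

# TRUE if the lab's Excel file is visible from the working directory
data_found <- file.exists("RegressionBBQ.xlsx")
print(data_found)
```

If the second value is FALSE, the path passed to setwd() probably points at a different folder than the one holding the data.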

3. Installing and Loading Packages and Data Set

You need to install and load the packages and data set you’ll use for the lab assignment in this section. In this lab, we will use the following packages: dplyr, tidyverse, forcats, ggplot2, janitor, texreg and openxlsx. You have not used the package texreg in previous labs, so make sure you install and load the package.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(forcats)
library(ggplot2)
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(openxlsx)
library(texreg)
## Version:  1.39.4
## Date:     2024-07-23
## Author:   Philip Leifeld (University of Manchester)
## 
## Consider submitting praise using the praise or praise_interactive functions.
## Please cite the JSS article in your publications -- see citation("texreg").
## 
## Attaching package: 'texreg'
## The following object is masked from 'package:tidyr':
## 
##     extract
BBQData <- read.xlsx("RegressionBBQ.xlsx")
names(BBQData)
##  [1] "Observation"        "Sex"                "Age"               
##  [4] "Hometown"           "Favorite.Meat"      "Favorite.Sauce"    
##  [7] "Sweetness"          "Favorite.Side"      "Restaurant.City"   
## [10] "Restaurant.Name"    "Minutes.Driving"    "Sandwich.Price"    
## [13] "Dinner.Plate.Price" "Ribs.Price"
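
The code above loads the packages but does not show the install step. One possible sketch of that step follows; `missing_packages` is a hypothetical helper written for this example, not part of any package, and the install line is left commented out because it needs an internet connection.

```r
# Packages loaded in this section
lab_packages <- c("tidyverse", "forcats", "ggplot2", "janitor", "texreg", "openxlsx")

# Hypothetical helper: returns the requested packages that are not installed yet
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(lab_packages)
print(to_install)

# Uncomment to install whatever is missing (requires an internet connection)
# if (length(to_install) > 0) install.packages(to_install)
```

Running install.packages() only once, rather than in every knit, keeps the document from reinstalling packages each time it is rendered.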

4. Bivariate Regression Model of Rib Plate and Driving Distance

You want to know whether how far someone is willing to drive for BBQ influences the price they are willing to pay for a plate of ribs. Estimate a bivariate regression model, rounding your results to the 2nd decimal. Use your middle name as the name of the object where you store your regression results.

mae <- lm(Ribs.Price ~ Minutes.Driving, data = BBQData)
screenreg(mae, digits = 2)
## 
## ===========================
##                  Model 1   
## ---------------------------
## (Intercept)       21.05 ***
##                   (0.76)   
## Minutes.Driving    0.05 ***
##                   (0.01)   
## ---------------------------
## R^2                0.03    
## Adj. R^2           0.03    
## Num. obs.        379       
## ===========================
## *** p < 0.001; ** p < 0.01; * p < 0.05
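
To make the fitted line concrete, the sketch below plugs a few driving times into the rounded estimates from the table above. The helper `predict_ribs_price` and the chosen driving times are illustrative assumptions; with the fitted object available, predict() would give the unrounded equivalents.

```r
# Rounded estimates copied from the screenreg() table above
intercept <- 21.05
slope     <- 0.05

# Hypothetical helper: predicted rib plate price (in dollars) for a driving time in minutes
predict_ribs_price <- function(minutes) {
  intercept + slope * minutes
}

# Predicted prices at 0, 30, and 60 minutes of driving
predict_ribs_price(c(0, 30, 60))
# [1] 21.05 22.55 24.05
```

The value at 0 minutes is the intercept itself, and each additional 30 minutes adds 30 × 0.05 = $1.50 to the predicted price.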

5. Substantively Interpret the Relationship between Rib Plate and Driving Distance

Write a brief description of the relationship between Rib Plate and Driving.

The slope coefficient is 0.05, and its three stars indicate a p-value below 0.001, so the relationship is statistically significant. For every additional minute someone is willing to drive for BBQ, the price they are willing to pay for a rib plate increases by $0.05 on average, which is a positive relationship.

6. Substantively Interpret the y-intercept in the Regression Model of Rib Plate and Driving Distance.

The y-intercept is 21.05. This tells us that a respondent who is willing to drive 0 minutes for good BBQ is predicted to be willing to pay $21.05 for a plate of ribs. The three stars indicate that the p-value for the intercept is smaller than 0.001, so it is statistically significant.

7. Estimate a Bivariate Regression Model of Driving Distance and Age

You want to know whether someone’s age influences how far they are willing to drive for good BBQ. Estimate a bivariate regression model. Round your results so there is 1 digit after the decimal place. Use the name of your hometown as the name of the object where you store your regression results.

beaufort <- lm(Minutes.Driving ~ Age, data = BBQData)
screenreg(beaufort, digits = 1)
## 
## ======================
##              Model 1  
## ----------------------
## (Intercept)   33.6 ***
##               (4.2)   
## Age            0.2    
##               (0.1)   
## ----------------------
## R^2            0.0    
## Adj. R^2       0.0    
## Num. obs.    379      
## ======================
## *** p < 0.001; ** p < 0.01; * p < 0.05
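
The stars in a screenreg() table come from each coefficient's p-value. The sketch below shows where those p-values live in a fitted model. It uses R's built-in `mtcars` data purely so that it runs without the lab's Excel file, and `stars` is a hypothetical helper that mirrors the legend printed under the table.

```r
# Built-in example data, used only so the sketch runs on its own
example_fit <- lm(mpg ~ wt, data = mtcars)

# Coefficient table: estimate, standard error, t value, p-value
coef_table <- coef(summary(example_fit))
p_values   <- coef_table[, "Pr(>|t|)"]

# Hypothetical helper mirroring the legend: *** p < 0.001; ** p < 0.01; * p < 0.05
stars <- function(p) {
  ifelse(p < 0.001, "***", ifelse(p < 0.01, "**", ifelse(p < 0.05, "*", "")))
}

data.frame(
  estimate = round(coef_table[, "Estimate"], 2),
  p_value  = signif(p_values, 3),
  stars    = stars(p_values)
)
```

A coefficient with no stars, like Age in the table above, has a p-value of 0.05 or more.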

8. Substantively Interpret the Relationship between Driving Distance and Age

The Age coefficient is 0.2: each additional year of age is associated with a respondent being willing to drive 0.2 more minutes. However, the coefficient is not statistically significant, so we cannot rule out that this relationship is due to chance.

9. Substantively Interpret the y-intercept in the Regression Model of Driving Distance and Age.

The y-intercept is 33.6. This tells us that a respondent whose age is 0 is predicted to be willing to drive 33.6 minutes for good BBQ. Although the intercept is statistically significant (p < 0.001), it has no real-world application because an age of 0 is not a meaningful value for a survey respondent.

10. Creating Dichotomous Variables

You need to create three dichotomous variables based on existing variables in the data set in this section. The first should be named “Prefers.Eastern” and should take on a value of “1” if a respondent identified “Eastern Style (no tomato)” as their preferred type of BBQ sauce and a value of “0” if they did not. The second should be named “Prefers.HP” and should take on a value of “1” if a respondent identified hush puppies as their preferred side dish and a value of “0” if they did not. The third should be named “Pay.More” and should take on a value of “1” if a respondent is willing to pay above the average for a dinner plate and a value of “0” if they are not.

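# Prefers.Eastern: 1 if Favorite.Sauce is coded 1 (Eastern style), 0 for any higher code
BBQData %>%
  mutate(Prefers.Eastern = NA) %>%
  mutate(Prefers.Eastern = replace(Prefers.Eastern, Favorite.Sauce == 1, 1)) %>%
  mutate(Prefers.Eastern = replace(Prefers.Eastern, Favorite.Sauce > 1, 0)) -> BBQData
# Prefers.HP: 1 if Favorite.Side is coded 4 (hush puppies), 0 for any other code
BBQData %>%
  mutate(Prefers.HP = NA) %>%
  mutate(Prefers.HP = replace(Prefers.HP, Favorite.Side == 4, 1)) %>%
  mutate(Prefers.HP = replace(Prefers.HP, Favorite.Side != 4, 0)) -> BBQData
# Pay.More: 1 if Dinner.Plate.Price is above the sample mean, 0 otherwise
# (the mean is computed from the data instead of being typed in by hand)
BBQData %>%
  mutate(Pay.More = NA) %>%
  mutate(Pay.More = replace(Pay.More, Dinner.Plate.Price > mean(Dinner.Plate.Price, na.rm = TRUE), 1)) %>%
  mutate(Pay.More = replace(Pay.More, Dinner.Plate.Price <= mean(Dinner.Plate.Price, na.rm = TRUE), 0)) -> BBQData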
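
The same 1/0 coding can be checked on a small example. The sketch below uses a hypothetical four-row data frame named `toy`, not the lab data, and assumes, as the code above does, that a Favorite.Sauce code of 1 means Eastern style. It relies on the fact that a comparison returns TRUE/FALSE, which as.numeric() converts to 1/0.

```r
# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  Favorite.Sauce     = c(1, 2, 3, 1),
  Dinner.Plate.Price = c(12, 20, 25, 15)
)

# 1 if the sauce code is 1 (assumed to be Eastern style), 0 otherwise
toy$Prefers.Eastern <- as.numeric(toy$Favorite.Sauce == 1)

# 1 if the price is above the toy data's mean price (18), 0 otherwise
toy$Pay.More <- as.numeric(toy$Dinner.Plate.Price > mean(toy$Dinner.Plate.Price))

toy

# Quick check that only 0s and 1s were created
table(toy$Prefers.Eastern)
```

Tabulating each new variable, as in the last line, is a simple way to confirm that no unexpected values or missing cases were produced.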

11. Estimate a Multivariate Regression Model with Three Variables.

Estimate a multivariate regression with the variables driving distance, age, and a preference for hush puppies. You want to know if how far respondents are willing to drive is a function of their age and their love of hush puppies. Round your results so there are 3 digits after the decimal place. Use the name of your favorite BBQ side as the name of the object where you store your regression results.

beans <- lm(Minutes.Driving ~ Age + Prefers.HP, data = BBQData)
screenreg(beans, digits = 3)
## 
## ========================
##              Model 1    
## ------------------------
## (Intercept)   33.138 ***
##               (4.500)   
## Age            0.215    
##               (0.146)   
## Prefers.HP     1.112    
##               (4.043)   
## ------------------------
## R^2            0.006    
## Adj. R^2       0.000    
## Num. obs.    379        
## ========================
## *** p < 0.001; ** p < 0.01; * p < 0.05
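
Neither slope in the table above has stars. A rough check of why, sketched from the rounded estimates and standard errors in the table: a coefficient is significant at the 0.05 level when the absolute value of its t statistic (estimate divided by standard error) exceeds the critical value. The object names are illustrative, and the degrees of freedom assume 379 observations and 3 estimated coefficients.

```r
# Rounded estimates and standard errors copied from the table above
estimates  <- c(Age = 0.215, Prefers.HP = 1.112)
std_errors <- c(Age = 0.146, Prefers.HP = 4.043)

# t statistic = estimate / standard error
t_stats <- estimates / std_errors

# Residual degrees of freedom: 379 observations minus 3 estimated coefficients
df_resid <- 379 - 3

# Two-sided critical value at the 0.05 level
critical <- qt(0.975, df = df_resid)

round(t_stats, 2)
abs(t_stats) > critical
```

Both comparisons come back FALSE, which matches the absence of stars next to Age and Prefers.HP.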

12. Substantively Interpret the Regression Model in the Previous Question.

You do not need to interpret the y-intercept.

The Age coefficient tells us that, holding hush puppy preference constant, each additional year of age is associated with respondents being willing to drive 0.215 more minutes. However, this coefficient is not statistically significant, so we cannot rule out that the relationship is due to chance. Holding age constant, respondents who prefer hush puppies are predicted to drive 1.112 more minutes than those who do not. This relationship is also not statistically significant. We find no evidence that age or hush puppy preference influences how far respondents are willing to drive.

13. Estimate a Multivariate Regression

Estimate a multivariate regression model with the variables for driving distance, age, preference for eastern style sauce, and being willing to pay above average price for a dinner plate. You want to know if the time someone is willing to spend driving is a function of their age, eastern sauce being their preferred sauce style, and if they are willing to pay a higher price for a BBQ plate. Come up with your own unique name for the object to store your regression results. Round to 4 digits.

daep <- lm(Minutes.Driving ~ Age + Prefers.Eastern + Pay.More, data = BBQData)
screenreg(daep, digits = 4)
## 
## =============================
##                  Model 1     
## -----------------------------
## (Intercept)       27.5658 ***
##                   (4.8680)   
## Age                0.2451    
##                   (0.1438)   
## Prefers.Eastern    1.1662    
##                   (3.5744)   
## Pay.More          11.0720 ** 
##                   (3.5917)   
## -----------------------------
## R^2                0.0303    
## Adj. R^2           0.0226    
## Num. obs.        379         
## =============================
## *** p < 0.001; ** p < 0.01; * p < 0.05
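
One way to read a dummy-variable coefficient is to compare two predictions that differ only in that dummy. The sketch below does this with the rounded estimates from the table above; the helper `predict_minutes` and the example respondent (age 30, does not prefer Eastern sauce) are assumptions made for illustration.

```r
# Rounded estimates copied from the table above
coefs <- c(Intercept = 27.5658, Age = 0.2451,
           Prefers.Eastern = 1.1662, Pay.More = 11.0720)

# Hypothetical helper: predicted minutes of driving for one respondent profile
predict_minutes <- function(age, prefers_eastern, pay_more) {
  unname(coefs["Intercept"] +
           coefs["Age"] * age +
           coefs["Prefers.Eastern"] * prefers_eastern +
           coefs["Pay.More"] * pay_more)
}

# Same respondent, first with Pay.More = 0 and then with Pay.More = 1
low  <- predict_minutes(age = 30, prefers_eastern = 0, pay_more = 0)
high <- predict_minutes(age = 30, prefers_eastern = 0, pay_more = 1)

c(low = low, high = high, difference = high - low)
```

The difference equals the Pay.More coefficient (about 11.07 minutes) no matter which age or sauce preference is held fixed, which is what holding the other variables constant means.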

14. Substantively Interpret the Regression Model in the Previous Question.

You do not need to interpret the y-intercept.

For age, the coefficient indicates that, holding the other variables constant, each additional year of age is associated with respondents being willing to drive 0.2451 more minutes, but the relationship is not statistically significant. Preferring Eastern style sauce is associated with 1.1662 more minutes of driving, but this is also not statistically significant. Willingness to pay an above-average price for a dinner plate, however, is statistically significant (p < 0.01): these respondents are predicted to drive about 11.07 more minutes than those who are not. In short, age and sauce preference show no statistically significant relationship with driving time, while willingness to pay more has a positive one.

Publish Document

Click the “Knit” button to publish your work as an html document. This document or file will appear in the folder specified by your working directory. You will need to upload both this RMarkdown file and the html file it produces to AsULearn to get all of the points associated with this lab.