This file contains a set of tasks that you need to complete in R for the lab assignment. The tasks may require you to add a code chunk, type code into a chunk, and/or execute code. In this lab you will also need to describe your results. Don’t forget that you need to acknowledge if you used any resources beyond class materials or got help to complete the assignment.
Instructions associated with this assignment can be found in the file “DescribingDataTutorial.html”. You can find the code book associated with the BBQ data on AsULearn.
The data set you will use is different from the one used in the instructions. Pay attention to the differences in the Excel file’s name, any variable names, or object names. You will need to adjust your code accordingly.
Once you have completed the assignment, you will need to knit this R Markdown file to produce an html file. You will then need to upload the .html file and this .Rmd file to AsULearn.
The first thing you need to do in this file is to add your name and date in the lines underneath this document’s title (see the code in lines 9 and 10).
You need to identify and set your working directory in this section. If you are working in the cloud version of RStudio, enter a note here to tell us that you did not need to change the working directory because you are working in the cloud.
getwd()
## [1] "/Users/summersimpson/Downloads/DescribingDataFall2025"
setwd("/Users/summersimpson/Downloads/DescribingDataFall2025")
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ ggplot2 3.5.2 ✔ stringr 1.5.1
## ✔ lubridate 1.9.4 ✔ tibble 3.3.0
## ✔ purrr 1.1.0 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("modeest")
library("openxlsx")
You need to install and load the packages and data set you’ll use for the lab assignment in this section. In this lab, we will use the three packages we have used in previous labs (dplyr, tidyverse, and openxlsx) and one new package (modeest). Remember, the first time you use a package you need to install the package.
install.packages("modeest")
##
## The downloaded binary packages are in
## /var/folders/76/w01_ncvd5pn4r8v5nxz870fh0000gn/T//Rtmpx6CXcH/downloaded_packages
install.packages("readxl")
##
## The downloaded binary packages are in
## /var/folders/76/w01_ncvd5pn4r8v5nxz870fh0000gn/T//Rtmpx6CXcH/downloaded_packages
library(readxl)
DescribingData <- read_excel("DescribingDataAssignmentData.xlsx")
Display the names of the variables in your data set.
names(DescribingData)
## [1] "Observation" "Sex" "Age"
## [4] "Hometown" "Favorite Meat" "Favorite Sauce"
## [7] "Sweetness" "Favorite Side" "Restaurant City"
## [10] "Restaurant Name" "Minutes Driving" "Sandwich Price"
## [13] "Dinner Plate Price" "Ribs Price"
names("DescribingDataAssignmentData")
## NULL
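The `NULL` above is expected: `names()` was given a quoted character string rather than the data frame object, and a plain string has no names attribute. A minimal sketch of the difference, using a made-up data frame (`toy` and its columns are hypothetical, not the lab data):

```r
# Hypothetical toy data frame; values are illustrative only
toy <- data.frame(Age = c(21, 34), Sex = c(1, 2))

# Passing the object returns its column names
object_names <- names(toy)

# Passing a quoted string returns NULL, because the string itself has no names
string_names <- names("toy")

print(object_names)
print(string_names)
```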
Display the last 5 observations in the data set.
tail(DescribingData, 5)
## # A tibble: 5 × 14
## Observation Sex Age Hometown `Favorite Meat` `Favorite Sauce` Sweetness
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 644 1 19 1 1 4 1
## 2 645 1 38 7 6 5 2
## 3 646 2 20 2 2 7 2
## 4 647 1 44 4 2 7 2
## 5 648 2 22 1 5 5 2
## # ℹ 7 more variables: `Favorite Side` <dbl>, `Restaurant City` <chr>,
## # `Restaurant Name` <chr>, `Minutes Driving` <dbl>, `Sandwich Price` <dbl>,
## # `Dinner Plate Price` <dbl>, `Ribs Price` <dbl>
Choose one variable other than Dinner.Plate.Price and display all the observations for that variable.
DescribingData$Age
## [1] 21 18 20 20 22 21 22 20 19 21 21 25 43 22 23 46 23 21 18 18 52 21 54 38 21
## [26] 27 56 27 18 32 58 19 54 30 NA NA NA NA NA NA NA NA NA NA 20 18 28 21 23 22
## [51] 21 20 19 18 17 28 18 20 21 21 21 47 22 21 20 38 21 21 21 20 43 21 25 28 35
## [76] 23 19 27 28 26 23 20 21 21 20 20 22 21 19 20 20 NA NA NA NA NA NA NA NA NA
## [101] NA NA NA NA NA NA NA 22 54 33 20 54 22 21 18 23 22 23 19 54 25 19 20 48 19
## [126] 25 19 NA NA NA NA NA NA NA NA NA NA 20 54 54 23 34 26 21 23 22 19 49 25 19
## [151] 21 20 79 74 57 25 55 50 25 23 23 52 22 22 55 29 22 24 22 22 21 21 21 33 21
## [176] 21 22 36 21 21 22 21 30 27 20 37 14 22 20 20 20 22 38 23 21 21 19 21 84 21
## [201] 19 21 21 20 48 19 23 20 20 17 20 24 47 18 25 21 20 21 21 20 27 74 22 20 20
## [226] 43 37 44 24 23 22 41 23 73 62 21 16 19 20 25 48 48 19 60 22 27 21 26 21 27
## [251] 71 19 21 21 21 70 21 21 20 21 20 69 21 21 20 20 68 25 20 24 29 18 23 34 67
## [276] 21 25 22 20 20 19 66 21 20 21 18 21 21 19 65 20 21 18 23 20 64 20 20 20 20
## [301] 20 20 21 63 21 44 47 70 39 54 53 44 20 62 21 21 21 20 21 21 99 19 61 55 21
## [326] 69 21 21 20 60 20 20 21 20 26 53 53 21 59 51 20 32 44 18 58 22 18 21 20 20
## [351] 21 20 20 41 60 20 57 20 25 21 21 22 21 56 21 20 20 19 50 47 19 19 55 20 19
## [376] 16 14 17 20 11 54 18 19 20 21 53 21 22 19 21 20 19 21 21 19 52 21 21 60 62
## [401] 21 21 21 23 51 21 22 20 23 31 50 22 21 22 21 22 21 21 49 56 65 55 15 22 21
## [426] 21 22 48 21 48 49 22 17 47 20 21 20 21 21 21 10 21 46 25 20 20 21 22 19 20
## [451] 45 20 19 19 20 52 52 21 44 70 18 20 65 71 36 42 14 20 43 20 63 17 20 22 23
## [476] 21 19 20 23 20 20 42 21 50 21 22 20 41 20 19 21 19 20 20 40 23 21 20 20 21
## [501] 21 20 39 20 21 22 23 21 19 38 23 19 20 20 30 22 20 19 33 21 52 37 21 57 86
## [526] 21 35 19 20 21 20 30 36 20 52 21 19 21 18 34 20 20 19 19 18 18 33 20 18 19
## [551] 18 19 19 22 17 20 18 19 24 NA 21 32 21 52 20 19 53 21 21 19 21 19 31 19 57
## [576] 57 20 20 19 21 20 20 50 19 30 52 21 19 48 19 20 21 29 19 56 19 56 31 33 28
## [601] 21 22 20 19 22 21 28 19 21 59 19 53 24 24 55 26 25 24 22 21 26 23 50 25 24
## [626] 28 19 22 25 25 19 40 19 23 25 39 52 18 22 24 26 66 46 19 38 20 44 22
mean(DescribingData$`Dinner Plate Price`)
## [1] NA
mean(DescribingData$`Dinner Plate Price`, na.rm = TRUE)
## [1] 19.74563
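The `NA` result above comes from how `mean()` treats missing values: a single `NA` makes the whole result `NA` unless `na.rm = TRUE` is set. A minimal sketch with a made-up vector (`prices` is hypothetical, not taken from the survey data):

```r
# Hypothetical toy vector; values are illustrative only, not from the survey
prices <- c(10, 20, NA, 30)

# Without na.rm, the missing value propagates to the result
mean_with_na <- mean(prices)

# With na.rm = TRUE, the mean uses only the non-missing values (10, 20, 30)
mean_without_na <- mean(prices, na.rm = TRUE)

# Count how many values were dropped as missing
n_missing <- sum(is.na(prices))

print(mean_with_na)
print(mean_without_na)
print(n_missing)
```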
You need to calculate the means for variables measuring 1) the price of a dinner plate, 2) preferred sweetness of sauce, 3) how long the respondent is willing to drive, and 4) the price of a rib plate. Calculate the mean of each variable in a separate chunk of code (that is, you’ll need four distinct chunks of code). After each chunk of code, write a one-sentence description of the mean. Don’t forget about missing data.
mean(DescribingData$`Dinner Plate Price`, na.rm = TRUE)
## [1] 19.74563
The mean price of a dinner plate is around $19.75.
mean(DescribingData$Sweetness, na.rm = TRUE)
## [1] 2.889922
The mean level of sweetness preferred is around 2.89.
mean(DescribingData$`Minutes Driving`, na.rm = TRUE)
## [1] 41.71498
The mean number of minutes someone would drive for BBQ is around 41.71 minutes.
mean(DescribingData$`Ribs Price`, na.rm = TRUE)
## [1] 23.54849
The mean price someone would pay for ribs is around $23.55.
# 8. Rounding
Recalculate the means, but round the calculated values. Again, use a separate chunk for each rounded mean. After each chunk of code, write a one-sentence description of the mean. Don’t forget about missing data. Importantly, you need to round the means of the different variables to different decimal places.
round(mean(DescribingData$`Dinner Plate Price`, na.rm = TRUE), digits = 2)
## [1] 19.75
The average price of a dinner plate, rounded to two decimal places and excluding the NA answers, is $19.75.
- Sweetness of sauce should be rounded to the 1st decimal place.
round(mean(DescribingData$Sweetness, na.rm = TRUE), digits = 1)
## [1] 2.9
The average level of sweetness preferred, not counting those who skipped this question, is about 2.9.
- How long the respondent is willing to drive should be rounded to the 3rd decimal place.
round(mean(DescribingData$`Minutes Driving`, na.rm = TRUE), digits = 3)
## [1] 41.715
The average time someone would drive for BBQ is 41.715 minutes, based only on those who answered.
- The price of a rib plate should be rounded to the 2nd decimal place.
round(mean(DescribingData$`Ribs Price`, na.rm = TRUE), digits = 2)
## [1] 23.55
The average price someone would pay for a rib plate, after excluding those who skipped this question, is $23.55.
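The same rounding step can be applied to several variables at once when each needs a different number of decimal places. The sketch below is one possible approach, not part of the assignment, and it assumes a made-up data frame (`toy`, `digits`, and the values are all hypothetical):

```r
# Hypothetical toy data frame; column names and values are illustrative only
toy <- data.frame(
  Sweetness = c(1, 2, NA, 4),
  Minutes = c(10, 25, 40, NA)
)

# Number of decimal places for each column's mean, named by column
digits <- c(Sweetness = 1, Minutes = 3)

# For each column, compute the mean without missing values, then round it
rounded_means <- mapply(
  function(column, d) round(mean(column, na.rm = TRUE), digits = d),
  toy[names(digits)],
  digits
)

print(rounded_means)
```

Note that `round()` only changes the stored value; R does not print trailing zeros, so a mean of exactly 25 rounded to three places still prints as 25.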
You need to calculate and describe the medians of the variables measuring 1) age of the respondent, 2) how long the respondent is willing to drive for good BBQ, and 3) the price of a sandwich. Use a separate chunk of code for each variable. After each chunk of code, write a one-sentence description of the median. Don’t forget about missing data.
median(DescribingData$Age, na.rm = TRUE)
## [1] 21
The median age of respondents is 21 years, which means at least half of the respondents are 21 or younger and at least half are 21 or older.
median(DescribingData$`Minutes Driving`, na.rm = TRUE)
## [1] 30
The median time respondents are willing to drive for good BBQ is 30 minutes, meaning half of the respondents would drive 30 minutes or less and half would drive 30 minutes or more.
median(DescribingData$`Sandwich Price`, na.rm = TRUE)
## [1] 15
The median price of a BBQ sandwich is $15, which means that half of the respondents reported a price of $15 or less and half reported $15 or more.
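For minutes driving, the mean reported earlier (41.71) is well above the median (30), which is what a few very large answers can do. A minimal sketch of that effect with made-up numbers (`drive` is hypothetical, not the survey data):

```r
# Hypothetical toy vector with one very large value; illustrative only
drive <- c(10, 20, 30, 40, 500)

# The median is the middle value and ignores how extreme the largest value is
drive_median <- median(drive)

# The mean is pulled upward by the single large value
drive_mean <- mean(drive)

print(drive_median)
print(drive_mean)
```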
You need to calculate and describe the modes of the variables for 1) favorite meat, 2) favorite sauce, and 3) favorite side. These are all categorical variables. Use a separate chunk of code for each variable. After each chunk of code, write a one-sentence description of the mode.
When describing these results, you need to convert the numerical modes of the different variables into words according to the survey code book, which is available on AsULearn.
mfv(DescribingData$`Favorite Meat`)
## [1] 1
The mode is 1, which corresponds to pulled pork in the code book, so pulled pork was the most commonly chosen favorite meat.
mfv(DescribingData$`Favorite Sauce`)
## [1] 1
The mode is 1, which corresponds to Eastern Style in the code book, so Eastern Style was the most commonly chosen favorite sauce.
mfv(DescribingData$`Favorite Side`)
## [1] 4
The mode is 4, which corresponds to hush puppies in the code book, so hush puppies were the most commonly chosen favorite side.
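The conversion from numeric codes to words can also be done in R by turning the codes into a labeled factor and tabulating it. This is only a sketch: it uses base R `table()` as a cross-check rather than `mfv()`, and the codes and labels below are placeholders, since the real mapping must come from the survey code book.

```r
# Hypothetical codes and labels; NOT the real code book mapping
meat_codes <- c(1, 2, 1, 3, 1, 2, NA)
meat_labels <- c("label A", "label B", "label C")

# Convert the numeric codes into a factor with readable labels
meat <- factor(meat_codes, levels = 1:3, labels = meat_labels)

# table() drops NA by default; the most frequent level is the mode
counts <- table(meat)
mode_label <- names(counts)[which.max(counts)]

print(counts)
print(mode_label)
```

`which.max()` returns only the first maximum, so if two categories tie, this sketch reports just one of them.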
You need to calculate and describe the ranges, maximums, and minimums of the variables that identify respondents’ 1) ages, 2) rib price, and 3) how many minutes they would drive for BBQ. Use a separate chunk of code for each variable. After each chunk of code write a one sentence description of the minimum, maximum, and range.
min(DescribingData$Age, na.rm = TRUE)
## [1] 10
This shows that the minimum age in the survey is 10 years old.
max(DescribingData$Age, na.rm = TRUE)
## [1] 99
This shows that the maximum age in the survey is 99 years old.
max(DescribingData$Age, na.rm = TRUE) - min(DescribingData$Age, na.rm = TRUE)
## [1] 89
This shows that the range of ages is 89 years.
min(DescribingData$`Ribs Price`, na.rm = TRUE)
## [1] 0
This shows that the minimum rib price is $0, so at least one respondent reported paying nothing for ribs.
max(DescribingData$`Ribs Price`, na.rm = TRUE)
## [1] 75
This shows that the maximum price someone is paying for ribs is $75.
max(DescribingData$`Ribs Price`, na.rm = TRUE) - min(DescribingData$`Ribs Price`, na.rm = TRUE)
## [1] 75
This shows that the range of prices someone will pay for ribs is $75.
min(DescribingData$`Minutes Driving`, na.rm = TRUE)
## [1] 0
This shows that the minimum is 0 minutes, so at least one respondent is not willing to drive at all to get BBQ.
max(DescribingData$`Minutes Driving`, na.rm = TRUE)
## [1] 500
This shows that the maximum number of minutes someone would drive for BBQ is 500 minutes.
max(DescribingData$`Minutes Driving`, na.rm = TRUE) - min(DescribingData$`Minutes Driving`, na.rm = TRUE)
## [1] 500
This shows that the range of minutes someone would drive for BBQ is 500 minutes.
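As an alternative to separate `min()` and `max()` calls, base R's `range()` returns both values at once, and `diff()` turns that pair into the range. A minimal sketch with a made-up vector (`minutes` is hypothetical, not the survey data):

```r
# Hypothetical toy vector; values are illustrative only
minutes <- c(0, 15, 30, NA, 120)

# range() returns c(minimum, maximum) in a single call
min_max <- range(minutes, na.rm = TRUE)

# diff() of that pair is maximum minus minimum
spread <- diff(min_max)

print(min_max)
print(spread)
```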
You need to calculate and describe the standard deviation of the variables that identify 1) the number of minutes a respondent would drive for BBQ and 2) the price they would pay for a sandwich in this section.
sd(DescribingData$`Minutes Driving`, na.rm = TRUE)
## [1] 50.95685
The standard deviation for the number of minutes respondents are willing to drive for BBQ is 50.96 minutes, meaning that responses vary widely around the mean. Some people are willing to drive much longer, while others would drive much less.
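To see what `sd()` is measuring, the sample standard deviation can be reproduced by hand: take each value's distance from the mean, square it, sum the squares, divide by n - 1, and take the square root. A minimal sketch with a made-up vector (`x` is hypothetical, not the survey data):

```r
# Hypothetical toy vector; values are illustrative only
x <- c(10, 20, NA, 30, 60)

# Drop missing values, mirroring na.rm = TRUE
x_complete <- x[!is.na(x)]
n <- length(x_complete)

# Sample standard deviation: sqrt of squared deviations summed over (n - 1)
sd_by_hand <- sqrt(sum((x_complete - mean(x_complete))^2) / (n - 1))

# Built-in version for comparison
sd_builtin <- sd(x, na.rm = TRUE)

print(sd_by_hand)
print(sd_builtin)
```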
sd(DescribingData$`Sandwich Price`, na.rm = TRUE)
## [1] 6.608642
The standard deviation for sandwich price is $6.61, showing how much respondents’ price expectations differ from the average price.
# 13. Did you receive help?
Jacob Stockton
no one
Click the “Knit” button to publish your work as an html document. This document or file will appear in the folder specified by your working directory. You will need to upload both this R Markdown file and the html file it produces to AsULearn to get all of the lab points for this week.