---
title: "BUA 345 - Lecture 6"
subtitle: "Data Queries"
author: "Penelope Pooler Eisenbies"
date: last-modified
lightbox: true
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
html:
code-line-numbers: true
code-fold: true
code-tools: true
execute:
echo: fenced
---
## Housekeeping
```{r setup, echo=FALSE, warning=F, message=F, include=F}
#| include: false
# this line specifies options for default options for all R Chunks
knitr::opts_chunk$set(echo=F)
# suppress scientific notation
options(scipen=100)
# install helper package that loads and installs other packages, if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
# install and load required packages
pacman::p_load(pacman,tidyverse, magrittr, olsrr, shadowtext, mapproj, knitr, kableExtra,
countrycode, usdata, maps, RColorBrewer, gridExtra, ggthemes, gt,
mosaicData, epiDisplay, vistributions, psych, tidyquant, dygraphs)
# verify packages
# p_loaded()
```
```{r weather, include=F, eval=F}
# check number of rows to skip
# leave one row above first row of data
# slice to number of days in the month
# for January slice to number of days with data
# select min max rain and snow columns
# fix data types in january and check all
#use bid rows to stack all four data sets
# october
oct25 <- read_csv("data/weather/oct_2025.csv",
show_col_types=F, skip=36) |>
slice(1:31) |>
dplyr::select(1:3,8:9) |> glimpse()
names(oct25) <- c("date", "max", "min", "precip", "snow")
# november
nov25 <- read_csv("data/weather/nov_2025.csv",
show_col_types=F, skip=36) |>
slice(1:30) |>
dplyr::select(1:3,8:9)
names(nov25) <- c("date", "max", "min", "precip", "snow")
# december
dec25 <- read_csv("data/weather/dec_2025.csv",
show_col_types=F, skip=36) |>
slice(1:31) |>
dplyr::select(1:3,8:9)
names(dec25) <- c("date", "max", "min", "precip", "snow")
weather <- bind_rows(oct25, nov25, dec25, jan26)
# january
# some data may be missing depending on date of lecture
jan26 <- read_csv("data/weather/jan_2026.csv",
show_col_types=F, skip=37) |>
slice(1:27) |>
dplyr::select(1:3,8:9)
names(jan26) <- c("date", "max", "min", "precip", "snow")
jan26 <- jan26 |>
mutate(max=as.numeric(max), min=as.numeric(min))
# stack all 4 datasets
# create date variable and convert precip and snow to numeric
# export to excel folder
weather <- bind_rows(oct25,nov25,dec25,jan26) |>
mutate(date = mdy(date),
precip = gsub("T", 0, precip, fixed=T) |> as.numeric() |> round(1),
snow = gsub("T", 0, snow, fixed=T) |> as.numeric() |> round(1)) |>
write_csv("Excel Files/Weather_25_26.csv")
```
**HW 3 is due 2/2/2026**
**Sign up for a [FREE Posit Cloud Account](https://posit.cloud/plans/free){target="_blank"} before Tuesday, 2/3/26**
### Today's plan
- Data Set Queries
- `VLOOKUP`
- `VLOOKUP` with Data Validation
- `VLOOKUP` with MATCH command
- `VLOOKUP` with Range option
##
### Lecture 6 In-class Exercise - Q1-Q2
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
The Pivot Table below shows the COUNTS of Titanic passengers in each class that did or did not survive the disaster.
{fig-align="center"}
**Question 1.** If we want to know the percent of passengers that survived (Yes) that were in second class, what option should we select for `Show Values As`?
Should we select `% of Column Total`, `% of Row Total` or `% of Grand Total`?
<br>
**Question 2.** If we want to know the percent of ALL passengers that were in third class and did not survive, what option should we select for `Show Values As`?
##
:::::::: columns
::: {.column width="48%"}
### Data Set Queries
- A useful skill is to be able to find specific observations quickly in a large database.
- Excel facilitates this process with these commands:
- VLOOKUP
- MATCH
- You are working for a Non-profit that offers celebrity autographs in exchange for donations, so you need to contact celebrities.
- This first worksheet only has 5 observations, but what if it had 200?
- You want to find a phone number quickly, just based on STAGE NAME.
:::
::: {.column width="4%"}
:::
::::: {.column width="48%"}
::: fragment
{fig-align="center" height="2.5in"}
:::
::: fragment
{fig-align="center" height="2.5in"}
:::
- We will create a `VLOOKUP` command to do this.
:::::
::::::::
## `VLOOKUP` Command
:::::::: columns
::: {.column width="48%"}
We will use `VLOOKUP` to automate a datebase search.
- We enter Bad Bunny in the cell next to ‘STAGE NAME’
- We create a `VLOOKUP` function to output correct phone number in cell next to ‘FAN PHONE NO.’
:::
::: {.column width="4%"}
:::
::::: {.column width="48%"}
::: fragment
{fig-align="center" height="2.5in"}
:::
::: fragment
{fig-align="center" height="2in"}
:::
:::::
::::::::
##
:::::: columns
::: {.column width="48%"}
### `VLOOKUP` inputs
- **`G2`:** Where reference value is entered (see Worksheet 1)
- **`B2:D6`:** Data range to be searched
- Note: VLOOKUP requires that Column with reference value is FIRST COLUMN (Index = 1)
- STAGE NAME (Column B) is first column
- **`3`:** Column index where query answer is.
- FAN PHONE NO is 3rd column counting from STAGE NAME
- **`FALSE`:** We enter FALSE to indicate we want an EXACT MATCH
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
{fig-align="center" height="2.5in"}
{fig-align="center" height="2in"}
:::
::::::
## `VLOOKUP` Example 2
:::::: columns
::: {.column width="48%"}
- Now you have a different database (Worksheet 2) where STAGE NAME is listed in COLUMN A.
- Using this new database, we want to find Rihanna’s phone number
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
{fig-align="center" height="2.5in"}
{fig-align="center" height="2in"}
:::
::::::
##
:::::: columns
::: {.column width="48%"}
### Updating the inputs
- **`G2`:** Where reference value is entered (see Worksheet 1)
- **`??:???`:** Data range to be searched
- Note: VLOOKUP requires that Column with reference value is FIRST COLUMN (Index = 1)
- STAGE NAME (**Column A**) is first column
- **`?`:** Column index where query answer is.
- FAN PHONE NO is 4th column counting from STAGE NAME
- **`FALSE`:** We enter FALSE to indicate we want an EXACT MATCH
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
{fig-align="center" height="2.5in"}
{fig-align="center" height="2in"}
:::
::::::
##
### Lecture 6 In-class Exercise - Q3
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
:::::: columns
::: {.column width="48%"}
We want to use
`=VLOOKUP(G2, ?, ?, FALSE)` to find Rihanna’s phone number in the database in Worksheet 2.
**What should the missing inputs be?**
**HINT:** There are two correct choices and either choice will receive full credit.
- `…, B2:D6, 3, …`
- `…, A1:D6, 4, …`
- `…, A2:D6, 3, …`
- `…, A2:D6, 4, …`
- `…, B1:D6, 3, …`
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
{fig-align="center" height="2.5in"}
{fig-align="center" height="2in"}
:::
::::::
## VLOOKUP
::::::: columns
::: {.column width="48%"}
### Error without Data Validation
Using the Worksheet 2 Database, we want to find Beyonce’s phone number phone number, but there’s a common problem.
- The accent in Beyonce’s name causes an error.
- How do we deal with this?
- We use ‘Data Validation’ to restrict options for `G2` to ONLY VALID entries.
:::
::: {.column width="4%"}
:::
:::: {.column width="48%"}
{fig-align="center" height="2.5in"}
- **Without Data Validation:**
::: fragment
{fig-align="center" height="1.5in"}
:::
::::
:::::::
##
::::::: columns
:::: {.column width="48%"}
### Adding Data Validation
1. Click cell `G2`
2. Select the `Data Validation` icon on the `Data` tab.
::: fragment
{fig-align="center"}
:::
3. Complete the pop-box as shown and click `OK`.
- Now cell G2 has a drop-down menu of correct values.
::::
::: {.column width="4%"}
:::
::: {.column width="48%"}
::: fragment
{fig-align="center"}
:::
::: fragment
{fig-align="center"}
:::
:::
:::::::
##
### Another Data Validation Example
:::::: columns
::: {.column width="48%"}
- Worksheet 3 contains Syracuse weather data from October 2025 to January 27, 2026.
- NOTE: Data were aggregated from the [National Weather Service](https://www.weather.gov/wrh/Climate?wfo=bgm){target="_blank"} and values of 'T' (trace amounts) were replaced with 0 for this example.
- We want to create a query that will output a value when the user specifies a date AND a weather variable.
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
{fig-align="center"}
{fig-align="center"}
- Our first `VLOOKUP` command will have to be manually changed if we want to change the weather variable, e.g from snowfall to maximum temperature.
- We can automate the table functionality with data validation and an embedded `MATCH` command.
:::
::::::
##
:::::: columns
::: {.column width="48%"}
### Data Validation
- Next steps are to add data validation to cells H2 and H3
- We want a pull-down menu in H2 so that only one of the dates can be selected.
- We want a pull-down menu in H3 so that only one of the weather variables can be chosen.
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
**Cell H2 Data Validation:**
{fig-align="center" height="3in"}
**Cell H3 Data Validation:**
{fig-align="center"}
:::
::::::
##
### Lecture 6 In-class Exercise - Q4
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
:::::: columns
::: {.column width="48%"}
The example `VLOOKUP` command on slide 12 is set up to show the snowfall for different dates.
If we want to know the minimum temperature for different dates, what would change in our `VLOOKUP` command?
**HINT:** The third input before `FALSE` is the only input that needs to change.
- 3rd input would change from 5 to 1.
- 3rd input would change from 5 to 2.
- 3rd input would change from 5 to 3.
- 3rd input would change from 5 to 4.
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
{fig-align="center"}
**How do we correct the incorrect VLOOKUP below?**
{fig-align="center"}
:::
::::::
##
### Match Command
- We can automate the column choice by adding a `MATCH` command
::: fragment
**How do we AUTOMATE the lookup column in the VLOOKUP below?** We want the command to update automatically when we choose a new weather variable?
{fig-align="center"}
:::
::: fragment
** We REPLACE the third input with a `MATCH` command:**
{fig-align="center"}
:::
- **H3**: Input value we want to match
- **A1:E1**: Array of cells to search for input value
- **0**: Type of match. 0 indicates **EXACT MATCH**
##
### VLOOKUP with Range Option
- We begin this example by creating a range lookup table.
:::::::: columns
:::: {.column width="48%"}
- Here are the ranges for Hurricane categories.
::: fragment
{fig-align="center"}
:::
- We need to convert these to ranges that Excel can understand.
- Range lookup will only work with lower limits in ASCENDING ORDER.
::::
::: {.column width="4%"}
:::
:::: {.column width="48%"}
- Here are the CORRECT inputs for a Range Lookup table for Hurricane categories.
::: fragment
{fig-align="center"}
:::
- Update the table in columns D and E on Worksheet 4.
::::
::::::::
##
### Putting all the Skills Together
{fig-align="center"}
1. In Cell `B1`, we will add data validation so that we have a pull-down list of Hurricane names.
2. In Cell `B2`, we will create a `VLOOKUP` command that outputs the maximum wind speed.
3. In cell `B3` we will create a `VLOOKUP` with a range option that will output the category of the hurricaine.
##
### Lecture 6 In-class Exercise - Q5
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
<br>
Based on the completed query table, what category was Hurricane Rina?
<br>
Select Category 1, 2, 3, 4, or 5.
##
### Key Points from Today
- **Queries using VLOOKUP command**
- Useful for creating automated searches
- VLOOKUP very useful for LARGE data sets
- **VLOOKUP Query with ‘Data Validation’**
- Prevents user error due to typos or complex text (accents, etc.)
- **VLOOKUP with MATCH command**
- Enables creation of more complex automated queries
- **VLOOKUP with RANGE lookup**
- Outputs answers by determining interval that includes reference value entry
::: fragment
**HW 3 is due 2/2/2026**
**To submit an Engagement Question or Comment about material from Lecture 6:** Submit it by midnight today (day of lecture).
:::