---
title: "BUA 345 - Lecture 7"
subtitle: "Introduction to R/RStudio in Posit Cloud and Review of Correlation"
author: "Penelope Pooler Eisenbies"
date: last-modified
lightbox: true
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
  html:
    code-line-numbers: true
    code-fold: true
    code-tools: true
execute:
  echo: fenced
---
## Housekeeping
```{r setup, echo=FALSE, warning=F, message=F, include=F}
#| include: false
# set default options for all R chunks (hide code unless a chunk overrides this)
knitr::opts_chunk$set(echo = FALSE)
# suppress scientific notation
options(scipen = 100)
# install helper package that loads and installs other packages, if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
# install and load required packages
pacman::p_load(pacman, tidyverse, magrittr, olsrr, shadowtext, mapproj, knitr, kableExtra,
               countrycode, usdata, maps, RColorBrewer, gridExtra, ggthemes, gt,
               mosaicData, epiDisplay, vistributions, psych, tidyquant, dygraphs)
# verify packages
# p_loaded()
```
**HW 3 was due 2/2/2026**
**Sign up for a [FREE Posit Cloud Account](https://posit.cloud/plans/free){target="_blank"}**
### Today's plan
- Introduction to using R and RStudio
- Review of correlation, $R_{xy}$
#### Lecture 8 plan (Preview)
- Review of Simple Linear Regression
- Function vs. Model
- Examining Real Data
- Creating a Model
- Interpreting a Regression Model
##
### Lecture 7 In-class Exercise - Q1
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
Recall the Lecture 6 ‘Weather’ worksheet, which is the ‘Lecture 7 Review Worksheet’.
<br>
The **first and second inputs** for the `VLOOKUP` command in cell H4 are **where the reference value is located** and **where the data to be searched are**.
<br>
**Which choice below contains the correct first and second inputs?**
**HINT: You may use `=FORMULATEXT(H4)` to check your answer.**
::: nonincremental
- `=VLOOKUP(H2, A2:E120,…`
- `=VLOOKUP(H3, A1:E120,…`
- `=VLOOKUP(H4, A2:E120,…`
- `=VLOOKUP(H2, B1:E120,…`
- `=VLOOKUP(H3, B2:E120,…`
:::
## R and RStudio
- In this course we will use R and RStudio for the predictive analytics lectures.
- You will access R and RStudio through **Posit Cloud**.
- Sign up for a [Free Posit Cloud Account](https://posit.cloud/plans/free){target="_blank"}
- I will post R/RStudio files on Posit Cloud that you can access in provided links.
- I will also provide demo videos that show how to access files and complete exercises.
- NOTE: The free Posit Cloud account is limited to 25 hours per month.
  - I will demo how to download completed work so that you can use this allotment efficiently.
- We will also use Posit Cloud for quiz questions on predictive analytics skills.
- For those who want to download R and RStudio (not required):
  - There is an information page on my course website, [Installing R and RStudio](https://peneloopy.github.io/bua_345_sem/#installing-r-and-rstudio){target="_blank"}
##
### Opening a Posit Cloud Link
**Always click 'Save a Permanent Copy'** so you don't lose your work.
##
### Helpful Global Options - General
Click `Tools` \> `Global Options`. The next few slides are a helpful reference but are not required.
:::::: columns
::: {.column width="38%"}
- On the `Save workspace...` line, choose `Never`.
- Your work can still be saved by pressing `Ctrl + S` or `Cmd + S`.
:::
::: {.column width="4%"}
:::
::: {.column width="58%"}
:::
::::::
##
:::::: columns
::: {.column width="38%"}
### Helpful Global Options - Code
- On the `Editing` tab, select the `Use native pipe operator` option.
- On the `Display` tab, select all 3 options under `Syntax`.
:::
::: {.column width="4%"}
:::
::: {.column width="58%"}
:::
::::::
##
:::::: columns
::: {.column width="38%"}
### Helpful Global Options - Appearance
- The default white appearance can cause eye strain more quickly.
- You can choose a different `Editor Theme`.
- I prefer `Tomorrow Night Blue`.
:::
::: {.column width="4%"}
:::
::: {.column width="58%"}
:::
::::::
##
:::::: columns
::: {.column width="48%"}
### Helpful Global Options - R Markdown
- On the `Basic` tab, next to `Show in document outline`, select `Sections and All Chunks`.
- On the `Visual` tab:
  - Check the box next to `Show line numbers in code blocks`.
  - Next to `Editor content width (px)`, change the value to `1200`.
- When you're done selecting all options, click `OK` at the bottom.
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
:::
::::::
##
### A Brief Tour of the RStudio Screen and Panels
:::::: columns
::: {.column width="28%"}
When you open a provided project link, you see:
- the `Console` in the left panel
- the `Global Environment` in the upper right panel
- `Files` (and other options) in the lower right panel
:::
::: {.column width="4%"}
:::
::: {.column width="68%"}
- **OPTIONAL:** Clear the Console by pressing `Ctrl/Cmd` + `L`.
:::
::::::
##
### Appearance with Quarto (`.qmd`) File Open
Provided `.qmd` files appear in the upper left panel above the `Console`.
##
### Running the `Setup` Code Chunk
Whenever you begin working with a provided code file, click the `green triangle` in the `Setup` chunk to set up options and to install and load packages.
## Review of Linear Correlations
- In your prerequisite course for BUA 345, you covered linear relationships between two or more quantitative variables.
<br>
- We will review this material this week while introducing R and RStudio.
<br>
- Often, when we have two quantitative variables, we want to understand the extent to which they are associated.
- The first step is often to plot the data using a scatterplot.
- We can also use quantitative measures of association to understand these relationships.
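
The two steps above (plot first, then quantify) can be sketched as follows. This is a minimal illustration that assumes the built-in `cars` dataset as a stand-in for the course data files, and it uses base R's `plot()` rather than the `ggplot2` code used in the slides.

```r
# `cars` ships with base R: speed (mph) and stopping distance (ft) for 50 cars.
# It is used here only as a stand-in for the course data files.

# Step 1: look at the relationship with a scatterplot
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Stopping Distance vs. Speed")

# Step 2: summarize the linear association with a single number
r_cars <- cor(cars$speed, cars$dist)
round(r_cars, 3)
```

The scatterplot comes first because a single number cannot show whether the pattern is actually linear.
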
##
#### Grocery Sales per Sq. Ft. and Planned Store Openings
```{r}
grocery <- read_csv("data/grocery.csv", show_col_types = FALSE)
# scatterplot of planned store openings vs. sales per square foot
(grocery_plot <- grocery |>
  ggplot(aes(x = sales_sq_ft, y = openings, color = openings)) +
  geom_point(size = 4, show.legend = FALSE) +
  #geom_smooth(method = lm, color="red", se=F, linetype="dashed") +
  labs(x = "Sales per sq. foot", y = "Planned Store Openings",
       title = "Relationship between Grocery Sales and Expansion") +
  # apply the complete theme first so the text sizes set below are not overwritten
  theme_classic() +
  theme(legend.position = "none",
        plot.title = element_text(size = 20),
        axis.title = element_text(size = 18),
        axis.text = element_text(size = 15),
        legend.text = element_text(size = 12),
        legend.title = element_text(size = 15)))
#ggsave("img/grocery_scatterplot.png", width=6, height=4)
```
## Understanding Linear Relationships
::::: columns
::: {.column width="50%"}
```{r}
kable(grocery)
```
:::
::: {.column width="50%"}
:::
:::::
## Direction of the Relationship
::::: columns
::: {.column width="50%"}
<br>
As X (sales per square foot) increases, Y (planned store openings) also increases.
<br>
When Y increases with X in an approximately linear fashion, that is a
- POSITIVE LINEAR RELATIONSHIP
- **The trend has a positive slope.**
:::
::: {.column width="50%"}
```{r}
# same scatterplot, now with the least-squares trend line added
(grocery_plot <- grocery |>
  ggplot(aes(x = sales_sq_ft, y = openings, color = openings)) +
  geom_point(size = 4, show.legend = FALSE) +
  geom_smooth(method = lm, color = "red", se = FALSE, linetype = "dashed") +
  labs(x = "Sales per sq. foot", y = "Planned Store Openings",
       title = "Relationship between Grocery Sales and Expansion") +
  # apply the complete theme first so the text sizes set below are not overwritten
  theme_classic() +
  theme(legend.position = "none",
        plot.title = element_text(size = 20),
        axis.title = element_text(size = 18),
        axis.text = element_text(size = 15),
        legend.text = element_text(size = 12),
        legend.title = element_text(size = 15)))
#ggsave("img/grocery_scatterplot_w_line.png", width=6, height=4)
```
:::
:::::
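
To make the link between direction and slope concrete, here is a small sketch using made-up numbers (the vectors `x`, `y_up`, and `y_down` are hypothetical and are not from the course files). It fits a least-squares line with `lm()` and compares the sign of the slope to the sign of the correlation.

```r
# Hypothetical illustration data (not from the course files)
x      <- c(1, 2, 3, 4, 5, 6)
y_up   <- c(2, 4, 5, 7, 8, 11)   # tends to rise as x rises
y_down <- c(12, 9, 8, 6, 5, 2)   # tends to fall as x rises

# Slope of the least-squares trend line for each relationship
slope_up   <- coef(lm(y_up ~ x))[["x"]]
slope_down <- coef(lm(y_down ~ x))[["x"]]

# The correlation has the same sign as the slope of the trend line
c(slope = slope_up,   r = cor(x, y_up))
c(slope = slope_down, r = cor(x, y_down))
```

A positive slope goes with a positive correlation, and a negative slope goes with a negative correlation.
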
## Strength of the Linear Relationship
::::::::: columns
:::: {.column width="50%"}
In addition to determining if there is a positive or negative relationship,
- We also want to quantify how strong the relationship is.
<br>
::: fragment
To quantify the strength of a linear relationship, we calculate:
:::
- Pearson's correlation coefficient, $R_{xy}$.
- $R_{xy} = 0.85$
- How do we interpret this value?
- ...Spoiler: This is a strong positive correlation!
::::
:::::: {.column width="50%"}
<br>
::::: {.fragment .fade-in}
:::: {.fragment .grow}
::: {.fragment .shrink}
```{r echo=T}
cor(grocery$sales_sq_ft, grocery$openings)
```
:::
::::
:::::
::::::
:::::::::
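
For anyone curious about what `cor()` computes, the sketch below works through Pearson's formula by hand and checks it against the built-in function. The values of `x` and `y` are hypothetical and are not the grocery data. You are not expected to do this calculation yourself.

```r
# Hypothetical values for illustration only (not the grocery data)
x <- c(200, 350, 400, 520, 610, 700)
y <- c(1, 3, 2, 5, 6, 8)

# Deviations of each value from its mean
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson's r: how x and y move together, scaled by the spread of each
r_by_hand <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# cor() uses Pearson's method by default, so the two should agree
r_builtin <- cor(x, y)
all.equal(r_by_hand, r_builtin)
```

Because the numerator is scaled by the spread of both variables, the result has no units and always falls between -1 and 1.
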
##
### Interpreting $R_{xy}$, the correlation coefficient
$R_{xy}$ ranges from -1 to 1.
- The most extreme $R_{xy}$ values represent 'perfectly correlated data':
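
As a quick sketch of the two extremes, the made-up data below fall exactly on a straight line, so the correlation is 1 for the upward line and -1 for the downward line (up to rounding). The variable names are hypothetical.

```r
# Hypothetical data that fall exactly on a straight line
x     <- 1:10
y_pos <- 3 + 2 * x   # perfect positive linear relationship
y_neg <- 3 - 2 * x   # perfect negative linear relationship

r_pos <- cor(x, y_pos)
r_neg <- cor(x, y_neg)

# Both values sit at the ends of the possible range
c(r_pos = r_pos, r_neg = r_neg)
```
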
## Very Strongly Correlated Data
$R_{xy} = 1$ or $R_{xy} = -1$ is unrealistic. These correlations are both strong and realistic:
##
### Range of $R_{xy}$ Guidelines for Interpretation
## Example of Negative Correlation
```{r}
urban_rural <- read_csv("data/Urban_Rural.csv", show_col_types = FALSE) |>
  filter(Year >= 1830)
# scatterplot of percent rural population by year
(rural_plot <- urban_rural |>
  ggplot(aes(x = Year, y = Rural_Pct, color = Rural_Pct)) +
  geom_point(size = 4, show.legend = FALSE) +
  labs(x = "Year", y = "Percent of People in Rural Areas",
       title = "Transition Away from Rural Living in USA") +
  scale_x_continuous(breaks = seq(1830, 2010, 20)) +
  # apply the complete theme first so the text sizes set below are not overwritten
  theme_classic() +
  theme(legend.position = "none",
        plot.title = element_text(size = 20),
        axis.title = element_text(size = 18),
        axis.text = element_text(size = 15),
        legend.text = element_text(size = 12),
        legend.title = element_text(size = 15)))
# ggsave("img/rural_pct_usa.png")
```
##
### Lecture 7 In-class Exercises - Q2
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
::::: columns
::: {.column width="50%"}
**What is the correlation between `Year` and `Rural_Pct` in the `urban_rural` dataset?**
Hint: This correlation is almost perfect.
Round answer to three decimal places.
:::
::: {.column width="50%"}
:::
:::::
##
### Correlation between Height and Mass in Star Wars
What is the correlation between height and mass in the `starwars` data?
```{r, message=F}
my_starwars <- starwars |>
  filter(mass <= 1000) # drops rows with missing mass and the extreme outlier, Jabba
(sw_plot <- my_starwars |>
  ggplot(aes(x = height, y = mass)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = lm, color = "red", se = FALSE, linetype = "dashed") +
  labs(x = "Height (cm)", y = "Mass (kg)", title = "Relationship between Height and Mass - Star Wars",
       caption = "Extreme Outlier, Jabba the Hutt, excluded.") +
  # apply the complete theme first so the text sizes set below are not overwritten
  theme_classic() +
  theme(legend.position = "none",
        plot.title = element_text(size = 20),
        axis.title = element_text(size = 18),
        axis.text = element_text(size = 15),
        plot.caption = element_text(size = 12)))
#ggsave("img/sw_height_mass.png")
```
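
The chunk above removes rows with a missing `mass` before plotting. The sketch below shows why that matters for `cor()`, and one possible alternative, the `use = "complete.obs"` argument. The `height` and `mass` vectors here are hypothetical values, not the actual `starwars` data.

```r
# Hypothetical values with gaps (not the actual starwars data)
height <- c(170, 165, 100, NA, 190, 180)
mass   <- c(75, 70, 35, 85, NA, 80)

# By default, any missing value makes the result NA
r_all <- cor(height, mass)
r_all

# use = "complete.obs" keeps only rows where both values are present
r_complete <- cor(height, mass, use = "complete.obs")
r_complete
```

Filtering the data first, as in the slide, has the advantage that the plot and the correlation are based on exactly the same rows.
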
##
### Lecture 7 In-class Exercise - Q3-Q4
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
:::::: columns
::: {.column width="40%"}
**Question 3.** What is the correlation between height and mass in the Star Wars dataset, `my_starwars`?
<br>
**Question 4.** How strong is this correlation, based on the provided guidelines?

:::
::: {.column width="2%"}
:::
::: {.column width="58%"}
:::
::::::
## When NOT to use $R_{xy}$
$R_{xy}$ is only valid when examining linear relationships.
If the data have a curvilinear relationship, there are other tools that will be covered in other courses.
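
A small sketch shows how misleading $R_{xy}$ can be for curved data. In the made-up example below, `y` is completely determined by `x`, yet the correlation is essentially zero because the relationship is a U-shaped curve rather than a line. The variable names are hypothetical.

```r
# Hypothetical curvilinear data: y depends entirely on x, but not linearly
x <- -5:5
y <- x^2

# The correlation is essentially 0 even though the relationship is perfect
r_curve <- cor(x, y)
round(r_curve, 3)
```

This is why a scatterplot should always be examined before relying on the correlation coefficient.
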
##
### Key Points from Today
- An introduction to R and RStudio in Posit Cloud.
- A review of linear associations between variables.
- We will continue this discussion in Lecture 8 on Thursday.
- For now, you are expected to understand:
  - How to open provided files in Posit Cloud
  - How to interpret a scatterplot
  - How to calculate $R_{xy}$ in R using the `cor` command
  - How to interpret $R_{xy}$
  - When NOT to use $R_{xy}$ to examine data associations
::: fragment
**HW 3 was due 2/2/2026 and HW 4 is due Wed. 2/11/2026**
**To submit an Engagement Question or Comment about material from Lecture 7:** Submit it by midnight today (day of lecture).
:::