BUA 345 - Lecture 7

Introduction to R/RStudio in Posit Cloud and Review of Correlation

Author

Penelope Pooler Eisenbies

Published

February 9, 2026

Housekeeping

HW 3 was due 2/2/2026

Sign up for a FREE Posit Cloud Account

Today’s plan

Introduction to using R and RStudio
Review of correlation, $R_{xy}$

Lecture 8 plan (Preview)

Review of Simple Linear Regression
- Function vs. Model
- Examining Real Data
- Creating a Model
- Interpreting a Regression Model

Lecture 7 In-class Exercise - Q1

Poll Everywhere - My User Name: penelopepoolereisenbies685

Recall the Lecture 6 ‘Weather’ worksheet, which is the ‘Lecture 7 Review Worksheet’.

The first and second inputs for the VLOOKUP command in cell H4 are the cell where the reference value is located and the range where the data to be searched are located.

Which choice below contains the correct first and second inputs?

HINT: You may use =FORMULATEXT(H4) to check your answer.

=VLOOKUP(H2, A2:E120,…

=VLOOKUP(H3, A1:E120,…

=VLOOKUP(H4, A2:E120,…

=VLOOKUP(H2, B1:E120,…

=VLOOKUP(H3, B2:E120,…

R and RStudio

In this course we will use R and RStudio for the predictive analytics lectures.
You will access R and RStudio through Posit Cloud.
- Sign up for a Free Posit Cloud Account
I will post R/RStudio files on Posit Cloud that you can access through provided links.
I will also provide demo videos that show how to access files and complete exercises.
NOTE: The free Posit Cloud account is limited to 25 hours per month.
- I demo how to download completed work so that you can use this allotment efficiently.
We will also use Posit Cloud for quiz questions on predictive analytics skills.
For those who want to download R and RStudio (not required):
- There is an information page on my course website, Installing R and RStudio

Opening a Posit Cloud Link

Always click ‘Save a Permanent Copy’ so you don’t lose your work.

Helpful Global Options - General

Click Tools > Global Options. The next few slides are a helpful reference but are not required.

On the Save workspace... line choose Never.
Your work can still be saved by pressing
- Ctrl + S or Cmd + S.

Helpful Global Options - Code

On the Editing tab, select the Use native pipe operator option.
On the Display tab, select all 3 options under Syntax.

Helpful Global Options - Appearance

The default white appearance can cause eye strain more quickly.
You can choose a different Editor Theme.
- I prefer Tomorrow Night Blue.

Helpful Global Options - R Markdown

On the Basic tab, next to Show in document outline select Sections and All Chunks.
On the Visual tab:
- check the box next to Show line numbers in code blocks.
- next to Editor content width (px), change the value to 1200.
When you’re done selecting all options, click OK at the bottom.

A brief Tour of the RStudio Screen and Panels

When you open a provided project link you see

the Console in the left panel
the Global Environment in the upper right panel
Files (and other options) in the lower right panel

OPTIONAL: Clear Console by typing Ctrl/Cmd + L.

Appearance with Quarto (`.qmd`) File Open

Provided .qmd files appear in the upper left panel above the Console.

Running the `Setup` Code Chunk

Whenever you begin working with a provided code file, click the green triangle in the Setup chunk to set up options and to install and load packages.

Output from running the `Setup` chunk

Review of Linear Correlations

In your prerequisite course for BUA 345, you covered linear relationships between two or more quantitative variables.

We will review this material this week while introducing R and RStudio.

Often if we have two quantitative variables we want to understand the extent to which they are associated.
- The first step is often to plot the data using a scatterplot.
- We can also use quantitative measures of association to understand these relationships.

Grocery Sales per Sq. Ft. and Planned Store Openings

Understanding Linear Relationships

chain	sales_sq_ft	openings
Roundy’s	393	2
Weis Markets	325	3
Natural Grocers	419	5
Ingles	325	10
Kroger	496	15
Harris Teeter’s	442	20
Fresh Market	490	20
Sprouts Farmer’s Market	490	20
Publix	552	30
Whole Foods	937	38

Direction of the Relationship

As X (sales per square foot) increases, Y (planned store openings) also increases.

When Y increases with X in an approximately linear fashion, that is a

POSITIVE LINEAR RELATIONSHIP
- The trend has a positive slope.
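
A minimal sketch of how such a trend line could be drawn and checked in R. It assumes the ggplot2 package is installed and re-enters the ten rows of the grocery table by hand instead of reading the course data file. The object names `trend` and `slope` are illustrative and are not part of the provided course files.

```r
library(ggplot2)

# Grocery data re-entered by hand from the table above
grocery <- data.frame(
  sales_sq_ft = c(393, 325, 419, 325, 496, 442, 490, 490, 552, 937),
  openings    = c(2, 3, 5, 10, 15, 20, 20, 20, 30, 38)
)

# Scatterplot with a dashed straight-line (lm) trend
grocery_plot <- ggplot(grocery, aes(x = sales_sq_ft, y = openings)) +
  geom_point(size = 4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              color = "red", linetype = "dashed") +
  labs(x = "Sales per sq. foot", y = "Planned Store Openings",
       title = "Relationship between Grocery Sales and Expansion") +
  theme_classic()

# The slope of the fitted line shows the direction of the relationship
trend <- lm(openings ~ sales_sq_ft, data = grocery)
slope <- unname(coef(trend)["sales_sq_ft"])
slope > 0   # TRUE for a positive linear relationship
```

Printing `grocery_plot` displays the figure. A positive `slope` corresponds to the upward-tilting dashed line.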

Strength of the Linear Relationship

In addition to determining whether the relationship is positive or negative,

we also want to quantify how strong the relationship is.

To quantify the strength of a linear relationship, we calculate:

Pearson’s correlation coefficient, $R_{xy}$.
$R_{xy} = 0.85$
How do we interpret this value?
- …Spoiler: This is a strong positive correlation!

```{r echo=T}
cor(grocery$sales_sq_ft, grocery$openings)
```

[1] 0.8517842
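
To see what `cor()` is computing, the sketch below re-enters the grocery values from the table and compares the built-in result with Pearson's formula written out by hand. The names `r_builtin` and `r_manual` are illustrative only.

```r
# Grocery data re-entered by hand from the table above
grocery <- data.frame(
  sales_sq_ft = c(393, 325, 419, 325, 496, 442, 490, 490, 552, 937),
  openings    = c(2, 3, 5, 10, 15, 20, 20, 20, 30, 38)
)

x <- grocery$sales_sq_ft
y <- grocery$openings

# Built-in Pearson correlation (the default method of cor)
r_builtin <- cor(x, y)

# Same quantity from the definition:
# sum of cross-deviations divided by the square root of the
# product of the two sums of squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

round(r_builtin, 2)   # 0.85, matching the value on the slide
all.equal(r_builtin, r_manual)
```

Both calculations agree, so `cor()` can be used as a shortcut for the hand formula.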

Interpreting $R_{xy}$, the correlation coefficient

$R_{xy}$ ranges from -1 to 1.

The most extreme $R_{xy}$ values represent ‘perfectly correlated data’:
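
As a small illustration of perfectly correlated data, the sketch below uses made-up values (not course data) in which Y is an exact straight-line function of X. The variable names are hypothetical.

```r
# Made-up X values
x <- 1:10

# Y rises exactly along a line: perfect positive correlation
y_up <- 2 * x + 1

# Y falls exactly along a line: perfect negative correlation
y_down <- 50 - 3 * x

r_up   <- cor(x, y_up)     # 1
r_down <- cor(x, y_down)   # -1
```

Any exact straight-line relationship with a nonzero slope gives 1 or -1, regardless of how steep the line is.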

Very Strongly Correlated Data

With real data, $R_{xy} = 1$ or $R_{xy} = -1$ is unrealistic. These correlations are both strong and realistic:

Range of $R_{xy}$ Guidelines for Interpretation

Example of Negative Correlation

Lecture 7 In-class Exercises - Q2

Poll Everywhere - My User Name: penelopepoolereisenbies685

What is the correlation between Year and Rural_Pct in the urban_rural dataset?

Hint: This correlation is almost perfect.

Round answer to three decimal places.

Correlation between Height and Mass in Starwars

What is the correlation between height and mass in the starwars data?
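
One practical detail when computing a correlation like this: `cor()` returns `NA` if either variable contains a missing value. The sketch below uses a small made-up data frame (`toy`, not the real `starwars` data) with one missing mass and one extreme mass. It assumes that a cleaned dataset such as `my_starwars` is created with a similar filter on `mass`.

```r
# Made-up data: one missing mass (NA) and one extreme outlier (1358)
toy <- data.frame(
  height = c(150, 160, 170, 180, 190, 175),
  mass   = c(50, 58, NA, 80, 90, 1358)
)

# With a missing value present, cor() returns NA
r_raw <- cor(toy$height, toy$mass)

# Keeping rows with mass <= 1000 drops the outlier AND the NA row,
# because subset() treats a missing comparison as FALSE
toy_clean <- subset(toy, mass <= 1000)

r_clean <- cor(toy_clean$height, toy_clean$mass)
```

After filtering, `toy_clean` has four rows and `r_clean` is an ordinary number between -1 and 1.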

Lecture 7 In-class Exercise - Q3-Q4

Poll Everywhere - My User Name: penelopepoolereisenbies685

Question 3. What is the correlation between height and mass in the Star Wars dataset, my_starwars?

Question 4. How strong is this correlation based on the provided guidelines:

When NOT to use $R_{xy}$

$R_{xy}$ is only valid when examining linear relationships.

If the data have a curvilinear relationship, there are other tools that will be covered in other courses.
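
A short sketch of why $R_{xy}$ can mislead for curved data, using made-up values (not course data): Y is completely determined by X, yet the correlation is zero because the relationship is not linear.

```r
# Made-up X values, symmetric around zero
x <- -5:5

# Y follows a perfect U-shaped (quadratic) curve
y <- x^2

# The correlation is 0 even though Y depends entirely on X
r_curve <- cor(x, y)
```

A scatterplot of these points would show the pattern immediately, which is why plotting the data is the first step.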

Key Points from Today

An introduction to R and RStudio in Posit Cloud.
A review of linear associations between variables.
We will continue this discussion in Lecture 8 on Thursday.
For now, you are expected to understand
- How to open provided files in Posit Cloud
- How to interpret a scatterplot
- How to calculate $R_{xy}$ in R using the cor command
- How to interpret $R_{xy}$
- When NOT to use $R_{xy}$ to examine data associations

HW 3 was due 2/2/2026 and HW 4 is due Wed. 2/11/2026

To submit an Engagement Question or Comment about material from Lecture 7: Submit it by midnight today (day of lecture).

