Housekeeping

HW 3 is due 2/3/2025

Sign up for a FREE Posit Cloud Account

Today’s plan 📋

  • Introduction to using R and RStudio

  • Review of correlation, \(R_{xy}\)

Lecture 8 plan (Preview) 📋

  • Review of Simple Linear Regression

    • Function vs. Model

    • Examining Real Data

    • Creating a Model

    • Interpreting a Regression Model

In-class Polling (Session ID: bua345s25)

💥 Lecture 7 In-class Exercise - Q1 💥

Recall the Lecture 6 ‘Weather’ worksheet, which is the ‘Lecture 7 Review Worksheet’.

The first and second inputs for the VLOOKUP command in cell H4 are:

  • Where the input reference value is located
  • Where the data to be searched are

Which choice below contains the correct first and second inputs?

  • HINT: You may use =FORMULATEXT(H4) to check your answer.

=VLOOKUP(H2, A2:E91,…

=VLOOKUP(H3, A1:E91,…

=VLOOKUP(H4, A2:E91,…

=VLOOKUP(H2, B1:E91,…

=VLOOKUP(H3, B2:E91,…

=VLOOKUP(,H4, B2:E9,…

R and RStudio

  • In this course we will use R and RStudio for the predictive analytics lectures.

  • You will access R and RStudio through Posit Cloud.

  • I will post R/RStudio files on Posit Cloud that you can access through provided links.

  • I will also provide demo videos that show how to access files and complete exercises.

  • NOTE: The free Posit Cloud account is limited to 25 hours per month.

    • I demo how to download completed work so that you can use this allotment efficiently.
  • We will also use Posit Cloud for quiz questions on predictive analytics skills.

  • For those who want to download R and RStudio (not required):

Always click ‘Save a Permanent Copy’ so you don’t lose your work.

Helpful Global Options - General

Click Tools > Global Options. The next few slides are helpful reference but are not required.

  • On the Save workspace... line choose Never.

  • You can still save your work by pressing

    • Ctrl + S or Cmd + S.

Helpful Global Options - Code

  • On the Editing tab, select the Use native pipe operator option.

  • On the Display tab, select all 3 options under Syntax.

Helpful Global Options - Appearance

  • The default white appearance can cause eye strain more quickly.

  • You can choose a different Editor Theme.

    • I prefer Tomorrow Night Blue.

Helpful Global Options - R Markdown

  • On the Basic tab, next to Show in document outline select Sections and All Chunks.

  • On the Visual tab:

    • check box next to Show line numbers in code blocks.

    • next to Editor content width (px), change the value to 900.

  • When you’re done selecting all options, click OK at the bottom.

A brief Tour of the Screen and Panels

When you open a provided project link you see

  • the Console in the left panel

  • the Global Environment in the upper right panel

  • Files (and other options) in the lower right panel

Appearance with Quarto (.qmd) File Open

Provided .qmd files appear in the upper left panel above the Console.

Running the Setup Code Chunk

Whenever you begin working with a provided code file, click the green triangle in the Setup chunk to set up options and to install and load packages.

Output from running the Setup chunk


Review of Linear Correlations

  • In your prerequisite course for BUA 345, you covered linear relationships between two or more quantitative variables.


  • We will review this material this week while introducing R and RStudio.


  • Often if we have two quantitative variables we want to understand the extent to which they are associated.

    • The first step is often to plot the data using a scatterplot.

    • We can also use quantitative measures of association to understand these relationships.

Grocery Sales per Sq. Ft. and Planned Store Openings

Understanding Linear Relationships

chain                    sales_sq_ft  openings
Roundy’s                         393         2
Weis Markets                     325         3
Natural Grocers                  419         5
Ingles                           325        10
Kroger                           496        15
Harris Teeter’s                  442        20
Fresh Market                     490        20
Sprouts Farmer’s Market          490        20
Publix                           552        30
Whole Foods                      937        38
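
As noted earlier, the first step is often to plot the data. The following is a minimal sketch that rebuilds the table above as a data frame and draws a scatterplot with base R. The data frame name `grocery` and its column names match those used elsewhere in these slides; the axis labels, title, and point style are illustrative choices, and the provided course files may build the plot differently.

```r
# Rebuild the grocery data from the table above (values copied from the slide).
grocery <- data.frame(
  chain = c("Roundy's", "Weis Markets", "Natural Grocers", "Ingles", "Kroger",
            "Harris Teeter's", "Fresh Market", "Sprouts Farmer's Market",
            "Publix", "Whole Foods"),
  sales_sq_ft = c(393, 325, 419, 325, 496, 442, 490, 490, 552, 937),
  openings = c(2, 3, 5, 10, 15, 20, 20, 20, 30, 38)
)

# Scatterplot: X = sales per square foot, Y = planned store openings.
plot(grocery$sales_sq_ft, grocery$openings,
     xlab = "Sales per Sq. Ft.",
     ylab = "Planned Store Openings",
     main = "Grocery Sales per Sq. Ft. and Planned Store Openings",
     pch = 19)
```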

Direction of the Relationship


As X (sales per square foot) increases, Y (planned store openings) also increases.


When Y increases with X in an approximately linear fashion, that is a

  • POSITIVE LINEAR RELATIONSHIP

    • The trend has a positive slope.

Strength of the Linear Relationship

In addition to determining whether the relationship is positive or negative,

  • we also want to quantify how strong the relationship is.


To quantify the strength of a linear relationship, we calculate:

  • Pearson’s correlation coefficient, \(R_{xy}\).

  • \(R_{xy} = 0.85\)

  • How do we interpret this value?

    • …Spoiler: This is a strong positive correlation!


cor(grocery$sales_sq_ft, grocery$openings)
[1] 0.8517842
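
To see what `cor()` is doing, here is a minimal sketch that computes Pearson's correlation by hand, as the sum of the products of the deviations from the means divided by the square root of the product of the summed squared deviations. The vector names `x`, `y`, `dx`, `dy`, `r_by_hand`, and `r_builtin` are illustrative and do not appear in the course files; the values are the grocery data from these slides.

```r
# Grocery data: sales per square foot (x) and planned store openings (y).
x <- c(393, 325, 419, 325, 496, 442, 490, 490, 552, 937)
y <- c(2, 3, 5, 10, 15, 20, 20, 20, 30, 38)

# Deviations of each observation from its variable's mean.
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson's correlation coefficient computed from its definition.
r_by_hand <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# The built-in function uses Pearson's method by default.
r_builtin <- cor(x, y)

# Show both values side by side, rounded to three decimal places.
round(c(by_hand = r_by_hand, builtin = r_builtin), 3)
```

Both values should agree with the 0.8517842 reported above, which rounds to 0.852.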

Interpreting \(R_{xy}\), the correlation coefficient

\(R_{xy}\) ranges from -1 to 1.

  • The most extreme \(R_{xy}\) values represent ‘perfectly correlated data’:
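
As a small illustration of perfectly correlated data, the sketch below uses made-up values (not course data) in which Y is an exact linear function of X. The names `x`, `y_up`, and `y_down` are hypothetical.

```r
# Made-up X values for illustration only.
x <- 1:10

# Y rises exactly along a line with positive slope.
y_up <- 2 * x + 1

# Y falls exactly along a line with negative slope.
y_down <- 20 - 3 * x

# An exact increasing line gives a correlation of 1.
cor(x, y_up)

# An exact decreasing line gives a correlation of -1.
cor(x, y_down)
```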

Very Strongly Correlated Data

\(R_{xy} = 1\) or \(R_{xy} = -1\) is unrealistic. These correlations are both strong and realistic:

Range of \(R_{xy}\) Guidelines for Interpretation

Example of Negative Correlation

💥 Lecture 7 In-class Exercises - Q2 💥

What is the correlation between Year and Rural_Pct in the urban_rural dataset?

Hint: This correlation is almost perfect.

Round answer to three decimal places.

Correlation between Height and Mass in Starwars

What is the correlation between height and mass in the starwars data?

💥 Lecture 7 In-class Exercise - Q3-Q4 💥

Question 3. What is the correlation between height and mass in the Star Wars dataset, my_starwars?


Question 4. How strong is this correlation based on the provided guidelines:

When NOT to use \(R_{xy}\)

\(R_{xy}\) is only valid when examining linear relationships.

If the data have a curvilinear relationship, there are other tools that will be covered in other courses.
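
A minimal sketch of why this matters, using made-up values (not course data): below, Y is completely determined by X, but the relationship is a symmetric curve rather than a line, so the correlation coefficient comes out at essentially zero. The names `x` and `y` are hypothetical.

```r
# Made-up data: Y depends on X exactly, but along a U-shaped curve.
x <- -5:5
y <- x^2

# The correlation is essentially zero even though Y is fully determined by X.
cor(x, y)

# A scatterplot reveals the curved pattern that the correlation misses.
plot(x, y, pch = 19, main = "Curvilinear relationship")
```

This is one reason to plot the data before relying on \(R_{xy}\).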