Influence

Harold Nelson

2026-06-30

Intro

This is a small set of exercises to develop your intuition about influence: how much a single observation can change a fitted regression line.

Setup

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.1     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.3     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

The Basic Data

Here’s some basic data to get started.

x = 1:20
y = 1:20
set.seed(123)
y = y + rnorm(20,sd = 2)
df = data.frame(x,y)

The Base Model

Create the model base using lm(). Store its coefficients in the vector base_coeff.

Solution

base = lm(y~x,data = df)
base_coeff = base$coefficients
base_coeff
## (Intercept)           x 
##   0.6201850   0.9679107

A Scatterplot

Create a scatterplot of x and y with an lm smoother. Don’t put the aes() in ggplot(). Instead, put it in both geom_point() and geom_smooth().

Solution

df %>% 
  ggplot() + 
  geom_point(aes(x,y)) +
  geom_smooth(aes(x,y), method = "lm")
## `geom_smooth()` using formula = 'y ~ x'

Modify y

Create a new variable y_ne (Northeast) in df. It should equal y everywhere except at the last point (x == 20), where it is the original value plus 30. Use ifelse(x == 20,…,…).

Examine the tail of df to verify your work.

Solution

df = df %>% 
  mutate(y_ne = ifelse(x == 20,y + 30,y))

tail(df)   
##     x        y     y_ne
## 15 15 13.88832 13.88832
## 16 16 19.57383 19.57383
## 17 17 17.99570 17.99570
## 18 18 14.06677 14.06677
## 19 19 20.40271 20.40271
## 20 20 19.05442 49.05442

The Northeast Model

Create the model ne using x and y_ne. Save its coefficients in ne_coeff.

Solution

ne = lm(y_ne~x,data = df)
ne_coeff = ne$coefficients
ne_coeff
## (Intercept)           x 
##   -2.379815    1.396482

Impact?

Create the vector ne_diff by subtracting base_coeff from ne_coeff.

Solution

ne_diff = ne_coeff - base_coeff
ne_diff
## (Intercept)           x 
##  -3.0000000   0.4285714
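The size of ne_diff can be checked by hand. In simple linear regression the slope is a linear function of the y values, so adding an amount to a single y shifts the slope by that amount times (x0 - mean(x)) / sum((x - mean(x))^2). The intercept then shifts by the change in mean(y) minus the slope change times mean(x). The sketch below works this out for the point at x = 20. The names delta, x0, sxx and expected_diff are introduced here for illustration only and do not appear in the exercise.

```r
# Closed-form check of ne_diff for simple linear regression.
# delta, x0, sxx and expected_diff are illustrative names, not part of the exercise.
x = 1:20
delta = 30                      # amount added to y at the modified point
x0 = 20                         # x-value of the modified point
n = length(x)
sxx = sum((x - mean(x))^2)      # sum of squared deviations of x from its mean

# The slope is linear in y, so only the modified point contributes to the change.
slope_change = delta * (x0 - mean(x)) / sxx

# The intercept is mean(y) - slope * mean(x); mean(y) rises by delta / n.
intercept_change = delta / n - slope_change * mean(x)

expected_diff = c("(Intercept)" = intercept_change, x = slope_change)
expected_diff
```

The result should agree with ne_diff: about -3 for the intercept and about 0.4286 for the slope. The random noise in y does not affect these values, because the two models differ only at the one modified point.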

Conclusion

Complete the following sentence.

A positive outlier on the right …

Solution

A positive outlier on the right pulls the right end of the regression line upward, increasing the slope and decreasing the y-intercept. Here the slope rose by about 0.43 and the intercept fell by 3.
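As a complement to comparing coefficients by hand, base R offers standard diagnostics that flag influential points. The sketch below is not part of the original exercise. It rebuilds the data so that it runs on its own, then uses hatvalues() for leverage and cooks.distance() for Cook's distance on the Northeast model. The names lev and cook are illustrative.

```r
# Rebuild the exercise data so this block is self-contained.
x = 1:20
y = 1:20
set.seed(123)
y = y + rnorm(20, sd = 2)
df = data.frame(x, y)
df$y_ne = ifelse(df$x == 20, df$y + 30, df$y)

ne = lm(y_ne ~ x, data = df)

# Leverage depends only on x: points far from mean(x) have the most leverage.
lev = hatvalues(ne)

# Cook's distance combines leverage with the size of the residual.
cook = cooks.distance(ne)

# Index of the observation with the largest Cook's distance.
which.max(cook)
```

The modified point at x = 20 has the largest Cook's distance. Note that x = 1 has the same leverage as x = 20, since both are equally far from mean(x), but its residual is small, so it has little influence on the fit.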

Graphically

Modify the existing scatterplot as follows.

Add color = "black" to the existing geoms, outside the aes().

Add a geom_point() for the x,y_ne points. Use an appropriate aes(). Add color = "red", again outside the aes().

Add a geom_smooth() for the x,y_ne points. Make this red also.

Solution

df %>% 
  ggplot() + 
  geom_point(aes(x,y),color = "black") +
  geom_smooth(aes(x,y),method = "lm",color = "black") +
  geom_point(aes(x,y_ne),color = "red") +
  geom_smooth(aes(x,y_ne),method = "lm",color = "red")
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'