2025-11-04

What is Simple Linear Regression?

Simple Linear Regression…

  • Is a statistical technique for modeling the relationship between a dependent variable and an independent variable

  • Finds the relationship between an observed value and a single predictor

  • Minimizes the sum of squared vertical distances between the fitted line and the data (ordinary least squares)

With simple linear regression, it’s possible to predict new values when the two variables are highly correlated!
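As a minimal sketch of the points above, the least-squares slope and intercept can be computed by hand and compared with what R's lm() returns. The data and the names x_demo, y_demo, and fit_demo are made up for illustration and are not part of the examples that follow.

```r
# Hypothetical toy data, invented for illustration only
x_demo <- c(1, 2, 3, 4, 5)
y_demo <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Closed-form least-squares estimates:
#   slope     = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)
slope_hat <- cov(x_demo, y_demo) / var(x_demo)
intercept_hat <- mean(y_demo) - slope_hat * mean(x_demo)

# The same fit using lm()
fit_demo <- lm(y_demo ~ x_demo)

# Both approaches should agree up to floating-point error
print(c(intercept = intercept_hat, slope = slope_hat))
print(coef(fit_demo))
```

The first element of coef() is the intercept and the second is the slope, which is the order used when a fitted line is written as slope times x plus intercept.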

What follows are some examples of the use of Simple Linear Regression.

Falling Rubber Ducks

Here is a plot modeling the vertical velocity (m/s) of a rubber duck just before it hits the ground as a function of the height (m) it was dropped from, on Earth and with random error added. The velocity is negative because the duck is moving downward.

kinematic formula with error: \(\text{V}_y = -\sqrt{2gh} + \varepsilon\)
fitted: \(\hat{\text{V}}_y = -0.01124734x - 369.705\), where \(x\) is the drop height (m)
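To get a feel for the kinematic formula without the error term, here is a small sketch that evaluates it at a few drop heights. The three heights are picked for illustration, and g = 9.8 m/s^2 is the value used in the code later in this presentation.

```r
g <- 9.8                      # gravitational acceleration on Earth (m/s^2)
heights <- c(100, 300, 500)   # example drop heights (m), chosen for illustration

# Noise-free impact velocity from V_y = -sqrt(2 * g * h)
v_impact <- -sqrt(2 * g * heights)
print(round(v_impact, 2))

# Average change in velocity per meter between neighboring heights
local_slopes <- diff(v_impact) / diff(heights)
print(round(local_slopes, 4))
```

The two local slopes differ because the true curve is a square root rather than a straight line, so a linear fit over this range is only an approximation.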

Interest in “Rubber Duck” over 10 years

Here is a plot of Google Trends data showing search interest in “Rubber Duck” in the United States from October 2015 to November 2025, accessed from http://trends.google.com/trends/explore?date=2015-10-04%202025-11-04&geo=US&q=rubber%20duck&hl=en

The calculation for interest is described by Google as:

Numbers represent search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. A score of 0 means there was not enough data for this term.
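A rough sketch of what that description implies, under my own assumption that the score is simply each value divided by the peak and scaled to 100. Google does not publish the underlying counts and its actual calculation may involve further steps, so raw_counts below is entirely hypothetical.

```r
# Hypothetical monthly search counts (Google does not expose these)
raw_counts <- c(120, 300, 150, 600, 450)

# Assumed scaling: peak month becomes 100, others are relative to it
relative_interest <- round(100 * raw_counts / max(raw_counts))
print(relative_interest)
```

Under this assumption the month with the most searches always scores 100, and a month with half as many searches scores 50, which matches the wording of the description.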

Included in the plot is a regression model that fits the data.

Plot of Interest in “Rubber Duck” over time on Google Search

Average Price of Rubber Duck

Here I took the opportunity to search for data on the price of rubber ducks over time, and I found that ebay.com provides some limited data on the weekly sale prices of listings for a custom date range. I also chose a filter that includes every kind of rubber duck in all collections found on the site.

Data obtained from: https://www.ebay.com/sh/research?marketplace=EBAY-US&keywords=Rubber+Duck&dayRange=1095&endDate=1762271987093&startDate=1667660387093&categoryId=0&offset=0&limit=50&tabName=SOLD&tz=America%2FNew_York

(You need an account to access it.)

Avg Price Over 3 Years

Regression Line: \(\hat{y} = 0.03151762x + 18.22896960\), where \(x\) is the week index and \(\hat{y}\) is the predicted average price in dollars
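As a quick worked example, the two coefficients reported above can be plugged in directly to get a predicted average price for a given week. This sketch assumes x is the week index running from 1 to 158, as in the code later in this presentation; the helper name predict_price is my own.

```r
# Coefficients copied from the regression line reported above
slope <- 0.03151762
intercept <- 18.22896960

# Predicted average price (in dollars) for a given week index
predict_price <- function(week) {
  intercept + slope * week
}

# First and last week of the three-year window
print(predict_price(c(1, 158)))
```

The slope can be read as the fitted change in average price per week, so multiplying it by 52 gives a rough fitted change per year.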

Code

Next, I will provide the code for each of these plots, so that anyone who views this presentation can try it out themselves. The language is R, and you’ll need two packages: plotly and ggplot2.
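Before running anything, it may help to check that both packages are available. This is only a sketch of one way to do that; the variable names are my own, and it reports missing packages rather than installing them.

```r
needed_pkgs <- c("plotly", "ggplot2")

# requireNamespace() returns TRUE when a package is installed
is_installed <- vapply(needed_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  # Nothing is installed automatically; run install.packages() yourself
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  message("Install them with install.packages(), e.g. install.packages(\"plotly\")")
} else {
  library(plotly)
  library(ggplot2)
}
```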

Falling Ducks Code

library(plotly)

# Drop heights from 100 m to 500 m in 1.5 m steps
x_dropHit <- seq(100, 500, by = 1.5)

# Impact velocity from the kinematic formula, plus normally distributed error
kinform <- function(h, g = 9.8, error_sd = 2) {
  epsilon <- rnorm(n = length(h), mean = 0, sd = error_sd)
  mean_velocity <- -1 * sqrt(2 * g * h)
  new_func <- mean_velocity + epsilon
  new_func[h <= 0] <- 0  # nothing is dropped, so no velocity
  return(new_func)
}

# Simulate one noisy velocity per drop height
maxVy <- function(x) {
  Vy <- c()
  for (i in x) {
    result <- kinform(h = i)
    Vy <- append(Vy, result)
  }
  return(Vy)
}

Cy <- maxVy(x_dropHit)

cannonV <- data.frame(x = x_dropHit, y = Cy)

# Fit the simple linear regression of velocity on height
mod <- lm(y ~ x, data = cannonV)
x <- cannonV$x
y <- cannonV$y

xax <- list(
  title = "Height Dropped (meters)",
  titlefont = list(family = "Modern Computer Roman")
)

yax <- list(
  title = "Max (-)Vertical Velocity (m/s)",
  titlefont = list(family = "Modern Computer Roman"),
  range = c(-150, 0)
)

fig <- plot_ly(x = x, y = y, type = "scatter", mode = "markers", name = "data", width = 700, height = 350) %>%
  add_lines(x = x, y = fitted(mod), name = "fitted") %>%
  layout(xaxis = xax, yaxis = yax) %>%
  layout(margin = list(l = 150, r = 50, b = 20, t = 40))

config(fig, displaylogo = FALSE)
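Because the simulation draws its noise with rnorm(), each run gives slightly different points and a slightly different fitted line. Here is a small sketch of how set.seed() makes a draw repeatable; the seed value 42 is arbitrary and is not part of the code above.

```r
# Same seed, same random draw
set.seed(42)
first_draw <- rnorm(n = 3, mean = 0, sd = 2)

set.seed(42)
second_draw <- rnorm(n = 3, mean = 0, sd = 2)

# TRUE when both draws are exactly the same numbers
print(identical(first_draw, second_draw))
```

Calling set.seed() once before the simulation would be one way to get the same plot and coefficients on every run.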

Interest in Ducks Code

library(ggplot2)

# Google Trends export; column 2 holds the monthly interest score
idf <- read.csv("multiTimeline.csv", header = FALSE, comment.char = "#")
new_idf <- data.frame(seq(1, 122, by = 1), idf[2])

searchLine <- lm(V2 ~ seq.1..122..by...1., data = new_idf)

rDuckg <- ggplot(data = new_idf, aes(x = seq.1..122..by...1., y = V2)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(
    title = "Search Interest for 'Rubber Duck'",
    x = "Month Index from 2015-10 to 2025-11",
    y = "Relative Search Interest"
  )

rDuckg
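The long column name beginning with seq.1..122 in the code above is what data.frame() generates when a column is passed in without a name. Here is a sketch of the same construction with explicit names instead; month_index, interest, and the stand-in values are my own and do not come from the CSV.

```r
# Stand-in values; in the code above these come from the CSV's second column
interest_values <- c(40, 55, 38, 62)

# Unnamed columns get names derived from the expressions that created them
auto_named <- data.frame(seq(1, 4, by = 1), interest_values)
print(names(auto_named))

# Naming the columns up front keeps the model formula readable
explicit <- data.frame(
  month_index = seq_along(interest_values),
  interest = interest_values
)
print(names(explicit))

fit_explicit <- lm(interest ~ month_index, data = explicit)
print(coef(fit_explicit))
```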

Duck Price History Code

library(ggplot2)

# eBay export is semicolon-separated; column 2 holds the weekly average price
price_data <- read.csv("price data.csv", header = FALSE, sep = ";")
weeks <- seq(1, 158, by = 1)
prices <- c(price_data[2])
duckPrice <- data.frame(weeks, prices)

priceLine <- lm(V2 ~ weeks, data = duckPrice)

duckPlot <- ggplot(data = duckPrice, aes(x = weeks, y = V2)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_y_continuous(limits = c(0, 35)) +
  labs(
    title = "eBay Price History of Rubber Duck Items",
    x = "Weeks from Oct 31, '22 to Nov 3, '25",
    y = "Average Price of Rubber Duck items ($)"
  )

duckPlot

Thanks for viewing!