2026-03-07

What is Correlation?

Correlation is a statistical association in which two variables tend to change together. For a linear relationship, its strength and direction are commonly measured with the Pearson correlation coefficient:

\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \]

Here,
- \(r\) is the correlation coefficient.
- \(x_i\), \(y_i\) are individual observations.
- \(\bar{x}\), \(\bar{y}\) are the means of the variables.
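As a quick check of the formula, the sketch below computes \(r\) term by term in R and compares it with base R's built-in `cor()`. The vectors `x` and `y` and the helper `pearson_r()` are hypothetical, made up for illustration rather than taken from this post's data.

```r
# Hypothetical example data (not from the original post)
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

# Hypothetical helper: Pearson correlation computed directly from the formula
pearson_r <- function(x, y) {
  dx <- x - mean(x)  # deviations from the mean of x
  dy <- y - mean(y)  # deviations from the mean of y
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual  <- pearson_r(x, y)
r_builtin <- cor(x, y)  # base R uses method = "pearson" by default

print(c(manual = r_manual, builtin = r_builtin))
```

For this data the numerator is 16 and the denominator is \(\sqrt{40 \times 10} = 20\), so both values come out to 0.8.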

Range

The range of \(r\) is as follows:

\[ -1 \le r \le 1 \]

- \(r = 1\) → perfect positive correlation
- \(r = -1\) → perfect negative correlation
- \(r = 0\) → no linear relationship
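The endpoints of the range, and what \(r = 0\) does and does not mean, can be checked with a few constructed (hypothetical) vectors. The last case is worth noting: \(y = x^2\) is a perfect relationship, yet \(r = 0\), because the coefficient only measures linear association.

```r
x <- -2:2

r_pos  <- cor(x, 3 * x + 1)   # points on an increasing straight line
r_neg  <- cor(x, -2 * x + 5)  # points on a decreasing straight line
r_zero <- cor(x, x^2)         # symmetric curve: strong relationship, but not linear

print(round(c(r_pos, r_neg, r_zero), 10))
```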

What is Causation?

Causation means that one event directly produces another. This is different from correlation: with causation, one event (the cause) makes the other (the effect) happen. With correlation, the two events simply appear together, and the correlation alone does not show that one causes the other.

Correlation Example

This graph appears to show that shark attacks increase along with ice cream sales, but it makes no logical sense for buying ice cream to cause shark attacks.

Causation Example

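In reality, shark attacks and ice cream sales both increase when temperatures rise in the summertime. Temperature acts as a confounding variable: hot weather leads people to buy more ice cream, and it also leads more people to go swimming, which raises the chance of shark attacks. So there is a causal link between hotter temperatures and shark attacks, but not between ice cream sales and shark attacks. This example is shown below.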
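The confounding story can be sketched with simulated data. In the code below, temperature is the only thing driving both ice cream sales and shark attacks; the two outcomes have no direct link. Every number (sample size, slopes, noise levels, the 60-100 °F range) is a made-up assumption for illustration. The two outcomes still end up strongly correlated, but after removing the part of each that temperature explains (a partial correlation computed from regression residuals), the remaining correlation is close to zero.

```r
set.seed(42)  # make the simulation reproducible
n <- 500

# Hypothetical daily temperatures in degrees Fahrenheit
temp <- runif(n, min = 60, max = 100)

# Both outcomes depend on temperature plus independent noise;
# neither outcome appears in the other's equation
ice_cream <- 20 + 5 * temp + rnorm(n, sd = 20)
sharks    <- -12 + 0.25 * temp + rnorm(n, sd = 1)  # continuous stand-in for attack counts

# Raw correlation between the two outcomes: strong, because both follow temperature
raw_r <- cor(ice_cream, sharks)

# Partial correlation controlling for temperature:
# correlate what is left of each outcome after regressing it on temperature
ice_resid   <- resid(lm(ice_cream ~ temp))
shark_resid <- resid(lm(sharks ~ temp))
partial_r   <- cor(ice_resid, shark_resid)

print(round(c(raw = raw_r, partial = partial_r), 2))
```

Expect `raw` to be large and positive and `partial` to be near zero; the exact values depend on the seed and the assumed parameters.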

Both

This graph shows that both shark attacks and ice cream sales rise as temperature increases, consistent with temperature being the common driving factor behind both.

Plotly Code

library(plotly)  # provides plot_ly(), add_markers(), add_lines(), layout() and re-exports %>%

# Fit a simple linear trend of each outcome on temperature
ice_model <- lm(icecream_sales ~ Temp, data = data)
shark_model <- lm(shark_attacks ~ Temp, data = data)

# Store fitted values for every row (NA where Temp is missing) to draw as trend lines
data$ice_fit <- predict(ice_model, newdata = data)
data$shark_fit <- predict(shark_model, newdata = data)

# Scatter both outcomes against temperature and overlay their fitted trends
plot_ly(data, x = ~Temp) %>%
  add_markers(y = ~icecream_sales, marker = list(color = '#B10DA1'), name = "Ice Cream Sales") %>%
  add_lines(y = ~ice_fit, line = list(color = 'blue'), name = "Ice Cream Sales Trend") %>%
  add_markers(y = ~shark_attacks, marker = list(color = '#00CC96'), name = "Shark Attacks") %>%
  add_lines(y = ~shark_fit, line = list(color = '#AF0038'), name = "Shark Attack Trend") %>%
  layout(
    title = "Temperature vs Ice Cream Sales and Shark Attacks",
    xaxis = list(title = "Temperature (°F)"),
    yaxis = list(title = "Value")
  )
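The plotting code assumes an existing data frame named `data` with columns `Temp`, `icecream_sales`, and `shark_attacks`; where that data comes from is not shown here. To try the code without the original dataset, a hypothetical simulated stand-in with those column names can be built first. All values below are invented for illustration, and the Poisson model for attack counts is an assumption, not the author's data-generating process.

```r
set.seed(1)
n_days <- 100

# Hypothetical stand-in for the dataset; column names match the plotting code
data <- data.frame(Temp = round(runif(n_days, min = 60, max = 100)))
data$icecream_sales <- round(20 + 5 * data$Temp + rnorm(n_days, sd = 20))
# Attack counts drawn from a Poisson distribution whose mean grows with temperature
data$shark_attacks <- rpois(n_days, lambda = exp(-5 + 0.07 * data$Temp))

# Sanity check: both fitted slopes on temperature should come out positive
ice_slope   <- coef(lm(icecream_sales ~ Temp, data = data))[["Temp"]]
shark_slope <- coef(lm(shark_attacks ~ Temp, data = data))[["Temp"]]
print(c(ice_slope = ice_slope, shark_slope = shark_slope))
```

Running this block and then the Plotly code should produce a chart of the same form as the one above, though the points themselves will differ.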