2025-03-16

Simple Linear Model

  • Models the correlation between two variables
  • Correlation =/= Causation
  • Creates a line that best fits the relationship between two variables
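  • The fitted line minimizes the sum of the squared residuals, i.e. the squared vertical distances between each point and the line
  • The line is fitted to the equation \(y = \beta_0 + \beta_1\cdot x\), where \(\beta_0\) is the intercept and \(\beta_1\) is the slope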
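
The least-squares fit can be sketched in a few lines of R. This is only an illustration: it assumes the built-in iris data with petal length as x and petal width as y, and the helper `sse()` is a hypothetical name introduced here. It computes the slope and intercept by hand, compares them with `lm()`, and checks that nudging the line away from the fitted values increases the sum of squared residuals.

```r
# Assumption: iris petal measurements serve as the example data.
x <- iris$Petal.Length
y <- iris$Petal.Width

# Closed-form least-squares estimates for y = b0 + b1 * x
b1 <- cov(x, y) / var(x)        # slope
b0 <- mean(y) - b1 * mean(x)    # intercept

# The same estimates as returned by lm()
fit <- lm(y ~ x)
coef(fit)

# Hypothetical helper: sum of squared residuals for any candidate line
sse <- function(intercept, slope) sum((y - (intercept + slope * x))^2)

sse(b0, b1)          # fitted line
sse(b0 + 0.1, b1)    # shifted intercept: larger
sse(b0, b1 + 0.05)   # changed slope: larger
```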

Linear Regression Model

  • The example below shows a strong relationship between petal length and petal width

Reliability and Strength of Model

  • The strength and reliability of the model are described by the r and r^2 values, respectively
  • r, the correlation coefficient, represents the strength and direction of the linear relationship; it is tied to the slope by \(\beta_1 = r\cdot{s_y \over s_x}\), where \(s_x\) and \(s_y\) are the sample standard deviations of x and y
    • Another interpretation, based on the equation, is that r is the slope rescaled by the ratio of the two spreads: \(r = \beta_1\cdot{s_x \over s_y}\)
  • The r squared value, found by squaring the correlation coefficient, describes reliability: it gives the proportion of the variance in y that is explained by the model
    • This shows how well linear regression explains the relationship, and it can be compared with the fit of other models, such as a parabolic model
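
The relationships between the slope, r, and r squared can be checked numerically. The sketch below assumes the iris petal data again (length as x, width as y); the quadratic model built with `I(x^2)` is one possible way to write a parabolic fit, not necessarily the one intended here.

```r
# Assumption: iris petal measurements as the example data.
x <- iris$Petal.Length
y <- iris$Petal.Width
fit <- lm(y ~ x)

r  <- cor(x, y)                  # correlation coefficient
b1 <- unname(coef(fit)["x"])     # fitted slope

# Slope identity: beta_1 = r * s_y / s_x
r * sd(y) / sd(x)
b1

# R-squared three ways: r squared, 1 - SSE/SST, and the value from summary()
r^2
1 - sum(resid(fit)^2) / sum((y - mean(y))^2)
summary(fit)$r.squared

# Parabolic model for comparison (adds an x^2 term)
fit_quad <- lm(y ~ x + I(x^2))
summary(fit_quad)$r.squared      # never lower than the linear R-squared
summary(fit_quad)$adj.r.squared  # adjusted value penalizes the extra term
summary(fit)$adj.r.squared
```

Because adding a term can never lower the plain R-squared, the adjusted values are the fairer basis for comparing the two models.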

Little to No Correlation

  • This graph shows a weak negative correlation
## `geom_smooth()` using formula = 'y ~ x'
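
A plot of this kind can be sketched as follows. The code assumes the ggplot2 package is installed and that the plot uses the same iris columns as the model below; the exact styling of the original graph is not known. When no formula is supplied, `geom_smooth(method = "lm")` reports the formula it chose, which is where the message above comes from.

```r
library(ggplot2)  # assumption: ggplot2 is installed

# Scatter plot of sepal width against sepal length with a least-squares line
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point() +
  geom_smooth(method = "lm")   # linear fit, default formula y ~ x

print(p)
```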

R Values for this example

  • Using summary(), the R-squared value comes to about 1.4% (adjusted: about 0.7%) and the slope is not significant (p = 0.152), which essentially indicates no linear correlation
linreg2 = lm(Sepal.Length ~ Sepal.Width, data=iris)
summary(linreg2)
## 
## Call:
## lm(formula = Sepal.Length ~ Sepal.Width, data = iris)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.5561 -0.6333 -0.1120  0.5579  2.2226 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   6.5262     0.4789   13.63   <2e-16 ***
## Sepal.Width  -0.2234     0.1551   -1.44    0.152    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8251 on 148 degrees of freedom
## Multiple R-squared:  0.01382,    Adjusted R-squared:  0.007159 
## F-statistic: 2.074 on 1 and 148 DF,  p-value: 0.1519
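
The values quoted from the printout can also be pulled out of the summary object directly, which avoids reading them off by eye. This is a small sketch that refits the same model; the variable names `s`, `r_squared`, `adj_r_squared`, `slope_p`, and `r` are introduced here for illustration.

```r
linreg2 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
s <- summary(linreg2)

r_squared     <- s$r.squared                          # "Multiple R-squared"
adj_r_squared <- s$adj.r.squared                      # "Adjusted R-squared"
slope_p       <- coef(s)["Sepal.Width", "Pr(>|t|)"]   # p-value of the slope

# r has the same sign as the slope and the magnitude sqrt(R-squared)
r <- unname(sign(coef(linreg2)["Sepal.Width"]) * sqrt(r_squared))

c(r_squared = r_squared, adj_r_squared = adj_r_squared,
  slope_p = slope_p, r = r)
```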

Negative Correlation

This next example also shows a weak negative correlation, but one strong enough that it could reflect a real relationship

## `geom_smooth()` using formula = 'y ~ x'

Statistics for Example Three

This example shows an R-squared value of about 21% (r ≈ -0.46). Although this is fairly weak, the slope's p-value (2.64e-09) is small enough to indicate a real negative correlation

linreg3 = lm(Wind ~ Temp, data=airquality)
summary(linreg3)
## 
## Call:
## lm(formula = Wind ~ Temp, data = airquality)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.5784 -2.4489 -0.2261  1.9853  9.7398 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 23.23369    2.11239  10.999  < 2e-16 ***
## Temp        -0.17046    0.02693  -6.331 2.64e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.142 on 151 degrees of freedom
## Multiple R-squared:  0.2098, Adjusted R-squared:  0.2045 
## F-statistic: 40.08 on 1 and 151 DF,  p-value: 2.642e-09
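
One way to check whether a correlation of this size is likely to be real is a correlation test. The sketch below uses `cor.test()` from base R (Pearson by default) on the same airquality columns; the names `wind_fit` and `ct` are introduced here for illustration. In a simple linear regression, the test on r and the test on the slope give the same p-value.

```r
wind_fit <- lm(Wind ~ Temp, data = airquality)
ct <- cor.test(airquality$Temp, airquality$Wind)   # Pearson correlation test

ct$estimate       # r, negative for this data
ct$estimate^2     # matches the model's R-squared
ct$p.value        # matches the p-value of the Temp slope
coef(summary(wind_fit))["Temp", "Pr(>|t|)"]
```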