2024-02-02

Introduction

  • Simple linear regression fits a straight line to paired x and y data points
  • The goal is to predict the dependent variable (y) from the independent variable (x) using that line
  • The basic formula is: \(y =\beta x + \alpha\) where \(\beta\) is the slope and \(\alpha\) is the y-intercept
  • Once \(\alpha\) and \(\beta\) have been estimated, the formula predicts y for new x values
  • The next slides apply linear models to example data sets and plot the results
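
For reference, here is a sketch of how the coefficients are usually chosen, assuming the ordinary least-squares criterion (the method R's lm() uses). The fitted line minimizes the sum of squared vertical distances between the \(n\) observed points \((x_i, y_i)\) and the line. The symbols \(\bar{x}\) and \(\bar{y}\) (sample means) and the hats on the estimates are introduced here for illustration.

\[
(\hat{\alpha}, \hat{\beta}) = \arg\min_{\alpha,\,\beta} \sum_{i=1}^{n} \left(y_i - \beta x_i - \alpha\right)^2
\]

\[
\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
\]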

Women Height and Weight Data Set

head(women,15)
##    height weight
## 1      58    115
## 2      59    117
## 3      60    120
## 4      61    123
## 5      62    126
## 6      63    129
## 7      64    132
## 8      65    135
## 9      66    139
## 10     67    142
## 11     68    146
## 12     69    150
## 13     70    154
## 14     71    159
## 15     72    164

Women Height and Weight Plotted Data

library(ggplot2)  # provides ggplot(), aes(), geom_point(), theme_bw()
womenPlot = ggplot(women, aes(height, weight)) + geom_point() + theme_bw()
womenPlot

Creating Linear Model

  • The plot on the previous slide shows a nearly straight-line relationship, so this data set is a good candidate for a linear model
  • In this case the linear model is \(\text{weight} =\beta \times \text{height} + \alpha\), where \(\alpha\) and \(\beta\) are estimated by fitting the model to the data
  • The lm() function in R fits a linear model. The simplest form is: lm(y ~ x, datasetName)
model = lm(weight ~ height, women)
summary(model)$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) -87.51667  5.9369440 -14.74103 1.711082e-09
## height        3.45000  0.0911365  37.85531 1.090973e-14
  • From the Estimate column above, \(\beta=3.45\) (the height row) and \(\alpha=-87.51667\) (the intercept row)
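
As a cross-check on these values, the sketch below computes the slope and intercept from the standard closed-form least-squares estimates (slope = cov(x, y) / var(x), intercept = mean(y) - slope × mean(x)) and compares them with what lm() reports. The names beta_hat and alpha_hat are illustrative and do not appear in the original slides.

```r
# Closed-form least-squares estimates for weight ~ height,
# computed directly from the built-in `women` data set
beta_hat  <- cov(women$height, women$weight) / var(women$height)
alpha_hat <- mean(women$weight) - beta_hat * mean(women$height)

# Fit the same model with lm() and pull out the coefficients
model <- lm(weight ~ height, data = women)
c(alpha = alpha_hat, beta = beta_hat)
coef(model)

# The two approaches should agree up to floating-point error
all.equal(unname(coef(model)), c(alpha_hat, beta_hat))
```

Using coef(model) also avoids retyping rounded coefficients by hand in later steps.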

Implementing Linear Model

# Evaluate the fitted line on a grid of heights
x = seq(50,80,0.1)
y = 3.45*x - 87.51667   # slope and intercept from the coefficient table
head(y,7)
## [1] 84.98333 85.32833 85.67333 86.01833 86.36333 86.70833 87.05333
plot(x,y)
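
An alternative to typing the coefficients by hand is R's predict() function, sketched below. The three heights in new_heights are arbitrary illustration values, not part of the original slides.

```r
# Refit the model so this block runs on its own
model <- lm(weight ~ height, data = women)

# New heights to predict for; the column name must match the predictor
new_heights <- data.frame(height = c(60, 65, 70))

# predict() applies the fitted intercept and slope
predicted <- predict(model, newdata = new_heights)
predicted

# Same calculation by hand from the stored coefficients
manual <- coef(model)[["(Intercept)"]] +
  coef(model)[["height"]] * new_heights$height
all.equal(unname(predicted), manual)
```

The observed heights run from 58 to 72, so predictions outside that range (such as the 50 to 80 grid above) are extrapolations and should be treated with more caution.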

Plotting Linear Model Using Formula

womenPlot + geom_smooth(method = 'lm',formula = y~x, se=FALSE)

Plotting Linear Model Using geom_abline() Function

womenPlot + geom_abline(slope = 3.45, intercept = -87.51667,colour='blue')

Example of Poor Linear Model: Raw Data
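
The code for this slide is not shown. A minimal sketch of how the raw data might be plotted follows, assuming the plotly package (the later add_lines() call is a plotly function) and R's built-in pressure data set. The definition of pressurePlot here is a hypothetical reconstruction, not the original code.

```r
library(plotly)  # assumed: add_lines() on a later slide comes from plotly

# Hypothetical reconstruction of pressurePlot: a scatter plot of the
# built-in `pressure` data set (temperature vs. vapor pressure)
pressurePlot <- plot_ly(pressure, x = ~temperature, y = ~pressure,
                        type = "scatter", mode = "markers", name = "Data")
pressurePlot
```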

Example of Poor Linear Model: Linear Model

model1 = lm(pressure ~ temperature, pressure)
summary(model1)$coefficients
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) -147.89887  66.552888 -2.222276 0.0401241285
## temperature    1.51242   0.315846  4.788472 0.0001709745

\(\text{pressure}= 1.51242\times \text{temperature} - 147.89887\)

Example of Poor Linear Model: Plot

pressurePlot %>% add_lines(x = pressure$temperature, y = fitted(model1),
                           name = "Fit")
  • The straight line does not follow the curved pressure data, so the linear model would give poor predictions
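
One way to put a number on the difference between the two fits is R-squared, the share of variance in y that the line explains. The sketch below compares both models and draws a residual plot for the pressure fit. The variable names are illustrative, and base plot() is used here rather than the plotting packages from the slides.

```r
# Fit both models from the slides
women_model    <- lm(weight ~ height, data = women)
pressure_model <- lm(pressure ~ temperature, data = pressure)

# R-squared: share of variance in y explained by the fitted line
r2_women    <- summary(women_model)$r.squared
r2_pressure <- summary(pressure_model)$r.squared
c(women = r2_women, pressure = r2_pressure)

# Residuals vs. fitted values: a curved pattern suggests that a
# straight line is missing structure in the data
plot(fitted(pressure_model), resid(pressure_model),
     xlab = "Fitted pressure", ylab = "Residual")
abline(h = 0, lty = 2)
```

The women model should come out much closer to 1 than the pressure model. Note that both pressure coefficients still have small p-values, so significance alone does not show that a line is the right shape for the data.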