A leverage point is a point whose x-value is distant from the other x-values. An observation whose response value is very different from the value predicted by the model is called an outlier. A bad leverage point is a leverage point whose y-value does not follow the pattern set by the other data points; in other words, a bad leverage point is a leverage point that is also an outlier. A good leverage point is a leverage point that is NOT also an outlier. For a good tutorial, see https://online.stat.psu.edu/stat501/lesson/11.
\(\hat{y_{i}}= \hat{\beta_{0}}+\hat{\beta_{1}}x_{i}\) where \(\hat{\beta_{0}}= \bar{y}-\hat{\beta_{1}}\bar{x}\) and \(\hat{\beta_{1}}=\sum_{j=1}^{n}c_{j}y_{j}\) with \(c_{j}= \frac{x_{j}-\bar{x}}{SXX}\) and \(SXX=\sum_{j=1}^{n}(x_{j}-\bar{x})^{2}\), so that \[ \begin{eqnarray} \hat{y_{i}} & = & \bar{y}-\hat{\beta_{1}}\bar{x}+\hat{\beta_{1}}x_{i} \\ & = & \bar{y}+\hat{\beta_{1}}(x_{i}-\bar{x}) \\ & = & \frac{1}{n}\sum_{j=1}^{n}y_{j}+\sum_{j=1}^{n}\frac{x_{j}-\bar{x}}{SXX}\,y_{j}(x_{i}-\bar{x})\\ & = & \sum_{j=1}^{n}\left(\frac{1}{n}+\frac{(x_{i}-\bar{x})(x_{j}-\bar{x})}{SXX}\right) y_{j} \\ & = & \sum_{j=1}^{n}h_{ij}y_{j}, \quad \text{where} \\ h_{ij} & = & \frac{1}{n}+\frac{(x_{i}-\bar{x})(x_{j}-\bar{x})}{SXX} \end{eqnarray}\]
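The identity \(\hat{y_{i}}=\sum_{j=1}^{n}h_{ij}y_{j}\) is easy to check numerically. Below is a minimal sketch in Python with NumPy (the data are simulated purely for illustration): it builds the \(n\times n\) matrix of \(h_{ij}\) values from the formula above and confirms that multiplying it by \(y\) reproduces the usual least-squares fitted values.

```python
import numpy as np

# Toy data (simulated purely for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=20)
y = 3 + 2 * x + rng.normal(scale=1.0, size=20)

n = len(x)
xbar = x.mean()
SXX = np.sum((x - xbar) ** 2)

# h_ij = 1/n + (x_i - xbar)(x_j - xbar)/SXX, built as an n x n matrix.
H = 1 / n + np.outer(x - xbar, x - xbar) / SXX

# Fitted values the usual way: beta1_hat = SXY/SXX, beta0_hat = ybar - beta1_hat * xbar.
beta1_hat = np.sum((x - xbar) * y) / SXX
beta0_hat = y.mean() - beta1_hat * xbar
yhat = beta0_hat + beta1_hat * x

# The derivation says yhat_i = sum_j h_ij * y_j, i.e. yhat = H @ y.
assert np.allclose(H @ y, yhat)
```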
Notice that \[ \sum_{j=1}^{n}h_{ij} = \sum_{j=1}^{n}\left(\frac{1}{n}+\frac{(x_i-\bar{x})(x_j-\bar{x})}{SXX}\right) = \frac{n}{n}+\frac{(x_i-\bar{x})}{SXX}\sum_{j=1}^n\left( x_j-\bar{x}\right) = 1 \] since \(\sum_{j=1}^n\left( x_j-\bar{x}\right) = 0\)
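As a quick sanity check of this row-sum identity, here is a small symbolic sketch (assuming the SymPy library is available) worked out for \(n=3\); the same cancellation \(\sum_{j}(x_j-\bar{x})=0\) drives the general case.

```python
import sympy as sp

# Symbolic check that sum_j h_ij = 1, worked out for n = 3.
x1, x2, x3, xi = sp.symbols('x1 x2 x3 x_i', real=True)
xs = [x1, x2, x3]
n = len(xs)
xbar = sum(xs) / n
SXX = sum((xj - xbar) ** 2 for xj in xs)

# Row i of the h matrix, summed over j.
row_sum = sum(sp.Rational(1, n) + (xi - xbar) * (xj - xbar) / SXX for xj in xs)

print(sp.simplify(row_sum))  # prints 1: the sum_j (x_j - xbar) factor vanishes
```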
We can express the predicted value, \(\hat{y_i}\), as \[\hat{y_i} = h_{ii}y_i+\sum_{j\ne{i}}h_{ij}y_j \] where \[ \mathbf{h_{ii}=\frac{1}{n}+\frac{(x_i-\bar{x})^2}{\sum_{j=1}^{n}(x_j-\bar{x})^2}} \] The term \(h_{ii}\) is commonly called the leverage of the \(i\)th data point.
A popular rule is to classify \(x_i\) as a point of high leverage in a simple linear regression model if \[h_{ii} \gt 2\times{average(h_{ii})}=2\times\frac{2}{n}=\frac{4}{n}, \] since the leverages always sum to the number of regression parameters (here \(2\), for the intercept and slope), so their average is \(2/n\).
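This rule is straightforward to apply directly from the formula for \(h_{ii}\). Below is a short Python/NumPy sketch (the data are simulated, with one x-value made deliberately extreme) that computes the leverages and flags the points exceeding \(4/n\); note that the leverages average to exactly \(2/n\), which is what the cutoff is built on.

```python
import numpy as np

# Toy data with one deliberately extreme x-value (index 0) for illustration.
rng = np.random.default_rng(1)
x = np.concatenate(([30.0], rng.uniform(0, 10, size=19)))

n = len(x)
xbar = x.mean()
SXX = np.sum((x - xbar) ** 2)

# Leverage of each point: h_ii = 1/n + (x_i - xbar)^2 / SXX.
h = 1 / n + (x - xbar) ** 2 / SXX

cutoff = 4 / n                              # 2 * average leverage = 2 * (2/n)
high_leverage = np.where(h > cutoff)[0]
print("average leverage:", h.mean())        # equals 2/n in simple linear regression
print("cutoff 4/n:", cutoff)
print("high-leverage indices:", high_leverage)  # should include index 0
```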