Regression is a statistical model that establishes the possible relationship between one or more explanatory variables of an observed data set and an explained (response) variable. The aim is to estimate the relationship between a variable of interest (the explained variable) and the value of one or more explanatory variables. All the variables involved (explanatory and explained) must be numeric for this type of analysis.
\[ \begin{align} \vec{y} &= f\left({X}_{1},{X}_{2},\ldots,{X}_{p}\right)\\ &= \boldsymbol{X}\vec{\beta}+\vec{\varepsilon} \end{align} \]
Where:
\(\vec{y}\) is an \(n{\times}1\) vector of responses.
\(\boldsymbol{X}\) is an \(n{\times}p\) matrix of explanatory variables, known as the design matrix.
\(\vec{\beta}\) is a \(p{\times}1\) vector of parameters.
\(\vec{\varepsilon}\) is an \(n{\times}1\) vector of random errors.
It is assumed that \(\vec{\varepsilon}\) is a vector of random variables such that:
\[E(\vec{\varepsilon}) = \vec{0}\]
\[V(\vec{\varepsilon}) = {\sigma}^{2}\boldsymbol{I}\]
From this, the following holds:
\[ \begin{align} E(\vec{y}) &= E(\boldsymbol{X}\vec{\beta}+\vec{\varepsilon})\\ &= \boldsymbol{X}\vec{\beta}+E(\vec{\varepsilon})\\ &= \boldsymbol{X}\vec{\beta}\\ \end{align} \]
\[ \begin{align} V(\vec{y}) &= V(\boldsymbol{X}\vec{\beta}+\vec{\varepsilon})\\ &= V(\vec{\varepsilon})\\ &= {\sigma}^{2}\boldsymbol{I} \end{align} \]
The goals are:
To estimate the parameters of the model or, when that is not possible, to estimate some linear functions of the parameters.
To make predictions of the response variable.
Note:
“It is necessary to have more observations than parameters (\(n{>}p\)); otherwise there are singularity problems with the matrices.”
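As a quick illustration of this note (a hypothetical toy example, not part of the original study), fitting a linear model in R with more parameters than observations leaves some coefficients undetermined, because \(\boldsymbol{X}^{t}\boldsymbol{X}\) is singular:
# Hypothetical example: 3 observations but 4 parameters (intercept + 3 slopes)
set.seed(123)
d <- data.frame(y = rnorm(3), x1 = rnorm(3), x2 = rnorm(3), x3 = rnorm(3))
coef(lm(y ~ x1 + x2 + x3, data = d))  # some coefficients come back NA: X'X is singular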
Linear regression describes the relationship between numeric variables, more precisely between an explained variable (\(y\)) and an explanatory variable (\(x\)). It therefore attempts to predict the value of one quantitative variable from another.
\[ \begin{align} \begin{bmatrix} {y}_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n} \end{bmatrix} &= \begin{bmatrix} {\beta}_{0}\\ {\beta}_{0}\\ \vdots\\ {\beta}_{0} \end{bmatrix} + \begin{bmatrix} {\beta}_{1}{x}_{1}\\ {\beta}_{1}{x}_{2}\\ \vdots\\ {\beta}_{1}{x}_{n} \end{bmatrix} + \begin{bmatrix} {\varepsilon}_{1}\\ {\varepsilon}_{2}\\ \vdots\\ {\varepsilon}_{n} \end{bmatrix} \end{align} \]
with \({\varepsilon}_{i}{\sim}N(0,{\sigma}^{2})\), or
\[ \begin{align} \begin{bmatrix} {\varepsilon}_{1}\\ {\varepsilon}_{2}\\ \vdots\\ {\varepsilon}_{n} \end{bmatrix} &\sim N\left( \begin{bmatrix} 0\\ 0\\ \vdots\\ 0\\ \end{bmatrix},{\sigma}^{2}\begin{bmatrix} 1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1 \end{bmatrix} \right) \end{align} \]
Regression analysis is a statistical tool for investigating relationships between variables. Usually, the investigator seeks to determine the causal effect of one variable on another: the effect of a price increase on demand, for example, or the effect of changes in the money supply on the inflation rate. To explore these questions, the investigator assembles data on the underlying variables of interest and uses regression to estimate the quantitative effect of the causal variables on the variable they influence. The investigator also typically assesses the “statistical significance” of the estimated relationships, that is, the degree of confidence that the true relationship is close to the estimated one.
library(statsr)
## Loading required package: BayesFactor
## Loading required package: coda
## Loading required package: Matrix
## ************
## Welcome to BayesFactor 0.9.12-4.4. If you have questions, please contact Richard Morey (richarddmorey@gmail.com).
##
## Type BFManual() to open the manual.
## ************
datos <- data.frame(x = rnorm(10), y = rnorm(10))
In statistics, linear regression (or linear fitting) is a mathematical model used to approximate the relationship between an explained variable \(\vec{y}\), \(p\) independent variables \(\vec{x}_{i}\) with \(p{\in}\mathbb{Z}^{+}\), and a random term \(\vec{\varepsilon}\). This method is applicable in many situations in which the relationship between two or more variables is studied or a behavior is to be predicted, some of them entirely unrelated to technology. When a regression model cannot be applied to a study, it is said that there is no correlation between the variables studied.
library(statsr)
plot_ss(x = x, y = y, data = datos)
## Click two points to make a line.
## Call:
## lm(formula = y ~ x, data = pts)
##
## Coefficients:
## (Intercept) x
## 0.08674 0.74037
##
## Sum of Squares: 5.317
Given the regression model stated in its general form
\[{y}_{i} = {\beta}_{0} + {\beta}_{1}{x}_{i} + {\varepsilon}_{i}\]
Solving for the errors, they equal the difference
\[{\varepsilon}_{i} = {y}_{i} - {\beta}_{0} - {\beta}_{1}{x}_{i}\]
The idea for finding the best-fitting line is therefore to minimize the sum of the squared errors, i.e. the squared vertical distances from the observations to the fitted line, so the expression to minimize is:
\[ \begin{align} S\left({\beta}_{0}, {\beta}_{1}\right) &= \vec{\varepsilon}^{t}\vec{\varepsilon}\\ &= \begin{bmatrix} {y}_{1} - {\beta}_{0} - {\beta}_{1}{x}_{1}, & {y}_{2} - {\beta}_{0} - {\beta}_{1}{x}_{2}, & \cdots, & {y}_{n} - {\beta}_{0} - {\beta}_{1}{x}_{n} \end{bmatrix} \begin{bmatrix} {y}_{1} - {\beta}_{0} - {\beta}_{1}{x}_{1}\\ {y}_{2} - {\beta}_{0} - {\beta}_{1}{x}_{2}\\ \vdots\\ {y}_{n} - {\beta}_{0} - {\beta}_{1}{x}_{n} \end{bmatrix}\\ &= {\sum}_{i=1}^{n}\left({{y}_{i} - {\beta}_{0} - {\beta}_{1}{x}_{i}}\right)^{2} \end{align} \]
\[ \begin{align} \frac{{\partial}}{{\partial}{\beta}_{0}}S\left({\beta}_{0}, {\beta}_{1}\right) &= 2{\sum}_{i=1}^{n}\left({{y}_{i} - {\beta}_{0} - {\beta}_{1}{x}_{i}}\right)\left(-1\right)\\ &= -2{\sum}_{i=1}^{n}\left({{y}_{i} - {\beta}_{0} - {\beta}_{1}{x}_{i}}\right)\\ &= -2{{\sum}_{i=1}^{n}{y}_{i} + 2{\sum}_{i=1}^{n}{\beta}_{0} + 2{\beta}_{1}{\sum}_{i=1}^{n}{x}_{i}} \end{align} \]
\[ \begin{align} \frac{{\partial}}{{\partial}{\beta}_{1}}S\left({\beta}_{0}, {\beta}_{1}\right) &= 2{\sum}_{i=1}^{n}\left({{y}_{i} - {\beta}_{0} - {\beta}_{1}{x}_{i}}\right)\left(-{x}_{i}\right)\\ &= -2{\sum}_{i=1}^{n}{x}_{i}\left({{y}_{i} - {\beta}_{0} - {\beta}_{1}{x}_{i}}\right)\\ &= -2\left({\sum}_{i=1}^{n}{{x}_{i}{y}_{i} - {\beta}_{0}{x}_{i} - {\beta}_{1}{x}_{i}^{2}}\right)\\ &= -2{{\sum}_{i=1}^{n}{x}_{i}{y}_{i} + 2{\beta}_{0}{\sum}_{i=1}^{n}{x}_{i} + 2{\beta}_{1}{\sum}_{i=1}^{n}{x}_{i}^{2}}\\ \end{align} \]
Setting the derivatives equal to zero yields the following:
\[ \begin{align} \frac{{\partial}}{{\partial}{\beta}_{0}}S({\beta}_{0}, {\beta}_{1}) = 0 &\rightarrow -2{{\sum}_{i=1}^{n}{y}_{i} + 2\widehat{\beta}_{0}{\sum}_{i=1}^{n}1 + 2\widehat{\beta}_{1}{\sum}_{i=1}^{n}{x}_{i}} = 0\\ &\rightarrow {-{\sum}_{i=1}^{n}{y}_{i} + \widehat{\beta}_{0}{\sum}_{i=1}^{n}1 + \widehat{\beta}_{1}{\sum}_{i=1}^{n}{x}_{i}} = 0\\ &\rightarrow {\widehat{\beta}_{0}{\sum}_{i=1}^{n}1 = {\sum}_{i=1}^{n}{y}_{i} - \widehat{\beta}_{1}{\sum}_{i=1}^{n}{x}_{i}}\\ &\rightarrow {\widehat{\beta}_{0} = \frac{{\sum}_{i=1}^{n}{y}_{i}}{{\sum}_{i=1}^{n}1} - \widehat{\beta}_{1}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}1}}\\ &\rightarrow {\widehat{\beta}_{0} = \frac{{\sum}_{i=1}^{n}{y}_{i}}{n} - \widehat{\beta}_{1}\frac{{\sum}_{i=1}^{n}{x}_{i}}{n}}\\ &\rightarrow {\widehat{\beta}_{0} = \overline{y} - \widehat{\beta}_{1}\overline{x}} \end{align} \]
\[ \begin{align} \frac{{\partial}}{{\partial}{\beta}_{1}}S({\beta}_{0}, {\beta}_{1}) = 0 &\rightarrow -2{{\sum}_{i=1}^{n}{x}_{i}{y}_{i} + 2\widehat{\beta}_{0}{\sum}_{i=1}^{n}{x}_{i} + 2\widehat{\beta}_{1}{\sum}_{i=1}^{n}{x}_{i}^{2}} = 0\\ &\rightarrow {-{\sum}_{i=1}^{n}{x}_{i}{y}_{i} + \widehat{\beta}_{0}{\sum}_{i=1}^{n}{x}_{i} + \widehat{\beta}_{1}{\sum}_{i=1}^{n}{x}_{i}^{2}} = 0\\ &\rightarrow {\widehat{\beta}_{1}{\sum}_{i=1}^{n}{x}_{i}^{2} = {\sum}_{i=1}^{n}{x}_{i}{y}_{i} - \widehat{\beta}_{0}{\sum}_{i=1}^{n}{x}_{i}}\\ &\rightarrow {\widehat{\beta}_{1} = \frac{{\sum}_{i=1}^{n}{x}_{i}{y}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} - \widehat{\beta}_{0}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}}}\\ &\rightarrow {\widehat{\beta}_{1} = \frac{{\sum}_{i=1}^{n}{x}_{i}{y}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} - \left(\overline{y} - \widehat{\beta}_{1}\overline{x}\right)\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}}}\\ &\rightarrow {\widehat{\beta}_{1} = \frac{{\sum}_{i=1}^{n}{x}_{i}{y}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} - \overline{y}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} + \widehat{\beta}_{1}\overline{x}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}}}\\ &\rightarrow {\widehat{\beta}_{1} - \widehat{\beta}_{1}\overline{x}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} = \frac{{\sum}_{i=1}^{n}{x}_{i}{y}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} - \overline{y}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}}}\\ &\rightarrow {\widehat{\beta}_{1}\left(1 - \overline{x}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}}\right) = \frac{{\sum}_{i=1}^{n}{x}_{i}{y}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} - \overline{y}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}}}\\ &\rightarrow \widehat{\beta}_{1} = \frac{\frac{{\sum}_{i=1}^{n}{x}_{i}{y}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} - \overline{y}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}}}{1 - \overline{x}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}}}\\ &\rightarrow \widehat{\beta}_{1} = \frac{\frac{{\sum}_{i=1}^{n}{x}_{i}{y}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} - \overline{y}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}}}{\frac{{\sum}_{i=1}^{n}{x}_{i}^{2}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} - \overline{x}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}}}\\ &\rightarrow \widehat{\beta}_{1} = \frac{{\sum}_{i=1}^{n}{x}_{i}{y}_{i} - \overline{y}{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2} - \overline{x}{\sum}_{i=1}^{n}{x}_{i}}\\ &\rightarrow \widehat{\beta}_{1} = \frac{{\sum}_{i=1}^{n}{x}_{i}{y}_{i} - n\frac{{\sum}_{i=1}^{n}{y}_{i}}{n}\overline{x}}{{\sum}_{i=1}^{n}{x}_{i}^{2} - \frac{{\sum}_{i=1}^{n}{x}_{i}}{n}{{\sum}_{i=1}^{n}{x}_{i}}}\\ &\rightarrow \widehat{\beta}_{1} = \frac{{\sum}_{i=1}^{n}{x}_{i}{y}_{i} - n\overline{y}\overline{x}}{{\sum}_{i=1}^{n}{x}_{i}^{2} - \frac{1}{n}\left({{\sum}_{i=1}^{n}{x}_{i}}\right)^{2}}\\ &\rightarrow \widehat{\beta}_{1} = \frac{\frac{1}{n}{\sum}_{i=1}^{n}{x}_{i}{y}_{i} - \overline{y}\overline{x}}{\frac{1}{n}{\sum}_{i=1}^{n}{x}_{i}^{2} - \left(\frac{{{\sum}_{i=1}^{n}{x}_{i}}}{n}\right)^{2}}\\ &\rightarrow \widehat{\beta}_{1} = \frac{{S}_{x,y}}{{S}_{x,x}} \end{align} \]
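As a numerical sketch (reusing the simulated datos data frame from above), the closed-form estimates \(\widehat{\beta}_{1}={S}_{x,y}/{S}_{x,x}\) and \(\widehat{\beta}_{0}=\overline{y}-\widehat{\beta}_{1}\overline{x}\) can be checked against lm():
# Manual least-squares estimates for the simulated datos
b1 <- cov(datos$x, datos$y) / var(datos$x)   # S_xy / S_xx (the 1/n factors cancel)
b0 <- mean(datos$y) - b1 * mean(datos$x)
c(b0 = b0, b1 = b1)
coef(lm(y ~ x, data = datos))                # should agree with b0 and b1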
with \({y}_{i}{\sim}N({\beta}_{0} + {\beta}_{1}{x}_{i},{\sigma}^{2})\), or
\[ \begin{align} \begin{bmatrix} {y}_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n} \end{bmatrix} &\sim N\left( \begin{bmatrix} {\beta}_{0} + {\beta}_{1}{x}_{1}\\ {\beta}_{0} + {\beta}_{1}{x}_{2}\\ \vdots\\ {\beta}_{0} + {\beta}_{1}{x}_{n}\\ \end{bmatrix},{\sigma}^{2}\begin{bmatrix} 1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1 \end{bmatrix} \right) \end{align} \]
\[ \begin{align} \mathbb{L}\left({\beta}_{0}, {\beta}_{1}, {\sigma}^{2}|{x}_{1}, {x}_{2}, \cdots, {x}_{n}, {y}_{1}, {y}_{2}, \cdots, {y}_{n}\right) &= {\prod}_{i=1}^{n}{\frac{1}{\sqrt{2{\pi}{\sigma}^{2}}}}\exp{\left\{-\frac{1}{2{\sigma}^{2}}\left({y}_{i} - {\beta}_{0} - {\beta}_{1}{x}_{i}\right)^{2}\right\}}\\ &= {\left(\frac{1}{\sqrt{2{\pi}{\sigma}^{2}}}\right)^{n}}\exp{\left\{-\frac{1}{2{\sigma}^{2}}{\sum}_{i=1}^{n}\left({y}_{i} - {\beta}_{0} - {\beta}_{1}{x}_{i}\right)^{2}\right\}}\\ &= {\frac{1}{\left({2{\pi}{\sigma}^{2}}\right)^\frac{n}{2}}}\exp{\left\{-\frac{1}{2{\sigma}^{2}}{\sum}_{i=1}^{n}\left({y}_{i} - {\beta}_{0} - {\beta}_{1}{x}_{i}\right)^{2}\right\}} \end{align} \]
\[ \begin{align} \ln\left[{\mathbb{L}\left({\beta}_{0}, {\beta}_{1}, {\sigma}^{2}|{x}_{1}, {x}_{2}, \cdots, {x}_{n}, {y}_{1}, {y}_{2}, \cdots, {y}_{n}\right)}\right] &= \ln\left[{\frac{1}{\left({2{\pi}{\sigma}^{2}}\right)^\frac{n}{2}}}\exp{\left\{-\frac{1}{2{\sigma}^{2}}{\sum}_{i=1}^{n}\left({y}_{i} - {\beta}_{0} - {\beta}_{1}{x}_{i}\right)^{2}\right\}}\right]\\ &= \ln\left[{{\left({2{\pi}{\sigma}^{2}}\right)^{-\frac{n}{2}}}}\right]{-\frac{1}{2{\sigma}^{2}}{\sum}_{i=1}^{n}\left({y}_{i} - {\beta}_{0} - {\beta}_{1}{x}_{i}\right)^{2}}\\ &= {-\frac{n}{2}}\ln{{\left({2{\pi}{\sigma}^{2}}\right)}}{-\frac{1}{2{\sigma}^{2}}{\sum}_{i=1}^{n}\left({y}_{i} - {\beta}_{0} - {\beta}_{1}{x}_{i}\right)^{2}}\\ \end{align} \]
\[ \begin{align} \frac{{\partial}}{{\partial}{\beta}_{0}}\ln\left[{\mathbb{L}\left({\beta}_{0}, {\beta}_{1}, {\sigma}^{2}|{x}_{1}, {x}_{2}, \cdots, {x}_{n}, {y}_{1}, {y}_{2}, \cdots, {y}_{n}\right)}\right] = 0 &{\rightarrow} {-2\frac{1}{2\widehat{\sigma}^{2}}{\sum}_{i=1}^{n}\left({y}_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1}{x}_{i}\right)}\left(-1\right) = 0\\ &{\rightarrow} {\frac{1}{2\widehat{\sigma}^{2}}{\sum}_{i=1}^{n}\left({y}_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1}{x}_{i}\right)} = 0\\ &{\rightarrow} {{\sum}_{i=1}^{n}\left({y}_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1}{x}_{i}\right)} = 0\\ &{\rightarrow} {{\sum}_{i=1}^{n}{y}_{i} - {\sum}_{i=1}^{n}\widehat{\beta}_{0} - {\sum}_{i=1}^{n}\widehat{\beta}_{1}{x}_{i}} = 0\\ &{\rightarrow} {{\sum}_{i=1}^{n}{y}_{i} - {\sum}_{i=1}^{n}\widehat{\beta}_{1}{x}_{i}} = {\sum}_{i=1}^{n}\widehat{\beta}_{0}\\ &{\rightarrow} {{\sum}_{i=1}^{n}{y}_{i} - \widehat{\beta}_{1}{\sum}_{i=1}^{n}{x}_{i}} = {n}\widehat{\beta}_{0}\\ &{\rightarrow} {\frac{{\sum}_{i=1}^{n}{y}_{i}}{n} - \widehat{\beta}_{1}\frac{{\sum}_{i=1}^{n}{x}_{i}}{n}} = \widehat{\beta}_{0}\\ &{\rightarrow} {\overline{y} - \widehat{\beta}_{1}\overline{x}} = \widehat{\beta}_{0} \end{align} \]
\[ \begin{align} \frac{{\partial}}{{\partial}{\beta}_{1}}\ln\left[{\mathbb{L}\left({\beta}_{0}, {\beta}_{1}, {\sigma}^{2}|{x}_{1}, {x}_{2}, \cdots, {x}_{n}, {y}_{1}, {y}_{2}, \cdots, {y}_{n}\right)}\right] = 0 &{\rightarrow} {-2\frac{1}{2\widehat{\sigma}^{2}}{\sum}_{i=1}^{n}\left({y}_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1}{x}_{i}\right)}\left(-{x}_{i}\right) = 0\\ &{\rightarrow} {\frac{1}{2\widehat{\sigma}^{2}}{\sum}_{i=1}^{n}\left({y}_{i}{x}_{i} - \widehat{\beta}_{0}{x}_{i} - \widehat{\beta}_{1}{x}_{i}^{2}\right)} = 0\\ &{\rightarrow} {{\sum}_{i=1}^{n}\left({y}_{i}{x}_{i} - \widehat{\beta}_{0}{x}_{i} - \widehat{\beta}_{1}{x}_{i}^{2}\right)} = 0\\ &{\rightarrow} {{\sum}_{i=1}^{n}{y}_{i}{x}_{i} - {\sum}_{i=1}^{n}\widehat{\beta}_{0}{x}_{i} - {\sum}_{i=1}^{n}\widehat{\beta}_{1}{x}_{i}^{2}} = 0\\ &{\rightarrow} {{\sum}_{i=1}^{n}{y}_{i}{x}_{i} - {\sum}_{i=1}^{n}\widehat{\beta}_{0}{x}_{i} = {\sum}_{i=1}^{n}\widehat{\beta}_{1}{x}_{i}^{2}}\\ &{\rightarrow} {{\sum}_{i=1}^{n}{y}_{i}{x}_{i} - \widehat{\beta}_{0}{\sum}_{i=1}^{n}{x}_{i} = \widehat{\beta}_{1}{\sum}_{i=1}^{n}{x}_{i}^{2}}\\ &{\rightarrow} {\frac{{\sum}_{i=1}^{n}{y}_{i}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} - \widehat{\beta}_{0}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} = \widehat{\beta}_{1}}\\ &{\rightarrow} {\frac{{\sum}_{i=1}^{n}{y}_{i}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} - \left(\overline{y} - \widehat{\beta}_{1}\overline{x}\right)\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} = \widehat{\beta}_{1}}\\ &{\rightarrow} {\frac{{\sum}_{i=1}^{n}{y}_{i}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} - \overline{y}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} + \widehat{\beta}_{1}\overline{x}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} = \widehat{\beta}_{1}}\\ &{\rightarrow} {\frac{{\sum}_{i=1}^{n}{y}_{i}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} - \overline{y}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} = \widehat{\beta}_{1} - \widehat{\beta}_{1}\overline{x}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}}}\\ &{\rightarrow} {\frac{{\sum}_{i=1}^{n}{y}_{i}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} - \overline{y}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} = \widehat{\beta}_{1}\left(1 - \overline{x}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}}\right)}\\ &{\rightarrow} {\frac{\frac{{\sum}_{i=1}^{n}{y}_{i}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}} - \overline{y}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}}}{\left(1 - \overline{x}\frac{{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2}}\right)} = \widehat{\beta}_{1}}\\ &{\rightarrow} {\frac{{\sum}_{i=1}^{n}{y}_{i}{x}_{i} - \overline{y}{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2} - \overline{x}{\sum}_{i=1}^{n}{x}_{i}} = \widehat{\beta}_{1}}\\ &{\rightarrow} {\frac{{\sum}_{i=1}^{n}{y}_{i}{x}_{i} - \overline{y}\frac{n}{n}{\sum}_{i=1}^{n}{x}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2} - \overline{x}\frac{n}{n}{\sum}_{i=1}^{n}{x}_{i}} = \widehat{\beta}_{1}}\\ &{\rightarrow} {\frac{{\sum}_{i=1}^{n}{y}_{i}{x}_{i} - {n}\overline{y}\overline{x}}{{\sum}_{i=1}^{n}{x}_{i}^{2} - {n}\overline{x}\overline{x}} = \widehat{\beta}_{1}}\\ &{\rightarrow} {\frac{\frac{1}{n}{\sum}_{i=1}^{n}{y}_{i}{x}_{i} - \overline{y}\overline{x}}{\frac{1}{n}{\sum}_{i=1}^{n}{x}_{i}^{2} - \overline{x}^{2}} = \widehat{\beta}_{1}}\\ &{\rightarrow} {\frac{{S}_{x,y}}{{S}_{x,x}} = \widehat{\beta}_{1}} \end{align} \]
\[ \begin{align} \frac{{\partial}}{{\partial}{\sigma}^{2}}\ln\left[{\mathbb{L}\left({\beta}_{0}, {\beta}_{1}, {\sigma}^{2}|{x}_{1}, {x}_{2}, \cdots, {x}_{n}, {y}_{1}, {y}_{2}, \cdots, {y}_{n}\right)}\right] = 0 &{\rightarrow} - {\frac{n}{2}}\frac{1}{{{2{\pi}\widehat{\sigma}^{2}}}}{2{\pi}} - {\frac{1}{2\left(\widehat{\sigma}^{2}\right)^{2}}(-1){\sum}_{i=1}^{n}\left({y}_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1}{x}_{i}\right)^{2}} = 0\\ &{\rightarrow} - \frac{n}{{2{\widehat{\sigma}^{2}}}} + {\frac{1}{2\left(\widehat{\sigma}^{2}\right)^{2}}{\sum}_{i=1}^{n}\left({y}_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1}{x}_{i}\right)^{2}} = 0\\ &{\rightarrow} {\frac{1}{2\left(\widehat{\sigma}^{2}\right)^{2}}{\sum}_{i=1}^{n}\left({y}_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1}{x}_{i}\right)^{2}} = \frac{n}{{2{\widehat{\sigma}^{2}}}}\\ &{\rightarrow} {\frac{1}{\left(\widehat{\sigma}^{2}\right)^{2}}{\sum}_{i=1}^{n}\left({y}_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1}{x}_{i}\right)^{2}} = \frac{n}{{{\widehat{\sigma}^{2}}}}\\ &{\rightarrow} {\frac{1}{n}{\sum}_{i=1}^{n}\left({y}_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1}{x}_{i}\right)^{2}} = \widehat{\sigma}^{2}\\ \end{align} \]
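The maximum-likelihood estimate of \({\sigma}^{2}\) divides by \(n\), not by \(n-2\); a short check against the residuals of lm() (again a sketch with the datos data frame):
# MLE of sigma^2: mean squared residual (divides by n, not n - 2)
fit <- lm(y ~ x, data = datos)
c(sigma2_mle = mean(residuals(fit)^2),
  sigma2_unbiased = sum(residuals(fit)^2) / (nrow(datos) - 2),  # what summary() reports, squared
  sigma2_summary = summary(fit)$sigma^2)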
\[ \begin{align} {SC}_{T} &= {\sum}_{i=1}^{n}\left({y}_{i} - \overline{y}\right)^{2}\\ &= {\sum}_{i=1}^{n}\left({y}_{i} - \widehat{y}_{i} + \widehat{y}_{i} - \overline{y}\right)^{2}\\ &= {\sum}_{i=1}^{n}\left[\left({y}_{i} - \widehat{y}_{i}\right) + \left(\widehat{y}_{i} - \overline{y}\right)\right]^{2}\\ &= {\sum}_{i=1}^{n}\left[\left({y}_{i} - \widehat{y}_{i}\right)^{2} + 2\left({y}_{i} - \widehat{y}_{i}\right)\left(\widehat{y}_{i} - \overline{y}\right) + \left(\widehat{y}_{i} - \overline{y}\right)^{2}\right]\\ &= {\sum}_{i=1}^{n}\left({y}_{i} - \widehat{y}_{i}\right)^{2} + 2{\sum}_{i=1}^{n}\left({y}_{i} - \widehat{y}_{i}\right)\left(\widehat{y}_{i} - \overline{y}\right) + {\sum}_{i=1}^{n}\left(\widehat{y}_{i} - \overline{y}\right)^{2}\\ &= {\sum}_{i=1}^{n}\left({y}_{i} - \widehat{y}_{i}\right)^{2} + 2{\sum}_{i=1}^{n}{\varepsilon}_{i}\left(\widehat{y}_{i} - \overline{y}\right) + {\sum}_{i=1}^{n}\left(\widehat{y}_{i} - \overline{y}\right)^{2}\\ &= {\sum}_{i=1}^{n}\left({y}_{i} - \widehat{y}_{i}\right)^{2} + 2{\sum}_{i=1}^{n}{\varepsilon}_{i}\left(\widehat{\beta}_{0} + \widehat{\beta}_{1}{x}_{i} - \overline{y}\right) + {\sum}_{i=1}^{n}\left(\widehat{y}_{i} - \overline{y}\right)^{2}\\ &= {\sum}_{i=1}^{n}\left({y}_{i} - \widehat{y}_{i}\right)^{2} + 2\left(\widehat{\beta}_{0} - \overline{y}\right){\sum}_{i=1}^{n}{\varepsilon}_{i} + 2\widehat{\beta}_{1}{\sum}_{i=1}^{n}{\varepsilon}_{i}{x}_{i} + {\sum}_{i=1}^{n}\left(\widehat{y}_{i} - \overline{y}\right)^{2}\\ &= {\sum}_{i=1}^{n}\left({y}_{i} - \widehat{y}_{i}\right)^{2} + 2\left(\widehat{\beta}_{0} - \overline{y}\right)0 + 2\widehat{\beta}_{1}{\sum}_{i=1}^{n}\left({{y}_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1}{x}_{i}}\right){x}_{i} + {\sum}_{i=1}^{n}\left(\widehat{y}_{i} - \overline{y}\right)^{2}\\ &= {\sum}_{i=1}^{n}\left({y}_{i} - \widehat{y}_{i}\right)^{2} - \widehat{\beta}_{1}\frac{{\partial}}{{\partial}{\beta}_{1}}S\left(\widehat{\beta}_{0}, \widehat{\beta}_{1}\right) + {\sum}_{i=1}^{n}\left(\widehat{y}_{i} - \overline{y}\right)^{2}\\ &= {\sum}_{i=1}^{n}\left({y}_{i} - \widehat{y}_{i}\right)^{2} - \widehat{\beta}_{1}{\cdot}0 + {\sum}_{i=1}^{n}\left(\widehat{y}_{i} - \overline{y}\right)^{2}\\ &= {\sum}_{i=1}^{n}\left({y}_{i} - \widehat{y}_{i}\right)^{2} + {\sum}_{i=1}^{n}\left(\widehat{y}_{i} - \overline{y}\right)^{2} \end{align} \]
\[ \begin{align} {SC}_{T} &= {SC}_{E} + {SC}_{R} \end{align} \]
\[ \begin{align} {SC}_{T} &= {\sum}_{i=1}^{n}\left({y}_{i} - \overline{y}\right)^{2} \end{align} \]
\[ \begin{align} {SC}_{E} &= {\sum}_{i=1}^{n}\left[{y}_{i} - \left(\widehat{\beta}_{0} + \widehat{\beta}_{1}{x}_{i}\right)\right]^{2} \end{align} \]
\[ \begin{align} {SC}_{R} &= {\sum}_{i=1}^{n}\left[\left(\widehat{\beta}_{0} + \widehat{\beta}_{1}{x}_{i}\right) - \overline{y}\right]^{2} \end{align} \]
Under the null hypothesis \({H}_{0}:{\beta}_{1}=0\), the sums of squares scaled by \({\sigma}^{2}\) follow chi-squared distributions with degrees of freedom \({gl}_{T}=n-1\), \({gl}_{E}=n-2\) and \({gl}_{R}=1\):
\[ \begin{align} \frac{{SC}_{T}}{{\sigma}^{2}} = \frac{{\sum}_{i=1}^{n}\left({y}_{i} - \overline{y}\right)^{2}}{{\sigma}^{2}} &{\sim} \chi_{(n-1)}^{2} \end{align} \]
\[ \begin{align} \frac{{SC}_{E}}{{\sigma}^{2}} = \frac{{\sum}_{i=1}^{n}\left({y}_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1}{x}_{i}\right)^{2}}{{\sigma}^{2}} &{\sim} \chi_{(n-2)}^{2} \end{align} \]
\[ \begin{align} \frac{{SC}_{R}}{{\sigma}^{2}} = \frac{{SC}_{T}}{{\sigma}^{2}} - \frac{{SC}_{E}}{{\sigma}^{2}} &{\sim} \chi_{[(n - 1) - (n - 2)]}^{2} \end{align} \]
\[ \begin{align} \frac{{SC}_{R}}{{\sigma}^{2}} = \frac{{\sum}_{i=1}^{n}\left(\widehat{\beta}_{0} + \widehat{\beta}_{1}{x}_{i} - \overline{y}\right)^{2}}{{\sigma}^{2}} &{\sim} \chi_{(1)}^{2} \end{align} \]
\[ \begin{align} \frac{\frac{{SC}_{R}}{{gl}_{R}}}{\frac{{SC}_{E}}{{gl}_{E}}} = \frac{\frac{{\sum}_{i=1}^{n}\left(\widehat{\beta}_{0} + \widehat{\beta}_{1}{x}_{i} - \overline{y}\right)^{2}}{1}}{\frac{{\sum}_{i=1}^{n}\left({y}_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1}{x}_{i}\right)^{2}}{n-2}} &{\sim} F_{(1,n-2)} \end{align} \]
\[ \begin{align} {H}_{0}: {y}_{i} = {\beta}_{0} + {\varepsilon}_{i} &{\text{ versus }} {H}_{1}: {y}_{i} = {\beta}_{0} + {\beta}_{1}{x}_{i} + {\varepsilon}_{i}\\ {H}_{0}: {\beta}_{1} = {0} &{\text{ versus }} {H}_{1}: {\beta}_{1} \not= {0} \end{align} \]
Source | gl | Sum of squares | Mean square | \(F_{(1,n-2)}\) |
---|---|---|---|---|
Regression | 1 | \(SC_R={\sum}_{i=1}^{n}\left(\widehat{\beta}_{0} + \widehat{\beta}_{1}{x}_{i} - \overline{y}\right)^{2}\) | \(CM_R=\frac{{\sum}_{i=1}^{n}\left(\widehat{\beta}_{0} + \widehat{\beta}_{1}{x}_{i} - \overline{y}\right)^{2}}{1}\) | \(\frac{CM_R}{CM_E}=\frac{\frac{{\sum}_{i=1}^{n}\left(\widehat{\beta}_{0} + \widehat{\beta}_{1}{x}_{i} - \overline{y}\right)^{2}}{1}}{\frac{{\sum}_{i=1}^{n}\left({y}_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1}{x}_{i}\right)^{2}}{n-2}}\) |
Error | n-2 | \(SC_E={\sum}_{i=1}^{n}\left({y}_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1}{x}_{i}\right)^{2}\) | \(CM_E=\frac{{\sum}_{i=1}^{n}\left({y}_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1}{x}_{i}\right)^{2}}{n-2}\) | |
Total | n-1 | \(SC_T={\sum}_{i=1}^{n}\left({y}_{i} - \overline{y}\right)^{2}\) | \(CM_T=\frac{{\sum}_{i=1}^{n}\left({y}_{i} - \overline{y}\right)^{2}}{n-1}\) |
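The sums of squares and the F statistic in the table can be reproduced numerically; a sketch with the datos example, compared against anova():
# Sums of squares and F statistic for the simple regression on datos
fit   <- lm(y ~ x, data = datos)
SC_T  <- sum((datos$y - mean(datos$y))^2)
SC_E  <- sum(residuals(fit)^2)
SC_R  <- SC_T - SC_E
F_obs <- (SC_R / 1) / (SC_E / (nrow(datos) - 2))
c(SC_R = SC_R, SC_E = SC_E, SC_T = SC_T, F = F_obs)
anova(fit)   # reports the same decomposition and F value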
\[ \begin{align} R^{2} &= \frac{SC_R}{SC_T}\\ &= \frac{{\sum}_{i=1}^{n}\left(\widehat{\beta}_{0} + \widehat{\beta}_{1}{x}_{i} - \overline{y}\right)^{2}}{{\sum}_{i=1}^{n}\left({y}_{i} - \overline{y}\right)^{2}} \end{align} \]
\[ \begin{align} R_{a}^{2} &= \frac{\frac{SC_R}{n-(k+1)}}{\frac{SC_T}{n-1}}\\ &= \frac{\frac{{\sum}_{i=1}^{n}\left(\widehat{\beta}_{0} + \widehat{\beta}_{1}{x}_{i} - \overline{y}\right)^{2}}{n-(k+1)}}{\frac{{\sum}_{i=1}^{n}\left({y}_{i} - \overline{y}\right)^{2}}{n-1}} \end{align} \]
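A quick numerical check of \(R^{2}\) as the proportion of total variation explained (note that the adjusted \(R^{2}\) reported by summary() in R is computed as \(1-\frac{SC_E/(n-k-1)}{SC_T/(n-1)}\), so it differs slightly from the ratio written above):
# R^2: share of the total variation explained by the regression (datos example)
fit  <- lm(y ~ x, data = datos)
SC_T <- sum((datos$y - mean(datos$y))^2)
SC_R <- SC_T - sum(residuals(fit)^2)
c(R2_manual = SC_R / SC_T, R2_summary = summary(fit)$r.squared)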
\[ \begin{align} \begin{bmatrix} {y}_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n} \end{bmatrix} &= \begin{bmatrix} 1 & {x}_{1}\\ 1 & {x}_{2}\\ \vdots & \vdots\\ 1 & {x}_{n} \end{bmatrix}\begin{bmatrix} {\beta}_{0}\\ {\beta}_{1} \end{bmatrix} + \begin{bmatrix} {\varepsilon}_{1}\\ {\varepsilon}_{2}\\ \vdots\\ {\varepsilon}_{n} \end{bmatrix} \end{align} \]
\[ \begin{align} \begin{bmatrix} {y}_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n} \end{bmatrix} - \begin{bmatrix} 1 & {x}_{1}\\ 1 & {x}_{2}\\ \vdots & \vdots\\ 1 & {x}_{n} \end{bmatrix}\begin{bmatrix} {\beta}_{0}\\ {\beta}_{1} \end{bmatrix} &= \begin{bmatrix} {\varepsilon}_{1}\\ {\varepsilon}_{2}\\ \vdots\\ {\varepsilon}_{n} \end{bmatrix} \end{align} \]
\[ \begin{align} \begin{bmatrix} {\varepsilon}_{1}, & {\varepsilon}_{2}, & \cdots, & {\varepsilon}_{n} \end{bmatrix}\begin{bmatrix} {\varepsilon}_{1}\\ {\varepsilon}_{2}\\ \vdots\\ {\varepsilon}_{n} \end{bmatrix} &= S(\vec{\beta}) \end{align} \]
\[ \begin{align} \begin{bmatrix} \begin{bmatrix} {y}_{1}, & {y}_{2}, \cdots, & {y}_{n} \end{bmatrix} - \begin{bmatrix} {\beta}_{0}, & {\beta}_{1} \end{bmatrix}\begin{bmatrix} 1, & 1, & \cdots, & 1\\ {x}_{1}, & {x}_{2}, & \cdots, & {x}_{n} \end{bmatrix} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} {y}_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n} \end{bmatrix} - \begin{bmatrix} 1 & {x}_{1}\\ 1 & {x}_{2}\\ \vdots & \vdots\\ 1 & {x}_{n} \end{bmatrix}\begin{bmatrix} {\beta}_{0}\\ {\beta}_{1} \end{bmatrix} \end{bmatrix} &= S(\vec{\beta})\\ \begin{bmatrix} {y}_{1}, & {y}_{2}, \cdots, & {y}_{n} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} {y}_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n} \end{bmatrix} - \begin{bmatrix} 1 & {x}_{1}\\ 1 & {x}_{2}\\ \vdots & \vdots\\ 1 & {x}_{n} \end{bmatrix}\begin{bmatrix} {\beta}_{0}\\ {\beta}_{1} \end{bmatrix} \end{bmatrix} - \begin{bmatrix} {\beta}_{0}, & {\beta}_{1} \end{bmatrix}\begin{bmatrix} 1, & 1, & \cdots, & 1\\ {x}_{1}, & {x}_{2}, & \cdots, & {x}_{n} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} {y}_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n} \end{bmatrix} - \begin{bmatrix} 1 & {x}_{1}\\ 1 & {x}_{2}\\ \vdots & \vdots\\ 1 & {x}_{n} \end{bmatrix}\begin{bmatrix} {\beta}_{0}\\ {\beta}_{1} \end{bmatrix} \end{bmatrix} &= \\ \begin{bmatrix} {y}_{1}, & {y}_{2}, \cdots, & {y}_{n} \end{bmatrix} \begin{bmatrix} {y}_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n} \end{bmatrix} - \begin{bmatrix} {y}_{1}, & {y}_{2}, \cdots, & {y}_{n} \end{bmatrix} \begin{bmatrix} 1 & {x}_{1}\\ 1 & {x}_{2}\\ \vdots & \vdots\\ 1 & {x}_{n} \end{bmatrix}\begin{bmatrix} {\beta}_{0}\\ {\beta}_{1} \end{bmatrix} - \begin{bmatrix} {\beta}_{0}, & {\beta}_{1} \end{bmatrix}\begin{bmatrix} 1, & 1, & \cdots, & 1\\ {x}_{1}, & {x}_{2}, & \cdots, & {x}_{n} \end{bmatrix} \begin{bmatrix} {y}_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n} \end{bmatrix} + \begin{bmatrix} {\beta}_{0}, & {\beta}_{1} \end{bmatrix}\begin{bmatrix} 1, & 1, & \cdots, & 1\\ {x}_{1}, & {x}_{2}, & \cdots, & {x}_{n} \end{bmatrix} \begin{bmatrix} 1 & {x}_{1}\\ 1 & {x}_{2}\\ \vdots & \vdots\\ 1 & {x}_{n} \end{bmatrix}\begin{bmatrix} {\beta}_{0}\\ {\beta}_{1} \end{bmatrix} &= \\ \begin{bmatrix} {y}_{1}, & {y}_{2}, \cdots, & {y}_{n} \end{bmatrix} \begin{bmatrix} {y}_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n} \end{bmatrix} - 2\begin{bmatrix} {\beta}_{0}, & {\beta}_{1} \end{bmatrix}\begin{bmatrix} 1, & 1, & \cdots, & 1\\ {x}_{1}, & {x}_{2}, & \cdots, & {x}_{n} \end{bmatrix} \begin{bmatrix} {y}_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n} \end{bmatrix} + \begin{bmatrix} {\beta}_{0}, & {\beta}_{1} \end{bmatrix}\begin{bmatrix} 1, & 1, & \cdots, & 1\\ {x}_{1}, & {x}_{2}, & \cdots, & {x}_{n} \end{bmatrix} \begin{bmatrix} 1 & {x}_{1}\\ 1 & {x}_{2}\\ \vdots & \vdots\\ 1 & {x}_{n} \end{bmatrix}\begin{bmatrix} {\beta}_{0}\\ {\beta}_{1} \end{bmatrix} &= \end{align} \]
\[ \begin{align} - 2\begin{bmatrix} 1, & 1, & \cdots, & 1\\ {x}_{1}, & {x}_{2}, & \cdots, & {x}_{n} \end{bmatrix} \begin{bmatrix} {y}_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n} \end{bmatrix} + 2\begin{bmatrix} 1, & 1, & \cdots, & 1\\ {x}_{1}, & {x}_{2}, & \cdots, & {x}_{n} \end{bmatrix} \begin{bmatrix} 1 & {x}_{1}\\ 1 & {x}_{2}\\ \vdots & \vdots\\ 1 & {x}_{n} \end{bmatrix}\begin{bmatrix} {\beta}_{0}\\ {\beta}_{1} \end{bmatrix} &= \frac{\partial}{\partial\vec{\beta}}S(\vec{\beta}) \end{align} \]
\[ \begin{align} - \begin{bmatrix} 1, & 1, & \cdots, & 1\\ {x}_{1}, & {x}_{2}, & \cdots, & {x}_{n} \end{bmatrix} \begin{bmatrix} {y}_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n} \end{bmatrix} + \begin{bmatrix} 1, & 1, & \cdots, & 1\\ {x}_{1}, & {x}_{2}, & \cdots, & {x}_{n} \end{bmatrix} \begin{bmatrix} 1 & {x}_{1}\\ 1 & {x}_{2}\\ \vdots & \vdots\\ 1 & {x}_{n} \end{bmatrix}\begin{bmatrix} \widehat{\beta}_{0}\\ \widehat{\beta}_{1} \end{bmatrix} &= \begin{bmatrix} 0\\ 0 \end{bmatrix} \\ \begin{bmatrix} 1, & 1, & \cdots, & 1\\ {x}_{1}, & {x}_{2}, & \cdots, & {x}_{n} \end{bmatrix} \begin{bmatrix} 1 & {x}_{1}\\ 1 & {x}_{2}\\ \vdots & \vdots\\ 1 & {x}_{n} \end{bmatrix}\begin{bmatrix} \widehat{\beta}_{0}\\ \widehat{\beta}_{1} \end{bmatrix} &= \begin{bmatrix} 1, & 1, & \cdots, & 1\\ {x}_{1}, & {x}_{2}, & \cdots, & {x}_{n} \end{bmatrix}\begin{bmatrix} {y}_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n} \end{bmatrix} \\ \begin{bmatrix} n & {\sum}_{i=1}^{n}{x}_{i}\\ {\sum}_{i=1}^{n}{x}_{i} & {\sum}_{i=1}^{n}{x}_{i}^{2} \end{bmatrix}\begin{bmatrix} \widehat{\beta}_{0}\\ \widehat{\beta}_{1} \end{bmatrix} &= \begin{bmatrix} {\sum}_{i=1}^{n}{y}_{i}\\ {\sum}_{i=1}^{n}{x}_{i}{y}_{i} \end{bmatrix} \\ \frac{1}{{n}{\sum}_{i=1}^{n}{x}_{i}^{2}-\left({{\sum}_{i=1}^{n}{x}_{i}}\right)^{2}}\begin{bmatrix} {\sum}_{i=1}^{n}{x}_{i}^{2} & -{\sum}_{i=1}^{n}{x}_{i}\\ -{\sum}_{i=1}^{n}{x}_{i} & n \end{bmatrix}\begin{bmatrix} n & {\sum}_{i=1}^{n}{x}_{i}\\ {\sum}_{i=1}^{n}{x}_{i} & {\sum}_{i=1}^{n}{x}_{i}^{2} \end{bmatrix}\begin{bmatrix} \widehat{\beta}_{0}\\ \widehat{\beta}_{1} \end{bmatrix} &= \frac{1}{{n}{\sum}_{i=1}^{n}{x}_{i}^{2}-\left({{\sum}_{i=1}^{n}{x}_{i}}\right)^{2}}\begin{bmatrix} {\sum}_{i=1}^{n}{x}_{i}^{2} & -{\sum}_{i=1}^{n}{x}_{i}\\ -{\sum}_{i=1}^{n}{x}_{i} & n \end{bmatrix}\begin{bmatrix} {\sum}_{i=1}^{n}{y}_{i}\\ {\sum}_{i=1}^{n}{x}_{i}{y}_{i} \end{bmatrix} \\ \begin{bmatrix} \widehat{\beta}_{0}\\ \widehat{\beta}_{1} \end{bmatrix} &= \frac{1}{{n}{\sum}_{i=1}^{n}{x}_{i}^{2}-\left({{\sum}_{i=1}^{n}{x}_{i}}\right)^{2}}\begin{bmatrix} {\sum}_{i=1}^{n}{x}_{i}^{2}{\sum}_{i=1}^{n}{y}_{i} - {\sum}_{i=1}^{n}{x}_{i}{\sum}_{i=1}^{n}{x}_{i}{y}_{i}\\ n{\sum}_{i=1}^{n}{x}_{i}{y}_{i} - {\sum}_{i=1}^{n}{x}_{i}{\sum}_{i=1}^{n}{y}_{i} \end{bmatrix} \\ &= \begin{bmatrix} \frac{{\sum}_{i=1}^{n}{x}_{i}^{2}{\sum}_{i=1}^{n}{y}_{i} - {\sum}_{i=1}^{n}{x}_{i}{\sum}_{i=1}^{n}{x}_{i}{y}_{i}}{{n}{\sum}_{i=1}^{n}{x}_{i}^{2} - \left({{\sum}_{i=1}^{n}{x}_{i}}\right)^{2}}\\ \frac{n{\sum}_{i=1}^{n}{x}_{i}{y}_{i} - {\sum}_{i=1}^{n}{x}_{i}{\sum}_{i=1}^{n}{y}_{i}}{{n}{\sum}_{i=1}^{n}{x}_{i}^{2} - \left({{\sum}_{i=1}^{n}{x}_{i}}\right)^{2}} \end{bmatrix} \\ &= \begin{bmatrix} \frac{{\sum}_{i=1}^{n}{x}_{i}^{2}\frac{n}{n}{\sum}_{i=1}^{n}{y}_{i} - \frac{n}{n}{\sum}_{i=1}^{n}{x}_{i}{\sum}_{i=1}^{n}{x}_{i}{y}_{i}}{{n}{\sum}_{i=1}^{n}{x}_{i}^{2} - \left({{\sum}_{i=1}^{n}{x}_{i}}\right)^{2}}\\ \frac{n{\sum}_{i=1}^{n}{x}_{i}{y}_{i} - \frac{n}{n}{\sum}_{i=1}^{n}{x}_{i}\frac{n}{n}{\sum}_{i=1}^{n}{y}_{i}}{{n}{\sum}_{i=1}^{n}{x}_{i}^{2} - \left({{\sum}_{i=1}^{n}{x}_{i}}\right)^{2}} \end{bmatrix} \\ &= \begin{bmatrix} \frac{{n}\overline{y}{\sum}_{i=1}^{n}{x}_{i}^{2} - {n}\overline{x}{\sum}_{i=1}^{n}{x}_{i}{y}_{i}}{{n}{\sum}_{i=1}^{n}{x}_{i}^{2}-\left({{\sum}_{i=1}^{n}{x}_{i}}\right)^{2}}\\ \frac{n{\sum}_{i=1}^{n}{x}_{i}{y}_{i} - {n}\overline{x}{n}\overline{y}}{{n}{\sum}_{i=1}^{n}{x}_{i}^{2} - \left({{\sum}_{i=1}^{n}{x}_{i}}\right)^{2}} \end{bmatrix} \\ &= \begin{bmatrix} \frac{\overline{y}{\sum}_{i=1}^{n}{x}_{i}^{2} - 
\overline{x}{\sum}_{i=1}^{n}{x}_{i}{y}_{i}}{{\sum}_{i=1}^{n}{x}_{i}^{2} - \frac{1}{n}\left({{\sum}_{i=1}^{n}{x}_{i}}\right)^{2}}\\ \frac{{\sum}_{i=1}^{n}{x}_{i}{y}_{i} - {n}\overline{x}\overline{y}}{{\sum}_{i=1}^{n}{x}_{i}^{2}-\frac{1}{n}\left({{\sum}_{i=1}^{n}{x}_{i}}\right)^{2}} \end{bmatrix} \\ &= \begin{bmatrix} \frac{\overline{y}{\sum}_{i=1}^{n}{x}_{i}^{2} - \overline{x}({S}_{x,y} + {n}\overline{x}\overline{y})}{{S}_{x,x}}\\ \frac{{S}_{x,y}}{{S}_{x,x}} \end{bmatrix} \\ &= \begin{bmatrix} \frac{\overline{y}{\sum}_{i=1}^{n}{x}_{i}^{2}}{{S}_{x,x}} - \frac{\overline{x}{S}_{x,y} + {n}\overline{y}\overline{x}^{2}}{{S}_{x,x}}\\ \frac{{S}_{x,y}}{{S}_{x,x}} \end{bmatrix} \\ &= \begin{bmatrix} \frac{\overline{y}{\sum}_{i=1}^{n}{x}_{i}^{2}}{{S}_{x,x}} - \frac{\overline{x}{S}_{x,y}}{{S}_{x,x}} - \frac{{n}\overline{y}\overline{x}^{2}}{{S}_{x,x}}\\ \frac{{S}_{x,y}}{{S}_{x,x}} \end{bmatrix} \\ &= \begin{bmatrix} \frac{\overline{y}{\sum}_{i=1}^{n}{x}_{i}^{2}}{{S}_{x,x}} - \frac{{n}\overline{y}\overline{x}^{2}}{{S}_{x,x}} - \frac{\overline{x}{S}_{x,y}}{{S}_{x,x}}\\ \frac{{S}_{x,y}}{{S}_{x,x}} \end{bmatrix} \\ &= \begin{bmatrix} \overline{y}\frac{{\sum}_{i=1}^{n}{x}_{i}^{2}}{{S}_{x,x}} - \overline{y}\frac{{n}\overline{x}^{2}}{{S}_{x,x}} - \overline{x}\frac{{S}_{x,y}}{{S}_{x,x}}\\ \frac{{S}_{x,y}}{{S}_{x,x}} \end{bmatrix} \\ &= \begin{bmatrix} \overline{y}\left(\frac{{\sum}_{i=1}^{n}{x}_{i}^{2} - {n}\overline{x}^{2}}{{S}_{x,x}}\right) - \frac{{S}_{x,y}}{{S}_{x,x}}\overline{x}\\ \frac{{S}_{x,y}}{{S}_{x,x}} \end{bmatrix} \\ &= \begin{bmatrix} \overline{y} - \frac{{S}_{x,y}}{{S}_{x,x}}\overline{x}\\ \frac{{S}_{x,y}}{{S}_{x,x}} \end{bmatrix} \end{align} \]
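The same matrix solution \(\left(\boldsymbol{X}^{t}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{t}\vec{y}\) can be computed directly; a sketch with the datos data frame:
# Normal equations in matrix form: beta_hat = (X'X)^{-1} X'y
X <- cbind(1, datos$x)                   # design matrix with an intercept column
beta_hat <- solve(t(X) %*% X, t(X) %*% datos$y)
beta_hat
coef(lm(y ~ x, data = datos))            # should match beta_hat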
For this particular case, the objective function to minimize is
\[ \begin{align} {\sum}_{i=1}^{n}{\omega}_{i}\left(\widehat{\beta}_{0} + \widehat{\beta}_{1}{x}_{i} - {y}_{i}\right)^{2} \end{align} \]
The usual choice is to set the weights \({\omega}_{i}\) as the inverse of the variability associated with the given observation; in practice, however, that variance cannot be established, so it is instead assumed to be proportional to the value \({x}_{i}\)
\[ \begin{align} {\omega}_{i} &= \frac{k}{c{\cdot}{x}_{i}} \end{align} \]
\[ \begin{align} {\omega}_{i} &{\approxeq} \frac{k}{c{\cdot}{x}_{i}^{\alpha}} \end{align} \]
\[ \begin{align} {\arg{\min}}_{{\beta}_{0}, {\beta}_{1}}{{\sum}_{i=1}^{n}{\omega}_{i}\left(\widehat{\beta}_{0} + \widehat{\beta}_{1}{x}_{i} - {y}_{i}\right)^{2} } \end{align} \]
\[ \begin{align} \widehat{\beta}_{0} &= \frac{\left({\sum}_{i=1}^{n}{\omega}_{i}{y}_{i}\right)\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}^{2}\right) - \left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}{y}_{i}\right)\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}\right)}{\left({\sum}_{i=1}^{n}{\omega}_{i}\right)\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}^{2}\right) - \left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}\right)^{2}} \end{align} \]
\[ \begin{align} \widehat{\beta}_{1} &= \frac{\left({\sum}_{i=1}^{n}{\omega}_{i}\right)\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}{y}_{i}\right) - \left({\sum}_{i=1}^{n}{\omega}_{i}{y}_{i}\right)\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}\right)}{\left({\sum}_{i=1}^{n}{\omega}_{i}\right)\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}^{2}\right) - \left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}\right)^{2}} \end{align} \]
If it is assumed that \({\sum}_{i=1}^{n}{\omega}_{i}=1\), then:
\[ \begin{align} \widehat{\beta}_{0} &= \frac{\left({\sum}_{i=1}^{n}{\omega}_{i}{y}_{i}\right)\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}^{2}\right) - \left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}{y}_{i}\right)\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}\right)}{\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}^{2}\right) - \left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}\right)^{2}} \end{align} \]
\[ \begin{align} \widehat{\beta}_{1} &= \frac{\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}{y}_{i}\right) - \left({\sum}_{i=1}^{n}{\omega}_{i}{y}_{i}\right)\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}\right)}{\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}^{2}\right) - \left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}\right)^{2}} \end{align} \]
From the above, for
\[ \begin{align} \hat{x} &= {\sum}_{i=1}^{n}{\omega}_{i}{x}_{i} \end{align} \]
one arrives at
\[ \begin{align} \widehat{y}_{i} &= \widehat{\beta}_{0} + \widehat{\beta}_{1}\hat{x}\\ &= \frac{\left({\sum}_{i=1}^{n}{\omega}_{i}{y}_{i}\right)\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}^{2}\right) - \left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}{y}_{i}\right)\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}\right)}{\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}^{2}\right) - \left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}\right)^{2}} + \frac{\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}{y}_{i}\right) - \left({\sum}_{i=1}^{n}{\omega}_{i}{y}_{i}\right)\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}\right)}{\left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}^{2}\right) - \left({\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}\right)^{2}}{\sum}_{i=1}^{n}{\omega}_{i}{x}_{i}\\ &= {\sum}_{i=1}^{n}{\omega}_{i}{y}_{i} \end{align} \]
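In R, weighted least squares is fitted through the weights argument of lm(); a minimal sketch with hypothetical data in which the error variance is assumed proportional to \({x}_{i}\), so \({\omega}_{i}=1/{x}_{i}\):
# Hypothetical WLS example: Var(error_i) proportional to x_i, so weights w_i = 1/x_i
set.seed(1)
xw <- runif(50, 1, 10)
yw <- 2 + 0.5 * xw + rnorm(50, sd = sqrt(xw))   # error spread grows with x
coef(lm(yw ~ xw, weights = 1 / xw))             # weighted least squares
coef(lm(yw ~ xw))                               # ordinary least squares, for comparison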
Components of a generalized linear model: a random component (an exponential-family density for \({y}_{i}\)), a linear predictor, and a link function:
\[g\left({y}_{i} | {\theta}_{i}\right) = a\left({\theta}_{i}\right)b\left({y}_{i}\right)\exp{\left\{{y}_{i}Q\left({\theta}_{i}\right)\right\}}\]
\[{\beta}_{0}+{\beta}_{1}{x}_{1}\]
\[g[E({y}) = {\mu}] = {\beta}_{0}+{\beta}_{1}{x}_{1}\]
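In R these three components are declared through glm() with a family and a link; a minimal sketch with hypothetical data (a Poisson response with a log link), before returning to the linear model below:
# Hypothetical GLM: Poisson random component, log link, linear predictor b0 + b1*x
set.seed(2)
xg <- rnorm(100)
yg <- rpois(100, lambda = exp(0.5 + 0.8 * xg))
glm(yg ~ xg, family = poisson(link = "log"))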
\[gastos=\beta_0 + \beta_1{\times}ingresos+error\]
\[error{\sim}N(0,\sigma_{error}^2)\]
\[y=m{\times}x+b+error\]
ingresos <- as.data.frame(877802 * rgamma(n=100000, 87))
colnames(ingresos) <- "ingresos"
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::expand() masks Matrix::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tidyr::pack() masks Matrix::pack()
## ✖ tidyr::unpack() masks Matrix::unpack()
ingresos %>%
ggplot(aes(ingresos, fill=cut(ingresos, 100))) + geom_histogram(bins=sqrt(nrow(ingresos)), show.legend=FALSE)
gastos <- rnorm(n=100000, mean=400000, sd=80000) + 0.3 * (ingresos + rnorm(n=100000, mean=219450.5, sd=438901))
colnames(gastos) <- "gastos"
library(tidyverse)
gastos %>%
ggplot(aes(gastos, fill=cut(gastos, 100))) + geom_histogram(bins=sqrt(nrow(gastos)), show.legend=FALSE)
data <- as.data.frame(cbind(ingresos, gastos))
sample <- data %>%
sample_n(size=40000)
library(tidyverse)
sample %>%
ggplot(aes(x=ingresos, y=gastos, color=ingresos)) +
geom_point(shape=16, show.legend=FALSE) +
scale_color_gradient(low="#32aeff", high="#f2aeff")
modelo.1 <- lm(gastos~ingresos, data=sample)
summary(modelo.1)
##
## Call:
## lm(formula = gastos ~ ingresos, data = sample)
##
## Residuals:
## Min 1Q Median 3Q Max
## -680682 -103413 120 103707 633250
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.706e+05 7.233e+03 65.06 <2e-16 ***
## ingresos 2.999e-01 9.417e-05 3185.32 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 153500 on 39998 degrees of freedom
## Multiple R-squared: 0.9961, Adjusted R-squared: 0.9961
## F-statistic: 1.015e+07 on 1 and 39998 DF, p-value: < 2.2e-16
\[ \begin{align} {gastos}_{i}&=\widehat{\beta}_0+\widehat{\beta}_1{\times}{ingresos}_{i}+{error}_{i}\\ &=4.7057583\times 10^{5}+0.2999475{\times}{ingresos}_{i}+{error}_{i} \end{align} \]
\[ {error}_{i}{\sim}N(0,\sigma_{error}^{2}) \]
\[ \begin{align} \widehat{gastos}_{i}&=\widehat{\beta}_0+\widehat{\beta}_1{\times}{ingresos}_{i}\\ &=4.7057583\times 10^{5}+0.2999475{\times}{ingresos}_{i} \end{align} \]
\[ {error}_{i}={gastos}_{i}-\widehat{gastos}_{i} \]
modelo.1$coefficients
## (Intercept) ingresos
## 4.705758e+05 2.999475e-01
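The fitted values \(\widehat{gastos}_{i}\) and the residuals defined above can be recovered directly from the fitted object; a short sketch:
# Fitted values and residuals of modelo.1, checked against the definitions above
gastos_hat <- fitted(modelo.1)
error_hat  <- sample$gastos - gastos_hat
all.equal(unname(error_hat), unname(residuals(modelo.1)))   # should be TRUE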
\[ t=\frac{\widehat{\beta}_i-{\beta}_i}{\widehat{\sigma}_{\widehat{\beta}_i}}{\sim}t_{\left[n-1\right]} \]
\[ \begin{align} 1-\alpha&=P\left(-t_{\left[n-1,1-\frac{\alpha}{2}\right]}{\leq}t=\frac{\widehat{\beta}_i-{\beta}_i}{\widehat{\sigma}_{\widehat{\beta}_i}}{\leq}+t_{\left[n-1,1-\frac{\alpha}{2}\right]}\right)\\ &=P\left(-t_{\left[n-1,1-\frac{\alpha}{2}\right]}\widehat{\sigma}_{\widehat{\beta}_i}{\leq}\widehat{\beta}_i-{\beta}_i{\leq}+t_{\left[n-1,1-\frac{\alpha}{2}\right]}\widehat{\sigma}_{\widehat{\beta}_i}\right)\\ &=P\left(-t_{\left[n-1,1-\frac{\alpha}{2}\right]}\widehat{\sigma}_{\widehat{\beta}_i}-\widehat{\beta}_i{\leq}-{\beta}_i{\leq}+t_{\left[n-1,1-\frac{\alpha}{2}\right]}\widehat{\sigma}_{\widehat{\beta}_i}-\widehat{\beta}_i\right)\\ &=P\left(t_{\left[n-1,1-\frac{\alpha}{2}\right]}\widehat{\sigma}_{\widehat{\beta}_i}+\widehat{\beta}_i{\geq}{\beta}_i{\geq}-t_{\left[n-1,1-\frac{\alpha}{2}\right]}\widehat{\sigma}_{\widehat{\beta}_i}+\widehat{\beta}_i\right)\\ &=P\left(\widehat{\beta}_i-t_{\left[n-1,1-\frac{\alpha}{2}\right]}\widehat{\sigma}_{\widehat{\beta}_i}{\leq}{\beta}_i{\leq}\widehat{\beta}_i+t_{\left[n-1,1-\frac{\alpha}{2}\right]}\widehat{\sigma}_{\widehat{\beta}_i}\right)\\ \end{align} \]
\[IC_{1-\alpha}=\left(\widehat{\beta}_i-t_{\left[n-1,1-\frac{\alpha}{2}\right]}\widehat{\sigma}_{\widehat{\beta}_i};\widehat{\beta}_i+t_{\left[n-1,1-\frac{\alpha}{2}\right]}\widehat{\sigma}_{\widehat{\beta}_i}\right)\]
confint(modelo.1)
## 2.5 % 97.5 %
## (Intercept) 4.563995e+05 4.847522e+05
## ingresos 2.997629e-01 3.001320e-01
confint(modelo.1)[1,]
## 2.5 % 97.5 %
## 456399.5 484752.2
\[ \begin{align} IC_{1-\alpha}&=\left(4.7057583\times 10^{5}-1.9600233\widehat{\sigma}_{\widehat{\beta}_i};4.7057583\times 10^{5}+1.9600233\widehat{\sigma}_{\widehat{\beta}_i}\right)\\ &=\left(4.7057583\times 10^{5}-1.9600233{\times}7232.7511007;4.7057583\times 10^{5}+1.9600233{\times}7232.7511007\right) \end{align} \]
resumen <- summary(modelo.1)  # model summary object used below (not defined earlier in the document)
c(resumen$coefficients[1,1])+
c(-qt(0.975,nrow(sample)-1)*resumen$coefficients[1,2],+qt(0.975,nrow(sample)-1)*resumen$coefficients[1,2])
## [1] 456399.5 484752.2
confint(modelo.1)[2,]
## 2.5 % 97.5 %
## 0.2997629 0.3001320
\[ \begin{align} IC_{1-\alpha}&=\left(0.2999475-1.9600233\widehat{\sigma}_{\widehat{\beta}_i};0.2999475+1.9600233\widehat{\sigma}_{\widehat{\beta}_i}\right)\\ &=\left(0.2999475-1.9600233{\times}9.416562\times 10^{-5};0.2999475+1.9600233{\times}9.416562\times 10^{-5}\right) \end{align} \]
c(resumen$coefficients[2,1])+
c(-qt(0.975,nrow(sample)-1)*resumen$coefficients[2,2],+qt(0.975,nrow(sample)-1)*resumen$coefficients[2,2])
## [1] 0.2997629 0.3001320
\[H_0:{\beta}_i=0\]
\[ t=\frac{\widehat{\beta}_i-0}{\widehat{\sigma}_{\widehat{\beta}_i}}{\sim}t_{\left[n-1\right]} \]
\[H_1:{\beta}_i{\neq}0\]
## [1] "RECHAZO LA HIPÓTESIS NULA DE QUE EL INTERCEPTO ES IGUAL A CERO EN FAVOR DE QUE ES DISTINTO DE CERO"
## [1] "RECHAZO LA HIPÓTESIS NULA DE QUE LA PENDIENTE ES IGUAL A CERO EN FAVOR DE QUE ES DISTINTO DE CERO"
ggplot(sample, aes(x=ingresos, y=residuals(modelo.1), color=abs(residuals(modelo.1)))) +
geom_point() + geom_smooth(formula=y~x, color="blue", method="loess") +
theme(legend.position="bottom")
## Warning: Computation failed in `stat_smooth()`:
## workspace required (2400430050) is too large probably because of setting 'se = TRUE'.
library(olsrr)
##
## Attaching package: 'olsrr'
## The following object is masked from 'package:datasets':
##
## rivers
ols_plot_cooksd_bar(modelo.1)
library(ggfortify)
autoplot(modelo.1)
#library(regclass)
#VIF(modelo.1)
library(tidyverse)
sample %>%
ggplot(aes(x=ingresos, y=gastos, color=ingresos)) +
geom_point(shape=16, show.legend=FALSE) +
scale_color_gradient(low="#32aeff", high="#f2aeff") +
geom_smooth(formula=y~x,method=lm, linetype="dashed",
color="darkgray", fill="blue")
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:olsrr':
##
## cement
## The following object is masked from 'package:dplyr':
##
## select
summary(stepAIC(modelo.1, trace=0))
##
## Call:
## lm(formula = gastos ~ ingresos, data = sample)
##
## Residuals:
## Min 1Q Median 3Q Max
## -680682 -103413 120 103707 633250
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.706e+05 7.233e+03 65.06 <2e-16 ***
## ingresos 2.999e-01 9.417e-05 3185.32 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 153500 on 39998 degrees of freedom
## Multiple R-squared: 0.9961, Adjusted R-squared: 0.9961
## F-statistic: 1.015e+07 on 1 and 39998 DF, p-value: < 2.2e-16
library(MASS)
AIC(modelo.1)
## [1] 1068852
This information is from a study in central Florida in which 15 alligators were captured and two measurements were taken on each animal. The weight (in pounds) was recorded along with the snout vent length (in inches: this is the distance from the back of the head to the tip of the nose).
alligator = data.frame(
lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)
library(lattice)
As with most analyses, the first step is to explore the data to get a visual impression of whether there is a relationship between weight and snout vent length and what form it is likely to take. We create a scatter plot of the data as follows:
xyplot(lnWeight ~ lnLength, data = alligator,
xlab = "Snout vent length (inches) on log scale",
ylab = "Weight (pounds) on log scale",
main = "Alligators in Central Florida"
)
The plot suggests that weight (on the log scale) increases linearly with snout vent length (again on the log scale), so we will fit a simple linear regression model to the data and save the fitted model to an object for later analysis:
alli.mod1 = lm(lnWeight ~ lnLength, data = alligator)
The lm function fits a linear model to the data. We specify the model using a formula in which the response variable appears on the left-hand side, separated from the explanatory variables. The formula provides a flexible way of specifying several different functional forms for the relationship. The data argument tells R where to look for the variables used in the formula.
Now that the model is saved as an object, we can use some of the general-purpose functions to extract information from this object about the linear model, e.g. the parameters or the residuals. The great advantage of R is that functions are defined for different types of models under the same name, such as summary, and the system determines which function we intend to use based on the type of the saved object. To create a summary of the fitted model:
summary(alli.mod1)
##
## Call:
## lm(formula = lnWeight ~ lnLength, data = alligator)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.24348 -0.03186 0.03740 0.07727 0.12669
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.4761 0.5007 -16.93 3.08e-10 ***
## lnLength 3.4311 0.1330 25.80 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1229 on 13 degrees of freedom
## Multiple R-squared: 0.9808, Adjusted R-squared: 0.9794
## F-statistic: 665.8 on 1 and 13 DF, p-value: 1.495e-12
Here we obtain a great deal of useful information.
The estimate of the model intercept is -8.4761, the coefficient measuring the slope of the relationship with snout vent length is 3.4311, and information about the standard errors of these estimates is also provided in the Coefficients table. The significance tests for the model coefficients are summarized in that table as well, so we can see that there is strong evidence that the coefficient is significantly different from zero: as snout vent length increases, so does weight.
Rather than stopping here, we carry out some investigation using residual diagnostics to determine whether the various assumptions underlying linear regression are reasonable for our data, or whether there is evidence suggesting that additional variables, or some other alteration, are required in the model to identify a better description of the variables that determine how weight changes.
A plot of the residuals against the fitted values is used to determine whether there are systematic patterns, such as overestimation for most of the large values or increasing spread as the fitted values of the model increase. To create this plot we could use the following code:
xyplot(resid(alli.mod1) ~ fitted(alli.mod1),
xlab = "Fitted Values",
ylab = "Residuals",
main = "Residual Diagnostic Plot",
panel = function(x, y, ...)
{
panel.grid(h = -1, v = -1)
panel.abline(h = 0)
panel.xyplot(x, y, ...)
}
)
We create our own custom panel function using the building blocks provided by the lattice package. We start by creating a set of grid lines as the base layer; h = -1 and v = -1 tell lattice to align them with the axis labels. We then add a solid horizontal line to help distinguish between positive and negative residuals. Finally, the points are drawn in the top layer.
The plot is probably fine, but there are more cases of positive residuals, and when we consider a normal probability plot we see that the model has some shortcomings:
qqmath( ~ resid(alli.mod1),
xlab = "Theoretical Quantiles",
ylab = "Residuals"
)
The resid function extracts the model residuals from the fitted model object.
The Boston housing data are a data set in the MASS package. The data set has 506 rows and 14 columns. We analyze and evaluate the factors that affect the median value of owner-occupied homes in the suburbs of Boston; the factors include variables on structural quality, neighborhood, accessibility and air pollution, such as the per-capita crime rate by town, the proportion of non-retail business acres per town, the index of accessibility to radial highways, and so on.
library(MASS)
library(ggplot2)
attach(Boston)
names(Boston)
## [1] "crim" "zn" "indus" "chas" "nox" "rm" "age"
## [8] "dis" "rad" "tax" "ptratio" "black" "lstat" "medv"
##Sample the dataset. The return for this is row nos.
set.seed(1)
row.number <- sample(1:nrow(Boston), 0.8*nrow(Boston))
train = Boston[row.number,]
test = Boston[-row.number,]
dim(train)
## [1] 404 14
dim(test)
## [1] 102 14
##Explore the data.
ggplot(Boston, aes(medv)) + geom_density(fill="blue")
ggplot(train, aes(log(medv))) + geom_density(fill="blue")
ggplot(train, aes(sqrt(medv))) + geom_density(fill="blue")
#Let's make default model.
model1 = lm(log(medv)~., data=train)
summary(model1)
##
## Call:
## lm(formula = log(medv) ~ ., data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.73932 -0.09713 -0.01923 0.08883 0.86529
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.133e+00 2.370e-01 17.438 < 2e-16 ***
## crim -1.166e-02 1.636e-03 -7.123 5.14e-12 ***
## zn 1.116e-03 6.129e-04 1.821 0.06941 .
## indus 2.134e-03 2.718e-03 0.785 0.43286
## chas 1.084e-01 3.797e-02 2.854 0.00454 **
## nox -7.142e-01 1.727e-01 -4.135 4.35e-05 ***
## rm 8.303e-02 1.907e-02 4.353 1.72e-05 ***
## age -9.102e-05 5.898e-04 -0.154 0.87743
## dis -5.104e-02 9.132e-03 -5.589 4.29e-08 ***
## rad 1.645e-02 2.885e-03 5.700 2.36e-08 ***
## tax -7.018e-04 1.624e-04 -4.322 1.96e-05 ***
## ptratio -3.593e-02 6.048e-03 -5.941 6.29e-09 ***
## black 4.138e-04 1.201e-04 3.447 0.00063 ***
## lstat -2.957e-02 2.238e-03 -13.213 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1921 on 390 degrees of freedom
## Multiple R-squared: 0.7914, Adjusted R-squared: 0.7844
## F-statistic: 113.8 on 13 and 390 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(model1)
# remove the less significant feature
model2 = update(model1, ~.-zn-indus-age)
summary(model2)
##
## Call:
## lm(formula = log(medv) ~ crim + chas + nox + rm + dis + rad +
## tax + ptratio + black + lstat, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.73727 -0.10583 -0.02177 0.09436 0.86896
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.1511772 0.2361239 17.581 < 2e-16 ***
## crim -0.0114749 0.0016304 -7.038 8.74e-12 ***
## chas 0.1098383 0.0377278 2.911 0.003804 **
## nox -0.7160222 0.1599660 -4.476 9.97e-06 ***
## rm 0.0854763 0.0184393 4.636 4.85e-06 ***
## dis -0.0450161 0.0073599 -6.116 2.31e-09 ***
## rad 0.0156919 0.0027803 5.644 3.19e-08 ***
## tax -0.0006071 0.0001455 -4.171 3.74e-05 ***
## ptratio -0.0390424 0.0056372 -6.926 1.78e-11 ***
## black 0.0004127 0.0001198 3.445 0.000632 ***
## lstat -0.0294784 0.0021172 -13.923 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1923 on 393 degrees of freedom
## Multiple R-squared: 0.7894, Adjusted R-squared: 0.784
## F-statistic: 147.3 on 10 and 393 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(model2)
##Plot the residual plot with all predictors.
attach(train)
## The following objects are masked from Boston:
##
## age, black, chas, crim, dis, indus, lstat, medv, nox, ptratio, rad,
## rm, tax, zn
require(gridExtra)
## Loading required package: gridExtra
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
plot1 = ggplot(train, aes(crim, residuals(model2))) + geom_point() + geom_smooth()
plot2=ggplot(train, aes(chas, residuals(model2))) + geom_point() + geom_smooth()
plot3=ggplot(train, aes(nox, residuals(model2))) + geom_point() + geom_smooth()
plot4=ggplot(train, aes(rm, residuals(model2))) + geom_point() + geom_smooth()
plot5=ggplot(train, aes(dis, residuals(model2))) + geom_point() + geom_smooth()
plot6=ggplot(train, aes(rad, residuals(model2))) + geom_point() + geom_smooth()
plot7=ggplot(train, aes(tax, residuals(model2))) + geom_point() + geom_smooth()
plot8=ggplot(train, aes(ptratio, residuals(model2))) + geom_point() + geom_smooth()
plot9=ggplot(train, aes(black, residuals(model2))) + geom_point() + geom_smooth()
plot10=ggplot(train, aes(lstat, residuals(model2))) + geom_point() + geom_smooth()
grid.arrange(plot1,plot2,plot3,plot4,plot5,plot6,plot7,plot8,plot9,plot10,ncol=2,nrow=5)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : at -0.005
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : radius 2.5e-05
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : all data on boundary of neighborhood. make span bigger
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at -0.005
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 0.005
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 1
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 1.01
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : zero-width neighborhood. make span bigger
## Warning: Computation failed in `stat_smooth()`:
## NA/NaN/Inf en llamada a una función externa (arg 5)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
#Lets make default model and add square term in the model.
model3 = lm(log(medv)~crim+chas+nox+rm+dis+rad+tax+ptratio+
black+lstat+ I(crim^2)+ I(chas^2)+I(nox^2)+ I(rm^2)+ I(dis^2)+
I(rad^2)+ I(tax^2)+ I(ptratio^2)+ I(black^2)+ I(lstat^2), data=train)
summary(model3)
##
## Call:
## lm(formula = log(medv) ~ crim + chas + nox + rm + dis + rad +
## tax + ptratio + black + lstat + I(crim^2) + I(chas^2) + I(nox^2) +
## I(rm^2) + I(dis^2) + I(rad^2) + I(tax^2) + I(ptratio^2) +
## I(black^2) + I(lstat^2), data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58373 -0.08611 -0.01228 0.08528 0.77344
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.273e+00 8.749e-01 9.456 < 2e-16 ***
## crim -3.291e-02 4.505e-03 -7.306 1.61e-12 ***
## chas 1.124e-01 3.223e-02 3.487 0.000546 ***
## nox -6.286e-01 1.074e+00 -0.585 0.558693
## rm -8.026e-01 1.324e-01 -6.063 3.20e-09 ***
## dis -1.202e-01 2.452e-02 -4.900 1.41e-06 ***
## rad 1.628e-02 9.436e-03 1.726 0.085217 .
## tax -3.393e-04 5.300e-04 -0.640 0.522477
## ptratio -1.592e-01 7.163e-02 -2.222 0.026843 *
## black 1.314e-03 5.115e-04 2.568 0.010594 *
## lstat -5.419e-02 5.487e-03 -9.876 < 2e-16 ***
## I(crim^2) 2.961e-04 6.690e-05 4.426 1.25e-05 ***
## I(chas^2) NA NA NA NA
## I(nox^2) -2.450e-01 8.002e-01 -0.306 0.759664
## I(rm^2) 6.752e-02 1.036e-02 6.520 2.22e-10 ***
## I(dis^2) 6.899e-03 1.936e-03 3.564 0.000411 ***
## I(rad^2) 2.739e-04 3.730e-04 0.734 0.463258
## I(tax^2) -4.613e-07 6.474e-07 -0.712 0.476601
## I(ptratio^2) 3.751e-03 2.040e-03 1.839 0.066742 .
## I(black^2) -2.355e-06 1.129e-06 -2.085 0.037695 *
## I(lstat^2) 7.380e-04 1.520e-04 4.854 1.77e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1627 on 384 degrees of freedom
## Multiple R-squared: 0.8526, Adjusted R-squared: 0.8453
## F-statistic: 116.9 on 19 and 384 DF, p-value: < 2.2e-16
# Removing the insignificant variables.
model4=update(model3, ~.-nox-rad-tax-I(crim^2)-I(chas^2)-I(rad^2)-
I(tax^2)-I(ptratio^2)-I(black^2))
summary(model4)
##
## Call:
## lm(formula = log(medv) ~ crim + chas + rm + dis + ptratio + black +
## lstat + I(nox^2) + I(rm^2) + I(dis^2) + I(lstat^2), data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.75555 -0.08920 -0.00584 0.08572 0.83906
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.8984602 0.4469706 15.434 < 2e-16 ***
## crim -0.0113984 0.0013782 -8.271 2.10e-15 ***
## chas 0.1335828 0.0340957 3.918 0.000105 ***
## rm -0.9135506 0.1388913 -6.577 1.53e-10 ***
## dis -0.0771922 0.0230393 -3.350 0.000885 ***
## ptratio -0.0210271 0.0049197 -4.274 2.41e-05 ***
## black 0.0002769 0.0001078 2.568 0.010585 *
## lstat -0.0506485 0.0056777 -8.921 < 2e-16 ***
## I(nox^2) -0.5290802 0.1127763 -4.691 3.75e-06 ***
## I(rm^2) 0.0778438 0.0108068 7.203 3.03e-12 ***
## I(dis^2) 0.0038669 0.0019568 1.976 0.048840 *
## I(lstat^2) 0.0005754 0.0001559 3.691 0.000255 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1747 on 392 degrees of freedom
## Multiple R-squared: 0.8266, Adjusted R-squared: 0.8217
## F-statistic: 169.9 on 11 and 392 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(model4)
pred1 <- predict(model4, newdata = test)
rmse <- sqrt(sum((exp(pred1) - test$medv)^2)/length(test$medv))
c(RMSE = rmse, R2=summary(model4)$r.squared)
## RMSE R2
## 4.8235100 0.8265999
par(mfrow=c(1,1))
plot(test$medv, exp(pred1))
This example shows one way to approach linear regression modelling. The model still has room for improvement: techniques such as outlier detection and correlation screening could be applied to further improve the accuracy of the predictions. More advanced techniques, such as Random Forests or Boosting, could also be tried to check whether the accuracy can be pushed further. One caveat is that we should avoid overfitting the model to the training data, since an overfitted model will show a marked drop in accuracy on the test data.
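One hedged way to check for overfitting with the objects already created above (`train`, `test`, `model4`) is to compare the RMSE on the training data against the RMSE on the test data; a much lower training RMSE would point to overfitting. A minimal sketch:
# Hedged sketch: compare training-set and test-set RMSE for model4
# (assumes the train, test and model4 objects created above are available).
# medv was modelled on the log scale, so fitted values are back-transformed with exp().
rmse_train <- sqrt(mean((exp(fitted(model4)) - train$medv)^2))
rmse_test  <- sqrt(mean((exp(predict(model4, newdata = test)) - test$medv)^2))
c(train = rmse_train, test = rmse_test)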
library(statsr)
datos <- data.frame(x1 = rnorm(10), x2 = rnorm(10), y = rnorm(10))
In statistics, linear regression (or linear fitting) is a mathematical model used to approximate the relationship between a response variable \(\vec{y}\), \(p\) explanatory variables \(\vec{x}_{i}\) with \(p{\in}\mathbb{Z}^{+}\), and a random error term \(\vec{\varepsilon}\). The method is applicable in many situations where the relationship between two or more variables is studied or a behaviour is to be predicted, including many fields unrelated to technology. When no regression model can reasonably be fitted in a study, this is usually taken as evidence of no (linear) association between the variables under study.
library(scatterplot3d)
plot3d <- scatterplot3d(datos$x1,datos$x2,datos$y,
angle = 55, scale.y=0.7, pch=16, color ="red", main ="Regression Plane")
my.lm <- lm(y ~ x1 + x2, data=datos)
plot3d$plane3d(my.lm, lty.box = "solid")
Given the regression model stated in its general form
\[\vec{y} = \boldsymbol{X}\vec{\beta} + \vec{\varepsilon}\]
Rearranging, the errors are equal to the difference
\[\vec{\varepsilon} = \vec{y} - \boldsymbol{X}\vec{\beta}\]
The idea behind finding the best-fitting line is therefore to minimise the sum of squared errors, i.e. the squared vertical distances from the observations to the fitted line, so the expression to be minimised is:
\[ \begin{align} S(\widehat{\vec{\beta}}) &= {\vec{\varepsilon}}^{t}\vec{\varepsilon}\\ &= \left(\vec{y} - \boldsymbol{X}\widehat{\vec{\beta}}\right)^{t}\left(\vec{y} - \boldsymbol{X}\widehat{\vec{\beta}}\right)\\ &= \vec{y}^{t}\left(\vec{y} - \boldsymbol{X}\widehat{\vec{\beta}}\right) - \left(\boldsymbol{X}\widehat{\vec{\beta}}\right)^{t}\left(\vec{y} - \boldsymbol{X}\widehat{\vec{\beta}}\right)\\ &= \vec{y}^{t}\vec{y} - \vec{y}^{t}\boldsymbol{X}\widehat{\vec{\beta}} - \widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\vec{y} + \widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\boldsymbol{X}\widehat{\vec{\beta}}\\ &= \vec{y}^{t}\vec{y} - 2\widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\vec{y} + \widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\boldsymbol{X}\widehat{\vec{\beta}} \end{align} \]
(using that \(\vec{y}^{t}\boldsymbol{X}\widehat{\vec{\beta}}\) is a scalar, so it equals its transpose \(\widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\vec{y}\)).
\[ \begin{align} \frac{{\partial}}{{\partial}\vec{\beta}}S(\vec{\beta}) &= \frac{{\partial}}{{\partial}\vec{\beta}}\left({\vec{y}^{t}\vec{y} - 2\vec{\beta}^{t}\boldsymbol{X}^{t}\vec{y} + \vec{\beta}^{t}\boldsymbol{X}^{t}\boldsymbol{X}\vec{\beta}}\right)\\ &= -2\boldsymbol{X}^{t}\vec{y} + 2\boldsymbol{X}^{t}\boldsymbol{X}\vec{\beta} \end{align} \]
Setting this derivative equal to zero gives:
\[ \begin{align} \frac{{\partial}}{{\partial}\vec{\beta}}S(\vec{\beta}) = 0 &\rightarrow {2\boldsymbol{X}^{t}}{\vec{y} = 2\boldsymbol{X}^{t}\boldsymbol{X}\vec{\beta}}\\ &\rightarrow {\boldsymbol{X}^{t}}{\vec{y} = \boldsymbol{X}^{t}\boldsymbol{X}\vec{\beta}} \end{align} \]
Note:
This system admits a solution if:
\[rank(\boldsymbol{X}^{t}\boldsymbol{X}|\boldsymbol{X}^{t}\vec{y})=rank(\boldsymbol{X}^{t}\boldsymbol{X})\]
This holds because:
\(rank(\boldsymbol{X}^{t}\boldsymbol{X}|\boldsymbol{X}^{t}\vec{y}){\geq}rank(\boldsymbol{X}^{t}\boldsymbol{X})\)
\(rank(\boldsymbol{X}^{t}\boldsymbol{X}|\boldsymbol{X}^{t}\vec{y})=rank\left[\boldsymbol{X}^{t}(\boldsymbol{X}|\vec{y})\right]{\leq}rank(\boldsymbol{X}){=}rank(\boldsymbol{X}^{t}\boldsymbol{X})\)
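When \(\boldsymbol{X}^{t}\boldsymbol{X}\) is of full rank, the normal equations can be solved directly. As an illustrative sketch in R (simulated data; every object name below is hypothetical), the direct solution matches the coefficients returned by `lm()`:
# Illustrative sketch (not part of the analysis above): solve the normal
# equations X'X b = X'y directly and compare with lm(). All names are made up.
set.seed(1)
n  <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)
X  <- cbind(1, x1, x2)                     # design matrix with intercept column
b_normal <- solve(t(X) %*% X, t(X) %*% y)  # solution of the normal equations
b_lm     <- coef(lm(y ~ x1 + x2))          # same fit via lm()
cbind(normal_equations = drop(b_normal), lm = b_lm)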
\[ \begin{align} \vec{y} = \begin{bmatrix}{y}_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n}\\ \end{bmatrix} &{\sim} N\left( \vec{\mu} = \begin{bmatrix} {\mu}_{1}\\ {\mu}_{2}\\ \vdots\\ {\mu}_{n}\\ \end{bmatrix},\boldsymbol{{\Sigma}} = \begin{bmatrix} {\sigma}_{1}^{2} & 0 & \cdots & 0\\ 0 & {\sigma}_{2}^{2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & {\sigma}_{n}^{2} \end{bmatrix} \right) \end{align} \]
\[ \begin{align} f\left(\vec{y};\vec{\mu},\boldsymbol{\Sigma}\right) &= \frac{1}{\left({2{\pi}}\right)^{n/2}\left|{\boldsymbol{\Sigma}}\right|^{1/2}}{\exp{\left\{-\frac{1}{2}\left({\vec{y}-\vec{\mu}}\right)^{t}{\boldsymbol{\Sigma}}^{-1}\left({\vec{y}-\vec{\mu}}\right)\right\}}} \end{align} \]
\[ \begin{align} l\left(\vec{\mu},\boldsymbol{\Sigma};\vec{y}\right) &= \log f\left(\vec{y};\vec{\mu},\boldsymbol{\Sigma}\right)\\ &= -\frac{n}{2}\log\left({2{\pi}}\right) - \frac{1}{2}\log\left|{\boldsymbol{\Sigma}}\right| - \frac{1}{2}\left({\vec{y}-\vec{\mu}}\right)^{t}{\boldsymbol{\Sigma}}^{-1}\left({\vec{y}-\vec{\mu}}\right) \end{align} \]
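As a hedged sketch of this log-likelihood (assuming the homoscedastic case \(\boldsymbol{\Sigma}={\sigma}^{2}\boldsymbol{I}\) and using the built-in `cars` data purely as an example), the value computed by hand matches `logLik()`:
# Hedged sketch: Gaussian log-likelihood of a fitted linear model, assuming
# Sigma = sigma^2 * I. The cars data set is used only as an example.
fit  <- lm(dist ~ speed, data = cars)
mu   <- fitted(fit)
sig2 <- sum(residuals(fit)^2) / nobs(fit)   # ML estimate of sigma^2
ll   <- sum(dnorm(cars$dist, mean = mu, sd = sqrt(sig2), log = TRUE))
c(manual = ll, logLik = as.numeric(logLik(fit)))  # the two values should agree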
For the multiple linear regression model, the best linear unbiased estimator (BLUE) of an estimable linear parametric function \({\vec{\lambda}}^{t}\vec{\beta}\) is \({\vec{\lambda}}^{t}\widehat{\vec{\beta}}\), where \(\widehat{\vec{\beta}} = \boldsymbol{G}\boldsymbol{X}^{t}\vec{y}\) is a solution of the normal equations, \(\boldsymbol{G}\) is a generalized inverse of \(\boldsymbol{X}^{t}\boldsymbol{X}\), and \(\boldsymbol{H} = \boldsymbol{G}\boldsymbol{X}^{t}\boldsymbol{X}\).
\[ \boldsymbol{X}^{t}\boldsymbol{X}\widehat{\vec{\beta}}=\boldsymbol{X}^{t}\vec{y} \]
\[ \begin{align} E\left[{\vec{\lambda}}^{t}\widehat{\vec{\beta}}\right] &= E\left[{\vec{\lambda}}^{t}\boldsymbol{G}\boldsymbol{X}^{t}\vec{y}\right]\\ &= {\vec{\lambda}}^{t}\boldsymbol{G}\boldsymbol{X}^{t}E\left[\vec{y}\right]\\ &= {\vec{\lambda}}^{t}\boldsymbol{G}\boldsymbol{X}^{t}\boldsymbol{X}\vec{\beta}\\ &= {\vec{\lambda}}^{t}\boldsymbol{H}\vec{\beta}\\ &= {\vec{\lambda}}^{t}\vec{\beta} \end{align} \]
Estimability condition: \({\vec{\lambda}}^{t}\boldsymbol{H} = {\vec{\lambda}}^{t}\). Now let \({\vec{d}}^{t}\vec{y}\) be any other linear unbiased estimator of \({\vec{\lambda}}^{t}\vec{\beta}\); unbiasedness requires:
\[ \begin{align} E\left[{\vec{d}}^{t}\vec{y}\right] &= {\vec{\lambda}}^{t}\vec{\beta}\\ {\vec{d}}^{t}E\left[\vec{y}\right] &= {\vec{\lambda}}^{t}\vec{\beta}\\ {\vec{d}}^{t}\boldsymbol{X}\vec{\beta} &= {\vec{\lambda}}^{t}\vec{\beta} \end{align} \]
The above implies \({\vec{d}}^{t}\boldsymbol{X} = {\vec{\lambda}}^{t}\).
\[ \begin{align} V\left[{\vec{\lambda}}^{t}\widehat{\vec{\beta}}\right] &= {\vec{\lambda}}^{t}V\left[\widehat{\vec{\beta}}\right]{\vec{\lambda}}\\ &= {\vec{\lambda}}^{t}V\left[\boldsymbol{G}\boldsymbol{X}^{t}{\vec{y}}\right]{\vec{\lambda}}\\ &= {\vec{\lambda}}^{t}\boldsymbol{G}\boldsymbol{X}^{t}V\left[{\vec{y}}\right]\boldsymbol{X}\boldsymbol{G}^{t}{\vec{\lambda}}\\ &= {\vec{\lambda}}^{t}\boldsymbol{G}\boldsymbol{X}^{t}\boldsymbol{X}\boldsymbol{G}^{t}{\vec{\lambda}}\sigma^{2}\\ &= {\vec{\lambda}}^{t}\boldsymbol{G}\boldsymbol{H}^{t}{\vec{\lambda}}\sigma^{2}\\ &= {\vec{\lambda}}^{t}\boldsymbol{G}{\vec{\lambda}}\sigma^{2} \end{align} \]
\[ \begin{align} C\left[{\vec{\lambda}}^{t}\widehat{\vec{\beta}},{\vec{d}}^{t}\vec{y}\right] &= C\left[{\vec{\lambda}}^{t}\boldsymbol{G}\boldsymbol{X}^{t}\vec{y},{\vec{d}}^{t}\vec{y}\right]\\ &= {\vec{\lambda}}^{t}\boldsymbol{G}\boldsymbol{X}^{t}{\vec{d}}\,\sigma^{2}\\ &= {\vec{\lambda}}^{t}\boldsymbol{G}{\vec{\lambda}}\,\sigma^{2} \end{align} \]
\[ \begin{align} V\left[{\vec{\lambda}}^{t}\widehat{\vec{\beta}}-{\vec{d}}^{t}\vec{y}\right] &= V\left[{\vec{\lambda}}^{t}\widehat{\vec{\beta}}\right]-2C\left[{\vec{\lambda}}^{t}\widehat{\vec{\beta}},{\vec{d}}^{t}\vec{y}\right]+V\left[{\vec{d}}^{t}\vec{y}\right]\\ &= {\vec{\lambda}}^{t}\boldsymbol{G}{\vec{\lambda}}\sigma^{2}-2{\vec{\lambda}}^{t}\boldsymbol{G}{\vec{\lambda}}\sigma^{2}+V\left[{\vec{d}}^{t}\vec{y}\right]\\ &= -{\vec{\lambda}}^{t}\boldsymbol{G}{\vec{\lambda}}\sigma^{2}+V\left[{\vec{d}}^{t}\vec{y}\right]\\ \end{align} \]
\[ \begin{align} -{\vec{\lambda}}^{t}\boldsymbol{G}{\vec{\lambda}}\sigma^{2}+V\left[{\vec{d}}^{t}\vec{y}\right] &\geq 0\\ V\left[{\vec{d}}^{t}\vec{y}\right] &\geq {\vec{\lambda}}^{t}\boldsymbol{G}{\vec{\lambda}}\sigma^{2}\\ V\left[{\vec{d}}^{t}\vec{y}\right] &\geq V\left[{\vec{\lambda}}^{t}\widehat{\vec{\beta}}\right] \end{align} \]
In conclusion, \({\vec{\lambda}}^{t}\widehat{\vec{\beta}}\) has variance no larger than that of any other linear unbiased estimator \({\vec{d}}^{t}\vec{y}\), so it is the BLUE of \({\vec{\lambda}}^{t}\vec{\beta}\); equality holds only when
\[{\vec{d}}^{t}\vec{y} = {\vec{\lambda}}^{t}\widehat{\vec{\beta}}\]
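A numerical sketch of these quantities, offered only as an assumption-laden illustration (the names `G`, `H`, `lambda`, `beta_hat` are introduced here and are not objects from the analysis above):
# Illustrative sketch (hypothetical names): compute G, H and lambda' beta-hat
# numerically and check the estimability condition lambda' H = lambda'.
library(MASS)                         # ginv(): Moore-Penrose generalized inverse
set.seed(2)
n <- 40
x <- rnorm(n)
X <- cbind(1, x)                      # design matrix
y <- 3 + 1.5 * x + rnorm(n)
G <- ginv(t(X) %*% X)                 # a generalized inverse of X'X
H <- G %*% t(X) %*% X                 # H = G X'X
beta_hat <- G %*% t(X) %*% y          # a solution of the normal equations
lambda <- c(0, 1)                     # lambda' beta = the slope
all.equal(drop(t(lambda) %*% H), lambda)              # estimability: lambda' H = lambda'
c(blue = drop(t(lambda) %*% beta_hat), lm = unname(coef(lm(y ~ x))[2]))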
\[ \begin{align} SC_E &= \left(\vec{y} - \boldsymbol{X}\widehat{\vec{\beta}}\right)^{t}\left(\vec{y} - \boldsymbol{X}\widehat{\vec{\beta}}\right)\\ &= \left(\vec{y} - \boldsymbol{X}\widehat{\vec{\beta}}\right)^{t}\vec{y} - \left(\vec{y} - \boldsymbol{X}\widehat{\vec{\beta}}\right)^{t}\boldsymbol{X}\widehat{\vec{\beta}}\\ &= \left(\vec{y}^{t} - \widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\right)\vec{y} - \left(\vec{y}^{t} - \widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\right)\boldsymbol{X}\widehat{\vec{\beta}}\\ &= \vec{y}^{t}\vec{y} - \widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\vec{y} - \vec{y}^{t}\boldsymbol{X}\widehat{\vec{\beta}} + \widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\boldsymbol{X}\widehat{\vec{\beta}}\\ &= \vec{y}^{t}\vec{y} - \widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\vec{y} - \left(\vec{y}^{t}\boldsymbol{X}\widehat{\vec{\beta}}\right)^{t} + \widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\vec{y}\\ &= \vec{y}^{t}\vec{y} - \widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\vec{y} - \widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\vec{y} + \widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\vec{y}\\ &= \vec{y}^{t}\vec{y} - 2\widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\vec{y} + \widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\vec{y}\\ &= \vec{y}^{t}\vec{y} - \widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\vec{y} \end{align} \]
The term \(\widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\vec{y}\) is called the Regression (or Model) Sum of Squares, \(SC_R\), also written \({SC}\left[{\beta}_{1},{\beta}_{2},\ldots,{\beta}_{p}\right]\).
The term \(\vec{y}^{t}\vec{y} = {\sum}_{i=1}^{n}{y}_{i}^{2}\) is called the Total Sum of Squares, \(SC_T\). In summary, we have arrived at:
\[ SC_T = SC_R +SC_E \]
\[ \begin{align} E\left[SC_R\right] &= E\left[SC_T - SC_E\right]\\ &= E\left[SC_T\right] - E\left[SC_E\right]\\ &= E\left[{\sum}_{i=1}^{n}{y}_{i}^{2}\right] - \left(n-r\right){\sigma}^{2}\\ &= {\sum}_{i=1}^{n}\left\{V\left[{y}_{i}\right] + E\left[{y}_{i}\right]^{2}\right\} - \left(n-r\right){\sigma}^{2}\\ &= n{\sigma}^{2} + E\left[\vec{y}\right]^{t}E\left[\vec{y}\right] - \left(n-r\right){\sigma}^{2}\\ &= \left[n - \left(n - r\right)\right]{\sigma}^{2} + \vec{\beta}^{t}\boldsymbol{X}^{t}\boldsymbol{X}\vec{\beta}\\ &= r{\sigma}^{2} + \vec{\beta}^{t}\boldsymbol{X}^{t}\boldsymbol{X}\vec{\beta}\\ \end{align} \]
where \(r = rank(\boldsymbol{X})\) and the standard result \(E\left[SC_E\right] = \left(n-r\right){\sigma}^{2}\) was used.
Source | df | Sum of squares - SC | Mean square - CM | \(E[CM]\) |
---|---|---|---|---|
Model | r | \(SC_R=\widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\vec{y}\) | \(CM_R=\frac{SC_R}{r}\) | \(E[CM_R]={\sigma}^{2} + \frac{1}{r}\vec{\beta}^{t}\boldsymbol{X}^{t}\boldsymbol{X}\vec{\beta}\) |
Error | n-r | \(SC_E=\vec{y}^{t}\vec{y} - \widehat{\vec{\beta}}^{t}\boldsymbol{X}^{t}\vec{y}\) | \(CM_E=\frac{SC_E}{n-r}\) | \(E[CM_E]={\sigma}^{2}\) |
Total | n | \(SC_T=\vec{y}^{t}\vec{y}\) | \(CM_T=\frac{SC_T}{n}\) | |
Hence \(E[CM_R] {\geq} E[CM_E]\), with equality only if \(\vec{\beta}^{t}\boldsymbol{X}^{t}\boldsymbol{X}\vec{\beta} = 0\), or equivalently \(\boldsymbol{X}\vec{\beta} = 0\).
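A short hedged sketch on simulated data (all names hypothetical) confirming the decomposition \(SC_T = SC_R + SC_E\) numerically:
# Illustrative sketch (toy data, hypothetical names): verify SC_T = SC_R + SC_E.
set.seed(3)
n  <- 30
x  <- runif(n)
y  <- 2 + 4 * x + rnorm(n)
X  <- cbind(1, x)
bh <- solve(t(X) %*% X, t(X) %*% y)                # beta-hat from the normal equations
SC_T <- drop(t(y) %*% y)                           # total (uncorrected) sum of squares
SC_R <- drop(t(bh) %*% t(X) %*% y)                 # regression sum of squares
SC_E <- drop(t(y - X %*% bh) %*% (y - X %*% bh))   # error sum of squares
c(SC_T = SC_T, SC_R = SC_R, SC_E = SC_E)
all.equal(SC_T, SC_R + SC_E)                       # TRUE: the decomposition holds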
\[y_i=\beta_0 + \beta_1{\times}x_{i,1} + \beta_2{\times}x_{i,2} + \cdots + \beta_p{\times}x_{i,p} + error_{i}\]
\[error_{i}{\sim}N(0,\sigma_{error}^2)\]
\[\boldsymbol{y}_{(n{\times}1)}=\boldsymbol{X}_{(n{\times}p)}\boldsymbol{\beta}_{(p{\times}1)}+\boldsymbol{error}_{(n{\times}1)}\]
\[\boldsymbol{error}_{(n{\times}1)}{\sim}N(\boldsymbol{0}_{(n{\times}1)},\sigma_{error}^2\boldsymbol{I}_{(n{\times}n)})\]
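Before moving to the applied example, here is a minimal sketch (illustrative names only, not part of the analysis below) that simulates data from this matrix formulation and recovers \(\boldsymbol{\beta}\) with `lm()`:
# Minimal sketch (hypothetical names): simulate from y = X beta + error,
# with error ~ N(0, sigma^2 I), and recover beta with lm().
set.seed(4)
n  <- 100
x1 <- rnorm(n); x2 <- runif(n)
X  <- cbind(1, x1, x2)                       # n x p design matrix
beta <- c(1, -2, 0.5)
y  <- drop(X %*% beta) + rnorm(n, sd = 0.3)  # error ~ N(0, 0.3^2 I)
coef(lm(y ~ x1 + x2))                        # estimates should be close to beta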
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
library(broom)
library(tidyverse)
library(ggfortify)
library(mosaic)
## Registered S3 method overwritten by 'mosaic':
## method from
## fortify.SpatialPolygonsDataFrame ggplot2
##
## The 'mosaic' package masks several functions from core packages in order to add
## additional features. The original behavior of these functions should not be affected by this.
##
## Attaching package: 'mosaic'
## The following objects are masked from 'package:car':
##
## deltaMethod, logit
## The following objects are masked from 'package:dplyr':
##
## count, do, tally
## The following object is masked from 'package:purrr':
##
## cross
## The following object is masked from 'package:ggplot2':
##
## stat
## The following object is masked from 'package:BayesFactor':
##
## compare
## The following object is masked from 'package:Matrix':
##
## mean
## The following objects are masked from 'package:stats':
##
## binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
## quantile, sd, t.test, var
## The following objects are masked from 'package:base':
##
## max, mean, min, prod, range, sample, sum
library(huxtable)
##
## Attaching package: 'huxtable'
## The following object is masked from 'package:dplyr':
##
## add_rownames
## The following object is masked from 'package:ggplot2':
##
## theme_grey
library(jtools)
library(latex2exp)
library(pubh)
## Loading required package: emmeans
## Loading required package: gtsummary
##
## Attaching package: 'gtsummary'
## The following object is masked from 'package:huxtable':
##
## as_flextable
## The following object is masked from 'package:MASS':
##
## select
## Loading required package: magrittr
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
library(sjlabelled)
##
## Attaching package: 'sjlabelled'
## The following object is masked from 'package:huxtable':
##
## set_label
## The following object is masked from 'package:forcats':
##
## as_factor
## The following object is masked from 'package:dplyr':
##
## as_label
## The following object is masked from 'package:ggplot2':
##
## as_label
library(sjPlot)
##
## Attaching package: 'sjPlot'
## The following object is masked from 'package:huxtable':
##
## font_size
library(sjmisc)
##
## Attaching package: 'sjmisc'
## The following objects are masked from 'package:jtools':
##
## %nin%, center
## The following objects are masked from 'package:huxtable':
##
## add_columns, add_rows, print_html, print_md
## The following object is masked from 'package:purrr':
##
## is_empty
## The following object is masked from 'package:tidyr':
##
## replace_na
## The following object is masked from 'package:tibble':
##
## add_case
library(Ecdat)
## Loading required package: Ecfun
##
## Attaching package: 'Ecfun'
## The following object is masked from 'package:base':
##
## sign
##
## Attaching package: 'Ecdat'
## The following object is masked from 'package:carData':
##
## Mroz
## The following object is masked from 'package:MASS':
##
## SP500
## The following object is masked from 'package:datasets':
##
## Orange
data(birthwt, package = "MASS")
library(tidyverse)
birthwt <- birthwt %>%
mutate(
age = as.numeric(age),
lwt = as.numeric(lwt),
smoke = factor(smoke, labels = c("Non-smoker", "Smoker")),
race = factor(race, labels = c("White", "African American", "Other")),
bwt = as.numeric(bwt)
) %>%
var_labels(
bwt = 'Birth weight (g)',
smoke = 'Smoking status',
race = 'Race'
)
…
glimpse(birthwt)
## Rows: 189
## Columns: 10
## $ low <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ age <dbl> 19, 33, 20, 21, 18, 21, 22, 17, 29, 26, 19, 19, 22, 30, 18, 18, …
## $ lwt <dbl> 182, 155, 105, 108, 107, 124, 118, 103, 123, 113, 95, 150, 95, 1…
## $ race <fct> African American, Other, White, White, White, Other, White, Othe…
## $ smoke <fct> Non-smoker, Non-smoker, Smoker, Smoker, Smoker, Non-smoker, Non-…
## $ ptl <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0…
## $ ht <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ ui <int> 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1…
## $ ftv <int> 0, 3, 1, 2, 0, 0, 1, 1, 1, 0, 0, 1, 0, 2, 0, 0, 0, 3, 0, 1, 2, 3…
## $ bwt <dbl> 2523, 2551, 2557, 2594, 2600, 2622, 2637, 2637, 2663, 2665, 2722…
birthwt %>%
group_by(race, smoke) %>%
summarise(
n = n(),
Mean = mean(bwt, na.rm = TRUE),
Median = median(bwt, na.rm = TRUE),
SD = sd(bwt, na.rm = TRUE),
CV = rel_dis(bwt)
)
## `summarise()` has grouped output by 'race'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 7
## # Groups: race [3]
## race smoke n Mean Median SD CV
## <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 White Non-smoker 44 3429. 3593 710. 0.207
## 2 White Smoker 52 2827. 2776. 626. 0.222
## 3 African American Non-smoker 16 2854. 2920 621. 0.218
## 4 African American Smoker 10 2504 2381 637. 0.254
## 5 Other Non-smoker 55 2816. 2807 709. 0.252
## 6 Other Smoker 12 2757. 3146. 810. 0.294
birthwt %>%
gen_bst_df(bwt ~ race|smoke)
Birth weight (g) | LowerCI | UpperCI | Race | Smoking status |
---|---|---|---|---|
3.43e+03 | 3.22e+03 | 3.64e+03 | White | Non-smoker |
2.83e+03 | 2.66e+03 | 2.99e+03 | White | Smoker |
2.85e+03 | 2.56e+03 | 3.13e+03 | African American | Non-smoker |
2.5e+03 | 2.07e+03 | 2.87e+03 | African American | Smoker |
2.82e+03 | 2.63e+03 | 2.99e+03 | Other | Non-smoker |
2.76e+03 | 2.3e+03 | 3.15e+03 | Other | Smoker |
birthwt %>%
bar_error(bwt ~ race, fill = ~ smoke) %>%
axis_labs() %>%
gf_labs(fill = "Smoking status:")
library(PerformanceAnalytics)
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
##
## first, last
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
chart.Correlation(birthwt[,c(2,3,10)], histogram = TRUE, pch = 19)
sapply(birthwt, function(x) sum(is.na(x)))
## low age lwt race smoke ptl ht ui ftv bwt
## 0 0 0 0 0 0 0 0 0 0
# Helper: matrix of p-values from pairwise correlation tests (cor.test).
cor.mtest <- function(mat, ...) {
  mat <- as.matrix(mat)
  n <- ncol(mat)
  p.mat <- matrix(NA, n, n)
  diag(p.mat) <- 0
  # Fill the upper triangle (and mirror it) with the p-value of each pairwise test.
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      tmp <- cor.test(mat[, i], mat[, j], ...)
      p.mat[i, j] <- p.mat[j, i] <- tmp$p.value
    }
  }
  colnames(p.mat) <- rownames(p.mat) <- colnames(mat)
  p.mat
}
p.mat <- cor.mtest(birthwt[,c(2,3,10)])
library(corrplot)
## corrplot 0.92 loaded
##
## Attaching package: 'corrplot'
## The following object is masked _by_ '.GlobalEnv':
##
## cor.mtest
birthwt.cor <- cor(birthwt[,c(2,3,10)])
corrplot(birthwt.cor, method = "number", type = "upper",
tl.cex = 0.9, number.cex = 0.6, order="hclust", diag = FALSE,
addCoef.col = "black", tl.col = "black",
p.mat = p.mat, sig.level = 0.05, insig = "blank")
set.seed(0123456789)
library(dplyr)
birthwt.train <- sample_frac(tbl = birthwt, replace = FALSE, size = 0.80)
birthwt.test <- anti_join(birthwt, birthwt.train)
## Joining, by = c("low", "age", "lwt", "race", "smoke", "ptl", "ht", "ui", "ftv",
## "bwt")
model_norm <- lm(bwt ~ smoke + race, data = birthwt.train)
library(ggfortify)
autoplot(model_norm)
model_norm %>% augment() %>% as_tibble()
.rownames | bwt | smoke | race | .fitted | .resid | .hat | .sigma | .cooksd | .std.resid |
---|---|---|---|---|---|---|---|---|---|
212 | 3.94e+03 | Non-smoker | Other | 2.91e+03 | 1.03e+03 | 0.0208 | 611 | 0.0152 | 1.69 |
155 | 3.27e+03 | Non-smoker | Other | 2.91e+03 | 364 | 0.0208 | 617 | 0.00189 | 0.597 |
199 | 3.77e+03 | Non-smoker | Other | 2.91e+03 | 860 | 0.0208 | 613 | 0.0106 | 1.41 |
101 | 2.77e+03 | Smoker | White | 2.91e+03 | -139 | 0.019 | 617 | 0.000253 | -0.229 |
131 | 3.06e+03 | Non-smoker | White | 3.34e+03 | -276 | 0.0221 | 617 | 0.00116 | -0.454 |
132 | 3.06e+03 | Smoker | White | 2.91e+03 | 154 | 0.019 | 617 | 0.000307 | 0.252 |
180 | 3.57e+03 | Smoker | Other | 2.48e+03 | 1.09e+03 | 0.0394 | 611 | 0.0336 | 1.81 |
52 | 2.3e+03 | Non-smoker | Other | 2.91e+03 | -609 | 0.0208 | 615 | 0.00532 | -1 |
97 | 2.73e+03 | Non-smoker | Other | 2.91e+03 | -177 | 0.0208 | 617 | 0.000451 | -0.291 |
118 | 2.95e+03 | Smoker | White | 2.91e+03 | 39.6 | 0.019 | 618 | 2.04e-05 | 0.065 |
94 | 2.66e+03 | Smoker | White | 2.91e+03 | -245 | 0.019 | 617 | 0.000784 | -0.403 |
79 | 2.47e+03 | Smoker | White | 2.91e+03 | -442 | 0.019 | 616 | 0.00255 | -0.726 |
95 | 2.66e+03 | Smoker | White | 2.91e+03 | -243 | 0.019 | 617 | 0.000771 | -0.399 |
35 | 2.08e+03 | Smoker | White | 2.91e+03 | -824 | 0.019 | 614 | 0.00885 | -1.35 |
106 | 2.84e+03 | Non-smoker | Other | 2.91e+03 | -75.5 | 0.0208 | 617 | 8.15e-05 | -0.124 |
32 | 2.06e+03 | Non-smoker | Other | 2.91e+03 | -855 | 0.0208 | 613 | 0.0105 | -1.4 |
125 | 2.92e+03 | Smoker | White | 2.91e+03 | 13.6 | 0.019 | 618 | 2.41e-06 | 0.0223 |
145 | 3.2e+03 | Non-smoker | Other | 2.91e+03 | 293 | 0.0208 | 617 | 0.00122 | 0.48 |
24 | 1.89e+03 | Non-smoker | Other | 2.91e+03 | -1.02e+03 | 0.0208 | 612 | 0.0148 | -1.67 |
68 | 2.41e+03 | Smoker | White | 2.91e+03 | -494 | 0.019 | 616 | 0.00318 | -0.811 |
161 | 3.32e+03 | Non-smoker | African American | 3.02e+03 | 298 | 0.0506 | 617 | 0.0033 | 0.498 |
98 | 2.75e+03 | Non-smoker | Other | 2.91e+03 | -159 | 0.0208 | 617 | 0.000364 | -0.262 |
133 | 3.06e+03 | Smoker | White | 2.91e+03 | 154 | 0.019 | 617 | 0.000307 | 0.252 |
17 | 1.59e+03 | Non-smoker | Other | 2.91e+03 | -1.32e+03 | 0.0208 | 608 | 0.025 | -2.17 |
219 | 4.05e+03 | Non-smoker | White | 3.34e+03 | 716 | 0.0221 | 615 | 0.00783 | 1.18 |
205 | 3.86e+03 | Smoker | White | 2.91e+03 | 948 | 0.019 | 612 | 0.0117 | 1.55 |
71 | 2.44e+03 | Non-smoker | African American | 3.02e+03 | -581 | 0.0506 | 616 | 0.0125 | -0.968 |
83 | 2.5e+03 | Non-smoker | African American | 3.02e+03 | -524 | 0.0506 | 616 | 0.0102 | -0.873 |
211 | 3.94e+03 | Smoker | White | 2.91e+03 | 1.03e+03 | 0.019 | 611 | 0.0139 | 1.69 |
105 | 2.82e+03 | Smoker | White | 2.91e+03 | -87.4 | 0.019 | 617 | 9.94e-05 | -0.143 |
187 | 3.63e+03 | Smoker | White | 2.91e+03 | 721 | 0.019 | 615 | 0.00676 | 1.18 |
15 | 1.47e+03 | Non-smoker | Other | 2.91e+03 | -1.44e+03 | 0.0208 | 606 | 0.0295 | -2.36 |
78 | 2.47e+03 | Smoker | Other | 2.48e+03 | -14.7 | 0.0394 | 618 | 6.12e-06 | -0.0244 |
213 | 3.94e+03 | Non-smoker | White | 3.34e+03 | 603 | 0.0221 | 615 | 0.00555 | 0.991 |
197 | 3.76e+03 | Smoker | White | 2.91e+03 | 848 | 0.019 | 613 | 0.00936 | 1.39 |
139 | 3.1e+03 | Non-smoker | Other | 2.91e+03 | 194 | 0.0208 | 617 | 0.000536 | 0.318 |
136 | 3.09e+03 | Non-smoker | White | 3.34e+03 | -248 | 0.0221 | 617 | 0.00094 | -0.408 |
163 | 3.32e+03 | Smoker | Other | 2.48e+03 | 840 | 0.0394 | 613 | 0.0199 | 1.39 |
225 | 4.59e+03 | Non-smoker | White | 3.34e+03 | 1.25e+03 | 0.0221 | 609 | 0.0241 | 2.06 |
164 | 3.33e+03 | Smoker | Other | 2.48e+03 | 850 | 0.0394 | 613 | 0.0204 | 1.41 |
37 | 2.12e+03 | Smoker | Other | 2.48e+03 | -356 | 0.0394 | 617 | 0.00357 | -0.59 |
40 | 2.13e+03 | Smoker | African American | 2.59e+03 | -463 | 0.0561 | 616 | 0.00891 | -0.774 |
59 | 2.37e+03 | Smoker | African American | 2.59e+03 | -222 | 0.0561 | 617 | 0.00205 | -0.371 |
176 | 3.54e+03 | Non-smoker | Other | 2.91e+03 | 634 | 0.0208 | 615 | 0.00574 | 1.04 |
54 | 2.32e+03 | Non-smoker | Other | 2.91e+03 | -585 | 0.0208 | 616 | 0.0049 | -0.961 |
34 | 2.08e+03 | Smoker | White | 2.91e+03 | -824 | 0.019 | 614 | 0.00885 | -1.35 |
76 | 2.45e+03 | Non-smoker | Other | 2.91e+03 | -460 | 0.0208 | 616 | 0.00303 | -0.756 |
141 | 3.15e+03 | Smoker | White | 2.91e+03 | 239 | 0.019 | 617 | 0.000741 | 0.391 |
127 | 3.03e+03 | Smoker | White | 2.91e+03 | 125 | 0.019 | 617 | 0.000202 | 0.204 |
82 | 2.5e+03 | Smoker | Other | 2.48e+03 | 14.3 | 0.0394 | 618 | 5.73e-06 | 0.0236 |
203 | 3.8e+03 | Non-smoker | White | 3.34e+03 | 461 | 0.0221 | 616 | 0.00324 | 0.757 |
210 | 3.91e+03 | Non-smoker | White | 3.34e+03 | 574 | 0.0221 | 616 | 0.00503 | 0.943 |
19 | 1.73e+03 | Non-smoker | Other | 2.91e+03 | -1.18e+03 | 0.0208 | 610 | 0.02 | -1.94 |
49 | 2.28e+03 | Non-smoker | Other | 2.91e+03 | -628 | 0.0208 | 615 | 0.00565 | -1.03 |
57 | 2.35e+03 | Non-smoker | White | 3.34e+03 | -985 | 0.0221 | 612 | 0.0148 | -1.62 |
201 | 3.77e+03 | Non-smoker | Other | 2.91e+03 | 860 | 0.0208 | 613 | 0.0106 | 1.41 |
108 | 2.84e+03 | Non-smoker | White | 3.34e+03 | -502 | 0.0221 | 616 | 0.00385 | -0.825 |
88 | 2.59e+03 | Smoker | White | 2.91e+03 | -314 | 0.019 | 617 | 0.00129 | -0.516 |
51 | 2.3e+03 | Smoker | White | 2.91e+03 | -612 | 0.019 | 615 | 0.00488 | -1 |
22 | 1.82e+03 | Smoker | White | 2.91e+03 | -1.09e+03 | 0.019 | 611 | 0.0155 | -1.79 |
69 | 2.42e+03 | Smoker | White | 2.91e+03 | -484 | 0.019 | 616 | 0.00306 | -0.795 |
168 | 3.4e+03 | Non-smoker | African American | 3.02e+03 | 383 | 0.0506 | 617 | 0.00544 | 0.639 |
144 | 3.2e+03 | Smoker | Other | 2.48e+03 | 722 | 0.0394 | 615 | 0.0147 | 1.2 |
206 | 3.86e+03 | Non-smoker | African American | 3.02e+03 | 841 | 0.0506 | 613 | 0.0262 | 1.4 |
96 | 2.72e+03 | Non-smoker | Other | 2.91e+03 | -188 | 0.0208 | 617 | 0.000508 | -0.309 |
169 | 3.42e+03 | Non-smoker | White | 3.34e+03 | 77.9 | 0.0221 | 617 | 9.26e-05 | 0.128 |
217 | 4e+03 | Non-smoker | White | 3.34e+03 | 659 | 0.0221 | 615 | 0.00663 | 1.08 |
77 | 2.47e+03 | Smoker | White | 2.91e+03 | -442 | 0.019 | 616 | 0.00255 | -0.726 |
23 | 1.88e+03 | Smoker | White | 2.91e+03 | -1.02e+03 | 0.019 | 612 | 0.0136 | -1.68 |
193 | 3.65e+03 | Smoker | White | 2.91e+03 | 743 | 0.019 | 614 | 0.00718 | 1.22 |
195 | 3.7e+03 | Non-smoker | White | 3.34e+03 | 361 | 0.0221 | 617 | 0.00199 | 0.593 |
159 | 3.3e+03 | Smoker | Other | 2.48e+03 | 822 | 0.0394 | 614 | 0.0191 | 1.36 |
87 | 2.56e+03 | Smoker | White | 2.91e+03 | -351 | 0.019 | 617 | 0.00161 | -0.576 |
31 | 2.06e+03 | Non-smoker | Other | 2.91e+03 | -855 | 0.0208 | 613 | 0.0105 | -1.4 |
166 | 3.37e+03 | Non-smoker | African American | 3.02e+03 | 355 | 0.0506 | 617 | 0.00468 | 0.593 |
46 | 2.24e+03 | Non-smoker | Other | 2.91e+03 | -670 | 0.0208 | 615 | 0.00643 | -1.1 |
148 | 3.22e+03 | Non-smoker | Other | 2.91e+03 | 315 | 0.0208 | 617 | 0.00142 | 0.516 |
179 | 3.54e+03 | Non-smoker | Other | 2.91e+03 | 634 | 0.0208 | 615 | 0.00574 | 1.04 |
185 | 3.61e+03 | Non-smoker | White | 3.34e+03 | 276 | 0.0221 | 617 | 0.00116 | 0.453 |
86 | 2.55e+03 | Non-smoker | Other | 2.91e+03 | -359 | 0.0208 | 617 | 0.00185 | -0.59 |
191 | 3.65e+03 | Non-smoker | White | 3.34e+03 | 313 | 0.0221 | 617 | 0.0015 | 0.514 |
173 | 3.46e+03 | Non-smoker | White | 3.34e+03 | 121 | 0.0221 | 617 | 0.000223 | 0.199 |
113 | 2.91e+03 | Smoker | White | 2.91e+03 | -2.39 | 0.019 | 618 | 7.42e-08 | -0.00392 |
135 | 3.09e+03 | Non-smoker | Other | 2.91e+03 | 180 | 0.0208 | 617 | 0.000461 | 0.295 |
150 | 3.23e+03 | Non-smoker | Other | 2.91e+03 | 322 | 0.0208 | 617 | 0.00148 | 0.528 |
220 | 4.11e+03 | Non-smoker | White | 3.34e+03 | 773 | 0.0221 | 614 | 0.00912 | 1.27 |
112 | 2.88e+03 | Non-smoker | White | 3.34e+03 | -461 | 0.0221 | 616 | 0.00325 | -0.758 |
29 | 1.94e+03 | Smoker | White | 2.91e+03 | -972 | 0.019 | 612 | 0.0123 | -1.6 |
62 | 2.38e+03 | Non-smoker | Other | 2.91e+03 | -529 | 0.0208 | 616 | 0.00401 | -0.869 |
18 | 1.7e+03 | Non-smoker | African American | 3.02e+03 | -1.32e+03 | 0.0506 | 607 | 0.0643 | -2.2 |
25 | 1.9e+03 | Non-smoker | Other | 2.91e+03 | -1.01e+03 | 0.0208 | 612 | 0.0146 | -1.66 |
102 | 2.78e+03 | Non-smoker | African American | 3.02e+03 | -241 | 0.0506 | 617 | 0.00214 | -0.401 |
56 | 2.35e+03 | Smoker | White | 2.91e+03 | -555 | 0.019 | 616 | 0.00402 | -0.911 |
91 | 2.62e+03 | Non-smoker | Other | 2.91e+03 | -288 | 0.0208 | 617 | 0.00119 | -0.474 |
222 | 4.17e+03 | Non-smoker | White | 3.34e+03 | 829 | 0.0221 | 614 | 0.0105 | 1.36 |
99 | 2.75e+03 | Non-smoker | Other | 2.91e+03 | -160 | 0.0208 | 617 | 0.000368 | -0.264 |
190 | 3.65e+03 | Non-smoker | White | 3.34e+03 | 313 | 0.0221 | 617 | 0.0015 | 0.514 |
146 | 3.2e+03 | Non-smoker | Other | 2.91e+03 | 293 | 0.0208 | 617 | 0.00122 | 0.48 |
209 | 3.88e+03 | Smoker | White | 2.91e+03 | 976 | 0.019 | 612 | 0.0124 | 1.6 |
134 | 3.08e+03 | Non-smoker | White | 3.34e+03 | -258 | 0.0221 | 617 | 0.00102 | -0.424 |
184 | 3.61e+03 | Non-smoker | White | 3.34e+03 | 276 | 0.0221 | 617 | 0.00116 | 0.453 |
181 | 3.57e+03 | Non-smoker | Other | 2.91e+03 | 662 | 0.0208 | 615 | 0.00626 | 1.09 |
45 | 2.22e+03 | Smoker | White | 2.91e+03 | -683 | 0.019 | 615 | 0.00608 | -1.12 |
33 | 2.08e+03 | Non-smoker | White | 3.34e+03 | -1.26e+03 | 0.0221 | 609 | 0.0241 | -2.06 |
129 | 3.06e+03 | Non-smoker | White | 3.34e+03 | -276 | 0.0221 | 617 | 0.00116 | -0.454 |
111 | 2.88e+03 | Non-smoker | Other | 2.91e+03 | -33.5 | 0.0208 | 618 | 1.6e-05 | -0.055 |
114 | 2.92e+03 | Non-smoker | White | 3.34e+03 | -418 | 0.0221 | 617 | 0.00267 | -0.687 |
20 | 1.79e+03 | Smoker | White | 2.91e+03 | -1.12e+03 | 0.019 | 610 | 0.0163 | -1.83 |
27 | 1.93e+03 | Smoker | White | 2.91e+03 | -980 | 0.019 | 612 | 0.0125 | -1.61 |
151 | 3.23e+03 | Non-smoker | White | 3.34e+03 | -104 | 0.0221 | 617 | 0.000166 | -0.171 |
44 | 2.21e+03 | Smoker | Other | 2.48e+03 | -270 | 0.0394 | 617 | 0.00205 | -0.447 |
147 | 3.22e+03 | Non-smoker | Other | 2.91e+03 | 315 | 0.0208 | 617 | 0.00142 | 0.516 |
154 | 3.26e+03 | Smoker | Other | 2.48e+03 | 779 | 0.0394 | 614 | 0.0171 | 1.29 |
36 | 2.1e+03 | Non-smoker | White | 3.34e+03 | -1.24e+03 | 0.0221 | 609 | 0.0234 | -2.03 |
207 | 3.86e+03 | Non-smoker | White | 3.34e+03 | 522 | 0.0221 | 616 | 0.00416 | 0.858 |
119 | 2.95e+03 | Smoker | African American | 2.59e+03 | 359 | 0.0561 | 617 | 0.00537 | 0.601 |
221 | 4.15e+03 | Non-smoker | White | 3.34e+03 | 815 | 0.0221 | 614 | 0.0101 | 1.34 |
121 | 2.98e+03 | Non-smoker | African American | 3.02e+03 | -41.6 | 0.0506 | 618 | 6.42e-05 | -0.0694 |
89 | 2.6e+03 | Smoker | White | 2.91e+03 | -308 | 0.019 | 617 | 0.00124 | -0.506 |
61 | 2.38e+03 | Smoker | African American | 2.59e+03 | -208 | 0.0561 | 617 | 0.0018 | -0.348 |
126 | 3e+03 | Smoker | White | 2.91e+03 | 96.6 | 0.019 | 617 | 0.000122 | 0.158 |
138 | 3.1e+03 | Non-smoker | White | 3.34e+03 | -238 | 0.0221 | 617 | 0.000866 | -0.391 |
202 | 3.79e+03 | Non-smoker | African American | 3.02e+03 | 771 | 0.0506 | 614 | 0.022 | 1.29 |
115 | 2.92e+03 | Smoker | African American | 2.59e+03 | 331 | 0.0561 | 617 | 0.00456 | 0.554 |
162 | 3.32e+03 | Smoker | White | 2.91e+03 | 409 | 0.019 | 617 | 0.00217 | 0.67 |
123 | 2.98e+03 | Smoker | White | 2.91e+03 | 68.6 | 0.019 | 618 | 6.13e-05 | 0.113 |
223 | 4.17e+03 | Non-smoker | White | 3.34e+03 | 836 | 0.0221 | 614 | 0.0107 | 1.37 |
170 | 3.43e+03 | Smoker | White | 2.91e+03 | 522 | 0.019 | 616 | 0.00354 | 0.856 |
196 | 3.73e+03 | Non-smoker | White | 3.34e+03 | 390 | 0.0221 | 617 | 0.00232 | 0.641 |
50 | 2.3e+03 | Smoker | African American | 2.59e+03 | -293 | 0.0561 | 617 | 0.00357 | -0.49 |
175 | 3.47e+03 | Non-smoker | White | 3.34e+03 | 135 | 0.0221 | 617 | 0.000278 | 0.222 |
60 | 2.38e+03 | Smoker | African American | 2.59e+03 | -208 | 0.0561 | 617 | 0.0018 | -0.348 |
130 | 3.06e+03 | Non-smoker | African American | 3.02e+03 | 43.4 | 0.0506 | 618 | 6.97e-05 | 0.0723 |
85 | 2.52e+03 | Non-smoker | African American | 3.02e+03 | -496 | 0.0506 | 616 | 0.0091 | -0.827 |
93 | 2.64e+03 | Non-smoker | Other | 2.91e+03 | -273 | 0.0208 | 617 | 0.00107 | -0.449 |
200 | 3.77e+03 | Non-smoker | White | 3.34e+03 | 432 | 0.0221 | 616 | 0.00285 | 0.71 |
109 | 2.86e+03 | Non-smoker | Other | 2.91e+03 | -47.5 | 0.0208 | 618 | 3.23e-05 | -0.078 |
167 | 3.37e+03 | Smoker | White | 2.91e+03 | 466 | 0.019 | 616 | 0.00282 | 0.764 |
116 | 2.92e+03 | Non-smoker | African American | 3.02e+03 | -98.6 | 0.0506 | 617 | 0.00036 | -0.164 |
172 | 3.44e+03 | Smoker | African American | 2.59e+03 | 855 | 0.0561 | 613 | 0.0304 | 1.43 |
67 | 2.41e+03 | Smoker | White | 2.91e+03 | -498 | 0.019 | 616 | 0.00323 | -0.818 |
103 | 2.78e+03 | Smoker | White | 2.91e+03 | -126 | 0.019 | 617 | 0.000208 | -0.207 |
143 | 3.18e+03 | Non-smoker | Other | 2.91e+03 | 265 | 0.0208 | 617 | 0.001 | 0.434 |
65 | 2.41e+03 | Smoker | White | 2.91e+03 | -498 | 0.019 | 616 | 0.00323 | -0.818 |
156 | 3.27e+03 | Non-smoker | Other | 2.91e+03 | 364 | 0.0208 | 617 | 0.00189 | 0.597 |
188 | 3.64e+03 | Smoker | White | 2.91e+03 | 729 | 0.019 | 615 | 0.00691 | 1.2 |
204 | 3.83e+03 | Non-smoker | White | 3.34e+03 | 489 | 0.0221 | 616 | 0.00365 | 0.803 |
208 | 3.88e+03 | Non-smoker | Other | 2.91e+03 | 974 | 0.0208 | 612 | 0.0136 | 1.6 |
128 | 3.04e+03 | Smoker | African American | 2.59e+03 | 453 | 0.0561 | 616 | 0.00854 | 0.758 |
104 | 2.81e+03 | Non-smoker | Other | 2.91e+03 | -103 | 0.0208 | 617 | 0.000153 | -0.17 |
174 | 3.46e+03 | Non-smoker | White | 3.34e+03 | 122 | 0.0221 | 617 | 0.000227 | 0.2 |
model_norm %>% tidy()
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 3.34e+03 | 91.5 | 36.5 | 1.58e-75 |
smokeSmoker | -430 | 108 | -3.99 | 0.000104 |
raceAfrican American | -320 | 149 | -2.14 | 0.0341 |
raceOther | -428 | 117 | -3.65 | 0.000367 |
model_norm %>% confint() %>% as_tibble()
2.5 % | 97.5 % |
---|---|
3.16e+03 | 3.52e+03 |
-643 | -217 |
-615 | -24.3 |
-659 | -196 |
model_norm %>%
glm_coef(labels = model_labels(model_norm))
Parameter | Coefficient | Pr(>|t|) |
---|---|---|
Constant | 3338.12 (3157.2, 3519.04) | < 0.001 |
Smoking status: Smoker | -429.74 (-642.58, -216.9) | < 0.001 |
Race: African American | -319.5 (-614.66, -24.35) | 0.034 |
Race: Other | -427.65 (-659.34, -195.95) | < 0.001 |
model_norm %>%
glm_coef(se_rob = TRUE, labels = model_labels(model_norm))
Parameter | Coefficient | Pr(>|t|) |
---|---|---|
Constant | 3338.12 (3157.12, 3519.13) | < 0.001 |
Smoking status: Smoker | -429.74 (-644.83, -214.65) | < 0.001 |
Race: African American | -319.5 (-587.4, -51.61) | 0.02 |
Race: Other | -427.65 (-671.48, -183.81) | < 0.001 |
model_norm %>%
plot_model("pred", terms = ~race|smoke, dot.size = 1.5, title = "")
emmip(model_norm, smoke ~ race) %>%
gf_labs(y = get_label(birthwt$bwt), x = "", col = "Smoking status")
library(regclass)
## Loading required package: bestglm
## Loading required package: leaps
## Loading required package: VGAM
## Loading required package: stats4
## Loading required package: splines
##
## Attaching package: 'VGAM'
## The following objects are masked from 'package:mosaic':
##
## chisq, logit
## The following object is masked from 'package:car':
##
## logit
## The following object is masked from 'package:coda':
##
## nvar
## Loading required package: rpart
## Loading required package: randomForest
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:gridExtra':
##
## combine
## The following object is masked from 'package:dplyr':
##
## combine
## The following object is masked from 'package:ggplot2':
##
## margin
## Important regclass change from 1.3:
## All functions that had a . in the name now have an _
## all.correlations -> all_correlations, cor.demo -> cor_demo, etc.
##
## Attaching package: 'regclass'
## The following object is masked from 'package:lattice':
##
## qq
model_norm %>% VIF() %>% as_tibble()
GVIF | Df | GVIF^(1/(2*Df)) |
---|---|---|
1.12 | 1 | 1.06 |
1.12 | 2 | 1.03 |
p1 <- ggplot(birthwt.train, aes(birthwt.train[,2], residuals(model_norm))) +
geom_point() + geom_smooth(color = "blue")
p2 <- ggplot(birthwt.train, aes(birthwt.train[,3], residuals(model_norm))) +
geom_point() + geom_smooth(color = "blue")
p3 <- ggplot(birthwt.train, aes(birthwt.train[,10], residuals(model_norm))) +
geom_point() + geom_smooth(color = "blue")
library(pdp)
##
## Attaching package: 'pdp'
## The following object is masked from 'package:purrr':
##
## partial
library(gridExtra)
grid.arrange(p1, p2, p3)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
library(olsrr)
model_norm %>% ols_plot_cooksd_bar()
model_norm %>%
Anova() %>%
tidy()
term | sumsq | df | statistic | p.value |
---|---|---|---|---|
smoke | 6.03e+06 | 1 | 15.9 | 0.000104 |
race | 5.46e+06 | 2 | 7.21 | 0.00103 |
Residuals | 5.57e+07 | 147 |
model_norm %>%
tidy()
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 3.34e+03 | 91.5 | 36.5 | 1.58e-75 |
smokeSmoker | -430 | 108 | -3.99 | 0.000104 |
raceAfrican American | -320 | 149 | -2.14 | 0.0341 |
raceOther | -428 | 117 | -3.65 | 0.000367 |
model_norm %>%
glm_coef(labels = model_labels(model_norm))
Parameter | Coefficient | Pr(>|t|) |
---|---|---|
Constant | 3338.12 (3157.2, 3519.04) | < 0.001 |
Smoking status: Smoker | -429.74 (-642.58, -216.9) | < 0.001 |
Race: African American | -319.5 (-614.66, -24.35) | 0.034 |
Race: Other | -427.65 (-659.34, -195.95) | < 0.001 |
model_norm %>%
glm_coef(se_rob = TRUE, labels = model_labels(model_norm))
Parameter | Coefficient | Pr(>|t|) |
---|---|---|
Constant | 3338.12 (3157.12, 3519.13) | < 0.001 |
Smoking status: Smoker | -429.74 (-644.83, -214.65) | < 0.001 |
Race: African American | -319.5 (-587.4, -51.61) | 0.02 |
Race: Other | -427.65 (-671.48, -183.81) | < 0.001 |
model_norm %>% glance()
r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
---|---|---|---|---|---|---|---|---|---|---|---|
0.136 | 0.119 | 615 | 7.73 | 7.88e-05 | 3 | -1.18e+03 | 2.37e+03 | 2.39e+03 | 5.57e+07 | 147 | 151 |
library(MASS)
model_norm_AIC <- stepAIC(model_norm, trace = 0)
model_norm_AIC %>%
Anova() %>%
tidy()
term | sumsq | df | statistic | p.value |
---|---|---|---|---|
smoke | 6.03e+06 | 1 | 15.9 | 0.000104 |
race | 5.46e+06 | 2 | 7.21 | 0.00103 |
Residuals | 5.57e+07 | 147 |
model_norm_AIC %>% glance()
r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
---|---|---|---|---|---|---|---|---|---|---|---|
0.136 | 0.119 | 615 | 7.73 | 7.88e-05 | 3 | -1.18e+03 | 2.37e+03 | 2.39e+03 | 5.57e+07 | 147 | 151 |
model_norm %>% Anova() %>% tidy()
term | sumsq | df | statistic | p.value |
---|---|---|---|---|
smoke | 6.03e+06 | 1 | 15.9 | 0.000104 |
race | 5.46e+06 | 2 | 7.21 | 0.00103 |
Residuals | 5.57e+07 | 147 |
AIC(model_norm, model_norm_AIC)
df | AIC |
---|---|
5 | 2.37e+03 |
5 | 2.37e+03 |
#library(relaimpo)
#calc.relimp(model_norm_AIC, type = c("lmg", "last", "first", "pratt", "betasq"), rela = T)
#boot <- boot.relimp(model_norm, b = 1000, type = c("lmg", "last", "first", "pratt"),
# rank = TRUE, diff = TRUE, rela = TRUE)
#booteval.relimp(boot)
#plot(booteval.relimp(boot,sort=TRUE))