1. Using a little bit of algebra, prove that (4.2) is equivalent to (4.3). In other words, the logistic function representation and logit representation for the logistic regression model are equivalent.
(4.2) \(\hspace{1cm}\) \(p(X)=\frac{e^{\beta_0+\beta_1X}}{1+e^{\beta_0+\beta_1X}}\) \(\hspace{1cm}\) is equivalent to \(\hspace{1cm}\) (4.3) \(\hspace{1cm}\) \(\frac{p(X)}{1-p(X)}=e^{\beta_0+\beta_1X}\)
\(\rightarrow\) \(\hspace{1cm}\) \(p(X)=\frac{e^{\beta_0+\beta_1X}}{1+e^{\beta_0+\beta_1X}}\) \(\hspace{1cm}\) Multiplying both sides by the denominator \(\hspace{1cm}\) \(p(X)(1+e^{\beta_0+\beta_1X})=e^{\beta_0+\beta_1X}\)
\(\rightarrow\) \(\hspace{1cm}\) \(p(X)(1+e^{\beta_0+\beta_1X})=e^{\beta_0+\beta_1X}\) \(\hspace{1cm}\) Distributing \(p(X)\) \(\hspace{1cm}\) \(p(X)+p(X)e^{\beta_0+\beta_1X}=e^{\beta_0+\beta_1X}\)
\(\rightarrow\) \(\hspace{1cm}\) \(p(X)+p(X)e^{\beta_0+\beta_1X}=e^{\beta_0+\beta_1X}\) \(\hspace{1cm}\) Subtracting \(p(X)e^{\beta_0+\beta_1X}\) from both sides \(\hspace{1cm}\) \(p(X)=e^{\beta_0+\beta_1X}-p(X)e^{\beta_0+\beta_1X}\)
\(\rightarrow\) \(\hspace{1cm}\) \(p(X)=e^{\beta_0+\beta_1X}-p(X)e^{\beta_0+\beta_1X}\) \(\hspace{1cm}\) Factoring out the exponential \(\hspace{1cm}\) \(p(X)=e^{\beta_0+\beta_1X}(1-p(X))\)
\(\rightarrow\) \(\hspace{1cm}\) \(p(X)=e^{\beta_0+\beta_1X}(1-p(X))\) \(\hspace{1cm}\) Dividing both sides by \(1-p(X)\) \(\hspace{1cm}\) \(\frac{p(X)}{1-p(X)}=e^{\beta_0+\beta_1X}\)
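As a quick numerical sanity check (the values \(\beta_0=-1\), \(\beta_1=2\), \(X=0.5\) below are arbitrary illustrative choices), the odds computed from (4.2) match the right-hand side of (4.3):

b0 <- -1; b1 <- 2; X <- 0.5                 # arbitrary illustrative values
p <- exp(b0 + b1*X)/(1 + exp(b0 + b1*X))    # logistic form (4.2): p = 0.5
p/(1 - p)                                   # odds from (4.2): 1
exp(b0 + b1*X)                              # logit form (4.3): also 1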
2. It was stated in the text that classifying an observation to the class for which (4.12) is largest is equivalent to classifying an observation to the class for which (4.13) is largest. Prove that this is the case. In other words, under the assumption that the observations in the kth class are drawn from a \(N(\mu_k, \sigma^2)\) distribution, the Bayes’ classifier assigns an observation to the class for which the discriminant function is maximized.
(4.12) \(\hspace{1cm}\) \(p_k(X)=\frac{\pi_k\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{1}{2\sigma^2}(X-\mu_k)^2}}{\sum_{l=1}^{K}\pi_l\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{1}{2\sigma^2}(X-\mu_l)^2}}\) \(\hspace{1cm}\) The factor \(\frac{1}{\sqrt{2\pi}\sigma}\) does not depend on \(l\), so it can be pulled out of the sum in the denominator and cancelled against the same factor in the numerator. Expanding the exponentials:
\(\rightarrow\) \(\hspace{1cm}\) \(=\frac{e^{-\frac{1}{2\sigma^2}X^2}e^{\frac{1}{\sigma^2}X\mu_k}e^{-\frac{1}{2\sigma^2}\mu_k^2}\pi_k}{\sum_{l=1}^{K}e^{-\frac{1}{2\sigma^2}X^2}e^{\frac{1}{\sigma^2}X\mu_l}e^{-\frac{1}{2\sigma^2}\mu_l^2}\pi_l}\)
Cancelling the common exponential \(e^{-\frac{1}{2\sigma^2}X^2}\), which does not depend on \(l\) and can therefore be pulled out of the sum:
\(\rightarrow\) \(\hspace{1cm}\) \(p_k(X)=\frac{e^{\frac{1}{\sigma^2}X\mu_k}e^{-\frac{1}{2\sigma^2}\mu_k^2}\pi_k}{\sum_{l=1}^{K}e^{\frac{1}{\sigma^2}X\mu_l}e^{-\frac{1}{2\sigma^2}\mu_l^2}\pi_l}\)
The denominator is the same for every class \(k\), so maximizing \(p_k(X)\) amounts to maximizing the numerator:
\(\rightarrow\) \(\hspace{1cm}\) \({e^{\frac{1}{\sigma^2}X\mu_k}e^{-\frac{1}{2\sigma^2}\mu_k^2}\pi_k}\)
Taking the logarithm, which is a monotonically increasing function and therefore does not change which class maximizes the expression; the log of a product is the sum of the logs:
\(\rightarrow\) \(\hspace{1cm}\) \(\delta_k(X)=\log(\pi_k)+{\frac{\mu_k}{\sigma^2}X}-{\frac{\mu_k^2}{2\sigma^2}}\)
Therefore the class that maximizes (4.13) is exactly the class that maximizes (4.12), which is what we wanted to show.
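A quick numerical check (all parameter values below are made up, purely for illustration): for several values of \(X\), the class maximizing \(p_k(X)\) in (4.12) coincides with the class maximizing \(\delta_k(X)\) in (4.13):

# Check that argmax_k p_k(X) equals argmax_k delta_k(X); parameters are made up
mu <- c(-1, 0, 2); pik <- c(0.3, 0.5, 0.2); sigma <- 1.5
for (X in c(-2, 0, 1, 3)) {
  num   <- pik * dnorm(X, mean = mu, sd = sigma)          # numerator of (4.12)
  pk    <- num / sum(num)                                 # posterior p_k(X)
  delta <- log(pik) + mu*X/sigma^2 - mu^2/(2*sigma^2)     # discriminant (4.13)
  cat("X =", X, " argmax p_k:", which.max(pk), " argmax delta_k:", which.max(delta), "\n")
}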
3. This problem relates to the QDA model, in which the observations within each class are drawn from a normal distribution with a class-specific mean vector and a class-specific covariance matrix. We consider the simple case where p = 1; i.e. there is only one feature. Suppose that we have K classes, and that if an observation belongs to the kth class then X comes from a one-dimensional normal distribution, \(X \sim N(\mu_k, \sigma_k^2)\). Recall that the density function for the one-dimensional normal distribution is given in (4.11). Prove that in this case, the Bayes’ classifier is not linear. Argue that it is in fact quadratic.
Starting from the expression in the previous exercise and keeping only the numerator of \(p_k(X)\) (the denominator is once again common to all classes). Recall that here \(X \sim N(\mu_k, \sigma_k^2)\): the variance now depends on the class, so the constant \(\frac{1}{\sqrt{2\pi}}\) still cancels but the factor \(\frac{1}{\sigma_k}\) must be kept.
\(\rightarrow\) \(\hspace{1cm}\) \(\frac{\pi_k}{\sigma_k}e^{-\frac{1}{2\sigma_k^2}(X-\mu_k)^2}\) \(\hspace{1cm}\) Taking the logarithm \(\hspace{1cm}\) \(\log\left(\frac{\pi_k}{\sigma_k}e^{-\frac{1}{2\sigma_k^2}(X-\mu_k)^2}\right)\)
\(\rightarrow\) \(\hspace{1cm}\) \(\log\left(\frac{\pi_k}{\sigma_k}e^{-\frac{1}{2\sigma_k^2}(X-\mu_k)^2}\right)\) \(\hspace{1cm}\) Log properties \(\hspace{1cm}\) \(\log(\pi_k)-\log(\sigma_k)-\frac{1}{2\sigma_k^2}(X-\mu_k)^2\)
\(\rightarrow\) \(\hspace{1cm}\) \(\log(\pi_k)-\log(\sigma_k)-\frac{1}{2\sigma_k^2}(X-\mu_k)^2\) \(\hspace{1cm}\) Expanding \(\hspace{1cm}\) \(\log(\pi_k)-\log(\sigma_k)-\frac{X^2}{2\sigma_k^2}+\frac{\mu_kX}{\sigma_k^2}-\frac{\mu_k^2}{2\sigma_k^2}\)
The \(X^2\) term has coefficient \(-\frac{1}{2\sigma_k^2}\), which varies with the class, so it does not cancel when comparing two classes: the Bayes classifier is quadratic in \(X\), not linear.
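To see this concretely (all parameter values below are made up for illustration): with two classes of unequal variance, the set where \(\delta_1(X)=\delta_2(X)\) is defined by a quadratic equation in \(X\), so the boundary can consist of two points rather than a single one:

# With unequal variances, delta_1(X) = delta_2(X) is quadratic in X; made-up parameters
mu <- c(0, 1); sigma <- c(1, 3); pik <- c(0.5, 0.5)
delta <- function(X, k) log(pik[k]) - log(sigma[k]) - (X - mu[k])^2/(2*sigma[k]^2)
X <- seq(-10, 10, by = 0.01)
cls <- ifelse(delta(X, 1) > delta(X, 2), 1, 2)
range(X[cls == 1])  # class 1 wins only on a bounded interval: two boundary points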
5. We now examine the differences between LDA and QDA. (a) If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? On the test set?
If the Bayes decision boundary is linear, we expect QDA to fit the training set more closely, since its extra flexibility lets it chase noise in the training data; on the test set, however, LDA should perform better, because QDA's flexibility can lead to overfitting.
(b) If the Bayes decision boundary is non-linear, do we expect LDA or QDA to perform better on the training set? On the test set?
If the boundary is non-linear, we expect QDA to perform better on both the training set and the test set, since LDA cannot capture the non-linear boundary.
(c) In general, as the sample size n increases, do we expect the test prediction accuracy of QDA relative to LDA to improve, decline, or be unchanged? Why?
We expect it to improve. QDA estimates more parameters and therefore has higher variance; as \(n\) grows this variance shrinks, so QDA's flexibility costs less and its test accuracy relative to LDA improves.
(d) True or False: Even if the Bayes decision boundary for a given problem is linear, we will probably achieve a superior test error rate using QDA rather than LDA because QDA is flexible enough to model a linear decision boundary. Justify your answer.
False. With a modest sample size, LDA will generalize better: since the true boundary is already linear, QDA's extra flexibility buys no reduction in bias, but it does increase variance, so QDA tends to overfit the training data.
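A small simulation can illustrate (a) and (d). This is only a sketch, with arbitrary class means and sample sizes: two Gaussian classes with equal variance give a linear Bayes boundary, and we compare the training and test error rates of MASS::lda and MASS::qda:

# Simulation sketch: equal variances => linear Bayes boundary; LDA vs QDA
library(MASS)
set.seed(1)
make_data <- function(n) {
  data.frame(x = c(rnorm(n, mean = 0), rnorm(n, mean = 1)),  # same sd in both classes
             y = factor(rep(1:2, each = n)))
}
train <- make_data(25); test <- make_data(5000)
err <- function(fit, d) mean(predict(fit, d)$class != d$y)
fit_l <- lda(y ~ x, data = train)
fit_q <- qda(y ~ x, data = train)
c(lda_train = err(fit_l, train), qda_train = err(fit_q, train),
  lda_test  = err(fit_l, test),  qda_test  = err(fit_q, test))
# Typically QDA's training error is no worse than LDA's, while its test error is no better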
6. Suppose we collect data for a group of students in a statistics class with variables \(X_1 =\) hours studied, \(X_2 =\) undergrad GPA, and \(Y =\) receive an A. We fit a logistic regression and produce estimated coefficient, \(\hat{\beta_0}=-6\), \(\hat{\beta_1}=0.05\), \(\hat{\beta_2}=1\).
(a) Estimate the probability that a student who studies for 40 h and has an undergrad GPA of 3.5 gets an A in the class. \(X_1=40\) \(\hspace{1cm}\) \(X_2=3.5\) \(\hspace{1cm}\) Find the probability that the student gets an A. \[p(X)=\frac{e^{-6+0.05X_1+X_2}}{1+e^{-6+0.05X_1+X_2}}=0.377\]
exp(-6+(0.05*40)+3.5)/(1+exp(-6+(0.05*40)+3.5))  # 0.3775
(b) How many hours would the student in part (a) need to study to have a 50% chance of getting an A in the class?
Using equation (4.3), proved in Exercise 1:
\(\frac{p(X)}{1-p(X)}=e^{\beta_0+\beta_1X_1+\beta_2X_2}\) \(\hspace{1cm}\) \(\rightarrow\) \(\hspace{1cm}\) \(\frac{0.50}{1-0.50}=e^{-6+0.05X_1+X_2}\)
\(\rightarrow\) \(\hspace{1cm}\) \(e^{-6+0.05X_1+3.5}=1\) \(\hspace{1cm}\) Taking the log of both sides (\(\log 1=0\)) with \(X_2=3.5\) \(\hspace{1cm}\) \(X_1=\frac{6-3.5}{0.05}=50\) hours
(6-3.5)/0.05  # 50 hours
7. Suppose that we wish to predict whether a given stock will issue a dividend this year (“Yes” or “No”) based on \(X\), last year’s percent profit. We examine a large number of companies and discover that the mean value of \(X\) for companies that issued a dividend was \(\overline{X}=10\), while the mean for those that didn’t was \(\overline{X}=0\). In addition, the variance of \(X\) for these two sets of companies was \(\hat{\sigma}^2=36\). Finally, \(80\%\) of companies issued dividends. Assuming that X follows a normal distribution, predict the probability that a company will issue a dividend this year given that its percentage profit was \(X=4\) last year.
\[p_k(X)=\frac{\pi_ke^{-\frac{1}{2\sigma^2}(X-\mu_k)^2}}{\sum_{l=1}^{2}\pi_le^{-\frac{1}{2\sigma^2}(X-\mu_l)^2}}\]
Substituting the given values. There are two classes here: a given stock either does or does not issue a dividend this year. \(p(X=4)=\frac{0.8e^{-\frac{1}{2(36)}(4-10)^2}}{0.8e^{-\frac{1}{2(36)}(4-10)^2}+0.2e^{-\frac{1}{2(36)}(4-0)^2}}=0.752\)
(0.8*exp(-(1/72)*(4-10)^2))/((0.8*exp(-(1/72)*(4-10)^2))+(0.2*exp(-(1/72)*(4-0)^2)))  # 0.752
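Equivalently, since the normal-density constant \(\frac{1}{\sqrt{2\pi}\sigma}\) cancels in the ratio, the same posterior can be computed from the full densities with R's dnorm (here \(\sigma=\sqrt{36}=6\)):

0.8*dnorm(4, mean = 10, sd = 6)/(0.8*dnorm(4, mean = 10, sd = 6) + 0.2*dnorm(4, mean = 0, sd = 6))  # 0.752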