data = read.csv("C:/Users/Will/OneDrive/Documents/School/375T Predictive Analytics/HW7/Ch10Ex11.csv", header = FALSE)
We will confirm that there are noticeable differences between the two groups.
dataScaled = scale(data)
PVA = prcomp(dataScaled)
biplot(PVA)
As we can see, the biplot shows two clusters: the control group (right) and the diseased group (left). My recommendation is to build two 95% confidence rectangles, one centered on each cluster. The genes that lie outside of either rectangle are the ones that differ the most from the others. I was not able to build those graphics.
#library('ISLR')
data = USArrests
scaled = apply(data, 2, scale)
pr.out = prcomp(-scaled)
PVE = (pr.out$sdev)^2/sum((pr.out$sdev)^2)
PVE
## [1] 0.62006039 0.24744129 0.08914080 0.04335752
PCMatrix = pr.out$rotation
sumvar = sum(apply(scaled^2, 2, sum))
PVE2 = apply((scaled %*% PCMatrix)^2, 2, sum)/sumvar
PVE2
## PC1 PC2 PC3 PC4
## 0.62006039 0.24744129 0.08914080 0.04335752
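The agreement between the two calculations can also be checked programmatically. The following is a minimal sketch on the built-in `USArrests` data; the names `pve_sdev` and `pve_scores` are illustrative, and `scale()` is used directly in place of `apply(data, 2, scale)` (the sign flip passed to `prcomp` above does not affect the variances, so it is omitted here).

```r
# Standardize the built-in USArrests data (mean 0, sd 1 per column)
scaled <- scale(USArrests)
pr.out <- prcomp(scaled)

# Method 1: PVE from the component standard deviations
pve_sdev <- pr.out$sdev^2 / sum(pr.out$sdev^2)

# Method 2: PVE from the squared scores over the total sum of squares
scores <- scaled %*% pr.out$rotation
pve_scores <- colSums(scores^2) / sum(scaled^2)

# TRUE when both methods agree up to floating-point tolerance
all.equal(pve_sdev, unname(pve_scores))
```

Both ratios share the same factor of n - 1 in the numerator and denominator, which is why they coincide.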
library("ISLR")
library("MASS")
attach(Auto)
autoScaled = scale(Auto[,1:8])
PVA = prcomp(autoScaled)
biplot(PVA)
Weight, displacement, horsepower, and number of cylinders appear to be positively correlated with one another, and they seem to be negatively correlated with the other variables.
data = read.csv("C:/Users/Will/OneDrive/Documents/School/375T Predictive Analytics/HW7/MedGPA.csv")
accept = data$Acceptance
GPA = data$GPA
glm.fit = glm(accept~GPA, data = data, family = binomial)
summary(glm.fit)
##
## Call:
## glm(formula = accept ~ GPA, family = binomial, data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7805 -0.8522 0.4407 0.7819 2.0967
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -19.207 5.629 -3.412 0.000644 ***
## GPA 5.454 1.579 3.454 0.000553 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 75.791 on 54 degrees of freedom
## Residual deviance: 56.839 on 53 degrees of freedom
## AIC: 60.839
##
## Number of Fisher Scoring iterations: 4
The probability and logit forms of this model are given by \[\mathbb{P}\left[Y=1\right]=\frac{e^{\hat{\beta_0}+\hat{\beta_1}X}}{1+e^{\hat{\beta_0}+\hat{\beta_1}X}}\] and \[\ln\left(\mathbb{P}\left[Y=1\right]\right)-\ln\left(1-\mathbb{P}\left[Y=1\right]\right)=\hat{\beta_0}+\hat{\beta_1}X\] where \[\begin{aligned} X &= \text{GPA} \\ Y &= \text{Acceptance} \\ \hat{\beta_0} &= -19.207 \\ \hat{\beta_1} &= 5.454 \end{aligned}\]
\[\mathbb{P}\left[Y=1\right]=\frac{e^{\hat{\beta_0}+\hat{\beta_1}3.92}}{1+e^{\hat{\beta_0}+\hat{\beta_1}3.92}} \approx0.898\]
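The arithmetic above can be reproduced with base R's `plogis()`, which evaluates the logistic function. This is a small sketch using the rounded coefficients from the model summary; the helper name `p_accept` is hypothetical.

```r
# Coefficients copied (rounded) from the summary(glm.fit) output
b0 <- -19.207
b1 <- 5.454

# Hypothetical helper: predicted acceptance probability at a given GPA
p_accept <- function(gpa) plogis(b0 + b1 * gpa)

# Predicted probability for a GPA of 3.92
p_accept(3.92)
```

With the fitted object available, `predict(glm.fit, newdata = data.frame(GPA = 3.92), type = "response")` should give essentially the same value, up to rounding of the coefficients.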
If a student has a 50-50 chance of being accepted to medical school, then \[\mathbb{P}\left[Y=1\right]=1-\mathbb{P}\left[Y=1\right]\] Taking the natural log of both sides and moving everything to the left side gives \[\ln\left(\mathbb{P}\left[Y=1\right]\right)-\ln\left(1-\mathbb{P}\left[Y=1\right]\right)=0\] The left side is the logit, which equals \(\hat{\beta_0}+\hat{\beta_1}X\), so solving for X gives \[\hat{\beta_0}+\hat{\beta_1}X=0 \implies X=-\frac{\hat{\beta_0}}{\hat{\beta_1}} \approx 3.522\]
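As a sanity check on this result, the sketch below solves for the GPA at which the log-odds are zero and confirms that the predicted probability there is one half. The variable name `gpa_50` is illustrative, and the coefficients are the rounded estimates from the summary output.

```r
# Rounded coefficient estimates from the fitted model
b0 <- -19.207
b1 <- 5.454

# GPA at which the log-odds equal zero
gpa_50 <- -b0 / b1
gpa_50

# The predicted probability at that GPA should be 0.5
plogis(b0 + b1 * gpa_50)
```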
The odds of p to not-p are defined as \[o=\frac{p}{1-p} \implies p=\frac{o}{1+o}\]
### If the odds of an event occurring are 2:1, what is the probability?
\[2=\frac{p}{1-p} \implies p=\frac{2}{3}\]
\[10=\frac{p}{1-p} \implies p=\frac{10}{11}\]
\[\frac{1}{4}=\frac{p}{1-p} \implies p=\frac{1}{5}\]
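The conversions above follow one formula, so they can be wrapped in a pair of small functions. This is a minimal sketch; the function names `odds_to_prob` and `prob_to_odds` are made up for illustration.

```r
# Convert odds (expressed as a single number, e.g. 2 for 2:1) to a probability
odds_to_prob <- function(o) o / (1 + o)

# Inverse conversion: probability to odds
prob_to_odds <- function(p) p / (1 - p)

# Odds of 2:1, 10:1, and 1:4 give 2/3, 10/11, and 1/5
odds_to_prob(c(2, 10, 1/4))

# Round trip: converting back recovers the original odds
prob_to_odds(odds_to_prob(c(2, 10, 1/4)))
```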
\[o_{exposed}=\frac{p_{exposed}}{1-p_{exposed}} = \frac{0.6}{1-0.6}=\frac{3}{2}\] \[o_{not}=\frac{p_{not}}{1-p_{not}} = \frac{0.01}{1-0.01}=\frac{1}{99}\]
# Problem 8
We are given a sample of 31 patients, 18 of whom have tumors that have metastasized to the lymph nodes. The fitted logistic model is \[\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right)=\hat{\beta_0}+\hat{\beta_1}X\]
where \[\begin{aligned} X &= \text{size} \\ Y &= \text{metastasized} \\ \hat{\beta_0} &= -2.086 \\ \hat{\beta_1} &= 0.5117 \end{aligned}\] and size is measured in centimeters.
\[\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right)=\hat{\beta_0}+\hat{\beta_1}X \implies o=\frac{\hat{\pi}}{1-\hat{\pi}}=e^{\hat{\beta_0}+\hat{\beta_1}X} \approx 2.676 \quad (X = 6)\]
### Use the model to predict the probability of metastasis if a patient’s tumor size is 6 cm.
\[o=\frac{\hat{\pi}}{1-\hat{\pi}} \implies \hat{\pi}=\frac{o}{1+o} \approx 0.728\]
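The odds and the probability for a 6 cm tumor can be recomputed directly from the stated coefficients. The following is a short sketch; the names `odds_6` and `p_6` are illustrative.

```r
# Coefficients as given for the metastasis model (size in centimeters)
b0 <- -2.086
b1 <- 0.5117

# Odds of metastasis at a tumor size of 6 cm
odds_6 <- exp(b0 + b1 * 6)

# Convert the odds to a probability
p_6 <- odds_6 / (1 + odds_6)

c(odds = odds_6, probability = p_6)
```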
\[\frac{o_7}{o_6} = e^{\hat{\beta_1}} \approx 1.668\] The odds of metastasis increase by approximately 66.8% for each additional centimeter of tumor size (here, from 6 cm to 7 cm).
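To see how the change in odds compares with the change in probability, the sketch below evaluates both ratios for tumor sizes of 6 cm and 7 cm. The helper names `odds_at` and `prob_at` are hypothetical.

```r
# Coefficients as given for the metastasis model
b0 <- -2.086
b1 <- 0.5117

# Hypothetical helpers: odds and probability of metastasis at a given size
odds_at <- function(size) exp(b0 + b1 * size)
prob_at <- function(size) plogis(b0 + b1 * size)

# Ratio of odds for a one-centimeter increase; equals exp(b1) at any size
odds_ratio <- odds_at(7) / odds_at(6)

# Ratio of probabilities; unlike the odds ratio, this depends on the starting size
prob_ratio <- prob_at(7) / prob_at(6)

c(odds_ratio = odds_ratio, prob_ratio = prob_ratio)
```

The odds ratio is constant across sizes, while the probability ratio shrinks as the probability approaches 1.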
\[\frac{p_7}{p_6} \approx 1.122\] The probability of metastasis increases by approximately 12.2%.
## Effects of slope and intercept
Suppose that we have a logistic model with intercept \(\beta_0 = 5\) and slope \(\beta_1 = 2\). Explain what happens to a plot of the probability form of the model in each of the following circumstances. (See graphs attached at rear.)
Slope decreases by one-half
Graph shifts left by three-halves.
Graph is reflected across the y-axis.
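These three answers can be checked numerically. The circumstances themselves are not reproduced in this write-up, so the sketch below ASSUMES they are: slope changed from 2 to 1, intercept changed from 5 to 8, and slope changed from 2 to -2. Those values are inferred from the answers and may not match the original problem; the helper names are also illustrative.

```r
# Baseline model from the problem statement
b0 <- 5
b1 <- 2

# Midpoint of the curve: the x value where the probability equals 0.5
midpoint <- function(b0, b1) -b0 / b1

# Steepest slope of the probability curve, reached at the midpoint
max_slope <- function(b1) b1 / 4

# ASSUMED change 1: slope 2 -> 1 halves the steepness of the curve
steepness_ratio <- max_slope(1) / max_slope(b1)

# ASSUMED change 2: intercept 5 -> 8 moves the midpoint from -2.5 to -4
midpoint_shift <- midpoint(8, b1) - midpoint(b0, b1)

# ASSUMED change 3: slope 2 -> -2 mirrors the curve across the y-axis
x <- seq(-6, 6, by = 0.5)            # grid symmetric about 0
base_curve <- plogis(b0 + b1 * x)
flipped_curve <- plogis(b0 - b1 * x)
is_mirror <- isTRUE(all.equal(flipped_curve, rev(base_curve)))

list(steepness_ratio = steepness_ratio,
     midpoint_shift = midpoint_shift,
     is_mirror = is_mirror)
```

A negative `midpoint_shift` corresponds to a shift to the left.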