Problem 1

Import data

data = read.csv("C:/Users/Will/OneDrive/Documents/School/375T Predictive Analytics/HW7/Ch10Ex11.csv", header = F)

Run PCA

We will confirm that there are noticeable differences between the two groups.

dataScaled = scale(data)
PVA = prcomp(dataScaled)
biplot(PVA)

As we can see, the biplot shows two clusters: the control group (right) and the diseased group (left). My recommendation is to build two 95% confidence rectangles, one centered on each cluster. All genes that lie outside of either rectangle differ the most from the others. I was not able to build those graphics.

Problem 2

Import Data

#library('ISLR')
data = USArrests
scaled = apply(data, 2, scale)

(a)

pr.out = prcomp(-scaled)
PVE = (pr.out$sdev)^2/sum((pr.out$sdev)^2)
PVE
## [1] 0.62006039 0.24744129 0.08914080 0.04335752

(b)

PCMatrix = pr.out$rotation
sumvar = sum(apply(scaled^2, 2, sum))
PVE2 = apply((scaled %*% PCMatrix)^2, 2, sum)/sumvar
PVE2
##        PC1        PC2        PC3        PC4 
## 0.62006039 0.24744129 0.08914080 0.04335752
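
As a quick sanity check, the two calculations can be compared directly. The following is a minimal sketch that recomputes both versions on USArrests; the names `pve_sdev` and `pve_scores` are illustrative and not part of the original solution, and `scale()` is used in place of `apply(data, 2, scale)`, which gives the same numbers.

```r
# USArrests ships with base R (datasets package), so no extra library is needed.
scaled <- scale(USArrests)            # center and scale each column
pr.out <- prcomp(scaled)

# (a) PVE from the standard deviations returned by prcomp
pve_sdev <- pr.out$sdev^2 / sum(pr.out$sdev^2)

# (b) PVE from the loadings: sum of squared scores over total sum of squares
scores <- scaled %*% pr.out$rotation
pve_scores <- colSums(scores^2) / sum(scaled^2)

# The two vectors should agree up to floating-point error
all.equal(pve_sdev, unname(pve_scores))
```

Both vectors sum to 1, since the four components together account for all of the variance in the scaled data.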

Problem 3

Import and interpret

library("ISLR")
## Warning: package 'ISLR' was built under R version 3.5.2
library("MASS")
attach(Auto)
autoScaled = scale(Auto[,1:8])
PVA = prcomp(autoScaled)
biplot(PVA)

Weight, displacement, horsepower, and number of cylinders appear to be positively correlated with one another, and they seem to be negatively correlated with the remaining variables (mpg, acceleration, year, and origin).

Problem 4

Import Data

data = read.csv("C:/Users/Will/OneDrive/Documents/School/375T Predictive Analytics/HW7/MedGPA.csv")
accept = data$Acceptance
GPA = data$GPA

Fit a logistic regression model to predict acceptance status using the GPA scores

glm.fit = glm(accept~GPA, data = data, family = binomial)
summary(glm.fit)
## 
## Call:
## glm(formula = accept ~ GPA, family = binomial, data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7805  -0.8522   0.4407   0.7819   2.0967  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -19.207      5.629  -3.412 0.000644 ***
## GPA            5.454      1.579   3.454 0.000553 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 75.791  on 54  degrees of freedom
## Residual deviance: 56.839  on 53  degrees of freedom
## AIC: 60.839
## 
## Number of Fisher Scoring iterations: 4

Write down the estimated versions of both the logit and probability forms of this model.

The probability and logit forms of this model are given by \[\mathbb{P}\left[Y=1\right]=\frac{e^{\hat{\beta_0}+\hat{\beta_1}X}}{1+e^{\hat{\beta_0}+\hat{\beta_1}X}}\] and \[\ln\left(\mathbb{P}\left[Y=1\right]\right)-\ln\left(1-\mathbb{P}\left[Y=1\right]\right)=\hat{\beta_0}+\hat{\beta_1}X\] where \[X = GPA \\ Y = Acceptance \\ \hat{\beta_0} = -19.207 \\ \hat{\beta_1} = 5.454\]

What would the estimated model say about the chance that a student with GPA = 3.92 is accepted into medical school?

\[\mathbb{P}\left[Y=1\right]=\frac{e^{\hat{\beta_0}+\hat{\beta_1}3.92}}{1+e^{\hat{\beta_0}+\hat{\beta_1}3.92}} \approx0.898\]
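
The same number can be reproduced in R. This is a small sketch that plugs in the rounded coefficients from the summary output, so it may differ from a prediction made with the fitted object in the last decimal places; the names `b0`, `b1`, `eta`, and `p_hat` are illustrative.

```r
# Coefficients copied (rounded) from the summary(glm.fit) output
b0 <- -19.207
b1 <- 5.454

# Linear predictor (log-odds) at GPA = 3.92
eta <- b0 + b1 * 3.92

# plogis() is the logistic function exp(eta) / (1 + exp(eta))
p_hat <- plogis(eta)
round(p_hat, 3)
```

With the fitted model still in memory, `predict(glm.fit, newdata = data.frame(GPA = 3.92), type = "response")` should give essentially the same value.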

For approximately what GPA score would a student have roughly a 50-50 chance of being accepted to medical school? (Hint: You might look at a graph or solve one of the equations algebraically.)

If a student has a 50-50 chance of being accepted to medical school, then \[\mathbb{P}\left[Y=1\right]=1-\mathbb{P}\left[Y=1\right]\] Taking the natural log of both sides and moving everything to the left side gives \[\ln\left(\mathbb{P}\left[Y=1\right]\right)-\ln\left(1-\mathbb{P}\left[Y=1\right]\right)=0\] By the logit form of the model, the left side equals the linear predictor, so solving for X gives \[\hat{\beta_0}+\hat{\beta_1}X=0 \implies X=-\frac{\hat{\beta_0}}{\hat{\beta_1}}=\frac{19.207}{5.454} \approx 3.522\]
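
A short numeric check of this result, again using the rounded coefficients from the summary output (the variable names are illustrative):

```r
# Coefficients copied (rounded) from the summary(glm.fit) output
b0 <- -19.207
b1 <- 5.454

# The log-odds are zero when GPA = -b0 / b1
gpa_50 <- -b0 / b1
round(gpa_50, 3)

# Plugging that GPA back into the probability form should return 0.5
p_check <- plogis(b0 + b1 * gpa_50)
p_check
```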

Problem 5

Odds to probabilities

Odds of p to not-p are defined as \[o=\frac{p}{1-p} \implies p=\frac{o}{1+o}\]

If the odds of an event occurring are 2:1, what is the probability?

\[2=\frac{p}{1-p} \implies p=\frac{2}{3}\]

If the odds of an event occurring are 10:1, what is the probability?

\[10=\frac{p}{1-p} \implies p=\frac{10}{11}\]

If the odds of an event occurring are 1:4, what is the probability?

\[\frac{1}{4}=\frac{p}{1-p} \implies p=\frac{1}{5}\]
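
The three conversions can be checked with a one-line helper. This is only a sketch; `odds_to_prob` is a hypothetical name introduced here, not a built-in R function.

```r
# Convert odds o = p / (1 - p) back to a probability p = o / (1 + o)
odds_to_prob <- function(o) o / (1 + o)

# Odds of 2:1, 10:1, and 1:4 written as single numbers
odds <- c(2 / 1, 10 / 1, 1 / 4)
probs <- odds_to_prob(odds)
probs   # 2/3, 10/11, and 1/5 as decimals
```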

Problem 6

If the probability of a birth defect with exposure to a potential teratogen is 0.6 and without exposure the probability is 0.01, what is the odds ratio for a birth defect when exposed versus not exposed?

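\[o_{exposed}=\frac{p_{exposed}}{1-p_{exposed}} = \frac{0.6}{1-0.6}=\frac{3}{2}\] \[o_{not}=\frac{p_{not}}{1-p_{not}} = \frac{0.01}{1-0.01}=\frac{1}{99}\] The odds ratio is the quotient of the two: \[\frac{o_{exposed}}{o_{not}}=\frac{3/2}{1/99}=148.5\] so the odds of a birth defect are 148.5 times as large with exposure as without.

Problem 8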

Given information

We are given a sample of 31 patients, 18 of whom have tumors that have metastasized to the lymph nodes. The fitted logistic model is \[\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right)=\hat{\beta_0}+\hat{\beta_1}X\]

Here \[ X = size \\ Y = metastasized \\ \hat{\beta_0} = -2.086 \\ \hat{\beta_1} = 0.5117 \] and size is measured in centimeters.

Use this model to estimate the odds of metastasis, pi/(1 - pi), if a patient’s tumor size is 6 cm.

\[\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right)=\hat{\beta_0}+\hat{\beta_1}X \implies o=\frac{\hat{\pi}}{1-\hat{\pi}}=e^{\hat{\beta_0}+\hat{\beta_1}X}=e^{-2.086+0.5117\cdot 6} \approx 2.676 \]

Use the model to predict the probability of metastasis if a patient’s tumor size is 6 cm.

\[o=\frac{\hat{\pi}}{1-\hat{\pi}} \implies \hat{\pi}=\frac{o}{1+o} \approx 0.728\]

How much do the estimated odds change if the tumor size changes from 6 cm to 7 cm? Provide and interpret an odds ratio.

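\[\frac{o_7}{o_6}=e^{\hat{\beta_1}} \approx 1.668\] The estimated odds of metastasis are multiplied by about 1.668, that is, they increase by approximately 66.8%, when the tumor size goes from 6 cm to 7 cm.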
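
The odds calculations for this model can be reproduced with a few lines of R. This is a minimal sketch using the coefficients given in the problem; `odds_at` is a hypothetical helper name.

```r
# Coefficients given for the fitted tumor-size model
b0 <- -2.086
b1 <- 0.5117

# Estimated odds of metastasis at a given tumor size (in cm)
odds_at <- function(size_cm) exp(b0 + b1 * size_cm)

o6 <- odds_at(6)
o7 <- odds_at(7)

# Probability at 6 cm, recovered from the odds
p6 <- o6 / (1 + o6)

# Odds ratio for a 1 cm increase; algebraically this is exp(b1)
odds_ratio <- o7 / o6

round(c(odds_6cm = o6, prob_6cm = p6, odds_ratio = odds_ratio), 3)
```

Because the model is linear in the log-odds, the same odds ratio applies to any 1 cm increase, not only from 6 cm to 7 cm.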

How much does the estimate of pi change if the tumor size changes from 6 cm to 7 cm?

\[\frac{p_7}{p_6} \approx \frac{0.817}{0.728} \approx 1.122\] The estimated probability of metastasis rises from about 0.728 to about 0.817, a relative increase of approximately 12.2%.

Effects of slope and intercept

Suppose that we have a logistic model with intercept beta_0 = 5 and slope beta_1 = 2. Explain what happens to a plot of the probability form of the model in each of the following circumstances: (See graphs attached at rear)

The slope beta_1 decreases to 1.

pt1 The slope is cut in half, so the curve rises half as steeply; its midpoint also moves left, from x = -2.5 to x = -5.

The intercept beta_0 increases to 8.

pt2 Graph shifts left by three-halves.

The slope beta_1 decreases to -2.

pt3 Graph is reflected across the y-axis.
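
These three answers can be checked numerically. The sketch below summarizes each curve by its midpoint (the x where the probability is 0.5, equal to -beta_0/beta_1) and by its slope at that midpoint (equal to beta_1/4); `curve_summary` is a hypothetical helper introduced for illustration.

```r
# Midpoint and steepness at the midpoint of p(x) = plogis(b0 + b1 * x)
curve_summary <- function(b0, b1) {
  c(midpoint = -b0 / b1, slope_at_midpoint = b1 / 4)
}

results <- rbind(
  original       = curve_summary(5, 2),
  slope_to_1     = curve_summary(5, 1),
  intercept_to_8 = curve_summary(8, 2),
  slope_to_neg2  = curve_summary(5, -2)
)
results

# Reflection check: with slope -2 the curve at x equals the original at -x
x <- seq(-5, 5, by = 0.5)
reflected <- isTRUE(all.equal(plogis(5 - 2 * x), plogis(5 + 2 * (-x))))
reflected
```

To see the curves themselves, `curve(plogis(5 + 2 * x), from = -10, to = 10)` draws the original model, and further calls with `add = TRUE` overlay the modified ones.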