FINAL EXAM

1. REVIEW OF ESSENTIAL CONCEPTS - 15 POINTS

(1) What is the rank of the following matrix?

\[\left[\begin{array} {rrrr} 1 & -1 & 3 & -5\\ 2 & 1 & 5 & -9 \\ 6 & -1 & -2 & 4 \end{array}\right] \]

library(Matrix)
A1 <- matrix(c(1,-1, 3, -5, 2, 1, 5, -9, 6, -1, -2, 4), nrow = 3, byrow = T)
print(A1)

##      [,1] [,2] [,3] [,4]
## [1,]    1   -1    3   -5
## [2,]    2    1    5   -9
## [3,]    6   -1   -2    4

cat("The rank of the matrix is", qr(A1)$rank)

## The rank of the matrix is 3

(2) What is the transpose of the above matrix?

cat("The transpose of the matrix is:")

## The transpose of the matrix is:

print(t(A1))

##      [,1] [,2] [,3]
## [1,]    1    2    6
## [2,]   -1    1   -1
## [3,]    3    5   -2
## [4,]   -5   -9    4

(3) Define orthonormal basis vectors. Please write down at least one orthonormal basis for the 3-dimensional vector space \(R^3\).

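Orthonormal basis vectors: 1) all have length = 1 (they have all been normalized, or turned into unit vectors); 2) are all orthogonal to each other (the dot product of any two distinct vectors = 0); and 3) are linearly independent, which follows automatically from 1) and 2), so \(n\) such vectors form a basis of \(R^n\).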

An example of a set of orthonormal basis vectors for \(R^3\): \[\{(1,0,0),\ (0,0.7071068,0.7071068),\ (0,0.7071068,-0.7071068)\}\]

library(far)

#Define a matrix containing 3 vectors
A3 <- matrix(c(2,1,1,0,1,2,0,1,0), nrow = 3, byrow = T)

#Orthonormalize the columns of the matrix
A3o <- orthonormalization(A3, basis = T, norm = T)
print(A3o)

##      [,1]      [,2]       [,3]
## [1,]    1 0.0000000  0.0000000
## [2,]    0 0.7071068  0.7071068
## [3,]    0 0.7071068 -0.7071068

#Check for linear independence (determinant is non-zero)
det(A3o)

## [1] -1

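#Separate into 3 vectors (the columns of the orthonormalized matrix)
A3o1 <- A3o[,1]
A3o2 <- A3o[,2]
A3o3 <- A3o[,3]

#Check that they are pairwise orthogonal (each dot product is 0 up to rounding)
round(c(sum(A3o1 * A3o2), sum(A3o1 * A3o3), sum(A3o2 * A3o3)), 10) == 0

## [1] TRUE TRUE TRUE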
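
The orthonormalization step can also be written out by hand. The following is a minimal base R sketch of classical Gram-Schmidt, assuming the input vectors are the columns of the matrix; `gram_schmidt` is a hypothetical helper name and is not part of the far package.

```r
# Classical Gram-Schmidt on the columns of M (illustrative helper, not from far)
gram_schmidt <- function(M) {
  Q <- matrix(0, nrow = nrow(M), ncol = ncol(M))
  for (j in seq_len(ncol(M))) {
    v <- M[, j]
    # subtract the projections onto the unit vectors already accepted
    for (k in seq_len(j - 1)) {
      v <- v - sum(Q[, k] * M[, j]) * Q[, k]
    }
    Q[, j] <- v / sqrt(sum(v^2))  # normalize to unit length
  }
  Q
}

M <- matrix(c(2, 1, 1, 0, 1, 2, 0, 1, 0), nrow = 3, byrow = TRUE)
Q <- gram_schmidt(M)

# t(Q) %*% Q should be the identity matrix up to rounding error
print(round(crossprod(Q), 10))
```

Checking `crossprod(Q)` against the identity tests unit length and pairwise orthogonality in one step.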

(4) Given the following matrix, what is its characteristic polynomial?

\[\mathbf{A} = \left[\begin{array} {rrr} 5 & 0 & 3\\ 0 & 1 & -2 \\ 1 & 2 & 0 \end{array}\right] \]

Solution: \[\lambda^3 - 6\lambda^2 + 6\lambda - 17\]

library(pracma)

## 
## Attaching package: 'pracma'
## 
## The following objects are masked from 'package:Matrix':
## 
##     expm, lu, tril, triu

A4 <- matrix(c(5, 0, 3, 0, 1, -2, 1, 2, 0), nrow = 3, byrow = T)
print(A4)

##      [,1] [,2] [,3]
## [1,]    5    0    3
## [2,]    0    1   -2
## [3,]    1    2    0

cp <- charpoly(A4, info = T)

## Error term: 0

cp$cp

## [1]   1  -6   6 -17
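
The coefficients above can be checked by hand. A sketch of the cofactor expansion of \(\det(\lambda I - \mathbf{A})\) along the first row:

\[\det(\lambda I - \mathbf{A}) = \det\left[\begin{array} {rrr} \lambda - 5 & 0 & -3\\ 0 & \lambda - 1 & 2 \\ -1 & -2 & \lambda \end{array}\right] = (\lambda - 5)\left[(\lambda - 1)\lambda + 4\right] - 3(\lambda - 1)\]

\[= (\lambda^3 - 6\lambda^2 + 9\lambda - 20) - (3\lambda - 3) = \lambda^3 - 6\lambda^2 + 6\lambda - 17\]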

(5) What are its eigenvalues and eigenvectors?

eigen(A4)$values

## [1] 5.471264+0.000000i 0.264368+1.742772i 0.264368-1.742772i

eigen(A4)$vectors

##                [,1]                  [,2]                  [,3]
## [1,]  0.98551404+0i  0.2583505-0.2761849i  0.2583505+0.2761849i
## [2,] -0.06924766+0i -0.6725516+0.0000000i -0.6725516+0.0000000i
## [3,]  0.15481227+0i -0.2473752+0.5860519i -0.2473752-0.5860519i

(6) Given a column stochastic matrix of links between URLs, what can you say about the PageRank of this set of URLs?

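A column stochastic matrix of links between URLs can be used to give a unique ranking, or the (Google) PageRank, of those web pages. This can be done by applying decay to the matrix and iterating, that is, repeatedly multiplying a rank vector by the decayed matrix, until convergence. The resulting vector is the eigenvector of the decayed matrix for eigenvalue 1. Decay (alpha) is key because it simulates the randomness of web browsing in the actual population (a user occasionally jumps to an arbitrary page), and it is what guarantees that the iteration converges to a unique ranking.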
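
The iteration described above can be sketched in base R. The 3-page link matrix, the decay value 0.85, and the function name `pagerank` are assumptions made for illustration only.

```r
# Hypothetical 3-page web; column j holds the probabilities of following a link from page j
L <- matrix(c(0,   0, 1,
              0.5, 0, 0,
              0.5, 1, 0), nrow = 3, byrow = TRUE)

pagerank <- function(L, d = 0.85, tol = 1e-12, maxit = 1000) {
  n <- nrow(L)
  # decayed matrix: follow a link with probability d, otherwise jump to a random page
  G <- d * L + (1 - d) / n
  r <- rep(1 / n, n)  # start from a uniform rank vector
  for (i in seq_len(maxit)) {
    r_new <- as.vector(G %*% r)
    converged <- sum(abs(r_new - r)) < tol
    r <- r_new
    if (converged) break
  }
  r
}

r <- pagerank(L)
print(round(r, 4))
```

Because the decayed matrix is still column stochastic, the rank vector keeps summing to 1 at every step.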

(7) Assume that we are repeatedly sampling sets of numbers (each set is of size n) from an unknown probability density function. What can we say about the average value of each set?

In accordance with the Central Limit Theorem, if the samples are independent and identically distributed with finite mean \(\mu\) and finite variance \(\sigma^2\), the average values of the sets will be approximately normally distributed, whatever the shape of the original distribution. So, as \(n\) grows large, the mean value of each set will be approximately normally distributed around the mean \(\mu\) of the original distribution, with variance \(\sigma^2/n\).
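
A small simulation can illustrate this. The sketch below assumes an exponential distribution with rate 1 (mean 1, standard deviation 1) as a stand-in for the unknown density; the set size and number of repetitions are arbitrary choices.

```r
set.seed(42)
n <- 50       # size of each set
reps <- 5000  # number of sets drawn

# average value of each set, drawn from a skewed (non-normal) distribution
means <- replicate(reps, mean(rexp(n, rate = 1)))

# the averages should centre on 1 with standard deviation close to 1 / sqrt(n)
print(c(mean = mean(means), sd = sd(means), theory_sd = 1 / sqrt(n)))
```

Plotting `hist(means)` would show the bell shape even though the underlying distribution is strongly skewed.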

(8) What is the derivative of \(e^{x}\cos^{2}(x)\)?

Solution: \[e^{x}\cos(x)(\cos(x)-2\sin(x))\]

library(Deriv)
f <- function(x) (exp(x)*cos(x)^2)
g <- function(x) {}
body(g) <- Deriv(body(f), 'x')
print(body(g))

## {
##     .e1 <- cos(x)
##     .e1 * (.e1 - 2 * sin(x)) * exp(x)
## }

(9) What is the derivative of \(e^{x^3}\)?

Solution: \[3x^2e^{x^3}\]

f <- function(x) (exp(x^3))
g <- function(x) {}
body(g) <- Deriv(body(f), 'x')
print(body(g))

## 3 * (x^2 * exp(x^3))
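
Both symbolic results can be cross-checked numerically without the Deriv package. This is a sketch using a central finite difference; the helper name `num_deriv`, the step size, and the test points are illustrative choices.

```r
f1 <- function(x) exp(x) * cos(x)^2
d1 <- function(x) exp(x) * cos(x) * (cos(x) - 2 * sin(x))  # answer to (8)

f2 <- function(x) exp(x^3)
d2 <- function(x) 3 * x^2 * exp(x^3)                       # answer to (9)

# central finite difference approximation of f'(x)
num_deriv <- function(f, x, h = 1e-6) (f(x + h) - f(x - h)) / (2 * h)

x0 <- c(-1, 0, 0.5, 1)
err1 <- max(abs(num_deriv(f1, x0) - d1(x0)))
err2 <- max(abs(num_deriv(f2, x0) - d2(x0)))
print(c(err1, err2))  # both should be close to zero
```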

(10) What is \(\int \left(e^{x}\cos(x) + \sin(x)\right)dx\)?

\[\int \left(e^{x}\cos(x) + \sin(x)\right)dx = \int e^{x}\cos(x)\,dx + \int \sin(x)\,dx\]

\[= \int e^{x}\cos(x)\,dx - \cos(x)\]

\[= \frac{1}{2}e^{x}(\sin(x)+\cos(x)) - \cos(x) + C\]
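
The value of the remaining integral follows from integrating by parts twice. Writing \(I = \int e^{x}\cos(x)\,dx\):

\[I = e^{x}\cos(x) + \int e^{x}\sin(x)\,dx = e^{x}\cos(x) + e^{x}\sin(x) - I\]

\[2I = e^{x}(\sin(x)+\cos(x)) \quad\Rightarrow\quad I = \frac{1}{2}e^{x}(\sin(x)+\cos(x))\]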

2. MINI-CODING ASSIGNMENTS - 15 POINTS

2.1. Sampling from function. Assume that you have a function that generates integers between 0 and 20 with the following probability distribution: \(P(x = k) = \binom{20}{k}p^{k}q^{20-k}\) where \(p = 0.25\) and \(q = 1 - p = 0.75\) and \(x \in [0,20]\). This is also known as a Binomial Distribution. Write a function to sample from this distribution. After that, generate 1000 samples from this distribution and plot the histogram of the sample.

# p = 0.25, q = 0.75

# define the probability mass function P(x = k)
binomiald <- function(x, size, p) {
  dbinom(x, size, p)
}

# define the sample space
x <- seq(0, 20 , by = 1)

# sample
mysample <- sample(x, 1000, replace = T, prob = binomiald(x, 20, .25))

#plot
hist(mysample)
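
An alternative that builds the sampler directly from the formula is inverse-transform sampling. The following is a sketch under that assumption; `rbinom_sketch` is a hypothetical name, and the probability mass function is computed from `choose()` rather than `dbinom()`.

```r
# Inverse-transform sampler for P(x = k) = choose(size, k) * p^k * q^(size - k)
rbinom_sketch <- function(nsamp, size = 20, p = 0.25) {
  k <- 0:size
  pmf <- choose(size, k) * p^k * (1 - p)^(size - k)
  cdf <- cumsum(pmf)
  u <- runif(nsamp)
  # the number of cdf values at or below u is the sampled k;
  # pmin guards against cdf[size + 1] rounding to slightly less than 1
  pmin(findInterval(u, cdf), size)
}

set.seed(1)
s <- rbinom_sketch(1000)
# theory: mean = size * p = 5, variance = size * p * q = 3.75
print(c(mean = mean(s), var = var(s)))
```

Calling `hist(s)` on the result gives the same kind of histogram as above.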

2.2 Principal Components Analysis. For the auto data set attached with the final exam, please perform a Principal Components Analysis by performing an SVD on the 4 independent variables (with mpg as the dependent variable) and select the top 2 directions. Please scatter plot the data set after it has been projected to these two dimensions. Your code should print out the two orthogonal vectors and also perform the scatter plot of the data after it has been projected to these two dimensions.

auto <- read.csv("auto-mpg.data", header = F, sep = "")
names(auto) <- c("displacement", "horsepower", "weight", "acceleration", "mpg")
head(auto)

##   displacement horsepower weight acceleration mpg
## 1          307        130   3504         12.0  18
## 2          350        165   3693         11.5  15
## 3          318        150   3436         11.0  18
## 4          304        150   3433         12.0  16
## 5          302        140   3449         10.5  17
## 6          429        198   4341         10.0  15

A <- as.matrix(auto[,1:4])
head(A)

##      displacement horsepower weight acceleration
## [1,]          307        130   3504         12.0
## [2,]          350        165   3693         11.5
## [3,]          318        150   3436         11.0
## [4,]          304        150   3433         12.0
## [5,]          302        140   3449         10.5
## [6,]          429        198   4341         10.0

mpg <- as.matrix(auto[,5])
head(mpg)

##      [,1]
## [1,]   18
## [2,]   15
## [3,]   18
## [4,]   16
## [5,]   17
## [6,]   15

# Use sweep to subtract column means
cx <- sweep(A, 2, colMeans(A), "-")
s <- svd(cx)

head(s$u) #left singular vectors (scaling these by s$d gives the principal component scores)

##             [,1]        [,2]         [,3]          [,4]
## [1,] -0.03170503 -0.06593017 -0.034068834  0.0562736981
## [2,] -0.04316499 -0.10275329  0.027934606 -0.0115557207
## [3,] -0.02783591 -0.09793801  0.015798400  0.0265439894
## [4,] -0.02756521 -0.08114003  0.028847024  0.0007645433
## [5,] -0.02846751 -0.07235841  0.001223887  0.0721217532
## [6,] -0.08179348 -0.11107760  0.046454703 -0.0134520011

print(s$d) #singular values

## [1] 16919.50904   769.10176   319.28976    33.69084

plot(s$d,type='b',pch=10,xlab='Singular value',ylab='magnitude')

head(s$v) #right singular vectors (the principal directions)

##              [,1]        [,2]         [,3]         [,4]
## [1,] -0.114341470 -0.94619620 -0.302560557 -0.009791875
## [2,] -0.038967092 -0.29819647  0.949995562 -0.084076546
## [3,] -0.992676062  0.12074073 -0.002546427  0.003070351
## [4,]  0.001352834  0.03483225 -0.077194932 -0.996406457

# check using PCA function
pca <- prcomp(cx, center=F, scale.=F)
print(pca)

## Standard deviations:
## [1] 855.656351  38.895148  16.147177   1.703819
## 
## Rotation:
##                       PC1         PC2          PC3          PC4
## displacement -0.114341470 -0.94619620 -0.302560557 -0.009791875
## horsepower   -0.038967092 -0.29819647  0.949995562 -0.084076546
## weight       -0.992676062  0.12074073 -0.002546427  0.003070351
## acceleration  0.001352834  0.03483225 -0.077194932 -0.996406457

# top 2 dimensions
u2dim <- (s$u[, 1:2])
v2dim <- (s$v[, 1:2])
d2dim <- (diag(s$d)[1:2, 1:2])

autoidim2 <- u2dim %*% d2dim %*% t(v2dim)
newauto <- (autoidim2)
colnames(newauto) <- c("displacement", "horsepower", "weight", "acceleration")
head(newauto)

##      displacement horsepower    weight acceleration
## [1,]     109.3154   36.02390  526.3823    -2.491945
## [2,]     158.2828   52.02465  715.4397    -3.740730
## [3,]     125.1230   40.81377  458.4259    -3.260859
## [4,]     112.3750   36.78279  455.4392    -2.804652
## [5,]     107.7300   35.36367  471.4094    -2.590050
## [6,]     239.0713   79.40169 1363.4550    -4.847912

pairs(A, main = "Auto Data Pre-PCA")

pairs(newauto, main = "Auto Data Post-PCA")

plot(A)

plot(newauto)

orthov1 <- s$v[,1]
orthov2 <- s$v[,2]

cat("orthogonal vector 1:", orthov1)

## orthogonal vector 1: -0.1143415 -0.03896709 -0.9926761 0.001352834

cat("orthogonal vector 2:", orthov2)

## orthogonal vector 2: -0.9461962 -0.2981965 0.1207407 0.03483225

cat("check:", round((orthov1 %*% orthov2), 3) == 0)

## check: TRUE
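
The code above builds a rank-2 reconstruction that still has four columns. A projection onto the top two directions instead yields a two-column score matrix. The sketch below assumes R's built-in `mtcars` data (columns `disp`, `hp`, `wt`, `qsec`) as a stand-in for the exam's auto data set, so the numbers will differ from those shown above.

```r
# stand-in data: four predictors analogous to displacement, horsepower, weight, acceleration
X <- as.matrix(mtcars[, c("disp", "hp", "wt", "qsec")])
Xc <- sweep(X, 2, colMeans(X), "-")  # centre each column

s <- svd(Xc)
V2 <- s$v[, 1:2]     # top two orthogonal directions
scores <- Xc %*% V2  # n x 2 projection; equals s$u[, 1:2] %*% diag(s$d[1:2])

print(V2)
plot(scores, xlab = "PC1", ylab = "PC2", main = "Data projected to 2 dimensions")
```

Each row of `scores` holds the coordinates of one observation in the plane spanned by the two directions.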

2.3. Sampling in Bootstrapping. As we discussed in class, in bootstrapping we start with n data points and repeatedly sample many times with replacement. Each time, we generate a candidate data set of size n from the original data set. All parameter estimations are performed on these candidate data sets. It can be easily shown that any particular data set generated by sampling n points from an original set of size n covers roughly 63.2% of the original data set. Using probability theory and limits, please prove that this is true. After that, write a program to perform this sampling and show that the empirical observation also agrees with this.

When sampling with replacement from \(n\) data points, the probability that a particular data point is not picked in a single draw is: \(1-\frac{1}{n}\)

Since the \(n\) draws are independent, the probability that it is not picked in any of the \(n\) draws is: \(P = (1-\frac{1}{n})^{n}\)

Therefore, the probability that it appears in the candidate data set at least once is: \(P = 1-(1-\frac{1}{n})^{n}\)

As \(n\) grows large, \(\lim_{n \to \infty}(1-\frac{1}{n})^{n} = e^{-1} \approx 0.368\)

Which means that, by linearity of expectation, a candidate data set is expected to contain \(1 - e^{-1} \approx 63.2\%\) of the original data points.
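
How quickly the coverage probability approaches its limit can be tabulated directly. The values of `n` below are arbitrary choices for illustration.

```r
n <- c(10, 100, 1000, 1e6)

# probability that a given point appears at least once in n draws with replacement
coverage <- 1 - (1 - 1 / n)^n
print(cbind(n, coverage))

# limiting value 1 - exp(-1)
print(1 - exp(-1))
```

The coverage decreases toward the limit as `n` grows, so small data sets cover slightly more than 63.2%.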

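bootstrapme <- function(n) {
  data <- (1:n)
  # draw n points with replacement, i.e. one candidate data set
  mysample <- sample(data, n, replace = T)
  # fraction of the original points that appear at least once
  return(length(unique(mysample))/n)
}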

n = 100000
bootstrapme(n)

## [1] 0.63254

3. Mini-project - 20 points

# read in data
rawx <- read.csv("ex3x.dat", header = F, sep = "")
colnames(rawx) <- c("sqft", "bdrm")
head(rawx)

##   sqft bdrm
## 1 2104    3
## 2 1600    3
## 3 2400    3
## 4 1416    2
## 5 3000    4
## 6 1985    4

rawy <- read.csv("ex3y.dat", header = F, sep = "")
colnames(rawy) <- c("price")
head(rawy)

##    price
## 1 399900
## 2 329900
## 3 369000
## 4 232000
## 5 539900
## 6 299900

# X <- rawx
y <- as.matrix(rawy)

# standardize data
x <- scale(rawx, center = T, scale = T)
head(x)

##             sqft       bdrm
## [1,]  0.13000987 -0.2236752
## [2,] -0.50418984 -0.2236752
## [3,]  0.50247636 -0.2236752
## [4,] -0.73572306 -1.5377669
## [5,]  1.25747602  1.0904165
## [6,] -0.01973173  1.0904165

# # define variables
# sqft <- as.matrix(x[,1])
# bdrm <- as.matrix(x[,2])
# price <- as.matrix(y)

# combine datasets
data <- cbind(x, y)
head(data)

##             sqft       bdrm  price
## [1,]  0.13000987 -0.2236752 399900
## [2,] -0.50418984 -0.2236752 329900
## [3,]  0.50247636 -0.2236752 369000
## [4,] -0.73572306 -1.5377669 232000
## [5,]  1.25747602  1.0904165 539900
## [6,] -0.01973173  1.0904165 299900

# ** GRADIENT DESCENT **

# number of observations
m <- nrow(x)
print(m)

## [1] 47

# add dummy variable to x data
x0 <- rep(1, m)
x <- as.matrix(cbind(x0, x))

# check variables
head(x)

##      x0        sqft       bdrm
## [1,]  1  0.13000987 -0.2236752
## [2,]  1 -0.50418984 -0.2236752
## [3,]  1  0.50247636 -0.2236752
## [4,]  1 -0.73572306 -1.5377669
## [5,]  1  1.25747602  1.0904165
## [6,]  1 -0.01973173  1.0904165

head(y)

##       price
## [1,] 399900
## [2,] 329900
## [3,] 369000
## [4,] 232000
## [5,] 539900
## [6,] 299900

# define the gradient function dJ/dtheta = (1/m) * t(x) %*% (h(x) - y), where h(x) = x %*% t(theta) and theta is a row vector
grad <- function(x, y, theta){
  m <- nrow(x) # number of observations
  gradient <- (1/m) * (t(x) %*% ((x %*% t(theta)) - y))
  return(t(gradient))
}

# define gradient descent update algorithm; returns one row of theta per iteration
grad.descent <- function(x, y, maxit, alpha){
  theta <- matrix(c(0, 0, 0), nrow = 1) # initialize the parameters
  theta_ret <- c() # history of theta values
  for (i in 1:maxit) {
    theta <- theta - alpha * grad(x, y, theta)
    theta_ret <- rbind(theta_ret, theta)
  }
  return(theta_ret)
}

alphaval <- c(.001, .01, .1, 1.)
par(mfrow = c(1, 1))

# run 100 iterations for each learning rate and plot the path of each parameter
for(i in 1:length(alphaval)) {
  tab <- grad.descent(x, y, 100, alphaval[i])
  plot(tab[,1],type="b",ylim=c(min(tab),max(tab)),col="red",lty=1,ylab="Value",lwd=1.5, main = paste("Alpha =", alphaval[i]))
  lines(tab[,2],type="b",col="black",lty=1,lwd=1.5)
  lines(tab[,3],type="b",col="blue",lty=1,lwd=1.5)
  legend("topleft", c("Intercept", "Square Feet", "# Bedrooms"), lty=c(1,1,1), lwd=c(2.0, 2.0, 2.0), col=c("red", "black", "blue"))
  print(tab[100,])
}

##        x0      sqft      bdrm 
## 32409.958  9839.144  4894.396

##        x0      sqft      bdrm 
## 215810.62  61384.03  20273.55

##         x0       sqft       bdrm 
## 340403.618 109912.678  -5931.109

##         x0       sqft       bdrm 
## 340412.660 110631.050  -6649.474
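
The results above show that the smaller learning rates have not converged after 100 iterations. One way to monitor convergence is to track the cost \(J(\theta) = \frac{1}{2m}\sum (h(x) - y)^2\) at every iteration. The sketch below uses synthetic data with made-up coefficients (3, 2, -1), not the housing data, so it runs on its own.

```r
set.seed(7)
m <- 50
X <- cbind(1, scale(matrix(rnorm(m * 2), ncol = 2)))  # intercept column plus 2 standardized features
y <- X %*% c(3, 2, -1) + rnorm(m, sd = 0.1)           # synthetic response

cost <- function(X, y, theta) sum((X %*% theta - y)^2) / (2 * nrow(X))

theta <- rep(0, 3)
alpha <- 0.1
J <- numeric(200)
for (i in 1:200) {
  theta <- theta - alpha * as.vector(t(X) %*% (X %*% theta - y)) / m
  J[i] <- cost(X, y, theta)  # cost after the i-th update
}

# compare with the closed-form least squares solution
theta_exact <- as.vector(solve(crossprod(X), crossprod(X, y)))
print(rbind(gradient_descent = theta, closed_form = theta_exact))
```

A cost curve that is still falling steeply at the last iteration signals that more iterations or a larger learning rate are needed.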

LM FUNCTION

# read in data
rawx <- read.csv("ex3x.dat", header = F, sep = "")
colnames(rawx) <- c("sqft", "bdrm")
head(rawx)

##   sqft bdrm
## 1 2104    3
## 2 1600    3
## 3 2400    3
## 4 1416    2
## 5 3000    4
## 6 1985    4

rawy <- read.csv("ex3y.dat", header = F, sep = "")
colnames(rawy) <- c("price")
head(rawy)

##    price
## 1 399900
## 2 329900
## 3 369000
## 4 232000
## 5 539900
## 6 299900

# X <- rawx
y <- rawy

# standardize data
x <- scale(rawx, center = T, scale = T)
head(x)

##             sqft       bdrm
## [1,]  0.13000987 -0.2236752
## [2,] -0.50418984 -0.2236752
## [3,]  0.50247636 -0.2236752
## [4,] -0.73572306 -1.5377669
## [5,]  1.25747602  1.0904165
## [6,] -0.01973173  1.0904165

# define variables
sqft <- as.matrix(x[,1])
bdrm <- as.matrix(x[,2])
price <- as.matrix(y)

# fit linear regression model
fit <- lm(price ~ sqft + bdrm)
summary_fit <- summary(fit)
print(summary_fit)

## 
## Call:
## lm(formula = price ~ sqft + bdrm)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -130582  -43636  -10829   43698  198147 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   340413       9637  35.323  < 2e-16 ***
## sqft          110631      11758   9.409 4.22e-12 ***
## bdrm           -6650      11758  -0.566    0.575    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 66070 on 44 degrees of freedom
## Multiple R-squared:  0.7329, Adjusted R-squared:  0.7208 
## F-statistic: 60.38 on 2 and 44 DF,  p-value: 2.428e-13

library(scatterplot3d) 
s3d <-scatterplot3d(sqft, bdrm, price, pch=16, highlight.3d=TRUE, type="h", main="3D Scatterplot")
s3d$plane3d(fit)

OLS FUNCTION

# read in data
rawx <- read.csv("ex3x.dat", header = F, sep = "")
colnames(rawx) <- c("sqft", "bdrm")
head(rawx)

##   sqft bdrm
## 1 2104    3
## 2 1600    3
## 3 2400    3
## 4 1416    2
## 5 3000    4
## 6 1985    4

rawy <- read.csv("ex3y.dat", header = F, sep = "")
colnames(rawy) <- c("price")
head(rawy)

##    price
## 1 399900
## 2 329900
## 3 369000
## 4 232000
## 5 539900
## 6 299900

y <- rawy

# standardize data
x <- scale(rawx, center = T, scale = T)

A <- as.matrix(cbind(x, 1))
colnames(A) <- c("sqft", "bdrm", "intercept")
head(A)

##             sqft       bdrm intercept
## [1,]  0.13000987 -0.2236752         1
## [2,] -0.50418984 -0.2236752         1
## [3,]  0.50247636 -0.2236752         1
## [4,] -0.73572306 -1.5377669         1
## [5,]  1.25747602  1.0904165         1
## [6,] -0.01973173  1.0904165         1

b <- as.matrix(y)
colnames(b) <- c("price")
head(b)

##       price
## [1,] 399900
## [2,] 329900
## [3,] 369000
## [4,] 232000
## [5,] 539900
## [6,] 299900

# calculate t(A)*A
ATA <- t(A) %*% A
print(ATA)

##                   sqft         bdrm    intercept
## sqft      4.600000e+01 2.575849e+01 8.881784e-16
## bdrm      2.575849e+01 4.600000e+01 1.021405e-14
## intercept 8.881784e-16 1.021405e-14 4.700000e+01

# calculate t(A)*b
ATb <- t(A) %*% b
print(ATb)

##              price
## sqft       4917748
## bdrm       2543813
## intercept 15999395

# solve for x-hat using the two matrices
ATAInv <- solve(ATA)
xhat <- (ATAInv %*% ATb)
print(xhat)

##                price
## sqft      110631.050
## bdrm       -6649.474
## intercept 340412.660

# check using lsfit function
lsr <- lsfit(A, b, intercept = F)
coeffs <- lsr$coefficients
print(coeffs)

##       sqft       bdrm  intercept 
## 110631.050  -6649.474 340412.660
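
Forming the explicit inverse of \(A^TA\) is not required. As a sketch on synthetic data (the coefficients 2, -1, 5 are made up), the same least squares solution can be obtained by solving the normal equations as a linear system, or by a QR decomposition of \(A\):

```r
set.seed(3)
A <- cbind(scale(matrix(rnorm(40), ncol = 2)), 1)  # 20 x 3 design matrix with intercept column
b <- A %*% c(2, -1, 5) + rnorm(20, sd = 0.5)

x_inv <- solve(t(A) %*% A) %*% t(A) %*% b      # explicit inverse, as in the code above
x_sys <- solve(crossprod(A), crossprod(A, b))  # solve the system (A'A) x = A'b directly
x_qr  <- qr.solve(A, b)                        # least squares via QR

print(cbind(x_inv, x_sys, x_qr))
```

All three columns should agree up to rounding; the QR route is generally the more numerically stable choice when columns are nearly collinear.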

Cross-validation

library(stats)
library(boot)

# combine datasets
data <- cbind(x, y)
head(data)

##          sqft       bdrm  price
## 1  0.13000987 -0.2236752 399900
## 2 -0.50418984 -0.2236752 329900
## 3  0.50247636 -0.2236752 369000
## 4 -0.73572306 -1.5377669 232000
## 5  1.25747602  1.0904165 539900
## 6 -0.01973173  1.0904165 299900

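# run glm with 5-fold cross-validation 8 times
# NOTE: poly() is called without a degree argument, so every iteration fits the same
# degree-1 model in (sqft + bdrm); only the random fold assignment differs between runs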
set.seed(1)
cv.err <- c()
for(i in 1:8) {
  glm.fit = glm(price ~ poly(sqft+bdrm), data=data)
  cv.err[i] = cv.glm(data, glm.fit, K=5)$delta[1]
}
# summary statistics
summary(glm.fit)

## 
## Call:
## glm(formula = price ~ poly(sqft + bdrm), data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -153135   -65604     1909    63727   176776  
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         340413      12515  27.200  < 2e-16 ***
## poly(sqft + bdrm)   622842      85800   7.259 4.21e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 7361699576)
## 
##     Null deviance: 7.1921e+11  on 46  degrees of freedom
## Residual deviance: 3.3128e+11  on 45  degrees of freedom
## AIC: 1205.2
## 
## Number of Fisher Scoring iterations: 2

# plot cross-validation error for each of the 8 runs
degree <- 1:8
plot(degree, cv.err, type = "b")

par(mfrow = c(2, 2))
plot(glm.fit)
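
To compare polynomial degrees by cross-validation, the degree has to be passed to `poly()` inside the loop. The following is a base R sketch of 5-fold cross-validation on synthetic data with a known quadratic relationship; it does not use the boot package or the housing data, and all names in it are illustrative.

```r
set.seed(1)
n <- 100
x <- runif(n, -2, 2)
y <- 1 + 2 * x - 1.5 * x^2 + rnorm(n, sd = 0.5)  # true relationship is quadratic
dat <- data.frame(x = x, y = y)

K <- 5
fold <- sample(rep(1:K, length.out = n))  # random fold label for each row

cv_err <- sapply(1:6, function(deg) {
  mean(sapply(1:K, function(k) {
    fit <- lm(y ~ poly(x, deg), data = dat[fold != k, ])  # train on the other folds
    pred <- predict(fit, newdata = dat[fold == k, ])      # predict the held-out fold
    mean((dat$y[fold == k] - pred)^2)
  }))
})

print(round(cv_err, 3))
print(which.min(cv_err))
```

The error should drop sharply from degree 1 to degree 2 and change little afterwards.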

SGD Attempt

library(sgd)
rawx <- read.csv("ex3x.dat", header = F, sep = "")
colnames(rawx) <- c("sqft", "bdrm")
head(rawx)

##   sqft bdrm
## 1 2104    3
## 2 1600    3
## 3 2400    3
## 4 1416    2
## 5 3000    4
## 6 1985    4

rawy <- read.csv("ex3y.dat", header = F, sep = "")
colnames(rawy) <- c("price")
head(rawy)

##    price
## 1 399900
## 2 329900
## 3 369000
## 4 232000
## 5 539900
## 6 299900

# X <- rawx
y <- rawy

# standardize data
x <- scale(rawx, center = T, scale = T)
head(x)

##             sqft       bdrm
## [1,]  0.13000987 -0.2236752
## [2,] -0.50418984 -0.2236752
## [3,]  0.50247636 -0.2236752
## [4,] -0.73572306 -1.5377669
## [5,]  1.25747602  1.0904165
## [6,] -0.01973173  1.0904165

# combine datasets
data <- cbind(x, y)
head(data)

##          sqft       bdrm  price
## 1  0.13000987 -0.2236752 399900
## 2 -0.50418984 -0.2236752 329900
## 3  0.50247636 -0.2236752 369000
## 4 -0.73572306 -1.5377669 232000
## 5  1.25747602  1.0904165 539900
## 6 -0.01973173  1.0904165 299900

sgd.data <- sgd(x, y, model="lm")
sgd.data

## $coefficients
##       sqft       bdrm 
## 11926.8714  -599.0125 
## 
## $residuals
##    price     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA> 
## 398215.4 335779.4 362873.0 239853.7 525555.4 300788.5 321769.9 207474.8 
##     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA> 
## 221181.2 249970.3 241562.9 346876.2 331526.1 664160.8 270762.1 446061.0 
##     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA> 
## 309194.5 211242.3 491521.5 584190.2 256273.1 255770.0 248719.4 261133.7 
##     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA> 
## 545411.2 263283.4 472510.6 460982.0 471874.6 290216.2 351405.4 183209.9 
##     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA> 
## 314963.1 562712.2 289399.9 258225.7 241197.3 343682.3 516420.7 285232.1 
##     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA> 
## 372631.8 326204.3 306153.9 310882.6 196218.2 302784.6 251337.6 
## 
## $fitted.values
##        price         <NA>         <NA>         <NA>         <NA> 
##   1684.59523  -5879.42311   6126.95521  -7853.73270  14344.58152 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
##   -888.51097  -6869.94932  -8475.80242  -9181.17715  -7470.26824 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
##  -1563.86975    123.76605  -1527.11097  35739.20803 -10862.07011 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
##   3839.00049  -9294.49809 -11342.32524   8476.46411  14809.82868 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
##  -3373.09164   -769.96949  -5819.39122  -1233.69435  28488.83482 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
## -13383.40956  -8010.55526   8017.95979   3125.36063   9683.84478 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
##  -1505.36016 -13309.89199    -63.07246  17187.83123  -3499.89825 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
##  -8325.72269 -11297.30132   1317.66104  32579.26858   1767.90023 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
##  -4131.75542   3695.66360   7846.12925 -11882.61227 -16318.22941 
##         <NA>         <NA> 
##  -2884.57137 -11837.58835 
## 
## $rank
## [1] 2
## 
## $family
## 
## Family: gaussian 
## Link function: identity 
## 
## 
## $linear.predictors
##        price         <NA>         <NA>         <NA>         <NA> 
##   1684.59523  -5879.42311   6126.95521  -7853.73270  14344.58152 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
##   -888.51097  -6869.94932  -8475.80242  -9181.17715  -7470.26824 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
##  -1563.86975    123.76605  -1527.11097  35739.20803 -10862.07011 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
##   3839.00049  -9294.49809 -11342.32524   8476.46411  14809.82868 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
##  -3373.09164   -769.96949  -5819.39122  -1233.69435  28488.83482 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
## -13383.40956  -8010.55526   8017.95979   3125.36063   9683.84478 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
##  -1505.36016 -13309.89199    -63.07246  17187.83123  -3499.89825 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
##  -8325.72269 -11297.30132   1317.66104  32579.26858   1767.90023 
##         <NA>         <NA>         <NA>         <NA>         <NA> 
##  -4131.75542   3695.66360   7846.12925 -11882.61227 -16318.22941 
##         <NA>         <NA> 
##  -2884.57137 -11837.58835 
## 
## $deviance
## [1] 6.057538e+12
## 
## $null.deviance
## [1] 719208918475
## 
## $iter
## [1] 2
## 
## $weights
## price  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
##     1     1     1     1     1     1     1     1     1     1     1     1 
##  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
##     1     1     1     1     1     1     1     1     1     1     1     1 
##  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
##     1     1     1     1     1     1     1     1     1     1     1     1 
##  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
##     1     1     1     1     1     1     1     1     1     1     1 
## 
## $df.residual
## [1] 45
## 
## $df.null
## [1] 46
## 
## $converged
## NULL
## 
## attr(,"class")
## [1] "sgd"

sgd.data$coefficients

##       sqft       bdrm 
## 11926.8714  -599.0125
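
The coefficients above differ from the lm and OLS results, and the fitted model has no intercept term. As a point of comparison, stochastic gradient descent for linear regression can be sketched in base R without the sgd package. The data below are synthetic with made-up coefficients (4, 2, -1), and the learning rate and number of epochs are illustrative assumptions.

```r
set.seed(11)
m <- 200
X <- cbind(1, matrix(rnorm(m * 2), ncol = 2))  # first column of ones gives the intercept
y <- as.vector(X %*% c(4, 2, -1) + rnorm(m, sd = 0.1))

theta <- rep(0, 3)
alpha <- 0.01
for (epoch in 1:50) {
  for (i in sample(m)) {                     # visit observations in random order
    err <- sum(X[i, ] * theta) - y[i]        # prediction error for one observation
    theta <- theta - alpha * err * X[i, ]    # update using that observation only
  }
}
print(round(theta, 3))
```

Including the column of ones in `X` is what lets the update estimate an intercept.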

print(x)

##                sqft       bdrm
##  [1,]  0.1300098691 -0.2236752
##  [2,] -0.5041898382 -0.2236752
##  [3,]  0.5024763638 -0.2236752
##  [4,] -0.7357230647 -1.5377669
##  [5,]  1.2574760154  1.0904165
##  [6,] -0.0197317285  1.0904165
##  [7,] -0.5872397999 -0.2236752
##  [8,] -0.7218814044 -0.2236752
##  [9,] -0.7810230438 -0.2236752
## [10,] -0.6375731100 -0.2236752
## [11,] -0.0763567023  1.0904165
## [12,] -0.0008567372 -0.2236752
## [13,] -0.1392733400 -0.2236752
## [14,]  3.1172918237  2.4045083
## [15,] -0.9219563121 -0.2236752
## [16,]  0.3766430886  1.0904165
## [17,] -0.8565230089 -1.5377669
## [18,] -0.9622229602 -0.2236752
## [19,]  0.7654679091  1.0904165
## [20,]  1.2964843307  1.0904165
## [21,] -0.2940482685 -0.2236752
## [22,] -0.1417900055 -1.5377669
## [23,] -0.4991565072 -0.2236752
## [24,] -0.0486733818  1.0904165
## [25,]  2.3773921652 -0.2236752
## [26,] -1.1333562145 -0.2236752
## [27,] -0.6828730891 -0.2236752
## [28,]  0.6610262907 -0.2236752
## [29,]  0.2508098133 -0.2236752
## [30,]  0.8007012262 -0.2236752
## [31,] -0.2034483104 -1.5377669
## [32,] -1.2591894898 -2.8518586
## [33,]  0.0494765729  1.0904165
## [34,]  1.4298676025 -0.2236752
## [35,] -0.2386816274  1.0904165
## [36,] -0.7092980769 -0.2236752
## [37,] -0.9584479619 -0.2236752
## [38,]  0.1652431861  1.0904165
## [39,]  2.7863503098  1.0904165
## [40,]  0.2029931687  1.0904165
## [41,] -0.4236565421 -1.5377669
## [42,]  0.2986264579 -0.2236752
## [43,]  0.7126179335  1.0904165
## [44,] -1.0075229393 -0.2236752
## [45,] -1.4454227371 -1.5377669
## [46,] -0.1870899846  1.0904165
## [47,] -1.0037479410 -0.2236752
## attr(,"scaled:center")
##        sqft        bdrm 
## 2000.680851    3.170213 
## attr(,"scaled:scale")
##        sqft        bdrm 
## 794.7023535   0.7609819

FINAL EXAM

Honey Berk

December 27, 2015
