1. Review of Essential Concepts - 15 points

\(\begin{bmatrix} 1 &-1 &3&-5 \\ 2 & 1 & 5&-9\\6&-1&-2&4\end{bmatrix}\)

Let’s see how we can reduce this matrix to reduced row echelon form:

m<-matrix(c(1,-1,3,-5,2,1,5,-9,6,-1,-2,4),byrow=TRUE,nrow=3)

tmp <- m

# R2 -> R2 - 2*R1
tmp[2,] <- tmp[2,] - 2*tmp[1,]

# R3 -> R3 - 6*R1
tmp[3,] <- tmp[3,] - 6*tmp[1,]

tmp
##      [,1] [,2] [,3] [,4]
## [1,]    1   -1    3   -5
## [2,]    0    3   -1    1
## [3,]    0    5  -20   34
# R2 -> R2/3
tmp[2,] <- tmp[2,]/3
# R3 -> R3 - 5*R2
tmp[3,] <- tmp[3,] - 5*tmp[2,]

tmp
##      [,1] [,2]        [,3]       [,4]
## [1,]    1   -1   3.0000000 -5.0000000
## [2,]    0    1  -0.3333333  0.3333333
## [3,]    0    0 -18.3333333 32.3333333
# R1 -> R1 + R2
tmp[1,] <- tmp[1,] + tmp[2,]

# R3 -> R3/-18.33
tmp[3,] <- tmp[3,]/tmp[3,3]

tmp
##      [,1] [,2]       [,3]       [,4]
## [1,]    1    0  2.6666667 -4.6666667
## [2,]    0    1 -0.3333333  0.3333333
## [3,]    0    0  1.0000000 -1.7636364
# R2 -> 0.33 * R3 + R2
tmp[2,] <- -1*tmp[2,3] * tmp[3,] +  tmp[2,] 

# R1 -> R1 - 2.66 * R3 
tmp[1,] <- tmp[1,]  - tmp[1,3] * tmp[3,]

tmp
##      [,1] [,2] [,3]        [,4]
## [1,]    1    0    0  0.03636364
## [2,]    0    1    0 -0.25454545
## [3,]    0    0    1 -1.76363636

Since none of the rows reduced to a zero row (each of the three rows contains a pivot), the above matrix has a rank of 3.

The transpose of the matrix is:

t(m)
##      [,1] [,2] [,3]
## [1,]    1    2    6
## [2,]   -1    1   -1
## [3,]    3    5   -2
## [4,]   -5   -9    4

Given a set of vectors, we can say that they are orthonormal if they meet these two characteristics: every pair of distinct vectors is orthogonal (their dot product is zero), and each vector has unit length (norm 1).

An example of such a set of vectors is:

the standard basis of \(R^{3}\): \(\left \{ v_{1}=(1,0,0),\ v_{2}=(0,1,0),\ v_{3}=(0,0,1) \right \}\)
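As a quick sanity check (an illustrative snippet, not part of the original solution), the columns of a matrix V are orthonormal exactly when \(V^{T}V\) equals the identity matrix:

# the three basis vectors as columns of V
V <- cbind(c(1,0,0), c(0,1,0), c(0,0,1))

# pairwise dot products are 0 and every column has unit length,
# so t(V) %*% V should be the 3x3 identity matrix
all.equal(crossprod(V), diag(3))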

\(A = \begin{bmatrix} 5 &0 &3 \\ 0 & 1 & -2\\1&2&0\end{bmatrix}\)

Before we do anything, let’s eliminate the first two entries of the third row to bring the matrix to upper triangular form:

\(\begin{bmatrix} 5 &0 &3 \\ 0 & 1 & -2\\1&2&0\end{bmatrix} R_3 \rightarrow R_3 - \frac{1}{5} R_1 \begin{bmatrix} 5 &0 &3 \\ 0 & 1 & -2\\0&2&-\frac{3}{5}\end{bmatrix}\)

\(\begin{bmatrix} 5 &0 &3 \\ 0 & 1 & -2\\0&2&-\frac{3}{5}\end{bmatrix} R_3 \rightarrow R_3 - 2 R_2 \begin{bmatrix} 5 &0 &3 \\ 0 & 1 & -2\\0&0&-\frac{3}{5}+4\end{bmatrix}\)

which becomes:

\(\begin{bmatrix} 5 &0 &3 \\ 0 & 1 & -2\\0&0&\frac{17}{5}\end{bmatrix}\)
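Since row operations of the form \(R_{i} \rightarrow R_{i} - c R_{j}\) do not change the determinant, the determinant of A is the product of the diagonal entries of this triangular form: \(5 \cdot 1 \cdot \frac{17}{5} = 17\). A quick check in R (an illustrative snippet):

# determinant of A; should return 17
det(matrix(c(5,0,3,0,1,-2,1,2,0),byrow=TRUE,nrow=3))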

\(A = \begin{bmatrix} 5 &0 &3 \\ 0 & 1 & -2\\1&2&0\end{bmatrix}\)

The characteristic polynomial of A, denoted by \(p_A(t)\), is the polynomial defined by

\(p_A(t) = \det \left(t \boldsymbol{I} - A\right)\)

For the provided matrix, let’s compute the determinant of:

\(t I-A = \begin{pmatrix}t-5&0&-3\\0&t-1&2\\-1&-2&t\end{pmatrix}\)

\(det(t I-A) = (t-5)(t-1)(t) + (0)(2)(-1) + (-3)(0)(-2) - (-3)(t-1)(-1) - (t-5)(2)(-2) - (0)(0)(t)\)

\(det(t I-A) = (t-5)(t-1)(t) - (-3)(t-1)(-1) - (t-5)(2)(-2)\)

\(det(t I-A) = (t^2 -6t +5)(t) - (3t-3) - (-4t+20)\)

\(det(t I-A) = t^3 -6t^2 +5t - 3t + 3 + 4t - 20\)

\(det(t I-A) = (t^3 -6t^2 +5t) + t - 17\)

\(det(t I-A) = t^3 -6t^2 + 6t - 17\)

Therefore,

\(p_A(t) = t^3 -6t^2 +6t - 17\)

The eigenvalues are those values of \(\lambda\) such that:

\(\lambda^3 -6\lambda^2 +6\lambda - 17 =0\)

Solving this cubic numerically (for example with R’s polyroot function, or Wolfram Alpha), we find that it has one real solution, \(\lambda_1 \approx 5.4713\), and a pair of complex-conjugate solutions, \(\lambda_{2,3} \approx 0.2644 \pm 1.7428i\).
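A quick numerical check in R (an illustrative snippet; polyroot expects the coefficients in increasing order of degree):

# roots of -17 + 6t - 6t^2 + t^3; approximately 5.4713 and 0.2644 +/- 1.7428i,
# matching eigen(m)$values below
polyroot(c(-17, 6, -6, 1))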

Let’s calculate the eigenvector that corresponds to the real eigenvalue. An eigenvector \(v\) for \(\lambda_{1}\) is a non-trivial solution of \((A - \lambda_{1} I)v = 0\), i.e. a vector in the null space of \(A - \lambda_{1} I\). One way to find it is to fix \(v_{3} = 1\) and solve the first two equations for \(v_{1}\) and \(v_{2}\):

m<-matrix(c(5,0,3,0,1,-2,1,2,0),byrow=TRUE,nrow=3)

lambda1 <- 5.471264
B <- m - lambda1*diag(3)

# fix v3 = 1 and solve the first two rows of B %*% v = 0 for v1 and v2
v <- c(solve(B[1:2,1:2], -B[1:2,3]), 1)

# normalize to unit length; up to sign this matches the first column of
# eigen(m)$vectors below, approximately (0.9855, -0.0692, 0.1548)
v/sqrt(sum(v^2))

Let’s obtain the full set of eigenvalues and eigenvectors by using R’s eigen() function; note that the first eigenvalue and (normalized) eigenvector agree with the values above.

eigen(m)
## $values
## [1] 5.471264+0.000000i 0.264368+1.742772i 0.264368-1.742772i
## 
## $vectors
##                [,1]                  [,2]                  [,3]
## [1,]  0.98551404+0i  0.2583505-0.2761849i  0.2583505+0.2761849i
## [2,] -0.06924766+0i -0.6725516+0.0000000i -0.6725516+0.0000000i
## [3,]  0.15481227+0i -0.2473752+0.5860519i -0.2473752-0.5860519i

Given such a random matrix of URL links, we expect the resulting ranked network to show no particular structure: PageRank should assign roughly equal weight to every node, with no single page standing out as a hub.
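As an illustrative sketch (not part of the original submission; the matrix size, link probability, and damping factor are arbitrary choices), we can run PageRank by power iteration on a random link matrix and check that no node ends up with a dominant rank:

# build a random 0/1 link matrix and its column-stochastic transition matrix
set.seed(1)
n <- 20
L <- matrix(rbinom(n*n, 1, 0.5), n, n)
diag(L) <- 0
L[rowSums(L) == 0, ] <- 1          # guard against nodes with no outgoing links
M <- t(L / rowSums(L))

# PageRank by power iteration with damping factor d
d <- 0.85
r <- rep(1/n, n)
for (i in 1:100) r <- (1 - d)/n + d * (M %*% r)

# the ranks should all stay fairly close to 1/n = 0.05
range(r)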

Given a set of samples drawn from an unknown pdf (with finite mean and variance), the Central Limit Theorem tells us that the arithmetic mean (average) of such samples is approximately normally distributed, with the approximation improving as the sample size grows.
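For example (an illustrative simulation; the exponential distribution and sample size are arbitrary choices), sample means drawn from a clearly non-normal distribution already look approximately Gaussian for moderate sample sizes:

set.seed(42)

# 1000 sample means, each computed from 50 draws of a right-skewed Exp(1) distribution
sample_means <- replicate(1000, mean(rexp(50, rate = 1)))

hist(sample_means, breaks = 30, main = "Sample means of Exp(1), n = 50")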

By applying the product rule for derivatives we have:

\(\frac{d}{dx}(e^{x} cos^2(x)) = \frac{d}{dx}(e^{x}) cos^2(x) + e^{x} \frac{d}{dx} cos^2(x)\)

\(\frac{d}{dx}(e^{x} cos^2(x)) = e^{x} cos^2(x) + e^{x} \frac{d}{dx} cos^2(x)\)

Again, we can express \(cos^2(x)\) as \(cos(x) cos(x)\) and apply the product rule:

\(\frac{d}{dx}(e^{x} cos^2(x)) = e^{x} cos^2(x) + e^{x} (\frac{d}{dx} cos(x) cos(x))\)

\(\frac{d}{dx}(e^{x} cos^2(x)) = e^{x} cos^2(x) + e^{x} (\frac{d}{dx}(cos(x)) cos(x) + cos(x)\frac{d}{dx}(cos(x)))\)

\(\frac{d}{dx}(e^{x} cos^2(x)) = e^{x} cos^2(x) + e^{x} (-sin(x) cos(x) -sin(x) cos(x))\)

\(\frac{d}{dx}(e^{x} cos^2(x)) = e^{x} cos^2(x) - 2 e^{x} sin(x) cos(x)\)

\(\frac{d}{dx}(e^{x} cos^2(x)) = cos(x)e^{x}(cos(x) - 2 sin(x) )\)
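We can sanity-check this result numerically (an illustrative check; the test point x0 is arbitrary) by comparing the derived expression against a central-difference approximation:

f  <- function(x) exp(x) * cos(x)^2
df <- function(x) cos(x) * exp(x) * (cos(x) - 2*sin(x))   # the derivative derived above

x0 <- 0.7
h  <- 1e-6

# the central difference and the closed-form derivative should agree to several decimal places
(f(x0 + h) - f(x0 - h)) / (2*h)
df(x0)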

By applying the chain rule (with \(u = x^{3}\)) we have:

\(\frac{d}{dx}(e^{x^{3}}) = \frac{d}{du}(e^{u}) \frac{d}{dx}(x^{3})\)

\(\frac{d}{dx}(e^{x^{3}}) = (e^{x^{3}}) 3x^{2}\)

By the sum rule, the integral \(\int (e^{x} cos(x) + sin(x)) dx\) splits into:

\(\int (e^{x} cos(x) + sin(x)) dx = \int e^{x} cos(x) dx + \int sin(x) dx\)

\(\int (e^{x} cos(x) + sin(x)) dx = \int e^{x} cos(x) dx - cos(x) + c\)

We can solve \(\int e^{x} cos(x) dx\) by parts

Given:

\(\int fg'dx = fg - \int f'g dx\)

where:

f = \(e^{x}\), f’= \(e^{x} dx\)

g’ = cos(x) dx, g = sin(x)

we have:

\(\int e^{x}cos(x)dx = fg - \int f'g dx\)

\(\int e^{x}cos(x)dx = e^{x}sin(x) - \int f'g dx\)

\(\int e^{x}cos(x)dx = e^{x}sin(x) - \int e^{x} sin(x) dx\)

Next, for \(\int e^{x} sin(x) dx\):

where:

f = \(sin(x)\), f’= \(cos(x) dx\)

g’ = \(e^{x} dx\), g = \(e^{x}\)

we have:

\(\int e^{x}sin(x)dx = fg - \int f'g dx\)

\(\int e^{x}sin(x)dx = sin(x)e^{x} - \int e^{x}cos(x) dx\)

by parts again:

\(\int e^{x}cos(x) dx\)

where:

f = \(cos(x)\), f’= \(-sin(x) dx\)

g’ = \(e^{x} dx\), g = \(e^{x}\)

\(\int e^{x}cos(x)dx = fg - \int f'g dx\)

\(\int e^{x}cos(x)dx = cos(x)e^{x} - \int e^{x}(-sin(x)) dx\)

If we now put it all together we can simplify:

\(\int e^{x} sin(x) = sin(x)e^{x} - (cos(x)e^{x} - \int e^{x}(-sin(x)) dx)\)

\(\int e^{x} sin(x) = sin(x)e^{x} - (cos(x)e^{x} + \int e^{x}sin(x) dx)\)

\(\int e^{x} sin(x) = sin(x)e^{x} - cos(x)e^{x} - \int e^{x}sin(x) dx\)

\(2\int e^{x} sin(x) = sin(x)e^{x} - cos(x)e^{x}\)

\(\int e^{x} sin(x) = \frac{1}{2} (sin(x)e^{x} - cos(x)e^{x}) + c\)

\(\int e^{x} sin(x) = -\frac{1}{2} e^{x} (cos(x) - sin(x)) + c\)

With all the following previously obtained intermediate equations:

\(\int (e^{x} cos(x) + sin(x) ) dx = \int e^{x} cos(x) dx - cos(x) + c\)

\(\int e^{x}cos(x)dx = e^{x}sin(x) - \int e^{x} sin(x) dx\)

\(\int e^{x} sin(x) = -\frac{1}{2} e^{x} (cos(x) - sin(x)) + c\)

we have:

\(\int (e^{x} cos(x) + sin(x) ) dx = e^{x}sin(x) - (-\frac{1}{2} e^{x} (cos(x) - sin(x))) - cos(x) + c\)

\(\int (e^{x} cos(x) + sin(x) ) dx = e^{x}sin(x) + \frac{1}{2} e^{x} cos(x) - \frac{1}{2} e^{x} sin(x) - cos(x) + c\)

\(\int (e^{x} cos(x) + sin(x) ) dx = \frac{1}{2} e^{x} cos(x) + \frac{1}{2} e^{x} sin(x) - cos(x) + c\)

Therefore:

\(\int (e^{x} cos(x) + sin(x) ) dx = \frac{1}{2} e^{x} (cos(x) + sin(x))- cos(x) + c\)
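We can verify the final antiderivative numerically (an illustrative check; the integration interval [0, 1] is arbitrary) by comparing it against R’s integrate:

# F is the antiderivative derived above (with c = 0)
F <- function(x) 0.5 * exp(x) * (cos(x) + sin(x)) - cos(x)
integrand <- function(x) exp(x) * cos(x) + sin(x)

# these two values should match
integrate(integrand, 0, 1)$value
F(1) - F(0)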


2. Mini-coding assignments - 15 points


library("knitr")

f<-function (x) {
  dbinom(x,20,0.25)
}

#Let's set up a vector of x values
x <- seq(0,20,by=1)

p_x <- sapply(x,f)

#let's draw 1000 random samples from the Binomial(20, 0.25) distribution
f_x <- sample(x,1000,replace=TRUE,prob=p_x)

#plot the theoretical pmf next to a histogram of the sampled values
par(mfrow=c(1,2)) 

plot(x, p_x, type='h', xlab='x', ylab='P(X = x)', main='Theoretical pmf')
hist(f_x, breaks=x, main='Sampled values')


In order to do PCA from the SVD we need to: (1) standardize the data (center each variable and scale it to unit variance), (2) compute the covariance matrix of the standardized data, and (3) take the SVD of that covariance matrix; the singular values give the variance along each principal component and the singular vectors give the principal directions.

auto_mpg_data = read.fwf(file='C:\\Users\\malarconba001\\Google Drive\\CUNY\\IS605\\final_exam\\IS605_Finals_Fall2015\\auto-mpg.data',
                         widths = c(5,11,11,10,8))


colnames(auto_mpg_data) <- c("displacement", "horsepower", "weight", "acceleration","mpg")

# Normalize the data
A <- scale(data.matrix(auto_mpg_data[1:4]))

mpg <- data.matrix(auto_mpg_data[5])
colnames(mpg)<- "mpg"
# Let's calculate the covariance matrix
cov <- cov(A)

# Now, let's do SVD on the covariance Matrix
LSA <- svd(cov)

# and the singular values (the variances along the principal components) are
LSA$d
## [1] 3.20893718 0.65406182 0.08295314 0.05404786
# which plots as
plot(LSA$d,type='b',pch=20,xlab='Singular value',ylab='magnitude')

We can compare these results to the ones from the prcomp function:

ir.pca <- prcomp(A,center=FALSE,scale.=FALSE)
ir.pca
## Standard deviations:
## [1] 1.7913507 0.8087409 0.2880159 0.2324820
## 
## Rotation:
##                     PC1         PC2        PC3        PC4
## displacement -0.5344668  0.25097580 -0.4903039 -0.6410604
## horsepower   -0.5417662 -0.03090597  0.8189342 -0.1867643
## weight       -0.5127678  0.43914194 -0.1583275  0.7205248
## acceleration  0.3973711  0.86209647  0.2527474 -0.1870952
ir.pca$sdev^2
## [1] 3.20893718 0.65406182 0.08295314 0.05404786
plot(ir.pca, type = "l")

From the above we can see that most of the variance (roughly 97%) is explained by the first two dimensions.
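At this point LSA still holds the SVD of the covariance matrix, so we can quantify this claim with the cumulative proportion of variance (a quick illustrative check):

# cumulative proportion of the total variance explained by the leading components
cumsum(LSA$d) / sum(LSA$d)
# roughly 0.80, 0.97, 0.99, 1.00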

Let’s now project the data onto the first two dimensions:

LSA <- svd(A)

# Let's now project the data onto these two dimensions
depth <-2
us <- as.matrix(LSA$u[, 1:depth])
vs <- as.matrix(LSA$v[, 1:depth])
ds <- as.matrix(diag(LSA$d)[1:depth, 1:depth])

A_dn <- us %*% ds %*% t(vs)
colnames(A_dn) <- c("displacement", "horsepower", "weight", "acceleration")
pairs(~.,data=cbind(A_dn,mpg), 
   main=paste("Scatterplot Matrix of auto-mpg Data onto the First",depth,"Dimensions."))

which we can compare against the original dataset:

#colnames(A) <- c("displacement", "horsepower", "weight", "acceleration")
pairs(~.,data=cbind(A,mpg), 
   main="Scatterplot Matrix of auto-mpg Data (Original)")

It looks pretty close to me!


In this case (sampling with replacement from a set of n distinct values), the probability of selecting any given value on a single draw is p = 1/n.

With this, we can also say that the probability of an individual value not being selected on a single draw is \(q = \left ( 1-\frac{1}{n} \right )\). Since we’re sampling with replacement n times, the probability that a given value is never selected is \(q_{n} = \left ( 1-\frac{1}{n} \right )^{n}\).

With this, we can say that we’re looking for:

\(q_{\infty} = \lim_{n \rightarrow \infty } \left ( 1-\frac{1}{n} \right )^{n}\)

We can estimate this limit in R:

infinity<-1000
n <- c(1:infinity)
q <-(1-1/n)^n

plot(n,q)

# (1 - 1/n)^n increases with n, so the largest value is our estimate of the limit
lim <- max(q)

lim
## [1] 0.3676954

As we can see from the chart above, n = 1000 already gets us very close to the limiting value of \(e^{-1} \approx 0.3679\).

Therefore, \(p_{\infty} = 1- \lim_{n \rightarrow \infty } \left ( 1-\frac{1}{n} \right )^{n} = 1 - e^{-1} \approx 0.632\):

1-lim
## [1] 0.6323046

Computational verification:

# In this experiment we will bootstrap n elements from a sequence of values from 1:n
bootstrap_experiment <- function(n){

  # Define the sequence of values to sample from
  values <- seq_len(n)
  
  # sample with replacement n times
  samples <- sample(values, n, replace=TRUE)

  # return the fraction of distinct values that were sampled in this iteration
  return(length(unique(samples))/n)
}

# The output of a single experiment with n = 10000 is:
n <- 10000
bootstrap_experiment(n)
## [1] 0.6288
#now, let's repeat the experiment m-times and display the average

m<- 100
mean(replicate(m,{bootstrap_experiment(n)}))
## [1] 0.632006
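For comparison, the analytic limit is \(1 - e^{-1}\) (a quick check):

# the analytic value 1 - 1/e = 0.6321206..., in good agreement with the simulation above
1 - exp(-1)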

3. Mini-project - 20 points

In this mini project, you’ll perform a Multivariate Linear Regression analysis using Gradient Descent. The data set consists of two predictor variables and one response variable. The predictor variables are living area in square feet and number of bedrooms. The response variable is the price of the house. You have 47 data points in total.

Since both the number of rooms and the living area are in different units, it is hard to compare them in relative terms. One way to compensate for this is to standardize the variables. In order to standardize, you estimate the mean and standard deviation of each variable and then compute new versions of these variables. For instance, if you have a variable x, then the standardized version of x is \(x_{std} = (x - \mu)/\sigma\) where \(\mu\) and \(\sigma\) are the mean and standard deviation of x, respectively.

As we saw in the gradient descent equations, we introduce a dummy variable \(x_{0} = 1\) in order to calculate the intercept term of the linear regression. Please standardize the 2 variables, introduce the dummy variable and then write a function to perform gradient descent on this data set. You’ll repeat gradient descent for a range of \(\alpha\) values. Please use \(\alpha = (0.001; 0.01; 0.1; 1.0)\) as your choices. For each value of \(\alpha\) perform about 100 iterations and compute \(J(\theta)\) at the end of each iteration. Please plot \(J(\theta)\) versus number of iterations for each of the 4 \(\alpha\) choices.

Once you have your final gradient descent solution, compare this with regular linear regression (using the built-in function in R). Also solve using Ordinary Least Squares approach. Please document all 3 solutions in your submission. For bonus points (4 additional points), also perform stochastic gradient descent in addition to regular gradient descent and show those results as well.

Finally, using the built-in cross-validation function in R, perform a 5-fold cross-validation and estimate the test prediction error. In order to perform the cross-validation, you can also write your own procedure if you prefer to use your own instead of the built-in functions.

You can choose any one of the 3 approaches to perform cross-validation on. No need to do it for all 3 approaches.

Data Loading and Standardization

First, let’s load the dataset and combine it all into a combined dataframe:

library("knitr")
ex3x <- as.matrix(read.table('C:/Users/malarconba001/Google Drive/CUNY/IS605/final_exam/IS605_Finals_Fall2015/mini-project-data/ex3x.dat'))

colnames(ex3x)<-c("sqf","bedrooms")

ex3y <- as.matrix(read.table('C:/Users/malarconba001/Google Drive/CUNY/IS605/final_exam/IS605_Finals_Fall2015/mini-project-data/ex3y.dat'))

colnames(ex3y)<-c("price")

Now, let’s standardize the bedrooms and sqf and combine the columns into a single dataset:

#standardize the x values
ex3x <- scale(ex3x)

house_data <- data.frame(ex3x,ex3y)

pairs(~.,data=house_data, 
   main="Scatterplot Matrix of House Prices Data")

head(house_data)
##           sqf   bedrooms  price
## 1  0.13000987 -0.2236752 399900
## 2 -0.50418984 -0.2236752 329900
## 3  0.50247636 -0.2236752 369000
## 4 -0.73572306 -1.5377669 232000
## 5  1.25747602  1.0904165 539900
## 6 -0.01973173  1.0904165 299900

Modeling price = f(sqf,bedrooms)

Linear Regression

With a linear regression we have:

house_data.glm <- glm(price ~ sqf+bedrooms, data=house_data)
summary(house_data.glm) # show results
## 
## Call:
## glm(formula = price ~ sqf + bedrooms, data = house_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -130582   -43636   -10829    43698   198147  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   340413       9637  35.323  < 2e-16 ***
## sqf           110631      11758   9.409 4.22e-12 ***
## bedrooms       -6650      11758  -0.566    0.575    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 4365189199)
## 
##     Null deviance: 7.1921e+11  on 46  degrees of freedom
## Residual deviance: 1.9207e+11  on 44  degrees of freedom
## AIC: 1181.5
## 
## Number of Fisher Scoring iterations: 2

Least Squares

# design matrix: the standardized predictors plus the dummy intercept column x0 = 1
A <- cbind(1,ex3x)

# response vector
b <- ex3y

# normal equations: x_hat = (A^T A)^{-1} A^T b
A_T <- t(A)

A_TA <- A_T %*% A

A_Tb <- A_T %*% b

A_TA_inv <- solve(A_TA)

x_hat <- A_TA_inv %*% A_Tb
x_hat
##               price
##          340412.660
## sqf      110631.050
## bedrooms  -6649.474
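Gradient Descent

The gradient descent fit that the Conclusion refers to is not reproduced in this section, so the following is a minimal batch gradient descent sketch (an assumed implementation rather than the original code; the \(\alpha\) values and the 100 iterations follow the assignment instructions, and the variable names are illustrative):

# design matrix with the dummy intercept column x0 = 1 (predictors already standardized)
X <- cbind(1, ex3x)
y <- ex3y
m_obs <- nrow(X)

# squared-error cost J(theta)
cost <- function(theta) sum((X %*% theta - y)^2) / (2 * m_obs)

gradient_descent <- function(alpha, iters = 100) {
  theta <- matrix(0, ncol(X), 1)
  J <- numeric(iters)
  for (i in seq_len(iters)) {
    grad <- t(X) %*% (X %*% theta - y) / m_obs   # gradient of J(theta)
    theta <- theta - alpha * grad
    J[i] <- cost(theta)
  }
  list(theta = theta, J = J)
}

# run gradient descent for each learning rate and keep J(theta) per iteration
alphas <- c(0.001, 0.01, 0.1, 1.0)
fits <- lapply(alphas, gradient_descent)

# plot J(theta) versus iteration for each alpha
par(mfrow = c(2, 2))
for (k in seq_along(alphas)) {
  plot(fits[[k]]$J, type = "l", xlab = "iteration", ylab = "J(theta)",
       main = paste("alpha =", alphas[k]))
}

# with alpha = 1 the coefficients should approach the least squares solution x_hat above
fits[[4]]$theta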

5-fold Cross Validation of the Linear Model

library(boot)

# 5-fold cross validation
(cv.err.5 <- cv.glm(house_data, house_data.glm, K = 5)$delta)
## [1] 4560119246 4505164554
# test error prediction
mu_hat <- fitted(house_data.glm)
house_data.diag <- glm.diag(house_data.glm)
(cv.err <- mean((house_data.glm$y - mu_hat)^2/(1 - house_data.diag$h)^2))
## [1] 4647800409

In order to interpret the delta for this linear model, we need to compare it against some other model. Let’s see how the linear model compares against polynomial models of various degrees (1-3):

do_auto_poly_regression <- function(data,power){
  glm.fit <- glm(price~poly(sqf+bedrooms,power), data=data)  
  return(cv.glm(data,glm.fit,K=5)$delta[1])
}


degree<-c(1:3)
cv_err <- sapply(degree,  function(x) do_auto_poly_regression(house_data, x))
degree <- c(0,degree)
cv_err <- c(cv.err.5[1],cv_err)
cv_err
## [1]  4560119246  7629033735  6912612787 10348525159
plot(degree,cv_err,type='b'
     ,main="Cross Validation of Linear and Polynomial Models of house_data"
     ,xlab="Polynomial Degree"
     ,ylab="Cross Validation Error")

As we can see, the linear model offers the least amount of cross validation error in comparison to other polynomial models.

Conclusion

In this exercise we compared three different techniques for performing a multivariate linear regression of home prices. The three modeling techniques produce essentially the same multivariate linear model, of the form \(price = 110631 \cdot sqf - 6650 \cdot bedrooms + 340413\) (with sqf and bedrooms standardized). It is worth mentioning that the results of the glm model reveal that the number of bedrooms is NOT statistically significant.

In regards to the gradient descent method, we see that it performs much better with \(\alpha=1\). At this \(\alpha\) level, we find a solution that is comparable to the Least Squares or R’s linear regression results. In addition, the squared error \(J(\theta)\) appears to be minimized after only about 40 iterations.