Question 1

A fractional power makes the shrinkage too weak to be useful in practice. The regularisation term penalises the model’s fit in an effort to reduce overfitting - by using a fractional (\(<1\)) \(p\), we may see only a small regularization effect. This wouldn’t be useful, so we don’t do that.

https://stats.stackexchange.com/questions/269298/why-do-we-only-see-l-1-and-l-2-regularization-but-not-other-norms

Question 2

The loss function used for LASSO may not necessarily be applicable to other models. This means that the subset of features selected by LASSO will not necessarily be best in other models.

More research needed here, but maybe the No Free Lunch Theorem applies?

Question 3

  1. Simulate a linear dataset, which includes some zeros in parameters \(\beta\)
sim_linear_data = function (k=4, n=3) {
    # we have k predictors and n rows
    # Assume x's come from standard normal distribution

    set.seed(20)
    x = rnorm(k*n, 0, 1)
    # convert x's into a n by k matrix and add constant term to create design matrix:
    X = cbind(1, matrix(x, ncol=k))

    # Pick j < k non-zero betas
    num_non_zero = floor(runif(1, 1, k))
    betas = append(runif(num_non_zero), runif(k+1 - num_non_zero))
    
    # Assume epsilon is from standard normal
    epsilon = rnorm(n, 0, 1)
    
    # Calculate y vector
    y = betas %*% t(X) + epsilon
    # # Build a dataframe of our simulated dataset
    # data = cbind(X, t(y))
    # simulated_data = as.data.frame(X)

    # Return everything
    list(betas, epsilon, X, t(y))
}

results = sim_linear_data(20, 100)
true_betas = results[[1]]
true_epsilon = results[[2]]
X = results[[3]]
y = results[[4]]
  1. Use the glmnet package to select LASSO using k-fold cross-validation (CV)

References:

library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.0-2
fit = glmnet(X, y)
plot(fit)

Each curve corresponds to a variable, showing the path of the coefficient against the \(l_1\)-norm as \(\lambda\) varies.

We perform cross validation and plot the CV curve to see how the model performs when cross-validated. Note that the default number of folds is 10:

set.seed(21)
cv = cv.glmnet(X, y)
plot(cv)

  1. Assign different CV folds, observe whether the features selected will variate in different runs. Comment on that.

honestly idk what the question means here, i’m presuming they mean different numbers of CV folds (aside from the default 10) lol. Could be wrong…

# function for plotting n CV plots, starting from 5-fold
plot_CVs = function (num_CVs, X, y) {
    set.seed(22)
    numrows = ceiling(num_CVs/2)
    par(mfrow=c(numrows, 2))
    for (i in 1:num_CVs) {
        plot(cv.glmnet(X, y, nfolds=5*i)) # cv.glmnet really does all the heavy lifting here, if you want you can pull the coefficients out and inspect them too. glmnet has great documentation on this.
    }
}

plot_CVs(6, X, y)

  1. Increase/decrease the size of teh dataset and see how the results will change.
# We will use the functions we declared earlier to explore different dataset sizes quickly.

wrapper_function = function(num_predictors, dataset_size, number_of_CVs_to_test) {
    # simulate dataset
    simulation = sim_linear_data(k=num_predictors, n=dataset_size)
    X = simulation[[3]]
    y = simulation[[4]]
    
    # pass into cross validation plotting
    plot_CVs(number_of_CVs_to_test, X, y)
}

wrapper_function(10, 100000, 4)

  1. Do some additional explorations and share your findings regarding the properties of lasso (optional)

If it’s optional, we don’t do those.