1
- According to the author, smoothing splines are very commonly used.
Based on what the paper says:
- For what orders are they defined? Smoothing splines are only defined for odd polynomial orders k, since their penalty involves the derivative of order \((k+1)/2\), which must be an integer. The paper states: “Unlike trend filtering, smoothing splines are only defined for an odd polynomial order k.”
- Which is the most common value for k? The most common value for k
used with smoothing splines is k=3, corresponding to cubic smoothing
splines. The paper states “In practice, it seems that the case k = 3
(i.e., cubic smoothing splines) is by far the most common case
considered.”
- What type of penalty do smoothing splines use? Have you seen this in another method in this class? Smoothing splines use a squared L2 penalty on the derivative of order \((k+1)/2\) of the fitted function \(f\). This is the same flavor of penalty as in ridge regression, which penalizes the squared L2 norm of the coefficient vector (the two objectives are written out side by side after this list).
- In the empirical comparisons, which model performs best? What happens as we increase the degrees of freedom (df)? In the empirical comparisons on simulated data in Section 2.2, trend filtering generally performs best, especially for functions with spatially inhomogeneous smoothness. As the degrees of freedom increase, trend filtering adapts to the local level of smoothness better than smoothing splines do. Even when the smoothing splines are allowed more degrees of freedom, trend filtering still fits the data better in regions of high curvature, while the smoothing splines overfit in the smooth regions. The author attributes trend filtering’s superior performance to its L1 penalty, which allows it to adapt to local smoothness, in contrast to the L2 penalty of smoothing splines (a small illustrative sketch follows this list).
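For reference, here are the two objectives side by side (standard forms, consistent with the paper’s setup). A \(k\)-th order smoothing spline solves
\[\hat{f} = \arg\min_{f} \sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \int \big(f^{((k+1)/2)}(t)\big)^2 \, dt,\]
which is also why \(k\) must be odd, while ridge regression solves
\[\hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2.\]
In both cases the penalty is a squared L2 norm: of a derivative of \(f\) in one, of the coefficient vector in the other.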
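To make the L1-versus-L2 contrast concrete, here is a minimal sketch, assuming a doppler-style signal, Gaussian noise, and illustrative tuning values (none of this is the paper’s code or setup, and the tuning parameters are not matched by df as in the paper). It fits a cubic trend filter as a convex program with cvxpy and a cubic smoothing spline with SciPy:

```python
import numpy as np
import cvxpy as cp
from scipy.interpolate import make_smoothing_spline  # requires SciPy >= 1.10

rng = np.random.default_rng(0)
n = 256
x = np.linspace(0.05, 1.0, n)
f_true = np.sqrt(x * (1 - x)) * np.sin(4.0 / x)  # doppler-style test signal
y = f_true + 0.1 * rng.standard_normal(n)

# Cubic trend filtering (k = 3): L1 penalty on discrete 4th-order differences.
k = 3
D = np.diff(np.eye(n), n=k + 1, axis=0)  # (k+1)-st order difference operator
beta = cp.Variable(n)
obj = 0.5 * cp.sum_squares(y - beta) + 1.0 * cp.norm1(D @ beta)
cp.Problem(cp.Minimize(obj)).solve()

# Cubic smoothing spline: squared L2 penalty on the 2nd derivative.
spline = make_smoothing_spline(x, y, lam=1e-6)

print("trend filtering MSE: ", np.mean((beta.value - f_true) ** 2))
print("smoothing spline MSE:", np.mean((spline(x) - f_true) ** 2))
```

The L1 term drives many 4th differences exactly to zero, so the fit is a piecewise cubic whose knots land where the signal demands them; the spline’s L2 term shrinks all curvature uniformly instead.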
2
- The author shows several methods in lasso form.
- Which methods are these? The author shows that both trend filtering
and locally adaptive regression splines can be represented in lasso
form. Specifically, trend filtering is expressed as
\[\hat{\alpha} = \arg\min_{\alpha \in \mathbb{R}^n} \frac{1}{2} \|y - H\alpha\|_2^2 + \lambda \sum_{j=k+2}^n |\alpha_j|\]
and locally adaptive regression splines as
\[\hat{\theta} = \arg\min_{\theta \in \mathbb{R}^n} \frac{1}{2} \|y - G\theta\|_2^2 + \lambda \sum_{j=k+2}^n |\theta_j|,\]
where \(H\) and \(G\) are the appropriate basis matrices for the two methods.
- What penalty is used in this case? Both lasso problems use an L1 penalty on the coefficients, but only on the coefficients \(\alpha_{k+2}, \ldots, \alpha_n\) and \(\theta_{k+2}, \ldots, \theta_n\), respectively. This is equivalent to placing an L1 penalty on the discrete \((k+1)\)-st derivative of the fitted values, which encourages adaptive knot selection (trend filtering’s original difference-penalized form is recalled at the end of this list).
- Which method performs better in the empirical comparisons in 3.4? In
Section 3.4, the author compares trend filtering and locally adaptive
regression splines empirically on the simulated examples from Section
2.2 (the “hills” and “doppler” examples). He finds that for any fixed
value of the tuning parameter \(\lambda\), the trend filtering and locally
adaptive regression spline fits are practically indistinguishable, even
though the two methods are not formally equivalent for polynomial orders
\(k \geq 2\). Both methods perform similarly well at adaptively fitting these spatially inhomogeneous signals.
- What is observed for small values of \(\lambda\)? The author notes that for very small values of \(\lambda\), slight differences between the trend filtering and locally adaptive regression spline estimates begin to appear, though these are not practically meaningful. In the asymptotic theory of Section 5, the author proves that, for \(\lambda\) of an appropriate order, the two estimates converge to each other, so trend filtering inherits the minimax convergence rate of locally adaptive regression splines; the empirical comparisons suggest that the two methods behave similarly over a wide range of \(\lambda\) values, deviating only slightly for very small \(\lambda\).
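For completeness, the lasso forms above arise from a change of variables. The paper defines trend filtering originally as
\[\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^n} \frac{1}{2} \|y - \beta\|_2^2 + \lambda \|D^{(k+1)} \beta\|_1,\]
where \(D^{(k+1)}\) is the discrete difference operator of order \(k+1\); substituting \(\beta = H\alpha\), with \(H\) the basis matrix from the lasso form, turns the difference penalty into \(\sum_{j=k+2}^n |\alpha_j|\), up to a constant factor.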
3
- The author examines astrophysics data.
- Briefly describe the data used and what the goal of the analysis is.
The data comes from an astrophysics simulation model for quasar spectra.
A quasar spectrum shows the relative flux (brightness) of a quasar as a
function of wavelength. The true spectrum is believed to be spatially
inhomogeneous, with regions of rapid oscillations (absorption lines
called the “Lyman-alpha forest”) as well as smooth regions. The goal is
to estimate the true spectrum from noisy observations at n = 1172
wavelengths.
- Briefly describe the experimental setting (i.e., methods used, parameters, frameworks/packages, tuning, comparison metric). The author compares trend filtering, smoothing splines, and wavelet smoothing. Each method is fit at 146 values of its complexity parameter (degrees of freedom), spanning 4 to 150. Smoothing splines are fit with the smooth.spline function in R, while wavelet smoothing uses the wavethresh package. Locally adaptive regression splines are not included, given their practical indistinguishability from trend filtering. The estimates are compared by their mean squared error in estimating the true spectrum, averaged over 20 simulation replications (a sketch of this protocol appears after the list).
- What method performs best and why, according to the author? Trend
filtering performs the best, achieving the lowest average squared error,
especially for lower complexity models. The author attributes this to
trend filtering’s ability to adapt to the local structure of the true
spectrum, fitting the smooth regions while also capturing the
high-frequency oscillations. Smoothing splines perform second best, but
do not adapt as well to the inhomogeneity of the signal. Wavelet
smoothing performs the worst, as it tends to overfit the noisy
Lyman-alpha forest. Even when comparing trend filtering to adaptively
chosen smoothing splines (different penalty parameters in smooth vs
wiggly regions), trend filtering performs just as well or better. The
strong performance of trend filtering is consistent with the theoretical
results showing that it achieves the minimax convergence rate for
spatially inhomogeneous signals.
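As a sketch of the comparison protocol described above (not the paper’s actual code, which runs in R), the evaluation loop might look like the following Python, where fit is a hypothetical wrapper that fits one method at a given degrees-of-freedom value:

```python
import numpy as np

def average_squared_error(fit, noisy_spectra, true_spectrum, df_grid):
    """Mean squared error per df value, averaged over replications.

    fit(y, df) is a hypothetical callable returning an estimate of the
    spectrum from a noisy observation y at complexity df (for example,
    a wrapper around a smoothing spline or trend filtering solver).
    """
    errors = np.zeros(len(df_grid))
    for y in noisy_spectra:               # e.g., 20 simulation replications
        for i, df in enumerate(df_grid):  # e.g., df values from 4 to 150
            errors[i] += np.mean((fit(y, df) - true_spectrum) ** 2)
    return errors / len(noisy_spectra)
```

The method with the lowest error curve over the df grid is the one reported as performing best.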