Principal Component Analysis (PCA) is a statistical technique that re-expresses data along orthogonal directions ordered by how much variance they capture, which makes it useful for extracting patterns from high-dimensional or noisy data. In the context of noise removal and signal reconstruction, PCA helps separate the true signal from the noise, making the underlying patterns easier to analyze and visualize.

Noise Removal with PCA:

Noise refers to random variation in the data that does not represent the true signal. PCA reduces noise by identifying the principal components of the data, keeping those that capture the most variance (usually the structured patterns) and discarding those with little variance (often noise). This works best when the signal accounts for more variance than the noise does.

Simulate Noisy Time-Series Data

Here I take a simple sine wave as the true signal and add three different noise sources to it. The purpose is to recover the true signal from this mixture of signal and noise.

# Load required libraries
library(ggplot2)
set.seed(123)  # For reproducibility
time <- seq(0, 10, by = 0.1)
true_signal <- sin(2 * pi * 0.5 * time)  # True sine wave signal

# Add multiple types of noise to the signal
noise1 <- rnorm(length(time), mean = 0, sd = 0.5)  # Random Gaussian noise
noise2 <- rnorm(length(time), mean = 0, sd = 0.3)  # Additional Gaussian noise
noise3 <- rpois(length(time), lambda = 0.3) - 0.3  # Poisson noise, shifted to have zero mean
noisy_signal <- true_signal + noise1 + noise2 + noise3  # Combine all noise sources

Apply PCA

Here, the noisy signal is transformed into a higher-dimensional space using time-delay embedding (creating lagged versions of the signal). The embed() function generates this matrix (lagged_matrix), which is then subjected to PCA using the prcomp() function. The PCA identifies the components that explain the most variance in the data (signal) and separates them from the components with lower variance (often corresponding to noise).
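To make the time-delay embedding concrete, here is a minimal sketch of what embed() returns for a tiny vector. The names x and lag_demo are illustrative only and are not used elsewhere in this post.

```r
# A short vector so the lag structure is easy to see
x <- 1:6

# Each row is a window of 3 consecutive values, most recent value first
lag_demo <- embed(x, 3)
print(lag_demo)
#      [,1] [,2] [,3]
# [1,]    3    2    1
# [2,]    4    3    2
# [3,]    5    4    3
# [4,]    6    5    4
```

A series of length n embedded with dimension d therefore gives a matrix with n - d + 1 rows, which is why the first d - 1 time points have to be dropped when comparing against the original series later on.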

# Step 2: Apply PCA
embedding_dim <- 10  # Number of lagged components to include
lagged_matrix <- embed(noisy_signal, embedding_dim)  # Lag the signal to create embedding

# Perform PCA on the lagged matrix
pca_result <- prcomp(lagged_matrix, center = TRUE, scale. = TRUE)
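Before choosing how many components to keep, it can help to look at how much variance each one explains. The sketch below rebuilds the same simulated data so that it runs on its own; var_explained is a name introduced here for illustration.

```r
# Recreate the simulated data (same seed and noise order as above)
set.seed(123)
time <- seq(0, 10, by = 0.1)
true_signal <- sin(2 * pi * 0.5 * time)
noise1 <- rnorm(length(time), mean = 0, sd = 0.5)
noise2 <- rnorm(length(time), mean = 0, sd = 0.3)
noise3 <- rpois(length(time), lambda = 0.3) - 0.3
noisy_signal <- true_signal + noise1 + noise2 + noise3

# PCA on the lagged matrix, as in the step above
pca_result <- prcomp(embed(noisy_signal, 10), center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component
var_explained <- pca_result$sdev^2 / sum(pca_result$sdev^2)

# Cumulative proportion: a sharp levelling-off suggests where noise begins
print(round(cumsum(var_explained), 3))
```

If the first few components stand well clear of the rest, that is a reasonable hint for where to cut. The same information is available from summary(pca_result).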

Signal Reconstruction with PCA

Signal reconstruction involves taking the noisy data, applying PCA to isolate the main components, and then reconstructing the data using only the most important components (those that represent the signal).

Reconstruct the Signal

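The most important principal components are selected (in this case, the first 5 components, set by num_components). These are the highest-variance components and are assumed to carry the signal. The reconstructed_matrix is computed by multiplying the scores of the selected components (pca_result$x) by the transpose of their loadings (pca_result$rotation). Because prcomp() was run with centering and scaling, this matrix is in standardized units. The final reconstructed signal (reconstructed_signal) is obtained by summing each row of the reconstructed matrix, which collapses the embedding_dim lagged values in each row into a single value. Discarding the low-variance components removes much of the random fluctuation, and summing across lags smooths the result further, although it can also shift the reconstruction in time relative to the original series.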

# Step 3: Reconstruct the Signal
num_components <- 5  # Number of principal components to keep

# Reconstruct the signal by multiplying the principal components and loadings
reconstructed_matrix <- pca_result$x[, 1:num_components] %*% t(pca_result$rotation[, 1:num_components])

# Collapse each row of lagged values into one value per time step (row sums)
reconstructed_signal <- rowSums(reconstructed_matrix)
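As an alternative to row sums, the reconstructed matrix can be mapped back to a full-length series by undoing the centering and scaling and then averaging every entry that refers to the same original time point (the diagonal-averaging idea used in singular spectrum analysis). The following is a minimal sketch under stated assumptions, not the method used in the rest of this post: the names d, k, recon, idx and denoised are introduced here, and k = 2 is an assumption based on a single sinusoid occupying roughly two dimensions in a lag embedding.

```r
# Recreate the simulated data (same seed and noise order as above)
set.seed(123)
time <- seq(0, 10, by = 0.1)
true_signal <- sin(2 * pi * 0.5 * time)
noise1 <- rnorm(length(time), mean = 0, sd = 0.5)
noise2 <- rnorm(length(time), mean = 0, sd = 0.3)
noise3 <- rpois(length(time), lambda = 0.3) - 0.3
noisy_signal <- true_signal + noise1 + noise2 + noise3

d <- 10  # embedding dimension
k <- 2   # components to keep (assumption: one sinusoid ~ two components)

pca <- prcomp(embed(noisy_signal, d), center = TRUE, scale. = TRUE)

# Low-rank reconstruction in standardized units
recon <- pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])

# Undo the scaling and centering applied by prcomp()
recon <- sweep(recon, 2, pca$scale, "*")
recon <- sweep(recon, 2, pca$center, "+")

# Entry (i, j) of an embed() matrix holds the value at time index i + d - j
idx <- row(recon) + d - col(recon)

# Average all entries that refer to the same time index
denoised <- as.numeric(tapply(recon, idx, mean))

# Compare against the noisy input on the full time axis
mse_noisy <- mean((noisy_signal - true_signal)^2)
mse_denoised <- mean((denoised - true_signal)^2)
cat("MSE noisy:", mse_noisy, " MSE denoised:", mse_denoised, "\n")
```

With this approach the result has the same length as the original series and stays aligned with the original time axis, so no trimming of the first d - 1 points and no amplitude rescaling against the true signal is needed.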

Rescale the Reconstructed Signal

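The amplitude of the reconstructed signal differs from that of the true signal because the lagged matrix was standardized before PCA and each row was then summed across lags. To correct this, we rescale the reconstructed signal to have the same standard deviation as the true signal, so that its amplitude matches the original sine wave. Note that this step relies on the true signal, which is available here only because the data are simulated.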

# Re-scale the reconstructed signal back to match the original amplitude of the true signal
reconstructed_signal <- reconstructed_signal * sd(true_signal) / sd(reconstructed_signal)

Visualization and Evaluation

The reconstructed signal is then visualized alongside the noisy signal and the true signal to compare the effectiveness of the noise removal.

Plot the Results

In this step, the noisy signal (in red), the true signal (in blue), and the reconstructed signal (in green) are plotted together for comparison. The plot makes it possible to judge visually how much noise PCA has removed and how closely the smoother reconstructed curve follows the true signal.

# Step 4: Plot the Results
plot(
  time[-(1:(embedding_dim - 1))], noisy_signal[-(1:(embedding_dim - 1))],
  type = "l", col = "red", lwd = 1.5,
  main = "Noise Removal Using PCA with Multiple Noise Types",
  xlab = "Time", ylab = "Value"
)
lines(time[-(1:(embedding_dim - 1))], true_signal[-(1:(embedding_dim - 1))],
      col = "blue", lwd = 2, lty = 2)
lines(time[-(1:(embedding_dim - 1))], reconstructed_signal, col = "green", lwd = 2, lty = 3)
legend(
  "topright",
  legend = c("Noisy Signal", "True Signal", "Reconstructed Signal"),
  col = c("red", "blue", "green"), lty = c(1, 2, 3), cex = 0.8
)

Evaluate Performance

Finally, the Mean Squared Error (MSE) between the true signal and the reconstructed signal is calculated. The MSE provides a quantitative measure of how well the reconstructed signal matches the true signal. A lower MSE indicates that the reconstruction is more accurate and that the noise has been effectively removed.

# Step 5: Evaluate Performance
# Calculate Mean Squared Error (MSE) between the true and reconstructed signal
mse <- mean((true_signal[-(1:(embedding_dim - 1))] - reconstructed_signal)^2)
cat("Mean Squared Error (MSE):", mse, "\n")
## Mean Squared Error (MSE): 1.082562
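An MSE value is easier to interpret next to simple baselines. The sketch below computes two of them for the same simulated data: predicting zero everywhere, and using the noisy signal with no denoising at all. The helper name mse_vs_truth is hypothetical and introduced only for this example.

```r
time <- seq(0, 10, by = 0.1)
true_signal <- sin(2 * pi * 0.5 * time)

# Helper (hypothetical name): mean squared error against the true signal
mse_vs_truth <- function(estimate) mean((true_signal - estimate)^2)

# Baseline 1: predict zero everywhere (ignores the data entirely)
mse_zero <- mse_vs_truth(rep(0, length(time)))

# Baseline 2: use the noisy signal as-is, with no denoising
set.seed(123)
noise1 <- rnorm(length(time), mean = 0, sd = 0.5)
noise2 <- rnorm(length(time), mean = 0, sd = 0.3)
noise3 <- rpois(length(time), lambda = 0.3) - 0.3
noisy_signal <- true_signal + noise1 + noise2 + noise3
mse_noisy <- mse_vs_truth(noisy_signal)

cat("MSE, zero baseline:", mse_zero, "\n")
cat("MSE, noisy signal: ", mse_noisy, "\n")
```

For a unit-amplitude sine wave the zero baseline comes out at about 0.5, so a reconstruction only counts as an improvement if its MSE falls below both of these numbers.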

Conclusion

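An MSE of 1.082562 is large for this problem. The true signal is a unit-amplitude sine wave with a variance of about 0.5, so predicting zero everywhere would already give an MSE of about 0.5. The reconstruction is smoother than the noisy input, but at this MSE it does not track the true sine wave closely. Likely contributors are the row-sum step, which shifts the reconstruction in time relative to the time axis used for the comparison, and the fact that the centering and scaling applied by prcomp() are not undone before the rows are collapsed. The plots above should be read with this in mind.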

Advantages of PCA for Noise Removal and Signal Reconstruction: