Introduction

This presentation fits kernel regression models to the freeny dataset and plots the results.

data(freeny)
View(freeny)
attach(freeny)
library(np)
## Nonparametric Kernel Methods for Mixed Datatypes (version 0.60-18)
## [vignette("np_faq",package="np") provides answers to frequently asked questions]
## [vignette("np",package="np") an overview]
## [vignette("entropy_np",package="np") an overview of entropy-based methods]
bw.opt <- npregbw(
  xdat = freeny[, c("price.index", "income.level", "market.potential")],
  ydat = freeny$lag.quarterly.revenue
)
bw.opt
## 
## Regression Data (39 observations, 3 variable(s)):
## 
##               price.index income.level market.potential
## Bandwidth(s):  0.05258261   0.01344569      0.009177238
## 
## Regression Type: Local-Constant
## Bandwidth Selection Method: Least Squares Cross-Validation
## Bandwidth Type: Fixed
## Objective Function Value: 0.0004453551 (achieved on multistart 1)
## 
## Continuous Kernel Type: Second-Order Gaussian
## No. Continuous Explanatory Vars.: 3
model.kernel<-npreg(bw.opt)
summary(model.kernel)
## 
## Regression Data: 39 training points, in 3 variable(s)
##               price.index income.level market.potential
## Bandwidth(s):  0.05258261   0.01344569      0.009177238
## 
## Kernel Regression Estimator: Local-Constant
## Bandwidth Type: Fixed
## Residual standard error: 0.01215325
## R-squared: 0.9984953
## 
## Continuous Kernel Type: Second-Order Gaussian
## No. Continuous Explanatory Vars.: 3

The model uses a second-order Gaussian kernel with three continuous explanatory variables.

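The bandwidths are the most important numbers in kernel regression.

They control the smoothness of the regression curve: a small bandwidth follows the data closely, while a large bandwidth averages over more observations and gives a smoother fit.

Because each explanatory variable has its own bandwidth, the amount of smoothing can differ across variables, which makes the model more adaptive.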
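To make the role of the bandwidth concrete, here is a minimal base-R sketch of a local-constant (Nadaraya-Watson) estimator with a Gaussian kernel. The helper `nw_fit` and the bandwidths 0.01 and 0.50 are illustrative assumptions: they are not part of np and are not the cross-validated values reported above. A single predictor is used to keep the example short.

```r
# Local-constant (Nadaraya-Watson) estimate with a Gaussian kernel.
# nw_fit is a hypothetical helper, not a function from the np package.
nw_fit <- function(x, y, h, x_eval = x) {
  sapply(x_eval, function(x0) {
    w <- dnorm((x - x0) / h)   # kernel weights; h is the bandwidth
    sum(w * y) / sum(w)        # weighted average of the responses
  })
}

data(freeny)
x <- freeny$income.level
y <- freeny$lag.quarterly.revenue

fit_small <- nw_fit(x, y, h = 0.01)  # narrow bandwidth: follows the data closely
fit_large <- nw_fit(x, y, h = 0.50)  # wide bandwidth: pulled toward the overall mean

# Spread of the fitted values under each bandwidth
c(small = sd(fit_small), large = sd(fit_large))
```

The wide bandwidth gives every observation a similar weight, so its fitted values vary much less than those from the narrow bandwidth.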

R-squared: 0.9984953

The model explains 99.85% of the variation in the dependent variable; the remaining 0.15% is left in the residual (error) term. The fitted values therefore track the observed data extremely closely within the sample of 39 training points.

This is because the kernel estimator is flexible and adapts to the shape of the data, which produces small residuals and a high R-squared.
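The percentages quoted above can be checked with a few lines of R. The sketch below also defines `r_squared`, a hypothetical helper based on the textbook definition 1 - SSE/SST. np may compute its reported R-squared differently, so the helper need not reproduce 0.9984953 exactly. A linear model is used as a stand-in so the example runs without np; for the kernel model, `fitted(model.kernel)` would take the place of `fitted(lin)`.

```r
# Textbook R-squared from observed and fitted values.
# r_squared is a hypothetical helper, not part of np.
r_squared <- function(y, y_hat) {
  1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
}

# Arithmetic behind the percentages quoted in the text
r2 <- 0.9984953
explained   <- round(100 * r2, 2)        # percent of variation explained
unexplained <- round(100 * (1 - r2), 2)  # percent left in the residuals
c(explained = explained, unexplained = unexplained)

# Stand-in example: a linear model on the same three predictors
data(freeny)
lin <- lm(lag.quarterly.revenue ~ price.index + income.level + market.potential,
          data = freeny)
r2_lin <- r_squared(freeny$lag.quarterly.revenue, fitted(lin))
r2_lin
```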

# 1. Create time index
freeny$time <- 1:nrow(freeny)

Because the observations are in chronological order, a time index running from 1 to nrow(freeny), which is 39, is created and stored in the column time. This index allows trends and patterns over the sample period to be analysed.

# 2. Load np package
library(np)

# 3. Fit kernel regression
bw.time <- npregbw(lag.quarterly.revenue ~ time, data = freeny)
bw.time
## 
## Regression Data (39 observations, 1 variable(s)):
## 
##                    time
## Bandwidth(s): 0.9959131
## 
## Regression Type: Local-Constant
## Bandwidth Selection Method: Least Squares Cross-Validation
## Formula: lag.quarterly.revenue ~ time
## Bandwidth Type: Fixed
## Objective Function Value: 0.0002768781 (achieved on multistart 1)
## 
## Continuous Kernel Type: Second-Order Gaussian
## No. Continuous Explanatory Vars.: 1
model.time <- npreg(bw.time)

Interpretation of Kernel Regression Results (Time Trend Model)

The model aims to capture how lag.quarterly.revenue changes over the sequence of time periods.

The optimal bandwidth, selected by Least Squares Cross-Validation (LSCV), is 0.9959. This bandwidth produces a smooth estimate of the revenue trend, avoiding excessive fluctuations while still capturing the underlying pattern.
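The idea behind least squares cross-validation can be sketched in base R: each observation is predicted from all the others, and the bandwidth with the smallest mean squared prediction error is kept. The helper `loo_cv` and the search grid are illustrative assumptions. np uses a numerical optimiser with multistarts rather than a grid, so this sketch need not return exactly 0.9959.

```r
data(freeny)
time <- seq_len(nrow(freeny))
y <- freeny$lag.quarterly.revenue

# Leave-one-out cross-validation score for a local-constant Gaussian fit.
# loo_cv is a hypothetical helper, not part of np.
loo_cv <- function(h, x, y) {
  pred <- sapply(seq_along(x), function(i) {
    w <- dnorm((x[-i] - x[i]) / h)  # weights from all points except i
    sum(w * y[-i]) / sum(w)         # prediction for the held-out point
  })
  mean((y - pred)^2)
}

# Evaluate the criterion on an assumed grid of candidate bandwidths
h_grid <- seq(0.5, 5, by = 0.05)
cv <- sapply(h_grid, loo_cv, x = time, y = y)

h_best <- h_grid[which.min(cv)]
c(bandwidth = h_best, objective = min(cv))
```

Plotting `cv` against `h_grid` shows how quickly the criterion deteriorates when the bandwidth is too small or too large.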

# 4. Create the scatter plot first!
# -------------------------------
# Kernel Regression: Revenue vs Time
# -------------------------------

# Step 1: Create a scatter plot of the observed data
plot(
  freeny$time,                        # X-axis: Time
  freeny$lag.quarterly.revenue,       # Y-axis: Lag Quarterly Revenue
  pch = 19,                           # Solid circle points
  col = "blue",                        # Points colored blue
  xlab = "Time",                        # X-axis label
  ylab = "Lag Quarterly Revenue",      # Y-axis label
  main = "Kernel Regression: Revenue vs Time"  # Plot title
)

# Step 2: Add the kernel regression fitted curve
lines(
  freeny$time,            # X-axis: Time
  fitted(model.time),     # Y-axis: Fitted values from kernel regression
  col = "red",            # Curve colored red
  lwd = 2                 # Line width thicker for visibility
)

Kernel Regression Curve: lag.quarterly.revenue vs time

The kernel regression curve is smooth because the model uses a data-driven bandwidth of approximately 1. It captures the overall trend while damping short-term fluctuations.

The curve shows a clear upward trend in revenue over time. Unlike a linear regression line, the kernel curve is not restricted to a constant slope, so the rate of increase can differ from one period to the next.
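The contrast with a straight line can be made explicit by comparing period-to-period changes in the two fits. The sketch below is an assumption-laden illustration: it uses `lm` for the linear trend and a hand-written local-constant Gaussian smoother with the reported bandwidth 0.9959131 in place of `npreg`, so its fitted values may differ slightly from `fitted(model.time)`.

```r
data(freeny)
time <- seq_len(nrow(freeny))
y <- freeny$lag.quarterly.revenue

# Linear trend: one constant slope for the whole sample
lin <- lm(y ~ time)
slope_lin <- unname(coef(lin)["time"])

# Local-constant Gaussian fit using the bandwidth reported above
h <- 0.9959131
kern <- sapply(time, function(t0) {
  w <- dnorm((time - t0) / h)
  sum(w * y) / sum(w)
})

# Period-to-period change of each fitted curve
range(diff(fitted(lin)))  # constant, equal to slope_lin
range(diff(kern))         # varies across periods

# Overlay both fits on the observed data
plot(time, y, pch = 19, col = "blue",
     xlab = "Time", ylab = "Lag Quarterly Revenue")
lines(time, kern, col = "red", lwd = 2)       # kernel fit
abline(lin, col = "darkgreen", lty = 2)       # linear fit
```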

library(np)
library(plotly)
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
# Fit the kernel regression to be used in 4D Visualisation 
bw.opt <- npregbw(
  xdat = freeny[, c("price.index", "income.level", "market.potential")],
  ydat = freeny$lag.quarterly.revenue
)
model.kernel <- npreg(bw.opt)

# Extract fitted m(x)
freeny$m_x <- fitted(model.kernel)

# 4D interactive scatter plot
plot_ly(
  data = freeny,
  x = ~price.index,
  y = ~income.level,
  z = ~market.potential,
  color = ~m_x,                  # 4th dimension: predicted revenue
                                 # (numeric colors use plotly's default viridis scale;
                                 # `colorscale` is not a scatter3d attribute)
  type = "scatter3d",
  mode = "markers",
  marker = list(size = 5)
) %>%
  layout(
    title = "4D Kernel Regression: Predicted Revenue vs Predictors",
    scene = list(
      xaxis = list(title = "Price Index"),
      yaxis = list(title = "Income Level"),
      zaxis = list(title = "Market Potential")
    )
  )

4D Kernel Regression Plot: Interpretation

The 4D plot visualizes the estimated kernel regression function m(x) for lagged quarterly revenue using three predictors: price index, income level, and market potential.

  • X-axis: Price Index

  • Y-axis: Income Level

  • Z-axis: Market Potential

  • Color: Predicted lag quarterly revenue (m(x))

The color scale shows the magnitude of predicted revenue, with lighter colors indicating higher revenue levels and darker colors indicating lower revenue levels.

The plot allows us to observe how combinations of predictors (price index, income, and market potential) interact to influence revenue, revealing nonlinear patterns and regions of high or low revenue that would not be captured by a linear model.

Each of the predictors has a different impact on revenue; this is explained below.

Price Index

There is a negative relationship between predicted revenue and price index.

By the law of demand, an increase in the prices of goods and services reduces the quantity demanded in an economy, and total revenue falls because consumers buy fewer goods and services.

Conversely, a reduction in prices increases the quantity demanded, which has a positive impact on total revenue in the economy.

Income Level

There is a positive relationship between income level and predicted revenue.

An increase in income levels means that there will be an increase in disposable income; consumers have more money in their pockets. This will increase total revenue since more goods and services are consumed in the economy.

Market Potential

There is a positive relationship between market potential and predicted revenue. A larger market size means that there is potential for increasing revenue; conversely, a small market size reduces revenue in an economy.
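One way to examine the directional statements above is to trace the fitted surface along one predictor while holding the other two at their medians. The sketch below is a base-R approximation under stated assumptions: `nw_multi` and `partial_curve` are hypothetical helpers, the estimator is a hand-written product-Gaussian local-constant fit, and the bandwidths are copied from the `bw.opt` output above. It is not np's own implementation, and with bandwidths this small each curve is driven mainly by the few observations near the medians.

```r
data(freeny)
X <- as.matrix(freeny[, c("price.index", "income.level", "market.potential")])
y <- freeny$lag.quarterly.revenue

# Bandwidths as reported by npregbw() in the output above
h <- c(price.index = 0.05258261,
       income.level = 0.01344569,
       market.potential = 0.009177238)

# Product-Gaussian local-constant estimate at one evaluation point x0
# (hypothetical helper, not part of np)
nw_multi <- function(x0, X, y, h) {
  z <- sweep(sweep(X, 2, x0), 2, h, "/")  # scaled distance in each predictor
  w <- exp(-0.5 * rowSums(z^2))           # product of Gaussian kernels
  sum(w * y) / sum(w)
}

# Fitted values along one predictor, others fixed at their medians
partial_curve <- function(var, n_grid = 25) {
  x_med <- apply(X, 2, median)
  grid <- seq(min(X[, var]), max(X[, var]), length.out = n_grid)
  fit <- sapply(grid, function(v) {
    x0 <- x_med
    x0[var] <- v
    nw_multi(x0, X, y, h)
  })
  data.frame(value = grid, fitted = fit)
}

curves <- lapply(colnames(X), partial_curve)
names(curves) <- colnames(X)

# One panel per predictor
op <- par(mfrow = c(1, 3))
for (v in names(curves)) {
  plot(curves[[v]]$value, curves[[v]]$fitted, type = "l",
       xlab = v, ylab = "Fitted revenue")
}
par(op)
```

A downward-sloping curve in the price index panel and upward-sloping curves in the other two panels would be consistent with the interpretation given in the text.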