This presentation fits kernel regression models to the freeny data set and plots the results.
data(freeny)
View(freeny)
attach(freeny)
library(np)
## Nonparametric Kernel Methods for Mixed Datatypes (version 0.60-18)
## [vignette("np_faq",package="np") provides answers to frequently asked questions]
## [vignette("np",package="np") an overview]
## [vignette("entropy_np",package="np") an overview of entropy-based methods]
bw.opt <- npregbw(
xdat = freeny[, c("price.index", "income.level", "market.potential")],
ydat = freeny$lag.quarterly.revenue
)
bw.opt
##
## Regression Data (39 observations, 3 variable(s)):
##
## price.index income.level market.potential
## Bandwidth(s): 0.05258261 0.01344569 0.009177238
##
## Regression Type: Local-Constant
## Bandwidth Selection Method: Least Squares Cross-Validation
## Bandwidth Type: Fixed
## Objective Function Value: 0.0004453551 (achieved on multistart 1)
##
## Continuous Kernel Type: Second-Order Gaussian
## No. Continuous Explanatory Vars.: 3
model.kernel<-npreg(bw.opt)
summary(model.kernel)
##
## Regression Data: 39 training points, in 3 variable(s)
## price.index income.level market.potential
## Bandwidth(s): 0.05258261 0.01344569 0.009177238
##
## Kernel Regression Estimator: Local-Constant
## Bandwidth Type: Fixed
## Residual standard error: 0.01215325
## R-squared: 0.9984953
##
## Continuous Kernel Type: Second-Order Gaussian
## No. Continuous Explanatory Vars.: 3
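The bandwidths are the most important numbers in kernel regression because they control the smoothness of the regression curve: a smaller bandwidth lets the fit follow the data more closely, while a larger bandwidth averages over more observations.
The model is more adaptive because each predictor has its own bandwidth.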
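To make the role of the bandwidths concrete, here is a minimal base-R sketch of a local-constant (Nadaraya-Watson) estimator with a product Gaussian kernel. The helper name `nw_fit` is hypothetical and not part of np, and the factor of 10 used for the "wide" bandwidths is an arbitrary choice for illustration. The bandwidths are copied from the output above, and the sketch is not expected to reproduce npreg() digit for digit.

```r
# Hypothetical helper (not part of np): local-constant fit with a
# product Gaussian kernel, one bandwidth per predictor.
nw_fit <- function(X, y, h) {
  X <- as.matrix(X)
  sapply(seq_len(nrow(X)), function(i) {
    # Distance of every observation to point i, scaled by each bandwidth
    u <- sweep(sweep(X, 2, X[i, ]), 2, h, "/")
    # Product of Gaussian kernels across predictors gives the weights
    w <- exp(-0.5 * rowSums(u^2))
    sum(w * y) / sum(w)
  })
}

data(freeny)
X <- freeny[, c("price.index", "income.level", "market.potential")]
y <- freeny$lag.quarterly.revenue

h_cv   <- c(0.05258261, 0.01344569, 0.009177238)  # bandwidths reported above
h_wide <- 10 * h_cv                               # deliberately oversmoothed (assumption)

fit_cv   <- nw_fit(X, y, h_cv)
fit_wide <- nw_fit(X, y, h_wide)

# Wider bandwidths average over more neighbours, so the fit is smoother
# and follows the observed values less closely.
c(rmse_cv   = sqrt(mean((y - fit_cv)^2)),
  rmse_wide = sqrt(mean((y - fit_wide)^2)))
```

Because each predictor is divided by its own bandwidth before the kernel is applied, a predictor with a small bandwidth influences the weights more sharply than one with a large bandwidth.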
R-squared: 0.9984953
The model explains 99.85% of the variation in the dependent variable; the remaining 0.15% is unexplained and attributed to the error term. The predictors fit the data extremely well.
This is an in-sample figure. The kernel estimator is flexible and adapts to the shape of the data, which produces smaller residuals and a higher R-squared.
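As a rough illustration of how an in-sample R-squared is computed from fitted values, the sketch below defines two common versions and applies them to an ordinary linear model used as a stand-in fit. The helper names `r2_sse` and `r2_cor` are hypothetical. The definition np uses for its reported figure is not shown in the output above, so the two need not agree exactly with it.

```r
# Hypothetical helpers (not part of np): two common in-sample definitions
r2_sse <- function(y, yhat) 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
r2_cor <- function(y, yhat) cor(y, yhat)^2

data(freeny)
y <- freeny$lag.quarterly.revenue

# Stand-in fit: an ordinary linear model on the same three predictors
lm_fit <- lm(lag.quarterly.revenue ~ price.index + income.level + market.potential,
             data = freeny)

c(sse_based = r2_sse(y, fitted(lm_fit)),
  cor_based = r2_cor(y, fitted(lm_fit)))

# With the np fit from above in the workspace, the same helpers apply:
# r2_sse(y, fitted(model.kernel))
```

For a linear model with an intercept the two definitions coincide; for a kernel fit they can differ slightly, which is why it helps to know which one is being reported.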
# 1. Create time index
freeny$time <- 1:nrow(freeny)
Because the observations are in chronological order, 1:nrow(freeny) numbers the 39 rows from 1 to 39, and the result is stored in a new column called time. This time index allows for the analysis of trends and patterns over the sample period.
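The position index can be related back to calendar time. The sketch below is one way to do so, assuming the rows of freeny line up with the companion quarterly series freeny.y from the datasets package; the variable names `time_index` and `calendar_time` are illustrative only.

```r
data(freeny)

# Simple position index, as used in this analysis
time_index <- seq_len(nrow(freeny))

# Calendar time taken from the companion time series freeny.y
# (assumption: its observations are in the same order as the rows of freeny)
calendar_time <- as.numeric(time(freeny.y))

head(data.frame(time_index, calendar_time))
```

Using the position index keeps the bandwidth in units of "periods", while the calendar version would express it in years.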
# 2. Load np package
library(np)
# 3. Fit kernel regression
bw.time <- npregbw(lag.quarterly.revenue ~ time, data = freeny)
bw.time
##
## Regression Data (39 observations, 1 variable(s)):
##
## time
## Bandwidth(s): 0.9959131
##
## Regression Type: Local-Constant
## Bandwidth Selection Method: Least Squares Cross-Validation
## Formula: lag.quarterly.revenue ~ time
## Bandwidth Type: Fixed
## Objective Function Value: 0.0002768781 (achieved on multistart 1)
##
## Continuous Kernel Type: Second-Order Gaussian
## No. Continuous Explanatory Vars.: 1
model.time <- npreg(bw.time)
The model aims to capture how lag.quarterly.revenue changes over the sequence of time periods.
The optimal bandwidth, selected by least-squares cross-validation (LSCV), is 0.9959. Since the time index advances in steps of 1, this is roughly one period. This bandwidth produces a smooth estimate of the revenue trend, avoiding excessive fluctuations while still capturing the underlying pattern.
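The idea behind least-squares cross-validation can be sketched in base R: each point is predicted from all the others, and the bandwidth with the smallest mean squared leave-one-out error is chosen. The helper `cv_score` and the search grid are assumptions made for illustration. np uses its own optimiser with multiple starts, so the result may land near the reported bandwidth but is not guaranteed to match it.

```r
data(freeny)
x <- seq_len(nrow(freeny))            # time index 1..39
y <- freeny$lag.quarterly.revenue

# Hypothetical helper: leave-one-out CV score of a local-constant fit
# with a Gaussian kernel of bandwidth h
cv_score <- function(h) {
  K <- exp(-0.5 * (outer(x, x, "-") / h)^2)  # kernel weights between all pairs
  diag(K) <- 0                                # drop each point from its own fit
  loo_fit <- as.vector(K %*% y) / rowSums(K)
  mean((y - loo_fit)^2)
}

# Search an assumed bandwidth range on a grid
h_grid <- seq(0.3, 10, by = 0.01)
scores <- sapply(h_grid, cv_score)
h_best <- h_grid[which.min(scores)]

c(bandwidth = h_best, cv = min(scores))
```

Setting the diagonal of the weight matrix to zero is what makes this a leave-one-out score; without it, the smallest bandwidth would always win because each point would simply predict itself.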
# 4. Create the scatter plot first!
# -------------------------------
# Kernel Regression: Revenue vs Time
# -------------------------------
# Step 1: Create a scatter plot of the observed data
plot(
freeny$time, # X-axis: Time
freeny$lag.quarterly.revenue, # Y-axis: Lag Quarterly Revenue
pch = 19, # Solid circle points
col = "blue", # Points colored blue
xlab = "Time", # X-axis label
ylab = "Lag Quarterly Revenue", # Y-axis label
main = "Kernel Regression: Revenue vs Time" # Plot title
)
# Step 2: Add the kernel regression fitted curve
lines(
freeny$time, # X-axis: Time
fitted(model.time), # Y-axis: Fitted values from kernel regression
col = "red", # Curve colored red
lwd = 2 # Line width thicker for visibility
)
### Kernel Regression Curve: lag.quarterly.revenue vs time
The kernel regression curve is smooth because the model uses a data-driven bandwidth of approximately 1. The resulting curve captures the overall trend while removing short-term fluctuations.
The curve shows a clear upward trend in revenue over time. Unlike a linear regression line, the kernel fit does not impose a constant rate of increase, so the pace of revenue growth can differ from one period to the next.
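The contrast with a linear trend can be checked numerically by comparing the period-to-period change implied by each fit. The sketch below uses base R's ksmooth() as a stand-in Gaussian smoother with a standard deviation of 1 (an assumption chosen to be close to the bandwidth reported above), so its values are not expected to match the npreg() fit exactly.

```r
data(freeny)
x <- seq_len(nrow(freeny))
y <- freeny$lag.quarterly.revenue

# Linear trend: the fitted change per period is the same everywhere
lin_fit <- fitted(lm(y ~ x))

# Gaussian kernel smoother from base R. ksmooth() places the kernel quartiles
# at +/- 0.25 * bandwidth, so a Gaussian standard deviation of h corresponds
# to bandwidth = 4 * qnorm(0.75) * h.
h <- 1
ker_fit <- ksmooth(x, y, kernel = "normal",
                   bandwidth = 4 * qnorm(0.75) * h,
                   x.points = x)$y

# Period-to-period change implied by each fit
summary(diff(lin_fit))
summary(diff(ker_fit))
```

The differences of the linear fit are all equal to the slope, while those of the kernel fit vary across the sample.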
library(np)
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
# Fit the kernel regression to be used in 4D Visualisation
bw.opt <- npregbw(
xdat = freeny[, c("price.index", "income.level", "market.potential")],
ydat = freeny$lag.quarterly.revenue
)
model.kernel <- npreg(bw.opt)
# Extract fitted m(x)
freeny$m_x <- fitted(model.kernel)
# 4D interactive scatter plot
plot_ly(
data = freeny,
x = ~price.index,
y = ~income.level,
z = ~market.potential,
color = ~m_x, # 4th dimension: predicted revenue (viridis is plotly's default numeric palette)
type = "scatter3d",
mode = "markers",
marker = list(size = 5)
) %>%
layout(
title = "4D Kernel Regression: Predicted Revenue vs Predictors",
scene = list(
xaxis = list(title = "Price Index"),
yaxis = list(title = "Income Level"),
zaxis = list(title = "Market Potential")
)
)
The 4D plot visualizes the estimated kernel regression function m(x) for lagged quarterly revenue using three predictors: price index, income level, and market potential.
X-axis: Price Index
Y-axis: Income Level
Z-axis: Market Potential
Color: Predicted lag quarterly revenue (m(x))
The color scale shows the magnitude of predicted revenue: lighter colors indicate higher revenue levels and darker colors indicate lower revenue levels.
The plot allows us to observe how combinations of predictors (price index, income, and market potential) interact to influence revenue, revealing nonlinear patterns and regions of high or low revenue that would not be captured by a linear model.
Each predictor has a different relationship with revenue, as explained below.
Price Index
There is a negative relationship between price index and predicted revenue.
By the law of demand, an increase in the prices of goods and services reduces the quantity demanded. Consumers buy less, and where demand is sufficiently price-sensitive, total revenue falls.
Conversely, a reduction in prices increases the quantity demanded, which has a positive effect on total revenue.
Income Level
There is a positive relationship between income level and predicted revenue.
An increase in income levels raises disposable income, so consumers have more money to spend. More goods and services are consumed, which increases total revenue.
Market Potential
There is a positive relationship between market potential and predicted revenue. A larger market offers more scope for increasing revenue, while a smaller market limits it.
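The direction of each relationship can be given a quick first check with pairwise correlations. The sketch below uses observed revenue rather than the kernel predictions, which is an assumption made to keep it self-contained, and the variable names `predictors` and `signs` are illustrative.

```r
data(freeny)
predictors <- c("price.index", "income.level", "market.potential")

# Pairwise correlation of each predictor with observed revenue
signs <- sapply(freeny[predictors],
                function(v) cor(v, freeny$lag.quarterly.revenue))
round(signs, 3)
```

Because all three predictors trend over the sample period, pairwise correlations cannot separate their individual effects. They only indicate the direction in which each predictor moves with revenue.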