This document generates and visualizes a synthetic dataset in which payment is a function of two variables: n (from 1 to 20) and v (from 0 to 0.9). We explore how payment varies with the product n × v, and we fit a quadratic regression model to describe this relationship.
library(ggplot2)
library(dplyr)
library(knitr)
# Define sequences for n and v
nValues <- seq(1, 20, 1)
vValues <- seq(0, 0.9, 0.01)
# Define the payment function
payment <- function(n, v) {
  9.99201 * n * (1 - v)
}
# Create a data frame with all combinations of n and v
df <- expand.grid(n = nValues, v = vValues)
# Compute nv and payment
df <- df %>%
  mutate(
    nv = n * v,
    payment = payment(n, v)
  )
kable(df[sample(nrow(df), 8), ], caption = "sample values in df")
row | n | v | nv | payment |
---|---|---|---|---|
929 | 9 | 0.46 | 4.14 | 48.561169 |
1162 | 2 | 0.58 | 1.16 | 8.393288 |
477 | 17 | 0.23 | 3.91 | 130.795411 |
1135 | 15 | 0.56 | 8.40 | 65.947266 |
246 | 6 | 0.12 | 0.72 | 52.757813 |
525 | 5 | 0.26 | 1.30 | 36.970437 |
1067 | 7 | 0.53 | 3.71 | 32.873713 |
1211 | 11 | 0.60 | 6.60 | 43.964844 |
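The first column of the table is the row index of df. As a quick sanity check, the sketch below rebuilds the grid and looks up one of the sampled rows (row 929) to confirm that it matches the payment function. It repeats the definitions from the code above so that it runs on its own; the names row_929 and by_hand are introduced here only for illustration.

```r
# Rebuild df as in the code above so this snippet runs on its own.
payment <- function(n, v) 9.99201 * n * (1 - v)
df <- expand.grid(n = seq(1, 20, 1), v = seq(0, 0.9, 0.01))
df$nv <- df$n * df$v
df$payment <- payment(df$n, df$v)

# expand.grid() varies its first argument fastest, so with 20 values of n
# the row for (n, v) sits at position 20 * (index of v, counting from 0) + n.
# For n = 9 and v = 0.46 that is 20 * 46 + 9 = 929.
row_929 <- df[929, ]
row_929

# Recompute the payment for that row by hand: 9.99201 * 9 * (1 - 0.46).
by_hand <- 9.99201 * 9 * (1 - 0.46)
all.equal(row_929$payment, by_hand)
```

The same lookup works for any other row in the sample, which is a cheap way to confirm that the table and the payment function agree.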
ggplot(df, aes(x = nv, y = payment)) +
  geom_point(aes(color = as.factor(round(n)), size = sqrt(1 + 10 * v)), shape = 21, alpha = 0.7) +
  # Alternative: small fixed-size points colored by n
  # ggplot(df, aes(x = nv, y = payment, color = as.factor(round(n)))) +
  # geom_point(alpha = 0.5, size = 0.7) +
  labs(
    title = "Scatterplot of Payment vs n × v",
    x = "n × v",
    y = "Payment",
    size = "sqrt(1 + 10v)",  # point size, not shape, encodes v in this plot
    color = "Rounded n"
  ) +
  theme_minimal()
# Fit quadratic model
model <- lm(payment ~ nv + I(nv^2), data = df)
# Show summary of the model
summary(model)
##
## Call:
## lm(formula = payment ~ nv + I(nv^2), data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -58.230 -36.672 -8.738 26.596 144.866
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54.97439 2.09272 26.269 < 2e-16 ***
## nv 3.12828 0.79809 3.920 9.19e-05 ***
## I(nv^2) -0.30933 0.05664 -5.461 5.38e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 44.13 on 1817 degrees of freedom
## Multiple R-squared: 0.02443, Adjusted R-squared: 0.02336
## F-statistic: 22.76 on 2 and 1817 DF, p-value: 1.735e-10
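Notice that the coefficient on the quadratic term is negative while the coefficient on the linear term is positive, so the fitted curve is concave: it rises at first and eventually turns downward. You can even "estimate" the value of nv at which it turns. The vertex of the fitted parabola lies at nv = -b1 / (2 × b2) = 3.12828 / (2 × 0.30933) ≈ 5.06, where b1 and b2 are the linear and quadratic coefficients. Keep in mind that the R-squared is only about 0.024, so nv alone explains very little of the variation in payment.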
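The turning point can also be read directly off the fitted model. The following is a minimal sketch that refits the same quadratic on the same grid, so that it runs on its own, and computes the vertex of the parabola from the coefficients. The names b, turning_nv, and payment_at_turn are introduced here for illustration only.

```r
# Rebuild the data and model as in the code above so this snippet runs on its own.
payment <- function(n, v) 9.99201 * n * (1 - v)
df <- expand.grid(n = seq(1, 20, 1), v = seq(0, 0.9, 0.01))
df$nv <- df$n * df$v
df$payment <- payment(df$n, df$v)
model <- lm(payment ~ nv + I(nv^2), data = df)

# For y = b0 + b1 * x + b2 * x^2, the slope b1 + 2 * b2 * x is zero
# at x = -b1 / (2 * b2); with b2 < 0 this is a maximum.
b <- coef(model)
turning_nv <- unname(-b["nv"] / (2 * b["I(nv^2)"]))
turning_nv

# Fitted payment at the turning point.
payment_at_turn <- unname(predict(model, newdata = data.frame(nv = turning_nv)))
payment_at_turn
```

With the coefficients printed in the summary above, turning_nv should come out at roughly 5.06. Because the fit is weak, this is a property of the fitted curve rather than a reliable feature of the data.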
ggplot(df, aes(x = nv, y = payment, color = as.factor(round(n, 1)))) +
  geom_point(alpha = 0.4, size = 0.7) +
  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), color = "black", linewidth = 1.2) +
  labs(
    title = "Quadratic Regression: Payment vs n × v",
    x = "n × v",
    y = "Payment",
    color = "n"
  ) +
  theme_minimal() +
  theme(legend.position = "right")
# Use a manageable subset: keep v near 0.1, 0.4, and 0.7.
# (v tops out at 0.9, so round(10 * v) never reaches 10.)
df <- df %>% filter(round(10 * v) %in% c(1, 4, 7))
ggplot(df, aes(x = nv, y = payment, shape = as.factor(round(10 * v)), color = as.factor(round(n, 1)))) +
  geom_point(alpha = 0.4, size = 0.7) +
  # aes(group = 1) forces a single regression over all points rather than one per group
  stat_smooth(aes(group = 1), method = "lm", formula = y ~ x + I(x^2), color = "black", linewidth = 1.2) +
  labs(
    title = "Quadratic Regression: Payment vs n × v",
    x = "n × v",
    y = "Payment",
    shape = "Rounded 10 × v",
    color = "n"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom") + # Move legend to bottom for horizontal layout
  guides(
    color = guide_legend(nrow = 1, byrow = TRUE),
    shape = guide_legend(nrow = 1, byrow = TRUE)
  )