Simple linear regression models the relationship between two variables by fitting a linear equation to observed data. One variable is treated as the explanatory (independent) variable, and the other as the dependent (response) variable.
Real-world applications include predicting house prices from square footage and predicting sales from advertising spend.
The linear regression equation is given by:
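\[ y = \beta_0 + \beta_1 x + \epsilon \]
where \( y \) is the dependent variable, \( x \) is the explanatory variable, \( \beta_0 \) is the intercept, \( \beta_1 \) is the slope, and \( \epsilon \) is a random error term.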
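To make the roles of the terms concrete, the following sketch simulates data from this equation. The parameter values (`beta0`, `beta1`, and the noise standard deviation) are hypothetical choices for illustration and do not come from any real dataset.

```r
# Simulate data from y = beta0 + beta1 * x + epsilon.
# All numeric values below are hypothetical, chosen only for illustration.
set.seed(1)
n     <- 50
beta0 <- 50000   # intercept: expected y when x = 0
beta1 <- 150     # slope: expected change in y per unit increase in x

x       <- runif(n, min = 1000, max = 2500)   # explanatory variable
epsilon <- rnorm(n, mean = 0, sd = 20000)     # random error term
y       <- beta0 + beta1 * x + epsilon        # dependent variable

# Subtracting the systematic part recovers the error term
# (up to floating-point rounding).
all.equal(y - (beta0 + beta1 * x), epsilon)
```

In practice only `x` and `y` are observed; the parameters and the error term are unknown and have to be estimated.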
The parameters \( \beta_0 \) and \( \beta_1 \) are estimated using the least squares method, which minimizes the sum of the squared residuals:
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
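The closed-form estimators can be checked numerically. The sketch below applies them to a small made-up pair of vectors (the values of `x` and `y` are illustrative, not real data) and compares the result with R's built-in `lm()`.

```r
# Illustrative data only (not from a real dataset).
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations divided by sum of squared x-deviations.
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# Intercept: forces the fitted line through the point (x_bar, y_bar).
beta0_hat <- y_bar - beta1_hat * x_bar

# lm() solves the same least squares problem, so the estimates should agree.
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(beta0_hat, beta1_hat))
```

Agreement between the two confirms that `lm()` with a single predictor is computing exactly these least squares estimates.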
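To illustrate simple linear regression, we use a simulated dataset of 100 house prices and their corresponding square footage. Note that the sample data are generated with price drawn independently of square footage, so the fitted line reflects sampling noise rather than a real relationship.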
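[Figure: Scatter Plot of House Prices vs. Square Footage (x: Square Footage, y: Price)]
[Figure: Linear Regression of House Prices on Square Footage (x: Square Footage, y: Price, fitted line in red)]
[Figure: 3D Scatter Plot of House Prices (x: Square Footage, y: Number of Rooms, z: Price)]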
# Setup
library(ggplot2)
library(plotly)
library(knitr)
library(kableExtra)

# Sample data (simulated)
set.seed(123)
data <- data.frame(
  sqft  = rnorm(100, mean = 1500, sd = 200),
  price = rnorm(100, mean = 300000, sd = 50000)
)

# Linear model: regress price on square footage
model <- lm(price ~ sqft, data = data)

# Scatter plot with ggplot2
scatter_plot <- ggplot(data, aes(x = sqft, y = price)) +
  geom_point() +
  labs(title = "Scatter Plot of House Prices vs. Square Footage",
       x = "Square Footage", y = "Price")

# Regression line plot with ggplot2 (least squares line, no confidence band)
regression_plot <- scatter_plot +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Linear Regression of House Prices on Square Footage")

# 3D plotly plot
# rooms is simulated from a normal distribution, so values are not whole numbers
data$rooms <- rnorm(100, mean = 5, sd = 1)
plot_3d <- plot_ly(data, x = ~sqft, y = ~rooms, z = ~price,
                   type = "scatter3d", mode = "markers") %>%
  layout(title = "3D Scatter Plot of House Prices",
         scene = list(xaxis = list(title = "Square Footage"),
                      yaxis = list(title = "Number of Rooms"),
                      zaxis = list(title = "Price")))
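
Once the model is fitted, its estimates can be inspected and used for prediction. The sketch below regenerates the same simulated data so that it runs on its own; the query value of 1,800 square feet is an arbitrary illustration. Because `price` is drawn independently of `sqft` in this sample, a low R-squared should be expected.

```r
# Recreate the simulated data and model from the code above.
set.seed(123)
data <- data.frame(
  sqft  = rnorm(100, mean = 1500, sd = 200),
  price = rnorm(100, mean = 300000, sd = 50000)
)
model <- lm(price ~ sqft, data = data)

# Estimated intercept and slope
coef(model)

# Proportion of the variance in price explained by sqft
summary(model)$r.squared

# Predicted price for a hypothetical 1,800 sq ft house
new_house <- data.frame(sqft = 1800)
predicted <- predict(model, newdata = new_house)
predicted
```

`predict()` evaluates the fitted line at the new value, so the result equals the estimated intercept plus the estimated slope times 1800.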