Based on the article below, I am going to compare our algorithm against the lm() function, which already has its own fitting algorithm.
https://thecleverprogrammer.com/2020/07/23/linear-regression-algorithm-without-scikit-learn/
A linear regression model makes a prediction by simply computing a weighted sum of the input features, plus a constant called the bias term. In mathematical notation, the model looks like:
\(\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n\)
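To make the formula concrete, here is a tiny R sketch; the parameter and feature values are made up purely for illustration:

# hypothetical parameters: theta_0 (bias) plus one weight per feature
theta <- c(4, 3, 2)      # theta_0, theta_1, theta_2
x <- c(1, 0.5, 1.5)      # x0 = 1 (bias term), x1, x2
y_hat <- sum(theta * x)  # weighted sum of the features plus the bias
y_hat                    # 4 + 3*0.5 + 2*1.5 = 8.5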
Let's create our own linear regression algorithm. We will first build the algorithm from the mathematical equation, then visualise the result with the ggplot2 package in R. We start by generating linear-looking data that we can feed into our linear regression algorithm.
First, we need some data. We will use the runif() function to generate 100 random observations:
X <- 2 * runif(100)          # predictor: 100 draws, uniform on [0, 2]
y <- 4 + 3 * X + runif(100)  # response: intercept 4, slope 3, plus uniform noise
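Note that runif() draws fresh random numbers on every run, so your exact coefficients will differ slightly from the ones shown below. If you want reproducible draws, you can seed the generator before generating the data; the seed value here is arbitrary:

set.seed(42)  # any fixed seed makes the runif() draws reproducible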
If we visualise the data, we can easily see the linearity:
library(ggplot2)
df_original <- data.frame(X, y)
ggplot(df_original, aes(X, y)) +
  geom_point()
Now, let's move forward by implementing the linear regression algorithm mathematically. The closed-form solution is the normal equation, \(\hat{\theta} = (X^TX)^{-1}X^Ty\). We will use the t() function to transpose a matrix and solve() to compute its inverse; in R, matrix multiplication is written %*%:
# create a vector of ones
ones <- rep(1, 100)
# add x0 = 1 to each instance, giving a 100 x 2 design matrix
X_array <- array(c(ones, X), c(100, 2))
X_array_T <- t(X_array)  # its transpose, used in the normal equation below
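As a quick sanity check on these building blocks (this snippet is my own, not part of the original post), multiplying a small matrix by its inverse should return the identity:

A <- matrix(c(2, 1, 1, 3), 2, 2)
solve(A) %*% A  # should be (numerically) the 2 x 2 identity matrix
dim(X_array)    # 100 x 2: one row per observation, columns x0 and x1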
The function that we used to generate the data is y = 4 + 3x plus uniform noise from runif(). Let's see what our algorithm found:
# model: solve the normal equation
linear <- solve(X_array_T %*% X_array) %*% X_array_T %*% y
df_model <- as.data.frame(linear, row.names = c("intercept", "slope"))
df_model
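If you want to reuse this, the whole algorithm fits in a small helper function. This wrapper is a sketch of my own; the name fit_normal_eq is made up:

# normal equation: theta_hat = (X'X)^{-1} X'y
fit_normal_eq <- function(x, y) {
  Xb <- cbind(intercept = 1, slope = x)  # prepend the bias column x0 = 1
  solve(t(Xb) %*% Xb) %*% t(Xb) %*% y
}
fit_normal_eq(X, y)  # same estimates as the step-by-step version above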
The estimated intercept (≈ 4.49) and slope (≈ 3.00) are close to the values we used to generate the data. The intercept lands near 4.5 rather than 4 because the runif() noise is not centred at zero: its mean of 0.5 gets absorbed into the intercept.
That looks good as a linear regression model. Now let's make predictions using our algorithm:
X_new <- c(0, 2)
ones_new <- rep(1, 2)
X_new_b <- array(c(ones_new, X_new), c(2, 2))  # again add x0 = 1 to each instance
y_predict <- X_new_b %*% linear
y_predict
[,1]
[1,] 4.489252
[2,] 10.491528
Notice that the prediction at X = 0 is simply the intercept, and the one at X = 2 is the intercept plus twice the slope. We can also plot the regression line to visualise how well it fits the original data:
df <- data.frame(X, y)
ggplot(df, aes(X, y)) +
  geom_point() +
  geom_abline(slope = df_model["slope", ], intercept = df_model["intercept", ])
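As an extra visual cross-check (my own addition), you can let ggplot2 overlay the fit computed by lm() itself; if the two lines coincide, our algorithm agrees with R's:

ggplot(df, aes(X, y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, colour = "red") +  # lm()'s own fit
  geom_abline(slope = df_model["slope", ], intercept = df_model["intercept", ])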
We saw above how to build our own algorithm. A good way to practice is to reimplement something that already exists, because then you can evaluate your version against the established one. Here we will cross-check the linear regression algorithm we made against R's built-in lm() function:
lm(y ~ X, df_original)

Call:
lm(formula = y ~ X, data = df_original)

Coefficients:
(Intercept)            X
      4.489        3.001
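The lm() coefficients match the ones our normal-equation code produced. To put the two fits side by side programmatically (a small sketch of my own):

fit <- lm(y ~ X, df_original)
cbind(ours = as.vector(linear), lm = coef(fit))  # the two columns should match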