This just creates the x and y variables: x contains 100 numbers sampled from 1 to 10, and y is roughly 3 + 4x, with some noise multiplied into the slope term.

library(ggplot2)

set.seed(13)
x <- replicate(100, sample(1:10, 1))   # 100 draws from 1..10
y <- 3 + x*4*rnorm(100, 1, 0.2)        # roughly 3 + 4x, with noise on the slope
ggplot() + geom_point(aes(x, y)) + geom_smooth(aes(x, y), method = "lm") + theme_bw()
## `geom_smooth()` using formula 'y ~ x'

These are the reference coefficients from R's built-in lm() function.

lm(y~x)
## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##        2.38         4.17

From the statistical formulas:

\[ \beta_1 = \frac{\operatorname{cor}(x,y)\cdot \operatorname{sd}(y)}{\operatorname{sd}(x)} \] \[ \beta_0 = \overline{y} - \beta_1\cdot\overline{x} \]

slope <- (cor(x,y)*sd(y))/sd(x)   # beta1
slope
## [1] 4.170181
intercept <- mean(y) - mean(x)*slope   # beta0
intercept
## [1] 2.380452

From linear algebra, using the normal equations:

\[ \beta = \left(X^TX\right)^{-1}X^Ty \]

# add a column of 1s (the intercept term) as the first column of X
X <-  matrix(c(rep(1,100),x), nrow=100)
betas <- solve(t(X) %*% X) %*% t(X) %*% y
rownames(betas) <- c("beta0", "beta1")
betas
##           [,1]
## beta0 2.380452
## beta1 4.170181
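
As a side note (not part of the original write-up), the same system can be solved without explicitly forming the inverse, which is usually the more numerically stable approach in R:

# equivalent, but solves (X'X) beta = X'y directly instead of inverting X'X
betas2 <- solve(crossprod(X), crossprod(X, y))
betas2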

With calculus (gradient descent):

The hypothesis of linear regression (with one variable) is \[ y = \beta_0 + \beta_1x \] The goal is to make the cost \[ \frac{1}{2m}\sum_{i=1}^m\left(\left(\beta_0 + \beta_1x^{(i)}\right)-y^{(i)}\right)^2 \] as small as possible: the average of the squared difference between the predicted and actual values. The extra 2 in the denominator just makes the derivatives cleaner.
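
Differentiating that cost with respect to each coefficient gives the update rules used in the loop below (alpha is the learning rate, m the number of points):

\[ \beta_0 := \beta_0 - \frac{\alpha}{m}\sum_{i=1}^m\left(\left(\beta_0 + \beta_1x^{(i)}\right)-y^{(i)}\right) \] \[ \beta_1 := \beta_1 - \frac{\alpha}{m}\sum_{i=1}^m\left(\left(\beta_0 + \beta_1x^{(i)}\right)-y^{(i)}\right)x^{(i)} \]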

beta0 <- 1; beta1 <- 1   # initial guesses
alpha <- 0.01            # learning rate
m <- length(x)
for (i in 1:10000){
    # current residuals, then update both coefficients simultaneously
    resid <- (beta0 + beta1*X[,2]) - y
    beta0 <- beta0 - (alpha/m)*sum(resid)
    beta1 <- beta1 - (alpha/m)*sum(resid*X[,2])
}
beta0
## [1] 2.380452
beta1
## [1] 4.170181
data.frame(stat_method = c(intercept, slope),
           linalg = betas,
           gradient_desc = c(beta0, beta1),
           built_in = coef(lm(y~x)))
##       stat_method   linalg gradient_desc built_in
## beta0    2.380452 2.380452      2.380452 2.380452
## beta1    4.170181 4.170181      4.170181 4.170181
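
The same gradient descent can also be written in matrix form, which ties the calculus and linear algebra versions together. This is a minimal sketch (not part of the original write-up), reusing the X, alpha, and m defined above; it should converge to the same coefficients:

# vectorized gradient descent: beta <- beta - (alpha/m) * X'(X beta - y)
beta <- matrix(c(1, 1), nrow = 2)   # start from (1, 1) as before
for (i in 1:10000){
    beta <- beta - (alpha/m) * t(X) %*% (X %*% beta - y)
}
beta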