Purpose of writing this markdown

For about two years I have been applying machine learning algorithms to my data through mature packages. I can get the models I want, but always with a little trepidation, simply because I am only a user of these algorithms, not someone who can implement them myself.

So from now on, I have decided to implement these machine learning algorithms in plain R code, in order to understand them more deeply. Wish me luck.

Data preparation

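Just like my last markdown, I am gonna use the diamonds data set from the ggplot2 package. Since this post is about linear regression, I remove the categorical variables (cut, color and clarity), keep only the numeric columns and min-max scale each of them to [0, 1].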

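  library(tidyverse, quietly = TRUE)
  # keep only the numeric columns (drops cut, color and clarity)
  diamonds <- diamonds %>% select_if(is.numeric)
  # min-max scale every column to [0, 1]
  diamonds <- apply(diamonds, 2, function(x) {(x - min(x)) / (max(x) - min(x))}) %>% as.data.frame()
  glimpse(diamonds)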
## Observations: 53,940
## Variables: 7
## $ carat <dbl> 0.006237006, 0.002079002, 0.006237006, 0.018711019, 0.02...
## $ depth <dbl> 0.5138889, 0.4666667, 0.3861111, 0.5388889, 0.5638889, 0...
## $ table <dbl> 0.2307692, 0.3461538, 0.4230769, 0.2884615, 0.2884615, 0...
## $ price <dbl> 0.000000e+00, 0.000000e+00, 5.406282e-05, 4.325026e-04, ...
## $ x     <dbl> 0.3677840, 0.3621974, 0.3770950, 0.3910615, 0.4040968, 0...
## $ y     <dbl> 0.06757216, 0.06519525, 0.06910017, 0.07181664, 0.073853...
## $ z     <dbl> 0.07641509, 0.07264151, 0.07264151, 0.08270440, 0.086477...
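The apply() call above converts the data to a matrix before scaling with x' = (x - min(x)) / (max(x) - min(x)). As a minimal base-R sketch of the same min-max scaling that keeps the data-frame class, drops non-numeric columns and guards against constant columns, here is one possible helper. `min_max_scale` is a hypothetical name, and the built-in mtcars data stands in for diamonds so the example needs no extra packages.

```r
# Hypothetical helper: min-max scale every numeric column of a data frame to [0, 1].
# Non-numeric columns (e.g. factors) are dropped, as in the select_if() step above.
min_max_scale <- function(df) {
  num <- vapply(df, is.numeric, logical(1))   # which columns are numeric
  df <- df[, num, drop = FALSE]
  df[] <- lapply(df, function(x) {
    rng <- range(x, na.rm = TRUE)
    if (rng[1] == rng[2]) return(rep(0, length(x)))  # constant column: avoid 0/0
    (x - rng[1]) / (rng[2] - rng[1])
  })
  df
}

scaled <- min_max_scale(mtcars)
sapply(scaled, range)   # first row is each column's minimum, second row its maximum
```

On the post's data this would be called as `min_max_scale(ggplot2::diamonds)`.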

Methods we are gonna use

1. Least squares method

For multiple linear regression, the model is $Y = X\beta + u$, where $Y$ is the response vector, $X$ is the matrix of features (plus a leading column of ones for the intercept), $\beta$ is the vector of parameters and $u$ is the random error.
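In matrix form the intercept is just another column of $X$ filled with ones, which is what `cbind(intercept=1,x)` does in the ols() code below. A small sketch using base R's model.matrix() on the built-in mtcars data (a stand-in for diamonds, with mpg playing the role of $Y$) shows the shape of $X$:

```r
# Build the design matrix X (with an intercept column) using base R's model.matrix().
# mtcars stands in for diamonds; mpg plays the role of the response Y.
X <- model.matrix(mpg ~ wt + hp, data = mtcars)
Y <- mtcars$mpg

dim(X)        # one row per observation, one column per parameter in beta
colnames(X)   # "(Intercept)" "wt" "hp"
head(X, 3)    # the first column is all ones
```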

The final solution we want is $\hat\beta = (X^\top X)^{-1} X^\top Y$, where $X^\top$ denotes the transpose of $X$.
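The closed form comes from minimizing the residual sum of squares $S(\beta)$ and setting its gradient to zero, assuming $X$ has full column rank so that $X^\top X$ is invertible:

```latex
\begin{aligned}
S(\beta) &= (Y - X\beta)^\top (Y - X\beta) = Y^\top Y - 2\beta^\top X^\top Y + \beta^\top X^\top X \beta \\
\frac{\partial S}{\partial \beta} &= -2X^\top Y + 2X^\top X\beta = 0 \\
X^\top X \hat\beta &= X^\top Y \quad\Longrightarrow\quad \hat\beta = (X^\top X)^{-1} X^\top Y
\end{aligned}
```

The middle step uses the fact that $Y^\top X\beta$ is a scalar, so it equals $\beta^\top X^\top Y$.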

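  ols <- function(y, x) {
    x <- as.matrix(x)
    # prepend a column of ones so the first coefficient is the intercept
    x <- cbind(intercept = 1, x)
    # closed-form least-squares estimate (X^T X)^{-1} X^T y
    return(solve(t(x) %*% x) %*% t(x) %*% y)
  }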

  ols(y=diamonds$price,x=diamonds %>% select(-price)) %>% print()
##                  [,1]
## intercept  0.50575312
## carat      2.84430795
## depth     -0.85263614
## table     -0.51704494
## x         -0.75069185
## y          0.20753028
## z          0.07032677
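Explicitly inverting t(x) %*% x works, but forming $X^\top X$ squares the condition number of $X$, which can matter when features are nearly collinear (as carat, x, y and z are in diamonds). As a hedged sketch, two common base-R alternatives are shown below: qr.coef() on a QR decomposition of $X$ (lm() also relies on a QR decomposition), and solving the normal equations with solve(A, b) instead of computing an inverse. `ols_qr` and `ols_normal` are hypothetical names, and mtcars stands in for diamonds.

```r
# Alternative 1: least squares through a QR decomposition of X (no X^T X formed).
ols_qr <- function(y, x) {
  x <- cbind(intercept = 1, as.matrix(x))   # same intercept handling as ols()
  qr.coef(qr(x), y)                         # named vector of coefficients
}

# Alternative 2: solve the normal equations X^T X b = X^T y without an explicit inverse.
ols_normal <- function(y, x) {
  x <- cbind(intercept = 1, as.matrix(x))
  solve(crossprod(x), crossprod(x, y))      # crossprod(x) computes t(x) %*% x
}

b_qr     <- ols_qr(mtcars$mpg, mtcars[, c("wt", "hp")])
b_normal <- ols_normal(mtcars$mpg, mtcars[, c("wt", "hp")])
cbind(qr = b_qr, normal = drop(b_normal))   # the two columns should agree
```

On well-conditioned data the two give the same numbers; the QR route is the safer default when columns are strongly correlated.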

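To check correctness, compare against R's built-in lm(), which fits the same least-squares model (internally via a QR decomposition), so the two sets of coefficients should agree up to rounding. The printed outputs here do not match exactly (depth, for instance, is -0.853 above but -0.395 below).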

  print(lm(price~.,diamonds))
## 
## Call:
## lm(formula = price ~ ., data = diamonds)
## 
## Coefficients:
## (Intercept)        carat        depth        table            x  
##     0.51467      2.77889     -0.39539     -0.28800     -0.76392  
##           y            z  
##     0.21119      0.07157
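Comparing printed coefficients by eye is error-prone. A sketch of a numeric check with all.equal(), run on the built-in mtcars data (a stand-in for diamonds) and restating the ols() function from above so the block runs on its own:

```r
# Restates the ols() function from above so this block is self-contained.
ols <- function(y, x) {
  x <- as.matrix(x)
  x <- cbind(intercept = 1, x)
  solve(t(x) %*% x) %*% t(x) %*% y
}

# Fit the same model both ways.
fit  <- lm(mpg ~ wt + qsec + am, data = mtcars)
mine <- ols(y = mtcars$mpg, x = mtcars[, c("wt", "qsec", "am")])

# TRUE if the estimates agree to within a relative tolerance of 1e-6
all.equal(unname(drop(mine)), unname(coef(fit)), tolerance = 1e-6)
```

Applying the same all.equal() comparison to the diamonds fit (ols() on the scaled data versus `coef(lm(price ~ ., diamonds))`) checks the two printed outputs above directly.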

2. Gradient descent algorithm