Simple Linear Regression demo

Linear Regression establishes a relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit straight line, known as the regression line. The equation of this regression line can then be used to predict the value of ‘Y’ for any given ‘X’.

        Dependent Variable  (Target)      : Continuous
        Independent Variable(Predictor(s)): Continuous/Discrete

Simple linear regression involves one target (Y) and one predictor (X). This demo performs simple linear regression using the Least Squares Method to find the regression line that shows the trend in the data, i.e. the relationship between X and Y. The equation of the regression line in slope-intercept form is:

        Y = mX + c,  where m = slope of the straight line
                           c = Y-intercept
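
To make the Least Squares Method concrete, here is a minimal sketch that computes m and c by hand from the standard closed-form formulas and compares them with what lm() returns. It uses the built-in iris columns that the demo works with later; the names x, y, m_hat, c_hat and fit are illustrative only and not part of the original demo.

```r
# Closed-form least squares estimates for Y = mX + c
# (x, y, m_hat, c_hat and fit are illustrative names)
x <- iris[, "Sepal.Length"]  # predictor
y <- iris[, "Sepal.Width"]   # target

# slope: sum of cross-deviations divided by sum of squared deviations of x
m_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# intercept: the fitted line passes through the point (mean(x), mean(y))
c_hat <- mean(y) - m_hat * mean(x)

c(intercept = c_hat, slope = m_hat)

# lm() should report the same two numbers (intercept first, then slope)
fit <- lm(y ~ x)
coef(fit)
```

The two results should agree up to floating-point rounding, which is a quick way to check that lm() is doing ordinary least squares here.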

1. Load and view dataset

require("datasets")
data("iris")
str(iris)

## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

2. Preprocess the dataset

Since simple L.R. requires just one target and one predictor, let’s take the “Sepal.Width” attribute as our target (Y) and the “Sepal.Length” attribute as the predictor (X) to find out whether there exists any kind of relationship between them.

Y<- iris[,"Sepal.Width"] # select Target attribute
X<- iris[,"Sepal.Length"] # select Predictor attribute
head(X)

## [1] 5.1 4.9 4.7 4.6 5.0 5.4

head(Y)

## [1] 3.5 3.0 3.2 3.1 3.6 3.9
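
As an aside, extracting the columns into separate vectors is not strictly required. A minimal sketch of the alternative is to pass the data frame to lm() through its data argument and use the column names in the formula. The names fit_df and fit_vec below are illustrative and do not appear elsewhere in this demo.

```r
# Fit directly from the data frame; column names appear in the formula
fit_df <- lm(Sepal.Width ~ Sepal.Length, data = iris)

# Equivalent fit from extracted vectors, as done in this demo
Y <- iris[, "Sepal.Width"]   # target
X <- iris[, "Sepal.Length"]  # predictor
fit_vec <- lm(Y ~ X)

coef(fit_df)   # coefficients labelled "(Intercept)" and "Sepal.Length"
coef(fit_vec)  # same numbers, labelled "(Intercept)" and "X"
```

One practical difference: with the data-frame form, new data passed to predict() must use the column name Sepal.Length, whereas with the vector form it must use X.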

3. Find correlation between “Sepal.Length” and “Sepal.Width” and fit a Least Squares Linear Regression Model

xycorr<- cor(Y,X, method="pearson") # find pearson correlation coefficient
xycorr # a value near 1 or -1 implies strong correlation; a value near 0 implies weak correlation

## [1] -0.1175698
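
A coefficient this close to 0 raises the question of whether the correlation is distinguishable from no correlation at all. One way to check, sketched here as an optional extra rather than as part of the original demo, is cor.test() from the stats package, which reports the same estimate together with a p-value and a confidence interval. The names x, y and ct are illustrative.

```r
x <- iris[, "Sepal.Length"]
y <- iris[, "Sepal.Width"]

# Pearson correlation test; null hypothesis: the true correlation is 0
ct <- cor.test(y, x, method = "pearson")

ct$estimate  # the same coefficient that cor(y, x) returns
ct$p.value   # a small p-value would be evidence against "no correlation"
ct$conf.int  # 95% confidence interval for the correlation
```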

plot(Y~X, col=X)
model1<- lm(Y~X)
model1 # provides regression line coefficients i.e. slope and y-intercept

## 
## Call:
## lm(formula = Y ~ X)
## 
## Coefficients:
## (Intercept)            X  
##     3.41895     -0.06188

plot(Y~X, col=X) # scatter plot between X and Y
abline(model1, col="blue", lwd=3) # add regression line to scatter plot to see relationship between X and Y

The graph shows that the regression line slopes downwards, which matches the negative correlation coefficient found above. The correlation is weak (about -0.12), however, so on average Y decreases only slightly as X increases, and X alone explains little of the variation in Y.
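
To put a number on how well the line fits, one option is the R-squared value from summary(). The sketch below is an addition to the demo (the names x, y, fit and s are illustrative); it also shows that in simple linear regression R-squared equals the squared Pearson correlation, so a correlation of about -0.12 corresponds to roughly 1.4% of the variance explained.

```r
x <- iris[, "Sepal.Length"]
y <- iris[, "Sepal.Width"]
fit <- lm(y ~ x)

s <- summary(fit)
s$r.squared   # proportion of the variance in y explained by x
cor(y, x)^2   # equals R-squared when there is a single predictor

# The slope and the correlation coefficient always share the same sign
sign(coef(fit)["x"]) == sign(cor(y, x))
```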

4. Check correlation between “Petal.Length” and “Petal.Width” and fit a Least Squares Linear Regression Model

U<- iris[,"Petal.Width"] # select Target
V<- iris[,"Petal.Length"] # select Predictor
xycorr<- cor(U,V, method="pearson")
xycorr

## [1] 0.9628654

plot(U~V, col=V)
model2<- lm(U~V)
model2

## 
## Call:
## lm(formula = U ~ V)
## 
## Coefficients:
## (Intercept)            V  
##     -0.3631       0.4158

plot(U~V, col=V) # scatter plot between U and V
abline(model2, col="blue", lwd=3) # add regression line to scatter plot to see relationship between U and V

The above graph shows that the regression line slopes upwards, hence there exists a positive correlation between ‘U’ and ‘V’, and it is a strong one (about 0.96). So, as V increases, U tends to increase as well, and vice-versa.
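
The fitted coefficients can also be written out as an explicit equation of the form Y = mX + c. The following is a small sketch of that (the names u, v, fit2 and b are illustrative), with confint() added to show how uncertain the slope and intercept estimates are.

```r
v <- iris[, "Petal.Length"]  # predictor
u <- iris[, "Petal.Width"]   # target
fit2 <- lm(u ~ v)

b <- coef(fit2)  # b[1] is the intercept c, b[2] is the slope m

# Write the regression line as an equation
sprintf("Petal.Width = %.4f + %.4f * Petal.Length", b[1], b[2])

# 95% confidence intervals for the intercept and the slope
confint(fit2, level = 0.95)
```

Read this way, the slope says that each additional centimetre of petal length is associated with roughly 0.42 cm more petal width, on average.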

5. Perform prediction

Now, let’s use the regression lines fitted in model1 and model2 to predict the value of the target for a given value of the predictor.

# Prediction of 'Sepal.Width' when 'Sepal.Length'= 20
p1<- predict(model1,data.frame("X"=20))
p1

##        1 
## 2.181251

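The predicted value of Sepal.Width is 2.1812509 when Sepal.Length = 20. Note that the observed Sepal.Length values only range from 4.3 to 7.9, so this is an extrapolation far outside the data and should be treated with caution.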

# Prediction of 'Petal.Width' when 'Petal.Length'= 15
p2<- predict(model2,data.frame("V"=15))
p2

##        1 
## 5.873256

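The predicted value of Petal.Width is 5.8732557 when Petal.Length = 15. Here too the observed Petal.Length values only range from 1.0 to 6.9, so the prediction is an extrapolation and should be treated with caution.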
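
What predict() does here can be reproduced by plugging the new value into Y = mX + c with the fitted coefficients. The sketch below does that for the petal model, checks whether the new value lies within the observed predictor range, and asks predict() for a prediction interval. The names u, v, fit2, new_v, b, manual and in_range are illustrative and not part of the original demo.

```r
v <- iris[, "Petal.Length"]  # predictor
u <- iris[, "Petal.Width"]   # target
fit2 <- lm(u ~ v)

new_v <- 15
b <- coef(fit2)

# Manual prediction: intercept + slope * new predictor value
manual <- unname(b[1] + b[2] * new_v)
manual

# Same value from predict(); the column name in newdata must match the predictor name
predict(fit2, newdata = data.frame(v = new_v))

# Is the new value inside the range of the observed predictor values?
range(v)
in_range <- new_v >= min(v) && new_v <= max(v)
in_range  # FALSE means the prediction is an extrapolation

# Point prediction with a 95% prediction interval (columns fit, lwr, upr)
predict(fit2, newdata = data.frame(v = new_v), interval = "prediction", level = 0.95)
```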

You may also wish to try out Data Classification, Clustering or Linear Regression from the following links:

k-NN Classification for beginners
        Using Iris Dataset
        Using Airquality Dataset

k-means Clustering for beginners
        Using Iris Dataset
        Using Airquality Dataset

Linear Regression for beginners
        Using Airquality Dataset

Good luck! :)