Simple Linear Regression

Problem Definition

Check whether height and weight are strongly correlated in the dataset below and whether the data can be used for a regression model
If yes, predict the weight for the following heights: 160, 170 and 180 cm

Setup

Loading libraries

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.3
library(corrgram)
## Warning: package 'corrgram' was built under R version 3.3.3
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.3.3

Dataset

# height in cms
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131, 153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
# weight in kgs
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48, 65, 84, 59, 93, 49, 55, 79, 75, 66, 49)

Dataset

dfrModel <- data.frame(hght, wght)
names(dfrModel) <- c("hght","wght")
head(dfrModel)
##   hght wght
## 1  151   63
## 2  174   81
## 3  138   56
## 4  186   91
## 5  128   47
## 6  136   57
# check outliers in wght
WGHTPlot <- ggplot(dfrModel, aes(x="", y=wght)) +
            geom_boxplot(aes(fill=wght), color="green") +
            labs(title="WGHT Outliers")
                              
# check outliers in hght
HGHTPlot <- ggplot(dfrModel, aes(x="", y=hght)) +
            geom_boxplot(aes(fill=hght), color="blue") +
            labs(title="HGHT Outliers")
# show plot
grid.arrange(HGHTPlot, WGHTPlot, nrow=1, ncol=2)
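
The boxplots flag outliers visually. As a numeric cross-check, here is a minimal sketch using base R's boxplot.stats, which applies the conventional 1.5 x IQR rule (ggplot2 computes its hinges slightly differently, so results can differ at the margins). The names hghtOut and wghtOut are introduced here for illustration only.

```r
# same data as above, repeated so the snippet runs on its own
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131, 153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48, 65, 84, 59, 93, 49, 55, 79, 75, 66, 49)

# values lying more than 1.5 * IQR beyond the hinges
hghtOut <- boxplot.stats(hght)$out
wghtOut <- boxplot.stats(wght)$out

# number of flagged points in each variable
length(hghtOut)
length(wghtOut)
```

A length of zero means that no observation falls outside the whiskers for that variable.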

Correlation

# correlation coefficient
cor(dfrModel$hght, dfrModel$wght)
## [1] 0.944644
#cor(x, y, method = c("pearson", "kendall", "spearman"))
# correlation test
cor.test(dfrModel$hght, dfrModel$wght)
## 
##  Pearson's product-moment correlation
## 
## data:  dfrModel$hght and dfrModel$wght
## t = 12.215, df = 18, p-value = 3.788e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8627911 0.9782375
## sample estimates:
##      cor 
## 0.944644
#cor.test(x, y, method=c("pearson", "kendall", "spearman"))

Observation The cor function returns a correlation coefficient of 0.944644, which indicates a strong positive correlation between a person's height and weight. The p-value from cor.test (3.788e-10) shows that this correlation is statistically significant.
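
To make the coefficient less of a black box, the following sketch recomputes the Pearson correlation from its definition (the sum of cross-deviations divided by the square root of the product of the summed squared deviations) and compares it with cor. The variable names dx, dy and rManual are assumptions made for this illustration.

```r
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131, 153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48, 65, 84, 59, 93, 49, 55, 79, 75, 66, 49)

# deviations from the means
dx <- hght - mean(hght)
dy <- wght - mean(wght)

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
rManual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# should agree with the built-in function
all.equal(rManual, cor(hght, wght))
```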

Correlation Visualization

# visualize correlation
pairs(dfrModel)

#plot(dfrModel)
# visualize correlation
corrgram(dfrModel)

Plot Graph

# base chart
plot(dfrModel$hght, dfrModel$wght, col="blue", main="Regression",
     cex=1, pch=16, xlab="Height in cms", ylab="Weight in kgs")
abline(lm(wght ~ hght, data=dfrModel))  # add the fitted line after the plot exists

# ggplot
ggplot(dfrModel, aes(x=hght, y=wght)) +
    geom_point(shape=19, colour="blue", fill="blue") +
    geom_smooth(method='lm', formula=y~x) +
    labs(title="Height & Weight Regression") +
    labs(x="Height in cms") +
    labs(y="Weight in kgs")

Linear Model

x <- dfrModel$hght
y <- dfrModel$wght
slmModel <- lm(y~x)

Observation The linear model is created successfully.
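
For simple linear regression the least-squares coefficients have a closed form: the slope is cov(x, y) / var(x) and the intercept is mean(y) - slope * mean(x). The sketch below checks this against lm; the names slope, intercept and fit are illustrative and not part of the original analysis.

```r
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131, 153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48, 65, 84, 59, 93, 49, 55, 79, 75, 66, 49)

# closed-form least-squares estimates
slope     <- cov(hght, wght) / var(hght)
intercept <- mean(wght) - slope * mean(hght)

# the same model fitted with lm
fit <- lm(wght ~ hght)

# compare the hand-computed values with lm's coefficients
all.equal(unname(coef(fit)), c(intercept, slope))
```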

# print summary
summary(slmModel)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.1573  -1.7267   0.7701   2.6045   6.2102 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -33.55669    8.25032  -4.067 0.000723 ***
## x             0.63675    0.05213  12.215 3.79e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.846 on 18 degrees of freedom
## Multiple R-squared:  0.8924, Adjusted R-squared:  0.8864 
## F-statistic: 149.2 on 1 and 18 DF,  p-value: 3.788e-10

Observation For the model to be useful for prediction, the p-value should be less than 0.05. Here the p-value is 3.788e-10, which indicates that the model can be used for predicting the weights of individuals.

R-squared should be as close to 1 as possible. Here the multiple R-squared is 0.8924 (adjusted 0.8864), so the model explains about 89% of the variance in weight, which indicates that it is good for predicting the weights of individuals.
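
In a regression with a single predictor, R-squared equals the square of the Pearson correlation, and it can also be computed as 1 minus the ratio of residual to total sum of squares. A minimal sketch of both checks follows; fit, ssRes, ssTot and r2Manual are names assumed for this example.

```r
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131, 153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48, 65, 84, 59, 93, 49, 55, 79, 75, 66, 49)

fit <- lm(wght ~ hght)

# R-squared from sums of squares
ssRes    <- sum(residuals(fit)^2)          # unexplained variation
ssTot    <- sum((wght - mean(wght))^2)     # total variation
r2Manual <- 1 - ssRes / ssTot

# all three values should agree
r2Manual
cor(hght, wght)^2
summary(fit)$r.squared
```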

Test Data

# find wght of a person with height  160,170,180
dfrTest <- data.frame(x=c(160,170,180))
#names(dfrTest) <- c("x")
dfrTest 
##     x
## 1 160
## 2 170
## 3 180

Observation The data frame above holds the heights for which weights will be predicted. Its column is named x so that it matches the predictor name used in the model.

Predict

result <-  predict(slmModel, dfrTest)
print(result)
##        1        2        3 
## 68.32394 74.69148 81.05902

Observation The predicted weights above (approximately 68.3, 74.7 and 81.1 kg) are for individuals with heights of 160 cm, 170 cm and 180 cm, as defined in the test data frame.
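
A prediction from a linear model is just intercept + slope * height, and predict can also report how uncertain each estimate is. The sketch below, under the assumption that the model is refitted with the column names hght and wght, reproduces the predictions by hand and then requests 95% prediction intervals; fit, newData, byHand and withInterval are illustrative names.

```r
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131, 153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48, 65, 84, 59, 93, 49, 55, 79, 75, 66, 49)

fit     <- lm(wght ~ hght)
newData <- data.frame(hght = c(160, 170, 180))

# prediction by hand: intercept + slope * height
byHand <- coef(fit)[1] + coef(fit)[2] * newData$hght

# same predictions plus lower and upper 95% prediction bounds
withInterval <- predict(fit, newData, interval = "prediction", level = 0.95)
withInterval
```

The lwr and upr columns bound the weight of a single new individual at each height, which is usually more informative than the point estimate alone with only 20 observations.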