Check whether height and weight in the dataset below are strongly correlated and whether the data can be used for a regression model.
If yes, predict the weight for the following heights: 160, 170 and 180 cm.
Setup
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.3
library(corrgram)
## Warning: package 'corrgram' was built under R version 3.3.3
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.3.3
# height in cm
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131, 153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
# weight in kg
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48, 65, 84, 59, 93, 49, 55, 79, 75, 66, 49)
Dataset
# named arguments set the column names directly, so a separate names() call is not needed
dfrModel <- data.frame(hght = hght, wght = wght)
head(dfrModel)
## hght wght
## 1 151 63
## 2 174 81
## 3 138 56
## 4 186 91
## 5 128 47
## 6 136 57
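# check outliers in wght
# use a fixed fill colour: mapping fill to the continuous wght does not colour a single box
WGHTPlot <- ggplot(dfrModel, aes(x="", y=wght)) +
  geom_boxplot(fill="lightgreen", color="green") +
  labs(title="WGHT Outliers", x=NULL, y="Weight in kg")
# check outliers in hght
HGHTPlot <- ggplot(dfrModel, aes(x="", y=hght)) +
  geom_boxplot(fill="lightblue", color="blue") +
  labs(title="HGHT Outliers", x=NULL, y="Height in cm")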
# show plot
grid.arrange(HGHTPlot, WGHTPlot, nrow=1, ncol=2)
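As a numeric cross-check of the boxplots, base R's boxplot.stats() returns in `$out` the values lying beyond the 1.5 × IQR whiskers. This is a sketch added for illustration, not part of the original analysis: boxplot.stats() computes its hinges with fivenum(), while ggplot2 uses quantile(), so borderline points could differ slightly between the two. The names hghtOut and wghtOut are illustrative. For this data both vectors should come back empty, i.e. no outliers.

```r
# Heights (cm) and weights (kg) from the dataset above
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131, 153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48, 65, 84, 59, 93, 49, 55, 79, 75, 66, 49)

# $out holds the values outside the 1.5 * IQR whiskers
# (hghtOut / wghtOut are illustrative names)
hghtOut <- boxplot.stats(hght)$out
wghtOut <- boxplot.stats(wght)$out
hghtOut
wghtOut
```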
Correlation
# correlation coefficient
cor(dfrModel$hght, dfrModel$wght)
## [1] 0.944644
#cor(x, y, method = c("pearson", "kendall", "spearman"))
# correlation test
cor.test(dfrModel$hght, dfrModel$wght)
##
## Pearson's product-moment correlation
##
## data: dfrModel$hght and dfrModel$wght
## t = 12.215, df = 18, p-value = 3.788e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8627911 0.9782375
## sample estimates:
## cor
## 0.944644
#cor.test(x, y, method=c("pearson", "kendall", "spearman"))
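Observation The correlation coefficient of 0.944644 indicates a strong positive linear relationship between a person's height and weight. cor.test() confirms that it is statistically significant: the p-value (3.788e-10) is far below 0.05, and the 95% confidence interval (0.863 to 0.978) does not include 0.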
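The numbers reported by cor() and cor.test() follow from textbook formulas: Pearson's r = cov(x, y) / (sd(x) · sd(y)), and the test statistic t = r · sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom. The sketch below, with illustrative variable names, recomputes them by hand; it should match the output above (r ≈ 0.9446, t ≈ 12.215, p ≈ 3.79e-10).

```r
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131, 153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48, 65, 84, 59, 93, 49, 55, 79, 75, 66, 49)
n <- length(hght)

# Pearson correlation: covariance scaled by both standard deviations
rManual <- cov(hght, wght) / (sd(hght) * sd(wght))

# t statistic for H0: true correlation = 0, with n - 2 degrees of freedom
tManual <- rManual * sqrt(n - 2) / sqrt(1 - rManual^2)

# two-sided p-value from the t distribution
pManual <- 2 * pt(-abs(tManual), df = n - 2)

c(r = rManual, t = tManual, p = pManual)
```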
Correlation Visualization
# visualize correlation
pairs(dfrModel)
#plot(dfrModel)
# visualize correlation
corrgram(dfrModel)
Plot Graph
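# base chart: height on the x-axis, weight on the y-axis
plot(dfrModel$hght, dfrModel$wght, col="blue", main="Regression",
     cex=1, pch=16, xlab="Height in cm", ylab="Weight in kg")
# add the fitted line of weight on height (same direction as the model below)
abline(lm(wght ~ hght, data=dfrModel))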
# ggplot
ggplot(dfrModel, aes(x=hght, y=wght)) +
  geom_point(shape=19, colour="blue") +
  geom_smooth(method='lm', formula=y~x) +
  labs(title="Height & Weight Regression", x="Height in cm", y="Weight in kg")
Linear Model
x <- dfrModel$hght
y <- dfrModel$wght
slmModel <- lm(y~x)
Observation A simple linear regression model of weight (y) on height (x) is fitted with lm().
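For a single predictor, the least-squares coefficients that lm() estimates have a closed form: slope b1 = cov(x, y) / var(x), and intercept b0 = mean(y) - b1 · mean(x), so the line passes through the point of means. The sketch below uses hght and wght directly rather than the x and y aliases, which gives the same fit; the names b0, b1 and fit are illustrative. The results should agree with the summary that follows (intercept ≈ -33.55669, slope ≈ 0.63675).

```r
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131, 153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48, 65, 84, 59, 93, 49, 55, 79, 75, 66, 49)

# least-squares slope
b1 <- cov(hght, wght) / var(hght)
# intercept: the fitted line goes through (mean height, mean weight)
b0 <- mean(wght) - b1 * mean(hght)

# compare with the coefficients estimated by lm()
fit <- lm(wght ~ hght)
rbind(manual = c(b0, b1), lm = unname(coef(fit)))
```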
# print summary
summary(slmModel)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.1573 -1.7267 0.7701 2.6045 6.2102
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -33.55669 8.25032 -4.067 0.000723 ***
## x 0.63675 0.05213 12.215 3.79e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.846 on 18 degrees of freedom
## Multiple R-squared: 0.8924, Adjusted R-squared: 0.8864
## F-statistic: 149.2 on 1 and 18 DF, p-value: 3.788e-10
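Observation The p-value of the height coefficient (3.79e-10), which in simple regression equals the p-value of the overall F-test, is far below 0.05. Height is therefore a significant predictor, and the model can be used to predict the weights of individuals.
R-squared lies between 0 and 1, and the closer it is to 1, the more of the variation in weight the model explains. Here R-squared is 0.8924 (adjusted 0.8864), so height explains roughly 89% of the variation in weight, which indicates that the model is good for predicting the weights of individuals.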
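The R-squared values in the summary can be recomputed as 1 - SS_res / SS_tot, and the adjusted value as 1 - (1 - R²)(n - 1) / (n - p - 1), where p is the number of predictors (1 here). In simple linear regression R-squared also equals the squared Pearson correlation (0.944644² ≈ 0.8924). The residual plot at the end is an added diagnostic, not part of the original analysis: a patternless band around 0 supports the straight-line model. Variable names are illustrative.

```r
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131, 153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48, 65, 84, 59, 93, 49, 55, 79, 75, 66, 49)
fit <- lm(wght ~ hght)
n <- length(wght)
p <- 1  # number of predictors

# R-squared: share of the total variation explained by the model
ssRes <- sum(residuals(fit)^2)
ssTot <- sum((wght - mean(wght))^2)
r2 <- 1 - ssRes / ssTot

# adjusted R-squared penalises extra predictors
adjR2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

c(r2 = r2, adjR2 = adjR2, corSquared = cor(hght, wght)^2)

# residuals vs fitted values: look for curvature or a funnel shape
plot(fitted(fit), residuals(fit), xlab = "Fitted weight (kg)", ylab = "Residual (kg)")
abline(h = 0, lty = 2)
```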
Test Data
# find wght of persons with heights 160, 170 and 180 cm
# the column must be named x to match the predictor used in slmModel
dfrTest <- data.frame(x=c(160,170,180))
dfrTest
## x
## 1 160
## 2 170
## 3 180
Observation The above data frame holds the heights for which the weights of individuals will be predicted.
Predict
result <- predict(slmModel, dfrTest)
print(result)
## 1 2 3
## 68.32394 74.69148 81.05902
Observation These are the predicted weights for individuals with heights of 160 cm, 170 cm and 180 cm, as defined in the data frame: about 68.3 kg, 74.7 kg and 81.1 kg respectively.
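The predictions above are point estimates. As a hedged extension, predict() with interval = "prediction" also returns a 95% range for a single new person's weight (interval = "confidence" would instead give the narrower range for the mean weight at that height). The sketch refits the model as wght ~ hght, so the new data frame needs a column named hght; fit, newHeights and pred are illustrative names. The fit column should reproduce 68.32394, 74.69148 and 81.05902, which also follow from weight = b0 + b1 · height.

```r
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131, 153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48, 65, 84, 59, 93, 49, 55, 79, 75, 66, 49)
fit <- lm(wght ~ hght)

# column name must match the predictor in the model formula
newHeights <- data.frame(hght = c(160, 170, 180))

# point predictions with 95% prediction intervals (columns fit, lwr, upr)
pred <- predict(fit, newHeights, interval = "prediction", level = 0.95)
pred

# the same point predictions straight from the fitted equation
b <- coef(fit)
manual <- unname(b[1] + b[2] * newHeights$hght)
manual
```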