Check whether height and weight in the dataset below are strongly correlated and whether the data can be used to fit a regression model.
If so, predict the weight for heights of 160, 170 and 180 cm.
# height in cm
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131, 153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
# weight in kg
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48, 65, 84, 59, 93, 49, 55, 79, 75, 66, 49)
Setup
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.3
data <- data.frame(hght,wght)
head(data)
## hght wght
## 1 151 63
## 2 174 81
## 3 138 56
## 4 186 91
## 5 128 47
## 6 136 57
cor.test(data$hght,data$wght)
##
## Pearson's product-moment correlation
##
## data: data$hght and data$wght
## t = 12.215, df = 18, p-value = 3.788e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8627911 0.9782375
## sample estimates:
## cor
## 0.944644
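The statistics reported by cor.test can be reproduced by hand, which makes it clearer where the t value and the 18 degrees of freedom come from. The following is a minimal sketch that assumes the same hght and wght vectors; the names n, r, t_stat and p_value are illustrative and not part of the original analysis.

```r
# Same data as above, repeated so the sketch runs on its own
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131,
          153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48,
          65, 84, 59, 93, 49, 55, 79, 75, 66, 49)

n <- length(hght)

# Pearson's r: sum of cross-deviations over the root of the product
# of the two sums of squared deviations
dx <- hght - mean(hght)
dy <- wght - mean(wght)
r <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Test statistic for H0: true correlation = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(c(r = r, t = t_stat, p = p_value))
```

These values should agree with the cor.test output above up to rounding.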
### visualising correlation
ggplot(data, aes(x = hght, y = wght)) +
  geom_point(shape = 19, colour = 'red') +
  geom_smooth(method = 'lm', formula = y ~ x)
Observation: weight increases with height. The correlation is strong (r = 0.94, p < 0.001), so the data are suitable for a linear regression model.
Linear Model
x <- data$hght
y <- data$wght
model <- lm(y~x)
summary(model)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.1573 -1.7267 0.7701 2.6045 6.2102
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -33.55669 8.25032 -4.067 0.000723 ***
## x 0.63675 0.05213 12.215 3.79e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.846 on 18 degrees of freedom
## Multiple R-squared: 0.8924, Adjusted R-squared: 0.8864
## F-statistic: 149.2 on 1 and 18 DF, p-value: 3.788e-10
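For a single predictor, the coefficients in the summary follow from closed-form least-squares formulas: the slope is cov(x, y) / var(x), and the intercept is chosen so that the line passes through the point of means. The sketch below assumes the same hght and wght vectors; slope, intercept and r_squared are illustrative names.

```r
# Same data as above, repeated so the sketch runs on its own
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131,
          153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48,
          65, 84, 59, 93, 49, 55, 79, 75, 66, 49)

# Least-squares slope: covariance of x and y divided by variance of x
slope <- cov(hght, wght) / var(hght)

# Intercept: forces the fitted line through (mean(x), mean(y))
intercept <- mean(wght) - slope * mean(hght)

# With one predictor, R-squared equals the squared Pearson correlation
r_squared <- cor(hght, wght)^2

print(c(intercept = intercept, slope = slope, r_squared = r_squared))
```

The slope of about 0.64 means that each additional centimetre of height is associated with roughly 0.64 kg of additional weight in this sample. The negative intercept has no physical meaning, because a height of 0 cm lies far outside the observed range.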
# the model was fitted on a variable named x, so the new data needs a column named x
test <- data.frame(x = c(160, 170, 180))
predictions <- predict(model, test)
print(predictions)
## 1 2 3
## 68.32394 74.69148 81.05902
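The values above are point estimates. With a residual standard error of about 4.8 kg, an individual's weight can differ noticeably from the fitted line, so it may help to report prediction intervals as well. The sketch below is one possible variation, not the original analysis: it fits the model with the column names directly (wght ~ hght), so the new data frame needs no renaming. The names new_heights and pred are illustrative.

```r
# Same data as above, repeated so the sketch runs on its own
hght <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131,
          153, 177, 148, 189, 138, 146, 199, 167, 153, 130)
wght <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48,
          65, 84, 59, 93, 49, 55, 79, 75, 66, 49)
data <- data.frame(hght, wght)

# Fit on the named columns so that newdata can use the same column name
model <- lm(wght ~ hght, data = data)

new_heights <- data.frame(hght = c(160, 170, 180))

# All three heights lie inside the observed range, so this is
# interpolation rather than extrapolation
stopifnot(all(new_heights$hght >= min(hght) & new_heights$hght <= max(hght)))

# fit = point prediction, lwr/upr = 95% prediction interval for a new individual
pred <- predict(model, new_heights, interval = "prediction", level = 0.95)
print(cbind(new_heights, pred))
```

The fit column should match the predictions printed above; lwr and upr bound the range in which a new individual's weight is expected to fall with 95% confidence, under the usual linear-model assumptions.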