child_IQ <- c(87,91,94,98,103,108,111,123)
Mother_IQ <- c(94,96,89,102,98,94,116,117)
print(child_IQ)
## [1] 87 91 94 98 103 108 111 123
print(Mother_IQ)
## [1] 94 96 89 102 98 94 116 117
model_iq <- lm(Mother_IQ ~ child_IQ)
summary(model_iq)
##
## Call:
## lm(formula = Mother_IQ ~ child_IQ)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.965 -4.226 2.223 3.594 8.971
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.6439 22.7449 1.347 0.2265
## child_IQ 0.6882 0.2220 3.101 0.0211 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.965 on 6 degrees of freedom
## Multiple R-squared: 0.6157, Adjusted R-squared: 0.5517
## F-statistic: 9.613 on 1 and 6 DF, p-value: 0.0211
#SSE - sum((actual - predicted)^2)
SSE <- sum(model_iq$residuals^2)
#SSR - sum((predicted - ybar)^2)
ybar <- mean(Mother_IQ)
SSR <- sum((model_iq$fitted.values-ybar)^2)
#SST - sum((y - ybar)^2), which also equals SSE + SSR
SST <- SSE+SSR
SSE = sum((y - yhat)^2), the sum of squared errors or residuals; it assesses how well the regression line fits the data.
SSR = sum((yhat - ybar)^2), the sum of squares due to regression.
SST = sum((y - ybar)^2), the total sum of squares; it assesses how well the mean fits the data, and SST = SSE + SSR.
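As a quick sanity check, using only objects already defined above, SST computed directly from the data matches SSE + SSR:
#Verify the decomposition SST = SSE + SSR
all.equal(sum((Mother_IQ - ybar)^2), SSE + SSR)
## [1] TRUE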
model_iq$residuals
## 1 2 3 4 5 6
## 3.486356 2.733723 -6.330753 3.916614 -3.524178 -10.964970
## 7 8
## 8.970555 1.712654
Residuals, or errors, are the differences between the actual and predicted values of the y variable.
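A one-line check that the stored residuals really are actual minus fitted values:
#Residuals equal actual minus predicted (fitted) values
all.equal(model_iq$residuals, Mother_IQ - model_iq$fitted.values)
## [1] TRUE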
#Residual Standard Error: sqrt(SSE/(n-q))
#n = no. of observations; q = no. of estimated coefficients (intercept and slope)
sqrt(SSE/(8-2))
## [1] 6.965398
The residual standard error estimates the typical size of the residuals, that is, how far the observed values of the dependent variable fall from the regression line on average.
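The manual value can be cross-checked against the residual standard error stored by summary():
#R's stored residual standard error matches the manual calculation
summary(model_iq)$sigma
## [1] 6.965398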
#Coefficient of determination
#multiple R-squared
#SSR/SST or 1-SSE/SST
r.sq <- 1-(SSE/SST)
r.sq
## [1] 0.6157087
SSR/SST
## [1] 0.6157087
r² = SSR/SST. Interpretation of r-squared (here 0.6157): about 61.6% of the variability in mother IQ is explained by its linear relationship with child IQ.
It measures how well the x variable explains the y variable.
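The same quantity is available directly from the fitted model:
#R-squared stored by summary()
summary(model_iq)$r.squared
## [1] 0.6157087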
#Coefficient of Correlation (r):
#r = (sign of b1) * sqrt(r²); here b1 is positive
sqrt(r.sq)
## [1] 0.7846711
Note: r² (multiple R-squared, the coefficient of determination) tells only the percentage of variability explained; it says nothing about the direction of the relationship.
r, the coefficient of correlation, conveys the strength of the relationship along with its direction, positive or negative.
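cor() returns r with its sign attached, so it agrees with sqrt(r.sq) here because the slope is positive:
#Pearson correlation keeps the direction of the relationship
cor(child_IQ, Mother_IQ)
## [1] 0.7846711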
#adj.R-square
#1 - ((SSE/(n-q)) / (SST/(n-1)))
1-((SSE/6)/(SST/7))
## [1] 0.5516602
Adjusted R-squared accounts for the number of predictors in the model: it increases only if a new term improves the model more than would be expected by chance, and decreases when a predictor improves the model by less than expected by chance.
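As with R-squared, the adjusted value can be read straight from the summary:
#Adjusted R-squared stored by summary()
summary(model_iq)$adj.r.squared
## [1] 0.5516602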
summary(model_iq)
##
## Call:
## lm(formula = Mother_IQ ~ child_IQ)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.965 -4.226 2.223 3.594 8.971
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.6439 22.7449 1.347 0.2265
## child_IQ 0.6882 0.2220 3.101 0.0211 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.965 on 6 degrees of freedom
## Multiple R-squared: 0.6157, Adjusted R-squared: 0.5517
## F-statistic: 9.613 on 1 and 6 DF, p-value: 0.0211
H0: b1 = 0; Ha: b1 != 0
t-test statistic: t = (b1 - 0) / SE(b1)
#SE(b1) = sqrt(SSE/(n-q)) / sqrt(sum((x - xbar)^2))
n <- length(Mother_IQ)
q <- length(model_iq$coefficients)
n
## [1] 8
q
## [1] 2
sq.sse <- sqrt(SSE/(n-q))
#sqrt(sum((x-xbar)^2))
x <- child_IQ
xbar <- mean(x)
sq.xxbr <- sqrt(sum((x-xbar)^2))
std.err <- sq.sse/sq.xxbr
std.err
## [1] 0.2219501
#t value for the intercept: b0 / SE(b0)
#t value for the slope: b1 / SE(b1)
b0 <- 30.6438634
b1 <- 0.6881584
b1/std.err
## [1] 3.100509
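As a cross-check, the same standard error and t value are stored in the coefficient table returned by summary():
#Coefficient table columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_tab <- summary(model_iq)$coefficients
coef_tab["child_IQ", "Std. Error"] #matches std.err above (~0.2220)
coef_tab["child_IQ", "t value"] #matches b1/std.err above (~3.1005)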
Check the significance of the model: are the intercept and slope significantly different from zero?
H0: b1 = 0; Ha: b1 != 0
t-score = 3.101; p-value = 0.0211; alpha = 0.05
Draw the t-critical value from a t-distribution table with n - 2 = 8 - 2 = 6 degrees of freedom (the residual degrees of freedom reported in the summary above), alpha = 0.05, two-tailed t-test.
t-critical value: 2.447; t-score: 3.101
The t-score is greater than the t-critical value, hence we reject H0.
The p-value, 0.0211, is also less than the alpha value of 0.05, hence we reject the null hypothesis: child IQ is significantly contributing to the model.
Decision: reject H0, since the p-value is less than alpha. Conclusion: child IQ is a significant predictor of mother IQ.
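Both numbers can be reproduced with R's t-distribution functions (qt() for the critical value, pt() for the p-value):
#two-tailed critical value at alpha = 0.05 with n - 2 = 6 degrees of freedom
qt(0.975, df = 6) #~2.447
#two-tailed p-value for the observed t-score
2 * pt(3.1005, df = 6, lower.tail = FALSE) #~0.0211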
The y-intercept, 30.6439, tells us the predicted mother's IQ when the child's IQ is zero; since no IQ in this data is anywhere near zero, the intercept is an extrapolation with little practical meaning here.
The slope, 0.6882, tells us that a one-unit increase in the x variable (child IQ) is associated with a 0.6882-unit increase in the y variable (predicted mother IQ).
Read the other way along the fitted line, a 0.6882-unit increase in predicted mother IQ corresponds to a one-unit increase in child IQ.
If a child's IQ is 100 units, the predicted value is derived as (100 * b1) + b0 = (100 * 0.6882) + 30.6439 ≈ 99.46.
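The same prediction can be obtained from predict() on the fitted model:
#Predicted mother IQ for a child IQ of 100
predict(model_iq, newdata = data.frame(child_IQ = 100)) #~99.46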
#MAE - mean absolute error: mean(abs(actual - predicted))
#abs() converts negative residuals to positive values before averaging
mean(abs(model_iq$residuals))
## [1] 5.204975
#MSE - mean square error
mean(model_iq$residuals^2)
## [1] 36.38758
#MAPE - Mean absolute percentage error
#(abs(actual-predicted)/actual)
#(abs(actual-predicted)/actual)*100
#mean((abs(actual-predicted)/actual)*100)
mean(abs((Mother_IQ-model_iq$fitted.values)/Mother_IQ)*100)
## [1] 5.245943
#RMSE - root mean square error - sqrt(MSE)
sqrt(mean(model_iq$residuals^2))
## [1] 6.032212
#library(DMwR)
#regr.eval(actual, predicted)
library(DMwR)
## Warning: package 'DMwR' was built under R version 3.5.3
## Loading required package: lattice
## Loading required package: grid
regr.eval(Mother_IQ, model_iq$fitted.values)
## mae mse rmse mape
## 5.20497525 36.38758091 6.03221194 0.05245943
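Note that regr.eval() reports mape as a proportion (0.05245943); multiplying by 100 gives the 5.245943 percent computed manually above.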
Influential observations and outliers can be assessed with the built-in regression diagnostic plots.
Linearity: the residual trend line in the Residuals vs Fitted plot should be close to zero.
plot(model_iq,1)
The above graph shows the residual trend line is not close to zero: a violation of the linearity assumption.
plot(model_iq,2)
The Normal Q-Q plot above shows the errors are approximately normally distributed.
plot(model_iq,3)
The Scale-Location plot suggests the error variance is not constant (heteroscedasticity).
plot(model_iq,4)
The Cook's distance plot flags the most influential observations.
plot(model_iq)
Calling plot() on the model cycles through the default diagnostic plots one at a time.
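To see the four default diagnostic plots in a single panel, a standard base-R idiom is:
#Arrange the default diagnostic plots in a 2x2 grid
par(mfrow = c(2, 2))
plot(model_iq)
par(mfrow = c(1, 1)) #restore the single-plot layout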