The dataset records the last 10 instances of rewards paid during the festive season. We would like to establish a regression model for predicting the reward amount in the upcoming festival, starting with Simple Linear Regression.
dim(Festival)
## [1] 10 4
str(Festival)
## 'data.frame': 10 obs. of 4 variables:
## $ Instance: int 1 2 3 4 5 6 7 8 9 10
## $ Years : int 4 3 1 3 3 2 6 5 1 3
## $ Salary : int 1700 5400 3200 4400 4950 2550 3600 6000 4500 5200
## $ Amount : int 250 850 550 400 700 250 600 900 450 650
summary(Festival)
##     Instance         Years          Salary         Amount
##  Min.   : 1.00   Min.   :1.00   Min.   :1700   Min.   :250.0
##  1st Qu.: 3.25   1st Qu.:2.25   1st Qu.:3300   1st Qu.:412.5
##  Median : 5.50   Median :3.00   Median :4450   Median :575.0
##  Mean   : 5.50   Mean   :3.10   Mean   :4150   Mean   :560.0
##  3rd Qu.: 7.75   3rd Qu.:3.75   3rd Qu.:5138   3rd Qu.:687.5
##  Max.   :10.00   Max.   :6.00   Max.   :6000   Max.   :900.0
attach(Festival)
With only the reward amounts available, the best single prediction of the reward is the mean of the rewards for the last 10 instances. The mean reward amount is 560 rupees.
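To make this baseline concrete, the short sketch below re-enters the ten reward amounts from the str(Festival) output so that it runs on its own. The lowercase names amount, baseline and sst are illustrative and are not part of the original session. It computes the mean-only prediction and the total squared error that prediction leaves behind, which is the yardstick a regression model has to beat.

```r
# Reward amounts re-entered from the str(Festival) output (assumed identical to Festival$Amount)
amount <- c(250, 850, 550, 400, 700, 250, 600, 900, 450, 650)

# Mean-only model: predict the same value for every instance
baseline <- mean(amount)

# Total sum of squares: squared error left over by the mean-only model
sst <- sum((amount - baseline)^2)

baseline  # 560
sst       # 459000
```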
par(mfrow =c(2,2))
boxplot(Salary, horizontal = TRUE, main="Boxplot of Salary")
hist(Salary)
boxplot(Amount, horizontal = TRUE, main="Boxplot of Amount")
hist(Amount)
par(mfrow=c(1,1))
plot(Salary, Amount, xlab="Salary", ylab="Amount", col="Red")
cor.test(Salary, Amount)
##
## Pearson's product-moment correlation
##
## data: Salary and Amount
## t = 4.6391, df = 8, p-value = 0.001668
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4848411 0.9647888
## sample estimates:
## cor
## 0.8538222
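As a rough check on where the test statistic comes from, the sketch below recomputes it from the correlation itself using t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. The data are re-entered from the str(Festival) output, and the lowercase names (salary, amount, t_stat, p_val) are illustrative only.

```r
# Values re-entered from the str(Festival) output so the snippet is self-contained
salary <- c(1700, 5400, 3200, 4400, 4950, 2550, 3600, 6000, 4500, 5200)
amount <- c(250, 850, 550, 400, 700, 250, 600, 900, 450, 650)

n <- length(salary)
r <- cor(salary, amount)                    # Pearson correlation coefficient
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)   # test statistic with n - 2 degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value

round(c(r = r, t = t_stat, p = p_val), 4)
```

The rounded values should agree with the cor, t and p-value lines printed by cor.test.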
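The correlation between salary and reward amount is strong and positive (r = 0.854, p = 0.0017), so we will now predict the festival reward amount (y variable) using salary (x variable).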
fit1 <- lm(Amount~Salary, data=Festival)
summary(fit1)
##
## Call:
## lm(formula = Amount ~ Salary, data = Festival)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -195.41  -77.22   31.85  104.21  124.56
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -27.79227  132.69660  -0.209  0.83934
## Salary        0.14164    0.03053   4.639  0.00167 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 124.7 on 8 degrees of freedom
## Multiple R-squared: 0.729, Adjusted R-squared: 0.6951
## F-statistic: 21.52 on 1 and 8 DF, p-value: 0.001668
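The coefficient of determination (R-squared) is 0.729, meaning that 72.9% of the variation in the reward amount is explained by the independent variable Salary. The slope of 0.14164 means that each additional rupee of salary is associated with about 0.14 rupees more reward.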
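The slope, intercept and R-squared reported by summary(fit1) can be reproduced from the closed-form least-squares formulas. The sketch below does so on data re-entered from the str(Festival) output. The names b0, b1, sse, sst and r_squared are illustrative and do not appear in the original session.

```r
# Values re-entered from the str(Festival) output so the snippet is self-contained
salary <- c(1700, 5400, 3200, 4400, 4950, 2550, 3600, 6000, 4500, 5200)
amount <- c(250, 850, 550, 400, 700, 250, 600, 900, 450, 650)

# Least-squares slope and intercept from the closed-form formulas
b1 <- sum((salary - mean(salary)) * (amount - mean(amount))) /
  sum((salary - mean(salary))^2)
b0 <- mean(amount) - b1 * mean(salary)

# R-squared = 1 - (residual sum of squares) / (total sum of squares)
fitted_amount <- b0 + b1 * salary
sse <- sum((amount - fitted_amount)^2)
sst <- sum((amount - mean(amount))^2)
r_squared <- 1 - sse / sst

round(c(intercept = b0, slope = b1, r_squared = r_squared), 4)
```

In simple linear regression, r_squared is also the square of the Pearson correlation between the two variables.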
Predicted <- predict(fit1, newdata = Festival)
Predicted
##        1        2        3        4        5        6        7        8
## 212.9901 737.0459 425.4451 595.4092 673.3094 333.3813 482.0998 822.0279
##        9       10
## 609.5728 708.7185
Pred_Actual <- data.frame(Actual = Festival$Amount, Predicted)
Pred_Actual
##    Actual Predicted
## 1     250  212.9901
## 2     850  737.0459
## 3     550  425.4451
## 4     400  595.4092
## 5     700  673.3094
## 6     250  333.3813
## 7     600  482.0998
## 8     900  822.0279
## 9     450  609.5728
## 10    650  708.7185
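plot(Predicted, type="b", col="Red", ylim=range(Amount, Predicted), xlab="Instance", ylab="Reward amount")
lines(Amount, col="Blue")
legend("topleft", legend=c("Predicted", "Actual"), col=c("Red", "Blue"), lty=1)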
The plot above compares the predicted (red) and actual (blue) reward amounts for each of the 10 instances.
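The gap between the two lines can also be summarised numerically. The sketch below is an illustrative addition, not part of the original analysis. It refits the same salary-only model on re-entered data and reports the root mean squared error and mean absolute error of the in-sample predictions. The names fit, res, rmse and mae are assumptions made for this example.

```r
# Values re-entered from the str(Festival) output so the snippet is self-contained
salary <- c(1700, 5400, 3200, 4400, 4950, 2550, 3600, 6000, 4500, 5200)
amount <- c(250, 850, 550, 400, 700, 250, 600, 900, 450, 650)

fit <- lm(amount ~ salary)       # same model form as fit1
res <- amount - fitted(fit)      # actual minus predicted, per instance

rmse <- sqrt(mean(res^2))        # root mean squared error (divides by n)
mae  <- mean(abs(res))           # mean absolute error

round(c(rmse = rmse, mae = mae), 2)
```

Because these errors are measured on the same 10 instances used to fit the model, they are likely to understate the error on a new festival.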
Multiple Linear Regression
plot(Festival)
fit2 <- lm(Amount ~ Years, data = Festival)
summary(fit2)
##
## Call:
## lm(formula = Amount ~ Years, data = Festival)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -350.87 -139.52   35.37  132.04  294.54
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   419.21     163.55   2.563   0.0335 *
## Years          45.41      47.41   0.958   0.3661
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 226.9 on 8 degrees of freedom
## Multiple R-squared: 0.1029, Adjusted R-squared: -0.009237
## F-statistic: 0.9176 on 1 and 8 DF, p-value: 0.3661
The second x variable, number of years of service, does not contribute significantly to the variation in reward amount, as seen from the SLR model above (p = 0.3661). Its R-squared is low (0.1029), and the adjusted R-squared is slightly negative.
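A negative adjusted R-squared can look odd, so the sketch below shows how it arises from the formula 1 - (1 - R^2)(n - 1)/(n - p - 1), where p is the number of predictors. The data are re-entered from the str(Festival) output, and the names fit_years, r2 and adj_r2 are illustrative.

```r
# Values re-entered from the str(Festival) output so the snippet is self-contained
years  <- c(4, 3, 1, 3, 3, 2, 6, 5, 1, 3)
amount <- c(250, 850, 550, 400, 700, 250, 600, 900, 450, 650)

fit_years <- lm(amount ~ years)          # same model form as fit2
r2 <- summary(fit_years)$r.squared

n <- length(amount)  # number of observations
p <- 1               # number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

round(c(r_squared = r2, adjusted = adj_r2), 4)
```

With so small an R-squared, the penalty factor (n - 1)/(n - p - 1) = 9/8 pushes the adjusted value just below zero, which indicates that Years alone does no better than predicting the mean.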
fit3 <- lm(Amount ~ Salary+Years, data=Festival)
summary(fit3)
##
## Call:
## lm(formula = Amount ~ Salary + Years, data = Festival)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -191.19  -55.02   11.78   31.84  185.49
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -105.47879  143.81006  -0.733  0.48711
## Salary         0.13717    0.02988   4.591  0.00251 **
## Years         31.03888   25.49884   1.217  0.26294
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 121.1 on 7 degrees of freedom
## Multiple R-squared: 0.7764, Adjusted R-squared: 0.7125
## F-statistic: 12.15 on 2 and 7 DF, p-value: 0.00529
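The coefficient for years of service is b2 = 31.03888. Holding salary constant, each additional year of service increases the predicted reward amount by about 31.04 rupees. This coefficient is not statistically significant (p = 0.263), and adding Years raises the adjusted R-squared only from 0.6951 to 0.7125, so Salary remains the main predictor.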
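One way to check whether Years is worth keeping, and to use the model for an upcoming festival, is sketched below. It compares the salary-only and two-predictor models with a partial F test and then predicts the reward for one employee. The data frame festival is re-entered from the str(Festival) output, and the employee in new_emp (salary 5000, 4 years of service) is purely hypothetical and chosen only for illustration.

```r
# Data re-entered from the str(Festival) output so the snippet is self-contained
festival <- data.frame(
  years  = c(4, 3, 1, 3, 3, 2, 6, 5, 1, 3),
  salary = c(1700, 5400, 3200, 4400, 4950, 2550, 3600, 6000, 4500, 5200),
  amount = c(250, 850, 550, 400, 700, 250, 600, 900, 450, 650)
)

fit_salary <- lm(amount ~ salary, data = festival)          # same form as fit1
fit_both   <- lm(amount ~ salary + years, data = festival)  # same form as fit3

# Partial F test: does adding years improve on the salary-only model?
cmp <- anova(fit_salary, fit_both)
cmp

# Hypothetical employee, for illustration only
new_emp <- data.frame(salary = 5000, years = 4)
pred <- predict(fit_both, newdata = new_emp, interval = "prediction")
pred
```

Because only one term is added, the p-value of the F test equals the p-value of the Years coefficient in the two-predictor summary. With 10 observations, the prediction interval will be wide, so the point prediction should be read as a rough guide.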