In the QQ plot the points follow a curve rather than the straight reference line, which indicates that AmountSpent is not normally distributed. Because the curve opens upward, the distribution is right skewed. The histogram confirms a strong right skew: most of the observations fall on the left side. Finally, the Shapiro-Wilk normality test gives a p-value below 2.2e-16, which is highly significant, so we reject the null hypothesis of normality. Overall, AmountSpent is not normally distributed.
> with(Marketing, qqPlot(AmountSpent, dist="norm", id=list(method="y", n=2, labels=rownames(Marketing))))
[1] 988 497
> with(Marketing, Hist(AmountSpent, scale="frequency", breaks="Sturges", col="darkgray"))
> normalityTest(~AmountSpent, test="shapiro.test", data=Marketing)
Shapiro-Wilk normality test
data: AmountSpent
W = 0.8784, p-value < 2.2e-16
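The decision rule behind the Shapiro-Wilk output can be sketched as follows. This is only an illustration: the data are simulated (not the Marketing data set), the helper name is_normal is hypothetical, and the 0.05 significance level is an assumed conventional cutoff.

```r
# Illustrative only: simulated right-skewed data, NOT the Marketing data set.
set.seed(1)
skewed <- rexp(1000, rate = 1 / 1200)

# Hypothetical helper: TRUE when the test does not reject normality at level alpha.
is_normal <- function(x, alpha = 0.05) {
  result <- shapiro.test(x)  # null hypothesis: x comes from a normal distribution
  result$p.value >= alpha    # a small p-value means normality is rejected
}

is_normal(skewed)
```

A p-value below the chosen level, as in the output above, means the hypothesis of normality is rejected.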
The QQ plot for Salary is still curved, but much more of the data lies within the 95% confidence envelope. The plot also has a long tail. The histogram shows a slight right skew. Finally, the Shapiro-Wilk normality test gives a p-value of 3.763e-15, which is highly significant, so we reject the null hypothesis of normality and conclude that Salary is not normally distributed.
> with(Marketing, qqPlot(Salary, dist="norm", id=list(method="y", n=2, labels=rownames(Marketing))))
[1] 929 535
> with(Marketing, Hist(Salary, scale="frequency", breaks="Sturges", col="darkgray"))
> normalityTest(~Salary, test="shapiro.test", data=Marketing)
Shapiro-Wilk normality test
data: Salary
W = 0.96338, p-value = 3.763e-15
> cor(Marketing[,c("AmountSpent","Catalogs","Children","Salary")], use="complete")
            AmountSpent   Catalogs    Children     Salary
AmountSpent   1.0000000  0.4726499 -0.22230817 0.69959571
Catalogs      0.4726499  1.0000000 -0.11345543 0.18355086
Children     -0.2223082 -0.1134554  1.00000000 0.04966316
Salary        0.6995957  0.1835509  0.04966316 1.00000000
> scatterplotMatrix(~AmountSpent+Catalogs+Children+Salary, regLine=FALSE, smooth=FALSE, diagonal=list(method="density"), data=Marketing)
> RegModel.1 <- lm(AmountSpent~Salary, data=Marketing)
> summary(RegModel.1)
Call:
lm(formula = AmountSpent ~ Salary, data = Marketing)
Residuals:
Min 1Q Median 3Q Max
-2179.7 -315.2 -53.5 279.7 3752.9
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -15.31783   45.37416  -0.338    0.736
Salary        0.02196    0.00071  30.930   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 687.1 on 998 degrees of freedom
Multiple R-squared: 0.4894, Adjusted R-squared: 0.4889
F-statistic: 956.7 on 1 and 998 DF, p-value: < 2.2e-16
> RegModel.5 <- lm(Catalogs~AmountSpent, data=Marketing)
> summary(RegModel.5)
Call:
lm(formula = Catalogs ~ AmountSpent, data = Marketing)
Residuals:
Min 1Q Median 3Q Max
-14.6592 -5.3263 -0.3718 4.5121 12.6460
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.072e+01  2.980e-01   35.97   <2e-16 ***
AmountSpent 3.257e-03  1.922e-04   16.94   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.839 on 998 degrees of freedom
Multiple R-squared: 0.2234, Adjusted R-squared: 0.2226
F-statistic: 287.1 on 1 and 998 DF, p-value: < 2.2e-16
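As a quick consistency check between the correlation matrix and the two regressions: in a regression with a single predictor, the Multiple R-squared equals the squared correlation between the two variables. The sketch below uses only the correlations printed by cor() above; the variable names are chosen here for illustration.

```r
# Correlations copied from the cor() output in this report.
r_salary   <- 0.69959571  # AmountSpent vs Salary
r_catalogs <- 0.4726499   # AmountSpent vs Catalogs

# With one predictor, R-squared is the square of the correlation.
r2_salary   <- r_salary^2
r2_catalogs <- r_catalogs^2

round(c(Salary = r2_salary, Catalogs = r2_catalogs), 4)
```

The rounded values, 0.4894 and 0.2234, match the Multiple R-squared reported for RegModel.1 and RegModel.5.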
I would recommend using the linear regression model of AmountSpent on Salary. It explains about 49% of the variation in the amount spent (R-squared = 0.4894), compared with about 22% for the model relating Catalogs and AmountSpent. This model can help the marketer estimate how much each person would spend based on their salary, and so target the people who are likely to spend the most. It also allows the marketer to advertise the right products to each demographic: someone with a higher salary is more likely to be able to purchase a $500 gaming system than someone with a lower salary.
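To illustrate how the recommended model could be used, the sketch below applies the coefficients reported by summary(RegModel.1). The helper name predict_spend and the example salaries are assumptions made up for illustration; they do not come from the Marketing data.

```r
# Coefficients copied from summary(RegModel.1) in this report.
intercept <- -15.31783
slope     <- 0.02196

# Hypothetical helper: predicted AmountSpent for a given Salary.
predict_spend <- function(salary) {
  intercept + slope * salary
}

# Example salaries are invented for illustration only.
predict_spend(c(30000, 50000, 100000))
```

With the fitted model object still in the session, predict(RegModel.1, newdata = data.frame(Salary = 50000)) should give the same kind of estimate. The residual standard error of 687.1 suggests that predictions for an individual customer are only rough.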