Email : juliansalomo2@gmail.com
RPubs : https://rpubs.com/juliansalomo/
Department : Business Statistics
Address : ARA Center, Matana University Tower
Jl. CBD Barat Kav, RT.1, Curug Sangereng, Kelapa Dua, Tangerang, Banten 15810.
Every organization should consider how each dollar spent on marketing affects its sales. A fiscally prudent organization uses its relatively scarce resources wisely. Thus, all organizations need to ask themselves, “Is the money I’m spending worth the return on sales?” Furthermore, organizations can delve deeper by asking, “For every dollar spent on marketing, how much are we getting in return on sales?” One can answer these questions using a simple linear regression model. As always, we will use a fabricated example to examine a store’s marketing efforts and their impact on sales. This will also be a more comprehensive primer on the simple linear regression model, the model that the majority of econometrics students are first exposed to.
We use the data below:
## Pr(>F)
## youtube <2e-16 ***
## facebook <2e-16 ***
## newspaper 0.8599
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The standard significance level in research is 5%. In the result above, the p-values for youtube and facebook are below 5%, while the p-value for newspaper (0.8599) is above 5%. This means there is a statistically significant relationship between sales and both youtube and facebook, but no significant relationship between sales and newspaper.
The previous result showed no significant relationship between sales and newspaper; in other words, that relationship is weak at best. To measure the strength of each relationship, we can compute the correlation between each medium and sales using the cor() function.
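The table above is the kind of output that anova() prints for a fitted linear model. A minimal sketch of how such a table can be produced is shown here. It assumes a data frame with the columns youtube, facebook, newspaper and sales; the data are simulated stand-ins rather than the data used in this article, and the names `sim`, `fit` and `tab` are hypothetical.

```r
# Simulated stand-in data (hypothetical; NOT the article's data set)
set.seed(1)
n <- 200
sim <- data.frame(
  youtube   = runif(n, 0, 350),
  facebook  = runif(n, 0, 60),
  newspaper = runif(n, 0, 130)
)
# In this simulation, sales depend on youtube and facebook only, plus noise
sim$sales <- 3.5 + 0.045 * sim$youtube + 0.19 * sim$facebook + rnorm(n, sd = 2)

# Fit the linear model and test each term with a sequential F-test
fit <- lm(sales ~ youtube + facebook + newspaper, data = sim)
tab <- anova(fit)
print(tab)

# The p-values live in the "Pr(>F)" column; compare them with alpha = 0.05
p_values <- tab[c("youtube", "facebook", "newspaper"), "Pr(>F)"]
names(p_values) <- c("youtube", "facebook", "newspaper")
print(p_values < 0.05)
```

A term whose entry is TRUE would be called significant at the 5% level under this rule.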
## youtube facebook newspaper
## 0.7822244 0.5762226 0.2282990
Now we will interpret the result above using the following rule-of-thumb scale for the correlation coefficient \(r\):
* \(|r| = 1\) perfect relationship.
* \(0.8 \le |r| < 1\) very strong relationship.
* \(0.6 \le |r| <0.8\) strong relationship.
* \(0.4 \le |r| <0.6\) moderate relationship.
* \(0.2 \le |r| <0.4\) weak relationship.
* \(0 < |r| < 0.2\) very weak relationship.
* \(|r|=0\) no relationship.
From the criteria above, we can conclude that youtube has a strong relationship with sales, facebook a moderate one, and newspaper a weak one.
As mentioned in the first point, youtube and facebook contribute to sales, while newspaper does not. To find the medium that contributes the most, we can look at the regression coefficients: each coefficient is the change in sales associated with one extra unit of budget on that medium, so the largest coefficient marks the largest contribution per unit spent.
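The scale above can be applied mechanically. The following is a small sketch, where the helper name `strength` is hypothetical and the three correlations are the values reported earlier:

```r
# Map |r| to the descriptive labels of the scale above
strength <- function(r) {
  a <- abs(r)
  ifelse(a == 1, "perfect",
  ifelse(a >= 0.8, "very strong",
  ifelse(a >= 0.6, "strong",
  ifelse(a >= 0.4, "moderate",
  ifelse(a >= 0.2, "weak",
  ifelse(a > 0, "very weak", "none"))))))
}

# Correlations with sales, as reported above
r <- c(youtube = 0.7822244, facebook = 0.5762226, newspaper = 0.2282990)
print(strength(r))
```

Because the function uses the absolute value, negative correlations are graded by the same thresholds.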
## (Intercept) youtube facebook newspaper
## 3.526667243 0.045764645 0.188530017 -0.001037493
So, from the result we can see that facebook contributes the most per unit of budget, followed by youtube. Unlike the others, newspaper has a slightly negative coefficient, but since its effect is not statistically significant, it is best read as having practically no effect on sales.
To see how well the advertising media estimate sales, we look at the coefficient of determination (\(R^2\)).
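One way to read the coefficients is to plug them into the fitted equation and see how the prediction moves. The sketch below uses the coefficients reported above; the budget values (100, 20, 30) are hypothetical and are assumed to be in the same units as the data, and the helper name `predict_sales` is also hypothetical.

```r
# Coefficients as reported above
b <- c(intercept = 3.526667243, youtube = 0.045764645,
       facebook = 0.188530017, newspaper = -0.001037493)

# Predicted sales for a given budget on each medium
predict_sales <- function(youtube, facebook, newspaper) {
  unname(b["intercept"] + b["youtube"] * youtube +
         b["facebook"] * facebook + b["newspaper"] * newspaper)
}

base <- predict_sales(100, 20, 30)  # hypothetical budgets

# Change in predicted sales from one extra unit on each medium in turn
gain <- c(youtube   = predict_sales(101, 20, 30) - base,
          facebook  = predict_sales(100, 21, 30) - base,
          newspaper = predict_sales(100, 20, 31) - base)
print(gain)
```

Each gain equals the corresponding coefficient, which is why the coefficients can be compared directly when all budgets are measured in the same unit.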
## [1] 0.8956373
So, the coefficient of determination is \(R^2=0.8956373\), which means the advertising budgets explain about 89.6% of the variation in sales.
To check the accuracy, we compute the Mean Absolute Percentage Error (MAPE), the average of the absolute percentage errors over all \(n\) observations: \[\text{MAPE}=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|\times 100\%\] where \(y_i\) is the actual sales value and \(\hat{y}_i\) is the predicted one.
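The coefficient of determination can also be computed by hand as \(R^2 = 1 - SS_{res}/SS_{tot}\), which makes the "share of variation explained" reading concrete. Below is a minimal sketch on simulated stand-in data (not the article's data; the names `sim` and `fit` are hypothetical), checked against the value that summary() reports:

```r
# Simulated stand-in data (hypothetical; NOT the article's data set)
set.seed(1)
n <- 200
sim <- data.frame(
  youtube   = runif(n, 0, 350),
  facebook  = runif(n, 0, 60),
  newspaper = runif(n, 0, 130)
)
sim$sales <- 3.5 + 0.045 * sim$youtube + 0.19 * sim$facebook + rnorm(n, sd = 2)
fit <- lm(sales ~ youtube + facebook + newspaper, data = sim)

# Residual and total sums of squares
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((sim$sales - mean(sim$sales))^2)

# R^2 by hand versus the value stored in the model summary
r2_manual  <- 1 - ss_res / ss_tot
r2_summary <- summary(fit)$r.squared
print(c(manual = r2_manual, summary = r2_summary))
```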
library(magrittr)  # provides the %>% pipe used below
# Extract the fitted coefficients and the matching predictor columns
b0 <- coef(linmod)[[1]]
b1 <- coef(linmod)[[2]]
x1 <- marketing$youtube
b2 <- coef(linmod)[[3]]
x2 <- marketing$facebook
b3 <- coef(linmod)[[4]]
x3 <- marketing$newspaper
# Predicted sales, absolute error and percentage error for each observation
marketing$sales_pred <- (b0 +b1*x1 +b2*x2 +b3*x3) %>% round(2)
marketing$error <- abs(marketing$sales-marketing$sales_pred) %>% round(2)
marketing$`%error` <- round(marketing$error/marketing$sales*100,2)
# MAPE is the mean of the percentage errors
total_percentage_error <- sum(marketing$`%error`)
MAPE <- total_percentage_error/nrow(marketing)
paste(MAPE,"%")
## [1] "13.87845 %"
So, the mean absolute percentage error is 13.87845%, which means the prediction accuracy is about \(100\% - \text{MAPE}\).
## [1] "The accuracy is 86.12155 %"
So, the accuracy for predicting future sales is around 86.12%.
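The same calculation can be wrapped in a small reusable function. This is a sketch only: the helper name `mape` and the three-value example are hypothetical. With a fitted model, the predicted values could come from fitted() or predict() instead; because those are unrounded, the result may differ slightly from the rounded calculation above.

```r
# Mean absolute percentage error, in percent
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual)) * 100
}

# Tiny hypothetical example: percentage errors of 10%, 10% and 0%
actual    <- c(10, 20, 40)
predicted <- c(11, 18, 40)

m <- mape(actual, predicted)
print(m)        # MAPE in percent
print(100 - m)  # accuracy in percent, read as 100% - MAPE
```

Note that MAPE is undefined when an actual value is zero, since the error is divided by the actual value.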
Apart from that, the accuracy can also be seen through the comparison plot as shown below.
library(ggplot2)
# Actual and predicted sales for the first 50 observations
df <- data.frame(obs = 1:50,
                 actual = marketing$sales[1:50],
                 predicted = marketing$sales_pred[1:50])
ggplot(df, aes(x = obs)) +
  geom_line(aes(y = predicted, colour = "Predicted")) +
  geom_point(aes(y = predicted, colour = "Predicted")) +
  geom_line(aes(y = actual, colour = "Actual")) +
  geom_point(aes(y = actual, colour = "Actual")) +
  scale_colour_manual("", values = c(Predicted = "red", Actual = "blue")) +
  labs(x = "Observation Number", y = "Sales Value")
From this plot, it can be seen that the predicted values are close to the actual values, which indicates that our model has fairly high accuracy.
A strong linear relationship among the advertising media leads to one of the linear regression problems, called multicollinearity. To detect whether this problem is present, we check the VIF (variance inflation factor) values of the linear model using the vif() function provided by the car package.
## youtube facebook newspaper
## 1.004611 1.144952 1.145187
From the result above, all of the VIF values are less than 5, so we conclude that there is no multicollinearity, that is, no strong relationship among the advertising media.
We can also check the correlation between each pair of variables using the cor() function.
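To see what a VIF value measures, it can be computed from its textbook definition, \(VIF_j = 1/(1-R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on the other predictors. The sketch below uses base R only and simulated stand-in data (not the article's data); the names `sim` and `vif_manual` are hypothetical.

```r
# Simulated stand-in predictors (hypothetical; NOT the article's data set)
set.seed(1)
n <- 200
sim <- data.frame(
  youtube  = runif(n, 0, 350),
  facebook = runif(n, 0, 60)
)
# newspaper is made mildly correlated with facebook on purpose
sim$newspaper <- 0.8 * sim$facebook + runif(n, 0, 100)

predictors <- c("youtube", "facebook", "newspaper")

# Regress each predictor on the others and convert R^2 into a VIF
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = sim))$r.squared
  1 / (1 - r2)
})
print(vif_manual)
```

A VIF of 1 means the predictor is uncorrelated with the others; the value grows as the predictor becomes easier to reconstruct from the rest.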
## youtube facebook newspaper sales
## youtube 1.00000000 0.05480866 0.05664787 0.7822244
## facebook 0.05480866 1.00000000 0.35410375 0.5762226
## newspaper 0.05664787 0.35410375 1.00000000 0.2282990
## sales 0.78222442 0.57622257 0.22829903 1.0000000
From the result above, all of the correlations among the three media are less than 0.8 (the largest is about 0.35, between facebook and newspaper), which supports the conclusion that there is no strong relationship among the advertising media.
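The check against the 0.8 threshold can be done programmatically rather than by eye. The sketch below re-enters the media-by-media part of the correlation matrix reported above; the variable names are hypothetical.

```r
# Correlations among the three media, as reported above
media <- c("youtube", "facebook", "newspaper")
cm <- matrix(c(1.00000000, 0.05480866, 0.05664787,
               0.05480866, 1.00000000, 0.35410375,
               0.05664787, 0.35410375, 1.00000000),
             nrow = 3, byrow = TRUE, dimnames = list(media, media))

# Keep each pair once by taking the entries above the diagonal
off_diag <- cm[upper.tri(cm)]
print(max(abs(off_diag)))         # largest pairwise correlation
print(any(abs(off_diag) >= 0.8))  # TRUE would flag a strongly related pair

# Locate the most correlated pair
idx <- which(abs(cm) == max(abs(off_diag)) & upper.tri(cm), arr.ind = TRUE)
print(c(rownames(cm)[idx[1, "row"]], colnames(cm)[idx[1, "col"]]))
```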