Eliminating Variables: Lifetime Post Total Impressions by People Who’ve Liked Your Page vs. Lifetime Post Total Impressions
Our first task was to sort through the data and analyse the variables. We wanted to remove variables that are highly correlated with each other to avoid redundancy in our evaluation.
plot(FacebookData$Lifetime.Post.Impressions.by.people.who.have.liked.your.Page,FacebookData$Lifetime.Post.Total.Impressions,
xlab = "Impressions by People Who Have Liked Your Page", ylab = "Lifetime Total Impressions",
main = "Total Impression Correlation")
From the graph, one can tell that these variables have a strong positive correlation with each other. Since both variables measure nearly the same thing and have similar titles, we decided to keep only one of them for our data analysis.
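The visual impression from a scatterplot can be backed up with a correlation coefficient. The sketch below is only an illustration: it uses simulated stand-in vectors (`liked_impressions` and `total_impressions` are hypothetical names, not columns of FacebookData) and computes both the Pearson and the Spearman correlation with base R's `cor()`.

```r
# Hypothetical stand-in data; the real values live in FacebookData.
set.seed(1)
liked_impressions <- rexp(200, rate = 1 / 5000)
total_impressions <- liked_impressions * 1.8 + rnorm(200, sd = 500)

# Pearson correlation measures linear association on a -1 to 1 scale.
r_pearson <- cor(liked_impressions, total_impressions)

# Spearman correlation is rank-based, so a few extreme posts influence it less.
r_spearman <- cor(liked_impressions, total_impressions, method = "spearman")

print(c(pearson = r_pearson, spearman = r_spearman))
```

On the real data, the same two calls could be applied to the two impression columns. If the columns contain missing values, `use = "complete.obs"` would be needed.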
Eliminating Variables Continued: Lifetime Total Reach by People Who’ve Liked Your Page vs. Lifetime Total Reach
plot(FacebookData$Lifetime.Post.reach.by.people.who.like.your.Page,FacebookData$Lifetime.Post.Total.Reach,
xlab = "Lifetime Total Reach to People Who Have Liked Your Page", ylab = "Lifetime Total Reach",
main = "Total Reach Correlation")
Just like before, these two variables have a strong positive correlation and are very similar. This led us to remove one of the variables from our analysis.
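Checking pairs one plot at a time works for a handful of variables. With more columns, a correlation matrix can flag candidate pairs in one pass. The sketch below assumes a made-up data frame `demo` and an arbitrary cutoff of 0.9. Neither comes from the original analysis.

```r
# Hypothetical data frame with one nearly redundant pair of columns.
set.seed(2)
reach_liked <- rexp(200, rate = 1 / 3000)
demo <- data.frame(
  reach_liked = reach_liked,
  reach_total = reach_liked * 2 + rnorm(200, sd = 300),
  comments    = rpois(200, lambda = 5)
)

# Absolute pairwise correlations; blank out the lower triangle and diagonal
# so that each pair of columns is considered only once.
cm <- abs(cor(demo))
cm[lower.tri(cm, diag = TRUE)] <- NA

# Positions where the correlation exceeds the (assumed) 0.9 cutoff.
high <- which(cm > 0.9, arr.ind = TRUE)

# Report each flagged pair by column name.
pairs_flagged <- data.frame(
  var1 = rownames(cm)[high[, "row"]],
  var2 = colnames(cm)[high[, "col"]],
  r    = cm[high]
)
print(pairs_flagged)
```

Which member of a flagged pair to drop remains a judgment call. Here, the more general "total" variable is the natural one to keep.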
Basic Comparison of Paid vs. Not Paid
To continue our exploratory analysis, we compared the type of ad (Paid or Not Paid) against several other variables to get an idea of how they relate.
boxplot(FinalData$PageTotalLikes~FinalData$Paid,
xlab = "Paid", ylab= "PageTotalLikes", main = "Ad Data",
col = c("red", "blue"))
boxplot(FinalData$LifetimeEngagedUsers~FinalData$Paid,
xlab = "Paid", ylab= "Lifetime Engaged Users", main = "Ad Data",
col = c("red", "blue"))
boxplot(FinalData$TotalInteractions~FinalData$Paid,
xlab = "Paid", ylab= "Total Interactions", main = "Ad Data",
col = c("red", "blue"))
The purpose of creating these boxplots was to give us a rough idea of how much influence a paid ad had on certain variables. The three plots above compare what we determined were the most important variables (Page Total Likes, Lifetime Engaged Users, and Total Interactions). Simply put, the plots show no clear difference in any of these variables between paid and unpaid ads. Each pair of boxes has very similar outliers, interquartile ranges, and medians, so the categorical variable appears to have little effect on the important quantitative variables.
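Boxplots give a visual comparison only. A two-sample test is one way to put a number on the difference between groups. The following is a minimal sketch on simulated data (the data frame `demo` and its values are invented for illustration), using a Welch t-test and a Wilcoxon rank-sum test from base R.

```r
# Hypothetical stand-in for FinalData: 0 = not paid, 1 = paid.
set.seed(3)
demo <- data.frame(
  Paid = rep(c(0, 1), times = c(360, 140)),
  TotalInteractions = rexp(500, rate = 1 / 200)
)

# Welch two-sample t-test: compares group means without assuming equal variances.
tt <- t.test(TotalInteractions ~ Paid, data = demo)

# Wilcoxon rank-sum test: rank-based, so it is less affected by the long
# right tail that interaction counts typically have.
wt <- wilcox.test(TotalInteractions ~ Paid, data = demo)

print(c(t_test_p = tt$p.value, wilcoxon_p = wt$p.value))
```

A large p-value from either test would agree with the visual impression that the two groups overlap heavily.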
Regression of Variables:
Our next analysis was to fit simple linear regressions and evaluate the R-squared value of each linear relationship between Paid and the other variables in the data.
Influence that a Paid Ad has on the number of Page Total Likes
results1<-lm(PageTotalLikes~Paid, FinalData)
summary(results1)
##
## Call:
## lm(formula = PageTotalLikes ~ Paid, data = FinalData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -41791 -10530 6439 13232 16280
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 123161.3 857.4 143.648 <2e-16 ***
## Paid 396.5 1633.0 0.243 0.808
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16270 on 495 degrees of freedom
## Multiple R-squared: 0.0001191, Adjusted R-squared: -0.001901
## F-statistic: 0.05895 on 1 and 495 DF, p-value: 0.8083
qqnorm(results1$residuals, ylab="Residuals", main="Accuracy of Model")
qqline(results1$residuals)
As you can see in the summary, the R-squared value (0.0001191) shows a very poor relationship between Page Total Likes and a paid ad, and the Paid coefficient is not statistically significant (p = 0.808). This agrees with our earlier boxplot analysis: in this data, a paid ad is not associated with a higher number of page likes.
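When the only predictor is a 0/1 indicator, as Paid appears to be here, the fitted slope is simply the difference between the two group means, and the intercept is the mean of the group coded 0. Read that way, the estimate of 396.5 in the summary is the gap in mean Page Total Likes between paid and unpaid posts. The sketch below checks this identity on simulated data. The data frame `demo` is hypothetical.

```r
# Hypothetical stand-in for FinalData with a 0/1 Paid indicator.
set.seed(4)
demo <- data.frame(
  Paid = rep(c(0, 1), times = c(360, 140)),
  PageTotalLikes = rnorm(500, mean = 123000, sd = 16000)
)

fit <- lm(PageTotalLikes ~ Paid, data = demo)

# Mean of the response within each group.
group_means <- tapply(demo$PageTotalLikes, demo$Paid, mean)
mean_diff <- unname(group_means["1"] - group_means["0"])

# The regression slope on a 0/1 predictor equals that difference.
slope <- unname(coef(fit)["Paid"])

print(c(slope = slope, mean_diff = mean_diff))
```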
Influence that a Paid Ad has on the number of Total Interactions
results2<-lm(TotalInteractions~Paid, FinalData)
summary(results2)
##
## Call:
## lm(formula = TotalInteractions ~ Paid, data = FinalData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -234.39 -127.39 -71.49 31.61 1942.61
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 186.49 13.89 13.430 <2e-16 ***
## Paid 47.90 26.45 1.811 0.0707 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 263.5 on 495 degrees of freedom
## Multiple R-squared: 0.006583, Adjusted R-squared: 0.004577
## F-statistic: 3.28 on 1 and 495 DF, p-value: 0.07072
qqnorm(results2$residuals, ylab="Residuals", main="Accuracy of Model")
qqline(results2$residuals)
We continued our regression analysis by fitting Total Interactions against Paid. Once again, the R-squared value was extremely low (0.006583), showing almost no linear relationship between the variable and a paid ad. The Paid coefficient (47.90) is not significant at the 5% level (p = 0.0707), although it is borderline at the 10% level. Like our first regression, this result supports our position that there is no statistically significant difference between an ad that was paid for and one that was not.
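In a regression with a single predictor, R-squared is the square of the Pearson correlation between the predictor and the response. An R-squared of 0.006583 therefore corresponds to a correlation of roughly 0.08 in absolute value. The sketch below verifies the identity on simulated data. The data frame `demo` and its coefficients are made up for illustration.

```r
# Hypothetical stand-in for FinalData.
set.seed(5)
demo <- data.frame(Paid = rep(c(0, 1), times = c(360, 140)))
demo$TotalInteractions <- 186 + 48 * demo$Paid + rexp(500, rate = 1 / 250)

fit <- lm(TotalInteractions ~ Paid, data = demo)

# R-squared as reported by summary().
r2 <- summary(fit)$r.squared

# Pearson correlation between the predictor and the response.
r <- cor(demo$Paid, demo$TotalInteractions)

print(c(r_squared = r2, r_times_r = r^2))
```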
Naive Bayes Model: Can the model correctly determine the type of ad?
For our predictive analysis, we wanted to create a model that would help us evaluate our main business problem by predicting whether an ad was paid or not from the values of the influencing variables.
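library(e1071)  # provides naiveBayes()
FinalData <- read.table("FinalData.csv", header = TRUE, sep = ",")
# Train on rows 1 to 498; note that this range includes row 20, the test row below
traindata <- as.data.frame(FinalData[1:498,])
# Use row 20 as the single test observation
testdata <- as.data.frame(FinalData[20,])
traindata
testdata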
# Class priors: counts per class, then converted to proportions
tprior <- table(traindata$Paid)
tprior
tprior <- tprior/sum(tprior)
tprior
# Conditional probabilities: share of each observed PageTotalLikes value within each class
PageTotalLikesCounts <- table(traindata[,c("Paid1", "PageTotalLikes")])
PageTotalLikesCounts
PageTotalLikesCounts <- PageTotalLikesCounts/rowSums(PageTotalLikesCounts)
PageTotalLikesCounts
LifetimePostTotalImpressionsCounts <- table(traindata[,c("Paid1", "LifetimePostTotalImpressions")])
LifetimePostTotalImpressionsCounts <- LifetimePostTotalImpressionsCounts/rowSums(LifetimePostTotalImpressionsCounts)
LifetimePostTotalImpressionsCounts
TotalInteractionsCounts <- table(traindata[,c("Paid1", "TotalInteractions")])
TotalInteractionsCounts <- TotalInteractionsCounts/rowSums(TotalInteractionsCounts)
TotalInteractionsCounts
# Unnormalised posterior score for the Paid class:
# product of the conditional probabilities and the class prior
prob_Paid <-
PageTotalLikesCounts["Paid",as.character(testdata[,c("PageTotalLikes")])]*
LifetimePostTotalImpressionsCounts["Paid",as.character(testdata[,c("LifetimePostTotalImpressions")])]*
TotalInteractionsCounts["Paid",as.character(testdata[,c("TotalInteractions")])]*
tprior["Paid"]
# Same score for the NotPaid class
prob_NotPaid <-
PageTotalLikesCounts["NotPaid",as.character(testdata[,c("PageTotalLikes")])]*
LifetimePostTotalImpressionsCounts["NotPaid",as.character(testdata[,c("LifetimePostTotalImpressions")])]*
TotalInteractionsCounts["NotPaid",as.character(testdata[,c("TotalInteractions")])]*
tprior["NotPaid"]
# Fit the naive Bayes classifier with e1071 and predict the class of the test row
model <- naiveBayes(Paid1 ~ PageTotalLikes+LifetimePostTotalImpressions+
TotalInteractions,traindata)
model
results <- predict(model,testdata)
results
## [1] NotPaid
## Levels: NotPaid Paid
Row20<- FinalData[20,c("Paid1", "Paid")]
Row20
## Paid1 Paid
## 20 Paid 1
As shown above, the model incorrectly predicted Row 20 of our data as being a Not Paid ad, when in fact it was paid for. This is consistent with our earlier finding that the data does not differ much between a paid ad and one that is not paid for. The model may have predicted incorrectly because the predictor variables vary so little between the two categories.
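A single test row gives only one data point about the model's quality. Holding out a larger set of rows and tabulating predictions against the true labels gives an accuracy figure that can be compared with always guessing the majority class. The sketch below does this in base R on simulated data. It hand-rolls a Gaussian naive Bayes, which is how e1071's `naiveBayes()` is documented to treat numeric predictors. The data frame `demo`, the 80/20 split, and the choice of two predictors are all assumptions made for illustration.

```r
# Hypothetical stand-in for FinalData; the predictors here are pure noise.
set.seed(6)
n <- 500
demo <- data.frame(
  Paid1 = factor(sample(c("NotPaid", "Paid"), n, replace = TRUE,
                        prob = c(0.72, 0.28))),
  PageTotalLikes = rnorm(n, mean = 123000, sd = 16000),
  TotalInteractions = rexp(n, rate = 1 / 200)
)

# Hold out 100 rows for testing so no test row is seen during training.
test_idx <- sample(n, size = 100)
train <- demo[-test_idx, ]
test  <- demo[test_idx, ]

predictors <- c("PageTotalLikes", "TotalInteractions")
classes <- levels(train$Paid1)

# Log posterior score (up to a constant) of every test row under each class:
# log prior plus the sum of Gaussian log densities of the predictors.
log_post <- sapply(classes, function(cl) {
  rows <- train[train$Paid1 == cl, ]
  lp <- log(nrow(rows) / nrow(train))
  for (p in predictors) {
    lp <- lp + dnorm(test[[p]], mean = mean(rows[[p]]),
                     sd = sd(rows[[p]]), log = TRUE)
  }
  lp
})

# Predict the class with the highest score for each test row.
pred <- factor(classes[max.col(log_post, ties.method = "first")],
               levels = classes)

# Confusion matrix, accuracy, and the majority-class baseline.
conf_mat <- table(predicted = pred, actual = test$Paid1)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
baseline <- max(table(test$Paid1)) / nrow(test)

print(conf_mat)
print(c(accuracy = accuracy, baseline = baseline))
```

If the accuracy on held-out rows is no better than the baseline, that supports the view that the predictors carry little information about whether a post was paid.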
Conclusion
Our overall conclusion from the data given is that there is no measurable benefit to a paid ad. We found no positive effect on page likes, engaged users, total interactions, or the other variables in the data set. This conclusion comes as a surprise because it does not follow the advertising industry standard, where paid ads usually do give an advantage to those who want to monetize their advertisements.