Facebook Data: Is There a Benefit to a Paid Ad?

By Rachel Parker and Nick Esposito

Eliminating Variables: Lifetime Post Total Impressions by People Who’ve Liked Your Page vs. Lifetime Post Total Impressions

Our first task was to sort through the data and analyse the variables. We wanted to remove similar variables that are highly correlated with each other to avoid redundancy in our evaluation.

plot(FacebookData$Lifetime.Post.Impressions.by.people.who.have.liked.your.Page,FacebookData$Lifetime.Post.Total.Impressions,
     xlab = "Impressions by People Who Have Liked Your Page", ylab = "Lifetime Total Impressions", 
     main = "Total Impression Correlation")

From the graph, one can tell that these variables have a strong positive correlation with each other. Since both variables measure nearly the same thing and carry largely redundant information, we decided to keep only one of them for our data analysis.
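
The visual impression from a scatterplot can also be checked numerically with a correlation coefficient. The following is a minimal sketch on simulated data rather than the actual FacebookData columns; the names `total_impressions` and `fan_impressions` and the 0.9 cutoff are assumptions made for illustration only.

```r
# Simulated stand-ins for two overlapping impression counts (not the real data)
set.seed(1)
total_impressions <- rexp(200, rate = 1 / 10000)
fan_impressions   <- 0.6 * total_impressions + rnorm(200, sd = 500)

# Pearson correlation coefficient between the two variables
r <- cor(fan_impressions, total_impressions)

# Hypothetical rule of thumb: treat the pair as redundant above 0.9
redundant <- abs(r) > 0.9

print(r)
print(redundant)
```

A coefficient close to 1 supports keeping only one variable of the pair.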

Eliminating Variables Continued: Lifetime Total Reach by People Who’ve Liked Your Page vs. Lifetime Total Reach

plot(FacebookData$Lifetime.Post.reach.by.people.who.like.your.Page,FacebookData$Lifetime.Post.Total.Reach,
     xlab = "Lifetime Total Reach to People Who Have Liked Your Page", ylab = "Lifetime Total Reach", 
     main = "Total Reach Correlation")

Just like before, these two variables have a strong positive correlation and are very similar. This led us to remove one of the variables from our analysis.

Basic Comparison of Paid vs. Not Paid

To continue our exploratory analysis, we set out to get an idea of how the type of ad (Paid or Not Paid) relates to several other variables.

boxplot(FinalData$PageTotalLikes~FinalData$Paid,
        xlab = "Paid", ylab= "PageTotalLikes", main = "Ad Data", 
        col = c("red", "blue"))

boxplot(FinalData$LifetimeEngagedUsers~FinalData$Paid,
        xlab = "Paid", ylab= "Lifetime Engaged Users", main = "Ad Data", 
        col = c("red", "blue"))

boxplot(FinalData$TotalInteractions~FinalData$Paid,
        xlab = "Paid", ylab= "Total Interactions", main = "Ad Data", 
        col = c("red", "blue"))

The purpose of creating these boxplots was to give us a rough idea of how much influence a paid ad had on certain variables. The three plots above compare what we determined were the most important variables (Page Total Likes, Lifetime Engaged Users, and Total Interactions). Simply put, the plots show no substantial difference in any of these variables between paid and unpaid ads. Each pair of boxes has very similar medians, interquartile ranges, and outliers, indicating that the categorical variable has little visible effect on the important quantitative variables.
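
Boxplots give only a visual comparison. One way to back them up, sketched here on simulated data, is to tabulate the group medians and run a two-sample test. The data frame `ads` and its columns are hypothetical, and the Wilcoxon rank-sum test is an illustrative choice for skewed counts, not a step taken in the analysis itself.

```r
# Simulated ad data: a paid/unpaid label and a skewed interaction count
set.seed(2)
ads <- data.frame(
  paid = factor(rbinom(300, 1, 0.3), levels = c(0, 1),
                labels = c("NotPaid", "Paid")),
  interactions = rexp(300, rate = 1 / 200)
)

# Median interactions per group, the same quantity a boxplot draws as its centre line
medians <- aggregate(interactions ~ paid, data = ads, FUN = median)
print(medians)

# Wilcoxon rank-sum test: do the two groups differ in location?
rank_test <- wilcox.test(interactions ~ paid, data = ads)
print(rank_test$p.value)
```

A large p-value here would agree with boxes that overlap almost completely.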

Regression of Variables:

Our next analysis of the data was to evaluate the R squared values of the linear relationship between variables in the data.

Influence that a Paid Ad has on the number of Page Total Likes

results1<-lm(PageTotalLikes~Paid, FinalData)
summary(results1)

## 
## Call:
## lm(formula = PageTotalLikes ~ Paid, data = FinalData)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -41791 -10530   6439  13232  16280 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 123161.3      857.4 143.648   <2e-16 ***
## Paid           396.5     1633.0   0.243    0.808    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16270 on 495 degrees of freedom
## Multiple R-squared:  0.0001191,  Adjusted R-squared:  -0.001901 
## F-statistic: 0.05895 on 1 and 495 DF,  p-value: 0.8083

# Normal Q-Q plot of the residuals: checks the normality assumption of the model
qqnorm(results1$residuals, ylab="Residuals", main="Accuracy of Model")
qqline(results1$residuals)

As you can see in the summary, the R squared value (about 0.0001) shows essentially no linear relationship between Page Total Likes and a paid ad, and the Paid coefficient is not statistically significant (p = 0.808). This agrees with our exploratory analysis: a paid ad is not associated with a higher number of likes for your page.
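
With a single 0/1 predictor such as Paid, the fitted slope is simply the difference between the two group means, and its t test matches a pooled-variance two-sample t test. A small sketch on simulated data (the vectors `paid` and `likes` are made up for illustration) shows the equivalence:

```r
# Simulated data: 70 unpaid and 30 paid posts with no true difference in likes
set.seed(42)
paid  <- rep(c(0, 1), times = c(70, 30))
likes <- rnorm(100, mean = 120000, sd = 16000)

fit <- lm(likes ~ paid)

# The slope of a regression on a 0/1 predictor is the difference in group means
slope     <- unname(coef(fit)["paid"])
mean_diff <- mean(likes[paid == 1]) - mean(likes[paid == 0])

# Its p-value matches a two-sample t test with pooled variance
p_lm <- summary(fit)$coefficients["paid", "Pr(>|t|)"]
p_tt <- t.test(likes[paid == 1], likes[paid == 0], var.equal = TRUE)$p.value

print(all.equal(slope, mean_diff))
print(all.equal(p_lm, p_tt))
```

Both comparisons should report TRUE, which is why the regression and the boxplots tell the same story.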

Influence that a Paid Ad has on the number of Total Interactions

results2<-lm(TotalInteractions~Paid, FinalData)
summary(results2)

## 
## Call:
## lm(formula = TotalInteractions ~ Paid, data = FinalData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -234.39 -127.39  -71.49   31.61 1942.61 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   186.49      13.89  13.430   <2e-16 ***
## Paid           47.90      26.45   1.811   0.0707 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 263.5 on 495 degrees of freedom
## Multiple R-squared:  0.006583,   Adjusted R-squared:  0.004577 
## F-statistic:  3.28 on 1 and 495 DF,  p-value: 0.07072

# Normal Q-Q plot of the residuals: checks the normality assumption of the model
qqnorm(results2$residuals, ylab="Residuals", main="Accuracy of Model")
qqline(results2$residuals)

We continued our regression analysis with Total Interactions as the response. Once again, the R squared value was extremely low (about 0.007), showing almost no linear relationship between the variable and a paid ad. Paid ads averaged roughly 48 more interactions, but that difference is not significant at the 5% level (p = 0.071). Like our first regression, this result is consistent with our position that there is no statistically significant difference between an ad that was paid for and one that was not.
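
The figures in the Total Interactions summary are linked by simple arithmetic, which is a useful check when reading regression output. The sketch below recomputes them from the rounded values printed in that summary, so small rounding differences are expected.

```r
# Values copied from the Total Interactions regression summary
estimate  <- 47.90   # Paid coefficient
std_error <- 26.45   # its standard error
df_resid  <- 495     # residual degrees of freedom

# t value is the estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# With a single predictor, the F-statistic equals the squared t value
f_value <- t_value^2

print(round(c(t = t_value, p = p_value, F = f_value), 4))
```

The results should agree with the reported t value of 1.811, p-value of 0.0707, and F-statistic of 3.28 up to rounding.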

Naive Bayes Model: Can the model correctly determine the type of ad?

For our predictive analysis, we wanted to create a model that would help us evaluate our main business problem by predicting whether an ad was paid or not from the values of the influencing variables.

library(e1071)

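# Load the cleaned data set
FinalData <- read.table("FinalData.csv", header = TRUE, sep = ",")
# Rows used to fit the model (note that this range also contains row 20, the test row)
traindata <- as.data.frame(FinalData[1:498,])
# Single row used to test the prediction
testdata <- as.data.frame(FinalData[20,])
traindata
testdata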

# Class priors; use Paid1 (the "Paid"/"NotPaid" labels) so the names match the lookups below
tprior <- table(traindata$Paid1)
tprior
tprior <- tprior/sum(tprior)
tprior

PageTotalLikesCounts <- table(traindata[,c("Paid1", "PageTotalLikes")])
PageTotalLikesCounts

PageTotalLikesCounts <- PageTotalLikesCounts/rowSums(PageTotalLikesCounts)
PageTotalLikesCounts

LifetimePostTotalImpressionsCounts <- table(traindata[,c("Paid1", "LifetimePostTotalImpressions")])
LifetimePostTotalImpressionsCounts <- LifetimePostTotalImpressionsCounts/rowSums(LifetimePostTotalImpressionsCounts)
LifetimePostTotalImpressionsCounts

TotalInteractionsCounts <- table(traindata[,c("Paid1", "TotalInteractions")])
TotalInteractionsCounts <- TotalInteractionsCounts/rowSums(TotalInteractionsCounts)
TotalInteractionsCounts

prob_Paid <-
  PageTotalLikesCounts["Paid",as.character(testdata[,c("PageTotalLikes")])]*
  LifetimePostTotalImpressionsCounts["Paid",as.character(testdata[,c("LifetimePostTotalImpressions")])]*
  TotalInteractionsCounts["Paid",as.character(testdata[,c("TotalInteractions")])]*
  tprior["Paid"]

prob_NotPaid <-
  PageTotalLikesCounts["NotPaid",as.character(testdata[,c("PageTotalLikes")])]*
  LifetimePostTotalImpressionsCounts["NotPaid",as.character(testdata[,c("LifetimePostTotalImpressions")])]*
  TotalInteractionsCounts["NotPaid",as.character(testdata[,c("TotalInteractions")])]*
  tprior["NotPaid"]

model <- naiveBayes(Paid1 ~  PageTotalLikes+LifetimePostTotalImpressions+
                      TotalInteractions,traindata)

model

results <- predict(model,testdata)
results

## [1] NotPaid
## Levels: NotPaid Paid

Row20<- FinalData[20,c("Paid1", "Paid")]
Row20

##    Paid1 Paid
## 20  Paid    1

As shown above, the model incorrectly predicted Row 20 of our data as being a Not Paid ad, when in fact it was paid for. A single test row is limited evidence, but the result is consistent with our earlier finding that the data does not differ much between a paid ad and one that is not paid for. The incorrect prediction may stem from the predictors varying so little between the two categories.
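
A fuller check of a classifier holds out many rows and tabulates the predictions against the true labels. The sketch below does this on simulated data with a hand-written Gaussian naive Bayes in base R, which mirrors how e1071's naiveBayes treats numeric predictors. The data, the 300/100 split, and every name other than Paid1, PageTotalLikes, and TotalInteractions are assumptions for illustration.

```r
# Simulated data: the label is generated independently of the predictors
set.seed(7)
n <- 400
dat <- data.frame(
  Paid1 = factor(sample(c("NotPaid", "Paid"), n, replace = TRUE,
                        prob = c(0.7, 0.3))),
  TotalInteractions = rexp(n, rate = 1 / 200),
  PageTotalLikes    = rnorm(n, mean = 123000, sd = 16000)
)

# Hold out 100 rows that the model never sees during fitting
test_idx  <- sample(n, 100)
train_set <- dat[-test_idx, ]
test_set  <- dat[test_idx, ]

predictors <- c("TotalInteractions", "PageTotalLikes")
classes    <- levels(train_set$Paid1)

# Log prior probability of each class
prior     <- prop.table(table(train_set$Paid1))
log_prior <- setNames(log(as.numeric(prior)), names(prior))

# Log posterior (up to a constant): log prior plus per-predictor Gaussian log densities
log_post <- sapply(classes, function(k) {
  rows <- train_set$Paid1 == k
  ll <- sapply(predictors, function(p) {
    dnorm(test_set[[p]], mean = mean(train_set[[p]][rows]),
          sd = sd(train_set[[p]][rows]), log = TRUE)
  })
  log_prior[[k]] + rowSums(ll)
})

# Predict the class with the largest posterior and compare with the truth
pred      <- factor(classes[max.col(log_post, ties.method = "first")],
                    levels = classes)
confusion <- table(predicted = pred, actual = test_set$Paid1)
accuracy  <- mean(pred == test_set$Paid1)
baseline  <- max(prop.table(table(test_set$Paid1)))  # always guess the majority class

print(confusion)
print(c(accuracy = accuracy, baseline = baseline))
```

Because the simulated predictors carry no information about the label, the accuracy here should land near the majority-class baseline. On real data, the same comparison against that baseline indicates whether the predictors carry any signal.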

Conclusion

Our overall conclusion from the data given is that there is no measurable benefit to a paid ad. We found no clear positive effect on page likes, engaged users, total interactions, or the other variables in the data set. This conclusion comes as a surprise because it runs against the usual expectation in the advertising industry, where paid ads generally give an advantage to those who want to monetize their advertisements.

April 10, 2020
