Marketing Analysis using R

Introduction

Facebook is one of the biggest online advertisers’ platforms that helps advertisers track the success of their campaigns and also helps add valuable information such as demographic information.

advertisers can create Facebook campaign targeted towards specific Audience. Targeting allows advertisers to tailor content appropriately to that specific audience.

This project will focus on analyzing from a business perspective the different campaigns and ads that were created in Facebook and their specific targets that give answers towards the success of the different ads generated.

It is very important to understand the different campaign objectives of a company as well as its key performance indicators (KPIs) so this project will explore this dataset that contains the following elements:

ad_id
xyz_campaign_id
fb_campaign_id
age
gender
Interest ID
Impressions
Clicks
Spent
Total_Conversion
Approved_Conversion

Dataset Cleaning

After understanding the scope of the project, it is important to clean the dataset and identify the different ways the information can be used.

#Lets import our libraries
library(dplyr) #Used to manipulate data in R
library(readr) #Used to reAd csv files
library(DataExplorer) #Used To explore data
library(DT) #Used to display tables in Rmarkdown
library(ggplot2) #Used to display different plots and graphs
library(heatmaply) #Used for Heatmaps
library(kableExtra) #Used for tables

#Lets bring our data
data <- read.csv("KAG__data.csv", stringsAsFactors = FALSE)

#Lets drop fb_campaign_id
data<- data %>% select(-fb_campaign_id)

#Lets Rename xyz_campaign_id
data<- data %>% rename(campaign_id=xyz_campaign_id)

#Lets Rename Interest
data<- data %>% rename(Interest_ID = interest)

#Look for missing data
plot_missing(data)

There is no data missing from the dataset, so the project can continue without an issue.

Creating additional features

While the dataset includes relevant information there are some useful marketing metrics missing that can be added to drive more useful information from this dataset.

Click-through-rate (CTR): Refers to percentage of impressions that became clicks. This metric helps understanding the success of an Ad and how engaging they are.
Cost Per Click (CPC): Denotes how much did each click cost. This is useful for advertisers who have a set budget and do not want to overspent.
Cost Per Conversion: Shows how much it costs to get a customer to convert. This gives an overview to get a feel of the campaign effectiveness.
Cost Per Thousand/Mille (CPM): Defines the price of 1,000 advertisement impression on one ad. It is useful for campaigns that are focused on brand awareness.

All these metrics are very useful in advertising campaigns as they may factor into the overall return on investment (ROI).

#Lets calculate the different variables for our calculations
data <- data %>% mutate(CTR = round(((Clicks / Impressions) * 100),2), 
                        CPC = ifelse(Clicks != 0, round(Spent / Clicks,2), Spent), 
                        CostPerConv_Total = ifelse(Total_Conversion !=0,round(Spent/Total_Conversion,2),Spent),
                        CostPerConv_Approved = ifelse(Approved_Conversion !=0,round(Spent/Approved_Conversion,2),Spent),
                        CPM = round((Spent / Impressions) * 1000, 2) )

#Lets remove some columns to see only columns that analyzes clicks and Impressions' metrics
desc_data <- data %>% select(-Total_Conversion, -Approved_Conversion, -CostPerConv_Total, -CostPerConv_Approved) 

#Descriptive Statistics of this dataset
DT::datatable(head(desc_data, 25),
              rownames = FALSE,
              options = list(
                pageLength = 10))

Facebook ads Performance Analysis

There are more than 1,000 ads in the dataset, so the project will focused on understanding which ads have performed well and which did not.

Below we will explore the top 10 ads by number of impressions

#Top 10 ad_ids by number of impressions
Top_10 <- data[order(data$Impressions,decreasing = TRUE),] %>% select(ad_id, CPC, CPM, Impressions) %>% head(10)

DT::datatable(head(Top_10, 10),
              rownames = FALSE,
              options = list(
                pageLength = 10))

Below we can identify the bottom 10 ads by number of impressions

#Bottom 10 ad_ids by number of impressions
Bottom_10 <- data[order(data$Impressions,decreasing = FALSE),] %>% select(ad_id, CPC, CPM, Impressions) 

DT::datatable(head(Bottom_10, 10),
              rownames = FALSE,
              options = list(
                pageLength = 10))

Below we can identify the Ad with the least CPC that leAd to the most number impressions and the Ad that led to the least number impressions.

#Ad with least CPC that leAd to most impressions
data %>% 
  filter(CPC == min(CPC)) %>% 
  filter(Impressions == max(Impressions)) %>% 
  select(ad_id, CPC, Impressions, Clicks) %>%
  kable() %>%
  kable_styling(c("striped", "bordered"))

ad_id	CPC	Impressions	Clicks
1121094	0	24362	0

#Ad with highest CPC that leAd to least impressions
data %>% 
  filter(CPC == max(CPC)) %>%
  filter(Impressions == min(Impressions)) %>% 
  select(ad_id, CPC, Impressions, Clicks) %>%
  kable() %>%
  kable_styling(c("striped", "bordered"))

ad_id	CPC	Impressions	Clicks
1121223	2.21	64020	5

Finally we can search for the campaigns that spent most and least efficiently on brand awareness

#What campaign spent least efficiently on brand awareness on an average
data %>% 
  filter(CPM == max(CPM)) %>% 
  filter(Impressions == min(Impressions)) %>% 
  select(campaign_id, CPM, Impressions, Clicks) %>%
  kable() %>%
  kable_styling(c("striped", "bordered"))

campaign_id	CPM	Impressions	Clicks
936	1.5	944	1

#What campaign spent most efficiently on brand awareness on an average
data %>% filter(CPM == min(CPM)) %>% 
  filter(Impressions == max(Impressions)) %>% 
  select(campaign_id, CPM, Impressions, Clicks) %>%
  kable() %>%
  kable_styling(c("striped", "bordered"))

campaign_id	CPM	Impressions	Clicks
1178	0	24362	0

Return on advertising Spent

Another useful information that many advertisers are interested in knowing is the Return on advertising Spent (ROAS) which helps looking at the revenue as a percentage of the advertising spent.

The following formula will be used for ROAS:

ROAS = Revenue / Cost = (5* Total_Conversion + 50* Approved_Conversion) / Cost

#Lets calculate ROAS
data_new <-
  data %>%
  mutate(totConv = Total_Conversion + Approved_Conversion, conVal = Total_Conversion * 5, appConVal = Approved_Conversion *50) %>%
  mutate(totConVal = conVal + appConVal) %>%
  mutate(ROAS = round(totConVal / Spent, 2)) %>%
  filter(ROAS != 'Inf') %>%
  filter(ROAS != 'NaN') 

# Lets build a Boxplot based on gender
data_new %>%
  ggplot(aes(x = as.factor(campaign_id), y = ROAS, fill = as.factor(gender))) + 
  geom_boxplot() + 
  scale_y_log10() + 
  ggtitle("ROAS Boxplot") +
  theme(plot.title = element_text(hjust = 0.5))+
  theme(
plot.title = element_text(color="dark green", size=14, face="bold.italic"),
axis.title.x = element_text(color="blue", size=14, face="bold"),
axis.title.y = element_text(color="dark red", size=14, face="bold"))+
  labs(x = 'Campaign ID', y = 'ROAS') +
  scale_fill_discrete(name="Gender")

# ROAS by gender for the different campaigns
data_new %>%
  select(campaign_id, gender, ROAS) %>%
  group_by(campaign_id, gender) %>%
  summarise(median.ROAS = median(ROAS), mean.ROAS = mean(ROAS))%>%
  kable() %>%
  kable_styling(c("striped", "bordered"))

campaign_id	gender	median.ROAS	mean.ROAS
916	F	4.100	10.914615
916	M	5.420	13.318182
936	F	2.360	8.556011
936	M	5.205	16.441091
1178	F	0.730	1.580846
1178	M	1.190	2.549003

Given the finding above, it can be assumed that it would be optimal to focus on displaying these Facebook ads in males as this audience drives a higher ROAS for all campaigns.

This analysis is still very premature in the interpretation of this project, so a deeper analysis would need to be performed before jumping towards that conclusion.

Correlation Analysis

#Lets create a trimmed dataset 
dataTrim <- data_new %>%
  select(CTR, CPC, CostPerConv_Approved, CostPerConv_Total, CPM, ROAS, Impressions, Spent, Clicks)

#After omitting missing values, and normalize data, lets calculate correlations and plot a heatmap
heatmaply_cor(cor(normalize(na.omit(dataTrim))))

From the heatmap above it is possible to interpret that there is a positive correlation between the amount spent and the number of Impressions and clicks gotten from an ad. The more it is spent the more clicks and conversion seem to follow.

#Lets do a boxplot to determine the campaign where the most money was spent
ggplot(data, aes(as.factor(campaign_id), Spent)) +
  geom_boxplot(fill = "#086CA2", color = "black") + 
  scale_y_log10() +
  ggtitle("Spent Boxplot by Campaign ID") +
  theme(plot.title = element_text(hjust = 0.5))+
  theme(
    panel.border = element_blank(),
    panel.background = element_rect(colour = "black"),
plot.title = element_text(color="dark green", size=14, face="bold.italic"),
axis.title.x = element_text(color="blue", size=14, face="bold"),
axis.title.y = element_text(color="dark red", size=14, face="bold"))+
  labs(x = "Campaign ID", y = "Advertising Spend")

From the boxplot above we can easily observe that campaign 1178 spent the most in advertisement. We will now focus on exploring impactful insights about this campaign ID.

Campaign 1178 Analysis

For any advertising analysis it is very important to first understand the campaign objectives and goals before making decisions with the insights driven by the data analysis provided.

As there is not a specific context for this dataset this project will assume that this is an e-commerce business that is focused on maximizing revenue.

The overall objective of this campaign will be to find out whether spending more money will drive higher conversion and a higher ROAS on the different advertisements.

#Lets filter our Campaign ID '1178'
data_1178 <- data_new %>% 
  filter(campaign_id == 1178) %>%
  select(-campaign_id)

#Lets look at the distribution of this data
plot_bar(data_1178,ggtheme = theme_classic())

plot_histogram(data_1178,ggtheme = theme_classic())

#Lets also see the correlation matrix for this campaign
plot_correlation(data_1178)

#Lets now see an overview of the success of conversion and ROAS in relation to the amount spent.

ggplot(data_1178, aes(Spent, ROAS), ggtheme = theme_classic()) + 
  geom_point() + 
  geom_smooth(method = "lm") +
  scale_y_log10() +
  labs(x = "Amount spent on campaign", y = "ROAS")

ggplot(data_1178, aes(Spent, CostPerConv_Total), ggtheme = theme_light()) +
  geom_point() + 
  scale_y_log10() +
  geom_smooth(method = "lm") +
  labs(x = "Amount spent on campaign", y = "Conversion Value")

By looking into the different graphs above we get a better understanding of the different variables involved in this campaign. This is helpful in the project as it gives an overview of the appropriate analysis that would need to be made for this campaign.

It can be interpreted that the more amount spent on this campaign the more conversion it would follow. It is noticeable that spending more on this campaign can have a suboptimal effect on the ROAS. Which means that the more is spent on an Ad the lesser the Return on advertising spent.

ROAS by Gender

#Lets build a boxplot with gender and ROAS
ggplot(data_1178, aes(gender, ROAS, fill =gender)) + 
  geom_boxplot() + 
  scale_y_log10()+
ggtitle("ROAS Boxplot") +
  theme(plot.title = element_text(hjust = 0.5))+
  theme(
plot.title = element_text(color="dark green", size=14, face="bold.italic"),
axis.title.x = element_text(color="dark blue", size=14, face="bold"),
axis.title.y = element_text(color="dark red", size=14, face="bold")) +
  labs(x = 'Gender', y = 'ROAS') +
  scale_fill_discrete(name="Gender")

#Now lets see which gender has a higher ROAS mean
data_1178 %>%
  select(gender, ROAS) %>%
  group_by(gender) %>%
  summarise(median.ROAS = median(ROAS), mean.ROAS = mean(ROAS))%>%
  kable() %>%
  kable_styling(c("striped", "bordered"))

gender	median.ROAS	mean.ROAS
F	0.73	1.580846
M	1.19	2.549003

The ROAS for campaign ID 1178 is higher for males than it is for females. The ROAS differences in gender are noticeable, thus it could have a big impact on the success of a marketing campaign profitability.

Analysis by Interest

After seeing the differences between gender, it is possible to distinguish that there are specific targets in the dataset that could be further explored in order to recommend better outcomes in the campaign.

#Lets take a look at the amount of clicks per Interest ID
ggplot(data_1178, aes(Interest_ID,Clicks))  +
  geom_bar(stat = 'identity') +
  ggtitle("ROAS by Interest ID") +
  theme(plot.title = element_text(hjust = 0.5))+
  theme(
panel.background = element_rect(fill = "#BFD5E3", colour = "#6D9EC1"),
plot.title = element_text(color="dark green", size=14, face="bold.italic"),
axis.title.x = element_text(color="dark blue", size=14, face="bold"),
axis.title.y = element_text(color="dark red", size=14, face="bold")) +
  labs(x = 'Interest ID', y = 'ROAS')

#Lets take a look at the amount of ROAS per Interest ID
ggplot(data_1178, aes(Interest_ID,ROAS))  +
  geom_bar(stat = 'identity') +
  ggtitle("ROAS by Interest ID") +
  theme(plot.title = element_text(hjust = 0.5))+
  theme(
panel.background = element_rect(fill = "#BFD5E3", colour = "#6D9EC1"),
plot.title = element_text(color="dark green", size=14, face="bold.italic"),
axis.title.x = element_text(color="dark blue", size=14, face="bold"),
axis.title.y = element_text(color="dark red", size=14, face="bold")) +
  labs(x = 'Interest ID', y = 'ROAS')

#Lets further explore and look at our best performers by ROAS
Interest_1178 <- data_1178 %>%
  select(Interest_ID, ROAS, Clicks) %>%
  group_by(Interest_ID) %>%
  summarise(median.ROAS = median(ROAS),
            mean.ROAS = round(mean(ROAS),2), 
            clicks = sum(Clicks)) %>%
  arrange(desc(mean.ROAS))
  
#Lets display our top 10 Interest ID by Mean ROAS
  DT::datatable(head(Interest_1178, 10),
              rownames = FALSE,
              options = list(
                pageLength = 10))

From the table above we can see that Interest IDs 104 & 101 have the strongest ROAS mean with a good number of clicks.

This project will also further explore Interest ID 15 as it brought 1,554 clicks and has a decent mean of ROAS.

Analysis by Gender

Having identified the Interest IDs with the strongest ROAS and the strongest clicks the project will focus on breaking apart this data into gender to see the differences between males and females audiences for the different advertisements.

#Lets build a boxplot for these interest ID and split by gender
data_1178 %>%
  filter(Interest_ID == 104 | Interest_ID == 101 | Interest_ID == 15) %>%
  ggplot(aes(x = as.factor(Interest_ID), y = ROAS, fill = gender)) + 
  geom_boxplot() + 
  scale_y_log10() +
  ggtitle("ROAS Boxplot by Gender") +
  theme(plot.title = element_text(hjust = 0.5))+
  theme(
plot.title = element_text(color="dark green", size=14, face="bold.italic"),
axis.title.x = element_text(color="dark blue", size=14, face="bold"),
axis.title.y = element_text(color="dark red", size=14, face="bold")) +
  labs(x = 'Interest ID', y = 'ROAS') +
  scale_fill_discrete(name="Gender")

#Lets further explore and look at our best performers by ROAS
Gender_1178 <- data_1178 %>%
  select(Interest_ID, gender, ROAS, Clicks) %>%
  group_by(Interest_ID, gender) %>%
  filter(Interest_ID == 104 | Interest_ID == 101 | Interest_ID == 15) %>%
  summarise(median.ROAS = median(ROAS),
            mean.ROAS = round(mean(ROAS),2), 
            clicks = sum(Clicks)) %>%
  arrange(desc(mean.ROAS))
  
#Lets display our top Mean ROAS split by Interest ID and Gender
  Gender_1178%>%
  kable() %>%
  kable_styling(c("striped", "bordered"))

Interest_ID	gender	median.ROAS	mean.ROAS	clicks
104	M	18.785	18.78	21
101	M	18.580	16.04	17
15	M	1.640	4.96	827
15	F	1.195	3.41	727
101	F	1.890	1.86	507
104	F	1.850	1.33	244

By looking at the analysis above including the boxplot and the table created, it is easy to identify an opportunity to increase budget expenditure to display advertisement on males that have an Interest ID of 101 and 104 as they are both showing promising mean and median ROAS.

Nonetheless, the data does suggest that there was only a small number of clicks on those ads which means that further investigation needs to be performed in the long run to see if the ROAS is maintained over time.

Lastly, it is also noticeable that Interest ID 15 has brought a lot of attention having a great number of clicks with a good mean ROAS for both females and males, so increasing budget for this specific targeted audience would also be recommended.

Analysis by Age

This project has been able to break apart the data into both Interest ID and Gender for campaign 1178, but there is still one more demographic that could be explored in this dataset to better target the Facebook ads.

#Lets build a boxplot for Interest ID 15, 101 & 104

data_1178 %>%
  filter(Interest_ID == 104 | Interest_ID == 101 | Interest_ID == 15) %>%
  ggplot(aes(x = as.factor(age), y = ROAS, fill = as.factor(Interest_ID))) + 
  geom_boxplot() + 
  scale_y_log10() +
  ggtitle("ROAS Boxplot by Age Group") +
  theme(plot.title = element_text(hjust = 0.5))+
  theme(
plot.title = element_text(color="dark green", size=14, face="bold.italic"),
axis.title.x = element_text(color="dark blue", size=14, face="bold"),
axis.title.y = element_text(color="dark red", size=14, face="bold")) +
  labs(x = 'Age Group', y = 'ROAS') +
  scale_fill_discrete(name="Interest\nID")

#Lets further explore and look at our best performers by ROAS
Age_1178 <- data_1178 %>%
  select(Interest_ID, age, ROAS, Clicks) %>%
  group_by(age, Interest_ID) %>%
  filter(Interest_ID == 104 | Interest_ID == 101 | Interest_ID == 15) %>%
  summarise(median.ROAS = median(ROAS),
            mean.ROAS = round(mean(ROAS),2), 
            clicks = sum(Clicks)) %>%
  arrange(desc(mean.ROAS))
  
#Lets display our top Mean ROAS split by Interest ID & Age Group
  DT::datatable(head(Age_1178, 20),
              options = list(
                pageLength = 10))

From the analysis that was conducted above the best ROAS is coming from the 40-44 age group on Interest ID 104 with the highest Mean and Median ROAS and with 59 clicks. Thus, it can be concluded that there could be an increase in budget expenditure on this specific audience.

Following that analysis, the strongest age group seems to be both 30-34 and 35-39 which means that targeting people in their 30s could be the smartest move as it is a big demographic on all Interest IDs with an adequate Mean and Median ROAS and strong number of clicks.

Lastly, it is possible to point out that the age that received the lowest performance for these Interest ID is the age group of 45-49 thus it would be recommended to decrease the expenditure in this age group.

Conclusion

This project analyzed different components involved in various campaigns from Facebook advertisements. The analysis of this project gives a better overview on how to analyze from different variables involved in the Return of Investment (ROI) of digital campaigns.

This research gives a glimpse on the different analysis that can be done using digital advertising datasets or other marketing purposes. The measure of success will always be driven by the company’s marketing objectives and the business model of the organization.

Highlights

Campaign Objectives and business model is fundamental before being able to provide any insighs based on the dataset examined
The most impactful campaign ID from this dataset is 1178 which was explored in detailed
Interest IDs 104, 101 & 15 showed the strongest performance in the different advertisement for campaign 1178
An increase in budget expenditure for males with interest IDs 104 & 101 could bring optimal performance
Interest ID 15 shows promising performance in both males and females
Age group 40-44 with Interest ID 104 has a high Median & Mean ROAS so increasing expenditure in that targeted aucience would be beneficial
Strong recommendation to target people in their 30s as this age group shows promising ROAS and number of clicks for them

Facebook Ad Dataset

Nicolas Kaswalder

May 13th 2020