From its very advent, social media has had an important impact on all of us. It started out as a way for people to connect or reconnect with each other, but it has since become much more. Business people use social media marketing to grow their businesses and to get their word out on a large scale. The most effective marketing approach is one that uses social media and traditional marketing in tandem. Business owners have found that social media marketing (SMM) has a very positive effect on the success of a business and that it takes very little money to achieve a solid end result. The fact is that clients are spending their time in online social communities, and this makes those communities a breeding ground for companies to advertise their products.
My study concerns the Facebook advertisements of a certain company that sells only online. Having no presence in the retail market, the company generates revenue only when users make an approved conversion after looking at the product. Here, the interest field in the data is not only about the hard sell. It is about building relationships with others with similar interests who will eventually become customers. It is all about people and solving their problems. The more impressions the company can make on people, the more they will begin to trust it, believe in its credibility, want to do business with it, and ultimately become loyal customers. Thus, the frequency of these advertisements is of great importance.
The specific objective of this study was to investigate the advertising strategy employed by the company in areas of diverse interests, and to record the response it receives in the form of clicks. Our goal was to compare the money spent on these campaigns, the clicks received on the advertisements, and the response of customers in the form of actually ordering the products, which together determine the efficacy of each advertisement on Facebook.
In this study, I have also categorised market diversification numerically, through an interest code that represents the groups of people who show interest in buying these products.
Accordingly, we construct the following hypothesis:
Hypothesis H1: The sales of a product increase when the click rate, the frequency of advertisements and the money spent on that product increase among the customers who express interest in that field.
For this study, I have collected data from the company's sales statistics (https://www.kaggle.com/loveall/clicks-conversion-tracking). The file conversion_data.csv contains 1143 observations of 11 variables. The variables are described below.
1.) ad_id: a unique ID for each ad.
2.) xyz_campaign_id: an ID associated with each ad campaign of XYZ company.
3.) fb_campaign_id: an ID associated with how Facebook tracks each campaign.
4.) age: age of the person to whom the ad is shown.
5.) gender: gender of the person to whom the ad is shown.
6.) interest: a code specifying the category to which the person’s interest belongs (interests are as mentioned in the person’s Facebook public profile).
7.) Impressions: the number of times the ad was shown.
8.) Clicks: number of clicks on that ad.
9.) Spent: Amount paid by company xyz to Facebook, to show that ad.
10.) Total conversion: Total number of people who enquired about the product after seeing the ad.
11.) Approved conversion: Total number of people who bought the product after seeing the ad.
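The click rate mentioned in H1 is not a column of the data, but it can be derived from the variables above. The following is a minimal sketch, not part of the original analysis: it uses the first five ads as they appear in the `str()` output of the data, and the column names `ctr` and `cpc` are hypothetical names introduced here for illustration.

```r
# First five ads, copied from the str() output of the dataset
ads <- data.frame(
  ad_id       = c(708746, 708749, 708771, 708815, 708818),
  Impressions = c(7350, 17861, 693, 4259, 4133),
  Clicks      = c(1, 2, 0, 1, 1),
  Spent       = c(1.43, 1.82, 0, 1.25, 1.29)
)

# Click rate: clicks per impression (hypothetical column name)
ads$ctr <- ads$Clicks / ads$Impressions

# Cost per click; left as NA when an ad received no clicks
ads$cpc <- ifelse(ads$Clicks > 0, ads$Spent / ads$Clicks, NA)

print(ads)
```

Ratios of this kind put ads with very different numbers of impressions on a comparable scale, which the raw counts do not.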
In order to test Hypothesis H1, we proposed the following model:
\[\text{Approved Conversion} = \beta_0 + \beta_1 \, \text{Impressions} + \beta_2 \, \text{Spent} + \beta_3 \, \text{interest} + \beta_4 \, \text{Clicks} + \beta_5 \, \text{Total Conversion} + \epsilon\]
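# Working directory that holds the CSV file (the author's local path)
setwd("C:/Users/Ayush/Desktop/IIM LUCKNOW INTERNSHIP/CSV files")
# stringsAsFactors = TRUE keeps age and gender as factors on R >= 4.0, matching the output below
sales.df <- read.csv("sales.csv", stringsAsFactors = TRUE)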
attach(sales.df)
str(sales.df)
## 'data.frame': 1143 obs. of 11 variables:
## $ ad_id : int 708746 708749 708771 708815 708818 708820 708889 708895 708953 708958 ...
## $ xyz_campaign_id : int 916 916 916 916 916 916 916 916 916 916 ...
## $ fb_campaign_id : int 103916 103917 103920 103928 103928 103929 103940 103941 103951 103952 ...
## $ age : Factor w/ 4 levels "30-34","35-39",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ gender : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
## $ interest : int 15 16 20 28 28 29 15 16 27 28 ...
## $ Impressions : int 7350 17861 693 4259 4133 1915 15615 10951 2355 9502 ...
## $ Clicks : int 1 2 0 1 1 0 3 1 1 3 ...
## $ Spent : num 1.43 1.82 0 1.25 1.29 ...
## $ Total_Conversion : int 2 2 1 1 1 1 1 1 1 1 ...
## $ Approved_Conversion: int 1 0 0 0 1 1 0 1 0 0 ...
dim(sales.df)
## [1] 1143 11
colnames(sales.df)
## [1] "ad_id" "xyz_campaign_id" "fb_campaign_id"
## [4] "age" "gender" "interest"
## [7] "Impressions" "Clicks" "Spent"
## [10] "Total_Conversion" "Approved_Conversion"
summary(sales.df)
## ad_id xyz_campaign_id fb_campaign_id age gender
## Min. : 708746 Min. : 916 Min. :103916 30-34:426 F:551
## 1st Qu.: 777633 1st Qu.: 936 1st Qu.:115716 35-39:248 M:592
## Median :1121185 Median :1178 Median :144549 40-44:210
## Mean : 987261 Mean :1067 Mean :133784 45-49:259
## 3rd Qu.:1121805 3rd Qu.:1178 3rd Qu.:144658
## Max. :1314415 Max. :1178 Max. :179982
## interest Impressions Clicks Spent
## Min. : 2.00 Min. : 87 Min. : 0.00 Min. : 0.00
## 1st Qu.: 16.00 1st Qu.: 6504 1st Qu.: 1.00 1st Qu.: 1.48
## Median : 25.00 Median : 51509 Median : 8.00 Median : 12.37
## Mean : 32.77 Mean : 186732 Mean : 33.39 Mean : 51.36
## 3rd Qu.: 31.00 3rd Qu.: 221769 3rd Qu.: 37.50 3rd Qu.: 60.02
## Max. :114.00 Max. :3052003 Max. :421.00 Max. :639.95
## Total_Conversion Approved_Conversion
## Min. : 0.000 Min. : 0.000
## 1st Qu.: 1.000 1st Qu.: 0.000
## Median : 1.000 Median : 1.000
## Mean : 2.856 Mean : 0.944
## 3rd Qu.: 3.000 3rd Qu.: 1.000
## Max. :60.000 Max. :21.000
View(sales.df)
columns=sales.df[,c("interest","Impressions","Clicks","Spent","Total_Conversion","Approved_Conversion")]
n=cor(columns)
library(corrplot)
## Warning: package 'corrplot' was built under R version 3.3.3
## corrplot 0.84 loaded
corrplot(n,method="circle")
r=n
round(r,2)
## interest Impressions Clicks Spent Total_Conversion
## interest 1.00 0.10 0.09 0.07 0.12
## Impressions 0.10 1.00 0.95 0.97 0.81
## Clicks 0.09 0.95 1.00 0.99 0.69
## Spent 0.07 0.97 0.99 1.00 0.73
## Total_Conversion 0.12 0.81 0.69 0.73 1.00
## Approved_Conversion 0.06 0.68 0.56 0.59 0.86
## Approved_Conversion
## interest 0.06
## Impressions 0.68
## Clicks 0.56
## Spent 0.59
## Total_Conversion 0.86
## Approved_Conversion 1.00
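The correlation matrix above shows that some predictors are nearly interchangeable. As a small sketch, the rounded matrix can be retyped and scanned for strongly correlated pairs; the cut-off of 0.9 is an assumption made for this illustration, not a value used in the study.

```r
vars <- c("interest", "Impressions", "Clicks", "Spent",
          "Total_Conversion", "Approved_Conversion")

# Rounded correlations copied from the output above
r <- matrix(c(
  1.00, 0.10, 0.09, 0.07, 0.12, 0.06,
  0.10, 1.00, 0.95, 0.97, 0.81, 0.68,
  0.09, 0.95, 1.00, 0.99, 0.69, 0.56,
  0.07, 0.97, 0.99, 1.00, 0.73, 0.59,
  0.12, 0.81, 0.69, 0.73, 1.00, 0.86,
  0.06, 0.68, 0.56, 0.59, 0.86, 1.00
), nrow = 6, byrow = TRUE, dimnames = list(vars, vars))

threshold <- 0.9  # assumed cut-off for "strongly correlated"

# Look only at the upper triangle so each pair is reported once
idx <- which(abs(r) > threshold & upper.tri(r), arr.ind = TRUE)
high <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = r[idx]
)
print(high)
```

The pairs flagged this way are all combinations of Impressions, Clicks and Spent, which is worth keeping in mind when these three variables enter the same regression.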
library(corrgram)
## Warning: package 'corrgram' was built under R version 3.3.3
corrgram(columns,upper.panel=panel.pie)
library(car)
## Warning: package 'car' was built under R version 3.3.3
scatterplotMatrix(~Approved_Conversion+Total_Conversion+Spent+Clicks,main="Approved Sales versus other factors")
scatterplotMatrix(~Approved_Conversion+Total_Conversion+Impressions+interest,main="Approved Sales versus Impressions and Interest factors")
Null hypothesis: The number of clicks does not affect Approved Conversion.
Alternative hypothesis: The number of clicks affects Approved Conversion directly.
cor(Clicks,Approved_Conversion)
## [1] 0.5595258
t.test(Clicks, Approved_Conversion)
##
## Welch Two Sample t-test
##
## data: Clicks and Approved_Conversion
## t = 19.272, df = 1144.1, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 29.14294 35.74945
## sample estimates:
## mean of x mean of y
## 33.390201 0.944007
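Therefore the null hypothesis is rejected. Note, however, that the Welch two-sample t-test above compares the mean of Clicks with the mean of Approved_Conversion; the evidence that the two move together is the positive correlation of 0.56, which indicates that more clicks are associated with more approved conversions.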
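To make the distinction between the two kinds of test concrete, here is a minimal sketch on simulated data (not the study's data). The variables `x` and `y` are hypothetical: they have very different means but are constructed to be uncorrelated, so `t.test(x, y)` rejects while `cor.test(x, y)` does not.

```r
set.seed(1)
n <- 200

x <- rnorm(n, mean = 30, sd = 5)
z <- rnorm(n)

# y has mean 1 and is built from regression residuals,
# so its sample correlation with x is zero up to rounding error
y <- 1 + resid(lm(z ~ x))

tt <- t.test(x, y)    # H0: mean(x) equals mean(y)
ct <- cor.test(x, y)  # H0: the correlation between x and y is zero

tt$p.value  # very small, because the means differ
ct$p.value  # close to 1, because x and y are uncorrelated
```

A small p-value from `t.test()` on two different variables therefore says only that their averages differ; `cor.test()` or a regression is the tool that addresses association.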
Null hypothesis: The amount of money invested does not affect Approved Conversion.
Alternative hypothesis: The amount spent affects Approved Conversion directly.
cor(Spent,Approved_Conversion)
## [1] 0.5931778
t.test(Spent, Approved_Conversion)
##
## Welch Two Sample t-test
##
## data: Spent and Approved_Conversion
## t = 19.609, df = 1142.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 45.37197 55.46133
## sample estimates:
## mean of x mean of y
## 51.360656 0.944007
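Therefore the null hypothesis is rejected. As before, the t-test compares the two means; the positive correlation of 0.59 is what suggests that the amount spent is a good investment, since higher spending is associated with more approved conversions.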
Null hypothesis: The frequency with which an advertisement is shown does not affect Approved Conversion.
Alternative hypothesis: The frequency with which an advertisement is shown affects Approved Conversion directly.
cor(Impressions,Approved_Conversion)
## [1] 0.6842485
t.test(Impressions, Approved_Conversion)
##
## Welch Two Sample t-test
##
## data: Impressions and Approved_Conversion
## t = 20.185, df = 1142, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 168580.2 204882.2
## sample estimates:
## mean of x mean of y
## 1.867321e+05 9.440070e-01
Therefore the null hypothesis is rejected. The positive correlation of 0.68 indicates that sales increase with the frequency of advertisements.
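The three correlations reported above can themselves be tested against zero. For a sample correlation r based on n observations, the usual statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below applies this formula to the values printed by `cor()` and to n = 1143; it is an illustration added here, not output from the original analysis.

```r
n <- 1143  # number of observations in the dataset

# Correlations with Approved_Conversion, as printed by cor() above
r <- c(Clicks = 0.5595258, Spent = 0.5931778, Impressions = 0.6842485)

df <- n - 2
t_stat <- r * sqrt(df) / sqrt(1 - r^2)  # t statistic for H0: correlation = 0
p_val  <- 2 * pt(-abs(t_stat), df)      # two-sided p-value

print(data.frame(r = r, t = t_stat, p = p_val))
```

On the full data, `cor.test(Clicks, Approved_Conversion)` should report the same statistic for the first row.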
library(corrplot)
columns=sales.df[,c("interest","Impressions","Clicks","Spent","Total_Conversion","Approved_Conversion")]
corrplot(cor(columns),method="circle")
Here, y = Approved_Conversion. The aim is to see how strongly the final sales depend on the other variables.
1st Model (All Variables):
model1=Approved_Conversion ~ Impressions+Spent+interest+Clicks+Total_Conversion
fit1=lm(model1,data=sales.df )
summary(fit1)
##
## Call:
## lm(formula = model1, data = sales.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7621 -0.3926 -0.2524 0.5997 6.5563
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.103e-01 4.358e-02 2.532 0.01149 *
## Impressions 1.287e-06 5.105e-07 2.521 0.01183 *
## Spent -5.751e-04 4.054e-03 -0.142 0.88723
## interest -3.001e-03 9.980e-04 -3.007 0.00269 **
## Clicks -6.805e-03 4.608e-03 -1.477 0.14008
## Total_Conversion 3.321e-01 1.096e-02 30.298 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8629 on 1137 degrees of freedom
## Multiple R-squared: 0.7545, Adjusted R-squared: 0.7534
## F-statistic: 698.9 on 5 and 1137 DF, p-value: < 2.2e-16
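In Model 1, Spent (p = 0.887) and Clicks (p = 0.140) are not significant at the 5% level, and both are very highly correlated with Impressions. Accordingly, we chose interest, Impressions and Total_Conversion as the variables for our regression model.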
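One common way to quantify this kind of overlap between predictors is the variance inflation factor (VIF), defined for each predictor as 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the others. The following sketch computes it by hand on simulated data; the variable names and the data-generating process are assumptions made for the illustration and do not reproduce the study's data.

```r
set.seed(42)
n <- 500

# Simulated predictors: clicks tracks impressions, spent tracks clicks
impressions <- rexp(n, rate = 1 / 1000)
clicks      <- 0.02 * impressions + rnorm(n, sd = 2)
spent       <- 1.5 * clicks + rnorm(n, sd = 2)
interest    <- sample(1:30, n, replace = TRUE)  # unrelated to the others

X <- data.frame(impressions, clicks, spent, interest)

# VIF of each predictor: regress it on the remaining predictors
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

print(round(vif_manual, 1))
```

Large values flag predictors whose coefficients are hard to estimate separately. On a fitted model, `vif()` from the car package reports the same quantity for numeric predictors.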
Model 2:
model2=Approved_Conversion ~ Impressions+interest+Total_Conversion
fit2=lm(model2,data=sales.df )
summary(fit2)
##
## Call:
## lm(formula = model2, data = sales.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0739 -0.3849 -0.2497 0.6098 7.4258
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.606e-02 4.263e-02 2.019 0.04375 *
## Impressions -2.927e-07 1.414e-07 -2.070 0.03869 *
## interest -2.966e-03 9.630e-04 -3.080 0.00212 **
## Total_Conversion 3.536e-01 9.886e-03 35.769 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8707 on 1139 degrees of freedom
## Multiple R-squared: 0.7496, Adjusted R-squared: 0.7489
## F-statistic: 1137 on 3 and 1139 DF, p-value: < 2.2e-16
Visualising the Beta Coefficients and Their Confidence Intervals from model 2:
library(coefplot)
## Warning: package 'coefplot' was built under R version 3.3.3
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.3.3
coefplot(fit2,intercept=FALSE,outerCI=1.96,coefficients=c("Impressions","interest","Total_Conversion"))
## Warning: Ignoring unknown aesthetics: xmin, xmax
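The intervals drawn in the coefficient plot can also be written out numerically. This sketch assumes that `outerCI=1.96` is the multiple of the standard error used for the outer interval, and it reuses the estimates and standard errors printed in the Model 2 summary above.

```r
# Estimates and standard errors copied from summary(fit2)
est <- c(Impressions = -2.927e-07, interest = -2.966e-03,
         Total_Conversion = 3.536e-01)
se  <- c(Impressions = 1.414e-07, interest = 9.630e-04,
         Total_Conversion = 9.886e-03)

z <- 1.96  # normal quantile for an approximate 95% interval

ci <- cbind(lower = est - z * se, estimate = est, upper = est + z * se)
print(signif(ci, 3))

# TRUE where the interval lies entirely on one side of zero
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
print(excludes_zero)
```

With the fitted object available, `confint(fit2)` gives the t-based version of these intervals, which is almost identical at this sample size.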
summary(fit1)$adj.r.squared
## [1] 0.7534337
summary(fit2)$adj.r.squared
## [1] 0.7489433
AIC(fit1)
## [1] 2914.504
AIC(fit2)
## [1] 2933.141
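Therefore the hypothesis that Approved Conversion, or final consumption, depends directly on the advertising variables is only partly supported: in Model 1, Impressions, interest and Total_Conversion are significant at the 5% level, while Spent and Clicks are not.
This suggests that social media advertising is a good investment if the impression is made on the correct interest niche of the audience.
Model 1 fits better than Model 2: it has the lower AIC (2914.5 against 2933.1) and the slightly higher adjusted R-squared (0.7534 against 0.7489).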
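For reference, the AIC of a linear model is 2k - 2 log L, where k counts the regression coefficients plus the residual variance and L is the maximised likelihood. The sketch below checks this against `AIC()` on simulated data and also shows a nested F-test with `anova()`, a second way of comparing a full and a reduced model. The data and the helper `aic_by_hand` are hypothetical and are not part of the study.

```r
set.seed(7)
n <- 300

x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 + 0.5 * x2 + rnorm(n)  # x3 has no true effect

full    <- lm(y ~ x1 + x2 + x3)
reduced <- lm(y ~ x1 + x2)

# AIC = 2k - 2 log L; k = number of coefficients + 1 for the error variance
aic_by_hand <- function(fit) {
  k <- length(coef(fit)) + 1
  2 * k - 2 * as.numeric(logLik(fit))
}

print(c(builtin = AIC(full), by_hand = aic_by_hand(full)))
print(c(builtin = AIC(reduced), by_hand = aic_by_hand(reduced)))

# F-test of the reduced model against the full model
print(anova(reduced, full))
```

Because AIC penalises each extra parameter by 2, a larger model is preferred only when its gain in log-likelihood outweighs that penalty.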
Best Fit: y=Approved_Conversion ~ Impressions+Spent+interest+Clicks+Total_Conversion
This paper was motivated by the need for research that could improve my understanding of how social media advertising influences the sales of products in the online shopping industry. Social media marketing is a group of operations and methods used to generate publicity through social media channels and Internet communities, and social media advertising is the planning and execution of advertising campaigns through those channels. The contribution of this paper is an empirical investigation of such advertising for one online-only company. The face of marketing is changing so drastically because marketers understand that they need to go wherever the clients are, and the clients are spending their time in online social communities.
https://www.compukol.com/the-impact-of-social-media-on-advertising/
table(gender) # target audience sex.
## gender
## F M
## 551 592
table(age) #knowing which age group gets most influenced by social media ads.
## age
## 30-34 35-39 40-44 45-49
## 426 248 210 259
table(xyz_campaign_id)
## xyz_campaign_id
## 916 936 1178
## 54 464 625
table(interest) # what interest niche works most via social media campaigns
## interest
## 2 7 10 15 16 18 19 20 21 22 23 24 25 26 27 28 29 30
## 25 24 85 51 140 43 32 49 36 33 23 24 26 41 60 51 77 25
## 31 32 36 63 64 65 66 100 101 102 103 104 105 106 107 108 109 110
## 25 33 21 46 48 19 11 6 7 7 5 5 7 5 8 7 6 8
## 111 112 113 114
## 6 7 6 5
xtabs(~gender+xyz_campaign_id)
## xyz_campaign_id
## gender 916 936 1178
## F 19 256 276
## M 35 208 349
xtabs(~gender+interest)#which range of products satisfy which gender more
## interest
## gender 2 7 10 15 16 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 36 63
## F 8 10 45 21 68 24 15 18 18 18 8 10 16 23 37 21 32 12 11 15 3 28
## M 17 14 40 30 72 19 17 31 18 15 15 14 10 18 23 30 45 13 14 18 18 18
## interest
## gender 64 65 66 100 101 102 103 104 105 106 107 108 109 110 111 112 113
## F 27 9 2 3 4 4 3 3 3 4 4 4 4 4 4 3 2
## M 21 10 9 3 3 3 2 2 4 1 4 3 2 4 2 4 4
## interest
## gender 114
## F 3
## M 2
aggregate(Total_Conversion,by=list(Gender=gender),sum)
## Gender x
## 1 F 1644
## 2 M 1620
#which sex has more enquiry rate after social media campaign
aggregate(Approved_Conversion,by=list(Gender=gender),sum)
## Gender x
## 1 F 495
## 2 M 584
#which sex has more buying rate after social media campaign
aggregate(Total_Conversion,by=list(Age_Group=age),sum)
## Age_Group x
## 1 30-34 1431
## 2 35-39 626
## 3 40-44 523
## 4 45-49 684
#which Age Group has more enquiry rate after social media campaign
aggregate(Approved_Conversion,by=list(Age_Group=age),sum)
## Age_Group x
## 1 30-34 494
## 2 35-39 207
## 3 40-44 170
## 4 45-49 208
#which Age Group has more buying rate after social media campaign
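The totals above depend on how many ads each group was shown, so a rate is easier to compare across groups. As a small sketch, the share of enquiries that turned into purchases can be computed directly from the sums printed above; the data frame names `by_gender` and `by_age` and the column `rate` are hypothetical.

```r
# Sums of Total_Conversion (enquiries) and Approved_Conversion (sales)
# copied from the aggregate() output above
by_gender <- data.frame(
  group     = c("F", "M"),
  enquiries = c(1644, 1620),
  sales     = c(495, 584)
)
by_age <- data.frame(
  group     = c("30-34", "35-39", "40-44", "45-49"),
  enquiries = c(1431, 626, 523, 684),
  sales     = c(494, 207, 170, 208)
)

# Share of enquiries that became approved conversions
by_gender$rate <- by_gender$sales / by_gender$enquiries
by_age$rate    <- by_age$sales / by_age$enquiries

print(by_gender)
print(by_age)
```

Read this way, men convert a larger share of their enquiries than women, and the 30-34 group has the highest share among the age groups.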
boxplot(Clicks,main="Clicks the Company gets from all advertisements",horizontal = TRUE,xlab="Clicks",ylab="Company")
boxplot(Spent,main="Amount the Company spends on advertisements",horizontal = TRUE,xlab="Amount",ylab="Company")
boxplot(Spent~age,main="Amount the company spends on particular age groups",horizontal = TRUE,xlab="Amount",ylab="Age group")
boxplot(Clicks~age,main="Clicks the company gets from particular age groups",horizontal = TRUE,xlab="Clicks",ylab="Age group")
boxplot(Total_Conversion~age,main="Enquiries the company gets from particular age groups",horizontal = TRUE,xlab="Enquiries",ylab="Age group")
boxplot(Approved_Conversion~age,main="Actual sales the company gets from particular age groups",horizontal = TRUE,xlab="Sales conversions",ylab="Age group")
hist(Clicks,breaks=30,col="blue")
hist(Spent,breaks=30,col="gold")
hist(Total_Conversion,col="red")
hist(Approved_Conversion,col="green")
library(lattice)
histogram(~gender | age,type="count",layout=c(4,1),col=c("pink","darkblue"))
library(car)
# age is a factor, so a jittered strip chart by group replaces the scatterplot, which cannot smooth a factor response
stripchart(Approved_Conversion~age,method="jitter",main="Approved conversions according to age",xlab="Approved conversions",ylab="Age group")
scatterplot(Approved_Conversion~Total_Conversion,main="Approved vs Total Conversion Rates")
scatterplot(Approved_Conversion~Clicks,main="Clicks VS Success!!")
scatterplot(Approved_Conversion~Spent,main="Spent VS Success!!")
# gender is a factor with levels F and M, so the groups are plotted as a jittered strip chart
stripchart(Approved_Conversion~gender,method="jitter",main="Approved conversions by gender (F = female, M = male)",xlab="Approved conversions",ylab="Gender")