1. Introduction

From its advent, social media has had an important impact on all of us. It started out as a way for people to connect or reconnect with each other, and it has since become much more. Businesses now use social media marketing to grow and to spread their message widely. The most effective marketing approach is one that uses social media and traditional marketing in tandem. Business owners have found that social media marketing (SMM) has a positive effect on business success and that it achieves solid results at very little cost. Customers spend their time in online social communities, which makes these communities fertile ground for companies to advertise their products.

2. Overview of the Study

My study concerns the Facebook advertisements of a company that sells only online. With no presence in the retail market, its only revenue comes from users who make an approved conversion after looking at the product. Here, the interest field in the data is not only about the hard sell. It is about building relationships with people who share similar interests and who may eventually become customers. It is all about people and solving their problems. The more impressions the company makes on people, the more they will begin to trust it, believe in its credibility, want to do business with it, and ultimately become loyal customers. Thus, the frequency of these advertisements is of great importance.

3. An empirical field study of the company’s Facebook advertisement campaign

3.1 Overview

The specific objective of this study was to investigate the advertising strategy employed by the company across areas of diverse interest, and to record the response received in the form of clicks. The goal was to compare the money spent on these campaigns, the clicks received on the advertisements, and the customers' response in the form of actual product orders, which together determine the efficacy of each advertisement on Facebook.

In this study, I have also categorised the market diversification through a numeric estimate of the number of people who show interest in buying these products.

Accordingly, we construct the following hypothesis:

Hypothesis H1: The sales of a product increase when the click rate, the frequency of advertisements and the money spent on that product increase among customers who express interest in that field.

3.2 Data

For this study, I collected data from the company's sales statistics (https://www.kaggle.com/loveall/clicks-conversion-tracking). The file conversion_data.csv contains 1143 observations of 11 variables. The variables are described below.

1.) ad_id: a unique ID for each ad.

2.) xyz_campaign_id: an ID associated with each ad campaign of XYZ company.

3.) fb_campaign_id: an ID associated with how Facebook tracks each campaign.

4.) age: age of the person to whom the ad is shown.

5.) gender: gender of the person to whom the ad is shown.

6.) interest: a code specifying the category to which the person’s interest belongs (interests are as mentioned in the person’s Facebook public profile).

7.) Impressions: the number of times the ad was shown.

8.) Clicks: number of clicks on that ad.

9.) Spent: Amount paid by company xyz to Facebook, to show that ad.

10.) Total_Conversion: Total number of people who enquired about the product after seeing the ad.

11.) Approved_Conversion: Total number of people who bought the product after seeing the ad.
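
The click rate named in Hypothesis H1 is not a column of the dataset, but it can be derived from the variables listed above. The following is a minimal sketch, assuming that "click rate" means clicks per impression. The column names ctr and cpc are illustrative, and the five rows are the first five values of Impressions, Clicks and Spent in the data.

```r
# First five rows of three columns of the dataset
ads <- data.frame(
  Impressions = c(7350, 17861, 693, 4259, 4133),
  Clicks      = c(1, 2, 0, 1, 1),
  Spent       = c(1.43, 1.82, 0, 1.25, 1.29)
)

# Click-through rate: clicks per impression (assumed meaning of "click rate")
ads$ctr <- ads$Clicks / ads$Impressions

# Cost per click; NA where an ad received no clicks, to avoid dividing by zero
ads$cpc <- ifelse(ads$Clicks > 0, ads$Spent / ads$Clicks, NA)

print(ads)
```

Ratios of this kind put ads with very different numbers of impressions on a comparable scale.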

3.3 Model

In order to test Hypothesis H1, we proposed the following model:

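\[\text{Approved\_Conversion} = \beta_0 + \beta_1 \, \text{Impressions} + \beta_2 \, \text{Spent} + \beta_3 \, \text{interest} + \beta_4 \, \text{Clicks} + \beta_5 \, \text{Total\_Conversion} + \epsilon\]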

Importing the dataset and initial analysis:

setwd("C:/Users/Ayush/Desktop/IIM LUCKNOW INTERNSHIP/CSV files")
sales.df=read.csv("sales.csv",stringsAsFactors=TRUE) # read age and gender as factors
attach(sales.df)
str(sales.df)
## 'data.frame':    1143 obs. of  11 variables:
##  $ ad_id              : int  708746 708749 708771 708815 708818 708820 708889 708895 708953 708958 ...
##  $ xyz_campaign_id    : int  916 916 916 916 916 916 916 916 916 916 ...
##  $ fb_campaign_id     : int  103916 103917 103920 103928 103928 103929 103940 103941 103951 103952 ...
##  $ age                : Factor w/ 4 levels "30-34","35-39",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ gender             : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
##  $ interest           : int  15 16 20 28 28 29 15 16 27 28 ...
##  $ Impressions        : int  7350 17861 693 4259 4133 1915 15615 10951 2355 9502 ...
##  $ Clicks             : int  1 2 0 1 1 0 3 1 1 3 ...
##  $ Spent              : num  1.43 1.82 0 1.25 1.29 ...
##  $ Total_Conversion   : int  2 2 1 1 1 1 1 1 1 1 ...
##  $ Approved_Conversion: int  1 0 0 0 1 1 0 1 0 0 ...
dim(sales.df)
## [1] 1143   11
colnames(sales.df)
##  [1] "ad_id"               "xyz_campaign_id"     "fb_campaign_id"     
##  [4] "age"                 "gender"              "interest"           
##  [7] "Impressions"         "Clicks"              "Spent"              
## [10] "Total_Conversion"    "Approved_Conversion"
summary(sales.df)
##      ad_id         xyz_campaign_id fb_campaign_id      age      gender 
##  Min.   : 708746   Min.   : 916    Min.   :103916   30-34:426   F:551  
##  1st Qu.: 777633   1st Qu.: 936    1st Qu.:115716   35-39:248   M:592  
##  Median :1121185   Median :1178    Median :144549   40-44:210          
##  Mean   : 987261   Mean   :1067    Mean   :133784   45-49:259          
##  3rd Qu.:1121805   3rd Qu.:1178    3rd Qu.:144658                      
##  Max.   :1314415   Max.   :1178    Max.   :179982                      
##     interest       Impressions          Clicks           Spent       
##  Min.   :  2.00   Min.   :     87   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.: 16.00   1st Qu.:   6504   1st Qu.:  1.00   1st Qu.:  1.48  
##  Median : 25.00   Median :  51509   Median :  8.00   Median : 12.37  
##  Mean   : 32.77   Mean   : 186732   Mean   : 33.39   Mean   : 51.36  
##  3rd Qu.: 31.00   3rd Qu.: 221769   3rd Qu.: 37.50   3rd Qu.: 60.02  
##  Max.   :114.00   Max.   :3052003   Max.   :421.00   Max.   :639.95  
##  Total_Conversion Approved_Conversion
##  Min.   : 0.000   Min.   : 0.000     
##  1st Qu.: 1.000   1st Qu.: 0.000     
##  Median : 1.000   Median : 1.000     
##  Mean   : 2.856   Mean   : 0.944     
##  3rd Qu.: 3.000   3rd Qu.: 1.000     
##  Max.   :60.000   Max.   :21.000
View(sales.df)

Correlation Matrix and Corrgram:

columns=sales.df[,c("interest","Impressions","Clicks","Spent","Total_Conversion","Approved_Conversion")]

n=cor(columns)

library(corrplot)
## Warning: package 'corrplot' was built under R version 3.3.3
## corrplot 0.84 loaded
corrplot(n,method="circle")

r=n
round(r,2)
##                     interest Impressions Clicks Spent Total_Conversion
## interest                1.00        0.10   0.09  0.07             0.12
## Impressions             0.10        1.00   0.95  0.97             0.81
## Clicks                  0.09        0.95   1.00  0.99             0.69
## Spent                   0.07        0.97   0.99  1.00             0.73
## Total_Conversion        0.12        0.81   0.69  0.73             1.00
## Approved_Conversion     0.06        0.68   0.56  0.59             0.86
##                     Approved_Conversion
## interest                           0.06
## Impressions                        0.68
## Clicks                             0.56
## Spent                              0.59
## Total_Conversion                   0.86
## Approved_Conversion                1.00
library(corrgram)
## Warning: package 'corrgram' was built under R version 3.3.3

corrgram(columns,upper.panel=panel.pie)

ScatterPlot:

library(car)
## Warning: package 'car' was built under R version 3.3.3
scatterplotMatrix(~Approved_Conversion+Total_Conversion+Spent+Clicks,main="Approved Sales versus other factors")

scatterplotMatrix(~Approved_Conversion+Total_Conversion+Impressions+interest,main="Approved Sales versus Impressions and Interest factors")

Hypothesis and t-tests

Null hypothesis: The number of clicks does not affect the Approved Conversion.

Alternate hypothesis: The number of clicks affects the Approved Conversion directly.

cor(Clicks,Approved_Conversion)
## [1] 0.5595258
t.test(Clicks, Approved_Conversion)
## 
##  Welch Two Sample t-test
## 
## data:  Clicks and Approved_Conversion
## t = 19.272, df = 1144.1, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  29.14294 35.74945
## sample estimates:
## mean of x mean of y 
## 33.390201  0.944007

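Therefore the null hypothesis is rejected: the clicks affect the Approved Conversion directly. Note that the Welch test above compares the means of the two variables; the evidence for the relationship itself is the positive correlation (r = 0.56), which with 1143 observations is significantly different from zero.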

Null hypothesis: The amount of money invested does not affect the Approved Conversion.

Alternate hypothesis: The amount spent affects the Approved Conversion directly.

cor(Spent,Approved_Conversion)
## [1] 0.5931778
t.test(Spent, Approved_Conversion)
## 
##  Welch Two Sample t-test
## 
## data:  Spent and Approved_Conversion
## t = 19.609, df = 1142.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  45.37197 55.46133
## sample estimates:
## mean of x mean of y 
## 51.360656  0.944007

Therefore the null hypothesis is rejected: the amount spent is a good investment, as it is positively related to the Approved Conversion (r = 0.59).

Null hypothesis: The frequency with which the advertisement occurs does not affect the Approved Conversion.

Alternate hypothesis: The frequency with which the advertisement occurs affects the Approved Conversion directly.

cor(Impressions,Approved_Conversion)
## [1] 0.6842485
t.test(Impressions, Approved_Conversion)
## 
##  Welch Two Sample t-test
## 
## data:  Impressions and Approved_Conversion
## t = 20.185, df = 1142, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  168580.2 204882.2
## sample estimates:
##    mean of x    mean of y 
## 1.867321e+05 9.440070e-01

Therefore the null hypothesis is rejected: sales increase with the frequency of advertisements (r = 0.68).
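
A two-sample t-test compares the means of two variables, whereas the association between two variables can be tested directly with cor.test() from base R. The sketch below uses simulated stand-in data: the column names mirror the dataset, but the values are generated and are not the study's data. With the real data, the corresponding call would be cor.test(Clicks, Approved_Conversion).

```r
set.seed(1)  # reproducible simulated data, NOT the study's data

# Stand-in columns: conversions loosely increase with clicks
sim <- data.frame(Clicks = rpois(200, lambda = 30))
sim$Approved_Conversion <- rpois(200, lambda = 0.1 * sim$Clicks)

# Pearson correlation test: the null hypothesis is that the true correlation is 0
res <- cor.test(sim$Clicks, sim$Approved_Conversion, method = "pearson")

res$estimate   # sample correlation r
res$statistic  # t statistic, equal to r * sqrt(n - 2) / sqrt(1 - r^2)
res$p.value    # p-value for the null hypothesis of zero correlation
```

The same call can be repeated for Spent and Impressions to test each of the three hypotheses above.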

Linear Regression Model

library(corrplot)
columns=sales.df[,c("interest","Impressions","Clicks","Spent","Total_Conversion","Approved_Conversion")]
corrplot(cor(columns),method="circle")

Here, y = Approved_Conversion, in order to see how the final sales depend on the other variables.

                      1st Model (All Variables): 
model1=Approved_Conversion ~ Impressions+Spent+interest+Clicks+Total_Conversion
  
fit1=lm(model1,data=sales.df )
summary(fit1)
## 
## Call:
## lm(formula = model1, data = sales.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7621 -0.3926 -0.2524  0.5997  6.5563 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       1.103e-01  4.358e-02   2.532  0.01149 *  
## Impressions       1.287e-06  5.105e-07   2.521  0.01183 *  
## Spent            -5.751e-04  4.054e-03  -0.142  0.88723    
## interest         -3.001e-03  9.980e-04  -3.007  0.00269 ** 
## Clicks           -6.805e-03  4.608e-03  -1.477  0.14008    
## Total_Conversion  3.321e-01  1.096e-02  30.298  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8629 on 1137 degrees of freedom
## Multiple R-squared:  0.7545, Adjusted R-squared:  0.7534 
## F-statistic: 698.9 on 5 and 1137 DF,  p-value: < 2.2e-16

Accordingly, we chose interest, Impressions and Total_Conversion as the variables for our regression model, since Spent and Clicks are not significant at the 5% level in Model 1 (p = 0.887 and p = 0.140).
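
The selection above keeps the predictors whose coefficients are significant at the 5% level. A minimal sketch of reading those p-values out of a fitted model programmatically follows. It uses simulated data with hypothetical columns x1, x2 and y rather than the study's data.

```r
set.seed(42)  # simulated illustration only

n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)            # unrelated to y by construction
y  <- 2 * x1 + rnorm(n)
sim <- data.frame(y, x1, x2)

fit <- lm(y ~ x1 + x2, data = sim)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients
pvals <- coefs[-1, "Pr(>|t|)"]   # drop the intercept row

# Names of the predictors that are significant at the 5% level
keep <- names(pvals)[pvals < 0.05]
print(keep)
```

When predictors are strongly correlated with each other, as Impressions, Clicks and Spent are here, individual p-values should be read with caution, because the coefficients can change noticeably once one of the correlated terms is dropped.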

              Model 2:
model2=Approved_Conversion ~ Impressions+interest+Total_Conversion
  
fit2=lm(model2,data=sales.df )
summary(fit2)
## 
## Call:
## lm(formula = model2, data = sales.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0739 -0.3849 -0.2497  0.6098  7.4258 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       8.606e-02  4.263e-02   2.019  0.04375 *  
## Impressions      -2.927e-07  1.414e-07  -2.070  0.03869 *  
## interest         -2.966e-03  9.630e-04  -3.080  0.00212 ** 
## Total_Conversion  3.536e-01  9.886e-03  35.769  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8707 on 1139 degrees of freedom
## Multiple R-squared:  0.7496, Adjusted R-squared:  0.7489 
## F-statistic:  1137 on 3 and 1139 DF,  p-value: < 2.2e-16

Visualising the Beta Coefficients and Their Confidence Intervals from model 2:

library(coefplot)
## Warning: package 'coefplot' was built under R version 3.3.3
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.3.3
coefplot(fit2,intercept=FALSE,outerCI=1.96,coefficients=c("Impressions","interest","Total_Conversion"))
## Warning: Ignoring unknown aesthetics: xmin, xmax
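
The intervals drawn in the coefficient plot can also be read as numbers with confint() from base R. This is a minimal sketch on simulated data with hypothetical columns, not the study's data. confint() uses t quantiles rather than the fixed multiplier of 1.96, so on a sample as large as this study's the two should be close.

```r
set.seed(7)  # simulated illustration only

n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 0.5 * sim$x1 - 0.3 * sim$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = sim)

# 95% confidence intervals: estimate +/- t quantile * standard error
ci <- confint(fit, level = 0.95)
print(ci)

# The same upper bounds computed by hand from the fitted model
se    <- sqrt(diag(vcov(fit)))
upper <- coef(fit) + qt(0.975, df.residual(fit)) * se
```

For the study's model, the corresponding call would be confint(fit2).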

summary(fit1)$adj.r.squared
## [1] 0.7534337
summary(fit2)$adj.r.squared
## [1] 0.7489433
AIC(fit1)
## [1] 2914.504
AIC(fit2)
## [1] 2933.141

3.4 Results

The results therefore support the hypothesis that Approved Conversion, or the final consumption, is related to the:

  1. Number of clicks received by the social media advertisement
  2. Frequency with which the advertisement is displayed to users
  3. Interest specified by users
  4. Number of queries received by the company
  5. Amount of money spent by the company on the advertisement campaign

In the regression, Impressions, interest and Total_Conversion are individually significant at the 5% level, while Clicks and Spent are not. Both are strongly correlated with Impressions (r = 0.95 and r = 0.97), so their separate effects are hard to distinguish. This suggests that social media advertising is a good investment if the impression is made on the correct interest niche of the audience.

Model 1 fits better than Model 2, as it has the lower AIC value (2914.5 versus 2933.1).

Best Fit: y=Approved_Conversion ~ Impressions+Spent+interest+Clicks+Total_Conversion
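
The AIC comparison used above can be sketched as follows, together with an F-test for nested models as a complementary check. The data are simulated, and the columns x1, x2, x3 and y are hypothetical, not the study's data.

```r
set.seed(3)  # simulated illustration only

n <- 250
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + 0.5 * sim$x2 + rnorm(n)

small <- lm(y ~ x1, data = sim)            # nested inside the full model
full  <- lm(y ~ x1 + x2 + x3, data = sim)

# AIC = -2 * log-likelihood + 2 * (number of estimated parameters)
ll <- logLik(full)
aic_manual <- -2 * as.numeric(ll) + 2 * attr(ll, "df")

AIC(small, full)     # the model with the lower AIC is preferred
anova(small, full)   # F-test: do the extra terms improve the fit?
```

Because AIC adds a penalty of 2 per estimated parameter, a larger model is preferred only when its gain in likelihood outweighs the added terms.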

4. Conclusion

This paper was motivated by the need for research that could improve my understanding of how social media advertising influences the sales of products in the online shopping industry. Social media advertising is a group of operations and methods used to generate publicity through social media channels and Internet communities; it is the planning and executing of advertising campaigns through those channels. The contribution of this paper is an investigation of how the spending, impressions, clicks and audience interest of such a campaign relate to sales. The face of marketing is changing so drastically because marketers understand that they need to go wherever the clients are, and the clients are hanging out in the online social communities.

5. References

https://www.compukol.com/the-impact-of-social-media-on-advertising/

https://www.kaggle.com/loveall/clicks-conversion-tracking

https://www.ukessays.com/essays/media/social-media-advertising-becoming-central-to-marketing-media-essay.php

Appendix 1

Descriptive statistics

One-way and two-way contingency tables:

table(gender) # target audience sex.
## gender
##   F   M 
## 551 592
table(age) #knowing which age group gets most influenced by social media ads.
## age
## 30-34 35-39 40-44 45-49 
##   426   248   210   259
table(xyz_campaign_id)
## xyz_campaign_id
##  916  936 1178 
##   54  464  625
table(interest) # what interest niche works most via social media campaigns
## interest
##   2   7  10  15  16  18  19  20  21  22  23  24  25  26  27  28  29  30 
##  25  24  85  51 140  43  32  49  36  33  23  24  26  41  60  51  77  25 
##  31  32  36  63  64  65  66 100 101 102 103 104 105 106 107 108 109 110 
##  25  33  21  46  48  19  11   6   7   7   5   5   7   5   8   7   6   8 
## 111 112 113 114 
##   6   7   6   5
xtabs(~gender+xyz_campaign_id)
##       xyz_campaign_id
## gender 916 936 1178
##      F  19 256  276
##      M  35 208  349
xtabs(~gender+interest)#which range of products satisfy which gender more
##       interest
## gender  2  7 10 15 16 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 36 63
##      F  8 10 45 21 68 24 15 18 18 18  8 10 16 23 37 21 32 12 11 15  3 28
##      M 17 14 40 30 72 19 17 31 18 15 15 14 10 18 23 30 45 13 14 18 18 18
##       interest
## gender 64 65 66 100 101 102 103 104 105 106 107 108 109 110 111 112 113
##      F 27  9  2   3   4   4   3   3   3   4   4   4   4   4   4   3   2
##      M 21 10  9   3   3   3   2   2   4   1   4   3   2   4   2   4   4
##       interest
## gender 114
##      F   3
##      M   2
aggregate(Total_Conversion,by=list(Gender=gender),sum)
##   Gender    x
## 1      F 1644
## 2      M 1620
#which sex has more enquiry rate after social media campaign
aggregate(Approved_Conversion,by=list(Gender=gender),sum)
##   Gender   x
## 1      F 495
## 2      M 584
#which sex has more buying rate after social media campaign

aggregate(Total_Conversion,by=list(Age_Group=age),sum)
##   Age_Group    x
## 1     30-34 1431
## 2     35-39  626
## 3     40-44  523
## 4     45-49  684
#which Age Group has more enquiry rate after social media campaign
aggregate(Approved_Conversion,by=list(Age_Group=age),sum)
##   Age_Group   x
## 1     30-34 494
## 2     35-39 207
## 3     40-44 170
## 4     45-49 208
#which Age Group has more buying rate after social media campaign

Boxplots for the variables:

boxplot(Clicks,main="Clicks the Company gets from all advertisements",horizontal = TRUE,xlab="Clicks",ylab="Company")

boxplot(Spent,main="Amount the Company spends on advertisements",horizontal = TRUE,xlab="Amount",ylab="Company")

boxplot(Spent~age,main="Amount the Company spends on particular age groups",horizontal = TRUE,xlab="Amount",ylab="Age-Group")

boxplot(Clicks~age,main="Clicks the Company gets from particular age groups",horizontal = TRUE,xlab="Clicks",ylab="Age-Group")

boxplot(Total_Conversion~age,main="Queries the Company gets from particular age groups",horizontal = TRUE,xlab="Queries",ylab="Age-Group")

boxplot(Approved_Conversion~age,main="Actual sales the Company gets from particular age groups",horizontal = TRUE,xlab="Sales conversions",ylab="Age-Group")

Histograms of the variables (mostly to view their distributions):

hist(Clicks,breaks=30,col="blue")

hist(Spent,breaks=30,col="gold")

hist(Total_Conversion,col="red")

hist(Approved_Conversion,col="green")

library(lattice)

histogram(~gender | age,type="count",layout=c(4,1),col=c("pink","darkblue"))

ScatterPlots:

library(car)
# age is a factor, so a jittered strip chart is used in place of scatterplot()
stripchart(Approved_Conversion~age,method="jitter",main="Conversion rates according to age",xlab="Approved conversions",ylab="Age group")

scatterplot(Approved_Conversion~Total_Conversion,main="Approved vs Total Conversion Rates")

scatterplot(Approved_Conversion~Clicks,main="Clicks VS Success!!")

scatterplot(Approved_Conversion~Spent,main="Spent VS Success!!")

# gender is a factor with levels F and M, so a jittered strip chart is used in place of scatterplot()
stripchart(Approved_Conversion~gender,method="jitter",main="Approved conversions by gender",xlab="Approved conversions",ylab="Gender")