Marketing Campaign - Customer Exploration

a. Download data from kaggle

The data used in this project was downloaded from kaggle,“Customer Personality Analysis”.

b. Data Cleaning

I noticed that some observations have some mistakes, I will exclude these from the sample. I will also exclude the richest 1%, assuming that the income distribution behaves like a Lognormal, the richest tend to exhibit behavior that is not representative. Considering that they are outliers I will exclude them from this analysis.

richest<-quantile(db$Income, .99, na.rm = TRUE)

db1<-filter(db,
            Education != "2n Cycle",
            Education != "Basic",
            Marital_Status!= "Alone",
            Marital_Status!= "Together",
            Marital_Status!= "Absurd",
            Marital_Status!= "YOLO",
            Income <= richest)

c. Getting to know the data

Let’s get some summary statistics in order to get to know the data and in order to spot some mistakes.

table1<-summarise(db1, 
          income_mean = mean(Income, na.rm = TRUE),
          income_max = max(Income, na.rm = TRUE),
          income_min = min(Income, na.rm = TRUE),
          age_old = min(Year_Birth, na.rm = TRUE),
          age_young = max(Year_Birth, na.rm = TRUE),
          kid_mean = mean(Kidhome, na.rm = TRUE),
          teen_mean = mean(Teenhome, na.rm = TRUE),
          kid_max = max(Kidhome, na.rm = TRUE),
          teen_max = max(Teenhome, na.rm = TRUE),
          kid_min = min(Kidhome, na.rm = TRUE),
          teen_min = min(Teenhome, na.rm = TRUE))

kable(table1)%>%
  kable_styling()

income_mean	income_max	income_min	age_old	age_young	kid_mean	teen_mean	kid_max	teen_max	kid_min	teen_min
52442.8	93790	1730	1940	1995	0.4356298	0.5191371	2	2	0	0

kable(table(db1$Education, db1$Marital_Status))%>%
  kable_styling()

	Divorced	Married	Single	Widow
Graduation	117	428	242	35
Master	37	138	73	11
PhD	52	186	94	24

d. Exploration of correlations - response to campaigns

In order to see who responds to each type of campaign let’s run some regressions. The objective is to see the level of significance of the demographic variables. If we had more information about the speciic campaign, we could have expected results, but given that we have no information about the campaigns, how they work, who is the target, we run general regressions.

Dependent<-select(db1, AcceptedCmp1, AcceptedCmp2, AcceptedCmp3, AcceptedCmp4, AcceptedCmp5, Complain)
models1<-lapply(Dependent, function(x) lm(x ~ Education + Marital_Status + log(Income) + Kidhome + Teenhome, data=db1))
stargazer(models1, title="Models: Response to Campaigns", align=TRUE,  type='html')

**Models: Response to Campaigns**

	Dependent variable:

	x
	(1)	(2)	(3)	(4)	(5)	(6)

EducationMaster	-0.004	-0.006	-0.010	0.016	0.013	-0.005
	(0.017)	(0.007)	(0.019)	(0.019)	(0.017)	(0.007)

EducationPhD	-0.002	0.006	0.00003	0.016	0.018	-0.009
	(0.015)	(0.006)	(0.017)	(0.017)	(0.015)	(0.006)

Marital_StatusMarried	0.017	-0.0002	-0.029	0.014	0.028	0.004
	(0.018)	(0.008)	(0.021)	(0.021)	(0.019)	(0.007)

Marital_StatusSingle	0.002	0.003	-0.020	0.002	0.002	0.008
	(0.020)	(0.009)	(0.023)	(0.022)	(0.020)	(0.008)

Marital_StatusWidow	-0.010	0.001	-0.036	0.055	0.034	-0.002
	(0.032)	(0.014)	(0.037)	(0.036)	(0.033)	(0.013)

log(Income)	0.122^***	0.011^*	-0.004	0.080^***	0.156^***	0.001
	(0.014)	(0.006)	(0.017)	(0.016)	(0.015)	(0.006)

Kidhome	-0.036^***	-0.011^**	0.011	-0.042^***	-0.026^**	0.013^**
	(0.013)	(0.006)	(0.015)	(0.014)	(0.013)	(0.005)

Teenhome	-0.082^***	-0.003	-0.031^**	0.011	-0.115^***	0.004
	(0.011)	(0.005)	(0.013)	(0.013)	(0.012)	(0.005)

Constant	-1.196^***	-0.098	0.151	-0.795^***	-1.568^***	-0.016
	(0.158)	(0.069)	(0.182)	(0.178)	(0.161)	(0.064)


Observations	1,437	1,437	1,437	1,437	1,437	1,437
R²	0.105	0.011	0.007	0.047	0.153	0.008
Adjusted R²	0.100	0.005	0.001	0.041	0.148	0.003
Residual Std. Error (df = 1428)	0.232	0.101	0.267	0.260	0.236	0.095
F Statistic (df = 8; 1428)	20.848^***	1.931^*	1.182	8.718^***	32.272^***	1.507

Note:	p<0.1; p<0.05; p<0.01

People with higher income responded positively to campaigns 1, 4 and 5.
People with kids at home had a negative response to campaigns 1, 4 and 5.
Families with teens at home responded negatively to campaigns 1, 3 and 5.
The education level and matiral status were not significant in any category for any campaign.

e. Exploration of correlations - product preferences

#I need to incorportae the product variables!!!!!!!!!

Products<-select(db1, MntWines, MntFruits, MntMeatProducts, MntSweetProducts, MntGoldProds)
models2<-lapply(Products, function(x) lm(x ~ Education + Marital_Status + log(Income) + Kidhome + Teenhome, data=db1))
stargazer(models2, title="Models: Product Preferences", align=TRUE,  type='html')

**Models: Product Preferences**

	Dependent variable:

	x
	(1)	(2)	(3)	(4)	(5)

EducationMaster	54.368^***	-11.125^***	-14.077	-11.227^***	-9.448^***
	(17.710)	(2.400)	(11.593)	(2.387)	(3.336)

EducationPhD	99.945^***	-11.826^***	-27.860^***	-13.162^***	-21.166^***
	(15.865)	(2.150)	(10.385)	(2.138)	(2.989)

Marital_StatusMarried	-1.496	-2.643	5.796	0.922	-2.797
	(19.518)	(2.645)	(12.777)	(2.631)	(3.677)

Marital_StatusSingle	-14.703	-1.671	20.052	-0.456	-1.447
	(21.305)	(2.888)	(13.947)	(2.872)	(4.014)

Marital_StatusWidow	-26.792	-2.259	-0.343	5.820	5.471
	(34.391)	(4.661)	(22.513)	(4.636)	(6.479)

log(Income)	359.950^***	34.788^***	223.502^***	34.217^***	27.840^***
	(15.439)	(2.092)	(10.107)	(2.081)	(2.909)

Kidhome	-169.291^***	-14.004^***	-91.397^***	-14.763^***	-22.189^***
	(13.742)	(1.863)	(8.996)	(1.852)	(2.589)

Teenhome	-53.317^***	-17.520^***	-134.675^***	-16.105^***	-3.342
	(12.140)	(1.645)	(7.947)	(1.636)	(2.287)

Constant	-3,484.482^***	-326.002^***	-2,128.061^***	-322.541^***	-235.855^***
	(168.945)	(22.897)	(110.595)	(22.772)	(31.829)


Observations	1,437	1,437	1,437	1,437	1,437
R²	0.477	0.320	0.461	0.321	0.195
Adjusted R²	0.474	0.316	0.458	0.317	0.191
Residual Std. Error (df = 1428)	247.835	33.590	162.239	33.406	46.692
F Statistic (df = 8; 1428)	162.777^***	83.946^***	152.927^***	84.469^***	43.325^***

Note:	p<0.1; p<0.05; p<0.01

Having a master is positively correlated with wine consumpetion and negatively correlated with consumption of fruits, sweets and gold products.
Having a PhD is positively correlated with consumption of wine and negatively correlated with consumption of other products.
Marital status is not correlated with consumption of any type of good.
As expected, income is positively correlated with consumption of all the goods (this are normal goods, not inferior).
Surprisingly having kids and teens at home is negatively correlated with consumption of almost all the goods.

f. Exploration of correlations - Purchase type

#I need to incorportae the product variables!!!!!!!!!

Purchases<-select(db1, NumDealsPurchases, NumWebPurchases, NumCatalogPurchases, NumStorePurchases, NumWebVisitsMonth)
models3<-lapply(Purchases, function(x) lm(x ~ Education + Marital_Status + log(Income) + Kidhome + Teenhome, data=db1))
stargazer(models3, title="Models: Purchase type", align=TRUE,  type='html')

**Models: Purchase type**

	Dependent variable:

	x
	(1)	(2)	(3)	(4)	(5)

EducationMaster	-0.030	-0.126	-0.166	0.018	-0.016
	(0.121)	(0.165)	(0.153)	(0.171)	(0.131)

EducationPhD	-0.018	0.097	-0.051	-0.066	0.177
	(0.108)	(0.148)	(0.137)	(0.153)	(0.118)

Marital_StatusMarried	0.026	-0.005	-0.041	0.215	-0.052
	(0.133)	(0.182)	(0.168)	(0.188)	(0.145)

Marital_StatusSingle	-0.028	-0.152	-0.071	0.161	-0.092
	(0.145)	(0.199)	(0.183)	(0.205)	(0.158)

Marital_StatusWidow	0.031	-0.088	0.054	-0.226	-0.048
	(0.235)	(0.321)	(0.296)	(0.332)	(0.255)

log(Income)	-0.385^***	1.726^***	2.599^***	3.499^***	-2.595^***
	(0.105)	(0.144)	(0.133)	(0.149)	(0.114)

Kidhome	0.637^***	-1.217^***	-1.649^***	-1.634^***	0.954^***
	(0.094)	(0.128)	(0.118)	(0.132)	(0.102)

Teenhome	1.477^***	0.745^***	-0.972^***	-0.083	0.900^***
	(0.083)	(0.113)	(0.105)	(0.117)	(0.090)

Constant	5.453^***	-14.256^***	-23.984^***	-31.148^***	32.407^***
	(1.153)	(1.579)	(1.455)	(1.629)	(1.251)


Observations	1,437	1,437	1,437	1,437	1,437
R²	0.223	0.264	0.435	0.465	0.427
Adjusted R²	0.218	0.260	0.432	0.462	0.424
Residual Std. Error (df = 1428)	1.691	2.316	2.134	2.390	1.836
F Statistic (df = 8; 1428)	51.098^***	63.986^***	137.363^***	155.117^***	133.157^***

Note:	p<0.1; p<0.05; p<0.01

Income is significant for all the types of purchases but with different signs. The hugher the income the more likely they are to make purchases through the web catalogue and in store. And less likely to access deals and visit the web.
Having kids at home makes it more likely to buy deals and look at the web.
Having teens is correlated with accepting deals, doing web purchases and visiting the web and negatively correlated with catalog and store purchases.