Research design on the profitability of Virtual Try-on Technology on YSL’s lipstick sales

Part 1: Research Proposal

Executive Summary / Abstract

Many cosmetic companies today have introduced Virtual Try-on technology (VTO) on their online shopping websites. According to relevant studies, VTO can provide a better shopping experience and help customers find the right product. For companies, however, it remains unclear whether VTO improves sales revenue.

In this paper, we design an experiment to answer three questions:

1. Compared to on-line shopping without VTO, are mean sales of lipsticks higher in on-line shopping with VTO?
2. Compared to the difference in mean sales of lipsticks between shopping with and without VTO in customers older than 30, is the difference higher in customers younger than 30?
3. Compared to the difference in mean sales of lipsticks between shopping with and without VTO in best-seller products, is the difference higher in miscellaneous products?

Our experiment is based on YSL’s lipsticks and customers. We randomly select 2000 customers and assign them equally to two groups. In the control group, customers view 20 lipsticks and shop online without VTO. In the treatment group, customers use VTO for all 20 lipsticks before making their buying decisions. We balance the age and race of the selected customers, and the color and popularity of the lipsticks, according to YSL’s customer segmentation and product list.

We analyze the results of the experiment by comparing the mean sales of the two groups. To answer the first question, we perform a two-sample t-test to see whether the mean sales of the treatment group are significantly higher than those of the control group. To answer the second and third questions, we perform two-way ANOVA tests with Tukey’s HSD method to see whether the difference in mean sales of lipsticks between shopping with and without VTO differs significantly between the two customer age groups and between the two product groups.

We hope our research can provide cosmetic companies with a unique perspective on VTO’s profitability and usage.

Statement of the Problem

The fast development of e-commerce has given rise to various technologies that improve the customer experience in online shopping. Virtual try-on (VTO) is one of the most popular technologies used by cosmetic companies to help customers decide on the right product for them. For example, YSL has placed VTO in a prominent position on its website, see Figure 1. Once customers upload their images to the VTO system, it uses augmented reality (AR) to let them try on a product, e.g., lipstick, online and in real time. The system then renders the source image with makeup estimation, see Figure 2. VTO is applicable across multiple product ranges and customer appearances.

For customers, VTO helps them choose a suitable lipstick and creates a better online shopping experience. For companies, however, it remains an open question whether the investment in this technology is worthwhile. Stakeholders may ask: will VTO increase sales? If yes, by how much, and which products will benefit from it? These managerial questions can be answered through a designed experiment.

Research Questions, Hypotheses, and Effects

This paper aims to answer three research questions, with the hypotheses listed below:

1. Compared to on-line shopping without VTO, are mean sales of lipsticks higher in on-line shopping with VTO?

Null Hypo: Compared to on-line shopping without VTO, the mean sales of lipsticks are smaller than or equal in on-line shopping with VTO.

Alternative Hypo: Compared to on-line shopping without VTO, the mean sales of lipsticks are higher in on-line shopping with VTO.

Meaningful effect size: 10% higher in the mean sales of lipsticks with VTO than without VTO.

2. Compared to the difference in mean sales of lipsticks between shopping with and without VTO in customers older than 30, is the difference higher in customers younger than 30?

Null Hypo: Compared to the difference in mean sales of lipsticks between shopping with and without VTO in customers older than 30, the difference is smaller than or equal in customers younger than 30.

Alternative Hypo: Compared to the difference in mean sales of lipsticks between shopping with and without VTO in customers older than 30, the difference is higher in customers younger than 30.

Meaningful effect size: 10% higher in the difference in mean sales between with and without VTO in customers younger than 30 than the difference in customers older than 30.

3. Compared to the difference in mean sales of lipsticks between shopping with and without VTO in best-seller products, is the difference higher in miscellaneous products?

Null Hypo: Compared to the changes in mean sales of bestseller category lipsticks by using VTO, the change in mean sales of the miscellaneous category lipsticks by using VTO is not larger.

Alternative Hypo: Compared to the changes in mean sales of bestseller category lipsticks by using VTO, the change in mean sales of the miscellaneous category lipsticks by using VTO is larger.

Meaningful effect size: 10% higher in the difference in mean sales between with and without VTO in miscellaneous products than the difference in best-seller products.

Literature Review

Despite the wide adoption of VTO by leading cosmetic companies, it remains contested how VTO affects product sales and under what circumstances its effects differ. Earlier studies (Bialkova, 2022) mostly focus on the try-on experience, described by constructs such as ‘Satisfaction’, ‘Immersion’, and ‘Interactivity’, which are prone to inconsistency because subjective feelings differ greatly across individuals. Few studies have examined VTO’s impact on sales, a major concern for cosmetic companies that scholars have largely neglected. Additionally, when considering what may change the effects of VTO, most studies focus on technology and system design. For example, Atieh (2017) concluded that better aesthetic quality and estimation accuracy of AR lead to a higher level of enjoyment during the shopping experience, and Bigne (2021) stated that a system design that provides interactivity, realism, and ease of use positively influences the virtual try-on experience. However, besides technology, customers and products also greatly influence VTO’s effects. Take lipstick as an example. Some colors do not suit certain racial groups for cultural and aesthetic reasons (Collins, 2021), a fact that VTO cannot change. In this case, we expect that VTO will not help the sales of these products and is more likely to discourage customers. This gap in effects across customers and products was neglected by earlier studies, yet understanding it can help cosmetic companies make the best use of VTO by targeting the right customers and products. Given these limitations of current research, further study of the effects of VTO is necessary.

Research Plan

Population of Interest

The population of interest is customers who have purchased YSL’s lipsticks and had no virtual try-on experience before the study.

Sample Selection

The research aims to recruit 2000 customers who have purchased YSL’s lipsticks and have no prior virtual try-on experience. We cooperate with the YSL sales team: recruitment materials invite customers who intend to buy lipsticks to take a survey before they pay, and we then use the survey data to randomly select 2000 people as the sample. Because our study is designed to find out whether Virtual Try-On (VTO) technology helps increase lipstick sales and improve customers’ purchasing intention, we place a high value on participants’ potential interest in lipsticks. To include all target audiences who need lipsticks and to avoid bias, we will not oversample groups with higher purchase prevalence, such as women aged between 20 and 40.

We examine recruited customers’ profiles and purchase history (Table 1) to determine whether the growth or decline in sales after applying VTO differs by lipstick color preference. Product preference is also included to shed light on the potential application of VTO to other product categories such as blusher. Age group gives insight into customers’ perception of VTO, that is, whether a given age group finds VTO helpful and accurate.

Table 1. Socio-demographic Characteristics of Sample

Demographic characteristic	Sample
N	2000
Female, %	85%
Other (including male, other, prefer not to tell), %	15%
Average age (years)	28
Age group, %
Below 20	15
20-30	35
30-40	33
40-50	13
50-60	3
Above 60	1
Race, %
White	57.8
Hispanic and Latino	18.7
African American	12.1
Asian	5.9
Native American	0.7
Hawaiian	0.2
Other	4.6

Relationship status, %
Married or engaged	16
In a relationship	31
Single (Never married)	44
Other (Including divorced, separated, and widowed, etc)	9
Customer profile: purchase history
Lipsticks purchase preference, % (Who have bought one group more than others)
Best-seller	31
Reddish orange	20
Pink/rosy	22
Nude	18
Brunet	7
Others (have only purchased one lipstick & have bought each group evenly)	2
Product preference, %
Have only bought lipsticks	59
Have bought other products	36
Never buy at YSL’s official site	5

Operational Procedures

1）For research question 1 and research question 2

Participants will be randomly divided into two groups: group A and group B. Group A will be the experimental group, where participants use Virtual Try-On to help them shop for lipsticks, while group B will be the control group, where participants are only presented with photos of the lipsticks along with their colors and lipstick numbers. The experiment involves 20 lipsticks of different shades, see Table 2. Specifically, to account for different lipstick textures, we select two of YSL’s most heavily promoted series: VINYL CREAM LIP STAIN and ROUGE VOLUPTÉ SHINE LIPSTICK BALM. Each series contributes 10 lipsticks in 5 subcategories. One subcategory is the best-seller group, for which we assume that VTO technology has a negligible effect on customers’ purchasing intention and behavior. The other four subcategories form the miscellaneous group and are defined by lipstick color: Reddish orange, Pink/Rosy, Nude, and Brunet. Each subcategory has two lipsticks per series so that results do not depend on a single product within any group.

Table 2. 20 lipsticks in different shades

Categories	Best-seller Group	Miscellaneous Groups
		Reddish orange	Pink/rosy	Nude	Brunet
VINYL CREAM LIP STAIN	407	406	405	404	401
	416	408	410	434	409
ROUGE VOLUPTÉ SHINE LIPSTICK BALM	80	14	12	44	122
	83	46	84	150	131

2）For research question 3

Participants will again first be randomly divided into two groups: group A and group B. Group A will be the experimental group, where participants use Virtual Try-On to help them shop for lipsticks, while group B will be the control group, where participants are only presented with photos of the lipsticks along with their colors and lipstick numbers. Unlike the design for questions 1 and 2, we further divide group A into groups a1 and a2, and group B into groups b1 and b2. Groups a1 and b1 will only be presented with four lipsticks from the best-seller group, while groups a2 and b2 will only be presented with four lipsticks from the miscellaneous group. The dependent variable is the total number of lipsticks each participant buys. In this way, we can use a two-way ANOVA to test whether the effect of VTO differs between the lipstick style groups.

Brief Schedule

On Oct 21st, 2022, our team begins cooperating with YSL’s sales team to narrow down the appropriate population of interest, followed by random sampling of 2000 customers for the study. After sampling, we use a questionnaire survey to gather information and to fix the exact experiment day. The preparation stage is designed to take 15 days. Research implementation begins immediately after preparation: once our page designers provide a secure and exclusive interface, the experiment is conducted and finishes by Nov 25th. Since our three research questions are analyzed with one t-test and two two-way ANOVA tests, the data analysis lasts 5 days. From Nov 30th, with all experiments and simulations done, our team spends 5 days integrating results and writing the report to illustrate, both descriptively and quantitatively, how Virtual Try-on Technology affects YSL’s lipstick sales.

Table 3. Timetable

Procedures	Start date	Duration days	End date
Preparation	2022.10.21	15 days	2022.11.05
Research Implementation	2022.11.05	20 days	2022.11.25
Data Analysis	2022.11.25	5 days	2022.11.30
Experiment Report	2022.11.30	5 days	2022.12.05

Data Collection

The study will be conducted online, and participants can use their smartphones or laptops with cameras for the Virtual Try-On. Members of each group will be asked whether they are going to buy any lipsticks and, if so, which colors they would like to buy. Apart from these two questions, they will also be surveyed about other aspects of their experience in the experiment, and Table 2 will show the detailed factors we will analyze.

Data Security

First, we coordinate with YSL to recruit customers who volunteer to take part in our study. In this process, YSL will use its own servers and equipment to store all customer data. As a well-known fashion company, YSL invests heavily in protecting customers’ data from leaks, which supports data security. Next, researchers should be trained in data security before performing the study. The training should include three parts: (1) raise researchers’ awareness of data security by explaining what it means; (2) describe scenarios in which researchers might compromise data security; (3) explain the severe consequences of not complying with the data security requirements. Most importantly, during the study, researchers should be given only the limited permissions required to access the data they need. They should also use secure methods to transfer data, such as encrypted USB drives or encrypted documents sent via email. After the study is finished, all results involving customers’ data will also be stored on YSL’s servers.

Variables

Outcomes (dependent variables)

The outcome (dependent variable) is the quantity (or sales) of lipsticks each participant purchases by the end of the study. Researchers will show the 20 lipsticks to participants one by one, and participants will decide for each lipstick whether to purchase it. In this way, the study generates a binary outcome (0 represents not buy, 1 represents buy) for every lipstick and every subject. Adding up the binary outcomes across all lipsticks gives the total number of lipsticks each subject purchased. Every lipstick is priced at $40, so sales revenue is easily calculated from the quantity.
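The aggregation described above can be sketched in a few lines of R. This is only an illustration: the matrix `purchases` and its randomly generated 0/1 entries are hypothetical placeholders, not study data; only the 20 lipsticks and the $40 price come from the design.

```r
# Hypothetical illustration: 5 participants x 20 lipsticks, 0 = not buy, 1 = buy
set.seed(1)
n_participants <- 5
n_lipsticks <- 20
price <- 40  # every lipstick is priced at $40 in the design

# Placeholder purchase decisions (assumed values, for illustration only)
purchases <- matrix(
  rbinom(n_participants * n_lipsticks, size = 1, prob = 0.5),
  nrow = n_participants, ncol = n_lipsticks
)

quantity <- rowSums(purchases)  # number of lipsticks bought per participant
sales <- quantity * price       # sales in dollars per participant
```

Because the price is constant, tests on quantity and tests on dollar sales lead to the same conclusions; only the scale differs.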

Yi-En Tseng and 20%, Zijian Li and 20%, Yuxuan Zhang and 60%

Treatments (Independent Variables)

The treatment (independent variable) indicates whether participants purchase lipsticks after virtually trying them on. The VTO group includes those who purchase lipsticks after a virtual try-on, while the non-VTO group includes those who purchase lipsticks after only viewing the pictures. We expect customers to purchase more lipsticks with VTO because they are more likely to feel confident about buying lipsticks that suit them. However, another possibility is that customers will purchase fewer lipsticks with VTO because they are less likely to buy impulsively online. We therefore divide participants into a control group (non-VTO) and a treatment group (VTO) to study how VTO affects lipstick sales.

Yalan Liu and 50%, Wenlu Guo and 50%

Other variables

The first other variable is age. Those who are younger than 30 years old are considered young people, while those who are older than 30 years old are considered non-young people.

The second other variable is lipstick style. There are two different types of lipstick—best seller group and miscellaneous group. Best seller group includes 4 lipsticks and miscellaneous group includes 16 lipsticks.

Yalan Liu and 50%, Wenlu Guo and 50%

Statistical Analysis Plan

Question 1

In research question 1, we compare the mean sales of the treatment group (with VTO) with those of the control group (without VTO). The sales from one individual customer are made up of his or her buying choices on the 20 lipsticks. The sales of one group are the sum of the sales of all individual customers in that group. To analyze the difference in mean sales, we perform a two-sample t-test because our dependent variable is numeric and our independent variable is categorical with two levels (VTO and non-VTO).

We aim to reject the null hypothesis that the mean sales of the treatment group are smaller than or equal to those of the control group. We set the significance level for rejection at 0.05, and the minimum effect size to be considered meaningful at 10% higher mean sales in the treatment group than in the control group.
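A minimal sketch of this one-sided test in R follows. The data are simulated under assumed purchase probabilities (0.50 for control and 0.55 for treatment, applied uniformly to all 20 lipsticks); these values are illustrative assumptions and not the parameters used in the simulated studies.

```r
set.seed(42)
n_per_group <- 1000

# Assumed per-lipstick purchase probabilities (hypothetical values)
p_control <- 0.50
p_treatment <- 0.55  # about 10% higher than control

# Quantity bought out of 20 lipsticks for each customer
control <- rbinom(n_per_group, size = 20, prob = p_control)
treatment <- rbinom(n_per_group, size = 20, prob = p_treatment)

# One-sided two-sample t-test: H1 is mean(treatment) > mean(control)
result <- t.test(treatment, control, alternative = "greater")

# Observed relative lift, to compare against the 10% meaningful effect size
relative_lift <- mean(treatment) / mean(control) - 1

result$p.value
relative_lift
```

Reporting the relative lift alongside the p-value keeps statistical significance and the 10% practical threshold as separate checks.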

Question 2

In research question 2, we compare the difference in mean sales of lipsticks between shopping with and without VTO in customers older than 30 with the same difference in customers younger than 30. According to the combination of VTO and age, we divide customers into four groups: group A (young people and VTO), group B (young people and non-VTO), group C (non-young people and VTO), and group D (non-young people and non-VTO). Young people are those below 30 years old, while non-young people are those above 30 years old. To address the question, we use a two-way ANOVA model. Its test statistic is the ratio of the mean sum of squares between groups to the mean sum of squares within groups. If this ratio is significantly large, we can infer that there are significant differences between the four groups. In particular, the interaction term between VTO and age group tests whether the effect of VTO differs between the two age groups. By then looking at the group means, we can answer our question.
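The two-way ANOVA with an interaction term could be set up as in the sketch below. The data frame `dat`, its column names, and the cell probabilities are hypothetical and chosen only to illustrate the model; the interaction row of the ANOVA table is the one relevant to this research question.

```r
set.seed(7)
n <- 2000
dat <- data.frame(
  group = factor(rep(c("VTO", "Non-VTO"), each = n / 2)),
  age_group = factor(sample(c("Young", "Non_young"), n, replace = TRUE))
)

# Assumed per-lipstick purchase probabilities for each cell (hypothetical)
p <- ifelse(dat$group == "VTO" & dat$age_group == "Young", 0.60,
            ifelse(dat$group == "VTO", 0.52, 0.50))
dat$quantity <- rbinom(n, size = 20, prob = p)

# Two-way ANOVA with interaction between treatment and age group
fit <- aov(quantity ~ group * age_group, data = dat)
anova_table <- summary(fit)[[1]]
anova_table

# Difference in differences from the cell means:
# (VTO effect among young) minus (VTO effect among non-young)
cell_means <- tapply(dat$quantity, list(dat$group, dat$age_group), mean)
diff_in_diff <- (cell_means["VTO", "Young"] - cell_means["Non-VTO", "Young"]) -
  (cell_means["VTO", "Non_young"] - cell_means["Non-VTO", "Non_young"])
diff_in_diff
```

The F-test on the interaction is two-sided, so the sign of `diff_in_diff` still has to be checked to confirm that the effect is larger for younger customers.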

Question 3

Research question 3 asks whether the difference in mean sales of lipsticks between shopping with and without VTO is higher for miscellaneous lipsticks than for bestseller ones. Since there are two categorical independent variables (with VTO/without VTO and Bestseller/Miscellaneous lipsticks) and a numeric dependent variable (the sales of lipsticks), we choose a two-way ANOVA for the analysis. This method not only allows a separate analysis of the effect of each independent variable on the dependent variable, but also allows us to assess their combined (interaction) effect.

To make the analysis clearer, we code the treatment variable (with VTO/without VTO) as VTO/Non-VTO, and code Bestseller/Miscellaneous (lipsticks) as 1/0 to obtain a binary independent variable for further analysis.
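With this coding, the model and the follow-up Tukey HSD comparison might look like the sketch below. The column names `group` and `lipstyle`, the balanced 500-per-cell layout, and the purchase probabilities are assumptions made for illustration; the outcome is the number bought out of the four lipsticks shown to each subgroup.

```r
set.seed(11)
n <- 2000
dat <- data.frame(
  group = factor(rep(c("VTO", "Non-VTO"), each = n / 2)),
  lipstyle = factor(rep(c(1, 0), times = n / 2))  # 1 = bestseller, 0 = miscellaneous
)

# Assumed per-lipstick purchase probabilities (hypothetical values)
p <- ifelse(dat$lipstyle == "1",
            ifelse(dat$group == "VTO", 0.72, 0.70),
            ifelse(dat$group == "VTO", 0.55, 0.40))
dat$quantity <- rbinom(n, size = 4, prob = p)  # four lipsticks shown per subject

fit <- aov(quantity ~ group * lipstyle, data = dat)
summary(fit)

# Pairwise comparisons of the four group-by-lipstyle cells
tukey <- TukeyHSD(fit, which = "group:lipstyle")
cells <- tukey[["group:lipstyle"]]

# VTO effect within each lipstick style
lift_misc <- cells["VTO:0-Non-VTO:0", "diff"]
lift_best <- cells["VTO:1-Non-VTO:1", "diff"]
c(miscellaneous = lift_misc, bestseller = lift_best)
```

Row labels such as `VTO:0-Non-VTO:0` combine the levels of the two factors, so each row compares two of the four cells.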

Sample Size and Statistical Power

Our sample size is 2000 customers. This size is large enough for us to balance differences in customer profile that may affect buying behavior and usage of VTO, such as race, age, and purchasing experience. Additionally, with this large sample size, we obtain high statistical power (over 90%) for detecting the effect.

We assume cosmetic companies can freely choose the sample size from their own online shopping records. Considering the cost of conducting the experiment, we propose here the minimum sample size needed to detect the effect with over 85% power. For the first question, to identify a medium effect (0.5), we need at least 60 customers in each sample; to identify a small effect (0.2), we need at least 360 customers in each sample. For the second and third questions, we need at least 20 customers in each group (40 in total) for a medium effect and at least 114 customers in each group (228 in total) for a small effect.
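The per-sample sizes for the first question can be approximated with base R's `power.t.test`. The sketch below assumes the effect sizes 0.5 and 0.2 are standardized mean differences (Cohen's d, so `sd = 1`), a one-sided test at the 0.05 level, and 85% power. The helper `required_n` is introduced here for illustration only.

```r
# Hypothetical helper: per-group sample size for a one-sided two-sample t-test
required_n <- function(d, power = 0.85, sig_level = 0.05) {
  res <- power.t.test(delta = d, sd = 1, sig.level = sig_level, power = power,
                      type = "two.sample", alternative = "one.sided")
  ceiling(res$n)  # round up to whole customers
}

n_medium <- required_n(0.5)  # medium effect
n_small <- required_n(0.2)   # small effect
c(medium = n_medium, small = n_small)
```

Under these assumptions the results land close to the figures quoted for the first question; a two-sided test or a different power target would give larger numbers.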

Possible Recommendations

Question 1

If we can reject the null hypothesis and detect a meaningful improvement in sales from VTO, we will recommend that cosmetic companies introduce VTO. Cosmetic companies can increase their sales by encouraging customers to use VTO before making their buying decisions. However, if we fail to reject the null hypothesis, we will have found no evidence that VTO improves sales, and the profitability of this technology should be reconsidered.

Question 2

The result of the two-way ANOVA test shows that, compared to the difference in mean sales of lipsticks between shopping with and without VTO for customers older than 30, the difference is higher for customers younger than 30. This implies that younger customers are affected more by VTO than older customers. Hence the null hypothesis, that the difference in mean sales of lipsticks between shopping with and without VTO is smaller for customers younger than 30 than for customers older than 30, is rejected.

When VTO has a significant positive effect on customers younger than 30, the finding can strongly influence decision-making at YSL, from the marketing department up to management. A useful interpretation is that, when buying lipsticks, younger customers find VTO helpful in judging whether the lipsticks suit them, and sales increase accordingly. It is hard to choose which customers to show VTO to, since we do not have profiles for new customers. However, there are a few key takeaways. Younger customers tend to be more interactive and more accepting of changes in the online shopping process, which leaves room for future marketing applications. Business recommendations can be drawn from this finding. For example, interactive games or pop-up buttons for marketing purposes may interest younger customers and help the company reach its target impressions for certain content. On the other hand, the finding also points to the difference between online shopping and brick-and-mortar shopping, since the difference in sales is driven by the online shopping behavior of customers of different ages. Customers’ inclination toward online versus brick-and-mortar shopping is worth looking into. For online shopping, younger customers are more likely to buy with VTO; however, things may go differently in a mall. Younger customers may not care for paper coupons as older customers do. The company needs to know where its sales come from, why the sales are made, and who makes the purchase through which channel.

However, if the null hypothesis is not rejected, we cannot conclude that the sales to younger customers are affected by VTO technology more than the sales to older customers. Failure to reject the null hypothesis would overturn our perception of customers’ purchasing behavior, and would merit more in-depth research on the interaction of factors contributing to this result. To make further recommendations, we would seek to answer (1) whether other confounding factors make younger customers consider VTO unhelpful, and (2) what, instead of VTO, reassures younger customers that the lipstick colors suit them.

Question 3

Our null hypothesis was that VTO exerts the same effect on the mean sales of both the miscellaneous lipsticks and the bestseller ones. Based on the results of our analysis, we reject it. The two-way ANOVA test gives statistically significant results indicating that the usage of VTO has a larger influence on the mean sales of miscellaneous lipsticks. In detail, each independent variable (group and lipstyle) has a p-value of less than 0.05, and their interaction also has a significant p-value of 0.0079. We therefore conclude in favor of the alternative hypothesis under this scenario: the difference in mean sales of lipsticks between shopping with and without VTO is higher for miscellaneous lipsticks than for bestseller ones.

In addition to this conclusion, we use Tukey’s Honest Significant Difference test to quantify the difference between VTO’s effect on the bestseller group and on the miscellaneous group. The data show that VTO increases the mean sales of bestseller lipsticks by 0.21, while it increases the mean sales of miscellaneous lipsticks by 0.45. The gap between these two improvements is 0.24. By contrast, the difference in mean sales between bestseller and miscellaneous products without VTO is 0.896, as indicated by the comparison Non-VTO:1-Non-VTO:0. These results further support the alternative hypothesis of our research question and thus the rejection of the null hypothesis.

Limitations and Uncertainties

There are two major limitations. First, our research has only one dependent variable, the quantity (or sales) of lipsticks. This is one-sided, because participants may find VTO novel the first time they use it and purchase more lipsticks as a result. Also, the organization cannot rely on sales alone to make decisions, since customers’ purchasing behavior can be assessed from multiple angles such as experience, perception, and satisfaction. In the future, we could add more dependent variables to our experiment. Second, our research focuses only on lipstick. YSL has many make-up products, such as eye shadow and blush, and customers could also use VTO when purchasing these products online. In the future, we could explore the effects of VTO on other make-up products.

Part 2: Simulated Studies

Research Question 1:

Compared to on-line shopping without VTO, are mean sales of lipsticks higher in on-line shopping with VTO?

Scenario 1: No Effect

In this research question, we define no-effect as the situation where the mean sales of lipsticks with VTO is lower or no more than 5% higher than the mean sales without VTO.

Simulation

We use simulation techniques to generate a dataset that enables us to test the effect. We use the rbinom function in R to generate binary variables, which indicate whether a customer buys a given lipstick or not. To simulate the situation where there is no significant difference in mean sales between using VTO and not, we set similar probabilities of buying lipsticks in the two groups.

For the customer group that uses VTO, we set the probability of buying each lipstick between 0.4 and 0.8. For the customer group that does not use VTO, we set the probability of buying each lipstick to be around 5% higher or lower than the probability in the VTO group. The buying probabilities for each group are summarized by use of VTO, age group, and product type in Table 4.

Table 4. Parameter setting for no-effect scenario simulation

Treatment	Age group	Best seller	Reddish orange	Pink or rosy	Nude	Brunet
VTO	Below 30	0.6-0.8	0.6	0.5-0.7	0.5-0.7	0.6-0.7
VTO	Over 30	0.6	0.4	0.4	0.4	0.4
Non-VTO	Below 30	0.6-0.7	0.43-0.53	0.73-0.83	0.43-0.63	0.43-0.73
Non-VTO	Over 30	0.55	0.3-0.5	0.3-0.5	0.3-0.4	0.3-0.5

#We pack the data-generation code into a function
n<-2000
library(data.table)
library(dplyr)
library(forcats)
experiment<-function(seedn){
set.seed(seed=seedn)
n<-2000
group<-c(rep.int(x='VTO',times=n/2),rep.int(x='Non-VTO',times=n/2))
VTO.dat<- data.table(group=group)
VTO.dat[,Age := sample(x=c("Below 20" ,"20-30" , "30-40" ,"40-50" , "50-60" ,"Above 60"), size = n, replace = TRUE, prob=c(0.15,0.35, 0.33, 0.13,0.03, 0.01))]
VTO.dat <- VTO.dat %>%  
  mutate(Age_group = fct_recode(.f = Age, "Young_people" = "Below 20", "Young_people" = "20-30", "Non_young_people" = "30-40", "Non_young_people" = "40-50","Non_young_people" = "Above 60", "Non_young_people" = "50-60"))
nVTO_young=nrow(VTO.dat[group=='VTO'&Age_group=='Young_people'])
nNonVTO_young=nrow(VTO.dat[group=='Non-VTO'&Age_group=='Young_people'])
nVTO_old=nrow(VTO.dat[group=='VTO'&Age_group=='Non_young_people'])
nNonVTO_old=nrow(VTO.dat[group=='Non-VTO'&Age_group=='Non_young_people'])

VTO.dat<- data.table(VTO.dat)

#create binary variables (1 = buy, 0 = not buy) indicating whether each customer buys a given lipstick
#quantity_Best_seller_Group-use_vto
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_407 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_407 := rbinom(n=nVTO_old,size=1,prob=0.7) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_416 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_416 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_80 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_80 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_83 := rbinom(n=nVTO_young,size=1,prob=0.6) ]  
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_83 := rbinom(n=nVTO_old,size=1,prob=0.6) ] 

#quantity_Best_seller_Group-not_use_vto
VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_407 := rbinom(n=nNonVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_407 := rbinom(n=nNonVTO_old,size=1,prob=0.55) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_416 := rbinom(n=nNonVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_416 := rbinom(n=nNonVTO_old,size=1,prob=0.55) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_80 := rbinom(n=nNonVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_80 := rbinom(n=nNonVTO_old,size=1,prob=0.55) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_83 := rbinom(n=nNonVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_83 := rbinom(n=nNonVTO_old,size=1,prob=0.55) ]

#add up the best seller
VTO.dat[,quantity_Best_seller_Group := quantity_407+quantity_416 +quantity_80+quantity_83 ]

#__________________________________________________________________
#quantity_reddish_orange
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_406 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_406 := rbinom(n=nVTO_old,size=1,prob=0.7) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_408 := rbinom(n=nVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_408 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_14 := rbinom(n=nVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_14 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_46 := rbinom(n=nVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_46 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_406 := rbinom(n=nNonVTO_young,size=1,prob=0.53) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_406 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_408 := rbinom(n=nNonVTO_young,size=1,prob=0.43) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_408 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_14 := rbinom(n=nNonVTO_young,size=1,prob=0.53) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_14 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_46 := rbinom(n=nNonVTO_young,size=1,prob=0.43) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_46 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[,quantity_reddish_orange := quantity_406+quantity_408 +quantity_14+quantity_46] 

#__________________________________________________________________
#quantity_pink_or_rosy
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_405 := rbinom(n=nVTO_young,size=1,prob=0.8) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_405 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_410 := rbinom(n=nVTO_young,size=1,prob=0.8) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_410 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_12 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_12 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_84 := rbinom(n=nVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_84 := rbinom(n=nVTO_old,size=1,prob=0.3) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_405 := rbinom(n=nNonVTO_young,size=1,prob=0.83) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_405 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_410 := rbinom(n=nNonVTO_young,size=1,prob=0.83) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_410 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_12 := rbinom(n=nNonVTO_young,size=1,prob=0.73) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_12 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_84 := rbinom(n=nNonVTO_young,size=1,prob=0.83) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_84 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[,quantity_pink_or_rosy := quantity_405+quantity_410 +quantity_12+quantity_84]

#__________________________________________________________________
#quantity_nude
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_404 := rbinom(n=nVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_404 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_434 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_434 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_44 := rbinom(n=nVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_44 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_150 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_150 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_404 := rbinom(n=nNonVTO_young,size=1,prob=0.43) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_404 := rbinom(n=nNonVTO_old,size=1,prob=0.2) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_434 := rbinom(n=nNonVTO_young,size=1,prob=0.63) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_434 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_44 := rbinom(n=nNonVTO_young,size=1,prob=0.63) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_44 := rbinom(n=nNonVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_150 := rbinom(n=nNonVTO_young,size=1,prob=0.53) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_150 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[,quantity_nude := quantity_404+quantity_434 +quantity_44+quantity_150]


#__________________________________________________________________
#quantity_brunet

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_401 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_401 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_409 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_409 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_122 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_122 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_131 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_131 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_401 := rbinom(n=nNonVTO_young,size=1,prob=0.43) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_401 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_409 := rbinom(n=nNonVTO_young,size=1,prob=0.73) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_409 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_122 := rbinom(n=nNonVTO_young,size=1,prob=0.43) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_122 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_131 := rbinom(n=nNonVTO_young,size=1,prob=0.63) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_131 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[,quantity_brunet := quantity_401+quantity_409 +quantity_122+quantity_131]

#__________________________________________________________________
#quantity and sales 
VTO.dat[,quantity  := quantity_Best_seller_Group+quantity_reddish_orange+quantity_pink_or_rosy+quantity_nude+quantity_brunet]
VTO.dat[,sales := quantity*40]

  return(VTO.dat)
}

#create analyze.experiment function to perform 2 sample t test with the datasets generated above
analyze.experiment<-function(the.dat){
  #t.test
  salestest<-t.test(x=the.dat[group == "VTO",sales],y=the.dat[group == "Non-VTO",sales],alternative = "greater")
  #collect coefficients and calculate effect size
  sales.effect<-salestest$estimate[1]-salestest$estimate[2]
  effect.size<-sales.effect/salestest$estimate[2]
  lower.bound<-salestest$conf.int[1]
  upper.bound<-salestest$conf.int[2]
  t<-salestest$statistic
  p<-salestest$p.value
  result<-data.table(effect=sales.effect,upper_ci=upper.bound,lower_ci=lower.bound,effect.size=effect.size,t=t,p=p)
  return(result)
}

#perform the simulation B=2000 times; each round uses its own seed, so every simulated data set differs
B<-2000 #number of simulated experiments
n<-2000 #number of customers per experiment
RNGversion(vstr="3.6.0")
set.seed(seed=198)
#run experiment() once per round and record the round number as the first column
s<-rbindlist(lapply(X=1:B,FUN=function(seedn){
  sim<-experiment(seedn)
  sim[,round:=seedn]
  return(sim)
}))
setcolorder(s,"round")

Analysis

We run the simulation 2000 times to estimate the mean effect. Across the 2000 simulated experiments, the mean effect is 1.728, which suggests that there is no significant difference in mean sales between shopping with and without VTO. Also, the p-value is higher than 0.05 in 91.6% of simulations, so the test rarely reports an effect that does not exist (a low false-positive rate).

If our future study based on real customer data yields such results, we would conclude that VTO does not improve our sales revenue in general.

Table 5. Simulation results for no-effect scenario

Research Question	Scenario	Mean Effect in Simulated Data	95% Confidence Interval of Mean Effect	Percentage of False Positives	Percentage of True Negatives	Percentage of False Negatives	Percentage of True Positives
Question 1	No Effect	1.728	[-26.53, inf]	0.0315	0.9685
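
As a minimal sketch of how the false-positive and true-negative shares in a no-effect scenario can be obtained, the base-R example below repeats a one-sided two-sample t-test on data where both groups share the same purchase probability. The helper name simulate_null_experiment, the single probability of 0.5 for all 20 lipsticks, and the 200 replications are assumptions for illustration; the study itself uses lipstick-specific probabilities and 2000 rounds.

```r
# Hypothetical helper: simulate one experiment in which VTO has no effect
# and return the p-value of the one-sided two-sample t-test.
simulate_null_experiment <- function(seed, n_per_group = 1000, n_lipsticks = 20,
                                     buy_prob = 0.5, price = 40) {
  set.seed(seed)
  # Quantity per customer = number of lipsticks bought out of n_lipsticks;
  # sales = quantity * price, as in the simulation code of this section.
  sales_vto <- rbinom(n = n_per_group, size = n_lipsticks, prob = buy_prob) * price
  sales_non <- rbinom(n = n_per_group, size = n_lipsticks, prob = buy_prob) * price
  t.test(x = sales_vto, y = sales_non, alternative = "greater")$p.value
}

# Assumed number of replications, kept small so the sketch runs quickly.
p_values <- vapply(1:200, simulate_null_experiment, numeric(1))

# With no true effect, every p < 0.05 is a false positive.
false_positive_rate <- mean(p_values < 0.05)
true_negative_rate <- 1 - false_positive_rate
```

Under these assumptions the false-positive share should sit near the 5% significance level, which is the benchmark the simulated share can be compared against.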

# analyze the t-test result using the analyze.experiment function
s.results=s[,analyze.experiment(the.dat=.SD),keyby='round']
s.results[,summary(effect)]

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.84   13.23   16.32   16.38   19.56   31.20

summary(s.results)

     round            effect         upper_ci      lower_ci     
 Min.   :   1.0   Min.   : 1.84   Min.   :Inf   Min.   :-5.861  
 1st Qu.: 500.8   1st Qu.:13.23   1st Qu.:Inf   1st Qu.: 5.524  
 Median :1000.5   Median :16.32   Median :Inf   Median : 8.594  
 Mean   :1000.5   Mean   :16.38   Mean   :Inf   Mean   : 8.640  
 3rd Qu.:1500.2   3rd Qu.:19.56   3rd Qu.:Inf   3rd Qu.:11.826  
 Max.   :2000.0   Max.   :31.20   Max.   :Inf   Max.   :23.542  
  effect.size             t                p            
 Min.   :0.004412   Min.   :0.3932   Min.   :0.0000000  
 1st Qu.:0.032443   1st Qu.:2.8200   1st Qu.:0.0000167  
 Median :0.040302   Median :3.4737   Median :0.0002624  
 Mean   :0.040458   Mean   :3.4846   Mean   :0.0064424  
 3rd Qu.:0.048501   3rd Qu.:4.1589   3rd Qu.:0.0024256  
 Max.   :0.078305   Max.   :6.7050   Max.   :0.3471123

s.results[,mean(p<0.05)] #the power of experiment; 1-power = type2 error

[1] 0.9685

s.results[,mean(effect.size>0.1)]

[1] 0

Scenario 2: An Expected Effect

In this research question, we define a meaningful effect as mean lipstick sales with VTO being at least 10% higher than mean sales without VTO.
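
The 10% criterion can be illustrated with a short worked example. This is only a sketch: relative_effect is a hypothetical helper that mirrors the effect.size quantity computed in analyze.experiment, and the two group means are assumed values, not results from the study.

```r
# Hypothetical helper mirroring effect.size in analyze.experiment:
# the uplift in mean sales expressed as a share of the non-VTO mean.
relative_effect <- function(mean_vto, mean_non_vto) {
  (mean_vto - mean_non_vto) / mean_non_vto
}

# Assumed example values (not results from the study).
mean_non_vto <- 400
mean_vto <- 448

effect <- relative_effect(mean_vto, mean_non_vto)  # 48 / 400 = 0.12
is_meaningful <- effect >= 0.10                    # TRUE: 12% clears the 10% threshold
```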

Simulation

We use simulation techniques to generate a data set that will enable us to test the effect. We use the rbinom function in R to generate binary variables that indicate whether a customer buys a given lipstick. For customers who use VTO and are aged below 30, we set the probability of buying best-seller products to 0.8 and miscellaneous products to 0.7. For customers who use VTO and are aged above 30, we set the probability of buying best-seller products to 0.6 and miscellaneous products to 0.4. For customers who do not use VTO and are aged below 30, we set the probability of buying best-seller products between 0.78 and 0.81 and miscellaneous products between 0.63 and 0.73. For customers who do not use VTO and are aged above 30, we set the probability of buying best-seller products between 0.59 and 0.61 and miscellaneous products between 0.34 and 0.43. The buying probability for each group is summarized by use of VTO, age group, and product type in Table 6.

Table 6. Parameter settings for the expected-effect scenario simulation

Treatment	Age group	Best seller	Reddish orange	Pink or rosy	Nude	Brunet
VTO	Below 30	0.8	0.7	0.7	0.7	0.7
VTO	Over 30	0.6	0.4	0.4	0.4	0.4
Non-VTO	Below 30	0.78-0.81	0.63-0.73	0.7	0.63-0.73	0.68-0.72
Non-VTO	Over 30	0.59-0.61	0.34-0.43	0.39-0.41	0.38-0.4	0.38-0.4
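
The per-group draws described above can be sketched compactly by looking up each customer's purchase probability from a small named vector and calling rbinom once. This is an illustrative base-R sketch, not the authors' code: the object names are hypothetical, the probabilities cover a single best-seller lipstick, and the non-VTO values are assumed midpoints of the ranges in the table.

```r
# Assumed purchase probabilities for one best-seller lipstick
# (VTO cells as stated; non-VTO cells are midpoints of the stated ranges).
buy_prob <- c("VTO.Young" = 0.8, "VTO.Old" = 0.6,
              "Non-VTO.Young" = 0.795, "Non-VTO.Old" = 0.6)

set.seed(1)
n <- 2000
customers <- data.frame(
  group = rep(c("VTO", "Non-VTO"), each = n / 2),
  # Half of the customers fall below 30, matching the age shares used in the simulation.
  age_group = sample(c("Young", "Old"), size = n, replace = TRUE, prob = c(0.5, 0.5))
)

# Look up each customer's probability, then draw all purchases in one vectorised call.
cell <- paste(customers$group, customers$age_group, sep = ".")
customers$bought <- rbinom(n = n, size = 1, prob = buy_prob[cell])

# Observed purchase share per treatment-by-age cell.
purchase_rate <- tapply(customers$bought, cell, mean)
```

The lookup replaces one rbinom call per cell with a single call, which keeps the parameter table and the draws in one place; the observed shares in purchase_rate should land close to the probabilities in buy_prob.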

#We pack the data-generation code into a function
n<-2000
library(data.table)
library(dplyr)   #provides %>% and mutate()
library(forcats) #provides fct_recode()
experiment<-function(seedn){
set.seed(seed=seedn)
n<-2000
group<-c(rep.int(x='VTO',times=n/2),rep.int(x='Non-VTO',times=n/2))
VTO.dat<- data.table(group=group)
VTO.dat[,Age := sample(x=c("Below 20" ,"20-30" , "30-40" ,"40-50" , "50-60" ,"Above 60"), size = 2000, replace =T,prob=c(0.15,0.35, 0.33, 0.13,0.03, 0.01))]
VTO.dat <- VTO.dat %>%  
  mutate(Age_group = fct_recode(.f = Age, "Young_people" = "Below 20", "Young_people" = "20-30", "Non_young_people" = "30-40", "Non_young_people" = "40-50","Non_young_people" = "Above 60", "Non_young_people" = "50-60"))
nVTO_young=nrow(VTO.dat[group=='VTO'&Age_group=='Young_people'])
nNonVTO_young=nrow(VTO.dat[group=='Non-VTO'&Age_group=='Young_people'])
nVTO_old=nrow(VTO.dat[group=='VTO'&Age_group=='Non_young_people'])
nNonVTO_old=nrow(VTO.dat[group=='Non-VTO'&Age_group=='Non_young_people'])

VTO.dat<- data.table(VTO.dat)

#create binary variables that indicate whether each customer buys this lipstick (1) or not (0)
#quantity_Best_seller_Group-use_vto
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_407 := rbinom(n=nVTO_young,size=1,prob=0.9) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_407 := rbinom(n=nVTO_old,size=1,prob=0.7) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_416 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_416 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_80 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_80 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_83 := rbinom(n=nVTO_young,size=1,prob=0.6) ]  
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_83 := rbinom(n=nVTO_old,size=1,prob=0.6) ] 

#quantity_Best_seller_Group-not_use_vto
VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_407 := rbinom(n=nNonVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_407 := rbinom(n=nNonVTO_old,size=1,prob=0.55) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_416 := rbinom(n=nNonVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_416 := rbinom(n=nNonVTO_old,size=1,prob=0.55) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_80 := rbinom(n=nNonVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_80 := rbinom(n=nNonVTO_old,size=1,prob=0.55) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_83 := rbinom(n=nNonVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_83 := rbinom(n=nNonVTO_old,size=1,prob=0.55) ]

#add up the best seller
VTO.dat[,quantity_Best_seller_Group := quantity_407+quantity_416 +quantity_80+quantity_83 ]

#__________________________________________________________________
#quantity_reddish_orange
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_406 := rbinom(n=nVTO_young,size=1,prob=0.9) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_406 := rbinom(n=nVTO_old,size=1,prob=0.7) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_408 := rbinom(n=nVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_408 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_14 := rbinom(n=nVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_14 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_46 := rbinom(n=nVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_46 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_406 := rbinom(n=nNonVTO_young,size=1,prob=0.53) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_406 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_408 := rbinom(n=nNonVTO_young,size=1,prob=0.43) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_408 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_14 := rbinom(n=nNonVTO_young,size=1,prob=0.53) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_14 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_46 := rbinom(n=nNonVTO_young,size=1,prob=0.43) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_46 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[,quantity_reddish_orange := quantity_406+quantity_408 +quantity_14+quantity_46] 

#__________________________________________________________________
#quantity_pink_or_rosy
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_405 := rbinom(n=nVTO_young,size=1,prob=0.9) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_405 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_410 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_410 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_12 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_12 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_84 := rbinom(n=nVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_84 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_405 := rbinom(n=nNonVTO_young,size=1,prob=0.83) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_405 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_410 := rbinom(n=nNonVTO_young,size=1,prob=0.83) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_410 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_12 := rbinom(n=nNonVTO_young,size=1,prob=0.73) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_12 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_84 := rbinom(n=nNonVTO_young,size=1,prob=0.83) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_84 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[,quantity_pink_or_rosy := quantity_405+quantity_410 +quantity_12+quantity_84]

#__________________________________________________________________
#quantity_nude
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_404 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_404 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_434 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_434 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_44 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_44 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_150 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_150 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_404 := rbinom(n=nNonVTO_young,size=1,prob=0.43) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_404 := rbinom(n=nNonVTO_old,size=1,prob=0.2) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_434 := rbinom(n=nNonVTO_young,size=1,prob=0.63) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_434 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_44 := rbinom(n=nNonVTO_young,size=1,prob=0.63) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_44 := rbinom(n=nNonVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_150 := rbinom(n=nNonVTO_young,size=1,prob=0.53) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_150 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[,quantity_nude := quantity_404+quantity_434 +quantity_44+quantity_150]


#__________________________________________________________________
#quantity_brunet

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_401 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_401 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_409 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_409 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_122 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_122 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_131 := rbinom(n=nVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_131 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_401 := rbinom(n=nNonVTO_young,size=1,prob=0.43) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_401 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_409 := rbinom(n=nNonVTO_young,size=1,prob=0.73) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_409 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_122 := rbinom(n=nNonVTO_young,size=1,prob=0.43) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_122 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_131 := rbinom(n=nNonVTO_young,size=1,prob=0.63) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_131 := rbinom(n=nNonVTO_old,size=1,prob=0.3) ]

VTO.dat[,quantity_brunet := quantity_401+quantity_409 +quantity_122+quantity_131]

#__________________________________________________________________
#quantity and sales 
VTO.dat[,quantity  := quantity_Best_seller_Group+quantity_reddish_orange+quantity_pink_or_rosy+quantity_nude+quantity_brunet]
VTO.dat[,sales := quantity*40]

  return(VTO.dat)
}
#create analyze.experiment function to perform 2 sample t test with the datasets generated above
analyze.experiment<-function(the.dat){
  #t.test
  salestest<-t.test(x=the.dat[group == "VTO",sales],y=the.dat[group == "Non-VTO",sales],alternative = "greater")
  #collect coefficients and calculate effect size
  sales.effect<-salestest$estimate[1]-salestest$estimate[2]
  effect.size<-sales.effect/salestest$estimate[2]
  lower.bound<-salestest$conf.int[1]
  upper.bound<-salestest$conf.int[2]
  t<-salestest$statistic
  p<-salestest$p.value
  result<-data.table(effect=sales.effect,upper_ci=upper.bound,lower_ci=lower.bound,effect.size=effect.size,t=t,p=p)
  return(result)
}
#perform the simulation B=2000 times; each round uses its own seed, so every simulated data set differs
B<-2000 #number of simulated experiments
n<-2000 #number of customers per experiment
RNGversion(vstr="3.6.0")
set.seed(seed=198)
#run experiment() once per round and record the round number as the first column
s<-rbindlist(lapply(X=1:B,FUN=function(seedn){
  sim<-experiment(seedn)
  sim[,round:=seedn]
  return(sim)
}))
setcolorder(s,"round")

Analysis

Across the 2000 simulated experiments, the mean effect (the difference in mean sales) is 20.27. This effect equals 12.85% of the mean sales of the non-VTO group, which suggests a significant and meaningful increase in sales when VTO is used. Also, the p-value is lower than 0.05 in 99.1% of simulations, so the experiment has strong statistical power.

If our future study based on real customer data yields such results, we would conclude that VTO improves our sales revenue by at least 10% in general.

Table 7. Simulation results for the expected-effect scenario

Research Question	Scenario	Mean Effect in Simulated Data	95% Confidence Interval of Mean Effect	Percentage of False Positives	Percentage of True Negatives	Percentage of False Negatives	Percentage of True Positives
Question 1	Effect (Expected Size)	20.27	[-6.13, inf]			0.0475	0.984
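
The two shares reported for an effect scenario, the proportion of simulations with p < 0.05 (power) and the proportion whose relative effect exceeds 10%, can be computed as in the sketch below. It uses base R only; the helper name, the purchase probabilities of 0.50 (non-VTO) and 0.56 (VTO), and the 200 replications are assumptions chosen to give roughly a 12% uplift, not the study's parameters.

```r
# Hypothetical helper: simulate one experiment with a true VTO uplift and
# return the one-sided p-value and the relative effect size.
simulate_effect_experiment <- function(seed, n_per_group = 1000, n_lipsticks = 20,
                                       prob_non = 0.50, prob_vto = 0.56, price = 40) {
  set.seed(seed)
  sales_vto <- rbinom(n = n_per_group, size = n_lipsticks, prob = prob_vto) * price
  sales_non <- rbinom(n = n_per_group, size = n_lipsticks, prob = prob_non) * price
  test <- t.test(x = sales_vto, y = sales_non, alternative = "greater")
  # estimate[1] is the VTO mean, estimate[2] the non-VTO mean.
  c(p = test$p.value,
    effect.size = unname((test$estimate[1] - test$estimate[2]) / test$estimate[2]))
}

# One row per simulated experiment, columns "p" and "effect.size".
results <- t(vapply(1:200, simulate_effect_experiment, numeric(2)))

power <- mean(results[, "p"] < 0.05)                      # share of true positives
share_meaningful <- mean(results[, "effect.size"] > 0.1)  # share above the 10% threshold
```

One minus power gives the share of false negatives under the same assumptions.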

# analyze the t-test result using the analyze.experiment function
s.results=s[,analyze.experiment(the.dat=.SD),keyby='round']
s.results[,summary(effect)]

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  35.64   49.00   52.32   52.32   55.68   70.32

summary(s.results)

     round            effect         upper_ci      lower_ci    
 Min.   :   1.0   Min.   :35.64   Min.   :Inf   Min.   :27.24  
 1st Qu.: 500.8   1st Qu.:49.00   1st Qu.:Inf   1st Qu.:40.57  
 Median :1000.5   Median :52.32   Median :Inf   Median :43.87  
 Mean   :1000.5   Mean   :52.32   Mean   :Inf   Mean   :43.87  
 3rd Qu.:1500.2   3rd Qu.:55.68   3rd Qu.:Inf   3rd Qu.:47.20  
 Max.   :2000.0   Max.   :70.32   Max.   :Inf   Max.   :61.97  
  effect.size           t                p            
 Min.   :0.0867   Min.   : 6.983   Min.   :0.000e+00  
 1st Qu.:0.1195   1st Qu.: 9.524   1st Qu.:0.000e+00  
 Median :0.1285   Median :10.182   Median :0.000e+00  
 Mean   :0.1285   Mean   :10.193   Mean   :1.420e-15  
 3rd Qu.:0.1371   3rd Qu.:10.844   3rd Qu.:0.000e+00  
 Max.   :0.1743   Max.   :13.861   Max.   :1.965e-12

s.results[,mean(p<0.05)] #the power of experiment; 1-power = type2 error

[1] 1

s.results[,mean(effect.size>0.1)]

[1] 0.984

Research Question 2:

Compared to the difference in mean sales of lipsticks between shopping with and without VTO in customers older than 30, is the difference higher in customers younger than 30?

Scenario 1: No Effect

In this research question, we define no effect as the situation where the difference in mean sales of lipsticks between shopping with and without VTO in customers younger than 30 is lower than or equal to the same difference in customers older than 30.
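
This definition compares two uplifts, so it can be written as a difference in differences. The sketch below is a worked example with assumed group means; diff_in_diff is a hypothetical helper and none of the numbers come from the study.

```r
# Hypothetical helper: VTO uplift among younger customers minus
# VTO uplift among older customers.
diff_in_diff <- function(vto_young, non_young, vto_old, non_old) {
  (vto_young - non_young) - (vto_old - non_old)
}

# Assumed mean sales per group, for illustration only.
did_equal <- diff_in_diff(420, 400, 380, 360)   # 20 - 20 = 0
did_larger <- diff_in_diff(460, 400, 380, 360)  # 60 - 20 = 40

# "No effect" holds when the younger group's uplift does not exceed the older group's.
no_effect_equal <- did_equal <= 0    # TRUE
no_effect_larger <- did_larger <= 0  # FALSE
```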

Simulation

We use simulation techniques to generate a dataset that will enable us to test the effect. We use the rbinom function in R to generate binary variables that indicate whether a customer buys a given lipstick. To simulate the situation where all four groups have the same sales, we set similar buying probabilities in the four groups.

For customers who use VTO and are aged below 30, we set the probability of buying each lipstick between 0.4 and 0.8. For customers who use VTO and are aged over 30, we set the probability between 0.4 and 0.9. For customers who do not use VTO, in either age group, we set the probability about 5% higher or lower than that of the VTO group of the same age. The buying probability for each group is summarized by use of VTO, age group, and product type in Table 8.

Table 8. Parameter setting for no-effect scenario simulation

Treatment	Age group	Best seller	Reddish orange	Pink or rosy	Nude	Brunet
VTO	Below 30	0.6-0.7	0.4-0.5	0.4-0.6	0.5-0.7	0.4-0.55
VTO	Over 30	0.5-0.7	0.5	0.5-0.6	0.5-0.6	0.4-0.5
Non-VTO	Below 30	0.4-0.6	0.5	0.3-0.5	0.4-0.5	0.4-0.5
Non-VTO	Over 30	0.4-0.5	0.4-0.5	0.5	0.5-0.6	0.4-0.5
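
Data simulated under such settings would then be tested for a treatment-by-age interaction with ANOVA and Tukey's HSD, as stated in the abstract. The following base-R sketch shows one way this could look; the data frame, the single purchase probability of 0.5 for every group, and the column names are assumptions for illustration rather than the study's actual data or code.

```r
set.seed(2)
n <- 2000
dat <- data.frame(
  group = factor(rep(c("VTO", "Non-VTO"), each = n / 2)),
  age_group = factor(sample(c("Young", "Old"), size = n, replace = TRUE))
)

# Assumed no-effect setting: every group shares the same purchase probability,
# 20 lipsticks per customer, price 40 per lipstick.
dat$sales <- rbinom(n = n, size = 20, prob = 0.5) * 40

# Two-way ANOVA with interaction; the interaction term asks whether the VTO
# uplift differs between the two age groups.
fit <- aov(sales ~ group * age_group, data = dat)
interaction_p <- summary(fit)[[1]][["Pr(>F)"]][3]

# Pairwise comparisons of the four treatment-by-age cells.
pairwise <- TukeyHSD(fit, which = "group:age_group")
```

The third row of the ANOVA table is the group:age_group term, and pairwise[["group:age_group"]] holds the six cell-to-cell comparisons with adjusted p-values.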

n<-2000
library(data.table)
library(dplyr)   #provides %>% and mutate()
library(forcats) #provides fct_recode()
experiment<-function(seedn){
set.seed(seed=seedn)
group<-c(rep.int(x='VTO',times=n/2),rep.int(x='Non-VTO',times=n/2))
VTO.dat<- data.table(group=group)
VTO.dat[,Age := sample(x=c("Below 20" ,"20-30" , "30-40" ,"40-50" , "50-60" ,"Above 60"), size = 2000, replace =T,prob=c(0.15,0.35, 0.33, 0.13,0.03, 0.01))]
VTO.dat <- VTO.dat %>%  
  mutate(Age_group = fct_recode(.f = Age, "Young_people" = "Below 20", "Young_people" = "20-30", "Non_young_people" = "30-40", "Non_young_people" = "40-50","Non_young_people" = "Above 60", "Non_young_people" = "50-60"))
nVTO_young=nrow(VTO.dat[group=='VTO'&Age_group=='Young_people'])
nNonVTO_young=nrow(VTO.dat[group=='Non-VTO'&Age_group=='Young_people'])
nVTO_old=nrow(VTO.dat[group=='VTO'&Age_group=='Non_young_people'])
nNonVTO_old=nrow(VTO.dat[group=='Non-VTO'&Age_group=='Non_young_people'])

VTO.dat<- data.table(VTO.dat)
#quantity_Best_seller_Group-use_vto
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_407 := rbinom(n=nVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_407 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_416 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_416 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_80 := rbinom(n=nVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_80 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_83 := rbinom(n=nVTO_young,size=1,prob=0.6) ]  
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_83 := rbinom(n=nVTO_old,size=1,prob=0.7) ] 

#quantity_Best_seller_Group-not_use_vto
VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_407 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_407 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_416 := rbinom(n=nNonVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_416 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_80 := rbinom(n=nNonVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_80 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_83 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_83 := rbinom(n=nNonVTO_old,size=1,prob=0.4) ]

#add up the best seller
VTO.dat[,quantity_Best_seller_Group := quantity_407+quantity_416 +quantity_80+quantity_83 ]

#__________________________________________________________________
#quantity_reddish_orange
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_406 := rbinom(n=nVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_406 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_408 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_408 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_14 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_14 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_46 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_46 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_406 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_406 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_408 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_408 := rbinom(n=nNonVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_14 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_14 := rbinom(n=nNonVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_46 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_46 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[,quantity_reddish_orange := quantity_406+quantity_408 +quantity_14+quantity_46] 

#__________________________________________________________________
#quantity_pink_or_rosy
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_405 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_405 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_410 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_410 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_12 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_12 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_84 := rbinom(n=nVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_84 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_405 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_405 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_410 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_410 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_12 := rbinom(n=nNonVTO_young,size=1,prob=0.3) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_12 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_84 := rbinom(n=nNonVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_84 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[,quantity_pink_or_rosy := quantity_405+quantity_410 +quantity_12+quantity_84]

#__________________________________________________________________
#quantity_nude
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_404 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_404 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_434 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_434 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_44 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_44 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_150 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_150 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_404 := rbinom(n=nNonVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_404 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_434 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_434 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_44 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_44 := rbinom(n=nNonVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_150 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_150 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[,quantity_nude := quantity_404+quantity_434 +quantity_44+quantity_150]


#__________________________________________________________________
#quantity_brunet

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_401 := rbinom(n=nVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_401 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_409 := rbinom(n=nVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_409 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_122 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_122 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_131 := rbinom(n=nVTO_young,size=1,prob=0.55) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_131 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_401 := rbinom(n=nNonVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_401 := rbinom(n=nNonVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_409 := rbinom(n=nNonVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_409 := rbinom(n=nNonVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_122 := rbinom(n=nNonVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_122 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_131 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_131 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[,quantity_brunet := quantity_401+quantity_409 +quantity_122+quantity_131]

#__________________________________________________________________
#quantity and sales 
VTO.dat[,quantity  := quantity_Best_seller_Group+quantity_reddish_orange+quantity_pink_or_rosy+quantity_nude+quantity_brunet]
VTO.dat[,sales := quantity*40]
  return(VTO.dat)
}


analyze.experiment<-function(data3){
  
model1 <- lm(sales~group+Age_group+group*Age_group,data=data3)
anova<-anova(model1)
interactionv2<-aov(sales~group+Age_group+group*Age_group,data=data3)
Tukey<-TukeyHSD(interactionv2, ordered=FALSE, conf.level=.95)

p_group_Age_group<-anova$`Pr(>F)`[3] # p-value of the group:Age_group interaction term in the ANOVA table
Tukey_p_higher_than_0.05<-sum(Tukey$`group:Age_group`[,4]>0.05) # number of the 6 pairwise Tukey comparisons with adjusted p-value above 0.05
VTO_young_Non_VTO_young<-Tukey$`group:Age_group`[1,1] # row 1: VTO minus Non-VTO among young customers
VTO_old_Non_VTO_old<-Tukey$`group:Age_group`[6,1]  # row 6: VTO minus Non-VTO among non-young customers
diff<-VTO_young_Non_VTO_young-VTO_old_Non_VTO_old 

result<-data.table(p_group_Age_group=p_group_Age_group,
                   Tukey_p_higher_than_0.05=Tukey_p_higher_than_0.05,
                   VTO_young_Non_VTO_young=VTO_young_Non_VTO_young,
                   VTO_old_Non_VTO_old=VTO_old_Non_VTO_old,
                   diff=diff)
  return(result)
}
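
The function above picks the two Tukey comparisons by position (rows 1 and 6), which depends on the order of the factor levels. A minimal sketch of selecting the same rows by name, on hypothetical toy data (the `toy` data frame, the built-in lifts of 60 and 20, and all variable names here are illustrative assumptions, not part of the experiment):

```r
# Hypothetical balanced 2 x 2 toy data with a built-in VTO lift of 60 (young) and 20 (non-young).
set.seed(1)
toy <- expand.grid(rep = 1:50,
                   group = c("Non-VTO", "VTO"),
                   Age_group = c("Young_people", "Non_young_people"),
                   stringsAsFactors = FALSE)
toy$group <- factor(toy$group, levels = c("Non-VTO", "VTO"))
toy$Age_group <- factor(toy$Age_group, levels = c("Young_people", "Non_young_people"))
true_lift <- ifelse(toy$group == "VTO",
                    ifelse(toy$Age_group == "Young_people", 60, 20), 0)
toy$sales <- rnorm(nrow(toy), mean = 400, sd = 40) + true_lift

# Same model and Tukey call as in analyze.experiment, restricted to the interaction term.
fit <- aov(sales ~ group * Age_group, data = toy)
tk <- TukeyHSD(fit, which = "group:Age_group", conf.level = 0.95)$`group:Age_group`

# Row names have the form "<cell A>-<cell B>", where each cell is "group:Age_group".
young_row <- "VTO:Young_people-Non-VTO:Young_people"
old_row <- "VTO:Non_young_people-Non-VTO:Non_young_people"
vto_effect_young <- tk[young_row, "diff"]
vto_effect_old <- tk[old_row, "diff"]

# Difference-in-differences: extra VTO lift for young customers.
did <- vto_effect_young - vto_effect_old
print(rownames(tk))
print(did)
```

Selecting by name fails loudly if the level order or labels change, whereas positional indexing would silently return a different comparison.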


#run the simulation 2000 times
n<-2000
RNGversion(vstr=3.6)
set.seed(seed=198)
exp=1
sim=experiment(exp)
round=rep.int(x=exp,times=n)
round=as.data.table(round)
s=cbind(round,sim)

x <- 2:n
for (exp in x) {
sim=experiment(exp)
round=rep.int(x=exp,times=n)
round=as.data.table(round)
ss=cbind(round,sim)
s=rbind(s,ss)
}
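
The loop above grows `s` with `rbind` on every pass, which copies the accumulated table each time. It also reuses `n` both as the number of rounds and as the number of customers per round, which works only because both are 2000. A minimal sketch of an alternative, assuming the data.table package is available; `toy_experiment` is a hypothetical stand-in for `experiment()`, and `n_rounds` and `n_customers` are illustrative names:

```r
library(data.table)

# Hypothetical stand-in for experiment(): one small simulated data set per seed.
toy_experiment <- function(seedn, n_customers = 8) {
  set.seed(seedn)
  data.table(group = rep(c("VTO", "Non-VTO"), each = n_customers / 2),
             sales = 40 * rbinom(n_customers, size = 20, prob = 0.5))
}

n_rounds <- 5

# Generate every round first, then stack once; idcol adds the round index.
sims <- lapply(seq_len(n_rounds), toy_experiment)
s <- rbindlist(sims, idcol = "round")

# Per-round summaries work the same way as with the loop-built table.
per_round <- s[, .(mean_sales = mean(sales)), keyby = "round"]
print(per_round)
```

Because each round still sets its own seed inside the generating function, the simulated data would be the same as with the loop; only the stacking step changes.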

Analysis

Across 2000 simulated experiments, the mean difference in sales between the young group using VTO and the young group without VTO is 30.06. The mean difference between the old group using VTO and the old group without VTO is 31.75. The mean effect is -1.69, which suggests that VTO does not increase sales meaningfully more for young customers than for old customers. We obtain a p-value lower than 0.05 in 5.85% of simulations. Because no effect is built into this scenario, this share is the false-positive rate, and it is close to the nominal 5% level.

Table 9. Simulation results for no-effect scenario

Research Question	Scenario	Mean Effect in Simulated Data	95% Confidence Interval of Mean Effect	Percentage of False Positives	Percentage of True Negatives	Percentage of False Negatives	Percentage of True Positives
Question 2	No Effect	1.69	[-29.312, 22.877]	0.064	0.936	N/A	N/A

s.results=s[,analyze.experiment(data3=.SD),keyby='round']
summary(s.results)

     round        p_group_Age_group  Tukey_p_higher_than_0.05
 Min.   :   1.0   Min.   :0.001451   Min.   :0.000           
 1st Qu.: 500.8   1st Qu.:0.216219   1st Qu.:1.000           
 Median :1000.5   Median :0.473612   Median :2.000           
 Mean   :1000.5   Mean   :0.479631   Mean   :1.426           
 3rd Qu.:1500.2   3rd Qu.:0.729629   3rd Qu.:2.000           
 Max.   :2000.0   Max.   :0.999925   Max.   :2.000           
 VTO_young_Non_VTO_young VTO_old_Non_VTO_old      diff        
 Min.   :23.73           Min.   :22.55       Min.   :-24.036  
 1st Qu.:38.19           1st Qu.:35.73       1st Qu.: -2.856  
 Median :42.24           Median :39.64       Median :  2.606  
 Mean   :42.10           Mean   :39.71       Mean   :  2.390  
 3rd Qu.:45.90           3rd Qu.:43.64       3rd Qu.:  7.887  
 Max.   :57.64           Max.   :58.59       Max.   : 25.294

s.results[,mean(p_group_Age_group<0.05)]

[1] 0.064
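
The rejection rate printed above is itself an estimate from a finite number of simulated experiments. A short sketch of its Monte Carlo uncertainty, using the binomial standard error and a normal approximation (the variable names are illustrative):

```r
# Rejection rate from the output above and the number of simulated experiments.
reject_rate <- 0.064
n_sim <- 2000

# Binomial (Monte Carlo) standard error of an estimated proportion.
mc_se <- sqrt(reject_rate * (1 - reject_rate) / n_sim)

# Normal-approximation 95% interval for the long-run rejection rate.
mc_ci <- reject_rate + c(-1, 1) * qnorm(0.975) * mc_se

print(round(mc_se, 4))
print(round(mc_ci, 3))
```

The standard error is about 0.0055, so the interval runs from roughly 0.053 to 0.075. Under this approximation the rejection rate sits slightly above the nominal 0.05.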

Scenario 2: An Expected Effect

In this research question, we define an expected effect as the situation where the difference in mean sales of lipsticks between shopping with and without VTO is more than 10% higher for customers younger than 30 than for customers older than 30.

Simulation

We use simulation techniques to generate a dataset that will enable us to test the effect. We use the rbinom function in R to generate binary variables that indicate whether a customer buys each lipstick. To simulate the situation where VTO raises sales more for customers younger than 30 than for customers older than 30, we set the buying probabilities for the four groups as summarized in Table 10.

Table 10. Parameter setting for an effected scenario simulation

Treatment	Age group	Best seller	Reddish orange	Pink or rosy	Nude	Brunet
VTO	Below 30	0.6-0.8	0.4-0.7	0.5-0.6	0.5-0.7	0.6-0.8
VTO	Over 30	0.5-0.7	0.5	0.5-0.9	0.4-0.6	0.5-0.8
Non-VTO	Below 30	0.4-0.6	0.5	0.3-0.5	0.4-0.5	0.4-0.7
Non-VTO	Over 30	0.4-0.5	0.4-0.5	0.5	0.5-0.8	0.4-0.5
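
Table 10 gives ranges, while the data-generation function for this scenario hard-codes one purchase probability per lipstick. As a sanity check, the expected sales implied by those 20 probabilities can be worked out directly. The sketch below copies the probabilities from the rbinom calls in that function; the list `p` and the other variable names are illustrative:

```r
price <- 40  # sales = quantity * 40 in the simulation

# Purchase probabilities per lipstick, in the order best seller, reddish orange,
# pink or rosy, nude, brunet (4 shades each), copied from the rbinom() calls.
p <- list(
  vto_young    = c(0.6, 0.8, 0.6, 0.6,  0.4, 0.7, 0.4, 0.7,  0.5, 0.5, 0.6, 0.5,
                   0.7, 0.5, 0.5, 0.7,  0.6, 0.7, 0.8, 0.6),
  vto_old      = c(0.5, 0.5, 0.6, 0.7,  0.5, 0.5, 0.5, 0.5,  0.5, 0.5, 0.9, 0.6,
                   0.4, 0.5, 0.6, 0.6,  0.8, 0.8, 0.6, 0.5),
  nonvto_young = c(0.5, 0.6, 0.4, 0.5,  0.5, 0.5, 0.5, 0.5,  0.5, 0.5, 0.3, 0.4,
                   0.4, 0.5, 0.5, 0.5,  0.4, 0.4, 0.7, 0.5),
  nonvto_old   = c(0.5, 0.5, 0.5, 0.4,  0.5, 0.4, 0.4, 0.5,  0.5, 0.5, 0.5, 0.5,
                   0.5, 0.5, 0.8, 0.5,  0.4, 0.4, 0.5, 0.5)
)

# Expected sales per customer: sum of Bernoulli means times the price.
expected_sales <- sapply(p, function(prob) price * sum(prob))

effect_young <- expected_sales[["vto_young"]] - expected_sales[["nonvto_young"]]
effect_old <- expected_sales[["vto_old"]] - expected_sales[["nonvto_old"]]
did <- effect_young - effect_old

print(expected_sales)
print(c(effect_young = effect_young, effect_old = effect_old, did = did))
```

The implied VTO effects are 96 for young customers and 72 for old customers, a difference of 24, which is in line with the simulated means reported in the Analysis for this scenario.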

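#We pack the data-generation code into a function
n<-2000
library(data.table)
library(dplyr)   # provides %>% and mutate() used below
library(forcats) # provides fct_recode() used below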
experiment<-function(seedn){
set.seed(seed=seedn)
group<-c(rep.int(x='VTO',times=n/2),rep.int(x='Non-VTO',times=n/2))
VTO.dat<- data.table(group=group)
VTO.dat[,Age := sample(x=c("Below 20" ,"20-30" , "30-40" ,"40-50" , "50-60" ,"Above 60"), size = 2000, replace =T,prob=c(0.15,0.35, 0.33, 0.13,0.03, 0.01))]
VTO.dat <- VTO.dat %>%  
  mutate(Age_group = fct_recode(.f = Age, "Young_people" = "Below 20", "Young_people" = "20-30", "Non_young_people" = "30-40", "Non_young_people" = "40-50","Non_young_people" = "Above 60", "Non_young_people" = "50-60"))
nVTO_young=nrow(VTO.dat[group=='VTO'&Age_group=='Young_people'])
nNonVTO_young=nrow(VTO.dat[group=='Non-VTO'&Age_group=='Young_people'])
nVTO_old=nrow(VTO.dat[group=='VTO'&Age_group=='Non_young_people'])
nNonVTO_old=nrow(VTO.dat[group=='Non-VTO'&Age_group=='Non_young_people'])

VTO.dat<- data.table(VTO.dat)
#quantity_Best_seller_Group-use_vto
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_407 := rbinom(n=nVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_407 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_416 := rbinom(n=nVTO_young,size=1,prob=0.8) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_416 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_80 := rbinom(n=nVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_80 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_83 := rbinom(n=nVTO_young,size=1,prob=0.6) ]  
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_83 := rbinom(n=nVTO_old,size=1,prob=0.7) ] 

#quantity_Best_seller_Group-not_use_vto
VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_407 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_407 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_416 := rbinom(n=nNonVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_416 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_80 := rbinom(n=nNonVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_80 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_83 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_83 := rbinom(n=nNonVTO_old,size=1,prob=0.4) ]

#add up the best seller
VTO.dat[,quantity_Best_seller_Group := quantity_407+quantity_416 +quantity_80+quantity_83 ]

#__________________________________________________________________
#quantity_reddish_orange
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_406 := rbinom(n=nVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_406 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_408 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_408 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_14 := rbinom(n=nVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_14 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_46 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_46 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_406 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_406 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_408 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_408 := rbinom(n=nNonVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_14 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_14 := rbinom(n=nNonVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_46 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_46 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[,quantity_reddish_orange := quantity_406+quantity_408 +quantity_14+quantity_46] 

#__________________________________________________________________
#quantity_pink_or_rosy
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_405 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_405 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_410 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_410 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_12 := rbinom(n=nVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_12 := rbinom(n=nVTO_old,size=1,prob=0.9) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_84 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_84 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_405 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_405 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_410 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_410 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_12 := rbinom(n=nNonVTO_young,size=1,prob=0.3) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_12 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_84 := rbinom(n=nNonVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_84 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[,quantity_pink_or_rosy := quantity_405+quantity_410 +quantity_12+quantity_84]

#__________________________________________________________________
#quantity_nude
VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_404 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_404 := rbinom(n=nVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_434 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_434 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_44 := rbinom(n=nVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_44 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_150 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_150 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_404 := rbinom(n=nNonVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_404 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_434 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_434 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_44 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_44 := rbinom(n=nNonVTO_old,size=1,prob=0.8) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_150 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_150 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[,quantity_nude := quantity_404+quantity_434 +quantity_44+quantity_150]


#__________________________________________________________________
#quantity_brunet

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_401 := rbinom(n=nVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_401 := rbinom(n=nVTO_old,size=1,prob=0.8) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_409 := rbinom(n=nVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_409 := rbinom(n=nVTO_old,size=1,prob=0.8) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_122 := rbinom(n=nVTO_young,size=1,prob=0.8) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_122 := rbinom(n=nVTO_old,size=1,prob=0.6) ]

VTO.dat[group == "VTO"&Age_group=='Young_people',quantity_131 := rbinom(n=nVTO_young,size=1,prob=0.6) ]
VTO.dat[group == "VTO"&Age_group=='Non_young_people',quantity_131 := rbinom(n=nVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_401 := rbinom(n=nNonVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_401 := rbinom(n=nNonVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_409 := rbinom(n=nNonVTO_young,size=1,prob=0.4) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_409 := rbinom(n=nNonVTO_old,size=1,prob=0.4) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_122 := rbinom(n=nNonVTO_young,size=1,prob=0.7) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_122 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[group == "Non-VTO"&Age_group=='Young_people',quantity_131 := rbinom(n=nNonVTO_young,size=1,prob=0.5) ]
VTO.dat[group == "Non-VTO"&Age_group=='Non_young_people',quantity_131 := rbinom(n=nNonVTO_old,size=1,prob=0.5) ]

VTO.dat[,quantity_brunet := quantity_401+quantity_409 +quantity_122+quantity_131]

#__________________________________________________________________
#quantity and sales 
VTO.dat[,quantity  := quantity_Best_seller_Group+quantity_reddish_orange+quantity_pink_or_rosy+quantity_nude+quantity_brunet]
VTO.dat[,sales := quantity*40]
  return(VTO.dat)
}
#create analyze.experiment function to perform Anova test
analyze.experiment<-function(data3){
  
model1 <- lm(sales~group+Age_group+group*Age_group,data=data3)
anova<-anova(model1)
interactionv2<-aov(sales~group+Age_group+group*Age_group,data=data3)
Tukey<-TukeyHSD(interactionv2, ordered=FALSE, conf.level=.95)

p_group_Age_group<-anova$`Pr(>F)`[3] # p-value of the group:Age_group interaction term in the ANOVA table
Tukey_p_higher_than_0.05<-sum(Tukey$`group:Age_group`[,4]>0.05) # number of the 6 pairwise Tukey comparisons with adjusted p-value above 0.05
VTO_young_Non_VTO_young<-Tukey$`group:Age_group`[1,1] # row 1: VTO minus Non-VTO among young customers
VTO_old_Non_VTO_old<-Tukey$`group:Age_group`[6,1]  # row 6: VTO minus Non-VTO among non-young customers
diff<-VTO_young_Non_VTO_young-VTO_old_Non_VTO_old 

result<-data.table(p_group_Age_group=p_group_Age_group,
                   Tukey_p_higher_than_0.05=Tukey_p_higher_than_0.05,
                   VTO_young_Non_VTO_young=VTO_young_Non_VTO_young,
                   VTO_old_Non_VTO_old=VTO_old_Non_VTO_old,
                   diff=diff)
  return(result)
}
#run the simulation 2000 times
n<-2000
RNGversion(vstr=3.6)
set.seed(seed=198)
exp=1
sim=experiment(exp)
round=rep.int(x=exp,times=n)
round=as.data.table(round)
s=cbind(round,sim)

x <- 2:n
for (exp in x) {
sim=experiment(exp)
round=rep.int(x=exp,times=n)
round=as.data.table(round)
ss=cbind(round,sim)
s=rbind(s,ss)
}

Analysis

Across 2000 simulated experiments, the mean difference in sales between the young group using VTO and the young group without VTO is 96.04. The mean difference between the old group using VTO and the old group without VTO is 71.87. The mean effect is 24.17, which suggests that VTO increases sales more for young customers than for old customers. We obtain a p-value lower than 0.05 in 87% of simulations, so the experiment has an estimated power of 87% to detect an effect of this size.

Table 11. Simulation results for an effect scenario

Research Question	Scenario	Mean Effect in Simulated Data	95% Confidence Interval of Mean Effect	Percentage of False Positives	Percentage of True Negatives	Percentage of False Negatives	Percentage of True Positives
Question 2	Effect (Expected Size)	24.166	[-4.515, 48.396]	N/A	N/A	13%	87%

s.results=s[,analyze.experiment(data3=.SD),keyby='round']
summary(s.results)

     round        p_group_Age_group   Tukey_p_higher_than_0.05
 Min.   :   1.0   Min.   :0.0000000   Min.   :0.000           
 1st Qu.: 500.8   1st Qu.:0.0001638   1st Qu.:1.000           
 Median :1000.5   Median :0.0018659   Median :1.000           
 Mean   :1000.5   Mean   :0.0287945   Mean   :1.206           
 3rd Qu.:1500.2   3rd Qu.:0.0150387   3rd Qu.:2.000           
 Max.   :2000.0   Max.   :0.9625853   Max.   :2.000           
 VTO_young_Non_VTO_young VTO_old_Non_VTO_old      diff       
 Min.   : 79.81          Min.   :55.09       Min.   :-4.515  
 1st Qu.: 92.23          1st Qu.:68.22       1st Qu.:18.897  
 Median : 95.98          Median :72.05       Median :24.277  
 Mean   : 96.04          Mean   :71.87       Mean   :24.166  
 3rd Qu.: 99.84          3rd Qu.:75.38       3rd Qu.:29.274  
 Max.   :113.30          Max.   :91.71       Max.   :48.396

s.results[,mean(p_group_Age_group<0.05)]

[1] 0.87

Research Question 3:

Compared to the difference in mean sales of lipsticks between shopping with and without VTO for bestseller lipsticks, is the difference higher for miscellaneous lipsticks?

Scenario 1: No Effect

In this research question, we define no-effect as the situation where the change in mean sales of miscellaneous lipsticks from using VTO is lower than, or no more than 5% higher than, the change in mean sales of bestsellers from using VTO.

Simulation

We use simulation techniques to generate a dataset that will enable us to test the effect. We use the sample function in R to generate the quantity variable, which records how many lipsticks (0 to 4) each customer buys. The purpose is to simulate the situation where the change in mean sales from using VTO does not differ significantly between miscellaneous lipsticks and bestsellers.

The buying probabilities for each group are summarized by use of VTO, lipstyle, and buying quantity in Table 12. Customers in the bestseller group are more likely to buy larger quantities, but the effects of VTO on the two groups are set to be similar.

Table 12. Parameter setting for no-effect scenario simulation

Treatment	Lipstyle	Quantity_0	Quantity_1	Quantity_2	Quantity_3	Quantity_4
VTO	Bestseller	0.01	0.10	0.25	0.38	0.18
VTO	Miscellaneous	0.09	0.32	0.33	0.16	0.10
Non-VTO	Bestseller	0.04	0.13	0.25	0.35	0.15
Non-VTO	Miscellaneous	0.15	0.35	0.33	0.13	0.04
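
The two bestseller rows of Table 12 sum to 0.92 rather than 1. R's sample function treats the prob argument as weights that need not sum to one, so the simulation effectively rescales them. A minimal sketch of the expected quantities this implies (the matrix `w` and the other names are illustrative):

```r
quantities <- 0:4

# Weights from Table 12, one row per treatment-by-lipstyle cell.
w <- rbind(
  vto_bestseller    = c(0.01, 0.10, 0.25, 0.38, 0.18),
  vto_misc          = c(0.09, 0.32, 0.33, 0.16, 0.10),
  nonvto_bestseller = c(0.04, 0.13, 0.25, 0.35, 0.15),
  nonvto_misc       = c(0.15, 0.35, 0.33, 0.13, 0.04)
)

# sample() rescales weights to probabilities; do the same here, row by row.
weight_sums <- rowSums(w)
probs <- w / weight_sums

# Expected number of lipsticks bought per customer in each cell.
expected_qty <- drop(probs %*% quantities)

effect_misc <- expected_qty[["vto_misc"]] - expected_qty[["nonvto_misc"]]
effect_best <- expected_qty[["vto_bestseller"]] - expected_qty[["nonvto_bestseller"]]
did <- effect_misc - effect_best

print(weight_sums)
print(round(expected_qty, 3))
print(round(c(effect_misc = effect_misc, effect_best = effect_best, did = did), 3))
```

With these weights the implied difference-in-differences is about 0.104 lipsticks per customer rather than exactly zero, so the scenario encodes a small effect rather than a strict null.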

#simulate the data for the third question
#seed=198
#form the dataset
experiment<-function(seedn){
set.seed(seed=seedn)
n=2000
group<-c(rep.int(x='VTO',times=n/2),rep.int(x='Non-VTO',times=n/2))
VTO.dat<- data.table(group=group)
VTO.dat[group == "VTO",lipstyle:=c(rep.int(x=1,times=n/4),rep.int(x=0,times=n/4)) ]
VTO.dat[group == "Non-VTO",lipstyle:=c(rep.int(x=1,times=n/4),rep.int(x=0,times=n/4)) ]
VTO.dat[group == "VTO"&lipstyle==1, quantity := sample(x=c(0,1,2,3,4), size =n/4, replace =T,prob=c(0.01,0.10, 0.25, 0.38,0.18))]
VTO.dat[group == "VTO"&lipstyle==0, quantity := sample(x=c(0,1,2,3,4), size =n/4, replace =T,prob=c(0.09,0.32, 0.33, 0.16,0.10))]
VTO.dat[group == "Non-VTO"&lipstyle==1, quantity := sample(x=c(0,1,2,3,4), size =n/4, replace =T,prob=c(0.04,0.13, 0.25, 0.35,0.15))]
VTO.dat[group == "Non-VTO"&lipstyle==0, quantity := sample(x=c(0,1,2,3,4), size =n/4, replace =T,prob=c(0.15,0.35, 0.33, 0.13,0.04))]
VTO.dat <- VTO.dat %>% mutate(testID=1:2000)
data3 <- data.frame(VTO.dat, stringsAsFactors = TRUE)
data3$lipstyle <- as.factor(data3$lipstyle)
  return(data3)
}

#do the anova test
analyze.experiment<-function(data3){
  
test3 <- lm(quantity~group+lipstyle+group*lipstyle, data=data3)
anova<-anova(test3)
interaction3 <- aov(quantity~group+lipstyle+group*lipstyle, data=data3)
Tukey<-TukeyHSD(interaction3, ordered=FALSE, conf.level=.95)

p_group_lipstyle<-anova$`Pr(>F)`[3] # p-value of the group:lipstyle interaction term in the ANOVA table
Tukey_p_higher_than_0.05<-sum(Tukey$`group:lipstyle`[,4]>0.05) # number of the 6 pairwise Tukey comparisons with adjusted p-value above 0.05
VTO_0_Non_VTO_0<-Tukey$`group:lipstyle`[1,1] # row 1: VTO minus Non-VTO for miscellaneous lipsticks (lipstyle 0)
VTO_1_Non_VTO_1<-Tukey$`group:lipstyle`[6,1]  # row 6: VTO minus Non-VTO for bestsellers (lipstyle 1)
diff<-VTO_0_Non_VTO_0-VTO_1_Non_VTO_1 

result<-data.table(p_group_lipstyle=p_group_lipstyle,
                   Tukey_p_higher_than_0.05=Tukey_p_higher_than_0.05,
                   VTO_0_Non_VTO_0=VTO_0_Non_VTO_0,
                   VTO_1_Non_VTO_1=VTO_1_Non_VTO_1,
                   diff=diff)
  return(result)
}
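
The function above reads the interaction p-value as the third entry of the ANOVA table. A minimal base R sketch of one simulated experiment, reading the same value by row name instead; the `toy` data frame, the `weights` list, and `n_cell` are illustrative, and the weights are those of the no-effect scenario:

```r
set.seed(198)
n_cell <- 500  # customers per treatment-by-lipstyle cell (n/4 in the simulation)

# Quantity weights per cell; lipstyle 1 = bestseller, 0 = miscellaneous.
weights <- list(
  "Non-VTO.0" = c(0.15, 0.35, 0.33, 0.13, 0.04),
  "VTO.0"     = c(0.09, 0.32, 0.33, 0.16, 0.10),
  "Non-VTO.1" = c(0.04, 0.13, 0.25, 0.35, 0.15),
  "VTO.1"     = c(0.01, 0.10, 0.25, 0.38, 0.18)
)
cells <- expand.grid(group = c("Non-VTO", "VTO"), lipstyle = c("0", "1"),
                     stringsAsFactors = FALSE)

# Draw quantities for each cell and stack the cells into one data frame.
toy <- do.call(rbind, lapply(seq_len(nrow(cells)), function(i) {
  key <- paste(cells$group[i], cells$lipstyle[i], sep = ".")
  data.frame(group = cells$group[i], lipstyle = cells$lipstyle[i],
             quantity = sample(0:4, size = n_cell, replace = TRUE,
                               prob = weights[[key]]))
}))
toy$group <- factor(toy$group)
toy$lipstyle <- factor(toy$lipstyle)

# Two-way model with interaction; the interaction row tests whether the
# VTO effect differs between miscellaneous lipsticks and bestsellers.
fit <- lm(quantity ~ group * lipstyle, data = toy)
tab <- anova(fit)
p_by_name <- tab["group:lipstyle", "Pr(>F)"]
p_by_position <- tab$`Pr(>F)`[3]
print(tab)
```

Both lookups return the same number here; the named lookup simply keeps working if terms are added to or reordered in the model formula.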

#run the simulation 2000 times
n<-2000
RNGversion(vstr=3.6)
set.seed(seed=198)
exp=1
sim=experiment(exp)
round=rep.int(x=exp,times=n)
round=as.data.table(round)
s=cbind(round,sim)

x <- 2:n
for (exp in x) {
sim=experiment(exp)
round=rep.int(x=exp,times=n)
round=as.data.table(round)
ss=cbind(round,sim)
s=rbind(s,ss)
}

Analysis

We run the simulation 2000 times to estimate the mean effect. Across the 2000 simulated experiments, the mean effect is 0.102 lipsticks per customer, which suggests that the change in sales from using VTO differs little between the bestseller group and the miscellaneous group. The mean p-value across simulations is 0.3421, which is higher than 0.05, and the p-value falls below 0.05 in 19.9% of simulations.

Table 13. Simulation results for no-effect scenario

Research Question	Scenario	Mean Effect in Simulated Data	95% Confidence Interval of Mean Effect	Percentage of False Positives	Percentage of True Negatives	Percentage of False Negatives	Percentage of True Positives
Question 3	No Effect	0.102	[-0.238, 0.436]	0.199	0.801	N/A	N/A

s.results=s[,analyze.experiment(data3=.SD),keyby='round']
summary(s.results)

     round        p_group_lipstyle    Tukey_p_higher_than_0.05 VTO_0_Non_VTO_0 
 Min.   :   1.0   Min.   :0.0000023   Min.   :0.000            Min.   :0.0760  
 1st Qu.: 500.8   1st Qu.:0.0803692   1st Qu.:0.000            1st Qu.:0.2540  
 Median :1000.5   Median :0.2637702   Median :0.000            Median :0.3000  
 Mean   :1000.5   Mean   :0.3421416   Mean   :0.348            Mean   :0.2998  
 3rd Qu.:1500.2   3rd Qu.:0.5728970   3rd Qu.:1.000            3rd Qu.:0.3460  
 Max.   :2000.0   Max.   :1.0000000   Max.   :2.000            Max.   :0.5620  
 VTO_1_Non_VTO_1       diff       
 Min.   :0.0080   Min.   :-0.238  
 1st Qu.:0.1560   1st Qu.: 0.042  
 Median :0.1980   Median : 0.100  
 Mean   :0.1978   Mean   : 0.102  
 3rd Qu.:0.2400   3rd Qu.: 0.162  
 Max.   :0.4420   Max.   : 0.436

s.results[,mean(p_group_lipstyle<0.05)]

[1] 0.199

Scenario 2: An Expected Effect

In this research question, we define an expected effect as the situation where the change in sales of lipsticks from using VTO is 10% higher in the miscellaneous group than in the bestseller group.

Simulation

We use simulation techniques to generate a dataset that will enable us to test the effect. We use the sample function in R to generate the quantity variable, which records how many lipsticks (0 to 4) each customer buys. The purpose is to simulate the situation where the change in mean sales from using VTO differs significantly between miscellaneous lipsticks and bestsellers.

The buying probabilities for each group are summarized by use of VTO, lipstyle, and buying quantity in Table 14. Customers in the bestseller group are more likely to buy larger quantities. In addition, the effect of VTO on the miscellaneous group is designed to be larger than on the bestseller group: the probability of buying higher quantities rises more between Non-VTO and VTO in the miscellaneous group than in the bestseller group.

Table 14. Parameter setting for an effected scenario simulation

Treatment	Lipstyle	Quantity_0	Quantity_1	Quantity_2	Quantity_3	Quantity_4
VTO	Bestseller	0.01	0.07	0.37	0.38	0.17
VTO	Miscellaneous	0.08	0.25	0.28	0.33	0.06
Non-VTO	Bestseller	0.04	0.13	0.25	0.35	0.15
Non-VTO	Miscellaneous	0.15	0.35	0.33	0.13	0.04
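
To see how large the designed effect is, the weights in Table 14 can be turned into expected quantities and into absolute and relative VTO lifts. This is a sketch under the assumption that weights are rescaled to sum to one, as R's sample function does; `expected_qty` is an illustrative helper, not part of the simulation code:

```r
# Expected quantity for one row of weights; weights are rescaled to sum to one.
expected_qty <- function(w, quantities = 0:4) sum(quantities * w / sum(w))

vto_best    <- expected_qty(c(0.01, 0.07, 0.37, 0.38, 0.17))
vto_misc    <- expected_qty(c(0.08, 0.25, 0.28, 0.33, 0.06))
nonvto_best <- expected_qty(c(0.04, 0.13, 0.25, 0.35, 0.15))
nonvto_misc <- expected_qty(c(0.15, 0.35, 0.33, 0.13, 0.04))

# Absolute lift from VTO in each lipstick group.
lift_misc <- vto_misc - nonvto_misc
lift_best <- vto_best - nonvto_best

# Relative lift, as a share of the Non-VTO expected quantity.
rel_lift_misc <- lift_misc / nonvto_misc
rel_lift_best <- lift_best / nonvto_best

# Difference-in-differences targeted by the interaction test.
did <- lift_misc - lift_best

print(round(c(lift_misc = lift_misc, lift_best = lift_best, did = did), 3))
print(round(c(rel_lift_misc = rel_lift_misc, rel_lift_best = rel_lift_best), 3))
```

Under these assumptions the implied VTO lift is about 0.48 lipsticks per customer (roughly 31%) for miscellaneous lipsticks and about 0.15 (roughly 6%) for bestsellers.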

# simulate the data for the third question
# seed=198
library(data.table)

# Form the data set for one simulated experiment of n = 2000 customers.
# lipstyle: 1 = bestseller, 0 = miscellaneous (weights follow Table 14).
experiment <- function(seedn) {
  set.seed(seed = seedn)
  n <- 2000
  q <- c(0, 1, 2, 3, 4)  # possible purchase quantities
  group <- c(rep.int(x = "VTO", times = n/2), rep.int(x = "Non-VTO", times = n/2))
  VTO.dat <- data.table(group = group)
  VTO.dat[group == "VTO", lipstyle := c(rep.int(x = 1, times = n/4), rep.int(x = 0, times = n/4))]
  VTO.dat[group == "Non-VTO", lipstyle := c(rep.int(x = 1, times = n/4), rep.int(x = 0, times = n/4))]
  VTO.dat[group == "VTO" & lipstyle == 1, quantity := sample(x = q, size = n/4, replace = TRUE, prob = c(0.01, 0.07, 0.37, 0.38, 0.17))]
  VTO.dat[group == "VTO" & lipstyle == 0, quantity := sample(x = q, size = n/4, replace = TRUE, prob = c(0.08, 0.25, 0.28, 0.33, 0.06))]
  # Note: the Non-VTO bestseller weights sum to 0.92; sample() rescales prob to sum to 1.
  VTO.dat[group == "Non-VTO" & lipstyle == 1, quantity := sample(x = q, size = n/4, replace = TRUE, prob = c(0.04, 0.13, 0.25, 0.35, 0.15))]
  VTO.dat[group == "Non-VTO" & lipstyle == 0, quantity := sample(x = q, size = n/4, replace = TRUE, prob = c(0.15, 0.35, 0.33, 0.13, 0.04))]
  VTO.dat[, testID := seq_len(n)]  # customer ID within the experiment
  data3 <- data.frame(VTO.dat, stringsAsFactors = TRUE)
  data3$lipstyle <- as.factor(data3$lipstyle)
  return(data3)
}


# do the ANOVA test and Tukey's HSD on one simulated experiment
analyze.experiment <- function(data3) {
  # group * lipstyle expands to group + lipstyle + group:lipstyle
  test3 <- lm(quantity ~ group * lipstyle, data = data3)
  anova.tab <- anova(test3)
  interaction3 <- aov(quantity ~ group * lipstyle, data = data3)
  Tukey <- TukeyHSD(interaction3, ordered = FALSE, conf.level = .95)

  p_group_lipstyle <- anova.tab$`Pr(>F)`[3]  # p-value of the group:lipstyle interaction
  # number of the 6 pairwise Tukey comparisons with an adjusted p-value above 0.05
  Tukey_p_higher_than_0.05 <- sum(Tukey$`group:lipstyle`[, 4] > 0.05)
  VTO_0_Non_VTO_0 <- Tukey$`group:lipstyle`[1, 1]  # VTO:0 - Non-VTO:0, the VTO effect for miscellaneous lipsticks
  VTO_1_Non_VTO_1 <- Tukey$`group:lipstyle`[6, 1]  # VTO:1 - Non-VTO:1, the VTO effect for bestsellers
  # how much larger the VTO effect is for miscellaneous lipsticks than for bestsellers
  effect.diff <- VTO_0_Non_VTO_0 - VTO_1_Non_VTO_1

  result <- data.table(p_group_lipstyle = p_group_lipstyle,
                       Tukey_p_higher_than_0.05 = Tukey_p_higher_than_0.05,
                       VTO_0_Non_VTO_0 = VTO_0_Non_VTO_0,
                       VTO_1_Non_VTO_1 = VTO_1_Non_VTO_1,
                       diff = effect.diff)
  return(result)
}
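
Because analyze.experiment selects Tukey comparisons by row position, it helps to confirm which comparison each row holds. The following is a minimal sketch on a small made-up data set (the `demo` data frame and its values are purely illustrative, not experimental data). It assumes `group` is coded with R's default alphabetical factor levels, so that Non-VTO precedes VTO.

```r
# Toy data: 3 customers per cell; values are arbitrary and only serve to fit the model.
demo <- expand.grid(rep = 1:3,
                    group = c("VTO", "Non-VTO"),
                    lipstyle = c(1, 0),
                    stringsAsFactors = FALSE)
demo$group <- factor(demo$group)           # default levels: "Non-VTO", "VTO"
demo$lipstyle <- as.factor(demo$lipstyle)  # levels: "0", "1"
demo$quantity <- 1 + (demo$group == "VTO") + 0.5 * (demo$lipstyle == "1") + demo$rep / 10

fit <- aov(quantity ~ group * lipstyle, data = demo)
tk <- TukeyHSD(fit, ordered = FALSE, conf.level = .95)$`group:lipstyle`

# Row names show which pair of cells each row compares.
rownames(tk)

# Selecting by name is more robust than selecting by position.
tk["VTO:0-Non-VTO:0", "diff"]  # VTO effect for miscellaneous lipsticks (lipstyle 0)
tk["VTO:1-Non-VTO:1", "diff"]  # VTO effect for bestsellers (lipstyle 1)
```

With this coding, rows 1 and 6 are the within-style VTO comparisons, which is what the variable names VTO_0_Non_VTO_0 and VTO_1_Non_VTO_1 describe.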

# run the simulation 2000 times; round i uses seed i
n.sims <- 2000
RNGversion(vstr = "3.6.0")
set.seed(seed = 198)  # has no effect on the draws: experiment() resets the seed on every call

# Stack all simulated experiments; idcol adds the round number (1..n.sims) as the first column.
s <- rbindlist(lapply(seq_len(n.sims), experiment), idcol = "round")

Analysis

Across 2000 simulated experiments, the mean effect is 0.3253, which indicates a meaningfully larger change in sales from using VTO in the miscellaneous group than in the bestseller group. The interaction p-value is below 0.05 in 95.45% of the simulations, so the experiment has high statistical power to detect an effect of this size.
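
Since the power figure is itself estimated from a finite number of simulations, it carries Monte Carlo error. As a rough sketch, assuming the 2000 simulated experiments are independent and using a normal approximation (an assumption of ours, not part of the original analysis), the uncertainty can be gauged as follows. The names `power.hat` and `n.sims` simply restate the figures above.

```r
power.hat <- 0.9545  # share of simulations with interaction p-value < 0.05
n.sims <- 2000       # number of simulated experiments

# Binomial standard error of the estimated power.
mc.se <- sqrt(power.hat * (1 - power.hat) / n.sims)

# Approximate 95% interval for the true power (normal approximation).
power.ci <- power.hat + c(-1, 1) * qnorm(0.975) * mc.se

round(c(se = mc.se, lower = power.ci[1], upper = power.ci[2]), 4)
```

The standard error is roughly half a percentage point, so the conclusion of high power does not hinge on simulation noise.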

Table 15. Simulation results for the expected-effect scenario

Research Question	Scenario	Mean Effect in Simulated Data	95% Confidence Interval of Mean Effect	Percentage of False Positives	Percentage of True Negatives	Percentage of False Negatives	Percentage of True Positives
Question 3	Effect: Expected Size	0.3253	[0.0240, 0.6500]			0.0455	0.9545

s.results=s[,analyze.experiment(data3=.SD),keyby='round']
summary(s.results)

     round        p_group_lipstyle    Tukey_p_higher_than_0.05 VTO_0_Non_VTO_0 
 Min.   :   1.0   Min.   :0.0000000   Min.   :0.000            Min.   :0.2520  
 1st Qu.: 500.8   1st Qu.:0.0000202   1st Qu.:0.000            1st Qu.:0.4340  
 Median :1000.5   Median :0.0003251   Median :1.000            Median :0.4760  
 Mean   :1000.5   Mean   :0.0097197   Mean   :0.575            Mean   :0.4783  
 3rd Qu.:1500.2   3rd Qu.:0.0033480   3rd Qu.:1.000            3rd Qu.:0.5220  
 Max.   :2000.0   Max.   :0.7882708   Max.   :1.000            Max.   :0.6880  
 VTO_1_Non_VTO_1       diff       
 Min.   :-0.044   Min.   :0.0240  
 1st Qu.: 0.112   1st Qu.:0.2655  
 Median : 0.152   Median :0.3260  
 Mean   : 0.153   Mean   :0.3253  
 3rd Qu.: 0.194   3rd Qu.:0.3840  
 Max.   : 0.382   Max.   :0.6500

s.results[,mean(p_group_lipstyle<0.05)]

[1] 0.9545

References

Bialkova, S., & Barr, C. (2022). Virtual try-on: How to enhance consumer experience? In 2022 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW) (pp. 01-08). IEEE.

Bigne, E. (2021). A model of adoption of AR-based self-service technologies: a two country comparison. International Journal of Retail & Distribution Management.

Collins, H. N., Johnson, P. I., Calderon, N. M., et al. (2021). Differences in personal care product use by race/ethnicity among women in California: Implications for chemical exposures. Journal of Exposure Science & Environmental Epidemiology. https://doi.org/10.1038/s41370-021-00404-7

Poushneh, A., & Vasquez-Parraga, A. Z. (2017). Discernible impact of augmented reality on retail customer’s experience, satisfaction and willingness to buy. Journal of Retailing and Consumer Services, 34, 229-234.

Research design on the profitability of Virtual Try-on Technology on YSL’s lipstick sales

Authors: Zijian Li, Yuxuan Zhang, Wenlu Guo, Yalan liu, YiEn Tseng
