We will use the NPS, readr, janitor, tidytext, tidyr, dplyr and ggplot2 packages; the npc() function used below comes from the NPS package.
First, we load our data set, which contains the survey of a fictional airline company:
We prepare the data:
survey_data <- clean_names(survey_data)
survey_data$cat <- cut(survey_data$x12_q12_nps_score, breaks = c(-1, 6, 8, 10), labels = c("Detractor",
"Passive", "Promoter"))
survey_data
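To see how these breaks assign categories, here is a minimal sketch on a small made-up vector of scores (the values are illustrative, not taken from the survey). By default cut() builds right-closed intervals, so the breaks c(-1, 6, 8, 10) give (-1, 6], (6, 8] and (8, 10]. A score of -1 therefore falls outside every interval and becomes NA, as does a missing score.

```r
# Hypothetical scores, including the invalid code -1 and a missing value
scores <- c(-1, 0, 6, 7, 8, 9, 10, NA)

# Same breaks and labels as in the preparation step
categories <- cut(scores, breaks = c(-1, 6, 8, 10),
                  labels = c("Detractor", "Passive", "Promoter"))

# Show each score next to its category
print(data.frame(score = scores, category = categories))
```

This is why the cat column can contain more NA values than the raw score column: every -1 is added to the genuinely missing answers.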
We need to understand the main statistics:
summary(survey_data)
## response_id response_status total_amount amount_per_gift
## Min. :11706300 Length:2579 Min. : 1.0 Min. : 1.00
## 1st Qu.:12013070 Class :character 1st Qu.: 120.0 1st Qu.: 15.00
## Median :12207363 Mode :character Median : 240.0 Median : 25.00
## Mean :12271973 Mean : 309.7 Mean : 56.58
## 3rd Qu.:12528042 3rd Qu.: 480.0 3rd Qu.: 40.00
## Max. :13777359 Max. :5000.0 Max. :5000.00
##
## pledge_frequency source_type plg_user_group
## Length:2579 Length:2579 Length:2579
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## x1_q1_how_long_listening x2_q2_age x14_q14_gender
## Length:2579 Length:2579 Length:2579
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## x1_q1_how_long_listening_code x2_q2_age_code x12_q12_nps_score
## Min. :1.000 Min. :1.00 Min. :-1.000
## 1st Qu.:2.000 1st Qu.:3.00 1st Qu.: 9.000
## Median :3.000 Median :4.00 Median :10.000
## Mean :3.193 Mean :3.96 Mean : 8.439
## 3rd Qu.:4.000 3rd Qu.:5.00 3rd Qu.:10.000
## Max. :7.000 Max. :8.00 Max. :10.000
## NA's :14
## monthly_equiv cat
## Min. : 0.0833 Detractor: 42
## 1st Qu.: 10.0000 Passive : 159
## Median : 20.0000 Promoter :2080
## Mean : 25.8114 NA's : 298
## 3rd Qu.: 40.0000
## Max. :416.6667
##
Question: What information do we have?
We need to exclude missing values and the invalid value (-1):
survey_data <- with(survey_data, survey_data[!(x12_q12_nps_score == -1 | is.na(x12_q12_nps_score)), ])
survey_data <- survey_data %>% drop_na(cat)
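The filter above depends on how R combines logical values with NA. A minimal sketch on a hypothetical five-row data frame (the names toy, drop_row and toy_clean are illustrative only) shows why rows with a missing score are dropped along with the -1 rows:

```r
# Hypothetical mini data set: one invalid score (-1) and one missing score
toy <- data.frame(id = 1:5, score = c(10, -1, NA, 7, 3))

# Rows to drop: score is -1 or missing.
# For the NA row, (NA == -1) is NA, but NA | TRUE evaluates to TRUE,
# so the row is still flagged for removal.
drop_row <- toy$score == -1 | is.na(toy$score)
print(drop_row)

# Keep only the valid responses
toy_clean <- toy[!drop_row, ]
print(toy_clean)
```

Without the is.na() term the condition would be NA for missing scores, and indexing a data frame with NA produces rows filled with NA instead of removing them.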
Let’s check the responses:
# Frequency table
prop.table(table(survey_data$x12_q12_nps_score))
##
## 0 1 2 3 4 5
## 0.0013152126 0.0017536168 0.0004384042 0.0008768084 0.0008768084 0.0065760631
## 6 7 8 9 10
## 0.0065760631 0.0179745726 0.0517316966 0.0670758439 0.8448049101
It may be clearer to express this information in terms of detractors, passives and promoters:
summary(npc(survey_data$x12_q12_nps_score))
## Detractor Passive Promoter
## 42 159 2080
We can also cross-tabulate each score against its category (similar to the previous table):
table(survey_data$x12_q12_nps_score, npc(survey_data$x12_q12_nps_score))
##
## Detractor Passive Promoter
## 0 3 0 0
## 1 4 0 0
## 2 1 0 0
## 3 2 0 0
## 4 2 0 0
## 5 15 0 0
## 6 15 0 0
## 7 0 41 0
## 8 0 118 0
## 9 0 0 153
## 10 0 0 1927
Let’s make a graph:
# Histogram (v1)
hist(survey_data$x12_q12_nps_score, breaks = -1:10, col = c(rep("red", 7), rep("yellow", 2), rep("green", 2)))
# Histogram (v2)
barplot(prop.table(table(survey_data$x12_q12_nps_score)), col = c(rep("red", 7), rep("yellow", 2), rep("green", 2)))
By segments:
# Age group
# Mean score (0-10) per age band; this is the average rating, not the NPS itself
age_group <- aggregate(survey_data$x12_q12_nps_score, list(survey_data$x2_q2_age), FUN = mean)
age_group$Group.1 <- factor(age_group$Group.1, levels = c("18-24","25-34","35-44","45-54","55-64","65-74","75+"))
# width is a fixed bar setting, so it goes outside aes()
ggplot(age_group, aes(x = Group.1, y = x)) + geom_col(aes(fill = Group.1), width = 0.5) + xlab("Age") + ylab("Mean score") + geom_text(aes(label = round(x, 2)), vjust = 0.5, hjust = 1.5) +
coord_flip() + scale_fill_brewer("Age Group")
Question: What can we observe?
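Note that the chart above plots the mean score per age group, which is on the 0-10 scale and is not the NPS. If an NPS per segment is wanted, one possible approach is sketched below on made-up data. The data frame toy and the helper nps_manual() are hypothetical; nps_manual() applies the standard definition with the default breaks 0:6, 7:8 and 9:10.

```r
# Hypothetical responses: a segment label and a 0-10 score for each respondent
toy <- data.frame(
  segment = c("A", "A", "A", "A", "B", "B", "B", "B"),
  score   = c(10, 9, 8, 3, 10, 10, 9, 7)
)

# NPS = share of promoters (9-10) minus share of detractors (0-6)
nps_manual <- function(x) mean(x >= 9) - mean(x <= 6)

# Mean score and NPS for each segment
mean_by_segment <- tapply(toy$score, toy$segment, mean)
nps_by_segment  <- tapply(toy$score, toy$segment, nps_manual)

print(mean_by_segment)
print(nps_by_segment)
```

The two measures can rank segments differently, because the NPS ignores how far a score sits inside its category while the mean does not.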
# NPS Calculation
nps(survey_data$x12_q12_nps_score) # equivalent nps(x, breaks = list(0:6, 7:8, 9:10))
## [1] 0.8934678
# Standard Error
nps.se(survey_data$x12_q12_nps_score)
## [1] 0.007607452
# Variance
nps.var(survey_data$x12_q12_nps_score)
## [1] 0.1320091
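As a cross-check, the three figures above can be reproduced by hand from the category counts (42 detractors, 159 passives, 2080 promoters). This sketch assumes the usual multinomial formulas: NPS = p_pro - p_det, variance = p_pro + p_det - NPS^2, and standard error = sqrt(variance / n). The package documentation remains the authority on what nps.var() and nps.se() compute; the agreement with the printed values suggests they follow the same formulas.

```r
# Category counts taken from the summary above
n_det <- 42
n_pas <- 159
n_pro <- 2080
n <- n_det + n_pas + n_pro  # 2281 valid responses

# Shares of detractors and promoters
p_det <- n_det / n
p_pro <- n_pro / n

# NPS, its variance, and its standard error (assumed formulas, see lead-in)
nps_hand <- p_pro - p_det
var_hand <- p_pro + p_det - nps_hand^2
se_hand  <- sqrt(var_hand / n)

print(c(nps = nps_hand, variance = var_hand, se = se_hand))
```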
Now that we have the Net Promoter Score from the survey, we should ask:
To answer these questions, we can run a significance test called the Wald test, as implemented in the NPS package:
nps.test(survey_data$x12_q12_nps_score, y = NULL, test = "wald", conf = 0.95, breaks = list(0:6, 7:8, 9:10))
## One sample Net Promoter Score Z test
##
## NPS of x: 0.89 (n = 2281)
##
## Standard error of x: 0.008
## Confidence level: 0.95
## p value: 0
## Confidence interval: 0.9083781 0.8785574
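The interval reported above is consistent with a standard Wald interval, NPS ± z × SE. The minimal sketch below uses the printed estimate and standard error and assumes the normal approximation; it does not call the package. Note that the package output lists the upper bound first.

```r
# Estimate and standard error as printed in the earlier output
nps_hat <- 0.8934678
se_hat  <- 0.007607452
conf    <- 0.95

# Two-sided critical value from the standard normal distribution
z <- qnorm(1 - (1 - conf) / 2)

# Wald interval: estimate plus or minus z standard errors
ci <- c(lower = nps_hat - z * se_hat, upper = nps_hat + z * se_hat)
print(round(ci, 7))
```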
A significance test such as the Wald test assesses how likely it is that a result at least as extreme as the one observed would arise by chance alone. The Wald test is not the only valid test for this situation, and there are several ways to calculate a confidence interval, but the key concept here is sample size: a larger sample gives a smaller standard error and therefore a narrower, more precise interval.
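To illustrate the role of sample size, the sketch below holds the survey's detractor and promoter shares fixed and varies only n. The sample sizes other than 2281 are hypothetical, and the variance formula p_pro + p_det - NPS^2 is the usual multinomial one, assumed here and not taken from the package source.

```r
# Shares from this survey, held fixed while the sample size varies
p_det <- 42 / 2281
p_pro <- 2080 / 2281
nps_hat <- p_pro - p_det
nps_variance <- p_pro + p_det - nps_hat^2

# Hypothetical sample sizes to compare (2281 is the actual one)
n <- c(100, 400, 2281, 10000)

# Half-width of the 95% Wald interval: z * sqrt(variance / n)
half_width <- qnorm(0.975) * sqrt(nps_variance / n)

print(data.frame(n = n, half_width = round(half_width, 4)))
```

Because the half-width scales with 1 / sqrt(n), quadrupling the sample size (for example from 100 to 400) halves the width of the interval.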