Calculate NPS with R

Load packages

We will use the NPS package (which provides nps() and npc()), together with the readr, janitor, tidytext, tidyr, dplyr and ggplot2 packages.
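The load step itself is not shown, so what follows is only a minimal sketch. It assumes the packages are installed from CRAN, and it includes ggplot2 because the segment plot further down calls ggplot(). The npc() function used later ships with the NPS package.

```r
# Packages used in this walkthrough; ggplot2 is needed by the ggplot() call later on.
pkgs <- c("NPS", "readr", "janitor", "tidytext", "tidyr", "dplyr", "ggplot2")

# Check which packages are installed so the script fails gently.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
}

# Attach every package that is available.
invisible(lapply(pkgs[installed], library, character.only = TRUE))
```

In an interactive session, plain library() calls, one per package, do the same job.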

Loading data

First we load our data set, which contains the survey responses of a fictitious airline company:

We prepare the data

survey_data <- clean_names(survey_data)
survey_data$cat <- cut(survey_data$x12_q12_nps_score, breaks = c(-1, 6, 8, 10), labels = c("Detractor", 
    "Passive", "Promoter"))
survey_data
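To see what cut() does with these breaks, here is a small sketch on a made-up vector (the scores below are illustrative, not taken from the survey). By default the intervals are open on the left and closed on the right, so the breaks give (-1, 6], (6, 8] and (8, 10], and a score of exactly -1 falls outside every interval and becomes NA.

```r
# Made-up scores, including the invalid value -1 and a missing value.
toy_scores <- c(-1, 0, 6, 7, 8, 9, 10, NA)

toy_cat <- cut(toy_scores,
               breaks = c(-1, 6, 8, 10),
               labels = c("Detractor", "Passive", "Promoter"))

# Show each score next to its category.
data.frame(score = toy_scores, cat = toy_cat)
```

This is why the summary that follows reports NA's in the cat column even though the score column itself has none.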

We need to understand the main statistics:

summary(survey_data)
##   response_id       response_status     total_amount    amount_per_gift  
##  Min.   :11706300   Length:2579        Min.   :   1.0   Min.   :   1.00  
##  1st Qu.:12013070   Class :character   1st Qu.: 120.0   1st Qu.:  15.00  
##  Median :12207363   Mode  :character   Median : 240.0   Median :  25.00  
##  Mean   :12271973                      Mean   : 309.7   Mean   :  56.58  
##  3rd Qu.:12528042                      3rd Qu.: 480.0   3rd Qu.:  40.00  
##  Max.   :13777359                      Max.   :5000.0   Max.   :5000.00  
##                                                                          
##  pledge_frequency   source_type        plg_user_group    
##  Length:2579        Length:2579        Length:2579       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  x1_q1_how_long_listening  x2_q2_age         x14_q14_gender    
##  Length:2579              Length:2579        Length:2579       
##  Class :character         Class :character   Class :character  
##  Mode  :character         Mode  :character   Mode  :character  
##                                                                
##                                                                
##                                                                
##                                                                
##  x1_q1_how_long_listening_code x2_q2_age_code x12_q12_nps_score
##  Min.   :1.000                 Min.   :1.00   Min.   :-1.000   
##  1st Qu.:2.000                 1st Qu.:3.00   1st Qu.: 9.000   
##  Median :3.000                 Median :4.00   Median :10.000   
##  Mean   :3.193                 Mean   :3.96   Mean   : 8.439   
##  3rd Qu.:4.000                 3rd Qu.:5.00   3rd Qu.:10.000   
##  Max.   :7.000                 Max.   :8.00   Max.   :10.000   
##                                NA's   :14                      
##  monthly_equiv             cat      
##  Min.   :  0.0833   Detractor:  42  
##  1st Qu.: 10.0000   Passive  : 159  
##  Median : 20.0000   Promoter :2080  
##  Mean   : 25.8114   NA's     : 298  
##  3rd Qu.: 40.0000                   
##  Max.   :416.6667                   
## 

Question: What information do we have?

We need to exclude the missing values and the invalid score (-1), which is not on the 0 to 10 scale.

survey_data <- with(survey_data, survey_data[!(x12_q12_nps_score == -1 | is.na(x12_q12_nps_score)), ])
survey_data <- survey_data %>% drop_na(cat)
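The same filter can be sketched in base R on a toy data frame (made-up values, and toy_clean is a hypothetical name), followed by a sanity check that only valid scores remain:

```r
# Toy data: one invalid score (-1) and one missing value.
toy <- data.frame(score = c(10, -1, 7, NA, 3))

# Keep rows whose score is present and is not the invalid code -1.
keep <- !is.na(toy$score) & toy$score != -1
toy_clean <- toy[keep, , drop = FALSE]

# Sanity check: every remaining score lies on the 0 to 10 scale.
stopifnot(all(toy_clean$score %in% 0:10))
toy_clean
```

Running a check like this on the real data before computing the NPS guards against stray codes slipping into the score.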

NPS

Calculate NPS

Let’s check the responses:

# Frequency table
prop.table(table(survey_data$x12_q12_nps_score))
## 
##            0            1            2            3            4            5 
## 0.0013152126 0.0017536168 0.0004384042 0.0008768084 0.0008768084 0.0065760631 
##            6            7            8            9           10 
## 0.0065760631 0.0179745726 0.0517316966 0.0670758439 0.8448049101

This information is easier to read when expressed in terms of detractors, passives and promoters.

summary(npc(survey_data$x12_q12_nps_score))
## Detractor   Passive  Promoter 
##        42       159      2080

We can also cross-tabulate the raw scores against these categories (compare with the previous table):

table(survey_data$x12_q12_nps_score, npc(survey_data$x12_q12_nps_score))
##     
##      Detractor Passive Promoter
##   0          3       0        0
##   1          4       0        0
##   2          1       0        0
##   3          2       0        0
##   4          2       0        0
##   5         15       0        0
##   6         15       0        0
##   7          0      41        0
##   8          0     118        0
##   9          0       0      153
##   10         0       0     1927

Let’s make a graph:

# Histogram (v1)
hist(survey_data$x12_q12_nps_score, breaks=-1:10, col=c(rep("red",7), rep("yellow",2), rep("green", 2)))

# Bar plot of proportions (v2)
barplot(prop.table(table(survey_data$x12_q12_nps_score)), col = c(rep("red", 7), rep("yellow", 2), rep("green", 2)))

By segments:

# Mean NPS score (0 to 10) by age group; this is the average answer, not the NPS itself
age_group <- aggregate(survey_data$x12_q12_nps_score, list(survey_data$x2_q2_age), FUN = mean)
age_group$Group.1 <- factor(age_group$Group.1, levels = c("18-24","25-34","35-44","45-54","55-64","65-74","75+"))
# width is a fixed bar setting, so it goes outside aes() (inside aes() it triggers an "unknown aesthetics" warning)
ggplot(age_group, aes(x = Group.1, y = x)) +
  geom_col(aes(fill = Group.1), width = 0.5) +
  geom_text(aes(label = round(x, 2)), vjust = 0.5, hjust = 1.5) +
  xlab("Age") + ylab("Mean NPS score") +
  coord_flip() + scale_fill_brewer("Age Group")
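The plot above aggregates the mean score per age group. If the NPS itself is wanted per segment, one possible sketch in base R is shown below. The data are made up, and nps_manual is a hypothetical helper that uses the standard definition (share of promoters minus share of detractors), not a function from the NPS package.

```r
# Made-up responses for two age groups.
toy <- data.frame(
  age   = c("18-24", "18-24", "18-24", "25-34", "25-34", "25-34", "25-34"),
  score = c(10, 9, 3, 10, 7, 8, 6)
)

# Hypothetical helper: share of promoters (9-10) minus share of detractors (0-6).
nps_manual <- function(x) mean(x >= 9) - mean(x <= 6)

# One NPS value per age group.
nps_by_age <- tapply(toy$score, toy$age, nps_manual)
nps_by_age
```

With real data, small segments give very noisy values, so it is worth looking at the group sizes alongside the scores.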

Question: What can we observe?

# NPS Calculation
nps(survey_data$x12_q12_nps_score) # equivalent nps(x, breaks = list(0:6, 7:8, 9:10))
## [1] 0.8934678
# Standard Error
nps.se(survey_data$x12_q12_nps_score)
## [1] 0.007607452
# Variance
nps.var(survey_data$x12_q12_nps_score)
## [1] 0.1320091
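These three numbers can be checked by hand from the category counts. The sketch below assumes each response is coded +1 (promoter), 0 (passive) or -1 (detractor), so that the NPS is the mean of that coded variable and the variance is its population variance. It reproduces the printed values, which suggests, but does not prove, that the package uses the same formulas.

```r
# Category counts taken from the summary(npc(...)) output above.
counts <- c(Detractor = 42, Passive = 159, Promoter = 2080)
n <- sum(counts)

p_det <- counts[["Detractor"]] / n
p_pro <- counts[["Promoter"]] / n

# NPS: share of promoters minus share of detractors.
nps_manual <- p_pro - p_det

# Variance of the +1/0/-1 coded variable: E[X^2] - (E[X])^2.
var_manual <- p_pro + p_det - nps_manual^2

# Standard error of the mean of n coded responses.
se_manual <- sqrt(var_manual / n)

round(c(nps = nps_manual, var = var_manual, se = se_manual), 7)
```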

Now that we have the Net Promoter Score from the NPS survey, we should ask:

  • Is my survey sample large enough?
  • Is this fluctuation in NPS scores meaningful?

To answer these questions we can run a type of significance test called the Wald test as implemented in the NPS package:

nps.test(survey_data$x12_q12_nps_score, y = NULL, test = "wald", conf = 0.95, breaks = list(0:6, 7:8, 9:10))
## One sample Net Promoter Score Z test
## 
## NPS of x: 0.89 (n = 2281)
## 
## Standard error of x: 0.008 
## Confidence level: 0.95 
## p value: 0 
## Confidence interval: 0.9083781 0.8785574
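The reported interval can be reproduced by hand, assuming it is the usual normal-approximation (Wald) interval: estimate plus or minus a normal quantile times the standard error. The inputs are the nps() and nps.se() values printed earlier.

```r
nps_hat <- 0.8934678    # value printed by nps()
se_hat  <- 0.007607452  # value printed by nps.se()
conf    <- 0.95

# Two-sided critical value of the standard normal distribution.
z_crit <- qnorm(1 - (1 - conf) / 2)

# Wald interval: lower bound first, then upper bound.
ci <- nps_hat + c(-1, 1) * z_crit * se_hat
round(ci, 7)
```

Note that the nps.test() output lists the upper bound first; the two numbers are the same interval.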

A significance test such as the Wald test estimates how likely it is that a result at least as extreme as the one observed would arise by chance alone. The Wald test is not the only valid significance test for this situation, but the most important concept to take away is sample size. Whichever method you use to calculate a confidence interval, a larger sample gives a smaller standard error and therefore a narrower interval. Here, with n = 2281, the 95% confidence interval runs from about 0.88 to 0.91.
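To make the effect of sample size concrete, the sketch below keeps the observed shares of promoters and detractors fixed and recomputes the half-width of a 95% Wald interval for several sample sizes. The sizes other than 2281 are hypothetical, and half_width is an illustrative helper, not part of the NPS package.

```r
# Observed shares from this survey (2080 promoters, 42 detractors, n = 2281).
p_pro <- 2080 / 2281
p_det <- 42 / 2281
nps_hat <- p_pro - p_det
var_hat <- p_pro + p_det - nps_hat^2

# Hypothetical helper: half-width of a Wald interval for a given sample size.
half_width <- function(n, conf = 0.95) {
  qnorm(1 - (1 - conf) / 2) * sqrt(var_hat / n)
}

sizes  <- c(100, 400, 1600, 2281)
widths <- half_width(sizes)
data.frame(n = sizes, half_width = round(widths, 4))
```

Because the standard error scales with 1 / sqrt(n), quadrupling the sample size halves the width of the interval.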