Netflix vs. Hulu: Analyzing Customer Retention Strategies - Netflix

Project Objective

What customer retention strategies does Netflix use?

Step 1: Install and load libraries

#install.packages("gclus")
#install.packages("ggpubr")
#install.packages("tidyverse")
#install.packages("readxl")
library(ggpubr)
## Loading required package: ggplot2
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gclus)
## Loading required package: cluster
library(readxl)
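
As an optional alternative to commenting the install lines in and out by hand, the sketch below reports which of the four packages are not yet installed. The helper name `missing_packages` is hypothetical and is not part of the original analysis; the install call is left commented out so that nothing is installed unless you choose to.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("gclus", "ggpubr", "tidyverse", "readxl")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```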

Step 2: Import and clean the Netflix data

rev_df <- read_excel("Netflix Excel (2).xlsx")
head(rev_df)
## # A tibble: 6 × 4
##    Subs Pricing Revenue Rating
##   <dbl>   <dbl>   <dbl>  <dbl>
## 1  149.    13.0    4.52   0.75
## 2  152.    13.0    4.92   0.68
## 3  158.    13.0    5.25   0.93
## 4  167.    13.0    5.47   0.93
## 5  183.    14.0    5.77   0.86
## 6  193.    14.0    6.15   0.96
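
The step above imports the data but does not show a cleaning check. A minimal sketch of one is given below, assuming that "clean" here means every column is numeric and has no missing values. The helper name `check_columns` is hypothetical, and the call on `rev_df` only runs if that data frame has already been loaded.

```r
# Hypothetical helper: one row per column with its class and count of missing values.
check_columns <- function(df) {
  data.frame(
    column = names(df),
    class = unname(vapply(df, function(x) class(x)[1], character(1))),
    n_missing = unname(vapply(df, function(x) sum(is.na(x)), integer(1))),
    stringsAsFactors = FALSE
  )
}

# rev_df is the data frame created by read_excel() in this step.
if (exists("rev_df")) print(check_columns(rev_df))
```

If any column reports a class other than `numeric` or a nonzero `n_missing`, the later calls to `cor()` and `lm()` would need that column fixed first.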

Step 3: Summarize the data

summary(rev_df)
##       Subs          Pricing         Revenue          Rating      
##  Min.   :148.9   Min.   :12.99   Min.   :4.520   Min.   :0.5700  
##  1st Qu.:190.4   1st Qu.:13.99   1st Qu.:6.055   1st Qu.:0.7425  
##  Median :211.4   Median :13.99   Median :7.410   Median :0.9000  
##  Mean   :206.2   Mean   :14.39   Mean   :7.010   Mean   :0.8325  
##  3rd Qu.:225.0   3rd Qu.:15.49   3rd Qu.:7.940   3rd Qu.:0.9425  
##  Max.   :260.3   Max.   :15.49   Max.   :8.830   Max.   :0.9600
Interpretation: Subscriptions average about 206.2 (range 148.9 to 260.3), pricing ranges from $12.99 to $15.49, and revenue averages about 7.01 (median 7.41), all in the units recorded in the dataset. Ratings range from 0.57 to 0.96, with a mean of 0.83 and a median of 0.90, so every rating is below 1 and most sit near the top of the observed range rather than the bottom. Whether these ratings count as low or high depends on the rating scale, which the data do not state. Pricing appears to take only a few distinct values, since its first quartile and median are both 13.99 and its third quartile and maximum are both 15.49.
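
`summary()` reports quartiles but not spread relative to the mean, so a claim about how much a variable varies is easier to support with a standard deviation and coefficient of variation. The sketch below is one way to compute them; the helper name `spread_table` is hypothetical, and it assumes every column is numeric, as in `rev_df`.

```r
# Hypothetical helper: mean, standard deviation, and coefficient of variation per column.
spread_table <- function(df) {
  data.frame(
    column = names(df),
    mean = unname(vapply(df, mean, numeric(1))),
    sd = unname(vapply(df, sd, numeric(1))),
    # cv = sd / mean, a unit-free measure of relative spread
    cv = unname(vapply(df, function(x) sd(x) / mean(x), numeric(1))),
    stringsAsFactors = FALSE
  )
}

# rev_df is the data frame loaded in Step 2.
if (exists("rev_df")) print(spread_table(rev_df))
```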

Step 4: Create a pairwise scatterplot matrix

pairs(~ Subs + Revenue + Pricing + Rating, data = rev_df)

Interpretation: The scatterplots show a strong, nearly linear positive relationship between subscriptions and revenue (correlation of about 0.99), so observations with more subscriptions also have higher revenue. Pricing is also strongly and positively related to both subscriptions and revenue (about 0.90 each). Ratings show no clear pattern against subscriptions or revenue and a weak-to-moderate negative relationship with pricing (about -0.35), which indicates that customer ratings are not strongly related to the other variables.
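
Since the interpretation quotes correlation values next to the scatterplots, it can help to print each correlation directly in the plot. The sketch below follows the pattern from the examples in the base R `pairs()` help page; the panel function name `panel_cor` is a hypothetical choice, and the plot is only drawn if `rev_df` has been loaded.

```r
# Panel function: write the Pearson correlation of x and y in the middle of the panel.
panel_cor <- function(x, y, digits = 2, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))          # restore the panel's coordinate system afterwards
  par(usr = c(0, 1, 0, 1))         # use 0..1 coordinates so the text can be centred
  r <- cor(x, y, use = "complete.obs")
  text(0.5, 0.5, format(r, digits = digits), cex = 1.5)
}

# Scatterplots in the lower triangle, correlations in the upper triangle.
if (exists("rev_df")) {
  pairs(~ Subs + Revenue + Pricing + Rating, data = rev_df,
        upper.panel = panel_cor)
}
```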

Step 5: Analyze the correlation matrix for the dataset

corr <- cor(rev_df)
corr
##                 Subs    Pricing     Revenue       Rating
## Subs     1.000000000  0.8971548  0.98587540 -0.004053675
## Pricing  0.897154846  1.0000000  0.89904480 -0.353124961
## Revenue  0.985875397  0.8990448  1.00000000 -0.058104709
## Rating  -0.004053675 -0.3531250 -0.05810471  1.000000000
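Interpretation: The correlation matrix puts numbers on the relationships among the four variables. Subs and Revenue are almost perfectly correlated (0.986), and Pricing is strongly correlated with both Subs (0.897) and Revenue (0.899). Rating is essentially uncorrelated with Subs (-0.004) and Revenue (-0.058) and has a moderate negative correlation with Pricing (-0.353). Because Revenue and Pricing are so strongly correlated with each other, the regression coefficients in the next step should be interpreted with multicollinearity in mind.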
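
One way to gauge how much the correlated predictors might affect a regression of Subs on Revenue, Pricing, and Rating is the variance inflation factor (VIF). For standardized predictors, the VIFs are the diagonal of the inverse of the predictors' correlation matrix. The sketch below applies that identity to the three predictor correlations copied from the matrix above; it is an illustrative check and was not part of the original analysis.

```r
# Correlations among the three predictors, copied from the matrix above.
predictors <- c("Revenue", "Pricing", "Rating")
R <- matrix(
  c( 1.00000000,  0.89904480, -0.05810471,
     0.89904480,  1.00000000, -0.35312496,
    -0.05810471, -0.35312496,  1.00000000),
  nrow = 3, byrow = TRUE,
  dimnames = list(predictors, predictors)
)

# VIF_j is the j-th diagonal element of the inverse correlation matrix.
vif <- diag(solve(R))
print(round(vif, 2))
```

A common rule of thumb treats VIFs above roughly 5 to 10 as a sign that a coefficient's standard error is noticeably inflated by collinearity.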

Step 6: Fit and summarize the regression model

model <- lm(Subs ~ Revenue + Pricing + Rating, data = rev_df)
summary(model)
## 
## Call:
## lm(formula = Subs ~ Revenue + Pricing + Rating, data = rev_df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.550 -2.240 -1.397  2.326  8.173 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -68.188     38.772  -1.759   0.0977 .  
## Revenue       18.426      2.421   7.610 1.05e-06 ***
## Pricing        8.387      3.328   2.520   0.0227 *  
## Rating        29.444     10.201   2.886   0.0107 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.583 on 16 degrees of freedom
## Multiple R-squared:  0.982,  Adjusted R-squared:  0.9786 
## F-statistic: 290.2 on 3 and 16 DF,  p-value: 3.722e-14
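Interpretation: With Subs as the response, Revenue, Pricing, and Rating are all statistically significant predictors at the 5% level. Revenue has the strongest relationship (p < 0.001), followed by Rating (p = 0.011) and Pricing (p = 0.023), and the model explains about 98% of the variation in subscriptions (adjusted R-squared of 0.9786). Residuals range from -6.55 to 8.17, with the middle half between -2.24 and 2.33. These results are associations, not causal effects: the model is fit to only 20 observations, and revenue is itself likely driven by subscriptions and price. With those caveats, the positive Rating coefficient, holding revenue and pricing fixed, is consistent with the view that Netflix’s retention strategy should pair pricing optimization with improving customer satisfaction (ratings).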
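
To make the fitted equation concrete, the sketch below plugs the first row of the data into the coefficients reported in the summary output. The function name `predict_subs` is hypothetical, the coefficients are the rounded estimates printed above, and the first-row price is assumed to be 12.99 (it is displayed as 13.0 in the `head()` output).

```r
# Rounded coefficient estimates copied from summary(model).
coefs <- c(intercept = -68.188, Revenue = 18.426, Pricing = 8.387, Rating = 29.444)

# Hypothetical helper: fitted Subs for given predictor values.
predict_subs <- function(revenue, pricing, rating) {
  coefs[["intercept"]] +
    coefs[["Revenue"]] * revenue +
    coefs[["Pricing"]] * pricing +
    coefs[["Rating"]] * rating
}

# First row of the data: Revenue 4.52, Pricing 12.99 (assumed), Rating 0.75.
first_row <- predict_subs(revenue = 4.52, pricing = 12.99, rating = 0.75)
print(round(first_row, 2))
```

The fitted value comes out a few units below the roughly 149 subscriptions observed in that row, which is in line with the residual range reported by the model. With the fitted `model` object available, `predict(model, newdata = ...)` would give the same kind of prediction without the rounding in the copied coefficients.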