What are the customer retention strategies presented at Hulu?
#install.packages("gclus")
#install.packages("ggpubr")
#install.packages("tidyverse")
#install.packages("readxl")
library(ggpubr)
## Loading required package: ggplot2
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gclus)
## Loading required package: cluster
library(readxl)
###Step 2: Import Hulu data and clean data
rev_df <- read_excel("Hulu excel (2).xlsx")
head(rev_df)
## # A tibble: 6 × 4
## Subs Pricing Revenue Rating
## <dbl> <dbl> <dbl> <dbl>
## 1 25.2 12.0 0.702 0.97
## 2 26.8 12.0 0.792 0.97
## 3 27.9 12.0 0.891 0.97
## 4 28.5 12.0 1.03 0.97
## 5 30.4 12.0 1.01 0.78
## 6 32.1 12.0 1.29 0.91
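The heading for this step mentions cleaning, but the code above only imports the file. A minimal sketch of some basic sanity checks is given here. It rebuilds only the six rows printed by head() as a stand-in (`head_df` is a hypothetical name, and Pricing is assumed to be 11.99 because the tibble printout rounds it to 12.0); with the spreadsheet available, the same checks would be run on `rev_df` instead.

```r
# Stand-in for the six rows printed by head(); this is NOT the full dataset.
# Assumption: Pricing is 11.99 (the tibble display rounds it to 12.0).
head_df <- data.frame(
  Subs    = c(25.2, 26.8, 27.9, 28.5, 30.4, 32.1),
  Pricing = rep(11.99, 6),
  Revenue = c(0.702, 0.792, 0.891, 1.03, 1.01, 1.29),
  Rating  = c(0.97, 0.97, 0.97, 0.97, 0.78, 0.91)
)

# Number of missing values in each column
na_counts <- colSums(is.na(head_df))

# TRUE only if every column was read as numeric
all_numeric <- all(vapply(head_df, is.numeric, logical(1)))

# Number of rows that exactly duplicate an earlier row
n_dupes <- sum(duplicated(head_df))

print(na_counts)
print(all_numeric)
print(n_dupes)
```

If any of these checks flagged a problem on the full data, that would be the point to drop or repair rows before summarizing.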
###Step 3: Summarize the data
summary(rev_df)
## Subs Pricing Revenue Rating
## Min. :25.20 Min. :11.99 Min. :0.702 Min. :0.610
## 1st Qu.:31.68 1st Qu.:11.99 1st Qu.:1.224 1st Qu.:0.760
## Median :42.20 Median :11.99 Median :1.841 Median :0.880
## Mean :39.40 Mean :13.09 Mean :1.787 Mean :0.847
## 3rd Qu.:46.45 3rd Qu.:13.49 3rd Qu.:2.349 3rd Qu.:0.970
## Max. :48.50 Max. :17.99 Max. :2.782 Max. :0.990
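Interpretation: On average Hulu has 39.4 subscriptions (range 25.2 to 48.5), with prices ranging from $11.99 to $17.99 and revenue averaging about 1.79, all in the units recorded in the spreadsheet. The minimum, first quartile, and median price are all $11.99, so at least half of the observations sit at the lowest price. Customer ratings run from 0.61 to 0.99 with a mean of 0.85. Subscription count and revenue show moderate variability, while ratings vary less in relative terms, so customer satisfaction is comparatively stable.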
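One way to make the variability comparison concrete is to compute each variable's interquartile range relative to its median. The sketch below uses only the quartiles printed by summary() above; the names `q1`, `med`, `q3`, and `rel_spread` are introduced here for illustration, and the ratio is just one rough measure of spread among several that could be chosen.

```r
# Quartiles copied from the summary() output above
q1  <- c(Subs = 31.68, Pricing = 11.99, Revenue = 1.224, Rating = 0.760)
med <- c(Subs = 42.20, Pricing = 11.99, Revenue = 1.841, Rating = 0.880)
q3  <- c(Subs = 46.45, Pricing = 13.49, Revenue = 2.349, Rating = 0.970)

# Interquartile range: width of the middle half of each variable
iqr <- q3 - q1

# IQR as a fraction of the median, so variables on different
# scales can be compared
rel_spread <- iqr / med

print(round(iqr, 3))
print(round(rel_spread, 2))
```

On this measure Revenue has the widest relative spread, followed by Subs, with Rating and Pricing the narrowest.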
pairs(~ Subs + Revenue + Pricing + Rating, data = rev_df)
Interpretation: The correlation matrix below shows that subscriptions and revenue have a very strong positive relationship (r = 0.99): as subscriptions increase, revenue increases almost in step. Pricing is also positively correlated with both subscriptions (0.70) and revenue (0.79). Rating is negatively correlated with every other variable, moderately with subscriptions (-0.44) and revenue (-0.39) and weakly with pricing (-0.21), so customer satisfaction is not strongly tied to subscription numbers, pricing, or revenue.
corr <- cor(rev_df)
corr
## Subs Pricing Revenue Rating
## Subs 1.0000000 0.6952897 0.9858598 -0.4417660
## Pricing 0.6952897 1.0000000 0.7879706 -0.2068254
## Revenue 0.9858598 0.7879706 1.0000000 -0.3899287
## Rating -0.4417660 -0.2068254 -0.3899287 1.0000000
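To judge whether these correlations are distinguishable from zero, each coefficient can be converted to a t statistic with n - 2 degrees of freedom. The sketch below uses the values printed above and assumes n = 20, which is inferred from the regression output (16 residual degrees of freedom plus 4 estimated coefficients) rather than stated in the data; with `rev_df` loaded, `cor.test()` would give the same test directly.

```r
# Correlations copied from the matrix above
r <- c(
  Subs_Revenue    =  0.9858598,
  Subs_Pricing    =  0.6952897,
  Revenue_Pricing =  0.7879706,
  Subs_Rating     = -0.4417660,
  Revenue_Rating  = -0.3899287,
  Pricing_Rating  = -0.2068254
)

# Assumption: 20 observations (16 residual df + 4 coefficients in the model)
n <- 20

# t statistic for H0: true correlation = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 df
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(round(cbind(r, t_stat, p_value), 4))
```

Under that assumption, the correlations among Subs, Revenue, and Pricing are clearly significant, while the correlations involving Rating do not reach the conventional 5% level.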
Interpretation: The residuals of the model below range from -1.71 to 1.10, with a median of 0.11 and the middle half falling between -0.39 and 0.54, so the model's predictions are fairly close to the actual subscription values.
model <- lm(Subs ~ Revenue + Pricing + Rating, data = rev_df)
summary(model)
##
## Call:
## lm(formula = Subs ~ Revenue + Pricing + Rating, data = rev_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7149 -0.3920 0.1116 0.5398 1.1043
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.4357 2.3517 12.942 6.83e-10 ***
## Revenue 13.8194 0.5085 27.177 8.10e-15 ***
## Pricing -1.0191 0.1958 -5.205 8.68e-05 ***
## Rating -2.8333 1.7022 -1.665 0.115
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8539 on 16 degrees of freedom
## Multiple R-squared: 0.991, Adjusted R-squared: 0.9893
## F-statistic: 588.1 on 3 and 16 DF, p-value: < 2.2e-16
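The figures in the coefficient table are linked by simple arithmetic, which can be cross-checked by hand. The sketch below recomputes the t values, the two-sided p-values, and the adjusted R-squared from the rounded numbers printed above, so small differences in the last digit are expected; the variable names are introduced here for illustration.

```r
# Estimates and standard errors copied from the summary(model) output
est <- c("(Intercept)" = 30.4357, Revenue = 13.8194,
         Pricing = -1.0191, Rating = -2.8333)
se  <- c("(Intercept)" = 2.3517, Revenue = 0.5085,
         Pricing = 0.1958, Rating = 1.7022)
df_resid <- 16  # residual degrees of freedom from the output

# t value = estimate / standard error
t_val <- est / se

# Two-sided p-value from the t distribution with 16 df
p_val <- 2 * pt(-abs(t_val), df = df_resid)

# Adjusted R-squared from the (rounded) multiple R-squared
r2 <- 0.991
n  <- df_resid + length(est)  # observations = residual df + coefficients
k  <- length(est) - 1         # predictors, excluding the intercept
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(t_val, 3))
print(signif(p_val, 3))
print(round(adj_r2, 4))
```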
Interpretation: Holding the other predictors constant, Revenue has a strong positive relationship with Subs (estimate 13.82, p-value < 0.001), while Pricing has a significant negative association (estimate -1.02, p-value < 0.001). Rating is not statistically significant (p-value = 0.115), so the model gives no clear evidence that it affects subscriptions. The model explains 99.1% of the variance in Subs (R-squared = 0.991, adjusted 0.989), indicating a very strong fit, although it rests on only 20 observations (16 residual degrees of freedom plus 4 estimated coefficients).
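Because Revenue and Pricing are themselves correlated (0.79), it may be worth checking how much that overlap inflates the coefficient standard errors. The sketch below computes variance inflation factors, a diagnostic that is not part of the original analysis, as the diagonal of the inverse of the predictors' correlation matrix, using the correlations printed earlier. `pred_cor` and `vif` are names introduced here.

```r
# Correlations among the three predictors, copied from the cor() output
predictors <- c("Revenue", "Pricing", "Rating")
pred_cor <- matrix(
  c( 1.0000000,  0.7879706, -0.3899287,
     0.7879706,  1.0000000, -0.2068254,
    -0.3899287, -0.2068254,  1.0000000),
  nrow = 3, byrow = TRUE,
  dimnames = list(predictors, predictors)
)

# The variance inflation factor of each predictor is the matching
# diagonal element of the inverse correlation matrix
vif <- diag(solve(pred_cor))

print(round(vif, 2))
```

A common rule of thumb treats values above 5 or 10 as a warning sign; all three values here fall below 5, with Revenue the highest.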