Netflix Case Project Brief

MSB 325 - Introduction to Business Analytics (for entrepreneurs)

Author

Josh Mickelson, Vivian Finell, Hannah Geary, Audrey Bastidas, Megan Petersen

Published

December 17, 2024


1 Purpose

This case challenges students to analyze Netflix’s historical financial data to understand trends, uncover actionable insights, and answer entrepreneurial questions. The dataset reflects real-world complexities, including evolving reporting practices and aggregated data during certain periods.

We applied data transformation (wrangling), exploratory data analysis (EDA), and financial metrics to address entrepreneurial questions.


2 Dataset Overview

The dataset contains financial and strategic metrics extracted from Netflix’s public filings (SEC 10-K and 10-Q reports). It includes quarterly data spanning from Netflix’s first filing in 2001-Q3 to 2017-Q3.

Over this period, Netflix underwent significant transformations:

  • 2007: Introduced digital content streaming in the United States.
  • 2010: Expanded streaming services internationally.
  • 2013: Released its first original content, marking its entry into content production.

Key Offerings

In this dataset, Netflix’s primary products include:

  1. DVD Rental Service: Subscription-based DVD rentals, the original core of Netflix’s business.
  2. DVD Sales: One-time sales of DVDs.
  3. Streaming Services: Subscription-based streaming, domestically and internationally.

Dataset Highlights

This dataset captures:

  • Revenue streams from these products.
  • Costs associated with generating revenue, including subscription costs, sales costs, and fulfillment costs.
  • Operating costs such as marketing, technology and development, and general administrative expenses.
  • Key profitability metrics like Gross Profit and Operating Income.
  • Strategic data, such as subscriber counts and acquisition costs.

Key Variables

Revenue

The total revenue from all products.

  • Domestic Revenue: Revenue generated in domestic markets, including:
    • Domestic DVD Subscription Revenue: DVD rental subscriptions (2001-Q3 to 2005-Q4, 2011-Q4 to 2017-Q3).
    • Domestic Streaming Subscription Revenue: Streaming subscriptions (2011-Q4 to 2017-Q3).
    • DVD Sales: Revenue from DVD sales.
  • International Revenue: Revenue generated outside the domestic market, including:
    • International Subscription Revenue: Subscription revenue from international customers.

Cost of Revenue

All direct costs incurred in producing, marketing, and distributing a company’s products and services to customers. Cost of revenue appears on the company’s income statement. Indirect costs (e.g., depreciation, salaries paid to management, or other fixed costs) are excluded.

  • Subscription Costs: Direct costs for subscription services.
  • Sales Costs: Transactional costs like commissions.
  • Fulfillment Costs: Shipping and handling expenses.

Operating Costs

All indirect costs that are not directly attributable to a product. Like direct costs, indirect costs may be either fixed or variable.

  • Technology & Development: Investments in technology and product development.
  • Marketing: Customer acquisition and retention costs.
  • General & Admin (G&A): Overhead expenses.
  • Stock-Based Compensation: Non-cash expenses from employee stock options.
  • DVD Disposal & Other: Miscellaneous costs, including potential gains from asset sales.

Profitability

  • Gross Profit = Total Revenue - Total Costs of Revenue.
  • Operating Income = Gross Profit - Total Operating Costs.
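
As a worked illustration of the two formulas above, the sketch below computes both metrics, plus the corresponding margins, for two made-up quarters. The figures and the data frame name `toy_income` are hypothetical and are not taken from the Netflix dataset; only the column names mirror those used later in the R script.

```r
# Hypothetical figures for two quarters (NOT from the Netflix dataset)
toy_income <- data.frame(
  quarter               = as.Date(c("2016-01-01", "2016-04-01")),
  total_revenue         = c(1000, 1200),
  total_costs_revenue   = c(680, 800),
  total_operating_costs = c(250, 290)
)

# Gross Profit = Total Revenue - Total Costs of Revenue
toy_income$gross_profit <- toy_income$total_revenue - toy_income$total_costs_revenue

# Operating Income = Gross Profit - Total Operating Costs
toy_income$operating_income <- toy_income$gross_profit - toy_income$total_operating_costs

# Margins express each metric as a percentage of total revenue
toy_income$gross_margin_pct     <- 100 * toy_income$gross_profit / toy_income$total_revenue
toy_income$operating_margin_pct <- 100 * toy_income$operating_income / toy_income$total_revenue

print(toy_income)
```

The same arithmetic applies row by row to the real data once the revenue, cost of revenue, and operating cost subtotals are available for a quarter.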

3 Real-World Complexity in Reporting

Netflix’s financial reporting practices evolved over time. For some periods, the company provided detailed breakdowns of revenue and costs, but in others, it reported only subtotals. As a result:

  1. Domestic Revenue: From 2006-Q1 to 2011-Q3, Netflix did not provide separate data for DVD rental subscriptions and DVD sales, only the total Domestic Revenue.

  2. International Costs:

    • From 2010-Q3 to 2011-Q3, Netflix reported a single subtotal for International Streaming Costs of Revenue and Marketing.
    • From 2011-Q4 onward, Netflix provided separate data for:
      • Cost of Revenue and
      • Cost of Marketing.

Implications for Analysis

  • Subtotals Are Essential: For some periods, subtotals (e.g., Domestic Revenue, International Streaming Costs) are the only data available and cannot be disaggregated into components.
  • Educated Assumptions: You may need to interpret variables based on their likely meaning, especially for categories like “Domestic DVD Revenue.”
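
One way to act on these implications is to check, quarter by quarter, whether a component or only its subtotal is available before choosing which column to analyze. The sketch below does this on a small made-up table. The object `toy_revenue` and all of its values are hypothetical; only the column names echo the dataset.

```r
# Hypothetical mini-table: components are missing (NA) in the middle quarters,
# while the subtotal is reported throughout. Values are NOT real Netflix data.
toy_revenue <- data.frame(
  quarter                           = as.Date(c("2005-10-01", "2006-01-01",
                                                "2011-07-01", "2011-10-01")),
  domestic_revenue                  = c(190, 220, 800, 850),
  domestic_dvd_subscription_revenue = c(185, NA, NA, 370),
  domestic_dvd_sales_revenue        = c(5, NA, NA, NA)
)

# Flag quarters where only the subtotal can be used
toy_revenue$subtotal_only <-
  is.na(toy_revenue$domestic_dvd_subscription_revenue) &
  !is.na(toy_revenue$domestic_revenue)

# Quarters that must be analyzed at the subtotal level
print(toy_revenue[toy_revenue$subtotal_only, c("quarter", "domestic_revenue")])

# Share of missing values per column, a quick way to spot aggregated periods
print(colMeans(is.na(toy_revenue[, -1])))
```

A check like this makes the reporting gaps explicit before any plotting, so missing stretches are not mistaken for zero revenue.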

4 Subsets of Data

Code is provided in an R script. The dataset is divided into logical chunks for clarity and ease of analysis:

  1. Profitability Analysis Data:

    • Includes revenue, cost of revenue, and operating cost data needed to calculate profitability metrics like Gross Profit and Operating Income.
    • This chunk is foundational for understanding Netflix’s financial performance over time.
  2. Streaming Era Data:

    • Subscriber Data:

      • Includes metrics on total and streaming-specific subscribers.
      • Helps analyze growth in Netflix’s customer base and adoption of streaming services.
    • Cost Data:

      • Includes costs specifically associated with streaming, such as Cost of Revenue and Marketing.
      • Enables analysis of cost efficiency in the streaming era.
    • Key Non-Financial Data:

      • Includes strategic metrics not part of profitability analysis, such as subscription pricing, original content production, and library size.
      • These variables provide insight into Netflix’s strategic decisions and market positioning.

Benefits of Splitting the Data

  1. Task Management:
    • Teams can work on different subsets independently, focusing on specific questions or analyses.
    • E.g., one team works on profitability analysis, another on subscriber trends, and another on strategic metrics.
  2. Simplified Reshaping: pivot_longer() will generate consistent row counts within each subset, preventing uneven row structures that could complicate EDA and visualizations.
  3. Focused Analysis: you can focus on a manageable subset of variables, reducing cognitive burden and allowing for deeper exploration.
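
To illustrate the reshaping point above, here is a minimal sketch using a made-up three-quarter subset. The object `toy_subset` and its values are hypothetical; `pivot_longer()` is the tidyr function used throughout the R script. Because every selected column covers the same quarters, the long table has exactly (number of quarters) × (number of pivoted columns) rows.

```r
library(tidyr)

# Hypothetical subset with three quarters and two revenue columns
toy_subset <- data.frame(
  quarter               = as.Date(c("2016-01-01", "2016-04-01", "2016-07-01")),
  domestic_revenue      = c(100, 110, 120),
  international_revenue = c(40, 55, 70)
)

# One row per quarter per revenue type
toy_long <- pivot_longer(
  toy_subset,
  cols      = c(domestic_revenue, international_revenue),
  names_to  = "revenue_type",
  values_to = "revenue"
)

print(toy_long)
nrow(toy_long)  # 3 quarters x 2 columns = 6 rows
```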

Team Collaboration

Assign each data subset to a team member or subgroup. Trim the total number of subsets analyzed according to the number of team members. In our case, we had 4 members, so we did Revenue Analysis, Cost Analysis, Profitability Analysis, and Subscriber Data Analysis. Megan joined later to help with the entrepreneurial questions.

  • Revenue Data: Financial trends analysis.
  • Cost Data: Profitability analysis.
  • Subscriber Data: Streaming growth analysis.
  • Strategic Metrics: Market positioning and actionable insights.

Be sure to merge insights from all data subsets into a cohesive final report.


5 Your Task

Data Transformation (wrangling)

#This first section of data cleaning was done almost entirely by Nile Hatch.
#Minor edits, like adding dates to the subsets of data, and cleaning minor errors in the dataset, were done by Josh Mickelson.
#For the sake of complete code, we are including our entire R script so that another user should be able to run it as-is from the original XLSX file. 


# Install packages if needed - install only once and then comment out the line
#install.packages("readxl")
#install.packages("tidyverse")
#install.packages("writexl")
#install.packages("janitor")
#install.packages("skimr")
#install.packages("plotly")

# Load necessary libraries
library(readxl)
library(tidyverse)
library(writexl)
library(janitor)
library(skimr)
library(plotly)


options(scipen=999) # eliminate printing in scientific notation
rm(list=ls()) # clear the existing environment to avoid legacy errors


#Code to import and organize the data, results in netflix_data which is workable data 
#Results in sub-datasets to facilitate division of labor, clarity of questions, ease of analytics

# Import the data ---------------------------------------------------------

# Load the original dataset
#netflix_data <- read_excel("Netflix_data.xlsx")
netflix_raw_data <- read_excel("Netflix_data.xlsx")

# Transpose data
netflix_transpose_data <- t(netflix_raw_data)

# Extract and assign new column names
col_names <- netflix_transpose_data[1, ]               # First row for column names
netflix_transpose_data <- netflix_transpose_data[-1, ] # Remove the first row

# Convert to tibble with minimal repair
netflix_transpose_data <- as_tibble(netflix_transpose_data, .name_repair = "minimal")
names(netflix_transpose_data) <- col_names  # Apply the extracted names to columns

# Clean and ensure unique column names (janitor is already loaded above)
netflix_data <- netflix_transpose_data |> 
  clean_names() |> 
  type_convert() |> 
  select(-c(starts_with("na"),"other")) |> 
  rename(domestic_dvd_sales_revenue = dvd_sales,
         domestic_dvd_subs_revenue = domestic_dvd_revenue,
         paid_domestic_streaming_subscribers = paid,
         free_domestic_streaming_subscribers = free,
         dvd_subscribers = dvd,
         paid_dvd_subscribers = paid_2,
         free_dvd_subscribers = free_2,
         international_streaming_subscribers = international_streaming_subcribers,
         paid_international_streaming_subscribers = paid_3,
         free_international_streaming_subscribers = free_3,
         cost_revenue_marketing = costs_of_revenue_and_marketing_expenses,
         cost_domestic_streaming_revenue_marketing = domestic_streaming,
         cost_domestic_streaming_revenue = cost_of_revenue,
         cost_domestic_streaming_marketing = costs_of_marketing,
         cost_domestic_dvd_revenue_marketing = domestic_dvd,
         cost_domestic_dvd_revenue = cost_of_revenue_2,
         cost_domestic_dvd_marketing = costs_of_marketing_2,
         cost_international_streaming_revenue_marketing = interational_streaming,
         cost_international_streaming_revenue = cost_of_revenue_3,
         cost_international_streaming_marketing = costs_of_marketing_3,
         total_costs_revenue = total_costs_of_revenue,
         technology_development_cost = tech_dev,
         marketing_cost = marketing,
         general_administrative_cost = general_admin,
         stock_based_compensation_cost = stock_based_comp,
         dvd_legacy_cost = dvd_disposal_other,
         operating_income = opterating_income) |> 
  mutate(quarter = seq.Date(from = as.Date("2001-04-01"), 
                            to = as.Date("2017-09-30"), 
                            by = "quarter")  # Create quarterly intervals
  ) |> 
  relocate(quarter, .before = everything()) |>   # Move quarter to the first column
  #
  mutate(
    domestic_dvd_subscription_revenue = case_when(
      quarter >= as.Date("2001-04-01") & quarter <= as.Date("2005-12-31") ~ domestic_subscription_revenue,
      quarter >= as.Date("2011-10-01") & quarter <= as.Date("2017-09-30") ~ domestic_dvd_subs_revenue,
      TRUE ~ NA_real_  # Fill other rows with NA
    ),
    domestic_streaming_subscription_revenue = case_when(
      quarter >= as.Date("2011-10-01") & quarter <= as.Date("2017-09-30") ~ domestic_subscription_revenue,
      TRUE ~ NA_real_  # Fill other rows with NA
    )
  ) |> 
  relocate(c(domestic_dvd_subscription_revenue, domestic_streaming_subscription_revenue), 
           .after = domestic_dvd_sales_revenue)  |> # Place the new subscription revenue columns next to DVD sales revenue
  select(-c(domestic_subscription_revenue, domestic_dvd_subs_revenue)) |> 
  relocate(domestic_dvd_sales_revenue, .after = domestic_dvd_subscription_revenue)

#One of the values is missing a 0 (if you calculate domestic revenue from total rev and international rev, it's pretty obvious). This was the simplest way to fix it for me
netflix_data$domestic_revenue[netflix_data$quarter == as.Date("2011-07-01")] <- 799152000


# Income statement data ---------------------------------------------------

netflix_revenue_data <- netflix_data |> 
  select(c(quarter, total_revenue:international_subscription_revenue))# select revenue variables

netflix_cost_of_revenue_data <- netflix_data |>
  select(c(quarter,total_costs_revenue:fulfillment_costs))   # select cost of revenue variables incl. total

netflix_operating_cost_data <- netflix_data |> 
  select(c(quarter,total_operating_costs:dvd_legacy_cost))   # select operating cost variables incl. total


# Streaming era data ------------------------------------------------------

netflix_streaming_subscriber_data <- netflix_data |> 
  # select streaming subscriber data
  select(c(quarter,total_subscribers:free_international_streaming_subscribers)) 

netflix_streaming_cost_data <- netflix_data |> 
  # select streaming cost variables
  select(c(quarter,cost_revenue_marketing:cost_international_streaming_marketing))


# Key data not in expense reports -----------------------------------------

netflix_other_data <- netflix_data |> 
  select(c(quarter,cheapest_subscription:total_number_streaming_usa_titles))








# ----######### OUR CODE BEGINS HERE #########----



# 1: Revenue Analysis -----------------------------------
# 1a: Compare Domestic vs. International Revenue over time ----

#Preparing data in a new object
long_revenue <- netflix_revenue_data %>%
  #Pivoting to prepare for graphing across categories
  pivot_longer(cols = c(domestic_revenue, international_revenue),
               names_to = "revenue_type", 
               values_to = "revenue")  %>%
    #Fixing international revenue (values of 0 instead of NA)
    mutate(revenue = replace_na(revenue, 0))

# Plot revenues
#First we are making a ggplot object, then we'll graph it interactively with ggplotly
interactive_test <- ggplot(long_revenue, aes(x = quarter, y = revenue, color = revenue_type)) +
  geom_line() +
  #Formatting
  labs(title = "Domestic vs. International Revenue Over Time",
       x = "Year", y = "Revenue ($)", color = "Revenue Type") +
  theme_minimal() + 
  scale_y_continuous(labels = scales::dollar_format(scale = 1e-6, suffix = "M")) +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme(legend.position = "top")

#I wanted to make one graph interactive just to practice
ggplotly(interactive_test)

# 1b: Analyze the evolution of subscription revenue vs. DVD sales----

#Preparing data in a new object
long_sub_vs_dvd <- netflix_revenue_data %>%
  #Pivoting to prepare for graphing across categories
  pivot_longer(cols = c(domestic_dvd_subscription_revenue, 
                        domestic_streaming_subscription_revenue, 
                        domestic_dvd_sales_revenue),
               names_to = "revenue_type", 
               values_to = "revenue")

#Plotting Subscription Revenue vs. DVD Sales over time
ggplot(long_sub_vs_dvd, aes(x = quarter, y = revenue, color = revenue_type)) +
  geom_line(linewidth = 1) +
  labs(title = "Subscription Revenue vs. DVD Sales Over Time",
       x = "Quarter", 
       y = "Revenue (USD)", 
       color = "Revenue Type") +
  theme_minimal() +
  scale_y_continuous(labels = scales::dollar_format(scale = 1e-6, suffix = "M")) +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme(legend.position = "top") + 
  #Adding a label to explain the missing data from the center of the graph
  annotate("text", 
           x = as.Date("2004-04-01"), 
           y = max(long_sub_vs_dvd$revenue, na.rm = TRUE) * 0.25, 
           label = "No Data for years 2006-2011", 
           hjust = -0.1, 
           color = "black")







# 2: Cost Analysis --------------------------------------
# 2a: Explore trends in Total Costs of Revenue and its components ----

#Preparing data in a new object
long_revenue_cost <- netflix_cost_of_revenue_data %>%
  #Pivoting to prepare for graphing across categories
  pivot_longer(cols = c(total_costs_revenue, 
                        subscription_costs, 
                        sales_costs,
                        fulfillment_costs),
               names_to = "cost_type",
               values_to = "cost")

#Replacing blank values with 0. 
long_revenue_cost <- replace(long_revenue_cost, is.na(long_revenue_cost), 0)

#Graphing total costs over time
ggplot(long_revenue_cost, aes(x = quarter, y = cost, color = cost_type)) +
  geom_line(linewidth = 0.5) +
  #Formatting
  labs(title = "Total Costs of Revenue and Components",
       x = "Quarter", 
       y = "Costs (USD)", 
       color = "Cost Type") +
  theme_minimal() +
  scale_y_continuous(labels = scales::dollar_format(scale = 1e-6, suffix = "M")) +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme(legend.position = "top")



# 2b: Evaluate how Marketing and Tech & Dev expenses scale with revenue growth ----

#Preparing data in a new object
long_expenses_growth <- netflix_operating_cost_data %>%
  #Adding total_revenue to the object (it wasn't already in the op. cost object)
  mutate(total_revenue = netflix_data$total_revenue,
      #Converting costs into a percent of revenue
      total_costs = total_operating_costs/total_revenue*100,
      technology_development_costs = technology_development_cost/total_revenue*100,
      marketing_costs = marketing_cost/total_revenue*100,
      general_admin_costs = general_administrative_cost/total_revenue*100,
      stock_compensation_costs = stock_based_compensation_cost/total_revenue*100,
      dvd_legacy_costs = dvd_legacy_cost/total_revenue*100
      )  %>%
  #Pivoting to prepare for graphing across categories
  pivot_longer(cols = c(total_costs, technology_development_costs, marketing_costs, general_admin_costs, stock_compensation_costs, dvd_legacy_costs),
               names_to = "expense_type",
               values_to = "expense")

#Plotting costs as a percent of total revenue
ggplot(long_expenses_growth, aes(x = quarter, y = expense, color = expense_type)) +
  #Formatting
  geom_line(linewidth = 0.5) +
  labs(title = "Percent of Total Revenue",
       x = "Quarter",
       y = "Revenue Share",
       color = "Expense Type") +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme_minimal() +
  theme(legend.position = "top")
  






# 3: Profitability Analysis ------------------------------
# 3a: Examine trends in Gross Profit and Operating Income (and 3c) ----
# With the labels on the graph, this also fulfills requirements for 3c

#Preparing data in a new object
long_profit_data <- netflix_data %>%
  #This time, just made a new object pulling only the variables I really needed
  select(quarter, gross_profit, operating_income) %>%
  #Pivoting to prepare for graphing across categories
  pivot_longer(cols = c(gross_profit, operating_income), 
               names_to = "profit_metric", 
               values_to = "value")

# Plotting Gross Profit and Operating Income
ggplot(long_profit_data, aes(x = quarter, y = value, color = profit_metric)) +
  geom_line(linewidth = 1) +
  #Adding labels to show Netflix actions that influenced profitability
  geom_vline(xintercept = as.Date(c("2007-01-01", "2010-01-01", "2013-01-01")), 
             linetype = "dashed", color = "grey") +
  annotate("text", x = as.Date("2005-01-01"), y = max(long_profit_data$value, na.rm = TRUE), 
           label = "Streaming Introduced (2007)", hjust = -0.1, color = "black") +
  annotate("text", x = as.Date("2007-01-01"), y = max(long_profit_data$value, na.rm = TRUE) * 0.9, 
           label = "Global Expansion (2010)", hjust = -0.1, color = "black") +
  annotate("text", x = as.Date("2010-01-01"), y = max(long_profit_data$value, na.rm = TRUE) * 0.8, 
           label = "Original Content Begins (2013)", hjust = -0.1, color = "black") +
  #Formatting
  labs(title = "Trends in Gross Profit and Operating Income with Major Decisions",
       x = "Quarter",
       y = "Amount (USD)",
       color = "Profit Metric") +
  scale_y_continuous(labels = scales::dollar_format(scale = 1e-6, suffix = "M")) +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme_minimal() +
  theme(legend.position = "top")




# 3b: Assess profit margins over time ----

#Preparing data in a new object
long_margin_data <- netflix_data %>%
  #Converting to percentages of total revenue
  mutate(
    gross_profit_margin = (gross_profit / total_revenue) * 100,
    operating_margin = (operating_income / total_revenue) * 100
  ) %>%
  #We only need these few variables for this analysis
  select(quarter, gross_profit_margin, operating_margin) %>%
  #Pivoting to prepare for graphing across categories
  pivot_longer(cols = c(gross_profit_margin, operating_margin), 
               names_to = "margin_type", 
               values_to = "percentage")

# Plotting Profit Margins
ggplot(long_margin_data, aes(x = quarter, y = percentage, color = margin_type)) +
  geom_line(linewidth = 1) +
  #Formatting
  labs(title = "Trends in Profit Margins",
       x = "Quarter",
       y = "Profit Margin",
       color = "Margin Type") +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme_minimal() +
  theme(legend.position = "top")
  





# 4: Subscriber Data --------------------------------------
# 4a: Visualize and analyze growth in streaming subscribers----

#Preparing data in a new object
long_subscribers <- netflix_streaming_subscriber_data %>%
  #There was an error in domestic subscribers and this is an easy fix. One value in domestic is mistakenly 0, easily verified by subtracting international from total. 
  mutate(domestic_streaming_subscribers = total_subscribers - international_streaming_subscribers)  %>%
  #Pivoting to prepare for graphing across categories
  pivot_longer(cols = c(total_subscribers, 
                        domestic_streaming_subscribers, 
                        international_streaming_subscribers),
               names_to = "subscriber_type",
               values_to = "subscriber")

#Plotting subscribers
ggplot(long_subscribers, aes(x = quarter, y = subscriber, color = subscriber_type)) +
  geom_line(linewidth = 0.5) +
  #Formatting
  labs(title = "Number of Subscribers",
       x = "Quarter", 
       y = "Subscribers", 
       color = "Subscriber Type") +
  theme_minimal() +
  scale_y_continuous(labels = scales::label_number(scale = 1e-6, suffix = " Million")) +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme(legend.position = "top")

# 5: Strategic Metrics: explore trends in pricing and original content production ----
# Prepare data for visualization
pricing_vs_subscribers <- netflix_data %>%
  select(quarter, 
         total_subscribers,
         domestic_streaming_subscribers, 
         international_streaming_subscribers, 
         cheapest_subscription) %>%
  #Pivoting to prepare for graphing
  pivot_longer(cols = c(total_subscribers, domestic_streaming_subscribers, international_streaming_subscribers),
               names_to = "subscriber_type", 
               values_to = "subscribers")

# Plot trends in pricing and subscribers
ggplot(pricing_vs_subscribers, aes(x = quarter)) +
  #Graphing the numbers of subscribers
  geom_line(aes(y = subscribers / 1e6, color = subscriber_type), linewidth = 1) +
  #Graphing the cheapest subscription price over time
  geom_line(aes(y = cheapest_subscription, linetype = "Cheapest Plan"), color = "gray") +
  #Adding axis labels on both sides
  scale_y_continuous(
    name = "Subscribers (Millions)",
    sec.axis = sec_axis(~ ., name = "Plan Price (USD)")
  ) +
  #Formatting
  labs(title = "Trends in Pricing and Subscriber Growth",
       x = "Quarter", 
       color = "Subscriber Type",
       linetype = "Pricing") +
  theme_minimal() +
  theme(legend.position = "top")


#Preparing data in a new object
long_sub_costs_per_subscriber <- netflix_data %>%
  #This time, just made a new object pulling only the variables I really needed
  select(quarter, subscription_costs, total_subscribers) %>%
  #Making a new variable to graph subscription costs per subscriber
  mutate(subscription_costs_per_subscriber = subscription_costs/total_subscribers)

#Plotting graph
ggplot(long_sub_costs_per_subscriber, aes(x = quarter, y = subscription_costs_per_subscriber)) +
  #Formatting
  geom_line(color = "blue", linewidth = 0.5) +
  labs(title = "Subscription Costs per Subscriber",
       x = "Quarter",
       y = "Costs") +
  scale_y_continuous(labels = scales::dollar_format()) +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme_minimal() +
  theme(legend.position = "top")



#Calculating correlations between domestic streaming, hours of content produced, and subscription price
correlation_data <- netflix_data %>%
  select(
    domestic_streaming_subscribers, 
    number_of_hours_of_original_content_produced_usa,
    cheapest_subscription)
correlation_matrix <- cor(correlation_data, use = "complete.obs")
print(correlation_matrix)

Exploratory Data Analysis (EDA)

  1. Revenue Analysis:

    Netflix first focused on domestic revenue, refining its processes. Once it entered the international market, revenue grew rapidly, likely helped by the experience it had gained in the domestic market.

    Although the dataset is incomplete, we can clearly see that subscriptions have brought in significantly more revenue than DVD sales. We can also recognize what Netflix itself did: streaming subscriptions are significantly more profitable than DVD subscriptions.

  2. Cost Analysis:

    As Netflix stepped away from its original business model of mailing out DVDs, reported fulfillment costs dropped to 0. By the end of the dataset, almost all of Netflix’s Costs of Revenue are subscription costs.

    When Netflix began, operating costs took up their highest share of revenue. This is typical for a new startup. Over time, however, Netflix’s expenses have largely scaled with revenue growth, and in some cases decreased as a share of revenue. For example, marketing costs have generally trended lower as a share of revenue, suggesting that Netflix has been successful in capturing market share and brand recognition.

  3. Profitability Analysis:

    Netflix is an example of a company scaling successfully. It has been profitable for most of the period covered by the dataset. Streaming and market growth have both been significant. One of the biggest risks it took was around 2011, when it began preparing to release its first original content in 2013. Operating income dropped to almost nothing for a while, but it quickly recovered.

    Netflix has been generally profitable since around 2005. Gross profit margin has been relatively stable around 32% (±5% or so). Operating margins are slimmer, but still significant, especially given the sheer scale of revenues that Netflix is pulling in.

  4. Subscriber Data:

    Of particular note here is the massive growth in Netflix’s international market. Obviously, domestic subscribers drove early growth. But in 2017, Netflix had more international than domestic subscribers. Additionally, international subscriber counts are growing much more rapidly than domestic subscribers.

  5. Strategic Metrics: explore trends in pricing and original content production.

    Pricing is actually cheaper now than it was in 2003, and subscribers have steadily increased. Now that Netflix has secured market share, they are again slowly raising prices to boost profitability and keep up with inflation.

    Netflix has certain costs necessary to run a subscription-based business. However, it appears that they are doing a fairly good job at it. Their cost of running their services on a per-subscriber basis has steadily trended downward. The only real spike in costs was around the period that Netflix began investing in self-produced content. Since then, costs per subscriber have again trended down.
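
The growth comparisons in the Subscriber Data discussion can be made concrete with a year-over-year growth rate, which compares each quarter with the same quarter one year (four rows) earlier. This is a sketch on invented numbers: the helper `yoy_growth()` and the two vectors below are hypothetical and are not part of the original script or dataset.

```r
# Year-over-year growth for a quarterly series:
# compares each value with the value `lag` quarters earlier.
# Assumes the series is longer than `lag` and ordered by quarter.
yoy_growth <- function(x, lag = 4) {
  n <- length(x)
  c(rep(NA_real_, lag), x[(lag + 1):n] / x[1:(n - lag)] - 1)
}

# Hypothetical subscriber counts in millions (NOT real Netflix data)
domestic      <- c(40, 41, 42, 43, 44, 45, 46, 47)
international <- c(20, 23, 26, 30, 34, 38, 43, 48)

growth <- data.frame(
  domestic_yoy      = yoy_growth(domestic),
  international_yoy = yoy_growth(international)
)

# The first four quarters have no prior-year value, so they are NA
print(round(growth, 3))
```

Applied to the real subscriber columns, a table like this would show directly how much faster one segment is growing than the other, rather than relying on the slope of the plotted lines.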

Answer Entrepreneurial Questions

Generally speaking, data analysis is an incredibly useful tool for business decision making. Data-driven decisions tend to be more successful than educated guesses alone. If businesses keep regular data, analysts such as ourselves can identify actions that may increase profitability. Our analysis reduces uncertainty because it is grounded in reported figures: we take all available information and try to present it in a clear and understandable form. One area where we would love to see additional data is original content created outside the US market, or intended for markets outside the US. Netflix has generally been successful in developing solutions in the domestic market and then taking them international. We would love to see whether that holds for original content as well.

  1. Revenue Mix:

    The first graph included in this document (Revenue Analysis: “Domestic vs. International Revenue Over Time”) shows a clear shift in Netflix’s revenue mix over time. Initially, domestic revenue grew steadily, but international revenue was nonexistent until around 2010. After 2010, the international revenue grew significantly, and by 2017, it surpassed $1 billion. Netflix’s growth strategy shifted from focusing on domestic markets to investing heavily in international expansion, and it was highly successful. The rate of change for international revenue is noticeably greater than that of domestic revenue.

    From the graph Revenue Analysis: “Subscription Revenue vs. DVD Sales Over Time,” we get to see part of the story of Netflix’s transition from DVD to streaming. Domestic DVD sales produced almost negligible revenues when compared with subscription revenues. Netflix made a strategic shift from relying on DVD rentals to focusing on streaming subscriptions, which were a significantly more profitable and scalable model. Streaming allows Netflix to serve a global audience without the logistical constraints of physical media. Streaming allowed the company to increase their multiples, while simultaneously simplifying their business operations. The increase in streaming revenue over time reflects Netflix’s investment in technology and content to attract and retain subscribers, positioning itself as a leader in the entertainment streaming industry.

  2. Cost Efficiency:

    Subscription costs are scaling efficiently as subscriber counts grow. Per the 2nd graph of Strategic Metrics, “Subscription Costs per Subscriber,” Netflix is actually becoming more efficient as it grows: subscriber growth has consistently outpaced growth in the related subscription costs. From this data, we recommend that Netflix continue to run its business as it has been. The approach appears to be successful thus far.

    In the early 2000s, Netflix made heavy investments in both Marketing and Technology & Development. This shouldn’t come as much of a surprise, as they were in the early stages of building out a startup! After successful technological development, those costs quickly dropped off, while marketing costs continued to take up a significant portion of revenue share. This makes sense. Their basic systems were working, but they still needed to attract customers. In the years since, the percent of revenue dedicated to marketing has generally shrunk, while technology development has once again begun to grow. Netflix now needed to develop efficient systems that would work at scale!

  3. Strategic Decisions:

    How do pricing and original content trends correlate with subscriber growth? We can calculate correlations using the following code.

    #Calculating correlations between domestic streaming, hours of content produced, and subscription price
    correlation_data <- netflix_data %>%
      select(
        domestic_streaming_subscribers, 
        number_of_hours_of_original_content_produced_usa,
        cheapest_subscription)
    correlation_matrix <- cor(correlation_data, use = "complete.obs")
    print(correlation_matrix)
                                                     domestic_streaming_subscribers
    domestic_streaming_subscribers                                        1.0000000
    number_of_hours_of_original_content_produced_usa                      0.8898108
    cheapest_subscription                                                 0.8341757
                                                     number_of_hours_of_original_content_produced_usa
    domestic_streaming_subscribers                                                          0.8898108
    number_of_hours_of_original_content_produced_usa                                        1.0000000
    cheapest_subscription                                                                   0.6108632
                                                     cheapest_subscription
    domestic_streaming_subscribers                               0.8341757
    number_of_hours_of_original_content_produced_usa             0.6108632
    cheapest_subscription                                        1.0000000

    From this correlation matrix, we can see that all three variables are positively correlated with one another.

    Domestic Streaming Subscribers vs. Hours of Original (US) Content: r=0.8898

    • Strong positive correlation: As the number of hours of original content increases, the number of domestic streaming subscribers also tends to increase.

    • Interpretation: This suggests that Netflix’s investment in producing original content has a substantial impact on attracting domestic subscribers.

    Domestic Streaming Subscribers vs. Cheapest Subscription: r=0.8342

    • Strong positive correlation: As the price of the cheapest subscription increases, the number of domestic streaming subscribers also tends to increase.

    • Interpretation: This may indicate that Netflix’s subscriber growth is resilient to price increases, likely due to strong demand and perceived value. It could also reflect broader subscriber growth outpacing cancellations due to price hikes.

    Hours of Original (US) Content vs. Cheapest Subscription: r=0.6109

    • Moderate positive correlation: As the number of hours of original content increases, the price of the cheapest subscription also increases.

    • Interpretation: Netflix may be leveraging its expanded original content library to justify higher subscription prices.

    The most statistically significant correlation here is between Domestic Streaming Subscribers and Hours of Original (US) Content (p = 0.00000006738). As such, the most obvious recommendation we can give Netflix is to invest in creating regionalized Original Content. It is strongly and consistently correlated with increased streaming subscribers. In general, Netflix has had a pattern of mastering strategies domestically, and then taking them successfully to an international market. We believe that continuing to create original content targeted to specific regions will be successful.

    Netflix shouldn’t take the correlation between streaming subscribers and rising subscription prices at face value. One might think, “Wow, raising prices makes subscriber numbers increase! I should just keep doing this!” This is a terrible idea. More likely, these price hikes simply happened concurrently with other measures, like original content, that drove increased subscription numbers. From a simple economics perspective, raising prices will shrink demand. Netflix should be very cautious about raising prices without a good reason. Although Netflix’s customers haven’t yet seemed all that price sensitive, there will come a point where that changes. If Netflix continues to add value to its product, we do believe that it won’t be too difficult to justify price increases.
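
Two points in the Strategic Decisions discussion can be sketched in code: how a p-value for a correlation is typically obtained, and why two series that both trend upward can look strongly correlated even when their quarter-to-quarter changes are unrelated. The example uses base R’s `cor.test()` and `diff()` on synthetic series. The vectors `price` and `subscribers` are invented for illustration and are not the Netflix figures, so the printed values will not match those quoted above.

```r
# Synthetic quarterly series (NOT real Netflix data): both rise over time,
# but their short-term wiggles are unrelated by construction.
quarter_index <- 1:20
price         <- 8  + 0.1 * quarter_index + 0.2 * rep(c(1, -1), times = 10)
subscribers   <- 20 + 2.0 * quarter_index + 1.5 * rep(c(1, 1, -1, -1), times = 5)

# Pearson correlation of the levels, with its p-value
levels_test <- cor.test(price, subscribers)
print(levels_test$estimate)  # strong positive correlation driven by the shared trend
print(levels_test$p.value)

# Correlating quarter-to-quarter changes removes the shared upward trend
changes_test <- cor.test(diff(price), diff(subscribers))
print(changes_test$estimate)  # close to zero for these synthetic series
print(changes_test$p.value)
```

In this sketch the level series are highly correlated only because both grow over time. Running the same comparison on the real columns (for example, on `diff()` of price and of subscribers) would be one simple way to check how much of a reported correlation reflects a common trend rather than a direct relationship.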