This dataset covers a multitude of streaming services around the globe, with data from 2020 to 2024. The analysis looks at the growth, engagement, pricing, and more of 78 different services!
It includes the 78 streaming services, the parent company each one stems from, their churn rates, growth, and monthly and annual price points.
Below are five different data visualizations analyzing a few trends in the streaming market.
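# Libraries used by the chunks below
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the streaming services dataset
path = "U:/paid_video_streaming_services.csv"
df = pd.read_csv(path)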
This chart looks at the 10 most common subscription prices across the 78 streaming services. Subscription prices tend to be on a sliding scale, letting the user pick which plan to pay for from a range of set prices. How much access a user has is tied to the price they pay, with cheaper plans tending to mean less access to a platform's content.
The prices in this donut chart are single price points, taken from the amount users tend to pay for each service, rather than the several different prices offered in each service's monthly and annual plans.
df = pd.read_csv(path)
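# Count how many services charge each monthly price
all_prices = df['monthly_price_usd'].value_counts()
total_count = all_prices.sum()
# Keep the 10 most common prices and pool everything else into "Other"
top_10 = all_prices.head(10)
other_sum = all_prices.iloc[10:].sum()
plot_data = pd.concat([top_10, pd.Series({'Other': other_sum})])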
fig, ax = plt.subplots(figsize=(10, 8))
wedges, texts, autotexts = ax.pie(
    plot_data,
    labels=[f"${x}" if x != 'Other' else 'Other' for x in plot_data.index],
    autopct='%1.1f%%',
    startangle=140,
    pctdistance=0.82,
    colors=plt.cm.tab20.colors,
    textprops={'fontsize': 14}
)
centre_circle = plt.Circle((0,0), 0.30, fc='white')
fig.gca().add_artist(centre_circle)
plt.text(0, 0, f'Total Prices:\n{total_count}', ha='center', va='center', fontsize=18, fontweight='bold')
plt.setp(autotexts, size=12, weight="bold")
plt.title("Top 10 Most Common Subscription Prices", fontsize=18, pad=20)
ax.axis('equal')
plt.tight_layout()
plt.show()
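The grouping step behind the donut chart (keep the most frequent prices and pool the remainder into a single "Other" slice) can be sketched on a small example. The helper name `top_n_with_other` and the sample prices are assumptions made for illustration; they are not part of the dataset.

```python
import pandas as pd

def top_n_with_other(values: pd.Series, n: int = 10) -> pd.Series:
    """Count each distinct value, keep the n most frequent, pool the rest as 'Other'."""
    counts = values.value_counts()
    top = counts.head(n)
    # Use the same n for both slices so that no category is skipped.
    other = counts.iloc[n:].sum()
    return pd.concat([top, pd.Series({"Other": other})])

# Hypothetical prices, for illustration only.
prices = pd.Series([9.99, 9.99, 9.99, 4.99, 4.99, 14.99, 7.99, 5.99])
grouped = top_n_with_other(prices, n=2)
print(grouped)
```

A quick sanity check for this kind of grouping is that the slices add up to the number of rows counted, which is the total shown in the middle of the donut.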
This stacked bar chart shows the streaming services grouped under each parent company. Many companies that provide a streaming service branch out to at least two, but most of the parent companies behind the 78 services do not have three or more, as the ones displayed here do. This is likely because they are still growing, and sustaining multiple media brands can cost a tremendous amount of labor and money.
The companies shown here can afford so many services because they are household names with worldwide traction. The support and effort they put into providing quality service allows them to keep branching out with more streaming platforms.
counts = df.groupby('parent_company')['service_name'].transform('count')
df_filtered = df[counts >= 3].copy()
stacked_df = df_filtered.groupby(['parent_company', 'service_name']).size().unstack(fill_value=0)
color_map = {
    'Hulu + Live TV': '#3D883D',
    'Shudder': '#3D995D',
    'AMC+': '#3D911D',
    'Hulu': '#3D987D',
    'Acorn TV': '#55D46F',
    'Disney+': '#40D8D8',
    'HBO Max': '#70A4E1',
    'ESPN+': '#00A8E1',
    'Discovery+': '#113CCF',
    'Discovery+ (UK)': '#70A4E1',
    'Hotstar': '#3D87ED',
    'Max (HBO Max)': '#40D0DB'
}
# Fall back to a neutral grey for any service missing from color_map
ordered_colors = [color_map.get(service, '#888888') for service in stacked_df.columns]
fig, ax = plt.subplots(figsize=(12, 7))
stacked_df.plot(kind='bar', stacked=True, ax=ax, color=ordered_colors, legend=False)
for container in ax.containers:
    service_name = container.get_label()
    labels = [service_name if v > 0 else "" for v in container.datavalues]
    ax.bar_label(container, labels=labels, label_type='center',
                 color='white', fontweight='bold', fontsize=12)
plt.title("Parent Companies with 3+ Streaming Services", fontsize=15)
plt.ylabel("Number of Services")
plt.xlabel("Parent Company")
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
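The filter that keeps only parent companies with three or more services relies on `groupby(...).transform('count')`, which returns one count per row rather than one per group. A minimal sketch on made-up rows shows how that per-row count works as a mask; the company and service names below are hypothetical.

```python
import pandas as pd

# Hypothetical rows, for illustration only.
sample = pd.DataFrame({
    "parent_company": ["A", "A", "A", "B", "B", "C"],
    "service_name": ["A1", "A2", "A3", "B1", "B2", "C1"],
})

# transform('count') is aligned with the original index, so every row
# carries the number of services its parent company owns.
per_row_count = sample.groupby("parent_company")["service_name"].transform("count")

# The aligned counts can be used directly as a boolean mask.
three_plus = sample[per_row_count >= 3]
print(three_plus)
```

Only company "A" survives the filter here, since it is the only one with three services.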
This radar chart looks at user engagement with the streaming services owned by The Walt Disney Company: “Hulu”, “Hulu + Live TV”, “Hotstar”, “ESPN+”, and “Disney+”. The company has the most platforms of any parent company in the dataset, with five of its own.
The values are plotted on an illustrative scale of 50 to 150 to give a numerical representation of where these services sit from low to high: 150 is high, and 50 or below is low. Each service's colored outline reaches toward the Low, Medium, or High axis of engagement it belongs to. Disney+ has very high engagement, while Hulu has medium engagement, though on the low side of medium. Although the company is stable enough to own so many services, they were not equally successful by 2024, as some have higher engagement than others. This could be because certain platforms have existed longer under the parent company, giving them time to gain a larger following, or because the naming and promotion of the larger platforms are more iconic than those of their smaller counterparts.
data_dict = {
"Hulu": [45, 30, 25],
"Hulu + Live TV": [35, 25, 15],
"Hotstar": [65, 35, 10],
"ESPN+": [0, 95, 0],
"Disney+": [40, 30, 150]
}
categories = ['Low', 'Medium', 'High']
N = len(categories)
angles = [n / float(N) * 2 * np.pi for n in range(N)]
angles += angles[:1]
fig, ax = plt.subplots(figsize=(8, 8), subplot_kw=dict(polar=True))
colors = ['#4357FF', '#40D0DB', '#9357FF', '#90EE90', '#34BD33']
for i, (name, values) in enumerate(data_dict.items()):
    plot_values = values + [values[0]]
    ax.plot(angles, plot_values, linewidth=4, label=name, color=colors[i])
    ax.fill(angles, plot_values, color=colors[i], alpha=0.1)
ax.set_theta_offset(np.pi / 2)
ax.set_theta_direction(-1)
plt.xticks(angles[:-1], categories, size=18, weight='bold')
plt.yticks([50, 100, 150], ["50", "100", "150"], color="grey", size=12)
ax.set_rlabel_position(0)
plt.title("2024 Engagement for The Walt Disney Services", size=22, y=1.1, weight='bold')
plt.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1), fontsize=13)
ax.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
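The radar code repeats the first angle and the first value of each series so that the outline closes back on itself. That step can be isolated in a small helper; the name `closed_radar_coords` and the sample values are assumptions for illustration only.

```python
import numpy as np

def closed_radar_coords(values):
    """Return (angles, values) with the first point repeated so the outline closes."""
    n = len(values)
    # One evenly spaced angle per axis, starting at 0 and stopping before 2*pi.
    angles = np.linspace(0, 2 * np.pi, n, endpoint=False).tolist()
    return angles + angles[:1], list(values) + list(values[:1])

# Hypothetical three-axis values, for illustration only.
angles, closed = closed_radar_coords([10, 20, 30])
print(len(angles), closed)
```

Without the repeated point, the line would stop at the last axis and leave the shape open.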
This scatter plot shows where services sit in terms of churn rate, the percentage of subscribers who cancel their subscription. Comparing churn rate with subscriber count gives a clear view of whether larger platforms, with higher subscriber counts, have a better (lower) churn rate. The green trend line illustrates the relationship.
It can be seen that the higher a service's subscriber count is, the lower its churn rate. This is good for the retention of these networks, as customers are satisfied enough to stay. The services with lower subscriber counts may be too expensive to keep paying for given their quality, causing people to end their service.
fig, ax = plt.subplots(figsize=(12, 7))
# 2. Create the Scatter Plot
# Each of the 78 services will be represented by a single dot
ax.scatter(
    df['subscribers_2024_millions'],
    df['churn_rate_pct'],
    color='royalblue',
    alpha=0.7,
    edgecolors='blue',
    linewidth=0.5,
    s=300  # Fixed size for all dots
)
# 3. Styling
plt.title("Churn Rate of All 78 Services", fontsize=16, fontweight='bold', pad=15)
plt.xlabel("Subscribers (Millions)", fontsize=12)
plt.ylabel("Churn Rate (%)", fontsize=12)
# Optional: Add a trend line to see the correlation
import numpy as np
z = np.polyfit(df['subscribers_2024_millions'], df['churn_rate_pct'], 1)
p = np.poly1d(z)
plt.plot(df['subscribers_2024_millions'], p(df['subscribers_2024_millions']), "g-", alpha=0.5, label="Trendline")
plt.grid(True, linestyle='-', alpha=0.5)
plt.legend()
plt.tight_layout()
plt.show()
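To put a number on the relationship the trend line suggests, the slope of the fitted line and the Pearson correlation coefficient can be computed side by side. The sketch below uses made-up values, not figures from the dataset, and masks out missing values first on the assumption that the real columns may contain some.

```python
import numpy as np

# Hypothetical subscriber counts (millions) and churn rates (%), for illustration only.
subscribers = np.array([2.0, 5.0, 12.0, 40.0, 110.0, 260.0])
churn = np.array([8.5, 7.9, 6.4, 5.1, 3.8, 2.6])

# np.polyfit does not skip missing values, so keep only complete pairs.
mask = ~(np.isnan(subscribers) | np.isnan(churn))

# Degree-1 fit: slope and intercept of the straight trend line.
slope, intercept = np.polyfit(subscribers[mask], churn[mask], 1)

# Pearson correlation: sign gives the direction, magnitude the strength.
r = np.corrcoef(subscribers[mask], churn[mask])[0, 1]
print(f"slope={slope:.4f}, r={r:.2f}")
```

A negative slope together with a negative `r` would support the reading that churn falls as subscriber counts rise, while a value of `r` close to zero would mean the trend line is not telling much.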
This multi-line plot follows the 15 platforms with the highest growth rate, ranging from the most significant growth to little or no growth. The results suggest that a platform's level of engagement does not predict its growth, since the 15 services sit at very different points on the spectrum. AMC+, whose subscriber count used to be in the three-million range, has climbed the most in a few short years.
When they perform well, smaller services show larger relative growth than larger ones, because the same number of new subscribers is a bigger share of a small starting base.
cols_to_keep = ['service_name']
year_cols = ['subscribers_2020_millions', 'subscribers_2021_millions',
'subscribers_2022_millions', 'subscribers_2023_millions',
'subscribers_2024_millions']
df['growth_rate'] = (df['subscribers_2024_millions'] - df['subscribers_2020_millions']) / df['subscribers_2020_millions']
top_15_services = df.sort_values(by='growth_rate', ascending=False).head(15)['service_name']
df_filtered = df[df['service_name'].isin(top_15_services)]
df_long = df_filtered.melt(id_vars=cols_to_keep, value_vars=year_cols,
var_name='Year', value_name='Subscribers')
# Pull the four-digit year out of names like 'subscribers_2020_millions'
df_long['Year'] = df_long['Year'].str.extract(r'(\d+)', expand=False).astype(int)
plot_df = df_long.pivot(index='Year', columns='service_name', values='Subscribers')
fig, ax = plt.subplots(figsize=(14, 7))
plot_df.plot(kind='line', ax=ax, marker='o', linewidth=2.5)
plt.title("15 Streaming Services Growth Rate", fontsize=16, fontweight='bold')
plt.ylabel("Subscribers (millions)", fontsize=12)
plt.xlabel("Year (2020-2024)", fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()
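The growth rate used to rank the services is the relative change from 2020 to 2024, which explains why small services can rank above much larger ones. The sketch below works through it on three made-up services. The column names match the ones used above, but the values are hypothetical, and treating a zero 2020 baseline as missing is an assumption rather than something the original analysis does.

```python
import numpy as np
import pandas as pd

# Hypothetical services, for illustration only.
sample = pd.DataFrame({
    "service_name": ["Small", "Large", "New"],
    "subscribers_2020_millions": [2.0, 100.0, 0.0],
    "subscribers_2024_millions": [8.0, 150.0, 5.0],
})

start = sample["subscribers_2020_millions"]
end = sample["subscribers_2024_millions"]

# A zero baseline would make the ratio infinite, so treat it as missing instead.
sample["growth_rate"] = (end - start) / start.replace(0, np.nan)
print(sample[["service_name", "growth_rate"]])
```

Here "Small" adds 6 million subscribers and grows by 300%, while "Large" adds 50 million and grows by only 50%. "New" gets a missing growth rate instead of an infinite one that would otherwise land at the top of the ranking.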
Thank you for taking a look at this streaming services data through my lens. The dataset covers 78 streaming services around the globe from 2020 to 2024, along with the parent company each one stems from, their churn rates, growth, and monthly and annual price points.
Together, the data visualizations analyze the growth, engagement, pricing, and more of these services.
knitr::include_graphics("U:StreamingImagery.png")
Image from ‘Illustration: VIP+’ by Variety