Introduction

The dataset that I’ve chosen to analyze covers Disney Plus and its movie and TV show titles. Disney Plus is one of the newer streaming services available, having launched in 2019. The analysis looks at a number of things, namely the top producers for Disney Plus, what kinds of movies and TV shows are on the service, and when the titles were added.

Dataset

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
import warnings

warnings.filterwarnings("ignore")

filename = '/Users/danwigley/Desktop/disney_plus_titles.csv'
df = pd.read_csv(filename)

# Parse date_added into datetimes so the .dt accessors below work
df['date_added'] = pd.to_datetime(df['date_added'])
df['Month'] = df['date_added'].dt.strftime('%B')
df['Weekday'] = df['date_added'].dt.strftime('%A')
df['Year'] = df['date_added'].dt.year

The dataset can be downloaded from Kaggle under the name disney_plus_titles.csv.
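As a quick sanity check on the date handling above, the sketch below parses a few made-up `date_added` strings and derives the same helper columns. The sample rows and the "Month day, year" date style are assumptions for illustration, not values read from the CSV. Passing `errors="coerce"` turns missing or unparseable dates into `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical sample rows; the real values come from disney_plus_titles.csv.
sample = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "date_added": ["November 12, 2019", "April 23, 2020", None],
})

# Missing or malformed dates become NaT instead of raising an error.
sample["date_added"] = pd.to_datetime(sample["date_added"], errors="coerce")

# Derive the same helper columns used throughout the analysis.
sample["Month"] = sample["date_added"].dt.strftime("%B")
sample["Weekday"] = sample["date_added"].dt.strftime("%A")
sample["Year"] = sample["date_added"].dt.year

# Count the rows that would silently drop out of any date-based grouping.
missing_dates = int(sample["date_added"].isna().sum())
print(sample)
print("Rows without a usable date:", missing_dates)
```

Knowing how many rows lack a date is useful because those rows are excluded from every month, quarter, and year count that follows.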

Findings

How long are the movies on Disney Plus?

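# Copy the movie rows so adding a column does not raise SettingWithCopyWarning
movies = df[df['type'] == 'Movie'].copy()
# Keep only the number of minutes and make the column numeric
movies['duration'] = movies['duration'].str.extract(r'(\d+)')[0].astype(float)

plt.figure(figsize=(16,10))
# histplot is the current seaborn replacement for the deprecated distplot
sns.histplot(movies['duration'], kde=False)
plt.xlabel("Duration (minutes)", size=20)
plt.ylabel("Number of Movies",size=20)
plt.title("The Distribution of the Duration of Movies", size=24)
plt.show()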

If you are anything like me, you love movies, and not just the ones you go to see at the movie theater but short films too. The histogram above shows the distribution of running times for all the films available on Disney Plus. There are two main spikes in the chart: one at the very beginning and one between the one-hour and two-hour marks. This shows that Disney Plus carries a large number of short films as well as a large portion of what most people think of when they hear the word movie.
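To put rough numbers on the two spikes, one option is to split the movies at a cutoff and count each side. The sketch below uses invented durations and assumes a 40-minute threshold for what counts as a short film. Both are illustrative choices, not values taken from the dataset.

```python
import pandas as pd

# Invented durations in the "NN min" style assumed for the duration column.
movies_sample = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "duration": ["8 min", "23 min", "95 min", "110 min", "6 min"],
})

# Pull out the leading number and convert it to an integer.
minutes = movies_sample["duration"].str.extract(r"(\d+)")[0].astype(int)

# Assumption: anything under 40 minutes counts as a short film.
SHORT_FILM_CUTOFF = 40
kind = minutes.lt(SHORT_FILM_CUTOFF).map({True: "Short film", False: "Feature"})

# Number of titles on each side of the cutoff.
counts = kind.value_counts()
print(counts)
```

Moving `SHORT_FILM_CUTOFF` up or down shows how sensitive the split is to where the line is drawn.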

Content Over the Years

date_added_df = df.groupby(['Month', 'Year'])['show_id'].count().reset_index(name='Title Count')
d = {'January':1, 'February':2, 'March':3, 'April':4, 'May':5, 'June':6,'July':7,'August':8,'September':9,'October':10,'November':11,'December':12}
date_added_df.Month = date_added_df.Month.map(d)
date_added_df = date_added_df.dropna()
plt.figure(figsize=(14,5))
plt.scatter(date_added_df.Month, date_added_df.Year, marker='8', cmap='viridis', 
           c=date_added_df['Title Count'], s=10*date_added_df['Title Count'], edgecolors='black')
plt.title("Titles added by Date",fontsize='18')
plt.xlabel("Month",fontsize='14')
plt.ylabel("Year",fontsize='14')
plt.yticks(np.arange(2019, 2021.1))
cbar = plt.colorbar()
cbar.set_label("Number of Titles", rotation=270, fontsize='14', color='black', labelpad=30)
plt.show()

date_added_df = df.groupby(['Month', 'Year'])['show_id'].count().reset_index(name='Title Count')
d = {'January':1, 'February':2, 'March':3, 'April':4, 'May':5, 'June':6,'July':7,'August':8,'September':9,'October':10,'November':11,'December':12}
date_added_df.Month = date_added_df.Month.map(d)
date_added_df = date_added_df.dropna()
date_added_df = date_added_df[date_added_df.Year != 2019.0]
plt.figure(figsize=(14,5))
plt.scatter(date_added_df.Month, date_added_df.Year, marker='8', cmap='viridis', 
           c=date_added_df['Title Count'], s=20*date_added_df['Title Count'], edgecolors='black')
plt.title("Titles added by Date",fontsize='18')
plt.xlabel("Month",fontsize='14')
plt.ylabel("Year",fontsize='14')
plt.yticks(np.arange(2020, 2021.1))
cbar = plt.colorbar()
cbar.set_label("Number of Titles", rotation=270, fontsize='14', color='black', labelpad=30)
plt.show()

The two scatter plots above show how many titles were added in each month of each year. The first scatter plot includes the three months of 2019 in which Disney Plus was active. November 2019 proves to be a huge outlier among the rest of the data, as it was the month in which almost all of the initial titles were added to the service. Because of this, a decision was made to plot the data again using only 2020 and 2021, which gives a clearer picture. We see that April and July are the months in which the most titles were added.
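Bubble sizes can be hard to compare by eye, so a small table of counts is a handy cross-check on which months are busiest. The sketch below builds a year-by-month table from made-up dates after dropping the launch year. The dates are placeholders, not rows from the dataset.

```python
import pandas as pd

# Made-up dates standing in for the parsed date_added column.
dates = pd.to_datetime(pd.Series([
    "2019-11-12", "2019-11-12", "2019-11-12", "2019-12-20",
    "2020-04-03", "2020-04-22", "2020-07-03", "2021-04-16",
]))
sample = pd.DataFrame({"show_id": list(range(len(dates))), "date_added": dates})

# Drop the launch year so November 2019 does not dominate the counts.
after_launch = sample[sample["date_added"].dt.year > 2019]

# Rows are years, columns are month numbers, cells are titles added.
monthly = (
    after_launch
    .groupby([after_launch["date_added"].dt.year.rename("Year"),
              after_launch["date_added"].dt.month.rename("Month")])["show_id"]
    .count()
    .unstack(fill_value=0)
)
print(monthly)

# Total per calendar month across the remaining years, busiest first.
busiest = monthly.sum(axis=0).sort_values(ascending=False)
print(busiest.head(2))
```

Run against the real `df`, the same table would show whether April and July stand out in both 2020 and 2021 or only in one of them.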

Titles by Quarter

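# Int64 keeps quarters as whole numbers even if some date_added values are missing
df['Quarter'] = df['date_added'].dt.quarter.astype('Int64')
df['Quarter Name'] = "Quarter " + df['Quarter'].astype('string')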

df3 = df.groupby(['Quarter Name', 'Month'])['show_id'].count().reset_index(name='TitleCount')

number_outside_colors = len(df3['Quarter Name'].unique())
outside_color_ref_number = np.arange(number_outside_colors)*4
print(outside_color_ref_number)
fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(1,1,1)

colormap = plt.get_cmap("tab20c")
outer_colors = colormap(outside_color_ref_number)

total_titles = df3.TitleCount.sum()

df3.groupby(['Quarter Name'])['TitleCount'].sum().plot(
        kind='pie', radius=1, colors= outer_colors, pctdistance=0.85, labeldistance=1.1,
        wedgeprops = dict(edgecolor='w'), textprops={'fontsize':14}, 
        autopct = lambda p: '{:.2f}%\n{:.0f} titles'.format(p,(p/100)*total_titles),
        startangle=90)

hole = plt.Circle((0,0),0.3, fc='white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)

ax.yaxis.set_visible(False)
plt.title('Titles Added by Quarter', fontsize=18)

ax.text(0,0, 'Total Titles Added\n' + str(total_titles), ha = 'center', va= 'center')

ax.axis('equal')
plt.tight_layout()
plt.show()

The donut chart above shows the percentage of titles added in each quarter of the year. Again we see that the outlier of November 2019, when most of the original titles were added shortly after the launch, heavily outweighs the rest of the data. Setting aside the large share that quarter 4 takes, we can compare the other three quarters to each other. Quarters 2 and 3 have about the same number of titles added, whereas quarter 1 is significantly lower than the other quarters. In the middle of the donut we can also see that 1,365 titles have been added to Disney Plus so far.
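Since the launch month dominates quarter 4, it can help to compute the quarter shares twice, once with and once without November 2019. The sketch below does this on invented dates. The helper name `quarter_share` and the sample values are hypothetical and only illustrate the comparison.

```python
import pandas as pd

# Invented dates: four launch-month titles plus a handful added later.
dates = pd.to_datetime(pd.Series([
    "2019-11-12", "2019-11-12", "2019-11-12", "2019-11-12",
    "2020-02-14", "2020-05-01", "2020-05-22", "2020-08-07", "2020-08-21",
    "2020-12-25",
]))

def quarter_share(d):
    """Percentage of titles added in each calendar quarter."""
    return d.dt.quarter.value_counts(normalize=True).sort_index() * 100

with_launch = quarter_share(dates)

# Exclude the launch month (November 2019) to compare the quarters more fairly.
is_launch = (dates.dt.year == 2019) & (dates.dt.month == 11)
without_launch = quarter_share(dates[~is_launch])

print(with_launch.round(1))
print(without_launch.round(1))
```

In this toy sample quarter 4 falls from half of all titles to the same share as quarter 1 once the launch month is removed, which is the kind of shift the donut chart cannot show on its own.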

Ratings of Movies and TV Shows

df2 = df.groupby(['Month', 'rating'])['show_id'].count().reset_index(name='Title Count')
d = {'January':1, 'February':2, 'March':3, 'April':4, 'May':5, 'June':6,'July':7,'August':8,'September':9,'October':10,'November':11,'December':12}
df2.Month = df2.Month.map(d)
df2.sort_values(by='Month', inplace=True)

fig = plt.figure(figsize=(16,10))
ax = fig.add_subplot(1,1,1)

my_colors = {'G': 'blue',
            'PG': 'red',
            'PG-13': 'green',
            'TV-14': 'gray',
            'TV-G': 'purple',
            'TV-PG': 'gold',
            'TV-Y': 'brown',
            'TV-Y7': 'black',
            'TV-Y7-FV': 'orange'}

for key, grp in df2.groupby('rating'):
    grp.plot(ax=ax, kind='line', x='Month', y='Title Count', color=my_colors[key], label=key, marker='8')
plt.title('Rating of Show Added By Month',fontsize='18')
ax.set_xlabel('Month',fontsize='14')
ax.set_ylabel('Total Titles Added',fontsize='14')
plt.show()

The line plot above shows when movies and TV shows were added, along with the rating of those titles. Disney Plus is the streaming service considered the most kid-friendly, so I felt it was important to show the ratings of the shows and movies being added. We can see that the most common ratings are G and TV-G, which makes sense for Disney Plus. The next two most common are PG and TV-PG, which again makes sense, as Disney would want to add as many kid-friendly TV shows and movies as it can.
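The ranking of ratings can also be read from a simple frequency table rather than from the line plot. The sketch below uses a made-up list of ratings and treats G, TV-G, PG, and TV-PG as the family-friendly group. That grouping is an assumption for illustration, not a definition from the dataset.

```python
import pandas as pd

# Made-up ratings standing in for the rating column.
ratings = pd.Series(
    ["TV-G", "G", "TV-G", "PG", "TV-PG", "G", "TV-G", "PG-13", "TV-14", "G"],
    name="rating",
)

# Titles per rating and each rating's share of the catalog.
counts = ratings.value_counts()
share = (counts / counts.sum() * 100).round(1)
summary = pd.DataFrame({"titles": counts, "percent": share})
print(summary)

# Assumption: these four ratings are treated as family friendly.
family = ["G", "TV-G", "PG", "TV-PG"]
family_share = ratings.isin(family).mean() * 100
print(f"Family-friendly share: {family_share:.1f}%")
```

Applied to `df['rating']`, the same two steps would give an exact share to back up the claim that most titles carry no age restriction.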

Conclusion

After analyzing the data from Disney Plus along with the visualizations created, we can come to a few conclusions. The analysis points to a number of reasons why someone would want a Disney Plus subscription. First we looked at the most popular directors on the service, and if you like any of them, that would be a good reason to subscribe. We then looked at the length of the movies on the service and saw a wide variety, from movies in the traditional sense to short films. After that we looked into when titles were added to the service. We found that while a majority of the movies and shows were added at the initial launch in November 2019, plenty more have been added since, mainly in quarters 2 and 3. Finally we looked at the ratings of the shows and movies on the service and found that they match Disney’s target audience, as most of them are aimed at audiences without age restrictions.