---
title: "Netflix Viewing History in R"
author: "Saikat Barua"
date: '2022-08-13'
output:
  html_notebook: default
  html_document:
    df_print: paged
  pdf_document: default
  word_document: default
---

### Overview

This vignette explores a Netflix viewing history using R.

### Tool

The data were collected from my Netflix account: <https://www.netflix.com/YourAccount>.

Data source: to download the file, use this link:

[Netflix](https://drive.google.com/file/d/1jVHjDdsIPaiwpXj6QIt1dATu3Dn1ZTth/view?usp=sharing)

or

<https://drive.google.com/file/d/1jVHjDdsIPaiwpXj6QIt1dATu3Dn1ZTth/view?usp=sharing>

### Vignette Info

While working with the data, this vignette demonstrates several uses of R functions.

The dataset I chose is my Netflix Viewing History dataset, which contains 5673 rows.

```{r}
library(dplyr)
library(tidyr)
library(lubridate)
library(zoo)
library(ggplot2)
library(ggplotlyExtra)
library(plotly)

# READING DATA FROM CSV DOWNLOADED FROM NETFLIX ACCOUNT
netflix <- readr::read_csv("netflix_combined.csv") 
str(netflix)
head(netflix)
netflix$Date <- dmy(netflix$Date)
  

# SEPARATE TITLE COLUMN IN TITLE OF TV SERIES, SEASON AND EPISODE TITLE
netflix_series <- netflix %>%
  separate(col = Title, into = c("title", "types", "title_episode"), sep = ': ')


# REMOVE OCCURRENCES WHERE SEASON AND EPISODE ARE EMPTY (BECAUSE THEY ARE NOT TV SERIES)
netflix_series <- netflix_series[!is.na(netflix_series$types),]
netflix_series <- netflix_series[!is.na(netflix_series$title_episode),]
# COUNT EPISODES WATCHED PER SERIES AND DAY
maratones_netflix <- netflix_series %>%
  count(title, Date)

# EPISODES PER DAY
netflix_episode_perday <- netflix %>%
  count(Date) %>%
  arrange(desc(n))

```

![](images/paste-013448FC.png){width="820"}

![](images/paste-FBC78D5E.png){width="820"}
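
To make the title-splitting step easier to follow, here is a minimal sketch on two made-up titles (the `toy_titles` data frame and its values are hypothetical, not taken from my CSV). It assumes series rows look like `Show: Season: Episode`, while films have no `": "` separator. Passing `fill = "right"` is my addition here; it pads the missing pieces with `NA` without a warning.

```{r}
library(tidyr)

# Hypothetical titles, invented for illustration only
toy_titles <- data.frame(
  Title = c("Dark: Season 1: Secrets", "Bird Box"),
  Date  = c("01/03/2020", "02/03/2020")
)

# Split the title into series name, season and episode title
toy_split <- separate(
  toy_titles,
  col = Title,
  into = c("title", "types", "title_episode"),
  sep = ": ",
  fill = "right"
)

# Keep only rows that have both a season and an episode title (TV series)
toy_series <- toy_split[!is.na(toy_split$types) & !is.na(toy_split$title_episode), ]
toy_series
```

Only the series row survives the filter; the film row is dropped because its season and episode fields are `NA`.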

### Plotting Episodes Per Day

```{r}
netflix_episode_perday_plot <- ggplot(aes(x = Date, y = n, color = n), data = netflix_episode_perday) +
  geom_col(color = c("blue")) +
  theme_minimal() +
  ggtitle("Episodes watched on my Netflix per day", "History from 2020 to 2022") +
  labs(x = "Date", y = "watched episodes") 
netflix_episode_perday_plot
```

![](images/Graph%201_%20Episodes%20watched%20on%20Netflix%20per%20Day-01.PNG)

*Graph 1: Episodes watched on Netflix per day*

Graph 1 shows that the number of episodes watched per year increased gradually over time.
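
One way to check a trend like this numerically is to total the daily counts per year. The sketch below uses made-up counts (`toy_daily` and its values are hypothetical, not from my history) and assumes the same `Date` and `n` columns as the per-day table.

```{r}
library(dplyr)
library(lubridate)

# Hypothetical daily counts, invented for illustration only
toy_daily <- data.frame(
  Date = as.Date(c("2020-03-01", "2020-07-15", "2021-01-10", "2021-06-20", "2022-02-05")),
  n    = c(2, 3, 4, 5, 12)
)

# Total episodes and number of active days per calendar year
toy_yearly <- toy_daily %>%
  group_by(yr = year(Date)) %>%
  summarise(episodes = sum(n), active_days = n_distinct(Date))
toy_yearly
```

Comparing `episodes` across rows shows whether the yearly total really grows, independently of how the bars look on the plot.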

### Format into Day, month and year

```{r}
netflix_episode_perday <- netflix_episode_perday[order(netflix_episode_perday$Date),]
# week_start = 1 makes Monday = 1, matching the Mon-to-Sun labels applied later
netflix_episode_perday$diasemana <- wday(netflix_episode_perday$Date, week_start = 1)
netflix_episode_perday$diasemanaF <- weekdays(netflix_episode_perday$Date, abbreviate = TRUE)
netflix_episode_perday$mesF <- months(netflix_episode_perday$Date, abbreviate = TRUE)
```
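
A note on weekday numbering: lubridate's `wday()` counts from Sunday by default, so labelling the numbers 1 to 7 as Mon to Sun only lines up when `week_start = 1` is passed. A minimal sketch with two made-up dates (`toy_weekdays` is a hypothetical name):

```{r}
library(lubridate)

# 2020-03-02 was a Monday and 2020-03-08 was a Sunday
toy_weekdays <- as.Date(c("2020-03-02", "2020-03-08"))

# Default numbering (unless the lubridate.week.start option is set): Sunday is day 1
wday(toy_weekdays)

# Monday-first numbering: Monday is day 1, Sunday is day 7
monday_first <- wday(toy_weekdays, week_start = 1)
monday_first
```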

### Days of the week and month

```{r}
netflix_episode_perday$diasemanaF <- factor(netflix_episode_perday$diasemana,
                                            levels = rev(1:7),
                                            labels = rev(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")),
                                            ordered = TRUE)
netflix_episode_perday$mesF <- factor(month(netflix_episode_perday$Date),
                                      levels = as.character(1:12),
                                      labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
                                      ordered = TRUE)
netflix_episode_perday$añomes <- factor(as.yearmon(netflix_episode_perday$Date))
netflix_episode_perday$semana <- as.numeric(format(netflix_episode_perday$Date,"%W"))
netflix_episode_perday$semanames <- ceiling(day(netflix_episode_perday$Date) / 7)
netflix_episode_perday_calendario <- ggplot(netflix_episode_perday, aes(semanames, diasemanaF, fill = n)) + 
  geom_tile(colour = "white") + 
  facet_grid(year(netflix_episode_perday$Date) ~ mesF) + 
  scale_fill_gradient(low = "blue", high = "green") + 
  ggtitle("Episodes watched per day on Netflix", "Heatmap by day of the week, month and year") +
  labs(x = "week number", y = "Weekday") +
  labs(fill = "No.Episodes")
netflix_episode_perday_calendario
```

![](images/Graph%202_Episodes%20watched%20per%20day%20on%20Netflix%20based%20on%20week,%20month,%20and%20year-02.PNG)

*Graph 2: Episodes watched per day on Netflix based on week, month, and year*

Graph 2 shows that the highest number of episodes was watched in March 2020 compared with other months. There was a dip in March because few days of viewing data were collected for that period.
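
The horizontal position of each tile in the heatmap is the week of the month, computed as `ceiling(day / 7)`. A small sketch on made-up dates (`toy_days` is hypothetical) shows how days map to weeks 1 to 5:

```{r}
library(lubridate)

# Hypothetical dates, invented for illustration only
toy_days <- as.Date(c("2020-03-01", "2020-03-07", "2020-03-08", "2020-03-31"))

# Days 1-7 fall in week 1, days 8-14 in week 2, and so on; days 29-31 land in week 5
week_of_month <- ceiling(day(toy_days) / 7)
week_of_month
```

Note that this is a position within the month, not a calendar week: week 1 always starts on the 1st, whatever weekday that is.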

### Frequency of activity in my Netflix account per day

```{r}
view_day <- netflix_episode_perday %>%
  count(diasemanaF)
view_dayview_day_plot <- view_day %>% 
  ggplot(aes(diasemanaF, n)) +
  geom_col(fill = "orange") +
  coord_polar()  +
  theme_minimal() +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.text.x = element_text(face = "bold"),
        plot.title = element_text(size = 16, face = "bold")) +
  ggtitle("Frequency of episodes watched", "Activity by day of the week on Netflix")

view_dayview_day_plot
```

![](images/Graph%203_%20Episodes%20watched%20per%20day%20on%20Netflix%20based%20on%20week,%20month,%20and%20year-02.PNG)

*Graph 3: Frequency of episodes watched by day of the week*

Graph 3 shows that Netflix was watched on every single day of the week.
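
Because the per-day table already has one row per date, `count()` on the weekday column counts active days rather than episodes. The sketch below contrasts the two on made-up data (`toy_perday` is hypothetical); using `wt = n` to sum episodes is an alternative I am suggesting, not what the chunk above does.

```{r}
library(dplyr)

# Hypothetical per-day table: one row per date, n = episodes watched that day
toy_perday <- data.frame(
  weekday = factor(c("Mon", "Mon", "Sat"), levels = c("Mon", "Sat")),
  n       = c(1, 2, 6)
)

# Number of days with any activity, per weekday
active_days <- count(toy_perday, weekday, name = "days")

# Total episodes per weekday, weighting each row by its episode count
episodes <- count(toy_perday, weekday, wt = n, name = "episodes")

active_days
episodes
```

In this toy example Monday has more active days, but Saturday has more episodes, so the two views can rank weekdays differently.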

### Frequency of activity in my Netflix account per month

```{r}
View_month <- netflix_episode_perday %>%
  count(mesF)
View_monthView_month_plot <- View_month %>% 
  ggplot(aes(mesF, n)) +
  geom_col(fill = "blue") +
  coord_polar()  +
  theme_minimal() +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.text.x = element_text(face = "bold"),
        plot.title = element_text(size = 18, face = "bold")) +
  ggtitle("Frequency of episodes watched", "Activity per month on Netflix") 
View_monthView_month_plot
```

![](images/Graph%204_Frequency%20of%20episodes%20watched%20activity%20per%20month-02.PNG)

*Graph 4: Frequency of episodes watched activity per month*

Graph 4 shows that the episode watch rate is highest at the beginning of the year, in January, likely because of the New Year holiday period.

### Frequency of activity in my Netflix account per year

```{r}
view_year <- netflix_episode_perday %>%
  count(añomes)
view_yearview_year_plot <- view_year %>% 
  ggplot(aes(añomes, n)) +
  geom_col(fill = "blue") +
  coord_polar()  +
  theme_minimal() +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.text.x = element_text(face = "bold"),
        plot.title = element_text(size = 18, face = "bold")) +
  ggtitle("Frequency of episodes watched", "Activity by month of the year on Netflix")
view_yearview_year_plot
```

![](images/Graph%205_Rplot%20Frequency%20of%20episodes%20watched_Activity%20by%20Month%20of%20Year-02.PNG){width="800"}

*Graph 5: Frequency of episodes watched activity per month and year*

Graph 5 shows less frequent viewing in February of 2020 to 2022 and in June of 2020 and 2021.
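
The year-month grouping behind Graph 5 relies on `as.yearmon()` from zoo, which collapses each date to its month and year. A minimal sketch on made-up dates (`toy_ym_dates` is hypothetical); the `"%Y-%m"` format is my choice so the labels do not depend on the locale:

```{r}
library(zoo)

# Hypothetical dates, invented for illustration only
toy_ym_dates <- as.Date(c("2020-02-03", "2020-02-20", "2021-06-01"))

# Collapse each date to its year-month, then count dates per year-month
toy_ym <- as.yearmon(toy_ym_dates)
ym_counts <- table(format(toy_ym, "%Y-%m"))
ym_counts
```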

From the data we can see that most users love to watch Netflix regularly. This is an interesting finding, since it counters one of my previous assumptions about how people consume video content on Netflix these days.

### References

Netflix. (n.d.). Account. <https://www.netflix.com/YourAccount>
