Description

The dataset lists Premium Netflix subscribers, with each user's monthly revenue, age, gender, country, and more.

The main dataset used was found on Kaggle: https://www.kaggle.com/datasets/arnavsmayan/netflix-userbase-dataset

Importing the Dataset

I imported the CSV file into RStudio and created vectors for Gender, Subscription Type, and Age.

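library(readr)

# Read the Premium-user CSV (local path on the author's machine)
PremiumNetflixUsers <- read_csv("C:/Users/melig/Downloads/NetflixDataset/PremiumNetflixUsers.csv")

# Pull single columns out as vectors; backticks are only needed for names with spaces
Gender <- PremiumNetflixUsers$Gender
SubType <- PremiumNetflixUsers$`Subscription Type`
age <- PremiumNetflixUsers$Age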
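Since every later step assumes that each row is a Premium subscriber, it can be worth confirming that the Subscription Type column really contains only "Premium". The sketch below assumes that label is spelled exactly "Premium". The helper name `all_premium` and the example vector are hypothetical. On the real data you would pass the `SubType` vector created above.

```r
# Hypothetical helper: TRUE only if every value is exactly "Premium" (NA counts as a failure)
all_premium <- function(sub_type) {
  all(!is.na(sub_type) & sub_type == "Premium")
}

# Made-up values standing in for SubType; not taken from the Kaggle data
example_subtype <- c("Premium", "Premium", "Premium")

table(example_subtype, useNA = "ifany")  # count of each subscription type, including NA
all_premium(example_subtype)
```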

Histogram

A histogram of the ages of Premium users shows which ages are most common among Premium Netflix subscribers.

# breaks = 25 is a suggestion; R may pick a nearby number of bins
hist(age, main = "Which ages subscribe most to Premium Netflix?", xlab = "Age", col = "blue", breaks = 25)

The tallest bars are at about ages 26, 41, and 47, so these are the most common ages among Premium subscribers. Overall, Premium subscribers appear across the whole age range, so people of all ages seem willing to subscribe to Premium Netflix.
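A histogram groups ages into bins, so its peaks describe ranges of ages rather than exact ages. One way to check which exact ages occur most often is to count them with base R's `table()`. The helper `top_ages` and the example vector below are hypothetical and only demonstrate the calls. On the real data, `top_ages(age)` would use the `age` vector from the import step.

```r
# Hypothetical helper: the k most frequent values in x, with their counts
top_ages <- function(x, k = 3) {
  counts <- table(x)                         # how many users have each distinct age
  counts <- sort(counts, decreasing = TRUE)  # most frequent ages first
  head(counts, k)
}

# Made-up ages, only to illustrate the output format
example_age <- c(26, 41, 47, 26, 33, 41, 26, 47, 41, 26)
top_ages(example_age)
```

When several ages tie, their order in the result depends on how `sort()` handles the ties, so it is worth printing a few more than three values.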

Boxplot

I used country and monthly revenue to see the typical monthly revenue for each country, which also shows roughly how much subscribers in each country usually pay per month. A boxplot fits this because it shows the median, quartiles, and spread of revenue for every country side by side.

boxplot(MonthlyRev ~ country, data = PremiumNetflixUsers, xlab = "Country", ylab = "Monthly Revenue", col = "pink")

Italy has the highest monthly revenue: most of its values fall between 13 and 15, and its median is 15, so at least half of Italy's Premium subscribers pay 15 or more per month. Germany has the lowest revenue of the 8 countries, with a typical monthly revenue of 11, which is surprising. Spain and the United States have very similar boxplots, each with a median of 13, Q1 at 11, and Q3 at 14.
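Reading medians and quartiles off a plot by eye is approximate. Calling `boxplot()` with `plot = FALSE` returns the exact numbers it would draw. The hinges it reports come from `fivenum()`, or Tukey's hinges, and these can differ slightly from the quartiles that `quantile()` gives. The toy data frame below is made up. On the real data, the same call would take `MonthlyRev ~ country` and `PremiumNetflixUsers`, assuming those column names, which match the boxplot call above.

```r
# Made-up revenue values for two countries; not from the Kaggle data
toy <- data.frame(
  country    = c("Italy", "Italy", "Italy", "Germany", "Germany", "Germany"),
  MonthlyRev = c(13, 15, 15, 10, 11, 12)
)

# plot = FALSE returns the statistics instead of drawing the plot
b <- boxplot(MonthlyRev ~ country, data = toy, plot = FALSE)

# One column per country, one row per value drawn in the box
box_stats <- b$stats
colnames(box_stats) <- b$names
rownames(box_stats) <- c("lower whisker", "Q1 (lower hinge)", "median",
                         "Q3 (upper hinge)", "upper whisker")
box_stats
```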

Statistical Test

I want to see whether living in a more developed country affects whether people buy a Premium subscription, or whether Premium is purchased regardless of country. According to the UN, Australia is the most developed and Brazil the least developed of the countries in this dataset. I expect Australia to have more Premium subscriptions because it is one of the top-ranked developed countries.

First, I filtered the data down to the Australia rows and the Brazil rows:

library(dplyr)  # provides filter()

AU <- filter(PremiumNetflixUsers, country == "Australia")
BRA <- filter(PremiumNetflixUsers, country == "Brazil")
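Because the hypothesis concerns how many Premium subscriptions each country has, the raw counts are a useful companion to the test. The sketch below counts rows per country with base R's `table()`. The toy data frame is made up. On the real data, `table(PremiumNetflixUsers$country)` would do the same, as would comparing `nrow(AU)` with `nrow(BRA)`.

```r
# Made-up rows standing in for PremiumNetflixUsers; not the Kaggle data
toy <- data.frame(
  country = c("Australia", "Australia", "Brazil", "Brazil", "Brazil", "Spain")
)

# Number of Premium subscribers per country
counts <- table(toy$country)
counts

# Only the two countries being compared
counts[c("Australia", "Brazil")]
```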

Then I used a two-sample t-test to test my hypothesis.

Two-sample t-test

Result:

data: AU and BRA
t = 0.74813, df = 10.794, p-value = 0.4704
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -19259.51  39024.71
sample estimates:
mean of x  mean of y
  14597.3     4714.7

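I expected Australia to have more than Brazil. However, the p-value of 0.4704 is well above the usual 0.05 cutoff, and the 95% confidence interval for the difference (-19259.51 to 39024.71) includes 0. The test therefore finds no statistically significant difference between Australia and Brazil. This does not prove that the two are equal. It does mean the data give no evidence that the more developed country has more Premium subscriptions, since Brazil (the least developed country) is not significantly different from Australia.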
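The exact `t.test()` call behind the output above is not shown, and neither is the column that produced the two means. The sketch below therefore uses hypothetical vectors `x` and `y` as stand-ins for a numeric column of `AU` and `BRA`. It shows the Welch test and how to read the result at a 0.05 significance level. By default, `t.test()` uses `var.equal = FALSE`, which is Welch's test. That is why the reported df (10.794) is not a whole number.

```r
# Made-up values; not the Kaggle data, and the real column used is not stated
x <- c(10, 12, 14, 16)   # hypothetical Australia values
y <- c(11, 13, 15, 9)    # hypothetical Brazil values

# Welch two-sample t-test (var.equal = FALSE is the default)
res <- t.test(x, y)

alpha <- 0.05
res$p.value   # chance of a t statistic at least this extreme if the true means were equal
res$conf.int  # 95% confidence interval for mean(x) - mean(y)

# A p-value above alpha means "no significant difference", not "equal"
if (res$p.value < alpha) "significant difference" else "no significant difference"
```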