During the 2000s, Sunday nights hosted one of the most popular television shows of all time, HBO’s The Sopranos. Breaking records for premium networks, The Sopranos influenced many of the shows that followed it. Using data from the Internet Movie Database (IMDb), I will run a linear regression of each Season 3 episode's IMDb rating on the number of U.S. viewers that episode received.

The variable views holds the U.S. viewership, in millions, for each of the 13 episodes of Season 3. The variable rank holds the IMDb user rating (on a 1 to 10 scale) that each episode received.

The goal is to see whether rating and viewership are aligned, that is, whether episodes that more people watched also tend to be rated higher (or lower).
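One simple way to put a number on "aligned" is Pearson's correlation between the two series. The sketch below uses base R only. It re-enters the Season 3 values listed further down so that it runs on its own. With a single predictor, the squared correlation equals the regression's Multiple R-squared. The p-value from cor.test() also matches the p-value for the slope in lm(rank ~ views).

```r
# Season 3 data, same values as used in the analysis below
views <- c(11.26, 11.35, 8.37, 7.96, 7.40, 8.44, 9.21, 8.60, 8.64, 8.44, 8.79, 5.81, 9.46)
rank  <- c(8.7, 8.7, 8.6, 9.0, 8.7, 8.7, 8.6, 8.5, 8.4, 8.6, 9.7, 9.1, 9.0)

# Pearson correlation plus a test of H0: true correlation = 0
ct <- cor.test(views, rank, method = "pearson")
r  <- unname(ct$estimate)

r           # sign and strength of the linear association
r^2         # same value as the regression's Multiple R-squared
ct$p.value  # same value as the slope's p-value in lm(rank ~ views)
```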

#U.S. Viewers (millions)
(views <- c(11.26,11.35,8.37,7.96,7.40,8.44,9.21,8.60,8.64,8.44,8.79,5.81,9.46))
##  [1] 11.26 11.35  8.37  7.96  7.40  8.44  9.21  8.60  8.64  8.44  8.79
## [12]  5.81  9.46
mean(views) #average viewership
## [1] 8.748462
sd(views) #standard deviation
## [1] 1.450252
#IMDb rating (1 to 10 scale)
(rank <- c(8.7,8.7,8.6,9.0,8.7,8.7,8.6,8.5,8.4,8.6,9.7,9.1,9.0))
##  [1] 8.7 8.7 8.6 9.0 8.7 8.7 8.6 8.5 8.4 8.6 9.7 9.1 9.0
mean(rank) #average episode ranking
## [1] 8.792308
sd(rank) #standard deviation
## [1] 0.3402488
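Note that sd() returns the sample standard deviation, which divides by n - 1 rather than n. The check below recomputes both standard deviations by hand. It uses a hypothetical helper, sample_sd(), written only for illustration, and re-enters the data so it runs on its own.

```r
views <- c(11.26, 11.35, 8.37, 7.96, 7.40, 8.44, 9.21, 8.60, 8.64, 8.44, 8.79, 5.81, 9.46)
rank  <- c(8.7, 8.7, 8.6, 9.0, 8.7, 8.7, 8.6, 8.5, 8.4, 8.6, 9.7, 9.1, 9.0)

# Hypothetical helper: sample standard deviation sqrt( sum((x - mean(x))^2) / (n - 1) )
sample_sd <- function(x) sqrt(sum((x - mean(x))^2) / (length(x) - 1))

sample_sd(views)  # should agree with sd(views)
sample_sd(rank)   # should agree with sd(rank)
```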
cat("Sopranos Season 3","\n")
## Sopranos Season 3
(s3 <- data.frame(views,rank))
##    views rank
## 1  11.26  8.7
## 2  11.35  8.7
## 3   8.37  8.6
## 4   7.96  9.0
## 5   7.40  8.7
## 6   8.44  8.7
## 7   9.21  8.6
## 8   8.60  8.5
## 9   8.64  8.4
## 10  8.44  8.6
## 11  8.79  9.7
## 12  5.81  9.1
## 13  9.46  9.0
#Fit a simple linear model: IMDb rating as a function of viewership
s <- lm(rank ~ views, data = s3)
summary(s)
## 
## Call:
## lm(formula = rank ~ views, data = s3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3969 -0.2054 -0.1054  0.1742  0.9095 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.16395    0.61630  14.869 1.25e-08 ***
## views       -0.04248    0.06957  -0.611    0.554    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3495 on 11 degrees of freedom
## Multiple R-squared:  0.03278,    Adjusted R-squared:  -0.05514 
## F-statistic: 0.3729 on 1 and 11 DF,  p-value: 0.5539
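With one predictor, the least-squares estimates have a closed form. The slope is the centered cross-product of views and rank divided by the centered sum of squares of views. The intercept is mean(rank) minus the slope times mean(views). The sketch below recomputes the Estimate column and the R-squared by hand in base R. It is meant as a sanity check, not a replacement for lm().

```r
views <- c(11.26, 11.35, 8.37, 7.96, 7.40, 8.44, 9.21, 8.60, 8.64, 8.44, 8.79, 5.81, 9.46)
rank  <- c(8.7, 8.7, 8.6, 9.0, 8.7, 8.7, 8.6, 8.5, 8.4, 8.6, 9.7, 9.1, 9.0)

# Centered cross-product of x and y, and centered sum of squares of x
sxy <- sum((views - mean(views)) * (rank - mean(rank)))
sxx <- sum((views - mean(views))^2)

b1 <- sxy / sxx                      # slope
b0 <- mean(rank) - b1 * mean(views)  # intercept

# Share of the variation in ratings explained by the fitted line
r2 <- 1 - sum((rank - (b0 + b1 * views))^2) / sum((rank - mean(rank))^2)

c(intercept = b0, slope = b1, r_squared = r2)
```

A slope of about -0.04 means each additional million viewers goes with a rating roughly 0.04 points lower. That effect is smaller than its own standard error of 0.070.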
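#Scatterplot of rating against viewership, with the fitted regression line
plot(rank ~ views, data = s3, xlab = "U.S. viewers (millions)", ylab = "IMDb rating")
abline(s)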
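The residual summary shows a maximum of 0.91, much larger in magnitude than the other quartiles. The sketch below finds which episode produces that residual. It then refits without that point to see how sensitive the slope is. Dropping the point is only a diagnostic here, not a recommended final model.

```r
# Season 3 data, same values as above
views <- c(11.26, 11.35, 8.37, 7.96, 7.40, 8.44, 9.21, 8.60, 8.64, 8.44, 8.79, 5.81, 9.46)
rank  <- c(8.7, 8.7, 8.6, 9.0, 8.7, 8.7, 8.6, 8.5, 8.4, 8.6, 9.7, 9.1, 9.0)
s3 <- data.frame(views, rank)
s  <- lm(rank ~ views, data = s3)

# Residual = observed rating minus the rating predicted by the line
res   <- resid(s)
worst <- which.max(abs(res))  # row with the largest absolute residual
s3[worst, ]
res[worst]

# Refit without that episode and compare the coefficients side by side
s_drop <- lm(rank ~ views, data = s3[-worst, ])
rbind(all = coef(s), without_outlier = coef(s_drop))
```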

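Based on the linear regression, viewership does not appear to be related to an episode's rating. The estimated slope is small and negative (-0.042 rating points per additional million viewers) with a p-value of 0.554. Viewership explains only about 3% of the variation in ratings (R-squared = 0.033). The highest-rated episode (9.7) drew between 8 and 9 million viewers. The two episodes that passed 11 million viewers were both rated 8.7, below 8.8. With only 13 episodes, this shows that no strong linear relationship is evident, not that none exists.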
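The claims in the last paragraph can be checked directly from the two vectors. Below is a minimal base R sketch. It assumes episode numbers are simply positions in the vectors, listed in airing order as in the data frame above.

```r
views <- c(11.26, 11.35, 8.37, 7.96, 7.40, 8.44, 9.21, 8.60, 8.64, 8.44, 8.79, 5.81, 9.46)
rank  <- c(8.7, 8.7, 8.6, 9.0, 8.7, 8.7, 8.6, 8.5, 8.4, 8.6, 9.7, 9.1, 9.0)

# Highest-rated episode and how many people watched it
top <- which.max(rank)
c(episode = top, rating = rank[top], views = views[top])

# Ratings of the episodes watched by more than 11 million people
rank[views > 11]
all(rank[views > 11] < 8.8)

# For contrast: the least-watched episode and its rating
low <- which.min(views)
c(episode = low, rating = rank[low], views = views[low])
```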