1 Neilsen rating: TV dataset

The TV dataset is a large sample of TV viewers whose behavior had been recorded for the Neilsen ratings. This data set contains sample television audience data from Neilsen Media Research for the week starting November 6, 1995.

1.1 TV data three way table

Collapse the TV data to a 5x3 two-way table (ignore the effect of time) whereas the rows are the days of the week and the columns are networks.

data("TV", package="vcdExtra")
tv.df <- as.data.frame.table(TV)
tv.f <- tv.df[, c(1,3,4)]
tv.data <- group_by(tv.f, Day, Network)
tv.data1 <- dplyr::summarise(tv.data, count=sum(Freq))
tv.table <- dcast(tv.data1, Day ~ Network, value.var="count")
tv.table

##         Day  ABC  CBS  NBC
## 1    Monday 2847 2923 2629
## 2   Tuesday 3110 2403 2568
## 3 Wednesday 2434 1283 2212
## 4  Thursday 1766 1335 5886
## 5    Friday 2737 1479 1998

test1 <- polr(Network ~ Day, weight=Freq, data=tv.df)
summary(test1)

## 
## Re-fitting to get Hessian

## Call:
## polr(formula = Network ~ Day, data = tv.df, weights = Freq)
## 
## Coefficients:
##                 Value Std. Error t value
## DayTuesday   -0.08849    0.02814  -3.145
## DayWednesday -0.03144    0.03127  -1.006
## DayThursday   1.16947    0.02952  39.618
## DayFriday    -0.20969    0.03073  -6.824
## 
## Intercepts:
##         Value    Std. Error t value 
## ABC|CBS  -0.4935   0.0201   -24.5290
## CBS|NBC   0.5985   0.0202    29.6220
## 
## Residual Deviance: 78431.92 
## AIC: 78443.92

1.1.1 Main Effect model

Fit the main-effects model, Newtork ~ Day+ Time, with multinom(), Note that you will have to supply the weights argument because each row of TV.df represents the number of viewers in the Freq variable.

1.1.1.1 multinom

test <- multinom(Network ~ Day +Time, weight=Freq, data=tv.df)

## # weights:  48 (30 variable)
## initial  value 41318.808177 
## iter  10 value 38935.947713
## iter  20 value 38818.222728
## iter  30 value 38756.956301
## final  value 38752.186202 
## converged

summary(test)

## Call:
## multinom(formula = Network ~ Day + Time, data = tv.df, weights = Freq)
## 
## Coefficients:
##     (Intercept) DayTuesday DayWednesday DayThursday  DayFriday   Time8:15
## CBS 0.264060911 -0.2839729  -0.66995721  -0.3236498 -0.6728270 0.07101115
## NBC 0.005020656 -0.1156021  -0.01887989   1.2724501 -0.2496648 0.03790697
##        Time8:30    Time8:45   Time9:00   Time9:15   Time9:30   Time9:45
## CBS 0.076997762 -0.29429040 -0.4875213 -0.3664838 -0.4409308 -0.2440312
## NBC 0.006827139 -0.09320663 -0.2106409 -0.2609636 -0.3416990 -0.1617936
##       Time10:00  Time10:15  Time10:30
## CBS -0.29940481 -0.2677118 -0.2239947
## NBC -0.02609353  0.0566143  0.1358038
## 
## Std. Errors:
##     (Intercept) DayTuesday DayWednesday DayThursday  DayFriday   Time8:15
## CBS  0.04983795 0.03797920   0.04355769  0.04501328 0.04192254 0.06398208
## NBC  0.04832104 0.03808235   0.04004444  0.03845629 0.04015187 0.06059674
##       Time8:30   Time8:45   Time9:00   Time9:15   Time9:30   Time9:45
## CBS 0.06204563 0.06958338 0.06154875 0.06229549 0.06164592 0.06774630
## NBC 0.05898492 0.06314121 0.05595830 0.05782973 0.05724928 0.06371518
##      Time10:00  Time10:15  Time10:30
## CBS 0.06242284 0.06296253 0.06277900
## NBC 0.05700440 0.05711720 0.05687154
## 
## Residual Deviance: 77504.37 
## AIC: 77564.37

1.1.1.2 Polr(level)

#level the time
tv.df1 <- tv.df
levels(tv.df1$Time) <- c(rep("8:00-8:59",5),rep("9:00-9:59",4), rep("10:00-10:44",3))

tv.day.logit <- polr(Network ~ Day+ Time, weight=Freq, data=tv.df1)
summary(tv.day.logit)

## 
## Re-fitting to get Hessian

## Call:
## polr(formula = Network ~ Day + Time, data = tv.df1, weights = Freq)
## 
## Coefficients:
##                    Value Std. Error t value
## DayTuesday      -0.09104    0.02816  -3.233
## DayWednesday    -0.03337    0.03128  -1.067
## DayThursday      1.16384    0.02954  39.395
## DayFriday       -0.21310    0.03076  -6.928
## Time9:00-9:59   -0.09659    0.02187  -4.417
## Time10:00-10:44  0.14328    0.02631   5.446
## 
## Intercepts:
##         Value    Std. Error t value 
## ABC|CBS  -0.5030   0.0230   -21.8660
## CBS|NBC   0.5907   0.0231    25.6235
## 
## Residual Deviance: 78354.19 
## AIC: 78370.19

1.1.2 Effects plot

Prepare an effects plots for the fitted probabilities in this model

full.plot <- cbind(tv.df1, predict(tv.day.logit, type="probs"))
head(full.plot)

##         Day      Time Network Freq       ABC       CBS       NBC
## 1    Monday 8:00-8:59     ABC  146 0.3768401 0.2666802 0.3564797
## 2   Tuesday 8:00-8:59     ABC  244 0.3984462 0.2656752 0.3358786
## 3 Wednesday 8:00-8:59     ABC  233 0.3847089 0.2664302 0.3488609
## 4  Thursday 8:00-8:59     ABC  174 0.1588482 0.2016588 0.6394930
## 5    Friday 8:00-8:59     ABC  294 0.4280322 0.2627491 0.3092187
## 6    Monday 8:00-8:59     ABC  151 0.3768401 0.2666802 0.3564797

plotdat <- melt(full.plot, id.vars=c("Network", "Day", "Time","Freq"),
                measure.vars=c("ABC", "CBS","NBC") , variable.names="Time", value="Probability")
head(plotdat)

##   Network       Day      Time Freq variable     value
## 1     ABC    Monday 8:00-8:59  146      ABC 0.3768401
## 2     ABC   Tuesday 8:00-8:59  244      ABC 0.3984462
## 3     ABC Wednesday 8:00-8:59  233      ABC 0.3847089
## 4     ABC  Thursday 8:00-8:59  174      ABC 0.1588482
## 5     ABC    Friday 8:00-8:59  294      ABC 0.4280322
## 6     ABC    Monday 8:00-8:59  151      ABC 0.3768401

# the value is from 0 to 1000, i only took 100 to 500 for better visualization.
gg <- ggplot(plotdat, aes(x=Freq, y=value, colour=variable)) +geom_line (size=2.5) + theme_bw() +
      xlim(80,500)+ geom_point(color="black", size=1.5) +facet_grid(Network~Day) +labs (title="TV stations view Probability Graph")
direct.label(gg)

plot(Effect("Day", tv.day.logit))

## 
## Re-fitting to get Hessian

1.1.3 Interpret result

Interpret these results in comparison to the correspondence analysis in part (a)

From the summary we can see detailed description such as, * one- unit increase in DayTUesday will decrease the log odds of cbs vs abc 0.28

one- unit increase in DayWednesday will decrease the log odds of cbs vs abc 0.66 …..

in general, we can see cbs is lower than abc and nbc in Monday, Tuesday, Wednsday and Friday. abc has exceptionally higher view’s on Friday. and on Thursday Nbc dominates the chanel, and ABC has the littlest viewers.

anly 545_hw7

Fangya Tan

March 26, 2018