(Playing along at home? Download the .Rmd for a list of libraries you’ll need!)

1 Background

In February of 2017, the Super Bowl took place in Houston, Texas, for the first time since 2004. While the event is ostensibly about football, cities often plan for years ahead of time before hosting the Big Game in order to effectively handle the influx of sports fans, celebrities, major media companies, high rollers, and haggard townies looking to party in the national spotlight.

The TABC (Texas Alcoholic Beverage Commission) Mixed Beverage Tax is a fixed tax charged on alcoholic beverages sold by a holder of a Mixed Beverage License in the state of Texas. There exists a record of how much every license holder pays per month, and since the tax is fixed, you can calculate the revenue upon which the tax was paid very simply. This data is also public, which is wonderful. With it, you can do all sorts of things, including, I hope, estimate the effect of hosting the Super Bowl on bar sales. For this, we’ll need a big set of data in addition to just February 2017’s sales, let’s start with 6 years and see how it goes.

If you’re inclined, you can get the data for yourself here: * https://www.comptroller.texas.gov/transparency/open-data/search-datasets/

Or, you can pull it from a few related projects on my github, http://github.com/ianwells.

Let’s get started! What does the data look like?

2 Data

tabc <- read.csv('tabc.csv')
tabc$date <- as.Date(tabc$date)
tabc$rev <- as.numeric(as.character(tabc$rev))
tabc$yoy_rev <- as.numeric(as.character(tabc$yoy_rev))
tabc$mom_rev <- as.numeric(as.character(tabc$mom_rev))
tabc$lat <- as.numeric(as.character(tabc$lat))
tabc$lng <- as.numeric(as.character(tabc$lng))

tabc$rev[tabc$rev == 0]<-NA
tabc$yoy_rev[tabc$yoy_rev == 0]<-NA
tabc$mom_rev[tabc$mom_rev == 0]<-NA

tabc$yoy_diff <- tabc$rev - tabc$yoy_rev
tabc$mom_diff <- tabc$rev - tabc$mom_rev

tabc.sb <- filter(tabc, date == as.Date('2017-02-01'))
kable(head(sample_n(tabc,6)))

Here we have revenues for a few random bars and months within the data. We also do a couple special things - we convert the address on file for the license holder to a latitude-longitude so that we can map it, and we also estimate the neighborhood the bar is in by seeing which neighborhood-polygon (generously provided by http://zetashapes.com/) it lies in.

To show you what this data looks like for a particular bar, let’s pick on Poison Girl, an excellent bar in the Montrose neighborhood. Great bourbon selection.

ggplot(data = filter(tabc,name == 'POISON GIRL'), mapping = aes(x=date,y=rev)) + geom_line(color='RED')

Not bad! Poison girl, which opened well before 2011 by the way, saw increasing revenues over much of the last six years, but didn’t see much of a change during the Super Bowl.

Let’s look at somebody a little more Super Bowl-y, like say, The Four Seasons Hotel, a prestigious accommodation in downtown Houston.

ggplot(data = filter(tabc,name == 'SCLV HOUSTON BEVERAGE LLC'), mapping = aes(x=date,y=rev)) + geom_line(color = 'RED')

That’s what a bump in sales from the Super Bowl looks like. You’ll notice that sometimes the name in the TABC records doesn’t exactly match the name of the business - you can bust out a map and see where the address of record is and find a more familiar name. Trust me that ‘SCLV HOUSTON BEVERAGE LLC’ is the Four Seasons.

Alright, let’s do some math.

3 Analysis

So I’d like to answer a couple questions here:

How much extra bar money came into the city for the Super Bowl? Where was the most extra money spent? *What portion of the extra money went to the little guy?

My approach is going to look like this:

Make a model for bar sales, per bar. Examine the year-over-year, month-over-month, and vs-model change in revenues for the whole city for February. Figure who captured that change in revenues, and make some plots Interpret to see what kind of establishment made the most extra money

Here’s what bar sales in Houston look like:

tabc.m <- group_by(tabc,date)
tabc.m.s <- summarize(tabc.m,sum(rev,na.rm = TRUE))
names(tabc.m.s) <- c('date','rev')

ggplot(data = tabc.m.s, mapping = aes(x=date,y=rev)) + geom_line(color = 'RED')

Just looking at total bar sales for the whole city, you probably wouldn’t notice anything special about February 2017. What we can do though, is see about forecasting sales for February 2017, and if the real sales are different, we can try to attribute that to the Super Bowl.

One way to forecast sales is to just take a comparable month (say from a year ago, to accommodate seasonal changes month-to-month) and call it your forecast sales. For a mature bar, you might reason, February this year should look the same as February last year, if you assume nothing has changed. If you know that one thing is different between the two months, like say the Super Bowl happened, then maybe you can attribute any difference solely to the influence of the Super Bowl.

This is, of course, a poor way to predict sales. A lot of things can change in a year, not to mention a bar might not have even been open in February 2016. Even if you look at the whole city’s sales things can change a lot year-over-year, Super Bowls notwithstanding.

Another way to do it is just by looking at month-over-month sales, but that only works if you can expect every month to sort of look the same as a baseline. This of course isn’t true for every month, since bar sales are very seasonal, with major drinking holidays or recurring events causing predictable spikes in sales. Some months are more similar than others though - you could do worse than comparing your Februaries to last August, for example, but this really depends on the bar.

A better way would be to use some sort of model to forecast sales for a given month. We’ve used ETS models before, which take into account seasonal effects as well as long-term trends in data to come up with a fairly solid prediction for sales. We can argue that if our sales don’t match the model, it’s due to some exogenous thing that occurred that month, which of course would be the Super Bowl.

A disadvantage there is that we need a few months data to get a good model built that way, so we won’t be able to use it for very new bars. For those, we’ll use either year-over-year or month-over-month numbers.

4 Finding Excess Sales

Alright. So how much extra money got spent in February 2017, vs February 2016?

tabc.m.s[74,] - tabc.m.s[73,] #12.1 million extra over Jan 2017
##      date      rev
## 1 31 days 12120668
tabc.m.s[74,] - tabc.m.s[62,] #9.1 million extra over Feb 2016
##       date     rev
## 1 366 days 9096839
tabc.m.s.pre <- filter(tabc.m.s,date <= '2017-1-01')
tabc.m.s.post <- filter(tabc.m.s,date >= '2017-2-01') #this is just to illustrate, it can be off by one month

tabc.m.ts <- ts(tabc.m.s.pre$rev, start = c(2011, 1), frequency = 12)
tabc.m.post.ts <- ts(tabc.m.s.post$rev, start = c(2017, 2), frequency = 12)


plot(stl(tabc.m.ts, s.window="periodic"))

Took a quick break to show you the periodic components in the revenue signal. Check out the broad upward trend, in addition to the seasonal peaks. Now let’s make the model.

tabc.ets <- ets(tabc.m.ts, model = 'ZZZ')
forecast.ets <- predict(tabc.ets, n.ahead = 1, prediction.interval = T, level = 0.9)

plot(forecast.ets); points(tabc.m.post.ts,col='red',lwd = 2)

The full forecast, let’s zoom in:

plot(forecast.ets,xlim=c(2016,2017.5)); points(tabc.m.post.ts,col='red',lwd = 2)

That red circle is the real sales figure for Feburary 2017, the blue line is the prediction not including the Super Bowl, and the grey bars are the level that the model is 90% sure the prediction fits in. Notice how the red circle sits confidently above the grey bars - we’re more than 90% sure the real Feburary 2017 is a statistical outlier, a good indication that the Super Bowl did impact bar sales. Can we get a number for that?

effect.ets <- tabc.m.post.ts - forecast.ets$mean
effect.ets
##          Feb
## 2017 8965273

There you go, about 8.9 million extra dollars, according to our ETS model. A little less than the year-over-year difference, probably due to a slight upward trend in bar sales over that time frame. Also worth point out is that ETS tells us there’s a bit of a range on that number - a few million in either direction.

So, where did it go? Where did that money get spent? Let’s see if we can do that again, per bar. We’ll calculate a composite difference - we’ll use the ETS model if we can, followed by the year-over-year difference if we have it, resorting to month-over-month as a last resort. We’ll call that number - the extra cash a bar made during February 2017 - the Composite Difference, or composite_diff. The raw revenue for that month we’ll call rev.

bars <- unique(tabc.sb$name) #fun fact: 1814 unique licenseholders in Houston during the SB
bars.effect <- data.frame(matrix(nrow=length(bars),ncol=2))
names(bars.effect) <- c('name','model_diff')

for (i in 1:length(bars)) 
  {
  this_bar <- bars[i]
  tabc.b <- filter(tabc, name == this_bar)
  tabc.b.pre <- filter(tabc.b,date <= '2017-1-01')
  tabc.b.post <- filter(tabc.b,date >= '2017-2-01')

  tryCatch({
    tabc.b.ts <- ts(tabc.b.pre$rev, start = c(2011, 1), frequency = 12)
    tabc.b.post.ts <- ts(tabc.b.post$rev, start = c(2017, 2), frequency = 12)

    tabc.b.ets <- ets(tabc.b.ts, model = 'ZZZ')
    forecast.b <- predict(tabc.b.ets, n.ahead = 1, prediction.interval = T, level = 0.9)
    data.frame(forecast.b)
    effect <- tabc.b.post.ts - forecast.b$mean
    bars.effect[i,]<-c(as.character(this_bar),effect[1])
  },warning=function(w){bars.effect[i,]<-c(as.character(this_bar),NA)
  },error=function(e) {bars.effect[i,]<-c(as.character(this_bar),NA)
  })
}

bars.effect$model_diff <- as.numeric(bars.effect$model_diff)

tabc.sb <- left_join(tabc.sb, bars.effect,by= "name")
tabc.sb$composite_diff <- coalesce(tabc.sb$model_diff,tabc.sb$yoy_diff,tabc.sb$mom_diff) 

tabc.sb.o <- arrange(tabc.sb, desc(composite_diff))

ggplot(tabc.sb.o, aes(composite_diff)) + geom_density() + xlim(-10000,250000)

That composite difference is pretty spread out - this distribution has a long, long tail. If you saw an extra $25,000 during the Super Bowl, you did better than nearly 95% of the bars in Houston.

ggmap(get_map(location = "houston", zoom = 12)) + geom_density2d(data = filter(tabc.sb, composite_diff > 100000), 
    aes(x = lng, y = lat), size = 0.2) + stat_density2d(data = filter(tabc.sb, composite_diff > 100000), 
    aes(x = lng, y = lat, fill = ..level.., alpha = 0.01), size = 0.01, 
    bins = 16, geom = "polygon") +
    scale_alpha(range = c(0, 0.3), guide = FALSE)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=houston&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=houston&sensor=false

This is where the big money went, geographically speaking - no surprises here, those are also major concentrations of bars and fun stuff.

Let’s see who those bars are - the fortunate few who really raked in the cash.

5 Results

kable(select(head(tabc.sb.o,100),name,composite_diff,rev))
name composite_diff rev
ARAMARK SPORTS & ENTERTAINMENT 951025.97 1001301.94
UNDERDOGS SPORTS PUB 700337.95 742274.93
ARAMARK SPORTS & ENTERTAINMENT 449250.00 548942.99
B & B BUTCHERS 434117.01 738415.97
HILTON AMERICAS - HOUSTON 385909.42 1169312.99
ROSE CLUB 312567.02 657964.93
WESTIN HOTEL MEMORIAL CITY 297004.03 393892.99
ARAMARK SPORTS AND ENTERTAINME 291438.06 422144.93
SCLV HOUSTON BEVERAGE LLC 235174.03 556938.96
TREASURES 230877.85 486988.96
AXELRAD BEER GARDEN 219447.01 364145.97
ART OF THE SPIRITS INC 218657.01 263915.97
LITTLE WOODROW’S KIRBY ICE HOU 193071.94 515144.93
TMC BEVERAGES OF HOUSTON INC 179075.79 499580.90
MARRIOTT MARQUIS HOUSTON 174584.03 812937.91
PAPPAS BROS GRILL 171441.94 533973.88
THE GROVE 171288.91 408612.99
DEL FRISCO’S DOUBLE EAGLE STEA 168806.66 434622.99
IRISH COWBOY 167694.92 352933.88
XOCHI 162161.94 183064.93
CONCRETE COWBOY HOUSTON 160934.93 324324.93
HOTEL ZA ZA 156826.72 511943.88
CHILI’S CHI01 140140.00 283930.90
1720 MAIN RECEPTION HALL 135830.00 241861.94
MCINTYRE’S 135652.98 309495.97
PAPPAS BROS STEAKHOUSE 131270.34 565896.87
REPUTATION 124888.06 266774.93
PROSPECT PARK 123094.03 299701.94
PAPPADEAUX SEAFOOD KITCHEN 117768.95 139617.91
AURA DYNAMIC NIGHTLIFE 117498.06 277852.99
PHM BEVERAGE INC 113472.07 224438.96
THE DUNLAVY 113324.92 196317.91
VIC & ANTHONY’S STEAKHOUSE 109285.43 514420.90
STEAK 45 105828.96 726250.90
LE GRAND COMPTOIR 103507.02 224490.90
MR PEEPLES RESTAURANT & ENTERT 102487.01 149207.91
ROYAL SONESTA HOTEL HOUSTON 97656.73 233208.96
BOOTS ’N SHOOTS 91041.94 257427.91
GROTTO RISTORANTE 84134.03 220160.90
PHILIPPE RESTAURANT AND LOUNGE 83557.02 180704.93
806 HOSPITALITY SERVICES, LLC 82701.94 225927.91
CHI CHI’S RICHMOND 80311.05 300948.96
MAJORCA BISTRO & TAPAS 71708.96 239070.90
THE REMINGTON GRILL 70218.87 203718.96
LITTLE WOODROW’S EAST END 69510.01 171952.99
STEREO LIVE 68374.88 207901.94
MORTON’S OF CHICAGO 63875.03 221483.88
THE PUB GALLERIA 63358.95 77081.94
HUNGRY’S CAFE & BISTRO 62912.99 79300.90
BRASSERIE DUE PARC 62697.01 65303.88
WAREHOUSE LIVE 59934.85 151886.87
LITTLE WOODROW’S IN MIDTOWN 59291.05 483138.96
EDDIE V’S #8509 59235.13 204746.87
LUCKY’S PUB 58694.03 177800.90
MO’S IRISH PUB 57720.00 158700.90
DHR WEST LOOP BEVCO, INC. 55960.67 159640.00
THE FLYING SAUCER DRAUGHT EMPO 54334.86 227605.97
BELVEDERE 52343.46 160552.99
CAFE TOULOUSE 51562.09 155962.99
PAPPASITOS CANTINA 51417.01 226785.97
TRULUCK’S STEAK & STONE CRAB 51279.62 299780.90
CLUB ONYX HOUSTON 49315.61 123537.91
THE ALLEY KAT 48401.94 95110.90
MILE HIGH GENTLEMEN’S CLUB 48103.13 69470.00
THE ASTORIAN 48102.09 83628.96
BB’S CAFE 47470.90 49510.90
BAYOU CITY EVENT CENTER 44665.97 64271.94
THE BELLTOWER ON 34TH 44285.36 89450.90
SC BEVERAGE COMPANY 44045.97 124880.90
THE MOONSHINERS 43759.10 119200.00
LE COLONIAL RESTAURANT 42762.09 371495.97
MCGONIGEL’S MUCKY DUCK 40462.99 50288.96
SALC, INC. 40352.67 147032.99
DAILY GRILL 40053.37 110275.97
HAROLD’S 39800.00 70992.99
PISTOLEROS 39465.97 90324.93
THE OCEANAIRE SEAFOOD ROOM 39106.53 147540.00
CATTLEMAN’S CLUB/CACTUS JACKS’ 38803.91 134993.88
THE COMMONER/THE BOULEVARDIER 38508.06 109968.96
FOUR FAMILIES OF HOUSTON 38025.95 720426.87
SWAGGER 37536.51 123244.93
SULLIVANS STEAKHOUSE 37164.02 110120.90
OGDEN’S POUR SOCIETY 36498.06 96540.00
VALENCIA ORANGES IV, LLC 36038.06 156194.93
CENTERFOLDS 35812.75 224201.94
BRENNANS OF HOUSTON INC 34881.15 319962.99
MOLLY’S PUB 34775.97 74310.90
BAR 5015 34603.12 112605.97
THE FEDERAL GRILL 34548.06 96340.00
HYATT REGENCY HOUSTON GALLERIA 33801.94 74437.91
SKYBOX SPORTS BAR 33728.95 139721.94
ELEVATED EVENTS LLC 33472.99 84732.99
MAGNOLIA SPIRITS HOUSTON INC 33452.57 98560.90
THE CAPITAL GRILLE #8007 33234.03 181072.99
BELGIUM BEER BAR BEL01 33094.03 44788.96
THE DUCK OFF SPORTS BAR 32629.10 38000.00
GATEWAY BEVERAGES, INC. 32267.91 89236.87
CASA BLANCA 31955.07 37351.94
YARD HOUSE 31809.80 362070.00
HOSPITALITY INTERNATIONAL INC 31625.08 79742.99

And there you have it! The top 100 money-making liquor license holders in Houston during the Super Bowl.

Aramark is the company that has the catering and liquor license for NRG Stadium and a lot of the other official Super Bowl events around town. Underdogs Sports Pub is a small local bar that somehow handled the liquor for Club Nomadic, a traveling superclub that hosted a Taylor Swift concert during Super Bowl weekend as well as a number of other high-profile events. Rose Club is otherwise known in town as Clé. SCLV Houston Beverage LLC is the Four Seasons Hotel. Treasures is a gentleman’s club that Drake sings about sometimes (TMC Beverages of Houston is also known as The Men’s Club). I can’t explain how a Chili’s got on the list, but that Chili’s must have been pretty lit.

Let’s use this to answer that question we had - how much of this extra money went to big companies, which make money on the Superbowl every year? We’ll focus on the top 20 bars.

top20 <- tabc.sb.o[1:20,]
top20$local <- TRUE

corps <- c(1,2,3,5,7,8,9,15)

top20$local[corps] <- FALSE

plot <- ggplot(top20, aes(x="", y=composite_diff, fill=local)) +
geom_bar(width = 1, stat = "identity") + 
  coord_polar("y", start=0)

plot

corps.made <- sum(tabc.sb.o[corps,12]) #3486243
all.made <- sum(tabc.sb.o[1:20,12]) #6139314

So for the Top 20 bars, which made about 6 million of the extra dough, 8 of them were non-local and they made about 55% of the excess revenues.

Fun! That’s all we’re going to do today.

6 Disclaimers

6.1 TABC

The Comptroller of Public Accounts cannot vouch for the data or analysis derived from data after it has been retrieved from the Comptroller’s Web site.

Some numbers in the R may vary slightly from the numbers in the text, as some models are rolled at render-time and may involve some randomness.

7 License

The MIT License (MIT)

Copyright (c) 2017 Ian Wells

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.