Introduction

Every month, the Victorian Commission for Gambling and Liquor Regulation (VCGLR) publishes a stocktake of active liquor licences in Victoria. Licences are granted under certain categories, covering production, pre-retail, BYO, and various service and/or supply types.

In March 2020, the VCGLR recorded 23,276 active licences across Victoria.

Pubs, hotels and taverns

Of the 23,276 active licences, almost 10%, or 2,165, were General Licence or Late Night (general) Licence types.
According to the VCGLR, “venues such as pubs, hotels and taverns often hold a general liquor licence” (VCGLR 2020).
The General Licence may allow the licensee to trade up to 1am, depending on the hours specified in the licence.
The Late Night (general) Licence allows the licensee to trade after 1am.

Locations

The locations of the venues are provided in the dataset.

What affect does the location have on the proportion of general licence holders?
Where there are general licence holders, does the relative affordability of the area have any bearing on hours of trade?
Are the pubs, hotel and taverns in your area more or less likely to be trading after 1am?

Problem Statement

Is there an association between general licence category and median house price category?

Method

Choose two sources relevant to the same time period (March 2020), for licence data and house price data.
Convert house price values to a discretised categorical variable to handle extreme outliers.
Take a sample of the preprocessed data, by matching the data by suburb and keeping only matching rows.
Table the data and plot it in a barchart to see the proportions of each kind of general licence across each house price category.
Determine a hypothesis test, where
1. H0 : There is no association in the population between general licence category and median house price category.
2. HA: There is an association in the population between general licence category and median house price category.
Apply a chi-square test of association to test for a statistically significant association between general licence category and median house price category.
Compare the observed χ2 value to the χ2crit critical value.
Find the p-value of the sample χ2, assuming there is no association in the population.
Reject, or fail to reject, H0.

Data

Source 1 and selected variables

Victorian Commission for Gambling and Liquor Regulation (VCGLR) 2020, Current Victorian Liquor Licences as of 31/03/2020
Licence Num: (chr) license identification number
Suburb: (chr) suburb of the venue address
Category: (chr) licence category

Source 2 and selected variables

Department of Environment, Land, Water and Planning (DELWP) 2020, March 2020 quarter: Houses by Suburb
Suburb: (chr) suburb within Victoria
Jan - Mar 19: (num) median sale price of houses within the 1st quarter of 2020 (read in as Q1 2020).

Preprocessing

Read in Source 1 and factor variables
1. Suburb as an unordered factor with 1573 levels
2. Category as an unordered factor with 14 levels, representing 14 categories of licence.
Read in Source 2 and factor variables
1. Suburb as an unordered factor with 754 levels
Filter Source 1 by Category to keep only General Licence and Late night (general) Licence factor levels.
Check variables for standard and non-standard missing values.
Discretise Source 2 variable Q1 2020 by equal frequency, and store as a vector. This is to convert the house price values to a categorical variable and dampen the effect of outliers. This is done prior to merging, to avoid bias. After merging, the same suburb may be represented more than once, or not at all, which would affect the frequency distribution of the values across bins.
Column bind the “bins” vector to Source 2 variables as a new column variable.
Rename Source 2 variables
1. Q1 2020 renamed as Median House Price
2. “bins” variable renamed as Price Category

(Preprocessing continued)

Group Source 2 by Price Category and summarise by min and max values of Median House Price. This is to reveal the price ranges of the discretised values.
Factor Price Category as an ordered factor, labeling by price ranges.
Merge Sources 1 and 2 by Suburb using inner_join. Store as Liquor_Houses dataframe (1559 observations over 5 variables).
Re-factor Liquor_Houses variables to remove empty levels. Variables are now:
1. Licence Num: character variable of the license identification number
2. Suburb: factor with 456 levels corresponding to suburbs in Victoria
3. Category: factor with 2 levels; General Licence, and Late night (general) Licence
4. Median House Price: numerical variable of the median sale price of houses in the first quarter of 2020, in Australian dollars.
5. Price Category: Ordered factor with 9 levels corresponding to price ranges based on the equal frequency discretised values of Median House Price

Descriptive Statistics and Visualisation

The data can now be arranged in a proportional table of Price Category by licence Category

# table data

table <- table(Liquor_Houses$`Price Category`, Liquor_Houses$Category)
prop_table <- table %>% prop.table(margin = 2) %>% round(3)
knitr::kable(prop_table)

	General Licence	Late night (general) Licence
<293k	0.132	0.050
293-397k	0.160	0.097
398-490k	0.098	0.039
491-575k	0.057	0.086
576-652k	0.066	0.064
653-789k	0.053	0.081
790-990k	0.069	0.084
991k-1.3m	0.146	0.209
>1.3m	0.218	0.290

Important Variables

Median House Price numeric data is right-skewed with many outliers.
Conversion to categorical by discretisation by equal frequency dampens the affect of outliers.
The resulting Price Category variable is an intuitive way of understanding the relative affordability of an area.

boxplot((Houses_02$`Median House Price` / 1000000), main = "Median House Price of 754 suburbs", ylab = "Median Price in Millions AUD", col = "beige")

In the pre-merged data, each Price Category value captures the median sale price of houses for between 80 to 90 suburbs
After merging with licence data, the frequency distribution of price categories has changed, as some suburbs are represented more than once in the merged data, and some not at all.

barplot(table(Liquor_Houses$`Price Category`), horiz = TRUE,  col = "beige", ylab = "Price Category", main = "Frequency Distribution of Licences within Price Categories")

Licence Category is a factored variable with 2 levels.
1. General Licence - licensees may trade up to 1am, depending on the terms of the licence.
2. Late night (general) Licence - licensees may trade after 1am, and potentially 24 hours depending on the terms of the licence.

barplot(table(Liquor_Houses$Category), col = "beige", ylab = "Total Licences", main = "Frequency Distribution of Licences within Licence Categories")

Licence Category by Price Category

The barplot shows the proportional distribution of licences across each house price category.
In this sample, the distribution across house price categories of venues that trade up to 1am, and those that trade after 1am, are not the same.
A late night pub is more likely to be located in an expensive suburb than an inexpensive suburb.

barplot(prop_table, main = "General Licences by House Price", beside = TRUE, col = brewer.pal(9, name = "OrRd"), ylab = "Proportion within Group", ylim = c(0,0.4), legend = rownames(prop_table), args.legend = list(x = "top", title = "Median Sale Price", bty = "n", ncol = 3 ))      #                               
grid(NA, NULL)

Hypothesis Testing

Chi-square test of association

To test for an association between categorical variables of *Price Category and licence Category**, a chi-square test of association is used, where:
1. H0 : There is no association in the population between general licence category and median house price category.
2. HA: There is an association in the population between general licence category and median house price category.

# apply the test

chi <- chisq.test(table(Liquor_Houses$`Price Category`, Liquor_Houses$Category))
chi

## 
##  Pearson's Chi-squared test
## 
## data:  table(Liquor_Houses$`Price Category`, Liquor_Houses$Category)
## X-squared = 56.394, df = 8, p-value = 2.364e-09

The p-value reported in the test is less than 0.001.
This will be enough to reject the null hypothesis, but we need to check assumptions first.

Check assumptions of the test

As the Chi-square test of association assumes that no more than 25% of the cells have expected counts below 5, the expected counts are checked to ensure that the condition is met.
All expected counts are greater than 5.

# check expected counts

knitr::kable(chi$expected)

	General Licence	Late night (general) Licence
<293k	136.24118	40.75882
293-397k	174.72739	52.27261
398-490k	101.60359	30.39641
491-575k	76.20269	22.79731
576-652k	78.51187	23.48813
653-789k	71.58435	21.41565
790-990k	86.97883	26.02117
991k-1.3m	192.43105	57.56895
>1.3m	281.71905	84.28095

Critical value

The Chi-square statistic = 56.394, which is the degree of discrepancy between the observed and expected counts under the null hypothesis.
To compare this discrepancy value to a critical value for a 95% probability, we first need to calculate the degrees of freedom for the X2 distribution.

Degrees of freedom for the chi-square test are calcuated as:

\[df = (r - 1)(c - 1)\] where r = rows, and c = columns of the tabled data.

Therefore, the degrees of freedom for this chi-square test are:

\[df = (9 − 1)(2 − 1) = 8\]

Now we can find the one-tailed critical value, χ2crit, associated with α = 0.05 and df = 8:

# calculate X2 critical value

qchisq(p = 0.95,df = 8)

## [1] 15.50731

The X2 critical value is 15.50731.
As 56.394 > 15.507, the null hypothesis is rejected.

Discussion

A Chi-square test of association was used to test for a statistically significant association between general licence category and the median house price category in Victoria.
The results of the test found a statistically significant association, χ2 = 56.394 ,p < 0.001.
The results of this study suggest that venues with a Late night (general) Licence are more likely to operate in suburbs where house prices are higher when compared to venues with a General Licence.
While the relative affordability of the area has been used as a predictor variable, there are other factors that could affect the location of late night venues. Some of these factors could be:
1. population density - venues are more likely to operate longer hours where there are more customers
2. crime statistics - venues may be reluctant to operate overnight in suburbs where crime is more prevalent
3. employment demographic - venues may get more or less customers depending on the local employment demographic

Conclusion

Late night (general) Licence holders are statistically more likely to operate in the most expensive suburbs.
General licences are often held by pubs, bars and taverns.
Therefore, you are more likely to find an all night pub in the most expensive suburbs of Victoria.

References

Data

Department of Environment, Land, Water and Planning (DELWP) 2020, March 2020 quarter: Houses by Suburb, viewed 10 October 2020, https://www.propertyandlandtitles.vic.gov.au/__data/assets/excel_doc/0035/485936/Houses_1st_Quarter_2020.xls

Victorian Commission for Gambling and Liquor Regulation (VCGLR) 2020, Current Victorian Liquor Licences as of 31/03/2020, viewed 10 October 2020, https://www.vcglr.vic.gov.au/sites/default/files/current_victorian_licences_by_location_geo-coded_march_2020.xlsx

Information

Victorian Commission for Gambling and Liquor Regulation (VCGLR) 2020, Victorian Liquor Licences by Location, viewed 20 October 2020, https://www.vcglr.vic.gov.au/resources/data-and-research/liquor-data/liquor-licences-location

Victorian Commission for Gambling and Liquor Regulation (VCGLR) 2020, Victorian Liquor Licences by Category, viewed 23 October 2020, https://www.vcglr.vic.gov.au/resources/data-and-research/liquor-data/liquor-licences-category

Victorian Commission for Gambling and Liquor Regulation (VCGLR) 2020, Liquor Licences by Licence Category, viewed 23 October 2020, https://www.vcglr.vic.gov.au/sites/default/files/07_-_vcglr_renewable_licences_historic_2.pdf

Discretize Numeric Variables n.d, viewed 23 October 2020, https://recipes.tidymodels.org/reference/discretize.html

General liquor licences and the median sale price of houses

A Chi-square test of association

RPubs link information