General liquor licences and the median sale price of houses
A Chi-square test of association
April Coulter s3869101
Last updated: 25 October, 2020
Introduction
Every month, the Victorian Commission for Gambling and Liquor Regulation (VCGLR) publishes a stocktake of active liquor licences in Victoria. Licences are granted under certain categories, covering production, pre-retail, BYO, and various service and/or supply types.
In March 2020, the VCGLR recorded 23,276 active licences across Victoria.
Pubs, hotels and taverns
- Of the 23,276 active licences, almost 10%, or 2,165, were General Licence or Late Night (general) Licence types.
- According to the VCGLR, “venues such as pubs, hotels and taverns often hold a general liquor licence” (VCGLR 2020).
- The General Licence may allow the licensee to trade up to 1am, depending on the hours specified in the licence.
- The Late Night (general) Licence allows the licensee to trade after 1am.
Locations
The locations of the venues are provided in the dataset.
- What affect does the location have on the proportion of general licence holders?
- Where there are general licence holders, does the relative affordability of the area have any bearing on hours of trade?
- Are the pubs, hotel and taverns in your area more or less likely to be trading after 1am?
Problem Statement
- Is there an association between general licence category and median house price category?
Method
- Choose two sources relevant to the same time period (March 2020), for licence data and house price data.
- Convert house price values to a discretised categorical variable to handle extreme outliers.
- Take a sample of the preprocessed data, by matching the data by suburb and keeping only matching rows.
- Table the data and plot it in a barchart to see the proportions of each kind of general licence across each house price category.
- Determine a hypothesis test, where
- H0 : There is no association in the population between general licence category and median house price category.
- HA: There is an association in the population between general licence category and median house price category.
- Apply a chi-square test of association to test for a statistically significant association between general licence category and median house price category.
- Compare the observed χ2 value to the χ2crit critical value.
- Find the p-value of the sample χ2, assuming there is no association in the population.
- Reject, or fail to reject, H0.
Data
Source 1 and selected variables
Source 2 and selected variables
- Department of Environment, Land, Water and Planning (DELWP) 2020, March 2020 quarter: Houses by Suburb
- Suburb: (chr) suburb within Victoria
- Jan - Mar 19: (num) median sale price of houses within the 1st quarter of 2020 (read in as Q1 2020).
Preprocessing
- Read in Source 1 and factor variables
- Suburb as an unordered factor with 1573 levels
- Category as an unordered factor with 14 levels, representing 14 categories of licence.
- Read in Source 2 and factor variables
- Suburb as an unordered factor with 754 levels
- Filter Source 1 by Category to keep only General Licence and Late night (general) Licence factor levels.
- Check variables for standard and non-standard missing values.
- Discretise Source 2 variable Q1 2020 by equal frequency, and store as a vector. This is to convert the house price values to a categorical variable and dampen the effect of outliers. This is done prior to merging, to avoid bias. After merging, the same suburb may be represented more than once, or not at all, which would affect the frequency distribution of the values across bins.
- Column bind the “bins” vector to Source 2 variables as a new column variable.
- Rename Source 2 variables
- Q1 2020 renamed as Median House Price
- “bins” variable renamed as Price Category
(Preprocessing continued)
- Group Source 2 by Price Category and summarise by min and max values of Median House Price. This is to reveal the price ranges of the discretised values.
- Factor Price Category as an ordered factor, labeling by price ranges.
- Merge Sources 1 and 2 by Suburb using inner_join. Store as Liquor_Houses dataframe (1559 observations over 5 variables).
- Re-factor Liquor_Houses variables to remove empty levels. Variables are now:
- Licence Num: character variable of the license identification number
- Suburb: factor with 456 levels corresponding to suburbs in Victoria
- Category: factor with 2 levels; General Licence, and Late night (general) Licence
- Median House Price: numerical variable of the median sale price of houses in the first quarter of 2020, in Australian dollars.
- Price Category: Ordered factor with 9 levels corresponding to price ranges based on the equal frequency discretised values of Median House Price
Descriptive Statistics and Visualisation
The data can now be arranged in a proportional table of Price Category by licence Category
| <293k |
0.132 |
0.050 |
| 293-397k |
0.160 |
0.097 |
| 398-490k |
0.098 |
0.039 |
| 491-575k |
0.057 |
0.086 |
| 576-652k |
0.066 |
0.064 |
| 653-789k |
0.053 |
0.081 |
| 790-990k |
0.069 |
0.084 |
| 991k-1.3m |
0.146 |
0.209 |
| >1.3m |
0.218 |
0.290 |
Important Variables
- Median House Price numeric data is right-skewed with many outliers.
- Conversion to categorical by discretisation by equal frequency dampens the affect of outliers.
- The resulting Price Category variable is an intuitive way of understanding the relative affordability of an area.

- In the pre-merged data, each Price Category value captures the median sale price of houses for between 80 to 90 suburbs
- After merging with licence data, the frequency distribution of price categories has changed, as some suburbs are represented more than once in the merged data, and some not at all.

- Licence Category is a factored variable with 2 levels.
- General Licence - licensees may trade up to 1am, depending on the terms of the licence.
- Late night (general) Licence - licensees may trade after 1am, and potentially 24 hours depending on the terms of the licence.

Licence Category by Price Category
The barplot shows the proportional distribution of licences across each house price category.
In this sample, the distribution across house price categories of venues that trade up to 1am, and those that trade after 1am, are not the same.
A late night pub is more likely to be located in an expensive suburb than an inexpensive suburb.
barplot(prop_table, main = "General Licences by House Price", beside = TRUE, col = brewer.pal(9, name = "OrRd"), ylab = "Proportion within Group", ylim = c(0,0.4), legend = rownames(prop_table), args.legend = list(x = "top", title = "Median Sale Price", bty = "n", ncol = 3 )) #
grid(NA, NULL)

Hypothesis Testing
Chi-square test of association
- To test for an association between categorical variables of *Price Category and licence Category**, a chi-square test of association is used, where:
- H0 : There is no association in the population between general licence category and median house price category.
- HA: There is an association in the population between general licence category and median house price category.
##
## Pearson's Chi-squared test
##
## data: table(Liquor_Houses$`Price Category`, Liquor_Houses$Category)
## X-squared = 56.394, df = 8, p-value = 2.364e-09
The p-value reported in the test is less than 0.001.
This will be enough to reject the null hypothesis, but we need to check assumptions first.
Check assumptions of the test
As the Chi-square test of association assumes that no more than 25% of the cells have expected counts below 5, the expected counts are checked to ensure that the condition is met.
All expected counts are greater than 5.
| <293k |
136.24118 |
40.75882 |
| 293-397k |
174.72739 |
52.27261 |
| 398-490k |
101.60359 |
30.39641 |
| 491-575k |
76.20269 |
22.79731 |
| 576-652k |
78.51187 |
23.48813 |
| 653-789k |
71.58435 |
21.41565 |
| 790-990k |
86.97883 |
26.02117 |
| 991k-1.3m |
192.43105 |
57.56895 |
| >1.3m |
281.71905 |
84.28095 |
Critical value
The Chi-square statistic = 56.394, which is the degree of discrepancy between the observed and expected counts under the null hypothesis.
To compare this discrepancy value to a critical value for a 95% probability, we first need to calculate the degrees of freedom for the X2 distribution.
Degrees of freedom for the chi-square test are calcuated as:
\[df = (r - 1)(c - 1)\] where r = rows, and c = columns of the tabled data.
Therefore, the degrees of freedom for this chi-square test are:
\[df = (9 − 1)(2 − 1) = 8\]
Now we can find the one-tailed critical value, χ2crit, associated with α = 0.05 and df = 8:
## [1] 15.50731
Discussion
A Chi-square test of association was used to test for a statistically significant association between general licence category and the median house price category in Victoria.
The results of the test found a statistically significant association, χ2 = 56.394 ,p < 0.001.
The results of this study suggest that venues with a Late night (general) Licence are more likely to operate in suburbs where house prices are higher when compared to venues with a General Licence.
While the relative affordability of the area has been used as a predictor variable, there are other factors that could affect the location of late night venues. Some of these factors could be:
- population density - venues are more likely to operate longer hours where there are more customers
- crime statistics - venues may be reluctant to operate overnight in suburbs where crime is more prevalent
- employment demographic - venues may get more or less customers depending on the local employment demographic
Conclusion
- Late night (general) Licence holders are statistically more likely to operate in the most expensive suburbs.
- General licences are often held by pubs, bars and taverns.
- Therefore, you are more likely to find an all night pub in the most expensive suburbs of Victoria.
References
Data
Department of Environment, Land, Water and Planning (DELWP) 2020, March 2020 quarter: Houses by Suburb, viewed 10 October 2020, https://www.propertyandlandtitles.vic.gov.au/__data/assets/excel_doc/0035/485936/Houses_1st_Quarter_2020.xls
Victorian Commission for Gambling and Liquor Regulation (VCGLR) 2020, Current Victorian Liquor Licences as of 31/03/2020, viewed 10 October 2020, https://www.vcglr.vic.gov.au/sites/default/files/current_victorian_licences_by_location_geo-coded_march_2020.xlsx
Information
Victorian Commission for Gambling and Liquor Regulation (VCGLR) 2020, Victorian Liquor Licences by Location, viewed 20 October 2020, https://www.vcglr.vic.gov.au/resources/data-and-research/liquor-data/liquor-licences-location
Victorian Commission for Gambling and Liquor Regulation (VCGLR) 2020, Victorian Liquor Licences by Category, viewed 23 October 2020, https://www.vcglr.vic.gov.au/resources/data-and-research/liquor-data/liquor-licences-category
Victorian Commission for Gambling and Liquor Regulation (VCGLR) 2020, Liquor Licences by Licence Category, viewed 23 October 2020, https://www.vcglr.vic.gov.au/sites/default/files/07_-_vcglr_renewable_licences_historic_2.pdf
Discretize Numeric Variables n.d, viewed 23 October 2020, https://recipes.tidymodels.org/reference/discretize.html