s3679850
Last updated: 22 October, 2017
Rpubs link: http://rpubs.com/s3679850/321125
Wagering money on Electronic Gaming Machines (EGMs, or pokies) is a common form of gambling in Australia.
In the 2015/2016 financial year, over $2.6 billion was lost by gaming patrons in the state of Victoria (VCGLR, April 2017).
The VCGLR (Victorian Commission for Gambling and Liquor Regulation) is responsible for regulating gaming machine practices in Victoria.
Total expenditure (the amount of money spent, and therefore lost, by players) varies between Local Government Areas (LGAs) depending on factors such as population, mean income and number of EGMs.
This study explores the relationship between pokies expenditure (EXP) and the rate of unemployment across LGAs.
Since unemployment can result in stress and helplessness, it is expected that there will be a positive, linear relationship between unemployment rates and total pokies expenditure.
Is there a linear relationship between unemployment rates and total pokies expenditure?
Linear regression will be applied to a dataset containing rates of unemployment and amount of money spent on pokies.
Summary statistics and visualization of the variables in question will determine if any transformations need to be applied to increase symmetry or linearity.
The results of overall linear regression and subsequent testing of the slope, intercept and r value will determine if there is a statistically significant linear relationship between unemployment rates and pokies expenditure.
The dataset “Population density and gaming expenditure” contains information relating to the populations and EGM playing habits grouped by LGA.
The dataset is openly available and can be freely downloaded from the VCGLR website.
The VCGLR maintains its own expenditure data, which are sourced from regular monitoring reports from licensed venues.
The unemployment figures are credited to the Department of Education, Employment and Workplace Relations, Small Area Labour Markets.
LGA Name (LGA), EXP per adult in $AUD (EXP), and Unemployment Rate (Unemployment) were extracted from the 2015-2016 dataset for this study.
EXP refers to the amount of total expenditure per adult within the corresponding LGA, calculated as (Total Net Expenditure)/(Adult Population)
As this study explores pokies expenditure, only LGAs that have at least one EGM and generate some revenue were included.
Unemployment Rate and EXP are both continuous, numeric variables.
Values were left unchanged except for the Unemployment variable, which was converted from its original percentage value to a decimal proportion (for example, 5.3% becomes 0.053).
Before importing the dataset into R, the maximum number of decimal places displayed for both numeric variables was increased in Excel to allow more accurate testing.
Summary statistics of EXP and Unemployment are as follows:
library(dplyr) # provides %>%, summarize() and n()
SummaryEXP <- Pokies %>% summarize(
Var = "Expenditure",
Min = min(EXP, na.rm = TRUE) %>% round(3),
Q1 = quantile(EXP,probs = .25, na.rm = TRUE) %>% round(3),
Median = median(EXP, na.rm = TRUE) %>% round(3),
Q3 = quantile(EXP,probs = .75, na.rm = TRUE) %>% round(3),
Max = max(EXP, na.rm = TRUE) %>% round(3),
Mean = mean(EXP, na.rm = TRUE) %>% round(3),
SD = sd(EXP, na.rm = TRUE) %>% round(3),
n = n() %>% round(3),
Missing = sum(is.na(EXP)) %>% round(3)
)
SummaryUnemployment <- Pokies %>% summarize(
Var = "Unemployment (%)",
Min = min(Unemployment, na.rm = TRUE) %>% round(3),
Q1 = quantile(Unemployment,probs = .25, na.rm = TRUE) %>% round(3),
Median = median(Unemployment, na.rm = TRUE) %>% round(3),
Q3 = quantile(Unemployment,probs = .75, na.rm = TRUE) %>% round(3),
Max = max(Unemployment, na.rm = TRUE) %>% round(3),
Mean = mean(Unemployment, na.rm = TRUE) %>% round(3),
SD = sd(Unemployment, na.rm = TRUE) %>% round(3),
n = n() %>% round(3),
Missing = sum(is.na(Unemployment)) %>% round(3)
)
Summary <- rbind(SummaryEXP, SummaryUnemployment)
knitr::kable(Summary)
| Var | Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
|---|---|---|---|---|---|---|---|---|---|
| Expenditure | 62.495 | 270.075 | 542.935 | 655.772 | 974.525 | 483.796 | 215.688 | 70 | 0 |
| Unemployment (%) | 0.021 | 0.042 | 0.053 | 0.065 | 0.124 | 0.056 | 0.021 | 70 | 0 |
Each LGA had data for both variables so there were no missing values to account for.
par(mfrow=c(1,2))
boxplot(Pokies$EXP,
names=c("Pokies Expenditure"),
col=c("lightslateblue"),
outcol="Blue",
xlab="Pokies",
ylab="Player Loss per Adult ($AUD)",
main="Pokies Expenditure")
boxplot(Pokies$Unemployment,
names=c("Unemployment Rate (%)"),
col=c("goldenrod3"),
outcol="Blue",
xlab="Unemployment",
ylab="Unemployment Rate (%)",
main="Unemployment")
The boxplots show no outliers for the EXP variable.
Unemployment showed a small number of outliers, reflecting particularly high unemployment rates in certain areas. Because the corresponding EXP values were not extreme, these LGAs were not removed for this study.
Distribution of Unemployment is slightly right/positively skewed.
Square Root transformation was applied to make the data more symmetrical.
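One way to check whether a square-root transformation actually improves symmetry is to compare the sample skewness before and after. The sketch below uses simulated right-skewed proportions as a stand-in, not the VCGLR figures, and the `skewness()` helper and `unemployment_demo` vector are hypothetical names introduced only for illustration.

```r
# Hypothetical stand-in data: 70 right-skewed proportions (NOT the VCGLR figures)
set.seed(1)
unemployment_demo <- rgamma(70, shape = 7, rate = 125)

# Moment-based sample skewness: mean cubed deviation divided by the
# 1.5th power of the mean squared deviation
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skew_raw  <- skewness(unemployment_demo)
skew_sqrt <- skewness(sqrt(unemployment_demo))

# A value closer to 0 indicates a more symmetrical distribution
c(raw = skew_raw, sqrt = skew_sqrt)
```

The same comparison could be run on `Pokies$Unemployment` to support the visual impression given by the histograms.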
library(ggplot2) # provides qplot()
library(gridExtra) # provides grid.arrange()
UnempOrig <- qplot(Pokies$Unemployment,
geom = "histogram",
bins = 40,
main = "Histogram of Unemployment Rate",
xlab = "Unemployment Rate (%)",
ylab = "Count",
col = I("darkgoldenrod4"),
fill = I("goldenrod3"))
UnempSqrt <- qplot(sqrt(Pokies$Unemployment),
geom = "histogram",
bins = 40,
main = "Histogram of sqrt(Unemployment Rate)",
xlab = "sqrt(Unemployment Rate (%))",
ylab = "Count",
col = I("darkgoldenrod4"),
fill = I("goldenrod3"))
grid.arrange(UnempOrig, UnempSqrt, ncol=2)
The distribution of Expenditure is not perfectly symmetrical; most notably, there is a spike just to the right of the middle.
Transforming the data did not increase the linearity with unemployment or the symmetry of the distribution so EXP was left in its original form.
qplot(Pokies$EXP,
geom = "histogram",
bins = 40,
main = "Histogram of Pokies Expenditure",
xlab = "Pokies Expenditure per Adult ($AUD)",
ylab = "Count",
col = I("slateblue4"),
fill = I("lightslateblue"))
Linear regression was performed on the dataset to determine whether Pokies Expenditure per adult can be predicted from the Unemployment Rate of an LGA.
The nature of the data collection supports the independence of the dataset, as EXP is sampled from each licensed venue in every LGA.
A scatterplot of the EXP and Unemployment variables shows some linearity between them, as Pokies Expenditure appears to increase as the Unemployment Rate rises.
Linear regression will illustrate if this relationship is significant.
plot(EXP ~ sqrt(Unemployment),
data=Pokies,
main="Unemployment Rate vs Pokies Expenditure by Local Government Area",
sub="as at June 2016",
xlab="sqrt(Unemployment Rate (%))",
ylab="Pokies Expenditure per Adult ($)")
Using the simple linear regression equation: \[ y = \alpha + \beta x + \epsilon \]
where \(y\) = EXP and \(x\) = sqrt(Unemployment).
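For reference, the least-squares estimates of \(\alpha\) and \(\beta\) in this model can be written using corrected sums of squares and cross-products. The symbols \(a\), \(b\), \(L_{xx}\) and \(L_{xy}\) below follow a common textbook convention and are assumed notation rather than something the text defines:

\[ L_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}, \qquad L_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} \]

\[ b = \frac{L_{xy}}{L_{xx}}, \qquad a = \bar{y} - b\,\bar{x} \]

Here \(b\) estimates the slope \(\beta\) and \(a\) estimates the intercept \(\alpha\), with \(\bar{x}\) and \(\bar{y}\) the sample means of sqrt(Unemployment) and EXP.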
A line of best fit was applied to the plot:
# Sums needed for the least-squares estimates, with x = sqrt(Unemployment) and y = EXP
sum_x <- sum(sqrt(Pokies$Unemployment))
sum_y <- sum(Pokies$EXP)
sum_x_sq <- sum(sqrt(Pokies$Unemployment)^2)
sum_y_sq <- sum(Pokies$EXP^2)
sum_xy <- sum(sqrt(Pokies$Unemployment)*Pokies$EXP)
n <- length(Pokies$Unemployment)
# Corrected sums of squares and cross-products
Lxx <- sum_x_sq-((sum_x^2)/n)
Lyy <- sum_y_sq-((sum_y^2)/n)
Lxy <- sum_xy - (((sum_x)*(sum_y))/n)
# Slope (b) and intercept (a) of the line of best fit
b <- Lxy/Lxx
a <- mean(Pokies$EXP) - b*mean(sqrt(Pokies$Unemployment))
plot(EXP ~ sqrt(Unemployment),
data=Pokies,
main="Unemployment Rate vs Pokies Expenditure by Local Government Area",
sub="as at June 2016",
xlab="sqrt(Unemployment Rate (%))",
ylab="Pokies Expenditure per Adult ($)")
abline(a = a, b = b, col= "red")
An F-test of the overall linear regression model was conducted using the \(\alpha= 0.05\) level of significance:
lmModel <- lm(EXP ~ sqrt(Unemployment), data=Pokies)
lmModel %>% summary()
##
## Call:
## lm(formula = EXP ~ sqrt(Unemployment), data = Pokies)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -393.16 -114.75  -12.21  141.09  350.13
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)          -201.9      119.8  -1.686   0.0964 .
## sqrt(Unemployment)   2941.5      505.6   5.818 1.76e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 177.5 on 68 degrees of freedom
## Multiple R-squared: 0.3323, Adjusted R-squared: 0.3225
## F-statistic: 33.85 on 1 and 68 DF, p-value: 1.765e-07
The \(F\)-test was statistically significant, \(F(1, 68) = 33.85\), \(p < .001\) (exact \(p = 1.765 \times 10^{-7}\)), so the null hypothesis of no linear relationship was rejected and the model coefficients can be interpreted.
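As a consistency check on the reported output, in a simple linear regression with one predictor the \(F\) statistic equals the square of the slope's \(t\) statistic, and it can also be recovered from \(R^2\). Using the rounded values printed in the summary:

\[ F = t^2 = 5.818^2 \approx 33.85, \qquad F = \frac{R^2 / 1}{(1 - R^2)/(n - 2)} = \frac{0.3323}{0.6677 / 68} \approx 33.84 \]

The small difference between the two figures comes from rounding \(R^2\) to four decimal places.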
Line of Best Fit: \(EXP = -201.9+2,941.5 sqrt(Unemployment)\)
When \(sqrt(Unemployment) = 0\), \(EXP = -201.9\)
Intercept \(\alpha = -201.9\)
As \(sqrt(Unemployment)\) increases by \(1\), \(EXP\) changes, on average, by \(2,941.5\)
Slope \(\beta = 2,941.5\)
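Because the predictor is on the square-root scale, it can help to see what the fitted line implies for unemployment rates expressed as ordinary proportions. The sketch below plugs the rounded coefficients from the summary into the equation; `predict_exp()` is a hypothetical helper written for illustration, not part of the original analysis.

```r
# Coefficients copied from the fitted model summary (rounded)
alpha <- -201.9   # intercept
beta  <- 2941.5   # slope on sqrt(Unemployment)

# Hypothetical helper: predicted EXP ($AUD per adult) for an unemployment
# rate given as a proportion (e.g. 0.05 for 5%)
predict_exp <- function(unemployment) {
  alpha + beta * sqrt(unemployment)
}

# Predictions at 4%, 5% and 6% unemployment
predict_exp(c(0.04, 0.05, 0.06))
```

Under these coefficients, moving from 4% to 5% unemployment raises the predicted EXP by roughly $69, while moving from 5% to 6% raises it by roughly $63, so the effect of one percentage point is not constant on the original scale.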
plot(EXP ~ sqrt(Unemployment),
data=Pokies,
main="Unemployment Rate vs Pokies Expenditure by Local Government Area",
sub="as at June 2016",
xlab="sqrt(Unemployment Rate (%))",
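xlim=c(0,0.36), # starts at 0 to show the intercept; upper limit covers the largest sqrt(Unemployment), about 0.35
ylim=c(-210,1000), # upper limit covers the largest EXP, about 975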
ylab="Pokies Expenditure per Adult ($)")
abline(a = a, b = b, col= "red")
abline(v = 0, col="blue")
Hypotheses for model parameters were tested:
lmModel %>% confint()
##                        2.5 %     97.5 %
## (Intercept)        -440.8428   37.08287
## sqrt(Unemployment) 1932.5894 3950.41354
\(95\)% CI for \(\alpha\) = \([-440.84, 37.08]\), which includes 0, so the intercept is not statistically significantly different from 0.
\(95\)% CI for \(\beta\) = \([1932.59, 3950.41]\), which does not include 0, so the slope is statistically significant.
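These intervals follow the usual form of estimate plus or minus a critical \(t\) value times the standard error. As a rough sketch, they can be approximately reproduced from the rounded figures in the regression summary; small discrepancies against `confint()` are expected because the inputs here are rounded.

```r
# Estimates and standard errors copied from the regression summary (rounded)
est <- c(intercept = -201.9, slope = 2941.5)
se  <- c(intercept = 119.8,  slope = 505.6)
df  <- 68                       # residual degrees of freedom (n - 2)

t_crit <- qt(0.975, df)         # two-sided 95% critical value
ci <- cbind(lower = est - t_crit * se,
            upper = est + t_crit * se)
round(ci, 2)
```

An interval that straddles 0, as the intercept's does, corresponds to a non-significant two-sided test at the 0.05 level.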
Residuals vs Fitted: Trend line is generally flat, deviating slightly at the left and right-most ends.
Normal Q-Q: Although not perfectly fitting, most residuals fall close to the Normal line.
Scale-Location (Homoscedasticity): Values 51, 16 and 12 cause a peak in the red line; however, it stays consistently between 0.5 and 1.0.
Residuals vs Leverage: The Cook's distance bands are barely visible except at the bottom right, and every value falls within the 0.5 band.
par(mfrow = c(2, 2))
lmModel %>% plot(which=1)
lmModel %>% plot(which=2)
lmModel %>% plot(which=3)
lmModel %>% plot(which=5)
\(R^2 = 0.3323\)
Pearson Correlation Coefficient \(r = 0.5765\), \(95\)% \(CI = [0.3950, 0.7147]\)
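A confidence interval for a correlation is commonly built with the Fisher z-transformation. The sketch below applies that approach to the reported \(r\) and \(n\); it is an assumption, not something stated in the source, that `CIr()` works the same way, although the result agrees with the interval above to about four decimal places. The names `r_reported` and `n_lga` are introduced only for this illustration.

```r
r_reported <- 0.5764829   # correlation reported in the output
n_lga      <- 70          # number of LGAs in the dataset

z      <- atanh(r_reported)      # Fisher z-transform of r
se_z   <- 1 / sqrt(n_lga - 3)    # approximate standard error of z
z_crit <- qnorm(0.975)           # two-sided 95% critical value

# Build the interval on the z scale, then back-transform to the r scale
ci_r <- tanh(z + c(-1, 1) * z_crit * se_z)
round(ci_r, 4)
```

The interval is asymmetric around \(r\) because the back-transformation compresses values as they approach 1.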
r <- cor(Pokies$EXP, sqrt(Pokies$Unemployment),
use = "complete.obs")
r
## [1] 0.5764829
library(psychometric)
CIr(r, n, level = .95)
## [1] 0.3950173 0.7146513
detach("package:psychometric", unload=TRUE)
A linear regression model was fitted to predict the dependent variable, EXP, using the square root of the Unemployment rate as the predictor.
A scatterplot visualised a potential positive linear relationship between the two variables prior to fitting the regression model.
After ruling out non-linear trends, the regression model was applied.
The overall model was statistically significant, \(F(1, 68) = 33.85\), \(p < .001\), and explained 33% of the variability in EXP, \(R^2 = 0.3323\).
The correlation was \(r = 0.5765\), \(95\)% \(CI = [0.3950, 0.7147]\), indicating a moderate, positive linear relationship.
Inspection of the residuals supported normality and homoscedasticity.
The regression equation was estimated to be \(EXP = -201.9+2,941.5 sqrt(Unemployment)\).
The intercept, \(\alpha = -201.9\), \(95\)% CI: \([-440.84, 37.08]\), was not statistically significant.
The slope, \(\beta = 2,941.5\), \(95\)% CI: \([1932.59, 3950.41]\), was statistically significant.
The dataset created by the VCGLR provides accurate data about Gaming Expenditure since all licensed venues report their figures daily.
Because the Australian Bureau of Statistics (ABS) also groups data by LGA, future studies could combine this dataset with other area-level factors to better understand what contributes to Pokies Expenditure.
Different states have different rules that govern gaming, so limiting the study to Victorian areas may not capture the full picture of people's spending habits on the pokies.
\(n = 70\) was large enough to conduct this study, but a larger sample size would allow for more precise estimates.
Future investigations may find it beneficial to explore player loss in every state, and to repeat the study at different times of the year to determine whether people's spending habits change with the season.
Despite the results of the intercept parameter test, the results of the Linear Regression showed a statistically significant, moderate, positive linear relationship between Unemployment Rates and Pokies Expenditure in Victoria during the 2015/2016 financial year.
Population Density and Gaming Expenditure, VCGLR April 2017 - https://www.vcglr.vic.gov.au/resources/data-and-research/gambling-data/population-density-and-gaming-expenditure