Introduction

Wagering money on Electronic Gaming Machines (EGMs, or "pokies") is a common form of gambling in Australia.

In the 2015/2016 financial year, gaming patrons in the state of Victoria lost over $2.6 billion (VCGLR, April 2017).

The Victorian Commission for Gambling and Liquor Regulation (VCGLR) is responsible for regulating gaming machine practices in Victoria.

Total expenditure (the amount of money lost by players) varies between Local Government Areas (LGAs) depending on factors such as population, mean income and number of EGMs.

This study explores the relationship between pokies expenditure (EXP) and the rate of unemployment across LGAs.

Since unemployment can result in stress and helplessness, a positive, linear relationship is expected between unemployment rates and pokies expenditure per adult.

Problem Statement

Is there a linear relationship between unemployment rates and pokies expenditure per adult?

Linear regression will be applied to a dataset containing the unemployment rate and the amount of money spent on pokies per adult in each LGA.

Summary statistics and visualisations of the variables in question will determine whether any transformations need to be applied to increase symmetry or linearity.

The overall regression $F$-test, followed by tests of the intercept, the slope and the correlation coefficient $r$, will determine whether there is a statistically significant linear relationship between unemployment rates and pokies expenditure.

Data

The dataset “Population density and gaming expenditure” contains information relating to the populations and EGM playing habits grouped by LGA.

The dataset is openly available and can be freely downloaded from the VCGLR website.

The VCGLR maintains its own expenditure figures, which are sourced from regular monitoring reports from licensed venues.

Unemployment figures are credited to the Department of Education, Employment and Workplace Relations, Small Area Labour Markets.

Data Cont.

LGA Name (LGA), EXP per adult in $AUD (EXP), and Unemployment Rate (Unemployment) were extracted from the 2015-2016 dataset for this study.

EXP is the total net expenditure per adult within the corresponding LGA, calculated as (Total Net Expenditure)/(Adult Population).

As this study explores pokies expenditure, only LGAs that have at least 1 EGM and generate some revenue were included.

Unemployment Rate and EXP are both continuous, numeric variables.

Values were left unchanged except for the Unemployment variable, which was converted from its original percentage value to a decimal proportion (for example, 5.3% became 0.053).

Before the dataset was imported into R, the maximum number of decimal places displayed for both numeric variables was increased in Excel to allow more accurate testing.
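The preparation steps described in this section can be sketched in R as follows. This is a minimal illustration on made-up numbers, not the study's actual import code: the data frame `raw` and its column names (`NetExpenditure`, `AdultPopulation`, `EGMs`, `UnemploymentPct`) are hypothetical stand-ins for the fields in the VCGLR spreadsheet.

```r
# Hypothetical toy data: column names and values are assumptions for illustration only
raw <- data.frame(
  LGA             = c("Area A", "Area B", "Area C"),
  NetExpenditure  = c(50000000, 12000000, 0),  # total net expenditure ($AUD)
  AdultPopulation = c(100000, 40000, 8000),
  EGMs            = c(600, 150, 0),            # number of gaming machines
  UnemploymentPct = c(6.5, 4.2, 3.1)           # unemployment rate in percent
)

# Keep only LGAs with at least one EGM and some revenue
kept <- raw[raw$EGMs >= 1 & raw$NetExpenditure > 0, ]

# Expenditure per adult, and unemployment as a decimal proportion
toy <- data.frame(
  LGA          = kept$LGA,
  EXP          = kept$NetExpenditure / kept$AdultPopulation,
  Unemployment = kept$UnemploymentPct / 100
)
toy
```

In this toy example "Area C" is dropped because it has no machines and no revenue, mirroring the inclusion rule stated above.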

Descriptive Statistics

Summary statistics of EXP and Unemployment are as follows:

library(dplyr)  # provides %>% and summarize()

SummaryEXP <- Pokies %>% summarize(
                Var = "Expenditure",
                Min = min(EXP, na.rm = TRUE) %>% round(3),
                Q1 = quantile(EXP,probs = .25, na.rm = TRUE) %>% round(3),
                Median = median(EXP, na.rm = TRUE) %>% round(3),
                Q3 = quantile(EXP,probs = .75, na.rm = TRUE) %>% round(3),
                Max = max(EXP, na.rm = TRUE) %>% round(3),
                Mean = mean(EXP, na.rm = TRUE) %>% round(3),
                SD = sd(EXP, na.rm = TRUE) %>% round(3),
                n = n() %>% round(3),
                Missing = sum(is.na(EXP)) %>% round(3)
              ) 

SummaryUnemployment <- Pokies %>% summarize(
                Var = "Unemployment (%)",
                Min = min(Unemployment, na.rm = TRUE) %>% round(3),
                Q1 = quantile(Unemployment,probs = .25, na.rm = TRUE) %>% round(3),
                Median = median(Unemployment, na.rm = TRUE) %>% round(3),
                Q3 = quantile(Unemployment,probs = .75, na.rm = TRUE) %>% round(3),
                Max = max(Unemployment, na.rm = TRUE) %>% round(3),
                Mean = mean(Unemployment, na.rm = TRUE) %>% round(3),
                SD = sd(Unemployment, na.rm = TRUE) %>% round(3),
                n = n() %>% round(3),
                Missing = sum(is.na(Unemployment)) %>% round(3)
              ) 

Summary <- rbind(SummaryEXP, SummaryUnemployment)
knitr::kable(Summary)

Var	Min	Q1	Median	Q3	Max	Mean	SD	n	Missing
Expenditure	62.495	270.075	542.935	655.772	974.525	483.796	215.688	70	0
Unemployment (%)	0.021	0.042	0.053	0.065	0.124	0.056	0.021	70	0

Each LGA had data for both variables so there were no missing values to account for.

Descriptive Statistics Cont.

par(mfrow=c(1,2))  
boxplot(Pokies$EXP,  
        names=c("Pokies Expenditure"),
        col=c("lightslateblue"),
        outcol="Blue",
        xlab="Pokies",
        ylab="Player Loss per Adult ($AUD)",
        main="Pokies Expenditure")
boxplot(Pokies$Unemployment,  
        names=c("Unemployment Rate (%)"),
        col=c("goldenrod3"),
        outcol="Blue",
        xlab="Unemployment",
        ylab="Unemployment Rate (%)",
        main="Unemployment")

The boxplots show no outliers for the EXP variable.

Unemployment showed a small number of outliers, reflecting particularly high unemployment rates in certain areas. Because these LGAs did not have correspondingly extreme EXP values, they were not removed for this study.
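The boxplot findings can be cross-checked against the summary table using the usual 1.5 × IQR fence rule. The sketch below uses only the rounded quartiles reported in the table; `boxplot()` itself works from hinges, which can differ slightly from `quantile()` values, so this is an approximate check rather than a reproduction of the plot.

```r
# Quartiles taken from the summary table above (rounded to 3 decimals)
q <- data.frame(
  Var = c("EXP", "Unemployment"),
  Min = c(62.495, 0.021),
  Q1  = c(270.075, 0.042),
  Q3  = c(655.772, 0.065),
  Max = c(974.525, 0.124)
)

iqr <- q$Q3 - q$Q1
q$LowerFence <- q$Q1 - 1.5 * iqr
q$UpperFence <- q$Q3 + 1.5 * iqr

# TRUE if the minimum or maximum lies beyond a fence
q$HasOutliers <- q$Min < q$LowerFence | q$Max > q$UpperFence
q[, c("Var", "LowerFence", "UpperFence", "HasOutliers")]
```

The upper fence for Unemployment is 0.0995, below the maximum of 0.124, while both EXP extremes fall inside their fences, which agrees with the boxplots.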

Descriptive Statistics Cont.

The distribution of Unemployment is slightly right (positively) skewed.

A square-root transformation was applied to make the distribution more symmetrical.

library(ggplot2)    # qplot()
library(gridExtra)  # grid.arrange()

# Histogram of the untransformed unemployment rate
UnempOrig <- qplot(Pokies$Unemployment,
      geom = "histogram",
      bins = 40,
      main = "Histogram of Unemployment Rate",
      xlab = "Unemployment Rate (%)",
      ylab = "Count",
      col = I("darkgoldenrod4"),
      fill = I("goldenrod3")) 
# Histogram after the square-root transformation
UnempSqrt <- qplot(sqrt(Pokies$Unemployment),
      geom = "histogram",
      bins = 40,
      main = "Histogram of sqrt(Unemployment Rate)",
      xlab = "sqrt(Unemployment Rate (%))",
      ylab = "Count",
      col = I("darkgoldenrod4"),
      fill = I("goldenrod3"))

# Show both histograms side by side
grid.arrange(UnempOrig, UnempSqrt, ncol=2)
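The effect of the square-root transformation on skewness can also be checked numerically rather than by eye. The sketch below is only an illustration: the helper `skewness()` is a hypothetical moment-based function written here (base R has none), and the vector `u` is made-up right-skewed data, not the study's unemployment figures.

```r
# Moment-based sample skewness (hypothetical helper; base R has no built-in)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Made-up right-skewed proportions, NOT the study data
u <- c(0.02, 0.03, 0.04, 0.04, 0.05, 0.05, 0.06, 0.07, 0.09, 0.12)

skew_raw  <- skewness(u)
skew_sqrt <- skewness(sqrt(u))
c(raw = skew_raw, sqrt = skew_sqrt)
```

A positive value indicates right skew; the value for `sqrt(u)` is smaller than for `u`, which is the behaviour the transformation relies on.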

Descriptive Statistics Cont.

The distribution of Expenditure is not perfectly symmetrical, most notably because of the spike just to the right of the middle.

Transforming EXP did not improve either its linearity with Unemployment or the symmetry of its distribution, so EXP was left in its original form.

qplot(Pokies$EXP,
      geom = "histogram",
      bins = 40,
      main = "Histogram of Pokies Expenditure",
      xlab = "Pokies Expenditure per Adult ($AUD)",
      ylab = "Count",
      col = I("slateblue4"),
      fill = I("lightslateblue"))

Hypothesis Testing

Linear regression was performed on the dataset to determine whether Pokies Expenditure per adult can be predicted from the Unemployment Rate of an LGA.

The nature of the data collection supports the independence of the observations, as EXP is sampled from each licensed venue in every LGA.

A scatterplot of EXP against sqrt(Unemployment) shows some linearity: Pokies Expenditure appears to increase as the Unemployment Rate increases.

Linear regression will show whether this relationship is statistically significant.

plot(EXP ~ sqrt(Unemployment), 
     data=Pokies,
     main="Unemployment Rate vs Pokies Expenditure by Local Government Area",
     sub="as at June 2016",
     xlab="sqrt(Unemployment Rate (%))",
     ylab="Pokies Expenditure per Adult ($)")

Linear Regression

Using the simple linear regression equation: \[ y = \alpha + \beta x + \epsilon \]

where $y = EXP$ and $x = \sqrt{Unemployment}$
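For reference, the standard least-squares estimates $a$ and $b$ of $\alpha$ and $\beta$, which the R code in this section computes step by step, can be written as follows. The symbols $L_{xx}$ and $L_{xy}$ follow the variable names used in that code; $\bar{x}$ and $\bar{y}$ denote sample means.

```latex
\[
L_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}, \qquad
L_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}
\]
\[
b = \frac{L_{xy}}{L_{xx}}, \qquad a = \bar{y} - b\,\bar{x}
\]
```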

A line of best fit was applied to the plot:

# x = sqrt(Unemployment), y = EXP
sum_x <- sum(sqrt(Pokies$Unemployment))
sum_y <- sum(Pokies$EXP)
sum_x_sq <- sum(sqrt(Pokies$Unemployment)^2)  # sum of x^2: square after taking the root
sum_y_sq <- sum(Pokies$EXP^2)
sum_xy <- sum(sqrt(Pokies$Unemployment)*Pokies$EXP)
n <- length(sqrt(Pokies$Unemployment))
# Corrected sums of squares and cross-products
Lxx <- sum_x_sq-((sum_x^2)/n)
Lyy <- sum_y_sq-((sum_y^2)/n)
Lxy <- sum_xy - (((sum_x)*(sum_y))/n)
b <- Lxy/Lxx                                             # slope
a <- mean(Pokies$EXP) - b*mean(sqrt(Pokies$Unemployment))  # intercept: ybar - b*xbar

plot(EXP ~ sqrt(Unemployment), 
     data=Pokies,
     main="Unemployment Rate vs Pokies Expenditure by Local Government Area",
     sub="as at June 2016",
     xlab="sqrt(Unemployment Rate (%))",
     ylab="Pokies Expenditure per Adult ($)")
abline(a = a, b = b, col= "red")
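As a sanity check on the manual calculation, the same formulas can be compared with `lm()` on a small example. The data below are made up for illustration and are not the study data; the `_toy` suffixes are used so that the objects `a`, `b` and `n` defined above are not overwritten.

```r
# Made-up data for illustration only (not the study data)
x_toy <- sqrt(c(0.03, 0.04, 0.05, 0.06, 0.08, 0.10))
y_toy <- c(250, 380, 420, 560, 610, 700)
n_toy <- length(x_toy)

# Corrected sums of squares and cross-products
Lxx_toy <- sum(x_toy^2) - sum(x_toy)^2 / n_toy
Lxy_toy <- sum(x_toy * y_toy) - sum(x_toy) * sum(y_toy) / n_toy

b_toy <- Lxy_toy / Lxx_toy                   # slope
a_toy <- mean(y_toy) - b_toy * mean(x_toy)   # intercept

# lm() should return the same estimates
fit_toy <- lm(y_toy ~ x_toy)
rbind(manual = c(a_toy, b_toy), lm = unname(coef(fit_toy)))
```

Both rows of the result should agree up to floating-point error, which is why the manual line and the `lm()` fit used later describe the same model.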

Linear Regression - F-Test

An $F$-test was used to test the hypotheses for the overall linear regression model:

$H_0:$ The data do not fit the linear regression model ($\beta = 0$)
$H_A:$ The data fit the linear regression model ($\beta \neq 0$)

Using the $\alpha= 0.05$ level of significance:

lmModel <- lm(EXP ~ sqrt(Unemployment), data=Pokies)
lmModel %>% summary()

## 
## Call:
## lm(formula = EXP ~ sqrt(Unemployment), data = Pokies)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -393.16 -114.75  -12.21  141.09  350.13 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -201.9      119.8  -1.686   0.0964 .  
## sqrt(Unemployment)   2941.5      505.6   5.818 1.76e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 177.5 on 68 degrees of freedom
## Multiple R-squared:  0.3323, Adjusted R-squared:  0.3225 
## F-statistic: 33.85 on 1 and 68 DF,  p-value: 1.765e-07

The $F$-test was statistically significant, $F(1, 68) = 33.85$, $p = 1.765 \times 10^{-7}$ ($p < .001$), so $H_0$ was rejected and the model coefficients can be interpreted.
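The figures in the regression output are internally consistent, which can be confirmed with a few lines of base R. In simple linear regression the $F$ statistic equals the square of the slope's $t$ statistic, and $R^2$ can be recovered from $F$ and the residual degrees of freedom. The sketch below uses only the rounded values printed above, so small rounding differences are expected.

```r
# Values copied from the summary() output above
t_slope <- 5.818
F_stat  <- 33.85
df1 <- 1
df2 <- 68

F_from_t <- t_slope^2                                  # close to the reported 33.85
p_value  <- pf(F_stat, df1, df2, lower.tail = FALSE)   # upper-tail p-value of the F-test
R2       <- F_stat * df1 / (F_stat * df1 + df2)        # close to the reported 0.3323

c(F_from_t = F_from_t, p_value = p_value, R2 = R2)
```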

Linear Regression - Intercept and Slope

Line of Best Fit: $EXP = -201.9 + 2941.5\sqrt{Unemployment}$

When $\sqrt{Unemployment} = 0$, $EXP = -201.9$

Intercept $\alpha = -201.9$

As $\sqrt{Unemployment}$ increases by $1$, $EXP$ changes, on average, by $2941.5$

Slope $\beta = 2941.5$
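Because the observed values of sqrt(Unemployment) lie well below 1, a one-unit increase is outside the range of the data, so it can help to evaluate the fitted line at realistic values. The sketch below plugs the reported coefficients into a small helper; `predict_exp()` is a hypothetical name introduced here for illustration, and the inputs are the median and quartiles from the summary table.

```r
# Coefficients as reported in the regression output
alpha_hat <- -201.9
beta_hat  <- 2941.5

# Predicted expenditure per adult ($AUD) for an unemployment proportion
# (hypothetical helper for illustration)
predict_exp <- function(unemployment) {
  alpha_hat + beta_hat * sqrt(unemployment)
}

pred_median <- predict_exp(0.053)                        # median unemployment (5.3%)
diff_q3_q1  <- predict_exp(0.065) - predict_exp(0.042)   # Q3 versus Q1 unemployment
c(pred_median = pred_median, diff_q3_q1 = diff_q3_q1)
```

At the median unemployment rate the fitted line predicts roughly $475 per adult, close to the observed mean EXP of about $484.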

plot(EXP ~ sqrt(Unemployment), 
     data=Pokies,
     main="Unemployment Rate vs Pokies Expenditure by Local Government Area",
     sub="as at June 2016",
     xlab="sqrt(Unemployment Rate (%))",
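     xlim=c(0,0.36),     # start at 0 to show the intercept; wide enough for every LGA
     ylim=c(-210,1000),  # covers the fitted intercept and the largest EXP value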
     ylab="Pokies Expenditure per Adult ($)")
abline(a = a, b = b, col= "red")
abline(v = 0, col="blue")

Linear Regression - Testing Model Parameters

Hypotheses for model parameters were tested:

Intercept: \[H_0: \alpha = 0\] \[H_A: \alpha \neq 0\]
Slope: \[H_0: \beta = 0\] \[H_A: \beta \neq 0\]

A $95$% CI for each parameter was used to test its significance:

lmModel %>% confint()

##                        2.5 %     97.5 %
## (Intercept)        -440.8428   37.08287
## sqrt(Unemployment) 1932.5894 3950.41354

$95$% CI for $\alpha$ = $[-440.84, 37.08]$, which includes 0, so $H_0: \alpha = 0$ is not rejected and the intercept is not statistically significant.

$95$% CI for $\beta$ = $[1932.59, 3950.41]$, which does not include 0, so $H_0: \beta = 0$ is rejected and the slope is statistically significant.
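The intervals returned by `confint()` can be reproduced approximately by hand as estimate ± critical $t$ value × standard error, with 68 residual degrees of freedom. The sketch below uses the rounded estimates and standard errors from the `summary()` output, so the intercept interval differs from the reported one in the first decimal place.

```r
# Estimates and standard errors copied from the summary() output
est <- c(Intercept = -201.9, Slope = 2941.5)
se  <- c(Intercept = 119.8,  Slope = 505.6)
df_resid <- 68

t_crit <- qt(0.975, df_resid)   # two-sided 95% critical value
ci <- cbind(lower = est - t_crit * se,
            upper = est + t_crit * se)
ci

# A parameter is significant at the 5% level when its interval excludes 0
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
excludes_zero
```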

Linear Regression - Testing Assumptions

Residuals vs Fitted: The trend line is generally flat, deviating slightly at the left- and right-most ends.

Normal Q-Q: Although the fit is not perfect, most residuals fall close to the Normal line.

Scale-Location (Homoscedasticity): Observations 51, 16 and 12 cause a peak in the red line; however, it stays consistently between 0.5 and 1.0.

Residuals vs Leverage: The Cook's distance bands are barely visible except at the bottom-right, and every observation falls within the 0.5 band.

par(mfrow = c(2, 2))
lmModel %>% plot(which=1)
lmModel %>% plot(which=2)
lmModel %>% plot(which=3)
lmModel %>% plot(which=5)

Linear Regression

$R^2 = 0.3323$

33% of the variability in EXP can be explained by the linear relationship with sqrt(Unemployment).

Pearson Correlation Coefficient $r = 0.5765$, $95$% $CI = [0.3950, 0.7147]$

There is a moderate, positive correlation between Pokies Expenditure and the square root of the Unemployment Rate.

r <- cor(Pokies$EXP, sqrt(Pokies$Unemployment),
         use = "complete.obs")
r

## [1] 0.5764829

library(psychometric)
CIr(r, n, level = .95)

## [1] 0.3950173 0.7146513

detach("package:psychometric", unload=TRUE)
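The reported interval can be reproduced in base R with the Fisher z-transformation. This sketch assumes that is the method behind `CIr()`; the calculation below does recover the interval printed above, and squaring $r$ recovers the model's $R^2$. The names `r_obs` and `n_obs` are used so that `r` and `n` from the code above are left untouched.

```r
# Values reported in this section
r_obs <- 0.5764829
n_obs <- 70

z      <- atanh(r_obs)          # Fisher z-transformation of r
se_z   <- 1 / sqrt(n_obs - 3)   # standard error on the z scale
z_crit <- qnorm(0.975)          # two-sided 95% critical value

ci_r <- tanh(z + c(-1, 1) * z_crit * se_z)  # back-transform to the r scale
ci_r

r_obs^2   # squared correlation, matching R^2 = 0.3323 from the model
```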

Discussion

A linear regression model was fitted to predict the dependent variable, EXP, using the square root of the Unemployment rate as the predictor.

A scatterplot suggested a potential positive linear relationship between the two variables prior to fitting the regression model.

After ruling out non-linear trends, the regression model was applied.

The overall model was statistically significant, $F(1, 68) = 33.85$, $p < .001$, and explained 33% of the variability in EXP, $R^2 = 0.3323$.

The correlation was $r = 0.5765$, $95$% $CI = [0.3950, 0.7147]$, indicating a moderate, positive linear relationship.

Inspection of the residuals supported normality and homoscedasticity.

The regression equation was estimated to be $EXP = -201.9 + 2941.5\sqrt{Unemployment}$.

Intercept $\alpha = -201.9$, $95$% CI: $[-440.84, 37.08]$, was not statistically significant.

Slope $\beta = 2941.5$, $95$% CI: $[1932.59, 3950.41]$, was statistically significant.

Discussion cont.

The dataset created by the VCGLR provides accurate data about gaming expenditure, since all licensed venues report their figures daily.

Grouping the data into LGAs allows future studies to explore other characteristics of each area and so better understand what contributes to pokies expenditure, since the Australian Bureau of Statistics (ABS) also groups data by LGA.

Different states have different rules governing gaming, so limiting the study to Victorian areas may not capture the full picture of people’s spending habits on the pokies.

$n = 70$ was large enough to conduct this study, but a larger sample would give more precise estimates.

Future investigations may benefit from exploring player loss in every state, and from conducting the study at different times of the year to determine whether people’s spending habits change with the season.

Despite the non-significant intercept, the linear regression showed a statistically significant, moderate, positive linear relationship between unemployment rates and pokies expenditure per adult in Victoria during the 2015/2016 financial year.

Exploring the Relationship between Unemployment Rates and Pokies Expenditure During the 2015/2016 Financial Year

For MATH1324 - Introduction to Statistics
