Statistical Modeling: A Fresh Look at Athletic Attendance

Collin Barry Lilly, Rick Beckel, Bock-Brownstein, Henry Fremont, Laurel Thompson

Introduction

In our project, we examined athletic school spirit at Macalester. Because school spirit is hard to quantify directly, we used attendance at varsity athletic events as a measurable stand-in.
We hypothesized the following:
The more involved in athletics you are, the more sporting events you will attend.
The more often you attend sporting events, the more school spirit you have.
Varsity athletes have more school spirit.
By modeling data about game attendance, we can infer athletic spirit as a whole. We also predicted that several other factors would influence attendance, including being a varsity athlete, the personal importance of school spirit, and major. To further understand the relationship between athletic involvement and attendance, we also looked at how long people stayed at games and which games they attended.

Procedure

We collected our data using an online survey distributed over social media, mainly Facebook. We collected responses over a two-week span and received 216 of them. We analyzed our data using RStudio.

Variable Descriptions
Spirit was the measure of how important a student considers school spirit. This was on a scale of 1-7, with 1 being not important and 7 being extremely important.
Varsity Athlete Status was whether or not the student is a varsity athlete.
Club Athlete Status was whether or not the student is a club athlete.
Frequency was the measure of how often students attended varsity athletic events. The options were Never, Rarely, Sometimes, Often, or Always.
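
As a minimal sketch of how an ordered variable like Frequency can be stored in R, the snippet below builds an ordered factor from made-up responses. The vector `responses` and the name `freq_levels` are illustrative assumptions, not our actual cleaning code.

```r
# Hypothetical raw responses; these values are made up for illustration.
responses <- c("Rarely", "Never", "Sometimes", "Always", "Rarely", "Often")

# Declare the levels in their natural order so R treats Frequency as ordinal.
freq_levels <- c("Never", "Rarely", "Sometimes", "Often", "Always")
Frequency <- factor(responses, levels = freq_levels, ordered = TRUE)

levels(Frequency)       # category order, lowest to highest
as.integer(Frequency)   # integer codes 1 (Never) through 5 (Always)
table(Frequency)        # counts per category, in level order
```

Declaring the order explicitly matters because R would otherwise sort the categories alphabetically.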

We renamed our variables for our sanity and yours.

##  [1] ""                                               
##  [2] "Baseball"                                       
##  [3] "Football"                                       
##  [4] "Football, Men's Track and Field"                
##  [5] "Men's Cross Country"                            
##  [6] "Men's Cross Country, Men's Track and Field"     
##  [7] "Men's Soccer"                                   
##  [8] "Men's Soccer, Men's Tennis"                     
##  [9] "Men's Swimming and Diving"                      
## [10] "Men's Track and Field"                          
## [11] "Volleyball"                                     
## [12] "Women's Cross Country"                          
## [13] "Women's Cross Country, Women's Track and Field" 
## [14] "Women's Soccer"                                 
## [15] "Women's Swimming and Diving"                    
## [16] "Women's Swimming and Diving, Women's Water Polo"
## [17] "Women's Tennis"                                 
## [18] "Women's Track and Field"                        
## [19] "Women's Water Polo"

Graphs of Individual Variables

The majority of students answering the survey were female, for reasons we could not determine:

barchart(tally(~Gender, data = d, margins = FALSE, format = "count"), auto.key = TRUE)

[Plot: bar chart of response counts by Gender]

The majority of students answering the survey were second years, which likely reflects the fact that we are all second years:

barchart(tally(~Year, data = d, margins = FALSE, format = "count"), auto.key = TRUE)

[Plot: bar chart of response counts by Year]

The students answering the survey were primarily non-athletes, which is consistent with the Macalester population:

barchart(tally(~VarsityAthlete, data = d, margins = FALSE, format = "count"), 
    auto.key = TRUE)

[Plot: bar chart of response counts by VarsityAthlete]

Most respondents do not attend varsity athletic events on a regular basis:

barchart(tally(~Frequency, data = d, margins = FALSE, format = "count"), auto.key = TRUE)

[Plot: bar chart of response counts by Frequency]

This graph illustrates the responses to the question “How important is school spirit?” A response of 1 corresponds to not important at all, and 7 corresponds to extremely important. The responses are roughly bell-shaped, resembling a normal distribution:

barchart(tally(~Spirit, data = d, margins = FALSE, format = "count"), auto.key = TRUE)

[Plot: bar chart of response counts by Spirit]

Graphical descriptions of relationships between variables

This is a graphical representation of varsity athlete status by spirit. The blue corresponds to varsity athletes and the red corresponds to non-varsity athletes. From this graph it appears that, on average, varsity athletes view school spirit as more important.


mosaicplot(Spirit ~ VarsityAthlete, data = d, las = 2, col = rainbow(2))

[Plot: mosaic plot of Spirit by VarsityAthlete]

This is a representation of spirit by frequency of game attendance in a box-and-whisker plot. From this graph we can see that as frequency of game attendance increases, so does the reported importance of school spirit.


bwplot(Spirit ~ Frequency, data = d, las = 2, col = rainbow(1))

[Plot: box-and-whisker plot of Spirit by Frequency]

This is a graphical representation of frequency versus varsity athlete status. It shows that varsity athletes tend to attend games more frequently than non-varsity athletes.


mosaicplot(Frequency ~ VarsityAthlete, data = d, las = 2, col = rainbow(2))

[Plot: mosaic plot of Frequency by VarsityAthlete]

This final graph shows frequency by club/intramural athlete status. Club/intramural athletes also tend to attend more games than non-club/intramural athletes, though this trend is less dramatic than in the Frequency~VarsityAthlete graph.

mosaicplot(Frequency ~ Club, data = d, las = 2, col = rainbow(2))

[Plot: mosaic plot of Frequency by Club]

Modeling Analysis

To analyze our hypotheses we created various models, shown below.

mod = lm(Spirit~VarsityAthlete, 
           data=d)

The regression table:

summary(mod)
## 
## Call:
## lm(formula = Spirit ~ VarsityAthlete, data = d)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.051 -0.797  0.203  1.203  3.203 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           3.80       0.11   34.43  < 2e-16 ***
## VarsityAthleteYes     1.25       0.26    4.83  2.5e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 1.47 on 214 degrees of freedom
## Multiple R-squared: 0.0985,  Adjusted R-squared: 0.0943 
## F-statistic: 23.4 on 1 and 214 DF,  p-value: 2.54e-06

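This model shows that varsity athlete status is significantly associated with how important school spirit is to a student (p = 2.5e-06). The coefficient is positive: on average, varsity athletes rate the importance of school spirit about 1.25 points higher on the 7-point scale than non-varsity athletes, whose mean rating is 3.80. Because this is survey data, the model shows an association rather than a causal effect.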
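
To make the reading of the coefficients concrete, here is a minimal sketch on made-up data (not our survey responses). With a single yes/no predictor, the intercept of `lm()` is the mean of the "No" group and the slope is the difference between the two group means. The data frame `toy` and its values are assumptions for illustration.

```r
# Made-up spirit scores (1-7) for illustration only.
toy <- data.frame(
  Spirit = c(3, 4, 5, 4, 5, 6),
  VarsityAthlete = factor(c("No", "No", "No", "No", "Yes", "Yes"))
)

fit <- lm(Spirit ~ VarsityAthlete, data = toy)
group_means <- tapply(toy$Spirit, toy$VarsityAthlete, mean)

coef(fit)      # intercept = mean for "No"; slope = "Yes" mean minus "No" mean
group_means    # the two group means, for comparison
```

In this toy example the "No" mean is 4 and the "Yes" mean is 5.5, so the fitted coefficients are 4 and 1.5.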

mod1 = lm(Spirit~Frequency, 
           data=d)

The regression table:

summary(mod1)
## 
## Call:
## lm(formula = Spirit ~ Frequency, data = d)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.349 -0.927  0.353  1.073  3.514 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    4.782      0.222   21.53  < 2e-16 ***
## Frequency.L    2.450      0.657    3.73  0.00024 ***
## Frequency.Q    0.455      0.567    0.80  0.42358    
## Frequency.C   -0.135      0.402   -0.34  0.73765    
## Frequency^4   -0.265      0.268   -0.99  0.32473    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 1.42 on 211 degrees of freedom
## Multiple R-squared: 0.163,   Adjusted R-squared: 0.147 
## F-statistic: 10.3 on 4 and 211 DF,  p-value: 1.31e-07
anova(mod1)
## Analysis of Variance Table
## 
## Response: Spirit
##            Df Sum Sq Mean Sq F value  Pr(>F)    
## Frequency   4     83   20.80    10.3 1.3e-07 ***
## Residuals 211    428    2.03                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
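
The coefficient names Frequency.L, Frequency.Q, Frequency.C and Frequency^4 come from the polynomial contrasts that R uses by default for ordered factors. As a minimal sketch, assuming the five attendance categories from the survey, the contrast matrix can be inspected directly with `contr.poly()`:

```r
# Polynomial contrasts for an ordered factor with five levels
# (Never, Rarely, Sometimes, Often, Always).
poly_contrasts <- contr.poly(5)

# Columns .L, .Q, .C and ^4 are the linear, quadratic, cubic
# and quartic trend contrasts across the ordered categories.
round(poly_contrasts, 3)

# The linear column rises steadily from the lowest to the highest category,
# so a positive .L coefficient means the response increases with the ordering.
poly_contrasts[, ".L"]
```
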
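
Because Frequency is an ordered factor, R reports polynomial contrasts rather than one coefficient per category. Only the linear term (Frequency.L) is significant (p = 0.00024), and its coefficient is positive, so students who attend more often tend to rate school spirit as more important. The ANOVA table shows that Frequency as a whole is significant (p = 1.3e-07).

mod2 = lm(as.numeric(Frequency)~VarsityAthlete, 
           data=d)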

The regression table:

summary(mod2)
## 
## Call:
## lm(formula = as.numeric(Frequency) ~ VarsityAthlete, data = d)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -1.077 -0.825  0.175  0.175  2.175 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          1.825      0.063   28.97  < 2e-16 ***
## VarsityAthleteYes    1.252      0.148    8.44  4.6e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 0.838 on 214 degrees of freedom
## Multiple R-squared: 0.25,    Adjusted R-squared: 0.246 
## F-statistic: 71.3 on 1 and 214 DF,  p-value: 4.64e-15

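Varsity athlete status is significantly associated with the frequency of varsity event attendance (p = 4.6e-15). In this model, as.numeric() converts the ordered Frequency categories to the scores 1 through 5, which treats the categories as evenly spaced. The positive coefficient on VarsityAthleteYes indicates that varsity athletes score about 1.25 points higher on this scale than non-varsity athletes, that is, they attend more athletic events.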
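
The numeric conversion can be sketched on made-up data as follows. The data frame `toy` and the column name `FrequencyScore` are assumptions for illustration; the point is only to show what `as.numeric()` does to an ordered factor before it enters `lm()`.

```r
freq_levels <- c("Never", "Rarely", "Sometimes", "Often", "Always")

# Made-up responses for illustration only.
toy <- data.frame(
  Frequency = factor(
    c("Never", "Rarely", "Rarely", "Sometimes", "Often", "Always"),
    levels = freq_levels, ordered = TRUE
  ),
  VarsityAthlete = factor(c("No", "No", "No", "No", "Yes", "Yes"))
)

# as.numeric() returns the position of each response in the level order.
toy$FrequencyScore <- as.numeric(toy$Frequency)
toy$FrequencyScore   # 1 2 2 3 4 5

# The slope is the difference in mean score between the two groups.
fit <- lm(FrequencyScore ~ VarsityAthlete, data = toy)
coef(fit)
```

Treating the categories as the numbers 1 to 5 assumes that the step from Never to Rarely is as large as the step from Often to Always, which is a simplification.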

mod3 = lm(as.numeric(Frequency)~Club, 
           data=d)
summary(mod3)
## 
## Call:
## lm(formula = as.numeric(Frequency) ~ Club, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2385 -0.8598 -0.0492  0.7615  3.1402 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.8598     0.0917   20.27   <2e-16 ***
## ClubYes       0.3787     0.1291    2.93   0.0037 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 0.949 on 214 degrees of freedom
## Multiple R-squared: 0.0386,  Adjusted R-squared: 0.0341 
## F-statistic:  8.6 on 1 and 214 DF,  p-value: 0.00373

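Club athlete status is also significantly associated with the frequency of varsity event attendance (p = 0.0037). The positive coefficient on ClubYes indicates that club athletes attend more athletic events than non-club athletes. However, comparing mod3 to mod2, the varsity coefficient (1.25) is much larger than the club coefficient (0.38), and varsity status explains far more of the variation in attendance (R-squared of 0.25 versus 0.04). Being a varsity athlete is therefore more strongly related to varsity athletic event attendance than being a club athlete.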
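
A more direct way to compare the two kinds of involvement would be to put both predictors in a single model. The sketch below does this on made-up data, not our survey responses; the data frame `toy`, its values and the model name `both` are assumptions for illustration.

```r
# Made-up attendance scores (1-5) for every combination of varsity and
# club status, two students per combination.
toy <- data.frame(
  FrequencyScore = c(1, 2, 2, 3, 3, 4, 4, 5),
  VarsityAthlete = factor(c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes")),
  Club           = factor(c("No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"))
)

# Each coefficient is the difference associated with one status
# while holding the other status fixed.
both <- lm(FrequencyScore ~ VarsityAthlete + Club, data = toy)
coef(both)
```

In this toy example the varsity coefficient (2) is twice the club coefficient (1), which is the kind of side-by-side comparison a combined model allows.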

Conclusion

We hypothesized that the more involved in athletics you are, the more sporting events you will attend. This means that if you are a varsity or club/intramural athlete, you should go to more games.
This hypothesis was supported by our data.

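We also hypothesized that varsity athletes have more school spirit. This hypothesis was supported by the data: in mod, varsity athletes rated the importance of school spirit about 1.25 points higher than non-varsity athletes. Our hypothesis that more frequent attendance goes with more school spirit was supported as well, by the significant positive linear trend in mod1.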

Comments

The most significant weakness in our methodology was the way we obtained survey responses. To encourage people to take our survey, we posted it to our Facebook pages and the Macalester Class of 2015 and 2016 pages. If we wanted our survey to be representative of the Macalester student body, we should have randomly selected people from the entire student body. Instead, our respondents were largely part of our own social networks. This might not have been much of a problem had each of us spent time in different circles, but since our group had prior connections, our networks largely overlap. By posting our survey on the Class of 2015 and 2016 Facebook pages, we ensured that mostly these classes would answer it: 112 second years and 83 first years took the survey, while only 10 juniors and 11 seniors participated.
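
Random selection from the whole student body could be sketched as follows. This is a minimal illustration, assuming a hypothetical `roster` data frame with one row per student; the roster size of 2000 and the `student_id` column are placeholders, not actual enrollment figures.

```r
set.seed(2014)  # arbitrary seed so the draw is reproducible

# Hypothetical roster: one row per enrolled student (IDs are placeholders).
roster <- data.frame(student_id = 1:2000)

# Draw 216 students uniformly at random, without replacement.
invited <- roster[sample(nrow(roster), size = 216), , drop = FALSE]

nrow(invited)                       # number of students invited
anyDuplicated(invited$student_id)   # 0 means no student was drawn twice
```

Every student on the roster has the same chance of being invited, which is what posting to our own Facebook pages could not guarantee.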

Our sample population is also less diverse than it should be because of the demographic composition of our group. Each of us is a science/math major, so it’s likely that the respondents were disproportionately science/math majors. Because of the way we set up our “major” question, we weren’t able to establish a relationship between major and other variables. Our study would benefit from finding a link between major and participation in athletics and game attendance, so in a future survey we would allow people to pick only a “primary major” to make such relationships easier to find.

There are also issues with the proportion of athletes that took our survey. Since none of our group’s members are Macalester athletes, a disproportionate number of non-athletes may have taken the survey. However, since the subject of the survey is “Athletic Spirit,” there may also be a self-selection bias, because athletes may be more interested in taking the survey.

Again, all these problems could be reduced with a better distribution system for our survey, one that reaches a good mix of classes, majors, and athletes/non-athletes. Even with fewer than our 216 respondents, such a sample would be more reflective of the general student body, which was ultimately our goal.