2022-11-29

Research Question

Is there a significant difference in mean World Cup viewership between Europe and other continents (confederations)?

Data

  • Viewership data for the 2010 World Cup
  • Collected into a csv file by FiveThirtyEight
  • 5 fields:
    • country
    • confederation
    • tv_audience_share
    • population_share
    • gdp_weighted_share

Reading in Data

Read the CSV file into a data frame

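# load packages used later for plotting (ggplot2) and filtering (dplyr)
library(ggplot2)
library(dplyr)

# load data into R dataframe
data_url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/fifa/fifa_countries_audience.csv"

fifa <- read.csv(data_url)
head(fifa)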
##        long      lat group order country subregion confederation
## 1 -69.89912 12.45200     1     1   Aruba      <NA>      CONCACAF
## 2 -69.89571 12.42300     1     2   Aruba      <NA>      CONCACAF
## 3 -69.94219 12.43853     1     3   Aruba      <NA>      CONCACAF
## 4 -70.00415 12.50049     1     4   Aruba      <NA>      CONCACAF
## 5 -70.06612 12.54697     1     5   Aruba      <NA>      CONCACAF
## 6 -70.05088 12.59707     1     6   Aruba      <NA>      CONCACAF
##   population_share tv_audience_share gdp_weighted_share     continent european
## 1                0                 0                  0 North America    FALSE
## 2                0                 0                  0 North America    FALSE
## 3                0                 0                  0 North America    FALSE
## 4                0                 0                  0 North America    FALSE
## 5                0                 0                  0 North America    FALSE
## 6                0                 0                  0 North America    FALSE
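The later slides filter on a `european` column that is not among the five fields listed under Data, and the code that creates it is not shown. Below is a minimal sketch of one way to derive it, assuming (this is an assumption, not something the slides confirm) that "European" means membership in the UEFA confederation. The helper `add_european_flag` and the toy rows are hypothetical.

```r
# Hypothetical helper: flag UEFA members as European.
# Assumption: european == (confederation is "UEFA"); the original derivation is not shown.
add_european_flag <- function(df) {
  # trimws() guards against stray whitespace in the confederation labels
  df$european <- trimws(df$confederation) == "UEFA"
  df
}

# Toy rows for illustration only (not real countries or viewership figures)
toy <- data.frame(
  country = c("A", "B", "C"),
  confederation = c("UEFA", "CONCACAF", "UEFA"),
  stringsAsFactors = FALSE
)
toy <- add_european_flag(toy)
print(toy)
```

With the real data this would be called as `fifa <- add_european_flag(fifa)` before the t-tests.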

Exploratory Data Analysis

# Plotting GDP Weighted share by confederation
ggplot(fifa, aes(x = gdp_weighted_share, fill = confederation)) + 
  geom_histogram(binwidth=1) + 
  facet_grid(confederation ~ .)

TV Audience Share by Confederation

ggplot(fifa, aes(x = tv_audience_share, fill = confederation)) + 
  geom_histogram(binwidth=1) + 
  facet_grid(confederation ~ .)

Population Share by Confederation

ggplot(fifa, aes(x = population_share, fill = confederation)) + 
  geom_histogram(binwidth=1) + 
  facet_grid(confederation ~ .)

What is a Confederation?

It is easiest to think of confederations as (roughly) continents

ggplot(fifa_coords, aes(long, lat, group = group)) +
  geom_polygon(aes( group=group, fill=confederation)) + 
  ggtitle("FIFA Confederation membership by country") +
  xlab("Longitude (deg)") + ylab("Latitude (deg)")
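The map plots use a `fifa_coords` data frame whose construction is not shown in these slides. The sketch below shows one plausible way to build it, assuming it is a join of world-map polygon vertices (for example from `ggplot2::map_data("world")`, which returns `long`, `lat`, `group`, `order`, `region` and `subregion`) with the `fifa` data on country name. The helper `join_coords` and all toy values are hypothetical.

```r
# Hypothetical helper: attach FIFA fields to map polygon vertices.
join_coords <- function(coords, fifa) {
  # map data names the country column `region`; rename it to match `fifa`
  names(coords)[names(coords) == "region"] <- "country"
  # left join: keep every polygon vertex, even for countries missing from `fifa`
  merged <- merge(coords, fifa, by = "country", all.x = TRUE)
  # merge() reorders rows; geom_polygon() needs vertices in drawing order
  merged[order(merged$order), ]
}

# Toy inputs for illustration only (made-up coordinates and values)
toy_coords <- data.frame(
  long = c(0, 1, 1, 5, 6, 6),
  lat = c(0, 0, 1, 5, 5, 6),
  group = c(1, 1, 1, 2, 2, 2),
  order = 1:6,
  region = c("A", "A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)
toy_fifa <- data.frame(
  country = "A",
  confederation = "UEFA",
  tv_audience_share = 1.5,
  stringsAsFactors = FALSE
)
toy_fifa_coords <- join_coords(toy_coords, toy_fifa)
print(toy_fifa_coords)
```

Country names must match exactly between the two sources. Unmatched countries keep their polygons but get `NA` for the FIFA fields, so they appear grey on the map.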

TV Audience Share by Country (visualized)

# Plotting TV Audience share by country
ggplot(fifa_coords, aes(long, lat, group = group)) +
  geom_polygon(aes( group=group, fill=tv_audience_share)) + 
  ggtitle("TV Audience share (%) of World Cup viewership by country") +
  xlab("Longitude (deg)") + ylab("Latitude (deg)")

GDP Weighted Share by Country (visualized)

ggplot(fifa_coords, aes(long, lat, group = group)) +
  geom_polygon(aes( group=group, fill=gdp_weighted_share)) + 
  ggtitle("GDP-weighted share (%) of World Cup viewership by country") +
  xlab("Longitude (deg)") + ylab("Latitude (deg)")

Inference

T-test comparing the GDP-weighted viewership share for European vs non-European countries (\(\alpha = 0.05\))

  • \(H_0\): The mean GDP-weighted viewership share of the World Cup in Europe is not higher than that of other confederations

  • \(H_a\): The mean GDP-weighted viewership share of the World Cup in Europe is higher than that of other confederations

T-Test Results (GDP)

# Split countries into European and non-European groups
europe <- fifa %>% filter(european == TRUE)
other_countries <- fifa %>% filter(european == FALSE)
# Running one-tailed t-test using R built-in
t.test(europe$gdp_weighted_share, other_countries$gdp_weighted_share,
       alternative = "greater")
## 
##  Welch Two Sample t-test
## 
## data:  europe$gdp_weighted_share and other_countries$gdp_weighted_share
## t = 1.7859, df = 77.677, p-value = 0.03901
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.02926207        Inf
## sample estimates:
## mean of x mean of y 
## 0.8478261 0.4165517

Since our p-value (0.03901) is less than \(\alpha = 0.05\), we reject the null hypothesis and conclude that the mean GDP-weighted World Cup viewership share in Europe is higher than that of non-European countries.
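As the output header states, `t.test` runs Welch's two-sample t-test by default, which does not assume equal variances. To make the reported `t`, `df` and p-value less of a black box, here is a small sketch that computes them by hand and compares against the built-in. The helper `welch_one_sided` is hypothetical, and the data are simulated for illustration only, not the FIFA figures.

```r
# Hypothetical helper: one-sided (greater) Welch t-test computed by hand.
welch_one_sided <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  # squared standard error of each group mean
  vx <- var(x) / nx
  vy <- var(y) / ny
  # Welch t statistic
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))
  # upper-tail probability, matching alternative = "greater"
  p <- pt(t_stat, df, lower.tail = FALSE)
  list(t = t_stat, df = df, p.value = p)
}

# Simulated data for illustration only
set.seed(1)
x <- rnorm(20, mean = 1)
y <- rnorm(30, mean = 0.5)

manual <- welch_one_sided(x, y)
builtin <- t.test(x, y, alternative = "greater")

# Both comparisons should return TRUE
all.equal(manual$t, unname(builtin$statistic))
all.equal(manual$p.value, builtin$p.value)
```

The fractional degrees of freedom in the output above (77.677) come from the Welch-Satterthwaite approximation.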

Inference (TV Audience)

T-test on our tv_audience_share variable for European vs non-European countries (\(\alpha = 0.05\))

  • \(H_0\): The mean TV audience share of the World Cup in Europe is not higher than that of other confederations

  • \(H_a\): The mean TV audience share of the World Cup in Europe is higher than that of other confederations

Results (TV Audience)

# Running one-tailed t-test using R built-in method
t.test(europe$tv_audience_share, other_countries$tv_audience_share,
       alternative="greater")
## 
##  Welch Two Sample t-test
## 
## data:  europe$tv_audience_share and other_countries$tv_audience_share
## t = 0.18208, df = 151.44, p-value = 0.4279
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.2641503        Inf
## sample estimates:
## mean of x mean of y 
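## 0.5478261 0.5151724

Since our p-value (0.4279) is greater than \(\alpha = 0.05\), we fail to reject the null hypothesis: there is not enough evidence that the mean TV audience share in Europe is higher than that of non-European countries.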

Inference (ANOVA)

Checking if there is a difference in mean TV audience share between confederations

  • \(H_0\): the mean TV audience share is the same for all confederations

  • \(H_a\): the mean TV audience share differs for at least one confederation

ANOVA Results (TV Audience)

confederation_test <- aov(tv_audience_share ~ confederation,
                          data = fifa)
summary(confederation_test)
##                Df Sum Sq Mean Sq F value Pr(>F)  
## confederation   5   26.7   5.333   2.648 0.0244 *
## Residuals     185  372.6   2.014                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Since our p-value (0.0244) is less than \(\alpha = 0.05\), we can reject the null hypothesis and assert that there is a statistically significant difference in mean TV audience share between FIFA confederations.
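The F value in the ANOVA table above is the ratio of the between-group mean square to the residual mean square, and the p-value is the upper tail of the F distribution with (5, 185) degrees of freedom. A quick arithmetic check, using only the rounded values copied from the table (so agreement is up to rounding):

```r
# Values copied from the ANOVA table above (rounded as printed)
ms_between <- 5.333  # Mean Sq for confederation
ms_within <- 2.014   # Mean Sq for residuals
df_between <- 5
df_within <- 185

# F statistic: between-group variance relative to within-group variance
f_value <- ms_between / ms_within

# p-value: probability of an F at least this large under the null hypothesis
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(f_value, 3)
round(p_value, 4)
```

These should closely reproduce the `F value` and `Pr(>F)` columns reported by `summary()`.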

Inference (ANOVA GDP)

Checking if there is a difference in mean GDP-weighted share between confederations

  • \(H_0\): the mean GDP-weighted share is the same for all confederations

  • \(H_a\): the mean GDP-weighted share differs for at least one confederation

ANOVA Results (GDP)

confederation_test <- aov(gdp_weighted_share ~ confederation,
                          data = fifa)
summary(confederation_test)
##                Df Sum Sq Mean Sq F value Pr(>F)  
## confederation   5   23.6   4.725     2.3 0.0467 *
## Residuals     185  380.0   2.054                 
## ---
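## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Since our p-value (0.0467) is less than \(\alpha = 0.05\), we can reject the null hypothesis and assert that there is a statistically significant difference in mean GDP-weighted share between FIFA confederations.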

Conclusion

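  • There is some evidence that UEFA has the strongest following of soccer (or I guess we’ll have to say ‘football’): Europe’s mean GDP-weighted share is significantly higher, although its mean TV audience share is not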
  • Multiple input factors could influence these results

Further Work

  • More recent WC viewership data
  • Use league viewership data (need to adjust for home country)
  • Historical analysis
