Is there a significant difference in the mean viewership of soccer in Europe vs another continent (confederation)?
2022-11-29
The data set has one row per country with the columns country, confederation, tv_audience_share, population_share and gdp_weighted_share. Read the CSV into a data frame:
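library(dplyr)    # filter(), rename(), left_join(), %>%
library(ggplot2)  # ggplot(), map_data()
# load data into R dataframe
data_url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/fifa/fifa_countries_audience.csv"
fifa <- read.csv(data_url)
# assumption: "European" = UEFA member (the CSV itself has no european column)
fifa$european <- fifa$confederation == "UEFA"
head(fifa)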
##        long      lat group order country subregion confederation
## 1 -69.89912 12.45200     1     1   Aruba      <NA>      CONCACAF
## 2 -69.89571 12.42300     1     2   Aruba      <NA>      CONCACAF
## 3 -69.94219 12.43853     1     3   Aruba      <NA>      CONCACAF
## 4 -70.00415 12.50049     1     4   Aruba      <NA>      CONCACAF
## 5 -70.06612 12.54697     1     5   Aruba      <NA>      CONCACAF
## 6 -70.05088 12.59707     1     6   Aruba      <NA>      CONCACAF
##   population_share tv_audience_share gdp_weighted_share     continent european
## 1                0                 0                  0 North America    FALSE
## 2                0                 0                  0 North America    FALSE
## 3                0                 0                  0 North America    FALSE
## 4                0                 0                  0 North America    FALSE
## 5                0                 0                  0 North America    FALSE
## 6                0                 0                  0 North America    FALSE
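Before plotting, it can help to confirm what was actually loaded. The sketch below is a hypothetical helper (share_overview is not part of the original analysis) that counts countries per confederation and totals each share column. If the share columns are percentages of a world total, as the map titles further down suggest, each total should come out close to 100; that is an assumption worth checking rather than a given.

```r
# Hypothetical helper: countries per confederation and the total of each share column
share_overview <- function(df,
                           share_cols = c("tv_audience_share",
                                          "population_share",
                                          "gdp_weighted_share")) {
  list(
    # number of rows (countries) in each confederation
    counts = table(df$confederation),
    # column totals; percentage shares of a world total should sum to roughly 100
    totals = colSums(df[, share_cols], na.rm = TRUE)
  )
}
```

Usage on the loaded data would be share_overview(fifa).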
# Plotting GDP-weighted share by confederation
ggplot(fifa, aes(x = gdp_weighted_share, fill = confederation)) +
  geom_histogram(binwidth = 1) +
  facet_grid(confederation ~ .)
# Plotting TV audience share by confederation
ggplot(fifa, aes(x = tv_audience_share, fill = confederation)) +
  geom_histogram(binwidth = 1) +
  facet_grid(confederation ~ .)
# Plotting population share by confederation
ggplot(fifa, aes(x = population_share, fill = confederation)) +
  geom_histogram(binwidth = 1) +
  facet_grid(confederation ~ .)
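The histograms are easier to read next to per-group numbers. A minimal sketch, using a hypothetical helper confed_summary built only on base R, that reports the count, mean and median of one share column for each confederation. Comparing mean and median is a quick way to see how skewed each group is before running the mean-based tests below.

```r
# Hypothetical helper: n, mean and median of one column, split by confederation
confed_summary <- function(df, col) {
  groups <- split(df[[col]], df$confederation)  # one numeric vector per confederation
  data.frame(
    confederation = names(groups),
    n      = vapply(groups, length, integer(1)),
    mean   = vapply(groups, mean, numeric(1)),
    median = vapply(groups, median, numeric(1)),
    row.names = NULL
  )
}
```

For example, confed_summary(fifa, "gdp_weighted_share") gives one row per confederation.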
FIFA's six confederations correspond roughly to continents, which is easiest to see on a map.
# Join world map polygons to the FIFA data by country name
# (assumed construction of fifa_coords; requires the maps package for map_data)
fifa_coords <- map_data("world") %>%
  rename(country = region) %>%
  left_join(fifa, by = "country")
# Plotting FIFA confederation membership by country
ggplot(fifa_coords, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = confederation)) +
  ggtitle("FIFA Confederation membership by country") +
  xlab("Longitude (deg)") + ylab("Latitude (deg)")
# Plotting TV Audience share by country
ggplot(fifa_coords, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = tv_audience_share)) +
  ggtitle("TV audience share (%) of World Cup viewership by country") +
  xlab("Longitude (deg)") + ylab("Latitude (deg)")
# Plotting GDP-weighted share by country
ggplot(fifa_coords, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = gdp_weighted_share)) +
  ggtitle("GDP-weighted share (%) of World Cup viewership by country") +
  xlab("Longitude (deg)") + ylab("Latitude (deg)")
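The maps depend on the CSV's country names matching the region names in the world polygon data; any country that fails to match is left without fill. A minimal sketch of a check for this, using a hypothetical helper unmatched_countries and assuming the polygons come from ggplot2's map_data("world"), whose region column holds the country names.

```r
# Hypothetical helper: names present in the data but missing from the map polygons
unmatched_countries <- function(data_names, map_names) {
  sort(setdiff(unique(data_names), unique(map_names)))
}
```

Usage would be unmatched_countries(fifa$country, map_data("world")$region); any names it returns need recoding before the join.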
One-tailed Welch t-test comparing the GDP-weighted viewership share of European vs. non-European countries (\(\alpha = 0.05\))
\(H_0\): The mean GDP-weighted World Cup viewership share of European countries is not higher than that of non-European countries
\(H_a\): The mean GDP-weighted World Cup viewership share of European countries is higher than that of non-European countries
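For reference, t.test() with its default var.equal = FALSE runs Welch's test, which does not assume equal variances in the two groups. Writing \(E\) for European and \(O\) for non-European countries, with sample means \(\bar{x}\), sample variances \(s^2\) and sizes \(n\), the statistic and its approximate (Welch-Satterthwaite) degrees of freedom are:

\[
t = \frac{\bar{x}_E - \bar{x}_O}{\sqrt{s_E^2/n_E + s_O^2/n_O}}, \qquad
\nu \approx \frac{\left(s_E^2/n_E + s_O^2/n_O\right)^2}{\dfrac{\left(s_E^2/n_E\right)^2}{n_E - 1} + \dfrac{\left(s_O^2/n_O\right)^2}{n_O - 1}}
\]

For the one-tailed alternative above, the p-value is the upper-tail probability \(P(T_\nu \ge t)\).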
# Split countries into European and non-European groups
europe <- fifa %>% filter(european == TRUE)
other_countries <- fifa %>% filter(european == FALSE)
# Running one-tailed t-test using R built-in
t.test(europe$gdp_weighted_share, other_countries$gdp_weighted_share,
alternative="greater")
## 
##  Welch Two Sample t-test
## 
## data:  europe$gdp_weighted_share and other_countries$gdp_weighted_share
## t = 1.7859, df = 77.677, p-value = 0.03901
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.02926207        Inf
## sample estimates:
## mean of x mean of y 
## 0.8478261 0.4165517
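To see what t.test() computes here, the Welch statistic, degrees of freedom and one-sided p-value can be reproduced by hand. This is a sketch with a hypothetical function name, welch_by_hand; it implements the standard Welch formulas and is meant as a cross-check, not a replacement for t.test().

```r
# One-sided Welch two-sample t-test computed by hand (H_a: mean(x) > mean(y))
welch_by_hand <- function(x, y) {
  vx <- var(x) / length(x)  # squared standard error of mean(x)
  vy <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  nu <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p <- pt(t_stat, nu, lower.tail = FALSE)  # upper-tail p-value
  c(t = t_stat, df = nu, p = p)
}
```

Calling welch_by_hand(europe$gdp_weighted_share, other_countries$gdp_weighted_share) should reproduce the t, df and p-value printed above.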
Since the p-value (0.039) is less than \(\alpha = 0.05\), we reject the null hypothesis and conclude that the mean GDP-weighted World Cup viewership share of European countries (0.85%) is higher than that of non-European countries (0.42%).
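Share values are likely strongly right-skewed (many small countries sit near 0%), which weakens the t-test's normality assumption in small groups. One way to check robustness, not used in the original analysis, is to compare the Welch p-value with a Wilcoxon rank-sum (Mann-Whitney) test. Note that the rank-sum test asks whether values in one group tend to be larger, not whether the means differ. The helper name compare_tests is hypothetical; exact = FALSE uses the normal approximation, which copes with the many tied values.

```r
# Hypothetical helper: one-sided Welch t-test and Wilcoxon rank-sum p-values side by side
compare_tests <- function(x, y) {
  c(
    welch_p    = t.test(x, y, alternative = "greater")$p.value,
    # rank-based alternative; normal approximation because shares contain ties
    wilcoxon_p = wilcox.test(x, y, alternative = "greater", exact = FALSE)$p.value
  )
}
```

For example, compare_tests(europe$gdp_weighted_share, other_countries$gdp_weighted_share).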
One-tailed Welch t-test comparing the TV audience share of European vs. non-European countries (\(\alpha = 0.05\))
\(H_0\): The mean World Cup TV audience share of European countries is not higher than that of non-European countries
\(H_a\): The mean World Cup TV audience share of European countries is higher than that of non-European countries
# Running one-tailed t-test using R built-in method
t.test(europe$tv_audience_share, other_countries$tv_audience_share,
alternative="greater")
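## 
##  Welch Two Sample t-test
## 
## data:  europe$tv_audience_share and other_countries$tv_audience_share
## t = 0.18208, df = 151.44, p-value = 0.4279
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.2641503        Inf
## sample estimates:
## mean of x mean of y 
## 0.5478261 0.5151724
Since the p-value (0.428) is well above \(\alpha = 0.05\), we fail to reject the null hypothesis: the data do not show that the mean TV audience share of European countries (0.55%) is higher than that of non-European countries (0.52%).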
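A permutation test is another way to check the difference in means without relying on the t distribution: shuffle the European/non-European labels many times and see how often the shuffled difference is at least as large as the observed one. This is a sketch, not part of the original analysis; the function name and the default of 10,000 shuffles are arbitrary choices.

```r
# One-sided permutation test for mean(x) - mean(y) > 0 (hypothetical helper)
perm_test_mean_diff <- function(x, y, n_perm = 10000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # optional, for reproducible shuffles
  observed <- mean(x) - mean(y)
  pooled <- c(x, y)
  n_x <- length(x)
  perm_diffs <- replicate(n_perm, {
    idx <- sample.int(length(pooled), n_x)  # randomly relabel n_x values as "x"
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  # small tolerance so shuffles that tie the observed value are counted despite rounding;
  # the +1 terms keep the estimated p-value strictly positive
  tol <- sqrt(.Machine$double.eps)
  p_value <- (sum(perm_diffs >= observed - tol) + 1) / (n_perm + 1)
  list(observed = observed, p_value = p_value)
}
```

Usage: perm_test_mean_diff(europe$tv_audience_share, other_countries$tv_audience_share, seed = 1).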
One-way ANOVA: does the mean TV audience share differ between confederations? (\(\alpha = 0.05\))
\(H_0\): The mean TV audience share is the same in all six confederations
\(H_a\): The mean TV audience share differs for at least one confederation
confederation_test <- aov(tv_audience_share ~ confederation,
data = fifa)
summary(confederation_test)
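##                Df Sum Sq Mean Sq F value Pr(>F)
## confederation   5   26.7   5.333   2.648 0.0244 *
## Residuals     185  372.6   2.014
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the p-value (0.0244) is less than \(\alpha = 0.05\), we reject \(H_0\): mean TV audience share is not the same in every confederation. The ANOVA alone does not say which confederations differ.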
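A significant one-way ANOVA only says that at least one group mean differs. A common follow-up, not part of the original analysis, is Tukey's honest significant difference test, which compares every pair of confederations while controlling the family-wise error rate. A minimal sketch using base R's TukeyHSD(); the helper name pairwise_confed is hypothetical, and the grouping column is converted to a factor first so that aov() and TukeyHSD() treat it as categorical.

```r
# Hypothetical helper: Tukey HSD pairwise comparisons of confederations for one response
pairwise_confed <- function(df, response) {
  df$confederation <- factor(df$confederation)
  fit <- aov(reformulate("confederation", response = response), data = df)
  tk <- TukeyHSD(fit, "confederation")$confederation  # columns: diff, lwr, upr, p adj
  tk[order(tk[, "p adj"]), , drop = FALSE]             # smallest adjusted p-values first
}
```

For example, pairwise_confed(fifa, "tv_audience_share") lists all 15 confederation pairs.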
One-way ANOVA: does the mean GDP-weighted share differ between confederations? (\(\alpha = 0.05\))
\(H_0\): The mean GDP-weighted share is the same in all six confederations
\(H_a\): The mean GDP-weighted share differs for at least one confederation
confederation_test <- aov(gdp_weighted_share ~ confederation,
data = fifa)
summary(confederation_test)
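##                Df Sum Sq Mean Sq F value Pr(>F)
## confederation   5   23.6   4.725     2.3 0.0467 *
## Residuals     185  380.0   2.054
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the p-value (0.0467) is less than \(\alpha = 0.05\), we again reject \(H_0\), although this result sits only just below the threshold.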
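Because both ANOVA p-values are close to 0.05 and the share data are skewed, it may be worth checking the ANOVA's assumptions. A minimal sketch, not part of the original analysis, with a hypothetical helper anova_checks: the Kruskal-Wallis test is a rank-based alternative to one-way ANOVA that does not assume normal errors, and the Shapiro-Wilk test on the ANOVA residuals checks the normality assumption directly (a small p-value suggests non-normal residuals).

```r
# Hypothetical helper: rank-based alternative and a residual normality check for one-way ANOVA
anova_checks <- function(df, response) {
  df$confederation <- factor(df$confederation)
  f <- reformulate("confederation", response = response)
  fit <- aov(f, data = df)
  c(
    # Kruskal-Wallis: do the groups differ in distribution (ranks), no normality assumed
    kruskal_p = kruskal.test(f, data = df)$p.value,
    # Shapiro-Wilk on residuals: normality of the ANOVA errors (needs 3 to 5000 values)
    shapiro_p = shapiro.test(residuals(fit))$p.value
  )
}
```

For example, anova_checks(fifa, "gdp_weighted_share").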
Nate Silver. 2015. How to Break FIFA. FiveThirtyEight. https://fivethirtyeight.com/features/how-to-break-fifa/.