Goodness of Fit

Using the Olympic data I will apply the methods of the \(\chi^2\) distribution. Let’s find a nice contingency table to look at.

table(data$Medal)
## 
## Bronze   Gold Silver 
##  13295  13372  13116

Okay so let’s ask if there is any difference in the number of medals of each type being awarded. Of course we assume that the number of medals should be equal. (I actually know this may not be the case, for example in boxing two bronze medals are always awarded) Here is my null and alternative hypothesis: \[ H_0: \text{The number of medals awarded are equal}\\ H_A: \text{The number of medals awarded are not all equal}. \] I could have also expressed this in terms of observed and expected. Something along the lines that the number of expected is equal to the number of observed.

test = chisq.test(table(data$Medal), p = c(1,1,1)/3)
test
## 
##  Chi-squared test for given probabilities
## 
## data:  table(data$Medal)
## X-squared = 2.6018, df = 2, p-value = 0.2723

While I had evidence to suggest these numbers might be different, the statistics did not bear that out. I will fail to reject my null hypothesis here.

The expected values for the number of medals follows

test$expected
## Bronze   Gold Silver 
##  13261  13261  13261

Here is my bar chart

barplot(table(data$Medal))

Not a terribly interesting plot. Maybe now I see why I failed to reject that these had any differences!

Test for Independence

Let’s expand on this and ask a little different question, are the number of medals awarded dependent on the sport you play?

table(data$Sport,data$Medal)
##                            
##                             Bronze Gold Silver
##   Aeronautics                    0    1      0
##   Alpine Skiing                141  143    144
##   Alpinism                       0   25      0
##   Archery                       98  135    120
##   Art Competitions              51   49     56
##   Athletics                   1296 1339   1334
##   Badminton                     60   54     54
##   Baseball                     112  112    112
##   Basketball                   356  365    359
##   Basque Pelota                  0    2      0
##   Beach Volleyball              24   24     24
##   Biathlon                     135  136    137
##   Bobsleigh                    136  133    129
##   Boxing                       441  252    251
##   Canoeing                     389  388    388
##   Cricket                        0   12     12
##   Croquet                        2    4      2
##   Cross Country Skiing         263  256    257
##   Curling                       49   50     53
##   Cycling                      414  424    425
##   Diving                       143  142    142
##   Equestrianism                319  324    322
##   Fencing                      566  594    583
##   Figure Skating               127  131    128
##   Football                     543  515    513
##   Freestyle Skiing              34   34     34
##   Golf                          16   15     15
##   Gymnastics                   719  791    746
##   Handball                     353  349    358
##   Hockey                       512  518    498
##   Ice Hockey                   507  508    515
##   Jeu De Paume                   1    1      1
##   Judo                         274  137    136
##   Lacrosse                      12   24     24
##   Luge                          60   62     58
##   Military Ski Patrol            4    4      4
##   Modern Pentathlon             62   62     62
##   Motorboating                   0    7      0
##   Nordic Combined               55   55     55
##   Polo                          20   22     25
##   Racquets                       4    3      3
##   Rhythmic Gymnastics           44   44     44
##   Roque                          1    1      1
##   Rowing                       990  978    977
##   Rugby                         16   67     79
##   Rugby Sevens                  25   25     24
##   Sailing                      367  450    415
##   Shooting                     407  410    411
##   Short Track Speed Skating     94   95     95
##   Skeleton                      10   10     10
##   Ski Jumping                   68   69     70
##   Snowboarding                  30   30     30
##   Softball                      59   60     60
##   Speed Skating                190  193    197
##   Swimming                     956 1099    993
##   Synchronized Swimming         70   74     71
##   Table Tennis                  60   54     54
##   Taekwondo                     64   40     40
##   Tennis                       128  106    106
##   Trampolining                  10   10     10
##   Triathlon                     10   10     10
##   Tug-Of-War                    29   43     43
##   Volleyball                   325  322    322
##   Water Polo                   360  350    347
##   Weightlifting                216  217    213
##   Wrestling                    468  413    415

Here we see my note about boxing come to light, there are twice as many bronze as other medals in this sport. Most of the rest do not seem as pronounced so perhaps we won’t see an effect. Let’s set up the hypothesis test. \[ H_O: \text{The number of medals is independent of sport}\\ H_A: \text{The number of medals is dependent on the sport} \]

Now we test!

test = chisq.test(table(data$Sport,data$Medal))
## Warning in chisq.test(table(data$Sport, data$Medal)): Chi-squared approximation
## may be incorrect
test
## 
##  Pearson's Chi-squared test
## 
## data:  table(data$Sport, data$Medal)
## X-squared = 325.88, df = 130, p-value < 2.2e-16

Here we are able to reject our null hypothesis. We have evidence to suggest that the number of medals is dependent on the sport. We should note that it is not that there are more medals in a particular sport but that they are not distributed in a uniform way, eg. boxing.

Esitmates to follow

test$expected
##                            
##                                   Bronze         Gold       Silver
##   Aeronautics                  0.3341880    0.3361235    0.3296886
##   Alpine Skiing              143.0324510  143.8608451  141.1067039
##   Alpinism                     8.3546992    8.4030867    8.2422140
##   Archery                    117.9683533  118.6515848  116.3800618
##   Art Competitions            52.1333233   52.4352613   51.4314154
##   Athletics                 1326.3920519 1334.0740517 1308.5338964
##   Badminton                   56.1435789   56.4687429   55.3876782
##   Baseball                   112.2871578  112.9374859  110.7753563
##   Basketball                 360.9230073  363.0133474  356.0636453
##   Basque Pelota                0.6683759    0.6722469    0.6593771
##   Beach Volleyball            24.0615338   24.2008898   23.7375764
##   Biathlon                   136.3486917  137.1383757  134.5129327
##   Bobsleigh                  133.0068120  133.7771410  131.2160471
##   Boxing                     315.4734434  317.3005555  311.2260011
##   Canoeing                   389.3289847  391.5838423  384.0871729
##   Cricket                      8.0205113    8.0669633    7.9125255
##   Croquet                      2.6735038    2.6889878    2.6375085
##   Cross Country Skiing       259.3298645  260.8318126  255.8383229
##   Curling                     50.7965714   51.0907674   50.1126612
##   Cycling                    422.0794058  424.5239424  416.3966518
##   Diving                     142.6982631  143.5247216  140.7770153
##   Equestrianism              322.4913908  324.3591484  318.1494608
##   Fencing                    582.4896312  585.8632079  574.6471608
##   Figure Skating             128.9965563  129.7436594  127.2597843
##   Football                   525.0093005  528.0499711  517.9407285
##   Freestyle Skiing            34.0871729   34.2845939   33.6282332
##   Golf                        15.3726466   15.4616796   15.1656738
##   Gymnastics                 753.9280597  758.2945479  743.7773924
##   Handball                   354.2392479  356.2908780  349.4698741
##   Hockey                     510.6392178  513.5966619  503.7641204
##   Ice Hockey                 511.3075937  514.2689088  504.4234975
##   Jeu De Paume                 1.0025639    1.0083704    0.9890657
##   Judo                       182.8008194  183.8595380  180.3396426
##   Lacrosse                    20.0512782   20.1674082   19.7813136
##   Luge                        60.1538346   60.5022246   59.3439409
##   Military Ski Patrol          4.0102556    4.0334816    3.9562627
##   Modern Pentathlon           62.1589624   62.5189654   61.3220722
##   Motorboating                 2.3393158    2.3528643    2.3078199
##   Nordic Combined             55.1410150   55.4603725   54.3986125
##   Polo                        22.3905940   22.5202725   22.0891335
##   Racquets                     3.3418797    3.3612347    3.2968856
##   Rhythmic Gymnastics         44.1128120   44.3682980   43.5188900
##   Roque                        1.0025639    1.0083704    0.9890657
##   Rowing                     984.1835709  989.8836186  970.9328105
##   Rugby                       54.1384511   54.4520021   53.4095468
##   Rugby Sevens                24.7299098   24.8731368   24.3969535
##   Sailing                    411.7195787  414.1041148  406.1763065
##   Shooting                   410.3828268  412.7596209  404.8575522
##   Short Track Speed Skating   94.9093834   95.4590654   93.6315512
##   Skeleton                    10.0256391   10.0837041    9.8906568
##   Ski Jumping                 69.1769097   69.5775583   68.2455320
##   Snowboarding                30.0769173   30.2511123   29.6719704
##   Softball                    59.8196466   60.1661011   59.0142523
##   Speed Skating              193.8290224  194.9516125  191.2193651
##   Swimming                  1018.6049318 1024.5043360 1004.8907322
##   Synchronized Swimming       71.8504135   72.2665460   70.8830405
##   Table Tennis                56.1435789   56.4687429   55.3876782
##   Taekwondo                   48.1230676   48.4017797   47.4751527
##   Tennis                     113.6239097  114.2819797  112.0941105
##   Trampolining                10.0256391   10.0837041    9.8906568
##   Triathlon                   10.0256391   10.0837041    9.8906568
##   Tug-Of-War                  38.4316165   38.6541990   37.9141845
##   Volleyball                 323.8281427  325.7036423  319.4682151
##   Water Polo                 353.2366840  355.2825076  348.4808084
##   Weightlifting              215.8854284  217.1357615  212.9788100
##   Wrestling                  433.1076088  435.6160169  427.2763743

Lastly let’s try to visulaize the contingency table.

mosaicplot(table(data$Sport,data$Medal))

I did it but damn that is ugly! Let me try it with ggplot.

library(ggmosaic)
ggplot(data = data)+
  geom_mosaic( aes(x = product( Sport, Sex),fill = Medal),na.rm = TRUE) +
  labs(x = "Sex",y = "Sport", title = "Medals Awarded by Sport")

Not convinced that is all that much better but maybe you can recreate it and get something you like. I added Sex in there too so you could see how many fewer women have been awarded medals.