Exploring Olympic Medal Counts

Here is a brief exploration of the number of gold medals won by each country at the 2012 Olympics.

I begin by reading in a dataset that I previously created and published as a Google spreadsheet. I downloaded the file in csv format and saved it in a folder named “markdown”. I then created a new Project in RStudio that reads files from the “markdown” folder.

olympics = read.csv("Olympics Medals 2012.csv")

To make sure I've read in the dataset correctly, I'll display the first few lines of the data frame.

head(olympics)

##                 Country Gold Silver Bronze Total
## 1   United States (USA)   46     29     29   104
## 2           China (CHN)   38     27     23    88
## 3  Great Britain (GBR)*   29     17     19    65
## 4          Russia (RUS)   24     26     32    82
## 5     South Korea (KOR)   13      8      7    28
## 6         Germany (GER)   11     19     14    44
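Beyond looking at the first few lines, a quick consistency check can confirm that the columns were read as numbers and line up correctly. The sketch below is only an illustration: it rebuilds the six rows printed above by hand, writes them to a temporary csv file (the names `top6`, `csv_path`, and `check` are made up here, not part of the original analysis), reads the file back with `read.csv`, and verifies that Gold + Silver + Bronze equals Total in every row.

```r
# The six rows printed by head(olympics), rebuilt by hand for illustration
top6 <- data.frame(
  Country = c("United States (USA)", "China (CHN)", "Great Britain (GBR)*",
              "Russia (RUS)", "South Korea (KOR)", "Germany (GER)"),
  Gold   = c(46, 38, 29, 24, 13, 11),
  Silver = c(29, 27, 17, 26, 8, 19),
  Bronze = c(29, 23, 19, 32, 7, 14),
  Total  = c(104, 88, 65, 82, 28, 44)
)

# Round-trip through a temporary csv file to mimic the read step
csv_path <- tempfile(fileext = ".csv")
write.csv(top6, csv_path, row.names = FALSE)
check <- read.csv(csv_path, stringsAsFactors = FALSE)

# Column types: Country should be character, the medal columns numeric
str(check)

# Each Total should be the sum of the three medal columns
totals_ok <- with(check, Gold + Silver + Bronze == Total)
all(totals_ok)
```

The same two checks (`str` and the row-sum comparison) could be run on the full `olympics` data frame, assuming its columns are named as in the display above.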

I am going to focus on the variable Gold, which contains the number of gold medals won by each country.

I load in the LearnEDA package.

library(LearnEDA)

## Loading required package: aplpack

## Loading required package: tcltk

## Loading Tcl/Tk interface ...

## done

## Loading required package: vcd

## Loading required package: MASS

## Loading required package: grid

## Loading required package: colorspace

## Attaching package: 'LearnEDA'

## The following object(s) are masked from 'package:MASS':
## 
## farms

I'll first construct a stemplot of the gold medal counts.

stem.leaf(olympics$Gold)

## 1 | 2: represents 1.2
##  leaf unit: 0.1
##             n: 85
##    31    0* | 0000000000000000000000000000000
##          0. | 
##   (19)   1* | 0000000000000000000
##          1. | 
##    35    2* | 0000000000
##          2. | 
##    25    3* | 00000
##          3. | 
##    20    4* | 0000
##          4. | 
##    16    5* | 0
##          5. | 
##    15    6* | 000
##          6. | 
##    12    7* | 000
## HI: 8 8 11 11 13 24 29 38 46

It seems that most countries won only a small number of gold medals (50 of the 85 countries won at most one), while a few countries (like China and the U.S.) won many. Because the counts are so strongly right-skewed, this is not a particularly effective graph, and we'll learn about ways of transforming the counts to make the display easier to read.
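To give a rough idea of what transforming the counts can do, here is a minimal sketch in base R. The vector `gold` is rebuilt from the stemplot above (31 zeros, 19 ones, and so on, plus the nine HI values), so it does not need the csv file. The square root used here is only one common re-expression for counts; it is my assumption for illustration, not necessarily the transformation that will be chosen later.

```r
# Gold medal counts rebuilt from the stemplot above (n = 85)
gold <- c(rep(0, 31), rep(1, 19), rep(2, 10), rep(3, 5), rep(4, 4), 5,
          rep(6, 3), rep(7, 3), 8, 8, 11, 11, 13, 24, 29, 38, 46)

# Frequency of each distinct count; the pile-up at 0 and 1 is obvious here
counts <- table(gold)
counts

# Number of countries with at most one gold medal
low_count <- sum(gold <= 1)
low_count

# Square-root re-expression (an assumed choice) pulls in the long right tail
root_gold <- sqrt(gold)
stem(root_gold)
```

On the root scale the largest value is a bit under 7 instead of 46, so the few big winners no longer force most of the data into the first one or two stems.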

One could construct a histogram, but it gives the same general impression of the distribution of the gold medal counts.

hist(olympics$Gold)

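One way to see why the histogram tells the same story is to look at the bin counts directly. The sketch below rebuilds the counts from the stemplot (the vector `gold` is a stand-in for `olympics$Gold`) and asks `hist` for one bin per integer value without drawing anything; the break points are my own choice for illustration.

```r
# Gold medal counts rebuilt from the stemplot (n = 85)
gold <- c(rep(0, 31), rep(1, 19), rep(2, 10), rep(3, 5), rep(4, 4), 5,
          rep(6, 3), rep(7, 3), 8, 8, 11, 11, 13, 24, 29, 38, 46)

# One bin per integer: breaks at -0.5, 0.5, ..., 46.5; plot = FALSE returns
# the bin information instead of drawing the histogram
bins <- hist(gold, breaks = seq(-0.5, 46.5, by = 1), plot = FALSE)

# Number of countries with 0, 1, ..., 7 gold medals
first_bins <- bins$counts[1:8]
first_bins

# The default breaks group the data more coarsely but cover the same 85 countries
default_bins <- hist(gold, plot = FALSE)
sum(default_bins$counts)
```

Calling `plot(bins)` would draw the one-bin-per-integer version. Either way, nearly all of the area sits at the far left, which is the same impression the stemplot gives.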

We'll talk about finding a five-number summary of these counts. The individual summaries are called letter values.

lval(olympics$Gold)

##   depth lo   hi  mids spreads
## M  43.0  1  1.0  1.00     0.0
## H  22.0  0  3.0  1.50     3.0
## E  11.5  0  7.0  3.50     7.0
## D   6.0  0 11.0  5.50    11.0
## C   3.5  0 26.5 13.25    26.5
## B   2.0  0 38.0 19.00    38.0
## A   1.0  0 46.0 23.00    46.0

This tells us that the median number of gold medals won was 1, the fourths (row H) are 0 and 3, so the middle half of the gold medal counts lies between 0 and 3, and so on.
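To make the depth arithmetic behind this table concrete, here is a rough base R sketch. It is not the code inside `lval`; the function `letter_values` and its helper are hypothetical names of mine. It assumes the usual rule that each depth is (floor of the previous depth + 1) / 2, starting from the sample size, and it skips a depth of 1.5 so that the last row is the extremes at depth 1, which is what the printed table shows. The vector `gold` is rebuilt from the stemplot.

```r
# Gold medal counts rebuilt from the stemplot (n = 85)
gold <- c(rep(0, 31), rep(1, 19), rep(2, 10), rep(3, 5), rep(4, 4), 5,
          rep(6, 3), rep(7, 3), 8, 8, 11, 11, 13, 24, 29, 38, 46)

# Value at a depth counted from the start of a sorted vector;
# a half-integer depth averages the two neighbouring values
value_at_depth <- function(sorted, depth) {
  mean(sorted[c(floor(depth), ceiling(depth))])
}

# Hypothetical helper: letter values by repeated halving of depths
letter_values <- function(x) {
  x <- sort(x)
  depths <- c()
  d <- length(x)
  while (d > 1) {
    d <- (floor(d) + 1) / 2
    if (d != 1.5) depths <- c(depths, d)  # assumed: go straight to the extremes
  }
  lo <- sapply(depths, function(k) value_at_depth(x, k))       # from the low end
  hi <- sapply(depths, function(k) value_at_depth(rev(x), k))  # from the high end
  labels <- c("M", "H", "E", "D", "C", "B", "A")  # enough labels for this n
  data.frame(depth = depths, lo = lo, hi = hi,
             mids = (lo + hi) / 2, spreads = hi - lo,
             row.names = labels[seq_along(depths)])
}

lv <- letter_values(gold)
lv

# Base R's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(gold)
five
```

For these 85 counts the depths come out as 43, 22, 11.5, 6, 3.5, 2, and 1, and the M, H, and A rows carry the same information as `fivenum(gold)`: a median of 1, fourths of 0 and 3, and extremes of 0 and 46.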