We need two packages first:

  1. dplyr: provides fast, consistent, and convenient tools for working with data-frame-like objects
  2. RCurl: allows R to compose general HTTP requests, provides convenient functions to fetch URIs, get and post forms, etc., and processes the results returned by the Web server.
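In practice the two packages above amount to a few lines of setup. This is a minimal sketch, assuming both packages are installed. The URL is our assumption about where the CSV lives (the raw file in FiveThirtyEight's public data repository on GitHub), and fetch_drinks is a hypothetical helper name, not part of either package.

```r
# Load the two packages
library(dplyr)  # data-frame verbs such as select() and arrange()
library(RCurl)  # HTTP requests, e.g. getURL()

# ASSUMPTION: location of the raw CSV; replace with the file you actually use
drinks_url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv"

# Hypothetical helper: fetch the CSV as one string over HTTPS with RCurl,
# then parse that string into a data frame with base R's read.csv()
fetch_drinks <- function(url = drinks_url) {
  csv_text <- getURL(url)
  read.csv(text = csv_text, stringsAsFactors = FALSE)
}
```

Calling drinks <- fetch_drinks() performs the download; the remaining sketches assume the result is stored in a data frame named drinks.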

We download the data used in the article “Where Do People Drink The Most Beer, Wine And Spirits?” from the data repository and look at the top 5 drinking countries in terms of:

Hard Liquor

##              country spirit_servings
## 1            Grenada             438
## 2            Belarus             373
## 3              Haiti             326
## 4 Russian Federation             326
## 5          St. Lucia             315

Beer

##          country beer_servings
## 1        Namibia           376
## 2 Czech Republic           361
## 3          Gabon           347
## 4        Germany           346
## 5      Lithuania           343

Wine

##       country wine_servings
## 1      France           370
## 2    Portugal           339
## 3     Andorra           312
## 4 Switzerland           280
## 5     Denmark           278

Total Alcohol

##          country total_litres_of_pure_alcohol
## 1        Belarus                         14.4
## 2      Lithuania                         12.9
## 3        Andorra                         12.4
## 4        Grenada                         11.9
## 5 Czech Republic                         11.8
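Each of the four tables above amounts to a select, a descending sort, and taking the first five rows. Here is a minimal dplyr sketch, assuming the data sit in a data frame called drinks with the column names shown in the tables. top_countries is a hypothetical helper, not a dplyr function.

```r
library(dplyr)

# Hypothetical helper: keep the country name and one numeric column,
# sort that column in descending order, and return the first n rows.
# `column` is a string such as "beer_servings".
top_countries <- function(df, column, n = 5) {
  df %>%
    select(country, all_of(column)) %>%
    arrange(desc(.data[[column]])) %>%
    head(n)
}
```

For example, top_countries(drinks, "spirit_servings") or top_countries(drinks, "total_litres_of_pure_alcohol"). Sorting and then taking the first n rows always returns exactly n countries, even when values tie (Haiti and the Russian Federation both have 326 spirit servings). dplyr's slice_max() keeps ties by default and could return more rows.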

We’re going to focus on annual alcohol consumption, measured in litres of pure alcohol (the total_litres_of_pure_alcohol column). The observed sample mean and sample standard deviation are \(\overline{x} = 4.717\) and \(s = 3.773\). In the plots below, the left panel shows a histogram of the observed data and the right panel shows simulated normal data with the same mean \(\overline{x}\) and standard deviation \(s\) as the observed sample. The observed data look nothing like the simulated data, i.e. they do not appear to be normally distributed.
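The comparison just described can be reproduced in base R. This is a minimal sketch, assuming the litres column is passed in as a numeric vector. sample_stats and compare_with_normal are hypothetical helper names, and the simulated panel changes from run to run unless you fix the random seed.

```r
# Sample mean and sample standard deviation (x-bar and s)
sample_stats <- function(x) {
  c(mean = mean(x), sd = sd(x))
}

# Draw a histogram of x next to one of simulated normal data that has
# the same length, mean and standard deviation as x.
compare_with_normal <- function(x, xlab = "Litres of pure alcohol") {
  sim <- rnorm(length(x), mean = mean(x), sd = sd(x))
  old_par <- par(mfrow = c(1, 2))  # one row, two panels
  on.exit(par(old_par))            # restore the previous layout afterwards
  hist(x, main = "Observed", xlab = xlab)
  hist(sim, main = "Simulated normal", xlab = xlab)
  invisible(sim)                   # return the simulated values quietly
}
```

Usage, assuming the downloaded data frame is called drinks: set.seed(1) followed by compare_with_normal(drinks$total_litres_of_pure_alcohol).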

Next, we compare the quantiles of the observed data with the quantiles we would expect if the data were normally distributed. Both plots have the same number of points.
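The two sets of quantiles can be computed directly. This is a minimal sketch: the observed quantiles are simply the sorted data, and the theoretical ones are normal quantiles evaluated at ppoints(n). ppoints(n) gives the plotting positions that base R's qqnorm() also uses. Plugging in the sample mean and standard deviation is our assumption, and quantile_pairs is a hypothetical helper name. Both columns have n values, which matches the equal number of points in the two plots.

```r
# Pair each sorted observation with the normal quantile at the same
# plotting position, using the sample mean and sd of x.
quantile_pairs <- function(x) {
  n <- length(x)
  data.frame(
    theoretical = qnorm(ppoints(n), mean = mean(x), sd = sd(x)),
    observed    = sort(x)
  )
}
```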

For the two plots above, we take the values on the x-axis and plot them against each other on a single plot; this is a QQ-plot.
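Building the QQ-plot described above by hand makes the construction explicit. This is a minimal sketch, assuming the theoretical quantiles use the sample mean and standard deviation, so that normal data would fall near the line y = x. plot_qq is a hypothetical helper name.

```r
# Plot sorted observations against normal quantiles with the same mean
# and sd; under normality the points should hug the reference line y = x.
plot_qq <- function(x) {
  n <- length(x)
  theoretical <- qnorm(ppoints(n), mean = mean(x), sd = sd(x))
  observed <- sort(x)
  plot(theoretical, observed,
       xlab = "Theoretical normal quantiles",
       ylab = "Observed quantiles",
       main = "Normal QQ-plot")
  abline(a = 0, b = 1, col = "red")  # reference line y = x
  invisible(data.frame(theoretical = theoretical, observed = observed))
}
```

Base R offers a shortcut: qqnorm(x) plots the observations against standard normal quantiles, and qqline(x) adds a reference line through the first and third quartiles.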

Given this lack of fit, we can confidently conclude that the observed data are not normally distributed.