We need two packages first:
dplyr: provides fast, consistent, and convenient tools for working with data-frame-like objects.
RCurl: allows R to compose general HTTP requests and provides convenient functions to fetch URIs, get and post forms, etc., and to process the results returned by the web server.

We downloaded the data used in the article Where Do People Drink The Most Beer, Wine And Spirits? from the data repository and looked at the top 5 drinking countries in terms of:
Hard liquor
## country spirit_servings
## 1 Grenada 438
## 2 Belarus 373
## 3 Haiti 326
## 4 Russian Federation 326
## 5 St. Lucia 315
Beer
## country beer_servings
## 1 Namibia 376
## 2 Czech Republic 361
## 3 Gabon 347
## 4 Germany 346
## 5 Lithuania 343
Wine
## country wine_servings
## 1 France 370
## 2 Portugal 339
## 3 Andorra 312
## 4 Switzerland 280
## 5 Denmark 278
Total Alcohol
## country total_litres_of_pure_alcohol
## 1 Belarus 14.4
## 2 Lithuania 12.9
## 3 Andorra 12.4
## 4 Grenada 11.9
## 5 Czech Republic 11.8
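Tables like the four above could be produced with a few dplyr verbs. The following is a minimal sketch, not the original code: `top_by` is a hypothetical helper name, and `drinks_demo` is a small stand-in built from the five rows of the "Hard liquor" table, used here in place of the full downloaded data set.

```r
library(dplyr)

# Hypothetical helper (not from the original analysis): keep the country column
# plus one measure, sort descending by that measure, and return the first n rows.
top_by <- function(data, column, n = 5) {
  data %>%
    select(country, all_of(column)) %>%
    arrange(desc(.data[[column]])) %>%
    head(n)
}

# Stand-in data: the five rows of the "Hard liquor" table, deliberately unsorted.
drinks_demo <- data.frame(
  country = c("Haiti", "St. Lucia", "Grenada", "Russian Federation", "Belarus"),
  spirit_servings = c(326, 315, 438, 326, 373),
  stringsAsFactors = FALSE
)

top2 <- top_by(drinks_demo, "spirit_servings", n = 2)
print(top2)
```

With the full data set, the same helper would presumably be called once per column: `spirit_servings`, `beer_servings`, `wine_servings`, and `total_litres_of_pure_alcohol`.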
We’re going to focus on total annual alcohol consumption in litres of pure alcohol. The observed sample mean and sample standard deviation are \(\overline{x}\) = 4.717 and \(s\) = 3.773. We compare two histograms: on the left, the observed data; on the right, simulated normal data with the same mean \(\overline{x}\) and standard deviation \(s\) as the observed sample. The observed histogram looks nothing like the simulated one, which suggests the observed data are not normally distributed.
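One way to build the simulated comparison sample is sketched below in base R. This is an illustration under stated assumptions, not the original code: `simulate_matching_normal` and `plot_comparison` are hypothetical names, the sample size `n_obs` is a placeholder, and the rescaling step is one possible way to make the simulated mean and standard deviation match \(\overline{x}\) and \(s\) exactly (a plain `rnorm` draw would match them only approximately).

```r
# Hypothetical helper: draw n normal values, then rescale them so the sample
# mean and sample standard deviation match the targets exactly.
simulate_matching_normal <- function(n, target_mean, target_sd) {
  z <- rnorm(n)
  z <- (z - mean(z)) / sd(z)  # standardise: sample mean 0, sample sd 1
  target_mean + target_sd * z
}

set.seed(42)  # arbitrary seed, only for reproducibility

x_bar <- 4.717  # observed sample mean reported in the text
s     <- 3.773  # observed sample standard deviation reported in the text
n_obs <- 200    # ASSUMPTION: placeholder size; use length(observed) in practice

simulated <- simulate_matching_normal(n_obs, x_bar, s)

# Side-by-side histograms. `observed` stands for the real consumption column,
# which is not defined in this sketch, so the function is only defined here.
plot_comparison <- function(observed, simulated) {
  old_par <- par(mfrow = c(1, 2))
  on.exit(par(old_par))
  hist(observed, main = "Observed", xlab = "Litres of pure alcohol")
  hist(simulated, main = "Simulated normal", xlab = "Litres of pure alcohol")
}
```

A normal distribution with this mean and standard deviation puts roughly 10% of its mass below zero (`pnorm(0, 4.717, 3.773)`), which consumption values cannot be. This is one more hint that a normal model fits poorly.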
Next, we compare the observed quantiles with the quantiles we would expect from theoretically normal data. Both plots have the same number of points.
Taking the x-axis values from each of the two plots above and plotting them against each other on a single plot gives a QQ-plot.
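The pairing of observed and theoretical quantiles can be sketched in base R as follows. The names `qq_points` and `plot_qq` are hypothetical, and using `ppoints` for the plotting positions is an assumption; the original may simply have used `qqnorm`.

```r
# Hypothetical helper: pair each sorted observation with the normal quantile
# at the same plotting position, using the sample's own mean and sd.
qq_points <- function(x) {
  x <- sort(x[!is.na(x)])
  probs <- ppoints(length(x))  # evenly spread probabilities strictly inside (0, 1)
  data.frame(
    theoretical = qnorm(probs, mean = mean(x), sd = sd(x)),
    observed = x
  )
}

# Draw the QQ-plot; points near the line y = x indicate a good normal fit.
plot_qq <- function(x) {
  pts <- qq_points(x)
  plot(pts$theoretical, pts$observed,
       xlab = "Theoretical normal quantiles", ylab = "Observed quantiles")
  abline(0, 1)
}

# Tiny illustrative input (not from the data set).
demo <- qq_points(c(3, 1, 2))
print(demo)
```

Because the theoretical quantiles here use the sample mean and standard deviation, the reference line is simply y = x. Base R's `qqnorm` and `qqline` give a similar picture using standard normal quantiles on the x-axis.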
Given how far the points fall from a straight line, we can confidently say the observed data are not normally distributed.