Data Preparation

library(psych)
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
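# Load the FiveThirtyEight drinks data from GitHub.
# stringsAsFactors = TRUE keeps country as a factor (the default before R 4.0),
# which matches the summary() output shown below.
theURL <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv"
Alcohol <- read.csv(file = theURL, header = TRUE, sep = ",", stringsAsFactors = TRUE)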

Research question

You should phrase your research question in a way that matches the scope of inference your dataset allows. Which country consumes the most alcohol per person, as measured by total litres of pure alcohol?
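One direct way to answer this is to sort the countries by `total_litres_of_pure_alcohol` and look at the top rows. The sketch below assumes the `Alcohol` data frame loaded above; the helper names `top_consumers()` and `max_consumers()` are hypothetical, and `max_consumers()` returns every country tied for the maximum rather than silently picking one.

```r
# Hypothetical helper: the n countries with the highest total litres of pure
# alcohol per person, sorted in decreasing order (missing values go last).
top_consumers <- function(df, n = 5) {
  ord <- order(df$total_litres_of_pure_alcohol, decreasing = TRUE)
  out <- df[ord, c("country", "total_litres_of_pure_alcohol")]
  # keep at most n rows and renumber them from 1
  out <- head(out, n)
  rownames(out) <- NULL
  out
}

# Hypothetical helper: every country tied for the maximum value,
# returned as a character vector.
max_consumers <- function(df) {
  x <- df$total_litres_of_pure_alcohol
  as.character(df$country[which(x == max(x, na.rm = TRUE))])
}
```

With the data loaded, `top_consumers(Alcohol, 5)` lists the five highest-consuming countries and `max_consumers(Alcohol)` names the top one (or several, if tied).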

Cases

What are the cases, and how many are there? The cases are countries, and there are 193 of them.
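The case count can also be checked in code rather than read off the summary. This is a minimal sketch assuming one row per country; `count_cases()` is a hypothetical helper that compares the number of rows with the number of distinct countries.

```r
# Hypothetical helper: number of rows and number of distinct countries.
# If the two values differ, some country appears more than once.
count_cases <- function(df) {
  c(rows = nrow(df), countries = length(unique(df$country)))
}
```

For `count_cases(Alcohol)`, both entries should equal 193, since the `summary()` output below shows a count of 1 for every listed country.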

Data collection

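Describe the method of data collection. Data was collected by the World Health Organisation's Global Information System on Alcohol and Health (GISAH) for 2010. The beer, spirit and wine variables are the average number of servings consumed per person, and total_litres_of_pure_alcohol is the average number of litres of pure alcohol consumed per person.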

Type of study

What type of study is this (observational/experiment)? The type of study is observational.

Data Source

If you collected the data, state self-collected. If not, provide a citation/link. Data was collected by the World Health Organisation's Global Information System on Alcohol and Health (GISAH), 2010. It is available at https://github.com/fivethirtyeight/data/tree/master/alcohol-consumption, and the data were read from the drinks.csv file in that FiveThirtyEight GitHub repository.

Dependent Variable

What is the response variable? Is it quantitative or qualitative? The response variable is the total litres of pure alcohol consumed per person (total_litres_of_pure_alcohol). It is quantitative.

Independent Variable

You should have two independent variables, one quantitative and one qualitative. The two independent variables are country, which is qualitative, and beer servings, which is quantitative.
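To confirm how R treats each variable, one can inspect the column classes. This is a minimal sketch; `variable_kind()` is a hypothetical helper that labels numeric columns as quantitative and all other columns (factor, character, logical) as qualitative.

```r
# Hypothetical helper: label each column of a data frame as
# "quantitative" (numeric or integer) or "qualitative" (anything else).
variable_kind <- function(df) {
  vapply(df,
         function(col) if (is.numeric(col)) "quantitative" else "qualitative",
         character(1))
}
```

Calling `variable_kind(Alcohol)` should label country as qualitative and beer_servings and total_litres_of_pure_alcohol as quantitative.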

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g., scatter plots, boxplots). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

summary(Alcohol)
##               country    beer_servings   spirit_servings  wine_servings   
##  Afghanistan      :  1   Min.   :  0.0   Min.   :  0.00   Min.   :  0.00  
##  Albania          :  1   1st Qu.: 20.0   1st Qu.:  4.00   1st Qu.:  1.00  
##  Algeria          :  1   Median : 76.0   Median : 56.00   Median :  8.00  
##  Andorra          :  1   Mean   :106.2   Mean   : 80.99   Mean   : 49.45  
##  Angola           :  1   3rd Qu.:188.0   3rd Qu.:128.00   3rd Qu.: 59.00  
##  Antigua & Barbuda:  1   Max.   :376.0   Max.   :438.00   Max.   :370.00  
##  (Other)          :187                                                    
##  total_litres_of_pure_alcohol
##  Min.   : 0.000              
##  1st Qu.: 1.300              
##  Median : 4.200              
##  Mean   : 4.717              
##  3rd Qu.: 7.200              
##  Max.   :14.400              
## 
describe(Alcohol$total_litres_of_pure_alcohol)
##    vars   n mean   sd median trimmed  mad min  max range skew kurtosis
## X1    1 193 4.72 3.77    4.2    4.46 4.45   0 14.4  14.4 0.42    -1.01
##      se
## X1 0.27
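The `se` column reported by `describe()` is the standard error of the mean, the standard deviation divided by the square root of the sample size. Plugging in the values above reproduces it:

```latex
\mathrm{SE} = \frac{s}{\sqrt{n}} = \frac{3.77}{\sqrt{193}} \approx \frac{3.77}{13.89} \approx 0.27
```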
# Histogram of total litres of pure alcohol per person (bins 3 litres wide)
ggplot(Alcohol, aes(x = total_litres_of_pure_alcohol)) + geom_histogram(binwidth = 3)
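The histogram shows the response on its own. To relate it to the quantitative independent variable, a scatter plot of beer servings against total litres of pure alcohol is one option. This is a minimal sketch using ggplot2; the function names `beer_vs_total_plot()` and `beer_total_cor()` are hypothetical, and `cor()` gives the Pearson correlation as a single-number summary of the same relationship.

```r
library(ggplot2)

# Hypothetical helper: scatter plot of beer servings (x) against total litres
# of pure alcohol (y), with a least-squares line added for reference.
beer_vs_total_plot <- function(df) {
  ggplot(df, aes(x = beer_servings, y = total_litres_of_pure_alcohol)) +
    geom_point(alpha = 0.6) +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    labs(x = "Beer servings per person",
         y = "Total litres of pure alcohol per person")
}

# Hypothetical helper: Pearson correlation between the two variables,
# dropping any rows where either value is missing.
beer_total_cor <- function(df) {
  cor(df$beer_servings, df$total_litres_of_pure_alcohol, use = "complete.obs")
}
```

At the console, `beer_vs_total_plot(Alcohol)` draws the plot and `beer_total_cor(Alcohol)` returns the correlation coefficient.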