library(psych)
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
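# Load the data from FiveThirtyEight's GitHub repository.
# stringsAsFactors = TRUE keeps country as a factor (the default before R 4.0),
# which is what the summary() output below assumes.
theURL <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv"
Alcohol <- read.csv(file = theURL, header = TRUE, sep = ",", stringsAsFactors = TRUE)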
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for. Which country consumes the most alcohol per person?
What are the cases, and how many are there? The cases are countries and there are 193 of them.
Describe the method of data collection. Data was collected by the World Health Organisation, Global Information System on Alcohol and Health (GISAH), 2010. The units are average serving sizes per person.
What type of study is this (observational/experiment)? The type of study is observational.
If you collected the data, state self-collected. If not, provide a citation/link. Data was collected by the World Health Organisation, Global Information System on Alcohol and Health (GISAH), 2010. It is available here: https://github.com/fivethirtyeight/data/tree/master/alcohol-consumption. The data was read from the CSV file in FiveThirtyEight’s GitHub repository.
What is the response variable? Is it quantitative or qualitative? The response variable is the total litres of pure alcohol. It is quantitative.
You should have two independent variables, one quantitative and one qualitative. The two independent variables are country and beer servings. Country is qualitative and beer servings is quantitative.
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
summary(Alcohol)
##               country    beer_servings   spirit_servings  wine_servings
##  Afghanistan      :  1   Min.   :  0.0   Min.   :  0.00   Min.   :  0.00
##  Albania          :  1   1st Qu.: 20.0   1st Qu.:  4.00   1st Qu.:  1.00
##  Algeria          :  1   Median : 76.0   Median : 56.00   Median :  8.00
##  Andorra          :  1   Mean   :106.2   Mean   : 80.99   Mean   : 49.45
##  Angola           :  1   3rd Qu.:188.0   3rd Qu.:128.00   3rd Qu.: 59.00
##  Antigua & Barbuda:  1   Max.   :376.0   Max.   :438.00   Max.   :370.00
##  (Other)          :187
##  total_litres_of_pure_alcohol
##  Min.   : 0.000
##  1st Qu.: 1.300
##  Median : 4.200
##  Mean   : 4.717
##  3rd Qu.: 7.200
##  Max.   :14.400
##
describe(Alcohol$total_litres_of_pure_alcohol)
##    vars   n mean   sd median trimmed  mad min  max range skew kurtosis
## X1    1 193 4.72 3.77    4.2    4.46 4.45   0 14.4  14.4 0.42    -1.01
##      se
## X1 0.27
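As a quick check on the describe() output above, the se column is the standard error of the mean, sd / sqrt(n). The following is a minimal sketch that recomputes it from the rounded values printed above, so the result is only approximate; the variable names n, sd_total and se_total are introduced here for illustration and are not part of the original analysis.

```r
# Values copied from the describe() output above (already rounded to 2 decimals).
n        <- 193   # number of countries
sd_total <- 3.77  # standard deviation of total_litres_of_pure_alcohol

# Standard error of the mean: sd / sqrt(n).
se_total <- sd_total / sqrt(n)
round(se_total, 2)
## [1] 0.27
```

The recomputed value agrees with the se reported by describe().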
ggplot(Alcohol, aes(x = total_litres_of_pure_alcohol)) + geom_histogram(binwidth = 3)
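The research question and the quantitative independent variable suggest two further views of the data: a ranking of countries by total litres of pure alcohol, and a scatter plot of beer servings against total litres. The following is a rough sketch of both, not part of the original analysis. The helper names top_consumers and beer_vs_total_plot are hypothetical, and the code assumes a data frame with the columns country, beer_servings and total_litres_of_pure_alcohol, such as Alcohol loaded earlier.

```r
# Hypothetical helper: return the n rows with the highest
# total_litres_of_pure_alcohol, largest first.
top_consumers <- function(df, n = 5) {
  ranked <- df[order(df$total_litres_of_pure_alcohol, decreasing = TRUE), ]
  head(ranked[, c("country", "total_litres_of_pure_alcohol")], n)
}

# Hypothetical helper: scatter plot of beer servings against total litres.
# Requires ggplot2; the function is only called if the package is installed.
beer_vs_total_plot <- function(df) {
  ggplot2::ggplot(df, ggplot2::aes(x = beer_servings,
                                   y = total_litres_of_pure_alcohol)) +
    ggplot2::geom_point() +
    ggplot2::labs(x = "Beer servings per person",
                  y = "Total litres of pure alcohol per person")
}

# Apply both helpers to the data frame loaded earlier, if it is in the session.
if (exists("Alcohol")) {
  print(top_consumers(Alcohol, n = 5))
  if (requireNamespace("ggplot2", quietly = TRUE)) {
    print(beer_vs_total_plot(Alcohol))
  }
}
```

The ranking addresses the research question directly, while the scatter plot gives a first look at how the quantitative independent variable relates to the response.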