INTRODUCTION:
The alcohol consumption by country data set (I chose only a small subset of it for the purposes of this project) shows how much of each type of alcohol (three categories: beer, spirit, and wine) is consumed in each country. I will tidy and transform the data to prepare it for analysis.
DATA LOAD
library(tidyr)
library(dplyr)
library(ggplot2)
# I manually created a CSV file from the chart, uploaded it to GitHub, and read it into an object called "alcohol_consumption".
alcohol_consumption <- read.csv("https://raw.githubusercontent.com/GitHub-Vlad/Data-Science/main/Alcohol%20Consumption%20by%20Country.csv",header = TRUE)
print(alcohol_consumption)
##      Country Beer.Servings Spirit.Servings Wine.Servings Total.Liters.Pure.Alcahol
## 1 Afganistan             0               0             0                      0.00
## 2    Belarus           142             373            42                     14.04
## 3      China            79             192             8                      5.00
## 4    Finland           263             133            97                     10.00
## 5     Greece           133             112           218                      8.30
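If the GitHub file is unavailable, the same five rows can be rebuilt inline so the rest of the walkthrough still runs. This is a minimal sketch: the values are copied from the printed output above, `csv_text` is a hypothetical helper name, and the column names (including the `Alcahol` spelling) are assumed to match the CSV header exactly.

```r
# Hypothetical inline copy of the five-row subset; values are taken from the
# printed output, column names are assumed to match the original CSV header.
csv_text <- "Country,Beer.Servings,Spirit.Servings,Wine.Servings,Total.Liters.Pure.Alcahol
Afganistan,0,0,0,0.00
Belarus,142,373,42,14.04
China,79,192,8,5.00
Finland,263,133,97,10.00
Greece,133,112,218,8.30"

# read.csv() accepts the data as a string through its `text` argument.
alcohol_consumption <- read.csv(text = csv_text, header = TRUE)

str(alcohol_consumption)
```

The resulting data frame has the same shape as the one loaded from GitHub, so the tidying steps below can be applied to it unchanged.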
DATA TIDYING
I will rename the alcohol category columns to shorter names that make more sense.
# Rename "Beer.Servings" to "beer", "Spirit.Servings" to "spirit" and "Wine.Servings" to "wine",
# then store the updated data frame in a new data frame called "alcohol_consumption_edit".
alcohol_consumption_edit <- alcohol_consumption %>%
  rename(beer = Beer.Servings, spirit = Spirit.Servings, wine = Wine.Servings)
I will remove the “Total.Liters.Pure.Alcahol” column (the name keeps the spelling used in the CSV) because it is not needed for this analysis.
# Remove the "Total.Liters.Pure.Alcahol" column from the "alcohol_consumption_edit" data frame.
alcohol_consumption_edit <- select(alcohol_consumption_edit, -Total.Liters.Pure.Alcahol)
I will convert the column name “Country” to lower case so that it is consistent with the other column names.
# tolower() is applied to every column name, but only "Country" actually changes.
names(alcohol_consumption_edit) <- tolower(names(alcohol_consumption_edit))
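The three tidying steps above can also be expressed in base R, which makes it easy to confirm the final column names. This is a sketch rather than the dplyr code used here: `raw` and `tidy` are hypothetical names, and `raw` is a one-row stand-in holding the Belarus values with the same column names as the CSV.

```r
# Hypothetical one-row stand-in with the original CSV column names (Belarus row).
raw <- data.frame(Country = "Belarus", Beer.Servings = 142, Spirit.Servings = 373,
                  Wine.Servings = 42, Total.Liters.Pure.Alcahol = 14.04,
                  stringsAsFactors = FALSE)

tidy <- raw
# Rename the three serving columns (base-R counterpart of dplyr::rename()).
old_names <- c("Beer.Servings", "Spirit.Servings", "Wine.Servings")
names(tidy)[match(old_names, names(tidy))] <- c("beer", "spirit", "wine")

# Drop the total-alcohol column (counterpart of select(-Total.Liters.Pure.Alcahol)).
tidy$Total.Liters.Pure.Alcahol <- NULL

# Lower-case every column name; only "Country" actually changes.
names(tidy) <- tolower(names(tidy))

names(tidy)
```

After these steps the columns are `country`, `beer`, `spirit` and `wine`, which is the layout the pivot step expects.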
DATA TRANSFORMATION
To perform the analysis, I need to transform the data from wide format (its current form, with one column per alcohol category) to long format (one row per country and category). In other words, I need to pivot the data. I then store the long-format data in a new data frame called “alcohol_transform”, which I will use for the analysis.
# Pivot the data to get it ready for analysis: "category" holds the alcohol type
# and "values" holds the number of servings.
alcohol_transform <- alcohol_consumption_edit %>%
  pivot_longer(cols = c("beer", "spirit", "wine"),
               names_to = "category",
               values_to = "values")
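To see what the pivot does to the row layout, here is the same wide-to-long reshape spelled out in base R for two of the countries. It is only an illustration, not how `pivot_longer()` is implemented; `wide`, `long` and `categories` are hypothetical names, and the values are copied from the printed data. The row order it produces (each country's beer, spirit and wine in turn) matches the order `pivot_longer()` uses for this data.

```r
# Wide format: one row per country, one column per alcohol category.
wide <- data.frame(country = c("Belarus", "China"),
                   beer = c(142, 79), spirit = c(373, 192), wine = c(42, 8),
                   stringsAsFactors = FALSE)

categories <- c("beer", "spirit", "wine")

# Long format: one row per (country, category) pair. Transposing the value
# matrix before flattening lays out each country's three values consecutively.
long <- data.frame(
  country  = rep(wide$country, each = length(categories)),
  category = rep(categories, times = nrow(wide)),
  values   = as.vector(t(as.matrix(wide[, categories]))),
  stringsAsFactors = FALSE
)

long
```

Two countries with three categories each give six rows, which is why the five-country data set becomes fifteen rows after pivoting.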
DATA ANALYSIS
For my analysis, I will create a grouped (multi-category) bar plot that compares the consumption of each alcohol category within each country.
# Plot a grouped bar chart: one group of bars per country (x-axis), one bar per
# alcohol category (fill), with the number of servings on the y-axis.
# All three aesthetics come directly from the columns of alcohol_transform,
# so no hand-built country, category, or value vectors are needed.
alcohol_transform %>%
  ggplot(aes(x = country, y = values, fill = category)) +
  geom_col(position = "dodge") +
  labs(x = "Country", y = "Servings", fill = "Alcohol type")
CONCLUSION:
This grouped bar plot suggests to each country whether it needs to sell alcohol at all, and which types of alcohol could be sold in greater or lesser quantities. For example, Afghanistan records zero servings of every type of alcohol, so it could stop selling alcohol altogether. Belarus consumes little wine (42 servings) and relatively little beer (142), but a very large amount of spirit (373). It would probably benefit Belarus to sell little or no wine and reallocate those resources toward selling more spirit. The same holds for China (8 servings of wine, 79 of beer, 192 of spirit). Finland consumes the most beer of the five countries (263 servings), so it might want to trade with other countries for more beer. Greece consumes the most wine (218 servings); in its case, it might want to trade some of its beer to Finland in exchange for wine.
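The country comparisons above can be checked from the numbers instead of being read off the plot. A minimal base-R sketch, assuming the long-format layout produced by the pivot step; it is rebuilt inline here (values copied from the printed data) so it runs on its own, and `alcohol_long`, `servings` and `top_country` are hypothetical names.

```r
# Long-format data: one row per (country, category), as after pivot_longer().
alcohol_long <- data.frame(
  country  = rep(c("Afganistan", "Belarus", "China", "Finland", "Greece"), each = 3),
  category = rep(c("beer", "spirit", "wine"), times = 5),
  values   = c(0, 0, 0, 142, 373, 42, 79, 192, 8, 263, 133, 97, 133, 112, 218),
  stringsAsFactors = FALSE
)

# Cross-tabulate into a country x category table of servings.
servings <- xtabs(values ~ country + category, data = alcohol_long)
print(servings)

# For each category, the country with the highest consumption.
top_country <- rownames(servings)[apply(servings, 2, which.max)]
names(top_country) <- colnames(servings)
top_country
```

The result names Finland for beer, Belarus for spirit and Greece for wine, in line with the plot. A per-country "largest category" is deliberately not computed, because `which.max()` would report beer for Afghanistan's all-zero row.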