In the early 19th century, consumption of various types of alcohol increased in America because of an overabundance of corn in the Western region. Today alcohol is a significant part of the American diet, and it is more readily available than ever. We do not have data from the earlier period, but the World Health Organization has recent data for 2000-2010 that we can use to examine whether alcohol consumption in America is trending upward. The data also covers other countries, so we will also look at how consumption varies across the globe.
Data Source: http://apps.who.int/gho/data/node.main.A1026?lang=en
Uploaded to Github: https://raw.githubusercontent.com/pauluck/602/master/al.csv
1. Load the data file using the pandas library. Show the first few lines of data.
| Country | Data Source | Beverage Types | 2013 | 2012 | 2011 | 2010 | 2009 | 2008 | 2007 | 2006 | 2005 | 2004 | 2003 | 2002 | 2001 | 2000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | Data source | All types | | | | 0.00 | 0.00 | 0.03 | 0.02 | 0.03 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 |
| Afghanistan | Data source | Beer | | | | 0.00 | 0.00 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Afghanistan | Data source | Wine | | | | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Afghanistan | Data source | Spirits | | | | 0.00 | 0.00 | 0.02 | 0.00 | 0.02 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 |
| Afghanistan | Data source | Other alcoholic beverages | | | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Albania | Data source | All types | | | 4.96 | 4.98 | 5.58 | 5.36 | 5.22 | 5.04 | 4.91 | 4.41 | 4.27 | 3.94 | 4.54 | 3.96 |
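The loading step can be sketched as follows. This is a minimal sketch, not the author's code: `load_alcohol_data` is a hypothetical helper name, and it assumes the CSV at the GitHub URL listed above has a single header row.

```python
import pandas as pd

# URL copied from the text above.
DATA_URL = "https://raw.githubusercontent.com/pauluck/602/master/al.csv"


def load_alcohol_data(source=DATA_URL):
    """Read the consumption table from a URL, a local path, or a file-like object.

    Hypothetical helper; assumes the first CSV row holds the column names.
    """
    return pd.read_csv(source)


def preview(al, n=6):
    """Return the first n rows, mirroring the 'show a few lines' step."""
    return al.head(n)
```

Calling `preview(load_alcohol_data())` would fetch the file and return its first six rows; passing a local path instead avoids the network request.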
2. Drop the columns Data.Source and 2011-2013 because most of the data there is missing.
| Country | Beverage Types | 2010 | 2009 | 2008 | 2007 | 2006 | 2005 | 2004 | 2003 | 2002 | 2001 | 2000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | All types | 0.00 | 0.00 | 0.03 | 0.02 | 0.03 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 |
| Afghanistan | Beer | 0.00 | 0.00 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Afghanistan | Wine | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Afghanistan | Spirits | 0.00 | 0.00 | 0.02 | 0.00 | 0.02 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 |
| Afghanistan | Other alcoholic beverages | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Albania | All types | 4.98 | 5.58 | 5.36 | 5.22 | 5.04 | 4.91 | 4.41 | 4.27 | 3.94 | 4.54 | 3.96 |
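One way to drop those columns with pandas is sketched below. The exact column labels are an assumption: the text writes `Data.Source` while the table header shows `Data Source`, so the sketch lists both and ignores whichever is absent. `drop_sparse_columns` is a hypothetical helper name.

```python
import pandas as pd


def drop_sparse_columns(al):
    """Return a copy of `al` without the source column and the 2011-2013 columns.

    Column labels are assumptions; labels that do not exist are skipped
    because of errors="ignore".
    """
    unwanted = ["Data.Source", "Data Source", "2011", "2012", "2013"]
    return al.drop(columns=unwanted, errors="ignore")
```

`DataFrame.drop` returns a new frame by default, so the original data stays available for comparison.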
3. Pull out the US data.
| | Country | Beverage Types | 2010 | 2009 | 2008 | 2007 | 2006 | 2005 | 2004 | 2003 | 2002 | 2001 | 2000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 940 | United States of America | All types | 8.55 | 8.67 | 8.74 | 8.74 | 8.63 | 8.52 | 8.48 | 8.40 | 8.36 | 8.25 | 8.21 |
| 941 | United States of America | Beer | 4.28 | 4.43 | 4.54 | 4.54 | 4.54 | 4.50 | 4.58 | 4.58 | 4.66 | 4.66 | 4.62 |
| 942 | United States of America | Wine | 1.48 | 1.44 | 1.44 | 1.44 | 1.40 | 1.36 | 1.32 | 1.29 | 1.25 | 1.17 | 1.17 |
| 943 | United States of America | Spirits | 2.80 | 2.80 | 2.76 | 2.76 | 2.69 | 2.65 | 2.57 | 2.54 | 2.46 | 2.42 | 2.42 |
| 944 | United States of America | Other alcoholic beverages | No data | No data | No data | No data | No data | No data | No data | No data | No data | No data | 0.00 |
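Selecting one country's rows can be sketched with a boolean mask. `select_country` is a hypothetical helper, and the small demo frame only reuses 2010 values that appear in the tables above.

```python
import pandas as pd


def select_country(al, country):
    """Return the rows of `al` whose Country column equals `country`."""
    return al[al["Country"] == country]


# Tiny demo frame built from 2010 values shown in the tables above.
demo = pd.DataFrame({
    "Country": ["Albania", "United States of America", "United States of America"],
    "Beverage Types": ["All types", "All types", "Beer"],
    "2010": [4.98, 8.55, 4.28],
})
us = select_country(demo, "United States of America")
print(us)
```

A boolean mask keeps the original row labels, which is consistent with the US rows in the table being labelled 940 to 944 rather than starting from 0.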
4. What is the consumption trend in the US?
I will use the matplotlib library to make a line plot over all the years so we can visualise the trend.
x = list of years to map
ybeer = list of beer numbers for US
ywine = list of wine numbers for US
yspirits = list of spirits numbers for US
matplotlib.pyplot.plot(x, ybeer, 'r-', x, ywine, 'b-', x, yspirits, 'g-')
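The plotting plan above could look roughly like this. It is a sketch under stated assumptions: the year columns are assumed to be the strings "2000" to "2010", the beverage labels are taken from the table, and `plot_trend` is a hypothetical helper. The y-axis unit is left unlabelled because the text does not state it.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumed column labels for the years kept after the drop step.
YEARS = [str(year) for year in range(2000, 2011)]
COLORS = {"Beer": "red", "Wine": "blue", "Spirits": "green"}


def plot_trend(country_rows, ax=None):
    """Draw one line per beverage type for a single country's rows."""
    if ax is None:
        _, ax = plt.subplots()
    x = [int(year) for year in YEARS]
    for beverage, color in COLORS.items():
        mask = country_rows["Beverage Types"] == beverage
        row = country_rows.loc[mask, YEARS].iloc[0]
        # Cells such as "No data" become NaN and leave a gap in the line.
        y = pd.to_numeric(row, errors="coerce")
        ax.plot(x, y, color=color, label=beverage)
    ax.set_xlabel("Year")
    ax.set_ylabel("Consumption")
    ax.legend()
    return ax
```

Selecting the columns in `YEARS` order puts 2000 on the left of the x-axis, even though the table lists the years from 2010 down to 2000.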
5. How does American consumption compare to that of its neighbor, Canada?
First, I will pull out the rows for Canada and remove unwanted rows. Then I will take the average of each type and compare the two countries with a stacked bar plot using matplotlib.
x = ['beer','wine','spirits']
cn = data containing Canada
us = data containing US
avgbeerUS = US average beer consumption
avgwineUS = US average wine consumption
avgspiritsUS = US average spirits consumption
avgUS = list of all US averages
avgbeercn = Canada average beer consumption
avgwinecn = Canada average wine consumption
avgspiritscn = Canada average spirits consumption
avgCN = list of all CN averages
matplotlib.pyplot.bar(x, avgUS, width, color='r')
matplotlib.pyplot.bar(x, avgCN, width, bottom=avgUS, color='y')
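The averaging and stacking steps can be sketched as below. The helper names are hypothetical, and the sketch assumes year columns "2000" to "2010" and that Canada appears in the Country column as "Canada". In a stacked bar the second series needs `bottom=` set to the first series, otherwise the bars are drawn on top of each other from zero.

```python
import matplotlib.pyplot as plt
import pandas as pd

YEARS = [str(year) for year in range(2000, 2011)]  # assumed column labels
BEVERAGES = ["Beer", "Wine", "Spirits"]


def average_by_beverage(al, country):
    """Mean consumption over YEARS for each beverage type of one country."""
    mask = (al["Country"] == country) & al["Beverage Types"].isin(BEVERAGES)
    values = al[mask].set_index("Beverage Types")[YEARS]
    # Non-numeric cells such as "No data" become NaN and are skipped by mean().
    values = values.apply(pd.to_numeric, errors="coerce")
    return values.mean(axis=1).reindex(BEVERAGES)


def plot_stacked(avg_us, avg_cn, width=0.5, ax=None):
    """Stack the Canadian averages on top of the US averages."""
    if ax is None:
        _, ax = plt.subplots()
    ax.bar(BEVERAGES, avg_us, width, color="r", label="US")
    ax.bar(BEVERAGES, avg_cn, width, bottom=avg_us, color="y", label="Canada")
    ax.legend()
    return ax
```

With the cleaned frame in a variable `al`, the comparison would be `plot_stacked(average_by_beverage(al, "United States of America"), average_by_beverage(al, "Canada"))`.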
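6. Which country drinks the most wine? Beer? Spirits?
For each category, I will use the pandas idxmax() function on the 2010 column to find the country with the highest consumption.
al[al['Beverage Types'] == 'Beer'].set_index('Country')['2010'].idxmax()
al[al['Beverage Types'] == 'Wine'].set_index('Country')['2010'].idxmax()
al[al['Beverage Types'] == 'Spirits'].set_index('Country')['2010'].idxmax()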
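A possible way to find the top country per beverage is sketched here. Note that it uses `idxmax()` rather than a value count: counting values reports how often a number occurs, while `idxmax()` returns the index label of the largest value. `top_country` is a hypothetical helper, and the demo frame reuses only 2010 values that appear in the tables above.

```python
import pandas as pd


def top_country(al, beverage, year="2010"):
    """Return (country, value) with the highest consumption of one beverage."""
    rows = al[al["Beverage Types"] == beverage]
    # Convert to numbers first; cells such as "No data" become NaN and are skipped.
    values = pd.to_numeric(rows.set_index("Country")[year], errors="coerce")
    return values.idxmax(), values.max()


# Demo frame built from 2010 values shown in the tables above.
demo = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan",
                "United States of America", "United States of America"],
    "Beverage Types": ["Beer", "Wine", "Beer", "Wine"],
    "2010": [0.00, 0.00, 4.28, 1.48],
})
for beverage in ["Beer", "Wine"]:
    print(beverage, top_country(demo, beverage))
```

Converting with `pd.to_numeric` matters because a column that contains "No data" is read as text, and comparing text does not give a numeric maximum.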
7. Create a plot showing overall world consumption.
I plan to take the average and standard deviation of each type of alcohol and compare each country against them. If a country falls more than 2 standard deviations below the average, then its consumption is considered very low, and if it falls more than 2 standard deviations above the average, then it is placed in the high consumption category. Countries in between are treated as medium consumption.
The goal is to show a map of world with some highlights of consumption.
avgb = average of all consumption
sd = standard deviation of all consumption
high = countries with high consumption
med = countries with med consumption
low = countries with low consumption
plot data using **mpl_toolkits.basemap**
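The grouping step that feeds the map can be sketched with pandas alone; drawing the map itself is not covered here. The sketch assumes that "high" means more than `k` standard deviations above the average and "low" means more than `k` below it, with `k = 2` as in the plan. `classify_consumption` is a hypothetical helper.

```python
import pandas as pd


def classify_consumption(al, beverage="All types", year="2010", k=2.0):
    """Label each country 'low', 'med', or 'high' for one beverage and year."""
    rows = al[al["Beverage Types"] == beverage]
    values = pd.to_numeric(rows.set_index("Country")[year], errors="coerce").dropna()
    avg = values.mean()
    sd = values.std()  # sample standard deviation (ddof=1), the pandas default
    labels = pd.Series("med", index=values.index)
    labels[values > avg + k * sd] = "high"
    labels[values < avg - k * sd] = "low"
    return labels
```

Because consumption cannot be negative, no country can land in the low group whenever the average minus `k` standard deviations is below zero, so a smaller `k` or a percentile cut may be worth trying if that group comes out empty.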